Selenium WebDriver Chrome - sometimes it hangs - Python

I have a problem with Selenium, which I'm using in a Python application.
I wrote a simple piece of code that opens the browser in --headless mode, selects a Chrome profile, and reads the OAuth2 code from the URL.
The code works fine, but sometimes it freezes completely: it hangs on the landing page and never reads the code from the URL.
I don't get any feedback other than errors about USB devices etc. (see the logs below).
I'm using Selenium 4.8.0.
I tried using selenium-stealth, but it didn't help.
Before starting, I call a subprocess that kills chrome.exe.
It freezes roughly one launch out of four.
The entire function is in the code below.
async def allegroCODE(status):
    status.update(status="[bold green]Getting the code...", spinner="bouncingBall", spinner_style="yellow")
    authorization_redirect_url = CODE_URL + '?response_type=code&client_id=' + CLIENT_ID + \
                                 '&redirect_uri=' + REDIRECT_URL
    try:
        FNULL = open(os.devnull, 'w')
        subprocess.call("taskkill /F /IM chrome.exe", stdout=FNULL, stderr=subprocess.STDOUT)
    except:
        pass
    await asyncio.sleep(5)
    options = webdriver.ChromeOptions()
    # options.add_argument("--log-level=3")
    options.add_argument("--headless=new")
    options.add_argument("user-data-dir=C:\\Users\\crepe\\AppData\\Local\\Google\\Chrome\\User Data\\")
    options.add_argument("profile-directory=Profile 1")
    options.add_experimental_option('excludeSwitches', ['enable-logging'])
    ser = Service(r"C:\chromedriver.exe")
    driver = webdriver.Chrome(service=ser, options=options, service_log_path=os.path.devnull)
    driver.set_page_load_timeout(15)
    # stealth(driver,
    #         languages=["en-US", "en"],
    #         vendor="Google Inc.",
    #         platform="Win32",
    #         webgl_vendor="Intel Inc.",
    #         renderer="Intel Iris OpenGL Engine",
    #         fix_hairline=True,
    #         )
    driver.get(authorization_redirect_url)
    code = driver.current_url
    driver.quit()
    code = code.split('=')
    code_ex = code[1]
    console.log(f"[bold green]✅ Got the code: {code_ex}")
    authorization_code = code_ex
    return authorization_code
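As an aside, splitting the final URL on '=' only works while the redirect carries a single query parameter. A more robust sketch (the example URL and function name below are illustrative, not from the original code) parses the query string instead:

# Sketch: extract the OAuth2 "code" parameter with urllib.parse
from urllib.parse import urlparse, parse_qs

def extract_code(url):
    query = parse_qs(urlparse(url).query)
    return query['code'][0]  # raises KeyError if no code came back

print(extract_code("https://example.com/callback?state=xyz&code=abc123"))  # abc123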
P.S. I already saw a similar problem on this forum, but unfortunately nothing there solved my problem, so I'm asking it as a separate question.
EDITED:
Logs:
DevTools listening on ws://127.0.0.1:1059/devtools/browser/0497f13c-2bc0-439d-8a8f-2d5c2a597cb7
[17848:12192:0214/231451.183:ERROR:device_event_log_impl.cc(215)] [23:14:51.183] USB: usb_device_handle_win.cc:1046 Failed to read descriptor from node connection: A device attached to the system is not functioning. (0x1F)
[17848:12192:0214/231451.183:ERROR:device_event_log_impl.cc(215)] [23:14:51.183] Bluetooth: bluetooth_adapter_winrt.cc:1074 Getting Default Adapter failed.
[17848:12192:0214/231451.185:ERROR:device_event_log_impl.cc(215)] [23:14:51.184] USB: usb_device_handle_win.cc:1046 Failed to read descriptor from node connection: A device attached to the system is not functioning. (0x1F)

For anyone who has run into a similar problem:
I dropped the get() call and replaced it with execute_script().
That is, instead of navigating to the landing page with get(), I run a JS snippet that takes me there without crashing.
Note that this is just how I worked around the problem; I still don't know why get() freezes so often.
# my webdriver settings
options = webdriver.ChromeOptions()
options.add_argument("--headless=new")
options.add_argument("user-data-dir=C:\\Users\\**USER_NAME**\\AppData\\Local\\Google\\Chrome\\User Data\\")
options.add_argument("profile-directory=Default")
options.add_argument("--no-sandbox")
options.add_experimental_option('excludeSwitches', ['enable-logging'])
ser = Service(r"C:\chromedriver_win32\chromedriver.exe")
try:
    # run chrome
    driver = webdriver.Chrome(service=ser, options=options)
    driver.set_window_size(800, 600)
    driver.set_page_load_timeout(30)
    try:
        # Execute a JS snippet that takes me to the landing page.
        # I check whether I have already arrived by comparing the URL
        # against the domain name **END_URL**, e.g. https://facebook.com/.
        # (execute_script() is synchronous and has no callback argument,
        # so the snippet only triggers the navigation; the WebDriverWait
        # below does the actual waiting.)
        driver.execute_script('''
            var targetUrl = '**END_URL**';
            if (window.location.href.indexOf(targetUrl) !== 0) {
                window.location.href = '**WHERE_YOU_WANT_TO_GO**';
            }
        ''')
        # Wait for the page to load the element, using WebDriverWait.
        wait = WebDriverWait(driver, 30)
        wait.until(EC.presence_of_element_located((By.ID, '**ELEMENT_ID**')))
    except Exception as err:
        print(err)
    finally:
        # Do what you want with the data; I collect the information from the URL.
        code = driver.current_url
        # close driver
        driver.quit()
except Exception as err:
    print(err)
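For completeness, if you would rather keep using get(), a minimal retry sketch (untested against this exact setup; it assumes the ser and options objects defined above) is to discard the hung session when get() runs past the page-load timeout and start a fresh driver:

# Sketch: retry get() with a fresh driver after each page-load timeout.
from selenium import webdriver
from selenium.common.exceptions import TimeoutException

def get_with_retries(url, attempts=3):
    for _ in range(attempts):
        driver = webdriver.Chrome(service=ser, options=options)
        driver.set_page_load_timeout(15)
        try:
            driver.get(url)
            return driver  # caller is responsible for driver.quit()
        except TimeoutException:
            driver.quit()  # discard the hung session and start over
    raise TimeoutException(f"{url} did not load after {attempts} attempts")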

Related

Unable to run a Python test script with Sauce Labs

I tried running a Python test script for a login page with Sauce Labs and got this error:
selenium.common.exceptions.WebDriverException: Message: failed serving request POST /wd/hub/session: Unauthorized
I looked online for a solution and found something at this link, but it got me nothing.
selenium - 4.0.0
python - 3.8
Here's the code:
import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options as ChromeOptions
options = ChromeOptions()
options.browser_version = '96'
options.platform_name = 'Windows 10'
options.headless = True
sauce_options = {'username': 'sauce_username',
                 'accessKey': 'sauce_access_key'}
options.set_capability('sauce:options', sauce_options)
sauce_url = "https://{}:{}@ondemand.us-west-1.saucelabs.com/wd/hub".format(sauce_options['username'], sauce_options['accessKey'])
driver = webdriver.Remote(command_executor=sauce_url, options=options)
driver.get('https://cq-portal.qa.webomates.com/#/login')
time.sleep(10)
user=driver.find_element('css selector','#userName')
password=driver.find_element('id','password')
login=driver.find_element('xpath','//*[@id="submitButton"]')
user.send_keys('userId')
password.send_keys('password')
login.click()
time.sleep(10)
print('login successful')
driver.quit()
Your code is authenticating in two different ways, which I suspect is the problem.
You're passing sauce_options in a W3C-compatible way (which is good), but you've also configured HTTP-style credentials in the URL. In the sauce_url, the {}:{} section basically sets up a username and accessKey of nil.
If you're going to pass credentials via sauce:options, you should remove everything between the protocol and the @ symbol in the URL, e.g.:
sauce_url = "https://ondemand.us-west-1.saucelabs.com/wd/hub"
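Putting it together, a minimal sketch of the corrected setup (the credentials are placeholders):

# Sketch: authenticate only via sauce:options, with a bare hub URL.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options as ChromeOptions

options = ChromeOptions()
options.browser_version = '96'
options.platform_name = 'Windows 10'
options.set_capability('sauce:options', {'username': 'sauce_username',
                                         'accessKey': 'sauce_access_key'})
sauce_url = "https://ondemand.us-west-1.saucelabs.com/wd/hub"
driver = webdriver.Remote(command_executor=sauce_url, options=options)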

Selenium Python - Multiprocessing Only Controlling One Browser

I'm attempting to run a Selenium script locally and open three non-headless browsers. I'm using multiprocessing Pools (and have tried plain multiprocessing as well) and have come across an interesting issue: three browser sessions open, but only the first one actually navigates to the target_url and is controlled. The other two just sit and do nothing.
Here is the relevant execution code:
run_id = str(uuid.uuid4())
options = Options()
# options.binary_location = '/opt/headless-chromium'  # works for lambda
start_time = time.time()
options.binary_location = '/usr/bin/google-chrome'  # testing
# options.add_argument('--headless')  # don't need headless
options.add_argument('--no-sandbox')
options.add_argument('--verbose')
# options.add_argument('--single-process')
options.add_argument('--user-data-dir=/tmp/user-data')  # test add
options.add_argument('--data-path=/tmp/data-path')
options.add_argument('--disk-cache-dir=/tmp/cache-dir')
options.add_argument('--homedir=/tmp')
# options.add_argument('--disable-gpu')  # test add
# options.add_argument("--remote-debugging-port=9222")  # test remove
# options.add_argument('--disable-dev-shm-usage')
# '/opt/chromedriver' not found
logger.info("Before driver initiated")
# job_id = event['job_id']
# run_id = event['run_id']
send_log(job_id, run_id, "JOB START", True, "", time.time() - start_time)
retries = 0
drivers = []
while True:  # retry loop implied by the break/retry logic below
    try:
        # driver = webdriver.Chrome('/opt/chromedriver_89', chrome_options=options)
        driver = webdriver.Chrome('/opt/chromedriver90', chrome_options=options)
        # driver2 = webdriver.Chrome('/opt/chromedriver90', chrome_options=options)
        break
    except Exception as e:
        print(str(e))
        logger.info('exception with driver instantiation, retrying... ' + str(e))
        # time.sleep(5)
        driver = None
        driver = webdriver.Chrome('/opt/chromedriver', chrome_options=options)
....
and here is how I'm invoking each process:
from multiprocessing import Pool

pool = Pool(processes=3)
for i in range(3):
    pool.apply_async(invoke, args=("https://macau-flash-sale.myshopify.com/",))
pool.close()
pool.join()
Is it possible that, despite the multiple processes, Selenium is not communicating with the other two browser instances properly?
Any ideas are greatly appreciated!
I don't think Selenium can handle multiple browsers at once. I recommend writing 3 different Python scripts and running them all at once from the terminal. If they need to share information, probably the easiest way is to have them write to a text file.
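A minimal sketch of that suggestion (worker.py is an illustrative name for a script that drives one browser and appends its results to a shared text file):

# Sketch: launch three copies of a worker script in parallel.
import subprocess
import sys

procs = [subprocess.Popen([sys.executable, "worker.py", str(i)]) for i in range(3)]
for p in procs:
    p.wait()  # block until all three workers have finished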
For anyone else struggling with this: Selenium CAN be used with async/multiprocessing.
But you cannot specify the same user/data directories for each session. I had to remove these parameters so that each Chrome session creates unique directories for itself.
Just remove the two lines below and it'll work:
options.add_argument('--user-data-dir=/tmp/user-data')  # test add
options.add_argument('--data-path=/tmp/data-path')
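If you do need an explicit profile directory for each session, one option (a sketch, assuming the same Pool/invoke setup as above) is to give every worker process its own throwaway directory:

# Sketch: a unique, temporary user-data dir per worker process, so the
# Chrome sessions don't fight over the same profile lock.
import tempfile
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

def make_options():
    options = Options()
    options.add_argument('--no-sandbox')
    # mkdtemp() returns a fresh directory on every call, hence per process
    options.add_argument('--user-data-dir=' + tempfile.mkdtemp(prefix='chrome-profile-'))
    return options

Each invoke() call can then build its own driver with webdriver.Chrome(options=make_options()).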

Issue when scraping data on McMaster-Carr website

I'm writing a crawler for McMaster-Carr. Take the page https://www.mcmaster.com/98173A200 as an example: if I open it directly in a browser, I can view all the product data.
Because the data is in dynamically loaded content, I'm using Selenium + bs4.
if __name__ == "__main__":
    url = "https://www.mcmaster.com/98173A200"
    options = webdriver.ChromeOptions()
    options.add_argument("--enable-javascript")
    driver = webdriver.Chrome("C:/chromedriver/chromedriver.exe", options=options)
    driver.set_page_load_timeout(20)
    driver.get(url)
    soup = BeautifulSoup(driver.page_source, "html.parser")
    delay = 20
    try:
        email_input = WebDriverWait(driver, delay).until(
            EC.presence_of_element_located((By.ID, 'MainContent')))
    except TimeoutException:
        print("Timeout loading DOM!")
    print(soup)
However, when I run the code I get a login dialog, which I don't get when I open the page directly in a browser, as mentioned above.
I also tried logging in with the code below:
try:
    email_input = WebDriverWait(driver, delay).until(
        EC.presence_of_element_located((By.ID, 'Email')))
    print("Page is ready!!")
    input("Press Enter to continue...")
except TimeoutException:
    print("Loading took too much time!")
email_input.send_keys(email)
password_input = driver.find_element_by_id('Password')
password_input.send_keys(password)
login_button = driver.find_element_by_class_name("FormButton_primaryButton__1kNXY")
login_button.click()
Then it shows access restricted.
I compared the request headers of the page opened by Selenium with those in my browser and couldn't find anything wrong. I also tried other webdrivers like PhantomJS and Firefox, and got the same result.
I also tried a random user agent, using the code below:
from random_user_agent.user_agent import UserAgent
from random_user_agent.params import SoftwareName, OperatingSystem
software_names = [SoftwareName.CHROME.value]
operating_systems = [OperatingSystem.WINDOWS.value, OperatingSystem.LINUX.value]
user_agent_rotator = UserAgent(software_names=software_names,
                               operating_systems=operating_systems,
                               limit=100)
user_agent = user_agent_rotator.get_random_user_agent()
chrome_options = Options()
chrome_options.add_argument('user-agent=' + user_agent)
Still the same result.
The developer tools in the page opened by Selenium showed a bunch of errors. I guess the token-authorization one is the key to this issue, but I don't know what I should do with it.
Any help would be appreciated!
The reason you see a login window is that you are accessing McMaster-Carr via ChromeDriver. When the server recognizes your behaviour, it requires you to sign in.
A typical login won't work if you haven't been authenticated by McMaster-Carr (you need to sign an NDA).
You should look into the McMaster-Carr API. With the API you can access the database directly. However, you need to sign an NDA with McMaster-Carr before obtaining access to the API: https://www.mcmaster.com/help/api/

ReadTimeoutError with opening multiple webdrivers in Selenium

My problem arises when multiple instances of the Selenium webdriver are launched. I have tried several things, like changing the method of requesting and running with and without headless, but the problem remains. My program parallelizes the Selenium webdriver to automate web interaction. Could somebody please help me resolve this issue, either by handling the error or by changing the code so that it no longer occurs? Thanks in advance.
if url:
    options = Options()
    # options.headless = True
    options.set_preference('dom.block_multiple_popups', False)
    options.set_preference('dom.popup_maximum', 100000000)
    driver = webdriver.Firefox(options=options)
    driver.set_page_load_timeout(30)
    pac = dict()
    try:
        # driver.get(url)
        # driver.execute_script('''window.location.href = '{0}';'''.format(url))
        driver.execute_script('''window.location.replace('{0}');'''.format(url))
        WebDriverWait(driver, 1000).until(lambda x: self.onload(pac, driver))
        pac['code'] = 200
    except ReadTimeoutError as ex:
        pac['code'] = 404
        print("Exception has been thrown. " + str(ex))
    return pac
urllib3.exceptions.ReadTimeoutError: HTTPConnectionPool(host='127.0.0.1', port=61322): Read timed out. (read timeout=)
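One way to at least contain the error (a sketch assuming the same Firefox setup as above, not a confirmed fix) is to treat the read timeout as a failed attempt, discard the wedged driver, and retry with a fresh one:

# Sketch: catch urllib3's ReadTimeoutError around the navigation and
# retry with a fresh driver instead of letting the exception propagate.
from urllib3.exceptions import ReadTimeoutError
from selenium import webdriver
from selenium.webdriver.firefox.options import Options

def load(url, attempts=2):
    for _ in range(attempts):
        driver = webdriver.Firefox(options=Options())
        driver.set_page_load_timeout(30)
        try:
            driver.execute_script("window.location.replace('{0}');".format(url))
            return {'code': 200, 'driver': driver}  # caller quits the driver
        except ReadTimeoutError as ex:
            print("Read timed out, retrying with a fresh driver: " + str(ex))
            driver.quit()  # discard the wedged session
    return {'code': 404, 'driver': None}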

Cannot attach to an existing Selenium session via geckodriver

After upgrading to geckodriver, I'm unable to reuse my Selenium sessions. Here's my setup:
I have a start_browser.py script that launches a Firefox instance and prints a port to connect to, like:
firefox_capabilities = DesiredCapabilities.FIREFOX
firefox_capabilities['marionette'] = True
browser = webdriver.Firefox(capabilities=firefox_capabilities)
print browser.service.port
wait_forever()
... and another script, which tries to connect to the existing instance via Remote driver:
caps = DesiredCapabilities.FIREFOX
caps['marionette'] = True
driver = webdriver.Remote(
    command_executor='http://localhost:{port}'.format(port=port),
    desired_capabilities=caps)
But it seems to be trying to launch a new session, and it fails with this message:
selenium.common.exceptions.WebDriverException: Message: Session is already started
Is there a way to just attach to the existing session, like in previous versions of Selenium? Or is this the intended behaviour of geckodriver (I hope not)?
Alright, so unless anyone comes up with a more elegant solution, here's a quick dirty hack:
class SessionRemote(webdriver.Remote):
    def start_session(self, desired_capabilities, browser_profile=None):
        # Skip the NEW_SESSION command issued by the original driver
        # and set only some required attributes
        self.w3c = True

driver = SessionRemote(command_executor=url, desired_capabilities=caps)
driver.session_id = session_id
The bad thing is that it still doesn't fully work: it complains that it doesn't know the moveto command, but at least it connects to a launched browser.
Update: Well, geckodriver seems to lack some functionality at the moment, so if you're going to keep using Firefox, just downgrade to a version that supports the old webdriver (45 plays fine), and keep an eye on tickets like https://github.com/SeleniumHQ/selenium/issues/2285 .
You can reconnect to a session by using the session id.
firefox_capabilities = DesiredCapabilities.FIREFOX
firefox_capabilities['marionette'] = True
browser = webdriver.Firefox(capabilities=firefox_capabilities)
print browser.service.port
wait_forever()

# get the ID and URL from the browser
url = browser.command_executor._url
session_id = browser.session_id

# Connect to the existing instance
driver = webdriver.Remote(command_executor=url, desired_capabilities={})
driver.session_id = session_id
