My problem arises when multiple instances of the Selenium webdriver are launched. I tried several things, such as changing the method of requesting and running with and without headless mode, but the problem remains. My program parallelizes the Selenium webdriver to automate web interaction. Could somebody please help me resolve this issue, either by handling the error or by changing the code so that the error no longer occurs? Thanks in advance.
if url:
    options = Options()
    # options.headless = True
    options.set_preference('dom.block_multiple_popups', False)
    options.set_preference('dom.popup_maximum', 100000000)
    driver = webdriver.Firefox(options=options)
    driver.set_page_load_timeout(30)
    pac = dict()
    try:
        # driver.get(url)
        # driver.execute_script('''window.location.href = '{0}';'''.format(url))
        driver.execute_script('''window.location.replace('{0}');'''.format(url))
        WebDriverWait(driver, 1000).until(lambda x: self.onload(pac, driver))
        pac['code'] = 200
    except ReadTimeoutError as ex:
        pac['code'] = 404
        print("Exception has been thrown. " + str(ex))
    return pac
urllib3.exceptions.ReadTimeoutError: HTTPConnectionPool(host='127.0.0.1', port=61322): Read timed out. (read timeout=)
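A minimal sketch of one commonly suggested arrangement, in which each worker process creates, uses, and quits its own driver rather than sharing one across processes. The fetch helper and URL list are illustrative, not from the original program:

# Sketch: one Firefox driver per worker process (helper names are illustrative).
# Assumes geckodriver is available on PATH.
from multiprocessing import Pool

from selenium import webdriver
from selenium.webdriver.firefox.options import Options


def fetch(url):
    options = Options()
    options.add_argument('--headless')  # optional
    driver = webdriver.Firefox(options=options)  # each process owns its driver
    driver.set_page_load_timeout(30)
    try:
        driver.get(url)
        return {'url': url, 'code': 200}
    except Exception as ex:
        return {'url': url, 'code': 404, 'error': str(ex)}
    finally:
        driver.quit()  # always release the browser and its local port


if __name__ == '__main__':
    urls = ['https://example.com'] * 3  # placeholder URLs
    with Pool(processes=3) as pool:
        print(pool.map(fetch, urls))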
I have a problem with Selenium, which I'm using in a Python application.
I created a simple piece of code that opens a --headless browser, selects a Chrome profile, and gets the OAuth2 code from the URL.
The code works fine, but sometimes it freezes completely, hangs on the landing page, and never reads the code from the URL.
I don't get any feedback other than errors about DEV USB etc.
I'm using Selenium 4.8.0.
I tried using selenium-stealth, but it didn't help.
Before starting, I call a subprocess which kills chrome.exe.
It freezes roughly one out of every four launches.
The entire function is in the code below.
async def allegroCODE(status):
    status.update(status="[bold green]Zdobywanie kodu...", spinner="bouncingBall", spinner_style="yellow")  # "Getting the code..."
    authorization_redirect_url = CODE_URL + '?response_type=code&client_id=' + CLIENT_ID + \
                                 '&redirect_uri=' + REDIRECT_URL
    try:
        FNULL = open(os.devnull, 'w')
        subprocess.call("taskkill /F /IM chrome.exe", stdout=FNULL, stderr=subprocess.STDOUT)
    except:
        pass
    await asyncio.sleep(5)
    options = webdriver.ChromeOptions()
    # options.add_argument("--log-level=3")
    options.add_argument("--headless=new")
    options.add_argument("user-data-dir=C:\\Users\\crepe\\AppData\\Local\\Google\\Chrome\\User Data\\")
    options.add_argument("profile-directory=Profile 1")
    options.add_experimental_option('excludeSwitches', ['enable-logging'])
    ser = Service(r"C:\chromedriver.exe")
    driver = webdriver.Chrome(service=ser, options=options, service_log_path=os.path.devnull)
    driver.set_page_load_timeout(15)
    # stealth(driver,
    #         languages=["en-US", "en"],
    #         vendor="Google Inc.",
    #         platform="Win32",
    #         webgl_vendor="Intel Inc.",
    #         renderer="Intel Iris OpenGL Engine",
    #         fix_hairline=True,
    #         )
    driver.get(authorization_redirect_url)
    code = driver.current_url
    driver.quit()
    code = code.split('=')
    code_ex = code[1]
    console.log(f"[bold green]✅ Udało się zdobyć kod: {code_ex}")  # "Got the code: ..."
    authorization_code = code_ex
    return authorization_code
Please ignore my Polish logs. :)
P.S. I already saw a similar problem on the forum, but unfortunately none of it solved my problem.
Therefore, I am writing with a separate question.
EDITED:
Logs:
DevTools listening on ws://127.0.0.1:1059/devtools/browser/0497f13c-2bc0-439d-8a8f-2d5c2a597cb7
[17848:12192:0214/231451.183:ERROR:device_event_log_impl.cc(215)] [23:14:51.183] USB: usb_device_handle_win.cc:1046 Failed to read descriptor from node connection: Urządzenie dołączone do komputera nie działa. (0x1F)
[17848:12192:0214/231451.183:ERROR:device_event_log_impl.cc(215)] [23:14:51.183] Bluetooth: bluetooth_adapter_winrt.cc:1074 Getting Default Adapter failed.
[17848:12192:0214/231451.185:ERROR:device_event_log_impl.cc(215)] [23:14:51.184] USB: usb_device_handle_win.cc:1046 Failed to read descriptor from node connection: Urządzenie dołączone do komputera nie działa. (0x1F)
(The Polish text is Windows error 0x1F: "A device attached to the system is not functioning.")
For people who experienced a similar problem.
I dropped the get() function and replaced it with execute_script().
That is, instead of navigating to the landing page with get(), I call JS code that takes me there without freezing.
Note that this is just the way I handled the problem, I still don't know why get() freezes so often.
# my webdriver settings
options = webdriver.ChromeOptions()
options.add_argument("--headless=new")
options.add_argument("user-data-dir=C:\\Users\\**USER_NAME**\\AppData\\Local\\Google\\Chrome\\User Data\\")
options.add_argument("profile-directory=Default")
options.add_argument("--no-sandbox")
options.add_experimental_option('excludeSwitches', ['enable-logging'])
ser = Service(r"C:\chromedriver_win32\chromedriver.exe")
try:
    # run chrome
    driver = webdriver.Chrome(service=ser, options=options)
    driver.set_window_size(800, 600)
    driver.set_page_load_timeout(30)
    try:
        # execute js script
        # This code takes me to the landing page. Then I check if I arrived
        # by comparing the domain name **END_URL**, e.g. https://facebook.com/.
        # If we are already on **END_URL** there is nothing to do;
        # otherwise navigate there via JS instead of get().
        driver.execute_script('''
            var targetUrl = '**END_URL**';
            if (window.location.href.indexOf(targetUrl) !== 0) {
                window.location.href = '**WHERE_YOU_WANT_TO_GO**';
            }
        ''')
        # By using WebDriverWait, wait for the page to load the element.
        wait = WebDriverWait(driver, 30)
        wait.until(EC.presence_of_element_located((By.ID, '**ELEMENT_ID**')))
    except Exception as err:
        print(err)
    finally:
        # Do what you want with the data; I collect the information from the URL.
        code = driver.current_url
        # close driver
        driver.quit()
except Exception as err:
    print(err)
I tried running a Python test script for a login page with Sauce Labs and got this error:
selenium.common.exceptions.WebDriverException: Message: failed serving request POST /wd/hub/session: Unauthorized
I looked online for a solution and found something at this link, but it got me nothing.
selenium - 4.0.0
python - 3.8
Here's the code:
import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options as ChromeOptions

options = ChromeOptions()
options.browser_version = '96'
options.platform_name = 'Windows 10'
options.headless = True
sauce_options = {'username': 'sauce_username',
                 'accessKey': 'sauce_access_key',
                 }
options.set_capability('sauce:options', sauce_options)
sauce_url = "https://{}:{}@ondemand.us-west-1.saucelabs.com/wd/hub".format(sauce_options['username'], sauce_options['accessKey'])
driver = webdriver.Remote(command_executor=sauce_url, options=options)
driver.get('https://cq-portal.qa.webomates.com/#/login')
time.sleep(10)
user = driver.find_element('css selector', '#userName')
password = driver.find_element('id', 'password')
login = driver.find_element('xpath', '//*[@id="submitButton"]')
user.send_keys('userId')
password.send_keys('password')
login.click()
time.sleep(10)
print('login successful')
driver.quit()
Your code is authenticating in two different ways, which I suspect is the problem.
You're passing in sauce_options in a W3C-compatible way (which is good), but you've also embedded HTTP Basic-style credentials in the URL: the {}:{} section puts the username and accessKey directly into the sauce_url.
If you're going to pass in credentials via sauce:options, you should remove everything between the protocol and the @ symbol in the URL, e.g.:
sauce_url = "https://ondemand.us-west-1.saucelabs.com/wd/hub"
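Put together, a minimal sketch of the corrected setup (placeholder credentials, endpoint from the question):

# Sketch: authenticate only via sauce:options; the URL carries no credentials.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options as ChromeOptions

options = ChromeOptions()
options.browser_version = '96'
options.platform_name = 'Windows 10'
options.set_capability('sauce:options', {
    'username': 'sauce_username',     # placeholder
    'accessKey': 'sauce_access_key',  # placeholder
})

sauce_url = "https://ondemand.us-west-1.saucelabs.com/wd/hub"
driver = webdriver.Remote(command_executor=sauce_url, options=options)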
I'm attempting to run a Selenium script locally and open three non-headless browsers. I'm using multiprocessing Pools (and have tried plain multiprocessing as well) and have run into an interesting issue: three browser sessions open, but only the first one actually navigates to the target_url and attempts control. The other two just sit there and do nothing.
Here is the relevant execution code:
run_id = str(uuid.uuid4())
options = Options()
# options.binary_location = '/opt/headless-chromium'  # works for lambda
start_time = time.time()
options.binary_location = '/usr/bin/google-chrome'  # testing
# options.add_argument('--headless')  # don't need headless
options.add_argument('--no-sandbox')
options.add_argument('--verbose')
# options.add_argument('--single-process')
options.add_argument('--user-data-dir=/tmp/user-data')  # test add
options.add_argument('--data-path=/tmp/data-path')
options.add_argument('--disk-cache-dir=/tmp/cache-dir')
options.add_argument('--homedir=/tmp')
# options.add_argument('--disable-gpu')  # test add
# options.add_argument("--remote-debugging-port=9222")  # test remove
# options.add_argument('--disable-dev-shm-usage')
# '/opt/chromedriver' not found
logger.info("Before driver initiated")
# job_id = event['job_id']
# run_id = event['run_id']
send_log(job_id, run_id, "JOB START", True, "", time.time() - start_time)
retries = 0
drivers = []
while retries < 3:  # retry until the driver starts; `break` exits on success
    try:
        # driver = webdriver.Chrome('/opt/chromedriver_89', chrome_options=options)
        driver = webdriver.Chrome('/opt/chromedriver90', chrome_options=options)
        # driver2 = webdriver.Chrome('/opt/chromedriver90', chrome_options=options)
        break
    except Exception as e:
        print(str(e))
        logger.info('exception with driver instantiation, retrying... ' + str(e))
        # time.sleep(5)
        retries += 1
        driver = None
        driver = webdriver.Chrome('/opt/chromedriver', chrome_options=options)
....
....
And here is how I'm invoking each process:
from multiprocessing import Pool

pool = Pool(processes=3)
for i in range(3):
    pool.apply_async(invoke, args=("https://macau-flash-sale.myshopify.com/",))
pool.close()
pool.join()
Is it possible that despite the multiple processes, selenium is not communicating with the other two browser instances properly?
Any ideas are greatly appreciated!
I don't think Selenium can handle multiple browsers at once. I recommend writing three separate Python scripts and running them all at once from the terminal. If they need to share information, probably the easiest way is to have them write to a text file.
For anyone else struggling with this, Selenium CAN be used with async/multiprocessing.
But you cannot specify the same user/data directories for each session. I had to remove these parameters so that each Chrome session creates unique directories for itself.
Just remove the lines below and it'll work:
options.add_argument('--user-data-dir=/tmp/user-data')# test add
options.add_argument('--data-path=/tmp/data-path')
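If each process genuinely needs a profile directory, an alternative is to give every session its own temporary directory; a minimal sketch (the helper name and directory prefix are illustrative, and chromedriver is assumed to be on PATH):

# Sketch: unique per-process Chrome directories via tempfile.
import tempfile

from selenium import webdriver
from selenium.webdriver.chrome.options import Options


def make_driver():
    profile_dir = tempfile.mkdtemp(prefix='chrome-profile-')  # unique per call
    options = Options()
    options.add_argument('--no-sandbox')
    options.add_argument('--user-data-dir={}'.format(profile_dir))
    return webdriver.Chrome(options=options)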
Now, I am using Selenium to execute the script "window.performance.timing" to get the full load time of a page. It runs headless, without opening a visible browser window. I want this to keep running 24x7 and report the loading time.
Here is my code:
import time
import getpass

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.common.exceptions import TimeoutException

u = getpass.getuser()
print(u)

# initialize Chrome options
chrome_options = Options()
chrome_options.add_argument('--headless')
chrome_options.add_argument('user-data-dir=C:\\Users\\%s\\AppData\\Local\\Google\\Chrome\\User Data' % (u))

source = "https://na66.salesforce.com/5000y00001SgXm0?srPos=0&srKp=500"
driver = webdriver.Chrome(chrome_options=chrome_options)
driver.get(source)
driver.find_element_by_id("Login").click()

while True:
    try:
        navigationStart = driver.execute_script("return window.performance.timing.navigationStart")
        domComplete = driver.execute_script("return window.performance.timing.domComplete")
        loadEvent = driver.execute_script("return window.performance.timing.loadEventEnd")
        onloadPerformance = loadEvent - navigationStart
        print("%s--It took about: %s" % (time.ctime(), onloadPerformance))
        driver.refresh()
    except TimeoutException:
        print("It took too long")
        driver.quit()
        break  # the session is gone; leave the loop rather than reuse a dead driver
I have two questions:
Is it a good idea to keep refreshing the page and printing the page load time? Does it carry any risk?
Is there anything in my code that needs improvement?
Someone suggested using Docker and Jenkins when I searched for suggestions on Google, but that would mean downloading more things. This code will eventually be packaged into an exe for others to use, so it would be good if it didn't require many software packages.
Thank you very much; I am new to the web side of things. Any suggestions will be appreciated.
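One possible shape for such a 24x7 loop, sketched under the assumption that a fixed-interval poll and a driver restart on failure are acceptable (the new_driver and load_time_ms helpers are hypothetical, not from the question):

# Hypothetical sketch of a long-running measurement loop.
# Restarting the driver on failure keeps one dead session from poisoning the loop.
import time

from selenium import webdriver
from selenium.common.exceptions import WebDriverException

source = "https://example.com"  # placeholder for the page under test


def new_driver():
    d = webdriver.Chrome()
    d.set_page_load_timeout(30)
    d.get(source)
    return d


def load_time_ms(d):
    # loadEventEnd - navigationStart = full page load time in milliseconds
    return d.execute_script(
        "return window.performance.timing.loadEventEnd"
        " - window.performance.timing.navigationStart")


driver = new_driver()
while True:
    try:
        driver.refresh()
        print("%s -- load took about %s ms" % (time.ctime(), load_time_ms(driver)))
    except WebDriverException:  # includes TimeoutException
        try:
            driver.quit()
        except WebDriverException:
            pass
        driver = new_driver()  # swap in a fresh session
    time.sleep(60)  # poll once a minute instead of hammering the server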
from selenium import webdriver

driver = webdriver.Chrome()
driver.set_page_load_timeout(7)

def urlOpen(url):
    try:
        driver.get(url)
        print(driver.current_url)
    except:
        return
Then I have a list of URLs and call the method above.
if __name__ == "__main__":
    urls = ['http://motahari.ir/', 'http://facebook.com', 'http://google.com']
    # It doesn't print anything
    # urls = ['http://facebook.com', 'http://google.com', 'http://motahari.ir/']
    # This prints https://www.facebook.com/ https://www.google.co.kr/?gfe_rd=cr&dcr=0&ei=3bfdWdzWAYvR8geelrqQAw&gws_rd=ssl
    for url in urls:
        urlOpen(url)
The problem is that when 'http://motahari.ir/' throws a timeout exception, 'http://facebook.com' and 'http://google.com' always throw timeout exceptions too.
The browser keeps waiting for 'motahari.ir/' to load, but the loop just goes on (it doesn't open 'facebook.com'; it keeps waiting for 'motahari.ir/') and keeps throwing timeout exceptions.
Initializing a webdriver instance takes a long time, so I pulled that out of the method, and I think that caused the problem. So, should I always reinitialize the webdriver instance whenever there is a timeout exception? And how? (Since I initialized the driver outside of the function, I can't reinitialize it in the except block.)
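If you do want to reinitialize, one pattern is to pass the driver in and return it, so the except block can swap in a fresh instance without globals; a minimal sketch (the url_open helper is illustrative):

# Sketch: recreate the driver on timeout by passing it in and returning it.
from selenium import webdriver
from selenium.common.exceptions import TimeoutException


def url_open(driver, url):
    try:
        driver.get(url)
        print(driver.current_url)
    except TimeoutException:
        driver.quit()                # discard the stuck session
        driver = webdriver.Chrome()  # start a fresh one
        driver.set_page_load_timeout(7)
    return driver


driver = webdriver.Chrome()
driver.set_page_load_timeout(7)
for url in ['http://motahari.ir/', 'https://facebook.com', 'https://google.com']:
    driver = url_open(driver, url)
driver.quit()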
You will just need to clear the browser's cookies before continuing. (Sorry, I missed seeing this in your previous code)
from selenium import webdriver

driver = webdriver.Chrome()
driver.set_page_load_timeout(7)

def urlOpen(url):
    try:
        driver.get(url)
        print(driver.current_url)
    except:
        driver.delete_all_cookies()
        print("Failed")
        return

urls = ['http://motahari.ir/', 'https://facebook.com', 'https://google.com']
for url in urls:
    urlOpen(url)
Output:
Failed
https://www.facebook.com/
https://www.google.com/?gfe_rd=cr&dcr=0&ei=o73dWfnsO-vs8wfc5pZI
P.S. It is not very wise to use try...except... without a specific exception type, as this might mask unexpected errors.
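For instance, a minimal sketch of the same function catching only the timeout (TimeoutException is what get() raises here), so other failures still surface:

# Sketch: catch only TimeoutException so unrelated errors are not swallowed.
from selenium.common.exceptions import TimeoutException


def urlOpen(url):
    try:
        driver.get(url)
        print(driver.current_url)
    except TimeoutException:
        driver.delete_all_cookies()  # reset state after the stuck load
        print("Failed")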