Since I have upgraded to Chrome 106 (and now to 107) the driver manages to open the URL at the first time but after some timeout (?) or other criteria, the driver.refresh() does not return (i.e. no resposne).
The way it looks to me, if the Refresh occurs within seconds since the initial open, then the refresh succeeds.
Yet, if longer interval has elapsed since the initial open, then the refresh() does not complete.
Code is in Python.
Before version 106 this seemed to work well
Does anyone experience similar behaviour? Any idea how to fix that?
#Code to create the driver:
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("--disable-blink-features")
chrome_options.add_argument("--disable-blink-features=AutomationControlled")
driver = webdriver.Chrome(r'... \chromedriver', options=chrome_options)`
#Then, the attempt to refresh:
driver.refresh()
Expecting the webpage to refresh
Can you do it on google colab? try this
import sys sys.path.insert(0,'/usr/lib/chromium-browser/chromedriver')
from selenium import webdriver
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
driver = webdriver.Chrome('chromedriver',options=chrome_options)
driver.get(url)
if your work does not depend much on it, just use earlier versions where it used to work.
if not, try the following:
driver.find_element_by_tag_name('body').send_keys(Keys.COMMAND + 'r')
Related
I'm making a price scraping program and have ran into the issue of antiscraping systems. I managed to get around these with the undetected_chromedriver but now I'm running into 2 issues
the first is that the UC is significantly slower than the standard chrome driver, through I need it for some sites, so I have some sites scraped with a normal driver and others with the UC
the second problem is that I have the standard Chrome driver install at the beginning of the program, but once I do that, the UC feels the need to install every time I open it?? this causes some sites to be scraped really slowly. can you help with why that is? and any other tips for running scraper faster would be appreciated.
I have this run at the beginning of the program as global variables:
chrome_path = Service(ChromeDriverManager().install())
options = webdriver.ChromeOptions()
options.headless = True
options.add_experimental_option('excludeSwitches', ['enable-logging'])
and this runs as a function every time I need a UC:
def start_uc():
options = webdriver.ChromeOptions()
# just some options passing in to skip annoying popups
options.add_argument('--no-first-run --no-service-autorun --password-store=basic')
driver = uc.Chrome(options=options)
driver.minimize_window()
return driver
My scraping functions just loop looking up the url and scrape the info, and restart the driver to clear the cookies if I run into a captcha .The scraping functions look like this (this is psuedo code to give you an idea):
driver = start_uc()
for url in url_list:
while true:
try:
driver.get(url)
#scrape info
break
except:
driver.close()
driver = start_uc()
I dont see why chrome_path would affect the UC? and are there any suggestions to make the scraping functions run more efficiently? Im not an expert on drivers and their intricacies so I could be doing something terribly wrong that I dont recognize.
thankyou in advance!
You can use https://github.com/seleniumbase/SeleniumBase to speed things up.
(It has a special undetected-chromedriver mode that works with headless mode.)
pip install -U seleniumbase
And then run the following with python:
from seleniumbase import Driver
from seleniumbase import page_actions
driver = Driver(headless=True, uc=True)
driver.get("https://nowsecure.nl")
page_actions.wait_for_text(driver, "OH YEAH, you passed!", "h1")
print(driver.find_element("css selector", "body").text)
screenshot_name = "now_secure_image.png"
driver.save_screenshot(screenshot_name)
print("\nScreenshot saved to: %s" % screenshot_name)
driver.quit()
I am using selenium and python in order to scrape data on a website.
The problem is I need to manually log in because there is a CAPTCHA after the login.
My question is the following : is there a way to start the program on a page that is already loaded ? (for example, here I would log to the website, solve the CAPTCHA manually, and then launch the program that would scrape the data)
Note: I have already been looking for an answer on SO but did not find it, might have missed it as it seems to be an obvious question.
don't open in headless mode. open in head mode.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import time
options = Options()
options.headless = False # Set false here
driver = webdriver.Chrome(options=options, executable_path=r'C:\path\to\chromedriver.exe')
driver.get("http://google.com/")
print ("Headless Chrome Initialized")
time.sleep(30) # wait 30 seconds, this should give enough time to manually do the capture
# do other code here
driver.quit()
Using python and selenium, I have a function for IE to run headless, but for some reason it's not working. It works perfect for Chrome, but not IE. I could've sworn it worked previously. Any ideas?
from selenium import webdriver
from selenium.webdriver.ie.options import Options as IEOptions
def openie():
setglobalvariables()
window_size = '1920,1080'
ie_options = IEOptions()
ie_options.add_argument('--headless')
ie_options.add_argument('--window-size=%s' % window_size)
ie_options.add_argument('--no-sandbox')
driver = webdriver.Ie(input_path + 'IEDriverServer.exe', options=ie_options)
url = settingsfile('url').strip()
statusmessage(url)
driver.get(url)
driver.maximize_window()
driver.implicitly_wait(3)
return driver
As far as I know, IE does not support Headless Browsing.
You can refer to this thread for verification and a workaround on how it can work:
The IE driver does not support execution without an active, logged-in
desktop session running. You'll need to take this up with the author
of the solution you're using to achieve "headless" (scare quotes
intentional) execution of IE.
https://github.com/SeleniumHQ/selenium/issues/4551#issuecomment-324319508
https://community.lambdatest.com/t/how-can-i-run-my-selenium-tests-in-headless-ie/5447
EDIT:
The second thread is of the LambdaTest community and answered by me.
I have been using Selenium and python to web scrape for a couple of weeks now. It has been working fairly good. Been running on a macOS and windows 7. However all the sudden the headless web driver has stopped working. I have been using chromedriver with the following settings:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
options = Options()
options.add_argument("--headless")
options.add_argument('--no-sandbox')
options.add_argument('--disable-gpu')
chrome_options.add_argument("--window-size=1920x1080")
driver = webdriver.Chrome(chrome_options=options)
driver.get('url')
Initially I had to add the window, gpu and sandbox arguments to get it work and it did work up until now. However, when running the script now it gets stuck at driver.get('url'). It doesn't produce an error or anything just seems to run indefinitely. When I run without headless and simply run:
from selenium import webdriver
driver = webdriver.Chrome()
driver.get('url')
it works exactly as intended. This problem is also isolated to my windows machine. Where do I start?
Solved
For some reason the proxy setting was slowing it down. Therefore it got solved by adding:
options.add_argument(f'--proxy-server={None}')
I had exactly the same problem. It appeared randomly after the script has run fine for weeks. OP has led me to the right direction, but his solution doesnt worked for me. I had to add:
chrome_options.add_argument("--no-proxy-server")
chrome_options.add_argument("--proxy-server='direct://'");
chrome_options.add_argument("--proxy-bypass-list=*");
My complete code:
chrome_options = Options()
chrome_options.add_argument("--headless")
chrome_options.add_argument("--start-maximized")
chrome_options.add_argument("--start-fullscreen")
chrome_options.add_argument("--no-proxy-server")
chrome_options.add_argument("--proxy-server='direct://'");
chrome_options.add_argument("--proxy-bypass-list=*");
chrome_options.binary_location = "C:\Program Files (x86)\Google\Chrome Dev\\Application\chrome.exe"
browser = webdriver.Chrome(options=chrome_options)
browser.set_window_size(2000, 1080)
please see also:
Headless chrome driver too slow
and:
Chrome webdriver produces timeout in selenium
How can Selenium Chrome WebDriver notifications be handled in Python?
Have tried to dismiss/accept alert and active element but seems notifications have to be treated other way. Also, all the Google search results are driving me to Java solution which I do not really need. I'm a newbie in Python.
Thanks in advance.
You can disable the browser notifications, using chrome options.
chrome_options = webdriver.ChromeOptions()
prefs = {"profile.default_content_setting_values.notifications" : 2}
chrome_options.add_experimental_option("prefs",prefs)
driver = webdriver.Chrome(chrome_options=chrome_options)
In December, 2021, using ChromeDriver Version: 96, the python code structure would look like below to handle the ChromeDriver/ Browser notification:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
# Creating Instance
option = Options()
# Working with the 'add_argument' Method to modify Driver Default Notification
option.add_argument('--disable-notifications')
# Passing Driver path alongside with Driver modified Options
browser = webdriver.Chrome(executable_path= your_chrome_driver_path, chrome_options= option)
# Do your stuff and quit your driver/ browser
browser.get('https://www.google.com')
browser.quit()
The answer above from Dec '16 didn't work for me in August 2020. Both Selenium and Chrome have changed, I believe.
I'm using the binary for Chrome 84, which is listed as the current version, even though there's a binary for Chrome 85 available from selenium.dev. I have to presume that's a beta since I have no other info on that.
I only have a partial answer so far, but I believe the following two lines are in the right direction. They're as far as I've gotten tonight. I have a test window that I've opened to Facebook and have the exact window shown in the Question. The code given in pythonjar's answer from Dec 30 '16 above produced error messages for me. These code lines worked without error, so I believe this is the right track.
>>> from selenium.webdriver import ChromeOptions
>>> chrome_options = ChromeOptions()
I'll update this tomorrow, if I get time to work on it.
Sorry for the partial answer. I hope to complete it soon. I hope this helps. If you get there first, please let me know.
Using Chrome Versão 85.0.4183.83 (Versão oficial) 64 bits and ChromeDriver 85.0.4183.83
Here is the code and it is working:
from selenium import webdriver
chrome_options = webdriver.ChromeOptions()
prefs = {"profile.default_content_setting_values.notifications" : 2}
chrome_options.add_experimental_option("prefs",prefs)
driver = webdriver.Chrome(chrome_options=chrome_options)
driver.get("https://www.google.com/")
2020/08/27
Follow the below code for handling the notification settings with Python selenium
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
option = Options()
option.add_argument("--disable-infobars")
option.add_argument("start-maximized")
option.add_argument("--disable-extensions")
option.add_experimental_option("prefs",
{"profile.default_content_setting_values.notifications": 2
})
Note 1-to allow notification, 2- to block notification