I have already read a couple of threads about this problem, but none of them really helped me, so here goes. I am trying to use Selenium WebDriver on Google Colab. I had some problems installing it, but with the code below I was finally able to install it:
!pip install selenium
!apt-get update
!apt install chromium-chromedriver
!cp /usr/lib/chromium-browser/chromedriver /usr/bin
import sys
sys.path.insert(0,'/usr/lib/chromium-browser/chromedriver')
from selenium import webdriver
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
wd = webdriver.Chrome('chromedriver',chrome_options=chrome_options)
wd.get("https://www.webite-url.com")
However, when I now run these two lines of code:
from selenium import webdriver
driver = webdriver.Chrome()
this is the error I get:
WebDriverException: Message: unknown error: Chrome failed to start: exited abnormally
(unknown error: DevToolsActivePort file doesn't exist)
(The process started from chrome location /usr/bin/chromium-browser is no longer running, so ChromeDriver is assuming that Chrome has crashed.)
Colab runs on a server that has no video card or monitor, so you always have to use --headless, and possibly other options too:
from selenium import webdriver
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
driver = webdriver.Chrome('chromedriver', chrome_options=chrome_options)
driver.get("...your_url...")
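For newer Selenium releases, here is a minimal sketch of the same setup using the `options=` keyword, since to my knowledge the older `chrome_options=` keyword and the positional driver path were dropped in later Selenium 4 versions. The helper names `build_chrome_options` and `start_headless_chrome` are made up for this example, and it assumes chromedriver is already on PATH.

```python
# Flags taken from the answer above; needed on a display-less host such as Colab.
HEADLESS_FLAGS = ("--headless", "--no-sandbox", "--disable-dev-shm-usage")


def build_chrome_options(flags=HEADLESS_FLAGS):
    """Return a ChromeOptions object carrying the given command-line flags."""
    # Imported lazily so this file can be loaded where selenium is absent.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for flag in flags:
        options.add_argument(flag)
    return options


def start_headless_chrome():
    """Start Chrome with the headless flags (assumes chromedriver is on PATH)."""
    from selenium import webdriver

    # Selenium 4 spelling: `options=` instead of the older `chrome_options=`.
    return webdriver.Chrome(options=build_chrome_options())
```

Calling `start_headless_chrome()` and then `driver.get(...)` should behave like the snippet above, provided the installed Chrome and chromedriver versions are compatible.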
So I'm trying some stuff out with selenium and I really want it to be quick.
So my thought is that running it with headless chrome would make my script faster.
First, is that assumption correct, or does it not matter if I run my script with a headless driver?
Anyway, I still want to get it to run headless, but I somehow can't. I tried different things, and most suggested that it would work as described in the October update here:
How to configure ChromeDriver to initiate Chrome browser in Headless mode through Selenium?
But when I try that, I get weird console output and it still doesn't seem to work.
Any tips appreciated.
To run chrome-headless just add --headless via chrome_options.add_argument, i.e.:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
chrome_options = Options()
#chrome_options.add_argument("--disable-extensions")
#chrome_options.add_argument("--disable-gpu")
#chrome_options.add_argument("--no-sandbox") # linux only
chrome_options.add_argument("--headless")
# chrome_options.headless = True # also works
driver = webdriver.Chrome(options=chrome_options)
start_url = "https://duckgo.com"
driver.get(start_url)
print(driver.page_source.encode("utf-8"))
# b'<!DOCTYPE html><html xmlns="http://www....
driver.quit()
So my thought is that running it with headless chrome would make my
script faster.
Try Chrome options like --disable-extensions or --disable-gpu and benchmark the result, but I wouldn't count on much improvement.
References: headless-chrome
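To actually benchmark it, something along these lines might do. This is only a rough sketch: `time_runs`, `summarize` and `load_page` are names invented here, and `load_page` assumes Selenium 4 with chromedriver on PATH.

```python
import statistics
import time


def time_runs(task, repeats=5):
    """Call task() `repeats` times and return the per-run durations in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        task()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return (median, fastest, slowest) for a list of durations."""
    return statistics.median(durations), min(durations), max(durations)


def load_page(url, flags):
    """Hypothetical task: open `url` in Chrome started with `flags`, then quit."""
    from selenium import webdriver  # lazy import; needs selenium and Chrome

    options = webdriver.ChromeOptions()
    for flag in flags:
        options.add_argument(flag)
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
    finally:
        driver.quit()
```

For example, compare `summarize(time_runs(lambda: load_page("https://www.example.com", ["--headless"])))` against the same call with an empty flag list. Network noise can easily dominate, so several repeats and the median are probably more telling than a single run.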
Install & run containerized Chrome:
docker pull selenium/standalone-chrome
docker run --rm -d -p 4444:4444 --shm-size=2g selenium/standalone-chrome
Connect using webdriver.Remote:
driver = webdriver.Remote('http://localhost:4444/wd/hub', webdriver.DesiredCapabilities.CHROME)
driver.set_window_size(1280, 1024)
driver.get('https://www.google.com')
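With Selenium 4 the remote connection is usually written with an options object rather than `DesiredCapabilities`. Below is a hedged sketch of that form, plus a readiness check against the WebDriver `/status` endpoint. The function names are made up, and the sketch assumes the container still answers under the `/wd/hub` prefix used above.

```python
import json
from urllib.request import urlopen


def is_ready(status_payload):
    """Interpret a WebDriver /status response body (already parsed from JSON)."""
    return bool(status_payload.get("value", {}).get("ready", False))


def server_ready(base_url="http://localhost:4444/wd/hub", timeout=5):
    """Ask the server started by `docker run` whether it accepts new sessions."""
    with urlopen(base_url + "/status", timeout=timeout) as response:
        return is_ready(json.load(response))


def connect(base_url="http://localhost:4444/wd/hub"):
    """Open a remote Chrome session using the Selenium 4 `options=` form."""
    from selenium import webdriver  # lazy import; needs selenium

    options = webdriver.ChromeOptions()
    driver = webdriver.Remote(command_executor=base_url, options=options)
    driver.set_window_size(1280, 1024)
    return driver
```

Polling `server_ready()` for a few seconds after `docker run` avoids connecting before the container has finished starting.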
from time import sleep
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
chrome_options = Options()
chrome_options.add_argument("--headless")
driver = webdriver.Chrome(executable_path="./chromedriver", options=chrome_options)
url = "https://stackoverflow.com/questions/53657215/running-selenium-with-headless-chrome-webdriver"
driver.get(url)
sleep(5)
h1 = driver.find_element_by_xpath("//h1[@itemprop='name']").text
print(h1)
Then I ran the script on my local machine:
➜ python script.py
Running Selenium with Headless Chrome Webdriver
It is working and it is with headless Chrome.
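On current Selenium versions the `find_element_by_xpath` helper is gone (it was removed in the 4.x line), and an explicit wait is usually more reliable than `sleep(5)`. Here is a sketch of how the lookup might be written; `read_heading` is a name I made up, and the locator is the `itemprop` one from the script above.

```python
HEADING_XPATH = "//h1[@itemprop='name']"  # the itemprop locator from the script


def read_heading(driver, timeout=10):
    """Wait until the heading is present, then return its text."""
    # Lazy imports so this file loads even where selenium is not installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    element = WebDriverWait(driver, timeout).until(
        EC.presence_of_element_located((By.XPATH, HEADING_XPATH))
    )
    return element.text
```

`print(read_heading(driver))` would replace the `sleep(5)` and `find_element_by_xpath` lines. The wait returns as soon as the element shows up and raises a timeout error if it never does.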
If you are using a Linux environment, you may have to add --no-sandbox as well, plus specific window size settings. The --no-sandbox flag is not needed on Windows if you set up the user container properly.
Use --disable-gpu only on Windows. Other platforms no longer require it. The --disable-gpu flag is a temporary work around for a few bugs.
//Headless chrome browser and configure
WebDriverManager.chromedriver().setup();
ChromeOptions chromeOptions = new ChromeOptions();
chromeOptions.addArguments("--no-sandbox");
chromeOptions.addArguments("--headless");
chromeOptions.addArguments("disable-gpu");
// chromeOptions.addArguments("window-size=1400,2100"); // enable this on Linux
driver = new ChromeDriver(chromeOptions);
Once you have Selenium and the web driver installed, the following worked for me with headless Chrome on a Linux cluster:
from selenium import webdriver
options = webdriver.ChromeOptions()
options.add_argument("--headless")
options.add_argument("--disable-extensions")
options.add_argument("--disable-dev-shm-usage")
options.add_argument("--no-sandbox")
options.add_experimental_option("prefs",{"download.default_directory":"/databricks/driver"})
driver = webdriver.Chrome(chrome_options=options)
To do (tested on a headless Debian Linux 9.4 server):
Install Chrome and ChromeDriver:
# install chrome
curl -sS -o - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add -
echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google-chrome.list
apt-get -y update
apt-get -y install google-chrome-stable
# install chrome driver
wget https://chromedriver.storage.googleapis.com/77.0.3865.40/chromedriver_linux64.zip
unzip chromedriver_linux64.zip
mv chromedriver /usr/bin/chromedriver
chown root:root /usr/bin/chromedriver
chmod +x /usr/bin/chromedriver
Install selenium:
pip install selenium
and run this Python code:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
options = Options()
options.add_argument("no-sandbox")
options.add_argument("headless")
options.add_argument("start-maximized")
options.add_argument("window-size=1900,1080")
driver = webdriver.Chrome(chrome_options=options, executable_path="/usr/bin/chromedriver")
driver.get("https://www.example.com")
html = driver.page_source
print(html)
As stated by the accepted answer:
options.add_argument("--headless")
These tips might help to speed things up especially for headless:
There are quite a few things you can do in headless mode that you can't do in non-headless mode.
Since you will be using headless Chrome, I've found that adding the following flag reduces CPU usage by about 20% for me (I found this to be a CPU and memory hog when looking at htop):
--disable-crash-reporter
This only disables the crash reporter when you are running headless. It might speed things up for you!
My current settings are as follows; they reduce CPU usage by about 20%, though the time saving is only marginal:
options.add_argument("--no-sandbox");
options.add_argument("--disable-dev-shm-usage");
options.add_argument("--disable-renderer-backgrounding");
options.add_argument("--disable-background-timer-throttling");
options.add_argument("--disable-backgrounding-occluded-windows");
options.add_argument("--disable-client-side-phishing-detection");
options.add_argument("--disable-crash-reporter");
options.add_argument("--disable-oopr-debug-crash-dump");
options.add_argument("--no-crash-upload");
options.add_argument("--disable-gpu");
options.add_argument("--disable-extensions");
options.add_argument("--disable-low-res-tiling");
options.add_argument("--log-level=3");
options.add_argument("--silent");
I found this to be a pretty good list (full list I think) of command line switches with explanations: https://peter.sh/experiments/chromium-command-line-switches/
Some additional things you can turn off are also mentioned here: https://github.com/GoogleChrome/chrome-launcher/blob/main/docs/chrome-flags-for-tools.md
I hope this helps someone
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
chrome_options = Options()
chrome_options.add_argument("--headless")
driver = webdriver.Chrome(executable_path=r"C:\Program Files\Google\Chrome\Application\chromedriver.exe", options=chrome_options)
This works for me.
I am using the following code to run my script on my local machine:
from seleniumwire import webdriver
import pytest
from selenium.webdriver.chrome.options import Options
import time
import allure
class Test_main():
    @pytest.fixture()
    def test_setup(self):
        # instantiate browser
        chrome_options = Options()
        chrome_options.add_argument('--start-maximized')
        chrome_options.add_argument('--headless')
        self.driver = webdriver.Chrome(executable_path=r"D:/Python/Sel_python/drivers/chromedriverv86/chromedriver.exe", chrome_options=chrome_options)
        # terminate script
        yield
        self.driver.close()
        self.driver.quit()
        print("Test completed")
    ## Remaining functions/test cases followed. Not adding the entire script here
I pushed this code to Git and then tried to run it in Jenkins using the following build commands:
cd "D:\Python\Sel_python\Pytest"
pip install -r requirements.txt
pytest Test_Tracking_code_scripts.py -s -v
But then Jenkins threw an error saying that chromedriver cannot be located. My questions are:
Do I need to upload chromedriver.exe to my Git repository as well?
Does Jenkins have its own Chrome browser? If yes, how do I use it, and what path has to be specified?
I am new to Jenkins, so please help me out here.
Check the Chrome version on the Jenkins system.
Download the chromedriver that matches the Jenkins system's Chrome version from here.
Copy the chromedriver to "C:/drivers/" on the Jenkins server (the C: drive is common to all Windows systems).
Update the code as below:
self.driver = webdriver.Chrome(executable_path=r"D:/Python/Sel_python/drivers/chromedriverv86/chromedriver.exe", chrome_options=chrome_options)
as
self.driver = webdriver.Chrome(executable_path=r"C:/drivers/chromedriver.exe", chrome_options=chrome_options)
Let me know if you face any issues with this.
NOTE:
On the local system, please move the driver to "C:/drivers" as well, so that the path is the same on both the remote and the local system.
If Chrome is updated on the local or the remote system, please update the chromedriver version too, i.e. chromedriver.exe.
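Instead of hard-coding one path for both machines, the driver location could also be looked up at run time. This is just a sketch under my own assumptions: `CHROMEDRIVER_PATH` is an environment variable name invented for this example (not something Jenkins or Selenium defines), and `resolve_chromedriver` is a made-up helper.

```python
import os
import shutil


def resolve_chromedriver(env_var="CHROMEDRIVER_PATH", default_name="chromedriver"):
    """Return a chromedriver path from the environment, else from PATH.

    `CHROMEDRIVER_PATH` is a made-up variable name for this sketch.
    """
    configured = os.environ.get(env_var)
    if configured:
        if not os.path.isfile(configured):
            raise FileNotFoundError(f"{env_var} points to a missing file: {configured}")
        return configured
    # shutil.which searches PATH (and tries extensions such as .exe on Windows).
    found = shutil.which(default_name)
    if found is None:
        raise FileNotFoundError(f"{default_name} not found; set {env_var} or extend PATH")
    return found
```

The Jenkins job would then set the variable in its build environment, and the test code would pass `resolve_chromedriver()` wherever the literal path is used now.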
I found the solution. My code was missing the chrome binary path. Adding the same as an Options() argument resolved the error.
Sharing the updated patch of code:
from seleniumwire import webdriver
import pytest
from selenium.webdriver.chrome.options import Options
import time
import allure
class Test_main():
    @pytest.fixture()
    def test_setup(self):
        # initiating browser
        chrome_options = Options()
        chrome_options.binary_location=r"C:\Users\libin.thomas\AppData\Local\Google\Chrome\Application\chrome.exe"
        chrome_options.add_argument('--start-maximized')
        chrome_options.add_argument('--headless')
        self.driver = webdriver.Chrome(executable_path=r"D:/Python/Sel_python/drivers/chromedriver v86/chromedriver.exe",options=chrome_options)
        # terminate script
        yield
        self.driver.close()
        self.driver.quit()
        print("Test completed")
    # test cases followed below
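If the browser lives in different places on different build agents, the binary path could be searched for rather than hard-coded. A rough sketch follows; the candidate names are illustrative guesses on my part and `find_browser` / `options_with_binary` are hypothetical helpers.

```python
import os
import shutil

# Candidate executable names are illustrative guesses; adjust for the build agent.
CANDIDATES = ("google-chrome", "google-chrome-stable", "chromium-browser", "chromium", "chrome")


def find_browser(candidates=CANDIDATES, override=None):
    """Return the first usable browser path, preferring an explicit override."""
    if override and os.path.isfile(override):
        return override
    for name in candidates:
        path = shutil.which(name)  # looks the name up on PATH
        if path:
            return path
    return None


def options_with_binary(binary_path):
    """Build headless Chrome options that point at an explicit browser binary."""
    from selenium.webdriver.chrome.options import Options  # lazy import

    options = Options()
    options.binary_location = binary_path
    options.add_argument("--headless")
    return options
```

In the fixture, `find_browser(override=r"C:\...\chrome.exe")` would return the known path when it exists and otherwise fall back to whatever is on PATH.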
I want to scrape a dynamic URL (the page is built using JavaScript and redirects to an external page). I understand that I need to use a headless browser and I am using Selenium with Chrome driver. The following code does what I want on my Windows machine:
import time
import scrapy
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
class MySpider(scrapy.Spider):
    name = 'my_spider'
    def __init__(self):
        self.driver = webdriver.Chrome(ChromeDriverManager().install())
    def parse(self, response):
        link = "some-url-which-uses-javascript.com"
        self.driver.get(link)
        time.sleep(1) # without this wait, driver.current_url is not the final redirect
        url = self.driver.current_url
But when I run the same code on my Ubuntu server (which does not have a GUI), I get the following error:
builtins.ValueError: Could not get version for Chrome with this
command: google-chrome --version || google-chrome-stable --version
I have installed both google-chrome-stable and chromium-chromedriver on the Ubuntu server.
I have also tried the following code:
class MySpider(scrapy.Spider):
    name = 'my_spider'
    def __init__(self):
        # self.driver = webdriver.Chrome(ChromeDriverManager().install())
        # self.driver = webdriver.Chrome('/usr/bin/chromedriver')
        chrome_options = Options()
        chrome_options.add_argument('--headless')
        chrome_options.add_argument('--no-sandbox')
        chrome_options.add_argument('--disable-dev-shm-usage')
        self.driver = webdriver.Chrome('/usr/bin/chromedriver', chrome_options=chrome_options)
But I get the following error:
selenium.common.exceptions.WebDriverException: Message: unknown error:
Chrome failed to start: exited abnormally. (unknown error:
DevToolsActivePort file doesn't exist) (The process started from
chrome location /usr/bin/google-chrome is no longer running, so
ChromeDriver is assuming that Chrome has crashed.)
I am getting this error when I run my tests with Selenium using chromedriver.
selenium.common.exceptions.WebDriverException: Message:
unknown error: Chrome failed to start: exited abnormally
(Driver info: chromedriver=2.9.248316,platform=Linux 3.8.0-29-generic x86)
I did download google-chrome stable and also chromedriver and have used this code to start the browser.
driver = webdriver.Chrome('/usr/local/bin/chromedriver')
Any suggestions anyone? Thanks.
For Linux :
Start the display before starting Chrome. For more info, click here.
from pyvirtualdisplay import Display
display = Display(visible=0, size=(800, 800))
display.start()
driver = webdriver.Chrome()
To help debug this problem you can use the service_log_path and service_args arguments to the selenium webdriver to see output from the chromedriver:
service_log_path = "{}/chromedriver.log".format(outputdir)
service_args = ['--verbose']
driver = webdriver.Chrome('/path/to/chromedriver',
service_args=service_args,
service_log_path=service_log_path)
I was getting this same exception message and found two ways to get past it; I'm not sure if the OP's problem is the same, but if not, the chromedriver log will hopefully help. Looking at my log, I discovered that the chromedriver (I tried 2.9 down to 2.6 while trying to fix this problem) decides which browser to run in a very unexpected way. In the directory where my chromedriver is located I have these files:
$ ls -l /path/to/
-rwx------ 1 pjh grad_cs 5503600 Feb 3 00:07 chromedriver-2.9
drwxr-xr-x 3 pjh grad_cs 4096 Mar 28 15:51 chromium
When I invoke the chromedriver using the same python code as the OP:
driver = webdriver.Chrome('/path/to/chromedriver-2.9')
This leads to the exception message. In the chromedriver.log I found this message:
[1.043][INFO]: Launching chrome: /path/to/chromium ...
Unbelievable! The chromedriver is trying to use /path/to/chromium (which is not an executable file, but a directory containing source code) as the browser to execute! Apparently chromedriver tries to search the current directory for a browser to run before searching my PATH. So, one easy solution to this problem is to check the directory where the chromedriver is located for files/directories like chrome and chromium and move them to a different directory than the chromedriver.
A better solution is to explicitly tell selenium / chromedriver which browser to execute by using the chrome_options argument:
options = webdriver.ChromeOptions()
options.binary_location = '/opt/google/chrome/google-chrome'
service_log_path = "{}/chromedriver.log".format(outputdir)
service_args = ['--verbose']
driver = webdriver.Chrome('/path/to/chromedriver',
chrome_options=options,
service_args=service_args,
service_log_path=service_log_path)
The chromedriver.log now shows:
[0.999][INFO]: Launching chrome: /opt/google/chrome/google-chrome ...
as expected.
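To check quickly which binary chromedriver picked, the log can be scanned for those "Launching chrome:" lines. A small sketch, with function names of my own choosing, assuming the log format shown above:

```python
import re

# Matches lines such as: [1.043][INFO]: Launching chrome: /path/to/chromium ...
# Note: \S+ stops at the first space, so a path containing spaces is truncated.
LAUNCH_LINE = re.compile(r"Launching chrome: (\S+)")


def launched_binaries(log_text):
    """Return every browser path that chromedriver reported launching."""
    return LAUNCH_LINE.findall(log_text)


def launched_binaries_from_file(log_path):
    """Read a chromedriver log file and extract the launched browser paths."""
    with open(log_path, encoding="utf-8", errors="replace") as handle:
        return launched_binaries(handle.read())
```

If the returned path is not the browser you expected, setting `binary_location` as described above is the likely fix.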
An alternative to using a virtual display is headless mode:
from selenium import webdriver
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--window-size=1420,1080')
chrome_options.add_argument('--headless')
chrome_options.add_argument('--disable-gpu')
driver = webdriver.Chrome(chrome_options=chrome_options)
If you are using Linux, make sure you are not running as root. That is what gave me the error.
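Chrome generally refuses to start as root unless the sandbox is disabled, so when running as root cannot be avoided (containers, for instance), the flag could be added only in that case. A minimal sketch; `sandbox_flags` is a made-up helper, and keep in mind that disabling the sandbox weakens isolation.

```python
import os


def sandbox_flags(euid=None):
    """Return ['--no-sandbox'] when the effective user is root, else []."""
    if euid is None:
        # os.geteuid exists on Unix only; treat other platforms as non-root.
        euid = os.geteuid() if hasattr(os, "geteuid") else -1
    return ["--no-sandbox"] if euid == 0 else []
```

Each returned flag can then be passed to `add_argument` before the driver is created.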
Someone already mentioned the --no-sandbox option, but to expand on it: make sure it's the first option you pass:
System.setProperty("webdriver.chrome.driver",
Paths.get("setups", driverFolder, driverFile).toAbsolutePath().toString());
ChromeOptions options = new ChromeOptions();
Map<String, Object> prefs = new HashMap<>();
prefs.put("intl.accept_languages", "English");
options.setExperimentalOption("prefs", prefs);
options.addArguments("--no-sandbox");
options.addArguments("--disable-features=VizDisplayCompositor");
options.addArguments("--incognito");
options.addArguments("enable-automation");
options.addArguments("--headless");
options.addArguments("--window-size=1920,1080");
options.addArguments("--disable-gpu");
options.addArguments("--disable-extensions");
options.addArguments("--dns-prefetch-disable");
options.setPageLoadStrategy(PageLoadStrategy.NORMAL);
options.addArguments("enable-features=NetworkServiceInProcess");
DesiredCapabilities capabilities = DesiredCapabilities.chrome();
capabilities.setCapability("marionette", true);
capabilities.setCapability(ChromeOptions.CAPABILITY, options);
WebDriver driver = new ChromeDriver(capabilities);
driver.manage().timeouts().implicitlyWait(15, SECONDS);
driver.manage().timeouts().pageLoadTimeout(15, SECONDS);
When it was added after other options, I got the error.
You may be able to fix this issue by making sure your version of chromedriver is right for the version of Chrome you have installed, which you can check here. You will also need to remove your current version of chromedriver before installing the new one, as described in Delete Chromedriver from Ubuntu
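A quick way to compare the two versions is to parse the `--version` output of both programs. This is a sketch with helper names I made up. It assumes a chromedriver release that shares Chrome's version numbering; the older 2.x drivers used a separate mapping, so the comparison does not apply to them.

```python
import re
import subprocess


def major_version(version_output):
    """Extract the major version from text like 'Google Chrome 77.0.3865.40'."""
    match = re.search(r"(\d+)\.\d+\.\d+(?:\.\d+)?", version_output)
    if match is None:
        raise ValueError(f"no version number in {version_output!r}")
    return int(match.group(1))


def command_major_version(command):
    """Run e.g. ['google-chrome', '--version'] and return its major version."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return major_version(result.stdout)


def versions_match(chrome_output, driver_output):
    """True when browser and driver report the same major version."""
    return major_version(chrome_output) == major_version(driver_output)
```

For instance, `command_major_version(["google-chrome", "--version"]) == command_major_version(["chromedriver", "--version"])` should hold before the tests start, assuming both commands are on PATH.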
This issue was resolved using the steps below.
Install Xvfb:
CentOS 7: yum install chromedriver chromium xorg-x11-server-Xvfb
Update the chrome driver:
CentOS 7: wget https://chromedriver.storage.googleapis.com/2.40/chromedriver_linux64.zip
I was faced with the same issue and fixed it by installing Chrome in:
C:\Users\..\AppData\Local\Google\Chrome\Application
You can do this by running the Chrome Setup and saying no when prompted by the User Account Control.
I got the same error when crawling with Scrapy + Selenium + chromedriver on CentOS 7, and the method from the following URL solved my problem:
yum install mesa-libOSMesa-devel gnu-free-sans-fonts
Reference: https://bugs.chromium.org/p/chromium/issues/detail?id=695212
Another solution for selenium webdriver is X virtual frame buffer:
with Xvfb() as _:
    timeout_request = ConfigTargetsManager.target_global_configs.get('timeout_request', 10)
    driver = webdriver.Chrome(executable_path=ConfigTargetsManager.target_global_configs.get('chrome_browser_path',
        '/usr/lib/chromium-browser/chromedriver'))
    driver.get(url)
Ubuntu 22.04.
This may be useful for someone.
I got this bug when trying to get Selenium to work with a version of Brave (deb) installed from the brave.com repository.
I additionally installed Brave from the snap package and added these options:
options.add_argument('--remote-debugging-port=9224')
options.binary_location = '/snap/bin/brave'
This solved the problem.