Selenium crashing (chrome & firefox) within a flask route on ubuntu 20.04 - python

I'm running an Ubuntu 20.04 LTS server and trying to save a screenshot via Selenium within a Flask route.
No matter what I try, the browser crashes. I'm using --headless.
import os
import urllib.parse
from flask import send_file

@api.route('/image/<path:encoded_url>.png')
def generate_image(encoded_url):
    """
    Returns an image (PNG) of a URL. The URL is encoded in the path of the image being requested.
    """
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    options = Options()
    options.headless = True
    options.add_argument("--disable-gpu")
    options.add_argument("--disable-dev-shm-usage")
    driver = webdriver.Chrome(f"{os.getcwd()}/chromedriver", options=options)
    url = urllib.parse.unquote_plus(encoded_url)
    driver.get(url if "http" in url else "https://" + url)
    driver.set_window_size(1200, 630)
    # Poll until the document reports that it has finished loading
    while True:
        x = driver.execute_script("return document.readyState")
        if x == "complete":
            break
    driver.save_screenshot("screen.png")
    driver.close()
    return send_file("screen.png", mimetype='image/png')
I've tried everything, but Firefox exits with status 127 (there is not much online about this):
selenium.common.exceptions.WebDriverException: Message: Process unexpectedly closed with status 127
I've also tried running under Xvfb, with no luck.

OK, so after trying a whole load of options, it turned out the issue was with my gunicorn setup.
Go to the systemd unit directory:
cd /etc/systemd/system/
Open the service file you created for gunicorn:
sudo nano {service_name}
The issue was the Environment line. Originally I had it as this:
Environment="PATH=/home/ubuntu/retrotex/environment/bin"
Once I changed it to the below, everything worked:
Environment="PATH=/home/ubuntu/retrotex/environment/bin:/usr/bin:/bin"
It seems that Flask didn't have access to the directories needed to run Chrome or Selenium, since PATH contained only that one bin directory.
Make sure you reload systemd and restart the service afterwards:
sudo systemctl daemon-reload
sudo systemctl restart {service_name}
What a waste of a day.
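A quick way to check whether a given PATH value can resolve the programs a browser launch depends on is sketched below. This is only an illustration: the function name missing_commands and the list of command names are assumptions, not part of the original setup. Exit status 127 is conventionally what a shell reports when a command cannot be found, which would be consistent with a PATH that omits /usr/bin and /bin.

```python
import shutil


def missing_commands(commands, path):
    """Return the commands that cannot be resolved on the given PATH string."""
    return [cmd for cmd in commands if shutil.which(cmd, path=path) is None]


if __name__ == "__main__":
    # Hypothetical inputs: the PATH from the unit file and some commands
    # that a browser launch may need. Adjust both for your own server.
    service_path = "/home/ubuntu/retrotex/environment/bin"
    needed = ["google-chrome", "firefox", "sh"]
    print(missing_commands(needed, service_path))
```

Running it once with the old PATH and once with the extended one shows which commands only become visible after the change.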

Related

Web Automation without browser in Linux Server [duplicate]

I'm trying some things out with Selenium and I really want it to be quick.
My thought is that running it with headless Chrome would make my script faster.
First, is that assumption correct, or does it not matter whether I run my script with a headless driver?
Either way, I still want to get headless mode working, but I can't. I tried different things, and most suggestions said it would work as described in the October update here:
How to configure ChromeDriver to initiate Chrome browser in Headless mode through Selenium?
But when I try that, I get weird console output and it still doesn't seem to work.
Any tips appreciated.
To run chrome-headless just add --headless via chrome_options.add_argument, i.e.:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
chrome_options = Options()
#chrome_options.add_argument("--disable-extensions")
#chrome_options.add_argument("--disable-gpu")
#chrome_options.add_argument("--no-sandbox") # linux only
chrome_options.add_argument("--headless")
# chrome_options.headless = True # also works
driver = webdriver.Chrome(options=chrome_options)
start_url = "https://duckgo.com"
driver.get(start_url)
print(driver.page_source.encode("utf-8"))
# b'<!DOCTYPE html><html xmlns="http://www....
driver.quit()
"So my thought is that running it with headless chrome would make my script faster."
Try using Chrome options like --disable-extensions or --disable-gpu and benchmark it, but I wouldn't count on much improvement.
References: headless-chrome
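One way to run the suggested benchmark is sketched here. The helper names benchmark and load_page, the repeat count, and the example URL are all assumptions made for illustration; the Selenium import is done lazily so that the timing helper can be reused on its own.

```python
import statistics
import time


def benchmark(task, repeats=3):
    """Call task() several times and return the median wall-clock seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        task()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def load_page(headless, url="https://www.example.com"):
    """Start Chrome, load one page, and shut the browser down again."""
    # Imported here so benchmark() can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    if headless:
        options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
    finally:
        driver.quit()
```

Comparing benchmark(lambda: load_page(True)) against benchmark(lambda: load_page(False)) on your own machine gives a rough idea of whether headless mode makes a measurable difference for your script.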
Install & run containerized Chrome:
docker pull selenium/standalone-chrome
docker run --rm -d -p 4444:4444 --shm-size=2g selenium/standalone-chrome
Connect using webdriver.Remote:
driver = webdriver.Remote('http://localhost:4444/wd/hub', webdriver.DesiredCapabilities.CHROME)
driver.set_window_size(1280, 1024)
driver.get('https://www.google.com')
from time import sleep
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
chrome_options = Options()
chrome_options.add_argument("--headless")
driver = webdriver.Chrome(executable_path="./chromedriver", options=chrome_options)
url = "https://stackoverflow.com/questions/53657215/running-selenium-with-headless-chrome-webdriver"
driver.get(url)
sleep(5)
h1 = driver.find_element_by_xpath("//h1[@itemprop='name']").text
print(h1)
Then I ran the script on my local machine:
➜ python script.py
Running Selenium with Headless Chrome Webdriver
It is working and it is with headless Chrome.
If you are on Linux, you may have to add --no-sandbox as well, along with specific window size settings. The --no-sandbox flag is not needed on Windows if you set the user container properly.
Use --disable-gpu only on Windows; other platforms no longer require it. The --disable-gpu flag is a temporary workaround for a few bugs.
//Headless chrome browser and configure
WebDriverManager.chromedriver().setup();
ChromeOptions chromeOptions = new ChromeOptions();
chromeOptions.addArguments("--no-sandbox");
chromeOptions.addArguments("--headless");
chromeOptions.addArguments("disable-gpu");
// chromeOptions.addArguments("window-size=1400,2100"); // enable this on Linux
driver = new ChromeDriver(chromeOptions);
Once you have Selenium and the web driver installed, the following worked for me with headless Chrome on a Linux cluster:
from selenium import webdriver
options = webdriver.ChromeOptions()
options.add_argument("--headless")
options.add_argument("--disable-extensions")
options.add_argument("--disable-dev-shm-usage")
options.add_argument("--no-sandbox")
options.add_experimental_option("prefs",{"download.default_directory":"/databricks/driver"})
driver = webdriver.Chrome(options=options)
Steps (tested on a headless Debian Linux 9.4 server).
First, install Chrome and chromedriver:
# install chrome
curl -sS -o - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add -
echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google-chrome.list
apt-get -y update
apt-get -y install google-chrome-stable
# install chrome driver
wget https://chromedriver.storage.googleapis.com/77.0.3865.40/chromedriver_linux64.zip
unzip chromedriver_linux64.zip
mv chromedriver /usr/bin/chromedriver
chown root:root /usr/bin/chromedriver
chmod +x /usr/bin/chromedriver
Install selenium:
pip install selenium
and run this Python code:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
options = Options()
options.add_argument("no-sandbox")
options.add_argument("headless")
options.add_argument("start-maximized")
options.add_argument("window-size=1900,1080")
driver = webdriver.Chrome(options=options, executable_path="/usr/bin/chromedriver")
driver.get("https://www.example.com")
html = driver.page_source
print(html)
As stated by the accepted answer:
options.add_argument("--headless")
These tips might help speed things up, especially for headless:
There are quite a few things you can do in headless mode that you can't do in non-headless mode.
Since you will be using headless Chrome, I've found that adding this flag reduces CPU usage by about 20% for me (Chrome looked like a CPU and memory hog in htop):
--disable-crash-reporter
The crash reporter is only disabled when you are running headless. This might speed things up for you.
My settings are currently as follows; they reduce CPU usage by about 20%, though the time saving is only marginal:
options.add_argument("--no-sandbox")
options.add_argument("--disable-dev-shm-usage")
options.add_argument("--disable-renderer-backgrounding")
options.add_argument("--disable-background-timer-throttling")
options.add_argument("--disable-backgrounding-occluded-windows")
options.add_argument("--disable-client-side-phishing-detection")
options.add_argument("--disable-crash-reporter")
options.add_argument("--disable-oopr-debug-crash-dump")
options.add_argument("--no-crash-upload")
options.add_argument("--disable-gpu")
options.add_argument("--disable-extensions")
options.add_argument("--disable-low-res-tiling")
options.add_argument("--log-level=3")
options.add_argument("--silent")
I found this to be a pretty good list (full list I think) of command line switches with explanations: https://peter.sh/experiments/chromium-command-line-switches/
Some additional things you can turn off are also mentioned here: https://github.com/GoogleChrome/chrome-launcher/blob/main/docs/chrome-flags-for-tools.md
I hope this helps someone
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
chrome_options = Options()
chrome_options.add_argument("--headless")
driver = webdriver.Chrome(executable_path=r"C:\Program Files\Google\Chrome\Application\chromedriver.exe", options=chrome_options)
This works for me.

Chromedriver test work locally and on CI CD env python

What I have: CURRENT_BROWSER=chrome set as a Windows environment variable, and the following code:
import os
from behave import fixture, use_fixture
from selenium import webdriver

def before_scenario(context, scenario):
    use_fixture(browser, context)

def after_scenario(context, scenario):
    context.cache.clear()
    context.driver.quit()

@fixture
def browser(context):
    browser_type = os.getenv('CURRENT_BROWSER', 'chrome')
    if browser_type is None:
        raise Exception(f"Unable to identify test browser which is {browser_type}")
    if browser_type == 'chrome':
        chrome_options = webdriver.ChromeOptions()
        chrome_options.add_argument('--headless')
        # chrome_options.add_argument('--incognito')
        context.driver = webdriver.Chrome(desired_capabilities=chrome_options.to_capabilities())
    if browser_type == 'firefox':
        pass
    yield context.driver
What I need: how should I deal with chromedriver on CI/CD (Azure DevOps)? Should I add an environment variable to PATH, similar to CURRENT_BROWSER, and do the same on CI/CD, or is there a different way to handle chromedriver? I need the code above to work both locally and on CI/CD, and I have never done that before. Locally I use the code above plus chromedriver.exe added to the project structure.
If you are using a Microsoft-hosted agent: windows-latest, windows-2019 or vs2017-win2016, the Chrome Driver 87.0.4280.88 is already installed.
If you want to use another version of Chrome Driver, you can download it using npm:
- script: npm install chromedriver --chromedriver_version=LATEST
Click this document for detailed information.
If you are using a Self-hosted agent and the agent is on a machine that has already downloaded the Chrome Driver and configured PATH, you can use Chrome Driver just as you work on your own machine.
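A sketch of how the same test code could locate chromedriver both locally and on an agent follows. It first looks in a directory named by an environment variable and then falls back to PATH. The function name find_chromedriver is hypothetical, and the variable name CHROMEWEBDRIVER is an assumption: I believe Microsoft-hosted images expose the driver directory under a name like this, but check your agent's documentation before relying on it.

```python
import os
import shutil


def find_chromedriver(env=None, env_var="CHROMEWEBDRIVER", exe_name="chromedriver"):
    """Return a chromedriver path from an env-var directory or PATH, else None."""
    env = os.environ if env is None else env
    # Assumption: env_var, when set, names a directory containing the driver.
    driver_dir = env.get(env_var)
    if driver_dir:
        candidate = shutil.which(exe_name, path=driver_dir)
        if candidate:
            return candidate
    # Fall back to a normal PATH lookup, as on a developer machine.
    return shutil.which(exe_name, path=env.get("PATH"))
```

The returned path can then be handed to the driver constructor in the fixture, so the fixture itself does not need to know whether it runs locally or in the pipeline.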

Unknown error: Chrome failed to start: exited abnormally

I am getting this error when I run my tests with Selenium using chromedriver.
selenium.common.exceptions.WebDriverException: Message:
unknown error: Chrome failed to start: exited abnormally
(Driver info: chromedriver=2.9.248316,platform=Linux 3.8.0-29-generic x86)
I did download google-chrome stable and also chromedriver and have used this code to start the browser.
driver = webdriver.Chrome('/usr/local/bin/chromedriver')
Any suggestions anyone? Thanks.
On Linux, start a virtual display before starting Chrome:
from pyvirtualdisplay import Display
display = Display(visible=0, size=(800, 800))
display.start()
driver = webdriver.Chrome()
To help debug this problem you can use the service_log_path and service_args arguments to the selenium webdriver to see output from the chromedriver:
service_log_path = "{}/chromedriver.log".format(outputdir)
service_args = ['--verbose']
driver = webdriver.Chrome('/path/to/chromedriver',
                          service_args=service_args,
                          service_log_path=service_log_path)
I was getting this same exception message and found two ways to get past it; I'm not sure if the OP's problem is the same, but if not, the chromedriver log will hopefully help. Looking at my log, I discovered that the chromedriver (I tried 2.9 down to 2.6 while trying to fix this problem) decides which browser to run in a very unexpected way. In the directory where my chromedriver is located I have these files:
$ ls -l /path/to/
-rwx------ 1 pjh grad_cs 5503600 Feb 3 00:07 chromedriver-2.9
drwxr-xr-x 3 pjh grad_cs 4096 Mar 28 15:51 chromium
When I invoke the chromedriver using the same python code as the OP:
driver = webdriver.Chrome('/path/to/chromedriver-2.9')
This leads to the exception message. In the chromedriver.log I found this message:
[1.043][INFO]: Launching chrome: /path/to/chromium ...
Unbelievable! The chromedriver is trying to use /path/to/chromium (which is not an executable file, but a directory containing source code) as the browser to execute! Apparently chromedriver tries to search the current directory for a browser to run before searching my PATH. So, one easy solution to this problem is to check the directory where the chromedriver is located for files/directories like chrome and chromium and move them to a different directory than the chromedriver.
A better solution is to explicitly tell selenium / chromedriver which browser to execute by using the chrome_options argument:
options = webdriver.ChromeOptions()
options.binary_location = '/opt/google/chrome/google-chrome'
service_log_path = "{}/chromedriver.log".format(outputdir)
service_args = ['--verbose']
driver = webdriver.Chrome('/path/to/chromedriver',
                          chrome_options=options,
                          service_args=service_args,
                          service_log_path=service_log_path)
The chromedriver.log now shows:
[0.999][INFO]: Launching chrome: /opt/google/chrome/google-chrome ...
as expected.
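Before assigning binary_location, it can help to confirm that the chosen path is a real executable file and not, as in the log above, a directory that happens to be named chromium. The following is a minimal sketch; the helper names is_executable_file and pick_browser_binary are hypothetical.

```python
import os


def is_executable_file(path):
    """True only for a regular file that the current user may execute."""
    return os.path.isfile(path) and os.access(path, os.X_OK)


def pick_browser_binary(candidates):
    """Return the first candidate that is an executable file, else None."""
    for path in candidates:
        if is_executable_file(path):
            return path
    return None
```

For example, pick_browser_binary(['/opt/google/chrome/google-chrome', '/usr/bin/chromium']) returns the first usable path, which can then be assigned to options.binary_location.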
An alternative to using a virtual display is headless mode.
from selenium import webdriver
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--window-size=1420,1080')
chrome_options.add_argument('--headless')
chrome_options.add_argument('--disable-gpu')
driver = webdriver.Chrome(chrome_options=chrome_options)
If you are using Linux, make sure you are not running as root. That is what gave me the error.
Someone already mentioned the --no-sandbox option, but to expand on it: make sure it is the first option you pass:
System.setProperty("webdriver.chrome.driver",
Paths.get("setups", driverFolder, driverFile).toAbsolutePath().toString());
ChromeOptions options = new ChromeOptions();
Map<String, Object> prefs = new HashMap<>();
prefs.put("intl.accept_languages", "English");
options.setExperimentalOption("prefs", prefs);
options.addArguments("--no-sandbox");
options.addArguments("--disable-features=VizDisplayCompositor");
options.addArguments("--incognito");
options.addArguments("enable-automation");
options.addArguments("--headless");
options.addArguments("--window-size=1920,1080");
options.addArguments("--disable-gpu");
options.addArguments("--disable-extensions");
options.addArguments("--dns-prefetch-disable");
options.setPageLoadStrategy(PageLoadStrategy.NORMAL);
options.addArguments("enable-features=NetworkServiceInProcess");
DesiredCapabilities capabilities = DesiredCapabilities.chrome();
capabilities.setCapability("marionette", true);
capabilities.setCapability(ChromeOptions.CAPABILITY, options);
WebDriver driver = new ChromeDriver(capabilities);
driver.manage().timeouts().implicitlyWait(15, SECONDS);
driver.manage().timeouts().pageLoadTimeout(15, SECONDS);
When it was added after other options, I got the error.
You may be able to fix this issue by making sure your version of chromedriver is right for the version of Chrome you have installed, which you can check here. You will also need to remove your current version of chromedriver before installing the new one, as described in Delete Chromedriver from Ubuntu
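A rough way to compare the two versions is to parse the output of each program's --version flag and compare the major numbers. This sketch assumes that matching major versions is the compatibility rule, which applies to newer ChromeDriver releases that share Chrome's version numbers but not to the older 2.x series; the function names are hypothetical.

```python
import re


def major_version(version_output):
    """Extract the leading major version number from a --version string."""
    match = re.search(r"(\d+)\.\d+", version_output)
    return int(match.group(1)) if match else None


def versions_compatible(chrome_output, driver_output):
    """True when both strings parse and share the same major version."""
    chrome_major = major_version(chrome_output)
    driver_major = major_version(driver_output)
    return chrome_major is not None and chrome_major == driver_major
```

The input strings can be collected by running google-chrome --version and chromedriver --version, for instance with subprocess.run and capture_output=True.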
This issue was resolved using the steps below.
Install Xvfb:
CentOS 7: yum install chromedriver chromium xorg-x11-server-Xvfb
Update chromedriver:
CentOS 7: wget https://chromedriver.storage.googleapis.com/2.40/chromedriver_linux64.zip
I was faced with the same issue and fixed it by installing Chrome in:
C:\Users\..\AppData\Local\Google\Chrome\Application
You can do this by running the Chrome Setup and saying no when prompted by the User Account Control.
I got the same error when crawling with Scrapy + Selenium + chromedriver on CentOS 7, and the method from the following URL solved my problem:
yum install mesa-libOSMesa-devel gnu-free-sans-fonts
Reference: https://bugs.chromium.org/p/chromium/issues/detail?id=695212
Another solution for selenium webdriver is X virtual frame buffer:
with Xvfb() as _:
    timeout_request = ConfigTargetsManager.target_global_configs.get('timeout_request', 10)
    driver = webdriver.Chrome(executable_path=ConfigTargetsManager.target_global_configs.get(
        'chrome_browser_path', '/usr/lib/chromium-browser/chromedriver'))
    driver.get(url)
On Ubuntu 22.04, in case this is useful for someone:
I got this error when trying to get Selenium to work with a version of Brave (deb) installed from the brave.com repository.
I additionally installed Brave from the snap package and added the following:
options.add_argument('--remote-debugging-port=9224')
options.binary_location = '/snap/bin/brave'
This solved the problem.

How can I disable web security in selenium through Python?

Apparently it's common for google-chrome to get this: http://jira.openqa.org/browse/SRC-740
The key is to start it without security enabled. To disable security,
"--disable-web-security",
I'm having trouble figuring out how to actually specify these command line arguments, though, so it fails on the open invocation here:
from selenium import selenium
sel = selenium('localhost', 4444, '*googlechrome', 'http://www.google.com/')
sel.start()
sel.open('/')
Here's how I start the selenium server:
shogun@box:~$ java -jar selenium-server-standalone-2.0b3.jar
To get this to work, I had to create an external script to wrap the Chrome browser. Place a script somewhere your Selenium server can reach it (mine is at ~/bin/startchrome) and chmod it executable:
#!/bin/sh
# chrome expects to be run from the .app dir, so cd into it
# (the spaces in the path are a Mac thing)
cd /Applications/Google\ Chrome.app
exec ./Contents/MacOS/Google\ Chrome --disable-web-security "$@"
Then in your Python code, do this:
from selenium import selenium
browser = '*googlechrome /Users/pat/bin/startchrome'
sel = selenium('localhost', 4444, browser, 'http://www.google.com')
sel.start()
sel.open('/')
