Downloading with chrome headless and selenium - python

I'm using python-selenium and Chrome 59 and trying to automate a simple download sequence. When I launch the browser normally, the download works, but when I do so in headless mode, the download doesn't work.
# Headless implementation
from selenium import webdriver
chromeOptions = webdriver.ChromeOptions()
chromeOptions.add_argument("headless")
driver = webdriver.Chrome(chrome_options=chromeOptions)
driver.get('https://www.mockaroo.com/')
driver.find_element_by_id('download').click()
# ^^^ Download doesn't start
# Normal Mode
from selenium import webdriver
driver = webdriver.Chrome()
driver.get('https://www.mockaroo.com/')
driver.find_element_by_id('download').click()
# ^^^ Download works normally
I've even tried adding a default path:
prefs = {"download.default_directory" : "/Users/Chetan/Desktop/"}
chromeOptions.add_argument("headless")
chromeOptions.add_experimental_option("prefs",prefs)
Adding a default path works in the normal implementation, but the same problem persists in the headless version.
How do I get the download to start in headless mode?

Yes, it's a "feature", for security. As mentioned before here is the bug discussion: https://bugs.chromium.org/p/chromium/issues/detail?id=696481
Support was added in chrome version 62.0.3196.0 or above to enable downloading.
Here is a python implementation. I had to add the command to the chromedriver commands. I will try to submit a PR so it is included in the library in the future.
def enable_download_in_headless_chrome(self, driver, download_dir):
    # add missing support for chrome "send_command" to selenium webdriver
    driver.command_executor._commands["send_command"] = ("POST", '/session/$sessionId/chromium/send_command')
    params = {'cmd': 'Page.setDownloadBehavior', 'params': {'behavior': 'allow', 'downloadPath': download_dir}}
    command_result = driver.execute("send_command", params)
For reference here is a little repo to demonstrate how to use this:
https://github.com/shawnbutton/PythonHeadlessChrome
Update 2020-05-01: There have been comments saying this no longer works. Given that this patch is now over a year old, it's quite possible the underlying library has changed.

Here's a working example for Python based on Shawn Button's answer. I've tested this with Chromium 68.0.3440.75 & chromedriver 2.38
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
chrome_options = Options()
chrome_options.add_experimental_option("prefs", {
"download.default_directory": "/path/to/download/dir",
"download.prompt_for_download": False,
})
chrome_options.add_argument("--headless")
driver = webdriver.Chrome(chrome_options=chrome_options)
driver.command_executor._commands["send_command"] = ("POST", '/session/$sessionId/chromium/send_command')
params = {'cmd': 'Page.setDownloadBehavior', 'params': {'behavior': 'allow', 'downloadPath': "/path/to/download/dir"}}
command_result = driver.execute("send_command", params)
driver.get('http://download-page.url/')
driver.find_element_by_css_selector("#download_link").click()

The Chromium developers recently added a 2nd headless mode (in 2021). See https://bugs.chromium.org/p/chromium/issues/detail?id=706008#c36
They later renamed the option in 2023 for Chrome 109 -> https://github.com/chromium/chromium/commit/e9c516118e2e1923757ecb13e6d9fff36775d1f4
For Chrome 109 and above, the --headless=new flag will now allow you to get the full functionality of Chrome in the new headless mode, and you can even run extensions in it. (For Chrome versions 96 through 108, use --headless=chrome)
Usage: (Chrome 109 and above):
options.add_argument("--headless=new")
Usage: (Chrome 96 through Chrome 108):
options.add_argument("--headless=chrome")
If something works in regular Chrome, it should now work with the newer headless mode too.
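The version ranges above can be folded into a small helper. This is only a sketch: the function name `headless_argument` is hypothetical, and falling back to plain `--headless` for versions below 96 is an assumption, not something Selenium or Chrome provides.

```python
def headless_argument(chrome_major_version: int) -> str:
    """Return the headless flag for a given Chrome major version.

    Hypothetical helper; the version ranges follow the answer above.
    """
    if chrome_major_version >= 109:
        return "--headless=new"
    if chrome_major_version >= 96:
        return "--headless=chrome"
    # Assumption: older versions only know the original headless mode.
    return "--headless"


if __name__ == "__main__":
    for version in (95, 96, 108, 109, 120):
        print(version, headless_argument(version))
```

It would then be used as `options.add_argument(headless_argument(120))`, with the major version supplied by whatever means you already use to detect the installed Chrome.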

This is a Chrome feature that prevents software from downloading files to your computer. There is a workaround, though. Read more about it here.
What you need to do is enable it via DevTools, something like this:
async function setDownload () {
  const client = await CDP({tab: 'ws://localhost:9222/devtools/browser'});
  const info = await client.send('Browser.setDownloadBehavior', {behavior : "allow", downloadPath: "/tmp/"});
  await client.close();
}
This is the solution someone gave in the mentioned topic. Here is their comment.

UPDATED PYTHON SOLUTION -
TESTED Mar 4, 2021 on chromedriver v88 and v89
This will allow you to click to download files in headless mode.
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.options import Options
# Instantiate headless driver
chrome_options = Options()
# Windows path
chromedriver_location = 'C:\\path\\to\\chromedriver_win32\\chromedriver.exe'
# Mac path. You may have to allow chromedriver as a developer app in the OS system preferences
# chromedriver_location = '/Users/path/to/chromedriver'
chrome_options.add_argument("--headless")
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("--disable-dev-shm-usage")
chrome_prefs = {"download.default_directory": r"C:\path\to\Downloads"} # (windows)
chrome_options.experimental_options["prefs"] = chrome_prefs
driver = webdriver.Chrome(chromedriver_location,options=chrome_options)
# Download your file
driver.get('https://www.mockaroo.com/')
driver.find_element_by_id('download').click()

Maybe the website you are working with returns different HTML pages to different browsers, which means the XPath or ID you want may be different in a headless browser.
Try saving the page source from the headless browser and opening it as an HTML page to find the ID or XPath you want.
For a C# example, see How to hide FirefoxDriver (using Selenium) without findElement function error in PhantomDriver?
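In Python, the suggestion above could look roughly like this. It is a sketch under assumptions: `save_page_source` is a hypothetical helper name, and `driver` is any object exposing a `page_source` attribute, as Selenium's WebDriver does.

```python
from pathlib import Path


def save_page_source(driver, output_path):
    """Write driver.page_source to output_path and return the path.

    `driver` is any object with a `page_source` attribute, such as a
    Selenium WebDriver. The helper name is hypothetical.
    """
    path = Path(output_path)
    path.write_text(driver.page_source, encoding="utf-8")
    return path
```

After `driver.get(...)` in headless mode, calling `save_page_source(driver, "headless_page.html")` and opening the file in a regular browser lets you compare the IDs and XPaths with what you see in normal mode.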

Usually it's redundant to see the same thing just written in another language, but because this issue drove me crazy, I hope I'm saving someone else the pain. So here's the C# version of Shawn Button's answer (tested with headless chrome=71.0.3578.98, chromedriver=2.45.615279, platform=Linux 4.9.125-linuxkit x86_64):
var enableDownloadCommandParameters = new Dictionary<string, object>
{
{ "behavior", "allow" },
{ "downloadPath", downloadDirectoryPath }
};
var result = ((OpenQA.Selenium.Chrome.ChromeDriver)driver).ExecuteChromeCommandWithResult("Page.setDownloadBehavior", enableDownloadCommandParameters);

A full working example for JavaScript with selenium-cucumber-js / selenium-webdriver:
const chromedriver = require('chromedriver');
const selenium = require('selenium-webdriver');
const command = require('selenium-webdriver/lib/command');
const chrome = require('selenium-webdriver/chrome');
module.exports = function() {
const chromeOptions = new chrome.Options()
.addArguments('--no-sandbox', '--headless', '--start-maximized', '--ignore-certificate-errors')
.setUserPreferences({
'profile.default_content_settings.popups': 0, // disable download file dialog
'download.default_directory': '/tmp/downloads', // default file download location
"download.prompt_for_download": false,
'download.directory_upgrade': true,
'safebrowsing.enabled': false,
'plugins.always_open_pdf_externally': true,
'plugins.plugins_disabled': ["Chrome PDF Viewer"]
})
.windowSize({width: 1600, height: 1200});
const driver = new selenium.Builder()
.withCapabilities({
browserName: 'chrome',
javascriptEnabled: true,
acceptSslCerts: true,
path: chromedriver.path
})
.setChromeOptions(chromeOptions)
.build();
driver.manage().window().maximize();
driver.getSession()
.then(session => {
const cmd = new command.Command("SEND_COMMAND")
.setParameter("cmd", "Page.setDownloadBehavior")
.setParameter("params", {'behavior': 'allow', 'downloadPath': '/tmp/downloads'});
driver.getExecutor().defineCommand("SEND_COMMAND", "POST", `/session/${session.getId()}/chromium/send_command`);
return driver.execute(cmd);
});
return driver;
};
The key part is:
driver.getSession()
.then(session => {
const cmd = new command.Command("SEND_COMMAND")
.setParameter("cmd", "Page.setDownloadBehavior")
.setParameter("params", {'behavior': 'allow', 'downloadPath': '/tmp/downloads'});
driver.getExecutor().defineCommand("SEND_COMMAND", "POST", `/session/${session.getId()}/chromium/send_command`);
return driver.execute(cmd);
});
Tested with:
Chrome 67.0.3396.99
Chromedriver 2.36.540469
selenium-cucumber-js 1.5.12
selenium-webdriver 3.0.0

The following is the equivalent in Java with Selenium, chromedriver and Chrome v71.x. The headless argument and the Page.setDownloadBehavior request posted to /chromium/send_command are the key to allowing downloads to be saved.
Additional jars: com.fasterxml.jackson.core, com.fasterxml.jackson.annotation, com.fasterxml.jackson.databind
System.setProperty("webdriver.chrome.driver","C:\\libraries\\chromedriver.exe");
String downloadFilepath = "C:\\Download";
HashMap<String, Object> chromePreferences = new HashMap<String, Object>();
chromePreferences.put("profile.default_content_settings.popups", 0);
chromePreferences.put("download.prompt_for_download", false);
chromePreferences.put("download.default_directory", downloadFilepath);
ChromeOptions chromeOptions = new ChromeOptions();
chromeOptions.setBinary("C:\\pathto\\Chrome SxS\\Application\\chrome.exe");
//ChromeOptions options = new ChromeOptions();
//chromeOptions.setExperimentalOption("prefs", chromePreferences);
chromeOptions.addArguments("start-maximized");
chromeOptions.addArguments("disable-infobars");
//HEADLESS CHROME
chromeOptions.addArguments("headless");
chromeOptions.setExperimentalOption("prefs", chromePreferences);
DesiredCapabilities cap = DesiredCapabilities.chrome();
cap.setCapability(CapabilityType.ACCEPT_SSL_CERTS, true);
cap.setCapability(ChromeOptions.CAPABILITY, chromeOptions);
ChromeDriverService driverService = ChromeDriverService.createDefaultService();
ChromeDriver driver = new ChromeDriver(driverService, chromeOptions);
Map<String, Object> commandParams = new HashMap<>();
commandParams.put("cmd", "Page.setDownloadBehavior");
Map<String, String> params = new HashMap<>();
params.put("behavior", "allow");
params.put("downloadPath", downloadFilepath);
commandParams.put("params", params);
ObjectMapper objectMapper = new ObjectMapper();
HttpClient httpClient = HttpClientBuilder.create().build();
String command = objectMapper.writeValueAsString(commandParams);
String u = driverService.getUrl().toString() + "/session/" + driver.getSessionId() + "/chromium/send_command";
HttpPost request = new HttpPost(u);
request.addHeader("content-type", "application/json");
request.setEntity(new StringEntity(command));
try {
httpClient.execute(request);
} catch (IOException e2) {
// TODO Auto-generated catch block
e2.printStackTrace();
}
//Continue using the driver for automation
driver.manage().window().maximize();

I solved this problem by using the workaround shared by @Shawn Button and using the full path for the 'downloadPath' parameter. Using a relative path did not work and gave me an error.
Versions:
Chrome Version 75.0.3770.100 (Official Build) (32-bit)
ChromeDriver 75.0.3770.90
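A minimal sketch of the full-path fix, assuming the `send_command` payload format used in the earlier answers. The helper name `download_behavior_params` is hypothetical; it only resolves the directory to an absolute path before building the payload.

```python
import os


def download_behavior_params(download_dir):
    """Build the send_command payload with an absolute downloadPath."""
    # A relative path is resolved against the current working directory.
    absolute_dir = os.path.abspath(download_dir)
    return {
        "cmd": "Page.setDownloadBehavior",
        "params": {"behavior": "allow", "downloadPath": absolute_dir},
    }


if __name__ == "__main__":
    print(download_behavior_params("downloads"))
```

With the `send_command` endpoint registered as shown in the answers above, the call would be `driver.execute("send_command", download_behavior_params("downloads"))`.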

Using: google-chrome-stable amd64 86.0.4240.111-1,chromedriver 86.0.4240.22, selenium 3.141.0 python 3.8.3
I tried multiple proposed solutions, and nothing really worked for headless Chrome; in addition, my test website opens a new blank tab and then downloads the data.
I finally gave up on headless and used pyvirtualdisplay and Xvfb to emulate an X server, something like:
from selenium.webdriver.chrome.options import Options # and other imports
import selenium.webdriver as webdriver
from pyvirtualdisplay import Display
import tempfile
url = "https://really_badly_programmed_website.org"
tmp_dir = tempfile.mkdtemp(prefix="hamster_")
driver_path = "/usr/bin/chromedriver"
chrome_options = Options()
chrome_options.binary_location = "/usr/bin/google-chrome"
prefs = {'download.default_directory': tmp_dir}
chrome_options.add_experimental_option("prefs", prefs)
# Run Chrome inside a virtual X display instead of headless mode
with Display(backend="xvfb", size=(1920, 1080), color_depth=24) as disp:
    driver = webdriver.Chrome(options=chrome_options, executable_path=driver_path)
    driver.get(url)
In the end everything worked and the downloaded file was in the tmp folder.

I finally got it to work by upgrading to Chromium 90! I previously had versions 72-78, but I saw that it had been fixed recently: https://bugs.chromium.org/p/chromium/issues/detail?id=696481 so I decided to give it a shot.
So after upgrading, which took a while (Homebrew on macOS is so slow...), I simply did the following, without setting options or anything (this is a JavaScript example):
await driver.findElement(By.className('download')).click();
And it worked! The PDF that I had been trying to download for a long time appeared in the current working folder...

Related

Why is the file not downloading in headless mode in selenium? [duplicate]

My code works fine with ChromeDriver in 'normal' mode. When I switch to headless mode, it doesn't download the file. I have already tried the code I found around the internet, but it didn't work.
chrome_options = Options()
chrome_options.add_argument("--headless")
self.driver = webdriver.Chrome(chrome_options=chrome_options, executable_path=r'{}/chromedriver'.format(os.getcwd()))
self.driver.set_window_size(1024, 768)
self.driver.command_executor._commands["send_command"] = ("POST", '/session/$sessionId/chromium/send_command')
params = {'cmd': 'Page.setDownloadBehavior', 'params': {'behavior': 'allow', 'downloadPath': os.getcwd()}}
self.driver.execute("send_command", params)
Does anyone have any idea how to solve this problem?
PS: I don't necessarily need to use ChromeDriver. If it works with another driver, that's fine for me.
First the solution
Minimum Prerequisites:
Selenium client version: Selenium v3.141.59
Chrome version: Chrome v77.0
ChromeDriver version: ChromeDriver v77.0
To download the file by clicking the element with the text Download Data on this website, you can use the following solution:
Code Block:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options
options = Options()
options.add_argument("--headless")
options.add_argument("--window-size=1920,1080")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
driver = webdriver.Chrome(chrome_options=options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe', service_args=["--log-path=./Logs/DubiousDan.log"])
print ("Headless Chrome Initialized")
params = {'behavior': 'allow', 'downloadPath': r'C:\Users\Debanjan.B\Downloads'}
driver.execute_cdp_cmd('Page.setDownloadBehavior', params)
driver.get("https://www.mockaroo.com/")
driver.execute_script("scroll(0, 250)");
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button#download"))).click()
print ("Download button clicked")
#driver.quit()
Console Output:
Headless Chrome Initialized
Download button clicked
Details
Downloading files through Headless Chromium has been one of the most sought-after features since Headless Chrome was introduced.
Since then, different contributors have published various workarounds, including:
Downloading with chrome headless and selenium
Python equivalent of a given wget command
Now, the good news is that the Chromium team has officially announced support for downloading files through Headless Chromium.
In the discussion Headless mode doesn't save file downloads, @eseckler mentioned:
Downloads in headless work a little differently. There's the Page.setDownloadBehavior devtools command to set a download folder. We're working on a way to use DevTools network interception to stream the downloaded file via DevTools as well.
A detailed discussion can be found at Issue 696481: Headless mode doesn't save file downloads
Finally, @bugdroid's revision seems to have nailed the issue for us.
[ChromeDriver] Added support for headless mode to download files
Previously, Chromedriver running in headless mode would not properly download files due to the fact it sparsely parses the preference file given to it. Engineers from the headless chrome team recommended using DevTools's "Page.setDownloadBehavior" to fix this. This changelist implements this fix. Downloaded files default to the current directory and can be set using download_dir when instantiating a chromedriver instance. Also added tests to ensure proper download functionality.
Here is the revision and commit
From ChromeDriver v77.0.3865.40 (2019-08-20) release notes:
Resolved issue 2454: Headless mode doesn't save file downloads [Pri-2]
Solution
Update ChromeDriver to latest ChromeDriver v77.0 level.
Update Chrome to Chrome Version 77.0 level. (as per ChromeDriver v76.0 release notes)
Note: Chrome v77.0 is yet to be GAed/pushed for release so till then you can download and install a development build and test either from:
Chrome Canary
Latest build from the Dev Channel
Outro
However, macOS users will have to wait for their share, as per: On Chromedriver, headless chrome crashes after sending Page.setDownloadBehavior on MacOSX.
Chromedriver Version: 95.0.4638.54
Chrome Version 95.0.4638.69
from selenium.webdriver.chrome.options import Options
options = Options()
options.add_argument("--headless")
options.add_argument("--start-maximized")
options.add_argument("--no-sandbox")
options.add_argument("--disable-extensions")
options.add_argument('--disable-dev-shm-usage')
options.add_argument("--disable-gpu")
options.add_argument('--disable-software-rasterizer')
options.add_argument("user-agent=Mozilla/5.0 (Windows Phone 10.0; Android 4.2.1; Microsoft; Lumia 640 XL LTE) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Mobile Safari/537.36 Edge/12.10166")
options.add_argument("--disable-notifications")
options.add_experimental_option("prefs", {
"download.default_directory": "C:\\link\\to\\folder",
"download.prompt_for_download": False,
"download.directory_upgrade": True,
"safebrowsing_for_trusted_sources_enabled": False,
"safebrowsing.enabled": False
}
)
What seemed to work was that I used "\\" instead of "/" for the address. The latter approach didn't throw any error, but didn't download any documents either. But, using double back slashes did the job.
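If the download directory is written with forward slashes, it can be converted to the backslash form described above before it goes into the prefs. This is a sketch using the standard library's `pathlib.PureWindowsPath`; the helper name `windows_download_dir` is hypothetical.

```python
from pathlib import PureWindowsPath


def windows_download_dir(path):
    """Return `path` with Windows-style backslash separators."""
    # PureWindowsPath does no filesystem access, so this works on any OS.
    return str(PureWindowsPath(path))


if __name__ == "__main__":
    print(windows_download_dir("C:/link/to/folder"))
```

The result can be passed as the value of `"download.default_directory"` in the prefs dictionary.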
For JavaScript, use the code below:
const webdriver = require('selenium-webdriver');
const chrome = require('selenium-webdriver/chrome');
let options = new chrome.Options();
// Pass each flag as its own argument
options.addArguments('--headless', '--window-size=1500,1200');
options.setUserPreferences({
  'plugins.always_open_pdf_externally': true,
  'profile.default_content_settings.popups': 0,
  'download.default_directory': Download_File_Path
});
driver = await new webdriver.Builder().setChromeOptions(options).forBrowser('chrome').build();
Then switch tabs as soon as you click the download button:
await driver.sleep(1000);
var Handle = await driver.getAllWindowHandles();
await driver.switchTo().window(Handle[1]);
import pathlib
from selenium.webdriver import Chrome
driver = Chrome()
driver.execute_cdp_cmd("Page.setDownloadBehavior", {
"behavior": "allow",
"downloadPath": str(pathlib.Path.home().joinpath("Downloads"))
})
This C# works for me
Note the new headless option https://www.selenium.dev/blog/2023/headless-is-going-away/
private IWebDriver StartBrowserChromeHeadlessDriver()
{
var chromeOptions = new ChromeOptions();
chromeOptions.AddArgument("--headless=new");
chromeOptions.AddArgument("--window-size=1920,1080");
chromeOptions.AddUserProfilePreference("download.default_directory", downloadFolder);
var chromeDownload = new Dictionary<string, object>
{
{ "behavior", "allow" },
{ "downloadPath", downloadFolder }
};
var driver = new ChromeDriver(driverFolder, chromeOptions, TimeSpan.FromSeconds(timeoutSecs));
driver.ExecuteCdpCommand("Browser.setDownloadBehavior", chromeDownload);
return driver;
}
I don't think you should be using the browser to download content; leave that to Chrome developers and testers.
I believe you should instead get the href attribute of the element you want to download and fetch it with the requests library.
If your site requires authentication, you can fetch the cookies from the browser instance and pass them to requests.Session.
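A rough sketch of the cookie hand-off described above. The helper name `copy_cookies` is hypothetical. It assumes the cookie list has the shape returned by Selenium's `driver.get_cookies()` (a list of dicts with `name`, `value`, `domain` and `path` keys) and that `session` behaves like `requests.Session`.

```python
def copy_cookies(selenium_cookies, session):
    """Copy cookies from Selenium's get_cookies() format into a session.

    `session` is expected to behave like requests.Session, i.e. expose
    `session.cookies.set(name, value, domain=..., path=...)`.
    """
    for cookie in selenium_cookies:
        session.cookies.set(
            cookie["name"],
            cookie["value"],
            domain=cookie.get("domain", ""),
            path=cookie.get("path", "/"),
        )
    return session


# Usage sketch (needs the third-party requests package and a live driver;
# the "download" element ID and output file name are placeholders):
#   import requests
#   session = copy_cookies(driver.get_cookies(), requests.Session())
#   href = driver.find_element_by_id("download").get_attribute("href")
#   response = session.get(href)
#   with open("downloaded_file", "wb") as fh:
#       fh.write(response.content)
```

This only helps when the download is reachable through a plain URL; a button that triggers the download through JavaScript has no href to fetch.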

How can I download files using Selenium with Python in my linux docker container? [duplicate]

I'm do me code in Cromedrive in 'normal' mode and works fine. When I change to headless mode it don't download the file. I already try the code I found alround internet, but didn't work.
chrome_options = Options()
chrome_options.add_argument("--headless")
self.driver = webdriver.Chrome(chrome_options=chrome_options, executable_path=r'{}/chromedriver'.format(os.getcwd()))
self.driver.set_window_size(1024, 768)
self.driver.command_executor._commands["send_command"] = ("POST", '/session/$sessionId/chromium/send_command')
params = {'cmd': 'Page.setDownloadBehavior', 'params': {'behavior': 'allow', 'downloadPath': os.getcwd()}}
self.driver.execute("send_command", params)
Anyone have any idea about how solve this problem?
PS: I don't need to use Chomedrive necessarily. If it works in another drive it's fine for me.
First the solution
Minimum Prerequisites:
Selenium client version: Selenium v3.141.59
Chrome version: Chrome v77.0
ChromeDriver version: ChromeDriver v77.0
To download the file clicking on the element with text as Download Data within this website you can use the following solution:
Code Block:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options
options = Options()
options.add_argument("--headless")
options.add_argument("--window-size=1920,1080")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
driver = webdriver.Chrome(chrome_options=options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe', service_args=["--log-path=./Logs/DubiousDan.log"])
print ("Headless Chrome Initialized")
params = {'behavior': 'allow', 'downloadPath': r'C:\Users\Debanjan.B\Downloads'}
driver.execute_cdp_cmd('Page.setDownloadBehavior', params)
driver.get("https://www.mockaroo.com/")
driver.execute_script("scroll(0, 250)");
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button#download"))).click()
print ("Download button clicked")
#driver.quit()
Console Output:
Headless Chrome Initialized
Download button clicked
File Downloading snapshot:
Details
Downloading files through Headless Chromium was one of the most sought functionality since Headless Chrome was introduced.
Since then there were different work-arounds published by different contributors and some of them are:
Downloading with chrome headless and selenium
Python equivalent of a given wget command
Now the, the good news is Chromium team have officially announced the arrival of the functionality Downloading file through Headless Chromium.
In the discussion Headless mode doesn't save file downloads #eseckler mentioned:
Downloads in headless work a little differently. There's the Page.setDownloadBehavior devtools command to set a download folder. We're working on a way to use DevTools network interception to stream the downloaded file via DevTools as well.
A detailed discussion can be found at Issue 696481: Headless mode doesn't save file downloads
Finally, #bugdroid revision seems to have nailed the issue for us.
[ChromeDriver] Added support for headless mode to download files
Previously, Chromedriver running in headless mode would not properly download files due to the fact it sparsely parses the preference file given to it. Engineers from the headless chrome team recommended using DevTools's "Page.setDownloadBehavior" to fix this. This changelist implements this fix. Downloaded files default to the current directory and can be set using download_dir when instantiating a chromedriver instance. Also added tests to ensure proper download functionality.
Here is the revision and commit
From ChromeDriver v77.0.3865.40 (2019-08-20) release notes:
Resolved issue 2454: Headless mode doesn't save file downloads [Pri-2]
Solution
Update ChromeDriver to latest ChromeDriver v77.0 level.
Update Chrome to Chrome Version 77.0 level. (as per ChromeDriver v76.0 release notes)
Note: Chrome v77.0 is yet to be GAed/pushed for release so till then you can download and install a development build and test either from:
Chrome Canary
Latest build from the Dev Channel
Outro
However Mac OSX users have a wait for their pie as On Chromedriver, headless chrome crashes after sending Page.setDownloadBehavior on MacOSX.
Chomedriver Version: 95.0.4638.54
Chrome Version 95.0.4638.69
from selenium.webdriver.chrome.options import Options
options = Options()
options.add_argument("--headless")
options.add_argument("--start-maximized")
options.add_argument("--no-sandbox")
options.add_argument("--disable-extensions")
options.add_argument('--disable-dev-shm-usage')
options.add_argument("--disable-gpu")
options.add_argument('--disable-software-rasterizer')
options.add_argument("user-agent=Mozilla/5.0 (Windows Phone 10.0; Android 4.2.1; Microsoft; Lumia 640 XL LTE) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Mobile Safari/537.36 Edge/12.10166")
options.add_argument("--disable-notifications")
options.add_experimental_option("prefs", {
"download.default_directory": "C:\\link\\to\\folder",
"download.prompt_for_download": False,
"download.directory_upgrade": True,
"safebrowsing_for_trusted_sources_enabled": False,
"safebrowsing.enabled": False
}
)
What seemed to work was that I used "\\" instead of "/" for the address. The latter approach didn't throw any error, but didn't download any documents either. But, using double back slashes did the job.
For javascript use below code:
const chrome = require('selenium-webdriver/chrome');
let options = new chrome.Options();
options.addArguments('--headless --window-size=1500,1200');
options.setUserPreferences({ 'plugins.always_open_pdf_externally': true,
"profile.default_content_settings.popups": 0,
"download.default_directory": Download_File_Path });
driver = await new webdriver.Builder().setChromeOptions(options).forBrowser('chrome').build();
Then switch tabs as soon as you click the download button:
await driver.sleep(1000);
var Handle = await driver.getAllWindowHandles();
await driver.switchTo().window(Handle[1]);
import pathlib
from selenium.webdriver import Chrome
driver = Chrome()
driver.execute_cdp_cmd("Page.setDownloadBehavior", {
"behavior": "allow",
"downloadPath": str(pathlib.Path.home().joinpath("Downloads"))
})
This C# works for me
Note the new headless option https://www.selenium.dev/blog/2023/headless-is-going-away/
private IWebDriver StartBrowserChromeHeadlessDriver()
{
var chromeOptions = new ChromeOptions();
chromeOptions.AddArgument("--headless=new");
chromeOptions.AddArgument("--window-size=1920,1080");
chromeOptions.AddUserProfilePreference("download.default_directory", downloadFolder);
var chromeDownload = new Dictionary<string, object>
{
{ "behavior", "allow" },
{ "downloadPath", downloadFolder }
};
var driver = new ChromeDriver(driverFolder, chromeOptions, TimeSpan.FromSeconds(timeoutSecs));
driver.ExecuteCdpCommand("Browser.setDownloadBehavior", chromeDownload);
return driver;
}
I don't think you should be using the browser for downloading content, leave it to Chrome developers/testers.
I believe you should rather get href attribute of the element you want to download and obtain it using requests library
If your site requires authentication you could fetch cookies from the browser instance and pass them to requests.Session.

Why does the result cannot create when I run webdriver chrome on headless option [duplicate]

I'm do me code in Cromedrive in 'normal' mode and works fine. When I change to headless mode it don't download the file. I already try the code I found alround internet, but didn't work.
chrome_options = Options()
chrome_options.add_argument("--headless")
self.driver = webdriver.Chrome(chrome_options=chrome_options, executable_path=r'{}/chromedriver'.format(os.getcwd()))
self.driver.set_window_size(1024, 768)
self.driver.command_executor._commands["send_command"] = ("POST", '/session/$sessionId/chromium/send_command')
params = {'cmd': 'Page.setDownloadBehavior', 'params': {'behavior': 'allow', 'downloadPath': os.getcwd()}}
self.driver.execute("send_command", params)
Anyone have any idea about how solve this problem?
PS: I don't need to use Chomedrive necessarily. If it works in another drive it's fine for me.
First the solution
Minimum Prerequisites:
Selenium client version: Selenium v3.141.59
Chrome version: Chrome v77.0
ChromeDriver version: ChromeDriver v77.0
To download the file clicking on the element with text as Download Data within this website you can use the following solution:
Code Block:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options
options = Options()
options.add_argument("--headless")
options.add_argument("--window-size=1920,1080")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
driver = webdriver.Chrome(chrome_options=options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe', service_args=["--log-path=./Logs/DubiousDan.log"])
print ("Headless Chrome Initialized")
params = {'behavior': 'allow', 'downloadPath': r'C:\Users\Debanjan.B\Downloads'}
driver.execute_cdp_cmd('Page.setDownloadBehavior', params)
driver.get("https://www.mockaroo.com/")
driver.execute_script("scroll(0, 250)");
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button#download"))).click()
print ("Download button clicked")
#driver.quit()
Console Output:
Headless Chrome Initialized
Download button clicked
File Downloading snapshot:
Details
Downloading files through Headless Chromium was one of the most sought functionality since Headless Chrome was introduced.
Since then there were different work-arounds published by different contributors and some of them are:
Downloading with chrome headless and selenium
Python equivalent of a given wget command
Now the, the good news is Chromium team have officially announced the arrival of the functionality Downloading file through Headless Chromium.
In the discussion Headless mode doesn't save file downloads #eseckler mentioned:
Downloads in headless work a little differently. There's the Page.setDownloadBehavior devtools command to set a download folder. We're working on a way to use DevTools network interception to stream the downloaded file via DevTools as well.
A detailed discussion can be found at Issue 696481: Headless mode doesn't save file downloads
Finally, #bugdroid revision seems to have nailed the issue for us.
[ChromeDriver] Added support for headless mode to download files
Previously, Chromedriver running in headless mode would not properly download files due to the fact it sparsely parses the preference file given to it. Engineers from the headless chrome team recommended using DevTools's "Page.setDownloadBehavior" to fix this. This changelist implements this fix. Downloaded files default to the current directory and can be set using download_dir when instantiating a chromedriver instance. Also added tests to ensure proper download functionality.
Here is the revision and commit
From ChromeDriver v77.0.3865.40 (2019-08-20) release notes:
Resolved issue 2454: Headless mode doesn't save file downloads [Pri-2]
Solution
Update ChromeDriver to latest ChromeDriver v77.0 level.
Update Chrome to Chrome Version 77.0 level. (as per ChromeDriver v76.0 release notes)
Note: Chrome v77.0 is yet to be GAed/pushed for release so till then you can download and install a development build and test either from:
Chrome Canary
Latest build from the Dev Channel
Outro
However, Mac OS X users will have to wait, as reported in: On Chromedriver, headless chrome crashes after sending Page.setDownloadBehavior on MacOSX.
Chromedriver Version: 95.0.4638.54
Chrome Version 95.0.4638.69
from selenium.webdriver.chrome.options import Options
options = Options()
options.add_argument("--headless")
options.add_argument("--start-maximized")
options.add_argument("--no-sandbox")
options.add_argument("--disable-extensions")
options.add_argument('--disable-dev-shm-usage')
options.add_argument("--disable-gpu")
options.add_argument('--disable-software-rasterizer')
options.add_argument("user-agent=Mozilla/5.0 (Windows Phone 10.0; Android 4.2.1; Microsoft; Lumia 640 XL LTE) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Mobile Safari/537.36 Edge/12.10166")
options.add_argument("--disable-notifications")
options.add_experimental_option("prefs", {
"download.default_directory": "C:\\link\\to\\folder",
"download.prompt_for_download": False,
"download.directory_upgrade": True,
"safebrowsing_for_trusted_sources_enabled": False,
"safebrowsing.enabled": False
}
)
What seemed to work was using "\\" instead of "/" in the path. With forward slashes there was no error, but nothing was downloaded either; double backslashes did the job.
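To illustrate the backslash point above, here is a minimal sketch that converts a forward-slash path into the backslash form before it goes into the prefs dictionary. The helper name `to_windows_download_dir` is hypothetical, and the sketch assumes (as the answer reports, not as a documented guarantee) that Chrome on Windows wants backslash separators in `download.default_directory`.

```python
from pathlib import PureWindowsPath


def to_windows_download_dir(path: str) -> str:
    """Return `path` written with backslash separators."""
    # PureWindowsPath only manipulates the string; it never touches the
    # filesystem, so this behaves the same on any operating system.
    return str(PureWindowsPath(path))


# Prefs dictionary in the same shape as the answer above
prefs = {
    "download.default_directory": to_windows_download_dir("C:/link/to/folder"),
    "download.prompt_for_download": False,
}
print(prefs["download.default_directory"])
```

The resulting dictionary can be passed to options.add_experimental_option("prefs", prefs) exactly as in the snippet above.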
For JavaScript, use the code below:
const webdriver = require('selenium-webdriver');
const chrome = require('selenium-webdriver/chrome');
let options = new chrome.Options();
// Pass each flag as its own argument; a single space-separated string is treated as one flag
options.addArguments('--headless', '--window-size=1500,1200');
options.setUserPreferences({ 'plugins.always_open_pdf_externally': true,
"profile.default_content_settings.popups": 0,
"download.default_directory": Download_File_Path });
driver = await new webdriver.Builder().setChromeOptions(options).forBrowser('chrome').build();
Then switch tabs as soon as you click the download button:
await driver.sleep(1000);
var Handle = await driver.getAllWindowHandles();
await driver.switchTo().window(Handle[1]);
This C# works for me
Note the new headless option https://www.selenium.dev/blog/2023/headless-is-going-away/
private IWebDriver StartBrowserChromeHeadlessDriver()
{
var chromeOptions = new ChromeOptions();
chromeOptions.AddArgument("--headless=new");
chromeOptions.AddArgument("--window-size=1920,1080");
chromeOptions.AddUserProfilePreference("download.default_directory", downloadFolder);
var chromeDownload = new Dictionary<string, object>
{
{ "behavior", "allow" },
{ "downloadPath", downloadFolder }
};
var driver = new ChromeDriver(driverFolder, chromeOptions, TimeSpan.FromSeconds(timeoutSecs));
driver.ExecuteCdpCommand("Browser.setDownloadBehavior", chromeDownload);
return driver;
}
import pathlib
from selenium.webdriver import Chrome
driver = Chrome()
driver.execute_cdp_cmd("Page.setDownloadBehavior", {
"behavior": "allow",
"downloadPath": str(pathlib.Path.home().joinpath("Downloads"))
})
I don't think you should use the browser for downloading content; leave that to Chrome developers and testers.
Instead, get the href attribute of the element you want to download and fetch it with the requests library.
If your site requires authentication, you can fetch the cookies from the browser instance and pass them to requests.Session.
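A minimal sketch of that cookie hand-off, assuming the cookies come from driver.get_cookies() (a list of dicts with "name" and "value" keys). The function names `cookies_for_requests` and `download_via_requests` are made up for illustration, and the sketch copies only cookie names and values, not domains or paths.

```python
def cookies_for_requests(selenium_cookies):
    """Map the list of dicts from driver.get_cookies() to a {name: value} dict."""
    return {c["name"]: c["value"] for c in selenium_cookies}


def download_via_requests(driver, url, dest_path):
    """Fetch `url` with the browser's cookies and write the body to `dest_path`."""
    import requests  # third-party; imported here so the helper above has no dependencies

    session = requests.Session()
    session.cookies.update(cookies_for_requests(driver.get_cookies()))
    response = session.get(url, stream=True)
    response.raise_for_status()
    # Stream the body to disk in chunks instead of holding it all in memory
    with open(dest_path, "wb") as fh:
        for chunk in response.iter_content(chunk_size=8192):
            fh.write(chunk)
    return dest_path
```

Usage would look something like download_via_requests(driver, element.get_attribute("href"), "report.csv"), where element is whatever link you located in the page.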

How to prevent selenium window from becoming an active one on session start? [duplicate]

I have a Selenium test suite that runs many tests and on each new test it opens a browser window on top of any other windows I have open. Very jarring while working in a local environment. Is there a way to tell Selenium or the OS (Mac) to open the windows in the background?
If you are using Selenium web driver with Python, you can use PyVirtualDisplay, a Python wrapper for Xvfb and Xephyr.
PyVirtualDisplay needs Xvfb as a dependency. On Ubuntu, first install Xvfb:
sudo apt-get install xvfb
Then install PyVirtualDisplay from PyPI:
pip install pyvirtualdisplay
Sample Selenium script in Python in a headless mode with PyVirtualDisplay:
#!/usr/bin/env python
from pyvirtualdisplay import Display
from selenium import webdriver
display = Display(visible=0, size=(800, 600))
display.start()
# Now Firefox will run in a virtual display.
# You will not see the browser.
browser = webdriver.Firefox()
browser.get('http://www.google.com')
print(browser.title)
browser.quit()
display.stop()
EDIT
The initial answer was posted in 2014 and now we are at the cusp of 2018. Like everything else, browsers have also advanced. Chrome has a completely headless version now which eliminates the need to use any third-party libraries to hide the UI window. Sample code is as follows:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
CHROME_PATH = '/usr/bin/google-chrome'
CHROMEDRIVER_PATH = '/usr/bin/chromedriver'
WINDOW_SIZE = "1920,1080"
chrome_options = Options()
chrome_options.add_argument("--headless")
chrome_options.add_argument("--window-size=%s" % WINDOW_SIZE)
chrome_options.binary_location = CHROME_PATH
driver = webdriver.Chrome(executable_path=CHROMEDRIVER_PATH,
chrome_options=chrome_options
)
driver.get("https://www.google.com")
driver.get_screenshot_as_file("capture.png")
driver.close()
There are a few ways, but it isn't a simple "set a configuration value". Unless you invest in a headless browser, which doesn't suit everyone's requirements, it is a little bit of a hack:
How to hide Firefox window (Selenium WebDriver)?
and
Is it possible to hide the browser in Selenium RC?
Supposedly, you can also pass parameters to Chrome, specifically: --no-startup-window
Note that for some browsers, especially Internet Explorer, it will hurt your tests to not have it run in focus.
You can also hack about a bit with AutoIt, to hide the window once it's opened.
Chrome 57 has an option to pass the --headless flag, which makes the window invisible.
This flag differs from --no-startup-window, which doesn't launch a window at all and is used for hosting background apps, as this page says.
Java code to pass the flag to Selenium webdriver (ChromeDriver):
ChromeOptions options = new ChromeOptions();
options.addArguments("--headless");
ChromeDriver chromeDriver = new ChromeDriver(options);
To run without a visible browser window, you can use headless mode.
Here is an example in Python that is working for me right now:
from selenium import webdriver
options = webdriver.ChromeOptions()
options.add_argument("headless")
self.driver = webdriver.Chrome(executable_path='/Users/${userName}/Drivers/chromedriver', chrome_options=options)
There is more information about this on the official Google website: https://developers.google.com/web/updates/2017/04/headless-chrome
I used this code for Firefox on Windows and it worked (reference here):
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
options = Options()  # lowercase name so the Options class is not shadowed
options.headless = True
driver = webdriver.Firefox(options=options, executable_path='geckodriver.exe')
driver.get(...)
...
But I didn't test it for other browsers.
Since Chrome 57 you have the headless argument:
var options = new ChromeOptions();
options.AddArguments("headless");
using (IWebDriver driver = new ChromeDriver(options))
{
// The rest of your tests
}
The headless mode of Chrome performs 30.97% better than the UI version. The other headless driver PhantomJS delivers 34.92% better than the Chrome's headless mode.
PhantomJSDriver
using (IWebDriver driver = new PhantomJSDriver())
{
// The rest of your test
}
The headless mode of Mozilla Firefox performs 3.68% better than the UI version. This is a disappointment since the Chrome's headless mode achieves > 30% better time than the UI one. The other headless driver PhantomJS delivers 34.92% better than the Chrome's headless mode. Surprisingly for me, the Edge browser beats all of them.
var options = new FirefoxOptions();
options.AddArguments("--headless");
using (IWebDriver driver = new FirefoxDriver(options))
{
// The rest of your test
}
This is available from Firefox 57+
Note: PhantomJS is not maintained any more!
On Windows you can use win32gui:
import csv
import io
import subprocess
import win32con
import win32gui

class HideFox:
    def __init__(self, exe='firefox.exe'):
        self.exe = exe
        self.get_hwnd()

    def get_hwnd(self):
        win_name = get_win_name(self.exe)
        self.hwnd = win32gui.FindWindow(0, win_name)

    def hide(self):
        win32gui.ShowWindow(self.hwnd, win32con.SW_MINIMIZE)
        win32gui.ShowWindow(self.hwnd, win32con.SW_HIDE)

    def show(self):
        win32gui.ShowWindow(self.hwnd, win32con.SW_SHOW)
        win32gui.ShowWindow(self.hwnd, win32con.SW_MAXIMIZE)

def get_win_name(exe):
    '''Return the window title of the first process whose image name is `exe`.'''
    info = subprocess.STARTUPINFO()
    info.dwFlags |= subprocess.STARTF_USESHOWWINDOW
    # text=True makes check_output return str instead of bytes on Python 3
    raw = subprocess.check_output('tasklist /v /fo csv', startupinfo=info, text=True)
    # Parse the CSV output properly instead of calling eval() on each line
    rows = csv.reader(io.StringIO(raw))
    next(rows, None)  # skip the header row
    for proc in rows:
        # Column 0 is the image name, column 8 the window title
        if len(proc) > 8 and proc[0] == exe:
            return proc[8]
    raise ValueError('Could not find a process with name ' + exe)
Example:
hider = HideFox('firefox.exe') # Can be anything, e.g., phantomjs.exe, notepad.exe, etc.
# To hide the window
hider.hide()
# To show again
hider.show()
However, there is one problem with this solution - using send_keys method makes the window show up. You can deal with it by using JavaScript which does not show a window:
def send_keys_without_opening_window(id_of_the_element, keys):
    # Pass the values as script arguments so quotes in `keys` cannot break the JavaScript
    YourWebdriver.execute_script("document.getElementById(arguments[0]).value = arguments[1];", id_of_the_element, keys)
I suggest using PhantomJS. For more information, you may visit the Phantom Official Website.
Note that PhantomJS is a standalone headless browser, so it replaces Firefox in your script rather than running alongside it.
After downloading phantomjs.exe, you need to add it to your project.
I have placed mine inside: common → Library → phantomjs.exe
Now all you have to do inside your Selenium code is to change the line
browser = webdriver.Firefox()
To something like
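import os
from selenium import webdriver
# os.path.join avoids unescaped backslashes in the path string
path2phantom = os.path.join(os.getcwd(), "common", "Library", "phantomjs.exe")
browser = webdriver.PhantomJS(path2phantom)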
The path to PhantomJS may be different... change as you like :)
This hack worked for me, and I'm pretty sure it will work for you too ;)
This can be set through the options. Here is the equivalent Java code:
ChromeOptions chromeOptions = new ChromeOptions();
chromeOptions.setHeadless(true);
WebDriver driver = new ChromeDriver(chromeOptions);
This is a simple Node.js solution that works in the new version 4.x (maybe also 3.x) of Selenium.
Chrome
const { Builder } = require('selenium-webdriver')
const chrome = require('selenium-webdriver/chrome');
let driver = await new Builder().forBrowser('chrome').setChromeOptions(new chrome.Options().headless()).build()
await driver.get('https://example.com')
Firefox
const { Builder } = require('selenium-webdriver')
const firefox = require('selenium-webdriver/firefox');
let driver = await new Builder().forBrowser('firefox').setFirefoxOptions(new firefox.Options().headless()).build()
await driver.get('https://example.com')
The whole thing just runs in the background. It is exactly what we want.
If you are using the Google Chrome driver, you can use this very simple code (it worked for me):
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
chrome_options = Options()
chrome_options.add_argument("--headless")
driver = webdriver.Chrome('chromedriver2_win32/chromedriver.exe', options=chrome_options)
driver.get('https://www.anywebsite.com')
On *nix, you can also run a headless X Window server like Xvfb and point the DISPLAY environment variable to it:
Fake X server for testing?
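The Xvfb approach can also be driven from Python without extra packages. The following is only a sketch: the helper names `xvfb_command` and `virtual_display` are made up, display number 99 and the 1280x1024x24 screen are arbitrary choices, and it assumes the Xvfb binary is installed and on PATH.

```python
import os
import subprocess
from contextlib import contextmanager


def xvfb_command(display=99, width=1280, height=1024, depth=24):
    """Build the argv for an Xvfb server on the given display number."""
    return ["Xvfb", f":{display}", "-screen", "0", f"{width}x{height}x{depth}"]


@contextmanager
def virtual_display(display=99):
    """Run Xvfb for the duration of the block and point DISPLAY at it."""
    server = subprocess.Popen(xvfb_command(display))
    previous = os.environ.get("DISPLAY")
    os.environ["DISPLAY"] = f":{display}"
    try:
        yield
    finally:
        # Stop the server and restore whatever DISPLAY was set before
        server.terminate()
        server.wait()
        if previous is None:
            os.environ.pop("DISPLAY", None)
        else:
            os.environ["DISPLAY"] = previous
```

Any browser started inside a "with virtual_display():" block inherits the DISPLAY value and renders to the virtual screen. The sketch does not wait for Xvfb to finish starting, so a short delay or retry before launching the browser may be needed.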
One way to achieve this is by running the browser in headless mode. Another advantage of this is that tests are executed faster.
Please find the code below to set headless mode in the Chrome browser.
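package chrome;
import java.io.File;
import java.io.IOException;
import org.apache.commons.io.FileUtils;
import org.openqa.selenium.OutputType;
import org.openqa.selenium.TakesScreenshot;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;
public class HeadlessTesting {
    public static void main(String[] args) throws IOException {
        System.setProperty("webdriver.chrome.driver",
                "ChromeDriverPath");
        ChromeOptions options = new ChromeOptions();
        options.addArguments("headless");
        // Chrome expects the size as width,height
        options.addArguments("window-size=1200,600");
        WebDriver driver = new ChromeDriver(options);
        driver.get("https://contentstack.built.io");
        driver.get("https://www.google.co.in/");
        System.out.println("title is: " + driver.getTitle());
        // Save a screenshot to prove the page rendered without a visible window
        File scrFile = ((TakesScreenshot) driver)
                .getScreenshotAs(OutputType.FILE);
        FileUtils.copyFile(scrFile, new File("pathTOSaveFile"));
        driver.quit();
    }
}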
If you are using Ubuntu (GNOME), one simple workaround is to install the GNOME extension Auto Move Windows: https://extensions.gnome.org/extension/16/auto-move-windows/
Then assign the browser (e.g. Chrome) to another workspace (e.g. Workspace 2). The browser will silently run in the other workspace and not bother you anymore. You can still use Chrome in your own workspace without any interruption.
Here is a .NET solution that worked for me:
Download PhantomJS at http://phantomjs.org/download.html.
Copy the .exe file from the bin folder in the download folder and paste it to the bin debug/release folder of your Visual Studio project.
Add this using
using OpenQA.Selenium.PhantomJS;
In your code, open the driver like this:
PhantomJSDriver driver = new PhantomJSDriver();
using (driver)
{
driver.Navigate().GoToUrl("http://testing-ground.scraping.pro/login");
// Your code here
}
I had the same problem with chromedriver in Python: options.add_argument("headless") did not work for me, but appending the argument directly did, as in the code below:
opt = webdriver.ChromeOptions()
opt.arguments.append("headless")
Just add a simple "headless" option argument.
from selenium import webdriver
options = webdriver.ChromeOptions()
options.add_argument("--headless")
driver = webdriver.Chrome("PATH_TO_DRIVER", options=options)
Use it ...
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
options = Options()
options.headless = True
driver = webdriver.Chrome(CHROMEDRIVER_PATH, chrome_options=options)

How to use chrome webdriver in selenium to download files in python?

Based on the posts here and here I am trying to use a chrome webdriver in selenium to be able to download a file. Here is the code so far
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
chrome_options = Options()
chrome_options.add_argument("--disable-extensions")
chrome_options.add_experimental_option("profile.default_content_settings.popups", 0)
chrome_options.add_experimental_option("download.prompt_for_download", "false")
chrome_options.add_experimental_option("download.default_directory", "/tmp")
driver = webdriver.Chrome(chrome_options=chrome_options)
But this alone results in the following error:
WebDriverException: Message: unknown error: cannot parse capability: chromeOptions
from unknown error: unrecognized chrome option: download.default_directory
(Driver info: chromedriver=2.24.417424 (c5c5ea873213ee72e3d0929b47482681555340c3),platform=Linux 4.10.0-37-generic x86_64)
So how to fix this? Do I have to use this 'capability' thing? If so, how exactly?
Try this. Executed on windows
(How to control the download of files with Selenium Python bindings in Chrome)
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
options = Options()
options.add_experimental_option("prefs", {
"download.default_directory": r"C:\Users\xxx\downloads\Test",
"download.prompt_for_download": False,
"download.directory_upgrade": True,
"safebrowsing.enabled": True
})
I think that the easiest way to save an arbitrary file (e.g., an image) using WebDriver is to execute JavaScript that saves the file. No configuration required at all!
I use the FileSaver.js library to save a file under the desired name.
from selenium import webdriver
import requests
FILE_SAVER_MIN_JS_URL = "https://raw.githubusercontent.com/eligrey/FileSaver.js/master/dist/FileSaver.min.js"
file_saver_min_js = requests.get(FILE_SAVER_MIN_JS_URL).text  # execute_script needs a str, not bytes
chrome_options = webdriver.ChromeOptions()
driver = webdriver.Chrome('/usr/local/bin/chromedriver', options=chrome_options)
# Execute FileSaver.js in page's context
driver.execute_script(file_saver_min_js)
# Now you can use saveAs() function
download_script = f'''
return fetch('https://cdn.sstatic.net/Sites/stackoverflow/company/img/logos/so/so-logo.svg?v=a010291124bf',
{{
"credentials": "same-origin",
"headers": {{"accept":"image/webp,image/apng,image/*,*/*;q=0.8","accept-language":"en-US,en;q=0.9"}},
"referrerPolicy": "no-referrer-when-downgrade",
"body": null,
"method": "GET",
"mode": "cors"
}}
).then(resp => {{
return resp.blob();
}}).then(blob => {{
saveAs(blob, 'stackoverflow_logo.svg');
}});
'''
driver.execute_script(download_script)
# Done! Your browser has saved an SVG image!
Some tips:
Chromium and chromedriver should have the same version.
Typically the chromium package ships chromedriver; you can find it in the install directory. If you are using Ubuntu/Debian, execute dpkg -L chromium-chromedriver.
Use a correct Chrome preference config.
As Satish said, use options.add_experimental_option("prefs", ...) to configure Selenium + Chrome. The preference names may change over time, though, so the best way to get current, working prefs is to check the Chromium config directory.
For example,
Launch Chromium in an Xorg desktop session
Change settings in menu
Quit chromium
Find out the real settings in ~/.config/chromium/Default/Preferences
Read it, pick out the exact options you need.
In my case, the code is:
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
options = webdriver.ChromeOptions()
options.gpu = False
options.headless = True
options.add_experimental_option("prefs", {
"download.default_directory" : "/data/books/chrome/",
'profile.default_content_setting_values.automatic_downloads': 2,
})
desired = options.to_capabilities()
desired['loggingPrefs'] = { 'performance': 'ALL'}
driver = webdriver.Chrome(desired_capabilities=desired)
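The "read the Preferences file and pick out the options you need" step can be sketched as follows. The Preferences file is JSON in which a dotted pref name such as download.default_directory is stored as nested objects. The helper `pick_prefs` is hypothetical, and the sample string is a made-up stand-in for the real file contents.

```python
import json


def pick_prefs(preferences_text, dotted_keys):
    """Return {dotted_key: value} for each key found in the Preferences JSON."""
    data = json.loads(preferences_text)
    picked = {}
    for dotted in dotted_keys:
        node = data
        for part in dotted.split("."):
            if not isinstance(node, dict) or part not in node:
                break  # key is absent from this profile; skip it
            node = node[part]
        else:
            picked[dotted] = node
    return picked


# Stand-in for the contents of ~/.config/chromium/Default/Preferences (made-up values)
sample = '{"download": {"default_directory": "/data/books/chrome/", "prompt_for_download": false}}'
print(pick_prefs(sample, ["download.default_directory", "savefile.default_directory"]))
```

For a real profile, read the file with open(...).read() and pass the text in; the returned dictionary has the same flat, dotted shape that add_experimental_option("prefs", ...) takes.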
For Chrome on macOS, download.default_directory did not work for me; fortunately, savefile.default_directory does.
prefs = {
"printing.print_preview_sticky_settings.appState": json.dumps(settings),
"savefile.default_directory": "/Users/creative/python-apps",
"download.prompt_for_download": False,
"download.directory_upgrade": True,
"download.safebrowsing.enabled": True
}
One of the reasons you can't set "download.default_directory" may be that the variable XDG_DOWNLOAD_DIR is set in the file ~/.config/user-dirs.dirs
You can remove the variable from that file, or set it to whatever you like before running your program.
I was looking for a solution for two days...
My software setup:
Ubuntu bionic, 18.04.5 LTS
chromedriver.86.0.4240.22.lin64
Python 3.9
selenium 3.141.0
splinter 0.14.0
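To check whether XDG_DOWNLOAD_DIR is set as described above, a small parser for the user-dirs.dirs format is enough. This is a sketch only: the function name `xdg_download_dir` is made up, and it handles just the common XDG_DOWNLOAD_DIR="$HOME/..." form, not arbitrary shell syntax.

```python
import re

_ASSIGNMENT = re.compile(r'^\s*XDG_DOWNLOAD_DIR\s*=\s*"?(.*?)"?\s*$')


def xdg_download_dir(user_dirs_text, home):
    """Return the XDG_DOWNLOAD_DIR value from user-dirs.dirs text, or None if unset."""
    for line in user_dirs_text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # ignore commented-out entries
        match = _ASSIGNMENT.match(line)
        if match:
            return match.group(1).replace("$HOME", home)
    return None


# Made-up sample in the format used by ~/.config/user-dirs.dirs
sample = '# comment\nXDG_DESKTOP_DIR="$HOME/Desktop"\nXDG_DOWNLOAD_DIR="$HOME/Downloads"\n'
print(xdg_download_dir(sample, "/home/me"))
```

If this returns a path different from the one passed as download.default_directory, that mismatch is a candidate explanation for downloads landing in the wrong place.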
From your exception, you are using chromedriver=2.24.417424.
What versions of Selenium and the Chrome browser are you using?
I tried the following code with:
Selenium 3.6.0
chromedriver 2.33
Google Chrome 62.0.3202.62 (Official Build) (64-bit)
And it works:
from selenium import webdriver
download_dir = "/pathToDownloadDir"
chrome_options = webdriver.ChromeOptions()
preferences = {"download.default_directory": download_dir,
               "download.directory_upgrade": True,
               "safebrowsing.enabled": True}
chrome_options.add_experimental_option("prefs", preferences)
driver = webdriver.Chrome(chrome_options=chrome_options,executable_path=r'/pathTo/chromedriver')
driver.get("urlFileToDownload");
Make sure you are using a browser that is supported by your chromedriver (from here, should be Chrome v52-54).
Can't you use requests lib?
If so, here's an example:
import os
import re
import requests
urls = [ '...' ]
for url in urls:
    # verify=False skips TLS certificate verification (e.g. for self-signed certificates)
    r = requests.get(url, allow_redirects=True, verify=False)
    # The header may be missing, so fall back to an empty string
    cd = r.headers.get('content-disposition', '')
    fa = re.findall('filename=(.+)', cd)
    if len(fa) == 0:
        print(f'Error message: {url}')
        continue
    filename = fa[0]
    with open(os.path.join('desired_path', filename), 'wb') as f:
        f.write(r.content)
