Define download directory for chromedriver selenium with python - python

Everything is in the title!
Is there a way to define the download directory for selenium-chromedriver used with python?
Despite a lot of research, I haven't found anything conclusive...
As a newbie, I've seen many things about "the desired_capabilities" or "the options" for Chromedriver, but nothing has resolved my problem... (and I still don't know if it will!)
To explain my issue a little more:
I have a lot of URLs to scan (200,000) and, for each URL, a file to download.
I have to create a table with the URL, the information I scraped from it, AND the name of the file I've just downloaded for each webpage.
Given the volume I have to process, I've created threads that open multiple instances of chromedriver to speed up the processing.
The problem is that every downloaded file arrives in the same default directory, and I can no longer link a file to a URL...
So the idea is to create a download directory for every thread to manage them one by one.
If someone has the answer to my question in the title OR a workaround to identify the downloaded file and link it with the current URL, I will be grateful!

For chromedriver1, create a new profile, set download.default_directory inside that profile to the desired location, and tell Chrome to use this profile via chrome.profile. The selenium-chromedriver package should have some methods for creating new profiles (at least it does with Ruby), as they need some special handling.
Chromedriver2 doesn't support setting the profile, but you can set preferences with it. If you want to set the download directory, this is how you do it:
prefs: { download: { default_directory: "/tmp" } }
The Ruby selenium-webdriver doesn't support this feature yet; the Python variant might, however.
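For the Python side, here is a minimal sketch of the same "prefs" idea, with one download directory per thread as the question suggests. It assumes a reasonably recent selenium package in which ChromeOptions.add_experimental_option is available; the helper names (build_download_prefs, start_chrome, worker) are made up for illustration and are not part of selenium.

```python
import tempfile


def build_download_prefs(download_dir):
    """Return Chrome preferences that send downloads to download_dir."""
    return {
        "download.default_directory": download_dir,
        # Assumption: we also want to suppress the "Save as" prompt.
        "download.prompt_for_download": False,
    }


def start_chrome(download_dir):
    """Start a Chrome instance whose downloads land in download_dir."""
    # Imported lazily so the pure helper above works without selenium.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_experimental_option("prefs", build_download_prefs(download_dir))
    return webdriver.Chrome(options=options)


def worker(urls):
    """Hypothetical thread body: one private download directory per thread."""
    download_dir = tempfile.mkdtemp(prefix="downloads_")
    driver = start_chrome(download_dir)
    try:
        for url in urls:
            driver.get(url)
            # ... scrape the page and trigger the download here ...
    finally:
        driver.quit()
    return download_dir
```

Since each thread gets its own directory, any file that shows up there can only belong to the URL that thread is currently processing.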

I recently faced the same issue. I tried a lot of solutions found on the Internet, and none of them helped. So I finally came to this:
Launch Chrome with an empty user-data-dir (in the /tmp folder) to let Chrome initialize it
Quit Chrome
Modify Default/Preferences in the newly created user-data-dir, adding these fields to the root object (just an example):
"download": {
"default_directory": "/tmp/tmpX7EADC.downloads",
"directory_upgrade": true
}
Launch chrome again with the same user-data-dir
Now it works just fine.
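The Preferences edit in the steps above could be scripted roughly like this. It is only a sketch: it assumes Default/Preferences is a plain JSON file (as the snippet above suggests) and that Chrome has already been launched once and closed. The function name set_download_dir_in_preferences is hypothetical.

```python
import json
import os


def set_download_dir_in_preferences(user_data_dir, download_dir):
    """Add the "download" fields to Default/Preferences in user_data_dir."""
    prefs_path = os.path.join(user_data_dir, "Default", "Preferences")
    with open(prefs_path, encoding="utf-8") as f:
        prefs = json.load(f)

    # Keep whatever Chrome already wrote; only add or update the download fields.
    download = prefs.setdefault("download", {})
    download["default_directory"] = download_dir
    download["directory_upgrade"] = True

    with open(prefs_path, "w", encoding="utf-8") as f:
        json.dump(prefs, f)
    return prefs
```

Chrome must not be running while the file is rewritten, which is why the steps quit the browser first.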
Another tip: if you don't know the name of the file that is going to be downloaded, take a snapshot (a list of files) of the downloads directory, then download the file and find its name by comparing the snapshot with the current list of files in the downloads directory.
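The snapshot tip could look something like this in Python. It is a sketch using only the standard library; the names snapshot and wait_for_new_files are hypothetical, and skipping names ending in .crdownload or .tmp is an assumption about how Chrome names unfinished downloads.

```python
import os
import time

# Assumption: suffixes Chrome uses while a download is still in progress.
PARTIAL_SUFFIXES = (".crdownload", ".tmp")


def snapshot(directory):
    """Return the set of file names currently in directory."""
    return set(os.listdir(directory))


def wait_for_new_files(directory, before, timeout=60.0, poll=0.5):
    """Poll directory until files that were not in `before` appear.

    Returns the sorted list of new, finished file names, or an empty
    list if nothing showed up before the timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        new = [
            name
            for name in snapshot(directory) - before
            if not name.endswith(PARTIAL_SUFFIXES)
        ]
        if new:
            return sorted(new)
        if time.monotonic() >= deadline:
            return []
        time.sleep(poll)
```

Typical use would be: before = snapshot(download_dir), trigger the download, then wait_for_new_files(download_dir, before) and store the result next to the current URL.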

Please try the code below:
System.setProperty("webdriver.chrome.driver", "/path/to/chromedriver");
String downloadFilepath = "/path/to/download";
HashMap<String, Object> chromePrefs = new HashMap<String, Object>();
chromePrefs.put("profile.default_content_settings.popups", 0);
chromePrefs.put("download.default_directory", downloadFilepath);
ChromeOptions options = new ChromeOptions();
// "prefs" is the experimental option that carries the Chrome preferences
options.setExperimentalOption("prefs", chromePrefs);
options.addArguments("--test-type");
DesiredCapabilities cap = DesiredCapabilities.chrome();
cap.setCapability(CapabilityType.ACCEPT_SSL_CERTS, true);
cap.setCapability(ChromeOptions.CAPABILITY, options);
WebDriver driver = new ChromeDriver(cap);

Related

Changing The Download Path After Opening Selenium Firefox Browser

I have a collection of files that I download in a loop. I want to put these files into separate folders. Before the browser opens, I can change the default download path with the options parameter. However, I want to sort the files I download into separate folders after the browser is opened.
Can you please help?
`
optns = Options()
optns.set_preference("browser.download.folderList", 2)
optns.set_preference("browser.download.manager.showWhenStarting", False)
optns.set_preference("browser.download.dir", "/Users/emrevolkanucar/Work/Ekap Py/Files")
optns.set_preference("browser.helperApps.neverAsk.saveToDisk", "application/x-gzip")
browser = webdriver.Firefox(options=optns)
???
optns.set_preference("browser.download.dir", "/Users/emrevolkanucar/Work/Ekap Py/Files")
`
You cannot do that.
Once the driver instance is created by browser = webdriver.Firefox(options=optns), you can no longer change its settings.
Your only options here are:
Re-define optns.set_preference("browser.download.dir", "/Users/emrevolkanucar/Work/Ekap Py/Files") and then create a new driver instance. I'm not sure you really want to do that.
After downloading the files to the defined (or default) download folder, programmatically move those files to separate target folders. This approach seems better.
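The second option (moving files after they are downloaded) might be sketched like this with the standard library. The function name move_new_files and the already_seen parameter are hypothetical, and skipping names ending in .part is an assumption about how Firefox names unfinished downloads.

```python
import os
import shutil


def move_new_files(download_dir, target_dir, already_seen=()):
    """Move finished files from download_dir into target_dir.

    Names listed in already_seen are left where they are.
    Returns the sorted list of file names that were moved.
    """
    os.makedirs(target_dir, exist_ok=True)
    moved = []
    for name in sorted(os.listdir(download_dir)):
        source = os.path.join(download_dir, name)
        if name in already_seen or not os.path.isfile(source):
            continue
        if name.endswith(".part"):
            # Assumption: a .part file is a download still in progress.
            continue
        shutil.move(source, os.path.join(target_dir, name))
        moved.append(name)
    return moved
```

Inside the download loop you would call it once per batch, for example move_new_files(download_dir, os.path.join(base_dir, batch_name)), after making sure the downloads have finished.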

Scrape website data without BS or selenium (Python)

Basically, my scenario is that I have the webpage open and want to copy some of the text from the website that is open on my screen (there is a whole login process every time). For security reasons, I do not want to have to log in to the webpage continuously, and for that reason requests is not suitable. I also do not want to use selenium, as it will open a new browser when I wish to use my existing one. My question is: with my browser already open on the page I want info from, is there some sort of script I can make that will retrieve certain information on the page for me and save it somewhere (almost like a macro, but able to retrieve certain elements)? Is this a possibility?
I'm not sure if I understood the question correctly.
One way might be to download the entire .html and process the respective data "locally" after downloading the .html.
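As a rough illustration of the "save the .html and process it locally" idea, here is a sketch that uses only Python's standard html.parser (so no BS and no selenium). The class and function names are hypothetical, and matching elements by CSS class is just one assumed way of picking out the data.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ClassTextExtractor(HTMLParser):
    """Collect the text of every element that carries a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.depth = 0        # > 0 while inside a matching element
        self.results = []
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth > 0:
            self.depth += 1
        elif self.css_class in classes:
            self.depth = 1
            self._chunks = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self.depth == 0:
            return
        self.depth -= 1
        if self.depth == 0:
            self.results.append("".join(self._chunks).strip())

    def handle_data(self, data):
        if self.depth > 0:
            self._chunks.append(data)


def extract_text(html, css_class):
    """Return the text of all elements in html that have css_class."""
    parser = ClassTextExtractor(css_class)
    parser.feed(html)
    parser.close()
    return parser.results
```

With a page saved from the browser (Ctrl+S) you would read the file and call, for example, extract_text(open("page.html", encoding="utf-8").read(), "user-details"), where page.html and the class name are placeholders.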
If you use "requests", as with Postman, you don't need to log in each time. If you have a valid JWT token, you can skip the login. But that depends on how the site works (your question lacks details).
I don't know about selenium, but with puppeteer (a competitor), you can re-use an already installed browser instead of downloading a new one.
Also... do you even need selenium or puppeteer? Can't you just run some code in your browser console? You can create and save snippets in the Sources tab in Chrome. If you need direct access to your file system (meaning that having the collected data downloaded automatically to the download folder, or getting a download pop-up to choose the folder, is not enough for you), you may take a look at the TamperMonkey extension. Or maybe you need to make a Chrome extension.
Update after reading your comment @JeanVanNiekerk:
// to get user name of the one asking.
console.log(
document.querySelector('#question .user-details a').innerText
); // 'Jean Van Niekerk'
navigator.clipboard.writeText('stuff').then(
e => {
console.log('Copied text ready !');
}
);
// If you run the code above in the console, you
// will get `Uncaught (in promise) DOMException: Document is not focused.`
// This is a security measure (maybe it can be disabled for your special case; another
// option is to make an extension that has this kind of rights).
// To try it out right now, paste the code below into your console, and swiftly click on the page (anywhere)
setTimeout(() => {
navigator.clipboard.writeText('stuff').then(
e => {
console.log('Copied text ready !');
}
);
}, 1000);
// Ctrl+V to paste your text :)

Problem with browser.helperApps.neverAsk.saveToDisk in selenium

I want to automatically download a .ics file (a calendar file) from a website using selenium with python. The goal is to disable the pop-up window that Firefox opens when you download a file. To do this, I use the following code:
#I set my preferences
profile = webdriver.FirefoxProfile()
profile.set_preference("browser.download.folderList", 2)#not using default folder for downloading
profile.set_preference("browser.download.manager.showWhenStarting", False)#dont show downloading process
profile.set_preference("browser.download.dir", 'C:/Users/UserName/Documents/rpi/some folder')#set the directory for download
profile.set_preference('browser.helperApps.neverAsk.saveToDisk', 'text/calendar')#tell it to automaticaly download a file
#using the profile to access firefox
browser = webdriver.Firefox(executable_path='geckodriver',firefox_profile=profile)
I first thought that it was a MIME type problem in the line browser.helperApps.neverAsk.saveToDisk, but after changing the MIME type it's still not working.
From here I have no idea of what is going wrong because my MIME type seems right according to all internet resources I have been able to find.
Perhaps it is a problem of settings or something I haven't noticed...
Anyway thank you for reading this, ask me if you need a bit more code.
Hello, from what I understand, downloading the file prompts an alert box asking if you want to download it. Have you tried:
browser.switchtoalert.accept?
Sorry, I write in VBA, but you should get the idea.
Thanks
So I did find a solution by avoiding the problem: I created a new Firefox profile, launched Firefox with this profile (all of this is done in about:profiles), went to the website from which I wanted to download the file, downloaded the file and checked the "always do this with this type of file" box, then launched my program with this custom profile using the line:
profile= webdriver.FirefoxProfile("C:/Users/user/AppData/Roaming/Mozilla/Firefox/Profiles/bqpa3bzv.nameofprofile")
This seems to work fine and might be an efficient way of doing this.
Well done for figuring it out, I know the
.switchtoalert.accept works in chromedriver
Let me know if you want any help with anything 👍👍

How to change geolocation of chrome selenium driver in Python?

I am trying to trick chromedriver into believing that it is running in a different city. Under normal circumstances, this can easily be done manually, as shown in a quick diagram.
Then, when a Google search is done, the new coordinates are used, and the results that would normally originate from that location are displayed. You can confirm that this worked by looking at the bottom of a Google search page.
However, Selenium can only control what the browser displays, not the browser itself. I cannot tell Selenium to automatically click the buttons needed to change the coordinates. I tried the solutions posted here, but they are not meant for Python, and even after I tried to adapt the script, nothing seemed to happen.
Is there a browser.execute_script() argument that could work, or is this the wrong way to change the geolocation?
You can do this by importing the Selenium DevTools package. Please refer to the complete Java code sample below:
import java.util.HashMap;
import java.util.Map;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.devtools.DevTools;
public void geoLocationTest(){
ChromeDriver driver = new ChromeDriver();
// Parameters for the CDP command Emulation.setGeolocationOverride
Map<String, Object> coordinates = new HashMap<String, Object>()
{{
put("latitude", 50.2334);
put("longitude", 0.2334);
put("accuracy", 1);
}};
driver.executeCdpCommand("Emulation.setGeolocationOverride", coordinates);
driver.get("<your site url>");
}
Reference : Selenium Documentation
Try this code below :
driver.execute_script("window.navigator.geolocation.getCurrentPosition=function(success){"+
"var position = {\"coords\" : {\"latitude\": \"555\",\"longitude\": \"999\"}};"+
"success(position);}");
print(driver.execute_script("var positionStr=\"\";"+
"window.navigator.geolocation.getCurrentPosition(function(pos){positionStr=pos.coords.latitude+\":\"+pos.coords.longitude});"+
"return positionStr;"))
While searching for a solution to the same problem, I also came across the scripts already posted. While I have yet to find a solution, I assume that the scripts don't work because they do not change the sensors permanently. The sensors are only changed for that one specific call of window.navigator.geolocation.getCurrentPosition.
The website (in this case Google) will later call the same function but with the regular (unchanged) geolocation. Happy to hear solutions that permanently change the sensors so that future geolocation requests are also affected.
This can be done using Selenium 4.
HashMap<String ,Object> coordinate = new HashMap<String ,Object>();
coordinate.put("latitude", 39.913818);
coordinate.put("longitude", 116.363625);
coordinate.put("accuracy", 1);
((ChromeDriver)driver).executeCdpCommand("Emulation.setGeolocationOverride",coordinate);
driver.navigate().to("URL");
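Since the question is about Python, here is a sketch of what the same call might look like there. It assumes a Selenium version whose Chrome driver exposes execute_cdp_cmd; the helper names geolocation_params and override_geolocation are made up for illustration.

```python
def geolocation_params(latitude, longitude, accuracy=1):
    """Build the parameters for Emulation.setGeolocationOverride."""
    if not -90 <= latitude <= 90:
        raise ValueError("latitude must be between -90 and 90")
    if not -180 <= longitude <= 180:
        raise ValueError("longitude must be between -180 and 180")
    return {"latitude": latitude, "longitude": longitude, "accuracy": accuracy}


def override_geolocation(driver, latitude, longitude, accuracy=1):
    """Send the override to Chrome through the DevTools protocol.

    `driver` is expected to be a selenium Chrome driver (or anything
    else that offers an execute_cdp_cmd(name, params) method).
    """
    params = geolocation_params(latitude, longitude, accuracy)
    driver.execute_cdp_cmd("Emulation.setGeolocationOverride", params)
    return params
```

Usage would be along the lines of driver = webdriver.Chrome(), then override_geolocation(driver, 39.913818, 116.363625), then driver.get(...) with your URL.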

Disable firefox save as dialog-selenium

I am web scraping with selenium, and whenever I try to download a file, the Firefox download/save as dialog pops up. Even if I apply profile.set_preference('browser.helperApps.neverAsk.saveToDisk', "application/csv"), it still doesn't work. I have tried every .csv-related MIME type, but it doesn't work. Is it possible either to click the "save as" radio button and then click OK on the dialog, or to disable it entirely?
You should do two things. First, set these three preferences as follows (this is in Java, but I guess you can manage to translate it to Python :-):
profile.setPreference("browser.download.dir", "c:/yourDownloadDir");
profile.setPreference("browser.download.folderList", 2);
profile.setPreference("browser.helperApps.neverAsk.saveToDisk", "application/csv, text/csv");
Secondly, you should make sure the downloaded file has the desired MIME type. To do that, you can use the web developer tools and inspect the download.
EDIT:
To find out the MIME type open Chrome, press Ctrl+Shift+I (Cmd+Alt+I on Mac OS) change to the 'Network' tab and click your download link. You should see something like this:
Just an additional answer that might help someone, as comments to the accepted answer put me on the right track (thanks!). Another MIME type of CSV you might be dealing with is application/x-csv - that was my case and once I looked it up in the Network tab of the browser, I became a happier man :)
In C#
FirefoxOptions options = new FirefoxOptions();
options.SetPreference("browser.download.folderList", 2);
options.SetPreference("browser.download.manager.showWhenStarting", false);
options.SetPreference("browser.download.dir", "c:\\temp");
options.SetPreference("browser.helperApps.neverAsk.saveToDisk", "text/csv");
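A possible Python translation of the preferences above, as a sketch. It assumes the Options class from selenium.webdriver.firefox.options with its set_preference method; the helper names (firefox_download_prefs, apply_prefs, start_firefox) are hypothetical.

```python
def firefox_download_prefs(download_dir, mime_types):
    """Return the Firefox preferences used in the answers above."""
    return {
        "browser.download.folderList": 2,  # 2 = use browser.download.dir
        "browser.download.manager.showWhenStarting": False,
        "browser.download.dir": download_dir,
        "browser.helperApps.neverAsk.saveToDisk": ",".join(mime_types),
    }


def apply_prefs(options, prefs):
    """Call options.set_preference(name, value) for every entry."""
    for name, value in prefs.items():
        options.set_preference(name, value)
    return options


def start_firefox(download_dir, mime_types):
    """Start Firefox with the download preferences applied."""
    # Imported lazily so the helpers above work without selenium installed.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    prefs = firefox_download_prefs(download_dir, mime_types)
    return webdriver.Firefox(options=apply_prefs(Options(), prefs))
```

For the CSV case you might call start_firefox("c:/yourDownloadDir", ["text/csv", "application/csv", "application/x-csv"]), after checking the real MIME type in the Network tab as described above.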
