Automating PDF download with Python Selenium Chromedriver - python

I am using Chrome/Chromedriver version 93.0.4577 and Selenium version 3.141.0 on Windows 10. I'm trying to get Selenium to download PDFs without opening the file prompt.
I'm working with a work site unfortunately so I'm unable to share it but I've tried the libraries: urllib, request, as well as adjusting the chrome Options (which I'd like to use):
chrome_profile = webdriver.ChromeOptions()
profile = {"plugins.plugins_list": [{"enabled": False,
                                     "name": "Chrome PDF Viewer"}],
           "download.default_directory": "C:\\PDFDownload\\PDFs",
           "download.prompt_for_download": False,
           "download.directory_upgrade": True}
chrome_profile.add_experimental_option("prefs", profile)
driver = webdriver.Chrome(executable_path="C:\\Selenium\\chromedriver.exe",
                          chrome_options=chrome_profile)
I've also tried switching Chrome's PDF site setting from 'Open PDFs in Chrome' to 'Download PDFs'. However, every PDF I visit still prompts me to save the file manually.
I feel like the Chrome options should be the winning ticket, but they don't seem to have worked for me. Is there something I need to change or add to get past the pop-up prompt?
I appreciate any advice, thank you!

You can try the following:
1. Get the 'href' (the download link) of the PDF you're looking for. There should be a button or link on the page that you can locate.
2. Call driver.get(the_link_you_extracted_in_step_1); the PDF should then be downloaded automatically.
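For the options side of the question, here is a minimal sketch. It assumes that the `plugins.plugins_list` entry is no longer honored by recent Chrome versions and uses `plugins.always_open_pdf_externally` instead, a Chrome preference that makes Chrome download PDFs rather than render them in its built-in viewer. The helper names `build_pdf_prefs` and `make_driver` are hypothetical, the paths are the ones from the question, and the `Service` import assumes Selenium 4 (on Selenium 3.141, pass `executable_path=` as in the question).

```python
def build_pdf_prefs(download_dir):
    """Return Chrome prefs that save PDFs to download_dir without prompting."""
    return {
        "download.default_directory": download_dir,
        "download.prompt_for_download": False,
        "download.directory_upgrade": True,
        # Download PDFs instead of rendering them in Chrome's PDF viewer.
        "plugins.always_open_pdf_externally": True,
    }


def make_driver(download_dir, driver_path):
    """Start Chrome with the prefs above (needs selenium and chromedriver)."""
    # Imported lazily so build_pdf_prefs can be used without selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service  # Selenium 4 assumption

    options = webdriver.ChromeOptions()
    options.add_experimental_option("prefs", build_pdf_prefs(download_dir))
    return webdriver.Chrome(service=Service(driver_path), options=options)


# Usage (not run here; pdf_link stands for the href extracted from the page):
# driver = make_driver("C:\\PDFDownload\\PDFs", "C:\\Selenium\\chromedriver.exe")
# driver.get(pdf_link)
```

With these prefs in place, navigating to the PDF link should save the file to the download directory instead of opening the viewer or a prompt.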

Related

Chrome Options Doesn't Apply Upon Loading The Page In Selenium

I'm trying to scrape a French Amazon page using Selenium. I want this page to be translated from French to English upon loading. I have attempted to do that using the following code:
myoptions = webdriver.ChromeOptions()
prefs = {
    "translate_whitelists": {"fr":"en"},
    "translate": {"enabled":"true"}
}
myoptions.add_experimental_option("prefs", prefs)
path = r'C:\chromedriver.exe'
browser = webdriver.Chrome(executable_path=path, options=myoptions)
browser.get("https://www.amazon.fr/dp/0001002791")
However, when the page loads, it is still shown in French.
If I then navigate to any other link from this page, the translate option works: the icon shows in the search bar and the translation message pops up.
From then on, all of the pages get translated, even the initial one.
Why didn't it work earlier when the page loaded? How do I fix this?
As described in this post, setting the language option should fix this:
myoptions.add_argument("--lang=en")
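A minimal sketch of where that argument fits, assuming the rest of the setup stays as in the question. The helper names `build_translate_config` and `make_browser` are hypothetical, the prefs are copied from the question rather than verified here, and the `executable_path=` call style matches the question's Selenium version.

```python
def build_translate_config():
    """Return (arguments, prefs) for a Chrome session that starts in English."""
    arguments = ["--lang=en"]  # the language option suggested in the answer
    prefs = {
        # Copied unchanged from the question.
        "translate_whitelists": {"fr": "en"},
        "translate": {"enabled": "true"},
    }
    return arguments, prefs


def make_browser(driver_path):
    """Start Chrome with the configuration above (needs selenium)."""
    from selenium import webdriver  # lazy import; helper above works without it

    arguments, prefs = build_translate_config()
    options = webdriver.ChromeOptions()
    for argument in arguments:
        options.add_argument(argument)
    options.add_experimental_option("prefs", prefs)
    # Same call style as the question; newer Selenium releases use a Service object.
    return webdriver.Chrome(executable_path=driver_path, options=options)
```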

Selenium Python Edge disable `open office files in browser` setting

Python: 3.9.9
Selenium: 4.1.5
Edge: 101.0.1210.39 (X64) driver link
I am trying to automate downloading an Excel file from a website, but because Edge's 'Open Office files in the browser' setting is enabled by default, pressing the download button with Selenium opens the file in Edge's file viewer instead of downloading it.
Since I want to automate the process, I don't want to go to the settings and disable it manually every time.
Any workarounds would be appreciated too.
Thank you!
This is what worked for me:
from pathlib import Path
from selenium import webdriver

# Raw strings keep the backslash in the Windows path literal.
if Path(r'..\msedgedriver.exe').exists():
    driver = webdriver.Edge(r'..\msedgedriver.exe')
    # Settings: open the downloads page and click the toggle
    driver.get('edge://settings/downloads')
    toggle = driver.execute_script('''
        return document.querySelector('input[aria-label="Open Office files in the browser"]');
    ''')
    toggle.click()
    # continue...
Similarly, you can change any other setting as required.
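One caveat with a plain click is that it flips the toggle whatever its current state. A hedged sketch of a guarded version follows. The helper name `disable_setting` is hypothetical, and it assumes the toggle is a checkbox-style input so that `is_selected()` reports whether it is on; that has not been checked against the Edge settings page.

```python
def disable_setting(driver, aria_label):
    """Click the settings toggle with this aria-label only if it is switched on.

    Returns True if a click was made, False if the toggle was already off.
    """
    selector = 'input[aria-label="{}"]'.format(aria_label)
    # "css selector" is the locator string behind selenium's By.CSS_SELECTOR.
    toggle = driver.find_element("css selector", selector)
    # Assumption: checkbox-style input, so is_selected() means "currently on".
    if toggle.is_selected():
        toggle.click()
        return True
    return False


# Usage (not run here), after driver.get('edge://settings/downloads'):
# disable_setting(driver, "Open Office files in the browser")
```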

Python Selenium Beautifulsoup pdf

I'm trying to make a program that downloads PDFs after I've done a search. The site is an ASPX page, and with Selenium I can enter the information in the fields correctly:
input_user=driver.find_element(By.XPATH, '//*[@name="ctl00$SPWebPartManager1$g_93bc4c3a_0f69_4097_bed1_978c8b545335$freetext"]')
textolibre="jamaica"
input_user.send_keys(textolibre)
but the page that returns the results is the same one (ASPX) and I can't download the PDFs.
I would like to be able to fill in the form fields without Selenium opening a browser. I tried PhantomJS, but it says:
UserWarning: Selenium support for PhantomJS has been deprecated, please use headless versions of Chrome or Firefox instead
And once the results are returned, I want to be able to download them, bearing in mind that the URL doesn't change.
If you keep using PhantomJS, adding these two lines to your code suppresses the warning:
import warnings
warnings.filterwarnings("ignore")
Alternatively, if you are using geckodriver with the Firefox browser, you can run it headless:
opts = webdriver.FirefoxOptions()
opts.headless = True
driver = webdriver.Firefox(options=opts, executable_path='Your geckodriver exe path')
If you could share more complete code, I could help you more efficiently.
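The warning quoted in the question also mentions headless Chrome. A minimal sketch of that variant follows. The helper names and the window size are hypothetical choices, `--headless` is Chrome's command-line switch for running without a window, and the sketch assumes a chromedriver can be found on PATH.

```python
def headless_chrome_arguments(width=1920, height=1080):
    """Return command-line arguments for a Chrome session without a window."""
    return [
        "--headless",  # run Chrome without opening a browser window
        "--window-size={},{}".format(width, height),  # fixed viewport size
    ]


def make_headless_chrome():
    """Start headless Chrome (needs selenium and a chromedriver on PATH)."""
    from selenium import webdriver  # lazy import so the helper above is standalone

    options = webdriver.ChromeOptions()
    for argument in headless_chrome_arguments():
        options.add_argument(argument)
    return webdriver.Chrome(options=options)
```

The form-filling code from the question should then run unchanged against the returned driver, without a visible browser.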

Selenium Firefox profile preferences download dialogue box

I am trying to automatically log into several websites and download reports.
In my profile preferences, I have set the following:
profile = webdriver.FirefoxProfile()
profile.set_preference("browser.download.folderList", 2)
profile.set_preference("browser.download.manager.showWhenStarting", False)
profile.set_preference("browser.download.dir", folder1)
profile.set_preference("browser.download.panel.shown", False)
profile.set_preference("browser.helperApps.neverAsk.openFile","text/plain,text/x-csv,text/csv,application/vnd.ms-excel,application/csv,application/x-csv,text/csv,text/comma-separated-values,text/x-comma-separated-values,text/tab-separated-values,application/pdf,text/html")
profile.set_preference("browser.helperApps.neverAsk.saveToDisk", "text/plain,text/x-csv,text/csv,application/vnd.ms-excel,application/csv,application/x-csv,text/csv,text/comma-separated-values,text/x-comma-separated-values,text/tab-separated-values,application/pdf,text/html")
For all the websites, except for 1, the files download without the dialog box showing.
However, one of them always shows the dialog box.
I am thinking it is due to the filename being called "Download.CSV", with the csv file extension in capitals...but I'm not convinced.
All the other files that successfully downloaded from the other websites were in csv file with a lower case csv file extension. This is the only difference I can think of.
Am I missing something?
For anyone who comes across this issue: I solved it.
Although the file extension was "CSV", the MIME type from the server was listed as "application/octet-stream".
Adding this MIME type to my Firefox profile preferences fixed the issue.
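As a rough sketch of that fix, assuming the MIME type goes into the `browser.helperApps.neverAsk.saveToDisk` list from the question: the helper names `with_mime_type` and `apply_never_ask` are hypothetical, and the example list is a shortened version of the one in the question.

```python
def with_mime_type(mime_list, mime_type):
    """Return the comma-separated MIME list with mime_type appended once."""
    types = [t.strip() for t in mime_list.split(",") if t.strip()]
    if mime_type not in types:
        types.append(mime_type)
    return ",".join(types)


def apply_never_ask(profile, mime_list):
    """Set the save-without-asking preference on a selenium FirefoxProfile."""
    profile.set_preference("browser.helperApps.neverAsk.saveToDisk", mime_list)


# Example: extend a shortened version of the list from the question.
mime_list = with_mime_type(
    "text/csv,application/vnd.ms-excel,application/pdf",
    "application/octet-stream",
)
# apply_never_ask(profile, mime_list)  # profile = webdriver.FirefoxProfile()
```

The MIME type a server actually sends appears in the `Content-Type` response header, which is worth checking before guessing from the file extension.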

Selenium in Python to download file: even after setting Firefox Profile the Download Window opens

I am trying to use Selenium in Python to download a file from a website. I have read that, to do so, I need to change the settings in my Firefox profile so that the download dialog window does not open. Sample code is below. It works perfectly at home, but not on my work PC. I suspect that Python somehow cannot change the settings of the Firefox profile: the code runs without throwing an error, but in the end the download dialog window still opens.
from selenium import webdriver
import os
profile = webdriver.FirefoxProfile("C:\\Users\\Ric\\Documents\\Python Scripts\\FirefoxProfileCopies\\ric.copy")
profile.set_preference('browser.download.folderList', 2)
profile.set_preference('browser.download.manager.showWhenStarting', False)
profile.set_preference('browser.download.dir', os.getcwd())
profile.set_preference('browser.helperApps.neverAsk.saveToDisk',('application/vnd.ms-excel'))
browser = webdriver.Firefox(profile)
browser.get("http://www.sample-videos.com/download-sample-xls.php")
elem1 = browser.find_element_by_css_selector(".push-form > table:nth-child(2) > tbody:nth-child(2) > tr:nth-child(4) > td:nth-child(4) > a:nth-child(1)")
elem1.click()
This code works perfectly with my Firefox and its profile at home, but not with my computer at work. Does anybody know why this might be? Thank you in advance.
EDIT
I tried adding all the MIME types from the Microsoft webpage, but the download manager window still opens. When I pause the code before it opens the download link and inspect the settings of the Firefox profile in use with about:config, the following values are displayed:
So, after a lot of trying, I decided to look at the settings in Firefox itself again, since it worked with an empty profile. I resolved my issue and finally got the download window to disappear by going to Firefox's settings and changing the settings for applications:
In that menu, search for Excel and change the values from "asking every time" to "save file/download file". Sorry if these entries differ from the actual ones in Firefox; my Firefox is in German. After doing this, my issue was resolved. I hope this helps somebody else :) and thanks to anderson.
