Selenium pdf automatic download not working - python

I am new to selenium and I am writing a scraper to download pdf files automatically from a given site.
Below is my code:
from selenium import webdriver
fp = webdriver.FirefoxProfile()
fp.set_preference("browser.download.folderList",2);
fp.set_preference("browser.download.manager.showWhenStarting",False)
fp.set_preference("browser.download.dir", "/home/jill/Downloads/Dinamalar")
fp.set_preference("browser.helperApps.neverAsk.saveToDisk", "application/pdf")
browser = webdriver.Firefox(firefox_profile=fp)
browser.get("http://epaper.dinamalar.com/PUBLICATIONS/DM/MADHURAI/2015/05/26/PagePrint//26_05_2015_001_b2b69fda315301809dda359a6d3d9689.pdf");
webobj = browser.find_element_by_id("download").click();
I followed the steps in the Selenium documentation and in this link, but I am not sure why the download dialog box is still shown every time.
Is there any way to fix this? Otherwise, as a workaround, is there a way to specify something like "application/all" so that all file types are downloaded?

Disable the built-in pdfjs viewer and navigate to the URL; the PDF file will then be downloaded automatically. The code:
from selenium import webdriver
fp = webdriver.FirefoxProfile()
fp.set_preference("browser.download.folderList", 2)
fp.set_preference("browser.download.manager.showWhenStarting",False)
fp.set_preference("browser.download.dir", "/home/jill/Downloads/Dinamalar")
fp.set_preference("browser.helperApps.neverAsk.saveToDisk", "application/pdf,application/x-pdf")
fp.set_preference("pdfjs.disabled", True) # < KEY PART HERE (a boolean, not the string "true")
browser = webdriver.Firefox(firefox_profile=fp)
browser.get("http://epaper.dinamalar.com/PUBLICATIONS/DM/MADHURAI/2015/05/26/PagePrint//26_05_2015_001_b2b69fda315301809dda359a6d3d9689.pdf")
Update (the complete code that worked for me):
from selenium import webdriver
mime_types = "application/pdf,application/vnd.adobe.xfdf,application/vnd.fdf,application/vnd.adobe.xdp+xml"
fp = webdriver.FirefoxProfile()
fp.set_preference("browser.download.folderList", 2)
fp.set_preference("browser.download.manager.showWhenStarting", False)
fp.set_preference("browser.download.dir", "/home/aafanasiev/Downloads")
fp.set_preference("browser.helperApps.neverAsk.saveToDisk", mime_types)
fp.set_preference("plugin.disable_full_page_plugin_for_types", mime_types)
fp.set_preference("pdfjs.disabled", True)
browser = webdriver.Firefox(firefox_profile=fp)
browser.get("http://epaper.dinamalar.com/")
webobj_get_link = browser.find_element_by_id("liSavePdf")
webobj_get_object = webobj_get_link.find_element_by_tag_name("a")
webobj_get_object.click()

I tested the following code and successfully downloaded your PDF on Windows 7:
from selenium import webdriver
# download_location must already hold the path of the target folder
fp = webdriver.FirefoxProfile()
fp.set_preference("browser.download.folderList", 2)
fp.set_preference("browser.download.manager.showWhenStarting", False)
fp.set_preference("browser.download.dir", download_location)
fp.set_preference("plugin.disable_full_page_plugin_for_types", "application/pdf")
fp.set_preference("pdfjs.disabled", True)
fp.set_preference("browser.helperApps.neverAsk.saveToDisk", "application/pdf")
driver = webdriver.Firefox(fp)
driver.implicitly_wait(10)
driver.maximize_window()
driver.get("http://epaper.dinamalar.com/")
element = driver.find_element_by_css_selector("li#liSavePdf>a>img")
element.click()
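The scripts above click the download link and then return immediately, so a script that quits the browser right away may cut the download short. A minimal sketch of one way to wait, assuming the download folder starts out empty and that the browser marks unfinished files with a temporary suffix (`.part` for Firefox, `.crdownload` for Chrome). The name `wait_for_download` and its parameters are made up for this illustration and are not part of Selenium:

```python
import time
from pathlib import Path

# Suffixes browsers append to files that are still being written
# (.part for Firefox, .crdownload for Chrome).
TEMP_SUFFIXES = {".part", ".crdownload"}


def wait_for_download(directory, timeout=30.0, poll_interval=0.5):
    """Return the finished files in `directory`, or raise TimeoutError.

    Hypothetical helper: it assumes `directory` was empty before the
    download started, so any file found there belongs to the download.
    """
    deadline = time.monotonic() + timeout
    folder = Path(directory)
    while True:
        files = [p for p in folder.iterdir() if p.is_file()]
        in_progress = [p for p in files if p.suffix in TEMP_SUFFIXES]
        finished = [p for p in files if p.suffix not in TEMP_SUFFIXES]
        # Done only when something has arrived and nothing is still partial.
        if finished and not in_progress:
            return sorted(finished)
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"no finished download in {folder} after {timeout}s"
            )
        time.sleep(poll_interval)
```

It would be called right after `element.click()`, with the same folder that was passed to `browser.download.dir`.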

Since no HTML is available, my guess is that this line
webobj = browser.find_element_by_id("download").click();
actually triggers the onclick event, but the download that follows is not handled properly. In other words, what you're missing is the location where the .pdf file will be stored. I have very little experience with Python, but one solution could be to use an HTTP client library that downloads the file directly, something like C#'s WebClient.DownloadFile(String, String) method. Used properly, it lets you skip Selenium entirely for this step.
Maybe something like this post will be a good start.
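The idea of fetching the file with a plain HTTP client could look roughly like this in Python, using only the standard library. This is a sketch, not a drop-in fix: `download_file` is a hypothetical helper name, and it assumes the PDF URL is reachable without the cookies or session state that the browser holds.

```python
import shutil
import urllib.request
from pathlib import Path


def download_file(url, destination):
    """Stream the resource at `url` into the local file `destination`.

    Hypothetical helper: assumes the URL can be fetched without any
    login, cookies, or JavaScript that only the browser would provide.
    """
    destination = Path(destination)
    # Create the target folder if it does not exist yet.
    destination.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response, open(destination, "wb") as out:
        # Copy in chunks so large files are not held in memory at once.
        shutil.copyfileobj(response, out)
    return destination
```

With the direct PDF link from the question, the call would be `download_file(pdf_url, "/home/jill/Downloads/Dinamalar/page.pdf")`, where `pdf_url` and the file name `page.pdf` are placeholders.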

Related

Downloading images at a particular location in selenium python

I am trying to download images using Selenium, but I don't know how to direct those files to a desired location. Can anyone tell me how to do this?
Use this code to set the desired download location in the Selenium Python bindings:
from selenium import webdriver
executable_path = r"C:\Selenium+Python\chromedriver.exe"
options = webdriver.ChromeOptions()
prefs = {'download.default_directory' : '/path/to/dir'}
options.add_experimental_option('prefs', prefs)
driver = webdriver.Chrome(executable_path, options=options)
You need to change /path/to/dir to your desired location.
In case you are using a Chrome webdriver you can use these settings:
from selenium import webdriver
options = webdriver.ChromeOptions()
# download.default_directory is a Chrome preference, not a command-line switch
options.add_experimental_option("prefs", {"download.default_directory": "C:/Downloads"})
driver = webdriver.Chrome(options=options)
Here I set it to "C:/Downloads" but you can change it to any other destination.
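The Chrome preferences can also be collected in one small function so the same settings are reused across scripts. The sketch below only builds the dictionary; `chrome_download_prefs` is a made-up name, and the two extra preferences (`download.prompt_for_download` and `plugins.always_open_pdf_externally`) are optional additions that the answers above do not use.

```python
from pathlib import Path


def chrome_download_prefs(directory):
    """Build a Chrome 'prefs' dict that saves downloads to `directory`.

    Hypothetical helper; the result is meant to be passed to
    options.add_experimental_option("prefs", ...).
    """
    return {
        # Use an absolute path for the download folder.
        "download.default_directory": str(Path(directory).resolve()),
        # Do not show the "Save as" prompt for each file.
        "download.prompt_for_download": False,
        # Save PDFs to disk instead of opening them in the built-in viewer.
        "plugins.always_open_pdf_externally": True,
    }
```

Usage would then be `options.add_experimental_option("prefs", chrome_download_prefs("/path/to/dir"))` before creating the driver.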
For Firefox you can use this:
from selenium import webdriver
profile = webdriver.FirefoxProfile()
profile.set_preference("browser.download.folderList", 2)
profile.set_preference("browser.download.manager.showWhenStarting", False)
profile.set_preference("browser.download.dir", 'PATH TO DESKTOP')
profile.set_preference("browser.helperApps.neverAsk.saveToDisk", "application/x-gzip")
driver = webdriver.Firefox(firefox_profile=profile)
Here 'PATH TO DESKTOP' is the path on your disk where you want the files to be downloaded.
As per this post, to download using Firefox:
from selenium import webdriver
profile = webdriver.FirefoxProfile()
path = 'C:\\downloads'
profile.set_preference('browser.download.folderList', 2) # custom location
profile.set_preference('browser.download.manager.showWhenStarting', False)
profile.set_preference('browser.download.dir', path)
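# set_preference takes a single value, so list the MIME types in one comma-separated string
profile.set_preference('browser.helperApps.neverAsk.saveToDisk', 'image/png,image/jpeg')
driver = webdriver.Firefox(firefox_profile=profile)
Then select the download button and click it.
If you only have the link to the image, I'd recommend using something like mechanize or urllib to download the contents:
import urllib.request
img = driver.find_element_by_xpath(xpath)
src = img.get_attribute('src')
# download the image (in Python 3, urlopen lives in urllib.request)
with urllib.request.urlopen(src) as response, open(filename, 'wb') as f:
    f.write(response.read())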

Click on a file to download Selenium Python

I use Selenium (python) and Firefox portable browser.
My goal is to download a lot of files, specifically through Selenium.
When I click on the link, the file should start downloading, but a dialog window opens instead.
Are there any Selenium settings that prevent this window from opening?
Try setting the browser.helperApps.neverAsk.saveToDisk preference:
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
def example():
    opt = Options()
    opt.headless = False # Or True
    fp = webdriver.FirefoxProfile()
    fp.set_preference("browser.download.folderList", 2)
    fp.set_preference("browser.download.manager.showWhenStarting", False)
    fp.set_preference("browser.download.manager.showAlertOnComplete", False)
    fp.set_preference("browser.helperApps.neverAsk.saveToDisk",
                      "application/vnd.ms-powerpoint")
    fp.set_preference("browser.download.dir", "C:\\folder_name\\Downloads")
    firefox_browser = webdriver.Firefox(firefox_profile=fp, options=opt)
To find the MIME type for your file type, see https://www.freeformatter.com/mime-types-list.html
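Instead of looking the type up by hand, Python's standard `mimetypes` module can guess it from a file name. The sketch below builds the comma-separated string that `browser.helperApps.neverAsk.saveToDisk` expects; `never_ask_value` is a hypothetical helper name. One caveat: Firefox matches against the Content-Type the server actually sends, which may differ from the guess (for example `application/octet-stream`), so treat the result as a starting point.

```python
import mimetypes


def never_ask_value(filenames):
    """Return a comma-separated MIME list guessed from example file names.

    Hypothetical helper: the guess comes from the file extension only
    and may not match the Content-Type header a given server sends.
    """
    types = []
    for name in filenames:
        mime, _encoding = mimetypes.guess_type(name)
        if mime is None:
            raise ValueError(f"unknown MIME type for {name!r}")
        # Keep first-seen order and skip duplicates.
        if mime not in types:
            types.append(mime)
    return ",".join(types)
```

The returned string can be passed directly as the value in `fp.set_preference("browser.helperApps.neverAsk.saveToDisk", ...)`.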

What is wrong with this selenium firefox profile to download file into customized folder?

I am using selenium and python v3.6 to automate firefox to download file into a customized folder. The location of the folder is C:/Users/username/Dropbox/Inv/.
Below is my firefox profile.
profile = webdriver.FirefoxProfile()
profile.set_preference('browser.download.folderList', 2) # custom location
profile.set_preference('browser.download.manager.showWhenStarting', False)
profile.set_preference('browser.download.dir', 'C:/Users/username/Dropbox/Inv/')
profile.set_preference('browser.helperApps.neverAsk.saveToDisk', 'text/plain')
profile.set_preference('browser.helperApps.neverAsk.openFile', 'text/plain')
Currently, the file is always downloaded in the default folder C:\Users\username\Downloads. How do I get the downloaded folder location to be C:/Users/username/Dropbox/Inv/?
You need to pass the profile when launching Firefox:
driver = webdriver.Firefox(firefox_profile=profile)
See section 8.4, "How to auto save files using custom Firefox profile?", in the Selenium docs FAQ.
This is the example from that page:
import os
from selenium import webdriver
fp = webdriver.FirefoxProfile()
fp.set_preference("browser.download.folderList",2)
fp.set_preference("browser.download.manager.showWhenStarting",False)
fp.set_preference("browser.download.dir", os.getcwd())
fp.set_preference("browser.helperApps.neverAsk.saveToDisk", "application/octet-stream")
browser = webdriver.Firefox(firefox_profile=fp)
browser.get("http://pypi.python.org/pypi/selenium")
browser.find_element_by_partial_link_text("selenium-2").click()
I will answer my own question. The problem lies with the string specifying the download directory. I should use \\ and not /.
profile.set_preference('browser.download.dir', 'C:\\Users\\username\\Dropbox\\Inv')
The code has been verified to be working now.
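If the path is written with forward slashes elsewhere in a script, it can be converted rather than retyped. A small sketch using the standard `pathlib` module; `windows_download_dir` is a name made up for this example:

```python
from pathlib import PureWindowsPath


def windows_download_dir(path):
    """Return `path` with backslash separators and no trailing slash.

    PureWindowsPath only manipulates the string, so this works on any
    operating system and does not touch the file system.
    """
    return str(PureWindowsPath(path))


print(windows_download_dir("C:/Users/username/Dropbox/Inv/"))
# -> C:\Users\username\Dropbox\Inv
```

The result can then be passed as the value of `browser.download.dir`.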

Python Set Firefox Preferences for Selenium--Download Location

I use Selenium Marionette and GeckoDriver to pull web data. I use the following to set my Firefox profile preferences:
fp = webdriver.FirefoxProfile()
fp.set_preference("browser.download.folderList", 1)
fp.set_preference("browser.helperApps.alwaysAsk.force", False)
fp.set_preference("browser.download.manager.showWhenStarting",False)
fp.set_preference("browser.download.dir", "H:\Downloads")
fp.set_preference("browser.download.downloadDir","H:\Downloads")
fp.set_preference("browser.download.defaultFolder","H:\Downloads")
binary = FirefoxBinary(r'C:\Program Files (x86)\Mozilla Firefox\Firefox.exe')
firefox_capabilities = DesiredCapabilities.FIREFOX
firefox_capabilities['marionette'] = True
driver = webdriver.Firefox(capabilities=firefox_capabilities, firefox_binary=binary, firefox_profile = fp)
From what I understand after reading Unable to set firefox profile preferences and FirefoxProfile passed to FirefoxDriver, it seems that nothing is being done when using firefox_profile now. So I need to implement the new updates to firefox_capabilities, but I'm not sure how to exactly do that. Any ideas?
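OK, I believe I finally figured this out. Instead of the code above, I used the following code, which points to my existing Firefox profile folder (if you need to change your default profile settings, do that in Firefox before running this code):
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary
from selenium.webdriver.firefox.options import Options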
binary = FirefoxBinary(r'C:\Program Files (x86)\Mozilla Firefox\Firefox.exe')
fp = (r'C:\Users\username\AppData\Roaming\Mozilla\Firefox\Profiles\oqmqnsih.default')
opts = Options()
opts.profile = fp
firefox_capabilities = DesiredCapabilities.FIREFOX
firefox_capabilities['marionette'] = True
driver = webdriver.Firefox(capabilities=firefox_capabilities,firefox_binary=binary, firefox_options = opts)
I ran this code along with my web-scraping code and once I clicked the "Export CSV" link, it automatically downloaded as opposed to the Download Manager window popping up. Feel free to add any feedback.
The initial code is partially correct. You must set browser.download.folderList to 2:
fp = webdriver.FirefoxProfile()
# 0 = download to the desktop, 1 = download to the default "Downloads" directory,
# 2 = use the directory given in browser.download.dir
fp.set_preference("browser.download.folderList", 2)
fp.set_preference("browser.helperApps.alwaysAsk.force", False)
fp.set_preference("browser.download.manager.showWhenStarting", False)
fp.set_preference("browser.download.dir", r"H:\Downloads") # raw string keeps the backslash literal
binary = FirefoxBinary(r'C:\Program Files (x86)\Mozilla Firefox\Firefox.exe')
firefox_capabilities = DesiredCapabilities.FIREFOX
firefox_capabilities['marionette'] = True
driver = webdriver.Firefox(capabilities=firefox_capabilities,firefox_binary=binary, firefox_profile = fp)
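Since the same handful of preferences recurs in every Firefox snippet, they can be gathered in one function. This is only a sketch that builds a plain dictionary: the constant names and `firefox_download_prefs` are invented for illustration, and `pdfjs.disabled` is included on the assumption that PDFs should be saved rather than previewed.

```python
# Values of browser.download.folderList, as described in the answer above.
FOLDER_DESKTOP = 0
FOLDER_DEFAULT_DOWNLOADS = 1
FOLDER_CUSTOM = 2


def firefox_download_prefs(directory, mime_types):
    """Build the Firefox preferences for silent downloads into `directory`.

    Hypothetical helper: `mime_types` is an iterable of MIME type strings
    that should be saved without showing the download dialog.
    """
    return {
        # A custom folder is only honoured when folderList is 2.
        "browser.download.folderList": FOLDER_CUSTOM,
        "browser.download.dir": str(directory),
        "browser.download.manager.showWhenStarting": False,
        "browser.helperApps.neverAsk.saveToDisk": ",".join(mime_types),
        # Assumption: PDFs should be downloaded, not opened in the viewer.
        "pdfjs.disabled": True,
    }
```

The dictionary would be applied to a profile with a loop such as `for key, value in prefs.items(): fp.set_preference(key, value)`.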
The solution for my Python script (on a Raspberry Pi 3):
binary = FirefoxBinary('/usr/bin/firefox')
driver = webdriver.Firefox(capabilities={'browserName': 'firefox' }, firefox_binary=binary)

Cannot load custom profile python - selenium

fp = webdriver.FirefoxProfile()
fp.set_preference("browser.download.folderList", 2)
fp.set_preference("browser.download.manager.showWhenStarting", False)
fp.set_preference("browser.download.dir", self.download_dir)
fp.set_preference("browser.helperApps.neverAsk.saveToDisk", "text/csv")
self.browser = webdriver.Remote("http://192.168.1.242:4444/wd/hub",
    desired_capabilities=webdriver.DesiredCapabilities.FIREFOX,
    browser_profile=fp
)
The above code does not respect the specified profile.
But the code below works as expected:
fp = webdriver.FirefoxProfile()
fp.set_preference("browser.download.folderList", 2)
fp.set_preference("browser.download.manager.showWhenStarting", False)
fp.set_preference("browser.download.dir", self.download_dir)
fp.set_preference("browser.helperApps.neverAsk.saveToDisk", "text/csv")
self.browser = webdriver.Firefox(fp)
The Selenium documentation page http://seleniumhq.org/docs/04_webdriver_advanced.html#remotewebdriver
has the following example:
from selenium import webdriver
fp = webdriver.FirefoxProfile()
# set something on the profile...
driver = webdriver.Remote(desired_capabilities=webdriver.DesiredCapabilities.FIREFOX, browser_profile=fp)
which is the same as the code in my example.
Also, when I start the Selenium server with -firefoxProfileTemplate, it seems to ignore the profile's settings:
java -jar ./selenium-server-standalone-2.25.0.jar -firefoxProfileTemplate /home/xubuntu/.mozilla/firefox/fdui6lsj.crawler/
EDIT:
I also want to mention that if I load the profile from the file:
fp = webdriver.FirefoxProfile('/home/xubuntu/.mozilla/firefox/fdui6lsj.crawler/')
self.browser = webdriver.Remote("http://192.168.1.242:4444/wd/hub",
    desired_capabilities=webdriver.DesiredCapabilities.FIREFOX,
    browser_profile=fp
)
The profile is loaded, but it takes a lot of time.
Can someone tell me what is wrong?
Try calling update_preferences() at the end. That should force the writing of the config file:
fp.set_preference("browser.helperApps.neverAsk.saveToDisk", "text/csv")
fp.update_preferences()
