Download PDF from PDF Viewer using Selenium/Python/Chrome - python

I am trying to navigate through a website and, whenever a PDF viewer appears, download the PDF file. To keep it simple at first, I only log in to the page, navigate to the first page that holds a PDF, and try to download it.
The code I have used so far:
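from selenium import webdriver
from selenium.webdriver.chrome.service import Service

options = webdriver.ChromeOptions()
options.add_experimental_option('prefs', {
    "download.default_directory": "/Users/XXX/Documents",
    "download.prompt_for_download": False,
    "download.directory_upgrade": True,
    # download PDFs instead of opening them in Chrome's built-in viewer
    "plugins.always_open_pdf_externally": True
})
# Selenium 4 takes the chromedriver path through a Service object
browser = webdriver.Chrome(service=Service("/Users/XXX/Documents/chromedriver"), options=options)
browser.get('the login webpage')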
From here I log in and navigate to the desired page.
From then on, I don't really know how to get the PDF.
I hope someone can help me out here.
Thank you
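One possible way forward, sketched under assumptions: with "plugins.always_open_pdf_externally" set as in the code above, navigating to the PDF's URL (or clicking its link) should make Chrome save the file into "download.default_directory" instead of opening the viewer, so the script mostly has to wait for the file to land. The helper name wait_for_download and its timeout values are hypothetical; ".crdownload" is the suffix Chrome gives to downloads still in progress, and skipping hidden files is an extra precaution rather than something taken from the question.

```python
import time
from pathlib import Path


def wait_for_download(directory, known=(), timeout=30.0, poll=0.5):
    """Return the first new, fully written file in `directory`.

    `known` holds the file names that existed before the download started.
    Partial Chrome downloads end in `.crdownload`; they are skipped until
    Chrome renames them. Hidden files are skipped as well (an assumption,
    to avoid picking up temporary files).
    """
    directory = Path(directory)
    known = set(known)
    deadline = time.monotonic() + timeout
    while True:
        for path in sorted(directory.iterdir()):
            if path.name in known or path.name.startswith("."):
                continue
            if path.suffix == ".crdownload" or not path.is_file():
                continue
            return path
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no finished download in {directory}")
        time.sleep(poll)
```

A typical use would be to record the names in the download directory just before calling browser.get on the PDF's address, then pass that set as known so only the freshly downloaded file is returned.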

Related

Selenium Webdriver requests a different url from the one set

After updating Selenium and Visual Studio I have the following problem. I try to open a URL, for example
thestore = "http://shop.oki.gr/shop/store/diathesimotita_new.asp", and instead a window opens with
http://www.puttop.top/object.php?u=http://shop.oki.gr/shop/store/customerauthenticateform.asp?redirectUrl=http://shop.oki.gr/shop/store/diathesimotita_new.asp&title=Login%20Page
which of course does not work.
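from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

# downloads_path is the target folder, defined elsewhere in the script
options = webdriver.ChromeOptions()
options.add_experimental_option("prefs", {"download.default_directory": downloads_path,
                                          "profile.default_content_settings.popups": 0,
                                          "download.prompt_for_download": False,
                                          "download.directory_upgrade": True,  # was "directory_upgrade"
                                          "safebrowsing.enabled": True})
browser = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
thestore = "http://shop.oki.gr/shop/store/diathesimotita_new.asp"
browser.get(thestore)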
The original URL opens normally from another PC.
What is happening?
I can't open either of these URLs, but it looks like the site is redirecting you to a login page first.
I have a couple of programs that do something similar. I keep a login function on hand, call it, and then request the original URL again.
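The "log in, then get the original URL again" idea might look roughly like this. It is only a sketch: open_behind_login is a hypothetical helper, and login stands for whatever function you write to fill in and submit the site's login form. The only WebDriver members it relies on are get and current_url.

```python
def open_behind_login(driver, url, login):
    """Open `url`; if the site sends us somewhere else, log in and retry once.

    `driver` is a Selenium WebDriver. `login` is a caller-supplied function
    that authenticates using the page the driver is currently showing.
    Returns True if the driver ends up on `url`.
    """
    def landed():
        # Compare without a trailing slash so "/page" and "/page/" match
        return driver.current_url.rstrip("/") == url.rstrip("/")

    driver.get(url)
    if landed():
        return True
    login(driver)      # we were redirected, most likely to the login form
    driver.get(url)    # request the original URL again
    return landed()
```

If the function still returns False after logging in, the redirect is probably not a login bounce, and the address in driver.current_url is worth inspecting.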

Use python-firefox-selenium to fill form and download redirected page as pdf

I want to automatically download some details from a web page that has a form to fill in. After the form is submitted, the page redirects to another URL that contains a PDF file, and I want to download that PDF. Saving the page as HTML gave a file with no information at all, and capturing a single file takes more than one screenshot. I want the page to be downloaded as a PDF.
Tried saving as HTML - the HTML file contains no information
Tried screenshots - more than one screenshot is needed for a single page, which complicates things
Tried pdfkit - it re-renders the URL and thus loses the credentials entered, resulting in an error page.
I understand that it is not easy to emulate the browser's 'save' option, but unfortunately that is what I want.
Great question. I have faced this issue before and combined several snippets into the following code. Instead of being displayed in the browser, the PDF is downloaded.
import os
from selenium import webdriver

# Firefox profile that downloads PDFs instead of displaying them
mime_types = "application/pdf,application/vnd.adobe.xfdf,application/vnd.fdf,application/vnd.adobe.xdp+xml"
fp = webdriver.FirefoxProfile()
fp.set_preference("browser.download.folderList", 2)  # 2 = save to browser.download.dir
fp.set_preference("browser.download.manager.showWhenStarting", False)
fp.set_preference("browser.download.dir", os.getcwd())
fp.set_preference("browser.helperApps.neverAsk.saveToDisk", mime_types)
fp.set_preference("browser.helperApps.neverAsk.openFile", mime_types)
fp.set_preference("plugin.disable_full_page_plugin_for_types", mime_types)
fp.set_preference("pdfjs.disabled", True)  # turn off the built-in PDF viewer
geckodriver = '[path_to_your_firefox_driver]/geckodriver'
driver = webdriver.Firefox(executable_path=geckodriver, firefox_profile=fp)
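With Selenium 4, the executable_path and firefox_profile arguments used above are deprecated, and the same preferences can be set on a Firefox options object through set_preference. The following is a sketch under that assumption, trimmed to the preferences that matter most for PDFs; the helper names pdf_download_prefs and make_firefox are hypothetical.

```python
import os


def pdf_download_prefs(download_dir):
    """Firefox preferences that save PDFs to disk instead of showing them."""
    return {
        "browser.download.folderList": 2,  # 2 = use browser.download.dir
        "browser.download.dir": download_dir,
        "browser.helperApps.neverAsk.saveToDisk": "application/pdf",
        "pdfjs.disabled": True,  # turn off the built-in PDF viewer
    }


def make_firefox(download_dir=None):
    """Build a Firefox driver with those preferences (assumes Selenium 4)."""
    # Imported here so pdf_download_prefs can be used without Selenium installed
    from selenium import webdriver

    options = webdriver.FirefoxOptions()
    prefs = pdf_download_prefs(download_dir or os.getcwd())
    for name, value in prefs.items():
        options.set_preference(name, value)
    return webdriver.Firefox(options=options)
```

Because the driver is created from the same session that filled in the form, the credentials are kept, which is the part pdfkit could not do.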

Python download href, got the source code instead of a pdf file

I'm trying to download a PDF file from the following href (I changed some values because the PDF contains personal information):
https://clients.direct-energie.com/grandcompte/factures/consulter-votre-facture/?tx_defacturation%5BdoId%5D=857AD9348B0007984D4B128F1E8BE&cHash=7b3a9f6d109dde87bd1d95b80ca1d
When I paste this href into my browser, the PDF file is downloaded directly, but when I use requests in my Python code it only downloads the source code of
https://clients.direct-energie.com/grandcompte/factures/consulter-votre-facture/
Here is my code; I use Selenium to find the href on the website:
from requests import get

fact = driver.find_element_by_xpath(url)
href = fact.get_attribute('href')
print(href)  # href is correct here
reply = get(href, stream=True)  # the keyword is lowercase "stream"
print(reply)  # I get the page source instead of the PDF
Here is the HTML found by Selenium
I hope you have enough information to help. Thanks.
I can't use your link because it requires authentication, so I found another example of a redirecting PDF download. The Chrome settings that download the PDF instead of displaying it are taken from this StackOverflow answer.
import selenium.webdriver

url = "https://readthedocs.org/projects/selenium-python/downloads/pdf/latest/"
download_dir = 'C:/Dev'
profile = {
    "plugins.plugins_list": [{"enabled": False, "name": "Chrome PDF Viewer"}],
    "download.default_directory": download_dir,
    "download.extensions_to_open": "applications/pdf"
}
options = selenium.webdriver.ChromeOptions()
options.add_experimental_option("prefs", profile)
driver = selenium.webdriver.Chrome(options=options)
driver.get(url)
According to the docs, the driver.get method doesn't return anything; it just tells the webdriver to navigate to a page. If you want to handle the PDF in Python before saving it to a file, consider using Requests or RoboBrowser.
A stream=True option isn't available for webdriver.Chrome, so I'm not sure this is the method you were using, but the above should do what you want.
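If you do want the bytes in Python, one likely reason the plain HTTP request returned the HTML page is that it ran outside the browser's logged-in session. A common workaround, assuming the site authenticates with cookies, is to copy the cookies from driver.get_cookies() into the request and check that the response really starts with the PDF signature. This sketch uses the standard library's urllib rather than Requests so it stays self-contained; cookie_header, looks_like_pdf and fetch_pdf are hypothetical helper names.

```python
import urllib.request


def cookie_header(selenium_cookies):
    """Turn the list returned by driver.get_cookies() into a Cookie header."""
    return "; ".join(f"{c['name']}={c['value']}" for c in selenium_cookies)


def looks_like_pdf(data):
    """PDF files start with the bytes %PDF-; an HTML page does not."""
    return data.startswith(b"%PDF-")


def fetch_pdf(url, selenium_cookies, dest):
    """Download `url` with the browser's cookies and save it if it is a PDF."""
    request = urllib.request.Request(
        url, headers={"Cookie": cookie_header(selenium_cookies)}
    )
    with urllib.request.urlopen(request) as response:
        data = response.read()
    if not looks_like_pdf(data):
        raise ValueError("server did not return a PDF (session not accepted?)")
    with open(dest, "wb") as fh:
        fh.write(data)
    return dest
```

In the question's terms this would be called as fetch_pdf(href, driver.get_cookies(), "facture.pdf"), where the output file name is just an example.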

Python Web Scraping saving Tik Tok video from url

I am trying to save videos from this url:
Original:
https://api2.musical.ly/aweme/v1/play/?video_id=v09044a20000beeff4c108gs7sflfdug
Link changes to this:
http://v16.muscdn.com/3d238aa3e1c34000ce53792155cd0e15/5bcf3070/video/tos/maliva/tos-maliva-v-0068/e5a1ab74d0b54f97b3578924a428e58d/
The video is from TikTok. When you go to the url, it instantly redirects you to another url. The other url is the one I want in order to save the video. However, the url it directs you to does not have a "view html source" option. I can inspect the element and that shows it has a video tag, but I cannot find a way to save the url between the tag. I am using python and beautifulsoup. I tried to do this with selenium, but to no effect.
Edit:
The link it redirects to changes all the time! As of 27/08/2019, the link below works.
If you get "Access denied", check the link once again.
I think you should use other libraries for saving videos.
For example (in Python 3+):
import urllib.request
vid_url = "http://v19.muscdn.com/21b98c731608b8aa296ec31468c26dd1/5d652a88/video/tos/maliva/tos-maliva-v-0068/e5a1ab74d0b54f97b3578924a428e58d/?rc=amdvdnY7NDdpaDMzNTczM0ApdSlINzU2NTM0MzM2MzM1MzQ1b2k5ZmU5Z2c1ZGY5ZmQzPGZAaUBoNnYpQGczdilAZjY1QHJjYzRkLWBjYl8tLV4xNnNzOmk0NTU1LjQtLi4uMTQ0NTYtOiM2MDAtXjQzXzMxMTFeMWEzYSNvIzphLW8jOmAtbyMwLl4%3D"
urllib.request.urlretrieve(vid_url, "your_video_name.mp4")
If you insist on using Selenium, you can add options like this:
from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_experimental_option("prefs", {
    "download.default_directory": r"C:\Users\xxx\downloads\Test",
    "download.prompt_for_download": False,
    "download.directory_upgrade": True,
    "safebrowsing.enabled": True
})
# current Selenium releases expect options=, not the older chrome_options=
driver = webdriver.Chrome(options=options)
Hope this helps you!
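Since the redirected link keeps changing, it may be easier to start from the original URL each time and let the HTTP client follow the redirect, rather than hard-coding the final address. A minimal sketch with the standard library follows; download_following_redirects is a hypothetical name, and the User-Agent header is an assumption (some servers reject urllib's default one), not something the question confirms is needed.

```python
import shutil
import urllib.request


def download_following_redirects(url, filename, timeout=10):
    """Download `url` to `filename` and return the URL finally served.

    urllib follows HTTP redirects on its own; geturl() reports where the
    request ended up, which is handy when that address changes over time.
    """
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        with open(filename, "wb") as out:
            shutil.copyfileobj(response, out)  # stream the body to disk
        return response.geturl()
```

Called with the original api2.musical.ly address, this would save the video and return whatever CDN URL the server redirected to at that moment, provided the server accepts the request.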

Chromedriver, Selenium - Automate downloads

I am using Selenium 2.43.0 with Python 2.7.5. At one point, the test clicks on a button which sends form information to the server. If the request is successful, the server responds with
1) A successful message
2) A PDF with the form information merged in
I don't care to test the PDF, my test is just looking for a successful message. However the PDF is part of the package response from the server that I, as the tester, cannot change.
Until recently, this was never an issue using Chromedriver, since Chrome would automatically download pdfs into its default folder.
However, a few days ago one of my test environments started popping a separate window with a "Print" screen for the pdf, which derails my tests.
I don't want or need this dialog. How do I suppress it programmatically using chromedriver's options? (Something equivalent to Firefox's pdfjs.disabled option in about:config.)
Here is my current attempt to bypass the dialog, which does not work (that is, it does not disable or suppress the print PDF dialog window):
dc = DesiredCapabilities.CHROME
dc['loggingPrefs'] = {'browser': 'ALL'}
chrome_profile = webdriver.ChromeOptions()
profile = {"download.default_directory": "C:\\SeleniumTests\\PDF",
"download.prompt_for_download": False,
"download.directory_upgrade": True}
chrome_profile.add_experimental_option("prefs", profile)
chrome_profile.add_argument("--disable-extensions")
chrome_profile.add_argument("--disable-print-preview")
self.driver = webdriver.Chrome(executable_path="C:\\SeleniumTests\\chromedriver.exe",
chrome_options=chrome_profile,
service_args=["--log-path=C:\\SeleniumTests\\chromedriver.log"],
desired_capabilities=dc)
All component versions are the same in both testing environments:
Selenium 2.43.0, Python 2.7.5, Chromedriver 2.12, Chrome (browser) 38.0.02125.122
I had to dig into the source code on this one - I couldn't find any docs listing the full set of Chrome User Preferences.
The key is "plugins.plugins_disabled": ["Chrome PDF Viewer"]
FULL CODE:
dc = DesiredCapabilities.CHROME
dc['loggingPrefs'] = {'browser': 'ALL'}
chrome_profile = webdriver.ChromeOptions()
profile = {"download.default_directory": "C:\\SeleniumTests\\PDF",
"download.prompt_for_download": False,
"download.directory_upgrade": True,
"plugins.plugins_disabled": ["Chrome PDF Viewer"]}
chrome_profile.add_experimental_option("prefs", profile)
#Helpful command line switches
# http://peter.sh/experiments/chromium-command-line-switches/
chrome_profile.add_argument("--disable-extensions")
self.driver = webdriver.Chrome(executable_path="C:\\SeleniumTests\\chromedriver.exe",
chrome_options=chrome_profile,
service_args=["--log-path=C:\\SeleniumTests\\chromedriver.log"],
desired_capabilities=dc)
Interestingly, the blanket switch chrome_profile.add_argument("--disable-plugins") did not solve this problem. But I prefer the more surgical approach anyway.
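The setup above targets Selenium 2 and Chrome 38. As far as I know, later Chrome versions no longer honor the per-plugin list, and the preference "plugins.always_open_pdf_externally" (the one used in the first question on this page) is what makes current Chrome download PDFs rather than render them. A sketch assuming Selenium 4; the function names chrome_pdf_prefs and make_chrome are hypothetical.

```python
def chrome_pdf_prefs(download_dir):
    """Chrome preferences that download PDFs instead of opening the viewer."""
    return {
        "download.default_directory": download_dir,
        "download.prompt_for_download": False,
        "download.directory_upgrade": True,
        "plugins.always_open_pdf_externally": True,
    }


def make_chrome(download_dir):
    """Build a Chrome driver with those preferences (assumes Selenium 4)."""
    # Imported here so chrome_pdf_prefs can be used without Selenium installed
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_experimental_option("prefs", chrome_pdf_prefs(download_dir))
    return webdriver.Chrome(options=options)
```

Keeping the preferences in a plain dictionary preserves the surgical approach: only the PDF handling changes, and other plugins and extensions are left alone.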
