Save PDF by using Selenium and IE Browser

Saving a PDF with the Chrome browser causes no issues (I'm using these options):
options.add_experimental_option('prefs', {
    'credentials_enable_service': False,
    'plugins': {
        'always_open_pdf_externally': True
    },
    'profile': {
        'password_manager_enabled': False,
    },
    'download': {
        'prompt_for_download': False,
        'directory_upgrade': True,
        'default_directory': ''
    }
})
But how can I save a PDF using webdriver.Ie() (the Internet Explorer driver) with Python + Selenium?
P.S. As far as I know, Internet Explorer cannot run in headless mode, but if someone knows a way to do it, that would be amazing!

You can't use Selenium to handle the download prompt in IE because it is an OS-level prompt, and Selenium WebDriver cannot automate OS-level windows. You need a third-party tool to download files in IE with Selenium.
Here I use Wget to bypass the download prompt and download the file in IE. You can refer to this article about how to use Wget.
As for headless mode in IE, you can use another third-party tool called headless_ie_selenium. Download it and use headless_ie_selenium.exe in place of IEDriverServer.exe to automate IE.
Sample code to download a PDF file is below; change the paths in the code to your own:
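from selenium import webdriver
import time
import os
url = "https://file-examples.com/index.php/sample-documents-download/sample-pdf-download/"
# Use headless_ie_selenium.exe in place of IEDriverServer.exe
driver = webdriver.Ie('D:\\headless-selenium-for-win-v1-4\\headless_ie_selenium.exe')
driver.get(url)
time.sleep(3)  # give the page time to load
# Selenium 3 API; in Selenium 4 use driver.find_elements(By.CLASS_NAME, "download-button")
link = driver.find_elements_by_class_name("download-button")[0]
hrefurl = link.get_attribute("href")
# Hand the URL to Wget so that IE's download prompt never appears
os.system('cmd /c C:\\Wget\\wget.exe -P D:\\Download --no-check-certificate ' + hrefurl)
print("*******************")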
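The Wget call in this answer is built by string concatenation and run through cmd. A minimal alternative sketch, assuming the same Wget flags, passes the arguments as a list to subprocess so that a URL containing characters such as & is not reinterpreted by the shell. The helper names build_wget_command and download_with_wget are hypothetical, not part of Selenium or Wget.

```python
import subprocess


def build_wget_command(wget_exe, dest_dir, url):
    """Return the Wget invocation as an argument list (no shell quoting needed)."""
    # -P sets the target directory; --no-check-certificate skips TLS verification,
    # exactly as in the answer above.
    return [wget_exe, "-P", dest_dir, "--no-check-certificate", url]


def download_with_wget(wget_exe, dest_dir, url):
    """Run Wget and raise CalledProcessError if it exits with a non-zero status."""
    subprocess.run(build_wget_command(wget_exe, dest_dir, url), check=True)
```

With the paths from the answer, the call would look like download_with_wget(r"C:\Wget\wget.exe", r"D:\Download", hrefurl).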

Related

Headless mode disable python web file downloading

I aim to download web files while in headless mode. My program downloads perfectly when NOT in headless mode, but once I add the constraint not to show MS Edge opening, the downloading is disregarded.
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import Select
driver = webdriver.Edge()
driver.get("URL")
id_box = driver.find_element(By.ID,"...")
pw_box = driver.find_element(By.ID,"...")
id_box.send_keys("...")
pw_box.send_keys("...")
log_in = driver.find_element(By.ID,"...")
log_in.click()
time.sleep(0.1) # If not included, get error: "Unable to locate element"
drop_period = Select(driver.find_element(By.ID,"..."))
drop_period.select_by_index(1)
drop_consul = Select(driver.find_element(By.ID,"..."))
drop_consul.select_by_visible_text("...")
drop_client = Select(driver.find_element(By.ID,"..."))
drop_client.select_by_index(1)
# The following files do not download with headless included:
driver.find_element(By.XPATH, "...").click()
driver.find_element(By.XPATH, "...").click()

In that case, you might try downloading the file from its direct link using Python requests.
You'll need to get the URL by reading the element's href attribute.
Downloading and saving a file from a URL should then work as follows:
import requests as req
remote_url = 'http://www.example.com/file.txt'
local_file_name = 'my_file.txt'
data = req.get(remote_url)
# Save file data to local copy
with open(local_file_name, 'wb') as file:
    file.write(data.content)
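For larger files, a streaming variant of the same idea may be preferable, since it avoids holding the whole body in memory. The sketch below is only an illustration: filename_from_url and download are hypothetical helper names, and the 30-second timeout and 8192-byte chunk size are arbitrary assumptions.

```python
import os
import posixpath
from urllib.parse import unquote, urlsplit


def filename_from_url(url, default="download.bin"):
    """Derive a local file name from the last path segment of the URL."""
    name = posixpath.basename(unquote(urlsplit(url).path))
    return name or default


def download(url, dest_dir=".", chunk_size=8192):
    """Stream the URL to disk in chunks and return the saved path."""
    import requests  # third-party; imported here so filename_from_url works without it

    path = os.path.join(dest_dir, filename_from_url(url))
    with requests.get(url, stream=True, timeout=30) as response:
        response.raise_for_status()  # fail on 4xx/5xx instead of saving an error page
        with open(path, "wb") as fh:
            for chunk in response.iter_content(chunk_size=chunk_size):
                fh.write(chunk)
    return path
```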

There are different headless modes for Chrome. If you want to download files, use one of the special ones.
For Chrome 109 and above, use:
options.add_argument("--headless=new")
For Chrome 108 and below, use:
options.add_argument("--headless=chrome")
Reference: https://github.com/chromium/chromium/commit/e9c516118e2e1923757ecb13e6d9fff36775d1f4
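The version rule above can be sketched as a small helper that picks the flag from a browser version string. Both function names are hypothetical, and the cutoff of 109 is taken directly from this answer rather than verified independently.

```python
def major_version(version_string):
    """Return the leading integer of a dotted version such as '110.0.1587.41'."""
    return int(version_string.split(".")[0])


def headless_argument(version_string):
    """Return the headless flag this answer recommends for the given version."""
    if major_version(version_string) >= 109:
        return "--headless=new"
    return "--headless=chrome"
```

The result can be passed straight to options.add_argument(...).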

Downloading files in headless mode works for me on Microsoft Edge version 110.0.1587.41 with the following options:
MicrosoftEdge: [{
    "browserName": "MicrosoftEdge",
    "ms:edgeOptions": {
        args: ['--headless=new'],
        prefs: {
            "download.prompt_for_download": false,
            "plugins.always_open_pdf_externally": true,
            'download.default_directory': "dlFolder"
        }
    },
}]
Nothing worked until I added the option '--headless=new'.
N.B.: Tested on a Mac environment using WebdriverIO.
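Since the question uses Python rather than WebdriverIO, here is a rough Python equivalent of the same options. It is a sketch under the assumption that Selenium 4 is installed; edge_download_prefs and build_edge_options are hypothetical helper names, and the download directory is whatever path you pass in.

```python
def edge_download_prefs(download_dir):
    """Return the same three prefs as the WebdriverIO config in this answer."""
    return {
        "download.prompt_for_download": False,
        "plugins.always_open_pdf_externally": True,
        "download.default_directory": download_dir,
    }


def build_edge_options(download_dir):
    """Build Edge options with new headless mode and the download prefs."""
    from selenium.webdriver.edge.options import Options  # requires selenium 4

    options = Options()
    options.add_argument("--headless=new")
    options.add_experimental_option("prefs", edge_download_prefs(download_dir))
    return options
```

The driver would then be created with webdriver.Edge(options=build_edge_options(download_dir)).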

How to automatically download file in IE in locked screen

I am using Selenium WebDriver to do some automation on IE11 and am stuck on automatically downloading a file while the screen is locked.
The file download starts after pressing a button. That button does not link to a URL for the file, but seems to call a JavaScript function. I have managed everything up to pressing the button but am stuck on the bottom-bar prompt from IE11 (open or save). For security reasons I have to use IE11 and run the automation with the screen locked. I have tried to send Alt+S using WScript.Shell, but it only seems to work after I unlock the screen. Here's what I've tried:
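import time
from pathlib import Path
import win32com.client as win32  # assumed import (pywin32); not shown in the original post
shell = win32.Dispatch("WScript.Shell")
config = Path(localpath + localfile)
# confirm file exists
while not config.is_file():
    shell.SendKeys("%s", 0)  # send Alt+S to the IE download bar
    time.sleep(2)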
Is there a way to bypass the IE prompt and automatically save the file to the download folder during locked screen?

As far as I know, WebDriver cannot access the IE download dialog boxes that appear when you click a download link or button. However, we can bypass these dialog boxes with a separate program called "wget".
We can run wget from the command line to download the file. The code is below (it is C#, but you could convert it to Python):
var options = new InternetExplorerOptions()
{
    InitialBrowserUrl = URL, // "http://demo.guru99.com/test/yahoo.html";
    IntroduceInstabilityByIgnoringProtectedModeSettings = true
};
//IE_DRIVER_PATH: @"D:\Downloads\webdriver\IEDriverServer_x64_3.14.0";
var driver = new InternetExplorerDriver(IE_DRIVER_PATH, options);
driver.Navigate();
Thread.Sleep(5000);
//get the download file link.
String sourceLocation = driver.FindElementById("messenger-download").GetAttribute("href");
Console.WriteLine(sourceLocation);
//using command-line program to execute the script and download file.
String wget_command = @"cmd /c D:\\temp\\wget.exe -P D:\\temp --no-check-certificate " + sourceLocation;
try
{
    Process exec = Process.Start("CMD.exe", wget_command);
    exec.WaitForExit();
    Console.WriteLine("success");
}
catch (Exception ex)
{
    Console.WriteLine(ex.Message);
}
driver.Close(); // closes browser
driver.Quit(); // closes IEDriverServer process
For more details, you could refer to this article.
Edit: Since you are using Python, you could use the find_element_by_id method to find the element, then the get_attribute method to get the value. For more details, check these articles:
Get Element Attribute
Python + Selenium for Dummies like Me
WebDriver API
Get value of an input box using Selenium (Python)
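Whichever tool performs the download, the script still has to confirm that the file arrived, as the loop in the question does. The sketch below shows that check with a timeout added, so the script cannot wait forever on a locked machine. wait_for_file is a hypothetical helper, and the default timeout and polling interval are arbitrary assumptions.

```python
import time
from pathlib import Path


def wait_for_file(path, timeout=60.0, poll_interval=0.5):
    """Return True once path is a non-empty file, or False when timeout elapses."""
    target = Path(path)
    deadline = time.monotonic() + timeout
    while True:
        if target.is_file() and target.stat().st_size > 0:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
```

A size check alone does not prove the download has finished; for large files it may be safer to also wait until the size stops changing.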

Python download href, got the source code instead of a pdf file

I'm trying to download a PDF file from the following href (I changed some values because the PDF contains personal information):
https://clients.direct-energie.com/grandcompte/factures/consulter-votre-facture/?tx_defacturation%5BdoId%5D=857AD9348B0007984D4B128F1E8BE&cHash=7b3a9f6d109dde87bd1d95b80ca1d
When I paste this href into my browser, the PDF file downloads directly, but when I try to use requests in my Python code, it only downloads the source code of
https://clients.direct-energie.com/grandcompte/factures/consulter-votre-facture/
Here is my code; I use Selenium to find the href on the website:
fact = driver.find_element_by_xpath(url)
href = fact.get_attribute('href')
print(href)  # href is correct here
reply = get(href, Stream=True)
print(reply)  # I got the source code
Here is the HTML found by Selenium.
I hope you have enough information to help. Thanks.

I can't use your link because it requires authentication, so I found another example of a redirecting PDF download. The settings that make Chrome download the PDF instead of displaying it are taken from this StackOverflow answer.
import selenium.webdriver
url = "https://readthedocs.org/projects/selenium-python/downloads/pdf/latest/"
download_dir = 'C:/Dev'
profile = {
    "plugins.plugins_list": [{"enabled": False, "name": "Chrome PDF Viewer"}],
    "download.default_directory": download_dir,
    "download.extensions_to_open": "applications/pdf"
}
options = selenium.webdriver.ChromeOptions()
options.add_experimental_option("prefs", profile)
driver = selenium.webdriver.Chrome(options=options)
driver.get(url)
Looking at the docs, the driver.get method doesn't return anything; it just tells the webdriver to navigate to a page. If you want to handle the PDF in Python before saving it to a file, then perhaps look at using Requests or Robobrowser.
The Stream=True option isn't available for webdriver.Chrome, so I'm not sure whether this is the method you were using, but the above should do what you want.
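If you do fetch the link from Python, it can help to check what actually came back before saving it. One plausible explanation for the symptom in the question (an assumption, since the link requires authentication) is that the request does not carry the browser's login session and so receives an HTML page. The sketch below relies only on the fact that PDF files begin with the bytes %PDF-; looks_like_pdf and describe_body are hypothetical helper names.

```python
PDF_SIGNATURE = b"%PDF-"


def looks_like_pdf(content):
    """Return True if the body starts with the PDF signature (ignoring leading whitespace)."""
    return content.lstrip().startswith(PDF_SIGNATURE)


def describe_body(content):
    """Classify a response body as 'pdf', 'html', or 'unknown' with simple heuristics."""
    if looks_like_pdf(content):
        return "pdf"
    head = content.lstrip()[:200].lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "html"
    return "unknown"
```

For example, describe_body(reply.content) returning "html" would suggest the server sent a page rather than the file.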

Using Selenium with Python and PhantomJS to download file to filesystem

I've been grappling with using PhantomJS/Selenium/python-selenium to download a file to the filesystem.
I'm able to easily navigate through the DOM and click, hover, etc. Downloading a file is, however, proving to be quite troublesome. I've tried a headless approach with Firefox and pyvirtualdisplay, but that wasn't working well either and was unbelievably slow. I know that CasperJS allows file downloads. Does anyone know how to integrate CasperJS with Python, or how to use PhantomJS to download files? Much appreciated.

Although this question is quite old, downloading files through PhantomJS is still a problem. But we can use PhantomJS to get the download link and fetch all the needed cookies, such as CSRF tokens. Then we can use requests to actually download the file:
import requests
from selenium import webdriver
driver = webdriver.PhantomJS()
driver.get('page_with_download_link')
# read the URL from the link element
download_link = driver.find_element_by_id('download_link').get_attribute('href')
session = requests.Session()
# copy the browser's cookies into the requests session
cookies = driver.get_cookies()
for cookie in cookies:
    session.cookies.set(cookie['name'], cookie['value'])
response = session.get(download_link)
The actual file content should now be in response.content. We can then write it to disk with open, or do whatever else we want with it.
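If requests is not available, the same idea can be sketched with the standard library: join the cookie dictionaries returned by driver.get_cookies() into a single Cookie header and send it with urllib.request. The helper names cookie_header and fetch_with_cookies are hypothetical, and the sketch ignores cookie domain and path restrictions, which a real session would honor.

```python
import urllib.request


def cookie_header(selenium_cookies):
    """Join Selenium cookie dicts into a single Cookie header value."""
    return "; ".join(f"{c['name']}={c['value']}" for c in selenium_cookies)


def fetch_with_cookies(url, selenium_cookies, timeout=30):
    """Download url with the browser's cookies attached and return the raw bytes."""
    request = urllib.request.Request(
        url, headers={"Cookie": cookie_header(selenium_cookies)}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```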

PhantomJS doesn't currently support file downloads. Relevant issues with workarounds:
File download
How to handle file save dialog box using Selenium webdriver and PhantomJS?
As far as I understand, you have at least 3 options:
switch to casperjs (and you should leave python here)
try with headless on xvfb
switch to normal non-headless browsers
Here are also some links that might help too:
Selenium Headless Automated Testing in Ubuntu
XWindows for Headless Selenium (with further links inside)
How to run browsers(chrome, IE and firefox) in headless mode?
Tutorial: How to use Headless Firefox for Scraping in Linux

My use case required a form submission to retrieve the file. I was able to accomplish this using the driver's execute_async_script() function.
js = '''
var callback = arguments[0];
var theForm = document.forms['theFormId'];
data = new FormData();
data.append('eventTarget', "''' + target + '''"); // this is the id of the file clicked
data.append('otherFormField', theForm.otherFormField.value);
var xhr = new XMLHttpRequest();
xhr.open('POST', theForm.action, true);
'''
for cookie in driver.get_cookies():
    js += ' xhr.setRequestHeader("' + cookie['name'] + '", "' + cookie['value'] + '"); '
js += '''
xhr.onload = function () {
    callback(this.responseText);
};
xhr.send(data);
'''
driver.set_script_timeout(30)
file = driver.execute_async_script(js)

It is not possible that way. You can use alternatives such as wget or curl to download files.
Use Firefox to find the right request, Selenium to get the values for it, and finally a command-line tool outside the browser to download the file:
import subprocess
curlCall = "curl 'http://www_sitex_org/descarga.jsf' -H '...allCurlRequest....' > file.xml"
subprocess.call(curlCall, shell=True)

Firefox + Selenium WebDriver and download a csv file automatically

I have a problem with Selenium WebDriver and Firefox. I want to download a CSV file without a confirmation dialog, and I have code like this:
fp = webdriver.FirefoxProfile()
fp.set_preference("browser.download.folderList",2)
fp.set_preference("browser.download.dir", download_dir)
fp.set_preference("browser.download.manager.showWhenStarting",False)
fp.set_preference("browser.helperApps.neverAsk.saveToDisk","text/csv")
but it doesn't seem to work.
I tried many combinations of browser.helperApps.neverAsk.saveToDisk:
fp.set_preference("browser.helperApps.neverAsk.saveToDisk","text/csv,application/csv,text/plan,text/comma-separated-values")
or
fp.set_preference("browser.helperApps.neverAsk.saveToDisk","application/csv")
fp.set_preference("browser.helperApps.neverAsk.saveToDisk","text/plain")
fp.set_preference("browser.helperApps.neverAsk.saveToDisk","text/comma-separated-values")
but it makes no difference, and Firefox won't download automatically.
How can I fix it?

Sometimes the content type is not what you'd expect.
Use the HttpFox Firefox plugin (or similar) to find the real content type of the file and use it in your code.
BTW, for me the content type was application/octet-stream:
fp.set_preference("browser.helperApps.neverAsk.openFile", "application/octet-stream");
fp.set_preference("browser.helperApps.neverAsk.saveToDisk", "application/octet-stream");
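One detail worth noting: each call to set_preference with the same key replaces the previous value, so several MIME types have to go into one comma-separated string rather than separate calls. The sketch below builds the preferences that way. It assumes Selenium 4's firefox Options class; firefox_download_prefs and build_firefox_options are hypothetical helper names.

```python
def firefox_download_prefs(download_dir, mime_types):
    """Return download prefs with all MIME types joined into one neverAsk value."""
    return {
        "browser.download.folderList": 2,  # 2 = use the custom directory below
        "browser.download.dir": download_dir,
        "browser.download.manager.showWhenStarting": False,
        "browser.helperApps.neverAsk.saveToDisk": ",".join(mime_types),
    }


def build_firefox_options(download_dir, mime_types):
    """Apply the prefs to a Firefox Options object."""
    from selenium.webdriver.firefox.options import Options  # requires selenium 4

    options = Options()
    for name, value in firefox_download_prefs(download_dir, mime_types).items():
        options.set_preference(name, value)
    return options
```

The driver would then be created with webdriver.Firefox(options=build_firefox_options(download_dir, ["text/csv", "application/octet-stream"])).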

SetPreference("browser.helperApps.neverAsk.saveToDisk", "application/comma-separated-values ,text/csv"); // in Java Selenium
This should work for downloading all types of CSV files.

Now (May 2016),
SetPreference("browser.helperApps.neverAsk.saveToDisk", "text/csv"); // C#
works for me
