Using Selenium with Python and PhantomJS to download file to filesystem - python

I've been grappling with using PhantomJS/Selenium/python-selenium to download a file to the filesystem.
I'm able to easily navigate through the DOM and click, hover etc. Downloading a file is, however, proving to be quite troublesome. I've tried a headless approach with Firefox and pyvirtualdisplay but that wasn't working well either and was unbelievably slow. I know That CasperJS allows for file downloads. Does anyone know how to integrate CasperJS with Python or how to utilize PhantomJS to download files. Much appreciated.

Despite this question is quite old, downloading files through PhantomJS is still a problem. But we can use PhantomJS to get download link and fetch all needed cookies such as csrf tokens and so on. And then we can use requests to download it actually:
import requests
from selenium import webdriver
driver = webdriver.PhantomJS()
driver.get('page_with_download_link')
download_link = driver.find_element_by_id('download_link')
session = requests.Session()
cookies = driver.get_cookies()
for cookie in cookies:
session.cookies.set(cookie['name'], cookie['value'])
response = session.get(download_link)
And now in response.content actual file content should appear. We can next write it with open or do whatever we want.

PhantomJS doesn't currently support file downloads. Relevant issues with workarounds:
File download
How to handle file save dialog box using Selenium webdriver and PhantomJS?
As far as I understand, you have at least 3 options:
switch to casperjs (and you should leave python here)
try with headless on xvfb
switch to normal non-headless browsers
Here are also some links that might help too:
Selenium Headless Automated Testing in Ubuntu
XWindows for Headless Selenium (with further links inside)
How to run browsers(chrome, IE and firefox) in headless mode?
Tutorial: How to use Headless Firefox for Scraping in Linux

My use case required a form submission to retrieve the file. I was able to accomplish this using the driver's execute_async_script() function.
js = '''
var callback = arguments[0];
var theForm = document.forms['theFormId'];
data = new FormData();
data.append('eventTarget', "''' + target + '''"); // this is the id of the file clicked
data.append('otherFormField', theForm.otherFormField.value);
var xhr = new XMLHttpRequest();
xhr.open('POST', theForm.action, true);
'''
for cookie in driver.get_cookies():
js += ' xhr.setRequestHeader("' + cookie['name'] + '", "' + cookie['value'] + '"); '
js += '''
xhr.onload = function () {
callback(this.responseText);
};
xhr.send(data);
'''
driver.set_script_timeout(30)
file = driver.execute_async_script(js)

Is not posible in that way. You can use other alternatives to download files like wget o curl.
Use firefox to find the right request and selenium to get the values for that and finally use out of to the box to download the file
curlCall=" curl 'http://www_sitex_org/descarga.jsf' -H '...allCurlRequest....' > file.xml"
subprocess.call(curlCall, shell=True)

Related

Save PDF by using Selenium and IE Browser

To save PDF by using CHrome Browser does not cause any issues (I'm using these options):
options.add_experimental_option('prefs',{
'credentials_enable_service': False,
'plugins':{
'always_open_pdf_externally': True
},
'profile': {
'password_manager_enabled': False,
},
'download': {
'prompt_for_download': False,
'directory_upgrade': True,
'default_directory': ''
}
})
BUT .... How to save PDF by using webdriver.Ie() Internet Explorer Driver with Python + Selenium?
P.S. AFAIK Internet explorer can not be executed by using headless mode, but if someone will not the way to do it, will be amazing !!!
You can't use Selenium to deal with the download prompt in IE because that's an OS-level prompt. Selenium WebDriver has no capability to automate OS-level prompt window. You need to use some 3rd party tools to help you to download file in IE using Selenium.
Here I use Wget to bypass the download prompt and download file in IE. You can refer to this article about how to use Wget.
About using headless mode in IE in Selenium, you can also use a 3rd party tool called headless_ie_selenium. You can download this tool and use headless_ie_selenium.exe instead of IEDriverServer.exe to automate IE.
The sample code to download a pdf file is like below, please note to change the paths in the code to your owns:
from selenium import webdriver
import time
import os
url = "https://file-examples.com/index.php/sample-documents-download/sample-pdf-download/"
driver = webdriver.Ie('D:\\headless-selenium-for-win-v1-4\\headless_ie_selenium.exe')
driver.get(url)
time.sleep(3)
link = driver.find_elements_by_class_name("download-button")[0]
hrefurl = link.get_attribute("href")
os.system('cmd /c C:\\Wget\\wget.exe -P D:\\Download --no-check-certificate ' + hrefurl)
print("*******************")

Selenium Firefox browser is stuck after downloading pdf

Was hoping someone could help me understand what's going on:
I'm using Selenium with Firefox browser to download a pdf (need Selenium to login to the corresponding website):
le = browser.find_elements_by_xpath('//*[#title="Download PDF"]')
time.sleep(5)
if le:
pdf_link = le[0].get_attribute("href")
browser.get(pdf_link)
The code does download the pdf, but after that just stays idle.
This seems to be related to the following browser settings:
fp.set_preference("pdfjs.disabled", True)
fp.set_preference("browser.helperApps.neverAsk.saveToDisk", "application/pdf")
If I disable the first, it doesn't hang, but opens pdf instead of downloading it. If I disable the second, a "Save As" pop-up window shows up. Could someone explain how to handle this?
For me, the best way to solve this was to let Firefox render the PDF in the browser via pdf.js and then send a subsequent fetch via the Python requests library with the selenium cookies attached. More explanation below:
There are several ways to render a PDF via Firefox + Selenium. If you're using the most recent version of Firefox, it'll most likely render the PDF via pdf.js so you can view it inline. This isn't ideal because now we can't download the file.
You can disable pdf.js via Selenium options but this will likely lead to the issue in this question where the browser gets stuck. This might be because of an unknown MIME-Type but I'm not totally sure. (There's another StackOverflow answer that says this is also due to Firefox versions.)
However, we can bypass this by passing Selenium's cookie session to requests.session().
Here's a toy example:
import requests
from selenium import webdriver
pdf_url = "/url/to/some/file.pdf"
# setup driver with options
driver = webdriver.Firefox(..options)
# do whatever you need to do to auth/login/click/etc.
# navigate to the PDF URL in case the PDF link issues a
# redirect because requests.session() does not persist cookies
driver.get(pdf_url)
# get the URL from Selenium
current_pdf_url = driver.current_url
# create a requests session
session = requests.session()
# add Selenium's cookies to requests
selenium_cookies = driver.get_cookies()
for cookie in selenium_cookies:
session.cookies.set(cookie["name"], cookie["value"])
# Note: If headers are also important, you'll need to use
# something like seleniumwire to get the headers from Selenium
# Finally, re-send the request with requests.session
pdf_response = session.get(current_pdf_url)
# access the bytes response from the session
pdf_bytes = pdf_response.content
I highly recommend using seleniumwire over regular selenium because it extends Python Selenium to let you return headers, wait for requests to finish, use proxies, and much more.

How to automatically download file in IE in locked screen

I am using selenium webdriver to do some automation on IE11 and am stuck on auto-downloading file while screen is locked.
The file download starts after pressing a button. That button does not link to a url for the download file, but seemed to link to a javascript function. I have managed everything until pressing the button but am stuck on the bottom bar prompt from IE11 (open or save). For security reasons I have to use IE11 and do the automation in locked screen. I have tried to send Alt+S using WScript.Shell, but it only seemed to work after I unlock the screen. Here's what I've tried.
shell = win32.Dispatch("WScript.Shell")
config = Path(localpath + localfile)
#confirm file exists
while not config.is_file():
shell.SendKeys("%s", 0)
time.sleep(2)
Is there a way to bypass the IE prompt and automatically save the file to the download folder during locked screen?
As far as I know, WebDriver has no capability to access the IE Download dialog boxes presented by browsers when you click on a download link or button. But, we can bypass these dialog boxes using a separate program called "wget".
we can use command-line program and this wget to download the file. Code as below (the following code is C# code, you could convert it to Python):
var options = new InternetExplorerOptions()
{
InitialBrowserUrl = URL, // "http://demo.guru99.com/test/yahoo.html";
IntroduceInstabilityByIgnoringProtectedModeSettings = true
};
//IE_DRIVER_PATH: #"D:\Downloads\webdriver\IEDriverServer_x64_3.14.0";
var driver = new InternetExplorerDriver(IE_DRIVER_PATH, options);
driver.Navigate();
Thread.Sleep(5000);
//get the download file link.
String sourceLocation = driver.FindElementById("messenger-download").GetAttribute("href");
Console.WriteLine(sourceLocation);
//using command-line program to execute the script and download file.
String wget_command = #"cmd /c D:\\temp\\wget.exe -P D:\\temp --no-check-certificate " + sourceLocation;
try
{
Process exec = Process.Start("CMD.exe", wget_command);
exec.WaitForExit();
Console.WriteLine("success");
}
catch (Exception ex)
{
Console.WriteLine(ex.Message);
}
driver.Close(); // closes browser
driver.Quit(); // closes IEDriverServer process
More details, you could refer to this article.
Edit: Since you are using python, you could use the find_element_by_id method to find the element, then using the get_attribute method to get the value. More details, you could check these articles:
Get Element Attribute
Python + Selenium for Dummies like Me
WebDriver API
Get value of an input box using Selenium (Python)

How to implement data from one web browser to another

i'm using Selenium WebDriver with Python 2.7.14 on Firefox browser. i'm tring to get text from .JSON file that located at this url: http://a360ci.s3.amazonaws.com/Jmx/einat_world_bank.json and implement all the data in the main area on this url: http://jsonviewer.stack.hu/
This is my code:
driver = self.driver
driver.get('http://a360ci.s3.amazonaws.com/Jmx/einat_world_bank.json')
RawData = driver.find_element_by_id("tab-1")
RawData.click()
self.driver.implicitly_wait(2)
content = driver.find_element_by_class_name("data").text
driver.get('http://jsonviewer.stack.hu/')
MainField = driver.find_element_by_id("edit")
MainField.send_keys(content)
*I moved to RawData tab because on Firefox the JSON not parsing well
*After the second url opened the program stuck and nothing happens. what can be the problem and how it can be solved? Thanks.
Not clear the root cause. suggest you to fetch the json by http client library, not by opening in browser and get from browser. Otherwise your code can't run cross browser. Different browser will display the json content with different DOM tree. I think there is no 'tab-1' when open in Chrome. Another reason is firefox report parsing fail, but chrome has no such issue.
Because the json content is too long. suggest you not use send_keys(), you can try
driver.execute_script('arguments[0].value=arguments[1]', textbox, jsonstring)

Python selenium get redirected url with Phantomjs

Here is my problem: I'm trying to use selenium to access a webpage and the special about this page is it is an auto redirecting page (you open that page and after few seconds, it automatically redirect to another page). When i use driver = webdriver.Firefox(), my IDM catched that link just perfectly after few seconds.
And because i don't want the browser to come up so i use Phantomjs instead, ut it not working. My application just can get the loading page url (bitdl-1336...) but not the redirected link. Please help!
This is my code:
link = 'http://torrent.ajee.sh/hash.php?hash=' + self.global_hash_code
driver = webdriver.PhantomJS('phantomjs.exe')
driver.get(str(link))
element = driver.find_element_by_link_text('Download Zip')
element.click()
time.sleep(10)
msg = QMessageBox.information(self, QString('Thành công'),QString(driver.current_url))
And this is the result:
Please help!
Sorry about my english
Not exactly an answer to your PhantomJS-specific question, but a workaround to the problem.
And because i don't want the browser to come up so i use Phantomjs instead
You can continue using Firefox, but start it in a Virtual Display, see more information at:
How do I run Selenium in Xvfb?
You may also need to let the browser automatically save the archive in a specified directory, see:
How do I automatically download files from a pop up dialog using selenium-python
Access to file download dialog in Firefox

Categories

Resources