How to read a file downloaded by Selenium WebDriver in Python

I am using Selenium WebDriver in Python to download a CSV file from a site. The file is saved into the download directory I specify. Here is an overview of my code:
fp = webdriver.FirefoxProfile()
fp.set_preference("browser.download.folderList", 2)
fp.set_preference("browser.download.manager.showWhenStarting", False)
fp.set_preference("browser.download.dir",'xx/yy')
fp.set_preference('browser.helperApps.neverAsk.saveToDisk', "text/plain, application/vnd.ms-excel, text/csv, text/comma-separated-values, application/octet-stream")
driver = webdriver.Firefox(fp)
driver.get('url')
I need to print the contents of this CSV to the terminal. Many similar files with random names will be downloaded into the same folder, so accessing the file by name won't work because I don't know the name in advance.

You can find the most recently modified file in that folder and then read it:
import os
path = '/path/to/folder'
# os.listdir returns bare names, so join them with the folder before calling getmtime
files = [os.path.join(path, name) for name in os.listdir(path)]
time_sorted_files = sorted(files, key=os.path.getmtime)
file_name = time_sorted_files[-1]
You can then read from this file. This assumes that parallel processes are not writing several files into the folder at the same time.
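To make the "newest file" idea a bit more robust, here is a minimal sketch using only the standard library. The helper names (newest_file, wait_for_new_file, print_csv) are made up for illustration. It assumes the browser marks unfinished downloads with a .part (Firefox) or .crdownload (Chrome) suffix, and that the file system's modification times are fine-grained enough to compare against time.time().

```python
import csv
import os
import time


def newest_file(folder, ignore_suffixes=(".part", ".crdownload")):
    """Return the path of the most recently modified file in folder, or None."""
    candidates = []
    for name in os.listdir(folder):
        full_path = os.path.join(folder, name)
        # Skip sub-directories and partially downloaded files.
        if os.path.isfile(full_path) and not name.endswith(ignore_suffixes):
            candidates.append(full_path)
    return max(candidates, key=os.path.getmtime) if candidates else None


def wait_for_new_file(folder, started_at, timeout=30.0, poll=0.5):
    """Poll folder until a finished file modified at or after started_at shows up."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        path = newest_file(folder)
        if path is not None and os.path.getmtime(path) >= started_at:
            return path
        time.sleep(poll)
    raise TimeoutError("no new file appeared in " + folder)


def print_csv(path):
    """Print each row of a CSV file to the terminal."""
    with open(path, newline="") as handle:
        for row in csv.reader(handle):
            print(row)
```

You would record started_at = time.time() just before triggering the download, then call wait_for_new_file('xx/yy', started_at) and pass the result to print_csv. This still cannot tell apart files written by several browser instances at once.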
EDIT:
I just saw the comment that multiple instances are downloading at once. In that case you can instead use urllib and download the file directly from its URL (in Python 3 the function lives in urllib.request):
import urllib.request
urllib.request.urlretrieve("http://www.example.com/yourfile.ext", "your-file-name.ext")  # you can add a unique id to the file name

This answer was put together from previous Stack Overflow questions and answers as well as the comments on this post, so thank you everyone.
I combined Selenium WebDriver and the Python requests module for this solution. Essentially, I logged into the site using Selenium, copied the cookies from the WebDriver session, and then used requests.get(url, cookies=webdriver_cookies) to fetch the file.
Here's the gist of my solution:
import requests
from selenium import webdriver
fp = webdriver.FirefoxProfile()
fp.set_preference("browser.download.folderList", 2)
fp.set_preference("browser.download.manager.showWhenStarting", False)
fp.set_preference("browser.download.dir",'xx/yy')
fp.set_preference('browser.helperApps.neverAsk.saveToDisk', "text/plain, application/vnd.ms-excel, text/csv, text/comma-separated-values, application/octet-stream")
driver = webdriver.Firefox(fp)
# selenium login code ...
driver_cookies = driver.get_cookies()
cookies_copy = {}
for driver_cookie in driver_cookies:
    cookies_copy[driver_cookie["name"]] = driver_cookie["value"]
r = requests.get('url', cookies=cookies_copy)
print(r.text)
I hope that this helps someone
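Since the goal was to print the CSV contents, the downloaded text can also be parsed into rows rather than printed raw. This is a small sketch with the standard csv module. The function names are hypothetical, and text stands for whatever r.text holds in the code above.

```python
import csv
import io


def rows_from_csv_text(text):
    """Parse CSV text (for example r.text) into a list of rows."""
    # io.StringIO lets csv.reader treat the in-memory string like a file.
    return list(csv.reader(io.StringIO(text)))


def print_csv_text(text):
    """Print each parsed row on its own line."""
    for row in rows_from_csv_text(text):
        print(", ".join(row))
```

With the requests response from above, print_csv_text(r.text) would print one row per line, and quoted fields containing commas are handled by the csv module.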

Downloading files in Selenium is never a good idea. You cannot control where and under which filename the file is downloaded, and if you want to find out, then you have to use dirty hacks. It depends on the browser and its settings and if the same file has already been downloaded before or not.
Plus, you have to take care of deleting the file after the download, because otherwise numerous copies of the same file will pile up on your hard drive until it is completely full.
If possible, you should call something like
string downloadUrl = ButtonDownloadPdf.GetAttribute("href");
and then handle the downloading yourself, using conventional methods, not Selenium.
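In Python the same idea could look roughly like the sketch below: read the link with element.get_attribute("href") in Selenium, then fetch it yourself. The function name download_to_folder is made up for illustration. It assumes the URL can be fetched without the browser's login cookies, which is often not the case for pages behind a login.

```python
import os
import shutil
import urllib.parse
import urllib.request
import uuid


def download_to_folder(url, folder):
    """Download url into folder under a unique name and return the new path."""
    # Keep the original base name (if any) and prefix it with a random id so
    # parallel downloads cannot overwrite each other.
    base_name = os.path.basename(urllib.parse.urlparse(url).path) or "download"
    target = os.path.join(folder, uuid.uuid4().hex + "_" + base_name)
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target
```

Because you pick the target path yourself, you know exactly where the file is and can delete it when you are done.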

Related

Download excel file in current folder instead of "downloads"

I am trying to download an Excel file from a website using Selenium in Python.
I need the file to be saved in the current folder instead of the default downloads folder,
but it is not working: the file still ends up in the downloads folder.
mime_types = [
'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet'
]
options = Options()
options.set_preference("browser.download.folderList",2)
options.set_preference("browser.download.manager.showWhenStarting", False)
options.set_preference("browser.download.dir","./")
options.set_preference("browser.helperApps.neverAsk.saveToDisk", ",".join(mime_types))
s = Service(GeckoDriverManager().install())
driver = webdriver.Firefox(service=s, options=options)
driver.get("https://chartink.com/screener/close-below-bb-205")
WebDriverWait(driver, 2).until(EC.element_to_be_clickable((By.XPATH, "/html/body/div[2]/div[2]/div[2]/div/div/div/div[2]/div/div[2]/div[6]/div[1]/div/div[1]/div/button[3]"))).click()
Please let me know what is wrong with the code.
To download a file into a specific directory using Selenium in Python, you need to adjust the about:config entries that control the download location (browser.download.dir and browser.download.folderList in the code below).
Solution
A working solution is to create a FirefoxProfile, create the target directory if it does not exist yet, and point the profile at it as follows:
import os
from selenium import webdriver
newpath = 'C:\\home\\vivvin\\shKLSE'
if not os.path.exists(newpath):
    os.makedirs(newpath)
profile = webdriver.FirefoxProfile()
profile.set_preference("browser.download.dir", newpath)
profile.set_preference("browser.download.folderList", 2)  # 2 = use the custom directory above
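One likely reason the "./" value in the question is ignored is that Firefox seems to expect an absolute path in browser.download.dir. That is an assumption here, not something the answer above states. A minimal sketch that builds the preferences with an absolute path, using a hypothetical helper name:

```python
import os


def firefox_download_prefs(download_dir, mime_types):
    """Build the about:config entries for silent downloads into download_dir."""
    return {
        # 2 means "use the directory given in browser.download.dir".
        "browser.download.folderList": 2,
        # Resolve relative paths such as "./" against the current directory.
        "browser.download.dir": os.path.abspath(download_dir),
        "browser.download.manager.showWhenStarting": False,
        "browser.helperApps.neverAsk.saveToDisk": ",".join(mime_types),
    }


# Possible usage with the Options object from the question (not executed here):
# for name, value in firefox_download_prefs("./", mime_types).items():
#     options.set_preference(name, value)
```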
Reference
You can find a couple of detailed discussions in:
Downloading file through Selenium Webdriver in python
Python: Unable to download with selenium in webpage

Save PDF by using Selenium and IE Browser

Saving a PDF with the Chrome browser does not cause any issues (I'm using these options):
options.add_experimental_option('prefs',{
'credentials_enable_service': False,
'plugins':{
'always_open_pdf_externally': True
},
'profile': {
'password_manager_enabled': False,
},
'download': {
'prompt_for_download': False,
'directory_upgrade': True,
'default_directory': ''
}
})
But how do I save a PDF with webdriver.Ie() (the Internet Explorer driver) using Python and Selenium?
P.S. AFAIK Internet Explorer cannot be run in headless mode, but if someone knows a way to do it, that would be amazing!
You can't use Selenium to handle the download prompt in IE because it is an OS-level prompt, and Selenium WebDriver has no way to automate OS-level windows. You need a third-party tool to download files in IE when using Selenium.
Here I use Wget to bypass the download prompt and download the file. You can refer to this article about how to use Wget.
For headless mode in IE with Selenium, you can use another third-party tool called headless_ie_selenium. Download it and use headless_ie_selenium.exe instead of IEDriverServer.exe to automate IE.
Sample code to download a PDF file is shown below; remember to change the paths in the code to your own:
from selenium import webdriver
import time
import os
url = "https://file-examples.com/index.php/sample-documents-download/sample-pdf-download/"
driver = webdriver.Ie('D:\\headless-selenium-for-win-v1-4\\headless_ie_selenium.exe')
driver.get(url)
time.sleep(3)
link = driver.find_elements_by_class_name("download-button")[0]
hrefurl = link.get_attribute("href")
os.system('cmd /c C:\\Wget\\wget.exe -P D:\\Download --no-check-certificate ' + hrefurl)
print("*******************")
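As a side note, building the command as one string for os.system breaks when the URL or a path contains spaces or shell metacharacters. A hedged alternative sketch passes the arguments as a list to subprocess.run. The helper names are invented for illustration, and the wget flags are the same ones used in the code above.

```python
import subprocess


def wget_command(wget_exe, url, dest_dir):
    """Return the wget invocation as an argument list (no shell quoting needed)."""
    return [wget_exe, "-P", dest_dir, "--no-check-certificate", url]


def run_wget(wget_exe, url, dest_dir):
    """Run wget and raise CalledProcessError if it exits with a non-zero status."""
    subprocess.run(wget_command(wget_exe, url, dest_dir), check=True)
```

With the variables from the sample, the call would be run_wget('C:\\Wget\\wget.exe', hrefurl, 'D:\\Download').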

send files to a website through a 'browse files' on pc

I'm browsing a website using dryscrape in Python and I need to upload a file to it. The only way to do that is to click a button, browse my files, and select the one I want. How can I do this with Python? I would prefer an answer that uses dryscrape, but I'm open to any solution.
Here's the example image:
You can use Selenium. I tested this code and it works.
from time import sleep
from selenium import webdriver
url = "https://example.com/"
driver = webdriver.Chrome("./chromedriver")
driver.get(url)
input_element = driver.find_element_by_css_selector("input[type=\"file\"]")
# absolute path to file
abs_file_path = "/Users/foo/Downloads/bar.png"
input_element.send_keys(abs_file_path)
sleep(5)
driver.quit()
Resources
Selenium python
Chrome driver download
For those looking for a dryscrape answer, I translated the Selenium code to dryscrape:
element = session.at_xpath("xpath...") # session is a dryscrape session
# The xpath is that of the button like the one in the image, "Browse..."
element.set("fullpath")
It's as simple as that.

Using Selenium with Python and PhantomJS to download file to filesystem

I've been grappling with using PhantomJS/Selenium/python-selenium to download a file to the filesystem.
I'm able to easily navigate through the DOM and click, hover, etc. Downloading a file, however, is proving to be quite troublesome. I've tried a headless approach with Firefox and pyvirtualdisplay, but that wasn't working well either and was unbelievably slow. I know that CasperJS allows for file downloads. Does anyone know how to integrate CasperJS with Python, or how to use PhantomJS to download files? Much appreciated.
Although this question is quite old, downloading files through PhantomJS is still a problem. However, we can use PhantomJS to get the download link and collect all the cookies needed (CSRF tokens and so on), and then use requests to actually download the file:
import requests
from selenium import webdriver
driver = webdriver.PhantomJS()
driver.get('page_with_download_link')
download_link = driver.find_element_by_id('download_link').get_attribute('href')  # requests needs the URL, not the WebElement
session = requests.Session()
cookies = driver.get_cookies()
for cookie in cookies:
session.cookies.set(cookie['name'], cookie['value'])
response = session.get(download_link)
The actual file content should now be in response.content. We can then write it to disk with open() or do whatever else we want with it.
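If requests is not available, roughly the same cookie hand-off can be sketched with the standard library by sending the browser's cookies in a single Cookie header. The function names below are hypothetical. The sketch assumes each Selenium cookie dict has "name" and "value" keys, as returned by driver.get_cookies(), and it ignores cookie domain and path restrictions.

```python
import urllib.request


def cookie_header(selenium_cookies):
    """Join Selenium cookie dicts into a single Cookie header value."""
    return "; ".join(c["name"] + "=" + c["value"] for c in selenium_cookies)


def build_request(url, selenium_cookies):
    """Create a request for url that carries the browser session's cookies."""
    request = urllib.request.Request(url)
    request.add_header("Cookie", cookie_header(selenium_cookies))
    return request


def save_response(request, path):
    """Fetch the request and write the raw body to path."""
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())
```

With the names from the answer, that would be save_response(build_request(download_link, driver.get_cookies()), 'file.ext'), where 'file.ext' is a placeholder file name.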
PhantomJS doesn't currently support file downloads. Relevant issues with workarounds:
File download
How to handle file save dialog box using Selenium webdriver and PhantomJS?
As far as I understand, you have at least 3 options:
switch to casperjs (and you should leave python here)
try with headless on xvfb
switch to normal non-headless browsers
Here are also some links that might help too:
Selenium Headless Automated Testing in Ubuntu
XWindows for Headless Selenium (with further links inside)
How to run browsers(chrome, IE and firefox) in headless mode?
Tutorial: How to use Headless Firefox for Scraping in Linux
My use case required a form submission to retrieve the file. I was able to accomplish this using the driver's execute_async_script() function.
js = '''
var callback = arguments[0];
var theForm = document.forms['theFormId'];
var data = new FormData();
data.append('eventTarget', "''' + target + '''"); // this is the id of the file clicked
data.append('otherFormField', theForm.otherFormField.value);
var xhr = new XMLHttpRequest();
xhr.open('POST', theForm.action, true);
'''
for cookie in driver.get_cookies():
js += ' xhr.setRequestHeader("' + cookie['name'] + '", "' + cookie['value'] + '"); '
js += '''
xhr.onload = function () {
callback(this.responseText);
};
xhr.send(data);
'''
driver.set_script_timeout(30)
file = driver.execute_async_script(js)
It is not possible that way. You can use alternatives such as wget or curl to download files.
Use Firefox to find the right request, use Selenium to get the values for it, and finally use an external tool to download the file:
import subprocess
curlCall=" curl 'http://www_sitex_org/descarga.jsf' -H '...allCurlRequest....' > file.xml"
subprocess.call(curlCall, shell=True)

Firefox + Selenium WebDriver and download a csv file automatically

I have a problem with Selenium WebDriver and Firefox. I want to download a CSV file without having to confirm it in a dialog window, and I have code like this:
fp = webdriver.FirefoxProfile()
fp.set_preference("browser.download.folderList",2)
fp.set_preference("browser.download.dir", download_dir)
fp.set_preference("browser.download.manager.showWhenStarting",False)
fp.set_preference("browser.helperApps.neverAsk.saveToDisk","text/csv")
but it does not seem to work.
I tried many combinations for browser.helperApps.neverAsk.saveToDisk:
fp.set_preference("browser.helperApps.neverAsk.saveToDisk","text/csv,application/csv,text/plan,text/comma-separated-values")
or
fp.set_preference("browser.helperApps.neverAsk.saveToDisk","application/csv")
fp.set_preference("browser.helperApps.neverAsk.saveToDisk","text/plain")
fp.set_preference("browser.helperApps.neverAsk.saveToDisk","text/comma-separated-values")
but there's no difference and Firefox won't download automatically.
How can I fix it?
Sometimes the content type is not what you'd expect.
Use the HttpFox Firefox plugin (or something similar) to find the real content type of the file and use it in your code.
BTW, for me the content type was application/octet-stream:
fp.set_preference("browser.helperApps.neverAsk.openFile", "application/octet-stream");
fp.set_preference("browser.helperApps.neverAsk.saveToDisk", "application/octet-stream");
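If a browser plugin is not at hand, the content type can usually also be read from the response headers in a few lines of Python. This is a sketch with made-up function names. It assumes the server answers HEAD requests and does not require the browser's login cookies, which may not hold for your site.

```python
import urllib.request


def mime_type_from_header(content_type):
    """Strip parameters such as '; charset=utf-8' from a Content-Type value."""
    return content_type.split(";", 1)[0].strip().lower()


def served_mime_type(url):
    """Request only the headers for url and return the MIME type the server reports."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request) as response:
        return mime_type_from_header(response.headers.get("Content-Type", ""))
```

The value returned by served_mime_type for the file's URL is what would go into browser.helperApps.neverAsk.saveToDisk.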
SetPreference("browser.helperApps.neverAsk.saveToDisk", "application/comma-separated-values ,text/csv"); //in java selenium
This should work for downloading all types of CSV files.
Now (May 2016),
SetPreference("browser.helperApps.neverAsk.saveToDisk", "text/csv"); // C#
works for me
