How to fetch a file from a URL - python

I have a URL, for example
url = "www.example.com/file/processing/path/excelfile.xls"
This URL downloads an Excel file directly when I paste it into a browser.
How can I use Python to download this file? That is, when I run the Python code, the URL above should open in a browser and download the Excel file.

If you don't necessarily need to go through the browser, you can use the urllib module to save a file to a specified location.
import urllib
url = 'http://www.example.com/file/processing/path/excelfile.xls'
local_fname = '/home/John/excelfile.xls'
# Python 2 API; in Python 3 the function lives at urllib.request.urlretrieve
filename, headers = urllib.urlretrieve(url, local_fname)
http://docs.python.org/library/urllib.html#urllib.urlretrieve
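For Python 3, the same idea can be sketched with urllib.request and shutil. This is only a minimal sketch: the function name download and the 30-second timeout are illustrative choices, not part of the original answer.

```python
import shutil
import urllib.request
from pathlib import Path


def download(url: str, destination: str, timeout: float = 30.0) -> Path:
    """Stream the resource at `url` into `destination` and return its path."""
    target = Path(destination)
    # urlopen returns a file-like response; copyfileobj copies it in chunks,
    # so the whole file is never held in memory at once.
    with urllib.request.urlopen(url, timeout=timeout) as response, target.open("wb") as out:
        shutil.copyfileobj(response, out)
    return target
```

Usage would look like download('http://www.example.com/file/processing/path/excelfile.xls', '/home/John/excelfile.xls'). Note that the URL needs its scheme (http:// or https://) for urlopen to accept it.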

Use the webbrowser module:
import webbrowser
webbrowser.open(url)

You should definitely look into the awesome requests lib.
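As a rough illustration of what that could look like, here is a sketch of a streamed download with requests. The helper name download_with_requests, the optional session parameter, and the chunk size are assumptions made for this example; requests itself is a third-party package that has to be installed separately.

```python
from pathlib import Path


def download_with_requests(url, destination, session=None, chunk_size=8192):
    """Stream `url` to `destination` using a requests-style session."""
    if session is None:
        # Imported here so the helper can also be driven by a stand-in session.
        import requests  # third-party: pip install requests
        session = requests.Session()
    target = Path(destination)
    # stream=True defers reading the body until iter_content is consumed.
    with session.get(url, stream=True, timeout=30) as response:
        response.raise_for_status()  # turn 4xx/5xx responses into exceptions
        with target.open("wb") as out:
            for chunk in response.iter_content(chunk_size=chunk_size):
                out.write(chunk)
    return target
```

Calling download_with_requests(url, '/home/John/excelfile.xls') with a full http:// URL would save the file without opening a browser.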

Related

How to download PDF files in Python when the URL doesn't end with .pdf

The URL looks like this: https://apps.websitename.com/AccountOnlineWeb/AccountOnlineCommand?command=getBlobImage&image=11/19/2019 I have tried everything I could think of, but nothing worked.
import requests
from requests.auth import HTTPBasicAuth
url ='https://apps.websitename.com/AccountOnlineWeb/AccountOnlineCommand?command=getBlobImage&image=11/19/2019'
s = requests.Session()
r = requests.get(url, allow_redirects=True, auth=HTTPBasicAuth('username', 'password'))
with open('filepath/file.pdf', 'wb') as f:
    f.write(r.content)
I tested getting a .jpg file from the website to make sure the authentication part works. I also downloaded a PDF from an unauthenticated .pdf URL to make sure PDF downloading works. But I just cannot download this file.
I used r.is_redirect to test whether the URL redirects to another URL for the PDF, but it returned False.
I should mention that when you open the URL manually, it waits for about 2 s and then loads the PDF like a regular PDF, and you can download it just like a regular PDF.
Currently my code downloads a file that is supposed to be the PDF, but it is 0 KB.
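One way to narrow down a 0 KB result is to look at the response before writing it to disk. The sketch below is only a diagnostic aid under that assumption; the helper names looks_like_pdf and describe_payload are made up for this example. It relies on the fact that PDF files begin with the bytes %PDF-. Note also that the code above creates a Session s but calls requests.get directly, so cookies from an earlier request are not reused; whether that matters for this site is not known.

```python
def looks_like_pdf(data: bytes) -> bool:
    """Return True if the payload starts with the PDF header marker."""
    return data.startswith(b"%PDF-")


def describe_payload(status_code: int, content_type: str, data: bytes) -> str:
    """Summarise a response so an empty or non-PDF body is easy to spot."""
    kind = "PDF" if looks_like_pdf(data) else "not a PDF"
    return f"status={status_code} content-type={content_type!r} bytes={len(data)} ({kind})"


# With the question's response object this could be called as:
# print(describe_payload(r.status_code, r.headers.get("Content-Type", ""), r.content))
```

An HTML content type or a zero-length body would suggest the server answered with a login or loader page rather than the document itself.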

Convert CSV file to HTML and display in browser with Pandas

How can I convert a CSV file to HTML and open it in a web browser via Python using pandas?
Below is my program, but it does not display the data in the web page:
import pandas
import webbrowser
data = pandas.read_csv(r'C:\Users\issao\Downloads\data.csv')
data = data.to_html()
webbrowser.open('data.html')
You need to pass a URL to webbrowser.
Save the HTML content to a local file and pass its path to webbrowser:
import os
import webbrowser
import pandas
data = pandas.read_csv(r'C:\Users\issao\Downloads\data.csv')
html = data.to_html()
path = os.path.abspath('data.html')
url = 'file://' + path
with open(path, 'w') as f:
    f.write(html)
webbrowser.open(url)
You're missing a few steps:
Pandas does not build a full HTML page, only a <table> element.
pd.DataFrame({'a': [1,2,3]}).to_html()
Returns: <table border="1" class="dataframe">...</table>
You need to host the HTML somewhere and open a web browser. You can use a local file and run the browser from Python (os.system('firefox page.html')). But I doubt that is what you are looking for.
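To make the "full page" point concrete, here is a small sketch that wraps a table fragment in a complete HTML document and returns a file:// URL suitable for webbrowser.open. The template, the function name write_table_page, and the default title are assumptions for illustration; it uses only the standard library and takes the table as a string, so it works with whatever to_html() returns.

```python
import html
from pathlib import Path

PAGE_TEMPLATE = """<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>{title}</title>
</head>
<body>
{table}
</body>
</html>
"""


def write_table_page(table_html: str, path: str, title: str = "data") -> str:
    """Wrap an HTML table fragment in a full page, save it, return a file:// URL."""
    target = Path(path).resolve()
    page = PAGE_TEMPLATE.format(title=html.escape(title), table=table_html)
    target.write_text(page, encoding="utf-8")
    # as_uri() builds a well-formed file:// URL on both Windows and POSIX paths.
    return target.as_uri()
```

With pandas this might be used as webbrowser.open(write_table_page(data.to_html(), 'data.html')).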
This doesn't answer the OP's question directly, but anyone looking for an alternative to pandas can check out csvtotable (https://github.com/vividvilla/csvtotable), especially the "--serve" option. Sample usage would be something like this: csvtotable data.csv --serve. This "serves" the CSV file to the browser.

How to download a csv file from internet when there is javascript button?

I am trying to download a CSV file from Morningstar using Python.
Here is the link:
http://financials.morningstar.com/income-statement/is.html?t=NAB&region=aus
There is a button to "export CSV" but I can't access the link.
There is this javascript:exportKeyStat2CSV(); but I am not sure how to find the URL of the CSV file.
I downloaded the file manually to get its URL and then tried requests/pandas on that URL, but neither returns anything.
import pandas as pd
URL = "http://financials.morningstar.com/finan/ajax/exportKR2CSV.html?&callback=?&t=XASX:NAB&region=aus&culture=en-US&cur=&order=asc"
df = pd.read_csv(URL)  # didn't work with pandas
import requests
print('Starting to Download!')
r = requests.get(URL)
filename = URL.split('/')[-1]
with open(filename, 'wb') as out_file:
    out_file.write(r.content)
print("Download complete!")
I get a 204 status code from requests.
How do I solve the problem?
Have you tried accessing the resource via Selenium with Python?
See this example:
https://stackoverflow.com/a/51725319/5198805
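Two details of the question's code can be checked without a browser. A 204 status means "No Content", so there is no body to save, and splitting the URL on '/' keeps the whole query string in the filename. The sketch below illustrates both checks with the standard library; the helper names and the default filename are hypothetical.

```python
import posixpath
from urllib.parse import urlsplit


def filename_from_url(url: str, default: str = "download.csv") -> str:
    """Return the last path segment of `url`, ignoring the query string."""
    name = posixpath.basename(urlsplit(url).path)
    return name or default


def has_body(status_code: int, body: bytes) -> bool:
    """True only for a 200 response that actually carries data."""
    return status_code == 200 and len(body) > 0
```

In the question's script, the file would then be written only when has_body(r.status_code, r.content) is true.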

How to download with splinter knowing the URL and name of the file

I am working with Python and splinter. I want to download a file from a click event using splinter. I wrote the following code:
from splinter import Browser
browser = Browser('chrome')
url = "download link"
browser.visit(url)
I want to know how to download with splinter when I know the URL and the name of the file.
Splinter is not involved in the download of a file.
Maybe you need to navigate the page to find the exact URL, but then use the regular requests library for the download:
import requests
url = "some.download.link.com"  # placeholder; a real URL needs its scheme, e.g. http://
result = requests.get(url)
with open('file.pdf', 'wb') as f:
    f.write(result.content)

Using urllib2 in Python

I am trying to do the following via python:
From this website:
http://www.bmf.com.br/arquivos1/arquivos_ipn.asp?idioma=pt-BR&status=ativo
I would like to check the 4th checkbox and then click on Download image.
That is what I did:
import urllib2
import urllib
url = "http://www.bmf.com.br/arquivos1/arquivos_ipn.asp?idioma=pt-BR&status=ativo"
payload = {"chkArquivoDownload3_ativo":"1"}
data = urllib.urlencode(payload)
request = urllib2.Request(url, data)
print request
response = urllib2.urlopen(request)
contents = response.read()
print contents
Does anyone have any suggestions?
Selenium is a great project; it lets you control a Firefox browser from Python. Something like this:
from selenium import webdriver
from selenium.webdriver.common.by import By
browser = webdriver.Firefox()
browser.get('http://www.bmf.com.br/arquivos1/arquivos_ipn.asp?idioma=pt-BR&status=ativo')
# Selenium 4 removed find_element_by_id; find_element(By.ID, ...) is the current form
browser.find_element(By.ID, 'chkArquivoDownload3').click()
browser.find_element(By.ID, 'imgSubmeter_ativo').click()
browser.quit()
would probably work.
Web browsers are a complex collection of components that interact with each other.
Python does not have a web browser built in (in particular, no DOM or JavaScript engine); it simply downloads an HTML file that would normally interact with the DOM and JavaScript in your browser.
The easiest method I foresee:
1. Parse the string using the Python module BeautifulSoup.
2. Manually make the download request with the information you have parsed.
3. Save the downloaded image to a file.
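The steps above can be sketched roughly as follows. To stay free of extra packages, this sketch swaps BeautifulSoup for the standard library's html.parser, and it is written for Python 3 (urllib.request instead of urllib2). The class and function names are hypothetical, and which form fields the real page expects is an assumption based on the payload in the question.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode
from urllib.request import Request


class CheckboxCollector(HTMLParser):
    """Collect the name/value pairs of every <input type="checkbox">."""

    def __init__(self):
        super().__init__()
        self.checkboxes = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "input" and (attributes.get("type") or "").lower() == "checkbox":
            # Browsers submit "on" when a checkbox has no explicit value.
            self.checkboxes.append((attributes.get("name"), attributes.get("value") or "on"))


def find_checkboxes(html_text):
    """Step 1: parse the page and list its checkbox fields."""
    parser = CheckboxCollector()
    parser.feed(html_text)
    return parser.checkboxes


def build_form_request(url, fields):
    """Step 2: build the POST request; supplying data makes urllib use POST."""
    return Request(url, data=urlencode(fields).encode("ascii"))


# Step 3 (not run here): send the request and save the body, for example
# with urllib.request.urlopen(request) as response, open("download.bin", "wb") as out:
#     out.write(response.read())
```

Whether the server accepts such a hand-built request depends on what else the page's JavaScript sends, which is why the Selenium route may still be the simpler option.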
