Using python to download table data without .csv file address provided - python

My purpose is to download the data from this website:
http://transoutage.spp.org/
When you open this website, at the bottom of the page there is a description of how to auto-download the data. For example:
http://transoutage.spp.org/report.aspx?download=true&actualendgreaterthan=3/1/2018&includenulls=true
The code I wrote is this:
import requests

ul_begin = 'http://transoutage.spp.org/report.aspx?download=true'
timeset = '3/1/2018'  # define the date, m/d/yyyy
fn = '&actualendgreaterthan=' + timeset + '&includenulls=true'
ul = ul_begin + fn
r = requests.get(ul, verify=False)
If you enter the web address
http://transoutage.spp.org/report.aspx?download=true&actualendgreaterthan=3/1/2018&includenulls=true
into Chrome, it auto-downloads the data as a .csv file. I do not know how to continue my code.
Please help me!!!!

You need to write the response you receive to a file:
r = requests.get(ul, verify=False)
if 200 <= r.status_code < 300:
    # the request succeeded
    file_path = '<path_where_file_has_to_be_downloaded>'
    with open(file_path, 'wb') as f:  # r.content is bytes, so write in binary mode
        f.write(r.content)
This will work fine if the csv file is small, but for large files you should use the stream parameter to download in chunks: http://masnun.com/2016/09/18/python-using-the-requests-module-to-download-large-files-efficiently.html
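For large files, a streamed version of the same request might look like this (a sketch; ul and file_path are carried over from the snippet above):
r = requests.get(ul, verify=False, stream=True)
if 200 <= r.status_code < 300:
    with open(file_path, 'wb') as f:
        # iterate over the body in chunks instead of loading it all into memory
        for chunk in r.iter_content(chunk_size=8192):
            f.write(chunk)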

Related

Download a single file from a shared Dropbox's folder without having the share link of this file

As the title says, I have access to a shared folder where some files are uploaded. I just want to download a specific file, called "db.dta". So, I have this script:
import sys
import requests

def download_file(url, filename):
    with open(filename, "wb") as f:
        print("Downloading %s" % filename)
        response = requests.get(url, stream=True)
        total_length = response.headers.get('content-length')
        if total_length is None:  # no content-length header
            f.write(response.content)
        else:
            dl = 0
            total_length = int(total_length)
            for data in response.iter_content(chunk_size=4096):
                dl += len(data)
                f.write(data)
                done = int(50 * dl / total_length)
                sys.stdout.write("\r[%s%s]" % ('=' * done, ' ' * (50 - done)))
                sys.stdout.flush()
    print(" ")
    print('Download successful.')
It actually downloads files from share links if I change dl=0 to dl=1, like this:
https://www.dropbox.com/s/ajklhfalsdfl/db_test.dta?dl=1
The thing is, I don't have the share link for this particular file in the shared folder, so if I use the URL of the file preview, I get an access denied error (even if I change dl=0 to dl=1).
https://www.dropbox.com/sh/a630ksuyrtw33yo/LKExc-MKDKIIWJMLKFJ?dl=1&preview=db.dta
Error given:
dropbox.exceptions.ApiError: ApiError('22eaf5ee05614d2d9726b948f59a9ec7', GetSharedLinkFileError('shared_link_access_denied', None))
Is there a way to download this file?
If you have the shared link to the parent folder and not the specific file you want, you can use the /2/sharing/get_shared_link_file endpoint to download just the specific file.
In the Dropbox API v2 Python SDK, that's the sharing_get_shared_link_file method (or sharing_get_shared_link_file_to_file). Based on the error output you shared, it looks like you are already using that (though not in the particular code snippet you posted).
Using that would look like this:
import dropbox
dbx = dropbox.Dropbox(ACCESS_TOKEN)
folder_shared_link = "https://www.dropbox.com/sh/a630ksuyrtw33yo/LKExc-MKDKIIWJMLKFJ"
file_relative_path = "/db.dat"
res = dbx.sharing_get_shared_link_file(url=folder_shared_link, path=file_relative_path)
print("Metadata: %s" % res[0])
print("File data: %s bytes" % len(res[1].content))
(You mentioned both "db.dat" and "db.dta" in your question. Make sure you use whichever is actually correct.)
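If you would rather have the file written straight to disk instead of held in memory, the sharing_get_shared_link_file_to_file variant mentioned above takes a local download path as its first argument. A minimal sketch (the local path "db.dta" is just an example):
import dropbox

dbx = dropbox.Dropbox(ACCESS_TOKEN)
folder_shared_link = "https://www.dropbox.com/sh/a630ksuyrtw33yo/LKExc-MKDKIIWJMLKFJ"
file_relative_path = "/db.dta"
# writes the file contents directly to the given local path
dbx.sharing_get_shared_link_file_to_file("db.dta", folder_shared_link, path=file_relative_path)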
Additionally, note that if you are using a Dropbox API app registered with the "app folder" access type, there's currently a bug that can cause this shared_link_access_denied error when using this method with an access token for an app folder app.

python download folder of text files

The goal is to download GTFS data through python web scraping, starting with https://transitfeeds.com/p/agence-metropolitaine-de-transport/129/latest/download
Currently, I'm using requests like so:
def download(url):
    fpath = "prov/city/GTFS"
    r = requests.get(url)
    if r.ok:
        print("Saving file.")
        open(fpath, "wb").write(r.content)
    else:
        print("Download failed.")
The result of requests.content for the above URL unfortunately renders as mostly unreadable binary output. You can see the files of interest within it (e.g. stops.txt), but how might I access them to read/write?
I fear you're trying to read a zip file with a text editor; perhaps you should try using the "zipfile" module instead.
The following worked:
def download(url):
    fpath = "path/to/output/"
    f = requests.get(url, stream=True, headers=headers)  # headers defined elsewhere
    if f.ok:
        print("Saving to {}".format(fpath))
        g = open(fpath + 'output.zip', 'wb')
        g.write(f.content)
        g.close()
    else:
        print("Download failed with error code: ", f.status_code)
You need to write the response to a zip file:
import requests
url = "https://transitfeeds.com/p/agence-metropolitaine-de-transport/129/latest/download"
fname = "gtfs.zip"
r = requests.get(url)
open(fname, "wb").write(r.content)
Now fname exists and has several text files inside. If you want to programmatically extract this zip and then read the content of a file, for example stops.txt, you can either extract a single file or extract everything with extractall.
import zipfile
# this will extract only a single file, and
# raise a KeyError if the file is missing from the archive
zipfile.ZipFile(fname).extract("stops.txt")
# this will extract all the files found from the archive,
# overwriting files in the process
zipfile.ZipFile(fname).extractall()
Now you just need to work with your file(s).
thefile = "stops.txt"
# just plain text
text = open(thefile).read()
# csv file
import csv
reader = csv.reader(open(thefile))
for row in reader:
    ...
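If you would rather not extract anything to disk, you can also read stops.txt straight out of the archive with the standard library (a sketch, reusing fname from above):
import csv
import io
import zipfile

with zipfile.ZipFile(fname) as z:
    # ZipFile.open returns a binary file-like object, so wrap it in a
    # TextIOWrapper before handing it to the csv module
    with z.open("stops.txt") as raw:
        reader = csv.reader(io.TextIOWrapper(raw, encoding="utf-8"))
        for row in reader:
            print(row)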

Saving an image from a webpage using requests

There is an image on a webpage which I would like to save on my disk using python.
What I tried to do was:
r = requests.get(url, timeout=60)
p = os.path.sep.join([args["output"], "{}.jpeg".format(str(total).zfill(5))])
f = open(p, "wb")
f.write(r.content)
f.close()
But I realized that the saved file is not actually in an image format:
$ file 00018.jpeg
00018.jpeg: HTML document, ASCII text, with very long lines, with no line terminators
Then I tried to:
r = requests.get(url, timeout=60)
p = os.path.sep.join([args["output"], "{}.jpeg".format(str(total).zfill(5))])
f = open(p, "wb")
i = r.raw
q = Image.open(BytesIO(r.content))
print(q.type)
f.write(i)
f.close()
But with no success. What should I do?
UPDATE:
r = requests.get(url, timeout=60)
# save the image to disk
p = os.path.sep.join([args["output"], "{}.jpeg".format(str(total).zfill(5))])
with open("test.jpeg", "wb+") as f:
    f.write(requests.get("name_of_website", headers=headers).content)
When I copied the image manually from the web using the cursor, it was in jpg format.
This page needs a cookie; without one, you cannot visit it directly.
An easy way is to add a cookie to your request headers:
import requests
headers = {
    "Cookie": "visid_incap_276192=vO9ugmNqRS+XGehZnF1jiwL8kl4AAAAAQUIPAAAAAADc6Z+46+Lp6X9DL0FUaSOv; incap_ses_627_276192=HgPZUq1t1yD2FURXnY2zCAL8kl4AAAAAyQ+1ZeYdSVzPTcurvHnlwA==; JSESSIONID=0001Zh35TV6HDxcVflnHMwIHsqe:-1801K8D; incap_ses_553_276192=XuxOZn9AsVOTcVuFwKasB3P9kl4AAAAAaxsIzIzT5BwV8RqhcTVPsw==",
}
with open("test.jpg", "wb+") as f:
    f.write(requests.get("https://www.e-zpassny.com/vector/jcaptcha.do", headers=headers).content)
Now it can download the image successfully.
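Those incap_ses values expire, so hard-coding the cookie is brittle. A requests.Session can usually collect the cookies automatically by visiting the site first; a sketch, assuming the landing page at https://www.e-zpassny.com/ sets them:
import requests

s = requests.Session()
# hitting the landing page first lets the session pick up the cookies
s.get("https://www.e-zpassny.com/")
with open("test.jpg", "wb+") as f:
    f.write(s.get("https://www.e-zpassny.com/vector/jcaptcha.do").content)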
I think you should do something like this:
r = requests.get(url, timeout=60)
q = Image.open(BytesIO(r.content))
fp = os.path.join(args["output"], f"{str(total).zfill(5)}.jpeg")  # f-string used here because it looks more compact
q.save(fp)  # Image.save() returns None, so there is no need to assign it
Image.save() is described here.
F-strings are a way of formatting strings; they are described here and here.
I hope that it's helpful, have a good day!
EDIT:
OK, it looks like that doesn't work. So, you can try this from here:
r = requests.get(url, timeout=60)
buf = BytesIO(r.content)  # renamed from "bytes" to avoid shadowing the builtin
buf.seek(0)
q = Image.open(buf)
fp = os.path.join(args["output"], f"{str(total).zfill(5)}.jpeg")
q.save(fp)
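A defensive check worth adding (a sketch; url, args, and total come from your snippets): verify the response is actually an image before decoding it, since the first attempt silently saved an HTML page under a .jpeg name:
import os
import requests
from io import BytesIO
from PIL import Image

r = requests.get(url, timeout=60)
ctype = r.headers.get("content-type", "")
if ctype.startswith("image/"):
    fp = os.path.join(args["output"], f"{str(total).zfill(5)}.jpeg")
    Image.open(BytesIO(r.content)).save(fp)
else:
    print(f"Expected an image, got {ctype!r} instead")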

Save streaming audio from URL as MP3, or even just audio file from URL as MP3

I am trying to have my server, in Python 3, grab files from URLs. Specifically, I would like to pass a URL into a function and have it grab an audio file (of many varying formats) and save it as an MP3, probably using ffmpeg or ffmpy. If the URL also has a PDF, I would like to save that as a PDF too. I haven't done much research on the PDF part yet, but I have been working on the audio piece and wasn't sure if this was even possible.
I have looked at several questions here, but most notably;
How do I download a file over HTTP using Python?
It's a little old, but I tried several methods from it and always ran into some sort of issue. I have tried the requests library, urllib, streamripper, and maybe one other.
Is there a way to do this and with a recommended library?
For example, most of the approaches I tried do save something, like the HTML page, or an empty file called 'file.mp3' in this case.
Streamripper returned a "try changing user agents" error.
I am not sure if this is possible, but I am sure there is something I'm not understanding here, could someone point me in the right direction?
This isn't necessarily the code I'm trying to use, just an example of something I have used that doesn't work.
import requests
url = "http://someurl.com/webcast/something"
r = requests.get(url)
with open('file.mp3', 'wb') as f:
    f.write(r.content)
# Retrieve HTTP meta-data
print(r.status_code)
print(r.headers['content-type'])
print(r.encoding)
Edit:
import requests
import ffmpy
import datetime
import os
## THIS SCRIPT CAN BE PASSED A URL AND IF THE URL RETURNS
## HTTP HEADER FOR CONTENT TYPE AUDIO/MPEG, THE FILE WILL
## BE SAVED AS THE CURRENT-DATE-AND-TIME.MP3
##
## THIS SCRIPT CAN BE PASSED A URL AND IF THE URL RETURNS
## HTTP HEADER FOR CONTENT TYPE application/pdf, THE FILE WILL
## BE SAVED AS THE CURRENT-DATE-AND-TIME.PDF
##
## THIS SCRIPT CAN BE PASSED A URL AND IF THE URL RETURNS
## HTTP HEADER FOR CONTENT TYPE other than application/pdf, OR
## audio/mpeg, THE FILE WILL NOT BE SAVED
def BordersPythonDownloader(url):
    print('Beginning file download requests')
    r = requests.get(url, stream=True)
    contype = r.headers['content-type']
    if contype == "audio/mpeg":
        print("audio file")
        filename = '[{}].mp3'.format(str(datetime.datetime.now()))
        with open('file.mp3', 'wb+') as f:
            f.write(r.content)
        ff = ffmpy.FFmpeg(
            inputs={'file.mp3': None},
            outputs={filename: None}
        )
        ff.run()
        if os.path.exists('file.mp3'):
            os.remove('file.mp3')
    elif contype == "application/pdf":
        print("pdf file")
        filename = '[{}].pdf'.format(str(datetime.datetime.now()))
        with open(filename, 'wb+') as f:
            f.write(r.content)
    else:
        print("URL DID NOT RETURN AN AUDIO OR PDF FILE, IT RETURNED {}".format(contype))
# INSERT YOUR URL FOR TESTING
# OR CALL THIS SCRIPT FROM ELSEWHERE, PASSING IT THE URL
#DEFINE YOUR URL
#url = 'http://archive.org/download/testmp3testfile/mpthreetest.mp3'
#CALL THE SCRIPT; PASSING IT YOUR URL
#x = BordersPythonDownloader(url)
#ANOTHER EXAMPLE WITH A PDF
#url = 'https://www.cisco.com/c/en/us/td/docs/switches/lan/catalyst6500/ios/12-2SY/configuration/guide/sy_swcg/etherchannel.pdf'
#x = BordersPythonDownloader(url)
Thanks Richard, this code works and helps me understand this better. Any suggestions for improving the above working example?
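One possible improvement (a sketch, not part of the original answer): the request already passes stream=True, so writing the response in chunks with iter_content would avoid holding a large audio file fully in memory before the ffmpeg conversion:
with open('file.mp3', 'wb+') as f:
    for chunk in r.iter_content(chunk_size=8192):
        f.write(chunk)  # write each chunk as it arrives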

downloading a file, not the contents

I am trying to automate downloading a .Z file from a website, but the file I get is 2 kb when it should be around 700 kb, and it contains a listing of the page contents (i.e. all the files available for download). I am able to download the file manually without a problem. I have tried urllib and urllib2 with different configurations of each, but each does the same thing. I should add that the urlVar and fileName variables are generated in a different part of the code, but I have given an example of each here to demonstrate.
import urllib2

urlVar = "ftp://www.ngs.noaa.gov/cors/rinex/2014/100/txga/txga1000.14d.Z"
fileName = "txga1000.14d.Z"
downFile = urllib2.urlopen(urlVar)
with open(fileName, "wb") as f:
    f.write(downFile.read())
At least the urllib2 documentation suggests you should use a Request object. This works for me:
import urllib2
req = urllib2.Request("ftp://www.ngs.noaa.gov/cors/rinex/2014/100/txga/txga1000.14d.Z")
response = urllib2.urlopen(req)
data = response.read()
Data length seems to be 740725.
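For reference, a Python 3 equivalent would use urllib.request instead of urllib2 (a sketch; urlopen handles ftp:// URLs as well):
import urllib.request

url = "ftp://www.ngs.noaa.gov/cors/rinex/2014/100/txga/txga1000.14d.Z"
with urllib.request.urlopen(url) as response:
    data = response.read()
with open("txga1000.14d.Z", "wb") as f:
    f.write(data)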
I was able to download what seems like the correct size for your file with the following python2 code:
import urllib2
filename = "txga1000.14d.Z"
url = "ftp://www.ngs.noaa.gov/cors/rinex/2014/100/txga/{}".format(filename)
reply = urllib2.urlopen(url)
buf = reply.read()
with open(filename, "wb") as fh:
    fh.write(buf)
Edit: The post above me was answered faster and is much better. I thought I'd post this anyway, since I had already tested and written it out.
