My purpose is to download the data from this website:
http://transoutage.spp.org/
At the bottom of the page there is a description of how to auto-download the data. For example:
http://transoutage.spp.org/report.aspx?download=true&actualendgreaterthan=3/1/2018&includenulls=true
The code I wrote is this:
import requests

ul_begin = 'http://transoutage.spp.org/report.aspx?download=true'
timeset = '3/1/2018'  # define the time, m/d/yyyy
fn = ''.join(['&actualendgreaterthan=', timeset, '&includenulls=true'])
ul = ul_begin + fn
r = requests.get(ul, verify=False)
If you enter the web address
http://transoutage.spp.org/report.aspx?download=true&actualendgreaterthan=3/1/2018&includenulls=true
into Chrome, it automatically downloads the data as a .csv file. I do not know how to continue my code from here.
Please help!
You need to write the response you receive to a file:
r = requests.get(ul, verify=False)
if 200 <= r.status_code < 300:
    # the request succeeded
    file_path = '<path_where_file_has_to_be_downloaded>'
    # open in binary mode, since r.content is bytes
    with open(file_path, 'wb') as f:
        f.write(r.content)
This will work properly if the csv file is small. For large files, you need to use the stream parameter to download in chunks: http://masnun.com/2016/09/18/python-using-the-requests-module-to-download-large-files-efficiently.html
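A minimal sketch of that streaming approach (the output filename and chunk size here are arbitrary choices, not from the original post):
import requests

url = ('http://transoutage.spp.org/report.aspx?download=true'
       '&actualendgreaterthan=3/1/2018&includenulls=true')

# stream=True avoids loading the whole response into memory at once
r = requests.get(url, stream=True, verify=False)
with open('outages.csv', 'wb') as f:
    for chunk in r.iter_content(chunk_size=8192):
        if chunk:  # skip keep-alive chunks
            f.write(chunk)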
As the title says, I have access to a shared folder where some files are uploaded. I just want to download a specific file, called "db.dta". So, I have this script:
import sys
import requests

def download_file(url, file_name):
    with open(file_name, "wb") as f:
        print("Downloading %s" % file_name)
        response = requests.get(url, stream=True)
        total_length = response.headers.get('content-length')
        if total_length is None:  # no content-length header
            f.write(response.content)
        else:
            dl = 0
            total_length = int(total_length)
            for data in response.iter_content(chunk_size=4096):
                dl += len(data)
                f.write(data)
                # draw a 50-character progress bar
                done = int(50 * dl / total_length)
                sys.stdout.write("\r[%s%s]" % ('=' * done, ' ' * (50 - done)))
                sys.stdout.flush()
    print(" ")
    print('Download successful.')
It actually downloads files from share links if I change dl=0 to dl=1, like this:
https://www.dropbox.com/s/ajklhfalsdfl/db_test.dta?dl=1
The thing is, I don't have the share link of this particular file in the shared folder, so if I use the URL of the file preview, I get an access-denied error (even if I change dl=0 to dl=1).
https://www.dropbox.com/sh/a630ksuyrtw33yo/LKExc-MKDKIIWJMLKFJ?dl=1&preview=db.dta
Error given:
dropbox.exceptions.ApiError: ApiError('22eaf5ee05614d2d9726b948f59a9ec7', GetSharedLinkFileError('shared_link_access_denied', None))
Is there a way to download this file?
If you have the shared link to the parent folder and not the specific file you want, you can use the /2/sharing/get_shared_link_file endpoint to download just the specific file.
In the Dropbox API v2 Python SDK, that's the sharing_get_shared_link_file method (or sharing_get_shared_link_file_to_file). Based on the error output you shared, it looks like you are already using that (though not in the particular code snippet you posted).
Using that would look like this:
import dropbox

dbx = dropbox.Dropbox(ACCESS_TOKEN)

folder_shared_link = "https://www.dropbox.com/sh/a630ksuyrtw33yo/LKExc-MKDKIIWJMLKFJ"
file_relative_path = "/db.dat"

# download a single file out of the shared folder by its relative path
res = dbx.sharing_get_shared_link_file(url=folder_shared_link, path=file_relative_path)

print("Metadata: %s" % res[0])
print("File data: %s bytes" % len(res[1].content))
(You mentioned both "db.dat" and "db.dta" in your question. Make sure you use whichever is actually correct.)
Additionally, note that if you are using a Dropbox API app registered with the "app folder" access type, there's currently a bug that can cause this shared_link_access_denied error when using this method with an access token for an app folder app.
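If you'd rather have the SDK write the file straight to disk instead of holding it in memory, a sketch using the sharing_get_shared_link_file_to_file variant (the local filename here is an arbitrary choice):
import dropbox

dbx = dropbox.Dropbox(ACCESS_TOKEN)

folder_shared_link = "https://www.dropbox.com/sh/a630ksuyrtw33yo/LKExc-MKDKIIWJMLKFJ"
file_relative_path = "/db.dta"

# writes the file contents directly to the given local path
metadata = dbx.sharing_get_shared_link_file_to_file(
    "db.dta", url=folder_shared_link, path=file_relative_path)
print("Downloaded: %s" % metadata.name)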
The goal is to download GTFS data through python web scraping, starting with https://transitfeeds.com/p/agence-metropolitaine-de-transport/129/latest/download
Currently, I'm using requests like so:
import requests

def download(url):
    fpath = "prov/city/GTFS"
    r = requests.get(url)
    if r.ok:
        print("Saving file.")
        open(fpath, "wb").write(r.content)
    else:
        print("Download failed.")
Unfortunately, the content of the response for the above url renders the following:
You can see the files of interest within the output (e.g. stops.txt), but how might I access them to read/write?
I fear you're trying to read a zip file with a text editor; perhaps you should try the "zipfile" module.
The following worked:
import requests

def download(url):
    fpath = "path/to/output/"
    # NOTE: headers must be defined elsewhere (e.g. a User-Agent dict)
    f = requests.get(url, stream=True, headers=headers)
    if f.ok:
        print("Saving to {}".format(fpath))
        with open(fpath + 'output.zip', 'wb') as g:
            g.write(f.content)
    else:
        print("Download failed with error code: ", f.status_code)
You need to write this response to disk as a zip file first.
import requests
url = "https://transitfeeds.com/p/agence-metropolitaine-de-transport/129/latest/download"
fname = "gtfs.zip"
r = requests.get(url)
open(fname, "wb").write(r.content)
Now fname exists and has several text files inside. If you want to programmatically extract the zip and read the content of a file, for example stops.txt, you can extract a single file or simply use extractall.
import zipfile
# this will extract only a single file, and
# raise a KeyError if the file is missing from the archive
zipfile.ZipFile(fname).extract("stops.txt")
# this will extract all the files found from the archive,
# overwriting files in the process
zipfile.ZipFile(fname).extractall()
Now you just need to work with your file(s).
thefile = "stops.txt"
# just plain text
text = open(thefile).read()
# csv file
import csv
reader = csv.reader(open(thefile))
for row in reader:
...
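Alternatively, you can read a member directly out of the archive without extracting anything (a sketch; the UTF-8 encoding is an assumption about the feed):
import csv
import io
import zipfile

with zipfile.ZipFile(fname) as z:
    # ZipFile.open yields a binary stream; wrap it to get text for csv
    with z.open("stops.txt") as raw:
        reader = csv.reader(io.TextIOWrapper(raw, encoding="utf-8"))
        for row in reader:
            print(row)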
There is an image on a webpage that I would like to save to disk using Python.
What I tried first was:
r = requests.get(url, timeout=60)
p = os.path.sep.join([args["output"], "{}.jpeg".format(str(total).zfill(5))])
f = open(p, "wb")
f.write(r.content)
f.close()
But I realized that the saved file is not in image format:
$ file name_of_file
00018.jpeg: HTML document, ASCII text, with very long lines, with no line terminators
Then I tried to:
r = requests.get(url, timeout=60)
p = os.path.sep.join([args["output"], "{}.jpeg".format(str(total).zfill(5))])
f = open(p, "wb")
i = r.raw
q = Image.open(BytesIO(r.content))
print(q.type)
f.write(i)
f.close()
But with no success. What should I do?
UPDATE:
r = requests.get(url, timeout=60)
# save the image to disk
p = os.path.sep.join([args["output"], "{}.jpeg".format(str(total).zfill(5))])
with open("test.jpeg", "wb+") as f:
    f.write(requests.get("name_of_website", headers=headers).content)
When I saved the image manually from the browser, it was in jpg format.
This page requires a cookie; without it, you cannot access the image directly.
An easy way is to add a cookie to your request header:
import requests

headers = {
    "Cookie": "visid_incap_276192=vO9ugmNqRS+XGehZnF1jiwL8kl4AAAAAQUIPAAAAAADc6Z+46+Lp6X9DL0FUaSOv; incap_ses_627_276192=HgPZUq1t1yD2FURXnY2zCAL8kl4AAAAAyQ+1ZeYdSVzPTcurvHnlwA==; JSESSIONID=0001Zh35TV6HDxcVflnHMwIHsqe:-1801K8D; incap_ses_553_276192=XuxOZn9AsVOTcVuFwKasB3P9kl4AAAAAaxsIzIzT5BwV8RqhcTVPsw==",
}

with open("test.jpg", "wb+") as f:
    f.write(requests.get("https://www.e-zpassny.com/vector/jcaptcha.do", headers=headers).content)
Now it downloads the image successfully.
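A hardcoded cookie will eventually expire. A sketch of an alternative, assuming the cookies are set when the site's landing page is first visited, is to let a requests.Session collect them automatically:
import requests

session = requests.Session()

# ASSUMPTION: loading the landing page first sets the cookies the
# captcha endpoint expects; the session then resends them on its own
session.get("https://www.e-zpassny.com/")

img = session.get("https://www.e-zpassny.com/vector/jcaptcha.do")
with open("test.jpg", "wb") as f:
    f.write(img.content)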
I think you should do something like this:
from io import BytesIO
from PIL import Image

r = requests.get(url, timeout=60)
q = Image.open(BytesIO(r.content))
# os.path.join takes separate arguments, not a list; f-string is more compact
fp = os.path.join(args["output"], f"{str(total).zfill(5)}.jpeg")
q.save(fp)
Image.save() is described here.
F-strings are a way of formatting; they are described here and here.
I hope that it's helpful, have a good day!
EDIT:
OK, it looks like that doesn't work.
So, you can try this from here:
r = requests.get(url, timeout=60)
buf = BytesIO(r.content)
buf.seek(0)  # rewind before PIL reads the stream
q = Image.open(buf)
fp = os.path.join(args["output"], f"{str(total).zfill(5)}.jpeg")
q.save(fp)
I am trying to have my server, in Python 3, grab files from URLs. Specifically, I would like to pass a URL into a function and have the function grab an audio file (of many varying formats) and save it as an MP3, probably using ffmpeg or ffmpy. If the URL also has a PDF, I would like to save that as a PDF. I haven't done much research on the PDF yet, but I have been working on the audio piece and wasn't sure if this was even possible.
I have looked at several questions here, but most notably;
How do I download a file over HTTP using Python?
It's a little old, but I tried several methods from it and always got some sort of issue. I have tried the requests library, urllib, streamripper, and maybe one other.
Is there a way to do this and with a recommended library?
For example, most of the ones I have tried do save something, like the html page, or an empty file called 'file.mp3' in this case.
Streamripper returned a "try changing user agents" error.
I am not sure if this is possible, but I am sure there is something I'm not understanding here. Could someone point me in the right direction?
This isn't necessarily the code I'm trying to use, just an example of something I have used that doesn't work.
import requests

url = "http://someurl.com/webcast/something"
r = requests.get(url)
with open('file.mp3', 'wb') as f:
    f.write(r.content)

# retrieve HTTP metadata
print(r.status_code)
print(r.headers['content-type'])
print(r.encoding)
Edit:
import requests
import ffmpy
import datetime
import os

## THIS SCRIPT CAN BE PASSED A URL. IF THE URL RETURNS AN HTTP
## HEADER FOR CONTENT TYPE audio/mpeg, THE FILE WILL BE SAVED
## AS CURRENT-DATE-AND-TIME.MP3.
##
## IF THE URL RETURNS AN HTTP HEADER FOR CONTENT TYPE
## application/pdf, THE FILE WILL BE SAVED AS
## CURRENT-DATE-AND-TIME.PDF.
##
## ANY OTHER CONTENT TYPE WILL NOT BE SAVED.

def BordersPythonDownloader(url):
    print('Beginning file download requests')
    r = requests.get(url, stream=True)
    contype = r.headers['content-type']
    if contype == "audio/mpeg":
        print("audio file")
        filename = '[{}].mp3'.format(str(datetime.datetime.now()))
        with open('file.mp3', 'wb+') as f:
            f.write(r.content)
        # re-encode the temporary download into the timestamped mp3
        ff = ffmpy.FFmpeg(
            inputs={'file.mp3': None},
            outputs={filename: None}
        )
        ff.run()
        if os.path.exists('file.mp3'):
            os.remove('file.mp3')
    elif contype == "application/pdf":
        print("pdf file")
        filename = '[{}].pdf'.format(str(datetime.datetime.now()))
        with open(filename, 'wb+') as f:
            f.write(r.content)
    else:
        print("URL DID NOT RETURN AN AUDIO OR PDF FILE, IT RETURNED {}".format(contype))

# INSERT YOUR URL FOR TESTING,
# OR CALL THIS SCRIPT FROM ELSEWHERE, PASSING IT THE URL

# DEFINE YOUR URL
#url = 'http://archive.org/download/testmp3testfile/mpthreetest.mp3'
# CALL THE SCRIPT, PASSING IT YOUR URL
#x = BordersPythonDownloader(url)

# ANOTHER EXAMPLE WITH A PDF
#url = 'https://www.cisco.com/c/en/us/td/docs/switches/lan/catalyst6500/ios/12-2SY/configuration/guide/sy_swcg/etherchannel.pdf'
#x = BordersPythonDownloader(url)
Thanks Richard, this code works and helps me understand this better. Any suggestions for improving the above working example?
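One possible refinement, sketched below rather than a drop-in replacement: since the response is already requested with stream=True, write it to disk in chunks rather than buffering the whole file in r.content, and build a timestamp without the ':' characters that some filesystems reject (the helper name and chunk size are arbitrary):
import datetime
import requests

def save_stream(url, extension):
    r = requests.get(url, stream=True)
    # e.g. 2018-03-01T12-30-00 -- no ':' so it is safe on Windows too
    stamp = datetime.datetime.now().strftime('%Y-%m-%dT%H-%M-%S')
    filename = '{}.{}'.format(stamp, extension)
    with open(filename, 'wb') as f:
        for chunk in r.iter_content(chunk_size=8192):
            f.write(chunk)
    return filename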
I am trying to automate downloading a .Z file from a website, but the file I get is 2 kb when it should be around 700 kb, and it contains a listing of the page contents (i.e. all the files available for download). I can download it manually without a problem. I have tried urllib and urllib2 with different configurations of each, but each does the same thing. I should add that the urlVar and fileName variables are generated in a different part of the code, but I have given an example of each here to demonstrate.
import urllib2

urlVar = "ftp://www.ngs.noaa.gov/cors/rinex/2014/100/txga/txga1000.14d.Z"
fileName = "txga1000.14d.Z"

downFile = urllib2.urlopen(urlVar)
with open(fileName, "wb") as f:
    f.write(downFile.read())
At least the urllib2 documentation suggests you should use the Request object. This works for me:
import urllib2
req = urllib2.Request("ftp://www.ngs.noaa.gov/cors/rinex/2014/100/txga/txga1000.14d.Z")
response = urllib2.urlopen(req)
data = response.read()
Data length seems to be 740725.
I was able to download what seems like the correct size for your file with the following python2 code:
import urllib2
filename = "txga1000.14d.Z"
url = "ftp://www.ngs.noaa.gov/cors/rinex/2014/100/txga/{}".format(filename)
reply = urllib2.urlopen(url)
buf = reply.read()
with open(filename, "wb") as fh:
fh.write(buf)
Edit: The post above me was answered faster and is much better. I thought I'd post anyway since I had already tested and written this out.
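For what it's worth, on Python 3 urllib2 was merged into urllib.request, so a sketch of the equivalent would be:
import urllib.request

filename = "txga1000.14d.Z"
url = "ftp://www.ngs.noaa.gov/cors/rinex/2014/100/txga/{}".format(filename)

# urlopen handles ftp:// URLs as well as http(s)://
with urllib.request.urlopen(url) as reply:
    buf = reply.read()

with open(filename, "wb") as fh:
    fh.write(buf)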