This question already has answers here:
Download large file in python with requests
(8 answers)
Closed 4 years ago.
I'm trying to download a binary file and save it with its original name on disk (Linux).
Any ideas?
import requests
params = {'apikey': 'xxxxxxxxxxxxxxxxxxx', 'hash':'xxxxxxxxxxxxxxxxxxxxxxxxx'}
response = requests.get('https://www.test.com/api/file/download', params=params)
downloaded_file = response.content
if response.status_code == 200:
    with open('/tmp/', 'wb') as f:
        f.write(response.content)
From your clarification in the comments, your issue is that you want to keep the file's original name.
If the URL points directly at the raw binary data, then the last part of the URL is its "original name", so you can get it by parsing the URL as follows:
local_filename = url.split('/')[-1]
To put this into practice, and considering the context of the question, here is the code that does exactly what you need, copied as is from another SO question:
def download_file(url):
    local_filename = url.split('/')[-1]
    # NOTE the stream=True parameter
    r = requests.get(url, stream=True)
    with open(local_filename, 'wb') as f:
        for chunk in r.iter_content(chunk_size=1024):
            if chunk:  # filter out keep-alive new chunks
                f.write(chunk)
                # f.flush() commented by recommendation from J.F.Sebastian
    return local_filename
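For example, a quick usage sketch (assuming requests is imported as above; the URL is just a placeholder for any link whose last path segment is the file name):
file_path = download_file('https://example.com/files/report.pdf')
print(file_path)  # -> report.pdf, which is also the name of the file written to disk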
Couldn't post this as a comment, so had to put it in an answer. I hope I have been clear enough. Tell me if you have any issues with the code. And when the issue is resolved, please also inform me so I can then delete this as it's already been answered.
EDIT
Here is a version adapted to your code:
import requests

url = 'https://www.test.com/api/file/download'
params = {'apikey': 'xxxxxxxxxxxxxxxxxxx', 'hash': 'xxxxxxxxxxxxxxxxxxxxxxxxx'}
# stream=True belongs in the get() call itself, not in the query parameters
response = requests.get(url, params=params, stream=True)
local_filename = url.split('/')[-1]
totalbytes = 0
if response.status_code == 200:
    with open(local_filename, 'wb') as f:
        for chunk in response.iter_content(chunk_size=1024):
            if chunk:
                totalbytes += len(chunk)
                print("Downloaded", totalbytes // 1024, "KB...")
                f.write(chunk)
NOTE: if you don't want it to show progress, just remove the print statement. This was tested using this URL: https://imagecomics.com/uploads/releases/_small/DeadRabbit-02_cvr.jpg and it seemed to work pretty well. Again, if you have any issues, just comment down below.
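One more thing worth checking (an assumption on my part, since I don't know your API): when the download URL takes the file hash as a query parameter, url.split('/')[-1] will only ever give you "download". Many servers advertise the real name in a Content-Disposition header instead, so here is a hedged sketch that reads it and falls back to the URL segment:
import re
import requests

url = 'https://www.test.com/api/file/download'
params = {'apikey': 'xxxxxxxxxxxxxxxxxxx', 'hash': 'xxxxxxxxxxxxxxxxxxxxxxxxx'}
response = requests.get(url, params=params, stream=True)

# fall back to the last URL segment if the header is missing
disposition = response.headers.get('Content-Disposition', '')
match = re.search(r'filename="?([^";]+)"?', disposition)
local_filename = match.group(1) if match else url.split('/')[-1]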
Related
Well guys, I come here in times of need. I've been trying to develop a batch job, and the first step of this batch would be downloading a zipped file from the web. The first code that I tried was this:
import requests

url = "http://servicos.ibama.gov.br/ctf/publico/areasembargadas/ConsultaPublicaAreasEmbargadas.php"
save_path = "C:/Users/gb2gaet"
proxies = {
    # "I had to erase this for safety reasons"
}

r = requests.get(url, proxies=proxies, stream=True, verify=False)
handle = open('test.zip', "wb")
for chunk in r.iter_content(chunk_size=512):
    if chunk:
        handle.write(chunk)
handle.close()
It turns out that I get a zipped file that can't be opened. After a long search, I came across a possible solution that would be something like this:
import requests, zipfile, io

url = "http://servicos.ibama.gov.br/ctf/publico/areasembargadas/ConsultaPublicaAreasEmbargadas.php"
save_path = "C:/Users/gb2gaet"
proxies = {
    # you know
}

r = requests.get(url, proxies=proxies, stream=True, verify=False)
z = zipfile.ZipFile(io.BytesIO(r.content))
z.extractall(save_path)
but all I ended up getting was this error message:
zipfile.BadZipFile: File is not a zip file
I'd be forever grateful if any of you guys could help me on this matter.
from urllib.request import urlopen

# write the raw response body straight to a local file
open('Sample1.zip', 'wb').write(urlopen('Valid Url for Zip File').read())
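If the archive is large, a streamed variant keeps it out of memory; here is a minimal sketch using only the standard library (the URL and file name are placeholders):
import shutil
from urllib.request import urlopen

# copy the response body to disk in chunks instead of one big read()
with urlopen('Valid Url for Zip File') as response, open('Sample1.zip', 'wb') as out:
    shutil.copyfileobj(response, out)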
I've recently come across the functionality of the requests package (http://docs.python-requests.org/en/latest/user/advanced/#body-content-workflow) that allows you to defer downloading the response body until you access Response.content, as described here:
https://stackoverflow.com/a/16696317/8376187
def download_file(url):
    local_filename = url.split('/')[-1]
    # NOTE the stream=True parameter
    r = requests.get(url, stream=True)
    with open(local_filename, 'wb') as f:
        for chunk in r.iter_content(chunk_size=1024):
            if chunk:  # filter out keep-alive new chunks
                f.write(chunk)
    return local_filename
I use this to stream videos, and since the headers of the file are present, my video player reads the video smoothly.
I would like to do the same with an SSH/SFTP file transfer. I have tried to use paramiko for that, but my code reads the file without getting the indexes and headers of the file, making my video player fail, and it is also very slow.
The code (assuming a connected paramiko SSHClient()):
sftp_client = client.open_sftp()
remote_file = sftp_client.open('remotefile')
with open('localfile', 'wb') as f:
    try:
        data = remote_file.read(1024)
        while data:
            f.write(data)
            data = remote_file.read(1024)
    finally:
        remote_file.close()
Is there a way to reproduce the behavior of requests' "stream=True" option with an SSH/SFTP transfer in Python?
Thanks :)
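Not a full answer, but one thing I would try (an untested assumption on my side): paramiko's SFTPFile has a prefetch() method that pipelines the read requests, which usually removes most of the per-chunk round-trip latency; combined with a larger read size it gets much closer to a streamed HTTP download. A sketch, assuming client is the connected SSHClient from the snippet above:
sftp_client = client.open_sftp()
remote_file = sftp_client.open('remotefile', 'rb')
try:
    remote_file.prefetch()  # ask paramiko to request blocks ahead of the reads
    with open('localfile', 'wb') as f:
        while True:
            data = remote_file.read(32768)  # larger chunks than 1024 also help over high-latency links
            if not data:
                break
            f.write(data)
finally:
    remote_file.close()
If you don't need to process the stream yourself, sftp_client.get('remotefile', 'localfile') does the prefetching for you.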
I am very new to subprocess, and I am having a hard time debugging it without any error code.
I'm trying to automatically call an API which responds to:
http -f POST https://api-adresse.data.gouv.fr/search/csv/ columns=voie columns=ville data#path/to/file.csv > response_file.csv
I've tried various combinations with subprocess.call, but I only manage to get "1" as a return code.
What is the correct way to format this call, knowing that the answer from the API has to go into a csv file, and that I send a csv (the path after data#)?
EDIT: Here are my attempts:
ret = subprocess.call(cmd,shell=True)
ret = subprocess.call(cmd.split(),shell=True)
ret = subprocess.call([cmd],shell=True)
The same with shell=False, and with stdout=myFileHandler (opened inside a with open(file, "w") as myFileHandler: block).
EDIT2: I'm still curious about the answer, but I managed to work around it with requests, as @spectras suggested:
file_path = "PATH/TO/OUTPUT/FILE.csv"
url = "https://api-adresse.data.gouv.fr/search/csv/"
files = {'data': open('PATH/TO/CSV/FILE.csv','rb')}
values = {'columns': 'Adresse', 'columns': 'Ville', 'postcode': 'CP'}
r = requests.post(url, files=files, data=values)
with open(file_path, "w") as myFh:
myFh.write(r.content)
Since you are attempting to send a form, may I suggest you do it straight from Python?
import requests

with open('path/to/file', 'rb') as fd:
    payload = fd.read()

r = requests.post(
    'https://api-adresse.data.gouv.fr/search/csv/',
    data=(
        ('columns', 'voie'),
        ('columns', 'ville'),
    ),
    files={
        'data': ('filename.csv', payload, 'text/csv'),
    }
)

if r.status_code != requests.codes.ok:
    r.raise_for_status()

with open('response_file.csv', 'wb') as result:
    result.write(r.content)
This uses the ubiquitous python-requests module, and especially the form file upload part of the documentation.
It's untested. Basically, I opened the httpie documentation and converted your command-line arguments into python-requests API arguments. Passing data as a sequence of two-tuples rather than a dict is what lets the columns field repeat, mirroring the two columns= arguments in your httpie call.
I'm trying to get all users' information from the GitHub API using the Python Requests library. Here is my code:
import requests
import json
url = 'https://api.github.com/users'
token = "my_token"
headers = {'Authorization': 'token %s' % token}
r = requests.get(url, headers=headers)
users = r.json()
with open('users.json', 'w') as outfile:
    json.dump(users, outfile)
I can dump the first page of users into a json file for now. I can also find the 'next' page's url:
next_url = r.links['next'].get('url')
r2 = requests.get(next_url, headers=headers)
users2 = r2.json()
Since I don't know how many pages there are, how can I append the 2nd, 3rd, ... pages to 'users.json' sequentially in a while loop as fast as possible?
Thanks!
First, you need to open the file in 'a' mode, otherwise subsequent writes will overwrite everything.
import requests
import json

url = 'https://api.github.com/users'
token = "my_token"
headers = {'Authorization': 'token %s' % token}

outfile = open('users.json', 'a')
while True:
    r = requests.get(url, headers=headers)
    users = r.json()
    json.dump(users, outfile)
    # r.links has no 'next' entry on the last page, so stop once it disappears
    url = r.links.get('next', {}).get('url')
    if not url:
        break
outfile.close()
Append the data you get from the requests query to a list and move on to the next query.
Once you have all of the data you want, then proceed to concatenate the data into a file or into an object. You can also use threading to do multiple queries in parallel, but most likely there is going to be rate limiting on the API.
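Here is a minimal sketch of that approach (reusing the url, token, and headers from the question; the stopping condition assumes GitHub simply omits the next link on the last page):
import requests
import json

url = 'https://api.github.com/users'
token = "my_token"
headers = {'Authorization': 'token %s' % token}

all_users = []
while url:
    r = requests.get(url, headers=headers)
    r.raise_for_status()
    all_users.extend(r.json())                 # accumulate the pages in memory
    url = r.links.get('next', {}).get('url')   # None when there is no next page

# a single dump keeps users.json one valid JSON array
with open('users.json', 'w') as outfile:
    json.dump(all_users, outfile)
Dumping once at the end also avoids repeated json.dump calls producing concatenated arrays that json.load cannot parse back.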
I'm trying to download a large file from a server with Python 2:
req = urllib2.Request("https://myserver/mylargefile.gz")
rsp = urllib2.urlopen(req)
data = rsp.read()
The server sends data with "Transfer-Encoding: chunked" and I'm only getting some binary data, which cannot be unpacked by gunzip.
Do I have to iterate over multiple read()s? Or multiple requests? If so, what do they have to look like?
Note: I'm trying to solve the problem with only the Python 2 standard library, without additional libraries such as urllib3 or requests. Is this even possible?
From the Python documentation on urllib2.urlopen:
One caveat: the read() method, if the size argument is omitted or
negative, may not read until the end of the data stream; there is no
good way to determine that the entire stream from a socket has been
read in the general case.
So, read the data in a loop:
req = urllib2.Request("https://myserver/mylargefile.gz")
rsp = urllib2.urlopen(req)
data = rsp.read(8192)
while data:
    # .. Do Something ..
    data = rsp.read(8192)
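For completeness, here is a sketch of the same loop writing the chunks straight to a local file instead of the placeholder (the output name is arbitrary):
import urllib2

req = urllib2.Request("https://myserver/mylargefile.gz")
rsp = urllib2.urlopen(req)
with open("mylargefile.gz", "wb") as out:
    data = rsp.read(8192)
    while data:
        out.write(data)
        data = rsp.read(8192)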
If I'm not mistaken, the following worked for me - a while back:
data = ''
chunk = rsp.read()
while chunk:
    data += chunk
    chunk = rsp.read()
Each read reads one chunk - so keep on reading until nothing more's coming.
Don't have documentation ready supporting this...yet.
I have the same problem.
I found that "Transfer-Encoding: chunked" often appears with "Content-Encoding:
gzip".
So maybe we can get the compressed content and unzip it.
It works for me.
import urllib2
from StringIO import StringIO
import gzip
req = urllib2.Request(url)
req.add_header('Accept-encoding', 'gzip, deflate')
rsp = urllib2.urlopen(req)
if rsp.info().get('Content-Encoding') == 'gzip':
    buf = StringIO(rsp.read())
    f = gzip.GzipFile(fileobj=buf)
    data = f.read()
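For completeness, a small extension of that snippet (my own addition, untested) that falls back to the raw body when the response is not gzip-encoded and writes whatever arrived to disk (the output name is a placeholder):
import urllib2
from StringIO import StringIO
import gzip

req = urllib2.Request(url)
req.add_header('Accept-encoding', 'gzip, deflate')
rsp = urllib2.urlopen(req)
if rsp.info().get('Content-Encoding') == 'gzip':
    data = gzip.GzipFile(fileobj=StringIO(rsp.read())).read()
else:
    data = rsp.read()
with open('downloaded_file', 'wb') as out:
    out.write(data)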