I've recently come across the functionality of the requests package for Python (http://docs.python-requests.org/en/latest/user/advanced/#body-content-workflow) that allows you to defer downloading the response body until you access Response.content, as described here:
https://stackoverflow.com/a/16696317/8376187
import requests

def download_file(url):
    local_filename = url.split('/')[-1]
    # NOTE the stream=True parameter
    r = requests.get(url, stream=True)
    with open(local_filename, 'wb') as f:
        for chunk in r.iter_content(chunk_size=1024):
            if chunk:  # filter out keep-alive new chunks
                f.write(chunk)
    return local_filename
I use this to stream videos, and since the headers of the file are present, my video player plays the video smoothly.
I would like to do the same with an SSH/SFTP file transfer. I have tried to use paramiko for that, but my code reads the file without getting the indexes and headers of the file, making my video player fail, and it is also very slow.
The code (assuming a connected paramiko SSHClient()):
sftp_client = client.open_sftp()
remote_file = sftp_client.open('remotefile')
with open('localfile', 'wb') as f:
    try:
        data = remote_file.read(1024)
        while data:
            f.write(data)
            data = remote_file.read(1024)
    finally:
        remote_file.close()
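Side note, not part of my original attempt: paramiko's SFTPFile.prefetch() can pipeline the read requests, which usually addresses the slowness; here is a minimal sketch assuming the same connected client as above:

sftp_client = client.open_sftp()
remote_file = sftp_client.open('remotefile', 'rb')
remote_file.prefetch()  # queue read-ahead requests so each read doesn't wait a full round trip
try:
    with open('localfile', 'wb') as f:
        while True:
            data = remote_file.read(32768)  # larger blocks also reduce round trips
            if not data:
                break
            f.write(data)
finally:
    remote_file.close()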
Is there a way to reproduce the behavior of requests' "stream=True" option with an SSH/SFTP transfer in Python?
Thanks :)
I'm trying to download a binary file and save it with its original name on disk (Linux).
Any ideas?
import requests

params = {'apikey': 'xxxxxxxxxxxxxxxxxxx', 'hash': 'xxxxxxxxxxxxxxxxxxxxxxxxx'}
response = requests.get('https://www.test.com/api/file/download', params=params)
downloaded_file = response.content
if response.status_code == 200:
    with open('/tmp/', 'wb') as f:
        f.write(response.content)
From your clarification in the comments, your issue is that you want to keep the file's original name.
If the URL points to the raw binary data, then the last part of the URL would be its "original name", so you can get that by parsing the URL as follows:
local_filename = url.split('/')[-1]
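If the URL also carries query parameters, the split above would keep them in the name. Here is a small sketch using urllib.parse to strip them (this helper is my own addition, not from the answer quoted below):

import os
from urllib.parse import urlparse

def filename_from_url(url):
    # Keep only the path component so query strings like '?apikey=...' are dropped
    return os.path.basename(urlparse(url).path)

print(filename_from_url('https://host/files/video.mp4?token=abc'))  # video.mp4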
To put this into practice, and considering the context of the question, here is code that does exactly what you need, copied as-is from another SO question:
def download_file(url):
    local_filename = url.split('/')[-1]
    # NOTE the stream=True parameter
    r = requests.get(url, stream=True)
    with open(local_filename, 'wb') as f:
        for chunk in r.iter_content(chunk_size=1024):
            if chunk:  # filter out keep-alive new chunks
                f.write(chunk)
                # f.flush() commented by recommendation from J.F. Sebastian
    return local_filename
Couldn't post this as a comment, so had to put it in an answer. I hope I have been clear enough. Tell me if you have any issues with the code. And when the issue is resolved, please also inform me so I can then delete this as it's already been answered.
EDIT
Here is a version for your code:
import requests

url = 'https://www.test.com/api/file/download'
params = {'apikey': 'xxxxxxxxxxxxxxxxxxx', 'hash': 'xxxxxxxxxxxxxxxxxxxxxxxxx'}
# stream=True is a keyword argument to requests.get, not a query parameter
response = requests.get(url, params=params, stream=True)
local_filename = url.split('/')[-1]
total_bytes = 0
if response.status_code == 200:
    with open(local_filename, 'wb') as f:
        for chunk in response.iter_content(chunk_size=1024):
            if chunk:
                total_bytes += len(chunk)
                print("Downloaded", total_bytes // 1024, "KB...")
                f.write(chunk)
NOTE: if you don't want it to show progress, just remove the print statement inside the loop. This was tested using this URL: https://imagecomics.com/uploads/releases/_small/DeadRabbit-02_cvr.jpg and it seemed to work pretty well. Again, if you have any issues, just comment down below.
I have a problem: I need to download a file using Python, but I cannot use the libraries urllib, urllib2, urllib3, or requests.
If someone can help me, I would be very grateful.
You can use the subprocess module and the curl command:
import subprocess

result = subprocess.run(['curl', 'https://www.google.com'], stdout=subprocess.PIPE)
# do something with result.stdout
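If the goal is to save the download to disk, curl can also write the file directly with its -o flag. A small sketch (the URL is a placeholder, and this assumes the curl binary is installed):

import subprocess

url = 'https://www.example.com/somefile.bin'
local_filename = url.split('/')[-1]

# -L follows redirects, -o writes the body straight to the given file;
# check=True raises CalledProcessError if curl exits with a non-zero status
subprocess.run(['curl', '-L', '-o', local_filename, url], check=True)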
You can run wget in a terminal to fetch the file. Why can't you use those libraries!? (I can't comment yet, so I'm posting this as an answer.)
Try using requests to get the file's data and write it to a path, if you can't use the urllib packages.
To download files with requests, use this example from here:
def download_file(url):
    local_filename = url.split('/')[-1]
    # NOTE the stream=True parameter
    r = requests.get(url, stream=True)
    with open(local_filename, 'wb') as f:
        for chunk in r.iter_content(chunk_size=1024):
            if chunk:  # filter out keep-alive new chunks
                f.write(chunk)
                # f.flush() commented by recommendation from J.F. Sebastian
    return local_filename
Here's the aiohttp example for async functions:
import asyncio
import aiohttp

async def download_file(url, filename, chunk_size=1024):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as resp:
            with open(filename, 'wb') as fd:
                while True:
                    chunk = await resp.content.read(chunk_size)
                    if not chunk:
                        break
                    fd.write(chunk)

url = "URL"
asyncio.run(download_file(url, "localfile"))
I recommend using aiohttp if you can't use requests.
Thanks to all, I was finally able to solve my problem.
Here is the solution that I used:
from socket import *

s = socket(AF_INET, SOCK_STREAM)
s.connect((url, port))
file_mess = "GET /" + step2[:5] + " HTTP/1.1\r\n\r\n"
s.send(file_mess.encode('utf-8'))
print(s.recv(1024).decode('utf-8'))
print(s.recv(1024).decode('utf-8'))
I used the second print to obtain the rest of the data, because with the first one I only got the 200 OK headers.
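Since a single recv() is not guaranteed to return the whole response, here is a small sketch of a receive loop (this assumes the request also sends a Connection: close header so the server closes the socket when the body is complete):

chunks = []
while True:
    data = s.recv(4096)
    if not data:  # empty bytes means the server closed the connection
        break
    chunks.append(data)
response = b''.join(chunks)
print(response.decode('utf-8', errors='replace'))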
I receive a ChunkedEncodingError when running my script using Python requests, and I am wondering why I could be receiving this error.
When I check resp.encoding, I get back None.
I am currently not using any SSL certificate verification in my script, so I get Insecure Request Warnings as well.
for attachment in attachments:
    if attachment['type'] == 'Form':
        continue

    # create insertcursor row
    row = cursorPointData.newRow()
    pprint(attachments)
    for key in attachment:
        pprint(key)
        pprint(attachment[key])
        row.setValue(key, attachment[key])

    # Download file
    guid = attachment['guid']
    resp = requests.get(
        'https:...' + guid,
        headers={'Authorization': 'Bearer ' + token},
        stream=True,
        verify=False
    )
    contentType = resp.headers['Content-Type']
    contentLength = int(resp.headers['content-length'])
    pprint('contentLength = ' + str(contentLength))

    extension = contentType.split('/')[1]
    filename = '{0}.{1}'.format(guid, extension)
    output_path = os.path.join(attachmentDir, filename)
    attachmentPath = os.path.join("filepath", filename)

    with open(output_path, 'wb') as f:
        for chunk in resp.iter_content(chunk_size=None):
            if chunk:
                f.write(chunk)
                f.flush()  # flush the data to file
                os.fsync(f.fileno())  # force the file write and free memory

    row.setValue('attachmentPath', attachmentPath)
    cursorPointData.insertRow(row)

del cursorPointData, row
I checked in the Network tab of Chrome developer tools, and I was getting a status 200. However in Firefox developer tools, I got a 301 status: "Moved Permanently".
It turns out I was putting in the wrong url to the request and needed to change the url to the updated version it was redirecting to.
Found out you can use response.history in the python requests library to find any redirection issues.
Now, I get back utf-8 as the response encoding instead of None.
Leaving this here in case anyone else runs into the same issue.
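For reference, here is a small sketch of inspecting the redirect chain with response.history (the URL is a placeholder):

import requests

resp = requests.get('https://example.com/old-url', allow_redirects=True)
for hop in resp.history:              # intermediate responses, e.g. the 301
    print(hop.status_code, hop.url)
print(resp.status_code, resp.url)     # the final response after following redirects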
I'm trying to download a large file from a server with Python 2:
req = urllib2.Request("https://myserver/mylargefile.gz")
rsp = urllib2.urlopen(req)
data = rsp.read()
The server sends data with "Transfer-Encoding: chunked" and I'm only getting some binary data, which cannot be unpacked by gunzip.
Do I have to iterate over multiple read()s? Or multiple requests? If so, what do they have to look like?
Note: I'm trying to solve the problem with only the Python 2 standard library, without additional libraries such as urllib3 or requests. Is this even possible?
From the python documentation on urllib2.urlopen:
One caveat: the read() method, if the size argument is omitted or
negative, may not read until the end of the data stream; there is no
good way to determine that the entire stream from a socket has been
read in the general case.
So, read the data in a loop:
req = urllib2.Request("https://myserver/mylargefile.gz")
rsp = urllib2.urlopen(req)
data = rsp.read(8192)
while data:
    # .. Do Something ..
    data = rsp.read(8192)
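For example, to write the stream to disk as it arrives so the finished file can be unpacked with gunzip afterwards (a sketch, with the local filename assumed):

import urllib2

req = urllib2.Request("https://myserver/mylargefile.gz")
rsp = urllib2.urlopen(req)
with open("mylargefile.gz", "wb") as out:
    data = rsp.read(8192)
    while data:
        out.write(data)
        data = rsp.read(8192)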
If I'm not mistaken, the following worked for me - a while back:
data = ''
chunk = rsp.read()
while chunk:
    data += chunk
    chunk = rsp.read()
Each read reads one chunk, so keep on reading until nothing more is coming.
I don't have documentation ready to support this... yet.
I have the same problem.
I found that "Transfer-Encoding: chunked" often appears together with "Content-Encoding: gzip".
So maybe we can get the compressed content and unzip it.
It works for me.
import urllib2
from StringIO import StringIO
import gzip

req = urllib2.Request(url)
req.add_header('Accept-encoding', 'gzip, deflate')
rsp = urllib2.urlopen(req)
if rsp.info().get('Content-Encoding') == 'gzip':
    buf = StringIO(rsp.read())
    f = gzip.GzipFile(fileobj=buf)
    data = f.read()
conn = httplib.HTTPConnection("www.encodable.com/uploaddemo/")
conn.request("POST", path, chunk, headers)
Above is the site "www.encodable.com/uploaddemo/" where I want to upload an image.
I am better versed in PHP, so I am unable to understand the meaning of path and headers here. In the code above, chunk is an object consisting of my image file.
The following code produces an error, as I was trying to implement it without any knowledge of headers and path.
import httplib

def upload_image_to_url():
    filename = '//home//harshit//Desktop//h1.jpg'
    f = open(filename, "rb")
    chunk = f.read()
    f.close()

    headers = {
        "Content-type": "application/octet-stream",
        "Accept": "text/plain"
    }

    conn = httplib.HTTPConnection("www.encodable.com/uploaddemo/")
    conn.request("POST", "/uploaddemo/files/", chunk)
    response = conn.getresponse()
    remote_file = response.read()
    conn.close()

    print remote_file

upload_image_to_url()
Currently, you aren't using the headers you've declared earlier in the code. You should provide them as the fourth argument to conn.request:
conn.request("POST", "/uploaddemo/files/", chunk, headers)
Also, a side note: you can pass open("h1.jpg", "rb") directly into conn.request without reading it fully into chunk first. conn.request accepts file-like objects, and it will be more efficient to stream the file a little at a time:
conn.request("POST", "/uploaddemo/files/", open("h1.jpg", "rb"), headers)