Download the middle of a file using Python requests

What I am trying to accomplish is to download a specific portion of a video file using Python, much like a browser does when playing a video. If the file is 1000 bytes, I want to download bytes 200 to 700. I know that I can download the file in chunks using the code below:
import requests
res = requests.get(url, stream=True)
with open(filename, 'wb') as file_:
    for chunk in res.iter_content(amount):
        file_.write(chunk)
How can I modify this code to accomplish that?

The server has to support this:
If Accept-Ranges is present in HTTP responses (and its value isn't
none), the server supports range requests. You can check this by
issuing a HEAD request.
If the server supports it, you can request the part like this:
curl http://i.imgur.com/z4d4kWk.jpg -i -H "Range: bytes=0-1023"
Source: https://developer.mozilla.org/en-US/docs/Web/HTTP/Range_requests
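A minimal sketch of the same range request with requests, assuming the server answers with 206 Partial Content. Two details worth checking: HTTP byte ranges are inclusive, so bytes=200-700 returns 501 bytes (use 200-699 for exactly 500), and a server that ignores the Range header replies 200 with the whole file, which this sketch treats as an error. The helper names supports_ranges and download_range are illustrative, not part of requests.

```python
import requests


def supports_ranges(url: str) -> bool:
    """HEAD the URL and report whether Accept-Ranges advertises byte ranges."""
    resp = requests.head(url, allow_redirects=True, timeout=10)
    # Absent or "none" means the server does not advertise range support
    return resp.headers.get("Accept-Ranges", "none").lower() != "none"


def download_range(url: str, start: int, end: int, filename: str,
                   chunk_size: int = 8192) -> int:
    """Save bytes start..end (inclusive) of url to filename; return bytes written."""
    if start < 0 or end < start:
        raise ValueError("expected 0 <= start <= end")
    headers = {"Range": f"bytes={start}-{end}"}  # both ends are inclusive
    with requests.get(url, headers=headers, stream=True, timeout=30) as res:
        # 206 means the range was honoured; 200 would be the whole file
        if res.status_code != 206:
            raise RuntimeError(f"range not honoured (status {res.status_code})")
        written = 0
        with open(filename, "wb") as file_:
            for chunk in res.iter_content(chunk_size):
                file_.write(chunk)
                written += len(chunk)
    return written
```

Typical use: if supports_ranges(url), call download_range(url, 200, 700, "part.bin").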

Related

Send POST request with Python that generates a download and download the file

There's a website that has a button which downloads an Excel file. After I click, it takes around 20 seconds for the server API to generate the file and send it back to my browser for download.
If I monitor the communication after I click the button, I can see how the browser sends a POST request to a server with a series of headers and form values.
Is there a way that I can simulate a similar POST request programmatically using Python, and retrieve the Excel file after the server sends it over?
Thank you in advance
The requests module can send all kinds of HTTP requests.
requests.post sends a POST request synchronously.
The payload data can be set with data=.
The response body can be accessed with .content.
Be sure to check .status_code and only save the file on a successful response code.
Also note the use of "wb" in open, because we want to save the file as binary rather than text.
Example:
import requests
payload = {"dao":"SampleDAO",
"condigId": 1,
...}
r = requests.post("http://url.com/api", data=payload)
if r.status_code == 200:
with open("file.save","wb") as f:
f.write(r.content)
Requests Documentation
I guess you could similarly do this:
import requests
file_info = requests.get(url)
with open('file_name.extension', 'wb') as file:
    file.write(file_info.content)
I honestly don't know how to explain this, though, since I have little understanding of how it works.
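Since the server needs around 20 seconds to build the file, a sketch that sets an explicit timeout and streams the body to disk (instead of holding it all in .content) may be more robust. The (connect, read) timeout tuple is a requests feature; the read timeout bounds each wait for data from the server, including the wait for the first byte, not the total download time. post_and_save and the 60-second default are assumptions for illustration.

```python
import requests


def post_and_save(url: str, payload: dict, filename: str,
                  read_timeout: float = 60.0) -> int:
    """POST form data and stream the response body to filename; return bytes written."""
    # (connect timeout, read timeout): the read timeout must exceed the generation time
    with requests.post(url, data=payload, stream=True,
                       timeout=(10, read_timeout)) as r:
        r.raise_for_status()  # raises requests.HTTPError on 4xx/5xx
        written = 0
        with open(filename, "wb") as f:
            for chunk in r.iter_content(chunk_size=64 * 1024):
                f.write(chunk)
                written += len(chunk)
    return written
```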

how to get file size without downloading it in python requests

Surely you have used a download manager: they detect and show the length of a file without downloading it.
I know I can do this:
import requests
resp = requests.get("https://Whereever.user.wants.com/THEFILE.zip")
print(f"your file has {int(resp.headers['content-length']) // 1048576} MB.")
...
But get downloads the content (THEFILE), so I can only tell the user the length after the download.
How can I get it before downloading, in Python?
Thanks for a detailed answer.
Instead of using GET request, do HEAD request:
resp = requests.request('HEAD', "https://Whereever.user.wants.com/THEFILE.zip")
The HTTP HEAD method requests the headers that would be returned if the same URL were instead requested with the HTTP GET method. In your case, where the URL produces a large download, you can read the Content-Length header of the HEAD response to get the file size without actually downloading the file.
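A slightly fuller sketch, assuming the server answers HEAD and reports Content-Length. Note that requests.head() does not follow redirects unless allow_redirects=True is passed (unlike requests.get()), and some servers omit Content-Length on HEAD; for those, the sketch falls back to a streamed GET, which returns once the headers arrive and is closed without reading the body. remote_file_size and format_mb are hypothetical helper names.

```python
import requests


def remote_file_size(url: str):
    """Return the Content-Length reported for url in bytes, or None if unknown."""
    # requests.head() does not follow redirects by default, so ask it to
    resp = requests.head(url, allow_redirects=True, timeout=10)
    resp.raise_for_status()
    length = resp.headers.get("Content-Length")  # header lookup is case-insensitive
    if length is None:
        # Fallback: a streamed GET returns after the headers; leaving the
        # with-block closes the response without reading the body
        with requests.get(url, stream=True, timeout=10) as r:
            r.raise_for_status()
            length = r.headers.get("Content-Length")
    return int(length) if length is not None else None


def format_mb(n_bytes: int) -> str:
    """Format a byte count in megabytes of 1048576 bytes, as in the question."""
    return f"{n_bytes / 1048576:.2f} MB"
```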

How do I use requests.put() to upload a file using Python?

I am trying to use the requests library in Python to upload a file into Fedora commons repository on localhost. I'm fairly certain my main problem is not understanding open() / read() and what I need to do to send data with an http request.
import requests
from requests.auth import HTTPBasicAuth
def postBinary(fileName, dirPath, url):
    path = dirPath + '/' + fileName
    print('to ' + url + '\n' + path)
    openBin = {'file': (fileName, open(path, 'rb').read())}
    headers = {'Slug': fileName}  # not important
    r = requests.put(url, files=openBin, headers=headers, auth=HTTPBasicAuth('username', 'pass'))
    print(r.text)
    print("and the url used:")
    print(r.url)
This will successfully upload a file in the repository, but it will be slightly larger and corrupted after. For example an image that was 6.6kb became 6.75kb and was not openable anymore.
So how should I properly open and upload a file using put in python?
Extra details:
When I replace files=openBin with data=openBin, the request body contains my dictionary form-encoded, with (I presume) the data as a string. I don't know whether that information is helpful.
"file=FILE_NAME.extension&file=TYPE89a%24%02Q%03%E7%FF%00E%5B%19%FC%....
and the size of the file increases to a number of megabytes
I am using specifically put because the Fedora RESTful HTTP API end point says to use put.
The following command does work:
curl -u username:password -H "Content-Type: text/plain" -X PUT -T /path/to/someFile.jpeg http://localhost:8080/fcrepo/rest/someFile.jpeg
Updated
Using requests.put() with the files parameter sends a multipart/form-data encoded request which the server does not seem to be able to handle without corrupting the data, even when the correct content type is declared.
The curl command simply performs a PUT with the raw data contained in the body of the request. You can create a similar request by passing the file data in the data parameter. Specify the content type in the header:
headers = {'Content-type': 'image/jpeg', 'Slug': fileName}
with open(path, 'rb') as f:
    r = requests.put(url, data=f, headers=headers, auth=('username', 'pass'))
You can vary the Content-type header to suit the payload as required.
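Building on this answer, a sketch of a helper that mirrors curl -T: the file's bytes are the entire request body, the content type is guessed from the file extension with the standard mimetypes module (falling back to application/octet-stream), and the file is closed after the upload. put_file is a hypothetical name, and the Slug header is only meaningful to Fedora.

```python
import mimetypes
import os

import requests


def put_file(url: str, path: str, auth=None) -> requests.Response:
    """PUT the raw bytes of path as the request body, like `curl -T path url`."""
    content_type, _ = mimetypes.guess_type(path)
    headers = {
        # Guessed from the extension, e.g. .jpeg -> image/jpeg
        "Content-Type": content_type or "application/octet-stream",
        "Slug": os.path.basename(path),  # Fedora-specific, as in the question
    }
    with open(path, "rb") as f:
        # A file object passed as data= is sent as-is (no multipart wrapper);
        # requests derives Content-Length from the file size
        return requests.put(url, data=f, headers=headers, auth=auth, timeout=30)
```

For the question's setup this would be called as put_file("http://localhost:8080/fcrepo/rest/someFile.jpeg", "/path/to/someFile.jpeg", auth=("username", "pass")).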
Try setting the Content-type for the file.
If you are sure it is a text file, try text/plain, which you used in your curl command, although you appear to be uploading a JPEG file. For a JPEG image you should use image/jpeg.
Otherwise, for arbitrary binary data, you can use application/octet-stream:
openBin = {'file': (fileName, open(path,'rb'), 'image/jpeg' )}
Also it is not necessary to explicitly read the file contents in your code, requests will do that for you, so just pass the open file handle as shown above.
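To see why the files= version in the question grew from 6.6kb to 6.75kb, one can build both kinds of request without sending them (requests.Request(...).prepare()) and compare the bodies. Even with a content type in the tuple, files= wraps the bytes in a multipart/form-data envelope (boundary lines plus per-part headers), and a server that stores the body verbatim keeps that envelope, which is consistent with the corruption described. The placeholder bytes below are made up for the demonstration, and the URL is never contacted.

```python
import requests

image_bytes = b"\xff\xd8\xff\xe0 placeholder JPEG bytes \xff\xd9"
url = "http://localhost:8080/fcrepo/rest/someFile.jpeg"  # never contacted

# Multipart upload: the bytes become one part inside a larger body
multipart = requests.Request(
    "PUT", url, files={"file": ("someFile.jpeg", image_bytes, "image/jpeg")}
).prepare()
# Raw upload: the body is exactly the file's bytes
raw = requests.Request(
    "PUT", url, data=image_bytes, headers={"Content-Type": "image/jpeg"}
).prepare()

print(multipart.headers["Content-Type"])  # multipart/form-data; boundary=...
print(len(image_bytes), len(raw.body), len(multipart.body))
```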

Python requests post a file

Using curl, I can post a file like this:
curl -X POST -d "pxeconfig=`cat boot.txt`" https://ip:8443/tftp/syslinux
My file looks like
$ cat boot.txt
line 1
line 2
line 3
I am trying to achieve the same thing using requests module in python
r=requests.post(url, files={'pxeconfig': open('boot.txt','rb')})
When I open the file on server side, the file contains
{:filename=>"boot.txt", :type=>nil, :name=>"pxeconfig",
:tempfile=>#<Tempfile:/tmp/RackMultipart20170405-19742-1cylrpm.txt>,
:head=>"Content-Disposition: form-data; name=\"pxeconfig\";
filename=\"boot.txt\"\r\n"}
Please suggest how I can achieve this.
Your curl request sends the file contents as form data, as opposed to an actual file! You probably want something like
with open('boot.txt', 'rb') as f:
    r = requests.post(url, data={'pxeconfig': f.read()})
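A small sketch of the same idea as a reusable function. One detail, assuming bash-like shell semantics: backtick command substitution strips trailing newlines, so the sketch strips them too in order to send the value curl would. requests URL-encodes the value (spaces become +, newlines %0A), while curl -d sends it as typed; a form-parsing server typically decodes both to the same text unless the file contains characters with special meaning in form encoding, such as &, =, + or %, in which case curl --data-urlencode is the closer equivalent. post_file_as_field is a hypothetical helper name.

```python
import requests


def post_file_as_field(url: str, path: str, field: str = "pxeconfig") -> requests.Response:
    """POST the text of path as the value of one form field, like curl -d "field=`cat path`"."""
    with open(path, encoding="utf-8") as f:
        # Shell command substitution drops trailing newlines; do the same here
        value = f.read().rstrip("\n")
    # A dict passed as data= is sent as application/x-www-form-urlencoded
    return requests.post(url, data={field: value}, timeout=30)
```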
The two actions you are performing are not the same.
In the first, you explicitly read the file using cat and pass its contents to curl as the value of the form field pxeconfig.
Whereas in the second example you are using multipart file uploading, which is a completely different thing. In that case the server is supposed to parse the received file.
To obtain the same behavior as the curl command you should do:
requests.post(url, data={'pxeconfig': open('boot.txt').read()})
For contrast, if you actually wanted to send the file multipart-encoded, the curl request would look like this:
curl -F "pxeconfig=@boot.txt" url
with open('boot.txt', 'rb') as f:
    r = requests.post(url, files={'pxeconfig': f})
You would probably want to do something like this, so that the file is also closed afterwards.
Check here for more: Send file using POST from a Python script
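If the server really does expect a file upload (what curl -F "pxeconfig=@boot.txt" url sends), the tuple form of files= lets you set the field name, the filename and the part's content type. A sketch, assuming the field is called pxeconfig and the file is plain text; build_upload is a hypothetical helper that only prepares the request, so the multipart part headers can be inspected before anything is sent.

```python
import os

import requests


def build_upload(url: str, path: str, field: str = "pxeconfig") -> requests.PreparedRequest:
    """Prepare (without sending) a multipart upload like curl -F "field=@path" url."""
    with open(path, "rb") as f:
        # One form part: (filename, file object, content type of the part)
        files = {field: (os.path.basename(path), f, "text/plain")}
        # prepare() reads the file into the multipart body, so it can be closed afterwards
        return requests.Request("POST", url, files=files).prepare()
```

The prepared request can then be sent with session.send(prepared) on a requests.Session.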

Why is it necessary to use a stream to download an image through HTTP GET?

Here is a body of code that works, taken from: https://stackoverflow.com/a/18043472
It uses the requests module in python to download an image.
import requests, shutil
url = 'http://example.com/img.png'
response = requests.get(url, stream=True)
with open('img.png', 'wb') as out_file:
    shutil.copyfileobj(response.raw, out_file)
del response
Two questions I've been thinking about:
1) Why is it necessary to set stream=True? (I've tested it without that parameter and the image is blank) Conceptually, I don't understand what a streaming GET request is.
2) What's the difference between a raw response and a response? (Why is shutil.copyfileobj necessary, why can't I just directly write to file?)
Thanks!
Quote from documentation:
If you set stream to True when making a request, Requests cannot
release the connection back to the pool unless you consume all the
data or call Response.close.
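A sketch of both approaches, to make the two questions concrete. Without stream=True, requests reads the entire body into memory before get() returns, so response.raw (the underlying urllib3 stream) has already been drained, which explains the blank image; in that case response.content holds the bytes. With stream=True only the headers have been read, and the body can be copied from response.raw in pieces. raw also passes gzip or deflate Content-Encoding through untouched unless decode_content is set to True. The function names are illustrative.

```python
import shutil

import requests


def download_streamed(url: str, filename: str) -> None:
    """Copy the body straight from the socket to disk without buffering it all."""
    with requests.get(url, stream=True, timeout=30) as response:
        response.raise_for_status()
        # raw is urllib3's undecoded stream; ask it to undo Content-Encoding
        response.raw.decode_content = True
        with open(filename, "wb") as out_file:
            shutil.copyfileobj(response.raw, out_file)


def download_buffered(url: str, filename: str) -> None:
    """Without stream=True the body is already in memory, so write .content."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    with open(filename, "wb") as out_file:
        out_file.write(response.content)
```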
