I have a working bit of PHP code that uploads a binary to a remote server I don't have shell access to. The PHP code is:
function upload($uri, $filename) {
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $uri);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, array('file' => '@' . $filename));
curl_exec($ch);
curl_close($ch);
}
This results in a header like:
HTTP/1.1
Host: XXXXXXXXX
Accept: */*
Content-Length: 208045596
Expect: 100-continue
Content-Type: multipart/form-data; boundary=----------------------------360aaccde050
I'm trying to port this over to Python using requests and I cannot get the server to accept my POST. I have tried every which way to use requests.post, but the headers will not mimic the above.
This will successfully transfer the binary to the server (I can tell by watching Wireshark), but because the headers are not what the server is expecting the upload gets rejected. The response code is a 200 though.
files = {'bulk_test2.mov': ('bulk_test2.mov', open('bulk_test2.mov', 'rb'))}
response = requests.post(url, files=files)
The requests code results in a header of:
HTTP/1.1
Host: XXXX
Content-Length: 160
Content-Type: multipart/form-data; boundary=250852d250b24399977f365f35c4e060
Accept-Encoding: gzip, deflate, compress
Accept: */*
User-Agent: python-requests/2.2.1 CPython/2.7.5 Darwin/13.1.0
--250852d250b24399977f365f35c4e060
Content-Disposition: form-data; name="bulk_test2.mov"; filename="bulk_test2.mov"
--250852d250b24399977f365f35c4e060--
Any thoughts on how to make requests match the header that the PHP code generates?
There are two large differences:
The PHP code posts a field named file, your Python code posts a field named bulk_test2.mov.
Your Python code posts an empty file. The Content-Length header is 160 bytes, exactly the amount of space the multipart boundaries and the Content-Disposition part header take up. Either the bulk_test2.mov file is indeed empty, or you tried to post the file multiple times without rewinding or reopening the file object.
To fix the first problem, use 'file' as the key in your files dictionary:
files = {'file': open('bulk_test2.mov', 'rb')}
response = requests.post(url, files=files)
I used just the open file object as the value; requests will get the filename directly from the file object in that case.
The second issue is something only you can fix. Make sure you don't reuse files when repeatedly posting. Reopen, or use files['file'].seek(0) to rewind the read position back to the start.
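For example, a minimal sketch of the rewind approach (assuming requests is imported and url points at your upload endpoint):
fileobj = open('bulk_test2.mov', 'rb')
files = {'file': fileobj}
for attempt in range(3):
    fileobj.seek(0)  # rewind so the full file body is sent on every attempt
    response = requests.post(url, files=files)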
The Expect: 100-continue header is an optional client feature that asks the server to confirm that the body upload can go ahead; it is not a required header and any failure to post your file object is not going to be due to requests using this feature or not. If an HTTP server were to misbehave if you don't use this feature, it is in violation of the HTTP RFCs and you'll have bigger problems on your hands. It certainly won't be something requests can fix for you.
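If you do want the request to carry the same Expect header as the PHP/cURL version, you can add it yourself. This is only a sketch, and note that requests merely sends the header; it does not wait for the interim 100 Continue response:
headers = {'Expect': '100-continue'}
files = {'file': open('bulk_test2.mov', 'rb')}
response = requests.post(url, files=files, headers=headers)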
If you do manage to post actual file data, any small variations in Content-Length are due to the (random) boundary being a different length between Python and PHP. This is normal, and not the cause of upload problems, unless your target server is extremely broken. Again, don't try to fix such brokenness with Python.
However, I'd assume you overlooked something much simpler. Perhaps the server blacklists certain User-Agent headers, for example. You could clear some of the default headers requests sets by using a Session object:
files = {'file': open('bulk_test2.mov', 'rb')}
session = requests.Session()
del session.headers['User-Agent']
del session.headers['Accept-Encoding']
response = session.post(url, files=files)
and see if that makes a difference.
If the server fails to handle your request because it fails to handle HTTP persistent connections, you could try to use the session as a context manager to ensure that all session connections are closed:
files = {'file': open('bulk_test2.mov', 'rb')}
with requests.Session() as session:
response = session.post(url, files=files, stream=True)
and you could add:
response.raw.close()
for good measure.
I am writing a small application that interprets the http response of a request. I am writing the application in python. I have not found anything that allows me to send the body + headers stored in one file. I can send certain parts like the headers but not the entire request.
For example, if the request is:
GET /index.html HTTP/1.1
Host: localhost
Cookie: bob=lemon
I want to send this entire request in one go. How would I do this in python?
Check out the python requests library. https://requests.readthedocs.io/en/master/user/quickstart/#make-a-request
For the request above it would look something like
import requests
url = 'http://localhost:[YOUR PORT HERE]/'
cookies = {'bob': 'lemon'}
r = requests.get(url, cookies=cookies)
To check whether the request was successful, you should get a 200 status code from:
r.status_code
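Since you want to interpret the response, the headers and body are just as easy to get at (a small sketch of what requests exposes by default):
print(r.status_code)   # numeric status code, e.g. 200
print(r.headers)       # response headers as a case-insensitive dict
print(r.text)          # response body decoded as text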
Check out the library for more, it is very extensive.
I had been using sockets, with Python, for some time ago and I'm trying to understand why this POST which should send some data on fields data1 and data2 do not work.
POST /method.php HTTP/1.1\r\nHost: localhost\r\nContent-Type: multipart/form-data\r\n\r\ndata1=something&data2= otherthing\r\n\r\n
What is the problem with this request?
There are several things wrong with your request:
POST /method.php HTTP/1.1
Host: localhost
Content-Type: multipart/form-data
data1=something&data2= otherthing
First, whenever a body is used within an HTTP request the length of the body must be known. This is typically done by giving the length up-front with a Content-Length header, although chunked encoding can also be used if the full length is not known up front. Your request does neither of these, which makes it an invalid HTTP request.
Additionally, you claim a Content-Type of multipart/form-data although your body is not of this type. With multipart/form-data your body would consist of several MIME parts separated by a text boundary, and this boundary would need to be declared in your Content-Type header. The correct type for the body you show would instead be application/x-www-form-urlencoded.
Even with application/x-www-form-urlencoded the body is partly wrong. This type of body should consist only of key=value pairs concatenated with &, i.e. there should be neither a space after a key (as you have after data2=) nor newlines added after the end of the data (as you have).
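If you don't want to build such a body by hand, the standard library can urlencode it for you (a small sketch; the import path is urllib.parse on Python 3 and urllib on Python 2):
from urllib.parse import urlencode  # Python 2: from urllib import urlencode

body = urlencode({'data1': 'something', 'data2': 'otherthing'})
# body == 'data1=something&data2=otherthing'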
When removing all these problems you should probably send the following request:
body = "data1=something&data2=otherthing"
request = ("POST /method.php HTTP/1.1\r\n" + \
"Host: localhost\r\n" + \
"Content-Type: application/x-www-form-urlencoded\r\n" + \
"Content-Length: %d\r\n" + \
"\r\n%s") % (len(body),body)
But once you have sent this request the trouble continues, since reading the response correctly is complex too. Generally I recommend not coding your own HTTP handling unless you really know what you are doing, but using existing libraries instead. While HTTP might look simple when just looking at a few example requests, it is way more complex than it initially looks, and while your code might seem to work against specific servers it might fail with others.
It might be easier to use the requests library so your code would look something like this:
import requests
# Data
data = {
'data1':'something',
'data2':'otherthing'
}
# Get response from server; requests sets the correct
# application/x-www-form-urlencoded Content-Type header automatically,
# so no custom Content-Type header is needed here
response = requests.post('http://localhost/', data=data)
# If you care about the response
print(response.json())
You can also send files and a whole lot of other stuff
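For example, uploading a file is just a matter of passing a files dictionary, and requests builds the multipart/form-data body and boundary itself (a small sketch; the filename is made up):
files = {'file': open('report.xml', 'rb')}
response = requests.post('http://localhost/', files=files)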
Have you tried using the Requests library instead? An example of a POST request is below:
import requests
header = {"Content-Type": "multipart/form-data"}
data1="something"
data2= "otherthing"
session_requests = requests.session()
result = session_requests.post("http://localhost/", data=dict(data1, data2), headers=header)
This is probably a very newbie question, but I'm reading up on the HTTP requests library and I need to write code that will make a request to a server and ask for gzip compression (since the server supports gzip compression).
For instance, I have:
import requests
r = requests.get('some_url')
r.json()
I know it has something to do with sending Accept-Encoding: gzip in the header of the HTTP request, but I'm not sure how to do that.
You should be able to set the header by using the headers argument.
import requests
headers = {'Accept-Encoding': 'gzip'}
r = requests.get('some_url',headers=headers)
result = r.json()
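Note that requests already sends Accept-Encoding: gzip, deflate by default and transparently decompresses the response; to confirm the server actually compressed the body you can inspect the response headers, for example:
print(r.headers.get('Content-Encoding'))  # 'gzip' if the server compressed the response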
(This question is not about transparent decompression of gzip-encoded responses from a web server; I know that requests handles that automatically.)
Problem
I'm trying to POST a file to a RESTful web service. Obviously, requests makes this pretty easy to do:
files = dict(data=(fn, file))
response = session.post(endpoint_url, files=files)
In this case, my file is in a really highly-compressible format (yep, XML) so I'd like to make sure that the request body is compressed.
The server claims to accept gzip encoding (Accept-Encoding: gzip in response headers), so I should be able to gzip the whole body request body, right?
Attempted solution
Here's my attempt to make this work: I first construct the request and prepare it, then I go into the PreparedRequest object, yank out the body, run it through gzip, and put it back. (Oh, and don't forget to update the Content-Length and Content-Encoding headers.)
files = dict(data=(fn, file))
request = requests.Request('POST', endpoint_url, files=files)
prepped = session.prepare_request(request)
with NamedTemporaryFile(delete=True) as gzfile:
gzip.GzipFile(fileobj=gzfile, mode="wb").write(prepped.body)
prepped.headers['Content-Length'] = gzfile.tell()
prepped.headers['Content-Encoding'] = 'gzip'
gzfile.seek(0,0)
prepped.body = gzfile.read()
response = session.send(prepped)
Unfortunately, the server is not cooperating and returns 500 Internal Server Error. Perhaps it doesn't really accept gzip-encoded requests?
Or perhaps there is a mistake in my approach? It seems rather convoluted. Is there an easier way to do request body compression with python-requests?
EDIT: Fixed (3) and (5) from #sigmavirus24's answer (these were basically just artifacts I'd overlooked in simplifying the code to post it here).
Or perhaps there is a mistake in my approach?
I'm unsure how you arrived at your approach, frankly, but there's certainly a simpler way of doing this.
First, a few things:
1. The files parameter constructs a multipart/form-data body. So you're compressing something that the server potentially has no clue about.
2. Content-Encoding and Transfer-Encoding are two very different things. You want Transfer-Encoding here.
3. You don't need to set a suffix on your NamedTemporaryFile.
4. Since you didn't explicitly mention that you're trying to compress a multipart/form-data request, I'm going to assume that you don't actually want to do that.
5. Your call to session.Request (which I assume should be requests.Request) is missing a method, i.e., it should be: requests.Request('POST', endpoint_url, ...)
With those out of the way, here's how I would do this:
# Assuming `file` is a file-like obj
with NamedTemporaryFile(delete=True) as gzfile:
    with gzip.GzipFile(fileobj=gzfile, mode="wb") as gz:
        gz.write(file.read())  # closing the GzipFile flushes the gzip trailer
    headers = {'Content-Length': str(gzfile.tell()),
               'Transfer-Encoding': 'gzip'}
    gzfile.seek(0, 0)
    response = session.post(endpoint_url, data=gzfile,
                            headers=headers)
Assuming that file has the xml content in it and all you meant was to compress it, this should work for you. You probably want to set a Content-Type header though, for example, you'd just do
headers = {'Content-Length': str(gzfile.tell()),
           'Content-Type': 'application/xml',  # or 'text/xml'
           'Transfer-Encoding': 'gzip'}
The Transfer-Encoding tells the server that the request is being compressed only in transit and it should uncompress it. The Content-Type tells the server how to handle the content once the Transfer-Encoding has been handled.
I had a question that was marked as an exact duplicate. I was concerned with both ends of the transaction.
The code from sigmavirus24 wasn't a direct cut and paste fix, but it was the inspiration for this version.
Here's how my solution ended up looking:
sending from the python end
import json
import requests
import StringIO
import gzip
url = "http://localhost:3000"
headers = {"Content-Type":"application/octet-stream"}
data = [{"key": 1,"otherKey": "2"},
{"key": 3,"otherKey": "4"}]
payload = json.dumps(data)
out = StringIO.StringIO()
with gzip.GzipFile(fileobj=out, mode="w") as f:
    f.write(payload)
r = requests.post(url+"/zipped", data=out.getvalue(), headers=headers)
receiving at the express end
var zlib = require("zlib");
var rawParser = bodyParser.raw({type: '*/*'});
app.post('/zipped', rawParser, function(req, res) {
zlib.gunzip(req.body, function(err, buf) {
if(err){
console.log("err:", err );
} else{
console.log("in the inflate callback:",
buf,
"to string:", buf.toString("utf8") );
}
});
res.status(200).send("I'm in ur zipped route");
});
app.listen(3000);
There's a gist here with more verbose logging included. This version doesn't have any safety or checking built in either.
I'm trying to replace curl with Python & the requests library. With curl, I can upload a single XML file to a REST server with the curl -T option. I have been unable to do the same with the requests library.
A basic scenario works:
payload = '<person test="10"><first>Carl</first><last>Sagan</last></person>'
headers = {'content-type': 'application/xml'}
r = requests.put(url, data=payload, headers=headers, auth=HTTPDigestAuth("*", "*"))
When I change payload to a bigger string by opening an XML file, the .put method hangs (I use the codecs library to get a proper unicode string). For example, with a 66KB file:
xmlfile = codecs.open('trb-1996-219.xml', 'r', 'utf-8')
headers = {'content-type': 'application/xml'}
content = xmlfile.read()
r = requests.put(url, data=content, headers=headers, auth=HTTPDigestAuth("*", "*"))
I've been looking into using the multipart option (files), but the server doesn't seem to like that.
So I was wondering if there is a way to simulate curl -T behaviour in Python requests library.
UPDATE 1:
The program hangs in TextMate, but throws a UnicodeEncodeError on the command line. It seems that must be the problem. So the question would be: is there a way to send unicode strings to a server with the requests library?
UPDATE 2:
Thanks to the comment of Martijn Pieters the UnicodeEncodeError went away, but a new issue turned up.
With a literal (ASCII) XML string, logging shows the following lines:
2012-11-11 15:55:05,154 INFO Starting new HTTP connection (1): my.ip.address
2012-11-11 15:55:05,294 DEBUG "PUT /v1/documents?uri=/example/test.xml HTTP/1.1" 401 211
2012-11-11 15:55:05,430 DEBUG "PUT /v1/documents?uri=/example/test.xml HTTP/1.1" 201 0
Seems the server always bounces the first authentication attempt (?) but then accepts the second one.
With a file object (open('trb-1996-219.xml', 'rb')) passed to data, the logfile shows:
2012-11-11 15:50:54,309 INFO Starting new HTTP connection (1): my.ip.address
2012-11-11 15:50:55,105 DEBUG "PUT /v1/documents?uri=/example/test.xml HTTP/1.1" 401 211
2012-11-11 15:51:25,603 WARNING Retrying (0 attempts remain) after connection broken by 'BadStatusLine("''",)': /v1/documents?uri=/example/test.xml
So, first attempt is blocked as before, but no second attempt is made.
According to Martijn Pieters (below), the second issue can be explained by a faulty server (empty line).
I will look into this, but if someone has a workaround (apart from using curl) I wouldn't mind hearing it.
And I am still surprised that the requests library behaves so differently for small string and file object. Isn't the file object serialized before it gets to the server anyway?
To PUT large files, don't read them into memory. Simply pass the file as the data keyword:
xmlfile = open('trb-1996-219.xml', 'rb')
headers = {'content-type': 'application/xml'}
r = requests.put(url, data=xmlfile, headers=headers, auth=HTTPDigestAuth("*", "*"))
Moreover, you were opening the file as unicode (decoding it from UTF-8). As you'll be sending it to a remote server, you need raw bytes, not unicode values, and you should open the file as a binary instead.
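If you do want to send a literal unicode string rather than a file object, encode it to UTF-8 bytes yourself first; a small sketch reusing the payload and headers from the question:
r = requests.put(url, data=payload.encode('utf-8'), headers=headers,
                 auth=HTTPDigestAuth("*", "*"))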
Digest authentication always requires you to make at least two requests to the server. The first request doesn't contain any authentication data. This first request will fail with a 401 "Authorization required" response code and a digest challenge (including a nonce) to be used for hashing your password etc. (the exact details don't matter here). This is used to make a second request to the server containing your credentials hashed with the challenge.
The problem is in this two-step authentication: your large file was already sent with the first unauthorized request (sent in vain), but on the second request the file object is already at the EOF position. Since the file size was also sent in the Content-Length header of the second request, this causes the server to wait for a file that will never be sent.
You could solve it by using a requests Session and first making a simple request for authentication purposes (say a GET request). Then make a second PUT request containing the actual payload using the same digest challenge from the first request.
sess = requests.Session()
sess.auth = HTTPDigestAuth("*", "*")
sess.get(url)
headers = {'content-type': 'application/xml'}
with open('trb-1996-219.xml', 'rb') as xmlfile:
    sess.put(url, data=xmlfile, headers=headers)
I used requests in Python to upload an XML file using the commands below.
First, open the file with open():
file = open("PIR.xsd")
fragment = file.read()
file.close()
Copy the data of the XML file into the payload of the request and post it:
payload = {'key':'PFAkrzjmuZR957','xmlFragment':fragment}
r = requests.post(URL,data=payload)
To check the HTML validation result:
print (r.text)