(This question is not about transparent decompression of gzip-encoded responses from a web server; I know that requests handles that automatically.)
Problem
I'm trying to POST a file to a RESTful web service. Obviously, requests makes this pretty easy to do:
files = dict(data=(fn, file))
response = session.post(endpoint_url, files=files)
In this case, my file is in a really highly-compressible format (yep, XML) so I'd like to make sure that the request body is compressed.
The server claims to accept gzip encoding (Accept-Encoding: gzip in response headers), so I should be able to gzip the whole request body, right?
Attempted solution
Here's my attempt to make this work: I first construct the request and prepare it, then I go into the PreparedRequest object, yank out the body, run it through gzip, and put it back. (Oh, and don't forget to update the Content-Length and Content-Encoding headers.)
files = dict(data=(fn, file))
request = requests.Request('POST', endpoint_url, files=files)
prepped = session.prepare_request(request)

with NamedTemporaryFile(delete=True) as gzfile:
    gzip.GzipFile(fileobj=gzfile, mode="wb").write(prepped.body)
    prepped.headers['Content-Length'] = gzfile.tell()
    prepped.headers['Content-Encoding'] = 'gzip'
    gzfile.seek(0, 0)
    prepped.body = gzfile.read()

response = session.send(prepped)
Unfortunately, the server is not cooperating and returns 500 Internal Server Error. Perhaps it doesn't really accept gzip-encoded requests?
Or perhaps there is a mistake in my approach? It seems rather convoluted. Is there an easier way to do request body compression with python-requests?
EDIT: Fixed (3) and (5) from @sigmavirus24's answer (these were basically just artifacts I'd overlooked in simplifying the code to post it here).
Or perhaps there is a mistake in my approach?
I'm unsure how you arrived at your approach, frankly, but there's certainly a simpler way of doing this.
First, a few things:
1. The files parameter constructs a multipart/form-data body. So you're compressing something that the server potentially has no clue about.
2. Content-Encoding and Transfer-Encoding are two very different things. You want Transfer-Encoding here.
3. You don't need to set a suffix on your NamedTemporaryFile.
4. Since you didn't explicitly mention that you're trying to compress a multipart/form-data request, I'm going to assume that you don't actually want to do that.
5. Your call to session.Request (which I assume should be requests.Request) is missing a method, i.e., it should be: requests.Request('POST', endpoint_url, ...)
With those out of the way, here's how I would do this:
import gzip
from tempfile import NamedTemporaryFile

# Assuming `file` is a file-like obj
with NamedTemporaryFile(delete=True) as gzfile:
    gz = gzip.GzipFile(fileobj=gzfile, mode="wb")
    gz.write(file.read())
    gz.close()  # flush and write the gzip trailer before measuring the size
    headers = {'Content-Length': str(gzfile.tell()),
               'Transfer-Encoding': 'gzip'}
    gzfile.seek(0, 0)
    response = session.post(endpoint_url, data=gzfile,
                            headers=headers)
Assuming that file has the xml content in it and all you meant was to compress it, this should work for you. You probably want to set a Content-Type header though, for example, you'd just do
headers = {'Content-Length': str(gzfile.tell()),
           'Content-Type': 'application/xml',  # or 'text/xml'
           'Transfer-Encoding': 'gzip'}
The Transfer-Encoding tells the server that the request is being compressed only in transit and it should uncompress it. The Content-Type tells the server how to handle the content once the Transfer-Encoding has been handled.
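For reference, here is a minimal in-memory variant of the same idea (my own sketch, not part of the answer above); it assumes `session`, `endpoint_url`, and a binary file-like `file` as in the question, and avoids the temporary file by compressing into a BytesIO buffer:
import gzip
import io

# Compress the raw XML into an in-memory buffer instead of a temp file
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
    gz.write(file.read())          # assumes `file` was opened in binary mode
body = buf.getvalue()

# Same headers as the temp-file version above
headers = {'Content-Length': str(len(body)),
           'Content-Type': 'application/xml',
           'Transfer-Encoding': 'gzip'}
response = session.post(endpoint_url, data=body, headers=headers)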
I had a question that was marked as an exact duplicate. I was concerned with both ends of the transaction.
The code from sigmavirus24 wasn't a direct cut and paste fix, but it was the inspiration for this version.
Here's how my solution ended up looking:
sending from the python end
import json
import requests
import StringIO
import gzip

url = "http://localhost:3000"
headers = {"Content-Type": "application/octet-stream"}

data = [{"key": 1, "otherKey": "2"},
        {"key": 3, "otherKey": "4"}]
payload = json.dumps(data)

# gzip the JSON payload into an in-memory buffer
out = StringIO.StringIO()
with gzip.GzipFile(fileobj=out, mode="w") as f:
    f.write(payload)

r = requests.post(url + "/zipped", data=out.getvalue(), headers=headers)
receiving at the express end
var express = require("express");
var bodyParser = require("body-parser");
var zlib = require("zlib");

var app = express();
var rawParser = bodyParser.raw({type: '*/*'});

app.post('/zipped', rawParser, function(req, res) {
    zlib.gunzip(req.body, function(err, buf) {
        if (err) {
            console.log("err:", err);
        } else {
            console.log("in the inflate callback:",
                        buf,
                        "to string:", buf.toString("utf8"));
        }
    });
    res.status(200).send("I'm in ur zipped route");
});
There's a gist here with more verbose logging included. This version doesn't have any safety or checking built in either.
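Since the version above has no error handling, here is a small, hedged sketch of what a safer send from the Python side might look like (same `url`, `headers`, and `out` as above):
try:
    # timeout and raise_for_status give at least minimal safety checking
    r = requests.post(url + "/zipped", data=out.getvalue(),
                      headers=headers, timeout=10)
    r.raise_for_status()
except requests.exceptions.RequestException as exc:
    print "upload failed:", exc
else:
    print "server replied:", r.status_code, r.text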
Related
I have been using sockets with Python for some time, and I'm trying to understand why this POST, which should send some data in the fields data1 and data2, does not work.
POST /method.php HTTP/1.1\r\nHost: localhost\r\nContent-Type: multipart/form-data\r\n\r\ndata1=something&data2= otherthing\r\n\r\n
What is the problem with this request?
There are several things wrong with your request:
POST /method.php HTTP/1.1
Host: localhost
Content-Type: multipart/form-data
data1=something&data2= otherthing
First, whenever a body is used within an HTTP request, the length of the body must be known. This is typically done by giving the length up front with a Content-Length header, although chunked transfer encoding can be used if the full length is not known in advance. Your request does neither, which makes it an invalid HTTP request.
Additionally, you claim a Content-Type of multipart/form-data although your body is not of this type. With multipart/form-data, your body would consist of several MIME parts separated by a boundary, and this boundary would need to be declared in your Content-Type header. The correct type for the body you show would instead be application/x-www-form-urlencoded.
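For illustration only, here is a sketch of what a hand-built multipart/form-data body for those two fields would look like (the boundary value is made up and must match the one declared in the Content-Type header):
boundary = "----exampleboundary123"
body = ("--%s\r\n"
        'Content-Disposition: form-data; name="data1"\r\n'
        "\r\n"
        "something\r\n"
        "--%s\r\n"
        'Content-Disposition: form-data; name="data2"\r\n'
        "\r\n"
        "otherthing\r\n"
        "--%s--\r\n") % (boundary, boundary, boundary)
content_type = "multipart/form-data; boundary=%s" % boundary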
Even with application/x-www-form-urlencoded, the body is partly wrong. This type of body should consist only of key=value pairs concatenated with &; there should be neither a space after a key, as you have after data2=, nor newlines added after the end of the data.
After removing all these problems, you would probably send the following request:
body = "data1=something&data2=otherthing"
request = ("POST /method.php HTTP/1.1\r\n"
           "Host: localhost\r\n"
           "Content-Type: application/x-www-form-urlencoded\r\n"
           "Content-Length: %d\r\n"
           "\r\n%s") % (len(body), body)
But once you have sent this request, the trouble continues, since reading the response correctly is complex too. Generally I recommend not implementing your own HTTP handling unless you really know what you are doing; use an existing library instead. While HTTP might look simple when you just look at a few example requests, it is far more complex than it initially appears. And while your code might seem to work against specific servers, it might fail with others.
It might be easier to use the requests library so your code would look something like this:
import requests

# Form data
data = {
    'data1': 'something',
    'data2': 'otherthing'
}

# Let requests encode the form data and set the Content-Type header itself
response = requests.post('http://localhost/', data=data)

# If you care about the response
print(response.json())
You can also send files and a whole lot of other stuff.
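For instance, sending a file together with form fields is just a matter of combining the files and data parameters (a sketch; the URL and field names are placeholders):
import requests

with open("report.xml", "rb") as fh:
    # `data` becomes ordinary form fields, `files` becomes the file part
    response = requests.post("http://localhost/upload",
                             data={"data1": "something", "data2": "otherthing"},
                             files={"file": fh})
print(response.status_code)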
Have you tried using the Requests library instead? An example of a POST request is below:
import requests

data = {"data1": "something", "data2": "otherthing"}

session_requests = requests.session()
# requests encodes the form data and sets the Content-Type header itself
result = session_requests.post("http://localhost/", data=data)
I have the following code:
r = requests.put(
    config.get('webdav', 'url') + file_name,
    auth=(
        config.get('webdav', 'username'),
        config.get('webdav', 'password')
    ),
    files={
        "files": open(os.path.expanduser(charges_file_path), 'rb')
    }
)
This is fairly straightforward: it makes a PUT request to a WebDAV server and pushes the data in files (plain text) to the server.
It works, except for a strange (or maybe not so strange if I am just missing something small) issue. When I do a GET on the file, or the file is viewed on the server directly, the file itself contains header information:
--55e72d74a10b423590cd4faa68212192
Content-Disposition: form-data; name="files"; filename="test_file6.txt"
(file_data)
--55e72d74a10b423590cd4faa68212192--
I haven't been able to find a reason for this or a way around it. When I cURL the file from the command line, it works fine.
Any ideas?
I am not really familiar with how Python requests works, but after reading through some docs and finding a similar issue someone had when sending files to Zendesk (this post), you might want to try using the data (or json) parameter instead of files in your request. You could also attach a params argument with the filename, if that applies here, similar to the post I linked.
Another thing to do would be to put a Content-Type header on this request.
i.e.
requests.put(
    ...,
    headers={'Content-Type': 'application/binary'},
    data=open(os.path.expanduser(charges_file_path), 'rb').read()
)
I would like to make a POST request to upload a file to a web service (and get response) using Python. For example, I can do the following POST request with curl:
curl -F "file=@style.css" -F output=json http://jigsaw.w3.org/css-validator/validator
How can I make the same request with python urllib/urllib2? The closest I got so far is the following:
import urllib
import urllib2

with open("style.css", 'r') as f:
    content = f.read()

post_data = {"file": content, "output": "json"}
request = urllib2.Request("http://jigsaw.w3.org/css-validator/validator",
                          data=urllib.urlencode(post_data))
response = urllib2.urlopen(request)
I got an HTTP Error 500 from the code above. But since my curl command succeeds, there must be something wrong with my Python request?
I am quite new to this topic and my question may have very simple answers or mistakes.
Personally I think you should consider the requests library to post files.
import requests

url = 'http://jigsaw.w3.org/css-validator/validator'
files = {'file': open('style.css', 'rb')}
response = requests.post(url, files=files)
Uploading files using urllib2 is not impossible, but it is quite a complicated task: http://pymotw.com/2/urllib2/#uploading-files
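To mirror the curl command exactly (the file plus the output=json field), something like this should work (a sketch along the same lines as the snippet above):
import requests

url = 'http://jigsaw.w3.org/css-validator/validator'
with open('style.css', 'rb') as f:
    # files= builds the multipart file part, data= adds the extra form field
    response = requests.post(url, files={'file': f}, data={'output': 'json'})
print(response.json())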
After some digging around, it seems this post solved my problem. It turns out I needed to set up the multipart encoder properly.
from poster.encode import multipart_encode
from poster.streaminghttp import register_openers
import urllib2

register_openers()

with open("style.css", 'r') as f:
    datagen, headers = multipart_encode({"file": f})
    request = urllib2.Request("http://jigsaw.w3.org/css-validator/validator",
                              datagen, headers)
    response = urllib2.urlopen(request)
Well, there are multiple ways to do it. As mentioned above, you can send the file in "multipart/form-data". However, the target service may not be expecting this type, in which case you may try some more approaches.
Pass the file object
urllib2 can accept a file object as data. When you pass this type, the library reads the file as a binary stream and sends it out. However, it will not set the proper Content-Type header, and if the Content-Length header is missing it will try to call len() on the object, which file objects don't support. Thus, you must provide both the Content-Type and the Content-Length headers for this method to work:
import os
import urllib2

filename = '/var/tmp/myfile.zip'

headers = {
    'Content-Type': 'application/zip',
    'Content-Length': os.stat(filename).st_size,
}

request = urllib2.Request('http://localhost', open(filename, 'rb'),
                          headers=headers)
response = urllib2.urlopen(request)
Wrap the file object
To avoid dealing with the length yourself, you can create a simple wrapper object. With just a little change you can adapt it to read the content from a string if you have the file loaded in memory.
class BinaryFileObject:
    """Simple wrapper for a binary file for urllib2."""

    def __init__(self, filename):
        self.__size = int(os.stat(filename).st_size)
        self.__f = open(filename, 'rb')

    def read(self, blocksize):
        return self.__f.read(blocksize)

    def __len__(self):
        return self.__size
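A possible usage sketch for the wrapper (same urllib2 setup as above): because __len__ is defined, urllib2 can fill in Content-Length itself, so only Content-Type has to be set explicitly.
body = BinaryFileObject('/var/tmp/myfile.zip')
request = urllib2.Request('http://localhost', body,
                          headers={'Content-Type': 'application/zip'})
response = urllib2.urlopen(request)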
Encode the content as base64
Another way is to encode the data via base64.b64encode and provide a Content-Transfer-Encoding: base64 header. However, this method requires support on the server side. Depending on the implementation, the service can either accept the file and store it incorrectly, or return HTTP 400. E.g. the GitHub API won't throw an error, but the uploaded file will be corrupted.
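A rough sketch of that approach (the endpoint is a placeholder, and whether the header is honoured depends entirely on the server):
import base64
import urllib2

with open('/var/tmp/myfile.zip', 'rb') as f:
    encoded = base64.b64encode(f.read())

request = urllib2.Request('http://localhost', encoded, headers={
    'Content-Type': 'application/zip',
    'Content-Transfer-Encoding': 'base64',
})
response = urllib2.urlopen(request)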
I would like to know if it is possible to enable gzip compression for Server-Sent Events (SSE; Content-Type: text/event-stream).
It seems it is possible, according to this book: http://chimera.labs.oreilly.com/books/1230000000545/ch16.html
But I can't find any example of SSE with gzip compression. I tried to send gzipped messages with the response header field Content-Encoding set to "gzip", without success.
To experiment with SSE, I am testing a small web application made in Python with the bottle framework + gevent; I am just running the bottle WSGI server:
@bottle.get('/data_stream')
def stream_data():
    bottle.response.content_type = "text/event-stream"
    bottle.response.add_header("Connection", "keep-alive")
    bottle.response.add_header("Cache-Control", "no-cache")
    bottle.response.add_header("Content-Encoding", "gzip")
    while True:
        # new_data is a gevent AsyncResult object,
        # .get() just returns a data string when new
        # data is available
        data = new_data.get()
        yield zlib.compress("data: %s\n\n" % data)
        #yield "data: %s\n\n" % data
The code without compression (the last line, commented out) and without the gzip Content-Encoding header field works like a charm.
EDIT: thanks to the reply and to this other question: Python: Creating a streaming gzip'd file-like?, I managed to solve the problem:
@bottle.route("/stream")
def stream_data():
    compressed_stream = zlib.compressobj()
    bottle.response.content_type = "text/event-stream"
    bottle.response.add_header("Connection", "keep-alive")
    bottle.response.add_header("Cache-Control", "no-cache, must-revalidate")
    bottle.response.add_header("Content-Encoding", "deflate")
    bottle.response.add_header("Transfer-Encoding", "chunked")
    while True:
        data = new_data.get()
        yield compressed_stream.compress("data: %s\n\n" % data)
        yield compressed_stream.flush(zlib.Z_SYNC_FLUSH)
TL;DR: If the requests are not cached, you likely want to use zlib and declare Content-Encoding to be 'deflate'. That change alone should make your code work.
If you declare Content-Encoding to be gzip, you need to actually use gzip. They are based on the same compression algorithm, but gzip has some extra framing. This works, for example:
import gzip
import StringIO

from bottle import response, route

@route('/')
def get_data():
    response.add_header("Content-Encoding", "gzip")
    s = StringIO.StringIO()
    with gzip.GzipFile(fileobj=s, mode='w') as f:
        f.write('Hello World')
    return s.getvalue()
That only really makes sense if you use an actual file as a cache, though.
There's also middleware you can use so you don't need to worry about gzipping responses for each of your methods. Here's one I used recently.
https://code.google.com/p/ibkon-wsgi-gzip-middleware/
This is how I used it (I'm using bottle.py with the gevent server)
from gzip_middleware import Gzipper
import bottle

app = Gzipper(bottle.app())
bottle.run(app=app, host='0.0.0.0', port=8080, server='gevent')
For this particular library, you can set which types of responses you want to compress by modifying the DEFAULT_COMPRESSABLES variable, for example:
DEFAULT_COMPRESSABLES = set(['text/plain', 'text/html', 'text/css',
                             'application/json', 'application/x-javascript',
                             'text/xml', 'application/xml',
                             'application/xml+rss', 'text/javascript',
                             'image/gif'])
All responses go through the middleware and get gzipped without modifying your existing code. By default, it compresses responses whose content-type belongs to DEFAULT_COMPRESSABLES and whose content-length is greater than 200 characters.
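For example, to have the middleware also consider another content type, you could extend that set before wrapping the app (a sketch, assuming DEFAULT_COMPRESSABLES lives in the same gzip_middleware module shown above):
import gzip_middleware
import bottle

# add another type to the module-level set before wrapping the app
gzip_middleware.DEFAULT_COMPRESSABLES.add('image/svg+xml')

app = gzip_middleware.Gzipper(bottle.app())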
I have form data as well as a file to be sent in the same POST. For example: {duration: 2000, file: test.wav}. I have seen the many threads here on multipart/form-data posting using python-requests. They were useful, especially this one.
My sample request is as below:
files = {'file': ('wavfile', open(filename, 'rb'))}
data = {'duration': duration}
headers = {'content-type': 'multipart/form-data'}
r = self.session.post(url, files=files, data=data, headers=headers)
But when I execute the above code, I get this error:
5:59:55.338 Dbg 09900 [DEBUG] Resolving exception from handler [null]: org.springframework.web.multipart.MultipartException: Could not parse multipart servlet request; nested exception is org.apache.commons.fileupload.FileUploadException: the request was rejected because no multipart boundary was found.
So my questions are: 1) How can I see the content of the request being sent? I couldn't use Wireshark; it's not going across the network.
2) Why is the boundary missing in the encoded data? Did I miss anything? Please point it out.
You should NEVER set that header yourself. We set the header properly with the boundary. If you set that header yourself, we won't, and your server won't know what boundary to expect (since the boundary is normally added to that header). Remove your custom Content-Type header and you'll be fine.
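In other words, the corrected call from the question is just (same url, files, and data as above):
r = self.session.post(url, files=files, data=data)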
Taking out the Content-Type header with explicit "multipart/form-data" worked!
To explicitly add a boundary, add the following to the headers:
headers = {
    'content-type': 'multipart/form-data; boundary=ebf9f03029db4c2799ae16b5428b06bd'
}
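If you genuinely need to control the boundary yourself, one option (not mentioned in the answers above, so treat it as a suggestion) is requests_toolbelt's MultipartEncoder, which builds the body and a matching Content-Type header for you:
import requests
from requests_toolbelt import MultipartEncoder

# the boundary passed here ends up in m.content_type, so header and body match
m = MultipartEncoder(
    fields={'duration': '2000',
            'file': ('wavfile', open(filename, 'rb'), 'audio/wav')},
    boundary='ebf9f03029db4c2799ae16b5428b06bd')
r = requests.post(url, data=m, headers={'Content-Type': m.content_type})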