POST multipart/form-data to a site with checkboxes via Python

I'm relatively new to python and computers in general. Currently I'm trying to post data to a website, namely http://www.camp.bicnirrh.res.in/featcalc/, and select four checkboxes after uploading a file which contains the data to be analyzed.
So far, this is what I've tried.
def encode_multipart_formdata(fields, files, data):
    """
    fields is a sequence of (name, value) elements for regular form fields.
    files is a sequence of (name, filename, value) elements for data to be uploaded as files
    Return (content_type, body) ready for httplib.HTTP instance
    """
    BOUNDARY = '-----------------------------7de18336272e32'
    CRLF = '\r\n'
    L = []
    L.append('--' + BOUNDARY)
    L.append('Content-Disposition: form-data; name="seq"')
    L.append('')
    L.append('--' + BOUNDARY)
    L.append('Content-Disposition: form-data; name="%s"; filename="%s"' % (files[0], files[1]))
    L.append('Content-Type: text/plain')
    L.append('')
    L.append(data)
    L.append('')
    for (key, value) in fields:
        L.append('--' + BOUNDARY)
        L.append('Content-Disposition: form-data; name="%s"' % key)
        L.append('')
        L.append(value)
    L.append('--' + BOUNDARY + '--')
    L.append('')
    body = CRLF.join(L)
    content_type = 'multipart/form-data; boundary=%s' % BOUNDARY
    return content_type, body
In this code, data is the contents of the file (opened, read, and closed beforehand), and files is the field name plus the full filename, e.g. ('Practice', 'Practice.txt').
This returned what I thought was a good format, but when I tried to POST the encoded data using urllib2 (building a Request and calling urlopen), I got back what looked like the source code of the results page, without any of the data I needed (i.e. no values). I also tried this, just to see if it would work:
files = {'file': ('Practice.txt', open('Practice.txt', 'rb'))}
r = requests.post(url, files=files)
r.text
The problem, I think, is that the page requires you to select a checkbox, and I have no idea how to do that in a POST request. I was thinking about trying a cgi script next, but I'm literally running out of ideas.
Any help would be greatly appreciated! Thank you!

The requests module will be most useful here; you need to set both data and files:
files = {'file': ('Practice.txt', open('Practice.txt', 'rb'))}
data = {'amino':'amino', 'aliphatic': 'aliphatic'}
r = requests.post(url, files=files, data=data)
r.text
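A checked checkbox is just another key/value pair in the form data; the real field names come from the page's HTML (`<input type="checkbox" name="...">`). As a quick offline sanity check, requests can build the request without sending it, so you can confirm the checkbox fields land in the multipart body (field names here are hypothetical; inspect the actual form for the real ones):

```python
import requests

# Hypothetical field names - read them off the page's HTML in practice.
files = {"file": ("Practice.txt", b"SEQUENCE DATA")}
data = {"amino": "amino", "aliphatic": "aliphatic"}

# prepare() builds the request without sending it, so we can inspect the body
req = requests.Request("POST", "http://example.com/featcalc",
                       files=files, data=data).prepare()

print(req.headers["Content-Type"])  # multipart/form-data; boundary=...
print(b'name="amino"' in req.body)  # True - the checkbox field is in the body
```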

Related

Wrong content length when sending a file

I am having a problem where the requests library is, for some reason, making the payload bigger and causing issues. I enabled HTTP logging, and in the output I can see the content length is 50569, not 50349, as the actual file size would indicate.
send: b'POST /api/1/images HTTP/1.1\r\nHost: localhost:8000\r\nUser-Agent: python-requests/2.21.0\r\nAccept-Encoding: gzip, deflate\r\nAc
cept: application/json\r\nConnection: keep-alive\r\nAuthorization: Bearer 28956340ba9c7e25b49085b4d273522b\r\ncontent-type: image/png\r\n
Content-Length: 50569\r\n\r\n'
send: b'--ac9e15d6d3aa3a77506c2daccca2ee47\r\nContent-Disposition: form-data; name="0007-Alternate-Lateral-Pulldown_back-STEP1"; filename
="0007-Alternate-Lateral-Pulldown_back-STEP1.png"\r\n\r\n\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x00\xf0\x00\x00\x00\xf0\x08\x06\x00\
x00\x00>U\xe9\x92\x00\x00\x00\tpHYs\x00\x00\x0b\x13\x00\x00\x0b\x13\x01\x00\x9a\x9c\x18\x00\x00FMiTXtXML:com.adobe.xmp\x00\x00\x00\x00\x0
0<?xpacket begin="\xef\xbb\xbf" id="W5M0MpCehiHzreSzNTczkc9d"?>\n<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 5.5-c021 79.
155772, 2014/01/13-19:44:00 ">\n <rdf:RDF xmlns:rdf="http:
Chrome sends exactly the same headers, but with the correct content length, so I am assuming this is why my server complains about an invalid image being sent.
This is my code:
url = self.server + "/api/1/images"
headers = self.default_headers()
headers['content-type'] = 'image/png'

# neither of these is actually used for anything
filename = os.path.basename(image)
field_name = os.path.splitext(filename)[0]
files = {field_name: (filename, open(image, 'rb'), '')}

# POST image
r = requests.post(url, headers=headers, files=files, timeout=5.0)
As can be seen, I am using the b flag when opening the file to preserve the binary content, so it should not change.
File size is 50349
$ ls -l 0007-Alternate-Lateral-Pulldown_back-STEP1.png
-rw-rw-r--# 1 carlerik staff 50349 Nov 26 2019 0007-Alternate-Lateral-Pulldown_back-STEP1.png
I used Charles Proxy to dig into this, and I have now gotten to the bottom of it. There are basically two things to note:
The difference in content length between the request sent by Chrome and Requests is exactly the length of the boundary fields (a multipart form concept) before and after the file + the six CRLF (\r\n) sequences + the Content-Disposition header.
echo '50569-178-36-6' | bc
50349
The boundary field looks like this: --ac9e15d6d3aa3a77506c2daccca2ee47\r\n
You can also see from the HTTP header and body dump that the header is actually in the body, after the boundary field, and not part of the normal headers. This was important and led me down the right path.
The second part of the answer is that the people who wrote the server API I am interfacing with did not understand/read the HTTP spec for the exact bits they ask for: the Content-Disposition header.
The API docs for .../images state that this header must always be present, as they use (well, used in the past) its content to extract filenames and such. The problem is that the way they use it is not how it is intended to be used: in a multipart HTTP request it is part of the body, describing the part (form field) it precedes.
This is, of course, also how Requests uses it, but I did not have this information before venturing into this abyss, and was misled by the documentation in the controller code. So I assumed that Requests would put the header in the header section (which it did not) rather than in the body (which it did). After all, I saw that Chrome "did the right thing", but it turned out that was only because these requests were handcrafted in JavaScript:
apiService.js
/**
 * Upload image
 * @param file
 * @returns {*}
 * @private
 */
_api.postImage = function (file) {
    if (typeof file != 'object') {
        throw Error('ApiService: Object expected');
    }
    file.fileName = (/(.*)\./).exec(file.name)[1];
    var ContentDisposition = 'form-data; name="' + encodeURI(file.fileName) + '"; filename="' + encodeURI(file.name) + '"';
    return Upload.http({
        url: Routing.generate('api_post_image'),
        headers: {
            'Content-Disposition': ContentDisposition,
            'Content-Type': file.type
        },
        data: file
    });
};
So the Content-Disposition header here is basically a proprietary header used to convey the filename; it merely shares its appearance with the general in-body header from the spec. That means all it takes to fix this is to create a request with a custom body read from the file and set this header manually.
To round this off, this is what it all simplified down to:
headers = dict()
headers['authorization'] = "<something>"
headers['content-type'] = 'image/png'

with open(image, 'rb') as imagefile:
    # POST image
    r = requests.post(url, headers=headers, data=imagefile)
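The size discrepancy observed above can be reproduced offline by comparing the body requests builds for `files=` versus `data=`, without sending anything (a sketch; the URL and field name are made up):

```python
import requests

payload = b"\x89PNG" + b"\x00" * 1000  # stand-in for the real PNG bytes

# files= wraps the bytes in a multipart envelope
multipart = requests.Request("POST", "http://example.com/api/1/images",
                             files={"img": ("img.png", payload)}).prepare()
# data= sends the bytes as-is
raw = requests.Request("POST", "http://example.com/api/1/images",
                       data=payload).prepare()

# The multipart body is the file plus boundary lines and per-part headers;
# the raw body is exactly the file.
print(len(multipart.body) - len(raw.body))  # the boundary/header overhead
print(payload in multipart.body)            # True - the file bytes are untouched
```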

Response bodies written as non-human readable in proxpy

I am writing a plugin for proxpy, which is basically an HTTP/HTTPS proxy written in Python. You extend it by implementing two functions, whose arguments are the HTTP request and response, respectively. Something like this:
def proxy_mangle_request(request):
    # your implementation

def proxy_mangle_response(response):
    # your implementation
I want to simply write the requests and responses to a file.
The response object has a serialize() function which I call to get the entire response as a string and then I write it to a file. Here is my code:
def proxy_mangle_response(res):
    temp = res.serialize()
    file_temp = open('test.txt', 'a')
    file_temp.write(temp + '\n\n')
    file_temp.close()
However, the response body is written as non-human-readable gibberish, even though it appears as HTML when inspected with something like Live HTTP Headers (a Chrome extension).
The serialize() method is provided by proxpy, and its implementation is this:
def serialize(self):
    # Response line
    s = "%s %s %s" % (self.proto, self.code, self.msg)
    s += HTTPMessage.EOL

    # Headers
    for n, v in self.headers.iteritems():
        for i in v:
            s += "%s: %s" % (n, i)
            s += HTTPMessage.EOL
    s += HTTPMessage.EOL

    # Body
    if not self.isChunked():
        s += self.body
    else:
        # FIXME: Make a single-chunk body
        s += "%x" % len(self.body) + HTTPMessage.EOL
        s += self.body + HTTPMessage.EOL
        s += HTTPMessage.EOL
        s += "0" + HTTPMessage.EOL + HTTPMessage.EOL
    return s
To reproduce this issue, hit 'dawn.com' after running proxpy. The first request which goes to dawn.com will reproduce the issue.
The following are the response headers:
CF-RAY: 1b2c4934487f073d-AMS
Connection: keep-alive
Content-Encoding: gzip
Content-Type: text/html
Date: Tue, 03 Feb 2015 05:39:05 GMT
Server: cloudflare-nginx
Transfer-Encoding: chunked
Vary: Accept-Encoding
X-Backend: www2
X-Developer: Enjoy webdev? We like you, reach out at topcoder(at)compunode.com
I'm thinking this is some sort of encoding issue and there is some info in the headers which makes the browser interpret the response body correctly.
As it turns out, the response body was compressed with gzip, as the Content-Encoding response header indicates. The browser decompresses it before displaying it. You just need to decompress the body with zlib when the Content-Encoding header is set to gzip.
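A minimal sketch of that decompression step (gzip.compress is used here only to fabricate a gzip body to work on; in the plugin the compressed bytes would come from res.body):

```python
import gzip
import zlib

original = b"<html><body>hello</body></html>"
compressed = gzip.compress(original)  # stand-in for the gzip-encoded response body

# 16 + zlib.MAX_WBITS tells zlib to expect a gzip header and trailer
body = zlib.decompress(compressed, 16 + zlib.MAX_WBITS)
print(body == original)  # True
```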

POST single file in Python

Is there an easy way to upload a single file using Python?
I know about requests, but it POSTs a dict of files containing that single file, so we have a little problem receiving that one file on the other end.
Currently the code sending that file is:
def sendFileToWebService(filename, subpage):
    error = None
    files = {'file': open(filename, 'rb')}
    try:
        response = requests.post(WEBSERVICE_IP + subpage, files=files)
        data = json.load(response)
    (...)
And the problem is that requests sends the file wrapped in a multipart envelope:
--7163947ad8b44c91adaddbd22414aff8
Content-Disposition: form-data; name="file"; filename="filename.txt"
Content-Type: text/plain
<beginning of file content>
(...)
<end of file content>
--7163947ad8b44c91adaddbd22414aff8--
I suppose that's the packaging around the file. Is there a way to send the file "clear"?
Use the data parameter of requests, not the files parameter:
def sendFileToWebService(filename, subpage):
    error = None
    try:
        response = requests.post(WEBSERVICE_IP + subpage,
                                 data=open(filename, 'rb'))
        data = json.load(response)
    (...)
This will cause the file's contents to be placed in the body of the HTTP request. Specifying the files parameter triggers requests to switch to multipart/form-data.
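The difference is easy to see without a server by preparing both variants of the request (the URL is a placeholder):

```python
import requests

content = b"file contents"

# files= triggers multipart/form-data encoding
with_files = requests.Request("POST", "http://example.com/upload",
                              files={"file": ("filename.txt", content)}).prepare()
# data= with raw bytes places them in the body unchanged
with_data = requests.Request("POST", "http://example.com/upload",
                             data=content).prepare()

print(with_files.headers["Content-Type"])  # multipart/form-data; boundary=...
print(with_data.body == content)           # True - raw bytes, no wrapper
print("Content-Type" in with_data.headers) # False - none is set for raw data
```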

Google http://maps.google.com/maps/geo query with non-english characters

I'm writing a Python parser (using urllib2) for addresses with non-English characters in them. The goal is to find the coordinates of every address.
When I open this url in Firefox:
http://maps.google.com/maps/geo?q=Czech%20Republic%2010000%20Male%C5%A1ice&output=csv
it is converted (changes in address box) to
http://maps.google.com/maps/geo?q=Czech Republic 10000 Malešice&output=csv
and returns
200,6,50.0865113,14.4918052
which is a correct result.
However, if I open the same url (encoded, with %20 and such) in urllib2 (or Opera browser), the result is
200,4,49.7715220,13.2955410
which is incorrect. How can I open the first url in urllib2 to get the "200,6,50.0865113,14.4918052" result?
Edit:
Code used
import urllib2
psc = '10000'
name = 'Malešice'
url = 'http://maps.google.com/maps/geo?q=%s&output=csv' % urllib2.quote('Czech Republic %s %s' % (psc, name))
response = urllib2.urlopen(url)
data = response.read()
print 'Parsed url %s, result %s\n' % (url, data)
output
Parsed url http://maps.google.com/maps/geo?q=Czech%20Republic%2010000%20Male%C5%A1ice&output=csv, result 200,4,49.7715220,13.2955410
I can reproduce this behavior, and at first I was dumbfounded as to why it happens. Closer inspection of the HTTP requests with Wireshark showed that the requests sent by Firefox (not surprisingly) contain a couple of extra HTTP headers.
In the end it turned out to be the Accept-Language header that makes the difference. You only get the correct result if:
an Accept-Language header is set,
and it has a non-English language listed first (the priorities don't seem to matter).
So, for example this Accept-Language header works:
headers = {'Accept-Language': 'de-ch,en'}
To summarize, your code works for me when modified like this:
# -*- coding: utf-8 -*-
import urllib2
psc = '10000'
name = 'Malešice'
url = 'http://maps.google.com/maps/geo?q=%s&output=csv' % urllib2.quote('Czech Republic %s %s' % (psc, name))
headers = {'Accept-Language': 'de-ch,en'}
req = urllib2.Request(url, None, headers)
response = urllib2.urlopen(req)
data = response.read()
print 'Parsed url %s, result %s\n' % (url, data)
Note: In my opinion, this is a bug in Google's geocoding API. The Accept-Language header indicates what languages the user agent prefers the content in, but it shouldn't have any effect on how the request is interpreted.
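As a sanity check on the encoding side: Python's quoting produces exactly the escaping Firefox displays, so the percent-encoding itself was never the problem (shown here with the Python 3 spelling, urllib.parse.quote; urllib2.quote in Python 2 behaves the same for this input):

```python
from urllib.parse import quote

address = "Czech Republic 10000 Malešice"
# spaces become %20 and the UTF-8 bytes of š become %C5%A1
print(quote(address))
# Czech%20Republic%2010000%20Male%C5%A1ice
```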

Using MultipartPostHandler to POST form-data with Python

Problem: When POSTing data with Python's urllib2, all data is URL encoded and sent as Content-Type: application/x-www-form-urlencoded. When uploading files, the Content-Type should instead be set to multipart/form-data and the contents be MIME-encoded.
To get around this limitation some sharp coders created a library called MultipartPostHandler which creates an OpenerDirector you can use with urllib2 to mostly automatically POST with multipart/form-data. A copy of this library is here: MultipartPostHandler doesn't work for Unicode files
I am new to Python and am unable to get this library to work. I wrote out essentially the following code. When I capture it in a local HTTP proxy, I can see that the data is still URL-encoded and not multipart MIME-encoded. Please help me figure out what I am doing wrong, or a better way to get this done. Thanks :-)
FROM_ADDR = 'my@email.com'

try:
    data = open(file, 'rb').read()
except:
    print "Error: could not open file %s for reading" % file
    print "Check permissions on the file or folder it resides in"
    sys.exit(1)

# Build the POST request
url = "http://somedomain.com/?action=analyze"
post_data = {}
post_data['analysisType'] = 'file'
post_data['executable'] = data
post_data['notification'] = 'email'
post_data['email'] = FROM_ADDR

# MIME encode the POST payload
opener = urllib2.build_opener(MultipartPostHandler.MultipartPostHandler)
urllib2.install_opener(opener)
request = urllib2.Request(url, post_data)
request.set_proxy('127.0.0.1:8080', 'http')  # For testing with Burp Proxy

# Make the request and capture the response
try:
    response = urllib2.urlopen(request)
    print response.geturl()
except urllib2.URLError, e:
    print "File upload failed..."
EDIT1: Thanks for your response. I'm aware of the ActiveState httplib solution to this (I linked to it above). I'd rather abstract away the problem and use a minimal amount of code to continue using urllib2 how I have been. Any idea why the opener isn't being installed and used?
It seems that the easiest and most compatible way to get around this problem is to use the 'poster' module.
# test_client.py
from poster.encode import multipart_encode
from poster.streaminghttp import register_openers
import urllib2
# Register the streaming http handlers with urllib2
register_openers()
# Start the multipart/form-data encoding of the file "DSC0001.jpg"
# "image1" is the name of the parameter, which is normally set
# via the "name" parameter of the HTML <input> tag.
# headers contains the necessary Content-Type and Content-Length
# datagen is a generator object that yields the encoded parameters
datagen, headers = multipart_encode({"image1": open("DSC0001.jpg")})
# Create the Request object
request = urllib2.Request("http://localhost:5000/upload_image", datagen, headers)
# Actually do the request, and get the response
print urllib2.urlopen(request).read()
This worked perfectly and I didn't have to muck with httplib. The module is available here:
http://atlee.ca/software/poster/index.html
I found this recipe to post multipart using httplib directly (no external libraries involved):
import httplib
import mimetypes

def post_multipart(host, selector, fields, files):
    content_type, body = encode_multipart_formdata(fields, files)
    h = httplib.HTTP(host)
    h.putrequest('POST', selector)
    h.putheader('content-type', content_type)
    h.putheader('content-length', str(len(body)))
    h.endheaders()
    h.send(body)
    errcode, errmsg, headers = h.getreply()
    return h.file.read()

def encode_multipart_formdata(fields, files):
    LIMIT = '----------lImIt_of_THE_fIle_eW_$'
    CRLF = '\r\n'
    L = []
    for (key, value) in fields:
        L.append('--' + LIMIT)
        L.append('Content-Disposition: form-data; name="%s"' % key)
        L.append('')
        L.append(value)
    for (key, filename, value) in files:
        L.append('--' + LIMIT)
        L.append('Content-Disposition: form-data; name="%s"; filename="%s"' % (key, filename))
        L.append('Content-Type: %s' % get_content_type(filename))
        L.append('')
        L.append(value)
    L.append('--' + LIMIT + '--')
    L.append('')
    body = CRLF.join(L)
    content_type = 'multipart/form-data; boundary=%s' % LIMIT
    return content_type, body

def get_content_type(filename):
    return mimetypes.guess_type(filename)[0] or 'application/octet-stream'
Just use python-requests, it will set proper headers and do upload for you:
import requests
files = {"form_input_field_name": open("filename", "rb")}
requests.post("http://httpbin.org/post", files=files)
I ran into the same problem and needed to do a multipart form post without using external libraries. I wrote a whole blog post about the issues I ran into.
I ended up using a modified version of http://code.activestate.com/recipes/146306/. The code at that URL actually just appends the content of the file as a string, which can cause problems with binary files. Here's my working code:
import codecs
import httplib
import io
import json
import mimetools
import mimetypes
import os
import socket
import urlparse

form = MultiPartForm()
form.add_field("form_field", "my awesome data")

# Add a fake file
form.add_file(key, os.path.basename(filepath),
              fileHandle=codecs.open("/path/to/my/file.zip", "rb"))

# Build the request
url = "http://www.example.com/endpoint"
schema, netloc, url, params, query, fragments = urlparse.urlparse(url)

try:
    form_buffer = form.get_binary().getvalue()
    http = httplib.HTTPConnection(netloc)
    http.connect()
    http.putrequest("POST", url)
    http.putheader('Content-type', form.get_content_type())
    http.putheader('Content-length', str(len(form_buffer)))
    http.endheaders()
    http.send(form_buffer)
except socket.error, e:
    raise SystemExit(1)

r = http.getresponse()
if r.status == 200:
    result = json.loads(r.read())
else:
    print('Upload failed (%s): %s' % (r.status, r.reason))

class MultiPartForm(object):
    """Accumulate the data to be used when posting a form."""

    def __init__(self):
        self.form_fields = []
        self.files = []
        self.boundary = mimetools.choose_boundary()
        return

    def get_content_type(self):
        return 'multipart/form-data; boundary=%s' % self.boundary

    def add_field(self, name, value):
        """Add a simple field to the form data."""
        self.form_fields.append((name, value))
        return

    def add_file(self, fieldname, filename, fileHandle, mimetype=None):
        """Add a file to be uploaded."""
        body = fileHandle.read()
        if mimetype is None:
            mimetype = mimetypes.guess_type(filename)[0] or 'application/octet-stream'
        self.files.append((fieldname, filename, mimetype, body))
        return

    def get_binary(self):
        """Return a binary buffer containing the form data, including attached files."""
        part_boundary = '--' + self.boundary
        binary = io.BytesIO()
        needsCRLF = False

        # Add the form fields
        for name, value in self.form_fields:
            if needsCRLF:
                binary.write('\r\n')
            needsCRLF = True
            block = [part_boundary,
                     'Content-Disposition: form-data; name="%s"' % name,
                     '',
                     value]
            binary.write('\r\n'.join(block))

        # Add the files to upload
        for field_name, filename, content_type, body in self.files:
            if needsCRLF:
                binary.write('\r\n')
            needsCRLF = True
            block = [part_boundary,
                     str('Content-Disposition: file; name="%s"; filename="%s"' %
                         (field_name, filename)),
                     'Content-Type: %s' % content_type,
                     '']
            binary.write('\r\n'.join(block))
            binary.write('\r\n')
            binary.write(body)

        # Add the closing boundary marker
        binary.write('\r\n--' + self.boundary + '--\r\n')
        return binary
What a coincidence: 2 years and 6 months ago I created the project
https://pypi.python.org/pypi/MultipartPostHandler2, which fixes MultipartPostHandler for UTF-8 systems. I have also made some minor improvements; you are welcome to test it :)
To answer the OP's question of why the original code didn't work: the handler passed in wasn't an instance of the class. The line
# MIME encode the POST payload
opener = urllib2.build_opener(MultipartPostHandler.MultipartPostHandler)
should read
opener = urllib2.build_opener(MultipartPostHandler.MultipartPostHandler())
