Problem: When POSTing data with Python's urllib2, all data is URL encoded and sent as Content-Type: application/x-www-form-urlencoded. When uploading files, the Content-Type should instead be set to multipart/form-data and the contents be MIME-encoded.
To get around this limitation some sharp coders created a library called MultipartPostHandler which creates an OpenerDirector you can use with urllib2 to mostly automatically POST with multipart/form-data. A copy of this library is here: MultipartPostHandler doesn't work for Unicode files
I am new to Python and am unable to get this library to work. I wrote out essentially the following code. When I capture it in a local HTTP proxy, I can see that the data is still URL encoded and not multi-part MIME-encoded. Please help me figure out what I am doing wrong or a better way to get this done. Thanks :-)
FROM_ADDR = 'my@email.com'
try:
    data = open(file, 'rb').read()
except IOError:
    print "Error: could not open file %s for reading" % file
    print "Check permissions on the file or folder it resides in"
    sys.exit(1)
# Build the POST request
url = "http://somedomain.com/?action=analyze"
post_data = {}
post_data['analysisType'] = 'file'
post_data['executable'] = data
post_data['notification'] = 'email'
post_data['email'] = FROM_ADDR
# MIME encode the POST payload
opener = urllib2.build_opener(MultipartPostHandler.MultipartPostHandler)
urllib2.install_opener(opener)
request = urllib2.Request(url, post_data)
request.set_proxy('127.0.0.1:8080', 'http') # For testing with Burp Proxy
# Make the request and capture the response
try:
    response = urllib2.urlopen(request)
    print response.geturl()
except urllib2.URLError, e:
    print "File upload failed..."
EDIT1: Thanks for your response. I'm aware of the ActiveState httplib solution to this (I linked to it above). I'd rather abstract away the problem and use a minimal amount of code to continue using urllib2 how I have been. Any idea why the opener isn't being installed and used?
It seems that the easiest and most compatible way to get around this problem is to use the 'poster' module.
# test_client.py
from poster.encode import multipart_encode
from poster.streaminghttp import register_openers
import urllib2
# Register the streaming http handlers with urllib2
register_openers()
# Start the multipart/form-data encoding of the file "DSC0001.jpg"
# "image1" is the name of the parameter, which is normally set
# via the "name" parameter of the HTML <input> tag.
# headers contains the necessary Content-Type and Content-Length
# datagen is a generator object that yields the encoded parameters
datagen, headers = multipart_encode({"image1": open("DSC0001.jpg", "rb")})
# Create the Request object
request = urllib2.Request("http://localhost:5000/upload_image", datagen, headers)
# Actually do the request, and get the response
print urllib2.urlopen(request).read()
This worked perfectly and I didn't have to muck with httplib. The module is available here:
http://atlee.ca/software/poster/index.html
Found this recipe to post multipart using httplib directly (no external libraries involved)
import httplib
import mimetypes
def post_multipart(host, selector, fields, files):
    content_type, body = encode_multipart_formdata(fields, files)
    h = httplib.HTTP(host)
    h.putrequest('POST', selector)
    h.putheader('content-type', content_type)
    h.putheader('content-length', str(len(body)))
    h.endheaders()
    h.send(body)
    errcode, errmsg, headers = h.getreply()
    return h.file.read()
def encode_multipart_formdata(fields, files):
    LIMIT = '----------lImIt_of_THE_fIle_eW_$'
    CRLF = '\r\n'
    L = []
    for (key, value) in fields:
        L.append('--' + LIMIT)
        L.append('Content-Disposition: form-data; name="%s"' % key)
        L.append('')
        L.append(value)
    for (key, filename, value) in files:
        L.append('--' + LIMIT)
        L.append('Content-Disposition: form-data; name="%s"; filename="%s"' % (key, filename))
        L.append('Content-Type: %s' % get_content_type(filename))
        L.append('')
        L.append(value)
    L.append('--' + LIMIT + '--')
    L.append('')
    body = CRLF.join(L)
    content_type = 'multipart/form-data; boundary=%s' % LIMIT
    return content_type, body
def get_content_type(filename):
    return mimetypes.guess_type(filename)[0] or 'application/octet-stream'
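The recipe above joins everything as text, which is fine on Python 2 but can trip over binary file contents elsewhere. Below is a minimal sketch of the same idea for Python 3 that keeps the body as bytes throughout. The optional `boundary` parameter and the sample field names are assumptions made for illustration, not part of the original recipe.

```python
import mimetypes
import uuid


def encode_multipart_formdata(fields, files, boundary=None):
    """Build a multipart/form-data body as bytes.

    fields: iterable of (name, value) text pairs.
    files: iterable of (name, filename, content_bytes) triples.
    Returns (content_type, body).
    """
    # Assumption: a random boundary is unlikely to occur inside the payload.
    boundary = boundary or uuid.uuid4().hex
    crlf = b"\r\n"
    delimiter = b"--" + boundary.encode("ascii")
    parts = []
    for name, value in fields:
        parts += [
            delimiter,
            ('Content-Disposition: form-data; name="%s"' % name).encode("utf-8"),
            b"",
            value.encode("utf-8"),
        ]
    for name, filename, content in files:
        content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
        parts += [
            delimiter,
            ('Content-Disposition: form-data; name="%s"; filename="%s"'
             % (name, filename)).encode("utf-8"),
            ("Content-Type: %s" % content_type).encode("ascii"),
            b"",
            content,  # raw bytes, never decoded
        ]
    # Closing delimiter followed by a final CRLF.
    parts += [delimiter + b"--", b""]
    body = crlf.join(parts)
    return "multipart/form-data; boundary=%s" % boundary, body


content_type, body = encode_multipart_formdata(
    [("analysisType", "file")],
    [("executable", "sample.bin", b"\x00\xff\x10binary")],
    boundary="demo-boundary",
)
```

The returned `content_type` goes into the Content-Type header and `len(body)` into Content-Length, the same way `post_multipart` uses them.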
Just use python-requests, it will set proper headers and do upload for you:
import requests
files = {"form_input_field_name": open("filename", "rb")}
requests.post("http://httpbin.org/post", files=files)
I ran into the same problem and I needed to do a multipart form post without using external libraries. I wrote a whole blogpost about the issues I ran into.
I ended up using a modified version of http://code.activestate.com/recipes/146306/. The code in that url actually just appends the content of the file as a string, which can cause problems with binary files. Here's my working code.
import codecs
import httplib
import io
import json
import mimetools
import mimetypes
import os
import socket
import urlparse
# The fragment below is meant to run inside a function (note the return),
# with key and filepath defined by the caller.
form = MultiPartForm()
form.add_field("form_field", "my awesome data")
# Add a fake file
form.add_file(key, os.path.basename(filepath),
              fileHandle=codecs.open("/path/to/my/file.zip", "rb"))
# Build the request
url = "http://www.example.com/endpoint"
schema, netloc, url, params, query, fragments = urlparse.urlparse(url)
try:
    form_buffer = form.get_binary().getvalue()
    http = httplib.HTTPConnection(netloc)
    http.connect()
    http.putrequest("POST", url)
    http.putheader('Content-type',form.get_content_type())
    http.putheader('Content-length', str(len(form_buffer)))
    http.endheaders()
    http.send(form_buffer)
except socket.error, e:
    raise SystemExit(1)
r = http.getresponse()
if r.status == 200:
    return json.loads(r.read())
else:
    print('Upload failed (%s): %s' % (r.status, r.reason))
class MultiPartForm(object):
    """Accumulate the data to be used when posting a form."""
    def __init__(self):
        self.form_fields = []
        self.files = []
        self.boundary = mimetools.choose_boundary()
        return
    def get_content_type(self):
        return 'multipart/form-data; boundary=%s' % self.boundary
    def add_field(self, name, value):
        """Add a simple field to the form data."""
        self.form_fields.append((name, value))
        return
    def add_file(self, fieldname, filename, fileHandle, mimetype=None):
        """Add a file to be uploaded."""
        body = fileHandle.read()
        if mimetype is None:
            mimetype = mimetypes.guess_type(filename)[0] or 'application/octet-stream'
        self.files.append((fieldname, filename, mimetype, body))
        return
    def get_binary(self):
        """Return a binary buffer containing the form data, including attached files."""
        part_boundary = '--' + self.boundary
        binary = io.BytesIO()
        needsCLRF = False
        # Add the form fields
        for name, value in self.form_fields:
            if needsCLRF:
                binary.write('\r\n')
            needsCLRF = True
            block = [part_boundary,
                     'Content-Disposition: form-data; name="%s"' % name,
                     '',
                     value
                     ]
            binary.write('\r\n'.join(block))
        # Add the files to upload
        for field_name, filename, content_type, body in self.files:
            if needsCLRF:
                binary.write('\r\n')
            needsCLRF = True
            block = [part_boundary,
                     str('Content-Disposition: file; name="%s"; filename="%s"' % \
                         (field_name, filename)),
                     'Content-Type: %s' % content_type,
                     ''
                     ]
            binary.write('\r\n'.join(block))
            binary.write('\r\n')
            binary.write(body)
        # add closing boundary marker,
        binary.write('\r\n--' + self.boundary + '--\r\n')
        return binary
What a coincidence: 2 years and 6 months ago I created the project
https://pypi.python.org/pypi/MultipartPostHandler2, that fixes MultipartPostHandler for utf-8 systems. I also have done some minor improvements, you are welcome to test it :)
To answer the OP's question of why the original code didn't work, the handler passed in wasn't an instance of a class. The line
# MIME encode the POST payload
opener = urllib2.build_opener(MultipartPostHandler.MultipartPostHandler)
should read
opener = urllib2.build_opener(MultipartPostHandler.MultipartPostHandler())
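As a side check, the standard library documents build_opener as accepting either handler instances or handler classes (a class is instantiated without arguments). The sketch below tries both forms on Python 3, where urllib.request is the successor of urllib2. `TaggingHandler` and `has_tagging_handler` are hypothetical names used only for this illustration. If both forms register the handler, the missing parentheses may not be the whole story, and it may be worth checking in the MultipartPostHandler source how it decides between multipart and URL encoding.

```python
import urllib.request


class TaggingHandler(urllib.request.BaseHandler):
    """Hypothetical handler that only adds a header to HTTP requests."""

    def http_request(self, request):
        request.add_header("X-Tagged", "1")
        return request


def has_tagging_handler(opener):
    """Report whether the opener ended up with a TaggingHandler installed."""
    return any(isinstance(h, TaggingHandler) for h in opener.handlers)


# Pass the class itself, then an instance; no request is sent in either case.
opener_from_class = urllib.request.build_opener(TaggingHandler)
opener_from_instance = urllib.request.build_opener(TaggingHandler())

print(has_tagging_handler(opener_from_class))
print(has_tagging_handler(opener_from_instance))
```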
I have a flask server, and in a GET request, I want to download a file from another server then respond to the client. The code is like:
import requests
from flask import send_file
def get(self, report_id):
    url = origin + '/reports/sastScan/' + report_id
    r = requests.get(url, headers=headers, allow_redirects=True)
    return send_file(<something correct here>, attachment_filename='report.pdf')
Do I need to write the r.content into a file then give it to send_file, or I can use other functions rather than send_file, or I can use some other attributes from response?
Try this: wrap the response in a BytesIO and send it using send_file
from io import BytesIO
def get(self, report_id):
    url = origin + '/reports/sastScan/' + report_id
    r = requests.get(url, headers=headers, allow_redirects=True)
    # Reencode r.content with utf-8. Otherwise, you'd better set encoding
    # manually in mimetype
    file_obj = BytesIO(r.text.encode('utf-8'))
    # file_obj is not a real file obj. We have to
    # provide `filename` or `attachment_filename` explicitly
    return send_file(
        file_obj,
        attachment_filename='report.pdf',
        mimetype="application/pdf; charset=utf-8"
    )
Points:
BytesIO is not a real file and has no .name attribute, so we have to provide filename or attachment_filename explicitly in send_file().
It is preferable to provide an explicit charset in send_file().
If the response is not text, the following may be better:
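def get(self, report_id):
    url = origin + '/reports/sastScan/' + report_id
    r = requests.get(url, headers=headers, allow_redirects=True)
    # r.content holds the raw bytes, so nothing is decoded or re-encoded
    file_obj = BytesIO(r.content)
    return send_file(
        file_obj,
        attachment_filename='report.pdf',
        mimetype=f"application/pdf; charset={r.encoding}"
    )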
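To see why the r.content variant is the safer choice for a binary payload such as a PDF, here is a small standalone sketch. It needs neither Flask nor requests: `raw` stands in for r.content, and the latin-1 decode is an assumption standing in for whatever text encoding would be guessed for r.text.

```python
from io import BytesIO

# Stand-in for r.content: a PDF header followed by non-UTF-8 bytes.
raw = b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n"

# Roughly what r.text.encode('utf-8') does when the body is decoded
# with a guessed text encoding (latin-1 is assumed here) and re-encoded.
reencoded = raw.decode("latin-1").encode("utf-8")

# Wrapping the untouched bytes keeps the payload byte-for-byte intact.
file_obj = BytesIO(raw)

print(reencoded == raw)           # compares the round-tripped bytes with the original
print(hasattr(file_obj, "name"))  # checks whether BytesIO carries a filename
```

The second check is the reason send_file needs the download name passed explicitly.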
How can I send a file from my local computer to hipchat using a python API? I am currently using Hypchat but it is not well documented.
Here is my code so far:
import hypchat
hc = hypchat.HypChat("myKey")
room = hc.get_room('bigRoom')
I'm not sure how to proceed. I tried other methods such as this one but I keep getting the error:
[ERROR] HipChat file failed: '405 Client Error: Method Not Allowed for url: https://api.hipchat.com/v2/room/bigRoom/share/file'
This code allows me to send any file to a hipchat room:
# do this:
# pip install requests_toolbelt
from os import path
from sys import exit, stderr
from requests import post
from requests_toolbelt import MultipartEncoder
class MultipartRelatedEncoder(MultipartEncoder):
    """A multipart/related encoder"""
    @property
    def content_type(self):
        return str('multipart/related; boundary={0}'.format(self.boundary_value))
    def _iter_fields(self):
        # change content-disposition from form-data to attachment
        for field in super(MultipartRelatedEncoder, self)._iter_fields():
            content_type = field.headers['Content-Type']
            field.make_multipart(content_disposition = 'attachment',
                                 content_type = content_type)
            yield field
def hipchat_file(token, room, filepath, host='api.hipchat.com'):
    if not path.isfile(filepath):
        raise ValueError("File '{0}' does not exist".format(filepath))
    url = "https://{0}/v2/room/{1}/share/file".format(host, room)
    headers = {'Content-type': 'multipart/related; boundary=boundary123456'}
    headers['Authorization'] = "Bearer " + token
    m = MultipartRelatedEncoder(fields={'metadata' : (None, '', 'application/json; charset=UTF-8'),
                                        'file' : (path.basename(filepath), open(filepath, 'rb'), 'text/csv')})
    # Use the encoder's own content type so the boundary matches the body
    headers['Content-type'] = m.content_type
    r = post(url, data=m, headers=headers)
if __name__ == '__main__':
    my_token = <my token>
    my_room = <room name>
    my_file = <filepath>
    try:
        hipchat_file(my_token, my_room, my_file)
    except Exception as e:
        msg = "[ERROR] HipChat file failed: '{0}'".format(e)
        print(msg, file=stderr)
        exit(1)
Shout out to @Martijn Pieters
I'm building a website + backend with the Flask framework in which I use Flask-OAuthlib to authenticate with Google. After authentication, the backend needs to regularly scan the user's Gmail. So currently users can authenticate my app and I store the access_token and the refresh_token. The access_token expires after one hour, so within that one hour I can get the userinfo like so:
google = oauthManager.remote_app(
    'google',
    consumer_key='xxxxxxxxx.apps.googleusercontent.com',
    consumer_secret='xxxxxxxxx',
    request_token_params={
        'scope': ['https://www.googleapis.com/auth/userinfo.email', 'https://www.googleapis.com/auth/gmail.readonly'],
        'access_type': 'offline'
    },
    base_url='https://www.googleapis.com/oauth2/v1/',
    request_token_url=None,
    access_token_method='POST',
    access_token_url='https://accounts.google.com/o/oauth2/token',
    authorize_url='https://accounts.google.com/o/oauth2/auth'
)
token = (the_stored_access_token, '')
userinfoObj = google.get('userinfo', token=token).data
userinfoObj['id'] # Prints out my google id
Once the hour is over, I need to use the refresh_token (which I've got stored in my database) to request a new access_token. I tried replacing the_stored_access_token with the_stored_refresh_token, but this simply gives me an Invalid Credentials-error.
In this github issue I read the following:
regardless of how you obtained the access token / refresh token (whether through an authorization code grant or resource owner password credentials), you exchange them the same way, by passing the refresh token as refresh_token and grant_type set to 'refresh_token'.
From this I understood I had to create a remote app like so:
google = oauthManager.remote_app(
    'google',
    # also the consumer_key, secret, request_token_params, etc..
    grant_type='refresh_token',
    refresh_token=u'1/xK_ZIeFn9quwvk4t5VRtE2oYe5yxkRDbP9BQ99NcJT0'
)
But this leads to a TypeError: __init__() got an unexpected keyword argument 'refresh_token'. So from here I'm kinda lost.
Does anybody know how I can use the refresh_token to get a new access_token? All tips are welcome!
This is how I get a new access_token for google:
from urllib2 import Request, urlopen, URLError
from webapp2_extras import json
import mimetools
BOUNDARY = mimetools.choose_boundary()
def refresh_token():
    url = google_config['access_token_url']
    headers = [
        ("grant_type", "refresh_token"),
        ("client_id", <client_id>),
        ("client_secret", <client_secret>),
        ("refresh_token", <refresh_token>),
    ]
    files = []
    edata = EncodeMultiPart(headers, files, file_type='text/plain')
    headers = {}
    request = Request(url, headers=headers)
    request.add_data(edata)
    request.add_header('Content-Length', str(len(edata)))
    request.add_header('Content-Type', 'multipart/form-data;boundary=%s' % BOUNDARY)
    try:
        response = urlopen(request).read()
        response = json.decode(response)
    except URLError, e:
        ...
EncodeMultipart function is taken from here:
https://developers.google.com/cloud-print/docs/pythonCode
Be sure to use the same BOUNDARY
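For comparison, the OAuth 2.0 specification (RFC 6749, section 6) describes the refresh request as an ordinary form-encoded POST, so multipart encoding may not be needed at all. The following is a minimal sketch for Python 3 that only builds the request and does not send it. The function name `build_refresh_request` and the placeholder credentials are assumptions; the token URL is the one from the question.

```python
import urllib.parse
import urllib.request


def build_refresh_request(token_url, client_id, client_secret, refresh_token):
    """Prepare (but do not send) an OAuth 2.0 refresh-token request."""
    form = {
        "grant_type": "refresh_token",
        "refresh_token": refresh_token,
        "client_id": client_id,
        "client_secret": client_secret,
    }
    # urlencode produces the application/x-www-form-urlencoded body.
    body = urllib.parse.urlencode(form).encode("ascii")
    return urllib.request.Request(
        token_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Placeholder values; replace them with the stored credentials.
request = build_refresh_request(
    "https://accounts.google.com/o/oauth2/token",
    "CLIENT_ID", "CLIENT_SECRET", "REFRESH_TOKEN",
)
# Sending it would look like: urllib.request.urlopen(request).read()
```

The JSON response to such a request should carry the new access_token, which can then replace the stored one.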
Looking at the source code for OAuthRemoteApp, the constructor does not take a keyword argument called refresh_token. It does, however, take an argument called access_token_params, which is an optional dictionary of parameters to forward to the access token url.
Since the url is the same but the grant type is different, I imagine a call like this should work:
google = oauthManager.remote_app(
    'google',
    # also the consumer_key, secret, request_token_params, etc..
    grant_type='refresh_token',
    access_token_params = {
        'refresh_token': u'1/xK_ZIeFn9quwvk4t5VRtE2oYe5yxkRDbP9BQ99NcJT0'
    }
)
flask-oauthlib.contrib contains a parameter named auto_refresh_url / refresh_token_url in the remote_app which does exactly what you wanted to do. An example of how to use it looks like this:
app= oauth.remote_app(
    [...]
    refresh_token_url='https://www.douban.com/service/auth2/token',
    authorization_url='https://www.douban.com/service/auth2/auth',
    [...]
)
However I did not manage to get it running this way. Nevertheless this is possible without the contrib package. My solution was to catch 401 API calls and redirect to a refresh page if a refresh_token is available.
My code for the refresh endpoint looks as follows:
@app.route('/refresh/')
def refresh():
    data = {}
    data['grant_type'] = 'refresh_token'
    data['refresh_token'] = session['refresh_token'][0]
    data['client_id'] = CLIENT_ID
    data['client_secret'] = CLIENT_SECRET
    # make custom POST request to get the new token pair
    resp = remote.post(remote.access_token_url, data=data)
    # checks the response status and parses the new tokens
    # if refresh failed will redirect to login
    parse_authorized_response(resp)
    return redirect('/')
def parse_authorized_response(resp):
    if resp is None:
        return 'Access denied: reason=%s error=%s' % (
            request.args['error_reason'],
            request.args['error_description']
        )
    if isinstance(resp, dict):
        session['access_token'] = (resp['access_token'], '')
        session['refresh_token'] = (resp['refresh_token'], '')
    elif isinstance(resp, OAuthResponse):
        print(resp.status)
        if resp.status != 200:
            session['access_token'] = None
            session['refresh_token'] = None
            return redirect(url_for('login'))
        else:
            session['access_token'] = (resp.data['access_token'], '')
            session['refresh_token'] = (resp.data['refresh_token'], '')
    else:
        raise Exception()
    return redirect('/')
Hope this will help. The code can be enhanced of course and there surely is a more elegant way than catching 401ers but it's a start ;)
One other thing: Do not store the tokens in the Flask Session Cookie. Rather use Server Side Sessions from "Flask Session" which I did in my code!
This is how I got my new access token.
from urllib2 import Request, urlopen, URLError
import json
import mimetools
BOUNDARY = mimetools.choose_boundary()
CRLF = '\r\n'
def EncodeMultiPart(fields, files, file_type='application/xml'):
    """Encodes list of parameters and files for HTTP multipart format.
    Args:
      fields: list of tuples containing name and value of parameters.
      files: list of tuples containing param name, filename, and file contents.
      file_type: string if file type different than application/xml.
    Returns:
      A string to be sent as data for the HTTP post request.
    """
    lines = []
    for (key, value) in fields:
        lines.append('--' + BOUNDARY)
        lines.append('Content-Disposition: form-data; name="%s"' % key)
        lines.append('') # blank line
        lines.append(value)
    for (key, filename, value) in files:
        lines.append('--' + BOUNDARY)
        lines.append(
            'Content-Disposition: form-data; name="%s"; filename="%s"'
            % (key, filename))
        lines.append('Content-Type: %s' % file_type)
        lines.append('') # blank line
        lines.append(value)
    lines.append('--' + BOUNDARY + '--')
    lines.append('') # blank line
    return CRLF.join(lines)
def refresh_token():
    url = "https://oauth2.googleapis.com/token"
    headers = [
        ("grant_type", "refresh_token"),
        ("client_id", "xxxxxx"),
        ("client_secret", "xxxxxx"),
        ("refresh_token", "xxxxx"),
    ]
    files = []
    edata = EncodeMultiPart(headers, files, file_type='text/plain')
    #print(EncodeMultiPart(headers, files, file_type='text/plain'))
    headers = {}
    request = Request(url, headers=headers)
    request.add_data(edata)
    request.add_header('Content-Length', str(len(edata)))
    request.add_header('Content-Type', 'multipart/form-data;boundary=%s' % BOUNDARY)
    response = urlopen(request).read()
    print(response)
refresh_token()
#response = json.decode(response)
#print(refresh_token())
With your refresh_token, you can get a new access_token like:
from google.oauth2.credentials import Credentials
from google.auth.transport import requests
creds = {"refresh_token": "<goes here>",
"token_uri": "https://accounts.google.com/o/oauth2/token",
"client_id": "<YOUR_CLIENT_ID>.apps.googleusercontent.com",
"client_secret": "<goes here>",
"scopes": ["https://www.googleapis.com/auth/userinfo.email"]}
cred = Credentials.from_authorized_user_info(creds)
cred.refresh(requests.Request())
my_new_access_token = cred.token
I'm relatively new to python and computers in general. Currently I'm trying to post data to a website, namely http://www.camp.bicnirrh.res.in/featcalc/, and select four checkboxes after uploading a file which contains the data to be analyzed.
So far, this is what I've tried.
def encode_multipart_formdata(fields, files, data):
    """
    fields is a sequence of (name, value) elements for regular form fields.
    files is a sequence of (name, filename, value) elements for data to be uploaded as files
    Return (content_type, body) ready for httplib.HTTP instance
    """
    BOUNDARY = '-----------------------------7de18336272e32'
    CRLF = '\r\n'
    L = []
    L.append('--' + BOUNDARY)
    L.append('Content-Disposition: form-data; name="seq"')
    L.append('')
    L.append('--' + BOUNDARY)
    L.append('Content-Disposition: form-data; name="%s"; filename="%s"' % (files[0], files[1]))
    L.append('Content-Type: text/plain')
    L.append('')
    L.append(data)
    L.append('')
    for (key, value) in fields:
        L.append('--' + BOUNDARY)
        L.append('Content-Disposition: form-data; name="%s"' % key)
        L.append('')
        L.append(value)
    L.append('--' + BOUNDARY + '--')
    L.append('')
    body = CRLF.join(L)
    content_type = 'multipart/form-data; boundary=%s' % BOUNDARY
    return content_type, body
For this code, data is the file that's been opened, read, and closed, files is the filename and the full filename - ex ('Practice', 'Practice.txt')
This returned what I thought was a good format. But when I tried to post the encoded data using urllib2 (Request and urlopen), I got something that looked like the source code of the results page, but it didn't have any of the data that I needed (i.e. no values). I tried this as well just to see if it would work.
files = {'file': ('Practice.txt', open('Practice.txt', 'rb'))}
r = requests.post(url, files=files)
r.text
The problem is, I think, the page requires that you select a checkbox, and I have no idea how to do that in a post request. I was thinking about trying to use a cgi script next, but I'm literally running out of ideas.
Any help would be greatly appreciated! Thank you!
The requests module will be most useful here, you need to set data and files
files = {'file': ('Practice.txt', open('Practice.txt', 'rb'))}
data = {'amino':'amino', 'aliphatic': 'aliphatic'}
r = requests.post(url, files=files, data=data)
r.text
I always used to save the files I wanted to make downloadable in django. In my project I used that code for example:
def keyDownload(request, benutzername):
    benutzernameKey = benutzername +".key"
    fsock = open('/var/www/openvpn/examples/easy-rsa/2.0/keys/'+benutzernameKey, 'r')
    response = HttpResponse(fsock, mimetype='application/pgp-keys')
    response['Content-Disposition'] = "attachment; filename = %s " % (benutzernameKey)
    return response
I got a pdf file which I get through urllib:
url = "http://www.urltomypdf.com"
sock = urllib2.urlopen(url)
with open('report.pdf', 'wb') as f:
    while True:
        content = sock.read()
        if not content: break
        f.write(content)
At the moment I am saving the pdf in a file called report.pdf. But my aim is to render it directly to my template with a function in django. Is that possible ?
With the introduction of Django 1.5, the StreamingHttpResponse class has been made available to stream a response based on an iterator. Your view and iterator could look like this:
def stream_pdf(url, chunk_size=8192):
    sock = urllib2.urlopen(url)
    while True:
        content = sock.read(chunk_size)
        if not content: break
        yield content
def external_pdf_view(request, *args, **kwargs):
    url = <url> # specify the url here
    response = StreamingHttpResponse(stream_pdf(url), content_type="application/pdf")
    response['Content-Disposition'] = "filename='%s'" % <filename> #specify the filename here
    return response
Pre Django 1.5, it is still possible to stream a response by passing an iterator to HttpResponse, but there are several caveats. First, you need to use the @condition(etag_func=None) decorator on your view function. Second, some middleware can prevent a properly streamed response, so you'll need to bypass that middleware. Finally, a chunk of content only gets sent once it reaches a length of 1024 bytes, so chunk_size should be over 1024.
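The chunked-read pattern behind stream_pdf can be tried out without Django or a network connection. This is a minimal sketch for Python 3 in which a BytesIO stands in for the socket returned by urlopen; `iter_chunks` is a hypothetical helper name, not a Django or urllib function.

```python
from io import BytesIO


def iter_chunks(fileobj, chunk_size=8192):
    """Yield successive chunks from a file-like object until it is exhausted."""
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        yield chunk


# BytesIO stands in for the socket returned by urlopen().
source = BytesIO(b"x" * 20000)
chunks = list(iter_chunks(source))
print([len(c) for c in chunks])
```

A generator like this is the kind of iterator a streaming response can consume, so the whole file never has to be held in memory at once.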