We're receiving some POST data of xml + arbitrary binary files (like images and audio) from a device that only gives us multipart/mixed encoding.
I've set up a CherryPy upload/POST handler on our receiving end, and I've managed to make it accept an arbitrary number of parameters using multipart/form-data. However, when we send the multipart/mixed data, nothing gets processed.
@cherrypy.expose
def upload(self, *args, **kwargs):
    """upload adapted from cherrypy tutorials
    We use our variation of cgi.FieldStorage to parse the MIME
    encoded HTML form data containing the file."""
    print args
    print kwargs
    cherrypy.response.timeout = 1300
    # Lower-case all header names so lookups are case-insensitive.
    lcHDRS = {}
    for key, val in cherrypy.request.headers.iteritems():
        lcHDRS[key.lower()] = val
    incomingBytes = int(lcHDRS['content-length'])
    print cherrypy.request.rfile
    # etc. etc.
So, when submitting multipart/form-data, args and kwargs are well defined.
args are the form field names, and kwargs is a dict mapping each field name to its value.
When I submit multipart/mixed, args and kwargs are empty, and I just have cherrypy.request.rfile as the raw POST information.
My question is, does cherrypy have a built in handler to handle multipart/mixed and chunked encoding for POST? Or will I need to override the cherrypy.tools.process_request_body and roll my own decoder?
It seems like the builtin wsgi server with cherrypy handles this as part of the HTTP/1.1 spec, but I could not seem to find documentation in cherrypy in accessing this functionality.
To clarify:
I'm using the latest version of CherryPy (3.1.1 or so).
Accepting a default form just involves declaring parameters on the upload function.
For the multipart/form-data case, I've been calling curl -F param1=@file1.jpg -F param2=sometext -F param3=@file3.wav http://destination:port/upload
In that example, I get:
args = ['param1', 'param2', 'param3']
kwargs = {'param1':CString<>, 'param2': 'sometext', 'param3':CString<>}
When trying to submit the multipart/mixed, I tried looking at the request.body, but kept on getting None for that, regardless of setting the body processing.
The input we're receiving looks like this:
user-agent:UNTRUSTED/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1
content-language:en-US
content-length:565719
mime-version:1.0
content-type:multipart/mixed; boundary='newdivider'
host:192.168.1.1:8180
transfer-encoding:chunked
--newdivider
Content-type: text/xml
<?xml version='1.0' ?><data><Stuff>....
etc...etc...
--newdivider
Content-type: image/jpeg
Content-ID: file://localhost/root1/photos/Garden.jpg
Content-transfer-encoding: binary
<binary data>
I have a sneaking suspicion that multipart/mixed is the problem, and that's why CherryPy is only giving me the rfile. Our goal is to have CherryPy split the body into its parts with minimal processing on the receiving side (i.e. let CherryPy do its magic). If that requires us to be stricter on the sending side and use a content type that CherryPy likes, then so be it. What are the accepted formats? Is it only multipart/form-data?
My bad. Whenever the Content-Type is of type "multipart/*", then CP tries to stick the contents into request.params (if any other Content-Type, it goes into request.body).
Unfortunately, CP has assumed that any multipart message is form-data, and made no provision for other subtypes. I've just fixed this in trunk, and it should be released in 3.1.2. Sorry for the inconvenience. In the short term, you can try applying the changeset locally; see http://www.cherrypy.org/ticket/890.
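Until that fix is available, one short-term workaround is to read the raw body and split it yourself. Below is a minimal Python 3 sketch using only the standard-library email package; the helper name split_multipart_mixed is hypothetical, and it assumes the server has already removed the chunked transfer encoding and that you pass in the request's Content-Type header unchanged. One caveat: if the device really sends boundary='newdivider' with single quotes, a strict MIME parser treats the quotes as part of the boundary, so the sender (or the header) should use double quotes.

```python
from email.parser import BytesParser
from email.policy import default


def split_multipart_mixed(content_type, body):
    """Split a multipart/mixed body into (content_type, headers, payload) tuples.

    content_type: the request's Content-Type value, including the boundary.
    body: the raw, already de-chunked request body as bytes.
    """
    # The email parser expects a header block, so prepend a synthetic one.
    raw = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n" + body
    msg = BytesParser(policy=default).parsebytes(raw)
    parts = []
    for part in msg.iter_parts():
        # decode=True returns bytes; 'binary' parts come back unchanged.
        payload = part.get_payload(decode=True)
        parts.append((part.get_content_type(), dict(part.items()), payload))
    return parts


if __name__ == "__main__":
    body = (b"--newdivider\r\n"
            b"Content-Type: text/xml\r\n\r\n"
            b"<data><Stuff/></data>\r\n"
            b"--newdivider\r\n"
            b"Content-Type: image/jpeg\r\n"
            b"Content-Transfer-Encoding: binary\r\n\r\n"
            b"JPEGBYTES\r\n"
            b"--newdivider--\r\n")
    for ctype, headers, payload in split_multipart_mixed(
            'multipart/mixed; boundary="newdivider"', body):
        print(ctype, len(payload))
```

In a CherryPy handler, the body argument would come from reading cherrypy.request.rfile, as in the question.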
Related
I'm trying to upload a PDF as an attachment to a Trello card using python-requests. I've been unable to get the request in the function below to return anything other than 400: Error parsing body despite significant tweaks (detailed below).
I should note that I'm able to create cards and add URL attachments to them (neither of which require a file upload) without any problems.
Here's the code that handles the POST of the file:
def post_pdf(session, design, card_id):
    attachment = {
        "name": design["campaign_title"] + " - Combined PDF",
        "mimeType": "application/pdf"
    }
    pdf_post = session.post(
        url="https://api.trello.com/1/cards/" + card_id + "/attachments",
        files={"file": open("combined_pdf.pdf", "rb")},
        data=attachment
    )
The authentication key and token are set Session params when the session was created, so they're not added here.
Also, in the actual code, the POST is handled by a wrapper function that adds some boilerplate error-checking and rate limiting to the request, as well as more-verbose error dumps when a request fails, but I've confirmed (in the above example) that the same error persists without the wrapper.
Adjustments I've tried
Substituting data = attachment with json = attachment
Substituting data = attachment with params = attachment
Omitting attachment completely and POSTing the file with no associated data
Adding stream = True to the request parameters (this doesn't seem to matter for uploads, but I figured it couldn't hurt to try)
Encoding the file as base64 (this encoding has been required elsewhere; I was grasping at straws)
Encoding the file as base64, combined with the above tweaks to data / json / params
Note: the PDF file is potentially a source of the problem. It's generated by converting several images to PDF format and then concatenating them with pdfunite, so I could well have made mistakes in its creation that are causing Trello to reject the file. What seems to confirm this is that Googling for Trello "Error parsing body" returns two hits, only one of which deals with Trello, and neither of which is useful. This leads me to think that this is a particularly odd / rare error message, which suggests I've made some kind of serious error encoding the file.
However, the PDF file opens properly on my (and my coworkers') systems without any error messages, artifacts, or other strange behavior. More importantly, trying this with other "known good" PDFs also fails, with the same error code. Because the file's contents fall within the bounds of "company property / information", I'd like to avoid posting it (and / or the raw request body), but I'll do so if there's agreement that it's causing the problem.
I found the solution: the Content-Type header was set incorrectly due to a session-wide setting ( Session.headers.update({"Content-Type": "application/json"}) ) overriding the multipart/form-data header when the upload request was sent. This caused Trello to reject the body. I solved the problem by removing the session-level header, which allowed requests to modify the content type for each request.
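The effect can be checked without touching the network by preparing the request instead of sending it. The sketch below assumes the third-party requests package; the helper prepared_content_type and the CARD_ID URL segment are placeholders for illustration. It shows the session-wide header winning, plus two fixes: not setting Content-Type on the session at all, or dropping it for a single request by passing it as None (requests removes session-level keys whose request-level value is None).

```python
import requests


def prepared_content_type(session, files, data, headers=None):
    """Return the Content-Type `session` would send, without any network I/O."""
    req = requests.Request(
        "POST",
        "https://api.trello.com/1/cards/CARD_ID/attachments",  # placeholder card ID
        files=files,
        data=data,
        headers=headers,
    )
    return session.prepare_request(req).headers["Content-Type"]


files = {"file": ("combined_pdf.pdf", b"%PDF-1.4 dummy", "application/pdf")}
data = {"name": "Example - Combined PDF", "mimeType": "application/pdf"}

# Broken: the session-wide JSON header wins over the multipart header.
bad = requests.Session()
bad.headers.update({"Content-Type": "application/json"})
print(prepared_content_type(bad, files, data))  # application/json

# Fix 1: no session-level Content-Type, so requests adds
# multipart/form-data together with the boundary it generated.
good = requests.Session()
print(prepared_content_type(good, files, data))

# Fix 2: keep the session header but drop it for this one request.
print(prepared_content_type(bad, files, data, headers={"Content-Type": None}))
```

Fix 2 is handy if the rest of the code relies on the session sending JSON by default.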
I am developing an app which prompts the user to upload a file which is then available for download.
Here is the download handler:
class ViewPrezentacje(blobstore_handlers.BlobstoreDownloadHandler, BaseHandler):
    def get(self, blob_key):
        blob_key = str(urllib.unquote(blob_key))
        blob_info = blobstore.BlobInfo.get(blob_key)
        self.send_blob(blob_info, save_as=urllib.quote(blob_info.filename.encode('utf-8')))
The file is downloaded with the correct file name (i.e. non-ASCII characters are displayed properly) in Chrome or IE, but in Firefox it is saved as a string of the form "%83%86%E3..."
Is there any way to make it work properly in Firefox?
Sending filenames with non-ASCII characters in attachments is fraught with difficulty, as the original specification was broken and browser behaviours have varied.
You shouldn't be %-encoding (urllib.quote) the filename; Firefox is right to offer it as literal % sequences as a result. IE's behaviour of %-decoding sequences in the filename is incorrect, even though Chrome eventually went on to copy it.
Ultimately the right way to send non-ASCII filenames is to use the mechanism specified in RFC6266, which ends up with a header that looks like this:
Content-Disposition: attachment; filename*=UTF-8''foo-%c3%a4-%e2%82%ac.html
However:
older browsers such as IE8 don't support it so if you care you should pass something as an ASCII-only filename= as well;
BlobstoreDownloadHandler doesn't know about this mechanism.
The bit of BlobstoreDownloadHandler that needs fixing is this inner function in send_blob:
def send_attachment(filename):
    if isinstance(filename, unicode):
        filename = filename.encode('utf-8')
    self.response.headers['Content-Disposition'] = (
        _CONTENT_DISPOSITION_FORMAT % filename)
which really wants to do:
rfc6266_filename = "UTF-8''" + urllib.quote(filename.encode('utf-8'))
fallback_filename = filename.encode('us-ascii', 'ignore')
# ASCII fallback goes in filename=, the RFC 6266 form in filename*=
self.response.headers['Content-Disposition'] = 'attachment; filename="%s"; filename*=%s' % (fallback_filename, rfc6266_filename)
but unfortunately being an inner function makes it annoying to try to fix in a subclass. You could:
override the whole of send_blob to replace the send_attachment inner function
or maybe you can write self.response.headers['Content-Disposition'] like this after calling send_blob? I'm not sure how GAE handles this
or, probably most practical of all, give up on having Unicode filenames for now until GAE fixes it
I am building a REST API on Google App Engine (not using Endpoints) that will allow users to upload a CSV or tab-delimited file and search for potential duplicates. Since it's an API, I cannot use <form>s or the BlobStore's upload_url. I also cannot rely on having a single web client that will call this API. Instead, ideally, users will send the file in the body of the request.
My problem is, when I try to read the content of a tab-delimited file, I find that all newline characters have been removed, so there is no way of splitting the content into rows.
If I check the content of the file directly on the Python interpreter, I see that tabs and newlines are there (output is truncated in the example)
>>> with open('./data/occ_sample.txt') as o:
...     o.read()
...
'id\ttype\tmodified\tlanguage\trights\n123456\tPhysicalObject\t2015-11-11 11:50:59.0\ten\thttp://creativecommons.org/licenses/by-nc/3.0\n...'
The RequestHandler logs the content of the request body:
import logging

class ReportApi(webapp2.RequestHandler):
    def post(self):
        logging.info(self.request.body)
        ...
So when I call the API running in the dev_appserver via curl
curl -X POST -d @data/occ_sample.txt http://localhost:8080/api/v0/report
This shows up in the logs:
id type modified language rights123456 PhysicalObject 2015-11-11 11:50:59.0 en http://creativecommons.org/licenses/by-nc/3.0
As you can see, there is nothing between the last value of the headers and the first record (rights and 123456 respectively) and the same happens with the last value of each record and the first one of the next.
Am I missing something obvious here? I have tried loading the data with self.request.body, self.request.body_file and self.request.POST, and none seem to work. I also tried applying the Content-Type values text/csv, text/plain, application/csv in the request headers, with no success. Should I add a different Content-Type?
You are using the wrong curl command-line option to send your file data, and it is this option that is stripping the newlines.
The -d option sends an application/x-www-form-urlencoded request, and when it reads the data from a file it strips newlines. From the curl manpage:
-d, --data <data>
[...]
If you start the data with the letter @, the rest should be a file name to read the data from, or - if you want curl to read the data from stdin. Multiple files can also be specified. Posting data from a file named 'foobar' would thus be done with --data @foobar. When --data is told to read from a file like that, carriage returns and newlines will be stripped out.
That last sentence is the key point.
Use the --data-binary option instead:
--data-binary <data>
(HTTP) This posts data exactly as specified with no extra processing whatsoever.
If you start the data with the letter @, the rest should be a filename. Data is posted in a similar manner as --data-ascii does, except that newlines and carriage returns are preserved and conversions are never done.
You may want to include a Content-Type header in that case (for example curl -X POST -H "Content-Type: text/tab-separated-values" --data-binary @data/occ_sample.txt http://localhost:8080/api/v0/report); whether that matters depends on whether your handler inspects that header.
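Once the newlines survive, the handler can split the body into rows. A minimal Python 3 sketch using the standard csv module; parse_tsv_body is a hypothetical helper, and it assumes the body is UTF-8 and that fields never contain quote characters that need special handling.

```python
import csv
import io


def parse_tsv_body(body, encoding="utf-8"):
    """Split a raw tab-delimited POST body into (header, records).

    Assumes `body` is the raw bytes of the request (for example webapp2's
    self.request.body) sent with curl --data-binary, so newlines survive.
    """
    # newline="" leaves \r\n and \n untouched for the csv module to handle.
    text = io.StringIO(body.decode(encoding), newline="")
    rows = list(csv.reader(text, delimiter="\t"))
    if not rows:
        return [], []
    return rows[0], rows[1:]


if __name__ == "__main__":
    header, records = parse_tsv_body(
        b"id\ttype\trights\n123456\tPhysicalObject\thttp://example.org/x\n")
    print(header)
    print(records)
```

If the newlines have been stripped (the curl -d case), everything collapses into a single header row with no records, which matches the log output in the question.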
I have a web service that returns JSON responses when successful. Unfortunately, when I try to test this service via multi-mechanize, I get an error: "not viewing HTML". Obviously it's not viewing HTML; it's getting content clearly marked as JSON. How do I get mechanize to ignore this error and accept the JSON it's getting back?
It turns out mechanize isn't set up to accept JSON responses out of the box. For a quick and dirty solution to this, update mechanize's _headersutil.py file (check /usr/local/lib/python2.7/dist-packages/mechanize).
In the is_html() method, change the line:
html_types = ["text/html"]
to read:
html_types = ["text/html", "application/json"]
I have a problem with HTTP headers: they're encoded in ASCII, and I want to provide a view for downloading files whose names can be non-ASCII.
response['Content-Disposition'] = 'attachment; filename="%s"' % (vo.filename.encode("ASCII","replace"), )
I don't want to use static file serving, because it has the same issue with non-ASCII file names, and in that case there would also be a problem with the file system and its file name encoding (I don't know the target OS).
I've already tried urllib.quote(), but it raises KeyError exception.
Possibly I'm doing something wrong but maybe it's impossible.
This is a FAQ.
There is no interoperable way to do this. Some browsers implement proprietary extensions (IE, Chrome), others implement RFC 2231 (Firefox, Opera).
See test cases at http://greenbytes.de/tech/tc2231/.
Update: as of November 2012, all current desktop browsers support the encoding defined in RFC 6266 and RFC 5987 (Safari >= 6, IE >= 9, Chrome, Firefox, Opera, Konqueror).
Don't send a filename in Content-Disposition. There is no way to make non-ASCII header parameters work cross-browser(*).
Instead, send just “Content-Disposition: attachment”, and leave the filename as a URL-encoded UTF-8 string in the trailing (PATH_INFO) part of your URL, for the browser to pick up and use by default. UTF-8 URLs are handled much more reliably by browsers than anything to do with Content-Disposition.
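Building such a URL is a one-liner with the standard library. A minimal Python 3 sketch; the /download/<id>/<filename> route and the download_url helper are hypothetical, and the handler behind that route is assumed to ignore the trailing filename segment and serve the file by ID with a bare "Content-Disposition: attachment".

```python
from urllib.parse import quote


def download_url(file_id, filename):
    """Put the real filename, as percent-encoded UTF-8, at the end of the path.

    The (hypothetical) handler for /download/<id>/<name> looks the file up by
    id only and sends just 'Content-Disposition: attachment', so the browser
    falls back to the last path segment as the default file name.
    """
    # quote() encodes str as UTF-8 by default; safe="" also escapes '/'.
    return "/download/%s/%s" % (quote(str(file_id), safe=""), quote(filename, safe=""))


if __name__ == "__main__":
    print(download_url(42, "Résumé final.pdf"))
```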
(*: actually, there's not even a current standard that says how it should be done as the relationships between RFCs 2616, 2231 and 2047 are pretty dysfunctional, something that Julian is trying to get cleared up at a spec level. Consistent browser support is in the distant future.)
Note that in 2011, RFC 6266 (especially Appendix D) weighed in on this issue and has specific recommendations to follow.
Namely, you can issue a filename with only ASCII characters, followed by filename* with a RFC 5987-formatted filename for those agents that understand it.
Typically this will look like filename="my-resume.pdf"; filename*=UTF-8''My%20R%C3%A9sum%C3%A9.pdf, where the Unicode filename ("My Résumé.pdf") is encoded into UTF-8 and then percent-encoded (note, do NOT use + for spaces).
Please do actually read RFC 6266 and RFC 5987 (or use a robust and tested library that abstracts this for you), as my summary here is lacking in important detail.
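For illustration only, here is a minimal stdlib-only Python 3 sketch of the pattern described above (ASCII filename= first, then an RFC 5987 filename*= when the name is not plain ASCII). The helper name content_disposition is hypothetical, and replacing quotes, backslashes and control characters in the fallback with "_" is this sketch's own choice, not something the RFCs mandate.

```python
import unicodedata
from urllib.parse import quote


def content_disposition(filename, disposition="attachment"):
    """Build a Content-Disposition value with an ASCII fallback and filename*.

    Follows the shape recommended in RFC 6266 Appendix D: filename="ascii"
    for older clients, plus filename*=UTF-8''... when the real name differs.
    """
    # ASCII fallback: strip accents (é -> e), then drop anything non-ASCII.
    fallback = unicodedata.normalize("NFKD", filename)
    fallback = fallback.encode("ascii", "ignore").decode("ascii")
    # Replace characters that would break the quoted-string or the header.
    fallback = "".join(
        "_" if c in '"\\' or c < " " or c == "\x7f" else c for c in fallback
    )
    value = '%s; filename="%s"' % (disposition, fallback)
    if fallback != filename:
        # Percent-encode the UTF-8 bytes; spaces become %20, never '+'.
        value += "; filename*=UTF-8''%s" % quote(filename, safe="")
    return value


if __name__ == "__main__":
    print(content_disposition("my-resume.pdf"))
    print(content_disposition("My Résumé.pdf"))
```

Typical use in a Django view would be response['Content-Disposition'] = content_disposition(name).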
Starting with Django 2.1 (see issue #16470), you can use FileResponse, which will correctly set the Content-Disposition header for attachments. Starting with Django 3.0 (issue #30196) it will also set it correctly for inline files.
For example, to return a file named my_img.jpg with MIME type image/jpeg as an HTTP response:
from django.http import FileResponse

def my_view(request):
    return FileResponse(open("my_img.jpg", "rb"), as_attachment=True, content_type="image/jpeg")
Or, if you can't use FileResponse, you can use the relevant part from FileResponse's source to set the Content-Disposition header yourself. Here's what that source currently looks like:
from urllib.parse import quote

disposition = 'attachment' if as_attachment else 'inline'
try:
    filename.encode('ascii')
    file_expr = 'filename="{}"'.format(filename)
except UnicodeEncodeError:
    file_expr = "filename*=utf-8''{}".format(quote(filename))
response.headers['Content-Disposition'] = '{}; {}'.format(disposition, file_expr)
I can say that I've had success using the newer (RFC 5987) format of specifying a header encoded with the e-mail form (RFC 2231). I came up with the following solution which is based on code from the django-sendfile project.
import unicodedata
from django.utils.http import urlquote
def rfc5987_content_disposition(file_name):
    ascii_name = unicodedata.normalize('NFKD', file_name).encode('ascii', 'ignore').decode()
    header = 'attachment; filename="{}"'.format(ascii_name)
    if ascii_name != file_name:
        quoted_name = urlquote(file_name)
        header += '; filename*=UTF-8\'\'{}'.format(quoted_name)
    return header

# e.g.
# response['Content-Disposition'] = rfc5987_content_disposition(file_name)
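I have only tested my code on Python 3.4 with Django 1.8, so the similar solution in django-sendfile may suit you better. (Note that django.utils.http.urlquote was deprecated in Django 3.0 and removed in 4.0; urllib.parse.quote can be used in its place.)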
There's a long standing ticket in Django's tracker which acknowledges this but no patches have yet been proposed afaict. So unfortunately this is as close to using a robust tested library as I could find, please let me know if there's a better solution.
The escape_uri_path function from Django is the solution that worked for me.
See the Django documentation for escape_uri_path to check which RFC standards it currently follows.
from django.utils.encoding import escape_uri_path
file = "response.zip"
response = HttpResponse(content_type='application/zip')
response['Content-Disposition'] = f"attachment; filename*=utf-8''{escape_uri_path(file)}"
A hack:
if (Request.UserAgent.Contains("IE"))
{
    // IE will accept URL encoding, but spaces don't need to be encoded, and since they're so common..
    filename = filename.Replace("%", "%25").Replace(";", "%3B").Replace("#", "%23").Replace("&", "%26");
}