Python server cgi.FieldStorage parsing multipart/form-data - python

so I have been writing a simple web server in Python, and right now I'm trying to handle multipart/form-data POST requests. I can already handle application/x-www-form-urlencoded POST requests, but the same code won't work for the multipart. If it looks like I am misunderstanding anything, please call me out, even if it's something minor. Also if you guys have any advice on making my code better please let me know as well :) Thanks!
When the request comes in, I first parse it, and split it into a dictionary of headers and a string for the body of the request. I use those to then construct a FieldStorage form, which I can then treat like a dictionary to pull the data out:
requestInfo = ''
while requestInfo[-4:] != '\r\n\r\n':
requestInfo += conn.recv(1)
requestSplit = requestInfo.split('\r\n')[0].split(' ')
requestType = requestSplit[0]
url = urlparse.urlparse(requestSplit[1])
path = url[2] # Grab Path
if requestType == "POST":
headers, body = parse_post(conn, requestInfo)
print "!!!Request!!! " + requestInfo
print "!!!Body!!! " + body
form = cgi.FieldStorage(headers = headers, fp = StringIO(body), environ = {'REQUEST_METHOD':'POST'}, keep_blank_values=1)
Here's my parse_post method:
def parse_post(conn, headers_string):
headers = {}
headers_list = headers_string.split('\r\n')
for i in range(1,len(headers_list)-2):
header = headers_list[i].split(': ', 1)
headers[header[0]] = header[1]
content_length = int(headers['Content-Length'])
content = conn.recv(content_length)
# Parse Content differently if it's a multipart request??
return headers, content
So for an x-www-form-urlencoded POST request, I can treat FieldStorage form like a dictionary, and if I call, for example:
firstname = args['firstname'].value
print firstname
It will work. However, if I instead send a multipart POST request, it ends up printing nothing.
This is the body of the x-www-form-urlencoded request:
firstname=TEST&lastname=rwar
This is the body of the multipart request:
--070f6a3146974d399d97c85dcf93ed44
Content-Disposition: form-data; name="lastname"; filename="lastname"
rwar
--070f6a3146974d399d97c85dcf93ed44
Content-Disposition: form-data; name="firstname"; filename="firstname"
TEST
--070f6a3146974d399d97c85dcf93ed44--
So here's the question, should I manually parse the body for the data in parse_post if it's a multipart request?
Or is there a method that I need/can use to parse the multipart body?
Or am I doing this wrong completely?
Thanks again, I know it's a long read but I wanted to make sure my question was comprehensive

So I solved my problem, but in a totally hacky way.
Ended up manually parsing the body of the request, here's the code I wrote:
if("multipart/form-data" in headers["Content-Type"]):
data_list = []
content_list = content.split("\r\n\r\n")
for i in range(len(content_list) - 1):
data_list.append("")
data_list[0] += content_list[0].split("name=")[1].split(";")[0].replace('"','') + "="
for i,c in enumerate(content_list[1:-1]):
key = c.split("name=")[1].split(";")[0].replace('"','')
data_list[i+1] += key + "="
value = c.split("\r\n")
data_list[i] += value[0]
data_list[-1] += content_list[-1].split("\r\n")[0]
content = "&".join(data_list)
If anybody can still solve my problem without having to manually parse the body, please let me know!

There's the streaming-form-data project that provides a Python parser to parse data that's multipart/form-data encoded. It's intended to allow parsing data in chunks, but since there's no chunk size enforced, you could just pass your entire input at once and it should do the job. It should be installable via pip install streaming_form_data.
Here's the source code - https://github.com/siddhantgoel/streaming-form-data
Documentation - https://streaming-form-data.readthedocs.io/en/latest/
Disclaimer: I'm the author. Of course, please create an issue in case you run into a bug. :)

Related

Python requests with a raw body

Apologies if this is a repost/stupid question, I have tried searching around but found nothing.
I have a cgi webserver that takes a POST payload that is neither percentage encoded or form data. i.e, I need to stop the request from being percentage encoded, as with:
req = Request('POST', license_url, data={ None: data})
And I also need the request to not include the multiform boundaries introduced by:
req = Request('POST', license_url, file={ None: file_handle})
My request body looks something like:
abc
def
ghi=
And this is exactly what I want to post to the server (Postman achieves this when set to raw payload).
However, sending through the data param looks like:
data=abc%3Adef%3a..
And through the file Param:
--b70cc8bd5ac1411488c719a1d773edde
Content-Disposition: form-data; filename="blahblah"
abc
def
ghi=
--b70cc8bd5ac1411488c719a1d773edde
^^^ This is the closest to what I need but I also want to strip the boundary/content-disposition terms (not on the server).
Many Thanks
You can just post the raw text content like
req = Request('POST', license_url, data=data.encode())
provided the data is a string.

API gives only the headers in Python but not the data

I am trying to access an API from this website. (https://www.eia.gov/opendata/qb.php?category=717234)
I am able to call the API but I am getting only headers. Not sure if I am doing correctly or any additions are needed.
Code:
import urllib
import requests
import urllib.request
locu_api = 'WebAPI'
def locu_search(query):
api_key = locu_api
url = 'https://api.eia.gov/category?api_key=' + api_key
locality = query.replace(' ', '%20')
response = urllib.request.urlopen(url).read()
json_obj = str(response, 'utf-8')
data = json.loads(json_obj)
When I try to print the results to see whats there in data:
data
I am getting only the headers in JSON output. Can any one help me figure out how to do extract the data instead of headers.
Avi!
Look, the data you posted seems to be an application/json response. I tried to reorganize your snippet a little bit so you could reuse it for other purposes later.
import requests
API_KEY = "insert_it_here"
def get_categories_data(api_key, category_id):
"""
Makes a request to gov API and returns its JSON response
as a python dict.
"""
host = "https://api.eia.gov/"
endpoint = "category"
url = f"{host}/{endpoint}"
qry_string_params = {"api_key": api_key, "category_id": category_id}
response = requests.post(url, params=qry_string_params)
return response.json()
print(get_categories_data(api_key=API_KEY, category_id="717234"))
As far as I can tell, the response contains some categories and their names. If that's not what you were expecting, maybe there's another endpoint that you should look for. I'm sure this snippet can help you if that's the case.
Side note: isn't your API key supposed to be private? Not sure if you should share that.
Update:
Thanks to Brad Solomon, I've changed the snippet to pass query string arguments to the requests.post function by using the params parameter which will take care of the URL encoding, if necessary.
You haven't presented all of the data. But what I see here is first a dict that associates category_id (a number) with a variable name. For example category_id 717252 is associated with variable name 'Import quantity'. Next I see a dict that associates category_id with a description, but you haven't presented the whole of that dict so 717252 does not appear. And after that I would expect to see a third dict, here entirely missing, associating a category_id with a value, something like {'category_id': 717252, 'value': 123.456}.
I think you are just unaccustomed to the way some APIs aggressively decompose their data into key/value pairs. Look more closely at the data. Can't help any further without being able to see the data for myself.

Robot Framework: send binary data in POST request body with

I have a problem with getting my test running using Robot Framework and robotframework-requests. I need to send a POST request and a binary data in the body. I looked at this question already, but it's not really answered. Here's how my test case looks like:
Upload ${filename} file
Create Session mysession http://${ADDRESS}
${data} = Get Binary File ${filename}
&{headers} = Create Dictionary Content-Type=application/octet-stream Accept=application/octet-stream
${resp} = Post Request mysession ${CGIPath} data=${data} headers=&{headers}
[Return] ${resp.status_code} ${resp.text}
The problem is that my binary data is about 250MB. When the data is read with Get Binary File I see that memory consumption goes up to 2.x GB. A few seconds later when the Post Request is triggered my test is killed by OOM. I already looked at files parameter, but it seems it uses multipart encoding upload, which is not what I need.
My other thought was about passing open file handler directly to underlying requests library, but I guess that would require robotframework-request modification. Another idea is to fall back to curl for this test only.
Am I missing something in my test? What is the better way to address this?
I proceeded with the idea of robotframework-request modification and added this method
def post_request_binary(
self,
alias,
uri,
path=None,
params=None,
headers=None,
allow_redirects=None,
timeout=None):
session = self._cache.switch(alias)
redir = True if allow_redirects is None else allow_redirects
self._capture_output()
method_name = "post"
method = getattr(session, method_name)
with open(path, 'rb') as f:
resp = method(self._get_url(session, uri),
data=f,
params=self._utf8_urlencode(params),
headers=headers,
allow_redirects=allow_redirects,
timeout=self._get_timeout(timeout),
cookies=self.cookies,
verify=self.verify)
self._print_debug()
# Store the last session object
session.last_resp = resp
self.builtin.log(method_name + ' response: ' + resp.text, 'DEBUG')
return resp
I guess I can improve it a bit and create a pull request.

Creating POST request in python, need to send data as multipart/form-data?

I'm in the process of writing a very simple Python application for a friend that will query a service and get some data in return. I can manage the GET requests easily enough, but I'm having trouble with the POST requests. Just to get my feet wet, I've only slightly modified their example JSON data, but when I send it, I get an error. Here's the code (with identifying information changed):
import urllib.request
import json
def WebServiceClass(Object):
def __init__(self, user, password, host):
self.user = user
self.password = password
self.host = host
self.url = "https://{}/urldata/".format(self.host)
mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
mgr.add_password(None, "https://{}".format(self.host), self.user, self.password)
self.opener = urllib.request.build_opener(urllib.request.HTTPDigestAuthHandler(mgr))
username = "myusername"
password = "mypassword"
service_host = "thisisthehostinfo"
web_service_object = WebServiceClass(username, password, service_host)
user_query = {"searchFields":
{
"First Name": "firstname",
"Last Name": "lastname"
},
"returnFields":["Entity ID","First Name","Last Name"]
}
user_query = json.dumps(user_query)
user_query = user_query.encode("ascii")
the_url = web_service_object.url + "modules/Users/search"
try:
user_data = web_service_object.opener.open(the_url, user_query)
user_data.read()
except urllib.error.HTTPError as e:
print(e.code)
print(e.read())
I got the class data from their API documentation.
As I said, I can do GET requests fine, but this POST request gives me a 500 error with the following text:
Content type 'application/x-www-form-urlencoded;charset=UTF-8' not supported
In researching this error, my assumption has become that the above error means I need to submit the data as multipart/form-data. Whether or not that assumption is correct is something I would like to test, but stock Python doesn't appear to have any easy way to create multipart/form-data - there are modules out there, but all of them appear to take a file and convert it to multipart/form-data, whereas I just want to convert this simple JSON data to test.
This leads me to two questions: does it seem as though I'm correct in my assumption that I need multipart/form-data to get this to work correctly? And if so, do I need to put my JSON data into a text file and use one of those modules out there to turn it into multipart, or is there some way to do it without creating a file?
Maybe you want to try the requests lib, You can pass a files param, then requests will send a multipart/form-data POST instead of an application/x-www-form-urlencoded POST. You are not limited to using actual files in that dictionary, however:
import requests
response = requests.post('http://httpbin.org/post', files=dict(foo='bar'))
print response.status_code
If you want to know more about the requests lib, and specially in sending multipart forms take a look at this:
http://docs.python-requests.org/en/master/
and
http://docs.python-requests.org/en/master/user/advanced/?highlight=Multipart-Encoded%20Files

Download file and render template in one request with Flask

In my application, I'm rendering a PDF file and pass it back as a response. For this purpose I'm using flask_weasyprint's render_pdf, which does exactly this:
def render_pdf(html, stylesheets=None, download_filename=None):
if not hasattr(html, 'write_pdf'):
html = HTML(html)
pdf = html.write_pdf(stylesheets=stylesheets)
response = current_app.response_class(pdf, mimetype='application/pdf')
if download_filename:
response.headers.add('Content-Disposition', 'attachment', filename=download_filename)
return response
I now need to render a template + returning the rendered pdf as a download. Something like
#app.route("/view")
def view() :
resp1 = render_pdf(HTML(string="<p>Render me!</p>"), download_filename = "test.pdf")
resp2 = render_template("test.html")
return resp1, resp2
Is there any way to achieve this? Any workaround?
I am not sure if this is solvable in the backend, you want to send two http responses following one request. Should that be possible? (I really don't know) Shouldn't the client make two responses? (javascript).
An option would be, javascript datablob returned in your render_template call.
Maybe something like this? (untested):
fileData = new Blob([pdf_data_here], { type: 'application/pdf' });
fileUrl = URL.createObjectURL(fileData);
window.location.assign(fileUrl);
Or maybe just use the window.location.assign() function to generate the second request.
Or put the data base64 encoded in a href attribute?

Categories

Resources