The request's content-type is application/json, but I want to get the request body bytes. Flask will auto convert the data to json. How do I get the request body?
You can get the non-form-related data by calling request.get_data() You can get the parsed form data by accessing request.form and request.files.
However, the order in which you access these two will change what is returned from get_data. If you call it first, it will contain the full request body, including the raw form data. If you call it second, it will typically be empty, and form will be populated. If you want consistent behavior, call request.get_data(parse_form_data=True).
You can get the body parsed as JSON by using request.get_json(), but this does not happen automatically like your question suggests.
See the docs on dealing with request data for more information.
To stream the data rather than reading it all at once, access request.stream.
If you want the data as a string instead of bytes, use request.get_data(as_text=True). This will only work if the body is actually text, not binary, data.
Files in a FormData request can be accessed at request.files then you can select the file you included in the FormData e.g. request.files['audio'].
So now if you want to access the actual bytes of the file, in our case 'audio' using .stream, you should make sure first that your cursor points to the first byte and not to the end of the file, in which case you will get empty bytes.
Hence, a good way to do it:
file = request.files['audio']
file.stream.seek(0)
audio = file.read()
If the data is JSON, use request.get_json() to parse it.
Related
I am working on a program that reads the content of a Restful API from ImportIO. The connection works, and data is returned, but it's a jumbled mess. I'm trying to clean it to only return Asins.
I have tried using the split keyword and delimiter to no success.
stuff = requests.get('https://data.import.io/extractor***')
stuff.content
I get the content, but I want to extract only Asins.
results
While .content gives you access to the raw bytes of the response payload, you will often want to convert them into a string using a character encoding such as UTF-8. the response will do that for you when you access .text.
response.txt
Because the decoding of bytes to str requires an encoding scheme, requests will try to guess the encoding based on the response’s headers if you do not specify one. You can provide an explicit encoding by setting .encoding before accessing .text:
If you take a look at the response, you’ll see that it is actually serialized JSON content. To get a dictionary, you could take the str you retrieved from .text and deserialize it using json.loads(). However, a simpler way to accomplish this task is to use .json():
response.json()
The type of the return value of .json() is a dictionary, so you can access values in the object by key.
You can do a lot with status codes and message bodies. But, if you need more information, like metadata about the response itself, you’ll need to look at the response’s headers.
For More Info: https://realpython.com/python-requests/
What format is the return information in? Typically Restful API's will return the data as json, you will likely have luck parsing the it as a json object.
https://realpython.com/python-requests/#content
stuff_dictionary = stuff.json()
With that, you can load the content is returned as a dictionary and you will have a much easier time.
EDIT:
Since I don't have the full URL to test, I can't give an exact answer. Given the content type is CSV, using a pandas DataFrame is pretty easy. With a quick StackOverflow search, I found the following answer: https://stackoverflow.com/a/43312861/11530367
So I tried the following in the terminal and got a dataframe from it
from io import StringIO
import pandas as pd
pd.read_csv(StringIO("HI\r\ntest\r\n"))
So you should be able to perform the following
from io import StringIO
import pandas as pd
df = pd.read_csv(StringIO(stuff.content))
If that doesn't work, consider dropping the first three bytes you have in your response: b'\xef\xbb\xf'. Check the answer from Mark Tolonen to get parse this.
After that, selecting the ASIN (your second column) from your dataframe should be easy.
asins = df.loc[:, 'ASIN']
asins_arr = asins.array
The response is the byte string of CSV content encoded in UTF-8. The first three escaped byte codes are a UTF-8-encoded BOM signature. So stuff.content.decode('utf-8-sig') should decode it. stuff.text may also work if the encoding was returned correctly in the response headers.
I am putting a JSON response into a variable via requests.json() like this:
response = requests.get(some_url, params=some_params).json()
This however converts JSON's original " to Python's ', true to True, null to None.
This poses a problem when trying to save the response as text and the convert it back to JSON - sure, I can use .replace() for all conversions mentioned above, but even once I do that, I get other funny json decoder errors.
Is there any way in Python to get JSON response and keep original JavaScript format?
json() is the JSON decoder method. You are looking at a Python object, that is why it looks like Python.
Other formats are listed on the same page, starting from Response Content
.text: text - it has no separate link/paragraph, it is right under "Response Content"
.content: binary, as bytes
.json(): decoded JSON, as Python object
.raw: streamed bytes (so you can get parts of content as it comes)
You need .text for getting text, including JSON data.
You can get the raw text of your response with requests.get(some_url, params=some_params).text
It is the json method which converts to a Python friendly format.
When to use RequestHandler.get_argument(), RequestHandler.get_query_argument() and RequestHandler.get_body_argument()?
What is the use-case for each of them?
Also what does the request.body and request.argument do in these cases? Which are to be used in which scenarios?
And, is there a request.query or something similar too?
Most HTTP requests store extra parameters (say, form values) in one of two places: the URL (in the form of a ?foo=bar&spam=eggs query string), or in the request body (when using a POST request and either the application/x-www-form-urlencoded or multipart/form-data mime type).
The Request.get_query_argument() looks for URL parameters, the RequestHandler.get_body_argument() lets you retrieve parameters set in the POST body. The RequestHandler.get_argument() method retrieves either a body or a URL parameter (in that order).
You use Request.get_argument() when you explicitly don't care where the parameter comes from and your endpoint supports both GET and POST parameters. Otherwise, use one of the other methods, to keep it explicit where your parameters come from.
The Request.get_*_argument methods use the request.body_arguments and request.query_arguments values (with request.arguments being their aggregate), decoded to Unicode. request.body is the undecoded, unparsed raw request body; and yes, there is an equivalent self.query containing the query string from the URL.
I'm having trouble parsing specific body parts of MIME messages.
I have an email client web interface. I want to allow the user to download the attachments of an email. In the past, each time I wanted to download an attachment I would make a call to the IMAP server with the argument RFC822 to obtain the whole message, that I could easily parse with Python.
However, this is not efficient and I need a way to obtain just the required attachment. I'm using the alternative of making a call to the IMAP server with the BODY[1], BODY[2], etc index of the specific bodypart.
When I make this IMAP call I obtain back the correct body part (when I make a call to BODYSTRUCTURE, the number of bytes in the part I'm looking for adds up, so I'm definitely obtaining the correct part).
However, I cannot parse this body part into something useable, or save it for that matter.
A specific example: I make a call to obtain the BODY[1] of an email and obtain back
('4 (UID 26776 BODY[2] {5318}', '/9j/4AAQSkZJRgABAQEAYABgAAD/2wBDAAoHBwkHBgoJCAkLCwoMDxkQDw4ODx4WFxIZJCAmJSMg\r\nIyIoLTkwKCo2KyIjMkQyNjs9QEBAJjBGS0U+Sjk/QD3/2wBDAQsLCw8NDx0QEB09KSMpPT09PT09\r\nPT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT3/wAARCABRAQIDASIA\r\nAhEBAxEB/8QAHwAAAQUBAQEBAQEAAAAAAAAAAAECAwQFBgcICQoL/8QAtRAAAgEDAwIEAwUFBAQA\r\nAAF9AQIDAAQRBRIhMUEGE1FhByJxFDKBkaEII0KxwRVS0fAkM2JyggkKFhcYGRolJicoKSo0NTY3\r\nODk6Q0RFRkdISUpTVFVWV1hZWmNkZWZnaGlqc3R1dnd4eXqDhIWGh4iJipKTlJWWl5iZmqKjpKWm\r\np6ipqrKztLW2t7i5usLDxMXGx8jJytLT1NXW19jZ2uHi4+Tl5ufo6erx8vP09fb3+Pn6/8QAHwEA\r\nAwEBAQEBAQEBAQAAAAAAAAECAwQFBgcICQoL/8QAtREAAgECBAQDBAcFBAQAAQJ3AAECAxEEBSEx\r\nBhJBUQdhcRMiMoEIFEKRobHBCSMzUvAVYnLRChYkNOEl8RcYGRomJygpKjU2Nzg5OkNERUZHSElK\r\nU1RVVldYWVpjZGVmZ2hpanN0dXZ3eHl6goOEhYaHiImKkpOUlZaXmJmaoqOkpaanqKmqsrO0tba3\r\nuLm6wsPExcbHyMnK0tPU1dbX2Nna4uPk5ebn6Onq8vP09fb3+Pn6/9oADAMBAAIRAxEAPwD2aiii\r\ngAooooAKKKKACorm4jtLaS4mbbFEpdzjOAOtS1neIf8AkXdR/wCvaT/0E1UVeSRM3yxbRbtbuC9t\r\n1ntZUliboyHIqauA+F+f+JkMnH7vj/vqu/rSvS9lUcE9jLDVvbUlNq1wooorE3CiiigAooooAKKK\r\nKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKzvEP8AyLuo/wDX\r\ntJ/6Cabruu23h/T2urlJZAPuxxLuZv8A63vXlWo+M/EXjC9FnpMUsMRORBb8sR6u3p+QrehSlJ83\r\nRGFerGKcerOp+F/XUv8Atn/7NXfE4GT0FcN4fx4Ss5jqTwS6jPt3w2vRcZxuPQHnnH5VDe6vqOuS\r\n+RGG2t0hi7/X1/HiniZqpVckThKbpUVCW534IYAggg8gilqG0QxWcKMMMsaqR6YFTVznSFFFFABR\r\nRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFITgZPSuen8daJDqLW\r\na3Xmug3SPENyRjIHJ+p7ZqowlLSKuTKcYq8nY6Kio1uIngEyyIYmGQ4PBH1qCLVLSa5Nuky+bjIU\r\n8bh7etSUcn8TGaOz05kYqwlYgg4I4qbRiLXw7ZG3SOFrmLzJmjQKZGyeSRUHxO/48dP/AOurfyqb\r\nTf8AkXNK/wCvf+prtn/usPVnn0/98n6Ikh8KzXt9LPdP5UDOWAXlmH9K6Wy0+20+Ly7WJUHc9z9T\r\n3pXdotOZ0OGWLcPqBXF6V4m1S4k0xpLvzPtTESRvaeWijBPyv0Y8dBXPClKabXQ6qlaNNpPqd5RX\r\nKWuvX8ug6HdPIhlvLsRTHYMFSW6Dt0FM0HxBqF/faZHcSIyXEM7yAIBkq5A+nFU8PNJvt/wf8iVi\r\nYNpd7fjb/M66iuZvrzVZ/Ed3ZWV7HbRQWqzDdAH3E5461lr4w1E2clyRESumLcBNvHmGTZn1x7UR\r\nw8pK6/q4SxMIuzT/AOGO6orjL7XdW0QzR3FzDds1ibmNvJ2bGBAxgHkc1saXJeJewx3+rxTySwea\r\nLcW4Q445yD0HSlKi4q9xxxCk+VJ/h/mbdFcodS1G6uNVnOqw2FnY3Bh+a3D8DHJJPqa09B1G4v7j\r\nVEuHVlt7too8Lj5QBSlSaVxxrxk7W/r+kbFFcW+v6t9in1ZbiEW0V79n+y+T1XcFzvznPNVF8T6q\r\nS8gvBuF55Iia0xGV345l6A4q1hpvqZvFwW6Z39FcRdeJr+P7VeJfWqrBeGBbEoNzoGC7s5znv0rf\r\n0HUbi/l1MXDBhb3jwx4XGFGMD3qZUJRjzMuGIhOXKv6/qxsUVzWva9d6XqsqQ7Gij057gIy9XDYH\r\nPpVOTXNV0lomu7iG8W4sZLlR5Pl+WyqGA4PI5ojQk0muopYmEW0+h2NFcjBrOq2U+nteXMN1HfWs\r\nk+wQ7PLKpvABB5Haq665rMFnpt5LdQzLqKviEQBfKO0kYOeenen9Xl3X9X/yF9aj2f8AVv8ANHbU\r\nVysXiG6e18PyCVHa7jd7gBR821CT9ORVTT/EmoPJpc0t/azi/Yq9skYDQcEjkHPbvR9Xn/Xz/wAh\r\n/Woaf12/zO1orkdI8R6tLpVpdXsFmYZiVEzT7Gc5PATHXjpntUdnrmriHSL+4uYZINSm8s24h2+X\r\nnOCGzk9O9H1eSuhLFQaTSev9fqdlRXB6Z4m1K4Gnyf2lbXM9xcCOSyWEB0XJy2Qc9Bn8as6T4gvp\r\n9Rgj1K+kt5JJin2Y2JCnk4USU5Yaav5Cji4Stbr6f5/8E7OikornOoWoLyWWC0llt4DcSopKxBgp\r\nc+mTwKnrP1LWrPTFInkzJ2jTlj/h+NAHkPiLxL4m8Rak2lyW9xbEnH2GFCGP+8erfyq7F4FvdA8P\r\n3ep6jIiSsixrbpztBdeWPTPHQV1g8Q3OoavDsVYIySMIPmIwTgt6e3SpPFF5JdeD71ZcEoY/m9cs\r\nK7qWIbnGEVZXRwV6CVOc5O7syj4SYnwlgkkLdsACeg2iofFhGn2cF8gkmldSBCi8gKfvZ9Km8I/8\r\nim3/AF+N/wCgiqXjpikGkMpKsElIIOCPmFDpqpinF92TGq6WDjNdkcoPEGu+IZ0SQrc28fAjkHyJ\r\n77uuffOa9JtvIXSbKG1mEqwQ7GI7H0NadnpFnqHh6xWeFQTAjb0G1gSAScj3qnZ+EDb3xka8byl+\r\n6EGGb2NY1qvN7iVkjpo0lH327tm5cyJFpEskgLIkBZgvUgLziuUtLPTrWSEhbyQWsgFtbTXWfnLB\r\nQQmOB82c+hrspII5rd4ZFDRupRl9QRjFVZNHs5m3Sxs5Awu6Rjs5B+Xn5eg6Y6VjGco6JmsqcZWc\r\nkc+ulWulXsDyWtz5UKyXSRG63xQ7cbiq+vzcVAunaUmlwXJkurcwRyGAQ3XzupO5hkDrk11v2C3I\r\nUOhfajRguxY7WxkEnrnA601dOt1tHtdrtA67SjyM3GMY5PAqvbT7k+wp/wApy0thpxa2nM2qCS5z\r\nbl1ucMwWQJ8x78t+VW9T0TSLEW8EkM4iuo1sSUfiNAdwJz/tAc+9bDaJYOWLQH5ju4dhtJYMSvPy\r\n/MAeMVLNplrcWwgnjMsYVlxIxY4YEHknPQmj20+4ewp/ynM3Fxp+rySzGxuJRFF9kIaUIGRpNoP4\r\nkA59DUxt7Tw7qMM5a6muBauQJ7oEKgK5Vc9TzwK310myQyFbdR5hUtgnnacj8jUzWsL3KzvGrSqp\r\nRWPYEg/zApe0la1x+yhfmtqcZfQ6ZdiS4eO+hh1Bt+xbkIkpDqhLA/d5INTQ2dvNfLNajUIzdXD7\r\n/KvNqF1zk8dRgf0rpRotgH3fZwfm3AFiQp3BuBnA+YA8VLHp9tE6tHCqlXaRcZ4Zup/Gn7WdrXF7\r\nCne9jlpLLTJLi2eOC8YXpS6S18/bCXbJyR+GahmsdMgeRZReNAswklhW8BQykb+F7jpz/hXULodg\r\ni4WAjGNpEjZTGcBTn5RyeBjrSLoGmLHsFnGRkHJyWzjHXr0o9tU7h7Cn/KZd1oWkDSWmltmIuZ0m\r\naTI8xWd1/i7AE9PTNU47SyfXrmO1lv4pPPMswW72ITk5IUDn7p49q6ePTreO0e2Cs0DrtKSOzjGM\r\nY5Jpkek2cRjMUbJsQINkjDKgk4bB+bknrnqaPaz7h7Gn2OfvZbHVLuKS7tLsSXlqIYxE4OYny2cd\r\nj8v61JfRaZdxFpYrkrZxC0G1wCVkAB/EVuPo9k6xgw48pFjjKsVKKvTBByOtMGh6eCuLcAKANoZs\r\nHGcEjOCRk8nml7SS2ZTpwd7o55b/AE+efS1FuzSQQbYB9oUpsZMEOR/FgdPerMOj2Wm6rAsUV1N9\r\nkj84JLcEx26kkfKO54NbJ0Wx3xusGx40CI0bshVRkAZBB7mppNPt5Z0mdW8xF2bg7DK9cHB5H1zR\r\n7Se1xeyhvY5nT4dPtbxmtrCYXF9GDCjSgqscgZjt/ufdOR9Kgt00m3h0xl05oHiKSQO0qKXDBgPM\r\nb8Dx7iupg0iyt3R44cNGRsJZm24BAAyeAAx46c0DSbIGEi3T9yAsecnaBnH8zR7WfcPYw7HJ2Mel\r\nB5LiGC6eCxDzfZ5LnKIwBJ2J0I54Oe9T/ZNK0u8siIdSfEp8i3eQlYWJAyEJ/wBr+ddO+mWryyO0\r\nZzKNsih2CuMbeVzg8cdKi/sLTyOYCzZyHZ2LA8YO4nPG0Y9MU/az7iVGmvsmEtppkQstNtrSdpYJ\r\nVmhLOFbdlyQzdcDaePpTLRLK4vrW7dtSnjSdNpnudyxyuMj5fbIGa6EaJYgDbCVYHO9XYPnLHO7O\r\nc/M3PfNPj0qziQJHAqqHWQAZ4ZQAp/AAUvaz7j9jDsW6KKKg0AjIIPQ1zV34NjluxJBcskbHLq/z\r\nEfQ/4101FAGbDpFpptjMttF85QgueWbj1rlNYu7ebSriwDl3mKZZOQu056967zqOawtX8LW17DM9\r\noqwXLL8pyQmfcD+lVB2kmiZpSi01c5HToZLO0aZLlLOyhY75pXwgOORj+I9OKx/EXijTddntrW2e\r\nRFtlZEmkTCyliM8clenGf0pn/CBeJ9W1U2t6iQQRHPmlv3IB7oB1P6+pr0Pw34G0rw4Fkij+0Xg6\r\n3EoBYf7o6L+HPvXYpQoy5+bmkccoSrx9ny2ibGkRtFo9lHIpV0gRWB7EKKuUUVxN3dztSsrBRRRS\r\nGFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQA\r\nneloooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKA\r\nCiiigD//2Q==\r\n').
This specific response corresponds to a JPEG image attachment.
I tried extracting the string representing the body part (so, I'm talking about the string starting in '/9j' and ending in '2Q==\r\n') and saving that to a file as a .jpg, but it's not a valid file.
I then though that, as there are multiple instances of \r\n in that string, that the string might be split with newline/carriage return, so I split the string and stripped it of the \r\n, then joined the substrings and tried to save that to a file. Still not a valid JPEG file.
What can I do to try and parse this response?
Thank you.
You need to parse the BODYRESPONSE string to see what format the data is encoded in, see the IMAP RFC 3501, section 7.4.2. The 5th field is the content encoding:
['IMAGE', 'JPEG', ['NAME', 'image001.jpg'], '<image001.jpg#01CDE914.6E62F850>', None, 'BASE64', 5318, None, None, None]
The fields are, in order, the type and subtype (so image/jpeg in this case), body parameters (such as characterset, format-flowed, or the filename in this case), the attachment id, description, encoding, size, MD5 signature (if any), disposition and language.
In this case the data is base-64 encoded:
>>> imagedata = datastring.decode('base64')
>>> imagedata[:10]
'\xff\xd8\xff\xe0\x00\x10JFIF'
which looks like JPEG data to me.
Is there a way to easily extract the json data portion in the body of a POST request?
For example, if someone posts to www.example.com/post with the body of the form with json data, my GAE server will receive the request by calling:
jsonstr = self.request.body
However, when I look at the jsonstr, I get something like :
str: \r\n----------------------------8cf1c255b3bd7f2\r\nContent-Disposition: form-data;
name="Actigraphy"\r\n Content-Type: application/octet-
stream\r\n\r\n{"Data":"AfgCIwHGAkAB4wFYAZkBKgHwAebQBaAD.....
I just want to be able to call a function to extract the json part of the body which starts at the {"Data":...... section.
Is there an easy function I can call to do this?
there is a misunderstanding, the string you show us is not json data, it looks like a POST body. You have to parse the body with something like cgi.parse_multipart.
Then you could parse json like answered by aschmid00. But instead of the body, you parse only the data.
Here you can find a working code that shows how to use cgi.FieldStorage for parsing the POST body.
This Question is also answered here..
It depends on how it was encoded on the browser side before submitting, but normally you would get the POST data like this:
jsonstr = self.request.POST["Data"]
If that's not working you might want to give us some info on how "Data" was encoded into the POST data on the client side.
you can try:
import json
values = 'random stuff .... \r\n {"data":{"values":[1,2,3]}} more rnandom things'
json_value = json.loads(values[values.index('{'):values.rindex('}') + 1])
print json_value['data'] # {u'values': [1, 2, 3]}
print json_value['data']['values'] # [1, 2, 3]
but this is dangerous and takes a fair amount of assumptions, Im not sure which framework you are using, bottle, flask, theres many, please use the appropriate call to POST
to retrieve the values, based on the framework, if indeed you are using one.
I think you mean to do this self.request.get("Data") If you are using the GAE by itself.
https://developers.google.com/appengine/docs/python/tools/webapp/requestclass#Request_get
https://developers.google.com/appengine/docs/python/tools/webapp/requestclass#Request_get_all