Is there a way to easily extract the json data portion in the body of a POST request?
For example, if someone posts to www.example.com/post with a form body containing JSON data, my GAE server reads the request body with:
jsonstr = self.request.body
However, when I look at jsonstr, I get something like:
str: \r\n----------------------------8cf1c255b3bd7f2\r\nContent-Disposition: form-data; name="Actigraphy"\r\nContent-Type: application/octet-stream\r\n\r\n{"Data":"AfgCIwHGAkAB4wFYAZkBKgHwAebQBaAD.....
I just want to be able to call a function to extract the json part of the body which starts at the {"Data":...... section.
Is there an easy function I can call to do this?
There is a misunderstanding: the string you show us is not JSON data; it is a multipart/form-data POST body. You have to parse the body with something like cgi.parse_multipart.
Then you can parse the JSON as aschmid00 answered, but instead of the whole body, you parse only the data of that form field.
Here you can find working code that shows how to use cgi.FieldStorage to parse the POST body.
This question is also answered here.
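A minimal Python 3 sketch of that approach, using only the standard library's email package instead of cgi (the cgi module was deprecated in Python 3.11 and removed in 3.13). It assumes you have both the request's Content-Type header (which carries the multipart boundary) and the raw body bytes; the helper name extract_json_parts and the boundary XYZ are hypothetical.

```python
import json
from email.parser import BytesParser
from email.policy import default


def extract_json_parts(content_type, body):
    """Return {field_name: parsed_json} for every part of a multipart body.

    content_type: the request's Content-Type header, including boundary=...
    body: the raw request body as bytes
    """
    # The email parser needs the Content-Type header to find the boundary,
    # so prepend it as a minimal header block.
    raw = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n" + body
    message = BytesParser(policy=default).parsebytes(raw)

    result = {}
    for part in message.iter_parts():
        # The field name lives in: Content-Disposition: form-data; name="..."
        name = part.get_param("name", header="content-disposition")
        # decode=True returns the part's payload as bytes
        payload = part.get_payload(decode=True)
        result[name] = json.loads(payload)
    return result


# Usage with a hand-written body shaped like the one in the question
body = (
    b"--XYZ\r\n"
    b'Content-Disposition: form-data; name="Actigraphy"\r\n'
    b"Content-Type: application/octet-stream\r\n"
    b"\r\n"
    b'{"Data": "AfgC"}\r\n'
    b"--XYZ--\r\n"
)
print(extract_json_parts("multipart/form-data; boundary=XYZ", body))
# → {'Actigraphy': {'Data': 'AfgC'}}
```

This still assumes every part contains JSON; a real handler would check the field name first.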
It depends on how it was encoded on the browser side before submitting, but normally you would get the POST data like this:
jsonstr = self.request.POST["Data"]
If that's not working, you might want to give us some info on how "Data" was encoded into the POST data on the client side.
You can try:
import json
values = 'random stuff .... \r\n {"data":{"values":[1,2,3]}} more random things'
json_value = json.loads(values[values.index('{'):values.rindex('}') + 1])
print json_value['data'] # {u'values': [1, 2, 3]}
print json_value['data']['values'] # [1, 2, 3]
But this is fragile and relies on a fair number of assumptions. I'm not sure which framework you are using (bottle, flask, there are many); please use your framework's appropriate call to retrieve the POST values, if indeed you are using one.
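If you do end up slicing JSON out of a larger string, a somewhat less fragile variant than index/rindex is json.JSONDecoder.raw_decode, which parses one JSON value starting at a given offset and reports where it stopped, so a stray } after the object does not break it. This is still a heuristic sketch in Python 3 (the helper name extract_first_json_object is hypothetical), not a substitute for parsing the request properly.

```python
import json


def extract_first_json_object(text):
    """Return the first complete JSON object found in text."""
    decoder = json.JSONDecoder()
    start = text.find('{')
    while start != -1:
        try:
            # raw_decode parses from index `start` and ignores trailing text
            obj, _end = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            # Not valid JSON at this brace; try the next one
            start = text.find('{', start + 1)
    raise ValueError('no JSON object found')


# The trailing "}" would make the rindex('}') slice invalid JSON
values = 'random stuff .... \r\n {"data":{"values":[1,2,3]}} more random things }'
print(extract_first_json_object(values)['data']['values'])  # [1, 2, 3]
```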
If you are using GAE's webapp framework by itself, I think you mean self.request.get("Data").
https://developers.google.com/appengine/docs/python/tools/webapp/requestclass#Request_get
https://developers.google.com/appengine/docs/python/tools/webapp/requestclass#Request_get_all
Related
I have a large JSON item returned through a REST API. I won't clutter this post with the full text, but here is the code I am currently using:
import urllib2
import json
req = urllib2.Request('http://elections.huffingtonpost.com/pollster/api/polls.json?state=IA')
response = urllib2.urlopen(req)
the_page = response.read()
decode = json.loads(the_page)
#print = decode #removed, because it is not actually related to the question
print decode
I have been trying to extract information out of it such as the date polls are updated, the actual data from the polls etc (particularly the presidential polls) but I am having trouble returning any data at all. Can anyone assist?
EDIT:
The actual question is how to query data from the returned array/dict
The problem is that you overwrite print with your data instead of printing the data. Just remove the = in the last line and it should work fine:
print decode
If you want to use Python 3, you need parentheses for print. This would look like this:
print(decode)
Edit: As you updated your question, here is an answer to your actual question: the loads function returns the data as a combination of dicts and lists, so you can access it like any dict or list. For example, to get the last_updated field of all polls in one list, you can do something like this:
all_last_updated = [poll['last_updated'] for poll in decode]
Or to just get the end date of all polls sponsored by "Constitutional Responsibility Project", you could do this:
end_dates = [poll['end_date'] for poll in decode if any(sponsor['name'] == 'Constitutional Responsibility Project' for sponsor in poll['sponsors'])]
Or if you just want the id of the first poll in the list, do:
the_id = decode[0]['id']
You access anything you want from the json in a similar way.
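To try these expressions without calling the API, here is a self-contained Python 3 sketch against a small hand-written list in the shape the answer describes. The sample records are invented; only the field names (id, last_updated, end_date, sponsors, name) come from the answer, and the real response may contain more or different fields. It uses dict.get so a poll without sponsors does not raise KeyError.

```python
import json

# Hypothetical stand-in for the_page; the values are made up for illustration.
the_page = '''[
  {"id": 1, "last_updated": "2012-10-01", "end_date": "2012-09-30",
   "sponsors": [{"name": "Constitutional Responsibility Project"}]},
  {"id": 2, "last_updated": "2012-10-02", "end_date": "2012-10-01",
   "sponsors": []}
]'''
decode = json.loads(the_page)  # a list of dicts

all_last_updated = [poll['last_updated'] for poll in decode]

end_dates = [
    poll['end_date']
    for poll in decode
    if any(sponsor.get('name') == 'Constitutional Responsibility Project'
           for sponsor in poll.get('sponsors', []))
]

the_id = decode[0]['id']

print(all_last_updated)  # ['2012-10-01', '2012-10-02']
print(end_dates)         # ['2012-09-30']
print(the_id)            # 1
```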
It is because you do
print = decode
Instead, if you are using Python 2, do
print decode
or in Python 3, do
print(decode)
The request's content-type is application/json, but I want to get the request body bytes. Flask will auto convert the data to json. How do I get the request body?
You can get the non-form-related data by calling request.get_data(). You can get the parsed form data by accessing request.form and request.files.
However, the order in which you access these two will change what is returned from get_data. If you call it first, it will contain the full request body, including the raw form data. If you call it second, it will typically be empty, and form will be populated. If you want consistent behavior, call request.get_data(parse_form_data=True).
You can get the body parsed as JSON by using request.get_json(), but this does not happen automatically like your question suggests.
See the docs on dealing with request data for more information.
To stream the data rather than reading it all at once, access request.stream.
If you want the data as a string instead of bytes, use request.get_data(as_text=True). This will only work if the body is actually text, not binary, data.
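A minimal sketch of these calls, assuming Flask 1.1 or later (for returning a dict from a view). The /echo route is hypothetical; Flask's built-in test client is used so the example runs without starting a server.

```python
from flask import Flask, request

app = Flask(__name__)


@app.route('/echo', methods=['POST'])
def echo():
    raw = request.get_data()               # body as bytes (cached by default)
    text = request.get_data(as_text=True)  # same cached body, decoded to str
    parsed = request.get_json()            # parsed JSON, reusing the cached body
    return {'raw_length': len(raw), 'text': text, 'parsed': parsed}


with app.test_client() as client:
    resp = client.post('/echo', data=b'{"a": 1}', content_type='application/json')
    print(resp.get_json())
```

Because get_data caches the body, calling it before get_json does not consume anything that get_json needs.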
Files in a FormData request can be accessed through request.files; then you can select the file you included in the FormData, e.g. request.files['audio'].
If you want to access the actual bytes of the file (in our case 'audio') through .stream, first make sure that the cursor points to the first byte and not to the end of the file; otherwise you will get empty bytes.
Hence, a good way to do it:
file = request.files['audio']
file.stream.seek(0)
audio = file.read()
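To see why the seek(0) matters, here is a small sketch using Werkzeug's FileStorage (the type of the objects in request.files) wrapped around an in-memory stream; the bytes and filename are placeholders. Once something has read the stream, the cursor sits at the end and further reads return b'' until you seek back.

```python
import io

from werkzeug.datastructures import FileStorage

data = b'placeholder audio bytes'
upload = FileStorage(stream=io.BytesIO(data), filename='clip.wav', name='audio')

first = upload.read()    # reads everything; the cursor is now at the end
second = upload.read()   # b'' because nothing is left to read
upload.stream.seek(0)    # rewind to the first byte
third = upload.read()    # the full contents again
print(first == third, second)  # True b''
```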
If the data is JSON, use request.get_json() to parse it.
I am attempting to extract some information from a website that requires a POST to an AJAX script.
I am trying to create an automated script; however, I am consistently running into an HTTP 500 error. This is in contrast to a different data pull I did from a
url = 'http://www.ise.com/ExchangeDataService.asmx/Get_ISE_Dividend_Volume_Data/'
paramList = ''
paramList += '"' + 'dtStartDate' + '":07/25/2014"'
paramList += ','
paramList += '"' + 'dtEndDate' + '":07/25/2014"';
paramList = '{' + paramList + '}';
response = requests.post(url, headers={
'Content-Type': 'application/json; charset=UTF-8',
'data': paramList,
'dataType':'json'
})
I was wondering if anyone had any recommendations as to what is happening. This isn't proprietary data as they allow you to manually download it in excel format.
The input you're generating is not valid JSON. It looks like this:
{"dtStartDate":07/25/2014","dtEndDate":07/25/2014"}
If you look carefully, you'll notice a missing " before the first 07.
This is one of many reasons you shouldn't be trying to generate JSON by string concatenation. Either build a dict and use json.dumps, or, if you must, use a multi-line string as a template for str.format or %.
Also, as bruno desthuilliers points out, you almost certainly want to be sending the JSON as the POST body, not as a data header in an empty POST. Doing it the wrong way does happen to work with some back-ends, but only by accident, and that's certainly not something you should be relying on. And if the server you're talking to isn't one of those back-ends, then you're sending the empty string as your JSON data, which is just as invalid.
So, why does this give you a 500 error? Probably because the backend is some messy PHP code that doesn't have an error handler for invalid JSON, so it just bails with no information on what went wrong, so the server can't do anything better than send you a generic 500 error.
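A sketch of the fix described above, assuming the requests library: build a dict, let json.dumps produce the JSON, and send it as the request body rather than as a header. Whether this endpoint accepts these parameter names and the MM/DD/YYYY date format is not known here; they are copied from the question. The request is only prepared, not sent, so the body can be inspected offline.

```python
import json

import requests

url = 'http://www.ise.com/ExchangeDataService.asmx/Get_ISE_Dividend_Volume_Data/'
payload = {'dtStartDate': '07/25/2014', 'dtEndDate': '07/25/2014'}

# json.dumps handles all the quoting, so the output is always valid JSON
body = json.dumps(payload)

prepared = requests.Request(
    'POST',
    url,
    data=body,  # the JSON goes in the POST body...
    headers={'Content-Type': 'application/json; charset=UTF-8'},  # ...not in a header
).prepare()

print(prepared.body)  # {"dtStartDate": "07/25/2014", "dtEndDate": "07/25/2014"}

# To actually send it:
# response = requests.Session().send(prepared)
# or let requests serialize the dict and set Content-Type: application/json itself:
# response = requests.post(url, json=payload)
```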
If that's a copy/paste from your actual code, 'data' is probably not supposed to be part of the request headers. As a side note: you don't "post to an AJAX script", you post to a URL. The fact that this URL is called via an asynchronous request from some JavaScript on some page of the site is totally irrelevant.
It sounds like a server error, so what you're posting could be breaking their API due to its formatting.
Or their API could be down.
http://pcsupport.about.com/od/findbyerrormessage/a/500servererror.htm
I'm currently testing out creating a RESTful JSON API, and in the process I've been posting data via curl, primarily to see if I can log in through a request. I can't figure out what to do even if I hack it to work, but that's a separate question.
I'm sending the following POST request to my app:
curl -X POST http://localhost:6543/users/signin -d '{"username":"a#a.com","password":"password"}'
And when I see what data is in my request, the output is extremely strange:
ipdb> self.request.POST
MultiDict([('{"username":"a#a.com","password":"password"}', '******')])
ipdb> self.request.POST.keys()
['{"username":"a#a.com","password":"password"}']
ipdb> self.request.POST.values()
[u'']
So, it comes out to be a MultiDict with my json object as a string key, and a blank string as its value?! That doesn't seem right.
Removing the single quotes in my json declaration gives the following:
ipdb> self.request.POST
MultiDict([('username:a#a.com', u'')])
Does anyone have any idea why my data may not be being posted correctly?
Update:
To be clear, the header I'm using is in fact application/x-www-form-urlencoded.
ipdb> self.request.headers['CONTENT-TYPE']
'application/x-www-form-urlencoded'
What I DID find is that for some reason using the requests library works when I do the following:
In [49]: s.post('http://localhost:6543/users/signin', data=[('username', 'a#a.com'), ('password', 'password')], headers={'content-type': 'application/x-www-form-urlencoded'})
Out[49]: <Response [200]>
The fact that it doesn't work with curl as expected is still troubling though.
I'm not sure which content type you are attempting to upload: application/json or application/x-www-form-urlencoded. request.POST only works with the latter, and request.json_body is used to parse data from a JSON request body.
To be clear, application/x-www-form-urlencoded is the format used when your web browser submits a form. It's a key/value format looking like a=b&c=d&e=f. From there you can expect request.POST to contain a dictionary with the keys a, c, and e.
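To see why the JSON string ends up as a key, here is a rough approximation using Python 3's urllib.parse.parse_qsl, which splits a form-urlencoded body in the same general way (on & into pairs, then on = into key and value). This is only an illustration, not WebOb's or Pyramid's actual parsing code. On the curl side, the usual fix is to send the header explicitly, e.g. curl -X POST http://localhost:6543/users/signin -H 'Content-Type: application/json' -d '{"username":"a#a.com","password":"password"}', and then read request.json_body in the view.

```python
import json
from urllib.parse import parse_qsl, urlencode

json_body = '{"username":"a#a.com","password":"password"}'

# Parsed as form data, the JSON text contains no "=", so it becomes one key
# with an empty value, matching what request.POST showed.
print(parse_qsl(json_body, keep_blank_values=True))

# A real form-urlencoded body is key=value pairs joined by "&".
form_body = urlencode({'username': 'a#a.com', 'password': 'password'})
print(form_body)             # username=a%23a.com&password=password
print(parse_qsl(form_body))  # [('username', 'a#a.com'), ('password', 'password')]

# A JSON body should be decoded as JSON instead (request.json_body in Pyramid).
print(json.loads(json_body)['username'])  # a#a.com
```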
I'm trying to extract the response header of a URL request. When I use firebug to analyze the response output of a URL request, it returns:
Content-Type text/html
However when I use the python code:
urllib2.urlopen(URL).info()
the resulting output returns:
Content-Type: video/x-flv
I am new to python, and to web programming in general; any helpful insight is much appreciated. Also, if more info is needed please let me know.
Thanks in advance for reading this post
Try to request as Firefox does. You can see the request headers in Firebug, so add them to your request object:
import urllib2
request = urllib2.Request('http://your.tld/...')
request.add_header('User-Agent', 'some fake agent string')
request.add_header('Referer', 'fake referrer')
...
response = urllib2.urlopen(request)
# check content type:
print response.info().getheader('Content-Type')
There's also HTTPCookieProcessor, which can make it better, but I don't think you'll need it in most cases. Have a look at Python's documentation:
http://docs.python.org/library/urllib2.html
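For reference, a Python 3 sketch of the same idea with urllib.request, which replaced urllib2. The header values are placeholders you would copy from Firebug's request headers, and the helper names are hypothetical. Note that Request normalizes header names with str.capitalize(), so they are stored and looked up as 'User-agent' rather than 'User-Agent'.

```python
import urllib.request

# Placeholder values; copy the real ones from Firebug's "Request Headers".
BROWSER_HEADERS = {
    'User-Agent': 'some fake agent string',
    'Referer': 'fake referrer',
    'Accept': 'text/html,application/xhtml+xml,*/*;q=0.8',
}


def build_browser_like_request(url):
    return urllib.request.Request(url, headers=BROWSER_HEADERS)


def fetch_content_type(url):
    with urllib.request.urlopen(build_browser_like_request(url)) as response:
        # response.headers is an http.client.HTTPMessage; .get() returns None if absent
        return response.headers.get('Content-Type')


req = build_browser_like_request('http://your.tld/')
print(req.get_header('User-agent'))  # some fake agent string
```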
Content-Type text/html
Really, like that, without the colon?
If so, that might explain it: it's an invalid header, so it gets ignored, so urllib guesses the content-type instead, by looking at the filename. If the URL happens to have ‘.flv’ at the end, it'll guess the type should be video/x-flv.
This peculiar discrepancy might be explained by different headers (maybe ones of the accept kind) being sent by the two requests -- can you check that...? Or, if Javascript is running in Firefox (which I assume you're using when you're running firebug?) -- since it's definitely NOT running in the Python case -- "all bets are off", as they say;-).
Keep in mind that a web server can return different results for the same URL based on differences in the request. For example, with content-type negotiation the requestor can specify a list of content-types it will accept, and the server can return different results to accommodate different needs.
Also, you may be getting an error page for one of your requests, for example, because it is malformed, or you don't have cookies set that authenticate you properly, etc. Look at the response itself to see what you are getting.
According to http://docs.python.org/library/urllib2.html, there is only a get_header() method and nothing about getheader.
I'm asking because your code works fine for
response.info().getheader('Set-Cookie')
but once I execute
response.info().get_header('Set-Cookie')
I get:
Traceback (most recent call last):
File "baza.py", line 11, in <module>
cookie = response.info().get_header('Set-Cookie')
AttributeError: HTTPMessage instance has no attribute 'get_header'
Edit:
Moreover, response.headers.get('Set-Cookie') works fine as well, though it is not mentioned in the urllib2 docs.
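The confusion comes from two different classes: in Python 2, get_header() is a method of urllib2.Request (the outgoing request), while response.info() (also available as response.headers) returns a mimetools.Message, which offers getheader() and the dict-style get(). In Python 3 the response headers are an http.client.HTTPMessage, a subclass of email.message.Message, so get() and get_all() are the accessors to use. A small offline sketch with made-up header text:

```python
import email

# Made-up raw header block; in Python 3 a live response gives you an
# equivalent object as response.headers (or response.info()).
raw_headers = (
    'Content-Type: text/html; charset=UTF-8\r\n'
    'Set-Cookie: session=abc123; Path=/\r\n'
    'Set-Cookie: theme=dark; Path=/\r\n'
    '\r\n'
)
headers = email.message_from_string(raw_headers)

print(headers.get('Set-Cookie'))      # session=abc123; Path=/  (first match only)
print(headers.get_all('Set-Cookie'))  # ['session=abc123; Path=/', 'theme=dark; Path=/']
print(headers.get_content_type())     # text/html
```

Since a response can carry several Set-Cookie headers, get_all() is usually the safer choice for cookies.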
For getting the raw header data in Python 2, here is a bit of a hack, but it works:
"".join(urllib2.urlopen("http://google.com/").info().__dict__["headers"])
"".join(list) concatenates the list of raw header lines, each of which already ends with a line break, into one string.
__dict__ is the attribute dictionary that most Python objects have (it is not specific to dicts). The object returned by .info() keeps its raw header lines in an attribute called headers, so __dict__["headers"] selects that list; .info().headers gives you the same list directly.
Hope this helped you learn a few easy Python tricks :)