Python grabbing JSON from POST method

I have an Android app that originally posted some strings in JSON format to a Python CGI script, which all worked fine. The problem is that when the JSON object contains lists, Python (using simplejson) still treats them as one big string when it receives them.
Here is a text dump of the json once it reaches python before I parse it:
{"Prob1":"[1, 2, 3]","Name":"aaa","action":1,"Prob2":"[20, 20, 20]","Tasks":"[1 task, 2 task, 3 task]","Description":""}
If we look at the "Tasks" key, the list after it is clearly a single string, with all the elements treated as one string (i.e. no quotes around each element). It's the same for Prob1 and Prob2. action, Name etc. are all fine. I'm not sure if this is what Python is expecting, but I'm guessing not?
Just in case the Android data was to blame, I added quotes around each element of the ArrayList like this:
Tasks.add('"'+row.get(1).toString()+'"'); instead of Tasks.add(row.get(1).toString());
On the webserver it's now received as
{"Prob1":"[1, 2, 3]","Name":"aaa","action":1,"Prob2":"[20, 20, 20]","Tasks":"[\"1 task\", \"2 task\", \"3 task\"]","Description":""}
but I still get the same problem: when I iterate through "Tasks" in a loop, it loops through each individual character, as if the whole thing were a string :/
Since I don't know what the JSON structure should look like before it gets to Python, I'm wondering whether it's a problem with the Android app sending the data or with my Python interpreting it, though from the looks of that script I've been guessing it's the sending.
In the Android App I'm sending one big JSONObject containing "Tasks" and the associated arraylist as one of the key value pairs... is this correct? or should JSONArray be involved anywhere?
Thanks for any help everyone, I'm new to the whole JSON thing as well as to Android/Java (And only really a novice at Python too..). I can post additional code if anyone needs it, I just didn't want to lengthen the post too much
EDIT:
when I add
json_data=json_data.replace(r'"[','[')
json_data=json_data.replace(r']"',']')
json_data=json_data.replace(r'\"','"')
to the python it WORKS!!!! but that strikes me as a bit nasty and just papering over a crack..

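Tasks is just a big string. To be a valid list, it would have to be ["1 task", "2 task", "3 task"], with no quotes around the brackets.
The same goes for Prob1 and Prob2: the brackets should not be enclosed in quotes. The fix belongs on the Android side, which most likely means putting a JSONArray built from each list into the JSONObject instead of the list's string form.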
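To make the difference concrete, here is a minimal sketch that parses both shapes with the standard json module (simplejson behaves the same way for this case). The two payloads are shortened versions of the dump in the question; the variable names are only for illustration.

```python
import json

# Payload as the question shows it: the list values arrive wrapped in quotes.
received = r'{"Name":"aaa","action":1,"Prob1":"[1, 2, 3]","Tasks":"[\"1 task\", \"2 task\", \"3 task\"]"}'
# Payload with real JSON arrays, which is what the sender should produce.
expected = r'{"Name":"aaa","action":1,"Prob1":[1, 2, 3],"Tasks":["1 task", "2 task", "3 task"]}'

bad = json.loads(received)
good = json.loads(expected)

print(type(bad["Tasks"]).__name__)   # str: iterating yields single characters
print(type(good["Tasks"]).__name__)  # list: iterating yields the three tasks

# Stopgap on the Python side: decode the inner string a second time.
# This only works when the inner string is itself valid JSON.
tasks = json.loads(bad["Tasks"])
probs = json.loads(bad["Prob1"])
```

The second decode is tidier than chained replace() calls, but it still only papers over the crack; sending real arrays removes the need for it.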

Related

JSON Parsing with python from Rethink database [Python]

I'm trying to retrieve data from a RethinkDB database. It outputs JSON when called with r.db("Databasename").table("tablename").insert([{ "id or primary key": line}]).run(); the output is [{'id': 'ValueInRowOfid\n'}], and I want to parse that down to just the value, e.g. "ValueInRowOfid". I've tried the json module in Python, but I always end up with TypeError: list indices must be integers or slices, not str, and I've been told that this is because the database outputs invalid JSON. My question is: how can the JSON be invalid (I can't see what is wrong with the output), and what would be the best way to parse it so that the value "ValueInRowOfid" ends up in a variable, e.g. Value = ("ValueInRowOfid")?
This part imports the modules used and connects to RethinkDB:
import json
from rethinkdb import RethinkDB
r = RethinkDB()
r.connect( "localhost", 28015).repl()
This part is getting the output/value and my trial at parsing it:
getvalue = r.db("Databasename").table("tablename").sample(1).run() # gets a single row/value from the table
print(getvalue) # If I print that, it will show as [{'id': 'ValueInRowOfid\n'}]
dumper = json.dumps(getvalue) # I cant use `json.loads(dumper)` as JSON object must be str. Which the output of the database isnt (The output is a list)
parsevalue = json.loads(dumper) # After `json.dumps(getvalue)` I can now load it, but I cant use the loaded JSON.
print(parsevalue["id"]) # When doing this it now says that the list is a str and it needs to be an integers or slices. Quite frustrating for me as it is opposing it self eg. It first wants str and now it cant use str
print(parsevalue{'id'}) # I also tried to shuffle it around as seen here, but still the same result
I know this is janky, and I might be missing something very basic. I don't know whether this is the simplest of problems or something that just isn't possible (which it should be, or else I can't use the data in my database).
Thank you for reading this through and not jumping straight to the comments to say that I have to read the JSON documentation; I have, and I haven't found a single piece that could help me.
I tried reading the documentation and watching tutorials about JSON and JSON parsing. I also looked for others who have had the same problem and couldn't find any.
It looks like it's returning a dictionary ({}) inside a list ([]) of one element.
Try:
getvalue = r.db("Databasename").table("tablename").sample(1).run()
print(getvalue[0]['id'])
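A small sketch of the same idea without a database connection, assuming the query result has exactly the shape printed in the question. The result is already a Python list of dicts rather than JSON text, so the json.dumps / json.loads round trip is not needed; the single quotes in the printout are just how Python displays strings.

```python
# Shape of the result shown in the question: a list holding one dict.
getvalue = [{'id': 'ValueInRowOfid\n'}]

row = getvalue[0]          # first (and only) element of the list
value = row['id'].strip()  # look up the key, then drop the trailing newline
print(value)
```

Indexing the list with 'id' directly is what raises "list indices must be integers or slices, not str"; the integer index has to come first.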

Scraping data from a http & javaScript site

I currently want to scrape some data from an amazon page and I'm kind of stuck.
For example, let's take this page.
https://www.amazon.com/NIKE-Hyperfre3sh-Athletic-Sneakers-Shoes/dp/B01KWIUHAM/ref=sr_1_1_sspa?ie=UTF8&qid=1546731934&sr=8-1-spons&keywords=nike+shoes&psc=1
I wanted to scrape every variant of shoe size and color. That data can be found opening the source code and searching for 'variationValues'.
There we can see sort of a dictionary containing all the sizes and colors and, below that, in 'asinToDimentionIndexMap', every product code with numbers indicating the variant from the variationValues 'dictionary'.
For example, in asinToDimentionIndexMap we can see
"B01KWIUH5M":[0,0]
Which means that the product code B01KWIUH5M is associated with the size '8M US' (position 0 in variationValues size_name section) and the color 'Teal' (same idea as before)
I want to scrape both the variationValues and the asinToDimentionIndexMap, so I can associate the IndexMap numbers with the variationValues ones.
Another person in the site (thanks for the help btw) suggested doing it this way.
import json
import re

script = response.xpath('//script/text()').extract_first()
# capture everything between {} (pattern first, then the string to search)
data = re.findall(r'(\{.+?\})', script)
d = json.loads(data[0])
d['products'][0]
I can sort of understand the first part. We get everything that's a 'script' as a string and then get everything between {}. The issue is what happens after that. My knowledge of json is not that great and reading some stuff about it didn't help that much.
Is there a way to get, from that data, 2 dictionaries or lists with the variationValues and asinToDimentionIndexMap (maybe using some regular expressions in the middle to get some data out of a big string)? Or could someone explain a little bit what happens in the json part?
Thanks for the help!
EDIT: Added photo of variationValues and asinToDimensionIndexMap
I think you are close Manuel!
The following code will turn your scraped source into easy-to-select boxes:
import json
d = json.loads(data[0])
JSON is a text format for storing structured data. json.loads parses a JSON string into Python objects (dictionaries and lists), regardless of which platform produced the string.
https://www.w3schools.com/js/js_json_intro.asp
I'm assuming that where you may be finding things a challenge is when errors come up while accessing a particular "box" inside your JSON object.
Your code format looks correct, but your access within "each box" may look different.
Eg. If your 'asinToDimentionIndexMap' object is nested within a smaller box in the larger 'products' object, then you might access it like this (after running the code above):
d['products'][0]['asinToDimentionIndexMap']
I've hacked and slashed a little bit so you can better understand the structure of your particular JSON file. Take a look at the link below. On the right-hand side, you will see "which boxes are within one another", which is precisely what you need to know to access what you need.
JSON Object Viewer
For example, the following would yield "companyCompliancePolicies_feature_div":
import json
d = json.loads(data[0])
d['updateDivLists']['full'][0]['divToUpdate']
The person helping you before outlined a general case for you, but you'll need to go in and look at the structure this way to truly find what you're looking for.
variationValues = re.findall(r'variationValues\" : ({.*?})', ' '.join(script))[0]
asinVariationValues = re.findall(r'asinVariationValues\" : ({.*?}})', ' '.join(script))[0]
dimensionValuesData = re.findall(r'dimensionValuesData\" : (\[.*\])', ' '.join(script))[0]
asinToDimensionIndexMap = re.findall(r'asinToDimensionIndexMap\" : ({.*})', ' '.join(script))[0]
dimensionValuesDisplayData = re.findall(r'dimensionValuesDisplayData\" : ({.*})', ' '.join(script))[0]
Now you can easily convert them with json.loads and combine them as you wish.
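As a rough sketch of that last step, the snippet below parses two of the extracted strings and joins them. The script text is a made-up, heavily shortened stand-in for the real page source, the ASIN "B000000000" is hypothetical, and the assumption that the index positions follow the key order of variationValues is mine, so check it against the actual page.

```python
import json
import re

# Hypothetical stand-in for the page's inline script text.
script = '''
"variationValues" : {"size_name":["8M US","9M US"],"color_name":["Teal","Black"]},
"asinToDimensionIndexMap" : {"B01KWIUH5M":[0,0],"B000000000":[1,1]},
'''

# Non-greedy match is enough here because neither value contains nested braces.
variation_raw = re.findall(r'"variationValues"\s*:\s*(\{.*?\})', script)[0]
index_raw = re.findall(r'"asinToDimensionIndexMap"\s*:\s*(\{.*?\})', script)[0]

variation_values = json.loads(variation_raw)  # dict: dimension name -> list of labels
index_map = json.loads(index_raw)             # dict: ASIN -> list of positions

# Assumed: the n-th index of each ASIN refers to the n-th key of variationValues.
dimensions = list(variation_values)
variants = {
    asin: {dim: variation_values[dim][i] for dim, i in zip(dimensions, indexes)}
    for asin, indexes in index_map.items()
}
print(variants["B01KWIUH5M"])
```

Each ASIN then maps to a small dict of readable labels instead of bare index numbers.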

Python Dictionary JSON Key, Val Extraction

I'm having a hard time understanding what is going on with this walmart API and I can't seem to iterate through key, values like I wish. I get different errors depending on the way I attack the problem.
import requests
import json
import urllib
response=requests.get("https://grocery.walmart.com/v0.1/api/stores/4104/departments/1256653758154/aisles/1256653758260/products?count=60&start=0")
info = json.loads(response.text)
print(info)
I'm not sure if I'm playing with a dictionary or a JSON object.
I'm thrown off because the API itself has no quotes over key/val.
When I do a json.loads it comes in but only comes in with single quotes.
I've tried going at it with for-loops but can only traverse the top layer and nothing else. My overall goal is to retrieve the info from the API link, turn it into JSON and be able to grab which ever key/val I need from it.
"I'm not sure if I'm playing with a dictionary or a JSON object."
Python has no concept of a "JSON object". json.loads returns a dictionary.
"I'm thrown off because the API itself has no quotes over key/val."
Yes, it does:
{"aisleName":"Organic Dairy, Eggs & Meat","productCount":17,"products":[{"data":
"When I do a json.loads it comes in but only comes in with single quotes."
Because it's a Python dictionary, and the repr() of a dict uses single quotes.
Try print(info['aisleName']), for example.
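For getting past the top layer, one possible approach is a small recursive walk over the parsed result. The response body below is a hypothetical, shortened example: only the key names aisleName, productCount, products and data come from the excerpt above, and the "name" field inside data is invented for illustration.

```python
import json

# Hypothetical, shortened response body.
text = '{"aisleName":"Organic Dairy, Eggs & Meat","productCount":1,"products":[{"data":{"name":"Milk"}}]}'
info = json.loads(text)  # a plain Python dict from here on


def walk(node, path=""):
    """Yield (path, value) for every leaf in nested dicts and lists."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from walk(value, f"{path}/{key}")
    elif isinstance(node, list):
        for i, value in enumerate(node):
            yield from walk(value, f"{path}/{i}")
    else:
        yield path, node


leaves = dict(walk(info))
for path, value in leaves.items():
    print(path, "=", value)
```

Once the structure is known, direct access such as info['products'][0]['data'] is simpler than walking everything.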

Python Requests POST to form records incorrect payload (checkboxes)

I am having quite a bit of trouble with getting the correct form data saved to a server via POST with Requests (2.8.1) module.
I have previous code which does exactly what I want it to do: it encodes a bunch of key:value pairs into the correct header:value payload dict format, and successfully POSTS to the URI. I get a 200 response (what I'm looking for) and everything is great.
This is a section of the OLD payload encoding function, with a ton of key:value pairs omitted for brevity.
Note: the checkbox value set could be any sequence of numbers between 1 and 25, I just wrote it as
item in range(1,5)
to illustrate that the list is comprised of int numbers, i.e. [ "", 1, 2, 3, 4, 5,...] or [ "", 2, 7, 5, 1, 25,...] etc.
checkboxList = ["",]
for item in range(1,5):
    checkboxList.append(item)
payload['checkbox[ids][]'] = checkboxList
...
response = requests.post(data_url, data=payload)
>> 200 OK!
Here is a print of what the payload dict (checkboxes) looks like before it's sent to the server:
{... "checkbox[ids][]" : [ "", 2, 17, 20, 5], ...}
And when I look on the page with a browser, all the payload information has been correctly recorded (omitted above) AND the checkboxes (shown above) are correct!
Originally, the checkbox values came from an excel file, as did the rest of the information that was put into the payload before being POSTed to the server. However, now I'm retrieving the information from an SQLite db.
Below is the NEW code that records the checkboxes incorrectly. I should note: I do not have access to the server, so I cannot easily tell if it's a server issue, but let's assume it's not the server's fault. I've had this issue previously, but I got it to work with the above code. However, now that I've started to store the values I need in a db, I cannot get the correct checkboxes recorded by the server.
This is what the data from the db column looks like:
12-5-1-22-4
(... I know this isn't great practice for DB mgmt, but I assume this isn't why the POST is recording the wrong data, and I wanted this question to be as closely representational to my code as possible.)
checkList = checkboxesFromDB.split('-')
payload['checkbox[ids][]'] = checkList
...
response = requests.post(data_url, data=payload)
>> 200 OK!
When I look at the site with the browser, it records the checkboxes incorrectly. Now, I should note that 3 checkboxes are selected no matter what I pass to payload['checkbox[ids][]'].
It's ALWAYS the same 3 incorrect checkboxes, even if I completely omit checkbox[ids][] from the payload dict. Knowing that, we could assume it's a server issue. However, nearly the EXACT code from above works (when I grab the info from an Excel file).
I've tried the following (with only one value as a test) without getting the correct checkboxes recorded by the server:
payload['checkbox[ids][]'] = '1'
payload['checkbox[ids][]'] = 1
payload['checkbox[ids][]'] = [1]
payload['checkbox[ids][]'] = ["",1]
payload['checkbox[ids][]'] = [1,""]
When uploading images to the same server, I had an encoding issue when retrieving the image BLOB from the db and trying to pass the buffer object directly to Requests as a file, but I fixed this with cStringIO encoding. (It took me forever as I'm really new to programming, and still unsure of syntax, let alone ways to handle this sort of stuff....) I thought I might be having a similar encoding issue, but with the testing and research I've done, I cannot determine either way as I feel like I'm a bit over my head.
I apologize if this is completely NOOB, but I've done extensive research, trying so many different things that I could think of. I tried passing strings, lists, dicts, forcing encoding of lists as utf-8.
The main reason I'm so perplexed is my original code WORKS, and my new code is nearly identical but doesn't. The only real difference I can think of is now my information is coming from a SQLite db (this particular checkbox column is TEXT type)
Can anyone help me, or point me in a new direction I haven't thought of/know of?
I went through all payload pairs to find that it was an issue with HTML.
I was saving HTML in my SQLite db (via BeautifulSoup without prettifying it) as TEXT. Then I was retrieving it and sending it as a string. This was throwing off the server response.
I have since swapped that SQL column type to VARCHAR (as is best for my use) and prettify the HTML like this, foo = bar.prettify(formatter="html"), before saving it to the db. Now, when I retrieve the value and pass it to the payload, everything works as it should.
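One way to rule out the checkbox values themselves when going through the payload pairs is to compare how the two lists are form-encoded. The sketch below uses urllib.parse.urlencode from the standard library as a stand-in; that Requests encodes list values in the same repeated-key style is my assumption, and the leading empty string is added to the DB list here only so the two lists are comparable.

```python
from urllib.parse import urlencode

# Ints, as built by the old Excel-based code.
from_excel = {"checkbox[ids][]": ["", 2, 17, 20, 5]}
# Strings, as produced by splitting the value stored in the DB column.
from_db = {"checkbox[ids][]": [""] + "2-17-20-5".split("-")}

# doseq=True repeats the key once per list element.
encoded_excel = urlencode(from_excel, doseq=True)
encoded_db = urlencode(from_db, doseq=True)

print(encoded_excel)
print(encoded_excel == encoded_db)
```

If the two encodings match, the int-versus-string difference is unlikely to be the cause, which points the search at the other fields in the payload.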

Representation of python dictionaries with unicode in database queries

I have a problem that I would like to know how to efficiently tackle.
I have data that is JSON-formatted (used with dumps / loads) and contains unicode.
This is part of a protocol implemented with JSON to send messages. So messages will be sent as strings and then loaded into python dictionaries. This means that the representation, as a python dictionary, afterwards will look something like:
{u"mykey": u"myVal"}
It is no problem in itself for the system to handle such structures, but the problem arises when I make a database query to store this structure.
I'm using pyOrient towards OrientDB. The command ends up something like:
"CREATE VERTEX TestVertex SET data = {u'mykey': u'myVal'}"
Which will end up in the data field getting the following values in OrientDB:
{'_NOT_PARSED_': '_NOT_PARSED_'}
I'm assuming this problem relates to other cases as well when you wish to make a query or somehow represent a data object containing unicode.
How could I efficiently get a representation of this data, of arbitrary depth, to be able to use it in a query?
To clarify even more, this is the string the db expects:
"CREATE VERTEX TestVertex SET data = {'mykey': 'myVal'}"
If I'm simply stating the wrong problem/question and should handle it some other way, I'm very much open to suggestions. But what I want to achieve is to have an efficient way to use python2.7 to build a db-query towards orientdb (using pyorient) that specifies an arbitrary data structure. The data property being set is of the OrientDB type EMBEDDEDMAP.
Any help greatly appreciated.
EDIT1:
More explicitly stating that the first code block shows the object as a dict AFTER being dumped / loaded with json to avoid confusion.
Dargolith:
OK, based on your last response it seems you are simply looking for code that will dump a Python expression in a way that lets you control how unicode and other data types print. Here is a very simple function that provides this control. There are ways to make this function more efficient (for example, by using a string buffer rather than doing all of the recursive string concatenation happening here). Still, this is a very simple function, and as it stands its execution time is probably still dominated by your DB lookup.
As you can see in each of the 'if' statements, you have full control of how each data type prints.
def expr_to_str(thing):
    if hasattr(thing, 'keys'):
        pairs = ['%s:%s' % (expr_to_str(k),expr_to_str(v)) for k,v in thing.iteritems()]
        return '{%s}' % ', '.join(pairs)
    if hasattr(thing, '__setslice__'):
        parts = [expr_to_str(ele) for ele in thing]
        return '[%s]' % (', '.join(parts),)
    if isinstance(thing, basestring):
        return "'%s'" % (str(thing),)
    return str(thing)

print "dumped: %s" % expr_to_str({'one': 33, 'two': [u'unicode', 'just a str', 44.44, {'hash': 'here'}]})
outputs:
dumped: {'two':['unicode', 'just a str', 44.44, {'hash':'here'}], 'one':33}
I went on to use json.dumps() as sobolevn suggested in the comment. I didn't think of that one at first since I wasn't really using json in the driver. It turned out however that json.dumps() provided exactly the formats I needed on all the data types I use. Some examples:
>>> json.dumps('test')
'"test"'
>>> json.dumps(['test1', 'test2'])
'["test1", "test2"]'
>>> json.dumps([u'test1', u'test2'])
'["test1", "test2"]'
>>> json.dumps({u'key1': u'val1', u'key2': [u'val21', 'val22', 1]})
'{"key2": ["val21", "val22", 1], "key1": "val1"}'
If you need to take more control of the format, quotes or other things regarding this conversion, see the reply by Dan Oblinger.
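A minimal sketch of how the query string could then be assembled, written for Python 3 (json.dumps behaves the same way for this data in Python 2.7). The "nested" key is a made-up addition to show arbitrary depth, and whether the resulting double-quoted form is accepted in every case by OrientDB is taken from the answer above rather than verified here.

```python
import json

# Dict as it looks after json.loads; "nested" is a hypothetical extra key.
data = {"mykey": "myVal", "nested": {"items": ["a", 1]}}

# json.dumps drops the u'' prefixes and serialises nested structures recursively.
command = "CREATE VERTEX TestVertex SET data = " + json.dumps(data)
print(command)
```

The command string can then be passed to the driver in the same way as the hand-built one in the question.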
