SimpleJson handling of same named entities - python

I'm using the Alchemy API in App Engine, so I'm using the simplejson library to parse responses. The problem is that the responses have entries with the same name:
{
"status": "OK",
"usage": "By accessing AlchemyAPI or using information generated by AlchemyAPI, you are agreeing to be bound by the AlchemyAPI Terms of Use: http://www.alchemyapi.com/company/terms.html",
"url": "",
"language": "english",
"entities": [
{
"type": "Person",
"relevance": "0.33",
"count": "1",
"text": "Michael Jordan",
"disambiguated": {
"name": "Michael Jordan",
"subType": "Athlete",
"subType": "AwardWinner",
"subType": "BasketballPlayer",
"subType": "HallOfFameInductee",
"subType": "OlympicAthlete",
"subType": "SportsLeagueAwardWinner",
"subType": "FilmActor",
"subType": "TVActor",
"dbpedia": "http://dbpedia.org/resource/Michael_Jordan",
"freebase": "http://rdf.freebase.com/ns/guid.9202a8c04000641f8000000000029161",
"umbel": "http://umbel.org/umbel/ne/wikipedia/Michael_Jordan",
"opencyc": "http://sw.opencyc.org/concept/Mx4rvViVq5wpEbGdrcN5Y29ycA",
"yago": "http://mpii.de/yago/resource/Michael_Jordan"
}
}
]
}
So the problem is that "subType" is repeated, so the dict that loads() returns has just "TVActor" rather than a list. Is there any way to get around this?

RFC 4627, which defines application/json, says:
An object is an unordered collection of zero or more name/value pairs
And:
The names within an object SHOULD be unique.
It means that AlchemyAPI should not return multiple "subType" names inside the same object and claim that it is JSON.
You could try requesting the same data in XML format (outputMode=xml) to avoid ambiguity in the results, or convert duplicate keys' values into lists:
import simplejson as json
from collections import defaultdict

def multidict(ordered_pairs):
    """Convert duplicate keys' values to lists."""
    # read all values into lists
    d = defaultdict(list)
    for k, v in ordered_pairs:
        d[k].append(v)
    # unpack lists that have only 1 item
    for k, v in d.items():
        if len(v) == 1:
            d[k] = v[0]
    return dict(d)

print(json.JSONDecoder(object_pairs_hook=multidict).decode(text))
Example
text = """{
"type": "Person",
"subType": "Athlete",
"subType": "AwardWinner"
}"""
Output
{u'subType': [u'Athlete', u'AwardWinner'], u'type': u'Person'}

RFC 4627, which defines the application/json media type, recommends unique keys, but it doesn't explicitly forbid duplicates:
The names within an object SHOULD be unique.
From RFC 2119:
SHOULD This word, or the adjective "RECOMMENDED", mean that there
may exist valid reasons in particular circumstances to ignore a
particular item, but the full implications must be understood and
carefully weighed before choosing a different course.
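In practice, as the question observes, a decoder that accepts duplicate names simply keeps the last value. A minimal sketch with the standard-library json module (simplejson takes the same object_pairs_hook argument) that shows this and, as an alternative to the answers here, fails loudly instead; reject_duplicates is a hypothetical helper name:

```python
import json

def reject_duplicates(ordered_pairs):
    """object_pairs_hook that raises instead of silently keeping the last value."""
    result = {}
    for key, value in ordered_pairs:
        if key in result:
            raise ValueError("duplicate key in JSON object: %r" % key)
        result[key] = value
    return result

text = '{"type": "Person", "subType": "Athlete", "subType": "TVActor"}'

# Default decoding: the later "subType" overwrites the earlier one.
print(json.loads(text))  # {'type': 'Person', 'subType': 'TVActor'}

# Strict decoding: the duplicate is reported as an error.
try:
    json.loads(text, object_pairs_hook=reject_duplicates)
except ValueError as exc:
    print(exc)  # duplicate key in JSON object: 'subType'
```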
This is a known problem.
You can solve this by renaming the duplicate key, or by keeping the key/value pairs in a list.
You can use this code if you want:
import json

def parse_object_pairs(pairs):
    """
    Take a list of (key, value) tuples and check for duplicate keys.
    If any key is duplicated, return the list of pairs unchanged;
    otherwise return a dict built from the pairs.

    >>> parse_object_pairs([("color", "red"), ("size", 3)])
    {'color': 'red', 'size': 3}
    >>> parse_object_pairs([("color", "red"), ("size", 3), ("color", "blue")])
    [('color', 'red'), ('size', 3), ('color', 'blue')]

    :param pairs: list of tuples.
    :return: dict or list that contains the pairs.
    """
    dict_without_duplicate = dict()
    for k, v in pairs:
        if k in dict_without_duplicate:
            return pairs
        dict_without_duplicate[k] = v
    return dict_without_duplicate

decoder = json.JSONDecoder(object_pairs_hook=parse_object_pairs)
str_json_can_be_with_duplicate_keys = '{"color": "red", "size": 3, "color": "red"}'
data_after_decode = decoder.decode(str_json_can_be_with_duplicate_keys)
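The answer also mentions renaming the duplicate key, but the code above only returns the raw pairs. A minimal sketch of the renaming approach, assuming a numeric suffix such as "color_2" is acceptable for your data (rename_duplicates and the suffix scheme are hypothetical):

```python
import json

def rename_duplicates(pairs):
    """object_pairs_hook that keeps every value by suffixing repeated keys."""
    result = {}
    seen = {}  # key -> how many times it has appeared so far
    for key, value in pairs:
        count = seen.get(key, 0) + 1
        seen[key] = count
        new_key = key if count == 1 else "%s_%d" % (key, count)
        result[new_key] = value
    return result

decoder = json.JSONDecoder(object_pairs_hook=rename_duplicates)
print(decoder.decode('{"color": "red", "size": 3, "color": "blue"}'))
# {'color': 'red', 'size': 3, 'color_2': 'blue'}
```

This does not guard against an object that already contains a key like "color_2"; if that can happen, the multidict approach from the first answer is safer.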

Related

Constructing GraphQL call string from Python list of dictionaries

I am using the Python requests library to execute a GraphQL mutation. I need to pass requests a query parameter containing a string constructed from a Python list of dictionaries.
The Python list of dictionaries looks like:
my_list_of_dicts = [{"custom_module_id": "23", "answer": "some text 2", "user_id": "111"},
{"custom_module_id": "24", "answer": "a", "user_id": "111"}]
Now I need to convert this list of dictionaries into a string that looks like this:
my_list_of_dicts = [{custom_module_id: "23", answer: "some text 2", user_id: "111"},
{custom_module_id: "24", answer: "a", user_id: "111"}]
Basically, I need a string that looks like a Python list of dictionaries, except that the dictionary keys are not quoted. I did this and it works:
my_query_string = json.dumps(my_list_of_dicts).replace("\"custom_module_id\"", "custom_module_id")
my_query_string = my_query_string.replace("\"answer\"", "answer")
my_query_string = my_query_string.replace("\"user_id\"", "user_id")
But I was wondering whether there is a better way to achieve this. By "better" I mean some function call that prepares a JSON/dictionary in a format ready to be used in a GraphQL string.
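As a more general alternative to chaining replace() calls, a small recursive serializer can emit GraphQL input-object syntax directly. This is a sketch, not a library function: to_graphql_literal is a hypothetical name, and it assumes every dict key is already a valid GraphQL field name and that leaf values are strings, finite numbers, booleans or None. Passing the data as GraphQL variables avoids hand-built literals altogether.

```python
import json

def to_graphql_literal(value):
    """Serialize a Python value as a GraphQL input literal with unquoted keys."""
    if isinstance(value, dict):
        fields = ", ".join(
            "%s: %s" % (key, to_graphql_literal(val)) for key, val in value.items()
        )
        return "{%s}" % fields
    if isinstance(value, (list, tuple)):
        return "[%s]" % ", ".join(to_graphql_literal(item) for item in value)
    # Leaves: json.dumps gives quoted, escaped strings, bare numbers,
    # and true/false/null, which match GraphQL literal syntax for ordinary values.
    return json.dumps(value)

my_list_of_dicts = [{"custom_module_id": "23", "answer": "some text 2", "user_id": "111"},
                    {"custom_module_id": "24", "answer": "a", "user_id": "111"}]
print(to_graphql_literal(my_list_of_dicts))
# [{custom_module_id: "23", answer: "some text 2", user_id: "111"}, {custom_module_id: "24", answer: "a", user_id: "111"}]
```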
I think this may help you find your final answer.
Follow this article; the idea is to pass the data as GraphQL variables instead of building the query string by hand:
gq = """
mutation ReorderProducts($id: ID!, $moves: [MoveInput!]!) {
  collectionReorderProducts(id: $id, moves: $moves) {
    job {
      id
    }
    userErrors {
      field
      message
    }
  }
}
"""
resp = self.sy_graphql_client.execute(
    query=gq,
    variables={
        "id": before_collection_meta.coll_meta.id,
        "moves": [
            {"id": mtc.id, "newPosition": mtc.new_position}
            for mtc in move_to_commands
        ],
    },
)
reorder_job_id = resp["data"]["collectionReorderProducts"]["job"]["id"]
self.sy_graphql_client.wait_for_job(reorder_job_id)
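Applied to the question's data with plain requests, the same variables approach means sending the list as-is inside the request body and referencing it as $answers in the query. A minimal sketch that only builds the body; the mutation name saveAnswers, the input type AnswerInput and GRAPHQL_URL are hypothetical placeholders for your schema and endpoint, and the {"query": ..., "variables": ...} shape is the common GraphQL-over-HTTP convention:

```python
import json

# Hypothetical mutation and input type name; replace them with your schema's.
MUTATION = """
mutation SaveAnswers($answers: [AnswerInput!]!) {
  saveAnswers(answers: $answers) {
    ok
  }
}
"""

def build_graphql_payload(query, variables):
    """Return the JSON body conventionally POSTed to a GraphQL endpoint."""
    return {"query": query, "variables": variables}

my_list_of_dicts = [{"custom_module_id": "23", "answer": "some text 2", "user_id": "111"},
                    {"custom_module_id": "24", "answer": "a", "user_id": "111"}]

payload = build_graphql_payload(MUTATION, {"answers": my_list_of_dicts})
body = json.dumps(payload)  # quoted keys are fine here: this is JSON, not GraphQL
print(body)

# With requests you would typically send it as:
#     requests.post(GRAPHQL_URL, json=payload)
# where GRAPHQL_URL (hypothetical) is your server's endpoint.
```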

Dynamic var origination from JSON object

This is going to be a tricky one... well, it is for me, as I've been trying to nail it for a week with no success so far :(
Let's say I get a nested JSON response from an API hit as:
{"Parameters": {
"Name": {
"Unparsed": null,
"First": "John",
"Middle": "A",
"Last": "Smith",
"Suffix": "Jr"
},
"Address": {
"Unparsed": null,
"Line1": "123 Main St",
"Line2": "apt.2",
"City": "New York",
"State": "NY",
"Zip": "12345"
}
}
}
and I want to create variables dynamically from the keys and assign each one the key's value.
I know how to do it with name_first = data.get("Name").get("First"), but that makes me highly dependent on the JSON response structure, and it won't work if the structure changes (renamed, added or deleted keys, etc.).
So I am working on a Python script to do it, but so far I've had no luck getting it nailed.
thanks!
You might use locals().update to update the current variables. This snippet creates new variables such as Address_Line2, Name_Suffix, etc.:
from collections import deque
import json

st = deque()
st.append(([], json.loads(your_json)['Parameters']))
while len(st):
    prefix, item = st.pop()
    if isinstance(item, dict):
        for k, v in item.items():
            st.append((prefix + [k], v))
    else:
        print({'_'.join(prefix): item})
        locals().update({'_'.join(prefix): item})
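Updating locals() only reliably creates variables at module level, where locals() is the same dict as globals(); inside a function, the Python documentation warns that changes to it may not affect the actual local variables. A sketch that builds the same Name_First-style names but keeps them in a plain dict instead (flatten is a hypothetical helper name, and the sample is a shortened copy of the question's JSON):

```python
import json

def flatten(obj, prefix=""):
    """Flatten nested dicts into one dict with underscore-joined keys."""
    flat = {}
    for key, value in obj.items():
        name = "%s_%s" % (prefix, key) if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))  # recurse into nested objects
        else:
            flat[name] = value
    return flat

raw = '''{"Parameters": {
    "Name": {"Unparsed": null, "First": "John", "Last": "Smith"},
    "Address": {"City": "New York", "Zip": "12345"}}}'''
values = flatten(json.loads(raw)["Parameters"])
print(values["Name_First"], values["Address_City"])  # John New York
```

Looking names up in values keeps working when keys are renamed, added or removed, and you can check for a key with `"Name_Middle" in values` instead of risking a NameError.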

Adding key to values in json using Python

This is the structure of my JSON:
"docs": [
{
"key": [
null,
null,
"some_name",
"12345567",
"test_name"
],
"value": {
"lat": "29.538208354844658",
"long": "71.98762580927113"
}
},
I want to add the keys to the key list. This is what I want the output to look like:
"docs": [
{
"key": [
"key1":null,
"key2":null,
"key3":"some_name",
"key4":"12345567",
"key5":"test_name"
],
"value": {
"lat": "29.538208354844658",
"long": "71.98762580927113"
}
},
What's a good way to do it? I tried this, but it doesn't work:
for item in data['docs']:
    item['test'] = data['docs'][3]['key'][0]
UPDATE 1
Based on the answer below, I have tweaked the code to this:
for number, item in enumerate(data['docs']):
    # pprint (item)
    # print item['key'][4]
    newdict["key1"] = item['key'][0]
    newdict["yek1"] = item['key'][1]
    newdict["key2"] = item['key'][2]
    newdict["yek2"] = item['key'][3]
    newdict["key3"] = item['key'][4]
    newdict["latitude"] = item['value']['lat']
    newdict["longitude"] = item['value']['long']
This creates the JSON I am looking for (and I can eliminate the list I had previously). How does one make this JSON persist outside the for loop? Otherwise, only the values from the last item are left after the loop.
In your first block, key is a list, but in your second block it's a dict. You need to completely replace the key item.
for doc in data['docs']:
    newdict = {}
    for number, item in enumerate(doc['key']):
        newdict['key%d' % (number + 1)] = item
    doc['key'] = newdict
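For the follow-up in UPDATE 1 (only the last item survives the loop), the usual cause is filling one shared newdict on every iteration, so each document overwrites the previous one. A sketch that creates a fresh dict per document and collects them in a list; the key1..key5 naming follows the question, and the sample data is the single document shown there:

```python
data = {
    "docs": [
        {"key": [None, None, "some_name", "12345567", "test_name"],
         "value": {"lat": "29.538208354844658", "long": "71.98762580927113"}},
    ]
}

records = []
for doc in data["docs"]:
    # A new dict for each document, so earlier results are not overwritten.
    record = {"key%d" % (i + 1): item for i, item in enumerate(doc["key"])}
    record["latitude"] = doc["value"]["lat"]
    record["longitude"] = doc["value"]["long"]
    records.append(record)

print(records)
```

After the loop, records holds one flattened dict per document and can be written out with json.dump.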

how to parse json where key is variable in python?

I am parsing a log file in JSON format
that contains data in the form of key: value pairs.
I am stuck where the key itself is variable; please look at the attached code.
In this code I am able to access keys like username, event_type, ip, etc.
My problem is accessing the values inside the "submission" key, where
i4x-IITB-CS101-problem-33e4aac93dc84f368c93b1d08fa984fc_2_1 is a variable key that changes for different users.
How can I access it as a variable?
{
"username": "batista",
"event_type": "problem_check",
"ip": "127.0.0.1",
"event": {
"submission": {
"i4x-IITB-CS101-problem-33e4aac93dc84f368c93b1d08fa984fc_2_1": {
"input_type": "choicegroup",
"question": "",
"response_type": "multiplechoiceresponse",
"answer": "MenuInflater.inflate()",
"variant": "",
"correct": true
}
},
"success": "correct",
"grade": 1,
"correct_map": {
"i4x-IITB-CS101-problem-33e4aac93dc84f368c93b1d08fa984fc_2_1": {
"hint": "",
"hintmode": null,
"correctness": "correct",
"npoints": null,
"msg": "",
"queuestate": null
}
}
}
}
This is my code for solving it:
import json
import pprint

with open("log.log") as infile:
    # Loop until we have parsed all the lines.
    for line in infile:
        # Read lines until we find a complete object
        while True:
            try:
                json_data = json.loads(line)
                username = json_data['username']
                print("username :- " + username)
                break  # complete object parsed; move on to the next line
            except ValueError:
                line += next(infile)
How can I access the i4x-IITB-CS101-problem-33e4aac93dc84f368c93b1d08fa984fc_2_1 key and
the data inside it?
You don't need to know the key in advance; you can simply iterate over the dictionary:
for k, v in obj['event']['submission'].items():
    print(k, v)
Suppose you have a dictionary d = {"a": "b"}; then d.popitem() gives you the tuple ("a", "b"), i.e. (key, value). So you can access a key-value pair without knowing the key. Note that popitem() also removes that pair from the dictionary.
In your case, if j is the main dictionary, then j["event"]["submission"].popitem() would give you the tuple
("i4x-IITB-CS101-problem-33e4aac93dc84f368c93b1d08fa984fc_2_1", {
    "input_type": "choicegroup",
    "question": "",
    "response_type": "multiplechoiceresponse",
    "answer": "MenuInflater.inflate()",
    "variant": "",
    "correct": True
})
Hope this is what you were asking.
Using Python's json module, you'll end up with a dictionary of parsed values from the JSON data above:
import json
parsed = json.loads(this_sample_data_in_question)
# parsed is a dictionary, and so are the "correct_map" and "submission" values inside the "event" key
So you can iterate over the keys and values of the data as with any dictionary, like this:
for k, v in parsed.items():
    print(k, v)
Now you can quickly find the (possibly different) values of the "i4x-IITB-CS101-problem-33e4aac93dc84f368c93b1d08fa984fc_2_1" key like this:
import json
parsed = json.loads(the_data_in_question_as_string)
event = parsed['event']
for key, val in event.items():
    if key in ('correct_map', 'submission'):
        section = event[key]
        for possible_variable_key, its_value in section.items():
            print(possible_variable_key, its_value)
Of course there might be a better way of iterating over the dictionary; you can choose one based on your coding taste, or on performance if your data is much larger than the sample posted here.
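Putting the iteration approach together for one parsed log record, here is a sketch that pairs each variable submission key with its entry in correct_map, which uses the same key in the sample. submissions is a hypothetical helper name, and the record is trimmed from the question:

```python
import json

def submissions(record):
    """Yield (problem_id, answer, correctness) for each submitted problem."""
    event = record.get("event", {})
    correct_map = event.get("correct_map", {})
    for problem_id, details in event.get("submission", {}).items():
        # correct_map uses the same variable key, so look the result up there.
        correctness = correct_map.get(problem_id, {}).get("correctness")
        yield problem_id, details.get("answer"), correctness

# One log line, trimmed from the sample in the question.
line = '''{"username": "batista", "event": {
  "submission": {"i4x-IITB-CS101-problem-33e4aac93dc84f368c93b1d08fa984fc_2_1":
                 {"answer": "MenuInflater.inflate()", "correct": true}},
  "correct_map": {"i4x-IITB-CS101-problem-33e4aac93dc84f368c93b1d08fa984fc_2_1":
                  {"correctness": "correct"}}}}'''

record = json.loads(line)
for problem_id, answer, correctness in submissions(record):
    print(problem_id, answer, correctness)
```

Using .get() with empty-dict defaults means log lines without an "event" or "submission" section simply yield nothing instead of raising KeyError.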

Create Dynamic Python Dictionary Reference Path

I'm having trouble dynamically creating a Python dictionary path to loop through and validate a value. Here's what I'd like to do:
Make API call using Requests 1.0 and store the JSON response in a dict.
response = requests.get(path/to/file.json).json()
The response object will be formatted as follows:
{
"status": "OK",
"items": [
{
"name": "Name 1",
"id": 0,
"address":{
"city": "New York"
}
},
{
"name": "Name 2",
"id": 1,
"address":{
"city": "New York"
}
},
{
"name": "Name 3",
"id": 2,
"address":{
"city": "New York"
}
}]
}
Send the response dict, field and value to a function for validation. The function would take the response object, append the field to it to define the path, and then validate it against the value. So in theory it would be:
response[field] == value
The code that I wrote to do this was:
def dynamic_assertion(response, field, value):
    i = 0
    stations = "response['items']"
    count = len(response['items'])
    while i < count:
        path = '%s[%s]%s' % (stations, i, field)
        path = path.strip("")
        if path != value:
            print(type(path))
            return False
        i += 1
    return True

dynamic_assertion(response, "['address']['city']", "New York")
I realize that once I create the path string it is no longer an object. How do I create this in a way that will allow me to keep the response object and append the reference path to traverse through? Is this even possible?!
I think you'd be better off avoiding a single path string in favor of a tuple or list of strings which represent the individual keys in the nested dictionaries. That is, rather than "['address']['city']" being your field argument, you'd pass ("address", "city"). Then you just need a loop to go through the keys and see if the final value is the correct one:
def dynamic_assertion(response, field, value):
    for item in response["items"]:
        for key in field:
            item = item[key]  # go deeper into the nested dictionary
        if item != value:
            return False  # raising an exception might be more Pythonic
    return True
Example output (given the response dict from the question):
>>> dynamic_assertion(response, ("address", "city"), "New York")
True
>>> dynamic_assertion(response, ("address", "city"), "Boston")
False
>>> response["items"][2]["address"]["city"] = "Boston" # make response invalid
>>> dynamic_assertion(response, ("address", "city"), "New York")
False
>>> dynamic_assertion(response, ("address", "city"), "Boston")
False
