Python issue with getting data out of json.loads

I have the following JSON data which I pass through json.loads:
{
"meta":{
"limit":20,
"next":null,
"offset":0,
"previous":null,
"total_count":2
},
"objects":[
{
"attributes":"{u'_code': u'[ON CODE]/[OFF CODE]', u'_type': u'pick actuator or sensor', u'code': u'AF126E/AF1266', u'type': u'actuator'}",
"id":1,
"module":"/api/v1/module/1/",
"moduleName":"rfm_ninjablock (ninjablock)",
"name":"HallwayLight"
},
{
"attributes":"{u'_code': u'[ON CODE]/[OFF CODE]', u'_type': u'pick actuator or sensor', u'code': u'0x53df5c', u'type': u'sensor'}",
"id":2,
"module":"/api/v1/module/1/",
"moduleName":"rfm_ninjablock (ninjablock)",
"name":"ToiletDoor"
}
]
}
I'm trying to get all the data out of it, but I'm having trouble referencing the data.
My code is as follows:
for object in r['objects']:
    for attributes in object.iteritems():
        print attributes
Which gives me:
(u'attributes', u"{u'_code': u'[ON CODE]/[OFF CODE]', u'_type': u'pick actuator or sensor', u'code': u'AF126E/AF1266', u'type': u'actuator'}")
(u'moduleName', u'rfm_ninjablock (ninjablock)')
(u'id', 1)
(u'module', u'/api/v1/module/1/')
(u'name', u'HallwayLight')
(u'attributes', u"{u'_code': u'[ON CODE]/[OFF CODE]', u'_type': u'pick actuator or sensor', u'code': u'0x53df5c', u'type': u'sensor'}")
(u'moduleName', u'rfm_ninjablock (ninjablock)')
(u'id', 2)
(u'module', u'/api/v1/module/1/')
(u'name', u'ToiletDoor')
I'm not really sure how to reference these, or if indeed I'm doing it right.
Attributes contains JSON already, as that is how it is stored in the database.

You have a problem with the way the original data was serialized. The attributes dict of each element was not serialized as a set of nested dicts, but as an outer dict containing Python string representations of the inner dicts.
You should post the code that did the original serialization.
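For illustration, here is a minimal sketch (field names shortened from the question's data; the exact serialization code is an assumption, since the original wasn't posted) of how this double-encoding typically happens, and how serializing the nested dict with json.dumps avoids it:

```python
import json

attributes = {"code": "AF126E/AF1266", "type": "actuator"}

# Broken: str() embeds a Python repr inside the JSON, not nested JSON
broken = json.dumps({"attributes": str(attributes)})

# Correct: let json.dumps serialize the nested dict itself
fixed = json.dumps({"attributes": attributes})

# Round-tripping the fixed version yields a real dict...
assert json.loads(fixed)["attributes"]["type"] == "actuator"
# ...while the broken version yields only a string
assert isinstance(json.loads(broken)["attributes"], str)
```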

Since the dict inside attributes ends up as a Python string representation, I used the following code to convert it back to a dict:
import ast

for object in r['objects']:
    attrib = ast.literal_eval(object['attributes'])
    print attrib['code']
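A self-contained version of the same repair, with the string inlined (this also runs on Python 3, where literal_eval still accepts the u'' prefixes):

```python
import ast

# A string holding a Python-repr dict, like the 'attributes' field above
raw = "{u'code': u'AF126E/AF1266', u'type': u'actuator'}"

attrib = ast.literal_eval(raw)
assert attrib['code'] == 'AF126E/AF1266'
```

Unlike eval(), ast.literal_eval() only evaluates literal structures (strings, numbers, tuples, lists, dicts, booleans, None), so it is safe to run on untrusted database contents.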

Related

Iterating through separately-JSON-encoded strings found inside a document

I am having trouble iterating through JSON that contains nested JSON strings (with escaped quotes) in itself.
(My apologies in advance, I am sort of new and probably missing some important info...)
Actually I have several questions:
1) How can I iterate (as I tried to do below with nested for loops) through the elements beneath "section-content" of the section "nodes" (not of the section "element-names")? My problem seems to be that section-content is a string with escaped quotes, which represents a separate JSON string in itself.
2) Is the JSON example provided even valid json? I tried several validators, which all seem to fail when the escaped quotes come into play.
3) Is there a smarter method of accessing specific elements, instead of just iterating through the whole tree?
I am thinking of something that specifies key/value pairs like:
my_json_obj['sections']['section-id' = 'nodes']['section-content']['occ_id' = '051MZjd97jUdYfSEOG}k10']
Code:
import json
import requests
import pprint
client = requests.session()
header = {'X-CSRF-Token': 'Fetch', 'Accept': 'application/json', 'Content-Type': 'application/json'}
response = client.get('http://xxxxxx.xxx/ProcessManagement/BranchContentSet(BranchId=\'051MZjd97jUdYfX7{dREAm\',SiteId=\'\',SystemRole=\'D\')/$value',auth=('TestUser', 'TestPass'),headers=header)
my_json_obj = response.json()
sections = my_json_obj['sections']
for mysection in sections:
    print(mysection['section-id'])
    if mysection['section-id'] == 'NODES':
        nodes = mysection['section-content']  # nodes seems to be a string
        for mynode in nodes:
            print(mynode)  # prints the string character by character
JSON example:
{
"smud-data-version": "0.1",
"sections": [
{
"section-id": "ELEMENT-NAMES",
"section-content-version": "",
"section-content": "{\"D\":[
{\"occ_id\":\"051MZjd97kcBgtZiEI0IvW\",\"lang\":\"E\",\"name\":\"0TD1 manuell\"},
{\"occ_id\":\"051MZjd97kcBgtZiEH}IvW\",\"lang\":\"E\",\"name\":\"Documentation\"}
]}"
},
{
"section-id": "NODES",
"section-content-version": "1.0",
"section-content": "[
{\"occ_id\":\"051MZjd97jUdYfSEOG}k10\",\"obj_type\":\"ROOT\",\"reference\":\"\",\"deleted\":\"\",\"attributes\":[]},
{\"occ_id\":\"051MZjd97jUdYfSEOH0k10\",\"obj_type\":\"ROOTGRP\",\"reference\":\"\",\"deleted\":\"\",\"attributes\":[]},
{\"occ_id\":\"051MZjd97jcAnKoe03JRRm\",\"obj_type\":\"SCN\",\"reference\":\"\",\"deleted\":\"\",\"attributes\":[
{\"attr_type\":\"NODE_CHANGED_AT\",\"lang\":\"\",\"values\":[\"20190213095843\"]},
{\"attr_type\":\"NODE_CHANGED_BY\",\"lang\":\"\",\"values\":[\"TestUser\"]},
{\"attr_type\":\"TCASSIGNMENTTYPE\",\"lang\":\"\",\"values\":[\"A\"]},
{\"attr_type\":\"DESCRIPTION\",\"lang\":\"E\",\"values\":[\"Scenario\"]}
]}
]"
}
]
}
Actual output:
ELEMENT-NAMES
NODES
[
{
"
o
c
c
_
i
d
"
Hopefully you can convince the folks who are generating this data to fix their server. That said, code to work around the issues might look like this:
# Instead of using response.json(), remove literal newlines and then decode ourselves,
# because the original data has newline literals in positions where they aren't allowed.
my_json_obj = json.loads(response.text.replace('\n', ''))
for section in my_json_obj['sections']:
    if section['section-id'] != 'NODES':
        continue
    # A second json.loads() here treats the content as an array, not a string
    for node in json.loads(section['section-content']):
        __import__('pprint').pprint(node)
This properly emits as output:
{u'attributes': [],
u'deleted': u'',
u'obj_type': u'ROOT',
u'occ_id': u'051MZjd97jUdYfSEOG}k10',
u'reference': u''}
{u'attributes': [],
u'deleted': u'',
u'obj_type': u'ROOTGRP',
u'occ_id': u'051MZjd97jUdYfSEOH0k10',
u'reference': u''}
{u'attributes': [{u'attr_type': u'NODE_CHANGED_AT',
u'lang': u'',
u'values': [u'20190213095843']},
{u'attr_type': u'NODE_CHANGED_BY',
u'lang': u'',
u'values': [u'TestUser']},
{u'attr_type': u'TCASSIGNMENTTYPE',
u'lang': u'',
u'values': [u'A']},
{u'attr_type': u'DESCRIPTION',
u'lang': u'E',
u'values': [u'Scenario']}],
u'deleted': u'',
u'obj_type': u'SCN',
u'occ_id': u'051MZjd97jcAnKoe03JRRm',
u'reference': u''}
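To see the two-level decode in isolation, here is a self-contained sketch with toy data standing in for the server response (the occ_id and field values are placeholders):

```python
import json

# Toy stand-in for the server response: 'section-content' is itself JSON text
doc = json.dumps({
    "sections": [
        {"section-id": "NODES",
         "section-content": json.dumps([{"occ_id": "A1", "obj_type": "ROOT"}])},
    ]
})

outer = json.loads(doc)  # first decode: the outer document
for section in outer["sections"]:
    if section["section-id"] == "NODES":
        # second decode: turn the embedded string back into a list of dicts
        nodes = json.loads(section["section-content"])

assert nodes[0]["obj_type"] == "ROOT"
```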

python 2.7 urllib2.Request KeyError

I'm successfully making a request and loading the result with json.load(response); I'm able to navigate and see the result, and it appears to be what I am expecting. However, I'm getting a KeyError as I try to access attributes. In this instance I need to set a local variable to the "SHORT_NAME" attribute.
{u'fieldAliases': {u'SHORT_NAME': u'SHORT_NAME', u'OBJECTID':
u'OBJECTID'}, u'fields': [{u'alias': u'OBJECTID', u'type':
u'esriFieldTypeOID', u'name': u'OBJECTID'}, {u'alias': u'SHORT_NAME',
u'length': 50, u'type': u'esriFieldTypeString', u'name':
u'SHORT_NAME'}], u'displayFieldName': u'LONG_NAME', u'features':
[{u'attributes': {u'SHORT_NAME': u'Jensen Beach to Jupiter Inlet',
u'OBJECTID': 17}}]}
My python code to access the above:
reqAp = urllib2.Request(queryURLAp, paramsAp)
responseAp = urllib2.urlopen(reqAp)
jsonResultAp = json.load(responseAp)  # all good here! the example above is what this contains

# trying to set a variable to the SHORT_NAME attribute
for featureAp in jsonResultAp['features']:
    aqp = feature['attributes']['SHORT_NAME']
    # this fails with: "KeyError: 'SHORT_NAME'"
It's obvious that "SHORT_NAME" is there so I'm not quite sure what I'm doing incorrectly.
Thanks for any feedback!
Your loop variable is featureAp, but the loop body indexes feature, a stale variable presumably left over from earlier code. Change:
aqp = feature['attributes']['SHORT_NAME']
To:
aqp = featureAp['attributes']['SHORT_NAME']
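A minimal reproduction of the fix, with a stub payload shaped like the response above (written for Python 3, so the u'' prefixes are dropped):

```python
# Stub payload shaped like the ArcGIS-style response in the question
jsonResultAp = {
    "features": [
        {"attributes": {"SHORT_NAME": "Jensen Beach to Jupiter Inlet",
                        "OBJECTID": 17}}
    ]
}

names = []
for featureAp in jsonResultAp["features"]:
    # Index the loop variable itself, not a stale name from earlier code
    names.append(featureAp["attributes"]["SHORT_NAME"])
```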

Storing Python dictionary data into a csv

I have a list of dicts that stores Facebook status data (Graph API):
len(test_statuses)
3
test_statuses
[{u'comments': {u'data': [{u'created_time': u'2016-01-27T10:47:30+0000',
u'from': {u'id': u'1755814687982070', u'name': u'Fadi Cool Panther'},
u'id': u'447173898813933_447182555479734',
u'message': u'Sidra Abrar'}],
u'paging': {u'cursors': {u'after': u'WTI5dGJXVnVkRjlqZFhKemIzSTZORFEzTVRneU5UVTFORGM1TnpNME9qRTBOVE00T1RFMk5UQT0=',
u'before': u'WTI5dGJXVnVkRjlqZFhKemIzSTZORFEzTVRneU5UVTFORGM1TnpNME9qRTBOVE00T1RFMk5UQT0='}},
u'summary': {u'can_comment': False,
u'order': u'ranked',
u'total_count': 1}},
u'created_time': u'2016-01-27T10:16:56+0000',
u'id': u'5842136044_10153381090881045',
u'likes': {u'data': [{u'id': u'729038357232696'},
{u'id': u'547422955417520'},
{u'id': u'422351987958296'},
{u'id': u'536057309903473'},
{u'id': u'206846772999449'},
{u'id': u'1671329739783719'},
{u'id': u'991398107599340'},
{u'id': u'208751836138231'},
{u'id': u'491047841097510'},
{u'id': u'664580270350825'}],
u'paging': {u'cursors': {u'after': u'NjY0NTgwMjcwMzUwODI1',
u'before': u'NzI5MDM4MzU3MjMyNjk2'},
u'next': u'https://graph.facebook.com/v2.5/5842136044_10153381090881045/likes?limit=10&summary=true&access_token=521971961312518|121ca7ef750debf4c51d1388cf25ead4&after=NjY0NTgwMjcwMzUwODI1'},
u'summary': {u'can_like': False, u'has_liked': False, u'total_count': 13}},
u'link': u'https://www.facebook.com/ukbhangrasongs/videos/447173898813933/',
u'message': u'Track : Ik Waar ( Official Music Video )\nSinger : Falak shabir ft DJ Shadow\nMusic by Dj Shadow\nFor more : UK Bhangra Songs',
u'shares': {u'count': 7},
u'type': u'video'},
{u'comments': {u'data': [],
u'summary': {u'can_comment': False,
u'order': u'chronological',
u'total_count': 0}},
u'created_time': u'2016-01-27T06:15:40+0000',
u'id': u'5842136044_10153380831261045',
u'likes': {u'data': [],
u'summary': {u'can_like': False, u'has_liked': False, u'total_count': 0}},
u'message': u'I want to work with you. tracks for flicks',
u'type': u'status'}]
I need to extract each status text and the text of each comment under the status, which I can do by appending them to separate lists e.g.,:
status_text = []
comment_text = []
for s in test_statuses:
try:
status_text.append(s['message'])
for c in s['comments']['data']:
comment_text.append(c['message'])
except:
continue
This gives me two lists of separate lengths len(status_text) = 2, len(comment_text) = 49.
Unfortunately that's a horrible way of dealing with the data, since I cannot keep track of which comment belongs to which status. Ideally I would like to store this as a tree structure and export it into a csv file, but I can't figure out how to do it.
Probable csv data structure:
Text is_comment
status1 0
status2 0
statusN 0
comment1 status1
comment2 status1
commentN statusN
Why do you need this to be in a CSV? It is already structured and ready to be persisted as JSON.
If you really need the tabular approach offered by CSV, then you have to either denormalize it, or use more than one CSV table with references from one to another (and again, the best approach would be to put the data in an SQL database which takes care of the relationships for you)
That said, the way to denormalize is simply to save the same status text to each row where a comment is - that is, record your CSV row in the innermost loop of your approach:
import csv

status_text = []
comment_text = []
writer = csv.writer(open("mycsv.csv", "wt"))
for s in test_statuses:
    status_text.append(s['message'])
    for c in s['comments']['data']:
        comment_text.append(c['message'])
        writer.writerow((s['message'], c['message']))
Note that you'd probably be better off writing the status id to each row, and creating a second table of status messages keyed by that id (and putting it all in a database instead of various CSV files). And then, again, you are probably better off simply keeping the JSON. If you need search capabilities, use a JSON-capable database such as MongoDB or PostgreSQL.
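As a sketch of that denormalization with the status id included, writing one row per (status, comment) pair plus a row for comment-less statuses (toy data with the same shape as the question's payload; an in-memory buffer stands in for the file):

```python
import csv
import io

# Toy stand-in for test_statuses; field names mirror the Graph API shape above
statuses = [
    {"id": "s1", "message": "first status",
     "comments": {"data": [{"message": "nice"}, {"message": "agreed"}]}},
    {"id": "s2", "message": "second status",
     "comments": {"data": []}},
]

buf = io.StringIO()  # in-memory file, so the sketch needs no disk access
writer = csv.writer(buf)
writer.writerow(("status_id", "status_message", "comment_message"))
for s in statuses:
    if not s["comments"]["data"]:
        # keep comment-less statuses in the table too
        writer.writerow((s["id"], s["message"], ""))
    for c in s["comments"]["data"]:
        writer.writerow((s["id"], s["message"], c["message"]))

rows = buf.getvalue().splitlines()
assert rows[1] == "s1,first status,nice"
```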

How do I merge and sort two json lists using key value

I can get a JSON list of groups and devices from an API, but the key values don't allow me to do a merge without manipulating the returned lists. Unfortunately, the group info and devices info have to be retrieved using separate http requests.
The code for getting the group info looks like this:
#Python Code
import requests
import simplejson as json
import datetime
import pprintpp
print datetime.datetime.now().time()
url = 'https://www.somecompany.com/api/v2/groups/?fields=id,name'
s = requests.Session()
## Ver2 API Authentication ##
headers = {
    'X-ABC-API-ID': 'nnnn-nnnn-nnnn-nnnn-nnnn',
    'X-ABC-API-KEY': 'nnnnnnnn',
    'X-DE-API-ID': 'nnnnnnnn',
    'X-DE-API-KEY': 'nnnnnnnn'
}
r = json.loads(s.get((url), headers=headers).text)
print "Working...Groups extracted"
groups = r["data"]
print "*** Ver2 API Groups Information ***"
pprintpp.pprint (groups)
The printed output of groups looks like this:
#Groups
[
{u'id': u'0001', u'name': u'GroupA'},
{u'id': u'0002', u'name': u'GroupB'},
]
The code for getting the devices info looks like this:
url = 'https://www.somecompany.com/api/v2/devices/?limit=500&fields=description,group,id,name'
r = json.loads(s.get((url), headers=headers).text)
print "Working...Devices extracted"
devices = r["data"]
print "*** Ver2 API Devices Information ***"
pprintpp.pprint (devices)
The devices output looks like this:
#Devices
[
{
u'description': u'GroupB 100 (City State)',
u'group': u'https://www.somecompany.com/api/v2/groups/0002/',
u'id': u'90001',
u'name': u'ABC550-3e9',
},
{
u'description': u'GroupA 101 (City State)',
u'group': u'https://www.somecompany.com/api/v2/groups/0001/',
u'id': u'90002',
u'name': u'ABC500-3e8',
}
]
What I would like to do is to be able to merge and sort the two JSON lists into an output that looks like this:
#Desired Output
#Separated List of GroupA & GroupB Devices
[
{u'id': u'0001', u'name': u'GroupA'},
{
u'description': u'GroupA 101 (City State)',
u'group': u'https://www.somecompany.com/api/v2/groups/0001/',
u'id': u'90002',
u'name': u'ABC500-3e8',
},
{u'id': u'0002', u'name': u'GroupB'},
{
u'description': u'GroupB 100 (City State)',
u'group': u'https://www.somecompany.com/api/v2/groups/0002/',
u'id': u'90001',
u'name': u'ABC550-3e9',
}
]
A couple of problems I am having are that the key names in the groups and devices output are not unique. The key named 'id' in groups is actually the same value as the last 4 digits of the key named 'group' in devices, and is the value I wish to use for the sort. Also, 'id' and 'name' in groups mean something different than 'id' and 'name' in devices. My extremely limited skill with Python is making this quite the challenge. Any help pointing me in the correct direction for a solution will be greatly appreciated.
This program produces your desired output:
import pprintpp
groups = [
{u'id': u'0001', u'name': u'GroupA'},
{u'id': u'0002', u'name': u'GroupB'},
]
devices = [
{
u'description': u'GroupB 100 (City State)',
u'group': u'https://www.somecompany.com/api/v2/groups/0002/',
u'id': u'90001',
u'name': u'ABC550-3e9',
},
{
u'description': u'GroupA 101 (City State)',
u'group': u'https://www.somecompany.com/api/v2/groups/0001/',
u'id': u'90002',
u'name': u'ABC500-3e8',
}
]
desired = sorted(
    groups + devices,
    key=lambda x: x.get('group', x.get('id') + '/')[-5:-1])
pprintpp.pprint(desired)
Or, if that lambda does not seem self-documenting:
def key(x):
    '''Sort on either the last few digits of x['group'], if that exists,
    or the entirety of x['id'], if x['group'] does not exist.
    '''
    if 'group' in x:
        return x['group'][-5:-1]
    return x['id']
desired = sorted(groups + devices, key=key)
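Because Python's sort is stable and the groups precede the devices in groups + devices, each group header lands directly before the devices that share its key. A condensed check on toy data (the URL here is a placeholder with the same trailing /NNNN/ shape as the real one):

```python
# Toy data shaped like the groups and devices lists above
groups = [
    {"id": "0002", "name": "GroupB"},
    {"id": "0001", "name": "GroupA"},
]
devices = [
    {"group": "https://example.com/api/v2/groups/0001/", "id": "90002"},
]

# Same key as the answer: the four characters before the trailing slash
merged = sorted(groups + devices,
                key=lambda x: x.get("group", x.get("id") + "/")[-5:-1])

# GroupA's header sorts ahead of its device; GroupB follows
assert [m.get("name", m["id"]) for m in merged] == ["GroupA", "90002", "GroupB"]
```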

Python Json Parse

I have made a mistake while storing JSON strings in a database. Accidentally, I did not store the string as JSON; I stored the string representation of the Python object instead.
I received
my_jstring['field']
and inserted it into the database as a string.
my_jstring['field'] is not JSON but the string representation of a Python object. Is it possible to parse this object back out of its string form?
My string is the following:
'"\'{u\'\'full_name\'\': u\'\'Dublin City\'\', u\'\'url\'\': u\'\'https://api.twitter.com/1.1/geo/id/7dde0febc9ef245b.json\'\', u\'\'country\'\': u\'\'Ireland\'\', u\'\'place_type\'\': u\'\'city\'\', u\'\'bounding_box\'\': {u\'\'type\'\': u\'\'Polygon\'\', u\'\'coordinates\'\': [[[-6.3873911, 53.2987449], [-6.3873911, 53.4110598], [-6.1078047, 53.4110598], [-6.1078047, 53.2987449]]]}, u\'\'contained_within\'\': [], u\'\'country_code\'\': u\'\'IE\'\', u\'\'attributes\'\': {}, u\'\'id\'\': u\'\'7dde0febc9ef245b\'\', u\'\'name\'\': u\'\'Dublin City\'\'}\'"'
Use ast.literal_eval() to parse Python literals back into a Python object.
You appear to have doubly quoted the value, however, adding in extra single quotes. These need to be repaired too:
data = ast.literal_eval(data)
data = data[1:-1].replace("''", "'")
obj = ast.literal_eval(data)
Demo:
>>> import ast
>>> data = '"\'{u\'\'full_name\'\': u\'\'Dublin City\'\', u\'\'url\'\': u\'\'https://api.twitter.com/1.1/geo/id/7dde0febc9ef245b.json\'\', u\'\'country\'\': u\'\'Ireland\'\', u\'\'place_type\'\': u\'\'city\'\', u\'\'bounding_box\'\': {u\'\'type\'\': u\'\'Polygon\'\', u\'\'coordinates\'\': [[[-6.3873911, 53.2987449], [-6.3873911, 53.4110598], [-6.1078047, 53.4110598], [-6.1078047, 53.2987449]]]}, u\'\'contained_within\'\': [], u\'\'country_code\'\': u\'\'IE\'\', u\'\'attributes\'\': {}, u\'\'id\'\': u\'\'7dde0febc9ef245b\'\', u\'\'name\'\': u\'\'Dublin City\'\'}\'"'
>>> data = ast.literal_eval(data)
>>> data = data[1:-1].replace("''", "'")
>>> obj = ast.literal_eval(data)
>>> obj
{u'country_code': u'IE', u'url': u'https://api.twitter.com/1.1/geo/id/7dde0febc9ef245b.json', u'country': u'Ireland', u'place_type': u'city', u'bounding_box': {u'type': u'Polygon', u'coordinates': [[[-6.3873911, 53.2987449], [-6.3873911, 53.4110598], [-6.1078047, 53.4110598], [-6.1078047, 53.2987449]]]}, u'contained_within': [], u'full_name': u'Dublin City', u'attributes': {}, u'id': u'7dde0febc9ef245b', u'name': u'Dublin City'}
