Python Json Parse - python

I made a mistake when storing JSON strings in a database. Instead of storing the value as JSON, I stored the string representation of the Python object.
I received
my_jstring['field']
and inserted it into the database as a string.
my_jstring['field'] is not JSON but a Python object produced by the json module. Is it possible to parse this object again from its string form?
My string is the following:
'"\'{u\'\'full_name\'\': u\'\'Dublin City\'\', u\'\'url\'\': u\'\'https://api.twitter.com/1.1/geo/id/7dde0febc9ef245b.json\'\', u\'\'country\'\': u\'\'Ireland\'\', u\'\'place_type\'\': u\'\'city\'\', u\'\'bounding_box\'\': {u\'\'type\'\': u\'\'Polygon\'\', u\'\'coordinates\'\': [[[-6.3873911, 53.2987449], [-6.3873911, 53.4110598], [-6.1078047, 53.4110598], [-6.1078047, 53.2987449]]]}, u\'\'contained_within\'\': [], u\'\'country_code\'\': u\'\'IE\'\', u\'\'attributes\'\': {}, u\'\'id\'\': u\'\'7dde0febc9ef245b\'\', u\'\'name\'\': u\'\'Dublin City\'\'}\'"'

Use ast.literal_eval() to parse Python literals back into a Python object.
You appear to have double-quoted the value, however, adding extra single quotes. These need to be repaired too:
data = ast.literal_eval(data)
data = data[1:-1].replace("''", "'")
obj = ast.literal_eval(data)
Demo:
>>> import ast
>>> data = '"\'{u\'\'full_name\'\': u\'\'Dublin City\'\', u\'\'url\'\': u\'\'https://api.twitter.com/1.1/geo/id/7dde0febc9ef245b.json\'\', u\'\'country\'\': u\'\'Ireland\'\', u\'\'place_type\'\': u\'\'city\'\', u\'\'bounding_box\'\': {u\'\'type\'\': u\'\'Polygon\'\', u\'\'coordinates\'\': [[[-6.3873911, 53.2987449], [-6.3873911, 53.4110598], [-6.1078047, 53.4110598], [-6.1078047, 53.2987449]]]}, u\'\'contained_within\'\': [], u\'\'country_code\'\': u\'\'IE\'\', u\'\'attributes\'\': {}, u\'\'id\'\': u\'\'7dde0febc9ef245b\'\', u\'\'name\'\': u\'\'Dublin City\'\'}\'"'
>>> data = ast.literal_eval(data)
>>> data = data[1:-1].replace("''", "'")
>>> obj = ast.literal_eval(data)
>>> obj
{u'country_code': u'IE', u'url': u'https://api.twitter.com/1.1/geo/id/7dde0febc9ef245b.json', u'country': u'Ireland', u'place_type': u'city', u'bounding_box': {u'type': u'Polygon', u'coordinates': [[[-6.3873911, 53.2987449], [-6.3873911, 53.4110598], [-6.1078047, 53.4110598], [-6.1078047, 53.2987449]]]}, u'contained_within': [], u'full_name': u'Dublin City', u'attributes': {}, u'id': u'7dde0febc9ef245b', u'name': u'Dublin City'}
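A minimal sketch of the repair steps above, wrapped in a helper so they can be applied to every affected row. It assumes Python 3 (where ast.literal_eval still accepts the u'' prefix) and that every stored value carries the same doubled-quote damage. The name parse_stored_repr is hypothetical, and the sample string is shortened from the one in the question.

```python
import ast
import json


def parse_stored_repr(raw):
    """Undo the extra quoting layer, then parse the dict literal."""
    # First pass: strip the outer double quotes added around the value.
    inner = ast.literal_eval(raw)
    # Drop the wrapping single quotes and collapse the doubled ones.
    inner = inner[1:-1].replace("''", "'")
    # Second pass: the text is now an ordinary Python dict literal.
    return ast.literal_eval(inner)


# Shortened sample with the same damage as the string in the question.
raw = '"\'{u\'\'name\'\': u\'\'Dublin City\'\', u\'\'attributes\'\': {}}\'"'
place = parse_stored_repr(raw)

# Re-serialize as real JSON before writing the value back to the database.
as_json = json.dumps(place)
```

Writing as_json back to the column would mean that json.loads alone is enough the next time the value is read.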

Related

Iterating through separately-JSON-encoded strings found inside a document

I am having trouble iterating through JSON that contains nested JSON strings (with escaped quotes).
(My apologies in advance, I am sort of new and probably missing some important info...)
Actually I have several questions:
1) How can I iterate (as I tried to do below with nested for loops) through the elements beneath "section-content" of the section "nodes" (not of the section "element-names")? My problem seems to be that section-content is a string with escaped quotes, which is itself a separate JSON string.
2) Is the JSON example provided even valid json? I tried several validators, which all seem to fail when the escaped quotes come into play.
3) Is there a smarter method of accessing specific elements, instead of just iterating through the whole tree?
I am thinking of something that specifies key/value pairs like:
my_json_obj['sections']['section-id' = 'nodes']['section-content']['occ_id' = '051MZjd97jUdYfSEOG}k10']
Code:
import json
import requests
import pprint
client = requests.session()
header = {'X-CSRF-Token': 'Fetch', 'Accept': 'application/json', 'Content-Type': 'application/json'}
response = client.get('http://xxxxxx.xxx/ProcessManagement/BranchContentSet(BranchId=\'051MZjd97jUdYfX7{dREAm\',SiteId=\'\',SystemRole=\'D\')/$value',auth=('TestUser', 'TestPass'),headers=header)
my_json_obj = response.json()
sections = my_json_obj['sections']
for mysection in sections:
    print(mysection['section-id'])
    if mysection['section-id'] == 'NODES':
        nodes = mysection['section-content'] #nodes seems to be string
        for mynode in nodes:
            print(mynode) #prints string character by character
JSON example:
{
"smud-data-version": "0.1",
"sections": [
{
"section-id": "ELEMENT-NAMES",
"section-content-version": "",
"section-content": "{\"D\":[
{\"occ_id\":\"051MZjd97kcBgtZiEI0IvW\",\"lang\":\"E\",\"name\":\"0TD1 manuell\"},
{\"occ_id\":\"051MZjd97kcBgtZiEH}IvW\",\"lang\":\"E\",\"name\":\"Documentation\"}
]}"
},
{
"section-id": "NODES",
"section-content-version": "1.0",
"section-content": "[
{\"occ_id\":\"051MZjd97jUdYfSEOG}k10\",\"obj_type\":\"ROOT\",\"reference\":\"\",\"deleted\":\"\",\"attributes\":[]},
{\"occ_id\":\"051MZjd97jUdYfSEOH0k10\",\"obj_type\":\"ROOTGRP\",\"reference\":\"\",\"deleted\":\"\",\"attributes\":[]},
{\"occ_id\":\"051MZjd97jcAnKoe03JRRm\",\"obj_type\":\"SCN\",\"reference\":\"\",\"deleted\":\"\",\"attributes\":[
{\"attr_type\":\"NODE_CHANGED_AT\",\"lang\":\"\",\"values\":[\"20190213095843\"]},
{\"attr_type\":\"NODE_CHANGED_BY\",\"lang\":\"\",\"values\":[\"TestUser\"]},
{\"attr_type\":\"TCASSIGNMENTTYPE\",\"lang\":\"\",\"values\":[\"A\"]},
{\"attr_type\":\"DESCRIPTION\",\"lang\":\"E\",\"values\":[\"Scenario\"]}
]}
]"
}
]
}
Actual output:
ELEMENT-NAMES
NODES
[
{
"
o
c
c
_
i
d
"
Hopefully you can convince the folks who are generating this data to fix their server. That said, a workaround might look like:
# instead of using response.json(), remove literal newlines and then decode ourselves
# ...because the original data has newline literals in positions where they aren't allowed.
my_json_obj = json.loads(response.text.replace('\n', ''))
for section in my_json_obj['sections']:
    if section['section-id'] != 'NODES':
        continue
    # doing another json.loads() here so you treat content as an array, not a string
    for node in json.loads(section['section-content']):
        pprint.pprint(node)
...properly emits as output:
{u'attributes': [],
 u'deleted': u'',
 u'obj_type': u'ROOT',
 u'occ_id': u'051MZjd97jUdYfSEOG}k10',
 u'reference': u''}
{u'attributes': [],
 u'deleted': u'',
 u'obj_type': u'ROOTGRP',
 u'occ_id': u'051MZjd97jUdYfSEOH0k10',
 u'reference': u''}
{u'attributes': [{u'attr_type': u'NODE_CHANGED_AT',
                  u'lang': u'',
                  u'values': [u'20190213095843']},
                 {u'attr_type': u'NODE_CHANGED_BY',
                  u'lang': u'',
                  u'values': [u'TestUser']},
                 {u'attr_type': u'TCASSIGNMENTTYPE',
                  u'lang': u'',
                  u'values': [u'A']},
                 {u'attr_type': u'DESCRIPTION',
                  u'lang': u'E',
                  u'values': [u'Scenario']}],
 u'deleted': u'',
 u'obj_type': u'SCN',
 u'occ_id': u'051MZjd97jcAnKoe03JRRm',
 u'reference': u''}
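The same workaround can be sketched without stripping newlines, assuming Python 3: json.loads accepts strict=False, which permits control characters such as raw newlines inside strings. The sketch also touches on question 3 by building plain dictionaries keyed by section-id and occ_id, so individual elements can be looked up directly instead of walking the whole tree. The sample text and the ids A1 and B2 are made up for illustration.

```python
import json

# Made-up miniature of the response: the embedded JSON text contains raw
# newlines inside a string, which strict JSON parsing rejects.
raw = (
    '{"sections": [{"section-id": "NODES", "section-content": "[\n'
    '{\\"occ_id\\": \\"A1\\", \\"obj_type\\": \\"ROOT\\"},\n'
    '{\\"occ_id\\": \\"B2\\", \\"obj_type\\": \\"SCN\\"}\n'
    ']"}]}'
)

# strict=False tolerates control characters (here: newlines) inside strings.
doc = json.loads(raw, strict=False)

# Decode each embedded JSON string once and index the sections by id.
sections = {s["section-id"]: json.loads(s["section-content"])
            for s in doc["sections"]}

# Index the nodes by occ_id for direct lookup.
nodes_by_id = {node["occ_id"]: node for node in sections["NODES"]}
scenario = nodes_by_id["B2"]
```

With the two indexes in place, a lookup such as nodes_by_id["B2"]["obj_type"] plays the role of the key/value selector asked about in the question.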

str to unicode not working python

I am trying to match the exact format of a response. I have:
def recreate_metrics_format(self):
    MAP = {
        # "measurment_name": "measurement_name_",
        "ref": "measurement_ref_",
        "value": "measurement_value_"
    }
    IDs = 3
    metrics = []
    for this_id in range(1, IDs+1):
        this_metric = {}
        for k, v in MAP.items():
            key = k.encode('UTF-8')
            attribute = v + str(this_id)
            this_metric[key] = getattr(self, attribute)
        metrics.append(this_metric)
    return metrics
For some reason, even after adding the encode part, my format is off:
ipdb> expected_metrics
[{'ref': u'tenetur', 'value': 567.41}, {'ref': u'blanditiis', 'value': 1632.0}, {'ref': u'ducimus', 'value': 786.4862}]
ipdb> metrics
[{u'ref': u'tenetur', u'value': 567.41}, {u'ref': u'blanditiis', u'value': 1632.0}, {u'ref': u'ducimus', u'value': 786.4862}]
I do unicode instead of encode in shell and it works:
ipdb> unicode
<type 'unicode'>
ipdb> unicode('blah', 'UTF-8')
u'blah'
I've used encode before and it worked. What is wrong with using encode here? Thank you.

TypeError: string indices must be integers while parsing JSON, Python?

I'm trying to parse JSON, below is my code.
import requests
import json
yatoken = '123123sdfsdf'
listOfId = ['11111111111', '2222222222', '33333333333']
for site in listOfId:
    r = requests.get('http://api.ya-bot.net/projects/' + site + '?token=' + yatoken)
    parsed = json.loads(r.text)
    for url in parsed['project']:
        #print url
        print str(url['name'])
And JSON:
{
"project":{
"id":"123123sdfsdfs",
"urls":[],
"name":"Имя",
"group":"Группа",
"comments":"",
"sengines":[],
"wordstat_template":1,
"wordstat_regions":[]
}
}
It gives this error
print str(url['name'])
TypeError: string indices must be integers
How can I fix this problem?
Thanks.
The 'project' key refers to a dictionary. Looping over that dictionary gives you its keys, each a string. You are not looping over the list of URLs. One of those keys will be 'name'.
It isn't clear what else your code is meant to do. You appear to want to get each URL. To do that, you'd have to loop over the 'urls' key in that nested dictionary:
for url in parsed['project']['urls']:
    # each url value
In your sample response that list is empty, however.
If you wanted to get the 'name' key from the nested dictionary, just print it without looping:
print parsed['project']['name']
Demo:
>>> import json
>>> parsed = json.loads('''\
... {
... "project":{
... "id":"123123sdfsdfs",
... "urls":[],
... "name":"Имя",
... "group":"Группа",
... "comments":"",
... "sengines":[],
... "wordstat_template":1,
... "wordstat_regions":[]
... }
... }
... ''')
>>> parsed['project']
{u'group': u'\u0413\u0440\u0443\u043f\u043f\u0430', u'name': u'\u0418\u043c\u044f', u'wordstat_regions': [], u'comments': u'', u'urls': [], u'sengines': [], u'id': u'123123sdfsdfs', u'wordstat_template': 1}
>>> parsed['project']['name']
u'\u0418\u043c\u044f'
>>> print parsed['project']['name']
Имя
>>> print parsed['project']['urls']
[]
parsed['project'] is a dict, so for url in parsed['project'] iterates over its keys. Each key is a string, so an expression such as "id"["name"] raises the error. Use d = parsed['project'] to get the dict, then access it by key to get whatever value you want.
d = parsed['project']
print(d["name"])
print(d["urls"])
...
Or iterate over the items to get key and value:
for k, v in parsed['project'].items():
    print(k,v)
If you print what is returned you can see exactly what is happening:
In [17]: js = {
"project":{
"id":"123123sdfsdfs",
"urls":[],
"name":"Имя",
"group":"Группа",
"comments":"",
"sengines":[],
"wordstat_template":1,
"wordstat_regions":[]
}
}
In [18]: js["project"] # dict
Out[18]:
{'comments': '',
'group': 'Группа',
'id': '123123sdfsdfs',
'name': 'Имя',
'sengines': [],
'urls': [],
'wordstat_regions': [],
'wordstat_template': 1}
In [19]: for k in js["project"]: # iterating over the keys of the dict
....: print(k)
....:
sengines # k["name"] == error
id
urls
comments
name
wordstat_regions
wordstat_template
group
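Both answers come down to the same point, which a short sketch can reproduce. It is written for Python 3 (the thread itself uses Python 2 print statements) and uses a trimmed copy of the JSON from the question.

```python
import json

# Trimmed copy of the JSON from the question.
parsed = json.loads('{"project": {"id": "123123sdfsdfs", "urls": [], "name": "Имя"}}')
project = parsed["project"]

# Iterating over a dict yields its keys, which are plain strings here.
keys = list(project)

# Indexing one of those strings with another string reproduces the error.
try:
    keys[0]["name"]
except TypeError as exc:
    error = str(exc)

# Reading the value straight from the nested dict needs no loop.
name = project["name"]
```

The exact wording of the TypeError message varies slightly between Python versions, but it always starts with "string indices must be integers".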

Remove python dict item from nested json file

I fetch JSON from an API, and I get KeyError: 0 when I attempt to remove items from the resulting Python dict. I assume it's a combination of my lack of skill and the format of the JSON.
My goal is to remove all instances of 192.168.1.1 from ip_address_1.
My Code:
from api import Request
import requests, json, ordereddict
# prepare request
request = Request().service('').where({"query":"192.168.1.0"}).withType("json")
# call request
response = request.execute()
# parse response into python object
obj = json.loads(response)
# remove items
for i in xrange(len(obj)):
    if obj[i]["ip_address_1"] == "192.168.1.1":
        obj.pop(i)
# display
print json.dumps(obj,indent=1)
Example JSON:
{
"response": {
"alerts": [
{
"action": "New",
"ip_address_1": "192.168.1.1",
"domain": "example.com",
"ip_address_2": "192.68.1.2"
},
{
"action": "New",
"ip_address_1": "192.168.1.3",
"domain": "example2.com",
"ip_address_2": "192.168.1.1"
}
],
"total": "2",
"query": "192.168.1.0"
}
}
This is incorrect:
# remove items
for i in xrange(len(obj)):
    if obj[i]["ip_address_1"] == "192.168.1.1":
        obj.pop(i)
You are iterating over a dictionary as if it were a list: obj[0] looks up the key 0, which does not exist.
What you want to do:
for sub_obj in obj["response"]["alerts"]:
    if sub_obj["ip_address_1"] == "192.168.1.1":
        sub_obj.pop("ip_address_1")
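To see where KeyError: 0 comes from, here is a small sketch (Python 3, with a cut-down stand-in for the parsed response). The top-level object has a single key, so the loop in the question runs once with i equal to 0 and then indexes the dict with that integer.

```python
# Cut-down stand-in for json.loads(response): a dict with one key.
obj = {"response": {"alerts": [], "total": "0"}}

# len(obj) counts the top-level keys, so range(len(obj)) yields only 0.
indexes = list(range(len(obj)))

# Indexing the dict with 0 looks up a key named 0, which is not there.
try:
    obj[indexes[0]]
except KeyError as exc:
    missing_key = exc.args[0]
```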
I've interpreted your requirements to be:
1. Remove from the "alerts" list any dictionary with ip_address_1 set to 192.168.1.1.
2. Create a list of all other ip_address_1 values.
json.loads(response) produces this dictionary:
{u'response': {u'alerts': [{u'action': u'New',
                            u'domain': u'example.com',
                            u'ip_address_1': u'192.168.1.1',
                            u'ip_address_2': u'192.68.1.2'},
                           {u'action': u'New',
                            u'domain': u'example2.com',
                            u'ip_address_1': u'192.168.1.3',
                            u'ip_address_2': u'192.168.1.1'}],
               u'query': u'192.168.1.0',
               u'total': u'2'}}
The "alerts" list is accessed by (assuming the dict is bound to obj):
>>> obj['response']['alerts']
[{u'action': u'New',
  u'domain': u'example.com',
  u'ip_address_1': u'192.168.1.1',
  u'ip_address_2': u'192.68.1.2'},
 {u'action': u'New',
  u'domain': u'example2.com',
  u'ip_address_1': u'192.168.1.3',
  u'ip_address_2': u'192.168.1.1'}]
The first part can be done like this:
alerts = obj['response']['alerts']
obj['response']['alerts'] = [d for d in alerts if d.get('ip_address_1') != '192.168.1.1']
Here a list comprehension is used to filter out those dictionaries with ip_address_1 192.168.1.1, and the resulting list is then rebound to the obj dictionary. After this obj is:
>>> pprint(obj)
{u'response': {u'alerts': [{u'action': u'New',
                            u'domain': u'example2.com',
                            u'ip_address_1': u'192.168.1.3',
                            u'ip_address_2': u'192.168.1.1'}],
               u'query': u'192.168.1.0',
               u'total': u'2'}}
Next, creating a list of the other ip addresses is easy with another list comprehension run on the alerts list after removing the undesired dicts as shown above:
ip_addresses = [d['ip_address_1'] for d in obj['response']['alerts'] if d.get('ip_address_1') is not None]
Notice that we use get() to handle the possibility that some dictionaries might not have an ip_address_1 key.
>>> ip_addresses
[u'192.168.1.3']
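The two steps can be put together as one self-contained sketch, written for Python 3. The helper name drop_alerts is hypothetical and the response text is a trimmed version of the example JSON.

```python
import json

# Trimmed version of the example JSON from the question.
response = """{"response": {"alerts": [
  {"action": "New", "ip_address_1": "192.168.1.1", "domain": "example.com"},
  {"action": "New", "ip_address_1": "192.168.1.3", "domain": "example2.com"}
], "total": "2", "query": "192.168.1.0"}}"""


def drop_alerts(obj, ip):
    """Remove every alert whose ip_address_1 equals ip; return obj."""
    alerts = obj["response"]["alerts"]
    # Build a new list rather than popping items while looping over the old one.
    obj["response"]["alerts"] = [a for a in alerts if a.get("ip_address_1") != ip]
    return obj


obj = drop_alerts(json.loads(response), "192.168.1.1")

# Collect the remaining addresses, skipping alerts that lack the key.
ip_addresses = [a["ip_address_1"] for a in obj["response"]["alerts"]
                if a.get("ip_address_1") is not None]
```

Note that the sketch leaves the "total" field untouched. Whether it should be recomputed depends on how the API consumer uses it.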

Python issue with getting data out of json.loads

I have the following JSON data which I pass through json.loads:
{
"meta":{
"limit":20,
"next":null,
"offset":0,
"previous":null,
"total_count":2
},
"objects":[
{
"attributes":"{u'_code': u'[ON CODE]/[OFF CODE]', u'_type': u'pick actuator or sensor', u'code': u'AF126E/AF1266', u'type': u'actuator'}",
"id":1,
"module":"/api/v1/module/1/",
"moduleName":"rfm_ninjablock (ninjablock)",
"name":"HallwayLight"
},
{
"attributes":"{u'_code': u'[ON CODE]/[OFF CODE]', u'_type': u'pick actuator or sensor', u'code': u'0x53df5c', u'type': u'sensor'}",
"id":2,
"module":"/api/v1/module/1/",
"moduleName":"rfm_ninjablock (ninjablock)",
"name":"ToiletDoor"
}
]
}
I'm trying to get all the data out of it, but I'm having trouble referencing the data.
My code is as follows:
for object in r['objects']:
    for attributes in object.iteritems():
        print attributes
Which gives me:
(u'attributes', u"{u'_code': u'[ON CODE]/[OFF CODE]', u'_type': u'pick actuator or sensor', u'code': u'AF126E/AF1266', u'type': u'actuator'}")
(u'moduleName', u'rfm_ninjablock (ninjablock)')
(u'id', 1)
(u'module', u'/api/v1/module/1/')
(u'name', u'HallwayLight')
(u'attributes', u"{u'_code': u'[ON CODE]/[OFF CODE]', u'_type': u'pick actuator or sensor', u'code': u'0x53df5c', u'type': u'sensor'}")
(u'moduleName', u'rfm_ninjablock (ninjablock)')
(u'id', 2)
(u'module', u'/api/v1/module/1/')
(u'name', u'ToiletDoor')
I'm not really sure how to reference these values, or whether I'm doing it right.
attributes already contains JSON, as that is how it is stored in the database.
You have a problem with the way the original data was serialized. The attributes dict of each element was not serialized as a set of nested dicts, but as an outer dict containing Python string representations of the inner dicts.
You should post the code that did the original serialization.
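The original serialization code is not shown, so the following is only a guess at how such data can arise, sketched in Python 3 with made-up values: passing str(attrs) instead of attrs to json.dumps embeds a Python repr inside the JSON, whereas passing the dict itself produces a real nested object.

```python
import json

# Made-up attributes, loosely modelled on the data in the question.
attrs = {"code": "0x53df5c", "type": "sensor"}

# Likely cause (an assumption): the dict was converted with str() first,
# so the JSON holds a Python repr as one opaque string.
bad = json.dumps({"attributes": str(attrs)})

# Passing the dict itself serializes it as a nested JSON object.
good = json.dumps({"attributes": attrs})

bad_value = json.loads(bad)["attributes"]
good_value = json.loads(good)["attributes"]
```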
Since the dict inside attributes ends up as a string representation, I used the following code to convert it back to a dict:
import ast
for object in r['objects']:
    attrib = ast.literal_eval(object['attributes'])
    print attrib['code']
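The same conversion as a self-contained Python 3 sketch, replacing each attributes string with the parsed dict so the rest of the code can treat it as nested data. The payload is a shortened version of the JSON in the question.

```python
import ast
import json

# Shortened version of the JSON from the question.
payload = json.dumps({"objects": [
    {"id": 1, "name": "HallwayLight",
     "attributes": "{u'code': u'AF126E/AF1266', u'type': u'actuator'}"},
    {"id": 2, "name": "ToiletDoor",
     "attributes": "{u'code': u'0x53df5c', u'type': u'sensor'}"},
]})

r = json.loads(payload)
for obj in r["objects"]:
    # literal_eval safely parses the Python dict literal stored as text.
    obj["attributes"] = ast.literal_eval(obj["attributes"])

codes = [obj["attributes"]["code"] for obj in r["objects"]]
```

ast.literal_eval only evaluates literals, so it is a safer choice than eval for text that comes out of a database.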
