I wrote some code to get data from a web API. I was able to parse the JSON data from the API, but the result I gets looks quite complex. Here is one example:
>>> my_json
{'name': 'ns1:timeSeriesResponseType', 'declaredType': 'org.cuahsi.waterml.TimeSeriesResponseType', 'scope': 'javax.xml.bind.JAXBElement$GlobalScope', 'value': {'queryInfo': {'creationTime': 1349724919000, 'queryURL': 'http://waterservices.usgs.gov/nwis/iv/', 'criteria': {'locationParam': '[ALL:103232434]', 'variableParam': '[00060, 00065]'}, 'note': [{'value': '[ALL:103232434]', 'title': 'filter:sites'}, {'value': '[mode=LATEST, modifiedSince=null]', 'title': 'filter:timeRange'}, {'value': 'sdas01', 'title': 'server'}]}}, 'nil': False, 'globalScope': True, 'typeSubstituted': False}
Looking through this data, I can see the specific data I want: the 1349724919000 value that is labelled as 'creationTime'.
How can I write code that directly gets this value?
I don't need any searching logic to find this value. I can see what I need when I look at the response; I just need to know how to translate that into specific code to extract the specific value, in a hard-coded way. I read some tutorials, so I understand that I need to use [] to access elements of the nested lists and dictionaries; but I can't figure out exactly how it works for a complex case.
More generally, how can I figure out what the "path" is to the data, and write the code for it?
For reference, let's see what the original JSON would look like, with pretty formatting:
>>> print(json.dumps(my_json, indent=4))
{
"name": "ns1:timeSeriesResponseType",
"declaredType": "org.cuahsi.waterml.TimeSeriesResponseType",
"scope": "javax.xml.bind.JAXBElement$GlobalScope",
"value": {
"queryInfo": {
"creationTime": 1349724919000,
"queryURL": "http://waterservices.usgs.gov/nwis/iv/",
"criteria": {
"locationParam": "[ALL:103232434]",
"variableParam": "[00060, 00065]"
},
"note": [
{
"value": "[ALL:103232434]",
"title": "filter:sites"
},
{
"value": "[mode=LATEST, modifiedSince=null]",
"title": "filter:timeRange"
},
{
"value": "sdas01",
"title": "server"
}
]
}
},
"nil": false,
"globalScope": true,
"typeSubstituted": false
}
That lets us see the structure of the data more clearly.
In the specific case, first we want to look at the corresponding value under the 'value' key in our parsed data. That is another dict; we can access the value of its 'queryInfo' key in the same way, and similarly the 'creationTime' from there.
To get the desired value, we simply put those accesses one after another:
my_json['value']['queryInfo']['creationTime'] # 1349724919000
I just need to know how to translate that into specific code to extract the specific value, in a hard-coded way.
If you access the API again, the new data might not match the code's expectation. You may find it useful to add some error handling. For example, use .get() to access dictionaries in the data, rather than indexing:
name = my_json.get('name') # will return None if 'name' doesn't exist
Another way is to test for a key explicitly:
if 'name' in resp_dict:
name = resp_dict['name']
else:
pass
However, these approaches may fail if further accesses are required. A placeholder result of None isn't a dictionary or a list, so attempts to access it that way will fail again (with TypeError). Since "Simple is better than complex" and "it's easier to ask for forgiveness than permission", the straightforward solution is to use exception handling:
try:
creation_time = my_json['value']['queryInfo']['creationTime']
except (TypeError, KeyError):
print("could not read the creation time!")
# or substitute a placeholder, or raise a new exception, etc.
Here is an example of loading a single value from simple JSON data, and converting back and forth to JSON:
import json
# load the data into an element
data={"test1": "1", "test2": "2", "test3": "3"}
# dumps the json object into an element
json_str = json.dumps(data)
# load the json to a string
resp = json.loads(json_str)
# print the resp
print(resp)
# extract an element in the response
print(resp['test1'])
Try this.
Here, I fetch only statecode from the COVID API (a JSON array).
import requests
r = requests.get('https://api.covid19india.org/data.json')
x = r.json()['statewise']
for i in x:
print(i['statecode'])
Try this:
from functools import reduce
import re
def deep_get_imps(data, key: str):
split_keys = re.split("[\\[\\]]", key)
out_data = data
for split_key in split_keys:
if split_key == "":
return out_data
elif isinstance(out_data, dict):
out_data = out_data.get(split_key)
elif isinstance(out_data, list):
try:
sub = int(split_key)
except ValueError:
return None
else:
length = len(out_data)
out_data = out_data[sub] if -length <= sub < length else None
else:
return None
return out_data
def deep_get(dictionary, keys):
return reduce(deep_get_imps, keys.split("."), dictionary)
Then you can use it like below:
res = {
"status": 200,
"info": {
"name": "Test",
"date": "2021-06-12"
},
"result": [{
"name": "test1",
"value": 2.5
}, {
"name": "test2",
"value": 1.9
},{
"name": "test1",
"value": 3.1
}]
}
>>> deep_get(res, "info")
{'name': 'Test', 'date': '2021-06-12'}
>>> deep_get(res, "info.date")
'2021-06-12'
>>> deep_get(res, "result")
[{'name': 'test1', 'value': 2.5}, {'name': 'test2', 'value': 1.9}, {'name': 'test1', 'value': 3.1}]
>>> deep_get(res, "result[2]")
{'name': 'test1', 'value': 3.1}
>>> deep_get(res, "result[-1]")
{'name': 'test1', 'value': 3.1}
>>> deep_get(res, "result[2].name")
'test1'
Related
I want to process a nested JSON with an Azure Function and Python. The JSON has more than 3 sometimes up to 10 or more nested layers. this is a simple example of a JSON passed to the function:
[{
"node1": {
"tattr": {
"usqx": "qhi123"
},
"x-child": [{
"el1": "ast",
"tattr": {
"usx": "xht",
"ust": "cr12"
},
"x-child": [{
"el1": "asx",
"tattr": {
"usx": "md0"
},
"x-child": [{
"el1": "ast",
"tattr": {
"usx": "mdw"
},
"x-child": [{
"el1": "ast",
"tattr": {
"usx": "mdh"
},
"x-child": [{
"el1": "ast",
"x-child": "placeholder_a"
}]
}, {
"el1": "ast",
"tattr": {
"usx": "ssq"
},
"x-child": "placeholder_c"
}, {
"el1": "div",
"tattr": {
"usx": "mdf"
},
"x-child": "abc"
}]
}]
}]
}]
}
}, {
"node02": {
"tattr": {
"usx": "qhi123"
}
}
}]
In this example, placeholder_a should be replaced.
Somewhere in this is a value that needs to be replaced. My idea is to recursive iterate the JSON and process every key that has a dict or list as value. I think the recursive call of the function with a part of the JSON string just copies the JSON. So if the searched string will be find and changed in one recursion, it does not change the original string. What is the best approach to get the "placeholder" in the JSON replaced? It can be on every level.
Since my approach seems to be wrong, I am looking for ideas how to solve the issue. Currently I am between a simple string replace, where I am not sure if the replaced string will be a key or value in a JSON or a recursive function that takes the JSON, search and replace and rebuild the JSON on every recusion.
The code finds the search_para and replaces it but it will not be changed in the original string.
def lin(json_para,search_para,replace_para):
json_decoded = json.loads(json_para)
if isinstance(json_decoded,list):
for list_element in json_decoded:
lin(json.dumps(list_element))
elif isinstance(json_decoded,dict):
for dict_element in json_decoded:
if isinstance(json_decoded[dict_element],dict):
lin(json.dumps(json_decoded[dict_element]))
elif isinstance(json_decoded[dict_element],str):
if str(json_decoded[dict_element]) == 'search_para:
json_decoded[dict_element] = replace_para
While it certainly could be accomplished via recursion given the nature of the problem, I think there's an even more elegant approach based on an idea I got a long time ago reading #Mike Brennan’s answer to another JSON-related question titled How to get string objects instead of Unicode from JSON?
The basic idea is to use the optional object_hook parameter that both the json.load and json.loads() functions accept to watch what is being decoded and check it for the sought-after value (and replace it when it's encountered).
The function passed will be called with the result of any JSON object literal decoded (i.e. a dict) — in other words at any depth. What may not be obvious is that the dict can also be changed if desired.
This nice thing about this overall approach is that it's based (primarily) on prewritten, debugged, and relatively fast code because it's part of the standard library. It also allows the object_hook callback function to be kept relatively simple.
Here's what I'm suggesting:
import json
def replace_placeholders(json_para, search_para, replace_para):
# Local nested function.
def decode_dict(a_dict):
if search_para in a_dict.values():
for key, value in a_dict.items():
if value == search_para:
a_dict[key] = replace_para
return a_dict
return json.loads(json_para, object_hook=decode_dict)
result = replace_placeholders(json_para, 'placeholder_a', 'REPLACEMENT')
print(json.dumps(result, indent=2))
You can use recursion as follows:
data = [{'k1': 'placeholder_a', 'k2': [{'k3': 'placeholder_b', 'k4': 'placeholder_a'}]}, {'k5': 'placeholder_a', 'k6': 'placeholder_c'}]
def replace(data, val_from, val_to):
if isinstance(data, list):
return [replace(x, val_from, val_to) for x in data]
if isinstance(data, dict):
return {k: replace(v, val_from, val_to) for k, v in data.items()}
return val_to if data == val_from else data # other cases
print(replace(data, "placeholder_a", "REPLACED"))
# [{'k1': 'REPLACED', 'k2': [{'k3': 'placeholder_b', 'k4': 'REPLACED'}]}, {'k5': 'REPLACED', 'k6': 'placeholder_c'}]
I've changed the input/output for the sake of simplicity. You can check that the function replaces 'placeholder_a' at any level with 'REPLACED'.
I have a .json file with many entries looking like this:
{
"name": "abc",
"time": "20220607T190731.442",
"id": "123",
"relatedIds": [
{
"id": "456",
"source": "sourceA"
},
{
"id": "789",
"source": "sourceB"
}
],
}
I am saving each entry in a python object, however, I only need the related ID from source A. Problem is, the related ID from source A is not always first place in that nested list.
So data['relatedIds'][0]['id'] is not reliable to yield the right Id.
Currently I am solving the issue like this:
import json
with open("filepath", 'r') as file:
data = json.load(file)
for value in data['relatedIds']:
if(value['source'] == 'sourceA'):
id_from_a = value['id']
entry = Entry(data['name'], data['time'], data['id'], id_from_a)
I don't think this approach is the optimal solution though, especially if relatedIds list gets longer and more entries appended to the JSON file.
Is there a more sophisticated way of singling out this 'id' value from a specified source without looping through all entries in that nested list?
For a cleaner solution, you could try using python's filter() function with a simple lambda:
import json
with open("filepath", 'r') as file:
data = json.load(file)
filtered_data = filter(lambda a : a["source"] == "sourceA", data["relatedIds"])
id_from_a = next(filtered_data)['id']
entry = Entry(data['name'], data['time'], data['id'], id_from_a)
Correct me if I misunderstand how your json file looks, but it seems to work for me.
One step at a time, in order to get to all entries:
>>> data["relatedIds"]
[{'id': '789', 'source': 'sourceB'}, {'id': '456', 'source': 'sourceA'}]
Next, in order to get only those entries with source=sourceA:
>>> [e for e in data["relatedIds"] if e["source"] == "sourceA"]
[{'id': '456', 'source': 'sourceA'}]
Now, since you don't want the whole entry, but just the ID, we can go a little further:
>>> [e["id"] for e in data["relatedIds"] if e["source"] == "sourceA"]
['456']
From there, just grab the first ID:
>>> [e["id"] for e in data["relatedIds"] if e["source"] == "sourceA"][0]
'456'
Can you get whatever generates your .json file to produce the relatedIds as an object rather than a list?
{
"name": "abc",
"time": "20220607T190731.442",
"id": "123",
"relatedIds": {
"sourceA": "456",
"sourceB": "789"
}
}
If not, I'd say you're stuck looping through the list until you find what you're looking for.
How do I get the value of a dict item within a list, within a dict in Python? Please see the following code for an example of what I mean.
I use the following lines of code in Python to get data from an API.
res = requests.get('https://api.data.amsterdam.nl/bag/v1.1/nummeraanduiding/', params)
data = res.json()
data then returns the following Python dictionary:
{
'_links': {
'next': {
'href': null
},
'previous': {
"href": null
},
'self': {
'href': 'https://api.data.amsterdam.nl/bag/v1.1/nummeraanduiding/'
}
},
'count': 1,
'results': [
{
'_display': 'Maple Street 99',
'_links': {
'self': {
'href': 'https://api.data.amsterdam.nl/bag/v1.1/nummeraanduiding/XXXXXXXXXXXXXXXX/'
}
},
'dataset': 'bag',
'landelijk_id': 'XXXXXXXXXXXXXXXX',
'type_adres': 'Hoofdadres',
'vbo_status': 'Verblijfsobject in gebruik'
}
]
}
Using Python, how do I get the value for 'landelijk_id', represented by the twelve Xs?
This should work:
>>> data['results'][0]['landelijk_id']
"XXXXXXXXXXXXXXXX"
You can just chain those [] for each child you need to access.
I'd recommend using the jmespath package to make handling nested Dictionaries easier. https://pypi.org/project/jmespath/
import jmespath
import requests
res = requests.get('https://api.data.amsterdam.nl/bag/v1.1/nummeraanduiding/', params)
data = res.json()
print(jmespath.search('results[].landelijk_id', data)
I have a massive (over 3000 lines) JSON response I am receiving from a UI. I need to scan through the entire response and search for the value "Not Answered". Once I find this value, I then need to get the other info around that response.
The response is heavily nested, up to 7 layers. I do know that the value has a key of "value", but that key is in the response multiple times. The number of nests and items under the first "value" key can be different for each call.
This is an example of a small piece of what the response can look like. I would need to find each instance of the value "Not Answered". I am not showing the other data within the response under the value keys.
{
"data": {
"reviewData": [
0: {
"value": [
1: {
"value": "Answer"
},
2: {
"value": "Not Answered"
},
3: {
"value": "Answer"
}
]
},
1: {
"value": [
1: {
"value": "Not Answered"
},
2: {
"value": "Not Answered"
},
3: {
"value": "Answer"
}
]
}
]
}
}
I understand that I could just put this all into a string and use regex but that wouldn't assist in getting the other data that I need. Thanks for any help!
You can use a generator to walk through your dictionary and yield the paths that lead to 'Not Answered' values in form of tuples:
def walk(obj):
for key, value in obj.items():
if isinstance(value, dict):
yield from ((key,) + x for x in walk(value))
elif value == 'Not Answered':
yield (key,)
For your example this gives the following output:
[('data', 'reviewData', '0', 'value', '2', 'value'),
('data', 'reviewData', '1', 'value', '1', 'value'),
('data', 'reviewData', '1', 'value', '2', 'value')]
If you need access to the surrounding information you can reduce the provided paths to any depth by using __getitem__ on the nested dicts:
from functools import reduce
for path in walk(test_dict):
info = reduce(lambda obj, key: obj[key], path[:-1], test_dict)
You can use a recursive function like this one:
def find_not_anwsered_object(obj, sink):
for key in obj.keys():
if obj[key] == "Not Answered":
sink.append(obj)
elif isinstance(obj[key], dict):
find_not_answered_object(obj[key], sink)
return sink
response = {} # your JSON response
print(find_not_answered_object(response, [])
Given this json string, how can I pull out the value of id if code equals 4003?
error_json = '''{
'error': {
'meta': {
'code': 4003,
'message': 'Tracking already exists.',
'type': 'BadRequest'
},
'data': {
'tracking': {
'slug': 'fedex',
'tracking_number': '442783308929',
'id': '5b59ea69a9335baf0b5befcf',
'created_at': '2018-07-26T15:36:09+00:00'
}
}
}
}'''
I can't assume that anything other than the error element exists at the beginning, so the meta and code elements may or may not be there. The data, tracking, and id may or may not be there either.
The question is how to extract the value of id if the elements are all there. If any of the elements are missing, the value of id should be None
A python dictionary has a get(key, default) method that supports returning a default value if a key is not found. You can chain empty dictionaries to reach nested elements.
# use get method to access keys without raising exceptions
# see https://docs.quantifiedcode.com/python-anti-patterns/correctness/not_using_get_to_return_a_default_value_from_a_dictionary.html
code = error_json.get('error', {}).get('meta', {}).get('code', None)
if code == 4003:
# change id with _id to avoid overriding builtin methods
# see https://docs.quantifiedcode.com/python-anti-patterns/correctness/assigning_to_builtin.html
_id = error_json.get('error', {}).get('data', {}).get('tracking', {}).get('id', None)
Now, given a string that looks like a JSON you can parse it into a dictionary using json.loads(), as shown in Parse JSON in Python
I would try this:
import json
error_json = '''{
"error": {
"meta": {
"code": 4003,
"message": "Tracking already exists.",
"type": "BadRequest"
},
"data": {
"tracking": {
"slug": "fedex",
"tracking_number": "442783308929",
"id": "5b59ea69a9335baf0b5befcf",
"created_at": "2018-07-26T15:36:09+00:00"
}
}
}
}'''
parsed_json = json.loads(error_json)
try:
if parsed_json["error"]["meta"]["code"] == int(parsed_json["error"]["meta"]["code"]):
print(str(parsed_json["error"]["data"]["tracking"]["id"]))
except:
print("no soup for you")
Output:
5b59ea69a9335baf0b5befcf
A lot of python seems to be it's better to ask for forgiveness instead of permission. You could look up to see if that key in the dictionary is there, but really it's easier to just try. I'm specifically doing a check to make sure that code is an int, but you could change it around any way you'd like. If it can be other things you'd have to adjust it. There are several different solutions to this, it's really whatever you feel the most comfortable doing and maintaining.
You can add a check for key errors to the whole operation:
def get_id(j)
try:
if j['error']['meta’]['code'] == 4003:
return j['error']['data']['tracking']['id']
except KeyError:
pass
return None
If any element is missing, this function will quietly return None. If all the required elements are present, it will return the required ID. The only time it could really fail is if one of the intermediate keys does not refer to a dictionary. That could potentially result in a TypeError.