I want to process nested JSON with an Azure Function written in Python. The JSON has more than three nested layers, sometimes ten or more. This is a simple example of the JSON passed to the function:
[{
"node1": {
"tattr": {
"usqx": "qhi123"
},
"x-child": [{
"el1": "ast",
"tattr": {
"usx": "xht",
"ust": "cr12"
},
"x-child": [{
"el1": "asx",
"tattr": {
"usx": "md0"
},
"x-child": [{
"el1": "ast",
"tattr": {
"usx": "mdw"
},
"x-child": [{
"el1": "ast",
"tattr": {
"usx": "mdh"
},
"x-child": [{
"el1": "ast",
"x-child": "placeholder_a"
}]
}, {
"el1": "ast",
"tattr": {
"usx": "ssq"
},
"x-child": "placeholder_c"
}, {
"el1": "div",
"tattr": {
"usx": "mdf"
},
"x-child": "abc"
}]
}]
}]
}]
}
}, {
"node02": {
"tattr": {
"usx": "qhi123"
}
}
}]
In this example, placeholder_a should be replaced.
Somewhere in this structure is a value that needs to be replaced. My idea is to iterate over the JSON recursively and process every key whose value is a dict or a list. I think that calling the function recursively with part of the JSON string just copies that part, so if the searched string is found and changed in one recursion, the original string does not change. What is the best approach to get the "placeholder" in the JSON replaced? It can be on any level.
Since my approach seems to be wrong, I am looking for ideas on how to solve the issue. Currently I am torn between a simple string replace, where I cannot be sure whether the replaced string is a key or a value in the JSON, and a recursive function that takes the JSON, searches and replaces, and rebuilds the JSON on every recursion.
The code finds search_para and replaces it, but the change does not reach the original string.
import json

def lin(json_para, search_para, replace_para):
    json_decoded = json.loads(json_para)
    if isinstance(json_decoded, list):
        for list_element in json_decoded:
            # Each call re-serializes and re-parses, so it works on a copy.
            lin(json.dumps(list_element), search_para, replace_para)
    elif isinstance(json_decoded, dict):
        for dict_element in json_decoded:
            if isinstance(json_decoded[dict_element], dict):
                lin(json.dumps(json_decoded[dict_element]), search_para, replace_para)
            elif isinstance(json_decoded[dict_element], str):
                if json_decoded[dict_element] == search_para:
                    # Changes only this local copy; nothing is returned.
                    json_decoded[dict_element] = replace_para
While this could certainly be accomplished via recursion given the nature of the problem, I think there's an even more elegant approach, based on an idea I got a long time ago from reading @Mike Brennan's answer to another JSON-related question titled "How to get string objects instead of Unicode from JSON?"
The basic idea is to use the optional object_hook parameter that both json.load() and json.loads() accept to watch what is being decoded, check it for the sought-after value, and replace that value when it's encountered.
The function passed will be called with the result of every JSON object literal decoded (i.e. a dict), in other words at any depth. What may not be obvious is that the hook can also change the dict, because whatever it returns is used in place of the decoded object.
The nice thing about this overall approach is that it's based (primarily) on prewritten, debugged, and relatively fast code, because it's part of the standard library. It also allows the object_hook callback function to be kept relatively simple.
Here's what I'm suggesting:
import json

def replace_placeholders(json_para, search_para, replace_para):
    # Local nested function.
    def decode_dict(a_dict):
        if search_para in a_dict.values():
            for key, value in a_dict.items():
                if value == search_para:
                    a_dict[key] = replace_para
        return a_dict

    return json.loads(json_para, object_hook=decode_dict)

result = replace_placeholders(json_para, 'placeholder_a', 'REPLACEMENT')
print(json.dumps(result, indent=2))
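To make the "called at any depth" behaviour concrete, here is a small sketch that only records which dicts the decoder passes to the hook. The `sample` string and the `record` function are made up for illustration; they are not part of the answer above.

```python
import json

# Hypothetical sample, shortened from the structure in the question.
sample = '{"node1": {"tattr": {"usx": "a"}, "x-child": [{"el1": "ast", "x-child": "placeholder_a"}]}}'

seen = []  # key names of every dict handed to the hook, in call order

def record(a_dict):
    seen.append(sorted(a_dict))
    return a_dict  # returned unchanged, so decoding behaves as usual

json.loads(sample, object_hook=record)
for keys in seen:
    print(keys)
# ['usx']
# ['el1', 'x-child']
# ['tattr', 'x-child']
# ['node1']
```

The innermost objects arrive first and the outermost last. Note that the hook only ever sees dicts, so a placeholder that sits directly inside a list (rather than as a dict value) would not be touched by a hook that inspects only a_dict.values().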
You can use recursion as follows:
data = [{'k1': 'placeholder_a', 'k2': [{'k3': 'placeholder_b', 'k4': 'placeholder_a'}]}, {'k5': 'placeholder_a', 'k6': 'placeholder_c'}]

def replace(data, val_from, val_to):
    if isinstance(data, list):
        return [replace(x, val_from, val_to) for x in data]
    if isinstance(data, dict):
        return {k: replace(v, val_from, val_to) for k, v in data.items()}
    return val_to if data == val_from else data  # other cases

print(replace(data, "placeholder_a", "REPLACED"))
# [{'k1': 'REPLACED', 'k2': [{'k3': 'placeholder_b', 'k4': 'REPLACED'}]}, {'k5': 'REPLACED', 'k6': 'placeholder_c'}]
I've changed the input/output for the sake of simplicity. You can check that the function replaces 'placeholder_a' at any level with 'REPLACED'.
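The question worried that recursion only ever changes a copy. That happens when each call re-parses a JSON string; if the data is parsed once and the recursion walks the resulting Python objects, the original structure can be changed in place. A minimal sketch of that variant follows; the name `replace_in_place` and the sample `doc` are assumptions made for illustration.

```python
import json

def replace_in_place(node, val_from, val_to):
    """Mutate nested dicts/lists so every value equal to val_from becomes val_to."""
    if isinstance(node, dict):
        items = node.items()
    elif isinstance(node, list):
        items = enumerate(node)
    else:
        return  # scalars have no children to visit
    for key, value in items:
        if isinstance(value, (dict, list)):
            replace_in_place(value, val_from, val_to)
        elif value == val_from:
            node[key] = val_to  # overwrites an existing slot, so the size never changes

# Parse once, mutate the parsed objects, serialize once at the end.
doc = json.loads('[{"a": "placeholder_a", "b": [{"c": "placeholder_a"}, "placeholder_a"]}]')
replace_in_place(doc, "placeholder_a", "X")
print(json.dumps(doc))
# [{"a": "X", "b": [{"c": "X"}, "X"]}]
```

Unlike the version above, this one returns nothing and modifies its argument, which may or may not be what you want if other code still needs the original data.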
Situation
I want to make a function that makes me free to give a full dictionary path parameter, and get back the value or node I need, without doing it node by node.
Code
This is the function. Obviously, as is now, it throws TypeError: unhashable type: 'list'. But it's only for getting the idea.
def get_section(api_data, section):
    if "/" in section:
        section = section.split("/")
        return api_data.json()[section]
    return api_data.json()[section]
Example
JSON
{
"component": {
"name": "gino",
"measures": [
{
"value": "12"
},
{
"value": "14"
}
]
},
"metrics": {
...
}
}
Expectation
analyses = get_section(analyses_data, "component/measures") # Returns measures node
analyses = get_section(analyses_data, "component/name") # Returns 'gino'
analyses = get_section(analyses_data, "component/measures/value") # Returns error, because it's ambiguous
Request
How can I do it?
Edits
Added examples for clarity
A cool solution could be:
def get_section(api_data, section):
    return [api_data := api_data[sec] for sec in section.split("/")][-1]
So if you execute it with:
analyses_data = {
"analyses": {
"dates": {
"xyz": "abc"
}
}
}
print(get_section(analyses_data, "analyses/dates/xyz")) # Returns: abc
Or since you are accessing a json using a custom method:
print(get_section(analyses_data.json(), "analyses/dates/xyz")) # Returns: abc
This works because the := operator in Python (the assignment expression, available since Python 3.8) assigns a value and also evaluates to it. The comprehension loops over the parts of the section string, reassigns api_data to the result of indexing with each key, and collects every intermediate result in a list. The [-1] at the end returns the last assignment, which corresponds to the last accessed key (that is, the deepest dictionary level reached).
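For comparison, the same walk can be written as an ordinary loop, which avoids building the throwaway list and leaves room for an explicit error when the path runs into a list. This is only a sketch: the error message and the `sample` data (trimmed from the question) are assumptions.

```python
def get_section(data, section):
    """Follow a '/'-separated path of dict keys and return what it points to."""
    node = data
    for part in section.split("/"):
        if not isinstance(node, dict):
            # e.g. "component/measures/value": measures is a list, so the path is ambiguous
            raise TypeError(f"cannot look up {part!r} inside a {type(node).__name__}")
        node = node[part]  # raises KeyError if the key is missing
    return node

sample = {"component": {"name": "gino", "measures": [{"value": "12"}, {"value": "14"}]}}
print(get_section(sample, "component/name"))      # gino
print(get_section(sample, "component/measures"))  # [{'value': '12'}, {'value': '14'}]
```

With this version, get_section(sample, "component/measures/value") raises TypeError, which matches the "returns error, because it's ambiguous" expectation in the question.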
I wrote some code to get data from a web API. I was able to parse the JSON data from the API, but the result I get looks quite complex. Here is one example:
>>> my_json
{'name': 'ns1:timeSeriesResponseType', 'declaredType': 'org.cuahsi.waterml.TimeSeriesResponseType', 'scope': 'javax.xml.bind.JAXBElement$GlobalScope', 'value': {'queryInfo': {'creationTime': 1349724919000, 'queryURL': 'http://waterservices.usgs.gov/nwis/iv/', 'criteria': {'locationParam': '[ALL:103232434]', 'variableParam': '[00060, 00065]'}, 'note': [{'value': '[ALL:103232434]', 'title': 'filter:sites'}, {'value': '[mode=LATEST, modifiedSince=null]', 'title': 'filter:timeRange'}, {'value': 'sdas01', 'title': 'server'}]}}, 'nil': False, 'globalScope': True, 'typeSubstituted': False}
Looking through this data, I can see the specific data I want: the 1349724919000 value that is labelled as 'creationTime'.
How can I write code that directly gets this value?
I don't need any searching logic to find this value. I can see what I need when I look at the response; I just need to know how to translate that into specific code to extract the specific value, in a hard-coded way. I read some tutorials, so I understand that I need to use [] to access elements of the nested lists and dictionaries; but I can't figure out exactly how it works for a complex case.
More generally, how can I figure out what the "path" is to the data, and write the code for it?
For reference, let's see what the original JSON would look like, with pretty formatting:
>>> print(json.dumps(my_json, indent=4))
{
"name": "ns1:timeSeriesResponseType",
"declaredType": "org.cuahsi.waterml.TimeSeriesResponseType",
"scope": "javax.xml.bind.JAXBElement$GlobalScope",
"value": {
"queryInfo": {
"creationTime": 1349724919000,
"queryURL": "http://waterservices.usgs.gov/nwis/iv/",
"criteria": {
"locationParam": "[ALL:103232434]",
"variableParam": "[00060, 00065]"
},
"note": [
{
"value": "[ALL:103232434]",
"title": "filter:sites"
},
{
"value": "[mode=LATEST, modifiedSince=null]",
"title": "filter:timeRange"
},
{
"value": "sdas01",
"title": "server"
}
]
}
},
"nil": false,
"globalScope": true,
"typeSubstituted": false
}
That lets us see the structure of the data more clearly.
In the specific case, first we want to look at the corresponding value under the 'value' key in our parsed data. That is another dict; we can access the value of its 'queryInfo' key in the same way, and similarly the 'creationTime' from there.
To get the desired value, we simply put those accesses one after another:
my_json['value']['queryInfo']['creationTime'] # 1349724919000
From the question: "I just need to know how to translate that into specific code to extract the specific value, in a hard-coded way."
If you access the API again, the new data might not match the code's expectation. You may find it useful to add some error handling. For example, use .get() to access dictionaries in the data, rather than indexing:
name = my_json.get('name') # will return None if 'name' doesn't exist
Another way is to test for a key explicitly:
if 'name' in my_json:
    name = my_json['name']
else:
    pass  # the key is missing; handle that case here
However, these approaches may fail if further accesses are required. A placeholder result of None isn't a dictionary or a list, so attempts to access it that way will fail again (with TypeError). Since "Simple is better than complex" and "it's easier to ask for forgiveness than permission", the straightforward solution is to use exception handling:
try:
    creation_time = my_json['value']['queryInfo']['creationTime']
except (TypeError, KeyError):
    print("could not read the creation time!")
    # or substitute a placeholder, or raise a new exception, etc.
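If the same pattern is needed for many paths, the try/except can be wrapped in a small helper. The sketch below is one possible shape, not something from the answer: the name `dig` and its `default` parameter are invented here, and `my_json` is a trimmed copy of the data from the question.

```python
def dig(data, *path, default=None):
    """Index into nested dicts/lists step by step; return default if any step fails."""
    node = data
    for step in path:
        try:
            node = node[step]
        except (KeyError, IndexError, TypeError):
            return default
    return node

# Trimmed copy of the response from the question.
my_json = {'value': {'queryInfo': {'creationTime': 1349724919000,
                                   'note': [{'title': 'filter:sites'}]}}}

print(dig(my_json, 'value', 'queryInfo', 'creationTime'))      # 1349724919000
print(dig(my_json, 'value', 'queryInfo', 'note', 0, 'title'))  # filter:sites
print(dig(my_json, 'value', 'missing', 'creationTime'))        # None
```

Catching IndexError as well covers list positions that do not exist, which a plain (TypeError, KeyError) handler would let through.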
Here is an example of loading a single value from simple JSON data, and converting back and forth to JSON:
import json
# start with a Python dict
data = {"test1": "1", "test2": "2", "test3": "3"}
# serialize the dict to a JSON string
json_str = json.dumps(data)
# parse the JSON string back into a Python dict
resp = json.loads(json_str)
# print the parsed result
print(resp)
# extract a single value from the parsed result
print(resp['test1'])
Try this.
Here, I fetch only statecode from the COVID API (a JSON array).
import requests

r = requests.get('https://api.covid19india.org/data.json')
x = r.json()['statewise']
for i in x:
    print(i['statecode'])
Try this:
from functools import reduce
import re

def deep_get_imps(data, key: str):
    # "result[2]" splits into ["result", "2", ""]; the empty string marks the end
    split_keys = re.split("[\\[\\]]", key)
    out_data = data
    for split_key in split_keys:
        if split_key == "":
            return out_data
        elif isinstance(out_data, dict):
            out_data = out_data.get(split_key)
        elif isinstance(out_data, list):
            try:
                sub = int(split_key)
            except ValueError:
                return None
            else:
                length = len(out_data)
                out_data = out_data[sub] if -length <= sub < length else None
        else:
            return None
    return out_data

def deep_get(dictionary, keys):
    return reduce(deep_get_imps, keys.split("."), dictionary)
Then you can use it like below:
res = {
"status": 200,
"info": {
"name": "Test",
"date": "2021-06-12"
},
"result": [{
"name": "test1",
"value": 2.5
}, {
"name": "test2",
"value": 1.9
},{
"name": "test1",
"value": 3.1
}]
}
>>> deep_get(res, "info")
{'name': 'Test', 'date': '2021-06-12'}
>>> deep_get(res, "info.date")
'2021-06-12'
>>> deep_get(res, "result")
[{'name': 'test1', 'value': 2.5}, {'name': 'test2', 'value': 1.9}, {'name': 'test1', 'value': 3.1}]
>>> deep_get(res, "result[2]")
{'name': 'test1', 'value': 3.1}
>>> deep_get(res, "result[-1]")
{'name': 'test1', 'value': 3.1}
>>> deep_get(res, "result[2].name")
'test1'
I need to get the "name" value inside the "objects" list.
In this example, I need the value 10.0.0.19:
"sourceNetworks":{
"objects":[
{
"type":"Host",
"overridable":false,
"id":"005056BF-7C6E-0ed3-0000-012884911113",
"name":"10.0.0.19"
}
]
}
I can get any information that is not in the "objects" lists with, for example, example_json['metadata']['accessPolicy']['name']
and that correctly returns "mb-test-01" from the JSON, but I don't know the syntax to get the items inside the "objects" list.
To create this JSON, I use a GET request like this:
example_json = requests.get(f"https://{hostname}/api/fmc_config/v1/domain/{uuid}/policy/accesspolicies/{acp_id}/accessrules?expanded=true",headers=header_acp, verify=False).json()
The full JSON follows:
{
"metadata":{
"ruleIndex":1,
"section":"Mandatory",
"category":"--Undefined--",
"accessPolicy":{
"type":"AccessPolicy",
"name":"mb-test-01",
"id":"005056BF-7C6E-0ed3-0000-012884914323"
},
"timestamp":1635219651530,
"domain":{
"name":"Global",
"id":"e276abec-e0f2-11e3-8169-6d9ed49b625f",
"type":"Domain"
}
},
"links":{
"self":"https://fmcrestapisandbox.cisco.com/api/fmc_config/v1/domain/e276abec-e0f2-11e3-8169-6d9ed49b625f/policy/accesspolicies/005056BF-7C6E-0ed3-0000-012884914323/accessrules/005056BF-7C6E-0ed3-0000-000268434442"
},
"enabled":true,
"action":"ALLOW",
"type":"AccessRule",
"id":"005056BF-7C6E-0ed3-0000-000268434442",
"sourceNetworks":{
"objects":[
{
"type":"Host",
"overridable":false,
"id":"005056BF-7C6E-0ed3-0000-012884911113",
"name":"10.0.0.19"
}
]
},
"destinationNetworks":{
"objects":[
{
"type":"Host",
"overridable":false,
"id":"005056BF-7C6E-0ed3-0000-012884911491",
"name":"192.168.0.39"
}
]
},
"logBegin":false,
"logEnd":false,
"variableSet":{
"name":"Default-Set",
"id":"76fa83ea-c972-11e2-8be8-8e45bb1343c0",
"type":"VariableSet"
},
"logFiles":false,
"enableSyslog":false,
"vlanTags":{
},
"sendEventsToFMC":false,
"name":"rule-1"
}
Presumably you want to retrieve all "name"s under "objects" keys so you could use a recursive function:
def get_name(d):
    for k, v in d.items():
        if k == 'objects':
            for i in v:
                yield i.get('name')
        elif isinstance(v, dict):
            yield from get_name(v)

# example_json is the parsed response from the question
names = list(get_name(example_json))
Output:
['10.0.0.19', '192.168.0.39']
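If only one specific list is of interest, plain indexing is enough: "objects" is a list, so an integer position goes between the dictionary keys. The snippet below uses a trimmed, hand-written copy of the response from the question, so the variable contents are an assumption rather than real API output.

```python
# Trimmed copy of the response from the question; only the parts used here.
example_json = {
    "sourceNetworks": {"objects": [{"type": "Host", "name": "10.0.0.19"}]},
    "destinationNetworks": {"objects": [{"type": "Host", "name": "192.168.0.39"}]},
}

# "objects" is a list, so pick an element by position before using a key.
first_source = example_json["sourceNetworks"]["objects"][0]["name"]
print(first_source)  # 10.0.0.19

# Collect every name, tolerating a missing "sourceNetworks" or "objects" key.
source_names = [obj["name"]
                for obj in example_json.get("sourceNetworks", {}).get("objects", [])]
print(source_names)  # ['10.0.0.19']
```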
I am trying to use Python to extract pricePerUnit from JSON. There are many entries, and this is just 2 of them -
{
"terms": {
"OnDemand": {
"7Y9ZZ3FXWPC86CZY": {
"7Y9ZZ3FXWPC86CZY.JRTCKXETXF": {
"offerTermCode": "JRTCKXETXF",
"sku": "7Y9ZZ3FXWPC86CZY",
"effectiveDate": "2020-11-01T00:00:00Z",
"priceDimensions": {
"7Y9ZZ3FXWPC86CZY.JRTCKXETXF.6YS6EN2CT7": {
"rateCode": "7Y9ZZ3FXWPC86CZY.JRTCKXETXF.6YS6EN2CT7",
"description": "Processed translation request in AWS GovCloud (US)",
"beginRange": "0",
"endRange": "Inf",
"unit": "Character",
"pricePerUnit": {
"USD": "0.0000150000"
},
"appliesTo": []
}
},
"termAttributes": {}
}
},
"CQNY8UFVUNQQYYV4": {
"CQNY8UFVUNQQYYV4.JRTCKXETXF": {
"offerTermCode": "JRTCKXETXF",
"sku": "CQNY8UFVUNQQYYV4",
"effectiveDate": "2020-11-01T00:00:00Z",
"priceDimensions": {
"CQNY8UFVUNQQYYV4.JRTCKXETXF.6YS6EN2CT7": {
"rateCode": "CQNY8UFVUNQQYYV4.JRTCKXETXF.6YS6EN2CT7",
"description": "$0.000015 per Character for TextTranslationJob:TextTranslationJob in EU (London)",
"beginRange": "0",
"endRange": "Inf",
"unit": "Character",
"pricePerUnit": {
"USD": "0.0000150000"
},
"appliesTo": []
}
},
"termAttributes": {}
}
}
}
}
}
The issue I run into is that the keys, which in this sample are 7Y9ZZ3FXWPC86CZY, CQNY8UFVUNQQYYV4.JRTCKXETXF, and CQNY8UFVUNQQYYV4.JRTCKXETXF.6YS6EN2CT7, are changing strings that I cannot just type out as I am parsing the dictionary.
I have Python code that works for the first level of these random keys -
import json

with open('index.json') as json_file:
    data = json.load(json_file)
json_keys = list(data['terms']['OnDemand'].keys())
# Get the region
for i in json_keys:
    print(data['terms']['OnDemand'][i])
However, this is tedious, as I would need to run the same code three times to get the other keys like 7Y9ZZ3FXWPC86CZY.JRTCKXETXF and 7Y9ZZ3FXWPC86CZY.JRTCKXETXF.6YS6EN2CT7, since the string changes with each JSON entry.
Is there a way that I can just tell Python to automatically enter the next level of the JSON object, without having to parse all keys, save them, and then iterate through them? Using jq in bash I can do this quite easily with jq -r '.terms[][][]'.
If you are really sure that there is exactly one key-value pair on each level, you can try the following:
def descend(x, depth):
    for i in range(depth):
        x = next(iter(x.values()))
    return x
You can use dict.values() to iterate over the values of a dict. You can also use next(iter(dict.values())) to get the first (or only) value of a dict.
for demand in data['terms']['OnDemand'].values():
    next_level = next(iter(demand.values()))
    print(next_level)
If you expect a number of children other than one on the second level, you can just nest the for loops:
for demand in data['terms']['OnDemand'].values():
    for sub_demand in demand.values():
        print(sub_demand)
If you are interested in the keys too, you can use the dict.items() method to iterate over dict keys and values at the same time:
for demand_key, demand in data['terms']['OnDemand'].items():
    for sub_demand_key, sub_demand in demand.items():
        print(demand_key, sub_demand_key, sub_demand)
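The nested loops can be generalized to any number of levels with a small recursive generator, which is roughly what the jq expression in the question does. This is a sketch under assumptions: `descend_all` is a made-up name, and the `data` dict is a shortened stand-in for the pricing file with placeholder keys (SKU1, SKU2, and so on).

```python
def descend_all(node, depth):
    """Yield every value found exactly `depth` dict levels below `node`."""
    if depth == 0:
        yield node
        return
    for child in node.values():
        yield from descend_all(child, depth - 1)

# Shortened stand-in for the pricing file; the SKU keys are placeholders.
data = {"terms": {"OnDemand": {
    "SKU1": {"SKU1.T": {"priceDimensions": {"SKU1.T.R": {"pricePerUnit": {"USD": "0.0000150000"}}}}},
    "SKU2": {"SKU2.T": {"priceDimensions": {"SKU2.T.R": {"pricePerUnit": {"USD": "0.0000150000"}}}}},
}}}

# Three levels below "terms": OnDemand, then the SKU key, then the SKU.term key.
for term in descend_all(data["terms"], 3):
    for dimension in term["priceDimensions"].values():
        print(dimension["pricePerUnit"]["USD"])
```

Because it iterates over .values() at every level, this never needs to know the changing key strings, and it also works when a level has more than one child.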
I am writing a Python Lambda function that reads a JSON file from S3 and then takes one of the nodes and sends it to another Lambda function. Here is my code:
The JSON snippet I want:
"jobstreams": [
{
"jobname": "team-summary",
"bucket": "aaa-bbb",
"key": "team-summary.json"
}
Step 1: convert the JSON to Python objects for processing
Note: I got these from another Stack Overflow guru. Thanks!
import json
from collections import namedtuple

def _json_object_hook(d): return namedtuple('X', d.keys())(*d.values())
def json2obj(data): return json.loads(data, object_hook=_json_object_hook)
routes = json2obj(jsonText)
Step 2: I then traverse the Python objects, find the JSON I need, and dump it
for jobstream in routes.jobstreams:
    x = json.dumps(jobstream, ensure_ascii=False)
However, when I print it out, I only get the values, not the attribute names. Why is that?
print(json.dumps(jobstream, ensure_ascii=False))
yields
["team-summary", "aaa-bbb", "team-summary.json"]
I'm assuming your full JSON file looks somewhat like what I have in my example:
import json
js = {"jobstreams": [
{
"jobname": "team-summary",
"bucket": "aaa-bbb",
"key": "team-summary.json"
},
{
"jobname": "team-2222",
"bucket": "aaa-2222",
"key": "team-222.json"
}
]}
def extract_by_jobname(jobname):
    for d in js['jobstreams']:
        if d['jobname'] == jobname:
            return d
json.dumps(extract_by_jobname("team-summary"))
# '{"jobname": "team-summary", "bucket": "aaa-bbb", "key": "team-summary.json"}'
I ended up creating a new dictionary from the list that json.dumps gave me:
["team-summary", "aaa-bbb", "team-summary.json"]
Once I had the new (flat) dictionary, I converted that to JSON. It is probably not the most efficient approach, but I have other fish to fry. Thanks to all for your help!
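On the "why": a namedtuple is a tuple subclass, and json.dumps serializes tuples as JSON arrays, so the field names are dropped. A possibly simpler way to get them back, sketched here with a made-up `json_text` sample, is the namedtuple's own _asdict() method.

```python
import json
from collections import namedtuple

def _json_object_hook(d):
    return namedtuple('X', d.keys())(*d.values())

# Hypothetical input, reduced to the one node from the question.
json_text = '{"jobstreams": [{"jobname": "team-summary", "bucket": "aaa-bbb", "key": "team-summary.json"}]}'
routes = json.loads(json_text, object_hook=_json_object_hook)

for jobstream in routes.jobstreams:
    # A namedtuple is dumped like any tuple: values only.
    print(json.dumps(jobstream))
    # ["team-summary", "aaa-bbb", "team-summary.json"]

    # _asdict() restores the field names before dumping.
    print(json.dumps(jobstream._asdict()))
    # {"jobname": "team-summary", "bucket": "aaa-bbb", "key": "team-summary.json"}
```

Note that _asdict() converts only the top level, so namedtuples nested deeper inside would still come out as arrays; for this flat node that does not matter.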