Python - How can I move a nested json dictionary up to its own index? - python

I have a JSON dataset where each item/index can contain 2 nested dictionaries. The problem is that one of these nested dictionaries has exactly the same key:value structure as its parent dictionary. In other words, I have a parent "Account", and any time there are "Sub-Accounts" they are placed in the nested dictionary, so they are never seen as their own standalone item/index.
Here is the sample JSON of one item/index. Essentially, I need the sub_accounts object to be extracted and become its own index. As you can see, it contains all of the same keys as the parent that contains the sub_accounts.
{
"classification": [
{
"classificationId": "Cash",
"taxonomyId": "accounting.gp"
}
],
"id": "235",
"kind": "Real",
"name": "Checking",
"sub_accounts": [
{
"classification": [
{
"classificationId": "Cash",
"taxonomyId": "accounting.gp"
}
],
"id": "236",
"kind": "Real",
"name": "Cash Reserve",
"sub_accounts": []
}
]
},
I have been able to use json_normalize, and even variations of .pop(), to flatten the data, and I have explored other flattening options, but with no luck on the specific task I am trying to accomplish. Those solutions usually leave the sub-accounts still associated with the original index.

You could use a recursive function to traverse the hierarchy while progressively popping out the "sub_accounts" keys:
def extractAccounts(accounts):
    return [s for a in accounts
            for s in (a, *extractAccounts(a.pop("sub_accounts", [])))]
From a list of account objects:
data = [{
"classification": [
{
"classificationId": "Cash",
"taxonomyId": "accounting.gp"
}
],
"id": "235",
"kind": "Real",
"name": "Checking",
"sub_accounts": [
{
"classification": [
{
"classificationId": "Cash",
"taxonomyId": "accounting.gp"
}
],
"id": "236",
"kind": "Real",
"name": "Cash Reserve",
"sub_accounts": []
}
]
}]
Output:
accounts = extractAccounts(data)
for i,account in enumerate(accounts):
    print("Account #",i)
    print(account)
Account # 0
{'classification': [{'classificationId': 'Cash', 'taxonomyId': 'accounting.gp'}], 'id': '235', 'kind': 'Real', 'name': 'Checking'}
Account # 1
{'classification': [{'classificationId': 'Cash', 'taxonomyId': 'accounting.gp'}], 'id': '236', 'kind': 'Real', 'name': 'Cash Reserve'}
If your top level is a single account (i.e. not a list), just place it in a list when calling the function: extractAccounts([data])
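Note that the function above modifies its input, because pop removes "sub_accounts" from each original dictionary. If the input has to stay intact, a non-mutating variant could look like the sketch below. The name flatten_accounts and the trimmed sample data are made up for illustration; the function builds shallow copies without the "sub_accounts" key.

```python
def flatten_accounts(accounts):
    """Return parents followed by their descendants, leaving the input untouched."""
    flat = []
    for account in accounts:
        # Shallow copy of the account without its nested children.
        flat.append({k: v for k, v in account.items() if k != "sub_accounts"})
        # Recurse into the children (a missing key is treated as "no children").
        flat.extend(flatten_accounts(account.get("sub_accounts", [])))
    return flat


# Trimmed-down stand-in for the account data in the question.
sample = [{"id": "235", "name": "Checking",
           "sub_accounts": [{"id": "236", "name": "Cash Reserve", "sub_accounts": []}]}]
flat = flatten_accounts(sample)
print([a["id"] for a in flat])  # ['235', '236']
```

Because only shallow copies are made, nested values such as the "classification" lists are still shared with the original objects.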

I don't have a generic answer, but this seems to do what you need:
raw_data = """
[
{
"classification": [
{
"classificationId": "Cash",
"taxonomyId": "accounting.gp"
}
],
"id": "235",
"kind": "Real",
"name": "Checking",
"sub_accounts": [
{
"classification": [
{
"classificationId": "Cash",
"taxonomyId": "accounting.gp"
}
],
"id": "236",
"kind": "Real",
"name": "Cash Reserve",
"sub_accounts": []
}
]
}
]
"""
import json
jdict = json.loads(raw_data)
empty_list = list()
result = list()
for elem in jdict:
    sub_elem_list = elem['sub_accounts']
    elem['sub_accounts'] = empty_list
    result.append(elem)
    for sub_elem in sub_elem_list:
        result.append(sub_elem)
print(json.dumps(result, indent=4))
output = """
[
{
"classification": [
{
"classificationId": "Cash",
"taxonomyId": "accounting.gp"
}
],
"id": "235",
"kind": "Real",
"name": "Checking",
"sub_accounts": []
},
{
"classification": [
{
"classificationId": "Cash",
"taxonomyId": "accounting.gp"
}
],
"id": "236",
"kind": "Real",
"name": "Cash Reserve",
"sub_accounts": []
}
]
"""
When you have nested structures, you need to nest your loops. The other answer uses recursion, which can cause problems if the nesting exceeds Python's default recursion limit of roughly a thousand calls (so probably not in this case). Note that the loops above only handle one level of sub-accounts. I also assumed that you care about order, preferring the parent's id to come first. Finally, if you want to get rid of sub_accounts in the JSON, you would pop it from the records, but I again assumed that the structure should be maintained.
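For sub-accounts nested to an arbitrary depth, the same parent-first ordering can be had without recursion by keeping an explicit stack. This is only a sketch under the assumptions stated in this answer (the "sub_accounts" key is kept but emptied); the function name flatten_iterative and the tiny sample are hypothetical.

```python
def flatten_iterative(accounts):
    """Flatten nested accounts parent-first using an explicit stack instead of recursion."""
    result = []
    # Reverse so that popping from the end yields accounts in their original order.
    stack = list(reversed(accounts))
    while stack:
        account = stack.pop()
        children = account.get("sub_accounts", [])
        # Append a copy whose "sub_accounts" is emptied, so the structure is maintained.
        result.append({**account, "sub_accounts": []})
        # Children go on top of the stack so they directly follow their parent.
        stack.extend(reversed(children))
    return result


# Hypothetical three-level hierarchy: A -> (B -> C), D
tree = [{"id": "A", "sub_accounts": [
    {"id": "B", "sub_accounts": [{"id": "C", "sub_accounts": []}]},
    {"id": "D", "sub_accounts": []},
]}]
ordered = flatten_iterative(tree)
print([a["id"] for a in ordered])  # ['A', 'B', 'C', 'D']
```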

Related

Is there any solution for printing multiple nodes?

I have the following JSON file:
{
"layout": {
"user": {
"pages": {
"Home": { // first page
"sub_domain_name": "demo",
"title": "Home",
"page_id": 111111,
"desktop": [ // first node in first page
{
"grid_name": "row",
"component_id": "31fac419-f1ff-4614",
"main_classes": "border border-primary",
"items": [ // first node in desktop
{
"grid_name": "col",
"component_id": "111111",
"component_name": "Card",
"items": []
}
]
},
{
"grid_name": "row",
"component_id": "222222",
"main_classes": "border-top-0",
"items": [
{
"grid_name": "col",
"component_id": "3333",
"component_name": "Author",
"items": []
}
]
},
]
}
"section": [
"s11"
],
"mobile": [ //second node in first page
{
"grid_name": "row",
"component_id": "4444",
"main_classes": "border",
"items": [ // first node in mobile
{
"grid_name": "col",
"component_id": "5555",
"component_name": "Card",
"items": []
}
]
},
{
"grid_name": "row",
"component_id": "888",
"main_classes": "border-top-0 ",
"items": [
"grid_name": "col",
"component_id": "999"
"component_name": "Card",
"items": []
},
{
"grid_name": "col",
"component_id": "11",
"component_name": "Media",
"items": []
}
]
}
]
}
]
},
"contact": { // second page
"sub_domain_name": "demo3",
"title": "contact",
"page_id": 2222222,
"desktop": [ // first node in second page
{
"grid_name": "row",
"component_id": "22",
"component_name": "Table",
"items": []
},
{
"grid_name": "row",
"component_id": "99",
"component_name": "Table",
}
}
],
"section": [
"s4"
],
"mobile": [ // second node in second page
{
"grid_name": "row",
"component_id": "67",
"component_name": "Component",
"items": []
},
{
"grid_name": "row",
"component_id": "0000",
"component_name": "Table",
}
]
}
I want to print every component_name from this JSON file using Python. I'm not sure what the problem is, though. Does anyone see the problem?
import json
myJsonFile=open("config-2_new.json")
jsondata=myJsonFile.read()
obj=json.loads(jsondata)
dict_process=obj['layout']
for i in dict_process:
    inner_page=dict_process[i]['pages']
    pages_list=inner_page.keys()
    for j in pages_list:
        desktop=inner_page[j]['desktop']
        for x in range(0,len(desktop)):
            child_keys=desktop[x]['child']
            for k in child_keys:
                print (k['component_name'])
Thanks in advance for any answers.
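Your JSON is invalid. To parse a specific key, you can use the jsonpath-ng module. Also see "jsonpath_rw for Python - parsing a specific key".
I removed the invalid parts of the JSON and then parsed it; you can use the expression below.
I have stored the parsed JSON (the result of json.loads) in the json_data variable. Since items is a list, it needs its own [*], and each match exposes the found value through its value attribute.
[match.value for match in jsonpath_ng.parse("layout.user.pages.Home.desktop[*].items[*].component_name").find(json_data)]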
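If installing an extra module is not an option, a small recursive generator from the standard library alone can collect every component_name at any depth, in any page, and under both desktop and mobile. This is a sketch, not part of the answer above: the helper name find_values is made up, and the sample is a trimmed, valid stand-in for the layout in the question.

```python
def find_values(node, key):
    """Yield every value stored under `key` anywhere inside nested dicts and lists."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            # Keep descending: the value itself may contain more matches.
            yield from find_values(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_values(item, key)


# Trimmed-down, valid stand-in for the layout in the question.
layout = {"layout": {"user": {"pages": {"Home": {"desktop": [
    {"grid_name": "row", "items": [
        {"grid_name": "col", "component_name": "Card", "items": []}]},
    {"grid_name": "row", "items": [
        {"grid_name": "col", "component_name": "Author", "items": []}]},
]}}}}}
names = list(find_values(layout, "component_name"))
print(names)  # ['Card', 'Author']
```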

Exchange of 2 json data values which has different keys

I want to exchange the values of two JSON documents, but their keys are different from each other. I don't know how to map the values from one to the other.
sample json1: A
{
"contact_person":"Mahmut Kapur",
"contact_people": [
{
"email": "m#gmail.com",
"last_name": "Kapur"
}
],
"addresses": [
{
"city": "istanbul",
"country": "CA",
"first_name": "Mahmut",
"street1": "adres 1",
"zipcode": "34678",
"id": "5f61f72b8348230004f149fd"
}
],
"created_at": "2020-09-16T07:29:47.244-04:00",
"updated_at": "2020-09-16T07:32:50.567-04:00"
}
sample json2: B
The values in this example represent the keys in JSON A.
{
"Customer":{
"DisplayName":"contact_person",
"PrimaryEmailAddr":{
"Address":"contact_people/email"
},
"FamilyName":"contact_people/last_name",
"BillAddr":{
"City":"addresses/city",
"CountrySubDivisionCode":"addresses/country",
"Line1":"addresses/street1",
"PostalCode":"addresses/zipcode",
"Id":"addresses/id"
},
"GivenName":"addresses/first_name",
"MetaData":{
"CreateTime":"created_at",
"LastUpdatedTime":"updated_at"
}
}
}
The outcome needs to be:
{
"Customer":{
"DisplayName":"Mahmut Kapur",
"PrimaryEmailAddr":{
"Address":"m#gmail.com"
},
"FamilyName":"Kapur",
"BillAddr":{
"City":"istanbul",
"CountrySubDivisionCode":"CA",
"Line1":"adres 1",
"PostalCode":"34678",
"Id":"5f61f72b8348230004f149fd"
},
"GivenName":"Mahmut",
"MetaData":{
"CreateTime":"2020-09-16T07:29:47.244-04:00",
"LastUpdatedTime":"2020-09-16T07:32:50.567-04:00"
}
}
}
So the important thing here is to match the keys. I hope I was able to explain my problem.
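This code can do the work for you. I don't know if someone can make it shorter. It basically walks through the dicts down to the leaf level and, for each leaf, looks up the matching value in a (taking the first element of the list when the leaf has the form "list_key/field").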
a={
"contact_person":"Mahmut Kapur",
"contact_people": [
{
"email": "m#gmail.com",
"last_name": "Kapur"
}
],
"addresses": [
{
"city": "istanbul",
"country": "CA",
"first_name": "Mahmut",
"street1": "adres 1",
"zipcode": "34678",
"id": "5f61f72b8348230004f149fd"
}
],
"created_at": "2020-09-16T07:29:47.244-04:00",
"updated_at": "2020-09-16T07:32:50.567-04:00",
}
b={
"Customer":{
"DisplayName":"contact_person",
"PrimaryEmailAddr":{
"Address":"contact_people/email"
},
"FamilyName":"contact_people/last_name",
"BillAddr":{
"City":"addresses/city",
"CountrySubDivisionCode":"addresses/country",
"Line1":"addresses/street1",
"PostalCode":"addresses/zipcode",
"Id":"addresses/id"
},
"GivenName":"addresses/first_name",
"MetaData":{
"CreateTime":"created_at",
"LastUpdatedTime":"updated_at"
}
}
}
import json  # needed for json.dumps below

for keys in b:
    if isinstance(b[keys], dict):
        for items in b[keys]:
            if isinstance(b[keys][items], dict):
                # Third level, e.g. Customer -> BillAddr -> City
                for leaf in b[keys][items]:
                    if "/" in b[keys][items][leaf]:
                        # "addresses/city" -> a["addresses"][0]["city"]
                        getter=b[keys][items][leaf].split("/")
                        b[keys][items][leaf]=a[getter[0]][0][getter[1]]
                    else:
                        b[keys][items][leaf]=a[b[keys][items][leaf]]
            else:
                # Second level, e.g. Customer -> DisplayName
                if "/" in b[keys][items]:
                    getter=b[keys][items].split("/")
                    b[keys][items]=a[getter[0]][0][getter[1]]
                else:
                    b[keys][items]=a[b[keys][items]]
    else:
        # Top-level leaf
        if "/" in b[keys]:
            getter=b[keys].split("/")
            b[keys]=a[getter[0]][0][getter[1]]
        else:
            b[keys]=a[b[keys]]
print(json.dumps(b,indent=4))
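The nested loops above are tied to a fixed depth of three levels. A recursive version could handle any depth and returns a new dictionary instead of overwriting b. The sketch below assumes, like the code above, that every leaf of the template is a string and that the first element is used whenever a list is encountered; the name resolve and the shortened sample data are made up for illustration.

```python
def resolve(template, source):
    """Return a copy of `template` with each path string replaced by the value from `source`."""
    if isinstance(template, dict):
        return {key: resolve(value, source) for key, value in template.items()}
    # Leaf: a string such as "addresses/city" or "created_at".
    node = source
    for part in template.split("/"):
        # Assumption: when a list is encountered, its first element is used.
        if isinstance(node, list):
            node = node[0]
        node = node[part]
    return node


# Shortened versions of the two JSON documents from the question.
source = {"contact_person": "Mahmut Kapur",
          "addresses": [{"city": "istanbul", "zipcode": "34678"}]}
template = {"Customer": {"DisplayName": "contact_person",
                         "BillAddr": {"City": "addresses/city",
                                      "PostalCode": "addresses/zipcode"}}}
resolved = resolve(template, source)
print(resolved)
```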

Adding a list into a list in Python

I'm receiving JSON back from an API call and want to log when a keyword has been detected. Sometimes there may be one, none, or several returned from the API. I'm able to log the data that comes back with no problem.
I want to run 1000s of requests and then have each result logged as a list of results within a list (so I know which list corresponds to which API call).
for item in response['output']['keywords']:
    TempEntityList = []
    TempEntityList.append(item['keywords'])
    EntityList.extend(TempEntityList)
    TempEntityList = []
This does append everything to a list, but I can't seem to find the right setup.
When I run it twice, I get the below:
['Chat', 'Case', 'Telephone','Chat', 'Case', 'Telephone']
When really I want
[['Chat', 'Case', 'Telephone'],['Chat', 'Case', 'Telephone']]
I'm creating TempEntityList, appending any matches found to it, extending EntityList by what has been found, and then clearing TempEntityList down for the next API call.
What's the best way to log each set of results to a nested list? So far I've only managed to get one large list, or every item as its own nested item.
As requested, the payload that is returned looks like the below:
{
"output": {
"intents": [],
"entities": [
{
"entity": "Chat",
"location": [
0,
4
],
"value": "Chat",
"confidence": 1
},
{
"entity": "Case",
"location": [
5,
9
],
"value": "Case",
"confidence": 1
},
{
"entity": "Telephone",
"location": [
10,
19
],
"value": "Telephony",
"confidence": 1
}
],
"generic": []
},
"context": {
"global": {
"system": {
"turn_count": 1
},
"session_id": "xxx-xxx-xxx"
},
"skills": {
"main skill": {
"user_defined": {
"Case": "Case",
"Chat": "Chat",
"Telephone": "Telephony"
},
"system": {
"state": "x"
}
}
}
}
}
{
"output": {
"intents": [],
"entities": [
{
"entity": "Chat",
"location": [
0,
4
],
"value": "Chat",
"confidence": 1
},
{
"entity": "Case",
"location": [
5,
9
],
"value": "Case",
"confidence": 1
},
{
"entity": "Telephone",
"location": [
10,
19
],
"value": "Telephony",
"confidence": 1
}
],
"generic": []
},
"context": {
"global": {
"system": {
"turn_count": 1
},
"session_id": "xxx-xxx-xxx"
},
"skills": {
"main skill": {
"user_defined": {
"Case": "Case",
"Chat": "Chat",
"Telephone": "Telephony"
},
"system": {
"state": "xxx-xxx-xxx"
}
}
}
}
}
{
"output": {
"intents": [],
"entities": [
{
"entity": "Chat",
"location": [
0,
4
],
"value": "Chat",
"confidence": 1
},
{
"entity": "Case",
"location": [
5,
9
],
"value": "Case",
"confidence": 1
},
{
"entity": "Telephone",
"location": [
10,
19
],
"value": "Telephony",
"confidence": 1
}
],
"generic": []
},
Firstly, since you have TempEntityList = [] at the beginning of the for loop, you don't need another TempEntityList = [] at the bottom. To answer the question, use list.append() instead of list.extend():
for item in response['output']['keywords']:
    TempEntityList = []
    TempEntityList.append(item['keywords'])
    EntityList.append(TempEntityList)
I've managed to get what I want, thanks everyone for the suggestions.
The solution was:
global EntityList
EntityList = []
for item in response['output']['entities']:
    EntityList.append(item['entity'])
FinalList.append(EntityList)
After running the function three times on the same input, this produced:
[['Chat', 'Case', 'Telephone'], ['Chat', 'Case', 'Telephone'], ['Chat', 'Case', 'Telephone']]
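The same result can be reached without global state by returning one list per response and collecting those lists in the caller. This is only a sketch: the function name extract_entities and the two stand-in responses are hypothetical, and the key names follow the payload shown in the question.

```python
def extract_entities(response):
    """Return the entity names found in one API response (empty list if none)."""
    return [item["entity"] for item in response["output"]["entities"]]


# Hypothetical stand-ins for two API responses.
responses = [
    {"output": {"entities": [{"entity": "Chat"}, {"entity": "Case"}]}},
    {"output": {"entities": []}},
]
# One inner list per API call, in call order.
all_results = [extract_entities(r) for r in responses]
print(all_results)  # [['Chat', 'Case'], []]
```

A response without any entities still contributes an empty inner list, so the position of each inner list keeps matching the API call it came from.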

How to flatten JSON response from Surveymonkey API

I'm setting up a Python function to use the Surveymonkey API to get survey responses from Surveymonkey.
The API returns responses in a JSON format with a deep recursive file structure.
I'm having issues trying to flatten this JSON so that it can go into Google Cloud Storage.
I have tried to flatten the response using json_normalize (the code is shown after the sample response below). It works; however, it does not transform the data into the format that I am looking for.
{
"per_page": 2,
"total": 1,
"data": [
{
"total_time": 0,
"collection_mode": "default",
"href": "https://api.surveymonkey.com/v3/responses/5007154325",
"custom_variables": {
"custvar_1": "one",
"custvar_2": "two"
},
"custom_value": "custom identifier for the response",
"edit_url": "https://www.surveymonkey.com/r/",
"analyze_url": "https://www.surveymonkey.com/analyze/browse/",
"ip_address": "",
"pages": [
{
"id": "73527947",
"questions": [
{
"id": "273237811",
"answers": [
{
"choice_id": "1842351148"
},
{
"text": "I might be text or null",
"other_id": "1842351149"
}
]
},
{
"id": "273240822",
"answers": [
{
"choice_id": "1863145815",
"row_id": "1863145806"
},
{
"text": "I might be text or null",
"other_id": "1863145817"
}
]
},
{
"id": "273239576",
"answers": [
{
"choice_id": "1863156702",
"row_id": "1863156701"
},
{
"text": "I might be text or null",
"other_id": "1863156707"
}
]
},
{
"id": "296944423",
"answers": [
{
"text": "I might be text or null"
}
]
}
]
}
],
"date_modified": "1970-01-17T19:07:34+00:00",
"response_status": "completed",
"id": "5007154325",
"collector_id": "50253586",
"recipient_id": "0",
"date_created": "1970-01-17T19:07:34+00:00",
"survey_id": "105723396"
}
],
"page": 1,
"links": {
"self": "https://api.surveymonkey.com/v3/surveys/123456/responses/bulk?page=1&per_page=2"
}
}
answers_df = json_normalize(data=response_json['data'],
                            record_path=['pages', 'questions', 'answers'],
                            meta=['id', ['pages', 'questions', 'id'], ['pages', 'id']])
Instead of returning a row for each question id, I need it to return a column for each question id, choice_id, and text field.
The columns I would like to see are total_time, collection_mode, href, custom_variables.custvar_1, custom_variables.custvar_2, custom_value, edit_url, analyze_url, ip_address, pages.id, pages.questions.0.id, pages.questions.0.answers.0.choice_id, pages.questions.0.answers.0.text, pages.questions.0.answers.0.other_id
In other words, instead of each question ID, choice_id, text, and answer being on a separate row, I would like a column for each one, so that there is only one row per survey_id (or per index in data).
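One possible way to get a single wide row per response is to flatten each record into a dictionary keyed by dotted paths before building a table. The sketch below is an assumption about what would fit, not a confirmed solution: the helper name flatten is made up, and it puts the list index at every level, so it produces pages.0.id rather than pages.id.

```python
def flatten(node, prefix=""):
    """Flatten nested dicts/lists into one dict keyed by dotted paths."""
    if isinstance(node, dict):
        items = node.items()
    elif isinstance(node, list):
        items = enumerate(node)
    else:
        # Scalar leaf: store it under the accumulated path.
        return {prefix: node}
    flat = {}
    for key, value in items:
        path = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten(value, path))
    return flat


# Shortened stand-in for one element of response_json['data'].
record = {"id": "5007154325",
          "custom_variables": {"custvar_1": "one"},
          "pages": [{"id": "73527947", "questions": [
              {"id": "273237811", "answers": [{"choice_id": "1842351148"}]}]}]}
row = flatten(record)
print(sorted(row))
```

Applying this to every element of response_json['data'] gives a list of flat dictionaries, which pandas.DataFrame accepts directly; responses with fewer answers simply leave the extra columns empty. Empty lists and dictionaries produce no column at all.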

Get "path" of parent keys and indices in dictionary of nested dictionaries and lists

I am receiving a large json from Google Assistant and I want to retrieve some specific details from it. The json is the following:
{
"responseId": "************************",
"queryResult": {
"queryText": "actions_intent_DELIVERY_ADDRESS",
"action": "delivery",
"parameters": {},
"allRequiredParamsPresent": true,
"fulfillmentMessages": [
{
"text": {
"text": [
""
]
}
}
],
"outputContexts": [
{
"name": "************************/agent/sessions/1527070836044/contexts/actions_capability_screen_output"
},
{
"name": "************************/agent/sessions/1527070836044/contexts/more",
"parameters": {
"polar": "no",
"polar.original": "No",
"cardinal": 2,
"cardinal.original": "2"
}
},
{
"name": "************************/agent/sessions/1527070836044/contexts/actions_capability_audio_output"
},
{
"name": "************************/agent/sessions/1527070836044/contexts/actions_capability_media_response_audio"
},
{
"name": "************************/agent/sessions/1527070836044/contexts/actions_intent_delivery_address",
"parameters": {
"DELIVERY_ADDRESS_VALUE": {
"userDecision": "ACCEPTED",
"#type": "type.googleapis.com/google.actions.v2.DeliveryAddressValue",
"location": {
"postalAddress": {
"regionCode": "US",
"recipients": [
"Amazon"
],
"postalCode": "NY 10001",
"locality": "New York",
"addressLines": [
"450 West 33rd Street"
]
},
"phoneNumber": "+1 206-266-2992"
}
}
}
},
{
"name": "************************/agent/sessions/1527070836044/contexts/actions_capability_web_browser"
}
],
"intent": {
"name": "************************/agent/intents/86fb2293-7ae9-4bed-adeb-6dfe8797e5ff",
"displayName": "Delivery"
},
"intentDetectionConfidence": 1,
"diagnosticInfo": {},
"languageCode": "en-gb"
},
"originalDetectIntentRequest": {
"source": "google",
"version": "2",
"payload": {
"isInSandbox": true,
"surface": {
"capabilities": [
{
"name": "actions.capability.MEDIA_RESPONSE_AUDIO"
},
{
"name": "actions.capability.SCREEN_OUTPUT"
},
{
"name": "actions.capability.AUDIO_OUTPUT"
},
{
"name": "actions.capability.WEB_BROWSER"
}
]
},
"inputs": [
{
"rawInputs": [
{
"query": "450 West 33rd Street"
}
],
"arguments": [
{
"extension": {
"userDecision": "ACCEPTED",
"#type": "type.googleapis.com/google.actions.v2.DeliveryAddressValue",
"location": {
"postalAddress": {
"regionCode": "US",
"recipients": [
"Amazon"
],
"postalCode": "NY 10001",
"locality": "New York",
"addressLines": [
"450 West 33rd Street"
]
},
"phoneNumber": "+1 206-266-2992"
}
},
"name": "DELIVERY_ADDRESS_VALUE"
}
],
"intent": "actions.intent.DELIVERY_ADDRESS"
}
],
"user": {
"lastSeen": "2018-05-23T10:20:25Z",
"locale": "en-GB",
"userId": "************************"
},
"conversation": {
"conversationId": "************************",
"type": "ACTIVE",
"conversationToken": "[\"more\"]"
},
"availableSurfaces": [
{
"capabilities": [
{
"name": "actions.capability.SCREEN_OUTPUT"
},
{
"name": "actions.capability.AUDIO_OUTPUT"
},
{
"name": "actions.capability.WEB_BROWSER"
}
]
}
]
}
},
"session": "************************/agent/sessions/1527070836044"
}
This large json returns amongst other things to my back-end the delivery address details of the user (here I use Amazon's NY locations details as an example). Therefore, I want to retrieve the location dictionary which is near the end of this large json. The location details appear also near the start of this json but I want to retrieve specifically the second location dictionary which is near the end of this large json.
For this reason, I had to read through this json by myself and manually test some possible "paths" of the location dictionary within this large json to find out finally that I had to write the following line to retrieve the second location dictionary:
location = json['originalDetectIntentRequest']['payload']['inputs'][0]['arguments'][0]['extension']['location']
Therefore, my question is the following: is there any concise way to retrieve automatically the "path" of the parent keys and indices of the second location dictionary within this large json?
Hence, I expect that the general format of the output from a function which does this for all the occurrences of the location dictionary in any json will be the following:
[["path" of first `location` dictionary], ["path" of second `location` dictionary], ["path" of third `location` dictionary], ...]
where for the json above it will be
[["path" of first `location` dictionary], ["path" of second `location` dictionary]]
as there are two occurrences of the location dictionary with
["path" of second `location` dictionary] = ['originalDetectIntentRequest', 'payload', 'inputs', 0, 'arguments', 0, 'extension', 'location']
I have in my mind relevant posts on StackOverflow (Python--Finding Parent Keys for a specific value in a nested dictionary) but I am not sure that these apply exactly to my problem since these are for parent keys in nested dictionaries whereas here I am talking about the parent keys and indices in dictionary with nested dictionaries and lists.
I solved this by using a recursive search:
# result and path should be outside of the scope of find_path to persist values during recursive calls to the function
result = []
path = []
from copy import copy

# i is the index of the list that dict_obj is part of
def find_path(dict_obj,key,i=None):
    for k,v in dict_obj.items():
        # add key to path
        path.append(k)
        if isinstance(v,dict):
            # continue searching
            find_path(v, key,i)
        if isinstance(v,list):
            # search through list of dictionaries
            for i,item in enumerate(v):
                # add the index of list that item dict is part of, to path
                path.append(i)
                if isinstance(item,dict):
                    # continue searching in item dict
                    find_path(item, key,i)
                # if reached here, the last added index was incorrect, so removed
                path.pop()
        if k == key:
            # add path to our result
            result.append(copy(path))
        # remove the key added in the first line
        if path != []:
            path.pop()

# di is the parsed JSON dictionary from the question
# default starting index is set to None
find_path(di,"location")
print(result)
# [['queryResult', 'outputContexts', 4, 'parameters', 'DELIVERY_ADDRESS_VALUE', 'location'], ['originalDetectIntentRequest', 'payload', 'inputs', 0, 'arguments', 0, 'extension', 'location']]
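Because result and path are module-level lists, they have to be reset before the function is reused on another document. A variant that passes the path along as an argument avoids that. This is a sketch rather than a drop-in replacement: the name iter_paths and the small sample document are made up, and unlike the code above it reports an outer match before any match nested inside it.

```python
def iter_paths(node, key, path=()):
    """Yield the path (keys and list indices) to every occurrence of `key`."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield list(path) + [k]
            # Keep searching below this key, extending the path by the key.
            yield from iter_paths(v, key, path + (k,))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            # Extend the path by the list index of the item.
            yield from iter_paths(item, key, path + (index,))


# Small hypothetical document with two "location" entries.
doc = {"a": [{"location": {"x": 1}}], "b": {"c": [0, {"location": 2}]}}
paths = list(iter_paths(doc, "location"))
print(paths)  # [['a', 0, 'location'], ['b', 'c', 1, 'location']]
```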
