Pandas dataframe into nested child dictionary


I have a dataframe like below, where each 'level' drills down into more detail, with the last level having an id value.
data = [
    {'id': 1, 'level_1': 'Animals', 'level_2': 'Carnivores', 'level_3': 'Felidae', 'level_4': 'Siamese Cat'},
    {'id': 2, 'level_1': 'Animals', 'level_2': 'Carnivores', 'level_3': 'Felidae', 'level_4': 'Javanese Cat'},
    {'id': 3, 'level_1': 'Animals', 'level_2': 'Carnivores', 'level_3': 'Ursidae', 'level_4': 'Polar Bear'},
    {'id': 4, 'level_1': 'Animals', 'level_2': 'Carnivores', 'level_3': 'Canidae', 'level_4': 'Labradore Retriever'},
    {'id': 5, 'level_1': 'Animals', 'level_2': 'Carnivores', 'level_3': 'Canidae', 'level_4': 'Golden Retriever'}
]
I want to turn this into a nested dictionary of parent / child relationships like below.
var data = {
"name": "Animals",
"children": [
{
"name": "Carnivores",
"children": [
{
"name": "Felidae",
"children": [
{
"id": 1,
"name": "Siamese Cat",
"children": []
},
{
"id": 2,
"name": "Javanese Cat",
"children": []
}
]
},
{
"name": "Ursidae",
"children": [
{
"id": 3,
"name": "Polar Bear",
"children": []
}
]
},
{
"name": "Canidae",
"children": [
{
"id": 4,
"name": "Labradore Retriever",
"children": []
},
{
"id": 5,
"name": "Golden Retriever",
"children": []
}
]
}
]
}
]
}
I've tried several approaches of grouping the dataframe and also looping over individual rows, but haven't been able to find a working solution yet. Any help would be greatly appreciated!

The answer by @Timus matches your intended structure; however, you might find that dictionary hard to search, since every level has a name key and a children key. If that is what you intended, ignore my answer. If you would rather have a dictionary you can search by unique keys, you can try:
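import pandas as pd
df = pd.DataFrame(data)
df = df.set_index(['level_1', 'level_2', 'level_3', 'level_4'])
def make_dictionary(df):
    # Innermost level reached: turn the remaining column into {'id': {name: id}}
    if df.index.nlevels == 1:
        return df.to_dict()
    dictionary = {}
    for key in df.index.get_level_values(0).unique():
        # xs() selects the rows for this key and drops the outer index level
        sub_df = df.xs(key)
        dictionary[key] = make_dictionary(sub_df)
    return dictionary
make_dictionary(df)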
It requires setting the different levels as index, and you will end up with a slightly different dictionary:
{'Animals':
{'Carnivores':
{'Felidae':
{'id': {'Siamese Cat': 1,
'Javanese Cat': 2}},
'Ursidae':
{'id': {'Polar Bear': 3}},
'Canidae':
{'id': {'Labradore Retriever': 4,
'Golden Retriever': 5}}}
}
}

EDIT: Had to make an adjustment, because the result wasn't exactly as expected.
Here's an attempt that produces the expected output (if I haven't made a mistake, which wouldn't be a surprise, because I've made several on the way):
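import pandas as pd
def pack_level(df):
    # Leaf level: one dict per row, built from the id and the deepest level name
    if df.columns[0] == 'id':
        return [{'id': i, 'name': name, 'children': []}
                for i, name in zip(df['id'], df[df.columns[-1]])]
    # Inner level: the first column holds this level's name and the last one
    # holds the lists of already packed children, which get flattened
    return [{'name': df.iloc[0, 0],
             'children': [entry for lst in df[df.columns[-1]]
                          for entry in lst]}]
df = pd.DataFrame(data)
columns = list(df.columns[1:])
# Note: this relies on groupby().apply() passing the grouping columns to
# pack_level, a behavior that recent pandas versions deprecate
df = df.groupby(columns[:-1]).apply(pack_level)
# Move up one level per iteration, packing the children of each group
for i in range(1, len(columns) - 1):
    df = (df.reset_index(level=-1, drop=False).groupby(columns[:-i])
          .apply(pack_level)
          .reset_index(level=-1, drop=True))
var_data = {'name': df.index[0], 'children': df.iloc[0]}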
The result looks a bit different at first glance, but that should be only due to the sorting (from printing):
{
"children": [
{
"children": [
{
"children": [
{
"children": [],
"id": 4,
"name": "Labradore Retriever"
},
{
"children": [],
"id": 5,
"name": "Golden Retriever"
}
],
"name": "Canidae"
},
{
"children": [
{
"children": [],
"id": 1,
"name": "Siamese Cat"
},
{
"children": [],
"id": 2,
"name": "Javanese Cat"
}
],
"name": "Felidae"
},
{
"children": [
{
"children": [],
"id": 3,
"name": "Polar Bear"
}
],
"name": "Ursidae"
}
],
"name": "Carnivores"
}
],
"name": "Animals"
}
I've tried to be as generic as possible, but the first column has to be named id (as in your sample).
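For comparison, here is a plain-Python sketch (no pandas, and not taken from either answer) that builds the requested name/children structure in a single pass over the rows, keeping children in first-seen order, so the output follows the order of the question rather than sorted order. build_tree and its parameters are hypothetical names; it assumes each row carries the id of its deepest level, as in the sample. If you start from the DataFrame, df.to_dict('records') gives rows in this shape.

```python
def build_tree(rows, level_keys, id_key='id'):
    """Nest flat rows into {'name': ..., 'children': [...]} nodes.

    Hypothetical helper: level_keys lists the level columns from the outermost
    to the innermost; only the innermost node gets the row's id.
    """
    root = {'children': []}
    nodes = {}  # path tuple of names -> node, so each node is created only once
    for row in rows:
        parent, path = root, ()
        for depth, key in enumerate(level_keys):
            path += (row[key],)
            if path not in nodes:
                node = {'name': row[key], 'children': []}
                if depth == len(level_keys) - 1:
                    # Leaf node: prepend the id to mirror the desired output
                    node = {id_key: row[id_key], **node}
                nodes[path] = node
                parent['children'].append(node)
            parent = nodes[path]
    return root['children']


data = [
    {'id': 1, 'level_1': 'Animals', 'level_2': 'Carnivores', 'level_3': 'Felidae', 'level_4': 'Siamese Cat'},
    {'id': 2, 'level_1': 'Animals', 'level_2': 'Carnivores', 'level_3': 'Felidae', 'level_4': 'Javanese Cat'},
    {'id': 3, 'level_1': 'Animals', 'level_2': 'Carnivores', 'level_3': 'Ursidae', 'level_4': 'Polar Bear'},
    {'id': 4, 'level_1': 'Animals', 'level_2': 'Carnivores', 'level_3': 'Canidae', 'level_4': 'Labradore Retriever'},
    {'id': 5, 'level_1': 'Animals', 'level_2': 'Carnivores', 'level_3': 'Canidae', 'level_4': 'Golden Retriever'},
]
levels = ['level_1', 'level_2', 'level_3', 'level_4']
# The sample has a single top-level node ('Animals'), so take the first one
var_data = build_tree(data, levels)[0]
```

Keying the lookup table on the full path (not just the name) keeps two nodes with the same name under different parents apart.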

Related

How to extract JSON from a nested JSON file?

I am calling an API and getting a response like the one below.
{
"status": 200,
"errmsg": "OK",
"data": {
"total": 12,
"items": [{
"id": 11,
"name": "BBC",
"priority": 4,
"levelStr": "All",
"escalatingChainId": 3,
"escalatingChain": {
"inAlerting": false,
"throttlingAlerts": 20,
"enableThrottling": true,
"name": "Example123",
"destination": [],
"description": "",
"ccdestination": [],
"id": 3,
"throttlingPeriod": 10
}
},
{
"id": 21,
"name": "CNBC",
"priority": 4,
"levelStr": "All",
"escalatingChainId": 3,
"escalatingChain": {
"inAlerting": false,
"throttlingAlerts": 20,
"enableThrottling": true,
"name": "Example456",
"destination": [],
"description": "",
"ccdestination": [],
"id": 3,
"throttlingPeriod": 10
}
}
]
}
}
I need to clean up this JSON a bit and produce a simpler JSON like the one below, where escalatingChainName is the name from the escalatingChain object, so that I can write it to a CSV file.
{
"items": [{
"id": 11,
"name": "BBC",
"priority": 4,
"levelStr": "All",
"escalatingChainId": 3,
"escalatingChainName": "Example123"
},
{
"id": 21,
"name": "CNBC",
"priority": 4,
"levelStr": "All",
"escalatingChainId": 3,
"escalatingChainName": "Example456"
}
]
}
Is there a JSON function that I can use to copy only the necessary key-value or nested key-values to a new JSON object?
With the code below, I can get the details list.
json_response = response.json()
items = json_response['data']
details = items['items']
I can print individual list items using
for x in details:
    print(x)
How do I take it from here to pull only the necessary fields, like id, name, priority and the name from escalatingChain, to create a new list or JSON?

There is no existing function that will do what you want, so you'll need to write one. Fortunately that's not too hard in this case — basically you just create a list of new items by extracting the pieces of data you want from the existing ones.
import json
json_response = """\
{
"status": 200,
"errmsg": "OK",
"data": {
"total": 12,
"items": [{
"id": 11,
"name": "BBC",
"priority": 4,
"levelStr": "All",
"escalatingChainId": 3,
"escalatingChain": {
"inAlerting": false,
"throttlingAlerts": 20,
"enableThrottling": true,
"name": "Example123",
"destination": [],
"description": "",
"ccdestination": [],
"id": 3,
"throttlingPeriod": 10
}
},
{
"id": 21,
"name": "CNBC",
"priority": 4,
"levelStr": "All",
"escalatingChainId": 3,
"escalatingChain": {
"inAlerting": false,
"throttlingAlerts": 20,
"enableThrottling": true,
"name": "Example456",
"destination": [],
"description": "",
"ccdestination": [],
"id": 3,
"throttlingPeriod": 10
}
}
]
}
}
"""
response = json.loads(json_response)
cleaned = []
for item in response['data']['items']:
    cleaned.append({'id': item['id'],
                    'name': item['name'],
                    'priority': item['priority'],
                    'levelStr': item['levelStr'],
                    'escalatingChainId': item['escalatingChainId'],
                    'escalatingChainName': item['escalatingChain']['name']})
print('cleaned:')
print(json.dumps(cleaned, indent=4))
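Since the stated goal is a CSV file, the cleaned list can go straight to csv.DictWriter from the standard library. A minimal sketch, with write_csv as a hypothetical helper and the rows hard-coded from the expected output so the block runs on its own; it writes to an in-memory buffer, and items.csv in the comment is only an example file name.

```python
import csv
import io

# Rows in the shape produced by the loop above (values from the question)
cleaned = [
    {'id': 11, 'name': 'BBC', 'priority': 4, 'levelStr': 'All',
     'escalatingChainId': 3, 'escalatingChainName': 'Example123'},
    {'id': 21, 'name': 'CNBC', 'priority': 4, 'levelStr': 'All',
     'escalatingChainId': 3, 'escalatingChainName': 'Example456'},
]


def write_csv(rows, fileobj):
    # Use the keys of the first row as the header, in their original order
    writer = csv.DictWriter(fileobj, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)


# For a real file, use open('items.csv', 'w', newline='') instead of StringIO
buffer = io.StringIO()
write_csv(cleaned, buffer)
print(buffer.getvalue())
```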

You can try:
data = {
"status": 200,
"errmsg": "OK",
"data": {
"total": 12,
"items": [{
"id": 11,
"name": "BBC",
"priority": 4,
"levelStr": "All",
"escalatingChainId": 3,
"escalatingChain": {
"inAlerting": False,
"throttlingAlerts": 20,
"enableThrottling": True,
"name": "Example123",
"destination": [],
"description": "",
"ccdestination": [],
"id": 3,
"throttlingPeriod": 10
}
},
{
"id": 21,
"name": "CNBC",
"priority": 4,
"levelStr": "All",
"escalatingChainId": 3,
"escalatingChain": {
"inAlerting": False,
"throttlingAlerts": 20,
"enableThrottling": True,
"name": "Example456",
"destination": [],
"description": "",
"ccdestination": [],
"id": 3,
"throttlingPeriod": 10
}
}
]
}
}
for single_item in data["data"]["items"]:
    print(single_item["id"])
    print(single_item["name"])
    print(single_item["priority"])
    print(single_item["levelStr"])
    print(single_item["escalatingChain"]["inAlerting"])
    # and so on

There are two ways to approach this with Python list and dictionary comprehensions, depending on whether you're dealing with a variable or a .json file.
Where the variable data (a nested dictionary) is already defined:
# keys you want to keep from each item
to_keep = ['id', 'name', 'priority', 'levelStr', 'escalatingChainId',
           'escalatingChainName']
new_data = [{k: v for k, v in low_dict.items() if k in to_keep}
            for low_dict in data['data']['items']]
# for every nested dictionary inside an item (here only escalatingChain),
# build an {'escalatingChainName': ...} entry from its 'name'
escalations = [{v + 'Name': k[v]['name']} for k in data['data']['items']
               for v in k if isinstance(k[v], dict)]
# merge both lists of dictionaries into a flattened list of dictionaries
# (zip pairs them correctly only if each item has exactly one nested dict)
new_data = [{**new, **escl} for new, escl in zip(new_data, escalations)]
Or, since you refer to the json package, if you have saved the response as a .json file:
import json
with open('response.json', 'r') as handl:
    data = json.load(handl)
to_keep = ['id', 'name', 'priority', 'levelStr', 'escalatingChainId',
           'escalatingChainName']
new_data = [{k: v for k, v in low_dict.items() if k in to_keep}
            for low_dict in data['data']['items']]
escalations = [{v + 'Name': k[v]['name']} for k in data['data']['items']
               for v in k if isinstance(k[v], dict)]
new_data = [{**new, **escl} for new, escl in zip(new_data, escalations)]
Both produce this output:
[{'id': 11,
'name': 'BBC',
'priority': 4,
'levelStr': 'All',
'escalatingChainId': 3,
'escalatingChainName': 'Example123'},
{'id': 21,
'name': 'CNBC',
'priority': 4,
'levelStr': 'All',
'escalatingChainId': 3,
'escalatingChainName': 'Example456'}]

Django MPTT Queryset to nested dictionary without recursive calling

Django MPTT is a smart library that makes only a single query to get all nested data.
Is there a way to get the data as a nested dictionary without recursive calls?
queryset = MyTreeModel.objects.values()
results = get_nested_dict(queryset) ???
results >>
{
'id': 7,
'name': 'parent',
'children': [
{
'id': 8,
'parent_id': 7,
'name': 'child',
'children': [
{
'id': 9,
'parent_id': 8,
'name': 'grandchild',
}
]
}
]
}
How to create get_nested_dict() without recursive calling?

My category hierarchy
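from mptt.utils import get_cached_trees
from .models import Category
# get_cached_trees expects nodes in tree (depth-first) order; ordering by
# name only works when it preserves that order
categories_list = Category.objects.all().order_by('name')
def get_nested_dictionary(queryset):
    # One query: get_cached_trees loads every node and caches its children
    roots = get_cached_trees(queryset)
    def form_a_tree(objects):
        tree = []
        for obj in objects:
            # Served from the cache filled by get_cached_trees, no extra query
            children = obj.get_children()
            dictionary_category_tree = {'id': obj.id, 'name': obj.name}
            if children:
                dictionary_category_tree.update({'children': form_a_tree(children)})
            tree.append(dictionary_category_tree)
        return tree
    return form_a_tree(roots)
get_nested_dictionary(categories_list)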
Result:
[
{
"id": 9,
"name": "C++",
"children": [
{
"id": 10,
"name": "C++ for beginners"
}
]
},
{
"id": 5,
"name": "JS",
"children": [
{
"id": 7,
"name": "JS for beginners"
}
]
},
{
"id": 1,
"name": "Python",
"children": [
{
"id": 2,
"name": "Python for beginners",
"children": [
{
"id": 4,
"name": "some subcategory"
},
{
"id": 6,
"name": "some subcategory2"
}
]
}
]
}
]
My solution makes only one query to the database.
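The answer above reduces database access to one query, but form_a_tree still recurses in Python. If you want to avoid recursion entirely, one option is to link nodes through an id lookup table in two loops. A minimal sketch, assuming the rows look like the question's MyTreeModel.objects.values() output (each with id, parent_id and name, where parent_id is None for roots, i.e. the tree field is named parent) and that every parent is present in the same list; get_nested_dict is the question's hypothetical name.

```python
def get_nested_dict(rows):
    """Link flat rows into trees with loops only (no recursion).

    Assumes each row is a dict with 'id' and 'parent_id' (None for roots)
    and that every parent_id refers to a row in the same list.
    """
    # First pass: one node dict per row, each with an empty children list
    nodes = {row['id']: {**row, 'children': []} for row in rows}
    roots = []
    # Second pass: attach every node to its parent (or collect it as a root)
    for row in rows:
        node = nodes[row['id']]
        if row['parent_id'] is None:
            roots.append(node)
        else:
            nodes[row['parent_id']]['children'].append(node)
    return roots


rows = [
    {'id': 7, 'parent_id': None, 'name': 'parent'},
    {'id': 8, 'parent_id': 7, 'name': 'child'},
    {'id': 9, 'parent_id': 8, 'name': 'grandchild'},
]
results = get_nested_dict(rows)
```

Because nodes are shared by reference, attaching a child to its parent also updates every ancestor's view of the tree. Unlike the target in the question, roots keep parent_id: None and leaves keep an empty children list. In Django you would pass list(MyTreeModel.objects.values()) to it.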

How to Manipulate dictionary data in python [duplicate]

I need your help to solve this task: I have a list of dicts with the following data about products:
- id;
- title;
- country;
- seller;
In the output, I expect all dictionaries with the same id to be grouped together, with a new key called "info" that holds a list of dicts with the "country" and "seller" of each product.
Input data
data = [
{"id": 1, "title": "Samsung", "country": "France", "seller": "amazon_fr"},
{"id": 2, "title": "Apple", "country": "Spain", "seller": "amazon_es"},
{"id": 2, "title": "Apple", "country": "Italy", "seller": "amazon_it"},
]
Output data
result = [
{"id": 1, "title": "Samsung", "info": [{"country": "France", "seller": "amazon_fr"}]},
{"id": 2, "title": "Apple", "info": [{"country": "Spain", "seller": "amazon_es"}, {"country": "Italy", "seller": "amazon_it"}]},
]
Thanks a lot in advance for your efforts.
P.S. Pandas solutions are also appreciated.

Here's a plain Python solution: it builds a result dictionary keyed by the id value of each dictionary in data, and appends to the matching entry's info list whenever an id is seen again. The values of that dictionary are then used to create the output list:
data = [
    {"id": 1, "title": "Samsung", "country": "France", "seller": "amazon_fr"},
    {"id": 2, "title": "Apple", "country": "Spain", "seller": "amazon_es"},
    {"id": 2, "title": "Apple", "country": "Italy", "seller": "amazon_it"},
]
result = {}
for d in data:
    product_id = d['id']  # avoid shadowing the built-in id()
    info = {"country": d['country'], "seller": d['seller']}
    if product_id in result:
        result[product_id]['info'].append(info)
    else:
        result[product_id] = {"id": product_id, "title": d['title'], "info": [info]}
result = list(result.values())
print(result)
Output:
[
{'id': 1, 'title': 'Samsung', 'info': [{'country': 'France', 'seller': 'amazon_fr'}]},
{'id': 2, 'title': 'Apple', 'info': [{'country': 'Spain', 'seller': 'amazon_es'},
                                     {'country': 'Italy', 'seller': 'amazon_it'}
                                    ]
}
]

You can use itertools.groupby:
from operator import itemgetter
from itertools import groupby
# groupby only merges consecutive items, so sort by id first (in place)
data.sort(key=itemgetter('id'))
group = groupby(data, key=lambda x: (x['id'], x['title']))
result = [
    {'id': i, 'title': t, 'info': [{'country': d['country'], 'seller': d['seller']} for d in v]}
    for (i, t), v in group]
Output:
[{'id': 1,
'title': 'Samsung',
'info': [{'country': 'France', 'seller': 'amazon_fr'}]},
{'id': 2,
'title': 'Apple',
'info': [{'country': 'Spain', 'seller': 'amazon_es'},
{'country': 'Italy', 'seller': 'amazon_it'}]}]
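The question also asks for a pandas solution. A minimal sketch using DataFrame.groupby, assuming pandas is installed; grouping on both id and title assumes every id always carries the same title, as in the sample.

```python
import pandas as pd

data = [
    {"id": 1, "title": "Samsung", "country": "France", "seller": "amazon_fr"},
    {"id": 2, "title": "Apple", "country": "Spain", "seller": "amazon_es"},
    {"id": 2, "title": "Apple", "country": "Italy", "seller": "amazon_it"},
]

df = pd.DataFrame(data)
# sort=False keeps the groups in order of first appearance
result = [
    {'id': int(product_id),  # numpy integer -> plain int (JSON friendly)
     'title': title,
     'info': group[['country', 'seller']].to_dict('records')}
    for (product_id, title), group in df.groupby(['id', 'title'], sort=False)
]
print(result)
```

Iterating over the groupby object directly avoids groupby.apply and its handling of the grouping columns.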

How to transform a flattened data to a structured json?

This is the primary flattened element, i.e. the input data:
['a-ab-aba-abaa-abaaa', 'a-ab-aba-abab', 'a-ac-aca-acaa', 'a-ac-aca-acab']
This is the target data I need, i.e. the output data:
[
{
"title": "a",
"children": [
{
"title": "ab",
"children": [
{
"title": "aba",
"children": [
{
"title": "abaa",
"children": [
{
"title": "abaaa"
}
]
},
{
"title": "abab"
}
]
}
]
},
{
"title": "ac",
"children": [
{
"title": "aca",
"children": [
{
"title": "acaa"
},
{
"title": "acab"
}
]
}
]
}
]
}
]
I thought I could use deeply nested for loops to generate this JSON data, but that is very difficult because the number of levels will be greater than 10, so I don't think plain for loops can handle it. Is there an algorithm or a package I can use to implement a function that achieves this?
I'd be so grateful if you shared your approach, God bless you!

Here is a recursive solution using itertools. I don't know if it is efficient enough for your purpose, but it works. It transforms your list of strings into a list of lists, divides that into groups with the same first key, then builds the dict for each group and repeats with the first key removed.
from itertools import groupby
from pprint import pprint
data = ['a-ab-aba-abaa-abaaa', 'a-ab-aba-abab', 'a-ac-aca-acaa', 'a-ac-aca-acab']
components = [x.split("-") for x in data]
def build_dict(component_list):
    key = lambda x: x[0]
    component_list = sorted(component_list, key=key)
    # divide into lists with the same first key
    sublists = groupby(component_list, key)
    result = []
    for name, values in sublists:
        value = {}
        value["title"] = name
        # recurse on the remaining components, dropping paths that are used up
        value["children"] = build_dict([x[1:] for x in values if x[1:]])
        result.append(value)
    return result
pprint(build_dict(components))
Output:
[{'children': [{'children': [{'children': [{'children': [{'children': [],
'title': 'abaaa'}],
'title': 'abaa'},
{'children': [], 'title': 'abab'}],
'title': 'aba'}],
'title': 'ab'},
{'children': [{'children': [{'children': [], 'title': 'acaa'},
{'children': [], 'title': 'acab'}],
'title': 'aca'}],
'title': 'ac'}],
'title': 'a'}]
To convert the result to JSON, you can use json.dumps from the json module. I hope my explanation is clear.

Here is a start:
def populate_levels(dct, levels):
    if levels:
        if levels[0] not in dct:
            dct[levels[0]] = {}
        populate_levels(dct[levels[0]], levels[1:])
def create_final(dct):
    final = []
    for title in dct:
        final.append({"title": title, "children": create_final(dct[title])})
    return final
data = ['a-ab-aba-abaa-abaaa', 'a-ab-aba-abab', 'a-ac-aca-acaa', 'a-ac-aca-acab']
template = {}
for item in data:
    populate_levels(template, item.split('-'))
final = create_final(template)
I couldn't see a clean way to do it all at once, so I created this intermediate template dictionary. Right now, if a node has no children, its dict will contain 'children': []; you can change this behavior in the create_final function if you like.
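As suggested above, create_final can be changed so that leaf nodes get no children key at all, which matches the target output in the question exactly. A sketch of that variant; it also swaps the recursive populate_levels for an equivalent loop using dict.setdefault. create_final still recurses once per level, which stays far below Python's default recursion limit for trees of 10 or more levels.

```python
def populate_levels(dct, levels):
    # Loop version of populate_levels: walk down, creating missing levels
    for level in levels:
        dct = dct.setdefault(level, {})


def create_final(dct):
    final = []
    for title, sub in dct.items():
        node = {"title": title}
        # Only add "children" when the node actually has children
        if sub:
            node["children"] = create_final(sub)
        final.append(node)
    return final


data = ['a-ab-aba-abaa-abaaa', 'a-ab-aba-abab', 'a-ac-aca-acaa', 'a-ac-aca-acab']
template = {}
for item in data:
    populate_levels(template, item.split('-'))
final = create_final(template)
```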

You can use collections.defaultdict:
from collections import defaultdict
def get_struct(d):
    _d = defaultdict(list)
    for a, *b in d:
        _d[a].append(b)
    return [{'title': a, 'children': get_struct(filter(None, b))} for a, b in _d.items()]
data = ['a-ab-aba-abaa-abaaa', 'a-ab-aba-abab', 'a-ac-aca-acaa', 'a-ac-aca-acab']
import json
print(json.dumps(get_struct([i.split('-') for i in data]), indent=4))
Output:
[
{
"title": "a",
"children": [
{
"title": "ab",
"children": [
{
"title": "aba",
"children": [
{
"title": "abaa",
"children": [
{
"title": "abaaa",
"children": []
}
]
},
{
"title": "abab",
"children": []
}
]
}
]
},
{
"title": "ac",
"children": [
{
"title": "aca",
"children": [
{
"title": "acaa",
"children": []
},
{
"title": "acab",
"children": []
}
]
}
]
}
]
}
]
