I'm trying to export a DataFrame to nested (hierarchical) JSON for D3.js, using a solution that only handles one level (parent, children). Any help would be appreciated; I'm new to Python.
My DataFrame contains 7 levels.
Here is the expected JSON format:
{
"name": "World",
"children": [
{
"name": "Europe",
"children": [
{
"name": "France",
"children": [
{
"name": "Paris",
"population": 1000000
}]
}]
}]
}
And here is the Python method:
import json

def to_flare_json(df, filename):
    """Convert a DataFrame into nested JSON as in the flare files used for D3.js"""
    d = {"name": "World", "children": []}
    for index, row in df.iterrows():
        parent = row[0]
        child = row[1]
        # levels 3 to 7 are read but never used, which is the problem
        child1 = row[2]
        child2 = row[3]
        child3 = row[4]
        child4 = row[5]
        child5 = row[6]
        child_value = row[7]
        # Make a list of the names already present under the root
        key_list = [item['name'] for item in d['children']]
        # If 'parent' is NOT yet a key in the flare JSON, append it
        if parent not in key_list:
            d['children'].append({"name": parent, "children": [{"name": child, "value": child_value}]})
        # If 'parent' IS already a key, add a new child to it
        else:
            d['children'][key_list.index(parent)]['children'].append({"name": child, "value": child_value})
    # Export the final result to a JSON file
    with open(filename + '.json', 'w') as outfile:
        json.dump(d, outfile, indent=4, ensure_ascii=False)
    return "Done"
[EDIT]
Here is a sample of my df
World Continent Region Country State City Boroughs Population
1 Europe Western Europe France Ile de France Paris 17 821964
1 Europe Western Europe France Ile de France Paris 19 821964
1 Europe Western Europe France Ile de France Paris 20 821964
The structure you want is clearly recursive so I made a recursive function to fill it:
def create_entries(df):
    entries = []
    # Stopping case
    if df.shape[1] == 2:  # only 2 columns left
        for i in range(df.shape[0]):  # iterating on rows
            entries.append(
                {"Name": df.iloc[i, 0],
                 df.columns[-1]: df.iloc[i, 1]}
            )
    # Iterating case
    else:
        values = set(df.iloc[:, 0])  # the set of unique values in the first column
        for v in values:
            entries.append(
                {"Name": v,
                 # reiterate the process without the first column,
                 # keeping only the rows with the current value
                 "Children": create_entries(
                     df.loc[df.iloc[:, 0] == v].iloc[:, 1:]
                 )}
            )
    return entries
All that's left is to create the dictionary and call the function:
mydict = {"Name": "World",
          "Children": create_entries(data.iloc[:, 1:])}
Then you just write your dict to a JSON file.
I hope my comments are explicit enough, the idea is to recursively use the first column of the dataset as the "Name" and the rest as the "Children".
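For the last step, a minimal sketch of writing the dict to a JSON file with the standard json module (the filename is arbitrary, and the small dict here stands in for the real result of create_entries):

```python
import json

# Stand-in for the dict built with create_entries above
mydict = {"Name": "World", "Children": []}

with open("flare.json", "w") as f:
    json.dump(mydict, f, indent=4, ensure_ascii=False)
```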
Thank you Syncrossus for the answer, but this results in a separate branch for each borough or city.
The result is this:
"Name": "World",
"Children": [
{
"Name": "Western Europe",
"Children": [
{
"Name": "France",
"Children": [
{
"Name": "Ile de France",
"Children": [
{
"Name": "Paris",
"Children": [
{
"Name": "17ème",
"Population": 821964
}
]
}
]
}
]
}
]
},{
"Name": "Western Europe",
"Children": [
{
"Name": "France",
"Children": [
{
"Name": "Ile de France",
"Children": [
{
"Name": "Paris",
"Children": [
{
"Name": "10ème",
"Population": 154623
}
]
}
]
}
]
}
]
}
But the desired result is this
"Name": "World",
"Children": [
{
"Continent": "Europe",
"Children": [
{
"Region": "Western Europe",
"Children": [
{
"Country": "France",
"Children": [
{
"State": "Ile De France",
"Children": [
{
"City": "Paris",
"Children": [
{
"Boroughs": "17ème",
"Population": 82194
},
{
"Boroughs": "16ème",
"Population": 99194
}
]
},
{
"City": "Saint-Denis",
"Children": [
{
"Boroughs": "10ème",
"Population": 1294
},
{
"Boroughs": "11ème",
"Population": 45367
}
]
}
]
}
]
},
{
"Country": "Belgium",
"Children": [
{
"State": "Oost-Vlaanderen",
"Children": [
{
"City": "Gent",
"Children": [
{
"Boroughs": "2ème",
"Population": 1234
},
{
"Boroughs": "4ème",
"Population": 7456
}
]
}
]
}
]
}
]
}
]
}
]
Related question:
I have a nested JSON file and would like to remove duplicates based on two keys.
JSON example:
"books": [
{
"id": "1",
"story": {
"title": "Lonely lion"
},
"description": [
{
"release": false,
"author": [
{
"name": "John",
"main": 1
},
{
"name": "Jeroge",
"main": 0
},
{
"name": "Peter",
"main": 0
}
]
}
]
},
{
"id": "2",
"story": {
"title": "Lonely lion"
},
"description": [
{
"release": false,
"author": [
{
"name": "Jeroge",
"main": 1
},
{
"name": "Peter",
"main": 0
},
{
"name": "John",
"main": 0
}
]
}
]
},
{
"id": "3",
"story": {
"title": "Lonely lion"
},
"description": [
{
"release": false,
"author": [
{
"name": "John",
"main": 1
},
{
"name": "Jeroge",
"main": 0
}
]
}
]
}
]
Here I try to match the title and the author names. For example, id 1 and id 2 are duplicates: the titles are the same and the author names are the same (the author order doesn't matter, and the main attribute can be ignored). So in the output JSON only one of id 1 and id 2 should remain, along with id 3. In the final output I need two files.
Output_JSON:
"books": [
{
"id": "1",
"story": {
"title": "Lonely lion"
},
"description": [
{
"release": false,
"author": [
{
"name": "John",
"main": 1
},
{
"name": "Jeroge",
"main": 0
},
{
"name": "Peter",
"main": 0
}
]
}
]
},
{
"id": "3",
"story": {
"title": "Lonely lion"
},
"description": [
{
"release": false,
"author": [
{
"name": "John",
"main": 1
},
{
"name": "Jeroge",
"main": 0
}
]
}
]
}
]
duplicatedID.csv:
1-2
I tried the following method, but it is not giving correct results:
list = []
duplicate_Id = []
for data in (json_data['books'])[:]:
    elements = []
    id = data['id']
    title = data['story']['title']
    elements.append(title)
    for i in (data['description'][0]['author']):
        name = (i['name'])
        elements.append(name)
    if not list:
        list.append(elements)
    else:
        for j in list:
            if set(elements) == set(j):
                duplicate_Id.append(id)
        elements = []
The general idea is to:
Group the books with a key function that collects duplicates together.
Then return the first entry of each group, ensuring no duplicates.
Define the key function as the sorted list of author names: the set of authors is by definition the unique key, but the authors may appear in any order.
import json
from itertools import groupby

def getAuthors(book):
    authors = book['description'][0]['author']
    return sorted([author['name'] for author in authors])

def transform(books):
    # itertools.groupby only groups consecutive items, so sort by the key first
    books = sorted(books, key=getAuthors)
    groups = [list(group) for _, group in groupby(books, key=getAuthors)]
    return [group[0] for group in groups]

with open('books.json') as f:
    j = json.load(f)
print(transform(j['books']))
If we want the duplicates instead, we do the same computation but return any group with length > 1, since that is by our definition duplicated data.
def transform(books):
    books = sorted(books, key=getAuthors)
    groups = [list(group) for _, group in groupby(books, key=getAuthors)]
    return [group for group in groups if len(group) > 1]
Where j['books'] is the JSON you gave enclosed in an object.
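To also produce the duplicatedID.csv file from the question, one could join the ids within each duplicated group (a sketch; the tiny `books` list here is a stand-in for `j['books']`, and the `-`-joined line format follows the question):

```python
from itertools import groupby

def getAuthors(book):
    authors = book['description'][0]['author']
    return sorted(author['name'] for author in authors)

# Stand-in for j['books']; only the ids and author names matter here
books = [
    {'id': '1', 'description': [{'author': [{'name': 'John'}, {'name': 'Jeroge'}]}]},
    {'id': '2', 'description': [{'author': [{'name': 'Jeroge'}, {'name': 'John'}]}]},
    {'id': '3', 'description': [{'author': [{'name': 'John'}]}]},
]

# Sort first so groupby sees duplicates as consecutive items
books = sorted(books, key=getAuthors)
dup_groups = [list(g) for _, g in groupby(books, key=getAuthors)]
dup_groups = [g for g in dup_groups if len(g) > 1]

with open('duplicatedID.csv', 'w') as f:
    for group in dup_groups:
        f.write('-'.join(book['id'] for book in group) + '\n')
```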
I'm coding a tool that reads an xlsx file and converts it to JSON. I'm using Python 3 and pandas 0.23.0 for it. Here is the data that my code is reading from the xlsx:
id label id_customer label_customer part_number
6 Sao Paulo CUST-99992 Brazil 7897
6 Sao Paulo CUST-99992 Brazil 1437
92 Hong Hong CUST-88888 China 785
==================================
Here is my code:
import pandas as pd
import json
file_imported = pd.read_excel('testing.xlsx', sheet_name='Plan1')
list_final = []
for index, row in file_imported.iterrows():
    list1 = []
    list_final.append({
        "id": int(row['id']),
        "label": str(row['label']),
        "Customer": list1
    })
    list2 = []
    list1.append({
        "id": str(row['id_customer']),
        "label": str(row['label_customer']),
        "number": list2
    })
    list2.append({
        "part": str(row['part_number'])
    })
print(list_final)
with open('testing.json', 'w') as f:
    json.dump(list_final, f, indent=True)
==================================
My code is working, and this is the output that I'm getting:
[
{
"id": 6,
"label": "Sao Paulo",
"Customer": [
{
"id": "CUST-99992",
"label": "Brazil",
"number" : [
{
"part": "7897"
}
]
}
]
},
{
"id": 6,
"label": "Sao Paulo",
"Customer": [
{
"id": "CUST-99992",
"label": "Brazil",
"number" : [
{
"part": "1437"
}
]
}
]
},
{
"id": 92,
"label": "Hong Hong",
"Customer": [
{
"id": "CUST-88888",
"label": "China",
"number" : [
{
"part": "785"
}
]
}
]
}
]
==================================
and I need something like this:
[
{
"id": 6,
"label": "Sao Paulo",
"Customer": [
{
"id": "CUST-99992",
"label": "Brazil",
"number" : [
{
"part": "7897"
},
{
"part": "1437"
}
]
}
]
},
{
"id": 92,
"label": "Hong Hong",
"Customer": [
{
"id": "CUST-88888",
"label": "China",
"number" : [
{
"part": "785"
}
]
}
]
}
]
==================================
I have been searching other topics here and any useful material, but haven't found anything yet. This is just a piece of my code and Excel file (they are too big to post here). I believe I have to use an 'if' statement to verify the content of each row before adding it to my JSON, but I don't know how to do it.
I can have a lot of 'Customer' and 'number' lists inside 'list_final' with different contents (this is why I structured my Excel file like that).
Could anyone help me?
Try this code. I assume your data is in a pandas DataFrame.
import pandas as pd

def part(value):
    data = value.split("#")
    part_list = []
    for elements in data:
        part_list.append({"part": elements})
    return part_list

path = yourpath
data = pd.read_excel(path)
data["part_number"] = data["part_number"].apply(lambda x: str(x))
# Collapse duplicate rows, joining the part numbers with a separator
data = data.groupby(["id", "label", "id_customer", "label_customer"], as_index=False).agg("#".join)
data["part_number"] = data["part_number"].apply(lambda x: part(x))
data = data.rename(columns={"id_customer": "Customer", "part_number": "number"})
data["label_customer"] = data["label_customer"].apply(lambda x: {"label": x})
data["Customer"] = data["Customer"].apply(lambda x: {"id": x})
data["number"] = data["number"].apply(lambda x: {"number": x})
# Merge the three dicts into a single one-element "Customer" list
data["Customer"] = data.apply(lambda x: [{**x["Customer"], **x["label_customer"], **x["number"]}], axis=1)
data = data[["id", "label", "Customer"]]
data.to_json(path_you_want, orient="records")
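To see the idea without an Excel file, the same groupby/agg pipeline can be run on an inline DataFrame (a sketch; the data mirrors the sample above, and the final nesting is done in plain Python instead of to_json):

```python
import pandas as pd

data = pd.DataFrame({
    "id": [6, 6, 92],
    "label": ["Sao Paulo", "Sao Paulo", "Hong Hong"],
    "id_customer": ["CUST-99992", "CUST-99992", "CUST-88888"],
    "label_customer": ["Brazil", "Brazil", "China"],
    "part_number": ["7897", "1437", "785"],
})

# Collapse duplicate rows, joining the part numbers with a separator
data = data.groupby(["id", "label", "id_customer", "label_customer"], as_index=False).agg("#".join)
data["part_number"] = data["part_number"].apply(lambda x: [{"part": p} for p in x.split("#")])

# Rebuild the nested structure row by row
records = [
    {"id": int(row["id"]),
     "label": row["label"],
     "Customer": [{"id": row["id_customer"],
                   "label": row["label_customer"],
                   "number": row["part_number"]}]}
    for _, row in data.iterrows()
]
print(records)
```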
I just need contexts to be an array, i.e. 'contexts': [{}] instead of 'contexts': {}.
Below is my Python code, which converts the DataFrame to the required JSON format.
This is a sample df with one row:
name type aim context
xxx xxx specs 67646546 United States of America
data = {'entities': []}
for key, grp in df.groupby('name'):
    for idx, row in grp.iterrows():
        temp_dict_alpha = {'name': key, 'type': row['type'], 'data': {'contexts': {'attributes': {}, 'context': {'dcountry': row['dcountry']}}}}
        attr_row = row[~row.index.isin(['name', 'type'])]
        for idx2, row2 in attr_row.iteritems():
            dict_temp = {}
            dict_temp[idx2] = {'values': []}
            dict_temp[idx2]['values'].append({'value': row2, 'source': 'internal', 'locale': 'en_Us'})
            temp_dict_alpha['data']['contexts']['attributes'].update(dict_temp)
        data['entities'].append(temp_dict_alpha)
print(json.dumps(data, indent=4))
Desired output:
{
"entities": [{
"name": "XXX XXX",
"type": "specs",
"data": {
"contexts": [{
"attributes": {
"aim": {
"values": [{
"value": 67646546,
"source": "internal",
"locale": "en_Us"
}
]
}
},
"context": {
"country": "United States of America"
}
}
]
}
}
]
}
However, I am getting the output below:
{
"entities": [{
"name": "XXX XXX",
"type": "specs",
"data": {
"contexts": {
"attributes": {
"aim": {
"values": [{
"value": 67646546,
"source": "internal",
"locale": "en_Us"
}
]
}
},
"context": {
"country": "United States of America"
}
}
}
}
]
}
Can anyone please suggest ways to solve this problem in Python?
I think this does it:
import pandas as pd
import json

df = pd.DataFrame([['xxx xxx', 'specs', '67646546', 'United States of America']],
                  columns=['name', 'type', 'aim', 'context'])
data = {'entities': []}
for key, grp in df.groupby('name'):
    for idx, row in grp.iterrows():
        temp_dict_alpha = {'name': key, 'type': row['type'], 'data': {'contexts': [{'attributes': {}, 'context': {'country': row['context']}}]}}
        attr_row = row[~row.index.isin(['name', 'type'])]
        for idx2, row2 in attr_row.iteritems():
            if idx2 != 'aim':
                continue
            dict_temp = {}
            dict_temp[idx2] = {'values': []}
            dict_temp[idx2]['values'].append({'value': row2, 'source': 'internal', 'locale': 'en_Us'})
            temp_dict_alpha['data']['contexts'][0]['attributes'].update(dict_temp)
        data['entities'].append(temp_dict_alpha)
print(json.dumps(data, indent=4))
Output:
{
"entities": [
{
"name": "xxx xxx",
"type": "specs",
"data": {
"contexts": [
{
"attributes": {
"aim": {
"values": [
{
"value": "67646546",
"source": "internal",
"locale": "en_Us"
}
]
}
},
"context": {
"country": "United States of America"
}
}
]
}
}
]
}
The problem is in the following code:
temp_dict_alpha = {'name':key,'type':row['type'],'data' :{'contexts':{'attributes':{},'context':{'dcountry':row['dcountry']}}}}
As you can see, you are already creating contexts as a dict and assigning values to it. What you could do is something like this:
contextList = []
for idx, row in grp.iterrows():
    # Build one context object per row instead of a single shared dict
    contextObj = {'attributes': {}, 'context': {'dcountry': row['dcountry']}}
    attr_row = row[~row.index.isin(['name', 'type'])]
    for idx2, row2 in attr_row.iteritems():
        dict_temp = {}
        dict_temp[idx2] = {'values': []}
        dict_temp[idx2]['values'].append({'value': row2, 'source': 'internal', 'locale': 'en_Us'})
        contextObj['attributes'].update(dict_temp)
    contextList.append(contextObj)
Please note: this sketch may still have logical errors and might not run as-is (it is difficult for me to fully follow the logic behind it), but here is what you need to do.
You need to create a list of objects, which is not what you are doing. You are manipulating a single object, so when it is JSON-dumped you get an object back instead of a list. Create a context object on each iteration and keep appending them to the local contextList created earlier.
Once the for loop terminates, update your original object using contextList, and you will have a list of objects instead of the single object you have now.
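A minimal self-contained sketch of that pattern (the field names follow the question; the sample row is made up):

```python
import json

# One made-up input row; in the real code this comes from the DataFrame
rows = [{'dcountry': 'United States of America', 'aim': 67646546}]

contextList = []
for row in rows:
    # One context object per row, appended to a list
    contextObj = {'attributes': {}, 'context': {'dcountry': row['dcountry']}}
    contextObj['attributes']['aim'] = {
        'values': [{'value': row['aim'], 'source': 'internal', 'locale': 'en_Us'}]
    }
    contextList.append(contextObj)

# 'contexts' is now a list of objects, as required
entity = {'name': 'XXX XXX', 'type': 'specs', 'data': {'contexts': contextList}}
print(json.dumps(entity, indent=4))
```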
I have CSV columns as follows, and now I am trying to convert them to the name/children/size format required by D3 in JSON. There are repetitive children occurring. For example, under name = "Type" there is a child "young" with size = 40000.
L1 L2 L3 L4 L5 L6 Size
Type cars young young young young 40000
Type cars student US US US 10000
Type cars student UK UK UK 20000
Type cars Graduates Young India Delhi 20000
Type cars Graduates Old UK London 30000
Type Bike Undergrads CB CB UNC 6000
prime prime prime prime prime prime 600
The output I am getting is:
{
"name": "Segments",
"children": [
{
"name": "Type",
"children": [
{
"name": "cars",
"children": [
{
"name": "young",
"children": [
{
"name": "young",
"children": [
{
"name": "young",
"children": [
{
"name": "young",
"size": "40000"
}
]
}
]
}
]
},
{
"name": "student",
"children": [
{
"name": "US",
"children": [
{
"name": "US",
"children": [
{
"name": "US",
"size": "10000"
}
]
}
]
},
{
"name": "UK",
"children": [
{
"name": "UK",
"children": [
{
"name": "UK",
"size": "20000"
}
]
}
]
}
]
}
]
}
]
},
{
"name": "prime",
"children": [
{
"name": "prime",
"children": [
{
"name": "prime",
"children": [
{
"name": "prime",
"children": [
{
"name": "prime",
"children": [
{
"name": "prime",
"size": "600"
}
]
}
]
}
]
}
]
}
]
}
]
}
The expected output is:
{
"name": "Segments",
"children": [
{
"name": "Type",
"children": [
{
"name": "cars",
"children": [
{
"name": "young",
"size": "40000"
}
]
},
{
"name": "student",
"children": [
{
"name": "US",
"size": "10000"
}
{
"name": "UK",
"size": "20000"
}
]
}
]
},
{
"name": "prime",
"size": "600"
}
]
}
I am using the following code:
import json
import csv

class Node(object):
    def __init__(self, name, size=None):
        self.name = name
        self.children = []
        self.size = size

    def child(self, cname, size=None):
        child_found = [c for c in self.children if c.name == cname]
        if not child_found:
            _child = Node(cname, size)
            self.children.append(_child)
        else:
            _child = child_found[0]
        return _child

    def as_dict(self):
        res = {'name': self.name}
        if self.size is None:
            res['children'] = [c.as_dict() for c in self.children]
        else:
            res['size'] = self.size
        return res

root = Node('Segments')
with open('C:\\Users\\G01172472\\Desktop\\Book3.csv', 'r') as f:
    reader = csv.reader(f)
    p = list(reader)
for row in range(1, len(p)):
    grp1, grp2, grp3, grp4, grp5, grp6, size = p[row]
    root.child(grp1).child(grp2).child(grp3).child(grp4).child(grp5).child(grp6, size)
print(json.dumps(root.as_dict(), indent=4))
So what you want to do first is to remove the duplicates from each row and create the children accordingly.
Here's what I changed:
with open('C:\\Users\\G01172472\\Desktop\\Book3.csv', 'r') as f:
    reader = csv.reader(f)
    p = list(reader)
for row in range(1, len(p)):
    # Create a temporary list of the row but keep only unique elements
    temp = []
    for x in p[row]:
        if x not in temp:
            temp.append(x)
    ## Additional code according to your dictionary structure
    # if row != 1:
    #     if 'cars' in temp:
    #         temp.remove('cars')
    #     elif 'Bike' in temp:
    #         temp.remove('Bike')
    # Build a string which will look similar to root.child(grp1)...
    # (the last element of temp is the size, so stop one short of it)
    evalStr = 'root'
    for i in range(len(temp) - 1):
        if i == len(temp) - 2:
            evalStr += '.child("' + temp[i] + '","' + temp[-1] + '")'
        else:
            evalStr += '.child("' + temp[i] + '")'
    # eval(string) will evaluate the string as Python code
    eval(evalStr)
print(json.dumps(root.as_dict(), indent=2))
Let me know if that works.
First of all you need to remove the dups from your row. This can be done as follows:
p[row] = ['Type', 'cars', 'young', 'young', 'young', 'young', '40000']
pp = set()
new_p_row = [el for el in p[row] if not (el in pp or pp.add(el))]
# ['Type', 'cars', 'young', '40000']
Then walk down the tree, adding children until the last two elements:
node = root
for r in new_p_row[:-2]:
    node = node.child(r)
Finally, add the last child with its size:
node.child(new_p_row[-2], new_p_row[-1])
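Putting the two pieces together with the Node class from the question (a self-contained sketch; the two hard-coded rows stand in for the CSV):

```python
import json

class Node(object):
    def __init__(self, name, size=None):
        self.name = name
        self.children = []
        self.size = size

    def child(self, cname, size=None):
        found = [c for c in self.children if c.name == cname]
        if found:
            return found[0]
        new = Node(cname, size)
        self.children.append(new)
        return new

    def as_dict(self):
        if self.size is not None:
            return {'name': self.name, 'size': self.size}
        return {'name': self.name, 'children': [c.as_dict() for c in self.children]}

rows = [
    ['Type', 'cars', 'young', 'young', 'young', 'young', '40000'],
    ['Type', 'cars', 'student', 'US', 'US', 'US', '10000'],
]

root = Node('Segments')
for row in rows:
    # Keep only the first occurrence of each value in the row
    seen = set()
    dedup = [el for el in row if not (el in seen or seen.add(el))]
    # Walk down the tree, then attach the leaf with its size
    node = root
    for name in dedup[:-2]:
        node = node.child(name)
    node.child(dedup[-2], dedup[-1])

print(json.dumps(root.as_dict(), indent=2))
```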
I have links in hierarchical form like this.
root/Arts/
root/Arts/Literature/
root/Arts/Literature/Novels/
root/Arts/Literature/Comics/
root/Sports/
root/Sports/Football/
...
I want to plot them and visualize the tree, but the tree goes very deep, with too many links; I am not able to view more than 3 levels when using pydot/graphviz.
I want to convert this to a dictionary of name/parent/children key-value pairs like this:
[
{
"name": "root",
"parent": "null",
"children": [
{
"name": "Arts",
"parent": "root",
"children": [
{
"name": "Literature",
"parent": "Arts",
"children": [
{
"name": "Novels",
"parent": "Literature"
},
{
"name": "Comics",
"parent": "Literature"
}
]
}
]
},
{
"name": "Sports",
"parent": "root",
"children": [
{
"name": "Football",
"parent": "Sports"
}
]
}
]
}
]
to plot this into a d3.js interactive tree
EDIT
This worked for me -
def add_to_tree(name, parent, start_tree):
    for x in start_tree:
        if x["name"] == parent:
            x["children"].append({"name": name, "parent": parent, "children": []})
        else:
            add_to_tree(name, parent, x["children"])

def dic_converter_single_root(sorted_list):
    start_tree = [{"name": "root", "parent": "null", "children": []}]
    for x in sorted_list:
        name = x.split('/')[-2]
        parent = x.split('/')[-3]
        add_to_tree(name, parent, start_tree)
    return start_tree
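A quick sanity check of the approach (a self-contained sketch; the link list is a subset of the sample from the question, and a `return start_tree` is added at the end of the converter):

```python
import json

def add_to_tree(name, parent, start_tree):
    # Recursively search for the parent node and append the child under it
    for x in start_tree:
        if x["name"] == parent:
            x["children"].append({"name": name, "parent": parent, "children": []})
        else:
            add_to_tree(name, parent, x["children"])

def dic_converter_single_root(sorted_list):
    start_tree = [{"name": "root", "parent": "null", "children": []}]
    for x in sorted_list:
        name = x.split('/')[-2]
        parent = x.split('/')[-3]
        add_to_tree(name, parent, start_tree)
    return start_tree

# Sorted so that every parent path appears before its children
links = [
    "root/Arts/",
    "root/Arts/Literature/",
    "root/Arts/Literature/Novels/",
    "root/Sports/",
]
tree = dic_converter_single_root(links)
print(json.dumps(tree, indent=4))
```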