How to convert a hierarchical table to JSON in Python

I am trying to convert a multi-level hierarchy table into a specific JSON format for a visual I am creating.
I have the data in a pandas DataFrame and have tried grouping it by the different levels, but pandas cannot convert a GroupBy object straight to JSON. I also tried converting the DataFrame itself to JSON, but the format isn't correct. I am not sure what else to do to get the parent/child format I am looking for. All the "size" values only need to be 1, so that part seems straightforward enough.
Thanks in advance!
**This is what my data looks like**
ColA     ColB    ColC
Parent1  Child1
Parent1  Child2  Child2A
Parent1  Child2  Child2B
Parent1  Child3  Child2A
Parent2  Child1
Parent2  Child2  Child2A
What I am getting from the DataFrame's to_json is built column by column, so I am losing the hierarchy. It looks like:
data = {"ColA": {"0": "Parent1", ...}, ...}
What I want is:
data = {
  "name": "TEST",
  "children": [
    {
      "name": "Parent1",
      "children": [
        { "name": "Child1", "size": "1" },
        {
          "name": "Child2",
          "children": [
            { "name": "Child2A", "size": "1" },
            { "name": "Child2B", "size": "1" },
            { "name": "Child2C", "size": "1" },
            { "name": "Child2D", "size": "1" }
          ]
        }
      ]
    },
    {
      "name": "Parent2",
      "children": [
        { "name": "Child2A", "size": "1" },
        { "name": "Child2B", "size": "1" },
        { "name": "Child2C", "size": "1" }
      ]
    },
    {
      "name": "Parent3",
      "children": [
        { "name": "Child1", "size": "1" },
        {
          "name": "Child2",
          "children": [
            { "name": "Child2A", "size": "1" },
            { "name": "Child2B", "size": "1" },
            { "name": "Child2C", "size": "1" }
          ]
        },
        {
          "name": "Child3",
          "children": [
            { "name": "Child3A", "size": "1" }
          ]
        }
      ]
    }
  ]
}

Here we go:

import json

data = [
    'Parent1 Child1',
    'Parent1 Child2 Child2A',
    'Parent1 Child2 Child2B',
    'Parent1 Child3 Child2A',
    'Parent2 Child1',
    'Parent2 Child2 Child2A',
]

tree = {}
for d in data:
    node = tree
    for item in d.split():
        name = item.strip()  # don't need surrounding spaces
        node = node.setdefault(name, {})

def walker(src, res):
    for name, value in src.items():
        node = {'name': name}
        if value:
            # Internal node: recurse to build its children
            walker(value, node)
        else:
            # Leaf node: only leaves carry a size
            node['size'] = 1
        res.setdefault('children', []).append(node)

result = {'name': 'TEST'}
walker(tree, result)
print(json.dumps(result, indent=2))
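Since the question starts from a pandas DataFrame rather than a list of strings, here is a minimal sketch of the same tree-building idea driven directly off the frame. The column names ColA/ColB/ColC are taken from the question; the frame below is a stand-in with missing levels stored as None.

```python
import json
import pandas as pd

# Stand-in frame matching the question's ColA/ColB/ColC layout
df = pd.DataFrame({
    'ColA': ['Parent1', 'Parent1', 'Parent1', 'Parent2'],
    'ColB': ['Child1', 'Child2', 'Child2', 'Child1'],
    'ColC': [None, 'Child2A', 'Child2B', None],
})

# Build a nested dict: each row is a path from root to leaf
tree = {}
for row in df.itertuples(index=False):
    node = tree
    for name in row:
        if pd.isna(name):  # stop at the first missing level
            break
        node = node.setdefault(name, {})

def to_children(subtree):
    # Leaves get a size; internal nodes get a children list
    out = []
    for name, value in subtree.items():
        if value:
            out.append({'name': name, 'children': to_children(value)})
        else:
            out.append({'name': name, 'size': 1})
    return out

result = {'name': 'TEST', 'children': to_children(tree)}
print(json.dumps(result, indent=2))
```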

Related

Pandas to JSON not respecting DataFrame format

I have a pandas DataFrame which I need to transform into a JSON object. I thought that grouping it would achieve this, but it does not seem to yield the correct results. Further, I wouldn't know how to name the sub-group.
My data frame as follows:
parent  name  age
nick    stef  10
nick    rob   12
And I do a groupby as I would like all children together under one parent in json:
df = df.groupby(['parent', 'name'])['age'].min()
And I would like it to yield the following:
{
  "parent": "nick",
  "children": [
    {
      "name": "rob",
      "age": 12
    },
    {
      "name": "stef",
      "age": 10
    },
    ...
  ]
}
When I do .to_json() it seems to regroup everything on age etc.
df.groupby(['parent'])[['name', 'age']].apply(list).to_json()
Given that I wanted to add some styling, I ended up solving it as follows:

import json

df_grouped = df.groupby('parent')
new = []
for group_name, df_group in df_grouped:
    base = {}
    base['parent'] = group_name
    children = []
    for row_index, row in df_group.iterrows():
        temp = {}
        temp['name'] = row['name']
        temp['age'] = row['age']
        children.append(temp)
    base['children'] = children
    new.append(base)

json_format = json.dumps(new)
print(json_format)
Which yielded the following results:
[
  {
    "parent": "fee",
    "children": [
      { "name": "bob", "age": 9 },
      { "name": "stef", "age": 10 }
    ]
  },
  {
    "parent": "nick",
    "children": [
      { "name": "stef", "age": 10 },
      { "name": "tobi", "age": 2 },
      { "name": "ralf", "age": 12 }
    ]
  },
  {
    "parent": "patrick",
    "children": [
      { "name": "marion", "age": 10 }
    ]
  }
]
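As a side note, the explicit iterrows loop can be shortened with DataFrame.to_dict('records'), which builds the same per-child dicts in one call. A sketch with a stand-in frame matching the question's columns:

```python
import pandas as pd

# Stand-in frame with the question's columns
df = pd.DataFrame({
    'parent': ['nick', 'nick', 'fee'],
    'name': ['stef', 'rob', 'bob'],
    'age': [10, 12, 9],
})

# One dict per parent; to_dict('records') replaces the explicit row loop
new = [
    {'parent': parent, 'children': group[['name', 'age']].to_dict('records')}
    for parent, group in df.groupby('parent')
]
print(new)
```

Note that groupby sorts the group keys by default, so 'fee' comes before 'nick' here.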

Removing duplicates in a JSON file based on two attributes

I have a nested JSON file and would like to remove duplicates based on two keys.
JSON example:
"books": [
  {
    "id": "1",
    "story": {
      "title": "Lonely lion"
    },
    "description": [
      {
        "release": false,
        "author": [
          { "name": "John", "main": 1 },
          { "name": "Jeroge", "main": 0 },
          { "name": "Peter", "main": 0 }
        ]
      }
    ]
  },
  {
    "id": "2",
    "story": {
      "title": "Lonely lion"
    },
    "description": [
      {
        "release": false,
        "author": [
          { "name": "Jeroge", "main": 1 },
          { "name": "Peter", "main": 0 },
          { "name": "John", "main": 0 }
        ]
      }
    ]
  },
  {
    "id": "3",
    "story": {
      "title": "Lonely lion"
    },
    "description": [
      {
        "release": false,
        "author": [
          { "name": "John", "main": 1 },
          { "name": "Jeroge", "main": 0 }
        ]
      }
    ]
  }
]
Here I try to match on the title and the author names. For example, id 1 and id 2 are duplicates (the title is the same and the author names are the same; the author order doesn't matter, and the main attribute need not be considered). So in the output JSON only id 1 or id 2 will remain, along with id 3. In the final output I need two files.
Output_JSON:
"books": [
  {
    "id": "1",
    "story": {
      "title": "Lonely lion"
    },
    "description": [
      {
        "release": false,
        "author": [
          { "name": "John", "main": 1 },
          { "name": "Jeroge", "main": 0 },
          { "name": "Peter", "main": 0 }
        ]
      }
    ]
  },
  {
    "id": "3",
    "story": {
      "title": "Lonely lion"
    },
    "description": [
      {
        "release": false,
        "author": [
          { "name": "John", "main": 1 },
          { "name": "Jeroge", "main": 0 }
        ]
      }
    ]
  }
]
duplicatedID.csv:
1-2
I tried the following method, but it is not giving correct results:

list = []
duplicate_Id = []
for data in (json_data['books'])[:]:
    elements = []
    id = data['id']
    title = data['story']['title']
    elements.append(title)
    for i in (data['description'][0]['author']):
        name = (i['name'])
        elements.append(name)
    if not list:
        list.append(elements)
    else:
        for j in list:
            if set(elements) == set(j):
                duplicate_Id.append(id)
    elements = []
The general idea is to:
Get the groups identified by some key function that collects duplicates together.
Then return the first entry of each group, ensuring no duplicates.
Define the key function as the sorted list of author names: the set of authors is by definition the unique key, but the authors may appear in any order.
import json
from itertools import groupby

# `books` is an open file handle on the JSON shown above
j = json.load(books)

def getAuthors(book):
    authors = book['description'][0]['author']
    return sorted([author['name'] for author in authors])

def transform(books):
    groups = [list(group) for _, group in groupby(books, key=getAuthors)]
    return [group[0] for group in groups]

print(transform(j['books']))
If we wanted to get the duplicates, then we do the same computation, but return any sublist with length > 1 as this is by our definition duplicated data.
def transform(books):
    groups = [list(group) for _, group in groupby(books, key=getAuthors)]
    return [group for group in groups if len(group) > 1]
Where j['books'] is the JSON you gave enclosed in an object.
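One caveat: itertools.groupby only merges consecutive elements, so the books should be sorted by the same key first, and the title from the question can be folded into the key as well. A self-contained sketch with a trimmed-down sample (ids 1 and 2 share the title and author set), which also derives the "1-2" pairs asked for in duplicatedID.csv:

```python
from itertools import groupby

# Trimmed sample mirroring the question's structure ('main' omitted, it is ignored)
books = [
    {'id': '1', 'story': {'title': 'Lonely lion'},
     'description': [{'author': [{'name': 'John'}, {'name': 'Jeroge'}, {'name': 'Peter'}]}]},
    {'id': '2', 'story': {'title': 'Lonely lion'},
     'description': [{'author': [{'name': 'Jeroge'}, {'name': 'Peter'}, {'name': 'John'}]}]},
    {'id': '3', 'story': {'title': 'Lonely lion'},
     'description': [{'author': [{'name': 'John'}, {'name': 'Jeroge'}]}]},
]

def dedup_key(book):
    # Title plus the sorted author names: order-insensitive, ignores 'main'
    authors = tuple(sorted(a['name'] for a in book['description'][0]['author']))
    return (book['story']['title'], authors)

# Sort first: groupby only merges consecutive items with equal keys
groups = [list(g) for _, g in groupby(sorted(books, key=dedup_key), key=dedup_key)]
unique = [g[0] for g in groups]                 # one representative per group
duplicated_ids = ['-'.join(b['id'] for b in g)  # e.g. '1-2' for duplicatedID.csv
                  for g in groups if len(g) > 1]
print([b['id'] for b in unique], duplicated_ids)
```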

Pandas json normalize an object property containing an array of objects

If I have json data formatted like this:
{
  "result": [
    {
      "id": 878787,
      "name": "Testing",
      "schema": {
        "id": 3463463,
        "smartElements": [
          {
            "svKey": "Model",
            "value": { "type": "type1", "value": "ThisValue" }
          },
          {
            "svKey": "SecondKey",
            "value": { "type": "example", "value": "ThisValue2" }
          }
        ]
      }
    },
    {
      "id": 333,
      "name": "NameName",
      "schema": {
        "id": 1111,
        "smartElements": [
          {
            "svKey": "Model",
            "value": { "type": "type1", "value": "NewValue" }
          },
          {
            "svKey": "SecondKey",
            "value": { "type": "example", "value": "ValueIs" }
          }
        ]
      }
    }
  ]
}
is there a way to normalize it so I end up with records:
name      Model      SecondKey
Testing   ThisValue  ThisValue2
NameName  NewValue   ValueIs
I can get the smartElements into a pandas Series, but I can't figure out a way to break out smartElements[x].svKey into a column header and smartElements[x].value.value into the value for that column, and/or merge them.
I'd skip trying to use a pre-baked solution and just navigate the json yourself.
import json
import pandas as pd

data = json.load(open('my.json'))

records = []
for d in data['result']:
    record = {}
    record['name'] = d['name']
    for ele in d['schema']['smartElements']:
        record[ele['svKey']] = ele['value']['value']
    records.append(record)

pd.DataFrame(records)
name Model SecondKey
0 Testing ThisValue ThisValue2
1 NameName NewValue ValueIs
My solution
import pandas as pd
import json

with open('test.json') as f:
    a = json.load(f)

d = pd.json_normalize(data=a['result'], errors='ignore',
                      record_path=['schema', 'smartElements'], meta=['name'])
print(d)
produces
svKey value.type value.value name
0 Model type1 ThisValue Testing
1 SecondKey example ThisValue2 Testing
2 Model type1 NewValue NameName
3 SecondKey example ValueIs NameName
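From there, the long frame produced by json_normalize can be pivoted into the wide layout the question asks for, one column per svKey. A sketch where the frame is rebuilt inline with the column names shown above:

```python
import pandas as pd

# Long-format frame as produced by json_normalize above (rebuilt inline here)
d = pd.DataFrame({
    'svKey': ['Model', 'SecondKey', 'Model', 'SecondKey'],
    'value.value': ['ThisValue', 'ThisValue2', 'NewValue', 'ValueIs'],
    'name': ['Testing', 'Testing', 'NameName', 'NameName'],
})

# Pivot svKey values into column headers, then flatten back to a plain frame
wide = (d.pivot(index='name', columns='svKey', values='value.value')
         .reset_index()
         .rename_axis(None, axis=1))
print(wide)
```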

How can I convert my JSON into the format required to make a D3 sunburst diagram?

I have the following JSON data:
{
  "data": {
    "databis": {
      "dataexit": {
        "databis2": {
          "1250": { }
        }
      },
      "datanode": {
        "20544": { }
      }
    }
  }
}
I want to use it to generate a D3 sunburst diagram, but that requires a different data format:
{
  "name": "data",
  "children": [
    {
      "name": "databis",
      "children": [
        {
          "name": "dataexit",
          "children": [
            {
              "name": "databis2",
              "size": "1250"
            }
          ]
        },
        {
          "name": "datanode",
          "size": "20544"
        }
      ]
    }
  ]
}
How can I do this with Python? I think I need to use a recursive function, but I don't know where to start.
You could use a recursive solution with a function that takes a name and a dictionary as parameters. For every item in the given dict it calls itself again to generate a list of children, each of which looks like this: {'name': 'name here', 'children': []}.
Then it checks for the special case where there is only one child whose children key has an empty list as its value. In that case a dict with the given name and the child's name as the size is returned. In all other cases the function returns a dict with a name and children.
import json

data = {
    "data": {
        "databis": {
            "dataexit": {
                "databis2": {
                    "1250": { }
                }
            },
            "datanode": {
                "20544": { }
            }
        }
    }
}

def helper(name, d):
    # Collect all children
    children = [helper(*x) for x in d.items()]
    # Return a dict containing a size when the only child looks like this:
    # {'name': '1250', 'children': []}
    # Note that get is used so that cases where a child already has a size
    # instead of children work correctly
    if len(children) == 1 and children[0].get('children') == []:
        return {'name': name, 'size': children[0]['name']}
    # Normal case where the returned dict has children
    return {'name': name, 'children': children}

def transform(d):
    return helper(*next(iter(d.items())))

print(json.dumps(transform(data), indent=4))
Output:
{
  "name": "data",
  "children": [
    {
      "name": "databis",
      "children": [
        {
          "name": "dataexit",
          "children": [
            {
              "name": "databis2",
              "size": "1250"
            }
          ]
        },
        {
          "name": "datanode",
          "size": "20544"
        }
      ]
    }
  ]
}

converting links to python dictionary for d3.js interactive tree

I have links in hierarchical form like this.
root/Arts/
root/Arts/Literature/
root/Arts/Literature/Novels/
root/Arts/Literature/Comics/
root/Sports/
root/Sports/Football/
...
I want to plot them and visualize the tree, but the tree goes very deep, with too many links.
I am not able to view more than 3 levels of this tree when using pydot/graphviz.
I want to convert this to a dictionary with key/value pairing of children, like this:
[
  {
    "name": "root",
    "parent": "null",
    "children": [
      {
        "name": "Arts",
        "parent": "root",
        "children": [
          {
            "name": "Literature",
            "parent": "Arts",
            "children": [
              { "name": "Novels", "parent": "Literature" },
              { "name": "Comics", "parent": "Literature" }
            ]
          }
        ]
      },
      {
        "name": "Sports",
        "parent": "root",
        "children": [
          { "name": "Football", "parent": "Sports" }
        ]
      }
    ]
  }
]
to plot this into a d3.js interactive tree
EDIT
This worked for me:

def add_to_tree(name, parent, start_tree):
    for x in start_tree:
        if x["name"] == parent:
            x["children"].append({"name": name, "parent": parent, "children": []})
        else:
            add_to_tree(name, parent, x["children"])

def dic_converter_single_root(sorted_list):
    start_tree = [{"name": "root", "parent": "null", "children": []}]
    for x in sorted_list:
        name = x.split('/')[-2]
        parent = x.split('/')[-3]
        add_to_tree(name, parent, start_tree)
    return start_tree
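Putting that converter together with the sample links, a standalone runnable sketch. It assumes every link ends with a trailing slash (so split('/')[-2] is the node name), that node names are unique across the tree, and that the links are sorted so parents are always inserted before their children:

```python
import json

# Sample links from the question; every link ends with '/'
links = [
    'root/Arts/',
    'root/Arts/Literature/',
    'root/Arts/Literature/Novels/',
    'root/Arts/Literature/Comics/',
    'root/Sports/',
    'root/Sports/Football/',
]

def add_to_tree(name, parent, nodes):
    # Append the new node under the first node whose name matches `parent`,
    # recursing into children otherwise (assumes unique node names)
    for node in nodes:
        if node["name"] == parent:
            node["children"].append({"name": name, "parent": parent, "children": []})
        else:
            add_to_tree(name, parent, node["children"])

tree = [{"name": "root", "parent": "null", "children": []}]
for link in sorted(links):  # sorting guarantees parents are inserted first
    parts = link.split('/')
    add_to_tree(parts[-2], parts[-3], tree)

print(json.dumps(tree, indent=2))
```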
