How to transform a flattened data to a structured json?

This is the flattened input data:
['a-ab-aba-abaa-abaaa', 'a-ab-aba-abab', 'a-ac-aca-acaa', 'a-ac-aca-acab']
This is the target structure I need as output:
[
{
"title": "a",
"children": [
{
"title": "ab",
"children": [
{
"title": "aba",
"children": [
{
"title": "abaa",
"children": [
{
"title": "abaaa"
}
]
},
{
"title": "abab"
}
]
}
]
},
{
"title": "ac",
"children": [
{
"title": "aca",
"children": [
{
"title": "acaa"
},
{
"title": "acab"
}
]
}
]
}
]
}
]
I thought I could generate this JSON with deeply nested for loops, but that is impractical because the number of levels can exceed 10. Is there an algorithm, or an existing package, that can do this?
I'd be very grateful if you could share your approach. God bless you!

Here is a recursive solution using itertools.groupby. I don't know whether it is efficient enough for your purpose, but it works. It splits each string into a list of components, groups those lists by their first component, builds a dict for each group, and then recurses on the remainders with the first component removed.
from itertools import groupby
from pprint import pprint
data = ['a-ab-aba-abaa-abaaa', 'a-ab-aba-abab', 'a-ac-aca-acaa', 'a-ac-aca-acab']
components = [x.split("-") for x in data]
def build_dict(component_list):
    key = lambda x: x[0]
    # groupby only merges adjacent items, so sort by the same key first
    component_list = sorted(component_list, key=key)
    # divide into lists with the same first key
    sublists = groupby(component_list, key)
    result = []
    for name, values in sublists:
        value = {}
        value["title"] = name
        # drop the first component; skip rows that have nothing left
        value["children"] = build_dict([x[1:] for x in values if x[1:]])
        result.append(value)
    return result
pprint(build_dict(components))
Output:
[{'children': [{'children': [{'children': [{'children': [{'children': [],
'title': 'abaaa'}],
'title': 'abaa'},
{'children': [], 'title': 'abab'}],
'title': 'aba'}],
'title': 'ab'},
{'children': [{'children': [{'children': [], 'title': 'acaa'},
{'children': [], 'title': 'acab'}],
'title': 'aca'}],
'title': 'ac'}],
'title': 'a'}]
To convert this structure to JSON you can use json.dumps from the json module. I hope my explanation is clear.
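As a minimal sketch of that last step, the variant below uses the same groupby idea but leaves out the "children" key on leaf nodes, so the serialized result matches the target in the question. The name build_tree is an assumption introduced for illustration only.

```python
import json
from itertools import groupby


def build_tree(rows):
    """Group rows by their first component and recurse on the remainders."""
    # groupby only merges adjacent items, so sort by the grouping key first.
    rows = sorted(rows, key=lambda row: row[0])
    nodes = []
    for title, group in groupby(rows, key=lambda row: row[0]):
        # Drop the shared first component; skip rows that are now empty.
        rest = [row[1:] for row in group if len(row) > 1]
        node = {"title": title}
        if rest:  # leaves get no "children" key at all
            node["children"] = build_tree(rest)
        nodes.append(node)
    return nodes


data = ['a-ab-aba-abaa-abaaa', 'a-ab-aba-abab', 'a-ac-aca-acaa', 'a-ac-aca-acab']
tree = build_tree([item.split("-") for item in data])
as_json = json.dumps(tree, indent=2)
print(as_json)
```

The recursion depth equals the number of levels in the longest path, so a dozen or so levels is well within Python's default recursion limit.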

Here is a start:
def populate_levels(dct, levels):
    # create one nested dict per level, reusing levels that already exist
    if levels:
        if levels[0] not in dct:
            dct[levels[0]] = {}
        populate_levels(dct[levels[0]], levels[1:])
def create_final(dct):
    # turn the nested template dict into a list of title/children nodes
    final = []
    for title in dct:
        final.append({"title": title, "children": create_final(dct[title])})
    return final
data = ['a-ab-aba-abaa-abaaa', 'a-ab-aba-abab', 'a-ac-aca-acaa', 'a-ac-aca-acab']
template = {}
for item in data:
    populate_levels(template, item.split('-'))
final = create_final(template)
I couldn't see a clean way of doing it all at once, so I created this intermediate template dictionary. Right now, if a node has no children, its dict will contain 'children': []. You can change this behavior in the create_final function if you like.
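One possible way to make that change is sketched below: create_final adds the "children" key only when a node actually has children, and populate_levels is written as a loop so that building the template needs no recursion. This is an illustrative variant under those assumptions, not the only way to do it.

```python
def populate_levels(template, levels):
    """Walk down the nested dict, creating one level per component."""
    node = template
    for level in levels:
        # setdefault returns the existing sub-dict or inserts a new empty one.
        node = node.setdefault(level, {})


def create_final(template):
    """Convert the template into title/children nodes, omitting empty children."""
    final = []
    for title, sub in template.items():
        node = {"title": title}
        if sub:  # only non-leaf nodes get a "children" key
            node["children"] = create_final(sub)
        final.append(node)
    return final


data = ['a-ab-aba-abaa-abaaa', 'a-ab-aba-abab', 'a-ac-aca-acaa', 'a-ac-aca-acab']
template = {}
for item in data:
    populate_levels(template, item.split('-'))
final = create_final(template)
```

Because dicts preserve insertion order, siblings come out in the order in which they first appear in the input.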

You can use collections.defaultdict:
from collections import defaultdict
import json
def get_struct(d):
    _d = defaultdict(list)
    # group the remaining components under their first component
    for a, *b in d:
        _d[a].append(b)
    # filter(None, b) drops the empty remainders left behind by leaf nodes
    return [{'title':a, 'children':get_struct(filter(None, b))} for a, b in _d.items()]
data = ['a-ab-aba-abaa-abaaa', 'a-ab-aba-abab', 'a-ac-aca-acaa', 'a-ac-aca-acab']
print(json.dumps(get_struct([i.split('-') for i in data]), indent=4))
Output:
[
{
"title": "a",
"children": [
{
"title": "ab",
"children": [
{
"title": "aba",
"children": [
{
"title": "abaa",
"children": [
{
"title": "abaaa",
"children": []
}
]
},
{
"title": "abab",
"children": []
}
]
}
]
},
{
"title": "ac",
"children": [
{
"title": "aca",
"children": [
{
"title": "acaa",
"children": []
},
{
"title": "acab",
"children": []
}
]
}
]
}
]
}
]

Related

How to format nested JSON to a iterative process format

I have a process tree in nested JSON format and I'm trying to turn it into an iterative process dictionary with lists. For example, the nested tree is below:
{
"name": "test",
"children": [
{
"name": "Operator_8a82e",
"children": [
{
"name": "Link_e5479",
"children": [
{
"name": "Operator_b7394",
"children": [
{
"name": "Link_7f62e",
"children": [
{
"name": "Operator_73ea0",
"children": [
{
"name": "Link_93a51",
"children": [
{
"name": "Operator_32a07"
}
]
}
]
}
]
}
]
}
]
},
{
"name": "Link_59e2c",
"children": [
{
"name": "Operator_3ca6d"
}
]
}
]
}
]
}
I want it to look like the output below. Basically, each subtree is placed in an iterative list, in the order it appears in the nested JSON (this is very important).
{
"process_1": [
{
"name": "Operator_8a82e"
},
{
"name": "Link_e5479"
},
{
"name": "Operator_b7394"
}
],
"process_2": [
{
"name": "Operator_8a82e"
},
{
"name": "Link_59e2c"
}
]
}
My current function almost gets me there. I'll explain below why it doesn't fully work.
def flatten_json(y):
    out = {}
    def flatten(x):
        i = 0
        if type(x) is dict:
            for a in x:
                flatten(x[a])
        elif type(x) is list:
            for a in x:
                i += 1
                flatten(a)
        else:
            print(x)
    flatten(y)
    return out
This prints the following. However, I can't seem to tell when one subtree ends and a new one starts:
test
Operator_8a82e
Link_e5479
Operator_b7394
Link_7f62e
Operator_73ea0
Link_93a51
Operator_32a07
Link_59e2c
Operator_3ca6d
So for example, ideally the output looks like:
test
Operator_8a82e
Link_e5479
Operator_b7394
Link_7f62e
Operator_73ea0
Link_93a51
Operator_32a07
Operator_8a82e # this is what is missing in my function above
Link_59e2c
Operator_3ca6d
Any help would be great!

This is tree-formatted graph data. You can load it as a networkx graph and then export it in the desired format. Assuming you have loaded the JSON as a Python dictionary called data:
import networkx as nx
from networkx.readwrite import json_graph
G = json_graph.tree_graph(data, ident="name")
Now find all sink nodes (nodes with no outgoing edges), then find the simple paths from the chosen source to each of them:
# define the source node
source = 'Operator_8a82e'
# get a list of sink nodes
sinks = [node for node in G.nodes if G.out_degree(node) == 0]
# get all simple paths from the source to each sink
paths = [list(nx.all_simple_paths(G, source=source, target=sink)) for sink in sinks]
# keep the first path for each sink, since a tree has exactly one
paths = [i[0] for i in paths if i]
# create one dict per path
[{f'process_{n+1}': [{'name':i} for i in path]} for n, path in enumerate(paths)]
Result:
[{'process_1': [{'name': 'Operator_8a82e'},
{'name': 'Link_e5479'},
{'name': 'Operator_b7394'},
{'name': 'Link_7f62e'},
{'name': 'Operator_73ea0'},
{'name': 'Link_93a51'},
{'name': 'Operator_32a07'}]},
{'process_2': [{'name': 'Operator_8a82e'},
{'name': 'Link_59e2c'},
{'name': 'Operator_3ca6d'}]}]
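If adding a dependency is not desirable, the same root-to-leaf paths can be collected with a short depth-first walk in plain Python. The sketch below is an assumption-laden alternative, not part of the networkx answer: the function names leaf_paths and to_processes are made up for illustration, the sample tree is a shortened version of the one in the question, and the top-level wrapper node ("test") is skipped, as in the desired output.

```python
def leaf_paths(node, prefix=()):
    """Yield the names along every path from `node` to a leaf, in document order."""
    path = prefix + (node["name"],)
    children = node.get("children") or []
    if not children:
        yield list(path)
    for child in children:
        yield from leaf_paths(child, path)


def to_processes(tree):
    """Build {'process_1': [...], ...}, skipping the top-level wrapper node."""
    processes = {}
    for top in tree.get("children", []):
        for path in leaf_paths(top):
            key = f"process_{len(processes) + 1}"
            processes[key] = [{"name": name} for name in path]
    return processes


# Shortened sample tree (assumption: same shape as the question's data).
tree = {
    "name": "test",
    "children": [
        {"name": "Operator_8a82e", "children": [
            {"name": "Link_e5479", "children": [{"name": "Operator_b7394"}]},
            {"name": "Link_59e2c", "children": [{"name": "Operator_3ca6d"}]},
        ]},
    ],
}
result = to_processes(tree)
```

Each path repeats the shared prefix (here Operator_8a82e), which is the piece the question's flatten function was missing.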

Pandas dataframe into nested child dictionary

I have a dataframe like below, where each 'level' drills down into more detail, with the last level having an id value.
data = [
    {'id': 1, 'level_1': 'Animals', 'level_2': 'Carnivores', 'level_3': 'Felidae', 'level_4': 'Siamese Cat'},
    {'id': 2, 'level_1': 'Animals', 'level_2': 'Carnivores', 'level_3': 'Felidae', 'level_4': 'Javanese Cat'},
    {'id': 3, 'level_1': 'Animals', 'level_2': 'Carnivores', 'level_3': 'Ursidae', 'level_4': 'Polar Bear'},
    {'id': 4, 'level_1': 'Animals', 'level_2': 'Carnivores', 'level_3': 'Canidae', 'level_4': 'Labradore Retriever'},
    {'id': 5, 'level_1': 'Animals', 'level_2': 'Carnivores', 'level_3': 'Canidae', 'level_4': 'Golden Retriever'}
]
I want to turn this into a nested dictionary of parent / child relationships like below.
var data = {
"name": "Animals",
"children": [
{
"name": "Carnivores",
"children": [
{
"name": "Felidae",
"children": [
{
"id": 1,
"name": "Siamese Cat",
"children": []
},
{
"id": 2,
"name": "Javanese Cat",
"children": []
}
]
},
{
"name": "Ursidae",
"children": [
{
"id": 3,
"name": "Polar Bear",
"children": []
}
]
},
{
"name": "Canidae",
"children": [
{
"id": 4,
"name": "Labradore Retriever",
"children": []
},
{
"id": 5,
"name": "Golden Retriever",
"children": []
}
]
}
]
}
]
}
I've tried several approaches of grouping the dataframe and also looping over individual rows, but haven't been able to find a working solution yet. Any help would be greatly appreciated!

The answer by @Timus mimics your intention; however, you might have difficulty searching that dictionary, because every level uses the same two keys, name and children. If that is what you intended, ignore my answer. If you would rather have a dictionary that you can search through unique keys, you can try:
df = df.set_index(['level_1', 'level_2', 'level_3', 'level_4'])
def make_dictionary(df):
    # base case: only one index level left, so return the columns as dicts
    if df.index.nlevels == 1:
        return df.to_dict()
    dictionary = {}
    for key in df.index.get_level_values(0).unique():
        # select the rows under this key and recurse on the remaining levels
        sub_df = df.xs(key)
        dictionary[key] = make_dictionary(sub_df)
    return dictionary
make_dictionary(df)
It requires setting the different levels as index, and you will end up with a slightly different dictionary:
{'Animals':
{'Carnivores':
{'Felidae':
{'id': {'Siamese Cat': 1,
'Javanese Cat': 2}},
'Ursidae':
{'id': {'Polar Bear': 3}},
'Canidae':
{'id': {'Labradore Retriever': 4,
'Golden Retriever': 5}}}
}
}

EDIT: Had to make an adjustment, because the result wasn't exactly as expected.
Here's an attempt that produces the expected output (if I haven't made a mistake, which wouldn't be a surprise, because I've made several on the way):
import pandas as pd
def pack_level(df):
    # innermost level: one leaf dict per row
    if df.columns[0] == 'id':
        return [{'id': i, 'name': name, 'children': []}
                for i, name in zip(df[df.columns[0]], df[df.columns[1]])]
    # outer levels: merge the already packed child lists under one parent
    return [{'name': df.iloc[0, 0],
             'children': [entry for lst in df[df.columns[1]]
                          for entry in lst]}]
df = pd.DataFrame(data)
columns = list(df.columns[1:])
df = df.groupby(columns[:-1]).apply(pack_level)
for i in range(1, len(columns) - 1):
    df = (df.reset_index(level=-1, drop=False).groupby(columns[:-i])
            .apply(pack_level)
            .reset_index(level=-1, drop=True))
var_data = {'name': df.index[0], 'children': df.iloc[0]}
The result looks a bit different at first glance, but that should be only due to the sorting (from printing):
{
"children": [
{
"children": [
{
"children": [
{
"children": [],
"id": 4,
"name": "Labradore Retriever"
},
{
"children": [],
"id": 5,
"name": "Golden Retriever"
}
],
"name": "Canidae"
},
{
"children": [
{
"children": [],
"id": 1,
"name": "Siamese Cat"
},
{
"children": [],
"id": 2,
"name": "Javanese Cat"
}
],
"name": "Felidae"
},
{
"children": [
{
"children": [],
"id": 3,
"name": "Polar Bear"
}
],
"name": "Ursidae"
}
],
"name": "Carnivores"
}
],
"name": "Animals"
}
I've tried to be as generic as possible, but the first column has to be named id (as in your sample).
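For comparison, the same nesting can be sketched without pandas by walking each record down the level columns and reusing nodes that already exist. This is a hedged illustration rather than a replacement for the groupby approach: the function name nest is made up, the sample records are a subset of the question's data, and it assumes that leaf names are unique within their parent. Starting from a DataFrame, the records could be obtained with df.to_dict('records').

```python
def nest(records, levels):
    """Nest flat records into name/children dicts along the given level columns."""
    root = {"name": None, "children": []}  # artificial root, discarded at the end
    for rec in records:
        node = root
        for depth, col in enumerate(levels):
            name = rec[col]
            # Reuse an existing child with this name, otherwise create it.
            child = next((c for c in node["children"] if c["name"] == name), None)
            if child is None:
                child = {"name": name, "children": []}
                if depth == len(levels) - 1:
                    child = {"id": rec["id"], **child}  # the last level carries the id
                node["children"].append(child)
            node = child
    return root["children"]


records = [
    {'id': 1, 'level_1': 'Animals', 'level_2': 'Carnivores', 'level_3': 'Felidae', 'level_4': 'Siamese Cat'},
    {'id': 2, 'level_1': 'Animals', 'level_2': 'Carnivores', 'level_3': 'Felidae', 'level_4': 'Javanese Cat'},
    {'id': 3, 'level_1': 'Animals', 'level_2': 'Carnivores', 'level_3': 'Ursidae', 'level_4': 'Polar Bear'},
]
nested = nest(records, ['level_1', 'level_2', 'level_3', 'level_4'])
```

The result is a list of top-level nodes (a single "Animals" node for this sample), and siblings keep the order in which they first appear in the input.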

Python - grouping values in dictionary

Here is JSON which I'm receiving from Smartsheet API:
{"rows":
[
{
"id":1315072712697732,
"cells":
[
{"columnId":3691535201003396,"value":"MyBooks","displayValue":"MyBooks"},
{"columnId":8195134828373892},
{"columnId":876785433896836,"value":"2018 Year","displayValue":"2018 Year"},
{"columnId":5380385061267332,"value":"http://google.com","displayValue":"http://google.com"}
]
},
{
"id":5818672340068228,
"cells":
[
{"columnId":3691535201003396,"value":"MyBooks","displayValue":"MyBooks"},
{"columnId":8195134828373892},
{"columnId":876785433896836,"value":"2019 Year","displayValue":"2019 Year"},
{"columnId":5380385061267332,"value":"http://google.com","displayValue":"http://google.com"}
]
},
{
"id":6381622293489540,
"cells":
[
{"columnId":3691535201003396,"value":"MyMovies","displayValue":"MyMovies"},
{"columnId":8195134828373892},
{"columnId":876785433896836,"value":"2027 Year","displayValue":"2027 Year"},
{"columnId":5380385061267332,"value":"http://google.com","displayValue":"http://google.com"}
]
},
{
"id":6100147316778884,
"cells":
[
{"columnId":3691535201003396,"value":"MyMovies","displayValue":"MyMovies"},
{"columnId":8195134828373892},
{"columnId":876785433896836,"value":"2035 Year","displayValue":"2035 Year"},
{"columnId":5380385061267332,"value":"http://google.com","displayValue":"http://google.com"}
]
},
{
"id":8351947130464132,
"cells":
[
{"columnId":3691535201003396,"value":"MyHobbies","displayValue":"MyHobbies"},
{"columnId":8195134828373892},
{"columnId":876785433896836,"value":"2037 Year","displayValue":"2037 Year"},
{"columnId":5380385061267332,"value":"http://google.com","displayValue":"http://google.com"}
]
}]}
Here is a piece of my Python code:
import json
s = json.loads(myJson)
my_dictionary = []
for element in s['rows']:
    my_dictionary.append({'category': element['cells'][0]['displayValue'],
                          'categoryId': element['cells'][1]['columnId'],
                          'pages': [
                              {'pageName': element['cells'][2]['displayValue'],
                               'pageURL': element['cells'][3]['displayValue']
                               }
                          ]})
As a result I get a list of dictionaries with all the data I need (and none of the unnecessary stuff), except for one thing: I want to group it by category. The output I want to achieve should look similar to this:
"category": "MyMovies",
"categoryID": "8195134828373892"
"pages":
[
{"pageName": "2018 Year", "pageURL": "https://google.com"},
{"pageName": "2019 Year", "pageURL": "https://google.com"}
]
How can I do this?

You can do it with the following code:
from collections import defaultdict
d = defaultdict(list)
for element in my_dictionary:
    d[(element['categoryId'], element['category'])] += element['pages'] # merges all pages into one list
result = []
for element in sorted(d, key=lambda k: k[1]): # sort by category name
    result.append({
        'category': element[1],
        'categoryId': element[0],
        'pages': sorted(d[element], key=lambda e: e['pageName']) # sort by page name in pages list
    })
print(result)
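The grouping can also be done in a single pass over the parsed rows, without building the intermediate list first. The sketch below assumes the same cell positions as the question's code (category in cell 0, category id in cell 1, page name in cell 2, URL in cell 3); the function name group_pages and the abbreviated sample rows are made up for illustration.

```python
from collections import defaultdict


def group_pages(rows):
    """Collect the pages of every (category, categoryId) pair in one pass."""
    grouped = defaultdict(list)
    for row in rows:
        cells = row["cells"]
        key = (cells[0]["displayValue"], cells[1]["columnId"])
        grouped[key].append({
            "pageName": cells[2]["displayValue"],
            "pageURL": cells[3]["displayValue"],
        })
    # One output dict per category, in first-seen order.
    return [
        {"category": category, "categoryId": category_id, "pages": pages}
        for (category, category_id), pages in grouped.items()
    ]


def make_row(category, page):
    """Build an abbreviated row shaped like the question's data (illustration only)."""
    return {"cells": [
        {"columnId": 3691535201003396, "displayValue": category},
        {"columnId": 8195134828373892},
        {"columnId": 876785433896836, "displayValue": page},
        {"columnId": 5380385061267332, "displayValue": "http://google.com"},
    ]}


rows = [make_row("MyBooks", "2018 Year"),
        make_row("MyBooks", "2019 Year"),
        make_row("MyMovies", "2027 Year")]
result = group_pages(rows)
```

Unlike the code above, this keeps categories and pages in input order rather than sorting them.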

Create a JSON type nested dictionary from a list in Python

I wish to create a JSON type nested dictionary from a list of lists. The lists contained a full directory path, but I broke them into their individual components as I thought it would make the creation of the nested dictionaries easier.
An example list:
["root", "dir1", "file.txt"]
The expected result:
{
"type": "directory",
"name": "root",
"children": [
{
"type": "directory",
"name": "dir1",
"children": [
{
"type": "file",
"name": "file.txt",
}
]
}
]
}
I've tried a recursive method but couldn't quite get there (I'm new to recursion and my head kept spinning). I also tried an iterative method based on an idea I found here on Stack Overflow, which inverts the list and builds the dict backwards. I got that to work more or less, but it failed one of the requirements: the code must cope with duplicated parts of the directory paths as it iterates over the list of lists.
For example, following on from the last example, the next input list is this:
["root", "dir1", "dir2", "file2.txt"]
and it needs to build onto the JSON dictionary to produce this:
{
"type": "directory",
"name": "root",
"children": [
{
"type": "directory",
"name": "dir1",
"children": [
{
"type": "file",
"name": "file.txt"
},
{
"type": "directory",
"name": "dir2",
"children": [
{
"type": "file",
"name": "file2.txt"
}
]
}
]
}
]
}
and so on with an unknown number of lists containing directory paths.
Thanks.

A recursive solution with itertools.groupby is as follows (assuming all paths are absolute paths). The idea is to group paths by the first element in the path list. This groups similar directory roots together, allowing us to call the function recursively on that group.
Also note that file names cannot be duplicated in a directory, so all files will be grouped as single element lists by groupby:
from itertools import groupby
from operator import itemgetter
def build_dict(paths):
    # a single one-element path is a file
    if len(paths) == 1 and len(paths[0]) == 1:
        return {"type": "file", "name": paths[0][0]}
    dirname = paths[0][0]
    d = {"type": "directory", "name": dirname, "children": []}
    # strip the shared directory name, then group the remainders by their first element
    for k, g in groupby(sorted([p[1:] for p in paths], key=itemgetter(0)),
                        key=itemgetter(0)):
        d["children"].append(build_dict(list(g)))
    return d
paths = [["root", "dir1", "file.txt"], ["root", "dir1", "dir2", "file2.txt"]]
print(build_dict(paths))
Output
{
"type": "directory",
"name": "root",
"children": [
{
"type": "directory",
"name": "dir1",
"children": [
{
"type": "directory",
"name": "dir2",
"children": [
{
"type": "file",
"name": "file2.txt"
}
]
},
{
"type": "file",
"name": "file.txt"
}
]
}
]
}

Here's a naive recursive solution that simply walks through the tree structure, adding children as necessary, until the last element of path is reached (assumed to be a file).
import json
def path_to_json(path, root):
    if path:
        curr = path.pop(0)
        if not root:
            root["type"] = "file"
            root["name"] = curr
            if path:
                root["children"] = [{}]
                root["type"] = "directory"
                path_to_json(path, root["children"][0])
        elif path:
            try:
                i = [x["name"] for x in root["children"]].index(path[0])
                path_to_json(path, root["children"][i])
            except ValueError:
                root["children"].append({})
                path_to_json(path, root["children"][-1])
    return root
if __name__ == "__main__":
    paths = [["root", "dir1", "file.txt"],
             ["root", "dir1", "dir2", "file2.txt"]]
    result = {}
    print(json.dumps([path_to_json(x, result) for x in paths][0], indent=4))
Output:
{
"type": "directory",
"name": "root",
"children": [
{
"type": "directory",
"name": "dir1",
"children": [
{
"type": "file",
"name": "file.txt"
},
{
"type": "directory",
"name": "dir2",
"children": [
{
"type": "file",
"name": "file2.txt"
}
]
}
]
}
]
}

Since not much detail has been provided, here is a solution that uses a reference to step into each nested dict:
In [537]: structure = ["root", "dir1", "dir2", "file2.txt"]
In [538]: d = {}
# Create a reference to the current dict
In [541]: curr = d
In [542]: for i, s in enumerate(structure):
     ...:     curr['name'] = s
     ...:     if i != len(structure) - 1:
     ...:         curr['type'] = 'directory'
     ...:         curr['children'] = {}
     ...:         curr = curr['children'] # New reference is the child dict
     ...:     else:
     ...:         curr['type'] = 'file'
     ...:
In [544]: from pprint import pprint
In [545]: pprint(d)
{'children': {'children': {'children': {'name': 'file2.txt', 'type': 'file'},
'name': 'dir2',
'type': 'directory'},
'name': 'dir1',
'type': 'directory'},
'name': 'root',
'type': 'directory'}
I don't know whether this will cover all of your requirements, as the spec isn't very detailed.
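The same reference-walking idea can be extended to merge several paths into one tree, with "children" stored as a list as in the question. The sketch below is only an illustration under stated assumptions: the helper name add_path is made up, all paths are assumed to share the same first component, and the last component of every path is assumed to be a file.

```python
def add_path(root, parts):
    """Merge one path (a list of components) into the tree rooted at `root`."""
    node = root  # reference to the node for parts[0]
    for i, part in enumerate(parts[1:], start=1):
        children = node.setdefault("children", [])
        # Reuse an existing child with this name so shared prefixes are merged.
        child = next((c for c in children if c["name"] == part), None)
        if child is None:
            is_file = i == len(parts) - 1
            child = {"type": "file" if is_file else "directory", "name": part}
            children.append(child)
        node = child  # step into the child
    return root


paths = [["root", "dir1", "file.txt"],
         ["root", "dir1", "dir2", "file2.txt"]]
tree = {"type": "directory", "name": paths[0][0]}
for parts in paths:
    add_path(tree, parts)
```

Since no recursion is involved, the depth of the directory structure does not matter here; the linear search through children could be replaced by a dict lookup if directories contain many entries.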

How can I convert my JSON into the format required to make a D3 sunburst diagram?

I have the following JSON data:
{
"data": {
"databis": {
"dataexit": {
"databis2": {
"1250": { }
}
},
"datanode": {
"20544": { }
}
}
}
}
I want to use it to generate a D3 sunburst diagram, but that requires a different data format:
{
"name": "data",
"children": [
{
"name": "databis",
"children": [
{
"name": "dataexit",
"children": [
{
"name": "databis2",
"size": "1250"
}
]
},
{
"name": "datanode",
"size": "20544"
}
]
}
]
}
How can I do this with Python? I think I need to use a recursive function, but I don't know where to start.

You could use a recursive solution with a function that takes a name and a dictionary as parameters. For every item in the given dict it calls itself to generate a list of children, each of which looks like this: {'name': 'name here', 'children': []}.
It then checks for the special case where there is exactly one child and that child has the key children with an empty list as its value. In that case it returns a dict that uses the given name as name and the child's name as size. In all other cases the function returns a dict with name and children.
import json
data = {
"data": {
"databis": {
"dataexit": {
"databis2": {
"1250": { }
}
},
"datanode": {
"20544": { }
}
}
}
}
def helper(name, d):
    # Collect all children
    children = [helper(*x) for x in d.items()]
    # Return a dict containing size when the only child looks like this:
    # {'name': '1250', 'children': []}
    # Note that get is used so that cases where the child already has size
    # instead of children work correctly
    if len(children) == 1 and children[0].get('children') == []:
        return {'name': name, 'size': children[0]['name']}
    # Normal case where the returned dict has children
    return {'name': name, 'children': children}
def transform(d):
    # Start from the single top-level key of the input dict
    return helper(*next(iter(d.items())))
print(json.dumps(transform(data), indent=4))
Output:
{
"name": "data",
"children": [
{
"name": "databis",
"children": [
{
"name": "dataexit",
"children": [
{
"name": "databis2",
"size": "1250"
}
]
},
{
"name": "datanode",
"size": "20544"
}
]
}
]
}
