How to format nested JSON into an iterative process format - python

I have a process tree in nested JSON format and I'm trying to turn it into an iterative process dictionary of lists. For example, the nested tree is below:
{
"name": "test",
"children": [
{
"name": "Operator_8a82e",
"children": [
{
"name": "Link_e5479",
"children": [
{
"name": "Operator_b7394",
"children": [
{
"name": "Link_7f62e",
"children": [
{
"name": "Operator_73ea0",
"children": [
{
"name": "Link_93a51",
"children": [
{
"name": "Operator_32a07"
}
]
}
]
}
]
}
]
}
]
},
{
"name": "Link_59e2c",
"children": [
{
"name": "Operator_3ca6d"
}
]
}
]
}
]
}
And I want it to look like the output below. Basically, each subtree is placed in an iterative list, in the order it appears in the nested JSON (this is very important).
{
"process_1": [
{
"name": "Operator_8a82e"
},
{
"name": "Link_e5479"
},
{
"name": "Operator_b7394"
}
],
"process_2": [
{
"name": "Operator_8a82e"
},
{
"name": "Link_59e2c"
}
]
}
My current function almost gets me there. I'll explain why it doesn't fully work below.
def flatten_json(y):
    out = {}
    def flatten(x):
        i = 0
        if type(x) is dict:
            for a in x:
                flatten(x[a])
        elif type(x) is list:
            for a in x:
                i += 1
                flatten(a)
        else:
            print(x)
    flatten(y)
    return out
This prints the following. However, I can't seem to detect where one subtree ends and a new one starts:
test
Operator_8a82e
Link_e5479
Operator_b7394
Link_7f62e
Operator_73ea0
Link_93a51
Operator_32a07
Link_59e2c
Operator_3ca6d
So for example, ideally the output looks like:
test
Operator_8a82e
Link_e5479
Operator_b7394
Link_7f62e
Operator_73ea0
Link_93a51
Operator_32a07
Operator_8a82e # this is what is missing in my function above
Link_59e2c
Operator_3ca6d
Any help would be great!

This is tree-formatted graph data. You can load it as a networkx graph, then export it again in the desired format. Assuming you have loaded the JSON as a Python dictionary called data:
import networkx as nx
from networkx.readwrite import json_graph
G = json_graph.tree_graph(data, ident="name")
Now let's first find all sink nodes (nodes with no outgoing edges), then find the simple paths from the defined source:
#define source node
source = 'Operator_8a82e'
#get a list of sink nodes
sinks = [node for node in G.nodes if G.out_degree(node) == 0]
#get all simple paths from source to sinks
paths = [list(nx.all_simple_paths(G, source=source, target=sink)) for sink in sinks]
#get first path since there is only one
paths = [i[0] for i in paths if i]
#create dict
[{f'process_{n+1}': [{'name':i} for i in path]} for n, path in enumerate(paths)]
Result:
[{'process_1': [{'name': 'Operator_8a82e'},
{'name': 'Link_e5479'},
{'name': 'Operator_b7394'},
{'name': 'Link_7f62e'},
{'name': 'Operator_73ea0'},
{'name': 'Link_93a51'},
{'name': 'Operator_32a07'}]},
{'process_2': [{'name': 'Operator_8a82e'},
{'name': 'Link_59e2c'},
{'name': 'Operator_3ca6d'}]}]
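For readers who would rather avoid the networkx dependency, the same root-to-leaf enumeration can be sketched in plain Python with a depth-first walk. This is only a sketch: the helper names leaf_paths and to_processes are hypothetical, it assumes every node has a "name" key and an optional "children" list, and like the answer above it follows each path all the way to its leaf while skipping the top-level "test" node.

```python
def leaf_paths(node, prefix=()):
    """Yield every path of names from this node down to a leaf, in document order."""
    path = prefix + (node["name"],)
    children = node.get("children", [])
    if not children:
        # No children: this node is a leaf, so the path is complete
        yield list(path)
    for child in children:
        yield from leaf_paths(child, path)


def to_processes(tree):
    """Build {"process_1": [...], "process_2": [...]} from the nested tree."""
    processes = {}
    count = 0
    # Start below the root so that the root name is not part of any process
    for top in tree.get("children", []):
        for path in leaf_paths(top):
            count += 1
            processes[f"process_{count}"] = [{"name": name} for name in path]
    return processes


# Small hypothetical tree with the same shape as the question's data
example = {
    "name": "test",
    "children": [
        {
            "name": "A",
            "children": [
                {"name": "B", "children": [{"name": "C"}]},
                {"name": "D"},
            ],
        }
    ],
}
print(to_processes(example))
```

Because the walk is depth-first and visits children in list order, the processes come out in the order the subtrees appear in the JSON, and the shared ancestor is repeated at the start of each path.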

Related

Create a list of new urls contained in objects in python

I have two JSON datasets. If there is a new value under "img_url" (one that is in the last JSON but not in the other), I want to print the URL or store it in a variable. The goal is just to get a list of the new values.
Input json:
last_data = [
{
"objectID": 16240,
"results": [
{
"img_url": "https://img.com/1.jpg"
},
{
"img_url": "https://img.com/2.jpg"
},
{
"img_url": "https://img.com/30.jpg"
}
]
},
{
"objectID": 16242,
"results": [
{
"img_url": "https://img.com/1.jpg"
},
{
"img_url": "https://img.com/2.jpg"
},
{
"img_url": "https://img.com/3.jpg"
}
]
},
# ...
#multiple other objectIDs
]
Second input:
second_data =[
{
"objectID": 16240,
"results": [
{
"img_url": "https://img.com/1.jpg"
},
{
"img_url": "https://img.com/2.jpg"
}
]
},
{
"objectID": 16242,
"results": [
{
"img_url": "https://img.com/1.jpg"
},
{
"img_url": "https://img.com/2.jpg"
}
]
},
# ...
#multiple other objectIDs
]
And I want to output only the https://img.com/30.jpg and https://img.com/3.jpg URLs (a list is fine because I have multiple objects) or store them in a variable.
My code:
#last file
for item_last in last_data:
results_last = item_last["results"]
if results_last is not []:
for result_last in results_last:
ccv_last = result_last["img_url"]
#second file
for item_second in second_data:
results_second = item_second["results"]
if results_second is not []:
# loop in results
for result_second in results_second:
ccv_second = result_second["img_url"]
if gm_last != gm_second and gm_last is not None:
print(gm_last)
If you are trying to find the difference between the two lists, here is one way.
I have slightly modified your code to get the expected result.
#last file
ccv_last = []
for item_last in last_data:
    results_last = item_last["results"]
    if results_last:
        for result_last in results_last:
            ccv_last.append(result_last["img_url"])
#second file
ccv_second = []
for item_second in second_data:
    results_second = item_second["results"]
    if results_second:
        for result_second in results_second:
            ccv_second.append(result_second["img_url"])
# URLs that appear in the last file but not in the second one
diff_list = list(set(ccv_last) - set(ccv_second))
Output:
['https://img.com/30.jpg', 'https://img.com/3.jpg']
However, you could slightly change your results model for better performance, as described below.
If no further keys are planned for the dictionaries in the results list, you probably just want a plain list of URLs. So you can change dict -> list
from
...
"results": [
{
"img_url": "https://img.com/1.jpg"
},
{
"img_url": "https://img.com/2.jpg"
}
]
...
to just a list of URLs
...
"img_url_results": ["https://img.com/1.jpg","https://img.com/2.jpg"]
...
With this change you can skip one for loop.
#last file
ccv_last = []
for item_last in last_data:
    if item_last.get('img_url_results'):
        ccv_last.extend(item_last["img_url_results"])

How to cut desired level of unlimited depth of JSON data in Python?

I have a large JSON file with the following structure, in which nodes are nested to different depths. I want to delete levels based on an assigned depth.
So far I have tried to cut some levels manually, but the result is still wrong: I remove elements by index, and the indexes shift after each removal.
content = json.load(file)
content_copy = content.copy()
for j, i in enumerate(content):
if 'children' in i:
for xx, x in enumerate(i['children']):
if 'children' in x:
for zz, z in enumerate(x['children']):
if 'children' in z:
del content_copy[j]['children'][xx]['children'][zz]
Input:
[
{
"name":"1",
"children":[
{
"name":"3",
"children":"5"
},
{
"name":"33",
"children":"51"
},
{
"name":"13",
"children":[
{
"name":"20",
"children":"30"
},
{
"name":"40",
"children":"50"
}
]
}
]
},
{
"name":"2",
"children":[
{
"name":"7",
"children":"6"
},
{
"name":"3",
"children":"521"
},
{
"name":"193",
"children":"292"
}
]
}
]
Output:
Here the children of the node with 'name': '13' have been removed.
[
{
"name": "1",
"children": [
{
"name": "3",
"children": "5"
},
{
"name": "33",
"children": "51"
},
{
"name": "13"
}
]
},
{
"name": "2",
"children": [
{
"name": "7",
"children": "6"
},
{
"name": "3",
"children": "521"
},
{
"name": "193",
"children": "292"
}
]
}
]
Not a Python answer, but in the hope it's useful to someone, here is a one-liner using the jq tool:
<file jq 'del(.[][][]?[]?[]?)'
It deletes every element nested five levels below the root, together with anything nested inside it.
The question mark ? suppresses the error jq would otherwise raise when the path reaches a value that cannot be iterated, such as a string.
One way to prune is to pass depth+1 in a recursive function call.
You are asking for different behaviors for different types. If the grandchild is just a string, you want to keep it, but if it is a list then you want to prune. This seems inconsistent, 13 should have children ["20", "30"] but then they wouldn't be the same node structure, so I can see your dilemma.
I would convert them to a tree of node objects and then just prune nodes, but to get the exact output you listed, I can selectively prune based on whether the child is a string or a list.
import pprint
import json
data = """[{
"name": "1", "children":
[
{"name":"3","children":"5"},{"name":"33","children":"51"},
{"name":"13","children":[{"name":"20","children":"30"},
{"name":"40","children":"50"}]}
]
},
{
"name": "2", "children":
[
{"name":"7","children":"6"},
{"name":"3","children":"521"},
{"name":"193","children":"292"}
]
}]"""
content = json.loads(data)
def pruned_nodecopy(content, prune_level, depth=0):
    if not isinstance(content, list):
        return content
    result = []
    for node in content:
        node_copy = {'name': node['name']}
        if 'children' in node:
            children = node['children']
            if not isinstance(children, list):
                node_copy['children'] = children
            elif depth+1 < prune_level:
                node_copy['children'] = pruned_nodecopy(node['children'], prune_level, depth+1)
        result.append(node_copy)
    return result
content_copy = pruned_nodecopy(content, 2)
pprint.pprint (content_copy)
Note that this is specifically copying the attributes you use. I had to make it use hard-coded attributes because you're asking for specific (and different) behaviors on those.
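If the nodes carry more keys than name and children, the same depth+1 idea can be sketched without hard-coding the attributes. The function name prune and its max_depth parameter are hypothetical. The sketch assumes the input is a list of dicts and keeps the question's rule that a non-list children value (a plain string) is always preserved, while a list of children is dropped once the depth limit is reached.

```python
def prune(nodes, max_depth, depth=1):
    """Copy a list of nodes, dropping list-valued 'children' below max_depth."""
    pruned = []
    for node in nodes:
        # Copy every key except 'children', which is handled separately below
        node_copy = {key: value for key, value in node.items() if key != "children"}
        children = node.get("children")
        if isinstance(children, list):
            if depth < max_depth:
                # Still above the limit: recurse one level deeper
                node_copy["children"] = prune(children, max_depth, depth + 1)
            # Otherwise the nested list is dropped entirely
        elif children is not None:
            # Scalar children (e.g. a string) are kept as they are
            node_copy["children"] = children
        pruned.append(node_copy)
    return pruned


sample = [
    {"name": "1", "children": [
        {"name": "3", "children": "5"},
        {"name": "13", "children": [{"name": "20", "children": "30"}]},
    ]},
]
print(prune(sample, 2))
```

Since the function builds new dicts and lists instead of deleting from the input, it avoids the shifting-index problem described in the question.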

Can't figure out how to append JSON into a single parent object

I'm having an issue which I can't figure out the solution of. I'm trying to build a JSON object. I'm working with some VMware API and I'm building a JSON object for usernames and nesting VM information inside. I'm having trouble building the nested objects. See below and I'll explain further. Note, annotation is used as a tag to identify the owner of the virtual machine.
owner_logged_in = "johndoe"
service_instance = connect.SmartConnectNoSSL(host='10.0.0.202', user='', pwd='')
atexit.register(connect.Disconnect, service_instance)
content = service_instance.RetrieveContent()
container = content.rootFolder # starting point to look into
viewType = [vim.VirtualMachine] # object types to look for
recursive = True # whether we should look into it recursively
containerView = content.viewManager.CreateContainerView(container, viewType, recursive)
children = containerView.view
virtual_machines = []
vm_username = {}
vm_container = {}
for child in children:
summary = child.summary
annotation = summary.config.annotation
if owner_logged_in == annotation:
children = []
children.append({'ip': summary.guest.ipAddress,'power': summary.runtime.powerState})
vm_container['name'] = summary.config.name
vm_username[owner_logged_in] = vm_container
vm_container['properties'] = children
jsonvalues = json.dumps(vm_username)
#debug#
print(jsonvalues)
#debug#
The returned results are as follows:
{"johndoe": {"name": "centos01", "properties": [{"ip": "10.0.0.201", "power": "poweredOn"}]}}
{"johndoe": {"name": "dc01", "properties": [{"ip": "10.0.0.200", "power": "poweredOn"}]}}
I need to somehow combine these two into one object that I can store in a variable. A Django web app then iterates through that value to build a table using some JavaScript (tabular.js). I can handle that part, but what I'm struggling with is coming up with a way to make one object. A user might have more than one virtual machine, and I need the properties of each, with 'johndoe' as the parent.
Essentially I need it formatted like this so that tabular.js can properly convert it to a table.
{
"johndoe":[
{
"name":"centos01",
"properties":[
{
"ip":"10.0.0.201",
"power":"poweredOn"
}
]
},
{
"name":"dc01",
"properties":[
{
"ip":"10.0.0.200",
"power":"poweredOn"
}
]
}
]
}
Any help would greatly be appreciated!
Try the following:
from collections import defaultdict
virtual_machines = []
vm_username = defaultdict(list)
for child in children:
    summary = child.summary
    annotation = summary.config.annotation
    if owner_logged_in == annotation:
        # Build a fresh dict for every VM so earlier entries are not overwritten
        vm_container = {
            'name': summary.config.name,
            'properties': [{'ip': summary.guest.ipAddress, 'power': summary.runtime.powerState}],
        }
        vm_username[owner_logged_in].append(vm_container)
# Serialize once, after all VMs have been collected
jsonvalues = json.dumps(vm_username, indent=4)
#debug#
print(jsonvalues)
#debug#
If, instead of printing, you can append each result to a list, the following solution should work.
import json
d_list=[
{"johndoe": {"name": "centos01", "properties": [{"ip": "10.0.0.201", "power": "poweredOn"}]}},
{"johndoe": {"name": "dc01", "properties": [{"ip": "10.0.0.200", "power": "poweredOn"}]}},
{"janedoe": {"name": "centos02", "properties": [{"ip": "10.0.0.201", "power": "poweredOn"}]}},
{"janedoe": {"name": "dc02", "properties": [{"ip": "10.0.0.200", "power": "poweredOn"}]}}
]
d_new={name:[v for x in d_list for k,v in x.items() if k ==name] for name in set(list(y)[0] for y in d_list)}
# for printing output properly
print(json.dumps(d_new,indent=4))
Output
{
"johndoe": [
{
"name": "centos01",
"properties": [
{
"ip": "10.0.0.201",
"power": "poweredOn"
}
]
},
{
"name": "dc01",
"properties": [
{
"ip": "10.0.0.200",
"power": "poweredOn"
}
]
}
],
"janedoe": [
{
"name": "centos02",
"properties": [
{
"ip": "10.0.0.201",
"power": "poweredOn"
}
]
},
{
"name": "dc02",
"properties": [
{
"ip": "10.0.0.200",
"power": "poweredOn"
}
]
}
]
}
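The dictionary comprehension above rescans the whole list once per owner. As an alternative sketch, the same grouping can be done in a single pass with collections.defaultdict. The function name group_by_owner is hypothetical, and the sketch assumes each record is a dict that maps an owner name to one VM dict, as in d_list.

```python
import json
from collections import defaultdict


def group_by_owner(records):
    """Merge [{owner: vm}, ...] into {owner: [vm, ...]} in one pass."""
    grouped = defaultdict(list)
    for record in records:
        for owner, vm in record.items():
            grouped[owner].append(vm)
    # Convert back to a plain dict so callers do not get auto-created keys
    return dict(grouped)


records = [
    {"johndoe": {"name": "centos01", "properties": [{"ip": "10.0.0.201", "power": "poweredOn"}]}},
    {"johndoe": {"name": "dc01", "properties": [{"ip": "10.0.0.200", "power": "poweredOn"}]}},
    {"janedoe": {"name": "centos02", "properties": [{"ip": "10.0.0.201", "power": "poweredOn"}]}},
]
print(json.dumps(group_by_owner(records), indent=4))
```

Owners come out in the order in which they first appear in the list, because regular dicts preserve insertion order, whereas iterating over a set does not guarantee any order.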

Create a JSON type nested dictionary from a list in Python

I wish to create a JSON type nested dictionary from a list of lists. The lists contained a full directory path, but I broke them into their individual components as I thought it would make the creation of the nested dictionaries easier.
An example list:
["root", "dir1", "file.txt"]
The expected result:
{
"type": "directory",
"name": "root",
"children": [
{
"type": "directory",
"name": "dir1",
"children": [
{
"type": "file",
"name": "file.txt",
}
]
}
]
}
I've tried using a recursive method but couldn't quite get there (I'm new to recursive methods and my head continually spun out). I also tried an iterative method based on an idea I found here on Stack Overflow, which inverted the list and built the dict backwards. I kind of got that to work, but was unable to satisfy one of the requirements: the code must deal with duplication in parts of the directory paths as it iterates over the list of lists.
For example, following on from the last example, the next input list is this:
["root", "dir1", "dir2", "file2.txt"]
and it needs to build onto the JSON dictionary to produce this:
{
"type": "directory",
"name": "root",
"children": [
{
"type": "directory",
"name": "dir1",
"children": [
{
"type": "file",
"name": "file.txt",
}
{
"type": "directory",
"name": "dir2",
"children": [
{
"type": "file",
"name": "file2.txt"
}
]
}
]
}
]
}
and so on with an unknown number of lists containing directory paths.
Thanks.
A recursive solution with itertools.groupby is as follows (assuming all paths are absolute paths). The idea is to group paths by the first element in the path list. This groups similar directory roots together, allowing us to call the function recursively on that group.
Also note that file names cannot be duplicated in a directory, so all files will be grouped as single element lists by groupby:
from itertools import groupby
from operator import itemgetter
def build_dict(paths):
    if len(paths) == 1 and len(paths[0]) == 1:
        return {"type": "file", "name": paths[0][0]}
    dirname = paths[0][0]
    d = {"type": "directory", "name": dirname, "children": []}
    for k, g in groupby(sorted([p[1:] for p in paths], key=itemgetter(0)),
                        key=itemgetter(0)):
        d["children"].append(build_dict(list(g)))
    return d
paths = [["root", "dir1", "file.txt"], ["root", "dir1", "dir2", "file2.txt"]]
print(build_dict(paths))
Output
{
"type": "directory",
"name": "root",
"children": [
{
"type": "directory",
"name": "dir1",
"children": [
{
"type": "directory",
"name": "dir2",
"children": [
{
"type": "file",
"name": "file2.txt"
}
]
},
{
"type": "file",
"name": "file.txt"
}
]
}
]
}
Here's a naive recursive solution that simply walks through the tree structure, adding children as necessary, until the last element of path is reached (assumed to be a file).
import json
def path_to_json(path, root):
    if path:
        curr = path.pop(0)
        if not root:
            root["type"] = "file"
            root["name"] = curr
            if path:
                root["children"] = [{}]
                root["type"] = "directory"
                path_to_json(path, root["children"][0])
        elif path:
            try:
                i = [x["name"] for x in root["children"]].index(path[0])
                path_to_json(path, root["children"][i])
            except ValueError:
                root["children"].append({})
                path_to_json(path, root["children"][-1])
    return root
if __name__ == "__main__":
    paths = [["root", "dir1", "file.txt"],
             ["root", "dir1", "dir2", "file2.txt"]]
    result = {}
    print(json.dumps([path_to_json(x, result) for x in paths][0], indent=4))
Output:
{
"type": "directory",
"name": "root",
"children": [
{
"type": "directory",
"name": "dir1",
"children": [
{
"type": "file",
"name": "file.txt"
},
{
"type": "directory",
"name": "dir2",
"children": [
{
"type": "file",
"name": "file2.txt"
}
]
}
]
}
]
}
Try it!
Given that not much detail has been provided, here is a solution that uses a reference to step into each nested dict:
In [537]: structure = ["root", "dir1", "dir2", "file2.txt"]
In [538]: d = {}
# Create a reference to the current dict
In [541]: curr = d
In [542]: for i, s in enumerate(structure):
...: curr['name'] = s
...: if i != len(structure) - 1:
...: curr['type'] = 'directory'
...: curr['children'] = {}
...: curr = curr['children'] # New reference is the child dict
...: else:
...: curr['type'] = 'file'
...:
In [544]: from pprint import pprint
In [545]: pprint(d)
{'children': {'children': {'children': {'name': 'file2.txt', 'type': 'file'},
'name': 'dir2',
'type': 'directory'},
'name': 'dir1',
'type': 'directory'},
'name': 'root',
'type': 'directory'}
I don't know if this will work for all of your cases, as the spec isn't very detailed.
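The reference-walking idea can be extended to handle several paths and to store children as lists, which is the shape the question asks for. The following is only a sketch: the function name build_tree is hypothetical, and it assumes that all paths share the same first element, that the last element of every path is a file, and that a name is never used both as a file and as a directory in the same folder.

```python
import json


def build_tree(paths):
    """Merge a list of path-component lists into one nested dict."""
    root = None
    for parts in paths:
        if root is None:
            # The first path decides the root node
            root = {"type": "directory" if len(parts) > 1 else "file", "name": parts[0]}
        node = root  # reference to the dict currently being extended
        for index, name in enumerate(parts[1:], start=1):
            children = node.setdefault("children", [])
            # Reuse an existing child with this name so shared prefixes are merged
            match = next((child for child in children if child["name"] == name), None)
            if match is None:
                is_last = index == len(parts) - 1
                match = {"type": "file" if is_last else "directory", "name": name}
                children.append(match)
            node = match
    return root


paths = [["root", "dir1", "file.txt"], ["root", "dir1", "dir2", "file2.txt"]]
print(json.dumps(build_tree(paths), indent=4))
```

Children keep the order in which they were first inserted, so file.txt stays ahead of dir2 here, as in the expected result in the question.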

How can I convert my JSON into the format required to make a D3 sunburst diagram?

I have the following JSON data:
{
"data": {
"databis": {
"dataexit": {
"databis2": {
"1250": { }
}
},
"datanode": {
"20544": { }
}
}
}
}
I want to use it to generate a D3 sunburst diagram, but that requires a different data format:
{
"name": "data",
"children": [
{
"name": "databis",
"children": [
{
"name": "dataexit",
"children": [
{
"name": "databis2",
"size": "1250"
}
]
},
{
"name": "datanode",
"size": "20544"
}
]
}
]
}
How can I do this with Python? I think I need to use a recursive function, but I don't know where to start.
You could use a recursive solution with a function that takes a name and a dictionary as parameters. For every item in the given dict it calls itself again to generate a list of children, each of which looks like this: {'name': 'name here', 'children': []}.
Then it checks for the special case where there is only one child and that child has the key children with an empty list as its value. In that case it returns a dict that has the given parameter as its name and the child's name as its size. In all other cases the function returns a dict with name and children.
import json
data = {
"data": {
"databis": {
"dataexit": {
"databis2": {
"1250": { }
}
},
"datanode": {
"20544": { }
}
}
}
}
def helper(name, d):
    # Collect all children
    children = [helper(*x) for x in d.items()]
    # Return dict containing size in case only child looks like this:
    # {'name': '1250', 'children': []}
    # Note that get is used so that cases where child already has size
    # instead of children work correctly
    if len(children) == 1 and children[0].get('children') == []:
        return {'name': name, 'size': children[0]['name']}
    # Normal case where returned dict has children (reuse the list built above)
    return {'name': name, 'children': children}
def transform(d):
    return helper(*next(iter(d.items())))
print(json.dumps(transform(data), indent=4))
Output:
{
"name": "data",
"children": [
{
"name": "databis",
"children": [
{
"name": "dataexit",
"children": [
{
"name": "databis2",
"size": "1250"
}
]
},
{
"name": "datanode",
"size": "20544"
}
]
}
]
}
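The answer above keeps each size as a string, matching the format requested in the question. If numeric sizes are preferred, a variant can inspect the input dict directly and convert the key with int. This is a sketch under assumptions: the function name to_sunburst is hypothetical, a leaf is taken to be a dict whose single key maps to an empty dict, and that key is assumed to be a valid integer string.

```python
import json


def to_sunburst(name, subtree):
    """Convert {child_name: subtree, ...} into a name/children/size hierarchy."""
    if len(subtree) == 1:
        (key, value), = subtree.items()
        if value == {}:
            # A single key with an empty dict marks a leaf: the key is the size
            return {"name": name, "size": int(key)}
    return {
        "name": name,
        "children": [to_sunburst(child, grand) for child, grand in subtree.items()],
    }


data = {
    "data": {
        "databis": {
            "dataexit": {"databis2": {"1250": {}}},
            "datanode": {"20544": {}},
        }
    }
}
root_name, root_subtree = next(iter(data.items()))
print(json.dumps(to_sunburst(root_name, root_subtree), indent=4))
```

Each level is processed only once here, because the leaf check looks at the raw input instead of at already-converted children.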
