Python: how to loop through all unknown depth of a tree?

I'm stuck on the overall strategy for a program I'm writing.
I have CSV files like this:
Column1 Column 2
------- ----------
parent1 [child1, child2, child3]
parent2 [child4, child5, child6]
child1 [child7, child8]
child5 [child10, child33]
... ...
It is unknown how deeply each of those lists will nest, and I want to loop through all of them.
Code:
def make_parentClass(self):
    for i in self.csv_rows_list:
        self.parentClassList.append(parentClass(i))
    # after first Parent
    for i in list(self.parentClassList):  # iterate over a copy, since items are removed below
        if i.children != []:
            for child in i.children:
                for z in list(self.parentClassList):
                    if str(child) == str(z.node_parent):
                        i.node_children.append(z)
                        self.parentClassList.remove(z)
class parentClass():
    def __init__(self, the_list):
        self.node_parent = the_list[0]
        self.children = the_list[1]
        # per-instance list; a class-level node_children = [] would be shared by every instance
        self.node_children = []
The above code might be a solution if I can find a way to iterate to an arbitrary depth. Let me know whether the question makes sense now.
Output:
My aim is to build a tree view in another language, but first I need to produce this output in JSON format. So the expected output is something like:
{
"parent1": {
"child1": {"child7": {}, "child8": {}},
"child2": {},
"child3": {}
},
"parent2": {
"child4": {},
"child5": {
"child10": {},
"child33": {}
},
"child6": {}
}
}

I would recommend a solution using two dictionaries: a nested one holding the actual data structure you plan to convert to JSON, and a flat one that lets you find any key directly. Since Python variables hold references to objects, both dictionaries can point at the exact same inner dict objects, so carefully modifying things through the flat dictionary builds the nested structure for you.
The following code assumes that you have already split each line into a string parent and a list children, containing the values from the two columns.
json_dict = {}
flat_dict = {}
for parent, children in file_iterator():
    if parent in flat_dict:
        value = flat_dict[parent]
    else:
        value = {}
        flat_dict[parent] = json_dict[parent] = value
    for child in children:
        flat_dict[child] = value[child] = {}
Running this produces json_dict like this:
{
'parent1': {
'child1': {
'child7': {},
'child8': {}
},
'child2': {},
'child3': {}
},
'parent2': {
'child4': {},
'child5': {
'child10': {},
'child33': {}
},
'child6': {}
}
}
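To try the two-dictionary approach end to end, here is a minimal runnable sketch. parse_rows is a hypothetical stand-in for the answer's file_iterator(): it assumes each data row is a name, whitespace, then a bracketed, comma-separated list, as in the question's sample, and it skips anything else (the header and the dashed separator).

```python
import json
import re

# Hypothetical row format: "parent1 [child1, child2, child3]"
ROW_PATTERN = re.compile(r"^\s*(\S+)\s+\[(.*)\]\s*$")

def parse_rows(lines):
    """Yield (parent, [children]) pairs; non-matching lines are skipped."""
    for line in lines:
        match = ROW_PATTERN.match(line)
        if not match:
            continue  # header, separator or blank line
        parent, raw_children = match.groups()
        children = [c.strip() for c in raw_children.split(",") if c.strip()]
        yield parent, children

def build_tree(rows):
    json_dict = {}  # nested structure that will be serialized
    flat_dict = {}  # name -> the same dict object, reachable at any depth
    for parent, children in rows:
        if parent in flat_dict:
            value = flat_dict[parent]  # parent is already nested somewhere
        else:
            value = {}
            flat_dict[parent] = json_dict[parent] = value  # new top-level entry
        for child in children:
            flat_dict[child] = value[child] = {}
    return json_dict

sample = """Column1 Column2
------- ----------
parent1 [child1, child2, child3]
parent2 [child4, child5, child6]
child1 [child7, child8]
child5 [child10, child33]""".splitlines()

tree = build_tree(parse_rows(sample))
print(json.dumps(tree, indent=2))
```

This relies on row order: a name must show up as someone's child before its own row is read. If child1's row came first, child1 would become a separate top-level entry, and parent1 would later get a fresh empty dict for it.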

Related

python merge JSON dicts created in different sources into one JSON dict

This is only an example, but it shows the general idea:
I have JSON-like dicts created in different places in the application,
and in the end I'd like to merge them into one JSON object.
json_1 ={}
formated_db_name = "formatedname"
json_1[formated_db_name] = {"data_source_name": formated_db_name}
json_1[formated_db_name] = {"db_servers_entry_list": {}}
json_2 = {}
formated_db_name2 = "formatedname2"
json_2[formated_db_name2] = {"data_source_name2": formated_db_name2}
json_2[formated_db_name2] = {"db_servers_entry_list2": {}}
This creates two JSON objects:
{
"formatedname2": {
"db_servers_entry_list2": {}
}
}
and
{
"formatedname": {
"db_servers_entry_list": {}
}
}
Now I'd like to combine them to look like this:
{
"formatedname2": {
"db_servers_entry_list2": {}
},
"formatedname": {
"db_servers_entry_list": {}
}
}
I didn't find any json.dumps option to combine them. (There can be more than two such dicts that I need to combine.)

You can iterate through the sequence of dictionaries and call update on an accumulator dictionary, passing each dictionary in turn:
out = {}
for each in (json_1, json_2):
    out.update(each)
print(out)
# {'formatedname': {'db_servers_entry_list': {}}, 'formatedname2': {'db_servers_entry_list2': {}}}
Once you are done merging the dictionaries, call json.dumps(out).
Because update overwrites the value of an existing key instead of adding a second entry, the result keeps the dictionary/JSON property: a key never appears twice; it simply holds the value from the last dictionary passed in that contained it.
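Since the question mentions there may be more than two dicts, the same loop can be wrapped in a small function that accepts any number of them. merge_dicts and json_3 are illustrative names, not from the original post.

```python
import json

def merge_dicts(*dicts):
    """Merge any number of dicts; later dicts win on duplicate top-level keys."""
    merged = {}
    for d in dicts:
        merged.update(d)  # shallow: nested dicts are shared, not deep-merged
    return merged

json_1 = {"formatedname": {"db_servers_entry_list": {}}}
json_2 = {"formatedname2": {"db_servers_entry_list2": {}}}
json_3 = {"formatedname3": {"db_servers_entry_list3": {}}}  # hypothetical third source

out = merge_dicts(json_1, json_2, json_3)
print(json.dumps(out, indent=4))
```

update is a shallow merge: if two sources share a top-level key, the later dict's value replaces the earlier one wholesale rather than being merged into it.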

You can merge the dicts first and then call json.dumps on the result!
a = dict(a="Test")
b = dict(b=True)
c = dict()
c.update(a)
c.update(b)
print(c)
# {'a': 'Test', 'b': True}
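The two update() calls can also be written as a single dict literal with ** unpacking (Python 3.5+); on Python 3.9+ the expression a | b does the same. The sketch also shows the difference between printing the dict and serializing it: print shows Python's repr, while json.dumps produces actual JSON text.

```python
import json

a = dict(a="Test")
b = dict(b=True)

c = {**a, **b}        # same result as the two update() calls
print(c)              # {'a': 'Test', 'b': True}  (Python repr)
print(json.dumps(c))  # {"a": "Test", "b": true}  (JSON text)
```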

The best way to transform a response to a json format in the example

I'd appreciate help finding the best way to transform a result into the JSON shown below.
We have a result like the one below, containing information about employees and companies. For some reason, some property names carry a T. prefix (like an enum), but not all of them.
[ {
"T.id":"Employee_11",
"T.category":"Employee",
"node_id":["11"]
},
{
"T.id":"Company_12",
"T.category":"Company",
"node_id":["12"],
"employeecount":800
},
{
"T.id":"id~Employee_11_to_Company_12",
"T.category":"WorksIn"
},
{
"T.id":"Employee_13",
"T.category":"Employee",
"node_id":["13"]
},
{
"T.id":"Parent_Company_14",
"T.category":"ParentCompany",
"node_id":["14"],
"employeecount":900,
"childcompany":"Company_12"
},
{
"T.id":"id~Employee_13_to_Parent_Company_14",
"T.category":"Contractorin"
}]
We need to transform this result into a different structure, grouped by category: if the category is Employee, Company, or ParentCompany, the item goes under the node_properties object; otherwise it goes under edge_properties. Apart from the common properties (property_id, property_category, and node), additional properties must be added when the category is Company or ParentCompany. There is also some further logic: the from and to values of each edge object must be derived by splitting its id around the 'to'. The expected response is:
{
"node_properties":[
{
"property_id":"Employee_11",
"property_category":"Employee",
"node":{"node_id": "11"}
},
{
"property_id":"Company_12",
"property_category":"Company",
"node":{"node_id": "12"},
"employeecount":800
},
{
"property_id":"Employee_13",
"property_category":"Employee",
"node":{"node_id": "13"}
},
{
"property_id":"Parent_Company_14",
"property_category":"ParentCompany",
"node":{"node_id": "14"},
"employeecount":900,
"childcompany":"Company_12"
}
],
"edge_properties":[
{
"from":"Employee_11",
"to":"Company_12",
"property_id":"Employee_11_to_Company_12"
},
{
"from":"Employee_13",
"to":"Parent_Company_14",
"property_id":"Employee_13_to_Parent_Company_14"
}
]
}
In Java we would use an enhanced for loop, a switch, etc. How can we write Python code that produces the structure above from the initial result? (I am new to Python.) Thank you in advance.
Regards

Here is a method I put together quickly; you can adjust it to your requirements. To get the from/to IDs for edge_properties, you can use a regex or your own splitting function and then assign them to an object the same way I did for the nodes. I'm not sure of your full requirements, but if the list you gave contains all the categories, this should be sufficient.
def transform(input_list):
    node_properties = []
    edge_properties = []
    for input_obj in input_list:
        new_obj = {}
        if input_obj['T.category'] in ('Employee', 'Company', 'ParentCompany'):
            new_obj['property_id'] = input_obj['T.id']
            new_obj['property_category'] = input_obj['T.category']
            # a dict, not a set: {'node_id': '11'}
            new_obj['node'] = {'node_id': input_obj['node_id'][0]}
            if "employeecount" in input_obj:
                new_obj['employeecount'] = input_obj['employeecount']
            if "childcompany" in input_obj:
                new_obj['childcompany'] = input_obj['childcompany']
            node_properties.append(new_obj)
        else:  # You can use elif == as well, based on your requirements, if there are other outliers
            # Use a regex or whichever method you like here to split the id and add the values like above
            edge_properties.append(new_obj)
    return {'node_properties': node_properties, 'edge_properties': edge_properties}
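Filling in the else branch, here is a fuller sketch that also builds the edge objects. It assumes every edge T.id has the form id~&lt;from&gt;_to_&lt;to&gt;, as in the sample, and that node ids never contain the substring _to_ (otherwise the split is ambiguous); both assumptions come from the sample data, not from any stated rule. NODE_CATEGORIES and OPTIONAL_NODE_FIELDS are illustrative names.

```python
import json

NODE_CATEGORIES = {"Employee", "Company", "ParentCompany"}
# Extra fields that only some node categories (Company, ParentCompany) carry.
OPTIONAL_NODE_FIELDS = ("employeecount", "childcompany")

def transform(input_list):
    node_properties = []
    edge_properties = []
    for item in input_list:
        category = item["T.category"]
        if category in NODE_CATEGORIES:
            node = {
                "property_id": item["T.id"],
                "property_category": category,
                "node": {"node_id": item["node_id"][0]},
            }
            for field in OPTIONAL_NODE_FIELDS:
                if field in item:
                    node[field] = item[field]
            node_properties.append(node)
        else:
            # Assumed edge id format: "id~<from>_to_<to>"; drop the "id~" prefix.
            edge_id = item["T.id"].split("~", 1)[-1]
            source, target = edge_id.split("_to_", 1)
            edge_properties.append(
                {"from": source, "to": target, "property_id": edge_id}
            )
    return {"node_properties": node_properties, "edge_properties": edge_properties}

sample = [
    {"T.id": "Employee_11", "T.category": "Employee", "node_id": ["11"]},
    {"T.id": "Company_12", "T.category": "Company", "node_id": ["12"], "employeecount": 800},
    {"T.id": "id~Employee_11_to_Company_12", "T.category": "WorksIn"},
    {"T.id": "Employee_13", "T.category": "Employee", "node_id": ["13"]},
    {"T.id": "Parent_Company_14", "T.category": "ParentCompany", "node_id": ["14"],
     "employeecount": 900, "childcompany": "Company_12"},
    {"T.id": "id~Employee_13_to_Parent_Company_14", "T.category": "Contractorin"},
]

result = transform(sample)
print(json.dumps(result, indent=2))
```

Keeping the node categories in a set means adding a category is a one-line change; the membership test plays the role a switch statement would in Java.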

Access dictionary fields by a conditional key

I have a class in Python with plenty of methods that access multiple fields of a dictionary like this:
{
"a": {
"element_in_a": {
"element_in_a_id": "some_id"
}
},
"b": {
"element_in_b": {
"element_in_b_id": "some_id"
}
}
}
This dictionary was eventually changed to:
{
"a": {
"element_in_a": {
"element_in_a_id": "some_id"
},
"optional_element_in_a": {
"optional_element_in_a_id": "some_id"
}
},
"b": {
"element_in_b": {
"element_in_b_id": "some_id"
},
"optional_element_in_b": {
"optional_element_in_b_id": "some_id"
}
}
}
I have to modify the methods that access element_in_a or element_in_b of this dictionary such that, whenever optional_element_in_a exists in a, the methods must read only the optional elements everywhere (as a business rule, if the optional element exists in a, it also must exist in b), otherwise keep the old behaviour.
So far, as a quick solution, I've been adding these lines of code at the top of said methods:
if "optional_element_in_a" in the_dict["a"]:
    element_key_a = "optional_element_in_a"
    element_key_b = "optional_element_in_b"
    id_key_a = "optional_element_in_a_id"
    id_key_b = "optional_element_in_b_id"
else:
    element_key_a = "element_in_a"
    element_key_b = "element_in_b"
    id_key_a = "element_in_a_id"
    id_key_b = "element_in_b_id"
and then read the elements using those keys I just defined.
This is obviously not a good solution, as I keep finding more and more methods needing this change.
I need to find the best pythonic way of having this logic in 1 place, so that all methods who need it can easily access it.
Btw, the_dict is not a class property/attribute. It is passed as argument to each method that operates on it, so I don't have access to it in __init__.
Thanks in advance.

if "optional_element_in_a" in the_dict["a"]:
    element_key_a = "optional_element_in_a"
    element_key_b = "optional_element_in_b"
    id_key_a = "optional_element_in_a_id"
    id_key_b = "optional_element_in_b_id"
else:
    element_key_a = "element_in_a"
    element_key_b = "element_in_b"
    id_key_a = "element_in_a_id"
    id_key_b = "element_in_b_id"
The main problem you have here is code repetition.
We can compress it by choosing the prefix conditionally, like this:
prefix = "optional_" if "optional_element_in_a" in the_dict["a"] else ""
element_key_a = prefix+"element_in_a"
element_key_b = prefix+"element_in_b"
id_key_a = prefix+"element_in_a_id"
id_key_b = prefix+"element_in_b_id"
Moreover, if you say "I've been adding these lines of code at the top of said methods", then it means even more repetition. And it's a sign you might need to define a function - either with this code above, some variation of it, or even something else.
Decide what you need: What is the structure of those keys you need? Do you need the key or the value? Do you need to extract always the same number of keys/values or different? Do you need to operate on the dict you have, or can you do a new dict with values you extracted (so that you don't care about whether the keys have "optional_" or not)?
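As one possible answer to "key or value?", here is a sketch of a helper that resolves the optional-vs-regular choice once and hands back the values themselves, so the calling methods never touch the key strings. select_elements is a hypothetical name, and it relies on the business rule from the question that the optional element in b exists whenever the one in a does.

```python
def select_elements(the_dict):
    """Return the a/b elements and their ids, preferring the optional ones.

    Relies on the question's business rule: if "optional_element_in_a" is in
    the_dict["a"], then "optional_element_in_b" is in the_dict["b"].
    """
    prefix = "optional_" if "optional_element_in_a" in the_dict["a"] else ""
    elem_a = the_dict["a"][prefix + "element_in_a"]
    elem_b = the_dict["b"][prefix + "element_in_b"]
    return {
        "a": elem_a,
        "b": elem_b,
        "a_id": elem_a[prefix + "element_in_a_id"],
        "b_id": elem_b[prefix + "element_in_b_id"],
    }

# Illustrative dicts in the old and new formats.
old_format = {
    "a": {"element_in_a": {"element_in_a_id": "a1"}},
    "b": {"element_in_b": {"element_in_b_id": "b1"}},
}
new_format = {
    "a": {"element_in_a": {"element_in_a_id": "a1"},
          "optional_element_in_a": {"optional_element_in_a_id": "a2"}},
    "b": {"element_in_b": {"element_in_b_id": "b1"},
          "optional_element_in_b": {"optional_element_in_b_id": "b2"}},
}
print(select_elements(old_format)["a_id"], select_elements(new_format)["a_id"])
```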

I eventually went with a function that returns a namedtuple, with the keys as members of that tuple.
Like this:
from collections import namedtuple
def get_keys(a_elem):
    has_optional_elem = "optional_element_in_a" in a_elem
    Keys = namedtuple("Keys", ["element_key_a", "element_key_b", ...])
    return Keys(
        element_key_a = "optional_element_in_a" if has_optional_elem else "element_in_a",
        element_key_b = "optional_element_in_b" if has_optional_elem else "element_in_b",
        ...
    )
so that in each method that needs these keys I can simply do:
keys = get_keys(the_dict["a"])
the_needed_a_elem = the_dict["a"][keys.element_key_a]
the_needed_b_elem = the_dict["b"][keys.element_key_b]
...
which I believe is simple enough for good readability when using the keys.
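For reference, here is a complete, runnable version of the same idea with all four keys filled in. This variant swaps in typing.NamedTuple, defined once at module level instead of calling namedtuple() inside get_keys on every call, and reuses the prefix trick from the other answer; the sample dict is illustrative.

```python
from typing import NamedTuple

class Keys(NamedTuple):
    element_key_a: str
    element_key_b: str
    id_key_a: str
    id_key_b: str

def get_keys(a_elem):
    """Pick the optional or regular key names based on the contents of the_dict["a"]."""
    prefix = "optional_" if "optional_element_in_a" in a_elem else ""
    return Keys(
        element_key_a=prefix + "element_in_a",
        element_key_b=prefix + "element_in_b",
        id_key_a=prefix + "element_in_a_id",
        id_key_b=prefix + "element_in_b_id",
    )

# Illustrative dict in the new format (optional elements present).
the_dict = {
    "a": {"element_in_a": {"element_in_a_id": "a1"},
          "optional_element_in_a": {"optional_element_in_a_id": "a2"}},
    "b": {"element_in_b": {"element_in_b_id": "b1"},
          "optional_element_in_b": {"optional_element_in_b_id": "b2"}},
}

keys = get_keys(the_dict["a"])
a_id = the_dict["a"][keys.element_key_a][keys.id_key_a]
b_id = the_dict["b"][keys.element_key_b][keys.id_key_b]
print(a_id, b_id)
```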

Use Python and JSON to recursively get all keys associated with a value

Given data organized in JSON format (example below), how can we get the path of keys and sub-keys associated with a given value?
For example:
Given the input "23314", we need to return a list with:
Fanerozoico, Cenozoico, Quaternario, Pleistocenico, Superior.
Since the data is a JSON file, we decoded it using Python and the json library:
import json
def decode_crono(crono_file):
    with open(crono_file) as json_file:
        data = json.load(json_file)
    return data
From here on, we don't know how to process it to get what we need.
We can access keys like this:
k = data["Fanerozoico"]["Cenozoico"]["Quaternario"]["Pleistocenico"].keys()
or values like this:
v = data["Fanerozoico"]["Cenozoico"]["Quaternario"]["Pleistocenico"]["Superior"].values()
but this is still far from what we need.
{
"Fanerozoico": {
"id": "20000",
"Cenozoico": {
"id": "23000",
"Quaternario": {
"id": "23300",
"Pleistocenico": {
"id": "23310",
"Superior": {
"id": "23314"
},
"Medio": {
"id": "23313"
},
"Calabriano": {
"id": "23312"
},
"Gelasiano": {
"id": "23311"
}
}
}
}
}
}

It's a little hard to understand exactly what you are after here, but it seems like for some reason you have a bunch of nested json and you want to search it for an id and return a list that represents the path down the json nesting. If so, the quick and easy path is to recurse on the dictionary (that you got from json.load) and collect the keys as you go. When you find an 'id' key that matches the id you are searching for you are done. Here is some code that does that:
def all_keys(search_dict, key_id):
    def _all_keys(search_dict, key_id, keys=None):
        if not keys:
            keys = []
        for i in search_dict:
            if search_dict[i] == key_id:
                return keys + [i]
            if isinstance(search_dict[i], dict):
                potential_keys = _all_keys(search_dict[i], key_id, keys + [i])
                if 'id' in potential_keys:
                    keys = potential_keys
                    break
        return keys
    return _all_keys(search_dict, key_id)[:-1]
The reason for the nested function is to strip off the 'id' key that would otherwise be on the end of the list.
This is really just to give you an idea of what a solution might look like. Beware the python recursion limit!
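If the recursion limit is a concern, the same search can be done with an explicit stack instead of recursion. This is a sketch, not the answer's code: find_path is a hypothetical name, and it assumes ids are unique, returning the first path it finds (or None when nothing matches).

```python
# Sample data from the question (already decoded with json.load).
crono = {
    "Fanerozoico": {
        "id": "20000",
        "Cenozoico": {
            "id": "23000",
            "Quaternario": {
                "id": "23300",
                "Pleistocenico": {
                    "id": "23310",
                    "Superior": {"id": "23314"},
                    "Medio": {"id": "23313"},
                    "Calabriano": {"id": "23312"},
                    "Gelasiano": {"id": "23311"},
                },
            },
        },
    }
}

def find_path(data, target_id, id_key="id"):
    """Return the list of keys leading to the dict whose id_key equals target_id."""
    stack = [(data, [])]  # (current dict, keys taken to reach it)
    while stack:
        node, path = stack.pop()
        if node.get(id_key) == target_id:
            return path
        for key, value in node.items():
            if isinstance(value, dict):
                stack.append((value, path + [key]))
    return None

print(find_path(crono, "23314"))
```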

Based on the assumption that you need the full dictionary path down to the point where a key named id has a particular value, here's a recursive solution that iterates over the whole dict. Bear in mind that:
The code is not optimized at all.
For very deeply nested JSON objects it might exceed Python's recursion limit (RecursionError) :)
It stops at the first match it finds (in theory there shouldn't be more than one if the JSON is semantically correct).
The code:
import json
SEARCH_KEY_NAME = "id"
FOUND_FLAG = ()
CRONO_FILE = "a.jsn"
def decode_crono(crono_file):
    with open(crono_file) as json_file:
        return json.load(json_file)
def traverse_dict(dict_obj, value):
    for key in dict_obj:
        key_obj = dict_obj[key]
        if key == SEARCH_KEY_NAME and key_obj == value:
            return FOUND_FLAG
        elif isinstance(key_obj, dict):  # types.DictType only exists in Python 2
            inner = traverse_dict(key_obj, value)
            if inner is not None:
                return (key,) + inner
    return None
if __name__ == "__main__":
    value = "23314"
    json_dict = decode_crono(CRONO_FILE)
    result = traverse_dict(json_dict, value)
    print(result)
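traverse_dict stops at the first match. If the data might contain the same id more than once, a generator variant can yield every matching path instead; iter_paths and the small sample dict below are illustrative, not from the original answer.

```python
def iter_paths(dict_obj, value, search_key="id", prefix=()):
    """Yield a tuple of keys for every nested dict whose search_key equals value."""
    for key, key_obj in dict_obj.items():
        if key == search_key and key_obj == value:
            yield prefix
        elif isinstance(key_obj, dict):
            yield from iter_paths(key_obj, value, search_key, prefix + (key,))

# Illustrative data with the id "2" in two places.
data = {"x": {"id": "1", "y": {"id": "2"}}, "z": {"id": "2"}}
print(list(iter_paths(data, "2")))

# The first match only, like traverse_dict (None if nothing matches):
first = next(iter_paths(data, "2"), None)
```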

Recursively build hierarchical JSON tree?

I have a database of parent-child connections. The data look like the following but could be presented in whichever way you want (dictionaries, list of lists, JSON, etc).
links=(("Tom","Dick"),("Dick","Harry"),("Tom","Larry"),("Bob","Leroy"),("Bob","Earl"))
The output that I need is a hierarchical JSON tree, which will be rendered with d3. There are discrete sub-trees in the data, which I will attach to a root node. So I need to recursively go through the links and build up the tree structure. The furthest I can get is to iterate through all the people and append their children, but I can't figure out how to handle the higher-order links (e.g. how to append a person with children to the child of someone else). This is similar to another question here, but I have no way of knowing the root nodes in advance, so I can't implement the accepted solution.
I am going for the following tree structure from my example data.
{
"name":"Root",
"children":[
{
"name":"Tom",
"children":[
{
"name":"Dick",
"children":[
{"name":"Harry"}
]
},
{
"name":"Larry"}
]
},
{
"name":"Bob",
"children":[
{
"name":"Leroy"
},
{
"name":"Earl"
}
]
}
]
}
This structure renders like this in my d3 layout.

To identify the root nodes you can unzip links and look for parents who are not children:
parents, children = zip(*links)
root_nodes = {x for x in parents if x not in children}
Then you can apply the recursive method:
import json
links = [("Tom","Dick"),("Dick","Harry"),("Tom","Larry"),("Bob","Leroy"),("Bob","Earl")]
parents, children = zip(*links)
root_nodes = {x for x in parents if x not in children}
for node in root_nodes:
    links.append(('Root', node))
def get_nodes(node):
    d = {}
    d['name'] = node
    children = get_children(node)
    if children:
        d['children'] = [get_nodes(child) for child in children]
    return d
def get_children(node):
    return [x[1] for x in links if x[0] == node]
tree = get_nodes('Root')
print(json.dumps(tree, indent=4))
I used a set to get the root nodes, but if order is important you can use a list and remove the duplicates.
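get_children scans the whole links list for every node, which is quadratic in the number of links. A sketch of the same idea that builds a parent-to-children dictionary once, and keeps the roots in first-seen order instead of relying on set order, might look like this. build_tree is an illustrative name, and the sketch assumes the links contain no cycles, since the recursion would never terminate otherwise.

```python
import json
from collections import defaultdict

def build_tree(links, root_name="Root"):
    children_of = defaultdict(list)  # parent -> children, in input order
    child_names = set()
    for parent, child in links:
        children_of[parent].append(child)
        child_names.add(child)
    # Roots: parents never seen as a child, first-seen order, no duplicates.
    roots = [p for p in dict.fromkeys(p for p, _ in links) if p not in child_names]

    def to_node(name):
        node = {"name": name}
        if children_of.get(name):  # .get does not create entries in a defaultdict
            node["children"] = [to_node(c) for c in children_of[name]]
        return node

    return {"name": root_name, "children": [to_node(r) for r in roots]}

links = [("Tom", "Dick"), ("Dick", "Harry"), ("Tom", "Larry"), ("Bob", "Leroy"), ("Bob", "Earl")]
tree = build_tree(links)
print(json.dumps(tree, indent=4))
```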

Try the following code:
import json
links = (("Tom","Dick"),("Dick","Harry"),("Tom","Larry"),("Tom","Hurbert"),("Tom","Neil"),("Bob","Leroy"),("Bob","Earl"),("Tom","Reginald"))
name_to_node = {}
root = {'name': 'Root', 'children': []}
for parent, child in links:
    parent_node = name_to_node.get(parent)
    if not parent_node:
        name_to_node[parent] = parent_node = {'name': parent}
        root['children'].append(parent_node)
    name_to_node[child] = child_node = {'name': child}
    parent_node.setdefault('children', []).append(child_node)
print(json.dumps(root, indent=4))
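The single-pass version above depends on link order: if ("Dick","Harry") came before ("Tom","Dick"), Dick would be attached to Root and a second, childless Dick node would later be created under Tom. A sketch that avoids this reuses one node per name and picks the roots only after all links are read (build_tree_any_order is a hypothetical name).

```python
import json

def build_tree_any_order(links, root_name="Root"):
    name_to_node = {}  # one dict per name, reused wherever that name appears
    child_names = set()

    def node_for(name):
        if name not in name_to_node:
            name_to_node[name] = {"name": name}
        return name_to_node[name]

    for parent, child in links:
        node_for(parent).setdefault("children", []).append(node_for(child))
        child_names.add(child)

    # Roots are names never seen as a child, in first-seen order.
    roots = [node for name, node in name_to_node.items() if name not in child_names]
    return {"name": root_name, "children": roots}

# Same people as the question, deliberately listed with a grandchild link first.
shuffled = [("Dick", "Harry"), ("Bob", "Leroy"), ("Tom", "Dick"), ("Tom", "Larry"), ("Bob", "Earl")]
tree = build_tree_any_order(shuffled)
print(json.dumps(tree, indent=4))
```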

In case you want to format the data as a hierarchy in the HTML/JS itself, take a look at:
Generate (multilevel) flare.json data format from flat json
If you have lots of data, doing the conversion on the web side may be faster, since that approach uses reduce (Python offers the same tool as functools.reduce).
BTW: I am also working on the same topic i.e. generating the collapsible tree structure in d3.js. If you want to work along, my email is: erprateek.vit#gmail.com.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.
