Build tree-hierarchy from two-dimensional list - python

I have a 2D-list that looks something like this:
[
["elem1","elem2"],
["elem1","elem3"],
["elem4","elem7"],
...
]
And I want to create a nested dictionary that then looks something like this:
[{"elem1":["elem2","elem3"]},{"elem4":"elem7"}]
So the higher the index in one of the initial sublists, the higher the hierarchical position in the generated tree will be. How would you go about this in Python? What is this kind of "treeification" called? I feel like there has to be a package out there that does exactly that.

I don't imagine there is something in a library for this considering it is fairly simple and not that useful for most people. It is better to write the code manually.
First of all, the output format in the question cannot fully represent a tree: for example the data
[
["elem1", "elem2"],
["elem1", "elem3"],
["elem4", "elem7"],
["elem3", "elem5"],
]
...would need to be similar to [{"elem1":["elem2","elem3"]},{"elem4":"elem7"}] but with elem5 added as a child of elem3. However, elem3 is a string, which has no place to store children. Thus, I suggest the following output format:
{'elem4': {'elem7': {}}, 'elem1': {'elem2': {}, 'elem3': {'elem5': {}}}}
Here every node is represented as a dictionary from child node names to child node values, so a tree containing only a root node looks like {}, and a tree with 3 nodes (the root + 2 children) looks like {'child1': {}, 'child2': {}}.
To turn a list of parent-child associations into such a tree, you can use this code:
def treeify(data):
    # result dictionary
    map_list = {}
    # initially all nodes with a child, will have items removed later
    root_nodes = {parent for parent, child in data}
    for parent, child in data:
        # get the dictionary that this node maps to (empty dictionary by default)
        children = map_list.setdefault(parent, {})
        # add this connection
        children[child] = map_list.setdefault(child, {})
        # remove node with a parent from the set of root_nodes
        if child in root_nodes:
            root_nodes.remove(child)
    # return the dictionary with only root nodes at the root
    return dict((root_node, map_list[root_node]) for root_node in root_nodes)

print(treeify([
    ["elem1", "elem2"],
    ["elem1", "elem3"],
    ["elem4", "elem7"],
    ["elem3", "elem5"],
]))
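To read the nested-dictionary format back, a small recursive walk is enough. The following is only a minimal sketch and not part of the code above: render is a hypothetical helper name, and the sample tree is the one from the suggested output format.

```python
def render(tree, depth=0):
    """Return one indented line per node of a nested-dict tree."""
    lines = []
    # sort the names so the result does not depend on insertion order
    for name in sorted(tree):
        lines.append("  " * depth + name)
        # every value is itself a (possibly empty) dict of children
        lines.extend(render(tree[name], depth + 1))
    return lines


tree = {'elem4': {'elem7': {}}, 'elem1': {'elem2': {}, 'elem3': {'elem5': {}}}}
print("\n".join(render(tree)))
```

Because every node is a dictionary, including the leaves, the walk needs no special case for nodes without children.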

Here is code that can help you get your required output:
data = [
    ["elem1","elem2"],
    ["elem1","elem3"],
    ["elem4","elem7"],
]
maplist = {}
for a in data:
    if a[0] in maplist:
        maplist[a[0]].append(a[1])
    else:
        maplist[a[0]] = [a[1]]
print(maplist)
To sort the entries by how many children each key has (largest first), you can use the code below:
sorted_items = sorted(maplist.items(), key = lambda item : len(item[1]), reverse=True)
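The same grouping can probably be written a little shorter with collections.defaultdict from the standard library, which creates the empty list on first access. This is a sketch of that variant, reusing the data from above; it is an alternative, not a correction.

```python
from collections import defaultdict

data = [
    ["elem1", "elem2"],
    ["elem1", "elem3"],
    ["elem4", "elem7"],
]

# missing keys start out as an empty list, so no "if key in dict" check is needed
maplist = defaultdict(list)
for parent, child in data:
    maplist[parent].append(child)

print(dict(maplist))
```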

Related

How to map all keys of a nested dictionary to a dataframe using an individual column per key in python

I am new to python and I am trying to achieve the following.
I have this dictionary:
cust_DB = {'cust_ID': {'NAME': 'name', 'ADDRESS': 'address', 'PHONES': {'HOME_PHONE': 'home#', 'WORK_PHONE': 'work#', 'MOBILE_PHONE': 'mobile#'}, 'EMAILS': {'HOME_EMAIL': 'email#home', 'WORK_EMAIL': 'email#work'}}}
I would like to transform this dictionary into a pandas dataframe df with the following columns and the corresponding records:
'ID'|'NAME'|'ADDRESS'|'HOME_PHONE'|'WORK_PHONE'|'MOBILE_PHONE'|'HOME_EMAIL'|'WORK_EMAIL'
If I use pandas.DataFrame.from_dict(), the nested phone numbers and emails are grouped in one column each. Is there a quick way to populate that dataframe?
Thank you!
Let's try this custom flatten function:
import pandas as pd

def flatten(d):
    # collect the leaf key/value pairs of a nested dictionary
    ret = dict()
    for k, v in d.items():
        if isinstance(v, dict):
            sub = flatten(v)
            for kk, vv in sub.items():
                ret[kk] = vv
        else:
            ret[k] = v
    return ret

out = pd.DataFrame({k: flatten(v) for k, v in cust_DB.items()})
out = out.T.rename_axis('ID').reset_index()
Output:
    ID       ADDRESS    HOME_EMAIL    HOME_PHONE    MOBILE_PHONE    NAME    WORK_EMAIL    WORK_PHONE
--  -------  ---------  ------------  ------------  --------------  ------  ------------  ------------
 0  cust_ID  address    email#home    home#         mobile#         name    email#work    work#
My answer below is very long.
However, I am a believer in the following old adage:
If you give a man a fish, he will eat for a day.
If you teach a man to fish, he will eat for a lifetime.
You have a dictionary of dictionaries.
I encourage you to visualize the nested dictionary as a tree:
A "leaf" is simply a node (nodes are the circles/dots) in a tree that does not have any children. "Leaves" are like human bachelors, or spinsters: they have no kids.
It sounds like you want one of the following two things. I am not sure which....
You want to use only the leaves of the tree as column headers of a table.
You want to extract the parents of the leaf nodes in the tree. The parents of the leaves will become column headers for a table.
This is often referred to as "flattening" the tree.
cust_DB = {'cust_ID': {'NAME': 'name', 'ADDRESS': 'address', 'PHONES': {'HOME_PHONE': 'home#', 'WORK_PHONE': 'work#', 'MOBILE_PHONE': 'mobile#'}, 'EMAILS': {'HOME_EMAIL': 'email#home', 'WORK_EMAIL': 'email#work'}}}
def flatten(parent):
    """
    Return a flat list of the leaf values of a nested dictionary.
    """
    if not hasattr(parent, '__iter__'):
        return parent
    leaves = list()
    for child_key in iter(parent):
        child = parent[child_key]
        # a child of the same type as its parent is another subtree
        if type(child) == type(parent):
            subleaves = flatten(child)
            leaves.extend(subleaves)
        else:
            leaves.append(child)
    return leaves

leaves = flatten(cust_DB)
result = "\n".join(map(str, iter(leaves)))
print(result)
I see that:
The leaves of your tree might be actual data (actual phone numbers)
The parents of the leaf nodes are column headers of the table ("home phone," "work phone," etc.)
I recommend the following solution:
Google "python flatten dictionary tree." The flatten function I give above works, but is ugly. Some other people have written more elegant flatten functions.
Modify someone's flatten function to get parents of leaf nodes instead of leaves.
If you have never learned about depth-first-search or breadth-first-search, I recommend watching a YouTube video on those topics.
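As a rough illustration of the second recommendation, here is one way a flatten function could keep the parents of the leaves (the keys) together with the leaf values. This is only a sketch: flatten_leaves is a made-up name, and it assumes that no key is repeated in different branches, because a repeated key would silently overwrite the earlier one.

```python
def flatten_leaves(tree):
    """Map the key directly above each leaf to that leaf's value."""
    flat = {}
    for key, value in tree.items():
        if isinstance(value, dict):
            # an inner node: descend and merge whatever leaves it holds
            flat.update(flatten_leaves(value))
        else:
            # a leaf: its key is the column header, its value is the data
            flat[key] = value
    return flat


cust_DB = {'cust_ID': {'NAME': 'name', 'ADDRESS': 'address',
                       'PHONES': {'HOME_PHONE': 'home#', 'WORK_PHONE': 'work#',
                                  'MOBILE_PHONE': 'mobile#'},
                       'EMAILS': {'HOME_EMAIL': 'email#home',
                                  'WORK_EMAIL': 'email#work'}}}

# one flat record per customer, keyed by customer id
rows = {cust_id: flatten_leaves(record) for cust_id, record in cust_DB.items()}
print(rows)
```

The keys of each flat record are then the candidate column headers, and the values are the cell contents.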
WARNING/CAUTION:
There is one issue which arises:
Iterators of Python lists return values, not keys.
Iterators of Python dictionaries return keys, not values.
Python is what I like to call a "two steps forward, one step back" language. That is, Python is better than older languages (like C) in some ways, and worse in other ways.
If Python were well-designed, all built-in containers (list, dict, etc...) would distinguish between key-iterators and value-iterators.
lyst = ["Spam", "Toast", "Eggs"]
# The keys (inputs) of lyst are:
# 0, 1, 2
# The values (outputs) of `lyst` are:
# "Spam", "Toast", and "Eggs"
dyct = {0: "Spam", 1:"Toast", 2:"Eggs"}
# The keys (inputs) of dyct are:
# 0, 1, 2
# The values (outputs) of `dyct` are:
# "Spam", "Toast", and "Eggs"
print("LIST STUFF IS BELOW")
for element in lyst:
    print(element)
print(end = "\n")
print("DICT STUFF IS BELOW")
for element in dyct:
    print(element)
You would think the output would be the same for both lists and dictionaries, but the output is different.
LIST STUFF IS BELOW
Spam
Toast
Eggs
DICT STUFF IS BELOW
0
1
2
You have to be very careful, and know ahead of time, whether the container is a dictionary or a list. Otherwise, your for-loops will loop over indices/keys (inputs) instead of values (outputs).
#####################################
# BEGIN CODE FOR LISTS
#####################################
for child in parent:
    flatten(child) # FOR LISTS
#####################################
# END OF CODE FOR LISTS
#---------------------------
# BEGIN CODE FOR DICTIONARIES
#####################################
for child_key in parent:
    child = parent[child_key] # FOR DICTIONARIES
    flatten(child)
#####################################
# END OF CODE FOR DICTIONARIES
#####################################
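One way to avoid writing the loop twice is to hide the difference behind a small helper that always returns values. The sketch below assumes only dicts, lists and tuples count as containers; child_values and leaves are hypothetical names chosen for this illustration.

```python
def child_values(container):
    """Return the values of a dict, or the elements of a list or tuple."""
    if isinstance(container, dict):
        return list(container.values())
    if isinstance(container, (list, tuple)):
        return list(container)
    return []  # anything else (str, int, ...) is treated as a leaf


def leaves(node):
    """Collect the leaf values from arbitrarily nested dicts and lists."""
    if not isinstance(node, (dict, list, tuple)):
        return [node]
    found = []
    for child in child_values(node):
        found.extend(leaves(child))
    return found


lyst = ["Spam", "Toast", "Eggs"]
dyct = {0: "Spam", 1: "Toast", 2: "Eggs"}
print(leaves(lyst))
print(leaves(dyct))
print(leaves({"a": [1, {"b": 2}], "c": 3}))
```

With this helper the list and the dictionary give the same leaves, and mixed nesting works as well.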

Store dictionary key value pair to tree structure

I have generated a key:value pair from the excel data and now I want to store the key:value pair in the tree structure. Since the dictionary lost its order,I have stored all the keys in the separate data frame to get the order of tree generation.
Here is the example data:
key_value_dict = {(key3:value),(key2:value),(key4:value),(key1:value),(key5:value),..}
df_all_keys_inOrder = [key1,key2,key3,key1,key4,key5,key1,key2,key6,..]
for i in df_all_keys_inOrder.index:
    for key, value in key_value_dict.iteritems():
        if df_all_keys_inOrder[i] == key:
            if i == 0:
                root = Node((key,value))
                leaf = root
            else:
                leaf = Node((key,value), parent= leaf)
The problem with this code is: when it comes to the root node (key1) again, instead of creating children of the root node, it creates a new node with key1.
The resultant tree should look like : https://i.stack.imgur.com/tN7em.png

Tree traversal and getting neighbouring child nodes in Python

I'm trying to traverse a tree, and get certain subtrees into a particular data structure. I think an example is the best way to explain it:
For this tree, I want the root node and its children. Then any children that have their own children should be traversed in the same way, and so on. So for the above tree, we would end up with a data structure such as:
[
(a, [b, c]),
(c, [d, e, f]),
(f, [g, h]),
]
I have some code so far to produce this, but there's an issue that it stops too early (or that's what it seems like):
from spacy.en import English

def _subtrees(sent, root=None, subtrees=[]):
    if not root:
        root = sent.root
    children = list(root.children)
    if not children:
        return subtrees
    subtrees.append((root, [child for child in children]))
    for child in children:
        return _subtrees(sent, child, subtrees)

nlp = English()
doc = nlp('they showed us an example')
print(_subtrees(list(doc.sents)[0]))
Note that this code won't produce the same tree as in the image. I feel like a generator would be better suited here also, but my generator-fu is even worse than my recursion-fu.
Let's first sketch the recursive algorithm:
Given a tree node, return:
A tuple of the node with its children
The subtrees of each child.
That's all it takes, so let's convert it to pseudocode, ehm, python:
def subtrees(node):
    if not node.children:
        return []
    result = [ (node.dep, list(node.children)) ]
    for child in node.children:
        result.extend(subtrees(child))
    return result
The root is just a node, so it shouldn't need special treatment. But please fix the member references if I misunderstood the data structure.
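Since the question mentions a generator, here is how the same recursion might look with yield. This is a sketch under assumptions: the Node class is a hypothetical stand-in for the real tokens (it only has a name and a children list), and the sample tree mirrors the a/b/c example from the question.

```python
class Node:
    """Hypothetical stand-in for a parse-tree token with a `children` list."""
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def __repr__(self):
        return self.name


def iter_subtrees(node):
    """Yield (node, children) for every node that has children, depth-first."""
    children = list(node.children)
    if not children:
        return
    yield node, children
    for child in children:
        # delegate to the child's own generator instead of returning early
        yield from iter_subtrees(child)


g, h = Node("g"), Node("h")
d, e, f = Node("d"), Node("e"), Node("f", [g, h])
b, c = Node("b"), Node("c", [d, e, f])
a = Node("a", [b, c])
print(list(iter_subtrees(a)))  # [(a, [b, c]), (c, [d, e, f]), (f, [g, h])]
```

The `yield from` inside the loop is the part that keeps the traversal going past the first child.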
def _subtrees(root):
    # breadth-first version using a queue instead of recursion
    subtrees = []
    queue = [root]
    while queue:
        node = queue.pop(0)
        children = list(node.children)
        if children:
            queue.extend(children)
            subtrees.append((node.dep, [child.dep for child in children]))
    return subtrees
Assuming you want to know this for using spaCy specifically, why not just:
[(word, list(word.children)) for word in sent]
The Doc object lets you iterate over all nodes in order. So you don't need to walk the tree recursively here --- just iterate.
I can't quite comment yet, but if you modify the response by @syllogism_ like so, it'll omit all nodes that don't have any children:
[(word, list(word.children)) for word in sent if bool(list(word.children))]

List vs Dict for list of objects with an ID

Firstly, speed is not a massive issue here as the length of lists is relatively small. I'm more interested in style, and code-economy.
I have a graph (nodes and edges) where I need to store data for each node. I use a class like this:
class Node:
    def __init__(self,node_id,name,edges,[more data]):
        self.node_id = node_id
        self.name = name
        etc.
        etc.
My nodes are then (currently) read from a file and put into a list, like this:
with open("filepath.txt") as f:
    content = f.readlines()
nodes = []
for line in content:
    lst = ast.literal_eval(line)
    nodes.append(Node(lst[0],lst[1],lst[2]...))
I don't really use the position of a node in the list nodes to mean anything; the node is always identified by node_id which is uniquely determined previously.
This means if I want to get the attribute someData from the node with node_id of 7, say, I have to use:
for n in nodes:
    if n.node_id == 7:
        print(n.someData)
which seems awfully inefficient.
So, I decided to use a dictionary, removing node_id from the Node class and using it as the key instead. A dictionary seems like the 'correct' structure to use, surely? However, in many places this has made my code worse!
For example, where before I had:
sumTotal = sum(n.someData for n in nodes)
I now have to use:
sumTotal = sum(nodes[k].someData for k in nodes)
or
sumTotal = sum(n.someData for n in nodes.values())
Am I missing something here? What would be the best practice for this type of data?
If the node_id is a unique key, you can do this:
nodes = {}
for line in content:
    lst = ast.literal_eval(line)
    nodes[lst[0]] = Node(lst[0],lst[1],lst[2]...)
And if you need to do anything with them later it will be faster and cleaner:
print(nodes[7].someData)
You will have to do something like this to get the sum though:
sumTotal = sum(nodes[k].someData for k in nodes)
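To make the trade-off concrete, here is a small self-contained sketch. The Node fields and the records list are made up for the example (they stand in for the parsed lines of the file); the point is only that the id can stay on the object while also serving as the dictionary key.

```python
class Node:
    # some_data is a placeholder for whatever each node stores
    def __init__(self, node_id, name, some_data):
        self.node_id = node_id
        self.name = name
        self.some_data = some_data


# hypothetical records, standing in for the parsed lines of the file
records = [(3, "c", 10), (7, "g", 25), (9, "i", 5)]

# key each node by its id, but keep the id on the object as well
nodes = {node_id: Node(node_id, name, data) for node_id, name, data in records}

print(nodes[7].some_data)                        # direct lookup by id: 25
print(sum(n.some_data for n in nodes.values()))  # iterate over the nodes: 40
```

Iterating over nodes.values() reads almost the same as iterating over the old list, so the lookup by id comes at little cost in style.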

Recursive? looping to n levels in Python

Working in python I want to extract a dataset with the following structure:
Each item has a unique ID and the unique ID of its parent. Each parent can have one or more children, each of which can have one or more children of its own, to n levels i.e. the data has an upturned tree-like structure. While it has the potential to go on for infinity, in reality a depth of 10 levels is unusual, as is having more than 10 siblings at each level.
For each item in the dataset I want to show all items for which this item is their parent... and so on until it reaches the bottom of the dataset.
Doing the first two levels is easy, but I'm unsure how to make it efficiently recurse down through the levels.
Any pointers very much appreciated.
You should probably use a defaultdict for this:
from collections import defaultdict
itemdict = defaultdict(list)
for id, parent_id in itemlist:
    itemdict[parent_id].append(id)
then you can recursively print it (with indentation) like
def printitem(id, depth=0):
    print(' '*depth, id)
    for child in itemdict[id]:
        printitem(child, depth+1)
Are you saying that each item only maintains a reference to its parents? If so, then how about
def getChildren(item):
    children = []
    for possibleChild in allItems:
        if possibleChild.parent == item:
            # include the child itself, then everything below it
            children.append(possibleChild)
            children.extend(getChildren(possibleChild))
    return children
This returns a list that contains all items who are in some way descended from item.
If you want to keep the structure of your dataset, this will produce a list of the format [id, [children of id], id2, [children of id2]]
def children(id):
    return [id]+[children(x.id) for x in filter(lambda x:x.parent == id, items)]
How about something like this,
#!/usr/bin/python
# each entry maps a node id to (parent id, list of child ids)
tree = { 0:(None, [1,2,3]),
         1:(0, [4]),
         2:(0, []),
         3:(0, [5,6]),
         4:(1, [7]),
         5:(3, []),
         6:(3, []),
         7:(4, []),
       }

def find_children( tree, id ):
    print("node:", id, tree[id])
    for child in tree[id][1]:
        find_children( tree, child )

if __name__=="__main__":
    import sys
    find_children( tree, int(sys.argv[1]) )
$ ./tree.py 3
node: 3 (0, [5, 6])
node: 5 (3, [])
node: 6 (3, [])
It's also worth noting that Python has a pretty low default recursion limit (1000 by default).
In the event that your tree actually gets pretty deep you'll hit this very quickly.
You can crank this up with,
sys.setrecursionlimit(100000)
and check it with,
sys.getrecursionlimit()
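If raising the limit feels uncomfortable, the same walk can be done with an explicit stack, which does not depend on the recursion limit at all. This is a sketch only: descendants is a made-up name, and the tree uses the same (parent, children) layout as the example above.

```python
tree = {0: (None, [1, 2, 3]),
        1: (0, [4]),
        2: (0, []),
        3: (0, [5, 6]),
        4: (1, [7]),
        5: (3, []),
        6: (3, []),
        7: (4, [])}


def descendants(tree, root):
    """Return every node below `root` in depth-first order, without recursion."""
    found = []
    # the stack holds nodes that still have to be visited
    stack = list(reversed(tree[root][1]))
    while stack:
        node = stack.pop()
        found.append(node)
        # push the children reversed so the leftmost child is visited first
        stack.extend(reversed(tree[node][1]))
    return found


print(descendants(tree, 0))  # [1, 4, 7, 2, 3, 5, 6]
print(descendants(tree, 3))  # [5, 6]
```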
