Recursive? looping to n levels in Python - python

Working in python I want to extract a dataset with the following structure:
Each item has a unique ID and the unique ID of its parent. Each parent can have one or more children, each of which can have one or more children of its own, to n levels i.e. the data has an upturned tree-like structure. While it has the potential to go on for infinity, in reality a depth of 10 levels is unusual, as is having more than 10 siblings at each level.
For each item in the dataset I want to show all items for which this item is their parent... and so on until it reaches the bottom of the dataset.
Doing the first two levels is easy, but I'm unsure how to make it efficiently recurse down through the levels.
Any pointers very much appreciated.

You should probably use a defaultdict for this:
    from collections import defaultdict
    # Map each parent id to the list of its children's ids.
    itemdict = defaultdict(list)
    for id, parent_id in itemlist:
        itemdict[parent_id].append(id)
Then you can recursively print it (with indentation) like this:
    def printitem(id, depth=0):
        print(' ' * depth, id)
        for child in itemdict[id]:
            printitem(child, depth + 1)
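As a minimal sketch of the same idea, here is a variant that collects the descendants instead of printing them. The sample `itemlist` pairs and the `descendants` helper are made up for illustration and are not part of the question.

```python
from collections import defaultdict

# Hypothetical sample data: (id, parent_id) pairs; None marks the root.
itemlist = [(1, None), (2, 1), (3, 1), (4, 2), (5, 2), (6, 4)]

# Map each parent id to the list of its direct children.
itemdict = defaultdict(list)
for item_id, parent_id in itemlist:
    itemdict[parent_id].append(item_id)


def descendants(item_id):
    """Return all descendants of item_id in depth-first order."""
    result = []
    # .get avoids inserting empty lists into the defaultdict for leaves.
    for child in itemdict.get(item_id, []):
        result.append(child)
        result.extend(descendants(child))
    return result


print(descendants(1))  # [2, 4, 6, 5, 3]
print(descendants(4))  # [6]
```

Building the parent-to-children map once means each lookup is a dictionary access rather than a scan over the whole dataset.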

Are you saying that each item only maintains a reference to its parent? If so, then how about
    def getChildren(item):
        children = []
        for possibleChild in allItems:
            if possibleChild.parent == item:
                # Record the child itself, then everything below it.
                children.append(possibleChild)
                children.extend(getChildren(possibleChild))
        return children
This returns a list that contains all items that are in some way descended from item.

If you want to keep the structure of your dataset, this will produce a nested list of the form [id, [child1, ...], [child2, ...]], where each child entry is itself a list of the same form:
    def children(id):
        return [id] + [children(x.id) for x in items if x.parent == id]

How about something like this,
    #!/usr/bin/python
    import sys
    # Each entry maps id -> (parent id, [child ids]).
    tree = {0: (None, [1, 2, 3]),
            1: (0, [4]),
            2: (0, []),
            3: (0, [5, 6]),
            4: (1, [7]),
            5: (3, []),
            6: (3, []),
            7: (4, []),
            }
    def find_children(tree, id):
        print("node:", id, tree[id])
        for child in tree[id][1]:
            find_children(tree, child)
    if __name__ == "__main__":
        find_children(tree, int(sys.argv[1]))
    $ ./tree.py 3
    node: 3 (0, [5, 6])
    node: 5 (3, [])
    node: 6 (3, [])
It's also worth noting that Python has a fairly low default recursion limit (1000 in CPython).
In the event that your tree actually gets very deep, you'll hit this limit.
You can raise it with
    sys.setrecursionlimit(100000)
and check it with
    sys.getrecursionlimit()
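If raising the recursion limit feels uncomfortable, one alternative is to walk the tree with an explicit stack. The following is only a sketch under the assumption that the data has the same `id -> (parent, [children])` shape as the `tree` dict above; the function name `find_children_iter` is made up for illustration.

```python
# Same shape as the tree above: id -> (parent id, [child ids]).
tree = {
    0: (None, [1, 2, 3]),
    1: (0, [4]),
    2: (0, []),
    3: (0, [5, 6]),
    4: (1, [7]),
    5: (3, []),
    6: (3, []),
    7: (4, []),
}


def find_children_iter(tree, root_id):
    """Return root_id and all its descendants in pre-order, without recursion."""
    visited = []
    stack = [root_id]
    while stack:
        node_id = stack.pop()
        visited.append(node_id)
        # Push children reversed so the leftmost child is popped first.
        stack.extend(reversed(tree[node_id][1]))
    return visited


print(find_children_iter(tree, 3))  # [3, 5, 6]
print(find_children_iter(tree, 0))  # [0, 1, 4, 7, 2, 3, 5, 6]
```

The depth of the tree is then limited only by available memory, not by the interpreter's call stack.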

Related

How to map all keys of a nested dictionary to a dataframe using an individual column per key in python

I am new to python and I am trying to achieve the following.
I have this dictionary:
cust_DB = {'cust_ID': {'NAME': 'name', 'ADDRESS': 'address', 'PHONES': {'HOME_PHONE': 'home#', 'WORK_PHONE': 'work#', 'MOBILE_PHONE': 'mobile#'}, 'EMAILS': {'HOME_EMAIL': 'email#home', 'WORK_EMAIL': 'email#work'}}}
I would like to transform this dictionary into a pandas dataframe df with the following columns and corresponding records:
'ID'|'NAME'|'ADDRESS'|'HOME_PHONE'|'WORK_PHONE'|'MOBILE_PHONE'|'HOME_EMAIL'|'WORK_EMAIL'
If I use pandas.DataFrame.from_dict(), the nested phone numbers and emails are grouped in one column each. Is there a quick way to populate that dataframe?
Thank you!
Let's try this custom flatten function:
    import pandas as pd
    def flatten(d):
        ret = dict()
        for k, v in d.items():
            if isinstance(v, dict):
                # Merge the leaves of the nested dict into this level.
                sub = flatten(v)
                for kk, vv in sub.items():
                    ret[kk] = vv
            else:
                ret[k] = v
        return ret
    out = pd.DataFrame({k: flatten(v) for k, v in cust_DB.items()})
    out = out.T.rename_axis('ID').reset_index()
Output:
        ID       ADDRESS  HOME_EMAIL  HOME_PHONE  MOBILE_PHONE  NAME  WORK_EMAIL  WORK_PHONE
    --  -------  -------  ----------  ----------  ------------  ----  ----------  ----------
     0  cust_ID  address  email#home  home#       mobile#       name  email#work  work#
My answer below is very long.
However, I am a believer in the following old adage:
If you give a man a fish, he will eat for a day.
If you teach a man to fish, he will eat for a lifetime.
You have a dictionary of dictionaries.
I encourage you to visualize the nested dictionary as a tree:
A "leaf" is simply a node (nodes are drawn as circles/dots) in a tree which does not have any children. "Leaves" are like human bachelors, or spinsters: they have no kids.
It sounds like you want one of the following two things. I am not sure which:
1. You want to use only the leaves of the tree as column headers of a table.
2. You want to extract the parents of leaf nodes in the tree. The parents of leaves will become column headers for a table.
This is often referred to as "flattening" the tree.
    cust_DB = {'cust_ID': {'NAME': 'name', 'ADDRESS': 'address', 'PHONES': {'HOME_PHONE': 'home#', 'WORK_PHONE': 'work#', 'MOBILE_PHONE': 'mobile#'}, 'EMAILS': {'HOME_EMAIL': 'email#home', 'WORK_EMAIL': 'email#work'}}}
    def flatten(parent):
        """
        Return a flat list of the leaf values of a nested container.
        Children of the same type as `parent` are flattened recursively.
        """
        if not hasattr(parent, '__iter__'):
            return parent
        leaves = list()
        for child_key in iter(parent):
            child = parent[child_key]
            if type(child) == type(parent):
                subleaves = flatten(child)
                leaves.extend(subleaves)
            else:
                leaves.append(child)
        return leaves
    leaves = flatten(cust_DB)
    result = "\n".join(map(str, iter(leaves)))
    print(result)
I see that:
The leaves of your tree might be actual data (actual phone numbers).
The parents of the leaf nodes are column headers of the table ("home phone," "work phone," etc.).
I recommend the following solution:
Google "python flatten dictionary tree." The flatten function I give above works, but is ugly. Some other people have written more elegant flatten functions.
Modify someone's flatten function to get parents of leaf nodes instead of leaves.
If you have never learned about depth-first-search or breadth-first-search, I recommend watching a YouTube video on those topics.
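As a rough sketch of the modification suggested above, the generator below yields each leaf together with its parent key, so the keys can serve as column headers. The name `leaf_items` is made up for illustration, and the sketch assumes every container in the tree is a dict and that leaf keys are unique within one customer record.

```python
cust_DB = {'cust_ID': {'NAME': 'name', 'ADDRESS': 'address',
                       'PHONES': {'HOME_PHONE': 'home#',
                                  'WORK_PHONE': 'work#',
                                  'MOBILE_PHONE': 'mobile#'},
                       'EMAILS': {'HOME_EMAIL': 'email#home',
                                  'WORK_EMAIL': 'email#work'}}}


def leaf_items(node):
    """Yield (key, value) pairs for every non-dict value in a nested dict."""
    for key, value in node.items():
        if isinstance(value, dict):
            # Descend; the intermediate key (e.g. 'PHONES') is dropped.
            yield from leaf_items(value)
        else:
            yield key, value


# One flat record per customer id.
rows = {cust_id: dict(leaf_items(record)) for cust_id, record in cust_DB.items()}
print(list(rows['cust_ID']))
print(rows['cust_ID']['HOME_PHONE'])  # home#
```

If two nested dictionaries ever share a key name, the later value would overwrite the earlier one, so a real solution may want to prefix keys with their parent's name.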
WARNING/CAUTION:
There is one issue which arises:
Iterators of Python lists return values, not keys (indices).
Iterators of Python dictionaries return keys, not values.
Python is what I like to call a "two steps forward, one step back" language. That is, Python is better than older languages (like C) in some ways, and worse in other ways.
If Python were well-designed, all built-in containers (list, dict, etc...) would distinguish between key-iterators and value-iterators.
    lyst = ["Spam", "Toast", "Eggs"]
    # The keys (inputs) of `lyst` are:
    #     0, 1, 2
    # The values (outputs) of `lyst` are:
    #     "Spam", "Toast", and "Eggs"
    dyct = {0: "Spam", 1: "Toast", 2: "Eggs"}
    # The keys (inputs) of `dyct` are:
    #     0, 1, 2
    # The values (outputs) of `dyct` are:
    #     "Spam", "Toast", and "Eggs"
    print("LIST STUFF IS BELOW")
    for element in lyst:
        print(element)
    print(end="\n")
    print("DICT STUFF IS BELOW")
    for element in dyct:
        print(element)
You would think the output would be the same for both lists and dictionaries, but the output is different.
    LIST STUFF IS BELOW
    Spam
    Toast
    Eggs
    DICT STUFF IS BELOW
    0
    1
    2
You have to be very careful, and know ahead of time, whether the container is a dictionary or a list. Otherwise, your for-loops will loop over indices/keys (inputs) instead of values (outputs).
    #####################################
    # BEGIN CODE FOR LISTS
    #####################################
    for child in parent:
        flatten(child)  # FOR LISTS
    #####################################
    # END OF CODE FOR LISTS
    #---------------------------
    # BEGIN CODE FOR DICTIONARIES
    #####################################
    for child_key in parent:
        child = parent[child_key]  # FOR DICTIONARIES
        flatten(child)
    #####################################
    # END OF CODE FOR DICTIONARIES
    #####################################
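One way to avoid writing the loop twice is to hide the list/dict difference behind a small helper that always returns child values. This is only a sketch; the names `iter_children` and `flatten_values` are hypothetical, and only dicts, lists and tuples are treated as containers.

```python
def iter_children(container):
    """Return the child values of a dict, list or tuple."""
    if isinstance(container, dict):
        return list(container.values())  # values, not keys
    return list(container)               # list/tuple: elements are the values


def flatten_values(node):
    """Return all leaf values of a nested mix of dicts, lists and tuples."""
    if not isinstance(node, (dict, list, tuple)):
        return [node]  # a leaf: strings and numbers end up here
    leaves = []
    for child in iter_children(node):
        leaves.extend(flatten_values(child))
    return leaves


nested = {"a": [1, 2, {"b": 3}], "c": {"d": [4, 5]}, "e": "six"}
print(flatten_values(nested))  # [1, 2, 3, 4, 5, 'six']
```

Checking the type explicitly, rather than testing for `__iter__`, keeps strings from being split into characters.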

Build tree-hierarchy from two-dimensional list

I have a 2D-list that looks something like this:
[
["elem1","elem2"],
["elem1","elem3"],
["elem4","elem7"],
...
]
And I want to create a nested dictionary that then looks something like this:
[{"elem1":["elem2","elem3"]},{"elem4":"elem7"}]
So the higher the index in one of the initial sublists, the higher the hierarchical position in the generated tree. How would you go about this in Python? What do you call this kind of "treeification"? I feel like there has to be a package out there that does exactly that.
I don't imagine there is something in a library for this considering it is fairly simple and not that useful for most people. It is better to write the code manually.
First of all, the output format in the question cannot fully represent a tree: for example the data
[
["elem1", "elem2"],
["elem1", "elem3"],
["elem4", "elem7"],
["elem3", "elem5"],
]
...would need to be similar to [{"elem1":["elem2","elem3"]},{"elem4":"elem7"}] but with elem5 added as a child of elem3. However, elem3 is a string, with no place for children to be stored. Thus, I suggest the following output format:
{'elem4': {'elem7': {}}, 'elem1': {'elem2': {}, 'elem3': {'elem5': {}}}}
Here every node is represented as a dictionary from child node names to child node values, so a tree containing only a root node looks like {}, and a tree with 3 nodes (the root + 2 children) looks like {'child1': {}, 'child2': {}}.
To turn a list of parent-child associations into such a tree you can use this code:
    def treeify(data):
        # maps every node name to its dictionary of children
        map_list = {}
        # initially all nodes with a child; nodes that have a parent are removed below
        root_nodes = {parent for parent, child in data}
        for parent, child in data:
            # get the dictionary that this node maps to (empty dictionary by default)
            children = map_list.setdefault(parent, {})
            # add this connection
            children[child] = map_list.setdefault(child, {})
            # remove node with a parent from the set of root_nodes
            if child in root_nodes:
                root_nodes.remove(child)
        # return the dictionary with only root nodes at the root
        return dict((root_node, map_list[root_node]) for root_node in root_nodes)
    print(treeify([
        ["elem1", "elem2"],
        ["elem1", "elem3"],
        ["elem4", "elem7"],
        ["elem3", "elem5"],
    ]))
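To illustrate why the nested-dictionary format is convenient, here is a small sketch that walks such a tree recursively and renders it with indentation. The `render` helper is made up for illustration; the `tree` literal is the suggested output format from above.

```python
# A tree in the suggested format: each node maps child names to subtrees.
tree = {"elem4": {"elem7": {}}, "elem1": {"elem2": {}, "elem3": {"elem5": {}}}}


def render(tree, depth=0):
    """Return one indented line per node, parents before children."""
    lines = []
    for name, children in tree.items():
        lines.append("  " * depth + name)
        lines.extend(render(children, depth + 1))
    return lines


print("\n".join(render(tree)))
# elem4
#   elem7
# elem1
#   elem2
#   elem3
#     elem5
```

Because every node has the same shape (a dict of children), the recursive function needs no special case for leaves.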
Here is code that produces the output you asked for:
    data = [
        ["elem1","elem2"],
        ["elem1","elem3"],
        ["elem4","elem7"],
    ]
    # Map each parent to the list of its children.
    maplist = {}
    for a in data:
        if a[0] in maplist:
            maplist[a[0]].append(a[1])
        else:
            maplist[a[0]] = [a[1]]
    print(maplist)
To sort the parents by their number of children (most children first), you can use the code below:
    sorted_items = sorted(maplist.items(), key = lambda item : len(item[1]), reverse=True)

Sum of all Nodes Iteratively - Not Recursively - Without 'left' and 'right'

I have this Binary Tree Structure:
# A Node is an object
# - value : Number
# - children : List of Nodes
    class Node:
        def __init__(self, value, children):
            self.value = value
            self.children = children
I can easily sum the Nodes, recursively:
    def sumNodesRec(root):
        sumOfNodes = 0
        for child in root.children:
            sumOfNodes += sumNodesRec(child)
        return root.value + sumOfNodes
Example Tree:
    exampleTree = Node(1,[Node(2,[]),Node(3,[Node(4,[Node(5,[]),Node(6,[Node(7,[])])])])])
    sumNodesRec(exampleTree)
    > 28
However, I'm having difficulty figuring out how to sum all the nodes iteratively. Normally, with a binary tree that has 'left' and 'right' in the definition, I can find the sum. But, this definition is tripping me up a bit when thinking about it iteratively.
Any help or explanation would be great. I'm trying to make sure I'm not always doing things recursively, so I'm trying to practice creating normally recursive functions as iterative types, instead.
If we're talking iteration, this is a good use case for a queue.
    total = 0
    queue = [exampleTree]
    while queue:
        v = queue.pop(0)
        queue.extend(v.children)
        total += v.value
    print(total)
    28
This is a common idiom. Iterative graph traversal algorithms also work in this manner.
You can simulate stacks/queues using Python's vanilla lists. A better alternative is the collections.deque structure in the standard library: its enqueue/dequeue operations at either end are O(1), whereas list.pop(0) is O(n).
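For reference, the same breadth-first sum written with collections.deque might look like the sketch below. The `Node` class is repeated from the question so the example runs on its own; the function name `sum_nodes_iter` is made up for illustration.

```python
from collections import deque


class Node:
    def __init__(self, value, children):
        self.value = value
        self.children = children


def sum_nodes_iter(root):
    """Sum all node values breadth-first, without recursion."""
    total = 0
    queue = deque([root])
    while queue:
        node = queue.popleft()       # O(1), unlike list.pop(0)
        total += node.value
        queue.extend(node.children)  # visit the children later
    return total


example_tree = Node(1, [Node(2, []), Node(3, [Node(4, [Node(5, []), Node(6, [Node(7, [])])])])])
print(sum_nodes_iter(example_tree))  # 28
```

Since the order of visits does not matter for a sum, swapping `popleft()` for `pop()` would turn this into a depth-first traversal with the same result.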
Iteratively you can create a list, stack, queue, or other structure that can hold the items you run through. Put the root into it. Start going through the list, take an element and add its children into the list also. Add the value to the sum. Take next element and repeat. This way there’s no recursion but performance and memory usage may be worse.
In response to the first answer:
    def sumNodes(root):
        current = [root]
        nodeList = []
        while current:
            next_level = []
            for n in current:
                nodeList.append(n.value)
                next_level.extend(n.children)
            current = next_level
        return sum(nodeList)
Thank you! That explanation helped me think through it more clearly.

List vs Dict for list of objects with an ID

Firstly, speed is not a massive issue here as the length of lists is relatively small. I'm more interested in style, and code-economy.
I have a graph (nodes and edges) where I need to store data for each node. I use a class like this:
    class Node:
        def __init__(self,node_id,name,edges,[more data]):
            self.node_id = node_id
            self.name = name
            etc.
            etc.
My nodes are then (currently) read from a file and put into a list, like this:
    with open("filepath.txt") as f:
        content = f.readlines()
    nodes = []
    for line in content:
        lst = ast.literal_eval(line)
        nodes.append(Node(lst[0],lst[1],lst[2]...))
I don't really use the position of a node in the list nodes to mean anything; the node is always identified by node_id which is uniquely determined previously.
This means if I want to get the attribute someData from the node with node_id of 7, say, I have to use:
    for n in nodes:
        if n.node_id == 7:
            print(n.someData)
which seems awfully inefficient.
So, I decided to use a dictionary, removing node_id from the Node class and using it as the key instead. A dictionary seems like the 'correct' structure to use, surely? However, in many places this has made my code worse!
For example, where before I had:
sumTotal = sum(n.someData for n in nodes)
I now have to use:
sumTotal = sum(nodes[k].someData for k in nodes)
or
sumTotal = sum(n.someData for n in nodes.values())
Am I missing something here? What would be the best practice for this type of data?
If the node_id is a unique key, you can do this:
    nodes = {}
    for line in content:
        lst = ast.literal_eval(line)
        nodes[lst[0]] = Node(lst[0],lst[1],lst[2]...)
And if you need to do anything with them later it will be faster and cleaner:
    print(nodes[7].someData)
You will have to do something like this to get the sum though:
    sumTotal = sum(nodes[k].someData for k in nodes)
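A small sketch of how the two needs can coexist: keep node_id on the object and also use it as the dictionary key, then iterate over `.values()` whenever the key is not needed. The simplified `Node` class and the `records` data below are hypothetical stand-ins for the file contents in the question.

```python
class Node:
    def __init__(self, node_id, name, someData):
        self.node_id = node_id  # kept on the object as well as used as key
        self.name = name
        self.someData = someData


# Hypothetical parsed lines: (node_id, name, someData).
records = [(7, "a", 10), (3, "b", 5), (9, "c", 1)]

# Build the lookup table keyed by node_id.
nodes = {rec[0]: Node(*rec) for rec in records}

print(nodes[7].someData)                        # 10
print(sum(n.someData for n in nodes.values()))  # 16
```

Lookup by id becomes a single dictionary access, while whole-collection operations read almost the same as they did with a list.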

Unions in Python

Say I'm given a tuple of strings, representing relationships between objects, for example:
connections = ("dr101-mr99", "mr99-out00", "dr101-out00", "scout1-scout2","scout3-scout1", "scout1-scout4", "scout4-sscout", "sscout-super")
each dash "-" shows a relationship between the two items in the string. Then I'm given two items:
first = "scout2"
second = "scout3"
How might I go about finding if first and second are interrelated, meaning I could find a path that connects them, not necessarily if they are just in a string group.
You can try concatenating the strings and using the in operator to check if it is an element of the tuple connections:
    if first + "-" + second in connections:
        # ...
Edit:
You can also use the join() function:
    if "-".join((first, second)) in connections:
        # ...
If you plan on doing this any number of times, I'd consider frozensets...
    connections_set = set(frozenset(c.split('-')) for c in connections)
Now you can do something like:
    if frozenset((first, second)) in connections_set:
        ...
and you have an O(1) solution (plus the O(N) upfront investment). Note that I'm assuming the order of the pairs is irrelevant. If it's relevant, just use a tuple instead of frozenset and you're good to go.
If you actually need to walk through a graph, an adjacency list implementation might be a little better.
    from collections import defaultdict
    adjacency_dict = defaultdict(list)
    for c in connections:
        left, right = c.split('-')
        adjacency_dict[left].append(right)
        # if undirected: adjacency_dict[right].append(left)
    class DFS(object):
        def __init__(self, graph):
            self.graph = graph
        def is_connected(self, node1, node2):
            self._seen = set()
            self._walk_connections(node1)
            output = node2 in self._seen
            del self._seen
            return output
        def _walk_connections(self, node):
            if node in self._seen:
                return
            self._seen.add(node)
            for subnode in self.graph[node]:
                self._walk_connections(subnode)
    print(DFS(adjacency_dict).is_connected(first, second))
Note that this implementation is definitely suboptimal (I don't stop when I find the node I'm looking for, for example) -- and I don't check for an optimal path from node1 to node2. For that, you'd want something like Dijkstra's algorithm.
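As a sketch of the early-exit idea mentioned above, here is an iterative breadth-first search that returns as soon as the target is reached. It assumes the connections are undirected (each pair is added in both directions), which the question does not state explicitly; the names `build_graph` and `is_connected` are made up for illustration.

```python
from collections import defaultdict, deque

connections = ("dr101-mr99", "mr99-out00", "dr101-out00", "scout1-scout2",
               "scout3-scout1", "scout1-scout4", "scout4-sscout", "sscout-super")


def build_graph(connections):
    """Build an undirected adjacency mapping from 'a-b' strings."""
    graph = defaultdict(set)
    for c in connections:
        left, right = c.split("-")
        graph[left].add(right)
        graph[right].add(left)  # assumption: relationships go both ways
    return graph


def is_connected(graph, start, goal):
    """Breadth-first search that stops as soon as goal is reached."""
    if start == goal:
        return True
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        # .get avoids adding unknown nodes to the defaultdict.
        for neighbour in graph.get(node, ()):
            if neighbour == goal:
                return True
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return False


graph = build_graph(connections)
print(is_connected(graph, "scout2", "scout3"))  # True
print(is_connected(graph, "scout2", "dr101"))   # False
```

Being iterative, this version also sidesteps the recursion limit on long chains of connections.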
You could use a set of pairs (tuples):
    connections = {("dr101", "mr99"), ("mr99", "out00"), ("dr101", "out00")} # ...
    if ("scout2", "scout3") in connections:
        print("scout2-scout3 in connections")
This only works if the 2 elements are already in the right order, though, because ("scout3", "scout2") != ("scout2", "scout3"), but maybe this is what you want.
If the order of the items in the connection is not significant, you can use a set of frozensets instead (see mgilson's answer). Then you can look up pairs of items regardless of which order they appear in, but the order of the original pairs in connections is lost.
