I have a list of parent-child relations where the structure isn't a true tree. Some parents can have many children and also some children can have more than one parent.
import pandas as pd
df = pd.DataFrame([[123,234],[123,235],[123,236],[124,236],[234,345],[236,346]], columns=['Parent','Child'])
I would like to group all children for specific ancestors. From the data:
123,234,235,236,345,346
124,236,346
Should be the correct groups.
I tried with:
parents = set()
children = {}
for p, c in df.to_records(index=False).tolist():
    parents.add(p)
    children[c] = p

def getAncestors(p):
    return (getAncestors(children[p]) if p in children else []) + [p]
But on 346 it only returns one group.
Also, how to then find all children for 123 and 124?
Thank you!
As you said, it's not really a tree, but more like a directed acyclic graph, so you can't map each child to just one parent; it'd have to be a list of parents. Also, given your use case, I'd suggest mapping parents to their lists of children instead.
relations = [[123,234],[234,345],[123,235],[123,236],[124,236],[236,346]]
children = {}
for p, c in relations:
    children.setdefault(p, []).append(c)
roots = set(children) - set(c for cc in children.values() for c in cc)
You can then use a recursive function similar to the one you already have to get all the children (direct and indirect) of a given root node, or of any parent node. The root itself is not in the result, but can easily be added.
def all_children(p):
    if p not in children:
        return set()
    return set(children[p] + [b for a in children[p] for b in all_children(a)])
print({p: all_children(p) for p in roots})
# {123: {234, 235, 236, 345, 346}, 124: {346, 236}}
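If the graph gets large, the recursive version revisits shared children once per path. A minimal sketch of an iterative alternative, assuming the relations form a DAG; the helper name descendants is made up for this example and is not part of the answer above:

```python
def descendants(children, root):
    """Return every node reachable from root (root itself excluded)."""
    seen = set()
    stack = list(children.get(root, []))
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # already reached through another parent
        seen.add(node)
        stack.extend(children.get(node, []))
    return seen

relations = [[123, 234], [234, 345], [123, 235], [123, 236], [124, 236], [236, 346]]
children = {}
for p, c in relations:
    children.setdefault(p, []).append(c)

# Roots are parents that never appear as a child.
roots = set(children) - {c for cc in children.values() for c in cc}
groups = {root: descendants(children, root) for root in roots}
print(groups)
```

Each node is expanded at most once per root, so a child with several parents does not cause repeated work.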
Dear experienced friends, I am looking for an algorithm (Python) that outputs the width of a tree at each level. Here are the input and expected outputs.
(I have updated the problem with a more complex edge list. The original question, with a sorted edge list, can be elegantly solved by @Samwise's answer.)
Input (Edge List: source-->target)
[[11,1],[11,2],
[10,11],[10,22],[10,33],
[33,3],[33,4],[33,5],[33,6]]
The tree graph looks like this:
         10
       /  |  \
     11   22   33
    /  \     / | \ \
   1    2   3  4  5  6
Expected Output (Width of each level/height)
[1,3,6] # according to the width of level 0,1,2
I have looked through the web. It seems this topic is related to BFS and level-order traversal. However, most solutions are based on binary trees. How can I solve the problem when the tree is not binary (e.g. the above case)?
(I'm new to the algorithm, and any references would be really appreciated. Thank you!)
Build a dictionary of the "level" of each node, and then count the number of nodes at each level:
>>> from collections import Counter
>>> def tree_width(edges):
...     levels = {}  # {node: level}
...     for [p, c] in edges:
...         levels[c] = levels.setdefault(p, 0) + 1
...     widths = Counter(levels.values())  # {level: width}
...     return [widths[level] for level in sorted(widths)]
...
>>> tree_width([[0,1],[0,2],[0,3],
...             [1,4],[1,5],
...             [3,6],[3,7],[3,8],[3,9]])
[1, 3, 6]
This might not be the most efficient, but it requires only two scans over the edge list, so it's optimal up to a constant factor. It places no requirement on the order of the edges in the edge list, but does insist that each edge be (source, dest). Also, it doesn't check that the edge list describes a connected tree (or a tree at all; if the edge list is cyclic, the program will never terminate).
from collections import defaultdict

# Turn the edge list into a (non-binary) tree, represented as a
# dictionary whose keys are the source nodes with the list of children
# as its value.
def edge_list_to_tree(edges):
    '''Given a list of (source, dest) pairs, constructs a tree.
    Returns a tuple (tree, root) where root is the root node
    and tree is a dict which maps each node to a list of its children.
    (Leaves are not present as keys in the dictionary.)
    '''
    tree = defaultdict(list)
    sources = set()  # nodes used as sources
    dests = set()    # nodes used as destinations
    for source, dest in edges:
        tree[source].append(dest)
        sources.add(source)
        dests.add(dest)
    roots = sources - dests      # Source nodes which are not destinations
    assert(len(roots) == 1)      # There is only one in a tree
    tree.default_factory = None  # Defang the defaultdict
    return tree, roots.pop()

# A simple breadth-first-search, keeping the count of nodes at each level.
def level_widths(tree, root):
    '''Does a BFS of tree starting at root counting nodes at each level.
    Returns a list of counts.
    '''
    widths = []      # Widths of the levels
    fringe = [root]  # List of nodes at current level
    while fringe:
        widths.append(len(fringe))
        kids = []    # List of nodes at next level
        for parent in fringe:
            if parent in tree:
                for kid in tree[parent]:
                    kids.append(kid)
        fringe = kids  # For next iteration, use this level's kids
    return widths

# Put the two pieces together.
def tree_width(edges):
    return level_widths(*edge_list_to_tree(edges))
A possible solution based on breadth-first traversal:
In a plain breadth-first traversal we add each node to the array. In this solution we wrap the node in an object together with its level and then add that object to the array.
function levelWidth(root) {
  const counter = [];
  const traverseBF = fn => {
    const arr = [{n: root, l: 0}];
    const pushToArr = l => n => arr.push({n, l});
    while (arr.length) {
      const node = arr.shift();
      node.n.children.forEach(pushToArr(node.l + 1));
      fn(node);
    }
  };
  traverseBF(node => {
    counter[node.l] = (+counter[node.l] || 0) + 1;
  });
  return counter;
}
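Since the question asks for Python, here is a rough Python sketch of the same idea (queue entries that carry the node together with its level). It assumes the tree is stored as a dict mapping each node to a list of its children; the names level_widths and children are chosen for this example only:

```python
from collections import deque

def level_widths(children, root):
    """Count nodes per level using a queue of (node, level) pairs."""
    widths = []
    queue = deque([(root, 0)])
    while queue:
        node, level = queue.popleft()
        if level == len(widths):
            widths.append(0)  # first node seen on this level
        widths[level] += 1
        for child in children.get(node, []):
            queue.append((child, level + 1))
    return widths

edges = [[11, 1], [11, 2],
         [10, 11], [10, 22], [10, 33],
         [33, 3], [33, 4], [33, 5], [33, 6]]
children = {}
for source, target in edges:
    children.setdefault(source, []).append(target)

result = level_widths(children, 10)
print(result)  # [1, 3, 6]
```

A deque is used here because popping from the front of a plain list is linear in the list length. The root (10) is passed in by hand; it could also be derived as the only source that never appears as a target.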
I have a tree like this:
          A
        / | \
       B  C  D
      / \ |  / | \
     E  F G H  I  J
and I am trying to append the nodes of the tree to an empty list
such that the list looks like:
[[A], [B, C, D], [E, F, G, H, I, J]]
Suppose that I have a root_node A and I don't know how deep my tree is.
How can I append nodes from the tree to an empty list in the above-mentioned format?
I tried breadth-first search, but my list ends up far longer than the
depth of the tree.
Append each new depth of nodes as a new list.
1. Start with an empty list: tree = []
2. Create a new inner list for the current depth
3. Append each element at the current depth in the list: tree[depth].append(element)
4. Go to the next depth and repeat
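The steps above could be sketched roughly like this. The dict-of-lists tree and the function name levels are assumptions made for illustration; the question does not say how the nodes are stored:

```python
def levels(children, root):
    """Group node names by depth: result[d] holds every node at depth d."""
    tree = []

    def visit(node, depth):
        if depth == len(tree):
            tree.append([])  # first node reached at this depth
        tree[depth].append(node)
        for child in children.get(node, []):
            visit(child, depth + 1)

    visit(root, 0)
    return tree

children = {
    'A': ['B', 'C', 'D'],
    'B': ['E', 'F'],
    'C': ['G'],
    'D': ['H', 'I', 'J'],
}
result = levels(children, 'A')
print(result)
```

The outer list has exactly one inner list per depth, so its length equals the depth of the tree rather than the number of nodes.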
Given a Node implementation like:
class Node:
    def __init__(self, name):
        self.name = name
        self.children = []

    def __repr__(self):
        return f'Node({self.name})'
You can create your nodes and arrange a graph with:
nodes = {letter: Node(letter) for letter in 'ABCDEFGHIJ'}
nodes['A'].children.extend([nodes['B'], nodes['C'], nodes['D']])
nodes['B'].children.extend([nodes['E'], nodes['F']])
nodes['C'].children.extend([nodes['G']])
nodes['D'].children.extend([nodes['H'], nodes['I'], nodes['J']])
Now you can start with the root node, and continually make a new list of nodes until you run out with a simple generator:
def make_lists(root):
    current = [root]
    while current:
        yield current
        current = [c for n in current for c in n.children]
list(make_lists(nodes['A']))
The while loop will end when there are no more children, resulting in:
[[Node(A)],
[Node(B), Node(C), Node(D)],
[Node(E), Node(F), Node(G), Node(H), Node(I), Node(J)]]
I'm trying to get an efficient algorithm to calculate the height of a tree in Python for large datasets. The code I have works for small datasets, but takes a long time for really large ones (100,000 items) so I'm trying to figure out ways to optimize it but am getting stuck. Sorry if it seems like a really newbie question, I'm pretty new to Python.
The input is a list length and a list of values, with each list item pointing to its parent, with list item -1 indicating the root of the tree. So with an input of:
5
4 -1 4 1 1
The answer would be 3. The tree is: {key: 1, children: [{key: 3}, {key: 4, children: [{key: 0}, {key: 2}]}]}
Here is the code that I have so far:
import sys, threading
sys.setrecursionlimit(10**7) # max depth of recursion
threading.stack_size(2**25) # new thread will get stack of such size
class TreeHeight:
    def read(self):
        self.n = int(sys.stdin.readline())
        self.parent = list(map(int, sys.stdin.readline().split()))

    def getChildren(self, node, nodes):
        parent = {'key': node, 'children': []}
        children = [i for i, x in enumerate(nodes) if x == parent['key']]
        for child in children:
            parent['children'].append(self.getChildren(child, nodes))
        return parent

    def compute_height(self, tree):
        if len(tree['children']) == 0:
            return 0
        else:
            max_values = []
            for child in tree['children']:
                max_values.append(self.compute_height(child))
            return 1 + max(max_values)

def main():
    tree = TreeHeight()
    tree.read()
    treeChild = tree.getChildren(-1, tree.parent)
    print(tree.compute_height(treeChild))

threading.Thread(target=main).start()
First, while Python is a great general-purpose language, raw Python is not very efficient for large datasets. Consider using pandas, NumPy, SciPy or one of the many alternatives.
Second, if you only care about the tree's height and the tree is written once and then only read, you could alter the code that reads the input so that it measures the height while it fills the tree.
This approach makes sense when you don't expect your tree to change after it has been created.
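One possible reading of the second suggestion is to skip the nested dictionaries entirely and compute each node's depth straight from the parent array, caching depths so that every node is walked only once. This is only a sketch under that assumption; tree_height is a name invented for the example:

```python
def tree_height(parent):
    """Height (number of levels) of a tree given as a parent array; -1 marks the root."""
    depth = [0] * len(parent)  # 0 means "not computed yet"
    for node in range(len(parent)):
        # Walk up until reaching the root or a node whose depth is known.
        path = []
        current = node
        while current != -1 and depth[current] == 0:
            path.append(current)
            current = parent[current]
        base = 0 if current == -1 else depth[current]
        # Unwind: the node closest to the known ancestor gets base + 1.
        for offset, visited in enumerate(reversed(path), start=1):
            depth[visited] = base + offset
    return max(depth, default=0)

height = tree_height([4, -1, 4, 1, 1])
print(height)  # 3
```

The original getChildren scans the whole list for every node, which is quadratic in the number of nodes. Here each node is placed on a path only once, so the work grows linearly, and no recursion is needed.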
Use BFS with an explicit queue to avoid stack overflow from recursive calls. Use a marker to detect the end of each level during the traversal.
from collections import defaultdict, deque

def compute_height(root, tree):
    # collections.deque serves as the FIFO queue; '$' marks the end of a level
    q = deque([root, '$'])
    height = 1
    while q:
        elem = q.popleft()
        if elem == '$' and q:
            # Marker reached with nodes still queued: a new level starts
            elem = q.popleft()
            height += 1
            q.append('$')
        for child in tree[elem]:
            q.append(child)
    return height

tree = defaultdict(list)
parents = [4, -1, 4, 1, 1]
for node, parent in enumerate(parents):
    tree[parent].append(node)
root = tree.pop(-1)[0]
print(compute_height(root, tree))
I'm relatively new to python, and I was trying out some questions when I encountered this problem. A tree is defined in a text file in the following manner,
d:
e:
b: d e
c:
a: b c
So, I want to write a simple python script that finds the depth of this. I'm not able to figure out a strategy to work this out. Is there any algorithm or technique for this?
My strategy would be as follows:
1. Find elements with no children.
2. For each of these, find the parent. Determine if any elements have this parent as a child - if not, your length is two (2).
3. If so, find the parent of the parent. Repeat step 2, incrementing your length counter. Continue the process updating a counter with each step.
For your case:
d -> b -> a (len 3)
e -> b -> a (len 3)
c -> a (len 2)
This could be described as a 'bottom up' tree construction method/algorithm.
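A rough sketch of this bottom-up walk, assuming every node has at most one parent and the file contents are already in a string; the names parent_of and chain_length are made up for the example:

```python
text_tree = """d:
e:
b: d e
c:
a: b c"""

parent_of = {}  # child -> parent
nodes = []      # every node, in file order
for line in text_tree.splitlines():
    name, _, rest = line.partition(":")
    name = name.strip()
    nodes.append(name)
    for child in rest.split():
        parent_of[child] = name

def chain_length(node):
    """Number of nodes on the path from node up to the root."""
    length = 1
    while node in parent_of:
        node = parent_of[node]
        length += 1
    return length

has_children = set(parent_of.values())
leaves = [n for n in nodes if n not in has_children]
lengths = {leaf: chain_length(leaf) for leaf in leaves}
print(lengths)                # {'d': 3, 'e': 3, 'c': 2}
print(max(lengths.values()))  # 3
```

The depth of the tree is the longest of these leaf-to-root chains.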
The tree format you've given has a nice property: if x is the child of y, then x is given before y in the file. So you can simply loop through the file once and read the depth into a dictionary. For example:
depth = {}
for line in f:
    parent, children = read_node(line)
    if children:
        depth[parent] = max(depth.get(child,1) for child in children) + 1
Then just print depth['a'], as a is the root. Here read_node is a quick function to parse the parent and children from a line of the file:
def read_node(line):
    parent, children = line.split(":")
    return parent, children.split()
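If the file did not list children before their parents, the single pass above would not work. A minimal sketch for that case reads the whole file first and then computes depths recursively with caching. The reordered sample text and the names children and depth are assumptions for illustration:

```python
from functools import lru_cache

# Same tree as in the question, but with parents listed before children.
text_tree = """a: b c
b: d e
c:
d:
e:"""

children = {}
for line in text_tree.splitlines():
    name, _, rest = line.partition(":")
    children[name.strip()] = rest.split()

@lru_cache(maxsize=None)
def depth(node):
    """Depth of the subtree rooted at node; a leaf counts as 1."""
    kids = children.get(node, [])
    return 1 + max((depth(k) for k in kids), default=0)

print(depth("a"))  # 3
```

For very deep trees the recursion limit could become a problem, in which case the one-pass version is preferable whenever the ordering property holds.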
I'm not sure what you mean by depth. If it's how many steps you have to take to visit every node, you could use depth-first search to visit every node in the graph and count the steps.
Here's a simple implementation:
text_tree = """d:
e:
b: d e
c:
a: b c"""
tree = {}
for line in text_tree.splitlines():
    node, childs = line.split(":")
    tree[node] = set(childs.split())

def dfs(graph, start):
    visited, stack = [], [start]
    while stack:
        vertex = stack.pop()
        if vertex not in visited:
            visited.append(vertex)
            stack.extend(graph[vertex])
    return visited

result = dfs(tree, "a")
print("It took %d steps, to visit every node in tree, the path took was %s" % (len(result), result))
Which outputs:
It took 5 steps, to visit every node in tree, the path took was ['a', 'b', 'd', 'e', 'c']
This is a follow-up question to Combinatorics in Python
I have a tree or directed acyclic graph if you will with a structure as:
Where r are root nodes, p are parent nodes, c are child nodes and b are hypothetical branches. The root nodes are not directly linked to the parent nodes, it is only a reference.
I am interested in finding all the combinations of branches under the constraints:
A child can be shared by any number of parent nodes given that these parent nodes do not share root node.
A valid combination should not be a subset of another combination
In this example only two valid combinations are possible under the constraints:
combo[0] = [b[0], b[1], b[2], b[3]]
combo[1] = [b[0], b[1], b[2], b[4]]
The data structure is such as b is a list of branch objects, which have properties r, c and p, e.g.:
b[3].r = 1
b[3].p = 3
b[3].c = 2
This problem can be solved in Python easily and elegantly, because there is a module called "itertools".
Let's say we have objects of type HypotheticalBranch, which have attributes r, p and c, just as you described in your post:
class HypotheticalBranch(object):
    def __init__(self, r, p, c):
        self.r=r
        self.p=p
        self.c=c
    def __repr__(self):
        return "HypotheticalBranch(%d,%d,%d)" % (self.r,self.p,self.c)
Your set of hypothetical branches is thus
b=[ HypotheticalBranch(0,0,0),
    HypotheticalBranch(0,1,1),
    HypotheticalBranch(1,2,1),
    HypotheticalBranch(1,3,2),
    HypotheticalBranch(1,4,2) ]
The magical function that returns a list of all possible branch combos could be written like so:
import collections, itertools
def get_combos(branches):
    rc=collections.defaultdict(list)
    for b in branches:
        rc[b.r,b.c].append(b)
    return itertools.product(*rc.values())
To be precise, this function returns an iterator. Get the list by iterating over it. These four lines of code will print out all possible combos:
for combo in get_combos(b):
    print("Combo:")
    for branch in combo:
        print(" %r" % (branch,))
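The output of this programme is shown below. The order of the branches within each combo follows the dictionary's iteration order, so it can differ between Python versions:
Combo:
 HypotheticalBranch(0,1,1)
 HypotheticalBranch(1,3,2)
 HypotheticalBranch(0,0,0)
 HypotheticalBranch(1,2,1)
Combo:
 HypotheticalBranch(0,1,1)
 HypotheticalBranch(1,4,2)
 HypotheticalBranch(0,0,0)
 HypotheticalBranch(1,2,1)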
...which is just what you wanted.
So what does the script do? It creates a list of all hypothetical branches for each combination (root node, child node). And then it yields the product of these lists, i.e. all possible combinations of one item from each of the lists.
I hope I got what you actually wanted.
Your second constraint means you want maximal combinations, i.e. all the combinations with the length equal to the largest combination.
I would approach this by first traversing the "b" structure and creating a structure, named "c", to store all branches coming to each child node and categorized by the root node that comes to it.
Then to construct combinations for output, for each child you can include one entry from each root set that is not empty. The order (execution time) of the algorithm will be the order of the output, which is the best you can get.
For example, your "c" structure, will look like:
c[i][j] = [b_k0, ...]
(meaning child c_i has b_k0, ... as the branches that connect it to root r_j)
For the example you provided:
c[0][0] = [0]
c[0][1] = []
c[1][0] = [1]
c[1][1] = [2]
c[2][0] = []
c[2][1] = [3, 4]
It should be fairly easy to code it using this approach. You just need to iterate over all branches "b" and fill the data structure for "c". Then write a small recursive function that goes through all items inside "c".
Here is the code (I entered your sample data at the top for testing sake):
class Branch:
    def __init__(self, r, p, c):
        self.r = r
        self.p = p
        self.c = c

b = [
    Branch(0, 0, 0),
    Branch(0, 1, 1),
    Branch(1, 2, 1),
    Branch(1, 3, 2),
    Branch(1, 4, 2)
]
total_b = 5  # Number of branches
total_c = 3  # Number of child nodes
total_r = 2  # Number of roots

# c[i][j] lists the indices of the branches linking child i to root j
c = []
for i in range(total_c):
    c.append([])
    for j in range(total_r):
        c[i].append([])
for k in range(total_b):
    c[b[k].c][b[k].r].append(k)

combos = []
def list_combos(n_c, n_r, curr):
    if n_c == total_c:
        combos.append(curr)
    elif n_r == total_r:
        list_combos(n_c+1, 0, curr)
    elif c[n_c][n_r]:
        for k in c[n_c][n_r]:
            list_combos(n_c, n_r+1, curr + [b[k]])
    else:
        list_combos(n_c, n_r+1, curr)

list_combos(0, 0, [])
print(combos)
There are really two problems here: firstly, you need to work out the algorithm that you will use to solve this problem and secondly, you need to implement it (in Python).
Algorithm
I shall assume you want a maximal collection of branches; that is, one to which you can't add any more branches. If you don't, you can consider all subsets of a maximal collection.
Therefore, for a child node we want to take as many branches as possible, subject to the constraint that no two parent nodes share a root. In other words, from each child you may have at most one edge in the neighbourhood of each root node. This seems to suggest that you want to iterate first over the children, then over the (neighbourhoods of the) root nodes, and finally over the edges between these. This concept gives the following pseudocode:
for each child node:
    for each root node:
        remember each permissible edge
find all combinations of permissible edges
Code
>>> import networkx as nx
>>> import itertools
>>>
>>> G = nx.DiGraph()
>>> G.add_nodes_from(["r0", "r1", "p0", "p1", "p2", "p3", "p4", "c0", "c1", "c2"])
>>> G.add_edges_from([("r0", "p0"), ("r0", "p1"), ("r1", "p2"), ("r1", "p3"),
...                   ("r1", "p4"), ("p0", "c0"), ("p1", "c1"), ("p2", "c1"),
...                   ("p3", "c2"), ("p4", "c2")])
>>>
>>> combs = set()
>>> leaves = [node for node in G if not G.out_degree(node)]
>>> roots = [node for node in G if not G.in_degree(node)]
>>> for leaf in leaves:
...     for root in roots:
...         possibilities = tuple(edge for edge in G.in_edges(leaf)
...                               if G.has_edge(root, edge[0]))
...         if possibilities: combs.add(possibilities)
...
>>> combs
set([(('p1', 'c1'),),
     (('p2', 'c1'),),
     (('p3', 'c2'), ('p4', 'c2')),
     (('p0', 'c0'),)])
>>> print(list(itertools.product(*combs)))
[(('p1', 'c1'), ('p2', 'c1'), ('p3', 'c2'), ('p0', 'c0')),
 (('p1', 'c1'), ('p2', 'c1'), ('p4', 'c2'), ('p0', 'c0'))]
The above seems to work, although I haven't tested it.
For each child c, with hypothetical parents p(c), with roots r(p(c)), choose exactly one parent p from p(c) for each root r in r(p(c)) (such that r is the root of p) and include b in the combination where b connects p to c (assuming there is only one such b, meaning it's not a multigraph). The number of combinations will be the product of the numbers of parents by which each child is hypothetically connected to each root. In other words, the size of the set of combinations will be equal to the product of the hypothetical connections of all child-root pairs. In your example all such child-root pairs have only one path, except r1-c2, which has two paths, thus the size of the set of combinations is two.
This satisfies the constraint of no combination being a subset of another because by choosing exactly one parent for each root of each child, we maximize the number connections. Subsequently adding any edge b would cause its root to be connected to its child twice, which is not allowed. And since we are choosing exactly one, all combinations will be exactly the same length.
Implementing this choice recursively will yield the desired combinations.
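As a rough illustration of that recursive choice, here is a minimal sketch. It assumes branches are simple records with fields r, p and c as in the question; the namedtuple and the function names combos and choose are invented for the example:

```python
from collections import namedtuple

Branch = namedtuple("Branch", "r p c")

def combos(branches):
    """Pick exactly one branch for every (child, root) pair that has any."""
    groups = {}
    for branch in branches:
        groups.setdefault((branch.c, branch.r), []).append(branch)
    keys = sorted(groups)

    def choose(i, chosen):
        if i == len(keys):
            yield list(chosen)
            return
        for branch in groups[keys[i]]:
            yield from choose(i + 1, chosen + [branch])

    return list(choose(0, []))

b = [Branch(0, 0, 0), Branch(0, 1, 1), Branch(1, 2, 1),
     Branch(1, 3, 2), Branch(1, 4, 2)]
result = combos(b)
for combo in result:
    print(combo)
```

With the sample data only the pair (child 2, root 1) offers two choices, so two combinations come out, each of length four, which matches the count argued above.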