Python Combinatorics, part 2 - python

This is a follow-up question to Combinatorics in Python
I have a tree or directed acyclic graph if you will with a structure as:
Where r are root nodes, p are parent nodes, c are child nodes and b are hypothetical branches. The root nodes are not directly linked to the parent nodes, it is only a reference.
I am interested in finding all the combinations of branches under the constraints:
A child can be shared by any number of parent nodes given that these parent nodes do not share root node.
A valid combination should not be a subset of another combination
In this example only two valid combinations are possible under the constraints:
combo[0] = [b[0], b[1], b[2], b[3]]
combo[1] = [b[0], b[1], b[2], b[4]]
The data structure is such as b is a list of branch objects, which have properties r, c and p, e.g.:
b[3].r = 1
b[3].p = 3
b[3].c = 2
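The first constraint can be written as a small check. This is a minimal sketch, assuming the constraint means that no two branches in a combination may share both root and child. The namedtuple `Branch` and the function name `is_valid_combo` are hypothetical stand-ins, not part of the original data structure.

```python
from collections import namedtuple

# Hypothetical stand-in for the branch objects described in the question.
Branch = namedtuple("Branch", ["r", "p", "c"])

def is_valid_combo(combo):
    """Return True if no two branches in combo share the same (root, child) pair."""
    seen = set()
    for branch in combo:
        key = (branch.r, branch.c)
        if key in seen:
            return False
        seen.add(key)
    return True

b = [Branch(0, 0, 0), Branch(0, 1, 1), Branch(1, 2, 1),
     Branch(1, 3, 2), Branch(1, 4, 2)]

print(is_valid_combo([b[0], b[1], b[2], b[3]]))  # True
print(is_valid_combo(b))  # False: b[3] and b[4] share root 1 and child 2
```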

This problem can be solved in Python easily and elegantly, because there is a module called "itertools".
Let's say we have objects of type HypotheticalBranch, which have attributes r, p and c, just as you described in your post:
class HypotheticalBranch(object):
    def __init__(self, r, p, c):
        self.r = r  # root node
        self.p = p  # parent node
        self.c = c  # child node
    def __repr__(self):
        return "HypotheticalBranch(%d,%d,%d)" % (self.r, self.p, self.c)
Your set of hypothetical branches is thus
b = [HypotheticalBranch(0,0,0),
     HypotheticalBranch(0,1,1),
     HypotheticalBranch(1,2,1),
     HypotheticalBranch(1,3,2),
     HypotheticalBranch(1,4,2)]
The magical function that returns a list of all possible branch combos could be written like so:
import collections, itertools
def get_combos(branches):
    # Group the branches by their (root, child) pair
    rc = collections.defaultdict(list)
    for b in branches:
        rc[b.r, b.c].append(b)
    # Pick exactly one branch from every group
    return itertools.product(*rc.values())
To be precise, this function returns an iterator. Get the list by iterating over it. These four lines of code will print out all possible combos:
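for combo in get_combos(b):
    print("Combo:")
    for branch in combo:
        print(" %r" % (branch,))
The output of this programme is (the order of the branches within a combo follows dictionary ordering, so it may differ between Python versions):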
Combo:
HypotheticalBranch(0,1,1)
HypotheticalBranch(1,3,2)
HypotheticalBranch(0,0,0)
HypotheticalBranch(1,2,1)
Combo:
HypotheticalBranch(0,1,1)
HypotheticalBranch(1,4,2)
HypotheticalBranch(0,0,0)
HypotheticalBranch(1,2,1)
...which is just what you wanted.
So what does the script do? It creates a list of all hypothetical branches for each combination (root node, child node). And then it yields the product of these lists, i.e. all possible combinations of one item from each of the lists.
I hope I got what you actually wanted.

Your second constraint means you want maximal combinations, i.e. all the combinations with the length equal to the largest combination.
I would approach this by first traversing the "b" structure and creating a structure, named "c", to store all branches coming to each child node and categorized by the root node that comes to it.
Then to construct combinations for output, for each child you can include one entry from each root set that is not empty. The order (execution time) of the algorithm will be the order of the output, which is the best you can get.
For example, your "c" structure, will look like:
c[i][j] = [b_k0, ...]
(this means child c_i has b_k0, ... as the branches that connect it to root r_j)
For the example you provided:
c[0][0] = [0]
c[0][1] = []
c[1][0] = [1]
c[1][1] = [2]
c[2][0] = []
c[2][1] = [3, 4]
It should be fairly easy to code it using this approach. You just need to iterate over all branches "b" and fill the data structure for "c". Then write a small recursive function that goes through all items inside "c".
Here is the code (I entered your sample data at the top for testing's sake):
class Branch:
    def __init__(self, r, p, c):
        self.r = r
        self.p = p
        self.c = c
b = [
    Branch(0, 0, 0),
    Branch(0, 1, 1),
    Branch(1, 2, 1),
    Branch(1, 3, 2),
    Branch(1, 4, 2)
]
total_b = 5 # Number of branches
total_c = 3 # Number of child nodes
total_r = 2 # Number of roots
# c[i][j] lists the indices of the branches that connect child i to root j
c = []
for i in range(total_c):
    c.append([])
    for j in range(total_r):
        c[i].append([])
for k in range(total_b):
    c[b[k].c][b[k].r].append(k)
combos = []
def list_combos(n_c, n_r, curr):
    if n_c == total_c:
        # Every child has been handled: curr is a complete combination
        combos.append(curr)
    elif n_r == total_r:
        # All roots handled for this child: move on to the next child
        list_combos(n_c+1, 0, curr)
    elif c[n_c][n_r]:
        # Try each branch that links this child to this root
        for k in c[n_c][n_r]:
            list_combos(n_c, n_r+1, curr + [b[k]])
    else:
        list_combos(n_c, n_r+1, curr)
list_combos(0, 0, [])
print(combos)

There are really two problems here: firstly, you need to work out the algorithm that you will use to solve this problem and secondly, you need to implement it (in Python).
Algorithm
I shall assume you want a maximal collection of branches; that is, one to which you can't add any more branches. If you don't, you can consider all subsets of a maximal collection.
Therefore, for a child node we want to take as many branches as possible, subject to the constraint that no two parent nodes share a root. In other words, from each child you may have at most one edge in the neighbourhood of each root node. This seems to suggest that you want to iterate first over the children, then over the (neighbourhoods of the) root nodes, and finally over the edges between these. This concept gives the following pseudocode:
for each child node:
for each root node:
remember each permissible edge
find all combinations of permissible edges
Code
>>> import networkx as nx
>>> import itertools
>>>
>>> G = nx.DiGraph()
>>> G.add_nodes_from(["r0", "r1", "p0", "p1", "p2", "p3", "p4", "c0", "c1", "c2"])
>>> G.add_edges_from([("r0", "p0"), ("r0", "p1"), ("r1", "p2"), ("r1", "p3"),
... ("r1", "p4"), ("p0", "c0"), ("p1", "c1"), ("p2", "c1"),
... ("p3", "c2"), ("p4", "c2")])
>>>
>>> combs = set()
>>> leaves = [node for node in G if not G.out_degree(node)]
>>> roots = [node for node in G if not G.in_degree(node)]
>>> for leaf in leaves:
...     for root in roots:
...         possibilities = tuple(edge for edge in G.in_edges(leaf)
...                               if G.has_edge(root, edge[0]))
...         if possibilities: combs.add(possibilities)
...
>>> combs
set([(('p1', 'c1'),),
(('p2', 'c1'),),
(('p3', 'c2'), ('p4', 'c2')),
(('p0', 'c0'),)])
>>> print(list(itertools.product(*combs)))
[(('p1', 'c1'), ('p2', 'c1'), ('p3', 'c2'), ('p0', 'c0')),
(('p1', 'c1'), ('p2', 'c1'), ('p4', 'c2'), ('p0', 'c0'))]
The above seems to work, although I haven't tested it.

For each child c, with hypothetical parents p(c), with roots r(p(c)), choose exactly one parent p from p(c) for each root r in r(p(c)) (such that r is the root of p) and include b in the combination where b connects p to c (assuming there is only one such b, meaning it's not a multigraph). The number of combinations will be the product of the numbers of parents by which each child is hypothetically connected to each root. In other words, the size of the set of combinations will be equal to the product of the hypothetical connections of all child-root pairs. In your example all such child-root pairs have only one path, except r1-c2, which has two paths, thus the size of the set of combinations is two.
This satisfies the constraint of no combination being a subset of another because by choosing exactly one parent for each root of each child, we maximize the number of connections. Subsequently adding any edge b would cause its root to be connected to its child twice, which is not allowed. And since we are choosing exactly one, all combinations will be exactly the same length.
Implementing this choice recursively will yield the desired combinations.
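The counting argument and the recursive choice described above can be sketched as follows. This is only an illustration under the assumption that branches are plain (r, p, c) tuples; the helper names `group_by_root_child` and `combos` are hypothetical.

```python
from collections import defaultdict
from math import prod  # available in Python 3.8+

def group_by_root_child(branches):
    """Collect the branches of every (root, child) pair into one group."""
    groups = defaultdict(list)
    for r, p, c in branches:
        groups[(r, c)].append((r, p, c))
    return list(groups.values())

def combos(groups, i=0, chosen=()):
    """Recursively pick exactly one branch from each (root, child) group."""
    if i == len(groups):
        yield chosen
        return
    for branch in groups[i]:
        yield from combos(groups, i + 1, chosen + (branch,))

branches = [(0, 0, 0), (0, 1, 1), (1, 2, 1), (1, 3, 2), (1, 4, 2)]
groups = group_by_root_child(branches)
expected_count = prod(len(g) for g in groups)  # product over child-root pairs
result = list(combos(groups))
print(expected_count, len(result))  # 2 2
```

Only the pair (root 1, child 2) has two candidate branches, so the product of the group sizes is two, matching the two combinations listed in the question.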

Related

Find the width of tree at each level/height (non-binary tree)

Dear experienced friends, I am looking for an algorithm (Python) that outputs the width of a tree at each level. Here are the input and expected outputs.
(I have updated the problem with a more complex edge list. The original question, with a sorted edge list, can be elegantly solved by @Samwise's answer.)
Input (Edge List: source-->target)
[[11,1],[11,2],
[10,11],[10,22],[10,33],
[33,3],[33,4],[33,5],[33,6]]
The tree graph looks like this:
10
/ | \
11 22 33
/ \ / | \ \
1 2 3 4 5 6
Expected Output (Width of each level/height)
[1,3,6] # according to the width of level 0,1,2
I have looked through the web. It seems this topic is related to BFS and level-order traversal. However, most solutions are based on binary trees. How can I solve the problem when the tree is not binary (e.g. the above case)?
(I'm new to the algorithm, and any references would be really appreciated. Thank you!)
Build a dictionary of the "level" of each node, and then count the number of nodes at each level:
>>> from collections import Counter
>>> def tree_width(edges):
...     levels = {}  # {node: level}
...     for [p, c] in edges:
...         levels[c] = levels.setdefault(p, 0) + 1
...     widths = Counter(levels.values())  # {level: width}
...     return [widths[level] for level in sorted(widths)]
...
>>> tree_width([[0,1],[0,2],[0,3],
...             [1,4],[1,5],
...             [3,6],[3,7],[3,8],[3,9]])
[1, 3, 6]
This might not be the most efficient, but it requires only two scans over the edge list, so it's optimal up to a constant factor. It places no requirement on the order of the edges in the edge list, but does insist that each edge be (source, dest). Also, it doesn't check that the edge list describes a connected tree (or a tree at all; if the edge list is cyclic, the program will never terminate).
from collections import defaultdict
# Turn the edge list into a (non-binary) tree, represented as a
# dictionary whose keys are the source nodes with the list of children
# as its value.
def edge_list_to_tree(edges):
    '''Given a list of (source, dest) pairs, constructs a tree.
    Returns a tuple (tree, root) where root is the root node
    and tree is a dict which maps each node to a list of its children.
    (Leaves are not present as keys in the dictionary.)
    '''
    tree = defaultdict(list)
    sources = set() # nodes used as sources
    dests = set() # nodes used as destinations
    for source, dest in edges:
        tree[source].append(dest)
        sources.add(source)
        dests.add(dest)
    roots = sources - dests # Source nodes which are not destinations
    assert(len(roots) == 1) # There is only one in a tree
    tree.default_factory = None # Defang the defaultdict
    return tree, roots.pop()
# A simple breadth-first-search, keeping the count of nodes at each level.
def level_widths(tree, root):
    '''Does a BFS of tree starting at root counting nodes at each level.
    Returns a list of counts.
    '''
    widths = [] # Widths of the levels
    fringe = [root] # List of nodes at current level
    while fringe:
        widths.append(len(fringe))
        kids = [] # List of nodes at next level
        for parent in fringe:
            if parent in tree:
                for kid in tree[parent]:
                    kids.append(kid)
        fringe = kids # For next iteration, use this level's kids
    return widths
# Put the two pieces together.
def tree_width(edges):
    return level_widths(*edge_list_to_tree(edges))
A possible solution based on breadth-first traversal:
In a plain breadth-first traversal we add each node to the array; in this solution we wrap the node in an object together with its level and then add that object to the array.
function levelWidth(root) {
  const counter = [];
  const traverseBF = fn => {
    const arr = [{n: root, l: 0}];
    const pushToArr = l => n => arr.push({n, l});
    while (arr.length) {
      const node = arr.shift();
      node.n.children.forEach(pushToArr(node.l + 1));
      fn(node);
    }
  };
  traverseBF(node => {
    counter[node.l] = (+counter[node.l] || 0) + 1;
  });
  return counter;
}
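Since the question asks for Python, the same idea (queueing each node together with its level) might look like the sketch below. It assumes the root is known and passed in explicitly; the function name `level_widths_from_edges` is hypothetical.

```python
from collections import defaultdict, deque

def level_widths_from_edges(edges, root):
    """Breadth-first traversal that queues (node, level) pairs and counts nodes per level."""
    children = defaultdict(list)
    for source, target in edges:
        children[source].append(target)
    widths = []
    queue = deque([(root, 0)])
    while queue:
        node, level = queue.popleft()
        if level == len(widths):  # first node seen on this level
            widths.append(0)
        widths[level] += 1
        for child in children.get(node, ()):
            queue.append((child, level + 1))
    return widths

edges = [[11, 1], [11, 2], [10, 11], [10, 22], [10, 33],
         [33, 3], [33, 4], [33, 5], [33, 6]]
print(level_widths_from_edges(edges, 10))  # [1, 3, 6]
```

The edges are grouped by source before the traversal starts, so the order of the edge list does not matter.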

Is the given Graph a tree? Faster than below approach -

I was given a question during an interview and although my answer was accepted at the end they wanted a faster approach and I went blank..
Question :
Given an undirected graph, can you see if it's a tree? If so, return true and false otherwise.
A tree:
A - B
|
C - D
not a tree:
A
/ \
B - C
/
D
You'll be given two parameters: n for number of nodes, and a multidimensional array of edges like such: [[1, 2], [2, 3]], each pair representing the vertices connected by the edge.
Note:Expected space complexity : O(|V|)
The array edges can be empty
Here is My code: 105ms
def is_graph_tree(n, edges):
    nodes = [None] * (n + 1)
    for i in range(1, n+1):
        nodes[i] = i
    for i in range(len(edges)):
        start_edge = edges[i][0]
        dest_edge = edges[i][1]
        if nodes[start_edge] != start_edge:
            start_edge = nodes[start_edge]
        if nodes[dest_edge] != dest_edge:
            dest_edge = nodes[dest_edge]
        if start_edge == dest_edge:
            return False
        nodes[start_edge] = dest_edge
    return len(edges) <= n - 1
Here's one approach using a disjoint-set-union / union-find data structure:
def is_graph_tree(n, edges):
    parent = list(range(n+1))
    size = [1] * (n + 1)
    for x, y in edges:
        # find x (path splitting); parent[x] must be assigned before x is rebound
        while parent[x] != x:
            parent[x], x = parent[parent[x]], parent[x]
        # find y
        while parent[y] != y:
            parent[y], y = parent[parent[y]], parent[y]
        if x == y:
            # Already connected
            return False
        # Union (by size)
        if size[x] < size[y]:
            x, y = y, x
        parent[y] = x
        size[x] += size[y]
    return True
assert not is_graph_tree(4, [(1, 2), (2, 3), (3, 4), (4, 2)])
assert is_graph_tree(6, [(1, 2), (2, 3), (3, 4), (3, 5), (1, 6)])
The runtime is O(V + E*InverseAckermannFunction(V)), which is better than O(V + E * log(log V)), so it's basically O(V + E).
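A cycle check alone also accepts a forest, i.e. an acyclic graph that is not connected. One way to rule that out is to combine union-find with the edge count, as in the sketch below. It assumes nodes are labelled 1..n, uses path halving without union by size for brevity (so the bound quoted above does not carry over), and the name `is_tree` is hypothetical.

```python
def is_tree(n, edges):
    """Sketch: a graph is a tree iff it has exactly n - 1 edges and no edge closes a cycle."""
    if len(edges) != n - 1:
        return False
    parent = list(range(n + 1))  # assumes nodes are labelled 1..n

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra == rb:
            return False  # a and b were already connected
        parent[ra] = rb
    return True

print(is_tree(4, [(1, 2), (2, 3), (3, 4)]))  # True
print(is_tree(4, [(1, 2), (3, 4)]))          # False (too few edges, disconnected)
print(is_tree(4, [(1, 2), (2, 3), (3, 1)]))  # False (cycle)
```

With n - 1 edges and no cycle, every edge merges two components, so exactly one component remains.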
Tim Roberts has posted a candidate solution, but this will work in the case of disconnected subtrees:
import queue
def is_graph_tree(n, edges):
    # A tree with n nodes has n - 1 edges.
    if len(edges) != n - 1:
        return False
    # Construct graph.
    graph = [[] for _ in range(n)]
    for first_vertex, second_vertex in edges:
        graph[first_vertex].append(second_vertex)
        graph[second_vertex].append(first_vertex)
    # BFS to find edges that create cycles.
    # The graph is undirected, so we can root the tree wherever we want.
    visited = set()
    q = queue.Queue()
    q.put((0, None))
    while not q.empty():
        current_node, previous_node = q.get()
        if current_node in visited:
            return False
        visited.add(current_node)
        for neighbor in graph[current_node]:
            if neighbor != previous_node:
                q.put((neighbor, current_node))
    # Only return true if the graph has only one connected component.
    return len(visited) == n
This runs in O(n + len(edges)) time.
You could approach this from the perspective of tree leaves. Every leaf node in a tree will have exactly one edge connected to it. So, if you count the number of edges for each node, you can get the list of leaves (i.e. the ones with only one edge).
Then, take the linked nodes from these leaves and reduce their edge counts by one (as if you were removing all the leaves from the tree). That will give you a new set of leaves corresponding to the parents of the original leaves. Repeat the process until you have no more leaves.
[EDIT] Checking that the number of edges is N-1 eliminates the need to do the multi-root check, because there will be another discrepancy (e.g. double link, missing node) in the graph if there are multiple 'roots' or a disconnected subtree.
If the graph is a tree, this process should eliminate all nodes from the node counts (i.e. they will all be flagged as leaves at some point).
Using the Counter class (from collections) will make this relatively easy to implement:
from collections import Counter
def isTree(N,E):
    if N==1 and not E: return True # root only is a tree
    if len(E) != N-1: return False # a tree has N-1 edges
    counts = Counter(n for ab in E for n in ab) # edge counts per node
    if len(counts) != N : return False # unlinked nodes
    while True:
        leaves = {n for n,c in counts.items() if c==1} # new leaves
        if not leaves:break
        for a,b in E: # subtract leaf counts
            if counts[a]>1 and b in leaves: counts[a] -= 1
            if counts[b]>1 and a in leaves: counts[b] -= 1
        for n in leaves: counts[n] = -1 # flag leaves in counts
    return all(c==-1 for c in counts.values()) # all must become leaves
output:
G = [[1,2],[1,3],[4,5],[4,6]]
print(isTree(6,G)) # False (disconnected sub-tree)
G = [[1,2],[1,3],[1,4],[2,3],[5,6]]
print(isTree(6,G)) # False (doubly linked node 3)
G = [[1,2],[2,6],[3,4],[5,1],[2,3]]
print(isTree(6,G)) # True
G = [[1,2],[2,3]]
print(isTree(3,G)) # True
G = [[1,2],[2,3],[3,4]]
print(isTree(4,G)) # True
G = [[1,2],[1,3],[2,5],[2,4]]
print(isTree(6,G)) # False (missing node)
Space complexity is O(N) because the counts dictionary has one entry per node (vertex) with an integer as value. Time complexity will be O(ExL) where E is the number of edges and L is the number of levels in the tree. The worst-case time is O(E^2), for a tree where all parents have only one child node. However, since the initial condition requires E to be less than V, the worst case is also bounded by O(V^2).
Note that this algorithm makes no assumption on edge order or numerical relationships between node numbers. The root (last node to be made a leaf) found by this algorithm is not necessarily the only possible root given that, unless the nodes have an implicit cardinality relationship (or edges have an order), there could be ambiguous scenarios:
[1,2],[2,3],[2,4] could be:
1            2            3
|_2    OR    |_1    OR    |_2
  |_3        |_3            |_1
  |_4        |_4            |_4
If a cardinality relationship between node numbers or an order of edges can be relied upon, the algorithm could potentially be made more time efficient (because we could easily determine which node is the root and start from there).
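Even without such an ordering, the leaf-removal idea can be brought down to linear time by keeping a queue of current leaves instead of rescanning the edge list on every pass. The sketch below is one possible variant, not the function above: it stores adjacency lists (O(V) extra space once the edge count has been checked), and the name `is_tree_by_leaf_removal` is hypothetical.

```python
from collections import deque

def is_tree_by_leaf_removal(n, edges):
    """Peel leaves one at a time; a tree is the only graph that gets peeled completely."""
    if n == 1 and not edges:
        return True
    if len(edges) != n - 1:
        return False
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)
    if len(neighbours) != n:
        return False  # some node has no edge at all
    degree = {node: len(adj) for node, adj in neighbours.items()}
    leaves = deque(node for node, d in degree.items() if d == 1)
    removed = 0
    while leaves:
        leaf = leaves.popleft()
        removed += 1
        for other in neighbours[leaf]:
            degree[other] -= 1
            if degree[other] == 1:  # other has just become a leaf
                leaves.append(other)
    return removed == n

print(is_tree_by_leaf_removal(6, [[1, 2], [2, 6], [3, 4], [5, 1], [2, 3]]))  # True
print(is_tree_by_leaf_removal(6, [[1, 2], [1, 3], [1, 4], [2, 3], [5, 6]]))  # False
```

Each node enters the queue at most once and each edge is visited twice, so the running time is proportional to V + E.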
[EDIT2] Alternative method using groups.
When the number of edges is N-1, if the graph is a tree, all nodes should be reachable from any other node. This means that, if we form groups of reachable nodes for each node and merge them together based on the edges, we should end up with a single group after going through all the edges.
Here is the modified function based on that approach:
def isTree(N,E):
    if N==1 and not E: return True # root only is a tree
    if len(E) != N-1: return False # a tree has N-1 edges
    groups = {n:[n] for ab in E for n in ab} # each node in its own group
    if len(groups) != N : return False # unlinked nodes
    for a,b in E:
        groups[a].extend(groups[b]) # merge groups
        for n in groups[b]: groups[n] = groups[a] # update nodes' groups
    return len(set(map(id,groups.values()))) == 1 # only one group when done
Given that we start out with fewer edges than nodes and that group merging will consume at most 2x a group size (so also < N), the space complexity will remain O(V). The time complexity will also be O(V^2) in the worst-case scenarios.
You don't even need to know how many edges there are:
def is_graph_tree(n, edges):
    seen = set()
    for a,b in edges:
        b = max(a,b)
        if b in seen:
            return False
        seen.add(b)
    return True
a = [[1,2],[2,3],[3,4]]
print(is_graph_tree(0,a))
b = [[1,2],[1,3],[2,3],[2,4]]
print(is_graph_tree(0,b))
Now, this WON'T catch the case of disconnected subtrees, but that wasn't in the problem description...

Faster way to add dummy nodes in networkx to limit degree

I am wondering if I can speed up my operation of limiting node degree using an inbuilt function.
A submodule of my task requires me to limit the indegree to 2. So, the solution I proposed was to introduce sequential dummy nodes and absorb the extra edges. Finally, the last dummy gets connected to the children of the original node. To be specific if an original node 2 is split into 3 nodes (original node 2 & two dummy nodes), ALL the properties of the graph should be maintained if we analyse the graph by packaging 2 & its dummies into one hypothetical node 2'; The function I wrote is shown below:
def split_merging(G, dummy_counter):
    """
    Args:
        G: as the name suggests
        dummy_counter: as the name suggests
    Returns:
        G with each merging node > 2 incoming split into several consecutive nodes
        and dummy_counter
    """
    # we need two copies; one to ensure the sanctity of the input G
    # and second, to ensure that while we change the Graph in the loop,
    # the loop doesn't go crazy due to changing bounds
    G_copy = nx.DiGraph(G)
    G_copy_2 = nx.DiGraph(G)
    for node in G_copy.nodes:
        in_deg = G_copy.in_degree[node]
        if in_deg > 2: # node must be split for incoming
            new_nodes = ["dummy" + str(i) for i in range(dummy_counter, dummy_counter + in_deg - 2)]
            dummy_counter = dummy_counter + in_deg - 2
            upstreams = [i for i in G_copy_2.predecessors(node)]
            downstreams = [i for i in G_copy_2.successors(node)]
            for up in upstreams:
                G_copy_2.remove_edge(up, node)
            for down in downstreams:
                G_copy_2.remove_edge(node, down)
            prev_node = node
            G_copy_2.add_edge(upstreams[0], prev_node)
            G_copy_2.add_edge(upstreams[1], prev_node)
            for i in range(2, len(upstreams)):
                G_copy_2.add_edge(prev_node, new_nodes[i - 2])
                G_copy_2.add_edge(upstreams[i], new_nodes[i - 2])
                prev_node = new_nodes[i - 2]
            for down in downstreams:
                G_copy_2.add_edge(prev_node, down)
    return G_copy_2, dummy_counter
For clarification, the input and output are shown below:
Input:
Output:
It works as expected. But the problem is that this is very slow for larger graphs. Is there a way to speed this up using some inbuilt function from networkx or any other library?
Sure; the idea is similar to balancing a B-tree. If a node has too many in-neighbors, create two new children, and split up all your in-neighbors among those children. The children have out-degree 1 and point to your original node; you may need to recursively split them as well.
This is as balanced as possible: node n becomes a complete binary tree rooted at node n, with external in-neighbors at the leaves only, and external out-neighbors at the root.
def recursive_split_node(G: 'nx.DiGraph', node, max_in_degree: int = 2):
    """Given a possibly overfull node, create a minimal complete
    binary tree rooted at that node with no overfull nodes.
    Return the new graph."""
    global dummy_counter
    current_in_degree = G.in_degree[node]
    if current_in_degree <= max_in_degree:
        return G
    # Complete binary tree, so left gets 1 more descendant if tied
    left_child_in_degree = (current_in_degree + 1) // 2
    left_child = "dummy" + str(dummy_counter)
    right_child = "dummy" + str(dummy_counter + 1)
    dummy_counter += 2
    G.add_node(left_child)
    G.add_node(right_child)
    old_predecessors = list(G.predecessors(node))
    # Give all predecessors to left and right children
    G.add_edges_from([(y, left_child)
                      for y in old_predecessors[:left_child_in_degree]])
    G.add_edges_from([(y, right_child)
                      for y in old_predecessors[left_child_in_degree:]])
    # Remove all incoming edges
    G.remove_edges_from([(y, node) for y in old_predecessors])
    # Connect children to me
    G.add_edge(left_child, node)
    G.add_edge(right_child, node)
    # Split children
    G = recursive_split_node(G, left_child, max_in_degree)
    G = recursive_split_node(G, right_child, max_in_degree)
    return G
def clean_graph(G: 'nx.DiGraph', max_in_degree: int = 2) -> 'nx.DiGraph':
    """Return a copy of our original graph, with nodes added to ensure
    the max in degree does not exceed our limit."""
    G_copy = nx.DiGraph(G)
    for node in G.nodes:
        if G_copy.in_degree[node] > max_in_degree:
            G_copy = recursive_split_node(G_copy, node, max_in_degree)
    return G_copy
This code for recursively splitting nodes is quite handy and easily generalized, and intentionally left unoptimized.
To solve your exact use case, you could go with an iterative solution: build a full, complete binary tree (with the same structure as a heap) implicitly as an array. This is, I believe, the theoretically optimal solution to the problem, in terms of minimizing the number of graph operations (new nodes, new edges, deleting edges) to achieve the constraint, and gives the same graph as the recursive solution.
def clean_graph(G):
    """Return a copy of our original graph, with nodes added to ensure
    the max in degree does not exceed 2."""
    global dummy_counter
    G_copy = nx.DiGraph(G)
    for node in G.nodes:
        if G_copy.in_degree[node] > 2:
            predecessors_list = list(G_copy.predecessors(node))
            G_copy.remove_edges_from((y, node) for y in predecessors_list)
            N = len(predecessors_list)
            leaf_count = (N + 1) // 2
            # A heap-shaped binary tree with leaf_count leaves has leaf_count - 1
            # internal nodes, so that every leaf has a parent in node_names
            internal_count = leaf_count - 1
            total_nodes = leaf_count + internal_count
            node_names = [node]
            node_names.extend(("dummy" + str(dummy_counter + i) for i in range(total_nodes - 1)))
            dummy_counter += total_nodes - 1
            # Heap layout: the children of index i sit at 2*i + 1 and 2*i + 2
            for i in range(internal_count):
                G_copy.add_edges_from(((node_names[2 * i + 1], node_names[i]), (node_names[2 * i + 2], node_names[i])))
            # Hand out the original predecessors to the leaves, two per leaf
            for leaf in range(internal_count, internal_count + leaf_count):
                G_copy.add_edge(predecessors_list.pop(), node_names[leaf])
                if not predecessors_list:
                    break
                G_copy.add_edge(predecessors_list.pop(), node_names[leaf])
                if not predecessors_list:
                    break
    return G_copy
From my testing, comparing performance on very dense graphs generated with nx.fast_gnp_random_graph(500, 0.3, directed=True), this is 2.75x faster than the recursive solution, and 1.75x faster than the original posted solution. The bottleneck for further optimizations is networkx and Python, or changing the input graphs to be less dense.

Representing a tree as glued half-edges

I have a tree, given e.g. as a networkx object. In order to input it into a black-box algorithm I was given, I need to save it in the following strange format:
Traverse the tree in a clockwise order. As I pass through one side of an edge, I label it incrementally. Then I want to save for each edge the labels of its two sides.
For example, a star will become a list [(0,1),(2,3),(4,5),...] and a path with 3 vertices will be [(0,3),(1,2)].
I am stumped with implementing this. How can this be done? I can use any library.
I'll answer this without reference to any library.
You would need to perform a depth-first traversal, and log the (global) incremental number before you visit a subtree, and also after you visited it. Those two numbers make up the tuple that you have to prepend to the result you get from the subtree traversal.
Here is an implementation that needs the graph to be represented as an adjacency list. The main function needs to get the root node and the adjacency list
def iter_naturals(): # helper function to produce sequential numbers
    n = 0
    while True:
        yield n
        n += 1
def half_edges(root, adj):
    visited = set()
    sequence = iter_naturals()
    def dfs(node):
        result = []
        visited.add(node)
        for child in adj[node]:
            if child not in visited:
                forward = next(sequence)   # label when entering the edge
                path = dfs(child)
                backward = next(sequence)  # label when coming back along it
                result.extend([(forward, backward)] + path)
        return result
    return dfs(root)
Here is how you can run it for the two examples you mentioned. I have just implemented those graphs as adjacency lists, where nodes are identified by their index in that list:
Example 1: a "star":
The root is the parent of all other nodes
adj = [
[1,2,3], # 1,2,3 are children of 0
[],
[],
[]
]
print(half_edges(0, adj)) # [(0, 1), (2, 3), (4, 5)]
Example 2: a single path with 3 nodes
adj = [
[1], # 1 is a child of 0
[2], # 2 is a child of 1
[]
]
print(half_edges(0, adj)) # [(0, 3), (1, 2)]
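If the tree arrives as an undirected edge list rather than a parent-to-children adjacency list, the adjacency can be built first and the same labelling applied. The following is a minimal sketch under that assumption; the function name `half_edges_from_edge_list` is hypothetical, and the traversal order around each node is simply the order in which its edges appear in the list.

```python
from collections import defaultdict
from itertools import count

def half_edges_from_edge_list(edges, root):
    """Label both sides of every edge during a depth-first walk from root."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    labels = count()  # yields 0, 1, 2, ...
    result = []

    def dfs(node, parent):
        for child in adj[node]:
            if child == parent:
                continue  # do not walk back up the edge we came from
            forward = next(labels)
            slot = len(result)
            result.append(None)  # reserve the position in visiting order
            dfs(child, node)
            result[slot] = (forward, next(labels))

    dfs(root, None)
    return result

print(half_edges_from_edge_list([(0, 1), (0, 2), (0, 3)], 0))  # [(0, 1), (2, 3), (4, 5)]
print(half_edges_from_edge_list([(0, 1), (1, 2)], 0))          # [(0, 3), (1, 2)]
```

Because the walk is recursive, very deep trees may hit Python's recursion limit; an explicit stack would avoid that.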
I found this great built-in function dfs_labeled_edges in networkx. From there it is a breeze.
def get_new_encoding(G):
    dfs = [(v[0],v[1]) for v in nx.dfs_labeled_edges(G, source=1) if v[0]!=v[1] and v[2]!="nontree"]
    dfs_ind = sorted(range(len(dfs)), key=lambda k: dfs[k])
    new_tree_encoding = [(dfs_ind[i],dfs_ind[i+1]) for i in range(0,len(dfs_ind),2)]
    return new_tree_encoding

Find all children of top parent in python

I have a list of parent-child relations where the structure isn't a true tree. Some parents can have many children and also some children can have more than one parent.
import pandas as pd
df = pd.DataFrame([[123,234],[123,235],[123,236],[124,236],[234,345],[236,346]], columns=['Parent','Child'])
I would like to group all children for specific ancestors. From the data:
123,234,235,236,345,346
124,236,346
Should be the correct groups.
I tried with:
parents = set()
children = {}
for p, c in df.to_records(index=False).tolist():
    parents.add(p)
    children[c] = p
def getAncestors(p):
    return (getAncestors(children[p]) if p in children else []) + [p]
But on 346 it only returns one group.
Also, how to then find all children for 123 and 124?
Thank you!
As you said, it's not really a tree, but more like a directed acyclic graph, so you can't map each child to just one parent; it'd have to be a list of parents. Also, given your use case, I'd suggest mapping parents to their lists of children instead.
relations = [[123,234],[234,345],[123,235],[123,236],[124,236],[236,346]]
children = {}
for p, c in relations:
    children.setdefault(p, []).append(c)
roots = set(children) - set(c for cc in children.values() for c in cc)
You can then use a recursive function similar to the one you already have to get all the children to a given root node (or any parent node). The root itself is not in the list, but can easily be added.
def all_children(p):
    if p not in children:
        return set()
    return set(children[p] + [b for a in children[p] for b in all_children(a)])
print({p: all_children(p) for p in roots})
# {123: {234, 235, 236, 345, 346}, 124: {346, 236}}
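For deep or heavily shared hierarchies, the same lookup can be done without recursion, visiting each node only once. This is a minimal sketch; the function name `all_descendants` is hypothetical, and it takes the parent-to-children mapping as an argument rather than relying on a global.

```python
def all_descendants(children, start):
    """Collect every node reachable from start, using an explicit stack."""
    seen = set()
    stack = list(children.get(start, ()))
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # already reached through another parent
        seen.add(node)
        stack.extend(children.get(node, ()))
    return seen

relations = [[123, 234], [234, 345], [123, 235], [123, 236], [124, 236], [236, 346]]
children = {}
for p, c in relations:
    children.setdefault(p, []).append(c)

print(sorted(all_descendants(children, 123)))  # [234, 235, 236, 345, 346]
print(sorted(all_descendants(children, 124)))  # [236, 346]
```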
