I asked this question in parts because I didn't have enough information, but now that I do, I can ask the full question. I have data in a text file with 2 columns. The first column is a predecessor and the second is a successor. I load the data using this code:
[line.split() for line in open('data.txt', encoding ='utf-8')]
Let's say that our data looks like this in the file:
ANALYTICAL_BALANCE BFG_DEPOSIT
CUSTOMER_DETAIL BALANCE
BFG_2056 FFD_15
BALANCE BFG_16
BFG_16 STAT_HIST
ANALYTICAL_BALANCE BFG_2056
CUSTOM_DATA AND_11
AND_11 DICT_DEAL
DICT_DEAL BFG_2056
and after loading
[[ANALYTICAL_BALANCE,BFG_DEPOSIT],
[CUSTOMER_DETAIL,BALANCE],
[BFG_2056, FFD_15],
[BALANCE,BFG_16],
[BFG_16,STAT_HIST],
[ANALYTICAL_BALANCE,BFG_2056],
[CUSTOM_DATA,AND_11],
[AND_11,DICT_DEAL],
[DICT_DEAL,BFG_2056]]
Then I want to connect this data. I create the adjacency list:
def create_adj(edges):
    adj = {}  # or use defaultdict(list) to avoid the `if` checks in the loop below
    for a, b in edges:
        if a not in adj:
            adj[a] = []
        if b not in adj:
            adj[b] = []
        adj[a].append(b)
    return adj
and then get all the paths:
def all_paths(adj):
    def recur(path):
        node = path[-1]
        neighbors = [neighbor for neighbor in adj[node] if neighbor not in path]
        if not neighbors:
            yield path
        for neighbor in neighbors:
            yield from recur(path + [neighbor])

    for node in adj:
        yield from recur([node])
so with this input and these calls, the output looks like this:
data = [
["ANALYTICAL_BALANCE","BFG_DEPOSIT"],
["CUSTOMER_DETAIL","BALANCE"],
["BFG_2056", "FFD_15"],
["BALANCE","BFG_16"],
["BFG_16","STAT_HIST"],
["ANALYTICAL_BALANCE","BFG_2056"],
["CUSTOM_DATA","AND_11"],
["AND_11","DICT_DEAL"],
["DICT_DEAL","BFG_2056"]
]
adj = create_adj(data)
print([path for path in all_paths(adj) if len(path) > 1])
[ANALYTICAL_BALANCE,BFG_DEPOSIT]
[CUSTOMER_DETAIL,BALANCE,BFG_16,STAT_HIST]
[BFG_2056,FFD_15]
[BALANCE,BFG_16,STAT_HIST]
[ANALYTICAL_BALANCE,BFG_2056,FFD_15]
[CUSTOM_DATA,AND_11,DICT_DEAL,BFG_2056,FFD_15]
[AND_11,DICT_DEAL,BFG_2056,FFD_15]
[DICT_DEAL,BFG_2056,FFD_15]
We can visualize the connections as separate trees, which together form a forest. The trees won't have any cycles because of the nature of the input data.
Now my question is: how can I get every connection from each leaf to every node, for every tree? Here is what I mean by that. We have 3 trees, so I will start from the top one.
Tree1:
ANALYTICAL_BALANCE BFG_DEPOSIT
Tree2:
ANALYTICAL_BALANCE BFG_2056
ANALYTICAL_BALANCE FFD_15
CUSTOM_DATA AND_11
CUSTOM_DATA DICT_DEAL
CUSTOM_DATA BFG_2056
CUSTOM_DATA FFD_15
Tree3:
CUSTOMER_DETAIL BALANCE
CUSTOMER_DETAIL BFG_16
CUSTOMER_DETAIL STAT_HIST
As you can see, my first try was to create the adjacency list and find all paths. Then I would delete the connections between the nodes that are not leaves and filter the data. That was fine for an input of 150 rows, but when I fed in the full file with 13k rows, the code ran for 2 days without any sign of coming to an end. So I'm looking for the most efficient code or algorithm, as well as the best data type for the job (lists, DataFrames, etc.). Any help would be greatly appreciated, because I've been fighting with this for a few days now and can't find any idea on how to solve the problem. If something is not clear I will edit the post.
The data will be saved into an Excel file with openpyxl, so when I filter by successor I can see every leaf in the predecessor column that is connected to that successor.
Here is my whole code.
import itertools
from openpyxl import Workbook

# create adjacencies
def create_adj(edges):
    adj = {}
    for a, b in edges:
        if a not in adj:
            adj[a] = []
        if b not in adj:
            adj[b] = []
        adj[a].append(b)
    return adj
# find all paths
def all_paths(adj):
    def recur(path):
        node = path[-1]
        neighbors = [neighbor for neighbor in adj[node] if neighbor not in path]
        if not neighbors:
            yield path
        for neighbor in neighbors:
            yield from recur(path + [neighbor])

    for node in adj:
        yield from recur([node])
# delete the connections from list
def conn_deletion(list, list_deletion):
    after_del = [x for x in list if x[0] not in list_deletion]
    return after_del
# get paths where len of path is > 2 and save them as a leaf to node. Also save connections to deletion.
def unpack_paths(my_list):
    list_of_more_succ = []
    to_deletion = []
    for item in my_list:
        if len(item) == 1:
            print("len 1", item)
            to_deletion.append(item[0])
        elif len(item) > 2:
            for i in range(1, len(item) - 1):
                to_deletion.append(item[i])
                print("len > 2", item[i])
                if [item[0], item[i]] in list_of_more_succ:
                    pass
                else:
                    list_of_more_succ.append([item[0], item[i]])
    list_concat = my_list + list_of_more_succ
    sorted_list = list(k for k, _ in itertools.groupby(list_concat))
    final = conn_deletion(sorted_list, list(dict.fromkeys(to_deletion)))
    return final
data = [line.split() for line in open('data.txt', encoding='utf-8')]
adj = create_adj(data)
print(adj)
workbook = Workbook()
sheet = workbook.active
sheet["A1"] = "Source"
sheet["B1"] = "Child"
loaded = list(all_paths(adj))
final_edited = unpack_paths(loaded)
# Save data to excel file. We don't want paths with len == 1 or > 2.
for row, item in enumerate(final_edited, start=2):
    if len(item) > 2:
        pass
    elif len(item) == 1:
        pass
    else:
        sheet[f"A{row}"] = item[0]
        sheet[f"B{row}"] = item[1]
workbook.save("DataMap.xlsx")
I would suggest changing all_paths to leaf_paths, meaning that it would only yield those paths that start at a leaf.
Then use those paths to:
Identify the root it leads to (the last element in the path)
Identify the leaf (the first element in the path)
Iterate all non-leaves in that path and combine each of them in a pair with the leaf.
Store these pairs in a dictionary that is keyed by the root
Here is how you would alter all_paths at two places marked with a comment:
def leaf_paths(adj):
    def recur(path):
        node = path[-1]
        neighbors = [neighbor for neighbor in adj[node] if neighbor not in path]
        if not neighbors:
            yield path
        for neighbor in neighbors:
            yield from recur(path + [neighbor])

    # identify the internal nodes (not leaves)
    internals = set(parent for parents in adj.values() for parent in parents)
    for node in adj:
        if node not in internals:  # require that the starting node is a leaf
            yield from recur([node])
Then add this function:
def all_leaf_pairs(paths):
    trees = {}
    for path in paths:
        if len(path) > 1:
            root = path[-1]
            if root not in trees:
                trees[root] = []
            it = iter(path)
            leaf = next(it)
            trees[root].extend((leaf, node) for node in it)
    return trees
And your main program would do:
data = [
["ANALYTICAL_BALANCE","BFG_DEPOSIT"],
["CUSTOMER_DETAIL","BALANCE"],
["BFG_2056", "FFD_15"],
["BALANCE","BFG_16"],
["BFG_16","STAT_HIST"],
["ANALYTICAL_BALANCE","BFG_2056"],
["CUSTOM_DATA","AND_11"],
["AND_11","DICT_DEAL"],
["DICT_DEAL","BFG_2056"]
]
adj = create_adj(data)
paths = leaf_paths(adj)
import pprint
pprint.pprint(all_leaf_pairs(paths))
This will output:
{'BFG_DEPOSIT': [('ANALYTICAL_BALANCE', 'BFG_DEPOSIT')],
'FFD_15': [('ANALYTICAL_BALANCE', 'BFG_2056'),
('ANALYTICAL_BALANCE', 'FFD_15'),
('CUSTOM_DATA', 'AND_11'),
('CUSTOM_DATA', 'DICT_DEAL'),
('CUSTOM_DATA', 'BFG_2056'),
('CUSTOM_DATA', 'FFD_15')],
'STAT_HIST': [('CUSTOMER_DETAIL', 'BALANCE'),
('CUSTOMER_DETAIL', 'BFG_16'),
('CUSTOMER_DETAIL', 'STAT_HIST')]}
Explanation of leaf_paths
This function uses recursion. It defines recur in its scope.
The main code starts by identifying the internal nodes (i.e. the nodes that have at least one child). Since adj provides the parent(s) for a given node, we just have to collect all those parents.
We use this set of internal nodes to make sure we start the recursion only on leaf nodes, as in the output we want to have paths that always start out with a leaf node.
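For instance, with the sample data above, the internal nodes are exactly the values that appear in adj, so the remaining keys are the leaves the recursion starts from:
internals = set(parent for parents in adj.values() for parent in parents)
leaves = [node for node in adj if node not in internals]
print(leaves)  # ['ANALYTICAL_BALANCE', 'CUSTOMER_DETAIL', 'CUSTOM_DATA']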
The recur function will walk from the given leaf towards any root it can find. It extends the path with the next parent it can find (neighbor) and performs recursion until there is no more parent (i.e., it is a root). When that happens the accumulated path (that starts with a leaf and ends with a root) is yielded.
leaf_paths itself yields any path that recur yields.
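Since you ultimately want these pairs in Excel, here is a minimal sketch that reuses the Source/Child layout from your own code and writes the (leaf, node) pairs produced by all_leaf_pairs; treat the exact column mapping as an assumption based on your description:
from openpyxl import Workbook

workbook = Workbook()
sheet = workbook.active
sheet["A1"] = "Source"
sheet["B1"] = "Child"

row = 2
for root, pairs in all_leaf_pairs(leaf_paths(adj)).items():
    for leaf, node in pairs:
        sheet[f"A{row}"] = leaf   # predecessor (leaf)
        sheet[f"B{row}"] = node   # successor you will filter on
        row += 1

workbook.save("DataMap.xlsx")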
Related
Based on a DFS traversal, I want to classify edges (u, v) in a directed graph as:
Tree edge: when v is visited for the first time as we traverse the edge
Back edge: when v is an ancestor of u in the traversal tree
Forward edge: when v is a descendant of u in the traversal tree
Cross edge: when v is neither an ancestor nor a descendant of u in the traversal tree
I was following a GeeksForGeeks tutorial to write this code:
class Graph:
    def __init__(self, v):
        self.time = 0
        self.traversal_array = []
        self.v = v
        self.graph_list = [[] for _ in range(v)]

    def dfs(self):
        self.visited = [False]*self.v
        self.start_time = [0]*self.v
        self.end_time = [0]*self.v
        self.ff = 0
        self.fc = 0
        for node in range(self.v):
            if not self.visited[node]:
                self.traverse_dfs(node)

    def traverse_dfs(self, node):
        # mark the node visited
        self.visited[node] = True
        # add the node to traversal
        self.traversal_array.append(node)
        # get the starting time
        self.start_time[node] = self.time
        # increment the time by 1
        self.time += 1
        # traverse through the neighbours
        for neighbour in self.graph_list[node]:
            # if a node is not visited
            if not self.visited[neighbour]:
                # marks the edge as tree edge
                print('Tree Edge:', str(node)+'-->'+str(neighbour))
                # dfs from that node
                self.traverse_dfs(neighbour)
            else:
                # when the parent node is traversed after the neighbour node
                if self.start_time[node] > self.start_time[neighbour] and self.end_time[node] < self.end_time[neighbour]:
                    print('Back Edge:', str(node)+'-->'+str(neighbour))
                # when the neighbour node is a descendant but not a part of tree
                elif self.start_time[node] < self.start_time[neighbour] and self.end_time[node] > self.end_time[neighbour]:
                    print('Forward Edge:', str(node)+'-->'+str(neighbour))
                # when parent and neighbour node do not have any ancestor and a descendant relationship between them
                elif self.start_time[node] > self.start_time[neighbour] and self.end_time[node] > self.end_time[neighbour]:
                    print('Cross Edge:', str(node)+'-->'+str(neighbour))
            self.end_time[node] = self.time
            self.time += 1
But it does not output the desired results for the following graph, which is represented with:
self.v = 3
self.graph_list = [[1, 2], [], [1]]
The above code is not identifying the edge (2, 1) as a cross edge, but as a back edge.
I have no clue what to adapt in my code in order to detect cross edges correctly.
In a discussion someone gave this information, but I couldn't make it work:
The checking condition is wrong when the node has not been completely visited at the moment the edge is classified. This is because in the initial state the start and end times are set to 0.
if the graph looks like this:
0 --> 1
1 --> 2
2 --> 3
3 --> 1
When checking the 3 --> 1 edge: the answer should be a back edge.
But now start/end[3] = 4/0 and start/end[1] = 1/0,
and the condition end[3] < end[1] is false because of the initialization problem.
I see two solutions:
traverse the graph first and determine the correct start/end[i] values, but that needs more time, or
use white/gray/black colors and the discovery order to classify the edges
Here are some issues:
By initialising start_time and end_time to 0 for each node, you cannot distinguish that default from a real time of 0, which is assigned to the very first node's start time. You should initialise these lists with a value that indicates there was no start/end at all. You could use the value -1 for this purpose.
The following statements should not be inside the loop:
self.end_time[node] = self.time
self.time += 1
They should be executed after the loop has completed. Only at that point can you "end" the visit of the current node, so these two statements should be indented one level less.
There are several places where the value of self.end_time[node] is compared in a condition, but that time has not been set yet (apart from its default value), so this condition makes little sense.
The last elif should really be an else because there are no other possibilities to cover. If ever the execution gets there, it means no other possibility remains, so no condition should be checked.
The condition self.start_time[node] > self.start_time[neighbour] is not strong enough for identifying a back edge, and as already said, the second part of that if condition makes no sense, since self.end_time[node] has not been given a non-default value yet. And so this if block is entered also when it is not a back edge. What you really want to test here is that the visit of neighbour has not been closed yet. In other words, you should check that self.end_time[neighbour] is still at its default value (and I propose to use -1 for that).
Not a problem, but there are also these remarks to make:
when you keep track of start_time and end_time, there is no need to have visited. Whether a node is visited follows from the value of start_time: if it still has its default value (-1), then the node has not yet been visited.
Don't use code comments to state the obvious. For instance the comment "increment the time by 1" really isn't explaining anything that cannot be seen directly from the code.
Attribute v could use a better name. Although V is often used to denote the set of nodes of a graph, it is not intuitive to see v as the number of nodes in the graph. I would suggest using num_nodes instead. It makes the code more readable.
Here is a correction of your code:
class Graph:
    def __init__(self, num_nodes):
        self.time = 0
        self.traversal_array = []
        self.num_nodes = num_nodes  # use a more descriptive name
        self.graph_list = [[] for _ in range(num_nodes)]

    def dfs(self):
        self.start_time = [-1]*self.num_nodes
        self.end_time = [-1]*self.num_nodes
        for node in range(self.num_nodes):
            if self.start_time[node] == -1:  # no need for self.visited
                self.traverse_dfs(node)

    def traverse_dfs(self, node):
        self.traversal_array.append(node)
        self.start_time[node] = self.time
        self.time += 1
        for neighbour in self.graph_list[node]:
            # when the neighbour was not yet visited
            if self.start_time[neighbour] == -1:
                print(f"Tree Edge: {node}-->{neighbour}")
                self.traverse_dfs(neighbour)
            # otherwise, when the neighbour's visit is still ongoing:
            elif self.end_time[neighbour] == -1:
                print(f"Back Edge: {node}-->{neighbour}")
            # otherwise, when the neighbour's visit started after the current node's visit (it is a descendant):
            elif self.start_time[node] < self.start_time[neighbour]:
                print(f"Forward Edge: {node}-->{neighbour}")
            else:  # no condition here: there are no other options
                print(f"Cross Edge: {node}-->{neighbour}")
        # indentation corrected: close the visit only after all neighbours are processed
        self.end_time[node] = self.time
        self.time += 1
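As a quick check, the graph from the question (where the edge (2, 1) should be a cross edge) can be fed in directly, since the class stores its adjacency lists in graph_list and has no add_edge helper:
g = Graph(3)
g.graph_list = [[1, 2], [], [1]]
g.dfs()
# Tree Edge: 0-->1
# Tree Edge: 0-->2
# Cross Edge: 2-->1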
Is there a way to write a Cypher query which returns the path with the highest sum of a (selected) relationship property among all of the existing paths from a given node to its leaf children?
Hello,
First, here is how I created the graph:
CREATE CONSTRAINT ON (j:JOB) ASSERT j.order_id IS UNIQUE
USING PERIODIC COMMIT 1000
//EXPLAIN
LOAD CSV WITH HEADERS FROM "file:///jobs.csv" AS row
MERGE (j:JOB {order_id: row.child_order_id})
SET j.job_name = row.child_job_name,
j.job_owner = row.child_job_owner,
j.group_name = row.child_group_name,
j.order_time = row.child_order_time,
j.start_time = row.child_start_time,
j.end_time = row.child_end_time;
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///child_father.csv" AS row
MATCH (c:JOB {order_id: row.child_order_id})
MATCH (f:JOB {order_id: row.father_order_id})
MERGE (c)-[d:DEPENDS_ON]->(f)
SET d.elapsed_min = row.elapsed_min;
Now, my goal is to return the path with the highest sum of the relationship property 'elapsed_min' from a given order id to all of the leaf nodes it depends on.
Since I couldn't find a way to do this in Cypher, I tried Python with the py2neo library.
At first I tried using a normal Dijkstra algorithm to return the lightest path; once that worked, I would alter the algorithm to return the heaviest path.
So I made this:
import py2neo
from py2neo import Graph
from py2neo import Node, Relationship
NEO4J_URI = "bolt://127.0.0.1:7687"
NEO4J_USER = "neo4j"
NEO4J_PASSWORD = "neo4j"
graph = Graph(NEO4J_URI, auth = (NEO4J_USER, NEO4J_PASSWORD), bolt = True)
def dijkstra(graph, start, goal):
    shortest_distance = {}
    predecessor = {}
    unseenNodes = graph
    infinity = 999999
    path = []
    for node in unseenNodes:
        shortest_distance[node] = infinity
    shortest_distance[start] = 0

    while unseenNodes:
        minNode = None
        for node in unseenNodes:
            if minNode is None:
                minNode = node
            elif shortest_distance[node] < shortest_distance[minNode]:
                minNode = node
        for childNode, weight in graph[minNode].items():
            if weight + shortest_distance[minNode] < shortest_distance[childNode]:
                shortest_distance[childNode] = weight + shortest_distance[minNode]
                predecessor[childNode] = minNode
        unseenNodes.pop(minNode)

    # get the path
    currentNode = goal
    while currentNode != start:
        try:
            path.insert(0, currentNode)
            currentNode = predecessor[currentNode]
        except KeyError:
            print("Path not reachable")
            break
    if shortest_distance[goal] != infinity:
        print('Shortest distance is: ' + str(shortest_distance[goal]))
        print('And the path is: ' + str(path))
Now I need to find a way to return the graph in this dictionary format so I can run the Dijkstra algorithm on it, like this:
testGraph = {'a':{'b':10,'c':3},'b':{'c':1,'d':2},'c':{'b':4,'d':8,'e':2},'d':{'e':7},'e':{'d':9}}
# the relation property means the distance from node a to b is 10, a to c is 3, b to c is 1, and so on...
dijkstra(testGraph, 'a', 'd')
#the output is: Shortest distance is: 9
# And the path is: ['c', 'b', 'd']
But I am not sure how to return the right path and which format will fit best.
This is what I've got, and I can't send this to my algorithm:
testGraph = graph.run( "MATCH (c:JOB)-[d:DEPENDS_ON*]->(f:JOB) "
"WHERE c.order_id = '4p0ta' "
"RETURN * "
"LIMIT 50").to_table()#data() #to_subgraph #to_data_frame()
This Cypher query should return the path (ending at a leaf node) with the highest sum:
MATCH p=(c:JOB)-[:DEPENDS_ON*]->(f:JOB)
WHERE c.order_id = '4p0ta' AND NOT (f)-[:DEPENDS_ON]->()
RETURN p, REDUCE(s = 0, d IN RELATIONSHIPS(p) | s + d.elapsed_min) AS total
ORDER BY total DESC
LIMIT 1
Note that variable-length relationships can be very expensive (i.e., take a very long time or even run out of memory) if your paths are long and/or your nodes have a lot of relationships. You may need to set an upper bound on the length (for example, [:DEPENDS_ON*..10]) to be able to use this query.
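If you want to run the query from Python via py2neo (as in your own snippet), a sketch could look like the following; it is untested against your data, $order_id is passed as a query parameter, and toFloat() is an added assumption because LOAD CSV stores elapsed_min as a string, so it has to be converted before summing:
query = """
MATCH p=(c:JOB)-[:DEPENDS_ON*]->(f:JOB)
WHERE c.order_id = $order_id AND NOT (f)-[:DEPENDS_ON]->()
RETURN p, REDUCE(s = 0.0, d IN RELATIONSHIPS(p) | s + toFloat(d.elapsed_min)) AS total
ORDER BY total DESC
LIMIT 1
"""
record = graph.run(query, order_id="4p0ta").data()[0]
print(record["total"])
print([job["order_id"] for job in record["p"].nodes])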
I would like to modify the networkx implementation of Johnson's algorithm for finding all elementary cycles in a graph (also copied below) so that it does not search for cycles larger than some maximum length.
def simple_cycles(G):

    def _unblock(thisnode, blocked, B):
        stack = set([thisnode])
        while stack:
            node = stack.pop()
            if node in blocked:
                blocked.remove(node)
                stack.update(B[node])
                B[node].clear()

    # Johnson's algorithm requires some ordering of the nodes.
    # We assign the arbitrary ordering given by the strongly connected comps
    # There is no need to track the ordering as each node removed as processed.
    subG = type(G)(G.edges_iter())  # save the actual graph so we can mutate it here
    # We only take the edges because we do not want to
    # copy edge and node attributes here.
    sccs = list(nx.strongly_connected_components(subG))
    while sccs:
        scc = sccs.pop()
        # order of scc determines ordering of nodes
        startnode = scc.pop()
        # Processing node runs "circuit" routine from recursive version
        path = [startnode]
        blocked = set()  # vertex: blocked from search?
        closed = set()   # nodes involved in a cycle
        blocked.add(startnode)
        B = defaultdict(set)  # graph portions that yield no elementary circuit
        stack = [(startnode, list(subG[startnode]))]  # subG gives component nbrs
        while stack:
            thisnode, nbrs = stack[-1]
            if nbrs:
                nextnode = nbrs.pop()
                # print thisnode,nbrs,":",nextnode,blocked,B,path,stack,startnode
                # f=raw_input("pause")
                if nextnode == startnode:
                    yield path[:]
                    closed.update(path)
                    # print "Found a cycle",path,closed
                elif nextnode not in blocked:
                    path.append(nextnode)
                    stack.append((nextnode, list(subG[nextnode])))
                    closed.discard(nextnode)
                    blocked.add(nextnode)
                    continue
            # done with nextnode... look for more neighbors
            if not nbrs:  # no more nbrs
                if thisnode in closed:
                    _unblock(thisnode, blocked, B)
                else:
                    for nbr in subG[thisnode]:
                        if thisnode not in B[nbr]:
                            B[nbr].add(thisnode)
                stack.pop()
                assert path[-1] == thisnode
                path.pop()
        # done processing this node
        subG.remove_node(startnode)
        H = subG.subgraph(scc)  # make smaller to avoid work in SCC routine
        sccs.extend(list(nx.strongly_connected_components(H)))
Of course, I'd also accept a suggestion that differs from the implementation above but runs in similar time. Also, my project uses networkx, so feel free to use any other function from that library, such as shortest_path.
(Note: not homework!)
Edit
Dorijan Cirkveni suggested (if I understood correctly):
if len(blocked) >= limit + 1:
    continue
elif nextnode == startnode:
    yield path[:]
However, that doesn't work. Here's a counterexample:
G = nx.DiGraph()
G.add_edge(1, 2)
G.add_edge(2, 3)
G.add_edge(3, 1)
G.add_edge(3, 2)
G.add_edge(3, 4)
my_cycles = list(simple_cycles(G, limit = 3)) # Modification
nx_cycles = list(nx.simple_cycles(G)) # Original networkx code
print("MY:", my_cycles)
print("NX:", nx_cycles)
Will output
MY: [[2, 3]]
NX: [[1, 2, 3], [2, 3]]
Also, if we substitute blocked by stack or path, the result will be correct for this example, but will give the wrong answer for other graphs.
This is a highly modified version of this code, but at least it is working.
def simple_cycles(G, limit):
    subG = type(G)(G.edges())
    sccs = list(nx.strongly_connected_components(subG))
    while sccs:
        scc = sccs.pop()
        startnode = scc.pop()
        path = [startnode]
        blocked = set()
        blocked.add(startnode)
        stack = [(startnode, list(subG[startnode]))]

        while stack:
            thisnode, nbrs = stack[-1]

            if nbrs and len(path) < limit:
                nextnode = nbrs.pop()
                if nextnode == startnode:
                    yield path[:]
                elif nextnode not in blocked:
                    path.append(nextnode)
                    stack.append((nextnode, list(subG[nextnode])))
                    blocked.add(nextnode)
                    continue

            if not nbrs or len(path) >= limit:
                blocked.remove(thisnode)
                stack.pop()
                path.pop()

        subG.remove_node(startnode)
        H = subG.subgraph(scc)
        sccs.extend(list(nx.strongly_connected_components(H)))
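As a quick sanity check on the counterexample from the question: the len(path) < limit test above appears to act as an exclusive bound (a cycle with exactly limit nodes is not reported), so limit=4 is used here to recover both cycles; the inner sorted() only normalises the rotation each cycle is reported in:
import networkx as nx

G = nx.DiGraph()
G.add_edges_from([(1, 2), (2, 3), (3, 1), (3, 2), (3, 4)])

print(sorted(sorted(c) for c in simple_cycles(G, limit=4)))  # [[1, 2, 3], [2, 3]]
print(sorted(sorted(c) for c in nx.simple_cycles(G)))        # [[1, 2, 3], [2, 3]]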
You only need to change two things:
The definition line (obviously)
def simple_cycles(G,limit):
Add an overriding condition somewhere in the next node processor (example below:)
...
if len(blocked) >= limit + 1:
    pass
elif nextnode == startnode:
    yield path[:]
...
Bonus: using == instead of >= means that passing a negative value makes the function run as if there were no limit at all, rather than returning no cycles.
Graph
I am trying to perform BFS on this graph starting from node 16, but my code is giving erroneous output. Can you please help me out? Thanks.
visited_nodes = set()
queue = [16]
pardaught = dict()
exclu = list()
path = set()

for node in queue:
    path.add(node)
    neighbors = G.neighbors(node)
    visited_nodes.add(node)
    queue.remove(node)
    queue.extend([n for n in neighbors if n not in visited_nodes])

newG = G.subgraph(path)
nx.draw(newG, with_labels=True)
My output is:
Output
The cause of your problem is that you are removing things from (the start of) queue while looping through it. As the loop steps ahead, the element is removed from the start, so the list effectively "steps" one position in the opposite direction; the net result is that the loop appears to jump 2 elements at a time. Here's an example:
integer_list = [1,2,3]
next_int = 4
for integer in integer_list:
    print integer
    integer_list.remove(integer)
    integer_list.append(next_int)
    next_int += 1
This produces the output:
1
3
5
Also, path should be a list, not a set, since a set has no order.
This should work:
visited_nodes = set()
path = []
queue = [16]
while queue:
    node = queue.pop(0)
    if node in visited_nodes:  # a node may be queued more than once before it is visited
        continue
    visited_nodes.add(node)
    path.append(node)
    for neighbor in G.neighbors(node):
        if neighbor in visited_nodes:
            continue
        queue.append(neighbor)
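A quick check on a small stand-in graph (the graph from the question is only available as an image, so this one is purely hypothetical):
import networkx as nx

G = nx.Graph([(16, 14), (16, 15), (14, 13), (15, 13), (13, 12)])  # hypothetical example graph

visited_nodes = set()
path = []
queue = [16]
while queue:
    node = queue.pop(0)
    if node in visited_nodes:
        continue
    visited_nodes.add(node)
    path.append(node)
    for neighbor in G.neighbors(node):
        if neighbor in visited_nodes:
            continue
        queue.append(neighbor)

print(path)  # e.g. [16, 14, 15, 13, 12], one valid breadth-first order from 16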
I'm using recursion to find the path from some point A to some point D.
I'm traversing a graph to find the pathways.
Let's say:
Graph = {'A':['route1','route2'],'B':['route1','route2','route3','route4'], 'C':['route3','route4'], 'D':['route4'] }
Accessible through:
A -> route1, route2
B -> route2, route3, route4
C -> route3, route4
There are two solutions in this path from A -> D:
route1 -> route2 -> route4
route1 -> route2 -> route3 -> route4
Since point A and point B both have route1 and route2, there is an infinite loop, so I add a check whenever I visit a node (0 or 1 values).
However, with the check I only get one solution back: route1 -> route2 -> route4, and not the other possible solution.
Here is the actual code; routes will be substituted by reactions.
def find_all_paths(graph, start, end, addReaction, passed={}, reaction=[], path=[]):
    passOver = passed
    path = path + [start]
    reaction = reaction + [addReaction]
    if start == end:
        return [reaction]
    if not graph.has_key(start):
        return []
    paths = []
    reactions = []
    for x in range(len(graph[start])):
        for y in range(len(graph)):
            for z in range(len(graph.values()[y])):
                if (graph[start][x] == graph.values()[y][z]):
                    if passOver.values()[y][z] < 161:
                        passOver.values()[y][z] = passOver.values()[y][z] + 1
                        if (graph.keys()[y] not in path):
                            newpaths = find_all_paths(graph, (graph.keys()[y]), end, graph.values()[y][z], passOver, reaction, path)
                            for newpath in newpaths:
                                reactions.append(newpath)
    return reactions
Here is the method call; dic_passOver is a dictionary keeping track of whether the nodes are visited:
solution = find_all_paths(graph, 'M_glc_DASH_D_c', 'M_pyr_c', 'begin', dic_passOver)
My problem seems to be that once a route is visited, it can no longer be accessed, so other possible solutions are not found. I accounted for this by allowing a maximum of 161 visits, at which point all the possible routes are found for my specific data.
if passOver.values()[y][z] < 161:
    passOver.values()[y][z] = passOver.values()[y][z] + 1
However, this seems highly inefficient, and most of my data will be graphs with indices in the thousands. In addition, I won't know the number of allowed node visits needed to find all routes; the number 161 was figured out manually.
Well, I can't understand your representation of the graph. But this is a generic algorithm you can use for finding all paths which avoids infinite loops.
First you need to represent your graph as a dictionary which maps nodes to a set of nodes they are connected to. Example:
graph = {'A':{'B','C'}, 'B':{'D'}, 'C':{'D'}}
That means that from A you can go to B and C. From B you can go to D and from C you can go to D. We're assuming the links are one-way. If you want them to be two way just add links for going both ways.
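For example, one way to make the links two-way is to add the reverse edges as well (a small sketch):
graph = {'A': {'B', 'C'}, 'B': {'D'}, 'C': {'D'}}

two_way = {node: set(links) for node, links in graph.items()}
for node, links in graph.items():
    for other in links:
        two_way.setdefault(other, set()).add(node)

# two_way == {'A': {'B', 'C'}, 'B': {'A', 'D'}, 'C': {'A', 'D'}, 'D': {'B', 'C'}}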
If you represent your graph in that way, you can use the below function to find all paths:
def find_all_paths(start, end, graph, visited=None):
    if visited is None:
        visited = set()
    visited |= {start}
    for node in graph[start]:
        if node in visited:
            continue
        if node == end:
            yield [start, end]
        else:
            for path in find_all_paths(node, end, graph, visited):
                yield [start] + path
Example usage:
>>> graph = {'A':{'B','C'}, 'B':{'D'}, 'C':{'D'}}
>>> for path in find_all_paths('A','D', graph):
...     print path
...
['A', 'C', 'D']
['A', 'B', 'D']
>>>
Edit to take into account comments clarifying graph representation
Below is a function to transform your graph representation (assuming I understood it correctly and that routes are bi-directional) into the one used in the algorithm above:
def change_graph_representation(graph):
    reverse_graph = {}
    for node, links in graph.items():
        for link in links:
            if link not in reverse_graph:
                reverse_graph[link] = set()
            reverse_graph[link].add(node)

    result = {}
    for node, links in graph.items():
        adj = set()
        for link in links:
            adj |= reverse_graph[link]
        adj -= {node}
        result[node] = adj
    return result
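For example, with the Graph from your question this produces a node-to-node adjacency that the search above can consume; the two printed paths correspond to your two expected solutions, expressed as nodes rather than routes (order may vary):
Graph = {'A': ['route1', 'route2'],
         'B': ['route1', 'route2', 'route3', 'route4'],
         'C': ['route3', 'route4'],
         'D': ['route4']}

graph = change_graph_representation(Graph)
# graph == {'A': {'B'}, 'B': {'A', 'C', 'D'}, 'C': {'B', 'D'}, 'D': {'B', 'C'}}

for path in find_all_paths('A', 'D', graph):
    print(path)
# ['A', 'B', 'D']
# ['A', 'B', 'C', 'D']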
If it is important that you find the path in terms of the links rather than the nodes traversed, you can preserve this information like so:
def change_graph_representation(graph):
    reverse_graph = {}
    for node, links in graph.items():
        for link in links:
            if link not in reverse_graph:
                reverse_graph[link] = set()
            reverse_graph[link].add(node)

    result = {}
    for node, links in graph.items():
        adj = {}
        for link in links:
            for n in reverse_graph[link]:
                adj[n] = link
        del adj[node]
        result[node] = adj
    return result
And use this modified search:
def find_all_paths(start, end, graph, visited=None):
    if visited is None:
        visited = set()
    visited |= {start}
    for node, link in graph[start].items():
        if node in visited:
            continue
        if node == end:
            yield [link]
        else:
            for path in find_all_paths(node, end, graph, visited):
                yield [link] + path
That will give you paths in terms of links to follow instead of nodes to traverse. Hope this helps :)