BFS in the nodes of a graph - python

I am trying to perform BFS on this graph starting from node 16, but my code is giving erroneous output. Can you please help me out? Thanks.
visited_nodes = set()
queue = [16]
pardaught = dict()
exclu = list()
path = set()
for node in queue:
    path.add(node)
    neighbors = G.neighbors(node)
    visited_nodes.add(node)
    queue.remove(node)
    queue.extend([n for n in neighbors if n not in visited_nodes])
newG = G.subgraph(path)
nx.draw(newG, with_labels=True)
My output is: [output image]

The cause of your problem is that you are removing things from (the start of) queue while looping through it. As the loop steps ahead, the removal from the start shifts the list one position in the opposite direction, so the net result is that the loop appears to jump 2 at a time. Here's an example:
integer_list = [1, 2, 3]
next_int = 4
for integer in integer_list:
    print(integer)
    integer_list.remove(integer)
    integer_list.append(next_int)
    next_int += 1
Produces output
1
3
5

path should be a list, not a set, since a set has no order.
This should work:
visited_nodes = set()
path = []
queue = [16]
while queue:
    node = queue.pop(0)
    if node in visited_nodes:  # a node can be queued twice via different neighbors
        continue
    visited_nodes.add(node)
    path.append(node)
    for neighbor in G.neighbors(node):
        if neighbor in visited_nodes:
            continue
        queue.append(neighbor)
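As a side note, list.pop(0) is O(n), while collections.deque pops from the left in O(1), which matters on large graphs. Here is the same BFS written over a plain adjacency dict (using a dict instead of a networkx graph is just an assumption to keep the sketch self-contained):

```python
from collections import deque

def bfs_order(adj, start):
    """Return the nodes reachable from `start` in BFS order.

    `adj` maps each node to an iterable of its neighbours (a stand-in
    for G.neighbors in the networkx code above).
    """
    visited = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()          # O(1), unlike list.pop(0)
        order.append(node)
        for neighbor in adj[node]:
            if neighbor not in visited:
                visited.add(neighbor)   # mark on enqueue so nothing is queued twice
                queue.append(neighbor)
    return order
```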

Related

Find all paths from leaves to each node in a forest

I asked this question in parts because I didn't have enough information, but now that I do, I can ask the full question. I have data in a text file with 2 columns: the first column is a predecessor and the second is a successor. I load the data using this code:
[line.split() for line in open('data.txt', encoding='utf-8')]
Let's say that our data looks like this in the file:
ANALYTICAL_BALANCE BFG_DEPOSIT
CUSTOMER_DETAIL BALANCE
BFG_2056 FFD_15
BALANCE BFG_16
BFG_16 STAT_HIST
ANALYTICAL_BALANCE BFG_2056
CUSTOM_DATA AND_11
AND_11 DICT_DEAL
DICT_DEAL BFG_2056
and after loading
[[ANALYTICAL_BALANCE,BFG_DEPOSIT],
[CUSTOMER_DETAIL,BALANCE],
[BFG_2056, FFD_15],
[BALANCE,BFG_16],
[BFG_16,STAT_HIST],
[ANALYTICAL_BALANCE,BFG_2056],
[CUSTOM_DATA,AND_11],
[AND_11,DICT_DEAL],
[DICT_DEAL,BFG_2056]]
Then I want to connect this data. I create the adjacency list:
def create_adj(edges):
    adj = {}  # or use defaultdict(list) to avoid the `if` checks in the loop below
    for a, b in edges:
        if a not in adj:
            adj[a] = []
        if b not in adj:
            adj[b] = []
        adj[a].append(b)
    return adj
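(The defaultdict(list) variant mentioned in the comment would look like this; the bare adj[b] access is there only so that nodes with no successors still show up as keys:)

```python
from collections import defaultdict

def create_adj(edges):
    adj = defaultdict(list)  # missing keys start out as []
    for a, b in edges:
        adj[a].append(b)
        adj[b]               # touch b so nodes with no successors get a key too
    return dict(adj)
```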
and then get all the paths:
def all_paths(adj):
    def recur(path):
        node = path[-1]
        neighbors = [neighbor for neighbor in adj[node] if neighbor not in path]
        if not neighbors:
            yield path
        for neighbor in neighbors:
            yield from recur(path + [neighbor])
    for node in adj:
        yield from recur([node])
so the output looks like this:
data = [
    ["ANALYTICAL_BALANCE","BFG_DEPOSIT"],
    ["CUSTOMER_DETAIL","BALANCE"],
    ["BFG_2056", "FFD_15"],
    ["BALANCE","BFG_16"],
    ["BFG_16","STAT_HIST"],
    ["ANALYTICAL_BALANCE","BFG_2056"],
    ["CUSTOM_DATA","AND_11"],
    ["AND_11","DICT_DEAL"],
    ["DICT_DEAL","BFG_2056"]
]
adj = create_adj(data)
print([path for path in all_paths(adj) if len(path) > 1])
[ANALYTICAL_BALANCE,BFG_DEPOSIT]
[CUSTOMER_DETAIL,BALANCE,BFG_16,STAT_HIST]
[BFG_2056,FFD_15]
[BALANCE,BFG_16,STAT_HIST]
[ANALYTICAL_BALANCE,BFG_2056,FFD_15]
[CUSTOM_DATA,AND_11,DICT_DEAL,BFG_2056,FFD_15]
[AND_11,DICT_DEAL,BFG_2056,FFD_15]
[DICT_DEAL,BFG_2056,FFD_15]
We can visualize the connections as separate trees, which together form a forest. The trees won't have any cycles, because of the nature of the input data.
Now my question is: how can I get every connection from a leaf to every node, for every tree? What I mean by that: we have 3 trees, so I will start from the top one.
Tree1:
ANALYTICAL_BALANCE BFG_DEPOSIT
Tree2:
ANALYTICAL_BALANCE BFG_2056
ANALYTICAL_BALANCE FFD_15
CUSTOM_DATA AND_11
CUSTOM_DATA DICT_DEAL
CUSTOM_DATA BFG_2056
CUSTOM_DATA FFD_15
Tree3:
CUSTOMER_DETAIL BALANCE
CUSTOMER_DETAIL BFG_16
CUSTOMER_DETAIL STAT_HIST
As you can see, my first try was to create the list of adjacencies and find all paths. Then I would delete the connections between the nodes that are not leaves and filter the data. It was fine for an input of 150 rows, but when I fed in the full file with 13k rows the code ran for 2 days without any sign of coming to an end. So I'm looking for the most efficient code or algorithm, as well as the best data type for the job (lists, data frames, etc.). Any help would be greatly appreciated, because I've been fighting with this for a few days now and I can't find any idea on how to solve the problem. If something is not clear I will edit the post.
The data will be saved into an Excel file with openpyxl, so that when I filter by successor I can see every leaf in the predecessor column that is connected to this successor.
Here is my whole code.
import itertools
from openpyxl import Workbook

# create adjacencies
def create_adj(edges):
    adj = {}
    for a, b in edges:
        if a not in adj:
            adj[a] = []
        if b not in adj:
            adj[b] = []
        adj[a].append(b)
    return adj

# find all paths
def all_paths(adj):
    def recur(path):
        node = path[-1]
        neighbors = [neighbor for neighbor in adj[node] if neighbor not in path]
        if not neighbors:
            yield path
        for neighbor in neighbors:
            yield from recur(path + [neighbor])
    for node in adj:
        yield from recur([node])

# delete the connections from the list
def conn_deletion(conn_list, list_deletion):
    return [x for x in conn_list if x[0] not in list_deletion]

# get paths where len of path is > 2 and save them as leaf-to-node pairs.
# Also save connections for deletion.
def unpack_paths(my_list):
    list_of_more_succ = []
    to_deletion = []
    for item in my_list:
        if len(item) == 1:
            print("len 1", item)
            to_deletion.append(item[0])
        elif len(item) > 2:
            for i in range(1, len(item) - 1):
                to_deletion.append(item[i])
                print("len > 2", item[i])
                if [item[0], item[i]] not in list_of_more_succ:
                    list_of_more_succ.append([item[0], item[i]])
    list_concat = my_list + list_of_more_succ
    sorted_list = list(k for k, _ in itertools.groupby(list_concat))
    final = conn_deletion(sorted_list, list(dict.fromkeys(to_deletion)))
    return final

data = [line.split() for line in open('data.txt', encoding='utf-8')]
adj = create_adj(data)
print(adj)

workbook = Workbook()
sheet = workbook.active
sheet["A1"] = "Source"
sheet["B1"] = "Child"

loaded = list(all_paths(adj))
final_edited = unpack_paths(loaded)

# Save data to the Excel file. We don't want paths with len == 1 or > 2.
for row, item in enumerate(final_edited, start=2):
    if len(item) == 2:
        sheet[f"A{row}"] = item[0]
        sheet[f"B{row}"] = item[1]
workbook.save("DataMap.xlsx")
I would suggest changing all_paths to leaf_paths, meaning that it would only yield those paths that start at a leaf.
Then use those paths to:
Identify the root it leads to (the last element in the path)
Identify the leaf (the first element in the path)
Iterate all non-leaves in that path and combine each of them in a pair with the leaf.
Store these pairs in a dictionary that is keyed by the root
Here is how you would alter all_paths at two places marked with a comment:
def leaf_paths(adj):
    def recur(path):
        node = path[-1]
        neighbors = [neighbor for neighbor in adj[node] if neighbor not in path]
        if not neighbors:
            yield path
        for neighbor in neighbors:
            yield from recur(path + [neighbor])

    # identify the internal nodes (not leaves)
    internals = set(parent for parents in adj.values() for parent in parents)
    for node in adj:
        if node not in internals:  # require that the starting node is a leaf
            yield from recur([node])
Then add this function:
def all_leaf_pairs(paths):
    trees = {}
    for path in paths:
        if len(path) > 1:
            root = path[-1]
            if root not in trees:
                trees[root] = []
            it = iter(path)
            leaf = next(it)
            trees[root].extend((leaf, node) for node in it)
    return trees
And your main program would do:
data = [
    ["ANALYTICAL_BALANCE","BFG_DEPOSIT"],
    ["CUSTOMER_DETAIL","BALANCE"],
    ["BFG_2056", "FFD_15"],
    ["BALANCE","BFG_16"],
    ["BFG_16","STAT_HIST"],
    ["ANALYTICAL_BALANCE","BFG_2056"],
    ["CUSTOM_DATA","AND_11"],
    ["AND_11","DICT_DEAL"],
    ["DICT_DEAL","BFG_2056"]
]
adj = create_adj(data)
paths = leaf_paths(adj)

import pprint
pprint.pprint(all_leaf_pairs(paths))
This will output:
{'BFG_DEPOSIT': [('ANALYTICAL_BALANCE', 'BFG_DEPOSIT')],
'FFD_15': [('ANALYTICAL_BALANCE', 'BFG_2056'),
('ANALYTICAL_BALANCE', 'FFD_15'),
('CUSTOM_DATA', 'AND_11'),
('CUSTOM_DATA', 'DICT_DEAL'),
('CUSTOM_DATA', 'BFG_2056'),
('CUSTOM_DATA', 'FFD_15')],
'STAT_HIST': [('CUSTOMER_DETAIL', 'BALANCE'),
('CUSTOMER_DETAIL', 'BFG_16'),
('CUSTOMER_DETAIL', 'STAT_HIST')]}
Explanation of leaf_paths
This function uses recursion. It defines recur in its scope.
But the main code starts with identifying the internal nodes (i.e. the nodes that have at least one child). Since adj provides the parent(s) for a given node, we just have to collect all those parents.
We use this set of internal nodes to make sure we start the recursion only on leaf nodes, as in the output we want to have paths that always start out with a leaf node.
The recur function walks from the given leaf towards any root it can find. It extends the path with the next parent it finds (neighbor) and recurses until there is no more parent (i.e., it has reached a root). When that happens, the accumulated path (which starts with a leaf and ends with a root) is yielded.
leaf_paths itself yields any path that recur yields.

Sort edges in BFS/DFS

I am working on a problem of sorting edges, where edges are stored in tuple form (node_i, node_j), like below:
>> edgeLst
>> [('123','234'),
    ('123','456'),
    ('123','789'),
    ('456','765'),
    ('456','789'),
    ('234','765')]
Note that the edges are unique, and if you see ('123', '234'), you won't see ('234', '123') (the graph is undirected). There might also be a cycle in the graph. Since the graph is very large, can anyone show me an efficient way to sort the edges in BFS and DFS order from a given start node, e.g., '123'?
Demo output:
>> edgeSorting(input_lst=edgeLst, by='BFS', start_node='123')
>> [('123','234'),
    ('123','456'),
    ('123','789'),
    ('234','765'),
    ('456','765'),
    ('456','789')]
Here is how you could do it for BFS and DFS:
from collections import defaultdict

def sorted_edges(input_lst, by, start_node):
    # Key the edges by the vertices
    vertices = defaultdict(list)
    for u, v in input_lst:
        vertices[u].append(v)
        vertices[v].append(u)
    if by == 'DFS':
        # Sort the adjacency lists in place, so neighbours are visited by order of id
        for lst in vertices.values():
            lst.sort()
        # Perform DFS
        visited = set()
        def recurse(a):
            for b in vertices[a]:
                if (a, b) not in visited:
                    yield (a, b)
                    # Make sure this edge is not visited anymore in either direction
                    visited.add((a, b))
                    visited.add((b, a))
                    for edge in recurse(b):
                        yield edge
        for edge in recurse(start_node):
            yield edge
    else:  # BFS
        # Collect the edges
        visited = set()
        queue = [start_node]
        while len(queue):
            level = []  # nodes of the next BFS level
            for a in queue:  # process a BFS level by order of id
                for b in vertices[a]:
                    if (a, b) not in visited:
                        yield (a, b)
                        # Add to the next level to process
                        level.append(b)
                        # Make sure this edge is not visited anymore in either direction
                        visited.add((a, b))
                        visited.add((b, a))
            queue = sorted(level)

edgeLst = [('123','234'),
           ('123','456'),
           ('123','789'),
           ('456','765'),
           ('456','789'),
           ('234','765')]
print(list(sorted_edges(edgeLst, 'DFS', '123')))
In Python 3, you can simplify:
for edge in recurse(b):
    yield edge
to:
yield from recurse(b)
... and same for start_node.

Counting nodes of a binary tree without recursion Python

I know how to do it recursively:
def num_nodes(tree):
    if not tree.left and not tree.right:
        return 1
    else:
        return 1 + num_nodes(tree.left) + num_nodes(tree.right)
But how would you do it non-recursively? I'm having trouble accessing a node that's on the right of a left subtree.
You could store the nodes you still have to process in a list.
queue = []
count = 0
queue.append(root)
# Above is the start of the tree
while queue:
    if queue[0].left:
        queue.append(queue[0].left)
    if queue[0].right:
        queue.append(queue[0].right)
    queue.pop(0)
    count += 1
This should work as long as your tree does not have any loops in it.
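For what it's worth, the same idea with collections.deque avoids the O(n) cost of queue.pop(0). The tiny Node class here is just a hypothetical stand-in for whatever tree class you have:

```python
from collections import deque

class Node:
    """Minimal stand-in for a binary tree node."""
    def __init__(self, left=None, right=None):
        self.left = left
        self.right = right

def num_nodes(root):
    """Count the nodes of a binary tree iteratively (level by level)."""
    if root is None:
        return 0
    count = 0
    queue = deque([root])
    while queue:
        node = queue.popleft()      # O(1), unlike list.pop(0)
        count += 1
        if node.left:
            queue.append(node.left)
        if node.right:
            queue.append(node.right)
    return count
```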

kosaraju finding finishing time using iterative dfs

Here is the first part of the code that I have written for Kosaraju's algorithm.
###### reading the data #####
with open('data.txt') as req_file:
    ori_data = []
    for line in req_file:
        line = line.split()
        if line:
            line = [int(i) for i in line]
            ori_data.append(line)

###### forming the Grev ####
revscc_dic = {}
for temp in ori_data:
    if temp[1] not in revscc_dic:
        revscc_dic[temp[1]] = [temp[0]]
    else:
        revscc_dic[temp[1]].append(temp[0])
print(revscc_dic)

######## finding the G #####
scc_dic = {}
for temp in ori_data:
    if temp[0] not in scc_dic:
        scc_dic[temp[0]] = [temp[1]]
    else:
        scc_dic[temp[0]].append(temp[1])
print(scc_dic)

##### iterative dfs ####
path = []
for i in range(max(max(ori_data)), 0, -1):
    start = i
    q = [start]
    while q:
        v = q.pop(0)
        if v not in path:
            path.append(v)
            q = revscc_dic[v] + q
print(path)
The code reads the data and forms Grev and G correctly. I have written code for an iterative DFS. How can I extend it to find the finishing times? I understand how to find the finishing times with pen and paper, but I do not understand how to implement it as code. Only after this can I proceed with the next part of the algorithm. Please help. Thanks in advance.
The data.txt file contains:
1 4
2 8
3 6
4 7
5 2
6 9
7 1
8 5
8 6
9 7
9 3
please save it as data.txt.
With recursive dfs, it is easy to see when a given vertex has "finished" (i.e. when we have visited all of its children in the dfs tree). The finish time can be calculated just after the recursive call has returned.
However, with iterative dfs, this is not so easy. Now that we are iteratively processing the queue using a while loop, we have lost some of the nested structure that is associated with function calls. More precisely, we don't know when backtracking occurs. Unfortunately, there is no way to know when backtracking occurs without adding some additional information to our stack of vertices.
The quickest way to add finishing times to your dfs implementation is like so:
##### iterative dfs (with finish times) ####
path = []
time = 0
finish_time_dic = {}
for i in range(max(max(ori_data)), 0, -1):
    start = i
    q = [start]
    while q:
        v = q.pop(0)
        if v not in path:
            path.append(v)
            q = [v] + q
            for w in revscc_dic[v]:
                if w not in path:
                    q = [w] + q
        else:
            if v not in finish_time_dic:
                finish_time_dic[v] = time
                time += 1
print(path)
print(finish_time_dic)
The trick used here is that when we pop v off the stack, if it is the first time we have seen it, we add it back to the stack again, using q = [v] + q. We must push v onto the stack before we push its neighbours (the line that pushes v comes before the for loop that pushes v's neighbours), or else the trick doesn't work. Eventually we will pop v off the stack again; at that point, v has finished! Since we have seen v before, we go into the else case and record a fresh finish time.
For the graph provided, finish_time_dic gives the correct finishing times:
{1: 6, 2: 1, 3: 3, 4: 7, 5: 0, 6: 4, 7: 8, 8: 2, 9: 5}
Note that this dfs algorithm (with the finishing times modification) still has O(V+E) complexity, despite the fact that we are pushing each node of the graph onto the stack twice. However, more elegant solutions exist.
I recommend reading Chapter 5 of Python Algorithms: Mastering Basic Algorithms in the Python Language by Magnus Lie Hetland (ISBN: 1430232374, 9781430232377). Question 5-6 and 5-7 (on page 122) describe your problem exactly. The author answers these questions and gives an alternate solution to the problem.
Questions:
5-6 In recursive DFS, backtracking occurs when you return from one of the recursive calls. But where has the backtracking gone in the iterative version?
5-7. Write a nonrecursive version of DFS that can determine finish times.
Answers:
5-6 It’s not really represented at all in the iterative version. It just implicitly occurs once you’ve popped off all your “traversal descendants” from the stack.
5-7 As explained in Exercise 5-6, there is no point in the code where backtracking occurs in the iterative DFS, so we can’t just set the finish time at some specific place (like in the recursive one). Instead, we’d need to add a marker to the stack. For example, instead of adding the neighbors of u to the stack, we could add edges of the form (u, v), and before all of them, we’d push (u, None), indicating the backtracking point for u.
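The marker idea from answer 5-7 can be sketched like this (a sketch, not Hetland's code; adj is assumed to map each node to a list of its neighbours, and the boolean flag plays the role of the (u, None) entry):

```python
def dfs_finish_times(adj, start):
    """Iterative DFS from `start`, recording finish times via stack markers."""
    finish = {}
    time = 0
    visited = set()
    stack = [(start, False)]            # (node, is_backtracking_marker)
    while stack:
        node, backtracking = stack.pop()
        if backtracking:                # marker popped: all descendants are done
            time += 1
            finish[node] = time
            continue
        if node in visited:
            continue
        visited.add(node)
        stack.append((node, True))      # marker goes on before the neighbours
        for w in adj.get(node, []):
            if w not in visited:
                stack.append((w, False))
    return finish
```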
Iterative DFS itself is not complicated, as seen on Wikipedia. However, calculating the finish time of each node requires some tweaks to the algorithm: we only pop a node off the stack the 2nd time we encounter it.
Here's my implementation which I feel demonstrates what's going on a bit more clearly:
step = 0  # time counter

def dfs_visit(g, v):
    """Run iterative DFS from node V"""
    global step
    total = 0
    stack = [v]                       # create stack with starting vertex
    while stack:                      # while stack is not empty
        step += 1
        v = stack[-1]                 # peek top of stack
        if v.color:                   # if already seen
            v = stack.pop()           # done with this node, pop it from stack
            if v.color == 1:          # if GRAY, finish this node
                v.time_finish = step
                v.color = 2           # BLACK, done
        else:                         # seen for first time
            v.color = 1               # GRAY: discovered
            v.time_discover = step
            total += 1
            for w in v.child:         # for all neighbors (v, w)
                if not w.color:       # if not seen
                    stack.append(w)
    return total

def dfs(g):
    """Run DFS on graph"""
    global step
    step = 0                          # reset step counter
    for k, v in g.nodes.items():
        if not v.color:
            dfs_visit(g, v)
I am following the conventions of the CLR Algorithm Book and use node coloring to designate its state during the DFS search. I feel this is easier to understand than using a separate list to track node state.
All nodes start out as white. When it's discovered during the search it is marked as gray. When we are done with it, it is marked as black.
Within the while loop, if a node is white we keep it in the stack, and change its color to gray. If it's gray we change its color to black, and set its finish time. If it's black we just ignore it.
It is possible for a node on the stack to be black (even with our coloring check before adding it to the stack): a white node can be added to the stack twice (via two different neighbors), and one instance will eventually turn black. When we reach the second instance on the stack, we need to make sure we don't overwrite its already-set finish time.
Here is some additional supporting code:
class Node(object):
    def __init__(self, name=None):
        self.name = name
        self.child = []                # children | adjacency list
        self.color = 0                 # 0: white [unvisited], 1: gray [found], 2: black [finished]
        self.time_discover = None      # DFS
        self.time_finish = None        # DFS

class Graph(object):
    def __init__(self):
        self.nodes = defaultdict(Node)   # list of Nodes
        self.max_heap = []               # nodes in decreasing finish time for SCC

    def build_max_heap(self):
        """Build list of nodes in max heap using DFS finish time"""
        for k, v in self.nodes.items():
            self.max_heap.append((0 - v.time_finish, v))  # invert finish time for max heap
        heapq.heapify(self.max_heap)
To run DFS on the reverse graph, you can build a parent list similar to the child list for each Node when the edges file is processed, and use the parent list instead of the child list in dfs_visit().
To process Nodes in decreasing finish time for the last part of SCC computation, you can build a max heap of Nodes, and use that max heap in dfs_visit() instead of simply the child list.
while g.max_heap:
    v = heapq.heappop(g.max_heap)[1]
    if not v.color:
        size = dfs_visit(g, v)
        scc_size.append(size)
I had a few issues with the order produced by Lawson's version of the iterative DFS. Here is my version, which has a 1-to-1 mapping with the recursive version of DFS.
n = len(graph)
time = 0
finish_times = [0] * (n + 1)
explored = [False] * (n + 1)

# Determine if every vertex connected to v
# has already been explored
def all_explored(G, v):
    if v in G:
        for w in G[v]:
            if not explored[w]:
                return False
    return True

# Loop through vertices in reverse order
for v in range(n, 0, -1):
    if not explored[v]:
        stack = [v]
        while stack:
            print(stack)
            v = stack[-1]
            explored[v] = True
            # If v still has outgoing edges to explore
            if not all_explored(graph_reversed, v):
                for w in graph_reversed[v]:
                    # Explore w before others attached to v
                    if not explored[w]:
                        stack.append(w)
                        break
            # We have explored all vertices findable from v
            else:
                stack.pop()
                time += 1
                finish_times[v] = time
Here are the recursive and iterative implementations in java:
int time = 0;

public void dfsRecursive(Vertex vertex) {
    time += 1;
    vertex.setVisited(true);
    vertex.setDiscovered(time);
    for (String neighbour : vertex.getNeighbours()) {
        if (!vertices.get(neighbour).getVisited()) {
            dfsRecursive(vertices.get(neighbour));
        }
    }
    time += 1;
    vertex.setFinished(time);
}

public void dfsIterative(Vertex vertex) {
    Stack<Vertex> stack = new Stack<>();
    stack.push(vertex);
    while (!stack.isEmpty()) {
        Vertex current = stack.pop();
        if (!current.getVisited()) {
            time += 1;
            current.setVisited(true);
            current.setDiscovered(time);
            stack.push(current);
            List<String> currentsNeigbours = current.getNeighbours();
            for (int i = currentsNeigbours.size() - 1; i >= 0; i--) {
                String currentNeigbour = currentsNeigbours.get(i);
                Vertex neighBour = vertices.get(currentNeigbour);
                if (!neighBour.getVisited())
                    stack.push(neighBour);
            }
        } else {
            if (current.getFinished() < 1) {
                time += 1;
                current.setFinished(time);
            }
        }
    }
}
First, you should know exactly what finishing time means: in recursive dfs, a node v is finished when all of the adjacent nodes of v have finished.
With this in mind, you need an additional data structure to store all the info.
adj[][]             // graph
visited[] = NULL    // array of visited nodes
finished[] = NULL   // array of finished nodes
Stack st = new Stack          // normal stack
Stack backtrack = new Stack   // additional stack

function getFinishedTime() {
    for (node i in adj) {
        if (!visited.contains(i)) {
            st.push(i);
            visited.add(i);
            while (!st.isEmpty) {
                int j = st.pop();
                int[] unvisitedChild = getUnvisitedChild(j);
                if (unvisitedChild != null) {
                    for (int c in unvisitedChild) {
                        st.push(c);
                        visited.add(c);
                    }
                    // store each entry as an array with the parent node j first,
                    // followed by all of its unvisited child nodes
                    backtrack.push([j, unvisitedChild]);
                } else {
                    finished.add(j);
                    // while all of the child nodes are finished, the parent node can be finished too
                    while (!backtrack.isEmpty && finished.containsAll(backtrack.peek())) {
                        parent = backtrack.pop()[0];
                        finished.add(parent);
                    }
                }
            }
        }
    }
}

function getUnvisitedChild(int i) {
    unvisitedChild[] = null;
    for (int child in adj[i]) {
        if (!visited.contains(child))
            unvisitedChild.add(child);
    }
    return unvisitedChild;
}
and the finishing order should be
[5, 2, 8, 3, 6, 9, 1, 4, 7]

Python: How to find more than one pathway in a recursive loop when multiple child nodes refers back to the parent?

I'm using recursion to find the path from some point A to some point D.
I'm traversing a graph to find the pathways.
Lets say:
Graph = {'A':['route1','route2'],'B':['route1','route2','route3','route4'], 'C':['route3','route4'], 'D':['route4'] }
Accessible through:
A -> route1, route2
B -> route2, route3, route4
C -> route3, route4
There are two solutions in this path from A -> D:
route1 -> route2 -> route4
route1 -> route2 -> route3 -> route4
Since point A and point B share route1 and route2, there is an infinite loop, so I add a check (0 or 1 values) whenever I visit a node.
However, with the check I only get one solution back, route1 -> route2 -> route4, and not the other possible solution.
Here is the actual code; Routes will be substituted by Reactions.
def find_all_paths(graph, start, end, addReaction, passed={}, reaction=[], path=[]):
    passOver = passed
    path = path + [start]
    reaction = reaction + [addReaction]
    if start == end:
        return [reaction]
    if not graph.has_key(start):
        return []
    paths = []
    reactions = []
    for x in range(len(graph[start])):
        for y in range(len(graph)):
            for z in range(len(graph.values()[y])):
                if graph[start][x] == graph.values()[y][z]:
                    if passOver.values()[y][z] < 161:
                        passOver.values()[y][z] = passOver.values()[y][z] + 1
                        if graph.keys()[y] not in path:
                            newpaths = find_all_paths(graph, graph.keys()[y], end, graph.values()[y][z], passOver, reaction, path)
                            for newpath in newpaths:
                                reactions.append(newpath)
    return reactions
Here is the method call (dic_passOver is a dictionary keeping track of whether the nodes have been visited):
solution = find_all_paths(graph, 'M_glc_DASH_D_c', 'M_pyr_c', 'begin', dic_passOver)
My problem seems to be that once a route is visited, it can no longer be accessed, so other possible solutions are not found. I accounted for this by allowing a maximum of 161 visits per route, at which point all the possible routes are found for my specific code.
if passOver.values()[y][z] < 161:
    passOver.values()[y][z] = passOver.values()[y][z] + 1
However, this seems highly inefficient, and most of my data will be graphs with indexes in the thousands. In addition, I won't know the number of allowed node visits needed to find all routes; the number 161 was figured out manually.
Well, I can't understand your representation of the graph, but this is a generic algorithm you can use to find all paths while avoiding infinite loops.
First you need to represent your graph as a dictionary which maps nodes to a set of nodes they are connected to. Example:
graph = {'A':{'B','C'}, 'B':{'D'}, 'C':{'D'}}
That means that from A you can go to B and C, from B you can go to D, and from C you can go to D. We're assuming the links are one-way. If you want them to be two-way, just add links for going both ways.
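Making every link two-way could be sketched like this (make_undirected is a hypothetical helper, not part of the answer's code):

```python
def make_undirected(graph):
    """Return a copy of `graph` with the reverse of every link added."""
    undirected = {node: set(links) for node, links in graph.items()}
    for node, links in graph.items():
        for other in links:
            # ensure `other` has an entry, then add the reverse link
            undirected.setdefault(other, set()).add(node)
    return undirected
```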
If you represent your graph in that way, you can use the below function to find all paths:
def find_all_paths(start, end, graph, visited=None):
    if visited is None:
        visited = set()
    visited = visited | {start}  # take a copy, so sibling branches don't share one set
    for node in graph[start]:
        if node in visited:
            continue
        if node == end:
            yield [start, end]
        else:
            for path in find_all_paths(node, end, graph, visited):
                yield [start] + path
Example usage:
>>> graph = {'A':{'B','C'}, 'B':{'D'}, 'C':{'D'}}
>>> for path in find_all_paths('A','D', graph):
...     print(path)
...
['A', 'C', 'D']
['A', 'B', 'D']
>>>
Edit to take into account comments clarifying graph representation
Below is a function to transform your graph representation (assuming I understood it correctly and that routes are bi-directional) into the one used in the algorithm above:
def change_graph_representation(graph):
    reverse_graph = {}
    for node, links in graph.items():
        for link in links:
            if link not in reverse_graph:
                reverse_graph[link] = set()
            reverse_graph[link].add(node)
    result = {}
    for node, links in graph.items():
        adj = set()
        for link in links:
            adj |= reverse_graph[link]
        adj -= {node}
        result[node] = adj
    return result
If it is important that you find the path in terms of the links followed, not the nodes traversed, you can preserve this information like so:
def change_graph_representation(graph):
    reverse_graph = {}
    for node, links in graph.items():
        for link in links:
            if link not in reverse_graph:
                reverse_graph[link] = set()
            reverse_graph[link].add(node)
    result = {}
    for node, links in graph.items():
        adj = {}
        for link in links:
            for n in reverse_graph[link]:
                adj[n] = link
        del adj[node]
        result[node] = adj
    return result
And use this modified search:
def find_all_paths(start, end, graph, visited=None):
    if visited is None:
        visited = set()
    visited = visited | {start}  # take a copy, so sibling branches don't share one set
    for node, link in graph[start].items():
        if node in visited:
            continue
        if node == end:
            yield [link]
        else:
            for path in find_all_paths(node, end, graph, visited):
                yield [link] + path
That will give you paths in terms of links to follow instead of nodes to traverse. Hope this helps :)

Categories

Resources