Discover All Paths in Single Source, Multi-Terminal (possibly cyclic) Directed Graph - python

I have a graph G = (V,E), where
V is a subset of {0, 1, 2, 3, …}
E is a subset of VxV
There are no unconnected components in G
The graph may contain cycles
There is a known node v in V, which is the source; i.e. there is no u in V such that (u,v) is an edge
There is at least one sink/terminal node v in V; i.e. there is no u in V such that (v,u) is an edge. The identities of the terminal nodes are not known - they must be discovered through traversal
What I need to do is to compute a set of paths P such that every possible path from the source node to any terminal node is in P. Now, if the graph contains cycles, it is possible that by this definition, P becomes an infinite set. This is not what I need. Rather, what I need is for P to contain a path that doesn't explore the loop and at least one path that does explore the loop.
I say "at least one path that does explore the loop", as the loop may contain branches internally, in which case all of those branches will need to be explored as well. Thus, if the loop contains two internal branches, each with a branching factor of 2, then I need a total of four paths in P that explore the loop.
For example, an algorithm run on the following graph:
         +-------+
         |       |
         v       |
1->2->3->4->5->6 |
         |  |  | |
         v  |  v |
         9  +->7-+
               |
               v
               8
which can be represented as:
1:{2}
2:{3}
3:{4}
4:{5,9}
5:{6,7}
6:{7}
7:{4,8}
8:{}
9:{}
Should produce the set of paths:
1,2,3,4,9
1,2,3,4,5,6,7,8
1,2,3,4,5,6,7,4,9
1,2,3,4,5,7,8
1,2,3,4,5,7,4,9
1,2,3,4,5,7,4,5,6,7,8
1,2,3,4,5,7,4,5,7,8
Thus far, I have the following algorithm (in Python) that works in some simple cases:
def extractPaths(G, s=None, explored=None, path=None):
    _V, E = G
    if s is None: s = 0
    if explored is None: explored = set()
    if path is None: path = [s]
    explored.add(s)

    if not len(set(E[s]) - explored):
        print(path)
    for v in set(E[s]) - explored:
        if len(E[v]) > 1:
            path.append(v)
            for vv in set(E[v]) - explored:
                extractPaths(G, vv, explored-set(n for n in path if len(E[n])>1), path+[vv])
        else:
            extractPaths(G, v, explored, path+[v])
but it fails horribly in the more complex cases.
I'd appreciate any help as this is a tool to validate an algorithm that I have developed for my Master's thesis.
Thank you in advance

I've thought about this for a couple of hours, and have come up with this algorithm. It doesn't quite give the result you're asking for, but it's similar (and might be equivalent).
Observation: if we try to go to a node that we have seen before, the segment of the path from that node's most recent visit up to the current node forms a loop. If we have already seen that loop, we do not go to that node.
def extractPaths(current_node, path, loops_seen):
    path.append(current_node)
    # if node has outgoing edges
    if nodes[current_node] is not None:
        for thatnode in nodes[current_node]:
            valid = True
            # if the node we are going to has been
            # visited before, we are completing
            # a loop.
            if thatnode-1 in path:
                i = len(path)-1
                # find the last time we visited
                # that node
                while path[i] != thatnode-1:
                    i -= 1
                # the last time, to this time is
                # a single loop.
                new_loop = path[i:len(path)]
                # if we haven't seen this loop, go to
                # the node and note that we have seen
                # this loop; otherwise don't go to the node.
                if new_loop in loops_seen:
                    valid = False
                else:
                    loops_seen.append(new_loop)
            if valid:
                extractPaths(thatnode-1, path, loops_seen)
    # this is the end of the path
    else:
        newpath = list()
        # increment all the values for printing
        for i in path:
            newpath.append(i+1)
        found_paths.append(newpath)
    # backtrack
    path.pop()

# graph defined by lists of outgoing edges
nodes = [[2],[3],[4],[5,9],[6,7],[7],[4,8],None,None]
found_paths = list()
extractPaths(0, list(), list())
for i in found_paths:
    print(i)
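The same loop-tracking idea can be written against the question's dict representation (node → list of successors, sinks mapping to empty lists), which removes the 1-based index offset and the module-level globals. This is a sketch intended to follow the answer's logic, not a solution to the exact path set the question asks for; `extract_paths` is a hypothetical name.

```python
def extract_paths(graph, source):
    """Enumerate source-to-sink paths, traversing each distinct loop at most once."""
    found = []        # completed source-to-sink paths
    loops_seen = []   # loops (as node lists) already traversed, shared across branches
    path = []         # current path, used as a stack

    def visit(node):
        path.append(node)
        successors = graph.get(node, [])
        if not successors:
            found.append(list(path))  # reached a sink: record a copy of the path
        for nxt in successors:
            if nxt in path:
                # the segment from nxt's most recent visit up to here is a loop
                start = len(path) - 1 - path[::-1].index(nxt)
                loop = path[start:]
                if loop in loops_seen:
                    continue          # this loop was already explored
                loops_seen.append(loop)
            visit(nxt)
        path.pop()                    # backtrack

    visit(source)
    return found


graph = {1: [2], 2: [3], 3: [4], 4: [5, 9], 5: [6, 7],
         6: [7], 7: [4, 8], 8: [], 9: []}
for p in extract_paths(graph, 1):
    print(p)
```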


Improving BFS performance with some kind of memoization

I'm trying to build an algorithm that finds the distances from one vertex to all the others in a graph.
Take a really simple example where my network looks like this:
network = [[0,1,2],[2,3,4],[4,5,6],[6,7]]
I wrote BFS code that is supposed to find the lengths of paths from a specified source to the graph's other vertices:
from itertools import chain
import numpy as np

n = 8
graph = {}
for i in range(0, n):
    graph[i] = []

for communes in network:
    for vertice in communes:
        work = communes.copy()
        work.remove(vertice)
        graph[vertice].append(work)

for k, v in graph.items():
    graph[k] = list(chain(*v))

def bsf3(graph, s):
    matrix = np.zeros([n,n])
    dist = {}
    visited = []
    queue = [s]
    dist[s] = 0
    visited.append(s)
    matrix[s][s] = 0
    while queue:
        v = queue.pop(0)
        for neighbour in graph[v]:
            if neighbour in visited:
                pass
            else:
                matrix[s][neighbour] = matrix[s][v] + 1
                queue.append(neighbour)
                visited.append(neighbour)
    return matrix

bsf3(graph,2)
First I create the graph (a dictionary) and then use the function to find distances.
My concern is that this approach doesn't scale to larger networks (say, with 1000 people in them). I'm thinking of using some kind of memoization (that's actually why I made a matrix instead of a list). The idea is that when the algorithm calculates the path from, say, 0 to 3 (which it already does), it should also record the other routes along the way, so that matrix[1][3] = 1, etc.
Then when I call bsf3(graph, 1), it would not calculate everything from scratch, but would be able to reuse values already in matrix.
Thanks in advance!
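Part of the slowdown in code like bsf3 likely comes from the data structures rather than from BFS itself: `queue.pop(0)` and `neighbour in visited` are both O(n) on Python lists. A minimal sketch using `collections.deque` and a dict of distances (which doubles as the visited set), assuming `graph` maps each vertex to a list of neighbours as above; `bfs_distances` is a hypothetical name.

```python
from collections import deque

def bfs_distances(graph, source):
    """Return {vertex: hop count from source} for every vertex reachable from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()              # O(1), unlike list.pop(0)
        for neighbour in graph[v]:
            if neighbour not in dist:    # dict membership is O(1) on average
                dist[neighbour] = dist[v] + 1
                queue.append(neighbour)
    return dist
```

Running it once per vertex, e.g. `all_distances = {s: bfs_distances(graph, s) for s in graph}`, fills every row of the distance table in O(|V|·(|V|+|E|)) total, after which any lookup is a dict access.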
I know this doesn't fully answer your question, but here is another approach you can try.
In networks, each node has a routing table. For every node in the network, you simply store which neighbour you have to go to next to reach it. Example routing table of node D:
A -> B
B -> B
C -> E
D -> D
E -> E
You need to run BFS from each node to build all the routing tables, which takes O(|V|*(|V|+|E|)) time. The space complexity is quadratic, but you have to cover every pair of nodes anyway.
Once you have built all this information, you can simply start from a node, look up your destination node in its table and find the next node to go to. This gives a much better query time (if you use the right data structure for the table).
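A minimal sketch of the routing-table idea for an unweighted graph, assuming `graph` is a dict mapping each node to a list of neighbours; `routing_tables` and `route` are hypothetical names. Each BFS records, for every destination, the first hop out of the source; because BFS reaches every node along a shortest path, following next hops node by node reproduces a shortest route.

```python
from collections import deque

def routing_tables(graph):
    """tables[src][dst] is the first node after src on a shortest path to dst."""
    tables = {}
    for src in graph:
        next_hop = {src: src}
        queue = deque()
        for nb in graph[src]:           # direct neighbours are their own next hop
            if nb not in next_hop:
                next_hop[nb] = nb
                queue.append(nb)
        while queue:
            v = queue.popleft()
            for nb in graph[v]:
                if nb not in next_hop:
                    next_hop[nb] = next_hop[v]  # inherit the first hop used to reach v
                    queue.append(nb)
        tables[src] = next_hop
    return tables

def route(tables, src, dst):
    """Follow next hops from src until dst is reached (dst must be reachable)."""
    path = [src]
    while path[-1] != dst:
        path.append(tables[path[-1]][dst])
    return path
```

The distance between two nodes is then `len(route(tables, a, b)) - 1`; storing distances alongside next hops would make that a single lookup.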

How can I make a recursive search for the longest path more efficient?

I'm trying to find the longest path in a directed acyclic graph. At the moment, my code seems to have a running time of O(n^3).
The input graph is {0: [1,2], 1: [2,3], 3: [4,5] }
#Input: dictionary: graph, int: start, list: path
#Output: List: the longest path in the graph (Recursion)
# This is a modification of a depth first search
def find_longest_path(graph, start, path=[]):
    path = path + [start]
    paths = path
    for node in graph[start]:
        if node not in path:
            newpaths = find_longest_path(graph, node, path)
            #Only take the new path if its length is greater than the current path
            if(len(newpaths) > len(paths)):
                paths = newpaths

    return paths
It returns a list of nodes in the form e.g. [0,1,3,5]
How can I make this more efficient than O(n^3)? Is recursion the right way to solve this or should I be using a different loop?
You can solve this problem in O(n+e) (i.e. linear in the number of nodes + edges).
The idea is that you first create a topological sort (I'm a fan of Tarjan's algorithm) and the set of reverse edges. It always helps if you can decompose your problem to leverage existing solutions.
You then walk the topological sort backwards pushing to each parent node its child's distance + 1 (keeping maximums in case there are multiple paths). Keep track of the node with the largest distance seen so far.
When you have finished annotating all of the nodes with distances you can just start at the node with the largest distance which will be your longest path root, and then walk down your graph choosing the children that are exactly one count less than the current node (since they lie on the critical path).
In general, when trying to find an optimal complexity algorithm don't be afraid to run multiple stages one after the other. Five O(n) algorithms run sequentially is still O(n) and is still better than O(n^2) from a complexity perspective (although it may be worse in real running time depending on the constant costs/factors and the size of n).
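The multi-stage approach above can be sketched as follows. This sketch swaps in Kahn's algorithm for the topological sort rather than the Tarjan-based sort mentioned above (any valid topological order works), and it assumes the graph is a dict mapping each node to its children, where leaves may be missing as keys, as in the question's input. `longest_path_dag` is a hypothetical name.

```python
from collections import deque

def longest_path_dag(graph):
    """Longest path (counted in nodes) in a DAG given as {node: [children]}."""
    # Collect every node, including leaves that never appear as keys.
    nodes = set(graph)
    for children in graph.values():
        nodes.update(children)

    # Stage 1: topological order via Kahn's algorithm.
    indegree = {v: 0 for v in nodes}
    for children in graph.values():
        for c in children:
            indegree[c] += 1
    queue = deque(v for v in nodes if indegree[v] == 0)
    order = []
    while queue:
        v = queue.popleft()
        order.append(v)
        for c in graph.get(v, []):
            indegree[c] -= 1
            if indegree[c] == 0:
                queue.append(c)
    if len(order) != len(nodes):
        raise ValueError("graph has a cycle")

    # Stage 2: walk the order backwards; dist[v] = nodes on the longest path from v.
    dist = {}
    for v in reversed(order):
        dist[v] = 1 + max((dist[c] for c in graph.get(v, [])), default=0)

    # Stage 3: start at the largest distance, follow children exactly one count lower.
    path = [max(dist, key=dist.get)]
    while dist[path[-1]] > 1:
        here = path[-1]
        path.append(next(c for c in graph.get(here, []) if dist[c] == dist[here] - 1))
    return path
```

Each stage touches every node and edge a constant number of times, so the whole pipeline stays O(n+e).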
ETA: I just noticed you have a start node. This makes it simply a case of doing a depth first search and keeping the longest solution seen so far which is just O(n+e) anyway. Recursion is fine or you can keep a list/stack of visited nodes (you have to be careful when finding the next child each time you backtrack).
As you backtrack from your depth first search you need to store the longest path from that node to a leaf so that you don't re-process any sub-trees. This will also serve as a visited flag (i.e. in addition to doing the node not in path test also have a node not in subpath_cache test before recursing). Instead of storing the subpath you could store the length and then rebuild the path once you're finished based on sequential values as discussed above (critical path).
ETA2: Here's a solution.
def find_longest_path_rec(graph, parent, cache):
    maxlen = 0
    for node in graph[parent]:
        if node in cache:
            pass
        elif node not in graph:
            cache[node] = 1
        else:
            cache[node] = find_longest_path_rec(graph, node, cache)

        maxlen = max(maxlen, cache[node])

    return maxlen + 1

def find_longest_path(graph, start):
    cache = {}
    maxlen = find_longest_path_rec(graph, start, cache)
    path = [start]
    for i in range(maxlen-1, 0, -1):
        for node in graph[path[-1]]:
            if cache[node] == i:
                path.append(node)
                break
        else:
            assert(0)
    return path
Note that I've removed the node not in path test because I'm assuming that you're actually supplying a DAG as claimed. If you want that check you should really be raising an error rather than ignoring it. Also note that I've added the assertion to the else clause of the for to document that we must always find a valid next (sequential) node in the path.
ETA3: The final for loop is a little confusing. What we're relying on is that along the critical path all of the node distances must be sequential. Suppose node 0 is at distance 4, node 1 at distance 3 and node 2 at distance 1. If our path started [0, 2, ...] we would have a contradiction, because node 0 is not 1 further from a leaf than node 2.
There are a couple of non-algorithmic improvements I'd suggest (these are related to Python code quality):
def find_longest_path_from(graph, start, path=None):
    """
    Returns the longest path in the graph from a given start node
    """
    if path is None:
        path = []
    path = path + [start]
    max_path = path
    nodes = graph.get(start, [])
    for node in nodes:
        if node not in path:
            candidate_path = find_longest_path_from(graph, node, path)
            if len(candidate_path) > len(max_path):
                max_path = candidate_path
    return max_path

def find_longest_path(graph):
    """
    Returns the longest path in a graph
    """
    max_path = []
    for node in graph:
        candidate_path = find_longest_path_from(graph, node)
        if len(candidate_path) > len(max_path):
            max_path = candidate_path
    return max_path
Changes explained:
def find_longest_path_from(graph, start, path=None):
    if path is None:
        path = []
I've renamed find_longest_path as find_longest_path_from to better explain what it does.
I've changed the path argument to have a default value of None instead of []. Unless you know you will specifically benefit from them, you want to avoid using mutable objects as default arguments in Python: the default is created once, when the function is defined, and shared by every call. This means you should typically set path to None by default and then, when the function is invoked, check whether path is None and create an empty list accordingly.
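A minimal demonstration of the pitfall described above; `append_bad` and `append_good` are hypothetical names chosen for illustration.

```python
def append_bad(item, bucket=[]):
    # The default list is created once, when the function is defined,
    # so every call that omits bucket shares the same list.
    bucket.append(item)
    return bucket

def append_good(item, bucket=None):
    # A fresh list is created on each call that omits bucket.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket

first = append_bad(1)
second = append_bad(2)
print(first is second, second)        # True [1, 2]
print(append_good(1), append_good(2))  # [1] [2]
```

In the original find_longest_path this happened not to bite, because `path = path + [start]` builds a new list instead of mutating the default, but the None idiom removes the trap entirely.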
max_path = path
...
candidate_path = find_longest_path_from(graph, node, path)
...
I've updated the names of your variables from paths to max_path and newpaths to candidate_path. These were confusing variable names because they referred to the plural of path -- implying that the value they stored consisted of multiple paths -- when in fact they each just held a single path. I tried to give them more descriptive names.
nodes = graph.get(start, [])
for node in nodes:
Your code errors out on your example input because the leaf nodes of the graph are not keys in the dict so graph[start] would raise a KeyError when start is 2, for instance. This handles the case where start is not a key in graph by returning an empty list.
def find_longest_path(graph):
    """
    Returns the longest path in a graph
    """
    max_path = []
    for node in graph:
        candidate_path = find_longest_path_from(graph, node)
        if len(candidate_path) > len(max_path):
            max_path = candidate_path
    return max_path
A method to find the longest path in a graph that iterates over the keys. This is entirely separate from your algorithmic analysis of find_longest_path_from but I wanted to include it.

How to cleanly avoid loops in recursive function (breadth-first traversal)

I'm writing a recursive breadth-first traversal of a network. The problem I ran into is that the network often looks like this:
  1
 / \
2   3
 \ /
  4
  |
  5
So my traversal starts at 1, then traverses to 2, then 3. The next stop is to proceed to 4, so 2 traverses to 4. After this, 3 traverses to 4, and suddenly I'm duplicating work as both lines try to traverse to 5.
The solution I've found is to create a list called self.already_traversed, and every time a node is traversed, I append it to the list. Then, when I'm traversing from node 4, I check to make sure it hasn't already been traversed.
The problem here is that I'm using an instance variable for this, so I need a way to set up the list before the first recursion and a way to clean it up afterwards. The way I'm currently doing this is:
self.already_traversed = []
self._traverse_all_nodes(start_id)
self.already_traversed = []
Of course, it sucks to be toggling variables outside of the function that's using them. Is there a better way to do this so that it can be put into my traversal function?
Here's the actual code, though I recognize it's a bit dense:
def _traverse_all_nodes(self, root_authority, max_depth=6):
    """Recursively build a networkx graph

    Process is:
     - Work backwards through the authorities for self.cluster_end and all
       of its children.
     - For each authority, add it to a networkx graph, if:
        - it happened after self.cluster_start
        - it's in the Supreme Court
        - we haven't exceeded a max_depth of six cases.
        - we haven't already followed this path
    """
    g = networkx.Graph()
    if hasattr(self, 'already_traversed'):
        is_already_traversed = (root_authority.pk in self.already_traversed)
    else:
        # First run. Create an empty list.
        self.already_traversed = []
        is_already_traversed = False

    is_past_max_depth = (max_depth <= 0)
    is_cluster_start_obj = (root_authority == self.cluster_start)
    blocking_conditions = [
        is_past_max_depth,
        is_cluster_start_obj,
        is_already_traversed,
    ]
    if not any(blocking_conditions):
        print("  No blocking conditions. Pressing on.")
        self.already_traversed.append(root_authority.pk)
        for authority in root_authority.authorities.filter(
                docket__court='scotus',
                date_filed__gte=self.cluster_start.date_filed):

            g.add_edge(root_authority.pk, authority.pk)
            # Combine our present graph with the result of the next
            # recursion
            g = networkx.compose(g, self._traverse_all_nodes(
                authority,
                max_depth - 1,
            ))

    return g

def add_clusters(self):
    """Do the network analysis to add clusters to the model.

    Process is to:
     - Build a networkx graph
     - For all nodes in the graph, add them to self.clusters
    """
    self.already_traversed = []
    g = self._traverse_all_nodes(
        self.cluster_end,
        max_depth=6,
    )
    self.already_traversed = []
Check out:
How do I pass a variable by reference?
which contains an example of how to pass a list by reference. If you pass the list as an argument to each recursive call, every call to your function will refer to the same list object.
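A sketch of that idea applied to this question: instead of an instance variable, the recursion takes a `visited` set parameter that defaults to None and is created on the outermost call, so no setup or cleanup is needed outside the function. It assumes a plain dict mapping each node to its children (hypothetical; the real code walks database querysets), and, like the question's code, it recurses depth-first. `traverse` is a hypothetical name.

```python
def traverse(graph, node, visited=None):
    """Visit every node reachable from node exactly once; return the visit order."""
    if visited is None:
        visited = set()          # created once, on the outermost call only
    if node in visited:
        return []                # already handled via another parent
    visited.add(node)
    order = [node]
    for child in graph.get(node, []):
        # The same set object is passed down, so all calls share it.
        order.extend(traverse(graph, child, visited))
    return order

diamond = {1: [2, 3], 2: [4], 3: [4], 4: [5]}
print(traverse(diamond, 1))  # [1, 2, 4, 5, 3]
```

Node 4 (and therefore 5) is processed only once even though both 2 and 3 lead to it, and a fresh set is created on every top-level call.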

Algorithm Is Node A Connected to Node B in Graph

I am looking for an algorithm to check for any valid connection (shortest or longest) between two arbitrary nodes on a graph.
My graph is fixed to a grid with logical (x, y) coordinates with north/south/east/west connections, but nodes can be removed randomly so you can't assume that taking the edge with coords closest to the target is always going to get you there.
The code is in python. The data structure is each node (object) has a list of connected nodes. The list elements are object refs, so we can then search that node's list of connected nodes recursively, like this:
for pnode in self.connected_nodes:
    for cnode in pnode.connected_nodes:
        ...etc
I've included a diagram showing how the nodes map to x,y coords and how they are connected in north/east/south/west. Sometimes there are missing nodes (i.e. between J and K), and sometimes there are missing edges (i.e. between G and H). The presence of nodes and edges is in flux (although when we run the algorithm, it is taking a fixed snapshot in time), and can only be determined by checking each node for its list of connected nodes.
The algorithm needs to yield a simple true/false to whether there is a valid connection between two nodes. Recursing through every list of connected nodes explodes the number of operations required: if the node is n edges away, it requires up to 4^n operations. My understanding is that something like Dijkstra's algorithm works by finding the shortest path based on edge weights, but if there is no connection at all, would it still work?
For some background, I am using this to model 2D destructible objects. Each node represents a chunk of the material, and if one or more nodes do not have a connection to the rest of the material then it should separate off. In the diagram - D, H, R - should pare off from the main body as they are not connected.
UPDATE:
Although many of the posted answers might well work, DFS is quick, easy and very appropriate. I'm not keen on the idea of sticking extra edges between nodes with high-value weights to use Dijkstra, because nodes themselves might disappear as well as edges. The SCC method seems more appropriate for distinguishing between strongly and weakly connected graph sections, which in my graph would work if there was a single edge between G and H.
Here is my experiment code for DFS search, which creates the same graph as shown in the diagram.
class node(object):
    def __init__(self, id):
        self.connected_nodes = []
        self.id = id

    def dfs_is_connected(self, node):
        # Initialise our stack and our discovered list
        stack = []
        discovered = []

        # Declare operations count to track how many iterations it took
        op_count = 0

        # Push this node to the stack, for our starting point
        stack.append(self)

        # Keeping iterating while the stack isn't empty
        while stack:
            # Pop top element off the stack
            current_node = stack.pop()

            # Is this the droid/node you are looking for?
            if current_node.id == node.id:
                # Stop!
                return True, op_count

            # Check if current node has not been discovered
            if current_node not in discovered:

                # Increment op count
                op_count += 1

                # Is this the droid/node you are looking for?
                if current_node.id == node.id:
                    # Stop!
                    return True, op_count

                # Put this node in the discovered list
                discovered.append(current_node)

                # Iterate through all connected nodes of the current node
                for connected_node in current_node.connected_nodes:
                    # Push this connected node into the stack
                    stack.append(connected_node)

        # Couldn't find the node, return false. Sorry bud
        return False, op_count


if __name__ == "__main__":

    # Initialise all nodes
    a = node('a')
    b = node('b')
    c = node('c')
    d = node('d')
    e = node('e')
    f = node('f')
    g = node('g')
    h = node('h')
    j = node('j')
    k = node('k')
    l = node('l')
    m = node('m')
    n = node('n')
    p = node('p')
    q = node('q')
    r = node('r')
    s = node('s')

    # Connect up nodes
    a.connected_nodes.extend([b, e])
    b.connected_nodes.extend([a, f, c])
    c.connected_nodes.extend([b, g])
    d.connected_nodes.extend([r])
    e.connected_nodes.extend([a, f, j])
    f.connected_nodes.extend([e, b, g])
    g.connected_nodes.extend([c, f, k])
    h.connected_nodes.extend([r])
    j.connected_nodes.extend([e, l])
    k.connected_nodes.extend([g, n])
    l.connected_nodes.extend([j, m, s])
    m.connected_nodes.extend([l, p, n])
    n.connected_nodes.extend([k, m, q])
    p.connected_nodes.extend([s, m, q])
    q.connected_nodes.extend([p, n])
    r.connected_nodes.extend([h, d])
    s.connected_nodes.extend([l, p])

    # Check if a is connected to q
    print(a.dfs_is_connected(q))
    print(a.dfs_is_connected(d))
    print(p.dfs_is_connected(h))
To find this out, you just need to run a simple DFS or BFS from one of the nodes. It will find all nodes reachable within that connected component of the graph, so you just record whether you found the other node during the run.
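For the destructible-object use case it can pay to label every connected component in one pass, so that any later "is A connected to B?" query is a dict lookup and the pieces that should break off are simply the components other than the main body. A minimal BFS sketch, assuming an undirected adjacency dict (node id → list of neighbour ids) instead of the question's node objects; `connected_components` and `is_connected` are hypothetical names.

```python
from collections import deque

def connected_components(adjacency):
    """Map every node to a component id using BFS over an undirected adjacency dict."""
    component = {}
    next_id = 0
    for start in adjacency:
        if start in component:
            continue                    # already labelled by an earlier BFS
        component[start] = next_id
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for neighbour in adjacency[current]:
                if neighbour not in component:
                    component[neighbour] = next_id
                    queue.append(neighbour)
        next_id += 1
    return component

def is_connected(component, a, b):
    return component[a] == component[b]
```

Labelling costs O(V+E) once per snapshot of the material, and each node is visited exactly once regardless of how many queries follow.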
There is a way to use Dijkstra to find the path. If there is an edge between two nodes, give it a weight of 1; if there is no edge, give it a weight of sys.maxint (sys.maxsize in Python 3). Then, when the minimum path is calculated, if it is larger than the number of nodes, there is no path between them.
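On the question of whether Dijkstra still works when there is no connection at all: a standard implementation handles that without extra high-weight edges, because unreachable nodes simply keep their initial infinite distance. A minimal sketch with `heapq`, assuming an adjacency dict and a weight of 1 on every existing edge; `dijkstra` is a hypothetical helper name.

```python
import heapq
import math

def dijkstra(adjacency, source):
    """Shortest distances from source; nodes that cannot be reached stay at math.inf."""
    dist = {node: math.inf for node in adjacency}
    dist[source] = 0
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue                    # stale heap entry, a shorter path was found
        for neighbour in adjacency[node]:
            nd = d + 1                  # every existing edge has weight 1
            if nd < dist[neighbour]:
                dist[neighbour] = nd
                heapq.heappush(heap, (nd, neighbour))
    return dist
```

With unit weights this does the same work as a BFS, so a plain BFS or DFS is the simpler choice when only a yes/no answer is needed.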
Another approach is to first find the strongly connected components of the graph. If the nodes are on the same strong component then use Dijkstra to find the path, otherwise there is no path that connects them.
You could take a look at the A* path-finding algorithm, which uses heuristics to be more efficient than Dijkstra's; if there isn't anything in your problem that a heuristic can exploit, you might be better off using Dijkstra's algorithm. Either way you would need positive weights. If your graph doesn't have them, you could simply give each edge a weight of 1.
Looking at the pseudo code on Wikipedia, A* moves from one node to another by getting the neighbours of the current node. Dijkstra's Algorithm keeps an adjacency list so that it knows which nodes are connected to each other.
Thus, if you were to start from node H, you could only go to R and D. Since these nodes are not connected to the others, the algorithm will not go through the other nodes.
You can find the strongly connected components (SCC) of your graph and then check whether the nodes of interest are in the same component. In your example, H-R-D will be the first component and the rest the second, so for H and R the result will be true, but for H and A false.
See SCC algorithm here: https://class.coursera.org/algo-004/lecture/53.

Given a list of maze houses (2 dimensions), how do I create a directed graph with a dictionary?

I can only make an undirected graph; I have no idea how to make a directed one.
Any idea?
Apologies for the rather long-winded post. I had time to kill on the train.
I'm guessing what you're after is a directed graph representing all paths leading away from your starting position (as opposed to a graph representation of the maze which one can use to solve arbitrary start/end positions).
(No offence meant, but) this sounds like homework, or at least, a task that is very suitable for homework. With this in mind, the following solution focuses on simplicity rather than performance or elegance.
Approach
One straightforward way to do this would be to first store your map in a more navigable format, then, beginning with the start node, do the following:
1. look up neighbours (top, bottom, left, right)
2. for each neighbour:
   - if it is not a possible path, ignore
   - if we have processed this node before, ignore
   - else, add this node as an edge and push it onto a queue (not a stack, more on this later) for further processing
3. for each node in the queue/stack, repeat from step 1.
(See example implementation below)
At this point, you'll end up with a directed acyclic graph (DAG) with the starting node at the top of the tree and the end node as one of the leaves. Solving this would be easy at this point. See this answer on solving a maze represented as a graph.
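Once the DAG exists, one way to extract a route is a BFS that records each node's parent and then walks the parents back from the end. A minimal sketch, assuming the `{(x,y): [(x1,y1), ...]}` format that maze_to_dag in the example code returns; `solve_dag` is a hypothetical helper name.

```python
from collections import deque

def solve_dag(dag, start, end):
    """Return the list of coordinates from start to end, or None if end is unreachable."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == end:
            # Walk parent links back to start, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for child in dag.get(node, []):
            if child not in parent:
                parent[child] = node
                queue.append(child)
    return None
```

Because the search is breadth-first, the returned route has the fewest steps among the routes present in the DAG.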
A possible optimisation when building the graph would be to stop once the end point is found. You'll end up with an incomplete graph, but if you're only concerned about the final solution this doesn't matter.
stack or queue?
Note that using a stack (first in last out) would mean building the graph in a depth-first manner, while using a queue (first in first out) would result in a breadth-first approach.
You would generally want to use a queue (breadth-first) if the intention is to look for the shortest path. Consider the following map:
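The only difference between the two strategies is which end of the frontier the next node is taken from. A small sketch using `collections.deque` for both, on a hypothetical graph shaped like the map below (a branches to b and e); nodes are marked as seen when pushed, and `traversal_order` is a hypothetical name.

```python
from collections import deque

def traversal_order(graph, start, depth_first=False):
    """Visit order when the frontier is used as a stack (LIFO) or a queue (FIFO)."""
    frontier = deque([start])
    seen = {start}
    order = []
    while frontier:
        # pop() takes the newest entry (stack); popleft() takes the oldest (queue)
        node = frontier.pop() if depth_first else frontier.popleft()
        order.append(node)
        for nb in graph.get(node, []):
            if nb not in seen:
                seen.add(nb)
                frontier.append(nb)
    return order

g = {'a': ['b', 'e'], 'b': ['c'], 'c': ['d'], 'e': ['d'], 'd': ['END']}
print(traversal_order(g, 'a'))                    # ['a', 'b', 'e', 'c', 'd', 'END']
print(traversal_order(g, 'a', depth_first=True))  # ['a', 'e', 'd', 'END', 'b', 'c']
```

With the queue, nodes come out in order of distance from the start, which is what makes the breadth-first DAG give the shortest route.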
       START
########   ######
########   ######
### b    a ######
###   ##   ######
###   ## e ######
### c    d ######
########   ######
########      END
#################
If the path is traversed depth-first and at branch a you happen to take the a->b path before a->e, you end up with the graph:
     START
       |
       a
      / \
     b   e   <-- end, since d already visited
     |
     c
     |
     d
      \
      END
However, using a breadth-first approach the a->e path would come across node d earlier, resulting in a shorter graph and a better solution:
     START
       |
       a
      / \
     b   e
     |   |
     c   d
         |
        END
Example code
Sample input provided:
..........
#########.
..........
.#########
......#...
#####...#.
##...####.
##.#......
...#######
e = (0,0)
s = (8,0)
DISCLAIMER: The following code is written for clarity, not speed. It is not fully tested so there is no guarantee of correctness but it should give you an idea of what is possible.
We assume that the input file is formatted consistently. Most error checking is left out for brevity.
# regex to extract start/end positions
import re
re_sepos = re.compile(r"""
    ^([se])\s*      # capture first char (s or e) followed by arbitrary spaces
    =\s*            # equal sign followed by arbitrary spaces
    \(              # left parenthesis
    (\d+),(\d+)     # capture 2 sets of integers separated by comma
    \)              # right parenthesis
""", re.VERBOSE)

def read_maze(filename):
    """
    Reads input from file and returns tuple (maze, start, end)
      maze : dict holding value of each maze cell { (x1,y1):'#', ... }
      start: start node coordinate (x1,y1)
      end  : end node coordinate (x2,y2)
    """
    # read whole file into a list
    with open(filename, "r") as f:
        data = f.readlines()

    # parse start and end positions from last 2 lines
    pos = {}
    for line in data[-2:]:
        match = re_sepos.match(line)
        if not match:
            raise ValueError("invalid input file")
        c, x, y = match.groups()  # extract values
        pos[c] = (int(x), int(y))
    try:
        start = pos["s"]
        end = pos["e"]
    except KeyError:
        raise ValueError("invalid input file")

    # read ascii maze, '#' for wall '.' for empty slot
    # store as maze = { (x1,y1):'#', (x2,y2):'.', ... }
    # NOTE: this is of course not optimal, but leads to a simpler access later
    maze = {}
    for line_num, line in enumerate(data[:-3]):  # ignore last 3 lines
        for col_num, value in enumerate(line.rstrip("\n")):  # ignore \n at end
            maze[(line_num, col_num)] = value
    return maze, start, end

def maze_to_dag(maze, start, end):
    """
    Traverses the map starting from start coordinate.
    Returns directed acyclic graph in the form {(x,y):[(x1,y1),(x2,y2)], ...}
    """
    dag = {}         # directed acyclic graph
    queue = [start]  # queue of nodes to process

    # repeat till queue is empty
    while queue:
        x, y = queue.pop(0)  # get next node in queue
        if (x, y) in dag: continue  # queued twice via two parents, already processed
        edges = dag[(x, y)] = []  # init list to store edges

        # for each neighbour (top, bottom, left, right)
        for coord in ((x, y-1), (x, y+1), (x-1, y), (x+1, y)):
            if coord in dag: continue  # visited before, ignore

            node_value = maze.get(coord, None)  # Return None if outside maze
            if node_value == ".":  # valid path found
                edges.append(coord)  # add as edge
                queue.append(coord)  # push into queue

                # uncomment this to stop once we've found the end point
                #if coord == end: return dag

    return dag

if __name__ == "__main__":
    maze, start, end = read_maze("l4.txt")
    dag = maze_to_dag(maze, start, end)
    print(dag)
This page provides a nice tutorial on implementing graphs with Python. From the article, this is an example of a directed graph represented by a dictionary:
graph = {'A': ['B', 'C'],
         'B': ['C', 'D'],
         'C': ['D'],
         'D': ['C'],
         'E': ['F'],
         'F': ['C']}
That said, you might also want to look into existing graph libraries such as NetworkX and igraph.
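What makes such a dictionary directed is simply that an edge u -> v is stored only under u. A minimal sketch building the example graph above from a list of (source, target) pairs; `build_directed` is a hypothetical name.

```python
from collections import defaultdict

def build_directed(edges):
    """Build {node: [successors]} from (source, target) pairs."""
    graph = defaultdict(list)
    for u, v in edges:
        graph[u].append(v)  # only u -> v; an undirected graph would also append u to graph[v]
    return dict(graph)

edges = [('A', 'B'), ('A', 'C'), ('B', 'C'), ('B', 'D'),
         ('C', 'D'), ('D', 'C'), ('E', 'F'), ('F', 'C')]
print(build_directed(edges))
```

Nodes with no outgoing edges do not get a key in this sketch, so lookups should use `graph.get(node, [])`.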
Since you already have a list, try creating an Adjacency Matrix instead of a dictionary.
list_of_houses = []
n = len(list_of_houses)
# n x n adjacency matrix of zeros: no edges yet
directed_graph = [[0] * n for _ in range(n)]
Then for any new edge from one house to another (or whatever the connection is):
directed_graph[from_house][to_house] = 1
and you're done. If there is an edge from house_a to house_b then directed_graph[house_a][house_b] == 1.
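If the dictionary form is needed later, an adjacency matrix built this way converts directly: each row lists the columns that hold a 1. A minimal sketch; `matrix_to_dict` is a hypothetical name.

```python
def matrix_to_dict(matrix):
    """Convert a 0/1 adjacency matrix into {row index: [column indices with an edge]}."""
    return {i: [j for j, flag in enumerate(row) if flag]
            for i, row in enumerate(matrix)}

directed_graph = [[0, 1, 0],
                  [0, 0, 1],
                  [1, 0, 0]]
print(matrix_to_dict(directed_graph))  # {0: [1], 1: [2], 2: [0]}
```

The matrix makes "is there an edge from a to b?" a single index lookup, while the dictionary is more compact for sparse graphs and faster for iterating over a node's successors.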
