Finding all paths/walks of given length in a networkx graph - python

I'm using networkx and trying to find all the walks of length 3 in the graph, specifically the paths with three edges. I looked for suitable algorithms in the networkx documentation, but I could only find algorithms for the shortest path in the graph. Can I find the length of a path through specific nodes, for example a path through nodes 14 -> 11 -> 12 -> 16 if the shortest path is 14 -> 15 -> 16? Here's an image of a graph for an example:

Simplest version (another version is below which I think is faster):
def findPaths(G,u,n):
    if n==0:
        return [[u]]
    paths = [[u]+path for neighbor in G.neighbors(u) for path in findPaths(G,neighbor,n-1) if u not in path]
    return paths
This takes a network G and a node u and a length n. It recursively finds all paths of length n-1 starting from neighbors of u that don't include u. Then it sticks u at the front of each such path and returns a list of those paths.
Note, each path is an ordered list. They all start from the specified node. So for what you want, just wrap a loop around this:
allpaths = []
for node in G:
    allpaths.extend(findPaths(G,node,3))
Note that this will have any a-b-c-d path as well as the reverse d-c-b-a path.
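If each undirected path should be reported only once, one option is to keep a path only when it is not greater than its own reverse. The following is a minimal sketch under some assumptions: node labels are mutually comparable (for example, all integers), `find_paths` mirrors the function above but takes a plain adjacency dict `adj` instead of a networkx graph, and `unique_undirected_paths` is a hypothetical helper name.

```python
def find_paths(adj, u, n):
    """All simple paths with n edges starting at u (adj: dict node -> neighbors)."""
    if n == 0:
        return [[u]]
    return [[u] + path
            for neighbor in adj[u]
            for path in find_paths(adj, neighbor, n - 1)
            if u not in path]


def unique_undirected_paths(adj, n):
    """Keep one orientation of each path: the one not greater than its reverse."""
    paths = []
    for node in adj:
        for path in find_paths(adj, node, n):
            if path <= path[::-1]:
                paths.append(path)
    return paths


# A 4-cycle 0-1-2-3-0 with the chord 0-2 (illustrative graph only).
adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2]}
all_paths = [p for node in adj for p in find_paths(adj, node, 3)]
unique = unique_undirected_paths(adj, 3)
```

Because a simple path with at least one edge starts and ends at different nodes, it can never equal its reverse, so exactly one of the two orientations survives the comparison.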
If you find the "list comprehension" to be a challenge to interpret, here's an equivalent option:
def findPathsNoLC(G,u,n):
    if n==0:
        return [[u]]
    paths = []
    for neighbor in G.neighbors(u):
        for path in findPathsNoLC(G,neighbor,n-1):
            if u not in path:
                paths.append([u]+path)
    return paths
For optimizing this, especially if there are many cycles, it may be worth sending in a set of disallowed nodes. At each nested call it would know not to include any nodes from higher up in the recursion. This would work instead of the if u not in path check. The code would be a bit more difficult to understand, but it would run faster.
def findPaths2(G,u,n,excludeSet = None):
    if n==0:
        # return before touching excludeSet, so the end node of a finished
        # path is not left behind in the set
        return [[u]]
    if excludeSet is None:
        excludeSet = set([u])
    else:
        excludeSet.add(u)
    paths = [[u]+path for neighbor in G.neighbors(u) if neighbor not in excludeSet for path in findPaths2(G,neighbor,n-1,excludeSet)]
    excludeSet.remove(u)
    return paths
Note that I have to add u to excludeSet before the recursive call and then remove it before returning.
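The same add-then-remove bookkeeping can also be written as a generator, which avoids building the intermediate lists at every level. This is only a sketch: it assumes a plain adjacency dict `adj` in place of `G.neighbors(u)`, and the names `iter_paths` and `on_path` are made up for illustration.

```python
def iter_paths(adj, u, n, on_path=None):
    """Yield every simple path with n edges starting at u.

    adj is a dict mapping each node to a list of neighbors (an assumption
    for this sketch; with networkx, G.neighbors(u) plays the same role).
    """
    if on_path is None:
        on_path = set()
    if n == 0:
        yield [u]
        return
    on_path.add(u)              # u is now part of the current partial path
    for neighbor in adj[u]:
        if neighbor not in on_path:
            for path in iter_paths(adj, neighbor, n - 1, on_path):
                yield [u] + path
    on_path.remove(u)           # backtrack so sibling branches may use u again


adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2]}
paths_from_0 = list(iter_paths(adj, 0, 3))
print(paths_from_0)  # [[0, 1, 2, 3], [0, 3, 2, 1]]
```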

Related

Find shortest path on graph with 1 character difference

I have a somewhat complicated question. I am provided a list of words (each word has the same length). I am given two words into my function (StartNode and EndNode) and my task is to find the shortest path between the two (a follow-up would be how to collect all paths from startNode to EndNode). Two words can only be connected if they differ by at most one letter. For example, TRIE and TREE could be connected since they only differ by one letter (I vs. E), but TRIE and TWEP can't be connected since they differ in more than one character.
My solution was to first build an adjacency list, which I successfully implemented, and then compute a BFS to determine whether a path exists between the startNode and endNode. I am able to determine if a path exists but I'm unsure on how I can keep track of the path.
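createGraph itself is not shown here. Purely as an illustration of the adjacency-list step, a helper of that kind might look like the sketch below. The name `create_graph`, the wildcard-bucket approach, and the sample word list are all assumptions, and the sketch further assumes that the words are distinct and never contain the `*` character.

```python
from collections import defaultdict


def create_graph(words):
    """Map each word to the words that differ from it in exactly one position."""
    buckets = defaultdict(list)
    for word in words:
        for i in range(len(word)):
            # "TR*E" groups together every word that matches except at index i
            buckets[word[:i] + "*" + word[i + 1:]].append(word)

    adj = {word: [] for word in words}
    for group in buckets.values():
        for a in group:
            for b in group:
                if a != b and b not in adj[a]:
                    adj[a].append(b)
    return adj


adj_list = create_graph(["TRIE", "TREE", "TWEP", "FREE"])
print(adj_list["TRIE"])  # ['TREE']
print(adj_list["TWEP"])  # []
```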
My attempt is as follows:
def shortestPath(startNode, endNode, words):
    adjList=createGraph(words)
    print(adjList)
    #Returns shortest path from startNode to EndNode
    visited=set()
    q=collections.deque()
    total=-1
    q.append(startNode)
    while q:
        node=q.popleft()
        visited.add(node)
        if node==endNode:
            if node!=startNode:
                return total+1
        total=total+1
        for i in adjList[node]:
            if i not in visited:
                print(i)
                q.append(i)
    return -1
My BFS doesn't take in the path and the total_length is quite obviously wrong too. Is there any way I can improve my algorithm?
Sample Input:
{'POON': ['POIN', 'LOON'], 'PLEE': ['PLEA', 'PLIE'], 'SAME': [], 'POIE': ['PLIE', 'POIN'], 'PLEA': ['PLEE', 'PLIE'], 'PLIE': ['PLEE', 'POIE', 'PLEA'], 'POIN': ['POON', 'POIE'], 'LOON': ['POON']}
startWord: POON
endWord: PLEA
Expected Output:
POON -> POIN -> POIE -> PLIE -> PLEA
Current Output:
POIN
LOON
POIE
PLIE
PLEE
PLEA
PLEA
6
Any tips on where I am going wrong?
For anyone who stumbles upon this question, I did figure out a solution. A normal BFS only determines whether a path to the node exists, implicitly following the shortest traversal, but if you want to report that traversal (the path or its length), you need to keep two more pieces of bookkeeping.
In this case, I kept a predecessor for each node and its distance from the source. My function therefore becomes:
import collections

def shortestPath(startNode, endNode, words):
    adjList=createGraph(words)
    print(adjList)
    #Returns the length of the shortest path from startNode to EndNode
    pred={i:-1 for i in adjList} #Keep the predecessor to each node as -1 initially
    dist = {i:10000000 for i in adjList} #Initially set distance for each node from src to max
    #Distance and Predecessor:
    dist[startNode]=0 #distance from startNode to startNode is 0
    visited={startNode} #Mark nodes when they are enqueued so pred and dist are set only once
    q=collections.deque()
    q.append(startNode)
    while q:
        node=q.popleft()
        if node==endNode:
            findShortestPath(startNode, endNode, pred) #Pass it into another helper function since pred is already constructed
            return dist[node] #number of edges on the shortest path
        for i in adjList[node]:
            if i not in visited:
                visited.add(i)
                dist[i]=dist[node]+1
                pred[i]=node
                q.append(i)
    #If there is no available path between the two Nodes
    return -1
When the BFS is complete, we will also have the pred and dist dictionaries set up. pred maps each node to its predecessor on the path from start -> end (and -1 if no connection exists). To print out the path from start -> end, we can use a helper function.
Additionally, I also kept the distance dictionary. It holds the distance from the start node to each node that was reached.
Helper Function:
def findShortestPath(startNode, endNode, pred):
    path=[]
    crawl=endNode
    path.append(crawl)
    while (pred[crawl]!=-1):
        path.append(pred[crawl])
        crawl=pred[crawl]
    path.reverse()
    print(path)
This is kind of a Dijkstra's algorithm approach, but I'm unsure how else I could achieve this.
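For comparison, here is a compact, self-contained variant of the same idea, in which the predecessor map doubles as the visited set and a node is recorded the first time it is discovered. It is a sketch rather than a drop-in replacement: `bfs_shortest_path` is a made-up name, and the adjacency dict is the sample input from the question.

```python
from collections import deque


def bfs_shortest_path(adj, start, end):
    """Return the shortest path from start to end as a list, or None."""
    pred = {start: None}          # doubles as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == end:
            path = []
            while node is not None:   # walk predecessors back to start
                path.append(node)
                node = pred[node]
            return path[::-1]
        for neighbor in adj[node]:
            if neighbor not in pred:  # first discovery is along a shortest route
                pred[neighbor] = node
                queue.append(neighbor)
    return None


adj = {'POON': ['POIN', 'LOON'], 'PLEE': ['PLEA', 'PLIE'], 'SAME': [],
       'POIE': ['PLIE', 'POIN'], 'PLEA': ['PLEE', 'PLIE'],
       'PLIE': ['PLEE', 'POIE', 'PLEA'], 'POIN': ['POON', 'POIE'],
       'LOON': ['POON']}
path = bfs_shortest_path(adj, 'POON', 'PLEA')
print(" -> ".join(path))  # POON -> POIN -> POIE -> PLIE -> PLEA
```

The number of edges on the path is `len(path) - 1`, so no separate counter is needed.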

how to create a route knowing starting point, ending point and distance to travel

I am trying to create a walking path on a map using Python. I need to set not only the start point and end point, but the distance to travel too, so I cannot just create the shortest path from point to point.
I started with osmnx and networkx. I created different paths, but I can not check their distance. Can not find anything on that on documentation.
The idea is to make a Telegram bot that creates a walking path, so the point is to walk for 5 km, for example. The bot was easy, but I have no idea how to create a route based on the distance I want to travel (with start and end points).
I'm not clear what the exact question is here, but as for:
I started with osmnx and networkx. I created different paths, but I can not check their distance. Can not find anything on that on documentation.
This functionality is explained in both the OSMnx usage examples/documentation as well as the NetworkX documentation:
import networkx as nx
import osmnx as ox
ox.config(use_cache=True, log_console=True)
# get a graph, an origin, and a destination
G = ox.graph_from_place('Piedmont, CA, USA', network_type='drive')
orig, dest = list(G)[0], list(G)[-10]
# calculate the shortest path from origin to destination
path = nx.shortest_path(G, orig, dest, weight='length')
# the length of each edge traversed along the path
lengths = ox.utils_graph.get_route_edge_attributes(G, path, 'length')
# the total length of the path
path_length = sum(lengths)
# or just directly calculate the shortest path's length from origin to destination
path_length = nx.shortest_path_length(G, orig, dest, weight='length')
If you want to find the shortest path of at least length L, you could use OSMnx's k_shortest_paths function, then iterate through the paths until you find one with length >= L (determining each one's length as demonstrated in the code snippet above).
L = 3400
paths = ox.k_shortest_paths(G, orig, dest, 1000, 'length')
for i, path in enumerate(paths):
    length = sum(ox.utils_graph.get_route_edge_attributes(G, path, 'length'))
    if length >= L:
        break
i, length, path
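Independent of OSMnx, the length check itself is only a sum over consecutive node pairs. The sketch below illustrates that step with plain Python. The `edge_length` dict keyed by `(u, v)` pairs, the toy values, and both function names are hypothetical and not part of OSMnx or NetworkX.

```python
def route_length(edge_length, path):
    """Sum the lengths of the edges between consecutive nodes of path."""
    return sum(edge_length[(u, v)] for u, v in zip(path, path[1:]))


def first_route_at_least(edge_length, candidate_paths, min_length):
    """Return (path, length) for the first candidate that is long enough, else None."""
    for path in candidate_paths:
        length = route_length(edge_length, path)
        if length >= min_length:
            return path, length
    return None


# Toy directed edge lengths in metres (illustrative values only).
edge_length = {("a", "b"): 1200.0, ("b", "d"): 900.0,
               ("a", "c"): 1500.0, ("c", "d"): 2100.0}
candidates = [["a", "b", "d"], ["a", "c", "d"]]  # assumed sorted shortest first
result = first_route_at_least(edge_length, candidates, 3400)
print(result)  # (['a', 'c', 'd'], 3600.0)
```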

Improving BFS performance with some kind of memoization

I'm trying to build an algorithm which will find the distances from one vertex to the others in a graph.
Let's say with the really simple example that my network looks like this:
network = [[0,1,2],[2,3,4],[4,5,6],[6,7]]
I created a BFS code which is supposed to find length of paths from the specified source to other graph's vertices
from itertools import chain
import numpy as np
n = 8
graph = {}
for i in range(0, n):
    graph[i] = []
# "network" is the list of groups defined above
for communes in network:
    for vertice in communes:
        work = communes.copy()
        work.remove(vertice)
        graph[vertice].append(work)
for k, v in graph.items():
    graph[k] = list(chain(*v))
def bsf3(graph, s):
    matrix = np.zeros([n,n])
    dist = {}
    visited = []
    queue = [s]
    dist[s] = 0
    visited.append(s)
    matrix[s][s] = 0
    while queue:
        v = queue.pop(0)
        for neighbour in graph[v]:
            if neighbour in visited:
                pass
            else:
                matrix[s][neighbour] = matrix[s][v] + 1
                queue.append(neighbour)
                visited.append(neighbour)
    return matrix
bsf3(graph,2)
First I'm creating the graph (a dictionary) and then I use the function to find the distances.
What I'm concerned about is that this approach doesn't work with larger networks (let's say with 1000 people in there). What I'm thinking about is using some kind of memoization (that's actually why I made a matrix instead of a list). The idea is that when the algorithm calculates the path from, let's say, 0 to 3 (which it does already), it should also keep track of other routes, in such a way that matrix[1][3] = 1, etc.
So when I call the function like bsf3(graph, 1), it would not calculate everything from scratch, but would be able to access some values from the matrix.
Thanks in advance!
I know this does not fully answer your question, but here is another approach you can try.
In networking, each node has a routing table. For every destination node in the network, you simply store which neighboring node to go to next. Example of the routing table of node D:
A -> B
B -> B
C -> E
D -> D
E -> E
You need to run BFS from each node to build all the routing tables, which takes O(|V|*(|V|+|E|)) time. The space complexity is quadratic, but you have to check all possible paths anyway.
Once you have created all this information, you can simply start from a node, look up your destination node in its table, and find the next node to go to. This gives a better time complexity (if you use the right data structure for the table).
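A rough sketch of the routing-table idea, using the example network from the question, is given below. The function names and the choice to store distances alongside the next hop are assumptions made for illustration. The sketch also uses a `deque` and dict lookups instead of `list.pop(0)` and a visited list, which keeps each BFS linear in the size of the graph.

```python
from collections import deque


def build_graph(groups):
    """Connect every pair of vertices that share a group."""
    graph = {}
    for group in groups:
        for v in group:
            graph.setdefault(v, set()).update(w for w in group if w != v)
    return graph


def routing_table(graph, source):
    """BFS from source; return ({node: distance}, {node: first hop from source})."""
    dist = {source: 0}
    next_hop = {source: source}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                # leaving the source, the first hop is the neighbor itself;
                # further out, inherit the hop used to reach v
                next_hop[w] = w if v == source else next_hop[v]
                queue.append(w)
    return dist, next_hop


network = [[0, 1, 2], [2, 3, 4], [4, 5, 6], [6, 7]]
graph = build_graph(network)
tables = {v: routing_table(graph, v) for v in graph}  # one BFS per node
dist_from_2, hops_from_2 = tables[2]
print(dist_from_2[7], hops_from_2[7])  # 3 4
```

After the tables are built, a distance query such as `tables[0][0][7]` is a pair of dictionary lookups rather than a new traversal.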

How can I make a recursive search for longest node more efficient?

I'm trying to find the longest path in a directed acyclic graph. At the moment, my code seems to have a time complexity of O(n^3).
The graph is of input {0: [1,2], 1: [2,3], 3: [4,5] }
#Input: dictionary: graph, int: start, list: path
#Output: List: the longest path in the graph (Recurrence)
# This is a modification of a depth first search
def find_longest_path(graph, start, path=[]):
    path = path + [start]
    paths = path
    for node in graph[start]:
        if node not in path:
            newpaths = find_longest_path(graph, node, path)
            #Only take the new path if its length is greater than the current path
            if(len(newpaths) > len(paths)):
                paths = newpaths
    return paths
It returns a list of nodes in the form e.g. [0,1,3,5]
How can I make this more efficient than O(n^3)? Is recursion the right way to solve this or should I be using a different loop?
You can solve this problem in O(n+e) (i.e. linear in the number of nodes + edges).
The idea is that you first create a topological sort (I'm a fan of Tarjan's algorithm) and the set of reverse edges. It always helps if you can decompose your problem to leverage existing solutions.
You then walk the topological sort backwards pushing to each parent node its child's distance + 1 (keeping maximums in case there are multiple paths). Keep track of the node with the largest distance seen so far.
When you have finished annotating all of the nodes with distances you can just start at the node with the largest distance which will be your longest path root, and then walk down your graph choosing the children that are exactly one count less than the current node (since they lie on the critical path).
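The steps above can be sketched roughly as follows. This is an illustration under stated assumptions rather than the exact procedure described: the topological order comes from a plain depth-first post-order, each node pulls the maximum from its children instead of pushing along reverse edges, node labels are assumed to be sortable, and `longest_path_dag` is a made-up name.

```python
def longest_path_dag(graph):
    """Longest path (as a node list) in a DAG given as {node: [children]}."""
    nodes = set(graph)
    for children in graph.values():
        nodes.update(children)          # leaves may not appear as keys

    # Depth-first post-order: every node is appended after all its descendants,
    # so the list is a topological order read backwards.
    order, seen = [], set()

    def visit(node):
        seen.add(node)
        for child in graph.get(node, []):
            if child not in seen:
                visit(child)
        order.append(node)

    for node in sorted(nodes):
        if node not in seen:
            visit(node)

    # dist[node] = number of nodes on the longest path starting at node
    dist = {}
    for node in order:                  # children are always handled first
        dist[node] = 1 + max((dist[c] for c in graph.get(node, [])), default=0)

    # Start at the node with the largest distance and follow the critical path.
    node = max(order, key=dist.get)
    path = [node]
    while dist[node] > 1:
        node = next(c for c in graph[node] if dist[c] == dist[node] - 1)
        path.append(node)
    return path


dag = {0: [1, 2], 1: [2, 3], 3: [4, 5]}
print(longest_path_dag(dag))  # [0, 1, 3, 4]
```

Each node and each edge is handled a constant number of times, apart from the initial `sorted` call, which is only there to make the output deterministic.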
In general, when trying to find an optimal complexity algorithm don't be afraid to run multiple stages one after the other. Five O(n) algorithms run sequentially are still O(n) and still better than O(n^2) from a complexity perspective (although the real running time may be worse depending on the constant costs/factors and the size of n).
ETA: I just noticed you have a start node. This makes it simply a case of doing a depth first search and keeping the longest solution seen so far which is just O(n+e) anyway. Recursion is fine or you can keep a list/stack of visited nodes (you have to be careful when finding the next child each time you backtrack).
As you backtrack from your depth first search you need to store the longest path from that node to a leaf so that you don't re-process any sub-trees. This will also serve as a visited flag (i.e. in addition to doing the node not in path test also have a node not in subpath_cache test before recursing). Instead of storing the subpath you could store the length and then rebuild the path once you're finished based on sequential values as discussed above (critical path).
ETA2: Here's a solution.
def find_longest_path_rec(graph, parent, cache):
    maxlen = 0
    for node in graph[parent]:
        if node in cache:
            pass
        elif node not in graph:
            cache[node] = 1
        else:
            cache[node] = find_longest_path_rec(graph, node, cache)
        maxlen = max(maxlen, cache[node])
    return maxlen + 1

def find_longest_path(graph, start):
    cache = {}
    maxlen = find_longest_path_rec(graph, start, cache)
    path = [start]
    for i in range(maxlen-1, 0, -1):
        for node in graph[path[-1]]:
            if cache[node] == i:
                path.append(node)
                break
        else:
            assert(0)
    return path
Note that I've removed the node not in path test because I'm assuming that you're actually supplying a DAG as claimed. If you want that check you should really be raising an error rather than ignoring it. Also note that I've added the assertion to the else clause of the for to document that we must always find a valid next (sequential) node in the path.
ETA3: The final for loop is a little confusing. What we're doing is considering that in the critical path all of the node distances must be sequential. Consider node 0 is distance 4, node 1 is distance 3 and node 2 is distance 1. If our path started [0, 2, ...] we have a contradiction because node 0 is not 1 further from a leaf than 2.
There are a couple of non-algorithmic improvements I'd suggest (these are related to Python code quality):
def find_longest_path_from(graph, start, path=None):
    """
    Returns the longest path in the graph from a given start node
    """
    if path is None:
        path = []
    path = path + [start]
    max_path = path
    nodes = graph.get(start, [])
    for node in nodes:
        if node not in path:
            candidate_path = find_longest_path_from(graph, node, path)
            if len(candidate_path) > len(max_path):
                max_path = candidate_path
    return max_path

def find_longest_path(graph):
    """
    Returns the longest path in a graph
    """
    max_path = []
    for node in graph:
        candidate_path = find_longest_path_from(graph, node)
        if len(candidate_path) > len(max_path):
            max_path = candidate_path
    return max_path
Changes explained:
def find_longest_path_from(graph, start, path=None):
    if path is None:
        path = []
I've renamed find_longest_path as find_longest_path_from to better explain what it does.
Changed the path argument to have a default argument value of None instead of []. Unless you know you will specifically benefit from them, you want to avoid using mutable objects as default arguments in Python. This means you should typically set path to None by default and then when the function is invoked, check whether path is None and create an empty list accordingly.
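To make the pitfall concrete, here is a small standalone demonstration (the function names are invented for this example). Note that the original function happens to be safe, because `path + [start]` builds a new list rather than mutating the default. The problem only appears when the default object itself is modified.

```python
def append_shared(item, bucket=[]):
    """The default list is created once, when the function is defined."""
    bucket.append(item)
    return bucket


def append_fresh(item, bucket=None):
    """A new list is created on every call that omits bucket."""
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


first_shared = list(append_shared(1))   # copy, to record the state after call 1
second_shared = list(append_shared(2))  # the same default list has grown
first_fresh = append_fresh(1)
second_fresh = append_fresh(2)

print(first_shared, second_shared)  # [1] [1, 2]
print(first_fresh, second_fresh)    # [1] [2]
```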
    max_path = path
    ...
            candidate_path = find_longest_path_from(graph, node, path)
    ...
I've updated the names of your variables from paths to max_path and newpaths to candidate_path. These were confusing variable names because they referred to the plural of path -- implying that the value they stored consisted of multiple paths -- when in fact they each just held a single path. I tried to give them more descriptive names.
    nodes = graph.get(start, [])
    for node in nodes:
Your code errors out on your example input because the leaf nodes of the graph are not keys in the dict so graph[start] would raise a KeyError when start is 2, for instance. This handles the case where start is not a key in graph by returning an empty list.
def find_longest_path(graph):
    """
    Returns the longest path in a graph
    """
    max_path = []
    for node in graph:
        candidate_path = find_longest_path_from(graph, node)
        if len(candidate_path) > len(max_path):
            max_path = candidate_path
    return max_path
A method to find the longest path in a graph that iterates over the keys. This is entirely separate from your algorithmic analysis of find_longest_path_from but I wanted to include it.

Discover All Paths in Single Source, Multi-Terminal (possibly cyclic) Directed Graph

I have a graph G = (V,E), where
V is a subset of {0, 1, 2, 3, …}
E is a subset of VxV
There are no unconnected components in G
The graph may contain cycles
There is a known node v in V, which is the source; i.e. there is no u in V such that (u,v) is an edge
There is at least one sink/terminal node v in V; i.e. there is no u in V such that (v,u) is an edge. The identities of the terminal nodes are not known - they must be discovered through traversal
What I need to do is to compute a set of paths P such that every possible path from the source node to any terminal node is in P. Now, if the graph contains cycles, it is possible that by this definition, P becomes an infinite set. This is not what I need. Rather, what I need is for P to contain a path that doesn't explore the loop and at least one path that does explore the loop.
I say "at least one path that does explore the loop", as the loop may contain branches internally, in which case, all of those branches will need to be explored as well. Thus, if the loop contains two internal branches, each with a branching factor of 2, then I need a total of four paths in P that explore the loop.
For example, an algorithm run on the following graph:
         +-------+
         |       |
         v       |
1->2->3->4->5->6 |
         |  |  | |
         v  |  v |
         9  +->7-+
               |
               v
               8
which can be represented as:
1:{2}
2:{3}
3:{4}
4:{5,9}
5:{6,7}
6:{7}
7:{4,8}
8:{}
9:{}
Should produce the set of paths:
1,2,3,4,9
1,2,3,4,5,6,7,8
1,2,3,4,5,6,7,4,9
1,2,3,4,5,7,8
1,2,3,4,5,7,4,9
1,2,3,4,5,7,4,5,6,7,8
1,2,3,4,5,7,4,5,7,8
Thus far, I have the following algorithm (in python) that works in some simple cases:
def extractPaths(G, s=None, explored=None, path=None):
    _V,E = G
    if s is None: s = 0
    if explored is None: explored = set()
    if path is None: path = [s]
    explored.add(s)
    if not len(set(E[s]) - explored):
        print(path)
    for v in set(E[s]) - explored:
        if len(E[v]) > 1:
            path.append(v)
            for vv in set(E[v]) - explored:
                extractPaths(G, vv, explored-set(n for n in path if len(E[n])>1), path+[vv])
        else:
            extractPaths(G, v, explored, path+[v])
but it fails horribly in the more complex cases.
I'd appreciate any help as this is a tool to validate an algorithm that I have developed for my Master's thesis.
Thank you in advance
I've thought about this for a couple of hours, and have come up with this algorithm. It doesn't quite give the result you're asking for, but it's similar (and might be equivalent).
Observation: If we try to go to a node that has been seen before, the most recent visit, up until the current node, can be considered a loop. If we have seen that loop, we cannot go to that node.
def extractPaths(current_node,path,loops_seen):
    path.append(current_node)
    # if node has outgoing edges
    if nodes[current_node]!=None:
        for thatnode in nodes[current_node]:
            valid=True
            # if the node we are going to has been
            # visited before, we are completing
            # a loop.
            if thatnode-1 in path:
                i=len(path)-1
                # find the last time we visited
                # that node
                while path[i]!=thatnode-1:
                    i-=1
                # the last time, to this time is
                # a single loop.
                new_loop=path[i:len(path)]
                # if we haven't seen this loop, go to
                # the node and note that we have seen
                # this loop. else don't go to the node.
                if new_loop in loops_seen:
                    valid=False
                else:
                    loops_seen.append(new_loop)
            if valid:
                extractPaths(thatnode-1,path,loops_seen)
    # this is the end of the path
    else:
        newpath=list()
        # increment all the values for printing
        for i in path:
            newpath.append(i+1)
        found_paths.append(newpath)
    # backtrack
    path.pop()

# graph defined by lists of outgoing edges
nodes=[[2],[3],[4],[5,9],[6,7],[7],[4,8],None,None]
found_paths=list()
extractPaths(0,list(),list())
for i in found_paths:
    print(i)
