algorithm - Path finding through a specific vertex - python

I am looking for a way to find a loopless path (preferably the shortest one, but not necessarily) from a source vertex (S) to a destination vertex (D) that passes through another specific vertex (X) somewhere in the graph.
Now, before you point me to "Finding shortest path between pass through a specific vertex": that solution ignores the case where the shortest path from S to X already includes D, which is a possible scenario where I am applying this algorithm. How would you solve this problem in that case?
What I tried was a naive attempt to look for such paths in the results of Yen's K shortest paths algorithm, but I am hoping there is a more efficient and more reliable way to do that.
Again, just to point it out: I am not necessarily looking for the shortest path from S to D through X. Any loopless path will do, although a shortest path would be better.

The basic concept is quite simple; then you get to adapt for cases that loop into and out of X on their shortest remaining paths.
1. Remove D from the graph.
2. Find P1, the shortest path from S to X.
3. Restore D to the graph.
4. Remove all nodes in P1 (except X itself).
5. Find P2, the shortest path from X to D.
6. Return P1 + P2.
That's the gist of the solution.
NOTE: you may find that removing P1 yields a subgraph with no remaining path to D. In this case, you will want a dynamic programming solution that applies the idea above, but with backtracking and another method to generate P1 candidates.
When you first find P1, check that the node you're about to use will not isolate X from D on the second leg of the trip. This will give you a much faster search algorithm.
Is that enough of a start?
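The basic steps above can be sketched roughly as follows. This is a minimal illustration, not a complete solution: it assumes an unweighted directed graph stored as a dict of adjacency lists, uses breadth-first search as the shortest-path routine, and the helper names `bfs_path` and `path_via` are made up for this example. It simply gives up (returns None) when the second leg is blocked, which is exactly the case that needs backtracking.

```python
from collections import deque

def bfs_path(adj, start, goal, banned=frozenset()):
    """Fewest-edge path from start to goal that avoids `banned`, or None."""
    if start in banned or goal in banned:
        return None
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in adj.get(node, ()):
            if nxt not in parent and nxt not in banned:
                parent[nxt] = node
                queue.append(nxt)
    return None

def path_via(adj, s, x, d):
    """S -> X with D removed, then X -> D with the first leg removed."""
    p1 = bfs_path(adj, s, x, banned={d})
    if p1 is None:
        return None
    # Ban every node of the first leg except X, where the second leg starts.
    p2 = bfs_path(adj, x, d, banned=set(p1) - {x})
    if p2 is None:
        return None  # the basic idea fails here; backtracking is needed
    return p1 + p2[1:]

# Hypothetical graph in which the shortest S -> X route would run through D.
adj = {"S": ["A", "D"], "A": ["B"], "B": ["X"], "X": ["C"], "C": ["D"], "D": ["X"]}
print(path_via(adj, "S", "X", "D"))
```

On this made-up graph the unconstrained shortest route from S to X is S, D, X, so removing D first is what forces the detour through A and B.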
The need to adapt comes from a case such as this --
consider the graph
src dst
S 1, 2
1 X, D
2 D
X 1
Your partial paths are
S -> 1 -> X
S -> 2 -> 3 -> X
X -> 1 -> D
and, incidentally,
S -> 1 -> D
When you run your shortest-path searches, you get the path S 1 X 1 D, rejected because of the loop. When you implement my first modification -- remove node 1 when trying to find a path X to D, there is no remaining path.
The algorithm needs the ability to back up, rejecting the path X 1 D to find X 2 3 D. That's the coding that isn't immediately obvious from the description.
Here's a mental exercise for you: is it possible to construct a graph in which each shortest path (S to X and X to D) isolates the other terminal node from X? In my example above, you can simply switch the process: when the S to X path isolates D, then start over: first find X to D, remove node 1, and then find S to X in the remaining graph.
Can you find a graph where this switch doesn't work, either?
If not, you have an immediate solution. If so, you have a more complex case to handle.
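For the more complex case, one fallback is a plain depth-first search that backs up whenever it hits a dead end. The sketch below is only an illustration under assumptions: the graph is a dict of adjacency lists, the function name `simple_path_via` is hypothetical, the result is any loopless path rather than a shortest one, and the running time can be exponential in the worst case. The second graph is a variant of the example in which 2 leads to 3 and 3 leads to X, which is a guess at what the partial paths imply, not something stated above.

```python
def simple_path_via(adj, s, x, d):
    """Return some loopless path from s to d that visits x, or None."""
    def dfs(node, path, on_path):
        if node == d:
            # d ends the path; accept it only if x was visited on the way.
            return list(path) if x in on_path else None
        for nxt in adj.get(node, ()):
            if nxt in on_path:
                continue  # would create a loop
            path.append(nxt)
            on_path.add(nxt)
            found = dfs(nxt, path, on_path)
            if found is not None:
                return found
            path.pop()  # back up and try the next neighbor
            on_path.discard(nxt)
        return None
    return dfs(s, [s], {s})

# The example graph exactly as tabulated: no loopless S -> X -> D path exists.
as_listed = {"S": ["1", "2"], "1": ["X", "D"], "2": ["D"], "X": ["1"]}
# Assumed variant with the extra route S -> 2 -> 3 -> X.
variant = {"S": ["1", "2"], "1": ["X", "D"], "2": ["3"], "3": ["X"], "X": ["1"]}

print(simple_path_via(as_listed, "S", "X", "D"))
print(simple_path_via(variant, "S", "X", "D"))
```

In the variant, the search first tries S, 1, X, finds X boxed in, backs up, and eventually returns the route through 2 and 3.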

Related

"Bidirectional Dijkstra" by NetworkX

I just read the NetworkX implementation of Dijkstra's algorithm for shortest paths using bidirectional search. What is the termination point of this method?
I'm going to base this on networkx's implementation.
Bidirectional Dijkstra stops when it encounters the same node in both directions - but the path it returns at that point might not be through that node. It's doing additional calculations to track the best candidate for the shortest path.
I'm going to base my explanation on your comment (on this answer):
Consider this simple graph (with nodes A,B,C,D,E). The edges of this graph and their weights are: "A->B:1","A->C:6","A->D:4","A->E:10","D->C:3","C->E:1". when I use Dijkstra algorithm for this graph in both sides: in forward it finds B after A and then D, in backward it finds C after E and then D. in this point, both sets have same vertex and an intersection. Does this is the termination point or It must be continued? because this answer (A->D->C->E) is incorrect.
For reference, here's the graph:
When I run networkx's bidirectional Dijkstra on the (undirected) network from the counterexample in that comment, "A->B:1","A->C:6","A->D:4","A->E:10","D->C:3","C->E:1", it gives me (7, ['A', 'C', 'E']), not A-D-C-E.
The problem is in a misunderstanding of what it's doing before it stops. It does exactly what you're expecting in terms of finding nodes, but while it's doing that there is additional processing happening to find the shortest path. By the time it reaches D from both directions, it has already collected some other "candidate" paths that may be shorter. There is no guarantee that just because the node D is reached from both directions that ends up being part of the shortest path. Rather, at the point that a node has been reached from both directions, the current candidate shortest path is shorter than any candidate paths it would find if it continued running.
The algorithm starts with two empty clusters, each associated with A or E
{} {}
and it will build up "clusters" around each. It first puts A into the cluster associated with A
{A:0} {}
Now it checks if A is already in the cluster around E (which is currently empty). It is not. Next, it looks at each neighbor of A and checks if they are in the cluster around E. They are not. It then places all of those neighbors into a heap (like an ordered list) of upcoming neighbors of A, ordered by path length from A. Call this the 'fringe' of A.
clusters ..... fringes
{A:0} {} ..... A:[(B,1), (D,4), (C,6), (E,10)]
E:[]
Now it checks E. For E it does the symmetric thing. It places E into its cluster and checks that E is not in the cluster around A. Then it checks all of E's neighbors to see if any are in the cluster around A (they are not). Then it creates the fringe of E.
clusters ..... fringes
{A:0} {E:0} ..... A:[(B,1), (D,4), (C,6), (E,10)]
E:[(C,1), (A,10)]
Now it goes back to A. It takes B from the list and adds it to the cluster around A. It checks if any neighbor of B is in the cluster around E (there are no neighbors to consider). So we have:
clusters ..... fringes
{A:0, B:1} {E:0} ..... A:[(D,4), (C,6), (E,10)]
E:[(C,1), (A,10)]
Back to E: we add C to the cluster of E and check whether any neighbor of C is in the cluster of A. What do you know, there's A. So we have a candidate shortest path A-C-E, with distance 7. We'll hold on to that. We add D to the fringe of E (with distance 4, since it's 1+3). We have:
clusters ..... fringes
{A:0, B:1} {E:0, C:1} ..... A:[(D,4), (C,6), (E,10)]
E:[(D,4), (A,10)]
candidate path: A-C-E, length 7
Back to A: We get the next thing from its fringe, D. We add it to the cluster about A, and note that its neighbor C is in the cluster about E. So we have a new candidate path, A-D-C-E, but its length (8) is greater than 7, so we discard it.
clusters ..... fringes
{A:0, B:1, D:4} {E:0, C:1} ..... A:[(C,6), (E,10)]
E:[(D,4), (A,10)]
candidate path: A-C-E, length 7
Now we go back to E. We look at D. It's in the cluster around A. We can be sure that any future candidate path we would encounter will have length at least as large as the A-D-C-E path we have just traced out (this claim isn't necessarily obvious, but it is the key to this approach). So we can stop. We return the candidate path found earlier.
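The walkthrough can be reproduced with a small pure-Python sketch. To be clear, this is not NetworkX's source code: it is a simplified illustration of the same idea (alternate between the two ends, remember the best candidate seen so far, stop once a node has been finalized from both sides), and the function name `bidirectional_dijkstra_sketch` and the dict-of-dicts graph layout are assumptions made for this example. It may register a candidate slightly earlier than the narration above, because it checks for a meeting point as soon as a node enters a fringe.

```python
import heapq

def bidirectional_dijkstra_sketch(adj, source, target):
    """Shortest path in an undirected weighted graph given as {u: {v: weight}}."""
    if source == target:
        return 0, [source]
    dists = [{}, {}]                               # finalized "clusters"
    seen = [{source: 0}, {target: 0}]              # best tentative distances
    paths = [{source: [source]}, {target: [target]}]
    fringe = [[(0, source)], [(0, target)]]
    best_len, best_path = float("inf"), None
    direction = 1
    while fringe[0] and fringe[1]:
        direction = 1 - direction                  # alternate between the ends
        dist, v = heapq.heappop(fringe[direction])
        if v in dists[direction]:
            continue                               # stale heap entry
        dists[direction][v] = dist
        if v in dists[1 - direction]:
            # v is finalized from both sides: no better candidate can appear.
            return best_len, best_path
        for w, weight in adj[v].items():
            new_dist = dist + weight
            if w not in seen[direction] or new_dist < seen[direction][w]:
                seen[direction][w] = new_dist
                heapq.heappush(fringe[direction], (new_dist, w))
                paths[direction][w] = paths[direction][v] + [w]
                if w in seen[0] and w in seen[1]:
                    # The two searches touch at w: record a candidate path.
                    total = seen[0][w] + seen[1][w]
                    if total < best_len:
                        best_len = total
                        best_path = paths[0][w] + paths[1][w][::-1][1:]
    return float("inf"), None

edges = [("A", "B", 1), ("A", "C", 6), ("A", "D", 4),
         ("A", "E", 10), ("D", "C", 3), ("C", "E", 1)]
adj = {}
for u, v, w in edges:
    adj.setdefault(u, {})[v] = w
    adj.setdefault(v, {})[u] = w

print(bidirectional_dijkstra_sketch(adj, "A", "E"))
```

The search stops when D is finalized from both ends, yet the path it returns is the earlier candidate A-C-E of length 7, not a path through D.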

Prune nodes not in networkx simple path?

I have a DiGraph and I want to prune any node that's not contained in one of the simple paths between two of the nodes that I specify. (Another way to think of it is any node that can't reach both the start and end points should be trimmed).
The best way I've found to do this is to get all_simple_paths, then to rebuild a new graph using those, but I'm hoping for a more elegant and less error prone solution. For example, is there a way to determine what's NOT on a simple path, and to then delete those nodes?
You can use the method all_simple_paths which returns a generator but you only need the first path. Then you can use the G.subgraph(nbunch) to return the induced graph from your path.
EDIT: to return the subgraph induced by all simple paths, just concatenate the unique nodes returned by all_simple_paths.
import itertools
import networkx as nx

G = nx.complete_graph(10)  # or DiGraph, MultiGraph, MultiDiGraph, etc.
# Concatenate all the paths and keep the unique nodes
all_path_nodes = set(itertools.chain.from_iterable(
    nx.all_simple_paths(G, source=0, target=3)))
# Extract the subgraph induced by those nodes
H = G.subgraph(all_path_nodes)
# Note: nx.info was removed in NetworkX 3.0; on newer versions use print(H).
print(nx.info(H))
Output:
Name: complete_graph(10)
Type: Graph
Number of nodes: 10
Number of edges: 45
Average degree: 9.0000
I did make some progress on this while @kikohs was working to understand my question and provide his answer, so I'm posting this as an alternative solution to the problem. I do think his answer is superior though!
import networkx

def _trim_branches(self, g, start, end):
    """Find all the paths from start to end, and nuke any nodes that
    aren't in those paths.
    """
    good_nodes = set()
    for path in networkx.all_simple_paths(g, source=start, target=end):
        good_nodes.update(path)
    # Copy the node list first: removing nodes while iterating over
    # g.nodes raises a RuntimeError.
    for node in list(g.nodes):
        if node not in good_nodes:
            g.remove_node(node)
    return g
Using subgraph to do the second loop is clearly better, as is his one-liner using itertools.chain. Great stuff around these parts today!
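Enumerating every simple path can get expensive on dense graphs, so a cheaper pre-filter based on the question's "can't reach both the start and end points" remark may be worth a look. The sketch below is an assumption-laden illustration, not an equivalent replacement: it keeps nodes that are reachable from the start and can reach the end, which is only a necessary condition for lying on a simple path. The helper names and the edge-list input are made up for this example.

```python
from collections import deque

def reachable_from(adj, start):
    """All nodes reachable from start in the adjacency dict adj."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def candidate_nodes(edges, start, end):
    """Nodes reachable from start that can also reach end (directed edges)."""
    forward, backward = {}, {}
    for u, v in edges:
        forward.setdefault(u, []).append(v)
        backward.setdefault(v, []).append(u)
    return reachable_from(forward, start) & reachable_from(backward, end)

# Hypothetical digraph: z is a dead end after c, y only leads into a,
# and e hangs off b in a two-node cycle.
edges = [("a", "b"), ("b", "c"), ("b", "e"), ("e", "b"), ("c", "z"), ("y", "a")]
keep = candidate_nodes(edges, "a", "c")
print(sorted(keep))
```

Here z and y are pruned, but e survives even though no simple path from a to c can visit it (it would have to pass through b twice). If exactness matters, the all_simple_paths approach is still needed on whatever remains.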

Efficiently compute the number of shortest path for a graph with 23000000 nodes using igraph

I am trying to compute the number of shortest paths between two nodes that are at distance 2 from each other, in a sparse graph that contains 23000000 vertices and around 9 x 23000000 edges. Right now I am using
for v,d,parent in graph.bfsiter(source.index, advanced=True):
if (0 < d < 3):
to loop through the nodes which are within distance 2 of the source node (I need the nodes which are in distance 1 but I don't need to compute all shortest path for them). And then I use:
len(graph.get_all_shortest_paths(source, v))
to get the number of all shortest paths from source to v (where v is the node that bfsiter gives me which has the shortest distance 2 from the source).
However, this is taking too long. For example, for the graph described above it takes around 1 second to compute the shortest paths for each (source, v) pair.
I was wondering if someone could suggest a more efficient way to compute the number of all shortest paths using igraph.
Here is an implementation of the answer suggested in the comments. The time consuming part of this code is the graph generation. To run on an already generated/in-memory graph takes very little time.
import random
from igraph import Graph

# Generate a graph
numnodes = int(1e6)
thegraph = Graph.GRG(numnodes, 0.003)
print("Graph Generated")
# Choose one node randomly and another at a distance of exactly 2
node1 = random.randint(0, numnodes - 1)
within_2 = set(thegraph.neighborhood(node1, 2))
within_1 = set(thegraph.neighborhood(node1, 1))
# random.sample no longer accepts a set (Python 3.11+), so pick from a list
node2 = random.choice(list(within_2 - within_1))
# Count the nodes that are adjacent to both endpoints
result = len(set(thegraph.neighbors(node1)).intersection(
    thegraph.neighbors(node2)))
print(result)
The size of the intersection of the two neighbor sets is the number of distinct shortest paths. A path of length 2 visits 3 nodes. Since we know the start and end points, the only node that may vary is the middle one. Since the middle node must be at distance 1 from both endpoints, the number of distinct middle nodes is the number of paths of length 2 between the two nodes.
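The counting argument does not depend on igraph. As a small self-contained illustration (the adjacency dict and the function name `count_paths_of_length_two` are assumptions made for this example), the same computation on a toy undirected graph looks like this:

```python
def count_paths_of_length_two(adj, u, v):
    """Number of middle nodes adjacent to both u and v."""
    return len(set(adj[u]) & set(adj[v]))

# Toy undirected graph: 0 and 4 are not adjacent and share neighbors 1 and 2.
adj = {0: [1, 2, 3], 1: [0, 4], 2: [0, 4], 3: [0], 4: [1, 2]}
print(count_paths_of_length_two(adj, 0, 4))
```

The count equals the number of shortest paths only when the two nodes really are at distance 2, that is, when they are distinct and not adjacent.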

Why doesn't the linear shortest path algorithm work for non-directed cyclic graphs?

I have the basic linear shortest path algorithm implemented in Python. According to various sites I've come across, this only works for directed acyclic graphs, including this, this, and this. However, I don't see why this is the case.
I've even tested the algorithm against graphs with cycles and un-directed edges, and it worked fine.
So the question is, why doesn't the linear shortest path algorithm work for non-directed cyclic graphs? Side question, what is the name of this algorithm?
For reference, here is the code I wrote for the algorithm:
def shortestPath(start, end, graph):
    # First, topologically sort the graph, to determine which order to traverse it in
    order = toplogicalSort(start, graph)
    # Get ready to store the current weight of each node's path, and their predecessor
    # (this layout assumes nodes are numbered 0..n-1 and that start is node 0)
    weights = [0] + [float('inf')] * (len(graph) - 1)
    predecessor = [0] * len(graph)
    # Next, relax all edges in the order of the sorted nodes
    for node in order:
        for neighbour in graph[node]:
            # Check if it would be cheaper to take this path, as opposed to the last path
            if weights[neighbour[0]] > weights[node] + neighbour[1]:
                # If it is, then adjust the weight and predecessor
                weights[neighbour[0]] = weights[node] + neighbour[1]
                predecessor[neighbour[0]] = node
    # Return the shortest path to the end
    path = [end]
    while path[-1] != start:
        path.append(predecessor[path[-1]])
    return path[::-1]
Edit: As asked by Beta, here is the topological sort:
# Topologically sorts the graph given, starting from the start point given.
def toplogicalSort(start, graph):
    # Runs a DFS on all nodes connected to the starting node in the graph
    def DFS(start):
        for node in graph[start]:
            if not node[0] in checked:
                checked[node[0]] = True
                DFS(node[0])
        finish.append(start)
    # Stores the finish order of all nodes in the graph, and a boolean stating if they have been checked
    finish, checked = [], {}
    DFS(start)
    # Reverses the order of the sort, to get a proper topology; then returns
    return finish[::-1]
Because you cannot topologically sort a graph with cycles (undirected graphs are therefore also out of the question, as you can't tell which node should come before another).
Edit: After reading the comments, I think that's actually what @Beta meant.
When there is a cycle, the topological sort cannot guarantee that nodes are processed in the order the shortest path needs.
For example, we have a graph:
A->C, A->B, B->C, C->B, B->D
Say the correct shortest path is:
A->C->B->D
But topological sort can generate an order:
A->B->C->D
Visiting C will update B to the correct weight, but B won't be visited again, so the corrected weight is never propagated to D. (The path happens to be correct, though.)
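To make this concrete, here is a small sketch that relaxes each node's edges once in the order A, B, C, D and compares the result with Dijkstra's algorithm. The edge weights are hypothetical (the example above only lists the edges), chosen so that A->C->B->D is the true shortest path, and the function names are made up for this illustration.

```python
import heapq

# Hypothetical weights for the edges A->C, A->B, B->C, C->B, B->D.
graph = {
    "A": [("C", 1), ("B", 5)],
    "B": [("C", 1), ("D", 1)],
    "C": [("B", 1)],
    "D": [],
}

def one_pass(graph, order, start):
    """Relax every node's outgoing edges exactly once, in the given order."""
    dist = {node: float("inf") for node in graph}
    dist[start] = 0
    for node in order:
        for nxt, w in graph[node]:
            dist[nxt] = min(dist[nxt], dist[node] + w)
    return dist

def dijkstra(graph, start):
    """Reference distances for non-negative weights."""
    dist = {node: float("inf") for node in graph}
    dist[start] = 0
    heap = [(0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for nxt, w in graph[node]:
            if d + w < dist[nxt]:
                dist[nxt] = d + w
                heapq.heappush(heap, (dist[nxt], nxt))
    return dist

stale = one_pass(graph, ["A", "B", "C", "D"], "A")
exact = dijkstra(graph, "A")
print(stale["D"], exact["D"])
```

With these weights the single pass leaves D at 6, because B's edges were relaxed while B still had weight 5, whereas the true distance is 3. B itself does end up with the right weight of 2; it just never gets a chance to pass it on.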

How to restrict certain paths in NetworkX graphs?

I am trying to calculate shortest path between 2 points using Dijkstra and A Star algorithms (in a directed NetworkX graph).
At the moment it works fine and I can see the calculated path but I would like to find a way of restricting certain paths.
For example if we have following nodes:
nodes = [1,2,3,4]
With these edges:
edges = ( (1,2),(2,3),(3,4) )
Is there a way of blocking/restricting 1 -> 2 -> 3 but still allow 2 -> 3 & 1 -> 2.
This would mean that:
can travel from 1 to 2
can travel from 2 to 3
cannot travel from 1 to 3 .. directly or indirectly (i.e. restrict 1->2->3 path).
Can this be achieved in NetworkX.. if not is there another graph library in Python that would allow this ?
Thanks.
Interesting question. I have never heard of this problem, probably because I don't have much background in this topic, nor much experience with NetworkX. However, I do have an idea for an algorithm. This may just be the most naive way to do this and I'd be glad to hear of a cleverer algorithm.
The idea is that you can use your restriction rules to transform your graph into a new graph where all edges are valid, using the following algorithm.
The restriction of path (1,2,3) can be split in two rules:
If you came over (1,2) then remove (2,3)
If you leave over (2,3) then remove (1,2)
To put this in the graph you can insert copies of node 2 for each case. I'll call the new nodes 1_2 and 2_3 after the valid edge in the respective case. For both nodes, you copy all incoming and outgoing edges minus the restricted edge.
For example:
Nodes = [1,2,3,4]
Edges = [(1,2),(2,3),(4,2)]
The valid path shall only be 4->2->3 not 1->2->3. So we expand the graph:
Nodes = [1,1_2,2_3,3,4] # insert the two states of 2
Edges = [ # first case: no (1_2,3) because of the restriction
    (1,1_2), (4,1_2),
    # 2nd case, no (1,2_3)
    (2_3,3), (4,2_3)]
The only valid path in this graph is 4->2_3->3. This simply maps to 4->2->3 in the original graph.
I hope this answer can at least help you if you find no existing solution. Longer restriction rules would blow up the graph with an exponentially growing number of state nodes, so either this algorithm is too simple, or the problem is hard ;-)
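The node-splitting idea can be sketched in a few lines of plain Python. This is only an illustration under assumptions: edges are directed pairs, the middle node has no self-loop, a single forbidden triple is handled, and the function names `forbid_triple` and `reachable` are invented for this example. The resulting edge list could be loaded into a NetworkX DiGraph and searched with the usual shortest-path functions.

```python
from collections import deque

def forbid_triple(edges, a, b, c):
    """Split b into two state nodes so that a -> b -> c can no longer be walked."""
    arrived = f"{a}_{b}"   # copy of b that has no edge to c
    leaving = f"{b}_{c}"   # copy of b that has no edge from a
    new_edges = []
    for u, v in edges:
        if v == b:                      # incoming edge of b
            new_edges.append((u, arrived))
            if u != a:
                new_edges.append((u, leaving))
        elif u == b:                    # outgoing edge of b
            new_edges.append((leaving, v))
            if v != c:
                new_edges.append((arrived, v))
        else:
            new_edges.append((u, v))
    return new_edges

def reachable(edges, source, target):
    """Breadth-first reachability over a directed edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    seen, queue = {source}, deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

edges = [(1, 2), (2, 3), (4, 2)]
expanded = forbid_triple(edges, 1, 2, 3)
print(expanded)
print(reachable(expanded, 4, 3), reachable(expanded, 1, 3))
```

A path found in the expanded graph maps back to the original by replacing either state node with 2. If the search starts or ends at node 2 itself, the source or target would also need to be mapped to one of the copies, which this sketch does not do.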
You could set node data such as {color=['blue']} for node 1, {color=['red','blue']} for node 2 and {color=['red']} for node 3. Then use a networkx.algorithms.astar_path() approach, with the heuristic set to a function that returns might_as_well_be_infinity when it encounters a node without the color you are searching for, and with weight=less_than_infinity.
