Smallest total weight in weighted directed graph - python

There is a question from a quiz that I don't fully understand:
Suppose you have a weighted directed graph and want to find a path between nodes A and B with the smallest total weight. Select the most accurate statement:
1. If some edges have negative weights, depth-first search finds a correct solution.
2. If all edges have weight 2, depth-first search guarantees that the first path found is the shortest path.
3. If some edges have negative weights, breadth-first search finds a correct solution.
4. If all edges have weight 2, breadth-first search guarantees that the first path found is the shortest path.
Am I right that #1 is correct?

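#4 is the correct one. Because all edges have the same weight, the path with the smallest total weight is the one that traverses the fewest edges, and breadth-first search reaches each node by a path with the fewest edges.
#1 is wrong because depth-first search doesn't consider edge weights, so the first path it finds to a node can be any path, not necessarily the lightest one.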
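To make the difference concrete, here is a minimal sketch comparing the first path each search finds. The graph, the node names, and the helper names `bfs_path` and `dfs_path` are made up for illustration; every edge is assumed to have weight 2, so a path's total weight is twice its edge count.

```python
from collections import deque

def bfs_path(graph, start, goal):
    """Return the first path BFS finds from start to goal (fewest edges)."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the parent links back to the start
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbour in graph.get(node, []):
            if neighbour not in parents:
                parents[neighbour] = node
                queue.append(neighbour)
    return None

def dfs_path(graph, start, goal, visited=None):
    """Return the first path a recursive DFS finds from start to goal."""
    visited = visited if visited is not None else set()
    visited.add(start)
    if start == goal:
        return [start]
    for neighbour in graph.get(start, []):
        if neighbour not in visited:
            rest = dfs_path(graph, neighbour, goal, visited)
            if rest is not None:
                return [start] + rest
    return None

# Hypothetical directed graph; every edge has weight 2
graph = {"A": ["C", "B"], "C": ["D"], "D": ["B"], "B": []}

print(bfs_path(graph, "A", "B"))  # ['A', 'B'], total weight 2
print(dfs_path(graph, "A", "B"))  # ['A', 'C', 'D', 'B'], total weight 6
```

DFS happens to explore the neighbour C first and commits to the long route, while BFS finds the single-edge path.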

Related

How to get sorted list of nodes for a list of nodes in a NetworkX graph as approximation of TSP

I have an undirected NetworkX graph (not necessarily complete, but connected). I also have a list of a few nodes taken from that graph. I want to get the travel distance given by the TSP result (or its approximation).
I tried using networkx.approximation.traveling_salesman_problem(DS.G, nodes=['A0_S0_R0', 'A0_S14_R4', 'A0_S4_R4', 'A0_S14_R4', 'A0_S14_R4', 'A0_S7_R4']), but the output is a list of nodes that covers the given nodes in the order they appear in the input list.
I want either the total minimised distance, or the list of nodes in an order that minimises the travelling distance rather than following their order of appearance in the input list.
The tsp function returns the list of nodes corresponding to the best path it found.
You can get the minimum distance using
nodes = [...]  # list of nodes returned by traveling_salesman_problem()
# assuming you used the `weight` key for edge weights
distances = [G.get_edge_data(nodes[i], nodes[i + 1])['weight'] for i in range(len(nodes) - 1)]
min_distance = sum(distances)

Find clique with maximum node value in graph

I have a graph in which each node is assigned a value. I wish to find the clique with the maximum sum of node values (note: not necessarily the maximum clique).
One approach I thought of would be a greedy algorithm that:
1. Selects the largest node from the graph.
2. Selects the next-largest node that is connected to all the previously selected nodes, if adding it increases the sum.
3. Repeats step 2 until the sum no longer increases.
However, this approach is not correct: imagine a graph with a clique of 8 nodes, all of value 1, and a single node of value 7. The correct answer here is 8, not 7. My actual problem has a complex graph, but here are some examples of the desired result with the actual graph and the biggest sum, which I found manually:
Here is a simpler example with the solution:
What would be the best graph representation, and an efficient and correct way to solve this problem on an arbitrary graph, in Python without libraries?
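As a rough sketch of the counterexample above, the snippet below runs the greedy idea next to an exhaustive search. It assumes the graph is stored as a dict mapping each node to the set of its neighbours and that all node values are positive, so the best clique is always a maximal one. The function names are illustrative, and the exhaustive search is a plain Bron-Kerbosch enumeration without pivoting, which is exponential in the worst case; it is one possible exact approach, not necessarily the most efficient.

```python
def greedy_clique(adj, value):
    """Greedy heuristic from the question: keep adding the largest compatible node."""
    clique, candidates = set(), set(adj)
    while candidates:
        v = max(candidates, key=lambda n: value[n])
        clique.add(v)
        candidates &= adj[v]  # keep only nodes adjacent to everything chosen so far
    return sum(value[v] for v in clique), clique

def max_weight_clique(adj, value):
    """Enumerate maximal cliques (Bron-Kerbosch) and keep the heaviest one."""
    best_sum, best_clique = float("-inf"), set()

    def expand(clique, candidates, excluded):
        nonlocal best_sum, best_clique
        if not candidates and not excluded:
            # clique is maximal: compare its total value with the best so far
            total = sum(value[v] for v in clique)
            if total > best_sum:
                best_sum, best_clique = total, set(clique)
            return
        for v in list(candidates):
            expand(clique | {v}, candidates & adj[v], excluded & adj[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(adj), set())
    return best_sum, best_clique

# Counterexample: a clique of 8 nodes of value 1, plus an isolated node of value 7
adj = {i: set(range(8)) - {i} for i in range(8)}
adj["x"] = set()
value = {i: 1 for i in range(8)}
value["x"] = 7

print(greedy_clique(adj, value)[0])      # 7
print(max_weight_clique(adj, value)[0])  # 8
```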

"Bidirectional Dijkstra" by NetworkX

I just read the NetworkX implementation of Dijkstra's algorithm for shortest paths using bidirectional search. What is the termination condition of this method?
I'm going to base this on networkx's implementation.
Bidirectional Dijkstra stops when it encounters the same node in both directions - but the path it returns at that point might not be through that node. It's doing additional calculations to track the best candidate for the shortest path.
I'm going to base my explanation on your comment (on this answer):
Consider this simple graph (with nodes A, B, C, D, E). The edges of this graph and their weights are: "A->B:1","A->C:6","A->D:4","A->E:10","D->C:3","C->E:1". When I use Dijkstra's algorithm on this graph from both sides, going forward it finds B after A and then D; going backward it finds C after E and then D. At this point, both sets share a vertex, so they intersect. Is this the termination point, or must it continue? Because this answer (A->D->C->E) is incorrect.
For reference, here's the graph:
When I run networkx's bidirectional Dijkstra on the (undirected) network in the counterexample from that comment, "A->B:1","A->C:6","A->D:4","A->E:10","D->C:3","C->E:1", it gives me (7, ['A', 'C', 'E']), not A-D-C-E.
The problem is a misunderstanding of what it's doing before it stops. It does exactly what you're expecting in terms of finding nodes, but while it's doing that there is additional processing happening to find the shortest path. By the time it reaches D from both directions, it has already collected some other "candidate" paths that may be shorter. There is no guarantee that just because the node D is reached from both directions, it ends up being part of the shortest path. Rather, at the point that a node has been reached from both directions, the current candidate shortest path is at least as short as any candidate path it would find if it continued running.
The algorithm starts with two empty clusters, each associated with A or E
{} {}
and it will build up "clusters" around each. It first puts A into the cluster associated with A
{A:0} {}
Now it checks if A is already in the cluster around E (which is currently empty). It is not. Next, it looks at each neighbor of A and checks if they are in the cluster around E. They are not. It then places all of those neighbours into a heap (like an ordered list) of upcoming neighbors of A ordered by pathlength from A. Call this the 'fringe' of A
clusters ..... fringes
{A:0} {} ..... A:[(B,1), (D,4), (C,6), (E,10)]
E:[]
Now it checks E. For E it does the symmetric thing. Place E into its cluster. Check that E is not in the cluster around A. Then check all of its neighbors to see if any are in the cluster around A (they are not). Then it creates the fringe of E.
clusters ..... fringes
{A:0} {E:0} ..... A:[(B,1), (D,4), (C,6), (E,10)]
E:[(C,1), (A,10)]
Now it goes back to A. It takes B from the list and adds it to the cluster around A. It checks if any neighbor of B is in the cluster around E (there are no neighbors to consider). So we have:
clusters ..... fringes
{A:0, B:1} {E:0} ..... A:[(D,4), (C,6), (E,10)]
E:[(C,1), (A,10)]
Back to E: we add C to the cluster of E and check whether any neighbor of C is in the cluster of A. What do you know, there's A. So we have a candidate shortest path A-C-E, with distance 7. We'll hold on to that. We add D to the fringe of E (with distance 4, since it's 1+3). We have:
clusters ..... fringes
{A:0, B:1} {E:0, C:1} ..... A:[(D,4), (C,6), (E,10)]
E:[(D,4), (A,10)]
candidate path: A-C-E, length 7
Back to A: We get the next thing from its fringe, D. We add it to the cluster about A, and note that its neighbor C is in the cluster about E. So we have a new candidate path, A-D-C-E, but its length (8) is greater than 7, so we discard it.
clusters ..... fringes
{A:0, B:1, D:4} {E:0, C:1} ..... A:[(C,6), (E,10)]
E:[(D,4), (A,10)]
candidate path: A-C-E, length 7
Now we go back to E. We look at D. It's in the cluster around A. We can be sure that any future candidate path we would encounter will have length at least as large as the A-D-C-E path we have just traced out (this claim isn't necessarily obvious, but it is the key to this approach). So we can stop. We return the candidate path found earlier.
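The walkthrough above can be reproduced with a small standalone sketch. This is not networkx's code; it is a simplified version following the same idea (alternate between the two searches, record a candidate path whenever a node has been seen from both sides, stop when a node is finalized in both clusters). It assumes an undirected graph stored as a symmetric dict of dicts with non-negative weights; a directed graph would need predecessor lists for the backward search. Because candidates are recorded as soon as a node enters a fringe, the order in which candidates appear differs slightly from the narration, but the result is the same.

```python
import heapq
from itertools import count

def bidirectional_dijkstra(adj, source, target):
    """Return (length, path) for a shortest path, or None if unreachable."""
    if source == target:
        return 0, [source]
    dists = [{}, {}]                          # finalized distances (the "clusters")
    seen = [{source: 0}, {target: 0}]         # best tentative distance per side
    paths = [{source: [source]}, {target: [target]}]
    fringe = [[], []]
    tie = count()                             # tie-breaker for equal distances
    heapq.heappush(fringe[0], (0, next(tie), source))
    heapq.heappush(fringe[1], (0, next(tie), target))
    best_len, best_path = float("inf"), None
    side = 1
    while fringe[0] and fringe[1]:
        side = 1 - side                       # alternate forward / backward
        dist, _, v = heapq.heappop(fringe[side])
        if v in dists[side]:
            continue                          # stale heap entry
        dists[side][v] = dist
        if v in dists[1 - side]:
            return best_len, best_path        # finalized on both sides: stop
        for w, weight in adj[v].items():
            new_dist = dist + weight
            if w not in seen[side] or new_dist < seen[side][w]:
                seen[side][w] = new_dist
                paths[side][w] = paths[side][v] + [w]
                heapq.heappush(fringe[side], (new_dist, next(tie), w))
                if w in seen[1 - side]:
                    # w is known to both searches: candidate path through w
                    total = seen[0][w] + seen[1][w]
                    if total < best_len:
                        best_len = total
                        best_path = paths[0][w] + paths[1][w][::-1][1:]
    return None

# The undirected example graph from the comment
edges = [("A", "B", 1), ("A", "C", 6), ("A", "D", 4),
         ("A", "E", 10), ("D", "C", 3), ("C", "E", 1)]
adj = {}
for u, v, w in edges:
    adj.setdefault(u, {})[v] = w
    adj.setdefault(v, {})[u] = w

print(bidirectional_dijkstra(adj, "A", "E"))  # (7, ['A', 'C', 'E'])
```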

Efficiently compute the number of shortest path for a graph with 23000000 nodes using igraph

I am trying to compute the number of shortest paths between two nodes that are at distance 2 from each other in a sparse graph containing 23000000 vertices and around 9 x 23000000 edges. Right now I am using
for v, d, parent in graph.bfsiter(source.index, advanced=True):
    if 0 < d < 3:
to loop through the nodes that are within distance 2 of the source node (I need the nodes at distance 1, but I don't need to compute all shortest paths for them). And then I use:
len(graph.get_all_shortest_paths(source, v))
to get the number of all shortest paths from source to v (where v is a node given by bfsiter whose shortest distance from the source is 2).
However, this is taking too long. For example, for the graph described above it takes around 1 second to compute the number of shortest paths for each pair (source, v).
I was wondering if someone could suggest a more efficient way to compute the number of all shortest paths using igraph.
Here is an implementation of the answer suggested in the comments. The time consuming part of this code is the graph generation. To run on an already generated/in-memory graph takes very little time.
import random

from igraph import Graph

# Generate a random geometric graph
numnodes = int(1e6)
thegraph = Graph.GRG(numnodes, 0.003)
print("Graph Generated")

# Choose one node randomly and another at distance exactly 2 from it
node1 = random.randint(0, numnodes - 1)
within_two = set(thegraph.neighborhood(node1, order=2))
within_one = set(thegraph.neighborhood(node1, order=1))
# random.choice needs a sequence, so convert the set difference to a list
node2 = random.choice(list(within_two - within_one))

# Count the nodes in the intersection of the two order-1 neighborhoods
result = len(set(thegraph.neighbors(node1)).intersection(thegraph.neighbors(node2)))
print(result)
The intersection of the two neighborhoods is the number of unique paths. A path of length 2 visits 3 nodes. Since we know the start and end point, the only one which may vary is the middle. Since the middle node must be a distance 1 from both of the endpoints, the number of unique middle points is the number of paths of length 2 between the nodes.
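The same counting argument can be checked without igraph. The sketch below assumes a small undirected graph stored as a dict of neighbour sets; the graph and the helper name `count_two_step_paths` are made up for illustration, and the count is only the number of shortest paths when the two nodes really are at distance 2 (that is, not adjacent).

```python
def count_two_step_paths(adj, u, v):
    """Number of paths u - m - v, i.e. the number of common neighbours m."""
    return len(adj[u] & adj[v])

# Hypothetical undirected graph: 0 is adjacent to 1, 2, 3; 4 is adjacent to 1, 2
adj = {
    0: {1, 2, 3},
    1: {0, 4},
    2: {0, 4},
    3: {0},
    4: {1, 2},
}

# Nodes 0 and 4 are at distance 2 and share the middle nodes 1 and 2
print(count_two_step_paths(adj, 0, 4))  # 2
```

Each lookup costs time proportional to the smaller of the two neighbour sets, which is why it is much cheaper than enumerating all shortest paths.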

Why doesn't the linear shortest path algorithm work for non-directed cyclic graphs?

I have the basic linear shortest path algorithm implemented in Python. According to various sites I've come across, this only works for directed acyclic graphs. However, I don't see why this is the case.
I've even tested the algorithm against graphs with cycles and undirected edges, and it worked fine.
So the question is, why doesn't the linear shortest path algorithm work for non-directed cyclic graphs? Side question: what is the name of this algorithm?
For reference, here is the code I wrote for the algorithm:
def shortestPath(start, end, graph):
    # First, topologically sort the graph to determine the order in which to traverse it
    order = toplogicalSort(start, graph)
    # Store the current best weight of each node's path (0 for the start node) and its predecessor
    weights = [float('inf')] * len(graph)
    weights[start] = 0
    predecessor = [0] * len(graph)
    # Next, relax all edges in the order of the sorted nodes
    for node in order:
        for neighbour in graph[node]:
            # Check whether this path is cheaper than the best one found so far
            if weights[neighbour[0]] > weights[node] + neighbour[1]:
                # If it is, adjust the weight and predecessor
                weights[neighbour[0]] = weights[node] + neighbour[1]
                predecessor[neighbour[0]] = node
    # Return the shortest path to the end
    path = [end]
    while path[-1] != start:
        path.append(predecessor[path[-1]])
    return path[::-1]
Edit: As asked by Beta, here is the topological sort:
# Topologically sorts the given graph, starting from the given start point.
def toplogicalSort(start, graph):
    # Runs a DFS on all nodes connected to the starting node in the graph
    def DFS(start):
        for node in graph[start]:
            if node[0] not in checked:
                checked[node[0]] = True
                DFS(node[0])
        finish.append(start)
    # Stores the finish order of all nodes in the graph, and a boolean stating whether they have been checked
    finish, checked = [], {}
    DFS(start)
    # Reverses the finish order to get a proper topological order, then returns it
    return finish[::-1]
Because you cannot topologically sort a graph with cycles (so undirected graphs are also out of the question, as you can't tell which node should come before another).
Edit: After reading the comments, I think that's actually what @Beta meant.
When there is a cycle, topological sort cannot guarantee an ordering in which every node's weight is final before its outgoing edges are relaxed.
For example, we have a graph:
A->C, A->B, B->C, C->B, B->D
Say the correct shortest path is:
A->C->B->D
But topological sort can generate an order:
A->B->C->D
Although visiting C updates B to the correct weight, B won't be visited again, so the corrected weight is never propagated to D. (The path happens to be correct, though.)
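Here is a minimal sketch of that failure. The edge weights are not given above, so the ones below are assumptions chosen to make A->C->B->D the shortest path (total weight 1 + 1 + 1 = 3); `relax_in_order` is an illustrative helper that relaxes each node's outgoing edges exactly once, in the given order.

```python
def relax_in_order(graph, order, start):
    """Relax every node's outgoing edges once, in the given order."""
    dist = {node: float("inf") for node in graph}
    dist[start] = 0
    pred = {}
    for node in order:
        for neighbour, weight in graph[node]:
            if dist[node] + weight < dist[neighbour]:
                dist[neighbour] = dist[node] + weight
                pred[neighbour] = node
    return dist, pred

# Edges A->C, A->B, B->C, C->B, B->D with assumed weights
graph = {
    "A": [("C", 1), ("B", 10)],
    "B": [("C", 1), ("D", 1)],
    "C": [("B", 1)],
    "D": [],
}

dist, pred = relax_in_order(graph, ["A", "B", "C", "D"], "A")
print(dist["B"])  # 2: corrected when C is processed
print(dist["D"])  # 11: computed from B's old weight of 10, never updated to 3
print(pred["D"], pred["B"], pred["C"])  # B C A: the predecessor chain is still A->C->B->D
```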
