I have my graph object, im trying to find a method to find a minimum for a group of nodes.
Ex. Nodes:
input_nodes=[123,45]
graph_nodes=[10, 76,123,45,98,456]
I run an algoritm which calculate shortest path between every node in the graph and every node in the input.
I have a dictionary with all shortest paths beetwen nodes :
{10:{123:0.56, 45:0.2}, 76:{123:0, 45:0.23}......
and so on for every graphs node.
How to get only min weight which is different from zero:
Like this:
Minimum path node 10 has with node 45,
Minimum path node 76 has with node 45,
......
Thatnks
I assume there might be ties in the weights, i.e. one node can be equally "close" to more than one other node. The following code will include them all in a dict. The structure of the result is essentially the same as the input d, but only the minimum weight items are retained.
Update: if a node has no neighbors or only neighbors with weight 0, this node will map to a empty dict in the result.
d = {10:{123:0.56, 45:0.2}, 76:{123:0, 45:0.23}, 19:{17:0}, 20:{}}
def closest(ns):
m = min((v for v in ns.values() if v != 0), default=-1)
return {k: v for k, v in ns.items() if v == m}
print({k: closest(v) for k, v in d.items()})
Related
I have implemented Dijkstra's algorithm but I have a problem. It always prints the same minimum path while there may be other paths with the same weight.
How could I change my algorithm so that it randomly selects the neighbors with the same weight?
My algorithm is below:
def dijkstra_algorithm(graph, start_node):
unvisited_nodes = list(graph.get_nodes())
# We'll use this dict to save the cost of visiting each node and update it as we move along the graph
shortest_path = {}
# We'll use this dict to save the shortest known path to a node found so far
previous_nodes = {}
# We'll use max_value to initialize the "infinity" value of the unvisited nodes
max_value = sys.maxsize
for node in unvisited_nodes:
shortest_path[node] = max_value
# However, we initialize the starting node's value with 0
shortest_path[start_node] = 0
# The algorithm executes until we visit all nodes
while unvisited_nodes:
# The code block below finds the node with the lowest score
current_min_node = None
for node in unvisited_nodes: # Iterate over the nodes
if current_min_node == None:
current_min_node = node
elif shortest_path[node] < shortest_path[current_min_node]:
current_min_node = node
# The code block below retrieves the current node's neighbors and updates their distances
neighbors = graph.get_outgoing_edges(current_min_node)
for neighbor in neighbors:
tentative_value = shortest_path[current_min_node] + graph.value(current_min_node, neighbor)
if tentative_value < shortest_path[neighbor]:
shortest_path[neighbor] = tentative_value
# We also update the best path to the current node
previous_nodes[neighbor] = current_min_node
# After visiting its neighbors, we mark the node as "visited"
unvisited_nodes.remove(current_min_node)
return previous_nodes, shortest_path
# The code block below finds all the min nodes
# and randomly chooses one for traversal
min_nodes = []
for node in unvisited_nodes: # Iterate over the nodes
if len(min_nodes) == 0:
min_nodes.append(node)
elif shortest_path[node] < shortest_path[min_nodes[0]]:
min_nodes = [node]
else:
# this is the case where 2 nodes have the same cost
# we are going to take all of them
# and at the end choose one randomly
min_nodes.append(node)
current_min_node = random.choice(min_nodes)
What the code does is as follows:
Instead of taking the first smallest element, it creates a list of all the smallest elements.
At the end it choose one of the smallest elements randomly.
This will both guarantee the Dijkstra invariant and choose a random path among the cheapest.
probably just try something like this
random.shuffle(neighbors)
for neighbor in neighbors:
...
which should visit the neighbors randomly (this assumes neighbors is a list or tuple... if its a generator call list on it first...
I'm having trouble understanding what this piece of code does. Please could someone step by step go through the code and explain how it works and what it's doing?
def scale_free(n,m):
if m < 1 or m >=n:
raise nx.NetworkXError("Preferential attactment algorithm must have m >= 1"
" and m < n, m = %d, n = %d" % (m, n))
# Add m initial nodes (m0 in barabasi-speak)
G=nx.empty_graph(m)
# Target nodes for new edges
targets=list(range(m))
# List of existing nodes, with nodes repeated once for each adjacent edge
repeated_nodes=[]
# Start adding the other n-m nodes. The first node is m.
source=m
while source<n:
# Add edges to m nodes from the source.
G.add_edges_from(zip([source]*m,targets))
# Add one node to the list for each new edge just created.
repeated_nodes.extend(targets)
# And the new node "source" has m edges to add to the list.
repeated_nodes.extend([source]*m)
# Now choose m unique nodes from the existing nodes
# Pick uniformly from repeated_nodes (preferential attachement)
targets = _random_subset(repeated_nodes,m)
source += 1
return G
So the first part of this makes sure that m is at least 1 and n>m.
def scale_free(n,m):
if m < 1 or m >=n:
raise nx.NetworkXError("Preferential attactment algorithm must have m >= 1"
" and m < n, m = %d, n = %d" % (m, n))
Then it creates a graph with no edges and the first m nodes 0, 1, ..., m-1.
This looks a bit different from the standard barabasi-albert graph which starts from a connected version, rather than a version without any edges.
# Add m initial nodes (m0 in barabasi-speak)
G=nx.empty_graph(m)
Now it's going to start adding new nodes 1 at a time and connecting them to existing nodes based on various rules. It first creates a set of "targets" that has all of the nodes in the edge-less graph.
# Target nodes for new edges
targets=list(range(m))
# List of existing nodes, with nodes repeated once for each adjacent edge
repeated_nodes=[]
# Start adding the other n-m nodes. The first node is m.
source=m
Now it's going to add each node 1 at a time. When it does that, it will add the new node with edges to m of the previous existing nodes. Those m previous nodes have been stored in a list called targets.
while source<n:
Here it creates those edges
# Add edges to m nodes from the source.
G.add_edges_from(zip([source]*m,targets))
Now it's going to decide who will get those edges when the next node is added. It's supposed to choose them with probability proportional to their degree The way it does that is by having a list repeated_nodes which has each node appearing once per edge. It then chooses a random set of m nodes from that to be the new targets. Depending on how _random_subset is defined, it might or might not be able to choose the same node several times to be a target in the same step.
# Add one node to the list for each new edge just created.
repeated_nodes.extend(targets)
# And the new node "source" has m edges to add to the list.
repeated_nodes.extend([source]*m)
# Now choose m unique nodes from the existing nodes
# Pick uniformly from repeated_nodes (preferential attachement)
targets = _random_subset(repeated_nodes,m)
source += 1
return G
I am trying to implement Prim's algorithm for a graph consisting of cities as vertices, but I am stuck. Any advice would be greatly appreciated.
I am reading in the data from a txt file and trying to get output of the score (total distance) and a list of the edges in tuples. For example, by starting with Houston, the first edge would be ('HOUSTON', 'SAN ANTONIO').
I implemented the graph/tree using a dictionary with adjacent vertices and their distance, like so:
{'HOUSTON': [('LUBBOCK', '535'), ('MIDLAND/ODESSA', '494'), ('MISSION/MCALLEN/EDINBURG', '346'), ('SAN ANTONIO', '197')],
'HARLINGEN/SAN BENITO': [('HOUSTON', '329')],
'SAN ANTONIO': [],
'WACO': [],
'ABILENE': [('DALLAS/FORT WORTH', '181')],
'LUBBOCK': [('MIDLAND/ODESSA', '137')],
'COLLEGE STATION/BRYAN': [('DALLAS/FORT WORTH', '181'), ('HOUSTON', '97')],
'MISSION/MCALLEN/EDINBURG': [],
'AMARILLO': [('DALLAS/FORT WORTH', '361'), ('HOUSTON', '596')],
'EL PASO': [('HOUSTON', '730'), ('SAN ANTONIO', '548')],
'DALLAS/FORT WORTH': [('EL PASO', '617'), ('HOUSTON', '238'), ('KILLEEN', '154'), ('LAREDO', '429'), ('LONGVIEW', '128'), ('LUBBOCK', '322'), ('MIDLAND/ODESSA', '347'), ('MISSION/MCALLEN/EDINBURG', '506'), ('SAN ANGELO', '252'), ('SAN ANTONIO', '271'), ('WACO', '91'), ('WICHITA FALLS', '141')],
'KILLEEN': [],
'SAN ANGELO': [],
'MIDLAND/ODESSA': [],
'WICHITA FALLS': [],
'CORPUS CHRISTI': [('DALLAS/FORT WORTH', '377'), ('HOUSTON', '207')],
'AUSTIN': [('DALLAS/FORT WORTH', '192'), ('EL PASO', '573'), ('HOUSTON', '162')],
'LONGVIEW': [],
'BROWNSVILLE': [('DALLAS/FORT WORTH', '550'), ('HOUSTON', '355')],
'LAREDO': [('SAN ANTONIO', '157')]}
here is what i have so far:
import csv
import operator
def prim(file_path):
with open(file_path) as csv_file:
csv_reader = csv.reader(csv_file, delimiter = "\t")
dict = {}
for row in csv_reader:
if row[0] == 'City':
continue
if row[0] in dict:
dict[row[0]].append((row[1],row[3]))
if row[1] not in dict:
dict[row[1]] = []
else:
dict[row[0]] = [(row[1], row[3])]
V = dict.keys()
A = ['HOUSTON']
score = 0 # score result
E = [] # tuple result
while A != V:
for x in A:
dict[x].sort(key=lambda x: x[1])
for y in dict[x]:
if y[0] in V and y[0] not in A:
A.append(y[0])
E.append((x, y[0]))
score += int(y[1])
break
break
break
print("Edges:")
print(E)
print("Score:")
print(score)
prim("Texas.txt")
This gives the correct first edge because of that last break statement, but when I remove the break statement, it infinitely loops and I can't exactly figure out why or how to fix it. I realize I may be implementing this algorithm totally wrong and inefficiently, so I would really appreciate any tips or advice on where to go from here/what to do differently and why. Thank you in advance!!
There are three main problems with your implementation:
Your graph is stored as a directed graph. Prim's algorithm cannot be applied to directed graphs (reference). If you really want to process a directed graph, the comments in the linked question provide information for how to compute the MST-equivalent of a directed graph (reference).
This is also related to why you algorithm gets stuck: It explores all vertices that are reachable via a directed edge from the starting vertex. There are more vertices in the graph, but they cannot be reached from the starting vertex when traversing the edges in the fixed direction. Therefore, the algorithm cannot grow the intermediate tree to include further vertices and gets stuck.
However, I assume you actually want the graph to be undirected. In that case, you could for example compute the symmetric closure of your graph, by duplicating every edge in the reverse direction. Or alternatively, you could adapt your algorithm to check both outgoing and incoming edges.
In Prim's algorithm, every iteration adds an edge to the intermediate tree. This is done by selecting the minimum-weight edge that connects the intermediate tree with a vertex from the remainder of the graph. However, in your code, you instead sort the outgoing edges of each vertex in the intermediate tree by their weight and add all edges pointing to a vertex that is not yet included in the intermediate tree. This will give you incorrect results. Consider the following example:
Assume we start at a. Your approach sorts the edges of a by their weight and adds them in that order. It therefore adds the edges a-b and a-c to the intermediate tree. All vertices of the graph are now contained in the intermediate tree, so the algorithm terminates. However, the tree b-a-c is not a minimum spanning tree.
So instead, from a you should select the minimum edge, which is a-b. You should then search for the minimum-weight edge from any of the vertices in the intermediate tree. This would be the edge b-c. You would then add this edge to the intermediate tree, finally resulting in the MST a-b-c.
The loop condition A != V will never be satisfied, even if A contains every vertex of the graph. This is because A is a list and V is the result of dict.keys(), which is not a list, but a view object. The view object is iterable, and keys will be given in the same order as they were inserted, but it will never evaluate as equal to a list, even if the list contains the same items.
Even if you would turn V into a list (e.g. with V = list(dict.keys()), this would not be enough, since lists are only equal if they contain the same items in the same order. However, you have no guarantee that the algorithm will insert the vertices into A in the same order as the keys were inserted into the dictionary.
Therefore, you should use some other approach for the loop condition. One possible solution would be to initialize V and A as sets, i.e. with V = set(dict.keys()) and A = set(['HOUSTON']). This would allow you to keep A != V as a loop condition. Or you could keep A a list, but only check if the size of A is equal to the size of V in each iteration. Since the algorithm only inserts into A if the vertex is not already contained in A, when len(A) == len(dict), it follows that A contains all vertices.
Here is an example for the fixed code. It first takes the symmetric closure of the input graph. As the loop condition, it checks if the size of A is unequal to the total number of vertices. In the loop, it computes the minimum-weight edge that connects a vertex in A with a vertex not in A:
# Compute symmetric closure
for v, edges in dict.items():
for neighbor, weight in edges:
edges_neighbor = dict[neighbor]
if (v, weight) not in edges_neighbor:
dict[neighbor].append((v, weight))
A = ['HOUSTON']
score = 0
E = []
while len(A) != len(dict):
# Prepend vertex to each (target,weight) pair in dict[x]
prepend_source = lambda x: map(lambda y: (x, *y), dict[x])
edges = itertools.chain.from_iterable(map(prepend_source, A))
# Keep only edges where the target is not in A
edges_filtered = filter(lambda edge: edge[1] not in A, edges)
# Select the minimum edge
edge_min = min(edges_filtered, key=lambda x: int(x[2]))
A.append(edge_min[1])
E.append((edge_min[0], edge_min[1]))
score += int(edge_min[2])
Note this implementation still assumes that the graph is connected. If you would want to handle disconnected graphs, you would have to apply this procedure for each connected component of the graph (resulting in a MST forest).
In a 3x3 network I want to be able to determine all the shortest paths between any two nodes. Then, for each node in the network, I want to compute how many shortest paths pass through one specific node.
This requires using the nx.all_shortest_paths(G,source,target) function, which returns a generator. This is at variance from using the nx.all_pairs_shortest_path(G), as suggested here. The difference is that in the former case the function computes all the shortest paths between any two nodes, while in the latter case it computes only one shortest path between the same pair of nodes.
Given that I need to consider all shortest paths, I have come up with the following script. This is how I generate the network I am working with:
import networkx as nx
N=3
G=nx.grid_2d_graph(N,N)
pos = dict( (n, n) for n in G.nodes() )
labels = dict( ((i, j), i + (N-1-j) * N ) for i, j in G.nodes() )
nx.relabel_nodes(G,labels,False)
inds=labels.keys()
vals=labels.values()
inds.sort()
vals.sort()
pos2=dict(zip(vals,inds))
nx.draw_networkx(G, pos=pos2, with_labels=False, node_size = 15)
And this is how I print all the shortest paths between any two nodes:
for n in G.nodes():
for j in G.nodes():
if (n!=j): #Self-loops are excluded
gener=nx.all_shortest_paths(G,source=n,target=j)
print('From node '+str(n)+' to '+str(j))
for p in gener:
print(p)
print('------')
The result is a path from node x to node y which only includes the nodes along the way. An excerpt of what I get is:
From node 0 to 2 #Only one path exists
[0, 1, 2] #Node 1 is passed through while going from node 0 to node 2
------
From node 0 to 4 #Two paths exist
[0, 1, 4] #Node 1 is passed through while going from node 0 to node 4
[0, 3, 4] #Node 3 is passed through while going from node 0 to node 4
------
...continues until all pairs of nodes are covered...
My question: how could I amend the last code block to make sure that I know how many shortest paths, in total, pass through each node? According to the excerpt outcome I've provided, node 1 is passed through 2 times, while node 3 is passed through 1 time (starting and ending node are excluded). This calculation needs to be carried out to the end to figure out the final number of paths through each node.
I would suggest making a dict mapping each node to 0
counts = {}
for n in G.nodes(): counts[n] = 0
and then for each path you find -- you're already finding and printing them all -- iterate through the vertices on the path incrementing the appropriate values in your dict:
# ...
for p in gener:
print(p)
for v in p: counts[v] += 1
What you seek to compute is the unnormalized betweenness centrality.
From Wikipedia:
The betweenness centrality is an indicator of a node's centrality in a network. It is equal to the number of shortest paths from all vertices to all others that pass through that node.
More generally, I suggest you have a look at all the standard measures of centrality already in Networkx.
I have a tree as shown below.
Red means it has a certain property, unfilled means it doesn't have it. I want to minimise the Red checks.
If Red than all Ancestors are also Red (and should not be checked again).
If Not Red than all Descendants are Not Red.
The depth of the tree is d.
The width of the tree is n.
Note that children nodes have value larger than the parent.
Example: In the tree below,
Node '0' has children [1, 2, 3],
Node '1' has children [2, 3],
Node '2' has children [3] and
Node '4' has children [] (No children).
Thus children can be constructed as:
if vertex.depth > 0:
vertex.children = [Vertex(parent=vertex, val=child_val, depth=vertex.depth-1, n=n) for child_val in xrange(self.val+1, n)]
else:
vertex.children = []
Here is an example tree:
I am trying to count the number of Red nodes. Both the depth and the width of the tree will be large. So I want to do a sort of Depth-First-Search and additionally use the properties 1 and 2 from above.
How can I design an algorithm to do traverse that tree?
PS: I tagged this [python] but any outline of an algorithm would do.
Update & Background
I want to minimise the property checks.
The property check is checking the connectedness of a bipartite graph constructed from my tree's path.
Example:
The bottom-left node in the example tree has path = [0, 1].
Let the bipartite graph have sets R and C with size r and c. (Note, that the width of the tree is n=r*c).
From the path I get to the edges of the graph by starting with a full graph and removing edges (x, y) for all values in the path as such: x, y = divmod(value, c).
The two rules for the property check come from the connectedness of the graph:
- If the graph is connected with edges [a, b, c] removed, then it must also be connected with [a, b] removed (rule 1).
- If the graph is disconnected with edges [a, b, c] removed, then it must also be disconnected with additional edge d removed [a, b, c, d] (rule 2).
Update 2
So what I really want to do is check all combinations of picking d elements out of [0..n]. The tree structure somewhat helps but even if I got an optimal tree traversal algorithm, I still would be checking too many combinations. (I noticed that just now.)
Let me explain. Assuming I need checked [4, 5] (so 4 and 5 are removed from bipartite graph as explained above, but irrelevant here.). If this comes out as "Red", my tree will prevent me from checking [4] only. That is good. However, I should also mark off [5] from checking.
How can I change the structure of my tree (to a graph, maybe?) to further minimise my number of checks?
Use a variant of the deletion–contraction algorithm for evaluating the Tutte polynomial (evaluated at (1,2), gives the total number of spanning subgraphs) on the complete bipartite graph K_{r,c}.
In a sentence, the idea is to order the edges arbitrarily, enumerate spanning trees, and count, for each spanning tree, how many spanning subgraphs of size r + c + k have that minimum spanning tree. The enumeration of spanning trees is performed recursively. If the graph G has exactly one vertex, the number of associated spanning subgraphs is the number of self-loops on that vertex choose k. Otherwise, find the minimum edge that isn't a self-loop in G and make two recursive calls. The first is on the graph G/e where e is contracted. The second is on the graph G-e where e is deleted, but only if G-e is connected.
Python is close enough to pseudocode.
class counter(object):
def __init__(self, ival = 0):
self.count = ival
def count_up(self):
self.count += 1
return self.count
def old_walk_fun(ilist, func=None):
def old_walk_fun_helper(ilist, func=None, count=0):
tlist = []
if(isinstance(ilist, list) and ilist):
for q in ilist:
tlist += old_walk_fun_helper(q, func, count+1)
else:
tlist = func(ilist)
return [tlist] if(count != 0) else tlist
if(func != None and hasattr(func, '__call__')):
return old_walk_fun_helper(ilist, func)
else:
return []
def walk_fun(ilist, func=None):
def walk_fun_helper(ilist, func=None, count=0):
tlist = []
if(isinstance(ilist, list) and ilist):
if(ilist[0] == "Red"): # Only evaluate sub-branches if current level is Red
for q in ilist:
tlist += walk_fun_helper(q, func, count+1)
else:
tlist = func(ilist)
return [tlist] if(count != 0) else tlist
if(func != None and hasattr(func, '__call__')):
return walk_fun_helper(ilist, func)
else:
return []
# Crude tree structure, first element is always its colour; following elements are its children
tree_list = \
["Red",
["Red",
["Red",
[]
],
["White",
[]
],
["White",
[]
]
],
["White",
["White",
[]
],
["White",
[]
]
],
["Red",
[]
]
]
red_counter = counter()
eval_counter = counter()
old_walk_fun(tree_list, lambda x: (red_counter.count_up(), eval_counter.count_up()) if(x == "Red") else eval_counter.count_up())
print "Unconditionally walking"
print "Reds found: %d" % red_counter.count
print "Evaluations made: %d" % eval_counter.count
print ""
red_counter = counter()
eval_counter = counter()
walk_fun(tree_list, lambda x: (red_counter.count_up(), eval_counter.count_up()) if(x == "Red") else eval_counter.count_up())
print "Selectively walking"
print "Reds found: %d" % red_counter.count
print "Evaluations made: %d" % eval_counter.count
print ""
How hard are you working on making the test for connectedness fast?
To test a graph for connectedness I would pick edges in a random order and use union-find to merge vertices when I see an edge that connects them. I could terminate early if the graph was connected, and I have a sort of certificate of connectedness - the edges which connected two previously unconnected sets of vertices.
As you work down the tree/follow a path on the bipartite graph, you are removing edges from the graph. If the edge you remove is not in the certificate of connectedness, then the graph must still be connected - this looks like a quick check to me. If it is in the certificate of connectedness you could back up to the state of union/find as of just before that edge was added and then try adding new edges, rather than repeating the complete connectedness test.
Depending on exactly how you define a path, you may be able to say that extensions of that path will never include edges using a subset of vertices - such as vertices which are in the interior of the path so far. If edges originating from those untouchable vertices are sufficient to make the graph connected, then no extension of the path can ever make it unconnected. Then at the very least you just have to count the number of distinct paths. If the original graph is regular I would hope to find some dynamic programming recursion that lets you count them without explicitly enumerating them.