Scale-free network using preferential attachment algorithm

I'm having trouble understanding what this piece of code does. Could someone go through the code step by step and explain how it works and what it's doing?
import networkx as nx
def scale_free(n,m):
    if m < 1 or m >=n:
        raise nx.NetworkXError("Preferential attachment algorithm must have m >= 1"
                               " and m < n, m = %d, n = %d" % (m, n))
    # Add m initial nodes (m0 in barabasi-speak)
    G=nx.empty_graph(m)
    # Target nodes for new edges
    targets=list(range(m))
    # List of existing nodes, with nodes repeated once for each adjacent edge
    repeated_nodes=[]
    # Start adding the other n-m nodes. The first node is m.
    source=m
    while source<n:
        # Add edges to m nodes from the source.
        G.add_edges_from(zip([source]*m,targets))
        # Add one node to the list for each new edge just created.
        repeated_nodes.extend(targets)
        # And the new node "source" has m edges to add to the list.
        repeated_nodes.extend([source]*m)
        # Now choose m unique nodes from the existing nodes
        # Pick uniformly from repeated_nodes (preferential attachment)
        targets = _random_subset(repeated_nodes,m)
        source += 1
    return G

So the first part of this makes sure that m is at least 1 and that n > m.
def scale_free(n,m):
    if m < 1 or m >=n:
        raise nx.NetworkXError("Preferential attachment algorithm must have m >= 1"
                               " and m < n, m = %d, n = %d" % (m, n))
Then it creates a graph with no edges whose nodes are the first m integers 0, 1, ..., m-1.
This looks a bit different from the standard Barabási-Albert construction, which starts from a connected graph rather than one without any edges.
    # Add m initial nodes (m0 in barabasi-speak)
    G=nx.empty_graph(m)
Now it's going to start adding new nodes one at a time and connecting them to existing nodes. It first creates a list called targets that holds all of the nodes in the edge-less graph.
    # Target nodes for new edges
    targets=list(range(m))
    # List of existing nodes, with nodes repeated once for each adjacent edge
    repeated_nodes=[]
    # Start adding the other n-m nodes. The first node is m.
    source=m
Now it adds each remaining node in turn. Each new node is attached by edges to m of the already existing nodes. Those m existing nodes are the ones stored in the list targets.
    while source<n:
Here it creates those edges:
        # Add edges to m nodes from the source.
        G.add_edges_from(zip([source]*m,targets))
Now it's going to decide which nodes will receive edges when the next node is added. It's supposed to choose them with probability proportional to their degree. It does that by keeping a list, repeated_nodes, in which each node appears once per incident edge, so picking uniformly from that list picks a node with probability proportional to its degree. It then chooses a random set of m nodes from that list to be the new targets. Depending on how _random_subset is defined, it might or might not be able to choose the same node several times to be a target in the same step.
        # Add one node to the list for each new edge just created.
        repeated_nodes.extend(targets)
        # And the new node "source" has m edges to add to the list.
        repeated_nodes.extend([source]*m)
        # Now choose m unique nodes from the existing nodes
        # Pick uniformly from repeated_nodes (preferential attachment)
        targets = _random_subset(repeated_nodes,m)
        source += 1
    return G

Related

Dijkstra algorithm to select randomly an adjacent node with same minimum weight

I have implemented Dijkstra's algorithm but I have a problem. It always prints the same minimum path while there may be other paths with the same weight.
How could I change my algorithm so that it randomly selects the neighbors with the same weight?
My algorithm is below:
import sys
def dijkstra_algorithm(graph, start_node):
    unvisited_nodes = list(graph.get_nodes())
    # We'll use this dict to save the cost of visiting each node and update it as we move along the graph
    shortest_path = {}
    # We'll use this dict to save the shortest known path to a node found so far
    previous_nodes = {}
    # We'll use max_value to initialize the "infinity" value of the unvisited nodes
    max_value = sys.maxsize
    for node in unvisited_nodes:
        shortest_path[node] = max_value
    # However, we initialize the starting node's value with 0
    shortest_path[start_node] = 0
    # The algorithm executes until we visit all nodes
    while unvisited_nodes:
        # The code block below finds the node with the lowest score
        current_min_node = None
        for node in unvisited_nodes: # Iterate over the nodes
            if current_min_node is None:
                current_min_node = node
            elif shortest_path[node] < shortest_path[current_min_node]:
                current_min_node = node
        # The code block below retrieves the current node's neighbors and updates their distances
        neighbors = graph.get_outgoing_edges(current_min_node)
        for neighbor in neighbors:
            tentative_value = shortest_path[current_min_node] + graph.value(current_min_node, neighbor)
            if tentative_value < shortest_path[neighbor]:
                shortest_path[neighbor] = tentative_value
                # We also update the best path to the current node
                previous_nodes[neighbor] = current_min_node
        # After visiting its neighbors, we mark the node as "visited"
        unvisited_nodes.remove(current_min_node)
    return previous_nodes, shortest_path

import random  # needed at module level for random.choice
# The code block below finds all the min nodes
# and randomly chooses one for traversal
min_nodes = []
for node in unvisited_nodes: # Iterate over the nodes
    if len(min_nodes) == 0:
        min_nodes.append(node)
    elif shortest_path[node] < shortest_path[min_nodes[0]]:
        # a strictly cheaper node replaces everything collected so far
        min_nodes = [node]
    elif shortest_path[node] == shortest_path[min_nodes[0]]:
        # this is the case where 2 nodes have the same cost
        # we are going to take all of them
        # and at the end choose one randomly
        min_nodes.append(node)
current_min_node = random.choice(min_nodes)
What the code does is as follows:
Instead of taking the first smallest element, it creates a list of all the smallest elements.
At the end it chooses one of the smallest elements randomly.
This preserves the Dijkstra invariant, because the node picked always has the minimum tentative cost, and it lets random tie-breaking decide which of the equally cheap paths is recorded.
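To see the tie-breaking in action, here is a small self-contained sketch. It does not use the question's graph class: the dict-of-dicts graph, the function names pick_min_node and dijkstra_random_ties, and the optional rng parameter are all assumptions made for illustration.

```python
import random

def pick_min_node(unvisited, dist, rng=random):
    """Return a random node among those with the smallest tentative distance."""
    best = min(dist[node] for node in unvisited)
    ties = [node for node in unvisited if dist[node] == best]
    return rng.choice(ties)

def dijkstra_random_ties(graph, start, rng=random):
    # graph is a hypothetical dict of dicts: graph[u][v] = weight of edge u -> v
    dist = {node: float("inf") for node in graph}
    prev = {}
    dist[start] = 0
    unvisited = set(graph)
    while unvisited:
        current = pick_min_node(unvisited, dist, rng)
        unvisited.remove(current)
        for neighbor, weight in graph[current].items():
            candidate = dist[current] + weight
            if candidate < dist[neighbor]:
                dist[neighbor] = candidate
                prev[neighbor] = current
    return prev, dist

# Two equally cheap routes from A to D: A-B-D and A-C-D.
graph = {
    "A": {"B": 1, "C": 1},
    "B": {"D": 1},
    "C": {"D": 1},
    "D": {},
}
prev, dist = dijkstra_random_ties(graph, "A")
print(dist["D"], prev["D"])
```

Running it several times should show D reached through B on some runs and through C on others, always at cost 2. The choice is not guaranteed to be uniform over all shortest paths in larger graphs.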

Probably just try something like this:
random.shuffle(neighbors)
for neighbor in neighbors:
    ...
which should visit the neighbors in random order. This assumes neighbors is a list; if it's a tuple or a generator, call list on it first, because random.shuffle works in place.

Kosaraju’s algorithm for scc

Can anyone explain to me the logic behind Kosaraju's algorithm for finding strongly connected components?
I have read the description, but I can't understand how the DFS on the reversed graph can detect the number of strongly connected components.
import sys
def dfs(visited, stack, adj, x):
    visited[x] = 1
    for neighbor in adj[x]:
        if (visited[neighbor]==0):
            dfs(visited, stack, adj, neighbor)
    stack.insert(0, x)
    return stack, visited
def reverse_dfs(visited, adj, x, cc):
    visited[x] = 1
    for neighbor in adj[x]:
        if (visited[neighbor]==0):
            cc += 1
            reverse_dfs(visited, adj, neighbor,cc)
    print(x)
    return cc
def reverse_graph(adj):
    vertex_num = len(adj)
    new_adj = [ [] for _ in range(vertex_num)]
    for i in range(vertex_num):
        for j in adj[i]:
            new_adj[j].append(i)
    return new_adj
def find_post_order(adj):
    vertex_num = len(adj)
    visited = [0] * vertex_num
    stack = []
    for vertex in range(vertex_num):
        if visited[vertex] == 0:
            stack, visited = dfs(visited, stack, adj, vertex)
    return stack
def number_of_strongly_connected_components(adj):
    vertex_num = len(adj)
    new_adj = reverse_graph(adj)
    stack = find_post_order(adj)
    visited = [0] * vertex_num
    cc_num = 0
    while (stack):
        vertex = stack.pop(0)
        print(vertex)
        print('------')
        if visited[vertex] == 0:
            cc_num = reverse_dfs(visited, new_adj, vertex, cc_num)
    return cc_num
if __name__ == '__main__':
    input = sys.stdin.read()
    data = list(map(int, input.split()))
    n, m = data[0:2]
    data = data[2:]
    edges = list(zip(data[0:(2 * m):2], data[1:(2 * m):2]))
    adj = [[] for _ in range(n)]
    for (a, b) in edges:
        adj[a - 1].append(b - 1)
    print(number_of_strongly_connected_components(adj))
Above you can find my implementation. I guess that initial DFS and reverse operation are done correctly, but I can't get how to perform the second DFS properly.
Thanks in advance.

The first thing that you should notice is that the set of strongly connected components is the same for a graph and its reverse. In fact, the algorithm actually finds the set of strongly connected components in the reversed graph, not the original (but it's alright, because both graphs have the same SCC).
The first DFS execution results in a stack that holds the vertices in a particular order such that when the second DFS is executed on the vertices in this order (on the reversed graph) then it computes SCC correctly. So the whole purpose of running the first DFS is to find an ordering of the graph vertices that serves the execution of the DFS algorithm on the reversed graph. Now I'll explain what is the property that the order of vertices in the stack have and how it serves the execution of DFS on the reversed graph.
So what is the property of the stack? Imagine S1 and S2 are two strongly connected components in the graph, and that in the reversed graph, S2 is reachable from S1. S1 cannot also be reachable from S2, because in that case S1 and S2 would collapse into a single component. Let x be the top vertex among the vertices in S1 (that is, all other vertices in S1 are below x in the stack). Similarly, let y be the top vertex among the vertices in S2 (again, all other vertices in S2 are below y in the stack). The property is that y (which belongs to S2) is above x (which belongs to S1) in the stack. The reason is that in the original graph S1 is reachable from S2, so the first DFS finishes the last vertex of S2 only after it has finished every vertex of S1.
Why is this property helpful? When we execute DFS on the reversed graph, we do it according to the stack order. In particular, we explore y before we explore x. When exploring y we explore every vertex of S2, and since no vertex of S1 is reachable from S2 in the reversed graph, we explore all the vertices of S2 before we explore any vertex of S1. This holds for any pair of strongly connected components such that one is reachable from the other. In particular, the vertex at the top of the stack belongs to a sink component of the reversed graph, and after we're done exploring that sink component, the top vertex again belongs to a sink relative to the graph induced by the not-yet-explored vertices.
Therefore, the algorithm correctly computes all the strongly connected components of the reversed graph, which, as noted above, are the same as in the original graph.
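The argument above can be turned into a short counting routine. The following is a minimal sketch under stated assumptions rather than a line-by-line repair of the question's code: the function name count_sccs and the sample graph are made up for illustration, and adj is assumed to be a 0-based adjacency list as in the question. The key difference from the question's reverse_dfs is that the counter goes up once per unvisited vertex taken from the stack, not once per neighbor.

```python
def count_sccs(adj):
    """Count strongly connected components of a graph given as adjacency lists."""
    n = len(adj)

    # Pass 1: DFS on the original graph, recording vertices in post-order.
    seen_first = [False] * n
    order = []

    def dfs(u):
        seen_first[u] = True
        for v in adj[u]:
            if not seen_first[v]:
                dfs(v)
        order.append(u)  # u is finished after all its descendants

    for u in range(n):
        if not seen_first[u]:
            dfs(u)

    # Build the reversed graph.
    radj = [[] for _ in range(n)]
    for u in range(n):
        for v in adj[u]:
            radj[v].append(u)

    # Pass 2: take vertices latest-finished first and flood-fill the reversed graph.
    seen_second = [False] * n

    def mark(u):
        seen_second[u] = True
        for v in radj[u]:
            if not seen_second[v]:
                mark(v)

    count = 0
    for u in reversed(order):
        if not seen_second[u]:
            count += 1  # every fresh start is exactly one new component
            mark(u)
    return count

# 0 -> 1 -> 2 -> 0 forms one component; vertex 3 is a component on its own.
print(count_sccs([[1], [2], [0, 3], []]))
```

Because both passes are recursive, very deep graphs may hit Python's recursion limit; an explicit stack would avoid that.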

Get all possible simple paths between two nodes (Graph theory)

In the context of graph theory, I'm trying to get all the possible simple paths between two nodes.
I record the network using an adjacency matrix stored in a pandas dataframe, in such a way that network[x][y] stores the value of the arrow which goes from x to y.
To get the paths between two nodes, what I do is:
I get all the possible permutations of the nodes (using it.permutations; as the path is simple there are no repetitions).
Then I use an ad hoc function, adjacent (which gives me the neighbours of a node), to check which of the candidate paths actually exist.
This takes too long, and it's not efficient. Do you know how I can improve the code? Maybe with a recursive function?
For unrelated reasons I don't want to use Networkx.
import itertools as it
def get_simple_paths(self, node, node_objective):
    # Gives you all simple paths between two nodes
    # We get all possibilities and then we will filter them
    nodes = self.nodes # The list of all nodes
    possible_paths = [] # Store all possible paths
    simple_paths = [] # Store the paths that actually exist
    l = 2
    while l <= len(nodes):
        for x in it.permutations(nodes, l): # Every ordering of l distinct nodes
            if x[0] == node and x[-1] == node_objective:
                possible_paths.append(x)
        l += 1
    # Now check which of those paths exist
    for x_pos, x in enumerate(possible_paths):
        for i_pos, i in enumerate(x):
            # We use it to check along the whole path;
            # if two consecutive nodes are not neighbours, the loop breaks
            if i in self.adjacencies(x[i_pos+1]):
                if i_pos+2 == len(x):
                    simple_paths.append(x)
                    break
                else:
                    continue
            else:
                break
    # Return simple paths
    return(simple_paths)
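The recursive approach the question hints at could look like the sketch below. It grows a path one neighbour at a time and backtracks, so it only ever visits orderings that follow real edges instead of filtering every permutation. The function name all_simple_paths and the plain dict adjacency adj are assumptions for illustration; with the class from the question, adj[node] would be replaced by whatever returns the successors of a node.

```python
def all_simple_paths(adj, start, goal, path=None):
    """Yield every simple path from start to goal as a list of nodes."""
    if path is None:
        path = [start]
    if start == goal:
        yield list(path)  # copy, because path keeps being mutated
        return
    for nxt in adj[start]:
        if nxt in path:  # a simple path never revisits a node
            continue
        path.append(nxt)
        yield from all_simple_paths(adj, nxt, goal, path)
        path.pop()  # backtrack before trying the next neighbour

# Hypothetical directed graph: adj[u] lists the nodes reachable from u in one step.
adj = {"a": ["b", "c"], "b": ["c", "d"], "c": ["d"], "d": []}
for p in all_simple_paths(adj, "a", "d"):
    print(p)
```

The number of simple paths can still grow exponentially with the size of the graph, so this reduces wasted work but cannot make the problem cheap in the worst case.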

Python: how to optimize the count of all possible shortest paths?

In a 3x3 network I want to determine all the shortest paths between any two nodes. Then, for each node in the network, I want to compute how many shortest paths pass through it.
This requires the nx.all_shortest_paths(G,source,target) function, which returns a generator. This differs from using nx.all_pairs_shortest_path(G), as suggested here. The difference is that the former computes all the shortest paths between a given pair of nodes, while the latter computes only one shortest path per pair.
Given that I need to consider all shortest paths, I have come up with the following script. This is how I generate the network I am working with:
import networkx as nx
N=3
G=nx.grid_2d_graph(N,N)
pos = dict( (n, n) for n in G.nodes() )
labels = dict( ((i, j), i + (N-1-j) * N ) for i, j in G.nodes() )
nx.relabel_nodes(G,labels,False)
# sorted() returns a list; in Python 3 dict views have no .sort() method
inds=sorted(labels.keys())
vals=sorted(labels.values())
pos2=dict(zip(vals,inds))
nx.draw_networkx(G, pos=pos2, with_labels=False, node_size = 15)
And this is how I print all the shortest paths between any two nodes:
for n in G.nodes():
    for j in G.nodes():
        if (n!=j): #Self-loops are excluded
            gener=nx.all_shortest_paths(G,source=n,target=j)
            print('From node '+str(n)+' to '+str(j))
            for p in gener:
                print(p)
            print('------')
The result is a path from node x to node y which only includes the nodes along the way. An excerpt of what I get is:
From node 0 to 2 #Only one path exists
[0, 1, 2] #Node 1 is passed through while going from node 0 to node 2
------
From node 0 to 4 #Two paths exist
[0, 1, 4] #Node 1 is passed through while going from node 0 to node 4
[0, 3, 4] #Node 3 is passed through while going from node 0 to node 4
------
...continues until all pairs of nodes are covered...
My question: how could I amend the last code block to make sure that I know how many shortest paths, in total, pass through each node? According to the excerpt outcome I've provided, node 1 is passed through 2 times, while node 3 is passed through 1 time (starting and ending node are excluded). This calculation needs to be carried out to the end to figure out the final number of paths through each node.

I would suggest making a dict mapping each node to 0
counts = {}
for n in G.nodes(): counts[n] = 0
and then for each path you find (you're already finding and printing them all) iterate through the interior vertices of the path, skipping the start and end nodes as the question asks, and increment the appropriate values in your dict:
# ...
for p in gener:
    print(p)
    for v in p[1:-1]: counts[v] += 1
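Put together, the counting might look like the sketch below. The function name count_paths_through is made up for illustration, and the grid keeps Networkx's default (row, column) tuple labels instead of the integer relabelling used in the question. Because the loops run over ordered pairs, each path in an undirected graph is seen once in each direction, so halve the counts if every path should be counted only once.

```python
import networkx as nx

def count_paths_through(G):
    """Count, for every node, the shortest paths that pass through it.

    Endpoints are not counted; pairs are ordered, so (s, t) and (t, s) both count.
    """
    counts = {node: 0 for node in G.nodes()}
    for source in G.nodes():
        for target in G.nodes():
            if source == target:
                continue
            for path in nx.all_shortest_paths(G, source=source, target=target):
                for node in path[1:-1]:  # interior nodes only
                    counts[node] += 1
    return counts

G = nx.grid_2d_graph(3, 3)
counts = count_paths_through(G)
for node in sorted(counts):
    print(node, counts[node])
```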

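What you seek to compute is closely related to the unnormalized betweenness centrality.
From Wikipedia:
The betweenness centrality is an indicator of a node's centrality in a network. It is equal to the number of shortest paths from all vertices to all others that pass through that node.
Note that the usual definition, which is also the one Networkx implements, weights each pair of endpoints by the fraction of its shortest paths that pass through the node, so it agrees with a raw path count only when every pair has a single shortest path. More generally, I suggest you have a look at all the standard measures of centrality already in Networkx.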

python, igraph coping with vertex renumbering

I am implementing an algorithm for finding a dense subgraph in a directed graph using python+igraph. The main loop maintains two subgraphs S and T, which are initially identical, and removes nodes (and incident edges) according to a count of the indegree (or outdegree) of those nodes with respect to the other graph. The problem I have is that igraph renumbers the vertices, so when I delete some from T, the remaining nodes no longer correspond to the same ones in S.
Here is the main part of the loop that is key.
def directed(S):
    T = S.copy()
    c = 2
    while(S.vcount() > 0 and T.vcount() > 0):
        if (S.vcount()/T.vcount() > c):
            AS = S.vs.select(lambda vertex: T.outdegree(vertex) < 1.01*E(S,T)/S.vcount())
            S.delete_vertices(AS)
        else:
            BT = T.vs.select(lambda vertex: S.indegree(vertex) < 1.01*E(S,T)/T.vcount())
            T.delete_vertices(BT)
This doesn't work because of the effect of deleting vertices on the vertex ids. Is there a standard workaround for this problem?

One possibility is to assign unique names to the vertices in the name vertex attribute. These are kept intact when vertices are removed (unlike vertex IDs), and you can use them to refer to vertices in functions like indegree or outdegree. E.g.:
>>> g = Graph.Ring(4)
>>> g.vs["name"] = ["A", "B", "C", "D"]
>>> g.degree("C")
2
>>> g.delete_vertices(["B"])
>>> g.degree("C")
1
Note that I have removed vertex B so vertex C also gained a new ID, but the name is still the same.
In your case, the row with the select condition could probably be re-written like this:
AS = S.vs.select(lambda vertex: T.outdegree(vertex["name"]) < 1.01 * E(S,T)/S.vcount())
Of course this assumes that initially the vertex names are the same in S and T.
