Understanding average_degree_connectivity in networkx? - python

I am having a hard time understanding this graph quantity: networkx.algorithms.assortativity.average_degree_connectivity
average_neighbor_degree returns a node id and its average_neighbor_degree:
d – A dictionary keyed by node with average neighbors degree value.
However, I can't understand what average_degree_connectivity is? It returns:
d – A dictionary keyed by degree k with the value of average connectivity.
For example, I compared average_degree_connectivity with the average neighbor degree value for three graphs. My questions are:
1. What does the average neighbor degree value mean?
2. What does average_degree_connectivity represent?
3. How is average_neighbor_degree related to average_degree_connectivity?

It makes sense to answer your questions the other way round:
Let v be a vertex with m neighbors. The average_neighbor_degree of v is simply the sum of its neighbors' degrees divided by m.
For the average_degree_connectivity, this is the important part of the definition:
... is the average nearest neighbor degree of nodes with degree k
So for all the different degrees that occur in the graph, it gives the average of the average_neighbor_degree of all nodes with the same degree. It is a measure of how connected nodes with certain degrees are.
That is a lot of averages; I hope this snippet clarifies question 2:
import networkx as nx
from collections import defaultdict

G = nx.karate_club_graph()
avg_neigh_degrees = nx.algorithms.assortativity.average_neighbor_degree(G)
deg_to_avg_neighbor_degrees = defaultdict(list)
for node, degree in nx.degree(G):
    deg_to_avg_neighbor_degrees[degree].append(avg_neigh_degrees[node])
# this is the same as nx.algorithms.assortativity.average_degree_connectivity(G)
avg_degree_connectivity = {degree: sum(vals)/len(vals) for degree, vals in
                           deg_to_avg_neighbor_degrees.items()}
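For a case that can be checked by hand, here is a dependency-free sketch of the same two averages on a star graph with one hub and three leaves. The adjacency dict and variable names are made up for illustration; this is not how networkx implements the functions internally.

```python
from collections import defaultdict

# Hypothetical toy graph: node 0 is the hub, nodes 1-3 are leaves.
adj = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}

degree = {node: len(nbrs) for node, nbrs in adj.items()}

# Average neighbor degree: mean degree of each node's neighbors.
avg_neighbor_degree = {
    node: sum(degree[n] for n in nbrs) / len(nbrs)
    for node, nbrs in adj.items()
}

# Average degree connectivity: group the values above by node degree
# and average within each group.
by_degree = defaultdict(list)
for node, k in degree.items():
    by_degree[k].append(avg_neighbor_degree[node])
avg_degree_connectivity = {k: sum(v) / len(v) for k, v in by_degree.items()}

print(avg_neighbor_degree)      # {0: 1.0, 1: 3.0, 2: 3.0, 3: 3.0}
print(avg_degree_connectivity)  # {3: 1.0, 1: 3.0}
```

The hub (degree 3) only sees leaves of degree 1, and every leaf (degree 1) only sees the hub of degree 3, so the degree-keyed dictionary is simply the node-keyed one collapsed by degree.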

Related

Discrepancy in calculating graph coloring code complexity

Consider the code below. Suppose the graph in question has N nodes with at most D neighbors for each node, and D+1 colors are available for coloring the nodes such that no two nodes connected with an edge have the same color assigned to them. I reckon the complexity of the code below is O(N*D) because for each of the N nodes we loop through the at most D neighbors of that node to populate the set illegal_colors, and then iterate through colors list that comprises D+1 colors. But the complexity given is O(N+M) where M is the number of edges. What am I doing wrong here?
def color_graph(graph, colors):
    for node in graph:
        if node in node.neighbors:
            raise Exception('Legal coloring impossible for node with loop: %s' %
                            node.label)
        # Get the node's neighbors' colors, as a set so we
        # can check if a color is illegal in constant time
        illegal_colors = set([
            neighbor.color
            for neighbor in node.neighbors
            if neighbor.color
        ])
        # Assign the first legal color
        for color in colors:
            if color not in illegal_colors:
                node.color = color
                break
The number of edges M, the maximum degree D and the number of nodes N satisfy the inequality:
M <= N * D / 2.
Therefore O(N+M) is included in O(N*(D+1)).
In your algorithm, you loop over every neighbour of every node. The exact complexity of that is not N*D, but d1 + d2 + d3 + ... + dN where di is the degree of node i. This sum is equal to 2*M, which is at most N*D but might be less.
Therefore the complexity of your algorithm is O(N+M). Hence it is also O(N*(D+1)). Note that O(N*(D+1)) = O(N*D) under the assumption D >= 1.
Saying your algorithm runs in O(N+M) is slightly more precise than saying it runs in O(N*D). If most nodes have a lot fewer than D neighbours, then M+N might be much smaller than N*D.
Also note that O(M+N) = O(M) under the assumption that every node has at least one neighbour.
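To see the counting argument concretely, here is a small sketch that runs the same greedy strategy on a plain adjacency dict and counts the work done. The graph, the counters and the function name are illustrative assumptions, not part of the original code. Note that the inner color loop also stops after at most deg(node) + 1 checks, because a node with deg(node) neighbors can rule out at most that many colors.

```python
def greedy_color_with_counts(adj, colors):
    """Greedy coloring on an adjacency dict, counting inner-loop steps."""
    color_of = {}
    neighbor_visits = 0
    color_checks = 0
    for node, nbrs in adj.items():
        illegal = set()
        for nbr in nbrs:
            neighbor_visits += 1
            if nbr in color_of:
                illegal.add(color_of[nbr])
        # Assign the first color not used by an already-colored neighbor.
        for color in colors:
            color_checks += 1
            if color not in illegal:
                color_of[node] = color
                break
    return color_of, neighbor_visits, color_checks

# Hypothetical graph: a path 0-1-2-3 plus the edge 1-3 (N = 4, M = 4, D = 3).
adj = {0: [1], 1: [0, 2, 3], 2: [1, 3], 3: [1, 2]}
n = len(adj)
m = sum(len(nbrs) for nbrs in adj.values()) // 2
colors = ["red", "green", "blue", "yellow"]  # D + 1 colors

color_of, neighbor_visits, color_checks = greedy_color_with_counts(adj, colors)
print(neighbor_visits == 2 * m)    # every edge is seen once from each end
print(color_checks <= 2 * m + n)   # at most deg(node) + 1 checks per node
```

Both counters are bounded by a constant times N + M, which is where the O(N+M) bound comes from.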

Create a networkx weighted graph and find the path between 2 nodes with the smallest weight

I have a problem involving graph theory. To solve it, I would like to create a weighted graph using networkx. At the moment, I have a dictionary where each key is a node, and each value is the associated weight (between 10 and 200 000 or so).
weights = {node: weight}
I believe I do not need to normalize the weights with networkx.
At the moment, I create a non-weighted graph by adding the edges:
def create_graph(data):
    edges = create_edges(data)
    # Create the graph
    G = nx.Graph()
    # Add edges
    G.add_edges_from(edges)
    return G
From what I read, I can add a weight to the edge. However, I would prefer the weight to be applied to a specific node instead of an edge. How can I do that?
Idea: I create the graph by adding the nodes weighted, and then I add the edges between the nodes.
def create_graph(data, weights):
    nodes = create_nodes(data)
    edges = create_edges(data)  # list of tuples
    # Create the graph
    G = nx.Graph()
    # Add nodes, each with its weight as a node attribute
    for node in nodes:
        G.add_node(node, weight=weights[node])
    # Add edges
    G.add_edges_from(edges)
    return G
Is this approach correct?
The next step is to find the path between 2 nodes with the smallest weight. I found this function: networkx.algorithms.shortest_paths.generic.shortest_path, which I think does the right thing. However, it uses weights on the edges instead of weights on the nodes. Could someone explain to me what this function does, what the difference between weights on the nodes and weights on the edges is for networkx, and how I could achieve what I am looking for? Thanks :)
This generally looks right.
You might use bidirectional_dijkstra. It can be significantly faster if you know the source and target nodes of your path (see my comments at the bottom).
To handle the edge vs. node weight issue, there are two options. First note that you are after the sum of the node weights along the path. If I give each edge a weight w(u,v) = w(u) + w(v), then the sum of edge weights along a path is w(source) + w(target) + 2 sum(w(v)), where v ranges over the intermediate nodes of the path. Since w(source) and w(target) are the same for every path between the two endpoints, whatever path has the minimum weight with these edge weights will also have the minimum weight with the node weights.
So you could go and assign each edge the weight to be the sum of the two nodes.
for edge in G.edges():
    G.edges[edge]['weight'] = G.nodes[edge[0]]['weight'] + G.nodes[edge[1]]['weight']
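But an alternative is to note that the weight input into bidirectional_dijkstra can be a function. networkx calls it with three positional arguments: the two endpoints of the edge and the edge's attribute dictionary. Define your own function to give the sum of the two node weights:
def f(u, v, edge_attrs):
    # edge_attrs is unused; the cost comes from the node attributes
    return G.nodes[u]['weight'] + G.nodes[v]['weight']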
and then in your call do bidirectional_dijkstra(G, source, target, weight=f)
So the choices I'm suggesting are to either assign each edge a weight equal to the sum of the node weights or define a function that will give those weights just for the edges the algorithm encounters. Efficiency-wise I expect it will take more time to figure out which is better than it takes to code either algorithm. The only performance issue is that assigning all the weights will use more memory. Assuming memory isn't an issue, use whichever one you think is easiest to implement and maintain.
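As a rough illustration of the idea, here is a self-contained sketch that runs a plain (one-directional) Dijkstra on an adjacency dict, charging each edge the sum of its two node weights. It is not networkx's bidirectional_dijkstra; the function name, the toy graph and the weights are all made up for the example.

```python
import heapq

def shortest_path_by_node_weight(adj, node_weight, source, target):
    """Dijkstra where edge (u, v) costs node_weight[u] + node_weight[v]."""
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == target:
            break
        for v in adj[u]:
            nd = d + node_weight[u] + node_weight[v]
            if v not in dist or nd < dist[v]:
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in done:
        return None  # target is unreachable from source
    # Walk the predecessor links back from target to source.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]

# Hypothetical data: two routes from "a" to "d", via "b" (heavy) or "c" (light).
adj = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a", "d"], "d": ["b", "c"]}
node_weight = {"a": 10, "b": 500, "c": 20, "d": 10}

path = shortest_path_by_node_weight(adj, node_weight, "a", "d")
print(path)                                # ['a', 'c', 'd']
print(sum(node_weight[n] for n in path))   # 40
```

The route through "c" wins under the edge costs, and it is also the route with the smallest total node weight.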
Some comments on bidirectional dijkstra: Imagine you have two points in space a distance R apart and you want to find the shortest distance between them. The dijkstra algorithm (which is the default of shortest_path) will explore every point within distance R of the source point. Basically it's like expanding a balloon centered at the first point until it reaches the other. This has a volume (4/3) pi R^3. With bidirectional_dijkstra we inflate balloons centered at each until they touch. They will each have radius R/2. So the volume is (4/3) pi (R/2)^3 + (4/3) pi (R/2)^3, which is a quarter the volume of the original balloon, so the algorithm has explored a quarter of the space. Since networks can have very high effective dimension, the savings are often much bigger.
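The balloon arithmetic can be written for a general dimension d, under the idealized assumption that the explored region is a ball whose volume scales as c_d r^d (a heuristic only, since real networks are not Euclidean balls):

```latex
\frac{V_{\text{bidirectional}}}{V_{\text{one-sided}}}
  = \frac{2\, c_d\, (R/2)^d}{c_d\, R^d}
  = 2^{\,1-d}
```

For d = 3 this gives the factor 1/4 from the paragraph above; for d = 10 it would already be 1/512.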

NetworkX Directed Graph Generation

Is there a function available in Python's NetworkX for generating random directed graphs with a maximum Euclidean distance between any two connected nodes? For example, for nodes separated by a certain Euclidean distance, there is a probability p of those nodes being connected and for all other nodes separated by greater than this distance, they will not be connected in the graph that is generated.
If you have a threshold such that edges longer than the threshold do not exist, and all edges shorter than that threshold have probability p, then you're in luck. [If it's not the same probability for all shorter edges, it's still doable but a bit harder.]
Start by building a random geometric graph G. This is a graph whose nodes are put in place uniformly at random and any two are connected if they are within a threshold distance from each other.
Then create a new directed graph which has each direction of the edges in G with probability p.
import networkx as nx
import random

N = 100  # 100 nodes
D = 0.2  # threshold distance of 0.2
G = nx.random_geometric_graph(N, D)
H = nx.DiGraph()
H.add_nodes_from(G.nodes(data=True))  # copy the nodes (and their 'pos' attribute), not the edges
p = 0.1  # keep each direction of an edge with probability 10%
for u, v in G.edges():
    if random.random() < p:
        H.add_edge(u, v)
    if random.random() < p:
        H.add_edge(v, u)
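The same two-step construction can be sketched without networkx, which makes the distance check explicit. This is only an illustration under the assumption that points are drawn uniformly in the unit square; the function name, parameters and seed are made up here.

```python
import math
import random

def random_directed_geometric_graph(n, radius, p, seed=None):
    """Return (positions, directed_edges) for n uniform points in the unit square."""
    rng = random.Random(seed)
    pos = {i: (rng.random(), rng.random()) for i in range(n)}
    edges = set()
    for u in range(n):
        for v in range(u + 1, n):
            dx = pos[u][0] - pos[v][0]
            dy = pos[u][1] - pos[v][1]
            if math.hypot(dx, dy) <= radius:
                # Each direction is kept independently with probability p.
                if rng.random() < p:
                    edges.add((u, v))
                if rng.random() < p:
                    edges.add((v, u))
    return pos, edges

pos, edges = random_directed_geometric_graph(n=100, radius=0.2, p=0.1, seed=42)
print(len(edges))
```

The double loop is quadratic in the number of nodes, which is fine for a few hundred nodes but would need a spatial index for much larger graphs.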

How to use Dijkstra's Shortest Path in a Weighted graph to compute the Average value of the Weights? [Python]

I want to use Dijkstra's shortest path in a weighted graph to compute the average value of the weights along the path. I didn't find anything useful on the web, so please help me, because I think this could be useful not just for me.
I have a list of dictionaries:
list_of_dictionaries= [
{"RA":"1","RB":"2","SPEED":"100"}
...
{"RA":"1","RB":"8250","SPEED":"150"}
{"RA":"2","RB":"3","SPEED":"120"}
...
{"RA":"2","RB":"8250","SPEED":"150"}
...
...
{"RA":"350","RB":"351","SPEED":"130"}
...
...
]
I create a weighted undirected graph and then use Dijkstra shortest path to compute the average speed (not the total speed). Let's start:
import networkx as nx

graph = nx.Graph()
for row in list_of_dictionaries:
    source = row['RA']
    target = row['RB']
    if not graph.has_node(source):
        graph.add_node(source)
    if not graph.has_node(target):
        graph.add_node(target)
    graph.add_edge(source, target, speed=float(row['SPEED']))
I have checked that the graph is CONNECTED and it is!
Now I first get the path which contains the list of nodes (not edges), then I take two nodes at a time, I check into the graph the weight (speed) of the edge created with the two nodes.
I repeat this procedure for all the nodes in the path and then I compute the average speed.
This is of course very time-consuming and inefficient:
tot_speed = 0.0
path = nx.dijkstra_path(graph, source, target, 'speed')
for k, node in enumerate(path[:-1]):
    n1 = path[k]
    n2 = path[k + 1]
    speed = graph[n1][n2]['speed']
    tot_speed += speed
avg_speed = tot_speed / (len(path) - 1)  # number of edges = number of nodes - 1
As you can see, this is not a good way of doing it, and there are two main issues:
1) If I try nx.dijkstra_path(graph, 1, 1, 'speed'), I run into trouble because the denominator in the avg_speed formula is zero.
2) The for loop is really messy.
If you have a better idea, please let me know. Thanks.
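One possible way to tidy the loop and handle the single-node path is to pair consecutive nodes with zip and return early when there are no edges. This is only a sketch: the function name and the edge_speed lookup are hypothetical stand-ins, and returning None for an empty path is a design choice rather than something the question specifies.

```python
def average_speed_along_path(path, edge_speed):
    """Mean of the edge speeds along a node path; None if the path has no edges."""
    pairs = list(zip(path, path[1:]))  # consecutive node pairs = edges
    if not pairs:
        return None
    return sum(edge_speed[frozenset(pair)] for pair in pairs) / len(pairs)

# Hypothetical undirected edge speeds, keyed by the unordered node pair.
edge_speed = {
    frozenset(("1", "2")): 100.0,
    frozenset(("2", "3")): 120.0,
    frozenset(("3", "4")): 140.0,
}

print(average_speed_along_path(["1", "2", "3", "4"], edge_speed))  # 120.0
print(average_speed_along_path(["1"], edge_speed))                 # None
```

With a networkx graph, the lookup edge_speed[frozenset(pair)] would be replaced by graph[n1][n2]['speed'], as in the loop above.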

calculate indegree centralization of graph with python networkx

I have a graph and want to calculate its indegree and outdegree centralization. I tried to do this by using python networkx, but there I can only find a method to calculate indegree and outdegree centrality for each node. Is there a way to calculate in- and outdegree centralization of a graph in networkx?
Here's the code. I'm assuming that in-degree centralization is defined as I describe below...
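N = G.order()
# networkx >= 2.0: in_degree() yields (node, degree) pairs; in 1.x use G.in_degree().values()
indegrees = [d for _, d in G.in_degree()]
max_in = max(indegrees)
centralization = float(N*max_in - sum(indegrees)) / (N-1)**2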
Note I've written this with the assumption that it's python 2, not 3. So I've used float in the division. You can adapt as needed.
begin definition
Given a network G, let y be the node with the largest in-degree, and use d_i(u) to denote the in-degree of a node u. Define H_G to be
H_G = \sum_{u} \left( d_i(y) - d_i(u) \right)
    = N \, d_i(y) - \sum_{u} d_i(u)
where u iterates over all nodes in G and N is the number of nodes of G.
The maximum possible value for a graph on N nodes comes when there is a single node to which all other nodes point and no other nodes have edges pointing to them. Then d_i(y) = N-1 and \sum_u d_i(u) = N-1, so H_G = N(N-1) - (N-1) = (N-1)^2.
So for a given network, we define the centralization to be its value of H_G compared to the maximum: C(G) = H_G / (N-1)^2.
end definition
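The definition can be checked on two tiny directed graphs with a dependency-free sketch. The edge lists and the function below are illustrative assumptions, not networkx API; the function just evaluates (N * max_in - sum_in) / (N-1)^2 from a list of directed edges.

```python
def in_degree_centralization(edges, nodes):
    """C(G) = (N * max_in - sum_in) / (N - 1)**2 for directed edges (u, v)."""
    in_deg = {node: 0 for node in nodes}
    for _, v in edges:
        in_deg[v] += 1
    n = len(in_deg)
    h = n * max(in_deg.values()) - sum(in_deg.values())
    return h / (n - 1) ** 2

nodes = [0, 1, 2, 3]
star = [(1, 0), (2, 0), (3, 0)]                 # everyone points at node 0
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]        # every in-degree is 1

print(in_degree_centralization(star, nodes))    # 1.0
print(in_degree_centralization(cycle, nodes))   # 0.0
```

The star reaches the maximum of 1 and the directed cycle, where all in-degrees are equal, gives 0.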
This answer is taken from a Google Groups discussion of the issue (in the context of using R); it helps clarify the maths when read along with the above answer:
Freeman's approach measures "the average difference in centrality
between the most central actor and all others".
This 'centralization' is exactly captured in the mathematical formula
sum(max(x)-x)/(length(x)-1)
x refers to any centrality measure! That is, if you want to calculate
the degree centralization of a network, x has simply to capture the
vector of all degree values in the network. To compare various
centralization measures, it is best to use standardized centrality
measures, i.e. the centrality values should always be smaller than 1
(best position in any possible network) and greater than 0 (worst
position)... if you do so, the centralization will also be in the
range of [0,1].
For degree, e.g., the 'best position' is to have an edge to all other
nodes (i.e. incident edges = number of nodes minus 1) and the 'worst
position' is to have no incident edge at all.
You can use the following code to find the in-degree centralization of a network. Here is the function definition:
import networkx as nx

def in_degree_centralization(G):
    centralities = nx.in_degree_centrality(G)
    max_val = max(centralities.values())
    summ = 0
    for i in centralities.values():
        cc = max_val - i
        summ = summ + cc
    normalization_factor = (len(G.nodes())-1)*(len(G.nodes())-2)
    return summ/normalization_factor
Invoke the function by passing the graph G as a parameter, i.e. in_degree_centralization(graph).
