I have a graph and want to calculate its in-degree and out-degree centralization. I tried to do this with Python networkx, but there I can only find methods to calculate in-degree and out-degree centrality for each node. Is there a way to calculate the in- and out-degree centralization of a whole graph in networkx?
Here's the code. I'm assuming that in-degree centralization is defined as I describe below...
N = G.order()
indegrees = [d for _, d in G.in_degree()]  # networkx 2.x: in_degree() yields (node, degree) pairs
max_in = max(indegrees)
centralization = (N * max_in - sum(indegrees)) / (N - 1) ** 2

Note: in networkx 1.x, G.in_degree() returns a dict, so use G.in_degree().values() instead; and under Python 2, wrap the division in float(...) to avoid integer division. Adapt as needed.
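A quick sanity check (my own example, not from the question): a directed star where every other node points at a single hub should be maximally centralized, i.e. give 1.

import networkx as nx

star = nx.DiGraph([(u, 0) for u in range(1, 6)])  # 5 nodes all pointing at node 0
N = star.order()
indegrees = [d for _, d in star.in_degree()]
print((N * max(indegrees) - sum(indegrees)) / (N - 1) ** 2)  # 1.0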
begin definition
Given a network G, let y be the node with the largest in-degree, and write d_i(u) for the in-degree of a node u. Define H_G to be
H_G = \sum_{u} [d_i(y) - d_i(u)] = N d_i(y) - \sum_{u} d_i(u)
where u iterates over all nodes in G and N is the number of nodes of G.
The maximum possible value for a graph on N nodes occurs when there is a single node to which all other nodes point and no other node receives any edges. Then d_i(y) = N-1 and every other in-degree is 0, so H_G = N(N-1) - (N-1) = (N-1)^2.
So for a given network, we define the centralization to be its value of H_G relative to this maximum: C(G) = H_G / (N-1)^2.
end definition
This answer has been taken from a Google Groups thread on the issue (in the context of using R); it helps clarify the maths behind the above answer:
Freeman's approach measures "the average difference in centrality
between the most central actor and all others".
This 'centralization' is exactly captured in the mathematical formula
sum(max(x)-x)/(length(x)-1)
x refers to any centrality measure! That is, if you want to calculate
the degree centralization of a network, x has simply to capture the
vector of all degree values in the network. To compare various
centralization measures, it is best to use standardized centrality
measures, i.e. the centrality values should always be smaller than 1
(best position in any possible network) and greater than 0 (worst
position)... if you do so, the centralization will also be in the
range of [0,1].
For degree, e.g., the 'best position' is to have an edge to all other
nodes (i.e. incident edges = number of nodes minus 1) and the 'worst
position' is to have no incident edge at all.
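In Python, the quoted formula is a one-liner. Here is a small helper for illustration (my own, assuming x already holds standardized centrality values in [0, 1]):

def centralization(x):
    # x: iterable of standardized centrality values, one per node
    x = list(x)
    return sum(max(x) - v for v in x) / (len(x) - 1)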
You can use the following function to compute the in-degree centralization of a network:
import networkx as nx

def in_degree_centralization(G):
    # Standardized in-degree centrality of every node (values in [0, 1])
    centralities = nx.in_degree_centrality(G)
    max_val = max(centralities.values())
    summ = sum(max_val - c for c in centralities.values())
    # Denominator from the quoted formula above: length(x) - 1
    normalization_factor = G.number_of_nodes() - 1
    return summ / normalization_factor
Invoke the function by passing the graph G as the parameter, i.e. in_degree_centralization(graph).
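A quick check (my own example): in a directed star where five nodes all point at one hub, the centralization should be exactly 1.

star = nx.DiGraph([(u, 0) for u in range(1, 6)])
print(in_degree_centralization(star))  # 1.0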
I made a C# application that draws random points on a panel. I need to cluster these points according to Euclidean distance. I have already implemented Kruskal's algorithm. Normally, there should end up being as many minimum spanning trees as the requested number of clusters: for instance, when the user wants to cluster the drawn points into 3 clusters, there should be 3 MSTs at the end of Kruskal's algorithm.
But I did it in a different way: I built one big MST, and now I have to divide this MST into the requested number of clusters.
For example, with point count = 5 and cluster count = 2, my Kruskal output is: 0-3:57 1-2:99 1-4:102
(from-to:Euclidean distance)
The problem is that I don't know where I should cut this MST to create the clusters.
In Kruskal's algorithm, MST edges are added in order of increasing weight.
If you're starting with an MST and you want to get the same effect as stopping Kruskal's algorithm when there are N connected components, then just delete the N-1 highest-weight edges in the MST.
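A minimal Python/networkx sketch of this idea (the question uses C#, but the approach is the same; mst_clusters and the brute-force distance graph are my own illustration):

import networkx as nx

def mst_clusters(points, k):
    # points: list of (x, y) tuples; k: desired number of clusters
    G = nx.Graph()
    for i, (x1, y1) in enumerate(points):
        for j, (x2, y2) in enumerate(points):
            if i < j:
                G.add_edge(i, j, weight=((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5)
    mst = nx.minimum_spanning_tree(G)  # Kruskal is the default algorithm
    # Delete the k-1 highest-weight MST edges; the remaining components are the clusters
    heaviest = sorted(mst.edges(data='weight'), key=lambda e: e[2], reverse=True)[:k - 1]
    mst.remove_edges_from([(u, v) for u, v, _ in heaviest])
    return list(nx.connected_components(mst))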
I have a problem involving graph theory. To solve it, I would like to create a weighted graph using networkx. At the moment, I have a dictionary where each key is a node, and each value is the associated weight (between 10 and 200,000 or so).
weights = {node: weight}
I believe I do not need to normalize the weights with networkx.
At the moment, I create a non-weighted graph by adding the edges:
def create_graph(data):
    edges = create_edges(data)
    # Create the graph
    G = nx.Graph()
    # Add edges
    G.add_edges_from(edges)
    return G
From what I read, I can add weights to the edges. However, I would prefer the weights to be applied to specific nodes instead of edges. How can I do that?
Idea: I create the graph by adding the weighted nodes first, and then I add the edges between them.
def create_graph(data, weights):
    nodes = create_nodes(data)
    edges = create_edges(data)  # list of tuples
    # Create the graph
    G = nx.Graph()
    # Add weighted nodes
    for node in nodes:
        G.add_node(node, weight=weights[node])
    # Add edges
    G.add_edges_from(edges)
    return G
Is this approach correct?
Next step is to find the path between 2 nodes with the smallest total weight. I found this function: networkx.algorithms.shortest_paths.generic.shortest_path, which I think does the right thing. However, it uses weights on the edges instead of weights on the nodes. Could someone explain to me what this function does, what the difference between weights on the nodes and weights on the edges is for networkx, and how I could achieve what I am looking for? Thanks :)
This generally looks right.
You might use bidirectional_dijkstra. It can be significantly faster if you know the source and target nodes of your path (see my comments at the bottom).
To handle the edge vs node weight issue, there are two options. First note that you are after the sum of the node weights along the path. If I give each edge the weight w(u,v) = w(u) + w(v), then the total edge weight along a path is w(source) + w(target) + 2·sum(w(v)), where v ranges over the interior nodes of the path (each interior node lies on two path edges, hence the factor 2). Since w(source) and w(target) are the same for every candidate path, whatever path has the minimum weight with these edge weights also has the minimum weight with the node weights.
So you could go through and assign each edge a weight equal to the sum of its two nodes' weights:
for edge in G.edges():
    G.edges[edge]['weight'] = G.nodes[edge[0]]['weight'] + G.nodes[edge[1]]['weight']
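With those weights in place, the usual call (a sketch; source and target are whatever your endpoints are, and giving a weight key makes shortest_path use Dijkstra) is simply:

path = nx.shortest_path(G, source, target, weight='weight')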
But an alternative is to note that the weight argument of bidirectional_dijkstra can be a function; networkx calls it with the two endpoints and the edge's attribute dict. Define your own function to return the sum of the two node weights:
def f(u, v, d):
    # u, v: the edge's endpoints; d: the edge attribute dict (unused here)
    return G.nodes[u]['weight'] + G.nodes[v]['weight']
and then in your call do length, path = nx.bidirectional_dijkstra(G, source, target, weight=f) (it returns the path length and the path itself).
So the choices I'm suggesting are to either assign each edge a weight equal to the sum of the node weights or define a function that will give those weights just for the edges the algorithm encounters. Efficiency-wise I expect it will take more time to figure out which is better than it takes to code either algorithm. The only performance issue is that assigning all the weights will use more memory. Assuming memory isn't an issue, use whichever one you think is easiest to implement and maintain.
Some comments on bidirectional Dijkstra: imagine you have two points in space a distance R apart and you want to find the shortest distance between them. The Dijkstra algorithm (which is the default of shortest_path) will explore every point within distance R of the source point. Basically it's like expanding a balloon centered at the first point until it reaches the other; this has volume (4/3)πR³. With bidirectional_dijkstra we inflate balloons centered at each point until they touch. Each will have radius R/2, so the total volume is (4/3)π(R/2)³ + (4/3)π(R/2)³, a quarter the volume of the original balloon, so the algorithm has explored a quarter of the space. Since networks can have very high effective dimension, the savings are often much bigger.
I have a network; how can I generate a random network in which each node retains the same degree as in the original network, using networkx? My first thought was to take the adjacency matrix and randomly shuffle each row, but this way is somewhat complex, e.g. I need to avoid self-connections (which do not occur in the original network) and re-label the nodes. Thanks!
I believe what you're looking for is expected_degree_graph. It generates a random graph based on a sequence of expected degrees, where each degree in the list corresponds to a node. It even includes an option to disallow self-loops!
You can get a list of degrees using networkx.degree. Here's an example of how you would use them together in networkx 2.0+ (degree is slightly different in 1.0):
import networkx as nx
from networkx.generators.degree_seq import expected_degree_graph
N,P = 3, 0.5
G = nx.generators.random_graphs.gnp_random_graph(N, P)
G2 = expected_degree_graph([deg for (_, deg) in G.degree()], selfloops=False)
Note that you're not guaranteed to have the exact degrees for each node using expected_degree_graph; as the name implies, it's probabilistic given the expected value for each of the degrees. If you want something a little more concrete you can use configuration_model, however it does not protect against parallel edges or self-loops, so you'd need to prune those out and replace the edges yourself.
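For instance, a rough sketch of that pruning approach in networkx 2.x (note that simply collapsing parallel edges and dropping self-loops lowers the affected degrees rather than preserving them, which is why you'd have to replace edges yourself afterwards):

import networkx as nx

deg_seq = [d for _, d in G.degree()]
G3 = nx.configuration_model(deg_seq)         # MultiGraph; may contain parallel edges and self-loops
G3 = nx.Graph(G3)                            # collapse parallel edges
G3.remove_edges_from(nx.selfloop_edges(G3))  # drop self-loops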
I searched and found there are many examples of how to create a graph with edge weights, but none of them shows how to create a graph with vertex weights. I am starting to wonder if it is possible.
If a vertex-weighted graph can be created with igraph, is it then possible to calculate the weighted independence number or other weighted quantities with igraph?
As far as I know, there are no functions in igraph that accept arguments for weighted vertices. However, the SANTA package that is part of the Bioconductor suite for R does have routines for weighted vertices, if you are willing to move to R for this. (It seems you may also be able to run Bioconductor from Python.)
Another hacky option is to use (when possible) unweighted routines from igraph and then add the weights back in, e.g. something like this for weighted maximal independent sets:
def maxset(graph, weights):
    # weights: two-column data frame, 'ids' = vertex ids, 'weights' = vertex weights
    ms = graph.maximal_independent_vertex_sets()
    totals = []
    for s in ms:
        # Total weight of the vertices in this independent set
        totals.append(weights.loc[weights['ids'].isin(s), 'weights'].sum())
    return ms[totals.index(max(totals))]

maxset(g, weights)
(Here weights is a two-column data frame with column 1 = vertex ids and column 2 = weights.) This returns the maximal independent set with the largest total vertex weight.
You want to use the vs attribute (a VertexSeq) to access vertices and their attributes in igraph.
As an example of setting a weight on vertices, adapted from the documentation:
http://igraph.org/python/doc/igraph.VertexSeq-class.html
from igraph import Graph

g = Graph.Full(3)  # generate a full (complete) graph as an example
for idx, v in enumerate(g.vs):
    v["weight"] = idx * (idx + 1)  # set each vertex's 'weight' as a function of its index

>>> g.vs["weight"]
[0, 2, 6]
Note that the sequence of vertices is accessed through g.vs, where g is the instance of your Graph object.
I suggest this page; I find it practical for looking up igraph methods:
http://igraph.org/python/doc/identifier-index.html
In Python, I need to create an Exponential Network, which is different from an Exponential Random Graph.
Exponential Networks were introduced in 2005 by Liu & Tang, and originate from a slight variation of the Barabasi-Albert model used to create Scale-Free Networks. This new algorithm still uses growth and preferential attachment, but in such a way that:
The network grows from an initial number of nodes n to a final number N;
A new node is not connected to the most highly connected one, but is instead connected to the one that has the neighborhood with the highest average degree.
So what drives the attachment now is not the degree of existing nodes, but the average degree of their neighborhood. This implies that the algorithm generating the Barabasi-Albert model needs to be modified, and that is my goal.
I want to write a code that does this in a simple step-by-step fashion, using nested for loops for simulating growth and preferential attachment. Also, I want the nodes to be assigned specific positions, like this:
n = 100     # Final number of nodes
ncols = 10  # Number of columns of a 10x10 grid
pos = {i: (i // ncols, (n - i - 1) % ncols) for i in G.nodes()}  # G = graph
My problem: I could do this by accessing the source code for the nx.barabasi_albert_graph() function, but I don't understand which is the growth phase, which is the preferential attachment phase, and where the degree of each node is computed. I would be glad if someone could point me in the right direction here.
The source code for the nx.barabasi_albert_graph() function:
def barabasi_albert_graph(n, m, seed=None):
    if m < 1 or m >= n:
        raise nx.NetworkXError(
            "Barabási-Albert network must have m>=1 and m<n, m=%d,n=%d" % (m, n))
    if seed is not None:
        random.seed(seed)
    # Add m initial nodes (m0 in barabasi-speak)
    G = empty_graph(m)
    G.name = "barabasi_albert_graph(%s,%s)" % (n, m)
    # Target nodes for new edges
    targets = list(range(m))
    # List of existing nodes, with nodes repeated once for each adjacent edge
    repeated_nodes = []
    # Start adding the other n-m nodes. The first node is m.
    source = m
    while source < n:
        # Add edges to m nodes from the source.
        G.add_edges_from(zip([source] * m, targets))
        # Add one node to the list for each new edge just created.
        repeated_nodes.extend(targets)
        # And the new node "source" has m edges to add to the list.
        repeated_nodes.extend([source] * m)
        # Now choose m unique nodes from the existing nodes
        # Pick uniformly from repeated_nodes (preferential attachment)
        targets = _random_subset(repeated_nodes, m)
        source += 1
    return G
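For orientation, a note on the source above: the while source < n loop is the growth phase; the preferential-attachment step is the _random_subset(repeated_nodes, m) call, because repeated_nodes contains each node once per incident edge, so a uniform draw from it is a degree-proportional draw; the degree of a node is never computed explicitly but is encoded by how often the node appears in repeated_nodes. The sketch below (my own, not networkx code) makes that equivalence explicit:

import random

def degree_proportional_targets(G, m):
    # Equivalent in spirit to _random_subset(repeated_nodes, m): pick m
    # distinct nodes, each with probability proportional to its degree
    nodes = list(G.nodes())
    weights = [G.degree(v) for v in nodes]
    targets = set()
    while len(targets) < m:
        targets.add(random.choices(nodes, weights=weights)[0])
    return targets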
I have implemented an animation of Barabasi-Albert graph growth, and I think the implementation is easily adjustable to your preferential attachment criterion and node positions.
Node Positions
You will need to look into the animate_BA function for the node positions (I have them chosen randomly), at lines 39 (for the starting nodes) and 69 (for the newly added node).
Growth Phase
As for the growth phase, this is implemented in a separate function, choose_neighbors, which is called when a new node is inserted into the graph. My implementation chooses a node to connect to with probability (deg(i)+1)/Σ_j(deg(j)+1), where i is a node in the graph, deg(i) is its degree, and the denominator sums deg(j)+1 over all nodes j. This is achieved by creating a list of floats from 0 to 1 that specifies each node's probability of being chosen based on its degree. You can adjust that to use the average degree of the neighborhood instead; you will have to build this list differently, since this function calls the select_neighbors function to choose neighbors randomly based on the defined probabilities.
Other functions are mainly related to animation so you may not need to look into them. The code is documented and you can find further explanation there.
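Putting the pieces together, here is a rough sketch (my own code; names like exponential_network are assumptions, and I read "connected to the one that has the neighborhood with the highest average degree" deterministically) of how the BA skeleton can be rewired for this model:

import networkx as nx

def neighborhood_avg_degree(G, v):
    # Average degree over the neighbors of v (0 if v has no neighbors)
    nbrs = list(G[v])
    return sum(G.degree(u) for u in nbrs) / len(nbrs) if nbrs else 0.0

def exponential_network(n_final, m, n_start):
    # Growth phase: start from n_start connected seed nodes, grow to n_final nodes
    G = nx.path_graph(n_start)
    for source in range(n_start, n_final):
        # Attachment phase: pick the m existing nodes whose neighborhoods
        # have the highest average degree, then wire the new node to them
        targets = sorted(G.nodes(), key=lambda v: neighborhood_avg_degree(G, v), reverse=True)[:m]
        G.add_edges_from((source, t) for t in targets)
    return G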