Generating a graph with certain degree distribution? - python

I am trying to generate a random graph that has small-world properties (exhibits a power-law degree distribution). I just started using the networkx package and discovered that it offers a variety of random graph generators. Can someone tell me if it is possible to generate a graph where a given node's degree follows a gamma distribution (either in R or using Python's networkx package)?

If you want to use the configuration model, something like this should work in NetworkX:
import random
import networkx as nx

z = [int(random.gammavariate(alpha=9.0, beta=2.0)) for i in range(100)]
G = nx.configuration_model(z)
You might need to adjust the mean of the sequence z depending on the parameters of the gamma distribution. Also, z doesn't need to be graphical (you'll get a multigraph), but it does need an even sum, so you might have to try a few random sequences, or add 1...
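For example, here is a minimal sketch of the "add 1" fix (the parity patch is my own, not part of the original recipe):
import random
import networkx as nx

z = [int(random.gammavariate(alpha=9.0, beta=2.0)) for i in range(100)]
if sum(z) % 2:  # configuration_model raises an error on an odd degree sum
    z[0] += 1   # bump one degree to restore parity
G = nx.configuration_model(z)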
The NetworkX documentation notes for configuration_model give another example, a reference and how to remove parallel edges and self loops:
Notes
-----
As described by Newman [1]_.
A non-graphical degree sequence (not realizable by some simple
graph) is allowed since this function returns graphs with self
loops and parallel edges. An exception is raised if the degree
sequence does not have an even sum.
This configuration model construction process can lead to
duplicate edges and loops. You can remove the self-loops and
parallel edges (see below) which will likely result in a graph
that doesn't have the exact degree sequence specified. This
"finite-size effect" decreases as the size of the graph increases.
References
----------
.. [1] M.E.J. Newman, "The structure and function
of complex networks", SIAM REVIEW 45-2, pp 167-256, 2003.
Examples
--------
>>> from networkx.utils import powerlaw_sequence
>>> z=nx.create_degree_sequence(100,powerlaw_sequence)
>>> G=nx.configuration_model(z)
To remove parallel edges:
>>> G=nx.Graph(G)
To remove self loops:
>>> G.remove_edges_from(G.selfloop_edges())
Here is an example similar to the one at http://networkx.lanl.gov/examples/drawing/degree_histogram.html that makes a drawing including a graph layout of the largest connected component:
#!/usr/bin/env python
import random
import matplotlib.pyplot as plt
import networkx as nx
def seq(n):
    return [random.gammavariate(alpha=2.0, beta=1.0) for i in range(n)]
z=nx.create_degree_sequence(100,seq)
nx.is_valid_degree_sequence(z)
G=nx.configuration_model(z) # configuration model
degree_sequence=sorted(nx.degree(G).values(),reverse=True) # degree sequence
print "Degree sequence", degree_sequence
dmax=max(degree_sequence)
plt.hist(degree_sequence,bins=dmax)
plt.title("Degree histogram")
plt.ylabel("count")
plt.xlabel("degree")
# draw graph in inset
plt.axes([0.45,0.45,0.45,0.45])
Gcc=nx.connected_component_subgraphs(G)[0]
pos=nx.spring_layout(Gcc)
plt.axis('off')
nx.draw_networkx_nodes(Gcc,pos,node_size=20)
nx.draw_networkx_edges(Gcc,pos,alpha=0.4)
plt.savefig("degree_histogram.png")
plt.show()
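A caveat from me if you run this on a current NetworkX: create_degree_sequence and connected_component_subgraphs were removed in the 2.x series, so the largest component has to be extracted manually, e.g.:
# modern NetworkX: take the subgraph induced by the largest connected component
Gcc = G.subgraph(max(nx.connected_components(G), key=len)).copy()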

I did this a while ago in base Python... IIRC, I used the following method. This is from memory, so it may not be entirely accurate, but hopefully it's worth something:
Choose the number of nodes, N, in your graph, and the density (existing edges over possible edges), D. This implies the number of edges, E.
For each node, assign its degree by first choosing a random positive number x and finding P(x), where P is your pdf. The node's degree is (P(x)*E/2) - 1.
Choose a node at random and connect it to another random node. If either node has realized its assigned degree, eliminate it from further selection. Repeat E times (a sketch of this pairing step follows below).
N.B. this doesn't create a connected graph in general.
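A minimal sketch of that pairing step (the data structures, duplicate-edge check, and attempt cap are my own additions; the assigned degrees are taken as given):
import random

def pair_up(degrees):
    # connect random node pairs, retiring a node once it has realized
    # its assigned degree; the attempt cap guards against stalling
    remaining = {n: d for n, d in enumerate(degrees) if d > 0}
    edges = set()
    attempts = 10 * sum(degrees)
    while len(remaining) > 1 and attempts > 0:
        attempts -= 1
        u, v = random.sample(sorted(remaining), 2)
        if (u, v) in edges or (v, u) in edges:
            continue  # skip duplicate edges
        edges.add((u, v))
        for n in (u, v):
            remaining[n] -= 1
            if remaining[n] == 0:
                del remaining[n]
    return edges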

I know this is very late, but you can do the same thing, albeit a little more straightforwardly, with Mathematica.
RandomGraph[DegreeGraphDistribution[{3, 3, 3, 3, 3, 3, 3, 3}], 4]
This will generate 4 random graphs, with each node having the prescribed degree.

In addition to the one mentioned above, networkx provides 4 algorithms that receive the degree sequence as an input:
configuration_model: explained by @eric above
expected_degree_graph: uses a probabilistic approach based on the expected degree of each node. It won't give you the exact degrees, only an approximation.
havel_hakimi_graph: this one tries to connect the nodes with the highest degrees first
random_degree_sequence_graph: as far as I can see, this is similar to what @JonC suggested; it has a tries parameter as there is no guarantee of finding a suitable configuration.
The full list (including some versions of the algorithms for directed graphs) is here. A combined sketch of all four follows below.
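A minimal sketch of mine, calling the four generators on the same degree sequence (chosen to be graphical with an even sum):
import networkx as nx

z = [5, 3, 3, 2, 2, 2, 1, 1, 1]  # graphical, even sum

G1 = nx.configuration_model(z)                      # multigraph, exact degrees
G2 = nx.expected_degree_graph(z)                    # degrees match only in expectation
G3 = nx.havel_hakimi_graph(z)                       # simple graph, highest degrees first
G4 = nx.random_degree_sequence_graph(z, tries=100)  # simple graph; may raise after `tries`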
I also found a couple of papers:
Efficient and exact sampling of simple graphs with given arbitrary degree sequence
A Sequential Importance Sampling Algorithm for Generating Random Graphs with Prescribed Degrees

Related

Is a negative log likelihood necessarily small?

I am having issues implementing, in Python, a strategy described in the article "Blind identification strategies for room occupancy estimation".
In order to make my question as easy to answer as possible, I will do my best to keep the level of detail to the bare minimum.
Here is the mathematics of the problem:
My aim is to minimize L, as it represents the negative log-likelihood function. The problem: the implementation leads to astronomically high values that don't seem coherent (even infinite for certain values). Is that expected? My results do not correspond to the ones given in the article either, and I can't figure out why.
Here is the L function I defined in Python:
import numpy as np
from numpy.linalg import inv,det
from numpy import dot,transpose,log
def L(ARX, O, u, y):
    # Preliminaries: break theta down into a, bu, bo, s2, o(1), ..., o(N)
    N = O.shape[0]
    [a, bu, bo, s2] = ARX  # the coefficients of the ARX system to be identified
    I = np.eye(N)  # the identity matrix for later calculations
    # Delta matrix
    Delta = np.zeros([N, N])
    Delta[1:N, 0:N-1] = np.eye(N-1)
    # y_barre
    y_barre = dot(I - a*Delta, y) - dot(bu*Delta, u) - dot(bo*Delta, O)
    # Sigma_y (covariance matrix)
    Sy = s2 * dot(inv(I - a*Delta), inv(I - a*Delta).T)
    # finally, the negative log likelihood is given by
    return log(det(Sy)) + 1/s2 * dot(y_barre.T, y_barre)
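One plausible source of the infinities, as a guess on my part: for large N, det(Sy) overflows or underflows float64, so log(det(Sy)) comes out infinite even when the log-determinant itself is moderate. numpy.linalg.slogdet works in log space and avoids this:
import numpy as np

N = 2000
Sy = 0.01 * np.eye(N)             # toy covariance: det = 0.01**N underflows to 0.0
print(np.log(np.linalg.det(Sy)))  # -inf
sign, logdet = np.linalg.slogdet(Sy)
print(logdet)                     # about -9210.34, i.e. N*log(0.01)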

Python - Interpolate 2D point cloud using splines

I'm attempting to fit a 2D point-cloud (x and y coordinates). So far, I've had limited success with scipy's interpolation packages, most notably UnivariateSpline, which produced the following (sub-optimal) fit (please ignore the colors):
However, this is obviously the wrong package, since I need a final curve that can bend back on itself (so it is no longer a 1D function) near the edges of my parabolic point-cloud.
I then read up on interp2d, for example, but don't understand what my z array would be. Is there a better class of packages that I'm perhaps overlooking?
Update 1: as suggested in the comments, I've redone this using scipy.interpolate.splprep; my general setup is:
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import splprep, splev

pts = np.vstack((X.ravel(), Y.ravel()))  # X and Y contain my points
(tck, u), fp, ier, msg = splprep(pts, u=None, per=0, k=3, full_output=True)  # s = optional smoothing parameter (default used here)
print('Spline score:', fp)  # goodness of fit flatlines after a given s value (and higher), which captures the default s-value as well
u_new = np.linspace(u.min(), u.max(), 1000)  # evaluation grid along the spline parameter
x_new, y_new = splev(u_new, tck, der=0)
plt.plot(x_new, y_new, 'k')
plt.show()
The plot is below. Can anyone suggest a method for automating the choice of s... possibly a loop that assesses the goodness of fit of each s (see the sketch below)? Or is there something baked in?
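For concreteness, here is the kind of loop I have in mind (assuming "flatlining" means the residual fp changes by less than a relative tolerance between consecutive s values):
import numpy as np
from scipy.interpolate import splprep

def pick_s(pts, s_values, tol=1e-3):
    # scan increasing s; return the first value at which the weighted
    # residual fp stops changing by more than tol (relative)
    prev_fp = None
    for s in s_values:
        (tck, u), fp, ier, msg = splprep(pts, s=s, k=3, full_output=True)
        if prev_fp is not None and abs(fp - prev_fp) <= tol * max(abs(prev_fp), 1e-12):
            return s
        prev_fp = fp
    return s_values[-1]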
Update 2:
I've since re-run this on two different point-clouds and found that the ordering of the points significantly alters the outcome. When I re-order the points in the point-cloud to lie along an initial parabolic fit, I get much better results with my spline. Even so, the results are still sub-optimal, as can be seen below.
Are there any further adjustments I can make with this method? Alternatively, does anyone have a suggestion for a competitive approach I can investigate?
Update 3: actually, setting knots = 5 helps out tremendously:
I faced a similar issue, and the solution to my problem was to use Principal Curves (https://hastie.su.domains/Papers/Principal_Curves.pdf).
This implementation, available on GitHub, could help you: https://github.com/zsteve/pcurvepy.

ValueError: Linkage 'Z' uses the same cluster more than once in Python scipy fcluster

I'm getting "ValueError: Linkage 'Z' uses the same cluster more than once." when trying to get flat clusters in Python with scipy.cluster.hierarchy.fcluster. This error happens only sometimes, usually only with really big matrices, i.e. 10000x10000.
import scipy.cluster.hierarchy as sch
Z = sch.linkage(d, method="ward")
# some computation here, returning n (usually between 5-30)
clusters = sch.fcluster(Z, t=n, criterion='maxclust')
Why does it happen? How can I avoid it? Unfortunately I couldn't find any useful info by googling...
EDIT: The error also occurs when trying to draw a dendrogram.
No such error appears if method='average' is used.
It seems that using fastcluster instead of scipy.cluster.hierarchy solves the problem. In addition, the fastcluster implementation is slightly faster than scipy's.
For more details have a look at the paper.
import fastcluster
import scipy.cluster.hierarchy as sch

Z = fastcluster.linkage(d, method="ward")
# some computation here, returning n (usually between 5-30)
# fastcluster only provides the linkage step; its output is scipy-compatible,
# so the flat clusters still come from scipy's fcluster
clusters = sch.fcluster(Z, t=n, criterion='maxclust')

Is there a Python equivalent to the smooth.spline function in R

The smooth.spline function in R allows a tradeoff between roughness (as defined by the integrated square of the second derivative) and fitting the points (as defined by summing the squares of the residuals). This tradeoff is accomplished by the spar or df parameter. At one extreme you get the least-squares line, and at the other you get a very wiggly curve which intersects all of the data points (or the mean if you have duplicated x values with different y values).
I have looked at scipy.interpolate.UnivariateSpline and other spline variants in Python; however, they seem to only trade off by increasing the number of knots and setting a threshold (called s) for the allowed sum of squared residuals. By contrast, smooth.spline in R allows having knots at all the x values, without necessarily having a wiggly curve that hits all the points -- the penalty comes from the second derivative.
Does Python have a spline fitting mechanism that behaves in this way? Allowing all knots but penalizing the second derivative?
You can use R functions in Python with rpy2:
import numpy as np
import matplotlib.pyplot as plt
import rpy2.robjects as robjects

r_y = robjects.FloatVector(y_train)
r_x = robjects.FloatVector(x_train)
r_smooth_spline = robjects.r['smooth.spline']  # extract the R function
# run the smoothing function
spline1 = r_smooth_spline(x=r_x, y=r_y, spar=0.7)
ySpline = np.array(robjects.r['predict'](spline1, robjects.FloatVector(x_smooth)).rx2('y'))
plt.plot(x_smooth, ySpline)
If you want to set lambda directly, spline1 = r_smooth_spline(x=r_x, y=r_y, lambda=42) doesn't work, because lambda is a reserved keyword in Python, but there is a solution: How to use the lambda argument of smooth.spline in RPy WITHOUT Python interpreting it as lambda.
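In short, the workaround is to pass the argument through a keyword dictionary so the Python parser never sees the bare name:
# 'lambda' is a reserved word in Python, so smuggle it in via **kwargs
spline1 = r_smooth_spline(x=r_x, y=r_y, **{'lambda': 42})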
To get the code running you first need to define the data x_train and y_train, and you can define x_smooth = np.array(np.linspace(-3, 5, 1920)) if you want to plot it between -3 and 5 in full-HD resolution.
Note that this code is not fully compatible with Jupyter notebooks for the latest versions of rpy2. You can fix this by using !pip install -Iv rpy2==3.4.2, as described in "NotImplementedError: Conversion 'rpy2py' not defined for objects of type '<class 'rpy2.rinterface.SexpClosure'>' only after I run the code twice".
I've been looking for exactly the same thing, but would rather not have to translate the code to Python. The Splinter package seems like an option, however: https://github.com/bgrimstad/splinter

Determining Betweenness Centrality With The Igraph Library

I am a very, very mediocre programmer, but I still aim to use the igraph Python library to determine the effect of a user's centrality in a given forum on his later contributions to that forum.
I got in touch with someone else who used the NetworkX library to do something similar, but given the current size of the forum, calculating exact centrality indices is virtually impossible--it just takes too much time.
This was his code though:
import networkx as netx
import sys, csv
if len(sys.argv) != 2:
    print 'Please specify an input graph.'
    sys.exit(1)
ingraph = sys.argv[1]
graph = netx.readwrite.gpickle.read_gpickle(ingraph)
num_nodes = len(graph.nodes())
print '%s nodes found in input graph.' % num_nodes
print 'Recording data in centrality.csv'
# Calculate all of the betweenness measures
betweenness = netx.algorithms.centrality.betweenness_centrality(graph)
print 'Betweenness computations complete.'
closeness = netx.algorithms.centrality.closeness_centrality(graph)
print 'Closeness computations complete.'
outcsv = csv.writer(open('centrality.csv', 'wb'))
for node in graph.nodes():
    outcsv.writerow([node, betweenness[node], closeness[node]])
print 'Complete!'
I tried to write something similar with the igraph library (which allows to make quick estimates rather than exact calculations), but I cannot seem to write data to a CSV file.
My code:
import igraph
import sys, csv
from igraph import *
graph = Graph.Read_Pajek("C:\karate.net")
print igraph.summary(graph)
estimate = graph.betweenness(vertices=None, directed=True, cutoff=2)
print 'Betweenness computation complete.'
outcsv = csv.writer(open('estimate.csv', 'wb'))
for v in graph.vs():
    outcsv.writerow([v, estimate[vs]])
print 'Complete!'
I can't find how to address individual vertices (or nodes, in NetworkX jargon) in the igraph documentation, so that's where I am getting error messages. Perhaps I am forgetting something else as well; I am probably too bad a programmer to notice :P
What am I doing wrong?
So, for the sake of clarity, the following eventually proved to do the trick:
import igraph
import sys, csv
from igraph import *
from itertools import izip
graph = Graph.Read_GML("C:\stack.gml")
print igraph.summary(graph)
my_id_to_igraph_id = dict((v, k) for k, v in enumerate(graph.vs["id"]))
estimate = graph.betweenness(directed=True, cutoff=16)
print 'Betweenness computation complete.'
print graph.vertex_attributes()
outcsv = csv.writer(open('estimate17.csv', 'wb'))
outcsv.writerows(izip(graph.vs["id"], estimate))
print 'Complete!'
As you already noticed, individual vertices in igraph are accessed using the vs attribute of your graph object. vs behaves like a list, so iterating over it will yield the vertices of the graph. Each vertex is represented by an instance of the Vertex class, and the index of the vertex is given by its index attribute. (Note that igraph uses continuous numeric indices for both the vertices and the edges, so that's why you need the index attribute and cannot use your original vertex names directly).
I presume that what you need is the name of the vertex that was stored originally in your input file. Names are stored in the name or id vertex attribute (depending on your input format), so what you need is probably this:
for v in graph.vs:
    outcsv.writerow([v["name"], estimate[v.index]])
Note that vertex attributes are accessed by indexing the vertex object as if it was a dictionary. An alternative is to use the vs object directly as a dictionary; this will give you a list containing the values of the given vertex attribute for all vertices. E.g.:
from itertools import izip
for name, est in izip(graph.vs["name"], estimate):
    outcsv.writerow([name, est])
An even faster version that writes all the rows at once:
outcsv.writerows(izip(graph.vs["name"], estimate))
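For completeness, a Python 3 translation of that final snippet (my adaptation: zip replaces itertools.izip, and csv wants text mode with newline=''); the file names are placeholders:
import csv
import igraph

graph = igraph.Graph.Read_GML("stack.gml")
estimate = graph.betweenness(directed=True, cutoff=16)

with open("estimate.csv", "w", newline="") as f:
    csv.writer(f).writerows(zip(graph.vs["id"], estimate))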
