I am a very, very mediocre programmer, but I still aim to use the igraph Python library to determine the effect of a user's centrality in a given forum on his later contributions to that forum.
I got in touch with someone else who used the NetworkX library to do something similar, but given the current size of the forum, calculating exact centrality indices is virtually impossible: it simply takes too much time.
This was his code, though:
import networkx as netx
import sys, csv

if len(sys.argv) != 2:
    print 'Please specify an input graph.'
    sys.exit(1)

ingraph = sys.argv[1]
graph = netx.readwrite.gpickle.read_gpickle(ingraph)
num_nodes = len(graph.nodes())
print '%s nodes found in input graph.' % num_nodes
print 'Recording data in centrality.csv'

# Calculate betweenness and closeness centrality for every node
betweenness = netx.algorithms.centrality.betweenness_centrality(graph)
print 'Betweenness computations complete.'
closeness = netx.algorithms.centrality.closeness_centrality(graph)
print 'Closeness computations complete.'

# Write one row per node: node, betweenness, closeness
outcsv = csv.writer(open('centrality.csv', 'wb'))
for node in graph.nodes():
    outcsv.writerow([node, betweenness[node], closeness[node]])
print 'Complete!'
I tried to write something similar with the igraph library (which allows quick estimates rather than exact calculations), but I cannot seem to write the data to a CSV file.
My code:
import igraph
import sys, csv
from igraph import *
graph = Graph.Read_Pajek("C:\karate.net")
print igraph.summary(graph)
estimate = graph.betweenness(vertices=None, directed=True, cutoff=2)
print 'Betweenness computation complete.'
outcsv = csv.writer(open('estimate.csv', 'wb'))
for v in graph.vs():
    outcsv.writerow([v, estimate[vs]])
print 'Complete!'
I can't find out how to refer to individual vertices (or nodes, in NetworkX jargon) in the igraph documentation, so that's where I am getting error messages. Perhaps I am forgetting something else as well; I am probably too bad a programmer to notice :P
What am I doing wrong?
So, for the sake of clarity, the following eventually proved to do the trick:
import igraph
import sys, csv
from igraph import *
from itertools import izip
graph = Graph.Read_GML("C:\stack.gml")
print igraph.summary(graph)
my_id_to_igraph_id = dict((v, k) for k, v in enumerate(graph.vs["id"]))
estimate = graph.betweenness(directed=True, cutoff=16)
print 'Betweenness computation complete.'
print graph.vertex_attributes()
outcsv = csv.writer(open('estimate17.csv', 'wb'))
outcsv.writerows(izip(graph.vs["id"], estimate))
print 'Complete!'
As you already noticed, individual vertices in igraph are accessed using the vs attribute of your graph object. vs behaves like a list, so iterating over it will yield the vertices of the graph. Each vertex is represented by an instance of the Vertex class, and the index of the vertex is given by its index attribute. (Note that igraph uses continuous numeric indices for both the vertices and the edges, so that's why you need the index attribute and cannot use your original vertex names directly).
I presume that what you need is the name of the vertex that was stored originally in your input file. Names are stored in the name or id vertex attribute (depending on your input format), so what you need is probably this:
for v in graph.vs:
    outcsv.writerow([v["name"], estimate[v.index]])
Note that vertex attributes are accessed by indexing the vertex object as if it was a dictionary. An alternative is to use the vs object directly as a dictionary; this will give you a list containing the values of the given vertex attribute for all vertices. E.g.:
from itertools import izip
for name, est in izip(graph.vs["name"], estimate):
    outcsv.writerow([name, est])
An even faster version that avoids the explicit Python loop entirely:
outcsv.writerows(izip(graph.vs["name"], estimate))
I am trying to get all shortest paths between all pairs of nodes in an undirected unweighted graph. I am currently using nx.all_pairs_shortest_path(), but I don't understand why it only returns one shortest path for every pair of nodes. There are cycles in my graph so there should exist multiple shortest paths between certain nodes. Any suggestions?
Iterate over all nodes in the graph:
results = []
for n1 in G.nodes():
    for n2 in G.nodes():
        shortest_path = nx.single_source_dijkstra(G, source=n1, target=n2)
        results.append(shortest_path)
I may be late, but I just came across the same problem and this was my solution:
def all_shortest_paths(G):
    a = list(nx.all_pairs_shortest_path(G))
    all_sp_list = []
    for n in range(len(G.nodes)):
        a1 = a[n][1]
        for k, v in a1.items():
            all_sp_list.append(len(v))
    return all_sp_list
Every other way I tried was getting very very slow because my graph had a bunch of nodes, so this was my fastest solution.
I stumbled upon this problem myself and arrived here in my quest for a solution. Unfortunately, networkx doesn't have a function that calculates all the shortest paths for every pair of nodes. Moreover, the answer from Igor Michetti wasn't giving what I wanted at all, though it might have been tweakable.
The answer from math_noob was good because it was close enough for me to build a solution from, but the problem was that it was way, way too slow:
def single_source_shortest_paths(graph, source):
    shortest_paths_dict = {}
    for node in graph:
        shortest_paths_dict[node] = list(nx.all_shortest_paths(graph, source, node))
    return shortest_paths_dict

def all_shortest_paths(graph):
    for source in graph:
        yield source, single_source_shortest_paths(graph, source)
So I went back to the networkx documentation and tried one last time to find any function I could use, but there was none, so I decided to implement it myself. I first tried to implement everything by hand, which was a bit chaotic, only to realise it was a bit better but not by much. So I decided to look into the source code, and found there a holy grail: the nx.predecessor function.
This function is called on the graph and the source node only, so it doesn't depend on the target node, and it is the one doing most of the hard work. So I just recreated the function single_source_shortest_paths, but calling nx.predecessor only once per source node and then doing the same as all_shortest_path, which only consists of calling another function with the right arguments:
def single_source_shortest_paths_pred(graph, source):
    shortest_paths_dict = {}
    pred = nx.predecessor(graph, source)
    for node in graph:
        shortest_paths_dict[node] = list(nx.algorithms.shortest_paths.generic._build_paths_from_predecessors([source], node, pred))
    return shortest_paths_dict
In terms of time, nx.predecessor takes the majority of the time to execute, so the second function is around n times faster, where n is the number of nodes in the graph.
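For what it's worth, here is a minimal usage sketch of the function above (it relies on the private networkx helper _build_paths_from_predecessors, so it assumes a networkx version that ships it, roughly 2.4+); the 4-cycle is just a toy graph in which opposite nodes have two shortest paths:
import networkx as nx

G = nx.cycle_graph(4)  # nodes 0 and 2 are joined by two equally short routes
paths_from_0 = single_source_shortest_paths_pred(G, 0)
print(paths_from_0[2])  # expected: [[0, 1, 2], [0, 3, 2]] (order may vary)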
I am a beginner Python programmer, and I would really appreciate your help with the following issue.
I have a first script that creates a graph and saves it as a Pickle file:
import networkx as nx
G = nx.Graph()
#Some code lines to add nodes and edges...
nx.write_gpickle(G, "graph_file.gpickle")
Then, I have a second script where I want to read the Pickle file and detect any changes in the graph's node attributes, in case the first script was rerun with different values for the node attributes:
import networkx as nx
G = nx.read_gpickle("graph_file.gpickle")
#I need to implement something like this:
for node in G.nodes():
    if node["attribute"] changed from previous value:
        print(f"The attribute of node {node} changed from {previous value} to {new value}")
I am a mathematician. Recently, I became the editor of the puzzles and problems column for a well-known magazine. Occasionally, I need to create a figure to accompany a problem or solution. These figures mostly relate to 2D (occasionally, 3D) euclidean geometry (lines, polygons, circles, plus the occasional ellipse or other conic section). The goal is obtaining figures of very high quality (press-ready), with Computer Modern ("TeX") textual labels. My hope is finding (or perhaps helping write!) a relatively high-level Python library that "knows" euclidean geometry in the sense that natural operations (e.g., drawing a perpendicular line to a given one passing through a given point, bisecting a given angle, or reflecting a figure A on a line L to obtain a new figure A') are already defined in the library. Of course, the ability to create figures after their elements are defined is a crucial goal (e.g., as Encapsulated Postscript).
I know multiple sub-optimal solutions to this problem (some partial), but I don't know of any that is both simple and flexible. Let me explain:
Asymptote (similar to/based on Metapost) allows creating extremely high-quality figures of great complexity, but knows almost nothing about geometric constructions (it is a rather low-level language) and thus any nontrivial construction requires quite a long script.
TikZ with package tkz-euclide is high-level, flexible and also generates quality figures, but its syntax is so heavy that I just cry for Python's simplicity in comparison. (Some programs actually export to TikZ---see below.)
Dynamic Geometry programs, of which I'm most familiar with Geogebra, often have figure-exporting features (EPS, TikZ, etc.), but are meant to be used interactively. Sometimes, what one needs is a figure based on hard specs (e.g., exact side lengths)---defining objects in a script is ultimately more flexible (if correspondingly less convenient).
Two programs, Eukleides and GCLC, are closest to what I'm looking for: They generate figures (EPS format; GCLC also exports to TikZ). Eukleides has the prettiest, simplest syntax of all the options (see the examples), but it happens to be written in C (with source available, though I'm not sure about the license), rather limited/non-customizable, and no longer maintained. GCLC is still maintained but it is closed-source, its syntax is significantly worse than Eukleides's, and has certain other unnatural quirks. Besides, it is not available for Mac OS (my laptop is a Mac).
Python has:
Matplotlib, which produces extremely high-quality figures (particularly of functions or numerical data), but does not seem to know about geometric constructions, and
Sympy has a geometry module which does know about geometric objects and constructions, all accessible in delightful Python syntax, but seems to have no figure-exporting (or even displaying?) capabilities.
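For instance, the kind of construction I have in mind is already natural to express with Sympy (a quick sketch using its geometry module only; it stops short of producing any figure):
from sympy.geometry import Point, Line, Triangle

# Perpendiculars and angle bisectors come for free; drawing them does not.
A, B, C = Point(0, 0), Point(4, 0), Point(1, 3)
tri = Triangle(A, B, C)
perp_through_C = Line(A, B).perpendicular_line(C)  # perpendicular to AB through C
bisectors = tri.bisectors()                        # angle bisectors, keyed by vertex
print(perp_through_C.equation(), bisectors[A])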
Finally, a question: Is there a library, something like "Figures for Sympy/geometry", that uses Python syntax to describe geometric objects and constructions, allowing to generate high-quality figures (primarily for printing, say EPS)?
If a library with such functionality does not exist, I would consider helping to write one (perhaps an extension to Sympy?). I will appreciate pointers.
There is a way to generate vector images with matplotlib, writing the figure out through the io library as a vector image (SVG), following this approach.
I personally ran the code from that webpage (which generates a vector histogram) as a Python file, and it worked.
The code:
import numpy as np
import matplotlib.pyplot as plt
import xml.etree.ElementTree as ET
from io import BytesIO
import json
plt.rcParams['svg.fonttype'] = 'none'
# Apparently, this `register_namespace` method is necessary to avoid garbling
# the XML namespace with ns0.
ET.register_namespace("", "http://www.w3.org/2000/svg")
# Fixing random state for reproducibility
np.random.seed(19680801)
# --- Create histogram, legend and title ---
plt.figure()
r = np.random.randn(100)
r1 = r + 1
labels = ['Rabbits', 'Frogs']
H = plt.hist([r, r1], label=labels)
containers = H[-1]
leg = plt.legend(frameon=False)
plt.title("From a web browser, click on the legend\n"
"marker to toggle the corresponding histogram.")
# --- Add ids to the svg objects we'll modify
hist_patches = {}
for ic, c in enumerate(containers):
    hist_patches['hist_%d' % ic] = []
    for il, element in enumerate(c):
        element.set_gid('hist_%d_patch_%d' % (ic, il))
        hist_patches['hist_%d' % ic].append('hist_%d_patch_%d' % (ic, il))
# Set ids for the legend patches
for i, t in enumerate(leg.get_patches()):
    t.set_gid('leg_patch_%d' % i)
# Set ids for the text patches
for i, t in enumerate(leg.get_texts()):
    t.set_gid('leg_text_%d' % i)
# Save SVG in a fake file object.
f = BytesIO()
plt.savefig(f, format="svg")
# Create XML tree from the SVG file.
tree, xmlid = ET.XMLID(f.getvalue())
# --- Add interactivity ---
# Add attributes to the patch objects.
for i, t in enumerate(leg.get_patches()):
    el = xmlid['leg_patch_%d' % i]
    el.set('cursor', 'pointer')
    el.set('onclick', "toggle_hist(this)")
# Add attributes to the text objects.
for i, t in enumerate(leg.get_texts()):
    el = xmlid['leg_text_%d' % i]
    el.set('cursor', 'pointer')
    el.set('onclick', "toggle_hist(this)")
# Create script defining the function `toggle_hist`.
# We create a global variable `container` that stores the patches id
# belonging to each histogram. Then a function "toggle_element" sets the
# visibility attribute of all patches of each histogram and the opacity
# of the marker itself.
script = """
<script type="text/ecmascript">
<![CDATA[
var container = %s
function toggle(oid, attribute, values) {
/* Toggle the style attribute of an object between two values.
Parameters
----------
oid : str
Object identifier.
attribute : str
Name of style attribute.
values : [on state, off state]
The two values that are switched between.
*/
var obj = document.getElementById(oid);
var a = obj.style[attribute];
a = (a == values[0] || a == "") ? values[1] : values[0];
obj.style[attribute] = a;
}
function toggle_hist(obj) {
var num = obj.id.slice(-1);
toggle('leg_patch_' + num, 'opacity', [1, 0.3]);
toggle('leg_text_' + num, 'opacity', [1, 0.5]);
var names = container['hist_'+num]
for (var i=0; i < names.length; i++) {
toggle(names[i], 'opacity', [1, 0])
};
}
]]>
</script>
""" % json.dumps(hist_patches)
# Add a transition effect
css = tree.getchildren()[0][0]
css.text = css.text + "g {-webkit-transition:opacity 0.4s ease-out;" + \
"-moz-transition:opacity 0.4s ease-out;}"
# Insert the script and save to file.
tree.insert(0, ET.XML(script))
ET.ElementTree(tree).write("svg_histogram.svg")
Beforehand, you need to pip install the libraries imported at the top. The script successfully saved an SVG file with a plot (you can open the file and zoom into the histogram as much as you want without ever seeing pixels, since the image is generated from mathematical functions).
It uses Python 3, as one would expect these days.
You can then include the SVG image in your TeX document for the publication rendering.
I hope it may help.
Greetings,
Javier.
I'm just picking up NetworkX and trying to learn how to use it with Shapefiles.
Right now I have a .shp with a road network that I want to represent in a Graph with NetworkX, so that I can find the shortest path between 2 GPS points. I tried using this, but the problem is that when I run the write_shp() function I lose edges because DiGraph does not allow more than one edge between the same two nodes. The arrow in the figure below shows an example of an edge I lose by using a DiGraph.
So I was wondering if there's any way to create a MultiDiGraph so I don't lose any edges, or if there's any way around it that I could use. I was thinking maybe I could write some code to extract the attributes from the Shapefile and create the MultiDiGraph without using NetworkX's read_shp(), but I don't have any experience at all working with graphs, so I'm not exactly sure if it'd be possible.
I'd really appreciate any help or guidance you could give me, or if I've missed any documentation please let me know. Thanks in advance.
As best I can follow from your question, the following will do it, basically copied from the original read_shp command.
import networkx as nx

def read_multi_shp(path):
    """
    Copied from read_shp, but returning a MultiDiGraph instead.
    """
    try:
        from osgeo import ogr
    except ImportError:
        raise ImportError("read_shp requires OGR: http://www.gdal.org/")

    net = nx.MultiDiGraph()  # <--- here is the main change I made

    def getfieldinfo(lyr, feature, flds):
        f = feature
        return [f.GetField(f.GetFieldIndex(x)) for x in flds]

    def addlyr(lyr, fields):
        for findex in xrange(lyr.GetFeatureCount()):
            f = lyr.GetFeature(findex)
            flddata = getfieldinfo(lyr, f, fields)
            g = f.geometry()
            attributes = dict(zip(fields, flddata))
            attributes["ShpName"] = lyr.GetName()
            if g.GetGeometryType() == 1:  # point
                net.add_node((g.GetPoint_2D(0)), attributes)
            if g.GetGeometryType() == 2:  # linestring
                attributes["Wkb"] = g.ExportToWkb()
                attributes["Wkt"] = g.ExportToWkt()
                attributes["Json"] = g.ExportToJson()
                last = g.GetPointCount() - 1
                net.add_edge(g.GetPoint_2D(0), g.GetPoint_2D(last), attr_dict=attributes)  # <--- also changed this line

    if isinstance(path, str):
        shp = ogr.Open(path)
        lyrcount = shp.GetLayerCount()  # multiple layers indicate a directory
        for lyrindex in xrange(lyrcount):
            lyr = shp.GetLayerByIndex(lyrindex)
            flds = [x.GetName() for x in lyr.schema]
            addlyr(lyr, flds)
    return net
I changed the returned graph from a DiGraph to a MultiDiGraph, and I had to change the add_edge command since the MultiDiGraph version has a different syntax from DiGraph's.
If the multi-lines are disconnected at the joints, I think the library python-s2g (https://github.com/caesar0301/python-s2g) could help you. Even though networkx's Graph object is used under the hood, those multi-paths are still recorded in the graph data.
I have implemented a solution here: https://gitlab.com/njacadieux/upstream_downstream_shortests_path_dijkstra
I read the shapefile using GeoPandas rather than networkx.readwrite.nx_shp.read_shp. When I build the graph, I check for parallel edges. If any are found, rather than skipping them as the networkx.readwrite.nx_shp.read_shp function does, I split each parallel edge into two edges of equal length and divide the user length by 2. The field name of the user length variable must be given.
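Not the repository's actual code, but a rough sketch of the idea (GeoPandas and shapely do the reading; "roads.shp" and the "length" field are placeholder names):
import geopandas as gpd
import networkx as nx

# Read line features, detect parallel edges, and split each duplicate in two by
# inserting a midpoint node, giving each half the user length divided by 2.
gdf = gpd.read_file("roads.shp")
G = nx.Graph()
for _, row in gdf.iterrows():
    coords = list(row.geometry.coords)
    u, v = coords[0], coords[-1]
    if G.has_edge(u, v):
        mid = row.geometry.interpolate(0.5, normalized=True)
        m = (mid.x, mid.y)
        G.add_edge(u, m, length=row["length"] / 2.0)
        G.add_edge(m, v, length=row["length"] / 2.0)
    else:
        G.add_edge(u, v, length=row["length"])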
I am trying to generate a random graph that has small-world properties (its degree distribution should follow a power law). I just started using the networkx package and discovered that it offers a variety of random graph generators. Can someone tell me if it is possible to generate a graph where a given node's degree follows a gamma distribution (either in R or using Python's networkx package)?
If you want to use the configuration model something like this should work in NetworkX:
import random
import networkx as nx
z=[int(random.gammavariate(alpha=9.0,beta=2.0)) for i in range(100)]
G=nx.configuration_model(z)
You might need to adjust the mean of the sequence z depending on parameters in the gamma distribution. Also z doesn't need to be graphical (you'll get a multigraph), but it does need an even sum so you might have to try a few random sequences (or add 1)...
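A small sketch of the even-sum fix hinted at above (the "add 1" option), just one way to do it:
import random
import networkx as nx

z = [int(random.gammavariate(alpha=9.0, beta=2.0)) for i in range(100)]
if sum(z) % 2 == 1:
    z[0] += 1  # bump one degree so the sum becomes even
G = nx.configuration_model(z)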
The NetworkX documentation notes for configuration_model give another example, a reference, and instructions for removing parallel edges and self loops:
Notes
-----
As described by Newman [1]_.
A non-graphical degree sequence (not realizable by some simple
graph) is allowed since this function returns graphs with self
loops and parallel edges. An exception is raised if the degree
sequence does not have an even sum.
This configuration model construction process can lead to
duplicate edges and loops. You can remove the self-loops and
parallel edges (see below) which will likely result in a graph
that doesn't have the exact degree sequence specified. This
"finite-size effect" decreases as the size of the graph increases.
References
----------
.. [1] M.E.J. Newman, "The structure and function
of complex networks", SIAM REVIEW 45-2, pp 167-256, 2003.
Examples
--------
>>> from networkx.utils import powerlaw_sequence
>>> z=nx.create_degree_sequence(100,powerlaw_sequence)
>>> G=nx.configuration_model(z)
To remove parallel edges:
>>> G=nx.Graph(G)
To remove self loops:
>>> G.remove_edges_from(G.selfloop_edges())
Here is an example similar to the one at http://networkx.lanl.gov/examples/drawing/degree_histogram.html that makes a drawing including a graph layout of the largest connected component:
#!/usr/bin/env python
import random
import matplotlib.pyplot as plt
import networkx as nx
def seq(n):
    return [random.gammavariate(alpha=2.0, beta=1.0) for i in range(100)]
z=nx.create_degree_sequence(100,seq)
nx.is_valid_degree_sequence(z)
G=nx.configuration_model(z) # configuration model
degree_sequence=sorted(nx.degree(G).values(),reverse=True) # degree sequence
print "Degree sequence", degree_sequence
dmax=max(degree_sequence)
plt.hist(degree_sequence,bins=dmax)
plt.title("Degree histogram")
plt.ylabel("count")
plt.xlabel("degree")
# draw graph in inset
plt.axes([0.45,0.45,0.45,0.45])
Gcc=nx.connected_component_subgraphs(G)[0]
pos=nx.spring_layout(Gcc)
plt.axis('off')
nx.draw_networkx_nodes(Gcc,pos,node_size=20)
nx.draw_networkx_edges(Gcc,pos,alpha=0.4)
plt.savefig("degree_histogram.png")
plt.show()
I did this a while ago in base Python... IIRC, I used the following method. This is from memory, so it may not be entirely accurate, but hopefully it's worth something:
Choose the number of nodes, N, in your graph, and the density (existing edges over possible edges), D. This implies the number of edges, E.
For each node, assign its degree by first choosing a random positive number x and finding P(x), where P is your pdf. The node's degree is (P(x)*E/2) - 1.
Choose a node at random, and connect it to another random node. If either node has realized its assigned degree, eliminate it from further selection. Repeat E times.
N.B. that this doesn't create a connected graph in general.
I know this is very late, but you can do the same thing, albeit a little more straightforwardly, with Mathematica:
RandomGraph[DegreeGraphDistribution[{3, 3, 3, 3, 3, 3, 3, 3}], 4]
This will generate 4 random graphs, with each node having a prescribed degree.
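For comparison, roughly the same thing with networkx (a sketch; random_degree_sequence_graph retries internally until it finds a simple graph with the requested degrees):
import networkx as nx

# Four random simple graphs, each with all 8 nodes having degree 3.
graphs = [nx.random_degree_sequence_graph([3] * 8) for _ in range(4)]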
Counting the one mentioned above, networkx provides 4 generators that take a degree sequence as input (a short usage sketch follows below):
configuration_model: explained by @eric
expected_degree_graph: uses a probabilistic approach based on the expected degree of each node. It won't give you the exact degrees, but an approximation.
havel_hakimi_graph: this one tries to connect the nodes with the highest degrees first
random_degree_sequence_graph: as far as I can see, this is similar to what @JonC suggested; it has a trials parameter since there is no guarantee of finding a suitable configuration.
The full list (including some versions of the algorithms for directed graphs) is here.
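A quick usage sketch of the four generators, fed with the same gamma-based degrees (a sketch only: the sequence must have an even sum, and the last two generators additionally require it to be graphical):
import random
import networkx as nx

z = [int(random.gammavariate(alpha=9.0, beta=2.0)) for i in range(100)]
z[0] += sum(z) % 2  # even-sum fix, as in the first answer

G1 = nx.configuration_model(z)    # multigraph with exactly these degrees
G2 = nx.expected_degree_graph(z)  # degrees only match in expectation
if nx.is_graphical(z):            # older releases call this is_valid_degree_sequence
    G3 = nx.havel_hakimi_graph(z)                      # deterministic, highest degrees first
    G4 = nx.random_degree_sequence_graph(z, tries=20)  # may raise if no simple graph is found in 'tries'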
I also found a couple of papers:
Efficient and exact sampling of simple graphs with given arbitrary degree sequence
A Sequential Importance Sampling Algorithm for Generating Random Graphs with Prescribed Degrees