how to compute 'nearby' nodes with networkx - python

What I'm looking for here may well be a built-in function in networkx, and have a mathematical name - if so, I'd like to know what it is! It seems to be very difficult to Google for.
Given a graph G and a starting node i, I'd like to find the subgraph of all the nodes "within P edges" from i - that is, those that are connected to i by a path of at most P edges.
My draft implementation for this is:
import networkx as nx
N = 30
G = nx.Graph()
# populate the graph...
nx.add_cycle(G, range(N))  # older NetworkX releases spelled this G.add_cycle(range(N))
# the starting node:
i = 15
# the 'distance' limit:
P = 4
neighborhood = [i]
new_neighbors = [i]
depth = 0
while depth < P:
    new_neighbors = list(set(sum([
        [k for k in G[j].keys() if k not in neighborhood]
        for j in new_neighbors], [])))
    neighborhood.extend(new_neighbors)
    depth += 1
Gneighbors = G.subgraph(neighborhood)
This code works, by the way, so I don't need help with the implementation. I would simply like to know if this has a name, and whether it is provided by the networkx library.
It is very useful when your code is crashing and you want to see why - you can render just the "locality/region" of the graph near the problem node.

Two years late, but I was looking for this same thing and found a built-in that I think will get the subgraph you want: ego_graph. The function signature and documentation:
ego_graph(G, n, radius=1, center=True, undirected=False, distance=None)
Returns induced subgraph of neighbors centered at node n within a given radius.
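As a minimal sketch of how ego_graph maps onto the question's setup (the 30-node cycle, i = 15 and P = 4 come from the question; nx.cycle_graph is used here only as a convenient way to build that cycle):

```python
import networkx as nx

# Same toy graph as in the question: a cycle on 30 nodes.
G = nx.cycle_graph(30)
i = 15  # starting node
P = 4   # radius, counted in edges

# Induced subgraph on all nodes at distance <= P from i (i itself included).
Gneighbors = nx.ego_graph(G, i, radius=P)
print(sorted(Gneighbors.nodes()))
```

On the cycle this should keep the four nodes on either side of node 15 plus node 15 itself.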

Use single_source_shortest_path or single_source_shortest_path_length with a cutoff of P.
Something like:
nx.single_source_shortest_path_length(G, source=i, cutoff=P)
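A short sketch of turning that call into the subgraph the question asks for, assuming the question's 30-node cycle with i = 15 and P = 4. The result is wrapped in dict() because the return type has differed between NetworkX releases:

```python
import networkx as nx

G = nx.cycle_graph(30)
i, P = 15, 4

# Maps every node reachable within P edges to its hop distance from i.
lengths = dict(nx.single_source_shortest_path_length(G, source=i, cutoff=P))

# The keys of `lengths` are exactly the nearby nodes, so they define the subgraph.
Gneighbors = G.subgraph(lengths)
farthest = max(lengths.values())
```

Compared with ego_graph, this variant also gives you the distance of each nearby node from i.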

Related

extract degree, average degree from Graph class

Has anybody tried to implement software to extract degrees and average degrees from the Graph class of NetworkX? I am not asking about the methods already implemented in NetworkX, which are stable. I am asking for a from-scratch implementation.
Here is what I have tried so far (not sure if that is correct):
for i in range(3, 9):
    G = nx.gnp_random_graph(i, 0.2) #Returns a G_{n,p} random graph, also known as an Erdős-Rényi graph or a binomial graph.
    #print(len(G))
    #print(len(G.nodes()))
    from collections import *
    import collections
    class OrderedCounter(Counter, OrderedDict):
        pass
    m=[list (i) for i in G.edges()]
    flat_list = [item for sublist in m for item in sublist]
    counterlist = OrderedCounter(flat_list)
    degree_sequence=sorted(sorted(counterlist.values(), reverse=True))
    degreeCount=collections.Counter(degree_sequence)
    print("degreeCount:", degreeCount)
    #deg, cnt = zip(*degreeCount.items()) #Returns the average degree of the neighborhood of each node.
    #print(deg, cnt)
    nodes = len(G)
    count_Triangle = 0 #Initialize result
    # Consider every possible triplet of edges in graph
    for i in range(nodes):
        for j in range(nodes):
            for k in range(nodes):
                # check the triplet if it satisfies the condition
                if( i!=j and i !=k and j !=k and
                    G[i][j] and G[j][k] and G[k][i]):
                    count_Triangle += 1
    print(count_Triangle)
When I count triangles this way I keep getting a KeyError, which I assume is because the index I am passing is not valid. I thought G was a dict-like object, so I can't figure out what is wrong.
Also, if I try to extract deg, cnt above (which I thought was the way to get the average degree), I keep getting an error when the dictionary is empty.
For triangle counting
The dict-like access G[u][v] operates on the edge data in the graph G, so the keys in the dict G[u] are not (in general) all other nodes in the graph, though the keys in the dict G do include all nodes in the graph.
If you want to pursue this form of indexing, you would probably be better off generating an adjacency matrix, which has n x n elements for an n-node graph. Then all queries A[i][j] for i and j in the range [0, n-1] will be valid, and the return value will be 0 if there is no edge.
Also look at itertools, which will make your code cleaner:
for i,j,k in itertools.combinations(range(n), 3):
    # a generator of all unique 3-combinations of [0,1,...,n-1]
    # this already excludes the cases where i==j, i==k, j==k
    print(i,j,k)
though be careful because there are various functions in this package that are quite similar.
Here is some code that gets you the triangle count:
import networkx as nx
import matplotlib.pyplot as plt
import itertools
T1 = []
T2 = []
n = 7
p = 0.2
reps = 1000
for r in range(reps):
    G = nx.gnp_random_graph(n, p)
    A = nx.adjacency_matrix(G)  # spelled nx.adj_matrix in older NetworkX releases
    t = 0
    for (i,j,k) in itertools.combinations(range(n), 3):
        # a generator of all unique 3-combinations of [0,1,2,...,n-1]
        if i==k or i==j or j==k:
            print("Found a duplicate node!", i,j,k)
            continue # just skip it -- shouldn't happen
        if A[i,j] and A[j,k] and A[i,k]:
            t += 1
    T1.append(t)
    # let's check we agree with networkx built-in
    tr = nx.triangles(G)
    T2.append(sum(tr.values()))
T2 = [t /3.0 for t in T2] # divide all through by 3, since this is a count of the nodes of each triangle and not the number of triangles.
plt.figure(1); plt.clf()
plt.hist([T1, T2], 20)
Here you see that the triangle counts are the same (I put a log scale on the y axis since the frequencies of the higher triangle counts are rather low).
For degree-counting
It seems that you need a clearer picture of what degree you want to compute:
First, this is an undirected graph, which means that if there is an edge between u and v, then both of these nodes should be at least degree-1. Your calculation counts edges only once.
Secondly, the graphs you are producing do not have many edges, especially for the smaller ones. With p=0.2, the fraction of 3-node graphs without any edges at all is 51%, and even 5-node graphs will have no edges 11% of the time. So an empty list is not indicative of a failure.
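The 51% and 11% figures can be reproduced with a couple of lines. This is only a sketch of the arithmetic, and prob_no_edges is a helper name made up for this example:

```python
from math import comb

def prob_no_edges(n, p):
    """Probability that a G(n, p) random graph has no edges at all."""
    # Each of the C(n, 2) possible edges is absent independently with probability 1 - p.
    return (1 - p) ** comb(n, 2)

for n in (3, 5):
    print(n, round(prob_no_edges(n, 0.2), 3))
```

For n = 3 there are 3 possible edges (0.8^3), and for n = 5 there are 10 (0.8^10).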
The average degree is very easy to check, either using the graph attributes:
2*G.number_of_edges() / float(G.number_of_nodes())
or the built-in per-node degree-calculator.
sum([d for (n, d) in nx.degree(G)]) / float(G.number_of_nodes())
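A small check that the two expressions agree, on a toy graph chosen for this example (a 4-node path, which is not from the question):

```python
import networkx as nx

G = nx.path_graph(4)  # 0-1-2-3: 4 nodes, 3 edges

# Every edge contributes to the degree of two nodes, hence the factor 2.
avg_from_edges = 2 * G.number_of_edges() / float(G.number_of_nodes())

# Per-node degrees; dict() accepts the (node, degree) pairs that G.degree() yields.
degrees = dict(G.degree())
avg_from_degrees = sum(degrees.values()) / float(len(degrees))

print(avg_from_edges, avg_from_degrees)
```

Both values come out as 1.5 here, since the degrees are 1, 2, 2, 1.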
There are two mistakes in your code. First, nodes should be the list of nodes in the graph G, not the number of nodes. This makes sure that your logic works for all graphs (even graphs whose node labels do not start at 0). Your for loops should change accordingly, like this:
nodes = G.nodes() #<--- Store the list of nodes
count_Triangle = 0 #Initialize result
# Consider every possible triplet of edges in graph
for i in nodes: #<---------Iterate over the lists of nodes
    for j in nodes:
        for k in nodes:
Next, you should not test for edges by indexing into the graph. Use the has_edge() method instead, because it returns False rather than raising a KeyError when the edge is not present.
So your if statement becomes:
if( i!=j and i !=k and j !=k and
    G.has_edge(i,j) and G.has_edge(j, k) and G.has_edge(k, i)):
    count_Triangle += 1
print(count_Triangle)
Putting all this together, your program becomes:
import networkx as nx
from collections import *
import collections
for i in range(3, 9):
    G = nx.gnp_random_graph(i, 0.2)
    class OrderedCounter(Counter, OrderedDict):
        pass
    m=[list (i) for i in G.edges()]
    flat_list = [item for sublist in m for item in sublist]
    counterlist = OrderedCounter(flat_list)
    degree_sequence=sorted(sorted(counterlist.values(), reverse=True))
    degreeCount=collections.Counter(degree_sequence)
    print("degreeCount:", degreeCount)
    #Store the list of nodes
    nodes = G.nodes()
    count_Triangle = 0 #Initialize result
    # Consider every possible triplet of edges in graph
    for i in nodes: #<---------Iterate over the lists of nodes
        for j in nodes:
            for k in nodes:
                # Use has_edge method
                if( i!=j and i !=k and j !=k and
                    G.has_edge(i,j) and G.has_edge(j, k) and G.has_edge(k, i)):
                    count_Triangle += 1
    print(count_Triangle)
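One caveat worth checking: the nested loops above visit ordered triples (i, j, k), so each triangle is counted 3! = 6 times. A sketch of a variant that counts each triangle once, using a complete graph on 4 nodes as a test case picked for this example, and comparing against the built-in nx.triangles:

```python
import itertools
import networkx as nx

G = nx.complete_graph(4)  # K4 contains exactly 4 triangles

# Each unordered triple {i, j, k} is visited once, so each triangle is counted once.
count = sum(
    1
    for i, j, k in itertools.combinations(G.nodes(), 3)
    if G.has_edge(i, j) and G.has_edge(j, k) and G.has_edge(k, i)
)

# nx.triangles reports, per node, the triangles through it, so the sum is 3x the total.
builtin = sum(nx.triangles(G).values()) // 3

print(count, builtin)
```

If you prefer to keep the triple loop, dividing its result by 6 should give the same number.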

Cycle detection in a 2-tuple python list

Given a list of edges as 2-tuples (source, destination), is there an efficient way to determine whether a cycle exists? E.g., in the example below, a cycle exists because 1 -> 3 -> 6 -> 4 -> 1. One idea is to count the number of occurrences of each integer in the list (again, is there an efficient way to do this?). Is there a better way? I am looking at a problem with 10,000 2-tuple edges.
a = [(1,3), (4,6), (3,6), (1,4)]
I'm assuming you want to find a cycle in the undirected graph represented by your edge list and you don't want to count "trivial" cycles of size 1 or 2.
You can still use a standard depth-first search, but you need to be a bit careful about the node coloring (a simple flag to signal which nodes you have already visited is not sufficient):
from collections import defaultdict
edges = [(1,3), (4,6), (3,6), (1,4)]
# undirected adjacency sets
adj = defaultdict(set)
for x, y in edges:
    adj[x].add(y)
    adj[y].add(x)
# node colors: 0 = unvisited, 1 = on the current DFS path, 2 = finished
col = defaultdict(int)
def dfs(x, parent=None):
    if col[x] == 1: return True
    if col[x] == 2: return False
    col[x] = 1
    res = False
    for y in adj[x]:
        if y == parent: continue
        if dfs(y, x): res = True
    col[x] = 2
    return res
for x in adj:
    if dfs(x):
        print("There's a cycle reachable from %d!" % x)
This will detect if there is a back edge in the depth-first forest that spans at least 2 levels. This is exactly the case if there is a simple cycle of size >= 3. By storing parent pointers you can actually print the cycle as well if you found it.
For large graphs you might want to use an explicit stack instead of recursion, as illustrated on Wikipedia.
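A minimal sketch of the explicit-stack variant, assuming the same undirected interpretation and that self-loops and repeated edges should be ignored. has_cycle is a name made up for this example, and it returns a yes/no answer rather than the cycle itself:

```python
from collections import defaultdict

def has_cycle(edges):
    """Return True if the undirected graph given by `edges` has a cycle of length >= 3."""
    adj = defaultdict(set)
    for x, y in edges:
        if x == y:
            continue  # ignore self-loops ("trivial" cycles of size 1)
        adj[x].add(y)
        adj[y].add(x)  # sets also collapse repeated edges such as (1, 2) and (2, 1)

    visited = set()
    for root in adj:
        if root in visited:
            continue
        visited.add(root)
        stack = [(root, None)]  # (node, node it was discovered from)
        while stack:
            node, parent = stack.pop()
            for nxt in adj[node]:
                if nxt == parent:
                    continue  # the edge we arrived by
                if nxt in visited:
                    return True  # a second route to an already discovered node
                visited.add(nxt)
                stack.append((nxt, node))
    return False

print(has_cycle([(1, 3), (4, 6), (3, 6), (1, 4)]))
```

Each node is pushed at most once, so this stays linear in the number of edges and avoids Python's recursion limit on inputs with tens of thousands of edges.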

How to traverse tree with specific properties

I have a tree as shown below.
Red means it has a certain property, unfilled means it doesn't have it. I want to minimise the Red checks.
If Red then all Ancestors are also Red (and should not be checked again).
If Not Red then all Descendants are Not Red.
The depth of the tree is d.
The width of the tree is n.
Note that children nodes have value larger than the parent.
Example: In the tree below,
Node '0' has children [1, 2, 3],
Node '1' has children [2, 3],
Node '2' has children [3] and
Node '3' has children [] (No children).
Thus children can be constructed as:
if vertex.depth > 0:
    vertex.children = [Vertex(parent=vertex, val=child_val, depth=vertex.depth-1, n=n) for child_val in range(vertex.val+1, n)]
else:
    vertex.children = []
Here is an example tree:
I am trying to count the number of Red nodes. Both the depth and the width of the tree will be large. So I want to do a sort of Depth-First-Search and additionally use the properties 1 and 2 from above.
How can I design an algorithm to traverse that tree?
PS: I tagged this [python] but any outline of an algorithm would do.
Update & Background
I want to minimise the property checks.
The property check is checking the connectedness of a bipartite graph constructed from my tree's path.
Example:
The bottom-left node in the example tree has path = [0, 1].
Let the bipartite graph have sets R and C with size r and c. (Note, that the width of the tree is n=r*c).
From the path I get to the edges of the graph by starting with a full graph and removing edges (x, y) for all values in the path as such: x, y = divmod(value, c).
The two rules for the property check come from the connectedness of the graph:
- If the graph is connected with edges [a, b, c] removed, then it must also be connected with [a, b] removed (rule 1).
- If the graph is disconnected with edges [a, b, c] removed, then it must also be disconnected with additional edge d removed [a, b, c, d] (rule 2).
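To make the property check concrete, here is a rough sketch under the assumptions stated above: the complete bipartite graph on R and C, each path value mapped to an edge with divmod(value, c), and a plain breadth-first search as the connectedness test. The function name and the vertex numbering (rows 0..r-1, columns r..r+c-1) are choices made for this example only:

```python
from collections import deque

def is_connected_after_removal(r, c, path):
    """Check whether K_{r,c} stays connected once the edges encoded by `path` are removed."""
    removed = {divmod(value, c) for value in path}  # value -> (row x, column y)

    # Row vertices are 0..r-1; column vertices are stored as r..r+c-1.
    adj = {v: set() for v in range(r + c)}
    for x in range(r):
        for y in range(c):
            if (x, y) not in removed:
                adj[x].add(r + y)
                adj[r + y].add(x)

    # Breadth-first search from vertex 0; connected iff every vertex is reached.
    seen = {0}
    queue = deque([0])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == r + c

print(is_connected_after_removal(2, 2, [0]))     # one edge removed
print(is_connected_after_removal(2, 2, [0, 1]))  # both edges of row 0 removed
```

With r = c = 2, removing [0] leaves a connected graph, while removing [0, 1] isolates row vertex 0, which illustrates rule 2: any longer path containing [0, 1] stays disconnected.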
Update 2
So what I really want to do is check all combinations of picking d elements out of [0..n]. The tree structure somewhat helps but even if I got an optimal tree traversal algorithm, I still would be checking too many combinations. (I noticed that just now.)
Let me explain. Suppose I have checked [4, 5] (so 4 and 5 are removed from the bipartite graph as explained above, but that is irrelevant here). If this comes out as "Red", my tree will prevent me from checking [4] on its own. That is good. However, I should also mark off [5] from checking.
How can I change the structure of my tree (to a graph, maybe?) to further minimise my number of checks?
Use a variant of the deletion–contraction algorithm for evaluating the Tutte polynomial (evaluated at (1,2), it gives the total number of connected spanning subgraphs) on the complete bipartite graph K_{r,c}.
In a sentence, the idea is to order the edges arbitrarily, enumerate spanning trees, and count, for each spanning tree, how many spanning subgraphs of size r + c + k have that minimum spanning tree. The enumeration of spanning trees is performed recursively. If the graph G has exactly one vertex, the number of associated spanning subgraphs is the number of self-loops on that vertex choose k. Otherwise, find the minimum edge that isn't a self-loop in G and make two recursive calls. The first is on the graph G/e where e is contracted. The second is on the graph G-e where e is deleted, but only if G-e is connected.
Python is close enough to pseudocode.
class counter(object):
    def __init__(self, ival = 0):
        self.count = ival
    def count_up(self):
        self.count += 1
        return self.count

def old_walk_fun(ilist, func=None):
    def old_walk_fun_helper(ilist, func=None, count=0):
        tlist = []
        if(isinstance(ilist, list) and ilist):
            for q in ilist:
                tlist += old_walk_fun_helper(q, func, count+1)
        else:
            tlist = func(ilist)
        return [tlist] if(count != 0) else tlist
    if(func != None and hasattr(func, '__call__')):
        return old_walk_fun_helper(ilist, func)
    else:
        return []

def walk_fun(ilist, func=None):
    def walk_fun_helper(ilist, func=None, count=0):
        tlist = []
        if(isinstance(ilist, list) and ilist):
            if(ilist[0] == "Red"): # Only evaluate sub-branches if current level is Red
                for q in ilist:
                    tlist += walk_fun_helper(q, func, count+1)
        else:
            tlist = func(ilist)
        return [tlist] if(count != 0) else tlist
    if(func != None and hasattr(func, '__call__')):
        return walk_fun_helper(ilist, func)
    else:
        return []
# Crude tree structure, first element is always its colour; following elements are its children
tree_list = \
    ["Red",
        ["Red",
            ["Red",
                []
            ],
            ["White",
                []
            ],
            ["White",
                []
            ]
        ],
        ["White",
            ["White",
                []
            ],
            ["White",
                []
            ]
        ],
        ["Red",
            []
        ]
    ]
red_counter = counter()
eval_counter = counter()
old_walk_fun(tree_list, lambda x: (red_counter.count_up(), eval_counter.count_up()) if(x == "Red") else eval_counter.count_up())
print("Unconditionally walking")
print("Reds found: %d" % red_counter.count)
print("Evaluations made: %d" % eval_counter.count)
print("")
red_counter = counter()
eval_counter = counter()
walk_fun(tree_list, lambda x: (red_counter.count_up(), eval_counter.count_up()) if(x == "Red") else eval_counter.count_up())
print("Selectively walking")
print("Reds found: %d" % red_counter.count)
print("Evaluations made: %d" % eval_counter.count)
print("")
How hard are you working on making the test for connectedness fast?
To test a graph for connectedness I would pick edges in a random order and use union-find to merge vertices when I see an edge that connects them. I could terminate early if the graph was connected, and I have a sort of certificate of connectedness - the edges which connected two previously unconnected sets of vertices.
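A rough sketch of that union-find test, assuming vertices are numbered 0..num_vertices-1 and edges are given as pairs. The function name and the seed parameter are made up for this example:

```python
import random

def connected_with_certificate(num_vertices, edges, seed=None):
    """Return (is_connected, certificate) for an undirected graph.

    The certificate is the list of edges that merged two previously separate
    components; for a connected graph it forms a spanning tree.
    """
    parent = list(range(num_vertices))

    def find(v):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    order = list(edges)
    random.Random(seed).shuffle(order)  # examine edges in random order

    certificate = []
    components = num_vertices
    for u, v in order:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv              # merge the two components
            certificate.append((u, v))
            components -= 1
            if components == 1:
                break                    # connected: stop early
    return components == 1, certificate

# K_{2,2} with rows {0, 1} and columns {2, 3}
ok, cert = connected_with_certificate(4, [(0, 2), (0, 3), (1, 2), (1, 3)], seed=0)
print(ok, len(cert))
```

Removing an edge that is not in the returned certificate cannot disconnect the graph, because the certificate edges alone still connect every vertex.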
As you work down the tree/follow a path on the bipartite graph, you are removing edges from the graph. If the edge you remove is not in the certificate of connectedness, then the graph must still be connected - this looks like a quick check to me. If it is in the certificate of connectedness you could back up to the state of union/find as of just before that edge was added and then try adding new edges, rather than repeating the complete connectedness test.
Depending on exactly how you define a path, you may be able to say that extensions of that path will never include edges using a subset of vertices - such as vertices which are in the interior of the path so far. If edges originating from those untouchable vertices are sufficient to make the graph connected, then no extension of the path can ever make it unconnected. Then at the very least you just have to count the number of distinct paths. If the original graph is regular I would hope to find some dynamic programming recursion that lets you count them without explicitly enumerating them.

Faster way to calculate the number of shortest paths a vertex belongs to using Networkx

I am defining the stress of a vertex i as the number of shortest paths, taken over all pairs of vertices, that i belongs to.
I am trying to calculate it using Networkx, and I have done it in three ways so far: the readable, the dirty, and the dirtiest, but none of them is fast. Ideally I would like it to be faster than the betweenness (source) implementation in Networkx. Is there a better way to calculate it? Thanks in advance for any suggestion, answer or comment. Here is what I have so far:
Ps.: Here is a pastie with the code ready to go if you want give it a try, thanks again.
Here is the common part on all versions:
import networkx as nx
from collections import defaultdict
Dirtiest, brace yourselves:
def stress_centrality_dirtiest(g):
    stress = defaultdict(int)
    for a in g.nodes():
        for b in g.nodes():
            if a==b:
                continue
            # pred = nx.predecessor(G,b) # for unweighted graphs
            pred, distance = nx.dijkstra_predecessor_and_distance(g,b) # for weighted graphs
            if a not in pred:
                return []
            # walk every shortest path from a back to b with an explicit stack
            path = [[a,0]]
            path_length = 1
            index = 0
            while index >= 0:
                n,i = path[index]
                if n == b:
                    for vertex in [x[0] for x in path[:index+1]][1:-1]:
                        stress[vertex] += 1
                if len(pred[n]) > i:
                    index += 1
                    if index == path_length:
                        path.append([pred[n][i],0])
                        path_length += 1
                    else:
                        path[index] = [pred[n][i],0]
                else:
                    index -= 1
                    if index >= 0:
                        path[index][1] += 1
    return stress
Dirty
def stress_centrality_dirty(g):
    stress = defaultdict(int)
    paths = dict(nx.all_pairs_dijkstra_path(g))  # dict() also handles releases that return an iterator
    for item in paths.values():
        for element in item.values():
            if len(element) > 2:
                for vertex in element[1:-1]:
                    stress[vertex] += 1
    return stress
Readable
def stress_centrality_readable(g):
    stress = defaultdict(int)
    paths = dict(nx.all_pairs_dijkstra_path(g))  # dict() also handles releases that return an iterator
    for source in g.nodes():
        for end in g.nodes():
            if source == end:
                continue
            path = paths[source][end]
            if len(path) > 2: # path must contain at least 3 vertices: source - another node - end
                for vertex in path[1:-1]: # when counting the number of occurrences, exclude source and end vertices
                    stress[vertex] += 1
    return stress
The betweenness code you pointed to in NetworkX does almost what you want and can be adjusted easily.
In the betweenness function if you call the following (instead of _accumulate_basic) during the "accumulate" stage it should calculate the stress centrality (untested)
def _accumulate_stress(betweenness,S,P,sigma,s):
    delta = dict.fromkeys(S,0)
    while S:
        w = S.pop()
        for v in P[w]:
            delta[v] += (1.0+delta[w])
        if w != s:
            betweenness[w] += sigma[w]*delta[w]
    return betweenness
See the paper Ulrik Brandes: On Variants of Shortest-Path Betweenness Centrality and their Generic Computation. Social Networks 30(2):136-145, 2008. http://www.inf.uni-konstanz.de/algo/publications/b-vspbc-08.pdf
The stress centrality algorithm is Algorithm 12.
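For readers who want something runnable without patching NetworkX, here is a self-contained sketch of the same idea for unweighted graphs. It is not the library's code: it takes a plain {node: neighbours} mapping, uses breadth-first search for the shortest-path phase, and applies the accumulation recurrence from _accumulate_stress above. All names are chosen for this example:

```python
from collections import deque

def stress_centrality(adj):
    """Stress of every vertex of an unweighted graph given as {node: neighbours}."""
    stress = dict.fromkeys(adj, 0)
    for s in adj:
        # Breadth-first search from s, recording predecessors and path counts.
        order = []                     # vertices in non-decreasing distance from s
        preds = {v: [] for v in adj}   # predecessors on shortest paths from s
        sigma = dict.fromkeys(adj, 0)  # number of shortest s-v paths
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulation: process vertices from farthest to nearest.
        delta = dict.fromkeys(adj, 0)
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += 1 + delta[w]
            if w != s:
                stress[w] += sigma[w] * delta[w]
    return stress

path = {0: [1], 1: [0, 2], 2: [1]}
print(stress_centrality(path))
```

Like the loops in the question, this counts ordered (source, target) pairs, so on the path 0-1-2 the middle vertex gets a stress of 2; halve the values if each unordered pair should count once.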
Based on the answer I have been given here, I tried to do exactly the same thing.
My attempt revolved around the use of the nx.all_shortest_paths(G,source,target) function, which produces a generator:
counts={}
for n in G.nodes(): counts[n]=0
for n in G.nodes():
    for j in G.nodes():
        if (n!=j):
            gener=nx.all_shortest_paths(G,source=n,target=j) #A generator
            print('From node '+str(n)+' to '+str(j))
            for p in gener:
                print(p)
                for v in p: counts[v]+=1
            print('------')
I have tested this code with a NxN grid network of 100 nodes and it took me approximately 168 seconds to get the results. Now I am aware this is not the best answer as this code is not optimized, but I thought you might have wanted to know about it. Hopefully I can get some directions on how to improve my code.

python, igraph coping with vertex renumbering

I am implementing an algorithm for finding a dense subgraph in a directed graph using python+igraph. The main loop maintains two subgraphs S and T which are initially identical and removes nodes (and incident edges) according to a count of the indegree (or outdegree) of those nodes with respect to the other graph. The problem I have is that igraph renumbers the vertices, so when I delete some from T, the remaining nodes no longer correspond to the same ones in S.
Here is the main part of the loop that is key.
def directed(S):
    T = S.copy()
    c = 2
    while(S.vcount() > 0 and T.vcount() > 0):
        if (S.vcount()/T.vcount() > c):
            AS = S.vs.select(lambda vertex: T.outdegree(vertex) < 1.01*E(S,T)/S.vcount())
            S.delete_vertices(AS)
        else:
            BT = T.vs.select(lambda vertex: S.indegree(vertex) < 1.01*E(S,T)/T.vcount())
            T.delete_vertices(BT)
This doesn't work because of the effect of deleting vertices on the vertex ids. Is there a standard workaround for this problem?
One possibility is to assign unique names to the vertices in the name vertex attribute. These are kept intact when vertices are removed (unlike vertex IDs), and you can use them to refer to vertices in functions like indegree or outdegree. E.g.:
>>> g = Graph.Ring(4)
>>> g.vs["name"] = ["A", "B", "C", "D"]
>>> g.degree("C")
2
>>> g.delete_vertices(["B"])
>>> g.degree("C")
1
Note that I have removed vertex B so vertex C also gained a new ID, but the name is still the same.
In your case, the row with the select condition could probably be re-written like this:
AS = S.vs.select(lambda vertex: T.outdegree(vertex["name"]) < 1.01 * E(S,T)/S.vcount())
Of course this assumes that initially the vertex names are the same in S and T.
