extract degree, average degree from Graph class - python

Has anybody tried to implement software to extract degrees and average degrees from the Graph class of NetworkX? I am not asking about the implemented methods in NetworkX, which are stable; I am asking for a from-scratch implementation.
Here is what I have tried so far (not sure if it is correct):
for i in range(3, 9):
    G = nx.gnp_random_graph(i, 0.2)  # Returns a G_{n,p} random graph, also known as an Erdős-Rényi graph or a binomial graph.
    #print(len(G))
    #print(len(G.nodes()))

    from collections import *
    import collections

    class OrderedCounter(Counter, OrderedDict):
        pass

    m = [list(i) for i in G.edges()]
    flat_list = [item for sublist in m for item in sublist]
    counterlist = OrderedCounter(flat_list)
    degree_sequence = sorted(sorted(counterlist.values(), reverse=True))
    degreeCount = collections.Counter(degree_sequence)
    print("degreeCount:", degreeCount)
    #deg, cnt = zip(*degreeCount.items())  # Returns the average degree of the neighborhood of each node.
    #print(deg, cnt)

    nodes = len(G)
    count_Triangle = 0  # Initialize result
    # Consider every possible triplet of edges in graph
    for i in range(nodes):
        for j in range(nodes):
            for k in range(nodes):
                # check the triplet if it satisfies the condition
                if (i != j and i != k and j != k and
                        G[i][j] and G[j][k] and G[k][i]):
                    count_Triangle += 1
    print(count_Triangle)
When I count triangles this way, I keep getting a KeyError, which I know is because the index I am passing is not correct. I thought G was a dict-like object, but I can't figure it out.
Also, when I try to extract deg, cnt above (which I thought was the way to get the average degree), I keep getting an error when the dictionary is empty.

For triangle counting
The dict-like access G[u][v] operates on the edge data in the graph G, so the keys of the dict G[u] are not (in general) all the other nodes in the graph, though the keys of the dict G do include all nodes in the graph.
If you want to pursue this form of indexing, you would probably be better off generating an adjacency matrix, which has n x n elements for an n-node graph. Then all queries A[i][j] for i, j in the range [0, n) will be valid, and the return value will be 0 if there is no edge.
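As a minimal illustration of the difference (a hypothetical 3-node graph with a single edge; nx.to_numpy_array is one way to build the matrix):
import networkx as nx

G = nx.Graph()
G.add_edge(0, 1)  # hypothetical 3-node graph with a single edge
G.add_node(2)
print(list(G[0]))         # [1] -- only the actual neighbors of 0 are keys
A = nx.to_numpy_array(G)  # dense n x n adjacency matrix
print(A[0, 2])            # 0.0 -- any pair can be queried safely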
Also look at itertools, which will make your code cleaner:
for i, j, k in itertools.combinations(range(n), 3):
    # a generator of all unique 3-combinations of [0, 1, ..., n-1]
    # this already excludes the cases where i==j, i==k, or j==k
    print(i, j, k)
though be careful because there are various functions in this package that are quite similar.
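For instance (a quick check), combinations yields each unordered triple once, while permutations would visit every ordering of it:
import itertools

print(len(list(itertools.combinations(range(5), 3))))  # 10 unordered triples
print(len(list(itertools.permutations(range(5), 3))))  # 60 ordered triples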
Here is some code that gets you the triangle count:
import networkx as nx
import matplotlib.pyplot as plt
import itertools

T1 = []
T2 = []
n = 7
p = 0.2
reps = 1000
for r in range(reps):
    G = nx.gnp_random_graph(n, p)
    A = nx.to_numpy_array(G)  # dense n x n adjacency matrix; safe to index as A[i, j]
    t = 0
    for (i, j, k) in itertools.combinations(range(n), 3):
        # a generator of all unique 3-combinations of [0, 1, ..., n-1]
        if i == j or i == k or j == k:
            print("Found a duplicate node!", i, j, k)
            continue  # just skip it -- shouldn't happen
        if A[i, j] and A[j, k] and A[i, k]:
            t += 1
    T1.append(t)
    # let's check we agree with the networkx built-in
    tr = nx.triangles(G)
    T2.append(sum(tr.values()))

# divide through by 3, since nx.triangles counts, for each node, the triangles
# it belongs to, so each triangle is counted three times
T2 = [t / 3.0 for t in T2]

plt.figure(1)
plt.clf()
plt.hist([T1, T2], 20)
Here you see that the triangle counts are the same (I put a log scale on the y axis, since the frequencies of the higher triangle counts are rather low).
For degree-counting
It seems that you need a clearer picture of what degree you want to compute:
- First, this is an undirected graph, which means that if there is an edge between u and v, then both of these nodes have degree at least 1. Your calculation counts each edge only once.
- Secondly, the graphs you are producing do not have many edges, especially the smaller ones. With p = 0.2, 51% of 3-node graphs have no edges at all, and even 5-node graphs will have no edges about 11% of the time. So an empty list is not indicative of a failure.
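As a quick sanity check of those figures (an n-node graph has n*(n-1)/2 potential edges, each absent with probability 1 - p):
p = 0.2
print((1 - p) ** 3)   # n = 3, 3 possible edges: 0.512, i.e. about 51%
print((1 - p) ** 10)  # n = 5, 10 possible edges: about 0.107, i.e. about 11%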
The average degree is very easy to check, either using the graph attributes:
2*G.number_of_edges() / float(G.number_of_nodes())
or the built-in per-node degree calculator:
sum([d for (n, d) in nx.degree(G)]) / float(G.number_of_nodes())
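A quick check that the two agree (the seed here is just an assumption, for reproducibility):
import networkx as nx

G = nx.gnp_random_graph(20, 0.2, seed=1)
avg1 = 2 * G.number_of_edges() / float(G.number_of_nodes())
avg2 = sum(d for (node, d) in nx.degree(G)) / float(G.number_of_nodes())
print(avg1, avg2)  # both print the same average degree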

There are two mistakes in your code. First, nodes should be the list of nodes in the graph G, not the number of nodes. This makes sure your logic works for all graphs, even graphs whose node labels do not start at 0. Your for loops should change accordingly, like this:
nodes = G.nodes()  # <--- Store the list of nodes
count_Triangle = 0  # Initialize result
# Consider every possible triplet of nodes in the graph
for i in nodes:  # <--- Iterate over the list of nodes
    for j in nodes:
        for k in nodes:
Next, you should not access the edges of the graph like indices. Use the has_edge() method instead: in case the edge is not present, the code will not fail.
So your if statement becomes:
if (i != j and i != k and j != k and
        G.has_edge(i, j) and G.has_edge(j, k) and G.has_edge(k, i)):
    count_Triangle += 1
print(count_Triangle)
Putting all this together, your program becomes:
import networkx as nx
import collections
from collections import Counter, OrderedDict

class OrderedCounter(Counter, OrderedDict):
    pass

for n in range(3, 9):
    G = nx.gnp_random_graph(n, 0.2)
    m = [list(e) for e in G.edges()]
    flat_list = [item for sublist in m for item in sublist]
    counterlist = OrderedCounter(flat_list)
    degree_sequence = sorted(sorted(counterlist.values(), reverse=True))
    degreeCount = collections.Counter(degree_sequence)
    print("degreeCount:", degreeCount)

    # Store the list of nodes
    nodes = G.nodes()
    count_Triangle = 0  # Initialize result
    # Consider every possible triplet of nodes in the graph
    for i in nodes:  # <--- Iterate over the list of nodes
        for j in nodes:
            for k in nodes:
                # Use the has_edge method
                if (i != j and i != k and j != k and
                        G.has_edge(i, j) and G.has_edge(j, k) and G.has_edge(k, i)):
                    count_Triangle += 1
    print(count_Triangle)

Related

Fastest way to sample most numbers with minimum difference larger than a value from a Python list

Given a list of 20 floats, I want to find a largest subset in which any two of the candidates differ from each other by more than mindiff = 1.0. Right now I am using a brute-force method, searching from the largest to smaller subsets using itertools.combinations. As shown below, the code finds a subset after 4 s for a list of 20 numbers.
from itertools import combinations
import random
from time import time

mindiff = 1.
length = 20
random.seed(99)
lst = [random.uniform(1., 10.) for _ in range(length)]
t0 = time()
n = len(lst)
sample = []
found = False
while not found:
    # get all subsets with size n
    subsets = list(combinations(lst, n))
    # shuffle to ensure randomness
    random.shuffle(subsets)
    for subset in subsets:
        # sort the subset numbers
        ss = sorted(subset)
        # calculate the differences between every two adjacent numbers
        diffs = [j - i for i, j in zip(ss[:-1], ss[1:])]
        if min(diffs) > mindiff:
            sample = set(subset)
            found = True
            break
    # check subsets with size n - 1
    n -= 1
print(sample)
print(time() - t0)
Output:
{2.3704888087015568, 4.365818049020534, 5.403474619948962, 6.518944556233767, 7.8388969285727015, 9.117993839791751}
4.182451486587524
However, in reality I have a list of 200 numbers, which makes brute-force enumeration infeasible. I want a fast algorithm to sample just one random largest subset with a minimum difference larger than 1. Note that I want each sample to be random and of maximum size. Any suggestions?
My previous answer assumed you simply wanted a single optimal solution, not a uniform random sample of all solutions. This answer samples uniformly from all optimal solutions.
1. Construct a directed acyclic graph G with one node for each point, where nodes a and b are connected when b - a > mindist. Also add two virtual nodes, s and t, with s -> x for all x and x -> t for all x.
2. Calculate for each node in G how many paths of length k exist to t. You can do this efficiently in O(n^2 k) time using dynamic programming with a table P[x][k], filling initially P[x][0] = 0 except P[t][0] = 1, and then P[x][k] = sum(P[y][k-1] for y in neighbors(x)). Keep doing this until you reach the maximum k - you now know the size of the optimal subset.
3. Uniformly sample a path of length k from s to t, using P to weight your choices. Start at s, look at each neighbor y of s, and choose one randomly with weight P[y][k-1]; this gives the first element of the optimal set. Then repeat the step: when at node x after i steps, look at the neighbors of x and pick one randomly, weighting each neighbor y by P[y][k-i].
4. Use the nodes you sampled in step 3 as your random subset.
An implementation of the above in pure Python:
import random

def sample_mindist_subset(xs, mindist):
    # Construct directed graph G.
    n = len(xs)
    s, t = n, n + 1  # Two virtual nodes, source and sink.
    neighbors = {
        i: [t] + [j for j in range(n) if xs[j] - xs[i] > mindist]
        for i in range(n)}
    neighbors[s] = [t] + list(range(n))
    neighbors[t] = []

    # Compute number of paths P[x][k] from x to t of length k.
    P = [[0 for _ in range(n + 2)] for _ in range(n + 2)]
    P[t][0] = 1
    for k in range(1, n + 2):
        for x in range(n + 2):
            P[x][k] = sum(P[y][k-1] for y in neighbors[x])

    # Sample a maximum-length path uniformly at random.
    maxk = max(k for k in range(n + 2) if P[s][k] > 0)
    path = [s]
    while path[-1] != t:
        candidates = neighbors[path[-1]]
        weights = [P[cn][maxk - len(path)] for cn in candidates]
        path.append(random.choices(candidates, weights)[0])

    # Drop the virtual nodes s and t.
    return [xs[i] for i in path[1:-1]]
Note that if you want to sample from the same set of numbers many times, you don't have to recompute P every single time and can re-use it.
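For example, a quick try (the seed and inputs here are just an illustration):
import random

random.seed(99)
xs = [random.uniform(1., 10.) for _ in range(20)]
print(sample_mindist_subset(xs, 1.))  # one random maximum-size subset
print(sample_mindist_subset(xs, 1.))  # very likely a different subset of the same size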
I probably don't fully understand the question, because right now the solution is quite trivial. EDIT: yes, I misunderstood after all; the OP does not just want an optimal solution, but wishes to sample randomly from the set of optimal solutions. This answer is not incorrect, but it answers a different question than the one the OP is interested in.
Simply sort the numbers and greedily construct the subset:
def mindist_subset(xs, mindist):
    result = []
    for x in sorted(xs):
        if not result or x - result[-1] > mindist:
            result.append(x)
    return result
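For instance, with a small made-up input:
print(mindist_subset([3.0, 1.2, 5.5, 2.4], 1.0))  # [1.2, 2.4, 5.5]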
Sketch of proof of correctness.
Suppose we have a solution S, given input array A, that is of optimal size. If it does not contain min(A), note that we could remove min(S) from S and add min(A), since this would only increase the distance between the smallest and second smallest numbers in S. Conclusion: we can assume without loss of generality that min(A) is part of an optimal solution.
Now we can apply this argument recursively. We add min(A) to the solution and remove all elements too close to min(A), giving remaining elements A'. Then we're left with a subproblem where exactly the same argument applies: we can choose min(A') as our next element of the solution, and so on.

how can i optimize this pseudo code

I am a new Python user... I know this question is very primitive, but my project has lots of sets and I need effective, fast code.
I want to generate a matrix with an if condition.
For example:
M = Matrix(m[i,j] if Condition1 and Condition2 and ...)
How can I optimize the following pseudocode?
import networkx as nx
import numpy as np

# G = nx.Graph()
# G.neighbors(node)

def seidel_matrix(G):
    n = nx.number_of_nodes(G)
    x = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i == j:
                x[i][j] = 0
            elif i in G.neighbors(j):
                x[i][j] = -1
            else:
                x[i][j] = 1
    return x
There are probably multiple ways to do this. Right now you're looping over every possible pair of nodes; if there are lots of non-edges, this is a poor choice. It is faster to loop only over the edges that actually exist.
x = np.ones((n, n))  # default entry is 1
for u, v in G.edges():  # get the edges right
    x[u][v] = -1
    x[v][u] = -1  # assuming an undirected network
for u in G.nodes():  # get the diagonal right
    x[u][u] = 0
Note that this assumes that the nodes are labeled 0, 1, ..., n-1
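If the nodes are so labeled, the loops can also be vectorized, since the Seidel matrix of a simple graph with adjacency matrix A is J - I - 2A (J the all-ones matrix). A sketch of my own, not from the answer above:
import numpy as np
import networkx as nx

def seidel_matrix_vectorized(G):
    # S = J - I - 2A, assuming a simple (loop-free, undirected) graph
    n = G.number_of_nodes()
    A = nx.to_numpy_array(G)  # rows/columns follow the order of G.nodes()
    return np.ones((n, n)) - np.eye(n) - 2 * A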

Cycle detection in a 2-tuple python list

Given a list of edges as 2-tuples, (source, destination), is there any efficient way to determine whether a cycle exists? E.g., in the example below, a cycle exists because 1 -> 3 -> 6 -> 4 -> 1. One idea is to count the number of occurrences of each integer in the list (again, is there an efficient way to do this?). Is there any better way? I am dealing with on the order of 10,000 2-tuple edges.
a = [(1,3), (4,6), (3,6), (1,4)]
I'm assuming you want to find a cycle in the undirected graph represented by your edge list and you don't want to count "trivial" cycles of size 1 or 2.
You can still use a standard depth-first search, but you need to be a bit careful about the node coloring (a simple flag to signal which nodes you have already visited is not sufficient):
from collections import defaultdict

edges = [(1, 3), (4, 6), (3, 6), (1, 4)]

adj = defaultdict(set)
for x, y in edges:
    adj[x].add(y)
    adj[y].add(x)

col = defaultdict(int)  # 0 = unvisited, 1 = in progress, 2 = done

def dfs(x, parent=None):
    if col[x] == 1: return True
    if col[x] == 2: return False
    col[x] = 1
    res = False
    for y in adj[x]:
        if y == parent: continue
        if dfs(y, x): res = True
    col[x] = 2
    return res

for x in adj:
    if dfs(x):
        print("There's a cycle reachable from %d!" % x)
This will detect whether there is a back edge in the depth-first forest that spans at least 2 levels. This is exactly the case if there is a simple cycle of size >= 3. By storing parent pointers you can actually print the cycle as well once you have found it.
For large graphs you might want to use an explicit stack instead of recursion, as illustrated on Wikipedia.
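For example, a minimal iterative sketch (my own variant, not the Wikipedia version; it reuses the adj mapping built above and reports only whether some cycle exists):
def has_cycle(adj):
    visited = set()
    for root in adj:
        if root in visited:
            continue
        stack = [(root, None)]  # (node, parent in the DFS tree)
        while stack:
            x, parent = stack.pop()
            if x in visited:
                continue
            visited.add(x)
            for y in adj[x]:
                if y == parent:
                    continue
                if y in visited:
                    return True  # back edge spanning >= 2 levels
                stack.append((y, x))
    return False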

How to traverse tree with specific properties

I have a tree as shown below.
Red means it has a certain property, unfilled means it doesn't have it. I want to minimise the Red checks.
If a node is Red, then all its ancestors are also Red (and should not be checked again).
If a node is not Red, then all its descendants are also not Red.
The depth of the tree is d.
The width of the tree is n.
Note that children nodes have value larger than the parent.
Example: In the tree below,
Node '0' has children [1, 2, 3],
Node '1' has children [2, 3],
Node '2' has children [3] and
Node '4' has children [] (No children).
Thus children can be constructed as:
if vertex.depth > 0:
    vertex.children = [Vertex(parent=vertex, val=child_val, depth=vertex.depth - 1, n=n)
                       for child_val in range(vertex.val + 1, n)]
else:
    vertex.children = []
Here is an example tree:
I am trying to count the number of Red nodes. Both the depth and the width of the tree will be large. So I want to do a sort of Depth-First-Search and additionally use the properties 1 and 2 from above.
How can I design an algorithm to do traverse that tree?
PS: I tagged this [python] but any outline of an algorithm would do.
Update & Background
I want to minimise the property checks.
The property check is checking the connectedness of a bipartite graph constructed from my tree's path.
Example:
The bottom-left node in the example tree has path = [0, 1].
Let the bipartite graph have sets R and C of sizes r and c. (Note that the width of the tree is n = r*c.)
From the path I get to the edges of the graph by starting with a full graph and removing edges (x, y) for all values in the path as such: x, y = divmod(value, c).
The two rules for the property check come from the connectedness of the graph:
- If the graph is connected with edges [a, b, c] removed, then it must also be connected with [a, b] removed (rule 1).
- If the graph is disconnected with edges [a, b, c] removed, then it must also be disconnected with additional edge d removed [a, b, c, d] (rule 2).
Update 2
So what I really want to do is check all combinations of picking d elements out of [0..n]. The tree structure helps somewhat, but even with an optimal tree-traversal algorithm, I would still be checking too many combinations. (I noticed that just now.)
Let me explain. Suppose I need to check [4, 5] (so 4 and 5 are removed from the bipartite graph as explained above, though that is irrelevant here). If this comes out as "Red", my tree will prevent me from checking [4] only. That is good. However, I should also mark off [5] from checking.
How can I change the structure of my tree (to a graph, maybe?) to further minimise my number of checks?
Use a variant of the deletion–contraction algorithm for evaluating the Tutte polynomial (which, evaluated at (1,2), gives the total number of connected spanning subgraphs) on the complete bipartite graph K_{r,c}.
In a sentence, the idea is to order the edges arbitrarily, enumerate spanning trees, and count, for each spanning tree, how many spanning subgraphs of size r + c + k have that minimum spanning tree. The enumeration of spanning trees is performed recursively. If the graph G has exactly one vertex, the number of associated spanning subgraphs is the number of self-loops on that vertex choose k. Otherwise, find the minimum edge that isn't a self-loop in G and make two recursive calls. The first is on the graph G/e where e is contracted. The second is on the graph G-e where e is deleted, but only if G-e is connected.
Python is close enough to pseudocode.
class counter(object):
    def __init__(self, ival=0):
        self.count = ival
    def count_up(self):
        self.count += 1
        return self.count

def old_walk_fun(ilist, func=None):
    def old_walk_fun_helper(ilist, func=None, count=0):
        tlist = []
        if isinstance(ilist, list) and ilist:
            for q in ilist:
                tlist += old_walk_fun_helper(q, func, count + 1)
        else:
            tlist = func(ilist)
        return [tlist] if count != 0 else tlist
    if func is not None and hasattr(func, '__call__'):
        return old_walk_fun_helper(ilist, func)
    else:
        return []

def walk_fun(ilist, func=None):
    def walk_fun_helper(ilist, func=None, count=0):
        tlist = []
        if isinstance(ilist, list) and ilist:
            if ilist[0] == "Red":  # Only evaluate sub-branches if the current level is Red
                for q in ilist:
                    tlist += walk_fun_helper(q, func, count + 1)
        else:
            tlist = func(ilist)
        return [tlist] if count != 0 else tlist
    if func is not None and hasattr(func, '__call__'):
        return walk_fun_helper(ilist, func)
    else:
        return []
# Crude tree structure: the first element is always the node's colour;
# the following elements are its children
tree_list = \
    ["Red",
        ["Red",
            ["Red", []],
            ["White", []],
            ["White", []]
        ],
        ["White",
            ["White", []],
            ["White", []]
        ],
        ["Red", []]
    ]
red_counter = counter()
eval_counter = counter()
old_walk_fun(tree_list, lambda x: (red_counter.count_up(), eval_counter.count_up()) if x == "Red" else eval_counter.count_up())
print("Unconditionally walking")
print("Reds found: %d" % red_counter.count)
print("Evaluations made: %d" % eval_counter.count)
print("")

red_counter = counter()
eval_counter = counter()
walk_fun(tree_list, lambda x: (red_counter.count_up(), eval_counter.count_up()) if x == "Red" else eval_counter.count_up())
print("Selectively walking")
print("Reds found: %d" % red_counter.count)
print("Evaluations made: %d" % eval_counter.count)
print("")
How hard are you working on making the test for connectedness fast?
To test a graph for connectedness I would pick edges in a random order and use union-find to merge vertices when I see an edge that connects them. I could terminate early if the graph was connected, and I have a sort of certificate of connectedness - the edges which connected two previously unconnected sets of vertices.
As you work down the tree/follow a path on the bipartite graph, you are removing edges from the graph. If the edge you remove is not in the certificate of connectedness, then the graph must still be connected - this looks like a quick check to me. If it is in the certificate of connectedness you could back up to the state of union/find as of just before that edge was added and then try adding new edges, rather than repeating the complete connectedness test.
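A minimal sketch of that test in pure Python (the function name and edge-list interface here are my own assumptions, not from the question):
def connected_with_certificate(nodes, edges):
    # Union-find over the vertices; `certificate` collects the edges
    # that merged two previously separate components.
    parent = {v: v for v in nodes}
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v
    certificate = []
    components = len(parent)
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv  # merge the two components
            certificate.append((u, v))
            components -= 1
            if components == 1:  # early exit: already connected
                break
    return components == 1, certificate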
Depending on exactly how you define a path, you may be able to say that extensions of that path will never include edges using a subset of vertices - such as vertices which are in the interior of the path so far. If edges originating from those untouchable vertices are sufficient to make the graph connected, then no extension of the path can ever make it unconnected. Then at the very least you just have to count the number of distinct paths. If the original graph is regular I would hope to find some dynamic programming recursion that lets you count them without explicitly enumerating them.

how to compute 'nearby' nodes with networkx

What I'm looking for here may well be a built-in function in networkx, and it may have a mathematical name - if so, I'd like to know what it is! It is very difficult to Google for, it seems.
Given a graph G and a starting node i, I'd like to find the subgraph of all the nodes "within P edges" from i - that is, those that are connected to i by a path of less than P edges.
My draft implementation for this is:
import networkx as nx

N = 30
G = nx.Graph()
# populate the graph...
nx.add_cycle(G, range(N))  # nx.add_cycle replaces the old G.add_cycle method

# the starting node:
i = 15
# the 'distance' limit:
P = 4

neighborhood = [i]
new_neighbors = [i]
depth = 0
while depth < P:
    new_neighbors = list(set(sum([
        [k for k in G[j].keys() if k not in neighborhood]
        for j in new_neighbors], [])))
    neighborhood.extend(new_neighbors)
    depth += 1

Gneighbors = G.subgraph(neighborhood)
This code works, by the way, so I don't need help with the implementation. I would simply like to know if this has a name, and whether it is provided by the networkx library.
It is very useful when your code is crashing and you want to see why - you can render just the "locality/region" of the graph near the problem node.
Two years late, but I was looking for this same thing and found a built-in that I think will get the subgraph you want: ego_graph. The function signature and documentation:
ego_graph(G, n, radius=1, center=True, undirected=False, distance=None)
Returns induced subgraph of neighbors centered at node n within a given radius.
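Using the names from the question's draft, that call would be something like:
Gneighbors = nx.ego_graph(G, i, radius=P)  # nodes within P hops of i, as a subgraph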
Use single_source_shortest_path or single_source_shortest_path_length with a cutoff of p.
Something like:
nx.single_source_shortest_path_length(G, source=i, cutoff=p)
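The call returns a dict mapping each node within the cutoff of i to its shortest-path distance, so the subgraph from the question's draft could then be built with something like:
lengths = nx.single_source_shortest_path_length(G, source=i, cutoff=P)
Gneighbors = G.subgraph(lengths.keys())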
