Getting nodes without edges when N is larger than 60 - python

First I generated an NxN matrix of zeros and ones using NumPy. Then I made a copy of that matrix and replaced its ones with random edge weights. (Like the original, the weighted matrix is symmetric, undirected, and has a zero diagonal.) I used BFS to check connectivity and found the graph connected every time. Then I used SciPy to find the MST (minimum spanning tree) and illustrated it using NetworkX.
For generating the NxN matrix of zeros and ones:
import numpy as np

shape = 75  # number of nodes, N (75 in the example below)
base = np.zeros((shape, shape))
for _ in range(100):
    a = np.random.randint(shape)
    b = np.random.randint(shape)
    if a != b:
        base[a, b] = 1
        base[b, a] = 1
For generating the NxN matrix with the edge weights:
# Work on a copy so that `base` keeps its 0/1 entries.
Weightofedges = base.copy()
# Fetch the locations of the 1s (upper triangle only, to keep symmetry).
ones = np.argwhere(Weightofedges == 1)
ones = ones[ones[:, 0] < ones[:, 1], :]
# Assign random weights symmetrically. Use randint(1, 100) rather than
# randint(100): a weight of 0 would make the edge vanish in the sparse graph.
for a, b in ones:
    Weightofedges[a, b] = Weightofedges[b, a] = np.random.randint(1, 100)
Find the MST using SciPy:
from scipy.sparse.csgraph import minimum_spanning_tree

X = minimum_spanning_tree(Weightofedges)
print("The Output Of The MST By Kruskal Algorithm:")
print(" Edges:   Weights:")
print(X)
print("-----------------------")
my_matrix3 = X.toarray().astype(int)
The problem: when I input a matrix with a large number of nodes, I get some nodes that are not connected by any edge.
e.g.
Number of nodes equals 75
Number of edges equals 65
In an MST the number of edges must be N-1, where N is the number of nodes.
This is the graph using N = 75 (as shown, there are nodes without edges).

You have created a weighted version of the Erdős–Rényi model, to be exact the ER variant G(n, M) with n nodes and M edges. Currently you have fixed M = 100, and you observe for n > 60 that your graph becomes disconnected. This is typical, and (at least for the other ER variant G(n, p), with n nodes and edge probability p) you can even calculate the threshold, p = ln(n)/n, above which you almost surely get a single connected component. But even without the math, you can see intuitively that it becomes difficult to connect 75 nodes with only 100 random edges.
I recommend that you check out the networkx package if you want to do more with graphs in Python. For example, it implements the G(n, p) variant: erdos_renyi_graph.
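For illustration, a minimal sketch with networkx (gnm_random_graph is the G(n, M) variant, erdos_renyi_graph the G(n, p) variant; the parameter values are the ones from the question):

import math
import networkx as nx

n, M = 75, 100
G = nx.gnm_random_graph(n, M, seed=42)  # G(n, M): 75 nodes, 100 random edges
print(nx.is_connected(G))               # frequently False at these parameters

# For G(n, p), connectivity is almost sure once p > ln(n) / n
p = 1.1 * math.log(n) / n
H = nx.erdos_renyi_graph(n, p, seed=42)
print(nx.is_connected(H))               # typically True above the threshold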

Related

Integer Programming for NNC

I'm trying to implement Integer Programming for a Nearest Neighbor Classifier in Python using cvxpy.
Short intro
Given a dataset of n points with a color (red or blue), we would like to choose the minimal number of candidate points, s.t. for each point that isn't a candidate, its closest candidate has the same color.
My flow
Given a set of n points (with colors), I define an indicator vector I (|I| = n):
I_i = 1 if and only if vertex i is chosen as a candidate
In addition, I defined two more vectors, named A and B (|A| = |B| = n), as follows:
A_i = the distance between v_i and its closest candidate with the **same** color
B_i = the distance between v_i and its closest candidate with a **different** color
Therefore, I have n constraints:
B_i > A_i
for any i.
My target is to minimize the sum of vector I (which represents the number of candidates)
My Issue
It seems that the vectors A and B change because they are affected by I: when a candidate is chosen, its entry in I changes, which affects A and B, and the constraints depend on those vectors.
Any suggestions?
Thanks !
To recap: you want to find the smallest set of examples belonging to a given training set such that the resulting nearest neighbor classifier achieves perfect accuracy on that training set.
I would suggest that you formulate this as follows. Create a 0–1 variable x(e) for each example e indicating whether e is chosen. For each ordered pair of examples e and e′ with different labels, write a constraint
x(e′) ≤ ∑_{e′′ ∈ C(e, e′)} x(e′′)
where C(e, e′) is the set of examples e′′ with the same label as e such that e′′ is closer to e than e′ is (including e′′ = e). This means that, if e′ is chosen, some same-label example closer to e is chosen too, so e′ is not the nearest chosen example to e.
We also need
∑_e x(e) ≥ 1
to disallow the empty set. Finally, the objective is
minimize ∑_e x(e).
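A minimal cvxpy sketch of this formulation (the toy data here is hypothetical, and a MIP-capable solver such as GLPK_MI is needed for boolean variables):

import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
points = rng.random((30, 2))            # hypothetical toy data
labels = rng.integers(0, 2, size=30)    # 0 = red, 1 = blue

n = len(points)
D = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)

x = cp.Variable(n, boolean=True)        # x(e): example e is chosen
constraints = [cp.sum(x) >= 1]          # disallow the empty set
for e in range(n):
    for ep in range(n):
        if labels[e] != labels[ep]:
            # C(e, e'): same-label examples closer to e than e' is
            # (e itself is always in C, since D[e, e] == 0)
            C = [epp for epp in range(n)
                 if labels[epp] == labels[e] and D[e, epp] < D[e, ep]]
            constraints.append(x[ep] <= cp.sum(x[C]))

prob = cp.Problem(cp.Minimize(cp.sum(x)), constraints)
prob.solve()                            # pass e.g. solver=cp.GLPK_MI if needed
print(np.flatnonzero(x.value > 0.5))    # indices of the chosen candidates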

How to construct a sparse recurrent neural network with limited connection in Python?

I’m trying to build a sparse recurrent neural network where there are 100 neurons in total, and each neuron is only randomly connected with 10 other neurons, and the weight is randomly drawn from a gaussian distribution with 0 mean 5e-05 standard deviation.
I know that in Python, to draw weights from a Gaussian distribution, I could use:
np.random.normal(0, 5e-05, (100, 100))
But what would be an efficient way to set up each neuron randomly connected to 10 other neurons in the network? I guess this could probably be achieved with basic Python functions, without going to tensorflow or pytorch, but I welcome all possible solutions.
Thanks,
Lily
Use the random.choice functionality from numpy.
import numpy as np

# Create a random number generator
rng = np.random.default_rng(10)

n = 100  # Number of nodes
k = 10   # Number of edges per node

# Create an empty connectivity matrix
c = np.zeros((n, n), dtype=bool)

for i in range(c.shape[0]):
    # End when all nodes have the right number of edges
    if np.all(c.sum(axis=1) == k):
        break
    # Select more edges from nodes with fewer edges by weighting probability
    p = 1 - c.sum(axis=1) / k
    # Set the probability of self-association to zero
    p[i] = 0
    # Choose as many edges as needed for this node to bring it up to k
    new_edges = rng.choice(np.arange(n),
                           size=k - c[i, :].sum(),
                           p=p / np.sum(p),
                           replace=False)
    # Add the randomly selected edges for this node to a symmetric connectivity matrix
    c[i, new_edges] = True
    c[new_edges, i] = True
This gives a connectivity matrix where all the rows and columns sum to 10 edges, and the diagonal is zero (no nodes self-associate).
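To add the Gaussian weights the question asks for, one option (a sketch; whether the weight matrix should itself be symmetric is a modelling choice left open here) is to mask a full weight matrix with the connectivity matrix:

# Draw a full matrix of Gaussian weights, then zero out absent connections
w = rng.normal(0, 5e-05, size=(n, n))
weights = np.where(c, w, 0.0)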

How to generate a Rank 5 matrix with entries Uniform?

I want to generate a rank 5, 100x600 matrix in numpy with all the entries sampled from np.random.uniform(0, 20), so that all the entries are uniformly distributed on [0, 20). What is the best way to do this in Python?
I see there is an SVD-inspired way to do so here (https://math.stackexchange.com/questions/3567510/how-to-generate-a-rank-r-matrix-with-entries-uniform), but I am not sure how to code it up. I am looking for a working example of this SVD-inspired way to get uniformly distributed entries.
I have actually managed to create a rank 5 100x100 matrix by vertically stacking five 20x100 rank 1 matrices, then shuffling the rows. However, the resulting 100x100 matrix does not have entries uniformly distributed on [0, 20).
Here is my code (my best attempt):
import numpy as np

def randomMatrix(m, n, p, q):
    # Creates an m x n matrix with entries drawn uniformly from [p, q).
    return np.random.uniform(p, q, size=(m, n))

Qs = []
my_rank = 5
for i in range(my_rank):
    L = randomMatrix(20, 1, 0, np.sqrt(20))   # L is tall
    R = randomMatrix(1, 100, 0, np.sqrt(20))  # R is long
    Q = np.outer(L, R)
    Qs.append(Q)

Q = np.vstack(Qs)
# Shuffle the rows (preserves rank 5 [confirmed])
np.random.shuffle(Q)
Not a perfect solution, I must admit, but it's simple and comes pretty close.
I create 5 vectors that span the row space of the matrix and fill the rest of the matrix with random linear combinations of them.
My initial thought was that a trivial solution would be to copy those vectors 20 times.
To improve on that, I created linear combinations with weights drawn from a uniform distribution, but then the distribution of the matrix entries becomes normal, because the weighted averaging causes the central limit theorem to take effect.
A middle ground between the trivial approach and the approach that doesn't work is to use sets of weights that strongly favor one of the vectors over the others. You can generate these weight vectors by passing any vector through the softmax function with an appropriately high temperature parameter.
The resulting distribution is almost uniform, but the generated vectors are still very close to the base vectors. You can play with the temperature parameter to find a sweet spot that suits your purpose.
from scipy.special import softmax
import numpy as np
from matplotlib import pyplot as plt

N = 100
R = 5
low = 0
high = 20
sm_temperature = 100

# R base vectors with uniform entries
p = np.random.uniform(low, high, (1, R, N))
# N - R weight sets, sharpened by softmax so each strongly favors one base vector
weights = np.random.uniform(0, 1, (N - R, R, 1))
weights = softmax(weights * sm_temperature, axis=1)
# Linear combinations of the base vectors
p_lc = (weights * p).sum(1)
rand_mat = np.concatenate([p[0], p_lc])

plt.hist(rand_mat.flatten())
plt.show()
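A quick sanity check, assuming the block above has run:

print(np.linalg.matrix_rank(rand_mat))  # 5
print(rand_mat.shape)                   # (100, 100) here, not the 100 x 600 asked for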
I just couldn't accept the fact that my previous solution (the "selection" method) does not produce strictly uniformly distributed entries, only close enough to sometimes fool a statistical test; the asymptotic case will almost surely not be distributed uniformly. But I did dream up another crazy idea that's just as bad, only in another manner: it's not really random.
In this solution, I do something similar to the OP's method of forming R rank-1 matrices and then concatenating them, but a little differently. I create each block by stacking the base vector multiplied by 0.5 on top of the same scaled vector shifted by half the dynamic range of the uniform distribution. The process continues with multiplication by one third, two thirds, and one, each followed by the corresponding shifts, until I have the required number of vectors in that part of the matrix.
I know it sounds incomprehensible, and unfortunately I couldn't find a way to explain it better; hopefully, reading the code will shed some more light.
I hope this "staircase" method will be more reliable and useful.
import numpy as np
from matplotlib import pyplot as plt

'''
params:
N    - base dimension
M    - matrix length
R    - matrix rank
high - max value of the matrix
low  - min value of the matrix
'''
N = 100
M = 600
R = 5
high = 20
low = 0

# Base vectors of the matrix
base = low + np.random.rand(R - 1, N) * (high - low)

def build_staircase(base, num_stairs, low, high):
    '''
    Create 'num_stairs' different vectors whose elements are all
    uniformly distributed like the values of 'base'; together with
    the ones vector they span a rank-2 space.
    '''
    l = levels(num_stairs)
    vectors = []
    for l_i in l:
        for i in range(l_i):
            vector_dynamic = (base - low) / l_i
            vector_bias = low + np.ones_like(base) * i * ((high - low) / l_i)
            vectors.append(vector_dynamic + vector_bias)
    return np.array(vectors)

def levels(total):
    '''
    Create a sequence of strictly increasing numbers summing up to the total.
    '''
    l = []
    sum_l = 0
    i = 1
    while sum_l < total:
        l.append(i)
        i += 1
        sum_l = sum(l)
    i = 0
    while sum_l > total:
        l[i] -= 1
        if l[i] == 0:
            l.pop(i)
        else:
            i += 1
        if i == len(l):
            i = 0
        sum_l = sum(l)
    return l

n_rm = R - 1  # number of matrix subsections
len_rms = [M // n_rm for _ in range(n_rm)]
len_rms[-1] += M % n_rm  # give the remainder to the last subsection

rm_list = []
for i in range(n_rm):
    # Create a rank-2 matrix with uniform entries out of the
    # vector 'base[i]' and a ones vector.
    rm_list.append(build_staircase(
        base=base[i],
        num_stairs=len_rms[i],
        low=low,
        high=high,
    ))
rm = np.concatenate(rm_list)
plt.hist(rm.flatten(), bins=100)
plt.show()
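A quick sanity check, assuming the block above has run:

print(np.linalg.matrix_rank(rm))  # 5: the R-1 base vectors plus the ones vector
print(rm.shape)                   # (600, 100); transpose to get 100 x 600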
A few example histograms of the entries (plots omitted), including one with N = 1000, M = 6000 to empirically demonstrate the nearly asymptotic behavior.

theano: summation by class label

I have a matrix which represents the distances to the k nearest neighbours of a set of points,
and a matrix of the class labels of those nearest neighbours (both are N-by-k matrices).
What is the best way in theano to build an N-by-#classes matrix whose (i, j) element is the sum of the distances from the i-th point to those of its k-NN points that have class label j?
Example:
# N = 2
# k = 5
# number of classes = 3
K_val = [[1, 2, 3, 4, 6],
         [2, 4, 5, 5, 7]]
l_val = [[0, 1, 2, 0, 1],
         [2, 0, 1, 2, 0]]
result = [[5, 8, 3],
          [11, 5, 7]]
How can I implement this task in theano?
K = theano.tensor.matrix()
l = theano.tensor.matrix()
result = <..some code..>
f = theano.function(inputs=[K,l], outputs=result)
You might be interested in having a look at this repo:
https://github.com/erogol/KLP_KMEANS/blob/master/klp_kmeans.py
It is a K-Means implementation using theano (function klp_kmeans). I believe what you want is the matrix W used in the function find_bmu.
Hope you find it useful.
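For the specific summation in the question, a one-hot comparison trick may also work directly; a minimal sketch (assuming integer labels and 3 classes, as in the example):

import numpy as np
import theano
import theano.tensor as T

n_classes = 3
K = T.matrix()   # distances, N x k
l = T.imatrix()  # neighbour labels, N x k, integer class ids

# one_hot[i, j, c] is 1 exactly where l[i, j] == c
one_hot = T.eq(l[:, :, None], T.arange(n_classes)[None, None, :])
result = (K[:, :, None] * one_hot).sum(axis=1)
f = theano.function(inputs=[K, l], outputs=result)

K_val = np.array([[1, 2, 3, 4, 6], [2, 4, 5, 5, 7]],
                 dtype=theano.config.floatX)
l_val = np.array([[0, 1, 2, 0, 1], [2, 0, 1, 2, 0]], dtype='int32')
print(f(K_val, l_val))  # [[ 5.  8.  3.] [11.  5.  7.]]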

Calculate Hitting Time between 2 nodes using NetworkX

I would like to know if I can use NetworkX to implement hitting time. Basically, I want to calculate the hitting time between any 2 nodes in a graph. My graph is unweighted and undirected. If I understand hitting time correctly, it is very similar to the idea of PageRank.
Any idea how I can implement hitting time using the PageRank method provided by NetworkX?
May I know if there's any good starting point to work with?
I've checked: MapReduce, Python and NetworkX
but not quite sure how it works.
You don't need networkX to solve the problem; numpy can do it if you understand the math behind it. An undirected, unweighted graph can always be represented by a 0/1 adjacency matrix, and the (i, j) entry of its nth power counts the walks of length n from i to j. Instead, work with the Markov matrix, i.e. the row-normalized adjacency matrix: its powers describe a random walk over the graph. Make the final state absorbing, so that once the walk hits the target it can't escape. Then, at each power n, the (start, end) entry is the probability that the walk has hit the target within n steps. If the graph is small, you can simply take successive powers of the matrix and read off the index (start, end) you are interested in. The hitting time can be computed from this function (as you know the exact hit time for discrete steps).
Below is an example with a simple graph defined by an edge list (a path 0-1-2, with nodes 3 and 4 attached to node 2). At the end, I plot this hitting-time function.
import numpy as np

hit_idx = (0, 4)

# Define a graph by its edge list
edges = [[0, 1], [1, 2], [2, 3], [2, 4]]

# Create the adjacency matrix
A = np.zeros((5, 5))
rows, cols = zip(*edges)
A[rows, cols] = 1

# Undirected condition
A += A.T

# Make the final state an absorbing condition
A[hit_idx[1], :] = 0
A[hit_idx[1], hit_idx[1]] = 1

# Make a proper Markov matrix by row normalizing
A = (A.T / A.sum(axis=1)).T

B = A.copy()
Z = []
for n in range(100):
    Z.append(B[hit_idx])
    B = np.dot(B, A)

import matplotlib.pyplot as plt

plt.plot(Z)
plt.xlabel("steps")
plt.ylabel("hit probability")
plt.show()
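As a possible extension (my reading of "the hitting time can be computed from this function"): Z[m] is the probability of having hit the target within m+1 steps, so the expected hitting time is the sum of the survival function P(T > n), truncated here at 100 steps:

# E[T] = sum over n >= 0 of P(T > n), with P(T > 0) = 1 and P(T > n) = 1 - Z[n-1]
expected_hitting_time = 1 + np.sum(1.0 - np.array(Z))
print(expected_hitting_time)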
