I ran into a problem calculating the symmetric normalised Laplacian matrix in Python.
Suppose we have the matrix S and its diagonal degree matrix D:
        [ 1    0.5  0.2 ]        [ 1.7  0    0   ]
    S = [ 0.5  1    0.5 ]    D = [ 0    2    0   ]
        [ 0.2  0.5  1   ]        [ 0    0    1.7 ]
When calculating L as L = I - D^(-1/2) S D^(-1/2), I obtain this result:
    [[ 0.41176471 -0.27116307 -0.11764706]
L =  [-0.27116307  0.5        -0.27116307]
     [-0.11764706 -0.27116307  0.41176471]]
Using this code:
import numpy as np
from numpy.linalg import inv
from scipy.linalg import sqrtm

S = np.array([[1, 0.5, 0.2], [0.5, 1, 0.5], [0.2, 0.5, 1]])
print("Similarity Matrix: \n", S)
print("\n\n")

# diagonal degree matrix built from the row sums of S
D = np.zeros((len(S), len(S)))
for i, row in enumerate(S):
    D[i][i] = np.sum(row)

I = np.identity(len(S))
L = I - ((sqrtm(inv(D))).dot(S)).dot(sqrtm(inv(D)))
print("\n\n")
print("Laplacian normalized: \n",L)
This differs from the result of csgraph.laplacian(S, normed=True), which returns:
    [[ 1.         -0.5976143  -0.28571429]
L =  [-0.5976143   1.         -0.5976143 ]
     [-0.28571429 -0.5976143   1.        ]]
Why does this happen? Am I doing something wrong?
I noticed that the ratio between the unnormalized and normalized matrices returned by csgraph.laplacian is closely related to the ratio of the unnormalized matrix to your L:
In [20]: csgraph.laplacian(S, normed=False) / L - 1
Out[20]:
array([[0.7 , 0.84390889, 0.7 ],
[0.84390889, 1. , 0.84390889],
[0.7 , 0.84390889, 0.7 ]])
In [21]: csgraph.laplacian(S, normed=False) / csgraph.laplacian(S, normed=True)
Out[21]:
array([[0.7 , 0.83666003, 0.7 ],
[0.83666003, 1. , 0.83666003],
[0.7 , 0.83666003, 0.7 ]])
0.84390889 ≠ 0.83666003 but other numbers match. Could the difference be simply due to normalization?
That's because you have 1s on the diagonal of S. csgraph.laplacian ignores the diagonal (self-loops) when computing the degrees, so to reproduce its result, zero out the diagonal before building D:
import numpy as np

# weighted adjacency, with self-loops removed
S = np.array([[1, 0.5, 0.2], [0.5, 1, 0.5], [0.2, 0.5, 1]])
np.fill_diagonal(S, 0.0)
# strength (degree) diagonal matrix
D = np.diag(np.sum(S, axis=1))
# identity
I = np.identity(S.shape[0])
# D^{-1/2} matrix
D_inv_sqrt = np.linalg.inv(np.sqrt(D))
L = I - np.dot(D_inv_sqrt, S).dot(D_inv_sqrt)
L
array([[ 1. , -0.5976143 , -0.28571429],
[-0.5976143 , 1. , -0.5976143 ],
[-0.28571429, -0.5976143 , 1. ]])
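As a quick sanity check (a minimal sketch, assuming SciPy is available), this matches csgraph.laplacian applied to the original S, precisely because csgraph.laplacian ignores the diagonal:

import numpy as np
from scipy.sparse import csgraph

S_orig = np.array([[1, 0.5, 0.2], [0.5, 1, 0.5], [0.2, 0.5, 1]])

# manual computation on a zero-diagonal copy
S = S_orig.copy()
np.fill_diagonal(S, 0.0)
D_inv_sqrt = np.diag(1.0 / np.sqrt(S.sum(axis=1)))
L_manual = np.identity(3) - D_inv_sqrt.dot(S).dot(D_inv_sqrt)

L_scipy = csgraph.laplacian(S_orig, normed=True)
print(np.allclose(L_manual, L_scipy))  # True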
I have N matrices with dimensions R x R and one 'Weight matrix' with dimension R x N.
Now I want to combine those N matrices row-wise by weighting them with the 'Weight matrix'. In the end I want an R x R matrix.
Let me show you an example:
In the following example my initial matrices are a and b and my weight matrix is c. The desired output is matrix r.
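For concreteness (these values match the arrays shown in the answer below):

a = np.array([[0.2, 0. , 0.8],
              [0. , 0. , 1. ],
              [0. , 0.2, 0.8]])
b = np.array([[1. , 0. , 0. ],
              [0. , 0.2, 0.8],
              [0.2, 0. , 0.8]])
c = np.array([[1. , 0. ],
              [0.5, 0.5],
              [0. , 1. ]])
r = np.array([[0.2, 0. , 0.8],
              [0. , 0.1, 0.9],
              [0.2, 0. , 0.8]])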
The first row of r is the first row of a: c[0,0] is 1 and c[0,1] is 0, so we consider only the first row of matrix a.
The second row of r is a weighted average of row 2 of matrices a and b (because c[1,0] = 0.5 and c[1,1] = 0.5).
The third row of r is the third row of b: c[2,0] is 0 and c[2,1] is 1, so we consider only the third row of matrix b.
How can I do this in Python (preferably with a numpy function)?
We can use np.einsum -
In [57]: A # 3D input array
Out[57]:
array([[[0.2, 0. , 0.8],
[0. , 0. , 1. ],
[0. , 0.2, 0.8]],
[[1. , 0. , 0. ],
[0. , 0.2, 0.8],
[0.2, 0. , 0.8]]])
In [58]: c # 2D weight array
Out[58]:
array([[1. , 0. ],
[0.5, 0.5],
[0. , 1. ]])
In [59]: np.einsum('ijk,ji->jk',A,c)
Out[59]:
array([[0.2, 0. , 0.8],
[0. , 0.1, 0.9],
[0.2, 0. , 0.8]])
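To unpack the subscripts: i indexes the stacked input matrices, j the rows and k the columns, so 'ijk,ji->jk' computes out[j,k] as the sum over i of A[i,j,k] * c[j,i]. An equivalent explicit loop over the A and c above, shown purely for illustration:

out = np.zeros(A.shape[1:])
for j in range(A.shape[1]):      # each output row
    for i in range(A.shape[0]):  # each stacked input matrix
        out[j] += c[j, i] * A[i, j]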
Alternatively with np.matmul -
In [142]: (np.matmul(A.transpose(1,2,0),c[...,None]))[...,0]
Out[142]:
array([[0.2, 0. , 0.8],
[0. , 0.1, 0.9],
[0.2, 0. , 0.8]])
Note: on Python 3.5+, np.matmul can be replaced by the @ operator.
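That is, the same computation reads (assuming Python 3.5+):

(A.transpose(1, 2, 0) @ c[..., None])[..., 0]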
I have the following code to plot scalar x vs scalar f(x) where there is some matrix multiplication inside the function:
import numpy as np
import matplotlib.pyplot as plt
from numpy.linalg import matrix_power
P = np.array([[0, 0, 0.5, 0, 0.5],
              [0, 0, 1, 0, 0],
              [.25, .25, 0, .25, .25],
              [0, 0, .5, 0, .5],
              [0, 0, 0, 0, 1]])
t = np.array([0, 1, 0, 0, 0])
ones = np.array([1, 1, 1, 1, 0])

def f(x):
    return t.dot(matrix_power(P, x)).dot(ones)
x=np.arange(1,20)
plt.plot(x, f(x))
Now, the function by itself works fine.
>>> f(1)
1.0
>>> f(2)
0.75
But the plotting raises the error "exponent must be an integer".
To put it another way: how do I evaluate this function on an array? e.g.
f(np.array([1,2]))
I tried replacing the plot line with
plt.plot(x, map(f,x))
But this didn't help.
How can I fix this?
In [1]: P = np.array([[0, 0, 0.5, 0, 0.5],
   ...:               [0, 0, 1, 0, 0],
   ...:               [.25, .25, 0, .25, .25],
   ...:               [0, 0, .5, 0, .5],
   ...:               [0, 0, 0, 0, 1]])
In [2]: P
Out[2]:
array([[0. , 0. , 0.5 , 0. , 0.5 ],
[0. , 0. , 1. , 0. , 0. ],
[0.25, 0.25, 0. , 0.25, 0.25],
[0. , 0. , 0.5 , 0. , 0.5 ],
[0. , 0. , 0. , 0. , 1. ]])
In [4]: np.linalg.matrix_power(P,3)
Out[4]:
array([[0. , 0. , 0.25 , 0. , 0.75 ],
[0. , 0. , 0.5 , 0. , 0.5 ],
[0.125, 0.125, 0. , 0.125, 0.625],
[0. , 0. , 0.25 , 0. , 0.75 ],
[0. , 0. , 0. , 0. , 1. ]])
In [5]: np.linalg.matrix_power(P,np.arange(0,4))
---------------------------------------------------------------------------
TypeError: exponent must be an integer
So just give it the integer that it wants:
In [10]: [f(i) for i in range(4)]
Out[10]: [1.0, 1.0, 0.75, 0.5]
pylab.plot(np.arange(25), [f(i) for i in np.arange(25)])
From the matrix_power code:
a = asanyarray(a)
_assertRankAtLeast2(a)
_assertNdSquareness(a)

try:
    n = operator.index(n)
except TypeError:
    raise TypeError("exponent must be an integer")
....
Here's what it does for n=3:
In [5]: x = np.arange(9).reshape(3,3)
In [6]: np.linalg.matrix_power(x,3)
Out[6]:
array([[ 180, 234, 288],
[ 558, 720, 882],
[ 936, 1206, 1476]])
In [7]: x@x@x
Out[7]:
array([[ 180, 234, 288],
[ 558, 720, 882],
[ 936, 1206, 1476]])
You could define a matrix_power function that accepts an array of powers:
def matrix_power(P, x):
    return np.array([np.linalg.matrix_power(P, i) for i in x])
With this matrix_power(P,np.arange(25)) would produce a (25,5,5) array. And your f(x) actually does work with that, returning a (25,) shape array. But I wonder, was that just fortuitous, or was it intentional? Did you write f with a 3d power array in mind?
t.dot(matrix_power(P,x)).dot(ones)
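Putting that together, a minimal end-to-end sketch (the helper is renamed matrix_power_seq here to avoid shadowing numpy's matrix_power; shapes noted in comments):

import numpy as np
import matplotlib.pyplot as plt

P = np.array([[0, 0, 0.5, 0, 0.5],
              [0, 0, 1, 0, 0],
              [.25, .25, 0, .25, .25],
              [0, 0, .5, 0, .5],
              [0, 0, 0, 0, 1]])
t = np.array([0, 1, 0, 0, 0])
ones = np.array([1, 1, 1, 1, 0])

def matrix_power_seq(P, xs):
    # stack P**x for every integer x in xs -> shape (len(xs), R, R)
    return np.array([np.linalg.matrix_power(P, int(i)) for i in xs])

x = np.arange(1, 20)
M = matrix_power_seq(P, x)   # (19, 5, 5)
y = t.dot(M).dot(ones)       # t.dot(M) has shape (19, 5); dotting with ones gives (19,)
plt.plot(x, y)
plt.show()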
Given an adjacency list:
adj_list = [np.array([0,1]), np.array([0,1,2]), np.array([0,2])]
And an array of indices,
ind_arr = np.array([0,1,2])
Goal:
A = np.zeros((3,3))
for i in ind_arr:
    A[i, list(adj_list[i])] = 1.0 / float(adj_list[i].shape[0])
Currently, I have written:
A[ind_arr[:], adj_list[:]] = 1. / len(adj_list[:])
And tried various configurations of indexing within this scaffold.
Here's one approach -
# length of each row's adjacency array
lens = np.array([len(i) for i in adj_list])
# flattened column indices
col_idx = np.concatenate(adj_list)
out = np.zeros((len(lens), col_idx.max()+1))
# row index repeated once per neighbour, aligned with col_idx
row_idx = np.repeat(np.arange(len(lens)), lens)
# each row's value is 1/len(its adjacency array)
vals = np.repeat(1.0/lens, lens)
out[row_idx, col_idx] = vals
Sample input, output -
In [494]: adj_list = [np.array([0,2]),np.array([0,1,4])]
In [496]: out
Out[496]:
array([[ 0.5 , 0. , 0.5 , 0. , 0. ],
[ 0.33333333, 0.33333333, 0. , 0. , 0.33333333]])
Sparse matrix as output
Additionally, if you want to save memory and create a sparse matrix instead, that's an easy extension -
In [506]: from scipy.sparse import csr_matrix
In [507]: csr_matrix((vals, (row_idx, col_idx)), shape=(len(lens), col_idx.max()+1))
Out[507]:
<2x5 sparse matrix of type '<type 'numpy.float64'>'
with 5 stored elements in Compressed Sparse Row format>
In [508]: _.toarray()
Out[508]:
array([[ 0.5 , 0. , 0.5 , 0. , 0. ],
[ 0.33333333, 0.33333333, 0. , 0. , 0.33333333]])
I don't think you can completely eliminate loops, since the arrays have different lengths, but you can reduce the nested double for loop to a single one:
A = np.zeros((2, 3))
for i, arr in enumerate(adj_list):
arr_size = len(arr)
A[i, :arr_size] = 1./arr_size
A
# array([[ 0.5 , 0.5 , 0. ],
# [ 0.33333333, 0.33333333, 0.33333333]])
Or, if the numbers in the arrays are actually column positions:
A = np.zeros((2, 3))
for i, arr in enumerate(adj_list):
A[i, arr] = 1./len(arr)
A
# array([[ 0.5 , 0.5 , 0. ],
# [ 0.33333333, 0.33333333, 0.33333333]])
Another option uses MultiLabelBinarizer from sklearn (though it may not be as efficient):
from sklearn.preprocessing import MultiLabelBinarizer
mlb = MultiLabelBinarizer()
adj_list = [np.array([0,1]),np.array([0,1,2])]
sizes = np.fromiter(map(len, adj_list), dtype=int)
mlb.fit_transform(adj_list)/sizes[:,None]
# array([[ 0.5 , 0.5 , 0. ],
# [ 0.33333333, 0.33333333, 0.33333333]])
I am porting some MATLAB code to Python using NumPy, and I have the following MATLAB command:
[xgrid,ygrid]=meshgrid(linspace(-0.5,0.5, GridSize-1), ...
linspace(-0.5,0.5, GridSize-1));
Now, this is fine in 2D, but I would like to extend it to n dimensions. Depending on the input data, GridSize can be a vector of length 2, 3 or 4. So, in 2D this would be:
xgrid, ygrid = np.meshgrid(np.linspace(-0.5, 0.5, GridSize[0]),
                           np.linspace(-0.5, 0.5, GridSize[1]))
However, I do not know the number of dimensions beforehand, so is it possible to rewrite this expression so that it can generate grids with an arbitrary number of dimensions?
You could use a list comprehension to generate all the 1D arrays, and then call np.meshgrid on them with the * operator, which unpacks the argument list (the equivalent of MATLAB's comma-separated lists), like so -
allG = [np.linspace(-0.5,0.5, G) for G in GridSize]
out = np.meshgrid(*allG)
Sample runs
1) 2D Case :
In [27]: GridSize = [3,4]
In [28]: allG = [np.linspace(-0.5,0.5, G) for G in GridSize]
...: out = np.meshgrid(*allG)
...:
In [29]: out[0]
Out[29]:
array([[-0.5, 0. , 0.5],
[-0.5, 0. , 0.5],
[-0.5, 0. , 0.5],
[-0.5, 0. , 0.5]])
In [30]: out[1]
Out[30]:
array([[-0.5 , -0.5 , -0.5 ],
[-0.16666667, -0.16666667, -0.16666667],
[ 0.16666667, 0.16666667, 0.16666667],
[ 0.5 , 0.5 , 0.5 ]])
2) 3D Case :
In [51]: GridSize = [3,4,2]
In [52]: allG = [np.linspace(-0.5,0.5, G) for G in GridSize]
...: out = np.meshgrid(*allG)
...:
In [53]: out[0]
Out[53]:
array([[[-0.5, -0.5],
[ 0. , 0. ],
[ 0.5, 0.5]], ...
[[-0.5, -0.5],
[ 0. , 0. ],
[ 0.5, 0.5]]])
In [54]: out[1]
Out[54]:
array([[[-0.5 , -0.5 ], ...
[[ 0.16666667, 0.16666667],
[ 0.16666667, 0.16666667],
[ 0.16666667, 0.16666667]],
[[ 0.5 , 0.5 ],
[ 0.5 , 0.5 ],
[ 0.5 , 0.5 ]]])
In [55]: out[2]
Out[55]:
array([[[-0.5, 0.5], ....
[[-0.5, 0.5],
[-0.5, 0.5],
[-0.5, 0.5]]])
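One extra note beyond the original answer: np.meshgrid also accepts sparse=True, which returns broadcastable grids (each with extent along only one axis) instead of fully dense arrays; in 3 or 4 dimensions this can save a lot of memory:

GridSize = [3, 4, 2]  # as in the 3D case above
allG = [np.linspace(-0.5, 0.5, G) for G in GridSize]
out = np.meshgrid(*allG, sparse=True)  # each out[i] broadcasts against the others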