importing a python sparse matrix into MATLAB

I have a sparse matrix in CSR format in Python and I want to import it into MATLAB. MATLAB does not have a CSR sparse format; it has only one sparse format for all kinds of matrices. Since the matrix is very large in dense format, how can I import it as a MATLAB sparse matrix?

The scipy.io.savemat function saves sparse matrices in a MATLAB-compatible format:
In [1]: from scipy.io import savemat, loadmat
In [2]: from scipy import sparse
In [3]: M = sparse.csr_matrix(np.arange(12).reshape(3,4))
In [4]: savemat('temp', {'M':M})
In [8]: x=loadmat('temp.mat')
In [9]: x
Out[9]:
{'M': <3x4 sparse matrix of type '<type 'numpy.int32'>'
with 11 stored elements in Compressed Sparse Column format>,
'__globals__': [],
'__header__': 'MATLAB 5.0 MAT-file Platform: posix, Created on: Mon Sep 8 09:34:54 2014',
'__version__': '1.0'}
In [10]: x['M'].A
Out[10]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
Note that savemat converted it to CSC format. It also transparently handles the difference in index base (0-based in Python, 1-based in MATLAB).
And in Octave:
octave:4> load temp.mat
octave:5> M
M =
Compressed Column Sparse (rows = 3, cols = 4, nnz = 11 [92%])
(2, 1) -> 4
(3, 1) -> 8
(1, 2) -> 1
(2, 2) -> 5
...
octave:8> full(M)
ans =
0 1 2 3
4 5 6 7
8 9 10 11
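A minimal round-trip sketch of the savemat route, assuming a reasonably recent SciPy. It writes a CSR matrix to an in-memory buffer (io.BytesIO stands in for a real file such as 'temp.mat') and reads it back with loadmat. Casting to float64 first is a cautious choice, since MATLAB sparse matrices hold only double or logical values; .toarray() is used instead of the .A shortcut, which newer SciPy versions no longer provide.

```python
import io

import numpy as np
from scipy import sparse
from scipy.io import loadmat, savemat

# CSR matrix built from a dense array; the 0 at (0, 0) is not stored, so nnz is 11
M = sparse.csr_matrix(np.arange(12, dtype=np.float64).reshape(3, 4))

buf = io.BytesIO()          # stands in for a file path such as 'temp.mat'
savemat(buf, {'M': M})      # sparse data is written column-wise (CSC)
buf.seek(0)

loaded = loadmat(buf)['M']  # comes back as a CSC sparse matrix
print(loaded.format, loaded.shape, loaded.nnz)
print(loaded.toarray())
```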

The MATLAB and SciPy sparse formats are compatible. You need to get the data values, the row and column indices, and the matrix size in SciPy, and use them to create a sparse matrix in MATLAB. Here's an example:
import numpy as np
from scipy.sparse import csr_matrix
# create a sparse matrix
row = np.array([0, 0, 1, 2, 2, 2])
col = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])
mat = csr_matrix((data, (row, col)), shape=(3, 4))
# get the shape, the values and the 0-based row/column indices
(m, n) = mat.shape
coo = mat.tocoo()  # COO stores one (row, col, value) triplet per nonzero
s = coo.data
i = coo.row
j = coo.col
# display the matrix
print(mat)
Which prints out:
(0, 0) 1
(0, 2) 2
(1, 2) 3
(2, 0) 4
(2, 1) 5
(2, 2) 6
Use the values m, n, s, i, and j from Python to create a matrix in Matlab:
m = 3;
n = 4;
s = [1, 2, 3, 4, 5, 6];
% Index from 1 in Matlab.
i = [0, 0, 1, 2, 2, 2] + 1;
j = [0, 2, 2, 0, 1, 2] + 1;
% Omitting nzmax (m*n here) avoids preallocating room for a full dense matrix.
S = sparse(i, j, s, m, n)
This gives the same matrix, only indexed from 1:
(1,1) 1
(3,1) 4
(3,2) 5
(1,3) 2
(2,3) 3
(3,3) 6
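For a matrix too large to copy by hand, the triplets can be written to a file instead. The sketch below is one possible approach, not part of the original answer: it shifts the indices to 1-based and appends a trailing [m n 0] row, which MATLAB's spconvert uses to fix the matrix size. The helper name csr_to_matlab_triplets is hypothetical, and io.StringIO stands in for a real file such as 'S.txt'.

```python
import io

import numpy as np
from scipy.sparse import csr_matrix


def csr_to_matlab_triplets(mat):
    """Hypothetical helper: 1-based [i, j, value] rows plus a trailing [m, n, 0] size row."""
    coo = mat.tocoo()
    m, n = mat.shape
    triplets = np.column_stack([coo.row + 1, coo.col + 1, coo.data]).astype(float)
    return np.vstack([triplets, [m, n, 0]])


row = np.array([0, 0, 1, 2, 2, 2])
col = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])
mat = csr_matrix((data, (row, col)), shape=(3, 4))

D = csr_to_matlab_triplets(mat)
buf = io.StringIO()                        # replace with a filename such as 'S.txt'
np.savetxt(buf, D, fmt='%d %d %.17g')      # integer indices, full-precision values
print(buf.getvalue())
```

In MATLAB, something like D = load('S.txt'); S = spconvert(D); should then rebuild the 3x4 sparse matrix. A text file is bulky for very large matrices, so the savemat route is usually more compact.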

Related

matlab sum function to python conversion

I am trying to convert this MATLAB code to Python:
T2 = (sum((log(X(1:m,:)))'));
Here is my code in python:
T2 = sum(np.log(X[0:int(m),:]).T)
where m = 103 and X is a matrix:
f1 = np.float64(135)
f2 = np.float64(351)
X = np.float64(p[:, int(f1):int(f2)])
and p is a dictionary (loaded data)
The problem is that Python gives me exactly the same values with the same dimensions (216x103) as MATLAB before applying the sum function to np.log(X[0:int(m), :]).T. However, after applying the sum function, it gives me the correct values but the wrong dimensions (103x1); the correct dimensions are 1x103. I have tried transposing after the sum, but it doesn't work. Any suggestions on how to get my desired dimensions?
A matrix in MATLAB consists of m rows and n columns, but a 2-D array in NumPy behaves like an array of arrays: each row is a flat 1-D vector whose single dimension equals its number of elements, n. MATLAB doesn't have flat vectors at all: a row is a 1xn matrix, a column is an mx1 matrix, and a scalar is a 1x1 matrix.
So, back to the question: when you write T2 = sum(np.log(X[0:int(m),:]).T) in Python, the result is neither 103x1 nor 1x103; it is a flat vector of shape (103,). If you specifically want a 1x103 matrix like MATLAB, call reshape(1, -1). You also don't need the transpose, since you can sum over the second axis directly:
import numpy as np
X = np.random.rand(103, 216)  # stand-in data: m rows by 216 columns, like the question's X
m = 103
T2 = np.sum(np.log(X[:m]), axis=1).reshape(1, -1)
print(T2.shape)
# (1, 103)
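To confirm that only the shape differs, this sketch compares the question's expression with the axis-based versions on random stand-in data. The array shape (150 x 216, shifted to stay positive for log) is an assumption chosen to match the question: at least m = 103 rows and 216 columns.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((150, 216)) + 0.1    # hypothetical stand-in data, all positive
m = 103

asker = sum(np.log(X[0:m, :]).T)    # built-in sum adds up the 216 rows of the (216, 103) transpose
flat = np.log(X[:m]).sum(axis=1)    # same values, no transpose needed
row = flat.reshape(1, -1)           # MATLAB-style 1x103 row
row_kd = np.log(X[:m]).sum(axis=1, keepdims=True).T  # (103, 1) column, transposed to (1, 103)

print(asker.shape, flat.shape, row.shape, row_kd.shape)  # (103,) (103,) (1, 103) (1, 103)
print(asker.T.shape)                # (103,): .T on a 1-D array changes nothing
print(np.allclose(asker, flat))     # True
```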
Let's make a demo 2-D array:
In [19]: x = np.arange(12).reshape(3,4)
In [20]: x
Out[20]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
Now apply Python's built-in sum function (which isn't the same as NumPy's np.sum):
In [21]: sum(x)
Out[21]: array([12, 15, 18, 21])
The result is an array of shape (4,), not 4x1. Check sum(x).shape if you don't believe me.
The numpy.sum function adds all terms if no axis is given:
In [22]: np.sum(x)
Out[22]: 66
or with axis:
In [23]: np.sum(x, axis=0)
Out[23]: array([12, 15, 18, 21])
In [24]: np.sum(x, axis=1)
Out[24]: array([ 6, 22, 38])
Python's sum treats x as a list of arrays and adds them together:
In [25]: list(x)
Out[25]: [array([0, 1, 2, 3]), array([4, 5, 6, 7]), array([ 8, 9, 10, 11])]
In [28]: x[0]+x[1]+x[2]
Out[28]: array([12, 15, 18, 21])
Transpose without a parameter reverses the axes. It does not add any dimensions:
In [29]: x.T # (4,3) shape
Out[29]:
array([[ 0, 4, 8],
[ 1, 5, 9],
[ 2, 6, 10],
[ 3, 7, 11]])
In [30]: sum(x).T
Out[30]: array([12, 15, 18, 21]) # still (4,) shape
In Octave:
>> x=reshape(0:11,4,3)'
x =
0 1 2 3
4 5 6 7
8 9 10 11
>> sum(x)
ans =
12 15 18 21
>> sum(x,1)
ans =
12 15 18 21
>> sum(x,2)
ans =
6
22
38
edit
The np.sum function has a keepdims parameter:
In [32]: np.sum(x, axis=0, keepdims=True)
Out[32]: array([[12, 15, 18, 21]]) # (1,4) shape
In [33]: np.sum(x, axis=1, keepdims=True)
Out[33]:
array([[ 6], # (3,1) shape
[22],
[38]])
If I reshape the array to 3-D and sum, the result is 2-D, unless I use keepdims:
In [34]: np.sum(x.reshape(3,2,2), axis=0).shape
Out[34]: (2, 2)
In [36]: np.sum(x.reshape(3,2,2), axis=0,keepdims=True).shape
Out[36]: (1, 2, 2)
MATLAB/Octave, on the other hand, keeps the dims by default:
sum(reshape(x,3,2,2)) % (1,2,2)
unless I sum over the last (3rd) dimension:
sum(reshape(x,3,2,2),3) % (3,2)
The key is that in MATLAB everything is at least 2-D, with the option of additional trailing dimensions, which aren't handled the same way. In NumPy, every number of dimensions, from 0 on up, is handled the same way.
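The shape rules above can be sketched as a small NumPy helper. matlab_like_sum is a hypothetical name, and the sketch only reproduces result shapes: MATLAB's default of summing along the first dimension whose size is not 1, keeping the summed dimension as length 1, and dropping trailing singleton dimensions beyond the second. Element order after reshape still differs, because MATLAB's reshape is column-major and NumPy's default is row-major.

```python
import numpy as np


def matlab_like_sum(x, dim=None):
    """Hypothetical helper mimicking the result shapes of MATLAB's sum(x) and sum(x, dim).

    dim is 1-based, as in MATLAB. Without dim, the sum runs along the first
    dimension whose size is not 1.
    """
    x = np.atleast_2d(np.asarray(x))        # MATLAB arrays are always at least 2-D
    if dim is None:
        non_singleton = [k for k, size in enumerate(x.shape) if size != 1]
        axis = non_singleton[0] if non_singleton else 0
    else:
        axis = dim - 1
    if axis >= x.ndim:                      # summing over a trailing singleton dim is a no-op
        return x.copy()
    out = np.sum(x, axis=axis, keepdims=True)    # keep the summed axis as length 1
    while out.ndim > 2 and out.shape[-1] == 1:   # drop trailing singleton dims beyond 2-D
        out = out[..., 0]
    return out


x = np.arange(12).reshape(3, 4)
print(matlab_like_sum(x).shape)                      # (1, 4)
print(matlab_like_sum(x, 2).shape)                   # (3, 1)
print(matlab_like_sum(x.reshape(3, 2, 2)).shape)     # (1, 2, 2)
print(matlab_like_sum(x.reshape(3, 2, 2), 3).shape)  # (3, 2)
```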

Construct equivalent transform for vectorized Matrix

Equivalent transform for vectorized solution
For a given symmetric 4x4 matrix Q and a 3x4 matrix P, the 3x3 matrix C is obtained through
C = P @ Q @ P.T
The output C is symmetric again, since C.T = P @ Q.T @ P.T = P @ Q @ P.T. The same problem can therefore be formulated using only the unique elements of Q and C. To do so, the matrices are vectorized by collecting their upper-triangular elements (10 for Q, 6 for C), as in the vectorize function below.
I want to construct a matrix B that maps the vectorized matrices onto each other like so:
c = B @ q
B must be 6x10 and should be constructible from P only. How can I get B from P?
I tried this, but it doesn't seem to work. Maybe someone has experienced a similar problem?
import numpy as np
def vectorize(A, ord='c'):
    """
    Symmetric matrix to vector e.g:
    [[1, 2, 3],
     [2, 4, 5],
     [3, 5, 6]] -> [1, 2, 3, 4, 5, 6] (c-order, row-col)
                -> [1, 2, 4, 3, 5, 6] (f-order, col-row)
    """
    # upper triangle mask
    m = np.triu(np.ones_like(A, dtype=bool)).flatten(order=ord)
    return A.flatten(order=ord)[m]
def B(P):
    B = np.zeros((6, 10))
    counter = 0
    # the i,j entry in C depends on the i, j rows in P
    for i in range(3):
        for j in range(i, 3):
            coeffs = np.outer(P[i], P[j])
            B[counter] = vectorize(coeffs)
            counter += 1
    return B
if __name__ == '__main__':
    # original transform
    P = np.arange(12).reshape((3, 4))
    # calculated transform for vectorized matrix
    _B = B(P)
    # some random symmetric matrix
    Q = np.array([[1, 2, 3, 4],
                  [2, 5, 6, 7],
                  [3, 6, 8, 9],
                  [4, 7, 9, 10]])
    # if B is an equivalent transform to P, these should be similar
    C = P @ Q @ P.T
    c = _B @ vectorize(Q)
    print(f"q: {vectorize(Q)}\n"
          f"C: {vectorize(C)}\n"
          f"c: {c}")
Output:
q: [ 1 2 3 4 5 6 7 8 9 10]
C: [ 301 949 2973 1597 4997 8397]
c: [ 214 542 870 1946 3154 5438] <-- not the same
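A short derivation (not part of the original posts) shows why the attempt above falls short. Writing one entry of C out and using the symmetry Q_{kl} = Q_{lk} to merge the two triangles:

```latex
C_{ij} = \sum_{k}\sum_{l} P_{ik}\,Q_{kl}\,P_{jl}
       = \sum_{k} P_{ik}P_{jk}\,Q_{kk}
       + \sum_{k<l}\left(P_{ik}P_{jl} + P_{il}P_{jk}\right) Q_{kl}

B_{(ij),(kl)} =
\begin{cases}
  P_{ik}P_{jk}, & k = l \\
  P_{ik}P_{jl} + P_{il}P_{jk}, & k < l
\end{cases}
```

The upper triangle of np.outer(P[i], P[j]) supplies only the P_{ik}P_{jl} term for k < l; the missing P_{il}P_{jk} term is the strict upper triangle of the transposed outer product.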
import numpy as np
def vec_from_mat(A, order='c'):
    """
    Packs the unique (upper-triangular) elements of symmetric matrix A into a vector, row by row.
    :param A: symmetric square matrix
    :return: 1-D array of length n*(n+1)/2
    """
    return A[np.triu_indices(A.shape[0])].flatten(order=order)
def B_from_P(P):
    """
    Returns the matrix B that maps the unique elements of a symmetric m x m matrix Q onto
    the unique elements of the n x n matrix C, linearizing C = P @ Q @ P.T to c = B @ q.
    :param P: n x m matrix (3x4 in the question)
    :return: B with shape (n*(n+1)/2, m*(m+1)/2), i.e. (6, 10) for a 3x4 P
    """
    n, m = P.shape
    b1, b2 = (n * (n + 1) // 2), (m * (m + 1) // 2)
    B = np.zeros((b1, b2))
    for a, (i, j) in enumerate(zip(*np.triu_indices(n))):
        coeffs = np.outer(P[i], P[j])
        # collect coefficients from lower and upper triangle of symmetric matrix
        B[a] = vec_from_mat(coeffs) + vec_from_mat(np.triu(coeffs.T, k=1))
    return B
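A quick numerical check of B_from_P, using the question's P and Q plus a random symmetric pair. The two functions are repeated here (vec_from_mat without its order argument, which has no effect on an already 1-D result) so the snippet runs on its own.

```python
import numpy as np


def vec_from_mat(A):
    # unique elements of a symmetric matrix: upper triangle, row by row
    return A[np.triu_indices(A.shape[0])]


def B_from_P(P):
    # same construction as the answer above
    n, m = P.shape
    B = np.zeros((n * (n + 1) // 2, m * (m + 1) // 2))
    for a, (i, j) in enumerate(zip(*np.triu_indices(n))):
        coeffs = np.outer(P[i], P[j])
        B[a] = vec_from_mat(coeffs) + vec_from_mat(np.triu(coeffs.T, k=1))
    return B


# the question's example
P = np.arange(12).reshape((3, 4))
Q = np.array([[1, 2, 3, 4],
              [2, 5, 6, 7],
              [3, 6, 8, 9],
              [4, 7, 9, 10]])
C = P @ Q @ P.T
c = B_from_P(P) @ vec_from_mat(Q)
print(vec_from_mat(C))
print(c)

# a random P and a random symmetric Q as a second check
rng = np.random.default_rng(1)
P2 = rng.standard_normal((3, 4))
S = rng.standard_normal((4, 4))
Q2 = S + S.T
print(np.allclose(B_from_P(P2) @ vec_from_mat(Q2), vec_from_mat(P2 @ Q2 @ P2.T)))  # True
```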

Numpy Matrix Subtraction Different Dimensions

I currently have a 5D numpy array of dimensions 40 x 3 x 3 x 5 x 1000 where the dimensions are labelled by a x b x c x d x e respectively.
I have another 2D numpy array of dimensions 3 x 1000 where the dimensions are labelled by b x e respectively.
I wish to subtract the 5D array from the 2D array.
One way I was thinking of was to expand the 2D into a 5D array (since the 2D array does not change for all combinations of the other 3 dimensions). I am not sure what array method/numpy function I can use to do this.
I tend to start getting lost with nD array manipulations. Thank you for assisting.
In [217]: a,b,c,d,e = 2,3,4,5,6
In [218]: A = np.ones((a,b,c,d,e),int); B = np.ones((b,e),int)
In [219]: A.shape
Out[219]: (2, 3, 4, 5, 6)
In [220]: B.shape
Out[220]: (3, 6)
In [221]: B[None,:,None,None,:].shape # could also use reshape()
Out[221]: (1, 3, 1, 1, 6)
In [222]: C = B[None,:,None,None,:]-A
In [223]: C.shape
Out[223]: (2, 3, 4, 5, 6)
The first None isn't essential, since NumPy broadcasting adds missing leading dimensions automatically, but it may help a human reader to see it.
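If I understand correctly, suppose your arrays are a and b. Swapping axes 1 and 3 puts the b axis next to the e axis, so b can broadcast against the last two axes:
np.swapaxes(np.swapaxes(a, 1, 3) - b, 1, 3)
This computes a - b; negate the result if you need b - a.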
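A small sketch checking that the two answers agree, with reduced sizes standing in for 40 x 3 x 3 x 5 x 1000. Because b and c are both 3 in the question, a wrongly aligned subtraction would still broadcast without error, so comparing against an independent method is a worthwhile sanity check. Both the indexing and reshape forms of the first answer are shown.

```python
import numpy as np

na, nb, nc, nd, ne = 4, 3, 3, 5, 10    # small stand-ins; b and c are both 3, as in the question
rng = np.random.default_rng(0)
A = rng.random((na, nb, nc, nd, ne))   # axes a, b, c, d, e
B = rng.random((nb, ne))               # axes b, e

# insert length-1 axes so B lines up with the b and e axes of A
C1 = B[np.newaxis, :, np.newaxis, np.newaxis, :] - A
C2 = B.reshape(1, nb, 1, 1, ne) - A    # the same alignment via reshape

# swap b next to e, subtract, swap back; this gives A - B, so negate it
C3 = -np.swapaxes(np.swapaxes(A, 1, 3) - B, 1, 3)

print(C1.shape)                                   # (4, 3, 3, 5, 10)
print(np.allclose(C1, C2), np.allclose(C1, C3))   # True True
```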

Convert scipy condensed distance matrix to lower matrix read by rows

I have a condensed distance matrix from scipy that I need to pass to a C function that requires the matrix be converted to the lower triangle read by rows. For example:
0 1 2 3
  0 4 5
    0 6
      0
The condensed form of this is: [1,2,3,4,5,6] but I need to convert it to
0
1 0
2 4 0
3 5 6 0
The lower triangle read by rows is: [1,2,4,3,5,6].
I was hoping to convert the compact distance matrix to this form without creating a redundant matrix.
Here's a quick implementation--but it creates the square redundant distance matrix as an intermediate step:
In [128]: import numpy as np
In [129]: from scipy.spatial.distance import squareform
c is the condensed form of the distance matrix:
In [130]: c = np.array([1, 2, 3, 4, 5, 6])
d is the redundant square distance matrix:
In [131]: d = squareform(c)
Here's your condensed lower triangle distances:
In [132]: d[np.tril_indices(d.shape[0], -1)]
Out[132]: array([1, 2, 4, 3, 5, 6])
Here's a method that avoids forming the redundant distance matrix. The function condensed_index(i, j, n) takes the row i and column j of the redundant distance matrix, with j > i, and returns the corresponding index in the condensed distance array.
In [169]: def condensed_index(i, j, n):
     ...:     return n*i - i*(i+1)//2 + j - i - 1
     ...:
As above, c is the condensed distance array.
In [170]: c
Out[170]: array([1, 2, 3, 4, 5, 6])
In [171]: n = 4
In [172]: i, j = np.tril_indices(n, -1)
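Note that the arguments are reversed in the following call, because np.tril_indices returns pairs with i > j, while condensed_index expects its first argument to be the smaller index: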
In [173]: indices = condensed_index(j, i, n)
indices gives the desired permutation of the condensed distance array.
In [174]: c[indices]
Out[174]: array([1, 2, 4, 3, 5, 6])
(Basically the same function as condensed_index(i, j, n) was given in several answers to this question.)
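The same idea can be packaged so that n is inferred from the length of the condensed vector (len(c) = n*(n-1)/2) and no n x n matrix is ever built. condensed_to_lower_by_rows is a hypothetical helper name; the index formula is the condensed_index expression above, applied with the arguments reversed.

```python
import math

import numpy as np


def condensed_to_lower_by_rows(c):
    """Hypothetical helper: reorder a condensed distance vector into the
    strict lower triangle read by rows, without forming the square matrix."""
    c = np.asarray(c)
    k = c.size
    n = (1 + math.isqrt(1 + 8 * k)) // 2   # solve k = n*(n-1)/2 for n
    if n * (n - 1) // 2 != k:
        raise ValueError("length is not n*(n-1)/2 for any integer n")
    i, j = np.tril_indices(n, -1)          # row i > column j, read row by row
    # condensed position of the pair (j, i), with j < i
    idx = n * j - j * (j + 1) // 2 + i - j - 1
    return c[idx]


print(condensed_to_lower_by_rows([1, 2, 3, 4, 5, 6]))  # [1 2 4 3 5 6]
```

The index array depends only on n, so when many matrices of the same size have to be converted it can be computed once and reused.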

What is a pythonic way of finding maximum values and their indices for moving subarrays for numpy ndarray?

I have numpy ndarrays which could be 3 or 4 dimensional. I'd like to find maximum values and their indices in a moving subarray window with specified strides.
For example, suppose I have a 4x4 2d array and my moving subarray window is 2x2 with stride 2 for simplicity:
[[ 1, 2, 3, 4],
[ 5, 6, 7, 8],
[ 9,10,11,12],
[13,14,15,16]].
I'd like to find
[[ 6 8],
[14 16]]
for max values and
[(1,1), (1,3),
(3,1), (3,3)]
for indices as output.
Is there a concise, efficient implementation for this for ndarray without using loops?
Here's a solution using stride_tricks:
import numpy as np
def make_panes(arr, window):
    arr = np.asarray(arr)
    r, c = arr.shape
    s_r, s_c = arr.strides
    w_r, w_c = window
    if c % w_c != 0 or r % w_r != 0:
        raise ValueError("Window doesn't fit array.")
    # integer division: shapes must be ints in Python 3
    shape = (r // w_r, c // w_c, w_r, w_c)
    strides = (w_r * s_r, w_c * s_c, s_r, s_c)
    return np.lib.stride_tricks.as_strided(arr, shape, strides)
def max_in_panes(arr, window):
    arr = np.asarray(arr)
    w_r, w_c = window
    r, c = arr.shape
    n_r, n_c = r // w_r, c // w_c  # number of panes along each axis
    panes = make_panes(arr, window)
    v = panes.reshape((-1, w_r * w_c))
    ix = np.argmax(v, axis=1)
    max_vals = v[np.arange(n_r * n_c), ix].reshape(n_r, n_c)
    i = np.repeat(np.arange(0, r, w_r), n_c)
    j = np.tile(np.arange(0, c, w_c), n_r)
    rel_i, rel_j = np.unravel_index(ix, window)
    max_ix = i + rel_i, j + rel_j
    return max_vals, max_ix
A demo:
>>> x = np.arange(1, 17).reshape(4, 4)
>>> vals, ix = max_in_panes(x, (2, 2))
>>> print(vals)
[[ 6  8]
 [14 16]]
>>> print(ix)
(array([1, 1, 3, 3]), array([1, 3, 1, 3]))
Note that this is largely untested and is designed to work with 2-D arrays. I'll leave the generalization to n-D arrays to the reader...
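One possible n-D generalization, sketched under the assumption of NumPy 1.20 or later (for numpy.lib.stride_tricks.sliding_window_view). max_in_windows is a hypothetical name. It takes all windows, keeps every stride-th window start along each axis, and converts each window's argmax back to absolute indices; unlike make_panes, the stride does not have to equal the window size. Ties resolve to the first maximum in each window, and the reshape copies the selected windows, so memory grows with the total window volume.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def max_in_windows(arr, window, stride):
    """Hypothetical helper: max value and its absolute index for each window.

    window and stride are tuples with one entry per axis of arr.
    """
    arr = np.asarray(arr)
    views = sliding_window_view(arr, window)          # every window position
    views = views[tuple(slice(None, None, s) for s in stride)]  # keep every stride-th start
    grid = views.shape[:arr.ndim]                     # window positions per axis
    flat = views.reshape(grid + (-1,))                # one row of elements per window
    rel = np.argmax(flat, axis=-1)                    # flat offset of the max in each window
    max_vals = flat.max(axis=-1)
    rel_idx = np.unravel_index(rel, window)           # per-axis offset inside the window
    starts = np.indices(grid)                         # window-position index per axis
    abs_idx = tuple(starts[k] * stride[k] + rel_idx[k] for k in range(arr.ndim))
    return max_vals, abs_idx


x = np.arange(1, 17).reshape(4, 4)
vals, idx = max_in_windows(x, (2, 2), (2, 2))
print(vals)   # [[ 6  8]
              #  [14 16]]
print(idx[0].tolist(), idx[1].tolist())  # [[1, 1], [3, 3]] [[1, 3], [1, 3]]
```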
