I'm porting MATLAB code to Python with numpy and scipy, and I need the numpy/scipy equivalent of MATLAB's sparse function.
Here's the usage of the sparse function in MATLAB:
sparse([3; 2], [2; 4], [3; 0])
gives:
Trial>> m = sparse([3; 2], [2; 4], [3; 0])
m =
(3,2) 3
Trial>> full(m)
ans =
0 0 0 0
0 0 0 0
0 3 0 0
I have tried these, but they don't give what the MATLAB version does:
sps.csr_matrix([3, 2], [2, 4], [3, 0])
sps.csr_matrix(np.array([[3], [2]]), np.array([[2], [4]]), np.array([[3], [0]]))
sps.csr_matrix([[3], [2]], [[2], [4]], [[3], [0]])
Any ideas?
Thanks.
You're using the sparse(I, J, SV) form [note: link goes to documentation for GNU Octave, not Matlab]. The scipy.sparse equivalent is csr_matrix((SV, (I, J))) -- yes, a single argument which is a 2-tuple containing a vector and a 2-tuple of vectors. You also have to correct the index vectors because Python consistently uses 0-based indexing.
>>> m = sps.csr_matrix(([3,0], ([2,1], [1,3]))); m
<3x4 sparse matrix of type '<class 'numpy.int64'>'
with 2 stored elements in Compressed Sparse Row format>
>>> m.todense()
matrix([[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 3, 0, 0]], dtype=int64)
Note that scipy, unlike Matlab, does not automatically discard explicit zeroes, and will use integer storage for matrices containing only integers. To perfectly match the matrix you got in Matlab, you must explicitly ask for floating-point storage and you must call eliminate_zeros() on the result:
>>> m2 = sps.csr_matrix(([3,0], ([2,1], [1,3])), dtype=np.float64)
>>> m2.eliminate_zeros()
>>> m2
<3x4 sparse matrix of type '<class 'numpy.float64'>'
with 1 stored elements in Compressed Sparse Row format>
>>> m2.todense()
matrix([[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.],
[ 0., 3., 0., 0.]])
You could also change [3,0] to [3., 0.] but I recommend an explicit dtype= argument because that will prevent surprises when you are feeding in real data.
(I don't know what Matlab's internal sparse matrix representation is, but Octave appears to default to compressed sparse column representation. The difference between CSC and CSR should only affect performance. If your NumPy code winds up being slower than your Matlab code, try using sps.csc_matrix instead of csr_matrix, as well as all the usual NumPy performance tips.)
(You probably need to read NumPy for Matlab users if you haven't already.)
Here's a conversion I made. It works for the 5-argument version of sparse:
import scipy.sparse

def sparse(i, j, v, m, n):
    """
    Create a compressed sparse matrix that has many zeros.

    Parameters:
        i: 1-D array of row indices, size n1
        j: 1-D array of column indices, size n1
        v: 1-D array of values, size n1
        m: integer, number of rows of the matrix (must exceed max(i))
        n: integer, number of columns of the matrix (must exceed max(j))

    Returns:
        s: 2-D sparse matrix, full of zeros except values v at indices (i, j)
    """
    return scipy.sparse.csr_matrix((v, (i, j)), shape=(m, n))
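For example, the MATLAB call from the question (extended with explicit dimensions) could be reproduced like this; a usage sketch of mine, with the indices shifted by hand to 0-based:

import numpy as np

# Mirrors MATLAB's sparse([3; 2], [2; 4], [3; 0], 3, 4), with 0-based indices
s = sparse(np.array([2, 1]), np.array([1, 3]), np.array([3., 0.]), 3, 4)
print(s.todense())
# [[0. 0. 0. 0.]
#  [0. 0. 0. 0.]
#  [0. 3. 0. 0.]]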
Related
I am trying to find both the location and the value of the minimum element of a sparse matrix for each row. A toy example for the question is given below:
Here, we have a 3x7 sparse matrix "M".
H = np.array([[1, 2, 3, 0, 4, 0, 0],
              [0, 5, 0, 6, 0, 0, 0],
              [0, 0, 0, 7, 0, 0, 8]], dtype=np.float32)
M = scipy.sparse.csr_matrix(H)
Then, what I would like to obtain is the nonzero minimum elements of each row.
For the example above:
min_elements = some_function(M, axis=1)
and receiving the return as min_elements = [1,5,7]. The method M.min(axis=1) does not work for my case since the minimum element of each row is zero, therefore returning an all-zeros array.
Thus, is there a computationally efficient way of implementing such a function using sparse matrices? In my general case, the sparse matrices will be quite huge and require lots of additional computation, so performance/speed is the main benchmark for me.
Thank you!
In [333]: from scipy import sparse
In [334]: M = sparse.csr_matrix(H)
In [335]: M
Out[335]:
<3x7 sparse matrix of type '<class 'numpy.float32'>'
with 8 stored elements in Compressed Sparse Row format>
M is stored as:
In [336]: M.indptr
Out[336]: array([0, 4, 6, 8], dtype=int32)
In [337]: M.data
Out[337]: array([1., 2., 3., 4., 5., 6., 7., 8.], dtype=float32)
In [338]: M.indices
Out[338]: array([0, 1, 2, 4, 1, 3, 3, 6], dtype=int32)
We can iterate on the slices defined by indptr, and take the min:
In [340]: for i in range(M.shape[0]):
     ...:     sl = slice(M.indptr[i], M.indptr[i+1])
     ...:     x, y = M.data[sl], M.indices[sl]
     ...:     m = np.argmin(x)
     ...:     print(y[m], x[m])
     ...:
0 1.0
1 5.0
3 7.0
This can be streamlined a bit, but it gives the basic idea.
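One way to streamline the value part (though not the column indices) is np.minimum.reduceat over the indptr boundaries; a sketch of mine, assuming every row has at least one stored element:

import numpy as np
from scipy import sparse

H = np.array([[1, 2, 3, 0, 4, 0, 0],
              [0, 5, 0, 6, 0, 0, 0],
              [0, 0, 0, 7, 0, 0, 8]], dtype=np.float32)
M = sparse.csr_matrix(H)

# Reduce M.data over each [indptr[i], indptr[i+1]) slice; empty rows
# would break this, so it assumes none exist.
row_mins = np.minimum.reduceat(M.data, M.indptr[:-1])
print(row_mins)   # [1. 5. 7.]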
It may be easier to picture what's going on in the lil format:
In [341]: Ml = M.tolil()
In [342]: Ml.data
Out[342]:
array([list([1.0, 2.0, 3.0, 4.0]), list([5.0, 6.0]), list([7.0, 8.0])],
dtype=object)
In [343]: Ml.rows
Out[343]: array([list([0, 1, 2, 4]), list([1, 3]), list([3, 6])], dtype=object)
In [344]: for d, r in zip(Ml.data, Ml.rows):
     ...:     m = np.argmin(d)
     ...:     print(r[m], d[m])
     ...:
0 1.0
1 5.0
3 7.0
Previous SO questions have asked for things like the smallest (or largest) N values by row.
Sparse is best for things that can be expressed as some sort of matrix multiplication. That includes row (or column) sums. Even csr indexing is done with matrix multiplication. Other row-by-row operations aren't as easy.
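A row sum, for instance, is just a product with a vector of ones; a small sketch of my own:

import numpy as np
from scipy import sparse

A = sparse.csr_matrix(np.array([[1., 0., 2.],
                                [0., 3., 0.]]))
row_sums = A @ np.ones(A.shape[1])   # same result as A.sum(axis=1)
print(row_sums)                      # [3. 3.]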
You could flip all your data and find the maximum. This is assuming all your data is positive, as in the example.
M_inv = M.copy()
M_inv.data = 1/M.data
one_over_min_M = M_inv.max(axis=1)
min_M = 1/one_over_min_M.toarray()
On your example I get the output
[[1. ]
[5. ]
[6.9999995]]
There is some horrible numerical error there, but if you're happy to round your answer...
Edit: This approach might be redeemed if you're after the indices and want to do M_inv.argmax(axis=1), otherwise it's probably not the best.
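Here is a sketch of that variant (my own code, again assuming all stored values are positive, as in the example):

import numpy as np
from scipy import sparse

H = np.array([[1, 2, 3, 0, 4, 0, 0],
              [0, 5, 0, 6, 0, 0, 0],
              [0, 0, 0, 7, 0, 0, 8]], dtype=np.float32)
M = sparse.csr_matrix(H)

M_inv = M.copy()
M_inv.data = 1 / M.data
# The largest reciprocal in each row marks the smallest nonzero value
min_cols = np.asarray(M_inv.argmax(axis=1)).ravel()
print(min_cols)   # [0 1 3]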
I have two input matrices of shape
m1: (n,3)
m2: (n,3)
I want to multiply each row (each n of size 3) with its correspondence in the other matrix, such that I get a (3,3) matrix for each row.
When I try to just use e.g. m1[0] @ m2.T[0], the operation doesn't work, as m1[0] delivers a (3,) array instead of a (3,1) matrix, on which I could use matrix operations.
Is there a relatively easy or elegant way to get the desired (3,1) matrix for the matrix multiplication?
By default, numpy gets rid of the singleton dimension, as you have noticed.
You can use np.newaxis (or equivalently None, of which np.newaxis is an alias; this also works in pytorch) for the second axis to tell numpy to "invent" a new one.
import numpy as np
a = np.ones((3,3))
a[1].shape # this is (3,)
a[1,:].shape # this is (3,)
a[1][...,np.newaxis].shape # this is (3,1)
However, you can also use dot or outer directly:
>>> a = np.eye(3)
>>> np.outer(a[1], a[1])
array([[0., 0., 0.],
[0., 1., 0.],
[0., 0., 0.]])
>>> np.dot(a[1], a[1])
1.0
Generally, I would recommend using np.einsum for most matrix operations, as it is very elegant.
To obtain the row-wise outer product of the vectors contained in m1 and m2 of shape (n, 3) you could do the following:
import numpy as np
m1 = np.array([1, 2, 3]).reshape(1, 3)
m2 = np.array([1, 2, 3]).reshape(1, 3)
result = np.einsum("ni, nj -> nij", m1, m2)
print(result)
array([[[1, 2, 3],
        [2, 4, 6],
        [3, 6, 9]]])
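For n > 1 the same einsum call works unchanged; a quick check of my own:

import numpy as np

m1 = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)
m2 = np.array([[1, 0, 1], [2, 2, 2]])   # shape (2, 3)
result = np.einsum("ni, nj -> nij", m1, m2)
print(result.shape)   # (2, 3, 3) -- one 3x3 outer product per row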
I'm trying to multiply 2 matrices x, y with shapes (41,6) and (41,),
as I expect the 1-D array to be broadcast to every row of the 2-D one.
I want to do it as:
x*y
but I get this error:
ValueError: operands could not be broadcast together with shapes (41,6) (41,)
Is there anything I'm missing here to make that possible?
Broadcasting involves 2 steps
give all arrays the same number of dimensions
expand the size-1 dimensions to match the other arrays
With your inputs
(41,6) (41,)
one is 2d, the other 1d; broadcasting can change the 1d to (1, 41), but it does not automatically expand in the other direction (41,1).
(41,6) (1,41)
Neither (41,41) nor (6,41) matches the other.
So you need to change your y to (41,1) or your x to (6,41):
x.T*y
x*y[:,None]
I'm assuming, of course, that you want element by element multiplication, not the np.dot matrix product.
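To make that concrete, a small runnable sketch of my own (here x is the (41,6) array and y the (41,) one):

import numpy as np

x = np.ones((41, 6))
y = np.arange(41.)

out = x * y[:, None]   # y[:, None] has shape (41, 1) and broadcasts to (41, 6)
print(out.shape)                      # (41, 6)
print(np.allclose(out, (x.T * y).T))  # True -- the transpose route agrees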
You can try this out, it works!
>>> import numpy as np
>>> x = np.array([[1, 2], [1, 2], [1, 2]])
>>> y = np.array([1, 2, 3])
>>> np.dot(y,x)
array([ 6, 12])
Not exactly sure what you are trying to achieve. Maybe you could give an example of your input and your expected output. One possibility is:
import numpy as np
x = np.array([[1, 2], [1, 2], [1, 2]])
y = np.array([1, 2, 3])
res = x * np.transpose(np.array([y,]*2))
This will multiply each column of x with y, so the result of the above example is:
array([[1, 2],
[2, 4],
[3, 6]])
The multiplication of an ND array (say A) with a 1D one (B) is performed on the last axis by default, which means that the multiplication A * B is only valid if
A.shape[-1] == len(B)
A manipulation of A and B is needed to multiply A with B on an axis other than -1:
Method 1: swapaxes
Swap the axes of A so that the axis to multiply with B appears in the last position:
C = (A.swapaxes(axis, -1) * B).swapaxes(axis, -1)
example
A = np.arange(2 * 3 * 4).reshape((2, 3, 4))
B = np.array([0., 1., 2.])
print(A)
print(B)
which prints:
(A)
[[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
[[12 13 14 15]
[16 17 18 19]
[20 21 22 23]]]
(B)
[0. 1. 2.]
A * B returns:
ValueError: operands could not be broadcast together with shapes (2,3,4) (3,)
now multiply A with B on axis 1
axis = 1
C = (A.swapaxes(axis, -1) * B).swapaxes(axis, -1)
returns C:
array([[[ 0., 0., 0., 0.],
[ 4., 5., 6., 7.],
[16., 18., 20., 22.]],
[[ 0., 0., 0., 0.],
[16., 17., 18., 19.],
[40., 42., 44., 46.]]])
Note that the first rows of A have been multiplied by 0 and the last rows by 2.
Method 2: reshape B
make B have the same number of dimensions as A, placing the items of B on the dimension to be multiplied with A:
A * B.reshape((1, len(B), 1))
or equivalently using the convenient 'numpy.newaxis' syntax :
A * B[np.newaxis, :, np.newaxis]
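A quick self-contained check of my own that both methods agree for the A and B above:

import numpy as np

A = np.arange(2 * 3 * 4).reshape((2, 3, 4))
B = np.array([0., 1., 2.])
axis = 1

C1 = (A.swapaxes(axis, -1) * B).swapaxes(axis, -1)   # method 1
C2 = A * B.reshape((1, len(B), 1))                   # method 2
print(np.array_equal(C1, C2))   # True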
Depends on what you're expecting. One simple solution would be:
y @ x
That computes the matrix product and gives you a result of shape (6,).
If you wish to multiply X of shape (n,) by Y of shape (n,m), you may consider the answers from this post.
Tips can be found in Wikipedia as well:
In Python with the numpy numerical library or the sympy symbolic library, multiplication of array objects as a1*a2 produces the Hadamard product, but multiplication of matrix objects m1*m2 produces a matrix product.
Simply speaking, slice it into arrays and perform x*y, or use other routes to fit the requirement.
So, if x has shape (41,6) and y (41,), I'd use np.expand_dims() to add an empty, second dimension (index 1) to y, i.e.,
x * np.expand_dims(y, 1)
This will automatically yield a result with shape (41,6).
Are there good ways to "expand" a numpy ndarray? Say I have an ndarray like this:
[[1 2]
[3 4]]
And I want each row to contain more elements, by filling with zeros:
[[1 2 0 0 0]
[3 4 0 0 0]]
I know there must be some brute-force ways to do so (say, construct a bigger array of zeros then copy elements from the old smaller array), just wondering whether there are pythonic ways to do so. I tried numpy.reshape but it didn't work:
import numpy as np
a = np.array([[1, 2], [3, 4]])
np.reshape(a, (2, 5))
Numpy complains that: ValueError: total size of new array must be unchanged
You can use numpy.pad, as follows:
>>> import numpy as np
>>> a=[[1,2],[3,4]]
>>> np.pad(a, ((0,0),(0,3)), mode='constant', constant_values=0)
array([[1, 2, 0, 0, 0],
[3, 4, 0, 0, 0]])
Here np.pad says, "Take the array a and add 0 rows above it, 0 rows below it, 0 columns to the left of it, and 3 columns to the right of it. Fill these columns with a constant specified by constant_values".
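The same pad-width tuple generalizes to rows as well; for instance (my own variation), one extra row below:
>>> np.pad(a, ((0, 1), (0, 3)), mode='constant', constant_values=0)
array([[1, 2, 0, 0, 0],
       [3, 4, 0, 0, 0],
       [0, 0, 0, 0, 0]])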
There are the index tricks r_ and c_.
>>> import numpy as np
>>> a = np.array([[1, 2], [3, 4]])
>>> z = np.zeros((2, 3), dtype=a.dtype)
>>> np.c_[a, z]
array([[1, 2, 0, 0, 0],
[3, 4, 0, 0, 0]])
If this is performance critical code, you might prefer to use the equivalent np.concatenate rather than the index tricks.
>>> np.concatenate((a,z), axis=1)
array([[1, 2, 0, 0, 0],
[3, 4, 0, 0, 0]])
There are also np.resize and np.ndarray.resize, but they have some limitations (due to the way numpy lays out data in memory) so read the docstring on those ones. You will probably find that simply concatenating is better.
By the way, when I've needed to do this I usually just do it the basic way you've already mentioned (create an array of zeros and assign the smaller array inside it); I don't see anything wrong with that!
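For reference, that basic way is only two lines (a sketch, reusing the a defined above):
>>> out = np.zeros((2, 5), dtype=a.dtype)
>>> out[:, :a.shape[1]] = a
>>> out
array([[1, 2, 0, 0, 0],
       [3, 4, 0, 0, 0]])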
Just to be clear: there's no "good" way to extend a NumPy array, as NumPy arrays are not expandable. Once the array is defined, the space it occupies in memory, a combination of the number of its elements and the size of each element, is fixed and cannot be changed. The only thing you can do is to create a new array and replace some of its elements by the elements of the original array.
A lot of functions are available for convenience (the np.concatenate function and its np.*stack shortcuts, np.column_stack, the index routines np.r_ and np.c_...), but they are just that: convenience functions. Some of them are optimized at the C level (np.concatenate and others, I think), some are not.
Note that there's nothing wrong at all with your initial suggestion of creating a large array 'by hand' (possibly filled with zeros) and filling it yourself with your initial array. It might be more readable than more complicated solutions.
A simple way:
# what you want to expand
x = np.ones((3, 3))
# expand to what shape
target = np.zeros((6, 6))
# do expand
target[:x.shape[0], :x.shape[1]] = x
# print target
array([[ 1., 1., 1., 0., 0., 0.],
[ 1., 1., 1., 0., 0., 0.],
[ 1., 1., 1., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0.]])
Functional way:
Borrowed from https://stackoverflow.com/a/35751427/1637673, with a little modification.
import numpy as np

def pad(array, reference_shape, offsets=None):
    """
    array: array to be padded
    reference_shape: tuple giving the shape of the ndarray to create
    offsets: list of offsets (number of elements must be equal to the
        dimension of the array); will throw a ValueError if the offsets
        are too big and the reference_shape cannot accommodate them
    """
    if offsets is None:
        offsets = np.zeros(array.ndim, dtype=np.int32)
    # Create an array of zeros with the reference shape
    result = np.zeros(reference_shape, dtype=np.float32)
    # Create a list of slices from offset to offset + shape in each dimension
    insertHere = [slice(offsets[dim], offsets[dim] + array.shape[dim]) for dim in range(array.ndim)]
    # Insert the array in the result at the specified offsets
    result[tuple(insertHere)] = array
    return result
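An example call of my own, reproducing the question's padding:

a = np.array([[1, 2], [3, 4]])
print(pad(a, (2, 5)))
# [[1. 2. 0. 0. 0.]
#  [3. 4. 0. 0. 0.]]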
You should use np.column_stack or append:
import numpy as np
p = np.array([ [1,2] , [3,4] ])
p = np.column_stack( [ p , [ 0 , 0 ],[0,0] ] )
p
Out[277]:
array([[1, 2, 0, 0],
[3, 4, 0, 0]])
Append seems to be faster though:
timeit np.column_stack( [ p , [ 0 , 0 ],[0,0] ] )
10000 loops, best of 3: 61.8 us per loop
timeit np.append(p, [[0,0],[0,0]],1)
10000 loops, best of 3: 48 us per loop
And a comparison with np.c_ and np.hstack [append still seems to be the fastest]:
In [295]: z=np.zeros((2, 2), dtype=p.dtype)
In [296]: timeit np.c_[p, z]
10000 loops, best of 3: 47.2 us per loop
In [297]: timeit np.append(p, z,1)
100000 loops, best of 3: 13.1 us per loop
In [305]: timeit np.hstack((p,z))
10000 loops, best of 3: 20.8 us per loop
and np.concatenate [which is even a bit faster than append]:
In [307]: timeit np.concatenate((p, z), axis=1)
100000 loops, best of 3: 11.6 us per loop
There are also similar methods like np.vstack, np.hstack, np.dstack. I like these over np.concatenate as they make it clear which dimension is being "expanded".
temp = np.array([[1, 2], [3, 4]])
np.hstack((temp, np.zeros((2,3))))
It's easy to remember because numpy's first axis is vertical, so vstack expands the first axis, and the 2nd axis is horizontal, so hstack.
So numpy has some convenience functions for combining several arrays into one, e.g. hstack and vstack. I'm wondering if there's something similar but for stacking the component arrays diagonally?
Say I have N arrays of shape (n_i, m_i), and I want to combine them into a single array of size (sum_{1,N}n_i, sum_{1,N}m_i) such that the component arrays form blocks on the diagonal of the result array.
And yes, I know how to solve it manually, e.g. with the approach described in How to "embed" a small numpy array into a predefined block of a large numpy array?. Just wondering if there's an easier way.
Ah, How can I transform blocks into a blockdiagonal matrix (NumPy) mentions that scipy.linalg.block_diag() is the solution, except that the version of scipy installed on my workstation is so old it doesn't have it. Any other ideas?
It does seem block_diag does exactly what you want. If for some reason you can't update scipy, here is the source from v0.8.0 so you can simply define it yourself!
import numpy as np

def block_diag(*arrs):
    """Create a block diagonal matrix from the provided arrays.

    Given the inputs `A`, `B` and `C`, the output will have these
    arrays arranged on the diagonal::

        [[A, 0, 0],
         [0, B, 0],
         [0, 0, C]]

    If all the input arrays are square, the output is known as a
    block diagonal matrix.

    Parameters
    ----------
    A, B, C, ... : array-like, up to 2D
        Input arrays. A 1D array or array-like sequence with length n is
        treated as a 2D array with shape (1,n).

    Returns
    -------
    D : ndarray
        Array with `A`, `B`, `C`, ... on the diagonal. `D` has the
        same dtype as `A`.

    References
    ----------
    .. [1] Wikipedia, "Block matrix",
           http://en.wikipedia.org/wiki/Block_diagonal_matrix

    Examples
    --------
    >>> A = [[1, 0],
    ...      [0, 1]]
    >>> B = [[3, 4, 5],
    ...      [6, 7, 8]]
    >>> C = [[7]]
    >>> print(block_diag(A, B, C))
    [[1 0 0 0 0 0]
     [0 1 0 0 0 0]
     [0 0 3 4 5 0]
     [0 0 6 7 8 0]
     [0 0 0 0 0 7]]
    >>> block_diag(1.0, [2, 3], [[4, 5], [6, 7]])
    array([[ 1.,  0.,  0.,  0.,  0.],
           [ 0.,  2.,  3.,  0.,  0.],
           [ 0.,  0.,  0.,  4.,  5.],
           [ 0.,  0.,  0.,  6.,  7.]])
    """
    if arrs == ():
        arrs = ([],)
    arrs = [np.atleast_2d(a) for a in arrs]
    bad_args = [k for k in range(len(arrs)) if arrs[k].ndim > 2]
    if bad_args:
        raise ValueError("arguments in the following positions have dimension "
                         "greater than 2: %s" % bad_args)
    shapes = np.array([a.shape for a in arrs])
    out = np.zeros(np.sum(shapes, axis=0), dtype=arrs[0].dtype)
    r, c = 0, 0
    for i, (rr, cc) in enumerate(shapes):
        out[r:r + rr, c:c + cc] = arrs[i]
        r += rr
        c += cc
    return out
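A quick sanity check of the copied function (my own example):

A = np.array([[1, 2], [3, 4]])
B = np.array([[5]])
print(block_diag(A, B))
# [[1 2 0]
#  [3 4 0]
#  [0 0 5]]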