Row Division in Scipy Sparse Matrix - python

I want to divide a sparse matrix's rows by scalars given in an array.
For example, I have a csr_matrix C :
C = [[2,4,6], [5,10,15]]
D = [2,5]
I want the result of C after division to be :
result = [[1, 2, 3], [1, 2, 3]]
I have tried this using the method that we use for numpy arrays:
result = C / D[:,None]
But this seems really slow. How to do this efficiently in sparse matrices?

Approach #1
Here's a sparse matrix solution using manual replication with indexing -
from scipy.sparse import csr_matrix
r,c = C.nonzero()
rD_sp = csr_matrix(((1.0/D)[r], (r,c)), shape=(C.shape))
out = C.multiply(rD_sp)
The output is also a sparse matrix, as opposed to C / D[:,None], which creates a full (dense) matrix. As such, the proposed approach saves memory.
Possible performance boost with replication using np.repeat instead of indexing -
val = np.repeat(1.0/D, C.getnnz(axis=1))
rD_sp = csr_matrix((val, (r,c)), shape=(C.shape))
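To check Approach #1 end to end, here is a small self-contained sketch that runs both replication variants on the example from the question. It assumes D is a NumPy array (1.0/D fails on a plain list) and that C has no explicitly stored zeros, because getnnz counts stored entries while nonzero skips explicit zeros.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Example data from the question; D must be a NumPy array so that 1.0/D works
C = csr_matrix(np.array([[2, 4, 6], [5, 10, 15]]))
D = np.array([2, 5])

# Row and column indices of the stored nonzeros, in row-major order
r, c = C.nonzero()

# Variant A: replicate 1/D once per nonzero by fancy indexing with r
rD_sp = csr_matrix(((1.0 / D)[r], (r, c)), shape=C.shape)
out_a = C.multiply(rD_sp)

# Variant B: replicate 1/D with np.repeat, using the per-row stored counts
val = np.repeat(1.0 / D, C.getnnz(axis=1))
rD_sp = csr_matrix((val, (r, c)), shape=C.shape)
out_b = C.multiply(rD_sp)

print(out_a.toarray())
print(out_b.toarray())
```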
Approach #2
Another approach could use the data attribute of the sparse matrix, which exposes the stored values as a flat array, for in-place results that also avoid the use of nonzero (C needs a floating-point dtype for the in-place division), like so -
val = np.repeat(D, C.getnnz(axis=1))
C.data /= val
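A runnable sketch of Approach #2 on the question's data, assuming C is a CSR matrix with a float dtype. In CSR the values in C.data are laid out row by row, which is why repeating D by the per-row counts lines each value up with its row's divisor.

```python
import numpy as np
from scipy.sparse import csr_matrix

# C.data is divided in place, so C needs a floating-point dtype;
# an integer matrix would raise an error on true division with /=
C = csr_matrix(np.array([[2, 4, 6], [5, 10, 15]], dtype=float))
D = np.array([2, 5])

# Repeat each divisor once per stored entry in its row (CSR stores rows in order)
val = np.repeat(D, C.getnnz(axis=1))
C.data /= val

print(C.toarray())
```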

Question: I want to divide a sparse matrix's rows by scalars given in an array.
For example:
C = [[2,4,6], [5,10,15]]
D = [2,5]
Answer: use "multiply" provided by the sparse matrix interface. It multiplies pointwise (elementwise) by matrices as well as by vectors and scalars, so multiplying by the reciprocals of D arranged as a column scales each row.
import numpy as np
from scipy.sparse import csr_matrix
C = [[2,4,6], [5,10,15]]
D = [2,5]
c = csr_matrix(C)
c2 = c.multiply(1/np.array(D).reshape(2,1))
c2.toarray()
output: array([[1., 2., 3.],
       [1., 2., 3.]])
PS
Thanks to Alexander Kirillin

One-line code (it works on plain nested Python lists rather than on a csr_matrix, and returns a dense list of lists):
C = [[2,4,6], [5,10,15]] # len(C[0]) = 3
D = [2,5] # len(D) = 2
result = [[C[i][j]/D[i] for j in range(len(C[0]))] for i in range(len(D))]
print(result)

If you first convert D to a numpy.matrix (e.g. D = np.matrix(D), which I'm assuming you can do unless D is too big to fit into memory), then D.T is a column vector and you can just run
C.multiply(1.0 / D.T)
to get what you want.
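A further option, which is a different technique from the answers above: dividing row i by D[i] is the same as left-multiplying by the diagonal matrix diag(1/D). A minimal sketch with scipy.sparse.diags, assuming D is a 1-D NumPy array; the result stays sparse.

```python
import numpy as np
from scipy import sparse

C = sparse.csr_matrix(np.array([[2, 4, 6], [5, 10, 15]]))
D = np.array([2, 5])

# diag(1/D) @ C multiplies row i of C by 1/D[i]; the product is still sparse
scale = sparse.diags(1.0 / D)
result = (scale @ C).tocsr()

print(result.toarray())
```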

Related

Check how many numpy array within a numpy array are equal to other numpy arrays within another numpy array of different size

My problem
Suppose I have
a = np.array([ np.array([1,2]), np.array([3,4]), np.array([5,6]), np.array([7,8]), np.array([9,10])])
b = np.array([ np.array([5,6]), np.array([1,2]), np.array([3,192])])
They are two arrays of different sizes containing other arrays (the inner arrays all have the same size!)
I want to count how many items of b (i.e. inner arrays) are also in a. Notice that I am not considering their position!
How can I do that?
My Try
count = 0
for bitem in b:
    for aitem in a:
        if np.array_equal(aitem, bitem):  # aitem == bitem alone gives an array, not a bool
            count += 1
Is there a better way, ideally in one line, maybe with a comprehension?
The numpy_indexed package contains efficient (nlogn, generally) and vectorized solutions to these types of problems:
import numpy_indexed as npi
count = len(npi.intersection(a, b))
Note that this is subtly different from your double loop; for instance, it discards duplicate entries in a and b. If you want to retain duplicates in b, this would work:
count = npi.in_(b, a).sum()
Duplicate entries in a could also be handled by doing npi.count(a) and factoring in the result; but anyway, I'm just rambling on for illustration purposes, since I imagine the distinction probably does not matter to you.
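If you would rather not add a dependency, a plain-Python sketch with the same counting rule as npi.in_(b, a).sum() (duplicates in b counted, duplicates in a ignored) hashes each row of a as a tuple. This is a swapped-in technique, not part of numpy_indexed, and it assumes the rows hold hashable scalars such as ints.

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]])
b = np.array([[5, 6], [1, 2], [3, 192]])

# Hash each row of a as a tuple; set membership is O(1) on average
rows_in_a = set(map(tuple, a.tolist()))

# Count the rows of b (keeping duplicates in b) that appear anywhere in a
count = sum(tuple(row) in rows_in_a for row in b.tolist())
print(count)  # 2
```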
Here is a simple way to do it:
a = np.array([ np.array([1,2]), np.array([3,4]), np.array([5,6]), np.array([7,8]), np.array([9,10])])
b = np.array([ np.array([5,6]), np.array([1,2]), np.array([3,192])])
count = np.count_nonzero(
np.any(np.all(a[:, np.newaxis, :] == b[np.newaxis, :, :], axis=-1), axis=0))
print(count)
# output: 2
You can do what you want in one liner as follows:
count = sum([np.array_equal(x,y) for x,y in product(a,b)])
Explanation
1. Iterate over the two arrays with itertools.product, which creates an iterator over the Cartesian product of the two arrays.
2. Compare the two arrays in each tuple (x, y) from step 1 using np.array_equal.
3. Sum the booleans: True counts as 1 when you sum a list.
Full example:
import numpy as np
from itertools import product
a = np.array([ np.array([1,2]), np.array([3,4]), np.array([5,6]), np.array([7,8]), np.array([9,10])])
b = np.array([ np.array([5,6]), np.array([1,2]), np.array([3,192])])
count = sum([np.array_equal(x,y) for x,y in product(a,b)])
# output: 2
You can view each row as a single np.void value and then use np.in1d on the resulting arrays (np.in1d flattens them):
def void_arr(a):
    return np.ascontiguousarray(a).view(np.dtype((np.void, a.dtype.itemsize * a.shape[1])))
b[np.in1d(void_arr(b), void_arr(a))]
array([[5, 6],
[1, 2]])
If you just want the number of intersections, it's
np.in1d(void_arr(b), void_arr(a)).sum()
2
Note: if there are repeat items in b or a, then np.in1d(void_arr(b), void_arr(a)).sum() likely won't be equal to np.in1d(void_arr(a), void_arr(b)).sum(). I've reversed the order from my original answer to match your question (i.e. how many elements of b are in a?)
For more information, see the third answer here
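np.in1d is deprecated as of NumPy 2.0 in favor of np.isin. A sketch of the same void-view trick with np.isin, assuming integer rows (a byte-level comparison can treat float values such as 0.0 and -0.0 as different). The helper here also ravels the (k, 1) view to 1-D so the result can be used directly as a boolean mask.

```python
import numpy as np

def void_arr(x):
    # View each row as one opaque np.void scalar so whole rows compare as units
    x = np.ascontiguousarray(x)
    return x.view(np.dtype((np.void, x.dtype.itemsize * x.shape[1]))).ravel()

a = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]])
b = np.array([[5, 6], [1, 2], [3, 192]])

# Which rows of b also occur in a?
mask = np.isin(void_arr(b), void_arr(a))
print(b[mask])
print(mask.sum())
```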

numpy multidimensional (3d) matrix multiplication

I have two 3D arrays A (32x3x3) and B (32x3x3), and I want to get an array C with dimensions 32x3x3. The calculation can be done with a loop like:
import numpy
a = numpy.random.rand(32, 3, 3)
b = numpy.random.rand(32, 3, 3)
c = numpy.empty((32, 3, 3))
for i in range(32):
    c[i] = numpy.dot(a[i], b[i])
I believe there must be a more efficient one-line solution to this problem. Can anybody help? Thanks.
You could do this using np.einsum (orig(a, b) below is presumably the loop above wrapped in a function):
In [142]: old = orig(a,b)
In [143]: new = np.einsum('ijk,ikl->ijl', a, b)
In [144]: np.allclose(old, new)
Out[144]: True
One advantage of using einsum is that you can almost read off what it's doing from the indices: leave the first axis alone (i), and perform a matrix multiplication on the last two (jk,kl->jl).
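np.matmul (the @ operator) also broadcasts over leading axes: for 3-D inputs it multiplies a[i] by b[i] for every i. A short sketch comparing it with the loop and the einsum call on random data:

```python
import numpy as np

a = np.random.rand(32, 3, 3)
b = np.random.rand(32, 3, 3)

# Reference result: the loop from the question
loop = np.empty((32, 3, 3))
for i in range(32):
    loop[i] = np.dot(a[i], b[i])

# np.matmul treats the first axis as a batch and multiplies the 3x3 matrices pairwise
batched = a @ b
via_einsum = np.einsum('ijk,ikl->ijl', a, b)

print(np.allclose(loop, batched), np.allclose(loop, via_einsum))  # True True
```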

Numpy - compute all possible differences in an array at fixed distance

Suppose I have an array, and I want to compute differences between elements at a distance Delta. I can use numpy.diff(Array[::Delta-1]), but this will not give all possible differences (from each possible starting point). To get them, I can think of something like this:
for j in range(Delta-1):
    NewDiff = numpy.diff(Array[j::Delta-1])
    if j == 0:
        Diff = NewDiff
    else:
        Diff = numpy.hstack((Diff, NewDiff))
But I would be surprised if this is the most efficient way to do it. Any ideas from those familiar with the more esoteric functionality of numpy?
The following function returns a two-dimensional numpy array diff that contains the differences between all possible pairs of elements of a list or numpy array a. For example, diff[3,2] contains the result of a[3] - a[2], and so on.
def difference_matrix(a):
    x = np.reshape(a, (len(a), 1))
    return x - x.transpose()
Update
It seems I misunderstood the question: you are only asking for the differences of array elements that are a certain distance d apart.1)
This can be accomplished as follows:
>>> a = np.array([1,3,7,11,13,17,19])
>>> d = 2
>>> a[d:] - a[:-d]
array([6, 8, 6, 6, 6])
Have a look at the documentation to learn more about this notation.
But the difference-matrix function I posted above is not in vain: the array you're looking for is a diagonal of the matrix that difference_matrix returns.
>>> a = [1,3,7,11,13,17,19]
>>> d = 2
>>> m = difference_matrix(a)
>>> np.diag(m, -d)
array([6, 8, 6, 6, 6])
1) Judging by your comment, this distance d is different than the Delta you seem to be using, with d = Delta - 1, so that the distance between an element and itself is 0, and its distance to the adjacent elements is 1.
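To see that the slicing answer collects exactly the differences the question's loop collects (only in a different order), here is a small comparison using d = Delta - 1 as in the footnote:

```python
import numpy as np

a = np.array([1, 3, 7, 11, 13, 17, 19])
d = 2  # distance between elements, i.e. Delta - 1 in the question's notation

# Question's approach: diff every d-th element, starting from each offset j
loop = np.hstack([np.diff(a[j::d]) for j in range(d)])

# Vectorized approach: a[i + d] - a[i] for every valid i
vec = a[d:] - a[:-d]

print(loop)  # [6 6 6 8 6]
print(vec)   # [6 8 6 6 6]
print(np.array_equal(np.sort(loop), np.sort(vec)))  # True
```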

Python column addition of numpy arrays with shift

How can I accomplish column addition with a shift using Python numpy arrays?
I have a two-dimensional array and need an extended copy of it.
a = array([[0, 2, 4, 6, 8],
[1, 3, 5, 7, 9]])
I want something like this (pseudocode; it doesn't work, since as far as I know numpy has no a.columns):
shift = 3
mult_factor = 0.7
for column in a.columns - shift:
    out[column] = a[column] + mult_factor * a[column + shift]
I also know that I can do something similar using indexes, but it seems like overkill to enumerate three values and use only one (j):
for (i,j),value in np.ndenumerate(a):
    print(i, j)
I found that I could iterate over columns, but not over their indexes:
for column in a.T:
    print(column)
Then I thought I could simply do this with something similar to xrange, but for multidimensional arrays:
In [225]: for column in np.ndindex(a.shape[1]):
   .....:     print(column)
   .....:
(0,)
(1,)
(2,)
(3,)
(4,)
So for now I only know how to do this with a simple range loop, and I'm not sure that is the best solution.
out = np.zeros(a.shape)
shift = 2
mult_factor = 0.7
for i in range(a.shape[1]-shift):
    print(a[:, i])
    out[:, i] = a[:, i] + mult_factor * a[:, i+shift]
However, this will not be as fast in Python as it could be.
Can you advise me on its performance, and is there a faster way to accomplish column addition of numpy arrays with a shift?
out = a[:, :-shift] + mult_factor * a[:, shift:]
I think this is what you're looking for. It's a vectorized form of your loop, operating on large slices of a instead of column by column.
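The sliced result has a.shape[1] - shift columns. If you want the same full-width array as the question's loop (zeros in the last shift columns), you can assign the slice into a preallocated array. A sketch assuming shift >= 1 (with shift = 0, a[:, :-0] would be empty):

```python
import numpy as np

a = np.array([[0, 2, 4, 6, 8],
              [1, 3, 5, 7, 9]])
shift = 2
mult_factor = 0.7

# Loop from the question
out_loop = np.zeros(a.shape)
for i in range(a.shape[1] - shift):
    out_loop[:, i] = a[:, i] + mult_factor * a[:, i + shift]

# Same result with one sliced assignment; the last `shift` columns stay zero
out = np.zeros(a.shape)
out[:, :-shift] = a[:, :-shift] + mult_factor * a[:, shift:]

print(np.allclose(out, out_loop))  # True
```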
I'm not positive I completely understand what the computed quantity should be, but here are two things that seem germane to what you are asking:
If you have a 2D array a that you wish to convert to a list of 1D arrays (the columns of a), you can do this
cols = [c for c in a.T]
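It looks like what you want can be accomplished with matrix multiplication, if I am not mistaken. You could make a banded matrix in numpy using numpy.diag or, since each band holds a single value (1, mult_factor, or 0), you could use scipy.linalg.toeplitz. T has to be n x n, where n is the number of columns of a:
import numpy as np
import scipy.linalg
n = a.shape[1]
band = np.eye(1, n)[0]  # first row of T: [1, 0, ..., 0]
band[shift] = mult_factor  # mult_factor on the shift-th superdiagonal
T = scipy.linalg.toeplitz(np.eye(1, n)[0], band)  # n x n banded Toeplitz matrix
out = np.inner(a, T)  # out[:, i] = a[:, i] + mult_factor * a[:, i + shift]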
For large matrices, it might make sense to use a sparse matrix for T if you only want to add two or a few columns of a.
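Following the suggestion to use a sparse T, here is a sketch with scipy.sparse.diags, a swapped-in constructor rather than toeplitz, with 1 on the main diagonal and mult_factor on the shift-th superdiagonal. Unlike the question's loop, the last shift columns of out equal the corresponding columns of a rather than zero.

```python
import numpy as np
from scipy import sparse

a = np.array([[0, 2, 4, 6, 8],
              [1, 3, 5, 7, 9]], dtype=float)
shift = 2
mult_factor = 0.7
n = a.shape[1]

# Sparse n x n banded matrix: 1 on the main diagonal, mult_factor on the
# shift-th superdiagonal
T = sparse.diags([1.0, mult_factor], [0, shift], shape=(n, n))

# Row i of T combines a[:, i] + mult_factor * a[:, i + shift];
# sparse @ dense returns a dense ndarray, transposed back to the shape of a
out = (T @ a.T).T
print(out)
```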

how to create a sparse matrix from lists of numbers

I have three lists, namely A, B and C.
All these lists contain 97510 items. I need to create a sparse matrix like this
matrix[A[0]][B[0]] = C[0]
For example ,
A=[1,2,3,4,5]
B=[7,8,9,10,11]
C=[14,15,16,17,18]
I need to create a sparse matrix with
matrix[1][7] = 14 # (which is C[0])
matrix[2][8] = 15 # and so on .....
I tried, and Python gives me an error saying "Index values must be continuous".
How do I do it?
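I suggest you have a look at the SciPy sparse matrices, e.g. a COO sparse matrix. The shape has to be large enough for the largest row index in A and column index in B (or it can be left out so that SciPy infers it):
from scipy import sparse
matrix = sparse.coo_matrix((C,(A,B)),shape=(max(A)+1, max(B)+1))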
Note: I just took the COO matrix because it was in the example; you can take any other. You will probably have to try which one is most suitable for your situation. They differ in how the data is compressed, which affects the performance of certain operations.
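A self-contained sketch with the small example lists from the question; it assumes A holds row indices and B column indices, as in matrix[A[i]][B[i]] = C[i]. Note that duplicate (row, col) pairs are summed when a COO matrix is converted, rather than overwritten.

```python
from scipy import sparse

A = [1, 2, 3, 4, 5]
B = [7, 8, 9, 10, 11]
C = [14, 15, 16, 17, 18]

# A gives the row indices, B the column indices and C the values;
# the shape must cover the largest index in each list
matrix = sparse.coo_matrix((C, (A, B)), shape=(max(A) + 1, max(B) + 1))

# COO does not support indexing, so convert to CSR to read entries back
m = matrix.tocsr()
print(m.shape)  # (6, 12)
print(m[1, 7])  # 14
print(m[2, 8])  # 15
```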
If you simply need a way to get matrix[A[0]][B[0]] = C[0], you can use the following:
A=[1,2,3,4,5]
B=[7,8,9,10,11]
C=[14,15,16,17,18]
matrix = dict((v,{B[i]:C[i]}) for i, v in enumerate(A))
EDITED (thanks to gnibbler):
A = [1,2,3,4,5]
B = [7,8,9,10,11]
C = [14,15,16,17,18]
matrix = dict(((v, B[i]), C[i]) for i, v in enumerate(A))
It is very simple to use a dict, especially if you are willing to change the way you write the indices slightly
>>> A=[1,2,3,4,5]
>>> B=[7,8,9,10,11]
>>> C=[14,15,16,17,18]
>>> matrix=dict(((a,b),c) for a,b,c in zip(A,B,C))
>>> matrix[1,7]
14
>>> matrix[2,8]
15
>>>
Have a look at numpy/scipy, which support sparse matrices. See e.g. here
