Python (NumPy): Memory efficient array multiplication with fancy indexing

I'm looking to do fast matrix multiplication in Python, preferably with NumPy, of an array A with another array B of stacked matrices, where a third array I of indices selects which matrix in A pairs with each matrix in B. This can be accomplished using fancy indexing and matrix multiplication:
from numpy.random import rand, randint
A = rand(1000,5,5)
B = rand(40000000,5,1)
I = randint(low=0, high=1000, size=40000000)
A[I] @ B
However, this creates the intermediate array A[I] of shape (40000000, 5, 5), about 8 GB as float64, which exhausts the available memory. It seems highly inefficient to have to repeat a small set of matrices for multiplication, and this is essentially a more general version of broadcasting such as A[0:1] @ B, which has no issues.
Are there any alternatives?
I have looked at NumPy's einsum function but have not seen any support for utilizing an index vector in the call.

If you're open to another package, you could wrap it up with dask.
from numpy.random import rand, randint
from dask import array as da
A = da.from_array(rand(1000,5,5))
B = da.from_array(rand(40000000,5,1))
I = da.from_array(randint(low=0, high=1000, size=40000000))
fancy = A[I] @ B
The expression above is lazy. Once you have finished manipulating it, bring the result into memory using fancy.compute()
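If you would rather stay in plain NumPy, one option is to process the index array in chunks so that only a small slice of A[I] exists at any time. This is a minimal sketch, not a tuned implementation: the helper name indexed_matmul_chunked and the chunk_size parameter are made up for illustration, and the arrays are shrunk so the example runs quickly.

```python
import numpy as np

def indexed_matmul_chunked(A, B, I, chunk_size=100_000):
    """Compute A[I] @ B piece by piece (hypothetical helper).

    Only chunk_size matrices of A[I] are materialized at once.
    """
    n = I.shape[0]
    out = np.empty((n, A.shape[1], B.shape[2]), dtype=np.result_type(A, B))
    for start in range(0, n, chunk_size):
        stop = min(start + chunk_size, n)
        # Fancy indexing on a slice of I creates a small temporary only
        out[start:stop] = A[I[start:stop]] @ B[start:stop]
    return out

# Small stand-ins for the arrays in the question
rng = np.random.default_rng(0)
A = rng.random((10, 5, 5))
B = rng.random((1000, 5, 1))
I = rng.integers(0, 10, size=1000)

result = indexed_matmul_chunked(A, B, I, chunk_size=128)
```

The output array still has to fit in memory (shape (40000000, 5, 1) in the question, roughly 1.6 GB as float64), but the (40000000, 5, 5) intermediate is never built.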

Related

How to get chunks of submatrices faster?

I have a really big matrix (n x n) from which I would like to build overlapping tiles (submatrices) of dimensions m x m. There will be an offset of step between contiguous submatrices. Here is an example for n=8, m=4, step=2:
import numpy as np
matrix=np.random.randn(8,8)
n=matrix.shape[0]
m=4
step=2
This will store all the corner indices (x,y) from which we will take a 4x4 matrix: (x:x+4, y:y+4)
a={(i,j) for i in range(0,n-m+1,step) for j in range(0,n-m+1,step)}
The submatrices will be extracted like this:
sub_matrices = np.zeros([m,m,len(a)])
for i,ind in enumerate(a):
    x,y=ind
    sub_matrices[:,:,i]=matrix[x:x+m, y:y+m]
Is there a faster way to do this submatrices initialization?

We can leverage scikit-image's view_as_windows, which is built on np.lib.stride_tricks.as_strided, to get sliding windows.
from skimage.util.shape import view_as_windows
# Get indices as array
ar = np.array(list(a))
# Get all sliding windows
w = view_as_windows(matrix,(m,m))
# Get selective ones by indexing with ar
selected_windows = np.moveaxis(w[ar[:,0],ar[:,1]],0,2)
Alternatively, we can extract the row and col indices with a list comprehension and then index with those, like so -
R = [i[0] for i in a]
C = [i[1] for i in a]
selected_windows = np.moveaxis(w[R,C],0,2)
Optimizing from the start, we can skip creating the set of corner indices a and simply pass the step argument to view_as_windows, like so:
view_as_windows(matrix,(m,m),step=2)
This gives us a 4D array (of shape (3, 3, 4, 4) for the example above), and indexing into its first two axes yields each of the mxm windows. These windows are simply views into the input, so there is no extra memory overhead and the runtime is virtually free.
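If scikit-image is not available, a similar result can be sketched with NumPy alone, assuming NumPy 1.20 or newer, which provides sliding_window_view. Slicing the first two axes with the step plays the role of the step argument above.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

matrix = np.random.randn(8, 8)
m, step = 4, 2

# All m x m windows as a read-only view, then keep every step-th corner
windows = sliding_window_view(matrix, (m, m))[::step, ::step]
# windows[i, j] corresponds to matrix[i*step:i*step+m, j*step:j*step+m]

# Optional: match the (m, m, number_of_windows) layout from the question.
# Note that this reshape copies the data, unlike the view above.
sub_matrices = np.moveaxis(windows.reshape(-1, m, m), 0, 2)
```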

import numpy as np
a = np.random.randn(n, n)
b = a[0:m*step:step, 0:m*step:step]
If you have a one-dimensional array, you can take a strided slice of it with the following code:
c = a[start:end:step]
If the array has two or more dimensions, separate the slice for each dimension with a comma:
d = a[start1:end1:step1, start2:end2:step2]

"Killed: 9" error when trying to construct a Scipy csr_matrix from a large NumPy array

I'm trying to solve a Markov chain problem in which the transition matrix has about 150,000 rows and columns but is sparse (only about 450,000 elements are nonzero).
I notice that trying to construct a csr_matrix from a np.zeros array of that size leads to a Killed: 9 error:
In [139]: N = 150000
In [140]: T = np.zeros((N, N))
In [142]: import scipy.sparse
In [143]: _T = scipy.sparse.csr_matrix(T)
Killed: 9
Is it possible to construct a csr_matrix of this size? Do I need to initialize the matrix T as a csr_matrix and dispense with NumPy arrays altogether?

Your process is most likely "Killed: 9" because it uses too much memory and the OS terminates it: a dense float64 array of shape (150000, 150000) needs about 180 GB (150,000 x 150,000 x 8 bytes). As suggested in the comments, you can construct an empty sparse matrix directly by passing the shape to csr_matrix:
_T = scipy.sparse.csr_matrix((N,N))
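To fill such a matrix without ever allocating the dense array, one common pattern is to collect the nonzero entries as (row, column, value) triplets and build the matrix from them. The sketch below uses a made-up transition rule (stay with probability 0.5, move to the next state with probability 0.5) purely as placeholder data; substitute the real nonzero entries of your chain.

```python
import numpy as np
import scipy.sparse

N = 150_000
states = np.arange(N)

# Placeholder nonzeros (an assumption for illustration only):
# each state stays put or moves to the next state, each with probability 0.5
rows = np.concatenate([states, states])
cols = np.concatenate([states, (states + 1) % N])
vals = np.full(2 * N, 0.5)

# Build from triplets in COO format, then convert to CSR for arithmetic
T = scipy.sparse.coo_matrix((vals, (rows, cols)), shape=(N, N)).tocsr()
```

Only the nonzero entries and their indices are stored, so memory use scales with the number of nonzeros rather than with N squared.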

How to use a sparse matrix in numpy.linalg.solve

I want to solve the following linear system for x
Ax = b
where A is sparse and b is just a regular column matrix. When I plug these into the usual np.linalg.solve(A,b) routine, it gives me an error, but np.linalg.solve(A.todense(),b) works fine.
Question.
How can I solve this linear system while preserving the sparseness of A? The reason is that A is quite large, about 150 x 150, and there are about 50 such matrices, so I'd prefer to keep them sparse for as long as possible.
I hope my question makes sense. How should I go about achieving this?

Use scipy instead to work on sparse matrices. You can do that using scipy.sparse.linalg.spsolve. For further details, read the spsolve documentation.

np.linalg.solve only works for array-like objects. For example it would work on a np.ndarray or np.matrix (Example from the numpy documentation):
import numpy as np
a = np.array([[3,1], [1,2]])
b = np.array([9,8])
x = np.linalg.solve(a, b)
or
import numpy as np
a = np.matrix([[3,1], [1,2]])
b = np.array([9,8])
x = np.linalg.solve(a, b)
or on A.todense() where A=scipy.sparse.csr_matrix(np.matrix([[3,1], [1,2]])) as this returns a np.matrix object.
To work with a sparse matrix, you have to use scipy.sparse.linalg.spsolve (as already pointed out by rakesh)
import numpy as np
import scipy.sparse
import scipy.sparse.linalg
a = scipy.sparse.csr_matrix(np.matrix([[3,1], [1,2]]))
b = np.array([9,8])
x = scipy.sparse.linalg.spsolve(a, b)
Note that x is still a np.ndarray and not a sparse matrix. A sparse matrix is returned only if you solve Ax=b with b itself being a sparse matrix rather than a dense vector.
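For the situation in the question (several sparse 150 x 150 systems), the same call can simply be repeated per matrix. The following is a rough sketch with randomly generated test matrices; the sizes, the density, and the diagonal shift used to keep each matrix nonsingular are assumptions for the demo, not part of the original problem.

```python
import numpy as np
import scipy.sparse
from scipy.sparse.linalg import spsolve

n = 150           # matrix size from the question
num_systems = 5   # the question mentions about 50; fewer here for brevity
rng = np.random.default_rng(0)

solutions = []
residuals = []
for k in range(num_systems):
    # Random sparse test matrix; adding n * I makes it diagonally dominant,
    # hence nonsingular (demo data only)
    A = scipy.sparse.random(n, n, density=0.05, format="csc", random_state=k)
    A = (A + n * scipy.sparse.identity(n, format="csc")).tocsc()
    b = rng.random(n)

    x = spsolve(A, b)  # A stays sparse; x is a dense 1-D ndarray
    solutions.append(x)
    residuals.append(np.linalg.norm(A @ x - b))
```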

python numpy vector math

What is the numpy equivalent to euclid's 2d vector classes / operations ? ( like: euclid.Vector2 )
So far I have this. Create two vectors
import numpy as np
loc = np.array([100., 100.])
vel = np.array([30., 10])
loc += vel
# resetting speed to a default value, maintaining direction
vel.normalize()
vel *= 200
loc += vel

You can just use numpy arrays. Look at the numpy for matlab users page for a detailed overview of the pros and cons of arrays w.r.t. matrices.
As I mentioned in the comment, having to use the dot() function or method for multiplication of vectors is the biggest pitfall. But then again, numpy arrays are consistent. All operations are element-wise. So adding or subtracting arrays and multiplication with a scalar all work as expected of vectors.
Edit2: Starting with Python 3.5 and numpy 1.10 you can use the @ infix operator for matrix multiplication, thanks to PEP 465.
Edit: Regarding your comment:
1. Yes. The whole of numpy is based on arrays.
2. Yes. linalg.norm(v) is a good way to get the length of a vector. But what you get depends on the possible second argument to norm! Read the docs.
3. To normalize a vector, just divide it by the length you calculated in (2). Division of arrays by a scalar is also element-wise.
An example in ipython:
In [1]: import math
In [2]: import numpy as np
In [3]: a = np.array([4,2,7])
In [4]: np.linalg.norm(a)
Out[4]: 8.3066238629180749
In [5]: math.sqrt(sum([n**2 for n in a]))
Out[5]: 8.306623862918075
In [6]: b = a/np.linalg.norm(a)
In [7]: np.linalg.norm(b)
Out[7]: 1.0
Note that In [5] is an alternative way to calculate the length. In [6] shows normalizing the vector.
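Applied to the code in the question, this could look roughly like the sketch below. The normalize function is a small helper written for this example (NumPy arrays have no normalize method), and returning a zero vector unchanged is just one possible convention.

```python
import numpy as np

def normalize(v):
    """Return v scaled to unit length (hypothetical helper)."""
    length = np.linalg.norm(v)
    if length == 0:
        return v.copy()  # assumption: leave the zero vector as it is
    return v / length

loc = np.array([100., 100.])
vel = np.array([30., 10.])
loc += vel

# Reset the speed to 200 while keeping the direction
vel = normalize(vel) * 200
loc += vel
```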

Python - sparse vectors/distance calculation

I'm looking for dynamically growing vectors in Python, since I don't know their length in advance. In addition, I would like to calculate distances between these sparse vectors, preferably using the distance functions in scipy.spatial.distance (although any other suggestions are welcome). Any ideas how to do this? (Initially, it doesn't need to be efficient.)
Thanks a lot in advance!

You can use regular python lists (which are dynamic) as vectors. Trivial example follows.
from scipy.spatial.distance import sqeuclidean
a = [1,2,3]
b = [0,0,0]
print(sqeuclidean(a,b)) # 14
As per aganders3's suggestion, do note that you can also use numpy arrays if needed:
import numpy
a = numpy.array([1,2,3])
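If the sparse part of your question is crucial, I'd use scipy for that: it has support for sparse matrices. You can define a 1xn matrix and use it as a vector (the parameter is the shape of the matrix, which is filled with zeros by default). The functions in scipy.spatial.distance expect dense 1-D input, so convert each sparse vector with toarray().ravel() before the call:
sqeuclidean(scipy.sparse.coo_matrix((1,3)).toarray().ravel(), scipy.sparse.coo_matrix((1,3)).toarray().ravel()) # 0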
There are many kinds of sparse matrices, some dictionary based (see comment). You can define a sparse row matrix from a list like this:
scipy.sparse.csr_matrix([1,2,3])
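If you want to keep the vectors sparse while computing the distance, the squared Euclidean distance can also be sketched directly with sparse arithmetic instead of scipy.spatial.distance. The vectors u and v below are made-up example data.

```python
import numpy as np
import scipy.sparse

# Two sparse 1 x 5 row vectors (example values)
u = scipy.sparse.csr_matrix([[1, 0, 0, 2, 0]])
v = scipy.sparse.csr_matrix([[0, 0, 3, 2, 0]])

# The difference stays sparse; multiply() is the element-wise product
diff = u - v
sq_dist = diff.multiply(diff).sum()
```

Taking np.sqrt(sq_dist) gives the ordinary Euclidean distance.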

Here is how you can do it in numpy:
import numpy as np
a = np.array([1, 2, 3])
b = np.array([0, 0, 0])
c = np.sum(((a - b) ** 2)) # 14
