Large point-matrix array multiplication in numpy - python

Given two large numpy arrays, one for a list of 3D points, and another for a list of transformation matrices. Assuming there is a 1 to 1 correspondence between the two lists, i'm looking for the best way to calculate the result array of each point transformed by it's corresponding matrix.
My solution to do this was to use slicing (see "test4" in the example code below) which worked fine with small arrays, but fails with large arrays because of how memory-wasteful my method is :)
import numpy as np
COUNT = 100
matrix = np.random.random_sample((3,3,)) # A single matrix
matrices = np.random.random_sample((COUNT,3,3,)) # Many matrices
point = np.random.random_sample((3,)) # A single point
points = np.random.random_sample((COUNT,3,)) # Many points
# Test 1, result of a single point multiplied by a single matrix
# This is as easy as it gets
test1 = np.dot(point,matrix)
print 'done'
# Test 2, result of a single point multiplied by many matrices
# This works well and returns a transformed point for each matrix
test2 = np.dot(point,matrices)
print 'done'
# Test 3, result of many points multiplied by a single matrix
# This works also just fine
test3 = np.dot(points,matrix)
print 'done'
# Test 4, this is the case i'm trying to solve. Assuming there's a 1-1
# correspondence between the point and matrix arrays, the result i want
# is an array of points, where each point has been transformed by it's
# corresponding matrix
test4 = np.zeros((COUNT,3))
for i in xrange(COUNT):
test4[i] = np.dot(points[i],matrices[i])
print 'done'
With a small array, this works fine. With large arrays, (COUNT=1000000) Test #4 works but gets rather slow.
Is there a way to make Test #4 faster? Presuming without using a loop?

You can use numpy.einsum. Here's an example with 5 matrices and 5 points:
In [49]: matrices.shape
Out[49]: (5, 3, 3)
In [50]: points.shape
Out[50]: (5, 3)
In [51]: p = np.einsum('ijk,ik->ij', matrices, points)
In [52]: p[0]
Out[52]: array([ 1.16532051, 0.95155227, 1.5130032 ])
In [53]: matrices[0].dot(points[0])
Out[53]: array([ 1.16532051, 0.95155227, 1.5130032 ])
In [54]: p[1]
Out[54]: array([ 0.79929572, 0.32048587, 0.81462493])
In [55]: matrices[1].dot(points[1])
Out[55]: array([ 0.79929572, 0.32048587, 0.81462493])
The above is doing matrix[i] * points[i] (i.e. multiplying on the right), but I just reread the question and noticed that your code uses points[i] * matrix[i]. You can do that by switching the indices and arguments of einsum:
In [76]: lp = np.einsum('ij,ijk->ik', points, matrices)
In [77]: lp[0]
Out[77]: array([ 1.39510822, 1.12011057, 1.05704609])
In [78]: points[0].dot(matrices[0])
Out[78]: array([ 1.39510822, 1.12011057, 1.05704609])
In [79]: lp[1]
Out[79]: array([ 0.49750324, 0.70664634, 0.7142573 ])
In [80]: points[1].dot(matrices[1])
Out[80]: array([ 0.49750324, 0.70664634, 0.7142573 ])

It doesn't make much sense to have multiple transform matrices. You can combine transform matrices as in this question:
If I want to apply matrix A, then B, then C, I will multiply the matrices in reverse order np.dot(C,np.dot(B,A))
So you can save some memory space by precomputing that matrix. Then applying a bunch of vectors to one transform matrix should be easily handled (within reason).
I don't know why you would need one million transformation on one million vectors, but I would suggest buying a larger RAM.
Edit:
There isn't a way to reduce the operations, no. Unless your transformation matrices hold a specific property such as sparsity, diagonality, etc. you're going to have to run all multiplications and summations. However, the way you process these operations can be optimized across cores and/or using vector operations on GPUs.
Also, python is notably slow. You can try splitting numpy across your cores using NumExpr. Or maybe use a BLAS framework on C++ (notably quick ;))

Related

1D Convolution of 2D arrays

I have 2 arrays of sets of signals, both 16x90000 arrays. In other words, 2 arrays with 16 signals in each. I want to perform matched filtering on the signals, row by row, correlating row 1 of array 1 with row 1 of array 2, and so forth. I've tried using scipy's signal.convolve2D but it is extremely slow, taking tens of seconds to convolve even a 2x90000 array. I'm not sure if I am simply implementing wrong, or if there is a more efficient way of achieving what I want. I know the arrays are long, but I feel it should still be achievable. I have a feeling convolve2d is actually convolving to a squared factor higher than I want and convolving rows by columns too but I may be misunderstanding.
My implementation:
A.shape = (16,90000) # an array of 16 signals each 90000 samples long
B.shape = (16,90000) # another array of 16 signals each 90000 samples long
corr = sig.convolve2d(A,B,mode='same')
I haven't had much coffee yet so there's every chance I'm being stupid right now.
Please no for loops.
Since you need to correlate the signals row by row, the most basic solution would be:
import numpy as np
from scipy.signal import correlate
# sample inputs: A and B both have n signals of length m
n, m = 2, 5
A = np.random.randn(n, m)
B = np.random.randn(n, m)
C = np.vstack([correlate(a, b, mode="same") for a, b in zip(A, B)])
# [[-0.98455996 0.86994062 -1.1446486 -2.1751074 -0.59270322]
# [ 1.7945015 1.51317292 1.74286042 -0.57750712 -1.9178488 ]]]
One way to avoid a looped solution could be by bootlegging off a deep learning library, like PyTorch. Torch's Conv1d (though named conv, it effectively performs cross-correlation) can handle this scenario.
import torch
import torch.nn.functional as F
# Convert A and B to torch tensors
P = torch.from_numpy(A).unsqueeze(0) # (1, n, m)
Q = torch.from_numpy(B).unsqueeze(1) # (n, 1, m)
# Use conv1d --- with groups = n
def torch_correlate(A, B, n):
with torch.no_grad():
return F.conv1d(A, B, bias=None, stride=1, groups=n, padding="same").squeeze(0).numpy()
R = torch_correlate(P, Q, n)
# [[-0.98455996 0.86994062 -1.1446486 -2.1751074 -0.59270322]
# [ 1.7945015 1.51317292 1.74286042 -0.57750712 -1.9178488 ]]
However, I believe there shouldn't be any significant difference in the results, since grouping might be using some form of iteration internally as well. (Plus there is an overhead of converting from torch to numpy and back to consider).
I would suggest using the first method generally. Unless if you are working on really large signals, then you could theoretically use the PyTorch version to run it really fast on GPU, which you won't be able to do with the regular scipy one.

Find correspondence between 2 arrays of coordinates

I have two large 2D arrays (3x100,000) corresponding to 3D coordinates and I would like to find index of each correspondence.
An example:
mat1 = np.array([[0,0,0],[0,0,0],[0,0,0],[10,11,12],[1,2,3]]).T
mat2 = np.array([[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15]]).T
So here I need to obtain indexes of 3 and 0. And I need to find each correspondence on around 100,000 coordinates. Is there a specific function in Python to do this work? Apply a for loop could be probl
res = [3,0]
To sum up, my need:
We can use Cython-powered kd-tree for quick nearest-neighbor lookup -
In [77]: from scipy.spatial import cKDTree
In [78]: d,idx = cKDTree(mat2.T).query(mat1.T, k=1)
In [79]: idx[np.isclose(d,0)]
Out[79]: array([3, 0])

Differences between array class and matrix class in numpy for matrix operation

I was trying to do matrix dot product and transpose with Numpy, and I found array can do many things matrix can do, such as dot product, point wise product, and transpose.
When I have to create a matrix, I have to create an array first.
example:
import numpy as np
array = np.ones([3,1])
matrix = np.matrix(array)
Since I can do matrix transpose and dot product in array type, I don't have to convert array into matrix to do matrix operations.
For example, the following line is valid, where A is an ndarray :
dot_product = np.dot(A.T, A )
The previous matrix operation can be expressed with matrix class variable A
dot_product = A.T * A
The operator * is exactly the same as point-wise product for ndarray. Therefore, it makes ndarray and matrix almost indistinguishable and causes confusions.
The confusion is a serious problem, as said in REP465
Writing code using numpy.matrix also works fine. But trouble begins as
soon as we try to integrate these two pieces of code together. Code
that expects an ndarray and gets a matrix, or vice-versa, may crash or
return incorrect results. Keeping track of which functions expect
which types as inputs, and return which types as outputs, and then
converting back and forth all the time, is incredibly cumbersome and
impossible to get right at any scale.
It will be very tempting if we stick to ndarray and deprecate matrix and support ndarray with matrix operation methods such as .inverse(), .hermitian(), outerproduct(), etc, in the future.
The major reason I still have to use matrix class is that it handles 1d array as 2d array, so I can transpose it.
It is very inconvenient so far how I transpose 1d array, since 1d array of size n has shape (n,) instead of (1,n). For example, if I have to do the inner product of two arrays :
A = [[1,1,1],[2,2,2].[3,3,3]]
B = [[1,2,3],[1,2,3],[1,2,3]]
np.dot(A,B) works fine, but if
B = [1,1,1]
,its transpose is still a row vector.
I have to handle this exception when the dimensions of input variable is unknown.
I hope this help some people with the same trouble, and hope to know if there is any better way to handle matrix operation like in Matlab, especially with 1d array. Thanks.
Your first example is a column vector:
In [258]: x = np.arange(3).reshape(3,1)
In [259]: x
Out[259]:
array([[0],
[1],
[2]])
In [260]: xm = np.matrix(x)
dot produces the inner product, and dimensions operate as: (1,2),(2,1)=>(1,1)
In [261]: np.dot(x.T, x)
Out[261]: array([[5]])
the matrix product does the same thing:
In [262]: xm.T * xm
Out[262]: matrix([[5]])
(The same thing with 1d arrays produces a scalar value, np.dot([0,1,2],[0,1,2]) # 5)
element multiplication of the arrays produces the outer product (so does np.outer(x, x) and np.dot(x,x.T))
In [263]: x.T * x
Out[263]:
array([[0, 0, 0],
[0, 1, 2],
[0, 2, 4]])
For ndarray, * IS element wise multiplication (the .* of MATLAB, but with broadcasting added). For element multiplication of matrix use np.multiply(xm,xm). (scipy sparse matrices have a multiply method, X.multiply(other))
You quote from the PEP that added the # operator (matmul). This, as well as np.tensordot and np.einsum can handle larger dimensional arrays, and other mixes of products. Those don't make sense with np.matrix since that's restricted to 2d.
With your 3x3 A and B
In [273]: np.dot(A,B)
Out[273]:
array([[ 3, 6, 9],
[ 6, 12, 18],
[ 9, 18, 27]])
In [274]: C=np.array([1,1,1])
In [281]: np.dot(A,np.array([1,1,1]))
Out[281]: array([3, 6, 9])
Effectively this sums each row. np.dot(A,np.array([1,1,1])[:,None]) does the same thing, but returns a (3,1) array.
np.matrix was created years ago to make numpy (actually one of its predecessors) feel more like MATLAB. A key feature is that it is restricted to 2d. That's what MATLAB was like back in the 1990s. np.matrix and MATLAB don't have 1d arrays; instead they have single column or single row matrices.
If the fact that ndarrays can be 1d (or even 0d) is a problem there are many ways of adding that 2nd dimension. I prefer the [None,:] kind of syntax, but reshape is also useful. ndmin=2, np.atleast_2d, np.expand_dims also work.
np.sum and other operations that reduced dimensions have a keepdims=True parameter to counter that. The new # gives an operator syntax for matrix multiplication. As far as I know, np.matrix class does not have any compiled code of its own.
============
The method that implements * for np.matrix uses np.dot:
def __mul__(self, other):
if isinstance(other, (N.ndarray, list, tuple)) :
# This promotes 1-D vectors to row vectors
return N.dot(self, asmatrix(other))
if isscalar(other) or not hasattr(other, '__rmul__') :
return N.dot(self, other)
return NotImplemented

np.bincount for 1 line, vectorized multidimensional averaging

I am trying to vectorize an operation using numpy, which I use in a python script that I have profiled, and found this operation to be the bottleneck and so needs to be optimized since I will run it many times.
The operation is on a data set of two parts. First, a large set (n) of 1D vectors of different lengths (with maximum length, Lmax) whose elements are integers from 1 to maxvalue. The set of vectors is arranged in a 2D array, data, of size (num_samples,Lmax) with trailing elements in each row zeroed. The second part is a set of scalar floats, one associated with each vector, that I have a computed and which depend on its length and the integer-value at each position. The set of scalars is made into a 1D array, Y, of size num_samples.
The desired operation is to form the average of Y over the n samples, as a function of (value,position along length,length).
This entire operation can be vectorized in matlab with use of the accumarray function: by using 3 2D arrays of the same size as data, whose elements are the corresponding value, position, and length indices of the desired final array:
sz_Y = num_samples;
sz_len = Lmax
sz_pos = Lmax
sz_val = maxvalue
ind_len = repmat( 1:sz_len ,1 ,sz_samples);
ind_pos = repmat( 1:sz_pos ,sz_samples,1 );
ind_val = data
ind_Y = repmat((1:sz_Y)',1 ,Lmax );
copiedY=Y(ind_Y);
mask = data>0;
finalarr=accumarray({ind_val(mask),ind_pos(mask),ind_len(mask)},copiedY(mask), [sz_val sz_pos sz_len])/sz_val;
I was hoping to emulate this implementation with np.bincounts. However, np.bincounts differs to accumarray in two relevant ways:
both arguments must be of same 1D size, and
there is no option to choose the shape of the output array.
In the above usage of accumarray, the list of indices, {ind_val(mask),ind_pos(mask),ind_len(mask)}, is 1D cell array of 1x3 arrays used as index tuples, while in np.bincounts it must be 1D scalars as far as I understand. I expect np.ravel may be useful but am not sure how to use it here to do what I want. I am coming to python from matlab and some things do not translate directly, e.g. the colon operator which ravels in opposite order to ravel. So my question is how might I use np.bincount or any other numpy method to achieve an efficient python implementation of this operation.
EDIT: To avoid wasting time: for these multiD index problems with complicated index manipulation, is the recommend route to just use cython to implement the loops explicity?
EDIT2: Alternative Python implementation I just came up with.
Here is a heavy ram solution:
First precalculate:
Using index units for length (i.e., length 1 =0) make a 4D bool array, size (num_samples,Lmax+1,Lmax+1,maxvalue) , holding where the conditions are satisfied for each value in Y.
ALLcond=np.zeros((num_samples,Lmax+1,Lmax+1,maxvalue+1),dtype='bool')
for l in range(Lmax+1):
for i in range(Lmax+1):
for v in range(maxvalue+!):
ALLcond[:,l,i,v]=(data[:,i]==v) & (Lvec==l)`
Where Lvec=[len(row) for row in data]. Then get the indices for these using np.where and initialize a 4D float array into which you will assign the values of Y:
[indY,ind_len,ind_pos,ind_val]=np.where(ALLcond)
Yval=np.zeros(np.shape(ALLcond),dtype='float')
Now in the loop in which I have to perform the operation, I compute it with the two lines:
Yval[ind_Y,ind_len,ind_pos,ind_val]=Y[ind_Y]
Y_avg=sum(Yval)/num_samples
This gives a factor of 4 or so speed up over the direct loop implementation. I was expecting more. Perhaps, this is a more tangible implementation for Python heads to digest. Any faster suggestions are welcome :)
One way is to convert the 3 "indices" to a linear index and then apply bincount. Numpy's ravel_multi_index is essentially the same as MATLAB's sub2ind. So the ported code could be something like:
shape = (Lmax+1, Lmax+1, maxvalue+1)
posvec = np.arange(1, Lmax+1)
ind_len = np.tile(Lvec[:,None], [1, Lmax])
ind_pos = np.tile(posvec, [n, 1])
ind_val = data
Y_copied = np.tile(Y[:,None], [1, Lmax])
mask = posvec <= Lvec[:,None] # fill-value independent
lin_idx = np.ravel_multi_index((ind_len[mask], ind_pos[mask], ind_val[mask]), shape)
Y_avg = np.bincount(lin_idx, weights=Y_copied[mask], minlength=np.prod(shape)) / n
Y_avg.shape = shape
This is assuming data has shape (n, Lmax), Lvec is Numpy array, etc. You may need to adapt the code a little to get rid of off-by-one errors.
One could argue that the tile operations are not very efficient and not very "numpythonic". Something with broadcast_arrays could be nice, but I think I prefer this way:
shape = (Lmax+1, Lmax+1, maxvalue+1)
posvec = np.arange(1, Lmax+1)
len_idx = np.repeat(Lvec, Lvec)
pos_idx = np.broadcast_to(posvec, data.shape)[mask]
val_idx = data[mask]
Y_copied = np.repeat(Y, Lvec)
mask = posvec <= Lvec[:,None] # fill-value independent
lin_idx = np.ravel_multi_index((len_idx, pos_idx, val_idx), shape)
Y_avg = np.bincount(lin_idx, weights=Y_copied, minlength=np.prod(shape)) / n
Y_avg.shape = shape
Note broadcast_to was added in Numpy 1.10.0.

Iterate over matrices in numpy

How can you iterate over all 2^(n^2) binary n by n matrices (or 2d arrays) in numpy? I would something like:
for M in ....:
Do you have to use itertools.product([0,1], repeat = n**2) and then convert to a 2d numpy array?
This code will give me a random 2d binary matrix but that isn't what I need.
np.random.randint(2, size=(n,n))
Note that 2**(n**2) is a big number for even relatively small n, so your loop might run indefinetely long.
Being said that, one possible way to iterate matrices you need is for example
nxn = np.arange(n**2).reshape(n, -1)
for i in xrange(0, 2**(n**2)):
arr = (i >> nxn) % 2
# do smthng with arr
np.array(list(itertools.product([0,1], repeat = n**2))).reshape(-1,n,n)
produces a (2^(n^2),n,n) array.
There may be some numpy 'grid' function that does the same, but my recollection from other discussions is that itertools.product is pretty fast.
g=(np.array(x).reshape(n,n) for x in itertools.product([0,1], repeat = n**2))
is a generator that produces the nxn arrays one at time:
g.next()
# array([[0, 0],[0, 0]])
Or to produce the same 3d array:
np.array(list(g))

Categories

Resources