I want to create a dataset B by processing a dataset A. To that end, every column in A (~2 million) has to be processed in batches (passed through a neural network), resulting in 3 outputs which are stacked together and then stored, e.g. in a numpy array.
My code looks like the following, which does not seem to be the best solution.
# Load data
data = get_data()
# Storage for B
B = np.empty(shape=data.shape)
# Process data
for idx, data_B in enumerate(data):
    # Process data
    a, b, c = model(data_B)
    # Reshape and feed in B
    B[idx * batch_size:batch_size * (idx + 1)] = np.squeeze(np.concatenate((a, b, c), axis=1))
I am looking for ideas to speed up the stacking and assignment step. I do not know if parallel processing is possible, since everything ends up in the same array in the end (the ordering is not important). Is there any Python framework I can use?
Loading the data takes 29s (only done once), stacking and assigning takes 20s for a batch size of only 2. The model call takes < 1s, allocating the array takes 5s, and every other part takes < 1s.
Your arrays' shapes, and especially the number of dimensions, are unclear. I can make a few guesses from what works in the code. Your timings suggest that things are very large, so memory management may be a big issue. Creating large temporary arrays takes time.
What is data.shape? Probably 2d at least; B has the same shape
B = np.empty(shape=data.shape)
Now you iterate on the 1st dimension of data; let's call them rows, though they might be 2d or larger:
# Process data
for idx, data_B in enumerate(data):
    # Process data
    a, b, c = model(data_B)
What is the nature of a, etc.? I'm assuming arrays, with a shape similar to data_B. But that's just a guess.
# Reshape and feed in B
B[idx * batch_size:batch_size * (idx + 1)] = np.squeeze(np.concatenate((a, b, c), axis=1))
For concatenate to work, a, b, c must be 2d (at least). Let's guess they are all (n,m). The result is (n,3m). Why the squeeze? Is the shape (1,3m)?
I don't know batch_size, but with anything other than 1 I don't think this works. B[idx:idx+1, :] = ... works since idx ranges over B.shape[0], but with other values it would produce an error.
With this batch_size slice indexing it almost looks like you are trying to string out the iteration values in a long 1d array, batch_size values per iteration. But that doesn't fit with B matching data in shape.
That puzzle aside, I wonder if you really need the concatenate. Can you initialize B so you can assign values directly, e.g.
B[idx, 0, ...] = a
B[idx, 1, ...] = b
etc
Reshaping an array after filling is trivial. Even transposing axes isn't too time consuming.
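As a minimal sketch of that idea (the sizes and the model here are made-up stand-ins, not your actual network): preallocate B with an explicit axis for the three outputs, assign into it directly, and reshape once at the end if you need a flat layout.
import numpy as np

num_samples, m = 1000, 64              # assumed sizes, for illustration only
data = np.random.randn(num_samples, m)

def model(x):                          # hypothetical stand-in for the network
    return x, 2 * x, 3 * x

B = np.empty((num_samples, 3, m))      # one slot per output, no concatenate needed

for idx, row in enumerate(data):
    a, b, c = model(row)
    B[idx, 0] = a                      # direct assignment, no temporary arrays
    B[idx, 1] = b
    B[idx, 2] = c

B_flat = B.reshape(num_samples, 3 * m) # cheap: just a view on the same memory
The reshape at the end costs essentially nothing, since it does not copy the data.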
Related
I have 2 arrays of sets of signals, both 16x90000 arrays. In other words, 2 arrays with 16 signals in each. I want to perform matched filtering on the signals, row by row, correlating row 1 of array 1 with row 1 of array 2, and so forth. I've tried using scipy's signal.convolve2d but it is extremely slow, taking tens of seconds to convolve even a 2x90000 array. I'm not sure if I am simply implementing it wrong, or if there is a more efficient way of achieving what I want. I know the arrays are long, but I feel it should still be achievable. I have a feeling convolve2d is actually doing a full 2D convolution, pairing rows with columns and doing far more work than I want, but I may be misunderstanding.
My implementation:
A.shape = (16,90000) # an array of 16 signals each 90000 samples long
B.shape = (16,90000) # another array of 16 signals each 90000 samples long
corr = sig.convolve2d(A,B,mode='same')
I haven't had much coffee yet so there's every chance I'm being stupid right now.
Please no for loops.
Since you need to correlate the signals row by row, the most basic solution would be:
import numpy as np
from scipy.signal import correlate
# sample inputs: A and B both have n signals of length m
n, m = 2, 5
A = np.random.randn(n, m)
B = np.random.randn(n, m)
C = np.vstack([correlate(a, b, mode="same") for a, b in zip(A, B)])
# [[-0.98455996 0.86994062 -1.1446486 -2.1751074 -0.59270322]
# [ 1.7945015 1.51317292 1.74286042 -0.57750712 -1.9178488 ]]
One way to avoid a looped solution could be by bootlegging off a deep learning library, like PyTorch. Torch's Conv1d (though named conv, it effectively performs cross-correlation) can handle this scenario.
import torch
import torch.nn.functional as F
# Convert A and B to torch tensors
P = torch.from_numpy(A).unsqueeze(0) # (1, n, m)
Q = torch.from_numpy(B).unsqueeze(1) # (n, 1, m)
# Use conv1d --- with groups = n
def torch_correlate(A, B, n):
    with torch.no_grad():
        return F.conv1d(A, B, bias=None, stride=1, groups=n, padding="same").squeeze(0).numpy()
R = torch_correlate(P, Q, n)
# [[-0.98455996 0.86994062 -1.1446486 -2.1751074 -0.59270322]
# [ 1.7945015 1.51317292 1.74286042 -0.57750712 -1.9178488 ]]
However, I believe there shouldn't be any significant speed difference between the two, since the grouped convolution might be using some form of iteration internally as well. (Plus there is the overhead of converting from torch to numpy and back to consider.)
I would suggest using the first method in general. However, if you are working on really large signals, you could theoretically use the PyTorch version to run it really fast on a GPU, which you won't be able to do with the regular scipy one.
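Another loop-free possibility, sketched here under the same setup (not part of the answer above, and it needs a SciPy recent enough for fftconvolve to accept an axes argument): do the row-wise correlation as an FFT-based convolution along axis 1, using the fact that for real signals correlate(a, b) equals convolve(a, b[::-1]).
import numpy as np
from scipy.signal import correlate, fftconvolve

n, m = 16, 90000
A = np.random.randn(n, m)
B = np.random.randn(n, m)

# Row-wise cross-correlation without a Python loop:
# restrict the FFT-based convolution to axis 1 and reverse B along that axis.
C_fft = fftconvolve(A, B[:, ::-1], mode="same", axes=1)

# Spot-check one row against the looped scipy version
assert np.allclose(C_fft[0], correlate(A[0], B[0], mode="same"))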
I have a numpy.array A with shape (l,l) and another numpy.array B with shape (l,m,n). Usually, the second and third dimension in B correspond to spatial cells and the first to something else.
I want to compute
import numpy as np

l, m, n = 2, 3, 4           # dummy dimensions
A = np.random.rand(l, l)    # dummy data
B = np.random.rand(l, m, n) # dummy data
C = np.zeros((l, m, n))
for i in range(m):
    for j in range(n):
        C[:, i, j] = A @ B[:, i, j]
i.e., in every spatial cell, I want to perform a matrix-vector-multiplication.
Since I have to do this frequently, I would like to know if there's a more compact way to write this with numpy. (Especially because there are several situations in which the tensor has shape (l,m,n,o,p).)
Thank you in advance!
I found the answer using np.einsum:
np.einsum('ij,jkl->ikl', A,B)
Explanation:
Einstein notation implies that we sum over matching subscripts.
np.einsum('ij,jkl->ikl', A, B)
  = (rewritten in math terms)
    A_{i,j} B_{j,k,l}
  = (Einstein notation implies summation over the repeated index j)
    sum_j A_{i,j} B_{j,k,l}
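For the higher-rank tensors mentioned in the question (shape (l,m,n,o,p)), the same contraction can be written with an ellipsis, so one line covers any number of trailing spatial axes; np.tensordot over the first axis of B gives the same result. A small sketch with dummy data:
import numpy as np

l, m, n, o, p = 2, 3, 4, 5, 6
A = np.random.rand(l, l)
B5 = np.random.rand(l, m, n, o, p)

# '...' stands for all remaining axes, so this also works for B of shape (l,m,n)
C5 = np.einsum('ij,j...->i...', A, B5)

# Equivalent contraction with tensordot
C5_td = np.tensordot(A, B5, axes=([1], [0]))
assert np.allclose(C5, C5_td)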
I have 4 square arrays of the same shape
array1 = 1*np.ones((10,10))
array2 = 2*np.ones((10,10))
array3 = 3*np.ones((10,10))
array4 = 4*np.ones((10,10))
I want to recombine them into one big array in an interleaved mosaic pattern as such:
result = np.asarray([[1,2,1,2,...,1,2],
                     [3,4,3,4,...,3,4],
                     [1,2,1,2,...,1,2],
                     ...
                     [3,4,3,4,...,3,4]])
Where result is twice as big in both dimensions as the original individual images.
Is there an efficient way to do this?
To illustrate my question, I used arrays containing constant values but in reality, these 4 arrays would be different images.
Two common approaches for interlacing data in numpy are:
A) Assign each source to a slice of a blank result array, corresponding to where the data should go:
result = np.zeros((20, 20)) # allocate space
result[::2, ::2] = array1 # put those values in the appropriate spots
result[::2, 1::2] = array2
result[1::2, ::2] = array3
result[1::2, 1::2] = array4
B) Use stacking to stick the data together in a single array, and then reshape to flatten the data in a way that leaves it interlaced. This typically requires a bit of trial and error, but after playing around with the REPL a bit I came up with:
result = np.hstack((np.dstack((array1, array2)), np.dstack((array3, array4)))).reshape(20, 20)
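A variant of approach B that avoids hard-coding the 20x20 output size (just a sketch for the general case where all four tiles share a shape (h, w)): stack the tiles into a (2, 2, h, w) array, move the 2x2 tile offsets next to the pixel axes, and merge them.
import numpy as np

array1 = 1 * np.ones((10, 10))
array2 = 2 * np.ones((10, 10))
array3 = 3 * np.ones((10, 10))
array4 = 4 * np.ones((10, 10))

h, w = array1.shape
tiles = np.stack([[array1, array2], [array3, array4]])  # (2, 2, h, w)
# axes become (row, tile-row-offset, col, tile-col-offset), then merge the pairs
result = tiles.transpose(2, 0, 3, 1).reshape(2 * h, 2 * w)

# Spot-check the interleaving
assert result[0, 0] == 1 and result[0, 1] == 2
assert result[1, 0] == 3 and result[1, 1] == 4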
I have a matrix A of shape (41, 41), which is a DataFrame.
B is a matrix of size (7154, 8240), an ndarray.
I want to replicate A (keeping the whole 41x41 matrix intact) to the size of B. It will not fit exactly, but then it should just clip the rows that do not fit.
This is to be able to multiply A*B.
I tried this code, but it fails because a list cannot be multiplied by a float (the shape ratios are not integers):
repeat = pd.concat([A]*(B.shape[0]/A.shape[0]), axis=0, ignore_index=True)
filter_large = pd.concat([repeat]*(B.shape[1]/A.shape[1]), axis=1, ignore_index=True)
filter_l = filter_large.values # change from a dataframe to a numpy array
AB = A*filter_l
I should mention that I've tried numpy.resize but it does not keep the matrix intact, mixing up all rows which is not what I want.
This code will do what you ask for:
shapeMultiples = (np.ceil(B.shape[0]/A.shape[0]).astype(int), np.ceil(B.shape[1]/A.shape[1]).astype(int))
res = np.tile(A, shapeMultiples)[:B.shape[0], :B.shape[1]]
Explanation:
np.tile(A, reps) repeats the matrix A multiple times along each axis. How often it is repeated is specified for each axis in reps.
For your example it should be repeated B.shape[0]/A.shape[0] times along axis 0 and B.shape[1]/A.shape[1] times along axis 1. However, you have to round these values up to make sure the result covers the full size of matrix B, which is what np.ceil does. Since reps is expected to be a tuple of integers but ceil returns floats, we have to cast the type to int.
In the final step we cut off the result to make it fit the size of B with [:B.shape[0], :B.shape[1]].
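Putting it together with the element-wise product the question is after (a sketch with dummy data; if A is a pandas DataFrame, convert it first with A.to_numpy() or A.values):
import numpy as np

A = np.random.rand(41, 41)       # or A.to_numpy() for a DataFrame
B = np.random.rand(7154, 8240)

reps = (int(np.ceil(B.shape[0] / A.shape[0])),
        int(np.ceil(B.shape[1] / A.shape[1])))
A_tiled = np.tile(A, reps)[:B.shape[0], :B.shape[1]]

AB = A_tiled * B                 # element-wise product, shape (7154, 8240)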
I am trying to vectorize an operation using numpy, which I use in a python script that I have profiled, and found this operation to be the bottleneck and so needs to be optimized since I will run it many times.
The operation is on a data set of two parts. First, a large set (n) of 1D vectors of different lengths (with maximum length Lmax) whose elements are integers from 1 to maxvalue. The set of vectors is arranged in a 2D array, data, of size (num_samples, Lmax) with trailing elements in each row zeroed. The second part is a set of scalar floats, one associated with each vector, which I have computed and which depends on the vector's length and the integer value at each position. The set of scalars is made into a 1D array, Y, of size num_samples.
The desired operation is to form the average of Y over the n samples, as a function of (value,position along length,length).
This entire operation can be vectorized in matlab with use of the accumarray function: by using 3 2D arrays of the same size as data, whose elements are the corresponding value, position, and length indices of the desired final array:
sz_Y = num_samples;
sz_len = Lmax
sz_pos = Lmax
sz_val = maxvalue
ind_len = repmat( 1:sz_len ,1 ,sz_samples);
ind_pos = repmat( 1:sz_pos ,sz_samples,1 );
ind_val = data
ind_Y = repmat((1:sz_Y)',1 ,Lmax );
copiedY=Y(ind_Y);
mask = data>0;
finalarr=accumarray({ind_val(mask),ind_pos(mask),ind_len(mask)},copiedY(mask), [sz_val sz_pos sz_len])/sz_val;
I was hoping to emulate this implementation with np.bincount. However, np.bincount differs from accumarray in two relevant ways:
both arguments must be of same 1D size, and
there is no option to choose the shape of the output array.
In the above usage of accumarray, the list of indices, {ind_val(mask),ind_pos(mask),ind_len(mask)}, is a 1D cell array of 1x3 arrays used as index tuples, while in np.bincount the indices must be 1D scalars as far as I understand. I expect np.ravel may be useful but am not sure how to use it here to do what I want. I am coming to Python from MATLAB and some things do not translate directly, e.g. the colon operator, which linearizes in column-major order, the opposite of np.ravel's default. So my question is: how might I use np.bincount or any other numpy method to achieve an efficient Python implementation of this operation?
EDIT: To avoid wasting time: for these multi-D index problems with complicated index manipulation, is the recommended route to just use Cython to implement the loops explicitly?
EDIT2: Alternative Python implementation I just came up with.
Here is a heavy-RAM solution:
First precalculate:
Using index units for length (i.e., length 1 maps to index 0), make a 4D bool array of size (num_samples, Lmax+1, Lmax+1, maxvalue+1), holding where the conditions are satisfied for each value in Y.
ALLcond = np.zeros((num_samples, Lmax+1, Lmax+1, maxvalue+1), dtype='bool')
for l in range(Lmax+1):
    for i in range(Lmax+1):
        for v in range(maxvalue+1):
            ALLcond[:, l, i, v] = (data[:, i] == v) & (Lvec == l)
Where Lvec = np.array([len(row) for row in data]). Then get the indices for these using np.where and initialize a 4D float array into which you will assign the values of Y:
[ind_Y, ind_len, ind_pos, ind_val] = np.where(ALLcond)
Yval = np.zeros(np.shape(ALLcond), dtype='float')
Now in the loop in which I have to perform the operation, I compute it with the two lines:
Yval[ind_Y, ind_len, ind_pos, ind_val] = Y[ind_Y]
Y_avg = Yval.sum(axis=0) / num_samples
This gives a factor of 4 or so speed-up over the direct loop implementation. I was expecting more. Perhaps this is a more tangible implementation for Python heads to digest. Any faster suggestions are welcome :)
One way is to convert the 3 "indices" to a linear index and then apply bincount. Numpy's ravel_multi_index is essentially the same as MATLAB's sub2ind. So the ported code could be something like:
shape = (Lmax+1, Lmax+1, maxvalue+1)
posvec = np.arange(1, Lmax+1)
ind_len = np.tile(Lvec[:,None], [1, Lmax])
ind_pos = np.tile(posvec, [n, 1])
ind_val = data
Y_copied = np.tile(Y[:,None], [1, Lmax])
mask = posvec <= Lvec[:,None] # fill-value independent
lin_idx = np.ravel_multi_index((ind_len[mask], ind_pos[mask], ind_val[mask]), shape)
Y_avg = np.bincount(lin_idx, weights=Y_copied[mask], minlength=np.prod(shape)) / n
Y_avg.shape = shape
This is assuming data has shape (n, Lmax), Lvec is a Numpy array, etc. You may need to adapt the code a little to get rid of off-by-one errors.
One could argue that the tile operations are not very efficient and not very "numpythonic". Something with broadcast_arrays could be nice, but I think I prefer this way:
shape = (Lmax+1, Lmax+1, maxvalue+1)
posvec = np.arange(1, Lmax+1)
mask = posvec <= Lvec[:,None] # fill-value independent
len_idx = np.repeat(Lvec, Lvec)
pos_idx = np.broadcast_to(posvec, data.shape)[mask]
val_idx = data[mask]
Y_copied = np.repeat(Y, Lvec)
lin_idx = np.ravel_multi_index((len_idx, pos_idx, val_idx), shape)
Y_avg = np.bincount(lin_idx, weights=Y_copied, minlength=np.prod(shape)) / n
Y_avg.shape = shape
Note broadcast_to was added in Numpy 1.10.0.
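As a sanity check, here is a small self-contained setup (dummy data, under the assumed conventions: data is (n, Lmax) and zero-padded, Lvec holds each row's true length, Y has one float per row) that compares the bincount version against a direct loop:
import numpy as np

rng = np.random.default_rng(0)
n, Lmax, maxvalue = 100, 6, 4
Lvec = rng.integers(1, Lmax + 1, size=n)
data = np.zeros((n, Lmax), dtype=int)
for r, L in enumerate(Lvec):
    data[r, :L] = rng.integers(1, maxvalue + 1, size=L)
Y = rng.random(n)

# bincount version (second variant above)
shape = (Lmax + 1, Lmax + 1, maxvalue + 1)
posvec = np.arange(1, Lmax + 1)
mask = posvec <= Lvec[:, None]
lin_idx = np.ravel_multi_index(
    (np.repeat(Lvec, Lvec), np.broadcast_to(posvec, data.shape)[mask], data[mask]),
    shape)
Y_avg = np.bincount(lin_idx, weights=np.repeat(Y, Lvec),
                    minlength=np.prod(shape)).reshape(shape) / n

# direct-loop reference
ref = np.zeros(shape)
for r in range(n):
    for p in range(1, Lvec[r] + 1):
        ref[Lvec[r], p, data[r, p - 1]] += Y[r]
ref /= n

assert np.allclose(Y_avg, ref)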