numpy standardize 2D subsets of a 4D array - python

Let's say I have a 4D array with shape (1,2,3,3):
test = np.array([[[[11,27,33],[45,58,96],[77,85,93]],[[55,27,39],[46,51,62],[73,86,98]]]])
What's the most efficient way of standardizing (calculating z-scores for) a 2D subset? For example, test[0][0] looks like this:
array([[11, 27, 33],
       [45, 58, 96],
       [77, 85, 93]])
There are 2 dimensions here, but I want to calculate the mean and standard deviation across both dimensions, and use those values to standardize each value in these 2 dimensions.
I can do it manually like this:
(test[0][0] - np.mean(test[0][0])) / np.std(test[0][0])
Which correctly gives:
array([[-1.61593336, -1.06970236, -0.86486574],
       [-0.45519249, -0.01137981,  1.2859188 ],
       [ 0.63726949,  0.91038499,  1.18350049]])
However, this would require me to iterate over the first two dimensions of the 4D array, which would take too long given the size of my actual data.
I see that scipy has a zscore function, but it only works on one dimension at a time (e.g. scipy.stats.zscore(test, axis=3)), and I haven't been able to find a simple implementation that standardizes across a 2D array.

Approach #1 : You could make use of np.mean and np.std over multiple axes (in this case the last two, with axis=(2,3)) and keep their number of dims the same with keepdims=1 so that the later subtraction and division operations broadcast.
Thus, a vectorized implementation would be -
(test - test.mean(axis=(2,3),keepdims=1)) / test.std(axis=(2,3),keepdims=1)
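For the sample array above, a quick sanity check (my addition, not part of the original answer) confirms that the vectorized result matches the manual per-slice computation:
import numpy as np
test = np.array([[[[11,27,33],[45,58,96],[77,85,93]],
                  [[55,27,39],[46,51,62],[73,86,98]]]])
# vectorized z-scores over the last two axes
out = (test - test.mean(axis=(2,3), keepdims=True)) / test.std(axis=(2,3), keepdims=True)
# compare against the manual computation for one 2D slice
manual = (test[0,0] - test[0,0].mean()) / test[0,0].std()
print(np.allclose(out[0,0], manual))  # True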
Approach #2 : An alternative using the definition of std that re-uses the average calculations -
m = (test - test.mean(axis=(2,3),keepdims=1))
s = np.sqrt((np.abs(m)**2).mean(axis=(2,3),keepdims=1))
out = m/s
Approach #3 : For larger datasets, you might want to use the numexpr module, which performs those summing/averaging operations quite efficiently -
import numexpr as ne
d0,d1 = test.shape[-2:]
m = (test - test.mean(axis=(2,3),keepdims=1))
m1 = m.reshape(-1,d0*d1)
s = np.sqrt(ne.evaluate('sum(abs(m1)**2,1)')/(d0*d1))
out = m/s[:,None,None]
Based on this post, we could replace the division by s with multiplication by the reciprocal 1.0/s for a further performance boost. This applies to all three approaches mentioned above.
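As a minimal sketch of that reciprocal trick (my illustration, shown here on Approach #2):
m = test - test.mean(axis=(2,3), keepdims=True)
s = np.sqrt((np.abs(m)**2).mean(axis=(2,3), keepdims=True))
out = m * (1.0 / s)   # multiply by the reciprocal instead of dividing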

To do normalization in numpy, just make the shapes match so broadcasting works.
def normalize_nchw(inp):
    EPS = 1e-6
    # mean over the spatial axes; keepdims makes the result broadcastable
    means = np.mean(inp, axis=(2,3), keepdims=True)
    inp -= means
    variances = EPS + np.mean(inp*inp, axis=(2,3), keepdims=True)
    inp *= 1. / np.sqrt(variances)
    return inp
Side note: if you are doing this for a CNN, a better idea is to use batch normalization, which is built into many frameworks.
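For example (a brief sketch of what that looks like, assuming PyTorch; not part of the original answer):
import torch
import torch.nn as nn
bn = nn.BatchNorm2d(num_features=2)   # one learnable scale/shift per channel
x = torch.randn(1, 2, 3, 3)           # NCHW input matching the question's shape
y = bn(x)                             # normalized over the batch and spatial dims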

Related

1D Convolution of 2D arrays

I have 2 arrays of sets of signals, both 16x90000 arrays. In other words, 2 arrays with 16 signals in each. I want to perform matched filtering on the signals, row by row, correlating row 1 of array 1 with row 1 of array 2, and so forth. I've tried using scipy's signal.convolve2d, but it is extremely slow, taking tens of seconds to convolve even a 2x90000 array. I'm not sure if I'm simply implementing it wrong, or if there is a more efficient way of achieving what I want. I know the arrays are long, but I feel it should still be achievable. I have a feeling convolve2d is actually doing a full 2D convolution, mixing rows with columns and doing far more work than I want, but I may be misunderstanding.
My implementation:
A.shape = (16,90000) # an array of 16 signals each 90000 samples long
B.shape = (16,90000) # another array of 16 signals each 90000 samples long
corr = sig.convolve2d(A,B,mode='same')
I haven't had much coffee yet so there's every chance I'm being stupid right now.
Please no for loops.
Since you need to correlate the signals row by row, the most basic solution would be:
import numpy as np
from scipy.signal import correlate
# sample inputs: A and B both have n signals of length m
n, m = 2, 5
A = np.random.randn(n, m)
B = np.random.randn(n, m)
C = np.vstack([correlate(a, b, mode="same") for a, b in zip(A, B)])
# [[-0.98455996 0.86994062 -1.1446486 -2.1751074 -0.59270322]
# [ 1.7945015 1.51317292 1.74286042 -0.57750712 -1.9178488 ]]
One way to avoid a looped solution could be by bootlegging off a deep learning library, like PyTorch. Torch's Conv1d (though named conv, it effectively performs cross-correlation) can handle this scenario.
import torch
import torch.nn.functional as F
# Convert A and B to torch tensors
P = torch.from_numpy(A).unsqueeze(0) # (1, n, m)
Q = torch.from_numpy(B).unsqueeze(1) # (n, 1, m)
# Use conv1d with groups=n
def torch_correlate(A, B, n):
    with torch.no_grad():
        return F.conv1d(A, B, bias=None, stride=1, groups=n, padding="same").squeeze(0).numpy()
R = torch_correlate(P, Q, n)
# [[-0.98455996 0.86994062 -1.1446486 -2.1751074 -0.59270322]
# [ 1.7945015 1.51317292 1.74286042 -0.57750712 -1.9178488 ]]
However, I believe there shouldn't be any significant difference between the two approaches, since the grouped convolution might be using some form of iteration internally as well (plus there is the overhead of converting from torch to numpy and back to consider).
I would suggest using the first method in general. If you are working with really large signals, though, you could use the PyTorch version to run it really fast on a GPU, which you won't be able to do with the regular scipy one.
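A minimal sketch of that GPU variant (my addition, assuming a CUDA-capable device is available):
if torch.cuda.is_available():
    P_gpu = P.to("cuda")
    Q_gpu = Q.to("cuda")
    with torch.no_grad():
        R_gpu = F.conv1d(P_gpu, Q_gpu, bias=None, stride=1, groups=n, padding="same")
    R = R_gpu.squeeze(0).cpu().numpy()  # move the result back to the host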

Suggestion to vectorize a Python function

I wrote the following function, which takes as inputs three 1D arrays (namely int_array, x, and y) and a number lim. The output is a number as well.
def integrate_to_lim(int_array, x, y, lim):
    if lim >= np.max(x):
        res = 0.0
    elif lim <= np.min(x):
        res = int_array[0]
    else:
        index = np.argmax(x > lim)  # to find the first element of x larger than lim
        partial = int_array[index]
        slope = (y[index-1] - y[index]) / (x[index-1] - x[index])
        rest = (x[index] - lim) * (y[index] + (lim - x[index]) * slope / 2.0)
        res = partial + rest
    return res
Basically, outside of the limit cases lim >= np.max(x) and lim <= np.min(x), the idea is that the function finds the index of the first value of x larger than lim and then uses it for some simple calculations.
In my case, however, lim can also be a fairly big 2D array (shape roughly 2000 by 1000 elements).
I would like to rewrite it such that it makes the same calculations for the case that lim is a 2D array.
Obviously, the output should also be a 2D array of the same shape of lim.
I am having a real struggle figuring out how to vectorize it.
I would like to stick only to the numpy package.
PS: I want to vectorize my function because efficiency is important, and as I understand it, using for loops is not a good choice in this regard.
Edit: my attempt
I was not aware of the function np.take, which made the task way easier.
Here is my brutal attempt that seems to work (suggestions on how to clean up or to make the code faster are more than welcome).
def integrate_to_lim_vect(int_array, x, y, lim_mat):
    lim_mat = np.asarray(lim_mat)  # make sure that it is an array
    shape_3d = list(lim_mat.shape) + [1]
    x_3d = np.ones(shape_3d) * x  # 3-dimensional version of x
    lim_3d = np.expand_dims(lim_mat, axis=2) * np.ones(x_3d.shape)  # also 3D
    # I use np.argmax on the 3D matrices (is there a simpler way?)
    index_mat = np.argmax(x_3d > lim_3d, axis=2)
    # Silly calculations
    partial = np.take(int_array, index_mat)
    y1_mat = np.take(y, index_mat)
    y2_mat = np.take(y, index_mat - 1)
    x1_mat = np.take(x, index_mat)
    x2_mat = np.take(x, index_mat - 1)
    slope = (y1_mat - y2_mat) / (x1_mat - x2_mat)
    rest = (x1_mat - lim_mat) * (y1_mat + (lim_mat - x1_mat) * slope / 2.0)
    res = partial + rest
    # Handle the limit cases with np.select
    condlist = [lim_mat >= np.max(x), lim_mat <= np.min(x)]
    choicelist = [0.0, int_array[0]]  # should these options be 2D matrices?
    output = np.select(condlist, choicelist, default=res)
    return output
I am aware that if the limit is larger than the maximum value in the array, np.argmax returns index zero (leading to wrong results). This is why I used np.select to check and correct for these cases.
Is it necessary to define the three-dimensional matrices x_3d and lim_3d, or is there a simpler way to find the 2D matrix of indices index_mat?
Suggestions, especially to improve the way I expanded the dimension of the arrays, are welcome.
I think you can solve this using two tricks. First, a 2d array can be easily flattened to a 1d array, and then your answers can be converted back into a 2d array with reshape.
Next, your use of argmax suggests that your array is sorted. Then you can find your full set of indices using digitize. Thus instead of a single index, you will get a complete array of indices. All the calculations you are doing are intrinsically supported as array operations in numpy, so that should not cause any problems.
You will have to specifically look at the limiting cases. If those are rare enough, then it might be okay to let the answers be derived by the default formula (they will be garbage values), and then replace them with the actual values you desire.
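A rough sketch of what that could look like (my interpretation of this answer, assuming x is sorted in ascending order; variable names follow the question):
def integrate_to_lim_digitize(int_array, x, y, lim_mat):
    lim_flat = np.asarray(lim_mat).ravel()   # flatten the 2D limits to 1D
    index = np.digitize(lim_flat, x)         # index of the first x element larger than each lim
    index = np.clip(index, 1, len(x) - 1)    # keep indices in a valid range for the formula
    partial = int_array[index]
    slope = (y[index-1] - y[index]) / (x[index-1] - x[index])
    rest = (x[index] - lim_flat) * (y[index] + (lim_flat - x[index]) * slope / 2.0)
    res = partial + rest
    # fix the limiting cases, then restore the original 2D shape
    res = np.where(lim_flat >= np.max(x), 0.0, res)
    res = np.where(lim_flat <= np.min(x), int_array[0], res)
    return res.reshape(np.shape(lim_mat))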

Multiply and sum numpy arrays with shapes (x,y,z) and (x,)

So I have a 3D data set (x,y,z), and I want to sum over one of the axes (x) with a set of weights, w = w(x). The start and end indices I am summing over are different for every (y,z); I have solved this by masking the 3D array. The weights are constant with respect to the two variables I am not summing over. Answers regarding both implementation and mathematics are appreciated (is there a smart linear-algebra way of doing this?).
I have a 3D masked array (A) of shape (x,y,z) and a 1D array (t) of shape (x,). Is there a good way to multiply every (y,z) element in A with the corresponding number in t without expanding t to a 3D array? My current solution is using np.tensordot to make a 3D array of the same shape as A that holds all the t-values, but it feels very unsatisfactory to spend runtime building the "new_t" array, which is essentially just y*z copies of t.
Example of current solution:
a1 = np.array([[1,2,3,4],
               [5,6,7,8],
               [9,10,11,12]])
a2 = np.array([[0,1,2,3],
               [4,5,6,7],
               [8,9,10,11]])
#note: A is a masked array, mask is a 3D array of bools
A = np.ma.masked_array([a1,a2],mask)
t = np.array([10,11])
new_t = np.tensordot(t, np.ones(A[0].shape), axes = 0)
return np.sum(A*new_t, axis=0)
In essence I want to perform t*A[:,i,j] for all i,j with the shortest possible runtime, preferably without using too many libraries other than numpy and scipy.
Another way of producing desired output (again, with far too high run time):
B = [[t*A[:,i,j] for j in range(A.shape[2])] for i in range(A.shape[1])]
return np.sum(B,axis=2)
Inspired by @phipsgabler's comment:
arr1 = np.tensordot(A.T,t,axes=1).T
arr1
array([[ 10,  31,  52,  73],
       [ 94, 115, 136, 157],
       [178, 199, 220, 241]])
Thanks for the good answers! Using tensordot as @alyhosny proposed worked, but replacing masked values with zeros using
A = np.ma.MaskedArray.filled(A,0)
before summing with einsum (thanks @phipsgabler) gave half the run time. Final code:
A = np.ma.MaskedArray(A,mask)
A = np.ma.MaskedArray.filled(A,0)
return np.einsum('ijk,i->jk',A,t)
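For reference (my addition, not from the original answers), the same weighted sum over the zero-filled array can also be written with plain broadcasting; t[:, None, None] is just a view of t, so t is never copied y*z times:
return (A * t[:, None, None]).sum(axis=0)  # equivalent to np.einsum('ijk,i->jk', A, t)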

np.bincount for 1 line, vectorized multidimensional averaging

I am trying to vectorize an operation using numpy. I use it in a python script that I have profiled and found this operation to be the bottleneck, so it needs to be optimized since I will run it many times.
The operation is on a data set of two parts. First, a large set (n) of 1D vectors of different lengths (with maximum length Lmax) whose elements are integers from 1 to maxvalue. The set of vectors is arranged in a 2D array, data, of size (num_samples,Lmax) with trailing elements in each row zeroed. The second part is a set of scalar floats, one associated with each vector, that I have computed and which depend on its length and the integer value at each position. The set of scalars is made into a 1D array, Y, of size num_samples.
The desired operation is to form the average of Y over the n samples, as a function of (value,position along length,length).
This entire operation can be vectorized in matlab with the accumarray function, by using three 2D arrays of the same size as data whose elements are the corresponding value, position, and length indices of the desired final array:
sz_Y = num_samples;
sz_len = Lmax
sz_pos = Lmax
sz_val = maxvalue
ind_len = repmat( 1:sz_len ,1 ,sz_samples);
ind_pos = repmat( 1:sz_pos ,sz_samples,1 );
ind_val = data
ind_Y = repmat((1:sz_Y)',1 ,Lmax );
copiedY=Y(ind_Y);
mask = data>0;
finalarr=accumarray({ind_val(mask),ind_pos(mask),ind_len(mask)},copiedY(mask), [sz_val sz_pos sz_len])/sz_val;
I was hoping to emulate this implementation with np.bincount. However, np.bincount differs from accumarray in two relevant ways:
both arguments must be of the same 1D size, and
there is no option to choose the shape of the output array.
In the above usage of accumarray, the list of indices, {ind_val(mask),ind_pos(mask),ind_len(mask)}, is a cell array of index vectors used as index tuples, while np.bincount takes a single 1D array of integer bins as far as I understand. I expect np.ravel may be useful, but I am not sure how to use it here to do what I want. I am coming to python from matlab and some things do not translate directly, e.g. the colon operator, which ravels in the opposite order to ravel. So my question is: how might I use np.bincount, or any other numpy method, to achieve an efficient python implementation of this operation?
EDIT: To avoid wasting time: for these multi-dimensional index problems with complicated index manipulation, is the recommended route to just use cython to implement the loops explicitly?
EDIT2: Alternative Python implementation I just came up with.
Here is a RAM-heavy solution:
First precalculate:
Using index units for length (i.e., length 1 = index 0), make a 4D bool array of size (num_samples, Lmax+1, Lmax+1, maxvalue+1), holding where the conditions are satisfied for each value in Y.
ALLcond = np.zeros((num_samples, Lmax+1, Lmax+1, maxvalue+1), dtype='bool')
for l in range(Lmax+1):
    for i in range(Lmax+1):
        for v in range(maxvalue+1):
            ALLcond[:, l, i, v] = (data[:, i] == v) & (Lvec == l)
Where Lvec = np.array([len(row) for row in data]) holds the per-row lengths. Then get the indices for these using np.where and initialize a 4D float array into which you will assign the values of Y:
[ind_Y, ind_len, ind_pos, ind_val] = np.where(ALLcond)
Yval = np.zeros(np.shape(ALLcond), dtype='float')
Now in the loop in which I have to perform the operation, I compute it with the two lines:
Yval[ind_Y, ind_len, ind_pos, ind_val] = Y[ind_Y]
Y_avg = Yval.sum(axis=0) / num_samples
This gives a factor-of-4 or so speed-up over the direct loop implementation. I was expecting more. Perhaps this is a more tangible implementation for Python heads to digest. Any faster suggestions are welcome :)
One way is to convert the 3 "indices" to a linear index and then apply bincount. Numpy's ravel_multi_index is essentially the same as MATLAB's sub2ind. So the ported code could be something like:
shape = (Lmax+1, Lmax+1, maxvalue+1)
posvec = np.arange(1, Lmax+1)
ind_len = np.tile(Lvec[:,None], [1, Lmax])
ind_pos = np.tile(posvec, [n, 1])
ind_val = data
Y_copied = np.tile(Y[:,None], [1, Lmax])
mask = posvec <= Lvec[:,None] # fill-value independent
lin_idx = np.ravel_multi_index((ind_len[mask], ind_pos[mask], ind_val[mask]), shape)
Y_avg = np.bincount(lin_idx, weights=Y_copied[mask], minlength=np.prod(shape)) / n
Y_avg.shape = shape
This is assuming data has shape (n, Lmax), Lvec is a Numpy array, etc. You may need to adapt the code a little to get rid of off-by-one errors.
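To make the ravel_multi_index + bincount pattern concrete, here is a tiny standalone illustration with made-up arrays (my addition):
shape = (2, 3)                      # hypothetical output shape
rows = np.array([0, 1, 1, 0])       # first index of each sample
cols = np.array([2, 0, 0, 2])       # second index of each sample
w = np.array([1.0, 2.0, 3.0, 4.0])  # values to accumulate
lin_idx = np.ravel_multi_index((rows, cols), shape)
acc = np.bincount(lin_idx, weights=w, minlength=np.prod(shape)).reshape(shape)
# acc[0, 2] == 5.0 (1.0 + 4.0) and acc[1, 0] == 5.0 (2.0 + 3.0)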
One could argue that the tile operations are not very efficient and not very "numpythonic". Something with broadcast_arrays could be nice, but I think I prefer this way:
shape = (Lmax+1, Lmax+1, maxvalue+1)
posvec = np.arange(1, Lmax+1)
mask = posvec <= Lvec[:,None] # fill-value independent
len_idx = np.repeat(Lvec, Lvec)
pos_idx = np.broadcast_to(posvec, data.shape)[mask]
val_idx = data[mask]
Y_copied = np.repeat(Y, Lvec)
lin_idx = np.ravel_multi_index((len_idx, pos_idx, val_idx), shape)
Y_avg = np.bincount(lin_idx, weights=Y_copied, minlength=np.prod(shape)) / n
Y_avg.shape = shape
Note broadcast_to was added in Numpy 1.10.0.

Large point-matrix array multiplication in numpy

Given two large numpy arrays, one for a list of 3D points, and another for a list of transformation matrices, and assuming there is a 1-to-1 correspondence between the two lists, I'm looking for the best way to calculate the result array of each point transformed by its corresponding matrix.
My solution to do this was to use slicing (see "test4" in the example code below), which worked fine with small arrays but fails with large arrays because of how memory-wasteful my method is :)
import numpy as np
COUNT = 100
matrix = np.random.random_sample((3,3,)) # A single matrix
matrices = np.random.random_sample((COUNT,3,3,)) # Many matrices
point = np.random.random_sample((3,)) # A single point
points = np.random.random_sample((COUNT,3,)) # Many points
# Test 1, result of a single point multiplied by a single matrix
# This is as easy as it gets
test1 = np.dot(point,matrix)
print 'done'
# Test 2, result of a single point multiplied by many matrices
# This works well and returns a transformed point for each matrix
test2 = np.dot(point,matrices)
print 'done'
# Test 3, result of many points multiplied by a single matrix
# This works also just fine
test3 = np.dot(points,matrix)
print 'done'
# Test 4, this is the case i'm trying to solve. Assuming there's a 1-1
# correspondence between the point and matrix arrays, the result i want
# is an array of points, where each point has been transformed by it's
# corresponding matrix
test4 = np.zeros((COUNT,3))
for i in xrange(COUNT):
    test4[i] = np.dot(points[i],matrices[i])
print 'done'
With a small array, this works fine. With large arrays (COUNT=1000000), Test #4 works but gets rather slow.
Is there a way to make Test #4 faster, presumably without using a loop?
You can use numpy.einsum. Here's an example with 5 matrices and 5 points:
In [49]: matrices.shape
Out[49]: (5, 3, 3)
In [50]: points.shape
Out[50]: (5, 3)
In [51]: p = np.einsum('ijk,ik->ij', matrices, points)
In [52]: p[0]
Out[52]: array([ 1.16532051, 0.95155227, 1.5130032 ])
In [53]: matrices[0].dot(points[0])
Out[53]: array([ 1.16532051, 0.95155227, 1.5130032 ])
In [54]: p[1]
Out[54]: array([ 0.79929572, 0.32048587, 0.81462493])
In [55]: matrices[1].dot(points[1])
Out[55]: array([ 0.79929572, 0.32048587, 0.81462493])
The above is doing matrix[i] * points[i] (i.e. multiplying on the right), but I just reread the question and noticed that your code uses points[i] * matrix[i]. You can do that by switching the indices and arguments of einsum:
In [76]: lp = np.einsum('ij,ijk->ik', points, matrices)
In [77]: lp[0]
Out[77]: array([ 1.39510822, 1.12011057, 1.05704609])
In [78]: points[0].dot(matrices[0])
Out[78]: array([ 1.39510822, 1.12011057, 1.05704609])
In [79]: lp[1]
Out[79]: array([ 0.49750324, 0.70664634, 0.7142573 ])
In [80]: points[1].dot(matrices[1])
Out[80]: array([ 0.49750324, 0.70664634, 0.7142573 ])
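For reference (my addition, not part of the original answer), the same batched product can also be written with np.matmul, which broadcasts over the leading axis:
# points[:, None, :] has shape (N, 1, 3); matmul broadcasts over the first axis,
# so each 1x3 row vector is multiplied by its own 3x3 matrix
lp2 = np.matmul(points[:, None, :], matrices).squeeze(axis=1)
# np.allclose(lp2, np.einsum('ij,ijk->ik', points, matrices)) -> True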
It doesn't make much sense to have multiple transform matrices. You can combine transform matrices as in this question:
If I want to apply matrix A, then B, then C, I will multiply the matrices in reverse order np.dot(C,np.dot(B,A))
So you can save some memory space by precomputing that matrix. Then applying a bunch of vectors to one transform matrix should be easily handled (within reason).
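A quick sketch of that idea (my illustration, assuming the same sequence of transforms A, B, C applies to every point, and following the column-vector convention of the quote above):
combined = np.dot(C, np.dot(B, A))    # apply A, then B, then C
transformed = points.dot(combined.T)  # one matrix product for the whole (N, 3) point set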
I don't know why you would need one million transformations on one million vectors, but I would suggest buying more RAM.
Edit:
There isn't a way to reduce the operations, no. Unless your transformation matrices have a specific property such as sparsity, diagonality, etc., you're going to have to run all the multiplications and summations. However, the way you process these operations can be optimized across cores and/or using vector operations on GPUs.
Also, python is notably slow. You can try splitting the numpy work across your cores using NumExpr, or maybe use a BLAS framework in C++ (notably quick ;))
