So I have a 3D dataset (x, y, z), and I want to sum over one of the axes (x) with a set of weights, w = w(x). The start and end indices I sum over differ for every (y, z); I have handled this by masking the 3D array. The weights are constant with respect to the two variables I am not summing over. Answers regarding both implementation and mathematics are appreciated (is there a smart linear-algebra way of doing this?).
I have a 3D masked array (A) of shape (x,y,z) and a 1D array (t) of shape (x,). Is there a good way to multiply every (y,z) element in A by the corresponding number in t without expanding t to a 3D array? My current solution uses np.tensordot to build a 3D array with the same shape as A that holds all the t-values, but it feels very unsatisfactory to spend runtime building this "new_t" array, which is essentially just y*z copies of t.
Example of current solution:
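import numpy as np

a1 = np.array([[1, 2, 3, 4],
               [5, 6, 7, 8],
               [9, 10, 11, 12]])
a2 = np.array([[0, 1, 2, 3],
               [4, 5, 6, 7],
               [8, 9, 10, 11]])
# note: A is a masked array; mask is a 3D array of bools with the same shape as [a1, a2]
mask = np.zeros((2, 3, 4), dtype=bool)  # placeholder: the real mask is not shown
A = np.ma.masked_array([a1, a2], mask)
t = np.array([10, 11])
# new_t has shape (2, 3, 4): t[k] repeated over every (y, z) position of slice k
new_t = np.tensordot(t, np.ones(A[0].shape), axes=0)
result = np.sum(A * new_t, axis=0)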
In essence, I want to perform t*A[:,i,j] for all i,j with the shortest possible runtime, preferably without using libraries other than NumPy and SciPy.
Another way of producing the desired output (again, with far too long a run time):
B = [[t*A[:,i,j] for j in range(A.shape[2])] for i in range(A.shape[1])]
result = np.sum(B, axis=2)  # B has shape (y, z, x); sum over x
Inspired by @phipsgabler's comment:
arr1 = np.tensordot(A.T, t, axes=1).T
arr1
array([[ 10,  31,  52,  73],
       [ 94, 115, 136, 157],
       [178, 199, 220, 241]])
Thanks for the good answers! Using tensordot as @alyhosny proposed worked, but replacing the masked values with zeros using
A = np.ma.MaskedArray.filled(A,0)
before summing with einsum (thanks @phipsgabler) halved the run time. Final code:
A = np.ma.MaskedArray(A, mask)
A = A.filled(0)  # masked entries contribute 0 to the weighted sum
result = np.einsum('ijk,i->jk', A, t)  # result[j, k] = sum over i of A[i, j, k] * t[i]
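A minimal sketch on small random data, checking that the filled-then-einsum approach agrees with plain broadcasting (t[:, None, None]) and with np.tensordot(t, ..., axes=1), none of which builds y*z copies of t. The per-(y, z) limits start and end, the sizes, and the random generator are hypothetical stand-ins for the question's real mask.

```python
import numpy as np

rng = np.random.default_rng(0)
nx, ny, nz = 6, 3, 4
data = rng.random((nx, ny, nz))
t = rng.random(nx)

# Hypothetical summation limits: sum over x in [start, end) for each (y, z)
start = rng.integers(0, 3, (ny, nz))
end = rng.integers(3, nx + 1, (ny, nz))
k = np.arange(nx)[:, None, None]             # x index, shape (nx, 1, 1)
mask = (k < start) | (k >= end)              # True marks excluded entries
A = np.ma.MaskedArray(data, mask)

filled = A.filled(0)                          # masked entries contribute 0
r_einsum = np.einsum('ijk,i->jk', filled, t)
r_broadcast = (filled * t[:, None, None]).sum(axis=0)   # t is broadcast, not copied
r_tensordot = np.tensordot(t, filled, axes=1)           # contracts t with the x axis

# Reference: explicit loop over (y, z) using the original limits
ref = np.empty((ny, nz))
for j in range(ny):
    for l in range(nz):
        s, e = start[j, l], end[j, l]
        ref[j, l] = np.dot(t[s:e], data[s:e, j, l])
```

In the broadcast version, t[:, None, None] is only a view of shape (x, 1, 1); the element-wise product is still a full-size temporary, which the einsum and tensordot versions generally avoid.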
I have a matrix called vec with two columns, vec[:,0] and vec[:,1]. P contains two matrices, P[0,:,:] and P[1,:,:]. I want to multiply P[0,:,:] by the first column of vec and P[1,:,:] by the second column of vec. However, the operation P@vec also computes the matrix product of P[0,:,:] with the second column of vec and of P[1,:,:] with the first column, which slows my code down.
Is it possible to directly compute the pairs column 1 to matrix 1 and column 2 to matrix 2 without the "off" products?
import numpy as np
P = np.arange(50).reshape(2, 5, 5)
vec = np.arange(10).reshape(5, 2)
have = P @ vec
want = np.column_stack((have[0, :, 0], have[1, :, 1]))
have, want
There is a very powerful function in NumPy called np.einsum. It can perform all kinds of tensor contractions, axis reorderings and matrix multiplications. For your example you could use
res = np.einsum('nij,jn->in', P, vec)
after which res is exactly like want.
How this works:
You pass np.einsum both of your arrays as well as a signature (the 'nij,jn->in' string) that tells it how to multiply them. In short, you want the third axis of the P tensor to be contracted with the first axis of vec. Therefore you use the same index j for both in the signature string and leave it out of the part after the ->, which makes einsum sum over it. Indices that appear on both sides of the -> are not summed but carried through to the output, which is the case here for n and i. Because n labels both the first axis of P and the second axis of vec, only matching pairs (matrix P[n] with column vec[:, n]) are multiplied, so the "off" products are never computed.
A more complete explanation of this very powerful function, with many examples of how to use it, can be found in the NumPy documentation for np.einsum.
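To make the signature concrete, here is a small sketch using the question's P and vec that compares the einsum result with an explicit loop (matrix P[n] times column vec[:, n]) and with the question's want array.

```python
import numpy as np

P = np.arange(50).reshape(2, 5, 5)
vec = np.arange(10).reshape(5, 2)

# res[i, n] = sum_j P[n, i, j] * vec[j, n]
res = np.einsum('nij,jn->in', P, vec)

# Same thing as an explicit loop: matrix P[n] times column vec[:, n]
loop = np.empty((5, 2), dtype=res.dtype)
for n in range(P.shape[0]):
    loop[:, n] = P[n] @ vec[:, n]

# The question's reference result, which also computes the unwanted "off" products
have = P @ vec
want = np.column_stack((have[0, :, 0], have[1, :, 1]))
```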
@/matmul handles batches nicely, but the rule is that for 3D arrays the first dimension is the batch, and the dot product is taken over the last 2 dimensions, with the usual "last axis of A with the second-to-last axis of B" pairing.
It took a bit of reading to decipher your description, but it appears that you want the first axis of P to be the batch, and the last axis of vec to be the batch. That means vec needs to be transformed to shape (2,5,1) to work with the (2,5,5) P.
In [176]: P@vec.T[:,:,None]
Out[176]:
array([[[  60],
        [ 160],
        [ 260],
        [ 360],
        [ 460]],

       [[ 695],
        [ 820],
        [ 945],
        [1070],
        [1195]]])
The result has shape (2,5,1). We can squeeze out the last axis to get (2,5), but apparently you want (5,2):
In [179]: (P@vec.T[:,:,None])[...,0].T
Out[179]:
array([[  60,  695],
       [ 160,  820],
       [ 260,  945],
       [ 360, 1070],
       [ 460, 1195]])
np.einsum('nij,jn->in', P, vec) does effectively the same, with the n as the batch dimension that is 'carried through' to the result, and sum-of-products on the shared j dimension.
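A short sketch checking the matmul expression above against einsum, plus a broadcasting-only variant (our own addition, not from the answers) that multiplies P[n, i, j] by vec[j, n] element-wise and sums over j. The broadcast variant materializes a temporary the size of P, so it is mainly useful for seeing what the contraction does.

```python
import numpy as np

P = np.arange(50).reshape(2, 5, 5)
vec = np.arange(10).reshape(5, 2)

# Batched matmul: (2,5,5) @ (2,5,1) -> (2,5,1); drop the trailing axis, transpose to (5,2)
via_matmul = (P @ vec.T[:, :, None])[..., 0].T

# Broadcasting only: (2,5,5) * (2,1,5), then sum over j (the last axis)
via_broadcast = (P * vec.T[:, None, :]).sum(axis=2).T

via_einsum = np.einsum('nij,jn->in', P, vec)
```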
I have a 3D NumPy array that looks like this:
A = np.random.randint(0, 10, (23, 23, 39)) # H, W, D
and I want to randomly sample along its depth to obtain a 2D array with only H and W.
Note: this doesn't work:
B = A[np.random.randint(0, 39, (23,23))]
I think this is what you're looking for:
B = np.array([x[np.random.randint(A.shape[2])] for y in A for x in y]).reshape(A.shape[:-1])
A little explanation: we use a list comprehension to iterate two-dimensionally over every sub-array (y iterates over dimension 0 and x over dimension 1, so each x is a 1D array along dimension 2).
From each of these 1D arrays, we then take the element at a random index.
The result is a large one-dimensional array containing one element from each sub-array. Finally, we reshape the array to the shape of A minus the last dimension (in our case, 23 x 23).
Hope it's what you're looking for!
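A vectorized alternative to the list comprehension, as a sketch: np.take_along_axis (available since NumPy 1.15) picks one element per (h, w) position using a random depth-index array idx, and plain advanced indexing with broadcast row and column indices does the same. The name idx and the use of np.random.default_rng are our own choices. This also hints at why the attempt in the question fails: A[idx] indexes the first axis, not the depth axis.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(0, 10, (23, 23, 39))          # H, W, D

# One random depth index per (h, w) position
idx = rng.integers(0, A.shape[2], A.shape[:2])  # shape (23, 23)

# take_along_axis needs idx to have the same ndim as A, hence idx[..., None]
B = np.take_along_axis(A, idx[..., None], axis=2)[..., 0]

# Equivalent advanced indexing: row and column indices broadcast against idx
H, W = A.shape[:2]
B2 = A[np.arange(H)[:, None], np.arange(W)[None, :], idx]
```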
Let's say I have a 4D array with shape (1,2,3,3):
test = np.array([[[[11,27,33],[45,58,96],[77,85,93]],[[55,27,39],[46,51,62],[73,86,98]]]])
What's the most efficient way of standardizing (calculating z-scores) over a 2D subset? For example, test[0][0] looks like this:
array([[11, 27, 33],
       [45, 58, 96],
       [77, 85, 93]])
There are 2 dimensions here, but I want to calculate the mean and standard deviation across both dimensions, and use those values to standardize each value in these 2 dimensions.
I can do it manually like this:
(test[0][0] - np.mean(test[0][0])) / np.std(test[0][0])
Which correctly gives:
array([[-1.61593336, -1.06970236, -0.86486574],
       [-0.45519249, -0.01137981,  1.2859188 ],
       [ 0.63726949,  0.91038499,  1.18350049]])
However, this would require me to iterate over the first 2 dimensions of the 4D array, which would take too long given the size of my actual data.
I see that SciPy has a zscore function, but it only works along one axis at a time (e.g. scipy.stats.zscore(test, axis=3)), and I haven't been able to find a simple implementation that standardizes across a 2D array.
Approach #1: You could use np.mean and np.std over multiple axes (in this case the last two, axis=(2,3)) and keep the number of dimensions unchanged with keepdims=1, so that the subsequent subtraction and division broadcast correctly.
Thus, a vectorized implementation would be:
(test - test.mean(axis=(2,3),keepdims=1)) / test.std(axis=(2,3),keepdims=1)
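As a quick check, here is a sketch that applies Approach #1 to the question's test array and compares it with the manual per-slice computation from the question, looped over the first two axes.

```python
import numpy as np

test = np.array([[[[11, 27, 33], [45, 58, 96], [77, 85, 93]],
                  [[55, 27, 39], [46, 51, 62], [73, 86, 98]]]])

# Approach #1: statistics over the last two axes, kept as shape (1, 2, 1, 1)
z = (test - test.mean(axis=(2, 3), keepdims=True)) / test.std(axis=(2, 3), keepdims=True)

# Reference: the question's manual computation, applied to every 2D slice
ref = np.empty(test.shape)
for a in range(test.shape[0]):
    for c in range(test.shape[1]):
        s = test[a, c]
        ref[a, c] = (s - s.mean()) / s.std()
```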
Approach #2: An alternative that uses the definition of the standard deviation to reuse the mean-subtracted values:
m = (test - test.mean(axis=(2,3),keepdims=1))
s = np.sqrt((np.abs(m)**2).mean(axis=(2,3),keepdims=1))
out = m/s
Approach #3: For larger datasets, you might want to use the numexpr module, which performs these summing/averaging operations quite efficiently:
import numexpr as ne
d0,d1 = test.shape[-2:]
m = (test - test.mean(axis=(2,3),keepdims=1))
m1 = m.reshape(-1,d0*d1)
s = np.sqrt(ne.evaluate('sum(abs(m1)**2,1)')/(d0*d1))
out = m/s.reshape(m.shape[:2] + (1,1))  # one s per slice over the first two axes
Based on this post, we could replace the division by s with a multiplication of m by 1.0/s for a further performance boost. This applies to all three approaches above.
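A small sketch of the reciprocal trick on hypothetical random data of shape (4, 3, 8, 8): s has shape (N, C, 1, 1), so 1.0/s is computed once per slice and the per-element work becomes a multiplication. It also checks Approach #2 against Approach #1 (np.std defaults to ddof=0, matching the mean of squares). Results may differ in the last bits, hence np.allclose; whether the multiplication is actually faster depends on your data and hardware.

```python
import numpy as np

rng = np.random.default_rng(0)
test = rng.random((4, 3, 8, 8))   # hypothetical (N, C, H, W) data

m = test - test.mean(axis=(2, 3), keepdims=True)
s = np.sqrt((m ** 2).mean(axis=(2, 3), keepdims=True))   # shape (4, 3, 1, 1)

out_div = m / s               # one division per element
out_mul = m * (1.0 / s)       # one division per slice, then multiplications

out_ref = (test - test.mean(axis=(2, 3), keepdims=True)) / test.std(axis=(2, 3), keepdims=True)
```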
To do normalization in NumPy, just make the shapes broadcast correctly:
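import numpy as np

def normalize_nchw(inp):
    EPS = 1e-6
    # per-(N, C) statistics over H and W, kept as shape (N, C, 1, 1) for broadcasting
    means = np.mean(inp, axis=(2, 3), keepdims=True)
    centered = inp - means  # new float array; the caller's input is left untouched
    variances = EPS + np.mean(centered * centered, axis=(2, 3), keepdims=True)
    return centered * (1. / np.sqrt(variances))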
Side note: if you are doing this for a CNN, a better idea is to use batch normalization, which is built into many frameworks.
I am trying to vectorize an operation with NumPy. Profiling my Python script showed that this operation is the bottleneck, and since I will run it many times, it needs to be optimized.
The operation is on a dataset with two parts. First, a large set (n) of 1D vectors of different lengths (with maximum length Lmax) whose elements are integers from 1 to maxvalue. The set of vectors is arranged in a 2D array, data, of size (num_samples, Lmax), with the trailing elements of each row zeroed. The second part is a set of scalar floats, one associated with each vector, that I have computed and that depend on its length and the integer value at each position. The set of scalars is made into a 1D array, Y, of size num_samples.
The desired operation is to form the average of Y over the n samples, as a function of (value,position along length,length).
This entire operation can be vectorized in MATLAB with the accumarray function, using three 2D arrays of the same size as data whose elements are the corresponding value, position, and length indices into the desired final array:
sz_Y = num_samples;
sz_len = Lmax
sz_pos = Lmax
sz_val = maxvalue
ind_len = repmat( 1:sz_len ,1 ,sz_Y);
ind_pos = repmat( 1:sz_pos ,sz_Y,1 );
ind_val = data
ind_Y = repmat((1:sz_Y)',1 ,Lmax );
copiedY=Y(ind_Y);
mask = data>0;
finalarr=accumarray({ind_val(mask),ind_pos(mask),ind_len(mask)},copiedY(mask), [sz_val sz_pos sz_len])/sz_val;
I was hoping to emulate this implementation with np.bincount. However, np.bincount differs from accumarray in two relevant ways:
both arguments must be 1D arrays of the same size, and
there is no option to choose the shape of the output array.
In the above usage of accumarray, the indices {ind_val(mask),ind_pos(mask),ind_len(mask)} form a 1x3 cell array of index vectors that together act as index tuples, while np.bincount, as far as I understand, needs a single 1D array of non-negative integers. I expect np.ravel may be useful but am not sure how to use it here. I am coming to Python from MATLAB, and some things do not translate directly, e.g. the colon operator, which ravels in the opposite (column-major) order to np.ravel's default. So my question is: how might I use np.bincount, or any other NumPy method, to implement this operation efficiently in Python?
EDIT: To avoid wasting time: for these multidimensional indexing problems with complicated index manipulation, is the recommended route to just use Cython to implement the loops explicitly?
EDIT2: Alternative Python implementation I just came up with.
Here is a RAM-heavy solution:
First precalculate:
Using index units for length (i.e., length 1 = 0), make a 4D bool array of size (num_samples, Lmax+1, Lmax+1, maxvalue+1) holding where the conditions are satisfied for each value in Y.
ALLcond = np.zeros((num_samples, Lmax+1, Lmax+1, maxvalue+1), dtype=bool)
for l in range(Lmax+1):
    for i in range(Lmax):  # data has only Lmax columns
        for v in range(maxvalue+1):
            ALLcond[:, l, i, v] = (data[:, i] == v) & (Lvec == l)
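where Lvec is a NumPy array holding the length of each vector (for the zero-padded data, e.g. Lvec = np.count_nonzero(data, axis=1); with a plain list, Lvec == l would be a single bool rather than an element-wise comparison). Then get the indices of these using np.where and initialize a 4D float array into which you will assign the values of Y: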
ind_Y, ind_len, ind_pos, ind_val = np.where(ALLcond)
Yval=np.zeros(np.shape(ALLcond),dtype='float')
Now in the loop in which I have to perform the operation, I compute it with the two lines:
Yval[ind_Y, ind_len, ind_pos, ind_val] = Y[ind_Y]
Y_avg = Yval.sum(axis=0) / num_samples
This gives roughly a 4x speedup over the direct loop implementation. I was expecting more. Perhaps this is a more tangible implementation for Python users to digest. Any faster suggestions are welcome :)
One way is to convert the 3 "indices" to a linear index and then apply bincount. Numpy's ravel_multi_index is essentially the same as MATLAB's sub2ind. So the ported code could be something like:
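To illustrate the idea on a toy case (the subscripts and values below are made up), np.ravel_multi_index turns subscript pairs into linear indices, and np.bincount with weights and minlength then sums values that share a subscript, which is the accumarray behaviour. np.add.at gives the same result directly. Note that NumPy indices are 0-based and row-major by default; order='F' reproduces MATLAB's column-major numbering.

```python
import numpy as np

shape = (3, 4)
rows = np.array([0, 2, 2, 1, 0])
cols = np.array([1, 3, 3, 0, 1])
vals = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Like MATLAB's sub2ind, but 0-based and in C (row-major) order by default
lin = np.ravel_multi_index((rows, cols), shape)

# Like accumarray: sum vals sharing a subscript; minlength fixes the output size
acc = np.bincount(lin, weights=vals, minlength=np.prod(shape)).reshape(shape)

# Reference using unbuffered in-place addition
ref = np.zeros(shape)
np.add.at(ref, (rows, cols), vals)

# MATLAB's column-major numbering corresponds to order='F'
lin_f = np.ravel_multi_index((rows, cols), shape, order='F')
```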
shape = (Lmax+1, Lmax+1, maxvalue+1)
posvec = np.arange(1, Lmax+1)
ind_len = np.tile(Lvec[:,None], [1, Lmax])
ind_pos = np.tile(posvec, [n, 1])
ind_val = data
Y_copied = np.tile(Y[:,None], [1, Lmax])
mask = posvec <= Lvec[:,None] # fill-value independent
lin_idx = np.ravel_multi_index((ind_len[mask], ind_pos[mask], ind_val[mask]), shape)
Y_avg = np.bincount(lin_idx, weights=Y_copied[mask], minlength=np.prod(shape)) / n
Y_avg.shape = shape
This assumes that data has shape (n, Lmax), Lvec is a NumPy array, etc. You may need to adapt the code a little to get rid of off-by-one errors.
One could argue that the tile operations are not very efficient and not very "numpythonic". Something with broadcast_arrays could be nice, but I think I prefer this way:
shape = (Lmax+1, Lmax+1, maxvalue+1)
posvec = np.arange(1, Lmax+1)
mask = posvec <= Lvec[:,None]  # fill-value independent; must be defined before it is used
len_idx = np.repeat(Lvec, Lvec)
pos_idx = np.broadcast_to(posvec, data.shape)[mask]
val_idx = data[mask]
Y_copied = np.repeat(Y, Lvec)
lin_idx = np.ravel_multi_index((len_idx, pos_idx, val_idx), shape)
Y_avg = np.bincount(lin_idx, weights=Y_copied, minlength=np.prod(shape)) / n
Y_avg.shape = shape
Note that broadcast_to was added in NumPy 1.10.0.
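A small end-to-end check of the second version, on hypothetical random data (the sizes n, Lmax, maxvalue and the random generator are illustrative assumptions), compared with a plain Python loop that accumulates Y at index (length, position, value). It assumes zero padding and values in 1..maxvalue, as described in the question, with mask defined before it is used.

```python
import numpy as np

rng = np.random.default_rng(1)
n, Lmax, maxvalue = 6, 5, 3                      # hypothetical sizes

# Zero-padded data: row s holds Lvec[s] values in 1..maxvalue, then zeros
Lvec = rng.integers(1, Lmax + 1, n)
posvec = np.arange(1, Lmax + 1)
mask = posvec <= Lvec[:, None]                   # True on the valid (non-padded) entries
data = np.where(mask, rng.integers(1, maxvalue + 1, (n, Lmax)), 0)
Y = rng.random(n)

# bincount version
shape = (Lmax + 1, Lmax + 1, maxvalue + 1)
len_idx = np.repeat(Lvec, Lvec)                  # each row's length, once per valid entry
pos_idx = np.broadcast_to(posvec, data.shape)[mask]
val_idx = data[mask]
Y_copied = np.repeat(Y, Lvec)
lin_idx = np.ravel_multi_index((len_idx, pos_idx, val_idx), shape)
Y_avg = np.bincount(lin_idx, weights=Y_copied, minlength=np.prod(shape)) / n
Y_avg = Y_avg.reshape(shape)

# Direct loop for comparison (positions are 1-based, as in posvec)
ref = np.zeros(shape)
for s in range(n):
    for p in range(Lvec[s]):
        ref[Lvec[s], p + 1, data[s, p]] += Y[s]
ref /= n
```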
I have some working code using the einsum function, but einsum is still black voodoo to me. I am wondering what this code actually does and whether it can somehow be optimized using np.dot.
My data looks like this:
n, p, q = 40000, 8, 4
a = np.random.rand(n, p, q)
b = np.random.rand(n, p)
And my existing einsum calls look like this:
f1 = np.einsum("ijx,ijy->ixy", a, a)
f2 = np.einsum("ijx,ij->ix", a, b)
But what does it really do? I get this far: each dimension (axis) is represented by a label; i corresponds to the first axis (n), j to the second axis (p), and x and y are different labels for the same axis (q).
So the order of the output array of f1 is ixy, and thus the output shape is (40000, 4, 4), i.e. (n, q, q).
But that's as far as I get. And
Let's play around with a couple of small arrays:
In [110]: a=np.arange(2*3*4).reshape(2,3,4)
In [111]: b=np.arange(2*3).reshape(2,3)
In [112]: np.einsum('ijx,ij->ix',a,b)
Out[112]:
array([[ 20,  23,  26,  29],
       [200, 212, 224, 236]])
In [113]: np.diagonal(np.dot(b,a)).T
Out[113]:
array([[ 20,  23,  26,  29],
       [200, 212, 224, 236]])
np.dot operates on the last dimension of the first array and the second-to-last of the second. So I have to switch the arguments so that the size-3 dimensions line up. dot(b,a) produces a (2,2,4) array; diagonal selects 2 of those 'rows', and the transpose cleans up. Another einsum expresses that cleanup nicely:
In [122]: np.einsum('iik->ik',np.dot(b,a))
Since np.dot is producing a larger array than the original einsum, it is unlikely to be faster, even if the underlying C code is tighter.
(Curiously I'm having trouble replicating np.dot(b,a) with einsum; it won't generate that (2,2,...) array).
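For what it's worth, the (2, 2, 4) array from np.dot(b, a) can, as far as I can tell, be written with einsum by giving the first axis of b its own label (k below) instead of reusing i. A small check on the same a and b:

```python
import numpy as np

a = np.arange(2 * 3 * 4).reshape(2, 3, 4)
b = np.arange(2 * 3).reshape(2, 3)

# np.dot(b, a)[k, i, x] = sum_j b[k, j] * a[i, j, x]; b's batch axis gets its own label k
d_einsum = np.einsum('kj,ijx->kix', b, a)
d_dot = np.dot(b, a)                       # shape (2, 2, 4)

# Keeping only k == i recovers the original einsum('ijx,ij->ix', a, b)
diag = np.einsum('iix->ix', d_einsum)
```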
For the a,a case we have to do something similar - roll the axes of one array so the last dimension lines up with the 2nd to last of the other, do the dot, and then cleanup with diagonal and transpose:
In [157]: np.einsum('ijx,ijy->ixy',a,a).shape
Out[157]: (2, 4, 4)
In [158]: np.einsum('ijjx->jix',np.dot(np.rollaxis(a,2),a))
In [176]: np.diagonal(np.dot(np.rollaxis(a,2),a),0,2).T
tensordot is another way of taking a dot over selected axes.
np.tensordot(a,a,(1,1))
np.diagonal(np.rollaxis(np.tensordot(a,a,(1,1)),1),0,2).T # with cleanup
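Written out with explicit sums, both einsum calls contract over j while carrying i through to the output, so i acts as a batch axis: f1[i, x, y] = sum_j a[i, j, x] * a[i, j, y] and f2[i, x] = sum_j a[i, j, x] * b[i, j]. Since np.matmul (the @ operator) broadcasts over leading batch axes, a sketch like the following (on smaller random data than the question's, an assumption made for a quick check) computes the same results without the diagonal detour. Which variant is fastest for n = 40000 is not established here; time it on your own data.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 50, 8, 4            # smaller n than the question's 40000, same p and q
a = rng.random((n, p, q))
b = rng.random((n, p))

# f1[i, x, y] = sum_j a[i, j, x] * a[i, j, y]
f1 = np.einsum("ijx,ijy->ixy", a, a)
# f2[i, x] = sum_j a[i, j, x] * b[i, j]
f2 = np.einsum("ijx,ij->ix", a, b)

aT = np.swapaxes(a, 1, 2)                  # shape (n, q, p)
f1_mm = aT @ a                             # (n, q, p) @ (n, p, q) -> (n, q, q)
f2_mm = (aT @ b[:, :, None])[:, :, 0]      # (n, q, p) @ (n, p, 1) -> (n, q, 1) -> (n, q)

# Explicit loop over the batch axis, for comparison
f1_loop = np.stack([a[i].T @ a[i] for i in range(n)])
f2_loop = np.stack([a[i].T @ b[i] for i in range(n)])
```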