import numpy as np
ts = np.random.rand(40,45,40,1000)
mask = np.random.randint(2, size=(40,45,40),dtype=bool)
#creating a masked array
ts_m = np.ma.array(ts, mask=ts*~mask[:,:,:,np.newaxis])
#demeaning
ts_md = ts_m - ts_m.mean(axis=3)[:,:,:,np.newaxis]
#standardisation
ts_mds = ts_md / ts_md.std(ddof=1,axis=3)[:,:,:,np.newaxis]
I would like to demean ts (along axis 3) and divide by its standard deviation (along axis 3), all within the mask.
Am I doing this correctly? Is there a faster method?
You have a couple of options available to you.
The first is to use masked arrays as you are doing, but provide a proper mask and use the masked functions. Right now, your code is computing all the means and standard deviations, and slapping a mask on the result. To skip masked elements, use np.ma.mean and np.ma.std, and thereby avoid doing a whole lot of extra work.
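As a quick illustration of the difference (a toy sketch, not the full 4D arrays):
import numpy as np
a = np.ma.array([1.0, 2.0, 100.0], mask=[False, False, True])
print(np.ma.mean(a))   # 1.5 -- the masked 100.0 is skipped
print(a.data.mean())   # 34.33... -- plain mean over the raw buffer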
As you correctly understood, the size of the mask must match that of the data. While multiplying by the data gives you the correct size, it is expensive and gives the wrong result in the general case, since the computed mask ends up False wherever either the data or the inverted mask happens to be zero. A better approach is to create a view of the mask repeated along the last (new) dimension. You can use np.broadcast_to if you get the trailing dimensions to match up first:
ts = np.random.rand(40, 45, 40, 1000)
mask = np.random.randint(2, size=(40, 45, 40), dtype=bool)
#creating a masked array
ts_m = np.ma.array(ts, mask=np.broadcast_to(mask[..., None], ts.shape))
#demeaning
ts_md = ts_m - np.ma.mean(ts_m, axis=3)[..., None]
#standardisation
ts_mds = ts_md / np.ma.std(ts_m, ddof=1, axis=3)[..., None]
The broadcast mask is read-only, and because it has a dimension with zero stride, it can sometimes do unexpected things if treated like an ordinary array. The broadcast version here is roughly equivalent to
np.lib.stride_tricks.as_strided(mask, ts.shape, (*mask.strides, 0), writeable=False)
Both versions create views of the original data, so they are very fast: they just allocate a new array object that points to the existing buffer, which is not copied. Keep in mind that np.lib.stride_tricks.as_strided is a sledgehammer that should be used with the utmost care: it will crash your interpreter any day if you let it.
Note: the mask in a masked array is interpreted as True meaning masked, while Boolean indexing arrays are interpreted with False meaning excluded. Depending on how your mask is obtained and what it means in your real code, you may want to invert it:
mask=np.broadcast_to(~mask[..., None], ...)
Another option is to implement the masking yourself. There are two ways you can do that. If you do it up-front, the mask will be applied to the leading dimensions of your data:
ts = np.random.rand(40, 45, 40, 1000)
mask = np.random.randint(2, size=(40, 45, 40), dtype=bool)
#creating a masked array
mask = ~mask # optional, see note above
ts_m = ts[mask]
#demeaning
ts_md = ts_m - ts_m.mean(axis=-1)
#standardisation
ts_mds = ts_md / ts_md.std(ddof=1,axis=-1)
# reshaping
result = np.empty_like(ts) # alternatively, np.zeros_like
result[mask] = ts_mds
This option may be cheaper than a masked array: the initial masking step creates an (nnz, 1000) array, where nnz is the number of selected voxels out of the 40*45*40, and the result is only scattered back into the masked area at the end, instead of operating on the full-sized data and preserving shape throughout.
The third option is only really useful if you have only a small number of elements masked out. It's essentially what your original code is doing: perform all the computations, and apply the mask to the result.
More Tips
Ellipsis is a special object that means "all the remaining dimensions". It's usually abbreviated ... in slice notation. np.newaxis is an alias for None. Combine those pieces of information, and you get that [:, :, :, np.newaxis] can be written more cleanly and elegantly as [..., None]. The latter is more general, since it works for an arbitrary number of dimensions.
Numpy allows for negative axis indices. A nicer way to say "last axis" is generally axis=-1.
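For instance, both tips in a throwaway check:
a = np.zeros((2, 3, 4))
print(a[..., None].shape == a[:, :, :, np.newaxis].shape)   # True
print(np.allclose(a.mean(axis=-1), a.mean(axis=2)))         # True
Putting the tips together with the broadcast mask, the original snippet becomes: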
import numpy as np
ts = np.random.rand(40,45,40,1000)
mask = np.random.randint(2, size=(40,45,40)).astype(bool)
#creating a masked array
ts_m = np.ma.array(ts, mask=np.broadcast_to(~mask.reshape(40,45,40,1),ts.shape))
#demeaning
ts_md = ts_m - ts_m.mean(axis=3)[:,:,:,np.newaxis]
#standardisation
ts_mds = ts_md / ts_md.std(ddof=1,axis=3)[:,:,:,np.newaxis]
I would like to iterate through a subset of dimensions of a numpy array and compare the resulting array elements (which are arrays of the remaining dimension(s)).
The code below does this:
import numpy
def min(h, m):
    return h*60 + m
exclude_times_default = [min(3, 00), min(6, 55)]
d = exclude_times_default
exclude_times_wkend = [min(3, 00), min(9, 00)]
w = exclude_times_wkend
exclude_times=numpy.array([[[min(3,00),min(6,20)],d,d,d,d,d,[min(3,00),min(6,20)],d,d,[min(3,00),min(6,20)]],
[d,d,d,d,[min(3,00),min(9,30)],[min(3,00),min(9,30)],d,d,d,d],
[[min(20,00),min(7,15)],[min(3,00),min(23,15)],[min(3,00),min(7,15)],[min(3,00),min(7,15)],[min(3,00),min(23,15)],[min(3,00),min(23,15)],d,d,d,d]])
num_level=exclude_times.shape[0]
num_wind=exclude_times.shape[1]
for level in range(num_level):
    for window in range(num_wind):
        if (exclude_times[level, window, :] == d).all():
            print("Default")
            exclude_times[level][window] = w
            print(level, window, exclude_times[level][window])
The solution does not look very elegant to me, just wondering if there are more elegant solutions.
You can get a 2D mask pinpointing all the window/level combinations set to default like this (converting d to a NumPy array first, so it can be indexed with None):
d = np.asarray(d)
mask = (exclude_times == d[None, None, :]).all(axis=-1)
The expression d[None, None, :] introduces two new axes into a view of d to make it broadcast to the shape of exclude_times properly. Another way to do that would be with an explicit reshape: np.reshape(d, (1, 1, -1)) or d.reshape(1, 1, -1). There are many other ways as well.
The .all(axis=-1) operation reduces the 3D boolean mask along the last axis, giving you a 2D mask indexed by level and window.
To count the number of default entries, use np.count_nonzero:
nnz = np.count_nonzero(mask)
To count the defaults for each window:
np.count_nonzero(mask, axis=0)
To count the defaults for each level:
np.count_nonzero(mask, axis=1)
Remember, the axis parameter is the one you reduce, not the one(s) you keep.
Assigning w to the default elements requires a little care. Reading exclude_times[mask] gives you a copy of the selected data, and a (level, window, 1) boolean mask does not broadcast against the last axis in boolean indexing anyway. The clean route is to index with the 2D mask, which selects an (nnz, 2) block of rows: assigning to that expression writes through to the original array, and w (shape (2,)) broadcasts across the block:
exclude_times[mask] = w
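Putting the pieces together, a minimal end-to-end sketch of the vectorized replacement for the original loop (assuming exclude_times, exclude_times_default, and exclude_times_wkend are defined as in the question):
import numpy as np
d = np.asarray(exclude_times_default)            # shape (2,)
w = np.asarray(exclude_times_wkend)              # shape (2,)
mask = (exclude_times == d[None, None, :]).all(axis=-1)
nnz = np.count_nonzero(mask)                     # total number of default entries
exclude_times[mask] = w                          # broadcast w over the (nnz, 2) block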
I am trying to vectorize an operation using numpy. I use it in a python script that I have profiled and found this operation to be the bottleneck, so it needs to be optimized, since I will run it many times.
The operation is on a data set with two parts. First, a large set of n 1D vectors of different lengths (with maximum length Lmax), whose elements are integers from 1 to maxvalue. The set of vectors is arranged in a 2D array, data, of size (num_samples, Lmax), with the trailing elements in each row zeroed. The second part is a set of scalar floats, one associated with each vector, that I have computed and which depends on the vector's length and the integer value at each position. The set of scalars is made into a 1D array, Y, of size num_samples.
The desired operation is to form the average of Y over the n samples, as a function of (value, position along length, length).
This entire operation can be vectorized in MATLAB with the accumarray function, by using three 2D arrays of the same size as data, whose elements are the corresponding value, position, and length indices of the desired final array:
sz_Y = num_samples;
sz_len = Lmax
sz_pos = Lmax
sz_val = maxvalue
ind_len = repmat( 1:sz_len ,1 ,sz_samples);
ind_pos = repmat( 1:sz_pos ,sz_samples,1 );
ind_val = data
ind_Y = repmat((1:sz_Y)',1 ,Lmax );
copiedY=Y(ind_Y);
mask = data>0;
finalarr=accumarray({ind_val(mask),ind_pos(mask),ind_len(mask)},copiedY(mask), [sz_val sz_pos sz_len])/sz_val;
I was hoping to emulate this implementation with np.bincount. However, np.bincount differs from accumarray in two relevant ways:
both arguments must be 1D arrays of the same size, and
there is no option to choose the shape of the output array.
In the above usage of accumarray, the list of indices, {ind_val(mask),ind_pos(mask),ind_len(mask)}, is a cell array of index vectors that together act as index tuples, while np.bincount takes a single 1D array of flat indices as far as I understand. I expect np.ravel may be useful, but I am not sure how to use it here to do what I want. I am coming to Python from MATLAB and some things do not translate directly, e.g. MATLAB's colon operator ravels in column-major order, the opposite of np.ravel. So my question is: how might I use np.bincount, or any other numpy method, to achieve an efficient Python implementation of this operation?
EDIT: To avoid wasting time: for these multi-dimensional index problems with complicated index manipulation, is the recommended route to just use Cython to implement the loops explicitly?
EDIT2: Alternative Python implementation I just came up with.
Here is a RAM-heavy solution:
First precalculate:
Using zero-based index units for length (i.e., length 1 maps to index 0), make a 4D bool array of size (num_samples, Lmax+1, Lmax+1, maxvalue+1), holding where the conditions are satisfied for each value in Y.
ALLcond = np.zeros((num_samples, Lmax+1, Lmax+1, maxvalue+1), dtype='bool')
for l in range(Lmax+1):
    for i in range(Lmax):          # data has Lmax columns
        for v in range(maxvalue+1):
            ALLcond[:, l, i, v] = (data[:, i] == v) & (Lvec == l)
Where Lvec holds the true length of each row (e.g. Lvec = np.count_nonzero(data, axis=1) for the zero-padded data array). Then get the indices for these using np.where, and initialize a 4D float array into which you will assign the values of Y:
[ind_Y, ind_len, ind_pos, ind_val] = np.where(ALLcond)
Yval = np.zeros(ALLcond.shape, dtype='float')
Now in the loop in which I have to perform the operation, I compute it with the two lines:
Yval[ind_Y, ind_len, ind_pos, ind_val] = Y[ind_Y]
Y_avg = Yval.sum(axis=0) / num_samples
This gives a factor of 4 or so speedup over the direct loop implementation. I was expecting more. Perhaps this is a more tangible implementation for Python heads to digest. Any faster suggestions are welcome :)
One way is to convert the 3 "indices" to a linear index and then apply bincount. Numpy's ravel_multi_index is essentially the same as MATLAB's sub2ind. So the ported code could be something like:
shape = (Lmax+1, Lmax+1, maxvalue+1)
posvec = np.arange(1, Lmax+1)
ind_len = np.tile(Lvec[:,None], [1, Lmax])
ind_pos = np.tile(posvec, [n, 1])
ind_val = data
Y_copied = np.tile(Y[:,None], [1, Lmax])
mask = posvec <= Lvec[:,None] # fill-value independent
lin_idx = np.ravel_multi_index((ind_len[mask], ind_pos[mask], ind_val[mask]), shape)
Y_avg = np.bincount(lin_idx, weights=Y_copied[mask], minlength=np.prod(shape)) / n
Y_avg.shape = shape
This is assuming data has shape (n, Lmax), Lvec is a NumPy array, etc. You may need to adapt the code a little to get rid of off-by-one errors.
One could argue that the tile operations are not very efficient and not very "numpythonic". Something with broadcast_arrays could be nice, but I think I prefer this way:
shape = (Lmax+1, Lmax+1, maxvalue+1)
posvec = np.arange(1, Lmax+1)
mask = posvec <= Lvec[:, None]  # fill-value independent; defined before use below
len_idx = np.repeat(Lvec, Lvec)
pos_idx = np.broadcast_to(posvec, data.shape)[mask]
val_idx = data[mask]
Y_copied = np.repeat(Y, Lvec)
lin_idx = np.ravel_multi_index((len_idx, pos_idx, val_idx), shape)
Y_avg = np.bincount(lin_idx, weights=Y_copied, minlength=np.prod(shape)) / n
Y_avg.shape = shape
Note broadcast_to was added in Numpy 1.10.0.
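As a sanity check, here is a toy run of the second variant with made-up sizes (hypothetical values, chosen only to exercise the shapes):
import numpy as np
n, Lmax, maxvalue = 5, 4, 3
Lvec = np.array([2, 4, 1, 3, 4])                 # true length of each row
data = np.zeros((n, Lmax), dtype=int)
for i, L in enumerate(Lvec):                     # values 1..maxvalue, zero-padded
    data[i, :L] = np.random.randint(1, maxvalue + 1, size=L)
Y = np.random.rand(n)
shape = (Lmax + 1, Lmax + 1, maxvalue + 1)
posvec = np.arange(1, Lmax + 1)
mask = posvec <= Lvec[:, None]
len_idx = np.repeat(Lvec, Lvec)
pos_idx = np.broadcast_to(posvec, data.shape)[mask]
val_idx = data[mask]
Y_copied = np.repeat(Y, Lvec)
lin_idx = np.ravel_multi_index((len_idx, pos_idx, val_idx), shape)
Y_avg = np.bincount(lin_idx, weights=Y_copied, minlength=np.prod(shape)) / n
Y_avg.shape = shape
print(Y_avg.shape)                               # (5, 5, 4)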
This question has been asked before, but the solution only works for 1D/2D arrays, and I need a more general answer.
How do you create a repeating array without replicating the data? This strikes me as something of general use, as it would help to vectorize python operations without the memory hit.
More specifically, I have a (y,x) array, which I want to tile multiple times to create a (z,y,x) array. I can do this with numpy.tile(array, (nz,1,1)), but I run out of memory. My specific case has x=1500, y=2000, z=700.
One simple trick is to use np.broadcast_arrays to broadcast your (y, x) array against a z-long vector in the first dimension:
import numpy as np
M = np.arange(1500*2000).reshape(1500, 2000)
z = np.zeros(700)
# broadcasting over the first dimension
_, M_broadcast = np.broadcast_arrays(z[:, None, None], M[None, ...])
print(M_broadcast.shape, M_broadcast.flags.owndata)
# (700, 1500, 2000) False
To generalize the stride_tricks method given for a 1D array in this answer, you just need to include the shape and stride length for each dimension of your output array:
M_strided = np.lib.stride_tricks.as_strided(
    M,                                 # input array
    (700, M.shape[0], M.shape[1]),     # output dimensions
    (0, M.strides[0], M.strides[1]),   # stride lengths in bytes
    writeable=False,                   # read-only: safer for a broadcast view
)
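For what it's worth, on NumPy 1.10.0 and later np.broadcast_to wraps this same trick in a safer, read-only package:
M_view = np.broadcast_to(M, (700,) + M.shape)
print(M_view.shape, M_view.flags.owndata)   # (700, 1500, 2000) False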
In NumPy, is there an easy way to broadcast two arrays of dimensions e.g. (x,y) and (x,y,z)? NumPy broadcasting typically matches dimensions from the last dimension, so usual broadcasting will not work (it would require the first array to have dimension (y,z)).
Background: I'm working with images, some of which are RGB (shape (h,w,3)) and some of which are grayscale (shape (h,w)). I generate alpha masks of shape (h,w), and I want to apply the mask to the image via mask * im. This doesn't work because of the above-mentioned problem, so I end up having to do e.g.
mask = mask.reshape(mask.shape + (1,) * (len(im.shape) - len(mask.shape)))
which is ugly. Other parts of the code do operations with vectors and matrices, which also run into the same issue: it fails trying to execute m + v where m has shape (x,y) and v has shape (x,). It's possible to use e.g. atleast_3d, but then I have to remember how many dimensions I actually wanted.
How about using transpose:
(a.T + c.T).T
numpy functions often have blocks of code that check dimensions, reshape arrays into compatible shapes, all before getting down to the core business of adding or multiplying. They may reshape the output to match the inputs. So there is nothing wrong with rolling your own that do similar manipulations.
Don't dismiss offhand the idea of rotating the variable third dimension to the start of the dimension list. Doing so takes advantage of the fact that numpy automatically adds dimensions at the start when broadcasting.
For element by element multiplication, einsum is quite powerful.
np.einsum('ij...,ij...->ij...',im,mask)
will handle cases where im and mask are any mix of 2 or 3 dimensions (assuming the first two are always compatible). Unfortunately this does not generalize to addition or other operations.
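For example, with toy images (hypothetical shapes, just for illustration), the one call covers both the grayscale and the RGB case:
import numpy as np
h, w = 4, 5
mask = np.random.rand(h, w)
gray = np.random.rand(h, w)
rgb = np.random.rand(h, w, 3)
print(np.einsum('ij...,ij...->ij...', gray, mask).shape)   # (4, 5)
print(np.einsum('ij...,ij...->ij...', rgb, mask).shape)    # (4, 5, 3)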
A while back I simulated einsum with a pure Python version. For that I used np.lib.stride_tricks.as_strided and np.nditer. Look into those functions if you want more power in mixing and matching dimensions.
As another angle: if you encounter this pattern frequently, it may be useful to create a utility function to enforce right-broadcasting:
def right_broadcasting(arr, target):
    return arr.reshape(arr.shape + (1,) * (target.ndim - arr.ndim))
Although if there are only two types of input (already having 3 dims or having only 2), I'd say a single if statement is preferable.
Indexing with np.newaxis creates a new axis in that place, i.e.
xyz = np.ones((4, 5, 6))   # stand-in for some 3d array
xy = np.ones((4, 5))       # stand-in for some 2d array
xyz_sum = xyz + xy[:, :, np.newaxis]
or
xyz_sum = xyz + xy[:, :, None]
Indexing in this way inserts an axis of length 1 at that location; under broadcasting it behaves like an axis with stride 0, repeating the values without copying them.
Why not just decorate-process-undecorate:
def flipflop(func):
    def wrapper(a, mask):
        if len(a.shape) == 3:
            mask = mask[..., None]
        b = func(a, mask)
        return np.squeeze(b)
    return wrapper

@flipflop
def f(x, mask):
    return x * mask
Then
>>> N = 12
>>> gs = np.random.random((N, N))
>>> rgb = np.random.random((N, N, 3))
>>>
>>> mask = np.ones((N, N))
>>>
>>> f(gs, mask).shape
(12, 12)
>>> f(rgb, mask).shape
(12, 12, 3)
Easy, you just add a singleton dimension at the end of the smaller array. For example, if xyz_array has shape (x,y,z) and xy_array has shape (x,y), you can do
xyz_array + np.expand_dims(xy_array, xy_array.ndim)
I am implementing color interpolation using a look-up-table (LUT) with NumPy. At one point I am using the 4 most significant bits of RGB values to choose corresponding CMYK values from a 17x17x17x4 LUT. Right now it looks something like this:
import numpy as np
rgb = np.random.randint(16, size=(3, 1000, 1000))
lut = np.random.randint(256, size=(17, 17, 17, 4))
cmyk = lut[rgb[0], rgb[1], rgb[2]]
Here comes the first question... Is there no better way? It sort of seems natural that you could tell NumPy that the indices for lut are stored along axis 0 of rgb, without having to actually write it out. So is there anything like cmyk = lut.fancier_take(rgb, axis=0) in NumPy?
Furthermore, I am left with an array of shape (1000, 1000, 4), so to be consistent with the input, I need to rotate it all around using a couple of swapaxes:
cmyk = cmyk.swapaxes(2, 1).swapaxes(1, 0).copy()
And I also need to add the copy statement, because otherwise the resulting array is not contiguous in memory, and that brings trouble later on.
Right now I am leaning towards rotating the LUT before the fancy indexing and then do something along the lines of:
swapped_lut = lut.swapaxes(2, 1).swapaxes(1, 0)
cmyk = swapped_lut[np.arange(4), rgb[0], rgb[1], rgb[2]]
But again, it just does not seem right... There has to be a more elegant way to do this, right? Something like cmyk = lut.even_fancier_take(rgb, in_axis=0, out_axis=0)...
I'd suggest converting rgb to a tuple, so that its first axis is unpacked into separate indices, and using np.rollaxis or transpose instead of swapaxes:
lut[tuple(rgb)].transpose(2, 0, 1).copy()
or
np.rollaxis(lut[tuple(rgb)], 2).copy()
To roll the axis first, use:
np.rollaxis(lut, -1)[(Ellipsis,) + tuple(rgb)]
If you swap the axes of lut first, note that np.arange(4) will not work as the leading index; use a full slice instead:
swapped_lut = np.rollaxis(lut, -1)
cmyk = swapped_lut[:, rgb[0], rgb[1], rgb[2]].copy()
Or you can replace
cmyk = lut[rgb[0], rgb[1], rgb[2]]
cmyk = cmyk.swapaxes(2, 1).swapaxes(1, 0).copy()
with:
cmyk = lut[tuple(rgb)]
cmyk = np.rollaxis(cmyk, -1).copy()
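As a side note, if the .copy() is only there to make the result contiguous, np.ascontiguousarray states that intent directly (and copies only when the layout requires it):
cmyk = np.ascontiguousarray(np.rollaxis(lut[tuple(rgb)], -1))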
But to try and do it all in one step, ... Maybe:
rng = np.arange(4).reshape(4, 1, 1)
cmyk = lut[rgb[0], rgb[1], rgb[2], rng]
That's not very readable at all, is it?
Take a look at the answer to this question, Numpy multi-dimensional array indexing swaps axis order. It does a good job of explaining how numpy broadcasts multiple arrays to get the output size. Here you want to create indices into lut that broadcast to (4, 1000, 1000). Hope that makes some sense.
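To make the broadcasting concrete, here is a small sketch with toy sizes (a 10x10 image instead of 1000x1000) showing the index arrays combining to the output shape:
import numpy as np
lut = np.random.randint(256, size=(17, 17, 17, 4))
rgb = np.random.randint(16, size=(3, 10, 10))
rng = np.arange(4).reshape(4, 1, 1)          # shape (4, 1, 1)
# the three (10, 10) index arrays broadcast with rng to (4, 10, 10)
cmyk = lut[rgb[0], rgb[1], rgb[2], rng]
print(cmyk.shape)                            # (4, 10, 10)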