I'm working with masked arrays and I want to calculate the max of different arrays/columns. I run into problems if the whole array is masked.
Example:
import numpy as np
x = np.ma.array(np.array([1,2,3,4,100]),mask=[True,True,True, True, True])
y = 5
print(np.max(np.hstack((x, y))))
print(np.max((np.max(y), np.max(x))))
print(np.max((np.hstack((np.max(x), 5)))))
Results:
100
nan
--
I find the result odd, because the result should be 5. Why is hstack() ignoring the mask of the masked array?
With masked arrays, you need to use the masked routines; that is, call the numpy.ma version of each function (np.ma. should precede the function name):
>>> np.ma.hstack((x, y))
masked_array(data = [-- -- -- -- -- 5],
mask = [ True True True True True False],
fill_value = 999999)
>>> np.ma.max(np.ma.hstack((x, y)))
5
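To make the difference concrete, here is a minimal sketch that stays inside np.ma for both the stacking and the reduction. Wrapping y in a list is only an illustrative choice so that both inputs are 1-D; the variable names stacked and result are not from the question.

```python
import numpy as np

x = np.ma.array([1, 2, 3, 4, 100], mask=[True] * 5)  # every element masked
y = 5

# Mask-aware stacking keeps the mask of x; the plain value 5 is unmasked
stacked = np.ma.hstack((x, [y]))

# Mask-aware maximum looks at unmasked entries only
result = np.ma.max(stacked)
print(result)

# The maximum of a fully masked array is itself masked, not a number
print(np.ma.is_masked(np.ma.max(x)))
```

If a plain number is needed even when everything is masked, that case has to be handled explicitly, for example by checking np.ma.is_masked on the result first.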
This is a problem I've run into when developing something, and it's a hard question to phrase. So it's best explained with a simple example:
Imagine you have 4 random number generators which generate an array of size 4:
[rng-0, rng-1, rng-2, rng-3]
| | | |
[val0, val1, val2, val3]
Our goal is to loop through "generations" of arrays populated by these RNGs, and iteratively mask out the RNG which outputted the maximum value.
So an example might be starting out with:
mask = [False, False, False, False], arr = [0, 10, 1, 3], and so we would mask out rng-1.
Then the next iteration could be: mask = [False, True, False, False], arr = [2, 1, 9] (before anyone asks: yes, arr HAS to decrease in size with each rng that is masked out). In this case, it is clear that rng-3 should be masked out (i.e. mask[3] = True), but since arr now has a different size than mask, it is tricky to get the right index for setting the mask (the max of arr is at index 2 of arr, but the corresponding generator is index 3). This problem grows more and more difficult as more generators get masked out (in my case I'm dealing with a mask of size ~30).
If it helps, here is a Python version of the example:
import numpy as np

rng = np.random.RandomState(42)
mask = np.zeros(10, dtype=bool)  # True if generator is being masked
for _ in range(mask.size):
    # one value per generator that is still unmasked
    arr = rng.randint(100, size=(~mask).sum())
    unadjusted_max_value_idx = arr.argmax()
    adjusted_max_value_idx = unadjusted_max_value_idx + ????
    mask[adjusted_max_value_idx] = True
Any idea of a good way to map the index of the max value in arr to the corresponding index in the mask (i.e. moving from unadjusted_max_value_idx to adjusted_max_value_idx)?
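# use a helper list that maps positions in arr back to the original generator indices
import numpy as np

rng = np.random.RandomState(42)
mask = np.zeros(10, dtype=bool)  # True if generator is being masked
ndxLst = list(range(mask.size))  # original indices of the still-unmasked generators
maskHistory = []
for _ in range(mask.size):
    arr = rng.randint(100, size=(~mask).sum())
    unadjusted_max_value_idx = arr.argmax()
    # pop() returns the original index and removes it, so ndxLst stays aligned with arr
    adjusted_max_value_idx = ndxLst.pop(unadjusted_max_value_idx)
    mask[adjusted_max_value_idx] = True
    maskHistory.append(adjusted_max_value_idx)
print(maskHistory)
print(mask)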
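As an alternative to the helper list, the mapping can also be recomputed from the mask on each pass. This is only a sketch of a different technique, not the approach above: np.flatnonzero(~mask) lists the original indices of the unmasked generators in order, so indexing it with the position in arr gives the position in mask. The names active and order are illustrative.

```python
import numpy as np

rng = np.random.RandomState(42)
mask = np.zeros(10, dtype=bool)  # True if generator is being masked
order = []                       # original indices, in the order they get masked

for _ in range(mask.size):
    # Original indices of the generators that are still unmasked
    active = np.flatnonzero(~mask)
    arr = rng.randint(100, size=active.size)
    # Position in arr -> position in mask
    adjusted_max_value_idx = active[arr.argmax()]
    mask[adjusted_max_value_idx] = True
    order.append(int(adjusted_max_value_idx))

print(order)
print(mask)
```

This avoids keeping a second structure in sync with the mask, at the cost of rebuilding the index array every iteration, which should be negligible for a mask of size ~30.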
Say I have some matrix W of shape MxN and an array of indices z of shape Mx1.
Now, assume I'd like to sum the elements of each row of W, excluding the element at the index given for that row in z.
1-d example:
import numpy as np
W = np.array([1.0, 2.0, 8.0])
z = 2
np.sum(np.delete(W,z))
MxN example and desired output:
import numpy as np
W = np.array([[1.0,2.0,8.0], [5.0,15.0,3.0]])
z = np.array([0,2]).reshape(2,1)
# desired output
# [10. 20.]
I tried to use np.delete with axis=1, with no success.
I managed to get around it using tricks like:
W = np.array([[1.0,2.0,8.0], [5.0,15.0,3.0]])
z = np.array([0,2])
W[np.arange(z.shape[0]), z]=0
print(np.sum(W, axis=1))
# [10. 20.]
but I'm wondering if there's a more elegant way.
Using broadcasting to get the mask to simulate deletion and then sum-reduce -
(W*(z != np.arange(W.shape[-1]))).sum(-1)
Sample runs -
For 2D case :
In [61]: W = np.array([[1.0,2.0,8.0], [5.0,15.0,3.0]])
...: z = np.array([0,2]).reshape(2,1)
In [62]: (W*(z != np.arange(W.shape[-1]))).sum(-1)
Out[62]: array([10., 20.])
Works just as well for the 1D case :
In [59]: W = np.array([1.0, 2.0, 8.0])
...: z = 2
In [60]: (W*(z != np.arange(W.shape[-1]))).sum(-1)
Out[60]: 3.0
For the 2D case, with np.einsum for the sum-reduction -
In [53]: np.einsum('ij,ij->i',W,z != np.arange(W.shape[1]))
Out[53]: array([10., 20.])
Summing and then subtracting the z-indexed values for 2D case -
In [134]: W.sum(1) - np.take_along_axis(W,z,axis=1).squeeze(1)
Out[134]: array([10., 20.])
Extend to handle both 2D and 1D cases -
W.sum(-1)-np.take_along_axis(W,np.atleast_1d(z),axis=-1).squeeze(-1)
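For reuse, the broadcasting idea can be wrapped in a small helper. This is a sketch, and row_sums_excluding is a made-up name. It uses np.where instead of the multiplication shown above, which is a slight variation: multiplying by a boolean mask turns an excluded inf or nan into nan, while np.where replaces the excluded entry outright.

```python
import numpy as np

def row_sums_excluding(W, z):
    """Sum along the last axis of W, skipping the column index given in z.

    For a 2D W of shape (M, N), z is expected to have shape (M, 1).
    For a 1D W, z is a single integer.
    """
    W = np.asarray(W)
    # keep[i, j] is True everywhere except at column z[i] of row i
    keep = np.asarray(z) != np.arange(W.shape[-1])
    return np.where(keep, W, 0).sum(-1)

W = np.array([[1.0, 2.0, 8.0], [5.0, 15.0, 3.0]])
z = np.array([0, 2]).reshape(2, 1)
print(row_sums_excluding(W, z))

# Cross-check against the np.delete approach, one row at a time
expected = np.array([np.delete(row, i).sum() for row, i in zip(W, z.ravel())])
print(np.allclose(row_sums_excluding(W, z), expected))
```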
Divaka's answers are pretty good. I just want to give another perspective on your question. If you need masking to ignore certain indices while doing multiple operations on an array, you should use a numpy masked array (np.ma.array) instead of a regular np.array. Masked arrays exist precisely for the purpose of ignoring certain indices.
See the masked array documentation for more info.
z = np.array([0,2]).reshape(2,1)
W_ma = np.ma.array(W, mask=z == np.arange(W.shape[-1]))
In [36]: W_ma
Out[36]:
masked_array(
data=[[--, 2.0, 8.0],
[5.0, 15.0, --]],
mask=[[ True, False, False],
[False, False, True]],
fill_value=1e+20)
With this W_ma masked array, you can do almost all the same operations as with an np.array. For the sum:
W_ma.sum(1)
Out[44]:
masked_array(data=[10.0, 20.0],
mask=[False, False],
fill_value=1e+20)
To turn a masked array into a regular array, you may use compressed, filled, or compress_rowcols:
In [46]: W_ma.sum(1).compressed()
Out[46]: array([10., 20.])
Note: I want to emphasize that a masked array is useful when you do multiple operations that ignore the same indices. If you only need one or two such operations, there is no point in using a masked array.
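To illustrate the "multiple operations" point, here is a short sketch that builds the mask once and then reuses it for several reductions. The names row_sum, row_mean and row_max are only for illustration.

```python
import numpy as np

W = np.array([[1.0, 2.0, 8.0], [5.0, 15.0, 3.0]])
z = np.array([0, 2]).reshape(2, 1)

# Build the mask once: True marks the entry to ignore in each row
W_ma = np.ma.array(W, mask=(z == np.arange(W.shape[-1])))

# Every reduction below skips the masked entries automatically
row_sum = W_ma.sum(axis=1)
row_mean = W_ma.mean(axis=1)
row_max = W_ma.max(axis=1)

print(row_sum.compressed())
print(row_mean.compressed())
print(row_max.compressed())
```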
I have a 3D image scan (shape: 335x306x306, total elements: 31368060) and I want to mask it with a 3D boolean mask of the same size to return a masked image of the same size.
When I simply index the array with the mask, like so:
masked_image = image_pix[mask]
I get a 1D array of the image pixel values where the mask is 1, in standard row-major (C-style) order. It only has 6953600 elements because of the masking.
So how do I reshape this 1D array back into the 3D array if I don't have the indices? I realize that I can use the indices of the mask itself to iteratively populate a 3D array with the masked values, but I am hoping there is a more elegant (and computationally efficient) solution that doesn't rely on for loops.
Use np.ma.MaskedArray:
marr = np.ma.array(image_pix, mask=mask)
The "normal" indexing with [mask] removes every value that the mask does not select, so there is no guarantee that the result can be reshaped into 3D again (it has lost items). Reshaping it back is therefore not possible.
However MaskedArrays keep their shape:
>>> import numpy as np
>>> arr = np.random.randint(0, 10, 16).reshape(4, 4)
>>> marr = np.ma.array(arr, mask=arr>6)
>>> marr.shape
(4, 4)
>>> marr
masked_array(data =
[[3 -- 0 1]
[4 -- 6 --]
[2 -- 6 0]
[4 5 0 0]],
mask =
[[False True False False]
[False True False True]
[False True False False]
[False False False False]],
fill_value = 999999)
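One thing to watch: in np.ma, True in the mask means "hide this element", whereas the question's mask appears to mark the pixels to keep. Under that assumption the mask would need to be inverted. The sketch below uses a tiny stand-in array instead of the real scan (image_pix and mask here are made-up data) and shows getting a plain array of the original shape back with filled; np.where gives the same result without going through a masked array.

```python
import numpy as np

# Small hypothetical stand-in for the 335x306x306 scan
image_pix = np.arange(24).reshape(2, 3, 4)
mask = image_pix % 2 == 0          # assumption: True marks pixels to KEEP

# np.ma hides entries where its mask is True, so invert the keep-mask
marr = np.ma.array(image_pix, mask=~mask)

# Back to a regular ndarray of the same shape, hidden pixels set to 0
filled = marr.filled(0)

# Equivalent result without a masked array
alt = np.where(mask, image_pix, 0)

print(filled.shape)
print(np.array_equal(filled, alt))
```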
I just thought about this for a little while longer and realized that I can accomplish this by logical indexing.
masked_image = image_pix.copy() # start from a copy of the full image so image_pix itself is not modified
masked_image[mask==0] = 0 # set the pixels where mask == 0 to 0
That was easy...
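If you are building a masked array inside a class, like this:
import numpy

class myclass(object):
    def __init__(self, data, mask):
        # numpy.ma is a module, so the array is built with numpy.ma.array
        self.masked_array = numpy.ma.array(data, mask=mask)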
What I would like is for mask and data to change when I change the masked array. Like:
data = [1,2,3]
mask = [True, False, False]
c = myclass(data, mask)
c.masked_array.mask[0] = False # this will not change mask
The obvious answer is to link them after building the object:
c = myclass(data, mask)
data = c.masked_array.data
mask = c.masked_array.mask
And, although it works, in my non-simplified problem it is quite a hack to do just for this. Any other options?
I am using numpy 1.10.1 and python 2.7.9.
The mask is itself a numpy array, so when you give a list as the mask, the values in the mask must be copied into a new array. Instead of using a list, pass in a numpy array as the mask.
For example, here are two arrays that we'll use to construct the masked array:
In [38]: data = np.array([1, 2, 3])
In [39]: mask = np.array([True, False, False])
Create our masked array:
In [40]: c = ma.masked_array(data, mask=mask)
In [41]: c
Out[41]:
masked_array(data = [-- 2 3],
mask = [ True False False],
fill_value = 999999)
Change c.mask in-place, and see that mask is also changed:
In [42]: c.mask[0] = False
In [43]: mask
Out[43]: array([False, False, False], dtype=bool)
It is worth noting that the masked_array constructor has the argument copy. If copy is False (the default), the constructor doesn't copy the input arrays, and instead uses the given references (but it can't do that if the inputs are not already numpy arrays). If you use copy=True, then even input arrays will be copied--but that's not what you want.
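The effect of copy can be checked directly. The following is a small sketch, assuming a reasonably recent NumPy; the variable names shared and copied are illustrative. It writes through c.mask as in the session above, and np.shares_memory confirms whether the original array is being reused.

```python
import numpy as np

data = np.array([1, 2, 3])
mask = np.array([True, False, False])

# Default copy=False: the masked array reuses the given mask array
shared = np.ma.masked_array(data, mask=mask)
shared.mask[0] = False
print(mask)                                   # the original mask changed too
print(np.shares_memory(shared.mask, mask))

# copy=True: the masked array gets its own private mask
copied = np.ma.masked_array(data, mask=mask, copy=True)
copied.mask[1] = True
print(mask)                                   # the original mask is unaffected
print(np.shares_memory(copied.mask, mask))
```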
I use masked arrays all the time in my work, but one problem I have is that the initialization of masked arrays is a bit clunky. Specifically, ma.zeros() and ma.empty() return masked arrays with a mask that doesn't match the array dimension. The reason I want a full-size mask is so that if I don't assign to a particular element of my array, it is masked by default.
In [4]: A=ma.zeros((3,))
...
masked_array(data = [ 0. 0. 0.],
mask = False,
fill_value = 1e+20)
I can subsequently assign the mask:
In [6]: A.mask=ones((3,))
...
masked_array(data = [-- -- --],
mask = [ True True True],
fill_value = 1e+20)
But why should I have to use two lines to initialize an array? Alternatively, I can ignore the ma.zeros() functionality and specify the mask and data in one line:
In [8]: A=ma.masked_array(zeros((3,)),mask=ones((3,)))
...
masked_array(data = [-- -- --],
mask = [ True True True],
fill_value = 1e+20)
But I think this is also clunky. I have trawled through the numpy.ma documentation but I can't find a neat way of dealing with this. Have I missed something obvious?
Well, the mask in ma.zeros is actually a special constant, ma.nomask, that corresponds to np.bool_(False). It's just a placeholder telling NumPy that the mask hasn't been set.
Using nomask actually speeds up np.ma significantly: no need to keep track of where the masked values are if we know beforehand that there are none.
The best approach is not to set your mask explicitly if you don't need it, and to let np.ma set it when needed (i.e., when you end up trying to take the log of a negative number).
Side note #1: to set the mask to an array of False with the same shape as your input, use
np.ma.array(..., mask=False)
That's easier to type. Note that it's really the Python False, not np.ma.nomask... Similarly, use mask=True to force all your inputs to be masked (ie, mask will be a bool ndarray full of True, with the same shape as the data).
Side note #2:
If you need to set the mask after initialization, you shouldn't assign to .mask; instead, assign the special value np.ma.masked, which is safer:
a[:] = np.ma.masked
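For the "masked unless assigned" use case from the question, two one-liners may be worth trying. This is a sketch, not a claim about what the answer above intended: np.ma.masked_all builds an array with every element masked, and mask=True (from side note #1) does the same for existing data. The names A and B are illustrative.

```python
import numpy as np

# ma.zeros starts out with the nomask placeholder rather than a full mask
print(np.ma.zeros((3,)).mask is np.ma.nomask)

# One-line initialization with every element masked
A = np.ma.masked_all((3,))                 # data is uninitialized, all masked
B = np.ma.array(np.zeros(3), mask=True)    # zeros as data, all masked

# Assigning a real value unmasks just that element
A[1] = 5.0
print(A.mask)
print(B.mask)
```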
Unfortunately your Side note#2 recommendation breaks for an array with more than one dimension:
a = ma.zeros( (2,2) )
a[0][0] = ma.masked
a
masked_array(data =
[[ 0. 0.]
[ 0. 0.]],
mask =
False,
fill_value = 1e+20)
Like the OP, I haven't found a neat way around this. Masking a whole row will initialise the mask properly:
a[0] = ma.masked
a
masked_array(data =
[[-- --]
[0.0 0.0]],
mask =
[[ True True]
[False False]],
fill_value = 1e+20)
but if this isn't what you want to do you then have to do a[0] = ma.nomask to undo it. Doing a[0] = ma.nomask immediately after a = ma.zeros( (2,2) ) has no effect.
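One thing that may help here: a[0][0] = ma.masked first builds a temporary object from a[0] and then assigns into that temporary, so the parent's nomask placeholder is never replaced. Indexing with a single tuple, a[0, 0], assigns on a itself. A minimal sketch, assuming a reasonably recent NumPy:

```python
import numpy as np

a = np.ma.zeros((2, 2))

# Single tuple index: the assignment happens on `a` itself,
# which creates a full-size mask and sets one entry
a[0, 0] = np.ma.masked

print(a.mask)
print(a.count())   # number of unmasked elements
```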