Suppose you build a masked array inside a class:
import numpy

class myclass(object):
    def __init__(self, data, mask):
        # numpy.ma is a module; masked_array is the constructor
        self.masked_array = numpy.ma.masked_array(data, mask=mask)
What I would like is for mask and data to change when I change the masked array. Like:
data = [1,2,3]
mask = [True, False, False]
c = myclass(data, mask)
c.masked_array.mask[0] = False # this will not change mask
The obvious answer is to link them after building the object:
c = myclass(data, mask)
data = c.masked_array.data
mask = c.masked_array.mask
And, although it works, in my non-simplified problem it is quite a hack to do just for this. Any other options?
I am using numpy 1.10.1 and python 2.7.9.
The mask is itself a numpy array, so when you give a list as the mask, the values in the mask must be copied into a new array. Instead of using a list, pass in a numpy array as the mask.
For example, here are two arrays that we'll use to construct the masked array:
In [38]: data = np.array([1, 2, 3])
In [39]: mask = np.array([True, False, False])
Create our masked array:
In [40]: c = ma.masked_array(data, mask=mask)
In [41]: c
Out[41]:
masked_array(data = [-- 2 3],
mask = [ True False False],
fill_value = 999999)
Change c.mask in-place, and see that mask is also changed:
In [42]: c.mask[0] = False
In [43]: mask
Out[43]: array([False, False, False], dtype=bool)
It is worth noting that the masked_array constructor has the argument copy. If copy is False (the default), the constructor doesn't copy the input arrays and instead uses the given references (but it can't do that if the inputs are not already numpy arrays). If you use copy=True, even input arrays are copied, which is not what you want here.
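Applied to the class from the question, the same idea could look like the sketch below. It assumes the class is free to convert its inputs once with np.asarray and keep those arrays as attributes. The attribute names data and mask are illustrative and not part of the original code.

```python
import numpy as np

class MyClass(object):
    def __init__(self, data, mask):
        # Convert once; lists become new arrays, existing arrays are reused.
        self.data = np.asarray(data)
        self.mask = np.asarray(mask, dtype=bool)
        # copy=False asks the constructor to reuse these arrays.
        self.masked_array = np.ma.masked_array(self.data, mask=self.mask, copy=False)

c = MyClass([1, 2, 3], [True, False, False])
c.masked_array.mask[0] = False   # in-place change through the masked array
shares_mask = np.shares_memory(c.masked_array.mask, c.mask)
shares_data = np.shares_memory(c.masked_array.data, c.data)
print(c.mask, shares_mask, shares_data)
```

The caller then reads c.data and c.mask instead of keeping the original lists, which avoids the re-linking step after construction.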
Related
I am trying to modify a masked pytorch tensor inside a function.
I observe the same behaviour for numpy arrays.
from torch import tensor
def foo(x):
    """
    Minimal example.
    The actual function is complex.
    """
    x *= -1
y = tensor([1,2,3])
mask = [False, True, False]
foo(y[mask])
print(y)
# Result: tensor([1, 2, 3]). Expected: tensor([1, -2, 3])
There are two obvious solutions that I can think of. Both have shortcomings I would like to avoid.
def foo1(x):
    return -x
y = tensor([1,2,3])
mask = [False, True, False]
y[mask] = foo1(y[mask])
This creates a copy of y[mask], which is not ideal for my RAM-bound application.
def foo2(x, m):
    x[m] *= -1
y = tensor([1,2,3])
mask = [False, True, False]
foo2(y, mask)
This works without a copy, but makes the function messy. It has to be aware of the mask and types. E.g. it won't work directly on scalars.
What is the idiomatic way to handle this problem?
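For the NumPy case mentioned above, the cause is that indexing with a boolean mask returns a copy, so an in-place operation on y[mask] never reaches y. The sketch below shows that, along with one possible way to negate only the selected elements without materializing y[mask], using the ufunc arguments where= and out=. It covers NumPy only, and whether it fits a more complex function is an assumption.

```python
import numpy as np

y = np.array([1, 2, 3])
mask = np.array([False, True, False])

# Boolean (advanced) indexing copies; basic slicing gives a view.
selection_is_copy = not np.shares_memory(y[mask], y)
slice_is_view = np.shares_memory(y[1:2], y)

# Write the negated values back into y, only where mask is True.
np.negative(y, out=y, where=mask)
print(y)  # [ 1 -2  3]
```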
This is a problem I've run into when developing something, and it's a hard question to phrase, so it's best explained with a simple example:
Imagine you have 4 random number generators which generate an array of size 4:
[rng-0, rng-1, rng-2, rng-3]
| | | |
[val0, val1, val2, val3]
Our goal is to loop through "generations" of arrays populated by these RNGs, and iteratively mask out the RNG which outputted the maximum value.
So an example might be starting out with:
mask = [False, False, False, False], arr = [0, 10, 1, 3], and so we would mask out rng-1.
Then the next iteration could be: mask = [False, True, False, False], arr = [2, 1, 9] (before it gets asked, yes arr HAS to decrease in size with each rng that is masked out). In this case, it is clear that rng-3 should be masked out (i.e. mask[3] = True), but since arr is now a different size than mask, it is tricky to get the right index for setting the mask (the max of arr is at index 2 of arr, but the corresponding generator is index 3). This problem grows more and more difficult as more generators get masked out (in my case I'm dealing with a mask of size ~30).
If it helps, here is a Python version of the example:
rng = np.random.RandomState(42)
mask = np.zeros(10, dtype=bool) # True if generator is being masked
for _ in range(mask.size):
    arr = rng.randint(100, size=(~mask).sum())  # parentheses: invert first, then count
    unadjusted_max_value_idx = arr.argmax()
    adjusted_max_value_idx = unadjusted_max_value_idx + ????
    mask[adjusted_max_value_idx] = True
Any idea a good way to map the index of the max value in the arr to the corresponding index in the mask? (i.e. moving from unadjusted_max_value_idx to adjusted_max_value_idx)
# Use a helper list that maps positions in arr back to positions in mask
import numpy as np

rng = np.random.RandomState(42)
mask = np.zeros(10, dtype=bool)  # True if generator is being masked
ndxLst = list(range(mask.size))  # original indices of the unmasked generators
maskHistory = []
for _ in range(mask.size):
    arr = rng.randint(100, size=(~mask).sum())
    unadjusted_max_value_idx = arr.argmax()
    # pop() returns the original index and shrinks the list in step with arr
    adjusted_max_value_idx = ndxLst.pop(unadjusted_max_value_idx)
    mask[adjusted_max_value_idx] = True
    maskHistory.append(adjusted_max_value_idx)
print(maskHistory)
print(mask)
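An alternative to the helper list, shown here only as a sketch: the positions of the still-unmasked generators can be recovered from the mask itself with np.flatnonzero(~mask), and the index into arr then selects the matching entry. The values below reuse the worked example from the question.

```python
import numpy as np

mask = np.array([False, True, False, False])  # rng-1 already masked out
arr = np.array([2, 1, 9])                      # one value per unmasked generator

unmasked_positions = np.flatnonzero(~mask)     # array([0, 2, 3])
adjusted_max_value_idx = unmasked_positions[arr.argmax()]
mask[adjusted_max_value_idx] = True
print(adjusted_max_value_idx)  # 3
```

This recomputes the mapping on every iteration, which is likely fine for a mask of about 30 entries; the helper list avoids that recomputation.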
I have a 3D image scan (shape: 335x306x306, total elements: 31368060) and I want to mask it with a 3D boolean mask of the same size to return a masked image of the same size.
When I simply index the array with the mask as so:
masked_image = image_pix[mask]
I get a 1D array of the image pixel values where the mask is 1, in standard row-major (C-style) order. It only has 6953600 elements because of the masking.
So how do I reshape this 1D array back into the 3D array if I don't have the indices? I realize that I can use the indices of the mask itself to iteratively populate a 3D array with the masked values, but I am hoping there is a more elegant (and computationally efficient) solution that doesn't rely on for loops.
Use np.ma.MaskedArray:
marr = np.ma.array(image_pix, mask=mask)
The "normal" indexing with [mask] removes all masked values, so there is no guarantee that the result can be reshaped into 3D again (it has lost items).
However MaskedArrays keep their shape:
>>> import numpy as np
>>> arr = np.random.randint(0, 10, 16).reshape(4, 4)
>>> marr = np.ma.array(arr, mask=arr>6)
>>> marr.shape
(4, 4)
>>> marr
masked_array(data =
[[3 -- 0 1]
[4 -- 6 --]
[2 -- 6 0]
[4 5 0 0]],
mask =
[[False True False False]
[False True False True]
[False True False False]
[False False False False]],
fill_value = 999999)
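One detail worth spelling out: in np.ma, True in the mask means the element is hidden, whereas the question's mask marks the pixels to keep. The sketch below assumes that convention for the question's mask (the small array and the name keep are made up for illustration) and uses filled(0) to get back a plain array of the original shape.

```python
import numpy as np

image_pix = np.arange(24).reshape(2, 3, 4)   # small stand-in for the 3D scan
keep = image_pix % 2 == 0                    # hypothetical mask: True = keep pixel

# Invert the mask because np.ma hides elements where the mask is True.
marr = np.ma.array(image_pix, mask=~keep)
masked_image = marr.filled(0)                # plain ndarray, hidden pixels set to 0
print(masked_image.shape)  # (2, 3, 4)
```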
I just thought about this for a little while longer and realized that I can accomplish this by logical indexing.
masked_image = image_pix # define the masked image as the full image
masked_image[mask==0] = 0 # define the pixels where mask == 0 as 0
That was easy...
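Note that masked_image = image_pix does not copy: both names refer to the same array, so the assignment on the following line also zeroes pixels in image_pix. If the original image has to survive, np.where is one way to build a new array instead. A minimal sketch with made-up data:

```python
import numpy as np

image_pix = np.arange(24).reshape(2, 3, 4)   # small stand-in for the 3D scan
mask = (image_pix % 3 == 0).astype(int)      # hypothetical 0/1 mask of the same shape

# Take the pixel where mask is nonzero, 0 elsewhere; image_pix is left untouched.
masked_image = np.where(mask != 0, image_pix, 0)
print(masked_image.shape)  # (2, 3, 4)
```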
I'm working with masked arrays and I want to calculate the max of different arrays/columns. I have problems, if the whole array is masked.
Example:
import numpy as np
x = np.ma.array(np.array([1,2,3,4,100]),mask=[True,True,True, True, True])
y = 5
print(np.max(np.hstack((x, y))))
print(np.max((np.max(y), np.max(x))))
print(np.max((np.hstack((np.max(x), 5)))))
Results:
100
nan
--
I find the result odd, because the result should be 5. Why is hstack() ignoring the mask of the masked array?
With masked arrays, you need to use the masked routines; that is, prefix the function name with numpy.ma:
>>> np.ma.hstack((x, y))
masked_array(data = [-- -- -- -- -- 5],
mask = [ True True True True True False],
fill_value = 999999)
>>> np.ma.max(np.ma.hstack((x, y)))
5
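A self-contained version of the same check, as a sketch. It wraps the scalar in a one-element list before stacking, which is a choice made here rather than something the answer requires, and it also shows that the maximum of a fully masked array is itself masked.

```python
import numpy as np

x = np.ma.array([1, 2, 3, 4, 100], mask=[True] * 5)  # every element masked
y = 5

combined = np.ma.hstack((x, [y]))   # masked-aware stacking keeps x's mask
print(np.ma.max(combined))          # 5

# With nothing unmasked there is no maximum to report.
all_masked_max = np.ma.max(x)
print(np.ma.is_masked(all_masked_max))  # True
```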
I use masked arrays all the time in my work, but one problem I have is that the initialization of masked arrays is a bit clunky. Specifically, ma.zeros() and ma.empty() return masked arrays with a mask that doesn't match the array dimensions. I want the mask to match so that any element I don't assign is masked by default.
In [4]: A=ma.zeros((3,))
...
masked_array(data = [ 0. 0. 0.],
mask = False,
fill_value = 1e+20)
I can subsequently assign the mask:
In [6]: A.mask=ones((3,))
...
masked_array(data = [-- -- --],
mask = [ True True True],
fill_value = 1e+20)
But why should I have to use two lines to initialize an array? Alternatively, I can ignore the ma.zeros() functionality and specify the mask and data in one line:
In [8]: A=ma.masked_array(zeros((3,)),mask=ones((3,)))
...
masked_array(data = [-- -- --],
mask = [ True True True],
fill_value = 1e+20)
But I think this is also clunky. I have trawled through the numpy.ma documentation but I can't find a neat way of dealing with this. Have I missed something obvious?
Well, the mask in ma.zeros is actually a special constant, ma.nomask, that corresponds to np.bool_(False). It's just a placeholder telling NumPy that the mask hasn't been set.
Using nomask actually speeds up np.ma significantly: no need to keep track of where the masked values are if we know beforehand that there are none.
The best approach is not to set your mask explicitly if you don't need it, and to let np.ma set it when needed (i.e., when you end up trying to take the log of a negative number).
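The placeholder can be inspected directly. In this sketch, np.ma.getmask returns the stored mask as-is and np.ma.getmaskarray expands it to a full boolean array; both are existing numpy.ma functions.

```python
import numpy as np

A = np.ma.zeros((3,))

stored_mask = np.ma.getmask(A)        # the nomask placeholder, not a full array
full_mask = np.ma.getmaskarray(A)     # always a boolean array shaped like A

print(stored_mask is np.ma.nomask)    # True
print(full_mask.shape)                # (3,)
```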
Side note #1: to set the mask to an array of False with the same shape as your input, use
np.ma.array(..., mask=False)
That's easier to type. Note that it's really the Python False, not np.ma.nomask. Similarly, use mask=True to force all your inputs to be masked (i.e., mask will be a bool ndarray full of True, with the same shape as the data).
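As a sketch of the one-line initialization the question asks for: mask=True gives a full mask as described above, and np.ma.masked_all (an existing numpy.ma function that the answer does not mention) returns an array whose elements all start out masked. Assigning a value to an element then unmasks it, assuming the default soft mask.

```python
import numpy as np

A = np.ma.array(np.zeros((3,)), mask=True)   # full-shape mask, all True
B = np.ma.masked_all((3,))                   # data uninitialized, all masked

B[0] = 1.5                                   # assigned elements become unmasked
print(A.mask.shape, B.mask.tolist())
```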
Side note #2:
If you need to set the mask after initialization, you shouldn't assign to .mask; assign the special value np.ma.masked instead, which is safer:
a[:] = np.ma.masked
Unfortunately your Side note#2 recommendation breaks for an array with more than one dimension:
a = ma.zeros( (2,2) )
a[0][0] = ma.masked
a
masked_array(data =
[[ 0. 0.]
[ 0. 0.]],
mask =
False,
fill_value = 1e+20)
Like the OP, I haven't found a neat way around this. Masking a whole row will initialise the mask properly:
a[0] = ma.masked
a
masked_array(data =
[[-- --]
[0.0 0.0]],
mask =
[[ True True]
[False False]],
fill_value = 1e+20)
but if this isn't what you want, you then have to do a[0] = ma.nomask to undo it. Doing a[0] = ma.nomask immediately after a = ma.zeros( (2,2) ) has no effect.
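One possible explanation, offered as an assumption rather than something verified against the NumPy source: a[0][0] = ma.masked first builds a temporary row a[0] and masks an element of that temporary, so the new mask may never reach a. Indexing the element in a single step with a tuple, a[0, 0], assigns through a itself. A short sketch:

```python
import numpy as np

a = np.ma.zeros((2, 2))
a[0, 0] = np.ma.masked      # single indexing step on a itself

print(a.mask.tolist())      # [[True, False], [False, False]]
```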