I use masked arrays all the time in my work, but one problem I have is that initializing them is a bit clunky. Specifically, ma.zeros() and ma.empty() return masked arrays whose mask doesn't match the array dimensions. I want a full-shape mask so that any element I never assign to stays masked by default.
In [4]: A=ma.zeros((3,))
...
masked_array(data = [ 0. 0. 0.],
mask = False,
fill_value = 1e+20)
I can subsequently assign the mask:
In [6]: A.mask=ones((3,))
...
masked_array(data = [-- -- --],
mask = [ True True True],
fill_value = 1e+20)
But why should I have to use two lines to initialize an array? Alternatively, I can ignore the ma.zeros() functionality and specify the mask and data in one line:
In [8]: A=ma.masked_array(zeros((3,)),mask=ones((3,)))
...
masked_array(data = [-- -- --],
mask = [ True True True],
fill_value = 1e+20)
But I think this is also clunky. I have trawled through the numpy.ma documentation but I can't find a neat way of dealing with this. Have I missed something obvious?
Well, the mask in ma.zeros is actually a special constant, ma.nomask, that corresponds to np.bool_(False). It's just a placeholder telling NumPy that the mask hasn't been set.
Using nomask actually speeds up np.ma significantly: no need to keep track of where the masked values are if we know beforehand that there are none.
The best approach is not to set your mask explicitly if you don't need it, and to let np.ma set it when needed (i.e., when you end up trying to take the log of a negative number).
Side note #1: to set the mask to an array of False with the same shape as your input, use
np.ma.array(..., mask=False)
That's easier to type. Note that it's really the Python False, not np.ma.nomask. Similarly, use mask=True to force all your inputs to be masked (i.e., the mask will be a bool ndarray full of True, with the same shape as the data).
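A minimal sketch of the options discussed so far, assuming NumPy is imported as np. It checks that ma.zeros leaves the mask as nomask and that mask=True or mask=False expands to a full-shape mask. It also uses np.ma.masked_all, which is not mentioned above but is an existing numpy.ma function that returns an array with every element masked, which looks close to what the question asks for:

```python
import numpy as np

# ma.zeros leaves the mask as the nomask placeholder.
a = np.ma.zeros((3,))
print(np.ma.getmask(a) is np.ma.nomask)

# mask=True / mask=False expand to a boolean array with the data's shape.
b = np.ma.array(np.zeros((3,)), mask=True)
c = np.ma.array(np.zeros((3,)), mask=False)
print(b.mask, c.mask)

# masked_all builds a fully masked array in one call (its data is uninitialized).
d = np.ma.masked_all((3,))

# Assigning an ordinary value unmasks just that element.
d[1] = 7.0
print(d.mask)
```

Elements that are never assigned stay masked, so the uninitialized data behind them is never exposed.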
Side note #2:
If you need to set the mask after initialization, don't assign to .mask; assign the special value np.ma.masked instead, which is safer:
a[:] = np.ma.masked
Unfortunately, your Side note #2 recommendation breaks for an array with more than one dimension:
a = ma.zeros( (2,2) )
a[0][0] = ma.masked
a
masked_array(data =
[[ 0. 0.]
[ 0. 0.]],
mask =
False,
fill_value = 1e+20)
Like the OP, I haven't found a neat way around this. Masking a whole row will initialise the mask properly:
a[0] = ma.masked
a
masked_array(data =
[[-- --]
[0.0 0.0]],
mask =
[[ True True]
[False False]],
fill_value = 1e+20)
but if this isn't what you want, you then have to do a[0] = ma.nomask to undo it. Doing a[0] = ma.nomask immediately after a = ma.zeros( (2,2) ) has no effect.
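One possible way around this, offered as a sketch rather than a definitive fix: a[0][0] first builds a temporary a[0] and then assigns into that temporary, so when a has no mask yet the newly created mask may never reach a. Indexing with a single tuple, a[0, 0], performs the assignment on a itself. The name mask_after_masking is only there for illustration:

```python
import numpy as np

a = np.ma.zeros((2, 2))

# Index with one tuple instead of chaining [0][0], so the assignment
# happens on `a` itself and its mask is created with the full shape.
a[0, 0] = np.ma.masked
mask_after_masking = a.mask.copy()
print(mask_after_masking)

# Assigning an ordinary value to a masked element unmasks it again.
a[0, 0] = 0.0
print(np.ma.getmaskarray(a))
```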
I cannot figure out how to set the fill_value of a real masked array to np.nan. The array is the result of a calculation on two complex masked arrays. Somehow, the calculated array's fill_value always gets converted to a complex fill_value, when I want a real one. Even if I explicitly set the fill_value, it won't get set to a float. This triggers ComplexWarnings in my code because the imaginary part is dropped later. I am OK with setting ang.fill_value manually, but it doesn't work.
import numpy as np
ma1 = np.ma.MaskedArray([1.1+1j, 2.2-1j])
ma2 = np.ma.MaskedArray([2.2+1j, 3.3+1j])
ma1.fill_value = np.nan + np.nan*1j
ma2.fill_value = np.nan + np.nan*1j
ang = np.ma.angle(ma1/ma2, deg=True)
ang.fill_value = np.nan
print(ang.fill_value)
<prints out (nan+0j)>
First, I haven't worked with angle (ma or not), and only played with np.ma on and off, mainly for SO questions.
np.angle is python code; np.ma.angle is produced by a generic wrapper on np.angle.
Without studying those, let's experiment.
Your array ratio:
In [34]: ma1/ma2
Out[34]:
masked_array(data=[(0.5856164383561644+0.18835616438356162j),
(0.526492851135408-0.46257359125315395j)],
mask=[False, False],
fill_value=(nan+nanj))
The non-ma version:
In [35]: (ma1/ma2).data
Out[35]: array([0.58561644+0.18835616j, 0.52649285-0.46257359j])
or
In [36]: np.asarray(ma1/ma2)
Out[36]: array([0.58561644+0.18835616j, 0.52649285-0.46257359j])
The angle:
In [37]: np.ma.angle(ma1/ma2, deg=True)
Out[37]:
masked_array(data=[17.829734225677196, -41.30235354815481],
mask=[False, False],
fill_value=(nan+nanj))
The data dtype looks fine, but the fill dtype is complex. Without ma, it's still masked, but with a different fill, and a simple mask:
In [38]: np.angle(ma1/ma2, deg=True)
Out[38]:
masked_array(data=[ 17.82973423, -41.30235355],
mask=False,
fill_value=1e+20)
If we give it the "raw" data:
In [40]: np.angle((ma1/ma2).data, deg=True)
Out[40]: array([ 17.82973423, -41.30235355])
np.ma is not heavily used, so I'm not surprised that there are bugs in details like this, passing the fill and mask through. Especially in a function like this that can take a complex argument, but returns a real result.
If I don't fiddle with the fill values,
In [41]: ma1 = np.ma.MaskedArray([1.1+1j, 2.2-1j])
...: ma2 = np.ma.MaskedArray([2.2+1j, 3.3+1j])
In [42]: ma1/ma2
Out[42]:
masked_array(data=[(0.5856164383561644+0.18835616438356162j),
(0.526492851135408-0.46257359125315395j)],
mask=[False, False],
fill_value=(1e+20+0j))
In [43]: np.ma.angle(ma1/ma2, deg=True)
Out[43]:
masked_array(data=[17.829734225677196, -41.30235354815481],
mask=[False, False],
fill_value=1e+20)
The angle fill is float.
Casting (nan+nanj) to float might be producing some errors or warnings that it doesn't get with (1e+20+0j). Again we'd have to examine the code.
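One possible workaround, sketched under the assumption that only the mask needs to carry over: compute the angle on the raw data, which gives a plain real ndarray as in In [40] above, and wrap it in a fresh masked array with an explicit real fill_value. The name ratio is introduced here for illustration:

```python
import numpy as np

ma1 = np.ma.MaskedArray([1.1 + 1j, 2.2 - 1j])
ma2 = np.ma.MaskedArray([2.2 + 1j, 3.3 + 1j])
ratio = ma1 / ma2

# Compute the angle on the raw data (a plain real ndarray), then re-wrap it
# with the ratio's mask and an explicit real fill_value.
raw_angle = np.angle(ratio.data, deg=True)
ang = np.ma.array(raw_angle, mask=np.ma.getmaskarray(ratio), fill_value=np.nan)

print(ang.dtype, ang.fill_value)
```

This sidesteps the complex fill_value entirely because the new array never inherits anything from the complex inputs except their mask.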
I'm in the process of porting a bunch of Numpy calculations over to TensorFlow. At one stage in my calculations, I use a boolean mask to extract and flatten a subset of values from a large array. The array can have many dimensions, but the boolean mask acts only on the last two dimensions. In Numpy, it looks something like this:
mask = np.array([
    [False, True , True , True ],
    [True , False, True , True ],
    [True , True , False, False],
    [True , True , False, False]])
large_array_masked = large_array[..., mask]
I can't figure out how to do the equivalent of this in TensorFlow. I tried:
tf.boolean_mask(large_array, mask, axis = -2)
That doesn't work because tf.boolean_mask() doesn't seem to take negative axis arguments.
As an ugly hack, I tried forcing mask to broadcast to the same shape as large_array using:
mask_broadcast = tf.logical_and(tf.fill(tf.shape(large_array), True), mask)
large_array_masked = tf.boolean_mask(large_array, mask_broadcast)
It appears that mask_broadcast has the shape and value that I want, but I get the error:
ValueError: Number of mask dimensions must be specified, even if some dimensions are None
Presumably this happens because large_array is calculated from inputs and therefore its shape is not static.
Any suggestions?
In general, I've found that in TensorFlow you want well-defined, known shapes. This is because most operations are matrix multiplications and the matrices are fixed shape.
If you really want to do this you need to convert to sparse tensor and then apply tf.sparse_retain.
The "equivalent" I would normally use in TensorFlow is to multiply large_array by the mask to zero out the False values (large_array_masked = large_array * mask). This keeps the original shape, which makes it easier to pass to dense layers, etc.
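To illustrate the shape-preserving idea, here is a small NumPy stand-in (not TensorFlow code; the array contents are made up). The (4, 4) mask broadcasts against the last two axes. In TensorFlow the boolean mask would presumably need to be cast to the array's dtype before multiplying:

```python
import numpy as np

large_array = np.arange(1, 2 * 4 * 4 + 1, dtype=float).reshape(2, 4, 4)
mask = np.array([
    [False, True,  True,  True],
    [True,  False, True,  True],
    [True,  True,  False, False],
    [True,  True,  False, False]])

# The mask broadcasts over the leading axis; False positions become 0.
zeroed = large_array * mask
print(zeroed.shape)
```

Unlike large_array[..., mask], the result keeps the full shape and does not flatten the last two axes.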
I came up with a hack to solve my narrow problem so I'm posting it here, but I'm accepting the answer from @Sorin because it's probably more generally applicable.
To get around the fact that tf.boolean_mask() can only act on the initial indices, I just rolled the indices forward, applied the mask, and then rolled them back. In simplified form, it looks like this:
indices = tf.range(tf.rank(large_array))
large_array_rolled_forward = tf.transpose(
    large_array,
    tf.concat([indices[-2:], indices[:-2]], axis=0))
large_array_rolled_forward_masked = tf.boolean_mask(
    large_array_rolled_forward,
    mask)
new_indices = tf.range(tf.rank(large_array_rolled_forward_masked))
large_array_masked = tf.transpose(
    large_array_rolled_forward_masked,
    tf.concat([new_indices[1:], [0]], axis=0))
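The roll-forward, mask, roll-back idea can be sanity-checked in plain NumPy. This is only a sketch with made-up data that uses np.moveaxis in place of the TensorFlow transposes; it compares the result against the direct large_array[..., mask] indexing from the question:

```python
import numpy as np

rng = np.random.default_rng(0)
large_array = rng.normal(size=(2, 3, 4, 4))
mask = np.array([
    [False, True,  True,  True],
    [True,  False, True,  True],
    [True,  True,  False, False],
    [True,  True,  False, False]])

# Reference result: mask the last two axes directly.
direct = large_array[..., mask]

# Roll the last two axes to the front, mask the leading axes,
# then move the flattened axis back to the end.
rolled = np.moveaxis(large_array, [-2, -1], [0, 1])
rolled_masked = rolled[mask]
result = np.moveaxis(rolled_masked, 0, -1)

print(direct.shape, result.shape)
```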
I have a 3D image scan (shape: 335x306x306, total elements: 31368060) and I want to mask it with a 3D boolean mask of the same size to return a masked image of the same size.
When I simply index the array with the mask as so:
masked_image = image_pix[mask]
I get a 1D array of the image pixel values where the mask equals 1, in standard row-major (C-style) order (as explained here). It has only 6953600 elements because of the masking.
So how do I reshape this 1D array back into the 3D array if I don't have the indices? I realize that I can use the indices of the mask itself to iteratively populate a 3D array with the masked values, but I am hoping there is a more elegant (and computationally efficient) solution that doesn't rely on for loops.
Use np.ma.MaskedArray:
marr = np.ma.array(image_pix, mask=mask)
The "normal" indexing with [mask] removes all masked values, so there is no guarantee that the result can be reshaped into 3D again (it has lost items).
However MaskedArrays keep their shape:
>>> import numpy as np
>>> arr = np.random.randint(0, 10, 16).reshape(4, 4)
>>> marr = np.ma.array(arr, mask=arr>6)
>>> marr.shape
(4, 4)
>>> marr
masked_array(data =
[[3 -- 0 1]
[4 -- 6 --]
[2 -- 6 0]
[4 5 0 0]],
mask =
[[False True False False]
[False True False True]
[False True False False]
[False False False False]],
fill_value = 999999)
I just thought about this for a little while longer and realized that I can accomplish this by logical indexing.
masked_image = image_pix.copy() # copy the full image so the original is not modified
masked_image[mask==0] = 0 # set the pixels where mask == 0 to 0
That was easy...
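For comparison, a small sketch of two same-shape alternatives, using a tiny random array as a stand-in for the scan (the data and the 0.5 threshold are made up). One thing to watch: np.ma treats True as "masked out", so a mask that marks the pixels to keep has to be inverted before it is handed to np.ma.array:

```python
import numpy as np

rng = np.random.default_rng(0)
image_pix = rng.integers(1, 10, size=(3, 4, 4))   # small stand-in for the scan
mask = rng.random((3, 4, 4)) > 0.5                # True where pixels are kept

# np.where leaves image_pix untouched and returns a same-shape array.
masked_image = np.where(mask, image_pix, 0)

# np.ma hides elements whose mask is True, so invert the keep-mask.
marr = np.ma.array(image_pix, mask=~mask)

print(masked_image.shape, marr.shape)
```

Calling marr.filled(0) gives the same array as the np.where version.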
Suppose you are building a masked array inside a class, like this:
import numpy

class myclass(object):
    def __init__(self, data, mask):
        self.masked_array = numpy.ma.masked_array(data, mask=mask)
What I would like is for mask and data to change when I change the masked array. Like:
data = [1,2,3]
mask = [True, False, False]
c = myclass(data, mask)
c.masked_array.mask[0] = False # this will not change mask
The obvious answer is to link them after building the object:
c = myclass(data, mask)
data = c.masked_array.data
mask = c.masked_array.mask
And, although it works, in my non-simplified problem it is quite a hack to do just for this. Any other options?
I am using numpy 1.10.1 and python 2.7.9.
The mask is itself a numpy array, so when you give a list as the mask, the values in the mask must be copied into a new array. Instead of using a list, pass in a numpy array as the mask.
For example, here are two arrays that we'll use to construct the masked array:
In [38]: data = np.array([1, 2, 3])
In [39]: mask = np.array([True, False, False])
Create our masked array:
In [40]: c = ma.masked_array(data, mask=mask)
In [41]: c
Out[41]:
masked_array(data = [-- 2 3],
mask = [ True False False],
fill_value = 999999)
Change c.mask in-place, and see that mask is also changed:
In [42]: c.mask[0] = False
In [43]: mask
Out[43]: array([False, False, False], dtype=bool)
It is worth noting that the masked_array constructor has the argument copy. If copy is False (the default), the constructor doesn't copy the input arrays, and instead uses the given references (but it can't do that if the inputs are not already numpy arrays). If you use copy=True, then even input arrays will be copied--but that's not what you want.
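A short sketch of the difference, assuming the inputs are already NumPy arrays of the right dtype (the names shared and copied are just for illustration):

```python
import numpy as np

data = np.array([1, 2, 3])
mask = np.array([True, False, False])

# Default copy=False: the masked array reuses the given arrays.
shared = np.ma.masked_array(data, mask=mask)
# copy=True: the masked array gets its own data and mask.
copied = np.ma.masked_array(data, mask=mask, copy=True)

shared.mask[0] = False   # also changes `mask`
copied.mask[1] = True    # leaves `mask` alone
print(mask)
```

np.shares_memory(shared.data, data) is a quick way to confirm whether a given masked array is still linked to its inputs.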
I'm working with masked arrays and I want to calculate the max of different arrays/columns. I run into problems when the whole array is masked.
Example:
import numpy as np
x = np.ma.array(np.array([1,2,3,4,100]),mask=[True,True,True, True, True])
y = 5
print(np.max(np.hstack((x, y))))
print(np.max((np.max(y), np.max(x))))
print(np.max((np.hstack((np.max(x), 5)))))
Results:
100
nan
--
I find the result odd, because the result should be 5. Why is hstack() ignoring the mask of the masked array?
With masked arrays, you need to use the masked routines; that is, prefix the function name with numpy.ma.:
>>> np.ma.hstack((x, y))
masked_array(data = [-- -- -- -- -- 5],
mask = [ True True True True True False],
fill_value = 999999)
>>> np.ma.max(np.ma.hstack((x, y)))
5
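Put together as a runnable sketch (the name stacked is introduced here). It also checks what np.ma.max gives for the fully masked x on its own, which is the masked constant rather than a number:

```python
import numpy as np

x = np.ma.array([1, 2, 3, 4, 100], mask=[True] * 5)
y = 5

# The np.ma version of hstack carries the mask along.
stacked = np.ma.hstack((x, y))
print(np.ma.max(stacked))

# A fully masked array has no maximum; the result is the masked constant.
print(np.ma.max(x) is np.ma.masked)
```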