Aggregating numpy masked arrays - python

I'm working with numpy masked arrays and there is a seemingly trivial operation that I cannot figure out how to do in a simple way. If I have two masked arrays, how can I combine them into another array that keeps the unmasked values of both?
In [1]: import numpy as np
In [2]: np.ma.array([1, 2, 3], mask = [0,1,1])
Out[2]:
masked_array(data = [1 -- --],
mask = [False True True],
fill_value = 999999)
In [3]: np.ma.array([4, 5, 6], mask = [1,1,0])
Out[3]:
masked_array(data = [-- -- 6],
mask = [ True True False],
fill_value = 999999)
Which operation should I apply to the previous arrays if I want to get:
masked_array(data = [1 -- 6],
mask = [False True False],
fill_value = 999999)

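Stack the arrays and their masks using numpy.dstack, create a new masked array from them, and then get the required output using numpy.prod. This works because masked entries count as 1 in the product, so it assumes that each position is unmasked in at most one of the two arrays: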
>>> a1 = np.ma.array([1, 2, 3], mask = [0,1,1])
>>> a2 = np.ma.array([7, 8, 9], mask = [1,1,0])
>>> arr = np.ma.array(np.dstack((a1, a2)), mask=np.dstack((a1.mask, a2.mask)))
>>> np.prod(arr[0], axis=1)
masked_array(data = [1 -- 9],
mask = [False True False],
fill_value = 999999)
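If a position can be unmasked in both arrays, the product would multiply the two values together. Here is a minimal sketch of an alternative that avoids this, assuming the first array should win whenever both have a value. That priority rule and the helper name combine_masked are assumptions for illustration, not part of the answer above.

```python
import numpy as np

def combine_masked(first, second):
    """Take values from `first` where unmasked, else from `second` (hypothetical helper)."""
    m1 = np.ma.getmaskarray(first)
    m2 = np.ma.getmaskarray(second)
    # Where `first` is masked, fall back to the data of `second`.
    data = np.where(m1, np.ma.getdata(second), np.ma.getdata(first))
    # A position stays masked only if it is masked in both inputs.
    return np.ma.array(data, mask=m1 & m2)

a1 = np.ma.array([1, 2, 3], mask=[0, 1, 1])
a2 = np.ma.array([4, 5, 6], mask=[1, 1, 0])
combined = combine_masked(a1, a2)
print(combined)  # [1 -- 6]
```

np.ma.getmaskarray is used instead of .mask so that the sketch also works when an input has no mask set.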

Related

Numpy masking 3D array using np.where() index

I created an index based on several conditions
transition = np.where((rain>0) & (snow>0) & (graup>0) & (xlat<53.) & (xlat>49.) & (xlon<-114.) & (xlon>-127.)) #indexes the grids where there are transitions
with the shape of (3,259711) that looks like the following:
array([[ 0, 0, 0, ..., 47, 47, 47], #hour
[847, 847, 848, ..., 950, 950, 951], #lat gridpoint
[231, 237, 231, ..., 200, 201, 198]]) #lon gridpoint
I have several other variables (e.g. temp) with the shape of (48, 1015, 1359) corresponding to hour, lat, lon.
Since the index holds my valid gridpoints, how do I mask all the variables, such as temp, so that each keeps its (48, 1015, 1359) shape but has the values outside the index masked?
In [90]: arr = np.arange(24).reshape(6,4)
In [91]: keep = (arr % 3)==1
In [92]: keep
Out[92]:
array([[False, True, False, False],
[ True, False, False, True],
[False, False, True, False],
[False, True, False, False],
[ True, False, False, True],
[False, False, True, False]], dtype=bool)
In [93]: np.where(keep)
Out[93]:
(array([0, 1, 1, 2, 3, 4, 4, 5], dtype=int32),
array([1, 0, 3, 2, 1, 0, 3, 2], dtype=int32))
Simple application of the keep mask gives a 1d array of the desired values. I could also index with the where tuple.
In [94]: arr[keep]
Out[94]: array([ 1, 4, 7, 10, 13, 16, 19, 22])
With keep, or rather its boolean inverse, I can make a masked array:
In [95]: np.ma.masked_array(arr,mask=~keep)
Out[95]:
masked_array(data =
[[-- 1 -- --]
[4 -- -- 7]
[-- -- 10 --]
[-- 13 -- --]
[16 -- -- 19]
[-- -- 22 --]],
mask =
[[ True False True True]
[False True True False]
[ True True False True]
[ True False True True]
[False True True False]
[ True True False True]],
fill_value = 999999)
np.ma.masked_where(~keep, arr) does the same thing - just a different argument order. It still expects the boolean mask array.
I can do the same starting with the where tuple:
In [105]: idx = np.where(keep)
In [106]: mask = np.ones_like(arr, dtype=bool)
In [107]: mask[idx] = False
In [108]: np.ma.masked_array(arr, mask=mask)
There may be something in the np.ma class that does this with one call, but it will have to do the same sort of construction.
This also works:
x = np.ma.masked_all_like(arr)
x[idx] = arr[idx]
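The same construction carries over to the 3d case in the question. This is a small sketch with made-up data: the reduced shape, the random fields and the single rain threshold are stand-ins for the real (48, 1015, 1359) variables and the full set of conditions.

```python
import numpy as np

# Small stand-ins for the (48, 1015, 1359) fields in the question (made-up data).
rng = np.random.default_rng(0)
shape = (4, 5, 6)  # hour, lat, lon
rain = rng.random(shape)
temp = rng.normal(size=shape)

valid = rain > 0.5            # boolean condition, same shape as temp
transition = np.where(valid)  # tuple of three index arrays, as in the question

# Start fully masked, then unmask the indexed gridpoints.
mask = np.ones(shape, dtype=bool)
mask[transition] = False
temp_masked = np.ma.masked_array(temp, mask=mask)

print(temp_masked.shape)  # (4, 5, 6)
```

If the boolean condition itself is still available, np.ma.masked_where(~valid, temp) gives the same result without going through np.where at all.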

Numpy masked_array sum

I would expect the result of a summation for a fully masked array to be zero, but instead "masked" is returned. How can I get the function to return zero?
>>> a = np.asarray([1, 2, 3, 4])
>>> b = np.ma.masked_array(a, mask=~(a > 2))
>>> b
masked_array(data = [-- -- 3 4],
mask = [ True True False False],
fill_value = 999999)
>>> b.sum()
7
>>> b = np.ma.masked_array(a, mask=~(a > 5))
>>> b
masked_array(data = [-- -- -- --],
mask = [ True True True True],
fill_value = 999999)
>>> b.sum()
masked
>>> np.ma.sum(b)
masked
>>>
Here's another unexpected thing:
>>> b.sum() + 3
masked
In your last case (b1 below is the fully masked array, called b in the question):
In [197]: bs=b1.sum()
In [198]: bs.data
Out[198]: array(0.0)
In [199]: bs.mask
Out[199]: array(True, dtype=bool)
In [200]: repr(bs)
Out[200]: 'masked'
In [201]: str(bs)
Out[201]: '--'
If I specify keepdims, I get a different array:
In [208]: bs=b1.sum(keepdims=True)
In [209]: bs
Out[209]:
masked_array(data = [--],
mask = [ True],
fill_value = 999999)
In [210]: bs.data
Out[210]: array([0])
In [211]: bs.mask
Out[211]: array([ True], dtype=bool)
Here's the relevant part of the sum code:
def sum(self, axis=None, dtype=None, out=None, keepdims=np._NoValue):
    kwargs = {} if keepdims is np._NoValue else {'keepdims': keepdims}
    _mask = self._mask
    newmask = _check_mask_axis(_mask, axis, **kwargs)
    # No explicit output
    if out is None:
        result = self.filled(0).sum(axis, dtype=dtype, **kwargs)
        rndim = getattr(result, 'ndim', 0)
        if rndim:
            result = result.view(type(self))
            result.__setmask__(newmask)
        elif newmask:
            result = masked
        return result
    ....
It's the
newmask = np.ma.core._check_mask_axis(b1.mask, axis=None)
...
elif newmask: result = masked
lines that produce the masked value in your case. newmask is True when all values are masked, and False if some are not. The choice to return np.ma.masked is deliberate.
The core of the calculation is:
In [218]: b1.filled(0).sum()
Out[218]: 0
The rest of the code decides whether to return a scalar or a masked array.
============
And for your addition:
In [232]: np.ma.masked+3
Out[232]: masked
It looks like np.ma.masked is a special constant that propagates itself through calculations, somewhat like np.nan.
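To actually get zero for a fully masked array, two options follow from the above. This is a sketch, not the only way to do it; the variable names are illustrative.

```python
import numpy as np

a = np.asarray([1, 2, 3, 4])
partly_masked = np.ma.masked_array(a, mask=~(a > 2))
fully_masked = np.ma.masked_array(a, mask=~(a > 5))

# Option 1: replace masked entries with 0 before summing.
print(partly_masked.filled(0).sum())  # 7
print(fully_masked.filled(0).sum())   # 0

# Option 2: sum as usual, then substitute 0 if the result came back masked.
total = fully_masked.sum()
if np.ma.is_masked(total):
    total = 0
print(total)  # 0
```

Option 1 is the same filled(0).sum() call that the sum method uses internally, just without the step that turns the result back into masked.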

numpy mask for 2d array with all values in 1d array

I want to convert a 2d matrix of dates to a boolean matrix based on the dates in a 1d array, i.e.
[[20030102, 20030102, 20070102],
[20040102, 20040102, 20040102],
[20050102, 20050102, 20050102]]
should become
[[True, True, False],
[False, False, False],
[True, True, True]]
if I provide the 1d array [20010203, 20030102, 20030501, 20050102, 20060101].
import numpy as np
dateValues = np.array(
[[20030102, 20030102, 20030102],
[20040102, 20040102, 20040102],
[20050102, 20050102, 20050102]])
requestedDates = [20010203, 20030102, 20030501, 20050102, 20060101]
ix = np.in1d(dateValues.ravel(), requestedDates).reshape(dateValues.shape)
print(ix)
Returns:
[[ True True True]
[False False False]
[ True True True]]
Refer to numpy.in1d for more information (documentation):
http://docs.scipy.org/doc/numpy/reference/generated/numpy.in1d.html
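np.in1d has been deprecated since NumPy 2.0 in favor of np.isin, which keeps the shape of its first argument, so the ravel/reshape step is no longer needed. A short sketch using the array from the question (the snake_case variable names are illustrative):

```python
import numpy as np

date_values = np.array([[20030102, 20030102, 20070102],
                        [20040102, 20040102, 20040102],
                        [20050102, 20050102, 20050102]])
requested_dates = [20010203, 20030102, 20030501, 20050102, 20060101]

# np.isin tests each element of date_values for membership in requested_dates
# and returns a boolean array with the shape of date_values.
ix = np.isin(date_values, requested_dates)
print(ix)
```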
a = np.array([[20030102, 20030102, 20070102],
[20040102, 20040102, 20040102],
[20050102, 20050102, 20050102]])
b = np.array([20010203, 20030102, 20030501, 20050102, 20060101])
>>> a.shape
(3, 3)
>>> b.shape
(5,)
>>>
For the comparison, you need to broadcast b onto a by adding an axis to a. This compares each element of a with each element of b:
>>> mask = a[...,None] == b
>>> mask.shape
(3, 3, 5)
>>>
Then use np.any() to see if there are any matches
>>> np.any(mask, axis = 2, keepdims = False)
array([[ True, True, False],
[False, False, False],
[ True, True, True]], dtype=bool)
timeit.Timer comparison with in1d:
>>> from timeit import Timer
>>> t = Timer("np.any(a[...,None] == b, axis = 2)","from __main__ import np, a, b")
>>> t.timeit(10000)
0.13268041338812964
>>> t = Timer("np.in1d(a.ravel(), b).reshape(a.shape)","from __main__ import np, a, b")
>>> t.timeit(10000)
0.26060646913566643
>>>

Add together two numpy masked arrays

Is there a convenient way to add two masked arrays so that a value that is masked in one array but not in the other still ends up in the result?
import numpy as np
arr1 = np.ma.array([0,1,0], mask=[True, False, True])
arr2 = np.ma.array([2,3,0], mask=[False, False, True])
arr1+arr2
Out[4]:
masked_array(data = [-- 4 --],
mask = [ True False True],
fill_value = 999999)
Note: in arr2 the value 2 is not masked -> should be in the resulting array
The result should be [2, 4, --]. I'd think there must be an easy solution for this?
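Try this, choosing the logical operator that you want to apply to your masks from http://docs.python.org/3/library/operator.html. Note that it adds the underlying .data arrays, so it relies on the values stored under the masked positions being 0: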
>>> from operator import and_
>>> np.ma.array(arr1.data+arr2.data,mask=map(and_,arr1.mask,arr2.mask))
masked_array(data = [2 4 --],
mask = [False False True],
fill_value = 999999)
In Python 3, map() returns an iterator and not a list, so it is necessary to add list():
>>> np.ma.array(arr1.data+arr2.data,mask=list(map(and_,arr1.mask,arr2.mask)))
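The same idea can be written without map by combining the boolean masks directly. This is a sketch under one assumption about the intended rule: a masked value contributes 0, and a position stays masked only when it is masked in both arrays.

```python
import numpy as np

arr1 = np.ma.array([0, 1, 0], mask=[True, False, True])
arr2 = np.ma.array([2, 3, 0], mask=[False, False, True])

# Masked in the result only where both inputs are masked.
both_masked = np.ma.getmaskarray(arr1) & np.ma.getmaskarray(arr2)

# filled(0) makes masked entries contribute 0, whatever is stored underneath.
total = np.ma.array(arr1.filled(0) + arr2.filled(0), mask=both_masked)
print(total)  # [2 4 --]
```

Using filled(0) rather than .data means the result does not depend on the values hidden under the masked positions.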

Check if values in a set are in a numpy array in python

I want to check whether a NumPy array has values in it that are in a set and, if so, set those cells in an array to 1. If not, set keepRaster = 2.
numpyArray = #some imported array
repeatSet= ([3, 5, 6, 8])
confusedRaster = numpyArray[numpy.where(numpyArray in repeatSet)]= 1
Yields:
<type 'exceptions.TypeError'>: unhashable type: 'numpy.ndarray'
Is there a way to loop through it?
for numpyArray
if numpyArray in repeatSet
confusedRaster = 1
else
keepRaster = 2
To clarify and ask for a bit further help:
What I am trying to get at, and am currently doing, is putting a raster input into an array. I need to read values in the 2-d array and create another array based on those values. If the array value is in a set then the value will be 1. If it is not in a set then the value will be derived from another input, but I'll say 77 for now. This is what I'm currently using. My test input has about 1500 rows and 3500 columns. It always freezes at around row 350.
for rowd in range(0, width):
    for cold in range(0, height):
        if numpyarray.item(rowd, cold) in repeatSet:
            confusedArray[rowd][cold] = 1
        else:
            if numpyarray.item(rowd, cold) == 0:
                confusedArray[rowd][cold] = 0
            else:
                confusedArray[rowd][cold] = 2
In versions 1.4 and higher, numpy provides the in1d function.
>>> test = np.array([0, 1, 2, 5, 0])
>>> states = [0, 2]
>>> np.in1d(test, states)
array([ True, False, True, False, True], dtype=bool)
You can use that as a mask for assignment.
>>> test[np.in1d(test, states)] = 1
>>> test
array([1, 1, 1, 5, 1])
Here are some more sophisticated uses of numpy's indexing and assignment syntax that I think will apply to your problem. Note the use of bitwise operators to replace if-based logic:
>>> repeat_set = [3, 5, 6, 8]  # the set from the question
>>> numpy_array = numpy.arange(9).reshape((3, 3))
>>> confused_array = numpy.arange(9).reshape((3, 3)) % 2
>>> mask = numpy.in1d(numpy_array, repeat_set).reshape(numpy_array.shape)
>>> mask
array([[False, False, False],
[ True, False, True],
[ True, False, True]], dtype=bool)
>>> ~mask
array([[ True, True, True],
[False, True, False],
[False, True, False]], dtype=bool)
>>> numpy_array == 0
array([[ True, False, False],
[False, False, False],
[False, False, False]], dtype=bool)
>>> numpy_array != 0
array([[False, True, True],
[ True, True, True],
[ True, True, True]], dtype=bool)
>>> confused_array[mask] = 1
>>> confused_array[~mask & (numpy_array == 0)] = 0
>>> confused_array[~mask & (numpy_array != 0)] = 2
>>> confused_array
array([[0, 2, 2],
[1, 2, 1],
[1, 2, 1]])
Another approach would be to use numpy.where, which creates a brand new array, using values from the second argument where mask is true, and values from the third argument where mask is false. (As with assignment, the argument can be a scalar or an array of the same shape as mask.) This might be a bit more efficient than the above, and it's certainly more terse:
>>> numpy.where(mask, 1, numpy.where(numpy_array == 0, 0, 2))
array([[0, 2, 2],
[1, 2, 1],
[1, 2, 1]])
Here is one possible way of doing what you want:
numpyArray = np.array([1, 8, 35, 343, 23, 3, 8]) # could be n-Dimensional array
repeatSet = np.array([3, 5, 6, 8])
mask = (numpyArray[...,None] == repeatSet[None,...]).any(axis=-1)
print(mask)
# [False  True False False False  True  True]
In recent versions of numpy you can combine np.isin and np.where to achieve this result. np.isin returns a boolean array that is True where the values of the input are among the given array-like test elements (see the docs), while np.where lets you create a new array that takes one value where the specified condition is True and another value where it is False.
Example
I'll make an example with a random array but using the specific values you provided.
import numpy as np
repeatSet = ([2, 5, 6, 8])
arr = np.array([[1,5,1],
[0,1,0],
[0,0,0],
[2,2,2]])
out = np.where(np.isin(arr, repeatSet), 1, 77)
> out
array([[77, 1, 77],
[77, 77, 77],
[77, 77, 77],
[ 1, 1, 1]])
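The three-way rule from the loop in the question (1 if the value is in the set, 0 if it is zero, otherwise 2) can also be written with np.isin and np.select. This is a sketch on a small stand-in array; the names raster and confused are illustrative.

```python
import numpy as np

raster = np.arange(9).reshape(3, 3)  # stand-in for the imported raster
repeat_set = [3, 5, 6, 8]

in_set = np.isin(raster, repeat_set)

# np.select takes the choice of the first condition that is True for each cell
# and falls back to `default` where none of them is.
confused = np.select([in_set, raster == 0], [1, 0], default=2)
print(confused)
# [[0 2 2]
#  [1 2 1]
#  [1 2 1]]
```

Because the whole array is processed at once, this avoids the per-element Python loop, which is likely to be the slow part on a raster of about 1500 by 3500 cells.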
