I would expect the result of a summation for a fully masked array to be zero, but instead "masked" is returned. How can I get the function to return zero?
>>> a = np.asarray([1, 2, 3, 4])
>>> b = np.ma.masked_array(a, mask=~(a > 2))
>>> b
masked_array(data = [-- -- 3 4],
mask = [ True True False False],
fill_value = 999999)
>>> b.sum()
7
>>> b = np.ma.masked_array(a, mask=~(a > 5))
>>> b
masked_array(data = [-- -- -- --],
mask = [ True True True True],
fill_value = 999999)
>>> b.sum()
masked
>>> np.ma.sum(b)
masked
>>>
Here's another unexpected thing:
>>> b.sum() + 3
masked
In your last case:
In [197]: bs=b1.sum()
In [198]: bs.data
Out[198]: array(0.0)
In [199]: bs.mask
Out[199]: array(True, dtype=bool)
In [200]: repr(bs)
Out[200]: 'masked'
In [201]: str(bs)
Out[201]: '--'
If I specify keepdims, I get a different array:
In [208]: bs=b1.sum(keepdims=True)
In [209]: bs
Out[209]:
masked_array(data = [--],
mask = [ True],
fill_value = 999999)
In [210]: bs.data
Out[210]: array([0])
In [211]: bs.mask
Out[211]: array([ True], dtype=bool)
Here's the relevant part of the sum code:
def sum(self, axis=None, dtype=None, out=None, keepdims=np._NoValue):
    kwargs = {} if keepdims is np._NoValue else {'keepdims': keepdims}
    _mask = self._mask
    newmask = _check_mask_axis(_mask, axis, **kwargs)
    # No explicit output
    if out is None:
        result = self.filled(0).sum(axis, dtype=dtype, **kwargs)
        rndim = getattr(result, 'ndim', 0)
        if rndim:
            result = result.view(type(self))
            result.__setmask__(newmask)
        elif newmask:
            result = masked
        return result
    ....
It's the
newmask = np.ma.core._check_mask_axis(b1.mask, axis=None)
...
elif newmask: result = masked
lines that produce the masked value in your case. newmask is True when all values are masked, and False if some are not. The choice to return np.ma.masked is deliberate.
The core of the calculation is:
In [218]: b1.filled(0).sum()
Out[218]: 0
The rest of the code decides whether to return a scalar or a masked array.
============
And for your addition:
In [232]: np.ma.masked+3
Out[232]: masked
It looks like the np.ma.masked is a special array that propagates itself across calculations. Sort of like np.nan.
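To get a plain zero back, as the question asks, one option that follows from the code above is to fill the masked entries with 0 before summing, which mirrors the core calculation that sum performs internally. This is a minimal sketch; the variable names are only for illustration:

```python
import numpy as np

a = np.asarray([1, 2, 3, 4])
b = np.ma.masked_array(a, mask=~(a > 5))  # every element is masked

# Default behaviour: a fully masked sum yields the np.ma.masked constant.
default_result = b.sum()

# Fill masked entries with 0 first, then sum the resulting plain ndarray.
total = b.filled(0).sum()
print(total)
```

Because filled(0) returns an ordinary ndarray, the result is a regular NumPy scalar and no longer propagates masked through later arithmetic such as total + 3.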
I have two 2d arrays, one containing float values, one containing bool. I want to create an array containing the mean values of the first matrix for each column considering only the values corresponding to False in the second matrix.
For example:
A = [[1 3 5]
[2 4 6]
[3 1 0]]
B = [[True False False]
[False False False]
[True True False]]
result = [2, 3.5, 3.67]
Where B is False, keep the value of A; otherwise replace it with NaN. Then use np.nanmean, which ignores NaNs when averaging.
np.nanmean(np.where(~B, A, np.nan), axis=0)
>>> array([2. , 3.5 , 3.66666667])
Use numpy.mean with its where argument (available since NumPy 1.20) to specify which elements to include in the mean.
np.mean(A, where = ~B, axis = 0)
>>> [2. 3.5 3.66666667]
A = [[1, 3, 5],
[2, 4, 6],
[3, 1, 0]]
B = [[True, False, False],
[False, False, False],
[True, True, False]]
sums = [0]*len(A[0])
amounts = [0]*len(A[0])
for i in range(0, len(A)):
    for j in range(0, len(A[0])):
        sums[j] = sums[j] + (A[i][j] if not B[i][j] else 0)
        amounts[j] = amounts[j] + (1 if not B[i][j] else 0)
result = [sums[i]/amounts[i] for i in range(0, len(sums))]
print(result)
There may be some fancy numpy trick for this, but I think using a list comprehension to construct a new array is the most straightforward.
result = np.array([a_col[~b_col].mean() for a_col, b_col in zip(A.T,B.T)])
To make this easier to follow, here is the same line expanded into a loop:
result = []
for i in range(A.shape[1]):  # iterate over the columns of A
    new_col = A[:, i][~B[:, i]]
    result.append(new_col.mean())
You could also use a masked array:
import numpy as np
result = np.ma.array(A, mask=B).mean(axis=0).filled(fill_value=0)
# Output:
# array([2. , 3.5 , 3.66666667])
which has the advantage of being able to supply a fill_value for when every element in some column in B is True.
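To illustrate the fill_value point, here is a small sketch in which one column is fully masked. The B used here is a hypothetical variation of the example (first column all True), not the data from the question:

```python
import numpy as np

A = np.array([[1., 3., 5.],
              [2., 4., 6.],
              [3., 1., 0.]])
# Hypothetical mask: the whole first column is excluded.
B = np.array([[True, False, False],
              [True, False, False],
              [True, True, False]])

# The mean of the fully masked column is itself masked;
# filled() then replaces it with the chosen fill value.
col_means = np.ma.array(A, mask=B).mean(axis=0).filled(fill_value=0)
print(col_means)
```

With the NaN-based approach, the same column would instead produce NaN together with a RuntimeWarning, so the masked-array version may be more convenient when empty columns are expected.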
I have a 3-channel image. Each class is identified by a specific triple of channel values: if a pixel has those 3 values in its 3 channels, then it belongs to that class (for example, class 'A').
For example
classes_channel = np.zeros((image.shape[0], image.shape[1], num_classes))
pixel_class_dict={'0': [128, 64, 128], '1': [230, 50, 140]} #num_classes=2
for channel in range(num_classes):
    pixel_value= pixel_class_dict[str(channel)]
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            if list(image[i][j])==pixel_value:
                classes_channel[i,j,channel]=1
Basically I want to generate an array of channels equal to number of classes with each class separate in a particular channel.
Is there any efficient way to do this?
You could use numpy’s broadcasting to check more efficiently whether any value in the channels matches another value:
>>> a=np.arange(2*3).reshape(2,3)
>>> a
array([[0, 1, 2],
[3, 4, 5]])
>>> a == 4
array([[False, False, False],
[False, True, False]])
This way you can create binary masks. And those you can combine with boolean operators, like np.logical_and and np.logical_or:
>>> b = a == 4
>>> c = a == 0
>>> np.logical_and(b, c)
array([[False, False, False],
[False, False, False]])
>>> np.logical_or(b, c)
array([[ True, False, False],
[False, True, False]])
In your case, you could loop over the classes of pixel values, and compare the different channels:
>>> pixel_class_dict = {1: [18, 19, 20], 2: [9,10,11]}
>>> a = np.arange(2*4*3).reshape(2,4,3)
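>>> b = np.zeros((a.shape[:2]), dtype=int)
>>> for pixel_class, pixel_values in pixel_class_dict.items():
...     # np.logical_and takes only two inputs (a third positional argument
...     # is `out`), so reduce over the per-channel comparisons instead.
...     mask = np.logical_and.reduce([a[..., channel] == pixel_values[channel]
...                                   for channel in range(a.shape[-1])])
...     b += pixel_class*mask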
...
>>> b
array([[0, 0, 0, 2],
[0, 0, 1, 0]])
This last part works because you can multiply a number with a boolean value (4*True == 4 and 3*False == 0) and because I'm assuming each of the pixel values in your dictionary is unique. If the latter doesn’t hold, you’ll sum up the class identifiers.
A slightly shorter approach would be to reshape the starting array:
>>> b = np.zeros((a.shape[:2]), dtype=int)
>>> a2 = a.reshape(-1, 3)
>>> for pixel_class, pixel_values in pixel_class_dict.items():
...     mask = (np.all(a2 == pixel_values, axis=1)
...             .reshape(b.shape))
...     b += mask * pixel_class
...
>>> b
array([[0, 0, 0, 2],
       [0, 0, 1, 0]])
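The same broadcasting idea can also produce the one-channel-per-class array the question asks for, in a single comparison. The sketch below assumes the dictionary keys are '0', '1', ... as in the question; the 2x2 image is made up purely for illustration:

```python
import numpy as np

# Hypothetical 2x2 test image with 3 channels (not from the original post).
image = np.array([[[128, 64, 128], [230, 50, 140]],
                  [[0, 0, 0], [128, 64, 128]]])
pixel_class_dict = {'0': [128, 64, 128], '1': [230, 50, 140]}

# Stack the class colours in channel order: shape (num_classes, 3).
colors = np.array([pixel_class_dict[str(k)]
                   for k in range(len(pixel_class_dict))])

# (H, W, 1, 3) == (num_classes, 3) broadcasts to (H, W, num_classes, 3);
# a pixel belongs to a class only if all three channel values match.
classes_channel = (image[:, :, None, :] == colors).all(axis=-1).astype(np.float32)
print(classes_channel.shape)
```

This builds an intermediate boolean array of shape (H, W, num_classes, 3), so for very large images or many classes the per-class loop may be the more memory-friendly choice.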
Found another solution:
Here image is the image (with 3 channels) to which we want to
import numpy as np
import cv2
for class_id in pixel_class_dict:
    class_color = np.array(pixel_class_dict[class_id])
    classes_channel[:, :, int(class_id)] = cv2.inRange(image,class_color,class_color).astype('bool').astype('float32')
I want to convert a 2D matrix of dates to a boolean matrix based on the dates in a 1D array. For example,
[[20030102, 20030102, 20070102],
[20040102, 20040102, 20040102],
[20050102, 20050102, 20050102]]
should become
[[True, True, False],
[False, False, False],
[True, True, True]]
if I provide the 1D array [20010203, 20030102, 20030501, 20050102, 20060101].
import numpy as np
dateValues = np.array(
[[20030102, 20030102, 20030102],
[20040102, 20040102, 20040102],
[20050102, 20050102, 20050102]])
requestedDates = [20010203, 20030102, 20030501, 20050102, 20060101]
ix = np.in1d(dateValues.ravel(), requestedDates).reshape(dateValues.shape)
print(ix)
Returns:
[[ True True True]
[False False False]
[ True True True]]
Refer to numpy.in1d for more information (documentation):
http://docs.scipy.org/doc/numpy/reference/generated/numpy.in1d.html
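In current NumPy, np.isin is the recommended replacement for np.in1d; it preserves the shape of its first argument, so the ravel/reshape step is not needed. A minimal sketch using the same data as above:

```python
import numpy as np

dateValues = np.array(
    [[20030102, 20030102, 20030102],
     [20040102, 20040102, 20040102],
     [20050102, 20050102, 20050102]])
requestedDates = [20010203, 20030102, 20030501, 20050102, 20060101]

# Element-wise membership test; the result has the shape of dateValues.
ix = np.isin(dateValues, requestedDates)
print(ix)
```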
a = np.array([[20030102, 20030102, 20070102],
[20040102, 20040102, 20040102],
[20050102, 20050102, 20050102]])
b = np.array([20010203, 20030102, 20030501, 20050102, 20060101])
>>> a.shape
(3, 3)
>>> b.shape
(5,)
>>>
For the comparison, you need to broadcast b onto a by adding an axis to a. This compares each element of a with each element of b.
>>> mask = a[...,None] == b
>>> mask.shape
(3, 3, 5)
>>>
Then use np.any() to see if there are any matches
>>> np.any(mask, axis = 2, keepdims = False)
array([[ True, True, False],
[False, False, False],
[ True, True, True]], dtype=bool)
timeit.Timer comparison with in1d:
>>> from timeit import Timer
>>> t = Timer("np.any(a[...,None] == b, axis = 2)","from __main__ import np, a, b")
>>> t.timeit(10000)
0.13268041338812964
>>> t = Timer("np.in1d(a.ravel(), b).reshape(a.shape)","from __main__ import np, a, b")
>>> t.timeit(10000)
0.26060646913566643
>>>
I'm working with numpy masked arrays, and there is a trivial operation that I cannot figure out how to do in a simple way. If I have two masked arrays, how can I aggregate them into another array that contains only the unmasked values?
In [1]: import numpy as np
In [2]: np.ma.array([1, 2, 3], mask = [0,1,1])
Out[2]:
masked_array(data = [1 -- --],
mask = [False True True],
fill_value = 999999)
In [3]: np.ma.array([4, 5, 6], mask = [1,1,0])
Out[3]:
masked_array(data = [-- -- 6],
mask = [ True True False],
fill_value = 999999)
Which operation should I apply to the previous arrays if I want to get:
masked_array(data = [1 -- 6],
mask = [False True False],
fill_value = 999999)
Stack the masks and the arrays using numpy.dstack and create a new masked array and then you can get the required output using numpy.prod:
>>> a1 = np.ma.array([1, 2, 3], mask = [0,1,1])
>>> a2 = np.ma.array([7, 8, 9], mask = [1,1,0])
>>> arr = np.ma.array(np.dstack((a1, a2)), mask=np.dstack((a1.mask, a2.mask)))
>>> np.prod(arr[0], axis=1)
masked_array(data = [1 -- 9],
mask = [False True False],
fill_value = 999999)
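An alternative that avoids the product, sketched here under the assumption that the first array should take precedence wherever both arrays hold a valid value, is to select values with np.where and combine the masks directly:

```python
import numpy as np

a1 = np.ma.array([1, 2, 3], mask=[0, 1, 1])
a2 = np.ma.array([4, 5, 6], mask=[1, 1, 0])

# getmaskarray always returns a full boolean array, even for unmasked input.
m1 = np.ma.getmaskarray(a1)
m2 = np.ma.getmaskarray(a2)

# Take a1's value where it is valid, otherwise a2's value;
# a position stays masked only if it is masked in both inputs.
combined = np.ma.array(np.where(m1, a2.data, a1.data), mask=m1 & m2)
print(combined)
```

Unlike the product, this does not multiply values together when a position is unmasked in both arrays.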
Is there a convenient way to add another array with actual values to masked positions in another array?
import numpy as np
arr1 = np.ma.array([0,1,0], mask=[True, False, True])
arr2 = np.ma.array([2,3,0], mask=[False, False, True])
arr1+arr2
Out[4]:
masked_array(data = [-- 4 --],
mask = [ True False True],
fill_value = 999999)
Note: in arr2 the value 2 is not masked -> should be in the resulting array
The result should be [2, 4, --]. I'd think there must be an easy solution for this?
Try this (choosing the logical operator that you want to use for your masks from http://docs.python.org/3/library/operator.html)
>>> from operator import and_
>>> np.ma.array(arr1.data+arr2.data,mask=map(and_,arr1.mask,arr2.mask))
masked_array(data = [2 4 --],
mask = [False False True],
fill_value = 999999)
In Python 3, map() returns an iterator and not a list, so it is necessary to add list():
>>> np.ma.array(arr1.data+arr2.data,mask=list(map(and_,arr1.mask,arr2.mask)))
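The approach above adds the underlying data directly, so it relies on the masked positions holding 0. A slightly more defensive sketch, assuming masked entries should count as 0 in the sum, fills them explicitly and combines the masks with the & operator:

```python
import numpy as np

arr1 = np.ma.array([0, 1, 0], mask=[True, False, True])
arr2 = np.ma.array([2, 3, 0], mask=[False, False, True])

# Treat masked entries as 0 when adding, whatever data sits underneath them.
summed = arr1.filled(0) + arr2.filled(0)

# Keep a position masked only if it is masked in both inputs.
both_masked = np.ma.getmaskarray(arr1) & np.ma.getmaskarray(arr2)

result = np.ma.array(summed, mask=both_masked)
print(result)
```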