Handling masked numpy array - python

I have a masked numpy array. While processing each element, I first need to check whether that element is masked, and skip it if it is.
I have tried this:
from netCDF4 import Dataset
data=Dataset('test.nc')
dim_size=len(data.dimensions[nc_dims[0]])
model_dry_tropo_corr=data.variables['model_dry_tropo_corr'][:]
solid_earth_tide=data.variables['solid_earth_tide'][:]
for i in range(0,dim_size):
    try :
        model_dry_tropo_corr[i].mask=True
        continue
    except :
        pass
    try:
        solid_earth_tide[i].mask=True
        continue
    except:
        pass
    correction=model_dry_tropo_corr[i]/2+solid_earth_tide[i]
Is there a more efficient way to do this? Any suggestions or comments are highly appreciated.

Instead of a loop you could use
correction = model_dry_tropo_corr/2 + solid_earth_tide
This will create a new masked array that holds both the results and the combined mask. You can then access the unmasked values from the new array.
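As a rough illustration of that one-line version, here is a minimal sketch that uses small synthetic masked arrays in place of the netCDF variables (the values and masks are assumptions made up for the example, not data from test.nc):

```python
import numpy as np

# Synthetic stand-ins for the two netCDF variables (assumed values).
model_dry_tropo_corr = np.ma.array([2.0, 4.0, 6.0, 8.0],
                                   mask=[False, True, False, False])
solid_earth_tide = np.ma.array([0.5, 0.5, 0.5, 0.5],
                               mask=[False, False, True, False])

# One vectorized expression; an element masked in either input is masked in the result.
correction = model_dry_tropo_corr / 2 + solid_earth_tide

# Only the unmasked results, and the positions they came from.
valid = correction.compressed()
valid_idx = np.flatnonzero(~np.ma.getmaskarray(correction))

print(valid)      # values at the unmasked positions
print(valid_idx)  # indices of the unmasked positions
```

Here `compressed()` drops the masked entries, while `valid_idx` keeps track of where the surviving values sit in the original array.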

I'm puzzled about this code
try :
    model_dry_tropo_corr[i].mask=True
    continue
except :
    pass
I don't have netCDF4 installed, but it appears from the documentation that your variable will look like, and may even be, a numpy.ma masked array.
It would be helpful if you printed all or part of this variable, with attributes like shape and dtype.
I can make a masked array with an expression like:
In [746]: M=np.ma.masked_where(np.arange(10)%3==0,np.arange(10))
In [747]: M
Out[747]:
masked_array(data = [-- 1 2 -- 4 5 -- 7 8 --],
mask = [ True False False True False False True False False True],
fill_value = 999999)
I can test whether the mask for a given element is True or False with:
In [748]: M.mask[2]
Out[748]: False
In [749]: M.mask[3]
Out[749]: True
But if I index first,
In [754]: M[2]
Out[754]: 2
In [755]: M[3]
Out[755]: masked
In [756]: M[2].mask=True
...
AttributeError: 'numpy.int32' object has no attribute 'mask'
In [757]: M[3].mask=True
So yes, your try/except will skip the elements that have the mask set True.
But I think it would be clearer to do:
if model_dry_tropo_corr.mask[i]:
    continue
But that is still iterative.
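One caveat with indexing `.mask` directly: when nothing is masked, `.mask` can be the scalar `np.ma.nomask` rather than an array, so `arr.mask[i]` would fail. A minimal sketch of a safer check, assuming two hypothetical masked arrays `a` and `b`, uses `np.ma.getmaskarray`, which always returns a full boolean array:

```python
import numpy as np

# A masked array with nothing masked: .mask may be the scalar np.ma.nomask.
values = np.ma.array([10, 20, 30])
mask = np.ma.getmaskarray(values)  # always a boolean array of the same shape

# Two hypothetical inputs with different masks.
a = np.ma.masked_where(np.arange(10) % 3 == 0, np.arange(10))
b = np.ma.masked_where(np.arange(10) % 4 == 0, np.arange(10))

# Compute once which positions to skip, then visit only the rest.
skip = np.ma.getmaskarray(a) | np.ma.getmaskarray(b)
keep = np.flatnonzero(~skip)
for i in keep:
    total = a[i] + b[i]  # both elements are guaranteed unmasked here
```

This is still a loop, but the mask test happens once up front instead of inside a try/except for every element.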
But as @user3404344 showed, you can perform the math directly on the variables, and the masking will carry over. That could be a problem, though, if the masked values are 'bad' and cause errors in the calculation.
If I define another masked array
In [764]: N=np.ma.masked_where(np.arange(10)%4==0,np.arange(10))
In [765]: N+M
Out[765]:
masked_array(data = [-- 2 4 -- -- 10 -- 14 -- --],
mask = [ True False False True True False True False True True],
fill_value = 999999)
You can see how elements that were masked in either M or N are masked in the result.
I can use the compressed method to get only the valid elements:
In [766]: (N+M).compressed()
Out[766]: array([ 2, 4, 10, 14])
filling can also be handy when doing math with masked arrays:
In [779]: N.filled(0)+M.filled(0)
Out[779]: array([ 0, 2, 4, 3, 4, 10, 6, 14, 8, 9])
I could use filled to neutralize problem calculations, and still mask those values
In [785]: z=np.ma.masked_array(N.filled(0)+M.filled(0),mask=N.mask|M.mask)
In [786]: z
Out[786]:
masked_array(data = [-- 2 4 -- -- 10 -- 14 -- --],
mask = [ True False False True True False True False True True],
fill_value = 999999)
Oops, I don't need to worry about the masked values messing the calculation. The masked addition is doing the filling for me
In [787]: (N+M).data
Out[787]: array([ 0, 2, 4, 3, 4, 10, 6, 14, 8, 9])
In [788]: N.data+M.data # raw unmasked addition
Out[788]: array([ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18])
In [789]: z.data # same as the (N+M).data
Out[789]: array([ 0, 2, 4, 3, 4, 10, 6, 14, 8, 9])

Related

Does anybody know how to record/capture data into a numpy array?

After many days of research I have found no answer, so I hope we can find a solution here.
Matrix = np.random.randint(5, size=(60, 6, 6))
arr = np.random.randint(5, size=(1, 240))
arr1 = np.random.randint(5, size=(1, 240))
arr2 = np.random.randint(5, size=(1, 240))
arr3 = np.random.randint(5, size=(1, 240))
In the original data, arr, arr1, arr2 and arr3 each contain different data with shape (1,240), but each is sliced to shape (1,72). kkk and vvv represent the placement shift for each iteration, with the slice size fixed at (1,72),
but this might not be relevant to solving this case.
I have this code:
kkk = 3
vvv = 79
while True:
for i in range(len(Matrix)):
for j in range(len(Matrix[i])):
for z in range(len(Matrix[j])):
pass
Matrix2 = np.concatenate((arr[kkk:vvv], arr1[kkk:vvv], arr2[kkk:vvv], arr3[kkk:vvv]), axis=0)
for v in range(len(Matrix2 )):
for vv in range(len(Matrix2 [v])):
for kv in np.isclose(Matrix[i, j, z], Matrix2 [v], rtol=0.005, atol=0.0):
def array_for(x, y):
return np.asanyarray([kv for kv in np.isclose(x[i, j, z], y[v], rtol=0.005, atol=0.0)])
boolarray = array_for(Matrix, Matrix2)
if j >= 5:
kkk += 4
vvv += 4
if i == 59:
break
What I am looking to do is record the data from my boolarray into an array. The boolarray is produced by:
def array_for(x, y):
return np.asanyarray([kv for kv in np.isclose(x[i, j, z], y[v], rtol=0.005, atol=0.0)])
This gives me the following.
The shape of Matrix is (60,6,6).
The shape of Matrix2 is (4,72).
The shape of boolarray is (4,1) for each value in Matrix:
[False False False False]
I want to record the boolarray data into an array of shape (1,6,6,4,72)
and end up with 60 of these arrays, so that it becomes (60,6,6,4,72).
The problem I found is that numpy arrays are memory-based, so if I try to append I get a bunch of (4,1) arrays that cannot be concatenated, either inside or outside the loop, with anything other than themselves.
Trying to use np.insert yields the same result.
And appending the data to a Python list creates the wrong shapes, such as a shape of (1,1) with 72 values inside it.
Is it possible to use this logic, and if so, how can I achieve the desired result?
For values in boolarray:
append values a list[],
concate with each condition like:
if boolarray == 71:
np.concate(values, mask)
out = 2,4,72
delete mask:
continue np.concate(value, prevvalues)
print(Full array)
Out = 60,6,6,4,72
Matrix array looks like this:
[[0, 1, 0, 3, 4, 5],
[0, 2, 0, 3, 4, 5],
[2, 1, 1, 3, 4, 5],
[0, 1, 5, 3, 9, 5],
[4, 1, 0, 3, 8, 5],
[4, 1, 0, 3, 8, 5]]
Then with 60 layers of different values shown like above.
Matrix2 array looks like this:
[[1, 1, 4, 5]] x72 times with different values
Let me know if I should explain anything further, or if you have an idea of how to put data into an array dynamically.
Your help is much appreciated.
Desired array should look like this:
[[False False False False False False],
[False True False False False False],
[False False True False True False],
[False True True False True False],
[False False False True False True ],
[False False False False False False]] 60x times = 60,6,6
[[False True True False],
[True True True False],
[False False True False],
[False True True False],
[False False False True],
[False False False False]] 72x times 4, 72
Each group of 4 values iterates through one cell of the (6,6) layer at a time, so I suppose the array might look like (72, 4, 6, 6, 1), where the 1 is one layer out of 60.
So the last form is likely how it would look. I'm sorry I can't represent it much better, as it is a complex shape compared to a 3-dimensional object, but np.isclose gives this kind of relation one cell at a time, which yields 10,368 values in total: 72 x 4 x 6 x 6 x 1.
I think this is what you want:
import numpy as np
#input data
Matrix = np.random.randint(5, size=(60, 6, 6))
arr = np.random.randint(5, size=(1, 240))
arr1 = np.random.randint(5, size=(1, 240))
arr2 = np.random.randint(5, size=(1, 240))
arr3 = np.random.randint(5, size=(1, 240))
# construct matrix 2
Matrix2=np.vstack((arr,arr1,arr2,arr3))
# slice matrix 2
kkk = 3
vvv = 79
Matrix2_slice=Matrix2[:,kkk:vvv]
# methods
def compare_value(x,y):
    return np.isclose(x, y, rtol=0.005, atol=0.0)
def compare_to_matrix2(x):
    return np.apply_along_axis(compare_value, 0, Matrix2_slice,x)
# output
vectorized_ctm = np.vectorize(compare_to_matrix2, signature='()->(l,p)')
output = vectorized_ctm(Matrix)
# some checks
print(output.shape)
print(output[0,0,0,0,0:10])
Some remarks, though:
The difference between kkk and vvv is 76, not 72, so the shape is a bit different from what you requested: (60,6,6,4,76).
Matrix2_slice is not passed in as a function input, which is not ideal.
The placement shift per iteration (different values for kkk and vvv) is not included.
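As an alternative sketch (not the approach used in the answer above), plain broadcasting can produce the same kind of 5-dimensional boolean array without np.vectorize or np.apply_along_axis. The end index vvv = 75 is an assumption chosen here so that the slice is 72 columns wide, and the per-iteration shift of kkk and vvv is still not modelled:

```python
import numpy as np

rng = np.random.default_rng(0)
Matrix = rng.integers(5, size=(60, 6, 6))
Matrix2 = rng.integers(5, size=(4, 240))  # stands in for arr..arr3 stacked

kkk, vvv = 3, 75                      # assumed bounds: 75 - 3 = 72 columns
Matrix2_slice = Matrix2[:, kkk:vvv]   # shape (4, 72)

# Add two trailing axes to Matrix so (60,6,6,1,1) broadcasts against (4,72).
output = np.isclose(Matrix[..., None, None], Matrix2_slice,
                    rtol=0.005, atol=0.0)

print(output.shape)  # (60, 6, 6, 4, 72)
```

Each `output[i, j, z]` is then the (4, 72) comparison of the single value `Matrix[i, j, z]` against the whole slice, so nothing has to be appended or concatenated inside a loop.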

Python Fancy Indexing Assignments: cannot assign 3 input values to the 6 output values where the mask is true

I am trying to make a zeroed array with the same shape of a source array. Then modify every value in the second array that corresponds to a specific value in the first array.
This would be simple enough if I was just replacing one value. Here is a toy example:
import numpy as np
arr1 = np.array([[1,2,3],[3,4,5],[1,2,3]])
arr2 = np.array([[0,0,0],[0,0,0],[0,0,0]])
arr2[arr1==1] = -1
This will work as expected and arr2 would be:
[[-1,0,0],
[ 0,0,0],
[-1,0,0]]
But I would like to replace an entire row. Something like this replacing the last line of the sample code above:
arr2[arr1==[3,4,5]] = [-1,-1,-1]
When I do this, it also works as expected and arr2 would be:
[[ 0, 0, 0],
[-1,-1,-1],
[ 0, 0, 0]]
But when I tried to replace the last line of sample code with something like:
arr2[arr1==[1,2,3]] = [-1,-1,-1]
I expected to get something like the last output, but with the 0th and 2nd rows being changed. But instead I got the following error.
ValueError: NumPy boolean array indexing assignment cannot assign 3 input values to the 6
output values where the mask is true
I assume this is because, unlike the other example, it was going to have to replace more than one row. Though this seems odd to me, since it worked fine replacing more than one value in the simple single value example.
I'm just wondering if anyone can explain this behavior to me, because it is a little confusing. I am not that experienced with the inner workings of numpy operations. Also, if anyone has any recommendations for accomplishing this efficiently, I would appreciate them.
In my real world implementation, I am working with a very large three dimensional array (an image with 3 color channels) and I want to make a new array that stores a specific value into these three color channels if the source image has a specific set of three color values in that corresponding pixel (and remains [0,0,0] if it doesn't match our pixel_rgb_of_interest). I could go through in linear time and just check every single pixel, but this could get kind of slow if there are a lot of images, and I was just wondering if there was a better way.
Thank you!
This would be a good application for numpy.where
>>> import numpy as np
>>> arr1 = np.array([[1,2,3],[3,4,5],[1,2,3]])
>>> arr2 = np.array([[0,0,0],[0,0,0],[0,0,0]])
>>> np.where(arr1 == [1,2,3], [-1,-1,-1], arr1)
array([[-1, -1, -1],
[ 3, 4, 5],
[-1, -1, -1]])
This basically works as "wherever the condition is true, use the x argument, then use the y argument the rest of the time"
Let's add an "index" array:
In [56]: arr1 = np.array([[1,2,3],[3,4,5],[1,2,3]])
...: arr2 = np.array([[0,0,0],[0,0,0],[0,0,0]])
...: arr3 = np.arange(9).reshape(3,3)
The test against 1 value:
In [57]: arr1==1
Out[57]:
array([[ True, False, False],
[False, False, False],
[ True, False, False]])
that has 2 true values:
In [58]: arr3[arr1==1]
Out[58]: array([0, 6])
We could assign one value as you do, or 2.
Test with a list, which is converted to array first:
In [59]: arr1==[3,4,5]
Out[59]:
array([[False, False, False],
[ True, True, True],
[False, False, False]])
That has 3 True:
In [60]: arr3[arr1==[3,4,5]]
Out[60]: array([3, 4, 5])
so it works to assign a list of 3 values as you do. Or a scalar.
In [61]: arr1==[1,2,3]
Out[61]:
array([[ True, True, True],
[False, False, False],
[ True, True, True]])
Here the test has 6 True.
In [62]: arr3[arr1==[1,2,3]]
Out[62]: array([0, 1, 2, 6, 7, 8])
So we can assign 6 values or a scalar. But you tried to assign 3 values.
Or we could apply all to find the rows that match [1,2,3]:
In [63]: np.all(arr1==[1,2,3], axis=1)
Out[63]: array([ True, False, True])
In [64]: arr3[np.all(arr1==[1,2,3], axis=1)]
Out[64]:
array([[0, 1, 2],
[6, 7, 8]])
To this we could assign a (2,3) array, a scalar, a (3,) array, or a (2,1) (as per broadcasting rules):
In [65]: arr2[np.all(arr1==[1,2,3], axis=1)]=np.array([100,200])[:,None]
In [66]: arr2
Out[66]:
array([[100, 100, 100],
[ 0, 0, 0],
[200, 200, 200]])
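The same np.all idea extends to the image case described in the question. A minimal sketch, assuming a tiny made-up image and hypothetical names `img`, `pixel_rgb_of_interest` and `replacement`, reduces over the last (channel) axis so that only pixels matching all three channels are selected:

```python
import numpy as np

# Tiny 2x2 RGB image (assumed values); the pixel at [1, 1] matches only partially.
img = np.array([[[255, 0, 0], [0, 255, 0]],
                [[255, 0, 0], [255, 0, 1]]], dtype=np.uint8)
pixel_rgb_of_interest = np.array([255, 0, 0], dtype=np.uint8)
replacement = np.array([10, 20, 30], dtype=np.uint8)

# True only where all three channels match; shape (2, 2).
match = np.all(img == pixel_rgb_of_interest, axis=-1)

# Start from zeros and write the replacement colour into the matching pixels.
out = np.zeros_like(img)
out[match] = replacement  # the (3,) value broadcasts over every matching pixel

print(match)
print(out)
```

Because `match` has one entry per pixel rather than per channel, assigning a 3-element value broadcasts cleanly regardless of how many pixels match.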

Explanation of boolean indexing behaviors

For the 2D array y:
y = np.arange(20).reshape(5,4)
---
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]
[12 13 14 15]
[16 17 18 19]]
All of these indexing expressions select the 1st, 3rd, and 5th rows. This is clear.
print(y[
[0, 2, 4],
::
])
print(y[
[0, 2, 4],
::
])
print(y[
[True, False, True, False, True],
::
])
---
[[ 0 1 2 3]
[ 8 9 10 11]
[16 17 18 19]]
Questions
Please help me understand what rules or mechanisms produce the following results.
Replacing the list [] with a tuple produces an empty array with shape (0, 5, 4).
y[
(True, False, True, False, True)
]
---
array([], shape=(0, 5, 4), dtype=int64)
Using a single True adds a new axis.
y[True]
---
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15],
[16, 17, 18, 19]]])
y[True].shape
---
(1, 5, 4)
Adding an additional boolean True produces the same result.
y[True, True]
---
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15],
[16, 17, 18, 19]]])
y[True, True].shape
---
(1, 5, 4)
However, adding a False boolean produces the empty array again.
y[True, False]
---
array([], shape=(0, 5, 4), dtype=int64)
I am not sure the documentation explains this behavior.
Boolean array indexing
In general if an index includes a Boolean array, the result will be
identical to inserting obj.nonzero() into the same position and using
the integer array indexing mechanism described above. x[ind_1,
boolean_array, ind_2] is equivalent to x[(ind_1,) +
boolean_array.nonzero() + (ind_2,)].
If there is only one Boolean array and no integer indexing array
present, this is straight forward. Care must only be taken to make
sure that the boolean index has exactly as many dimensions as it is
supposed to work with.
Boolean scalar indexing is not well-documented, but you can trace how it is handled in the source code. See for example this comment and associated code in the numpy source:
/*
* This can actually be well defined. A new axis is added,
* but at the same time no axis is "used". So if we have True,
* we add a new axis (a bit like with np.newaxis). If it is
* False, we add a new axis, but this axis has 0 entries.
*/
So if an index is a scalar boolean, a new axis is added. If the value is True the size of the axis is 1, and if the value is False, the size of the axis is zero.
This behavior was introduced in numpy#3798, and the author outlines the motivation in this comment; roughly, the aim was to provide consistency in the output of filtering operations. For example:
x = np.ones((2, 2))
assert x[x > 0].ndim == 1
x = np.ones(2)
assert x[x > 0].ndim == 1
x = np.ones(())
assert x[x > 0].ndim == 1 # scalar boolean here!
The interesting thing is that any subsequent scalar booleans after the first do not add additional dimensions! From an implementation standpoint, this seems to be due to consecutive 0D boolean indices being treated as equivalent to consecutive fancy indices (i.e. HAS_0D_BOOL is treated as HAS_FANCY in some cases) and thus are combined in the same way as fancy indices. From a logical standpoint, this corner-case behavior does not appear to be intentional: for example, I can't find any discussion of it in numpy#3798.
Given that, I would recommend considering this behavior poorly-defined, and avoid it in favor of well-documented indexing approaches.
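To make the contrast concrete, here is a small sketch that puts the scalar-boolean cases from the question next to the better-documented spellings. The point it illustrates is that a tuple of booleans is unpacked into separate scalar indices, whereas a list or array of booleans is treated as a single mask over the first axis:

```python
import numpy as np

y = np.arange(20).reshape(5, 4)

# Well-documented: a boolean array (or list) as a mask over axis 0.
row_mask = np.array([True, False, True, False, True])
rows = y[row_mask]              # selects rows 0, 2 and 4

# Well-documented way to add a leading axis.
with_axis = y[np.newaxis]

# Scalar-boolean indexing, as observed in the question.
scalar_true = y[True]           # new axis of length 1
scalar_mixed = y[True, False]   # new axis of length 0
tuple_index = y[(True, False, True, False, True)]  # same as five scalar indices

print(rows.shape, with_axis.shape)
print(scalar_true.shape, scalar_mixed.shape, tuple_index.shape)
```

In practice, `y[row_mask]` and `y[np.newaxis]` cover the intended uses, so there is little reason to rely on the scalar-boolean forms.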

Add together two numpy masked arrays

Is there a convenient way to add another array with actual values to masked positions in another array?
import numpy as np
arr1 = np.ma.array([0,1,0], mask=[True, False, True])
arr2 = np.ma.array([2,3,0], mask=[False, False, True])
arr1+arr2
Out[4]:
masked_array(data = [-- 4 --],
mask = [ True False True],
fill_value = 999999)
Note: in arr2 the value 2 is not masked -> should be in the resulting array
The result should be [2, 4, --]. I'd think there must be an easy solution for this?
Try this (choosing the logical operator that you want to use for your masks from http://docs.python.org/3/library/operator.html)
>>> from operator import and_
>>> np.ma.array(arr1.data+arr2.data,mask=map(and_,arr1.mask,arr2.mask))
masked_array(data = [2 4 --],
mask = [False False True],
fill_value = 999999)
In Python 3, map() returns an iterator and not a list, so it is necessary to add list():
>>> np.ma.array(arr1.data+arr2.data,mask=list(map(and_,arr1.mask,arr2.mask)))
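Note that adding `.data` directly only gives the desired result because the masked slots in this example happen to hold 0. A slightly more defensive sketch, offered as one possible variant rather than the only way, fills masked entries with 0 explicitly and combines the masks with a vectorized `&` instead of `map`:

```python
import numpy as np

arr1 = np.ma.array([0, 1, 0], mask=[True, False, True])
arr2 = np.ma.array([2, 3, 0], mask=[False, False, True])

# A position stays masked only if it is masked in both inputs.
both_masked = np.ma.getmaskarray(arr1) & np.ma.getmaskarray(arr2)

# filled(0) replaces masked entries with 0 so they contribute nothing to the sum.
result = np.ma.array(arr1.filled(0) + arr2.filled(0), mask=both_masked)

print(result)
```

With `filled(0)`, whatever value is stored underneath a masked element no longer leaks into the sum.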

Check if values in a set are in a numpy array in python

I want to check whether a NumPy array contains values that are in a set and, where it does, set the corresponding area of an array to 1. Where it does not, set keepRaster = 2.
numpyArray = #some imported array
repeatSet= ([3, 5, 6, 8])
confusedRaster = numpyArray[numpy.where(numpyArray in repeatSet)]= 1
Yields:
<type 'exceptions.TypeError'>: unhashable type: 'numpy.ndarray'
Is there a way to loop through it?
for numpyArray
if numpyArray in repeatSet
confusedRaster = 1
else
keepRaster = 2
To clarify and ask for a bit further help:
What I am trying to get at, and am currently doing, is putting a raster input into an array. I need to read values in the 2-d array and create another array based on those values. If the array value is in a set then the value will be 1. If it is not in a set then the value will be derived from another input, but I'll say 77 for now. This is what I'm currently using. My test input has about 1500 rows and 3500 columns. It always freezes at around row 350.
for rowd in range(0, width):
for cold in range (0, height):
if numpyarray.item(rowd,cold) in repeatSet:
confusedArray[rowd][cold] = 1
else:
if numpyarray.item(rowd,cold) == 0:
confusedArray[rowd][cold] = 0
else:
confusedArray[rowd][cold] = 2
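In versions 1.4 and higher, numpy provides the in1d function. (In newer code, np.isin, available since NumPy 1.13, is the recommended replacement, and in1d is deprecated as of NumPy 2.0.)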
>>> test = np.array([0, 1, 2, 5, 0])
>>> states = [0, 2]
>>> np.in1d(test, states)
array([ True, False, True, False, True], dtype=bool)
You can use that as a mask for assignment.
>>> test[np.in1d(test, states)] = 1
>>> test
array([1, 1, 1, 5, 1])
Here are some more sophisticated uses of numpy's indexing and assignment syntax that I think will apply to your problem. Note the use of bitwise operators to replace if-based logic:
>>> repeat_set = [3, 5, 6, 8]
>>> numpy_array = numpy.arange(9).reshape((3, 3))
>>> confused_array = numpy.arange(9).reshape((3, 3)) % 2
>>> mask = numpy.in1d(numpy_array, repeat_set).reshape(numpy_array.shape)
>>> mask
array([[False, False, False],
[ True, False, True],
[ True, False, True]], dtype=bool)
>>> ~mask
array([[ True, True, True],
[False, True, False],
[False, True, False]], dtype=bool)
>>> numpy_array == 0
array([[ True, False, False],
[False, False, False],
[False, False, False]], dtype=bool)
>>> numpy_array != 0
array([[False, True, True],
[ True, True, True],
[ True, True, True]], dtype=bool)
>>> confused_array[mask] = 1
>>> confused_array[~mask & (numpy_array == 0)] = 0
>>> confused_array[~mask & (numpy_array != 0)] = 2
>>> confused_array
array([[0, 2, 2],
[1, 2, 1],
[1, 2, 1]])
Another approach would be to use numpy.where, which creates a brand new array, using values from the second argument where mask is true, and values from the third argument where mask is false. (As with assignment, the argument can be a scalar or an array of the same shape as mask.) This might be a bit more efficient than the above, and it's certainly more terse:
>>> numpy.where(mask, 1, numpy.where(numpy_array == 0, 0, 2))
array([[0, 2, 2],
[1, 2, 1],
[1, 2, 1]])
Here is one possible way of doing what you want:
numpyArray = np.array([1, 8, 35, 343, 23, 3, 8]) # could be n-Dimensional array
repeatSet = np.array([3, 5, 6, 8])
mask = (numpyArray[...,None] == repeatSet[None,...]).any(axis=-1)
print(mask)
>>> [False True False False False True True]
In recent NumPy versions you can use a combination of np.isin and np.where to achieve this result. The first returns a boolean array that is True wherever an element equals one of the specified test elements (see the docs), while the second creates a new array that takes one value where the specified condition is True and another value where it is False.
Example
I'll make an example with a random array but using the specific values you provided.
import numpy as np
repeatSet = ([2, 5, 6, 8])
arr = np.array([[1,5,1],
[0,1,0],
[0,0,0],
[2,2,2]])
out = np.where(np.isin(arr, repeatSet), 1, 77)
> out
array([[77, 1, 77],
[77, 77, 77],
[77, 77, 77],
[ 1, 1, 1]])
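For the three-way rule in the question's loop (1 if the value is in the set, 0 if it is 0, otherwise 2), one possible sketch combines np.isin with np.select, which picks the first matching condition per element. The input array here is a small made-up example, and the set values are the ones given in the question:

```python
import numpy as np

repeat_set = [3, 5, 6, 8]                   # values from the question
numpy_array = np.arange(9).reshape(3, 3)    # small stand-in for the raster

in_set = np.isin(numpy_array, repeat_set)   # same shape as numpy_array

# Conditions are checked in order; default applies where none match.
confused_array = np.select([in_set, numpy_array == 0], [1, 0], default=2)

print(confused_array)
```

This replaces the nested Python loops with a few whole-array operations, which is usually what matters for a raster of roughly 1500 by 3500 cells.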
