I created an index based on several conditions
transition = np.where((rain>0) & (snow>0) & (graup>0) & (xlat<53.) & (xlat>49.) & (xlon<-114.) & (xlon>-127.)) #indexes the grids where there are transitions
with the shape of (3,259711) that looks like the following:
array([[ 0, 0, 0, ..., 47, 47, 47], #hour
[847, 847, 848, ..., 950, 950, 951], #lat gridpoint
[231, 237, 231, ..., 200, 201, 198]]) #lon gridpoint
I have several other variables (e.g. temp) with the shape of (48, 1015, 1359) corresponding to hour, lat, lon.
Since the index marks my valid gridpoints, how do I mask each of the other variables (e.g. temp) so that it keeps its (48,1015,1359) shape but has every value outside the index masked?
In [90]: arr = np.arange(24).reshape(6,4)
In [91]: keep = (arr % 3)==1
In [92]: keep
Out[92]:
array([[False, True, False, False],
[ True, False, False, True],
[False, False, True, False],
[False, True, False, False],
[ True, False, False, True],
[False, False, True, False]], dtype=bool)
In [93]: np.where(keep)
Out[93]:
(array([0, 1, 1, 2, 3, 4, 4, 5], dtype=int32),
array([1, 0, 3, 2, 1, 0, 3, 2], dtype=int32))
Simple application of the keep mask gives a 1d array of the desired values. I could also index with the where tuple.
In [94]: arr[keep]
Out[94]: array([ 1, 4, 7, 10, 13, 16, 19, 22])
With keep, or rather its boolean inverse, I can make a masked array:
In [95]: np.ma.masked_array(arr,mask=~keep)
Out[95]:
masked_array(data =
[[-- 1 -- --]
[4 -- -- 7]
[-- -- 10 --]
[-- 13 -- --]
[16 -- -- 19]
[-- -- 22 --]],
mask =
[[ True False True True]
[False True True False]
[ True True False True]
[ True False True True]
[False True True False]
[ True True False True]],
fill_value = 999999)
np.ma.masked_where(~keep, arr) does the same thing - just a different argument order. It still expects the boolean mask array.
I can do the same starting with the where tuple:
In [105]: idx = np.where(keep)
In [106]: mask = np.ones_like(arr, dtype=bool)
In [107]: mask[idx] = False
In [108]: np.ma.masked_array(arr, mask=mask)
There may be something in the np.ma class that does this with one call, but it will have to do the same sort of construction.
This also works:
x = np.ma.masked_all_like(arr)
x[idx] = arr[idx]
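Applied to the 3D case in the question, the same construction might look like the sketch below. The small `shape`, the random `rain` and `temp` arrays, and the name `valid` are hypothetical stand-ins for the real (48, 1015, 1359) fields; only the mask-building steps are the point.

```python
import numpy as np

# Hypothetical small stand-ins for the (48, 1015, 1359) fields in the question.
rng = np.random.default_rng(0)
shape = (4, 5, 6)                      # (hour, lat, lon)
rain = rng.random(shape) - 0.5
temp = rng.random(shape) * 30.0

valid = rain > 0                       # boolean condition, same shape as temp
transition = np.where(valid)           # tuple of three index arrays

# Rebuild a full-shape mask from the index tuple: True means "masked out".
mask = np.ones(shape, dtype=bool)
mask[transition] = False

temp_masked = np.ma.masked_array(temp, mask=mask)
```

If the boolean condition is still available, `np.ma.masked_array(temp, mask=~valid)` gives the same result without going through `np.where`. If the index has been stored as a single (3, N) array rather than a tuple, it probably needs to be converted with `tuple(transition)` before being used as an index.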
I have a 2D array A:
28 39 52
77 80 66
7 18 24
9 97 68
And a vector array of column indexes B:
1
0
2
0
How, in a Pythonic way, using base Python or Numpy, can I select the elements from A which DO NOT correspond to the column indexes in B?
I should get this 2D array which contains the elements of A, Not corresponding to the column indexes stored in B:
28 52
80 66
7 18
97 68
You can make use of broadcasting and a row-wise mask to select elements not contained in your array for each row:
Setup
import numpy as np
A = np.array([[28, 39, 52], [77, 80, 66], [7, 18, 24], [9, 97, 68]])
B = np.array([1, 0, 2, 0])
cols = np.arange(A.shape[1])
Now use broadcasting to create a mask, and index your array.
mask = B[:, None] != cols
A[mask].reshape(-1, 2)
array([[28, 52],
[80, 66],
[ 7, 18],
[97, 68]])
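To make the broadcasting step concrete, here is a self-contained sketch of the same idea with the intermediate shapes spelled out. It assumes, as in the question, that exactly one column is dropped from every row, which is what makes the final reshape valid.

```python
import numpy as np

A = np.array([[28, 39, 52],
              [77, 80, 66],
              [7, 18, 24],
              [9, 97, 68]])
B = np.array([1, 0, 2, 0])

cols = np.arange(A.shape[1])           # shape (3,)
# B[:, None] has shape (4, 1); comparing it with cols broadcasts to (4, 3).
mask = B[:, None] != cols

# Exactly one element per row is dropped, so each row keeps ncols - 1 items.
result = A[mask].reshape(A.shape[0], A.shape[1] - 1)
```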
A spin off of my answer to your other question,
Replace 2D array elements with zeros, using a column index vector
We can make a boolean mask with the same indexing used before:
In [124]: mask = np.ones(A.shape, dtype=bool)
In [126]: mask[np.arange(4), B] = False
In [127]: mask
Out[127]:
array([[ True, False, True],
[False, True, True],
[ True, True, False],
[False, True, True]])
Indexing an array with a boolean mask produces a 1d array, since in the most general case such a mask could select a different number of elements in each row.
In [128]: A[mask]
Out[128]: array([28, 52, 80, 66, 7, 18, 97, 68])
In this case the result can be reshaped back to 2d:
In [129]: A[mask].reshape(4,2)
Out[129]:
array([[28, 52],
[80, 66],
[ 7, 18],
[97, 68]])
Since you allowed for 'base Python', here's a list comprehension answer:
In [136]: [[y for i,y in enumerate(x) if i!=b] for b,x in zip(B,A)]
Out[136]: [[28, 52], [80, 66], [7, 18], [97, 68]]
If all the 0's in the other A come from the insertion, then we can also get the mask (Out[127]) with
In [142]: A!=0
Out[142]:
array([[ True, False, True],
[False, True, True],
[ True, True, False],
[False, True, True]])
I have this 3x3 matrix:
a=array([[ 1, 11, 5],
[ 3, 9, 9],
[ 5, 7, -3]])
I need to mask the minimum values in each row in order to calculate the mean of each row discarding the minimum values. Is there a general solution?
I have tried with
a_masked=np.ma.masked_where(a==np.ma.min(a,axis=1),a)
Which masks the minimum value in first and third row, but not the second row?
I would appreciate any help. Thanks!
The problem is that a == a.min(axis=1) compares each column of a against the minimum of the corresponding row, rather than comparing each row against its own minimum. a.min(axis=1) returns a 1-D vector of length N, and broadcasting treats a 1-D vector like a 1xN row, not an Nx1 column. So the == operator lines the row minima up with the columns: element a[i, j] is compared with the minimum of row j.
a == a.min(axis=1)
# array([[ True, False, False],
# [False, False, False],
# [False, False, True]], dtype=bool)
One potential way to fix this is to resize the result of a.min(axis=1) into a column vector (e.g. a 3 x 1 2D array).
a == np.resize(a.min(axis=1), [a.shape[0],1])
# array([[ True, False, False],
# [ True, False, False],
# [False, False, True]], dtype=bool)
Or more simply, as @ColonelBeuvel has shown:
a == a.min(axis=1)[:,None]
Now applying this to your entire line of code.
a_masked = np.ma.masked_where(a == np.resize(a.min(axis=1),[a.shape[0],1]), a)
# masked_array(data =
# [[-- 11 5]
# [-- 9 9]
# [5 7 --]],
# mask =
# [[ True False False]
# [ True False False]
# [False False True]],
# fill_value = 999999)
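For the original goal (the mean of each row with its minimum discarded), a minimal sketch follows. It uses `keepdims=True` as another way to get the column-shaped minimum; the names `row_min` and `row_means` are only illustrative.

```python
import numpy as np

a = np.array([[1, 11, 5],
              [3, 9, 9],
              [5, 7, -3]])

# keepdims=True keeps the reduced axis, so row_min has shape (3, 1)
# and broadcasts against each row of a.
row_min = a.min(axis=1, keepdims=True)
a_masked = np.ma.masked_where(a == row_min, a)

# Masked entries are ignored by the mean.
row_means = a_masked.mean(axis=1)
```

Note that if a row's minimum occurs more than once, every occurrence is masked, which may or may not be what you want.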
Why not use the built-in min() function?
For each row, min(row) returns that row's minimum. Append each of these to a list to collect the minimum of every row.
minList = []
for row in a:
    minList.append(min(row))
I essentially want to crop an image with numpy. I have a 3-dimensional numpy.ndarray object, i.e.:
[ [0,0,0,0], [255,255,255,255], ....]
[0,0,0,0], [255,255,255,255], ....] ]
where I want to remove whitespace, which, in context, is known to be either entire rows or entire columns of [0,0,0,0].
Letting each pixel just be a number for this example, I'm trying to essentially do this:
Given this: *EDIT: chose a slightly more complex example to clarify
[ [0,0,0,0,0,0]
[0,0,1,1,1,0]
[0,1,1,0,1,0]
[0,0,0,1,1,0]
[0,0,0,0,0,0]]
I'm trying to create this:
[ [0,1,1,1],
[1,1,0,1],
[0,0,1,1] ]
I can brute force this with loops, but intuitively I feel like numpy has a better means of doing this.
In general, you'd want to look into scipy.ndimage.label and scipy.ndimage.find_objects to extract the bounding box of contiguous regions fulfilling a condition.
However, in this case, you can do it fairly easily with "plain" numpy.
I'm going to assume you have a nrows x ncols x nbands array here. The other convention of nbands x nrows x ncols is also quite common, so have a look at the shape of your array.
With that in mind, you might do something similar to:
mask = im == 0
all_white = mask.all(axis=2)  # True where every band of the pixel is 0
rows = np.flatnonzero((~all_white).sum(axis=1))
cols = np.flatnonzero((~all_white).sum(axis=0))
crop = im[rows.min():rows.max()+1, cols.min():cols.max()+1, :]
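As a runnable sketch of the multi-band case: the tiny RGBA-like image below is hypothetical, and a pixel is treated as blank only when every band is zero (computed here with `.all(axis=2)`). The code assumes the image contains at least one non-blank pixel; otherwise `rows.min()` would raise on an empty array.

```python
import numpy as np

# Hypothetical 4x5 image with 4 bands; all-zero pixels are "whitespace".
im = np.zeros((4, 5, 4), dtype=np.uint8)
im[1:3, 2:4, :] = 255                  # a 2x2 block of content
im[1, 2, 0] = 0                        # one zero band does not make a pixel blank

blank = (im == 0).all(axis=2)          # True where every band is zero
rows = np.flatnonzero((~blank).any(axis=1))   # rows holding some content
cols = np.flatnonzero((~blank).any(axis=0))   # columns holding some content

crop = im[rows.min():rows.max() + 1, cols.min():cols.max() + 1, :]
```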
For your 2D example, it would look like:
import numpy as np
im = np.array([[0,0,0,0,0,0],
[0,0,1,1,1,0],
[0,1,1,0,1,0],
[0,0,0,1,1,0],
[0,0,0,0,0,0]])
mask = im == 0
rows = np.flatnonzero((~mask).sum(axis=1))
cols = np.flatnonzero((~mask).sum(axis=0))
crop = im[rows.min():rows.max()+1, cols.min():cols.max()+1]
print(crop)
Let's break down the 2D example a bit.
In [1]: import numpy as np
In [2]: im = np.array([[0,0,0,0,0,0],
...: [0,0,1,1,1,0],
...: [0,1,1,0,1,0],
...: [0,0,0,1,1,0],
...: [0,0,0,0,0,0]])
Okay, now let's create a boolean array that meets our condition:
In [3]: mask = im == 0
In [4]: mask
Out[4]:
array([[ True, True, True, True, True, True],
[ True, True, False, False, False, True],
[ True, False, False, True, False, True],
[ True, True, True, False, False, True],
[ True, True, True, True, True, True]], dtype=bool)
Also, note that the ~ operator functions as logical_not on boolean arrays:
In [5]: ~mask
Out[5]:
array([[False, False, False, False, False, False],
[False, False, True, True, True, False],
[False, True, True, False, True, False],
[False, False, False, True, True, False],
[False, False, False, False, False, False]], dtype=bool)
With that in mind, to find rows where all elements are false, we can sum across columns:
In [6]: (~mask).sum(axis=1)
Out[6]: array([0, 3, 3, 2, 0])
If no elements are True, we'll get a 0.
And similarly to find columns where all elements are false, we can sum across rows:
In [7]: (~mask).sum(axis=0)
Out[7]: array([0, 1, 2, 2, 3, 0])
Now all we need to do is find the first and last of these that are not zero. np.flatnonzero is a bit easier than nonzero, in this case:
In [8]: np.flatnonzero((~mask).sum(axis=1))
Out[8]: array([1, 2, 3])
In [9]: np.flatnonzero((~mask).sum(axis=0))
Out[9]: array([1, 2, 3, 4])
Then, you can easily slice out the region based on min/max nonzero elements:
In [10]: rows = np.flatnonzero((~mask).sum(axis=1))
In [11]: cols = np.flatnonzero((~mask).sum(axis=0))
In [12]: im[rows.min():rows.max()+1, cols.min():cols.max()+1]
Out[12]:
array([[0, 1, 1, 1],
[1, 1, 0, 1],
[0, 0, 1, 1]])
One way of implementing this for arbitrary dimensions would be:
import numpy as np
def trim(arr, mask):
    bounding_box = tuple(
        slice(np.min(indexes), np.max(indexes) + 1)
        for indexes in np.where(mask))
    return arr[bounding_box]
A slightly more flexible solution (where you could indicate which axis to act on) is available in FlyingCircus (Disclaimer: I am the main author of the package).
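As a usage sketch, the `trim` function can be applied to the 2D example from the question, with `im != 0` as the mask of cells to keep. Like the slicing version, it assumes the mask has at least one True entry.

```python
import numpy as np

def trim(arr, mask):
    # One slice per axis, spanning the smallest and largest index where mask is True.
    bounding_box = tuple(
        slice(np.min(indexes), np.max(indexes) + 1)
        for indexes in np.where(mask))
    return arr[bounding_box]

im = np.array([[0, 0, 0, 0, 0, 0],
               [0, 0, 1, 1, 1, 0],
               [0, 1, 1, 0, 1, 0],
               [0, 0, 0, 1, 1, 0],
               [0, 0, 0, 0, 0, 0]])

cropped = trim(im, im != 0)
```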
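You could use the np.nonzero function to find the nonzero values, then pull those elements out of the original array and reshape them to the shape you want. Note that this only works when the nonzero elements fill the entire bounding box, as in the example below; a region that contains zeros (like the one in the question) cannot be reshaped this way: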
import numpy as np
n = np.array([ [0,0,0,0,0,0],
[0,0,1,1,1,0],
[0,0,1,1,1,0],
[0,0,1,1,1,0],
[0,0,0,0,0,0]])
elems = n[n.nonzero()]
In [415]: elems
Out[415]: array([1, 1, 1, 1, 1, 1, 1, 1, 1])
In [416]: elems.reshape(3,3)
Out[416]:
array([[1, 1, 1],
[1, 1, 1],
[1, 1, 1]])
I'm trying to rewrite a function using numpy which is originally in MATLAB. There's a logical indexing part which is as follows in MATLAB:
X = reshape(1:16, 4, 4).';
idx = [true, false, false, true];
X(idx, idx)
ans =
1 4
13 16
When I try to make it in numpy, I can't get the correct indexing:
X = np.arange(1, 17).reshape(4, 4)
idx = [True, False, False, True]
X[idx, idx]
# Output: array([6, 1, 1, 6])
What's the proper way of getting a grid from the matrix via logical indexing?
You could also write:
>>> X[np.ix_(idx,idx)]
array([[ 1, 4],
[13, 16]])
In [1]: X = np.arange(1, 17).reshape(4, 4)
In [2]: idx = np.array([True, False, False, True]) # note that here idx has to
# be an array (not a list)
# or boolean values will be
# interpreted as integers
In [3]: X[idx][:,idx]
Out[3]:
array([[ 1, 4],
[13, 16]])
In numpy this is called fancy indexing. To get the items you want you should use a 2D array of indices.
You can use an outer product to turn your 1D idx into a proper 2D boolean mask. An outer operation, applied to two 1D sequences, combines each element of one sequence with each element of the other. Recalling that True*True=True and False*True=False, np.multiply.outer(), which for 1D inputs is the same as np.outer(), gives you the 2D mask:
idx_2D = np.outer(idx,idx)
#array([[ True, False, False, True],
# [False, False, False, False],
# [False, False, False, False],
# [ True, False, False, True]], dtype=bool)
Which you can use:
X[idx_2D]
array([ 1, 4, 13, 16])
In your real code you can write X[np.outer(idx,idx)] directly, but that does not save memory; it works the same as if you ran del idx_2D after the indexing. Note that the result is 1D, so it has to be reshaped (here to 2x2) to recover the grid.
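The three approaches from these answers can be checked side by side. This is a sketch rather than a benchmark; the names `grid_ix`, `grid_chained` and `grid_outer` are only illustrative.

```python
import numpy as np

X = np.arange(1, 17).reshape(4, 4)
idx = np.array([True, False, False, True])

# np.ix_ builds an open mesh, selecting every (row, column) combination.
grid_ix = X[np.ix_(idx, idx)]

# Chained indexing: pick the rows first, then the columns of that result.
grid_chained = X[idx][:, idx]

# A 2D boolean mask returns a flat array, so it must be reshaped.
n = int(idx.sum())
grid_outer = X[np.outer(idx, idx)].reshape(n, n)
```

All three give the 2x2 grid that MATLAB's X(idx, idx) returns.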
I want to check if a NumPyArray has values in it that are in a set, and if so set that area in an array = 1. If not set a keepRaster = 2.
numpyArray = #some imported array
repeatSet= ([3, 5, 6, 8])
confusedRaster = numpyArray[numpy.where(numpyArray in repeatSet)]= 1
Yields:
<type 'exceptions.TypeError'>: unhashable type: 'numpy.ndarray'
Is there a way to loop through it?
for numpyArray
if numpyArray in repeatSet
confusedRaster = 1
else
keepRaster = 2
To clarify and ask for a bit further help:
What I am trying to get at, and am currently doing, is putting a raster input into an array. I need to read values in the 2-d array and create another array based on those values. If the array value is in a set then the value will be 1. If it is not in a set then the value will be derived from another input, but I'll say 77 for now. This is what I'm currently using. My test input has about 1500 rows and 3500 columns. It always freezes at around row 350.
for rowd in range(0, width):
    for cold in range (0, height):
        if numpyarray.item(rowd,cold) in repeatSet:
            confusedArray[rowd][cold] = 1
        else:
            if numpyarray.item(rowd,cold) == 0:
                confusedArray[rowd][cold] = 0
            else:
                confusedArray[rowd][cold] = 2
In versions 1.4 and higher, numpy provides the in1d function.
>>> test = np.array([0, 1, 2, 5, 0])
>>> states = [0, 2]
>>> np.in1d(test, states)
array([ True, False, True, False, True], dtype=bool)
You can use that as a mask for assignment.
>>> test[np.in1d(test, states)] = 1
>>> test
array([1, 1, 1, 5, 1])
Here are some more sophisticated uses of numpy's indexing and assignment syntax that I think will apply to your problem. Note the use of bitwise operators to replace if-based logic:
>>> import numpy
>>> repeat_set = [3, 5, 6, 8]
>>> numpy_array = numpy.arange(9).reshape((3, 3))
>>> confused_array = numpy.arange(9).reshape((3, 3)) % 2
>>> mask = numpy.in1d(numpy_array, repeat_set).reshape(numpy_array.shape)
>>> mask
array([[False, False, False],
[ True, False, True],
[ True, False, True]], dtype=bool)
>>> ~mask
array([[ True, True, True],
[False, True, False],
[False, True, False]], dtype=bool)
>>> numpy_array == 0
array([[ True, False, False],
[False, False, False],
[False, False, False]], dtype=bool)
>>> numpy_array != 0
array([[False, True, True],
[ True, True, True],
[ True, True, True]], dtype=bool)
>>> confused_array[mask] = 1
>>> confused_array[~mask & (numpy_array == 0)] = 0
>>> confused_array[~mask & (numpy_array != 0)] = 2
>>> confused_array
array([[0, 2, 2],
[1, 2, 1],
[1, 2, 1]])
Another approach would be to use numpy.where, which creates a brand new array, using values from the second argument where mask is true, and values from the third argument where mask is false. (As with assignment, the argument can be a scalar or an array of the same shape as mask.) This might be a bit more efficient than the above, and it's certainly more terse:
>>> numpy.where(mask, 1, numpy.where(numpy_array == 0, 0, 2))
array([[0, 2, 2],
[1, 2, 1],
[1, 2, 1]])
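When there are more than two or three cases, nested `numpy.where` calls can get hard to read. One possible alternative, shown here as a sketch, is `numpy.select`, which takes a list of conditions and a matching list of values and uses the first condition that is true for each element. The array and `repeat_set` below mirror the example above.

```python
import numpy as np

repeat_set = [3, 5, 6, 8]
numpy_array = np.arange(9).reshape((3, 3))

in_set = np.isin(numpy_array, repeat_set)   # same shape as numpy_array

# Conditions are tested in order; elements matching none get the default.
confused_array = np.select(
    [in_set, numpy_array == 0],
    [1, 0],
    default=2,
)
```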
Here is one possible way of doing what you want:
import numpy as np
numpyArray = np.array([1, 8, 35, 343, 23, 3, 8]) # could be n-Dimensional array
repeatSet = np.array([3, 5, 6, 8])
mask = (numpyArray[...,None] == repeatSet[None,...]).any(axis=-1)
print(mask)
# [False  True False False False  True  True]
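The same broadcast comparison carries over to a 2D raster, as in this sketch with a small made-up array (`raster` is a hypothetical name). The intermediate comparison array has shape `raster.shape + (len(repeat_set),)`, so for a large raster and a long set it can use a lot of memory; `np.isin` produces the same mask without that intermediate.

```python
import numpy as np

raster = np.array([[1, 8, 35],
                   [343, 3, 0]])
repeat_set = np.array([3, 5, 6, 8])

# (2, 3, 1) compared with (4,) broadcasts to (2, 3, 4);
# any(axis=-1) collapses the last axis back to the raster's shape.
mask = (raster[..., None] == repeat_set).any(axis=-1)

# Write 1 where the value is in the set and leave the other values unchanged.
result = np.where(mask, 1, raster)
```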
In recent versions of numpy you can combine np.isin and np.where to achieve this result. np.isin returns a boolean array that is True wherever an element of the input is equal to one of the specified test elements (see the docs), while np.where builds a new array that takes one value where the condition is True and another value where it is False.
Example
Here is an example with a small made-up array, using the values 1 and 77 from your question.
import numpy as np
repeatSet = ([2, 5, 6, 8])
arr = np.array([[1,5,1],
[0,1,0],
[0,0,0],
[2,2,2]])
out = np.where(np.isin(arr, repeatSet), 1, 77)
> out
array([[77, 1, 77],
[77, 77, 77],
[77, 77, 77],
[ 1, 1, 1]])