I have a 2D array A:
28 39 52
77 80 66
7 18 24
9 97 68
And a vector array of column indexes B:
1
0
2
0
How can I select, in a Pythonic way using base Python or NumPy, the elements of A that do NOT correspond to the column indexes in B?
I should get this 2D array, which contains the elements of A that do not correspond to the column indexes stored in B:
28 52
80 66
7 18
97 68
You can make use of broadcasting and a row-wise mask to select elements not contained in your array for each row:
Setup
import numpy as np
A = np.array([[28, 39, 52], [77, 80, 66], [7, 18, 24], [9, 97, 68]])
B = np.array([1, 0, 2, 0])
cols = np.arange(A.shape[1])
Now use broadcasting to create a mask, and index your array.
mask = B[:, None] != cols
A[mask].reshape(-1, 2)
array([[28, 52],
[80, 66],
[ 7, 18],
[97, 68]])
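For reference, a self-contained version of the approach above might look like the sketch below. The helper name drop_indexed_column is hypothetical, and it assumes B holds exactly one valid column index per row of A, so every row keeps the same number of elements.

```python
import numpy as np

def drop_indexed_column(A, B):
    """Return A without, in each row i, the element in column B[i].

    Hypothetical helper; assumes one valid column index per row.
    """
    A = np.asarray(A)
    B = np.asarray(B)
    cols = np.arange(A.shape[1])
    # (n_rows, 1) != (n_cols,) broadcasts to an (n_rows, n_cols) mask
    mask = B[:, None] != cols
    # Exactly one element is dropped per row, so the reshape is safe
    return A[mask].reshape(A.shape[0], A.shape[1] - 1)

A = np.array([[28, 39, 52],
              [77, 80, 66],
              [7, 18, 24],
              [9, 97, 68]])
B = np.array([1, 0, 2, 0])
result = drop_indexed_column(A, B)
print(result)
```

Passing the explicit target shape to reshape, rather than a hard-coded 2, keeps the helper usable for arrays with a different number of columns.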
A spin off of my answer to your other question,
Replace 2D array elements with zeros, using a column index vector
We can make a boolean mask with the same indexing used before:
In [124]: mask = np.ones(A.shape, dtype=bool)
In [126]: mask[np.arange(4), B] = False
In [127]: mask
Out[127]:
array([[ True, False, True],
[False, True, True],
[ True, True, False],
[False, True, True]])
Indexing an array with a boolean mask produces a 1d array, since in the most general case such a mask could select a different number of elements in each row.
In [128]: A[mask]
Out[128]: array([28, 52, 80, 66, 7, 18, 97, 68])
In this case the result can be reshaped back to 2d:
In [129]: A[mask].reshape(4,2)
Out[129]:
array([[28, 52],
[80, 66],
[ 7, 18],
[97, 68]])
Since you allowed for 'base Python', here's a list comprehension answer:
In [136]: [[y for i,y in enumerate(x) if i!=b] for b,x in zip(B,A)]
Out[136]: [[28, 52], [80, 66], [7, 18], [97, 68]]
If all the 0's in the other A come from the insertion, then we can also get the mask (Out[127]) with
In [142]: A!=0
Out[142]:
array([[ True, False, True],
[False, True, True],
[ True, True, False],
[False, True, True]])
Related
I'm trying to filter a 2D numpy array with another 2D numpy arrays values. Something like this:
array1 = np.array([[ 0, 0],
[86, 4],
[75, 74],
[78, 55],
[53, 94],
[49, 83],
[99, 75],
[99, 10],
[32, 4],
[55, 99],
[62, 95],
[ 0, 0]])
array2 = np.array([[55, 99],
[32, 4],
[75, 74]])
array1[np.isin(array1, array2[2:5]).all(axis=1) == 0]
My ideal output would be a filtered version of array1 that does not have the rows which are equal to the ones in the array2 slice.
The problem is that when I do it like this:
np.isin(array1, array2[2:5])
output is:
array([[False, False],
[False, True],
[ True, True],
[False, True],
[False, False],
[False, False],
[ True, True],
[ True, False],
[ True, True],
[ True, True],
[False, False],
[False, False]])
It wrongly classifies the [99, 75] row as [True, True] because both of those values exist individually in array2.
Is there a more correct way to filter based on all values of a row?
Here's an inefficient but very explicit way to do this with np.all():
# for each row in array2, check full match with each row in array1
bools = [np.all(array1==row,axis=1) for row in array2]
# combine 3 boolean arrays with 'or' logic
mask = [any(tup) for tup in zip(*bools)]
# flip the mask
mask = ~np.array(mask)
# final index
out = array1[mask]
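If the Python-level loop becomes a bottleneck, the same row-by-row comparison can probably be expressed with broadcasting. The sketch below is one possible variant, not the answer's original method; it uses the full array2 from the question and builds an intermediate array of shape (len(array1), len(array2), n_cols), which is assumed to fit in memory.

```python
import numpy as np

array1 = np.array([[0, 0], [86, 4], [75, 74], [78, 55], [53, 94], [49, 83],
                   [99, 75], [99, 10], [32, 4], [55, 99], [62, 95], [0, 0]])
array2 = np.array([[55, 99], [32, 4], [75, 74]])

# Compare every row of array1 with every row of array2:
# shape (12, 1, 2) == (1, 3, 2) -> (12, 3, 2)
pairwise = array1[:, None, :] == array2[None, :, :]
# A row matches only if all of its columns match (axis 2),
# and it is dropped if it matches any row of array2 (axis 1)
matches = pairwise.all(axis=2).any(axis=1)
out = array1[~matches]
print(out)
```

Because the comparison is made on whole rows, [99, 75] is kept even though 99 and 75 each appear somewhere in array2.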
Problem
I have an np.array and a mask of the same shape. Once I apply the mask, the array loses its shape and is flattened to 1D.
Question
I want to reduce my array along some axis, based on a 1D mask whose length matches that axis.
How can I apply a mask, but keep dimensionality of the array?
Example
A small example in code:
# data ...
>>> data = np.ones((4, 4))
>>> data.shape
(4, 4)
# mask ...
>>> mask = np.ones((4, 4), dtype=bool)
>>> mask.shape
(4, 4)
# apply mask ...
>>> data[mask].shape
(16,)
My ideal shape would be (4, 4).
An example with array dimension reduction across an axis:
# data, mask ...
>>> data = np.ones((4, 4))
>>> mask = np.ones((4, 4), dtype=bool)
# remove last column from data ...
>>> mask[:, 3] = False
>>> mask
array([[ True, True, True, False],
[ True, True, True, False],
[ True, True, True, False],
[ True, True, True, False]])
# equivalent mask in 1D ...
>>> mask[0]
array([ True, True, True, False])
# apply mask ...
>>> data[mask].shape
(12,)
The ideal dimensions of the array would be (4, 3) without reshape.
Help is appreciated, thanks!
The 'correct' way of achieving your goal is to not expand the mask to 2D. Instead index with [:, mask] with the 1D mask. This indicates to numpy that you want axis 0 unchanged and mask applied along axis 1.
a = np.arange(12).reshape(3, 4)
b = np.array((1,0,1,0),'?')
a
# array([[ 0, 1, 2, 3],
# [ 4, 5, 6, 7],
# [ 8, 9, 10, 11]])
b
# array([ True, False, True, False])
a[:, b]
# array([[ 0, 2],
# [ 4, 6],
# [ 8, 10]])
If your mask is already 2D, numpy won't check whether all its rows are the same because that would be inefficient. But obviously you can use [:, mask[0]] in that case.
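As a rough illustration of the [:, mask[0]] shortcut, the sketch below uses the 4x4 example from the question and adds an explicit check that all mask rows agree. The check is an assumption about how you might guard the shortcut; it is not something NumPy does for you.

```python
import numpy as np

data = np.arange(16).reshape(4, 4)
mask = np.ones((4, 4), dtype=bool)
mask[:, 3] = False  # drop the last column in every row

# The shortcut is only valid if every row of the 2D mask is identical
if (mask == mask[0]).all():
    reduced = data[:, mask[0]]
else:
    raise ValueError("mask rows differ; no single 1D column mask exists")

print(reduced.shape)  # (4, 3)
```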
If your mask is 2D and just happens to have the same number of Trues in each row, then either use @tel's answer or create an index array:
B = b^b[:3, None]
B
# array([[False, True, False, True],
# [ True, False, True, False],
# [False, True, False, True]])
J = np.where(B)[1].reshape(len(B), -1)
And now either
np.take_along_axis(a, J, 1)
# array([[ 1, 3],
# [ 4, 6],
# [ 9, 11]])
or
I = np.arange(len(J))[:, None]
IJ = I, J
a[IJ]
# array([[ 1, 3],
# [ 4, 6],
# [ 9, 11]])
I believe what you want can be done by calling new_data.reshape(837, -1). Here's a brief example:
arr = np.arange(8*6).reshape(8,6)
maskpiece = np.array([True, False]*3)
mask = np.broadcast_to(maskpiece, (8,6))
print('the original array\n%s\n' % arr)
print('the flat masked array\n%s\n' % arr[mask])
print('the masked array reshaped into 2D\n%s\n' % arr[mask].reshape(8, -1))
Output:
the original array
[[ 0 1 2 3 4 5]
[ 6 7 8 9 10 11]
[12 13 14 15 16 17]
[18 19 20 21 22 23]
[24 25 26 27 28 29]
[30 31 32 33 34 35]
[36 37 38 39 40 41]
[42 43 44 45 46 47]]
the flat masked array
[ 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46]
the masked array reshaped into 2D
[[ 0 2 4]
[ 6 8 10]
[12 14 16]
[18 20 22]
[24 26 28]
[30 32 34]
[36 38 40]
[42 44 46]]
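The reshape only gives a meaningful result when every row keeps the same number of elements. A small guard along the following lines could make that assumption explicit; the helper name masked_rows is hypothetical.

```python
import numpy as np

def masked_rows(arr, mask):
    """Hypothetical helper: apply a 2D mask and restore the row structure."""
    counts = mask.sum(axis=1)
    # The reshape is only meaningful if every row keeps the same number of items
    if not (counts == counts[0]).all():
        raise ValueError("rows keep different numbers of elements")
    return arr[mask].reshape(arr.shape[0], counts[0])

arr = np.arange(8 * 6).reshape(8, 6)
mask = np.broadcast_to(np.array([True, False] * 3), (8, 6))
kept = masked_rows(arr, mask)
print(kept.shape)  # (8, 3)
```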
I have a NumPy array of shape (6, 5) and I am trying to index it with Boolean arrays. When I slice the Boolean array along the columns and use that slice to index the original array, everything is fine. However, as soon as I do the same thing along the rows, I get the error below. Here is my code:
array([[73, 20, 49, 56, 64],
[18, 66, 64, 45, 67],
[27, 83, 71, 85, 61],
[78, 74, 38, 42, 17],
[26, 18, 71, 27, 29],
[41, 16, 17, 24, 75]])
bool = a > 50
bool
array([[ True, False, False, True, True],
[False, True, True, False, True],
[False, True, True, True, True],
[ True, True, False, False, False],
[False, False, True, False, False],
[False, False, False, False, True]], dtype=bool)
cols = bool[:,3] # column index 3 (the fourth column): one value per row
cols
array([ True, False, True, False, False, False], dtype=bool)
a[cols]
array([[73, 20, 49, 56, 64],
[27, 83, 71, 85, 61]])
rows = bool[3,] # row index 3 (the fourth row): one value per column
rows
array([ True, True, False, False, False], dtype=bool)
a[rows]
IndexError Traceback (most recent call last)
<ipython-input-24-5a0658ebcfdb> in <module>()
----> 1 a[rows]
IndexError: boolean index did not match indexed array along dimension 0; dimension is 6 but corresponding boolean dimension is 5
Since there are only 5 entries in rows,
In [18]: rows
Out[18]: array([ True, True, False, False, False], dtype=bool)
it can't index 6 rows in your array since the lengths don't match.
In [20]: arr.shape
Out[20]: (6, 5)
In [21]: rows.shape
Out[21]: (5,)
When you index into an array with arr[rows], it is interpreted as indexing along axis 0, since rows is a 1D array. So you have to use : for axis 0 and rows for axis 1, like this:
# select all rows but only columns where rows is `True`
In [19]: arr[:, rows]
Out[19]:
array([[73, 20],
[18, 66],
[27, 83],
[78, 74],
[26, 18],
[41, 16]])
Also, please refrain from using bool as a variable name, since it shadows the built-in bool type. This might cause unexpected behaviour later in your code.
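Putting both points together, a cleaned-up version of the question's code might look like the sketch below. The names above, row_mask and col_mask are illustrative choices, and the np.ix_ line is an extra showing how the two masks can be combined; it is not part of the original question.

```python
import numpy as np

a = np.array([[73, 20, 49, 56, 64],
              [18, 66, 64, 45, 67],
              [27, 83, 71, 85, 61],
              [78, 74, 38, 42, 17],
              [26, 18, 71, 27, 29],
              [41, 16, 17, 24, 75]])
above = a > 50           # named `above` so the built-in bool is not shadowed

row_mask = above[:, 3]   # length 6: one flag per row, so it indexes axis 0
col_mask = above[3, :]   # length 5: one flag per column, so it indexes axis 1

picked_rows = a[row_mask]        # shape (2, 5)
picked_cols = a[:, col_mask]     # shape (6, 2)
# np.ix_ combines both masks into a single rectangular selection
block = a[np.ix_(row_mask, col_mask)]  # shape (2, 2)
```

The rule of thumb is that a 1D boolean mask must have the same length as the axis it is applied to.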
I created an index based on several conditions
transition = np.where((rain>0) & (snow>0) & (graup>0) & (xlat<53.) & (xlat>49.) & (xlon<-114.) & (xlon>-127.)) #indexes the grids where there are transitions
with the shape of (3,259711) that looks like the following:
array([[ 0, 0, 0, ..., 47, 47, 47], #hour
[847, 847, 848, ..., 950, 950, 951], #lat gridpoint
[231, 237, 231, ..., 200, 201, 198]]) #lon gridpoint
I have several other variables (e.g. temp) with the shape of (48, 1015, 1359) corresponding to hour, lat, lon.
Since the index holds my valid gridpoints, how do I mask all the variables, such as temp, so that each retains the (48, 1015, 1359) shape but has the values outside the index masked?
In [90]: arr = np.arange(24).reshape(6,4)
In [91]: keep = (arr % 3)==1
In [92]: keep
Out[92]:
array([[False, True, False, False],
[ True, False, False, True],
[False, False, True, False],
[False, True, False, False],
[ True, False, False, True],
[False, False, True, False]], dtype=bool)
In [93]: np.where(keep)
Out[93]:
(array([0, 1, 1, 2, 3, 4, 4, 5], dtype=int32),
array([1, 0, 3, 2, 1, 0, 3, 2], dtype=int32))
Simple application of the keep mask gives a 1d array of the desired values. I could also index with the where tuple.
In [94]: arr[keep]
Out[94]: array([ 1, 4, 7, 10, 13, 16, 19, 22])
With keep, or rather its boolean inverse, I can make a masked array:
In [95]: np.ma.masked_array(arr,mask=~keep)
Out[95]:
masked_array(data =
[[-- 1 -- --]
[4 -- -- 7]
[-- -- 10 --]
[-- 13 -- --]
[16 -- -- 19]
[-- -- 22 --]],
mask =
[[ True False True True]
[False True True False]
[ True True False True]
[ True False True True]
[False True True False]
[ True True False True]],
fill_value = 999999)
np.ma.masked_where(~keep, arr) does the same thing - just a different argument order. It still expects the boolean mask array.
I can do the same starting with the where tuple:
In [105]: idx = np.where(keep)
In [106]: mask = np.ones_like(arr, dtype=bool)
In [107]: mask[idx] = False
In [108]: np.ma.masked_array(arr, mask=mask)
There may be something in the np.ma class that does this with one call, but it will have to do the same sort of construction.
This also works:
x = np.ma.masked_all_like(arr)
x[idx] = arr[idx]
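The same construction should carry over to the 3D (hour, lat, lon) case from the question. The sketch below uses small made-up arrays in place of temp and rain, with a single condition standing in for the full set of comparisons.

```python
import numpy as np

# Small stand-ins for the (hour, lat, lon) arrays; shapes and values are made up
rng = np.random.default_rng(0)
temp = rng.normal(size=(2, 3, 4))
rain = rng.random((2, 3, 4))

valid = rain > 0.5                 # boolean condition, same shape as temp
transition = np.where(valid)       # tuple of three index arrays

# Start fully masked, then unmask the grid points listed in the index tuple
mask = np.ones(temp.shape, dtype=bool)
mask[transition] = False
temp_masked = np.ma.masked_array(temp, mask=mask)

print(temp_masked.shape)           # (2, 3, 4)
```

If the boolean condition is still available, passing mask=~valid directly avoids the round trip through np.where.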
I want to apply a mask to a 2D NumPy array, but it does not work correctly. Suppose I have
val(lat, lon) ---> my 2D array (20, 30)
Mask_lat = np.ma.masked_array(lat, mask=latmask) ---> masked lat (5,)
Mask_lon = np.ma.masked_array(lon, mask =lonmask) ---> masked lon (8,)
Mask_val = np.ma.masked_array(val, mask=mask_lat_lon) ---> ?
I do not know how to pass a correct mask_lat_lon to get a masked val of shape (5, 8). I would appreciate any guidance.
Thank you in advance.
If I understand your question correctly, you have two 1D arrays that represent y and x (lat and long) positions in a 2D array. You want to mask a region based on the x/y position in the 2D array.
The key part to understand is that the mask for a 2D array is also 2D.
For example, let's mask a single element of a 2D array:
import numpy as np
z = np.arange(20).reshape(5, 4)
mask = np.zeros(z.shape, dtype=bool)
mask[3, 2] = True
print(z)
print(np.ma.masked_array(z, mask))
This yields:
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]
[12 13 14 15]
[16 17 18 19]]
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]
[12 13 -- 15]
[16 17 18 19]]
In your case, you have two 1D x and y arrays that you need to create a 2D mask from. For example:
import numpy as np
x = np.linspace(-85, -78, 4)
y = np.linspace(32, 37, 5)
z = np.arange(20).reshape(5, 4)
xmask = (x > -82.6) & (x < -80)
ymask = (y > 33) & (y < 35.6)
print(xmask)
print(ymask)
We'd then need to combine them into a single 2D mask using broadcasting:
mask = xmask[np.newaxis, :] & ymask[:, np.newaxis]
Slicing with newaxis (or None, they're the same object) adds a new axis at that position, turning the 1D array into a 2D array. If you haven't seen this before, it's useful to take a quick look at what xmask[np.newaxis, :] and ymask[:, np.newaxis] look like:
In [14]: xmask
Out[14]: array([False, False, True, False], dtype=bool)
In [15]: ymask
Out[15]: array([False, True, True, False, False], dtype=bool)
In [16]: xmask[np.newaxis, :]
Out[16]: array([[False, False, True, False]], dtype=bool)
In [17]: ymask[:, np.newaxis]
Out[17]:
array([[False],
[ True],
[ True],
[False],
[False]], dtype=bool)
mask will then be (keep in mind that True elements are masked):
In [18]: xmask[np.newaxis, :] & ymask[:, np.newaxis]
Out[18]:
array([[False, False, False, False],
[False, False, True, False],
[False, False, True, False],
[False, False, False, False],
[False, False, False, False]], dtype=bool)
Finally, we can create a 2D masked array from z based on this mask:
arr = np.ma.masked_array(z, mask)
Which gives us our final result:
[[ 0 1 2 3]
[ 4 5 -- 7]
[ 8 9 -- 11]
[12 13 14 15]
[16 17 18 19]]
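If the goal is instead a smaller (5, 8) array rather than a masked (20, 30) one, as the shapes in the question suggest, np.ix_ with two 1D boolean arrays may be closer to what is needed. The sketch below uses made-up lat, lon and val arrays, and the names keep_lat and keep_lon are illustrative; note that True here means "keep", the opposite of the np.ma convention.

```python
import numpy as np

lat = np.linspace(30, 49, 20)          # made-up latitudes, shape (20,)
lon = np.linspace(-100, -71, 30)       # made-up longitudes, shape (30,)
val = np.arange(20 * 30).reshape(20, 30)

# Here True means "keep", the opposite of the np.ma mask convention
keep_lat = (lat > 34.5) & (lat < 39.5)    # selects 5 latitudes
keep_lon = (lon > -90.5) & (lon < -82.5)  # selects 8 longitudes

# np.ix_ turns the two 1D masks into a rectangular (5, 8) selection
sub = val[np.ix_(keep_lat, keep_lon)]
print(sub.shape)  # (5, 8)
```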