Numpy - Indexing with Boolean array - python

I have a numpy array of shape (6,5) and i am trying to index it with Boolean arrays. I slice the boolean array along the columns and then use that slice to index the original array, everything is fine, however as soon as i do the same thing along the rows i get the below error. Below is my code,
array([[73, 20, 49, 56, 64],
[18, 66, 64, 45, 67],
[27, 83, 71, 85, 61],
[78, 74, 38, 42, 17],
[26, 18, 71, 27, 29],
[41, 16, 17, 24, 75]])
bool = a > 50
bool
array([[ True, False, False, True, True],
[False, True, True, False, True],
[False, True, True, True, True],
[ True, True, False, False, False],
[False, False, True, False, False],
[False, False, False, False, True]], dtype=bool)
cols = bool[:,3] # returns values from 3rd column for every row
cols
array([ True, False, True, False, False, False], dtype=bool)
a[cols]
array([[73, 20, 49, 56, 64],
[27, 83, 71, 85, 61]])
rows = bool[3,] # returns 3rd row for every column
rows
array([ True, True, False, False, False], dtype=bool)
a[rows]
IndexError Traceback (most recent call last)
<ipython-input-24-5a0658ebcfdb> in <module>()
----> 1 a[rows]
IndexError: boolean index did not match indexed array along dimension 0; dimension is 6 but corresponding boolean dimension is 5

Since there are only 5 entries in rows,
In [18]: rows
Out[18]: array([ True, True, False, False, False], dtype=bool)
it can't index 6 rows in your array since the lengths don't match.
In [20]: arr.shape
Out[20]: (6, 5)
In [21]: rows.shape
Out[21]: (5,)
When you index into an array like arr[rows] it will be interpreted as you're indexing into axis 0 since rows is an 1D array. So, you have to use : for axis 0, and rows for axis 1 like:
# select all rows but only columns where rows is `True`
In [19]: arr[:, rows]
Out[19]:
array([[73, 20],
[18, 66],
[27, 83],
[78, 74],
[26, 18],
[41, 16]])
Also, please refrain from using bool as a variable name since it's a built-in keyword. This might cause unexpected behaviour, at a later point in your code.

Related

Return a boolean array of values < 40

How can I get a boolean 1 dimentional output for values <40 from the below given array. Since there are three values <40 so the output should be: array([ True, True, True])
x = np.array([[40, 37, 70],[62, 61, 98],[65, 89, 22],[95, 98, 81],[44, 32, 79]])
You can do it like this:
import numpy as np
x = np.array([[40, 37, 70],[62, 61, 98],[65, 89, 22],[95, 98, 81],[44, 32, 79]])
x<40
Output:
array([[False, True, False],
[False, False, False],
[False, False, True],
[False, False, False],
[False, True, False]])
Or if you want a 1d result, you can use .flatten():
y = x.flatten()
y<40
Output:
array([False, True, False, False, False, False, False, False, True,
False, False, False, False, True, False])
If you want a 1d list like [True]*n where n is the number of values <40, you can do:
np.array([i for i in x.flatten()<40 if i])
Output:
array([True, True, True])
This could be solved in many ways, one could be:
x[x<40]<40

Potential bug in np.isnan() for mixed types on pandas Dataframe

I have run into a bug with np.isnan(). It may be that it is intended to work this way and the problem is how pandas handles it. If I make a dataframe with mixed types like
raw_data = {'Binary 1': [True, True, False, False, True],
'Binary 2': [False, False, True, True, False],
'age': [42, 52, 36, 24, 73],
'preTestScore': [4, 24, 31, 2, 3],
'postTestScore': [25, 94, 57, 62, 70]}
df = pd.DataFrame(raw_data, columns = ['Binary 1', 'Binary 2', 'age', 'preTestScore', 'postTestScore'])
df.dtypes
Binary 1 bool
Binary 2 bool
age int64
preTestScore int64
postTestScore int64
I can't call
np.isnan(df)
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
Both this
np.isnan(df[['Binary 1', 'Binary 2']])
and this
np.isnan(df[['age', 'preTestScore', 'postTestScore']])
work. I think it is because they are of the same type since this does not
np.isnan(df[['Binary 1', 'age']])
np.isnan is a numpy function, so it works with numpy arrays, and values derived from the input:
In [418]: df[['Binary 1', 'Binary 2']].values
Out[418]:
array([[ True, False],
[ True, False],
[False, True],
[False, True],
[ True, False]])
This is a 2d boolean dtype array. But the whole dataframe has mixed dtypes, so it produces an object dtype:
In [419]: df.values
Out[419]:
array([[True, False, 42, 4, 25],
[True, False, 52, 24, 94],
[False, True, 36, 31, 57],
[False, True, 24, 2, 62],
[True, False, 73, 3, 70]], dtype=object)
Casting that array to int (or float), runs ok: np.isnan(df.values.astype(int))
But as pointed out in the comments, pandas has its own nan tester, which I believe is even more powerful (and forgiving). The np.isnan is really intended for float arrays, since np.nan is a float.

Numpy filtering based on all row values

I'm trying to filter a 2D numpy array with another 2D numpy arrays values. Something like this:
array1 = np.array([[ 0, 0],
[86, 4],
[75, 74],
[78, 55],
[53, 94],
[49, 83],
[99, 75],
[99, 10],
[32, 4],
[55, 99],
[62, 95],
[ 0, 0]])
array2 = np.array([[55, 99],
[32, 4],
[75, 74]])
array1[np.isin(array1, array2[2:5]).all(axis=1) == 0]
My ideal output would be a filtered version of array1 that does not have the rows which are equal to the ones in the array2 slice.
Problem is when i do it like this:
np.isin(array1, array[2:5])
output is:
array([[False, False],
[False, True],
[ True, True],
[False, True],
[False, False],
[False, False],
[ True, True],
[ True, False],
[ True, True],
[ True, True],
[False, False],
[False, False]])
It wrongly classifies [99,75] row as [True, True] because both of those values individually exist in our array2.
Is there a more correct way to filter based on all values of a row?
Here's an inefficient but very explicit way to do this with np.all():
# for each row in array2, check full match with each row in array1
bools = [np.all(array1==row,axis=1) for row in array2]
# combine 3 boolean arrays with 'or' logic
mask = [any(tup) for tup in zip(*bools)]
# flip the mask
mask = ~np.array(mask)
# final index
out = array1[mask]

Selecting vector of 2D array elements from column index vector

I have a 2D array A:
28 39 52
77 80 66
7 18 24
9 97 68
And a vector array of column indexes B:
1
0
2
0
How, in a pythonian way, using base Python or Numpy, can I select the elements from A which DO NOT correspond to the column indexes in B?
I should get this 2D array which contains the elements of A, Not corresponding to the column indexes stored in B:
28 52
80 66
7 18
97 68
You can make use of broadcasting and a row-wise mask to select elements not contained in your array for each row:
Setup
B = np.array([1, 0, 2, 0])
cols = np.arange(A.shape[1])
Now use broadcasting to create a mask, and index your array.
mask = B[:, None] != cols
A[mask].reshape(-1, 2)
array([[28, 52],
[80, 66],
[ 7, 18],
[97, 68]])
A spin off of my answer to your other question,
Replace 2D array elements with zeros, using a column index vector
We can make a boolean mask with the same indexing used before:
In [124]: mask = np.ones(A.shape, dtype=bool)
In [126]: mask[np.arange(4), B] = False
In [127]: mask
Out[127]:
array([[ True, False, True],
[False, True, True],
[ True, True, False],
[False, True, True]])
Indexing an array with a boolean mask produces a 1d array, since in the most general case such a mask could select a different number of elements in each row.
In [128]: A[mask]
Out[128]: array([28, 52, 80, 66, 7, 18, 97, 68])
In this case the result can be reshaped back to 2d:
In [129]: A[mask].reshape(4,2)
Out[129]:
array([[28, 52],
[80, 66],
[ 7, 18],
[97, 68]])
Since you allowed for 'base Python' here's list comprehension answer:
In [136]: [[y for i,y in enumerate(x) if i!=b] for b,x in zip(B,A)]
Out[136]: [[28, 52], [80, 66], [7, 18], [97, 68]]
If all the 0's in the other A come from the insertion, then we can also get the mask (Out[127]) with
In [142]: A!=0
Out[142]:
array([[ True, False, True],
[False, True, True],
[ True, True, False],
[False, True, True]])

Numpy masking 3D array using np.where() index

I created an index based on several conditions
transition = np.where((rain>0) & (snow>0) & (graup>0) & (xlat<53.) & (xlat>49.) & (xlon<-114.) & (xlon>-127.)) #indexes the grids where there are transitions
with the shape of (3,259711) that looks like the following:
array([[ 0, 0, 0, ..., 47, 47, 47], #hour
[847, 847, 848, ..., 950, 950, 951], #lat gridpoint
[231, 237, 231, ..., 200, 201, 198]]) #lon gridpoint
I have several other variables (e.g. temp) with the shape of (48, 1015, 1359) corresponding to hour, lat, lon.
Seeing as the index are my valid gridpoints, how do I mask all the variables, like temp so that it retains the (48,1015,1359) shape, but masks the values outside the index.
In [90]: arr = np.arange(24).reshape(6,4)
In [91]: keep = (arr % 3)==1
In [92]: keep
Out[92]:
array([[False, True, False, False],
[ True, False, False, True],
[False, False, True, False],
[False, True, False, False],
[ True, False, False, True],
[False, False, True, False]], dtype=bool)
In [93]: np.where(keep)
Out[93]:
(array([0, 1, 1, 2, 3, 4, 4, 5], dtype=int32),
array([1, 0, 3, 2, 1, 0, 3, 2], dtype=int32))
Simple application of the keep mask gives a 1d array of the desired values. I could also index with the where tuple.
In [94]: arr[keep]
Out[94]: array([ 1, 4, 7, 10, 13, 16, 19, 22])
With keep, or rather it's boolean inverse, I can make a masked array:
In [95]: np.ma.masked_array(arr,mask=~keep)
Out[95]:
masked_array(data =
[[-- 1 -- --]
[4 -- -- 7]
[-- -- 10 --]
[-- 13 -- --]
[16 -- -- 19]
[-- -- 22 --]],
mask =
[[ True False True True]
[False True True False]
[ True True False True]
[ True False True True]
[False True True False]
[ True True False True]],
fill_value = 999999)
np.ma.masked_where(~keep, arr) does the same thing - just a different argument order. It still expects the boolean mask array.
I can do the same starting with the where tuple:
In [105]: idx = np.where(keep)
In [106]: mask = np.ones_like(arr, dtype=bool)
In [107]: mask[idx] = False
In [108]: np.ma.masked_array(arr, mask=mask)
There may be something in the np.ma class that does this with one call, but it will have to do the same sort of construction.
This also works:
x = np.ma.masked_all_like(arr)
x[idx] = arr[idx]

Categories

Resources