Numpy - Indexing with Boolean array


I have a NumPy array a of shape (6, 5) and I am trying to index it with Boolean arrays. When I slice the Boolean array along a column and use that slice to index the original array, everything is fine. However, as soon as I do the same thing along a row, I get the error below. Here is my code:
import numpy as np
a = np.array([[73, 20, 49, 56, 64],
[18, 66, 64, 45, 67],
[27, 83, 71, 85, 61],
[78, 74, 38, 42, 17],
[26, 18, 71, 27, 29],
[41, 16, 17, 24, 75]])
bool = a > 50
bool
array([[ True, False, False, True, True],
[False, True, True, False, True],
[False, True, True, True, True],
[ True, True, False, False, False],
[False, False, True, False, False],
[False, False, False, False, True]], dtype=bool)
cols = bool[:,3] # returns the column at index 3 (the 4th column), one value per row
cols
array([ True, False, True, False, False, False], dtype=bool)
a[cols]
array([[73, 20, 49, 56, 64],
[27, 83, 71, 85, 61]])
rows = bool[3,] # returns the row at index 3 (the 4th row), one value per column
rows
array([ True, True, False, False, False], dtype=bool)
a[rows]
IndexError Traceback (most recent call last)
<ipython-input-24-5a0658ebcfdb> in <module>()
----> 1 a[rows]
IndexError: boolean index did not match indexed array along dimension 0; dimension is 6 but corresponding boolean dimension is 5

Since there are only 5 entries in rows,
In [18]: rows
Out[18]: array([ True, True, False, False, False], dtype=bool)
it can't be used to index the 6 rows of your array; the lengths don't match:
In [20]: arr.shape
Out[20]: (6, 5)
In [21]: rows.shape
Out[21]: (5,)
When you index an array with a single 1D Boolean array, as in arr[rows] (arr is your a), the mask is applied to axis 0, so its length must equal the number of rows. To select columns instead, use : for axis 0 and rows for axis 1:
# select all rows but only columns where rows is `True`
In [19]: arr[:, rows]
Out[19]:
array([[73, 20],
[18, 66],
[27, 83],
[78, 74],
[26, 18],
[41, 16]])
Also, please avoid using bool as a variable name: it is a built-in type (not a keyword), and shadowing it can cause unexpected behaviour later in your code.
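A minimal sketch of the fix using the question's array, with the mask renamed to gt50 (a name chosen here for illustration) so the built-in bool is not shadowed. It shows that a 1D Boolean index must have the same length as the axis it is applied to:

```python
import numpy as np

a = np.array([[73, 20, 49, 56, 64],
              [18, 66, 64, 45, 67],
              [27, 83, 71, 85, 61],
              [78, 74, 38, 42, 17],
              [26, 18, 71, 27, 29],
              [41, 16, 17, 24, 75]])

gt50 = a > 50               # Boolean array with the same shape as a: (6, 5)

cols = gt50[:, 3]           # length 6: one entry per row of a
rows = gt50[3, :]           # length 5: one entry per column of a

selected_rows = a[cols]     # mask applied to axis 0 -> shape (2, 5)
selected_cols = a[:, rows]  # mask applied to axis 1 -> shape (6, 2)

# A length-5 mask on axis 0 (which has length 6) raises IndexError
try:
    a[rows]
except IndexError as exc:
    print("IndexError:", exc)
```

The rule is simply that each Boolean index lines up with one axis; which slice of gt50 it came from does not matter, only its length.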

Related

Return a boolean array of values < 40

How can I get a 1-dimensional Boolean output for the values < 40 in the array below? Since there are three values < 40, the output should be: array([ True, True, True])
x = np.array([[40, 37, 70],[62, 61, 98],[65, 89, 22],[95, 98, 81],[44, 32, 79]])

You can do it like this:
import numpy as np
x = np.array([[40, 37, 70],[62, 61, 98],[65, 89, 22],[95, 98, 81],[44, 32, 79]])
x<40
Output:
array([[False, True, False],
[False, False, False],
[False, False, True],
[False, False, False],
[False, True, False]])
Or if you want a 1d result, you can use .flatten():
y = x.flatten()
y<40
Output:
array([False, True, False, False, False, False, False, False, True,
False, False, False, False, True, False])
If you want a 1d array like [True]*n, where n is the number of values < 40, you can do:
np.array([i for i in x.flatten()<40 if i])
Output:
array([True, True, True])

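This could be solved in many ways. One is to select the values below 40 with the Boolean mask and compare them again, which gives one True per selected value: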
x[x<40]<40
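Both answers return one True per element below 40. A small sketch comparing them; the third variant, which counts the matches with np.count_nonzero and builds the all-True array with np.ones, is an alternative added here rather than one of the original answers:

```python
import numpy as np

x = np.array([[40, 37, 70], [62, 61, 98], [65, 89, 22], [95, 98, 81], [44, 32, 79]])

below = x < 40                      # 2D Boolean mask, same shape as x
print(x[below])                     # [37 22 32], in row-major order

# Answer 1: keep only the True entries of the flattened mask
r1 = np.array([i for i in below.flatten() if i])

# Answer 2: select the values below 40, then compare them again
r2 = x[below] < 40

# Alternative: count the matches and build an all-True array of that length
r3 = np.ones(np.count_nonzero(below), dtype=bool)

print(r1, r2, r3)
```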

Potential bug in np.isnan() for mixed types on pandas Dataframe

I have run into a bug with np.isnan(). It may be that it is intended to work this way and the problem is how pandas handles it. If I make a dataframe with mixed types like
import numpy as np
import pandas as pd
raw_data = {'Binary 1': [True, True, False, False, True],
'Binary 2': [False, False, True, True, False],
'age': [42, 52, 36, 24, 73],
'preTestScore': [4, 24, 31, 2, 3],
'postTestScore': [25, 94, 57, 62, 70]}
df = pd.DataFrame(raw_data, columns = ['Binary 1', 'Binary 2', 'age', 'preTestScore', 'postTestScore'])
df.dtypes
Binary 1 bool
Binary 2 bool
age int64
preTestScore int64
postTestScore int64
I can't call
np.isnan(df)
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
Both this
np.isnan(df[['Binary 1', 'Binary 2']])
and this
np.isnan(df[['age', 'preTestScore', 'postTestScore']])
work. I think it is because the selected columns share the same type, since this does not work:
np.isnan(df[['Binary 1', 'age']])

np.isnan is a NumPy function, so it works on NumPy arrays: a DataFrame argument is first converted to an array, and the dtype of that array decides whether isnan can run:
In [418]: df[['Binary 1', 'Binary 2']].values
Out[418]:
array([[ True, False],
[ True, False],
[False, True],
[False, True],
[ True, False]])
This is a 2d boolean-dtype array. But the whole dataframe has mixed dtypes, so its array form has object dtype:
In [419]: df.values
Out[419]:
array([[True, False, 42, 4, 25],
[True, False, 52, 24, 94],
[False, True, 36, 31, 57],
[False, True, 24, 2, 62],
[True, False, 73, 3, 70]], dtype=object)
Casting that array to int (or float) first works: np.isnan(df.values.astype(int))
But as pointed out in the comments, pandas has its own NaN tester (pd.isna, also available as pd.isnull and DataFrame.isna), which I believe is even more powerful (and forgiving). np.isnan is really intended for float arrays, since np.nan is a float.
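A short sketch of both options on the question's DataFrame. pd.isna handles the mixed bool/int columns directly, while np.isnan needs a numeric array first; casting to float is a choice made here for illustration, since NaN can only be represented in float dtypes:

```python
import numpy as np
import pandas as pd

raw_data = {'Binary 1': [True, True, False, False, True],
            'Binary 2': [False, False, True, True, False],
            'age': [42, 52, 36, 24, 73],
            'preTestScore': [4, 24, 31, 2, 3],
            'postTestScore': [25, 94, 57, 62, 70]}
df = pd.DataFrame(raw_data)

# pandas' own missing-value test works column by column, whatever the dtype
na_pandas = pd.isna(df)            # DataFrame of bools, same shape as df

# np.isnan needs a numeric array: df.values is object dtype here, so cast first
na_numpy = np.isnan(df.values.astype(float))

print(na_pandas.values.any(), na_numpy.any())
```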

Numpy filtering based on all row values

I'm trying to filter a 2D NumPy array using the rows of another 2D NumPy array. Something like this:
array1 = np.array([[ 0, 0],
[86, 4],
[75, 74],
[78, 55],
[53, 94],
[49, 83],
[99, 75],
[99, 10],
[32, 4],
[55, 99],
[62, 95],
[ 0, 0]])
array2 = np.array([[55, 99],
[32, 4],
[75, 74]])
array1[np.isin(array1, array2).all(axis=1) == 0]
My ideal output would be a filtered version of array1 without the rows that are equal to rows of array2.
The problem is that when I do it like this:
np.isin(array1, array2)
output is:
array([[False, False],
[False, True],
[ True, True],
[False, True],
[False, False],
[False, False],
[ True, True],
[ True, False],
[ True, True],
[ True, True],
[False, False],
[False, False]])
It wrongly classifies the [99, 75] row as [True, True], because both of those values individually exist in array2.
Is there a more correct way to filter based on all values of a row?

Here's an inefficient but very explicit way to do this with np.all():
# for each row in array2, check full match with each row in array1
bools = [np.all(array1==row,axis=1) for row in array2]
# combine 3 boolean arrays with 'or' logic
mask = [any(tup) for tup in zip(*bools)]
# flip the mask
mask = ~np.array(mask)
# final index
out = array1[mask]
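A vectorized alternative (not part of the answer above) compares every row of array1 with every row of array2 at once via broadcasting. The intermediate comparison has shape (len(array1), len(array2), 2), so memory grows with the product of the two row counts:

```python
import numpy as np

array1 = np.array([[0, 0], [86, 4], [75, 74], [78, 55], [53, 94], [49, 83],
                   [99, 75], [99, 10], [32, 4], [55, 99], [62, 95], [0, 0]])
array2 = np.array([[55, 99], [32, 4], [75, 74]])

# (12, 1, 2) == (1, 3, 2) -> (12, 3, 2): element-wise comparison of every row pair
eq = array1[:, None, :] == array2[None, :, :]

# A row pair matches only if all its elements are equal;
# a row of array1 is dropped if it matches any row of array2
matches_any = eq.all(axis=2).any(axis=1)

out = array1[~matches_any]
print(out)
```

Unlike np.isin, this compares whole rows, so [99, 75] is kept.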

Selecting vector of 2D array elements from column index vector

I have a 2D array A:
28 39 52
77 80 66
7 18 24
9 97 68
And a vector array of column indexes B:
1
0
2
0
How, in a Pythonic way, using base Python or NumPy, can I select the elements of A that do NOT correspond to the column indexes in B?
I should get this 2D array, which contains the elements of A that are not at the column indexes stored in B:
28 52
80 66
7 18
97 68

You can use broadcasting to build a row-wise mask that excludes, in each row, the column given by B:
Setup
import numpy as np
A = np.array([[28, 39, 52], [77, 80, 66], [7, 18, 24], [9, 97, 68]])
B = np.array([1, 0, 2, 0])
cols = np.arange(A.shape[1])
Now use broadcasting to create the mask, then index the array and reshape (each row keeps 2 of its 3 elements):
mask = B[:, None] != cols
A[mask].reshape(-1, 2)
array([[28, 52],
[80, 66],
[ 7, 18],
[97, 68]])
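The same mask can be wrapped in a small helper that works for any number of columns, since each row always loses exactly one element. The function name drop_one_per_row is hypothetical, chosen for this sketch:

```python
import numpy as np

def drop_one_per_row(A, B):
    """Return A without the element at column B[i] in each row i (hypothetical helper)."""
    A = np.asarray(A)
    B = np.asarray(B)
    cols = np.arange(A.shape[1])
    # True everywhere except at (i, B[i])
    mask = B[:, None] != cols
    # Exactly one element is removed per row, so A.shape[1] - 1 columns remain
    return A[mask].reshape(A.shape[0], A.shape[1] - 1)

A = np.array([[28, 39, 52], [77, 80, 66], [7, 18, 24], [9, 97, 68]])
B = np.array([1, 0, 2, 0])
print(drop_one_per_row(A, B))
```

This assumes every B[i] is a valid column index; an out-of-range value would leave that row untouched and the reshape would fail.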

A spin-off of my answer to your other question, "Replace 2D array elements with zeros, using a column index vector".
We can make a boolean mask with the same indexing used before:
In [124]: mask = np.ones(A.shape, dtype=bool)
In [126]: mask[np.arange(4), B] = False
In [127]: mask
Out[127]:
array([[ True, False, True],
[False, True, True],
[ True, True, False],
[False, True, True]])
Indexing an array with a boolean mask produces a 1d array, since in the most general case such a mask could select a different number of elements in each row.
In [128]: A[mask]
Out[128]: array([28, 52, 80, 66, 7, 18, 97, 68])
In this case the result can be reshaped back to 2d:
In [129]: A[mask].reshape(4,2)
Out[129]:
array([[28, 52],
[80, 66],
[ 7, 18],
[97, 68]])
Since you allowed for 'base Python', here's a list comprehension answer:
In [136]: [[y for i,y in enumerate(x) if i!=b] for b,x in zip(B,A)]
Out[136]: [[28, 52], [80, 66], [7, 18], [97, 68]]
If all the 0s in the A from your other question come from that insertion (i.e. A had no zeros before), we can also get the mask (Out[127]) with:
In [142]: A!=0
Out[142]:
array([[ True, False, True],
[False, True, True],
[ True, True, False],
[False, True, True]])

Numpy masking 3D array using np.where() index

I created an index based on several conditions
transition = np.where((rain>0) & (snow>0) & (graup>0) & (xlat<53.) & (xlat>49.) & (xlon<-114.) & (xlon>-127.)) #indexes the grids where there are transitions
with the shape of (3,259711) that looks like the following:
array([[ 0, 0, 0, ..., 47, 47, 47], #hour
[847, 847, 848, ..., 950, 950, 951], #lat gridpoint
[231, 237, 231, ..., 200, 201, 198]]) #lon gridpoint
I have several other variables (e.g. temp) with the shape of (48, 1015, 1359) corresponding to hour, lat, lon.
Since the index gives my valid gridpoints, how do I mask all the variables, like temp, so that they retain the (48,1015,1359) shape but mask the values outside the index?

In [90]: arr = np.arange(24).reshape(6,4)
In [91]: keep = (arr % 3)==1
In [92]: keep
Out[92]:
array([[False, True, False, False],
[ True, False, False, True],
[False, False, True, False],
[False, True, False, False],
[ True, False, False, True],
[False, False, True, False]], dtype=bool)
In [93]: np.where(keep)
Out[93]:
(array([0, 1, 1, 2, 3, 4, 4, 5], dtype=int32),
array([1, 0, 3, 2, 1, 0, 3, 2], dtype=int32))
Simple application of the keep mask gives a 1d array of the desired values. I could also index with the where tuple.
In [94]: arr[keep]
Out[94]: array([ 1, 4, 7, 10, 13, 16, 19, 22])
With keep, or rather its boolean inverse, I can make a masked array:
In [95]: np.ma.masked_array(arr,mask=~keep)
Out[95]:
masked_array(data =
[[-- 1 -- --]
[4 -- -- 7]
[-- -- 10 --]
[-- 13 -- --]
[16 -- -- 19]
[-- -- 22 --]],
mask =
[[ True False True True]
[False True True False]
[ True True False True]
[ True False True True]
[False True True False]
[ True True False True]],
fill_value = 999999)
np.ma.masked_where(~keep, arr) does the same thing - just a different argument order. It still expects the boolean mask array.
I can do the same starting with the where tuple:
In [105]: idx = np.where(keep)
In [106]: mask = np.ones_like(arr, dtype=bool)
In [107]: mask[idx] = False
In [108]: np.ma.masked_array(arr, mask=mask)
There may be something in the np.ma module that does this with one call, but it would have to do the same sort of construction.
This also works:
x = np.ma.masked_all_like(arr)
x[idx] = arr[idx]
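The same construction carries over to the question's 3D case: np.where on a 3D condition returns a tuple of three index arrays (hour, lat, lon), which can clear entries of an all-True mask. A minimal sketch in which small random arrays stand in for rain, snow and temp; the shapes and data are made up for illustration (the real arrays are (48, 1015, 1359)):

```python
import numpy as np

rng = np.random.default_rng(0)
shape = (4, 5, 6)                  # stand-in for (48, 1015, 1359)
rain = rng.random(shape) - 0.5
snow = rng.random(shape) - 0.5
temp = rng.random(shape) * 30

cond = (rain > 0) & (snow > 0)     # combined conditions, same shape as temp
idx = np.where(cond)               # tuple of 3 index arrays: (hour, lat, lon)

# Mask everything, then unmask the gridpoints listed in idx
mask = np.ones(temp.shape, dtype=bool)
mask[idx] = False
temp_masked = np.ma.masked_array(temp, mask=mask)

# Equivalent, straight from the Boolean condition
temp_masked2 = np.ma.masked_where(~cond, temp)

print(temp_masked.shape, temp_masked.count())
```

If the Boolean condition is still at hand, masked_where(~cond, ...) skips the np.where round trip; the index-tuple route is useful when only the saved indexes are available.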
