Filter arrays in Numpy


I have a boolean array: [[True], [False], [True]]. I want to use it to filter an existing array, so that e.g. [[1,2],[3,4],[5,6]] gets filtered to [[1,2],[5,6]]. What is the correct way to do this?
A simple a[b] indexing gives the error: boolean index did not match indexed array along dimension 1; dimension is 2 but corresponding boolean dimension is 1

The solution is to get the array [[True], [False], [True]] into the shape [True, False, True], so that it works for indexing the rows of the other array. As Divakar said, ravel does this; in general it flattens any array to a 1D array. Another option is squeeze, which removes the dimensions of size 1 but leaves the other dimensions as they were.
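A minimal sketch of the two options mentioned here, using the arrays from the question (the names a and b are taken from the question's a[b] expression):

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])
b = np.array([[True], [False], [True]])  # shape (3, 1)

# ravel flattens everything to 1D, whatever the input shape was.
mask_ravel = b.ravel()            # shape (3,)

# squeeze only drops axes of length 1; naming the axis makes it fail
# loudly if that axis is not actually of length 1.
mask_squeeze = b.squeeze(axis=1)  # shape (3,)

filtered = a[mask_squeeze]
print(filtered)
```

Both masks are identical here; squeeze is the more conservative choice when the mask might have more than one non-trivial dimension, because it will not silently flatten it.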

Use .ravel...
From the documentation, ravel will:
Return a contiguous flattened array.
So if we have your b array:
b = np.array([[True], [False], [True]])
we can take the boolean values out of their sub-arrays with:
b.ravel()
which gives:
array([ True, False, True], dtype=bool)
So then, we can simply use b.ravel() as a mask for a and it will work as you want:
a = np.array([[1,2], [3,4], [5,6]])
b = np.array([[True], [False], [True]])
c = a[b.ravel()]
which gives c as:
array([[1, 2],
       [5, 6]])

Related

How to filter with numpy on 2D array using np.where

I read the numpy docs: with a single argument, np.where returns the indices where the condition matches.
numpy.where(condition, [x, y, ]/)
In the context of a multi-dimensional array, I want to find and replace where the condition matches.
This is doable with the other parameters from the docs: [x, y] are the replacement values.
Here is my data structure:
my_2d_array = np.array([[1,2],[3,4]])
Here is how I select a column in Python: my_2d_array[:,1]
Here is how I find and replace with numpy:
indices = np.where( my_2d_array[:,1] == 4, my_2d_array[:,1] , my_2d_array[:,1] )
(when the value in the second column matches 4, swap the value in column two with the one in column one)
So it's hard for me to understand why the same syntax, my_2d_array[:,1], is used both to select a whole column in Python and to designate a single row of my 2D array in numpy where the condition is matched.

Your array:
In [9]: arr = np.array([[1,2],[3,4]])
In [10]: arr
Out[10]:
array([[1, 2],
       [3, 4]])
Testing for some value:
In [11]: arr==4
Out[11]:
array([[False, False],
       [False,  True]])
testing one column:
In [12]: arr[:,1]
Out[12]: array([2, 4])
In [13]: arr[:,1]==4
Out[13]: array([False, True])
As documented, np.where with just one argument is just a call to nonzero, which finds the indices of the True values.
So for the 2d array in [11] we get two arrays:
In [15]: np.nonzero(arr==4)
Out[15]: (array([1], dtype=int64), array([1], dtype=int64))
and for the 1d boolean in [13], one array:
In [16]: np.nonzero(arr[:,1]==4)
Out[16]: (array([1], dtype=int64),)
That index can be used to select from arr, here column 1 of the matching row:
In [17]: arr[_,1]
Out[17]: array([[4]])
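As a small sketch of the difference (an illustration, not part of the original session): indexing with the nonzero result alone returns the whole matching row, and indexing with the boolean mask directly gives the same rows without the intermediate step.

```python
import numpy as np

arr = np.array([[1, 2], [3, 4]])
mask = arr[:, 1] == 4          # one boolean per row

(rows,) = np.nonzero(mask)     # row indices where the mask is True
whole_rows = arr[rows]         # every column of the matching rows
same_rows = arr[mask]          # boolean indexing, no nonzero needed
one_column = arr[rows, 1]      # only column 1 of the matching rows
```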
In the three-argument form, where uses the condition to choose, element by element, between the 2nd and 3rd arguments. For example, using arguments that have nothing to do with arr:
In [18]: np.where(arr[:,1]==4, ['a','b'],['c','d'])
Out[18]: array(['c', 'b'], dtype='<U1')
The selection gets more complicated if the arguments differ in shape; then the rules of broadcasting apply.
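For instance, a sketch with mismatched shapes: a (2, 1) condition, a (2, 2) array and a scalar broadcast together to a (2, 2) result. The fill value 0 is an arbitrary choice for illustration.

```python
import numpy as np

arr = np.array([[1, 2], [3, 4]])

# Reshape the per-row condition to a column so it applies to whole rows.
cond = (arr[:, 1] == 4)[:, None]   # shape (2, 1)

# Keep rows where the condition holds, fill the others with 0.
result = np.where(cond, arr, 0)
print(result)
```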
So the basic point with np.where is that all 3 arguments are first evaluated, and passed (in true python function fashion) to the where function. It then selects elements based on the cond, returning a new array.
That where is functionally the same as this list comprehension (or an equivalent for loop):
In [19]: [i if cond else j for cond,i,j in zip(arr[:,1]==4, ['a','b'],['c','d'])]
Out[19]: ['c', 'b']
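If the goal in the question is to swap the two columns in rows whose second column equals 4, one possible sketch uses a boolean row mask instead of where. This is an assumption about the intent, since the question's code passes the same column as both replacement arguments.

```python
import numpy as np

my_2d_array = np.array([[1, 2], [3, 4]])

mask = my_2d_array[:, 1] == 4      # one boolean per row

swapped = my_2d_array.copy()       # leave the original untouched
# Boolean indexing on the right-hand side returns a copy, so reversing
# its columns and assigning back is safe.
swapped[mask] = my_2d_array[mask][:, ::-1]
print(swapped)
```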

How do I check if a numpy tuple is in a 2D numpy array of tuples?

I would like to check if a numpy tuple is present in a numpy array of tuples.
When I run the following code:
import numpy as np
myarray=np.array([[0,1],[0,2],[0,3],[4,4]])
test1=np.array([0,3])
test2=np.array([4,0])
myarraylst=myarray.tolist()
test1lst=test1.tolist()
test2lst=test2.tolist()
print(test1lst in myarraylst)
print(test2lst in myarraylst)
I get "True" for the first test and "False" for the second test, as expected.
Is there a way to do this without converting the numpy arrays to Python lists?
Many Thanks !

For lists, in tests for the identity/equality of the sublists. For arrays, with in or np.isin, the evaluation goes all the way down to the numeric elements.
In [181]: myarray = np.array([[0, 1], [0, 2], [0, 3], [4, 4]])
...: test1 = np.array([0, 3])
...: test2 = np.array([4, 0])
But we can do an elementwise test:
In [183]: test1 == myarray
Out[183]:
array([[ True, False],
       [ True, False],
       [ True,  True],
       [False, False]])
Here one array is (4,2) and the other (2,) shape, which broadcast together just fine. For other cases we may need to tweak dimensions.
You just want the rows where both elements match:
In [184]: (test1 == myarray).all(axis=1)
Out[184]: array([False, False, True, False])
In [185]: (test2 == myarray).all(axis=1)
Out[185]: array([False, False, False, False])
and reduce those arrays to one value with any:
In [187]: (test1 == myarray).all(axis=1).any()
Out[187]: True
In [188]: (test2 == myarray).all(axis=1).any()
Out[188]: False
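The steps above can be wrapped in a small helper. The name contains_row is hypothetical; it assumes a 2D array and a 1D row of matching length. The sketch also shows why the plain in operator is not a substitute.

```python
import numpy as np

def contains_row(arr, row):
    """Return True if `row` equals at least one full row of 2D `arr`."""
    return bool((arr == row).all(axis=1).any())

myarray = np.array([[0, 1], [0, 2], [0, 3], [4, 4]])
test1 = np.array([0, 3])
test2 = np.array([4, 0])

print(contains_row(myarray, test1))
print(contains_row(myarray, test2))

# `in` on an array only asks whether any single element matches after
# broadcasting, so it reports True here because of the 4 in [4, 4].
print(test2 in myarray)
```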

finding the datatypes inside a numpy array

For my requirement, I need to find the locations of all the instances between 2 numpy arrays that have different data types
array 1 can look like this: numpy.array(['1',3, 9, None])
array 2 can look like this: numpy.array([5,4,3,2])
if they all were of the same type then I can do array 1 - array 2 diff to get the numerical differences. This won't be possible in the above scenario. So, as part of my data quality check, I would like to explicitly flag the indexes of array 1 that are of a different type than array 2. What would be the most pythonic way to do so?

A testing function:
def foo(a,b):
    try:
        a-b
        return True
    except TypeError:
        return False
the two sample arrays:
In [22]: array1 = np.array(['1',3, 9, None])
...: array2 = np.array([5,4,3,2])
test the 2 arrays - ones that work:
In [23]: [i for i,(a,b) in enumerate(zip(array1,array2)) if foo(a,b)]
Out[23]: [1, 2]
ones that don't:
In [24]: [i for i,(a,b) in enumerate(zip(array1,array2)) if not foo(a,b)]
Out[24]: [0, 3]
another way to use foo, getting a boolean array:
In [26]: f = np.frompyfunc(foo, 2, 1)
In [27]: f(array1,array2)
Out[27]: array([False, True, True, False], dtype=object)
actually it is still object dtype, but the values are boolean.
In [28]: f(array1,array2).astype(bool)
Out[28]: array([False, True, True, False])
and the problem items in array1:
In [29]: array1[~_]
Out[29]: array(['1', None], dtype=object)
The test function could be more elaborate.
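One more elaborate variant, as a sketch: instead of attempting the subtraction, check each element's type directly. Treating numbers.Number as the definition of "numeric" is an assumption here, and bad_indexes is a hypothetical helper name.

```python
import numbers
import numpy as np

def bad_indexes(arr):
    """Indexes of elements that are not numeric Python/NumPy scalars."""
    return [i for i, v in enumerate(arr) if not isinstance(v, numbers.Number)]

# dtype=object keeps the original Python objects instead of coercing them.
array1 = np.array(['1', 3, 9, None], dtype=object)

print(bad_indexes(array1))
```

Unlike the try/except test, this flags a value by what it is rather than by whether one particular operation happens to succeed; note that bool values count as numbers under this definition.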

Python Fancy Indexing Assignments: cannot assign 3 input values to the 6 output values where the mask is true

I am trying to make a zeroed array with the same shape as a source array, and then modify every value in the second array that corresponds to a specific value in the first array.
This would be simple enough if I was just replacing one value. Here is a toy example:
import numpy as np
arr1 = np.array([[1,2,3],[3,4,5],[1,2,3]])
arr2 = np.array([[0,0,0],[0,0,0],[0,0,0]])
arr2[arr1==1] = -1
This will work as expected and arr2 would be:
[[-1,0,0],
[ 0,0,0],
[-1,0,0]]
But I would like to replace an entire row. Something like this replacing the last line of the sample code above:
arr2[arr1==[3,4,5]] = [-1,-1,-1]
When I do this, it also works as expected and arr2 would be:
[[ 0, 0, 0],
[-1,-1,-1],
[ 0, 0, 0]]
But when I tried to replace the last line of sample code with something like:
arr2[arr1==[1,2,3]] = [-1,-1,-1]
I expected to get something like the last output, but with the 0th and 2nd rows being changed. But instead I got the following error.
ValueError: NumPy boolean array indexing assignment cannot assign 3 input values to the 6
output values where the mask is true
I assume this is because, unlike the other example, it was going to have to replace more than one row. Though this seems odd to me, since it worked fine replacing more than one value in the simple single value example.
I'm just wondering if anyone can explain this behavior to me, because it is a little confusing. I am not that experienced with the inner workings of numpy operations. Also, I would welcome any recommendations for doing what I am trying to accomplish efficiently.
In my real-world implementation, I am working with a very large three-dimensional array (an image with 3 color channels), and I want to make a new array that stores a specific value in these three color channels if the source image has a specific set of three color values in the corresponding pixel (and stays [0,0,0] if it doesn't match our pixel_rgb_of_interest). I could go through in linear time and just check every single pixel, but this could get slow if there are a lot of images, so I was wondering if there is a better way.
Thank you!

This would be a good application for numpy.where
>>> import numpy as np
>>> arr1 = np.array([[1,2,3],[3,4,5],[1,2,3]])
>>> arr2 = np.array([[0,0,0],[0,0,0],[0,0,0]])
>>> np.where(arr1 == [1,2,3], [-1,-1,-1], arr1)
array([[-1, -1, -1],
       [ 3,  4,  5],
       [-1, -1, -1]])
This basically works as "wherever the condition is true, use the x argument, then use the y argument the rest of the time"
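Note that the elementwise comparison also replaces partial matches (a row such as [1, 0, 0] would have its first element replaced), and the call above keeps arr1's values elsewhere rather than zeros. A hedged sketch that requires the whole row to match and leaves zeros elsewhere, as the question asks, reduces the comparison with all first:

```python
import numpy as np

arr1 = np.array([[1, 2, 3], [3, 4, 5], [1, 2, 3]])

# keepdims=True gives shape (3, 1), so each row gets a single flag
# that broadcasts across that row's columns.
row_match = np.all(arr1 == [1, 2, 3], axis=1, keepdims=True)

arr2 = np.where(row_match, [-1, -1, -1], 0)
print(arr2)
```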

Let's add an "index" array:
In [56]: arr1 = np.array([[1,2,3],[3,4,5],[1,2,3]])
...: arr2 = np.array([[0,0,0],[0,0,0],[0,0,0]])
...: arr3 = np.arange(9).reshape(3,3)
The test against 1 value:
In [57]: arr1==1
Out[57]:
array([[ True, False, False],
       [False, False, False],
       [ True, False, False]])
that has 2 true values:
In [58]: arr3[arr1==1]
Out[58]: array([0, 6])
We could assign one value as you do, or 2.
Test with a list, which is converted to array first:
In [59]: arr1==[3,4,5]
Out[59]:
array([[False, False, False],
       [ True,  True,  True],
       [False, False, False]])
That has 3 True:
In [60]: arr3[arr1==[3,4,5]]
Out[60]: array([3, 4, 5])
so it works to assign a list of 3 values as you do. Or a scalar.
In [61]: arr1==[1,2,3]
Out[61]:
array([[ True,  True,  True],
       [False, False, False],
       [ True,  True,  True]])
Here the test has 6 True.
In [62]: arr3[arr1==[1,2,3]]
Out[62]: array([0, 1, 2, 6, 7, 8])
So we can assign 6 values or a scalar. But you tried to assign 3 values.
Or we could apply all to find the rows that match [1,2,3]:
In [63]: np.all(arr1==[1,2,3], axis=1)
Out[63]: array([ True, False, True])
In [64]: arr3[np.all(arr1==[1,2,3], axis=1)]
Out[64]:
array([[0, 1, 2],
       [6, 7, 8]])
To this we could assign a (2,3) array, a scalar, a (3,) array, or a (2,1) (as per broadcasting rules):
In [65]: arr2[np.all(arr1==[1,2,3], axis=1)]=np.array([100,200])[:,None]
In [66]: arr2
Out[66]:
array([[100, 100, 100],
       [  0,   0,   0],
       [200, 200, 200]])
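The same row-matching idea extends to the image case described in the question. This is a sketch under assumptions: the image is an (H, W, 3) array, and the values given to pixel_rgb_of_interest and new_value are placeholder colors chosen for illustration.

```python
import numpy as np

# Tiny 2x2 stand-in for an (H, W, 3) image.
img = np.array([[[255, 0, 0], [0, 255, 0]],
                [[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

pixel_rgb_of_interest = np.array([255, 0, 0], dtype=np.uint8)
new_value = np.array([10, 20, 30], dtype=np.uint8)

# True where all three channels match; shape (H, W).
mask = np.all(img == pixel_rgb_of_interest, axis=-1)

out = np.zeros_like(img)
out[mask] = new_value   # the (3,) value broadcasts over every matching pixel
print(out)
```

The comparison and the reduction are both vectorized, so no explicit Python loop over pixels is needed.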

All boolean combinations from 2 numpy arrays

Is there an existing function in numpy that takes 2 numpy arrays (x, y) and returns a boolean matrix with the value of x[i] > y[j] for each i, j?
For example, let x = [3, 4, 5] and y = [1, 2, 3]; then I want
res = [ [True, True, False],
[True, True, True],
[True, True, True] ]

You don't need a function here, just array broadcasting can work if you shape your arrays properly. I think you want this approach, which makes x a column vector and y a row vector:
x = np.array([3,4,5])
y = np.array([1,2,3])
res = x[:,None] > y[None,:]
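Running that on the example reproduces the requested matrix. As an alternative sketch, every binary ufunc has an outer method, so np.greater.outer gives the same result without manual reshaping:

```python
import numpy as np

x = np.array([3, 4, 5])
y = np.array([1, 2, 3])

res = x[:, None] > y[None, :]        # (3, 1) against (1, 3) -> (3, 3)
res_outer = np.greater.outer(x, y)   # same pairwise comparison

print(res)
```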

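Using numpy, you can convert your x and y lists to arrays with x = np.array([3,4,5]) and y = np.array([1,2,3]); print(x > y) then does an elementwise comparison. Note that this only compares matching positions (x[i] > y[i]) and prints [ True  True  True], not the full i,j matrix; for all pairs, the broadcasting approach above is needed.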
