Python numpy index set from Boolean array

How do I transform a Boolean array into an iterable of indexes?
E.g.,
import numpy as np
import itertools as it
x = np.array([1,0,1,1,0,0])
y = x > 0
retval = [i for i, y_i in enumerate(y) if y_i]
Is there a nicer way?

Try np.where or np.nonzero.
x = np.array([1, 0, 1, 1, 0, 0])
np.where(x)[0] # returns a tuple hence the [0], see help(np.where)
# array([0, 2, 3])
x.nonzero()[0] # in this case, the same as above.
See help(np.where) and help(np.nonzero).
It may be worth noting that the np.where documentation describes the 1-D case as essentially equivalent to the list comprehension in your question.
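A minimal sketch, reusing the arrays from the question: np.flatnonzero returns the indices of the flattened mask directly, so no [0] is needed. The names idx_where and idx_flat are only for illustration.

```python
import numpy as np

x = np.array([1, 0, 1, 1, 0, 0])
y = x > 0  # Boolean mask

idx_where = np.where(y)[0]    # first (and only) element of the returned tuple
idx_flat = np.flatnonzero(y)  # indices of the True entries of the flattened mask

print(idx_where.tolist())  # [0, 2, 3]
print(idx_flat.tolist())   # [0, 2, 3]
```

Both results are ordinary integer arrays, so they can be iterated over or used to index another array.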

Related

Finding an index numpy python

Consider a NumPy array of shape (8, 8).
My Question: What is the index (x,y) of the 50th element?
Note: For counting the elements go row-wise.
Example, in array A, where A = [[1, 5, 9], [3, 0, 2]] the 5th element would be '0'.
Can someone explain how to find the general solution for this and, what would be the solution for this specific problem?

You can use np.unravel_index to find the coordinates corresponding to an index into the flattened array. NumPy arrays are zero-indexed, so subtract 1 from the element's position to account for this.
import numpy as np
a = np.arange(64).reshape(8,8)
np.unravel_index(50-1, a.shape)
Out:
(6, 1)
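As a quick check of the result above, np.ravel_multi_index performs the reverse mapping from coordinates back to the flat index. This is only a sketch; the names position, row, col and flat are illustrative.

```python
import numpy as np

shape = (8, 8)
position = 50  # 1-based position, counted row by row

row, col = np.unravel_index(position - 1, shape)  # flat index -> (row, col)
flat = np.ravel_multi_index((row, col), shape)    # (row, col) -> flat index

print(int(row), int(col))  # 6 1
print(int(flat) + 1)       # 50
```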

In a NumPy array a of shape (r, c) (just like a list of lists), the n-th element is
a[(n-1) // c][(n-1) % c],
assuming that n starts from 1 as in your example.
It has nothing to do with r. Thus, when r = c = 8 and n = 50, the above formula is exactly
a[6][1].
Let me show more using your example:
from numpy import *
a = array([[1, 5, 9], [3, 0, 2]])
r = len(a)
c = len(a[0])
print(f'(r, c) = ({r}, {c})')
print(f'Shape: {a.shape}')
for n in range(1, r * c + 1):
    print(f'Element {n}: {a[(n-1) // c][(n-1) % c]}')
Below is the result:
(r, c) = (2, 3)
Shape: (2, 3)
Element 1: 1
Element 2: 5
Element 3: 9
Element 4: 3
Element 5: 0
Element 6: 2

numpy.ndarray.flatten(a) returns a copy of the array a collapsed into one dimension. Note that counting starts from 0; in your example, 0 is therefore the element at index 4 and 1 is the element at index 0.
import numpy as np
arr = np.array([[1, 5, 9], [3, 0, 2]])
fourth_element = np.ndarray.flatten(arr)[4]
or, equivalently,
fourth_element = arr.flatten()[4]
The same applies to the 8x8 array.
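If only a single element is needed, copying the whole array is not strictly required. A small sketch, assuming the same arr as above plus an 8x8 array filled with 1..64 (the names via_flatten, via_flat and big are illustrative): arr.flat indexes the array in row-major order without building a flattened copy.

```python
import numpy as np

arr = np.array([[1, 5, 9], [3, 0, 2]])

via_flatten = arr.flatten()[4]  # flatten() always copies the data first
via_flat = arr.flat[4]          # flat iterator, indexed in row-major order

print(int(via_flatten), int(via_flat))  # 0 0

# The same idea for an 8x8 array holding 1..64: the 50th element is at flat index 49
big = np.arange(1, 65).reshape(8, 8)
print(int(big.flat[49]))  # 50
```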

First, create an 8x8 2-D NumPy array using np.array and range, reshaped to (8, 8).
The output shows that the index of the 50th element is [6, 1].
import numpy as np
arr = np.array(range(1,(8*8)+1)).reshape(8,8)
print(arr[6,1])
The output will be 50.
Alternatively, you can do this in a generic way with NumPy's where function.
import numpy as np
def getElementIndex(array: np.ndarray, element):
    # np.where returns one index array per dimension; take the first match
    elementIndex = np.where(array == element)
    return f'[{elementIndex[0][0]},{elementIndex[1][0]}]'
def getXYOrderNumberArray(x: int, y: int):
    return np.array(range(1, (x * y) + 1)).reshape(x, y)
arr = getXYOrderNumberArray(8, 8)
print(getElementIndex(arr, 50))

What is a best way to intersect multiple arrays with numpy array?

Suppose I have an example of numpy array:
import numpy as np
X = np.array([2,5,0,4,3,1])
And I also have a list of arrays, like:
A = [np.array([-2,0,2]), np.array([0,1,2,3,4,5]), np.array([2,5,4,6])]
I want to keep only the items of each array that are also in X, ideally in the most efficient and idiomatic way.
Solution I have tried so far:
Sort X using X.sort().
Find locations of items of each array in X using:
locations = [np.searchsorted(X, n) for n in A]
Leave only proper ones:
masks = [X[locations[i]] == A[i] for i in range(len(A))]
result = [A[i][masks[i]] for i in range(len(A))]
But it doesn't work because one of the locations for the third array is out of bounds:
locations = [array([0, 0, 2], dtype=int64), array([0, 1, 2, 3, 4, 5], dtype=int64), array([2, 5, 4, 6], dtype=int64)]
How to solve this issue?
Update
I ended up with the idx[idx==len(Xs)] = 0 solution. I also noticed two different approaches among the answers: converting X to a set versus using np.sort. Both have pluses and minuses: set operations iterate in Python, which is quite slow compared with NumPy methods; however, the cost of np.searchsorted grows logarithmically, whereas a set lookup takes constant time. That is why I decided to compare performance on large inputs, specifically 1 million items each for X, A[0], A[1] and A[2].

One idea is to do as little computation and as little work as possible inside the loop. Here is an approach with that in mind -
a = np.concatenate(A)
m = np.isin(a,X)
l = np.array(list(map(len,A)))
a_m = a[m]
cut_idx = np.r_[0,l.cumsum()]
l_m = np.add.reduceat(m,cut_idx[:-1])
cl_m = np.r_[0,l_m.cumsum()]
out = [a_m[i:j] for (i,j) in zip(cl_m[:-1],cl_m[1:])]
Alternative #1 :
We can also use np.searchsorted to get the isin mask, like so -
Xs = np.sort(X)
idx = np.searchsorted(Xs,a)
idx[idx==len(Xs)] = 0
m = Xs[idx]==a
Another way with np.intersect1d
If you are looking for the most common/elegant approach, I think it would be np.intersect1d -
In [43]: [np.intersect1d(X,A_i) for A_i in A]
Out[43]: [array([0, 2]), array([0, 1, 2, 3, 4, 5]), array([2, 4, 5])]
Solving your issue
You can also solve your out-of-bounds issue, with a simple fix -
for l in locations:
    l[l==len(X)]=0
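One difference worth keeping in mind: np.intersect1d returns sorted unique values, which is why the third result in the np.intersect1d output above is [2, 4, 5] rather than [2, 5, 4]. If the original order (and any duplicates) should be kept, a per-array np.isin mask is one option. This is a sketch using the X and A from the question; the name kept is illustrative.

```python
import numpy as np

X = np.array([2, 5, 0, 4, 3, 1])
A = [np.array([-2, 0, 2]), np.array([0, 1, 2, 3, 4, 5]), np.array([2, 5, 4, 6])]

# One Boolean mask per array: True where the item also occurs in X
kept = [a_i[np.isin(a_i, X)] for a_i in A]

for k in kept:
    print(k.tolist())
# [0, 2]
# [0, 1, 2, 3, 4, 5]
# [2, 5, 4]
```

This still loops over A in Python, so for many small arrays the concatenate-then-split approach above may do less work per iteration.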

How about this, very simple and efficient:
import numpy as np
X = np.array([2,5,0,4,3,1])
A = [np.array([-2,0,2]), np.array([0,1,2,3,4,5]), np.array([2,5,4,6])]
X_set = set(X)
A = [np.array([a for a in arr if a in X_set]) for arr in A]
#[array([0, 2]), array([0, 1, 2, 3, 4, 5]), array([2, 5, 4])]
According to the docs, set membership tests are O(1) on average, so the whole operation is O(N) in the total number of items.

Numpy array multiple mask

Trying to slice and average a numpy array multiple times, based on an integer mask array:
i.e.
import numpy as np
data = np.arange(11)
mask = np.array([0, 1, 1, 1, 0, 2, 2, 3, 3, 3, 3])
results = list()
for maskid in range(1,4):
    result = np.average(data[mask==maskid])
    results.append(result)
output = np.array(results)
Is there a way to do this faster, aka without the "for" loop?

One approach using np.bincount -
np.bincount(mask, data)/np.bincount(mask)
Another one with np.unique for a generic case when the elements in mask aren't necessarily sequential starting from 0 -
_,ids, count = np.unique(mask, return_inverse=1, return_counts=1)
out = np.bincount(ids, data)/count
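To see how the np.bincount approach lines up with the loop in the question, here is a small check on the same data. It is only a sketch; the names sums, counts, means and loop_means are illustrative. Note that the bincount result also contains an entry for label 0, which the loop skips.

```python
import numpy as np

data = np.arange(11)
mask = np.array([0, 1, 1, 1, 0, 2, 2, 3, 3, 3, 3])

sums = np.bincount(mask, weights=data)  # per-label sum of data
counts = np.bincount(mask)              # per-label number of items
means = sums / counts                   # entry k is the mean for label k

# Reference: the explicit loop over labels 1..3 from the question
loop_means = np.array([data[mask == k].mean() for k in range(1, 4)])

print(means.tolist())       # [2.0, 2.0, 5.5, 8.5]
print(loop_means.tolist())  # [2.0, 5.5, 8.5]
```

If some label between 0 and mask.max() never occurs, its count is 0 and the division produces nan for that entry, so such labels may need separate handling.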

Comparing two numpy arrays and removing elements

I have been going through several solutions, but I am not able to find a solution I need.
I have two numpy arrays. Let's take a small example here.
x = [1,2,3,4,5,6,7,8,9]
y = [3,4,5]
I want to compare x and y, and remove those values of x that are in y.
So I expect my final_x to be
final_x = [1,2,6,7,8,9]
I found out that np.in1d returns a Boolean array of the same length as x that is True where an element of x is in y and False otherwise. But how do I use it, or is there another method, to get my final_x?

If you really do have numpy arrays then you can use numpy.setdiff1d as below
import numpy as np
x = np.array([1,2,3,4,5,6,7,8,9])
y = np.array([3,4,5])
z = np.setdiff1d(x, y)
# array([1, 2, 6, 7, 8, 9])

Simply pass the negated version of boolean array returned by np.in1d to array x:
>>> x = np.array([1,2,3,4,5,6,7,8,9])
>>> y = [3,4,5]
>>> x[~np.in1d(x, y)]
array([1, 2, 6, 7, 8, 9])

You can use built-in sets:
final_x = set(x) - set(y)
This subtracts the second set from the first. You can convert final_x to a list or numpy.array if you feel so inclined.
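The approaches in these answers differ in how they treat order and duplicates, which the example in the question does not reveal because x is already sorted and unique. The sketch below uses a different, made-up x to show the difference; np.isin is the newer counterpart of np.in1d, and the names by_mask and by_setdiff are illustrative.

```python
import numpy as np

x = np.array([9, 1, 2, 3, 1])  # unsorted, with a duplicate (illustrative data)
y = np.array([3])

by_mask = x[~np.isin(x, y)]      # Boolean mask: keeps order and duplicates
by_setdiff = np.setdiff1d(x, y)  # returns sorted, unique values

print(by_mask.tolist())     # [9, 1, 2, 1]
print(by_setdiff.tolist())  # [1, 2, 9]
```

The built-in set difference also drops duplicates, and a set has no defined order, so the mask is probably the safer choice when the layout of x matters.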

first order differences along a given axis in NumPy array

#compute first differences of 1d array
from numpy import *
x = arange(10)
y = zeros(len(x))
for i in range(1,len(x)):
    y[i] = x[i] - x[i-1]
print(y)
The above code works but there must be at least one easy, pythonesque way to do this without having to use a for loop. Any suggestions?

What about:
diff(x)
# array([1, 1, 1, 1, 1, 1, 1, 1, 1])

Yes, this is exactly the kind of loop that NumPy elementwise operations are designed for. You just need to learn to take the right slices of the arrays.
import numpy
x = numpy.arange(10)
y = numpy.zeros(x.shape)
y[1:] = x[1:] - x[:-1]
print(y)

Several NumPy builtins will do the job, in particular diff, ediff1d, and gradient.
I suspect ediff1d is the better choice for the specific case described in the OP: unlike the other two, ediff1d is actually directed/limited to this specific use case, i.e., first-order differences along a single axis (or axis of a 1D array).
>>> import numpy as NP
>>> x = NP.random.randint(1, 10, 10)
>>> x
array([4, 6, 6, 8, 1, 2, 1, 1, 5, 4])
>>> NP.ediff1d(x)
array([ 2, 0, 2, -7, 1, -1, 0, 4, -1])
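For the "along a given axis" part of the title, np.diff accepts an axis argument. The sketch below uses a small made-up 2-D array; it also shows the prepend argument (available in reasonably recent NumPy versions), which gives an output of the same length as x with a leading zero, like the loop in the question. The names along_rows, along_cols and y are illustrative.

```python
import numpy as np

a = np.array([[1, 3, 6],
              [10, 15, 21]])

along_rows = np.diff(a, axis=1)  # differences between neighbouring columns
along_cols = np.diff(a, axis=0)  # differences between neighbouring rows

print(along_rows.tolist())  # [[2, 3], [5, 6]]
print(along_cols.tolist())  # [[9, 12, 15]]

x = np.arange(10)
y = np.diff(x, prepend=x[0])  # same length as x; first entry is x[0] - x[0]
print(y.tolist())  # [0, 1, 1, 1, 1, 1, 1, 1, 1, 1]
```

Unlike the question's y, which starts from zeros() and is therefore float, this y keeps the integer dtype of x.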

Here's a pattern I used a lot for a while:
# Python 2 used itertools.izip here; in Python 3 the built-in zip is already lazy
d = [a-b for a,b in zip(x[1:],x[:-1])]

y = [item - x[i] for i, item in enumerate(x[1:])]
If you need to access the index of an item while looping over it, enumerate() is the Pythonic way. Also, a list comprehension is, in this case, more readable.
Moreover, you should never use wildcard imports (from numpy import *). They always import more than you need and lead to unnecessary ambiguity. Rather, just import numpy or import what you need, e.g.
from numpy import arange, zeros
