Python: the fastest way to find the indexes of numbers in a NumPy array

What is the fastest way in Python to find the indexes of given numbers in a NumPy array?
Suppose we have an array of the numbers from 0 to 19, and we want to know the indexes of the numbers 2 and 5.

The canonical way would be to use NumPy's where function:
import numpy as np
a = np.arange(20)
np.where((a == 2) | (a == 5))
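Note that to combine the two terms (a == 2) and (a == 5) we need the bitwise or operator |, not Python's or keyword. The reason is that both (a == 2) and (a == 5) return a NumPy array of dtype('bool'), and | combines them element-wise (the or keyword would try to reduce each array to a single truth value and raise a ValueError):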
>>> a == 2
array([False, False, True, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False])
>>> (a == 5)
array([False, False, False, False, False, True, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False])
>>> (a == 2) | (a==5)
array([False, False, True, False, False, True, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False])
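A minimal sketch of two related idioms, assuming NumPy 1.13 or later for np.isin: np.logical_or is the function form of |, and np.isin tests each element against a whole list of targets in one call instead of one == comparison per value. np.flatnonzero returns the index array directly rather than the one-element tuple that np.where gives for a 1-D input. The targets list is only illustrative.

```python
import numpy as np

a = np.arange(20)

# Function form of (a == 2) | (a == 5)
mask = np.logical_or(a == 2, a == 5)

# For a 1-D input, np.where returns a tuple holding one index array
(idx_where,) = np.where(mask)

# np.isin checks every element against a list of targets in one call
targets = [2, 5]  # illustrative values
idx_isin = np.flatnonzero(np.isin(a, targets))

print(idx_where, idx_isin)  # [2 5] [2 5]
```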

Related

Finding a break in continuous indices in a numpy array

I have a conditional statement for an array A (assume it is A > 10), and I get the following boolean result:
array([False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True,
True, True, True, True, True, False, False, False, False,
False, False, False, False, False, False, True, True, True,
True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True,
True, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False])
Now I am finding the indices where the values are True. I get the following array.
array([20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36,
37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 60, 61, 62, 63,
64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80,
81, 82, 83, 84, 85, 86, 87, 88, 89, 90])
What I need to do is find the start index and end index of each run of continuous indices. For example, in the above array the first run starts at index 20 and ends at 49. Similarly, the second run starts at 60 and ends at 90.
So to summarize, my output should be:
start_indices = array([20,60])
end_indices = array([49,90])
How to do this?
Here is a solution using groupby and accumulate from itertools:
from itertools import groupby, accumulate
## input array
#a = array([False, False, ..., True, ..., False])
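# Cumulative run lengths: the position where each run of equal values ends (exclusive).
# This assumes a starts with a run of False values.
indices = list(accumulate(len(list(g)) for _, g in groupby(a)))
starts = indices[:len(indices)//2*2:2]
stops = [i-1 for i in indices[1::2]]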
NB: it works with any iterable, not only NumPy arrays.
output:
>>> starts
[20, 60]
>>> stops
[49, 90]
import numpy as np
# With A as the original array; np.diff on a boolean array is True wherever the value flips
changes = np.where(np.diff(A > 10))[0]  # [0] gets the actual array out of the tuple
start = changes[::2] + 1
end = changes[1::2]
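Both answers above assume the mask starts with False, and the np.diff version also misses the end of a run that reaches the last element. A minimal sketch that handles runs touching either edge pads the mask with False on both sides before differencing; the function name true_runs is hypothetical.

```python
import numpy as np

def true_runs(mask):
    """Return (starts, ends) of the runs of True in a 1-D mask; ends are inclusive."""
    m = np.asarray(mask, dtype=bool)
    # Pad with False on both sides so runs touching either edge are still bounded
    padded = np.concatenate(([False], m, [False])).astype(np.int8)
    d = np.diff(padded)
    starts = np.flatnonzero(d == 1)       # False -> True transitions
    ends = np.flatnonzero(d == -1) - 1    # True -> False transitions, made inclusive
    return starts, ends

starts, ends = true_runs([False, True, True, False, True])
print(starts, ends)  # [1 4] [2 4]
```

For the array in the question, true_runs(A > 10) should return the same [20, 60] and [49, 90] as the answers above.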

How to generate these lists of True and False?

I am looping through a list of 3 items, something like:
for i in range(3):
and trying to produce the following lists on each respective iteration:
[True, True, False, False, False, False]
[False, False, True, True, False, False]
[False, False, False, False, True, True]
What would be a good way in python to do this?
Here's one way:
>>> for i in range(3):
...     print([(x // 2) == i for x in range(6)])
...
[True, True, False, False, False, False]
[False, False, True, True, False, False]
[False, False, False, False, True, True]
Try like this:
k = 0
for i in range(3):
    # Other tasks
    myList = [False for x in range(4)]
    myList[k:k] = [True, True]
    print(myList)
    k += 2
L = [False, False, False, False, True, True]
for _ in range(3):
    L = L[-2:] + L[:4]
    print(L)
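A NumPy sketch, assuming the pattern generalizes to n items that each cover k consecutive positions (block_masks, n and k are illustrative names): repeating each column of an n-by-n identity matrix k times builds all the rows at once, and .tolist() turns each row back into a plain Python list of bools.

```python
import numpy as np

def block_masks(n, k):
    """Row i is True at positions i*k .. i*k + k - 1 and False elsewhere."""
    # np.eye gives one True per row; np.repeat widens every column to k columns
    return np.repeat(np.eye(n, dtype=bool), k, axis=1)

for row in block_masks(3, 2):
    print(row.tolist())
```

With n = 3 and k = 2 this prints the same three lists as in the question.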

Converting numpy array to pure python integer to avoid integer overflow

I have asked this question before and was downvoted heavily. Judging by the fact that no one really looks at a triple-downvoted question again, I am reposting it to make clear that I am interested in the actual answer (if there is one).
Problem statement:
I am in a situation where I need the arbitrary-precision feature of pure Python integers. At some point in my code I have a NumPy array of booleans. Something like:
arr
array([ True, False, False, False, True, True, True, False, True,
True, False, False, True, True, True, False, True, False,
False, True, False, True, True, True, True, True, False,
True, False, True, True, False, True, True, False, True,
False, False, True, False, True, True, False, True, False,
True, True, False, True, True, True, False, False, False,
True, False, False, True, True, True, True, False, True,
False])
which I convert to numpy.int64 using arr.astype(int) so I can do arithmetic with it.
But when I used the following code to convert it to an integer, it overflowed (and produced negative numbers, which I don't want).
The code uses this function (which is pure Python and won't have any integer overflow issue by itself):
def bool2int(x):
    y = 0
    for i, j in enumerate(x):
        y += j << i
    return y
If I run the code directly on the NumPy array (whether or not it is converted to int does not matter):
bool2int(arr)
-2393826705255337647
bool2int(arr.astype(int))
-2393826705255337647
while I need a positive integer. So, I used a list comprehension:
bool2int([int(x) for x in arr])
16052917368454213969
Obviously, the number represented by arr exceeds the capacity of fixed-precision integers (i.e. 2**63 - 1), so it cannot be used directly.
Is there a more direct way to achieve this than a list comprehension?
Edit:
For the theory of integer overflow in Python I used this source.
Using astype(int) seems to be working fine; the following code:
import numpy as np
test = np.array([True, False, False, False, True, True, True, False, True, True, False, False, True, True, True, False, True, False, False, True, False, True, True, True, True, True, False, True, False, True, True, False, True, True, False, True, False, False, True, False, True, True, False, True, False, True, True, False, True, True, True, False, False, False, True, False, False, True, True, True, True, False, True, False])
test_int = test.astype(int)
print(test_int)
print(test_int.sum())
Returns:
[1 0 0 0 1 1 1 0 1 1 0 0 1 1 1 0 1 0 0 1 0 1 1 1 1 1 0 1 0 1 1 0
1 1 0 1 0 0 1 0 1 1 0 1 0 1 1 0 1 1 1 0 0 0 1 0 0 1 1 1 1 0 1 0]
37
The overflow you are getting seems unlikely here, so I would look into that again; maybe you had an error somewhere else.
Edit
If you want to get a Python type instead of a NumPy object, just do:
test.astype(int).tolist()
One way of getting native Python type elements is .tolist(). Note that we can do this directly on the boolean array. Your code works fine with native Python bools.
>>> x = np.random.randint(0, 2, (100,)).astype(bool)
>>> x
array([ True, True, False, True, False, True, False, False, True,
False, False, True, True, False, False, False, True, False,
False, True, False, True, False, False, True, True, True,
True, True, True, True, False, False, False, False, False,
True, True, True, True, False, False, True, False, False,
False, False, True, False, True, True, False, False, True,
False, True, True, True, False, True, True, True, False,
True, True, True, True, False, True, True, True, False,
True, False, True, False, True, False, True, True, True,
False, False, True, True, True, True, True, False, False,
True, False, False, False, True, True, True, False, False, True], dtype=bool)
>>> bool2int(x)
-4925102932063228254
>>> bool2int(x.tolist())
774014555155191751582008547627L
As an added bonus it's actually faster.
>>> timeit(lambda:bool2int(x), number=1000)
0.24346303939819336
>>> timeit(lambda:bool2int(x.tolist()), number=1000)
0.010725975036621094
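The negative result most likely comes from the arithmetic staying in NumPy's fixed-width 64-bit integers once y is combined with NumPy scalars, whereas .tolist() keeps everything as Python ints. As a hedged alternative, assuming NumPy 1.17 or later (for the bitorder argument of np.packbits), the bits can be packed into bytes and read with int.from_bytes, which also produces an arbitrary-precision Python int. bool2int_packed is a hypothetical name; bit i of its result corresponds to arr[i], matching bool2int.

```python
import numpy as np

def bool2int(x):
    # Pure-Python version from the question: bit i is set when x[i] is truthy
    y = 0
    for i, j in enumerate(x):
        y += j << i
    return y

def bool2int_packed(arr):
    # Pack 8 booleans per byte; bitorder="little" puts arr[0] in the lowest bit
    packed = np.packbits(np.asarray(arr, dtype=bool), bitorder="little")
    # Bytes are also read least-significant first, so arr[i] ends up as bit i
    return int.from_bytes(packed.tobytes(), "little")

rng = np.random.default_rng(0)
arr = rng.integers(0, 2, size=100).astype(bool)

# Both results are plain, arbitrary-precision Python ints
print(bool2int(arr.tolist()) == bool2int_packed(arr))  # True
```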

Choose random elements from specific elements of an array

I have a 1D (NumPy) array with boolean values, for example:
x = [True, True, False, False, False, True, False, True, True, True, False, True, True, False]
The array contains 8 True values. I would like to keep, for example, exactly 3 (must be less than 8 in this case) as True values randomly from the 8 that exist. In other words I would like to randomly set 5 of those 8 True values as False.
A possible result can be:
x = [True, True, False, False, False, False, False, False, False, False, False, False, True, False]
How to implement it?
One approach would be -
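import numpy as np
x = np.array([True, True, False, False, False, True, False, True, True, True, False, True, True, False])
# Get the indices of True values
idx = np.flatnonzero(x)
# Pick len(idx) - 3 distinct indices at random and set those entries of x to False,
# leaving exactly 3 True values
x[np.random.choice(idx, len(idx)-3, replace=False)] = False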
Sample run -
# Input array
In [79]: x
Out[79]:
array([ True, True, False, False, False, True, False, True, True,
True, False, True, True, False], dtype=bool)
# Get indices
In [80]: idx = np.flatnonzero(x)
# Set (number of True indices - 3) of them to False
In [81]: x[np.random.choice(idx, len(idx)-3, replace=0)] = 0
# Verify output to have exactly three True values
In [82]: x
Out[82]:
array([ True, False, False, False, False, False, False, True, False,
False, False, True, False, False], dtype=bool)
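A variant of the np.flatnonzero approach above, sketched under the assumption that NumPy's Generator API (np.random.default_rng, NumPy 1.17 or later) is available: choose the indices to keep instead of the ones to drop, and build a fresh mask so the input is left unchanged. keep_k_true is a hypothetical helper name.

```python
import numpy as np

def keep_k_true(x, k, rng=None):
    """Return a copy of boolean array x in which only k randomly chosen True values remain."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=bool)
    idx = np.flatnonzero(x)  # positions of the existing True values
    if k > idx.size:
        raise ValueError("k cannot exceed the number of True values")
    keep = rng.choice(idx, size=k, replace=False)  # k distinct positions
    out = np.zeros_like(x)
    out[keep] = True
    return out

x = [True, True, False, False, False, True, False, True, True, True, False, True, True, False]
print(keep_k_true(x, 3, rng=np.random.default_rng(42)))
```

Passing an explicit rng makes the selection reproducible, which can help when testing.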
Alternatively, build an array with the desired number of True and False values, then shuffle it:
import random
def buildRandomArray(size, numberOfTrues):
    res = [False]*(size-numberOfTrues) + [True]*numberOfTrues
    random.shuffle(res)
    return res

How do I obtain a mask, reversing numpy.flatnonzero?

Given an arbitrary one-dimensional mask:
In [1]: import numpy as np
   ...: mask = np.array(np.random.randint(0, 2, 20), dtype=bool)
...: mask
Out[1]:
array([ True, False, True, False, False, True, False, True, True,
False, True, False, True, False, False, True, True, False,
True, True], dtype=bool)
We can obtain an array of the indices of the True elements of mask using np.flatnonzero:
In[2]: np.flatnonzero(mask)
Out[2]: array([ 0, 2, 5, 7, 8, 10, 12, 15, 16, 18, 19], dtype=int64)
But how do I reverse this process and go from _2 (the output of cell 2) back to a mask?
Create an all-false mask and then use numpy's index array functionality to assign the True entries for the mask.
In[3]: new_mask = np.zeros(20, dtype=bool)
...: new_mask
Out[3]:
array([False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False], dtype=bool)
In[4]: new_mask[_2] = True
...: new_mask
Out[4]:
array([ True, False, True, False, False, True, False, True, True,
False, True, False, True, False, False, True, True, False,
True, True], dtype=bool)
As a check we see that:
In[5]: np.flatnonzero(new_mask)
Out[5]: array([ 0, 2, 5, 7, 8, 10, 12, 15, 16, 18, 19], dtype=int64)
As expected, _5 == _2:
In[6]: np.all(_5 == _2)
Out[6]: True
You could use np.bincount:
In [304]: mask = np.random.binomial(1, 0.5, size=10).astype(bool); mask
Out[304]: array([ True, True, False, True, False, False, False, True, False, True], dtype=bool)
In [305]: idx = np.flatnonzero(mask); idx
Out[305]: array([0, 1, 3, 7, 9])
In [306]: np.bincount(idx, minlength=len(mask)).astype(bool)
Out[306]: array([ True, True, False, True, False, False, False, True, False, True], dtype=bool)
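Both answers rebuild a 1-D mask. np.flatnonzero works on the flattened array, so for an n-dimensional mask its indices are C-order flat positions. A minimal sketch of the inverse, assuming the original shape is known (mask_from_flat_indices is a hypothetical name), assigns through the .flat iterator:

```python
import numpy as np

def mask_from_flat_indices(idx, shape):
    """Inverse of np.flatnonzero for a boolean mask of the given shape."""
    mask = np.zeros(shape, dtype=bool)
    # .flat indexes the array as if it were flattened in C order,
    # which matches the positions returned by np.flatnonzero
    mask.flat[np.asarray(idx, dtype=np.intp)] = True
    return mask

rng = np.random.default_rng(0)
original = rng.integers(0, 2, size=(3, 4)).astype(bool)
rebuilt = mask_from_flat_indices(np.flatnonzero(original), original.shape)
print(np.array_equal(original, rebuilt))  # True
```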
