Choose random elements from specific elements of an array - python

I have a 1D (NumPy) array with boolean values. For example:
x = [True, True, False, False, False, True, False, True, True, True, False, True, True, False]
The array contains 8 True values. I would like to keep exactly 3 of them (any number less than 8 in this case), chosen at random, as True. In other words, I would like to randomly set 5 of those 8 True values to False.
A possible result can be:
x = [True, True, False, False, False, False, False, False, False, False, False, False, True, False]
How to implement it?

One approach would be:
import numpy as np
x = np.array(x)  # make sure x is a NumPy boolean array, not a list
# Get the indices of the True values
idx = np.flatnonzero(x)
# Pick len(idx)-3 distinct indices at random and set them to False,
# which leaves exactly 3 True values
x[np.random.choice(idx, len(idx)-3, replace=False)] = False
Sample run:
# Input array
In [79]: x
Out[79]:
array([ True, True, False, False, False, True, False, True, True,
True, False, True, True, False], dtype=bool)
# Get indices
In [80]: idx = np.flatnonzero(x)
# Set (number of True indices - 3) of them to False
In [81]: x[np.random.choice(idx, len(idx)-3, replace=False)] = False
# Verify that the output has exactly three True values
In [82]: x
Out[82]:
array([ True, False, False, False, False, False, False, True, False,
False, False, True, False, False], dtype=bool)
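The same idea can be wrapped in a small reusable function. This is a sketch, not part of the answer above: `keep_n_true` is a hypothetical name, it uses NumPy's `np.random.default_rng()` Generator in place of `np.random.choice`, and it picks the `n` positions to keep (rather than the ones to clear) and returns a new array instead of modifying `x` in place.

```python
import numpy as np


def keep_n_true(mask, n, rng=None):
    """Return a copy of `mask` where only `n` randomly chosen True entries stay True.

    `keep_n_true` is a hypothetical helper name, not a NumPy function.
    """
    rng = np.random.default_rng() if rng is None else rng
    mask = np.asarray(mask, dtype=bool)
    idx = np.flatnonzero(mask)  # positions of the existing True values
    if n > idx.size:
        raise ValueError("n cannot exceed the number of True values")
    keep = rng.choice(idx, size=n, replace=False)  # n distinct positions to keep
    out = np.zeros_like(mask)  # all-False array with the same shape and dtype
    out[keep] = True
    return out


x = [True, True, False, False, False, True, False, True, True, True, False, True, True, False]
result = keep_n_true(x, 3, rng=np.random.default_rng(0))
print(result.sum())  # 3
```

Passing a seeded Generator makes the choice reproducible; omitting `rng` gives a fresh random selection on each call.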

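Build an array with the desired numbers of True and False values, then just shuffle it. Note that this places the True values at random positions anywhere in the array, not only where x was originally True.
import random
def buildRandomArray(size, numberOfTrues):
    # all the False values followed by numberOfTrues True values
    res = [False]*(size-numberOfTrues) + [True]*numberOfTrues
    # shuffle in place so the True values end up at random positions
    random.shuffle(res)
    return res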
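A NumPy variant of the build-and-shuffle idea, as a sketch assuming NumPy is available: `build_random_mask` is a hypothetical name, and instead of `random.shuffle` it compares a random permutation (`Generator.permutation`) with the number of True values, which yields the same kind of shuffled mask. Writing the result only into the positions where `x` is True keeps the surviving True values inside the original ones, as the question asks.

```python
import numpy as np


def build_random_mask(size, number_of_trues, rng=None):
    """Boolean array of length `size` with exactly `number_of_trues` True values
    at uniformly random positions (`build_random_mask` is a hypothetical name)."""
    rng = np.random.default_rng() if rng is None else rng
    # A random permutation of 0..size-1 has exactly `number_of_trues` entries
    # smaller than `number_of_trues`, and their positions are random.
    return rng.permutation(size) < number_of_trues


# Restricting the shuffled mask to the positions that are currently True:
# exactly 3 of the 8 existing True values survive.
x = np.array([True, True, False, False, False, True, False, True, True, True, False, True, True, False])
out = np.zeros_like(x)
out[x] = build_random_mask(int(x.sum()), 3)
```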

Related

How to get index of numpy multidimensional array in reverse order?

For example, I have this np.array:
[[True, True, False, False],
 [True, False, True, False],
 [False, True, False, True],
 [False, False, True, True]]
I want to get the index of the last True value in each row, i.e. the first True found when counting from the back of the row. So the expected output is a (4,) array with one entry per row:
[1, # the last True is at index 1
 2, # the last True is at index 2
 3, # the last True is at index 3
 3] # the last True is at index 3
import numpy as np
a = np.array(
    [[True, True, False, False],
     [True, False, True, False],
     [False, True, False, True],
     [False, False, True, True]]
)
# index of the last True in each row -> array([1, 2, 3, 3])
idx = a.shape[1] - 1 - np.argmax(a[:,::-1], axis=1)
np.argmax() returns the index of the first occurrence of the highest value (for each row, with axis=1). Since True is equivalent to 1 and False to 0, it records the index of the first True value. So reverse the rows to find the "last" one, then subtract that result from a.shape[1] - 1 to turn the position in the reversed row back into an index in the original row.
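One caveat with the argmax approach: for a row with no True at all, np.argmax returns 0, so the formula reports the last column. A minimal sketch that guards against this, assuming -1 is an acceptable marker for "no True" (`last_true_index` and its `fill` parameter are hypothetical):

```python
import numpy as np


def last_true_index(a, fill=-1):
    """Index of the last True in each row of a 2-D boolean array.

    Rows with no True get `fill` (a hypothetical default of -1)."""
    a = np.asarray(a, dtype=bool)
    n_cols = a.shape[1]
    # argmax on the reversed rows finds the first True counted from the back
    idx = n_cols - 1 - np.argmax(a[:, ::-1], axis=1)
    # argmax returns 0 for an all-False row, which would wrongly map to n_cols - 1
    return np.where(a.any(axis=1), idx, fill)


a = np.array([[True, True, False, False],
              [True, False, True, False],
              [False, True, False, True],
              [False, False, True, True],
              [False, False, False, False]])
print(last_true_index(a).tolist())  # [1, 2, 3, 3, -1]
```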
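In plain Python you can reverse a row with row.reverse() (or row[::-1]) and find an element with row.index(True). In NumPy you can use numpy.flip(row) to reverse a row and numpy.where(row == True) to find the positions of an element in it. Because the search runs on the reversed row, the position found has to be converted back with len(row) - 1 - position: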
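import numpy as np
x = np.array([[True, True, False, False],
              [True, False, True, False],
              [False, True, False, True],
              [False, False, True, True]])
result = []
for row in x:
    flipped = np.flip(row)
    # positions of the True values in the reversed row
    index = np.where(flipped == True)
    # map the first position in the reversed row back to the original row
    result.append(int(len(row) - 1 - index[0][0]))
print(result)  # [1, 2, 3, 3]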
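For plain Python lists, the reverse-and-search idea can be sketched with slicing and list.index (the helper name `last_true_index_py` is hypothetical). Using row[::-1] instead of row.reverse() avoids modifying the row in place:

```python
rows = [[True, True, False, False],
        [True, False, True, False],
        [False, True, False, True],
        [False, False, True, True]]


def last_true_index_py(row):
    """Index of the last True in a plain Python list (hypothetical helper).

    row[::-1] is a reversed copy, so the caller's list is left untouched;
    list.index raises ValueError if the row contains no True."""
    return len(row) - 1 - row[::-1].index(True)


result = [last_true_index_py(r) for r in rows]
print(result)  # [1, 2, 3, 3]
```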

How to generate these lists of True and False?

I am looping through a list of 3 items, something like:
for i in range(3):
and trying to produce the following lists on each respective iteration:
[True, True, False, False, False, False]
[False, False, True, True, False, False]
[False, False, False, False, True, True]
What would be a good way in Python to do this?
Here's one way:
>>> for i in range(3):
... print([(x // 2) == i for x in range(6)])
...
[True, True, False, False, False, False]
[False, False, True, True, False, False]
[False, False, False, False, True, True]
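If NumPy is an option, all three lists can also be built at once. This sketch (not from the answers here) takes each row of a boolean identity matrix and repeats every column twice with np.repeat; `n_blocks` and `width` are illustrative parameter names:

```python
import numpy as np

n_blocks, width = 3, 2  # 3 iterations, 2 consecutive True values each
# Row i of the identity matrix is True only in column i; repeating every column
# `width` times stretches that single True into a run of `width` True values.
masks = np.repeat(np.eye(n_blocks, dtype=bool), width, axis=1)
for i in range(n_blocks):
    print(masks[i].tolist())
```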
Try like this:
k = 0
for i in range(3):
    # Other tasks
    myList = [False for x in range(4)]
    # insert two True values at position k; the slice assignment grows the list to 6
    myList[k:k] = [True,True]
    print(myList)
    k += 2
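Alternatively, rotate a single list right by two positions on each iteration:
L = [False, False, False, False, True, True]
for _ in range(3):
    # move the last two items to the front
    L = L[-2:] + L[:4]
    print(L)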
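The rotation can also be done in place with collections.deque, whose rotate(n) method moves the last n items to the front. A minimal sketch; the `outputs` list is only there to collect the results:

```python
from collections import deque

L = deque([False, False, False, False, True, True])
outputs = []
for _ in range(3):
    L.rotate(2)  # move the last two items to the front, in place
    outputs.append(list(L))
    print(outputs[-1])
```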

Python: the fastest way to find the indexes of numbers in a np.array

What is the fastest way in Python to find the indexes of given numbers in a np.array?
Suppose we have a list of numbers from 0 to 20, and we want to know the indexes of the numbers 2 and 5.
The canonical way would be to use NumPy's where function:
import numpy as np
a = np.array(range(20))
# returns a tuple with one index array per dimension: (array([2, 5]),)
np.where((a == 2) | (a == 5))
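Note that to combine the two terms (a == 2) and (a == 5) we need the element-wise (bitwise) or operator |, not the keyword or. The reason is that both (a == 2) and (a == 5) return a NumPy array of dtype('bool'), and or would try to reduce each whole array to a single truth value, which raises a ValueError. The parentheses are required too, because | binds more tightly than ==: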
>>> a == 2
array([False, False, True, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False])
>>> (a == 5)
array([False, False, False, False, False, True, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False])
>>> (a == 2) | (a==5)
array([False, False, True, False, False, True, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False])
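When there are more than a couple of target values, chaining | gets unwieldy. A sketch using np.isin, which builds the same boolean mask from a list of values; whether it is faster than the chained comparisons depends on the sizes involved, so it is worth timing on real data:

```python
import numpy as np

a = np.array(range(20))
targets = [2, 5]  # values to look for; the list can hold any number of them

# np.isin builds the same boolean mask as (a == 2) | (a == 5),
# without writing one comparison per value
mask = np.isin(a, targets)
idx = np.flatnonzero(mask)  # same as np.where(mask)[0] for a 1-D array
print(idx)  # [2 5]
```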

Converting numpy array to pure python integer to avoid integer overflow

I asked this question before and it was downvoted heavily. Since hardly anyone looks at a question with three downvotes again, I am reposting it to make clear that I am interested in an actual answer (if there is one).
Problem statement:
I am in a situation where I need the arbitrary-precision feature of pure Python integers. At some point in my code I have a NumPy array of booleans, something like:
arr
array([ True, False, False, False, True, True, True, False, True,
True, False, False, True, True, True, False, True, False,
False, True, False, True, True, True, True, True, False,
True, False, True, True, False, True, True, False, True,
False, False, True, False, True, True, False, True, False,
True, True, False, True, True, True, False, False, False,
True, False, False, True, True, True, True, False, True,
False])
which I convert to numpy.int64 with arr.astype(int) so that I can do arithmetic on it.
But when I used the code below to convert it to a single integer, it overflowed (and produced negative numbers, which I don't want).
The code uses this function (which is pure Python and has no integer overflow issue by itself):
def bool2int(x):
    # bit i of the result is x[i]
    y = 0
    for i,j in enumerate(x):
        y += j<<i
    return y
If I run it directly on the NumPy array (converted to int or not, it does not matter):
bool2int(arr)
-2393826705255337647
bool2int(arr.astype(int))
-2393826705255337647
whereas I need a positive integer. So I used a list comprehension:
bool2int([int(x) for x in arr])
16052917368454213969
Obviously, the number represented by arr exceeds the capacity of fixed-precision integers (i.e. 2^63 - 1), so I cannot use them directly.
Is there a more direct way to achieve this than a list comprehension?
Edit:
For the theory of integer overflow in Python I used this source.
Using astype(int) seems to work fine; the following code:
import numpy as np
test = np.array([True, False, False, False, True, True, True, False, True, True, False, False, True, True, True, False, True, False, False, True, False, True, True, True, True, True, False, True, False, True, True, False, True, True, False, True, False, False, True, False, True, True, False, True, False, True, True, False, True, True, True, False, False, False, True, False, False, True, True, True, True, False, True, False])
test_int = test.astype(int)
print(test_int)
print(test_int.sum())
Returns:
[1 0 0 0 1 1 1 0 1 1 0 0 1 1 1 0 1 0 0 1 0 1 1 1 1 1 0 1 0 1 1 0
1 1 0 1 0 0 1 0 1 1 0 1 0 1 1 0 1 1 1 0 0 0 1 0 0 1 1 1 1 0 1 0]
37
The overflow you are getting seems unlikely here, so I would look into it again; maybe the error is somewhere else.
Edit
If you want to get Python types instead of a NumPy object, just do:
test.astype(int).tolist()
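One way of getting elements of native Python types is .tolist(). Note that we can do this directly on the boolean array. Your code works fine with native Python bools, because << and += then operate on arbitrary-precision Python ints, whereas on NumPy scalars they use fixed-width integer arithmetic that wraps around.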
>>> x = np.random.randint(0, 2, (100,)).astype(bool)
>>> x
array([ True, True, False, True, False, True, False, False, True,
False, False, True, True, False, False, False, True, False,
False, True, False, True, False, False, True, True, True,
True, True, True, True, False, False, False, False, False,
True, True, True, True, False, False, True, False, False,
False, False, True, False, True, True, False, False, True,
False, True, True, True, False, True, True, True, False,
True, True, True, True, False, True, True, True, False,
True, False, True, False, True, False, True, True, True,
False, False, True, True, True, True, True, False, False,
True, False, False, False, True, True, True, False, False, True], dtype=bool)
>>> bool2int(x)
-4925102932063228254
>>> bool2int(x.tolist())
774014555155191751582008547627L
As an added bonus, it's actually faster:
>>> from timeit import timeit
>>> timeit(lambda:bool2int(x), number=1000)
0.24346303939819336
>>> timeit(lambda:bool2int(x.tolist()), number=1000)
0.010725975036621094
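As a further option that skips the Python-level loop entirely, the bits can be packed into bytes with np.packbits and converted with int.from_bytes, which returns an arbitrary-precision Python int. This is a sketch assuming NumPy 1.17 or later (for the bitorder argument); `bool_array_to_int` is a hypothetical name, and bool2int is repeated from the question only to check the result:

```python
import numpy as np


def bool2int(x):
    # pure-Python version from the question: bit i of the result is x[i]
    y = 0
    for i, j in enumerate(x):
        y += j << i
    return y


def bool_array_to_int(arr):
    """Hypothetical helper: pack a 1-D boolean array into one Python int,
    with arr[0] as the least significant bit (same convention as bool2int)."""
    # bitorder="little" puts arr[0] into the lowest bit of the first byte;
    # the last byte is zero-padded, which does not change the value
    packed = np.packbits(np.asarray(arr, dtype=bool), bitorder="little")
    # int.from_bytes returns an arbitrary-precision Python int, so no overflow
    return int.from_bytes(packed.tobytes(), "little")


arr = np.random.default_rng(0).integers(0, 2, size=100).astype(bool)
print(bool_array_to_int(arr) == bool2int(arr.tolist()))  # True
```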

numpy mask array limiting the frequency of masked values

Starting from an array:
a = np.array([1,1,1,2,3,4,5,5])
and a filter:
m = np.array([1,5])
I am now building a mask with:
b = np.in1d(a,m)
that correctly returns:
array([ True, True, True, False, False, False, True, True], dtype=bool)
I need to limit the number of True values per unique value to a maximum of 2, so that 1 is masked only two times instead of three. The resulting mask would then look like one of these (which occurrences of 1 stay True does not matter):
array([ True, True, False, False, False, False, True, True], dtype=bool)
or
array([ True, False, True, False, False, False, True, True], dtype=bool)
or
array([ False, True, True, False, False, False, True, True], dtype=bool)
Ideally this is a kind of "random" masking with a limited frequency per value. So far I tried to randomly select among the original unique elements of the array, but the mask still selects all the True values regardless of their frequency.
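For a generic case with an unsorted input array, here's one approach based on np.searchsorted:
import numpy as np
N = 2 # Parameter to decide how many duplicates are allowed
# Indices that would sort a
sortidx = a.argsort()
# For each value in m, the sorted positions of its first N occurrences
idx = np.searchsorted(a,m,sorter=sortidx)[:,None] + np.arange(N)
# How many occurrences of each value in m actually exist, capped at N
lim_counts = (a[:,None] == m).sum(0).clip(max=N)
# Keep only the positions that correspond to real occurrences
idx_clipped = idx[lim_counts[:,None] > np.arange(N)]
# Build the mask in sorted order, then map it back to the original order
out = np.in1d(np.arange(a.size),idx_clipped)[sortidx.argsort()]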
Sample run -
In [37]: a
Out[37]: array([5, 1, 4, 2, 1, 3, 5, 1])
In [38]: m
Out[38]: [1, 2, 5]
In [39]: N
Out[39]: 2
In [40]: out
Out[40]: array([ True, True, False, True, True, False, True, False], dtype=bool)
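Since the question asks for "random" masking, here is a simpler, loop-based sketch that picks the kept occurrences at random (`limit_mask` is a hypothetical helper; it uses NumPy's Generator API). It loops over the distinct values of m, so it gives up the full vectorization of the searchsorted answer in exchange for readability:

```python
import numpy as np


def limit_mask(a, m, n, rng=None):
    """Mark the elements of `a` that are in `m`, but keep at most `n` True
    values per distinct value, chosen at random (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    a = np.asarray(a)
    out = np.zeros(a.shape, dtype=bool)
    for value in np.unique(m):
        idx = np.flatnonzero(a == value)  # positions holding this value
        if idx.size > n:
            # too many occurrences: keep a random subset of size n
            idx = rng.choice(idx, size=n, replace=False)
        out[idx] = True
    return out


a = np.array([1, 1, 1, 2, 3, 4, 5, 5])
m = np.array([1, 5])
print(limit_mask(a, m, 2))
```

Each call may keep a different pair of the three 1s; pass a seeded Generator as `rng` for reproducible output.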

