Starting from an array:
a = np.array([1,1,1,2,3,4,5,5])
and a filter:
m = np.array([1,5])
I am now building a mask with:
b = np.in1d(a,m)
that correctly returns:
array([ True, True, True, False, False, False, True, True], dtype=bool)
I would need to limit the number of True values per unique value to a maximum of 2, so that 1 is masked only two times instead of three. The resulting mask could then look like any of the following (the order of the first True values does not matter):
array([ True, True, False, False, False, False, True, True], dtype=bool)
or
array([ True, False, True, False, False, False, True, True], dtype=bool)
or
array([ False, True, True, False, False, False, True, True], dtype=bool)
Ideally this is a kind of "random" masking with a cap on the frequency of each value. So far I have tried to randomly select the original unique elements of the array, but the mask then selects the True values regardless of their frequency.
For a generic case with unsorted input array, here's one approach based on np.searchsorted -
N = 2 # Parameter to decide how many duplicates are allowed
sortidx = a.argsort()
# Positions (in sorted order) of the first N slots for each value in m
idx = np.searchsorted(a,m,sorter=sortidx)[:,None] + np.arange(N)
# Actual count of each m value in a, capped at N
lim_counts = (a[:,None] == m).sum(0).clip(max=N)
# Keep only as many slots per value as actually exist
idx_clipped = idx[lim_counts[:,None] > np.arange(N)]
# Map the kept sorted positions back to the original order
out = np.in1d(np.arange(a.size),idx_clipped)[sortidx.argsort()]
Sample run -
In [37]: a
Out[37]: array([5, 1, 4, 2, 1, 3, 5, 1])
In [38]: m
Out[38]: [1, 2, 5]
In [39]: N
Out[39]: 2
In [40]: out
Out[40]: array([ True, True, False, True, True, False, True, False], dtype=bool)
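For reference, here is a self-contained version of the same approach, wrapped in a helper function (limit_mask is just an illustrative name, not from the original) so it can be run directly; it reproduces the sample run above.
import numpy as np

def limit_mask(a, m, N):
    # Mask elements of a that appear in m, keeping at most N Trues per value
    sortidx = a.argsort()
    idx = np.searchsorted(a, m, sorter=sortidx)[:, None] + np.arange(N)
    lim_counts = (a[:, None] == m).sum(0).clip(max=N)
    idx_clipped = idx[lim_counts[:, None] > np.arange(N)]
    return np.in1d(np.arange(a.size), idx_clipped)[sortidx.argsort()]

a = np.array([5, 1, 4, 2, 1, 3, 5, 1])
m = np.array([1, 2, 5])
print(limit_mask(a, m, N=2))
# [ True  True False  True  True False  True False]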
Related
Take the following example. I have an array test and want to get a boolean mask with True for all elements that are equal to an element of ref.
import numpy as np
test = np.array([[2, 3, 1, 0], [5, 4, 2, 3], [6, 7, 5 ,4]])
ref = np.array([3, 4, 5])
I am looking for something equivalent to
mask = (test == ref[0]) | (test == ref[1]) | (test == ref[2])
which in this case should yield
>>> print(mask)
[[False, True, False, False],
[ True, True, False, True],
[False, False, True, True]]
but without having to resort to any loops.
NumPy comes with a function, np.isin, that does exactly this:
np.isin(test, ref)
which returns
array([[False, True, False, False],
[ True, True, False, True],
[False, False, True, True]])
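If you also need the complementary mask, np.isin has an invert flag; a minimal sketch:
import numpy as np

test = np.array([[2, 3, 1, 0], [5, 4, 2, 3], [6, 7, 5, 4]])
ref = np.array([3, 4, 5])

mask = np.isin(test, ref)                   # True where a test value is in ref
not_mask = np.isin(test, ref, invert=True)  # complement, equivalent to ~mask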
You can use numpy broadcasting:
mask = (test[:,None] == ref[:,None]).any(1)
output:
array([[False, True, False, False],
[ True, True, False, True],
[False, False, True, True]])
NB: this is faster than numpy.isin, but it creates an intermediate array of shape (X, R, Y), where (X, Y) is the shape of test and R is the length of ref, so it will consume a lot of memory on very large arrays.
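An equivalent way to write the broadcast, which some find easier to read, is to append the new axis to test instead; a small sketch (same result, same intermediate size):
import numpy as np

test = np.array([[2, 3, 1, 0], [5, 4, 2, 3], [6, 7, 5, 4]])
ref = np.array([3, 4, 5])

# test[..., None] has shape (3, 4, 1) and broadcasts against ref of shape (3,)
# to an intermediate of shape (3, 4, 3) before reducing the last axis
mask = (test[..., None] == ref).any(-1)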
I have a numpy boolean selector array which I can apply to array a (it is not actually random in the problem domain; random is just convenient for the example). But I actually want to select using only the first n True entries of selector (n=3 in the example). So given selector plus a parameter n, how do I generate select_first_few using numpy operations, avoiding an iterative loop?
>>> import numpy as np
>>> selector = np.random.random(10) > 0.5
>>> a = np.arange(10)
>>> selector
array([ True, False, True, True, True, False, True, False, True,
False])
>>> chosen, others = a[selector], a[~selector]
>>> chosen
array([0, 2, 3, 4, 6, 8])
>>> others
array([1, 5, 7, 9])
>>> select_first_few = np.array([ True, False, True, True, False, False, False, False, False,
... False])
>>> chosen_few, tough_luck = a[select_first_few], a[~select_first_few]
>>> chosen_few
array([0, 2, 3])
>>> tough_luck
array([1, 4, 5, 6, 7, 8, 9])
Approach #1
One approach would be to use cumsum and argmax to find where the count of True values exceeds n, and then slice from there onwards to set False -
In [40]: n = 3
In [41]: selector
Out[41]:
array([ True, False, True, True, True, False, True, False, True,
False])
In [42]: selector[(selector.cumsum()>n).argmax():] = 0
In [43]: selector # your select_first_few mask
Out[43]:
array([ True, False, True, True, False, False, False, False, False,
False])
Then, use this new selector to select and de-select elements off the input array.
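A minimal sketch of that final step; note the guard for the case where selector has n or fewer True entries, since argmax on an all-False array returns 0 and the slice would otherwise clear everything:
import numpy as np

a = np.arange(10)
selector = np.array([ True, False,  True,  True,  True, False,  True, False,  True, False])

n = 3
cut = selector.cumsum() > n
if cut.any():                       # guard: only truncate if more than n Trues
    selector[cut.argmax():] = False

chosen_few, tough_luck = a[selector], a[~selector]
# chosen_few  -> array([0, 2, 3])
# tough_luck  -> array([1, 4, 5, 6, 7, 8, 9])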
Approach #2
Another approach would be to mask-the-mask -
n = 3
C = np.count_nonzero(selector)
newmask = np.zeros(C, dtype=bool)
newmask[:n] = 1
selector[selector] = newmask
Sample run -
In [62]: selector
Out[62]:
array([ True, False, True, True, True, False, True, False, True,
False])
In [63]: n = 3
...: C = np.count_nonzero(selector)
...: newmask = np.zeros(C, dtype=bool)
...: newmask[:n] = 1
...: selector[selector] = newmask
In [64]: selector
Out[64]:
array([ True, False, True, True, False, False, False, False, False,
False])
Or make it shorter with on-the-fly concatenation of booleans -
n = 3
C = np.count_nonzero(selector)
# Note: assumes C >= n; otherwise np.zeros(C-n, ...) would get a negative size
selector[selector] = np.r_[np.ones(n,dtype=bool),np.zeros(C-n,dtype=bool)]
Approach #3
The simplest one -
selector &= selector.cumsum()<=n
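A quick check of this one-liner on the selector from the question (a minimal sketch):
import numpy as np

selector = np.array([ True, False,  True,  True,  True, False,  True, False,  True, False])
n = 3
selector &= selector.cumsum() <= n
# selector -> array([ True, False,  True,  True, False, False, False, False, False, False])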
Get all the chosen indices, slice off the first few, and then use a list comprehension to retrieve the data at those indices.
import numpy as np

selector = np.random.random(10) > 0.5
data = np.arange(10)

# np.where returns a tuple of index arrays; take the first element
chosen_indices = np.where(selector)[0]

# select the first 3 chosen indices
chosen_few_indices = chosen_indices[:3]
chosen_few = [data[i] for i in chosen_few_indices]

# if you are also interested in the data that was not chosen
not_chosen_indices = list(set(range(len(data))) - set(chosen_few_indices))
# proceed ...
I need your help. I want to walk over a three-dimensional array and check, along one direction, the distance between two elements: if it is smaller than a threshold, the value should be True. As soon as the distance gets higher than that threshold, the rest of the values along this dimension should be set to False.
Here is an example in 1D:
a = np.array([1,2,2,1,2,5,2,7,1,2])
b = magic_check_fct(a, threshold=3, axis=0)
print(b)
# The expected output is :
> b = [True, True, True, True, True, False, False, False, False, False]
For comparison, a simple check with a <= threshold would give the following, which is not the expected output:
> b = [True, True, True, True, True, False, True, False, True, True]
Is there an efficient way to do this with numpy? This whole thing is performance-critical.
Thanks for your help!
One way would be to use np.minimum.accumulate along that axis -
np.minimum.accumulate(a<=threshold,axis=0)
Sample run -
In [515]: a
Out[515]: array([1, 2, 2, 1, 2, 5, 2, 7, 1, 2])
In [516]: threshold = 3
In [518]: print(np.minimum.accumulate(a<=threshold,axis=0))
[ True True True True True False False False False False]
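The same idea extends directly to the 3D case in the question: accumulate along whichever axis you want to walk. A small sketch (the shape and axis=0 are just assumptions for illustration):
import numpy as np

a3 = np.random.randint(0, 10, size=(4, 3, 2))
threshold = 3
b = np.minimum.accumulate(a3 <= threshold, axis=0)
# b[i, j, k] stays True only while every element up to index i along axis 0
# at position (j, k) has satisfied the condition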
Another approach for 1D arrays, with thresholding and then slicing -
out = a <= threshold
if not out.all():
    out[out.argmin():] = 0
Here's one more approach, using the first discrete difference:
In [126]: threshold = 3
In [127]: mask = np.diff(a, prepend=a[0]) < threshold
In [128]: mask[mask.argmin():] = False
In [129]: mask
Out[129]:
array([ True, True, True, True, True, False, False, False, False,
False])
I have a 1D numpy array with boolean values. For example:
x = [True, True, False, False, False, True, False, True, True, True, False, True, True, False]
The array contains 8 True values. I would like to keep exactly 3 of them (which must be fewer than 8 in this case) True, chosen at random from the 8 that exist. In other words, I would like to randomly set 5 of those 8 True values to False.
A possible result can be:
x = [True, True, False, False, False, False, False, False, False, False, False, False, True, False]
How to implement it?
One approach would be -
# Get the indices of the True values
idx = np.flatnonzero(x)
# Randomly pick len(idx)-3 of those indices (without replacement)
# and set them to False, leaving exactly 3 Trues
x[np.random.choice(idx, len(idx)-3, replace=False)] = False
Sample run -
# Input array
In [79]: x
Out[79]:
array([ True, True, False, False, False, True, False, True, True,
True, False, True, True, False], dtype=bool)
# Get indices
In [80]: idx = np.flatnonzero(x)
# Randomly set all but 3 of the True positions to False
In [81]: x[np.random.choice(idx, len(idx)-3, replace=False)] = False
# Verify output to have exactly three True values
In [82]: x
Out[82]:
array([ True, False, False, False, False, False, False, True, False,
False, False, True, False, False], dtype=bool)
Build an array with the desired numbers of True and False values, then just shuffle it:
import random
def buildRandomArray(size, numberOfTrues):
    res = [False]*(size-numberOfTrues) + [True]*numberOfTrues
    random.shuffle(res)
    return res
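A numpy-flavoured sketch of the same idea, in case you want the result as a boolean array directly (the function name is just illustrative):
import numpy as np

def build_random_mask(size, number_of_trues):
    # Start with exactly number_of_trues Trues at the front, then shuffle in place
    res = np.zeros(size, dtype=bool)
    res[:number_of_trues] = True
    np.random.shuffle(res)
    return res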
As the title says, I want to make np.where() return a coordinate multiple times if it comes across the same value more than once. Example:
import numpy as np
a = 2*np.arange(5)
b = [8,8]
condition = np.isin(a,b)
print(np.where(condition))
>>> (array([4], dtype=int64),)
It returns [4] because a[4] = 8, but since b has two 8s, I want it to return [4, 4]. Is there a way to do this without iterating through each value of b?
With your a and b:
In [687]: condition = np.isin(a,b)
In [688]: condition
Out[688]: array([False, False, False, False, True], dtype=bool)
np.where just tells us the index of that one True value.
Switch the test, and you find that both items of b are in a.
In [697]: np.isin(b,a)
Out[697]: array([ True, True], dtype=bool)
You could use a simple broadcasted comparison:
In [700]: a[:,None]==b
Out[700]:
array([[False, False],
[False, False],
[False, False],
[False, False],
[ True, True]], dtype=bool)
In [701]: np.where(a[:,None]==b)
Out[701]: (array([4, 4], dtype=int32), array([0, 1], dtype=int32))
np.isin (and the np.in1d it uses) only cares about membership, not how many times a value occurs, but you do. So testing the arrays with == directly gives you more control.
Test whether both values in b match the same element of a:
In [703]: (a[:,None]==b).all(axis=1)
Out[703]: array([False, False, False, False, True], dtype=bool)
Test whether any value matches, which is essentially what in1d does:
In [704]: (a[:,None]==b).any(axis=1)
Out[704]: array([False, False, False, False, True], dtype=bool)
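To get exactly the [4, 4] the question asks for, take the row part of that np.where result; a minimal sketch:
import numpy as np

a = 2*np.arange(5)
b = [8, 8]

rows = np.where(a[:, None] == b)[0]
print(rows)   # [4 4] -- one entry per matching element of b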