Starting from an array:
a = np.array([1,1,1,2,3,4,5,5])
and a filter:
m = np.array([1,5])
I am now building a mask with:
b = np.in1d(a,m)
that correctly returns:
array([ True, True, True, False, False, False, True, True], dtype=bool)
I would need to limit the number of True values per unique value to a maximum of 2, so that 1 is masked only two times instead of three. The resulting mask could then look like any of the following (the order of the first True values does not matter):
array([ True, True, False, False, False, False, True, True], dtype=bool)
or
array([ True, False, True, False, False, False, True, True], dtype=bool)
or
array([ False, True, True, False, False, False, True, True], dtype=bool)
Ideally this is a kind of "random" masking with a cap on the frequency of each value. So far I have tried to randomly select the original unique elements of the array, but the mask then selects the True values regardless of their frequency.
For a generic case with unsorted input array, here's one approach based on np.searchsorted -
N = 2 # Parameter to decide how many duplicates are allowed
sortidx = a.argsort()
# Positions (in sorted order) of the first N slots for each value in m
idx = np.searchsorted(a,m,sorter=sortidx)[:,None] + np.arange(N)
# Actual count of each m value in a, capped at N
lim_counts = (a[:,None] == m).sum(0).clip(max=N)
# Keep only as many slots per value as actually exist
idx_clipped = idx[lim_counts[:,None] > np.arange(N)]
# Map the kept sorted positions back to the original order
out = np.in1d(np.arange(a.size),idx_clipped)[sortidx.argsort()]
Sample run -
In [37]: a
Out[37]: array([5, 1, 4, 2, 1, 3, 5, 1])
In [38]: m
Out[38]: [1, 2, 5]
In [39]: N
Out[39]: 2
In [40]: out
Out[40]: array([ True, True, False, True, True, False, True, False], dtype=bool)
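For reference, here is a self-contained version of the same approach, wrapped in a helper function (limit_mask is just an illustrative name, not from the original) so it can be run directly; it reproduces the sample run above.
import numpy as np

def limit_mask(a, m, N):
    # Mask elements of a that appear in m, keeping at most N Trues per value
    sortidx = a.argsort()
    idx = np.searchsorted(a, m, sorter=sortidx)[:, None] + np.arange(N)
    lim_counts = (a[:, None] == m).sum(0).clip(max=N)
    idx_clipped = idx[lim_counts[:, None] > np.arange(N)]
    return np.in1d(np.arange(a.size), idx_clipped)[sortidx.argsort()]

a = np.array([5, 1, 4, 2, 1, 3, 5, 1])
m = np.array([1, 2, 5])
print(limit_mask(a, m, N=2))
# [ True  True False  True  True False  True False]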
Related
Take the following example. I have an array test and want to get a boolean mask with True for all elements that are equal to an element of ref.
import numpy as np
test = np.array([[2, 3, 1, 0], [5, 4, 2, 3], [6, 7, 5 ,4]])
ref = np.array([3, 4, 5])
I am looking for something equivalent to
mask = (test == ref[0]) | (test == ref[1]) | (test == ref[2])
which in this case should yield
>>> print(mask)
[[False, True, False, False],
[ True, True, False, True],
[False, False, True, True]]
but without having to resort to any loops.
NumPy comes with a function, np.isin, that does exactly this:
np.isin(test, ref)
which returns
array([[False, True, False, False],
[ True, True, False, True],
[False, False, True, True]])
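If you also need the complementary mask, np.isin has an invert flag; a minimal sketch:
import numpy as np

test = np.array([[2, 3, 1, 0], [5, 4, 2, 3], [6, 7, 5, 4]])
ref = np.array([3, 4, 5])

mask = np.isin(test, ref)                   # True where a test value is in ref
not_mask = np.isin(test, ref, invert=True)  # complement, equivalent to ~mask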
You can use numpy broadcasting:
mask = (test[:,None] == ref[:,None]).any(1)
output:
array([[False, True, False, False],
[ True, True, False, True],
[False, False, True, True]])
NB: this is faster than numpy.isin, but it creates an intermediate array of shape (X, R, Y), where (X, Y) is the shape of test and R is the length of ref, so it will consume a lot of memory on very large arrays.
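An equivalent way to write the broadcast, which some find easier to read, is to append the new axis to test instead; a small sketch (same result, same intermediate size):
import numpy as np

test = np.array([[2, 3, 1, 0], [5, 4, 2, 3], [6, 7, 5, 4]])
ref = np.array([3, 4, 5])

# test[..., None] has shape (3, 4, 1) and broadcasts against ref of shape (3,)
# to an intermediate of shape (3, 4, 3) before reducing the last axis
mask = (test[..., None] == ref).any(-1)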
I have a numpy boolean selector array which I can apply to array a (it is not actually random in the problem domain; random is just convenient for the example). But I actually want to select using only the first n True entries of selector (n=3 in the example). So given selector plus a parameter n, how do I generate select_first_few using numpy operations, avoiding an iterative loop?
>>> import numpy as np
>>> selector = np.random.random(10) > 0.5
>>> a = np.arange(10)
>>> selector
array([ True, False, True, True, True, False, True, False, True,
False])
>>> chosen, others = a[selector], a[~selector]
>>> chosen
array([0, 2, 3, 4, 6, 8])
>>> others
array([1, 5, 7, 9])
>>> select_first_few = np.array([ True, False, True, True, False, False, False, False, False,
... False])
>>> chosen_few, tough_luck = a[select_first_few], a[~select_first_few]
>>> chosen_few
array([0, 2, 3])
>>> tough_luck
array([1, 4, 5, 6, 7, 8, 9])
Approach #1
One approach would be to use cumsum and argmax to find where the count of True values exceeds n, and then slice from there onwards to set False -
In [40]: n = 3
In [41]: selector
Out[41]:
array([ True, False, True, True, True, False, True, False, True,
False])
In [42]: selector[(selector.cumsum()>n).argmax():] = 0
In [43]: selector # your select_first_few mask
Out[43]:
array([ True, False, True, True, False, False, False, False, False,
False])
Then, use this new selector to select and de-select elements off the input array.
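A minimal sketch of that final step; note the guard for the case where selector has n or fewer True entries, since argmax on an all-False array returns 0 and the slice would otherwise clear everything:
import numpy as np

a = np.arange(10)
selector = np.array([ True, False,  True,  True,  True, False,  True, False,  True, False])

n = 3
cut = selector.cumsum() > n
if cut.any():                       # guard: only truncate if more than n Trues
    selector[cut.argmax():] = False

chosen_few, tough_luck = a[selector], a[~selector]
# chosen_few  -> array([0, 2, 3])
# tough_luck  -> array([1, 4, 5, 6, 7, 8, 9])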
Approach #2
Another approach would be to mask-the-mask -
n = 3
C = np.count_nonzero(selector)
newmask = np.zeros(C, dtype=bool)
newmask[:n] = 1
selector[selector] = newmask
Sample run -
In [62]: selector
Out[62]:
array([ True, False, True, True, True, False, True, False, True,
False])
In [63]: n = 3
...: C = np.count_nonzero(selector)
...: newmask = np.zeros(C, dtype=bool)
...: newmask[:n] = 1
...: selector[selector] = newmask
In [64]: selector
Out[64]:
array([ True, False, True, True, False, False, False, False, False,
False])
Or make it shorter with on-the-fly concatenation of booleans -
n = 3
C = np.count_nonzero(selector)
# Note: assumes C >= n; otherwise np.zeros(C-n, ...) would get a negative size
selector[selector] = np.r_[np.ones(n,dtype=bool),np.zeros(C-n,dtype=bool)]
Approach #3
The simplest one -
selector &= selector.cumsum()<=n
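A quick check of this one-liner on the selector from the question (a minimal sketch):
import numpy as np

selector = np.array([ True, False,  True,  True,  True, False,  True, False,  True, False])
n = 3
selector &= selector.cumsum() <= n
# selector -> array([ True, False,  True,  True, False, False, False, False, False, False])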
Get all the chosen indices, slice off the first few, and then use a list comprehension to retrieve the data at those indices.
import numpy as np

selector = np.random.random(10) > 0.5
data = np.arange(10)

# np.where returns a tuple of index arrays; take the first element
chosen_indices = np.where(selector)[0]

# select the first 3 chosen indices
chosen_few_indices = chosen_indices[:3]
chosen_few = [data[i] for i in chosen_few_indices]

# if you are also interested in the data that was not chosen
not_chosen_indices = list(set(range(len(data))) - set(chosen_few_indices))
# proceed ...
I need your help. I want to walk over a three-dimensional array and check, along one direction, the distance between two elements: if it is smaller than a threshold, the value should be True. As soon as the distance gets higher than that threshold, the rest of the values along this dimension should be set to False.
Here is an example in 1D:
a = np.array([1,2,2,1,2,5,2,7,1,2])
b = magic_check_fct(a, threshold=3, axis=0)
print(b)
# The expected output is :
> b = [True, True, True, True, True, False, False, False, False, False]
For comparison, a simple check with a <= threshold would give the following, which is not the expected output:
> b = [True, True, True, True, True, False, True, False, True, True]
Is there an efficient way to do this with numpy? This whole thing is performance-critical.
Thanks for your help!
One way would be to use np.minimum.accumulate along that axis -
np.minimum.accumulate(a<=threshold,axis=0)
Sample run -
In [515]: a
Out[515]: array([1, 2, 2, 1, 2, 5, 2, 7, 1, 2])
In [516]: threshold = 3
In [518]: print(np.minimum.accumulate(a<=threshold,axis=0))
[ True True True True True False False False False False]
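The same idea extends directly to the 3D case in the question: accumulate along whichever axis you want to walk. A small sketch (the shape and axis=0 are just assumptions for illustration):
import numpy as np

a3 = np.random.randint(0, 10, size=(4, 3, 2))
threshold = 3
b = np.minimum.accumulate(a3 <= threshold, axis=0)
# b[i, j, k] stays True only while every element up to index i along axis 0
# at position (j, k) has satisfied the condition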
Another approach for 1D arrays, with thresholding and then slicing -
out = a <= threshold
if not out.all():
    out[out.argmin():] = 0
Here's one more approach, using the first discrete difference:
In [126]: threshold = 3
In [127]: mask = np.diff(a, prepend=a[0]) < threshold
In [128]: mask[mask.argmin():] = False
In [129]: mask
Out[129]:
array([ True, True, True, True, True, False, False, False, False,
False])
I have a 1D numpy array with boolean values. For example:
x = [True, True, False, False, False, True, False, True, True, True, False, True, True, False]
The array contains 8 True values. I would like to keep exactly 3 of them (which must be fewer than 8 in this case) True, chosen at random from the 8 that exist. In other words, I would like to randomly set 5 of those 8 True values to False.
A possible result can be:
x = [True, True, False, False, False, False, False, False, False, False, False, False, True, False]
How to implement it?
One approach would be -
# Get the indices of the True values
idx = np.flatnonzero(x)
# Randomly pick len(idx)-3 of those indices (without replacement)
# and set them to False, leaving exactly 3 Trues
x[np.random.choice(idx, len(idx)-3, replace=False)] = False
Sample run -
# Input array
In [79]: x
Out[79]:
array([ True, True, False, False, False, True, False, True, True,
True, False, True, True, False], dtype=bool)
# Get indices
In [80]: idx = np.flatnonzero(x)
# Randomly set all but 3 of the True positions to False
In [81]: x[np.random.choice(idx, len(idx)-3, replace=False)] = False
# Verify output to have exactly three True values
In [82]: x
Out[82]:
array([ True, False, False, False, False, False, False, True, False,
False, False, True, False, False], dtype=bool)
Build an array with the desired numbers of True and False values, then just shuffle it:
import random
def buildRandomArray(size, numberOfTrues):
    res = [False]*(size-numberOfTrues) + [True]*numberOfTrues
    random.shuffle(res)
    return res
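A numpy-flavoured sketch of the same idea, in case you want the result as a boolean array directly (the function name is just illustrative):
import numpy as np

def build_random_mask(size, number_of_trues):
    # Start with exactly number_of_trues Trues at the front, then shuffle in place
    res = np.zeros(size, dtype=bool)
    res[:number_of_trues] = True
    np.random.shuffle(res)
    return res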
As the title says, I want to make np.where() return a coordinate multiple times if it comes across the same value more than once. Example:
import numpy as np
a = 2*np.arange(5)
b = [8,8]
condition = np.isin(a,b)
print(np.where(condition))
>>> (array([4], dtype=int64),)
It returns [4] because a[4] = 8, but since b has two 8s, I want it to return [4, 4]. Is there a way to do this without iterating through each value of b?
With your a and b:
In [687]: condition = np.isin(a,b)
In [688]: condition
Out[688]: array([False, False, False, False, True], dtype=bool)
np.where just tells us the index of that one True value.
Switch the test, and you find that both items of b are in a.
In [697]: np.isin(b,a)
Out[697]: array([ True, True], dtype=bool)
You could use a simple broadcasted comparison:
In [700]: a[:,None]==b
Out[700]:
array([[False, False],
[False, False],
[False, False],
[False, False],
[ True, True]], dtype=bool)
In [701]: np.where(a[:,None]==b)
Out[701]: (array([4, 4], dtype=int32), array([0, 1], dtype=int32))
np.isin (and the np.in1d it uses) only cares about membership, not how many times a value occurs, but you do. So testing the arrays with == directly gives you more control.
Test whether both values in b match the same element of a:
In [703]: (a[:,None]==b).all(axis=1)
Out[703]: array([False, False, False, False, True], dtype=bool)
Test whether any value matches, which is essentially what in1d does:
In [704]: (a[:,None]==b).any(axis=1)
Out[704]: array([False, False, False, False, True], dtype=bool)
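To get exactly the [4, 4] the question asks for, take the row part of that np.where result; a minimal sketch:
import numpy as np

a = 2*np.arange(5)
b = [8, 8]

rows = np.where(a[:, None] == b)[0]
print(rows)   # [4 4] -- one entry per matching element of b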