I need your help. I want to walk over a three-dimensional array and, along one direction, check the distance between two elements: if it is smaller than a threshold, the value should be True. As soon as the distance exceeds that threshold, the rest of the values along this dimension should be set to False.
Here is an example in 1D:
a = np.array([1,2,2,1,2,5,2,7,1,2])
b = magic_check_fct(a, threshold=3, axis=0)
print(b)
# The expected output is :
> b = [True, True, True, True, True, False, False, False, False, False]
A simple elementwise check with a <= threshold gives the following, which is not the expected output:
> b = [True, True, True, True, True, False, True, False, True, True]
Is there an efficient way to do this with numpy? This whole thing is performance critical.
Thanks for your help!
One way would be to use np.minimum.accumulate along that axis -
np.minimum.accumulate(a<=threshold,axis=0)
Sample run -
In [515]: a
Out[515]: array([1, 2, 2, 1, 2, 5, 2, 7, 1, 2])
In [516]: threshold = 3
In [518]: print(np.minimum.accumulate(a<=threshold,axis=0))
[ True True True True True False False False False False]
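Since the question is about a three-dimensional array, here is a minimal sketch of the same np.minimum.accumulate idea applied along one axis of a 3D input. The helper name prefix_below_threshold and the test array cube are made up for illustration; they are not from the original posts.

```python
import numpy as np

def prefix_below_threshold(a, threshold, axis=0):
    # Hypothetical helper: True until the first value above `threshold`
    # along `axis`, False from that position onwards.
    return np.minimum.accumulate(a <= threshold, axis=axis)

a = np.array([1, 2, 2, 1, 2, 5, 2, 7, 1, 2])
b = prefix_below_threshold(a, threshold=3)

# Assumed 3D input: the same 1D data repeated along axis 1 of a (2, 10, 4) array.
cube = np.broadcast_to(a[None, :, None], (2, a.size, 4))
mask3d = prefix_below_threshold(cube, threshold=3, axis=1)

print(b)
print(mask3d.shape)
```

Every 1D line along axis 1 of mask3d should then carry the same pattern as b.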
Another with thresholding and then slicing for 1D arrays -
out = a<=threshold
if not out.all():
    out[out.argmin():] = False
Here's one more approach, using the first discrete difference:
In [126]: threshold = 3
In [127]: mask = np.diff(a, prepend=a[0]) < threshold
In [128]: mask[mask.argmin():] = False
In [129]: mask
Out[129]:
array([ True, True, True, True, True, False, False, False, False,
False])
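The difference-based idea can also be sketched for an arbitrary axis. The version below is an assumption-laden variant rather than the answer's exact code: it takes the absolute step between neighbours (the answer uses the signed difference) and replaces the argmin slicing with np.logical_and.accumulate, which needs no special case when no step reaches the threshold. The function name step_prefix_mask is hypothetical.

```python
import numpy as np

def step_prefix_mask(a, threshold, axis=0):
    # Hypothetical helper. Prepend the first slice so the first step is 0.
    first = np.take(a, [0], axis=axis)
    # Assumption: "distance" means the absolute difference between neighbours.
    steps = np.abs(np.diff(a, axis=axis, prepend=first))
    # Stay True only while every step so far has been below the threshold.
    return np.logical_and.accumulate(steps < threshold, axis=axis)

a = np.array([1, 2, 2, 1, 2, 5, 2, 7, 1, 2])
mask = step_prefix_mask(a, threshold=3)
print(mask)

# When no step reaches the threshold, the mask simply stays True everywhere.
all_ok = step_prefix_mask(np.array([1, 2, 3]), threshold=3)
```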
Related
Having a matrix with d features and n samples, I would like to compare each feature of a sample (row) against the mean of the column corresponding to that feature and then assign a corresponding label 1 or 0.
Eg. for a matrix X = [x11, x12; x21, x22] I compute the mean of the two columns (mu1, mu2) and then I keep on comparing (x11, x21 with mu1 and so on) to check whether these are greater or smaller than mu and to then assign a label to them according to the if statement (see below).
I have the mean vector for each column i.e. of length d.
I am currently using for-loops, but these are not computationally efficient.
X_copy = X_train.copy()  # copy so that X_train itself is not overwritten
mu = np.mean(X_train, axis=0)
for i in range(X_train.shape[0]):
    for j in range(X_train.shape[1]):
        if X_train[i, j] < mu[j]:  # less than the column mean: assign 0
            X_copy[i, j] = 0
        else:
            X_copy[i, j] = 1  # greater than or equal to the column mean: assign 1
Is there any better alternative?
I don't have much experience with Python, so thank you for your understanding.
Compare directly: broadcasting compares the mean vector against each row of the original array. Then convert the data type of the result to int:
>>> X_train = np.random.rand(3, 4)
>>> X_train
array([[0.4789953 , 0.84095907, 0.53538172, 0.04880835],
[0.64554335, 0.50904539, 0.34069036, 0.5290601 ],
[0.84664389, 0.63984867, 0.66111495, 0.89803495]])
>>> (X_train >= X_train.mean(0)).astype(int)
array([[0, 1, 1, 0],
[0, 0, 0, 1],
[1, 0, 1, 1]])
Update:
NumPy has a broadcasting mechanism for operations between arrays. For example, when an array is compared with a number, the number is broadcast across all elements of the array and compared with each of them one by one:
>>> X_train > 0.5
array([[False, True, True, False],
[ True, True, False, True],
[ True, True, True, True]])
>>> X_train > np.full(X_train.shape, 0.5) # Equivalent effect.
array([[False, True, True, False],
[ True, True, False, True],
[ True, True, True, True]])
Similarly, you can compare a vector with a 2D array, as long as the length of the vector is the same as that of the last dimension of the array:
>>> mu = X_train.mean(0)
>>> X_train > mu
array([[False, True, True, False],
[False, False, False, True],
[ True, False, True, True]])
>>> X_train > np.tile(mu, (X_train.shape[0], 1)) # Equivalent effect.
array([[False, True, True, False],
[False, False, False, True],
[ True, False, True, True]])
To compare along other axes, the vector has to be reshaped so that its length lines up with the axis in question. The official NumPy documentation explains the rules in detail, and I hope it helps you get started: Broadcasting
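As a small illustration of comparing along the other axis, the sketch below labels each element against its row mean instead of its column mean. The array X is made-up sample data; keepdims=True is one way (an assumption about what fits here, not the only option) to keep the mean as a column so it broadcasts across each row.

```python
import numpy as np

# Made-up sample data: 2 samples (rows), 3 features (columns).
X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 9.0]])

# Column means have shape (3,), which lines up with the last axis of X.
col_labels = (X >= X.mean(axis=0)).astype(int)

# Row means with keepdims=True have shape (2, 1), so they broadcast across each row.
row_labels = (X >= X.mean(axis=1, keepdims=True)).astype(int)

print(col_labels)
print(row_labels)
```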
Take the following example. I have an array test and want to get a boolean mask with True's for all elements that are equal to elements of ref.
import numpy as np
test = np.array([[2, 3, 1, 0], [5, 4, 2, 3], [6, 7, 5 ,4]])
ref = np.array([3, 4, 5])
I am looking for something equivalent to
mask = (test == ref[0]) | (test == ref[1]) | (test == ref[2])
which in this case should yield
>>> print(mask)
[[False, True, False, False],
[ True, True, False, True],
[False, False, True, True]]
but without having to resort to any loops.
NumPy comes with a function, isin, that does exactly this:
np.isin(test, ref)
which returns
array([[False, True, False, False],
[ True, True, False, True],
[False, False, True, True]])
You can use numpy broadcasting:
mask = (test[:,None] == ref[:,None]).any(1)
output:
array([[False, True, False, False],
[ True, True, False, True],
[False, False, True, True]])
NB. this can be faster than numpy.isin, but it creates an intermediate array of shape (X, R, Y), where (X, Y) is the shape of test and R is the length of ref, so it will consume a lot of memory on very large arrays.
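To make the two answers easier to compare, here is a short sketch that runs both on the question's data and inspects the intermediate array the broadcasting version builds. The variable names compared, mask_isin and mask_broadcast are introduced only for this illustration.

```python
import numpy as np

test = np.array([[2, 3, 1, 0], [5, 4, 2, 3], [6, 7, 5, 4]])
ref = np.array([3, 4, 5])

mask_isin = np.isin(test, ref)

# test[:, None] has shape (3, 1, 4) and ref[:, None] has shape (3, 1);
# comparing them broadcasts to one boolean slab per reference value.
compared = test[:, None] == ref[:, None]
mask_broadcast = compared.any(axis=1)

print(compared.shape)                              # (3, 3, 4)
print(np.array_equal(mask_isin, mask_broadcast))   # True
```

The middle axis of compared has one entry per element of ref, which is where the extra memory goes.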
I have a numpy boolean selector array which I can apply to array a (the selector is not actually random in the problem domain; that is just convenient for the example). However, I want to select using only the first n True entries of selector (n=3 in the example). So, given selector and a parameter n, how do I generate select_first_few using numpy operations, avoiding an iterative loop?
>>> import numpy as np
>>> selector = np.random.random(10) > 0.5
>>> a = np.arange(10)
>>> selector
array([ True, False, True, True, True, False, True, False, True,
False])
>>> chosen, others = a[selector], a[~selector]
>>> chosen
array([0, 2, 3, 4, 6, 8])
>>> others
array([1, 5, 7, 9])
>>> select_first_few = np.array([ True, False, True, True, False, False, False, False, False,
... False])
>>> chosen_few, tough_luck = a[select_first_few], a[~select_first_few]
>>> chosen_few
array([0, 2, 3])
>>> tough_luck
array([1, 4, 5, 6, 7, 8, 9])
Approach #1
One approach would be using cumsum and argmax to get the extent and then slice thereafter to set False -
In [40]: n = 3
In [41]: selector
Out[41]:
array([ True, False, True, True, True, False, True, False, True,
False])
In [42]: selector[(selector.cumsum()>n).argmax():] = 0
In [43]: selector # your select_first_few mask
Out[43]:
array([ True, False, True, True, False, False, False, False, False,
False])
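Then, use this new selector to select and de-select elements from the input array. Note that if selector holds n or fewer True values, the comparison is False everywhere, argmax() returns 0, and the slice assignment clears the entire mask, so that case needs a guard.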
Approach #2
Another approach would be to mask-the-mask -
n = 3
C = np.count_nonzero(selector)
newmask = np.zeros(C, dtype=bool)
newmask[:n] = 1
selector[selector] = newmask
Sample run -
In [62]: selector
Out[62]:
array([ True, False, True, True, True, False, True, False, True,
False])
In [63]: n = 3
...: C = np.count_nonzero(selector)
...: newmask = np.zeros(C, dtype=bool)
...: newmask[:n] = 1
...: selector[selector] = newmask
In [64]: selector
Out[64]:
array([ True, False, True, True, False, False, False, False, False,
False])
Or make it shorter with on-the-fly concatenation of booleans -
n = 3
C = np.count_nonzero(selector)
selector[selector] = np.r_[np.ones(n,dtype=bool),np.zeros(C-n,dtype=bool)]
Approach #3
The simplest one -
selector &= selector.cumsum()<=n
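For completeness, the cumsum idea can be wrapped in a small function that leaves the original selector untouched. This is only a sketch; the name first_n_true is hypothetical, and the data is the example from the question.

```python
import numpy as np

def first_n_true(selector, n):
    # Hypothetical helper: cumsum gives a running count of True values,
    # so a True entry is kept only while that count is still <= n.
    return selector & (selector.cumsum() <= n)

selector = np.array([True, False, True, True, True,
                     False, True, False, True, False])
a = np.arange(10)

select_first_few = first_n_true(selector, 3)
chosen_few, tough_luck = a[select_first_few], a[~select_first_few]

print(chosen_few)   # [0 2 3]
print(tough_luck)   # [1 4 5 6 7 8 9]
```

If selector holds fewer than n True values, the result is just a copy of selector, so no extra guard is needed in this variant.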
Get all the chosen indices in a list and slice this list.
Then use a list comprehension to retrieve the data at those chosen indices.
import numpy as np
selector = np.random.random(10) > 0.5
data = np.arange(10)
# np.where returns a tuple of arrays; take its first (and only) element
chosen_indices = np.where(selector)[0]
# select the first 3 chosen indices
chosen_few_indices = chosen_indices[:3]
chosen_few = [data[i] for i in chosen_few_indices]
# if you are also interested in the data that was not chosen
not_chosen_indices = list(set(range(len(data))) - set(chosen_indices.tolist()))
# proceed ...
I'm practicing Dynamic Programming and I'm struggling with debugging my code. The idea is to find if a sum is possible given a list of numbers. Here's my code:
a = [2,3,7,8,10]
sum = 11
b = list(range(1, sum+1))
m = [[False for z in range(len(b))] for i in range(len(a))]
for i, x in enumerate(b):
    for j, y in enumerate(a):
        if x==y:
            m[j][i]=True
        elif y<x:
            m[j][i] = m[j-1][i]
        else:
            m[j][i] = m[j-1][i] or m[j-i][y-x]
for i, n in enumerate(m):
    print(a[i], n)
And here is the output:
2 [False, True, False, False, False, False, False, False, False, False, False]
3 [False, True, True, False, False, False, False, False, False, False, False]
7 [False, True, True, False, True, True, True, False, False, False, False]
8 [False, True, True, False, True, True, True, True, False, False, False]
10 [False, True, True, False, True, True, True, True, True, True, False]
As I understand it, in my else statement the algorithm is supposed to go up one row, look at the difference between x and y, and check whether that slot is possible. Take the most obvious case, the last element in the last row: that would be 10 (y) - 11 (x), which should go all the way back to index 1 on the row above it, which as we know is True. I'm not entirely sure what I'm doing wrong; any help in understanding this would be greatly appreciated.
Given you only feed positive values, I don't quite follow why you need a two dimensional list. You can simply use a 1d list:
coins = [2,3,7,8,10]
sum = 11
Next we initialize the list possible that states whether it is possible to obtain a certain value. We set possible[0] to True since this sum can be accomplished with no coins.
possible = [False for _ in range(sum+1)]
possible[0] = True
Now you iterate over each coin, and over the list and "upgrade" the value if possible:
for coin in coins:
    for i in range(sum-coin,-1,-1):
        if possible[i]:
            possible[i+coin] = True
After that, the list possible shows for each value from 0 up to and including sum whether you can construct it. So if possible[sum] is True, the sum can be constructed.
For the given coins and sum, one gets:
>>> possible
[True, False, True, True, False, True, False, True, True, True, True, True]
So values 0, 2, 3, 5, 7, 8, 9, 10, 11 are constructible with the coins.
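The inner loop above runs from high indices to low ones, and that direction is what limits each coin to a single use. The sketch below wraps the loop in a function to show the difference. The function name subset_sums and the reuse flag are hypothetical additions, not part of the answer.

```python
def subset_sums(coins, target, reuse=False):
    # possible[s] is True when sum s can be formed from the coins seen so far.
    possible = [False] * (target + 1)
    possible[0] = True
    for coin in coins:
        if reuse:
            # Low to high: possible[i + coin] may build on an entry that was
            # set earlier in this same pass, so the coin can be used repeatedly.
            indices = range(0, target - coin + 1)
        else:
            # High to low: every entry read still reflects the state before
            # this coin was considered, so the coin is used at most once.
            indices = range(target - coin, -1, -1)
        for i in indices:
            if possible[i]:
                possible[i + coin] = True
    return possible

once = subset_sums([2, 3, 7, 8, 10], 11)
many = subset_sums([2, 3, 7, 8, 10], 11, reuse=True)
print(once[4], many[4])  # False True (4 = 2 + 2 needs the coin 2 twice)
```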
Edit: track the coins
You can also keep track of the coins by slightly modifying the code:
possible = [None for _ in range(sum+1)]
possible[0] = []
for coin in coins:
    for i in range(sum-coin,-1,-1):
        if possible[i] is not None:
            possible[i+coin] = possible[i]+[coin]
Now possible looks like:
>>> possible
[[], None, [2], [3], None, [2, 3], None, [7], [8], [2, 7], [10], [3, 8]]
So 0 can be constructed with coins [] (no coins); 2 can be constructed with [2] (one coin with value 2), 3 with [3], 5 with [2,3], etc.
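For readers who prefer the two-dimensional table from the question, the following is a minimal sketch of the usual subset-sum recurrence. It is not a patch of the question's code: it assumes an extra row 0 for "no numbers used" and an extra column 0 for the empty sum, and the name subset_sum_table is made up.

```python
def subset_sum_table(numbers, target):
    # table[j][s]: can sum s be formed using only the first j numbers?
    table = [[False] * (target + 1) for _ in range(len(numbers) + 1)]
    for row in table:
        row[0] = True  # the empty selection always gives sum 0
    for j, y in enumerate(numbers, start=1):
        for s in range(1, target + 1):
            # Either skip number y, or use it and look up s - y one row above.
            table[j][s] = table[j - 1][s] or (y <= s and table[j - 1][s - y])
    return table

table = subset_sum_table([2, 3, 7, 8, 10], 11)
print(table[-1][11])  # True, since 3 + 8 = 11
```

The last row of the table should agree with the one-dimensional possible list shown above.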
Starting from an array:
a = np.array([1,1,1,2,3,4,5,5])
and a filter:
m = np.array([1,5])
I am now building a mask with:
b = np.in1d(a,m)
that correctly returns:
array([ True, True, True, False, False, False, True, True], dtype=bool)
I need to limit the number of Trues per unique value to a maximum of 2, so that 1 is masked only two times instead of three. The resulting mask would then be one of the following (it does not matter which of the True values are kept):
array([ True, True, False, False, False, False, True, True], dtype=bool)
or
array([ True, False, True, False, False, False, True, True], dtype=bool)
or
array([ False, True, True, False, False, False, True, True], dtype=bool)
Ideally this is a kind of "random" masking with a limit on the frequency of each value. So far I have tried to randomly select the original unique elements in the array, but the mask selects the True values regardless of their frequency.
For a generic case with unsorted input array, here's one approach based on np.searchsorted -
N = 2 # Parameter to decide how many duplicates are allowed
sortidx = a.argsort()
idx = np.searchsorted(a,m,sorter=sortidx)[:,None] + np.arange(N)
lim_counts = (a[:,None] == m).sum(0).clip(max=N)
idx_clipped = idx[lim_counts[:,None] > np.arange(N)]
out = np.in1d(np.arange(a.size),idx_clipped)[sortidx.argsort()]
Sample run -
In [37]: a
Out[37]: array([5, 1, 4, 2, 1, 3, 5, 1])
In [38]: m
Out[38]: [1, 2, 5]
In [39]: N
Out[39]: 2
In [40]: out
Out[40]: array([ True, True, False, True, True, False, True, False], dtype=bool)
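As an alternative way to read the same idea, the sketch below ranks each element within its run of equal values after a stable sort and keeps only the first few of each run. This is a different technique from the searchsorted answer above, offered as an assumption-based illustration: the name limited_isin is hypothetical, the input is assumed to be a non-empty 1D array, and np.isin is used in place of np.in1d.

```python
import numpy as np

def limited_isin(a, m, max_per_value):
    a = np.asarray(a)
    # A stable sort keeps equal values in their original relative order.
    order = np.argsort(a, kind="stable")
    sorted_a = a[order]
    positions = np.arange(a.size)
    # Mark where each run of equal values starts in the sorted array.
    is_start = np.r_[True, sorted_a[1:] != sorted_a[:-1]]
    run_start = np.maximum.accumulate(np.where(is_start, positions, 0))
    # Rank of each element inside its run: 0 for the first occurrence, and so on.
    rank = positions - run_start
    keep_sorted = np.isin(sorted_a, m) & (rank < max_per_value)
    # Scatter the decisions back to the original element order.
    out = np.empty(a.size, dtype=bool)
    out[order] = keep_sorted
    return out

a = np.array([5, 1, 4, 2, 1, 3, 5, 1])
out = limited_isin(a, [1, 2, 5], 2)
print(out)
```

With a stable sort, the occurrences that survive are the earliest ones in the original array.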