Here is an example of what I would like to do:
Assume Array A
A = np.array([[0, 1, 3, 5, 9],
[2, 7, 5, 1, 4]])
And Array B
B = np.array([2, 4])
I am looking for an operation that will increment the element indexed by array B in each row of array A by 1.
So the result A is:
A = np.array([[0, 1, 4, 5, 9],
[2, 7, 5, 1, 5]])
Index 2 of the first row is increased by 1, and index 4 of the second row is increased by 1.
You can achieve this by using advanced indexing in numpy:
A[np.arange(len(B)), B] += 1
This works because np.arange(len(B)) produces the 1D array of row indices [0, 1, ..., len(B) - 1], while B supplies the matching column indices. NumPy pairs the two index arrays element by element, so the expression selects A[0, B[0]], A[1, B[1]], and so on. Adding 1 to A[np.arange(len(B)), B] therefore increments exactly one element in each row, the one specified by B.
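To make the pairing concrete, here is a minimal sketch using the arrays from the question. The names rows and pairs are introduced only for illustration.

```python
import numpy as np

A = np.array([[0, 1, 3, 5, 9],
              [2, 7, 5, 1, 4]])
B = np.array([2, 4])

rows = np.arange(len(B))                      # 1D row indices: [0, 1]
pairs = list(zip(rows.tolist(), B.tolist()))  # (row, column) pairs that get selected
print(pairs)                                  # [(0, 2), (1, 4)]

A[rows, B] += 1                               # increments A[0, 2] and A[1, 4]
print(A)
```

Each row index is matched with the column index at the same position in B, so only one element per row changes.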
In NumPy you can do this with np.arange and the shape of the array:
import numpy as np
A = np.array([[0, 1, 3, 5, 9],
[2, 7, 5, 1, 4]])
B = np.array([2, 4])
A[np.arange(A.shape[0]), B] += 1
print(A)
np.arange(A.shape[0]) generates an array of integers from 0 to A.shape[0] - 1, where A.shape[0] is the number of rows.
You can also do it with a loop:
import numpy as np
A = np.array([[0, 1, 3, 5, 9],
[2, 7, 5, 1, 4]])
B = np.array([2, 4])
for i, index in enumerate(B):
    A[i][index] += 1
print(A)
For example, let's consider the following numpy array:
[1, 5, 0, 5, 4, 6, 1, -1, 5, 10]
Also, let's suppose that the threshold is equal to 3.
That is to say that we are looking for sequences of at least two consecutive values that are all above the threshold.
The output would be the indices of those values, which in our case is:
[[3, 4, 5], [8, 9]]
If the output array was flattened that would work as well!
[3, 4, 5, 8, 9]
Output Explanation
In our initial array we can see that for index = 1 we have the value 5, which is greater than the threshold, but is not part of a sequence (of at least two values) where every value is greater than the threshold. That's why this index would not make it to our output.
On the other hand, for indices [3, 4, 5] we have a sequence of (at least two) neighboring values [5, 4, 6], each of which is above the threshold, which is why their indices are included in the final output!
My Code so far
I have approached the issue with something like this:
(arr > 3).nonzero()
The above command gathers the indices of all the items that are above the threshold. However, I cannot determine whether they are consecutive or not. I have thought of applying diff to the result of the above snippet and then maybe locating the ones (that is, positions where the indices directly follow one another). That would give us:
np.diff((arr > 3).nonzero())
But I'd still be missing something here.
If you convolve a boolean array with a window of win_size ones ([1] * win_size), the result equals win_size exactly at the positions where the condition held for win_size consecutive items:
import numpy as np
def groups(arr, *, threshold, win_size, merge_contiguous=False, flat=False):
    # count, for every window of win_size items, how many are strictly above the threshold
    conv = np.convolve((arr > threshold).astype(int), [1] * win_size, mode="valid")
    indexes_start = np.where(conv == win_size)[0]
    indexes = [np.arange(index, index + win_size) for index in indexes_start]
    if flat or merge_contiguous:
        indexes = np.unique(indexes)
        if merge_contiguous:
            # split wherever two neighbouring indices are not consecutive
            indexes = np.split(indexes, np.where(np.diff(indexes) != 1)[0] + 1)
    return indexes
arr = np.array([1, 5, 0, 5, 4, 6, 1, -1, 5, 10])
threshold = 3
win_size = 2
print(groups(arr, threshold=threshold, win_size=win_size))
print(groups(arr, threshold=threshold, win_size=win_size, merge_contiguous=True))
print(groups(arr, threshold=threshold, win_size=win_size, flat=True))
[array([3, 4]), array([4, 5]), array([8, 9])]
[array([3, 4, 5]), array([8, 9])]
[3 4 5 8 9]
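As a rough illustration of why the convolution works, the sketch below prints the intermediate arrays for the sample input with a window of size 2. The names mask and starts are only illustrative.

```python
import numpy as np

arr = np.array([1, 5, 0, 5, 4, 6, 1, -1, 5, 10])
mask = (arr > 3).astype(int)                    # 1 where the value is above the threshold
conv = np.convolve(mask, [1, 1], mode="valid")  # sum of each adjacent pair of mask values
starts = np.flatnonzero(conv == 2)              # windows in which both values qualify

print(mask)    # [0 1 0 1 1 1 0 0 1 1]
print(conv)    # [1 1 1 2 2 1 0 1 2]
print(starts)  # [3 4 8]
```

Each entry of starts is the first index of a qualifying window, which is why the function then expands it to index, ..., index + win_size - 1.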
You can do what you want using simple numpy operations
import numpy as np
arr = np.array([1, 5, 0, 5, 4, 6, 1, -1, 5, 10])
arr_padded = np.concatenate(([0], arr, [0]))
a = np.where(arr_padded > 3, 1, 0)
da = np.diff(a)
idx_start = (da == 1).nonzero()[0]
idx_stop = (da == -1).nonzero()[0]
valid = (idx_stop - idx_start >= 2).nonzero()[0]
result = [list(range(idx_start[i], idx_stop[i])) for i in valid]
print(result)
Explanation
Array a is a padded binary version of the original array, with 1s where the original elements are greater than three. da contains 1s where "islands" of 1s begin in a, and -1s where the "islands" end in a. Due to the padding, there is guaranteed to be an equal number of 1s and -1s in da. Extracting their indices, we can calculate the length of the islands. Valid index pairs are those whose respective "islands" have length >= 2. Then, it's just a matter of generating all numbers between the index bounds of the valid "islands".
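The island bookkeeping can be checked by printing the intermediate values for the sample array. This is a small sketch: it pads the 0/1 mask directly rather than the original array, which gives the same a for this input, and lengths is a name added for illustration.

```python
import numpy as np

arr = np.array([1, 5, 0, 5, 4, 6, 1, -1, 5, 10])
a = np.concatenate(([0], (arr > 3).astype(int), [0]))  # padded 0/1 mask
da = np.diff(a)

idx_start = np.flatnonzero(da == 1)    # first index of each island
idx_stop = np.flatnonzero(da == -1)    # one past the last index of each island
lengths = idx_stop - idx_start

print(idx_start)  # [1 3 8]
print(idx_stop)   # [ 2  6 10]
print(lengths)    # [1 3 2]
```

Because of the leading pad, positions in da line up with indices of the original array, so the island starting at 1 (length 1) is dropped and the other two are kept.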
I followed your original idea; you were almost there.
I use a second diff (diff2) to also pick up the index of the first value in each sequence. See the comments in the code for details.
import numpy as np
arr = np.array([ 1, 5, 0, 5, 4, 6, 1, -1, 5, 10])
threshold = 3
all_idx = (arr > threshold).nonzero()[0]
# array([1, 3, 4, 5, 8, 9])
result = np.empty(0)
if all_idx.size > 1:
    diff1 = np.zeros_like(all_idx)
    diff1[1:] = np.diff(all_idx)
    # array([0, 2, 1, 1, 3, 1])
    diff1[0] = diff1[1]
    # array([2, 2, 1, 1, 3, 1])
    # **Positions with a value 1 in diff1 should be kept.**
    # But we also want the position before each 1. Create another diff2
    diff2 = np.zeros_like(all_idx)
    diff2[:-1] = np.diff(diff1)
    # array([ 0, -1,  0,  2, -2,  0])
    # **Positions with a negative value in diff2 should be kept.**
    result = all_idx[(diff1==1) | (diff2<0)]
print(result)
# array([3, 4, 5, 8, 9])
I'll try something different using window views. I'm not sure this works all the time, so counterexamples are welcome. It has the advantage of not requiring Python loops.
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view as window
def consec_thresh(arr, thresh):
    win = window(np.argwhere(arr > thresh), (2, 1))
    return np.unique(win[np.diff(win, axis=2).ravel() == 1, :,:].ravel())
How does it work?
So we start with the array and gather the indices where the threshold is met:
In [180]: np.argwhere(arr > 3)
Out[180]:
array([[1],
[3],
[4],
[5],
[8],
[9]])
Then we build a sliding window that pairs up consecutive values along the column (which is the reason for the (2, 1) shape of the window).
In [181]: window(np.argwhere(arr > 3), (2, 1))
Out[181]:
array([[[[1],
[3]]],
[[[3],
[4]]],
[[[4],
[5]]],
[[[5],
[8]]],
[[[8],
[9]]]])
Now we take the difference inside each pair; if it is one, the indices are consecutive.
In [182]: np.diff(window(np.argwhere(arr > 3), (2, 1)), axis=2)
Out[182]:
array([[[[2]]],
[[[1]]],
[[[1]]],
[[[3]]],
[[[1]]]])
We can use those differences as a mask on the windows we created above:
In [185]: window(np.argwhere(arr > 3), (2, 1))[np.diff(window(np.argwhere(arr > 3), (2, 1)), axis=2).ravel() == 1, :, :]
Out[185]:
array([[[[3],
[4]]],
[[[4],
[5]]],
[[[8],
[9]]]])
Then we ravel (flatten, without copying when possible). The windowing repeats some indices, so I call np.unique to get rid of the duplicates, and we get:
array([3, 4, 5, 8, 9])
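One edge case worth checking is an input with fewer than two values above the threshold, where the window would be larger than the index array. The sketch below is a hypothetical 1D variant (consec_thresh_1d is not part of the answer above) that guards against that case and otherwise follows the same pairing idea.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def consec_thresh_1d(arr, thresh):
    # hypothetical variant of consec_thresh that works on a flat index array
    idx = np.flatnonzero(np.asarray(arr) > thresh)
    if idx.size < 2:
        # sliding_window_view raises ValueError when the window is larger than the input
        return np.empty(0, dtype=int)
    pairs = sliding_window_view(idx, 2)      # shape (n - 1, 2): neighbouring indices
    keep = pairs[:, 1] - pairs[:, 0] == 1    # keep pairs whose indices are consecutive
    return np.unique(pairs[keep])

print(consec_thresh_1d([1, 5, 0, 5, 4, 6, 1, -1, 5, 10], 3))  # [3 4 5 8 9]
print(consec_thresh_1d([1, 5, 0, 1], 3))                      # []
```

Like the original, this only handles runs of at least two values; longer minimum run lengths would need a different window size and check.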
The iterative code below solves this in O(n):
arr = [1, 5, 0, 5, 4, 6, 1, -1, 5, 10]
threshold = 3
sequence = 2
output = []
temp_arr = []
for i in range(len(arr)):
    if arr[i] > threshold:
        temp_arr.append(i)
    else:
        if len(temp_arr) >= sequence:
            output.append(temp_arr)
        temp_arr = []
# flush a run that reaches the end of the array, but only if it is long enough
if len(temp_arr) >= sequence:
    output.append(temp_arr)
temp_arr = []
print(output)
# Output
# [[3, 4, 5], [8, 9]]
I would suggest using a for loop with two indices: i starts at 0 and j at i + 1, both stepping forward by 1.
You can then check whether the values at both indices are greater than the threshold; if so,
add the indices to a list and keep moving j forward until the value at j is no longer greater than the threshold.
values = [1, 5, 0, 5, 4, 6, 1, -1, 5, 10]
res = []
threshold = 3
i = 0
j = 0
for _ in values:
    j = i + 1
    lista = []
    try:
        print(f"i: {i} j:{j}")
        # check if condition is met
        if values[i] > threshold and values[j] > threshold:
            lista.append(i)
            # add sequence
            while values[j] > threshold:
                lista.append(j)
                print(f"j while: {j}")
                j += 1
                if j >= len(values):
                    break
            res.append(lista)
        i = j
        if j >= len(values):
            break
    except IndexError:
        # j ran past the end of the list
        print("ex")
print(res)
This works, but it needs refactoring.
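One possible refactoring of the two-index scan is sketched below. The function name runs_above and the min_len parameter are assumptions introduced for this example, not part of the answer above.

```python
def runs_above(values, threshold, min_len=2):
    """Return lists of indices for runs of at least min_len values above threshold."""
    res = []
    i = 0
    n = len(values)
    while i < n:
        if values[i] > threshold:
            j = i + 1
            # advance j to the first position that is not above the threshold
            while j < n and values[j] > threshold:
                j += 1
            if j - i >= min_len:
                res.append(list(range(i, j)))
            i = j
        else:
            i += 1
    return res

print(runs_above([1, 5, 0, 5, 4, 6, 1, -1, 5, 10], 3))  # [[3, 4, 5], [8, 9]]
```

Bounding j with j < n removes the need for the try/except, and each element is visited at most twice, so the scan stays linear.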
Let's try the following code:
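# Simple is better than complex
# Complex is better than complicated
arr = [1, 5, 0, 5, 4, 6, 1, -1, 5, 10]
# keep the index where the value is above the threshold, 'a' elsewhere
arr_3 = [i if arr[i] > 3 else 'a' for i in range(len(arr))]
arr_4 = ''.join(str(x) for x in arr_3)
# assumed missing step: split on the 'a' markers to get the runs of indices
arr_5 = arr_4.split('a')
i = 0
while i < len(arr_5):
    if len(arr_5[i]) <= 1:
        del arr_5[i]
    else:
        i += 1
# note: one character per index, so this only works for indices 0-9
arr_6 = [list(map(int, x)) for x in arr_5]
print(arr_6)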
Outputs:
[[3, 4, 5], [8, 9]]
Here is a solution that makes use of pandas Series:
import numpy as np
import pandas as pd
arr = np.array([1, 5, 0, 5, 4, 6, 1, -1, 5, 10])
thresh = 3
win_size = 2
s = pd.Series(arr)
# locating groups of values where there are at least (win_size) consecutive values above the threshold
groups = s.groupby(s.le(thresh).cumsum().loc[s.gt(thresh)]).transform('count').ge(win_size)
0 False
1 False
2 False
3 True
4 True
5 True
6 False
7 False
8 True
9 True
dtype: bool
We can now easily take their indices in a 1D array:
np.flatnonzero(groups)
# array([3, 4, 5, 8, 9], dtype=int64)
OR multiple lists:
[np.arange(index.start, index.stop) for index in np.ma.clump_unmasked(np.ma.masked_not_equal(groups.values, value=True))]
# [array([3, 4, 5], dtype=int64), array([8, 9], dtype=int64)]
When I tried using scipy.optimize.linear_sum_assignment as shown, it gives the assignment vector [0 2 3 1] with a total cost of 15.
However, from the cost matrix c, you can see that for the second task, the 5th agent has a cost of 1. So the expected assignment should be [0 3 None 2 1] (total cost of 9)
Why is linear_sum_assignment not returning the optimal assignments?
from scipy.optimize import linear_sum_assignment
c = [
[1, 5, 9, 5],
[5, 8, 3, 2],
[3, 2, 6, 8],
[7, 3, 5, 4],
[2, 1, 9, 9],
]
results = linear_sum_assignment(c)
print(results[1]) # [0 2 3 1]
linear_sum_assignment returns a tuple of two arrays. These are the row indices and column indices of the assigned values. For your example (with c converted to a numpy array):
In [51]: c
Out[51]:
array([[1, 5, 9, 5],
[5, 8, 3, 2],
[3, 2, 6, 8],
[7, 3, 5, 4],
[2, 1, 9, 9]])
In [52]: row, col = linear_sum_assignment(c)
In [53]: row
Out[53]: array([0, 1, 3, 4])
In [54]: col
Out[54]: array([0, 2, 3, 1])
The corresponding index pairs from row and col give the selected entries. That is, the indices of the selected entries are (0, 0), (1, 2), (3, 3) and (4, 1). It is these pairs that are the "assignments".
The sum associated with this assignment is 9:
In [55]: c[row, col].sum()
Out[55]: 9
In the original version of the question (since edited), it looks like you wanted to know the row index for each column, so you expected [0, 4, 1, 3]. The values that you want are in row, but the order is not what you expect, because the indices in col are not simply [0, 1, 2, 3]. To get the result in the form that you expected, you have to reorder the values in row based on the order of the indices in col. Here are two ways to do that.
First:
In [56]: result = np.zeros(4, dtype=int)
In [57]: result[col] = row
In [58]: result
Out[58]: array([0, 4, 1, 3])
Second:
In [59]: result = row[np.argsort(col)]
In [60]: result
Out[60]: array([0, 4, 1, 3])
Note that the example in the linear_sum_assignment docstring is potentially misleading; because it displays only col_ind in the python session, it gives the impression that col_ind is "the answer". In general, however, the answer involves both of the returned arrays.
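If what you want is one entry per row, with None for a row that receives no column (the form used in the question), a minimal sketch could look like the following. The name per_row is hypothetical. Because this cost matrix has more than one assignment with the minimal total cost, only the total is stated in the comments.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

c = np.array([[1, 5, 9, 5],
              [5, 8, 3, 2],
              [3, 2, 6, 8],
              [7, 3, 5, 4],
              [2, 1, 9, 9]])

row, col = linear_sum_assignment(c)

# per_row[i] is the column assigned to row i, or None if row i is unassigned
per_row = [None] * c.shape[0]
for r, k in zip(row.tolist(), col.tolist()):
    per_row[r] = k

total = int(c[row, col].sum())
print(per_row)
print(total)  # 9
```

With five rows and four columns, exactly one row is left unassigned, which is why one entry of per_row stays None.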
I'm new to Python. I was looking at some code similar to the following:
import numpy as np
a = np.ones([1,1,5,5], dtype='int64')
b = np.ones([11], dtype='float64')
x = b[a]
print (x.shape)
# (1, 1, 5, 5)
I looked through the NumPy documentation but didn't find anything related to this case. I'm not sure what's going on here, and I don't know where to look.
Edit
The actual code
def gausslabel(length=180, stride=2):
    gaussian_pdf = signal.gaussian(length+1, 3)
    label = np.reshape(np.arange(stride/2, length, stride), [1,1,-1,1])
    y = np.reshape(np.arange(stride/2, length, stride), [1,1,1,-1])
    delta = np.array(np.abs(label - y), dtype=int)
    delta = np.minimum(delta, length-delta)+length/2
    return gaussian_pdf[delta]
I guess that this code is trying to demonstrate that if you index an array with an array, the result is an array with the same shape as the indexing array (in this case a) and not the indexed array (i.e. b).
But it's confusing because b is full of 1s. Try it instead with a b full of different numbers:
>>> a = np.ones([1,1,5,5], dtype='int64')
>>> b = np.arange(11) + 3
>>> b
array([ 3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13])
>>> b[a]
array([[[[4, 4, 4, 4, 4],
[4, 4, 4, 4, 4],
[4, 4, 4, 4, 4],
[4, 4, 4, 4, 4],
[4, 4, 4, 4, 4]]]])
Because a is an array of 1s, the only element of b that is indexed is b[1], which equals 4. The shape of the result, though, is the shape of a, the array used as the index.
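The same rule is easier to see when the index array holds different values. In this small sketch, idx is a hypothetical 2x2 index array chosen for illustration.

```python
import numpy as np

b = np.arange(11) + 3              # values 3 through 13
idx = np.array([[0, 10],
                [5,  1]])          # hypothetical 2x2 index array

out = b[idx]                       # each entry of idx is replaced by b[entry]
print(out.shape)                   # (2, 2)
print(out)                         # [[ 3 13]
                                   #  [ 8  4]]
```

The result takes its shape from idx and its values from b, regardless of how long b is.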
I have a numpy array, say:
>>> a=np.array([[0,1,2],[4,3,6],[9,5,7],[8,9,8]])
>>> a
array([[0, 1, 2],
[4, 3, 6],
[9, 5, 7],
[8, 9, 8]])
I want to replace the second and third column elements with the minimum of them (row by row), except if one of these 2 elements is < 3.
The resulting array should be:
array([[0, 1, 2],# nothing changes since 1 and 2 are <3
[4, 3, 3], #min(3,6)=3 => 6 changed to 3
[9, 5, 5], #min(5,7)=5 => 7 changed to 5
[8, 8, 8]]) #min(9,8)=8 => 9 changed to 8
I know I can use clip, for instance a[:,1:3].clip(2,6,a[:,1:3]), but
1) clip will be applied to all elements, including those <3.
2) I don't know how to set the min and max values of clip to the minimum values of the 2 related elements of each row.
Just use the >= operator to first select the rows you are interested in:
b = a[:, 1:3]  # select the columns (a view of a)
matching = np.all(b >= 3, axis=1)  # find rows with all elements matching
Now you can replace the content with the minimum by e.g.:
# boolean indexing returns a copy, so assign back into a;
# keepdims=True keeps the row minima as a column vector for broadcasting
a[matching, 1:3] = b[matching].min(axis=1, keepdims=True)
We first define a row_mask selecting the rows where neither of the two elements is < 3, and then take the minimum along axis 1 for the rows in row_mask.
The newaxis part is required to broadcast the 1-dim array of minimums to the 2-dim target of the assignment.
a=np.array([[0,1,2],[4,3,6],[9,5,7],[8,9,8]])
row_mask = (a[:,1:3]>=3).all(axis=1)
a[row_mask, 1:] = a[row_mask, 1:].min(axis=1)[...,np.newaxis]
a
=>
array([[0, 1, 2],
[4, 3, 3],
[9, 5, 5],
[8, 8, 8]])
Here's a one liner:
a[np.where(np.sum(a,axis=1)>3),1:3]=np.min(a[np.where(np.sum(a,axis=1)>3),1:3],axis=2).reshape(1,3,1)
Here's a breakdown:
>>> b = np.where(np.sum(a,axis=1)>3) # finds rows where, in a, row sums are > 3
(array([1, 2, 3]),)
>>> c = a[b,1:3] # the part of a that needs to change
array([[[3, 6],
        [5, 7],
        [9, 8]]])
>>> d = np.min(c,axis=2) # the minimum values in each row (cols 1 and 2)
array([[3, 5, 8]])
>>> e = d.reshape(1,3,1) # adjust shape for broadcast to a
array([[[3],
[5],
[8]]])
>>> a[np.where(np.sum(a,axis=1)>3),1:3] = e # set the values in a
>>> a
array([[0, 1, 2],
[4, 3, 3],
[9, 5, 5],
[8, 8, 8]])
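For comparison, here is a sketch that tests the condition on columns 1 and 2 directly and avoids a hard-coded reshape, using np.where. The names cols, row_min and skip are introduced for illustration only.

```python
import numpy as np

a = np.array([[0, 1, 2], [4, 3, 6], [9, 5, 7], [8, 9, 8]])

cols = a[:, 1:3]                                # view of columns 1 and 2
row_min = cols.min(axis=1, keepdims=True)       # per-row minimum, shape (4, 1)
skip = (cols < 3).any(axis=1, keepdims=True)    # rows where one of the two values is < 3
a[:, 1:3] = np.where(skip, cols, row_min)       # keep skipped rows, use the minimum elsewhere
print(a)
```

np.where builds a new array before the assignment, so writing back into the same columns is safe here.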