Here is an algorithm I would like to implement using numpy:
For a given 1D array, calculate the maximum and the minimum over a sliding window.
Create a new array whose first value is equal to the first value of the given array.
For each subsequent value, clip the previous value inserted into the new array between the min and the max of the corresponding sliding window.
As an example, let's take the array a=[3, 4, 5, 4, 3, 2, 3, 3] and a sliding window of size 3. We find for min and max:
min = [3, 4, 3, 2, 2, 2]
max = [5, 5, 5, 4, 3, 3]
Now our output array will start with the first element from a, so it's 3. And for the next value, I clip 3 (the last value inserted) between 4 and 5 (the min and max found at index 1). The result is 4. For the next value I clip 4 between 3 and 5. It's still 4. And so on. So we finally have:
output = [3, 4, 4, 4, 3, 3]
I cannot find a way to avoid using a Python for loop in my code. Here is what I have at the moment:
import numpy as np
from collections import deque

def second_window(array, samples):
    sample_idx = samples - 1
    output = np.zeros_like(array[0:-sample_idx])
    start, stop = 0, len(array)
    last_value = array[0]
    # Sliding window is a deque of length 'samples'.
    sliding_window = deque(array[start : start+sample_idx], samples)
    for i in range(stop - start - sample_idx):
        # Get the next value in sliding window. After the first loop,
        # the left value gets discarded automatically.
        sliding_window.append(array[start + i + sample_idx])
        min_value, max_value = min(sliding_window), max(sliding_window)
        # Clip the last value between sliding window min and max
        last_value = min(max(last_value, min_value), max_value)
        output[start + i] = last_value
    return output
Would it be possible to achieve this result with only numpy?
I don't think you can. You can sometimes do this kind of iterative computation with unbuffered ufuncs, but this isn't one of those cases. But let me elaborate...
OK, first, the windowing and min/max calculations can be done much faster:
>>> import numpy as np
>>> from numpy.lib.stride_tricks import as_strided
>>> a = np.array([3, 4, 5, 4, 3, 2, 3, 3])
>>> len_a = len(a)
>>> win = 3
>>> win_a = as_strided(a, shape=(len_a-win+1, win), strides=a.strides*2)
>>> win_a
array([[3, 4, 5],
       [4, 5, 4],
       [5, 4, 3],
       [4, 3, 2],
       [3, 2, 3],
       [2, 3, 3]])
>>> min_ = np.min(win_a, axis=-1)
>>> max_ = np.max(win_a, axis=-1)
Now, let's create and fill up your output array:
>>> out = np.empty((len_a-win+1,), dtype=a.dtype)
>>> out[0] = a[0]
If np.clip were a ufunc, we could then try to do:
>>> np.clip(out[:-1], min_[1:], max_[1:], out=out[1:])
array([4, 3, 3, 3, 3])
>>> out
array([3, 4, 3, 3, 3, 3])
But this doesn't work, because np.clip is not a ufunc, and there seems to be some buffering involved.
And if you apply np.minimum and np.maximum separately, then it doesn't always work:
>>> np.minimum(out[:-1], max_[1:], out=out[1:])
array([3, 3, 3, 3, 3])
>>> np.maximum(out[1:], min_[1:], out=out[1:])
array([4, 3, 3, 3, 3])
>>> out
array([3, 4, 3, 3, 3, 3])
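Although for your particular case, reversing the order does work (at least on the older NumPy versions this relies on; since NumPy 1.13, ufuncs detect memory overlap between inputs and outputs and compute the result as if the operands did not overlap, so values no longer propagate element by element):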
>>> np.maximum(out[:-1], min_[1:], out=out[1:])
array([4, 4, 4, 4, 4])
>>> np.minimum(out[1:], max_[1:], out=out[1:])
array([4, 4, 4, 3, 3])
>>> out
array([3, 4, 4, 4, 3, 3])
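Putting the pieces together, here is a minimal sketch that vectorizes only the window min/max and keeps the clipping recurrence as an explicit loop, since each step depends on the previous output. It assumes NumPy >= 1.20 for sliding_window_view, which builds the same windows as the as_strided call above without hand-written strides; the function name clipped_running is made up for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # requires NumPy >= 1.20


def clipped_running(a, win):
    """Hypothetical helper: clip each previous output into the next window's [min, max]."""
    a = np.asarray(a)
    # Read-only view of shape (len(a) - win + 1, win); no data is copied.
    windows = sliding_window_view(a, win)
    lo = windows.min(axis=-1)
    hi = windows.max(axis=-1)
    out = np.empty_like(lo)
    out[0] = a[0]
    # Each step depends on the previous output, so this part stays sequential.
    for i in range(1, len(out)):
        out[i] = min(max(out[i - 1], lo[i]), hi[i])
    return out


print(clipped_running([3, 4, 5, 4, 3, 2, 3, 3], 3))  # [3 4 4 4 3 3]
```

The loop is O(n) with cheap scalar operations; the expensive part (scanning every window) is done by NumPy.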
You can use numpy.clip to perform the clipping operation in a vectorized way, but computing the min and max over a moving window is going to entail some Python loops and a deque or stack structure like you've already implemented.
See these questions for more examples of the approach:
Computing a moving maximum
Find the min number in all contiguous subarrays of size l of a array of size n
Implement a queue in which push_rear(), pop_front() and get_min() are all constant time operations
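For reference, a minimal sketch of the monotonic-deque technique those questions discuss, in pure Python (the function name sliding_min_max is illustrative). Each index is pushed onto and popped from each deque at most once, so computing all window minima and maxima takes O(n) time overall, independent of the window size.

```python
from collections import deque


def sliding_min_max(values, win):
    """Hypothetical helper: lists of window minima and maxima for every full window."""
    mins, maxs = [], []
    min_dq, max_dq = deque(), deque()  # indices; values increasing / decreasing from the front
    for i, v in enumerate(values):
        # Drop indices whose values can never be the window min (or max) again.
        while min_dq and values[min_dq[-1]] >= v:
            min_dq.pop()
        min_dq.append(i)
        while max_dq and values[max_dq[-1]] <= v:
            max_dq.pop()
        max_dq.append(i)
        # At most one index leaves the window [i - win + 1, i] per step.
        if min_dq[0] <= i - win:
            min_dq.popleft()
        if max_dq[0] <= i - win:
            max_dq.popleft()
        if i >= win - 1:
            mins.append(values[min_dq[0]])
            maxs.append(values[max_dq[0]])
    return mins, maxs


print(sliding_min_max([3, 4, 5, 4, 3, 2, 3, 3], 3))
# ([3, 4, 3, 2, 2, 2], [5, 5, 5, 4, 3, 3])
```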
For example, let's consider the following numpy array:
[1, 5, 0, 5, 4, 6, 1, -1, 5, 10]
Also, let's suppose that the threshold is equal to 3.
That is to say that we are looking for sequences of at least two consecutive values that are all above the threshold.
The output would be the indices of those values, which in our case is:
[[3, 4, 5], [8, 9]]
If the output array was flattened that would work as well!
[3, 4, 5, 8, 9]
Output Explanation
In our initial array we can see that for index = 1 we have the value 5, which is greater than the threshold, but is not part of a sequence (of at least two values) where every value is greater than the threshold. That's why this index would not make it to our output.
On the other hand, for indices [3, 4, 5] we have a sequence of (at least two) neighboring values [5, 4, 6], each of which is above the threshold, and that's why their indices are included in the final output!
My Code so far
I have approached the issue with something like this:
(arr > 3).nonzero()
The above command gathers the indices of all the items that are above the threshold. However, I cannot determine whether they are consecutive. I have thought of applying diff to the result of the above snippet and then maybe locating the ones (which would mean the indices follow one another). That would give us:
np.diff((arr > 3).nonzero())
But I'd still be missing something here.
If you convolve a boolean array with a window of ones of size win_size ([1] * win_size), you obtain an array that equals win_size exactly where the condition held for win_size consecutive items:
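import numpy as np

def groups(arr, *, threshold, win_size, merge_contiguous=False, flat=False):
    # conv[i] counts how many of arr[i : i + win_size] are above the threshold
    # ("above" means strictly greater, as in the question).
    conv = np.convolve((arr > threshold).astype(int), [1] * win_size, mode="valid")
    # Start indices of windows where every item is above the threshold.
    indexes_start = np.where(conv == win_size)[0]
    indexes = [np.arange(index, index + win_size) for index in indexes_start]
    if flat or merge_contiguous:
        # Overlapping windows repeat indices; np.unique removes them (and sorts).
        indexes = np.unique(indexes)
    if merge_contiguous:
        # Split wherever two neighboring indices are not consecutive.
        indexes = np.split(indexes, np.where(np.diff(indexes) != 1)[0] + 1)
    return indexes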
arr = np.array([1, 5, 0, 5, 4, 6, 1, -1, 5, 10])
threshold = 3
win_size = 2
print(groups(arr, threshold=threshold, win_size=win_size))
print(groups(arr, threshold=threshold, win_size=win_size, merge_contiguous=True))
print(groups(arr, threshold=threshold, win_size=win_size, flat=True))
[array([3, 4]), array([4, 5]), array([8, 9])]
[array([3, 4, 5]), array([8, 9])]
[3 4 5 8 9]
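To see why conv == win_size marks the window starts, here is the intermediate convolution for the example (threshold 3, win_size 2). Each entry counts the above-threshold items in the window starting at that index, so the value 2 appears only where both items qualify.

```python
import numpy as np

arr = np.array([1, 5, 0, 5, 4, 6, 1, -1, 5, 10])
above = (arr > 3).astype(int)                    # [0 1 0 1 1 1 0 0 1 1]
conv = np.convolve(above, [1, 1], mode="valid")
print(conv)                                      # [1 1 1 2 2 1 0 1 2]
print(np.where(conv == 2)[0])                    # [3 4 8]
```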
You can do what you want using simple numpy operations:
import numpy as np
arr = np.array([1, 5, 0, 5, 4, 6, 1, -1, 5, 10])
arr_padded = np.concatenate(([0], arr, [0]))
a = np.where(arr_padded > 3, 1, 0)
da = np.diff(a)
idx_start = (da == 1).nonzero()[0]
idx_stop = (da == -1).nonzero()[0]
valid = (idx_stop - idx_start >= 2).nonzero()[0]
result = [list(range(idx_start[i], idx_stop[i])) for i in valid]
print(result)
Explanation
Array a is a padded binary version of the original array, with 1s where the original elements are greater than three. da contains 1 where "islands" of 1s begin in a, and -1 where the "islands" end. Due to the padding, da is guaranteed to contain an equal number of 1s and -1s. Extracting their indices, we can calculate the length of each island. Valid index pairs are those whose respective islands have length >= 2. Then it's just a matter of generating all numbers between the index bounds of the valid islands.
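The same padding trick can be wrapped in a small function with the threshold and the minimum run length as parameters. This is a sketch; the name runs_above and its parameters are made up for illustration, and np.flatnonzero is used in place of (...).nonzero()[0].

```python
import numpy as np


def runs_above(arr, threshold, min_len=2):
    """Hypothetical helper: index arrays of runs of at least min_len values above threshold."""
    # Pad with False so every run has both a start and an end in the diff.
    mask = np.concatenate(([False], np.asarray(arr) > threshold, [False]))
    d = np.diff(mask.astype(int))
    starts = np.flatnonzero(d == 1)   # first index of each run (in arr's coordinates)
    stops = np.flatnonzero(d == -1)   # one past the last index of each run
    keep = stops - starts >= min_len
    return [np.arange(b, e) for b, e in zip(starts[keep], stops[keep])]


print(runs_above([1, 5, 0, 5, 4, 6, 1, -1, 5, 10], 3))
# [array([3, 4, 5]), array([8, 9])]
```

Concatenating the result with np.concatenate gives the flattened form [3, 4, 5, 8, 9].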
I followed your original idea; you were almost there.
I use a second diff (diff2) to pick the index of the first value in each sequence. See the comments in the code for details.
import numpy as np

arr = np.array([ 1, 5, 0, 5, 4, 6, 1, -1, 5, 10])
threshold = 3
all_idx = (arr > threshold).nonzero()[0]
# array([1, 3, 4, 5, 8, 9])
result = np.empty(0)
if all_idx.size > 1:
    diff1 = np.zeros_like(all_idx)
    diff1[1:] = np.diff(all_idx)
    # array([0, 2, 1, 1, 3, 1])
    diff1[0] = diff1[1]
    # array([2, 2, 1, 1, 3, 1])
    # **Positions with a value 1 in diff1 should be kept.**
    # But we also want the position before each 1. Create another diff2
    diff2 = np.zeros_like(all_idx)
    diff2[:-1] = np.diff(diff1)
    # array([ 0, -1, 0, 2, -2, 0])
    # **Positions with a negative value in diff2 should be kept, but only when
    # the next value of diff1 is 1, i.e. diff1 + diff2 == 1.**
    # Without that check, e.g. [1, 4, 6, 7] would wrongly keep 4.
    result = all_idx[(diff1 == 1) | ((diff2 < 0) & (diff1 + diff2 == 1))]
print(result)
# [3 4 5 8 9]
I'll try something different using window views. I'm not sure this works all the time, so counterexamples are welcome. It has the advantage of not requiring Python loops.
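import numpy as np
from numpy.lib.stride_tricks import sliding_window_view as window  # NumPy >= 1.20

def consec_thresh(arr, thresh):
    # Pairs of neighboring above-threshold indices, shape (n - 1, 1, 2, 1).
    win = window(np.argwhere(arr > thresh), (2, 1))
    # Keep the pairs whose two indices differ by 1, then flatten and deduplicate.
    return np.unique(win[np.diff(win, axis=2).ravel() == 1, :, :].ravel())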
How does it work?
So we start with the array and gather the indices where the threshold is met:
In [180]: np.argwhere(arr > 3)
Out[180]:
array([[1],
       [3],
       [4],
       [5],
       [8],
       [9]])
Then we build a sliding window that makes up pairs of values along the column (which is the reason for the (2, 1) shape of the window).
In [181]: window(np.argwhere(arr > 3), (2, 1))
Out[181]:
array([[[[1],
         [3]]],
       [[[3],
         [4]]],
       [[[4],
         [5]]],
       [[[5],
         [8]]],
       [[[8],
         [9]]]])
Now we take the difference inside each pair; if it is one, the indices are consecutive.
In [182]: np.diff(window(np.argwhere(arr > 3), (2, 1)), axis=2)
Out[182]:
array([[[[2]]],
       [[[1]]],
       [[[1]]],
       [[[3]]],
       [[[1]]]])
We can use those differences as a boolean mask to select the matching windows created above:
In [185]: window(np.argwhere(arr > 3), (2, 1))[np.diff(window(np.argwhere(arr > 3), (2, 1)), axis=2).ravel() == 1, :, :]
Out[185]:
array([[[[3],
         [4]]],
       [[[4],
         [5]]],
       [[[8],
         [9]]]])
Then we ravel the selected windows (flatten, without a copy when possible). Windowing repeats the indices that belong to two pairs (here 4), so np.unique removes the duplicates (and sorts the result), giving:
array([3, 4, 5, 8, 9])
The iterative code below solves this in O(n) time:
arr = [1, 5, 0, 5, 4, 6, 1, -1, 5, 10]
threshold = 3
sequence = 2
output = []
temp_arr = []
for i in range(len(arr)):
    if arr[i] > threshold:
        temp_arr.append(i)
    else:
        # A run just ended: keep it only if it is long enough.
        if len(temp_arr) >= sequence:
            output.append(temp_arr)
        temp_arr = []
# The last run may reach the end of the array; apply the same length check.
if len(temp_arr) >= sequence:
    output.append(temp_arr)
    temp_arr = []
print(output)
# Output
# [[3, 4, 5], [8, 9]]
I would suggest a loop with two indices: i, which starts at 0, and j = i + 1.
You can then check whether the values at both indices are greater than the threshold; if so, add the indices to a list and keep moving j forward until the value at j is no longer greater than the threshold.
values = [1, 5, 0, 5, 4, 6, 1, -1, 5, 10]
res = []
threshold = 3
i = 0
j = 0
for _ in values:
    j = i + 1
    lista = []
    try:
        print(f"i: {i} j:{j}")
        # check if condition is met
        if values[i] > threshold and values[j] > threshold:
            lista.append(i)
            # add sequence
            while values[j] > threshold:
                lista.append(j)
                print(f"j while: {j}")
                j += 1
                if j >= len(values):
                    break
            res.append(lista)
        i = j
        if j >= len(values):
            break
    except IndexError:
        # values[j] ran past the end of the list
        print("ex")
print(res)  # [[3, 4, 5], [8, 9]]
This works, but it needs refactoring.
Let's try the following code:
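# Simple is better than complex
# Complex is better than complicated
arr = [1, 5, 0, 5, 4, 6, 1, -1, 5, 10]
# Keep each index whose value is above 3; replace every other index with 'a'
arr_3 = [i if arr[i] > 3 else 'a' for i in range(len(arr))]
# 'a1a345aa89' (only works while every index is a single digit, i.e. len(arr) <= 10)
arr_4 = ''.join(str(x) for x in arr_3)
# Split on 'a' to get the runs of consecutive indices: ['', '1', '345', '', '89']
arr_5 = arr_4.split('a')
i = 0
# Drop runs shorter than two indices
while i < len(arr_5):
    if len(arr_5[i]) <= 1:
        del arr_5[i]
    else:
        i += 1
arr_6 = [list(map(int, x)) for x in arr_5]
print(arr_6)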
Outputs:
[[3, 4, 5], [8, 9]]
Here is a solution that makes use of pandas Series:
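import numpy as np
import pandas as pd
arr = np.array([1, 5, 0, 5, 4, 6, 1, -1, 5, 10])
thresh = 3
win_size = 2
s = pd.Series(arr)
# s.le(thresh).cumsum() gives each run of values above thresh its own label
# locating groups of values where there are at least (win_size) consecutive values above the threshold
groups = s.groupby(s.le(thresh).cumsum().loc[s.gt(thresh)]).transform('count').ge(win_size)
print(groups)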
0 False
1 False
2 False
3 True
4 True
5 True
6 False
7 False
8 True
9 True
dtype: bool
We can now easily take their indices in a 1D array:
np.flatnonzero(groups)
# array([3, 4, 5, 8, 9], dtype=int64)
Or as multiple arrays:
[np.arange(index.start, index.stop) for index in np.ma.clump_unmasked(np.ma.masked_not_equal(groups.values, value=True))]
# [array([3, 4, 5], dtype=int64), array([8, 9], dtype=int64)]
Looking for something like
A = np.array([1, 2, 3, 4])
B = np.array([5, 7])
print A.add(B, 1)
[1, 7, 10, 4]
potentially choosing axis to add along.
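NumPy arrays have no add(B, offset) method like the one sketched above, but slice assignment gives the same effect. A minimal sketch, assuming B fits inside A starting at the offset; the function name add_at_offset and its axis handling are made up for illustration.

```python
import numpy as np


def add_at_offset(A, B, offset, axis=0):
    """Hypothetical helper: return a copy of A with B added starting at `offset` along `axis`."""
    out = np.array(A, copy=True)
    B = np.asarray(B)
    # Build out[..., offset:offset + B.shape[axis], ...] for the chosen axis.
    index = [slice(None)] * out.ndim
    index[axis] = slice(offset, offset + B.shape[axis])
    out[tuple(index)] += B
    return out


A = np.array([1, 2, 3, 4])
B = np.array([5, 7])
print(add_at_offset(A, B, 1))  # [ 1  7 10  4]
```

Working on a copy keeps A unchanged; for an in-place update, A[1:3] += B is enough in the 1-D case.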
I read the SciPy docs for scipy.ndimage.uniform_filter1d. However, when I tried using it, I couldn't wrap my head around how it works. I read the docs, ran the example there in the Python shell, and tried my own example, but still made no progress.
For example:
>>> from scipy.ndimage import uniform_filter1d
>>> uniform_filter1d([2, 8, 0, 4, 1, 9, 9, 0], size=3)
array([4, 3, 4, 1, 4, 6, 6, 3])
>>> uniform_filter1d([1, 2, 3, 4, 5, 6, 7, 8], size=3)
array([1, 2, 3, 4, 5, 6, 7, 7])
When I saw the output of the second example, it felt like the function retained most of the array's elements. However, in the first example it felt like, barring 4 and 1, all the other elements in the output array were completely new.
So I would like help understanding how this function works and how it is used.
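For each element, this filter takes the arithmetic mean of a window of size elements centered on that element. Elements near the edges don't have enough neighbors, so by default (mode='reflect') the input is extended by reflecting it about its edges, which repeats the edge value. Because the input here is a list of integers, the output is an integer array too, and the fractional part of each average is dropped. Let's go through the process: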
uniform_filter1d([1,2,3,4,5,6], size=3)
index 0: reflect 1 -> [1,1,2] -> average: 4/3 = 1
index 1:              [1,2,3] -> average: 6/3 = 2
index 2:              [2,3,4] -> average: 9/3 = 3
index 3:              [3,4,5] -> average: 12/3 = 4
index 4:              [4,5,6] -> average: 15/3 = 5
index 5: reflect 6 -> [5,6,6] -> average: 17/3 = 5
Result: [1,2,3,4,5,5]
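The walkthrough above can be reproduced with plain NumPy as a cross-check. This is a sketch, not SciPy's implementation: it assumes an odd size (so the window is centered), that SciPy's default mode='reflect' (d c b a | a b c d | d c b a) corresponds to NumPy's np.pad mode 'symmetric', and that integer input is simply cast back to its dtype, which agrees with the non-negative examples in this question.

```python
import numpy as np


def uniform_filter1d_sketch(x, size):
    """Hypothetical re-implementation for odd `size`, mimicking mode='reflect'."""
    if size % 2 == 0:
        raise ValueError("this sketch only handles odd sizes")
    x = np.asarray(x)
    half = size // 2
    # 'symmetric' repeats the edge sample: [1, 2, 3] -> [1, 1, 2, 3, 3] for half == 1.
    padded = np.pad(x, half, mode="symmetric")
    # Window sums first, then one division, so exact averages stay exact.
    sums = np.convolve(padded, np.ones(size, dtype=padded.dtype), mode="valid")
    means = sums / size
    # Casting back to an integer dtype drops the fractional part (4/3 -> 1).
    return means.astype(x.dtype)


print(uniform_filter1d_sketch([1, 2, 3, 4, 5, 6], 3))        # [1 2 3 4 5 5]
print(uniform_filter1d_sketch([2, 8, 0, 4, 1, 9, 9, 0], 3))  # [4 3 4 1 4 6 6 3]
```

Passing a float array (e.g. np.array([1, 2, 3, 4, 5, 6], dtype=float)) keeps the fractional averages, which makes the window arithmetic easier to see.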
I am trying to understand numpy's argpartition function. I have made the documentation's example as basic as possible.
import numpy as np
x = np.array([3, 4, 2, 1])
print("x: ", x)
a=np.argpartition(x, 3)
print("a: ", a)
print("x[a]:", x[a])
This is the output...
('x: ', array([3, 4, 2, 1]))
('a: ', array([2, 3, 0, 1]))
('x[a]:', array([2, 1, 3, 4]))
In the line a=np.argpartition(x, 3) isn't the kth element the last element (the number 1)? If it is number 1, when x is sorted shouldn't 1 become the first element (element 0)?
In x[a], why is 2 the first element "in front" of 1?
What fundamental thing am I missing?
The more complete answer to what argpartition does is in the documentation of partition, and that one says:
Creates a copy of the array with its elements rearranged in such a way
that the value of the element in k-th position is in the position it
would be in a sorted array. All elements smaller than the k-th element
are moved before this element and all equal or greater are moved
behind it. The ordering of the elements in the two partitions is
undefined.
So, for the input array 3, 4, 2, 1, the sorted array would be 1, 2, 3, 4.
The result of np.partition([3, 4, 2, 1], 3) will have the correct value (i.e. the same as in the sorted array) at index 3 (i.e. the last position). The correct value at index 3 is 4.
Let me show this for all values of k to make it clear:
np.partition([3, 4, 2, 1], 0) -> [1, 4, 2, 3]
np.partition([3, 4, 2, 1], 1) -> [1, 2, 4, 3]
np.partition([3, 4, 2, 1], 2) -> [1, 2, 3, 4]
np.partition([3, 4, 2, 1], 3) -> [2, 1, 3, 4]
In other words: the k-th element of the result is the same as the k-th element of the sorted array. All elements before it are smaller than or equal to it, and all elements after it are greater than or equal to it.
The same happens with argpartition, except that argpartition returns indices, which can then be used to form the same result.
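Because the order inside each side is undefined, the exact arrays listed above may differ between NumPy versions; what is guaranteed can be checked directly. A short sketch that verifies those guarantees for every k on the question's array, for both partition and argpartition:

```python
import numpy as np

x = np.array([3, 4, 2, 1])
s = np.sort(x)
for k in range(len(x)):
    p = np.partition(x, k)
    q = x[np.argpartition(x, k)]  # indices mapped back to values
    for arr in (p, q):
        # Position k holds the value it would have in the sorted array ...
        assert arr[k] == s[k]
        # ... everything before it is <= that value, everything after it is >= it.
        assert (arr[:k] <= arr[k]).all() and (arr[k + 1:] >= arr[k]).all()
    print(k, p, q)
```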
Similar to #Imtinan, I struggled with this. I found it useful to break the function up into the arg part and the partition part.
Take the following array:
array = np.array([9, 2, 7, 4, 6, 3, 8, 1, 5])
The corresponding indices are [0,1,2,3,4,5,6,7,8], where index 8 holds 5 and index 0 holds 9.
If we do np.partition(array, 5), the code finds the element that would be at index 5 if the array were sorted (the sorted array is [1,2,3,4,5,6,7,8,9], so that element is 6) and places it at index 5 of a new array. It then puts the elements smaller than it before it and the larger ones after it, like this:
pseudo output: [lower value elements, 5th element, higher value elements]
if we compute this we get:
array([3, 5, 1, 4, 2, 6, 8, 7, 9])
This makes sense: 6 is the value at index 5 of the sorted array (here it also happens to be the 5th element of the original array), [1,2,3,4,5] are all lower than 6 and [7,8,9] are higher than 6. Note that the elements on each side are not ordered.
The arg part of the np.argpartition() then goes one step further and swaps the elements out for their respective indices in the original array. So if we did:
np.argpartition(array, 5) we will get:
array([5, 8, 7, 3, 1, 4, 6, 2, 0])
from above, the original array had this structure [index=value]
[0=9, 1=2, 2=7, 3=4, 4=6, 5=3, 6=8, 7=1, 8=5]
you can map each index in the output to its value, and you will see that array[np.argpartition(array, 5)] equals np.partition(array, 5), like this:
[index form] array([5, 8, 7, 3, 1, 4, 6, 2, 0]) becomes
[3, 5, 1, 4, 2, 6, 8, 7, 9]
which is the same as the output of np.partition(array, 5):
array([3, 5, 1, 4, 2, 6, 8, 7, 9])
Hopefully this makes sense; it was the only way I could get my head around the arg part of the function.
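The index-to-value mapping described above is just fancy indexing, so it can be checked in a few lines. A sketch: only the value at position 5 and the side each element lands on are guaranteed, not the order within each side, so the checks compare sets rather than exact arrays.

```python
import numpy as np

array = np.array([9, 2, 7, 4, 6, 3, 8, 1, 5])
idx = np.argpartition(array, 5)
values = array[idx]  # map each returned index back to its value
print(idx, values)
assert values[5] == np.sort(array)[5] == 6
assert set(values[:5].tolist()) == {1, 2, 3, 4, 5}
assert set(values[6:].tolist()) == {7, 8, 9}
# argpartition leaves the original array untouched.
assert array.tolist() == [9, 2, 7, 4, 6, 3, 8, 1, 5]
```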
I remember having a hard time figuring it out too. Maybe the documentation is written badly, but this is what it means.
When you do a=np.argpartition(x, 3), x itself is not changed; a holds indices arranged so that, in x[a], only the element at the k'th index (in our case k=3) is guaranteed to be in its sorted position.
So when you run this code, you are basically asking what the value at index 3 of the sorted array would be. Hence the output ('x[a]:', array([2, 1, 3, 4])), where only the element at index 3 (the value 4) is in its sorted position.
As the documentation says, all numbers smaller than the kth element come before it, in no particular order; that's why you get 2 before 1.
I hope this clarifies it; if you are still confused, feel free to comment :)
I have the following matrix:
import numpy as np
A:
matrix([[ 1,  2,  3,  4],
        [ 3,  4, 10,  8]])
The question is how do I implement the following rule: if any number in a column of matrix A is less than or equal to (<=) K (3), then change the last number of that column to the minimum of the last entry of the column and 5. So basically, my matrix should transform to this:
A:
matrix([[1, 2, 3, 4],
        [3, 4, 5, 8]])
I tried this function:
A[-1][np.any(A <= 3, axis=0)] = np.maximum(A[-1], 5)
But I have the following error:
TypeError: NumPy boolean array indexing assignment requires a 0 or 1-dimensional input, input has 2 dimensions
You should be using np.minimum here. Create a mask, and index, setting values accordingly.
B = np.array(A)                # plain ndarray, so .any(0) gives a 1-D mask
m = (B <= 3).any(0)            # columns containing a value <= 3
A[-1, m] = np.minimum(A[-1, m], 5)
A
matrix([[1, 2, 3, 4],
        [3, 4, 5, 8]])
Here is one way:
A[-1][np.logical_and(A[-1] > 5, np.any(A <= 3, axis=0))] = 5
# matrix([[1, 2, 3, 4],
#         [3, 4, 5, 8]])
This takes advantage of the fact that you only need to change a number if it is greater than 5. Therefore, the minimum criterion is taken care of by the A[-1] > 5 condition.
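NumPy's documentation no longer recommends np.matrix, even for linear algebra. With a regular ndarray, the same rule can be written without chained indexing. A sketch; K = 3 and the cap of 5 come from the question.

```python
import numpy as np

A = np.array([[1, 2, 3, 4],
              [3, 4, 10, 8]])
K, cap = 3, 5
# True for every column that contains at least one value <= K.
cols = (A <= K).any(axis=0)
# In those columns, replace the last entry by min(last entry, cap); leave the others unchanged.
A[-1] = np.where(cols, np.minimum(A[-1], cap), A[-1])
print(A)
```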