I am trying to understand numpy's argpartition function. I have made the documentation's example as basic as possible.
import numpy as np
x = np.array([3, 4, 2, 1])
print("x: ", x)
a=np.argpartition(x, 3)
print("a: ", a)
print("x[a]:", x[a])
This is the output...
('x: ', array([3, 4, 2, 1]))
('a: ', array([2, 3, 0, 1]))
('x[a]:', array([2, 1, 3, 4]))
In the line a=np.argpartition(x, 3) isn't the kth element the last element (the number 1)? If it is number 1, when x is sorted shouldn't 1 become the first element (element 0)?
In x[a], why is 2 the first element "in front" of 1?
What fundamental thing am I missing?
The more complete answer to what argpartition does is in the documentation of partition, and that one says:
Creates a copy of the array with its elements rearranged in such a way
that the value of the element in k-th position is in the position it
would be in a sorted array. All elements smaller than the k-th element
are moved before this element and all equal or greater are moved
behind it. The ordering of the elements in the two partitions is
undefined.
So, for the input array 3, 4, 2, 1, the sorted array would be 1, 2, 3, 4.
The result of np.partition([3, 4, 2, 1], 3) will have the correct value (i.e. the same as in the sorted array) at index 3 (i.e. the last position). The correct value at index 3 is 4.
Let me show this for all values of k to make it clear:
np.partition([3, 4, 2, 1], 0) - [1, 4, 2, 3]
np.partition([3, 4, 2, 1], 1) - [1, 2, 4, 3]
np.partition([3, 4, 2, 1], 2) - [1, 2, 3, 4]
np.partition([3, 4, 2, 1], 3) - [2, 1, 3, 4]
In other words: the k-th element of the result is the same as the k-th element of the sorted array. All elements before k are smaller than or equal to that element. All elements after it are greater than or equal to it.
The same happens with argpartition, except argpartition returns indices which can then be used to form the same result.
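As a quick check of the rule above, here is a minimal sketch that verifies it for every k on the question's array. The exact order inside each partition is undefined, so the code only asserts the guaranteed properties and does not rely on any particular printed order.

```python
import numpy as np

x = np.array([3, 4, 2, 1])
sorted_x = np.sort(x)

for k in range(len(x)):
    p = np.partition(x, k)
    idx = np.argpartition(x, k)
    # Position k holds the value it would hold in the sorted array.
    assert p[k] == sorted_x[k]
    assert x[idx][k] == sorted_x[k]
    # Everything before k is <= that value, everything from k on is >= it.
    assert (p[:k] <= p[k]).all() and (p[k:] >= p[k]).all()
    print(k, p, idx)
```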
Similar to @Imtinan, I struggled with this. I found it useful to break up the function into the arg and the partition.
Take the following array:
array = np.array([9, 2, 7, 4, 6, 3, 8, 1, 5])
the corresponding indices are: [0,1,2,3,4,5,6,7,8] where 8th index = 5 and 0th = 9
If we do np.partition(array, kth=5), the code works out which value would sit at index 5 if the array were sorted, and places that value at index 5 of a new array. It then puts the elements smaller than that value before it and the larger ones after it, like this:
pseudo output: [lower value elements, element at sorted index 5, higher value elements]
If we compute this we get:
array([3, 5, 1, 4, 2, 6, 8, 7, 9])
This makes sense because the element at index 5 of the sorted array [1,2,3,4,5,6,7,8,9] is 6; [1,2,3,4,5] are all lower than 6 and [7,8,9] are higher than 6. (That 6 is also the 5th element of the original array is a coincidence.) Note that the elements within each group are not ordered.
The arg part of the np.argpartition() then goes one step further and swaps the elements out for their respective indices in the original array. So if we did:
np.argpartition(array, 5) we will get:
array([5, 8, 7, 3, 1, 4, 6, 2, 0])
from above, the original array had this structure [index=value]
[0=9, 1=2, 2=7, 3=4, 4=6, 5=3, 6=8, 7=1, 8=5]
You can map each index in the output back to its value, and you will satisfy the condition:
argpartition() = partition(), like this:
[index form] array([5, 8, 7, 3, 1, 4, 6, 2, 0]) becomes
[3, 5, 1, 4, 2, 6, 8, 7, 9]
which is the same as the output of np.partition(array, 5),
array([3, 5, 1, 4, 2, 6, 8, 7, 9])
Hopefully, this makes sense, it was the only way I could get my head around the arg part of the function.
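The index-to-value mapping described above can be sketched as follows. Only the properties that partitioning guarantees are checked here; the order inside the "lower" and "higher" groups may differ between NumPy versions.

```python
import numpy as np

array = np.array([9, 2, 7, 4, 6, 3, 8, 1, 5])
k = 5

idx = np.argpartition(array, k)   # indices into the original array
values = array[idx]               # fancy indexing turns indices back into values

# values[k] is the element found at index k of the sorted array.
print(values[k])                  # 6
# The first k entries are the k smallest values, in no guaranteed order.
print(np.sort(values[:k]))        # [1 2 3 4 5]
```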
I remember having a hard time figuring it out too. Maybe the documentation is written badly, but this is what it means.
When you do a=np.argpartition(x, 3), you get indices that rearrange x in such a way that only the element at the k-th index is guaranteed to be in its sorted position (in our case k=3).
So when you run this code you are basically asking what the value at index 3 would be in a sorted array. Hence the output is ('x[a]:', array([2, 1, 3, 4])), where only the element at index 3 is guaranteed to be in its sorted place.
As the documentation says, all numbers smaller than the k-th element are before it (in no particular order), hence you get 2 before 1, since there is no particular order.
I hope this clarifies it. If you are still confused then feel free to comment :)
Related
Suppose I have the following array:
import numpy as np
x = np.array([1,2,3,4,5,
1,2,3,4,5,
1,2,3,4,5])
How can I manipulate it to remove the term in equally spaced intervals and adapt the new length for it? For example, I'd like to have:
x = [1,2,3,4,
1,2,3,4,
1,2,3,4]
Where the terms from positions 4, 9, and 14 were excluded (so every 5 terms, one gets excluded). If possible, I'd like to have a code that I could use for an array with length N. Thank you in advance!
In your case, you can simply run the code below after initializing the x array (as you did in your question):
x.reshape(3,5)[:,:4]
Output
array([[1, 2, 3, 4],
[1, 2, 3, 4],
[1, 2, 3, 4]])
If you are interested in getting a vector and not a matrix (such as the output above), you can call the flatten function on the code above:
x.reshape(3,5)[:,:4].flatten()
Output
array([1, 2, 3, 4,
1, 2, 3, 4,
1, 2, 3, 4])
Explanation
Since x is a NumPy array, we can use NumPy's built-in methods such as reshape. This method, which has a self-explanatory name, shapes the array into the desired format. x was a vector of 15 elements, so running x.reshape(3,5) gives us a matrix with 3 rows and 5 columns. [:, :4] then selects the first four columns of every row. Finally, flatten turns the matrix back into a vector.
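For an array of length N, the same idea can be written without hard-coding the number of rows. This is a minimal sketch; the helper name drop_last_of_each_block is made up for illustration, and it assumes the length is an exact multiple of the block size.

```python
import numpy as np

def drop_last_of_each_block(x, n):
    """Drop the last element of every block of n elements (hypothetical helper)."""
    x = np.asarray(x)
    if x.size % n != 0:
        raise ValueError("length of x must be a multiple of n")
    # -1 lets NumPy work out the number of rows from the array length.
    return x.reshape(-1, n)[:, :-1].ravel()

x = np.array([1, 2, 3, 4, 5,
              1, 2, 3, 4, 5,
              1, 2, 3, 4, 5])
print(drop_last_of_each_block(x, 5))
```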
IIUC, you can use a boolean mask generated with the modulo (%) operator:
N = 5
mask = np.arange(len(x))%N != N-1
x[mask]
output: array([1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4])
This works even if the size of your array is not a multiple of N.
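To illustrate the non-multiple case, here is a small sketch on an array of length 7. It also cross-checks the mask against np.delete, which is a different technique (deleting by explicit index list) and is not part of the answer above.

```python
import numpy as np

N = 5
x = np.arange(7)                      # length 7 is not a multiple of 5

# Boolean mask: False at positions 4, 9, 14, ... (every N-th element).
mask = np.arange(len(x)) % N != N - 1
masked = x[mask]

# Alternative: delete the same positions by index.
deleted = np.delete(x, np.arange(N - 1, len(x), N))

print(masked)    # [0 1 2 3 5 6]
print(deleted)   # [0 1 2 3 5 6]
```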
How can the elements from the following array (taking only the values 3 to 8)
a = np.array([1,2,3,4,5,6,7,8,9])
go to
A = np.array([[0,0,0],
[0,0,0]])
Ideal output would be:
A = ([[3,4,5],
[6,7,8]])
np.arange(3, 9).reshape((2, 3))
outputs
array([[3, 4, 5],
[6, 7, 8]])
A possible technique, presuming that you have an existing numpy array a is to use slicing and reshaping:
Starting array
>>> a = np.array([1,2,3,4,5,6,7,8,9])
>>> a
array([1, 2, 3, 4, 5, 6, 7, 8, 9])
Slicing
>>> A = a[2:-1]
>>> A
array([3, 4, 5, 6, 7, 8])
Reshaping
>>> A = A.reshape((2, 3))
>>> A
array([[3, 4, 5],
[6, 7, 8]])
The above solution presumes that you know which indices to use when slicing. In this case, I presumed that we knew that the element 3 sits at index 2, and that the last desired element, 8, sits in the second-to-last position of the array (index -2). For clarity's sake: a slice starts at the first index and goes up to, but does not include, the second index, which is why the stop is -1 here. It is often easier to find a position close to the end of the array by counting backwards with negative indices, as I have done here. An alternative is to use the positive stop index 8, which is the position of the final element 9:
A = a[2:8]
A one-liner solution would be to daisy-chain the method calls together:
Starting array
>>> a = np.array([1,2,3,4,5,6,7,8,9])
>>> a
array([1, 2, 3, 4, 5, 6, 7, 8, 9])
Slicing and reshaping
>>> A = a[2:-1].reshape((2, 3))
>>> A
array([[3, 4, 5],
[6, 7, 8]])
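If the positions are not known in advance but the value range is, a boolean mask can replace the slice. This is a sketch of a different technique than the slicing shown above, and it assumes exactly six values fall in the range so that the reshape to (2, 3) succeeds.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

# Select by value instead of by position: keep elements with 3 <= value <= 8.
selected = a[(a >= 3) & (a <= 8)]
A = selected.reshape(2, 3)
print(A)
```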
How to initialize an array whose first dimension is fixed (say 5) but the second dimension may vary. For example, we create an array arr with five entries and then we add some element, e.g. to arr[1] by appending some value, and then to arr[2], and then again we append to arr[1], etc.
You can use a 2D list here to make your life easier
#Define the list
a = [[],[]]
#Add 6 elements (0 through 5) to both sublists
for i in range(6):
    a[0].append(i)
    a[1].append(i)
print(a)
#[[0, 1, 2, 3, 4, 5], [0, 1, 2, 3, 4, 5]]
#Add more elements to 2nd sublist
a[1].append(6)
a[1].append(7)
print(a)
#[[0, 1, 2, 3, 4, 5], [0, 1, 2, 3, 4, 5, 6, 7]]
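For the case in the question (a fixed first dimension of 5), a minimal sketch could look like this. The list comprehension matters: [[]] * 5 would create five references to the same inner list.

```python
# Five independent, initially empty rows.
arr = [[] for _ in range(5)]

arr[1].append(10)
arr[2].append(20)
arr[1].append(11)
print(arr)       # [[], [10, 11], [20], [], []]

# Pitfall: multiplying the outer list repeats the SAME inner list object.
aliased = [[]] * 5
aliased[1].append(10)
print(aliased)   # [[10], [10], [10], [10], [10]]
```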
Let's say I have
arr = np.arange(6)
arr
array([0, 1, 2, 3, 4, 5])
and I decide that I want to treat an array "like a circle": When I run out of material at the end, I want to start at index 0 again. That is, I want a convenient way of selecting x elements, starting at index i.
Now, if x == 6, I can simply do
i = 3
np.hstack((arr[i:], arr[:i]))
Out[9]: array([3, 4, 5, 0, 1, 2])
But is there a convenient way of doing this in general, even if x > 6, without having to manually break the array apart and think through the logic?
For example:
print(roll_array_arround(arr)[2:17])
should return.
array([2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0])
See mode='wrap' in ndarray.take:
https://docs.scipy.org/doc/numpy/reference/generated/numpy.take.html
Taking your hypothetical function:
print(roll_array_arround(arr)[2:17])
If it is implied that it is a true slice of the original array that you are after, that is not going to happen; a wrapped-around array cannot be expressed as a strided view of the original; so if you seek a function that maps an ndarray to an ndarray, this will necessarily involve a copy of your data.
That is, efficiency-wise, you shouldn't expect to find a solution that differs significantly in performance from the expression below.
print(arr.take(np.arange(2,17), mode='wrap'))
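The expression above can be wrapped in a small helper. The name take_wrapped is made up for illustration. Note that a half-open range 2:17 covers indices 2 through 16, so it yields 15 elements.

```python
import numpy as np

def take_wrapped(arr, start, stop):
    """Return arr[start:stop] as if arr repeated forever (hypothetical helper)."""
    # mode='wrap' maps out-of-range indices back into the array, modulo its length.
    return np.take(arr, np.arange(start, stop), mode='wrap')

arr = np.arange(6)
print(take_wrapped(arr, 2, 17))   # [2 3 4 5 0 1 2 3 4 5 0 1 2 3 4]
```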
Modulus operation seems like the best fit here -
def rolling_array(n, x, i):
    # n is rolling period
    # x is length of array
    # i is starting number
    return np.mod(np.arange(i,i+x),n)
Sample runs -
In [61]: rolling_array(n=6, x=6, i=3)
Out[61]: array([3, 4, 5, 0, 1, 2])
In [62]: rolling_array(n=6, x=17, i=2)
Out[62]: array([2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0])
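The function above returns wrapped indices, which happen to equal the values because arr is np.arange(6). For an array with arbitrary contents, the same indices can be used for fancy indexing. A minimal sketch, with wrapped_slice as a made-up name:

```python
import numpy as np

def wrapped_slice(arr, start, length):
    """Take `length` elements starting at `start`, wrapping around the end."""
    arr = np.asarray(arr)
    idx = np.mod(np.arange(start, start + length), arr.size)
    return arr[idx]   # fancy indexing, so the result is a copy

letters = np.array(list("abcdef"))
print(wrapped_slice(letters, 3, 8))
```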
A solution you can look into would probably be :
from itertools import cycle
list_to_rotate = np.array([1,2,3,4,5])
rotatable_list = cycle(list_to_rotate)
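A cycle object cannot be sliced with [2:17] directly, so one way to pull a range out of it is itertools.islice. This is a sketch in plain Python; it walks the iterator element by element and is likely to be slower than the NumPy approaches for large arrays.

```python
from itertools import cycle, islice

import numpy as np

arr = np.arange(6)

# cycle() repeats the elements forever; islice() picks positions 2..16.
wrapped = list(islice(cycle(arr.tolist()), 2, 17))
print(wrapped)
```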
You need to roll your array.
>>> x = np.arange(10)
>>> np.roll(x, 2)
array([8, 9, 0, 1, 2, 3, 4, 5, 6, 7])
See numpy documentation for more details.
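Applied to the question's array, a negative shift reproduces the np.hstack result. Keep in mind that np.roll only rotates: the output always has the same length as the input, so it does not cover the x > 6 case on its own.

```python
import numpy as np

arr = np.arange(6)
i = 3

# A negative shift moves elements towards the front, wrapping the first i to the end.
rotated = np.roll(arr, -i)
print(rotated)   # [3 4 5 0 1 2]
```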
Here is an algorithm I would like to implement using numpy:
For a given 1D array, calculate the maximum and the minimum over a sliding window.
Create a new array, with the first value equal to the first value in the given array.
For each subsequent value, clip the previous value inserted in the new array between the min and the max from the sliding window.
As an example, let's take the array a=[3, 4, 5, 4, 3, 2, 3, 3] and a sliding window of size 3. We find for min and max:
min = [3, 4, 3, 2, 2, 2]
max = [5, 5, 5, 4, 3, 3]
Now our output array will start with the first element from a, so it's 3. And for the next value, I clip 3 (the last value inserted) between 4 and 5 (the min and max found at index 1). The result is 4. For the next value I clip 4 between 3 and 5. It's still 4. And so on. So we finally have:
output = [3, 4, 4, 4, 3, 3]
I cannot find a way to avoid using a python for loop in my code. Here is what I have for the moment:
import numpy as np
from collections import deque

def second_window(array, samples):
    sample_idx = samples - 1
    output = np.zeros_like(array[0:-sample_idx])
    start, stop = 0, len(array)
    last_value = array[0]
    # Sliding window is a deque of length 'samples'.
    sliding_window = deque(array[start : start+sample_idx], samples)
    for i in range(stop - start - sample_idx):
        # Get the next value in sliding window. After the first loop,
        # the left value gets discarded automatically.
        sliding_window.append(array[start + i + sample_idx])
        min_value, max_value = min(sliding_window), max(sliding_window)
        # Clip the last value between sliding window min and max
        last_value = min( max(last_value, min_value), max_value)
        output[start + i] = last_value
    return output
Would it be possible to achieve this result with only numpy?
I don't think you can. You can sometimes do this kind of iterative computation with unbuffered ufuncs, but this isn't the case. But let me elaborate...
OK, first the windowing and min/max calculations can be done much faster:
>>> a = np.array([3, 4, 5, 4, 3, 2, 3, 3])
>>> len_a = len(a)
>>> win = 3
>>> from numpy.lib.stride_tricks import as_strided
>>> win_a = as_strided(a, shape=(len_a-win+1, win), strides=a.strides*2)
>>> win_a
array([[3, 4, 5],
[4, 5, 4],
[5, 4, 3],
[4, 3, 2],
[3, 2, 3],
[2, 3, 3]])
>>> min_ = np.min(win_a, axis=-1)
>>> max_ = np.max(win_a, axis=-1)
Now, let's create and fill up your output array:
>>> out = np.empty((len_a-win+1,), dtype=a.dtype)
>>> out[0] = a[0]
If np.clip were a ufunc, we could then try to do:
>>> np.clip(out[:-1], min_[1:], max_[1:], out=out[1:])
array([4, 3, 3, 3, 3])
>>> out
array([3, 4, 3, 3, 3, 3])
But this doesn't work, because np.clip is not a ufunc, and there seems to be some buffering involved.
And if you apply np.minimum and np.maximum separately, then it doesn't always work:
>>> np.minimum(out[:-1], max_[1:], out=out[1:])
array([3, 3, 3, 3, 3])
>>> np.maximum(out[1:], min_[1:], out=out[1:])
array([4, 3, 3, 3, 3])
>>> out
array([3, 4, 3, 3, 3, 3])
although for your particular case reversing the order does work:
>>> np.maximum(out[:-1], min_[1:], out=out[1:])
array([4, 4, 4, 4, 4])
>>> np.minimum(out[1:], max_[1:], out=out[1:])
array([4, 4, 4, 3, 3])
>>> out
array([3, 4, 4, 4, 3, 3])
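Since the in-place tricks above depend on buffering details, a safer compromise might be to vectorize only the window min/max and keep a plain loop for the sequential clipping. The sketch below is one way to do that, not the method from the answer: it swaps as_strided for sliding_window_view, which assumes NumPy 1.20 or newer, and clip_to_window is a made-up name.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def clip_to_window(a, win):
    """Clip a running value between each window's min and max (hypothetical helper)."""
    a = np.asarray(a)
    windows = sliding_window_view(a, win)   # shape (len(a) - win + 1, win), read-only view
    lo = windows.min(axis=-1)               # vectorized window minimum
    hi = windows.max(axis=-1)               # vectorized window maximum
    out = np.empty(lo.shape, dtype=a.dtype)
    last = a[0]
    # Each output depends on the previous one, so this part stays a loop.
    for i in range(len(out)):
        last = min(max(last, lo[i]), hi[i])
        out[i] = last
    return out

print(clip_to_window([3, 4, 5, 4, 3, 2, 3, 3], 3))   # [3 4 4 4 3 3]
```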
You can use numpy.clip to perform the clipping operation in a vectorized way, but computing the min and max over a moving window is going to entail some Python loops and a deque or stack structure like you've already implemented.
See these questions for more examples of the approach:
Computing a moving maximum
Find the min number in all contiguous subarrays of size l of a array of size n
Implement a queue in which push_rear(), pop_front() and get_min() are all constant time operations
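The approach those questions describe is usually implemented with a monotonic deque, which gives every window maximum in a single pass. Here is a minimal pure-Python sketch of that idea (the function name moving_max is my own); the moving minimum is the same with the comparison flipped.

```python
from collections import deque

def moving_max(values, window):
    """Maximum of each length-`window` slice, using a monotonic deque of indices."""
    out = []
    candidates = deque()   # indices whose values are in strictly decreasing order
    for i, v in enumerate(values):
        # Older candidates that are not larger than v can never be a maximum again.
        while candidates and values[candidates[-1]] <= v:
            candidates.pop()
        candidates.append(i)
        # Drop the front candidate once it has slid out of the window.
        if candidates[0] <= i - window:
            candidates.popleft()
        # Start emitting once the first full window has been seen.
        if i >= window - 1:
            out.append(values[candidates[0]])
    return out

print(moving_max([3, 4, 5, 4, 3, 2, 3, 3], 3))   # [5, 5, 5, 4, 3, 3]
```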