I know that this question has been asked a hundred times, but the answer always seems to be "use numpy's argsort". Either I am misinterpreting what most people are asking, or the answers do not address the question. Whatever the case, I want to get the indices of a list's ascending order. That phrasing is confusing, so here is an example: given the list [4, 2, 1, 3], I expect to get back the list [3, 1, 0, 2]. The smallest item is 1, so it gets index 0; the largest one is 4, so it gets index 3. argsort is often suggested, but it just doesn't seem to do that.
from numpy import argsort
l = [4, 2, 1, 3]
print(argsort(l))
# [2, 1, 3, 0]
# Expected [3, 1, 0, 2]
Clearly argsort is doing something else, so what is it actually doing and how is it similar to the expected behaviour so that it is so often (wrongly) suggested? And, more importantly, how can I get the desired output?
argsort() returns the indices that would sort your list.
l = [4, 2, 1, 3]
First it takes the index of each element in the list, giving:
indexed=[0, 1, 2, 3]
Then it sorts these indices according to the items they correspond to in the original list: 4:0, 2:1, 1:2 and 3:3, where ":" means "corresponds to".
Sorting the original list gives
l=[1, 2, 3, 4]
and writing down, for each sorted value, the index it had in the old list gives
new=[2,1,3,0]
So argsort sorts the indices of a list according to the values in the original list.
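A quick way to check this description is to index the original array with the result of argsort: that gathers the elements in ascending order. A minimal sketch, using the example list from the question:

```python
import numpy as np

l = [4, 2, 1, 3]
order = np.argsort(l)           # indices that would sort l
sorted_l = np.array(l)[order]   # gather the elements in that order

print(order)     # [2 1 3 0]
print(sorted_l)  # [1 2 3 4]
```

In other words, argsort answers "which original index goes to sorted position k", not "which sorted position does element i go to".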
The reason you are not getting the 'right' (expected) answer is that you are asking the wrong question!
What you are after is the rank of each element after sorting, while NumPy's argsort() returns the indices that sort the array, as documented. These are not the same thing (as you found out ;) )!
@hpaulj answered me correctly, but in a comment, which you may not be able to see.
His answer helped me a lot; it gets me what I want.
import numpy as np
l = [4, 2, 1, 3]
print(np.argsort(np.argsort(l)))
Return:
[3 1 0 2]
This is what you expect. This method returns, for each element, the index it would have if the array were sorted (its rank).
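The reason this works is that the ranks are the inverse of the sorting permutation: order = argsort(l) says which original element lands at each sorted position, and argsort(order) turns that around into the sorted position of each original element. A small check of that relationship, as a sketch using the same example:

```python
import numpy as np

l = [4, 2, 1, 3]
order = np.argsort(l)      # sorted position -> original index
ranks = np.argsort(order)  # original index -> sorted position

# Composing a permutation with its inverse (in either order) gives the identity.
assert (order[ranks] == np.arange(len(l))).all()
assert (ranks[order] == np.arange(len(l))).all()
print(ranks)  # [3 1 0 2]
```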
⚠️ But note that if the input array contains repeated values, there is an interesting effect: equal values get different ranks.
import numpy as np
l = [4, 2, 1, 3, 4]
print(np.argsort(np.argsort(l)))
Return:
[3 1 0 2 4]
This may not matter to you, but it matters to me. I solved the problem like this:
import numpy as np
l = [4, 2, 1, 3, 4]
ret2 = np.vectorize(lambda val: np.searchsorted(np.unique(l), val))(l)
print('Returned', ret2)
print('Expected', [3, 1, 0, 2, 3])
Return:
Returned [3 1 0 2 3]
Expected [3, 1, 0, 2, 3]
Admittedly, my solution will be slow because np.vectorize is essentially a Python loop, and np.unique(l) is recomputed for every element.
But nothing prevents you from using numba. I haven't tested it though 😉.
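For what it's worth, the same "dense" ranking (equal values share a rank) can likely be computed without np.vectorize at all, since np.searchsorted accepts a whole array of values to look up, and np.unique can return the inverse mapping directly via return_inverse=True. A sketch, not benchmarked:

```python
import numpy as np

l = [4, 2, 1, 3, 4]

# Option 1: look up every value at once in the sorted unique values.
uniq = np.unique(l)                 # sorted unique values: [1 2 3 4]
dense1 = np.searchsorted(uniq, l)   # position of each value in uniq

# Option 2: np.unique can return, for each input element,
# the index of its value within the unique array.
_, dense2 = np.unique(l, return_inverse=True)

print(dense1)  # [3 1 0 2 3]
print(dense2)  # [3 1 0 2 3]
```

Both compute np.unique only once, so the per-element overhead of the lambda disappears.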
Related
Suppose I have the following array:
import numpy as np
x = np.array([1,2,3,4,5,
1,2,3,4,5,
1,2,3,4,5])
How can I remove elements at equally spaced intervals, so that the array shrinks to the new length? For example, I'd like to have:
x = [1,2,3,4,
1,2,3,4,
1,2,3,4]
Where the terms from positions 4, 9, and 14 were excluded (so every 5 terms, one gets excluded). If possible, I'd like to have a code that I could use for an array with length N. Thank you in advance!
In your case, you can simply run the code below after initializing the x array (as you did in your question):
x.reshape(3,5)[:,:4]
Output
array([[1, 2, 3, 4],
[1, 2, 3, 4],
[1, 2, 3, 4]])
If you want a vector rather than a matrix (like the output above), you can call flatten() on the result:
x.reshape(3,5)[:,:4].flatten()
Output
array([1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4])
Explanation
Since x is a NumPy array, we can use NumPy's built-in methods such as reshape. This method, whose name is self-explanatory, rearranges the array into the desired shape. x is a vector of 15 elements, so x.reshape(3,5) gives a matrix with 3 rows and 5 columns. [:, :4] then selects the first four columns, and flatten turns the matrix back into a vector.
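The reshape(3,5) above hard-codes the sizes for this example. A hedged generalization, assuming len(x) is an exact multiple of the period N (reshape raises a ValueError otherwise), lets NumPy infer the number of rows with -1. The function name drop_every_nth is just illustrative:

```python
import numpy as np

def drop_every_nth(x, N):
    """Drop the last element of every block of N (hypothetical helper).

    Assumes len(x) is a multiple of N; otherwise reshape raises ValueError.
    """
    x = np.asarray(x)
    # One row per block of N, keep the first N-1 columns, then back to 1-D.
    return x.reshape(-1, N)[:, :N - 1].ravel()

x = np.array([1, 2, 3, 4, 5,
              1, 2, 3, 4, 5,
              1, 2, 3, 4, 5])
print(drop_every_nth(x, 5))  # [1 2 3 4 1 2 3 4 1 2 3 4]
```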
IIUC, you can use a boolean mask generated with the modulo (%) operator:
N = 5
mask = np.arange(len(x))%N != N-1
x[mask]
output: array([1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4])
This works even if the size of your array is not a multiple of N.
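If you prefer naming the positions to drop rather than building a mask, np.delete with positions N-1, 2N-1, ... generated by np.arange is another option. Like the mask, it does not require len(x) to be a multiple of N. A sketch:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5,
              1, 2, 3, 4, 5,
              1, 2, 3, 4, 5])
N = 5

drop = np.arange(N - 1, len(x), N)  # positions 4, 9, 14
result = np.delete(x, drop)         # new array without those positions
print(result)  # [1 2 3 4 1 2 3 4 1 2 3 4]
```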
This question already has answers here:
Finding indices of matches of one array in another array
(4 answers)
Closed 3 years ago.
I have a numpy array A which contains unique IDs that can be in any order - e.g. A = [1, 3, 2]. I have a second numpy array B, which is a record of when the ID is used - e.g. B = [3, 3, 1, 3, 2, 1, 2, 3, 1, 1, 2, 3, 3, 1]. Array B is always much longer than array A.
I need to find the indexed location of the ID in A for each time the ID is used in B. So in the example above my returned result would be: result = [1, 1, 0, 1, 2, 0, 2, 1, 0, 0, 2, 1, 1, 0].
I've already written a simple solution that gets the correct result using a for loop to append the result to a new list and using numpy.where, but I can't figure out the correct syntax to vectorize this.
import numpy as np
A = np.array([1, 3, 2])
B = np.array([3, 3, 1, 3, 2, 1, 2, 3, 1, 1, 2, 3, 3, 1])
IdIndxs = []
for ID in B:
IdIndxs.append(np.where(A == ID)[0][0])
IdIndxs = np.array(IdIndxs)
Can someone come up with a simple vectorized solution that runs quickly? The for loop becomes very slow on a typical problem, where A has 10K-100K elements and B is usually 5-10x larger than A.
I'm sure the solution is simple, but I just can't see it today.
You can use this:
import numpy as np
# test data
A = np.array([1, 3, 2])
B = np.array([3, 3, 1, 3, 2, 1, 2, 3, 1, 1, 2, 3, 3, 1])
# get indexes
sorted_keys = np.argsort(A)
indexes = sorted_keys[np.searchsorted(A, B, sorter=sorted_keys)]
Output:
[1 1 0 1 2 0 2 1 0 0 2 1 1 0]
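One caveat: np.searchsorted always returns an insertion point, so if some value in B does not occur in A, the code above silently returns the index of a neighbouring ID, or raises an IndexError when the insertion point is len(A). A sketch of a wrapper that checks for that; the name index_in is hypothetical, and A is assumed to contain unique values:

```python
import numpy as np

def index_in(A, B):
    """For each element of B, return its index in A (hypothetical helper).

    Assumes A contains unique values; raises ValueError if some B is not in A.
    """
    A = np.asarray(A)
    B = np.asarray(B)
    sorted_keys = np.argsort(A)
    pos = np.searchsorted(A, B, sorter=sorted_keys)
    # Clip so values larger than max(A) don't index out of bounds.
    pos = np.clip(pos, 0, len(A) - 1)
    idx = sorted_keys[pos]
    # Verify every lookup actually hit a matching ID.
    if not np.all(A[idx] == B):
        raise ValueError("some values of B do not occur in A")
    return idx

A = np.array([1, 3, 2])
B = np.array([3, 3, 1, 3, 2, 1, 2, 3, 1, 1, 2, 3, 3, 1])
print(index_in(A, B))  # [1 1 0 1 2 0 2 1 0 0 2 1 1 0]
```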
The numpy-indexed library (disclaimer: I am its author) was designed to provide this type of vectorized operation where numpy, for some reason, does not. Frankly, given how useful this vectorized equivalent of list.index is, it definitely ought to be in numpy; but numpy is a slow-moving project that takes backwards compatibility very seriously, and I don't think we will see this until numpy 2.0. Until then, this library is pip and conda installable with the same ease.
import numpy_indexed as npi
idx = npi.indices(A, B)
Here is your logic reworked with a list comprehension and numpy.fromiter, which should boost performance:
IdIndxs = np.fromiter([np.where(A == i)[0][0] for i in B], dtype=np.intp)
About performance
I've done a quick test comparing fromiter with your solution, and I do not see much of a performance boost. Even with a B array of millions of elements, the timings are of the same order.
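The loop stays slow because A == ID scans all of A for every element of B. A different technique from the answers above, if the IDs are hashable and a Python-level loop is acceptable, is to build an ID-to-position dictionary once so each lookup is an average O(1) hash lookup. This is only a sketch; no timing is claimed:

```python
import numpy as np

A = np.array([1, 3, 2])
B = np.array([3, 3, 1, 3, 2, 1, 2, 3, 1, 1, 2, 3, 3, 1])

# Build the ID -> position mapping once (assumes IDs in A are unique).
position = {int(v): i for i, v in enumerate(A)}

# One dictionary lookup per element of B; a KeyError signals a missing ID.
IdIndxs = np.fromiter((position[int(b)] for b in B), dtype=np.intp, count=len(B))
print(IdIndxs)  # [1 1 0 1 2 0 2 1 0 0 2 1 1 0]
```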
This question already has answers here:
Rank items in an array using Python/NumPy, without sorting array twice
(11 answers)
Closed 3 years ago.
I have an array and I want to get the order of each element.
a=[1.83976,1.57624,1.00528,1.55184]
np.argsort(a)
The code above returns
array([2, 3, 1, 0], dtype=int64)
but I want to get the array of the order of each element.
e.g.
[0, 1, 3, 2]
means a[0] is the largest number (0th),
a[1] is the 1st,
a[2] is the 3rd,
a[3] is the 2nd.
I'll explain: np.argsort(np.argsort(a)) gives the rank of each element in ascending order. For example, 1.83976 is the largest value in the array, so it is assigned the highest rank, 3. I then subtracted each rank from len(a)-1 to get your output.
>>> import numpy as np
>>> a=[1.83976,1.57624,1.00528,1.55184]
>>> np.argsort(a)
array([2, 3, 1, 0])
>>> np.argsort(np.argsort(a))
array([3, 2, 0, 1])
>>> [len(a)-i for i in np.argsort(np.argsort(a))]
[1, 2, 4, 3]
>>> [len(a)-1-i for i in np.argsort(np.argsort(a))]
[0, 1, 3, 2]
>>> np.array([len(a)-1]*len(a))-np.argsort(np.argsort(a))
array([0, 1, 3, 2])
By default, argsort returns the indices of the sorted elements in ascending order. Since you need descending order, argsort(-a) gives you the right sorted indices. To get the rank of each element, you need to apply argsort again.
a = np.array([1.83976,1.57624,1.00528,1.55184])
indx_sorted = np.argsort(-a)
np.argsort(indx_sorted)
# array([0, 1, 3, 2])
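Both answers compute the same descending ranks for distinct values. A compact side-by-side sketch; note the list is converted to an array first, because unary minus does not work on a plain Python list. With tied values, the order among equal elements is not guaranteed to match between the two forms.

```python
import numpy as np

a = np.asarray([1.83976, 1.57624, 1.00528, 1.55184])

# Answer 1: ascending ranks, flipped so the largest value gets rank 0.
desc1 = len(a) - 1 - np.argsort(np.argsort(a))

# Answer 2: sort the negated values, then invert the permutation.
desc2 = np.argsort(np.argsort(-a))

print(desc1)  # [0 1 3 2]
print(desc2)  # [0 1 3 2]
```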
This question already has answers here:
Transform a set of numbers in numpy so that each number gets converted into a number of other numbers which are less than it
(4 answers)
Closed 4 years ago.
I would like to sort a numpy array and find out where each element went.
numpy.argsort will tell me for each index in the sorted array, which index in the unsorted array goes there. I'm looking for something like the inverse: For each index in the unsorted array, where does it go in the sorted array.
a = np.array([1, 4, 2, 3])
# a sorted is [1,2,3,4]
# the 1 goes to index 0
# the 4 goes to index 3
# the 2 goes to index 1
# the 3 goes to index 2
# desired output
[0, 3, 1, 2]
# for comparison, argsort output
[0, 2, 3, 1]
A simple solution uses numpy.searchsorted
np.searchsorted(np.sort(a), a)
# produces [0, 3, 1, 2]
I'm unhappy with this solution, because it seems very inefficient. It sorts and searches in two separate steps.
This fancy indexing fails for arrays with duplicates; compare the two results for each array:
a = np.array([1, 4, 2, 3, 5])
print(np.argsort(a)[np.argsort(a)])
print(np.searchsorted(np.sort(a),a))
a = np.array([1, 4, 2, 3, 5, 2])
print(np.argsort(a)[np.argsort(a)])
print(np.searchsorted(np.sort(a),a))
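Indexing the argsort result with itself is not the inverse permutation in general, even without duplicates: it composes the permutation with itself, which coincides with the inverse only for special permutations (the argsort of the first array above, [0 2 3 1 4], is a 3-cycle, so it happens to be one). A small counterexample with distinct values, as a sketch:

```python
import numpy as np

a = np.array([2, 3, 4, 1])   # no duplicates
order = np.argsort(a)        # [3 0 1 2]

squared = order[order]       # permutation composed with itself
ranks = np.argsort(order)    # true inverse: each element's sorted position

print(squared)  # [2 3 0 1]
print(ranks)    # [1 2 3 0]
```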
You can just use argsort twice on the list.
At first the fact that this works seems a bit confusing, but if you think about it for a while it starts to make sense.
a = np.array([1, 4, 2, 3])
argSorted = np.argsort(a) # [0, 2, 3, 1]
invArgSorted = np.argsort(argSorted) # [0, 3, 1, 2]
You just need to invert the permutation that sorts the array. As shown in the linked question, you can do that like this:
import numpy as np
def sorted_position(array):
a = np.argsort(array)
a[a.copy()] = np.arange(len(a))
return a
print(sorted_position([0.1, 0.2, 0.0, 0.5, 0.8, 0.4, 0.7, 0.3, 0.9, 0.6]))
# [1 2 0 5 8 4 7 3 9 6]
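A variant of the same inversion that scatters into a fresh array, so the argsort output is never used as both index and target. It still needs only one sort followed by an O(n) assignment, which addresses the "sorts twice" concern. The name sorted_position_fresh is hypothetical:

```python
import numpy as np

def sorted_position_fresh(values):
    """Rank of each element via one argsort plus an O(n) scatter (sketch)."""
    order = np.argsort(values)
    ranks = np.empty_like(order)
    # order[k] is the original index of the k-th smallest value,
    # so that element's sorted position is k.
    ranks[order] = np.arange(len(order))
    return ranks

data = [0.1, 0.2, 0.0, 0.5, 0.8, 0.4, 0.7, 0.3, 0.9, 0.6]
print(sorted_position_fresh(data))  # [1 2 0 5 8 4 7 3 9 6]
```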
I have a numpy.ndarray, and I want to remove the first h elements and the last t elements.
As I see, the more general way is by selecting:
h, t = 1, 1
my_array = [0,1,2,3,4,5]
middle = my_array[h:-t]
and middle is [1,2,3,4]. This is correct, but when I don't want to remove anything and set h = 0 and t = 0, this returns an empty array. I know this is because of t = 0 (-0 is just 0, so the slice becomes my_array[0:0]), and I also know that an if condition for this border case, using my_array[h:], would solve it, but I don't want that solution (my actual problem is a little more complex, with more dimensions, and the code would become ugly).
Any ideas?
Instead, use
middle = my_array[h:len(my_array)-t]
For completeness, here's the trial run:
my_array = [0,1,2,3,4,5]
h,t = 0,0
middle = my_array[h:len(my_array)-t]
print(middle)
Output: [0, 1, 2, 3, 4, 5]
This example was just for a standard Python list. Since your ultimate goal is to work with multidimensional numpy arrays, the problem is actually a bit trickier. When you say you want to remove the first h elements and the last t elements, are we guaranteed that h and t satisfy the divisibility criteria needed for the result to be a well-formed array?
I actually think the cleanest solution is to use the approach above, but to divide h and t by the appropriate factor first. For example, in two dimensions:
import numpy
h = 3
t = 6
a = numpy.array([[ 1, 2, 3],
[ 4, 5, 6],
[ 7, 8, 9],
[10, 11, 12]])
d = numpy.prod(numpy.shape(a)[1:])  # number of elements per row (3 here)
mid_a = a[int(h/d):int(len(a)-t/d)]
print(mid_a)
Output: [[4 5 6]]
I have included the int casts in the indices because in Python 3 the / operator always returns a float, even when the numerator is evenly divisible by the denominator, and float slice indices are not allowed.
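For the one-dimensional case, another common trick (not part of this answer) is to keep the negative end index but map t == 0 to None, which slicing treats as "up to the end". The helper name trim is hypothetical:

```python
def trim(seq, h, t):
    """Drop the first h and last t items; t == 0 keeps the tail (hypothetical helper)."""
    # -0 == 0 would give an empty slice, so fall back to None ("no end bound").
    return seq[h:-t or None]

my_array = [0, 1, 2, 3, 4, 5]
print(trim(my_array, 1, 1))  # [1, 2, 3, 4]
print(trim(my_array, 0, 0))  # [0, 1, 2, 3, 4, 5]
```

The same expression works on NumPy arrays, since it only changes the slice bounds.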
The i:j can be replaced with a slice object, and :j with slice(None,j), etc.:
In [55]: alist = [0,1,2,3,4,5]
In [56]: h,t=1,-1; alist[slice(h,t)]
Out[56]: [1, 2, 3, 4]
In [57]: h,t=None,-1; alist[slice(h,t)]
Out[57]: [0, 1, 2, 3, 4]
In [58]: h,t=None,None; alist[slice(h,t)]
Out[58]: [0, 1, 2, 3, 4, 5]
This works for lists and arrays. For multidimensional arrays, use a tuple of indices, which can include slice objects:
x[i:j, k:l]       # equivalent to x[(slice(i,j), slice(k,l))]
x[i:j, ..., k:l]  # equivalent to x[(slice(i,j), Ellipsis, slice(k,l))]
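Building on the tuple-of-slices idea, a sketch that trims h entries from the start and t from the end of one chosen axis of an N-dimensional array, leaving every other axis untouched. The helper name trim_axis is hypothetical:

```python
import numpy as np

def trim_axis(x, h, t, axis=0):
    """Drop the first h and last t entries along `axis` (hypothetical helper)."""
    x = np.asarray(x)
    index = [slice(None)] * x.ndim             # ':' on every axis
    index[axis] = slice(h, x.shape[axis] - t)  # t == 0 keeps the end
    return x[tuple(index)]

a = np.arange(12).reshape(4, 3)
print(trim_axis(a, 1, 2))          # [[3 4 5]]
print(trim_axis(a, 0, 0).shape)    # (4, 3)
print(trim_axis(a, 1, 1, axis=1))  # middle column only
```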