I have a numerical list:
myList = [1, 2, 3, 100, 5]
Sorting this list gives [1, 2, 3, 5, 100].
What I want is the indices of the elements from the
original list in sorted order, i.e. [0, 1, 2, 4, 3],
à la MATLAB's sort function, which returns both
values and indices.
If you are using numpy, you have the argsort() function available:
>>> import numpy
>>> numpy.argsort(myList)
array([0, 1, 2, 4, 3])
http://docs.scipy.org/doc/numpy/reference/generated/numpy.argsort.html
This returns the arguments that would sort the array or list.
Something like this:
>>> myList = [1, 2, 3, 100, 5]
>>> [i[0] for i in sorted(enumerate(myList), key=lambda x:x[1])]
[0, 1, 2, 4, 3]
enumerate(myList) yields tuples of (index, value):
[(0, 1), (1, 2), (2, 3), (3, 100), (4, 5)]
You sort these tuples by passing them to sorted and specifying a function to extract the sort key (the second element of each tuple; that's what the lambda is for). Finally, the original index of each sorted element is extracted using the [i[0] for i in ...] list comprehension.
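The same one-liner can be unrolled into its three steps. This is only an illustrative sketch; the intermediate names (pairs, pairs_by_value, sorted_indices) are made up for this example:

```python
my_list = [1, 2, 3, 100, 5]

# Step 1: pair every value with its original index.
pairs = list(enumerate(my_list))
# pairs is [(0, 1), (1, 2), (2, 3), (3, 100), (4, 5)]

# Step 2: sort the pairs by value (the second item of each tuple).
pairs_by_value = sorted(pairs, key=lambda pair: pair[1])

# Step 3: keep only the original indices, now in sorted-value order.
sorted_indices = [index for index, _ in pairs_by_value]
print(sorted_indices)  # [0, 1, 2, 4, 3]
```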
myList = [1, 2, 3, 100, 5]
sorted(range(len(myList)),key=myList.__getitem__)
[0, 1, 2, 4, 3]
I did a quick performance check on these with perfplot (a project of mine) and found that it's hard to recommend anything else but
np.argsort(x)
(note the log scale):
Code to reproduce the plot:
import perfplot
import numpy as np
def sorted_enumerate(seq):
    return [i for (v, i) in sorted((v, i) for (i, v) in enumerate(seq))]
def sorted_enumerate_key(seq):
    return [x for x, y in sorted(enumerate(seq), key=lambda x: x[1])]
def sorted_range(seq):
    return sorted(range(len(seq)), key=seq.__getitem__)
b = perfplot.bench(
    setup=np.random.rand,
    kernels=[sorted_enumerate, sorted_enumerate_key, sorted_range, np.argsort],
    n_range=[2 ** k for k in range(15)],
    xlabel="len(x)",
)
b.save("out.png")
The answers with enumerate are nice, but I personally don't like the lambda used to sort by the value. The following just reverses the index and the value, and sorts that. So it'll first sort by value, then by index.
sorted((e,i) for i,e in enumerate(myList))
Updated answer with enumerate and itemgetter:
sorted(enumerate(a), key=lambda x: x[1])
# [(0, 1), (1, 2), (2, 3), (4, 5), (3, 100)]
enumerate pairs each element with its index: the first element of each tuple is the index and the second is the value. The list is then sorted by the second element of the tuple, x[1], where x is the tuple.
Or using itemgetter from the operator module:
from operator import itemgetter
sorted(enumerate(a), key=itemgetter(1))
Essentially you need to do an argsort. Which implementation you need depends on whether you want to use external libraries (e.g. NumPy) or stay pure-Python without dependencies.
The question you need to ask yourself is: Do you want the
indices that would sort the array/list
indices that the elements would have in the sorted array/list
Unfortunately the example in the question doesn't make it clear what is desired because both will give the same result:
>>> arr = np.array([1, 2, 3, 100, 5])
>>> np.argsort(np.argsort(arr))
array([0, 1, 2, 4, 3], dtype=int64)
>>> np.argsort(arr)
array([0, 1, 2, 4, 3], dtype=int64)
Choosing the argsort implementation
If you have NumPy at your disposal you can simply use the function numpy.argsort or method numpy.ndarray.argsort.
An implementation without NumPy was mentioned in some other answers already, so I'll just recap the fastest solution according to the benchmark answer here
def argsort(l):
    return sorted(range(len(l)), key=l.__getitem__)
Getting the indices that would sort the array/list
To get the indices that would sort the array/list you can simply call argsort on the array or list. I'm using the NumPy versions here but the Python implementation should give the same results
>>> arr = np.array([3, 1, 2, 4])
>>> np.argsort(arr)
array([1, 2, 0, 3], dtype=int64)
The result contains the indices that are needed to get the sorted array.
Since the sorted array would be [1, 2, 3, 4] the argsorted array contains the indices of these elements in the original.
The smallest value is 1 and it is at index 1 in the original so the first element of the result is 1.
The 2 is at index 2 in the original so the second element of the result is 2.
The 3 is at index 0 in the original so the third element of the result is 0.
The largest value is 4 and it is at index 3 in the original so the last element of the result is 3.
Getting the indices that the elements would have in the sorted array/list
In this case you would need to apply argsort twice:
>>> arr = np.array([3, 1, 2, 4])
>>> np.argsort(np.argsort(arr))
array([2, 0, 1, 3], dtype=int64)
In this case:
the first element of the original is 3, which is the third-smallest value so it would have index 2 in the sorted array/list so the first element is 2.
the second element of the original is 1, which is the smallest value so it would have index 0 in the sorted array/list so the second element is 0.
the third element of the original is 2, which is the second-smallest value so it would have index 1 in the sorted array/list so the third element is 1.
the fourth element of the original is 4 which is the largest value so it would have index 3 in the sorted array/list so the last element is 3.
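Both variants can also be written without NumPy. The sketch below reuses the pure-Python argsort from above; the ranks helper is a name introduced here for illustration, and it inverts the permutation directly instead of sorting a second time:

```python
def argsort(seq):
    """Indices that would sort seq (pure Python, no NumPy)."""
    return sorted(range(len(seq)), key=seq.__getitem__)

def ranks(seq):
    """Position each element would occupy in the sorted sequence."""
    order = argsort(seq)
    result = [0] * len(seq)
    # order[pos] is the original index of the element that lands at pos,
    # so inverting that mapping gives each element's sorted position.
    for pos, original_index in enumerate(order):
        result[original_index] = pos
    return result

values = [3, 1, 2, 4]
print(argsort(values))  # [1, 2, 0, 3]
print(ranks(values))    # [2, 0, 1, 3]
```

For a list of distinct values, ranks(values) matches argsort(argsort(values)).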
If you do not want to use numpy,
sorted(range(len(seq)), key=seq.__getitem__)
is fastest, as demonstrated here.
The other answers are WRONG.
Running argsort once is not the solution.
For example, the following code:
import numpy as np
x = [3,1,2]
np.argsort(x)
yields array([1, 2, 0], dtype=int64) which is not what we want.
The answer should be to run argsort twice:
import numpy as np
x = [3,1,2]
np.argsort(np.argsort(x))
gives array([2, 0, 1], dtype=int64) as expected.
The easiest way is to use the NumPy package for this:
import numpy
s = numpy.array([2, 3, 1, 4, 5])
sort_index = numpy.argsort(s)
print(sort_index)
But if you would rather use only basic Python:
s = [2, 3, 1, 4, 5]
li = []
for i in range(len(s)):
    li.append([s[i], i])
li.sort()
sort_index = []
for x in li:
    sort_index.append(x[1])
print(sort_index)
We will create another array of indexes from 0 to n-1,
then zip it with the original array and sort the pairs by the original values:
ar = [1,2,3,4,5]
new_ar = list(zip(ar, range(len(ar))))
new_ar.sort()
s = [2, 3, 1, 4, 5]
print([sorted(s, reverse=False).index(val) for val in s])
For a list with duplicate elements, tied elements all get the same (lowest) rank, e.g.
s = [2, 2, 1, 4, 5]
print([sorted(s, reverse=False).index(val) for val in s])
returns
[1, 1, 0, 3, 4]
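The list comprehension above re-sorts s for every element. A possible variant, sketched here under the assumption that the same tie behaviour is wanted, sorts once and looks each value up with bisect_left from the standard library; ranks_with_ties is a name made up for this example:

```python
from bisect import bisect_left

def ranks_with_ties(values):
    """Rank of each value; equal values share the lowest rank."""
    ordered = sorted(values)  # sort once instead of once per element
    # bisect_left finds the first position of val in the sorted copy,
    # which is what list.index returns, but in O(log n).
    return [bisect_left(ordered, val) for val in values]

print(ranks_with_ties([2, 3, 1, 4, 5]))  # [1, 2, 0, 3, 4]
print(ranks_with_ties([2, 2, 1, 4, 5]))  # [1, 1, 0, 3, 4]
```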
import numpy as np
For the indices:
S = [11, 2, 44, 55, 66, 0, 10, 3, 33]
r = np.argsort(S)
# r is array([5, 1, 7, 6, 0, 8, 2, 3, 4])
argsort returns the indices of S in sorted order.
For the values:
np.sort(S)
# array([ 0,  2,  3, 10, 11, 33, 44, 55, 66])
Code:
s = [2, 3, 1, 4, 5]
li = []
for i in range(len(s)):
    li.append([s[i], i])
li.sort()
sort_index = []
for x in li:
    sort_index.append(x[1])
print(sort_index)
Try this; it worked for me, cheers!
First, start from your list:
myList = [1, 2, 3, 100, 5]
Add an index to each item of your list:
myList = [[0, 1], [1, 2], [2, 3], [3, 100], [4, 5]]
Next, sort by the value:
sorted(myList, key=lambda k:k[1])
Result:
[[0, 1], [1, 2], [2, 3], [4, 5], [3, 100]]
A variant on RustyRob's answer (which is already the most performant pure Python solution) that may be superior when the collection you're sorting either:
Isn't a sequence (e.g. it's a set, and there's a legitimate reason to want the indices corresponding to how far an iterator must be advanced to reach the item), or
Is a sequence without O(1) indexing (among Python's included batteries, collections.deque is a notable example of this)
Case #1 is unlikely to be useful, but case #2 is more likely to be meaningful. In either case, you have two choices:
Convert to a list/tuple and use the converted version, or
Use a trick to assign keys based on iteration order
This answer provides the solution to #2. Note that it's not guaranteed to work by the language standard; the language says each key will be computed once, but not the order they will be computed in. On every version of CPython, the reference interpreter, to date, it's precomputed in order from beginning to end, so this works, but be aware it's not guaranteed. In any event, the code is:
sizediterable = ...
sorted_indices = sorted(range(len(sizediterable)), key=lambda _, it=iter(sizediterable): next(it))
All that does is provide a key function that ignores the value it's given (an index) and instead provides the next item from an iterator preconstructed from the original container (cached as a defaulted argument to allow it to function as a one-liner). As a result, for something like a large collections.deque, where using its .__getitem__ involves O(n) work (and therefore computing all the keys would involve O(n²) work), sequential iteration remains O(1), so generating the keys remains just O(n).
If you need something guaranteed to work by the language standard, using built-in types, Roman's solution will have the same algorithmic efficiency as this solution (as neither of them rely on the algorithmic efficiency of indexing the original container).
To be clear, for the suggested use case with collections.deque, the deque would have to be quite large for this to matter; deques have a fairly large constant divisor for indexing, so only truly huge ones would have an issue. Of course, by the same token, the cost of sorting is pretty minimal if the inputs are small/cheap to compare, so if your inputs are large enough that efficient sorting matters, they're large enough for efficient indexing to matter too.
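As a small illustration, the sketch below runs the iterator-key trick on a collections.deque next to a decorate-sort-undecorate variant that does not depend on the order in which keys are computed. The variable names are invented for this example, and the first result assumes CPython's in-order key computation described above:

```python
from collections import deque

items = deque([30, 10, 20])

# Iterator-as-key trick from above: relies on CPython computing keys in
# order from first to last, which the language does not guarantee.
by_iterator = sorted(range(len(items)),
                     key=lambda _, it=iter(items): next(it))

# Variant that does not depend on key evaluation order: decorate each
# value with its index, sort, then strip the values again.
by_enumerate = [i for _, i in sorted((v, i) for i, v in enumerate(items))]

print(by_iterator)   # [1, 2, 0]
print(by_enumerate)  # [1, 2, 0]
```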
I have a NumPy array like
x = np.zeros(4, dtype=int)
And I have a list of indices like [1, 2, 3, 2, 1] and I want to add 1 to the corresponding array elements, such that for each element in the index list, x is incremented at that position:
x = [0, 2, 2, 1]
I tried doing this using:
x[indices] += 1
But for some reason, it only updates the indices once, and if an index occurs more often than once it is not registered. I could of course just create a simple for loop but I was wondering if there is a one-line solution.
What you are essentially trying to do is replace the indexes by their frequencies.
Try np.bincount; it does the same thing you are trying to do.
indices = [1, 2, 3, 2, 1]
np.bincount(indices)
array([0, 2, 2, 1])
Think about what you are doing: for index 0 you don't want to count anything, for index 1 you want 2 counts, and so on. Hope that gives you an intuitive sense of why this is the same.
@Stef's solution with np.unique does exactly the same thing as np.bincount would do.
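To add the counts to an existing array rather than build a new one, np.bincount accepts a minlength argument so the result lines up with the target. A minimal sketch, with the starting values of x chosen arbitrarily for illustration:

```python
import numpy as np

x = np.array([10, 20, 30, 40, 50])
indices = [1, 2, 3, 2, 1]

# minlength pads the counts with zeros so they line up with x even when
# the largest index is smaller than len(x) - 1.
counts = np.bincount(indices, minlength=len(x))
x += counts
print(x)  # [10 22 32 41 50]
```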
You can use unique with return_counts set to True:
idx, cnt = np.unique(indices, return_counts=True)
x[idx] += cnt
You want np.add.at:
np.add.at(x, indices, 1)
x
Out[]:
array([0, 2, 2, 1])
This works even if x doesn't start out as np.zeros
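The difference between the two forms is easy to see side by side. In this sketch the array names buffered and unbuffered are just labels for the example:

```python
import numpy as np

indices = [1, 2, 3, 2, 1]

# Fancy-index assignment: each repeated index is written once, so
# duplicates do not accumulate.
buffered = np.zeros(4, dtype=int)
buffered[indices] += 1
print(buffered)  # [0 1 1 1]

# np.add.at applies the addition once per index entry, duplicates included.
unbuffered = np.zeros(4, dtype=int)
np.add.at(unbuffered, indices, 1)
print(unbuffered)  # [0 2 2 1]
```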
Based on your question, you can really just write
import numpy as np
indices = [1, 2, 3, 2, 1]
x = np.array([indices.count(i) for i in range(4)])
because count counts repeated elements. But the full solution would be
import numpy as np
x = np.zeros(4, dtype=int)
indices = [1, 2, 3, 2, 1]
result = np.array([indices.count(i) for i in range(4)])
x += result
I've been given a homework task that asks me to find the greatest continuous increase in a list of data, e.g. for [1,2,3,4,5,3,1,2,3] the greatest continuous increase is 4.
I've written a function that takes a single list and spits out a list of sublists like this.
def group_data(lst):
    sublist = [[lst[0]]]
    for i in range(1, len(lst)):
        if lst[i-1] < lst[i]:
            sublist[-1].append(lst[i])
        else:
            sublist.append([lst[i]])
    return sublist
Which does what it's supposed to
group_data([1,2,3,4,5,6,7,8,9,10,1,2,3,5,4,7,8])
Out[3]: [[1, 2, 3, 4, 5, 6, 7, 8, 9, 10], [1, 2, 3, 5], [4, 7, 8]]
And I now want to subtract the first element of each individual list from the last to find their differences. But I'm having difficulty figuring out how to map the function to each list rather than each element of the list. Any help would be greatly appreciated.
You can do it with the map function, where arr is your grouped list:
list(map(lambda x: x[-1]-x[0], arr ))
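Put together with the grouping step, this could look like the sketch below. group_data is rewritten slightly with zip for the example, and increases is a name introduced here:

```python
def group_data(lst):
    """Split lst into runs of strictly increasing values."""
    groups = [[lst[0]]]
    for previous, current in zip(lst, lst[1:]):
        if previous < current:
            groups[-1].append(current)
        else:
            groups.append([current])
    return groups

arr = group_data([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 5, 4, 7, 8])

# map calls the lambda once per sublist, not once per number.
increases = list(map(lambda run: run[-1] - run[0], arr))
print(increases)       # [9, 4, 4]
print(max(increases))  # 9
```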
For this problem I think itertools.groupby would be a good choice, since your final goal is to find the difference across the longest run of consecutive numbers:
from itertools import groupby
max_l = max([len(list(g)) - 1 for k, g in groupby(enumerate([1,2,3,4,5,6,7,8,9,10,1,2,3,5,4,7,8]), key=lambda x: x[0] - x[1])])
print(max_l)
#it will print 9
Explanation:
First group the numbers by the difference between index and value. For example, [0, 1, 2, 4] yields the keys [0, 0, 0, -1]: the index of 0 is 0, so 0-0=0; for the second one, 1-1=0; and for the last one, 3-4=-1. Then take the maximum length of the groups. Since you want the difference, I used len(list(g)) - 1.
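Note that grouping by index minus value only keeps numbers together while they increase by exactly 1. If the data can rise in larger steps, a single pass that tracks the start of the current run may be closer to what the question asks for. This is a sketch under that assumption, and greatest_increase is a hypothetical helper name:

```python
def greatest_increase(values):
    """Largest last-minus-first over runs of strictly increasing values."""
    best = 0
    run_start = values[0]
    for previous, current in zip(values, values[1:]):
        if current <= previous:
            run_start = current  # the run is broken; start a new one
        best = max(best, current - run_start)
    return best

print(greatest_increase([1, 2, 3, 4, 5, 3, 1, 2, 3]))  # 4
print(greatest_increase([1, 5, 9, 2, 3]))              # 8
```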
I'm trying to do the following in python: given a list of lists and an integer i
input = [[1, 2, 3, 4], [1, 2, 3, 4], [5, 6, 7, 8]]
i = 1
I need to obtain another list which has all 1s for the elements of the i-th list, 0 otherwise
output = [0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0]
I wrote this code
output = []
for sublist in range(0, len(input)):
    for item in range(0, len(input[sublist])):
        output.append(1 if sublist == i else 0)
and it obviously works, but since I'm a newbie in python I suppose there's a better 'pythonic' way of doing this.
I thought using map could work, but I can't get the index of the list with it.
Creating an extra variable to get the index of the current element in an iteration is quite unpythonic. The usual alternative is the enumerate built-in function.
Return an enumerate object. sequence must be a sequence, an iterator,
or some other object which supports iteration. The next() method of
the iterator returned by enumerate() returns a tuple containing a
count (from start which defaults to 0) and the values obtained from
iterating over sequence.
You may use list comprehension with double loop inside it for concise one liner:
input_seq = [[1, 2, 3, 4], [1, 2, 3, 4], [5, 6, 7, 8]]
i = 1
o = [1 if idx == i else 0 for idx, l in enumerate(input_seq) for _ in l]
Alternatively,
o = [int(idx == i) for idx, l in enumerate(input_seq) for _ in l]
Underscore is just throwaway name, since in this case we don't care for actual values stored in input sublists.
Here's a 1-liner, but it's not really obvious:
output = [int(j == i) for j, sublist in enumerate(input) for _ in sublist]
Somewhat more obvious:
output = []
for j, sublist in enumerate(input):
    output.extend([int(i == j)] * len(sublist))
Then "0 or 1?" is computed only once per sublist, which may or may not be more efficient.
I have an array my_array and, for specific reasons, I want to ignore the values -5 and -10 in it (the example below has no -10, but other arrays I have to handle do), get the indices of the three minimum values of the array, and append them to a new list called lista_indices_candidatos.
This is my code.
import numpy as np
my_array = [4, -5, 10, 4, 4, 4, 0, 4, 4]
a = np.array(my_array)
indices = a.argsort()
indices = indices[a[indices] != -5]
indices = indices[a[indices] != -10]
lista_indices_candidatos = []
for i in indices[:3]:
    lista_indices_candidatos.append(i)
print(lista_indices_candidatos)
This gets me the index of the 3 minimum values [6, 0, 3] from the array [4, -5, 10, 4, 4, 4, 0, 4, 4]
The thing is that, if there are repeated values, this gets me the first three minimum values (the first 4 (index 0) and the second 4 (index 3)), ignoring the rest of the 4s in the array.
How can I change the code to pick the three minimum values completely at random, instead of always taking the first three?
myArray = [4, -5, 10, 4, 4, 4, 0, 4, 4]
myUniqueArray = list(set(myArray))
myUniqueArray.sort()
print([myArray.index(myUniqueArray[0]), myArray.index(myUniqueArray[1]), myArray.index(myUniqueArray[2])])
.index would not give you a random index, in the sense that it will always return the same value for a given input list, but you could play with that part.
I haven't introduced randomness, because I don't really see the point of doing this.
If you need the 3 lowest non-negative values:
sorted([x for x in my_array if x >= 0])[:3]
If you need the 3 lowest non-negative values and their original indices:
sorted([(x,idx) for idx,x in enumerate(my_array) if x >= 0], key=lambda t: t[0])[:3]
If you just need the original indices of the 3 lowest non-negative values:
[i for x,i in sorted([(x,idx) for idx,x in enumerate(my_array) if x >= 0], key=lambda t: t[0])[:3]]
My take is that you want to get 3 random indices for values in my_array, excluding [-10, -5], the 3 random indices must be chosen within the index list of the 3 lowest values of the remaining set, right?
What about this:
from random import sample
my_array = [4, -5, 10, 4, 4, 4, 0, 4, 4]
sample([i for i, x in enumerate(my_array) if x in sorted(set(my_array) - {-10, -5})[:3]], 3)
Factoring out the limited set of values, that would be:
from random import sample
my_array = [4, -5, 10, 4, 4, 4, 0, 4, 4]
filtered_list = sorted(set(my_array) - {-10, -5})[:3]
# Print 3 sample indices from my_array
print(sample([i for i, x in enumerate(my_array) if x in filtered_list], 3))
Ok, I'm also not sure what you are trying to achieve. I like the simplicity of Nasha's answer, but I think you want to always have the index of the 0 in the result set. The way I understand you, you want the index of the lowest three values and only if one of those values is listed more than once, do you want to pick randomly from those.
Here's my attempt at a solution:
import random
my_array = [4, -5, 10, 4, 4, 4, 0, 4, 4]
my_dict = {}
lista_indices_candidatos = []
for index, item in enumerate(my_array):
    try:
        my_dict[item] = my_dict[item] + [index]
    except KeyError:
        my_dict[item] = [index]
for i in [x for x in sorted(my_array) if x != -10 and x != -5][:3]:
    lista_indices_candidatos.append(random.choice(my_dict[i]))
print(lista_indices_candidatos)
In this solution, I build a dictionary with all the values from my_array as keys. Each value in the dictionary is a list of the indexes that number has in my_array. I then use a list comprehension and slicing to get the three lowest values to iterate over in the for loop. There, I can randomly pick an index for a given value by randomly selecting from my_dict.
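The dictionary-building step can also be written with collections.defaultdict, which avoids the try/except. This is a sketch of the same idea, not a different algorithm; positions, excluded and lowest_three are names chosen for the example:

```python
import random
from collections import defaultdict

my_array = [4, -5, 10, 4, 4, 4, 0, 4, 4]
excluded = {-5, -10}  # values to ignore, as in the question

# Map each value to the list of positions where it occurs.
positions = defaultdict(list)
for index, item in enumerate(my_array):
    positions[item].append(index)

lowest_three = [x for x in sorted(my_array) if x not in excluded][:3]
candidates = [random.choice(positions[value]) for value in lowest_three]
print(candidates)  # e.g. [6, 3, 3]; the same index can be picked twice
```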
I bet there are better ways to achieve what you want to achieve, though. Maybe you can let us know what it is you are trying to do so we can improve on our answers.
After reading your comment, I see that you do not actually want a completely random selection, but instead a random selection without repetition. So here's an updated version.
import random
my_array = [4, -5, 10, 4, 4, 4, 0, 4, 4]
my_dict = {}
lista_indices_candidatos = []
for index, item in enumerate(my_array):
    try:
        my_dict[item] = my_dict[item] + [index]
    except KeyError:
        my_dict[item] = [index]
for l in my_dict:
    random.shuffle(my_dict[l])
for i in [x for x in sorted(my_array) if x != -10 and x != -5][:3]:
    lista_indices_candidatos.append(my_dict[i].pop())
print(lista_indices_candidatos)
How about this one:
import random
def eachIndexSorted(a):  # ... without -5 and -10
    for value in sorted(set(a) - {-5, -10}):
        indexes = [i for i in range(len(a)) if a[i] == value]
        random.shuffle(indexes)
        for i in indexes:
            yield i
def firstN(iterator, n):
    for i in range(n):
        yield next(iterator)
print(list(firstN(eachIndexSorted(my_array), 3)))
If you have very large data, then sorting the complete set might be too costly; finding each next minimum iteratively might then be a better approach. (Ask for more details if this aspect is unclear and important for you.)
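One way to avoid a full sort, sketched here with the standard library's heapq.nsmallest rather than a hand-written iterative minimum search, is to select only the n smallest candidates and break ties with a random secondary key. The helper name n_lowest_indices and its excluded parameter are assumptions made for this example:

```python
import heapq
import random

def n_lowest_indices(values, n, excluded=frozenset({-5, -10})):
    """Indices of the n smallest non-excluded values, ties broken randomly."""
    candidates = (i for i, v in enumerate(values) if v not in excluded)
    # The random second key component shuffles elements with equal values.
    return heapq.nsmallest(n, candidates,
                           key=lambda i: (values[i], random.random()))

my_array = [4, -5, 10, 4, 4, 4, 0, 4, 4]
print(n_lowest_indices(my_array, 3))  # e.g. [6, 7, 3]
```

The index of the 0 always comes first; the two remaining indices are drawn at random, without repetition, from the positions holding a 4.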