Sorting a list of tuples in Python

While working on a problem from Google's Python Class, I put together the following solution using two or three examples from Stack Overflow:
def sort_last(tuples):
    return [b for a, b in sorted((tup[1], tup) for tup in tuples)]
print(sort_last([(1, 3), (3, 2), (2, 1)]))
I learned about list comprehensions yesterday, so I know a little about them, but I am confused about how this solution works overall. Please help me understand it (the second line of the function).

That pattern is called decorate-sort-undecorate.
You turn each (1, 3) into (3, (1, 3)), wrapping each tuple in a new tuple, with the item you want to sort by first.
You sort, with the outer tuple ensuring that the second item in the original tuple is sorted on first.
You go back from (3, (1, 3)) to (1, 3) while maintaining the order of the list.
In Python, explicitly decorating is almost always unnecessary. Instead, use the key argument of sorted:
sorted(list_of_tuples, key=lambda tup: tup[1]) # or key=operator.itemgetter(1)
Or, if you want to sort on the reversed version of the tuple, no matter its length:
sorted(list_of_tuples, key=lambda tup: tup[::-1])
# or key=operator.itemgetter(slice(None, None, -1))
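A small sketch comparing the three approaches above. The extra tuple (3, 1) is an illustrative assumption, added so that two tuples tie on their second item. With key=itemgetter(1) only the second item is compared, so tied tuples keep their input order (sorted is stable); the decorated version and the reversed-tuple key both fall back to comparing the first item.

```python
from operator import itemgetter

# The question's data plus (3, 1), which ties with (2, 1) on the second item
tuples = [(1, 3), (3, 2), (3, 1), (2, 1)]

# Decorate-sort-undecorate, as in the question
decorated = [b for a, b in sorted((tup[1], tup) for tup in tuples)]

# key= version: compares only tup[1]; ties keep their input order
by_second = sorted(tuples, key=itemgetter(1))

# Reversed-tuple key: compares tup[1] first, then tup[0]
by_reversed = sorted(tuples, key=lambda tup: tup[::-1])

print(decorated)    # [(2, 1), (3, 1), (3, 2), (1, 3)]
print(by_second)    # [(3, 1), (2, 1), (3, 2), (1, 3)]
print(by_reversed)  # [(2, 1), (3, 1), (3, 2), (1, 3)]
```

A side benefit of key= is that the original items never have to be compared with each other, only their keys.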

Let's break it down:
In : [(tup[1],tup) for tup in tuples]
Out: [(3, (1, 3)), (2, (3, 2)), (1, (2, 1))]
So we have created a new tuple whose first value is the last value of the inner tuple; this way, the list gets sorted by the second value of each tuple in tuples.
Now we sort the returned list:
In: sorted([(3, (1, 3)), (2, (3, 2)), (1, (2, 1))])
Out: [(1, (2, 1)), (2, (3, 2)), (3, (1, 3))]
So we now have the list sorted by the second value of each tuple. All that remains is to extract the original tuples, which is done by keeping only b in the comprehension's for clause.
The list comprehension iterates over the given list (sorted([...]) in this case) and returns the extracted values in order.
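For readers newer to comprehensions, here is the question's one-liner unrolled into ordinary loops, one per step. It is a sketch meant to behave the same as sort_last; the name sort_last_unrolled and the intermediate variables are illustrative.

```python
def sort_last_unrolled(tuples):
    # Decorate: put the sort key (the second item) in front of each tuple
    decorated = []
    for tup in tuples:
        decorated.append((tup[1], tup))

    # Sort: tuples compare item by item, so the key decides the order first
    decorated.sort()

    # Undecorate: unpack each (key, original) pair and keep only the original
    result = []
    for a, b in decorated:
        result.append(b)
    return result


print(sort_last_unrolled([(1, 3), (3, 2), (2, 1)]))  # [(2, 1), (3, 2), (1, 3)]
```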

Your example works by replacing each tuple in the list with a pair: the element at index 1, followed by the original tuple. E.g. (3, (1, 3)) for the first element. The sorted function compares these pairs element by element, starting from index 0, so the list is sorted by the second item of each original tuple. The function then goes through each item in the new list and returns the original tuples.
Another way of doing this is to use the key parameter of the sorted function, which sorts by the value the key function returns for each item. In this case you want the key to be the item at index 1 of each tuple.
>>> from operator import itemgetter
>>> sorted([(1, 3), (3, 2), (2, 1)], key=itemgetter(1))
[(2, 1), (3, 2), (1, 3)]

Please refer to the accepted answer; here is an additional example for better visualization.
key is a function that is called on each item of the collection to produce the value used for comparison. (It is closer to the key extractor passed to Java's Comparator.comparing than to compareTo, which compares two objects directly.)
The argument passed to key must be callable. Here, lambda creates an anonymous function (which is callable).
The syntax of lambda is the word lambda, followed by its parameter names, a colon, and then a single expression whose value is returned.
In the example below, we sort a list of tuples that hold the time of an event and an actor's name.
We are sorting this list by time of event occurrence - which is the 0th element of a tuple.
Shout out for the Ready Player One fans! =)
>>> gunters = [('2044-04-05', 'parzival'), ('2044-04-07', 'aech'), ('2044-04-06', 'art3mis')]
>>> gunters.sort(key=lambda tup: tup[0])
>>> print(gunters)
[('2044-04-05', 'parzival'), ('2044-04-06', 'art3mis'), ('2044-04-07', 'aech')]
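Note: s.sort() sorts the items of s in place, while sorted(s) returns a new list. The signature in the Python 2 docs is s.sort([cmp[, key[, reverse]]]); in Python 3 the cmp argument is gone and the signature is s.sort(*, key=None, reverse=False).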
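To make the in-place point concrete, here is a sketch reusing the gunters data from above (new_list and result are illustrative names). list.sort() reorders the list itself and returns None, whereas sorted() leaves its input untouched and returns a new list. Sorting the date strings works here because dates in YYYY-MM-DD format sort chronologically when compared as plain strings.

```python
gunters = [('2044-04-05', 'parzival'), ('2044-04-07', 'aech'), ('2044-04-06', 'art3mis')]

# sorted() returns a new list and leaves gunters as it was
new_list = sorted(gunters, key=lambda tup: tup[0])
print(new_list[1])  # ('2044-04-06', 'art3mis')
print(gunters[1])   # ('2044-04-07', 'aech')

# list.sort() reorders gunters itself and returns None
result = gunters.sort(key=lambda tup: tup[0])
print(result)               # None
print(gunters == new_list)  # True
```

A common slip is writing gunters = gunters.sort(...), which replaces the list with None.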

Related

How to circumvent slow search with Index() method for a large list

I have a large list myList containing tuples.
I need to remove the duplicates in this list (that is, tuples with the same elements in the same order). I also need to keep track of this list's indices in a separate list, indexList. If I remove a duplicate, I need to set its entry in indexList to the index of the first identical value.
To demonstrate what I mean, if myList looks like this:
myList = [(6, 2), (4, 3), (6, 2), (8, 1), (5, 4), (4, 3), (2, 1)]
Then I need to construct indexList like this:
indexList = [0, 1, 0, 2, 3, 1, 4]
Here the third value is identical to the first, so it (the third value) gets index 0. The value after it then gets an updated index of 2, and so on.
Here is how I achieved this:
indexList = []  # must start out empty
unique = set()
i = 0
for v in myList[:]:
    if v not in unique:
        unique.add(v)
        indexList.append(i)
        i = i + 1
    else:
        myList.pop(i)
        indexList.append(myList.index(v))
This does what I need. However, the index() method makes the script very slow when myList contains hundreds of thousands of elements. As I understand it, this is because index() is an O(n) operation.
So what changes could I make to achieve the same result but make it faster?
If you make a dict to store the first index of each value, you can do the lookup in O(1) instead of O(n). So in this case, before the for loop, do indexes = {}, and then in the if block, do indexes[v] = i and in the else block use indexes[v] instead of myList.index(v).
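A minimal sketch of the dict approach, assuming the tuples are hashable (the question already puts them in a set). It goes one step beyond the answer: myList.pop(i) is also O(n), because it shifts every later element, so this version builds the deduplicated list separately instead of popping. The function name dedupe_with_indices is hypothetical.

```python
def dedupe_with_indices(my_list):
    first_index = {}   # value -> its index in the deduplicated list
    unique_items = []  # deduplicated values, in first-seen order
    index_list = []    # for each original element, the index of its first occurrence
    for v in my_list:
        if v not in first_index:  # O(1) average-case dict lookup
            first_index[v] = len(unique_items)
            unique_items.append(v)
        index_list.append(first_index[v])
    return unique_items, index_list


myList = [(6, 2), (4, 3), (6, 2), (8, 1), (5, 4), (4, 3), (2, 1)]
unique_items, indexList = dedupe_with_indices(myList)
print(unique_items)  # [(6, 2), (4, 3), (8, 1), (5, 4), (2, 1)]
print(indexList)     # [0, 1, 0, 2, 3, 1, 4]
```

If myList itself must end up deduplicated, myList[:] = unique_items replaces its contents in a single O(n) step.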

Why is python's sorted called stable despite not preserving the original order?

Summary
Sorting in Python is guaranteed to be stable since Python 2.2, as documented here and here.
Wikipedia explains what the property of being stable means for the behavior of the algorithm:
A sorting algorithm is stable if whenever there are two records R and S with the same key, and R appears before S in the original list, then R will always appear before S in the sorted list.
However, when sorting objects, such as tuples, sorting appears to be unstable.
For example,
>>> a = [(1, 3), (3, 2), (2, 4), (1, 2)]
>>> sorted(a)
[(1, 2), (1, 3), (2, 4), (3, 2)]
However, to be considered stable, I thought the new sequence should've been
[(1, 3), (1, 2), (2, 4), (3, 2)]
because, in the original sequence, the tuple (1, 3) appears before tuple (1, 2). The sorted function is relying on the 2-ary "keys" when the 1-ary "keys" are equal. (To clarify, the 1-ary key of some tuple t would be t[0] and the 2-ary t[1].)
To produce the expected result, we have to do the following:
>>> sorted(a, key=lambda t: t[0])
[(1, 3), (1, 2), (2, 4), (3, 2)]
I'm guessing there's a false assumption on my part, either about sorted or maybe on how tuple and/or list types are treated during comparison.
Questions
Why is the sorted function said to be "stable" even though it alters the original sequence in this manner?
Wouldn't setting the default behavior to that of the lambda version be more consistent with what "stable" means? Why is it not set this way?
Is this behavior simply a side-effect of how tuples and/or lists are inherently compared (i.e. the false assumption)?
Thanks.
Please note that this is not about whether the default behavior is or isn't useful, common, or something else. It's about whether the default behavior is consistent with the definition of what it means to be stable (which, IMHO, does not appear to be the case) and the guarantee of stability mentioned in the docs.
Think about it - (1, 2) comes before (1, 3), does it not? Sorting a list by default does not automatically mean "just sort it based off the first element". Otherwise you could say that apple comes before aardvark in the alphabet. In other words, this has nothing to do with stability.
The docs also have a nice explanation about how data structures such as lists and tuples are sorted lexicographically:
In particular, tuples and lists are compared lexicographically by comparing corresponding elements. This means that to compare equal, every element must compare equal and the two sequences must be of the same type and have the same length.
Stable sort keeps the order of those elements which are considered equal from the sorting point of view. Because tuples are compared element by element lexicographically, (1, 2) precedes (1, 3), so it should go first:
>>> (1, 2) < (1, 3)
True
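The distinction can be checked directly. Stability only concerns items whose keys compare equal: with no key, (1, 3) and (1, 2) are different values and are simply ordered; with key=lambda t: t[0] they tie, and stability keeps their input order. Stability is also what makes multi-pass sorting work (sort by the secondary key first, then by the primary key). A short sketch using the question's list (two_pass is an illustrative name):

```python
a = [(1, 3), (3, 2), (2, 4), (1, 2)]

# No key: whole tuples are compared, and (1, 2) < (1, 3), so there is no tie to preserve
print(sorted(a))                      # [(1, 2), (1, 3), (2, 4), (3, 2)]

# Key on t[0] only: (1, 3) and (1, 2) tie, and stability keeps their original order
print(sorted(a, key=lambda t: t[0]))  # [(1, 3), (1, 2), (2, 4), (3, 2)]

# Stability allows sorting in passes, least significant key first
two_pass = sorted(a, key=lambda t: t[1])         # secondary key
two_pass = sorted(two_pass, key=lambda t: t[0])  # primary key; ties keep the t[1] order
print(two_pass == sorted(a))          # True
```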
Without a key argument, a tuple is compared using all of its items, so (1, 2) and (1, 3) are not equal and stability does not come into play:
>>> (1,2) < (1,3)
True

Lambda function behavior

I'm trying to understand the behavior of the lambda below. What value is actually passed to the argument pair? This will help me understand the return value pair[1].
pairs = [(1, 'one'), (2, 'two'), (3, 'three'), (4, 'four')]
pairs.sort(key=lambda pair: pair[1])
print (pairs)
As I understand it, sort will sort the list pairs, using the function passed as the key parameter when comparing. So how am I getting the output below?
OUTPUT:
[(4, 'four'), (1, 'one'), (3, 'three'), (2, 'two')]
If you want to sort by the numeric value, change your lambda to
pairs.sort(key=lambda pair: pair[0])
Python is zero-indexed. The first element of each tuple has index 0. pair[1] would refer to the words in the tuple, not the numbers. So if you want to sort by the text, alphabetically, what you have works.
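If you want to see what is being passed to the key function (which was your question), print it from a small named function that still returns the key. A lambda that only calls print returns None for every pair, and Python 3 raises a TypeError when sort tries to compare those None values.
from __future__ import print_function  # needed if you're on Python 2
pairs = [(1, 'one'), (2, 'two'), (3, 'three'), (4, 'four')]
def show_key(pair):  # hypothetical helper name
    print(pair[1])
    return pair[1]
pairs.sort(key=show_key)
Which prints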
one
two
three
four
Verify this by checking the output if you print(pair[0]) or print(pair).
The lambda function receives a parameter, which in your case is a tuple of size 2, and returns its second element (the number in word form). The sort method sorts your pairs list according to the key you pass to it, which is the lambda function in your code. Python sorts strings lexicographically, so your code orders the pairs so that their second elements are in lexicographic order.
Try to avoid lambda expressions where possible. At one point, they were going to be eliminated from the language altogether, and various helper functions and classes were introduced to fill the void that would be left behind. (Ultimately, lambda expressions survived, but the helpers remained.)
One of those was the itemgetter class, which can be used to define functions suitable for use with sort.
from operator import itemgetter
pairs.sort(key=itemgetter(0))
(As @metropolis points out, you want to use 0, not 1, to sort by the initial integer component of each pair.)
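A quick check that itemgetter(n) behaves like the equivalent lambda, using the question's pairs (by_word and by_number are illustrative names). itemgetter also accepts several indices and then returns a tuple of those items, which is handy for multi-level keys.

```python
from operator import itemgetter

pairs = [(1, 'one'), (2, 'two'), (3, 'three'), (4, 'four')]

# itemgetter(1) returns pair[1], just like lambda pair: pair[1]
by_word = sorted(pairs, key=itemgetter(1))
assert by_word == sorted(pairs, key=lambda pair: pair[1])
print(by_word)    # [(4, 'four'), (1, 'one'), (3, 'three'), (2, 'two')]

# itemgetter(0) sorts by the integer instead
by_number = sorted(pairs, key=itemgetter(0))
print(by_number)  # [(1, 'one'), (2, 'two'), (3, 'three'), (4, 'four')]

# With several indices, itemgetter returns a tuple of those items
print(itemgetter(1, 0)(pairs[0]))  # ('one', 1)
```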

python: multidimensional sorting of class object

Is there a way to sort a list first by x, then by y, and then by z? I'm not sure whether my code does it (ch is a list of objects with the attribute left_edge):
ch.sort(cmp=lambda x,y: cmp(x.left_edge[0], y.left_edge[0]))
ch.sort(cmp=lambda x,y: cmp(x.left_edge[1], y.left_edge[1]))
ch.sort(cmp=lambda x,y: cmp(x.left_edge[2], y.left_edge[2]))
Simple example:
unsorted
(1,1,2),(2,1,1),(1,1,3),(2,1,2)
sorted
(1,1,2),(1,1,3),(2,1,1),(2,1,2)
but I need the sorted objects...
That is exactly how the default tuple comparer works:
>>> l = [(1, 1, 2), (2, 1, 1), (1, 1, 3), (2, 1, 2)]
>>> sorted(l)
[(1, 1, 2), (1, 1, 3), (2, 1, 1), (2, 1, 2)]
See the comparison description in the documentation:
Comparison of objects of the same type depends on the type:
Tuples and lists are compared lexicographically using comparison of
corresponding elements. This means that to compare equal, each element
must compare equal and the two sequences must be of the same type and
have the same length.
If not equal, the sequences are ordered the same as their first
differing elements. For example, cmp([1,2,x], [1,2,y]) returns the
same as cmp(x,y). If the corresponding element does not exist, the
shorter sequence is ordered first (for example, [1,2] < [1,2,3]).
You should avoid using the cmp argument to sort: if you ever want to upgrade to Python 3.x you'll find it no longer exists. Use the key argument instead:
ch.sort(key=lambda x: x.left_edge)
If, as it appears, the left_edge attribute is simply a list or tuple, then just use it directly as the key value and it should all work. If it is something unusual that is subscriptable but doesn't compare, then build the tuple:
ch.sort(key=lambda x: (x.left_edge[0],x.left_edge[1],x.left_edge[2]))
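A self-contained sketch of the key approach on objects. The Chunk class is hypothetical, standing in for whatever ch actually holds; only the left_edge attribute comes from the question. It also shows why the three cmp passes in the question do not give x-then-y-then-z order: with a stable sort, the last pass becomes the primary key, so passes must run from the least significant coordinate (z) to the most significant (x).

```python
from operator import attrgetter


class Chunk:
    """Hypothetical stand-in for the question's objects."""

    def __init__(self, left_edge):
        self.left_edge = left_edge

    def __repr__(self):
        return 'Chunk(%r)' % (self.left_edge,)


ch = [Chunk((1, 1, 2)), Chunk((2, 1, 1)), Chunk((1, 1, 3)), Chunk((2, 1, 2))]

# One pass: the tuples compare x first, then y, then z
ch.sort(key=attrgetter('left_edge'))
print([c.left_edge for c in ch])  # [(1, 1, 2), (1, 1, 3), (2, 1, 1), (2, 1, 2)]

# Equivalent multi-pass version: least significant coordinate first
ch2 = [Chunk((1, 1, 2)), Chunk((2, 1, 1)), Chunk((1, 1, 3)), Chunk((2, 1, 2))]
for i in (2, 1, 0):
    ch2.sort(key=lambda c: c.left_edge[i])
print([c.left_edge for c in ch2] == [c.left_edge for c in ch])  # True
```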

Returning the highest 6 names in a List of tuple in Python

I want to return the first 6 names (only the names) with the highest corresponding integers from the list of tuples below.
I have been able to return all the names from highest (sms) to lowest (boss).
[('sms', 10), ('bush', 9), ('michaels', 7), ('operations', 6), ('research', 5), ('code', 4), ('short', 3), ('ukandu', 2), ('technical', 1), ('apeh', 1), ('boss', 1)]
Thank you.
heapq.nlargest is what you want here:
import heapq
from operator import itemgetter
largest_names = [x[0] for x in heapq.nlargest(6,your_list,key=itemgetter(1))]
It will be more efficient than sorting as it only takes the biggest elements and discards the rest. Of course, it is less efficient than slicing if the list is pre-sorted for other reasons.
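Complexity (N items, keeping the n = 6 largest):
heapq.nlargest: O(N log n), effectively O(N) for a fixed n like 6
sorting: O(N log N)
slicing (only if pre-sorted): O(n), i.e. constant for n = 6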
Explanation:
heapq.nlargest(6,your_list,key=itemgetter(1))
This line returns a list of (name, value) tuples, but only the 6 biggest ones; the comparison uses the second element of each tuple (index 1, hence key=itemgetter(1)).
The rest of the line is a list-comprehension over the 6 biggest name,value tuples which only takes the name portion of the tuple and stores it in a list.
It might be of interest to you that you could store this data as a collections.Counter as well.
import collections
d = collections.Counter(dict(your_list))
biggest = [x[0] for x in d.most_common(6)]
It's probably not worth converting just to do this calculation (that's what heapq is for after all ;-), but it might be worth converting to make the data easier to work with.
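A runnable comparison of the approaches in this thread on the question's data (the variable names are illustrative). All three give the same names here because the six highest scores are all distinct, so tie order does not matter.

```python
import collections
import heapq
from operator import itemgetter

data = [('sms', 10), ('bush', 9), ('michaels', 7), ('operations', 6), ('research', 5),
        ('code', 4), ('short', 3), ('ukandu', 2), ('technical', 1), ('apeh', 1), ('boss', 1)]

# heapq: keeps only the 6 largest while scanning the list once
via_heapq = [name for name, score in heapq.nlargest(6, data, key=itemgetter(1))]

# Counter: most_common(6) returns (element, count) pairs, highest count first
via_counter = [name for name, count in collections.Counter(dict(data)).most_common(6)]

# Full sort in descending order, then slice
via_sorted = [name for name, score in sorted(data, key=itemgetter(1), reverse=True)[:6]]

print(via_heapq)  # ['sms', 'bush', 'michaels', 'operations', 'research', 'code']
print(via_heapq == via_counter == via_sorted)  # True
```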
data=[('sms', 10), ('bush', 9), ('michaels', 7), ('operations', 6), ('research', 5), ('code', 4), ('short', 3), ('ukandu', 2), ('technical', 1), ('apeh', 1), ('boss', 1)]
top_names = [x[0] for x in sorted(data, key=lambda x: x[1], reverse=True)[0:6]]
Which does the following:
sorted returns data sorted using the key function. Since the default sort order is ascending, reverse=True makes it descending;
lambda x: x[1] is an anonymous function that returns the second element of its argument (a tuple in this case); itemgetter(1) is a nicer way to do this, but requires an additional import;
[0:6] slices the first 6 elements of the list;
[x[0] for x in ... ] creates a list of the first element of each tuple.
If the data is already sorted simply slice off the first six tuples and then get the names:
first_six = data[0:6] # or data[:6]
only_names = [entry[0] for entry in first_six]
The list comprehension can be unrolled to:
only_names = []
for entry in first_six:
    only_names.append(entry[0])
If the list is not already sorted you can use the key keyword argument of the sort method (or the sorted built-in) to sort by score:
data.sort(key=lambda entry: entry[1], reverse=True)
lambda is an anonymous function - the equivalent is:
def get_score(entry):
    return entry[1]
data.sort(key=get_score, reverse=True)
