I'm trying to understand the behavior of the lambda below. What value is actually passed to the argument pair? Knowing that will help me understand the return part, pair[1].
pairs = [(1, 'one'), (2, 'two'), (3, 'three'), (4, 'four')]
pairs.sort(key=lambda pair: pair[1])
print (pairs)
As I understand it, sort will sort the list pairs, and it uses a function for the comparison if one is passed as a parameter. So how am I getting the output below?
OUTPUT:
[(4, 'four'), (1, 'one'), (3, 'three'), (2, 'two')]
If you want to sort by the numeric value, change your lambda to
pairs.sort(key=lambda pair: pair[0])
Python is zero-indexed. The first element of each tuple has index 0. pair[1] would refer to the words in the tuple, not the numbers. So if you want to sort by the text, alphabetically, what you have works.
If you want to see what's being passed through the lambda --- which was your question:
from __future__ import print_function #Needed if you're on Python 2
pairs = [(1, 'one'), (2, 'two'), (3, 'three'), (4, 'four')]
pairs.sort(key=lambda pair: print(pair[1]))
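Which prints the following (on Python 3 the sort then raises a TypeError, because print returns None and None keys cannot be compared with each other)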
one
two
three
four
Verify this by checking the output if you print(pair[0]) or print(pair).
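If you want to trace the calls and still get a sorted list, the key function has to return the real key after logging it. A minimal sketch, assuming Python 3; the names traced_key and seen are made up for illustration:

```python
pairs = [(1, 'one'), (2, 'two'), (3, 'three'), (4, 'four')]
seen = []  # hypothetical log of every argument the key function receives

def traced_key(pair):
    # sort() calls this once per element and passes in the whole tuple
    print('key called with', pair)
    seen.append(pair)
    return pair[1]  # this is the value sort() actually compares

pairs.sort(key=traced_key)
print(pairs)  # [(4, 'four'), (1, 'one'), (3, 'three'), (2, 'two')]
```

So the argument pair is each complete tuple in turn, and pair[1] picks the word out of it.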
The lambda function receives one parameter, which in your case is a tuple of size 2, and returns its second element (the number written out as a word). The sort method sorts your pairs list according to the key you pass to it, which is the lambda function in your code. When Python sorts strings, it orders them lexicographically, so your code arranges the pairs so that their second elements are in lexicographic order.
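A quick way to convince yourself of the lexicographic ordering is to sort the words on their own. This is only an illustrative sketch (Python 3 syntax); the variable names are not from the question:

```python
words = ['one', 'two', 'three', 'four']

# Strings are compared character by character, like dictionary order
sorted_words = sorted(words)
print(sorted_words)  # ['four', 'one', 'three', 'two']

# 'three' sorts before 'two' because the second characters differ: 'h' < 'w'
three_before_two = 'three' < 'two'
print(three_before_two)  # True

# Comparison uses code points, so uppercase letters sort before lowercase
mixed_case = sorted(['b', 'A', 'a', 'B'])
print(mixed_case)  # ['A', 'B', 'a', 'b']
```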
Try to avoid lambda expressions where possible. At one point, they were going to be eliminated from the language altogether, and various helper functions and classes were introduced to fill the void that would be left behind. (Ultimately, lambda expressions survived, but the helpers remained.)
One of those was the itemgetter class, which can be used to define functions suitable for use with sort.
from operator import itemgetter
pairs.sort(key=itemgetter(0))
(As @metropolis points out, you want to use 0, not 1, to sort by the initial integer component of each pair.)
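For comparison, here is a small sketch of what itemgetter produces, using the pairs list from the question. The variable names are only for illustration:

```python
from operator import itemgetter

pairs = [(1, 'one'), (2, 'two'), (3, 'three'), (4, 'four')]

# itemgetter(1) builds a callable equivalent to lambda pair: pair[1]
get_word = itemgetter(1)
first_word = get_word(pairs[0])  # 'one'

by_word = sorted(pairs, key=itemgetter(1))    # alphabetical by the word
by_number = sorted(pairs, key=itemgetter(0))  # numeric by the integer

# With several indexes, itemgetter returns a tuple of the selected items
word_then_number = itemgetter(1, 0)(pairs[0])  # ('one', 1)

print(by_word)
print(by_number)
```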
Summary
Sorting in Python is guaranteed to be stable since Python 2.2, as documented here and here.
Wikipedia explains what the property of being stable means for the behavior of the algorithm:
A sorting algorithm is stable if whenever there are two records R and S with the same key, and R appears before S in the original list, then R will always appear before S in the sorted list.
However, when sorting objects, such as tuples, sorting appears to be unstable.
For example,
>>> a = [(1, 3), (3, 2), (2, 4), (1, 2)]
>>> sorted(a)
[(1, 2), (1, 3), (2, 4), (3, 2)]
However, to be considered stable, I thought the new sequence should've been
[(1, 3), (1, 2), (2, 4), (3, 2)]
because, in the original sequence, the tuple (1, 3) appears before tuple (1, 2). The sorted function is relying on the 2-ary "keys" when the 1-ary "keys" are equal. (To clarify, the 1-ary key of some tuple t would be t[0] and the 2-ary t[1].)
To produce the expected result, we have to do the following:
>>> sorted(a, key=lambda t: t[0])
[(1, 3), (1, 2), (2, 4), (3, 2)]
I'm guessing there's a false assumption on my part, either about sorted or maybe on how tuple and/or list types are treated during comparison.
Questions
Why is the sorted function said to be "stable" even though it alters the original sequence in this manner?
Wouldn't setting the default behavior to that of the lambda version be more consistent with what "stable" means? Why is it not set this way?
Is this behavior simply a side-effect of how tuples and/or lists are inherently compared (i.e. the false assumption)?
Thanks.
Please note that this is not about whether the default behavior is or isn't useful, common, or something else. It's about whether the default behavior is consistent with the definition of what it means to be stable (which, IMHO, does not appear to be the case) and the guarantee of stability mentioned in the docs.
Think about it - (1, 2) comes before (1, 3), does it not? Sorting a list by default does not automatically mean "just sort it based off the first element". Otherwise you could say that apple comes before aardvark in the alphabet. In other words, this has nothing to do with stability.
The docs also have a nice explanation about how data structures such as lists and tuples are sorted lexicographically:
In particular, tuples and lists are compared lexicographically by comparing corresponding elements. This means that to compare equal, every element must compare equal and the two sequences must be of the same type and have the same length.
Stable sort keeps the order of those elements which are considered equal from the sorting point of view. Because tuples are compared element by element lexicographically, (1, 2) precedes (1, 3), so it should go first:
>>> (1, 2) < (1, 3)
True
A tuple's key is made out of all of its items.
>>> (1,2) < (1,3)
True
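To make the distinction concrete, here is a short sketch using the list from the question. Stability only matters for items whose keys compare equal; the records list is made up for illustration:

```python
a = [(1, 3), (3, 2), (2, 4), (1, 2)]

# Default: the whole tuple is the key, so (1, 2) and (1, 3) are NOT equal
whole_tuple = sorted(a)
print(whole_tuple)  # [(1, 2), (1, 3), (2, 4), (3, 2)]

# Key is only t[0]: (1, 3) and (1, 2) now tie, and stability keeps
# them in their original relative order
first_item_only = sorted(a, key=lambda t: t[0])
print(first_item_only)  # [(1, 3), (1, 2), (2, 4), (3, 2)]

# Hypothetical records: 'b' and 'a' share the key 1 and keep their order
records = [('b', 1), ('a', 1), ('c', 0)]
by_count = sorted(records, key=lambda r: r[1])
print(by_count)  # [('c', 0), ('b', 1), ('a', 1)]
```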
When map has different-length inputs, a fill value of None is used for the missing inputs:
>>> x = [[1,2,3,4],[5,6]]
>>> map(lambda *x:x, *x)
[(1, 5), (2, 6), (3, None), (4, None)]
This is the same behavior as:
>>> import itertools
>>> list(itertools.izip_longest(*x))
[(1, 5), (2, 6), (3, None), (4, None)]
What's the reason map provides this behavior, and not the following?
>>> map(lambda *x:x, *x)
[(1, 5), (2, 6), (3,), (4,)]
…and is there an easy way to get the latter behavior either with some flavor of zip or map?
I think this is a design decision that the core devs opted for at the time they implemented map. There's no universally defined behavior for map when it is used with multiple iterables, quoting from Map (higher-order function):
Map with 2 or more lists encounters the issue of handling when the
lists are of different lengths. Various languages differ on this; some
raise an exception, some stop after the length of the shortest list
and ignore extra items on the other lists; some continue on to the
length of the longest list, and for the lists that have already ended,
pass some placeholder value to the function indicating no value.
So, Python core devs opted for None as placeholder for shorter iterables at the time map was introduced to Python in 1993.
But itertools.imap stops at the shortest iterable, because its design is heavily inspired by languages like Standard ML, Haskell and APL. In Standard ML and Haskell, map ends with the shortest iterable (I am not sure about APL, though).
Python 3 also removed the map(None, ...) construct (strictly speaking from itertools.imap, since Python 3's map is essentially itertools.imap; see "Move map() from itertools to builtins"). It existed in Python 2 only because there was no zip() function when map was added to the language.
From Issue2186: map and filter shouldn't support None as first argument (in Py3k only):
I concur with Guido that we never would have created map(None, ...) if
zip() had existed. The primary use case is obsolete.
To get the result you want, I would suggest using itertools.izip_longest with a sentinel value (object()) rather than the default None; None will break things if the iterables themselves contain None:
from itertools import izip_longest

def solve(seq):
    # A fresh object() can never be equal to real data, unlike None
    sentinel = object()
    return [tuple(x for x in item if x is not sentinel)
            for item in izip_longest(*seq, fillvalue=sentinel)]

print solve([[1,2,3,4],[5,6]])
# [(1, 5), (2, 6), (3,), (4,)]
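The code above is Python 2. On Python 3, izip_longest is named zip_longest; a sketch of the same idea under that assumption, with a second call showing why the sentinel matters when the data itself contains None:

```python
from itertools import zip_longest

def solve(seq):
    # A fresh object() is used as filler so that real None values survive
    sentinel = object()
    return [tuple(x for x in item if x is not sentinel)
            for item in zip_longest(*seq, fillvalue=sentinel)]

print(solve([[1, 2, 3, 4], [5, 6]]))  # [(1, 5), (2, 6), (3,), (4,)]
print(solve([[1, None, 3], [5]]))     # [(1, 5), (None,), (3,)]
```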
Given that the first list is always longer and that there are only two lists, you would do something like this:
x = [1,2,3,4,5]
y = ['a','b']
zip(x,y) + [(i,) for i in x[len(y):]]
[(1, 'a'), (2, 'b'), (3,), (4,), (5,)]
Is there a way to sort a list first by x, then by y and then by z? I'm not sure if my code will do it (ch is a list of objects with the attribute left_edge):
ch.sort(cmp=lambda x,y: cmp(x.left_edge[0], y.left_edge[0]))
ch.sort(cmp=lambda x,y: cmp(x.left_edge[1], y.left_edge[1]))
ch.sort(cmp=lambda x,y: cmp(x.left_edge[2], y.left_edge[2]))
Simple example:
unsorted
(1,1,2),(2,1,1),(1,1,3),(2,1,2)
sorted
(1,1,2),(1,1,3),(2,1,1),(2,1,2)
but I need the sorted objects...
That is exactly how the default tuple comparer works:
>>> l = [(1, 1, 2), (2, 1, 1), (1, 1, 3), (2, 1, 2)]
>>> sorted(l)
[(1, 1, 2), (1, 1, 3), (2, 1, 1), (2, 1, 2)]
See the comparison description in the documentation:
Comparison of objects of the same type depends on the type:
Tuples and lists are compared lexicographically using comparison of
corresponding elements. This means that to compare equal, each element
must compare equal and the two sequences must be of the same type and
have the same length.
If not equal, the sequences are ordered the same as their first
differing elements. For example, cmp([1,2,x], [1,2,y]) returns the
same as cmp(x,y). If the corresponding element does not exist, the
shorter sequence is ordered first (for example, [1,2] < [1,2,3]).
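The rules quoted above can be checked directly in the interpreter. A small sketch (Python 3 syntax, values chosen only for illustration):

```python
# The first differing element decides the order
first_diff = (1, 1, 3) < (2, 1, 1)
print(first_diff)  # True

# Later elements only matter when the earlier ones are equal
later_elements = (1, 2, 9) < (1, 3, 0)
print(later_elements)  # True

# If one sequence is a prefix of the other, the shorter one sorts first
shorter_first = [1, 2] < [1, 2, 3]
print(shorter_first)  # True

# Sequences of different types never compare equal
same_items_other_type = (1, 2) == [1, 2]
print(same_items_other_type)  # False
```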
You should avoid using the cmp argument to sort: if you ever want to upgrade to Python 3.x you'll find it no longer exists. Use the key argument instead:
ch.sort(key=lambda x: x.left_edge)
If, as it appears, the left_edge attribute is simply a list or tuple, then just use it directly as the key value and it should all work. If it is something unusual that is subscriptable but doesn't compare, then build the tuple:
ch.sort(key=lambda x: (x.left_edge[0],x.left_edge[1],x.left_edge[2]))
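Here is a runnable sketch of both approaches. The Chunk class is a hypothetical stand-in for whatever objects ch really holds. Note that if you do sort in several passes, the stable sort means the last pass wins, so the passes have to run from the least significant key (z) to the most significant (x); the three passes in the question run in the opposite order and would end up ordered primarily by z.

```python
class Chunk:
    """Hypothetical stand-in for the objects stored in ch."""
    def __init__(self, left_edge):
        self.left_edge = left_edge

    def __repr__(self):
        return 'Chunk(%r)' % (self.left_edge,)

edges = [(1, 1, 2), (2, 1, 1), (1, 1, 3), (2, 1, 2)]
ch = [Chunk(e) for e in edges]

# One pass: the tuple key compares x first, then y, then z
one_pass = sorted(ch, key=lambda c: tuple(c.left_edge))

# Several passes: sort by z, then y, then x, relying on stability
multi_pass = list(ch)
for i in (2, 1, 0):
    multi_pass.sort(key=lambda c, i=i: c.left_edge[i])

print(one_pass)
print(multi_pass)
```

Both lists come out in the order (1,1,2), (1,1,3), (2,1,1), (2,1,2), and they contain the original objects rather than copies of the tuples.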
I want to return the first 6 names (only the names) with the highest corresponding integers from the list of tuples below.
I have been able to return all the names from highest (sms) to lowest (boss).
[('sms', 10), ('bush', 9), ('michaels', 7), ('operations', 6), ('research', 5), ('code', 4), ('short', 3), ('ukandu', 2), ('technical', 1), ('apeh', 1), ('boss', 1)]
Thank you.
heapq.nlargest is what you want here:
import heapq
from operator import itemgetter
largest_names = [x[0] for x in heapq.nlargest(6,your_list,key=itemgetter(1))]
It will be more efficient than sorting as it only takes the biggest elements and discards the rest. Of course, it is less efficient than slicing if the list is pre-sorted for other reasons.
Complexity:
heapq: O(N) (strictly O(N log k) for the k largest items, with k = 6 here)
sorting: O(NlogN)
slicing (only if pre-sorted): O(6)
Explanation:
heapq.nlargest(6,your_list,key=itemgetter(1))
This line returns a list of (name,value) tuples, but only the 6 biggest ones -- comparison is done by the second (index=1 --> key=itemgetter(1)) element in the tuple.
The rest of the line is a list-comprehension over the 6 biggest name,value tuples which only takes the name portion of the tuple and stores it in a list.
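Put together with the data from the question, the two steps look like this. A sketch only; top_pairs is a name introduced here to show the intermediate result:

```python
import heapq
from operator import itemgetter

your_list = [('sms', 10), ('bush', 9), ('michaels', 7), ('operations', 6),
             ('research', 5), ('code', 4), ('short', 3), ('ukandu', 2),
             ('technical', 1), ('apeh', 1), ('boss', 1)]

# Step 1: the 6 (name, value) tuples with the largest value at index 1
top_pairs = heapq.nlargest(6, your_list, key=itemgetter(1))

# Step 2: keep only the name from each tuple
largest_names = [name for name, value in top_pairs]

print(largest_names)
# ['sms', 'bush', 'michaels', 'operations', 'research', 'code']
```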
It might be of interest to you that you could store this data as a collections.Counter as well.
import collections
d = collections.Counter(dict(your_list))
biggest = [x[0] for x in d.most_common(6)]
It's probably not worth converting just to do this calculation (that's what heapq is for after all ;-), but it might be worth converting to make the data easier to work with.
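As a sketch of what "easier to work with" might mean: once the data is in a Counter, counts can be updated in place and the ranking asked for again. The extra 7 added to 'boss' below is an invented update, purely for illustration:

```python
import collections

your_list = [('sms', 10), ('bush', 9), ('michaels', 7), ('operations', 6),
             ('research', 5), ('code', 4), ('short', 3), ('ukandu', 2),
             ('technical', 1), ('apeh', 1), ('boss', 1)]

d = collections.Counter(dict(your_list))
biggest = [name for name, count in d.most_common(6)]
print(biggest)  # ['sms', 'bush', 'michaels', 'operations', 'research', 'code']

# Hypothetical update: 'boss' is seen 7 more times
d['boss'] += 7
top_three = d.most_common(3)
print(top_three)  # [('sms', 10), ('bush', 9), ('boss', 8)]
```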
data=[('sms', 10), ('bush', 9), ('michaels', 7), ('operations', 6), ('research', 5), ('code', 4), ('short', 3), ('ukandu', 2), ('technical', 1), ('apeh', 1), ('boss', 1)]
names = [x[0] for x in sorted(data, key=lambda x: x[1], reverse=True)[0:6]]
Which does the following:
sorted returns data sorted using the key function. Since the standard sorting order is ascending, reverse=True sets it to descending;
lambda x: x[1] is an anonymous function which returns the second element of its argument (a tuple in this case); itemgetter(1) is a nicer way to do this, but requires an additional import;
[0:6] slices first 6 elements of the list;
[x[0] for x in ... ] creates a list of first elements of each passed tuple;
If the data is already sorted simply slice off the first six tuples and then get the names:
first_six = data[0:6] # or data[:6]
only_names = [entry[0] for entry in first_six]
The list comprehension can be unrolled to:
only_names = []
for entry in first_six:
    only_names.append(entry[0])
If the list is not already sorted you can use the key keyword argument of the sort method (or the sorted built-in) to sort by score:
data.sort(key=lambda entry: entry[1], reverse=True)
lambda is an anonymous function - the equivalent is:
def get_score(entry):
    return entry[1]

data.sort(key=get_score, reverse=True)
While working on a problem from the Google Python class, I arrived at the following result by using 2-3 examples from Stack Overflow:
def sort_last(tuples):
    return [b for a,b in sorted((tup[1], tup) for tup in tuples)]

print sort_last([(1, 3), (3, 2), (2, 1)])
I learned List comprehension yesterday, so know a little about list comprehension but I am confused how this solution is working overall. Please help me to understand this (2nd line in function).
That pattern is called decorate-sort-undecorate.
You turn each (1, 3) into (3, (1, 3)), wrapping each tuple in a new tuple, with the item you want to sort by first.
You sort, with the outer tuple ensuring that the second item in the original tuple is sorted on first.
You go back from (3, (1, 3)) to (1, 3) while maintaining the order of the list.
In Python, explicitly decorating is almost always unnecessary. Instead, use the key argument of sorted:
sorted(list_of_tuples, key=lambda tup: tup[1]) # or key=operator.itemgetter(1)
Or, if you want to sort on the reversed version of the tuple, no matter its length:
sorted(list_of_tuples, key=lambda tup: tup[::-1])
# or key=operator.itemgetter(slice(None, None, -1))
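One subtle difference is worth a sketch: the decorated version and key=itemgetter(1) can disagree when two tuples tie on the second item. Decorating falls back to comparing the original tuples, while key keeps the original order (the sort is stable). The sample list is made up to contain such a tie:

```python
from operator import itemgetter

tuples = [(2, 1), (1, 1), (3, 0)]  # (2, 1) and (1, 1) tie on index 1

# Decorate, sort, undecorate, written out step by step
decorated = [(tup[1], tup) for tup in tuples]
decorated.sort()
dsu = [tup for _, tup in decorated]
print(dsu)    # [(3, 0), (1, 1), (2, 1)]  ties broken by the whole tuple

# key= only looks at index 1, so tied items keep their input order
keyed = sorted(tuples, key=itemgetter(1))
print(keyed)  # [(3, 0), (2, 1), (1, 1)]

# Sorting on the reversed tuple reproduces the decorated result here
by_reversed = sorted(tuples, key=lambda tup: tup[::-1])
print(by_reversed)  # [(3, 0), (1, 1), (2, 1)]
```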
Let's break it down:
In : [(tup[1],tup) for tup in tuples]
Out: [(3, (1, 3)), (2, (3, 2)), (1, (2, 1))]
So we just created a new tuple for each item, whose first value is the last value of the inner tuple. This way the list is sorted by the 2nd value of each tuple in 'tuples'.
Now we sort the returned list:
In: sorted([(3, (1, 3)), (2, (3, 2)), (1, (2, 1))])
Out: [(1, (2, 1)), (2, (3, 2)), (3, (1, 3))]
So we now have our list sorted by the 2nd value of each tuple. All that remains is to extract the original tuple, which is done by taking only b in the for loop.
The list comprehension iterates over the given list (sorted([...]) in this case) and returns the extracted values in order.
Your example works by creating a new list that holds, for each tuple in the list, the element at index 1 followed by the original tuple, e.g. (3, (1, 3)) for the first element. The sorted function compares element by element starting from index 0, so the list is sorted by the second item. The function then goes through each item in the new list and returns the original tuples.
Another way of doing this is by using the key parameter in the sorted function which sorts based on the value of the key. In this case you want the key to be the item in each tuple at index 1.
>>> from operator import itemgetter
>>> sorted([(1, 3), (3, 2), (2, 1)], key=itemgetter(1))
[(2, 1), (3, 2), (1, 3)]
Please refer to the accepted answer; here is an example for better visualization.
key is a function that is called to transform each item of the collection before comparison (it extracts the value to compare on, rather than doing the comparison itself the way Java's compareTo does).
The parameter passed to key must be something that is callable. Here, the use of lambda creates an anonymous function (which is a callable).
The syntax of lambda is the word lambda followed by the parameter name(s), a colon, and then a single expression.
In the example below, we sort a list of tuples that hold the time of a certain event and an actor name.
We sort this list by time of event occurrence, which is the 0th element of each tuple.
Shout out for the Ready Player One fans! =)
>>> gunters = [('2044-04-05', 'parzival'), ('2044-04-07', 'aech'), ('2044-04-06', 'art3mis')]
>>> gunters.sort(key=lambda tup: tup[0])
>>> print gunters
[('2044-04-05', 'parzival'), ('2044-04-06', 'art3mis'), ('2044-04-07', 'aech')]
Note - s.sort([cmp[, key[, reverse]]]) sorts the items of s in place
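The signature in the note is the Python 2 one. In Python 3 the cmp argument is gone, and key and reverse must be passed by keyword. A sketch of the same example on Python 3, also showing the difference between sorting in place and sorted(); the variable names are only illustrative:

```python
gunters = [('2044-04-05', 'parzival'), ('2044-04-07', 'aech'),
           ('2044-04-06', 'art3mis')]

# sorted() leaves gunters alone and returns a new list.
# ISO-formatted dates sort correctly as plain strings.
by_date = sorted(gunters, key=lambda tup: tup[0])
print(by_date)

# list.sort() reorders the list in place and returns None
result = gunters.sort(key=lambda tup: tup[1])
print(result)   # None
print(gunters)  # now ordered by name: aech, art3mis, parzival
```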