Python max function using 'key' and lambda expression

I come from an OOP background and am trying to learn Python.
I am using the max function with a lambda expression to return the instance of type Player that has the maximum totalScore in the list players.
def winner():
    w = max(players, key=lambda p: p.totalScore)
The function correctly returns the instance of type Player having the maximum totalScore.
I am confused about the following three things:
How does the max function work? What arguments does it take? I looked at the documentation but failed to understand it.
What is the use of the keyword argument key in the max function? I know it is also used in the context of the sort function.
What do lambda expressions mean? How do I read them? How do they work?
These are all very noobish conceptual questions, but the answers will help me understand the language. It would help if you could give examples to explain.
Thanks

lambda creates an anonymous function; here it is equivalent to:
def func(p):
    return p.totalScore
Now max becomes:
max(players, key=func)
But as def statements are compound statements, they can't be used where an expression is required; that's why lambdas are sometimes used.
Note that lambda is equivalent to what you'd put in a return statement of a def. Thus, you can't use statements inside a lambda, only expressions are allowed.
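To make the equivalence concrete, here is a minimal sketch. The Player class and the sample scores are hypothetical stand-ins, since the question does not show the asker's class.

```python
class Player:
    # Hypothetical stand-in for the Player type from the question.
    def __init__(self, name, totalScore):
        self.name = name
        self.totalScore = totalScore


def func(p):
    # Named function: the body is exactly what the lambda would evaluate.
    return p.totalScore


players = [Player("ann", 12), Player("bob", 30), Player("cy", 7)]

by_def = max(players, key=func)
by_lambda = max(players, key=lambda p: p.totalScore)

# Both calls select the very same object from the list.
print(by_def.name, by_lambda.name)
print(by_def is by_lambda)
```

Either spelling works; the lambda just saves you from naming a one-line function.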
What does max do?
max(a, b, c, ...[, key=func]) -> value
With a single iterable argument, return its largest item. With two or
more arguments, return the largest argument.
So, it simply returns the object that is the largest.
How does key work?
By default, max compares the items themselves. In Python 2, objects of different types are ordered by a set of rules based on the type of the objects (for example, a string is always greater than an integer).
To modify the object before comparison, or to compare based on a particular attribute/index, you have to use the key argument.
Example 1:
A simple example, suppose you have a list of numbers in string form, but you want to compare those items by their integer value.
>>> lis = ['1', '100', '111', '2']
Here max compares the items using their original values (strings are compared lexicographically so you'd get '2' as output) :
>>> max(lis)
'2'
To compare the items by their integer value use key with a simple lambda:
>>> max(lis, key=lambda x:int(x)) # compare `int` version of each item
'111'
Example 2: Applying max to a list of tuples.
>>> lis = [(1,'a'), (3,'c'), (4,'e'), (-1,'z')]
By default max will compare the items by the first index. If the first index is the same then it'll compare the second index. As in my example, all items have a unique first index, so you'd get this as the answer:
>>> max(lis)
(4, 'e')
But, what if you wanted to compare each item by the value at index 1? Simple: use lambda:
>>> max(lis, key = lambda x: x[1])
(-1, 'z')
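As a side note, the standard library's operator.itemgetter builds the same kind of key function as the lambda above. A small sketch using the list from Example 2:

```python
from operator import itemgetter

lis = [(1, 'a'), (3, 'c'), (4, 'e'), (-1, 'z')]

# itemgetter(1) behaves like lambda x: x[1]
by_second = max(lis, key=itemgetter(1))
# itemgetter(0) behaves like lambda x: x[0]
by_first = max(lis, key=itemgetter(0))

print(by_second)
print(by_first)
```

Whether you prefer itemgetter or a lambda is mostly a matter of taste; both are plain one-argument functions as far as max is concerned.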
Comparing items in an iterable that contains objects of different type:
List with mixed items:
lis = ['1','100','111','2', 2, 2.57]
In Python 2 it is possible to compare items of two different types:
>>> max(lis) # works in Python 2
'2'
>>> max(lis, key=lambda x: int(x)) # compare integer version of each item
'111'
But in Python 3 you can't do that any more:
>>> lis = ['1', '100', '111', '2', 2, 2.57]
>>> max(lis)
Traceback (most recent call last):
File "<ipython-input-2-0ce0a02693e4>", line 1, in <module>
max(lis)
TypeError: unorderable types: int() > str()
But this works, as we are comparing integer version of each object:
>>> max(lis, key=lambda x: int(x)) # or simply `max(lis, key=int)`
'111'
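The following sketch, assuming Python 3, shows both halves of that point in one place: the bare comparison fails, while a key that maps every item to a common type works. It uses float rather than int as the key so that 2.57 is not truncated; that choice is mine, not part of the answer above.

```python
lis = ['1', '100', '111', '2', 2, 2.57]

try:
    max(lis)
    mixed_ok = True
except TypeError:
    # Python 3 refuses to order str against int/float.
    mixed_ok = False

# The key is only used for comparing; the original item is returned.
largest = max(lis, key=float)
smallest = min(lis, key=float)

print(mixed_ok, repr(largest), repr(smallest))
```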

Strongly simplified version of max:
def max(items, key=lambda x: x):
    current = items[0]  # start with the first item as the running maximum
    for item in items:
        if key(item) > key(current):
            current = item
    return current
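The simplified version above takes the first element by index, so it needs a sequence. A slightly fuller sketch that accepts any iterable and calls key only once per item could look like this. The name simple_max is made up for illustration, and this is not how the built-in is actually implemented.

```python
def simple_max(iterable, key=lambda x: x):
    it = iter(iterable)
    try:
        best = next(it)
    except StopIteration:
        raise ValueError("simple_max() arg is an empty iterable") from None
    best_key = key(best)
    for item in it:
        k = key(item)
        # Strict '>' keeps the first of several equally large items.
        if k > best_key:
            best, best_key = item, k
    return best


print(simple_max(['1', '100', '111', '2'], key=int))
print(simple_max(iter([3, 1, 2])))
```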
Regarding lambda:
>>> ident = lambda x: x
>>> ident(3)
3
>>> ident(5)
5
>>> times_two = lambda x: 2*x
>>> times_two(2)
4

The max function is used to get the maximum out of an iterable.
The iterables may be lists, tuples, dict objects, etc., or even collections of custom objects as in the example you provided.
max(iterable[, key=func]) -> value
max(a, b, c, ...[, key=func]) -> value
With a single iterable argument, return its largest item.
With two or more arguments, return the largest argument.
So, key=func lets us pass an optional function, key, that is applied to each item; the items are compared by the results, and the item with the largest result is returned.
lambda is a Python keyword that creates a small anonymous function. So, when a player object is passed to it, it returns player.totalScore. Thus max compares the items of the iterable by their totalScore and returns the player who has the maximum totalScore.
If no key argument is provided, the maximum is returned according to default Python orderings.
Examples -
>>> max(1, 3, 5, 7)
7
>>> max([1, 3, 5, 7])
7
>>> people = [('Barack', 'Obama'), ('Oprah', 'Winfrey'), ('Mahatma', 'Gandhi')]
>>> max(people, key=lambda x: x[1])
('Oprah', 'Winfrey')

How does the max function work?
It looks for the "largest" item in an iterable. I'll assume that you can look up what that is, but if not, it's something you can loop over, e.g. a list or string.
What is use of the keyword key in max function? I know it is also used in context of sort function
key is a function (often written as a lambda) that tells max which value to compare each object in the iterable by. This is useful if, say, you are comparing objects of a class you created yourself, and not something obvious, like integers.
Meaning of the lambda expression? How to read them? How do they work?
That's sort of a larger question. In simple terms, a lambda is a function you can pass around, and have other pieces of code use it. Take this for example:
def sum(a, b, f):
    return (f(a) + f(b))
This takes two objects, a and b, and a function f.
It calls f() on each object, then adds them together. So look at this call:
>>> sum(2, 2, lambda a: a * 2)
8
sum() takes 2, and calls the lambda expression on it. So f(a) becomes 2 * 2, which becomes 4. It then does this for b, and adds the two together.
In not so simple terms, lambdas come from lambda calculus, a formal system that expresses computation purely through defining and applying functions, including functions that take and return other functions; a very cool math concept. You can read about that here, and then actually understand it here.
It's probably better to read about this a little more, as lambdas can be confusing, and it's not immediately obvious how useful they are. Check here.

According to the documentation:
max(iterable[, key])
max(arg1, arg2, *args[, key])
Return the largest item in an iterable or the largest of two or more arguments.
If one positional argument is provided, iterable must be a non-empty iterable (such as a non-empty string, tuple or list). The largest item in the iterable is returned. If two or more positional arguments are provided, the largest of the positional arguments is returned.
The optional key argument specifies a one-argument ordering function like that used for list.sort(). The key argument, if supplied, must be in keyword form (for example, max(a,b,c,key=func)).
What this is saying is that in your case, you are providing a list, in this case players. Then the max function will iterate over all the items in the list and compare them to each other to get a "maximum".
As you can imagine, with a complex object like a player, determining its value for comparison is tricky, so you are given the key argument to tell the max function how to decide the value of each player. In this case, you are using a lambda function to say "for each p in players, get p.totalScore and use that as its value for comparison".
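Putting this together for the question's case, a minimal sketch might look like the following. The Player class here is a hypothetical stand-in, and passing the list in as a parameter, using operator.attrgetter instead of the lambda, and adding default=None (available since Python 3.4) are my own choices rather than part of the original code.

```python
from operator import attrgetter


class Player:
    # Hypothetical stand-in; only totalScore matters for the comparison.
    def __init__(self, name, totalScore):
        self.name = name
        self.totalScore = totalScore


def winner(players):
    # attrgetter("totalScore") behaves like lambda p: p.totalScore.
    # default=None avoids a ValueError when the list is empty.
    return max(players, key=attrgetter("totalScore"), default=None)


players = [Player("ann", 12), Player("bob", 30), Player("cy", 7)]
print(winner(players).name)
print(winner([]))
```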

max is a built-in function that takes an iterable (like a list or tuple) as its first argument.
The keyword argument key defaults to None, but it accepts a function; think of it as a wrapper that is applied to each item of the iterable before the items are compared.
Consider this example dictionary:
d = {'aim':99, 'aid': 45, 'axe': 59, 'big': 9, 'short': 995, 'sin':12, 'sword':1, 'friend':1000, 'artwork':23}
Ex:
>>> max(d.keys())
'sword'
As you can see, if you pass only the iterable without the keyword argument (a function for key), it returns the maximum key in alphabetical order.
Ex.
Instead of finding the alphabetically largest key, you might need to find the longest key:
>>> max(d.keys(), key=lambda x: len(x))
'artwork'
In this example the lambda function returns the length of each key, so instead of comparing the keys alphabetically, max compares their lengths and returns the key with the greatest length.
Ex.
>>> max(d.keys(), key=lambda x: d[x])
'friend'
In this example the lambda function returns the dictionary value for each key, so max returns the key whose value is the largest.
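If you need the value as well as the key, one option is to run max over d.items() and compare the pairs by their second element. A small sketch with a shortened, made-up version of the dictionary:

```python
d = {'aim': 99, 'aid': 45, 'big': 9, 'friend': 1000}

# Key only: compare the dictionary keys by their associated values.
best_key = max(d, key=lambda k: d[k])

# Key and value together: compare (key, value) pairs by the value.
pair_key, pair_value = max(d.items(), key=lambda kv: kv[1])

print(best_key)
print(pair_key, pair_value)
```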

Assuming that people who come to this page actually want to know what key=len means inside min() or max(), here is the simple answer:
len() returns the length of an object. If we specify len as the key function in min() or max(), it will return the smallest/largest item based on length.
food = ['bread', 'tea', 'banana', 'kiwi', 'tomato']
print(max(food, key=len)) # banana
print(min(food, key=len)) # tea
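Note that 'banana' and 'tomato' have the same length; max returns the first maximal item it encounters. If you want a different tie-break, one approach is a tuple key, sketched here (the choice of alphabetical order as the tie-break is just an example):

```python
food = ['bread', 'tea', 'banana', 'kiwi', 'tomato']

# Ties on length: the first longest word in the list wins.
first_longest = max(food, key=len)

# Tuple key: compare by length first, then alphabetically among equals.
tie_broken = max(food, key=lambda w: (len(w), w))

print(first_longest)
print(tie_broken)
```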

Related

Unpacking arguments: how to stop a list from turning to a nested list

I have created a function called other_func that results in a list, for example: [12,322,32]
I want to create a function that will receive the other function and it will sort this list. I want to use *args as seen below, to better understand how it works:
def biggest_gap(*args):
    result = sorted(args)
    return result
The issue is that it results in a nested list:
biggest_gap(other_func(3)) # The use of the other_func does not matter, only that it creates a list of numbers
[[322,32,12]]
If I use the sort() method:
def biggest_gap(*args):
    result = args.sort()
    return result
returns:
AttributeError: 'tuple' object has no attribute 'sort'
The question is how to stop the 'sorted' approach from creating a nested list and simply create a list or how to make the sort() method not throw an error.
def biggest_gap(*args):
means that args will be a list (well, technically a tuple) of all arguments you gave to the biggest_gap function.
biggest_gap(other_func(3))
will give a list to the biggest_gap function. That's one argument.
So what you get is "a tuple of (a list)".
What you meant to do was to pass multiple individual arguments, by "splatting" the list returned from other_func:
biggest_gap(*other_func(3))
The difference the * makes is
biggest_gap([322, 32, 12]) # without * - biggest_gap receives 1 argument
biggest_gap(*[322, 32, 12]) # with * - biggest_gap receives 3 arguments
biggest_gap(322, 32, 12) # hard-coded equivalent
See https://docs.python.org/3/tutorial/controlflow.html#unpacking-argument-lists
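A short runnable sketch of the difference, using a literal list in place of other_func(3), whose body is not shown in the question:

```python
def biggest_gap(*args):
    # args is a tuple holding every positional argument.
    return sorted(args)


numbers = [12, 322, 32]  # stand-in for the list returned by other_func

nested = biggest_gap(numbers)   # one argument: the whole list
flat = biggest_gap(*numbers)    # three arguments: 12, 322, 32

print(nested)
print(flat)
```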
OK, this comes down to how *args works: it collects the positional arguments into (in this case) a tuple assigned to the variable name args. So, for example, given a function:
def test(*args):
    return args
It will return:
>>> test("Hello", "World")
('Hello', 'World')
>>>
A tuple.
Then, sorted, this gets turned into a list.
So, now we can go back and help the original problem, as the nested list comes as a result of the function "other_function" returning a list of 3 numbers, say [1,23,44], and the function is then applied to it.
>>> sorted(test([1,23,44]))
[[1, 23, 44]]
>>>
NB: Tuples don't have a .sort method, instead an alternate method is to use the built in sorted() function.
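If you do want to use the sort() method, one way is to copy the tuple into a list first. Keep in mind that list.sort() sorts in place and returns None, so its result should not be assigned. A sketch, assuming the caller unpacks the list with *:

```python
def biggest_gap(*args):
    result = list(args)  # tuples have no sort(), lists do
    result.sort()        # sorts in place; the return value is None
    return result


print(biggest_gap(*[322, 32, 12]))
print([3, 1, 2].sort())  # shows that sort() itself returns None
```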

Comparing lists by min function

I was trying to compare different lists and get the shortest one among them with the built-in min() function (I know that this isn't what min() is made for, but I was just experimenting), and I got some results that left me unsure what the output was based on:
min(['1','0','l'], ['1', '2'])
>>> ['1', '0', 'l']
min(['1','2','3'], ['1', '2'])
>>> ['1', '2']
min(['1', '2'], ['4'])
>>> ['1', '2']
min(['1', 'a'], ['0'])
>>> ['0']
min(['1', 'a'], ['100000000'])
>>> ['1', 'a']
I don't know what is the output based on and I hope someone can help me and clarify this behavior.
The min() function takes the keyword argument key which you can use to specify what exact value to compare. It is a function which gets the list in your case as the argument.
So to get the shortest list, you can use this code:
min(['1','0','l'], ['1', '2'], key=lambda x: len(x))
Regarding your code and how the min() function determines the result:
You can look at your list like a string (which is just a list of characters in theory). If you'd compare a string to another one, you'd look at the letters from left to right and order them by their leftmost letters. For example abc < acb because b comes before c in the alphabet (and a=a so we can ignore the first letter).
With lists it's the same. It will go through the items from left to right and compare them until it finds the first one which is not equal in both lists. The smaller one of those is then used to determine the "smaller" list.
min finds the 'smallest' of the lists by the comparison operator they provide. For lists, it works by lexicographical order - of two lists, the one whose first unequal(to the elements in the other list at the same index) element is larger, is the larger list.
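The two behaviours can be put side by side in a small sketch that reuses lists from the question: without key the lists are compared element by element, with key=len they are compared by length.

```python
a, b = ['1', 'a'], ['100000000']

# Element-wise: '1' < '100000000' as strings, so a is the smaller list.
by_content = min(a, b)

# By length: b has one element, a has two.
by_length = min(a, b, key=len)

# When one list is a prefix of the other, the shorter one is smaller.
prefix_case = min(['1', '2', '3'], ['1', '2'])

print(by_content, by_length, prefix_case)
```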
You can check what a built-in function does in the documentation.
As you can see, the min function accepts its arguments in two forms:
min(iterable, *[, key, default]): which is used to get the smallest value in an iterable object such as a list.
min(arg1, arg2, *args[, key]): which is what you are currently using. It returns the smallest of the arguments. When comparing lists to see which one is smaller, it looks at the first index at which the two lists differ, i.e.
a = [3,5,1]
b = [3,3,1]
result = a > b # True
Here the first index at which the two lists differ is index 1, and so the comparison is 5 > 3 (which is True).
Using this logic of comparing lists, the min() function returns the list whose element at the first differing index is smaller. If one list is a prefix of the other, the shorter list is the smaller one.
See lexicographical order.
If the lists contain characters (strings), those are themselves compared lexicographically, and so
>>> 'a' < 'b'
True
>>> '1' < '2'
True
>>> 'a' < 'A'
False
From the documentation:
Docstring:
min(iterable, *[, default=obj, key=func]) -> value
min(arg1, arg2, *args, *[, key=func]) -> value
With a single iterable argument, return its smallest item. The
default keyword-only argument specifies an object to return if
the provided iterable is empty.
With two or more arguments, return the smallest argument.
So, for example,
IN: min([5,4,3], [6])
OUT: [5, 4, 3]
As @Tim Woocker wrote, you should use a function (the key argument) to specify what you want to compare.

Why does this min() call work for dictionaries

I'm trying to find the key of the smallest element in a dictionary.
dictionary = {'a': 5, 'b': 7, 'c': 8}
I should get 'a' as the key.
There was this piece of code that I found but I'm not really sure how it works.
def key_of_min_value(d):
    print(min(d, key=d.get))
I'm confused on what key = d.get means while I'm assuming the min(d, ...) part is saying that it's getting the minimum element in the dictionary.
The word "key" is a bit overloaded here. The key passed to min, i.e. d.get, is a callable which is used to transform the values before the comparison. There is a similar key argument for the built-in function sorted.
This "key" is unrelated to the word "key" as used when referring to the keys/values of dictionaries.
So the code works by iterating to find the k in the dictionary for which d.get(k) is minimal.
From the documentation for min:
min(iterable, *[, key, default])
min(arg1, arg2, *args[, key])
Return the smallest item in an iterable or the smallest of two or more arguments.
If one positional argument is provided, it should be an iterable. The smallest item in the iterable is returned. If two or more positional arguments are provided, the smallest of the positional arguments is returned.
There are two optional keyword-only arguments. The key argument specifies a one-argument ordering function like that used for list.sort(). The default argument specifies an object to return if the provided iterable is empty. If the iterable is empty and default is not provided, a ValueError is raised.
It's important to understand that min operates by iterating through the argument and (without specifying a key) returns the "lowest" ranked value. Comparisons are done with the <, >, etc. operators. When you pass in a callable as key, the individual elements will be used as an argument to this function.
To break this down some more, this is (more-or-less) what is happening when you call min(d, key=d.get):
lowest_val = None
lowest_result = None
for item in d:
    if lowest_val is None or d.get(item) < lowest_val:
        lowest_val = d.get(item) # These are the elements we are comparing
        lowest_result = item # But this is the element we are 'returning'
print(f"The min value (result of the callable key) is '{lowest_val}'")
print(f"The min item corresponding to the min value is '{lowest_result}'")
Notice we don't compare each item, but we compare the result of whatever the callable key is, when item is used as an argument.
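To see that d.get is just an ordinary one-argument callable, the following sketch compares it with an equivalent lambda. Returning the key instead of printing it is my own change to the question's function, made so the result can be reused.

```python
dictionary = {'a': 5, 'b': 7, 'c': 8}


def key_of_min_value(d):
    # d.get is a bound method: d.get(k) returns the value stored under k.
    return min(d, key=d.get)


# The same selection, spelled with a lambda instead of d.get.
same_with_lambda = min(dictionary, key=lambda k: dictionary[k])

print(dictionary.get('a'))
print(key_of_min_value(dictionary))
print(same_with_lambda)
```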

Why did we use Lambda as function argument here?

A few questions about the code below, which checks whether a list is sorted:
Why did we use a lambda as key here? Does it always mean the key of a list can be derived this way?
In the enumerate loop, why did we compare key(el) < key(lst[i]) and not key(el) < key(el-1) or lst[i+1] < lst[i]?
def is_sorted(lst, key=lambda x:x):
    for i, el in enumerate(lst[1:]):
        if key(el) < key(lst[i]): # i is the index of the previous element
            return False
    return True
hh=[1,2,3,4,6]
val = is_sorted(hh)
print(val)
(NB: the code above was taken from this SO answer)
This code scans a list to see if it is sorted low to high. The first problem is to decide what "low" and "high" mean for arbitrary types. It's easy for integers, but what about user-defined types? So, the author lets you pass in a function that converts a type to something whose comparison works the way you want.
For instance, let's say you want to sort tuples, but based on the 3rd item, which you know to be an integer; the key would be key=lambda x: x[2]. But the author provides a default key=lambda x:x, which just returns the object it's supplied, for items that are already their own sort key.
The second part is easy. If any item is less than the item just before it, then we found an example where it's not low to high. The reason it works is literally in the comment: i is the index of the element directly preceding el. We know this because we enumerated over the second and following elements of the list (enumerate(lst[1:])).
enumerate yields both index and current element:
for i, el in enumerate(lst):
    print(i,el)
would print:
0 1
1 2
2 3
3 4
4 6
By slicing off the first element of the list, the code introduces a shift of one between the index and the current element, so lst[i] is the element just before el. This way the list is accessed by index only once per step (indexing into a list while iterating over all of it is not considered Pythonic).
It's still better (more Pythonic) to zip the list with a version of itself shifted by one and pass the comparisons to all; no indices are involved and the code is clearer:
import itertools
def is_sorted(lst, key=lambda x:x):
    # sorted means each element's key is >= the key of the element before it
    return all(key(prev) <= key(current) for prev, current in zip(lst, itertools.islice(lst, 1, None)))
Because the slicing is done by islice, no extra list is created (otherwise it's the same as lst[1:]).
The key function (here: identity function by default) is the function which converts from the value to the comparable value. For integers, identity is okay, unless we want to reverse comparison, in which case we would pass lambda x:-x
The point is not that the lambda "derives" the key of a list. Rather, it's a function that allows you to choose the key. That is, given a list of objects of type X, what attribute would you use to compare them with? The default is the identity function - ie use the plain value of each element. But you could choose anything here.
You could indeed write this function by comparing lst[i+1] < lst[i]. You couldn't however write it by comparing key(el) < key(el-1), because el is the value of the element itself, not the index.
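For comparison, here is a sketch of the index-based variant mentioned above. The name is_sorted_by_index is made up for illustration; the logic is meant to match the original function.

```python
def is_sorted_by_index(lst, key=lambda x: x):
    # Compare each element with its successor using indices only.
    for i in range(len(lst) - 1):
        if key(lst[i + 1]) < key(lst[i]):
            return False
    return True


print(is_sorted_by_index([1, 2, 3, 4, 6]))
print(is_sorted_by_index([1, 3, 2]))
```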
This is a function that tests whether a list is sorted, for example after using the built-in sorted function. sorted takes a keyword argument key, which is applied to every element of the list to compute the value it is compared by:
>>> sorted([(0,3),(1,2),(2,1),(3,0)])
[(0, 3), (1, 2), (2, 1), (3, 0)]
>>> sorted([(0,3),(1,2),(2,1),(3,0)],key=lambda x:x[1])
[(3, 0), (2, 1), (1, 2), (0, 3)]
The key keyword in your function is to be able to mimic the behavior of sorted:
>>> is_sorted([(0,3),(1,2),(2,1),(3,0)])
True
>>> is_sorted([(0,3),(1,2),(2,1),(3,0)],key=lambda x:x[1])
False
The default lambda is just there to mimic a default behavior where nothing is changed.

Iterate over a pair of iterables, sorted by an attribute

One way (the fastest way?) to iterate over a pair of iterables a and b in sorted order is to chain them and sort the chained iterable:
for i in sorted(chain(a, b)):
    print(i)
For instance, if the elements of each iterable are:
a: 4, 6, 1
b: 8, 3
then this construct would produce elements in the order
1, 3, 4, 6, 8
However, if the iterables iterate over objects, this sorts the objects by their memory address. Assuming each iterable iterates over the same type of object,
1. What is the fastest way to iterate over a particular attribute of the objects, sorted by this attribute?
2. What if the attribute to be chosen differs between iterables? If iterables a and b both iterate over objects of type foo, which has attributes foo.x and foo.y of the same type, how could one iterate over elements of a sorted by x and b sorted by y?
For an example of #2, if
a: (x=4,y=3), (x=6,y=2), (x=1,y=7)
b: (x=2,y=8), (x=2,y=3)
then the elements should be produced in the order
1, 3, 4, 6, 8
as before. Note that only the x attributes from a and the y attributes from b enter into the sort and the result.
Tim Pietzcker has already answered for the case where you're using the same attribute for each iterable. If you're using different attributes of the same type, you can do it like this (using complex numbers as a ready-made class that has two attributes of the same type):
In Python 2:
>>> a = [1+4j, 7+0j, 3+6j, 9+2j, 5+8j]
>>> b = [2+5j, 8+1j, 4+7j, 0+3j, 6+9j]
>>> keyed_a = ((n.real, n) for n in a)
>>> keyed_b = ((n.imag, n) for n in b)
>>> from itertools import chain
>>> sorted_ab = zip(*sorted(chain(keyed_a, keyed_b), key=lambda t: t[0]))[1]
>>> sorted_ab
((1+4j), (8+1j), (3+6j), 3j, (5+8j), (2+5j), (7+0j), (4+7j), (9+2j), (6+9j))
Since in Python 3 zip() returns an iterator, we need to coerce it to a list before attempting to subscript it:
>>> # ... as before up to 'from itertools import chain'
>>> sorted_ab = list(zip(*sorted(chain(keyed_a, keyed_b), key=lambda t: t[0])))[1]
>>> sorted_ab
((1+4j), (8+1j), (3+6j), 3j, (5+8j), (2+5j), (7+0j), (4+7j), (9+2j), (6+9j))
Answer to question 1: You can provide a key argument to sorted(). For example, if you want to sort by the object's .name, then use
sorted(chain(a, b), key=lambda x: x.name)
As for question 2: I guess you'd need another attribute for each object (like foo.z, as suggested by Zero Piraeus) that can be accessed by sorted(), since that function has no way of telling where the object it's currently sorting used to come from. After all, it is receiving a new iterator from chain() which doesn't contain any information about whether the current element is from a or b.
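One way around that limitation, sketched here under the assumption of a hypothetical Foo class with x and y attributes, is to pair each object with its sort value before chaining, so that sorted() only has to look at the first element of each pair. The data is the example from the question.

```python
from itertools import chain


class Foo:
    # Hypothetical class with two attributes of the same type.
    def __init__(self, x, y):
        self.x = x
        self.y = y


a = [Foo(4, 3), Foo(6, 2), Foo(1, 7)]
b = [Foo(2, 8), Foo(2, 3)]

# Tag items from a with their x and items from b with their y.
keyed = chain(((o.x, o) for o in a), ((o.y, o) for o in b))

# Sort on the tag only, so Foo objects themselves are never compared.
ordered = sorted(keyed, key=lambda t: t[0])

values = [k for k, _ in ordered]
print(values)
```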
