A few questions about the code below, which checks whether a list is sorted:
Why did we use a lambda as key here? Does it always mean the key of a list can be derived this way?
In the enumerate loop, why did we compare key(el) < key(lst[i]) and not key(el) < key(el-1) or lst[i+1] < lst[i]?
def is_sorted(lst, key=lambda x: x):
    for i, el in enumerate(lst[1:]):
        if key(el) < key(lst[i]):  # i is the index of the previous element
            return False
    return True
hh=[1,2,3,4,6]
val = is_sorted(hh)
print(val)
(NB: the code above was taken from this SO answer)
This code scans a list to see if it is sorted low to high. The first problem is to decide what "low" and "high" mean for arbitrary types. It's easy for integers, but what about user-defined types? So the author lets you pass in a function that converts each item to something whose comparison works the way you want.
For instance, let's say you want to order tuples by the 3rd item, which you know to be an integer: the key would be key=lambda x: x[2]. The author also provides a default, key=lambda x: x, which just returns the object it's given, for items that are already their own sort key.
The second part is easy. If any item is less than the item just before it, then we have found a place where the list is not low to high. The reason it works is literally in the comment: i is the index of the element directly preceding el. We know this because we enumerated over the second and following elements of the list (enumerate(lst[1:])), so the counter lags one position behind el.
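As a small illustration of the key idea, the sketch below repeats the function and applies it to tuples compared on their third item. The sample data (records) is made up for the example.

```python
def is_sorted(lst, key=lambda x: x):
    # enumerate starts counting at 0 while the slice starts at element 1,
    # so lst[i] is always the element just before el
    for i, el in enumerate(lst[1:]):
        if key(el) < key(lst[i]):
            return False
    return True

# hypothetical sample data: the third item is the integer to compare on
records = [("b", "x", 1), ("a", "y", 2), ("c", "z", 3)]

plain = is_sorted(records)                          # compares whole tuples
by_third = is_sorted(records, key=lambda x: x[2])   # compares 1, 2, 3

print(plain)     # False, because ("a", ...) < ("b", ...)
print(by_third)  # True
```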
enumerate yields both index and current element:
for i, el in enumerate(lst):
    print(i, el)
would print:
0 1
1 2
2 3
3 4
4 6
By slicing off the first element of the list, the code introduces a shift of one between the index and the current element, so the list only needs to be indexed once per iteration (indexing a list while iterating over all of it is generally not seen as Pythonic).
It's still cleaner and more Pythonic to zip the list with a sliced version of itself, which pairs each element with its successor, and pass the comparisons to all. No indices are involved and the code is clearer:
import itertools
def is_sorted(lst, key=lambda x: x):
    return all(key(prev) <= key(current) for prev, current in zip(lst, itertools.islice(lst, 1, None)))
Because islice does the slicing lazily, no extra list is created; otherwise it behaves the same as lst[1:].
The key function (here: identity function by default) is the function which converts from the value to the comparable value. For integers, identity is okay, unless we want to reverse comparison, in which case we would pass lambda x:-x
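To make the reversed comparison concrete, here is a minimal sketch that checks a descending list by negating each value. It uses a compact zip-based is_sorted written for this example; the list name descending is made up.

```python
def is_sorted(lst, key=lambda x: x):
    # True if no element's key is smaller than the key of the element before it
    return all(key(prev) <= key(cur) for prev, cur in zip(lst, lst[1:]))

descending = [6, 4, 3, 2, 1]

ascending_check = is_sorted(descending)                     # 6 <= 4 fails
descending_check = is_sorted(descending, key=lambda x: -x)  # -6 <= -4 <= ... <= -1

print(ascending_check)   # False
print(descending_check)  # True
```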
The point is not that the lambda "derives" the key of a list. Rather, it's a function that allows you to choose the key. That is, given a list of objects of type X, what attribute would you use to compare them with? The default is the identity function - ie use the plain value of each element. But you could choose anything here.
You could indeed write this function by comparing lst[i+1] < lst[i]. You couldn't however write it by comparing key(el) < key(el-1), because el is the value of the element itself, not the index.
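For comparison, an index-based variant could look like the sketch below. The name is_sorted_by_index is made up for this example; the only subtlety is stopping one short of the end so that lst[i + 1] always exists.

```python
def is_sorted_by_index(lst, key=lambda x: x):
    # range stops at len(lst) - 1, so lst[i + 1] is always a valid index
    for i in range(len(lst) - 1):
        if key(lst[i + 1]) < key(lst[i]):
            return False
    return True

print(is_sorted_by_index([1, 2, 3, 4, 6]))  # True
print(is_sorted_by_index([1, 3, 2]))        # False
print(is_sorted_by_index([]))               # True (nothing is out of order)
```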
This function tests whether a list is sorted, using the same convention as the builtin sorted function. sorted takes a keyword argument key, which is called on every element of the list to compute the value used for comparison:
>>> sorted([(0,3),(1,2),(2,1),(3,0)])
[(0, 3), (1, 2), (2, 1), (3, 0)]
>>> sorted([(0,3),(1,2),(2,1),(3,0)],key=lambda x:x[1])
[(3, 0), (2, 1), (1, 2), (0, 3)]
The key keyword in your function is to be able to mimic the behavior of sorted:
>>> is_sorted([(0,3),(1,2),(2,1),(3,0)])
True
>>> is_sorted([(0,3),(1,2),(2,1),(3,0)],key=lambda x:x[1])
False
The default lambda is just there to mimic a default behavior where nothing is changed.
Related
What does for row_number, row in enumerate(cursor): do in Python?
What does enumerate mean in this context?
The enumerate() function adds a counter to an iterable.
So for each element in cursor, a tuple is produced with (counter, element); the for loop binds that to row_number and row, respectively.
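The question does not say what kind of cursor is involved. As a sketch, assuming a database cursor from the standard sqlite3 module and a made-up users table, the loop behaves like this:

```python
import sqlite3

# in-memory database with a made-up table, just to have a cursor to iterate over
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [("ann",), ("bob",), ("cy",)])

cursor = conn.execute("SELECT name FROM users ORDER BY name")

numbered = []
for row_number, row in enumerate(cursor):
    # row is a one-column tuple; row_number counts from 0
    numbered.append((row_number, row[0]))
    print(row_number, row[0])

conn.close()
```

This prints 0 ann, 1 bob and 2 cy on separate lines.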
Demo:
>>> elements = ('foo', 'bar', 'baz')
>>> for elem in elements:
...     print elem
...
foo
bar
baz
>>> for count, elem in enumerate(elements):
...     print count, elem
...
0 foo
1 bar
2 baz
By default, enumerate() starts counting at 0 but if you give it a second integer argument, it'll start from that number instead:
>>> for count, elem in enumerate(elements, 42):
...     print count, elem
...
42 foo
43 bar
44 baz
If you were to re-implement enumerate() in Python, here are two ways of achieving that; one using itertools.count() to do the counting, the other manually counting in a generator function:
from itertools import count
def enumerate(it, start=0):
    # return an iterator that adds a counter to each element of it
    return zip(count(start), it)
and
def enumerate(it, start=0):
    count = start
    for elem in it:
        yield (count, elem)
        count += 1
The actual implementation in C is closer to the latter, with optimisations to reuse a single tuple object for the common for i, ... unpacking case and using a standard C integer value for the counter until the counter becomes too large to avoid using a Python integer object (which is unbounded).
It's a builtin function that returns an object that can be iterated over. See the documentation.
In short, it loops over the elements of an iterable (like a list), as well as an index number, combined in a tuple:
for item in enumerate(["a", "b", "c"]):
    print item
prints
(0, 'a')
(1, 'b')
(2, 'c')
It's helpful if you want to loop over a sequence (or other iterable thing), and also want to have an index counter available. If you want the counter to start from some other value (usually 1), you can give that as second argument to enumerate.
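For instance, a 1-based counter is handy for human-readable numbering. A minimal sketch with made-up data (steps):

```python
steps = ["mix", "bake", "serve"]  # made-up example data

labels = []
for number, step in enumerate(steps, start=1):
    # number starts at 1 instead of the default 0
    labels.append(f"{number}. {step}")

print(labels)  # ['1. mix', '2. bake', '3. serve']
```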
I am reading a book (Effective Python) by Brett Slatkin and he shows another way to iterate over a list and also know the index of the current item in the list but he suggests that it is better not to use it and to use enumerate instead.
I know you asked what enumerate means, but when I understood the following, I also understood how enumerate makes iterating over a list while knowing the index of the current item easier (and more readable).
list_of_letters = ['a', 'b', 'c']
for i in range(len(list_of_letters)):
    letter = list_of_letters[i]
    print(i, letter)
The output is:
0 a
1 b
2 c
I also used to do something even sillier before I read about the enumerate function.
i = 0
for n in list_of_letters:
    print(i, n)
    i += 1
It produces the same output.
But with enumerate I just have to write:
list_of_letters = ['a', 'b', 'c']
for i, letter in enumerate(list_of_letters):
    print(i, letter)
As other users have mentioned, enumerate returns an iterator that pairs an incrementing index with each item of an iterable.
So if you have a list say l = ["test_1", "test_2", "test_3"], the list(enumerate(l)) will give you something like this: [(0, 'test_1'), (1, 'test_2'), (2, 'test_3')].
Now, when is this useful? A possible use case is when you want to iterate over items and skip a specific item whose index in the list you know but whose value you don't (because its value is not known at the time).
for index, value in enumerate(joint_values):
    if index == 3:
        continue
    # Do something with the other `value`
So your code reads better because you could also do a regular for loop with range but then to access the items you need to index them (i.e., joint_values[i]).
Although another user mentioned an implementation of enumerate using zip, I think a purer (but slightly more complex) way, without using itertools, is the following:
def enumerate(l, start=0):
    return zip(range(start, len(l) + start), l)
Example:
l = ["test_1", "test_2", "test_3"]
print(list(enumerate(l)))
print(list(enumerate(l, 10)))
Output:
[(0, 'test_1'), (1, 'test_2'), (2, 'test_3')]
[(10, 'test_1'), (11, 'test_2'), (12, 'test_3')]
As mentioned in the comments, this approach with range will not work with arbitrary iterables as the original enumerate function does.
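A short sketch of that limitation: the range-based version needs len(), which a generator does not support, while the builtin enumerate only needs to iterate. The names enumerate_with_range and letters are made up so that the builtin is not shadowed.

```python
def enumerate_with_range(l, start=0):
    # needs len(l), so l must be a sized container such as a list
    return zip(range(start, len(l) + start), l)

def letters():
    # a generator: it can be iterated over but has no length
    yield "a"
    yield "b"

from_list = list(enumerate_with_range(["a", "b"], 10))
from_generator = list(enumerate(letters(), 10))  # the builtin handles generators

try:
    enumerate_with_range(letters())
    failed = False
except TypeError:
    # len() is not defined for generator objects
    failed = True

print(from_list)       # [(10, 'a'), (11, 'b')]
print(from_generator)  # [(10, 'a'), (11, 'b')]
print(failed)          # True
```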
The enumerate function works as follows:
doc = """I like movie. But I don't like the cast. The story is very nice"""
doc1 = doc.split('.')
for i in enumerate(doc1):
    print(i)
The output is
(0, 'I like movie')
(1, " But I don't like the cast")
(2, ' The story is very nice')
I am assuming that you know how to iterate over elements in some list:
for el in my_list:
    pass  # do something
Now sometimes you need not only to iterate over the elements but also the index for each iteration. One way to do it is:
i = 0
for el in my_list:
    # do something, and use the value of "i" somehow
    i += 1
However, a nicer way is to use the function "enumerate". It receives a list (or any iterable) and returns a list-like object (an iterator that you can loop over), where each element itself contains 2 items: the index and the value from the original input:
So if you have
arr = ['a', 'b', 'c']
Then the command
enumerate(arr)
returns something like:
[(0,'a'), (1,'b'), (2,'c')]
Now if you iterate over a list (or an iterable) where each element itself has 2 sub-elements, you can capture both of those sub-elements in the for loop like below:
for index, value in enumerate(arr):
    print(index, value)
which would print out the sub-elements of the output of enumerate.
And in general you can "unpack" multiple items from a list or tuple into multiple variables like below:
idx,value = (2,'c')
print(idx)
print(value)
which would print
2
c
This is the kind of assignment happening in each iteration of that loop with enumerate(arr) as iterable.
The enumerate function gives you an element's index and the element's value at the same time. I believe the following code will help explain what is going on.
for i, item in enumerate(initial_config):
    print(f'index{i} value{item}')
I have a list of strings like this:
['id:9', 'vector:1', 'table:1', 'product:10', 'number:3', 'Number:4']
I want to sort it from higher integer value to lower, and then the rest:
['product:10', 'id:9', 'Number:4', 'number:3', 'vector:1', 'table:1']
The values are all integers and never 0. The strings attached to them can be all lower case, all upper case, or mixed case, and may also be similar to another item: Number, NUMBER, number, NUMber
I tried using natsort but that didn't arrange them correctly. I also tried some other solutions discussed here, which still didn't work in my case. So how can this be done in Python?
If you create a function converting each string to a tuple you can then sort on those tuples:
l = ['id:9', 'vector:1', 'table:1', 'product:10', 'number:3', 'Number:4']
def negative_num_then_name(s):
    name, num_str = s.split(':')
    return -int(num_str), name
l.sort(key=negative_num_then_name)
l
['product:10', 'id:9', 'Number:4', 'number:3', 'table:1', 'vector:1']
If you want to understand what's going on under the covers, this shows you what is actually being sorted (in ascending order):
l = ['id:9', 'vector:1', 'table:1', 'product:10', 'number:3', 'Number:4']
[negative_num_then_name(elem) for elem in l]
[(-9, 'id'),
(-1, 'vector'),
(-1, 'table'),
(-10, 'product'),
(-3, 'number'),
(-4, 'Number')]
sort looks first at the first element of each tuple (the integer), and only in the case of a tie does it look at the second element (the string).
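Note that the tuple key orders ties by name, whereas the expected output in the question keeps tied items in their original order ('vector:1' before 'table:1'). If that is what is wanted, a possible sketch is to sort on the number alone and rely on Python's sort being stable. The helper name num_value is made up for this example.

```python
l = ['id:9', 'vector:1', 'table:1', 'product:10', 'number:3', 'Number:4']

def num_value(s):
    # the integer after the last ':' in the string
    return int(s.rsplit(':', 1)[1])

# reverse=True sorts high to low and still keeps equal keys in original order
result = sorted(l, key=num_value, reverse=True)

print(result)
# ['product:10', 'id:9', 'Number:4', 'number:3', 'vector:1', 'table:1']
```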
I was checking out this solution to a question on leetcode.com
import collections
import heapq

def topKFrequent(self, words, k):
    count = collections.Counter(words)
    heap = [(-freq, word) for word, freq in count.items()]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(k)]
and when I provide it an array of strings like ["aa", "aaa", "a"] and 1 it correctly returns ["a"]. My question is: did the heap also lexicographically sort the tuples internally? Because as I see it, if there was no sorting, it would have simply returned ["aa"] (the order in which the heap was constructed, since the counts of all three are the same). Or have I misunderstood heapq?
You have a heap of integer/string pairs, so it is ordered based on the definition of < for tuples, which takes into account both elements of each tuple.
Given ["aa", "aaa", "a"], count.items() is a sequence of tuples [('aa', 1), ('aaa', 1), ('a', 1)]. You then build a heap using the list of tuples
[(-1, 'aa'), (-1, 'aaa'), (-1, 'a')]
Since the first element of each tuple is the same, the comparisons are determined solely by the second, string, element.
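The same steps can be reproduced outside the class as a quick check. This is only a sketch of the example input from the question:

```python
import collections
import heapq

words = ["aa", "aaa", "a"]
count = collections.Counter(words)
heap = [(-freq, word) for word, freq in count.items()]
heapq.heapify(heap)

# all first elements are -1, so the tuples are ordered by the string
root = heap[0]
top_word = heapq.heappop(heap)[1]

print(root)      # (-1, 'a')
print(top_word)  # a
```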
heapq just compares values from the queue using the "less than" operator [1] regardless of the type of the value. It is the type of the value that defines what the comparison will return. So, what makes the difference here is the tuple itself. As from the documentation:
The comparison [of sequence objects] uses lexicographical ordering: first the first two items are compared, and if they differ this determines the outcome of the comparison; if they are equal, the next two items are compared, and so on, until either sequence is exhausted.
Checking some examples:
>>> (0, 'a') < (1, 'aa')
True
>>> (1, 'a') < (1, 'aa')
True
>>> (1, 'aa') < (1, 'a')
False
>>> (2, 'a') < (1, 'aa')
False
So you are right, the values are ordered lexicographically and the second value of the tuple is relevant. However, heapq does not have to do anything here to get this result, the mere tuple comparison does that.
[1] One can check it in the code. Here is one of the lines where the comparison is made by heapq (in C):
cmp = PyObject_RichCompareBool(newitem, parent, Py_LT);
This PyObject_RichCompareBool() is, according to the documentation:
the equivalent of the Python expression o1 op o2, where op is the operator corresponding to opid.
Heaps are partial orderings. They are not sorted. You can, however, build sorts out of them by storing values in a heap and pulling them out one at a time. These sorts are not stable, because heaps do not try to preserve the ordering of "equal" values.
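A small sketch of that distinction with heapq and made-up numbers: after heapify only the smallest item is guaranteed to be at index 0, but popping repeatedly produces a fully sorted sequence.

```python
import heapq

values = [5, 1, 4, 2, 3]  # made-up example data
heap = list(values)
heapq.heapify(heap)

# only the heap invariant holds here: heap[0] is the smallest item
smallest = heap[0]

# popping one item at a time yields ascending order (a heapsort)
ordered = [heapq.heappop(heap) for _ in range(len(heap))]

print(smallest)  # 1
print(ordered)   # [1, 2, 3, 4, 5]
```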
Here's another kind of Python heap you might be interested in:
https://pypi.org/project/fibonacci-heap-mod/
The expectation of the leetcode question is to solve the problem in O(n log k). So we have to keep only k elements in the heap at any time, which means we have to use a min-heap of (freq, word) and not (-freq, word).
We want the min-heap to keep the entry with the minimum frequency and the maximum lexicographical word at the top of the heap. That is tricky, because by default tuple comparison would keep the minimum frequency and the minimum lexicographical word there.
One solution is to create an object that holds the frequency and the word (c and w below) and overrides the __lt__ method to do this:
    def __lt__(self, other):
        if self.c == other.c:
            return self.w > other.w
        return self.c < other.c
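Put together, the approach might look like the sketch below. The class name Entry and the function name top_k_frequent are made up for this example; only the __lt__ logic comes from the snippet above.

```python
import collections
import heapq

class Entry:
    # hypothetical wrapper: c is the count, w is the word
    def __init__(self, c, w):
        self.c = c
        self.w = w

    def __lt__(self, other):
        if self.c == other.c:
            return self.w > other.w  # on ties, the later word counts as "smaller"
        return self.c < other.c

def top_k_frequent(words, k):
    heap = []
    for w, c in collections.Counter(words).items():
        heapq.heappush(heap, Entry(c, w))
        if len(heap) > k:
            heapq.heappop(heap)  # drop the current worst candidate
    # popping yields worst-first, so reverse to get best-first
    result = [heapq.heappop(heap).w for _ in range(len(heap))]
    return result[::-1]

print(top_k_frequent(["aa", "aaa", "a"], 1))
# ['a']
print(top_k_frequent(["i", "love", "leetcode", "i", "love", "coding"], 2))
# ['i', 'love']
```

The heap never holds more than k + 1 entries, so each push or pop costs O(log k).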
I've been searching around for a succinct explanation of what's going on "under the hood" for the following, but no luck so far.
Why, when you try the following:
mylist = ["a","b","c","d"]
for index, item in mylist:
    print item
I get this error:
ValueError: need more than 1 value to unpack
But when I try:
for item in mylist:
    print item
This is returned:
a
b
c
d
If indexes are a part of the structure of a list, why can't I print them out along with the items?
I understand the solution to this is to use enumerate(), but I'm curious about why iterating through lists (without using enumerate()) works this way and returns that ValueError.
I think what I'm not understanding is: if you can find items in a list by using their index (such as the case with item = L[index]), doesn't that mean that on some level, indexes are an inherent part of a list as a data structure? Or is item = L[index] really just a way to get Python to count the items in a list using indexes (starting at 0 obviously)? In other words, item = L[index] is "applying" indexes to the items in the list, starting at 0.
If indexes are a part of the structure of a list...
Except they aren't. Not when you iterate over the list. The indexing becomes a matter of time/occurrence, and they are no longer associated with the elements themselves.
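One way to see this is to do by hand what a for loop does: ask the list for an iterator and take items from it one at a time. The sketch below (Python 3 syntax) shows that each step hands back only the element, so there is nothing to unpack into two names.

```python
mylist = ["a", "b", "c", "d"]

# a for loop obtains an iterator and repeatedly calls next() on it
it = iter(mylist)
first = next(it)
second = next(it)
print(first, second)  # a b

# each item is a one-character string, so unpacking it into two names fails
try:
    index, item = first
    unpacked = True
except ValueError:
    unpacked = False
print(unpacked)  # False
```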
If you were to actually print out the result of the enumerate() function as a list:
print(list(enumerate(["a","b","c","d"])))
You would see this:
[(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd')]
Therefore, if you wanted to print the index and item at that index using enumerate(), you could technically write this:
for pair in enumerate(mylist):
    print pair[0], pair[1]
However, that's not the best (i.e. Pythonic) way of doing things. Python lets you write the above much more nicely like so:
for index, item in enumerate(mylist):
    print index, item
This works because when you use the index, item syntax, you are telling Python to "unpack" each pair in that list by treating the components of each pair separately.
For more on how this tuple unpacking magic works, see:
Tuple unpacking in for loops
I come from an OOP background and am trying to learn Python.
I am using the max function which uses a lambda expression to return the instance of type Player having maximum totalScore among the list players.
def winner():
    w = max(players, key=lambda p: p.totalScore)
The function correctly returns instance of type Player having maximum totalScore.
I am confused about the following three things:
How does the max function work? What are the arguments it is taking? I looked at the documentation but failed to understand.
What is use of the keyword key in max function? I know it is also used in context of sort function
Meaning of the lambda expression? How to read them? How do they work?
These are all very noobish conceptual questions but will help me understand the language. It would help if you could give examples to explain.
Thanks
lambda is an anonymous function, it is equivalent to:
def func(p):
    return p.totalScore
Now max becomes:
max(players, key=func)
But as def statements are compound statements, they can't be used where an expression is required. That's why lambdas are sometimes used.
Note that lambda is equivalent to what you'd put in a return statement of a def. Thus, you can't use statements inside a lambda, only expressions are allowed.
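To see the equivalence in action, here is a minimal sketch with a stand-in Player class. The class definition and the sample players are assumptions, since the question does not show them.

```python
class Player:
    # hypothetical stand-in for the Player type from the question
    def __init__(self, name, totalScore):
        self.name = name
        self.totalScore = totalScore

players = [Player("ann", 12), Player("bob", 30), Player("cy", 7)]

def func(p):
    return p.totalScore

by_def = max(players, key=func)
by_lambda = max(players, key=lambda p: p.totalScore)

# both calls pick the same Player object
print(by_def.name, by_lambda.name)  # bob bob
```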
What does max do?
max(a, b, c, ...[, key=func]) -> value
With a single iterable argument, return its largest item. With two or
more arguments, return the largest argument.
So, it simply returns the object that is the largest.
How does key work?
By default, when no key is given, Python 2 compares items using a set of rules based on the type of the objects (for example, a string is always greater than an integer).
To modify the object before comparison, or to compare based on a particular attribute/index, you have to use the key argument.
Example 1:
A simple example, suppose you have a list of numbers in string form, but you want to compare those items by their integer value.
>>> lis = ['1', '100', '111', '2']
Here max compares the items using their original values (strings are compared lexicographically so you'd get '2' as output) :
>>> max(lis)
'2'
To compare the items by their integer value use key with a simple lambda:
>>> max(lis, key=lambda x:int(x)) # compare `int` version of each item
'111'
Example 2: Applying max to a list of tuples.
>>> lis = [(1,'a'), (3,'c'), (4,'e'), (-1,'z')]
By default max will compare the items by the first index. If the first index is the same then it'll compare the second index. As in my example, all items have a unique first index, so you'd get this as the answer:
>>> max(lis)
(4, 'e')
But, what if you wanted to compare each item by the value at index 1? Simple: use lambda:
>>> max(lis, key = lambda x: x[1])
(-1, 'z')
Comparing items in an iterable that contains objects of different type:
List with mixed items:
lis = ['1','100','111','2', 2, 2.57]
In Python 2 it is possible to compare items of two different types:
>>> max(lis) # works in Python 2
'2'
>>> max(lis, key=lambda x: int(x)) # compare integer version of each item
'111'
But in Python 3 you can't do that any more:
>>> lis = ['1', '100', '111', '2', 2, 2.57]
>>> max(lis)
Traceback (most recent call last):
File "<ipython-input-2-0ce0a02693e4>", line 1, in <module>
max(lis)
TypeError: unorderable types: int() > str()
But this works, as we are comparing integer version of each object:
>>> max(lis, key=lambda x: int(x)) # or simply `max(lis, key=int)`
'111'
Strongly simplified version of max:
def max(items, key=lambda x: x):
    current = items[0]
    for item in items:
        if key(item) > key(current):
            current = item
    return current
Regarding lambda:
>>> ident = lambda x: x
>>> ident(3)
3
>>> ident(5)
5
>>> times_two = lambda x: 2*x
>>> times_two(2)
4
The max function is used to get the maximum out of an iterable.
The iterable may be a list, tuple, dict, etc., or even a collection of custom objects as in the example you provided.
max(iterable[, key=func]) -> value
max(a, b, c, ...[, key=func]) -> value
With a single iterable argument, return its largest item.
With two or more arguments, return the largest argument.
So key=func allows us to pass an optional function, key, on whose basis the items of the given iterable (or the arguments) are compared and the maximum is returned.
lambda is a Python keyword that creates a small anonymous function. Here, when a player object is passed to it, it returns player.totalScore. Thus max compares the player objects by their totalScore and returns the player who has the maximum totalScore.
If no key argument is provided, the maximum is returned according to default Python orderings.
Examples -
>>> max(1, 3, 5, 7)
7
>>> max([1, 3, 5, 7])
7
>>> people = [('Barack', 'Obama'), ('Oprah', 'Winfrey'), ('Mahatma', 'Gandhi')]
>>> max(people, key=lambda x: x[1])
('Oprah', 'Winfrey')
How does the max function work?
It looks for the "largest" item in an iterable. I'll assume that you
can look up what that is, but if not, it's something you can loop over,
i.e. a list or string.
What is use of the keyword key in max function? I know it is also used in context of sort function
key is a function (often written as a lambda) that tells max what value to compare for each object in the iterable. It is useful when you are comparing objects that you created yourself, and not something obvious like integers.
Meaning of the lambda expression? How to read them? How do they work?
That's sort of a larger question. In simple terms, a lambda is a function you can pass around, and have other pieces of code use it. Take this for example:
def sum(a, b, f):
    return (f(a) + f(b))
This takes two objects, a and b, and a function f.
It calls f() on each object, then adds them together. So look at this call:
>>> sum(2, 2, lambda a: a * 2)
8
sum() takes 2, and calls the lambda expression on it. So f(a) becomes 2 * 2, which becomes 4. It then does this for b, and adds the two together.
In not so simple terms, lambdas come from lambda calculus, a formal system in which computation is expressed through anonymous functions and function application (including functions that take and return other functions); a very cool math concept. You can read about that here, and then actually understand it here.
It's probably better to read about this a little more, as lambdas can be confusing, and it's not immediately obvious how useful they are. Check here.
According to the documentation:
max(iterable[, key]) max(arg1, arg2, *args[, key]) Return the
largest item in an iterable or the largest of two or more arguments.
If one positional argument is provided, iterable must be a non-empty
iterable (such as a non-empty string, tuple or list). The largest item
in the iterable is returned. If two or more positional arguments are
provided, the largest of the positional arguments is returned.
The optional key argument specifies a one-argument ordering function
like that used for list.sort(). The key argument, if supplied, must be
in keyword form (for example, max(a,b,c,key=func)).
What this is saying is that in your case, you are providing a list, in this case players. Then the max function will iterate over all the items in the list and compare them to each other to get a "maximum".
As you can imagine, with a complex object like a player, determining its value for comparison is tricky, so you are given the key argument to determine how the max function will decide the value of each player. In this case, you are using a lambda function to say "for each p in players get p.totalScore and use that as his value for comparison".
max is a built-in function which takes an iterable (like a list or tuple) as its first argument.
The keyword argument key defaults to None, but it accepts a function; think of it as a wrapper that is applied to each item of the iterable before the items are compared.
Consider this example dictionary:
d = {'aim':99, 'aid': 45, 'axe': 59, 'big': 9, 'short': 995, 'sin':12, 'sword':1, 'friend':1000, 'artwork':23}
Ex:
>>> max(d.keys())
'sword'
As you can see, if you only pass the iterable without the keyword argument (a function for key), it returns the maximum key alphabetically.
Ex.
Instead of finding the max key alphabetically, you might need to find the max key by length:
>>> max(d.keys(), key=lambda x: len(x))
'artwork'
In this example the lambda function returns the length of each key, so instead of comparing the keys alphabetically, max compares their lengths and returns the key with the maximum length.
Ex.
>>> max(d.keys(), key=lambda x: d[x])
'friend'
In this example the lambda function returns the dictionary value for each key, so max returns the key whose value is the largest.
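As a side note, the same lookup can be written without a lambda. The sketch below uses a shortened copy of the dictionary and two standard idioms: passing the bound method d.get as the key, and comparing (key, value) pairs from d.items().

```python
d = {'aim': 99, 'aid': 45, 'axe': 59, 'friend': 1000, 'sword': 1}

# iterating a dict yields its keys, and d.get maps a key to its value
key_of_max = max(d, key=d.get)

# comparing (key, value) pairs on the value returns both at once
pair_of_max = max(d.items(), key=lambda kv: kv[1])

print(key_of_max)   # friend
print(pair_of_max)  # ('friend', 1000)
```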
Assuming that people who come to this page actually want to know what key=len means in max() or min(), here is the simple answer:
len() returns the length of an object. If we specify len as the key function in min() or max(), it will return the smallest/largest item based on length.
food = ['bread', 'tea', 'banana', 'kiwi', 'tomato']
print(max(food, key=len)) # banana
print(min(food, key=len)) # tea