Python: about sort - python

I noticed that these two snippets give different results. One returns a sorted list, while the other looks like a sorted dictionary. I can't figure out why adding .items() makes this difference:
aa={'a':1,'d':2,'c':3,'b':4}
bb=sorted(aa,key=lambda x:x[0])
print(bb)
#['a', 'b', 'c', 'd']
aa={'a':1,'d':2,'c':3,'b':4}
bb=sorted(aa.items(),key=lambda x:x[0])
print(bb)
# [('a', 1), ('b', 4), ('c', 3), ('d', 2)]

The first version implicitly sorts the keys in the dictionary, and is equivalent to sorting aa.keys(). The second version sorts the items, that is: a list of tuples of the form (key, value).
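A quick way to see the difference is to check what each call returns. This is a minimal sketch, assuming Python 3.7 or later (where dict preserves insertion order); the names by_keys, by_items, and rebuilt are only illustrative.

```python
aa = {'a': 1, 'd': 2, 'c': 3, 'b': 4}

# Iterating a dict yields its keys, so these two calls are equivalent.
by_keys = sorted(aa)
assert by_keys == sorted(aa.keys())
print(by_keys)            # ['a', 'b', 'c', 'd']

# sorted() always returns a list, here a list of (key, value) tuples.
by_items = sorted(aa.items())
print(type(by_items))     # <class 'list'>
print(by_items)           # [('a', 1), ('b', 4), ('c', 3), ('d', 2)]

# If a dictionary in key order is wanted, rebuild one from the sorted pairs.
rebuilt = dict(by_items)
print(rebuilt)            # {'a': 1, 'b': 4, 'c': 3, 'd': 2}
```

So neither call returns a dictionary; the second result only looks like one because each element carries both the key and the value.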

When you iterate over a dictionary, you get its keys, not (key, value) pairs. The sorted function accepts any iterable, which is why you see a difference.
You can verify this by printing while iterating over the dict:
aa={'a':1,'d':2,'c':3,'b':4}
for key in aa:
    print(key)
for key in aa.keys():
    print(key)
Both of the for loops above print the same keys.

In the second example, the items() method applied to a dictionary returns an iterable collection of (dictionary_key, dictionary_value) tuples. That collection is then sorted.
In the first example, the dictionary is iterated as a collection of its keys. (Note: the key function x[0] compares only the first character of each key while sorting, which is probably NOT what you want.)
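To see why comparing only the first character matters, here is a small sketch with multi-character keys. The dictionary contents are made up for illustration, and the first result relies on insertion order being preserved (Python 3.7+).

```python
stock = {'bob': 1, 'banana': 2, 'avocado': 3, 'apple': 4}

# key=lambda x: x[0] compares first characters only; ties keep input order.
by_first_char = sorted(stock, key=lambda x: x[0])
print(by_first_char)   # ['avocado', 'apple', 'bob', 'banana']

# Without a key function the whole strings are compared.
by_whole_key = sorted(stock)
print(by_whole_key)    # ['apple', 'avocado', 'banana', 'bob']
```

With single-character keys, as in the question, the two approaches happen to give the same result.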

Related

Understanding sorting of IP address in Python programming using map [duplicate]

Say I have
votes = {'Charlie': 20, 'Able': 10, 'Baker': 20, 'Dog': 15}
I understand
print(sorted(votes.items(), key=lambda x: x[1]))
will lead to
[('Able', 10), ('Dog', 15), ('Baker', 20), ('Charlie', 20)]
But how does this work?
The function you pass in to key is given each of the items that are being sorted, and returns a "key" that Python can sort by. So, if you want to sort a list of strings by the reverse of the string, you could do this:
list_of_strings.sort(key=lambda s: s[::-1])
This lets you specify the value each item is sorted by, without having to change the item. That way, you don't have to build a list of reversed strings, sort that, then reverse them back.
# DON'T do this
data = ['abc', 'def', 'ghi', 'jkl']
reversed_data = [s[::-1] for s in data]
reversed_data.sort()
data = [s[::-1] for s in reversed_data]
# Do this
data.sort(key=lambda s: s[::-1])
In your case, the code is sorting each item by the second item in the tuple, whereas normally it would initially sort by the first item in the tuple, then break ties with the second item.
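The difference between the default tuple ordering and the key-based ordering can be sketched like this, using the votes dictionary from the question (the names by_name and by_count are illustrative):

```python
votes = {'Charlie': 20, 'Able': 10, 'Baker': 20, 'Dog': 15}

# Default tuple comparison: first element, then second on ties.
by_name = sorted(votes.items())
print(by_name)    # [('Able', 10), ('Baker', 20), ('Charlie', 20), ('Dog', 15)]

# With a key function only the returned value (the count) is compared.
by_count = sorted(votes.items(), key=lambda x: x[1])
print([count for _name, count in by_count])   # [10, 15, 20, 20]
```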
>>> votes = {'Charlie': 20, 'Able': 10, 'Baker': 20, 'Dog': 15}
If we apply .items() on the votes dictionary above we get:
>>> votes_items=votes.items()
>>> votes_items
[('Charlie', 20), ('Baker', 20), ('Able', 10), ('Dog', 15)]
#a list of tuples, each tuple having two items indexed 0 and 1
For each tuple, index [0] holds the string ('Charlie', 'Able', 'Baker', 'Dog') and index [1] holds the integer (20, 10, 20, 15).
print(sorted(votes.items(), key=lambda x: x[1])) instructs Python to sort the items (tuples) in votes using index [1] of each tuple, the integer, as the basis of the sorting.
Python compares the integer from each tuple and returns a list of the tuples ranked in ascending order of that integer (this can be reversed with the reverse=True argument).
Where there is a tie in the key, the items keep the order they had in the input, because the sort is stable. (So ('Charlie', 20) comes before ('Baker', 20): there is a 20 == 20 tie on the key, and ('Charlie', 20) comes before ('Baker', 20) in votes.items().)
The output then is:
[('Able', 10), ('Dog', 15), ('Charlie', 20), ('Baker', 20)]
I hope this makes it easier to understand.
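The tie behaviour described here can be checked directly. This is a sketch that assumes Python 3.7+, where votes.items() follows insertion order; on older versions the order of tied items may differ.

```python
votes = {'Charlie': 20, 'Able': 10, 'Baker': 20, 'Dog': 15}

# Stable sort: Charlie and Baker tie on 20 and keep their input order.
ascending = sorted(votes.items(), key=lambda x: x[1])
print(ascending)   # [('Able', 10), ('Dog', 15), ('Charlie', 20), ('Baker', 20)]

# reverse=True flips the ranking but still keeps tied items in input order.
descending = sorted(votes.items(), key=lambda x: x[1], reverse=True)
print(descending)  # [('Charlie', 20), ('Baker', 20), ('Dog', 15), ('Able', 10)]

# A tuple key makes the tie-break explicit: count first, then name.
explicit = sorted(votes.items(), key=lambda x: (x[1], x[0]))
print(explicit)    # [('Able', 10), ('Dog', 15), ('Baker', 20), ('Charlie', 20)]
```

The last form does not depend on the dictionary's order at all, which is useful when the result must be reproducible.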
key is a function that will be called to transform the collection's items before they are compared. The parameter passed to key must be something that is callable.
The use of lambda creates an anonymous function (which is callable). In the case of sorted, the callable takes exactly one parameter. Python's lambda is pretty simple: its body is a single expression, whose value it returns.
The key parameter takes a function as its value, which is applied to each element before sorting, so that the elements are sorted based on the output of this function.
For example if you want to sort a list of strings based on their length, you can do something like this:
words = ['aaaaaa', 'bb', 'ccc', 'd']  # avoid naming a variable "list": it shadows the built-in
sorted(words, key=len)
# ['d', 'bb', 'ccc', 'aaaaaa']
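Any one-argument callable works as the key, not just len or a lambda. As another small sketch (the names list is made up), str.lower gives a case-insensitive sort while the returned elements stay unchanged:

```python
names = ['bob', 'Alice', 'dave', 'Carol']

# Plain string comparison puts uppercase letters before lowercase ones.
plain = sorted(names)
print(plain)     # ['Alice', 'Carol', 'bob', 'dave']

# key=str.lower compares lowercased copies but returns the original strings.
folded = sorted(names, key=str.lower)
print(folded)    # ['Alice', 'bob', 'Carol', 'dave']
```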

.get with tuples with dictionaries

Say I have a dictionary with tuples as the keys for example
dictionary = {('a','b'):1, ('c','d'):2}
Is it possible to return None if you try to find a value using a key not in the dictionary when using .get()?
I've tried
dictionary.get('a','c')
but this returns an integer and I've tried
dictionary.get(['a','c'])
and
dictionary.get([('a','c')])
but both return a type error.
To use ('a', 'c') as the key, you need to write like this:
dictionary.get(('a', 'c'))
Notice the doubled parentheses: they are necessary to pass a tuple as the key argument.
If you write dictionary.get('a', 'c'),
that means that 'a' is the key to get,
and 'c' is the default value to return in case the key doesn't exist.
And dictionary.get(['a','c']) cannot work,
because [...] is a list, which is not a hashable type.
And in any case ['a', 'c'] is not equal to ('a', 'c'),
so it would not match anyway.
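The cases discussed above can be put side by side in a short sketch (the variable names are illustrative):

```python
dictionary = {('a', 'b'): 1, ('c', 'd'): 2}

found = dictionary.get(('a', 'b'))      # tuple key that exists
missing = dictionary.get(('a', 'c'))    # tuple key that does not exist
fallback = dictionary.get('a', 'c')     # key 'a' is absent, so default 'c'
print(found, missing, fallback)         # 1 None c

# A list is unhashable, so it cannot be looked up as a key at all.
list_lookup_error = None
try:
    dictionary.get(['a', 'c'])
except TypeError as error:
    list_lookup_error = error
    print("TypeError:", error)
```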

Use OrderedDict or ordered list?(novice)

(Using Python 3.4.3)
Here's what I want to do: I have a dictionary where the keys are strings and the values are the number of times that string occurs in file. I need to output which string(s) occur with the greatest frequency, along with their frequencies (if there's a tie for the most-frequent, output all of the most-frequent).
I had tried to use OrderedDict. I can create it fine, but I struggle to get it to output specifically the most frequently occurring. I can keep trying, but I'm not sure an OrderedDict is really what I should be using, since I'll never need the actual OrderedDict once I've determined and output the most-frequent strings and their frequency. A fellow student recommended an ordered list, but I don't see how I'd preserve the link between the keys and values as I currently have them.
Is OrderedDict the best tool to do what I'm looking for, or is there something else? If it is, is there a way to filter/slice(or equivalent) the OrderedDict?
You can simply use sorted with a suitable key function. In this case you can use operator.itemgetter(1), which sorts your items by their values.
from operator import itemgetter
print(sorted(my_dict.items(), key=itemgetter(1), reverse=True))
This can be solved in two steps. First sort your dictionary entries by their frequency so that the highest frequency is first.
Secondly use Python's groupby function to take matching entries from the list. As you are only interested in the highest, you stop after one iteration. For example:
from itertools import groupby
from operator import itemgetter
my_dict = {"a" : 8, "d" : 3, "c" : 8, "b" : 2, "e" : 2}
for k, g in groupby(sorted(my_dict.items(), key=itemgetter(1), reverse=True), key=itemgetter(1)):
    print(list(g))
    break
This would display:
[('a', 8), ('c', 8)]
This is because a and c are tied for the highest frequency.
If you remove the break statement, you would get the full list:
[('a', 8), ('c', 8)]
[('d', 3)]
[('b', 2), ('e', 2)]
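If only the top group is needed, sorting is not strictly required. As an alternative sketch (not the groupby approach above), one can take the maximum value and filter on it; highest and most_frequent are illustrative names, and max() raises ValueError if the dictionary is empty.

```python
my_dict = {"a": 8, "d": 3, "c": 8, "b": 2, "e": 2}

# Find the highest frequency, then keep every entry that reaches it.
highest = max(my_dict.values())
most_frequent = [(key, count) for key, count in my_dict.items() if count == highest]
print(most_frequent)   # [('a', 8), ('c', 8)]
```

The order of the tied entries follows the dictionary's own iteration order, which is insertion order on Python 3.7+.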

Using dictionaries in loop

I am trying to write code that implements a greedy algorithm, and for that I need to make sure that my calculations use the highest value possible. The potential values are stored in a dictionary, and my goal is to use the largest value first and then move on to lower values. However, since dictionary values are not ordered, my for loop visits them in an arbitrary sequence. For example, the output of the code below would start from 25.
How can I make sure that my code is using a dictionary yet following the sequence of (500,100,25,10,5)?
a={"f":500,"o":100,"q":25,"d":10,"n":5}
for i in a:
    print(a[i])
Two ideas spring to mind:
Use collections.OrderedDict, a dictionary subclass which remembers the order in which items are added. As long as you add the pairs in descending value order, looping over this dict will return them in the right order.
If you can't be sure the items will be added to the dict in the right order, you could construct them by sorting:
Get the values of the dictionary with values()
Sort by (ascending) value: this is sorted(), and Python will default to sorting in ascending order
Get them by descending value instead: this is reverse=True
Here's an example:
for value in sorted(a.values(), reverse=True):
    print(value)
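The two ideas can also be combined. A minimal sketch, assuming the keys are needed as well as the values: sort the pairs once and store them in an OrderedDict so later loops see them in descending order (the name ordered is illustrative).

```python
from collections import OrderedDict

a = {"f": 500, "o": 100, "q": 25, "d": 10, "n": 5}

# Sort the (key, value) pairs by value, largest first, then freeze that order.
ordered = OrderedDict(sorted(a.items(), key=lambda pair: pair[1], reverse=True))

for key, value in ordered.items():
    print(key, value)
```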
Dictionaries yield their keys when you iterate them normally, but you can use the items() view to get tuples of the key and value. That'll be un-ordered, but you can then use sorted() on the "one-th" element of the tuples (the value) with reverse set to True:
import operator
a={"f":500,"o":100,"q":25,"d":10,"n":5}
for k, v in sorted(a.items(), key=operator.itemgetter(1), reverse=True):
    print(v)
I'm guessing that you do actually need the keys, but if not, you can just use values() instead of items(): sorted(a.values(), reverse=True)
You can use this:
>>> a={"f":500,"o":100,"q":25,"d":10,"n":5}
>>> for value in sorted(a.values(), reverse=True):
...     print(value)
...
500
100
25
10
5
>>>
a={"f":500,"o":100,"q":25,"d":10,"n":5}
k = sorted(a, key=a.__getitem__, reverse=True)
v = sorted(a.values(), reverse=True)
sorted_a = list(zip(k, v))  # list() is needed on Python 3, where zip returns an iterator
print(sorted_a)
Output:
[('f', 500), ('o', 100), ('q', 25), ('d', 10), ('n', 5)]
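The question does not show the greedy calculation itself, so the following is only a hypothetical sketch of how the descending order might be used: it breaks an amount down using the largest values first. The function name greedy_breakdown and the amount 1285 are assumptions made for illustration.

```python
def greedy_breakdown(amount, denominations):
    """Use the largest values first; return ({key: count}, leftover)."""
    counts = {}
    # Visit the (key, value) pairs from the largest value to the smallest.
    for key, value in sorted(denominations.items(), key=lambda kv: kv[1], reverse=True):
        counts[key], amount = divmod(amount, value)
    return counts, amount

a = {"f": 500, "o": 100, "q": 25, "d": 10, "n": 5}
counts, leftover = greedy_breakdown(1285, a)
print(counts)     # {'f': 2, 'o': 2, 'q': 3, 'd': 1, 'n': 0}
print(leftover)   # 0
```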

Python - input from list of tuples

I've declared a list of tuples that I would like to manipulate. I have a function that returns an option from the user. I would like to see if the user has entered any one of the keys 'A', 'W', 'K'. With a dictionary, I would say this: while option not in author.items() option = get_option(). How can I accomplish this with a list of tuples?
authors = [('A', "Aho"), ('W', "Weinberger"), ('K', "Kernighan")]
authors = [('A', "Aho"), ('W', "Weinberger"), ('K', "Kernighan")]
option = get_option()
while option not in (x[0] for x in authors):
    option = get_option()
How this works:
(x[0] for x in authors) is a generator expression. It yields element [0] of each item in the authors list, one at a time, and each element is compared against option. As soon as a match is found, the in test short-circuits and stops.
Generator expressions yield one item at a time, so they are memory efficient.
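To try the loop without real user input, here is a runnable sketch. The get_option shown is a hypothetical stand-in for the asker's function (which is not shown in the thread); it simply replays a canned list of answers.

```python
authors = [('A', "Aho"), ('W', "Weinberger"), ('K', "Kernighan")]

# Hypothetical stand-in for the asker's get_option(): replays canned input.
canned_input = iter(['X', 'Q', 'W'])

def get_option():
    return next(canned_input)

attempts = 1
option = get_option()
# The generator is rebuilt for every test, and `in` stops at the first match.
while option not in (x[0] for x in authors):
    option = get_option()
    attempts += 1

print(option, attempts)   # W 3
```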
How about something like
option in list(zip(*authors))[0]
We are using zip to essentially separate the letters from the words. Since we are dealing with a list of tuples, we must unpack it using * (and, in Python 3, wrap the result in list() because zip returns an iterator that cannot be indexed):
>>> list(zip(*authors))
[('A', 'W', 'K'), ('Aho', 'Weinberger', 'Kernighan')]
>>> list(zip(*authors))[0]
('A', 'W', 'K')
Then we simply use option in to test if option is contained in list(zip(*authors))[0].
There are good answers here that cover doing this operation with zip, but you don't have to do it like that - you can use an OrderedDict instead.
from collections import OrderedDict
authors = OrderedDict([('A', "Aho"), ('W', "Weinberger"), ('K', "Kernighan")])
Since it remembers its entry order, you can iterate over it without fear of getting odd or unusual orderings of your keys.
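With the data in an OrderedDict, the membership test from the question becomes a plain key lookup. A short sketch of what that looks like:

```python
from collections import OrderedDict

authors = OrderedDict([('A', "Aho"), ('W', "Weinberger"), ('K', "Kernighan")])

# `in` on a dictionary tests the keys, so no zip or generator is needed.
print('W' in authors)    # True
print('X' in authors)    # False

# Looking up the name for a valid option is a plain key access.
print(authors['K'])      # Kernighan

# Iteration follows the order in which the pairs were supplied.
print(list(authors))     # ['A', 'W', 'K']
```

The loop from the question would then read while option not in authors, with the body unchanged.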
