I have a dictionary of 200,000 items (the keys are strings and the values are integers).
What is the best/most pythonic way to print the items sorted by descending value then ascending key (i.e. a 2 key sort)?
a={ 'keyC':1, 'keyB':2, 'keyA':1 }
b = a.items()
b.sort( key=lambda a:a[0])
b.sort( key=lambda a:a[1], reverse=True )
print b
>>>[('keyB', 2), ('keyA', 1), ('keyC', 1)]
You can't sort dictionaries. You have to sort the list of items.
When you have a numeric value, it's easy to sort in reverse order by negating it, and the following versions do that. But this isn't general: it only works because the value is numeric.
a = { 'key':1, 'another':2, 'key2':1 }
b= a.items()
b.sort( key=lambda a:(-a[1],a[0]) )
print b
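For reference, here is a Python 3 sketch of the same negated-value key. In Python 3, `dict.items()` returns a view with no `.sort()` method, so `sorted()` is used instead. Like the answer above, it assumes the values are numeric so they can be negated.

```python
a = {'keyC': 1, 'keyB': 2, 'keyA': 1}

# Negate the numeric value for descending order; the key breaks ties ascending.
b = sorted(a.items(), key=lambda item: (-item[1], item[0]))
print(b)  # [('keyB', 2), ('keyA', 1), ('keyC', 1)]
```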
Here's an alternative, using an explicit function instead of a lambda and the cmp instead of the key option.
def valueKeyCmp( a, b ):
    return cmp( (-a[1], a[0]), (-b[1], b[0] ) )
b.sort( cmp= valueKeyCmp )
print b
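Python 3 removed both the `cmp=` argument to `sort()` and the built-in `cmp()`. A minimal sketch of porting the comparator above uses `functools.cmp_to_key`; the `(x > y) - (x < y)` expression is a common stand-in for the old `cmp()`.

```python
from functools import cmp_to_key

def value_key_cmp(a, b):
    """Old-style comparator: value descending, then key ascending.

    Returns a negative number, zero or a positive number, like Python 2's cmp().
    """
    left, right = (-a[1], a[0]), (-b[1], b[0])
    return (left > right) - (left < right)  # stand-in for the removed cmp()

a = {'keyC': 1, 'keyB': 2, 'keyA': 1}
b = sorted(a.items(), key=cmp_to_key(value_key_cmp))
print(b)  # [('keyB', 2), ('keyA', 1), ('keyC', 1)]
```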
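The more general solution is actually two separate sorts. Because Python's sort is stable, sort by the secondary key (the dictionary key) first and by the primary key (the value, descending) last:
b.sort( key=lambda a:a[0] )
b.sort( key=lambda a:a[1], reverse=True )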
print b
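Because `list.sort()` is stable (even with `reverse=True`), sorting on the secondary key first and the primary key last yields the combined order, and unlike the `(-value, key)` trick it also works when the values can't be negated. A minimal Python 3 sketch with made-up string values:

```python
# Hypothetical data: string values, so the (-value, key) trick is not available.
data = {'k2': 'beta', 'k1': 'alpha', 'k3': 'beta'}

items = list(data.items())
items.sort(key=lambda item: item[0])                # secondary key first: key ascending
items.sort(key=lambda item: item[1], reverse=True)  # primary key last: value descending
print(items)  # [('k2', 'beta'), ('k3', 'beta'), ('k1', 'alpha')]
```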
data = { 'keyC':1, 'keyB':2, 'keyA':1 }
for key, value in sorted(data.items(), key=lambda x: (-1*x[1], x[0])):
    print key, value
The most pythonic way to do it would be to know a little more about the actual data -- specifically, the maximum value you can have -- and then do it like this:
def sortkey((k, v)):
    return (maxval - v, k)
items = thedict.items()
items.sort(key=sortkey)
but unless you already know the maximum value, searching for the maximum value means looping through the dict an extra time (with max(thedict.itervalues())), which may be expensive. Alternatively, a keyfunc version of S.Lott's solution:
def sortkey((k, v)):
    return (-v, k)
items = thedict.items()
items.sort(key=sortkey)
An alternative that doesn't care about the types would be a comparison function:
def sortcmp((ak, av), (bk, bv)):
    # compare values 'in reverse'
    r = cmp(bv, av)
    if not r:
        # and then keys normally
        r = cmp(ak, bk)
    return r
items = thedict.items()
items.sort(cmp=sortcmp)
This solution works for any types of key and value whenever you want to mix ascending and descending order in the same sort. If you value brevity you can write sortcmp as:
def sortcmp((ak, av), (bk, bv)):
    return cmp((bv, ak), (av, bk))
You can use something like this:
dic = {'aaa':1, 'aab':3, 'aaf':3, 'aac':2, 'aad':2, 'aae':4}
def sort_compare(a, b):
    c = cmp(dic[b], dic[a])
    if c != 0:
        return c
    return cmp(a, b)
for k in sorted(dic.keys(), cmp=sort_compare):
    print k, dic[k]
Don't know how pythonic it is however :)
Building on Thomas Wouters and Ricardo Reyes solutions:
def combine(*cmps):
    """Sequence comparisons."""
    def comparator(a, b):
        for c in cmps:
            result = c(a, b)
            if result:
                return result
        return 0
    return comparator
def reverse(cmp):
    """Invert a comparison."""
    def comparator(a, b):
        return cmp(b, a)
    return comparator
def compare_nth(cmp, n):
    """Compare the n'th item from two sequences."""
    def comparator(a, b):
        return cmp(a[n], b[n])
    return comparator
rev_val_key_cmp = combine(
    # compare values, decreasing
    reverse(compare_nth(cmp, 1)),
    # compare keys, increasing
    compare_nth(cmp, 0)
)
data = { 'keyC':1, 'keyB':2, 'keyA':1 }
for key, value in sorted(data.items(), cmp=rev_val_key_cmp):
    print key, value
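A Python 3 port of these comparator combinators might look like the sketch below. Since Python 3 has no built-in `cmp()` and no `cmp=` argument, it defines a small `cmp` stand-in and wraps the combined comparator with `functools.cmp_to_key`. Helper names follow the answer above; the lambda-based bodies are a stylistic simplification.

```python
from functools import cmp_to_key

def cmp(a, b):
    """Stand-in for Python 2's built-in cmp(), which Python 3 removed."""
    return (a > b) - (a < b)

def combine(*cmps):
    """Try each comparison in turn; the first non-zero result wins."""
    def comparator(a, b):
        for c in cmps:
            result = c(a, b)
            if result:
                return result
        return 0
    return comparator

def reverse(c):
    """Invert a comparison by swapping its arguments."""
    return lambda a, b: c(b, a)

def compare_nth(c, n):
    """Compare the n'th item of two sequences."""
    return lambda a, b: c(a[n], b[n])

rev_val_key_cmp = combine(
    reverse(compare_nth(cmp, 1)),  # values, decreasing
    compare_nth(cmp, 0),           # keys, increasing
)

data = {'keyC': 1, 'keyB': 2, 'keyA': 1}
result = sorted(data.items(), key=cmp_to_key(rev_val_key_cmp))
for key, value in result:
    print(key, value)
```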
>>> keys = sorted(a, key=lambda k: (-a[k], k))
or
>>> keys = sorted(a)
>>> keys.sort(key=a.get, reverse=True)
then
print [(key, a[key]) for key in keys]
[('keyB', 2), ('keyA', 1), ('keyC', 1)]
I am working on a function
def common_words(dictionary, N):
    if len(dictionary) > N:
        max(dictionary, key=dictionary.get)
Description of the function is:
The first parameter is the dictionary of word counts and the second is
a positive integer N. This function should update the dictionary so
that it includes the most common (highest frequency words). At most N
words should be included in the dictionary. If including all words
with some word count would result in a dictionary with more than N
words, then none of the words with that word count should be included.
(i.e., in the case of a tie for the N+1st most common word, omit all
of the words in the tie.)
So I know that I need to get the N items with the highest values, but I am not sure how to do that. I also know that once I have the N items, if there are any duplicate values I need to pop them out.
For example, given
k = {'a':5, 'b':4, 'c':4, 'd':1}
then
common_words(k, 2)
should modify k so that it becomes {'a':5}.
Here's my algorithm for this problem.
1. Extract the data from the dictionary into a list and sort it in descending order of the dictionary values.
2. Clear the original dictionary.
3. Group the sorted data into groups that have the same value.
4. Re-populate the dictionary with all the (key, value) pairs from each group in the sorted list, as long as that keeps the total dictionary size <= N. If adding a group would make the total dictionary size > N, then return.
The grouping operation can be easily done using the standard itertools.groupby function.
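To see what `groupby` produces here, a small Python 3 sketch on the example data. Note that `groupby` only merges *adjacent* items with equal keys, which is why the data has to be sorted on the same key first.

```python
from itertools import groupby
from operator import itemgetter

data = {'a': 5, 'b': 4, 'c': 4, 'd': 1}

# Sort by count first: groupby only merges adjacent items with equal keys.
ordered = sorted(data.items(), key=itemgetter(1), reverse=True)
groups = [(count, [word for word, _ in grp])
          for count, grp in groupby(ordered, key=itemgetter(1))]
print(groups)  # [(5, ['a']), (4, ['b', 'c']), (1, ['d'])]
```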
To perform the sorting and grouping we need an appropriate key function, as described in the groupby, list and sorted docs. Since we need the second item of each tuple we could use
def keyfunc(t):
    return t[1]
or
keyfunc = lambda t: t[1]
but it's more efficient to use operator.itemgetter.
from operator import itemgetter
from itertools import groupby
def common_words(d, n):
    keyfunc = itemgetter(1)
    lst = sorted(d.items(), key=keyfunc, reverse=True)
    d.clear()
    for _, g in groupby(lst, key=keyfunc):
        g = list(g)
        if len(d) + len(g) <= n:
            d.update(g)
        else:
            break
# test
data = {'a':5, 'b':4, 'c':4, 'd':1}
common_words(data, 4)
print(data)
common_words(data, 2)
print(data)
output
{'c': 4, 'd': 1, 'b': 4, 'a': 5}
{'a': 5}
My algorithm is as follows:
1. Build a list of (key, value) tuples from the dictionary, sorted by value from largest to smallest.
2. While item[N-1] has the same value as item[N], decrement N (indices start from 0, hence the -1), so that every word tied at the cut-off is dropped.
3. Finally, convert the slice of the tuple list up to N elements back to a dict; use an OrderedDict instead if you want to keep the items in order.
If the dictionary has N or fewer items, it is returned unchanged.
def common_words(dictionary, N):
    if len(dictionary) > N:
        tmp = [(k, dictionary[k]) for k in sorted(dictionary, key=dictionary.get, reverse=True)]
        # step back past every word that ties with the first excluded word
        while N > 0 and tmp[N-1][1] == tmp[N][1]:
            N -= 1
        return dict(tmp[:N])
        # to get only the keys, as the title asks, return [i[0] for i in tmp[:N]] instead
    else:
        return dictionary
        # to get only the keys, return dictionary.keys() instead
>>> common_words({'a':5, 'b':4, 'c':4, 'd':1}, 2)
{'a': 5}
The OP wants to modify the input dictionary inside the function and return None; in that case it can be written as:
def common_words(dictionary, N):
    if len(dictionary) > N:
        tmp = [(k, dictionary[k]) for k in sorted(dictionary, key=dictionary.get, reverse=True)]
        while N > 0 and tmp[N-1][1] == tmp[N][1]:
            N -= 1
        # remove everything past the cut-off
        for i in tmp[N:]:
            dictionary.pop(i[0])
>>> k = {'a':5, 'b':4, 'c':4, 'd':1}
>>> common_words(k, 2)
>>> k
{'a': 5}
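A Python 3 sketch of the in-place approach. It trims with a `while` loop so that a tie spanning three or more words at the cut-off is also dropped entirely, and it is checked against such a case (the second dictionary is made-up test data).

```python
def common_words(counts, n):
    """Keep at most n of the most frequent words in counts, modifying it in place.

    If a tie straddles the cut-off, every word in that tie is removed.
    """
    ranked = sorted(counts, key=counts.get, reverse=True)
    if len(ranked) <= n:
        return
    keep = n
    # Step back while the last kept word ties with the first dropped one.
    while keep > 0 and counts[ranked[keep - 1]] == counts[ranked[keep]]:
        keep -= 1
    for word in ranked[keep:]:
        del counts[word]

k = {'a': 5, 'b': 4, 'c': 4, 'd': 1}
common_words(k, 2)
print(k)  # {'a': 5}

t = {'a': 5, 'b': 4, 'c': 4, 'd': 4, 'e': 1}
common_words(t, 3)
print(t)  # {'a': 5}
```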
I have made a small demo of a more complex problem
def f(a):
    return tuple([x for x in range(a)])
d = {}
[d['1'],d['2']] = f(2)
print d
# {'1': 0, '2': 1}
# Works
Now suppose the keys are generated programmatically.
How do I achieve the same thing in this case?
n = 10
l = [x for x in range(n)]
[d[x] for x in l] = f(n)
print d
# SyntaxError: can't assign to list comprehension
You can't: unpacking into a fixed list of targets is a syntactic feature of the assignment statement, and a list comprehension is not a valid assignment target, so anything dynamic needs different syntax.
If you have a function f() that returns the results and a list of keys named keys, you can use zip to create an iterable of (key, result) pairs and loop over them:
d = {}
for key, value in zip(keys, f()):
    d[key] = value
That is easily rewritten as a dict comprehension:
d = {key: value for key, value in zip(keys, f())}
Or, in this specific case, as mentioned by @JonClements, even as
d = dict(zip(keys, f()))
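If, as in the question, `d` already exists and you want to add to it rather than build a new dict, `dict.update()` accepts the same iterable of pairs. A Python 3 sketch using the question's `f` and generated integer keys:

```python
def f(a):
    return tuple(range(a))

d = {'1': 0, '2': 1}          # existing entries from the first example
n = 10
keys = list(range(n))         # programmatically generated keys
d.update(zip(keys, f(n)))     # assign each generated key its matching result
print(d)
```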
I have a dictionary that I need to sort in descending order of the MI value, and then print its contents one by one in that order, along with 'hi'.
My coding:
d = dict()
for item in a:
    specificy = c[item]
    MI1= specificx/float(specificy)
    MI2= MI1*specificx
    M13= specificx*specificy
    MI = MI1* math.log(MI1/float(MI2))
    d[x + ' ' + item] = MI
print d
for k,v in d:
    print k + v + 'hi'
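This should do:
import operator
for k, v in sorted(d.items(), key=operator.itemgetter(1), reverse=True):
    print(k, v, 'hi')
It works by getting the items of the dictionary, sorting them by value in reverse (descending) order, and then printing each key and value. Passing them to print as separate arguments avoids the TypeError you get when adding a string and a float.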
See also: https://stackoverflow.com/a/613218/565635
To sort by key, use index 0 of each item:
for k, v in sorted(d.items(), key=lambda x: x[0], reverse=False):
    print k, v, 'hi'
To sort by value, use index 1 (set reverse=True for descending order, as the question asks):
for k, v in sorted(d.items(), key=lambda x: x[1], reverse=False):
    print k, v, 'hi'
To sort a dictionary's keys by their values you can use
sorted(d, key=d.__getitem__)
In your case the code becomes
for k in sorted(d, key=d.__getitem__, reverse=True):
    print(k, d[k], "hi")
Explanation
When in Python you write
d[k]
what is evaluated is
d.__getitem__(k)
so when d is a dictionary, d.__getitem__ is a function that, given a key, returns the value associated with that key in the dictionary.
sorted, instead, is a built-in function that returns a sorted list built from a sequence and accepts an optional parameter (somewhat unfortunately named key, but note that key has no relation to dictionaries here). This parameter determines what the ordering comparison is based on; sorted also supports another optional parameter, reverse, with which you choose ascending or descending order.
Finally, when a dictionary is used as a sequence (for example, passed to sorted or iterated over in a for loop), what you obtain are the keys of the dictionary.
This, for example, implies that sorted(d, key=d.__getitem__) returns the keys of the dictionary sorted according to their values.
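Putting those pieces together, a Python 3 sketch with made-up float scores standing in for the MI values in the question:

```python
# Hypothetical scores standing in for the MI values in the question.
d = {'x a': 0.25, 'x b': 1.5, 'x c': 0.75}

ranked = sorted(d, key=d.__getitem__, reverse=True)  # keys, highest value first
for k in ranked:
    # print() converts each argument with str(), so no explicit cast is needed.
    print(k, d[k], 'hi')
```

This prints `x b 1.5 hi`, then `x c 0.75 hi`, then `x a 0.25 hi`.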
If I have a Python dictionary, how do I get the key to the entry which contains the minimum value?
I was thinking about something to do with the min() function...
Given the input:
{320:1, 321:0, 322:3}
It would return 321.
Best: min(d, key=d.get) -- no reason to interpose a useless lambda indirection layer or extract items or keys!
>>> d = {320: 1, 321: 0, 322: 3}
>>> min(d, key=d.get)
321
Here's an answer that actually gives the solution the OP asked for:
>>> d = {320:1, 321:0, 322:3}
>>> d.items()
[(320, 1), (321, 0), (322, 3)]
>>> # find the minimum by comparing the second element of each tuple
>>> min(d.items(), key=lambda x: x[1])
(321, 0)
In Python 2, using d.iteritems() will be more efficient for larger dictionaries, since it avoids building a list; in Python 3, d.items() already returns a view.
For multiple keys which have equal lowest value, you can use a list comprehension:
d = {320:1, 321:0, 322:3, 323:0}
minval = min(d.values())
res = [k for k, v in d.items() if v==minval]
[321, 323]
An equivalent functional version:
res = list(filter(lambda x: d[x]==minval, d))
min(d.items(), key=lambda x: x[1])[0]
>>> d = {320:1, 321:0, 322:3}
>>> min(d, key=lambda k: d[k])
321
For the case where you have multiple minimal keys and want to keep it simple
def minimums(some_dict):
    positions = [] # output variable
    min_value = float("inf")
    for k, v in some_dict.items():
        if v == min_value:
            positions.append(k)
        if v < min_value:
            min_value = v
            positions = [] # new smaller minimum: start the output over
            positions.append(k)
    return positions
minimums({'a':1, 'b':2, 'c':-1, 'd':0, 'e':-1})
['e', 'c']
min(zip(d.values(), d.keys()))[1]
Use the zip function to create an iterator of (value, key) tuples, then pass it to min, which compares the tuples on their first element, the value. This returns a (value, key) pair, and the index [1] picks out the corresponding key.
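One detail worth knowing: when several entries share the minimum value, the tuples are then compared on the key, so this returns the smallest such key (which requires the keys to be mutually comparable). `min(d, key=d.get)` instead returns the first minimal key in iteration order. A quick Python 3 check with made-up data:

```python
d = {320: 1, 323: 0, 322: 3, 321: 0}

value, key = min(zip(d.values(), d.keys()))
print(key)   # 321: the (0, 321) tuple beats (0, 323)

# min() with key=d.get returns the first minimal key in iteration order.
print(min(d, key=d.get))  # 323
```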
If you might have multiple minimum values, I would suggest:
d = {320:1, 321:0, 322:3, 323:0}
print ', '.join(str(key) for min_value in (min(d.values()),) for key in d if d[key]==min_value)
"""Output:
321, 323
"""
Another approach to addressing the issue of multiple keys with the same min value:
>>> dd = {320:1, 321:0, 322:3, 323:0}
>>>
>>> from itertools import groupby
>>> from operator import itemgetter
>>>
>>> print [v for k,v in groupby(sorted((v,k) for k,v in dd.iteritems()), key=itemgetter(0)).next()[1]]
[321, 323]
You can get the keys of the dict using the keys function, and you're right about using min to find the minimum of that list.
This answers the OP's original question, which asks for the key of the minimum entry rather than the minimum value itself.
Use min with an iterator (for Python 3, use items instead of iteritems); instead of a lambda, use itemgetter from operator, which is faster than a lambda.
from operator import itemgetter
min_key, _ = min(d.iteritems(), key=itemgetter(1))
d={}
d[320]=1
d[321]=0
d[322]=3
value = min(d.values())
for k in d.keys():
    if d[k] == value:
        print k,d[k]
I compared how the following three options perform:
import random, datetime
myDict = {}
for i in range( 10000000 ):
    myDict[ i ] = random.randint( 0, 10000000 )
# OPTION 1
start = datetime.datetime.now()
sorted_items = []  # renamed so the built-in sorted() is not shadowed
for i in myDict:
    sorted_items.append( ( i, myDict[ i ] ) )
sorted_items.sort( key = lambda x: x[1] )
print( sorted_items[0][0] )
end = datetime.datetime.now()
print( end - start )
# OPTION 2
start = datetime.datetime.now()
myDict_values = list( myDict.values() )
myDict_keys = list( myDict.keys() )
min_value = min( myDict_values )
print( myDict_keys[ myDict_values.index( min_value ) ] )
end = datetime.datetime.now()
print( end - start )
# OPTION 3
start = datetime.datetime.now()
print( min( myDict, key=myDict.get ) )
end = datetime.datetime.now()
print( end - start )
Sample output:
#option 1
236230
0:00:14.136808
#option 2
236230
0:00:00.458026
#option 3
236230
0:00:00.824048
Or __getitem__:
>>> d = {320: 1, 321: 0, 322: 3}
>>> min(d, key=d.__getitem__)
321
To create an orderable class you can override the six rich comparison methods, which min() and similar functions rely on.
These methods are __lt__, __le__, __gt__, __ge__, __eq__ and __ne__, in order: less than, less than or equal, greater than, greater than or equal, equal, not equal. min() itself only needs __lt__.
For example, you could implement __lt__ as follows:
def __lt__(self, other):
    return self.comparable_value < other.comparable_value
Then you can call min directly on a list of such objects:
minValue = min(yourList)
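Writing all six methods by hand is rarely necessary: `functools.total_ordering` fills in the rest from `__eq__` plus one ordering method. A minimal sketch; the `Item` class and its `comparable_value` attribute are hypothetical, mirroring the example above.

```python
from functools import total_ordering

@total_ordering
class Item:
    """Hypothetical class ordered by its comparable_value attribute."""

    def __init__(self, name, comparable_value):
        self.name = name
        self.comparable_value = comparable_value

    def __eq__(self, other):
        return self.comparable_value == other.comparable_value

    def __lt__(self, other):
        return self.comparable_value < other.comparable_value

items = [Item('a', 3), Item('b', 1), Item('c', 2)]
smallest = min(items)        # uses __lt__
largest = max(items)         # uses __gt__, supplied by total_ordering
print(smallest.name, largest.name)  # b a
```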
This worked for me.
my_dic = {320:1, 321:0, 322:3}
min_key = sorted(my_dic, key=lambda k: my_dic[k])[0]
print(min_key)
A solution using only the sorted function.
It sorts the keys by their values, from smallest to largest.
The element at index 0 is then the key with the smallest value.
# python
d={320:1, 321:0, 322:3}
reduce(lambda x,y: x if d[x]<=d[y] else y, d.iterkeys())
321
I got this far:
def most_frequent(string):
    d = dict()
    for key in string:
        if key not in d:
            d[key] = 1
        else:
            d[key] += 1
    return d
print most_frequent('aabbbc')
Returning:
{'a': 2, 'c': 1, 'b': 3}
Now I need to:
reverse the pair
sort by number in decreasing order
only print the letters out
Should I convert this dictionary to tuples or list?
Here's a one line answer
sortedLetters = sorted(d.iteritems(), key=lambda (k,v): (v,k))
This should do it nicely.
def frequency_analysis(string):
    d = dict()
    for key in string:
        d[key] = d.get(key, 0) + 1
    return d
def letters_in_order_of_frequency(string):
    frequencies = frequency_analysis(string)
    # frequencies is of bounded size because the number of distinct letters is bounded by the alphabet, not by the input size
    frequency_list = [(freq, letter) for (letter, freq) in frequencies.iteritems()]
    frequency_list.sort(reverse=True)
    return [letter for freq, letter in frequency_list]
string = 'aabbbc'
print letters_in_order_of_frequency(string)
Here is something that returns a list of tuples rather than a dictionary:
import operator
if __name__ == '__main__':
    test_string = 'cnaa'
    string_dict = dict()
    for letter in test_string:
        if letter not in string_dict:
            string_dict[letter] = test_string.count(letter)
    # Sort dictionary by values, credits go here http://stackoverflow.com/questions/613183/sort-a-dictionary-in-python-by-the-value/613218#613218
    ordered_answer = sorted(string_dict.items(), key=operator.itemgetter(1), reverse=True)
    print ordered_answer
Python 2.7 supports this use case directly:
>>> from collections import Counter
>>> Counter('abracadabra').most_common()
[('a', 5), ('r', 2), ('b', 2), ('c', 1), ('d', 1)]
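Applied to the question's input, a short Python 3 sketch that keeps only the letters, in decreasing order of frequency:

```python
from collections import Counter

counts = Counter('aabbbc')
pairs = counts.most_common()           # [('b', 3), ('a', 2), ('c', 1)]
letters = ''.join(letter for letter, _ in pairs)
print(letters)  # bac
```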
chills42's lambda function wins, I think, but as an alternative, how about generating the dictionary with the counts as the keys instead?
def count_chars(string):
    distinct = set(string)
    dictionary = {}
    for s in distinct:
        num = len(string.split(s)) - 1
        # several letters can share a count, so keep a list per count
        dictionary.setdefault(num, []).append(s)
    return dictionary
def print_dict_in_reverse_order(d):
    _list = d.keys()
    _list.sort()
    _list.reverse()
    for s in _list:
        print ''.join(d[s])
EDIT: This will do what you want. I'm borrowing chills42's line and adding another:
sortedLetters = sorted(d.iteritems(), key=lambda (k,v): (v,k))
sortedString = ''.join([c[0] for c in reversed(sortedLetters)])
------------original answer------------
To print out the sorted string, add another line to chills42's one-liner:
''.join(map(lambda c: str(c[0]*c[1]), reversed(sortedLetters)))
This produces 'bbbaac'.
If you want single letters, 'bac' use this:
''.join([c[0] for c in reversed(sortedLetters)])
from collections import defaultdict
def most_frequent(s):
    d = defaultdict(int)
    for c in s:
        d[c] += 1
    return "".join([
        k for k, v in sorted(
            d.iteritems(), reverse=True, key=lambda (k, v): v)
    ])
EDIT:
Here is my one-liner:
def most_frequent(s):
    return "".join([
        c for frequency, c in sorted(
            [(s.count(c), c) for c in set(s)], reverse=True
        )
    ])
Here's the code for your most_frequent function:
>>> a = 'aabbbc'
>>> {i: a.count(i) for i in set(a)}
{'a': 2, 'c': 1, 'b': 3}
This particular syntax (a dict comprehension) works in py3k and Python 2.7; for earlier versions you can write dict((i, a.count(i)) for i in set(a)). It seems to me a bit more readable than yours.
def reversedSortedFrequency(string):
    from collections import defaultdict
    d = defaultdict(int)
    for c in string:
        d[c]+=1
    return sorted([(v,k) for k,v in d.items()], key=lambda (k,v): -k)
Here is the fixed version (thank you for pointing out bugs)
def frequency(s):
    return ''.join(
        [k for k, v in
            sorted(
                reduce(
                    lambda d, c: d.update([[c, d.get(c, 0) + 1]]) or d,
                    list(s),
                    dict()).items(),
                lambda a, b: cmp(a[1], b[1]),
                reverse=True)])
I think the use of reduce is what sets this solution apart from the others...
In action:
>>> from frequency import frequency
>>> frequency('abbbccddddxxxyyyyyz')
'ydbxcaz'
This includes extracting the keys (and counting them) as well! Another nice property is the initialization of the dictionary on the same line :)
Also: no imports, just builtins.
The reduce function is kinda hard to wrap my head around, and setting dictionary values in a lambda is also a bit cumbersome in python, but, ah well, it works!
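In Python 3, `reduce` lives in `functools` and `sorted()` no longer takes a comparison function, so a port of the same idea might look like this sketch (sorting on the count with `key=` instead of `cmp`):

```python
from functools import reduce

def frequency(s):
    # Fold the characters into a dict of counts; update() returns None, so `or d` hands d back.
    counts = reduce(lambda d, c: d.update([(c, d.get(c, 0) + 1)]) or d, s, {})
    # Python 3's sorted() has no cmp argument, so sort on the count with key= instead.
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return ''.join(k for k, _ in ranked)

print(frequency('abbbccddddxxxyyyyyz'))  # ydbxcaz
```

Because the sort is stable, letters with equal counts keep their first-seen order, which is why `b` comes before `x` here.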