python dictionary values sorting

I have 2 dictionaries, dict1 and dict2 which contain the same keys, but different values for the keys. What I want to do is for each dictionary, sort the values from largest to smallest, and then give each value a rank 1-N, 1 being the largest value. From here, I want to get the difference of the ranks for the values in each dictionary for the same key. For example:
dict1 = {a:0.6, b:0.3, c:0.9, d:1.2, e:0.2}
dict2 = {a:1.4, b:7.7, c:9.0, d:2.5, e:2.0}
# sorting by values would look like this:
dict1 = {d:1.2, c:0.9, a:0.6, b:0.3, e:0.2}
dict2 = {c:9.0, b:7.7, d:2.5, e:2.0, a:1.4}
#ranking the values would produce this:
dict1 = {d:1, c:2, a:3, b:4, e:5}
dict2 = {c:1, b:2, d:3, e:4, a:5}
#computing the difference between ranks would be something like this:
diffs = {}
for x in dict1.keys():
    diffs[x] = (dict1[x] - dict2[x])
#diffs would look like this:
diffs[a] = -2
diffs[b] = 2
diffs[c] = 1
diffs[d] = -2
diffs[e] = 1
I know dictionaries are meant to be random and not sortable, but maybe there is a method to put the keys and values into a list? The main challenges I am facing are getting the keys and values sorted by value (largest to smallest) and then changing the value to its respective rank in the sorted list.

A simple solution for small dicts is
dict1 = {"a":0.6, "b":0.3, "c":0.9, "d":1.2, "e":0.2}
dict2 = {"a":1.4, "b":7.7, "c":9.0, "d":2.5, "e":2.0}
k1 = sorted(dict1, key=dict1.get)
k2 = sorted(dict2, key=dict2.get)
diffs = dict((k, k2.index(k) - k1.index(k)) for k in dict1)
A more efficient, less readable version for larger dicts:
ranks1 = dict(map(reversed, enumerate(sorted(dict1, key=dict1.get))))
ranks2 = dict(map(reversed, enumerate(sorted(dict2, key=dict2.get))))
diffs = dict((k, ranks2[k] - ranks1[k]) for k in dict1)
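For readers who want the intermediate 1-to-N ranks the question describes (1 for the largest value), here is a minimal sketch. The helper name `rank_by_value` is hypothetical, and the sketch assumes both dicts share exactly the same keys; ties are simply ranked in whatever order the sort leaves them.

```python
def rank_by_value(d):
    """Map each key to its rank, where rank 1 is the largest value."""
    # Sort keys by value, largest first, then number them starting at 1.
    ordered = sorted(d, key=d.get, reverse=True)
    return {k: rank for rank, k in enumerate(ordered, start=1)}


dict1 = {"a": 0.6, "b": 0.3, "c": 0.9, "d": 1.2, "e": 0.2}
dict2 = {"a": 1.4, "b": 7.7, "c": 9.0, "d": 2.5, "e": 2.0}

ranks1 = rank_by_value(dict1)
ranks2 = rank_by_value(dict2)
# Difference of ranks per key (assumes identical key sets).
diffs = {k: ranks1[k] - ranks2[k] for k in dict1}
print(diffs)
```

Because the ranks are computed once per dict, each lookup in the final comprehension is a plain dictionary access rather than a list search.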

You may be interested in collections.OrderedDict
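Here's a sample. My initial thought was that you were also looking for dictionaries with keys ordered by value, which is what od1 and od2 are.
from collections import OrderedDict
d1 = {"a":0.6, "b":0.3, "c":0.9, "d":1.2, "e":0.2}
d2 = {"a":1.4, "b":7.7, "c":9.0, "d":2.5, "e":2.0}
# Sort largest value first, so that position 0 corresponds to rank 1.
od1 = OrderedDict(sorted(d1.items(), key=lambda t: t[1], reverse=True))
od2 = OrderedDict(sorted(d2.items(), key=lambda t: t[1], reverse=True))
# list() is needed on Python 3, where keys() returns a view without index().
k1 = list(od1.keys())
k2 = list(od2.keys())
diff = dict((k, n - k2.index(k)) for n, k in enumerate(k1))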
If you don't need them, then Sven's solution is probably faster.
edit: not that much faster, honestly... (sven.py is his second, more efficient version):
$ cat /tmp/mine.py | time python -m timeit
10000000 loops, best of 3: 0.0842 usec per loop
real 0m 3.69s
user 0m 3.38s
sys 0m 0.03s
$ cat /tmp/sven.py | time python -m timeit
10000000 loops, best of 3: 0.085 usec per loop
real 0m 3.86s
user 0m 3.42s
sys 0m 0.03s
If someone wants to post formatted bigger dicts I'll test them too.

What version of python are you using? If 2.7, use OrderedDict.
Per the Python 2.7 docs:
OrderedDict(sorted(d.items(), key=lambda t: t[1]))
If you're using Python 2.4-2.6 you can still use OrderedDict by installing the ordereddict package from PyPI or, if you have setuptools, by running
easy_install ordereddict

A dictionary is not the right data structure to solve this problem. You should convert to sorted lists as soon as possible and produce the dictionary only as the final result. The following sample solution uses iterators and generator expressions where possible, to avoid creating too many (potentially large) helper lists along the way:
# Python 3 version: dict.items() and zip() are already lazy there.
def get_ranking(vals):
    '''Return a list of pairs: (key, ranking), sorted by key.'''
    ranking = sorted(((v, k) for k, v in vals.items()), reverse=True)
    return sorted((k, i) for (i, (_v, k)) in enumerate(ranking))
def ranking_diff(rank1, rank2):
    return dict((k, v1 - v2) for (k, v1), (_, v2) in zip(rank1, rank2))
def get_diffs(dict1, dict2):
    r1 = get_ranking(dict1)
    r2 = get_ranking(dict2)
    return ranking_diff(r1, r2)
print(get_diffs(dict1, dict2))
# prints: {'a': -2, 'b': 2, 'c': 1, 'd': -2, 'e': 1}
Please note that this solution assumes that both dicts contain exactly the same keys.

Related

Python reverse dictionary items order

Assume I have a dictionary:
d = {3: 'three', 2: 'two', 1: 'one'}
I want to rearrange the order of this dictionary so that the dictionary is:
d = {1: 'one', 2: 'two', 3: 'three'}
I was thinking something like the reverse() function for lists, but that did not work. Thanks in advance for your answers!

In Python 3.8 and above, the items view is iterable in reverse, so you can just do:
d = dict(reversed(d.items()))
On 3.7 and 3.6, they hadn't gotten around to implementing __reversed__ on dict and dict views (issue33462: reversible dict), so use an intermediate list or tuple, which do support reversed iteration:
d = {3: 'three', 2: 'two', 1: 'one'}
d = dict(reversed(list(d.items())))
Pre-3.6, you'd need collections.OrderedDict (both for the input and the output) to achieve the desired result. Plain dicts did not preserve any order until CPython 3.6 (as an implementation detail) and Python 3.7 (as a language guarantee).

Standard Python dictionaries (before Python 3.6) don't have an order and don't guarantee one. This is exactly what OrderedDict was created for.
If your dictionary were an OrderedDict, you could reverse it via:
import collections
mydict = collections.OrderedDict()
mydict['1'] = 'one'
mydict['2'] = 'two'
mydict['3'] = 'three'
collections.OrderedDict(reversed(list(mydict.items())))

Another straightforward solution, which is guaranteed to work for Python v3.7 and over:
d = {'A':'a', 'B':'b', 'C':'c', 'D':'d'}
dr = {k: d[k] for k in reversed(d)}
print(dr)
Output:
{'D': 'd', 'C': 'c', 'B': 'b', 'A': 'a'}
Note that reversed dictionaries are still considered equal to their unreversed originals, i.e.:
(d == dr) == True
In response to someone upvoting this comment, I was curious to see which solution is actually faster.
As usual, it depends. Reversing a 10,000 item dictionary 10,000 times is faster with the solution using list and reversed on the items. But reversing a 1,000,000 item dictionary 100 times (i.e. the same total number of items reversed, just a bigger starting dictionary) is faster with the comprehension; it's left up to the reader to find the exact point where it flips. If you deal with large dictionaries and performance matters, you may want to benchmark both:
from random import randint
from timeit import timeit
def f1(d):
    return dict(reversed(list(d.items())))
def f2(d):
    return {k: d[k] for k in reversed(d)}
def compare(n):
    d = {i: randint(1, 100) for i in range(n)}
    print(timeit(lambda: f1(d), number=100000000 // n))
    print(timeit(lambda: f2(d), number=100000000 // n))
compare(10000)
compare(1000000)
Results (one run, typical results):
4.1554735
4.7047593
8.750093200000002
6.7306311

Python - Find average in dict elements

I have a list of dicts like:
dict = [{'a':2, 'b':3}, {'b':4}, {'a':1, 'c':5}]
I need to get the average for each distinct key. The result should look like:
avg = [{'a':1.5, 'b':3.5, 'c':5}]
I can get the sum for each key, but I can't work out how to count the occurrences of each key in order to compute the average.

This can be easily done with pandas:
>>> import pandas
>>> df = pandas.DataFrame([{'a':2, 'b':3}, {'b':4}, {'a':1, 'c':5}])
>>> df.mean()
a 1.5
b 3.5
c 5.0
dtype: float64
If you need a dictionary as result:
>>> dict(df.mean())
{'a': 1.5, 'b': 3.5, 'c': 5.0}

You could create an intermediate dictionary that collects all encountered values as lists:
dct = [{'a':2, 'b':3}, {'b':4}, {'a':1, 'c':5}]
from collections import defaultdict
intermediate = defaultdict(list)
for subdict in dct:
    for key, value in subdict.items():
        intermediate[key].append(value)
# intermediate is now: defaultdict(list, {'a': [2, 1], 'b': [3, 4], 'c': [5]})
And finally calculate the average by dividing the sum of each list by the length of each list:
for key, value in intermediate.items():
    print(key, sum(value)/len(value))
which prints:
b 3.5
c 5.0
a 1.5

You can use a for loop with a counter and then divide the sum of each by the counter.
Also it is weird you are calling the array/list a dict...
I'd suggest something like this:
Create a new dict:
letter_count = {}
-For loop over the current dicts
-Add the letter to the letter count if it doesn't exist
-If it does exist, update the value with the value of the item (+=number) as well as update the counter by one
-Once the for loop is done, divide each value by the counter
-Return the new dict letter_count
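The steps above can be sketched roughly as follows. This is only one way to read them: the function name `average_by_key` and the two helper dicts `totals` and `counts` are hypothetical (the answer itself only names `letter_count`), and the sketch assumes Python 3 division.

```python
def average_by_key(dicts):
    """Average the values seen for each key across a list of dicts."""
    totals = {}   # key -> running sum of values
    counts = {}   # key -> number of times the key was seen
    for d in dicts:
        for key, value in d.items():
            totals[key] = totals.get(key, 0) + value
            counts[key] = counts.get(key, 0) + 1
    # Divide each sum by its own counter.
    return {key: totals[key] / counts[key] for key in totals}


data = [{'a': 2, 'b': 3}, {'b': 4}, {'a': 1, 'c': 5}]
averages = average_by_key(data)
print(averages)
```

Keeping one counter per key matters here, because not every dict in the list contains every key.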

I thought of adding a unique answer using PyFunctional
from functional import seq
l = [{'a':2, 'b':3}, {'b':4}, {'a':1, 'c':5}]
a = (seq(l)
     # convert each dictionary to a sequence of (key, value) pairs
     .map(lambda d: seq(d).map(lambda k: (k, d[k])))
     .flatten()
     # append 1 for counter
     # (lambdas index into the pair: tuple-unpacking parameters were removed in Python 3)
     .map(lambda kv: (kv[0], (kv[1], 1)))
     # sum of values, and counts
     .reduce_by_key(lambda a, b: (a[0]+b[0], a[1]+b[1]))
     # average
     .map(lambda kv: (kv[0], float(kv[1][0])/kv[1][1]))
     # convert to dict
     .to_dict()
     )
print(a)
Output
{'a': 1.5, 'c': 5.0, 'b': 3.5}

Python get remaining runoff voting

I am a little stuck on writing a function for a project. This function takes a dictionary of candidates whose values are the number of votes they received. I then have to return a set containing the remaining candidates. In other words, the candidate with the fewest votes should not be in the returned set, and if, for example, all of the candidates have the same number of votes, the set should be empty. I am having trouble getting started here.
For example I know I can sort the dictionary like so:
x = min(canadites, key=canadites.__getitem__)
but that will not work if the candidates have the same value, as it just pops up the last one in the dict.
Any ideas?
Update: To make things clear.
Lets say I have the following dictionary:
canadites = {'X':22,'Y':1, 'Z':0}
Ideally the function should return a set containing only X and Y. But if Y and Z were both 1,
x = min(canadites, key=canadites.__getitem__)
seems to only return Z

It's cleaner to create a new dict instead of popping items from the old one:
>>> d = {'a':1, 'b':2, 'c':1, 'd':3}
>>> min_val = min(d.values())
>>> {k:v for k,v in d.items() if v > min_val}
{'b': 2, 'd': 3}
In python2, itervalues and iteritems would be more efficient, although this is a micro-optimization in most cases.
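Since the question asks for a set rather than a dict, the same idea can be sketched as a small function. The name `remaining_candidates` follows the question's wording; returning an empty set for an empty input is an assumption, as the question does not cover that case.

```python
def remaining_candidates(votes):
    """Return the set of candidates who received more than the fewest votes."""
    if not votes:
        # Assumption: no candidates in, no candidates out.
        return set()
    fewest = min(votes.values())
    # Everyone tied for last place is dropped; a complete tie leaves nobody.
    return {name for name, count in votes.items() if count > fewest}


print(remaining_candidates({'X': 22, 'Y': 1, 'Z': 0}))
print(remaining_candidates({'X': 22, 'Y': 1, 'Z': 1}))
print(remaining_candidates({'X': 5, 'Y': 5, 'Z': 5}))
```

Comparing against the minimum value, rather than looking up the single key that `min` returns, is what makes ties behave as the question requires.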

python dict: get vs setdefault

The following two expressions seem equivalent to me. Which one is preferable?
data = [('a', 1), ('b', 1), ('b', 2)]
d1 = {}
d2 = {}
for key, val in data:
    # variant 1)
    d1[key] = d1.get(key, []) + [val]
    # variant 2)
    d2.setdefault(key, []).append(val)
The results are the same but which version is better or rather more pythonic?
Personally I find version 2 harder to understand, as to me setdefault is very tricky to grasp. If I understand correctly, it looks for the value of "key" in the dictionary, if not available, enters "[]" into the dict, returns a reference to either the value or "[]" and appends "val" to that reference. While certainly smooth it is not intuitive in the least (at least to me).
To my mind, version 1 is easier to understand (if available, get the value for "key", if not, get "[]", then join with a list made up from [val] and place the result in "key"). But while more intuitive to understand, I fear this version is less performant, with all this list creating. Another disadvantage is that "d1" occurs twice in the expression which is rather error-prone. Probably there is a better implementation using get, but presently it eludes me.
My guess is that version 2, although more difficult to grasp for the inexperienced, is faster and therefore preferable. Opinions?

Your two examples do the same thing, but that doesn't mean get and setdefault do.
The difference between the two is basically manually setting d[key] to point to the list every time, versus setdefault automatically setting d[key] to the list only when it's unset.
Making the two methods as similar as possible, I ran
from timeit import timeit
print timeit("c = d.get(0, []); c.extend([1]); d[0] = c", "d = {1: []}", number = 1000000)
print timeit("c = d.get(1, []); c.extend([1]); d[0] = c", "d = {1: []}", number = 1000000)
print timeit("d.setdefault(0, []).extend([1])", "d = {1: []}", number = 1000000)
print timeit("d.setdefault(1, []).extend([1])", "d = {1: []}", number = 1000000)
and got
0.794723378711
0.811882272256
0.724429205999
0.722129751973
So setdefault is around 10% faster than get for this purpose.
The get method allows you to do less than you can with setdefault. You can use it to avoid getting a KeyError when the key doesn't exist (if that's something that's going to happen frequently) even if you don't want to set the key.
See Use cases for the 'setdefault' dict method and dict.get() method returns a pointer for some more info about the two methods.
The thread about setdefault concludes that most of the time, you want to use a defaultdict. The thread about get concludes that it is slow, and often you're better off (speed wise) doing a double lookup, using a defaultdict, or handling the error (depending on the size of the dictionary and your use case).

The accepted answer from agf isn't comparing like with like. After:
print timeit("d[0] = d.get(0, []) + [1]", "d = {1: []}", number = 10000)
d[0] contains a list with 10,000 items whereas after:
print timeit("d.setdefault(0, []) + [1]", "d = {1: []}", number = 10000)
d[0] is simply []. i.e. the d.setdefault version never modifies the list stored in d. The code should actually be:
print timeit("d.setdefault(0, []).append(1)", "d = {1: []}", number = 10000)
and in fact is faster than the faulty setdefault example.
The difference here really is because, when you append using concatenation, the whole list is copied every time (and once you have 10,000 elements that starts to become measurable). Using append, the list updates are amortised O(1), i.e. effectively constant time.
Finally, there are two other options not considered in the original question: defaultdict or simply testing the dictionary to see whether it already contains the key.
So, assuming d3, d4 = defaultdict(list), {}
# variant 1 (0.39)
d1[key] = d1.get(key, []) + [val]
# variant 2 (0.003)
d2.setdefault(key, []).append(val)
# variant 3 (0.0017)
d3[key].append(val)
# variant 4 (0.002)
if key in d4:
    d4[key].append(val)
else:
    d4[key] = [val]
variant 1 is by far the slowest because it copies the list every time, variant 2 is the second slowest, variant 3 is the fastest but won't work if you need Python older than 2.5, and variant 4 is just slightly slower than variant 3.
I would say use variant 3 if you can, with variant 4 as an option for those occasional places where defaultdict isn't an exact fit. Avoid both of your original variants.
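To reproduce a comparison like this on your own machine, a small harness along the following lines may help. It is a sketch, not the code behind the figures above: the test data, the function names `variant1` to `variant4`, and the repeat count are all assumptions, and absolute timings will differ by interpreter and hardware.

```python
from collections import defaultdict
from timeit import timeit

# Hypothetical test data: 10,000 pairs spread over 100 keys.
data = [(i % 100, i) for i in range(10_000)]


def variant1():
    d = {}
    for key, val in data:
        d[key] = d.get(key, []) + [val]  # builds a new list on every step
    return d


def variant2():
    d = {}
    for key, val in data:
        d.setdefault(key, []).append(val)
    return d


def variant3():
    d = defaultdict(list)
    for key, val in data:
        d[key].append(val)
    return d


def variant4():
    d = {}
    for key, val in data:
        if key in d:
            d[key].append(val)
        else:
            d[key] = [val]
    return d


for fn in (variant1, variant2, variant3, variant4):
    # Each call starts from an empty dict, so the runs are comparable.
    print(fn.__name__, timeit(fn, number=20))
```

Starting every run from a fresh dictionary avoids the pitfall described earlier in this answer, where one statement keeps growing a list across iterations and the other does not.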

For those who are still struggling to understand these two methods, here is the basic difference between get() and setdefault():
Scenario-1
root = {}
root.setdefault('A', [])
print(root)
Scenario-2
root = {}
root.get('A', [])
print(root)
In Scenario-1 the output will be {'A': []}, while in Scenario-2 it will be {}.
So setdefault() inserts absent keys into the dict, while get() only returns the default value and does not modify the dictionary.
Now let's come to where this is useful.
Suppose you are looking up a key in a dict whose values are lists, and you want to extend the list if the key is found, or otherwise create the key with that list.
using setdefault()
def fn1(dic, key, lst):
    dic.setdefault(key, []).extend(lst)
using get()
def fn2(dic, key, lst):
    dic[key] = dic.get(key, []) + (lst) #Explicit assigning happening here
Now let's examine timings:
dic = {}
%%timeit -n 10000 -r 4
fn1(dic, 'A', [1,2,3])
Took 288 ns
dic = {}
%%timeit -n 10000 -r 4
fn2(dic, 'A', [1,2,3])
Took 128 s
So there is a very large timing difference between these two approaches.

You might want to look at defaultdict in the collections module. The following is equivalent to your examples.
from collections import defaultdict
data = [('a', 1), ('b', 1), ('b', 2)]
d = defaultdict(list)
for k, v in data:
    d[k].append(v)
There's more here.

1. Explained with a good example here:
http://code.activestate.com/recipes/66516-add-an-entry-to-a-dictionary-unless-the-entry-is-a/
dict.setdefault typical usage
somedict.setdefault(somekey,[]).append(somevalue)
dict.get typical usage
theIndex[word] = 1 + theIndex.get(word,0)
2. More explanation : http://python.net/~goodger/projects/pycon/2007/idiomatic/handout.html
dict.setdefault() is equivalent to get or set & get. Or set if necessary then get. It's especially efficient if your dictionary key is expensive to compute or long to type.
The only problem with dict.setdefault() is that the default value is always evaluated, whether needed or not. That only matters if the default value is expensive to compute. In that case, use defaultdict.
3. Finally the official docs with difference highlighted http://docs.python.org/2/library/stdtypes.html
get(key[, default])
Return the value for key if key is in the dictionary, else default. If
default is not given, it defaults to None, so that this method never
raises a KeyError.
setdefault(key[, default])
If key is in the dictionary, return its value. If not, insert key with a value of default and return default. default defaults to None.
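The point in item 2 about the default always being evaluated can be seen in a short sketch. The function `expensive_default` and the `calls` list are hypothetical stand-ins used only to count how often the default is built.

```python
from collections import defaultdict

calls = []


def expensive_default():
    """Hypothetical stand-in for a costly default; records each call."""
    calls.append(1)
    return []


d = {'a': [1]}
# The argument is evaluated before setdefault runs, even though 'a' exists.
d.setdefault('a', expensive_default()).append(2)
print(len(calls))

dd = defaultdict(expensive_default)
dd['a'] = [1]
dd['a'].append(2)   # key present: the factory is not called
dd['b'].append(3)   # key missing: the factory is called once
print(len(calls))
```

With `defaultdict` the factory is passed as a callable and is only invoked on a missing key, whereas `setdefault` receives an already-built value as an ordinary argument.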

The logic of dict.get is:
if key in a_dict:
    value = a_dict[key]
else:
    value = default_value
Take an example:
In [72]: a_dict = {'mapping':['dict', 'OrderedDict'], 'array':['list', 'tuple']}
In [73]: a_dict.get('string', ['str', 'bytes'])
Out[73]: ['str', 'bytes']
In [74]: a_dict.get('array', ['str', 'bytes'])
Out[74]: ['list', 'tuple']
The mechanism of setdefault is:
levels = ['master', 'manager', 'salesman', 'accountant', 'assistant']
#group them by the leading letter
group_by_leading_letter = {}
# the logic expressed by obvious if condition
for level in levels:
    leading_letter = level[0]
    if leading_letter not in group_by_leading_letter:
        group_by_leading_letter[leading_letter] = [level]
    else:
        group_by_leading_letter[leading_letter].append(level)
In [80]: group_by_leading_letter
Out[80]: {'a': ['accountant', 'assistant'], 'm': ['master', 'manager'], 's': ['salesman']}
The setdefault dict method is for precisely this purpose. The preceding for loop can be rewritten as:
In [87]: for level in levels:
    ...:     leading = level[0]
    ...:     group_by_leading_letter.setdefault(leading, []).append(level)
In [88]: group_by_leading_letter
Out[88]: {'a': ['accountant', 'assistant'], 'm': ['master', 'manager'], 's': ['salesman']}
It's very simple: the element is appended either to the list already stored under the key or to a newly inserted empty list.
defaultdict makes this even easier. To create one, you pass a type or function for generating the default value for each slot in the dict:
from collections import defaultdict
group_by_leading_letter = defaultdict(list)
for level in levels:
    group_by_leading_letter[level[0]].append(level)

There is no strict answer to this question. They both accomplish the same purpose. They can both be used to deal with missing values on keys. The only difference that I have found is that with setdefault(), the key that you invoke (if not previously in the dictionary) gets automatically inserted while it does not happen with get(). Here is an example:
Setdefault()
>>> myDict = {'A': 'GOD', 'B':'Is', 'C':'GOOD'} #(1)
>>> myDict.setdefault('C') #(2)
'GOOD'
>>> myDict.setdefault('C','GREAT') #(3)
'GOOD'
>>> myDict.setdefault('D','AWESOME') #(4)
'AWESOME'
>>> myDict #(5)
{'A': 'GOD', 'B': 'Is', 'C': 'GOOD', 'D': 'AWESOME'}
>>> myDict.setdefault('E')
>>>
Get()
>>> myDict = {'a': 1, 'b': 2, 'c': 3} #(1)
>>> myDict.get('a',0) #(2)
1
>>> myDict.get('d',0) #(3)
0
>>> myDict #(4)
{'a': 1, 'b': 2, 'c': 3}
Here is my conclusion: there is no single answer as to which one is best when it comes to supplying default values. The only difference is that setdefault() automatically adds any new key with a default value to the dictionary, while get() does not.

In [1]: person_dict = {}
In [2]: person_dict['liqi'] = 'LiQi'
In [3]: person_dict.setdefault('liqi', 'Liqi')
Out[3]: 'LiQi'
In [4]: person_dict.setdefault('Kim', 'kim')
Out[4]: 'kim'
In [5]: person_dict
Out[5]: {'Kim': 'kim', 'liqi': 'LiQi'}
In [8]: person_dict.get('Dim', '')
Out[8]: ''
In [5]: person_dict
Out[5]: {'Kim': 'kim', 'liqi': 'LiQi'}

Python: How to construct a tuple, value dictionary from a list of key,value dictionary?

I have a list of dicts as follows:
lst = [{'unitname':'unit1', 'test1': 2, 'test2': 9}, {'unitname':'unit2', 'test1': 24, 'test2': 35}]
How do I construct a single dict as follows:
dictA = { ('unit1','test1'): 2, ('unit1','test2'): 9, ('unit2','test1'): 24, ('unit2','test2'): 35 }
I have all the unit names & test names in a list:
unitnames = ['unit1','unit2']
testnames = ['test1','test2']
I tried the following, but it missed some tests for some units:
dictA = {}
for unit in unitnames:
    for dict in lst:
        for k,v in dict.items():
            dictA[unit,k] = v
Any advice? Thanks.

dict(((d['unitname'], k), t)
     for d in lst
     for (k, t) in d.iteritems()
     if k != 'unitname')

You could try:
dictA = {}
for l in lst:
    name = l.pop('unitname')
    for test in l:
        dictA[name, test] = l[test]
Posted at the same time and with the same assumptions as Gareth's solution - however this will not give you the extra item of (name, 'unitname') = name
Marcelo Cantos's solution is quite elegant, but would be easier for mere mortals like us to parse like this:
dict( ((d['unitname'], k), t)
      for d in lst
      for (k, t) in d.iteritems()
      if k != 'unitname'
    )

dictA = {}
for d in lst:
    unit = d['unitname']
    for test in testnames:
        if test in d:
            dictA[unit,test] = d[test]
I'm assuming (1) that all the dicts in your list have a unitname key, (2) that its value is always one of the units you're interested in, (3) that some dicts in the list may have entries for tests you aren't interested in, and (4) that some tests you're interested in may be absent from some dicts in the list. Those assumptions are a bit arbitrary; if any happen to be wrong it shouldn't be hard to adjust the code for them.
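On Python 3, where `iteritems()` no longer exists, the same result can be sketched with a dict comprehension. This version assumes every dict has a `'unitname'` key and that every other entry is a test you want to keep; filter on `testnames` instead if that second assumption does not hold.

```python
lst = [
    {'unitname': 'unit1', 'test1': 2, 'test2': 9},
    {'unitname': 'unit2', 'test1': 24, 'test2': 35},
]

# Key each result by (unit name, test name); skip the 'unitname' entry itself.
dictA = {
    (d['unitname'], test): result
    for d in lst
    for test, result in d.items()
    if test != 'unitname'
}
print(dictA)
```

Unlike the `pop`-based loop, this leaves the dicts in `lst` unmodified.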
