Related
I'm working on python 3.2.2.
Breaking my head more than 3 hours to sort a dictionary by it's keys.
I managed to make it a sorted list with 2 argument members, but can not make it a sorted dictionary in the end.
This is what I've figured:
myDic={10: 'b', 3:'a', 5:'c'}
sorted_list=sorted(myDic.items(), key=lambda x: x[0])
But no matter what I can not make a dictionary out of this sorted list. How do I do that? Thanks!
A modern and fast solution, for Python 3.7. May also work in some interpreters of Python 3.6.
TLDR
To sort a dictionary by keys use:
sorted_dict = {k: disordered[k] for k in sorted(disordered)}
Almost three times faster than the accepted answer; probably more when you include imports.
Comment on the accepted answer
The example in the accepted answer instead of iterating over the keys only - with key parameter of sorted() or the default behaviour of dict iteration - iterates over tuples (key, value), which suprisingly turns out to be much slower than comparing the keys only and accessing dictionary elements in a list comprehension.
How to sort by key in Python 3.7
The big change in Python 3.7 is that the dictionaries are now ordered by default.
You can generate sorted dict using dict comprehensions.
Using OrderedDict might still be preferable for the compatibility sake.
Do not use sorted(d.items()) without key.
See:
disordered = {10: 'b', 3: 'a', 5: 'c'}
# sort keys, then get values from original - fast
sorted_dict = {k: disordered[k] for k in sorted(disordered)}
# key = itemgetter - slower
from operator import itemgetter
key = itemgetter(0)
sorted_dict = {k: v for k, v in sorted(disordered.items(), key=key)}
# key = lambda - the slowest
key = lambda item: item[0]
sorted_dict = {k: v for k in sorted(disordered.items(), key=key)}
Timing results:
Best for {k: d[k] for k in sorted(d)}: 7.507327548999456
Best for {k: v for k, v in sorted(d.items(), key=key_getter)}: 12.031082626002899
Best for {k: v for k, v in sorted(d.items(), key=key_lambda)}: 14.22885995300021
Best for dict(sorted(d.items(), key=key_getter)): 11.209122000000207
Best for dict(sorted(d.items(), key=key_lambda)): 13.289728325995384
Best for dict(sorted(d.items())): 14.231471302999125
Best for OrderedDict(sorted(d.items(), key=key_getter)): 16.609151654003654
Best for OrderedDict(sorted(d.items(), key=key_lambda)): 18.52622927199991
Best for OrderedDict(sorted(d.items())): 19.436101284998585
Testing code:
from timeit import repeat
setup_code = """
from operator import itemgetter
from collections import OrderedDict
import random
random.seed(0)
d = {i: chr(i) for i in [random.randint(0, 120) for repeat in range(120)]}
key_getter = itemgetter(0)
key_lambda = lambda item: item[0]
"""
cases = [
# fast
'{k: d[k] for k in sorted(d)}',
'{k: v for k, v in sorted(d.items(), key=key_getter)}',
'{k: v for k, v in sorted(d.items(), key=key_lambda)}',
# slower
'dict(sorted(d.items(), key=key_getter))',
'dict(sorted(d.items(), key=key_lambda))',
'dict(sorted(d.items()))',
# the slowest
'OrderedDict(sorted(d.items(), key=key_getter))',
'OrderedDict(sorted(d.items(), key=key_lambda))',
'OrderedDict(sorted(d.items()))',
]
for code in cases:
times = repeat(code, setup=setup_code, repeat=3)
print(f"Best for {code}: {min(times)}")
dict does not keep its elements' order. What you need is an OrderedDict: http://docs.python.org/library/collections.html#collections.OrderedDict
edit
Usage example:
>>> from collections import OrderedDict
>>> a = {'foo': 1, 'bar': 2}
>>> a
{'foo': 1, 'bar': 2}
>>> b = OrderedDict(sorted(a.items()))
>>> b
OrderedDict([('bar', 2), ('foo', 1)])
>>> b['foo']
1
>>> b['bar']
2
I don't think you want an OrderedDict. It sounds like you'd prefer a SortedDict, that is a dict that maintains its keys in sorted order. The sortedcontainers module provides just such a data type. It's written in pure-Python, fast-as-C implementations, has 100% coverage and hours of stress.
Installation is easy with pip:
pip install sortedcontainers
Note that if you can't pip install then you can simply pull the source files from the open-source repository.
Then you're code is simply:
from sortedcontainers import SortedDict
myDic = SortedDict({10: 'b', 3:'a', 5:'c'})
sorted_list = list(myDic.keys())
The sortedcontainers module also maintains a performance comparison with other popular implementations.
Python's ordinary dicts cannot be made to provide the keys/elements in any specific order. For that, you could use the OrderedDict type from the collections module. Note that the OrderedDict type merely keeps a record of insertion order. You would have to sort the entries prior to initializing the dictionary if you want subsequent views/iterators to return the elements in order every time. For example:
>>> myDic={10: 'b', 3:'a', 5:'c'}
>>> sorted_list=sorted(myDic.items(), key=lambda x: x[0])
>>> myOrdDic = OrderedDict(sorted_list)
>>> myOrdDic.items()
[(3, 'a'), (5, 'c'), (10, 'b')]
>>> myOrdDic[7] = 'd'
>>> myOrdDic.items()
[(3, 'a'), (5, 'c'), (10, 'b'), (7, 'd')]
If you want to maintain proper ordering for newly added items, you really need to use a different data structure, e.g., a binary tree/heap. This approach of building a sorted list and using it to initialize a new OrderedDict() instance is just woefully inefficient unless your data is completely static.
Edit: So, if the object of sorting the data is merely to print it in order, in a format resembling a python dict object, something like the following should suffice:
def pprint_dict(d):
strings = []
for k in sorted(d.iterkeys()):
strings.append("%d: '%s'" % (k, d[k]))
return '{' + ', '.join(strings) + '}'
Note that this function is not flexible w/r/t the types of the key, value pairs (i.e., it expects the keys to be integers and the corresponding values to be strings). If you need more flexibility, use something like strings.append("%s: %s" % (repr(k), repr(d[k]))) instead.
With Python 3.7 I could do this:
>>> myDic={10: 'b', 3:'a', 5:'c'}
>>> sortDic = sorted(myDic.items())
>>> print(dict(sortDic))
{3:'a', 5:'c', 10: 'b'}
If you want a list of tuples:
>>> myDic={10: 'b', 3:'a', 5:'c'}
>>> sortDic = sorted(myDic.items())
>>> print(sortDic)
[(3, 'a'), (5, 'c'), (10, 'b')]
Dictionaries are unordered by definition, What would be the main reason for ordering by key? A list of tuples created by the sort method can be used for whatever the need may have been, but changing the list of tuples back into a dictionary will return a random order
>>> myDic
{10: 'b', 3: 'a', 5: 'c'}
>>> sorted(myDic.items())
[(3, 'a'), (5, 'c'), (10, 'b')]
>>> print(dict(myDic.items()))
{10: 'b', 3: 'a', 5: 'c'}
Maybe not that good but I've figured this:
def order_dic(dic):
ordered_dic={}
key_ls=sorted(dic.keys())
for key in key_ls:
ordered_dic[key]=dic[key]
return ordered_dic
Any modern solution to this problem?
I worked around it with:
order = sorted([ job['priority'] for job in self.joblist ])
sorted_joblist = []
while order:
min_priority = min(order)
for job in self.joblist:
if job['priority'] == min_priority:
sorted_joblist += [ job ]
order.remove(min_priority)
self.joblist = sorted_joblist
The joblist is formatted as:
joblist = [ { 'priority' : 3, 'name' : 'foo', ... }, { 'priority' : 1, 'name' : 'bar', ... } ]
Basically I create a list (order) with all the elements by which I want to sort the dict
then I iterate this list and the dict, when I find the item on the dict I send it to a new dict and remove the item from 'order'.
Seems to be working, but I suppose there are better solutions.
I'm not sure whether this could help, but I had a similar problem and I managed to solve it, by defining an apposite function:
def sor_dic_key(diction):
lista = []
diction2 = {}
for x in diction:
lista.append([x, diction[x]])
lista.sort(key=lambda x: x[0])
for l in lista:
diction2[l[0]] = l[1]
return diction2
This function returns another dictionary with the same keys and relative values, but sorted by its keys.
Similarly, I defined a function that could sort a dictionary by its values. I just needed to use x[1] instead of x[0] in the lambda function. I find this second function mostly useless, but one never can tell!
I like python numpy for this kind of stuff! eg:
r=readData()
nsorted = np.lexsort((r.calls, r.slow_requests, r.very_slow_requests, r.stalled_requests))
I have an example of importing CSV data into a numpy and ordering by column priorities.
https://github.com/unixunion/toolbox/blob/master/python/csv-numpy.py
Kegan
The accepted answer definitely works, but somehow miss an important point.
The OP is asking for a dictionary sorted by it's keys this is just not really possible and not what OrderedDict is doing.
OrderedDict is maintaining the content of the dictionary in insertion order. First item inserted, second item inserted, etc.
>>> d = OrderedDict()
>>> d['foo'] = 1
>>> d['bar'] = 2
>>> d
OrderedDict([('foo', 1), ('bar', 2)])
>>> d = OrderedDict()
>>> d['bar'] = 2
>>> d['foo'] = 1
>>> d
OrderedDict([('bar', 2), ('foo', 1)])
Hencefore I won't really be able to sort the dictionary inplace, but merely to create a new dictionary where insertion order match key order. This is explicit in the accepted answer where the new dictionary is b.
This may be important if you are keeping access to dictionaries through containers. This is also important if you itend to change the dictionary later by adding or removing items: they won't be inserted in key order but at the end of dictionary.
>>> d = OrderedDict({'foo': 5, 'bar': 8})
>>> d
OrderedDict([('foo', 5), ('bar', 8)])
>>> d['alpha'] = 2
>>> d
OrderedDict([('foo', 5), ('bar', 8), ('alpha', 2)])
Now, what does mean having a dictionary sorted by it's keys ? That makes no difference when accessing elements by keys, this only matter when you are iterating over items. Making that a property of the dictionary itself seems like overkill. In many cases it's enough to sort keys() when iterating.
That means that it's equivalent to do:
>>> d = {'foo': 5, 'bar': 8}
>>> for k,v in d.iteritems(): print k, v
on an hypothetical sorted by key dictionary or:
>>> d = {'foo': 5, 'bar': 8}
>>> for k, v in iter((k, d[k]) for k in sorted(d.keys())): print k, v
Of course it is not hard to wrap that behavior in an object by overloading iterators and maintaining a sorted keys list. But it is likely overkill.
Sorting dictionaries by value using comprehensions. I think it's nice as 1 line and no need for functions or lambdas
a = {'b':'foo', 'c':'bar', 'e': 'baz'}
a = {f:a[f] for f in sorted(a, key=a.__getitem__)}
Easy and straightforward way:
op = {'1': (1,0,6),'3': (0,45,8),'2': (2,34,10)}
lp3 = sorted(op.items(), key=operator.itemgetter(0), reverse=True)
print(lp3)
ref: https://blog.csdn.net/weixin_37922873/article/details/81210032
I am using Python 2.7.x. I have a dictionary (I mean {}), key is int and value is string. I want to retrieve the key which has the minimal integer value. In C++, I think we can use map, which sort keys. And in Python, not sure if anything similar we can leverage? If my understanding is correct, Python dictionary (I mean {}) is not sorted by key.
thanks in advance,
Lin
Update
The OP has expressed a need for O(1) performance when finding the minimum key in a dictionary. Try the sortedcontainers module. It offers a SortedDict class:
>>> from sortedcontainers import SortedDict
>>> d = SortedDict({100: 'a', 27: 'b', 1234: 'c'})
>>> d.keys()
SortedSet([27, 100, 1234], key=None, load=1000)
>>> d.keys()[0]
27
>>> d[d.keys()[0]]
'b'
For a Python builtin dictionary you can use min(d) to find the lowest key:
>>> d = {100: 'a', 27: 'b', 1234: 'c'}
>>> print(d)
{1234: 'c', 27: 'b', 100: 'a'}
>>> print(min(d))
27
>>> print(d[min(d)])
b
In Python, dictionaries are represented internally by hash tables so you cannot natively get back the keys in sorted order. You can use sorted(d.keys()) to return a list of keys in sorted order. You can also use collections.OrderedDict if you pre-sort the keys. If you do not know the order of the keys ahead of time and you need to maintain the keys in sorted order as you insert values, you could take a look at this library for the SortedDict type.
There are binary tree implementations in Python:
https://pypi.python.org/pypi/bintrees/2.0.2
You can also use the blist.sorteddict.
You could also use the built-in bisect module to maintain a sorted list of tuples.
If all you care about is finding the minimum and you don't actually care about maintaining a sort, you can use the built-in heapq module to maintain a min-heap data structure which gives you constant time access to the minimal element.
Building off of #mhawke 's answer, you can simply do:
d = {1 : "a", 2 : "b", 3 : "c" }
print d[min(d.keys())] # => 'a'
Is it possible to get a partial view of a dict in Python analogous of pandas df.tail()/df.head(). Say you have a very long dict, and you just want to check some of the elements (the beginning, the end, etc) of the dict. Something like:
dict.head(3) # To see the first 3 elements of the dictionary.
{[1,2], [2, 3], [3, 4]}
Thanks
Kinda strange desire, but you can get that by using this
from itertools import islice
# Python 2.x
dict(islice(mydict.iteritems(), 0, 2))
# Python 3.x
dict(islice(mydict.items(), 0, 2))
or for short dictionaries
# Python 2.x
dict(mydict.items()[0:2])
# Python 3.x
dict(list(mydict.items())[0:2])
Edit:
in Python 3.x:
Without using libraries it's possible to do it this way. Use method:
.items()
which returns a list of dictionary keys with values.
It's necessary to convert it to a list otherwise an error will occur 'my_dict' object is not subscriptable. Then convert it to the dictionary. Now it's ready to slice with square brackets.
dict(list(my_dict.items())[:3])
import itertools
def glance(d):
return dict(itertools.islice(d.iteritems(), 3))
>>> x = {1:2, 3:4, 5:6, 7:8, 9:10, 11:12}
>>> glance(x)
{1: 2, 3: 4, 5: 6}
However:
>>> x['a'] = 2
>>> glance(x)
{1: 2, 3: 4, u'a': 2}
Notice that inserting a new element changed what the "first" three elements were in an unpredictable way. This is what people mean when they tell you dicts aren't ordered. You can get three elements if you want, but you can't know which three they'll be.
I know this question is 3 years old but here a pythonic version (maybe simpler than the above methods) for Python 3.*:
[print(v) for i, v in enumerate(my_dict.items()) if i < n]
It will print the first n elements of the dictionary my_dict
one-up-ing #Neb's solution with Python 3 dict comprehension:
{k: v for i, (k, v) in enumerate(my_dict.items()) if i < n}
It returns a dict rather than printouts
For those who would rather solve this problem with pandas dataframes. Just stuff your dictionary mydict into a dataframe, rotate it, and get the first few rows:
pd.DataFrame(mydict, index=[0]).T.head()
0 hi0
1 hi1
2 hi2
3 hi3
4 hi4
From the documentation:
CPython implementation detail: Keys and values are listed in an
arbitrary order which is non-random, varies across Python
implementations, and depends on the dictionary’s history of insertions
and deletions.
I've only toyed around at best with other Python implementations (eg PyPy, IronPython, etc), so I don't know for certain if this is the case in all Python implementations, but the general idea of a dict/hashmap/hash/etc is that the keys are unordered.
That being said, you can use an OrderedDict from the collections library. OrderedDicts remember the order of the keys as you entered them.
If keys are someway sortable, you can do this:
head = dict([(key, myDict[key]) for key in sorted(myDict.keys())[:3]])
Or perhaps:
head = dict(sorted(mydict.items(), key=lambda: x:x[0])[:3])
Where x[0] is the key of each key/value pair.
list(reverse_word_index.items())[:10]
Change the number from 10 to however many items of the dictionary reverse_word_index you want to preview
A quick and short solution can be this:
import pandas as pd
d = {"a": [1,2], "b": [2, 3], "c": [3, 4]}
pd.Series(d).head()
a [1, 2]
b [2, 3]
c [3, 4]
dtype: object
This gives back a dictionary:
dict(list(my_dictname.items())[0:n])
If you just want to have a glance of your dict, then just do:
list(freqs.items())[0:n]
Order of items in a dictionary is preserved in Python 3.7+, so this question makes sense.
To get a dictionary with only 10 items from the start you can use pandas:
d = {"a": [1,2], "b": [2, 3], "c": [3, 4]}
import pandas as pd
result = pd.Series(d).head(10).to_dict()
print(result)
This will produce a new dictionary.
d = {"a": 1,"b": 2,"c": 3}
for i in list(d.items())[:2]:
print('{}:{}'.format(d[i][0], d[i][1]))
a:1
b:2
The following two expressions seem equivalent to me. Which one is preferable?
data = [('a', 1), ('b', 1), ('b', 2)]
d1 = {}
d2 = {}
for key, val in data:
# variant 1)
d1[key] = d1.get(key, []) + [val]
# variant 2)
d2.setdefault(key, []).append(val)
The results are the same but which version is better or rather more pythonic?
Personally I find version 2 harder to understand, as to me setdefault is very tricky to grasp. If I understand correctly, it looks for the value of "key" in the dictionary, if not available, enters "[]" into the dict, returns a reference to either the value or "[]" and appends "val" to that reference. While certainly smooth it is not intuitive in the least (at least to me).
To my mind, version 1 is easier to understand (if available, get the value for "key", if not, get "[]", then join with a list made up from [val] and place the result in "key"). But while more intuitive to understand, I fear this version is less performant, with all this list creating. Another disadvantage is that "d1" occurs twice in the expression which is rather error-prone. Probably there is a better implementation using get, but presently it eludes me.
My guess is that version 2, although more difficult to grasp for the inexperienced, is faster and therefore preferable. Opinions?
Your two examples do the same thing, but that doesn't mean get and setdefault do.
The difference between the two is basically manually setting d[key] to point to the list every time, versus setdefault automatically setting d[key] to the list only when it's unset.
Making the two methods as similar as possible, I ran
from timeit import timeit
print timeit("c = d.get(0, []); c.extend([1]); d[0] = c", "d = {1: []}", number = 1000000)
print timeit("c = d.get(1, []); c.extend([1]); d[0] = c", "d = {1: []}", number = 1000000)
print timeit("d.setdefault(0, []).extend([1])", "d = {1: []}", number = 1000000)
print timeit("d.setdefault(1, []).extend([1])", "d = {1: []}", number = 1000000)
and got
0.794723378711
0.811882272256
0.724429205999
0.722129751973
So setdefault is around 10% faster than get for this purpose.
The get method allows you to do less than you can with setdefault. You can use it to avoid getting a KeyError when the key doesn't exist (if that's something that's going to happen frequently) even if you don't want to set the key.
See Use cases for the 'setdefault' dict method and dict.get() method returns a pointer for some more info about the two methods.
The thread about setdefault concludes that most of the time, you want to use a defaultdict. The thread about get concludes that it is slow, and often you're better off (speed wise) doing a double lookup, using a defaultdict, or handling the error (depending on the size of the dictionary and your use case).
The accepted answer from agf isn't comparing like with like. After:
print timeit("d[0] = d.get(0, []) + [1]", "d = {1: []}", number = 10000)
d[0] contains a list with 10,000 items whereas after:
print timeit("d.setdefault(0, []) + [1]", "d = {1: []}", number = 10000)
d[0] is simply []. i.e. the d.setdefault version never modifies the list stored in d. The code should actually be:
print timeit("d.setdefault(0, []).append(1)", "d = {1: []}", number = 10000)
and in fact is faster than the faulty setdefault example.
The difference here really is because of when you append using concatenation the whole list is copied every time (and once you have 10,000 elements that is beginning to become measurable. Using append the list updates are amortised O(1), i.e. effectively constant time.
Finally, there are two other options not considered in the original question: defaultdict or simply testing the dictionary to see whether it already contains the key.
So, assuming d3, d4 = defaultdict(list), {}
# variant 1 (0.39)
d1[key] = d1.get(key, []) + [val]
# variant 2 (0.003)
d2.setdefault(key, []).append(val)
# variant 3 (0.0017)
d3[key].append(val)
# variant 4 (0.002)
if key in d4:
d4[key].append(val)
else:
d4[key] = [val]
variant 1 is by far the slowest because it copies the list every time, variant 2 is the second slowest, variant 3 is the fastest but won't work if you need Python older than 2.5, and variant 4 is just slightly slower than variant 3.
I would say use variant 3 if you can, with variant 4 as an option for those occasional places where defaultdict isn't an exact fit. Avoid both of your original variants.
For those who are still struggling in understanding these two term, let me tell you basic difference between get() and setdefault() method -
Scenario-1
root = {}
root.setdefault('A', [])
print(root)
Scenario-2
root = {}
root.get('A', [])
print(root)
In Scenario-1 output will be {'A': []} while in Scenario-2 {}
So setdefault() sets absent keys in the dict while get() only provides you default value but it does not modify the dictionary.
Now let come where this will be useful-
Suppose you are searching an element in a dict whose value is a list and you want to modify that list if found otherwise create a new key with that list.
using setdefault()
def fn1(dic, key, lst):
dic.setdefault(key, []).extend(lst)
using get()
def fn2(dic, key, lst):
dic[key] = dic.get(key, []) + (lst) #Explicit assigning happening here
Now lets examine timings -
dic = {}
%%timeit -n 10000 -r 4
fn1(dic, 'A', [1,2,3])
Took 288 ns
dic = {}
%%timeit -n 10000 -r 4
fn2(dic, 'A', [1,2,3])
Took 128 s
So there is a very large timing difference between these two approaches.
You might want to look at defaultdict in the collections module. The following is equivalent to your examples.
from collections import defaultdict
data = [('a', 1), ('b', 1), ('b', 2)]
d = defaultdict(list)
for k, v in data:
d[k].append(v)
There's more here.
1. Explained with a good example here:
http://code.activestate.com/recipes/66516-add-an-entry-to-a-dictionary-unless-the-entry-is-a/
dict.setdefault typical usage
somedict.setdefault(somekey,[]).append(somevalue)
dict.get typical usage
theIndex[word] = 1 + theIndex.get(word,0)
2. More explanation : http://python.net/~goodger/projects/pycon/2007/idiomatic/handout.html
dict.setdefault() is equivalent to get or set & get. Or set if necessary then get. It's especially efficient if your dictionary key is expensive to compute or long to type.
The only problem with dict.setdefault() is that the default value is always evaluated, whether needed or not. That only matters if the default value is expensive to compute. In that case, use defaultdict.
3. Finally the official docs with difference highlighted http://docs.python.org/2/library/stdtypes.html
get(key[, default])
Return the value for key if key is in the dictionary, else default. If
default is not given, it defaults to None, so that this method never
raises a KeyError.
setdefault(key[, default])
If key is in the dictionary, return its value. If not, insert key with a value of default and return default. default defaults to None.
The logic of dict.get is:
if key in a_dict:
value = a_dict[key]
else:
value = default_value
Take an example:
In [72]: a_dict = {'mapping':['dict', 'OrderedDict'], 'array':['list', 'tuple']}
In [73]: a_dict.get('string', ['str', 'bytes'])
Out[73]: ['str', 'bytes']
In [74]: a_dict.get('array', ['str', 'byets'])
Out[74]: ['list', 'tuple']
The mechamism of setdefault is:
levels = ['master', 'manager', 'salesman', 'accountant', 'assistant']
#group them by the leading letter
group_by_leading_letter = {}
# the logic expressed by obvious if condition
for level in levels:
leading_letter = level[0]
if leading_letter not in group_by_leading_letter:
group_by_leading_letter[leading_letter] = [level]
else:
group_by_leading_letter[leading_letter].append(word)
In [80]: group_by_leading_letter
Out[80]: {'a': ['accountant', 'assistant'], 'm': ['master', 'manager'], 's': ['salesman']}
The setdefault dict method is for precisely this purpose. The preceding for loop can be rewritten as:
In [87]: for level in levels:
...: leading = level[0]
...: group_by_leading_letter.setdefault(leading,[]).append(level)
Out[80]: {'a': ['accountant', 'assistant'], 'm': ['master', 'manager'], 's': ['salesman']}
It's very simple, means that either a non-null list append an element or a null list append an element.
The defaultdict, which makes this even easier. To create one, you pass a type or function for generating the default value for each slot in the dict:
from collections import defualtdict
group_by_leading_letter = defaultdict(list)
for level in levels:
group_by_leading_letter[level[0]].append(level)
There is no strict answer to this question. They both accomplish the same purpose. They can both be used to deal with missing values on keys. The only difference that I have found is that with setdefault(), the key that you invoke (if not previously in the dictionary) gets automatically inserted while it does not happen with get(). Here is an example:
Setdefault()
>>> myDict = {'A': 'GOD', 'B':'Is', 'C':'GOOD'} #(1)
>>> myDict.setdefault('C') #(2)
'GOOD'
>>> myDict.setdefault('C','GREAT') #(3)
'GOOD'
>>> myDict.setdefault('D','AWESOME') #(4)
'AWESOME'
>>> myDict #(5)
{'A': 'GOD', 'B': 'Is', 'C': 'GOOD', 'D': 'AWSOME'}
>>> myDict.setdefault('E')
>>>
Get()
>>> myDict = {'a': 1, 'b': 2, 'c': 3} #(1)
>>> myDict.get('a',0) #(2)
1
>>> myDict.get('d',0) #(3)
0
>>> myDict #(4)
{'a': 1, 'b': 2, 'c': 3}
Here is my conclusion: there is no specific answer to which one is best specifically when it comes to default values imputation. The only difference is that setdefault() automatically adds any new key with a default value in the dictionary while get() does not. For more information, please go here !
In [1]: person_dict = {}
In [2]: person_dict['liqi'] = 'LiQi'
In [3]: person_dict.setdefault('liqi', 'Liqi')
Out[3]: 'LiQi'
In [4]: person_dict.setdefault('Kim', 'kim')
Out[4]: 'kim'
In [5]: person_dict
Out[5]: {'Kim': 'kim', 'liqi': 'LiQi'}
In [8]: person_dict.get('Dim', '')
Out[8]: ''
In [5]: person_dict
Out[5]: {'Kim': 'kim', 'liqi': 'LiQi'}
I have 2 dictionaries, dict1 and dict2 which contain the same keys, but different values for the keys. What I want to do is for each dictionary, sort the values from largest to smallest, and then give each value a rank 1-N, 1 being the largest value. From here, I want to get the difference of the ranks for the values in each dictionary for the same key. For example:
dict1 = {a:0.6, b:0.3, c:0.9, d:1.2, e:0.2}
dict2 = {a:1.4, b:7.7, c:9.0, d:2.5, e:2.0}
# sorting by values would look like this:
dict1 = {d:1.2, c:0.9, a:0.6, b:0.3, e:0.2}
dict2 = {c:9.0, b:7.7, d:2.5, e:2.0, a:1.4}
#ranking the values would produce this:
dict1 = {d:1, c:2, a:3, b:4, e:5}
dict2 = {c:1, b:2, d:3, e:4, a:5}
#computing the difference between ranks would be something like this:
diffs = {}
for x in dict1.keys():
diffs[x] = (dict1[x] - dict2[x])
#diffs would look like this:
diffs[a] = -2
diffs[b] = 2
diffs[c] = 1
diffs[d] = -2
diffs[e] = 1
I know dictionaries are meant to be random and not sortable, but maybe there is a method to put the keys and values into a list? The main challenges I am facing are getting the keys and values sorted by value (largest to smallest) and then changing the value to its respective rank in the sorted list.
A simple solution for small dicts is
dict1 = {"a":0.6, "b":0.3, "c":0.9, "d":1.2, "e":0.2}
dict2 = {"a":1.4, "b":7.7, "c":9.0, "d":2.5, "e":2.0}
k1 = sorted(dict1, key=dict1.get)
k2 = sorted(dict2, key=dict2.get)
diffs = dict((k, k2.index(k) - k1.index(k)) for k in dict1)
A more efficient, less readable version for larger dicts:
ranks1 = dict(map(reversed, enumerate(sorted(dict1, key=dict1.get))))
ranks2 = dict(map(reversed, enumerate(sorted(dict2, key=dict2.get))))
diffs = dict((k, ranks2[k] - ranks1[k]) for k in dict1)
You may be interested in collections.OrderedDict
Here's a sample, my initial thougth is you were also looking for dictionaries with keys ordered by values, things that od1 and od2 are.
d1 = {"a":0.6, "b":0.3, "c":0.9, "d":1.2, "e":0.2}
d2 = {"a":1.4, "b":7.7, "c":9.0, "d":2.5, "e":2.0}
od1 = OrderedDict(sorted(d1.items(), key=lambda t: t[1]))
od2 = OrderedDict(sorted(d2.items(), key=lambda t: t[1]))
k1 = od1.keys()
k2 = od2.keys()
diff = dict((k, n - k2.index(k)) for n, k in enumerate(k1))
If you don't need them then Sven solution is probably faster.
edit: not that faster honestly... (sven.py is his second, more efficient version):
$ cat /tmp/mine.py | time python -m timeit
10000000 loops, best of 3: 0.0842 usec per loop
real 0m 3.69s
user 0m 3.38s
sys 0m 0.03s
$ cat /tmp/sven.py | time python -m timeit
10000000 loops, best of 3: 0.085 usec per loop
real 0m 3.86s
user 0m 3.42s
sys 0m 0.03s
If someone wants to post formatted bigger dicts I'll test them too.
What version of python are you using? If 2.7, use OrderedDict.
Per the Python 2.7 docs:
OrderedDict(sorted(d.items(), key=d.get))
If you're using Python 2.4-2.6 you can still use OrderedDict by installing it from pypi here or if you have setuptools, run
easy_install ordereddict
A dictionary is not the right data structure to solve this problem. You should convert to sorted lists as soon as possible and produce the dictionary only as the final result. The following sample solution uses iterators and generator expressions where possible, to avoid creating too many (potentially large) helper lists along the way:
def get_ranking(vals):
'''Return a list of pairs: (key, ranking), sorted by key.'''
ranking = sorted(((v, k) for k, v in vals.iteritems()), reverse=True)
return sorted((k, i) for (i, (_v, k)) in enumerate(ranking))
def ranking_diff(rank1, rank2):
return dict((k, v1 - v2) for (k, v1), (_, v2) in itertools.izip(rank1, rank2))
def get_diffs(dict1, dict2):
r1 = get_ranking(dict1)
r2 = get_ranking(dict2)
return ranking_diff(r1, r2)
print get_diffs(dict1, dict2)
# prints: {'a': -2, 'c': 1, 'b': 2, 'e': 1, 'd': -2}
Please note that this solution assumes that both dicts contain exactly the same keys.