How does the performance of dictionary key lookups compare in Python?


How does:
dict = {}
if key not in dict:
    dict[key] = foo
compare to:
try:
    dict[key]
except KeyError:
    dict[key] = foo
I.e., is the lookup of a key in any way faster than the linear search through dict.keys() that I assume the first form will do?

Just to clarify one point: if key not in d doesn't do a linear search through d's keys. It uses the dict's hash table to quickly find the key.
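One way to see this without relying on timings is to count how often a key is hashed and compared during a membership test. The following is a minimal sketch; CountingKey is a hypothetical helper class written only for this illustration.

```python
class CountingKey:
    """Hypothetical key type that counts hash and equality calls."""
    hash_calls = 0
    eq_calls = 0

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        CountingKey.hash_calls += 1
        return hash(self.value)

    def __eq__(self, other):
        CountingKey.eq_calls += 1
        return isinstance(other, CountingKey) and self.value == other.value


# Build a dict with 1000 keys, then reset the counters.
d = {CountingKey(i): i for i in range(1000)}
CountingKey.hash_calls = CountingKey.eq_calls = 0

# The membership test hashes the probe once and compares it against
# very few stored keys, not against all 1000 of them.
probe = CountingKey(500)
found = probe in d
print(found, CountingKey.hash_calls, CountingKey.eq_calls)
```

A linear search would need up to 1000 comparisons; the hash table needs one hash and only a handful of comparisons at most.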

You're looking for the setdefault method:
>>> r = {}
>>> r.setdefault('a', 'b')
'b'
>>> r
{'a': 'b'}
>>> r.setdefault('a', 'e')
'b'
>>> r
{'a': 'b'}

The answer depends on how often the key is already in the dict (BTW, has anyone mentioned to you how bad an idea it is to hide a builtin such as dict behind a variable?)
if key not in dct:
    dct[key] = foo
If the key is in the dictionary, this does one dictionary lookup. If the key is not in the dictionary, it looks up the dictionary twice.
try:
    dct[key]
except KeyError:
    dct[key] = foo
This may be slightly faster for the case where the key is in the dictionary, but raising an exception has quite a big overhead, so it is almost never the best option.
dct.setdefault(key, foo)
This one is slightly tricky: it always involves two dictionary lookups: the first one is to find the setdefault method in the dict class, the second is to look for key in the dct object. Also if foo is an expression it will be evaluated every time whereas the earlier options only evaluate it when they have to.
Also look at collections.defaultdict. That is the most appropriate solution for a large class of situations like this.
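The difference in when the default is evaluated can be sketched as follows. The make_default function and the factory_calls list are hypothetical names used only to record how often the default is built.

```python
from collections import defaultdict

factory_calls = []


def make_default():
    """Hypothetical expensive default; records each call."""
    factory_calls.append("called")
    return []


# setdefault: the argument is evaluated before the method runs,
# so make_default() is called even though "a" already exists.
plain = {"a": [1]}
plain.setdefault("a", make_default())
calls_from_setdefault = len(factory_calls)

# defaultdict: the factory runs only when a missing key is accessed.
lazy = defaultdict(make_default)
lazy["a"] = [1]
lazy["a"]  # existing key: factory not called
lazy["b"]  # missing key: factory called once, "b" is inserted
calls_from_defaultdict = len(factory_calls) - calls_from_setdefault

print(calls_from_setdefault, calls_from_defaultdict)
```

Both counts come out as one call, but setdefault paid for a default it never used, while defaultdict only paid when a key was actually missing.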

Try: my_dict.setdefault(key, default). It's slightly slower than the other options, though.
If key is in the dictionary, return its value. If not, insert key with a value of default and return default. default defaults to None.
#!/usr/bin/env python
example_dict = dict(zip(range(10), range(10)))
def kn(key, d):
    if key not in d:
        d[key] = 'foo'
def te(key, d):
    try:
        d[key]
    except KeyError:
        d[key] = 'foo'
def sd(key, d):
    d.setdefault(key, 'foo')
if __name__ == '__main__':
    from timeit import Timer
    t = Timer("kn(2, example_dict)", "from __main__ import kn, example_dict")
    print(t.timeit())
    t = Timer("te(2, example_dict)", "from __main__ import te, example_dict")
    print(t.timeit())
    t = Timer("sd(2, example_dict)", "from __main__ import sd, example_dict")
    print(t.timeit())
# kn: 0.249855041504
# te: 0.244259119034
# sd: 0.375113964081

my_dict.get(key, foo) returns foo if key isn't in my_dict. The default value is None, so my_dict.get(key) will return None if key isn't in my_dict. The first of your options is better if you want to just add key to your dictionary. Don't worry about speed here. If you find that populating your dictionary is a hot spot in your program, then think about it. But it isn't. So don't.
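A short sketch of the difference between get and setdefault: get only reads, while setdefault also inserts the key. The variable names here are illustrative.

```python
my_dict = {}

# get returns the fallback but leaves the dictionary untouched.
value = my_dict.get("missing", "foo")
size_after_get = len(my_dict)

# setdefault returns the same fallback and also stores it under the key.
stored = my_dict.setdefault("missing", "foo")
size_after_setdefault = len(my_dict)

print(value, size_after_get, stored, size_after_setdefault)
```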

Related

Simple way to remove empty sets from dict

There's a common problem where I need to keep track of a bunch of collections in a dictionary. Let's say I want to keep track of which items I borrowed from my friends. The defaultdict class is quite useful to do this:
from collections import defaultdict
d = defaultdict(set)
d['Peter'].add('salt')
d['Eric'].add('car')
d['Eric'].add('jacket')
# defaultdict(<class 'set'>, {'Peter': {'salt'}, 'Eric': {'jacket', 'car'}})
This allows me to add items to the respective sets without worrying about whether a key is already in the dictionary. Now suppose I return the salt to Peter. This means I owe him nothing and he can be removed from the dictionary. Doing this is slightly more cumbersome.
d['Peter'].remove('salt')
if not d['Peter']:
    del d['Peter']
I know I could put this in some function, but for readability I would like a class that removes the key automatically if the corresponding set is empty. Is there some way to do this?
Edit
Okay, I realized a pretty major problem with this idea when trying to solve it using inheritance and changing the index function. When calling d[index], the value has already been returned before .remove(something) is called, which makes it impossible for the dictionary to know that the set has been emptied. I'm guessing there's not really a way around using something different.
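For comparison, the plain-function approach mentioned in the question might look like the sketch below. The name discard_and_prune is hypothetical; it uses d.get so that a missing key is not created by the defaultdict factory.

```python
from collections import defaultdict


def discard_and_prune(d, key, item):
    """Remove item from d[key] and delete key if its set becomes empty."""
    members = d.get(key)  # .get does not trigger default_factory
    if members is None:
        return
    members.discard(item)  # discard does not raise if item is absent
    if not members:
        del d[key]


d = defaultdict(set)
d['Peter'].add('salt')
d['Eric'].add('car')
d['Eric'].add('jacket')

discard_and_prune(d, 'Peter', 'salt')   # Peter's set is now empty, key removed
discard_and_prune(d, 'Eric', 'car')     # Eric still owes a jacket
discard_and_prune(d, 'Nobody', 'salt')  # unknown key: nothing happens
print(dict(d))
```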

The problem with using a defaultdict to do what you want is that even accessing a key sets that key using the factory function. Consider:
from collections import defaultdict
d = defaultdict(set)
if d["Peter"]:
    print("I owe something to Peter")
print(d)
# defaultdict(set, {'Peter': set()})
Also, the problem with creating a subclass, as you've realized, is that the __getitem__() method is called before the set is ever emptied, so you'd have to call another function that checks whether the set is empty and removes it.
A better idea might be to just not include keys with empty sets when you're creating the string representation.
class NewDefaultDict(defaultdict):
    def __repr__(self):
        return (f"NewDefaultDict({repr(self.default_factory)}, {{" +
                ", ".join(f"{repr(k)}: {repr(v)}" for k, v in self.items() if v) +
                "})")
nd = NewDefaultDict(set)
nd["Peter"].add("salt")
nd["Paul"].add("pepper")
nd["Paul"].remove("pepper")
print(nd)
# NewDefaultDict(<class 'set'>, {'Peter': {'salt'}})
You would also need to redefine __contains__() to check if the value is empty, so that e.g. "Paul" in nd returns False:
    def __contains__(self, key):
        return defaultdict.__contains__(self, key) and self[key]
To make it compatible with for ... in nd constructs and dict-unpacking, you can redefine __iter__():
    def __iter__(self):
        for key in defaultdict.__iter__(self):
            if self[key]:
                yield key
Then,
for k in nd:
    print(k)
gives:
Peter

A dictionary comprehension might be useful.
from collections import defaultdict
d = defaultdict(set)
d['Peter'].add('salt')
d['Eric'].add('car')
d['Eric'].add('jacket')
d['Peter'].remove('salt')
d2 = {k: v for k, v in d.items() if len(v) > 0}
The d2 dictionary is now:
{'Eric': {'car', 'jacket'}}
Alternatively, using the fact that an empty set is considered false in Python.
d2 = {k: v for k, v in d.items() if v}
We can also define a class to implement this logic, similar to the other answer, that simply ignores keys/values where the value meets a criterion. A function is passed as the ignore parameter to define that criterion.
from collections import defaultdict
class default_ignore_dict(defaultdict):
    def __init__(self, factory, ignore, *args, **kwargs):
        defaultdict.__init__(self, factory, *args, **kwargs)
        self.ignore = ignore
    def __contains__(self, key):
        return defaultdict.__contains__(self, key) and not self.ignore(self[key])
    def items(self):
        return ((k, v) for k, v in defaultdict.items(self) if not self.ignore(v))
    def keys(self):
        return (k for k, _ in self.items())
    def values(self):
        return (v for _, v in self.items())
Testing this:
>>> d = default_ignore_dict(set, lambda s: not s)
>>> d['Peter'].add('salt')
>>> d['Peter'].remove('salt')
>>> d['Eric'].add('car')
>>> d['Eric'].add('jacket')
>>>
>>> 'Peter' in d
False
>>> list(d.items())
[('Eric', {'car', 'jacket'})]
>>>

python dictionary replace dict[] operator with dict.get() behavior

my_dict = {'a': 1}
I wish for my_dict['a'] to behave the same as my_dict.get('a')
That way, if I do my_dict['b'], I will not raise an error but get the default None value, the same way you would get it from my_dict.get('b')
In the case of my_dict = {'a': {'b': 2}} I could do my_dict['a']['b'] and it would act as my_dict.get('a').get('b')
When doing my_dict['b'] = 2 it will act same as my_dict.update({'b': 2})
Is it possible to do so that I will not have to inherit from dict?

You can use a collections.defaultdict() object to add a new value to the dictionary each time you try to access a non-existing key:
>>> from collections import defaultdict
>>> d = defaultdict(lambda: None)
>>> d['a'] is None
True
>>> d
defaultdict(<function <lambda> at 0x10f463e18>, {'a': None})
If you don't want the key added, create a subclass of dict that implements the __missing__ method:
class DefaultNoneDict(dict):
    def __missing__(self, key):
        return None
This explicitly won't add new keys:
>>> d = DefaultNoneDict()
>>> d['a'] is None
True
>>> d
{}
If you wanted to chain .get() calls, you'll have to return an empty dictionary instead, otherwise dict.get(keyA).get(keyB) will fail with an attribute error (the first None returned won't have a .get() method).
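As a minimal sketch of that idea applied to the subscript syntax, a __missing__ hook can return an empty instance of the same class instead of None. ChainableDict is a hypothetical name, and note that a missing chain then yields an empty mapping rather than None. Nested values must themselves be ChainableDict instances for the chaining to continue; plain nested dicts still raise KeyError.

```python
class ChainableDict(dict):
    """Hypothetical dict subclass: missing keys yield an empty ChainableDict."""

    def __missing__(self, key):
        # Return a fresh empty instance without storing it under key.
        return type(self)()


d = ChainableDict(a=ChainableDict(b=2))
present = d["a"]["b"]  # existing path
absent = d["x"]["y"]   # missing path: empty ChainableDict, no exception
print(present, absent, d)
```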
Generally speaking, it is better to stick to the default type and be explicit. There is nothing wrong with:
value = some_d.get(outer, {}).get(inner)
Using a defaultdict or a dict subclass with a custom __missing__ hook has a downside: it will always produce a default when the key is missing, even when you accidentally produced incorrect keys somewhere else in your code. I often opt for an explicit dict.get() or dict.setdefault() codepath over defaultdict precisely because I want a non-existing key to produce an error in other parts of my project.

Merging values from 2 dictionaries (Python)

(I'm new to Python!)
Trying to figure out this homework question:
The function will take as input two dictionaries, each mapping strings to integers. The function will return a dictionary that maps strings from the two input dictionaries to the sum of the integers in the two input dictionaries.
My idea was this:
def add(dicA,dicB):
    dicA = {}
    dicB = {}
    newdictionary = dicA.update(dicB)
However, that brings back None.
In the professor's example:
print(add({'alice':10, 'Bob':3, 'Carlie':1}, {'alice':5, 'Bob':100, 'Carlie':1}))
the output is:
{'alice':15, 'Bob':103, 'Carlie':2}
My issue really is that I don't understand how to add up the values from each dictionaries. I know that the '+' is not supported with dictionaries. I'm not looking for anyone to do my homework for me, but any suggestions would be very much appreciated!

From the documentation:
update([other])
Update the dictionary with the key/value pairs from other, overwriting existing keys. Return None.
You don't want to replace key/value pairs, you want to add the values for similar keys. Go through each dictionary and add each value to the relevant key:
def add(dicA,dicB):
    result = {}
    for d in dicA, dicB:
        for key in d:
            result[key] = result.get(key, 0) + d[key]
    return result
result.get(key, 0) will retrieve the value of an existing key or produce 0 if key is not yet present.

First of all, a.update(b) updates a in place, and returns None.
Secondly, a.update(b) wouldn't help you to sum the values; it would just overwrite the values in a with the values from b for every key they share:
>>> a = {'alice':10, 'Bob':3, 'Carlie':1}
>>> b = {'alice':5, 'Bob':100, 'Carlie':1}
>>> a.update(b)
>>> a
{'alice': 5, 'Carlie': 1, 'Bob': 100}
It'd be easiest to use collections.Counter to achieve the desired result. As a plus, it does support addition with +:
from collections import Counter
def add(dicA, dicB):
    return dict(Counter(dicA) + Counter(dicB))
This produces the intended result:
>>> print(add({'alice':10, 'Bob':3, 'Carlie':1}, {'alice':5, 'Bob':100, 'Carlie':1}))
{'alice': 15, 'Carlie': 2, 'Bob': 103}
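One caveat with this approach: adding two Counter objects with + keeps only keys whose summed count is positive, so zero and negative results are silently dropped. The sketch below uses made-up input values to show the difference from Counter.update, which keeps every key.

```python
from collections import Counter

a = {'alice': 10, 'Bob': -3}
b = {'alice': 5, 'Bob': 3, 'Carlie': 0}

# Counter addition discards results that are zero or negative.
with_plus = dict(Counter(a) + Counter(b))

# Counter.update adds counts in place and keeps every key.
tally = Counter(a)
tally.update(b)
with_update = dict(tally)

print(with_plus)
print(with_update)
```

For the homework input, where all values are positive, both variants give the same result.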

The following is not meant to be the most elegant solution, but to get a feeling on how to deal with dicts.
dictA = {'Alice':10, 'Bob':3, 'Carlie':1}
dictB = {'Alice':5, 'Bob':100, 'Carlie':1}
# how to iterate through a dictionary
for k,v in dictA.iteritems():
    print k,v
# make a new dict to keep tally
newdict={}
for d in [dictA,dictB]: # go through a list that has your dictionaries
    print d
    for k,v in d.iteritems(): # go through each dictionary item
        if not k in newdict.keys():
            newdict[k]=v
        else:
            newdict[k]+=v
print newdict
Output:
Bob 3
Alice 10
Carlie 1
{'Bob': 3, 'Alice': 10, 'Carlie': 1}
{'Bob': 100, 'Alice': 5, 'Carlie': 1}
{'Bob': 103, 'Alice': 15, 'Carlie': 2}

def add(dicA,dicB):
You define a function that takes two arguments, dicA and dicB.
    dicA = {}
    dicB = {}
Then you assign an empty dictionary to both those variables, overwriting the dictionaries you passed to the function.
    newdictionary = dicA.update(dicB)
Then you update dicA with the values from dicB, and assign the result to newdictionary. dict.update always returns None though.
And finally, you don’t return anything from the function, so it does not give you any results.
In order to combine those dictionaries, you actually need to use the values that were passed to it. Since dict.update mutates the dictionary it is called on, this would change one of those passed dictionaries, which we generally do not want to do. So instead, we use an empty dictionary, and then copy the values from both dictionaries into it:
def add(dicA, dicB):
    newDictionary = {}
    newDictionary.update(dicA)
    newDictionary.update(dicB)
    return newDictionary
If you want the values to sum up automatically, then use a Counter instead of a normal dictionary:
from collections import Counter
def add(dicA, dicB):
    newDictionary = Counter()
    newDictionary.update(dicA)
    newDictionary.update(dicB)
    return newDictionary

I suspect your professor wants to achieve this using more simple methods. But you can achieve this very easily using collections.Counter.
from collections import Counter
def add(a, b):
    return dict(Counter(a) + Counter(b))
Your professor probably wants something like this:
def add(a, b):
    new_dict = copy of a
    for each key/value pair in b
        if key in new_dict
            add value to value already present in new_dict
        else
            insert key/value pair into new_dict
    return new_dict
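The pseudocode above could be turned into runnable Python roughly as follows. This is one possible translation, not necessarily what the professor has in mind; it copies a with dict(a) so that neither input is modified.

```python
def add(a, b):
    """Return a new dict mapping each key of a or b to the sum of its values."""
    new_dict = dict(a)  # shallow copy, so the caller's dict is left untouched
    for key, value in b.items():
        if key in new_dict:
            new_dict[key] += value
        else:
            new_dict[key] = value
    return new_dict


first = {'alice': 10, 'Bob': 3, 'Carlie': 1}
second = {'alice': 5, 'Bob': 100, 'Carlie': 1}
total = add(first, second)
print(total)
```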

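You can try this, assuming both dictionaries have the same keys (a key missing from dict2 raises KeyError, and keys only in dict2 are ignored):
def add(dict1, dict2):
    return dict([(key,dict1[key]+dict2[key]) for key in dict1.keys()])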

I personally like using a dictionary's get method for this kind of merge:
def add(a, b):
    result = {}
    for dictionary in (a, b):
        for key, value in dictionary.items():
            result[key] = result.get(key, 0) + value
    return result

How to change one item in dictionary

I need a function to change one item in a nested dictionary.
I've tried something like:
def SetItem(keys, value):
    item = self.dict
    for key in keys:
        item = item[key]
    item = value
and
SetItem(['key1', 'key2'], 86)
It should be equivalent to self.dict['key1']['key2'] = 86, but this function has no effect.

Almost. You actually want to do something like:
def set_keys(d, keys, value):
    item = d
    for key in keys[:-1]:
        item = item[key]
    item[keys[-1]] = value
Or recursively like this:
def set_key(d, keys, value):
    if len(keys) == 1:
        d[keys[0]] = value
    else:
        set_key(d[keys[0]], keys[1:], value)
Marcin's right though. You would really want to incorporate something more rigorous, with some error handling for missing keys/missing dicts.
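A more defensive variant of the loop version might look like the sketch below. The name set_nested and the create_missing flag are hypothetical additions for illustration; the function assumes d itself is a dict.

```python
def set_nested(d, keys, value, create_missing=False):
    """Set d[keys[0]][keys[1]]...[keys[-1]] = value with explicit errors."""
    if not keys:
        raise ValueError("keys must not be empty")
    item = d
    for key in keys[:-1]:
        if key not in item:
            if not create_missing:
                raise KeyError(key)
            item[key] = {}  # build the missing intermediate level
        item = item[key]
        if not isinstance(item, dict):
            raise TypeError("value at %r is not a dict" % (key,))
    item[keys[-1]] = value


data = {'key1': {'key2': 1}}
set_nested(data, ['key1', 'key2'], 86)
set_nested(data, ['a', 'b', 'c'], 5, create_missing=True)
print(data)
```

Without create_missing, a missing intermediate key raises KeyError, and a non-dict value in the middle of the path raises TypeError instead of failing in a less obvious way.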

setItem = lambda self,names,value: map((lambda name: setattr(self,name,value)),names)

You don't have a self parameter
Just use the line of working code you have.
If you insist, here's a way:
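from functools import reduce  # needed on Python 3; a builtin on Python 2
def setitem(self, keys, value):
    # dict.get(d, key) acts like d[key], but returns None for a missing key
    reduce(dict.get, keys[:-1], self.dictionary)[keys[-1]] = value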
Obviously, this will break if the list of keys hits a non-dict value. You'll want to handle that. In fact, an explicit loop would probably be better for that reason, but you get the idea.

An idea involving recursion and EAFP, both of which I always like:
def set_item(d, keys, value):
    key = keys.pop(0)
    try:
        set_item(d[key], keys, value)
    # IndexError happens when the pop fails (empty list), KeyError happens when the key is missing from the dict.
    # Assume both mean we should finish recursing
    except (IndexError, KeyError):
        d[key] = value
Example:
>>> d = {'a': {'aa':1, 'ab':2}, 'b':{'ba':1, 'bb':2}}
>>> set_item(d, ['a', 'ab'], 50)
>>> print d
{'a': {'aa': 1, 'ab': 50}, 'b': {'ba': 1, 'bb': 2}}
Edit: As Marcin points out below, this will not work for arbitrarily nested dicts since Python has a recursion limit. It's also not for highly performance-sensitive situations (recursion in Python generally isn't). Nonetheless, outside of these two situations I find this to be somewhat more explicit than something involving reduce or lambda.

Python defaultdict and lambda

In someone else's code I read the following two lines:
x = defaultdict(lambda: 0)
y = defaultdict(lambda: defaultdict(lambda: 0))
As the argument of defaultdict is a default factory, I think the first line means that when I call x[k] for a nonexistent key k (such as a statement like v=x[k]), the key-value pair (k,0) will be automatically added to the dictionary, as if the statement x[k]=0 is first executed. Am I correct?
And what about y? It seems that the default factory will create a defaultdict with default 0. But what does that mean concretely? I tried to play around with it in Python shell, but couldn't figure out what it is exactly.

I think the first line means that when I call x[k] for a nonexistent key k (such as a statement like v=x[k]), the key-value pair (k,0) will be automatically added to the dictionary, as if the statement x[k]=0 is first executed.
That's right. This is more idiomatically written
x = defaultdict(int)
In the case of y, when you do y["ham"]["spam"], the key "ham" is inserted in y if it does not exist. The value associated with it becomes a defaultdict in which "spam" is automatically inserted with a value of 0.
I.e., y is a kind of "two-tiered" defaultdict. If "ham" not in y, then evaluating y["ham"]["spam"] is like doing
y["ham"] = {}
y["ham"]["spam"] = 0
in terms of ordinary dict.

You are correct for what the first one does. As for y, it will create a defaultdict with default 0 when a key doesn't exist in y, so you can think of this as a nested dictionary. Consider the following example:
y = defaultdict(lambda: defaultdict(lambda: 0))
print y['k1']['k2'] # 0
print dict(y['k1']) # {'k2': 0}
To create an equivalent nested dictionary structure without defaultdict you would need to create an inner dict for y['k1'] and then set y['k1']['k2'] to 0, but defaultdict does all of this behind the scenes when it encounters keys it hasn't seen:
y = {}
y['k1'] = {}
y['k1']['k2'] = 0
The following function may help for playing around with this on an interpreter to better your understanding:
def to_dict(d):
    if isinstance(d, defaultdict):
        return dict((k, to_dict(v)) for k, v in d.items())
    return d
This will return the dict equivalent of a nested defaultdict, which is a lot easier to read, for example:
>>> y = defaultdict(lambda: defaultdict(lambda: 0))
>>> y['a']['b'] = 5
>>> y
defaultdict(<function <lambda> at 0xb7ea93e4>, {'a': defaultdict(<function <lambda> at 0xb7ea9374>, {'b': 5})})
>>> to_dict(y)
{'a': {'b': 5}}

defaultdict takes a zero-argument callable to its constructor, which is called when the key is not found, as you correctly explained.
lambda: 0 will of course always return zero, but the preferred method to do that is defaultdict(int), which will do the same thing.
As for the second part, the author would like to create a new defaultdict(int), or a nested dictionary, whenever a key is not found in the top-level dictionary.

The other answers are good, but I am adding this one to give a bit more information:
"defaultdict requires an argument that is callable. The return value of that callable object is the default value that the dictionary returns when you try to access the dictionary with a key that does not exist."
Here's an example
>>> SAMPLE = {'Age': 28, 'Salary': 2000}
>>> SAMPLE = defaultdict(lambda: 0, SAMPLE)
>>> SAMPLE
defaultdict(<function <lambda> at 0x0000000002BF7C88>, {'Salary': 2000, 'Age': 28})
>>> SAMPLE['Age']
28
>>> SAMPLE['Phone']  # a nonexistent key inside SAMPLE returns the default, 0
0

y = defaultdict(lambda:defaultdict(lambda:0))
will be helpful if you try this y['a']['b'] += 1
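For instance, the two-level structure makes it easy to count pairs without initializing anything first. The sketch below uses made-up sample data and defaultdict(int) for the inner level, which behaves the same as lambda: 0.

```python
from collections import defaultdict

# Hypothetical sample data: (outer key, inner key) pairs to count.
pairs = [("ham", "spam"), ("ham", "eggs"), ("ham", "spam"), ("toast", "jam")]

y = defaultdict(lambda: defaultdict(int))
for outer, inner in pairs:
    # Missing outer and inner keys are created on first access, starting at 0.
    y[outer][inner] += 1

# Convert to plain dicts for easier reading.
counts = {outer: dict(inner) for outer, inner in y.items()}
print(counts)
```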
