Python defaultdict(default) vs dict.get(key, default)

Python defaultdict(default) vs dict.get(key, default) - python

Suppose I want to create a dict (or dict-like object) that returns a default value if I attempt to access a key that's not in the dict.
I can do this either by using a defaultdict:
from collections import defaultdict
foo = defaultdict(lambda: "bar")
print(foo["hello"]) # "bar"
or by using a regular dict and always using dict.get(key, default) to retrieve values:
foo = dict()
print(foo.get("hello", "bar")) # "bar"
print(foo["hello"]) # KeyError (as expected)
Other than the obvious ergonomic overhead of having to remember to use .get() with a default value instead of the expected bracket syntax, what's the difference between these 2 approaches?

Asides from the ergonomics of having .get everwhere, one important difference is if you lookup a missing key in defaultdict it will insert a new element into itself rather than just returning the default. The most important implications of this are:
Later iterations will retrieve all keys looked up in a defaultdict
As more ends up stored in the dictionary, more memory is typically used
Mutation of the default will store that mutation in a defaultdict, with .get the default is lost unless stored explicty
from collections import defaultdict
default_foo = defaultdict(list)
dict_foo = dict()
for i in range(1024):
default_foo[i]
dict_foo.get(i, [])
print(len(default_foo.items())) # 1024
print(len(dict_foo.items())) # 0
# Defaults in defaultdict's can be mutated where as with .get mutations are lost
default_foo[1025].append("123")
dict_foo.get(1025, []).append("123")
print(default_foo[1025]) # ["123"]
print(dict_foo.get(1025, [])) # []

The difference here really comes down to how you want your program to handle a KeyError.
foo = dict()
def do_stuff_with_foo():
print(foo["hello"])
# Do something here
if __name__ == "__main__":
try:
foo["hello"] # The key exists and has a value
except KeyError:
# The first code snippet does this
foo["hello"] = "bar"
do_stuff_with_foo()
# The second code snippet does this
exit(-1)
It's a matter of do we want to stop the program entirely? Do we want the user to fill in a value for foo["hello"] or do we want to use a default value?
The first approach is a more compact way to do foo.get("hello", "bar")
But the kicker is the matter of is this what we really want to happen?

Related

What's the point of using [object instance].self?

I was checking the code of the toolz library's groupby function in Python and I found this:
def groupby(key, seq):
""" Group a collection by a key function
"""
if not callable(key):
key = getter(key)
d = collections.defaultdict(lambda: [].append)
for item in seq:
d[key(item)](item)
rv = {}
for k, v in d.items():
rv[k] = v.__self__
return rv
Is there any reason to use rv[k] = v.__self__ instead of rv[k] = v?

This is a somewhat confusing trick to save a small amount of time:
We are creating a defaultdict with a factory function that returns a bound append method of a new list instance with [].append. Then we can just do d[key(item)](item) instead of d[key(item)].append(item) like we would have if we create a defaultdict that contains lists. If we don't lookup append everytime, we gain a small amount of time.
But now the dict contains bound methods instead of the lists, so we have to get the original list instance back via __self__.
__self__ is an attribute described for instance methods that returns the original instance. You can verify that with this for example:
>>> a = []
>>> a.append.__self__ is a
True

This is a somewhat convoluted, but possibly more efficient approach to creating and using a defaultdict of lists.
First, remember that the default item is lambda: [].append. This means create a new list, and store a bound append method in the dictionary. This saves you a method bind on every further append to the same key, and the garbage collect that follows. For example, the following more standard approach is less efficient:
d = collections.defaultdict(list)
for item in seq:
d[key(item)].append(item)
The problem then becomes how to get the original lists back out of the dictionary, since the reference is not stored explicitly. Luckily, bound methods have a __self__ attribute which does just that. Here, [].append.__self__ is a reference to the original [].
As a side note, the last loop could be a comprehension:
return {k: v.__self__ for k, v in d.items()}

Interpret input as string and class member name

In a few __init__ of different classes I have to use several times the construct
try:
self.member_name = kwargs['member_name']
except:
self.member_name = default_value
or as suggested by Moses Koledoye
self.member_name = kwargs.get('member_name', default_value)
I would like to have a method that inputs, say, the string 'member_name' and default_value and that the corresponding initialization gets produced. For example, if one inputs 'pi_approx' and 3.14 the resulting code is
self.pi_approx = kwargs.get('pi_approx', 3.14)
In this way I can replace a long sequence of these initializations by a loop along a list of all the required members and their default values.
This technique emulate a switch statement is not the same thing but kind of has a similar flavor.
I am not sure how to approach what I want to do.
Assuming that initializer(m_name, default_val) is the construction that gets replaced by
self.m_name = kwargs.get('m_name', default_val)
I would then used it by having a lists member_names = [m_name1, m_name2, m_name3] and default_values = [def_val1, def_val2, def_val3] and calling
for m_name, d_val in zip(member_names, default_values):
initializer(m_name, d_val)
This would replace long list of try's and also make the code a bit more readable.

If your try/except was meant to handle KeyError, then you can use the get method of the kwargs dict which allows you to supply a default value:
self.member_name = kwargs.get('member_name', default)
Which can be extended to your list of attribute names using setattr:
for m_name, d_val in zip(member_names, default_values):
setattr(self, m_name, kwargs.get(m_name, d_val))

Python 3 changing value of dictionary key in for loop not working

I have python 3 code that is not working as expected:
def addFunc(x,y):
print (x+y)
def subABC(x,y,z):
print (x-y-z)
def doublePower(base,exp):
print(2*base**exp)
def RootFunc(inputDict):
for k,v in inputDict.items():
if v[0]==1:
d[k] = addFunc(*v[1:])
elif v[0] ==2:
d[k] = subABC(*v[1:])
elif v[0]==3:
d[k] = doublePower(*v[1:])
d={"s1_7":[1,5,2],"d1_6":[2,12,3,3],"e1_3200":[3,40,2],"s2_13":[1,6,7],"d2_30":[2,42,2,10]}
RootFunc(d)
#test to make sure key var assignment works
print(d)
I get:
{'d2_30': None, 's2_13': None, 's1_7': None, 'e1_3200': None, 'd1_6': None}
I expected:
{'d2_30': 30, 's2_13': 13, 's1_7': 7, 'e1_3200': 3200, 'd1_6': 6}
What's wrong?
Semi related: I know dictionaries are unordered but is there any reason why python picked this order? Does it run the keys through a randomizer?

print does not return a value. It returns None, so every time you call your functions, they're printing to standard output and returning None. Try changing all print statements to return like so:
def addFunc(x,y):
return x+y
This will give the value x+y back to whatever called the function.
Another problem with your code (unless you meant to do this) is that you define a dictionary d and then when you define your function, you are working on this dictionary d and not the dictionary that is 'input':
def RootFunc(inputDict):
for k,v in inputDict.items():
if v[0]==1:
d[k] = addFunc(*v[1:])
Are you planning to always change d and not the dictionary that you are iterating over, inputDict?
There may be other issues as well (accepting a variable number of arguments within your functions, for instance), but it's good to address one problem at a time.
Additional Notes on Functions:
Here's some sort-of pseudocode that attempts to convey how functions are often used:
def sample_function(some_data):
modified_data = []
for element in some_data:
do some processing
add processed crap to modified_data
return modified_data
Functions are considered 'black box', which means you structure them so that you can dump some data into them and they always do the same stuff and you can call them over and over again. They will either return values or yield values or update some value or attribute or something (the latter are called 'side effects'). For the moment, just pay attention to the return statement.
Another interesting thing is that functions have 'scope' which means that when I just defined it with a fake-name for the argument, I don't actually have to have a variable called "some_data". I can pass whatever I want to the function, but inside the function I can refer to the fake name and create other variables that really only matter within the context of the function.
Now, if we run my function above, it will go ahead and process the data:
sample_function(my_data_set)
But this is often kind of pointless because the function is supposed to return something and I didn't do anything with what it returned. What I should do is assign the value of the function and its arguments to some container so I can keep the processed information.
my_modified_data = sample_function(my_data_set)
This is a really common way to use functions and you'll probably see it again.
One Simple Way to Approach Your Problem:
Taking all this into consideration, here is one way to solve your problem that comes from a really common programming paradigm:
def RootFunc(inputDict):
temp_dict = {}
for k,v in inputDict.items():
if v[0]==1:
temp_dict[k] = addFunc(*v[1:])
elif v[0] ==2:
temp_dict[k] = subABC(*v[1:])
elif v[0]==3:
temp_dict[k] = doublePower(*v[1:])
return temp_dict
inputDict={"s1_7":[1,5,2],"d1_6":[2,12,3,3],"e1_3200":[3,40,2],"s2_13":[1,6,7],"d2_30"[2,42,2,10]}
final_dict = RootFunc(inputDict)

As erewok stated, you are using "print" and not "return" which may be the source of your error. And as far as the ordering is concerned, you already know that dictionaries are unordered, according to python doc at least, the ordering is not random, but rather implemented as hash tables.
Excerpt from the python doc: [...]A mapping object maps hashable values to arbitrary objects. Mappings are mutable objects. There is currently only one standard mapping type, the dictionary. [...]
Now key here is that the order of the element is not really random. I have often noticed that the order stays the same no matter how I construct a dictionary on some values... using lambda or just creating it outright, the order has always remained the same, so it can't be random, but it's definitely arbitrary.

How to make a dictionary that returns key for keys missing from the dictionary instead of raising KeyError?

I want to create a python dictionary that returns me the key value for the keys are missing from the dictionary.
Usage example:
dic = smart_dict()
dic['a'] = 'one a'
print(dic['a'])
# >>> one a
print(dic['b'])
# >>> b

dicts have a __missing__ hook for this:
class smart_dict(dict):
def __missing__(self, key):
return key
Could simplify it as (since self is never used):
class smart_dict(dict):
#staticmethod
def __missing__(key):
return key

Why don't you just use
dic.get('b', 'b')
Sure, you can subclass dict as others point out, but I find it handy to remind myself every once in a while that get can have a default value!
If you want to have a go at the defaultdict, try this:
dic = defaultdict()
dic.__missing__ = lambda key: key
dic['b'] # should set dic['b'] to 'b' and return 'b'
except... well: AttributeError: ^collections.defaultdict^object attribute '__missing__' is read-only, so you will have to subclass:
from collections import defaultdict
class KeyDict(defaultdict):
def __missing__(self, key):
return key
d = KeyDict()
print d['b'] #prints 'b'
print d.keys() #prints []

Congratulations. You too have discovered the uselessness of the
standard collections.defaultdict type. If that execrable midden heap of code smell
offends your delicate sensibilities as much as it did mine, this is your lucky
StackOverflow day.
Thanks to the forbidden wonder of the 3-parameter
variant of the type()
builtin, crafting a non-useless default dictionary type is both fun and profitable.
What's Wrong with dict.__missing__()?
Absolutely nothing, assuming you like excess boilerplate and the shocking silliness of collections.defaultdict – which should behave as expected but really doesn't. To be fair, Jochen
Ritzel's accepted
solution of subclassing dict and
implementing the optional __missing__()
method is a fantastic
workaround for small-scale use cases only requiring a single default dictionary.
But boilerplate of this sort scales poorly. If you find yourself instantiating
multiple default dictionaries, each with their own slightly different logic for
generating missing key-value pairs, an industrial-strength alternative
automating boilerplate is warranted.
Or at least nice. Because why not fix what's broken?
Introducing DefaultDict
In less than ten lines of pure Python (excluding docstrings, comments, and
whitespace), we now define a DefaultDict type initialized with a user-defined
callable generating default values for missing keys. Whereas the callable passed
to the standard collections.defaultdict type uselessly accepts no
parameters, the callable passed to our DefaultDict type usefully accepts the
following two parameters:
The current instance of this dictionary.
The current missing key to generate a default value for.
Given this type, solving sorin's
question reduces to a single line of Python:
>>> dic = DefaultDict(lambda self, missing_key: missing_key)
>>> dic['a'] = 'one a'
>>> print(dic['a'])
one a
>>> print(dic['b'])
b
Sanity. At last.
Code or It Didn't Happen
def DefaultDict(keygen):
'''
Sane **default dictionary** (i.e., dictionary implicitly mapping a missing
key to the value returned by a caller-defined callable passed both this
dictionary and that key).
The standard :class:`collections.defaultdict` class is sadly insane,
requiring the caller-defined callable accept *no* arguments. This
non-standard alternative requires this callable accept two arguments:
#. The current instance of this dictionary.
#. The current missing key to generate a default value for.
Parameters
----------
keygen : CallableTypes
Callable (e.g., function, lambda, method) called to generate the default
value for a "missing" (i.e., undefined) key on the first attempt to
access that key, passed first this dictionary and then this key and
returning this value. This callable should have a signature resembling:
``def keygen(self: DefaultDict, missing_key: object) -> object``.
Equivalently, this callable should have the exact same signature as that
of the optional :meth:`dict.__missing__` method.
Returns
----------
MappingType
Empty default dictionary creating missing keys via this callable.
'''
# Global variable modified below.
global _DEFAULT_DICT_ID
# Unique classname suffixed by this identifier.
default_dict_class_name = 'DefaultDict' + str(_DEFAULT_DICT_ID)
# Increment this identifier to preserve uniqueness.
_DEFAULT_DICT_ID += 1
# Dynamically generated default dictionary class specific to this callable.
default_dict_class = type(
default_dict_class_name, (dict,), {'__missing__': keygen,})
# Instantiate and return the first and only instance of this class.
return default_dict_class()
_DEFAULT_DICT_ID = 0
'''
Unique arbitrary identifier with which to uniquify the classname of the next
:func:`DefaultDict`-derived type.
'''
The key ...get it, key? to this arcane wizardry is the call to
the 3-parameter variant
of the type() builtin:
type(default_dict_class_name, (dict,), {'__missing__': keygen,})
This single line dynamically generates a new dict subclass aliasing the
optional __missing__ method to the caller-defined callable. Note the distinct
lack of boilerplate, reducing DefaultDict usage to a single line of Python.
Automation for the egregious win.

The first respondent mentioned defaultdict,
but you can define __missing__ for any subclass of dict:
>>> class Dict(dict):
def __missing__(self, key):
return key
>>> d = Dict(a=1, b=2)
>>> d['a']
1
>>> d['z']
'z'
Also, I like the second respondent's approach:
>>> d = dict(a=1, b=2)
>>> d.get('z', 'z')
'z'

I agree this should be easy to do, and also easy to set up with different defaults or functions that transform a missing value somehow.
Inspired by Cecil Curry's answer, I asked myself: why not have the default-generator (either a constant or a callable) as a member of the class, instead of generating different classes all the time? Let me demonstrate:
# default behaviour: return missing keys unchanged
dic = FlexDict()
dic['a'] = 'one a'
print(dic['a'])
# 'one a'
print(dic['b'])
# 'b'
# regardless of default: easy initialisation with existing dictionary
existing_dic = {'a' : 'one a'}
dic = FlexDict(existing_dic)
print(dic['a'])
# 'one a'
print(dic['b'])
# 'b'
# using constant as default for missing values
dic = FlexDict(existing_dic, default = 10)
print(dic['a'])
# 'one a'
print(dic['b'])
# 10
# use callable as default for missing values
dic = FlexDict(existing_dic, default = lambda missing_key: missing_key * 2)
print(dic['a'])
# 'one a'
print(dic['b'])
# 'bb'
print(dic[2])
# 4
How does it work? Not so difficult:
class FlexDict(dict):
'''Subclass of dictionary which returns a default for missing keys.
This default can either be a constant, or a callable accepting the missing key.
If "default" is not given (or None), each missing key will be returned unchanged.'''
def __init__(self, content = None, default = None):
if content is None:
super().__init__()
else:
super().__init__(content)
if default is None:
default = lambda missing_key: missing_key
self.default = default # sets self._default
#property
def default(self):
return self._default
#default.setter
def default(self, val):
if callable(val):
self._default = val
else: # constant value
self._default = lambda missing_key: val
def __missing__(self, x):
return self.default(x)
Of course, one can debate whether one wants to allow changing the default-function after initialisation, but that just means removing #default.setter and absorbing its logic into __init__.
Enabling introspection into the current (constant) default value could be added with two extra lines.

Subclass dict's __getitem__ method. For example, How to properly subclass dict and override __getitem__ & __setitem__

Return a default value if a dictionary key is not available

I need a way to get a dictionary value if its key exists, or simply return None, if it does not.
However, Python raises a KeyError exception if you search for a key that does not exist. I know that I can check for the key, but I am looking for something more explicit. Is there a way to just return None if the key does not exist?

You can use dict.get()
value = d.get(key)
which will return None if key is not in d. You can also provide a different default value that will be returned instead of None:
value = d.get(key, "empty")

Wonder no more. It's built into the language.
>>> help(dict)
Help on class dict in module builtins:
class dict(object)
| dict() -> new empty dictionary
| dict(mapping) -> new dictionary initialized from a mapping object's
| (key, value) pairs
...
|
| get(...)
| D.get(k[,d]) -> D[k] if k in D, else d. d defaults to None.
|
...

Use dict.get
Returns the value for key if key is in the dictionary, else default. If default is not given, it defaults to None, so that this method never raises a KeyError.

You should use the get() method from the dict class
d = {}
r = d.get('missing_key', None)
This will result in r == None. If the key isn't found in the dictionary, the get function returns the second argument.

If you want a more transparent solution, you can subclass dict to get this behavior:
class NoneDict(dict):
def __getitem__(self, key):
return dict.get(self, key)
>>> foo = NoneDict([(1,"asdf"), (2,"qwerty")])
>>> foo[1]
'asdf'
>>> foo[2]
'qwerty'
>>> foo[3] is None
True

I usually use a defaultdict for situations like this. You supply a factory method that takes no arguments and creates a value when it sees a new key. It's more useful when you want to return something like an empty list on new keys (see the examples).
from collections import defaultdict
d = defaultdict(lambda: None)
print d['new_key'] # prints 'None'

A one line solution would be:
item['key'] if 'key' in item else None
This is useful when trying to add dictionary values to a new list and want to provide a default:
eg.
row = [item['key'] if 'key' in item else 'default_value']

As others have said above, you can use get().
But to check for a key, you can also do:
d = {}
if 'keyname' in d:
# d['keyname'] exists
pass
else:
# d['keyname'] does not exist
pass

You could use a dict object's get() method, as others have already suggested. Alternatively, depending on exactly what you're doing, you might be able use a try/except suite like this:
try:
<to do something with d[key]>
except KeyError:
<deal with it not being there>
Which is considered to be a very "Pythonic" approach to handling the case.

For those using the dict.get technique for nested dictionaries, instead of explicitly checking for every level of the dictionary, or extending the dict class, you can set the default return value to an empty dictionary except for the out-most level. Here's an example:
my_dict = {'level_1': {
'level_2': {
'level_3': 'more_data'
}
}
}
result = my_dict.get('level_1', {}).get('level_2', {}).get('level_3')
# result -> 'more_data'
none_result = my_dict.get('level_1', {}).get('what_level', {}).get('level_3')
# none_result -> None
WARNING: Please note that this technique only works if the expected key's value is a dictionary. If the key what_level did exist in the dictionary but its value was a string or integer etc., then it would've raised an AttributeError.

I was thrown aback by what was possible in python2 vs python3. I will answer it based on what I ended up doing for python3. My objective was simple: check if a json response in dictionary format gave an error or not. My dictionary is called "token" and my key that I am looking for is "error". I am looking for key "error" and if it was not there setting it to value of None, then checking is the value is None, if so proceed with my code. An else statement would handle if I do have the key "error".
if ((token.get('error', None)) is None):
do something

You can use try-except block
try:
value = dict['keyname']
except IndexError:
value = None

d1={"One":1,"Two":2,"Three":3}
d1.get("Four")
If you will run this code there will be no 'Keyerror' which means you can use 'dict.get()' to avoid error and execute your code

If you have a more complex requirement that equates to a cache, this class might come in handy:
class Cache(dict):
""" Provide a dictionary based cache
Pass a function to the constructor that accepts a key and returns
a value. This function will be called exactly once for any key
required of the cache.
"""
def __init__(self, fn):
super()
self._fn = fn
def __getitem__(self, key):
try:
return super().__getitem__(key)
except KeyError:
value = self[key] = self._fn(key)
return value
The constructor takes a function that is called with the key and should return the value for the dictionary. This value is then stored and retrieved from the dictionary next time. Use it like this...
def get_from_database(name):
# Do expensive thing to retrieve the value from somewhere
return value
answer = Cache(get_from_database)
x = answer(42) # Gets the value from the database
x = answer(42) # Gets the value directly from the dictionary

If you can do it with False, then, there's also the hasattr built-in funtion:
e=dict()
hasattr(e, 'message'):
>>> False

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Python defaultdict(default) vs dict.get(key, default) - python

Related

What's the point of using [object instance].self?

Interpret input as string and class member name

Python 3 changing value of dictionary key in for loop not working

How to make a dictionary that returns key for keys missing from the dictionary instead of raising KeyError?

Return a default value if a dictionary key is not available

Categories

Resources

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Python defaultdict(default) vs dict.get(key, default) - python

Related

What's the point of using [object instance].__self__?

Interpret input as string and class member name

Python 3 changing value of dictionary key in for loop not working

How to make a dictionary that returns key for keys missing from the dictionary instead of raising KeyError?

Return a default value if a dictionary key is not available

Categories

Resources

What's the point of using [object instance].self?