The common memoization recipes (like this or these) use a dict to store the cache, and therefore require that the function arguments be hashable.
I want the function to work with as many different argument types as possible, and certainly including dict, set, list. What is the best way to achieve that?
One approach I was considering is to wrap all non-hashable arguments into their hashable subclasses (i.e., define a subclass of dict that defines its own __hash__ function).
Alternatively, I was thinking of creating a subclass of dict that relies on a different hash function than hash (it's not too hard to define a global my_hash function that works recursively on containers), and using this subclass to store the cache. But I don't think there's an easy way to achieve that.
EDIT:
I think I will try the solution that I suggested for general hashing of Python containers. With that, I should be able to wrap the tuple of (*args, **kwargs) into the automatically hashable class, and use the regular memoization.
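Here's a rough sketch of what that could look like; the freeze helper below is hypothetical (not a library function) and assumes that dict keys, and anything not a dict/list/set/tuple, are already hashable:

import functools

def freeze(obj):
    # Recursively convert common mutable containers into hashable equivalents.
    if isinstance(obj, dict):
        return frozenset((k, freeze(v)) for k, v in obj.items())
    if isinstance(obj, (list, tuple)):
        return tuple(freeze(item) for item in obj)
    if isinstance(obj, set):
        return frozenset(freeze(item) for item in obj)
    return obj  # assume everything else is hashable as-is

def memoize(func):
    cache = {}

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        key = (freeze(args), freeze(kwargs))
        if key not in cache:
            cache[key] = func(*args, **kwargs)
        return cache[key]
    return wrapper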
Method 1 (splitting keys and values)
This is based on the idea that dictionaries are just zipped keys and values.
With this idea, we can make something like a dictionary to store keys (function arguments) and values (returned values from the function).
Not sure how slow it will be, since both the membership test and list.index are linear scans over the known keys. Maybe zipping the keys and values together would be faster?
class memoize:
    def __init__(self, func):
        self.func = func
        self.known_keys = []
        self.known_values = []

    def __call__(self, *args, **kwargs):
        key = (args, kwargs)
        if key in self.known_keys:
            i = self.known_keys.index(key)
            return self.known_values[i]
        else:
            value = self.func(*args, **kwargs)
            self.known_keys.append(key)
            self.known_values.append(value)
            return value
It works:
>>> @memoize
... def whatever(unhashable):
... print(*unhashable) # Just to know when called for this example
... return 12345
...
>>> whatever([1, 2, 3, 4])
1 2 3 4
12345
>>> whatever([1, 2, 3, 4])
12345
>>> whatever({"a": "b", "c": "d"})
a c
12345
>>> whatever({"a": "b", "c": "d"})
12345
Method 2 (fake hashes)
class memoize:
    def __init__(self, func):
        self.func = func
        self.known = {}

    def __call__(self, *args, **kwargs):
        key = give_fake_hash((args, kwargs))
        try:
            return self.known[key]
        except KeyError:
            value = self.func(*args, **kwargs)
            self.known[key] = value
            return value
def give_fake_hash(obj):
    cls = type(obj)
    name = "Hashable" + cls.__name__

    def fake_hash(self):
        return hash(repr(self))

    t = type(name, (cls,), {"__hash__": fake_hash})
    return t(obj)
Method 2.5 (working for dicts)
The catch with Method 2 is that repr() of a dict depends on the order of its items, so two equal dicts can get different fake hashes; sorting the items first fixes that.
import operator

class memoize:
    def __init__(self, func):
        self.func = func
        self.known = {}

    def __call__(self, *args, **kwargs):
        key = give_fake_hash((args, kwargs))
        try:
            return self.known[key]
        except KeyError:
            value = self.func(*args, **kwargs)
            self.known[key] = value
            return value
def fake_hash(self):
    return hash(repr(self))

class HashableTuple(tuple):
    __hash__ = fake_hash

class RereprDict(dict):
    def __repr__(self):
        try:
            self._cached_repr
        except AttributeError:
            self._cached_repr = repr(sorted(self.items(), key=operator.itemgetter(0)))
        return self._cached_repr

    __hash__ = fake_hash

def fix_args(args):
    for elem in args:
        if isinstance(elem, dict):
            elem = RereprDict(elem)
        yield elem

def give_fake_hash(tup):
    args, kwargs = tup
    args = tuple(fix_args(args))
    kwargs = RereprDict(kwargs)
    return HashableTuple((args, kwargs))
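A quick sanity check of Method 2.5 (my example, not from the original answer): two equal dicts built in different orders should hit the same cache entry, because RereprDict sorts the items before taking repr():
>>> @memoize
... def whatever(unhashable):
...     print(*unhashable)  # Just to know when called
...     return 12345
...
>>> whatever({"a": "b", "c": "d"})
a c
12345
>>> whatever({"c": "d", "a": "b"})  # equal dict, different insertion order
12345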
There's a reason dict/list/set/etc. are not hashable: they are mutable.
That's one of the main reasons their immutable counterparts exist (frozenset/tuple, plus frozendict in third-party packages). (Well, tuple isn't exactly a frozen list, but in practice it serves the purpose.)
Therefore, for your purpose, use the immutable alternatives.
Here's a quick demonstration of why you shouldn't hash mutable objects. Bear in mind that hashing requires that a == b implies hash(a) == hash(b).
@memoize
def f(x): ...

d1 = {'a': 5}
d2 = {'a': 99}

res1 = f(d1)
res2 = f(d2)

d1['a'] = 99
res3 = f(d1)  # what should be returned? not well defined...
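If you do want to memoize such a call anyway, one hedged workaround following this advice is to snapshot the mutable argument into its immutable counterpart at the call site, so later mutations can't change the key:

d1 = {'a': 5}
key = frozenset(d1.items())  # immutable snapshot; the values must be hashable
res1 = f(key)
d1['a'] = 99                 # mutating d1 no longer affects the cached key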
Let's say I have this dictionary in Python, defined at the module level (mysettings.py):
settings = {
'expensive1' : expensive_to_compute(1),
'expensive2' : expensive_to_compute(2),
...
}
I would like those values to be computed when the keys are accessed:
from mysettings import settings # settings is only "prepared"
print settings['expensive1'] # Now the value is really computed.
Is this possible? How?
Don't inherit from the built-in dict. Even if you override the dict.__getitem__() method, dict.get() will not work as you expect.
The right way is to inherit from collections.abc.Mapping:
from collections.abc import Mapping

class LazyDict(Mapping):
    def __init__(self, *args, **kw):
        self._raw_dict = dict(*args, **kw)

    def __getitem__(self, key):
        func, arg = self._raw_dict.__getitem__(key)
        return func(arg)

    def __iter__(self):
        return iter(self._raw_dict)

    def __len__(self):
        return len(self._raw_dict)
Then you can do:
settings = LazyDict({
    'expensive1': (expensive_to_compute, 1),
    'expensive2': (expensive_to_compute, 2),
})
I also list sample code and examples here: https://gist.github.com/gyli/9b50bb8537069b4e154fec41a4b5995a
If you don't separate the arguments from the callable, I don't think it's possible. However, this should work:
class MySettingsDict(dict):
    def __getitem__(self, item):
        function, arg = dict.__getitem__(self, item)
        return function(arg)

def expensive_to_compute(arg):
    return arg * 3
And now:
>>> settings = MySettingsDict({
...     'expensive1': (expensive_to_compute, 1),
...     'expensive2': (expensive_to_compute, 2),
... })
>>> settings['expensive1']
3
>>> settings['expensive2']
6
Edit:
You may also want to cache the results of expensive_to_compute, if they are to be accessed multiple times. Something like this:
class MySettingsDict(dict):
    def __getitem__(self, item):
        value = dict.__getitem__(self, item)
        if not isinstance(value, int):  # still the (function, arg) pair: not computed yet
            function, arg = value
            value = function(arg)
            dict.__setitem__(self, item, value)
        return value
And now:
>>> settings.values()
dict_values([(<function expensive_to_compute at 0x9b0a62c>, 2),
(<function expensive_to_compute at 0x9b0a62c>, 1)])
>>> settings['expensive1']
3
>>> settings.values()
dict_values([(<function expensive_to_compute at 0x9b0a62c>, 2), 3])
You may also want to override other dict methods depending of how you want to use the dict.
Store references to the functions as the values for the keys, i.e.:
def A():
    return "that took ages"

def B():
    return "that took for-ever"

settings = {
    "A": A,
    "B": B,
}

print(settings["A"]())
This way, you only evaluate the function associated with a key when you access it and invoke it. A suitable class which can handle having non-lazy values would be:
import types

class LazyDict(dict):
    def __getitem__(self, key):
        item = dict.__getitem__(self, key)
        if isinstance(item, types.FunctionType):
            return item()
        else:
            return item
usage:
settings = LazyDict([("A",A),("B",B)])
print(settings["A"])
>>>
that took ages
You can make expensive_to_compute a generator function; calling a generator function returns a generator immediately, without executing its body:
settings = {
'expensive1' : expensive_to_compute(1),
'expensive2' : expensive_to_compute(2),
}
Then try:
from mysettings import settings
print next(settings['expensive1'])
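The original answer leaves the generator itself unspecified; presumably it would look something like the sketch below. Note that the body only runs on the first next(), and that each generator can be consumed only once (a second next() raises StopIteration):

def expensive_to_compute(arg):
    # nothing below runs until next() is called on the generator
    yield arg * 3

settings = {'expensive1': expensive_to_compute(1)}
print next(settings['expensive1'])  # computed here, on first next()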
I would populate the dictionary values with callables and change them to the result upon reading.
class LazyDict(dict):
    def __getitem__(self, k):
        v = super().__getitem__(k)
        if callable(v):
            v = v()
            super().__setitem__(k, v)
        return v

    def get(self, k, default=None):
        if k in self:
            return self.__getitem__(k)
        return default
Then with
def expensive_to_compute(arg):
    print('Doing heavy stuff')
    return arg * 3
you can do:
>>> settings = LazyDict({
...     'expensive1': lambda: expensive_to_compute(1),
...     'expensive2': lambda: expensive_to_compute(2),
... })
>>> settings.__repr__()
"{'expensive1': <function <lambda> at 0x000001A0BA2B8EA0>, 'expensive2': <function <lambda> at 0x000001A0BA2B8F28>}"
>>> settings['expensive1']
Doing heavy stuff
3
>>> settings.get('expensive2')
Doing heavy stuff
6
>>> settings.__repr__()
"{'expensive1': 3, 'expensive2': 6}"
I recently needed something similar. Mixing both strategies from Guangyang Li and michaelmeyer, here is how I did it:
from collections.abc import MutableMapping

class LazyDict(MutableMapping):
    """Lazily evaluated dictionary."""

    function = None

    def __init__(self, *args, **kargs):
        self._dict = dict(*args, **kargs)
        self._evaluated = set()

    def __getitem__(self, key):
        """Evaluate value."""
        value = self._dict[key]
        # The original code tested isinstance(value, ccData) -- a type from the
        # author's own project -- to see whether the value had been computed
        # already; tracking evaluated keys keeps the example below runnable.
        if key not in self._evaluated:
            value = self.function(value)
            self._dict[key] = value
            self._evaluated.add(key)
        return value

    def __setitem__(self, key, value):
        """Store value lazily."""
        self._dict[key] = value
        self._evaluated.discard(key)  # storing a raw value forces re-evaluation

    def __delitem__(self, key):
        """Delete value."""
        del self._dict[key]

    def __iter__(self):
        """Iterate over dictionary."""
        return iter(self._dict)

    def __len__(self):
        """Evaluate size of dictionary."""
        return len(self._dict)
Let's lazily evaluate the following function:
def expensive_to_compute(arg):
    return arg * 3
The advantage is that the function doesn't need to be known when the object is created, and the arguments are what actually gets stored (which is what I needed):
>>> settings = LazyDict({'expensive1': 1, 'expensive2': 2})
>>> settings.function = expensive_to_compute # function unknown until now!
>>> settings['expensive1']
3
>>> settings['expensive2']
6
This approach works with a single function only.
I can point out the following advantages:
implements the complete MutableMapping API
if your function is non-deterministic, you can reset a value to re-evaluate
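To illustrate that second advantage (my example, continuing the session above): assigning the raw argument again forces recomputation on the next read.

settings['expensive1'] = 1     # store the raw argument again
print(settings['expensive1'])  # re-evaluated on access: 3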
pass in a function to generate the values on the first attribute get:
class LazyDict(dict):
    """Fill in the values of a dict at first access."""

    def __init__(self, fn, *args, **kwargs):
        self._fn = fn
        self._fn_args = args or []
        self._fn_kwargs = kwargs or {}
        return super(LazyDict, self).__init__()

    def _fn_populate(self):
        if self._fn:
            self._fn(self, *self._fn_args, **self._fn_kwargs)
            self._fn = self._fn_args = self._fn_kwargs = None

    def __getattribute__(self, name):
        if not name.startswith('_fn'):
            self._fn_populate()
        return super(LazyDict, self).__getattribute__(name)

    def __getitem__(self, item):
        self._fn_populate()
        return super(LazyDict, self).__getitem__(item)
>>> def _fn(self, val):
... print 'lazy loading'
... self['foo'] = val
...
>>> d = LazyDict(_fn, 'bar')
>>> d
{}
>>> d['foo']
lazy loading
'bar'
>>>
Alternatively, one can use the lazydict package, whose LazyDictionary class is a thread-safe lazy dictionary.
Installation:
pip install lazydict
Usage:
from lazydict import LazyDictionary
import tempfile
lazy = LazyDictionary()
lazy['temp'] = lambda: tempfile.mkdtemp()
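Reading the key then runs the lambda; as I understand the package (check its docs to confirm the caching behavior), the value is computed on first access:

print(lazy['temp'])  # tempfile.mkdtemp() runs here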
I'm trying to create an object collection proxy, which could do something like this:
class A:
    def do_something(self):
        # ...

class B:
    def get_a(self):
        return A()

class Proxy:
    ?

collection = [B(), B()]
proxy = Proxy(collection)
proxy.get_a().do_something()
# ^ for each B in collection, get_a() and do_something()
What would be the best architecture / strategy for achieving this?
The key question, I guess, is how to cache the result of get_a() so that I can then proxy do_something().
N.B. I don't expect proxy.get_a().do_something() to return anything sensible; it's only supposed to do things.
Simple enough... you may want to adapt it to do some more checking:
class A(object):
    def do_something(self):
        print id(self), "called"

class B(object):
    def get_a(self):
        return A()

class Proxy(object):
    def __init__(self, objs):
        self._objs = objs

    def __getattr__(self, name):
        def func(*args, **kwargs):
            return Proxy([getattr(o, name)(*args, **kwargs) for o in self._objs])
        return func

collection = [B(), B()]
proxy = Proxy(collection)
proxy.get_a().do_something()
Results in:
4455571152 called
4455571216 called
The most pythonic way of going about this would probably be a list comprehension:
results = [b.get_a().do_something() for b in collection]
If you want to cache calls to B.get_a(), you can use memoization. A simple way of doing memoization yourself could look like this:
cache = None
# ...

class B:
    def get_a(self):
        global cache
        if cache is None:
            cache = A()
        return cache
If you want to use caching in multiple places, you'll need to cache results based on keys in order to distinguish them, and for convenience's sake write a decorator that you can simply wrap functions with whose results you want to cache.
A good example of this is found in Python Algorithms: Mastering Basic Algorithms in the Python Language (see this question). Modified for your case, to not use the function arguments but the function name as cache key, it would look like this:
from functools import wraps

def memoize(func):
    cache = {}
    key = func.__name__

    @wraps(func)
    def wrap(*args):
        if key not in cache:
            cache[key] = func(*args)
        return cache[key]
    return wrap
class A:
    def do_something(self):
        return 1

class B:
    @memoize
    def get_a(self):
        print "B.get_a() was called"
        return A()

collection = [B(), B()]
results = [b.get_a().do_something() for b in collection]
print results
Output:
B.get_a() was called
[1, 1]
I'm trying to write an interface that abstracts another interface somewhat.
The bottom interface is inconsistent about what it requires: sometimes ids, and sometimes names. I'm trying to hide details like these.
I want to create a list-like object that will allow you to add names to it, but internally store id's associated with those names.
Preferably, I'd like to use something like descriptors for class attributes, except that they work on list items instead. That is, a function (like __get__) is called for everything added to the list to convert it to the id's I want to store internally, and another function (like __set__) to return objects (that provide convenience methods) instead of the actual id's when trying to retrieve items from the list.
So that I can do something like this:
def get_thing_id_from_name(name):
    # assume that this is more complicated
    return other_api.get_id_from_name_or_whatever(name)

class Thing(object):
    def __init__(self, thing_id):
        self.id = thing_id
        self.name = other_api.get_name_somehow(thing_id)

    def __eq__(self, other):
        if isinstance(other, basestring):
            return self.name == other
        if isinstance(other, Thing):
            return self.id == other.id
        return NotImplemented
tl = ThingList()
tl.append('thing_one')
tl.append('thing_two')
tl[1] = 'thing_three'
print tl[0].id
print tl[0] == 'thing_one'
print tl[1] == Thing(3)
The documentation recommends defining 17 methods (not including a constructor) for an object that acts like a mutable sequence. I don't think subclassing list is going to help me out at all. It feels like I ought to be able to achieve this just by defining a getter and a setter somewhere.
UserList is apparently deprecated (although it still exists in Python 3; I'm using 2.7, though).
Is there a way to achieve this, or something similar, without having to redefine so much functionality?
You don't need to override all the list methods -- __setitem__, __init__ and append should be enough; you may want insert and some others as well. You could write __setitem__ and __getitem__ to call __set__ and __get__ methods on a special "Thing" class, exactly as descriptors do.
Here is a short example - maybe something like what you want:
class Thing(object):
    def __init__(self, thing):
        self.value = thing
        self.name = str(thing)

    id = property(lambda s: id(s))
    # ...

    def __repr__(self):
        return "I am a %s" % self.name

class ThingList(list):
    def __init__(self, items):
        for item in items:
            self.append(item)

    def append(self, value):
        list.append(self, Thing(value))

    def __setitem__(self, index, value):
        list.__setitem__(self, index, Thing(value))
Example:
>>> a = ThingList(range(3))
>>> a.append("three")
>>> a
[I am a 0, I am a 1, I am a 2, I am a three]
>>> a[0].id
35242896
>>>
-- edit --
The O.P. commented: "I was really hoping that there would be a way to have all the functionality from list - addition, extending, slices etc. and only have to redefine the get/set item behaviour."
So mote it be -- one really has to override all the relevant methods in this way. But if what we want to avoid is just a lot of boilerplate code, with many functions doing almost the same thing, the new, overridden methods can be generated dynamically -- all we need is a decorator to turn ordinary objects into Things for all operations that set values:
class Thing(object):
    # Prevents duplicating the wrapping of objects:
    def __new__(cls, thing):
        if isinstance(thing, cls):
            return thing
        return object.__new__(cls)

    def __init__(self, thing):
        if thing is self:
            return  # (fix) __new__ returned an existing Thing; don't re-initialize it
        self.value = thing
        self.name = str(thing)

    id = property(lambda s: id(s))
    # ...

    def __repr__(self):
        return "I am a %s" % self.name

def converter(func, cardinality=1):
    def new_func(*args):
        # Pick the last item in the argument list, which
        # for all item-setter methods on a list is the one
        # that actually contains the values
        if cardinality == 1:
            args = args[:-1] + (Thing(args[-1]),)
        else:
            args = args[:-1] + ([Thing(item) for item in args[-1]],)
        return func(*args)
    new_func.__name__ = func.__name__
    return new_func

my_list_dict = {}

for single_setter in ("__setitem__", "append", "insert"):
    my_list_dict[single_setter] = converter(getattr(list, single_setter), cardinality=1)

for many_setter in ("__setslice__", "__add__", "__iadd__", "__init__", "extend"):
    my_list_dict[many_setter] = converter(getattr(list, many_setter), cardinality="many")

MyList = type("MyList", (list,), my_list_dict)
And it works thus:
>>> a = MyList()
>>> a
[]
>>> a.append(5)
>>> a
[I am a 5]
>>> a + [2,3,4]
[I am a 5, I am a 2, I am a 3, I am a 4]
>>> a.extend(range(4))
>>> a
[I am a 5, I am a 0, I am a 1, I am a 2, I am a 3]
>>> a[1:2] = range(10,12)
>>> a
[I am a 5, I am a 10, I am a 11, I am a 1, I am a 2, I am a 3]
>>>
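One caveat not in the answer above: __setslice__ exists only on Python 2. On Python 3, slice assignment arrives at __setitem__ with a slice object, so a port would need to branch on that; a rough sketch:

class MyList3(list):
    def __setitem__(self, index, value):
        # a[1:2] = ... arrives here with a slice object on Python 3
        if isinstance(index, slice):
            value = [Thing(item) for item in value]
        else:
            value = Thing(value)
        list.__setitem__(self, index, value)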
When subclassing built-in types, I noticed a rather important difference between Python 2 and Python 3 in the return type of their methods. The following code illustrates this for sets:
class MySet(set):
    pass

s1 = MySet([1, 2, 3, 4, 5])
s2 = MySet([1, 2, 3, 6, 7])

print(type(s1.union(s2)))
print(type(s1.intersection(s2)))
print(type(s1.difference(s2)))
With Python 2, all the return values are of type MySet. With Python 3, the return types are set. I could not find any documentation on what the result is supposed to be, nor any documentation about the change in Python 3.
Anyway, what I really care about is this: is there a simple way in Python 3 to get the behavior seen in Python 2, without redefining every single method of the built-in types?
This isn't a general change for built-in types when moving from Python 2.x to 3.x -- list and int, for example, have the same behaviour in 2.x and 3.x. Only the set type was changed to bring it in line with the other types, as discussed in this bug tracker issue.
I'm afraid there is no really nice way to make it behave the old way. Here is some code I was able to come up with:
class MySet(set):
    def copy(self):
        return MySet(self)

    def _make_binary_op(in_place_method):
        def bin_op(self, other):
            new = self.copy()
            in_place_method(new, other)
            return new
        return bin_op

    __rand__ = __and__ = _make_binary_op(set.__iand__)
    intersection = _make_binary_op(set.intersection_update)
    __ror__ = __or__ = _make_binary_op(set.__ior__)
    union = _make_binary_op(set.update)
    __sub__ = _make_binary_op(set.__isub__)
    difference = _make_binary_op(set.difference_update)
    __rxor__ = __xor__ = _make_binary_op(set.__ixor__)
    symmetric_difference = _make_binary_op(set.symmetric_difference_update)

    del _make_binary_op

    def __rsub__(self, other):
        new = MySet(other)
        new -= self
        return new
This simply overwrites all the relevant methods with versions that return your own type. (And there are a whole lot of them!)
Maybe for your application you can get away with overriding copy() and sticking to the in-place methods.
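A quick check of the class above (my example; the printed names assume it's defined in __main__):

s1 = MySet([1, 2, 3])
s2 = MySet([3, 4])
print(type(s1 | s2))        # <class '__main__.MySet'>
print(type(s1.union(s2)))   # <class '__main__.MySet'>
print(type({9, 10} - s1))   # <class '__main__.MySet'>, thanks to __rsub__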
Perhaps a metaclass to do all that humdrum wrapping for you would make it easier:
class Perpetuate(type):
    def __new__(metacls, cls_name, cls_bases, cls_dict):
        if len(cls_bases) > 1:
            raise TypeError("multiple bases not allowed")
        result_class = type.__new__(metacls, cls_name, cls_bases, cls_dict)
        base_class = cls_bases[0]
        known_attr = set()
        for attr in cls_dict.keys():
            known_attr.add(attr)
        for attr in base_class.__dict__.keys():
            if attr in ('__new__',):  # note the comma: ('__new__') would be a substring test
                continue
            code = getattr(base_class, attr)
            if callable(code) and attr not in known_attr:
                setattr(result_class, attr, metacls._wrap(base_class, code))
            elif attr not in known_attr:
                setattr(result_class, attr, code)
        return result_class

    @staticmethod
    def _wrap(base, code):
        def wrapper(*args, **kwargs):
            if args:
                cls = args[0]
            result = code(*args, **kwargs)
            if type(result) == base:
                return cls.__class__(result)
            elif isinstance(result, (tuple, list, set)):
                new_result = []
                for partial in result:
                    if type(partial) == base:
                        new_result.append(cls.__class__(partial))
                    else:
                        new_result.append(partial)
                result = result.__class__(new_result)
            elif isinstance(result, dict):
                for key in result:
                    value = result[key]
                    if type(value) == base:
                        result[key] = cls.__class__(value)
            return result
        wrapper.__name__ = code.__name__
        wrapper.__doc__ = code.__doc__
        return wrapper
class MySet(set, metaclass=Perpetuate):
    pass

s1 = MySet([1, 2, 3, 4, 5])
s2 = MySet([1, 2, 3, 6, 7])

print(s1.union(s2))
print(type(s1.union(s2)))
print(s1.intersection(s2))
print(type(s1.intersection(s2)))
print(s1.difference(s2))
print(type(s1.difference(s2)))
As a follow-up to Sven's answer, here is a universal wrapping solution that takes care of all non-special methods. The idea is to catch the first lookup coming from a method call, and install a wrapper method that does the type conversion. At subsequent lookups, the wrapper is returned directly.
Caveats:
1) This is more magic trickery than I like to have in my code.
2) I'd still need to wrap special methods (__and__ etc.) manually because their lookup bypasses __getattribute__
import types

class MySet(set):
    def __getattribute__(self, name):
        attr = super(MySet, self).__getattribute__(name)
        if isinstance(attr, types.BuiltinMethodType):
            # Grab the underlying method from the base class, so the cached
            # wrapper isn't tied to the instance that triggered the first
            # lookup (using the bound method here would be a bug).
            method = getattr(set, name)

            def wrapper(self, *args, **kwargs):
                result = method(self, *args, **kwargs)
                if isinstance(result, set):
                    return MySet(result)
                else:
                    return result

            setattr(MySet, name, wrapper)
            return wrapper.__get__(self, MySet)
        return attr
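Usage looks like this (my example); note how the operator form still returns a plain set, per caveat 2:

s1 = MySet([1, 2, 3, 4, 5])
s2 = MySet([1, 2, 3, 6, 7])
print(type(s1.union(s2)))         # <class '__main__.MySet'>
print(type(s1.intersection(s2)))  # <class '__main__.MySet'>
print(type(s1 & s2))              # <class 'set'>: special-method lookup bypasses the hook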
I would like to combine OrderedDict() and defaultdict() from collections in one object, which shall be an ordered, default dict.
Is this possible?
The following (using a modified version of this recipe) works for me:
from collections import OrderedDict, Callable

class DefaultOrderedDict(OrderedDict):
    # Source: http://stackoverflow.com/a/6190500/562769
    def __init__(self, default_factory=None, *a, **kw):
        if (default_factory is not None and
                not isinstance(default_factory, Callable)):
            raise TypeError('first argument must be callable')
        OrderedDict.__init__(self, *a, **kw)
        self.default_factory = default_factory

    def __getitem__(self, key):
        try:
            return OrderedDict.__getitem__(self, key)
        except KeyError:
            return self.__missing__(key)

    def __missing__(self, key):
        if self.default_factory is None:
            raise KeyError(key)
        self[key] = value = self.default_factory()
        return value

    def __reduce__(self):
        if self.default_factory is None:
            args = tuple()
        else:
            args = self.default_factory,
        return type(self), args, None, None, self.items()

    def copy(self):
        return self.__copy__()

    def __copy__(self):
        return type(self)(self.default_factory, self)

    def __deepcopy__(self, memo):
        import copy
        return type(self)(self.default_factory,
                          copy.deepcopy(self.items()))

    def __repr__(self):
        # (fix) use the actual class name instead of 'OrderedDefaultDict'
        return 'DefaultOrderedDict(%s, %s)' % (self.default_factory,
                                               OrderedDict.__repr__(self))
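A short usage example (mine, not part of the recipe):

d = DefaultOrderedDict(list)
d['x'].append(1)  # missing key: the default_factory() result is inserted first
d['y'].append(2)
print(list(d.items()))  # [('x', [1]), ('y', [2])] -- insertion order preserved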
Here is another possibility, inspired by Raymond Hettinger's super() Considered Super, tested on Python 2.7.X and 3.4.X:
from collections import OrderedDict, defaultdict

class OrderedDefaultDict(OrderedDict, defaultdict):
    def __init__(self, default_factory=None, *args, **kwargs):
        # in Python 3 you can omit the arguments to super()
        super(OrderedDefaultDict, self).__init__(*args, **kwargs)
        self.default_factory = default_factory
If you check out the class's MRO (aka, help(OrderedDefaultDict)), you'll see this:
class OrderedDefaultDict(collections.OrderedDict, collections.defaultdict)
| Method resolution order:
| OrderedDefaultDict
| collections.OrderedDict
| collections.defaultdict
| __builtin__.dict
| __builtin__.object
meaning that when an instance of OrderedDefaultDict is initialized, it defers to the OrderedDict's init, but this one in turn will call the defaultdict's methods before calling __builtin__.dict, which is precisely what we want.
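A quick check that the cooperative inheritance behaves as intended (my example):

d = OrderedDefaultDict(int)
for ch in 'abracadabra':
    d[ch] += 1  # defaultdict.__missing__ supplies int() == 0
print(list(d.items()))  # [('a', 5), ('b', 2), ('r', 2), ('c', 1), ('d', 1)]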
If you want a simple solution that doesn't require a class, you can just use OrderedDict.setdefault(key, default=None) or OrderedDict.get(key, default=None). If you only get/set from a few places, say in a loop, you can easily just use setdefault:
totals = collections.OrderedDict()
for i, x in some_generator():
    totals[i] = totals.get(i, 0) + x
It is even easier for lists with setdefault:
agglomerate = collections.OrderedDict()
for i, x in some_generator():
    agglomerate.setdefault(i, []).append(x)
But if you use it more than a few times, it is probably better to set up a class, like in the other answers.
Here's another solution to think about if your use case is simple like mine and you don't necessarily want to add the complexity of a DefaultOrderedDict class implementation to your code.
from collections import OrderedDict
keys = ['a', 'b', 'c']
items = [(key, None) for key in keys]
od = OrderedDict(items)
(None is my desired default value.)
Note that this solution won't work if one of your requirements is to dynamically insert new keys with the default value. A tradeoff of simplicity.
Update 3/13/17 - I learned of a convenience function for this use case. Same as above but you can omit the line items = ... and just:
od = OrderedDict.fromkeys(keys)
Output:
OrderedDict([('a', None), ('b', None), ('c', None)])
And if your keys are single characters, you can just pass one string:
OrderedDict.fromkeys('abc')
This has the same output as the two examples above.
You can also pass a default value as the second arg to OrderedDict.fromkeys(...).
Another simple approach would be to use the dictionary get method:
>>> from collections import OrderedDict
>>> d = OrderedDict()
>>> d['key'] = d.get('key', 0) + 1
>>> d['key'] = d.get('key', 0) + 1
>>> d
OrderedDict([('key', 2)])
>>>
A simpler version of @zeekay's answer is:
from collections import OrderedDict
class OrderedDefaultListDict(OrderedDict):  # name according to default
    def __missing__(self, key):
        self[key] = value = []  # change to whatever default you want
        return value
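Usage (my example):

d = OrderedDefaultListDict()
d['x'].append(10)  # first access creates the empty list
d['x'].append(20)
print(d)  # OrderedDefaultListDict([('x', [10, 20])])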
A simple and elegant solution building on @NickBread's.
It has a slightly different API to set the factory, but good defaults are always nice to have:
class OrderedDefaultDict(OrderedDict):
    factory = list

    def __missing__(self, key):
        self[key] = value = self.factory()
        return value
I created a slightly fixed and more simplified version of the accepted answer, updated for Python 3.7.
from collections import OrderedDict
from copy import copy, deepcopy
import pickle
from typing import Any, Callable

class DefaultOrderedDict(OrderedDict):
    def __init__(
        self,
        default_factory: Callable[[], Any],
        *args,
        **kwargs,
    ):
        super().__init__(*args, **kwargs)
        self.default_factory = default_factory

    def __getitem__(self, key):
        try:
            return super().__getitem__(key)
        except KeyError:
            return self.__missing__(key)

    def __missing__(self, key):
        self[key] = value = self.default_factory()
        return value

    def __reduce__(self):
        return type(self), (self.default_factory,), None, None, iter(self.items())

    def copy(self):
        return self.__copy__()

    def __copy__(self):
        return type(self)(self.default_factory, self)

    def __deepcopy__(self, memo):
        return type(self)(self.default_factory, deepcopy(tuple(self.items()), memo))

    def __repr__(self):
        return f'{self.__class__.__name__}({self.default_factory}, {OrderedDict(self).__repr__()})'
And, perhaps even more importantly, here are some tests:
a = DefaultOrderedDict(list)
# testing default
assert a['key'] == []
a['key'].append(1)
assert a['key'] == [1, ]
# testing repr
assert repr(a) == "DefaultOrderedDict(<class 'list'>, OrderedDict([('key', [1])]))"
# testing copy
b = a.copy()
assert b['key'] is a['key']
c = copy(a)
assert c['key'] is a['key']
d = deepcopy(a)
assert d['key'] is not a['key']
assert d['key'] == a['key']
# testing pickle
saved = pickle.dumps(a)
restored = pickle.loads(saved)
assert restored is not a
assert restored == a
# testing order
a['second_key'] = [2, ]
a['key'] = [3, ]
assert list(a.items()) == [('key', [3, ]), ('second_key', [2, ])]
Inspired by other answers on this thread, you can use something like:
from collections import OrderedDict

class OrderedDefaultDict(OrderedDict):
    def __missing__(self, key):
        value = OrderedDefaultDict()
        self[key] = value
        return value
I would like to know if there are any downsides to initializing another object of the same class in the __missing__ method.
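One nice consequence (my example): because missing keys produce the same class, you get arbitrarily nested dictionaries for free.

tree = OrderedDefaultDict()
tree['a']['b']['c'] = 1     # intermediate dicts spring into existence
print(tree['a']['b']['c'])  # 1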
I tested defaultdict and its items happened to come out sorted!
Maybe that was just a coincidence, but either way you can use the sorted function:
sorted(s.items())
I think that's simpler.