I've got a class that overrides __iter__ to hide extra unneeded data. I've made the rest of my code backwards compatible by setting iteritems to either dict.iteritems or dict.items depending on the Python version, so that I can call iteritems(class_object), but it doesn't seem to work well with my class.
It'll be easier to explain with an example:
class Test(dict):
    def __init__(self, some_dict):
        self.some_dict = some_dict
        super(self.__class__, self).__init__(self.some_dict)

    def __iter__(self):
        for k, v in self.some_dict.iteritems():
            yield k, v['value']
test_dict = {
    'a': {'value': 'what',
          'hidden': 123},
    'b': {'value': 'test'}
}
If I iterate over Test(test_dict) (i.e. exhaust its __iter__), it correctly yields the pairs ('a', 'what') and ('b', 'test'), i.e. {'a': 'what', 'b': 'test'}
If I add iteritems = __iter__ to the class, then it also works when doing Test(test_dict).iteritems()
However, no matter what I try, calling dict.iteritems(Test(test_dict)) falls back to the standard dict iteration and returns {'a': {'hidden': 123, 'value': 'what'}, 'b': {'value': 'test'}}
I've tried a couple of trace functions but they don't go deep enough to figure out what's going on.
The dict.iteritems() method reaches straight into the internal data structures of the dict implementation. You passed in a subclass of dict so those same data structures are there for it to access. You can't override this behaviour.
Not that dict.iteritems() would ever use __iter__; the latter produces keys only, not key-value pairs!
You should instead define iteritems differently; given a PY3 boolean variable that is False for Python 2, True otherwise:
from operator import methodcaller
iteritems = methodcaller('items' if PY3 else 'iteritems')
Now iteritems(object) is translated to object.iteritems() or object.items(), as needed, and the correct method is always called.
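For completeness, a minimal sketch of how that PY3 flag could be derived (the name PY3 is just this answer's convention):

import sys
from operator import methodcaller

PY3 = sys.version_info[0] >= 3  # True on Python 3, False on Python 2
iteritems = methodcaller('items' if PY3 else 'iteritems')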
Next, to extend dictionary behaviour, instead of subclassing dict, I'd subclass collections.MutableMapping (*):
from collections import MutableMapping

class Test(MutableMapping):
    def __init__(self, some_dict):
        self.some_dict = some_dict.copy()

    def __getitem__(self, key):
        return self.some_dict[key]

    def __setitem__(self, key, value):
        self.some_dict[key] = value

    def __delitem__(self, key):
        del self.some_dict[key]

    def __len__(self):
        return len(self.some_dict)

    def __iter__(self):
        # use the version-agnostic iteritems helper defined above,
        # so this works on both Python 2 and Python 3
        for k, v in iteritems(self.some_dict):
            yield k, v['value']
This implements all the same methods that dict provides, except for copy and the dict.fromkeys() class method.
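As a quick sanity check, a sketch assuming the test_dict from the question (note that because this __iter__ yields (key, value) pairs rather than bare keys, the keys()/items() views MutableMapping derives from it won't behave like a standard mapping's):

test = Test(test_dict)
print(dict(iter(test)))  # {'a': 'what', 'b': 'test'}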
You could instead inherit from UserDict (collections.UserDict in Python 3, the UserDict module in Python 2), which adds those two remaining methods:
try:
    # Python 2
    from UserDict import UserDict
except ImportError:
    # Python 3
    from collections import UserDict

class Test(UserDict):
    def __iter__(self):
        # again using the version-agnostic iteritems helper from above
        for k, v in iteritems(self.data):
            yield k, v['value']
Only an alternate __iter__ implementation is needed in that case.
In either case, you still can't use dict.iteritems on these objects, because that method can only work with actual dict objects.
(*) collections.MutableMapping is the Python 2 location of that class; the official Python 3 location is collections.abc.MutableMapping, but aliases were kept in collections to support Python 2-compatible code (those aliases were finally removed in Python 3.10).
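A version-agnostic import, as a minimal sketch:

try:
    from collections.abc import MutableMapping  # Python 3.3+
except ImportError:
    from collections import MutableMapping  # Python 2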
Related
There's a common problem where I need to keep track of a bunch of collections in a dictionary. Let's say I want to keep track of which items I borrowed from my friends. The defaultdict class is quite useful to do this:
from collections import defaultdict
d = defaultdict(set)
d['Peter'].add('salt')
d['Eric'].add('car')
d['Eric'].add('jacket')
# defaultdict(<class 'set'>, {'Peter': {'salt'}, 'Eric': {'jacket', 'car'}})
This allows me to add items to the respective sets without worrying about whether the key is already in the dictionary. Now suppose I return the salt to Peter: I owe him nothing any more, so he can be removed from the dictionary. Doing this is slightly more cumbersome.
d['Peter'].remove('salt')
if not d['Peter']:
    del d['Peter']
I know I could put this in some function, but for readability I would like a class that removes the key automatically if the corresponding set is empty. Is there some way to do this?
Edit
Okay, I realize a pretty major problem with this idea when trying to solve it using inheritance and overriding __getitem__: when calling d[index], the value is returned before .remove(something) is ever called, which makes it impossible for the dictionary to know that the set has been emptied. I'm guessing there's no real way around using something different.
The problem with using a defaultdict to do what you want is that even accessing a key inserts that key, using the factory function. Consider:
from collections import defaultdict
d = defaultdict(set)
if d["Peter"]:
print("I owe something to Peter")
print(d)
# defaultdict(set, {'Peter': set()})
Also, the problem with creating a subclass is that, as you've realized, the __getitem__() method is called before the set is ever emptied, so you'd have to call another method that checks whether the set is empty and removes it.
A better idea might be to just not include keys with empty sets when you're creating the string representation.
class NewDefaultDict(defaultdict):
    def __repr__(self):
        return (f"NewDefaultDict({repr(self.default_factory)}, {{" +
                ", ".join(f"{repr(k)}: {repr(v)}" for k, v in self.items() if v) +
                "})")
nd = NewDefaultDict(set)
nd["Peter"].add("salt")
nd["Paul"].add("pepper")
nd["Paul"].remove("pepper")
print(nd)
# NewDefaultDict(<class 'set'>, {'Peter': {'salt'}})
You would also need to redefine __contains__() to check if the value is empty, so that e.g. "Paul" in nd returns False:
    def __contains__(self, key):
        return defaultdict.__contains__(self, key) and self[key]
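With that method added to NewDefaultDict, a quick check, continuing the example above:

print("Paul" in nd)   # False: the key exists, but its set is empty
print("Peter" in nd)  # True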
To make it compatible with for ... in nd constructs, you can redefine __iter__() (note that dict-unpacking such as {**nd} bypasses __iter__ on dict subclasses, so it will still include keys with empty sets):
    def __iter__(self):
        for key in defaultdict.__iter__(self):
            if self[key]:
                yield key
Then,
for k in nd:
    print(k)
gives:
Peter
A dictionary comprehension might be useful.
from collections import defaultdict
d = defaultdict(set)
d['Peter'].add('salt')
d['Eric'].add('car')
d['Eric'].add('jacket')
d['Peter'].remove('salt')
d2 = {k: v for k, v in d.items() if len(v) > 0}
The d2 dictionary is now:
{'Eric': {'car', 'jacket'}}
Alternatively, using the fact that an empty set is considered false in Python.
d2 = {k: v for k, v in d.items() if v}
Defining a class to implement this logic, similar to the other answer, we can simply ignore keys/values where the value meets a criterion. A function passed via the ignore parameter defines that criterion.
from collections import defaultdict
class default_ignore_dict(defaultdict):
    def __init__(self, factory, ignore, *args, **kwargs):
        defaultdict.__init__(self, factory, *args, **kwargs)
        self.ignore = ignore

    def __contains__(self, key):
        return defaultdict.__contains__(self, key) and not self.ignore(self[key])

    def items(self):
        return ((k, v) for k, v in defaultdict.items(self) if not self.ignore(v))

    def keys(self):
        return (k for k, _ in self.items())

    def values(self):
        return (v for _, v in self.items())
Testing this:
>>> d = default_ignore_dict(set, lambda s: not s)
>>> d['Peter'].add('salt')
>>> d['Peter'].remove('salt')
>>> d['Eric'].add('car')
>>> d['Eric'].add('jacket')
>>>
>>> 'Peter' in d
False
>>> list(d.items())
[('Eric', {'car', 'jacket'})]
>>>
Is there a way to make a defaultdict also be the default for the defaultdict? (i.e. infinite-level recursive defaultdict?)
I want to be able to do:
x = defaultdict(...stuff...)
x[0][1][0]
{}
So, I can do x = defaultdict(defaultdict), but that's only a second level:
x[0]
{}
x[0][0]
KeyError: 0
There are recipes that can do this. But can it be done simply, using just the normal defaultdict arguments?
Note this is asking how to do an infinite-level recursive defaultdict, so it's distinct from Python: defaultdict of defaultdict?, which covered a two-level defaultdict.
I'll probably just end up using the bunch pattern, but when I realized I didn't know how to do this, it got me interested.
The other answers here tell you how to create a defaultdict which contains "infinitely many" defaultdicts, but they fail to address what I think may have been your initial need, which was to simply have a two-depth defaultdict.
You may have been looking for:
defaultdict(lambda: defaultdict(dict))
The reasons why you might prefer this construct are:
It is more explicit than the recursive solution, and therefore likely more understandable to the reader.
This enables the "leaf" of the defaultdict to be something other than a dictionary, e.g.,: defaultdict(lambda: defaultdict(list)) or defaultdict(lambda: defaultdict(set))
For an arbitrary number of levels:
def rec_dd():
    return defaultdict(rec_dd)
>>> x = rec_dd()
>>> x['a']['b']['c']['d']
defaultdict(<function rec_dd at 0x7f0dcef81500>, {})
>>> import json
>>> print(json.dumps(x))
{"a": {"b": {"c": {"d": {}}}}}
Of course you could also do this with a lambda, but I find lambdas to be less readable. In any case it would look like this:
rec_dd = lambda: defaultdict(rec_dd)
There is a nifty trick for doing that:
tree = lambda: defaultdict(tree)
Then you can create your x with x = tree().
Similar to BrenBarn's solution, but it doesn't contain the name of the variable tree twice, so it keeps working even after the name tree is rebound:
tree = (lambda f: f(f))(lambda a: (lambda: defaultdict(a(a))))
Then you can create each new x with x = tree().
For the def version, we can use function closure scope to protect the data structure from the flaw where existing instances stop working if the tree name is rebound. It looks like this:
from collections import defaultdict

def tree():
    def the_tree():
        return defaultdict(the_tree)
    return the_tree()
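A quick sketch of the rebinding hazard this guards against:

t = tree()
tree = None           # rebind (or delete) the outer name...
t['a']['b']['c'] = 1  # ...instances created earlier keep working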
I would also propose a more OOP-styled implementation, which supports infinite nesting as well as a properly formatted repr.
class NestedDefaultDict(defaultdict):
    def __init__(self, *args, **kwargs):
        super(NestedDefaultDict, self).__init__(NestedDefaultDict, *args, **kwargs)

    def __repr__(self):
        return repr(dict(self))
Usage:
my_dict = NestedDefaultDict()
my_dict['a']['b'] = 1
my_dict['a']['c']['d'] = 2
my_dict['b']
print(my_dict) # {'a': {'b': 1, 'c': {'d': 2}}, 'b': {}}
I based this off Andrew's answer here.
If you are looking to load data from JSON or an existing dict into the nested defaultdict, see this example:
def nested_defaultdict(existing=None, **kwargs):
    if existing is None:
        existing = {}
    if not isinstance(existing, dict):
        return existing
    existing = {key: nested_defaultdict(val) for key, val in existing.items()}
    return defaultdict(nested_defaultdict, existing, **kwargs)
https://gist.github.com/nucklehead/2d29628bb49115f3c30e78c071207775
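A hypothetical usage sketch, loading a plain dict and then writing to a path that did not exist in the source data:

data = {'a': {'b': 1}}
nd = nested_defaultdict(data)
nd['a']['new']['deep'] = 2  # missing levels are created on demand
print(nd['a']['b'])  # 1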
Here is a function for an arbitrary base defaultdict for an arbitrary depth of nesting.
(cross posting from Can't pickle defaultdict)
def wrap_defaultdict(instance, times=1):
    """Wrap an instance an arbitrary number of `times` to create a nested defaultdict.

    Parameters
    ----------
    instance - list, dict, int, collections.Counter
    times - the number of nested keys above `instance`; if `times=3` dd[one][two][three] = instance

    Notes
    -----
    using `x.copy` allows pickling (loading to ipyparallel cluster or pkldump)
        - thanks https://stackoverflow.com/questions/16439301/cant-pickle-defaultdict
    """
    from collections import defaultdict

    def _dd(x):
        return defaultdict(x.copy)

    dd = defaultdict(instance)
    for i in range(times - 1):
        dd = _dd(dd)
    return dd
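A usage sketch, assuming the function above (three levels of keys over list leaves):

dd = wrap_defaultdict(list, times=3)
dd['one']['two']['three'].append(1)
print(dd['one']['two']['three'])  # [1]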
Based on Chris W's answer; however, to address the type-annotation concern, you could make it a factory function that declares the detailed types. For example, this is the final solution to my problem when I was researching this question:
def frequency_map_factory() -> dict[str, dict[str, int]]:
    """Provides a recorder of: per X:str, the frequency of Y:str occurrences."""
    return defaultdict(lambda: defaultdict(int))
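A brief usage sketch:

freq = frequency_map_factory()
freq['spam']['eggs'] += 1  # inner counts default to 0, so += works immediately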
Here is a recursive function to convert a recursive defaultdict to a normal dict:
def defdict_to_dict(defdict, finaldict):
    # pass in an empty dict for finaldict
    for k, v in defdict.items():
        if isinstance(v, defaultdict):
            # new level created and that is the new value
            finaldict[k] = defdict_to_dict(v, {})
        else:
            finaldict[k] = v
    return finaldict
defdict_to_dict(my_rec_default_dict, {})
@nucklehead's response can be extended to handle arrays in JSON as well:
def nested_dict(existing=None, **kwargs):
    if existing is None:
        existing = defaultdict()
    if isinstance(existing, list):
        existing = [nested_dict(val) for val in existing]
    if not isinstance(existing, dict):
        return existing
    existing = {key: nested_dict(val) for key, val in existing.items()}
    return defaultdict(nested_dict, existing, **kwargs)
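A hypothetical sketch with JSON-like data containing a list:

data = {'users': [{'name': 'a'}, {'name': 'b'}]}
nd = nested_dict(data)
print(nd['users'][0]['name'])  # 'a'; each list element is itself a nested defaultdict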
Here's a solution similar to @Stanislav's answer that works with multiprocessing and also allows for termination of the nesting:
from collections import defaultdict
from functools import partial

class NestedDD(defaultdict):
    def __init__(self, n, *args, **kwargs):
        self.n = n
        factory = partial(build_nested_dd, n=n - 1) if n > 1 else int
        super().__init__(factory, *args, **kwargs)

    def __repr__(self):
        return repr(dict(self))

def build_nested_dd(n):
    return NestedDD(n)
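A usage sketch (three levels deep; the innermost level defaults to int):

d = build_nested_dd(3)
d['a']['b']['c'] += 1
print(d)  # {'a': {'b': {'c': 1}}}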
In pinax's UserDict.py:
def __getitem__(self, key):
    if key in self.data:
        return self.data[key]
    if hasattr(self.__class__, "__missing__"):
        return self.__class__.__missing__(self, key)
Why does it access __missing__ through self.__class__?
thanks
The UserDict.py presented here emulates the built-in dict closely, so for example:
>>> class m(dict):
...     def __missing__(self, key): return key + key
...
>>> a = m()
>>> a['ciao']
'ciaociao'
just as you can override the special method __missing__ to deal with missing keys when you subclass the built-in dict, so can you override it when you subclass that UserDict.
The official Python docs for dict are here, and they do say:
New in version 2.5: If a subclass of dict defines a method __missing__(), if the key key is not present, the d[key] operation calls that method with the key key as argument. The d[key] operation then returns or raises whatever is returned or raised by the __missing__(key) call if the key is not present. No other operations or methods invoke __missing__(). If __missing__() is not defined, KeyError is raised. __missing__() must be a method; it cannot be an instance variable. For an example, see collections.defaultdict.
If you want to use default values in a dict (aka __missing__), you can check out defaultdict from the collections module:
from collections import defaultdict
a = defaultdict(int)
a[1] # -> 0
a[2] += 1
a # -> defaultdict(int, {1: 0, 2: 1})