Dangers of overriding a dict in Python - python

I came across this question of trying to implement a dictionary using the collections.abc MutableMapping because I was looking for something similar.
For context, I was looking to implement a dictionary that, for convenience, also acts as a mutable object, so that if I write d = CustomDictionary({'a': 4}), for example, then d.a returns 4.
For reference, here is the code posted by Aaron Hall for this particular problem:
from collections.abc import MutableMapping

class D(MutableMapping):
    '''
    Mapping that works like both a dict and a mutable object, i.e.
    d = D(foo='bar')
    and
    d.foo returns 'bar'
    '''
    # ``__init__`` method required to create instance from class.
    def __init__(self, *args, **kwargs):
        '''Use the object dict'''
        self.__dict__.update(*args, **kwargs)
    # The next five methods are requirements of the ABC.
    def __setitem__(self, key, value):
        self.__dict__[key] = value
    def __getitem__(self, key):
        return self.__dict__[key]
    def __delitem__(self, key):
        del self.__dict__[key]
    def __iter__(self):
        return iter(self.__dict__)
    def __len__(self):
        return len(self.__dict__)
    # The final two methods aren't required, but nice for demo purposes:
    def __str__(self):
        '''returns simple dict representation of the mapping'''
        return str(self.__dict__)
    def __repr__(self):
        '''echoes class, id, & reproducible representation in the REPL'''
        return '{}, D({})'.format(super(D, self).__repr__(),
                                  self.__dict__)
However I didn't consider the dangers of doing so. Namely, if I created this custom dictionary class then I would expect to have methods. But what if a method name clashes with a key with the same name? For example:
class CustomDictionary(D):  # D is the class shown above
    def doSomething(self):
        """A method of CustomDictionary"""
        print("hey!")

d = CustomDictionary()
d['a'] = 3
d['doSomething'] = 4
d.doSomething()
This raises TypeError: 'int' object is not callable, because the key is stored in the instance __dict__ and shadows the method, so d.doSomething returns 4, which is not callable.
What do you think? How would I go about implementing methods for a custom dictionary class while avoiding this problem?
Unfortunately I couldn't comment on the post since I don't have enough reputation but I was hoping this question deserves its own post.
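One possible way around the clash, offered as a sketch rather than as the linked answer's method: keep the items in a separate private dict and expose them as attributes only through __getattr__, which Python calls only after normal attribute lookup has failed. Real methods therefore always win, and a clashing key stays reachable through item access. The names AttrDict and do_something are illustrative assumptions, not part of the original post.

```python
from collections.abc import MutableMapping


class AttrDict(MutableMapping):
    """Mapping whose keys are also readable as attributes (hypothetical name)."""

    def __init__(self, *args, **kwargs):
        # Items live in their own dict, not in the instance __dict__,
        # so a key can never shadow a method defined on the class.
        self._data = dict(*args, **kwargs)

    def __getattr__(self, name):
        # Called only when normal lookup fails, so methods take priority.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return self._data[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setitem__(self, key, value):
        self._data[key] = value

    def __getitem__(self, key):
        return self._data[key]

    def __delitem__(self, key):
        del self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

    def do_something(self):
        return "hey!"


d = AttrDict({"a": 3})
d["do_something"] = 4
print(d.a)               # attribute access falls back to the stored key
print(d.do_something())  # the method is still callable
print(d["do_something"])  # the clashing key is reachable via item access
```

The trade-off is that a key sharing a name with a method is only reachable with d[key], never with d.key.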

Related

Return a custom value when a class method is accessed as an attribute, but still allow for it to perform a computation when called?

Specifically, I would want MyClass.my_method to be used for lookup of a value in the class dictionary, but MyClass.my_method() to be a method that accepts arguments and performs a computation to update an attribute in MyClass and then returns MyClass with all its attributes (including the updated one).
I am thinking that this might be doable with Python's descriptors (maybe overriding __get__ or __call__), but I can't figure out how this would look. I understand that the behavior might be confusing, but I am interested if it is possible (and if there are any other major caveats).
I have seen that you can do something similar for classes and functions by overriding __repr__, but I can't find a similar way for a method within a class. My returned value will also not always be a string, which seems to prohibit the __repr__-based approaches mentioned in these two questions:
Possible to change a function's repr in python?
How to create a custom string representation for a class object?
Thank you Joel for the minimal implementation. I found that the remaining problem is the lack of initialization of the parent, since I did not find a generic way of initializing it, I need to check for attributes in the case of list/dict, and add the initialization values to the parent accordingly.
This addition to the code should make it work for lists/dicts:
def classFactory(parent, init_val, target):
    class modifierClass(parent):
        def __init__(self, init_val):
            super().__init__()
            dict_attr = getattr(parent, "update", None)
            list_attr = getattr(parent, "extend", None)
            if callable(dict_attr):  # parent is dict
                self.update(init_val)
            elif callable(list_attr):  # parent is list
                self.extend(init_val)
            self.target = target
        def __call__(self, *args):
            self.target.__init__(*args)
    return modifierClass(init_val)

class myClass:
    def __init__(self, init_val=''):
        self.method = classFactory(init_val.__class__, init_val, self)
Unfortunately, we need to add case by case, but this works as intended.
A slightly less verbose way to write the above is the following:
def classFactory(parent, init_val, target):
    class modifierClass(parent):
        def __init__(self, init_val):
            if isinstance(init_val, list):
                self.extend(init_val)
            elif isinstance(init_val, dict):
                self.update(init_val)
            self.target = target
        def __call__(self, *args):
            self.target.__init__(*args)
    return modifierClass(init_val)

class myClass:
    def __init__(self, init_val=''):
        self.method = classFactory(init_val.__class__, init_val, self)
As jasonharper commented,
MyClass.my_method() works by looking up MyClass.my_method, and then attempting to call that object. So the result of MyClass.my_method cannot be a plain string, int, or other common data type [...]
The trouble comes specifically from reusing the same name for these two properties, which is very confusing, just as you said. So, don't do it.
But for the sole interest of it you could try to proxy the value of the property with an object that would return the original MyClass instance when called, use an actual setter to perform any computation you wanted, and also forward arbitrary attributes to the proxied value.
class MyClass:
    _my_method = whatever

    @property
    def my_method(self):
        my_class = self

        class Proxy:
            def __init__(self, value):
                self.__proxied = value
            def __call__(self, value):
                my_class.my_method = value
                return my_class
            def __getattr__(self, name):
                return getattr(self.__proxied, name)
            def __str__(self):
                return str(self.__proxied)
            def __repr__(self):
                return repr(self.__proxied)

        return Proxy(self._my_method)

    @my_method.setter
    def my_method(self, value):
        # your computations
        self._my_method = value

a = MyClass()
b = a.my_method('do not do this at home')
a is b
# True
a.my_method.split(' ')
# ['do', 'not', 'do', 'this', 'at', 'home']
And today, duck typing will abuse you, forcing you to delegate all kinds of magic methods to the proxied value in the proxy class, until the poor codebase where you want to inject this is satisfied with how those values quack.
This is a minimal implementation of Guillherme's answer that updates the method instead of a separate modifiable parameter:
def classFactory(parent, init_val, target):
    class modifierClass(parent):
        def __init__(self, init_val):
            self.target = target
        def __call__(self, *args):
            self.target.__init__(*args)
    return modifierClass(init_val)

class myClass:
    def __init__(self, init_val=''):
        self.method = classFactory(init_val.__class__, init_val, self)
This and the original answer both work well for single values, but it seems like lists and dictionaries are returned as empty instead of with the expected values and I am not sure why so help is appreciated here:

Lazy-loading variables using overloaded decorators

I have a state object that represents a system. Properties within the state object are populated from [huge] text files. As not every property is accessed every time a state instance is created, it makes sense to load them lazily:
class State:
    def import_positions(self):
        self._positions = {}
        # Code which populates self._positions

    @property
    def positions(self):
        try:
            return self._positions
        except AttributeError:
            self.import_positions()
            return self._positions

    def import_forces(self):
        self._forces = {}
        # Code which populates self._forces

    @property
    def forces(self):
        try:
            return self._forces
        except AttributeError:
            self.import_forces()
            return self._forces
There's a lot of repetitive boilerplate code here. Moreover, sometimes an import_abc can populate several variables (e.g. import a few variables from a small data file if it's already open).
It makes sense to overload @property such that it accepts a function to "provide" that variable, viz:
class State:
    def import_positions(self):
        self._positions = {}
        # Code which populates self._positions

    @lazyproperty(import_positions)
    def positions(self):
        pass

    def import_forces(self):
        self._forces = {}
        # Code which populates self._forces and self._strain

    @lazyproperty(import_forces)
    def forces(self):
        pass

    @lazyproperty(import_forces)
    def strain(self):
        pass
However, I cannot seem to find a way to trace exactly which methods are being called in the @property decorator. As such, I don't know how to approach overloading @property into my own @lazyproperty.
Any thoughts?
Maybe you want something like this. It's a sort of simple memoization function combined with @property.
def lazyproperty(func):
    values = {}
    def wrapper(self):
        if self not in values:
            values[self] = func(self)
        return values[self]
    wrapper.__name__ = func.__name__
    return property(wrapper)

class State:
    @lazyproperty
    def positions(self):
        print('loading positions')
        return {1, 2, 3}

s = State()
print(s.positions)
print(s.positions)
Which prints:
loading positions
{1, 2, 3}
{1, 2, 3}
Caveat: entries in the values dictionary won't be garbage collected, so it's not suitable for long-running programs. If the loaded value is the same for every instance, it can be stored on the function object itself for better speed and memory use (this replaces the body of wrapper):
    try:
        return func.value
    except AttributeError:
        func.value = func(self)
        return func.value
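The garbage-collection caveat can be softened by keeping the per-instance cache in a weakref.WeakKeyDictionary, so an entry disappears once its instance is no longer referenced elsewhere. This is a minimal sketch that assumes instances are hashable and weak-referenceable (true for ordinary classes without restrictive __slots__); the load_count counter is only there to make the single load visible.

```python
import weakref


def lazyproperty(func):
    # Keys are held weakly: when an instance is collected, its entry goes too.
    values = weakref.WeakKeyDictionary()

    def wrapper(self):
        if self not in values:
            values[self] = func(self)
        return values[self]

    wrapper.__name__ = func.__name__
    return property(wrapper)


class State:
    load_count = 0  # illustrative counter, not part of the original answer

    @lazyproperty
    def positions(self):
        State.load_count += 1
        return {1, 2, 3}


s = State()
print(s.positions)
print(s.positions)
print(State.load_count)  # the loader body ran once
```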
I think you can remove even more boilerplate by writing a custom descriptor class that decorates the loader method. The idea is to have the descriptor itself encode the lazy-loading logic, meaning that the only thing you define in an actual method is the loader itself (which is the only thing that, apparently, really does have to vary for different values). Here's an example:
class LazyDesc(object):
    def __init__(self, func):
        self.loader = func
        self.secretAttr = '_' + func.__name__

    def __get__(self, obj, cls):
        try:
            return getattr(obj, self.secretAttr)
        except AttributeError:
            print("Lazily loading", self.secretAttr)
            self.loader(obj)
            return getattr(obj, self.secretAttr)

class State(object):
    @LazyDesc
    def positions(self):
        self._positions = {'some': 'positions'}

    @LazyDesc
    def forces(self):
        self._forces = {'some': 'forces'}
Then:
>>> x = State()
>>> x.forces
Lazily loading _forces
{'some': 'forces'}
>>> x.forces
{'some': 'forces'}
>>> x.positions
Lazily loading _positions
{'some': 'positions'}
>>> x.positions
{'some': 'positions'}
Notice that the "lazy loading" message was printed only on the first access for each attribute. This version also auto-creates the "secret" attribute to hold the real data by prepending an underscore to the method name (i.e., data for positions is stored in _positions). In this example, there's no setter, so you can't do x.positions = blah (although you can still mutate the positions with x.positions['key'] = val), but the approach could be extended to allow setting as well.
The nice thing about this approach is that your lazy logic is transparently encoded in the descriptor __get__, meaning that it easily generalizes to other kinds of boilerplate that you might want to abstract away in a similar manner.
However, I cannot seem to find a way to trace exactly what method are
being called in the @property decorator.
property is actually a type (whether you use it with the decorator syntax or not is orthogonal), which implements the descriptor protocol (https://docs.python.org/2/howto/descriptor.html). An overly simplified (I skipped the deleter, doc and quite a few other things...) pure-Python implementation would look like this:
class property(object):
    def __init__(self, fget=None, fset=None):
        self.fget = fget
        self.fset = fset

    def setter(self, func):
        self.fset = func
        # Return the property, not func, so the decorated name stays a
        # property (the builtin returns a new property object instead).
        return self

    def __get__(self, obj, type=None):
        return self.fget(obj)

    def __set__(self, obj, value):
        if self.fset:
            self.fset(obj, value)
        else:
            raise AttributeError("Attribute is read-only")
Now, overloading property is not necessarily the simplest solution. In fact, there are quite a few existing implementations out there, including Django's "cached_property" (cf. http://ericplumb.com/blog/understanding-djangos-cached_property-decorator.html for more about it) and pydanny's "cached-property" package (https://pypi.python.org/pypi/cached-property/0.1.5).
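For the @lazyproperty(loader) form the question asks about, a decorator factory that returns a plain property may be enough. The sketch below assumes the convention used in the question: the loader stores each value in an attribute named after the property with a leading underscore. The loads counter and the sample values are illustrative only.

```python
def lazyproperty(loader):
    """Return a decorator; loader(self) is expected to set `_<name>`."""
    def decorator(func):
        attr = "_" + func.__name__

        def getter(self):
            # Run the loader only if the backing attribute is still missing.
            if not hasattr(self, attr):
                loader(self)
            return getattr(self, attr)

        getter.__name__ = func.__name__
        getter.__doc__ = func.__doc__
        return property(getter)
    return decorator


class State:
    def import_forces(self):
        # Illustrative loader: one pass fills two attributes.
        self.loads = getattr(self, "loads", 0) + 1
        self._forces = {"f": 1.0}
        self._strain = {"s": 0.5}

    @lazyproperty(import_forces)
    def forces(self):
        pass

    @lazyproperty(import_forces)
    def strain(self):
        pass


s = State()
print(s.forces, s.strain, s.loads)
```

Because import_forces fills both backing attributes, reading strain after forces does not trigger a second load. On Python 3.8 and later, functools.cached_property in the standard library covers the simpler one-loader-per-property case.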

OO design: an object that can be exported to a "row", while accessing header names, without repeating myself

Sorry, badly worded title. I hope a simple example will make it clear. Here's the easiest way to do what I want to do:
import csv

class Lemon(object):
    headers = ['ripeness', 'colour', 'juiciness', 'seeds?']
    def to_row(self):
        return [self.ripeness, self.colour, self.juiciness, self.seeds > 0]

def save_lemons(lemonset):
    with open('lemons.csv', 'w') as f:
        out = csv.writer(f)
        out.writerow(Lemon.headers)  # csv writers have writerow, not write
        for lemon in lemonset:
            out.writerow(lemon.to_row())
This works alright for this small example, but I feel like I'm "repeating myself" in the Lemon class. And in the actual code I'm trying to write (where the number of variables I'm exporting is ~50 rather than 4, and where to_row calls a number of private methods that do a bunch of weird calculations), it becomes awkward.
As I write the code to generate a row, I need to constantly refer to the "headers" variable to make sure I'm building my list in the correct order. If I want to change the variables being outputted, I need to make sure to_row and headers are being changed in parallel (exactly the kind of thing that DRY is meant to prevent, right?).
Is there a better way I could design this code? I've been playing with function decorators, but nothing has stuck. Ideally I should still be able to get at the headers without having a particular lemon instance (i.e. it should be a class variable or class method), and I don't want to have a separate method for each variable.
In this case, getattr() is your friend: it allows you to get a variable based on a string name. For example:
    def to_row(self):
        return [getattr(self, head) for head in self.headers]
EDIT: to properly use the header seeds?, you would need to set the attribute seeds? for the objects. setattr(self, 'seeds?', self.seeds > 0) right above the return statement.
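A variation that avoids setting an attribute literally named seeds? is to keep one class-level list of (header, getter) pairs and derive both the header row and the data row from it. This is only a sketch: the columns list, the headers() classmethod and the constructor are assumptions added for illustration, not part of the original code.

```python
from operator import attrgetter


class Lemon:
    # Single source of truth: column order, header text, and how to get each value.
    columns = [
        ("ripeness", attrgetter("ripeness")),
        ("colour", attrgetter("colour")),
        ("juiciness", attrgetter("juiciness")),
        ("seeds?", lambda lemon: lemon.seeds > 0),
    ]

    def __init__(self, ripeness, colour, juiciness, seeds):
        self.ripeness = ripeness
        self.colour = colour
        self.juiciness = juiciness
        self.seeds = seeds

    @classmethod
    def headers(cls):
        # Available without an instance, as the question requires.
        return [header for header, _ in cls.columns]

    def to_row(self):
        return [getter(self) for _, getter in self.columns]


print(Lemon.headers())
print(Lemon("ripe", "yellow", 0.8, 3).to_row())
```

Adding or reordering a column then means touching exactly one line of columns.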
We could use some metaclass shenanigans to do this...
In Python 2, attributes are passed to the metaclass in a dict, without
preserving order, so we'll also want a base class to work with so we can
distinguish class attributes that should be mapped into the row. In Python 3, we could dispense with just about all of this base descriptor class.
import itertools
import functools

@functools.total_ordering
class DryDescriptor(object):
    _order_gen = itertools.count()
    def __init__(self, alias=None):
        self.alias = alias
        self.order = next(self._order_gen)
    def __lt__(self, other):
        return self.order < other.order
We will want a python descriptor for every attribute we wish to map into the
row. slots are a nice way to get data descriptors without much work. One
caveat, though, we'll have to manually remove the helper instance to make the
real slot descriptor visible.
class slot(DryDescriptor):
    def annotate(self, attr, attrs):
        del attrs[attr]
        self.attr = attr
        attrs.setdefault('__slots__', []).append(attr)
    def annotate_class(self, cls):
        if self.alias is not None:
            # getattr needs both the object and the attribute name
            setattr(cls, self.alias, getattr(cls, self.attr))
For computed fields, we can memoize results. Memoizing off of the annotated
instance is tricky without a memory leak, so we need weakref. Alternatively, we
could have arranged for another slot just to store the cached value. This also isn't quite thread safe, but pretty close.
import weakref

class memo(DryDescriptor):
    _memo = None
    def __call__(self, method):
        self.getter = method
        return self
    def annotate(self, attr, attrs):
        if self.alias is not None:
            attrs[self.alias] = self
    def annotate_class(self, cls): pass
    def __get__(self, instance, owner):
        if instance is None:
            return self
        if self._memo is None:
            self._memo = weakref.WeakKeyDictionary()
        try:
            return self._memo[instance]
        except KeyError:
            return self._memo.setdefault(instance, self.getter(instance))
On the metaclass, all of the descriptors we created above are found, sorted by
creation order, and instructed to annotate the new, created class. This does
not correctly treat derived classes and could use some other conveniences like
an __init__ for all the slots.
class DryMeta(type):
    def __new__(mcls, name, bases, attrs):
        descriptors = sorted((value, key)
                             for key, value
                             in attrs.iteritems()
                             if isinstance(value, DryDescriptor))
        for descriptor, attr in descriptors:
            descriptor.annotate(attr, attrs)
        cls = type.__new__(mcls, name, bases, attrs)
        for descriptor, attr in descriptors:
            descriptor.annotate_class(cls)
        cls._header_descriptors = [getattr(cls, attr) for descriptor, attr in descriptors]
        return cls
Finally, we want a base class to inherit from so that we can have a to_row
method. this just invokes all of the __get__s for all of the respective
descriptors, in order.
class DryBase(object):
    __metaclass__ = DryMeta
    def to_row(self):
        cls = type(self)
        return [desc.__get__(self, cls) for desc in cls._header_descriptors]
Assuming all of that is tucked away, out of sight, the definition of a class
that uses this feature is mostly free of repetition. The only shortcoming is
that, to be practical, every field needs a Python-friendly name, which is why we have the
alias key to associate 'seeds?' with has_seeds.
class ADryRow(DryBase):
    __slots__ = ['seeds']
    ripeness = slot()
    colour = slot()
    juiciness = slot()
    @memo(alias='seeds?')
    def has_seeds(self):
        print "Expensive!!!"
        return self.seeds > 0

>>> my_row = ADryRow()
>>> my_row.ripeness = "tart"
>>> my_row.colour = "#8C2"
>>> my_row.juiciness = 0.3479
>>> my_row.seeds = 19
>>>
>>> print my_row.to_row()
Expensive!!!
['tart', '#8C2', 0.3479, True]
>>> print my_row.to_row()
['tart', '#8C2', 0.3479, True]

__str__ and pretty-printing (sub)dictionaries

I have an object that consists primarily of a very large nested dictionary:
class my_object(object):
    def __init__(self):
        self.the_dict = {}  # Big, nested dictionary
I've modified __str__ to pretty-print the top-level dictionary by simply "printing" the object:
    def __str__(self):
        pp = pprint.PrettyPrinter()
        return pp.pformat(self.the_dict)
My goal here was to make the user's life a bit easier when he/she peruses the object with IPython:
print(the_object) # Pretty-prints entire dict
This works to show the user the entire dictionary, but I would like to expand this functionality to sub-portions of the dictionary as well, allowing the user to get pretty-printed output from commands such as:
print(the_object.the_dict['level1']['level2']['level3'])
(would pretty-print only the 'level3' sub-dict)
Is there a straightforward way to use __str__ (or similar) to do this?
You could provide a custom displayhook that prints builtin dictionaries and other objects you choose according to your taste at an interactive prompt:
>>> import sys
>>> oldhook = sys.displayhook
>>> sys.displayhook = your_module.DisplayHook(oldhook)
It doesn't change print obj behavior.
The idea is that your users can choose whether they'd like to use your custom formatting for dicts or not.
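The answer leaves your_module.DisplayHook undefined, so the following is only a guess at what such a hook could look like: a callable that pretty-prints plain dicts and hands every other value to the previous hook. The class body is an assumption; sys.displayhook, builtins._ and pprint.pprint are standard library features.

```python
import builtins
import pprint
import sys


class DisplayHook:
    """Pretty-print dicts at the interactive prompt (illustrative sketch)."""

    def __init__(self, fallback):
        # The hook that was installed before ours, used for non-dict values.
        self.fallback = fallback

    def __call__(self, value):
        if isinstance(value, dict):
            # Mimic the default hook, which stores the last result in `_`.
            builtins._ = value
            pprint.pprint(value)
        else:
            self.fallback(value)


# Installation, as in the answer above:
# sys.displayhook = DisplayHook(sys.displayhook)
```

Only expressions evaluated at the prompt go through the hook, which is why print(obj) is unaffected.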
When a user says
print(the_object.the_dict['level1']['level2']['level3'])
Python evaluates the_object.the_dict['level1']['level2']['level3'] and (let's say) finds it is a dict, and passes that on to print.
Since the_object.the_dict is a dict, the rest is out of the_object's control. As you burrow down through level1, level2, and level3, only the type of object returned by the_object.the_dict['level1']['level2']['level3'] is going to affect how print behaves. the_object's __str__ method is not going to affect anything beyond the_object itself.
Moreover, when printing nested objects, pprint.pformat uses the repr of the object, not str of the object.
So to get the behavior we want, we need the_object.the_dict['level1']['level2']['level3'] to evaluate to something like a dict but with a different __repr__...
You could make a dict-like object (e.g. Turtle) and use Turtles all the way down:
from collections.abc import MutableMapping  # collections.MutableMapping was removed in Python 3.10
import pprint

class Turtle(MutableMapping):
    def __init__(self, *args, **kwargs):
        self._data = dict(*args, **kwargs)
    def __getitem__(self, key):
        return self._data[key]
    def __setitem__(self, key, value):
        self._data[key] = value
    def __delitem__(self, key):
        del self._data[key]
    def __iter__(self):
        return iter(self._data)
    def __len__(self):
        return len(self._data)
    def __contains__(self, x):
        return x in self._data
    def __repr__(self):
        return pprint.pformat(self._data)

class MyObject(object):
    def __init__(self):
        self.the_dict = Turtle()
    def __repr__(self):
        return repr(self.the_dict)

the_object = MyObject()
the_object.the_dict['level1'] = Turtle()
the_object.the_dict['level1']['level2'] = Turtle()
the_object.the_dict['level1']['level2']['level3'] = Turtle({i: i for i in range(20)})
print(the_object)
print(the_object.the_dict['level1']['level2']['level3'])
To use this, you must replace all dicts in your nested dict structure with Turtles.
But really (as you can tell from my fanciful naming), I don't really expect you to use Turtles. Dicts are such nice, optimized builtins, I would not want to add this intermediate object just to effect pretty printing.
If instead you can convince your users to type
from pprint import pprint
then they can just use
pprint(the_object.the_dict['level1']['level2']['level3'])
to get pretty printing.
You can convert the underlying dictionaries to "pretty printing dictionaries". Perhaps something like this will do:
import pprint

class my_object(object):
    _pp = pprint.PrettyPrinter()

    class PP_dict(dict):
        def __setitem__(self, key, value):
            # The nested class must be reached through my_object at call time.
            if isinstance(value, dict):
                value = my_object.PP_dict(value)
            super(my_object.PP_dict, self).__setitem__(key, value)
        def __str__(self):
            return my_object._pp.pformat(self)

    @property
    def the_dict(self):
        return self.__dict__['the_dict']

    @the_dict.setter
    def the_dict(self, value):
        self.__dict__['the_dict'] = my_object.PP_dict(value)
The property is only because I don't know how you set/manipulate "the_dict".
This approach is limited -- for instance if you put dict-derivatives that are not dicts in the_dict, they will be replaced by PP_dict. Also, if you have other reference to these subdicts, they will no longer be pointing to the same objects.
Another approach would be to put a __getitem__ in my_object directly that returns a proxy wrapper for the dictionary. The proxy pretty-prints the current object in __str__, overrides __getitem__ to return proxies for subobjects, and otherwise forwards all access and manipulation to the wrapped object.
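The proxy idea could look roughly like the sketch below. PrettyProxy and MyObject are hypothetical names, and only dict values are wrapped. Note that __getattr__ forwards ordinary method calls such as keys(), but not special methods such as __len__ or __iter__, which would each need explicit forwarding.

```python
import pprint


class PrettyProxy:
    """Wrap a dict so that str()/repr() pretty-print it (hypothetical name)."""

    def __init__(self, wrapped):
        self._wrapped = wrapped

    def __getitem__(self, key):
        value = self._wrapped[key]
        # Sub-dictionaries are wrapped too, so printing works at any depth.
        return PrettyProxy(value) if isinstance(value, dict) else value

    def __getattr__(self, name):
        # Reached only for names not found on the proxy itself.
        if name == "_wrapped":
            raise AttributeError(name)
        return getattr(self._wrapped, name)

    def __str__(self):
        return pprint.pformat(self._wrapped)

    __repr__ = __str__


class MyObject:
    def __init__(self):
        self.the_dict = {}  # stays a plain dict; only reads are wrapped

    def __getitem__(self, key):
        return PrettyProxy(self.the_dict)[key]


obj = MyObject()
obj.the_dict["level1"] = {"level2": {"level3": {i: i for i in range(20)}}}
print(obj["level1"]["level2"]["level3"])
```

The underlying data is never copied or replaced, so other references to the sub-dictionaries remain valid.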

Namespaces inside class in Python3

I am new to Python and I wonder if there is any way to aggregate methods into 'subspaces'. I mean something similar to this syntax:
smth = Something()
smth.subspace.do_smth()
smth.another_subspace.do_smth_else()
I am writing an API wrapper and I'm going to have a lot of very similar methods (differing only in URI), so I thought it would be good to place them in a few subspaces that correspond to the API request categories. In other words, I want to create namespaces inside a class. I don't know if this is even possible in Python and have no idea what to look for in Google.
I will appreciate any help.
One way to do this is by defining subspace and another_subspace as properties that return objects that provide do_smth and do_smth_else respectively:
class Something:
    @property
    def subspace(self):
        class SubSpaceClass:
            def do_smth(other_self):
                print('do_smth')
        return SubSpaceClass()

    @property
    def another_subspace(self):
        class AnotherSubSpaceClass:
            def do_smth_else(other_self):
                print('do_smth_else')
        return AnotherSubSpaceClass()
Which does what you want:
>>> smth = Something()
>>> smth.subspace.do_smth()
do_smth
>>> smth.another_subspace.do_smth_else()
do_smth_else
Depending on what you intend to use the methods for, you may want to make SubSpaceClass a singleton, but I doubt the performance gain is worth it.
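If building a new class on every property access feels wasteful, one alternative is to define the namespace class once at module level and create one instance per owner in __init__. This is a sketch under that assumption; _Subspace and the name attribute are made up for illustration.

```python
class _Subspace:
    """Group of related methods that act on behalf of an owner object."""

    def __init__(self, owner):
        # Keep a reference to the owning object so methods can use its state.
        self._owner = owner

    def do_smth(self):
        return "do_smth for " + self._owner.name


class Something:
    def __init__(self, name="api"):
        self.name = name
        # One namespace object per instance, created once.
        self.subspace = _Subspace(self)


smth = Something("client")
print(smth.subspace.do_smth())
```

The owner and its namespace reference each other, which Python's cyclic garbage collector handles, at the cost of slightly delayed cleanup.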
I had this need a couple years ago and came up with this:
from functools import partial

class Registry:
    """Namespace within a class."""
    def __get__(self, obj, cls=None):
        if obj is None:
            return self
        else:
            return InstanceRegistry(self, obj)
    def __call__(self, name=None):
        def decorator(f):
            use_name = name or f.__name__
            if hasattr(self, use_name):
                raise ValueError("%s is already registered" % use_name)
            setattr(self, use_name, f)
            return f
        return decorator

class InstanceRegistry:
    """
    Helper for accessing a namespace from an instance of the class.
    Used internally by :class:`Registry`. Returns a partial that will pass
    the instance as the first parameter.
    """
    def __init__(self, registry, obj):
        self.__registry = registry
        self.__obj = obj
    def __getattr__(self, attr):
        return partial(getattr(self.__registry, attr), self.__obj)

# Usage:
class Something:
    subspace = Registry()
    another_subspace = Registry()

# Registered at module level, after the class exists
@Something.subspace()
def do_smth(self):
    # `self` will be an instance of Something
    pass

@Something.another_subspace('do_smth_else')
def this_can_be_called_anything_and_take_any_parameter_name(obj, other):
    # Call it `obj` or whatever else if `self` outside a class is unsettling
    pass
At runtime:
>>> smth = Something()
>>> smth.subspace.do_smth()
>>> smth.another_subspace.do_smth_else('other')
This is compatible with Py2 and Py3. Some performance optimizations are possible in Py3 because __set_name__ tells us what the namespace is called and allows caching the instance registry.
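The Python 3 optimization mentioned above might look like the following sketch, which is an interpretation rather than the answer author's code. __set_name__ records the attribute name, and __get__ stores the per-instance view in the instance __dict__ under that name. Because the descriptor defines no __set__, later lookups find the cached view directly. CachedRegistry and _BoundView are hypothetical names.

```python
from functools import partial


class CachedRegistry:
    """Namespace descriptor that caches its per-instance view (Python 3.6+)."""

    def __init__(self):
        self._functions = {}

    def __set_name__(self, owner, name):
        # Called once when the owning class is created.
        self._name = name

    def __get__(self, obj, cls=None):
        if obj is None:
            return self
        view = _BoundView(self._functions, obj)
        # Non-data descriptor: this instance attribute shadows it from now on.
        obj.__dict__[self._name] = view
        return view

    def __call__(self, name=None):
        def decorator(f):
            use_name = name or f.__name__
            if use_name in self._functions:
                raise ValueError("%s is already registered" % use_name)
            self._functions[use_name] = f
            return f
        return decorator


class _BoundView:
    def __init__(self, functions, obj):
        self._functions = functions
        self._obj = obj

    def __getattr__(self, attr):
        if attr.startswith("_"):
            raise AttributeError(attr)
        try:
            # Pass the owning instance as the first argument.
            return partial(self._functions[attr], self._obj)
        except KeyError:
            raise AttributeError(attr) from None


class Something:
    subspace = CachedRegistry()


@Something.subspace()
def do_smth(self):
    return "do_smth on %s" % type(self).__name__


smth = Something()
print(smth.subspace.do_smth())
```

The view shares the registry's function dict, so functions registered after the first access are still visible.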
