Lazy-loading variables using overloaded decorators - python

I have a state object that represents a system. Properties within the state object are populated from [huge] text files. As not every property is accessed every time a state instance is created, it makes sense to load them lazily:
class State:
    def import_positions(self):
        self._positions = {}
        # Code which populates self._positions

    @property
    def positions(self):
        try:
            return self._positions
        except AttributeError:
            self.import_positions()
            return self._positions

    def import_forces(self):
        self._forces = {}
        # Code which populates self._forces

    @property
    def forces(self):
        try:
            return self._forces
        except AttributeError:
            self.import_forces()
            return self._forces
There's a lot of repetitive boilerplate code here. Moreover, sometimes an import_abc can populate several variables (i.e. import a few variables from a small data file if it's already open).
It makes sense to overload @property such that it accepts a function to "provide" that variable, viz:
class State:
    def import_positions(self):
        self._positions = {}
        # Code which populates self._positions

    @lazyproperty(import_positions)
    def positions(self):
        pass

    def import_forces(self):
        self._forces = {}
        # Code which populates self._forces and self._strain

    @lazyproperty(import_forces)
    def forces(self):
        pass

    @lazyproperty(import_forces)
    def strain(self):
        pass
However, I cannot seem to find a way to trace exactly which methods are being called in the @property decorator. As such, I don't know how to approach overloading @property into my own @lazyproperty.
Any thoughts?

Maybe you want something like this. It's a sort of simple memoization function combined with @property.
def lazyproperty(func):
    values = {}

    def wrapper(self):
        if self not in values:
            values[self] = func(self)
        return values[self]

    wrapper.__name__ = func.__name__
    return property(wrapper)


class State:
    @lazyproperty
    def positions(self):
        print 'loading positions'
        return {1, 2, 3}


s = State()
print s.positions
print s.positions
Which prints:
loading positions
set([1, 2, 3])
set([1, 2, 3])
Caveat: entries in the values dictionary won't be garbage collected (each entry keeps its instance alive), so it's not suitable for long-running programs. If the loaded value is the same for every instance, it can be stored on the function object itself for better speed and memory use:
try:
    return func.value
except AttributeError:
    func.value = func(self)
    return func.value
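One way around the garbage-collection caveat is to keep the cached value on the instance itself instead of in a dictionary owned by the decorator. This is only a rough sketch in Python 3: the `_lazy_` attribute prefix and the `load_count` counter are made up for the illustration.

```python
import functools


def lazyproperty(func):
    """Read-only property that computes its value once per instance."""
    attr_name = '_lazy_' + func.__name__  # hypothetical storage name

    @functools.wraps(func)
    def wrapper(self):
        try:
            return getattr(self, attr_name)
        except AttributeError:
            # First access: compute the value and cache it on the instance,
            # so it is freed together with the instance.
            value = func(self)
            setattr(self, attr_name, value)
            return value

    return property(wrapper)


class State:
    load_count = 0  # only here to show how often the loader runs

    @lazyproperty
    def positions(self):
        State.load_count += 1
        return {1, 2, 3}


s = State()
first = s.positions
second = s.positions
```

Because the cache lives in the instance, nothing outlives the object and instances do not need to be hashable.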

I think you can remove even more boilerplate by writing a custom descriptor class that decorates the loader method. The idea is to have the descriptor itself encode the lazy-loading logic, meaning that the only thing you define in an actual method is the loader itself (which is the only thing that, apparently, really does have to vary for different values). Here's an example:
class LazyDesc(object):
    def __init__(self, func):
        self.loader = func
        self.secretAttr = '_' + func.__name__

    def __get__(self, obj, cls):
        try:
            return getattr(obj, self.secretAttr)
        except AttributeError:
            print("Lazily loading", self.secretAttr)
            self.loader(obj)
            return getattr(obj, self.secretAttr)


class State(object):
    @LazyDesc
    def positions(self):
        self._positions = {'some': 'positions'}

    @LazyDesc
    def forces(self):
        self._forces = {'some': 'forces'}
Then:
>>> x = State()
>>> x.forces
Lazily loading _forces
{'some': 'forces'}
>>> x.forces
{'some': 'forces'}
>>> x.positions
Lazily loading _positions
{'some': 'positions'}
>>> x.positions
{'some': 'positions'}
Notice that the "lazy loading" message was printed only on the first access for each attribute. This version also auto-creates the "secret" attribute to hold the real data by prepending an underscore to the method name (i.e., data for positions is stored in _positions). In this example, there's no setter, so you can't do x.positions = blah (although you can still mutate the positions with x.positions['key'] = val), but the approach could be extended to allow setting as well.
The nice thing about this approach is that your lazy logic is transparently encoded in the descriptor __get__, meaning that it easily generalizes to other kinds of boilerplate that you might want to abstract away in a similar manner.
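The same descriptor idea can be bent towards the syntax from the question, where the decorator receives the loader and a single loader may fill several attributes. Treat this as a sketch under assumptions rather than a finished implementation: it assumes every loader stores its result in an underscore-prefixed attribute named after the property, and the `loads` counter exists only to show how often the loader runs.

```python
class lazyproperty:
    """Descriptor that runs `loader` the first time the value is missing."""

    def __init__(self, loader):
        self.loader = loader

    def __call__(self, func):
        # The decorated stub only supplies the attribute name and docstring.
        self.private = '_' + func.__name__
        self.__doc__ = func.__doc__
        return self

    def __get__(self, obj, cls=None):
        if obj is None:
            return self
        try:
            return getattr(obj, self.private)
        except AttributeError:
            self.loader(obj)  # assumed to set obj._<name>
            return getattr(obj, self.private)


class State:
    def import_forces(self):
        # Stand-in for parsing a large file; fills two attributes at once.
        self.loads = getattr(self, 'loads', 0) + 1
        self._forces = {'a': 1.0}
        self._strain = {'a': 0.1}

    @lazyproperty(import_forces)
    def forces(self):
        pass

    @lazyproperty(import_forces)
    def strain(self):
        pass


s = State()
forces = s.forces
strain = s.strain
```

Accessing `forces` triggers the import once; `strain` is then already populated, so the loader is not called a second time.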

However, I cannot seem to find a way to trace exactly what method are
being called in the @property decorator.
property is actually a type (whether you use it with the decorator syntax or not is orthogonal), which implements the descriptor protocol (https://docs.python.org/2/howto/descriptor.html). An overly simplified (I skipped the deleter, doc and quite a few other things...) pure-python implementation would look like this:
class property(object):
    def __init__(self, fget=None, fset=None):
        self.fget = fget
        self.fset = fset

    def setter(self, func):
        self.fset = func
        # return the property itself so that "@x.setter" keeps x a property
        return self

    def __get__(self, obj, type=None):
        return self.fget(obj)

    def __set__(self, obj, value):
        if self.fset:
            self.fset(obj, value)
        else:
            raise AttributeError("Attribute is read-only")
Now overloading property is not necessarily the simplest solution. In fact there are quite a few existing implementations out there, including Django's "cached_property" (cf http://ericplumb.com/blog/understanding-djangos-cached_property-decorator.html for more about it) and pydanny's "cached-property" package (https://pypi.python.org/pypi/cached-property/0.1.5).
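For readers on Python 3.8 or later, the standard library ships a comparable decorator, `functools.cached_property`, which stores the computed value in the instance's `__dict__` on first access. A small sketch; the `loads` counter and the returned data are placeholders for the real file parsing:

```python
import functools


class State:
    def __init__(self):
        self.loads = 0  # counts how often the loader body runs

    @functools.cached_property
    def positions(self):
        # Placeholder for the expensive import from a text file.
        self.loads += 1
        return {'atom': (0.0, 0.0, 0.0)}


s = State()
a = s.positions
b = s.positions
```

After the first access the value sits in `s.__dict__['positions']`, so later lookups never reach the descriptor; `del s.positions` would force a reload.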

Related

Replacing the object from one of its methods

I am using Python and have an object with a method. I am looking for a simple way to replace the entire object from within that method.
E.g.
class a():
    def b(self):
        self = other_object
How can you do that?
Thanks
You use a proxy/facade object to hold a reference to the actual object (the self, if you wish), and that proxy (a better term than Facade, but I'm not changing my code now) is what the rest of your codebase sees. However, any attribute/method access is forwarded on to the actual object, which is swappable.
The code below should give you a rough idea. Note that you need to be careful about recursion around __theinstance, which is why I am assigning to __dict__ directly. It's a bit messy, since it's been a while since I've written code that wraps getattr and setattr entirely.
class Facade:
    def __init__(self, instance):
        self.set_obj(instance)

    def set_obj(self, instance):
        # write to __dict__ directly to bypass our own __setattr__
        self.__dict__["__theinstance"] = instance

    def __getattr__(self, attrname):
        if attrname == "__theinstance":
            return self.__dict__["__theinstance"]
        return getattr(self.__dict__["__theinstance"], attrname)

    def __setattr__(self, attrname, value):
        if attrname == "__theinstance":
            # swap the wrapped object; do not forward this assignment to it
            self.set_obj(value)
            return
        return setattr(self.__dict__["__theinstance"], attrname, value)
class Test:
    def __init__(self, name, cntr):
        self.name = name
        self.cntr = cntr

    def __repr__(self):
        return "%s[%s]" % (self.__class__.__name__, self.__dict__)


obj1 = Test("first object", 1)
obj2 = Test("second", 2)
obj2.message = "greetings"


def pretend_client_code(facade):
    print(id(facade), facade.name, facade.cntr, getattr(facade, "value", None))


facade = Facade(obj1)
pretend_client_code(facade)
facade.set_obj(obj2)
pretend_client_code(facade)
facade.value = 3
pretend_client_code(facade)
facade.set_obj(obj1)
pretend_client_code(facade)
output:
4467187104 first object 1 None
4467187104 second 2 None
4467187104 second 2 3
4467187104 first object 1 None
So basically, the "client code" always sees the same facade object, but what it is actually accessing depends on what your equivalent of def b has done.
Facade has a specific meaning in Design Patterns terminology and it may not be really applicable here, but close enough. Maybe Proxy would have been better.
Note that if you want to change the class on the same object, that is a different thing, done through assigning self.__class__ . For example, say an RPG game with an EnemyClass who gets swapped to DeadEnemyClass once killed: self.__class__ = DeadEnemyClass
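To make the `self.__class__` variant concrete, here is a small sketch along the lines of the RPG example. The class and method names (`Enemy`, `DeadEnemy`, `kill`, `describe`) are invented for the illustration.

```python
class Enemy:
    def __init__(self, name):
        self.name = name

    def describe(self):
        return self.name + ' is attacking'

    def kill(self):
        # Rebind the instance's class; its attributes are kept as they are.
        self.__class__ = DeadEnemy


class DeadEnemy(Enemy):
    def describe(self):
        return self.name + ' is dead'


orc = Enemy('orc')
alias = orc  # a second reference to the same object
before = orc.describe()
orc.kill()
after = alias.describe()
```

Every existing reference sees the change, because the object itself is unchanged and only its type pointer moves. This works smoothly when both classes have compatible instance layouts, as plain Python classes like these do.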
You can't directly do that. What you can do is save it as an instance variable.
class A():
    def __init__(self, instance=None):
        # fall back to the object itself when no replacement is given
        self.instance = instance or self

    # yes, you can make it a property as well.
    def set_val(self, obj):
        self.instance = obj

    def get_val(self):
        return self.instance
It is unlikely that replacing the 'self' variable will accomplish
whatever you're trying to do, that couldn't just be accomplished by
storing the result of func(self) in a different variable. 'self' is
effectively a local variable only defined for the duration of the
method call, used to pass in the instance of the class which is being
operated upon. Replacing self will not actually replace references to
the original instance of the class held by other objects, nor will it
create a lasting reference to the new instance which was assigned to
it.
Original source: Is it safe to replace a self object by another object of the same type in a method?

OO design: an object that can be exported to a "row", while accessing header names, without repeating myself

Sorry, badly worded title. I hope a simple example will make it clear. Here's the easiest way to do what I want to do:
import csv


class Lemon(object):
    headers = ['ripeness', 'colour', 'juiciness', 'seeds?']

    def to_row(self):
        return [self.ripeness, self.colour, self.juiciness, self.seeds > 0]


def save_lemons(lemonset):
    f = open('lemons.csv', 'w')
    out = csv.writer(f)
    out.writerow(Lemon.headers)  # csv writers have writerow(), not write()
    for lemon in lemonset:
        out.writerow(lemon.to_row())
    f.close()
This works alright for this small example, but I feel like I'm "repeating myself" in the Lemon class. And in the actual code I'm trying to write (where the number of variables I'm exporting is ~50 rather than 4, and where to_row calls a number of private methods that do a bunch of weird calculations), it becomes awkward.
As I write the code to generate a row, I need to constantly refer to the "headers" variable to make sure I'm building my list in the correct order. If I want to change the variables being outputted, I need to make sure to_row and headers are being changed in parallel (exactly the kind of thing that DRY is meant to prevent, right?).
Is there a better way I could design this code? I've been playing with function decorators, but nothing has stuck. Ideally I should still be able to get at the headers without having a particular lemon instance (i.e. it should be a class variable or class method), and I don't want to have a separate method for each variable.
In this case, getattr() is your friend: it allows you to get a variable based on a string name. For example:
def to_row(self):
    return [getattr(self, head) for head in self.headers]
EDIT: to properly use the header seeds?, you would need to set the attribute seeds? on the objects, e.g. with setattr(self, 'seeds?', self.seeds > 0) right above the return statement.
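A variation that avoids attribute names like `seeds?` is to declare each column once as a (header, getter) pair and derive both the header list and the row from it. The following is just a sketch in Python 3; the `columns` attribute, the constructor and the use of `io.StringIO` in place of a real file are assumptions made for the example.

```python
import csv
import io
from operator import attrgetter


class Lemon:
    # Each column is declared exactly once: (header, function computing the cell).
    columns = [
        ('ripeness', attrgetter('ripeness')),
        ('colour', attrgetter('colour')),
        ('juiciness', attrgetter('juiciness')),
        ('seeds?', lambda lemon: lemon.seeds > 0),
    ]

    def __init__(self, ripeness, colour, juiciness, seeds):
        self.ripeness = ripeness
        self.colour = colour
        self.juiciness = juiciness
        self.seeds = seeds

    @classmethod
    def headers(cls):
        # Available without an instance, as the question asks.
        return [header for header, _ in cls.columns]

    def to_row(self):
        return [getter(self) for _, getter in self.columns]


def save_lemons(lemons, out_file):
    writer = csv.writer(out_file)
    writer.writerow(Lemon.headers())
    for lemon in lemons:
        writer.writerow(lemon.to_row())


buffer = io.StringIO()
save_lemons([Lemon('ripe', 'yellow', 0.8, 3)], buffer)
```

Adding or reordering a column is then a one-line change, and computed cells can call whatever private methods they need inside the getter.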
We could use some metaclass shenanigans to do this...
In Python 2, attributes are passed to the metaclass in a dict, without
preserving order, so we need to record the order ourselves. We'll also want a base class to work with so we can
distinguish class attributes that should be mapped into the row. In Python 3, we could dispense with just about all of this base descriptor class.
import itertools
import functools


@functools.total_ordering
class DryDescriptor(object):
    _order_gen = itertools.count()

    def __init__(self, alias=None):
        self.alias = alias
        self.order = next(self._order_gen)

    def __lt__(self, other):
        return self.order < other.order
We will want a python descriptor for every attribute we wish to map into the
row. slots are a nice way to get data descriptors without much work. One
caveat, though, we'll have to manually remove the helper instance to make the
real slot descriptor visible.
class slot(DryDescriptor):
    def annotate(self, attr, attrs):
        # remove this helper so the real slot descriptor becomes visible
        del attrs[attr]
        self.attr = attr
        attrs.setdefault('__slots__', []).append(attr)

    def annotate_class(self, cls):
        if self.alias is not None:
            setattr(cls, self.alias, getattr(cls, self.attr))
For computed fields, we can memoize results. Memoizing off of the annotated
instance is tricky without a memory leak, so we need weakref. Alternatively, we
could have arranged for another slot just to store the cached value. This also isn't quite thread safe, but pretty close.
import weakref


class memo(DryDescriptor):
    _memo = None

    def __call__(self, method):
        self.getter = method
        return self

    def annotate(self, attr, attrs):
        if self.alias is not None:
            attrs[self.alias] = self

    def annotate_class(self, cls):
        pass

    def __get__(self, instance, owner):
        if instance is None:
            return self
        if self._memo is None:
            self._memo = weakref.WeakKeyDictionary()
        try:
            return self._memo[instance]
        except KeyError:
            return self._memo.setdefault(instance, self.getter(instance))
On the metaclass, all of the descriptors we created above are found, sorted by
creation order, and instructed to annotate the new, created class. This does
not correctly treat derived classes and could use some other conveniences like
an __init__ for all the slots.
class DryMeta(type):
    def __new__(mcls, name, bases, attrs):
        descriptors = sorted((value, key)
                             for key, value
                             in attrs.iteritems()
                             if isinstance(value, DryDescriptor))
        for descriptor, attr in descriptors:
            descriptor.annotate(attr, attrs)
        cls = type.__new__(mcls, name, bases, attrs)
        for descriptor, attr in descriptors:
            descriptor.annotate_class(cls)
        cls._header_descriptors = [getattr(cls, attr) for descriptor, attr in descriptors]
        return cls
Finally, we want a base class to inherit from so that we can have a to_row
method. This just invokes all of the __get__s for all of the respective
descriptors, in order.
class DryBase(object):
    __metaclass__ = DryMeta

    def to_row(self):
        cls = type(self)
        return [desc.__get__(self, cls) for desc in cls._header_descriptors]
Assuming all of that is tucked away, out of sight, the definition of a class
that uses this feature is mostly free of repetition. The only shortcoming is
that, to be practical, every field needs a Python-friendly name, thus we had the
alias key to associate 'seeds?' with has_seeds.
class ADryRow(DryBase):
    __slots__ = ['seeds']
    ripeness = slot()
    colour = slot()
    juiciness = slot()

    @memo(alias='seeds?')
    def has_seeds(self):
        print "Expensive!!!"
        return self.seeds > 0
>>> my_row = ADryRow()
>>> my_row.ripeness = "tart"
>>> my_row.colour = "#8C2"
>>> my_row.juiciness = 0.3479
>>> my_row.seeds = 19
>>>
>>> print my_row.to_row()
Expensive!!!
['tart', '#8C2', 0.3479, True]
>>> print my_row.to_row()
['tart', '#8C2', 0.3479, True]
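As a rough illustration of the remark that Python 3 needs far less machinery: class bodies keep their definition order there, and `__set_name__` (Python 3.6+) lets a descriptor register itself with its owner, so no metaclass is required. The names `column`, `Row` and `_columns` below are invented for this sketch, and it deliberately leaves out the slots and memoization shown above.

```python
class column:
    """Marks a method as a row column; order follows the class body."""

    def __init__(self, header=None):
        self.header = header

    def __call__(self, func):
        self.func = func
        return self

    def __set_name__(self, owner, name):
        if self.header is None:
            self.header = name
        # Give each class its own list instead of mutating a parent's list.
        if '_columns' not in vars(owner):
            owner._columns = list(getattr(owner, '_columns', []))
        owner._columns.append(self)

    def __get__(self, obj, cls=None):
        if obj is None:
            return self
        return self.func(obj)


class Row:
    @classmethod
    def headers(cls):
        return [col.header for col in cls._columns]

    def to_row(self):
        return [col.func(self) for col in self._columns]


class Lemon(Row):
    def __init__(self, ripeness, seeds):
        self._ripeness = ripeness
        self.seeds = seeds

    @column()
    def ripeness(self):
        return self._ripeness

    @column('seeds?')
    def has_seeds(self):
        return self.seeds > 0
```

Headers are available on the class (`Lemon.headers()`), and each column is defined in exactly one place.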

Python "callable" attribute (pseudo-property)

In python, I can alter the state of an instance by directly assigning to attributes, or by making method calls which alter the state of the attributes:
foo.thing = 'baz'
or:
foo.thing('baz')
Is there a nice way to create a class which would accept both of the above forms which scales to large numbers of attributes that behave this way? (Shortly, I'll show an example of an implementation that I don't particularly like.) If you're thinking that this is a stupid API, let me know, but perhaps a more concrete example is in order. Say I have a Document class. Document could have an attribute title. However, title may want to have some state as well (font,fontsize,justification,...), but the average user might be happy enough just setting the title to a string and being done with it ...
One way to accomplish this would be to:
class Title(object):
    def __init__(self, text, font='times', size=12):
        self.text = text
        self.font = font
        self.size = size

    def __call__(self, *text, **kwargs):
        if text:
            self.text = text[0]
        for k, v in kwargs.items():
            setattr(self, k, v)

    def __str__(self):
        return '<title font={font}, size={size}>{text}</title>'.format(text=self.text, size=self.size, font=self.font)


class Document(object):
    _special_attr = set(['title'])

    def __setattr__(self, k, v):
        if k in self._special_attr and hasattr(self, k):
            getattr(self, k)(v)
        else:
            object.__setattr__(self, k, v)

    def __init__(self, text="", title=""):
        self.title = Title(title)
        self.text = text

    def __str__(self):
        return str(self.title) + '<body>' + self.text + '</body>'
Now I can use this as follows:
doc = Document()
doc.title = "Hello World"
print (str(doc))
doc.title("Goodbye World",font="Helvetica")
print (str(doc))
This implementation seems a little messy though (with _special_attr). Maybe that's because this is a messed up API. I'm not sure. Is there a better way to do this? Or did I leave the beaten path a little too far on this one?
I realize I could use @property for this as well, but that wouldn't scale well at all if I had more than just one attribute which is to behave this way -- I'd need to write a getter and setter for each, yuck.
It is a bit harder than the previous answers assume.
Any value stored in the descriptor will be shared between all instances, so it is not the right place to store per-instance data.
Also, obj.attrib(...) is performed in two steps:
tmp = obj.attrib
tmp(...)
Python doesn't know in advance that the second step will follow, so you always have to return something that is callable and has a reference to its parent object.
In the following example that reference is implied in the set argument:
class CallableString(str):
    def __new__(class_, set, value):
        inst = str.__new__(class_, value)
        inst._set = set
        return inst

    def __call__(self, value):
        self._set(value)


class A(object):
    def __init__(self):
        self._attrib = "foo"

    def get_attrib(self):
        return CallableString(self.set_attrib, self._attrib)

    def set_attrib(self, value):
        try:
            value = value._value
        except AttributeError:
            pass
        self._attrib = value

    attrib = property(get_attrib, set_attrib)
a = A()
print a.attrib
a.attrib = "bar"
print a.attrib
a.attrib("baz")
print a.attrib
In short: what you want cannot be done transparently. You'll write better Python code if you don't insist on hacking around this limitation.
You can avoid having to use @property on potentially hundreds of attributes by simply creating a descriptor class that follows the appropriate rules:
# Warning: Untested code ahead
class DocAttribute(object):
    tag_str = "<{tag}{attrs}>{text}</{tag}>"

    def __init__(self, tag_name, default_attrs=None):
        self._tag_name = tag_name
        self._attrs = default_attrs if default_attrs is not None else {}

    def __call__(self, *text, **attrs):
        self._text = "".join(text)
        self._attrs.update(attrs)
        return self

    def __get__(self, instance, cls):
        return self

    def __set__(self, instance, value):
        self._text = value

    def __str__(self):
        # Attrs left as an exercise for the reader; an empty string is passed
        # so that the {attrs} placeholder does not raise a KeyError
        return self.tag_str.format(tag=self._tag_name, attrs="", text=self._text)
Then you can use Document's __setattr__ method to add a descriptor based on this class if it is in a white list of approved names (or not in a black list of forbidden ones, depending on your domain):
class Document(object):
    # prelude

    def __setattr__(self, name, value):
        if self.is_allowed(name):  # Again, left as an exercise for the reader
            object.__setattr__(self, name, DocAttribute(name)(value))
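To tie together the two points raised in these answers (state kept on a descriptor is shared by all instances, and `obj.attr(...)` is a plain lookup followed by a call), here is one possible sketch in Python 3 where the descriptor lives on the class but the state is stored per instance. `Field`, `CallableAttribute` and the underscore-prefixed storage name are invented for the example.

```python
class Field:
    """Per-instance value object that can also be called to update itself."""

    def __init__(self, text='', **attrs):
        self.text = text
        self.attrs = attrs

    def __call__(self, *text, **attrs):
        if text:
            self.text = text[0]
        self.attrs.update(attrs)
        return self


class CallableAttribute:
    """Class-level descriptor; the actual Field lives in each instance."""

    def __set_name__(self, owner, name):
        self.private = '_' + name

    def _field(self, obj):
        # Create the per-instance Field lazily on first use.
        try:
            return obj.__dict__[self.private]
        except KeyError:
            return obj.__dict__.setdefault(self.private, Field())

    def __get__(self, obj, cls=None):
        if obj is None:
            return self
        return self._field(obj)

    def __set__(self, obj, value):
        # Plain assignment only replaces the text, keeping other state.
        self._field(obj).text = value


class Document:
    title = CallableAttribute()
    author = CallableAttribute()


doc = Document()
doc.title = 'Hello World'
doc.title('Goodbye World', font='Helvetica')
other = Document()
other.title = 'Untouched'
```

Each additional attribute costs one line in the class body, and `doc.title` always returns a callable object that knows which document it belongs to.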

Create per-instance property descriptor?

Usually Python descriptors are defined as class attributes. But in my case, I want every object instance to have a different set of descriptors that depends on the input. For example:
class MyClass(object):
    def __init__(self, **kwargs):
        for attr, val in kwargs.items():
            self.__dict__[attr] = MyDescriptor(val)
Each object has a different set of attributes that is decided at instantiation time. Since these are one-off objects, it is not convenient to first subclass them.
tv = MyClass(type="tv", size="30")
smartphone = MyClass(type="phone", os="android")
tv.size # do something smart with the descriptor
Assigning a descriptor to the object does not seem to work. If I try to access the attribute, I get something like
<property at 0x4067cf0>
Do you know why this is not working? Is there any workaround?
This is not working because you have to assign the descriptor to the class of the object.
class Descriptor:
    def __get__(self, obj, objtype=None):
        ...  # this is called when the value is read
    def __set__(self, obj, value):
        ...
    def __delete__(self, obj):
        ...
If you write
obj.attr
then type(obj).__getattribute__(obj, 'attr') is called.
It first looks up 'attr' on type(obj); if it finds a data descriptor there (one that defines __set__ or __delete__), that descriptor is used.
Otherwise obj.__dict__['attr'] is returned if it is there.
Only then is a non-data descriptor or plain class attribute from type(obj) used.
So it does not work because only the type's dictionary is searched for descriptors, never the object's dictionary.
There are possible workarounds:
Put the descriptor into the class and make it use e.g. obj.xxxattr to store the value. If there is only one descriptor behaviour, this works.
Override __setattr__, __getattribute__ and __delattr__ to respond to descriptors.
Put a descriptor into the class that responds to descriptors stored in the object's dictionary.
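The lookup rule is easy to see in a small Python 3 experiment, and it also suggests one more workaround: build a throwaway subclass per object so that the descriptors end up on a class. `Upper`, `Plain` and `make_object` are names invented for this sketch.

```python
class Upper:
    """Toy descriptor: returns the stored value upper-cased."""

    def __init__(self, value):
        self.value = value

    def __get__(self, obj, cls=None):
        return self.value.upper()


class Plain:
    pass


p = Plain()
p.__dict__['kind'] = Upper('tv')  # stored on the instance: __get__ is not invoked
stored_on_instance = p.kind


def make_object(**kwargs):
    # Build a one-off subclass so the descriptors live on a class.
    namespace = {name: Upper(value) for name, value in kwargs.items()}
    cls = type('Plain', (Plain,), namespace)
    return cls()


tv = make_object(kind='tv', size='30')
phone = make_object(kind='phone')
```

Here `p.kind` simply hands back the `Upper` object, while `tv.kind` goes through `__get__`. The price is that every object gets its own class, although `isinstance(tv, Plain)` still holds because the generated classes share a base.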
You are using descriptors in the wrong way.
Descriptors don't make sense on an instance level. After all the __get__/__set__
methods give you access to the instance of the class.
Without knowing what exactly you want to do, I'd suggest you put the per-instance
logic inside the __set__ method, by checking who is the "caller/instance" and act accordingly.
Otherwise tell us what you are trying to achieve, so that we can propose alternative solutions.
I dynamically create instances by execing a made-up class. This may suit your use case.
def make_myclass(**kwargs):

    class MyDescriptor(object):
        def __init__(self, val):
            self.val = val

        def __get__(self, obj, cls):
            return self.val

        def __set__(self, obj, val):
            self.val = val

    cls = 'class MyClass(object):\n{}'.format('\n'.join(' {0} = MyDescriptor({0})'.format(k) for k in kwargs))

    #check if names in kwargs collide with local names
    for key in kwargs:
        if key in locals():
            raise Exception('name "{}" collides with local name'.format(key))

    kwargs.update(locals())
    exec(cls, kwargs, locals())
    return MyClass()
Test:
In [577]: tv = make_myclass(type="tv", size="30")
In [578]: tv.type
Out[578]: 'tv'
In [579]: tv.size
Out[579]: '30'
In [580]: tv.__dict__
Out[580]: {}
But the instances are of different class.
In [581]: phone = make_myclass(type='phone')
In [582]: phone.type
Out[582]: 'phone'
In [583]: tv.type
Out[583]: 'tv'
In [584]: isinstance(tv,type(phone))
Out[584]: False
In [585]: isinstance(phone,type(tv))
Out[585]: False
In [586]: type(tv)
Out[586]: MyClass
In [587]: type(phone)
Out[587]: MyClass
In [588]: type(phone) is type(tv)
Out[588]: False
This looks like a use case for named tuples.
The reason it is not working is that Python only invokes descriptors it finds on the class, not ones stored on the instance; the methods that implement this lookup are:
__getattribute__
__setattr__
__delattr__
It is possible to override those methods on your class in order to implement the descriptor protocol on instances as well as classes:
# do not use in production, example code only, needs more checks
class ClassAllowingInstanceDescriptors(object):
    def __delattr__(self, name):
        res = self.__dict__.get(name)
        for method in ('__get__', '__set__', '__delete__'):
            if hasattr(res, method):
                # we have a descriptor, use it
                # (__delete__ takes the instance, not the attribute name)
                res = res.__delete__(self)
                break
        else:
            res = object.__delattr__(self, name)
        return res

    def __getattribute__(self, *args):
        res = object.__getattribute__(self, *args)
        for method in ('__get__', '__set__', '__delete__'):
            if hasattr(res, method):
                # we have a descriptor, call it
                res = res.__get__(self, self.__class__)
        return res

    def __setattr__(self, name, val):
        # check if object already exists
        res = self.__dict__.get(name)
        for method in ('__get__', '__set__', '__delete__'):
            if hasattr(res, method):
                # we have a descriptor, use it
                res = res.__set__(self, val)
                break
        else:
            res = object.__setattr__(self, name, val)
        return res

    @property
    def world(self):
        return 'hello!'
When the above class is used as below:
huh = ClassAllowingInstanceDescriptors()
print(huh.world)
huh.uni = 'BIG'
print(huh.uni)
huh.huh = property(lambda *a: 'really?')
print(huh.huh)
print('*' * 50)
try:
    del huh.world
except Exception as e:
    print(e)
print(huh.world)
print('*' * 50)
try:
    del huh.huh
except Exception as e:
    print(e)
print(huh.huh)
The results are:
hello!
BIG
really?
can't delete attribute
hello!
can't delete attribute
really?

Namespaces inside class in Python3

I am new to Python and I wonder if there is any way to aggregate methods into 'subspaces'. I mean something similar to this syntax:
smth = Something()
smth.subspace.do_smth()
smth.another_subspace.do_smth_else()
I am writing an API wrapper and I'm going to have a lot of very similar methods (only the URI differs), so I thought it would be good to place them in a few subspaces that correspond to the API request categories. In other words, I want to create namespaces inside a class. I don't know if this is even possible in Python and have no idea what to look for on Google.
I will appreciate any help.
One way to do this is by defining subspace and another_subspace as properties that return objects that provide do_smth and do_smth_else respectively:
class Something:
    @property
    def subspace(self):
        class SubSpaceClass:
            def do_smth(other_self):
                print('do_smth')
        return SubSpaceClass()

    @property
    def another_subspace(self):
        class AnotherSubSpaceClass:
            def do_smth_else(other_self):
                print('do_smth_else')
        return AnotherSubSpaceClass()
Which does what you want:
>>> smth = Something()
>>> smth.subspace.do_smth()
do_smth
>>> smth.another_subspace.do_smth_else()
do_smth_else
Depending on what you intend to use the methods for, you may want to make SubSpaceClass a singleton, but I doubt the performance gain is worth it.
I had this need a couple years ago and came up with this:
from functools import partial


class Registry:
    """Namespace within a class."""

    def __get__(self, obj, cls=None):
        if obj is None:
            return self
        else:
            return InstanceRegistry(self, obj)

    def __call__(self, name=None):
        def decorator(f):
            use_name = name or f.__name__
            if hasattr(self, use_name):
                raise ValueError("%s is already registered" % use_name)
            setattr(self, use_name, f)
            return f
        return decorator


class InstanceRegistry:
    """
    Helper for accessing a namespace from an instance of the class.
    Used internally by :class:`Registry`. Returns a partial that will pass
    the instance as the first parameter.
    """

    def __init__(self, registry, obj):
        self.__registry = registry
        self.__obj = obj

    def __getattr__(self, attr):
        return partial(getattr(self.__registry, attr), self.__obj)
# Usage:
class Something:
    subspace = Registry()
    another_subspace = Registry()


# The functions are registered after the class exists, at module level
@Something.subspace()
def do_smth(self):
    # `self` will be an instance of Something
    pass


@Something.another_subspace('do_smth_else')
def this_can_be_called_anything_and_take_any_parameter_name(obj, other):
    # Call it `obj` or whatever else if `self` outside a class is unsettling
    pass
At runtime:
>>> smth = Something()
>>> smth.subspace.do_smth()
>>> smth.another_subspace.do_smth_else('other')
This is compatible with Py2 and Py3. Some performance optimizations are possible in Py3 because __set_name__ tells us what the namespace is called and allows caching the instance registry.
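The Python 3 optimisation hinted at here could look roughly like the following. This is a sketch, not the author's code: it keeps registered functions in a `functions` dict rather than as attributes of the registry, and `BoundNamespace` is an invented name. Because the registry defines no `__set__`, it is a non-data descriptor, so an entry placed in the instance's `__dict__` under the same name shadows it on later lookups.

```python
from functools import partial


class BoundNamespace:
    """View of a registry whose functions receive a fixed instance."""

    def __init__(self, registry, obj):
        self._registry = registry
        self._obj = obj

    def __getattr__(self, attr):
        try:
            func = self._registry.functions[attr]
        except KeyError:
            raise AttributeError(attr) from None
        return partial(func, self._obj)


class Registry:
    def __init__(self):
        self.functions = {}

    def __set_name__(self, owner, name):
        # Python 3.6+ tells the descriptor which name it was assigned to.
        self.name = name

    def __get__(self, obj, cls=None):
        if obj is None:
            return self
        bound = BoundNamespace(self, obj)
        # Cache in the instance dict so later lookups bypass __get__.
        obj.__dict__[self.name] = bound
        return bound

    def __call__(self, name=None):
        def decorator(func):
            self.functions[name or func.__name__] = func
            return func
        return decorator


class Something:
    subspace = Registry()


@Something.subspace()
def do_smth(self):
    return 'do_smth on ' + type(self).__name__


smth = Something()
first = smth.subspace
second = smth.subspace
result = smth.subspace.do_smth()
```

The cached namespace holds a reference back to its instance, which creates a reference cycle; the cyclic garbage collector handles that, but it is worth knowing about if objects are created in large numbers.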
