Making a LazilyEvaluatedConstantProperty class in Python

There's a little thing I want to do in Python, similar to the built-in property, that I'm not sure how to do.
I call this class LazilyEvaluatedConstantProperty. It is intended for properties that should be calculated only once and do not change, but they should be created lazily rather than on object creation, for performance.
Here's the usage:
import time

class MyObject(object):

    # ... Regular definitions here

    def _get_personality(self):
        # Time consuming process that creates a personality for this object.
        print('Calculating personality...')
        time.sleep(5)
        return 'Nice person'

    personality = LazilyEvaluatedConstantProperty(_get_personality)
You can see that the usage is similar to property, except there's only a getter, and no setter or deleter.
The intention is that on the first access to my_object.personality, the _get_personality method will be called, and then the result will be cached and _get_personality will never be called again for this object.
My problem with implementing this is that I want to do something a bit tricky to improve performance: after the first access and _get_personality call, I want personality to become a plain data attribute of the object, so lookups will be faster on subsequent accesses. But I don't see how that's possible, since at the point where the property is defined I don't have a reference to the object.
Does anyone have an idea?

I implemented it:
class CachedProperty(object):
    '''
    A property that is calculated (a) lazily and (b) only once for an object.

    Usage:

        class MyObject(object):

            # ... Regular definitions here

            def _get_personality(self):
                print('Calculating personality...')
                time.sleep(5)  # Time consuming process that creates personality
                return 'Nice person'

            personality = CachedProperty(_get_personality)

    '''
    def __init__(self, getter, name=None):
        '''
        Construct the cached property.

        You may optionally pass in the name that this property has in the
        class; this will save a bit of processing later.
        '''
        self.getter = getter
        self.our_name = name

    def __get__(self, obj, our_type=None):
        if obj is None:
            # We're being accessed from the class itself, not from an object
            return self
        value = self.getter(obj)
        if not self.our_name:
            if not our_type:
                our_type = type(obj)
            (self.our_name,) = (key for (key, value) in
                                vars(our_type).iteritems()
                                if value is self)
        setattr(obj, self.our_name, value)
        return value
For the future, the maintained implementation can probably be found here:
https://github.com/cool-RR/GarlicSim/blob/master/garlicsim/garlicsim/general_misc/caching/cached_property.py
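As a quick demonstration (my own, not part of the original answer) of how the caching behaves, assuming the CachedProperty class above is in scope:

import time

class MyObject(object):
    def _get_personality(self):
        print('Calculating personality...')
        time.sleep(1)
        return 'Nice person'
    personality = CachedProperty(_get_personality)

my_object = MyObject()
print(my_object.personality)   # first access: runs the getter, then caches
print(my_object.personality)   # second access: no 'Calculating...' line
print('personality' in vars(my_object))  # True - the value now lives on the instance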

Related

Don't understand the default implementation of Non-Overriding Descriptors in Python

I don't understand the design decision to render non-overriding descriptors ineffective when an instance attribute exists, e.g.
>>> class Descriptor:
...     def __get__(self, obj, objtype=None):
...         return 4
...
>>> class Class:
...     attr = Descriptor()
...     def __init__(self):
...         self.attr = 'instance attr'
...
>>> Class().attr  # why doesn't this return 4?
'instance attr'
To me, overriding descriptors make sense in that if we have a descriptor with __set__, then that __set__ pretty much always gets used for something like obj.attr = <new value>.
Why aren't non-overriding descriptors this simple in the language, i.e. why isn't __get__ pretty much always used when attributes are accessed, e.g. obj.attr?
This is how I get benefit from this:
class Lazy():
    def __init__(self, function):
        self.name = function.__name__
        self.function = function

    def __get__(self, obj, type=None):
        print("get value by heavier __get__")
        obj.__dict__[self.name] = self.function(obj)
        return obj.__dict__[self.name]

class C:
    @Lazy
    def attr(self):
        # doing heavy calculation
        return "heavy calculation result"
>>> c = C()
>>> c.attr
get value by heavier __get__
'heavy calculation result'
>>> c.attr  # get value by __dict__
'heavy calculation result'
In an object's attribute lookup, the instance __dict__ has higher priority than a non-data descriptor (the "non-overriding descriptor" of your question). On the first access, the attribute doesn't exist in __dict__ yet, so it gets calculated in __get__ and cached in the conventional location (__dict__). Because __get__ has the lower priority, later lookups are answered directly from __dict__.
__get__ can host complicated, heavy lookup logic. So if you have two ways to obtain the same value, the heavier one should have the lower priority.
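To make the priority rule concrete, here is a small counter-example of my own (not from the original answer; the names are made up): give the descriptor a __set__ and it becomes a data descriptor, which then wins over the instance __dict__.

class DataDescriptor:
    def __get__(self, obj, objtype=None):
        return 4
    def __set__(self, obj, value):
        obj.__dict__['attr'] = value   # stored, but lookups still go through __get__

class Class2:
    attr = DataDescriptor()
    def __init__(self):
        self.attr = 'instance attr'    # routed through DataDescriptor.__set__

print(Class2().attr)  # 4 - a data descriptor has higher priority than the instance __dict__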
TLDR: An instance variable definition, i.e. self.attr = 'instance attr', overrides a class variable definition, i.e. attr = Descriptor(), whenever there's a name clash. Either change the names, or use the class name to access the class variable, like so:
Class.attr # instead of Class().attr
Long Answer
There are class variables and instance variables. Class variables can be accessed using the class name or using an instance (as long as they haven't been overridden), but instance variables can only be accessed through instances.
In your example, the following line sets up a class variable attr, which if accessed will return 4
attr = Descriptor() # returns 4 because of __get__() definition
However with this next line inside the constructor of Class i.e. __init__(), you set up an instance variable with the same name attr, thus overriding the class variable.
self.attr = 'instance attr'
Whenever you access an attribute of an instance, the instance variables are checked first, and only if the name isn't found there are the class variables checked. Class() creates an instance, so Class().attr looks for attr as an instance variable, and since it's defined, the value 'instance attr' is returned. If you hadn't defined it in __init__(), it wouldn't be found there, so attr would next be looked up as a class variable, and since that is defined (returning 4), 4 would be returned.
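A short session of my own illustrating that order, using the Descriptor and Class from the question:

>>> c = Class()
>>> c.attr            # the instance __dict__ wins over the non-data descriptor
'instance attr'
>>> Class.attr        # class access goes through Descriptor.__get__
4
>>> del c.attr        # remove the instance attribute...
>>> c.attr            # ...and the descriptor is reachable again
4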

How to replace/bypass a class property?

I would like to have a class with an attribute attr that, when accessed for the first time, runs a function and returns a value, and then becomes this value (its type changes, etc.).
A similar behavior can be obtained with:
class MyClass(object):
    @property
    def attr(self):
        try:
            return self._cached_result
        except AttributeError:
            result = ...
            self._cached_result = result
            return result

obj = MyClass()
print obj.attr  # First calculation
print obj.attr  # Cached result is used
However, .attr does not become the initial result, when doing this. It would be more efficient if it did.
A difficulty is that after obj.attr is set to a property, it cannot be set easily to something else, because infinite loops appear naturally. Thus, in the code above, the obj.attr property has no setter so it cannot be directly modified. If a setter is defined, then replacing obj.attr in this setter creates an infinite loop (the setter is accessed from within the setter). I also thought of first deleting the setter so as to be able to do a regular self.attr = …, with del self.attr, but this calls the property deleter (if any), which recreates the infinite loop problem (modifications of self.attr anywhere generally tend to go through the property rules).
So, is there a way to bypass the property mechanism and replace the bound property obj.attr by anything, from within MyClass.attr.__getter__?
This looks a bit like premature optimization: you want to skip a method call by making a descriptor change itself.
It's perfectly possible, but it would have to be justified.
To modify the descriptor from your property, you'd have to be editing your class, which is probably not what you want.
I think a better way to implement this would be to:
- not define obj.attr at all;
- override __getattr__: if the argument is "attr", set obj.attr = new_value and return it, otherwise raise AttributeError.
As soon as obj.attr is set, __getattr__ will not be called any more, as it is only called when the attribute does not exist. (__getattribute__ is the one that would get called all the time.)
The main difference with your initial proposal is that the first attribute access is slower, because of the method call overhead of __getattr__, but then it will be as fast as a regular __dict__ lookup.
Example:

class MyClass(object):
    def __getattr__(self, name):
        if name == 'attr':
            self.attr = ...
            return self.attr
        raise AttributeError(name)

obj = MyClass()
print obj.attr  # First calculation
print obj.attr  # Cached result is used
EDIT: Please see the other answer, especially if you use Python 3.6 or later.
For new-style classes, which use the descriptor protocol, you could do this by creating your own custom descriptor class whose __get__() method is called at most once per instance. When that happens, the result is cached by creating an instance attribute with the same name as the method.
Here's what I mean.
from __future__ import print_function

class cached_property(object):
    """Descriptor class for making class methods lazily-evaluated and caches the result."""

    def __init__(self, func):
        self.func = func

    def __get__(self, inst, cls):
        if inst is None:
            return self
        else:
            value = self.func(inst)
            setattr(inst, self.func.__name__, value)
            return value

class MyClass(object):
    @cached_property
    def attr(self):
        print('doing long calculation...', end='')
        result = 42
        return result

obj = MyClass()
print(obj.attr)  # -> doing long calculation...42
print(obj.attr)  # -> 42
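For reference, recent Python versions ship this exact pattern in the standard library; a minimal sketch assuming Python 3.8 or later, where functools.cached_property also stores the computed value in the instance's __dict__:

from functools import cached_property

class MyClass:
    @cached_property
    def attr(self):
        print('doing long calculation...', end='')
        return 42

obj = MyClass()
print(obj.attr)  # -> doing long calculation...42
print(obj.attr)  # -> 42 (served straight from obj.__dict__)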

Python: deletion of self referencing object

I want to ask how to delete an object with a self-reference in Python.
Let's consider a class that serves as a simple example, so we can see when an instance is created and when it is deleted:
#!/usr/bin/python
class TTest:
    def __init__(self):
        self.sub_func = None
        print 'Created', self

    def __del__(self):
        self.sub_func = None
        print 'Deleted', self

    def Print(self):
        print 'Print', self
This class has a variable self.sub_func to which we intend to assign a function. I want to assign to self.sub_func a function that uses an instance of TTest. See the following case:
def SubFunc1(t):
    t.Print()

def DefineObj1():
    t = TTest()
    t.sub_func = lambda: SubFunc1(t)
    return t

t = DefineObj1()
t.sub_func()

del t
The result is:
Created <__main__.TTest instance at 0x7ffbabceee60>
Print <__main__.TTest instance at 0x7ffbabceee60>
that is to say, though we executed "del t", t was not deleted.
I guess the reason is that t.sub_func makes t a self-referencing object, so the reference count of t does not drop to zero at "del t", and thus t is not deleted by the garbage collector.
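One way to confirm that diagnosis (my addition, not part of the original question) is to ask the cycle collector directly; a sketch assuming CPython 2, where cycles containing objects with __del__ end up in gc.garbage:

import gc

t = DefineObj1()
del t
gc.collect()      # force a collection pass
print gc.garbage  # the TTest instance sits here, uncollectable because of __del__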
To solve this problem, I need to insert
t.sub_func= None
before "del t"; in this time, the output is:
Created <__main__.TTest instance at 0x7fab9ece2e60>
Print <__main__.TTest instance at 0x7fab9ece2e60>
Deleted <__main__.TTest instance at 0x7fab9ece2e60>
But this is strange: t.sub_func is part of t, so I should not have to worry about clearing t.sub_func when deleting t.
Could you tell me if you know a good solution?
How do you make sure an object in a reference cycle gets deleted when it is no longer reachable? The simplest solution is not to define a __del__ method. Very few, if any, classes need a __del__ method. Python makes no guarantees about when, or even if, a __del__ method will get called.
There are several ways you can alleviate this problem:
- Use a function rather than a lambda that contains and checks a weak reference. This requires explicitly checking that the object is still alive each time the function is called.
- Create a unique class for each object so that we can store the function on the class rather than as a monkey-patched function. This could get memory heavy.
- Define a property that knows how to get the given function and turn it into a method. My personal favourite, as it closely approximates how bound methods are created from a class's unbound methods.
Using weak references

import weakref

class TTest:
    def __init__(self):
        self.func = None
        print 'Created', self

    def __del__(self):
        print 'Deleted', self

    def print_self(self):
        print 'Print', self

def print_func(t):
    t.print_self()

def create_ttest():
    t = TTest()
    weak_t = weakref.ref(t)
    def func():
        t1 = weak_t()
        if t1 is None:
            raise TypeError("TTest object no longer exists")
        print_func(t1)
    t.func = func
    return t

if __name__ == "__main__":
    t = create_ttest()
    t.func()
    del t
Creating a unique class

class TTest:
    def __init__(self):
        print 'Created', self

    def __del__(self):
        print 'Deleted', self

    def print_self(self):
        print 'Print', self

def print_func(t):
    t.print_self()

def create_ttest():
    class SubTTest(TTest):
        def func(self):
            print_func(self)
    SubTTest.func1 = print_func
    # The above also works. First argument is instantiated as the object the
    # function was called on.
    return SubTTest()

if __name__ == "__main__":
    t = create_ttest()
    t.func()
    t.func1()
    del t
Using properties

import types

class TTest:
    def __init__(self, func):
        self._func = func
        print 'Created', self

    def __del__(self):
        print 'Deleted', self

    def print_self(self):
        print 'Print', self

    @property
    def func(self):
        return types.MethodType(self._func, self)

def print_func(t):
    t.print_self()

def create_ttest():
    def func(self):
        print_func(self)
    t = TTest(func)
    return t

if __name__ == "__main__":
    t = create_ttest()
    t.func()
    del t
From the official CPython docs:
Objects that have __del__() methods and are part of a reference cycle cause the entire reference cycle to be uncollectable, including objects not necessarily in the cycle but reachable only from it. Python doesn’t collect such cycles automatically because, in general, it isn’t possible for Python to guess a safe order in which to run the __del__() methods. If you know a safe order, you can force the issue by examining the garbage list, and explicitly breaking cycles due to your objects within the list. Note that these objects are kept alive even so by virtue of being in the garbage list, so they should be removed from garbage too. For example, after breaking cycles, do del gc.garbage[:] to empty the list. It’s generally better to avoid the issue by not creating cycles containing objects with __del__() methods, and garbage can be examined in that case to verify that no such cycles are being created.
See also: http://engineering.hearsaysocial.com/2013/06/16/circular-references-in-python/
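Worth noting (my addition, not from the quoted docs): since Python 3.4 and PEP 442, cycles whose objects define __del__ are collected normally. A minimal Python 3 re-creation of the example above, where a collection pass is enough to run __del__:

import gc

class TTest(object):
    def __init__(self):
        self.sub_func = None
        print('Created', self)
    def __del__(self):
        print('Deleted', self)

def define_obj():
    t = TTest()
    t.sub_func = lambda: t   # closure cell -> t -> sub_func -> lambda: a reference cycle
    return t

t = define_obj()
del t
gc.collect()  # on Python 3.4+ this prints 'Deleted ...' and frees the cycle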

Set object attributes to every variable used in a method in python

How can I set (almost) all local variables in an object's method to be attributes of that object?
class Obj(object):
    def do_something(self):
        localstr = 'hello world'
        localnum = 1
        # TODO store vars in the object for easier inspection

x = Obj()
x.do_something()
print x.localstr, x.localnum
Inspired by Python update object from dictionary, I came up with the following:
class Obj(object):
    def do_something(self):
        localstr = 'hello world'
        localnum = 1
        # store vars in the object for easier inspection
        l = locals().copy()
        del l['self']
        for key, value in l.iteritems():
            setattr(self, key, value)

x = Obj()
x.do_something()
print x.localstr, x.localnum
There is already a Python debugger that lets you inspect local variables, so there is no point in polluting objects with random instance attributes.
Also, your approach does not work if more than one method uses the same local variable names, since one method could overwrite some of the instance attributes and leave the object in an ambiguous state.
Your solution also goes against the DRY principle, since you must add the code before every return.
Another disadvantage is that you often want to know the state of the local variables at more than one point during method execution, and that is not possible with your answer.
If you really want to save the local variables manually, then something like this is probably much better than your solution:
import inspect
from collections import defaultdict

class LogLocals(object):
    NO_BREAK_POINT = object()

    def __init__(self):
        self.__locals = defaultdict(lambda: defaultdict(list))

    def register_locals(self, local_vars, method_name=None,
                        break_point=NO_BREAK_POINT):
        if method_name is None:
            # Default to the name of the calling method
            method_name = inspect.currentframe().f_back.f_code.co_name
        self.__locals[method_name][break_point].append(local_vars)

    def reset_locals(self, method_name=None, break_point=NO_BREAK_POINT,
                     all_=False):
        if method_name is None:
            method_name = inspect.currentframe().f_back.f_code.co_name
        if all_:
            del self.__locals[method_name]
        else:
            del self.__locals[method_name][break_point]

    def get_locals(self, method_name, break_point=NO_BREAK_POINT):
        return self.__locals[method_name][break_point]
You simply have to inherit from it and call register_locals(locals()) when you want to save the state. It also allows you to distinguish between "break points", and most importantly it does not pollute the instances.
It also distinguishes between different calls, returning a list of states instead of only the last state.
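For example, a minimal usage sketch (the class and variable names here are made up):

class MyObject(LogLocals):
    def do_something(self):
        localstr = 'hello world'
        localnum = 1
        self.register_locals(locals())  # snapshot the local state at this point

x = MyObject()
x.do_something()
print x.get_locals('do_something')[-1]['localstr']  # 'hello world'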
If you want to access the locals of a particular call via attributes, you can do something like:
class SimpleNamespace(object):  # python3.3 already provides this
    def __init__(self, attrs):
        self.__dict__.update(attrs)

the_locals = x.get_locals('method_1')[-1]  # take only last call locals
x = SimpleNamespace(the_locals)
x.some_local_variable
Anyway, I believe there is not much use for this; you ought to use the Python debugger.
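If all you want is to inspect the locals at some point, a debugger breakpoint does that without touching the object at all; a minimal sketch:

import pdb

class Obj(object):
    def do_something(self):
        localstr = 'hello world'
        localnum = 1
        pdb.set_trace()  # drops into the debugger; inspect localstr / localnum here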

Dynamically adding @property in python

I know that I can dynamically add an instance method to an object by doing something like:
import types
def my_method(self):
# logic of method
# ...
# instance is some instance of some class
instance.my_method = types.MethodType(my_method, instance)
Later on I can call instance.my_method() and self will be bound correctly and everything works.
Now, my question: how to do the exact same thing to obtain the behavior that decorating the new method with #property would give?
I would guess something like:
instance.my_method = types.MethodType(my_method, instance)
instance.my_method = property(instance.my_method)
But doing that, instance.my_method just returns a property object.
The property descriptor object needs to live in the class, not in the instance, to have the effect you desire. If you don't want to alter the existing class (so as to avoid altering the behavior of other instances), you'll need to make a "per-instance class", e.g.:
def addprop(inst, name, method):
    cls = type(inst)
    if not hasattr(cls, '__perinstance'):
        cls = type(cls.__name__, (cls,), {})
        cls.__perinstance = True
        inst.__class__ = cls
    setattr(cls, name, property(method))
I'm marking these special "per-instance" classes with an attribute to avoid needlessly making multiple ones if you're doing several addprop calls on the same instance.
Note that, like for other uses of property, you need the class in play to be new-style (typically obtained by inheriting directly or indirectly from object), not the ancient legacy style (dropped in Python 3) that's assigned by default to a class without bases.
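A quick usage sketch for addprop (the example class and names are mine):

class Person(object):
    pass

p = Person()
addprop(p, 'greeting', lambda self: 'hello')
print(p.greeting)                      # hello
print(hasattr(Person(), 'greeting'))   # False - other instances are unaffected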
Since this question isn't asking only about adding a property to a specific instance, the following method can be used to add a property to the class; this will expose the property to all instances of the class, so YMMV.
cls = type(my_instance)
cls.my_prop = property(lambda self: "hello world")
print(my_instance.my_prop)
# >>> hello world
Note: I'm adding another answer because I think @Alex Martelli, while correct, achieves the desired result by creating a new class that holds the property. This answer is intended to be more direct/straightforward, without abstracting what's going on into its own method.
