How do I memoize expensive calculations on Django model objects?

I have several TextField columns on my UserProfile object which contain JSON objects. I've also defined a setter/getter property for each column which encapsulates the logic for serializing and deserializing the JSON into python datastructures.
The nature of this data ensures that it will be accessed many times by view and template logic within a single Request. To save on deserialization costs, I would like to memoize the python datastructures on read, invalidating on direct write to the property or save signal from the model object.
Where/How do I store the memo? I'm nervous about using instance variables, as I don't understand the magic behind how any particular UserProfile is instantiated by a query. Is __init__ safe to use, or do I need to check the existence of the memo attribute via hasattr() at each read?
Here's an example of my current implementation:
class UserProfile(Model):
    text_json = models.TextField(default=text_defaults)

    @property
    def text(self):
        if not hasattr(self, "text_memo"):
            self.text_memo = None
        self.text_memo = self.text_memo or simplejson.loads(self.text_json)
        return self.text_memo

    @text.setter
    def text(self, value=None):
        self.text_memo = None
        self.text_json = simplejson.dumps(value)

You may be interested in a built-in Django decorator, django.utils.functional.memoize.
Django uses this to cache expensive operations like URL resolving.

Generally, I use a pattern like this:
def get_expensive_operation(self):
    if not hasattr(self, '_expensive_operation'):
        self._expensive_operation = self.expensive_operation()
    return self._expensive_operation
Then you use the get_expensive_operation method to access the data.
However, in your particular case, I think you are approaching this in slightly the wrong way. You need to do the deserialization when the model is first loaded from the database, and serialize on save only. Then you can simply access the attributes as a standard Python dictionary each time. You can do this by defining a custom JSONField type, subclassing models.TextField, which overrides to_python and get_db_prep_save.
In fact someone's already done it: see here.

For instance methods, you should use django.utils.functional.cached_property.
Since the first argument of an instance method is self, memoize will keep a reference to the object and to the results of the function even after you've thrown the object away. This can cause memory leaks by preventing the garbage collector from cleaning up the stale object. cached_property turns Daniel's suggestion into a decorator.
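A minimal sketch of the cached-property idea, using functools.cached_property from the Python standard library (3.8+) as a stand-in for Django's decorator so that it runs without Django. The Profile class, the loads counter, and set_text are hypothetical names used only for illustration; on a real model, text_json would be the TextField.

```python
import json
from functools import cached_property


class Profile:
    """Hypothetical stand-in for a model; text_json mimics a TextField value."""

    def __init__(self, text_json):
        self.text_json = text_json
        self.loads = 0  # counts deserializations, for demonstration only

    @cached_property
    def text(self):
        # Runs once per instance; the result is stored in self.__dict__["text"].
        self.loads += 1
        return json.loads(self.text_json)

    def set_text(self, value):
        # Write path: re-serialize and drop the memo so the next read reloads.
        self.text_json = json.dumps(value)
        self.__dict__.pop("text", None)


profile = Profile('{"a": 1}')
profile.text
profile.text  # second read is served from the instance __dict__
profile.set_text({"b": 2})
```

Because the memo lives on the instance, it is discarded together with the object, which avoids the stale references described above.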

Related

Is there anything wrong with referencing an instance variable without implementing a getter?

In a nutshell, I receive JSON events via an API, and recently I've been learning a lot more about classes. One of the recommended ways to use classes is to implement getters, setters, etc. However, my classes aren't too sophisticated: all they're doing is parsing data from a JSON object and passing better-formatted data on to further ETL processes.
Below is a simple example of what I've encountered.
data = {'status': 'ready'}

class StatusHandler:
    def __init__(self, data):
        self.status = data.get('status', None)

class StatusHandler2:
    def __init__(self, data):
        self._status = data.get('status', None)

    @property
    def status(self):
        return self._status

without_getter = StatusHandler(data)
print(without_getter.status)
with_getter = StatusHandler2(data)
print(with_getter.status)
Is there anything wrong with me using the class StatusHandler and referencing a status instance variable and using that to pass information forward to other bits of code? I'm just wondering if further down the line as my project gets more complicated that this would be an issue as it doesn't seem to be standard although I could be wrong...

The point of getters/setters is to let you replace plain attribute access with computed access without breaking client code if and when you have to change your implementation. This only makes sense for languages that have no support for computed attributes.
Python has quite strong support for computed attributes through the descriptor protocol, including the generic builtin property type, so you don't need explicit getters/setters: if you have to change your implementation, just replace the affected public attributes with computed ones.
Just make sure not to abuse computed attributes: they should not do any heavy computation, external resource access or the like. No one expects what looks like an attribute to have a high cost or raise IOErrors ;-)
EDIT
With regard to your example: computed attributes are a way to control attribute access, and making an attribute read-only (not providing a setter for your property) IS a perfectly valid use case - IF you have a reason to make it read-only of course.
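To illustrate the point about changing the implementation later, here is a small sketch. StatusHandlerV1 and StatusHandlerV2 are hypothetical names, and the normalization rule in V2 is an assumption made up for the example. Client code reads handler.status in both versions and does not need to change.

```python
class StatusHandlerV1:
    """First version: a plain public attribute."""

    def __init__(self, data):
        self.status = data.get("status")


class StatusHandlerV2:
    """Later version: the same public name, now computed and read-only."""

    def __init__(self, data):
        self._raw = data

    @property
    def status(self):
        # Cheap computation only: normalize the value on read.
        return (self._raw.get("status") or "unknown").lower()


for handler in (StatusHandlerV1({"status": "ready"}),
                StatusHandlerV2({"status": "READY"})):
    print(handler.status)
```

Since V2 defines no setter, assigning to handler.status raises AttributeError, which is the read-only use case mentioned in the edit above.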

How can I avoid DatabaseSessionIsOver from outside calls

I have some model definitions in which I have overridden the __repr__ method. For example, consider the following entities:
class A(db.Entity):
    id = PrimaryKey(int, auto=True)
    name = Required(unicode)
    b = Optional("B")

    def __repr__(self):
        return self.name

class B(db.Entity):
    id = PrimaryKey(int, auto=True)
    name = Required(unicode)
    a = Required("A")

    def __repr__(self):
        return '{n} from a={aname}'.format(n=self.name, aname=self.a)
It raises the DatabaseSessionIsOver exception when I use the search(B, 'aaaa') method from Flask-PonyWhoosh, even though that method is wrapped in db_session internally:
@orm.db_session
def search(model, *arg, **kw):
    return model._wh_.search(*arg, **kw)
The exception is raised only when some entity overrides the __repr__ method the way I did in the example above.
However, I am currently avoiding the problem with the following statements:
with db_session:
    print(search(A, 'karl'))
So, in short, the question is: is there any way to avoid using with ..., maybe by modifying the __repr__ method or by modifying the methods from the package?
Thanks,
PS: I've been reading about the prefetch method, but it doesn't seem appropriate. I'm not sure.

The exception DatabaseSessionIsOver happens because in your repr method you're trying to access the relationship attribute, which wasn't loaded from the database (self.a which tries to return the name attribute of the A entity).
One way to avoid this exception is to load all the necessary objects before you leave the db_session. In this case, those objects will sit in the Identity Map and no request to the database will be required.
Another way is to wrap all your code with a bigger scope db_session, so when you access the attribute which was not loaded from the database, Pony can do this within the db_session.
Pony requires using the @db_session because it sets the boundaries for the database conversation and allows freeing the resources:
Clear the Identity Map cache
Return the database connection to the connection pool
If we don't clear the cache, then all objects which were loaded from the database will sit in the memory until you clear the cache manually or your program ends.
Let's say we introduce the mode when the db_session never ends and you need to clear the cache manually. Do you think it would solve your problem and you would use it?

How do I write a Class-Based Django Validator?

I'm using Django 1.8.
The documentation on writing validators has an example of a function-based validator. It also says the following on using a class:
You can also use a class with a __call__() method for more complex or configurable validators. RegexValidator, for example, uses this technique. If a class-based validator is used in the validators model field option, you should make sure it is serializable by the migration framework by adding deconstruct() and __eq__() methods.
What are the pros/cons of class-based against function-based validators?
What is __call__() used for, and how is it used?
What is deconstruct() used for, and how is it used?
What is __eq__() used for, and how is it used?
An example would be helpful. A full answer may also be worth submitting to be in the official documentation.
Thanks!

A big advantage of validator functions is that they are really simple. They take one value as an argument, check that it's valid, and raise ValidationError if it's not. You don't need to worry about deconstruct and __eq__ methods to make migrations work.
The validate_even example in the docs is much simpler than a validator class would be.
from django.core.exceptions import ValidationError

def validate_even(value):
    if value % 2 != 0:
        raise ValidationError('%s is not an even number' % value)
If you need to check divisibility by other numbers as well, then it would be worth creating a validator class ValidateDivisibleBy. Then you could use ValidateDivisibleBy(2), ValidateDivisibleBy(3) and so on. But a lot of the time, a validator function is good enough.
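A rough sketch of what such a ValidateDivisibleBy class could look like. To keep it runnable without Django, ValidationError is defined here as a local stand-in; in a real project you would import it from django.core.exceptions. The import path "myapp.validators.ValidateDivisibleBy" is hypothetical, and the deconstruct return value follows the (path, args, kwargs) shape the migration framework expects.

```python
class ValidationError(Exception):
    """Stand-in for django.core.exceptions.ValidationError (assumption)."""


class ValidateDivisibleBy:
    def __init__(self, divisor):
        # Configuration is stored once, when the validator is constructed.
        self.divisor = divisor

    def __call__(self, value):
        # Called with the value to check, just like a validator function.
        if value % self.divisor != 0:
            raise ValidationError(
                "%s is not divisible by %s" % (value, self.divisor)
            )

    def __eq__(self, other):
        # Two validators with the same divisor count as the same validator.
        return (
            isinstance(other, ValidateDivisibleBy)
            and self.divisor == other.divisor
        )

    def deconstruct(self):
        # (import path, positional args, keyword args); the path is hypothetical.
        return ("myapp.validators.ValidateDivisibleBy", (self.divisor,), {})


validate_even = ValidateDivisibleBy(2)
validate_even(4)  # passes silently
```

In a Django project, the django.utils.deconstruct.deconstructible class decorator can generate deconstruct for you instead of writing it by hand.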

Other than being able to inherit from BaseValidator, there might not necessarily be a significant pro/con to choosing function vs class-based validators. I prefer class-based because you can keep internal state if necessary without making it visible to the clients (e.g. compiled regexes, pre-computed values in a table, history, etc.)
The __call__ method makes an object callable, so it can be used like a function: calling the object invokes its __call__ method. To use this in a validator, implement the special __call__(self, ...) method on the validator class.
class Callable(object):
    def __call__(self, *args, **kwargs):
        print('Calling', args, kwargs)
>>> c = Callable()
>>> c(2, 3, color='red')
Calling (2, 3) {'color': 'red'}
>>>
The deconstruct method seems to provide a point where the client (i.e. you) can override serializing behavior by writing custom implementations. For example, see here. This seems similar to the clean method, where you can implement custom input sanitation for your models, and which gets called automatically when full_clean is invoked (e.g. when a form uses is_valid).
The __eq__ allows you to implement a comparison between 2 objects that are themselves not comparable. For example, if you have a
class Vector2:
    def __init__(self, x, y):
        self.x = x
        self.y = y
your __eq__ implementation could look like this to check for equality between two vector objects:
    # ...
    def __eq__(self, other):
        return self.x == other.x and self.y == other.y
This way, you avoid a shallow comparison of the underlying references.

Starting from the beginning: there are almost no cons, except maybe some added complexity in implementing class-based validators.
But there are some pros: you can store something on the class instance for future validation, such as a compiled regex pattern, so it won't be recomputed each time something is validated. You can also build a more complex validator by spreading code across other methods in your class.
Also, you can construct your validator with some parameters which can be used later in the validation process.
The __call__ method is the actual validation function: it will be called like a normal validation function with the same parameters (plus the additional self parameter, the instance of the class, as in all methods). It's not something from the Django framework; it's from Python itself. Any instance can be called like a function if its class implements __call__. The deconstruct method is explained in the documentation on migration serializing. __eq__ is also from Python itself; any class can define it, and it compares two objects to check whether they are equal.

Class decorator to auto-update properties dictionary on disk?

I am working on a project where I have a number of custom classes to interface with a varied collection of data on a user's system. These classes only have properties as user-facing attributes. Some of these properties are decently resource intensive, so I want to only run the generation code once, and store the returned value on disk (cache it, that is) for faster retrieval on subsequent runs. As it stands, this is how I am accomplishing this:
import functools


def stored_property(func):
    """This ``decorator`` adds on-disk functionality to the `property`
    decorator. This decorator is also a Method Decorator.
    Each key property of a class is stored in a settings JSON file with
    a dictionary of property names and values (e.g. :class:`MyClass`
    stores its properties in `my_class.json`).
    """
    @property
    @functools.wraps(func)
    def func_wrapper(self):
        print('running decorator...')
        try:
            var = self.properties[func.__name__]
            if var:
                # property already written to disk
                return var
            else:
                # property written to disk as `null`
                return func(self)
        except AttributeError:
            # `self.properties` does not yet exist
            return func(self)
        except KeyError:
            # `self.properties` exists, but property is not a key
            return func(self)
    return func_wrapper


class MyClass(object):
    def __init__(self, wf):
        self.wf = wf
        self.properties = self._properties()

    def _properties(self):
        # get name of class in underscore format
        class_name = convert(self.__class__.__name__)
        # this is a library used (in Alfred workflows) for interacting with data stored on disk
        properties = self.wf.stored_data(class_name)
        # if no file on disk, or one of the properties has a null value
        if properties is None or None in properties.values():
            # get names of all properties of this class
            propnames = [k for (k, v) in self.__class__.__dict__.items()
                         if isinstance(v, property)]
            properties = dict()
            for prop in propnames:
                # generate dictionary of property names and values
                properties[prop] = getattr(self, prop)
            # use the external library to save that dictionary to disk in JSON format
            self.wf.store_data(class_name, properties,
                               serializer='json')
        # return either the data read from file, or data generated in situ
        return properties

    # this decorator ensures that this generating code is only run if necessary
    @stored_property
    def only_property(self):
        # some code to get data
        return 'this is my property'
This code works precisely as I need it, but it still forces me to manually add the _properties(self) method to each class wherein I need this functionality (currently, I have 3). What I want is a way to "insert" this functionality into any class I please. I think that a Class Decorator could get this job done, but try as I might, I can't quite figure out how to wrangle it.
For the sake of clarity (and in case a decorator is not the best way to get what I want), I will try to explain the overall functionality I am after. I want to write a class that contains some properties. The values of these properties are generated via various degrees of complex code (in one instance, I'm searching for a certain app's pref file, then searching for 3 different preferences (any of which may or may not exist) and determining the best single result from those preferences). I want the body of the properties' code only to contain the algorithm for finding the data. But, I don't want to run that algorithmic code each time I access that property. Once I generate the value once, I want to write it to disk and then simply read that on all subsequent calls. However, I don't want each value written to its own file; I want a dictionary of all the values of all the properties of a single class to be written to one file (so, in the example above, my_class.json would contain a JSON dictionary with one key, value pair). When accessing the property directly, it should first check to see if it already exists in the dictionary on disk. If it does, simply read and return that value. If it exists, but has a null value, then try to run the generation code (i.e. the code actually written in the property method) and see if you can find it now (if not, the method will return None and that will once again be written to file). If the dictionary exists and that property is not a key (my current code doesn't really make this possible, but better safe than sorry), run the generation code and add the key, value pair. If the dictionary doesn't exist (i.e. on the first instantiation of the class), run all generation code for all properties and create the JSON file. Ideally, the code would be able to update one property in the JSON file without rerunning all of the generation code (i.e. running _properties() again).
I know this is a bit peculiar, but I need the speed, human-readable content, and elegant code all together. I would really rather not have to compromise on my goal. Hopefully, the description of what I want is clear enough. If not, let me know in a comment what doesn't make sense and I will try to clarify. But I do think that a Class Decorator could probably get me there (essentially by inserting the _properties() method into any class, running it on instantiation, and mapping its value to the properties attribute of the class).

Maybe I'm missing something, but it doesn't seem that your _properties method is specific to the properties that a given class has. I'd put that in a base class and have each of your classes with @stored_property methods subclass that. Then you don't need to duplicate the _properties method.
class PropertyBase(object):
    def __init__(self, wf):
        self.wf = wf
        self.properties = self._properties()

    def _properties(self):
        # As before...

class MyClass(PropertyBase):
    @stored_property
    def expensive_to_calculate(self):
        # Calculate it here
If for some reason you can't subclass PropertyBase directly (maybe you already need to have a different base class), you can probably use a mixin. Failing that, make _properties accept an instance/class and a workflow object and call it explicitly in __init__ for each class.
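The base-class idea can be sketched end to end with only the standard library. This is an illustration under assumptions, not the asker's setup: a plain JSON file path replaces the Alfred workflow object, the calls counter exists only to show when the generation code runs, and the re-generation of null values from the question is left out for brevity.

```python
import functools
import json
import os
import tempfile


def stored_property(func):
    """Property that returns the stored value if there is one, else computes it."""
    @property
    @functools.wraps(func)
    def wrapper(self):
        cached = getattr(self, "properties", {}).get(func.__name__)
        if cached is not None:
            return cached
        return func(self)
    return wrapper


class PropertyBase:
    """Hypothetical base class; `path` stands in for the workflow object."""

    def __init__(self, path):
        self.path = path
        self.properties = self._properties()

    def _properties(self):
        # Reuse the JSON file if a previous run already wrote it.
        if os.path.exists(self.path):
            with open(self.path) as fh:
                return json.load(fh)
        # Otherwise run every property defined on the subclass and store the results.
        names = [k for k, v in vars(type(self)).items()
                 if isinstance(v, property)]
        props = {name: getattr(self, name) for name in names}
        with open(self.path, "w") as fh:
            json.dump(props, fh)
        return props


class MyClass(PropertyBase):
    calls = 0  # demonstration only: how often the generation code ran

    @stored_property
    def only_property(self):
        type(self).calls += 1
        return "this is my property"


demo_path = os.path.join(tempfile.mkdtemp(), "my_class.json")
first = MyClass(demo_path)   # generates the value and writes the file
second = MyClass(demo_path)  # reads the file; generation code is not rerun
```

Each subclass only declares its properties; the loading and saving logic lives in one place.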

Python/Django OOP modify the following code to show get/set and constructor

Case. I want to modify and add the following behavior to the code below (it's a context processor):
After checking if a user is authenticated check the last time the balance was updated (cookie maybe) if it was updated in the last 5 mins do nothing, else get the new balance as normal.
def get_balance(request):
    if request.user.is_authenticated():
        balance = Account.objects.get(user=request.user).balance
    else:
        balance = 0
    return {'account_balance': balance}
HOWEVER:
I want to learn a little more about OOP in Django/Python. Can someone modify the example to achieve my goal, including the use of:
Property: I come from Java, I want to set and get, it makes more sense to me. get balance if does not exist else create new one.
Constructor method: In Python I think I have to change this to a class and use init right?
UPDATE:
To use a constructor I think I first need to create a class. I'm assuming it is OK to use something like this as a context processor in Django:
class BalanceProcessor(request):
    _balance = Account.objects.get(user=request.user).balance

    @property
    def get_balance(self):
        return {'account_balance': _balance}

    @setter???

Python is not Java. In Python you don't create classes for no reason. Classes are for when you have data you want to encapsulate with code. In this case, there is no such thing: you simply get some data and return it. A class would be of no benefit here whatsoever.
In any case, even if you do create a class, once again Python is not Java, and you don't create getters and setters on properties unless you actually need to do some processing when you get and set. If you just want to access an instance attribute, then you simply access it.
Finally, your proposed code will not work for two reasons. Firstly, you are trying to inherit from request. That makes no sense: you should inherit from object unless you are subclassing something. Secondly, how are you expecting your class to be instantiated? Context processors are usually functions, and that means Django is expecting a callable. If you give the class as the context processor, then calling it will instantiate it: but then there's nothing that will call the get_balance method. And your code will fail because Django will pass the request into the instantiation (as it is expecting to do with a function) and your __init__ doesn't expect that parameter.
It's fine to experiment with classes in Python, but a context processor is not the place for it.
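For the five-minute refresh that the question asks about, a plain function is enough. The sketch below is one possible approach under assumptions: it keeps the last balance and its timestamp in request.session, fetch_balance is a hypothetical callable standing in for Account.objects.get(user=request.user).balance, and the now parameter exists only to make the timing testable. It reads is_authenticated as an attribute, as in Django 1.10 and later, whereas the question's code calls it as a method. A real context processor takes only request, so it would wrap this helper.

```python
import time
from types import SimpleNamespace

CACHE_SECONDS = 5 * 60


def get_balance(request, fetch_balance, now=time.time):
    """Build the template context, refetching the balance at most every 5 minutes."""
    if not request.user.is_authenticated:
        return {"account_balance": 0}
    session = request.session
    fetched_at = session.get("balance_fetched_at")
    if fetched_at is None or now() - fetched_at >= CACHE_SECONDS:
        # Stale or missing: fetch again and remember when we did.
        session["account_balance"] = fetch_balance(request.user)
        session["balance_fetched_at"] = now()
    return {"account_balance": session["account_balance"]}


# Fake request object for demonstration; Django would supply the real one.
demo_request = SimpleNamespace(
    user=SimpleNamespace(is_authenticated=True),
    session={},
)
context = get_balance(demo_request, fetch_balance=lambda user: 42)
```

No class, getter, or setter is involved: the state that has to survive between requests lives in the session.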
