Dangers of dynamically modifying entity Model in App Engine - python

I have a class from which all my entity definitions inherit:
class Model(db.Model):
    """Superclass for all others; contains generic properties and methods."""
    created = db.DateTimeProperty(auto_now_add=True)
    modified = db.DateTimeProperty(auto_now=True)
For various reasons I want to be able to occasionally modify an entity without changing its modified property. I found this example:
Model.__dict__["modified"].__dict__["auto_now"] = False
db.put(my_entity)
Model.__dict__["modified"].__dict__["auto_now"] = True
When I test this locally, it works great. My question is this: could this have wider ramifications for any other code that happens to be saving entities during the small period of time Model is altered? I could see that leading to incredibly confusing bugs. But maybe this little change only affects the current process/thread or whatever?

Any other request that comes in to the same instance and is handled while the put is in progress will also see auto_now=False. It is unlikely, but it is possible.
Something else to consider:
You don't have a try block around this code. If you get a timeout or an error during the put(), your code will leave the model in the modified state with auto_now=False.
Personally I think it's a bad idea and will definitely be a source of errors.
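To make the first point concrete, here is a small sketch in plain Python (no App Engine needed). Prop and Model are hypothetical stand-ins for db.DateTimeProperty and the question's Model, and the two threads play the role of two requests handled by the same instance. The events only force the unlucky interleaving so that the result is reproducible.

```python
import threading


class Prop:
    """Hypothetical stand-in for db.DateTimeProperty: it only stores the flag."""
    def __init__(self, auto_now):
        self.auto_now = auto_now


class Model:
    # The property object lives on the class, so every thread shares it.
    modified = Prop(auto_now=True)


seen = []
flag_cleared = threading.Event()
observed = threading.Event()


def patching_request():
    Model.__dict__["modified"].auto_now = False
    flag_cleared.set()
    observed.wait()  # stands in for the time spent inside db.put()
    Model.__dict__["modified"].auto_now = True


def other_request():
    flag_cleared.wait()
    # A completely unrelated request reads the same class-level object.
    seen.append(Model.modified.auto_now)
    observed.set()


threads = [threading.Thread(target=patching_request),
           threading.Thread(target=other_request)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(seen)  # [False]: the other "request" saw the temporary setting
```

The flag is restored at the end only because nothing raised in between, which is the second point above.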
There are a number of ways of achieving this without manipulating models.
Consider setting the default behaviour to auto_now=False and then having two methods you use for updating, e.g. save() and save_without_modified(). The primary method, save(), sets the modified time to datetime.now() just before you do the put().
A better method would be to override put() in your class, set modified there, and then call the superclass put(). Have put() accept a new argument like modified=False so that you can skip setting the modified date before calling the superclass.
Lastly, you could use the _pre_put hook to run code before the put() call, but you need to annotate the instance in some way so the _pre_put method can determine whether modified needs to be set or not.
I think each of these strategies is a lot safer than hacking the model.
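A rough sketch of the "override put()" idea, under some loud assumptions: FakeDbModel is a hypothetical stand-in for db.Model so the snippet runs outside App Engine, plain attributes replace the datastore properties, and the keyword name update_modified is made up for the illustration. It is only meant to show the shape of the approach, not a drop-in implementation.

```python
import datetime


class FakeDbModel:
    """Hypothetical stand-in for db.Model; the real put() writes to the datastore."""
    def put(self):
        self.saved = True
        return self


class Model(FakeDbModel):
    # Plain attributes instead of DateTimeProperty(auto_now=...): the
    # timestamps are managed explicitly in put() below.
    created = None
    modified = None

    def put(self, update_modified=True):
        now = datetime.datetime.now()
        if self.created is None:
            self.created = now
        if update_modified:
            self.modified = now
        return super().put()


entity = Model()
entity.put()                       # normal save: modified is set
stamp = entity.modified
entity.put(update_modified=False)  # quiet save: modified is left alone
print(entity.modified is stamp)
```

The decision is made per call on the instance being saved, so no class-level state is touched and other requests are unaffected.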

Related

Advantages of using static methods over instance methods in python

My IDE keeps suggesting I convert my instance methods to static methods. I guess because I haven't referenced any self within these methods.
An example is:
class NotificationViewSet(NSViewSet):
    def pre_create_processing(self, request, obj):
        log.debug(" creating messages ")
        # Ensure data is consistent and belongs to the sending bot.
        obj['user_id'] = request.auth.owner.id
        obj['bot_id'] = request.auth.id
So my question would be: do I lose anything by just ignoring the IDE suggestions, or is there more to it?
This is a matter of workflow, intentions with your design, and also a somewhat subjective decision.
First of all, you are right, your IDE suggests converting the method to a static method because the method does not use the instance. It is most likely a good idea to follow this suggestion, but you might have a few reasons to ignore it.
Possible reasons to ignore it:
The code is soon to be changed to use the instance (on the other hand, the idea of soon is subjective, so be careful)
The code is legacy and not entirely understood/known
The interface is used in a polymorphic/duck typed way (e.g. you have a collection of objects with this method and you want to call them in a uniform way, but the implementation in this class happens to not need to use the instance - which is a bit of a code smell)
The interface is specified externally and cannot be changed (this is analogous to the previous reason)
The AST of the code is read/manipulated either by itself or something that uses it and expects this method to be an instance method (this again is an external dependency on the interface)
I'm sure there can be more, but failing these types of reasons I would follow the suggestion. However, if the method does not belong to the class (e.g. factory method or something similar), I would refactor it to not be part of the class.
I think that you might be mixing up some terminology - the example is not a class method. Class methods receive the class as the first argument, they do not receive the instance. In this case you have a normal instance method that is not using its instance.
If the method does not belong in the class, you can move it out of the class and make it a standard function. Otherwise, if it should be bundled as part of the class, e.g. it's a factory function, then you should probably make it a static method, as this (at a minimum) serves as useful documentation to users of your class that the method is coupled to the class but not dependent on its state.
Making the method static also has the advantage that it can be overridden in subclasses of the class. If the method were moved outside of the class as a regular function, overriding it through subclassing would not be possible.
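A short sketch of that last point. The class and method names here (Notifier, LoudNotifier, format_message) are invented for the illustration and have nothing to do with the viewset in the question.

```python
class Notifier:
    @staticmethod
    def format_message(text):
        # Needs no instance or class state, but is still looked up on the class.
        return "[notice] " + text

    def send(self, text):
        # Going through self means a subclass override is picked up.
        return self.format_message(text)


class LoudNotifier(Notifier):
    @staticmethod
    def format_message(text):
        return "[NOTICE] " + text.upper()


print(Notifier().send("disk full"))
print(LoudNotifier().send("disk full"))
print(Notifier.format_message("no instance needed"))
```

Had format_message been a module-level function called directly from send(), LoudNotifier would have had to override send() itself to change the formatting.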

Is there an easy way to "reload" an object's methods without re-instantiating it entirely?

This is a typical pattern that I run into in Python, but it probably applies to most other multi-paradigm languages.
I write a bunch of functions. Some of these are like load_data() and some are like do_something_with_data(). That is, the latter acts on the data that is read in by the former. Let's say it takes 1 minute to read in the data.
After a while I refactor the code so that these are both methods within a class. While this seems neater, it is also harder to develop on. That is, if I fix a bug in do_something_with_data(), the object that is already instantiated is not fixed. I have to re-instantiate it, which might take a minute or so since it has to read the data.
obj = my_object()
obj.load_data()
obj.do_something_with_data()
I am wondering if there is a good pattern for handling this issue. Can you update an object's methods without refreshing the data? Should I write a method that takes an old object and copies in all the data fields from an object that has been saved? Other ideas?
Methods are looked up on the class. On module reload, existing instances end up referencing a class that no longer exists; their __class__ points to an object that was the old module.classname, and is not the same object as the new module.classname.
You have two options:
Update the old class to have your new method:
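existing_instance.__class__.methodname = module.classname.methodname.__func__
(The .__func__ unwraps a Python 2 unbound method. In Python 3, module.classname.methodname is already a plain function, so leave .__func__ off.)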
Replace the class references on the existing objects:
existing_instance.__class__ = module.classname
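Here is a self-contained sketch of the second option. To keep it runnable in one file, the "reload" is simulated by defining the class a second time under the same name; Analysis, load_data and total are made-up names for the illustration.

```python
class Analysis:
    """First version of the class, with a bug in total()."""
    def __init__(self):
        self.data = None

    def load_data(self):
        self.data = [1, 2, 3]  # imagine this is the slow one-minute load

    def total(self):
        return sum(self.data) - 1  # bug: off by one


obj = Analysis()
obj.load_data()
print(obj.total())  # still the buggy result

# Simulate reloading the module: the name now refers to a brand-new class
# object, while obj.__class__ still points at the old one.
class Analysis:
    def __init__(self):
        self.data = None

    def load_data(self):
        self.data = [1, 2, 3]

    def total(self):
        return sum(self.data)  # fixed


# Re-point the existing instance at the new class; its data is untouched.
obj.__class__ = Analysis
print(obj.total())
```

With a real module you would call importlib.reload(module) and then assign module.classname in the same way. Instance attributes live on the object, so nothing has to be loaded again.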

Memory leak in django when keeping a reference of all instances of forms

This is a followup to this thread.
I have implemented the method for keeping a reference to all my forms in an array, as described in the selected answer there, but unfortunately I am getting a memory leak on each request to the Django host.
The code in question is as follows:
This is my custom form I am extending which has a function to keep reference of neighboring forms, and whenever I instantiate a new form, it just keeps getting added to the _instances stack.
class StepForm(ModelForm):
    TAGS = []
    _instances = []

    def __new__(cls, *args, **kwargs):
        instance = object.__new__(cls)
        cls._instances.append(instance)
        return instance
Even though this is more of a Python issue than a Django one, I decided it's better to show the full context in which I am encountering the problem.
As requested, here is what I am trying to accomplish:
I have a JS applet with steps, and for each step there is a form. In order to load the contents of each step dynamically through JS, I need to execute some calls on the next form in line, and on the previous one as well. Therefore the only solution I could come up with is to keep a reference to all forms on each request and use the form functions I need.
Well, it's not only a Python issue - the execution context (here a Django app) is important too. As Ludwik Trammer rightly comments, you're in a long-running process, and as such anything at the module or class level will live for the duration of the process. Also, if you use more than one process to serve the app, you may (and will) get inconsistent results from one request to another, since two subsequent requests from the same user can (and will) end up being served by different processes.
To make a long story short: the way to safely keep per-user persistent state in a web application is to use sessions. Please explain what problem you're trying to solve; there's very probably a more appropriate (and possibly existing and tested) solution.
EDIT: OK, what you're looking for is a "wizard". There are a couple of available implementations for Django, but most of them don't handle going back - which, from experience, can get tricky when each step depends on the previous one (and that's one of the driving points for using a wizard). What one usually does is have a Wizard class (plain old Python object) with a set of forms.
The wizard takes care of
step to step navigation
instantiating forms
maintaining state (which includes storing and retrieving form's data for each step, revalidating etc).
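The three responsibilities above could be sketched roughly like this. Everything here is hypothetical: Wizard, FakeForm and the "current"/"data" storage keys are invented for the illustration, FakeForm stands in for a real Django form, and a plain dict stands in for request.session.

```python
class FakeForm:
    """Stand-in for a Django form; it only remembers the data it was given."""
    def __init__(self, data=None):
        self.data = data or {}


class Wizard:
    """Plain Python object that owns the steps; no global list of forms."""

    def __init__(self, step_forms, storage):
        # step_forms: ordered list of (step_name, form_class) pairs.
        # storage: dict-like per-user store (request.session in a Django view).
        self.step_forms = step_forms
        self.storage = storage
        self.storage.setdefault("current", 0)
        self.storage.setdefault("data", {})

    @property
    def current_name(self):
        return self.step_forms[self.storage["current"]][0]

    def get_form(self, index=None):
        """Instantiate a step's form on demand, prefilled with saved data."""
        if index is None:
            index = self.storage["current"]
        name, form_class = self.step_forms[index]
        return form_class(self.storage["data"].get(name))

    def submit(self, data):
        """Store the current step's data and advance if there is a next step."""
        self.storage["data"][self.current_name] = data
        if self.storage["current"] < len(self.step_forms) - 1:
            self.storage["current"] += 1

    def back(self):
        if self.storage["current"] > 0:
            self.storage["current"] -= 1


session = {}  # one per user; survives between requests
steps = [("account", FakeForm), ("profile", FakeForm)]

Wizard(steps, session).submit({"email": "a@example.com"})  # request 1
wizard = Wizard(steps, session)                            # request 2
print(wizard.current_name)
wizard.back()
print(wizard.get_form().data)
```

Forms are created per request and thrown away afterwards; only the submitted data and the step index are kept, and they are kept per user. With a real Django session, nested changes like the ones in submit() would also need request.session.modified = True to be saved.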
FWIW I've had rather mixed success using Django's existing session-based wizard. We rolled our own for another project (with somewhat complex requirements), and while it works I wouldn't call it a success either. Having ajax and file uploads thrown into the mix doesn't help either. Anyway, you can try to start with an existing implementation, see how it fits your needs, and go for a custom solution if it doesn't - generic solutions sometimes make things harder than they have to be.
My 2 cents...
The leak is not just a side effect of your code - it's part of its core function. It is not possible to remove the leak without changing what the code does.
It does exactly what it is programmed to do: every time the form is displayed, a new instance is created and added to the _instances list. It is never removed from the list. As a consequence, after 100 requests you will have a list with 100 instances, after 1 000 requests there will be 1 000 instances in the list, and so on - until all memory is exhausted and the program crashes.
What did you want to accomplish by keeping all instances of your form? And what else did you expect to happen?
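The growth is easy to reproduce outside Django. In this sketch StepForm is a plain class rather than a ModelForm, and handle_request is a made-up stand-in for a view; the __new__ logic is the one from the question.

```python
import gc


class StepForm:
    _instances = []  # class attribute: lives as long as the process does

    def __new__(cls, *args, **kwargs):
        instance = object.__new__(cls)
        cls._instances.append(instance)
        return instance


def handle_request():
    form = StepForm()  # local variable, gone when the function returns...
    return "response"


for _ in range(1000):
    handle_request()

gc.collect()  # ...but the class-level list still references every form
print(len(StepForm._instances))  # 1000
```

The garbage collector cannot help here, because every instance is still reachable through the class attribute.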

What functions does the order of Django Managers affect?

So, I've read most of the docs and I've been looking around on SO a bit, but I can't quite find the answer to my question. I'll start with the code.
# Manager
class ActiveManager(models.Manager):
    def get_query_set(self):
        return super(ActiveManager, self).get_query_set().filter(is_active=True)

# Model
class ModelA(models.Model):
    # ...
    is_active = models.BooleanField()
    objects = ActiveManager()
    all_objects = models.Manager()
So, while I was playing around I noticed that if I wrote it this way and used get_object_or_404(), then it would use the ActiveManager to first search for all active records and then return the one related to my query. However, if I switched the order of the managers:
class ModelA(models.Model):
    # ...
    all_objects = models.Manager()
    objects = ActiveManager()
Then it uses the default manager, in this case all_objects, to do the query. I'm wondering what other functions this change impacts.
EDIT: I understand that the first manager found in the class becomes the default manager, but I'm wondering which specific functions use this default manager (like get_object_or_404)
Here's the relevant bit from the docs: "If you use custom Manager objects, take note that the first Manager Django encounters (in the order in which they're defined in the model) has a special status. Django interprets the first Manager defined in a class as the "default" Manager, and several parts of Django (including dumpdata) will use that Manager exclusively for that model. As a result, it's a good idea to be careful in your choice of default manager in order to avoid a situation where overriding get_query_set() results in an inability to retrieve objects you'd like to work with".
If you look at the way get_object_or_404 is implemented, they use the _default_manager attribute of the model, which is how Django refers to the first manager encountered. (As far as I know, all Django internals work this way -- they never use Model.objects etc. because you shouldn't assume the default manager happens to be called objects).
It affects many things. The default name for the manager, objects, is just that, a default; it's not required. If you didn't include objects in your model definition and just defined a manager as all_objects, ModelA.objects wouldn't exist. Django only adds a default manager under that name if no other managers are present on the model and you have not defined objects on your own.
Anyway, because of this, Django takes the first manager defined in a model and calls that the "default", and later uses the "default" manager any time it needs to reference the model's manager (because, again, it can't simply use objects, since objects might not be defined).
The rule of thumb is that the standard manager that Django should use (in a sense, the manager that should most normally be used), should be the first one defined, whether it be assigned to objects or something else entirely. Every other additional manager should come after that.
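To see why definition order matters, here is a deliberately simplified model of the behaviour in plain Python. This is not Django's code: ModelMeta, filter_rows, ROWS and get_object_or_none are all invented for the sketch, and only the attribute name _default_manager is taken from the answer above.

```python
class Manager:
    """Stand-in for models.Manager: returns every row."""
    def filter_rows(self, rows):
        return list(rows)


class ActiveManager(Manager):
    def filter_rows(self, rows):
        return [row for row in rows if row["is_active"]]


class ModelMeta(type):
    """Records the first Manager found in the class body as the default."""
    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, namespace)
        # The class namespace preserves definition order.
        managers = [v for v in namespace.values() if isinstance(v, Manager)]
        cls._default_manager = managers[0] if managers else None
        return cls


ROWS = [{"id": 1, "is_active": True}, {"id": 2, "is_active": False}]


class ModelA(metaclass=ModelMeta):
    rows = ROWS
    objects = ActiveManager()   # defined first, so this is the default
    all_objects = Manager()


class ModelB(metaclass=ModelMeta):
    rows = ROWS
    all_objects = Manager()     # defined first, so this is the default
    objects = ActiveManager()


def get_object_or_none(model, pk):
    # Like the helpers described above, go through the default manager
    # rather than assuming an attribute called "objects".
    for row in model._default_manager.filter_rows(model.rows):
        if row["id"] == pk:
            return row
    return None


print(get_object_or_none(ModelA, 2))  # inactive row is hidden
print(get_object_or_none(ModelB, 2))  # inactive row is found
```

Any helper written against the default manager changes behaviour when the two manager lines are swapped, even though neither manager itself changed.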

Traversing object hierarchy pickle style

I need to do some processing on objects just before they get pickled. More precisely, for instances of subclasses of a certain base class I would like something totally different to be pickled instead and then recreated on loading.
I'm aware of __getstate__ & __setstate__ however this is a very invasive approach. My understanding is that these are private methods (begin with double underscore: __), and as such are subject to name mangling. Therefore this effectively would force me to redefine those two methods for every single class that I want to be subject to this non standard behavior. In addition I don't really have a full control over the hierarchy of all classes.
I was wondering if there is some sort of brief way of hooking into pickling process and applying this sort of control that __getstate__ and __setstate__ give but without having to modify the pickled classes as such.
A side note for the curious. This is a use case taken from a project using Django and Celery. Django models are either unpicklable or very impractical and cumbersome to pickle. Therefore it's much more advisable to pickle pairs of values (ID + model class) instead. However, sometimes it's not the model directly that is pickled but rather a dictionary of models, a list of models, a list of lists of models, you name it. This forces me to write a lot of copy-paste code that I really dislike. The need for pickling models itself comes from the Django-celery setup, where functions along with their call arguments are scheduled for later execution. Unfortunately, among those arguments there are usually a lot of models mixed up in some nontrivial hierarchy.
EDIT
I do have a possibility of specifying a custom serializer to be used by Celery, so it's really a question of being able to build a slightly modified serializer on top of pickle without much effort.
The only additional hooks that are related are __reduce__() and __reduce_ex__().
http://docs.python.org/library/pickle.html
What is the difference between __reduce__ and __reduce_ex__?
Python: Ensuring my class gets pickled only with the latest protocol
Not sure if they really provide what you need in particular.
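For what it's worth, here is a sketch of how __reduce__ defined once on a base class could cover the "pickle ID + class instead of the object" case from the question. Record, User, REGISTRY and load_record are hypothetical names, and the dict stands in for a database lookup. This is an illustration only, and it assumes you are able to put the method on the common base class.

```python
import pickle

REGISTRY = {}  # hypothetical stand-in for the database


def load_record(cls, record_id):
    """Called by pickle at load time to rebuild the object from its key."""
    return REGISTRY[(cls.__name__, record_id)]


class Record:
    def __init__(self, record_id, payload):
        self.id = record_id
        self.payload = payload
        REGISTRY[(type(self).__name__, record_id)] = self

    def __reduce__(self):
        # Pickle only "how to find me again": a callable plus its arguments.
        return (load_record, (type(self), self.id))


class User(Record):
    pass  # inherits __reduce__; nothing to repeat per subclass


args = {"owner": User(1, "alice"), "others": [User(2, "bob")]}
blob = pickle.dumps(args)
restored = pickle.loads(blob)

print(b"alice" in blob)            # the payload was not serialized
print(restored["owner"].payload)   # looked up again on load
```

Because pickle calls __reduce__ on every object it meets while walking the structure, dictionaries and nested lists of such objects are handled without extra code.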
