Overriding delete() vs using the pre_delete signal - Python

I have a model where, when an object gets deleted, I would like its status to be updated instead of the row actually being deleted. This was achieved with the following code:
def delete(self, using=None, keep_parents=False):
    self.status = Booking.DELETED
    self.save()
The manager was updated so that in the rest of the application I am never presented with deleted bookings.
class BookingManager(models.Manager):
    def get_queryset(self):
        return super().get_queryset().exclude(status=Booking.DELETED)

class BookingDeletedManager(models.Manager):
    def get_queryset(self):
        return super().get_queryset().filter(status=Booking.DELETED)

class Booking(models.Model):
    PAYED = 0
    PENDING = 1
    OPEN = 2
    CANCELLED = 3
    DELETED = 4

    objects = BookingManager()
    deleted_objects = BookingDeletedManager()
    ...
Now I have read up on Django signals and was wondering whether it would be better to use the pre_delete signal here. The code could be changed so that the pre_delete receiver creates a duplicate with status 'deleted' and the original booking is actually deleted.
The documentation states that these signals should be used to allow decoupled applications to get notified when actions occur elsewhere in the framework. The signal seems like a good solution, but this nuance in the documentation makes me think it's maybe not what I want and that overriding might just be the way to go.
Decoupling is not really the case here, since I just want this functionality all the time. So my question is: is there a solid reason why I should not override the delete method and use the pre_delete signal instead, or vice versa?

The documentation states that these signals should be used to allow decoupled applications to get notified when actions occur elsewhere in the framework.
That's indeed the point: allowing one application to get notified of events occurring in another application that knows nothing about the first one - hence avoiding the need to couple the second application to the first one.
In your case, using model signals instead of just overriding the model's method would be an anti-pattern: it would only add overhead and make your code less readable for absolutely no good reason, when the obvious solution is to do exactly what you did.
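For reference, a minimal sketch of the full soft-delete pattern described above; the status field declaration, the default value and the all_objects escape hatch are assumptions layered on top of the question's snippets:
from django.db import models

class BookingManager(models.Manager):
    def get_queryset(self):
        # Hide soft-deleted bookings everywhere the default manager is used
        return super().get_queryset().exclude(status=Booking.DELETED)

class BookingDeletedManager(models.Manager):
    def get_queryset(self):
        return super().get_queryset().filter(status=Booking.DELETED)

class Booking(models.Model):
    PAYED = 0
    PENDING = 1
    OPEN = 2
    CANCELLED = 3
    DELETED = 4

    status = models.IntegerField(default=OPEN)

    objects = BookingManager()
    deleted_objects = BookingDeletedManager()
    all_objects = models.Manager()  # escape hatch that still sees deleted rows

    def delete(self, using=None, keep_parents=False):
        # Soft delete: flag the row instead of removing it
        self.status = Booking.DELETED
        self.save(update_fields=['status'])
Note that a bulk delete through a queryset (Booking.objects.filter(...).delete()) bypasses the overridden delete() entirely, while it does still send the delete signals; that caveat applies whichever approach you pick.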

Related

Django: updating many objects with per-object calculation

This question is a continuation of one I asked yesterday: I'm still not sure if a post_save handler or a 2nd Celery task is the best way to update many objects based on the results of the first Celery task, but I plan to test performance down the line. Here's a recap of what's happening:
Celery task, every 30s:
    Update page_count field of Book object based on conditions
        |
        |  post_save(Book)
        v
    Update some field on all Reader objects w/ foreign key to the updated Book
    (the update will have different results per Reader; thousands of Readers could be FKed to a Book)
The first task could save ~10 objects, requiring the update to all related Reader objects for each.
Whichever proves to be better between post_save and another task, they must accomplish the same thing: update potentially tens to hundreds of thousands of objects in a table, with each object update being unique. It could be that my choice between post_save and Celery task is determined by which method will actually allow me to accomplish this goal.
Since I can't just use a few queryset update() commands, I need to somehow call a method or function that calculates the value of a field based on the result of the first Celery task as well as some of the values in the object. Here's an example:
class Reader(models.Model):
    book = models.ForeignKey(Book)
    pages_read = models.IntegerField(default=0)
    book_finished = models.BooleanField(default=False)

    def determine_book_finished(self):
        if self.pages_read == self.book.page_count:
            self.book_finished = True
        else:
            self.book_finished = False
This is a contrived example, but if the page_count was updated in the first task, I want all Readers foreign-keyed to the Book to have their book_finished recalculated, and looping over a queryset seems like a really inefficient way to go about it.
My thought was to somehow call a model method such as determine_book_finished() on an entire queryset at once, but I can't find any documentation on how to do something like that; custom querysets don't appear to be intended for actually operating on objects in the queryset beyond the built-in update() capability.
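For what it's worth, a rough sketch of what such a queryset-level method could look like; ReaderQuerySet and recalculate_book_finished are invented names, and bulk_update only exists in newer Django versions (2.2+), so older versions would have to fall back to per-object save():
from django.db import models

class ReaderQuerySet(models.QuerySet):
    def recalculate_book_finished(self):
        # select_related avoids one extra query per Reader when reading reader.book
        readers = list(self.select_related('book'))
        for reader in readers:
            reader.book_finished = reader.pages_read == reader.book.page_count
        # A handful of bulk UPDATEs instead of thousands of save() calls
        self.model.objects.bulk_update(readers, ['book_finished'], batch_size=1000)

# On the model: objects = ReaderQuerySet.as_manager()
# Then: Reader.objects.filter(book=book).recalculate_book_finished()
This still loops in Python to compute the per-object value, but the writes collapse into a fixed number of queries.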
This post using Celery is the most promising thing I've found, and since Django signals are synchronous, using another Celery task would also have the benefit of not holding anything else up. So even though I'd still need to loop over a queryset, it'd be asynchronous, and any querysets that needed to be updated could be handled by separate tasks, hopefully in parallel.
On the other hand, this question seems to have a solution too: register the method with the post_save signal, which presumably would run the method on all objects after receiving the signal. Would this be workable with thousands of objects needing updates, as well as potentially other Books being updated by the same task and their thousands of associated Readers then needing updates too?
Is there a best practice for doing what I'm trying to do here?
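To make the post_save-plus-Celery idea concrete, a hedged sketch; the task name and the recalculate_book_finished() helper from the previous sketch are both assumptions, and it presumes Celery is already wired up:
from celery import shared_task
from django.db.models.signals import post_save
from django.dispatch import receiver

from myapp.models import Book, Reader  # hypothetical app path

@shared_task
def update_readers_for_book(book_id):
    # Runs in a worker, so the potentially huge Reader update never blocks
    # the request or the first task's post_save handling
    Reader.objects.filter(book_id=book_id).recalculate_book_finished()

@receiver(post_save, sender=Book)
def schedule_reader_update(sender, instance, **kwargs):
    # The signal handler only enqueues work; the heavy lifting happens in the worker
    update_readers_for_book.delay(instance.pk)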
EDIT: I realized I could go about this another way: making book_finished a property determined at runtime rather than a stored field.
@property
def book_finished(self):
    if self.pages_read == self.book.page_count:
        if self.book.page_count == self.book.planned_pages:
            return True
        else:
            return False
This is close enough to my actual code; in the real version, the first if branch contains a couple of elif branches, each with their own if-else, for a total maximum depth of 3 ifs.
Until I can spin up a lot of test data and simulate many simultaneous users, I may stick with this option as it definitely works (for now). I don't really like having the property calculated on every retrieval, but from some quick research, it doesn't seem like an overly slow approach.

Dangers of dynamically modifying entity Model in App Engine

I have a class from which all my entity definitions inherit:
class Model(db.Model):
    """Superclass for all others; contains generic properties and methods."""
    created = db.DateTimeProperty(auto_now_add=True)
    modified = db.DateTimeProperty(auto_now=True)
For various reasons I want to be able to occasionally modify an entity without changing its modified property. I found this example:
Model.__dict__["modified"].__dict__["auto_now"] = False
db.put(my_entity)
Model.__dict__["modified"].__dict__["auto_now"] = True
When I test this locally, it works great. My question is this: could this have wider ramifications for any other code that happens to be saving entities during the small period of time Model is altered? I could see that leading to incredibly confusing bugs. But maybe this little change only affects the current process/thread or whatever?
Any other request coming in to the same instance and being handled while the put() is in progress will also get auto_now=False. While unlikely, it is possible.
Some other things to consider:
You don't have a try block around this code; if you get a timeout or error during the put(), your code will leave the model in the modified state with auto_now=False.
Personally I think it's a bad idea and it will definitely be a source of errors.
There are a number of ways of achieving this without manipulating the model:
Consider setting the default behaviour to auto_now=False and then having two methods you use for updating. The primary method sets the modified time to datetime.now() just before you do the put(), e.g. save() and save_without_modified().
A better method would be to override put() in your class: set modified and then call the superclass's put(). Have put() accept a new argument like modified=False so you can skip setting the modified date before calling super.
Lastly, you could use a _pre_put hook to run code before the put() call, but you need to annotate the instance in some way so the _pre_put method can determine whether modified needs to be set or not.
I think each of these strategies is a lot safer than hacking the model.
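A rough sketch of that put() override, assuming the old db API and that auto_now is switched off so modified is managed by hand; update_modified is an invented keyword argument:
import datetime
from google.appengine.ext import db

class Model(db.Model):
    """Superclass for all others; contains generic properties and methods."""
    created = db.DateTimeProperty(auto_now_add=True)
    modified = db.DateTimeProperty()  # auto_now off; set explicitly in put()

    def put(self, update_modified=True, **kwargs):
        # Pass update_modified=False for the occasional save that must not
        # touch the modified timestamp
        if update_modified:
            self.modified = datetime.datetime.now()
        return super(Model, self).put(**kwargs)
my_entity.put(update_modified=False) then saves without bumping modified, and there is no process-wide state that has to be restored afterwards.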

Memory leak in django when keeping a reference of all instances of forms

This is a follow-up to this thread.
I have implemented the method of keeping a reference to all my forms in an array, as mentioned by the selected answer there, but unfortunately I am getting a memory leak on each request to the Django host.
The code in question is as follows:
This is the custom form I am extending; it has a function to keep references to neighboring forms, and whenever I instantiate a new form, it just keeps getting added to the _instances stack.
class StepForm(ModelForm):
    TAGS = []
    _instances = []

    def __new__(cls, *args, **kwargs):
        instance = object.__new__(cls)
        cls._instances.append(instance)
        return instance
Even though this is more of a Python issue than a Django one, I have decided it's better to show you the full context in which I am encountering this problem.
As requested, I am posting what I am trying to accomplish with this:
I have a JS applet with steps, and for each step there is a form, but in order to load the contents of each step dynamically through JS I need to execute some calls on the next form in line, and on the previous one as well. Therefore the only solution I could come up with is to keep a reference to all forms on each request and just use the form functions I need.
Well, it's not only a Python issue - the execution context (here a Django app) is important too. As Ludwik Trammer rightly comments, you're in a long-running process, and as such anything at the module or class level will live for the duration of the process. Also, if you use more than one process to serve the app you may (and will) get inconsistent results from one request to another, since two subsequent requests from the same user can (and will) end up being served by different processes.
To make a long story short: the way to safely keep per-user persistent state in a web application is to use sessions. Please explain what problem you're trying to solve; there's very probably a more appropriate (and possibly existing and tested) solution.
EDIT: OK, what you're looking for is a "wizard". There are a couple of available implementations for Django, but most of them don't handle going back, which, from experience, can get tricky when each step depends on the previous one (and that's one of the driving points for using a wizard). What one usually does is have a Wizard class (a plain old Python object) with a set of forms.
The wizard takes care of:
step-to-step navigation
instantiating forms
maintaining state (which includes storing and retrieving each step's form data, revalidating, etc.)
FWIW, I've had rather mixed success with Django's existing session-based wizard. We rolled our own for another project (with somewhat complex requirements) and while it works I wouldn't call it a success either. Having AJAX and file uploads thrown into the mix doesn't help either. Anyway, you can try starting with an existing implementation, see how it fits your needs, and go for a custom solution if it doesn't; generic solutions sometimes make things harder than they have to be.
My 2 cents...
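Purely as an illustration of that shape (every name below is invented, and it assumes a session serializer that can store the form data), a wizard that keeps its state in the session instead of in class-level lists might look roughly like this:
class Wizard(object):
    """Plain Python object; state lives in the user's session, not in class attributes."""
    form_classes = [StepOneForm, StepTwoForm, StepThreeForm]  # hypothetical step forms

    def __init__(self, session):
        self.session = session
        self.storage = session.setdefault('wizard', {'step': 0, 'data': {}})

    def get_form(self, step=None, data=None):
        step = self.storage['step'] if step is None else step
        # Re-instantiate the form from stored data instead of keeping live references
        return self.form_classes[step](data or self.storage['data'].get(str(step)))

    def store(self, step, form):
        self.storage['data'][str(step)] = form.data
        self.session.modified = True  # make sure the session backend persists the change
Because nothing is kept at class level, each request rebuilds what it needs from the session and the form instances can be garbage collected normally.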
The leak is not just a side effect of your code - it's part of its core function. It is not possible to remove the leak without changing what the code does.
It does exactly what it is programmed to do - every time the form is displayed a new instance is created and added to the _instances list. It is never removed from the list. As a consequence, after 100 requests you will have 100 instances in the list, after 1,000 requests there will be 1,000 instances, and so on - until all memory is exhausted and the program crashes.
What did you want to accomplish by keeping all instances of your form? And what else did you expect to happen?

How to save a model without sending a signal?

How can I save a model so that the signals (post_save and pre_save) aren't sent?
It's a bit of a hack, but you can do something like this:
Use a unique identifier with a filter and then use the update() method of the queryset (which does not trigger the signals):
user_id = 142187
User.objects.filter(id=user_id).update(name='tom')
ModelName.objects.bulk_create([your object/objects])
You can also read more in the Django docs.
This ticket has been marked as "wontfix" because:
In short, it sounds like, given the defined purpose of signals, it is the attached signal handler that needs to become more intelligent (like in davedash's suggestion), rather than the code that emits the signal. Disabling signals is just a quick fix that will work when you know exactly what handlers are attached to a signal, and it hides the underlying problem by putting the fix in the wrong place.
There is currently a ticket pending a Django design decision for this feature.
Included in the ticket is a diff for a patch with the proposed implementation.
If you have mutual relations between models and their signals,
you can still decouple the signal logic into more signals of the same type and handle your logic in a more sophisticated way:
You can check the state of the object in the signal:
kwargs['created']
You can also check the state of any extra value passed along with the instance.
So in one signal handler, you first read:
if kwargs['instance'].skip_signals:
    return
and in the other place, just before save(), you set skip_signals on the specific object, in the specific situation.
(There is no need to define it as a model field.)
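A minimal sketch of that skip_signals convention; the attribute name is arbitrary and deliberately not a model field, and Booking stands in for whatever model you are saving:
from django.db.models.signals import post_save
from django.dispatch import receiver

from myapp.models import Booking  # hypothetical model

@receiver(post_save, sender=Booking)
def booking_saved(sender, instance, created, **kwargs):
    if getattr(instance, 'skip_signals', False):
        return  # the caller asked for a silent save
    # ... normal post_save logic ...

# Elsewhere, for one specific save:
booking.skip_signals = True
booking.save()
booking.skip_signals = False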
You can also avoid emitting signals altogether:
by overriding method(s) on models,
or by adding your own save_without_signals(),
or, as already mentioned, by doing filter(pk=<>).update(...)
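A possible sketch of such a save_without_signals() helper, built on the update() trick above; treat it as a starting point, since field handling (e.g. concrete_fields) varies a bit across Django versions:
from django.db import models

class MyModel(models.Model):
    ...

    def save_without_signals(self):
        """Persist the current field values via a queryset update(), which sends no signals."""
        if self.pk is None:
            raise ValueError("save_without_signals() only works for rows that already exist")
        field_values = {
            field.name: getattr(self, field.name)
            for field in self._meta.concrete_fields
            if not field.primary_key
        }
        type(self).objects.filter(pk=self.pk).update(**field_values)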
As far as I'm aware there still isn't a 'nice' way to do this, but if you're open to exploring hacky solutions, then I'll add one to the mix.
If you look at the Django model source code, specifically save_base() here and here, you'll see that the pre_save and post_save signals are both wrapped in a conditional:
if not meta.auto_created:
    # emit signal
We can directly manipulate the meta options of a model or instance through the _meta API which means we're able to 'disable' the signals from firing by setting auto_created = True on the instance we want to save.
For example:
@receiver(post_save, sender=MyModel)
def save_my_model(sender, instance=None, created=False, **kwargs):
    if created:
        # Modify the instance
        instance.task_id = task.task_hash
        # HACK: Prevent `post_save` signal from being called during save
        instance._meta.auto_created = True
        instance.save()
        instance._meta.auto_created = False
    elif instance.has_changed("schedule"):
        # Modify the instance
        instance.task_id = 'abc123'
        # HACK: Prevent `post_save` signal from being called during save
        instance._meta.auto_created = True
        instance.save()
        instance._meta.auto_created = False
The major caveat here is that this is undocumented behaviour and it could well change in future releases of Django.

Per-session transactions in Django

I'm making a Django web-app which allows a user to build up a set of changes over a series of GETs/POSTs before committing them to the database (or reverting) with a final POST. I have to keep the updates isolated from any concurrent database users until they are confirmed (this is a configuration front-end), ruling out committing after each POST.
My preferred solution is to use a per-session transaction. This keeps all the problems of remembering what's changed (and how it affects subsequent queries), together with implementing commit/rollback, in the database where it belongs. Deadlock and long-held locks are not an issue, as due to external constraints there can only be one user configuring the system at any one time, and they are well-behaved.
However, I cannot find documentation on setting up Django's ORM to use this sort of transaction model. I have thrown together a minimal monkey-patch (ew!) to solve the problem, but dislike such a fragile solution. Has anyone else done this before? Have I missed some documentation somewhere?
(My version of Django is 1.0.2 Final, and I am using an Oracle database.)
Multiple, concurrent, session-scale transactions will generally lead to deadlocks or worse (worse == livelock, long delays while locks are held by another session.)
This design is not the best policy, which is why Django discourages it.
The better solution is the following.
Design a Memento class that records the user's change. This could be a saved copy of their form input. You may need to record additional information if the state changes are complex. Otherwise, a copy of the form input may be enough.
Accumulate the sequence of Memento objects in their session. Note that each step in the transaction will involve fetches from the database and validation to see if the chain of mementos will still "work". Sometimes they won't work, because someone else changed something in this chain of mementos. What now?
When you present the "ready to commit?" page, you've replayed the sequence of Mementos and are pretty sure they'll work. When they submit "Commit", you have to replay the Mementos one last time, hoping they're still going to work. If they do, great. If they don't, someone changed something, and you're back at step 2: what now?
This seems complex.
Yes, it does. However it does not hold any locks, allowing blistering speed and little opportunity for deadlock. The transaction is confined to the "Commit" view function which actually applies the sequence of Mementos to the database, saves the results, and does a final commit to end the transaction.
The alternative -- holding locks while the user steps out for a quick cup of coffee on step n-1 out of n -- is unworkable.
For more information on Memento, see this.
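A very rough sketch of the shape being described; every name is invented, it assumes the forms are ModelForms, and transaction.atomic() is the modern spelling (Django 1.0, as in the question, would use commit_on_success instead):
from django.db import transaction

class Memento(object):
    """Records one user change as re-playable form input."""
    def __init__(self, form_class, data):
        self.form_class = form_class
        self.data = data

    def replay(self):
        # Re-validate against the current database state; fails if it no longer applies
        form = self.form_class(self.data)
        if not form.is_valid():
            raise ValueError("memento no longer applies: %s" % form.errors)
        return form

def commit_changes(request):
    # Called from the final POST's view; the only real transaction happens here
    with transaction.atomic():
        for memento in request.session.get('mementos', []):
            memento.replay().save()  # ModelForm.save() writes the row
    request.session['mementos'] = []
Storing Memento objects in the session like this assumes a serializer that can pickle them; storing plain dicts plus a form-class identifier is the more portable variant.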
In case anyone else ever has the exact same problem as me (I hope not), here is my monkeypatch. It's fragile and ugly, and changes private methods, but thankfully it's small. Please don't use it unless you really have to. As mentioned by others, any application using it effectively prevents multiple users doing updates at the same time, on penalty of deadlock. (In my application, there may be many readers, but multiple concurrent updates are deliberately excluded.)
I have a "user" object which persists across a user session, and contains a persistent connection object. When I validate a particular HTTP interaction is part of a session, I also store the user object on django.db.connection, which is thread-local.
def monkeyPatchDjangoDBConnection():
    import django.db

    def validConnection():
        if django.db.connection.connection is None:
            django.db.connection.connection = django.db.connection.user.connection
        return True

    def close():
        django.db.connection.connection = None

    django.db.connection._valid_connection = validConnection
    django.db.connection.close = close

monkeyPatchDjangoDBConnection()

def setUserOnThisThread(user):
    import django.db
    django.db.connection.user = user
This last is called automatically at the start of any method annotated with @login_required, so 99% of my code is insulated from the specifics of this hack.
I came up with something similar to the Memento pattern, but different enough that I think it bears posting. When a user starts an editing session, I duplicate the target object to a temporary object in the database. All subsequent editing operations affect the duplicate. Instead of saving the object state in a memento at each change, I store operation objects. When I apply an operation to an object, it returns the inverse operation, which I store.
Saving operations is much cheaper for me than mementos, since the operations can be described with a few small data items, while the object being edited is much bigger. Also I apply the operations as I go and save the undos, so that the temporary in the db always corresponds to the version in the user's browser. I never have to replay a collection of changes; the temporary is always only one operation away from the next version.
To implement "undo," I pop the last undo object off the stack (as it were, by retrieving the latest operation for the temporary object from the db), apply it to the temporary and return the transformed temporary. I could also push the resultant operation onto a redo stack if I cared to implement redo.
To implement "save changes," i.e. commit, I de-activate and time-stamp the original object and activate the temporary in its place.
To implement "cancel," i.e. rollback, I do nothing! I could delete the temporary, of course, because there's no way for the user to retrieve it once the editing session is over, but I like to keep the canceled edit sessions so I can run stats on them before clearing them out with a cron job.
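A toy sketch of that operation/inverse idea; SetField, undo_stack and temporary are all invented names, and real operations would be richer and model-specific:
class SetField(object):
    """One edit: set a single field to a new value; applying it returns its own inverse."""
    def __init__(self, field, value):
        self.field = field
        self.value = value

    def apply(self, obj):
        old_value = getattr(obj, self.field)
        setattr(obj, self.field, self.value)
        obj.save()
        return SetField(self.field, old_value)  # the undo operation

# While editing: apply each operation to the temporary copy and keep the inverse
undo_stack.append(SetField('title', 'New title').apply(temporary))

# Undo: pop the latest inverse, apply it, and keep what it returns for redo if wanted
redo_op = undo_stack.pop().apply(temporary)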
