How can I atomically compare-exchange-save a value of a Django model instance field? (Using PostgreSQL as the DB backend.)
An example use case is making sure multiple posts with similar content (e.g. repeated submissions of the same form) take effect only once, without relying on client-side JavaScript (insecure and only sometimes effective) or on server-side tracking of form UUIDs (not secure against malicious repeat posts).
For example:
def compare_exchange_save(model_object, field_name, comp, exch):
    # How to implement?
    ....
from django.views.generic.edit import FormView
from django.db import transaction

from my_app.models import LicenseCode

class LicenseCodeFormView(FormView):
    def post(self, request, ...):
        # Get object matching code entered in form
        license_code = LicenseCode.objects.get(...)

        # Safely redeem the code exactly once
        # No change is made in case of error
        try:
            with transaction.atomic():
                if compare_exchange_save(license_code, 'was_redeemed', False, True):
                    # Deposit a license for the user with a 3rd party service.
                    # Raises an exception if it fails.
                    ...
                else:
                    # License code already redeemed, don't deposit another license.
                    pass
        except Exception:
            # Handle exception
            ...
What you are looking for is the update function on a QuerySet object.
Depending on the value, you can do the comparison with Case and When objects - check out the docs on conditional updates (note that the link is for 1.10; Case/When came in with 1.8).
You might also find utility in F objects, which are used to reference the existing value of a field.
For example:
I need to update a value in my model Model:
from django.db import models
from django.db.models import Case, When, Value

(Model.objects
    .filter(id=my_id)
    .update(field_to_be_updated=Case(
        When(my_field=True, then=Value(get_new_license_string())),
        default=Value(''),
        output_field=models.CharField())))
If you need to use an F object, just reference it on the right hand side of the equals in the update expression.
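For instance, a common use of F is an atomic increment, where the arithmetic happens in SQL rather than in Python (the counter field here is hypothetical, just to illustrate):

from django.db.models import F

# The increment is computed by the database in a single UPDATE,
# so concurrent updates don't clobber each other.
Model.objects.filter(id=my_id).update(counter=F('counter') + 1)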
The update doesn't necessitate the use of the transaction.atomic() context manager, but if you need to do any other database operations alongside it you should still wrap that code in transaction.atomic().
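Putting this together, here is a minimal sketch of the compare_exchange_save helper from the question, implemented as a filtered update (my sketch, not an official recipe; it assumes the instance has already been saved, i.e. has a primary key):

def compare_exchange_save(model_object, field_name, comp, exch):
    # Atomically set field_name to exch only if it currently equals comp.
    # The filter on the old value plus update() compiles to a single
    # "UPDATE ... WHERE ..." statement, which PostgreSQL executes atomically.
    updated = (type(model_object).objects
               .filter(pk=model_object.pk, **{field_name: comp})
               .update(**{field_name: exch}))
    if updated:
        setattr(model_object, field_name, exch)  # keep the in-memory copy in sync
    return bool(updated)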
Edit:
You may also like to use the queryset's select_for_update method, which takes row locks when the queryset is executed (docs).
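For illustration, a sketch of the same redeem flow using select_for_update (the pk lookup is a placeholder):

from django.db import transaction

with transaction.atomic():
    # select_for_update() locks the row until the transaction ends;
    # a concurrent transaction blocks here until the lock is released.
    license_code = LicenseCode.objects.select_for_update().get(pk=some_pk)
    if not license_code.was_redeemed:
        license_code.was_redeemed = True
        license_code.save()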
I have a foreign key relationship in my Django (v3) models:
class Example(models.Model):
    title = models.CharField(max_length=200)  # this is irrelevant for the question here
    not_before = models.DateTimeField(auto_now_add=True)
    ...

class ExampleItem(models.Model):
    myParent = models.ForeignKey(Example, on_delete=models.CASCADE)
    execution_date = models.DateTimeField(auto_now_add=True)
    ....
Can I have code running/triggered whenever an ExampleItem is "added to the list of items in an Example instance"? What I would like to do is run some checks and, depending on the concrete Example instance possibly alter the ExampleItem before saving it.
To illustrate
Let's say the Example's not_before date dictates that the ExampleItem's execution_date must not be earlier than not_before. I would like to check whether the to-be-saved ExampleItem's execution_date violates this condition; if so, I would want to either change the execution_date to make it valid or throw an exception (whichever is easier). The same goes for a duplicate execution_date (i.e. if the respective Example already has an ExampleItem with the same execution_date).
So, in a view, I have code like the following:
def doit(request, example_id):
    # get the relevant `Example` object
    example = get_object_or_404(Example, pk=example_id)
    # create a new `ExampleItem`
    itm = ExampleItem()
    # set the item's parent
    itm.myParent = example  # <- this should trigger my validation code!
    itm.save()              # <- (or this???)
The thing is, this view is not the only way to create new ExampleItems; I also have an API, for example, that can do the same (let alone that a user could potentially add ExampleItems manually via the REPL). Preferably, the validation code must not be duplicated in all the places where new ExampleItems can be created.
I was looking into Signals (Django docs), specifically pre_save and post_save (of ExampleItem), but I think pre_save is too early while post_save is too late... Also, m2m_changed looks interesting, but I do not have a many-to-many relationship.
What would be the best/correct way to handle these requirements? They seem to be rather common, I imagine. Do I have to restructure my model?
The obvious solution here is to put this code in the ExampleItem.save() method - just beware that Model.save() is not invoked by some queryset bulk operations.
Using signal handlers on your own app's models is actually an antipattern - the goal of signals is to allow your app to hook into other apps' lifecycles without having to change those other apps' code.
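A minimal sketch of such a save() override, using the models from the question (note that with auto_now_add=True the execution_date is only populated during the first save, so the checks below assume it is set explicitly; the error messages are illustrative):

from django.core.exceptions import ValidationError
from django.db import models

class ExampleItem(models.Model):
    myParent = models.ForeignKey(Example, on_delete=models.CASCADE)
    execution_date = models.DateTimeField(auto_now_add=True)

    def save(self, *args, **kwargs):
        # Runs no matter which view, API, or REPL session creates the item.
        if self.execution_date is not None:
            if self.execution_date < self.myParent.not_before:
                raise ValidationError(
                    "execution_date is before the parent's not_before")
            if self.myParent.exampleitem_set.filter(
                    execution_date=self.execution_date).exclude(pk=self.pk).exists():
                raise ValidationError(
                    "this Example already has an item with that execution_date")
        super().save(*args, **kwargs)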
Also (unrelated, but) you can populate your newly created model instances directly via their initializers, i.e.:
itm = ExampleItem(myParent=example)
itm.save()
and you can even save them directly:
# creates a new instance, populate it AND save it
itm = ExampleItem.objects.create(myParent=example)
This will still invoke your model's save method so it's safe for your use case.
I am trying to override the get_by_natural_key method on a Django manager (models.Manager). After adding my model (NexchangeModel) I can delete all objects with all().delete(), but deleting a single instance doesn't work.
Can:
SmsToken.objects.all().delete()
Cannot:
SmsToken.objects.last().delete()
Code:
from django.db import models
from django.contrib.auth.models import User

from core.common.models import SoftDeletableModel, TimeStampedModel, UniqueFieldMixin

class NexchangeManager(models.Manager):
    def get_by_natural_key(self, param):
        qs = self.get_queryset()
        lookup = {qs.model.NATURAL_KEY: param}
        return self.get(**lookup)

class NexchangeModel(models.Model):
    class Meta:
        abstract = True

    objects = NexchangeManager()

class SmsToken(NexchangeModel, SoftDeletableModel, UniqueFieldMixin):
    sms_token = models.CharField(max_length=4, blank=True)
    user = models.ForeignKey(User, related_name='sms_token')
    send_count = models.IntegerField(default=0)
When you call SmsToken.objects.all().delete(), you are invoking the queryset's delete method.
But with SmsToken.objects.last().delete() you are invoking the instance's delete method.
Since Django 1.9, the queryset's delete method returns the number of items deleted (REF):
Changed in Django 1.9:
The return value describing the number of objects deleted was added.
With the instance's delete method, on the other hand, Django already knows that only one row will be deleted.
Also note that the queryset's delete method and the instance's delete method are different:
The delete() method [on a queryset] does a bulk delete and does not call any delete() methods on your models [the instance method]. It does, however, emit the pre_delete and post_delete signals for all deleted objects (including cascaded deletions).
So you cannot rely on the method's return value to check whether the delete worked. But in the spirit of Python's philosophy of "it's easier to ask forgiveness than permission", you can rely on exceptions to tell you whether a delete worked the way it should. Django's ORM will raise proper exceptions and do proper rollbacks in case of any failure.
So you can do this:
try:
    instance.delete()  # or queryset.delete()
except Exception as e:
    # some code to notify failure / re-raise to propagate it;
    # you can drop this try..except entirely if you want exceptions to propagate.
    raise
Note: I am catching the generic Exception because the only code in my try block is the delete. If you have other code in there, you should catch specific exceptions only.
I assume that SoftDeletableModel comes from the django-model-utils package? If so, the purpose of that model is to mark instances with an is_removed field rather than actually deleting them. So it's to be expected that calling delete() on a model instance—which is what you get from last()—wouldn't actually delete anything.
SoftDeletableModel provides an objects attribute with a manager that limits its results to non-removed objects, and overrides delete() to mark objects as removed instead of actually deleting them.
The problem is that you've defined your own manager as objects, so the SoftDeletableModel manager isn't being used. Your custom manager is actually bulk deleting objects from the database, contrary to the goal of doing a soft delete. The way to resolve this is to have your custom manager inherit from SoftDeletableManagerMixin:
class NexchangeManager(SoftDeletableManagerMixin, models.Manager):
    # your custom code
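Assuming SoftDeletableManagerMixin comes from model_utils.managers (it is present in recent django-model-utils versions), the combined manager could look like this sketch:

from django.db import models
from model_utils.managers import SoftDeletableManagerMixin

class NexchangeManager(SoftDeletableManagerMixin, models.Manager):
    # The mixin filters out soft-deleted rows and soft-deletes on
    # queryset delete(); the natural-key lookup is kept as before.
    def get_by_natural_key(self, param):
        lookup = {self.model.NATURAL_KEY: param}
        return self.get(**lookup)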
We know that update is a thread-safe operation.
That means that when you do:
SomeModel.objects.filter(id=1).update(some_field=100)
instead of:
sm = SomeModel.objects.get(id=1)
sm.some_field = 100
sm.save()
your application is relatively thread-safe, and the operation SomeModel.objects.filter(id=1).update(some_field=100) will not overwrite data in other fields of the model.
My question is: is there any way to do
SomeModel.objects.filter(id=1).update(some_field=100)
but with creation of the object if it does not exist?
from django.db import IntegrityError

def update_or_create(model, filter_kwargs, update_kwargs):
    if not model.objects.filter(**filter_kwargs).update(**update_kwargs):
        kwargs = filter_kwargs.copy()
        kwargs.update(update_kwargs)
        try:
            model.objects.create(**kwargs)
        except IntegrityError:
            if not model.objects.filter(**filter_kwargs).update(**update_kwargs):
                raise  # re-raise IntegrityError
I think the code provided in the question is not very illustrative: who wants to set the id of a model by hand?
Let's assume we need this, and that we have simultaneous operations:
def thread1():
    update_or_create(SomeModel, {'some_unique_field': 1}, {'some_field': 1})

def thread2():
    update_or_create(SomeModel, {'some_unique_field': 1}, {'some_field': 2})
With the update_or_create function above, depending on which thread comes first, the object will be created and updated with no exception. This is thread-safe, but obviously of little use: depending on the race, the value of SomeModel.objects.get(some_unique_field=1).some_field could end up as 1 or 2.
Django provides F objects, so we can upgrade our code:
from django.db.models import F
def thread1():
    update_or_create(SomeModel,
                     {'some_unique_field': 1},
                     {'some_field': F('some_field') + 1})

def thread2():
    update_or_create(SomeModel,
                     {'some_unique_field': 1},
                     {'some_field': F('some_field') + 2})
You want django's select_for_update() method (and a backend that supports row-level locking, such as PostgreSQL) in combination with manual transaction management.
try:
    with transaction.commit_on_success():
        SomeModel.objects.create(pk=1, some_field=100)
except IntegrityError:  # unique id already exists, so update instead
    with transaction.commit_on_success():
        object = SomeModel.objects.select_for_update().get(pk=1)
        object.some_field = 100
        object.save()
Note that if some other process deletes the object between the two queries, you'll get a SomeModel.DoesNotExist exception.
Django 1.7 and above also have atomic transaction support and a built-in update_or_create() method.
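For reference, with the built-in method the whole operation collapses to something like this (field names taken from the question):

from django.db import transaction

with transaction.atomic():
    # Updates the row matching pk=1 with the values in `defaults`,
    # or creates it if it doesn't exist yet.
    obj, created = SomeModel.objects.update_or_create(
        pk=1,
        defaults={'some_field': 100},
    )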
You can use Django's built-in get_or_create, but that operates on the model's manager rather than on a queryset. Note that it returns an (object, created) tuple, so you can use it like this:
me, created = SomeModel.objects.get_or_create(id=1)
me.some_field = 100
me.save()
If you have multiple threads, your app will need to determine which instance of the model is current. Usually what I do is refresh the model from the database, make changes, and then save it, so there isn't a long window spent in a disconnected state.
It's impossible in Django to do such an upsert operation with update alone. But the queryset update method returns the number of matched rows, so you can do:
from django.db import router, connections, transaction

class MySuperManager(models.Manager):
    def _lock_table(self, lock='ACCESS EXCLUSIVE'):
        cursor = connections[router.db_for_write(self.model)].cursor()
        cursor.execute(
            'LOCK TABLE %s IN %s MODE' % (self.model._meta.db_table, lock)
        )

    def create_or_update(self, id, **update_fields):
        with transaction.commit_on_success():
            self._lock_table()
            if not self.get_query_set().filter(id=id).update(**update_fields):
                self.model(id=id, **update_fields).save()
This example is for Postgres. You can use the approach without the raw SQL lock, but then the update-or-insert operation will not be atomic; with a lock on the table you can be sure that two objects will not be created by two competing threads.
I think if you have critical demands on atomic operations, you'd better design them at the database level instead of the Django ORM level.
The Django ORM focuses on convenience rather than performance and safety, and you sometimes have to optimize the automatically generated SQL.
Transactions in most production databases provide locking and rollback well.
In mashup (hybrid) systems, or when your system includes 3rd-party components such as logging or statistics, applications in different frameworks or even languages may access the database at the same time; making Django thread-safe is not enough in that case.
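As a sketch of what "database level" can mean here: PostgreSQL 9.5+ has a native atomic upsert via INSERT ... ON CONFLICT (the table and column names below are hypothetical):

from django.db import connection

with connection.cursor() as cursor:
    # A single statement the database executes atomically, regardless of
    # which application or language issues it.
    cursor.execute(
        """
        INSERT INTO myapp_somemodel (id, some_field)
        VALUES (%s, %s)
        ON CONFLICT (id) DO UPDATE SET some_field = EXCLUDED.some_field
        """,
        [1, 100],
    )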
SomeModel.objects.filter(id=1).update(some_field=100)
In my Django app very often I need to do something similar to get_or_create(). E.g.,
A user submits a tag. I need to see if that tag is already in the database. If not, create a new record for it. If it is, just update the existing record.
But looking into the doc for get_or_create() it looks like it's not threadsafe. Thread A checks and finds Record X does not exist. Then Thread B checks and finds that Record X does not exist. Now both Thread A and Thread B will create a new Record X.
This must be a very common situation. How do I handle it in a threadsafe way?
Since 2013 or so, get_or_create is atomic, so it handles concurrency nicely:
This method is atomic assuming correct usage, correct database configuration, and correct behavior of the underlying database. However, if uniqueness is not enforced at the database level for the kwargs used in a get_or_create call (see unique or unique_together), this method is prone to a race condition which can result in multiple rows with the same parameters being inserted simultaneously.
If you are using MySQL, be sure to use the READ COMMITTED isolation level rather than REPEATABLE READ (the default), otherwise you may see cases where get_or_create will raise an IntegrityError but the object won't appear in a subsequent get() call.
From: https://docs.djangoproject.com/en/dev/ref/models/querysets/#get-or-create
Here's an example of how you could do it:
Define a model either with a unique=True field:

class MyModel(models.Model):
    slug = models.SlugField(max_length=255, unique=True)
    name = models.CharField(max_length=255)

MyModel.objects.get_or_create(slug=<user_slug_here>, defaults={"name": <user_name_here>})
... or by using unique_together:

class MyModel(models.Model):
    prefix = models.CharField(max_length=3)
    slug = models.SlugField(max_length=255)
    name = models.CharField(max_length=255)

    class Meta:
        unique_together = ("prefix", "slug")

MyModel.objects.get_or_create(prefix=<user_prefix_here>, slug=<user_slug_here>, defaults={"name": <user_name_here>})
Note how the non-unique fields are in the defaults dict, NOT among the unique fields in get_or_create. This will ensure your creates are atomic.
Here's how it's implemented in Django: https://github.com/django/django/blob/fd60e6c8878986a102f0125d9cdf61c717605cf1/django/db/models/query.py#L466 - Try creating an object, catch an eventual IntegrityError, and return the copy in that case. In other words: handle atomicity in the database.
This must be a very common situation. How do I handle it in a threadsafe way?
Yes.
The "standard" solution in SQL is to simply attempt to create the record. If it works, that's good. Keep going.
If an attempt to create a record gets a "duplicate" exception from the RDBMS, then do a SELECT and keep going.
Django, however, has an ORM layer with its own cache. So the logic is inverted to make the common case work directly and quickly, and to have the uncommon case (the duplicate) raise a rare exception.
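A hedged sketch of that inverted logic for the tag example (the Tag model and its unique name field are assumptions):

from django.db import IntegrityError, transaction

def get_or_create_tag(name):
    # Attempt the create first; the database's unique constraint
    # arbitrates any race between concurrent threads.
    try:
        with transaction.atomic():
            return Tag.objects.create(name=name)
    except IntegrityError:
        # Another thread/process created it in the meantime: fetch that row.
        return Tag.objects.get(name=name)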
Try the transaction.commit_on_success decorator on the callable in which you do get_or_create(**kwargs):
"Use the commit_on_success decorator to use a single transaction for all the work done in a function. If the function returns successfully, then Django will commit all work done within the function at that point. If the function raises an exception, though, Django will roll back the transaction."
Apart from that, in concurrent calls to get_or_create both threads try to get the object with the arguments passed (except for the "defaults" arg, which is a dict used during the create call in case get() fails to retrieve any object). If both fail to get it, both threads try to create the object, resulting in duplicates, unless something unique/unique_together is enforced at the database level on the field(s) used in the get() call.
It is similar to this post:
How do I deal with this race condition in django?
So many years have passed, but nobody has written about threading.Lock. If you don't have the opportunity to add a unique-together constraint via migrations, for legacy reasons, you can use locks or threading.Semaphore objects. Here is the pseudocode:
from concurrent.futures import ThreadPoolExecutor
from threading import Lock

_lock = Lock()

def get_staff(data: dict):
    _lock.acquire()
    try:
        staff, created = MyModel.objects.get_or_create(**data)
        return staff
    finally:
        _lock.release()

with ThreadPoolExecutor(max_workers=50) as pool:
    pool.map(get_staff, get_list_of_some_data())
I have a model which contains a ManyToMany to User to keep track of which users have 'favorited' a particular model instance.
In my API for this model, when requested by an authenticated user, I'd like to include an 'is_favorite' boolean. However, it seems that any API fields that aren't straight model attributes must be implemented as a class method, which, when called by Piston, does not get a reference to the request object; therefore I have no way to know who the current user is.
From the Piston docs:
In addition to these, you may define any other methods you want. You can use these by including their names in the fields directive, and by doing so, the function will be called with a single argument: The instance of the model. It can then return anything, and the return value will be used as the value for that key.
So, if only the Piston CRUD methods get an instance of the request, how can my classmethod fields generate output which is relevant to the current authenticated user?
I am not aware of the Piston API, but how about using a thread-locals middleware to access the request?
Add this to a middleware module:
try:
    from threading import local
except ImportError:
    from django.utils._threading_local import local

_thread_locals = local()

def get_request():
    return getattr(_thread_locals, 'request', None)

class ThreadLocals(object):
    def process_request(self, request):
        _thread_locals.request = request
Then add the ThreadLocals middleware to your settings, and wherever you want to access the request, import get_request from the middleware module.
If you just want the current user, modify the middleware to store only request.user in the thread locals.
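For instance, a hypothetical handler field that uses the middleware (the model, the favorited_by relation, and the import paths are assumptions, not part of the question):

from piston.handler import BaseHandler

from my_app.middleware import get_request  # the middleware module above
from my_app.models import Favorite  # hypothetical model with a ManyToMany 'favorited_by' to User

class FavoriteHandler(BaseHandler):
    allowed_methods = ('GET',)
    model = Favorite
    fields = ('id', 'is_favorite')

    @classmethod
    def is_favorite(cls, instance):
        # Piston calls field methods with only the model instance; the
        # current request is fetched from the thread-local store instead.
        request = get_request()
        if request is None or not request.user.is_authenticated():
            return False
        return instance.favorited_by.filter(pk=request.user.pk).exists()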
The Piston wiki page says that you may specify the contents of foreign keys and many-to-many fields by nesting attributes. In your case:
class FriendHandler(BaseHandler):
    allowed_methods = ('GET',)
    model = User
    fields = ('userfield_1', 'userfield_2', ('friends', ('is_friended',)))

    def read(self, request):
        # Anything else you might want to do, but return an object of type User
        # (or whatever your model happens to be called)
EDIT: Another slightly hacky way to do it (if you don't want the friend to be returned at all when is_friended is false) would be to manually create a dict structured how you like and return it. Piston processes the dict, and it works with the built-in emitters (the JSON one for sure; I haven't tried the others).