We know that update() is a thread-safe operation.
It means that when you do:
SomeModel.objects.filter(id=1).update(some_field=100)
Instead of:
sm = SomeModel.objects.get(id=1)
sm.some_field=100
sm.save()
Your application is relatively thread safe, and the operation SomeModel.objects.filter(id=1).update(some_field=100) will not overwrite data in the model's other fields.
My question is: is there any way to do
SomeModel.objects.filter(id=1).update(some_field=100)
but with creation of the object if it does not exist?
from django.db import IntegrityError

def update_or_create(model, filter_kwargs, update_kwargs):
    # Try an atomic UPDATE first; update() returns the number of matched rows.
    if not model.objects.filter(**filter_kwargs).update(**update_kwargs):
        kwargs = filter_kwargs.copy()
        kwargs.update(update_kwargs)
        try:
            model.objects.create(**kwargs)
        except IntegrityError:
            # Another thread created the row first; retry the update.
            if not model.objects.filter(**filter_kwargs).update(**update_kwargs):
                raise  # re-raise IntegrityError
I think the code provided in the question is not very illustrative: who wants to set the id on a model?
Let's assume we need this, and that we have simultaneous operations:
def thread1():
    update_or_create(SomeModel, {'some_unique_field': 1}, {'some_field': 1})

def thread2():
    update_or_create(SomeModel, {'some_unique_field': 1}, {'some_field': 2})
With the update_or_create function, depending on which thread comes first, the object will be created and then updated with no exception. This is thread safe, but obviously of little use: depending on the race, the value of SomeModel.objects.get(some_unique_field=1).some_field could be 1 or 2.
Django provides F objects, so we can upgrade our code:
from django.db.models import F

def thread1():
    update_or_create(SomeModel,
                     {'some_unique_field': 1},
                     {'some_field': F('some_field') + 1})

def thread2():
    update_or_create(SomeModel,
                     {'some_unique_field': 1},
                     {'some_field': F('some_field') + 2})
You want Django's select_for_update() method (and a backend that supports row-level locking, such as PostgreSQL) in combination with manual transaction management.
from django.db import IntegrityError, transaction

try:
    with transaction.commit_on_success():
        SomeModel.objects.create(pk=1, some_field=100)
except IntegrityError:  # a row with this pk already exists, so update instead
    with transaction.commit_on_success():
        obj = SomeModel.objects.select_for_update().get(pk=1)
        obj.some_field = 100
        obj.save()
Note that if some other process deletes the object between the two queries, you'll get a SomeModel.DoesNotExist exception.
Django 1.7 and above also has atomic operation support and a built-in update_or_create() method.
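For reference, a minimal sketch of that built-in method, reusing the field names from the examples above:

# Django 1.7+: looks the row up by the keyword arguments and applies
# `defaults` whether it updates the existing row or creates a new one.
obj, created = SomeModel.objects.update_or_create(
    some_unique_field=1,
    defaults={'some_field': 100},
)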
You can use Django's built-in get_or_create, but note that it operates on the model's manager rather than a queryset, and that it returns a tuple of the instance and a created flag.
You can use it like this:

me, created = SomeModel.objects.get_or_create(id=1)
me.some_field = 100
me.save()
If you have multiple threads, your app will need to determine which instance of the model is current. Usually what I do is refresh the model from the database, make my changes, and then save it, so there isn't a long window in which the object is stale.
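A minimal sketch of that refresh-modify-save flow, assuming Django 1.8+ for refresh_from_db():

me, created = SomeModel.objects.get_or_create(id=1)
# Re-read the row just before changing it to shorten the stale window.
me.refresh_from_db()
me.some_field = 100
me.save(update_fields=['some_field'])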
It's impossible to do such an upsert with a single update() in Django. But the queryset update() method returns the number of matched rows, so you can do:
from django.db import router, connections, transaction

class MySuperManager(models.Manager):
    def _lock_table(self, lock='ACCESS EXCLUSIVE'):
        cursor = connections[router.db_for_write(self.model)].cursor()
        cursor.execute(
            'LOCK TABLE %s IN %s MODE' % (self.model._meta.db_table, lock)
        )

    def create_or_update(self, id, **update_fields):
        with transaction.commit_on_success():
            self._lock_table()
            if not self.get_query_set().filter(id=id).update(**update_fields):
                self.model(id=id, **update_fields).save()
This example is for PostgreSQL. You can use the manager without raw SQL, but then the update-or-insert operation will not be atomic; with the table lock you can be sure that two objects will not be created by two competing threads.
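For completeness, a hedged usage sketch; the model body here is illustrative, only the manager wiring matters:

class SomeModel(models.Model):
    some_field = models.IntegerField(default=0)

    objects = MySuperManager()

# Either updates row 1 or creates it, under the exclusive table lock.
SomeModel.objects.create_or_update(1, some_field=100)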
I think if you have critical demands on atomic operations, you are better off designing them at the database level instead of at the Django ORM level.
The Django ORM focuses on convenience rather than on performance and safety, so you sometimes have to optimize the automatically generated SQL.
Transactions in most production databases provide locking and rollback well.
In a mashup (hybrid) system, or if your system includes third-party components such as logging or statistics, applications in different frameworks or even languages may access the database at the same time; making things thread safe inside Django alone is not enough in that case.
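For example, a sketch of a database-level upsert, assuming PostgreSQL 9.5+ and a table named myapp_somemodel (both assumptions):

from django.db import connection

def upsert_some_field(pk, value):
    # The database resolves the insert-vs-update race atomically,
    # independent of the ORM and of any other clients.
    with connection.cursor() as cursor:
        cursor.execute(
            """
            INSERT INTO myapp_somemodel (id, some_field)
            VALUES (%s, %s)
            ON CONFLICT (id) DO UPDATE SET some_field = EXCLUDED.some_field
            """,
            [pk, value],
        )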
With MongoEngine (not the Django ORM), the queryset update() method takes atomic set operators:
SomeModel.objects.filter(id=1).update(set__some_field=100)
I have an application in which I query against a SQL database and end up with a SQLAlchemy object representing a given row. Then, based on user input and a series of if/then statements, I may perform an update on the SQLAlchemy object.
i.e.,
if 'fooinput1' in payload:
    sqla_instance.foo1 = validate_foo1(payload['fooinput1'])
if 'fooinput2' in payload:
    sqla_instance.foo2 = validate_foo2(payload['fooinput2'])
...
I now need to add modified_at and modified_by data to this system. Is it possible to check something on the SQLA instance like sqla_instance.was_modified or sqla_instance.update_pending to determine if a modification was performed?
(I recognize that I could maintain my own was_modified boolean, but since there are many of these if/then clauses that would lead to a lot of boilerplate which I'd like to avoid if possible.)
FWIW: this is a Python 2.7 Pyramid app reading from a MySQL db in the context of a web request.
The Session object the SQLAlchemy ORM provides has two members that can help with what you are trying to do:
1) Session.is_modified()
2) Session.dirty
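A minimal sketch of both, assuming session is the active Session and sqla_instance is the object mutated by the if/then blocks above (current_user is a hypothetical name):

from datetime import datetime

# is_modified() checks a single instance for pending attribute changes.
if session.is_modified(sqla_instance):
    sqla_instance.modified_at = datetime.utcnow()
    sqla_instance.modified_by = current_user  # hypothetical

# Session.dirty holds every modified, not-yet-flushed instance.
for obj in session.dirty:
    print(obj)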
Depending on the number of fields whose modification you want to track, you may achieve what you need using ORM events:

import sqlalchemy as sa
from datetime import datetime

@sa.event.listens_for(Thing.foo, 'set')
def foo_changed(target, value, oldvalue, initiator):
    target.last_modified = datetime.now()

@sa.event.listens_for(Thing.baz, 'set')
def baz_changed(target, value, oldvalue, initiator):
    target.last_modified = datetime.now()
The Session object also provides events which may help you; in particular, have a look at the events around the transient-to-pending transition of objects in the session.
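A hedged sketch, assuming SQLAlchemy 1.1+ (which added the session lifecycle events):

import logging
import sqlalchemy as sa
from sqlalchemy.orm import Session

log = logging.getLogger(__name__)

@sa.event.listens_for(Session, 'transient_to_pending')
def on_transient_to_pending(session, instance):
    # Fires when an object is first added to the session,
    # i.e. the transient-to-pending transition mentioned above.
    log.debug('added to session: %r', instance)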
How can I atomically compare-exchange-save a value of Django Model instance Field? (Using PostgreSQL as the DB backend).
An example use case is making sure multiple posts with similar content (e.g. submits of the same form) take effect only once, without relying on insecure and only sometimes-working client-side javascript or server-side tracking of form UUIDs, which isn't secure against malicious multiple-posts.
For example:
def compare_exchange_save(model_object, field_name, comp, exch):
    # How to implement?
    ...
from django.views.generic.edit import FormView
from django.db import transaction
from my_app.models import LicenseCode

class LicenseCodeFormView(FormView):
    def post(self, request, ...):
        # Get the object matching the code entered in the form
        license_code = LicenseCode.objects.get(...)
        # Safely redeem the code exactly once.
        # No change is made in case of error.
        try:
            with transaction.atomic():
                if compare_exchange_save(license_code, 'was_redeemed', False, True):
                    # Deposit a license for the user with a 3rd-party service.
                    # Raises an exception if it fails.
                    ...
                else:
                    # License code already redeemed; don't deposit another license.
                    pass
        except:
            # Handle the exception
            ...
What you are looking for is the update() function on a QuerySet object.
Depending on the value, you can do the comparison with Case/When objects; check out the docs on conditional updates (note that link is for 1.10; Case/When came in with 1.8).
You might also find utility in F objects, which are used to reference the value already stored in a field.
For example:
I need to update a value in my model Model:
from django.db import models
from django.db.models import Case, When, Value

(Model.objects
    .filter(id=my_id)
    .update(field_to_be_updated=Case(
        When(my_field=True, then=Value(get_new_license_string())),
        default=Value(''),
        output_field=models.CharField())))
If you need to use an F object, just reference it on the right-hand side of the equals sign in the update expression.
The update itself doesn't necessitate the transaction.atomic() context manager, but if you need to do any other database operations along with it, you should still wrap that code with transaction.atomic().
Edit:
You may also like to use the queryset's select_for_update() method, which takes row-level locks when the queryset is executed (see the docs).
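For the compare_exchange_save stub from the question, a minimal sketch along these lines (one possible implementation, not the only one):

def compare_exchange_save(model_object, field_name, comp, exch):
    # A single UPDATE ... WHERE both compares and exchanges atomically;
    # the returned row count tells us whether this call won the race.
    updated = type(model_object).objects.filter(
        pk=model_object.pk, **{field_name: comp}
    ).update(**{field_name: exch})
    return updated == 1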
I have an accounts model in my Django project which stores the account balance (available money) of all users. Almost every deduction from a user's account is preceded by an amount check, i.e. checking whether the user has x amount of money or more. If yes, go ahead and deduct the amount:
account = AccountDetails.objects.get(user=userid)
if int(account.amount) >= fare:
    account.amount = account.amount - fare
    account.save()
Now I want to take a lock in the first .get() statement so that race conditions are avoided: if a user makes a request twice and the application executes the above code twice simultaneously, one of the requests will override the other.
I found out that select_for_update() does exactly what I want. It locks the row until the end of the transaction.
account = AccountDetails.objects.select_for_update().get(user=userid)
But it's only available in Django 1.4 or higher, and I'm still using Django 1.3; moving to a new version can't be done right now. Any ideas how I can achieve this in my present Django version?
Looks like you'll have to use raw SQL. I had a look through the current code and I think it would be more hassle to try and backport this yourself than it would be to just write the SQL.
account = AccountDetails.objects.raw(
    "SELECT * FROM yourapp_accountdetails WHERE user_id = %s FOR UPDATE",
    [user.id]
)
For convenience and to keep your code DRY you can add this as a method to your AccountDetails model or something.
class AccountDetails(models.Model):
    @classmethod
    def get_locked_for_update(cls, user):
        return cls.objects.raw(
            "SELECT * FROM yourapp_accountdetails WHERE user_id = %s FOR UPDATE",
            [user.id]
        )
yourapp is the name of your application that you would have given when you ran startapp. I'm assuming you have a foreign key relationship on your AccountDetails to a user model of some kind.
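A hedged usage sketch under Django 1.3's transaction API (commit_on_success also works as a context manager there), reusing the fare check from the question:

from django.db import transaction

with transaction.commit_on_success():
    # raw() returns a RawQuerySet; take the single matching row.
    account = AccountDetails.get_locked_for_update(user)[0]
    if int(account.amount) >= fare:
        account.amount = account.amount - fare
        account.save()  # the lock is released when the transaction commits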
The current implementation of select_for_update on Django 1.5 looks like this:
def select_for_update(self, **kwargs):
    """
    Returns a new QuerySet instance that will select objects with a
    FOR UPDATE lock.
    """
    # Default to False for nowait
    nowait = kwargs.pop('nowait', False)
    obj = self._clone()
    obj.query.select_for_update = True
    obj.query.select_for_update_nowait = nowait
    return obj
So that's pretty simple code, just setting a few flags on the query object. However, those flags won't mean anything to Django 1.3 when the query is executed, so at some point you'll need to write raw SQL. At the moment you only need that query on your AccountDetails model, so just put it there for now. Maybe later you'll need it on another model; that's when you'll have to decide how to share the code between models.
I first tried to override the delete() method, but that doesn't work for QuerySet's bulk delete method. It should be possible with the pre_delete signal, but I can't figure it out. My code is as follows:
def _pre_delete_problem(sender, instance, **kwargs):
    instance.context.delete()
    instance.stat.delete()
But this handler seems to be called endlessly, and the program runs into an infinite loop.
Can someone please help me?
If the class has foreign keys (or related objects), they are deleted by default, like a DELETE CASCADE in SQL.
You can change the behavior using the on_delete argument when defining the ForeignKey in the class, but by default it is CASCADE.
You can check the docs here.
The pre_delete signal does fire, but a bulk delete doesn't call each object's delete() method, since it isn't deleting on an object-by-object basis.
In your case, using the post_delete signal instead of pre_delete should fix the infinite loop. Because a ForeignKey's on_delete defaults to CASCADE, the pre_delete logic above makes deleting instance.context cascade back to instance, whose pre_delete handler then tries to delete instance.context again, and so on.
Using this approach:
from django.db.models.signals import post_delete

def _post_delete_problem(sender, instance, **kwargs):
    instance.context.delete()
    instance.stat.delete()

post_delete.connect(_post_delete_problem, sender=Foo)
Can do the cleanup you want.
If you'd like a quick one-off to delete an instance and all of its related objects and those related objects' objects and so on without having to change the DB schema, you can do this -
def recursive_delete(to_del):
    """Recursively delete an object, all of its protected related
    instances, those instances' protected instances, and so on.
    """
    from django.db.models import ProtectedError
    while True:
        try:
            to_del_pk = to_del.pk
            if to_del_pk is None:
                return  # unsaved object, nothing to delete
            to_del.delete()
            print(f"Deleted {to_del.__class__.__name__} with pk {to_del_pk}: {to_del}")
        except ProtectedError as e:
            # Delete the objects that block this one, then retry on the
            # next pass of the loop.
            for protected_ob in e.protected_objects:
                recursive_delete(protected_ob)
Be careful, though!
I'd only use this to help with debugging in one-off scripts (or on the shell) with test databases that I don't mind wiping. Relationships aren't always obvious and if something is protected, it's probably for a reason.
In my Django app I very often need to do something similar to get_or_create(). E.g., a user submits a tag, and I need to check whether that tag is already in the database. If not, create a new record for it; if it is, just update the existing record.
But looking into the doc for get_or_create() it looks like it's not threadsafe. Thread A checks and finds Record X does not exist. Then Thread B checks and finds that Record X does not exist. Now both Thread A and Thread B will create a new Record X.
This must be a very common situation. How do I handle it in a threadsafe way?
Since 2013 or so, get_or_create has been atomic, so it handles concurrency nicely:
This method is atomic assuming correct usage, correct database
configuration, and correct behavior of the underlying database.
However, if uniqueness is not enforced at the database level for the
kwargs used in a get_or_create call (see unique or unique_together),
this method is prone to a race-condition which can result in multiple
rows with the same parameters being inserted simultaneously.
If you are using MySQL, be sure to use the READ COMMITTED isolation
level rather than REPEATABLE READ (the default), otherwise you may see
cases where get_or_create will raise an IntegrityError but the object
won’t appear in a subsequent get() call.
From: https://docs.djangoproject.com/en/dev/ref/models/querysets/#get-or-create
Here's an example of how you could do it:
Define a model with either a unique=True field:

class MyModel(models.Model):
    slug = models.SlugField(max_length=255, unique=True)
    name = models.CharField(max_length=255)

MyModel.objects.get_or_create(slug=<user_slug_here>, defaults={"name": <user_name_here>})
... or by using unique_together:

class MyModel(models.Model):
    prefix = models.CharField(max_length=3)
    slug = models.SlugField(max_length=255)
    name = models.CharField(max_length=255)

    class Meta:
        unique_together = ("prefix", "slug")

MyModel.objects.get_or_create(prefix=<user_prefix_here>, slug=<user_slug_here>, defaults={"name": <user_name_here>})
Note how the non-unique fields are in the defaults dict, NOT among the unique fields in get_or_create. This will ensure your creates are atomic.
Here's how it's implemented in Django: https://github.com/django/django/blob/fd60e6c8878986a102f0125d9cdf61c717605cf1/django/db/models/query.py#L466 - Try creating an object, catch an eventual IntegrityError, and return the copy in that case. In other words: handle atomicity in the database.
This must be a very common situation. How do I handle it in a threadsafe way?
Yes.
The "standard" solution in SQL is to simply attempt to create the record. If it works, that's good. Keep going.
If an attempt to create a record gets a "duplicate" exception from the RDBMS, then do a SELECT and keep going.
Django, however, has an ORM layer with its own cache, so the logic is inverted: the common case (the get) works directly and quickly, and the uncommon case (the duplicate create) raises a rare exception.
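A minimal sketch of that pattern (Tag and its unique name field are illustrative):

from django.db import IntegrityError

def get_or_create_tag(name):
    # Insert first; fall back to a SELECT if the row already exists.
    try:
        return Tag.objects.create(name=name)
    except IntegrityError:
        # Another thread or process inserted the same tag concurrently.
        return Tag.objects.get(name=name)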
Try the transaction.commit_on_success decorator on the callable in which you call get_or_create(**kwargs):
"Use the commit_on_success decorator to use a single transaction for all the work done in a function. If the function returns successfully, then Django will commit all work done within the function at that point. If the function raises an exception, though, Django will roll back the transaction."
Apart from that, in concurrent calls to get_or_create, both threads try to get the object with the arguments passed (except for the "defaults" argument, which is a dict used during the create call if get() fails to retrieve any object). If both gets fail, both threads try to create the object, resulting in duplicates unless a unique/unique_together constraint is enforced at the database level on the field(s) used in the get() call.
It is similar to this post:
How do I deal with this race condition in django?
So many years have passed, but nobody has written about threading.Lock. If you don't have the option of adding a migration to make the fields unique together, for legacy reasons, you can use Lock or threading.Semaphore objects. Here is the pseudocode:
from concurrent.futures import ThreadPoolExecutor
from threading import Lock

_lock = Lock()

def get_staff(data: dict):
    _lock.acquire()
    try:
        staff, created = MyModel.objects.get_or_create(**data)
        return staff
    finally:
        _lock.release()

with ThreadPoolExecutor(max_workers=50) as pool:
    pool.map(get_staff, get_list_of_some_data())