Django, how to make a view atomic? - python

I have a simple django app to simulate a stock market, users come in and buy/sell. When they choose to trade,
the market price is read, and
based on the buy/sell order the market price is increased/decreased.
I'm not sure how this works in django, but is there a way to make the view atomic? i.e. I'm concerned that user A's actions may read the price but before it's updated because of his order, user B's action reads the price.
Couldn't find a simple, clean solution for this online. Thanks.

This is database transactions, with some notes. All notes for Postgresql; all databases have locking mechanisms but the details are different.
Many databases don't do this level of locking by default, even if you're in a transaction. You need to get an explicit lock on the data.
In Postgresql, you probably want SELECT ... FOR UPDATE, which will lock the returned rows. You need to use FOR UPDATE on every SELECT that wants to block if another user is about to update them.
Unfortunately, there's no way to do a FOR UPDATE in Django's ORM. You'd eitiher need to hack the ORM a bit or use raw SQL, as far as I know. If this is low-performance code and you can afford to serialize all access to the table, you can use a table-level LOCK IN EXCLUSIVE MODE, which will serialize the whole table.
http://www.postgresql.org/docs/current/static/explicit-locking.html
http://www.postgresql.org/docs/current/static/sql-lock.html
http://www.postgresql.org/docs/current/static/sql-select.html

Pretty old question, but since 1.6 you can use transaction.atomic() as decorator.
views.py
#transaction.atomic()
def stock_action(request):
#trade here

Wrap the DB queries that read and the ones that update in a transaction. The syntax depends on what ORM you are using.

Related

Should buisness logîc be enforced in django python or in sql?

I am writing a rest backend for a project. Heres a basic example of the kind of problems I'm trying to solve.
These students have tests with grades and each student has a column current_average_grade.
Everytime a test is stored, this average_grade_should be updated (using all existing tests).
So the question is, should this be calculated and stored with within the post view of django (get all grades from db and then do the calculation) or with an sql trigger and only use django to convert json to sql.
The advantage of using sql for this, is of course it should theoretically be much faster and you also get concurrency for free.
The disadvantage is that since I am now programming sql, I have yet another codebase to manage and it might even create problems with django.
So whats the ideal solution here? How do I enforce buisness logic in an elegant way?
I think dealing it in Django views will be a better idea. In this way you can control the business logic in a better way and also you don't need to manage the database extensively.
And for handling concurrency Django provides a beautiful way in the form of select_for_update().
Handling Concurrency in Django
To acquire a lock on a resource we use a database lock.
And in Django we use select_for_update() for database lock.
Example Code
from django.db import transaction
entries = Entry.objects.select_for_update().filter(author=request.user)
with transaction.atomic():
for entry in entries:
# action on each entry in thread-safe way

Django Rest Framework Viewsets / Database Locks

Hi there so i am using Django Rest Framework 3.1 and i was wondering if its possible to "protect" my viewsets / database against writes in a per user basis?
In other words if 1 user is saving something the other one cannot save and it either waits till the first user finishes or returns some kind of error.
I tried looking for this answer but couldn't find it.
Is this behavior already implemented? if not how can i achieve this in practice?
UPDATE after some more thinking:
This is just a theory still, it needs more thinking, but if we use a Queue (Redis or Rabbitmq) we can put all synchronization writes requests in the queue instead of processing them right away and in conjunction with some user specific lock variable (maybe in the user sessions db table) we can ask if there are any users in front of us belonging to the same proponent and if those users have finished writing their updates or not (using the lock)
cheers
Database transactions will provide some of the safety you're looking for, I think. If a number of database operations are wrapped in a transaction, they are applied to the database together, so a sequence of operations cannot fail mid-way through and leave the database in an invalid state.
Other users will see the results of the operations as if they were applied all at once, or not at all (in the case of an error).

How to make Django QuerySet bulk delete() more efficient

Setup:
Django 1.1.2, MySQL 5.1
Problem:
Blob.objects.filter(foo = foo) \
.filter(status = Blob.PLEASE_DELETE) \
.delete()
This snippet results in the ORM first generating a SELECT * from xxx_blob where ... query, then doing a DELETE from xxx_blob where id in (BLAH); where BLAH is a ridiculously long list of id's. Since I'm deleting a large amount of blobs, this makes both me and the DB very unhappy.
Is there a reason for this? I don't see why the ORM can't convert the above snippet into a single DELETE query. Is there a way to optimize this without resorting to raw SQL?
For those who are still looking for an efficient way to bulk delete in django, here's a possible solution:
The reason delete() may be so slow is twofold: 1) Django has to ensure cascade deleting functions properly, thus looking for foreign key references to your models; 2) Django has to handle pre and post-save signals for your models.
If you know your models don't have cascade deleting or signals to be handled, you can accelerate this process by resorting to the private API _raw_delete as follows:
queryset._raw_delete(queryset.db)
More details in here. Please note that Django already tries to make a good handling of these events, though using the raw delete is, in many situations, much more efficient.
Not without writing your own custom SQL or managers or something; they are apparently working on it though.
http://code.djangoproject.com/ticket/9519
Bulk delete is already part of django
Keep in mind that this will, whenever possible, be executed purely in SQL

Aggregating save()s in Django?

I'm using Django with an sqlite backend, and write performance is a problem. I may graduate to a "proper" db at some stage, but for the moment I'm stuck with sqlite. I think that my write performance problems are probably related to the fact that I'm creating a large number of rows, and presumably each time I save() one it's locking, unlocking and syncing the DB on disk.
How can I aggregate a large number of save() calls into a single database operation?
EDITED: commit_on_success is deprecated and was removed in Django 1.8. Use transaction.atomic instead. See Fraser Harris's answer.
Actually this is easier to do than you think. You can use transactions in Django. These batch database operations (specifically save, insert and delete) into one operation. I've found the easiest one to use is commit_on_success. Essentially you wrap your database save operations into a function and then use the commit_on_success decorator.
from django.db.transaction import commit_on_success
#commit_on_success
def lot_of_saves(queryset):
for item in queryset:
modify_item(item)
item.save()
This will have a huge speed increase. You'll also get the benefit of having roll-backs if any of the items fail. If you have millions of save operations then you may have to commit them in blocks using the commit_manually and transaction.commit() but I've rarely needed that.
New as of Django 1.6 is atomic, a simple API to control DB transactions. Copied verbatim from the docs:
atomic is usable both as a decorator:
from django.db import transaction
#transaction.atomic
def viewfunc(request):
# This code executes inside a transaction.
do_stuff()
and as a context manager:
from django.db import transaction
def viewfunc(request):
# This code executes in autocommit mode (Django's default).
do_stuff()
with transaction.atomic():
# This code executes inside a transaction.
do_more_stuff()
Legacy django.db.transaction functions autocommit(), commit_on_success(), and commit_manually() have been deprecated and will be remove in Django 1.8.
I think this is the method you are looking for: https://docs.djangoproject.com/en/dev/ref/models/querysets/#bulk-create
Code copied from the docs:
Entry.objects.bulk_create([
Entry(headline='This is a test'),
Entry(headline='This is only a test'),
])
Which in practice, would look like:
my_entries = list()
for i in range(100):
my_entries.append(Entry(headline='Headline #'+str(i))
Entry.objects.bulk_create(my_entries)
According to the docs, this executes a single query, regardless of the size of the list (maximum 999 items on SQLite3), which can't be said for the atomic decorator.
There is an important distinction to make. It sounds like, from the OP's question, that he is attempted to bulk create rather than bulk save. The atomic decorator is the fastest solution for saving, but not for creating.
"How can I aggregate a large number of save() calls into a single database operation?"
You don't need to. Django already manages a cache for you. You can't improve it's DB caching by trying to fuss around with saves.
"write performance problems are probably related to the fact that I'm creating a large number of rows"
Correct.
SQLite is pretty slow. That's the way it is. Queries are faster than most other DB's. Writes are pretty slow.
Consider more serious architecture change. Are you loading rows during a web transaction (i.e., bulk uploading files and loading the DB from those files)?
If you're doing bulk loading inside a web transaction, stop. You need to do something smarter. Use celery or use some other "batch" facility to do your loads in the background.
We try to limit ourself to file validation in a web transaction and do the loads when the user's not waiting for their page of HTML.

Does Python Django support custom SQL and denormalized databases with no Foreign Key relationships?

I've just started learning Python Django and have a lot of experience building high traffic websites using PHP and MySQL. What worries me so far is Python's overly optimistic approach that you will never need to write custom SQL and that it automatically creates all these Foreign Key relationships in your database. The one thing I've learned in the last few years of building Chess.com is that its impossible to NOT write custom SQL when you're dealing with something like MySQL that frequently needs to be told what indexes it should use (or avoid), and that Foreign Keys are a death sentence. Percona's strongest recommendation was for us to remove all FKs for optimal performance.
Is there a way in Django to do this in the models file? create relationships without creating actual DB FKs? Or is there a way to start at the database level, design/create my database, and then have Django reverse engineer the models file?
If you don't want foreign keys, then avoid using
models.ForeignKey(),
models.ManyToManyField(), and
models.OneToOneField().
Django will automatically create an auto-increment int field named id that you can use to refer to individual records, or you can override that by marking a field as primary_key=True.
There is also documentation on running raw SQL queries on the database.
Raw SQL is as easy as this :
for obj in MyModel.objects.raw('SELECT * FROM myapp_mymodel'):
print obj
Denormalizing a database is up to you at model definition time.
You can use non-relational databases (MongoDB, ...) too with Django NonRel
django-admin inspectdb allows you to reverse engineer a models file from existing tables. That is only a very partial response to your question ;)
You can just create the model.py and avoid having SQL Alchemy automatically create the tables leaving it up to you to define the actual tables as you please. So although there are foreign key relationships in the model.py this does not mean that they must exist in the actual tables. This is a very good thing considering how ludicrously foreign key constraints are implemented in MySQL - MyISAM just ignores them and InnoDB creates a non-optional index on every single one regardless of whether it makes sense.
I concur with the 'no foreign keys' advice (with the disclaimer: I also work for Percona).
The reason why it is is recommended is for concurrency / reducing locking internally.
It can be a difficult "optimization" to sell, but if you consider that the database has transactions (and is more or less ACID compliant) then it should only be application-logic errors that cause foreign-key violations. Not to say they don't exist, but if you enable foreign keys in development hopefully you should find at least a few bugs.
In terms of whether or not you need to write custom SQL:
The explanation I usually give is that "optimization rarely decreases complexity". I think it is okay to stick with an ORM by default, but if in a profiler it looks like one particular piece of functionality is taking a lot more time than you suspect it would when written by hand, then you need to be prepared to fix it (assuming the code is called often enough).
The real secret here is that you need good instrumentation / profiling in order to be frugal with your complexity adding optimization(s).

Categories

Resources