Django Rest Framework Viewsets / Database Locks - python

Hi there, so I am using Django Rest Framework 3.1 and I was wondering if it's possible to "protect" my viewsets / database against writes on a per-user basis?
In other words, if one user is saving something, another user cannot save, and either waits until the first user finishes or gets some kind of error back.
I tried looking for an answer to this but couldn't find one.
Is this behavior already implemented? If not, how can I achieve it in practice?
UPDATE, after some more thinking:
This is still just a theory and needs more thought, but if we use a queue (Redis or RabbitMQ) we can put all synchronization write requests in the queue instead of processing them right away, and in conjunction with a per-user lock variable (maybe in the user sessions DB table) we can check whether any users ahead of us belong to the same proponent and whether they have finished writing their updates (using the lock).
cheers

Database transactions will provide some of the safety you're looking for, I think. If a number of database operations are wrapped in a transaction, they are applied to the database together, so a sequence of operations cannot fail mid-way through and leave the database in an invalid state.
Other users will see the results of the operations as if they were applied all at once, or not at all (in the case of an error).
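If you need stronger per-user serialization on top of that, one rough sketch is to take a row lock keyed on the requesting user inside the viewset's write hook. This assumes a hypothetical Document model with an owner foreign key; the model and serializer names are illustrative, not DRF built-ins:
from django.db import transaction
from rest_framework import viewsets

class DocumentViewSet(viewsets.ModelViewSet):
    queryset = Document.objects.all()         # hypothetical model
    serializer_class = DocumentSerializer     # hypothetical serializer

    def perform_update(self, serializer):
        with transaction.atomic():
            # Evaluating the locked queryset blocks a second writer for
            # the same user until this transaction commits.
            list(Document.objects
                 .select_for_update()
                 .filter(owner=self.request.user))
            serializer.save()
With select_for_update(nowait=True) the second request would instead fail immediately with a DatabaseError, which maps onto the "returns some kind of error" variant you described.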

Related

Should business logic be enforced in Django (Python) or in SQL?

I am writing a REST backend for a project. Here's a basic example of the kind of problem I'm trying to solve.
There are students who have tests with grades, and each student has a column current_average_grade.
Every time a test is stored, this average grade should be updated (using all existing tests).
So the question is: should this be calculated and stored within the POST view of Django (get all grades from the DB and then do the calculation), or with an SQL trigger, using Django only to convert JSON to SQL?
The advantage of using SQL for this is, of course, that it should theoretically be much faster, and you also get concurrency for free.
The disadvantage is that since I am now programming SQL, I have yet another codebase to manage, and it might even create problems with Django.
So what's the ideal solution here? How do I enforce business logic in an elegant way?
I think handling it in Django views is the better idea. That way you can control the business logic more directly, and you don't need to manage the database extensively.
And for handling concurrency, Django provides a beautiful tool in the form of select_for_update().
Handling Concurrency in Django
To acquire a lock on a resource, we use a database lock.
In Django, select_for_update() takes that database lock for us.
Example Code
from django.db import transaction

with transaction.atomic():
    # select_for_update() must be evaluated inside a transaction.
    entries = Entry.objects.select_for_update().filter(author=request.user)
    for entry in entries:
        ...  # act on each entry while its row is locked
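Applied to the grades question above, a minimal sketch could look like this (the Student and Test models are assumed from the question's description):
from django.db import transaction
from django.db.models import Avg

def store_test(student_id, grade):
    with transaction.atomic():
        # Lock the student row so concurrent test saves serialize here.
        student = Student.objects.select_for_update().get(pk=student_id)
        Test.objects.create(student=student, grade=grade)
        # Recompute the average from all stored tests.
        avg = Test.objects.filter(student=student).aggregate(a=Avg("grade"))["a"]
        student.current_average_grade = avg
        student.save(update_fields=["current_average_grade"])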

Updating model on save vs count objects in the ListView - what's better performance-wise?

I'm creating a forum script. Currently I'm trying to optimize things and looking for answer from more experienced developers.
For example, let's assume we are working on a ListView of Category, which should list all threads within the same forum category. For every thread in the category we list fields such as:
Thread name
Thread author
Number of posts
Number of views
Last post details (author, date)
What is the best performance approach to calculating the number of posts? Currently I'm considering three solutions:
Use annotate() on the queryset.
Add an IntegerField posts_number to the Thread model; increment the value on save() in the Post model and decrement it on delete().
Use memcache to cache read-only SQL queries and force a cache refresh on every save() in the Post model.
I'm aware this is not an issue for low-traffic forums, but I would love to know the best approach.
I generally handle the post count on the Thread model itself, not as an extra model field but as a method or property, cache the evaluated value once, and invalidate the cache for that thread only when there is a new post on it (see the sketch after this list). This way:
not all cached counts are invalidated when there is a new post on another thread;
I can access the post count from all over the application without a database hit;
you don't need to query for the post count each time, only when there is a change to it (deletion and insertion).
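A minimal sketch of that per-thread cached property (the key format and method names are illustrative, and a Post model with a foreign key to Thread is assumed):
from django.core.cache import cache
from django.db import models

class Thread(models.Model):
    name = models.CharField(max_length=255)

    @property
    def posts_count(self):
        # One cache key per thread, so invalidation stays local.
        key = "thread:%d:posts_count" % self.pk
        count = cache.get(key)
        if count is None:
            count = self.post_set.count()
            cache.set(key, count)
        return count

    def invalidate_posts_count(self):
        # Call this wherever a post is actually created or deleted.
        cache.delete("thread:%d:posts_count" % self.pk)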
And for your solutions:
Annotation is faster than counting per thread in a for loop, but then you have to count every time, even when there has been no new Post.
An integer field on the Thread model is prone to data inconsistency, especially in the long run (for example, in the admin when two users access it at once, or when a new post is created while you are working in the admin). You might end up writing thread-safe code with locks, or extra boilerplate to make it read-only (e.g. guarding against users writing to it via an 'all' serializer, etc.). If you go this route anyway, push the arithmetic into the database, as in the F() sketch after this list.
As for your memcache solution, I think it works best when the entries are not bound together (a new post on thread A shouldn't make you recalculate the count for all threads).
Also, it's not good practice to update the cache in model.save(), since that is called all the time (e.g. when editing a post). It's better to invalidate, and to do it where you actually create or delete a post (in the admin with a custom form, in your view, in serializer.perform_create, or in signals; but watch out for soft deletes, etc.).
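A hedged sketch of the thread-safe counter variant, if you do choose the integer-field route (field name taken from the question):
from django.db.models import F

# The increment happens in a single UPDATE statement inside the
# database, so concurrent writers cannot lose each other's updates.
Thread.objects.filter(pk=thread_id).update(posts_number=F("posts_number") + 1)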
Update:
Since your question is about performance, you should take a look at the Django ORM optimization docs, most importantly select_related and prefetch_related.
Also, if you don't need the Python objects after getting them from the database and just need their values, don't convert them to Python objects (use values() or values_list()).
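For example, with the model and field names assumed above:
# Fetch threads with their authors and posts up front instead of N+1 queries.
threads = Thread.objects.select_related("author").prefetch_related("post_set")

# When only the raw values matter, skip model instantiation entirely.
names = Thread.objects.values_list("name", flat=True)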

How to instruct SQLAlchemy ORM to execute multiple queries in parallel when loading relationships?

I am using SQLAlchemy's ORM. I have a model that has multiple many-to-many relationships:
User
User <--MxN--> Organization
User <--MxN--> School
User <--MxN--> Credentials
I am implementing these using association tables, so there are also User_to_Organization, User_to_School and User_to_Credentials tables that I don't directly use.
Now, when I attempt to load a single User (using its PK identifier) and its relationships (and related models) using joined eager loading, I get horrible performance (15+ seconds). I assume this is due to this issue:
When multiple levels of depth are used with joined or subquery loading, loading collections-within-collections will multiply the total number of rows fetched in a cartesian fashion. Both forms of eager loading always join from the original parent class.
If I introduce another level or two to the hierarchy:
Organization <--1xN--> Project
School <--1xN--> Course
Project <--MxN--> Credentials
Course <--MxN--> Credentials
The query takes 50+ seconds to complete, even though the total amount of records in each table is fairly small.
Using lazy loading, I am required to manually load each relationship, and there are multiple round trips to the server.
e.g.
Operations, executed serially as queries:
Get user
Get user's Organizations
Get user's Schools
Get user's credentials
For each Organization, get its Projects
For each School, get its Courses
For each Project, get its Credentials
For each Course, get its Credentials
Still, it all finishes in less than 200ms.
I was wondering if there is any way to indeed use lazy loading, but perform the relationship-loading queries in parallel, for example using concurrent.futures, asyncio, or gevent.
e.g.
Step 1 (in parallel):
Get user
Get user's Organizations
Get user's Schools
Get user's credentials
Step 2 (in parallel):
For each Organization, get its Projects
For each School, get its Courses
Step 3 (in parallel):
For each Project, get its Credentials
For each Course, get its Credentials
Actually, at this point, a subquery-style load could also work, that is, returning Organization and OrganizationID/Project/Credentials in two separate queries:
e.g.
Step 1 (in parallel):
Get user
Get user's Organizations
Get user's Schools
Get user's credentials
Step 2 (in parallel):
Get Organizations
Get Schools
Get the Organizations' Projects, join with Credentials
Get the Schools' Courses, join with Credentials
The first thing you're going to want to do is check which queries are actually being executed on the DB. I wouldn't assume that SQLAlchemy is doing what you expect unless you're very familiar with it. You can use echo=True on your engine configuration, or look at the DB logs (not sure how to do that with MySQL).
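For instance (the connection URL is a placeholder):
from sqlalchemy import create_engine

# echo=True logs every emitted SQL statement, which makes it easy to
# spot unexpected query shapes or N+1 patterns.
engine = create_engine("mysql://user:password@localhost/mydb", echo=True)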
You've mentioned that you're using different loading strategies, so I guess you've read through the docs on that (
http://docs.sqlalchemy.org/en/latest/orm/loading_relationships.html). For what you're doing, I'd probably recommend subquery load, but it totally depends on the number of rows/columns you're dealing with. In my experience it's a good general starting point, though.
One thing to note: you might need to do something like this (with the relevant import):
from sqlalchemy.orm import subqueryload

db.query(Thing).options(subqueryload('A').subqueryload('B')).filter(Thing.id == x).first()
using filter(...).first() rather than get(), as the latter won't re-execute queries according to your loading strategy if the primary object is already in the identity map.
Finally, I don't know your data - but those numbers sound pretty abysmal for anything short of a huge data set. Check that you have the correct indexes specified on all your tables.
You may have already been through all of this, but based on the information you've provided, it sounds like you need to do more work to narrow down your issue. Is it the db schema, or is it the queries SQLA is executing?
Either way, I'd say, "no" to running multiple queries on different connections. Any attempt to do that could result in inconsistent data coming back to your app, and if you think you've got issues now..... :-)
MySQL has no parallelism within a single connection. For the ORM to do that, it would need multiple connections to MySQL, and generally the overhead of trying is "not worth it".
Getting a user, his Organizations, Schools, etc., can all be done (in MySQL) via a single query:
SELECT user, organization, ...
FROM Users
JOIN Organizations ON ...
etc.
This is significantly more efficient than
SELECT user FROM ...;
SELECT organization ... WHERE user = ...;
etc.
(This is not "parallelism".)
Or maybe your "steps" are not quite 'right'?...
SELECT user, organization, project
FROM Users
JOIN Organizations ...
JOIN Projects ...
That gets, in a single step, all users, together with all their organizations and projects.
But is a "user" associated with a "project"? If not, then this is the wrong approach.
If the ORM is not providing a mechanism to generate queries like those, then it is "getting in the way".
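For reference, SQLAlchemy can produce that single-query JOIN shape via joined eager loading; a sketch, assuming relationship attributes named after the models above:
from sqlalchemy.orm import joinedload

user = (
    session.query(User)
    .options(joinedload(User.organizations)
             .joinedload(Organization.projects))
    .filter(User.id == user_id)
    .first()
)
Whether this beats several simple queries depends on how badly the JOIN multiplies rows, which is exactly the cartesian issue quoted in the question.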

Checking username availability - Handling of AJAX requests (Google App Engine)

I want to add a 'check username availability' feature to my signup page using AJAX. I have a few doubts about how to implement it.
With which event should I register my AJAX requests? We can send the requests when the user focuses out of the 'username' input field (the blur event) or as he types (the keyup event). Which provides the better user experience?
On the server side, a simple approach would be to query my main 'Accounts' database. But this could lead to a lot of requests hitting the database (even more if we POST on keyup). Should I maintain a separate model for registered usernames only and use that to get better results?
Is it possible to use Memcache in this case? For example, initializing the cache with every username as a key, updating it as users register, and using a sentinel key to check whether the cache has actually been initialized or whether queries should go straight to the DB.
Answers -
Do the check on blur. If you do it on keyup, you will hammer your server with unnecessary queries, annoy the user who is not yet done typing, and likely lag the typing anyway.
If your Account entity is very large, you may want to create a separate AccountName entity, and create such an entity whenever you create a real Account (but this is probably an unnecessary optimization). When you create the Account (or AccountName), be sure to assign id=name. Then you can do an AccountName.get_by_id(name) to quickly see if the name has already been taken, and it will automatically be pulled from memcache if it has been recently dealt with.
By default, GAE NDB automatically populates memcache for you when you put or get entities. If you follow the advice in step 2, things will be very fast and you won't have to mess around with pre-populating memcache.
If you are concerned about two people simultaneously requesting the same username, put your create method in a transaction:
@classmethod
@ndb.transactional()
def create_account(cls, name, other_params):
    acct = Account.get_by_id(name)
    if not acct:
        acct = Account(id=name, **other_params)  # assign the other params here
        acct.put()
    return acct
I would recommend the blur event of the username field, combined with some sort of inline error/warning display.
I would also suggest maintaining a memcache of registered usernames, to reduce DB hits and improve the user experience; probably not populated with a warm-up, but only as requests are made. This is sometimes called a "Repository" pattern.
BUT, you can only populate the cache with USED usernames; you should not store the "available" usernames here (or if you do, use a much lower timeout).
You should always check directly against the DB/Datastore when actually performing the registration, and ideally in some sort of transactional method so that you don't have race conditions when multiple people register.
BUT, all of this work depends on several things, including how busy your app is and what data storage tech you are using!
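A rough illustration of that lazily populated cache of used names (the Account model matches the earlier answer; the helper name is hypothetical, and only taken names are cached, per the advice above):
from google.appengine.api import memcache

def is_username_taken(name):
    key = "username-taken:" + name
    if memcache.get(key):
        return True                            # known-taken, served from cache
    if Account.get_by_id(name) is not None:    # Datastore is authoritative
        memcache.set(key, True)                # cache only USED names
        return True
    return False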

Would using transactions in a Celery task in Django application cause problems?

I have a set of Celery tasks that I've written. Each task takes, just as an example, an author id as a parameter, and for each of the author's books it fetches the latest price and stores it in the database.
I'd like to add transactions to my tasks by applying Django's
@transaction.commit_on_success decorator to them. If anything in a task crashes, I'd like the whole task to fail and nothing to be saved to the database.
I have a dozen or so Celery workers that check book prices for an author, and I'm wondering if this simple transactional logic would cause locking and race conditions in my Postgres database.
I've dug around and found a project called django-celery-transactions, but I still haven't understood the real issue behind it and what the project tries to solve.
The reasoning is that, with the decorator applied, the DB transaction in your Django view is not committed until the view has exited. Inside the view, before it returns and triggers the commit, you may invoke tasks that expect the DB transaction to already be committed, i.e. that expect those rows to already exist in the DB.
To guard against this race condition (the task starting before your view, and consequently the transaction, has finished), you can either manage it manually or use the module you mentioned, which handles it automatically for you.
An example where it might fail in your case: you add a new author, and you have a task that fetches prices for all or any of its books. Should the task execute before the commit of the new-author transaction, it will try to fetch an Author with an id that does not yet exist.
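A minimal sketch of the race and the manual guard (the model, task, and view names are illustrative; transaction.on_commit exists from Django 1.9 onward):
from django.db import transaction
from django.http import HttpResponse

def create_author(request):
    with transaction.atomic():
        author = Author.objects.create(name=request.POST["name"])
        # Racy version: the worker may run before this transaction
        # commits, so the task's Author lookup can find nothing:
        #   fetch_prices_for_author.delay(author.id)

        # Safer: enqueue only once the transaction has committed.
        transaction.on_commit(lambda: fetch_prices_for_author.delay(author.id))
    return HttpResponse(status=201)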
It depends on several things including: the transaction isolation level of your database, how frequently you check for price updates, and how often you expect prices to change. If, for example, you were making a very large number of updates per second to stock standard PostgreSQL, you might get different results executing the same select statement multiple times in a transaction.
Databases are optimized to handle concurrency, so I don't think this is going to be a problem for you, especially if you don't open the transaction until after fetching prices (i.e. use a context manager rather than decorating the task). If, for some reason, things get slow in the future, optimize then (fetch prices less frequently, tweak the database configuration, etc.).
As for your other question: django-celery-transactions aims to prevent race conditions between Django and Celery. One example is if you were to pass the primary key of a newly created object to a task: the task may attempt to retrieve the object before the view's transaction has been committed. Boom!
