Django matching query does not exist after object save in Celery task - python

I have the following code:
@task()
def handle_upload(title, temp_file, user_id):
    .
    .
    .
    photo.save()
    # if I insert here "photo2 = Photo.objects.get(pk=photo.pk)" it works, including the view function
    return photo.pk

# view function
def upload_status(request):
    task_id = request.POST['task_id']
    async_result = AsyncResult(task_id)
    photo_id = async_result.get()
    if async_result.successful():
        photo = Photo.objects.get(pk=photo_id)
I use an ajax request to check for the uploaded file, but after the Celery task finishes I get a "Photo matching query does not exist" error. The photo pk does exist and gets returned. If I query the database manually it works. Is this some sort of database lag? How can I fix it?
I'm using Django 1.4 and Celery 3.0

You can confirm whether it is a lag issue by adding a delay to your Django view so that it waits a few seconds after the task has successfully finished. If that resolves the problem, you might want to wrap handle_upload in a transaction so that it blocks until the DB has confirmed the write is complete before returning.
Besides Django, the DB has its own caches. When Django evaluates the queryset, it gets stale data either from its own caches (unlikely unless you were reusing querysets, which I didn't see in the portion of the code you posted) or because the DB is caching results for the same Django connection.
For example, if you were to do the post-processing in a completely new Django request/view after the Celery task has finished, you would probably see the new changes in the DB just fine. However, since your view was blocked while the task was executing (which defeats the purpose of Celery, by the way), internally Django only keeps the snapshot of the DB taken at the time the view was entered. Therefore your get fails, and you confirmed this behavior directly when querying from the Django shell.
You can fix this, as you already did, by either:
invoking transaction management, which will refresh the snapshot
changing your DB endpoint's caching and autocommit policies
having Celery make a callback to Django (a web request) once it is done to finalize processing (which is likely what you want to do anyway, because blocking Django defeats the purpose)
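To make the last option concrete, here is a minimal sketch of a non-blocking status view, assuming the AJAX caller keeps polling it until the task reports success; Photo and the task id come from the question, while the JSON shape and the import path are made up for illustration:
from celery.result import AsyncResult
from django.http import HttpResponse
import json

from myapp.models import Photo  # placeholder import path for the Photo model

def upload_status(request):
    async_result = AsyncResult(request.POST['task_id'])
    if not async_result.successful():
        # still pending (or failed); each poll is a fresh request,
        # so no stale DB snapshot is held across the wait
        return HttpResponse(json.dumps({'status': async_result.status}),
                            content_type='application/json')
    photo = Photo.objects.get(pk=async_result.get())
    return HttpResponse(json.dumps({'status': 'done', 'photo_id': photo.pk}),
                        content_type='application/json')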

Related

Flask-SQLAlchemy with multiple gunicorn workers causes inconsistent reads

I have developed a Flask application, and so far I have only ever deployed it using a single worker. The app connects to an SQLite DB using Flask-SQLAlchemy. At startup, I check whether my DB already has data, and if not, I initialize some data such as a default setup user, like this:
root_user = User.query.filter_by(username='root').one_or_none()
if not root_user:
    new_user = User(username="root", password_hash="SomeLongAndSecurePasswordHash")
    new_user.roles = [serveradmin_role]
    db.session.add(new_user)
When I run this code with multiple gunicorn workers and threads, the workers crash because they all try to create a root user, which violates the UNIQUE constraint in the DB. Apparently they all read the DB at the same time, when the root user does not exist yet, and then they all try to write the user to the DB, which only succeeds for one of the workers.
What would be a good way of preventing this? I feel like my code should deal better with the SQLAlchemy error being thrown, or is there something I am missing here? The same thing might also happen in production: if two people try to create the same user at exactly the same time, how would I deal with it there?
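A minimal sketch of one way to tolerate the race: attempt the insert and fall back to re-reading the row another worker just created when the UNIQUE constraint fires. The names come from the question; the placement of the commit is an assumption.
from sqlalchemy.exc import IntegrityError

root_user = User.query.filter_by(username='root').one_or_none()
if not root_user:
    try:
        new_user = User(username="root", password_hash="SomeLongAndSecurePasswordHash")
        new_user.roles = [serveradmin_role]
        db.session.add(new_user)
        db.session.commit()
    except IntegrityError:
        # another worker created the row first; discard our attempt and reuse theirs
        db.session.rollback()
        root_user = User.query.filter_by(username='root').one()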

Django cache_page - prepopulate/pre-cache

I have one DB query that takes a couple of seconds in production. I also have a DRF ViewSet action that returns the result of this query.
I'm already caching this action using cache_page.
@method_decorator(cache_page(settings.DEFAULT_CACHE_TIMEOUT))
@action(detail=False)
def home(self, request) -> Response:
    articles = Article.objects.home()
    return Response(serializers.ArticleListSerializer(articles, many=True).data,
                    headers={'Access-Control-Allow-Origin': '*'})
The problem is that every time the cache expires (after 15 minutes), at least one user has to wait 15 seconds for the response. I want to pre-cache this every 5 minutes in the background so that no user has to wait.
I use the default caching mechanism.
My idea is to create a management command that will be executed using crontab. Every 5 minutes it will call Article.objects.home() or the ViewSet action and update its value in the cache.
As this is only one entry, I don't hesitate to use database caching.
How would you do that?
EDIT: as the default LocMemCache is per-process, I'll go with database caching. I just don't know how to manually cache the view or the QuerySet.
A cron job or a Celery beat task (if you already use Celery) looks like the best option.
Calling Article.objects.home() by itself would not do much unless you cache inside the home() method of the manager (which could be a valid option and could simplify the automated cache refresh).
To automate the refresh of the view cache, you are better off sending actual requests to the URL from the management command. You will also want to invalidate the cache before sending the request, so that the entry actually gets rebuilt.
Also, keep the cache timeout in mind when planning the job frequency: you don't want to refresh too early or too late.
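A minimal sketch of such a management command, assuming the endpoint is reachable at a SITE_URL setting plus /api/articles/home/ (both are placeholders, not from the post) and that the requests library is available; it clears the single-entry cache and re-warms it by hitting the view:
import requests
from django.conf import settings
from django.core.cache import cache
from django.core.management.base import BaseCommand

class Command(BaseCommand):
    help = "Invalidate and re-warm the cached home response"

    def handle(self, *args, **options):
        # coarse but reliable: cache_page keys are awkward to target individually,
        # and the post says this cache effectively holds a single entry
        cache.clear()
        url = settings.SITE_URL + '/api/articles/home/'
        response = requests.get(url)
        self.stdout.write('Re-warmed %s -> HTTP %s' % (url, response.status_code))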

Pyramid exception logging with SQLAlchemy - commands not committing

I am using the Pyramid web framework with SQLAlchemy, connected to a MySQL backend. The app I've put together works, but I'm trying to add some polish by way of some enhanced logging and exception handling.
I based everything off of the basic SQLAlchemy tutorial on the Pyramid site, using the session like so:
DBSession = scoped_session(sessionmaker(extension=ZopeTransactionExtension()))
Using DBSession to query works great, and if I need to add and commit something to the database I'll do something like
DBSession.add(myobject)
DBSession.flush()
So I get my new ID.
Then I wanted to add logging to the database, so I followed this tutorial. That seemed to work great. I did initially run into some weirdness with things getting committed and I wasn't sure how SQLAlchemy was working so I had changed "transaction.commit()" to "DBSession.flush()" to force the logs to commit (this is addressed below!).
Next I wanted to add custom exception handling with the intent that I could put a friendly error page for anything that wasn't explicitly caught and still log things. So based on this documentation I created error handlers like so:
from pyramid.view import (
    view_config,
    forbidden_view_config,
    notfound_view_config
)
from pyramid.httpexceptions import (
    HTTPFound,
    HTTPNotFound,
    HTTPForbidden,
    HTTPBadRequest,
    HTTPInternalServerError
)
from models import DBSession
import transaction
import logging
log = logging.getLogger(__name__)

#region Custom HTTP Errors and Exceptions

@view_config(context=HTTPNotFound, renderer='HTTPNotFound.mako')
def notfound(request):
    log.exception('404 not found: {0}'.format(str(request.url)))
    request.response.status_int = 404
    return {}

@view_config(context=HTTPInternalServerError, renderer='HTTPInternalServerError.mako')
def internalerror(request):
    log.exception('HTTPInternalServerError: {0}'.format(str(request.url)))
    request.response.status_int = 500
    return {}

@view_config(context=Exception, renderer="HTTPExceptionCaught.mako")
def error_view(exc, request):
    log.exception('HTTPException: {0}'.format(str(request.url)))
    log.exception(exc.message)
    return {}

#endregion
So now my problem is that exceptions are caught and my custom exception view comes up as expected, but the exceptions aren't logged to the database. It appears this is because the DBSession transaction is rolled back on any exception. So I changed the logging handler back to transaction.commit(). This had the effect of actually committing my exception logs to the database, BUT now any DBSession action after any log statement throws an "Instance not bound to a session" error, which makes sense because, from what I understand, after a transaction.commit() the session is cleared out. The console log always shows exactly what I want logged, including the SQL statements that write the log info to the database. But it doesn't commit on exception unless I use transaction.commit(), and if I do that then I kill any DBSession statements after the transaction.commit()!
Sooooo... how might I set things up so that I can log to the database, but also catch and successfully log exceptions to the database too? I feel like I want the logging handler to use some sort of separate database session/connection/instance so that it is self-contained, but I'm unclear on how that might work.
Or should I architect what I want to do completely different?
EDIT:
I did end up going with a separate, log-specific session dedicated only to adding and committing log info to the database. This seemed to work well until I started integrating a Pyramid console script into the mix, at which point I ran into problems with sessions and database commits within the script not necessarily working like they do in the actual Pyramid web application.
In hindsight (and what I'm doing now), instead of logging to a database I use the standard logging module with file handlers (TimedRotatingFileHandler specifically) and log to the file system.
Using transaction.commit() has an unintended side effect: changes to other models get committed too, which is not too cool. The idea behind the "normal" Pyramid session setup with ZopeTransactionExtension is that a single session starts at the beginning of the request; if everything succeeds the session is committed, and if there's an exception everything is rolled back. It is better to keep this logic and avoid committing things manually in the middle of a request.
(as a side note - DBSession.flush() does not commit the transaction, it emits the SQL statements but the transaction can be rolled back later)
For things like exception logs, I would look at setting up a separate Session which is not bound to Pyramid's request/response cycle (i.e., without ZopeTransactionExtension) and then using it to create log records. You'd need to commit the transaction manually after adding a log record:
record = Log("blah")
log_session.add(record)
log_session.commit()
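A minimal sketch of what that separate session might look like, assuming the Log model from the snippet above and a placeholder database URL; note the sessionmaker deliberately has no ZopeTransactionExtension, so committing here does not touch the request-scoped DBSession:
from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker

log_engine = create_engine('mysql://user:password@localhost/myapp')  # placeholder URL
LogSession = scoped_session(sessionmaker(bind=log_engine))

def log_to_db(message):
    session = LogSession()
    try:
        session.add(Log(message))  # Log is the model used in the snippet above
        session.commit()
    except Exception:
        session.rollback()
        raise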

What's the difference between using the amqp backend and the database backend for Celery results?

I don't understand where the actual results are getting stored in either case.
I am using django-celery and sqlite as my database in a test application. I am using RabbitMQ as my broker.
I tried setting CELERY_RESULT_BACKEND = "amqp" and also to "database" with CELERY_RESULT_DBURI="mysqlitedb"
But I don't understand how to interact with the results once they are stored, in either case.
I think I fail to understand the basic concepts surrounding what happens to a result once the worker returns at the end of a task.
When you send a task to Celery, you get an AsyncResult back. It has an id attribute which you can store somewhere and later use to check on and retrieve the actual result of the task execution.
The result storage is either AMQP or a database table. The first is faster and does not add load to the DB, but needs some additional setup.
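A minimal sketch of that round trip, reusing the handle_upload task from the first question; where you persist the id is up to you:
from celery.result import AsyncResult

# when dispatching the work
async_result = handle_upload.delay(title, temp_file, user_id)
task_id = async_result.id  # persist this (session, DB row, hidden form field, ...)

# later, possibly in a different request or process
result = AsyncResult(task_id)
if result.ready():
    photo_pk = result.get()  # pulled from the amqp or database result backend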

Dynamic templatetag

I have my own templatetag:
@register.inclusion_tag('path_to_module.html', takes_context=True)
def getmodule(context, token):
    try:
        return slow_function(params)
    except Exception as e:
        return None
And it is very slow; the template waits for this tag.
Can I call it asynchronously?
If it's cacheable (doesn't need to be unique per page view), then cache it: either using Django's cache API in your templatetag, or with template fragment caching directly in your template. As @jpic says, if it's something that takes a while to recalculate, pass it off to a task queue like Celery.
If you need this function to run every page view for whatever reason; then separate it out in to a new view and load it in to some container in your main template asynchronously using JavaScript.
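For the first suggestion, here is a minimal sketch of caching inside the templatetag with Django's cache API; the key format and the 10-minute timeout are arbitrary, and slow_function is assumed to take the token as its argument:
from django.core.cache import cache

@register.inclusion_tag('path_to_module.html', takes_context=True)
def getmodule(context, token):
    key = 'getmodule:%s' % token
    result = cache.get(key)
    if result is None:
        result = slow_function(token)
        cache.set(key, result, 60 * 10)
    return result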
You can execute python functions in a background process:
django-ztask
celery (django kombu for database transport)
uWSGI spooler (if using uWSGI for deployment)
You could create a background task that renders path_to_module.html and caches the output, as sketched below. Whenever the cache should be invalidated, run slow_function in the background again.
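A minimal sketch of that background task, assuming Celery; render_module and the cache key are made-up names, and slow_function is again assumed to take the token and return the template context:
from celery import task
from django.core.cache import cache
from django.template.loader import render_to_string

@task()
def render_module(token):
    html = render_to_string('path_to_module.html', slow_function(token))
    # cache the rendered fragment; re-run this task before the timeout to keep it fresh
    cache.set('getmodule:%s' % token, html, 60 * 10)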
