I have a Django 1.11 + MySQL + Celery 4.1 project where a view creates a new user record, and then kicks off a Celery task to perform additional long-running actions in relation to it.
The typical problem in this case is ensuring that the user creation is committed to the database before the Celery task executes. Otherwise there's a race condition: if the task executes before the transaction commits, it may try to access a record that doesn't exist yet.
The way I had learned to fix this was to always wrap the record creation in a manual transaction or atomic block, and then trigger the Celery task after that, e.g.:
from django.db import transaction

def create_user():
    with transaction.atomic():
        user = User.objects.create(username='blah')
    mytask.apply_async(args=[user.id])

@task
def mytask(user_id):
    user = User.objects.get(id=user_id)
    do_stuff(user)
However, I still occasionally see the error DoesNotExist: User matching query does not exist in my Celery worker logs, implying my task is sometimes executing before the user record gets committed.
Is this not the correct strategy or am I not implementing it correctly?
I believe a post_save signal would be more appropriate for what you're trying to do: https://docs.djangoproject.com/en/1.11/ref/signals/#post-save. This signal sends a created argument as a boolean, making it easy to operate only on object creation.
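A minimal sketch of that approach (assuming mytask from the question is importable from a tasks module). Note that post_save itself still fires inside any open atomic block, so this pairs it with transaction.on_commit, available in Django 1.11, to keep the enqueue after the commit:

from django.contrib.auth.models import User
from django.db import transaction
from django.db.models.signals import post_save
from django.dispatch import receiver

from .tasks import mytask  # assumed location of the task

@receiver(post_save, sender=User)
def user_created(sender, instance, created, **kwargs):
    # `created` is True only on the initial INSERT, not on updates.
    if created:
        # Defer enqueueing until the surrounding transaction commits,
        # so the worker never sees an uncommitted row.
        transaction.on_commit(lambda: mytask.apply_async(args=[instance.id]))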
Related
I have an API endpoint to register a new user. It enqueues a "welcome email" task to run asynchronously. I have 2 unit tests that check:
The API saves the user's information to the DB correctly
The Celery task sends the email with the right content and template
I want to add a 3rd unit test to ensure "the endpoint enqueues the email-sending task after saving the user to the DB".
I tried celery.AsyncResult, but it requires a running worker. Furthermore, even if the worker is ready, we still can't verify whether the task was enqueued, because of the ambiguous PENDING state:
A task that exists in the queue but hasn't executed yet: PENDING
A task that doesn't exist in the queue at all: PENDING
Has anyone faced this problem? How do I solve it?
A common way to solve this problem in testing environments is to use the task_always_eager configuration setting, which instructs Celery to run the task like a regular function. Instead of an AsyncResult, Celery will return an object of the EagerResult type that behaves the same but has completely different execution logic.
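For example, a sketch assuming your Celery app loads its configuration from Django settings with the CELERY_ namespace; the endpoint URL, payload, status code, and email subject are placeholders for your real ones:

# settings module used for tests
CELERY_TASK_ALWAYS_EAGER = True       # run tasks inline; no worker or broker needed
CELERY_TASK_EAGER_PROPAGATES = True   # re-raise task exceptions in the caller

# tests.py
from django.core import mail
from django.test import TestCase

class RegistrationEmailTest(TestCase):
    def test_endpoint_enqueues_welcome_email(self):
        response = self.client.post('/api/register/', {
            'username': 'alice',
            'email': 'alice@example.com',
            'password': 'secret',
        })
        self.assertEqual(response.status_code, 201)
        # In eager mode the task has already run synchronously, so the
        # welcome email is sitting in Django's in-memory test outbox.
        self.assertEqual(len(mail.outbox), 1)
        self.assertIn('Welcome', mail.outbox[0].subject)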
I am trying to find a bug which happens from time to time on our production server but cannot be reproduced otherwise: some value in the DB gets changed in a way I don't want it to.
I could write a PostgreSQL trigger which fires if this bug happens, and raise an exception from that trigger. Then I would see the Python traceback of the code that executed the unwanted SQL statement.
But in this case I don't want to stop the processing of the request.
Is there a way to log the Python/Django traceback from within a PostgreSQL trigger?
I know that this is not trivial, since the DB code runs in a different Linux process with a different user id.
I am using Python, Django, PostgreSQL, Linux.
I guess this is not easy since the DB trigger runs in a different context than the python interpreter.
Please ask if you need further information.
Update
One solution might be to overwrite connection.notices of psycopg2.
Is there a way to log the Python/Django traceback from within a PostgreSQL trigger?
No, there is not.
The (SQL) query is executed on the DBMS server, and so is the code inside the trigger.
The Python code is executed on the client, which is a different process, possibly run by a different user, and maybe even on a different machine.
The only connection between the server (which detects the condition) and the client (which needs to produce the stack dump) is the connected socket. You could try to extend the server's reply (if there is one) with some status code that the client uses to dump its own stack. This will only work if the trigger is part of the current transaction, not of some unrelated process.
The other way is massive logging: make the DBMS write every submitted SQL statement to its logfile (in PostgreSQL, log_statement = 'all'). This can produce huge amounts of log entries, which you then have to inspect.
Given this setup
(django/python) -[SQL connection]-> (PostgreSQL server)
your intuition that
I guess this is not easy since the DB trigger runs in a different context than the python interpreter.
is correct. At least, we won't be able to do this exactly the way you want; not without a lot of acrobatics.
However, there are options, each with drawbacks:
If you are using django with SQLAlchemy, you can register event listeners (either ORM events or Core events) that detect the bad SQL statement you are hunting and log a traceback (see the sketch after this list).
Write a wrapper around your SQL driver, check for the bad SQL statement you are hunting, and log the traceback every time it's detected.
Give every SQL transaction, or every django request, an ID (could just be some UUID in werkzeug's request-bound storage manager). From here, we gain more options:
Configure the logger to log this request ID everywhere, and log all SQL statements in SQLAlchemy. This lets you correlate Django requests, and specific function invocations, with SQL statements. You can do this with echo= in SQLAlchemy.
Include this request ID in every SQL statement (extra column?), then log this ID in the PostgreSQL trigger with RAISE NOTICE. This lets you correlate client-side activity in django against server-side activity in PostgreSQL.
In the spirit of "Test in Production" espoused by Charity Majors, send every request to a sandbox copy of your Django app that reads/writes a sandboxed copy of your production database. In the sandbox database, raise the exception and log your traceback.
You can take this idea further and create smaller "async" setups. For example, for each request, you can trigger an async duplicate (say, with celery) of the same request that hits a DB configured with your PostgreSQL trigger to fail and log the traceback.
Use RAISE EXCEPTION in the PostgreSQL trigger to rollback the current transaction. In Python, catch that specific exception, log it, then repeat the transaction, changing the data slightly (extra column?) to indicate that this is a retry and the trigger should not fail.
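For instance, the first option could look like this with a Core event listener; the engine URL and the statement fragment being hunted are placeholders:

import logging
import traceback

from sqlalchemy import create_engine, event

logger = logging.getLogger(__name__)
engine = create_engine("postgresql://user:pass@localhost/mydb")  # placeholder DSN

@event.listens_for(engine, "before_cursor_execute")
def log_suspect_sql(conn, cursor, statement, parameters, context, executemany):
    # "UPDATE bad_table" stands in for whatever statement you are hunting.
    if "UPDATE bad_table" in statement:
        logger.warning("Suspect SQL: %s\n%s",
                       statement, "".join(traceback.format_stack()))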
Is there a reason you can't SELECT all row values into Python and then do the detection entirely in Python?
If you're able to detect the condition after the queries execute, then you can log the condition and/or throw an exception.
Then what you need is tooling like Sentry or New Relic.
You could use LISTEN+NOTIFY.
First, have a daemon thread LISTEN on a channel; in the DB trigger, execute a NOTIFY on that channel.
The daemon thread receives the notify event and can dump the stacktrace of the main thread.
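A rough sketch of the listening side with psycopg2 (the channel name and DSN are placeholders; note that a NOTIFY is only delivered once the trigger's transaction commits):

import select
import sys
import threading
import traceback

import psycopg2

def listen_for_trigger(dsn, channel="bug_detected"):
    conn = psycopg2.connect(dsn)
    conn.autocommit = True  # LISTEN must run outside a transaction block
    cur = conn.cursor()
    cur.execute("LISTEN %s;" % channel)
    while True:
        # Block until the connection's socket becomes readable.
        if select.select([conn], [], [], 60) == ([], [], []):
            continue  # timed out; poll again
        conn.poll()
        while conn.notifies:
            conn.notifies.pop(0)
            # Dump the stack of every thread in this process,
            # including the main thread running the Django view.
            for thread_id, frame in sys._current_frames().items():
                print("Thread %s:\n%s"
                      % (thread_id, "".join(traceback.format_stack(frame))))

# Daemon thread in the same process as Django; the trigger runs: NOTIFY bug_detected;
threading.Thread(target=listen_for_trigger,
                 args=("dbname=mydb user=me",), daemon=True).start()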
If you use psycopg2, you can use this:
# Overwriting connection.notices via Django
import logging
import traceback

from django.apps import AppConfig
from django.db.backends.signals import connection_created

logger = logging.getLogger(__name__)

class MyAppConfig(AppConfig):
    def ready(self):
        connection_created.connect(connection_created_check_for_notice_in_connection)

class ConnectionNoticeList(object):
    def append(self, message):
        if 'some_magic_of_db_trigger' not in message:
            return
        logger.warning('%s %s' % (message, ''.join(traceback.format_stack())))

def connection_created_check_for_notice_in_connection(sender, connection, **kwargs):
    # connection.connection is the underlying psycopg2 connection
    connection.connection.notices = ConnectionNoticeList()
I'm initiating celery tasks via after_insert events.
Some of the celery tasks end up updating the DB and therefore need the id of the newly inserted row. This is quite error-prone: if the celery task starts running immediately, SQLAlchemy sometimes will not have finished committing to the DB yet, and celery won't find the row.
What are my other options?
I guess I could gather these celery tasks up somehow and only send them on "after_commit" but it feels unnecessarily complicated.
It wasn't so complicated: subclass Session, provide a list that after_insert appends tasks to, then run through the list in after_commit.
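For reference, a sketch of that idea without subclassing, using the session.info dict as the per-session scratch list (User and mytask stand in for your model and task):

from sqlalchemy import event
from sqlalchemy.orm import Session, object_session

@event.listens_for(User, "after_insert")
def collect_task(mapper, connection, target):
    # At this point the row is flushed (target.id is set) but not yet committed.
    session = object_session(target)
    session.info.setdefault("pending_tasks", []).append(target.id)

@event.listens_for(Session, "after_commit")
def send_tasks(session):
    # Only now is the row durable, so the worker is guaranteed to find it.
    for user_id in session.info.pop("pending_tasks", []):
        mytask.delay(user_id)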
This is what I did, inspired by this answer.
@sqla.event.listens_for(User, "after_insert")
def event_after_insert(_mapper, _connection, target):
    """Callback executed on insert event.

    Launches a task in a separate thread.

    We want to ensure the user is committed to the database, so we
    register a callback to run on the next commit event of the session.

    https://stackoverflow.com/questions/25078815/
    https://stackoverflow.com/questions/37186674/
    """
    @sqla.event.listens_for(db.session, "after_commit", once=True)
    def user_after_commit_after_insert(_session):
        """Callback executed on commit event after an insert."""
        # Ensure the user has not been deleted or rolled back since flush
        if sqla.inspect(target).persistent:
            task.delay(target.id)
I'm using the Django database as the Celery broker instead of RabbitMQ, for concurrency reasons.
But I can't solve the problem of revoking a task before it executes.
I found some answers on this matter, but they don't seem complete, and I couldn't get enough help from them:
first answer
second answer
How can I extend the Celery task table using a model, adding a boolean field (revoked) that gets set when I don't want the task to execute?
Thanks.
Since Celery tracks tasks by an ID, all you really need is to be able to tell which IDs have been canceled. Rather than modifying kombu internals, you can create your own table (or memcached etc) that just tracks canceled IDs, then check whether the ID for the current cancelable task is in it.
This is what the transports that support a remote revoke command do internally:
All worker nodes keeps a memory of revoked task ids, either in-memory
or persistent on disk (see Persistent revokes). (from Celery docs)
When you use the django transport, you are responsible for doing this yourself. In this case it's up to each task to check whether it has been canceled.
So the basic form of your task (logging added in place of an actual operation) becomes:
from celery import shared_task
from celery.exceptions import Ignore
from celery.utils.log import get_task_logger

from .models import task_canceled

logger = get_task_logger(__name__)

@shared_task
def my_task():
    if task_canceled(my_task.request.id):
        raise Ignore()
    logger.info("Doing my stuff")
You can extend & improve this in various ways, such as by creating a base CancelableTask class as in one of the other answers you linked to, but this is the basic form. What you're missing now is the model and the function to check it.
Note that the ID in this case will be a string ID like a5644f08-7d30-43ff-a61e-81c165ad9e19, not an integer. Your model can be as simple as this:
from django.db import models

class CanceledTask(models.Model):
    task_id = models.CharField(max_length=200)

def cancel_task(request_id):
    CanceledTask.objects.create(task_id=request_id)

def task_canceled(request_id):
    return CanceledTask.objects.filter(task_id=request_id).exists()
You can now check the behavior by watching your celery service's debug logs while doing things like:
my_task.delay()
models.cancel_task(my_task.delay().id)
This will be a bit of a combo question, mostly because I'd like to get some more background info.
The main question:
I'm trying to do a transaction that involves an RPC call to another REST service that will update some remote data. For example, say the RPC call tells the remote server that I purchased something. In nonfunctional Python pseudocode it'll be something like:
def txn_purchase():
    a = ModelA.objects.get(blah)
    httpresult = HttpPurchaseRPC(url, a.foo)
    a.receipt = httpresult.get_receipt()  # This raises an error if the request fails
    a.save()

db.run_in_transaction(txn_purchase)
I'm pretty sure that transactions only ensure datastore consistency (so in this case, entity a will be consistent), and it doesn't ensure consistency with the RPC. Is it possible to build something on top of this that ensures consistency with the RPC as well?
To me it looks like I'll have a potential problem case if the RPC succeeds, but the datastore transaction failed to save. How do I get around this?
The hazy concept in my mind is to implement a 2-stage purchase:
Do a prepurchase phase where I create entity A in a transaction and set a prepurchase flag.
Do a purchase phase where I run the purchase transaction and update A if successful. Clear the prepurchase flag.
Have a "fix-it" cron job that runs and scans for stale entities with a pre-purchase flag, and use another RPC to check whether those purchases have actually gone through.
Is this the "best practice" way to do it, or is there something better?
Background questions on transactions:
Do the transaction functions run on the frontend with the rest of the code, or are they somehow magically run on the datastore backend?
If the frontend that a transaction is running on dies in the middle of the transaction (i.e. a timeout), will the transaction be retried anywhere, or does it simply not happen?
Thanks!
You sort of have the right idea here: the way you should do this is to farm out the RPC to a separate deferred task. Tasks that are enqueued within a transaction can have a flag set to ensure they only get enqueued if the transaction succeeds.
There's no magic backend that runs transactions. And they're not retried automatically, again, unless they are part of a task; tasks are retried until they return successfully.
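A sketch of that shape with the deferred library (HttpPurchaseRPC, url, and the entity are the names from the question's pseudocode):

from google.appengine.ext import db, deferred

def do_purchase(a_key):
    # Runs in a task-queue task; it is retried until it returns cleanly,
    # so it should be idempotent with respect to the remote service.
    httpresult = HttpPurchaseRPC(url, db.get(a_key).foo)
    def txn():
        a = db.get(a_key)
        a.receipt = httpresult.get_receipt()
        a.put()
    db.run_in_transaction(txn)

def start_purchase(a_key):
    def txn():
        a = db.get(a_key)
        a.prepurchase = True
        a.put()
        # With _transactional=True the task is enqueued only if this
        # datastore transaction commits.
        deferred.defer(do_purchase, a_key, _transactional=True)
    db.run_in_transaction(txn)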