How to make sure a Celery task was enqueued in Pytest? - python

I have an API endpoint that registers a new user. Sending the "welcome email" is enqueued as a task and runs asynchronously. I have 2 unit tests that check:
The API saves the user's information to the DB correctly
The Celery task sends the email with the right content and template
I want to add a 3rd unit test to ensure that "the endpoint enqueues the email-sending task after saving the user form to the DB".
I tried celery.AsyncResult, but it requires a running worker. Moreover, even when the worker is running, I still can't verify whether the task was enqueued, because the PENDING state is ambiguous:
Task exists in the queue but has not executed yet: PENDING
Task doesn't exist in the queue: PENDING
Has anyone faced this problem? How do I solve it?

A common way to solve this problem in testing environments is to use the task_always_eager configuration setting, which instructs Celery to run the task like a regular function, synchronously in the calling process. Instead of an AsyncResult, Celery returns an object of the EagerResult type that exposes the same result API but has completely different execution logic.
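If the test only needs to prove that the endpoint *enqueued* the task (the 3rd test in the question), a common alternative, not taken from the answer above, is to patch the task's `delay` with `unittest.mock` and assert on the call. The sketch below is a minimal, assumption-laden version: `send_welcome_email`, `register_user` and the `USERS` dict are hypothetical stand-ins for the real Celery task, API view and database, and the side effect records whether the user was already saved at enqueue time so the "after saving" ordering is checked too.

```python
from unittest import mock


# Hypothetical stand-ins: in a real project send_welcome_email would be a
# Celery task (so .delay() publishes to the broker) and register_user the view.
class _WelcomeEmailTask:
    def delay(self, user_id):
        raise RuntimeError("would publish a message to the broker")


send_welcome_email = _WelcomeEmailTask()
USERS = {}  # stand-in for the database


def register_user(user_id, email):
    USERS[user_id] = {"email": email}   # 1. save the user
    send_welcome_email.delay(user_id)   # 2. enqueue the welcome email
    return user_id


def test_register_user_enqueues_welcome_email():
    seen_at_enqueue = []

    def record(user_id):
        # Capture whether the user was already saved when the task was queued.
        seen_at_enqueue.append(user_id in USERS)

    with mock.patch.object(send_welcome_email, "delay", side_effect=record) as delay:
        register_user(42, "new@example.com")

    delay.assert_called_once_with(42)
    assert seen_at_enqueue == [True]  # saved first, enqueued second
```

In a real codebase you would patch the task where the view looks it up, for example `mock.patch("myapp.views.send_welcome_email.delay")` (that module path is hypothetical). No broker or worker is needed, which avoids the ambiguous PENDING state entirely.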

Related

How to retry a celery task without duplicating it - SQS

I have a Celery task that takes a message from an SQS queue and tries to run it. If it fails, it is supposed to retry every 10 minutes, at least 144 times. What I think is happening is that it fails and goes back into the queue, and at the same time a new message is created, so there are now 2. Both fail again and follow the same pattern, creating 2 new ones for 4 messages in total. So if I let it run for some time, the queue gets clogged.
What I don't understand is the proper way to retry it without duplicating it. The retry code is below; I'd appreciate any guidance.
from celery import shared_task
from celery.exceptions import MaxRetriesExceededError

@shared_task
def send_br_update(bgc_id, xref_id, user_id, event):
    from myapp.models.mappings import BGC
    try:
        bgc = BGC.objects.get(pk=bgc_id)
        return bgc.send_br_update(user_id, event)
    except BGC.DoesNotExist:
        pass
    except MaxRetriesExceededError:
        pass
    except Exception as exc:
        # retry every 10 minutes for at least 24 hours
        raise send_br_update.retry(exc=exc, countdown=600, max_retries=144)
Update:
More explanation of the issue...
A user creates an object in my database. Other users act upon that object, and as they change its state, my code emits signals. The signal handler then initiates a Celery task, which means it connects to the desired SQS queue and submits the message to it. The Celery server running the workers sees the new message and tries to execute the task. This is where it fails and the retry logic comes in.
According to the Celery documentation, to retry a task all we need to do is raise self.retry() with countdown and/or max_retries. If a Celery task raises an exception, it is considered failed. I am not sure how SQS handles this. All I know is that one task fails and then there are two in the queue; both of these fail, then there are 4 in the queue, and so on...
This is neither a Celery nor an SQS issue.
The real issue is the workflow, i.e. the way you send messages to the MQ service and handle them, which causes the duplication. You would face the same problem with any other MQ service.
Imagine your flow:
Script: reads the task message. MQ message: locked for 30 seconds.
Script: the task fails. MQ message: the lock times out, so the message is free to be grabbed again.
Script: creates another task message.
Script: repeats step 1. MQ message: there are now 2 messages with the same task, so step 1 launches 2 tasks.
So if the task keeps failing, the messages keep multiplying: 2, 4, 8, 16, 32...
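The doubling in this flow can be reproduced with a toy model. This is a plain-Python sketch, not SQS or Celery code: it assumes every task fails, treats each cycle as one visibility timeout, and uses a hypothetical `delete_before_retry` flag to compare "original message reappears" with "original message deleted before the retry is published".

```python
def simulate(cycles, delete_before_retry):
    """Return the queue length after each cycle when every task fails."""
    queue = ["task"]
    history = []
    for _ in range(cycles):
        next_queue = []
        for msg in queue:
            # The worker reads msg, the task fails, and a new retry
            # message is published.
            next_queue.append(msg)
            # Unless the original was deleted, it becomes visible again
            # once the visibility timeout (the "lock") expires.
            if not delete_before_retry:
                next_queue.append(msg)
        queue = next_queue
        history.append(len(queue))
    return history


print(simulate(5, delete_before_retry=False))  # [2, 4, 8, 16, 32]
print(simulate(5, delete_before_retry=True))   # [1, 1, 1, 1, 1]
```

Only the combination "original message still alive" plus "new retry message" produces growth; removing either one keeps a single copy in flight.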
If the Celery script is meant to "recreate the failed task and send it to the message queue", you want to make sure each message can only be read ONCE. **You MUST discard the task message after it has been read once, even if the task failed.**
There are at least 2 ways to do this; choose one:
1. Delete the message before recreating the task, OR
2. In SQS, enforce this by creating a dead-letter queue and configuring the redrive policy with Maximum Receives set to 1. This ensures that a task message that has already been read is never recycled.
You may prefer method 2, because method 1 requires you to configure Celery to "consume" (read and delete) the message as soon as it reads it, which is not very practical (and you must make sure you delete it before creating a new message for the failed task).
The dead-letter queue also lets you check whether Celery CRASHED: a message that has been read once but not consumed (deleted) means the program stopped somewhere.
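Method 2 comes down to a single queue attribute. The sketch below only builds the `RedrivePolicy` attribute as JSON; the dead-letter queue ARN in the example is made up, and the helper name is hypothetical. With `maxReceiveCount` set to 1, a message that was received once and not deleted is moved to the dead-letter queue instead of being delivered to a worker again.

```python
import json


def redrive_policy_attributes(dead_letter_queue_arn, max_receive_count=1):
    """Build the SQS queue attributes that attach a dead-letter queue."""
    policy = {
        "deadLetterTargetArn": dead_letter_queue_arn,
        # SQS moves a message to the DLQ once its receive count exceeds this.
        "maxReceiveCount": str(max_receive_count),
    }
    return {"RedrivePolicy": json.dumps(policy)}


# Hypothetical ARN, for illustration only.
attrs = redrive_policy_attributes("arn:aws:sqs:us-east-1:123456789012:my-dlq")
```

With boto3 this dict would be passed as `sqs.set_queue_attributes(QueueUrl=queue_url, Attributes=attrs)` on the source queue; the same setting is available as "Maximum receives" in the SQS console.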
This is probably a little late, but I have written a backoff policy for Celery + SQS as a patch.
You can see how it is implemented in this repository:
https://github.com/galCohen88/celery_sqs_retry_policy/blob/master/svc/celery.py

Celery gives no error with bad remote task names, why?

When using "send_task", Celery never verifies that the remote task exists, e.g.:
app.send_task('tasks.i.dont.exist', args=[], kwargs={})
Celery still returns an AsyncResult, e.g.:
<AsyncResult: b8c1425a-7411-491f-b75a-34313832b8ba>
Is there a way for it to fail if the remote task does not exist?
I've tried adding .get() and it just freezes.
According to the documentation:
If the task is not registered in the current process then you can also execute a task by name.
You do this by using the send_task() method of the celery instance.
If you want verification, consider using delay instead: it is called on an imported task object, so a task that does not exist fails at import time in the calling code instead of being sent silently.
You can read more about how to execute Celery tasks here.
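If you must send by name, one way to fail fast, sketched here as an assumption rather than a built-in send_task option, is to ask the running workers which tasks they have registered via `app.control.inspect().registered()` (a dict of worker name to task names, or `None` when no worker replies) and refuse to send otherwise. `checked_send_task` and `is_registered_on_any_worker` are hypothetical helpers.

```python
def is_registered_on_any_worker(registered, task_name):
    """registered: result of app.control.inspect().registered(), i.e.
    worker hostname -> list of task names, or None if no worker replied."""
    if not registered:
        return False
    return any(task_name in names for names in registered.values())


def checked_send_task(app, task_name, args=(), kwargs=None):
    # Snapshot of the workers that replied in time; a worker started later
    # could still consume the task, so this is a sanity check, not a guarantee.
    registered = app.control.inspect().registered()
    if not is_registered_on_any_worker(registered, task_name):
        raise LookupError(f"No running worker has registered {task_name!r}")
    return app.send_task(task_name, args=list(args), kwargs=kwargs or {})
```

Separately, passing a timeout to `.get()` (for example `.get(timeout=10)`) makes the call raise instead of blocking forever when no worker ever picks the task up.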

Celery/Django: Get result of periodic task execution

I have a Django 1.7 project using Celery (latest version). I have a REST API that receives some parameters and programmatically creates a PeriodicTask. For testing, I'm using a period of a few seconds:
periodic_task, _= PeriodicTask.objects.get_or_create(name=task_label, task=task_name, interval=interval_schedule)
I store a reference to these tasks somewhere. I start celery beat:
python manage.py celery beat
and a worker:
python manage.py celery worker --loglevel=info
and my task runs as I can see in the worker's output.
I've set the result backend:
CELERY_RESULT_BACKEND = 'djcelery.backends.database:DatabaseBackend'
and with that, I can check the task results using the TaskMeta model. The objects there contain the task_id (the same one I would get when calling the task with .delay() or .apply_async()), the status, the result, everything. Beautiful.
However, I can't find a connection between the PeriodicTask object and TaskMeta.
PeriodicTask has a task property, but it's just the task name/path. Its id is just a consecutive number, not the task_id from TaskMeta, and I really need to be able to match a task executed by a PeriodicTask with its TaskMeta entry so I can offer some monitoring of its status. TaskMeta doesn't have any other field that identifies which task ran (and I will have several), so that I could at least report the status of the last execution.
I've checked all over the Celery docs and here, but found no solution so far.
Any help is highly appreciated.
Thanks
You can run djcelery's snapshot camera, a service that monitors the tasks that have been executed and stores snapshots of their state in the database, from the command line:
python manage.py celerycam --frequency=10.0
More detail at:
http://www.lexev.org/en/2014/django-celery-setup/

reuse results for celery tasks

Is there a common solution for storing and reusing Celery task results without executing the tasks again? I have many HTTP fetch tasks in my metasearch project and want to reduce the number of useless HTTP requests (they can take a long time and return the same results) by storing the result of the first one and returning it without actually fetching again. It would also be very useful not to start a new fetch task when the same one is already in progress. Instead of running a new job, the app should return the AsyncResult, by id (the id is unique and generated from the task call args), of the already pending task.
It looks like I need to define new apply_async (Celery.send_task) behavior for tasks with the same task_id:
if the task with the given task_id hasn't started yet, start it
if the task with the given task_id has already started, return AsyncResult(task_id) without actually running the task
the @task decorator should accept a new ttl kwarg to determine the cache time (only for the Redis backend?)
The simplest answer seems to be to store your results in a cache (like a database) and first ask the cache for the result; only if it's missing, fire the HTTP request.
I don't think there's anything Celery-specific that does this.
Edit:
To handle the fact that the tasks are sent at the same time, an additional step would be to build a lock for the Celery task (see the Celery Task Lock recipe).
In your case, give the lock a name containing the task name and the URL. You can use whatever system you want for the cache as long as all your workers can see it (Redis in your case?).
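The cache-plus-lock idea can be sketched in-process with the standard library. This is an illustration under stated assumptions: a dict with expiry times stands in for the shared cache, `threading.Lock` stands in for the distributed lock, and `task_key`/`cached_call` are hypothetical names. Across several workers you would swap these for something all workers can see, such as Redis (for example redis-py's `set(key, value, nx=True, ex=ttl)` for the lock).

```python
import hashlib
import json
import threading
import time

_cache = {}                     # key -> (expires_at, result)
_locks = {}                     # key -> threading.Lock, one per task+args
_locks_guard = threading.Lock()


def task_key(task_name, *args, **kwargs):
    """Deterministic key built from the task name and its call arguments."""
    payload = json.dumps([task_name, args, kwargs], sort_keys=True)
    return hashlib.sha1(payload.encode()).hexdigest()


def cached_call(task_name, func, *args, ttl=60.0, **kwargs):
    key = task_key(task_name, *args, **kwargs)
    with _locks_guard:
        lock = _locks.setdefault(key, threading.Lock())
    # Only one caller per key runs func; concurrent callers wait for it
    # and then reuse the cached result instead of fetching again.
    with lock:
        hit = _cache.get(key)
        if hit is not None and hit[0] > time.monotonic():
            return hit[1]
        result = func(*args, **kwargs)
        _cache[key] = (time.monotonic() + ttl, result)
        return result
```

Unlike the behavior described in the question, duplicate callers here wait for the in-flight fetch rather than receiving an AsyncResult; the per-key lock dict also grows without bound, which a real cache would handle with expiry.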

How to inspect and cancel Celery tasks by task name

I'm using Celery (3.0.15) with Redis as a broker.
Is there a straightforward way to query the number of tasks with a given name that exist in a Celery queue?
And, as a followup, is there a way to cancel all tasks with a given name that exist in a Celery queue?
I've been through the Monitoring and Management Guide and don't see a solution there.
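import celery.events.state

# Retrieve tasks
# Reference: http://docs.celeryproject.org/en/latest/reference/celery.events.state.html
# tasks_by_type() is a method of State, not a module-level function. A State
# only knows about tasks whose events it has received, so a freshly created
# one is empty unless it is fed by an event receiver.
state = celery.events.state.State()
query = state.tasks_by_type(your_task_name)
# Kill tasks (app is your Celery application instance)
# Reference: http://docs.celeryproject.org/en/latest/userguide/workers.html#revoking-tasks
for uuid, task in query:
    app.control.revoke(uuid, terminate=True)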
There is one issue that earlier answers have not addressed and may throw off people if they are not aware of it.
Among those solutions already posted, I'd use Danielle's with one minor modification: I'd import the task into my file and use its .name attribute to get the task name to pass to .tasks_by_type().
app.control.revoke(
[uuid for uuid, _ in
celery.events.state.State().tasks_by_type(task.name)])
However, this solution will ignore tasks that have been scheduled for future execution. Like some people who commented on other answers, when I checked what .tasks_by_type() returned, I got an empty list. And indeed my queues were empty. But I knew there were tasks scheduled to be executed in the future, and these were my primary target. I could see them by executing celery -A [app] inspect scheduled, but they were unaffected by the code above.
I managed to revoke the scheduled tasks by doing this:
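from itertools import chain

# inspect().scheduled() returns None when no worker replies
scheduled = app.control.inspect().scheduled() or {}
app.control.revoke(
    [entry["request"]["id"] for entry in
     chain.from_iterable(scheduled.values())])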
app.control.inspect().scheduled() returns a dictionary whose keys are worker names and values are lists of scheduling information (hence, the need for chain.from_iterable which is imported from itertools). The task information is in the "request" field of the scheduling information and "id" contains the task id. Note that even after revocation, the scheduled task will still show among the scheduled tasks. Scheduled tasks that are revoked won't get removed from the list of scheduled tasks until their timers expire or until Celery performs some cleanup operation. (Restarting workers triggers such cleanup.)
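Since the question asks for cancellation by task name, the same flattening can also filter on the request's name. The sketch below is pure Python over a made-up `scheduled()` reply; it assumes each entry's `"request"` dict carries a `"name"` key alongside `"id"`, as recent Celery versions report, and `scheduled_task_ids` is a hypothetical helper.

```python
from itertools import chain


def scheduled_task_ids(scheduled, task_name=None):
    """Collect task ids from the dict returned by inspect().scheduled().

    scheduled maps worker name -> list of entries; each entry keeps the
    task information under "request". Pass task_name to keep only that type.
    """
    entries = chain.from_iterable((scheduled or {}).values())
    return [
        entry["request"]["id"]
        for entry in entries
        if task_name is None or entry["request"].get("name") == task_name
    ]


# Made-up reply from two workers, for illustration only.
sample = {
    "celery@worker1": [
        {"eta": "2024-01-01T10:00:00", "priority": 6,
         "request": {"id": "a1", "name": "tasks.send_report"}},
        {"eta": "2024-01-01T11:00:00", "priority": 6,
         "request": {"id": "b2", "name": "tasks.cleanup"}},
    ],
    "celery@worker2": [
        {"eta": "2024-01-01T12:00:00", "priority": 6,
         "request": {"id": "c3", "name": "tasks.send_report"}},
    ],
}
print(scheduled_task_ids(sample, "tasks.send_report"))  # ['a1', 'c3']
```

Against a live cluster the call would look like `app.control.revoke(scheduled_task_ids(app.control.inspect().scheduled(), my_task.name))`.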
You can do this in one request:
app.control.revoke([
uuid
for uuid, _ in
celery.events.state.State().tasks_by_type(task_name)
])
As usual with Celery, none of the answers here worked for me at all, so I did my usual thing and hacked together a solution that just inspects redis directly. Here we go...
# First, get a list of tasks from redis:
import json

import redis
from django.conf import settings

r = redis.Redis(
    host=settings.REDIS_HOST,
    port=settings.REDIS_PORT,
    db=settings.REDIS_DATABASES['CELERY'],
)
# 'celery' is the default queue name; adjust it if you route tasks elsewhere.
queued_messages = r.lrange('celery', 0, -1)

# Now import the task you want so you can get its name
from my_django.tasks import my_task

# Now, import your celery app and iterate over all tasks
# from redis and nuke the ones that have a matching name.
from my_django.celery_init import app

for task in queued_messages:
    task_headers = json.loads(task)['headers']
    task_name = task_headers["task"]
    if task_name == my_task.name:
        task_id = task_headers['id']
        print("Terminating: %s" % task_id)
        app.control.revoke(task_id, terminate=True)
Note that revoking in this way might not revoke prefetched tasks, so you might not see results immediately.
Also, this answer doesn't support prioritized tasks. If you want to modify it to do that, you'll want some of the tips in my other answer that hacks redis.
It looks like flower provides monitoring:
https://github.com/mher/flower
Real-time monitoring using Celery Events
  Task progress and history
  Ability to show task details (arguments, start time, runtime, and more)
  Graphs and statistics
Remote Control
  View worker status and statistics
  Shutdown and restart worker instances
  Control worker pool size and autoscale settings
  View and modify the queues a worker instance consumes from
  View currently running tasks
  View scheduled tasks (ETA/countdown)
  View reserved and revoked tasks
  Apply time and rate limits
  Configuration viewer
  Revoke or terminate tasks
HTTP API
OpenID authentication
