Python+Celery: ignore task results on a per-invocation basis? - python

Is it possible to ignore task results on a per-invocation basis?
For example, so I can ignore the results of tasks when they are being run during a web request, but wait for the result (which might have, eg, debug info) when I'm running the task interactively?
I know that Tasks have the ignore_result flag, but I'm wondering specifically if it's possible to set ignore_result on a per-invocation basis (not a "global" basis).

Not normally, because ignore_result is a property of a Task that only the workers use (to decide whether to send a result back).
But you could do it if you used your own task parameter (avoid calling it ignore_result) and had the task set its ignore_result based on that:
    @app.task
    def mytask(please_ignore_result):
        mytask.ignore_result = please_ignore_result

You can pass ignore_result=True/False when calling apply_async (delay forwards all of its arguments to the task itself, so it cannot carry this option):
    @app.task
    def hello():
        print('hello world')

    # store or discard results on a per-invocation basis
    res = hello.apply_async(ignore_result=True)
    res1 = hello.apply_async(ignore_result=False)
You might run into an error if you are running an older version of Celery. The Celery docs describe how to use ignore_result in more detail.

Related

How to Inspect the Queue Processing a Celery Task

I'm currently using Celery for periodic tasks, and I am new to it. I have two workers running two different queues: one for slow background jobs and one for jobs that users queue up in the application.
I am monitoring my tasks on Datadog because it's an easy way to confirm my workers are running appropriately.
What I want to do is, after each task completes, record which queue the task was completed on.
    @after_task_publish.connect()
    def on_task_publish(sender=None, headers=None, body=None, **kwargs):
        statsd.increment("celery.on_task_publish.start.increment")
        task = celery.tasks.get(sender)
        queue_name = task.queue
        statsd.increment("celery.on_task_publish.increment", tags=[f"{queue_name}:{task}"])
The function above is something I implemented after researching the Celery docs and some StackOverflow posts, but it's not working as intended. I get the first statsd increment, but the remaining code does not execute.
I am wondering if there is a simpler way to find out, inside or after each task, which queue processed it.
Since you ask whether there is a way to inspect this inside or after each task, I'm assuming you haven't tried a result backend yet. It is a feature provided by Celery itself and is very useful for storing the results of your Celery tasks.
Read through the settings here: https://docs.celeryproject.org/en/stable/userguide/configuration.html#task-result-backend-settings
Once you have an idea of how to set up a result backend, search for the result_extended key (on the same page) to have the queue name stored alongside your task results.
A number of backends are available; results can go to any of these:
SQL database / NoSQL database / S3 / Azure / Elasticsearch / etc.
I have used the result backend with Elasticsearch to store my task results.
It is just a matter of adding a few settings to your settings.py file as per your requirements. It worked really well for my application. I also have a weekly cron job that clears only the successful task results, since we don't need them anymore, so I see only the failed ones.
These were the main keys for my requirements: task_track_started and task_acks_late, along with result_backend.
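If the goal is only to tag a metric with the queue that handled each task, a worker-side signal handler may be enough. Note that after_task_publish runs in the process that sends the task, not in the worker. The sketch below assumes the default setup where the routing key in the task request's delivery_info matches the queue name; record_metric and sent_metrics are hypothetical stand-ins for a statsd client, and the fake task object exists only so the handler can be tried without a worker.

```python
from types import SimpleNamespace

# Hypothetical stand-in for statsd.increment: collects calls in a list.
sent_metrics = []


def record_metric(name, tags):
    sent_metrics.append((name, tuple(tags)))


def queue_label(delivery_info):
    """Best-effort queue name from a task request's delivery_info mapping."""
    info = delivery_info or {}
    # Assumption: with default direct routing the routing key equals the queue name.
    return info.get("routing_key") or "unknown"


def on_task_postrun(sender=None, task=None, state=None, **kwargs):
    """Handler taking the keyword arguments sent by Celery's task_postrun signal."""
    queue = queue_label(getattr(task.request, "delivery_info", None))
    record_metric(
        "celery.task.completed",
        tags=[f"queue:{queue}", f"task:{task.name}", f"state:{state}"],
    )


# In a worker process you would register the handler with:
#   from celery.signals import task_postrun
#   task_postrun.connect(on_task_postrun)

# Local dry run with a fake task object instead of a real worker.
fake_task = SimpleNamespace(
    name="proj.tasks.slow_job",
    request=SimpleNamespace(delivery_info={"routing_key": "slow"}),
)
on_task_postrun(sender=fake_task, task=fake_task, state="SUCCESS")
```

If your queues use custom exchanges or routing keys, the routing key may differ from the queue name, so check what delivery_info contains in your setup before relying on it.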

reuse results for celery tasks

Is there any common solution for storing and reusing Celery task results without executing the tasks again? I have many HTTP fetch tasks in my metasearch project and want to reduce the number of useless HTTP requests (they can take a long time and return the same results) by storing the result of the first one and returning it without actually fetching again. It would also be very useful not to start a new fetch task when the same one is already in progress. Instead of running a new job, the app should return the AsyncResult by id (the id is unique and generated from the task call args) of the already pending task.
It looks like I need to define new apply_async (Celery.send_task) behavior for tasks with the same task_id:
if the task with the given task_id hasn't started yet, start it
if the task with the given task_id has already started, return AsyncResult(task_id) without actually running the task
the @task decorator should accept a new ttl kwarg to determine the cache time (only for the Redis backend?)
It looks like the simplest answer is to store your results in a cache (such as a database), ask the cache for the result first, and fire the HTTP request only on a miss.
I don't think there's anything specific to Celery that can do this.
Edit:
To handle the case where the tasks are sent at the same time, you can additionally build a lock for the Celery task (see the Celery task lock recipe).
In your case, give the lock a name containing the task name and the URL. You can use whatever system you want for the cache, as long as it is visible to all your workers (Redis in your case?).
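A minimal sketch of the cache-plus-lock idea, not a Celery feature. InMemoryStore is a hypothetical in-process stand-in for a shared cache such as Redis, and task_key and fetch_once are illustrative names; a real setup would need a store visible to all workers and an expiry on both the lock and the cached result.

```python
import hashlib
import json
import threading


class InMemoryStore:
    """Hypothetical stand-in for a shared cache (e.g. Redis) visible to all workers."""

    def __init__(self):
        self._data = {}
        self._guard = threading.Lock()

    def get(self, key):
        with self._guard:
            return self._data.get(key)

    def set(self, key, value):
        with self._guard:
            self._data[key] = value

    def add(self, key, value):
        """Set key only if it is absent; return True if this call set it."""
        with self._guard:
            if key in self._data:
                return False
            self._data[key] = value
            return True

    def delete(self, key):
        with self._guard:
            self._data.pop(key, None)


def task_key(task_name, url):
    """Deterministic key built from the task name and its argument."""
    digest = hashlib.sha256(json.dumps([task_name, url]).encode()).hexdigest()
    return f"{task_name}:{digest}"


def fetch_once(store, task_name, url, fetch):
    """Return the cached result, or fetch it unless another caller holds the lock."""
    key = task_key(task_name, url)
    cached = store.get("result:" + key)
    if cached is not None:
        return cached
    if not store.add("lock:" + key, "1"):
        return None  # the same fetch is already in progress elsewhere
    try:
        result = fetch(url)
        store.set("result:" + key, result)
        return result
    finally:
        store.delete("lock:" + key)
```

A caller that gets None back can poll the cache later for the result. The sketch has no TTL, so cached results never expire here.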

Having error queues in celery

Is there any way in Celery to automatically put a task into another queue if its execution fails?
For example, if the task is running in a queue x, enqueue it to another queue named error_x on exception.
Edit:
Currently I am using celery==3.0.13 along with Django 1.4, with RabbitMQ as the broker.
Sometimes the task fails. Is there a way in Celery to add messages to an error queue and process them later?
The problem when a Celery task fails is that I don't have access to the message queue name, so I can't use self.retry to put it in a different error queue.
Well, you cannot use the retry mechanism if you want to route the task to another queue. From the docs:
retry() can be used to re-execute the task, for example in the event
of recoverable errors.
When you call retry it will send a new message, using the same
task-id, and it will take care to make sure the message is delivered
to the same queue as the originating task.
You'll have to relaunch the task yourself and route it manually to the queue you want whenever an exception is raised. This seems like a good job for error callbacks.
The main issue is that we need to get the task name in the error callback to be able to launch it. Also we may not want to add the callback each time we launch a task. Thus a decorator would be a good way to automatically add the right callback.
    from functools import partial, wraps

    import celery


    @celery.shared_task
    def error_callback(task_id, task_name, retry_queue, retry_routing_key):
        # We must retrieve the task object itself.
        # `tasks` is a dict of 'task_name': celery_task_object
        task = celery.current_app.tasks[task_name]
        # Relaunch the task in the specified queue.
        task.apply_async(queue=retry_queue, routing_key=retry_routing_key)


    def retrying_task(retry_queue, retry_routing_key):
        """Decorates function to automatically add error callbacks."""
        def retrying_decorator(func):
            @celery.shared_task
            @wraps(func)  # just to keep the original task name
            def wrapper(*args, **kwargs):
                return func(*args, **kwargs)
            # Monkey patch the apply_async method to add the callback.
            wrapper.apply_async = partial(
                wrapper.apply_async,
                link_error=error_callback.s(wrapper.name, retry_queue, retry_routing_key)
            )
            return wrapper
        return retrying_decorator


    # Usage:
    @retrying_task(retry_queue='another_queue', retry_routing_key='another_routing_key')
    def failing_task():
        print('Hi, I will fail!')
        raise Exception("I'm failing!")

    failing_task.apply_async()
You can adjust the decorator to pass whatever parameters you need.
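An alternative sketch, under the assumption that the routing key found in the task request's delivery_info matches the name of the queue the message arrived on (true for default direct routing, not necessarily for custom exchanges). error_queue_name and run_or_reroute are hypothetical helpers; inside a task body, delivery_info would come from the task's request and resend would be the task's apply_async.

```python
def error_queue_name(delivery_info, prefix="error_"):
    """Derive the error queue from the queue the failed message arrived on."""
    # Assumption: the routing key equals the queue name (default direct routing).
    routing_key = (delivery_info or {}).get("routing_key") or "default"
    if routing_key.startswith(prefix):
        return None  # already on an error queue; do not re-route again
    return prefix + routing_key


def run_or_reroute(func, args, kwargs, delivery_info, resend):
    """Run func; on failure, resend the same call to the error queue and re-raise."""
    try:
        return func(*args, **kwargs)
    except Exception:
        target = error_queue_name(delivery_info)
        if target is not None:
            # In Celery, resend would be task.apply_async, which accepts
            # args, kwargs and queue.
            resend(args=args, kwargs=kwargs, queue=target)
        raise
```

Unlike the error callback above, this keeps the original arguments, and the guard on the prefix avoids chaining error_error_x queues when the retried call fails again.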
I had a similar problem and solved it, maybe not in the most efficient way. My solution is as follows:
I created a Django model that keeps all my Celery task ids and can check each task's state.
Then I created another Celery task that runs in an infinite loop, checks the actual state of every task recorded as 'RUNNING', and reruns any whose state is 'FAILED'. I'm not actually changing the queue for the tasks I rerun, but I think you can implement some custom logic to decide where to put each task you rerun this way.

Celery's inspect unstable behaviour

I have a Celery project with a RabbitMQ backend that relies heavily on inspecting scheduled tasks. I found that the following code returns nothing most of the time (and of course there are scheduled tasks):
    i = app.control.inspect()
    scheduled = i.scheduled()
    if scheduled:
        pass  # do something
This code also runs from one of the tasks, but I don't think that matters: I get the same result from an interactive Python command line (with some exceptions, see below).
At the same time, the celery -A <proj> inspect scheduled command never fails. I also noticed that when called from an interactive Python command line for the first time, this call never fails either. Most of the subsequent i.scheduled() calls return nothing.
Does i.scheduled() guarantee a result only when called for the first time?
If so, why, and how can I then inspect scheduled tasks from a task? Run a dedicated worker and restart it after every task? That seems like overkill for such a trivial task.
Please explain how to use this feature the right way.
This is caused by some weird issue inside the Celery app. To call Inspect methods repeatedly, you have to create a new Celery app instance each time.
Here is a small snippet that can help:
    from celery import Celery

    def inspect(method):
        app = Celery('app', broker='amqp://')
        return getattr(app.control.inspect(), method)()

    print(inspect('scheduled'))
    print(inspect('active'))
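A possible complement, assuming the empty replies come from workers not answering within the inspect timeout (which defaults to one second); nothing in this thread confirms that cause. inspect_with_retries is a hypothetical helper that simply repeats a query until it gets a non-empty reply.

```python
import time


def inspect_with_retries(query, attempts=3, delay=0.5):
    """Call query() until it returns a non-empty reply or attempts run out."""
    for attempt in range(attempts):
        reply = query()
        if reply:
            return reply
        if attempt < attempts - 1:
            time.sleep(delay)  # give workers time before asking again
    return None


# With Celery this could be used as (not executed here):
#   inspect_with_retries(lambda: app.control.inspect(timeout=5).scheduled())
```

Raising the timeout passed to inspect() may already be enough on its own; the retry loop only covers the case where a single attempt still comes back empty.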

Celery: standard method for querying pending tasks?

Is there any standard/backend-independent method for querying pending tasks based on certain fields?
For example, I have a task which needs to run once after the “last user interaction”, and I'd like to implement it something like:
    def user_changed_content():
        task = find_task(name="handle_content_change")
        if task is None:
            task = queue_task("handle_content_change")
        task.set_eta(datetime.now() + timedelta(minutes=5))
        task.save()
Or is it simpler to hook directly into the storage backend?
No, this is not possible.
Even if some transports support accessing the "queue" out of order (e.g. Redis), it is not a good idea.
The task may not be on the queue anymore; it may instead be reserved by a worker.
See this part of the documentation: http://docs.celeryproject.org/en/latest/userguide/tasks.html#state
Given that, a better approach is for the task to check whether it should reschedule itself when it starts:
    @task
    def reschedules():
        new_eta = redis.get(".".join([reschedules.request.id, "new_eta"]))
        if new_eta:
            return reschedules.retry(eta=new_eta)
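The same idea applied to the "run once, five minutes after the last interaction" case can be sketched as follows. This is only an illustration: the plain dict stands in for a shared store such as Redis, note_interaction and next_action are hypothetical names, and races between concurrent callers are ignored.

```python
from datetime import datetime, timedelta

DELAY = timedelta(minutes=5)


def note_interaction(store, key, now):
    """Record the new due time; return True if a task still has to be queued."""
    already_pending = key in store
    store[key] = now + DELAY
    return not already_pending


def next_action(store, key, now):
    """Called when the task starts: retry later if it was pushed back, else run."""
    due = store.get(key)
    if due is not None and due > now:
        return ("retry", due)
    store.pop(key, None)
    return ("run", None)


# Walk through one debounce cycle with fixed timestamps.
store = {}
t0 = datetime(2020, 1, 1, 12, 0)
first = note_interaction(store, "content", t0)                           # queue the task
second = note_interaction(store, "content", t0 + timedelta(minutes=3))   # only push back
decision = next_action(store, "content", t0 + timedelta(minutes=5))      # not due yet
```

In a Celery task, a ("retry", eta) decision would translate to calling retry(eta=eta), with eta as a datetime, and ("run", None) to doing the actual work.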
