Celery: standard method for querying pending tasks?

Is there any standard/backend-independent method for querying pending tasks based on certain fields?
For example, I have a task which needs to run once after the “last user interaction”, and I'd like to implement it with something like:
def user_changed_content():
    task = find_task(name="handle_content_change")
    if task is None:
        task = queue_task("handle_content_change")
    task.set_eta(datetime.now() + timedelta(minutes=5))
    task.save()
Or is it simpler to hook directly into the storage backend?

No, this is not possible.
Even though some transports (e.g. Redis) may support accessing the "queue" out of order,
it is not a good idea: the task may not be on the queue anymore, having already been reserved by a worker.
See this part in the documentation: http://docs.celeryproject.org/en/latest/userguide/tasks.html#state
Given that, a better approach would be for the task to check if it should reschedule itself
when it starts:
@task
def reschedules():
    new_eta = redis.get(".".join([reschedules.request.task_id, "new_eta"]))
    if new_eta:
        return reschedules.retry(eta=new_eta)
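A caller-side counterpart might look like the following minimal sketch. It writes to the same "<task_id>.new_eta" key the task above reads; the redis client is assumed to be the same configured client, and persisting the task id (and the 5-minute delay) are assumptions, not part of the original answer.
from datetime import datetime, timedelta

def user_changed_content(task_id=None):
    # task_id is the id returned by an earlier apply_async() call,
    # persisted somewhere the web process can reach (an assumption).
    if task_id is None:
        # No pending task yet: queue one and remember its id.
        result = reschedules.apply_async(countdown=5 * 60)
        return result.id
    # A task is already queued: ask it to push itself back when it runs.
    new_eta = datetime.utcnow() + timedelta(minutes=5)
    redis.set(".".join([task_id, "new_eta"]), new_eta.isoformat())
    return task_id
Note that the stored ISO string would still need to be parsed back into a datetime before being passed to retry(eta=...).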


How to handle asynchronous tasks with queues defined on a user (or other attribute) basis with django and celery?

Thank you for viewing my question. I am new to Celery and I am trying to wrap my head around the world of threading and multiprocessing, but I cannot seem to find any info that fits my specific use case.
Consider this: I have a django-rest-framework API for my personal trading software that receives POST requests to perform a buy or sell order against my broker's REST API.
Now, I want to make sure that my API can handle numerous concurrent requests, queue them, and process them in the order they were received. One task cannot start before the previous one has finished, because the next task relies on database information that the former task has written.
I have implemented a Celery instance to process a FIFO queue. Here lies my question: let's say I have multiple strategies, and I don't want tasks from different strategies to be mixed in the same queue. I want an asynchronous queue per strategy. Do I now require multiple worker instances, or can a single worker instance consume from separate queues?
What exactly would that syntax look like, and which Celery methods should I be looking into?
The post request view
@csrf_exempt
def webhook(request, username, slug):
    webhook_data = json.loads(request.body)
    if request.method == 'POST':
        webhook_queue.delay(username, slug, request.body)
        return HttpResponse(response)
The celery task
@task(name='webhook_queue')
def webhook_queue(username, slug, webhook_data):
    strategy = UserStrategyModel.objects.get(slug__iexact=slug, user__username__iexact=username)
    user = User.objects.get(username__iexact=username)
    webhook = Webhook(webhook=webhook_data, strategy=strategy, user=user).process_webhook()
    logger.info('Webhook processing.')
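Celery can route a task to a named queue at call time, and a single worker can consume from one or more named queues, so per-strategy queues do not require changes to the task itself. A minimal sketch, assuming the strategy slug doubles as the queue name (that naming scheme is an assumption):
# Route each webhook to a queue named after its strategy; any stable
# per-strategy string works as the queue name.
webhook_queue.apply_async(
    args=(username, slug, request.body),
    queue='strategy.{}'.format(slug),
)
A worker is then pointed at specific queues with -Q, e.g. celery -A proj worker -Q strategy.alpha --concurrency=1; running one single-process worker per queue keeps each strategy's tasks strictly in the order they were received.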

How to create a shared counter in Celery?

Is there a way to have a shared counter (shared between workers) in Celery? I am also open to other ideas on how to solve my problem, but would like to stick to Celery. Here is my problem:
I have a task that depends on an index passed to it. These tasks can pass or fail, but I need to hit a target number of passed tasks. If a job fails, it should kick off a new job with the next available index.
I can of course do this through a function that tracks the active jobs and initiates the new jobs, but if there was something built in that'd be great.
You can use the task_failure Celery signal.
from celery.signals import task_failure

@task_failure.connect
def fail_task_handler(sender=None, body=None, **kwargs):
    print('a task has failed')
    # start new task or do something else
More at http://celery.readthedocs.org/en/latest/userguide/signals.html#task-failure
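Celery itself does not ship a shared counter, but since every worker can already reach the broker, one common workaround is an atomic counter in Redis driven by the task signals. A rough sketch, in which process_index, the Redis keys, and the target number are all hypothetical:
import redis
from celery.signals import task_failure, task_success

from myapp.tasks import process_index   # hypothetical task taking an index

r = redis.Redis()        # assumption: one Redis instance visible to all workers
TARGET_PASSED = 100      # hypothetical number of passed tasks to reach

@task_success.connect
def count_passed(sender=None, **kwargs):
    if sender is not None and sender.name == process_index.name:
        r.incr('passed_count')           # INCR is atomic across workers

@task_failure.connect
def launch_next(sender=None, **kwargs):
    if sender is not None and sender.name == process_index.name:
        if int(r.get('passed_count') or 0) < TARGET_PASSED:
            # INCR also hands out the next unused index atomically.
            process_index.delay(r.incr('next_index'))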

Reuse results for Celery tasks

Is there any common solution to store and reuse Celery task results without executing the tasks again? I have many HTTP fetch tasks in my metasearch project and want to reduce the number of useless HTTP requests (they can take a long time and return the same results) by storing the result of the first one and serving it back without doing a real fetch. It would also be very useful not to start a new fetch task when the same one is already in progress; instead of running a new job, the app should return the AsyncResult (by an id that is unique and generated from the task call args) of the already pending task.
It looks like I need to define new apply_async (Celery.send_task) behavior for tasks with the same task_id:
if a task with the given task_id hasn't started yet, start it
if a task with the given task_id has already started, return AsyncResult(task_id) without actually running the task
The @task decorator should accept a new ttl kwarg to determine the cache time (only for the redis backend?)
The simplest answer is probably to store your results in a cache (or a database), ask the cache first, and only fire the HTTP request on a miss.
I don't think there is anything specific to Celery that can do this for you.
Edit:
To deal with the fact that the tasks may be sent at the same time, an additional step is to build a lock for the Celery task (see the Celery task lock recipe).
In your case you would want to give the lock a name containing the task name and the URL. And you can use whatever system you want for the cache, as long as it is visible to all your workers (Redis in your case?).
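A minimal sketch of that idea, assuming a Redis result backend and a hypothetical fetch(url) task: derive the task_id from the call arguments, reuse the existing AsyncResult when one is pending, and otherwise enqueue the fetch. The module paths are assumptions.
import hashlib
from celery.result import AsyncResult

from myapp.celery_app import app      # hypothetical: your Celery app instance
from myapp.tasks import fetch         # hypothetical: the fetch(url) task

def fetch_or_reuse(url):
    # Deterministic id derived from the call args, as suggested above.
    task_id = "fetch-" + hashlib.sha1(url.encode()).hexdigest()
    existing = AsyncResult(task_id, app=app)
    # PENDING also means "unknown id", so this check is only reliable while
    # the result backend still holds the result (its expiry acts as the ttl).
    if existing.state in ("STARTED", "RETRY", "SUCCESS"):
        return existing
    return fetch.apply_async(args=(url,), task_id=task_id)
STARTED is only reported when task_track_started is enabled, and the lock from the recipe mentioned above is still needed to close the race between the state check and apply_async.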

How to inspect and cancel Celery tasks by task name

I'm using Celery (3.0.15) with Redis as a broker.
Is there a straightforward way to query the number of tasks with a given name that exist in a Celery queue?
And, as a followup, is there a way to cancel all tasks with a given name that exist in a Celery queue?
I've been through the Monitoring and Management Guide and don't see a solution there.
# Retrieve tasks
# Reference: http://docs.celeryproject.org/en/latest/reference/celery.events.state.html
query = celery.events.state.tasks_by_type(your_task_name)

# Kill tasks
# Reference: http://docs.celeryproject.org/en/latest/userguide/workers.html#revoking-tasks
for uuid, task in query:
    celery.control.revoke(uuid, terminate=True)
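One caveat with this approach: celery.events.state only knows about events that have actually been captured, so the workers need to send task events (start them with -E) and the State object has to be fed by an event receiver; a fresh State() with no receiver attached will simply return an empty list, which matches the behaviour described in the next answer. A sketch of taking a short event snapshot first, under those assumptions (app is assumed to be your Celery application):
import socket

# Assumes the workers were started with -E so they emit task events.
state = app.events.State()

with app.connection() as connection:
    recv = app.events.Receiver(connection, handlers={'*': state.event})
    try:
        # Drain events until the connection has been idle for ~5 seconds.
        recv.capture(limit=None, timeout=5, wakeup=True)
    except socket.timeout:
        pass

app.control.revoke(
    [uuid for uuid, _ in state.tasks_by_type(your_task_name)],
    terminate=True,
)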
There is one issue that earlier answers have not addressed and may throw off people if they are not aware of it.
Among those solutions already posted, I'd use Danielle's with one minor modification: I'd import the task into my file and use its .name attribute to get the task name to pass to .tasks_by_type().
app.control.revoke(
    [uuid for uuid, _ in
     celery.events.state.State().tasks_by_type(task.name)])
However, this solution will ignore tasks that have been scheduled for future execution. Like some people who commented on other answers, when I checked what .tasks_by_type() returns I got an empty list. And indeed my queues were empty. But I knew that there were tasks scheduled to be executed in the future, and these were my primary target. I could see them by executing celery -A [app] inspect scheduled, but they were unaffected by the code above.
I managed to revoke the scheduled tasks by doing this:
from itertools import chain

app.control.revoke(
    [scheduled["request"]["id"] for scheduled in
     chain.from_iterable(app.control.inspect().scheduled().values())])
app.control.inspect().scheduled() returns a dictionary whose keys are worker names and values are lists of scheduling information (hence, the need for chain.from_iterable which is imported from itertools). The task information is in the "request" field of the scheduling information and "id" contains the task id. Note that even after revocation, the scheduled task will still show among the scheduled tasks. Scheduled tasks that are revoked won't get removed from the list of scheduled tasks until their timers expire or until Celery performs some cleanup operation. (Restarting workers triggers such cleanup.)
You can do this in one request:
app.control.revoke([
    uuid
    for uuid, _ in
    celery.events.state.State().tasks_by_type(task_name)
])
As usual with Celery, none of the answers here worked for me at all, so I did my usual thing and hacked together a solution that just inspects redis directly. Here we go...
# First, get a list of tasks from redis:
import redis, json
r = redis.Redis(
    host=settings.REDIS_HOST,
    port=settings.REDIS_PORT,
    db=settings.REDIS_DATABASES['CELERY'],
)
l = r.lrange('celery', 0, -1)

# Now import the task you want so you can get its name
from my_django.tasks import my_task

# Now, import your celery app and iterate over all tasks
# from redis and nuke the ones that have a matching name.
from my_django.celery_init import app
for task in l:
    task_headers = json.loads(task)['headers']
    task_name = task_headers["task"]
    if task_name == my_task.name:
        task_id = task_headers['id']
        print("Terminating: %s" % task_id)
        app.control.revoke(task_id, terminate=True)
Note that revoking in this way might not revoke prefetched tasks, so you might not see results immediately.
Also, this answer doesn't support prioritized tasks. If you want to modify it to do that, you'll want some of the tips in my other answer that hacks redis.
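As a complement to the note about prefetched tasks: tasks a worker has already pulled off the queue show up under inspect().reserved(), so a hedged sketch for revoking those by name (reusing my_task and app from the snippet above) could look like this:
from itertools import chain

# reserved() maps each worker name to the requests it has prefetched
# but not started yet; each request dict carries 'id' and 'name'.
reserved = app.control.inspect().reserved() or {}
for request in chain.from_iterable(reserved.values()):
    if request["name"] == my_task.name:
        app.control.revoke(request["id"], terminate=True)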
It looks like flower provides monitoring:
https://github.com/mher/flower
Real-time monitoring using Celery Events
  Task progress and history
  Ability to show task details (arguments, start time, runtime, and more)
  Graphs and statistics
Remote Control
  View worker status and statistics
  Shutdown and restart worker instances
  Control worker pool size and autoscale settings
  View and modify the queues a worker instance consumes from
  View currently running tasks
  View scheduled tasks (ETA/countdown)
  View reserved and revoked tasks
  Apply time and rate limits
  Configuration viewer
  Revoke or terminate tasks
HTTP API
OpenID authentication

Python+Celery: ignore task results on a per-invocation basis?

Is it possible to ignore task results on a per-invocation basis?
For example, so I can ignore the results of tasks when they are being run during a web request, but wait for the result (which might have, e.g., debug info) when I'm running the task interactively?
I know that Tasks have the ignore_result flag, but I'm wondering specifically if it's possible to set ignore_result on a per-invocation basis (not a "global" basis).
Not normally, because ignore_result is a property of a Task that only the workers use (to decide whether to send a result back).
But you could do it if you used your own task parameter (avoid calling it ignore_result), and have the task set its ignore_result based on that:
@task
def mytask(please_ignore_result):
    mytask.ignore_result = please_ignore_result
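For completeness, calling that workaround might look like the following (hypothetical usage; note that it mutates a task-class attribute, so concurrent invocations with different flags on the same worker can race):
# Fire and forget: the worker will drop the result.
mytask.delay(please_ignore_result=True)

# Interactive/debug run: keep the result and wait for it
# (assumes a result backend is configured).
result = mytask.delay(please_ignore_result=False)
print(result.get())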
You can pass ignore_result=True/False as an option when calling apply_async (delay() only forwards task arguments, not execution options):
@app.task
def hello():
    print('hello world')

# storing or ignoring results on a per-invocation basis
res = hello.apply_async(ignore_result=True)
res1 = hello.apply_async(ignore_result=False)
If you are running an older version of Celery you may run into an error here, since the per-call option is only supported in newer releases. The Celery documentation on ignore_result covers this in more detail.
