how to run this kind of celery task properly?
#app.task
def add(x)
x + 1
def some_func():
result = 'result'
for i in range(10):
task_id = uuid()
add.apply_async((i,)), task_id=task_id)
return result
I need all tasks to be performed sequentially after the previous one is completed.
I tried using time.sleep() but in this case returning result waits until all tasks are completed. But I need the result returned and all 10 tasks are running sequentially in the background.
there is a group() in celery, but it runs tasks in parallel
Finally, I solved it by using immutable signature and chain
tasks = [
add.si(x).set(task_id=uuid())
for x in range(10)
]
chain(*tasks).apply_async()
If some_func() is executed outside Celery (say a script is used as "producer" to just send those tasks to be executed), then nothing stops you from calling .get() on AsyncResult to wait for task to finish, and loop that as much as you like.
If, however, you want to execute that loop as some sort of Celery workflow, then you have to build a Chain and use it.
Related
I'm a lazy guy. Instead of having to manually do:
res = task.result()
after a scheduled task
loop = asyncio.get_event_loop()
task = loop.create_task(some_func())
has finished, I would like to have some piece of blocking code after task = ... that waits until the task is done and directly assigns the output result to the variable res.
Besides the convenience of not having to do task.result() every time manually, I'm also running a bunch of tasks sequentially as independent jupyter notebook cells, and I only want the next task to start after the previous one has completely finished. It is ok if the jupyter notebook kernel is blocked by the task while it's running. This doesn't work:
loop = asyncio.get_event_loop()
task = loop.create_task(some_func())
# blocking code to wait for task to finish
# ...
# res = task.result()
loop = asyncio.get_event_loop()
task = loop.create_task(some_other_func())
# blocking code to wait for task to finish
# ...
# res = task.result()
as it would schedule both coros immediately for execution and run them at the same time, I think.
Obviously, the variable res will be overwritten by subsequent tasks, but that's ok, because I only need to look at res when I'm running a single instance of this task. The main requirement is to get the tasks running as a sequential chain and to just be able to do print(res) after the task has finished.
I am using Celery to asynchronously perform a group of operations. There are a lot of these operations and each may take a long time, so rather than send the results back in the return value of the Celery worker function, I'd like to send them back one at a time as custom state updates. That way the caller can implement a progress bar with a change state callback, and the return value of the worker function can be of constant size rather than linear in the number of operations.
Here is a simple example in which I use the Celery worker function add_pairs_of_numbers to add a list of pairs of numbers, sending back a custom status update for every added pair.
#!/usr/bin/env python
"""
Run worker with:
celery -A tasks worker --loglevel=info
"""
from celery import Celery
app = Celery("tasks", broker="pyamqp://guest#localhost//", backend="rpc://")
#app.task(bind=True)
def add_pairs_of_numbers(self, pairs):
for x, y in pairs:
self.update_state(state="SUM", meta={"x":x, "y":y, "x+y":x+y})
return len(pairs)
def handle_message(message):
if message["status"] == "SUM":
x = message["result"]["x"]
y = message["result"]["y"]
print(f"Message: {x} + {y} = {x+y}")
def non_looping(*pairs):
task = add_pairs_of_numbers.delay(pairs)
result = task.get(on_message=handle_message)
print(result)
def looping(*pairs):
task = add_pairs_of_numbers.delay(pairs)
print(task)
while True:
pass
if __name__ == "__main__":
import sys
if sys.argv[1:] and sys.argv[1] == "looping":
looping((3,4), (2,7), (5,5))
else:
non_looping((3,4), (2,7), (5,5))
If you run just ./tasks it executes the non_looping function. This does the standard Celery thing: makes a delayed call to the worker function and then uses get to wait for the result. A handle_message callback function prints each message, and the number of pairs added is returned as the result. This is what I want.
$ ./task.py
Message: 3 + 4 = 7
Message: 2 + 7 = 9
Message: 5 + 5 = 10
3
Though the non-looping scenario is sufficient for this simple example, the real world task I'm trying to accomplish is processing a batch of files instead of adding pairs of numbers. Furthermore the client is a Flask REST API and therefore cannot contain any blocking get calls. In the script above I simulate this constraint with the looping function. This function starts the asynchronous Celery task, but does not wait for a response. (The infinite while loop that follows simulates the web server continuing to run and handle other requests.)
If you run the script with the argument "looping" it runs this code path. Here it immediately prints the Celery task ID then drops into the infinite loop.
$ ./tasks.py looping
a39c54d3-2946-4f4e-a465-4cc3adc6cbe5
The Celery worker logs show that the add operations are performed, but the caller doesn't define a callback function, so it never gets the results.
(I realize that this particular example is embarrassingly parallel, so I could use chunks to divide this up into multiple tasks. However, in my non-simplified real-world case I have tasks that cannot be parallelized.)
What I want is to be able to specify a callback in the looping scenario. Something like this.
def looping(*pairs):
task = add_pairs_of_numbers.delay(pairs, callback=handle_message) # There is no such callback.
print(task)
while True:
pass
In the Celery documentation and all the examples I can find online (for example this), there is no way to define a callback function as part of the delay call or its apply_async equivalent. You can only specify one as part of a get callback. That's making me think this is an intentional design decision.
In my REST API scenario I can work around this by having the Celery worker process send a "status update" back to the Flask server in the form of an HTTP post, but this seems weird because I'm starting to replicate messaging logic in HTTP that already exists in Celery.
Is there any way to write my looping scenario so that the caller receives callbacks without making a blocking call, or is that explicitly forbidden in Celery?
It's a pattern that is not supported by celery although you can (somewhat) trick it out by posting custom state updates to your task as described here.
Use update_state() to update a task’s state:.
def upload_files(self, filenames):
for i, file in enumerate(filenames):
if not self.request.called_directly:
self.update_state(state='PROGRESS',
meta={'current': i, 'total': len(filenames)})```
The reason that celery does not support such a pattern is that task producers (callers) are strongly decoupled from the task consumers (workers) with the only communications between the two being the broker to support communication from producers to consumers and the result backend supporting communications from consumers to producers. The closest you can get currently is with polling a task state or writing a custom result backend that will allow you to post events either via AMP RPC or redis subscriptions.
Maybe I am misunderstanding something about celery but I have been stuck on this for quite a while.
I have a bunch of simple subtasks that I want to run in parallel, and I want to iterate over them as they complete rather than waiting for all of them to complete. I tried this:
def task_generator():
for row in db:
yield mytask.s(row)
from celery.result import ResultSet
r = ResultSet(t.delay() for t in task_generator())
for result in r.iterate():
print result
However celery runs all of the tasks first and the iteration begins only after all of the tasks are completed, despite the docs for ResultSet.iterate reading "Iterate over the return values of the tasks as they finish one by one."
So how do I iterate over the task results as they are completed?
I've implemented this, and have used it in the past. I think .iterate() is deprecated though, so I am working on a new solution myself.
#task
def task_one(foo, bar):
return foo + bar
subtasks = []
for x in range(10):
subtasks.append(task_one.s(x, x+1))
results = celery.Group(subtasks)() # call this
for result in results.iterate(propagate=False):
answer = result.iteritems().next()
I'm writing an application which needs to run a series of tasks in parallel and then a single task with the results of all the tasks run:
#celery.task
def power(value, expo):
return value ** expo
#celery.task
def amass(values):
print str(values)
It's a very contrived and oversimplified example, but hopefully the point comes across well. Basically, I have many items which need to run through power, but I only want to run amass on the results from all of the tasks. All of this should happen asynchronously, and I don't need anything back from the amass method.
Does anyone know how to set this up in celery so that everything is executed asynchronously and a single callback with a list of the results is called after all is said and done?
I've setup this example to run with a chord as Alexander Afanasiev recommended:
from time import sleep
import random
tasks = []
for i in xrange(10):
tasks.append(power.s((i, 2)))
sleep(random.randint(10, 1000) / 1000.0) # sleep for 10-1000ms
callback = amass.s()
r = chord(tasks)(callback)
Unfortunately, in the above example, all tasks in tasks are started only when the chord method is called. Is there a way that each task can start separately and then I could add a callback to the group to run when everything has finished?
Here's a solution which worked for my purposes:
tasks.py:
from time import sleep
import random
#celery.task
def power(value, expo):
sleep(random.randint(10, 1000) / 1000.0) # sleep for 10-1000ms
return value ** expo
#celery.task
def amass(results, tasks):
completed_tasks = []
for task in tasks:
if task.ready():
completed_tasks.append(task)
results.append(task.get())
# remove completed tasks
tasks = list(set(tasks) - set(completed_tasks))
if len(tasks) > 0:
# resend the task to execute at least 1 second from now
amass.delay(results, tasks, countdown=1)
else:
# we done
print results
Use Case:
tasks = []
for i in xrange(10):
tasks.append(power.delay(i, 2))
amass.delay([], tasks)
What this should do is start all of the tasks as soon as possible asynchronously. Once they've all been posted to the queue, the amass task will also be posted to the queue. The amass task will keep reposting itself until all of the other tasks have been completed.
Celery has plenty of tools for most of workflows you can imagine.
It seems you need to get use of chord. Here's a quote from docs:
A chord is just like a group but with a callback. A chord consists of
a header group and a body, where the body is a task that should
execute after all of the tasks in the header are complete.
Taking a look at this snippet from your question, it looks like you are passing a list as the chord header, rather than a group:
from time import sleep
import random
tasks = []
for i in xrange(10):
tasks.append(power.s((i, 2)))
sleep(random.randint(10, 1000) / 1000.0) # sleep for 10-1000ms
callback = amass.s()
r = chord(tasks)(callback)
Converting the list to a group should result in the behaviour you're expecting:
...
callback = amass.s()
tasks = group(tasks)
r = chord(tasks)(callback)
The answer that #alexander-afanasiev gave you is essentially right: use a chord.
Your code is OK, but tasks.append(power.s((i, 2))) is not actually executing the subtask, just adding subtasks to a list. It's chord(...)(...) the one that send as many messages to the broker as subtasks you have defined in tasks list, plus one more message for the callback subtask. When you call chord it returns as soon as it can.
If you want to know when the chord has finished you can poll for completion like with a single task using r.ready() in your sample.
#tasks.py
from celery.decorators import task
#task()
def add(x, y):
add.delay(1, 9)
return x + y
>>> import tasks
>>> res = tasks.add.delay(5, 2)
>>> res.result()
7
If I run this code, I expect tasks to be continously added to the queue. But it's not! Only the first task (5,2) gets added to the queue and processed.
There should continuously be tasks being added, due to this line: "add.delay(1,9)"
Note: I need each task to execute another task.
As far as I can see, a periodic_task decorator is creating preiodic tasks, task creates just one task. And delay just executes it asynchronically.
You should just use periodic_task, instead of recursion.
add inside function body refers to original function, not its decorated version.
If you just need to run task repeatedly, use #periodic_task instead. You only need recursion if delay is different each time. In this case, subclass Task instead of using decorator and you'll be able to use recursion without a problem.
You should look at subtasks and callbacks, might give you the answer you are looking for
http://celeryproject.org/docs/userguide/tasksets.html