I'm using Celery 3.0.12.
I have two queues: Q1, Q2.
In general I put the main task in Q1, which then calls subtasks that go in Q2.
I don't want to store any results for the subtasks, so my subtasks have the decorator @celery.task(ignore_result=True).
My main task should now wait until the subtask has finished. Because I store no results, I can't use AsyncResult. Is there a way for the main task to wait until the subtask finishes without storing its state in the backend? All my attempts with AsyncResult were unsuccessful (it relies on the backend). It seems get() also relies on the backend.
The whole story in code:
@celery.task(ignore_result=True)
def subtask():
    # Do something
    pass

@celery.task
def maintask():
    # Do something
    # Call subtask on Q2:
    res = subtask.apply_async(queue='Q2')
    # Need to wait till subtask finishes
    # NOT WORKING (NEVER RETURNS)
    res.get()
I'm monitoring the whole application with Celery Flower and I can see that the subtask is finishing successfully. How can Celery detect that state? I browsed their code but couldn't find out how they do the detection.
My main task should now wait until the subtask has finished.
You should never wait for a subtask as this may lead to resource starvation and deadlock (all
tasks waiting for another task, but no more workers to process them).
Instead you should use a callback to do additional actions after the subtask completes
(see the Canvas guide in the Celery user guide).
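The starvation risk can be reproduced outside Celery. The sketch below is only an analogy built on Python's concurrent.futures, not Celery API: a pool with a single worker slot stands in for a cluster with no free workers, and all function names are made up for illustration. Blocking on the subtask starves it, while registering a callback lets the slot be released so the subtask can run.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError

# One worker slot stands in for a cluster with no spare workers.
pool = ThreadPoolExecutor(max_workers=1)


def subtask():
    return "subtask done"


def blocking_main():
    # Anti-pattern: this function occupies the only slot while it waits,
    # so the subtask it submitted can never start.
    inner = pool.submit(subtask)
    try:
        return inner.result(timeout=0.2)
    except TimeoutError:
        inner.cancel()  # still pending, so it can be cancelled
        return "starved"


def callback_main(results):
    # Preferred shape: submit, attach a callback, and return immediately.
    inner = pool.submit(subtask)
    inner.add_done_callback(lambda fut: results.append(fut.result()))
    return inner


blocked = pool.submit(blocking_main).result()

results = []
inner_future = pool.submit(callback_main, results).result()
inner_future.result(timeout=5)  # the subtask runs once the slot is free
pool.shutdown(wait=True)        # ensures the callback has run as well
```

Here `blocked` ends up as "starved" because the waiting function never gave up its slot, whereas the callback variant completes normally. In Celery the equivalent of the callback is a linked task or a chain from the Canvas guide.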
I'm monitoring the whole application with Celery Flower and I can see that the subtask is
finishing successfully. How can Celery detect that state? I browsed their code but couldn't
find out how they do the detection.
Flower and other monitors do not use results (task state); instead they use what we call events.
Event messages are emitted when certain actions occur in the worker, and this becomes a transient stream of messages. Processes can subscribe to certain events (or all of them) to monitor the cluster.
The events are separate from task states because:
Events are not persistent (transient).
Missing an event is not regarded as a critical failure.
Complex fields are not serialized.
Events are for diagnostic and informational purposes, and should not be used to introspect task return values or exceptions, for example, as only the repr() of these is stored to make sure monitors can be written in other languages, and big fields may be truncated to ensure faster transmission.
Related
I have three Celery workers as follows, each running on a different ECS node:
Producer: Keeps generating & sending tasks to the consumer worker. Each task is expected to take several minutes to compute and has a database record.
Consumer: Receives computation tasks and immediately starts execution.
Watchdog: Periodically inspects database records, finds out computation tasks that are executing, and then does celery inspect active to verify whether there is actually a worker carrying out the computation.
We ensured that when the Consumer node is being terminated, the Celery worker on it begins a graceful shutdown, so that the ongoing computation can finish normally. Because Celery unregisters a gracefully stopping worker, the Consumer becomes invisible to the Watchdog, which will mistakenly think a computation task has been mysteriously lost... even though the Consumer is still working on the task.
Is it possible to let a Celery worker broadcast an "I am dying" message upon receiving a warm shutdown signal? Or even better, can we somehow let the Watchdog worker still see workers that are shutting down?
Yes, it is possible. Nodes in the Celery cluster I am responsible for do something similar. Here is a snippet:
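from celery.signals import worker_ready, worker_shutdown

# app, _LOGGER and the _handle_* helpers are defined elsewhere in the project.
@worker_shutdown.connect
def handle_worker_shutdown(**kwargs):
    _handle_worker_shutdown(app, _LOGGER, **kwargs)

@worker_ready.connect
def handle_worker_ready(**kwargs):
    _handle_worker_ready(app, _LOGGER, **kwargs)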
There are a few other very useful signals that you should have a look at, but these two are essential. Maybe worker_shutting_down is more suitable for your use case...
I have set up an SGECluster scheduler with the correct settings and confirmed I can connect to the dashboard and submit jobs to my SGE queue. I would like to use the adapt method to scale the number of workers depending on the incoming task load. These tasks are generally unrelated, so they can be run by individual workers in their own process.
I've noticed that the scheduler does not appear to register tasks (at least in the dashboard) until a worker is available. If that first worker takes some time to become available and I submit tasks to the scheduler, it will not know that it needs to scale and therefore the extra workers will end up at the back of the queue. Is it possible to prompt the scheduler to recognize that tasks have arrived before the first worker has connected to the scheduler, and to put in queue requests for workers appropriately?
I can get the workers to queue if I use scale(n) instead of adapt.
cluster = SGECluster(
    queue=queue_name,
    memory=maximum_memory,
    processes=worker_processes,
    env_extra=env_list,
    scheduler_options=scheduler_options,
    log_directory=log_dir,
    job_name=name,
    walltime=walltime,
    resource_spec=f"{mem_spec}={maximum_memory}",
    job_extra=job_extra_list,
)
# if the first worker takes ages to begin running, then only one worker will be requested
# and tasks submitted in the interim do not adjust the scheduler behaviour
# cluster.adapt(minimum=1, maximum=20)
# queues up the requested workers straight away but doesn't adapt to load
cluster.scale(20)
There are eight tasks altogether running in Celery at different times. All of them are event-driven: after a certain event, they get fired, and the particular task then works continuously until certain conditions are satisfied.
I have registered a task which checks for certain conditions for almost two minutes. This task works fine most of the time. But sometimes the expected behavior of the task is not attained.
The signature of the task is as below:
tasks.py
import time
from celery import shared_task

@shared_task()
def some_celery_task(a, b):
    main_time_end = time.time() + 120
    while time.time() < main_time_end:
        ...
        # some db operations here with given function arguments 'a' and 'b'
        # this part of the task gets executed most of the time
    if time.time() > main_time_end:
        ...
        # some db operations here.
        # this part is the part of the task that doesn't get executed sometimes
views.py
# the other part of the view not mentioned here
# only the task invoked part
some_celery_task.apply_async(args=(5, 9), countdown=0)
I am confused about the Celery task timeout scenarios. Does a timeout mean the task stops where it times out, or will it be retried automatically?
It would be a great help if someone could give a clear explanation of timeouts and retries.
What could be the reason behind the explained scenarios above? Any help on this question will be highly appreciated. Thank you.
Check the Celery documentation on Tasks; the basics are documented very well.
If a task fails or is terminated, it will have the states.FAILURE status. It will not be retried unless specifically coded to do so. If logging is correctly configured, you might see exception messages in the logs in case of timeouts or other code exceptions.
When the Celery task TIME_LIMIT is exceeded, the task is terminated right away:
The worker process handling the task will be killed and replaced with a new one.
Also, a TimeLimitExceeded exception will be raised with a message like Task handler raised error: "TimeLimitExceeded(2700)"
If the Celery SOFT_TIME_LIMIT is set, is smaller than TIME_LIMIT, and is exceeded, then a SoftTimeLimitExceeded exception will be raised, allowing it to be caught in the task so that clean-up actions can be performed.
When a worker consumes a message (task) from the broker queue, the broker needs to know that the message was consumed successfully. To confirm successful consumption, the worker acknowledges (ACKs) the message to the broker. Until a message is acknowledged, it is not deleted from the broker, but it is also not available for consumption ("invisible"). If it is never acknowledged, the message will be redelivered to the broker queue and become available for consumption again.
The redelivery logic for unacknowledged messages depends on the broker:
An AMQP (RabbitMQ) broker tracks the connection status with the worker and, if the connection is lost, returns the message to the queue.
A Redis or SQS broker has its own timeout, after which the message will be redelivered to the broker queue if not ACKed.
By default, the Celery worker acknowledges the message right at the start of the task.
If ACKS_LATE is set, the worker acknowledges to the broker only after the task has been executed.
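The difference between early and late acknowledgement can be seen in a toy model. The class below is a plain in-memory simulation written for this explanation, not a real broker client or Celery API; ToyBroker and run_task are invented names.

```python
from collections import deque


class ToyBroker:
    """In-memory model of ack-based delivery; not a real broker client."""

    def __init__(self, messages):
        self.ready = deque(messages)  # messages available for consumption
        self.unacked = {}             # delivery tag -> message ("invisible")
        self._next_tag = 0

    def consume(self):
        self._next_tag += 1
        message = self.ready.popleft()
        self.unacked[self._next_tag] = message
        return self._next_tag, message

    def ack(self, tag):
        del self.unacked[tag]         # the broker may now delete the message

    def redeliver_unacked(self):
        # Models a lost AMQP connection or an expired Redis/SQS timeout.
        self.ready.extend(self.unacked.values())
        self.unacked.clear()


def run_task(broker, acks_late, worker_crashes):
    tag, _message = broker.consume()
    if not acks_late:
        broker.ack(tag)               # default: ack before the task body runs
    if worker_crashes:
        broker.redeliver_unacked()    # broker reclaims whatever was not ACKed
        return
    if acks_late:
        broker.ack(tag)               # late ack: only after the body finished


early = ToyBroker(["job-1"])
run_task(early, acks_late=False, worker_crashes=True)

late = ToyBroker(["job-1"])
run_task(late, acks_late=True, worker_crashes=True)
```

After the simulated crash, `early.ready` is empty (the message was already acknowledged and is lost), while `late.ready` contains "job-1" again. That second case is why tasks using late acknowledgement must tolerate running more than once.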
One can RETRY a task by catching the exception in the task and sending the same task back to the broker for re-execution; the same task with the same id will then be queued at the broker. The countdown option lets you specify a delay before the task is retried.
Celery task execution and other options can be set globally in settings.py or per task as arguments.
The recommended way is to design tasks and logic so that the following events are treated as legitimate and normal (though not actually expected) to happen sometimes, and to be ready for them:
tasks may fail (the next run of the same task may do the work for both, or check that specific work was not done and re-fire the task)
same task may run again (idempotency)
similar tasks can be run simultaneously (locking)
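The idempotency and locking points can be sketched in a few lines. This is a single-process illustration with invented names (run_once, _done_ids); workers on several hosts would need a shared store and a shared lock, such as a database unique constraint or a Redis lock, which is an assumption outside this sketch.

```python
import threading

_done_ids = set()              # hypothetical stand-in for a shared store
_done_lock = threading.Lock()  # in-process only; see the note above


def run_once(job_id, work):
    """Run work() at most once per job_id, even if delivered twice."""
    with _done_lock:
        if job_id in _done_ids:
            return False       # duplicate delivery: skip the work
        work()
        _done_ids.add(job_id)  # mark as done only after the work succeeded
        return True


calls = []
first = run_once("order-42", lambda: calls.append("charged"))
second = run_once("order-42", lambda: calls.append("charged"))
```

The second call returns False and leaves `calls` with a single entry, so a redelivered or re-fired task does no harm. Because the id is recorded only after work() succeeds, a failed run can still be repeated later.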
Suppose all my tasks on a celery queue are hitting a 3rd party API. However, the API has a rate limit, which I am keeping track of (there is a day limit and hourly limit which I need to respect). As soon as I hit the rate limit, I want to pause consumption of new tasks, and then resume when I know I am good.
I achieved this by using the following two tasks:
@celery.task()
def cancel_api_queue(minutes_to_resume):
    resume_api_queue.apply_async(countdown=minutes_to_resume*60, queue='celery')
    celery.control.cancel_consumer('third_party', reply=True)

@celery.task(default_retry_delay=300, max_retries=5)
def resume_api_queue():
    celery.control.add_consumer('third_party', destination=['y@local'])
Then I can keep submitting my 3rd party API tasks, and as soon as my consumer is added back, all my tasks get consumed. Great.
However, since I have no consumer on this queue, this seems to mean I cannot see the jobs that are being submitted in Flower any more (until my consumer gets added).
Is there something I am doing wrong? Can I achieve this 'pause' another way to allow me to continue to see submitted jobs in flower?
p.s. maybe this is related to this issue, but not 100% sure: https://github.com/celery/celery/issues/1452
I am using amqp broker if that makes a difference.
thanks girls and boys.
I'd suspect that peeking into contents of the queue messages before a worker picks them up is not really part of Flower's intended design. Therefore, if you stop consuming tasks from a queue, the best Flower can do is show you how many of them have been enqueued as a single number on the "Broker" pane.
One hackish way to observe the internals of the incoming tasks could be to add an intermediate dummy "forwarding" task, which simply forwards the message from one queue (let us call it query_inbox) to another (say, query_processing).
E.g. something like:
@celery.task(queue='query_inbox')
def query(params):
    process_query.delay(params)

@celery.task(queue='query_processing')
def process_query(params):
    ...  # do rate-limited stuff
Now you may stop consuming tasks from query_processing, but you will still be able to observe their parameters as they flow through the query_inbox worker.
I am running into a use case where I would like to have control over how and when celery workers dequeue a task for processing from rabbitmq. Dequeuing will be synchronized with an external event that happens out of celery context, but my concern is whether celery gives me any flexibility to control dequeueing of tasks? I tried to investigate and below are a few possibilities:
Make use of basic.get instead of basic.consume, where basic.get is triggered based upon external event. However, I see celery defaults to basic.consume (push) semantics. Can I override this behavior without modifying the core directly?
Custom remote control the workers as and when the external event is triggered. However, from the docs it isn't very clear to me how remote control commands can help me to control dequeueing of the tasks.
I am very much inclined to continue using celery and possibly keep away from writing a custom queue processing solution on top of AMQP.
With remote control commands you can pause or resume message consumption from a given queue.
celery.control.cancel_consumer('celery')
The command above instructs all workers to stop consuming (dequeuing) messages from the default celery queue.
celery.control.add_consumer('celery')
The command above resumes consumption. Remote control commands accept a destination argument, which lets you send the request to specific workers only.
Two more exotic options to consider: (1) define a custom exchange type in the Rabbit layer. This allows you to create routing rules that control which tasks are sent to which queues. (2) define a custom Celery mediator. This allows you to control which tasks move from queues to worker pools, and when.