Get list of pending Celery jobs when no workers - python

I think I must be missing something. On my django server I do the following:
inspector = app.control.inspect()

for element in list:
    result = app.send_task("workerTasks.do_processing_task", args=[element, "spd"], queue="cloud")

scheduled = inspector.scheduled()
reserved = inspector.reserved()
active = inspector.active()

print("-- SCHEDULED")
print(scheduled)
print("-- RESERVED")
print(reserved)
print("-- ACTIVE")
print(active)

global_helper.execute_commands(["sudo rabbitmqctl list_queues"])
global_helper.execute_commands(["celery inspect active_queues"])
(excuse the printing, this is remote and I haven't set up remote debugging)
There are no workers connected to the server, and I get this output:
-- SCHEDULED
None
-- RESERVED
None
-- ACTIVE
None
[ sudo rabbitmqctl list_queues ]
Listing queues ...
[ celery inspect active_queues ]
Error: No nodes replied within time constraint.
So it looks like all of these tools require jobs to be on a worker somewhere.
So my question is, how can I look at tasks which are still waiting to be picked up by a worker?
(I have verified that app.send_task has been called 127 times)

Celery does not provide this facility (probably for a good reason), so you need to use your broker's facilities to find out what you want to know. Here is how for Redis and RabbitMQ.
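As a rough sketch of what that looks like (the queue name "cloud" is taken from the question; the connection details and the redis/pika client libraries are assumptions), you can count the messages still sitting in the broker queue:

# Redis broker: Celery keeps pending messages in a list named after the queue.
import redis

r = redis.Redis(host="localhost", port=6379, db=0)
print("Pending in 'cloud':", r.llen("cloud"))

# RabbitMQ broker: declare the queue passively and read its message count.
import pika

conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = conn.channel()
queue = channel.queue_declare(queue="cloud", passive=True)
print("Pending in 'cloud':", queue.method.message_count)
conn.close()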

Related

Can Celery pass a Status Update to a non-Blocking Caller?

I am using Celery to asynchronously perform a group of operations. There are a lot of these operations and each may take a long time, so rather than send the results back in the return value of the Celery worker function, I'd like to send them back one at a time as custom state updates. That way the caller can implement a progress bar with a change state callback, and the return value of the worker function can be of constant size rather than linear in the number of operations.
Here is a simple example in which I use the Celery worker function add_pairs_of_numbers to add a list of pairs of numbers, sending back a custom status update for every added pair.
#!/usr/bin/env python
"""
Run worker with:
    celery -A tasks worker --loglevel=info
"""
from celery import Celery

app = Celery("tasks", broker="pyamqp://guest@localhost//", backend="rpc://")

@app.task(bind=True)
def add_pairs_of_numbers(self, pairs):
    for x, y in pairs:
        self.update_state(state="SUM", meta={"x": x, "y": y, "x+y": x + y})
    return len(pairs)

def handle_message(message):
    if message["status"] == "SUM":
        x = message["result"]["x"]
        y = message["result"]["y"]
        print(f"Message: {x} + {y} = {x+y}")

def non_looping(*pairs):
    task = add_pairs_of_numbers.delay(pairs)
    result = task.get(on_message=handle_message)
    print(result)

def looping(*pairs):
    task = add_pairs_of_numbers.delay(pairs)
    print(task)
    while True:
        pass

if __name__ == "__main__":
    import sys
    if sys.argv[1:] and sys.argv[1] == "looping":
        looping((3, 4), (2, 7), (5, 5))
    else:
        non_looping((3, 4), (2, 7), (5, 5))
If you run just ./tasks.py it executes the non_looping function. This does the standard Celery thing: it makes a delayed call to the worker function and then uses get to wait for the result. A handle_message callback function prints each message, and the number of pairs added is returned as the result. This is what I want.
$ ./tasks.py
Message: 3 + 4 = 7
Message: 2 + 7 = 9
Message: 5 + 5 = 10
3
Though the non-looping scenario is sufficient for this simple example, the real world task I'm trying to accomplish is processing a batch of files instead of adding pairs of numbers. Furthermore the client is a Flask REST API and therefore cannot contain any blocking get calls. In the script above I simulate this constraint with the looping function. This function starts the asynchronous Celery task, but does not wait for a response. (The infinite while loop that follows simulates the web server continuing to run and handle other requests.)
If you run the script with the argument "looping" it runs this code path. Here it immediately prints the Celery task ID then drops into the infinite loop.
$ ./tasks.py looping
a39c54d3-2946-4f4e-a465-4cc3adc6cbe5
The Celery worker logs show that the add operations are performed, but the caller doesn't define a callback function, so it never gets the results.
(I realize that this particular example is embarrassingly parallel, so I could use chunks to divide this up into multiple tasks. However, in my non-simplified real-world case I have tasks that cannot be parallelized.)
What I want is to be able to specify a callback in the looping scenario. Something like this.
def looping(*pairs):
    task = add_pairs_of_numbers.delay(pairs, callback=handle_message)  # There is no such callback.
    print(task)
    while True:
        pass
In the Celery documentation and all the examples I can find online, there is no way to define a callback function as part of the delay call or its apply_async equivalent. You can only specify one as part of a get call. That's making me think this is an intentional design decision.
In my REST API scenario I can work around this by having the Celery worker process send a "status update" back to the Flask server in the form of an HTTP post, but this seems weird because I'm starting to replicate messaging logic in HTTP that already exists in Celery.
Is there any way to write my looping scenario so that the caller receives callbacks without making a blocking call, or is that explicitly forbidden in Celery?
It's a pattern that is not supported by celery, although you can (somewhat) work around it by posting custom state updates from your task.
Use update_state() to update a task's state:
@app.task(bind=True)
def upload_files(self, filenames):
    for i, file in enumerate(filenames):
        if not self.request.called_directly:
            self.update_state(state='PROGRESS',
                              meta={'current': i, 'total': len(filenames)})
The reason that celery does not support such a pattern is that task producers (callers) are strongly decoupled from task consumers (workers): the only communication between the two is the broker, which carries messages from producers to consumers, and the result backend, which carries results from consumers back to producers. The closest you can get currently is polling the task state, or writing a custom result backend that lets you push events, either via AMQP RPC or Redis subscriptions.
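As a minimal sketch of the polling approach (assuming the add_pairs_of_numbers task and app from the question, the custom "SUM" states it posts via update_state, and a persistent result backend such as redis:// rather than rpc://):

import time
from celery.result import AsyncResult

def poll_task(task_id, interval=1.0):
    # Poll the task's state without a blocking get(); task_id comes from an earlier .delay() call.
    result = AsyncResult(task_id, app=app)
    while not result.ready():
        if result.state == "SUM":            # custom state posted via update_state()
            print("progress:", result.info)  # the meta dict, e.g. {"x": 3, "y": 4, "x+y": 7}
        time.sleep(interval)
    print("final result:", result.get())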

Python Celery: How do I join task results out of order?

I have a simple project where I create a bunch of chunks of work that's not related to each other, create tasks, pass them to Redis, and have a number of workers spread out over a Docker Swarm chew through the queue of long-running tasks. When the workers finish they dump their completed work in an NFS share and send back a text value to the Celery client.
I'm using celery.result.ResultSet's .join() function on the resultset array of asyncresult objects. The join() includes a callback that (for now) simply prints the result.
My problem is join() blocks until it receives each asyncresult value in the order it was given. My swarm is made up of a number of hosts that are vastly different machines, and it's important to me to have results come back as they finish, not in order or once they are all complete.
Is there a way via Celery to properly trigger a callback function as tasks are finished? I've looked at a lot of examples online, and it seems like my only option is to try my luck with asyncio, but Python is not exactly my strong suit.
Func for creating tasks and ResultSet obj:
def populateQueue(encodeTasks):
    r = ResultSet([])
    taskHandles = {}
    for task in encodeTasks:
        try:
            ret = encode.delay(task)
            r.add(ret)
            logging.debug("Task ID: " + str(ret.task_id))
            taskHandles[ret.task_id] = ret
        except:
            logging.info("populateQueue fail: " + str(task.traceback))
    logging.info("Tasks queued: " + str(len(taskHandles)))
    return taskHandles, r
Part of main() which waits for results:
frameCountTotal = getFrameCount(targetFile)
encodeTasks = buildCmdString(targetFile, frameCountTotal, clientCount)
taskHandles, retSet = populateQueue(encodeTasks)
logging.info("Waiting on tasks...")
retSet.join(callback=testCallback)
Thanks in advance
Found an answer to my own question:
ResultSet has another method called join_native(), which I think uses more specific API calls to the broker as long as that broker is one of several known products (RabbitMQ, Redis, etc). Celery's documentation just says that it gives better performance if you meet the broker requirement. What the docs don't say is that it allows for out-of-order returns (at least on Redis, haven't tried RMQ).
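Based on that, the waiting code from the question only needs a one-line change (a sketch; testCallback, populateQueue and encodeTasks are from the snippets above):

# join_native() requires a supported transport such as Redis or RabbitMQ and,
# unlike join(), hands results to the callback as they arrive rather than
# strictly in submission order.
taskHandles, retSet = populateQueue(encodeTasks)
logging.info("Waiting on tasks...")
retSet.join_native(callback=testCallback)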

Azure Batch Job Scheduling: Task doesn't run recurrently

My objective is to schedule an Azure Batch Task to run every 5 minutes from the moment it has been added, and I use the Python SDK to create/manage my Azure resources. I tried creating a Job-Schedule and it automatically created a new Job under the specified Pool.
job_spec = batch.models.JobSpecification(
    pool_info=batch.models.PoolInformation(pool_id=pool_id)
)
schedule = batch.models.Schedule(
    start_window=datetime.timedelta(hours=1),
    recurrence_interval=datetime.timedelta(minutes=5)
)
setup = batch.models.JobScheduleAddParameter(
    'python_test_schedule',
    schedule,
    job_spec
)
batch_client.job_schedule.add(setup)
What I did is then add a task to this new Job. But the task seems to run only once as soon as it is added (like a normal task). Is there something more that I need to do to make the task run recurrently? There doesn't seem to be much documentation and examples of JobSchedule either.
Thank you! Any help is appreciated.
You are correct in that a JobSchedule will create a new job at the specified time interval. Additionally, you cannot have a task "re-run" every 5 minutes once it has completed. You could do either:
Have one task that runs a loop, performing the same action every 5 minutes.
Use a Job Manager to add a new task (that does the same thing) every 5 minutes.
I would probably recommend the 2nd option, as it has a little more flexibility to monitor the progress of the tasks and job and take actions accordingly.
An example client which creates the job might look a bit like this:
job_manager = models.JobManagerTask(
    id='job_manager',
    command_line="/bin/bash -c 'python ./job_manager.py'",
    environment_settings=[
        models.EnvironmentSetting('AZ_BATCH_KEY', AZ_BATCH_KEY)],
    resource_files=[
        models.ResourceFile(blob_sas="https://url/to/job_manager.py", file_name="job_manager.py")],
    authentication_token_settings=models.AuthenticationTokenSettings(
        access=[models.AccessScope.job]),
    kill_job_on_completion=True,  # This will mark the job as complete once the Job Manager has finished.
    run_exclusive=False)  # Whether the Job Manager needs a dedicated VM - this will depend on the nature of the other tasks running on the VM.

new_job = models.JobAddParameter(
    id='my_job',
    job_manager_task=job_manager,
    pool_info=models.PoolInformation(pool_id='my_pool'))

batch_client.job.add(new_job)
Now we need a script to run as the Job Manager on the compute node. In this case I will use Python, so you will need to add a StartTask to your pool (or a JobPrepTask to the job) to install the azure-batch Python package.
Additionally, the Job Manager task will need to be able to authenticate against the Batch API. There are two methods of doing this depending on the scope of activities that the Job Manager will perform. If you only need to add tasks, then you can use the authentication_token_settings attribute, which will add an AAD token environment variable to the Job Manager task with permissions to ONLY access the current job. If you need permission to do other things, like alter the pool or start new jobs, you can pass an account key via an environment variable. Both options are shown above.
The script you run on the Job Manager task could look something like this:
import os
import time
from azure.batch import BatchServiceClient
from azure.batch.batch_auth import SharedKeyCredentials
from azure.batch import models

# Batch account credentials
AZ_BATCH_ACCOUNT = os.environ['AZ_BATCH_ACCOUNT_NAME']
AZ_BATCH_KEY = os.environ['AZ_BATCH_KEY']
AZ_BATCH_ENDPOINT = os.environ['AZ_BATCH_ENDPOINT']
AZ_JOB = os.environ['AZ_BATCH_JOB_ID']  # ID of the job this Job Manager task is running in

# If you're using authentication_token_settings for authentication,
# you can use the AAD token in the environment variable AZ_BATCH_AUTHENTICATION_TOKEN.

def main():
    # Batch client
    creds = SharedKeyCredentials(AZ_BATCH_ACCOUNT, AZ_BATCH_KEY)
    batch_client = BatchServiceClient(creds, base_url=AZ_BATCH_ENDPOINT)

    # You can set up the conditions under which your Job Manager will continue to add tasks here.
    # It could be a timeout, a maximum number of tasks, or you could monitor tasks and act on their status.
    condition = True
    task_id = 0
    task_params = {
        "command_line": "/bin/bash -c 'echo hello world'",
        # Any other task parameters go here.
    }

    while condition:
        new_task = models.TaskAddParameter(id=str(task_id), **task_params)
        batch_client.task.add(AZ_JOB, new_task)
        task_id += 1
        # Perform any additional logic here - for example:
        # - Check the status of the tasks, e.g. stdout, exit code etc.
        # - Process any output files for the tasks
        # - Delete any completed tasks
        # - Error handling for tasks that have failed
        time.sleep(300)  # Wait for 5 minutes (300 seconds)

    # The Job Manager task has completed - it will now exit and the job will be marked as complete.

if __name__ == '__main__':
    main()
job_spec = batchmodels.JobSpecification(
    pool_info=pool_info,
    job_manager_task=batchmodels.JobManagerTask(
        id="JobManagerTask",
        # specify the command that needs to run recurrently
        command_line="/bin/bash -c \"python3 task.py\""
    ))
Add the task that you want to run recurrently as a JobManagerTask inside the JobSpecification, as shown above. Because the job schedule creates a new job at every recurrence interval, and each new job runs its Job Manager task, this task will effectively run recurrently.
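A sketch of wiring this job_spec into a schedule like the one from the question (the 5-minute recurrence_interval and the schedule ID are taken from there):

import datetime
from azure.batch import models as batchmodels

# A new job (and therefore a new run of the Job Manager task) is created every 5 minutes.
schedule = batchmodels.Schedule(
    recurrence_interval=datetime.timedelta(minutes=5)
)
setup = batchmodels.JobScheduleAddParameter(
    id='python_test_schedule',
    schedule=schedule,
    job_specification=job_spec
)
batch_client.job_schedule.add(setup)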

Celery transfer command line arguments to Task

I am struggling with transferring additional command line arguments to a celery task. I can set the desired attribute in a bootstep, however the same attribute is empty when accessed directly from the task (I guess it gets overridden).
class Arguments(bootsteps.Step):
    def __init__(self, worker, environment, **options):
        ArgumentTask.args = {'environment': environment}
        # this works
        print ArgumentTask.args
Here is the custom task
class ArgumentTask(Task):
    abstract = True
    _args = {}

    @property
    def args(self):
        return self._args

    @args.setter
    def args(self, value):
        self._args.update(value)
And actual task
@celery.task(base=ArgumentTask, bind=True, name='jobs.send')
def send(self):
    # this prints empty dictionary
    print self.args
Do I need to use some additional persistence layer, e.g. persistent objects, or am I missing something really obvious?
Similar question
It does not seem to be possible. The reason is that your task could be consumed by any consumer of the queue, each potentially started with different command line parameters, so its processing should not depend on a particular worker's configuration.
If your problem is managing dev/prod environments, this is how we handled it in our project:
Each environment is jailed in its own venv, with a configuration that makes the project aware of its environment (in our case it's just the db links in the configuration that change). Each environment has its own queues, and its celery workers are launched with this command:
/path/venv/bin/celery worker -A async.myapp --workdir /path -E -n celery-name#server -Ofair
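For illustration, the producer side then just routes tasks to the environment-specific queue (the queue naming scheme and broker URL below are hypothetical, not from this answer):

from celery import Celery

ENVIRONMENT = 'dev'  # e.g. read from the environment-specific configuration

app = Celery('myapp', broker='amqp://localhost//')  # hypothetical broker URL

# 'jobs.send' is the task name from the question; the per-environment queue name is made up.
app.send_task('jobs.send', queue='jobs-{}'.format(ENVIRONMENT))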
Hope it helped.
If you really want to dig into it, each task can access app.control, which lets you run control operations on celery (such as some monitoring), but I didn't find anything helpful there.

Cancel an already executing task with Celery?

I have been reading the doc and searching but cannot seem to find a straight answer:
Can you cancel an already executing task? (as in the task has started, takes a while, and half way through it needs to be cancelled)
I found this from the doc at Celery FAQ
>>> result = add.apply_async(args=[2, 2], countdown=120)
>>> result.revoke()
But I am unclear if this will cancel queued tasks or if it will kill a running process on a worker. Thanks for any light you can shed!
revoke cancels the task execution. If a task is revoked, the workers ignore the task and do not execute it. If you don't use persistent revokes, your task can still be executed after a worker restart.
https://docs.celeryq.dev/en/stable/userguide/workers.html#worker-persistent-revokes
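To make revokes persistent, start the worker with a state file via the --statedb option, for example (the app name and path are placeholders):
celery -A proj worker --statedb=/var/run/celery/worker.state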
revoke has a terminate option which is False by default. If you need to kill the executing task you need to set terminate to True.
>>> from celery.task.control import revoke
>>> revoke(task_id, terminate=True)
https://docs.celeryq.dev/en/stable/userguide/workers.html#revoke-revoking-tasks
In Celery 3.1, the API of revoking tasks is changed.
According to the Celery FAQ, you should use result.revoke:
>>> result = add.apply_async(args=[2, 2], countdown=120)
>>> result.revoke()
or if you only have the task id:
>>> from proj.celery import app
>>> app.control.revoke(task_id)
@0x00mh's answer is correct, however recent celery docs say that using the terminate option is "a last resort for administrators" because you may accidentally terminate another task which started executing in the meantime. Possibly a better solution is combining terminate=True with signal='SIGUSR1' (which causes the SoftTimeLimitExceeded exception to be raised in the task).
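That combination would look something like this (a sketch; app and task_id are as in the other answers):

# Raises SoftTimeLimitExceeded inside the running task instead of killing the
# worker child process outright, so the task gets a chance to clean up.
app.control.revoke(task_id, terminate=True, signal='SIGUSR1')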
Per the 5.2.3 documentation, the following command can be run:
celery.control.revoke(task_id, terminate=True, signal='SIGKILL')
where
celery = Celery(app.name, broker=app.config['CELERY_BROKER_URL'])
Link to the doc: https://docs.celeryq.dev/en/stable/reference/celery.app.control.html?highlight=revoke#celery.app.control.Control.revoke
In addition, there is a less satisfactory way to stop a task (abortable tasks), but it comes with a number of reliability caveats; for more details, see:
http://docs.celeryproject.org/en/latest/reference/celery.contrib.abortable.html
You define the celery app with a broker and backend, something like:
from celery import Celery
celeryapp = Celery('app', broker=redis_uri, backend=redis_uri)
When you call send_task it returns a unique id for the task:
task_id = celeryapp.send_task('run.send_email', queue = "demo")
To revoke the task you need the celery app and the task id:
celeryapp.control.revoke(task_id, terminate=True)
from celery.app import default_app

revoked = default_app.control.revoke(task_id, terminate=True, signal='SIGKILL')
print(revoked)
See the following options for tasks: time_limit and soft_time_limit (you can also set them for workers). If you want to control more than just the execution time, see the expires argument of the apply_async method.
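For illustration, a sketch of those options (the limit values and the do_work/cleanup helpers are made up):

from celery.exceptions import SoftTimeLimitExceeded

@app.task(soft_time_limit=60, time_limit=120)  # seconds; arbitrary values
def long_running():
    try:
        do_work()   # hypothetical helper
    except SoftTimeLimitExceeded:
        cleanup()   # hypothetical helper

# Discard the task if no worker has started it within 10 minutes.
long_running.apply_async(expires=600)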
