I am trying to find a way to delete all the currently queued tasks with a given name from a Celery queue. From the official documentation, I know I could inspect the workers and revoke the tasks by looking up their name and then getting their IDs, like this:
import logging
from itertools import chain

from celery.app.control import Control, Inspect


def drop_celery_task(options):
    def _get_tasks_id(workers: dict, tasks_ids: list, task_name: str):
        """
        Get task ids with the given name included inside the given `workers` tasks.

        {'worker1.example.com': [
            {'name': 'tasks.sleeptask', 'id': '32666e9b-809c-41fa-8e93-5ae0c80afbbf',
             'args': '(8,)', 'kwargs': '{}'}]
        }
        """
        # Inspect calls return None when no worker replies.
        for worker in workers or {}:
            if not workers[worker]:
                continue
            for _task in workers[worker]:
                # scheduled() nests the task description under a "request" key.
                _task = _task.get("request", _task)
                if _task["name"].split(".")[-1] == task_name:
                    tasks_ids.append(_task["id"])

    task_name = options.drop_celery_task["name"]
    # `celery_app` is the application's Celery instance, defined elsewhere.
    i = Inspect(app=celery_app)  # Inspect all nodes.
    registered = i.registered()
    if not registered:
        raise Exception("No registered tasks found")
    if not any(task_name == worker.split(".")[-1] for worker in chain(*list(registered.values()))):
        raise Exception(f"Task not registered: {task_name}")

    tasks_ids = []
    _get_tasks_id(i.active(), tasks_ids, task_name)
    _get_tasks_id(i.scheduled(), tasks_ids, task_name)
    _get_tasks_id(i.reserved(), tasks_ids, task_name)

    if tasks_ids:
        for task_id in tasks_ids:
            Control(app=celery_app).revoke(task_id)
    else:
        logging.info(f"No active/scheduled/reserved task found with the name {task_name}")
But this code only revokes the tasks fetched or pre-fetched by the celery workers, not the ones still in the queue (using Redis as backend). Any advice on how to remove the ones in Redis using celery commands, or prevent the workers from accepting tasks with a given name?
I ended up identifying the IDs of the tasks with the name I wanted in Redis (using a redis client, not celery commands) and then revoking those IDs through the Control(app=celery_app).revoke(task_id) command. In Redis the queues are list objects under the key with the name of the queue.
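The Redis lookup described above could be sketched roughly as follows. This assumes Celery's default JSON serialization and message protocol 2, where each entry in the Redis list is a JSON document whose `headers` carry the task name (`task`) and the task ID (`id`). `find_task_ids` is a hypothetical helper name, not part of Celery or redis-py.

```python
import json


def find_task_ids(raw_messages, task_name):
    """Return the IDs of queued messages whose task name ends with `task_name`."""
    task_ids = []
    for raw in raw_messages:
        message = json.loads(raw)
        headers = message.get("headers") or {}
        # Compare only the last dotted component, as the inspect-based code does.
        if headers.get("task", "").split(".")[-1] == task_name:
            task_ids.append(headers["id"])
    return task_ids
```

With a redis-py client, the raw messages could come from something like `client.lrange('celery', 0, -1)`, where `celery` is only the default queue name. Each returned ID can then be passed to `Control(app=celery_app).revoke(task_id)`. Note that revoking does not delete the message from Redis; workers discard it when they receive it.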
In the following code, an API gives a task to a task broker, which puts it in a queue, where it is picked up by a worker. The worker then executes the task and notifies the task broker (using a Redis message channel) that it is done, after which the task broker removes the task from its queue. This works.
What I'd like is for the task broker to then return the result of the task to the API. I'm unsure how to do so, since the code is asynchronous and I'm having difficulty figuring it out. Can you help?
Simplified and incomplete, the code is roughly as follows.
The API code:
@router.post('', response_model=BaseDocument)
async def post_document(document: BaseDocument):
    """Create the document with a specific type and an optional name given in the payload"""
    task = DocumentTask({ <SNIP>
    })
    task_broker.give_task(task)
    result = await task_broker.get_task_result(task)
    return result
The task broker code: the first part gives the task, the second part removes the task, and the final part is what I assume should be a blocking call on the status of the removed task:
def give_task(self, task_obj):
    self.add_task_to_queue(task_obj)
    <SNIP>
    self.message_channel.publish(task_obj)

# ...

def remove_task_from_queue(self, task):
    id_task_to_remove = task.id
    for i in range(len(task_queue)):
        if task_queue[i]["id"] == id_task_to_remove:
            removed_task = task_queue.pop(i)
            logger.debug(
                f"[TaskBroker] Task with id '{id_task_to_remove}' succesfully removed !"
            )
            removed_task["status"] = "DONE"
            return

# ...

async def get_task_result(self, task):
    return task.result
My intuition is to make get_task_result block on task.result until it is modified, and to modify it in remove_task_from_queue when the task is removed from the queue (and is thus done).
Any idea how to do this asynchronously?
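One possible way to get this "wait until modified" behaviour is to keep an `asyncio.Future` per task and resolve it when the task is removed. The sketch below is a minimal illustration under that assumption, not the original broker: the `TaskBroker` class here is a stripped-down stand-in, tasks are identified by a plain id, and the `result` argument of `remove_task_from_queue` is an added, hypothetical parameter.

```python
import asyncio


class TaskBroker:
    """Hypothetical, stripped-down broker that keeps one future per task id."""

    def __init__(self):
        self._futures = {}

    def give_task(self, task_id):
        # Must be called from within the running event loop.
        self._futures[task_id] = asyncio.get_running_loop().create_future()

    def remove_task_from_queue(self, task_id, result):
        # Resolving the future wakes up whoever awaits get_task_result().
        future = self._futures.get(task_id)
        if future is not None and not future.done():
            future.set_result(result)

    async def get_task_result(self, task_id, timeout=None):
        try:
            return await asyncio.wait_for(self._futures[task_id], timeout)
        finally:
            self._futures.pop(task_id, None)


async def demo():
    broker = TaskBroker()
    broker.give_task("task-1")
    # Simulate the worker's "done" notification arriving a little later.
    asyncio.get_running_loop().call_later(
        0.01, broker.remove_task_from_queue, "task-1", "finished"
    )
    return await broker.get_task_result("task-1", timeout=1)
```

If the "done" notification arrives on another thread (for example, a Redis pub/sub listener thread), the future should be resolved with `loop.call_soon_threadsafe(future.set_result, result)` instead of calling `set_result` directly.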
I'm losing my mind trying to find a reliable and testable way to get the number of tasks contained in a given Celery queue.
I've already read these two related discussions:
Django Celery get task count
Note: I'm not using Django nor any other Python web framework.
Retrieve list of tasks in a queue in Celery
But I have not been able to solve my issue using the methods described in those threads.
I'm using Redis as the backend, but I would like a backend-independent and flexible solution, especially for tests.
This is my current situation: I've defined an EnhancedCelery class which inherits from Celery and adds a couple of methods, specifically get_queue_size() is the one I'm trying to properly implement/test.
The following is the code in my test case:
celery_test_app = EnhancedCelery(__name__)
# this is needed to avoid exception for ping command
# which is automatically triggered by the worker once started
celery_test_app.loader.import_module('celery.contrib.testing.tasks')
# in memory backend
celery_test_app.conf.broker_url = 'memory://'
celery_test_app.conf.result_backend = 'cache+memory://'
# We have to setup queues manually,
# since it seems that auto queue creation doesn't work in tests :(
celery_test_app.conf.task_create_missing_queues = False
celery_test_app.conf.task_default_queue = 'default'
celery_test_app.conf.task_queues = (
    Queue('default', routing_key='task.#'),
    Queue('queue_1', routing_key='q1'),
    Queue('queue_2', routing_key='q2'),
    Queue('queue_3', routing_key='q3'),
)
celery_test_app.conf.task_default_exchange = 'tasks'
celery_test_app.conf.task_default_exchange_type = 'topic'
celery_test_app.conf.task_default_routing_key = 'task.default'
celery_test_app.conf.task_routes = {
    'sample_task': {
        'queue': 'default',
        'routing_key': 'task.default',
    },
    'sample_task_in_queue_1': {
        'queue': 'queue_1',
        'routing_key': 'q1',
    },
    'sample_task_in_queue_2': {
        'queue': 'queue_2',
        'routing_key': 'q2',
    },
    'sample_task_in_queue_3': {
        'queue': 'queue_3',
        'routing_key': 'q3',
    },
}
@celery_test_app.task()
def sample_task():
    return 'sample_task_result'

@celery_test_app.task(queue='queue_1')
def sample_task_in_queue_1():
    return 'sample_task_in_queue_1_result'

@celery_test_app.task(queue='queue_2')
def sample_task_in_queue_2():
    return 'sample_task_in_queue_2_result'

@celery_test_app.task(queue='queue_3')
def sample_task_in_queue_3():
    return 'sample_task_in_queue_3_result'
class EnhancedCeleryTest(TestCase):
    def test_get_queue_size_returns_expected_value(self):
        def add_task(task):
            task.apply_async()

        with start_worker(celery_test_app):
            for _ in range(7):
                add_task(sample_task_in_queue_1)
            for _ in range(4):
                add_task(sample_task_in_queue_2)
            for _ in range(2):
                add_task(sample_task_in_queue_3)
            self.assertEqual(celery_test_app.get_queue_size('queue_1'), 7)
            self.assertEqual(celery_test_app.get_queue_size('queue_2'), 4)
            self.assertEqual(celery_test_app.get_queue_size('queue_3'), 2)
Here are my attempts to implement get_queue_size():
This always returns zero (jobs == 0):
def get_queue_size(self, queue_name: str) -> Optional[int]:
    with self.connection_or_acquire() as connection:
        channel = connection.default_channel
        try:
            name, jobs, consumers = channel.queue_declare(queue=queue_name, passive=True)
            return jobs
        except (ChannelError, NotFound):
            pass
This also always returns zero:
def get_queue_size(self, queue_name: str) -> Optional[int]:
    inspection = self.control.inspect()
    return inspection.active()  # zero!
    # or:
    return inspection.scheduled()  # zero!
    # or:
    return inspection.reserved()  # zero!
This works by returning the expected number for each queue, but only in the test environment, because the channel.queues property does not exist when using the redis backend:
def get_queue_size(self, queue_name: str) -> Optional[int]:
    with self.connection_or_acquire() as connection:
        channel = connection.default_channel
        if hasattr(channel, 'queues'):
            queue = channel.queues.get(queue_name)
            if queue is not None:
                return queue.unfinished_tasks
None of the solutions you mentioned are entirely correct, in my humble opinion. As you already mentioned, this is backend-specific, so you would have to wrap handlers for all backends supported by Celery to provide backend-agnostic queue inspection. In the Redis case you have to connect to Redis directly and LLEN the queue you want to inspect. In the case of RabbitMQ you find this information in a completely different way. Same story with SQS...
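As a rough illustration of the "one handler per backend" idea, the sketch below registers a Redis handler that calls `llen` (the redis-py method for LLEN) and returns `None` for transports without a handler. The registry, the function names, and the `FakeRedis` test double are all hypothetical; handlers for RabbitMQ or SQS would have to be written separately.

```python
def redis_queue_size(client, queue_name):
    # Redis transport: a queue is a list stored under the queue's name.
    return client.llen(queue_name)


# Hypothetical registry: one handler per broker transport.
QUEUE_SIZE_HANDLERS = {"redis": redis_queue_size}


def get_queue_size(transport, client, queue_name):
    """Return the queue length, or None if the transport has no handler."""
    handler = QUEUE_SIZE_HANDLERS.get(transport)
    return handler(client, queue_name) if handler else None


class FakeRedis:
    """In-memory stand-in for a Redis client, for tests only."""

    def __init__(self, lists):
        self._lists = lists

    def llen(self, key):
        return len(self._lists.get(key, []))
```

Injecting the client keeps the size lookup testable without a running broker. The number only covers messages still waiting in the list; tasks already prefetched by a worker are not counted.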
This has all been discussed in the Retrieve list of tasks in a queue in Celery thread...
Finally, there is a reason why Celery does not provide this functionality out of box - the information is, I believe, useless. By the time you get what is in the queue it may already be empty!
If you want to monitor what is going on with your queues, I suggest another approach: write your own real-time monitor. The example in the Celery documentation just captures task-failed events, but you should be able to modify it easily to capture all the events you care about and gather data about those tasks (queue, time, host it was executed on, etc.). Clearly is an example of how this is done in a more serious project.
You can see how it's implemented in Flower (a real-time monitor for Celery). It has different Broker class implementations for Redis and RabbitMQ.
Another way is to use Celery's task events: count how many tasks were sent and how many succeeded or failed.
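The event-counting idea could look roughly like the sketch below. `TaskEventCounter` is a hypothetical class; the event type names (`task-sent`, `task-succeeded`, `task-failed`) are Celery's, and events are assumed to arrive as dicts with a `type` key. `task-sent` events are only emitted when the `task_send_sent_event` setting is enabled.

```python
from collections import Counter


class TaskEventCounter:
    """Hypothetical helper that tallies Celery task events by type."""

    def __init__(self):
        self.counts = Counter()

    def on_event(self, event):
        # Each Celery event is a dict whose "type" names the event.
        self.counts[event["type"]] += 1

    def pending(self):
        """Approximate number of tasks sent but not yet succeeded or failed."""
        finished = self.counts["task-succeeded"] + self.counts["task-failed"]
        return self.counts["task-sent"] - finished
```

In a real monitor, `on_event` would be registered as a handler on `app.events.Receiver(connection, handlers={'*': counter.on_event})`. The result is only an approximation: revoked or rejected tasks, and events missed while the monitor was down, are not accounted for.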
From what I understand of Celery's documentation, when you publish a task you send it to an exchange first, and the exchange then delegates it to queues. Now I want to send a task to a specific custom-made exchange that will delegate every task it receives to 3 different queues, which will have different consumers in the background performing different tasks.
class Tasks(object):
    def __init__(self, config_object={}):
        self.celery = Celery()
        self.celery.config_from_object(config_object)
        self.task_publisher = task_publisher

    def publish(self, task_name, job_id=None, params={}):
        if not job_id:
            job_id = uuid.uuid4()
        self.celery.send_task(task_name, [job_id, params], queue='new_queue')

class config_object(object):
    CELERY_IGNORE_RESULT = True
    BROKER_PORT = 5672
    BROKER_URL = 'amqp://guest:guest@localhost'
    CELERY_RESULT_BACKEND = 'amqp'

tasks_service = Tasks(config_object)
tasks_service.publish('logger.log_event', params={'a': 'b'})
This is how I can send a task to a specific queue. If I don't define the queue, it gets sent to the default one. But my question is: how do I define the exchange to send to?
Not sure if you have solved this problem.
I came across the same thing last week.
I am on Celery 4.1, and the solution I came up with was to just define the exchange name and the routing_key.
So in your publish method, you would do something like:
def publish(self, task_name, job_id=None, params={}):
    if not job_id:
        job_id = uuid.uuid4()
    self.celery.send_task(
        task_name,
        [job_id, params],
        exchange='name.of.exchange',
        routing_key='*'
    )
I'm trying to create multiple Google Docs in a background task.
I tried to use the taskqueue from Google App Engine, but I must be misunderstanding something, as I keep getting this message:
INFO 2016-05-17 15:38:46,393 module.py:787] default: "POST /update_docs HTTP/1.1" 302 -
WARNING 2016-05-17 15:38:46,393 taskqueue_stub.py:1981] Task task1 failed to execute. This task will retry in 0.800 seconds
Here is my code. I make multiple calls to the UpdateDocs handler, which needs to be executed from the queue.
# Create a GDoc in the queue (called by the queue)
class UpdateDocs(BaseHandler):
    @decorator.oauth_required
    def post(self):
        try:
            http = decorator.http()
            service = discovery.build("drive", "v2", http=http)

            # Create the file
            docs_name = self.request.get('docs_name')
            body = {
                'mimeType': DOCS_MIMETYPE,
                'title': docs_name,
            }
            service.files().insert(body=body).execute()
        except AccessTokenRefreshError:
            self.redirect("/")

# Create multiple GDocs by calling the queue
class QueueMultiDocsCreator(BaseHandler):
    def get(self):
        try:
            for i in range(5):
                name = "File_n" + str(i)
                taskqueue.add(
                    url='/update_docs',
                    params={
                        'docs_name': name,
                    })
            self.redirect('/files')
        except AccessTokenRefreshError:
            self.redirect('/')
I can see the push queue in the App Engine Console, and every task is in it, but they can't run. I don't get why.
Try specifying the worker module in your code.
As shown in Creating a new task, the call to taskqueue.add() targets the module named worker and invokes its handler by setting the url to /update_counter.
class EnqueueTaskHandler(webapp2.RequestHandler):
    def post(self):
        amount = int(self.request.get('amount'))
        task = taskqueue.add(
            url='/update_counter',
            target='worker',
            params={'amount': amount})
        self.response.write(
            'Task {} enqueued, ETA {}.'.format(task.name, task.eta))
And from what I have read in this blog, a worker is an important piece of a task queue. It is a Python process that reads jobs from the queue and executes them one at a time.
I hope that helps.
I have three Celery tasks that run on three different servers respectively.
tasks.send_push_notification
tasks.send_sms
tasks.send_email
I want to set up a workflow such that if sending the push notification fails, I try sending an SMS, and if sending the SMS fails, I send an email.
If those 3 tasks and their code base were on the same server, I would have followed the example on chained tasks and done something like:
from celery import chain
from tasks import send_push_notification, send_sms, send_email
import json

# some payload
payload = json.dumps({})
res = chain(
    send_push_notification.subtask(payload),
    send_sms.subtask(payload),
    send_email.subtask(payload)
)()
But the tasks are kept on 3 different servers!
I have tried
# 1
from celery import chain
from my_celery_app import app
res = chain(
    app.send_task('tasks.send_push_notification', payload),
    app.send_task('tasks.send_sms', payload),
    app.send_task('tasks.send_email', payload)
)()
# Which fails because I am chaining tasks not subtasks
and
# 2
from celery import chain, subtask
res = chain(
    subtask('tasks.send_push_notification', payload),
    subtask('tasks.send_sms', payload),
    subtask('tasks.send_email', payload)
)()
# fails because I am not adding the tasks on the broker
How can this be done?
Update:
I can do it using link NOT chain.
from celery import subtask
res = app.send_task(
    'tasks.send_push_notification', (payload, ),
    link=subtask(
        'tasks.send_sms', (payload, ),
        link=subtask(
            'tasks.send_email', (payload, ),
        )
    )
)
There is a lot of nesting. And because I actually need to create a database driven workflow, it will be complicated to create it this way.
Why not handle it in your tasks?
def push_notification_task(payload):
    if not send_push_notification(payload):
        sms_notification_task.delay(payload)

def sms_notification_task(payload):
    if not send_sms_notification(payload):
        email_notification_task.delay(payload)

def email_notification_task(payload):
    send_email_notification(payload)
Moreover, chain will execute all of your tasks in the given order, whereas you want the next task to run only if the previous one failed.
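Since the question mentions a database-driven workflow, the same "run the next task only on failure" logic can be driven by an ordered list of task names instead of hard-coded calls. The sketch below is one possible shape, with hypothetical names throughout (`FALLBACK_ORDER`, `next_task_name`, `run_with_fallback`); `send` and `dispatch` are injected callables, and `send` is assumed to return a truthy value on success.

```python
# Ordered fallback chain; it could equally be loaded from a database row.
FALLBACK_ORDER = [
    "tasks.send_push_notification",
    "tasks.send_sms",
    "tasks.send_email",
]


def next_task_name(current, order=FALLBACK_ORDER):
    """Return the task to try after `current`, or None at the end of the chain."""
    index = order.index(current)
    return order[index + 1] if index + 1 < len(order) else None


def run_with_fallback(current, payload, send, dispatch, order=FALLBACK_ORDER):
    """Call `send(payload)`; on failure, dispatch the next task by name."""
    if send(payload):
        return None
    following = next_task_name(current, order)
    if following is not None:
        dispatch(following, payload)
    return following
```

Inside a Celery task, `dispatch` could be something like `lambda name, payload: app.send_task(name, (payload,))`, which works across servers because `send_task` only needs the task name, not the task's code.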