I have a Celery task with 100 input items in a queue that need to be executed using 5 workers.
How can I get which worker is executing which input?
How many inputs did each worker execute, and what is their status?
If any task fails, how can I get the failed input data separately and re-execute it with an available worker?
Is there any way to customize Celery on a per-worker basis?
Can we combine Celery worker limits with Flower?
I am not using any framework.
How can I get which worker is executing which input?
There are two options for using multiple workers:
You run each worker separately with a separate run command
You run them in one command using the command-line option -c, i.e. concurrency
With the first method, Flower will support it and show you all the workers, all the tasks (which you call inputs), which worker processed which task, and other information too.
With the second method, Flower will show all the tasks as being processed by a single worker. In that case you can only differentiate by viewing the logs generated by the Celery worker, since the logs do record which worker thread executed which task. So I think you will be better off using the first option, given your requirements.
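For example (the app name proj is a placeholder, not from the question):

# Option 1: five separately named workers, one process each
celery multi start 5 -A proj -c 1
# (equivalently, run five "celery -A proj worker -n workerN@%h -c 1" commands by hand)

# Option 2: a single worker command with a concurrency of 5
celery -A proj worker -c 5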
How many inputs did each worker execute, and what is their status?
As I mentioned, using the first approach, Flower will give you this information.
If any task fails, how can I get the failed input data separately and
re-execute it with an available worker?
Flower provides filters for the failed tasks and shows the status a task returned when it exited. You can also set how many times Celery should retry a failed task. But if the task still fails after all retries, you will have to relaunch it yourself.
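A minimal sketch of configuring retries on a task (the app name, broker URL, and task body are placeholders, not from the question):

from celery import Celery

app = Celery('proj', broker='redis://localhost:6379/0')

@app.task(bind=True, max_retries=3, default_retry_delay=10)
def process_input(self, item):
    try:
        # placeholder "work": replace with your real processing of the input
        return {'input': item, 'ok': True}
    except Exception as exc:
        # retried up to max_retries; if it still fails, the task ends in the FAILURE state
        raise self.retry(exc=exc)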
For the first and second questions:
1) Using the Flower API:
You can use Celery Flower to keep track of it. The Flower API can give you information such as which task is being executed by which worker through simple API calls (/api/task/info/<task_id>).
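For example, a small sketch of calling that endpoint (the Flower host and port are assumptions; Flower listens on port 5555 by default):

import requests

def task_worker_and_state(task_id, flower_url='http://localhost:5555'):
    info = requests.get('%s/api/task/info/%s' % (flower_url, task_id)).json()
    # the reply contains fields such as the task state and the worker that handled it
    return info.get('worker'), info.get('state')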
2) Querying Celery directly:

from celery import Celery

celery = Celery('vwadaptor', broker='redis://workerdb:6379/0', backend='redis://workerdb:6379/0')
celery.control.inspect().active()
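inspect().active() returns a dict keyed by worker hostname, so you can, for example, count how many tasks each worker is currently running (a small sketch continuing the snippet above):

active = celery.control.inspect().active() or {}
for worker_name, tasks in active.items():
    print(worker_name, len(tasks), [t['id'] for t in tasks])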
3) Using Celery events:
Link: http://docs.celeryproject.org/en/latest/userguide/monitoring.html
(see the Real-time processing section)
You can listen for events (task-sent, task-received, etc.), and each event includes the worker's hostname (see the link).
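A rough sketch of such a real-time event consumer, adapted from the monitoring guide linked above (the broker URL is a placeholder; workers must be started with the -E flag so they emit events):

from celery import Celery

app = Celery(broker='redis://localhost:6379/0')

def monitor(app):
    state = app.events.State()

    def on_event(event):
        state.event(event)
        if event['type'].startswith('task-'):
            # every event carries the hostname of the worker that sent it
            print(event['type'], event['uuid'], event['hostname'])

    with app.connection() as connection:
        recv = app.events.Receiver(connection, handlers={'*': on_event})
        recv.capture(limit=None, timeout=None, wakeup=True)

if __name__ == '__main__':
    monitor(app)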
For the third question:
Set the config entry CELERY_ACKS_LATE=True; with late acknowledgement the worker only acknowledges a message after the task has run, so tasks interrupted mid-execution (for example by a worker crash) are re-delivered and executed again:
celery.conf.update(
    CELERY_ACKS_LATE=True,
)
You can also track failed tasks using the Celery events mentioned above and retry them manually.
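For manual re-execution, one hedged option is to resubmit the original arguments of a failed task yourself (the task name and the source of original_args are placeholders; the result backend stores outcomes, not inputs, so you need to have recorded the inputs somewhere):

from celery.result import AsyncResult

def relaunch_if_failed(task_id, original_args):
    result = AsyncResult(task_id, app=celery)
    if result.state == 'FAILURE':
        # 'myapp.process_input' is a placeholder task name
        return celery.send_task('myapp.process_input', args=original_args)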
Related
I have a special use case where I need to run a task on all workers to check if a specific process is running on the celery worker. The problem is that I need to run this on all my workers as each worker represents a replica of this specific process.
In the end I want to display 8/20 workers are ready to process further tasks.
But currently I'm only able to run a task on either a randomly selected worker or on one specific worker, which does not solve my problem at all ...
Thanks in advance
I can't think of a good way to do this in Celery. However, a nice workaround could be to implement your own remote control command, which you can then broadcast to every worker (just like you can broadcast the shutdown or status commands, for example). Come to think of it, this does indeed sound like some sort of monitoring/maintenance operation, right?
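A rough sketch of such a custom remote control command, assuming the control_command API from Celery 4.x (the process check itself is a hypothetical placeholder):

import os
from celery import Celery
from celery.worker.control import control_command

app = Celery('proj', broker='redis://localhost:6379/0')   # placeholder app/broker

# This module must be imported by the worker (e.g. via the imports setting)
# so that the command gets registered.
@control_command()
def process_is_running(state):
    # hypothetical check: replace with however you detect your specific process
    return {'ready': os.path.exists('/var/run/my_process.pid')}

def count_ready_workers():
    # broadcast from any client; each worker sends back one reply
    replies = app.control.broadcast('process_is_running', reply=True, timeout=2) or []
    ready = sum(1 for reply in replies for answer in reply.values() if answer.get('ready'))
    print('%d/%d workers are ready' % (ready, len(replies)))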
So the project I am working on requires a distributed task system to process CPU-intensive tasks. This is relatively straightforward: spin up Celery, throw all the tasks into a queue, and have Celery do the rest.
The issue I have is that every user needs their own queue, and items within each user's queue must be processed synchronously. So if a task in a user's queue is already processing, wait until it is finished before allowing a worker to pick up the next one.
The closest I've come to something like this is having a fixed set of queues and assigning them to users, then having the users' tasks picked off by Celery workers fixed to a certain queue with a concurrency of 1.
The problem with this system is that I can't scale my workers to process a backlog of user tasks.
Is there a way I can configure Celery to do what I want, or is there perhaps another task system that does what I want?
Edit:
Currently I use the following command to spawn my Celery workers with a concurrency of one on a fixed set of queues:
celery multi start 4 -A app.celery -Q:1 queue_1 -Q:2 queue_2 -Q:3 queue_3 -Q:4 queue_4 --logfile=celery.log --concurrency=1
I then store a queue name on the user object, and when the user starts a process I queue a task to the queue stored on the user object. This gives me my synchronous tasks.
The downside is that when I have multiple users sharing queues, tasks build up and never get processed.
I'd like to have, say, 5 workers and a queue per user object, and then have the workers just hop across the queues, but never have more than 1 worker on a single queue at a time.
I use a chain (doc here) to execute tasks in a specific order:
chain = task1_task.si(account_pk) | task2_task.si(account_pk) | task3_task.si(account_pk)
chain()
So, for a specific user, I execute task1; when it is finished I execute task2, and when that is finished, task3.
It will be picked up by whatever worker is available :)
For stopping a chain midway, inside the task:
self.request.callbacks = None
return
And don't forget to bind your task:
@app.task(bind=True)
def task2_task(self, account_pk):
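Putting it together, a small sketch of a chained task that stops the rest of the chain when some check fails (the stop condition here is a hypothetical placeholder):

@app.task(bind=True)
def task2_task(self, account_pk):
    should_stop = account_pk is None   # hypothetical check; replace with your own
    if should_stop:
        self.request.callbacks = None  # drop the remaining tasks of the chain
        return
    # ... the real work for task2 goes here ...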
I'm writing a Celery task that will run some tests on the pull requests created in BitBucket.
My problem is that if a pull request is updated before my task finishes, it will trigger the task again, so I can end up having two tasks running tests on the same pull request at the same time.
Is there any way I can prevent this, and make sure that if a task processing a certain pull request is already in progress, the new task waits for it to finish before processing it again?
As I monitor multiple repos, each with multiple PRs, I would like an event coming in for a different repo or a different pull request to start and run right away.
I only need to queue it if the same pull request from the same repo is already in progress.
Any idea if this is possible with Celery?
The simplest way to achieve this is to set the worker concurrency to 1 so that only one task gets executed at a time.
Route the tasks to a separate queue:
your_task.apply_async(args=(foo,), queue='bar')
Then start your worker with a concurrency of one:
celery worker -Q bar -c 1
See also Celery - one task in one second
You are looking for a mutex. For Celery, there are celery_mutex and celery_once. In particular, celery_once claims to do what you are asking for, but I do not have experience with it.
You could also use Python's multiprocessing, which has a global mutex implementation, or use a shared storage that you already have.
If the tasks run on the same machine, the operating system has locking mechanisms.
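For illustration, a hedged sketch of how celery_once is typically wired up, following its README (the broker/Redis URLs and the task are placeholders):

from celery import Celery
from celery_once import QueueOnce

app = Celery('tasks', broker='redis://localhost:6379/0')
app.conf.ONCE = {
    'backend': 'celery_once.backends.Redis',
    'settings': {
        'url': 'redis://localhost:6379/0',
        'default_timeout': 60 * 60,
    },
}

@app.task(base=QueueOnce, once={'graceful': True})
def run_pr_tests(repo, pr_id):
    # while one run for this (repo, pr_id) is in progress,
    # duplicate calls are silently skipped instead of queued
    ...

Note that with graceful=True a duplicate call is dropped rather than queued for later, which is slightly different from what was asked; without it, celery_once raises its AlreadyQueued exception, which you could catch and handle by re-scheduling.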
As I understand it, in Celery you can set the number of tasks a worker can run at the same time.
I need to run a task and limit the number of other tasks that can run simultaneously alongside it.
So if I set this number to 2 and this task is sent to a worker with 10 threads,
the worker can run just one other task.
The worker will reserve tasks for each of its threads. If you want to limit the number of tasks the worker can execute at the same time, you should configure its concurrency (e.g. to limit it to 1 task at a time, you need a worker with 1 process: -c 1).
You can also check the prefetch configuration, but it only defines the number of tasks reserved for each process of the worker.
Here is the Celery documentation where the prefetch configuration is explained:
http://celery.readthedocs.org/en/latest/userguide/optimizing.html
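A small sketch of the two knobs together, using the old-style setting name that matches the linked docs (the app name and broker URL are placeholders):

from celery import Celery

# start the worker with a single process so only one task runs at a time:
#   celery -A proj worker -c 1
app = Celery('proj', broker='redis://localhost:6379/0')
app.conf.CELERYD_PREFETCH_MULTIPLIER = 1   # each process reserves at most one extra task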
I am running my workers with the following command:
celery -A myapp multi start 4 -l debug -Q1:3 queue1,queue2 -Q:4 queue3
The workers start up fine, so when I run
celery inspect active_queues
the queues appear correctly assigned.
Then I start tasks from my Django app with the following code:
result = chain(task1.s(**kwargs).set(queue='queue1'),task2.s(**kwargs).set(queue='queue2'))()
I parse the result variable with result.parent to get all the task IDs and record them in the database for further inspection. When I issue
task = AsyncResult(task.id)
task.status
I get
'PENDING'
for every task I start with my chain. The Celery log doesn't seem to show any tasks being received. However, when I issue a
celery purge
command followed by
yes
I get a message that my tasks have actually been removed from 1 queue.
The AsyncResult.status of the deleted tasks continues to show up as 'PENDING' from here on, and the tasks never start.
I use rabbitmq-server as a broker with all default configuration, and my Celery config is the default. It is really strange, but in another environment the same code and commands produce different results: the workers also start, but they do receive the very same tasks and execute them without any issues. Please consider what might be the issue here.
P.S. When I start a worker the other way:
celery -A myapp worker -Q queue1,queue2,queue3 -l debug
I still can't get my tasks to execute.
The problem started to show up when I modified my chain to launch tasks and added
.set(queue='queue1')
or queue2 or queue3.
P.P.S.:
all my tasks are written with the
@shared_task
decorator.
Is there at least a way to see which tasks (the ones I can remove with celery purge) are waiting on a queue, and what is the name of the queue they are waiting in?
Celery's default settings should cover your case, so the only thing could be that you have defined some of the following options in a way that mutes your queues; in that case, consider commenting them out (more in the docs):
CELERY_QUEUES
CELERY_ROUTES
CELERY_DEFAULT_EXCHANGE
CELERY_DEFAULT_ROUTING_KEY
As for your question, I guess that's not the full answer, but you can list all active queues from RabbitMQ.
Using Celery, from the docs (note that inspect active lists the tasks currently being executed, while inspect active_queues lists the queues each worker consumes from):
celery -A proj inspect active
Using RabbitMQ, from the docs:
rabbitmqadmin list queues vhost name node messages message_stats.publish_details.rate
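To partially answer the last question from Python, here is a hedged sketch using the inspect API; it only shows tasks already reserved by or running on workers, while messages still sitting in the broker are only visible through RabbitMQ tools such as the rabbitmqadmin command above (the import path is a placeholder):

from myapp.celery import app   # placeholder import path for your Celery app

insp = app.control.inspect()
for category, per_worker in (('active', insp.active()), ('reserved', insp.reserved())):
    for worker, tasks in (per_worker or {}).items():
        for t in tasks:
            # delivery_info carries the routing key the message was published with,
            # which with the default direct exchange matches the queue name
            print(category, worker, t['name'], t['id'],
                  t.get('delivery_info', {}).get('routing_key'))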