We have a system that runs a bunch of long tasks (sometimes 10 minutes long), and sometimes (I can't reproduce it yet, but I see it in the logs) celery behaves like this (a sample "timeline" of what happens):
all workers are free
a lot of jobs are sent to celery
celery spreads work equally between workers
celery autoscales to accommodate new jobs
all (or almost all) jobs end properly
celery assigns ALL NEW jobs to one worker
jobs get delayed waiting for one overworked worker while all other workers are idle
after the overworked worker is killed by celery, everything returns to normal
Because of that, some jobs are sometimes delayed by as much as half an hour.
This is how we run celery:
celery -A application worker -l INFO --autoscale=100,12
celery -A application beat -l INFO
We use Supervisor to run everything. The Celery broker is RabbitMQ.
What could be the cause of this behavior, and how can we avoid it?
Thanks!
I'm using Celery to run tasks that range from small to big in nature.
Setup:
I'm using separate queues to handle small, medium, and large tasks independently.
There are different celery workers catering to each of the different queues.
Celery 5.2.7, Python 3.8.10
Using Redis as the broker.
Late ack set to True
Prefetch count set to 1
Visibility timeout set to max.
Celery worker started with: celery -A celeryapp worker --concurrency=1 -Ofair -l INFO -E -Q bigtask-queue -n big#%h
I'm facing an issue where the tasks are getting duplicated across multiple workers of the same type. I'm auto-scaling based on the load on the CPU.
For example, when I have 4 tasks and a maximum of 4 workers, each of the 4 tasks gets queued for execution on each of the 4 workers. That is, each task gets executed 4 times, once on each machine, sequentially.
What I want is for each task to execute just once. If one worker has picked up a task from the queue, the same task shouldn't be picked up by another worker. A new task should be picked up only once a new node is up.
I have tried the existing answers: setting the visibility timeout to the maximum value, setting the prefetch count to 1, and setting late ack to True. Nothing has helped.
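For reference, this is roughly how those settings look in my Celery configuration (a sketch; the module name and broker URL are just placeholders):

# Sketch of the settings listed above (Celery 5.x setting names).
from celery import Celery

app = Celery("celeryapp", broker="redis://localhost:6379/0")

app.conf.update(
    task_acks_late=True,             # late ack set to True
    worker_prefetch_multiplier=1,    # prefetch count set to 1
    # visibility timeout is a Redis transport option, in seconds;
    # the value here is just a stand-in for "max"
    broker_transport_options={"visibility_timeout": 43200},
)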
What am I missing?
Does celery not recognize that the same task has already been picked up by the other worker?
Will using a flag on Redis for each task's status work (see the sketch after these questions)? Won't there be a race condition if multiple workers are already running?
Are there any other solutions?
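To make the Redis-flag idea concrete, this is the kind of sketch I have in mind (the key name and expiry are arbitrary); an atomic SET NX means only one worker can claim a task, so there shouldn't be a race:

# Sketch of the Redis-flag idea: claim a per-task lock with an atomic SET NX
# so that only the first worker to claim it runs the task.
import redis

r = redis.Redis(host="localhost", port=6379, db=1)

def try_claim(task_id, ttl_seconds=3600):
    # SET key value NX EX ttl: returns True only for the first claimer,
    # so concurrent workers cannot both win the claim.
    return bool(r.set(f"task-claimed:{task_id}", "1", nx=True, ex=ttl_seconds))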
Do you have a celery beat worker running?
something like this:
celery -A run.celery worker --loglevel=info --autoscale=5,2 -n app#beatworker --beat
We had the same problem, but I don't remember exactly how it was resolved. Try adding this separate worker with the --beat option; there should be only one --beat instance running.
I have divided Celery into the following parts:
Celery
Celery worker
Celery daemon
Broker: RabbitMQ or SQS
Queue
Result backend
Celery monitor (Flower)
My Understanding
When I trigger a Celery task in Django, e.g. tasks.add(1,2), Celery adds that task to the queue. I am confused whether that is 4 or 5 in the list above.
When the task goes to the queue, a worker picks it up and deletes it from the queue.
The result of that task is saved in the result backend (sketched in code below).
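Here is roughly the flow I have in mind, as a sketch (the broker and backend URLs are only placeholders):

# tasks.py -- sketch of the flow described above, using the example task
from celery import Celery

# the broker points at RabbitMQ (or SQS); the backend stores task results
app = Celery("proj", broker="amqp://localhost//", backend="rpc://")

@app.task
def add(x, y):
    return x + y

# in the Django code: this only publishes a message to the queue
result = add.delay(1, 2)

# a worker picks the message up, runs add(1, 2), and stores 3 in the
# result backend; .get() reads it back from there
print(result.get(timeout=10))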
My Confusions
What's the difference between a Celery daemon and a Celery worker?
Is RabbitMQ doing the work of the queue? Does that mean tasks get saved in RabbitMQ or SQS?
What does Flower do? Does it monitor workers, tasks, queues, or results?
First, just to explain briefly how it works. You have a Celery client running in your code. You call tasks.add(1,2) and a new Celery task is created. That task is transferred by the broker to the queue. Yes, the queue is persisted in RabbitMQ or SQS. The Celery daemon is always running and listening for new tasks. When there is a new task in the queue, it starts a new Celery worker to perform the work.
To answer your questions:
The Celery daemon is always running, and it starts Celery workers.
Yes, RabbitMQ or SQS is doing the work of a queue.
With the celery monitor you can monitor how many tasks are running, how many are completed, what is the size of the queue, etc.
I think the answer from nstoitsev has good intentions but creates some confusion.
So let's try to clarify a bit.
A Celery worker is the Celery process responsible for executing tasks; when configured to run in the background, it is often called a Celery daemon. So you can consider the two the same thing.
To clarify the confusion in nstoitsev's answer: each worker has a concurrency parameter that can be greater than 1. When that is the case, each Celery worker creates up to N child processes (up to the concurrency parameter) to execute tasks in parallel; these child processes are often also called workers.
The broker holds queues and exchanges. This means that a Celery worker can connect to the broker using a protocol called AMQP and publish or consume messages.
Flower can monitor a Celery cluster using the broker itself; basically, it receives events from all the workers. Flower also works if you have the result backend disabled, which, by the way, is Celery's default behavior.
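As a small sketch of that last point (the setting name is Celery's, the broker URL is a placeholder): Flower only needs task events on the broker, not a result backend.

# sketch: Flower consumes task events from the broker, so it works
# even with the result backend disabled
from celery import Celery

app = Celery("proj", broker="amqp://localhost//")
app.conf.worker_send_task_events = True   # same as the -E worker flag

# Flower is started separately and pointed at the same broker, e.g.:
#   celery -A proj flower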
Hope this helps.
As I understand it, in Celery a worker can be given a number of tasks that it can run at the same time.
I need to run a task and set the number of tasks that can run simultaneously alongside it.
So if I set this number to 2 and this task is sent to a worker with 10 threads, the worker can run only one other task.
The worker will reserve tasks for each of its threads. If you want to limit the number of tasks a worker can execute at the same time, you should configure its concurrency (e.g., to limit it to 1 task at a time, you need a worker with 1 process: -c 1).
You can also check the prefetch configuration, but it only defines the number of tasks reserved for each process of the worker.
Here is the Celery documentation where the prefetch configuration is explained:
http://celery.readthedocs.org/en/latest/userguide/optimizing.html
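For example, a sketch of the relevant settings (these are Celery's setting names; the broker URL is a placeholder, and you can equally pass them on the command line):

# sketch: limit the worker to one task at a time and reserve only one
# message per worker process
from celery import Celery

app = Celery("proj", broker="amqp://localhost//")
app.conf.worker_concurrency = 1           # same as the -c 1 option
app.conf.worker_prefetch_multiplier = 1   # one reserved task per process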
In an environment with 8 cores, Celery should be able to process 8 incoming tasks in parallel by default. But sometimes, when new tasks are received, Celery places them behind a long-running task.
I played around with default configuration, letting one worker consume from one queue.
celery -A proj worker --loglevel=INFO --concurrency=8
Is my understanding wrong that one worker with a concurrency of 8 is able to process 8 tasks from one queue in parallel?
What is the preferred way to set up Celery to prevent the behaviour described above?
To put it simply, concurrency is the number of jobs running on a worker, and prefetch is the number of jobs sitting in a queue on the worker itself. You have one of two options here. The first is to set the prefetch multiplier down to 1; this means the worker will only keep, in your case, 8 additional jobs in its queue. The second, which I would recommend, is to create 2 different queues: one for your short-running tasks and another for your long-running tasks.
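A sketch of the second option (the task and queue names here are made up):

# sketch: route short and long tasks to separate queues so a long task
# cannot block the short ones
from celery import Celery

app = Celery("proj", broker="amqp://localhost//")
app.conf.task_routes = {
    "proj.tasks.quick_*": {"queue": "short"},
    "proj.tasks.slow_*": {"queue": "long"},
}

# then run one worker per queue, e.g.:
#   celery -A proj worker -Q short --concurrency=8
#   celery -A proj worker -Q long --concurrency=2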
Is there any "clean" way of killing the worker processes once the queue they've been listening to gets empty?
The idea is that I don't need to have the workers continuously listening to the queue, if they don't have any work to do. I tried the autoscale option, but from my experience, even if I set the lower threshold to 0, the worker doesn't die.
In my case I need to download 1000 files from somewhere, using celery. So I put 1000 tasks in the queue, but that's all the work that I want this batch of workers to do. It would be ideal if they could somehow die after they all finished downloading all the files, and not idle there until I remember to kill the processes manually.
I couldn't find any suggestion of how to do this from within celery.
Any idea?
Thanks!
With the very nice RQ simple job queue system, you'd run it in "burst mode": the worker starts up, processes all jobs in the queues, and dies. To process more jobs, we start the worker again every minute or so using cron.
For Celery, we can do the equivalent by using two separate Cron jobs:
*/2 * * * * celery -A proj worker -l info
This starts Celery every even minute.
1-59/2 * * * * ps auxww | grep 'celery worker' | awk '{print $2}' | xargs kill
On odd minutes, stop the Celery worker. Fortunately for us: "When shutdown is initiated the worker will finish all currently executing tasks before it actually terminates".
The combination of the above means Celery only runs about 50% of the time. With a bit of tuning it could be shrunk to 5% or so.
In practice, I'd just let Celery run all the time. A single process waiting on an empty queue consumes hardly any resources. It's also much more responsive and reliable.
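That said, if you really want the workers to exit on their own, here is a rough sketch of the same idea from inside Python (an assumption-laden sketch, not a built-in Celery feature; the import path is hypothetical): poll the workers over the broker and broadcast a shutdown once nothing is active or reserved.

# sketch: ask every worker to warm-shut-down once none of them report
# active or reserved tasks; run this from a small monitor script
import time

from proj.celery import app  # hypothetical import path for your Celery app

def shutdown_when_idle(poll_seconds=30):
    while True:
        inspect = app.control.inspect()
        active = inspect.active() or {}
        reserved = inspect.reserved() or {}
        if not any(active.values()) and not any(reserved.values()):
            # warm shutdown: workers finish current tasks, then exit
            app.control.shutdown()
            return
        time.sleep(poll_seconds)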