Some confusions regarding Celery in Python

I have divided Celery into the following parts:
1. Celery
2. Celery worker
3. Celery daemon
4. Broker: RabbitMQ or SQS
5. Queue
6. Result backend
7. Celery monitor (Flower)
My Understanding
When I call a celery task in Django, e.g. tasks.add(1, 2), celery adds that task to the queue. I am confused whether that is 4 or 5 in the list above.
When the task goes to the queue, a worker picks it up and deletes it from the queue.
The result of that task is saved in the result backend.
My Confusions
What's the difference between a celery daemon and a celery worker?
Is RabbitMQ doing the work of the queue? Does that mean tasks get saved in RabbitMQ or SQS?
What does Flower do? Does it monitor workers, tasks, queues or results?

First, just to explain briefly how it works. You have a Celery client running in your code. You call tasks.add(1, 2) and a new Celery task is created. That task is transferred by the broker to the queue. Yes, the queue is persisted in RabbitMQ or SQS. The Celery daemon is always running and is listening for new tasks. When there is a new task in the queue, it starts a new Celery worker to perform the work.
To answer your questions:
The celery daemon is always running and it starts celery workers.
Yes, RabbitMQ or SQS is doing the work of the queue.
With the celery monitor you can monitor how many tasks are running, how many are completed, what is the size of the queue, etc.
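A minimal sketch of that flow, assuming a RabbitMQ broker on localhost and the rpc:// result backend (module and task names are just examples):

# tasks.py -- minimal example of client, task, broker and result backend
from celery import Celery

app = Celery('tasks',
             broker='amqp://guest@localhost//',   # RabbitMQ holds the queue
             backend='rpc://')                    # where task results are stored

@app.task
def add(x, y):
    return x + y

# in your Django code (the Celery client):
# result = add.delay(1, 2)    # publishes a message to the queue via the broker
# result.get(timeout=10)      # reads 3 back from the result backend

A worker process started with celery -A tasks worker --loglevel=INFO consumes the queue and executes the task.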

I think the answer from nstoitsev has good intentions but creates some confusion.
So let's try to clarify a bit.
A Celery worker is the celery process responsible for executing the tasks; when configured to run in the background it is often called a celery daemon. So you can consider the two the same thing.
To clarify the confusion in nstoitsev's answer: each worker can have a concurrency parameter that can be bigger than 1. When this is the case, each celery worker is capable of creating N child processes, up to the concurrency parameter, to execute tasks in parallel; these are often also called workers.
The broker holds queues and exchanges. This means that a celery worker is able to connect to the broker using a protocol called AMQP and publish or consume messages.
Flower is able to monitor a celery cluster using the broker itself. Basically it is capable of receiving events from all the workers. Flower also works if you have the result backend disabled, which by the way is the default behavior in Celery.
Hope this helps.
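To make the concurrency point concrete, a hedged example (the app name proj is just a placeholder): a single worker started with

celery -A proj worker --concurrency=4

is one worker process that forks 4 child processes to execute tasks in parallel; the same thing can be set in code with app.conf.worker_concurrency = 4. Flower is typically started against the same app and broker, for example celery -A proj flower, and then shows workers, tasks and queue activity in its web UI.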

Related

Python distributed tasks with multiple queues

So the project I am working on requires a distributed task system to process CPU-intensive tasks. This is relatively straightforward: spin up celery, throw all the tasks in a queue and have celery do the rest.
The issue I have is that every user needs their own queue, and items within each user's queue must be processed synchronously. So if there is a task in a user's queue already processing, wait until it is finished before allowing a worker to pick up the next.
The closest I've come to something like this is having a fixed set of queues and assigning them to users, then having the users' tasks picked off by celery workers fixed to a certain queue with a concurrency of 1.
The problem with this system is that I can't scale my workers to process a backlog of user tasks.
Is there a way I can configure celery to do what I want, or perhaps another task system exists that does what I want?
Edit:
Currently I use the following command to spawn my celery workers with a concurrency of one on a fixed set of queues:
celery multi start 4 -A app.celery -Q:1 queue_1 -Q:2 queue_2 -Q:3 queue_3 -Q:4 queue_4 --logfile=celery.log --concurrency=1
I then store a queue name on the user object, and when the user starts a process I queue a task to the queue stored on the user object. This gives me my synchronous tasks.
The downside is that when I have multiple users sharing queues, tasks build up and never get processed.
I'd like to have, say, 5 workers and a queue per user object, then have the workers just hop over the queues, but never have more than 1 worker on a single queue at a time.
I use a chain (doc here) to execute tasks in a specific order:
chain = task1_task.si(account_pk) | task2_task.si(account_pk) | task3_task.si(account_pk)
chain()
So, for a specific user I execute task1; when it's finished I execute task2, and when that's finished I execute task3.
It will spawn on any worker available :)
For stopping a chain midway:
self.request.callbacks = None
return
And don't forget to bind your task:
@app.task(bind=True)
def task2_task(self, account_pk):
    ...
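Putting the pieces together, a hedged sketch of the whole pattern; the broker URL and the account_is_valid() check are placeholders, and newer Celery versions abort the rest of a chain with self.request.chain = None instead of callbacks:

from celery import Celery

app = Celery('tasks', broker='amqp://guest@localhost//')   # example broker URL

@app.task
def task1_task(account_pk):
    # first step for this account
    return account_pk

@app.task(bind=True)
def task2_task(self, account_pk):
    # second step; bail out and skip task3_task if something is wrong
    if not account_is_valid(account_pk):   # hypothetical check
        self.request.callbacks = None      # stops the chain midway, as above
        return
    return account_pk

@app.task
def task3_task(account_pk):
    # final step
    return account_pk

def run_pipeline(account_pk):
    # task1 -> task2 -> task3 for one user, executed strictly one after another
    workflow = task1_task.si(account_pk) | task2_task.si(account_pk) | task3_task.si(account_pk)
    workflow()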

How to set up a distributed worker pool with Celery and RabbitMQ

I'm still really new to this kind of thing so it's entirely possible that I've got this wrong.
I am trying to set up a distributed task system. I have a Django webapp that is generating tasks using Celery. Right now, I have the webapp, the worker, and RabbitMQ running all on the same server. I would like to distribute this out to several servers.
As I currently understand it, I should be able to have my webapp generating tasks, handing them off to the message queue -- which is its own server -- and then workers distributed across any number of servers will consume tasks from that queue. I know how to tell my Django app which server is the broker, but how do I start worker threads on the worker servers and instruct them where to consume tasks from? I'm totally lost -- I don't even know where to look.
You can run your worker code (async_tasks.py) like this:
from celery import Celery

# broker_url must point at your RabbitMQ server, e.g. 'amqp://user:pass@broker-host:5672//'
app = Celery('tasks', broker=broker_url)

@app.task(queue='queue_name')
def async_compute_something(input):
    # do something
    return "Result"
Then start it on other machines using this command:
celery -A async_tasks worker -Q queue_name
Note that you have to set the URL of the broker correctly, and not use localhost.
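On the Django side, a hedged sketch of how the web app would hand work to that remote queue; broker_url is a placeholder and must point at the same broker the workers use:

from celery import Celery

app = Celery('tasks', broker=broker_url)   # e.g. 'amqp://user:pass@broker-host:5672//'

# dispatch by task name so the web server does not need the worker code installed
result = app.send_task('async_tasks.async_compute_something',
                       args=('some input',),
                       queue='queue_name')
# result.get() only works if a result backend is also configured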

Celery sometimes gives all jobs to one worker

We have a system that runs a bunch of long tasks (sometimes 10 minutes long), and sometimes (I can't yet reproduce it, but I see it in the logs) celery behaves like this (a sample "timeline" of what happens):
all workers are free
a lot of jobs are sent to celery
celery spreads work equally between workers
celery autoscales to accommodate new jobs
all (or almost all) jobs end properly
celery assigns ALL NEW jobs to one worker
jobs get delayed waiting for one overworked worker while all other workers are idle
after the overworked worker is killed by celery, everything returns to normal
Because of that, some jobs sometimes get delayed by as much as half an hour.
This is how we run celery:
celery -A application worker -l INFO --autoscale=100,12
celery -A application beat -l INFO
we use supervisor to run everything. Celery broker is RabbitMQ.
What can be the cause of this behavior, and how can we avoid it?
Thanks!
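For long-running tasks this symptom is often tied to prefetching (workers reserving batches of messages in advance). Purely as a hedged illustration, and not a confirmed fix for this case, these are the settings usually involved (names from the Celery configuration docs; values are examples):

# celeryconfig.py -- example values only
task_acks_late = True               # acknowledge a task only after it has finished
worker_prefetch_multiplier = 1      # each worker process reserves at most one extra task

# or on the command line (flags available on recent Celery versions):
# celery -A application worker -l INFO --autoscale=100,12 -O fair --prefetch-multiplier=1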

How to configure celery to execute tasks concurrently from one queue

In an environment with 8 cores, celery should be able to process 8 incoming tasks in parallel by default. But sometimes when new tasks are received, celery places them behind a long-running process.
I played around with the default configuration, letting one worker consume from one queue.
celery -A proj worker --loglevel=INFO --concurrency=8
Is my understanding wrong, that one worker with a concurrency of 8 is able to process 8 tasks from one queue in parallel?
What is the preferred way to set up celery to prevent the behaviour described above?
To put it simply, concurrency is the number of jobs running on a worker, while prefetch is the number of jobs sitting in a queue on the worker itself. You have one of two options here. The first is to set the prefetch multiplier down to 1. This will mean the worker will only keep, in your case, 8 additional jobs in its queue. The second, which I would recommend, is to create 2 different queues: one for your short-running tasks and another for your long-running tasks.
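A hedged sketch of both options (setting names are from the Celery docs; queue, module and task names are placeholders):

from celery import Celery

app = Celery('proj', broker='amqp://guest@localhost//')   # example broker URL

# Option 1: reserve only one extra task per worker process
app.conf.worker_prefetch_multiplier = 1

# Option 2 (recommended above): route long and short tasks to separate queues
app.conf.task_routes = {
    'proj.tasks.long_running_task': {'queue': 'long'},
    'proj.tasks.quick_task': {'queue': 'short'},
}

# then run one worker per queue, for example:
# celery -A proj worker -Q long --concurrency=2
# celery -A proj worker -Q short --concurrency=8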

Celery multi with queues set up not receiving tasks from Django

I am running my workers with the following command:
celery -A myapp multi start 4 -l debug -Q:1-3 queue1,queue2 -Q:4 queue3
The workers start out fine, so when I run
celery inspect active_queues
the queues appear assigned.
Then I start tasks from my Django app with the following code:
result = chain(task1.s(**kwargs).set(queue='queue1'),task2.s(**kwargs).set(queue='queue2'))()
I parse the result variable with result.parent to get all the task IDs and record them to the database for further inspection. When I issue
task = AsyncResult(task.id)
task.status
I get
'PENDING'
for every task I start with my chain. The celery logs don't seem to show any tasks being received. However, when I issue a
celery purge
command with a following
yes
I get a message that my tasks have actually been removed from 1 queue.
The AsyncResult.status on the deleted tasks continues to show up as 'PENDING' from here on, and the tasks never start.
I use rabbitmq-server as a broker with all default configuration. My celery config is default. It is really strange, but in another environment the same code and commands produce different results: the workers also start, but they do receive the very same tasks and execute them without any issues. Please consider what might be the issue here.
P.S. When I start a worker the other way:
celery -A myapp worker -Q queue1,queue2,queue3 -l debug
I still can't get my tasks to execute.
The problem started to show up when I modified my chain to launch tasks and added the
.set(queue='queue1')
or queue2 or queue3
P.P.S.:
All my tasks are written with the
@shared_task
decorator.
Is there at least a way to see which tasks (the ones I can remove with celery purge) are waiting on a queue, and what is the name of the queue they are waiting on?
Celery default settings should cover your case, so the only thing could be that you have defined some of the following options in a way that mutes your queues; in that case, consider commenting them out (more in the docs):
CELERY_QUEUES
CELERY_ROUTES
CELERY_DEFAULT_EXCHANGE
CELERY_DEFAULT_ROUTING_KEY
As for your question, I guess that's not the full answer, but you can list all active queues from RabbitMQ.
Using Celery, from the doc:
celery -A proj inspect active
Using RabbitMQ, from the doc:
rabbitmqadmin list queues vhost name node messages message_stats.publish_details.rate
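If the routing really is the problem, a hedged example of declaring the queues explicitly so that .set(queue='queue1') and the workers agree on the names (old-style uppercase setting names, matching the answer above):

# settings.py -- example only
from kombu import Queue

CELERY_DEFAULT_QUEUE = 'queue1'
CELERY_QUEUES = (
    Queue('queue1'),
    Queue('queue2'),
    Queue('queue3'),
)

You can then compare what the workers consume (celery -A myapp inspect active_queues) with what is actually sitting in the broker (rabbitmqctl list_queues name messages).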
