How to set up a distributed worker pool with Celery and RabbitMQ


I'm still really new to this kind of thing, so it's entirely possible that I've got this wrong.
I am trying to set up a distributed task system. I have a Django webapp that is generating tasks using Celery. Right now, the webapp, the worker, and RabbitMQ all run on the same server. I would like to distribute this out to several servers.
As I currently understand it, I should be able to have my webapp generating tasks, handing them off to the message queue -- which is its own server -- and then workers distributed across any number of servers will consume tasks from that queue. I know how to tell my Django app which server is the broker, but how do I start worker threads on the worker servers and instruct them where to consume tasks from? I'm totally lost -- I don't even know where to look.

Put your worker code in a module (async_tasks.py) like this:
from celery import Celery
# Placeholder URL: it must point at the RabbitMQ server, not localhost
broker_url = "amqp://myuser:mypassword@broker-host:5672/myvhost"
app = Celery('tasks', broker=broker_url)
@app.task(queue='queue_name')
def async_compute_something(data):
    # do something with data
    return "Result"
and run it on the other machines with this command:
celery -A async_tasks worker -Q queue_name
Make sure the broker URL is set to the RabbitMQ server's address and not to localhost.

Related

For a web application in Python running in a web server with WSGI, how can I have a single WSGI worker perform a task?

My web application with Python 3.9 and Flask is running in a web server with WSGI.
As more users connect to the web server, WSGI starts more workers, but some tasks must be performed by a single WSGI worker rather than by all workers at the same time.
Such tasks include:
delete obsolete files in the disk
copy some data from a file to REDIS
delete specific lines in various TXT and LOG files
If all workers run these tasks, things get messy.
How can I have a single worker do it, rather than all of them?

You may want to look into an asynchronous task queue such as Celery. It moves these jobs out of the WSGI workers into separate worker processes, and you can define the frequency at which the tasks run.

Some confusions regarding celery in python

I have divided Celery into the following parts:
1. Celery
2. Celery worker
3. Celery daemon
4. Broker: RabbitMQ or SQS
5. Queue
6. Result backend
7. Celery monitor (Flower)
My Understanding
When I call a Celery task in Django, e.g. tasks.add(1,2), Celery adds that task to the queue. I am confused whether that is 4 or 5 in the list above.
When the task goes to the queue, a worker gets that task and deletes it from the queue.
The result of that task is saved in the result backend.
My Confusions
What's the difference between the Celery daemon and a Celery worker?
Is RabbitMQ doing the work of the queue? Does that mean tasks get saved in RabbitMQ or SQS?
What does Flower do? Does it monitor workers, tasks, queues, or results?

First, a brief explanation of how it works. You have a Celery client running in your code. You call tasks.add(1,2) and a new Celery task is created. That task is transferred by the broker to the queue. Yes, the queue is persisted in RabbitMQ or SQS. The Celery daemon is always running and listening for new tasks. When there is a new task in the queue, it starts a new Celery worker to perform the work.
To answer your questions:
The Celery daemon is always running and starts Celery workers.
Yes, RabbitMQ or SQS is doing the work of the queue.
With the Celery monitor you can see how many tasks are running, how many are completed, what the size of the queue is, etc.

I think the answer from nstoitsev has good intentions but creates some confusion.
So let's try to clarify a bit.
A Celery worker is the Celery process responsible for executing the tasks; when it is configured to run in the background, it is often called the Celery daemon. So you can consider the two the same thing.
To clear up the confusion in nstoitsev's answer: each worker has a concurrency parameter that can be greater than 1. In that case, each Celery worker can create up to N child workers (N being the concurrency value) to execute tasks in parallel, and these are often also called workers.
The broker holds queues and exchanges, which means a Celery worker can connect to the broker using a protocol called AMQP and publish or consume messages.
Flower can monitor a Celery cluster through the broker itself: it receives events from all the workers. Flower also works if you have the result backend disabled, which, by the way, is Celery's default behavior (see the Celery result backend documentation).
Hope this helps.
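To make these roles concrete, here is a toy, in-process model using only the standard library. It is not how Celery is implemented: a queue.Queue stands in for a queue held by the broker, a dict for the optional result backend, and a thread pool of size concurrency for one worker's child workers. All names are made up for illustration:

```python
import queue
from concurrent.futures import ThreadPoolExecutor

# Toy stand-ins (hypothetical, not Celery APIs):
broker_queue = queue.Queue()  # a queue held by the broker (RabbitMQ/SQS)
result_backend = {}           # optional store for task results
TASKS = {"add": lambda x, y: x + y}  # tasks the worker knows how to run


def publish(task_id, name, *args):
    """Client side: roughly what tasks.add.delay(1, 2) does, i.e. send a message."""
    broker_queue.put((task_id, name, args))


def execute(message):
    """Runs in a child worker: execute the task and store its result."""
    task_id, name, args = message
    result_backend[task_id] = TASKS[name](*args)


def run_worker(concurrency=2):
    """One worker: drain the queue, running up to `concurrency` tasks in parallel."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        while True:
            try:
                message = broker_queue.get_nowait()  # take the message off the queue
            except queue.Empty:
                break
            pool.submit(execute, message)
    # leaving the with-block waits for all submitted tasks to finish


publish("t1", "add", 1, 2)
publish("t2", "add", 10, 20)
run_worker(concurrency=2)
print(sorted(result_backend.items()))  # [('t1', 3), ('t2', 30)]
```

A real worker keeps consuming instead of stopping when the queue is empty; whether it runs in the foreground or as a background daemon does not change this picture.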

Celery multi with queues set up not receiving tasks from django

I am running my workers with the following command:
celery -A myapp multi start 4 -l debug -Q1:3 queue1,queue2 -Q:4 queue3
The workers start fine; when I run
celery inspect active_queues
the queues appear assigned.
Then I start tasks from my Django app with the following code:
result = chain(task1.s(**kwargs).set(queue='queue1'), task2.s(**kwargs).set(queue='queue2'))()
I walk the result variable via result.parent to get all task IDs and record them in the database for further inspection. When I issue
task = AsyncResult(task.id)
task.status
i get
'PENDING'
for every task I start with my chain. The Celery logs don't show any tasks being received. However, when I issue a
celery purge
command followed by
yes
I get a message that my tasks have actually been removed from 1 queue.
From then on, AsyncResult.status for the deleted tasks continues to show 'PENDING', and the tasks never start.
I use rabbitmq-server as the broker with the default configuration, and my Celery config is also the default. Strangely, in another environment the same code and commands behave differently: the workers start, receive the very same tasks, and execute them without any issues. What might the issue be here?
P.S. When I start a worker the other way:
celery -A myapp worker -Q queue1,queue2,queue3 -l debug
I still can't get my tasks to execute.
The problem started to show up when I modified my chain to launch tasks and added the
.set(queue='queue1')
(or queue2 or queue3) call.
P.P.S.:
All my tasks are written with the
@shared_task
decorator.
Is there at least a way to see which tasks (the ones I can remove with celery purge) are waiting on a queue, and which queue they are waiting on?

Celery's default settings should cover your case, so the only likely cause is that you have defined one of the following options in a way that mutes your queues; in that case, consider commenting them out (more in the docs):
CELERY_QUEUES
CELERY_ROUTES
CELERY_DEFAULT_EXCHANGE
CELERY_DEFAULT_ROUTING_KEY
As for your question, I guess that's not the full answer, but you can list all active queues from RabbitMQ.
Using Celery, from the doc:
celery -A proj inspect active
Using RabbitMQ, from the doc:
rabbitmqadmin list queues vhost name node messages message_stats.publish_details.rate

Celery configure separate connection for producer and consumer

We have an application set up on Heroku, which uses Celery to run background jobs.
The Celery app uses RabbitMQ as the broker.
We use Heroku's RabbitMQ Bigwig add-on as the AMQP message broker.
This add-on provides two separate URLs, one optimized for producers and the other optimized for consumers.
Also, the RabbitMQ documentation recommends using separate connections for producers and consumers.
The Celery documentation does not describe a way to specify separate connections for the producer and the consumer.
Is there a way to specify two different broker urls in celery?

Unfortunately, there isn't a clean way to do that. You can pass a custom broker connection explicitly to task.apply_async, but that means giving up on the connection pool feature. It might still work for you.
from kombu import Connection  # BrokerConnection in older code is an alias of Connection
conn = Connection(hostname="producerbroker")  # or a full amqp:// URL for the producer endpoint
mytask.apply_async(args, kwargs, connection=conn)  # mytask, args, kwargs come from your own code
The most straightforward solution is probably to have different config files for producer and worker.

Add functions dynamically to existing celery worker processes?

I'm getting started with Celery and I want to know if it is possible to add modules to celeryd processes that have already been started. In other words, instead of adding modules via celeryconfig.py as in
CELERY_IMPORTS = ("tasks", "additional_module")
before starting the workers, I want to make additional_module available later somehow after the worker processes have started.
Thanks in advance.

You can achieve your goal by starting a new celeryd with an expanded import list and eventually gracefully shutting down your old worker (after it's finished its current jobs).
Because of the asynchronous nature of getting jobs pushed to you and only marking them done after celery has finished its work, you won't actually miss any work doing it this way. You should be able to run the celery workers on the same machine - they'll simply show up as new connections to RabbitMQ (or whatever queue backend you use).
