I am running my workers with the following command:
celery -A myapp multi start 4 -l debug -Q:1-3 queue1,queue2 -Q:4 queue3
The workers start up fine, so when I run
celery inspect active_queues
the queues appear assigned.
Then I start tasks from my Django app with the following code:
result = chain(task1.s(**kwargs).set(queue='queue1'), task2.s(**kwargs).set(queue='queue2'))()
I traverse the result variable via result.parent to get all task IDs and record them in the database for later inspection. When I issue
task = AsyncResult(task.id)
task.status
I get
'PENDING'
for every task I start with my chain. The Celery logs don't show any tasks being received. However, when I issue a
celery purge
command with a following
yes
I get a message that my tasks have actually been removed from 1 queue.
From here on, the AsyncResult.status of the deleted tasks continues to show up as 'PENDING', and the tasks never start.
I use rabbitmq-server as the broker with its default configuration, and my Celery config is also default. The really strange part is that in another environment the same code and commands produce different results: the workers also start, but they do receive the very same tasks and execute them without any issues. What might be the issue here?
P.S. When I start a worker the other way:
celery -A myapp worker -Q queue1,queue2,queue3 -l debug
I still can't get my tasks to execute.
The problem started to show up when I modified my chain to launch tasks and added
.set(queue='queue1')
or queue2 or queue3
P.P.S.:
All my tasks are written with the
@shared_task
decorator.
Is there at least a way to see which tasks (the ones I can remove with celery purge) are waiting on a queue, and the name of the queue they are waiting on?
Celery's default settings should cover your case, so the only thing I can think of is that you have defined some of the following options in a way that mutes your queues; in that case, consider commenting them out (more in the docs, and see the sketch after this list):
CELERY_QUEUES
CELERY_ROUTES
CELERY_DEFAULT_EXCHANGE
CELERY_DEFAULT_ROUTING_KEY
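If any of those are defined, make sure every queue your chain targets is actually declared; otherwise messages published to queue1/queue2/queue3 may never end up in a queue a worker consumes from. A minimal sketch using the old-style setting names listed above (an illustration only, not your exact config):
from kombu import Queue

CELERY_DEFAULT_QUEUE = 'queue1'
CELERY_QUEUES = (
    Queue('queue1', routing_key='queue1'),
    Queue('queue2', routing_key='queue2'),
    Queue('queue3', routing_key='queue3'),
)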
As for your last question, I guess this is not the full answer, but you can list all active queues both from Celery and from RabbitMQ.
Using Celery, from the doc:
celery -A proj inspect active_queues
Using RabbitMQ, from the doc:
rabbitmqadmin list queues vhost name node messages message_stats.publish_details.rate
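You can also inspect this from Python. A small sketch, assuming your Celery app instance is importable (the import path myapp.celery is a placeholder for your project layout):
from myapp.celery import app  # placeholder import path

insp = app.control.inspect()
print(insp.active_queues())  # which queues each worker consumes from
print(insp.reserved())       # tasks prefetched by workers but not yet running
print(insp.scheduled())      # ETA/countdown tasks held by workers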
Related
I'm still really new to this kind of thing so it's entirely possible that I've got this wrong.
I am trying to set up a distributed task system. I have a Django webapp that is generating tasks using Celery. Right now, I have the webapp, the worker, and RabbitMQ running all on the same server. I would like to distribute this out to several servers.
As I currently understand it, I should be able to have my webapp generating tasks, handing them off to the message queue -- which is its own server -- and then workers distributed across any number of servers will consume tasks from that queue. I know how to tell my Django app which server is the broker, but how do I start worker threads on the worker servers and instruct them where to consume tasks from? I'm totally lost -- I don't even know where to look.
You can run your worker code (async_tasks.py) like this:
from celery import Celery

# Placeholder broker URL: point it at your broker machine, not localhost.
broker_url = 'amqp://guest:guest@your-broker-host:5672//'
app = Celery('tasks', broker=broker_url)

@app.task(queue='queue_name')
def async_compute_something(input):
    # do something
    return "Result"
and run it on other machines using this command:
celery -A async_tasks worker -Q queue_name
Note that you have to set the broker URL correctly (pointing at the broker machine, not localhost).
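On the Django/client side you only need the same task definition and the same broker URL. A quick sketch (the broker host is a placeholder, and result.get() only works if you also configure a result backend):
from async_tasks import async_compute_something

# Both the web app and every worker machine must point at the same broker,
# e.g. 'amqp://guest:guest@your-broker-host:5672//' (placeholder host).
result = async_compute_something.delay('some input')
# result.get(timeout=10)  # requires a result backend to be configured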
I have divided Celery into the following parts:
Celery
Celery worker
Celery daemon
Broker: RabbitMQ or SQS
Queue
Result backend
Celery monitor (Flower)
My Understanding
When I call a Celery task in Django, e.g. tasks.add(1,2), Celery adds that task to the queue. I am confused whether that is 4 or 5 in the above list.
When the task goes to the queue, a worker picks it up and deletes it from the queue.
The result of that task is saved in the result backend.
My Confusions
What's the difference between the Celery daemon and the Celery worker?
Is RabbitMQ doing the work of the queue? Does that mean tasks get saved in RabbitMQ or SQS?
What does Flower do? Does it monitor workers, tasks, queues, or results?
First, just to explain briefly how it works. You have a Celery client running in your code. You call tasks.add(1,2) and a new Celery task is created. That task is transferred by the broker to the queue. Yes, the queue is persisted in RabbitMQ or SQS. The Celery daemon is always running and listening for new tasks. When there is a new task in the queue, it starts a new Celery worker to perform the work.
To answer your questions:
The Celery daemon is always running, and it starts Celery workers.
Yes, RabbitMQ or SQS is doing the work of the queue.
With the celery monitor you can monitor how many tasks are running, how many are completed, what is the size of the queue, etc.
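A minimal end-to-end sketch of the flow described above (the broker and backend URLs are assumptions, not part of the question):
from celery import Celery

app = Celery('proj',
             broker='amqp://guest:guest@localhost:5672//',  # the queue lives in RabbitMQ
             backend='redis://localhost:6379/0')            # results are stored here

@app.task
def add(x, y):
    return x + y

# In your Django code: this publishes a message to the queue...
result = add.delay(1, 2)
# ...a worker picks it up, runs it, and stores the return value in the result backend.
print(result.get(timeout=10))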
I think the answer from nstoitsev has good intentions but creates some confusion.
So let's try to clarify a bit.
A Celery worker is the Celery process responsible for executing the tasks; when configured to run in the background, it is often called a Celery daemon. So you can consider the two to be the same thing.
To clear up the confusion in nstoitsev's answer: each worker can have a concurrency parameter that can be bigger than 1. When this is the case, the Celery worker creates N child processes, up to the concurrency parameter, to execute tasks in parallel; these children are often also called workers.
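To make the concurrency point concrete, a small sketch (old-style setting name; the value 4 and the broker URL are arbitrary placeholders):
from celery import Celery

app = Celery('proj', broker='amqp://guest:guest@localhost:5672//')  # placeholder broker URL
app.conf.CELERYD_CONCURRENCY = 4  # this one worker process forks 4 child processes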
The broker holds queues and exchanges. This means that a Celery worker connects to the broker using a protocol called AMQP and publishes or consumes messages.
Flower monitors a Celery cluster using the broker itself; basically, it receives events from all the workers. Flower also works if you have the result backend disabled, which, by the way, is Celery's default behavior (see the Celery result backend docs).
Hope this helps.
I have a Celery task with 100 inputs in the queue and need to execute them using 5 workers.
How can I find out which worker is executing which input?
How many inputs has each worker executed, and what is each one's status?
If any task fails, how can I get the failed input data separately and re-execute it with an available worker?
Is there any way to customize Celery on a per-worker basis?
Can we combine Celery worker limits and Flower?
I am not using any framework.
How can I find out which worker is executing which input?
There are 2 options for using multiple workers:
You run each worker separately, with separate run commands
You run them in one command, using the command-line option -c, i.e. concurrency
With the first method, Flower will show you all the workers, all the tasks (what you call inputs), which worker processed which task, and other information too.
With the second method, Flower will show all the tasks as being processed by a single worker. In this case you can only differentiate by viewing the logs generated by the Celery worker, since the logs record which worker THREAD executed which task. So I think you will be better off using the first option, given your requirements.
How many inputs has each worker executed, and what is each one's status?
As I mentioned, using the first approach, Flower will give you this information.
If any task fails, how can I get the failed input data separately and re-execute it with an available worker?
Flower provides filters for the failed tasks and shows what status each task returned on exit. You can also set how many times Celery should retry a failed task, but if the task still fails after those retries, you will have to relaunch it yourself.
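For the retry part, a common pattern is to let the task retry itself a limited number of times. A hedged sketch (the task name and processing body are placeholders):
from celery import shared_task

def do_work(item):
    # placeholder for your real processing of one input
    return {'item': item, 'ok': True}

@shared_task(bind=True, max_retries=3)
def process_input(self, item):
    try:
        return do_work(item)
    except Exception as exc:
        # Re-queue this task; after 3 failed retries it is marked FAILURE
        # and shows up under Flower's failed-task filter.
        raise self.retry(exc=exc, countdown=5)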
For the first and second question:
1) Using Flower API:
You can use Celery Flower to keep track of it. The Flower API can give you information such as which task is being executed by which worker through simple API calls (/api/task/info/<task_id>).
2) Querying celery directly:
from celery import Celery

# Connect to the same broker/result backend your workers use, then inspect them.
celery = Celery('vwadaptor', broker='redis://workerdb:6379/0', backend='redis://workerdb:6379/0')
celery.control.inspect().active()  # maps each worker to the tasks it is currently executing
3) Using celery events:
Link : http://docs.celeryproject.org/en/latest/userguide/monitoring.html
(look for Real-time Processing)
You can capture events (task created, task received, etc.), and each event will include the worker name (hostname; see the link).
For the third question:
Use the config entry CELERY_ACKS_LATE=True so that tasks are acknowledged only after they complete; if a worker dies while executing a task, the task is re-delivered and can be picked up by another worker.
celery.conf.update(
CELERY_ACKS_LATE=True,
)
You can also track failed tasks using the Celery events mentioned above and retry them manually.
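As a concrete example of option 1, the Flower HTTP API can be queried directly. A sketch, assuming Flower runs on localhost on its default port 5555 (both are assumptions):
import requests

task_id = 'your-task-id'  # placeholder
resp = requests.get('http://localhost:5555/api/task/info/%s' % task_id)
print(resp.json())  # the task info includes fields such as the worker hostname and the task state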
I have two files containing Celery task definitions, each with the code for a specific queue. One of them imports scikit-learn and is therefore a bit memory-hungry given the limited memory the VPS has. When Celery initializes, it imports both files to look for tasks, so every Celery worker ends up importing scikit-learn. Is there a way to prevent this?
I have tried using inspect to get the currently active queues and continue only if this worker consumes the relevant queue, but it doesn't seem to work during initialization:
i = inspect(['celery@hostname'])
print i.active_queues()  # None
I think the best way to go is to start two workers, let them load two different apps, and consume two different queues.
Example worker start commands, off the top of my head:
celery -A scikit -Q learning worker
celery -A default -Q default worker
That of course requires you to add task routing (so that scikit tasks go into the learning queue and the others go to the default queue), as sketched below.
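A routing sketch using the old-style setting (the module and task names are placeholders for wherever your scikit-learn tasks live; unrouted tasks fall back to the default queue):
CELERY_ROUTES = {
    'scikit_tasks.train_model': {'queue': 'learning'},
    'scikit_tasks.predict': {'queue': 'learning'},
}
CELERY_DEFAULT_QUEUE = 'default'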
I was able to solve it by emptying the CELERY_IMPORTS list and then including the module via the command line:
celery -A proj worker -l info -Q first_queue -I proj.file1
which only looks for tasks in proj.file1.
I'm getting started with celery and I want to know if it is possible to add modules to celeryd processes that have already been started. In other words, instead of adding modules via celeryconfig.py as in
CELERY_IMPORTS = ("tasks", "additional_module" )
before starting the workers, I want to make additional_module available later somehow after the worker processes have started.
Thanks in advance.
You can achieve your goal by starting a new celeryd with an expanded import list and eventually gracefully shutting down your old worker (after it's finished its current jobs).
Because of the asynchronous nature of getting jobs pushed to you and only marking them done after celery has finished its work, you won't actually miss any work doing it this way. You should be able to run the celery workers on the same machine - they'll simply show up as new connections to RabbitMQ (or whatever queue backend you use).
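If you want to script the switchover, a rough sketch (the worker node name and import path are placeholders; the remote shutdown is a warm shutdown, so the old worker finishes its current jobs before exiting):
from myapp.celery import app  # placeholder import path

# The new worker was started with CELERY_IMPORTS = ("tasks", "additional_module")
# or with the -I/--include option; now tell the old worker to drain and exit.
app.control.broadcast('shutdown', destination=['celery@old-worker-host'])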