Specify the number of threads available to Celery when running many queues - python

I have been using Celery in production with Django on a web server for almost two years, and for just as long I have searched without success for a solution to this problem: "How do I specify the number of threads available to Celery?"
I have 32 threads on my production server and 7 Celery queues.
I run Celery on CentOS, managed by supervisord, like this:
celery.ini
[program:Site_Web_celery-worker1]
command=/etc/supervisord.d/celery-worker1.sh
directory=/var/www/html/SiteWeb/Site_Web/
user=apache
numprocs=1
stdout_logfile=/var/log/celery/worker1.log
stderr_logfile=/var/log/celery/worker1.log
autostart=true
autorestart=true
priority=999
stopasgroup=true
The Celery command line for the first queue:
celery -A Site_Web.celery_settings worker -l info --autoscale 22 -Q default -n worker1.%h
In summary:
How can I tell Celery to work only on the first 30 threads and never use the last 2?
Thanks in advance for any help and tips.

If I understood correctly, you want to set CPU affinity for every worker process spawned by Celery. Celery does not support setting CPU affinity for its worker processes, and to do this manually you would have to spend a large amount of time writing a monitoring tool that constantly watches the Celery worker and its child processes and sets their CPU affinity using taskset or something similar.
I personally believe it is not worth the effort. Good reasons for setting CPU affinity are rare - trust your system's scheduler.
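For what it's worth, a minimal sketch of what such a manual approach could look like, using psutil rather than taskset and assuming that matching worker processes by their command line is good enough (the 30-CPU limit mirrors the question; everything else is a placeholder):

import psutil

ALLOWED_CPUS = list(range(30))  # pin workers to the first 30 logical CPUs

def pin_celery_processes():
    """Find processes that look like Celery workers and restrict their CPU affinity."""
    for proc in psutil.process_iter(["cmdline"]):
        try:
            cmdline = " ".join(proc.info["cmdline"] or [])
            if "celery" in cmdline and "worker" in cmdline:
                proc.cpu_affinity(ALLOWED_CPUS)
        except (psutil.NoSuchProcess, psutil.AccessDenied):
            continue  # process exited or we lack permissions; skip it

if __name__ == "__main__":
    pin_celery_processes()

Such a script would have to be re-run whenever the pool spawns new child processes (for example when autoscale kicks in), which is exactly the kind of babysitting described above.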

Related

How to update and synchronize tasks and code on all workers

I have Celery running on a few computers and use Flower for monitoring.
The computers are used by different people.
Celery beat generates jobs for all the workers from one of the computers.
Every time a newly coded task is ready, all the workers except the beat computer raise a "task not registered" exception.
What is the recommended way to sync the code to all the other computers on the network? Is there a pre-hook kind of mechanism in Celery to check for new code?
Unfortunately, you need to update the code on all the workers (nodes) and after that you need to restart all of them. This is by (good) design.
A clever systemd service could, in theory, send the graceful shutdown signal, run pip install -U your-project, and then start the Celery service again, as sketched below.
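A minimal sketch of such a unit, assuming the project lives in a virtualenv under /opt/your-project and the worker runs as a dedicated user (every path, user, and package name here is a placeholder):

[Unit]
Description=Celery worker that refreshes project code before starting
After=network.target

[Service]
Type=simple
User=celery
WorkingDirectory=/opt/your-project
# Upgrade the project package before the worker starts (placeholder package name).
ExecStartPre=/opt/your-project/venv/bin/pip install -U your-project
ExecStart=/opt/your-project/venv/bin/celery -A your_project worker -l info
# On stop, systemd sends SIGTERM, which Celery treats as a warm (graceful) shutdown.
Restart=on-failure

[Install]
WantedBy=multi-user.target

Restarting this unit after each release (systemctl restart) then gives you the stop / upgrade / start cycle described above.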

Does eventlet with Celery execute tasks in parallel?

This post is a continuation of my previous post: celery how to implement single queue with multiple workers executing in parallel?
I set up Celery to work with eventlet using this command:
celery -A project worker -P eventlet -l info --concurrency=4
I can see that my tasks are moved to the active list faster (in Flower), but I am not sure whether they are executing in parallel. I have a 4-core production server, but I am not utilizing all the cores at the same time.
My question is: how can I use all 4 cores to execute tasks in parallel?
Both the eventlet and gevent worker pools provide a great solution for concurrency, at the cost of limiting parallelism to 1. To get true parallel task execution and make use of all your cores, run several Celery instances on the same machine.
I know this goes counter to what popular Linux distros have in mind, so just ignore the system packages and roll your own configuration from scratch. A systemd service template is your friend.
Another option is to run Celery with the prefork pool: you get parallelism at the cost of limiting concurrency to the number of worker processes.
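For example, one eventlet worker per core could be started with celery multi, or the prefork pool could be sized to the machine (the concurrency figures below are only illustrative; the app name comes from the question):

# Start 4 separate eventlet workers on the same machine, roughly one per core.
celery multi start 4 -A project -P eventlet -l info --concurrency=100

# Or use the default prefork pool, whose 4 child processes run tasks in parallel.
celery -A project worker -l info --concurrency=4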

Running rqworker concurrently

I'm new to RQ and am trying to use it for a job which will run in the background. I have managed to set it up, and I'm also able to start more than one worker.
Now I'm trying to run these workers concurrently. I installed supervisor and followed a tutorial to add programs to it, and it worked.
Here is my supervisor configuration:
[program:rqworker]
command=/usr/local/bin/rq worker mysql
process_name=rqworker1-%(process_num)s
numprocs=3
directory=/home/hp/Python/sample
stopsignal=TERM
autostart=true
autorestart=true
stdout_logfile=/home/hp/Python/sample/logs
The worker function is present in the sample directory mentioned above.
The problem is that even after specifying numprocs as 3 in the config file, the workers do not run in parallel.
Here are some screenshots, which show that although multiple workers have been started, they do not work in parallel.
Also, I saw this Stack Overflow answer, but it still doesn't divide the jobs amongst the workers!
Could anyone tell me what is wrong with this configuration/what I need to change?
I found the problem; it wasn't with supervisor or rqworker. The manager program was blocking concurrency by waiting for each task to complete!
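As an illustration of that kind of mistake, a manager that waits for each job before enqueueing the next one serialises the work, while enqueueing everything up front lets the three supervised workers pull jobs in parallel (the queue name mysql comes from the config above; the tasks module and count_words function are placeholders):

import time
from redis import Redis
from rq import Queue
from tasks import count_words  # placeholder task module, assumed importable by the workers

q = Queue("mysql", connection=Redis())
urls = ["http://example.com/a", "http://example.com/b", "http://example.com/c"]

# Blocking pattern: waiting on each job keeps only one worker busy at a time,
# no matter how many rq worker processes supervisor has started.
# for url in urls:
#     job = q.enqueue(count_words, url)
#     while not job.is_finished:
#         time.sleep(1)

# Non-blocking pattern: enqueue everything and let the workers drain the queue concurrently.
jobs = [q.enqueue(count_words, url) for url in urls]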

celery worker node and celery beat startup configuration

I have a Celery worker that executes a bunch of tasks for data loading. I start up my worker node using the following:
celery -A ingest_tasks worker -Q ingest --loglevel=INFO --concurrency=3 -f /var/log/celery/ingest_tasks.log
I have another application I want to set up as a Celery beat to periodically grab files from various locations. Since my second application is not under the worker node, I should be OK just starting the beat like this, correct?
celery -A file_mover beat -s /my/path/celerybeat-schedule
I have never used Celery beat before. From reading the docs it seems pretty straightforward, but I wanted to make sure this is correct.

"celeryd stop" is not working

I am using Celery in an uncommon way: I create a custom process when Celery is started, and this process should keep running for as long as Celery is running.
Celery workers use this process for their tasks (details not needed).
I run Celery from the command line and everything is OK:
celery -A celery_jobs.tasks.app worker -B --loglevel=warning
But when I use celeryd to daemonize Celery, there is no way to stop it.
The command celeryd stop tries to stop Celery but never finishes.
When I check the process trees in both cases, there is a difference: when running from the command line, the parent is obviously the celery process (the main process that has the Celery workers as children). Killing (stopping) that parent celery process stops all the Celery workers and my custom process.
But when running with celeryd, my custom process has /sbin/init as its parent, and calling celeryd stop does not work; it seems like the main celery process is waiting for something, or is unable to stop my custom process because it is not a child process of celery.
I don't know much about processes, and it is not easy to find information because I don't know what to search for, so any tips are appreciated.
I have had the same problem. I needed a quick solution, so I wrote this bash script:
#!/bin/bash
/etc/init.d/celeryd stop
sleep 10
# Collect the PIDs of any celery processes that are still alive.
PIDS=$(ps -ef | grep celery | grep -v 'grep' | awk '{print $2}')
for PID in $PIDS; do kill -9 "$PID"; done
If a process hasn't stopped after 10 seconds, it is probably never going to stop gracefully, so I decided to kill it abruptly.
I assume your custom process is not a child of any of your pool worker processes and need not be so.
I use supervisord instead of celeryd to daemonize workers. It can be used to daemonize other processes as well, such as your custom processes.
In your case, your supervisord.conf can have multiple sections: one for each Celery worker node and one (or more) for your custom process(es).
When you kill the supervisord process (with -TERM), it will take care of terminating all the workers and your custom process as well. If you use -TERM, you will need to make sure your custom processes handle that signal.
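A rough sketch of such a supervisord.conf, following the same pattern as the configs earlier on this page (the directories, virtualenv path, user, and custom_daemon.py script are placeholders; the app name is taken from the command above):

[program:celery-worker]
command=/var/www/myproject/venv/bin/celery -A celery_jobs.tasks.app worker -B --loglevel=warning
directory=/var/www/myproject
user=apache
autostart=true
autorestart=true
stopasgroup=true
stdout_logfile=/var/log/celery/worker.log
stderr_logfile=/var/log/celery/worker.log

[program:custom-process]
command=/var/www/myproject/venv/bin/python custom_daemon.py
directory=/var/www/myproject
user=apache
autostart=true
autorestart=true
stopsignal=TERM

Stopping supervisord (or running supervisorctl stop all) then shuts down the worker and the custom process together.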
