I have a few questions regarding task routing, concurrency and performance.
Here is my use case:
I've got one dedicated server to run celery tasks, so I can use all the CPUs to run celery workers on this server.
I have a lot of different Python tasks, which I route using CELERY_ROUTES, and because the tasks run very different kinds of Python code, I created 5 different workers.
These workers are created when I deploy my project with Ansible; here is an example:
[program:default_queue-celery]
command={{ venv_dir }}/bin/celery worker --app=django_coreapp --loglevel=INFO --concurrency=1 --autoscale=15,10 --queues=default_queue
environment =
SERVER_TYPE="{{ SERVER_TYPE }}",
DB_SCHEMA="{{ DB_SCHEMA }}",
DB_USER="{{ DB_USER }}",
DB_PASS="{{ DB_PASS }}",
DB_HOST="{{ DB_HOST }}"
directory={{ git_dir }}
user={{ user }}
group={{ group }}
stdout_logfile={{ log_dir }}/default_queue.log
stdout_logfile_maxbytes=50MB
stdout_logfile_backups=5
redirect_stderr=true
autostart=true
autorestart=true
startsecs=10
killasgroup=true
I also have CELERY_QUEUES in settings.py to bridge CELERY_ROUTES and my celery programs (queues), and
CELERY_DEFAULT_QUEUE = 'default_queue'
so if I don't route a task explicitly, it goes to 'default_queue'.
To give space to all of my queues, I set --concurrency to 1 for default_queue, and more for my most important queue.
But I am wondering: does --autoscale affect the same value as --concurrency? That is, if I set --concurrency to 1 and --autoscale to 15,10 (as in the example above),
will my worker process a maximum of 15 tasks at a time on this CPU?
Or does it mean something completely different?
It makes no sense to set both --concurrency and --autoscale, since both are means to control the number of worker subprocesses for a given worker instance, as explained here.
--concurrency N means you will have exactly N worker subprocesses for your worker instance (meaning the worker instance can handle N concurrent tasks).
--autoscale max,min means you will have at least min and at most max concurrent worker subprocesses for a given worker instance.
Which CPU each process (the main worker process or any of its child subprocesses) runs on is not predictable; that is up to the OS. But do not assume the subprocesses will all run on the same CPU (chances are they won't; that is actually part of the point of having concurrent subprocesses).
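For illustration, a minimal sketch using the app and queue names from the question above; you would pass only one of the two flags, not both:
# fixed pool: always exactly 4 subprocesses (pick whatever number fits this queue)
celery worker --app=django_coreapp --loglevel=INFO --queues=default_queue --concurrency=4
# autoscaled pool: between 10 and 15 subprocesses, grown and shrunk with load
celery worker --app=django_coreapp --loglevel=INFO --queues=default_queue --autoscale=15,10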
Related
I have a Django project with celery
Due to RAM limitations I can only run two worker processes.
I have a mix of 'slow' and 'fast' tasks.
Fast tasks shall be executed ASAP. There can be many fast tasks in a short time frame (0.1s - 3s), so ideally both CPUs should handle them.
Slow tasks might run for a few minutes but the result can be delayed.
Slow tasks occur less often, but it can happen that 2 or 3 are queued up at the same time.
My idea was to have:
1 celery worker W1 with concurrency 1, that handles only fast tasks
1 celery worker W2 with concurrency 1 that can handle fast and slow tasks.
Celery has by default a task prefetch multiplier (https://docs.celeryproject.org/en/latest/userguide/configuration.html#worker-prefetch-multiplier) of 4, which means that 4 fast tasks could be queued behind a slow task and delayed by several minutes. Thus I'd like to disable prefetching for worker W2. The docs state:
To disable prefetching, set worker_prefetch_multiplier to 1. Changing
that setting to 0 will allow the worker to keep consuming as many
messages as it wants.
However, what I observe is that with a prefetch_multiplier of 1, one task is still prefetched and can still be delayed by a slow task.
Is this a documentation bug? Is this an implementation bug? Or do I misunderstand the documentation?
Is there any way to implement what I want?
The commands that I execute to start the workers are:
celery -A miniclry worker --concurrency=1 -n w2 -Q=fast,slow --prefetch-multiplier 0
celery -A miniclry worker --concurrency=1 -n w1 -Q=fast
My celery settings are the defaults except:
CELERY_BROKER_URL = "pyamqp://*****@localhost:5672/mini"
CELERY_TASK_ROUTES = {
'app1.tasks.task_fast': {"queue": "fast"},
'app1.tasks.task_slow': {"queue": "slow"},
}
My django project's celery.py file is:
from __future__ import absolute_import
import os
from celery import Celery
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'miniclry.settings')
app = Celery("miniclry", backend="rpc", broker="pyamqp://")
app.config_from_object('django.conf:settings', namespace='CELERY')
app.autodiscover_tasks()
The __init__.py of my django project is
from .celery import app as celery_app
__all__ = ('celery_app',)
The code of my tasks is:
import time, logging
from celery import shared_task
from miniclry.celery import app as celery_app

logger = logging.getLogger(__name__)

@shared_task
def task_fast(delay=0.1):
    logger.warning("fast in")
    time.sleep(delay)
    logger.warning("fast out")

@shared_task
def task_slow(delay=30):
    logger.warning("slow in")
    time.sleep(delay)
    logger.warning("slow out")
If I execute the following from a management shell, I see that a fast task is only executed after the slow task has finished:
from app1.tasks import task_fast, task_slow
task_slow.delay()
for i in range(30):
    task_fast.delay()
Can anybody help?
I could post the entire test project if that is considered helpful; just advise me on the recommended SO way of sharing such a project.
Version info:
celery==4.3.0
Django==1.11.25
Python 2.7.12
I can confirm the issue; there is a bug in that section of the documentation. worker_prefetch_multiplier = 1 does just what it says: it sets the worker's prefetch to 1, which means the worker will hold one more task in addition to the one it is executing at the moment.
To actually disable prefetching you also need task_acks_late = True along with the prefetch setting; see this docs section.
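As a minimal sketch of what that looks like in the asker's Django settings (using the CELERY_ namespace configured in celery.py above; note that with late acks your tasks should be idempotent, since they can be redelivered after a crash):
# settings.py -- sketch only
CELERY_WORKER_PREFETCH_MULTIPLIER = 1   # each process reserves at most one extra message
CELERY_TASK_ACKS_LATE = True            # acknowledge after the task finishes, not when it is received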
I have a network of Celery servers and workers that are to be used for a high volume I/O task coming up. There are two queues, default and backlog, and each server has five workers. All servers are daemonized with a configuration much like the init script config documentation.
What I'd like to do for one server is have three workers for default and two for backlog. Is it possible to do this with a daemon configuration?
Have a look here at the part where it shows an example configuration; it also says:
# Names of nodes to start
# most people will only start one node:
CELERYD_NODES="worker1"
# but you can also start multiple and configure settings
# for each in CELERYD_OPTS (see `celery multi --help` for examples):
So as you can see, it is possible to have celeryd start multiple nodes, and you can configure each of them via CELERYD_OPTS; therefore you can set a different queue for each of them.
Here is another, more complete example of celery multi; I post a small extract here.
# Advanced example starting 10 workers in the background:
# * Three of the workers process the images and video queue
# * Two of the workers process the data queue with loglevel DEBUG
# * the rest process the default queue.
$ celery multi start 10 -l INFO -Q:1-3 images,video -Q:4,5 data -Q default -L:4,5 DEBUG
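Applied to the setup in the question (five workers on one server, three consuming default and two consuming backlog), a daemon configuration sketch could look like the following; it mirrors the multi syntax quoted above, and you would keep your existing app and path settings:
# /etc/default/celeryd -- sketch only
# five numbered nodes on this server
CELERYD_NODES=5
# nodes 1-3 consume the default queue, nodes 4 and 5 consume backlog
CELERYD_OPTS="-Q:1-3 default -Q:4,5 backlog"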
We were recently forced to replace Celery with RQ because it is simpler and Celery was giving us too many problems. Now we cannot find a way to create multiple queues dynamically, because we need to get multiple jobs done concurrently. Basically, every request to one of our routes should start a job, and it doesn't make sense to have multiple users wait for one user's job to finish before we can proceed with the next jobs. We periodically send a request to the server to get the job's status and some metadata, so we can update the user with a progress bar (it can be a lengthy process, so this has to be done for the sake of UX).
We are using Django and Python's rq library. We are not using django-rq (please let me know if there are advantages to using it).
So far we start a task in one of our controllers like this:
from redis import Redis
from rq import Queue
import django_rq

redis_conn = Redis()
q = Queue(connection=redis_conn)
job = django_rq.enqueue(render_task, new_render.pk, domain=domain, data=csv_data, timeout=1200)
Then in our render_task method we add meta data to the job based on the state of the long task:
from rq import get_current_job

current_job = get_current_job()
current_job.meta['state'] = 'PROGRESS'
current_job.meta['process_percent'] = process_percent
current_job.meta['message'] = 'YOUTUBE'
current_job.save()
Now we have another endpoint that gets the current task and its metadata and passes it back to the client (this happens through a periodic AJAX request).
How do we go about running jobs concurrently without blocking other jobs? Should we make queues dynamically? Is there a way to make use of Workers in order to achieve this?
As far as I know, RQ does not have any facility to manage multiple workers; you have to start a new worker process and define which queue it will consume. One way of doing this that works pretty well for me is using Supervisor. In Supervisor you configure your workers for a given queue and the number of processes to get concurrency. For example, you can have a 'high-priority' queue with 5 workers and a 'low-priority' queue with 1 worker, as sketched below.
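A rough Supervisor sketch of that layout (the program names, paths, and queue names are placeholders, not part of the original answer):
; sketch only -- adjust paths and queue names
[program:rq-high]
command=/path/to/venv/bin/rq worker high-priority
numprocs=5
process_name=%(program_name)s_%(process_num)02d
autostart=true
autorestart=true

[program:rq-low]
command=/path/to/venv/bin/rq worker low-priority
numprocs=1
process_name=%(program_name)s_%(process_num)02d
autostart=true
autorestart=true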
It is not only possible but ideal to run multiple workers. I use a bash file for the start command to enter the virtual env, and launch with a custom Worker class.
Here's a supervisor config that has worked very well for me for RQ workers, under a production workload as well. Note that startretries is high since this runs on AWS and needs retries during deployments.
[program:rq-workers]
process_name=%(program_name)s_%(process_num)02d
command=/usr/local/bin/start_rq_worker.sh
autostart=true
autorestart=true
user=root
numprocs=5
startretries=50
stopsignal=INT
killasgroup=true
stopasgroup=true
stdout_logfile=/opt/elasticbeanstalk/tasks/taillogs.d/super_logs.conf
redirect_stderr=true
Contents of start_rq_worker.sh
#!/bin/bash
date > /tmp/date
source /opt/python/run/venv/bin/activate
source /opt/python/current/env
/opt/python/run/venv/bin/python /opt/python/current/app/manage.py rqworker --worker-class rq.SimpleWorker default
I would like to suggest a very simple solution using django-rq:
Sample settings.py
...
RQ_QUEUES = {
'default': {
'HOST': os.getenv('REDIS_HOST', 'localhost'),
'PORT': 6379,
'DB': 0,
'DEFAULT_TIMEOUT': 360,
},
'low': {
'HOST': os.getenv('REDIS_HOST', 'localhost'),
'PORT': 6379,
'DB': 0,
'DEFAULT_TIMEOUT': 360,
}
}
...
Run Configuration
Run python manage.py rqworker default low as many times (each time in its own shell, or as its own Docker container, for instance) as the number of desired workers. The order of queues in the command determines their priority. At this point, all workers are listening to both queues.
In the Code
When calling a job to run, pass in the desired queue:
For high/normal priority jobs, you can make the call without any parameters, and the job will enter the default queue. For low priority, you must specify it, either at the job level:
from django_rq import job

@job('low')
def my_low_priority_job():
    # some code
    ...
And then call my_low_priority_job.delay().
Alternatively, determine priority when calling:
queue = django_rq.get_queue('low')
queue.enqueue(my_variable_priority_job)
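Putting the two call styles side by side (a sketch; my_variable_priority_job is the placeholder name used above):
import django_rq

# normal priority: django_rq.enqueue() uses the 'default' queue
django_rq.enqueue(my_variable_priority_job)

# low priority: pick the 'low' queue explicitly
django_rq.get_queue('low').enqueue(my_variable_priority_job)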
I am currently using celeryd to run my Celery workers as a daemon. My /etc/default/celeryd file contains the following:
CELERYD_NODES="w1 w2 w3"
Which obviously starts three worker processes.
How do I configure routing to work with this configuration? e.g.
celeryd -c 2 -l INFO -Q import
If I run celery from the command line I can specify the queue using the -Q flag. I need to tell my w1 worker process to only process tasks from the "import" queue.
You can make different workers consume from different (or the same) queues by giving the proper arguments in CELERYD_OPTS.
Refer to this: http://celery.readthedocs.org/en/latest/reference/celery.bin.multi.html
The link is for the celery multi documentation, but you can pass the arguments the same way in your case as well.
# Advanced example starting 10 workers in the background:
# * Three of the workers process the images and video queue
# * Two of the workers process the data queue with loglevel DEBUG
# * the rest process the default queue.
$ celery multi start 10 -l INFO -Q:1-3 images,video -Q:4,5 data -Q default -L:4,5 DEBUG
can be used as:
$ CELERYD_OPTS="--time-limit=300 --concurrency=8 -l INFO -Q:1-3 images,video -Q:4,5 data -Q default -L:4,5 DEBUG"
Do not create extra daemons unless required.
Hope this helps.
You can use the directive named CELERYD_OPTS to add optional command line arguments.
# Names of nodes to start
# most will only start one node:
CELERYD_NODES="w1 w2 w3"
# Extra command-line arguments to the worker
CELERYD_OPTS="--time-limit=300 --concurrency=4 -Q import"
But as far as I know, this option will make all the workers consume from only the import queue.
If you cannot find an acceptable answer, you may try running the workers separately.
It's worth noting that you can use node names with the CELERYD_OPTS arguments, for example
CELERYD_OPTS="--time-limit=300 --concurrency=4 --concurrency:w3=8 -Q:w1 import"
Can I have finer-grained control over the number of celery workers running per task? I'm running pyramid applications and using pceleryd for async tasks.
From the ini file:
CELERY_IMPORTS = ('learning.workers.matrix_task',
'learning.workers.pipeline',
'learning.workers.classification_task',
'learning.workers.metric')
CELERYD_CONCURRENCY = 6
From learning.workers.matrix_task:
from celery import Task

class BuildTrainingMatrixTask(Task):
    ....

class BuildTestMatrixTask(Task):
    ....
I want up to 6 BuildTestMatrixTask tasks running at a time, but I want only 1 BuildTrainingMatrixTask running at a time. Is there a way to accomplish this?
You can send tasks to separate queues according to their type, i.e. BuildTrainingMatrixTask to the first queue (let it be named 'training_matrix') and BuildTestMatrixTask to the second one ('test_matrix'). See Routing Tasks for details. Then start a worker for each queue with the desired concurrency:
$ celery worker --queues 'test_matrix' --concurrency=6
$ celery worker --queues 'training_matrix' --concurrency=1
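For the routing side, a sketch of the settings; the task names here are assumed to be the default module.ClassName names generated for class-based tasks, so check your registered task names and adjust:
# sketch only -- verify the registered task names first
CELERY_ROUTES = {
    'learning.workers.matrix_task.BuildTrainingMatrixTask': {'queue': 'training_matrix'},
    'learning.workers.matrix_task.BuildTestMatrixTask': {'queue': 'test_matrix'},
}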