I have a network of Celery servers and workers that are to be used for a high volume I/O task coming up. There are two queues, default and backlog, and each server has five workers. All servers are daemonized with a configuration much like the init script config documentation.
What I'd like to do for one server is have three workers for default and two for backlog. Is it possible to do this with a daemon configuration?
Have a look here in this part whare it shows you an example configuration it also says:
# Names of nodes to start
# most people will only start one node:
CELERYD_NODES="worker1"
# but you can also start multiple and configure settings
# for each in CELERYD_OPTS (see `celery multi --help` for examples):
So as you can see is possible to have celeryd starting multiple nodes and you can configure each of them via CELERYD_OPTS, therefore you can set different queue for each of them.
Here you found another more complete example of celery multi, I post here a little extract.
# Advanced example starting 10 workers in the background:
# * Three of the workers processes the images and video queue
# * Two of the workers processes the data queue with loglevel DEBUG
# * the rest processes the default' queue.
$ celery multi start 10 -l INFO -Q:1-3 images,video -Q:4,5 data
-Q default -L:4,5 DEBUG
Related
My Flask project takes in orders as POST requests from multiple online stores, saves those orders to a database, and forwards the purchase information to a service which delivers the product. Sometimes, the product is not set up in the final service and the request sits in my service's database in an "unresolved" state.
When the product is set up in the final service, I want to kick off a long-running (maybe a minute) process to send all "unresolved" orders to the final service. During this process, will Flask still be able to receive orders from the stores and continue processing as normal? If not, do I need to offload this to a task runner like rq?
I'm not worried about speed as much as I am about consistency. The items being purchased are tickets to a live event so as long as the order information is passed along before the event begins, it should make no difference to the customer.
There's a few different answers that are all valid in different situations. The quick answer is that a job queue like RQ is usually the right solution, especially in the long run as your project grows.
As long as the WSGI server has workers available, another request can be handled. Each worker handles one request at a time. The development server uses threads, so an unlimited number of workers are available (with the performance constraints of threads in Python). Production servers like Gunicorn can use multiple workers, and different types of workers such as threads, processes, or eventlets. If you want to run a task in response to an HTTP request and wait until the task is finished to send a response, you'll need enough workers to block on those tasks along with handling regular requests.
#app.route("/admin/send-purchases")
def send_purchases():
... # do stuff, wait for it to finish
return "success"
However, the task you're describing seems like a cleanup task that should be run regardless of HTTP requests from a user. In that case, you should write a Flask CLI command and call it using cron or another scheduling system.
#app.cli.command()
def send_purchases():
...
click.echo("done")
# crontab hourly job
0 * * * * env FLASK_APP=myapp /path/to/venv/bin/flask send-purchases
If you do want a user to initiate the task, but don't want to block a worker waiting for it to finish, then you want a task queue such as RQ or Celery. You could make a CLI command that submits the job too, to be able to trigger it on request and on a schedule.
#rq.job
def send_purchases():
...
#app.route("/admin/send-purchases", endpoint="send_purchases")
def send_purchases_view():
send_purchases.queue()
return "started"
#app.cli.command("send-purchases")
def send_purchases_command():
send_purchases.queue()
click.echo("started")
Flask's development server will spawn a new thread for each request. Similary, production servers can be started with multiple workers.
You can run your app with gunicorn or similar with multiple processes. For example with four process workers:
gunicorn -w 4 app:app
For example with eventlet workers:
gunicorn -k eventlet app:app
See the docs on deploying in production as well: https://flask.palletsprojects.com/en/1.1.x/deploying/
We have recently forced to replace celery with RQ as it is simpler and celery was giving us too many problems. Now, we are not able to find a way to create multiple queues dynamically because we need to get multiple jobs done concurrently. So basically every request to one of our routes should start a job and it doesn't make sense to have multiple users wait for one user's job to be done before we can proceed with next jobs. We periodically send a request to the server in order to get the status of the job and some meta data. This way we can update the user with a progress bar (It could be a lengthy process so this has to be done for the sake of UX)
We are using Django and Python's rq library. We are not using django-rq (Please let me know if there are advantages in using this)
So far we start a task in one of our controllers like:
redis_conn = Redis()
q = Queue(connection=redis_conn)
job = django_rq.enqueue(render_task, new_render.pk, domain=domain, data=csv_data, timeout=1200)
Then in our render_task method we add meta data to the job based on the state of the long task:
current_job = get_current_job()
current_job.meta['state'] = 'PROGRESS'
current_job.meta['process_percent'] = process_percent
current_job.meta['message'] = 'YOUTUBE'
current_job.save()
Now we have another endpoint that gets the current task and its meta data and passes it back to client (This happens through oeriodic AJAX request)
How do we go about running jobs concurrently without blocking other jobs? Should we make queues dynamically? Is there a way to make use of Workers in order to achieve this?
As far as I know RQ does not have any facility to manage multiple workers. You have to start a new worker process defining which queue it will consume. One way of doing this which works pretty well for me is using Supervisor. In supervisor you configure your worker for a given queue and number of processes to have concurrency. For example you can have queue "high-priority" with 5 workers and queue "low-priority" with 1 worker.
It is not only possible but ideal to run multiple workers. I use a bash file for the start command to enter the virtual env, and launch with a custom Worker class.
Here's a supervisor config that has worked very well for me for RQ workers, under a production workload as well. Note that startretries is high since this runs on AWS and needs retries during deployments.
[program:rq-workers]
process_name=%(program_name)s_%(process_num)02d
command=/usr/local/bin/start_rq_worker.sh
autostart=true
autorestart=true
user=root
numprocs=5
startretries=50
stopsignal=INT
killasgroup=true
stopasgroup=true
stdout_logfile=/opt/elasticbeanstalk/tasks/taillogs.d/super_logs.conf
redirect_stderr=true
Contents of start_rq_worker.sh
#!/bin/bash
date > /tmp/date
source /opt/python/run/venv/bin/activate
source /opt/python/current/env
/opt/python/run/venv/bin/python /opt/python/current/app/manage.py
rqworker --worker-class rq.SimpleWorker default
I would like to suggest a very simple solution using django-rq:
Sample settings.py
...
RQ_QUEUES = {
'default': {
'HOST': os.getenv('REDIS_HOST', 'localhost'),
'PORT': 6379,
'DB': 0,
'DEFAULT_TIMEOUT': 360,
},
'low': {
'HOST': os.getenv('REDIS_HOST', 'localhost'),
'PORT': 6379,
'DB': 0,
'DEFAULT_TIMEOUT': 360,
}
}
...
Run Configuration
Run python manage.py rqworker default low as many times (each time in its own shell, or as its own Docker container, for instance) as the number of desired workers. The order of queues in the command determines their priority. At this point, all workers are listening to both queues.
In the Code
When calling a job to run, pass in the desired queue:
For high/normal priority jobs, you can make the call without any parameters, and the job will enter the default queue. For low priority, you must specify, either at the job level:
#job('low')
def my_low_priority_job():
# some code
And then call my_low_priority_job.delay().
Alternatively, determine priority when calling:
queue = django_rq.get_queue('low')
queue.enqueue(my_variable_priority_job)
I currently using "Celeryd" to run my Celery workers as a daemon. My /etc/default/celeryd file contains the following:
CELERYD_NODES="w1 w2 w3"
Which obviously starts three worker processes.
How do I configure routing to work with this configuration? e.g.
celeryd -c 2 -l INFO -Q import
If I run celery from the command line I can specify the queue using the -Q flag. I need to tell my w1 worker process to only process tasks from the "import" queue.
You can make different workers consume from different/same queues by giving proper args in CELERYD_OPTS.
Refer this: http://celery.readthedocs.org/en/latest/reference/celery.bin.multi.html
The link is for celery multi documentation, but you can give the argument in same way to your case also.
# Advanced example starting 10 workers in the background:
# * Three of the workers processes the images and video queue
# * Two of the workers processes the data queue with loglevel DEBUG
# * the rest processes the default' queue.
$ celery multi start 10 -l INFO -Q:1-3 images,video -Q:4,5 data -Q default -L:4,5 DEBUG
can be used as:
$ CELERYD_OPTS="--time-limit=300 --concurrency=8 -l INFO -Q:1-3 images,video -Q:4,5 data -Q default -L:4,5 DEBUG"
Do not create extra daemons unless required.
Hope this helps.
You can use the directive named CELERYD_OPTS to add optional command line arguments.
# Names of nodes to start
# most will only start one node:
CELERYD_NODES="w1 w2 w3"
# Extra command-line arguments to the worker
CELERYD_OPTS="--time-limit=300 --concurrency=4 -Q import"
But as far as I know this option will tell all the workers will consume from only the import queue.
If you cannot find an acceptable answer, you may try to run workers separately.
It's worth noting that you can use node names with the CELERYD_OPTS arguments, for example
CELERYD_OPTS="--time-limit=300 --concurrency=4 --concurrency:w3=8 -Q:w1 import"
Can I have finer grain control over the number of celery workers running per task? I'm running pyramid applications and using pceleryd for async.
from ini file:
CELERY_IMPORTS = ('learning.workers.matrix_task',
'learning.workers.pipeline',
'learning.workers.classification_task',
'learning.workers.metric')
CELERYD_CONCURRENCY = 6
from learning.workers.matrix_task
from celery import Task
class BuildTrainingMatrixTask(Task):
....
class BuildTestMatrixTask(Task):
....
I want up to 6 BuildTestMatrixTask tasks running at a time. But I want only 1 BuiltTrainingMatrixTask running at a time. Is there a way to accomplish this?
You can send tasks to separate queues according to its type, i.e. BuildTrainingMatrixTask to first queue (let it be named as 'training_matrix') and BuildTestMatrixTask to second one (test_matrix). See Routing Tasks for details. Then you should start a worker for each queue with desirable concurrency:
$ celery worker --queues 'test_matrix' --concurrency=6
$ celery worker --queues 'training_matrix' --concurrency=1
In fact I have few django applications with celery tasks. I need each task to be executed within particular channel so that I can control the load. For example I may have 3 servers listen to channel_for_app_1 and two to channel_for_app_2. My question is how can I run celery daemon and specify the channel? Any other ways to do that?
Please review this page: http://docs.celeryproject.org/en/latest/userguide/routing.html.
Your celery just should start with -Q, --queue settings, which define which queue will be used for fetching tasks
In your django settings, you can specify CELERY_QUEUES:
CELERY_QUEUES = {
"worker": {
"exchange": "worker",
"binding_key": "worker"
},
like so. Each key is a queue name, and you can change the exchange and binding key if you want to get fancy (multiple exchanges, etc), but I've never needed to.
When you define a task, you can
#task(queue="worker", etc)
The last step is to specify the queue names when you run celery - either via your celery daemon configuration, or on the command line when you run it. The result of all of this is that celery tasks will go to the queues specified by the task definition, and only on the boxes running the specified queue.
So I'm not sure if you mean something explicitly different when you say "channels", but I've always used multiple per-task queues to do exactly what you're describing.