Celery worker Queues - python

I'm currently using celeryd to run my Celery workers as a daemon. My /etc/default/celeryd file contains the following:
CELERYD_NODES="w1 w2 w3"
Which obviously starts three worker processes.
How do I configure routing to work with this configuration? e.g.
celeryd -c 2 -l INFO -Q import
If I run celery from the command line I can specify the queue using the -Q flag. I need to tell my w1 worker process to only process tasks from the "import" queue.

You can make different workers consume from different (or the same) queues by passing the appropriate arguments in CELERYD_OPTS.
Refer to this: http://celery.readthedocs.org/en/latest/reference/celery.bin.multi.html
The link is for the celery multi documentation, but you can pass the arguments in the same way for your case.
# Advanced example starting 10 workers in the background:
# * Three of the workers processes the images and video queue
# * Two of the workers processes the data queue with loglevel DEBUG
# * the rest processes the 'default' queue.
$ celery multi start 10 -l INFO -Q:1-3 images,video -Q:4,5 data -Q default -L:4,5 DEBUG
can be used as:
$ CELERYD_OPTS="--time-limit=300 --concurrency=8 -l INFO -Q:1-3 images,video -Q:4,5 data -Q default -L:4,5 DEBUG"
Do not create extra daemons unless required.
Hope this helps.

You can use the directive named CELERYD_OPTS to add optional command line arguments.
# Names of nodes to start
# most will only start one node:
CELERYD_NODES="w1 w2 w3"
# Extra command-line arguments to the worker
CELERYD_OPTS="--time-limit=300 --concurrency=4 -Q import"
But as far as I know, this option tells all the workers to consume from only the import queue.
If you cannot find an acceptable answer, you may try running the workers separately.

It's worth noting that you can use node names with the CELERYD_OPTS arguments, for example
CELERYD_OPTS="--time-limit=300 --concurrency=4 --concurrency:w3=8 -Q:w1 import"

Related

Multiple queues for Celery daemon

I have a network of Celery servers and workers that are to be used for a high volume I/O task coming up. There are two queues, default and backlog, and each server has five workers. All servers are daemonized with a configuration much like the init script config documentation.
What I'd like to do for one server is have three workers for default and two for backlog. Is it possible to do this with a daemon configuration?
Have a look here at the part where it shows you an example configuration; it also says:
# Names of nodes to start
# most people will only start one node:
CELERYD_NODES="worker1"
# but you can also start multiple and configure settings
# for each in CELERYD_OPTS (see `celery multi --help` for examples):
So as you can see, it is possible to have celeryd start multiple nodes, and you can configure each of them via CELERYD_OPTS; therefore you can set a different queue for each of them.
Here you can find another, more complete example of celery multi; I'll post a short extract here.
# Advanced example starting 10 workers in the background:
# * Three of the workers processes the images and video queue
# * Two of the workers processes the data queue with loglevel DEBUG
# * the rest processes the 'default' queue.
$ celery multi start 10 -l INFO -Q:1-3 images,video -Q:4,5 data -Q default -L:4,5 DEBUG

How to purge tasks in celery queues using Redis as the broker

Part 1
I've read and tried various SO threads to purge the celery tasks using Redis, but none of them worked. Please let me know how to purge tasks in celery using Redis as the broker.
Part 2
Also, I have multiple queues. I can run it within the project directory, but when daemonizing, the workers don't take tasks. I still need to start the Celery workers manually. How can I daemonize it?
Here is my celeryd conf.
# Names of nodes to start
CELERYD_NODES="w1 w2 w3 w4"
CELERY_BIN="/usr/local/bin/celery"
# Where to chdir at start.
CELERYD_CHDIR="/var/www/fractal/parser-quicklook/"
# Python interpreter from environment, if using virtualenv
#ENV_PYTHON="/somewhere/.virtualenvs/MyProject/bin/python"
# How to call "manage.py celeryd_multi"
#CELERYD_MULTI="/usr/local/bin/celeryd-multi"
# How to call "manage.py celeryctl"
#CELERYCTL="/usr/local/bin/celeryctl"
#CELERYBEAT="/usr/local/bin/celerybeat"
# Extra arguments to celeryd
CELERYD_OPTS="--time-limit=300 --concurrency=8 -Q BBC,BGR,FASTCOMPANY,Firstpost,Guardian,IBNLIVE,LIVEMINT,Mashable,NDTV,Pandodaily,Reuters,TNW,TheHindu,ZEENEWS "
# Name of the celery config module, don't change this.
CELERY_CONFIG_MODULE="celeryconfig"
# %n will be replaced with the nodename.
CELERYD_LOG_FILE="/var/log/celery/%n.log"
CELERYD_PID_FILE="/var/run/celery/%n.pid"
# Workers should run as an unprivileged user.
#CELERYD_USER="nobody"
#CELERYD_GROUP="nobody"
# Set any other env vars here too!
PROJET_ENV="PRODUCTION"
# Name of the projects settings module.
# in this case is just settings and not the full path because it will change the dir to
# the project folder first.
CELERY_CREATE_DIRS=1
The celeryconfig is already provided in Part 1.
Here is my proj directory structure.
project
|-- main.py
|-- project
| |-- celeryconfig.py
| |-- __init__.py
|-- tasks.py
How can I daemonize with the queues? I have provided the queues in CELERYD_OPTS as well.
Is there a way in which we can dynamically daemonize the number of queues in Celery? For example, we have CELERY_CREATE_MISSING_QUEUES = True for creating the missing queues. Is there something similar to daemonize the Celery queues?
celery purge should be enough to clean up the queue in redis. However, your worker will have its own reserved tasks and it will send them back to the queue when you stop the worker. So, first, stop all the workers. Then run celery purge.
If you have several queues, celery purge will purge the default one. You can specify which queue(s) you would like to purge as such:
celery purge -A proj -Q queue1,queue2
In response to Part 1, here is a programmatic solution to purge your queue; further documentation can be found at the celery.app.control.purge docs.
from celery import Celery

app = Celery()  # the app must be configured with your broker
app.control.purge()
# or the older alias:
app.control.discard_all()
This revokes all the tasks it can without terminating any processes. (To do so add terminate=True to the revoke call at your own risk.)
It takes a second or two to run, so is not suitable for high throughput code.
from myapp.celery import app as celery_app
celery_app.control.purge()
i = celery_app.control.inspect()
# scheduled(): tasks with an ETA or countdown
# active(): tasks currently running - probably not revokable without terminate=True
# reserved(): enqueued tasks - usually revoked by purge() above
for queues in (i.active(), i.reserved(), i.scheduled()):
    for task_list in queues.values():
        for task in task_list:
            task_id = task.get("request", {}).get("id", None) or task.get("id", None)
            celery_app.control.revoke(task_id)
Just .purge() and then revoking .scheduled() would probably have the same effect, to be honest; I haven't experimented extensively. But purge alone will not revoke tasks sitting in the queues with an ETA or countdown set.
Credit to #kahlo's answer, which was the basis for this.
Starting with Celery v5, you should now use:
celery -A proj purge -Q queue1,queue2

How can concurrency per task be controlled for pcelery?

Can I have finer-grained control over the number of Celery workers running per task? I'm running Pyramid applications and using pceleryd for async tasks.
From the ini file:
CELERY_IMPORTS = ('learning.workers.matrix_task',
                  'learning.workers.pipeline',
                  'learning.workers.classification_task',
                  'learning.workers.metric')
CELERYD_CONCURRENCY = 6
From learning.workers.matrix_task:
from celery import Task

class BuildTrainingMatrixTask(Task):
    ....

class BuildTestMatrixTask(Task):
    ....
I want up to 6 BuildTestMatrixTask tasks running at a time, but I want only 1 BuildTrainingMatrixTask running at a time. Is there a way to accomplish this?
You can send tasks to separate queues according to their type, i.e. BuildTrainingMatrixTask to the first queue (let it be named 'training_matrix') and BuildTestMatrixTask to the second one ('test_matrix'). See Routing Tasks for details. Then you should start a worker for each queue with the desired concurrency:
$ celery worker --queues 'test_matrix' --concurrency=6
$ celery worker --queues 'training_matrix' --concurrency=1
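For completeness, the tasks also have to be routed to those queues; starting a worker on a queue does not by itself send anything there. A minimal sketch of the routing side, assuming the default task names that Celery derives from the class paths above:
# celeryconfig / ini settings (old-style setting names, matching this setup)
CELERY_ROUTES = {
    'learning.workers.matrix_task.BuildTrainingMatrixTask': {'queue': 'training_matrix'},
    'learning.workers.matrix_task.BuildTestMatrixTask': {'queue': 'test_matrix'},
}
With this routing, the per-queue --concurrency values above give you up to 6 concurrent BuildTestMatrixTask runs but only 1 BuildTrainingMatrixTask at a time.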

How to purge all tasks of a specific queue with celery in python?

How to purge all scheduled and running tasks of a specific queue with celery in python? The question seems pretty straightforward, but to add: I am not looking for command-line code.
I have the following line, which defines the queue, and I would like to purge that queue to manage tasks:
CELERY_ROUTES = {"socialreport.tasks.twitter_save": {"queue": "twitter_save"}}
At one point in time I want to purge all tasks in the queue twitter_save with Python code, maybe with a broadcast function? I couldn't find the documentation about this. Is this possible?
Just to update @Sam Stoelinga's answer for Celery 3.1, it can now be done like this from a terminal:
celery amqp queue.purge <QUEUE_NAME>
For Django be sure to start it from the manage.py file:
./manage.py celery amqp queue.purge <QUEUE_NAME>
If not, be sure celery is able to point correctly to the broker by setting the --broker= flag.
The original answer does not work for Celery 3.1. Hassek's update is the correct command if you want to do it from the command line. But if you want to do it programmatically, do this:
Assuming you ran your Celery app as:
celery_app = Celery(...)
Then:
import celery.bin.amqp
amqp = celery.bin.amqp.amqp(app=celery_app)
amqp.run('queue.purge', 'name_of_your_queue')
This is handy for cases where you've enqueued a bunch of tasks, and one task encounters a fatal condition that you know will prevent the rest of the tasks from executing.
E.g. you enqueued a bunch of web crawler tasks, and in the middle of your tasks your server's IP address gets blocked. There's no point in executing the rest of the tasks, so in that case your task itself can purge its own queue.
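On newer Celery versions (4.x/5.x) the celery.bin.amqp module is no longer available; a roughly equivalent sketch purges a named queue through the broker connection via kombu. The broker URL below is an assumption, and 'twitter_save' is just the queue name from this question:
from celery import Celery

app = Celery('proj', broker='amqp://guest@localhost//')  # assumed broker URL

# Purge a single named queue at the channel level; this works for AMQP and
# Redis transports and returns the number of messages removed.
with app.connection_for_write() as conn:
    purged = conn.default_channel.queue_purge('twitter_save')
    print('Removed %d messages from twitter_save' % purged)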
Lol it's quite easy, hope somebody can help me still though.
from celery.bin.camqadm import camqadm
camqadm('queue.purge', queue_name_as_string)
The only problem with this is that I still need to stop celeryd before purging the queue; after purging, I need to run celeryd again to handle tasks for the queue. I will update this question if I succeed.
I succeeded, but please correct me if this is not a good method to stop celeryd, purge the queue and start it again. I know I am using a hard kill, because I actually want the task to be terminated.
kill_command = "ps auxww | grep 'celeryd -n twitter_save' | awk '{print $2}' | xargs kill -9"
subprocess.call(kill_command, shell=True)
camqadm('queue.purge', 'twitter_save')
rerun_command = "/home/samos/Software/virt_env/twittersyncv1/bin/python %s/manage.py celeryd -n twitter_save -l info -Q twitter_save" % settings.PROJECT_ROOT
os.popen(rerun_command+' &')
send_task("socialreport.tasks.twitter_save")

Is it possible to empty a job queue on a Gearman server

Is it possible to empty a job queue on a Gearman server? I am using the python driver for Gearman, and the documentation does not have any information about emptying queues. I would imagine that this functionality should exist, possibly, with a direct connection to the Gearman server.
I came across this method:
/usr/bin/gearman -t 1000 -n -w -f function_name > /dev/null
which basically dumps all the jobs into /dev/null.
The telnetable administrative protocol (search for "Administrative Protocol") doesn't have a command to empty a queue either, there is only a shutdown command.
If you wish to avoid downtime, you could write a generic "job consumer" worker and use that to empty the queues. I've set one up as a script which takes a list of job names, and just sits there accepting jobs and consuming them.
Something like:
# generic_consumer.py job1 job2 job3
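A minimal sketch of such a consumer with the python-gearman library (the server address is an assumption; the handler simply accepts each job and returns an empty result):
import sys
import gearman

# Connect to the local gearmand server (adjust host/port as needed).
worker = gearman.GearmanWorker(['127.0.0.1:4730'])

def discard_job(gearman_worker, gearman_job):
    # Accept the job and throw away its payload.
    return ''

# Register the no-op handler for every job name given on the command line,
# e.g. python generic_consumer.py job1 job2 job3
for job_name in sys.argv[1:]:
    worker.register_task(job_name, discard_job)

worker.work()  # block and keep consuming jobs until interrupted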
You can use the administrative protocol's status command to get a list of the function names and counts on the queue. The administrative protocol docs tell you the format of the response.
# (echo status ; sleep 0.1) | netcat 127.0.0.1 4730
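If you would rather do the same from Python, a small sketch using a raw socket works too, since the administrative protocol is plain text (Python 3; host and port are assumptions):
import socket

# Query gearmand's text-based administrative protocol for queue status.
# Each line of the response is: <function>\t<jobs in queue>\t<running>\t<capable workers>
with socket.create_connection(('127.0.0.1', 4730), timeout=5) as sock:
    sock.sendall(b'status\n')
    data = b''
    while not data.endswith(b'.\n'):  # the response ends with a lone '.'
        chunk = sock.recv(4096)
        if not chunk:
            break  # connection closed early
        data += chunk

for line in data.decode().splitlines():
    if line != '.':
        print(line)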
As far as I have been able to tell from the docs and from using Gearman with PHP, the only way to clear the job queue is to restart the gearmand job server. If you are using persistent job queues, you will also need to empty whatever you are using as the persistent storage; if this is DB storage, you will need to empty the appropriate tables of all the rows.
stop gearmand --> empty table rows --> start gearmand
Hope this is clear enough.
