How to purge tasks in celery queues using Redis as the broker


Part 1
I've read and tried various SO threads to purge the celery tasks using Redis, but none of them worked. Please let me know how to purge tasks in celery using Redis as the broker.
Part 2
Also, I have multiple queues. I can run the workers from within the project directory, but when daemonizing, the workers don't take tasks, and I still need to start the Celery workers manually. How can I daemonize them?
Here is my celeryd config:
# Names of nodes to start; here we start four nodes
CELERYD_NODES="w1 w2 w3 w4"
CELERY_BIN="/usr/local/bin/celery"
# Where to chdir at start.
CELERYD_CHDIR="/var/www/fractal/parser-quicklook/"
# Python interpreter from environment, if using virtualenv
#ENV_PYTHON="/somewhere/.virtualenvs/MyProject/bin/python"
# How to call "manage.py celeryd_multi"
#CELERYD_MULTI="/usr/local/bin/celeryd-multi"
# How to call "manage.py celeryctl"
#CELERYCTL="/usr/local/bin/celeryctl"
#CELERYBEAT="/usr/local/bin/celerybeat"
# Extra arguments to celeryd
CELERYD_OPTS="--time-limit=300 --concurrency=8 -Q BBC,BGR,FASTCOMPANY,Firstpost,Guardian,IBNLIVE,LIVEMINT,Mashable,NDTV,Pandodaily,Reuters,TNW,TheHindu,ZEENEWS "
# Name of the celery config module, don't change this.
CELERY_CONFIG_MODULE="celeryconfig"
# %n will be replaced with the nodename.
CELERYD_LOG_FILE="/var/log/celery/%n.log"
CELERYD_PID_FILE="/var/run/celery/%n.pid"
# Workers should run as an unprivileged user.
#CELERYD_USER="nobody"
#CELERYD_GROUP="nobody"
# Set any other env vars here too!
PROJET_ENV="PRODUCTION"
# Name of the projects settings module.
# in this case is just settings and not the full path because it will change the dir to
# the project folder first.
CELERY_CREATE_DIRS=1
celeryconfig is already provided in Part 1.
Here is my project directory structure:
project
|-- main.py
|-- project
| |-- celeryconfig.py
| |-- __init__.py
|-- tasks.py
How can I daemonize the workers with these queues? I have provided the queues in CELERYD_OPTS as well.
Is there a way to dynamically daemonize workers for a varying set of queues in Celery? For example, we have CELERY_CREATE_MISSING_QUEUES = True for creating missing queues. Is there something similar for daemonizing Celery queues?

celery purge should be enough to clean up the queue in Redis. However, your workers hold their own reserved tasks and will send them back to the queue when you stop them. So first stop all the workers, then run celery purge.

If you have several queues, celery purge will purge the default one. You can specify which queue(s) you would like to purge like this:
celery purge -A proj -Q queue1,queue2

In response to Part 1, here is a programmatic way to purge your queue. Further documentation can be found in the celery.app.control.purge docs.
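from celery import Celery
# Use the same broker URL as your workers (a local Redis instance is assumed here).
app = Celery("proj", broker="redis://localhost:6379/0")
app.control.purge()  # returns the number of messages discarded
# OR, the older alias for the same operation:
app.control.discard_all()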

This revokes all the tasks it can without terminating any processes. (To terminate running tasks as well, add terminate=True to the revoke call, at your own risk.)
It takes a second or two to run, so is not suitable for high throughput code.
from myapp.celery import app as celery_app
celery_app.control.purge()
i = celery_app.control.inspect()
# scheduled(): tasks with an ETA or countdown
# active(): tasks currently running - probably not revokable without terminate=True
# reserved(): tasks prefetched by a worker but not yet started - purge() does not reach these
for queues in (i.active(), i.reserved(), i.scheduled()):
    # each inspect() call returns None when no worker replies
    for task_list in (queues or {}).values():
        for task in task_list:
            # scheduled() entries wrap the task details in a "request" dict
            task_id = task.get("request", {}).get("id", None) or task.get("id", None)
            celery_app.control.revoke(task_id)
To be honest, calling .purge() and then revoking .scheduled() would probably have the same effect; I haven't experimented extensively. But purge alone will not revoke tasks sitting in the queues with an ETA or countdown set.
Credit to @kahlo's answer, which was the basis for this.

Starting with Celery v5, you should now use:
celery -A proj purge -Q queue1,queue2
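The two purge commands above differ only in where -A goes, which depends on the Celery version. As a small sketch, the hypothetical helper purge_command below builds the argument list for either form, and run_purge passes it to subprocess.run with -f (the purge command's "don't prompt for verification" flag) so it can be scripted. The app name and queue names are placeholders, and the workers should be stopped first, as the first answer explains.

```python
import subprocess
from typing import List, Sequence


def purge_command(app: str, queues: Sequence[str] = (), celery_major: int = 5,
                  force: bool = False) -> List[str]:
    """Build a `celery purge` argument list (hypothetical helper).

    Celery 5 wants the global -A option before the subcommand,
    while the older form shown above puts it after `purge`.
    """
    if celery_major >= 5:
        cmd = ["celery", "-A", app, "purge"]
    else:
        cmd = ["celery", "purge", "-A", app]
    if queues:
        cmd += ["-Q", ",".join(queues)]  # comma-separated, as in the answers above
    if force:
        cmd.append("-f")  # skip the interactive yes/NO confirmation prompt
    return cmd


def run_purge(app: str, queues: Sequence[str] = (), celery_major: int = 5) -> None:
    """Run the purge non-interactively. Stop all workers before calling this."""
    subprocess.run(purge_command(app, queues, celery_major, force=True), check=True)
```

For example, run_purge("proj", ["queue1", "queue2"]) runs celery -A proj purge -Q queue1,queue2 -f on Celery 5.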

Related

django + celery: disable prefetch for one worker, Is there a bug?

I have a Django project with celery
Due to RAM limitations I can only run two worker processes.
I have a mix of 'slow' and 'fast' tasks.
Fast tasks shall be executed ASAP. There can be many fast tasks in a short time frame (0.1s - 3s), so ideally both CPUs should handle them.
Slow tasks might run for a few minutes but the result can be delayed.
Slow tasks occur less often, but it can happen that 2 or 3 are queued up at the same time.
My idea was to have:
1 celery worker W1 with concurrency 1 that handles only fast tasks
1 celery worker W2 with concurrency 1 that can handle fast and slow tasks.
By default, Celery has a task prefetch multiplier ( https://docs.celeryproject.org/en/latest/userguide/configuration.html#worker-prefetch-multiplier ) of 4, which means that 4 fast tasks could be queued behind a slow task and be delayed by several minutes. Thus I'd like to disable prefetch for worker W2. The docs state:
To disable prefetching, set worker_prefetch_multiplier to 1. Changing
that setting to 0 will allow the worker to keep consuming as many
messages as it wants.
However, what I observe is that with a prefetch multiplier of 1, one task is still prefetched and can still be delayed by a slow task.
Is this a documentation bug? Is this an implementation bug? Or do I misunderstand the documentation?
Is there any way to implement what I want?
The commands that I execute to start the workers are:
celery -A miniclry worker --concurrency=1 -n w2 -Q=fast,slow --prefetch-multiplier 0
celery -A miniclry worker --concurrency=1 -n w1 -Q=fast
my celery settings are default except:
CELERY_BROKER_URL = "pyamqp://*****@localhost:5672/mini"
CELERY_TASK_ROUTES = {
'app1.tasks.task_fast': {"queue": "fast"},
'app1.tasks.task_slow': {"queue": "slow"},
}
my django project's celery.py file is:
from __future__ import absolute_import
import os
from celery import Celery
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'miniclry.settings')
app = Celery("miniclry", backend="rpc", broker="pyamqp://")
app.config_from_object('django.conf:settings', namespace='CELERY')
app.autodiscover_tasks()
The __init__.py of my django project is
from .celery import app as celery_app
__all__ = ('celery_app',)
The code of my tasks (app1/tasks.py):
import time, logging
from celery import shared_task
from miniclry.celery import app as celery_app
logger = logging.getLogger(__name__)
@shared_task
def task_fast(delay=0.1):
    logger.warning("fast in")
    time.sleep(delay)
    logger.warning("fast out")
@shared_task
def task_slow(delay=30):
    logger.warning("slow in")
    time.sleep(delay)
    logger.warning("slow out")
If I execute the following from a management shell, I see that one fast task is only executed after the slow task has finished.
from app1.tasks import task_fast, task_slow
task_slow.delay()
for i in range(30):
    task_fast.delay()
Can anybody help?
I could post the entire test project if that would be helpful. Just advise me on the recommended SO way of sharing such projects.
Version info:
celery==4.3.0
Django==1.11.25
Python 2.7.12

I can confirm the issue: there is a bug in this section of the documentation. worker_prefetch_multiplier = 1 does just what it says and sets the worker's prefetch to 1, which means the worker will hold one more task in addition to the one it is currently executing.
To actually disable prefetching, you also need to set task_acks_late = True along with the prefetch setting; see this docs section.
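The difference can be seen with a little arithmetic. The sketch below is a simplified model, not Celery's code: it assumes the broker keeps delivering until the worker holds worker_prefetch_multiplier × concurrency unacknowledged messages (the configuration docs describe the limit as the multiplier times the number of concurrent processes), and it counts how many messages sit reserved behind tasks that are already running. With the default early acknowledgement a task is acked just before it runs, so it stops counting against the limit; with task_acks_late = True it keeps its slot until it finishes.

```python
import math


def reserved_while_busy(prefetch_multiplier: int, concurrency: int,
                        acks_late: bool) -> float:
    """Messages a fully busy worker can hold in reserve (simplified model).

    Assumes every pool slot is busy and that delivery stops once
    prefetch_multiplier * concurrency messages are unacknowledged.
    """
    if prefetch_multiplier == 0:
        return math.inf  # 0 means "consume as many messages as it wants"
    limit = prefetch_multiplier * concurrency
    # Early ack (default): running tasks are already acked and free their slot.
    # Late ack: running tasks stay unacknowledged and still count.
    unacked_running = concurrency if acks_late else 0
    return max(limit - unacked_running, 0)


for multiplier, late in [(4, False), (1, False), (1, True)]:
    n = reserved_while_busy(multiplier, concurrency=1, acks_late=late)
    print(f"multiplier={multiplier} acks_late={late}: {n} task(s) waiting behind a running one")
```

Under this model, only multiplier 1 combined with late acks leaves no fast task stuck behind the slow one. In the question's Django setup (settings loaded with namespace='CELERY'), that corresponds to CELERY_TASK_ACKS_LATE = True plus --prefetch-multiplier 1 on worker W2; acks_late can also be set per task with @shared_task(acks_late=True).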

Celery worker hangs without any error

I have a production setup running Celery workers that make POST/GET requests to a remote service and store the results. It handles a load of around 20k tasks per 15 minutes.
The problem is that the workers go numb for no reason: no errors, no warnings.
I have also tried adding multiprocessing, with the same result.
In the log I see the task execution time increasing, in lines like "succeeded in s".
For more details look at https://github.com/celery/celery/issues/2621

If your Celery worker sometimes gets stuck, you can use strace and lsof to find out at which system call it is stuck.
For example:
$ strace -p 10268 -s 10000
Process 10268 attached - interrupt to quit
recvfrom(5,
10268 is the PID of the Celery worker; recvfrom(5 means the worker is blocked receiving data from file descriptor 5.
Then you can use lsof to check what file descriptor 5 is in this worker process:
lsof -p 10268
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
......
celery 10268 root 5u IPv4 828871825 0t0 TCP 172.16.201.40:36162->10.13.244.205:wap-wsp (ESTABLISHED)
......
It indicates that the worker is stuck on a TCP connection (you can see 5u in the FD column).
Some Python packages, such as requests, block while waiting for data from the peer, which may cause the Celery worker to hang. If you are using requests, make sure to set the timeout argument.
Have you seen this page:
https://www.caktusgroup.com/blog/2013/10/30/using-strace-debug-stuck-celery-tasks/
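With requests, that means passing timeout= on every call, e.g. requests.get(url, timeout=10). The same failure mode can be reproduced with only the standard library. In the sketch below, start_silent_server and fetch are hypothetical helpers: a local server accepts the connection but never replies, standing in for the stuck peer from the strace/lsof session, and the client-side timeout turns an indefinite recvfrom wait into an exception the code can handle (here, by returning None).

```python
import socket
import threading
import urllib.request


def start_silent_server():
    """Accept TCP connections but never send a byte back.

    Stands in for the unresponsive peer seen in the strace/lsof output.
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))  # port 0: let the OS choose a free port
    srv.listen(5)
    held = []  # keep accepted sockets open so the client sees silence, not EOF

    def accept_loop():
        while True:
            try:
                conn, _ = srv.accept()
            except OSError:
                return  # server socket was closed
            held.append(conn)

    threading.Thread(target=accept_loop, daemon=True).start()
    return srv


def fetch(url, timeout):
    """Return the response body, or None on a timeout or connection error."""
    # An empty ProxyHandler ignores proxy environment variables for this demo.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.read()
    except OSError:
        # socket.timeout (TimeoutError on 3.10+) and urllib's URLError are both
        # OSError subclasses. Without a timeout, this call would block forever.
        return None
```

Inside a real Celery task you would more likely let the timeout exception propagate or retry the task; returning None just keeps the demo short.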

I also faced this issue when using delay() on a shared_task with
celery, kombu, amqp and billiard. After calling the API, when I used
delay() on a @shared_task, everything worked until the delay() call,
where it hung.
The issue was that the settings below were missing from the main
application's __init__.py. They make sure the app is always imported
when Django starts, so that shared_task will use this app.
In __init__.py:
from __future__ import absolute_import, unicode_literals
# This will make sure the app is always imported when
# Django starts so that shared_task will use this app.
from .celery import app as celeryApp
#__all__ = ('celeryApp',)
__all__ = ['celeryApp']
Note 1: In place of celery_app, put the application name, i.e. import the app defined in celery.py and use that name here.
Note 2: If you are only facing the hang with shared tasks, the solution above may solve it, and you can ignore the points below.
I also want to mention another issue: if anyone is facing an Error 111
(connection refused) issue, check whether the versions amqp==2.2.2,
billiard==3.5.0.3, celery==4.1.0, kombu==4.1.0 are compatible with
each other. The mentioned versions are just an example. Also
check whether Redis is installed on your system (if you are using Redis).
Also make sure you are using Kombu 4.1.0: later versions of
Kombu rename the async module to asynchronous.

Follow this tutorial
Celery Django Link
Add the following to the settings
NB: Install Redis for both the transport and the result backend.
# TRANSPORT
CELERY_BROKER_TRANSPORT = 'redis'
CELERY_BROKER_HOST = 'localhost'
CELERY_BROKER_PORT = '6379'
CELERY_BROKER_VHOST = '0'
# RESULT
CELERY_RESULT_BACKEND = 'redis'
CELERY_REDIS_HOST = 'localhost'
CELERY_REDIS_PORT = '6379'
CELERY_REDIS_DB = '1'
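The settings above use the older per-field style (host, port, vhost, where the vhost is the Redis database number). Newer Celery releases configure the broker and result backend as URLs instead (broker_url and result_backend, read as CELERY_BROKER_URL and CELERY_RESULT_BACKEND when the Django settings are loaded with namespace='CELERY'). The hypothetical helper redis_url below maps the same host/port/database values onto the redis://:password@host:port/db form; treat it as a sketch and check it against the docs for your Celery version.

```python
from urllib.parse import quote


def redis_url(host="localhost", port=6379, db=0, password=None):
    """Build a redis:// URL (redis://:password@host:port/db); hypothetical helper."""
    # Percent-encode the password so characters like "@" or "/" don't break the URL.
    auth = f":{quote(password, safe='')}@" if password else ""
    return f"redis://{auth}{host}:{int(port)}/{int(db)}"


# The values from the settings above, expressed as URLs.
CELERY_BROKER_URL = redis_url("localhost", "6379", "0")
CELERY_RESULT_BACKEND = redis_url("localhost", "6379", "1")
```

Keeping the broker and the results in different Redis databases (0 and 1 here, as in the answer) keeps task messages and results from mixing.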

Celery worker Queues

I'm currently using "celeryd" to run my Celery workers as a daemon. My /etc/default/celeryd file contains the following:
CELERYD_NODES="w1 w2 w3"
This obviously starts three worker processes.
How do I configure routing to work with this configuration? e.g.
celeryd -c 2 -l INFO -Q import
If I run celery from the command line I can specify the queue using the -Q flag. I need to tell my w1 worker process to only process tasks from the "import" queue.

You can make different workers consume from different (or the same) queues by passing the proper arguments in CELERYD_OPTS.
Refer to this: http://celery.readthedocs.org/en/latest/reference/celery.bin.multi.html
The link is for the celery multi documentation, but you can pass the arguments the same way in your case.
# Advanced example starting 10 workers in the background:
# * Three of the workers process the images and video queues
# * Two of the workers process the data queue with loglevel DEBUG
# * the rest process the 'default' queue.
$ celery multi start 10 -l INFO -Q:1-3 images,video -Q:4,5 data -Q default -L:4,5 DEBUG
can be used as:
CELERYD_OPTS="--time-limit=300 --concurrency=8 -l INFO -Q:1-3 images,video -Q:4,5 data -Q default -L:4,5 DEBUG"
Do not create extra daemons unless required.
Hope this helps.

You can use the directive named CELERYD_OPTS to add optional command line arguments.
# Names of nodes to start
# most will only start one node:
CELERYD_NODES="w1 w2 w3"
# Extra command-line arguments to the worker
CELERYD_OPTS="--time-limit=300 --concurrency=4 -Q import"
But as far as I know, this option will make all the workers consume only from the import queue.
If you cannot find an acceptable answer, you may try to run workers separately.

It's worth noting that you can use node names with the CELERYD_OPTS arguments, for example
CELERYD_OPTS="--time-limit=300 --concurrency=4 --concurrency:w3=8 -Q:w1 import"
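Building on that node-name syntax, the CELERYD_NODES / CELERYD_OPTS pair for a given node-to-queue assignment can be generated instead of written by hand. The sketch below is a hypothetical helper that emits only the -Q:<nodename> form shown above; the node names, queue assignments and base options are placeholders. The printed lines are meant to be pasted into /etc/default/celeryd.

```python
from typing import Iterable, Mapping


def celeryd_opts(queues_by_node: Mapping[str, Iterable[str]],
                 base: str = "--time-limit=300 --concurrency=4") -> str:
    """Build a CELERYD_OPTS value with one -Q:<node> option per worker node."""
    parts = [base] if base else []
    for node, queues in queues_by_node.items():
        # -Q:<nodename> applies the queue list to that node only
        parts.append(f"-Q:{node} {','.join(queues)}")
    return " ".join(parts)


NODES = {"w1": ["import"], "w2": ["default"], "w3": ["default"]}
print(f'CELERYD_NODES="{" ".join(NODES)}"')
print(f'CELERYD_OPTS="{celeryd_opts(NODES)}"')
```

With this mapping, w1 consumes only from the import queue while w2 and w3 handle the default queue.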

How can concurrency per task be controlled for pcelery?

Can I have finer grain control over the number of celery workers running per task? I'm running pyramid applications and using pceleryd for async.
From the ini file:
CELERY_IMPORTS = ('learning.workers.matrix_task',
'learning.workers.pipeline',
'learning.workers.classification_task',
'learning.workers.metric')
CELERYD_CONCURRENCY = 6
From learning.workers.matrix_task:
from celery import Task
class BuildTrainingMatrixTask(Task):
    ...
class BuildTestMatrixTask(Task):
    ...
I want up to 6 BuildTestMatrixTask tasks running at a time, but only 1 BuildTrainingMatrixTask running at a time. Is there a way to accomplish this?

You can send tasks to separate queues according to their type, i.e. BuildTrainingMatrixTask to the first queue (let's name it 'training_matrix') and BuildTestMatrixTask to the second one ('test_matrix'). See Routing Tasks for details. Then start a worker for each queue with the desired concurrency:
$ celery worker --queues 'test_matrix' --concurrency=6
$ celery worker --queues 'training_matrix' --concurrency=1
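A minimal sketch of that setup, written in the same Python-style syntax as the CELERY_IMPORTS setting in the question (adapt it if your ini loader expects a different format). The task names assume Celery's default naming for class-based tasks (module path plus class name); verify them with celery inspect registered before relying on them. worker_commands is a hypothetical helper that only produces the two worker commands above from a queue-to-concurrency mapping.

```python
# Route each task class to its own queue (task names are assumed defaults).
CELERY_ROUTES = {
    "learning.workers.matrix_task.BuildTrainingMatrixTask": {"queue": "training_matrix"},
    "learning.workers.matrix_task.BuildTestMatrixTask": {"queue": "test_matrix"},
}

# Desired per-queue concurrency: at most 1 training task, up to 6 test tasks.
QUEUE_CONCURRENCY = {"training_matrix": 1, "test_matrix": 6}


def worker_commands(concurrency_by_queue):
    """One `celery worker` command per queue, matching the answer above."""
    return [
        f"celery worker --queues '{queue}' --concurrency={n}"
        for queue, n in concurrency_by_queue.items()
    ]


# Every routed queue should have a worker consuming it.
assert {r["queue"] for r in CELERY_ROUTES.values()} == set(QUEUE_CONCURRENCY)
for cmd in worker_commands(QUEUE_CONCURRENCY):
    print(cmd)
```

The concurrency=1 worker only caps BuildTrainingMatrixTask at one running task if it is the only worker consuming the training_matrix queue.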

How to purge all tasks of a specific queue with celery in python?

How do I purge all scheduled and running tasks of a specific queue with Celery in Python? The question seems pretty straightforward, but to add: I am not looking for the command-line code.
I have the following line, which defines the queue, and I would like to purge that queue to manage tasks:
CELERY_ROUTES = {"socialreport.tasks.twitter_save": {"queue": "twitter_save"}}
At one point in time I want to purge all tasks in the queue twitter_save with Python code, maybe with a broadcast function? I couldn't find the documentation about this. Is this possible?

Just to update @Sam Stoelinga's answer for Celery 3.1: it can now be done like this in a terminal:
celery amqp queue.purge <QUEUE_NAME>
For Django be sure to start it from the manage.py file:
./manage.py celery amqp queue.purge <QUEUE_NAME>
If not, be sure celery is able to point correctly to the broker by setting the --broker= flag.

The original answer does not work for Celery 3.1. Hassek's update is the correct command if you want to do it from the command line. But if you want to do it programmatically, do this:
Assuming you created your Celery app as:
celery_app = Celery(...)
Then:
import celery.bin.amqp
amqp = celery.bin.amqp.amqp(app=celery_app)
amqp.run('queue.purge', 'name_of_your_queue')
This is handy for cases where you've enqueued a bunch of tasks, and one task encounters a fatal condition that you know will prevent the rest of the tasks from executing.
E.g. you enqueued a bunch of web crawler tasks, and in the middle of them your server's IP address gets blocked. There's no point in executing the rest of the tasks, so in that case your task itself can purge its own queue.

Lol, it's quite easy, though I hope somebody can still help me.
from celery.bin.camqadm import camqadm
camqadm('queue.purge', queue_name_as_string)
The only problem with this is that I still need to stop celeryd before purging the queue, and after purging I need to run celeryd again to handle tasks for the queue. I will update this question if I succeed.
I succeeded, but please correct me if this is not a good method to stop celeryd, purge the queue and start it again. I know I am killing the process outright, because I actually want the task to be terminated.
kill_command = "ps auxww | grep 'celeryd -n twitter_save' | awk '{print $2}' | xargs kill -9"
subprocess.call(kill_command, shell=True)
camqadm('queue.purge', 'twitter_save')
rerun_command = "/home/samos/Software/virt_env/twittersyncv1/bin/python %s/manage.py celeryd -n twitter_save -l info -Q twitter_save" % settings.PROJECT_ROOT
os.popen(rerun_command+' &')
send_task("socialreport.tasks.twitter_save")
