I want to do two things:
I made a worker bootstep that loads configuration from a remote source.
I want to get a parameter from that config and add a queue, using the parameter as the consumer queue name.
app.steps['worker'].add(LoadConfig)
works perfectly,
but I can't get a SetQueue bootstep working.
Right now my SetQueue looks like this:
class SetQueue(bootsteps.StartStopStep):
    requires = (Consumer, )

    def start(self, parent, **kwargs):
        parent.add_task_queue('q_name', exchange='q_name', routing_key='q_name')

app.steps['consumer'].add(SetQueue)
It doesn't work.
I think my problem is that I don't understand at what moment (i.e. with which requires=(???, )) queues can be added.
In Celery there are different "blueprints", where each blueprint is composed of multiple bootsteps.
The first blueprint to start is the Worker, which e.g. starts the execution pool; the last blueprint to start is the Consumer, which connects to the broker and starts the task consumer, among other things.
So your requires is not pointing to a bootstep, it's pointing to a blueprint.
I see that your bootstep adds another queue to consume from, so it would make more sense
for it to depend on the Tasks bootstep, which is required to be running if you want to call
add_task_queue.
class SetQueue(bootsteps.StartStopStep):
    requires = ('celery.worker.consumer:Tasks', )
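For completeness, a minimal sketch of the whole bootstep with that dependency might look like this (the queue name 'q_name' is carried over from the question; the app setup is a placeholder):

from celery import Celery, bootsteps

app = Celery('proj', broker='amqp://guest@localhost')  # placeholder app

class SetQueue(bootsteps.StartStopStep):
    # Depend on the Tasks bootstep so that add_task_queue is available.
    requires = ('celery.worker.consumer:Tasks', )

    def start(self, parent, **kwargs):
        # 'parent' is the Consumer instance here.
        parent.add_task_queue('q_name', exchange='q_name', routing_key='q_name')

app.steps['consumer'].add(SetQueue)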
You can see the startup components overview here:
http://docs.celeryproject.org/en/latest/userguide/extending.html#blueprints
Also, with custom bootsteps registered you can generate a new version of this graph
using:
$ celery -A proj graph bootsteps | dot -Tpng -o steps.png
There is a specific periodic task that needs to be removed from the message queue. I am using Redis and Celery here.
tasks.py
@periodic_task(run_every=crontab(minute='*/6'))
def task_abcd():
    """
    some operations here
    """
There are other periodic tasks in the project as well, but I need this specific task to stop running from now on.
As explained in this answer, will the following code work?
@periodic_task(run_every=crontab(minute='*/6'))
def task_abcd():
    pass
In this example the periodic task schedule is defined directly in code, meaning it is hard-coded and cannot be altered dynamically without a code change and an app re-deploy.
The provided code (with the task logic deleted, or with a simple return at the beginning) will "work", but it does not really answer the question: the task will still run, there is just no code left for it to execute.
Also, it is recommended NOT to use @periodic_task; its docstring says so directly:
"""Deprecated decorator, please use :setting:`beat_schedule`."""
First, change the method from a @periodic_task into a regular Celery @task, and because you are using Django it is better to go straight for @shared_task:
from celery import shared_task

@shared_task
def task_abcd():
    ...
Now this is just an ordinary Celery task that needs to be called explicitly, or it can run periodically if added to the celery beat schedule.
For production, and when using multiple workers, it is not recommended to run the celery worker with an embedded beat (-B); run a separate celery beat scheduler instance instead.
The schedule can be specified in celery.py or in the Django project settings (settings.py), as shown in the sketch below.
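For example, a static entry for the task above might look roughly like this (the dotted task path proj.tasks.task_abcd is an assumption):

# celery.py
from celery.schedules import crontab

app.conf.beat_schedule = {
    'run-task-abcd-every-6-minutes': {
        'task': 'proj.tasks.task_abcd',     # assumed dotted path to the task
        'schedule': crontab(minute='*/6'),
    },
}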
This is still not very dynamic, since the app needs to be reloaded to re-read the settings.
For a truly dynamic setup, use the Database Scheduler (django-celery-beat), which allows creating schedules on the fly: which tasks run, when, and with what arguments. It even provides nice Django admin views for administration.
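A rough sketch with django-celery-beat (this assumes the package is installed and migrated, and that beat runs with --scheduler django_celery_beat.schedulers:DatabaseScheduler; the entry name and task path are placeholders):

from django_celery_beat.models import CrontabSchedule, PeriodicTask

# Create (or reuse) a */6-minute crontab schedule.
schedule, _ = CrontabSchedule.objects.get_or_create(
    minute='*/6', hour='*', day_of_week='*',
    day_of_month='*', month_of_year='*',
)

# Register the task against that schedule.
PeriodicTask.objects.create(
    crontab=schedule,
    name='Run task_abcd every 6 minutes',
    task='proj.tasks.task_abcd',  # assumed dotted path
)

# Disabling it later is a single update, no redeploy needed:
PeriodicTask.objects.filter(name='Run task_abcd every 6 minutes').update(enabled=False)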
That code will work but I'd go for something that doesn't force you to update your code every time you need to disable/enable the task.
What you could do is to use a configurable variable whose value could come from an admin panel, a configuration file, or whatever you want, and use that to return before your code runs if the task is in disabled mode.
For instance:
@periodic_task(run_every=crontab(minute='*/6'))
def task_abcd():
    config = load_config_for_task_abcd()
    if not config.is_enabled:
        return
    # some operations here
In this way, even if your task is scheduled, its operations won't be executed.
If you simply want to remove the periodic task, have you tried removing the function and then restarting your Celery service? You can restart your Redis service as well as your Django server for good measure.
Make sure that the function you removed is not referenced anywhere else.
abmp.py:
from celery import Celery

app = Celery('abmp', backend='amqp://guest@localhost', broker='amqp://guest@localhost')

@app.task(bind=True)
def add(self, a, b):
    return a + b
execute_test.py
from abmp import add

add.apply_async(
    args=(5, 7),
    queue='push_tasks',
    exchange='push_tasks',
    routing_key='push_tasks'
)
Run the Celery worker:
celery -A abmp worker -E -Q push_tasks -l info
Run execute_test.py:
python2.7 execute_test.py
Finally, looking at the RabbitMQ management UI, I found that each run of execute_test.py creates a new queue instead of putting the task into the push_tasks queue.
You are using AMQP as the result backend. With that backend, Celery stores each task's result in a new queue named after the task's ID. Use a better-suited backend (Redis, for example) to avoid spamming new queues.
When you are using AMQP as the result backend for Celery, the default behavior is to store every task result (for one day, as per the FAQ at http://docs.celeryproject.org/en/latest/faq.html).
As per the documentation for the current stable version (4.1), this backend is deprecated and should not be used.
Your options are:
Use the result_expires setting if you plan to go ahead with amqp as the backend.
Use a different backend (like Redis).
If you don't need the results at all, use the ignore_result setting (see the sketch below).
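A rough sketch of those options, using the Celery 4.x lowercase setting names (the Redis URL is a placeholder):

from celery import Celery

app = Celery('abmp', broker='amqp://guest@localhost')

# Option 1: keep the amqp backend but let result queues expire sooner.
app.conf.result_expires = 3600  # seconds

# Option 2: switch to a backend that does not create a queue per result.
app.conf.result_backend = 'redis://localhost:6379/0'

# Option 3: skip storing results for tasks that never read them.
@app.task(ignore_result=True)
def add(a, b):
    return a + b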
I have a Django+Celery application, where Celery is used to push (and pull) Django model instances to a third party SOAP service.
My Django models have dependencies between them and also a simple hash like this:
class MyModel(models.Model):
    def get_dependencies(self):
        # ...
        return [...]

    def __hash__(self):
        return hash(self.__class__.__name__ + str(self.pk))
This hash came in handy in my own implementation, which I had to drop due to stability issues. Celery is much sounder ground.
When I push an instance over to the SOAP service, I must make sure that its dependencies have been pushed first. This is done by checking all related instances for a pushed_ok timestamp field.
The difficult part is when an instance a, which depends on a list of instances deps (all instances of MyModel subclasses), is being pushed. I cannot push a unless all instances in deps have been processed by Celery. In other words, I need to serialize the tasks so that the dependency order is respected.
Celery is run like this:
celery -A server worker -P eventlet -c 100
How can I prevent one of the eventlets (/processes/threads) from running a before its dependencies, if any, have finished being run by other eventlets?
Thank you for your help.
I went for the pragmatic solution of moving all dependency checking for a resource (which includes pushing out-of-sync dependencies to the SOAP server) inside the Celery task, instead of trying to serialize tasks according to the resources' dependencies.
The upside is that it keeps things simple and I could implement it rapidly.
The downside is that I'm tying up a worker for the duration of potentially many synchronous SOAP operations, instead of dispatching those operations across workers.
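A hedged sketch of that approach; push_to_soap() and resolve_instance() are hypothetical helpers standing in for the actual SOAP call and model lookup:

from django.utils import timezone
from celery import shared_task

@shared_task
def push_instance(model_name, pk):
    instance = resolve_instance(model_name, pk)  # hypothetical lookup helper

    # Push any dependency that has not been synced yet, synchronously,
    # so the ordering is guaranteed within this single task.
    for dep in instance.get_dependencies():
        if dep.pushed_ok is None:
            push_to_soap(dep)  # hypothetical SOAP helper
            dep.pushed_ok = timezone.now()
            dep.save(update_fields=['pushed_ok'])

    push_to_soap(instance)
    instance.pushed_ok = timezone.now()
    instance.save(update_fields=['pushed_ok'])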
Problem
Celery workers are hanging on task execution when using a package which accesses a ZEO server. However, if I were to access the server directly within tasks.py, there's no problem at all.
Background
I have a program that reads and writes to a ZODB file. Because I want multiple users to be able to access and modify this database concurrently, I have it managed by a ZEO server, which should make it safe across multiple processes and threads. I define the database within a module of my program:
from ZEO import ClientStorage
from ZODB.DB import DB
addr = 'localhost', 8090
storage = ClientStorage.ClientStorage(addr, wait=False)
db = DB(storage)
SSCCE
I'm obviously attempting more complex operations, but let's assume I only want the keys of a root object, or its children. I can produce the problem in this context.
I create dummy_package with the above code in a module, databases.py, and a bare-bones module meant to perform database access:
# main.py
def get_keys(dict_like):
    return dict_like.keys()
If I don't try any database access with dummy_package, I can import the database and access root without issue:
# tasks.py
from dummy_package import databases
@task()
def simple_task():
    connection = databases.db.open()
    keys = connection.root().keys()
    connection.close(); databases.db.close()
    return keys  # Works perfectly
However, trying to pass a connection or a child of root makes the task hang indefinitely.
@task()
def simple_task():
    connection = databases.db.open()
    root = connection.root()
    ret = main.get_keys(root)  # Hangs indefinitely
    ...
If it makes any difference, these Celery tasks are accessed by Django.
Question
So, first of all, what's going on here? Is there some sort of race condition caused by accessing the ZEO server in this way?
I could make all database access Celery's responsibility, but that will make for ugly code. Furthermore, it would ruin my program's ability to function as a standalone program. Is it not possible to interact with ZEO within a routine called by a Celery worker?
Do not save an open connection or its root object as a global.
You need a connection per thread; ZEO makes it possible for multiple threads to access the database, but it sounds like you are using something that is not thread-local (e.g. the module-level global in databases.py).
Save the db as a global, but call db.open() during each task. See http://zodb.readthedocs.org/en/latest/api.html#connection-pool
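A sketch of that pattern using the module names from the question: only the DB object stays global, and each task opens and closes its own connection:

# tasks.py
from dummy_package import databases  # databases.db stays a module-level global

@task()
def simple_task():
    connection = databases.db.open()  # one connection per task invocation
    try:
        return list(connection.root().keys())
    finally:
        connection.close()  # return the connection to the pool; don't close the db itself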
I don't completely understand what's going on, but I'm thinking the deadlock has something to do with the fact that Celery uses multiprocessing by default for concurrency. Switching over to using Eventlet for tasks that need to access the ZEO server solved my problem.
My process
Start up a worker that uses Eventlet, and one that uses standard multiprocessing.
celery is the name of the default queue (for historical reasons), so have the Eventlet worker handle this queue:
$ celery worker --concurrency=500 --pool=eventlet --loglevel=debug \
-Q celery --hostname eventlet_worker
$ celery worker --loglevel=debug \
-Q multiprocessing_queue --hostname multiprocessing_worker
Route tasks which need standard multiprocessing to the appropriate queue. All others will be routed to the celery queue (Eventlet-managed) by default. (If using Django, this goes in settings.py):
CELERY_ROUTES = {'project.tasks.ex_task': {'queue': 'multiprocessing_queue'}}
I'm using Celery (3.0.15) with Redis as a broker.
Is there a straightforward way to query the number of tasks with a given name that exist in a Celery queue?
And, as a followup, is there a way to cancel all tasks with a given name that exist in a Celery queue?
I've been through the Monitoring and Management Guide and don't see a solution there.
# Retrieve tasks
# Reference: http://docs.celeryproject.org/en/latest/reference/celery.events.state.html
query = celery.events.state.tasks_by_type(your_task_name)
# Kill tasks
# Reference: http://docs.celeryproject.org/en/latest/userguide/workers.html#revoking-tasks
for uuid, task in query:
    celery.control.revoke(uuid, terminate=True)
There is one issue that earlier answers have not addressed and may throw off people if they are not aware of it.
Among those solutions already posted, I'd use Danielle's with one minor modification: I'd import the task into my file and use its .name attribute to get the task name to pass to .tasks_by_type().
app.control.revoke(
    [uuid for uuid, _ in
     celery.events.state.State().tasks_by_type(task.name)])
However, this solution will ignore tasks that have been scheduled for future execution. Like some people who commented on other answers, when I checked what .tasks_by_type() returned, I got an empty list. And indeed my queues were empty. But I knew that there were tasks scheduled to be executed in the future, and these were my primary target. I could see them by executing celery -A [app] inspect scheduled, but they were unaffected by the code above.
I managed to revoke the scheduled tasks by doing this:
app.control.revoke(
    [scheduled["request"]["id"] for scheduled in
     chain.from_iterable(app.control.inspect().scheduled()
                         .itervalues())])
app.control.inspect().scheduled() returns a dictionary whose keys are worker names and values are lists of scheduling information (hence, the need for chain.from_iterable which is imported from itertools). The task information is in the "request" field of the scheduling information and "id" contains the task id. Note that even after revocation, the scheduled task will still show among the scheduled tasks. Scheduled tasks that are revoked won't get removed from the list of scheduled tasks until their timers expire or until Celery performs some cleanup operation. (Restarting workers triggers such cleanup.)
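The snippet above uses the Python 2-only .itervalues(); on Python 3 the same idea looks like this (a sketch, guarding against inspect() returning None when no workers reply):

from itertools import chain

scheduled_by_worker = app.control.inspect().scheduled() or {}
app.control.revoke([
    item["request"]["id"]
    for item in chain.from_iterable(scheduled_by_worker.values())
])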
You can do this in one request:
app.control.revoke([
    uuid
    for uuid, _ in
    celery.events.state.State().tasks_by_type(task_name)
])
As usual with Celery, none of the answers here worked for me at all, so I did my usual thing and hacked together a solution that just inspects redis directly. Here we go...
# First, get a list of tasks from redis:
import redis, json
from django.conf import settings  # assuming the Redis config lives in Django settings

r = redis.Redis(
    host=settings.REDIS_HOST,
    port=settings.REDIS_PORT,
    db=settings.REDIS_DATABASES['CELERY'],
)
l = r.lrange('celery', 0, -1)

# Now import the task you want so you can get its name
from my_django.tasks import my_task

# Now, import your celery app and iterate over all tasks
# from redis and nuke the ones that have a matching name.
from my_django.celery_init import app

for task in l:
    task_headers = json.loads(task)['headers']
    task_name = task_headers["task"]
    if task_name == my_task.name:
        task_id = task_headers['id']
        print("Terminating: %s" % task_id)
        app.control.revoke(task_id, terminate=True)
Note that revoking in this way might not revoke prefetched tasks, so you might not see results immediately.
Also, this answer doesn't support prioritized tasks. If you want to modify it to do that, you'll want some of the tips in my other answer that hacks redis.
It looks like flower provides monitoring:
https://github.com/mher/flower
Real-time monitoring using Celery Events:
    Task progress and history
    Ability to show task details (arguments, start time, runtime, and more)
    Graphs and statistics
Remote Control:
    View worker status and statistics
    Shutdown and restart worker instances
    Control worker pool size and autoscale settings
    View and modify the queues a worker instance consumes from
    View currently running tasks
    View scheduled tasks (ETA/countdown)
    View reserved and revoked tasks
    Apply time and rate limits
    Configuration viewer
    Revoke or terminate tasks
HTTP API
OpenID authentication