I have a special use case where I need to run a task on all workers to check if a specific process is running on the celery worker. The problem is that I need to run this on all my workers as each worker represents a replica of this specific process.
In the end I want to display 8/20 workers are ready to process further tasks.
But currently I'm only able to run a task either on a randomly selected worker or on one specific worker, which does not solve my problem at all ...
Thanks in advance
I can't think of a good way to do this in Celery. However, a nice workaround could be to implement your own command and then broadcast that command to every worker (just like you can broadcast the shutdown or status commands, for example). Come to think of it, this does sound like some sort of monitoring/maintenance operation, right?
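A minimal sketch of the counting side of that idea, assuming a hypothetical custom command named process_status whose handler on each worker returns {'running': True} or {'running': False}. A broadcast sent with reply=True returns a list of one-entry dicts keyed by worker hostname; the sample replies below are made up so that the snippet runs without a broker.

```python
def summarize_replies(replies, expected_workers):
    """Count workers whose reply reports the process as running.

    `replies` has the shape of a broadcast result with reply=True:
    a list of {hostname: payload} dicts.
    """
    ready = set()
    for reply in replies:
        for hostname, payload in reply.items():
            # Error replies or unexpected payloads count as not ready.
            if isinstance(payload, dict) and payload.get("running") is True:
                ready.add(hostname)
    return f"{len(ready)}/{expected_workers} workers are ready"


# Hypothetical replies from three workers.
sample_replies = [
    {"worker1@host-a": {"running": True}},
    {"worker2@host-b": {"running": False}},
    {"worker3@host-c": {"running": True}},
]
summary = summarize_replies(sample_replies, expected_workers=3)
print(summary)
```

With a real app, the replies would come from something like app.control.broadcast('process_status', reply=True, timeout=2). Workers that do not answer within the timeout are simply missing from the list, which is why the expected worker count is passed in separately.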
I have a task that I run periodically (each minute) via Celery Beat. Occasionally, the task takes longer than a minute to finish its execution, which results in the scheduler adding that task to the queue while the task is already running.
Is there a way I can avoid the scheduler adding tasks to the queue if those tasks are already running?
Edit: I have seen Celery Beat: Limit to single task instance at a time
Note that my question is different. I'm asking how to avoid my task being enqueued, while that question is asking how to avoid the task being run multiple times.
I haven't had this particular problem, but I had a similar one where I needed to avoid applying a task when a task of the same kind was already running or queued, though without Celery Beat. I went down a similar route to the answer you've linked, with a locking mechanism. Unfortunately it won't be that easy here, because you want to prevent the task from being queued in the first place.
As far as I know, Celery doesn't support anything like this out of the box. I guess your best bet is to write a custom scheduler that inherits from Scheduler and overrides either the apply_entry method or the apply_async method. In there you'd need a locking mechanism to check whether the task is already running: the task sets and releases a lock, and apply_async checks for that lock. You could use Redlock if you already have Redis running.
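The locking logic described above could look roughly like the sketch below. It is an assumption-laden illustration, not a Celery scheduler: RunningTaskLocks is a hypothetical in-process stand-in for a shared store such as Redis, and maybe_enqueue stands in for the check an overridden apply_async would perform.

```python
import time


class RunningTaskLocks:
    """In-process stand-in for a shared lock store such as Redis."""

    def __init__(self, clock=time.monotonic):
        self._expires_at = {}
        self._clock = clock

    def acquire(self, task_name, ttl_seconds):
        """Return True if the lock was taken, False if it is still held."""
        if self.is_locked(task_name):
            return False
        # The TTL guards against a crashed task never releasing its lock.
        self._expires_at[task_name] = self._clock() + ttl_seconds
        return True

    def is_locked(self, task_name):
        expires_at = self._expires_at.get(task_name)
        return expires_at is not None and expires_at > self._clock()

    def release(self, task_name):
        self._expires_at.pop(task_name, None)


def maybe_enqueue(locks, task_name, enqueue):
    """Scheduler side: skip the enqueue while the task holds its lock."""
    if locks.is_locked(task_name):
        return False
    enqueue(task_name)
    return True


locks = RunningTaskLocks()
queued = []
maybe_enqueue(locks, "cleanup", queued.append)  # enqueued
locks.acquire("cleanup", ttl_seconds=300)       # the task starts running
maybe_enqueue(locks, "cleanup", queued.append)  # skipped, still running
locks.release("cleanup")                        # the task finished
```

Because the lock is only set once the task starts, a task that is still waiting in the queue is not covered by this sketch; taking the lock at enqueue time instead would close that gap.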
As the Celery documentation states, an already executing task will not be aborted by calling .revoke() unless terminate=True is set. But that is not recommended, because it kills the worker process executing the task, which might already have started another task. Does that mean there is no reliable, stable way to do that?
EDIT: celery.contrib.abortable doesn't suit me because, as the documentation states, it works only with database backends.
A running task is a running subprocess of the worker (when using prefork), this means that the only way to abort a task is to kill the subprocess that is running it.
You could experiment with your own handling of the revoke event, working out the ID of the subprocess and killing only that one, but honestly I don't know whether it is worth the effort or whether it would really work.
I think the short answer is that you can't.
That said, killing the worker is sometimes necessary, especially in the early phases of a project when you are still sizing resources correctly. Just make sure you log the running tasks somewhere so you can reschedule them, or simply use CELERY_ACKS_LATE.
You can send the HUP signal instead of TERM, which gracefully restarts the child process without killing the worker.
In [80]: import signal
In [81]: x = add.delay(1, 2)
In [82]: x.revoke(terminate=True, signal=signal.SIGHUP)
I'm writing a Celery task that will run some tests on the pull requests created in BitBucket.
My problem is that if a pull request is updated before my task finishes it will trigger the task again and so I can end up having two tasks running tests on same pull request at the same time.
Is there any way I can prevent this? And make sure that if a task processing certain pull request is already in progress then I wait for that to finish and then start processing it again (from the new task that was queued)
As I monitor multiple repos each with multiple PRs I would like that if an event is coming but from different repo or different pull request to start it and run it.
I only need to queue it if I already have in progress same pull request from same repo.
Any idea if this is possible with celery?
The simplest way to achieve this is to set the worker concurrency to 1 so that only one task gets executed at a time.
Route the tasks to a separate queue.
your_task.apply_async(foo, queue='bar')
Then start your worker with a concurrency of one:
celery worker -Q bar -c 1
See also Celery - one task in one second
You are looking for a mutex. For Celery, there is celery_mutex and celery_once. In particular, celery_once claims to be doing what you ask, but I do not have experience with it.
You could also use Python's multiprocessing module, which provides locks that can be shared between processes, or use a shared storage that you already have.
If the tasks run on the same machine, the operating system has locking mechanisms.
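As one illustration of the operating-system route, here is a minimal sketch using advisory file locks via fcntl.flock, keyed by repository and pull request. It assumes a Unix host and that all tasks run on the same machine; pull_request_lock and lock_path are hypothetical helper names, not part of Celery or of the packages mentioned above.

```python
import fcntl
import hashlib
import os
import tempfile
from contextlib import contextmanager


def lock_path(repo, pr_id, lock_dir):
    # Hash the key so repo names containing slashes stay filesystem-safe.
    digest = hashlib.sha256(f"{repo}#{pr_id}".encode()).hexdigest()[:16]
    return os.path.join(lock_dir, f"pr-{digest}.lock")


@contextmanager
def pull_request_lock(repo, pr_id, lock_dir=None):
    """Block until no other holder has the lock for this repo and PR."""
    lock_dir = lock_dir or tempfile.gettempdir()
    with open(lock_path(repo, pr_id, lock_dir), "w") as handle:
        fcntl.flock(handle, fcntl.LOCK_EX)  # waits for the current holder
        try:
            yield
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)


# Inside the task body, the tests would run while the lock is held.
with pull_request_lock("team/repo", 42):
    print("running tests for team/repo PR 42")
```

A second task for the same repository and pull request waits at the flock call until the first one leaves the with block, while tasks for other pull requests use different lock files and proceed immediately.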
Is there any Celery functionality or preferred way of executing periodic background tasks locally when using a single worker? Sort of like a background thread, but scheduled and handled by Celery?
celery.beat doesn't seem suitable as it appears to be simply tied to a consumer (so could run on any server) - that's the type of scheduling I was after, but just a task that is always run locally on each server running this worker (the task does some cleanup and stats relating to the main task the worker handles).
I may be going about this the wrong way, but I'm confined to implementing this within a celery worker daemon.
You could use a custom remote control command and use the broadcast function on a cron to run cleanup or whatever else might be required.
One possible method I thought of, though not ideal, is to patch the celery.worker.heartbeat Heart() class.
Since we already use heartbeats, the class allows for a simple modification to its start() method (add another self.timer.call_repeatedly() entry), or an additional self.eventer.on_enabled.add() __init__ entry which references a new method that also uses self.timer.call_repeatedly() to perform a periodic task.
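For comparison, the plain background-thread variant that the question alludes to can be sketched with the standard library alone. This is not the Heart patch described above and uses no Celery API; PeriodicRunner is a hypothetical helper, and where to start it inside a worker is left open.

```python
import threading
import time


class PeriodicRunner:
    """Call `func` every `interval_seconds` on a daemon thread."""

    def __init__(self, interval_seconds, func):
        self._interval = interval_seconds
        self._func = func
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def _run(self):
        # Event.wait returns False on timeout and True once stop() is called.
        while not self._stop.wait(self._interval):
            self._func()


def cleanup():
    print("local cleanup and stats")


runner = PeriodicRunner(0.05, cleanup)
runner.start()
time.sleep(0.2)
runner.stop()
```

Each process that starts a runner gets its own schedule, which matches the requirement that the job always runs locally on every server.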
I'm getting started with celery and I want to know if it is possible to add modules to celeryd processes that have already been started. In other words, instead of adding modules via celeryconfig.py as in
CELERY_IMPORTS = ("tasks", "additional_module" )
before starting the workers, I want to make additional_module available later somehow after the worker processes have started.
Thanks in advance.
You can achieve your goal by starting a new celeryd with an expanded import list and eventually gracefully shutting down your old worker (after it's finished its current jobs).
Because of the asynchronous nature of getting jobs pushed to you and only marking them done after celery has finished its work, you won't actually miss any work doing it this way. You should be able to run the celery workers on the same machine - they'll simply show up as new connections to RabbitMQ (or whatever queue backend you use).