Add functions dynamically to existing celery worker processes? - python

I'm getting started with celery and I want to know if it is possible to add modules to celeryd processes that have already been started. In other words, instead of adding modules via celeryconfig.py as in
CELERY_IMPORTS = ("tasks", "additional_module")
before starting the workers, I want to make additional_module available later somehow after the worker processes have started.
Thanks in advance.

You can achieve your goal by starting a new celeryd with an expanded import list and then gracefully shutting down your old worker once it has finished its current jobs.
Because jobs are pushed to you asynchronously and are only marked done after celery has finished the work, you won't actually miss any work doing it this way. You should be able to run both celery workers on the same machine - they'll simply show up as new connections to RabbitMQ (or whatever queue backend you use).
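For the record, a minimal sketch of triggering that graceful shutdown of the old worker from Python, assuming a placeholder broker URL and worker node name (both need to match your setup):

from celery import Celery

app = Celery(broker='amqp://guest@localhost//')  # placeholder broker URL

# After starting the new worker (with the expanded CELERY_IMPORTS), ask only
# the old worker to shut down; the warm shutdown lets it finish its current jobs.
app.control.broadcast('shutdown', destination=['old_worker@hostname'])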

Related

Celery - execute task on all nodes

I have a special use case where I need to run a task on all workers to check if a specific process is running on the celery worker. The problem is that I need to run this on all my workers as each worker represents a replica of this specific process.
In the end I want to display 8/20 workers are ready to process further tasks.
But currently I'm only able to process a task on either a randomly selected worker or just on one specific worker, which does not solve my problem at all ...
Thanks in advance
I can't think of a good way to do this in Celery. However, a nice workaround could be to implement your own command and then broadcast that command to every worker (just like you can broadcast shutdown or status commands, for example). When I think of it, this does indeed sound like some sort of monitoring/maintenance operation, right?
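As a sketch of that idea, assuming Celery 4+ (which lets you register custom remote control commands via control_command) and a purely illustrative pidfile check; the worker only picks the command up if it imports this module:

import os
from celery.worker.control import control_command

PIDFILE = '/var/run/my_process.pid'  # hypothetical pidfile of the replica process

def is_process_running():
    # Read the pid and send signal 0, which only checks that the process exists.
    try:
        with open(PIDFILE) as f:
            pid = int(f.read().strip())
        os.kill(pid, 0)
        return True
    except (OSError, ValueError):
        return False

@control_command()
def check_process(state):
    # Runs on every worker that receives the broadcast.
    return {'ready': is_process_running()}

You can then broadcast it from your Celery app instance and count the replies, e.g.:

replies = app.control.broadcast('check_process', reply=True, timeout=5)
ready = sum(1 for r in replies for v in r.values() if v.get('ready'))
print(f'{ready}/{len(replies)} workers are ready to process further tasks')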

Start a Django Channels Worker within the same process

I have to set up a worker which handles some data after a certain event happens. I know I can start the worker with python manage.py runworker my_worker, but what I would need is to start the worker in the same process as the main Django app on a separate thread.
Why do I need it in a separate thread and not in a separate process? Because the worker would perform a pretty lightweight job which would not overload the server's resources, and, moreover, the effort of setting up a new process in production is not worth the performance gain. In other words, I would prefer to keep it in Django's process if possible.
Why not perform the job synchronously? Because it is a separate logic that needs the possibility to be extended, and it is out of the main HTTP request-reply scope. It is a post-processing task which doesn't interfere with the main logic. I need to decouple this task from an infrastructural point-of-view, not only logical (e.g. with plain signals).
Is there any possibility provided by Django Channels to run a worker in such a way?
Would there be any downsides to start the worker manually on a separate thread?
Right now I have the setup for a message broker consumer thread (without using Channels), so I have the entry point for starting a new worker thread. But as I've seen from the Channel's runworker command, it loads the whole application, so it doesn't seem like a naïve worker.run() call is the proper way to do it (I might be wrong with this one).
I found an answer to my question.
The answer is no, you can't just start a worker within the same process. This is because the consumer needs to run inside an event loop thread, and it is not good at all to have more than one event loop thread in the same process (the Django WSGI application already runs the main thread with an event loop).
The best you can do is to start the worker in a separate process. As I mentioned in my question, I started a message broker consumer on a separate thread, which was not a good approach either, so I changed my configuration to start the consumers as separate processes.

Is there any way to make sure certain tasks are not executed in parallel?

I'm writing a Celery task that will run some tests on the pull requests created in BitBucket.
My problem is that if a pull request is updated before my task finishes, it will trigger the task again, so I can end up with two tasks running tests on the same pull request at the same time.
Is there any way I can prevent this, and make sure that if a task processing a certain pull request is already in progress, the new task waits for it to finish before processing it again?
As I monitor multiple repos, each with multiple PRs, I would like an event coming from a different repo or a different pull request to start and run immediately.
I only need to queue it if the same pull request from the same repo is already in progress.
Any idea if this is possible with celery?
The simplest way to achieve this is to set worker concurrency to 1 so that only one task gets executed at a time.
Route the tasks to a separate queue.
your_task.apply_async(args=(foo,), queue='bar')
Then start your worker with a concurrency of one:
celery worker -Q bar -c 1
See also Celery - one task in one second
You are looking for a mutex. For Celery, there is celery_mutex and celery_once. In particular, celery_once claims to be doing what you ask, but I do not have experience with it.
You could also use the Python multiprocessing module, which has a global mutex implementation, or use a shared storage that you already have.
If the tasks run on the same machine, the operating system has locking mechanisms.
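For illustration, a minimal per-pull-request lock using Redis as the shared storage; the lock name, timeout and Redis location are assumptions, and celery_once wraps a similar pattern for you:

import redis
from celery import shared_task

redis_client = redis.Redis()  # assumes a Redis instance on localhost

@shared_task(bind=True)
def run_pr_tests(self, repo, pr_id):
    lock = redis_client.lock(f'pr-tests:{repo}:{pr_id}', timeout=3600)
    # If another task already holds the lock for this repo/PR, re-queue this
    # one instead of running the same pull request's tests in parallel.
    if not lock.acquire(blocking=False):
        raise self.retry(countdown=30)
    try:
        pass  # run the tests for this pull request here
    finally:
        lock.release()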

How to make some parts of code queue-specific in celery?

I have two files containing celery task definitions. Each of them contains code for a specific queue. One of them imports scikit-learn and is therefore a bit memory-hungry given the limited memory the VPS has. When celery initializes, it executes both files to look for tasks, and so each celery worker imports scikit-learn. Is there a way to prevent this?
I have tried using inspect to get the currently active queues and continue only if this worker consumes the queue in question, but it doesn't seem to work during initialization:
from celery.task.control import inspect  # the import that was missing (Celery 3.x API)

i = inspect(['celery#hostname'])
print(i.active_queues())  # None
I think the best way to go is to start two workers, let them load two different apps and create two different queues.
Example worker start commands, off the top of my head:
celery -A scikit -Q learning worker
celery -A default -Q default worker
That of course requires you to add task routing (so that scikit tasks go into the learning queue and the others go to the default queue); a sketch of that follows.
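A routing sketch for that setup, assuming Celery 4+ lowercase settings and illustrative module names (proj.scikit_tasks and proj.default_tasks):

# In the configuration used by whoever sends the tasks
task_routes = {
    'proj.scikit_tasks.*': {'queue': 'learning'},
    'proj.default_tasks.*': {'queue': 'default'},
}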
I was able to solve it by emptying the CELERY_IMPORTS list and then including the module on the command line:
celery -A proj worker -l info -Q first_queue -I proj.file1
which only looks for tasks in proj.file1.

Notify celery task to stop during worker shutdown

I'm using celery 3.X and the RabbitMQ backend. From time to time I need to restart celery (to push a new source code update to the server). But there is a task with a big loop and a try/except inside the loop; it can take a few hours to complete. Nothing critical will happen if I stop it and restart it later.
QUESTION: The problem is that every time after I stop the workers (via sudo service celeryd stop) I have to KILL the task manually (via kill -9); the task ignores SIGTERM from the worker. I've read through the Celery docs & Stack Overflow but I can't find a working solution. Any ideas how to fix the problem?
Sending the QUIT signal will stop workers immediately: sudo service celeryd stop -QUIT
If the CELERY_ACKS_LATE setting is set to True, tasks that were running when the worker stopped will run again when the worker starts back up.
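For reference, the relevant Celery 3.x-style settings (the prefetch multiplier is optional but commonly paired with late acks, so a stopping worker isn't holding extra reserved tasks):

# celeryconfig.py
CELERY_ACKS_LATE = True           # acknowledge only after the task finishes
CELERYD_PREFETCH_MULTIPLIER = 1   # optional: don't reserve extra long tasks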
Celery is not intended to run long tasks, because a long task ties up the worker. I recommend rearranging your logic so that the task invokes itself for the next chunk instead of running the whole loop. Once a shutdown is in progress, your current chunk will complete, and processing will resume at the same point where it stopped before the celery shutdown.
Also, with the task split into chunks, you will be able to divert it to another worker/host, which is probably what you would like to do in the future.
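A minimal sketch of that chunked, self-invoking pattern; the chunk size, the per-item work and the item list are illustrative placeholders:

from celery import shared_task

CHUNK_SIZE = 100  # illustrative

def handle(item):
    # Placeholder for the real per-item work from the original loop.
    pass

@shared_task(bind=True)
def process_items(self, items, start=0):
    end = min(start + CHUNK_SIZE, len(items))
    for item in items[start:end]:
        try:
            handle(item)
        except Exception:
            continue  # mirrors the try/except inside the original loop
    if end < len(items):
        # Hand the remainder to a fresh task; if the worker is shutting down,
        # the next worker picks up at this offset instead of starting over.
        process_items.apply_async(args=(items, end))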
