Celery and long running tasks - Python

I just watched a YouTube video where the presenter mentioned that one should design their Celery tasks to be short, and that tasks running several minutes are bad.
Is this correct? What I do see is that I have some long running tasks, which take say 10 minutes to finish. When this kind of task is scheduled frequently, the queue is swamped and no other tasks get scheduled. Is this the reason?
If so, what should be used for long running tasks?

Long running tasks aren't great, but it's by no means appropriate to say they are bad. The best way to handle long running tasks is to create a queue for just those tasks and have them run on a separate worker from the one handling the short tasks.
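For example (a sketch with made-up app and queue names), you could start one worker that consumes only the default queue and another that consumes only the long-task queue:

    celery -A proj worker -Q celery -c 8 -n short@%h
    celery -A proj worker -Q long_tasks -c 2 -n long@%h

That way a burst of 10-minute jobs only ties up the long_tasks worker, and the short tasks keep flowing.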

The problem with long running tasks is that you have to wait for them when you're pushing a new software version to your server. If you don't wait, your task may end up running possibly incompatible code, especially if you pickled some complex object as a parameter (which is strongly discouraged).

As #user2097159 said, it's good practice to keep the long running tasks in a dedicated queue. You should do that by routing with "settings.CELERY_ROUTES"; more info here
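Something along these lines (module and task names are just examples; in Celery 4+ the lowercase task_routes setting is the equivalent):

    CELERY_ROUTES = {
        'myapp.tasks.generate_big_report': {'queue': 'long_tasks'},
        'myapp.tasks.sync_all_devices': {'queue': 'long_tasks'},
        # everything else keeps going to the default queue
    }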
If you can estimate how long a task may run, I recommend using soft_time_limit per task, so you are able to handle the timeout gracefully.
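A minimal sketch of that (the 10-minute figure, the broker URL and the task body are assumptions):

    import time
    from celery import Celery
    from celery.exceptions import SoftTimeLimitExceeded

    app = Celery('proj', broker='redis://localhost:6379/0')  # broker URL is an assumption

    @app.task(soft_time_limit=600, time_limit=660)  # soft limit at 10 min, hard kill at 11
    def long_report():
        try:
            for _ in range(100):
                time.sleep(10)  # stand-in for the real work
        except SoftTimeLimitExceeded:
            pass  # last chance to save partial progress / clean up before exiting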
There is a gist from a talk I gave here

Augment the basic Task definition to optionally treat the task instantiation as a generator, and check for TERM or soft timeout on every iteration through the generator. Generically inject a "state" dict kwarg into tasks that support it. If it's the first time the task is run, allocate a new one in the results cache; otherwise look up the existing one from the results cache.
In your task, figure out a good place to yield so that each slice of work between yields is short. Update the state parameter as necessary.
When control returns to the master task class, check for TERM or soft timeout, and if there is one, save off the state object and respond to the signal.
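A rough sketch of that idea (this is not the code from the gist; the storage helpers and names are placeholders):

    import inspect
    from celery import Task
    from celery.exceptions import SoftTimeLimitExceeded

    class ResumableTask(Task):
        abstract = True

        def __call__(self, *args, **kwargs):
            # Look up existing state for this task, or start fresh.
            state = self.load_state(self.request.id) or {}
            result = self.run(*args, state=state, **kwargs)
            if not inspect.isgenerator(result):
                return result  # plain task body, nothing special to do
            try:
                for _ in result:  # each yield marks a short slice of work
                    self.save_state(self.request.id, state)
            except SoftTimeLimitExceeded:
                # Checkpoint the state so a later run can resume, then give up.
                self.save_state(self.request.id, state)
                raise
            return state

        def load_state(self, task_id):
            ...  # e.g. read a JSON blob from the results cache / Redis

        def save_state(self, task_id, state):
            ...  # e.g. write the JSON blob back

A task using this base class would yield after every chunk it processes and record its progress in the state dict, so a TERM or soft timeout only costs the current chunk.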

Related

How to check if celery task is already running before running it again with beat?

I have a periodic task scheduled to run every 10 minutes. Sometimes this task completes in 2-3 minutes, sometimes it takes 20 minutes.
Is there any way, using Celery Beat, to not start the task if the previous run hasn't completed yet? I don't see an option for it in the interval settings.
No, Celery Beat knows nothing about the running tasks.
One way to achieve what you are trying to do is to link the task to itself. apply_async(), for example, has the optional parameters link and link_error, which can be used to provide a signature (it can be a single task too) to run if the task finishes successfully (link) or unsuccessfully (link_error).
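A tiny sketch of that (broker URL and task names are made up; note that for a job that should keep repeating, each run has to attach the same link again, e.g. from inside the task, since a single apply_async only chains one follow-up run):

    from celery import Celery

    app = Celery('proj', broker='redis://localhost:6379/0')  # broker URL is an assumption

    @app.task
    def collect_stats():
        ...  # the real periodic work

    @app.task
    def alert_admins(request, exc, traceback):
        ...  # runs only if collect_stats fails

    collect_stats.apply_async(
        link=collect_stats.si().set(countdown=600),  # next run 10 minutes after success
        link_error=alert_admins.s(),
    )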
What I use is the following - I schedule the task to run frequently (say every 5 minutes), and I use a distributed lock to make sure I always have only one instance of the task running.
Finally a reminder - you can always implement your own scheduler, and use it in your beat configuration. I was thinking about doing this in the past for exactly the same thing you want, but decided that the solution I already have is good enough for me.
You can try this
It provides you with a singleton base class for your tasks.
I use Celery with Django models and I implemented a boolean has_task_running at the model level. Then with Celery signals I change the state of the flag to True when the before_task_publish signal is triggered and False when a task terminates. Not simple but flexible.
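Roughly like this (the Device model, field name and task name are assumptions, and picking the id out of args[0] / the message body depends on your task signature and task protocol version):

    from celery.signals import before_task_publish, task_postrun
    from myapp.models import Device  # assumed model with a has_task_running flag

    @before_task_publish.connect
    def mark_running(sender=None, headers=None, body=None, **kwargs):
        if sender == 'myapp.tasks.refresh_device':
            args, _kwargs, _embed = body  # message body layout for task protocol 2
            Device.objects.filter(pk=args[0]).update(has_task_running=True)

    @task_postrun.connect
    def mark_done(sender=None, args=None, **extra):
        # task_postrun fires on success and failure alike.
        if sender is not None and sender.name == 'myapp.tasks.refresh_device':
            Device.objects.filter(pk=args[0]).update(has_task_running=False)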

What is the best way to dispatch many tasks to concurrent worker threads in Python?

There is a large number of field devices (100,000, each with an individual IP) from which I have to collect data.
I want to do it in a Python-based scheduler combined with a readily available executable written in C/C++, which handles the communication with and readout of the devices. The idea is to communicate with up to ~100 devices in parallel, so the first 100 devices could be read out using subprocess calls to the executable. I don't want to wait for all 100 tasks to complete, because some might take longer while others finish faster. Instead I want to start the next process immediately after one task has finished, and so on. So, conducted by a simple "dispatcher", tasks are started continuously over time.
Question: Which Python API is the best I can use for this purpose?
I considered using the concurrent.futures API, starting a ThreadPoolExecutor and submitting task by task, each starting the executable in a separate thread. ProcessPoolExecutor wouldn't be an advantage, because the executable is started as a separate process anyway...
But I think that this is not intended to be used in such a way, because each submitted job will be remembered and therefore "kind of stored" in the executor forever; when a job is finished it ends up in status "finished" and is still visible, so I would clutter my executor with finished tasks. So I guess the Executor API is more suitable when there is a fixed number of tasks to be worked through, like in
https://docs.python.org/3/library/concurrent.futures.html#threadpoolexecutor-example
and not for continuously submitting tasks.
The other idea would be to start 100 worker threads in parallel, each working in an endless loop and reading its next task from a Queue object. In this case I can decide myself which worker a new task is sent to next. I know that this would work, because I have implemented it already (a sketch of it is below). But I have the feeling that there must be a more elegant solution in Python for dispatching tasks.
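For reference, a minimal sketch of that worker-threads-plus-Queue design (the executable name and the device list are placeholders):

    import queue
    import subprocess
    import threading

    NUM_WORKERS = 100
    tasks = queue.Queue()

    def worker():
        while True:
            ip = tasks.get()
            if ip is None:  # poison pill: shut this worker down
                tasks.task_done()
                break
            # Each readout blocks only this thread; the other 99 keep going.
            subprocess.run(['./readout', ip], check=False)
            tasks.task_done()

    threads = [threading.Thread(target=worker, daemon=True) for _ in range(NUM_WORKERS)]
    for t in threads:
        t.start()

    device_ips = ['10.0.0.%d' % i for i in range(1, 101)]  # placeholder for the 100,000 IPs
    for ip in device_ips:
        tasks.put(ip)

    tasks.join()  # wait until every queued readout has finished
    for _ in threads:
        tasks.put(None)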

pika connection times out during execution of long task (3+ minutes)

I have a process in which I need to distribute long running tasks amongst a pool of workers, in Python. So far I have been using RabbitMQ to queue the tasks (input comes from a Node.js frontend); a Python worker subscribes to the queue, obtains a task and executes it. Each task takes several minutes minimum.
After an update this process started breaking, and I eventually discovered this was due to RabbitMQ version 3.6.10 having changed the way it handles timeouts. I now believe I need to rethink my method of assigning tasks, but I want to make sure I do it the right way.
Until now I only had one worker (the task is to control a sequence of actions in a VM - I couldn't afford a new Windows license for a while, so until recently I had no practical way of testing parallel task execution); I suspect if I'd had two before I would have noticed this sooner. The worker attaches to a VM using libvirt to control it. The way my code is written currently implies that I would run one instance of the script per VM that I wish to control.
I suspect that part of my problem is the use of BlockingConnection - I think I need a way for the worker to disconnect from the queue when it has received and validated a task (this part takes less than 1 sec), then reconnect once it has completed the actions, but I haven't figured out how to do this yet. Is this correct? If so, how should I do this, and if not, what should I do instead?
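For what it's worth, a minimal sketch of that disconnect/reconnect idea with pika's BlockingConnection (queue names, the heartbeat value and run_vm_task are assumptions, not taken from the linked code; note that acking before the work means a crash loses the task):

    import pika

    params = pika.ConnectionParameters(host='localhost', heartbeat=60)

    def run_vm_task(body):
        ...  # several minutes of libvirt work (placeholder)

    def fetch_and_run_one():
        conn = pika.BlockingConnection(params)
        channel = conn.channel()
        channel.queue_declare(queue='tasks', durable=True)
        method, properties, body = channel.basic_get(queue='tasks')
        if method is None:
            conn.close()
            return False  # queue was empty
        # Validate quickly, ack, then drop the connection so nothing can
        # time out while the long-running work executes.
        channel.basic_ack(method.delivery_tag)
        conn.close()

        run_vm_task(body)

        # Reconnect afterwards to report the result.
        conn = pika.BlockingConnection(params)
        channel = conn.channel()
        channel.queue_declare(queue='results', durable=True)
        channel.basic_publish(exchange='', routing_key='results', body=b'done')
        conn.close()
        return True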
One other idea I've had is that instead of running a script per VM I could have a global control script that, on receiving a task, would spin off a thread to handle it. This would solve the problem of the connection timing out during task execution, but the timeout would just move to a different stage: I would potentially receive tasks while there were no idle VMs, and I would have to come up with a way to make the script wait for an available VM without breaking the RabbitMQ connection.
My current code can be seen here:
https://github.com/scherma/antfarm/blob/master/src/runmanager/runmanager.py#L342
Any thoughts folks?

How to do weighted fair task queues for CPU intensive tasks (in Python)?

Problem
We run several calculations on geographical data from user input (called a "system"). Sometimes one system needs calculations for 10 locations, sometimes for 1000+. One location takes approximately 1 second to calculate; hopefully we can speed this up in the future. We currently do this by using a multiprocessing Pool (from billiard) from within a Celery worker. This works in that it utilises all cores 100%, but there are two problems:
There are lingering connections (pipes, probably to the child procs) that cause the worker to hang when reaching the max open file limit (investigated, but haven't found a solution after more than a day of work)
We can't spread the calculations over multiple machines.
To solve these problems, I could run each calculation as a separate Celery task. However, we also want to schedule these calculations "fairly" for our users, so that:
Users working on small systems (say <50 locations) don't have to wait until a large system (>1000 locations) is finished. The larger the system, the less the increased waiting time matters to the user (they are doing something else anyway, and can get a notification). So this would be something akin to Weighted fair queueing.
I have not been able to find a distributed task runner that implements this possibility of prioritisation. Did I miss one? I looked at Celery, RQ, Huey, MRQ, Pulsar Queue and some more, as well as into data processing pipelines like Luigi and Pinball, but none seem to easily enable this.
Most of these suggest creating priority by adding more workers for higher priority queues. However, that wouldn't work, as the workers would start fighting for CPU time. (RQ does it differently, by completely emptying the first queue passed in before moving on to the next.)
Proposed architecture
What I imagine would work is running a multiprocessing program, with a process per CPU, that fetches, in a WFQ fashion, from multiple Redis lists, each being a certain queue.
Would this be the right approach? Of course there is quite some work to be done on making the queue configuration dynamic (for example also storing it in Redis, and reloading it after every couple of processed tasks), and on event monitoring to get insight.
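A rough sketch of what that fetch loop could look like (queue names, weights and the weighted-random pick, which only approximates WFQ, are all assumptions):

    import random
    from multiprocessing import Process, cpu_count

    import redis

    QUEUES = {'small': 8, 'medium': 3, 'large': 1}  # queue name -> weight

    def calculate(item):
        ...  # the ~1 second per-location calculation (placeholder)

    def worker(worker_id):
        r = redis.Redis()
        names = list(QUEUES)
        weights = [QUEUES[n] for n in names]
        while True:
            # Weighted random pick; a stricter scheme (e.g. deficit round
            # robin) could keep per-queue credit counters instead.
            chosen = random.choices(names, weights=weights, k=1)[0]
            item = r.lpop(chosen)
            if item is None:
                # Chosen queue is empty: block briefly on all of them.
                popped = r.blpop(names, timeout=1)
                if popped is None:
                    continue
                chosen, item = popped
            calculate(item)

    if __name__ == '__main__':
        procs = [Process(target=worker, args=(i,)) for i in range(cpu_count())]
        for p in procs:
            p.start()
        for p in procs:
            p.join()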
Additional thoughts:
Each task needs around 3MB of data, coming from Postgres, which is the same for each location in the system (or at least per couple of hundred locations). With the current approach, this resides in shared memory, and each process can access it quickly. I'll probably have to set up a local Redis instance on each machine to cache this data in, so not every process fetches it over and over again.
I keep coming across ZeroMQ, and it has a lot of enticing possibilities, but besides maybe the monitoring, it doesn't seem to be a good fit. Or am I wrong?
What would make more sense: running each worker as a separate program, and managing it with something like supervisor, or starting a single program, that forks a child for each CPU (no CPU count config necessary), and maybe also monitors its children for stuck processes?
We already run both RabbitMQ and Redis, so I could also use RMQ for the queues. It seems to me the only thing gained by using RMQ is the possibility of not losing tasks on worker crash by using acknowledgements, at the cost of using a more difficult library/complicated protocol.
Any other advice?

Celery design help: how to prevent concurrently executing tasks

I'm fairly new to Celery/AMQP and am trying to come up with a task/queue/worker design to meet the following requirements.
I have multiple types of "per-user" tasks: e.g., TaskA, TaskB, TaskC. Each of these "per-user" tasks read/write data for one particular user in the system. So at any given time, I might need to create tasks User1_TaskA, User1_TaskB, User1_TaskC, User2_TaskA, User2_TaskB, etc. I need to ensure that, for each user, no two tasks of any task type execute concurrently. I want a system in which no worker can execute User1_TaskA at the same time as any other worker is executing User1_TaskB or User1_TaskC, but while User1_TaskA is executing, other workers shouldn't be blocked from concurrently executing User2_TaskA, User3_TaskA, etc.
I realize this could be implemented using some sort of external locking mechanism (e.g., in the DB), but I'm hoping there's a more elegant task/queue/worker design that would work.
I suppose one possible solution is to implement queues as user buckets: when the workers are launched, there's config that specifies how many buckets to create, and each "bucket worker" is bound to exactly one bucket. Then an "intermediate worker" would pull tasks off the main task queue and assign them to the bucketed queues via, say, a hash/mod scheme. So UserA's tasks would always end up in the same queue, and multiple tasks for UserA would back up behind each other. I don't love this approach, as it would require the number of buckets to be defined ahead of time, and it would seem to prevent (easily) adding workers dynamically. Seems to me there's got to be a better way -- suggestions would be greatly appreciated.
What's so bad about using an external locking mechanism? It's simple, straightforward, and efficient enough. You can find an example of distributed task locking in Celery here. Extend it by creating a lock per user, and you're done!
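A sketch of that, built on the cache-lock pattern from the Celery docs, but keyed per user (the task body, timeout value and import paths are assumptions):

    from contextlib import contextmanager

    from django.core.cache import cache

    from myproj.celery import app  # your Celery app instance (assumed location)

    LOCK_EXPIRE = 60 * 10  # a bit longer than the longest expected task

    @contextmanager
    def user_lock(user_id):
        lock_id = 'user-lock-%s' % user_id
        # cache.add is atomic: it only succeeds if the key doesn't exist yet.
        acquired = cache.add(lock_id, 'locked', LOCK_EXPIRE)
        try:
            yield acquired
        finally:
            if acquired:
                cache.delete(lock_id)

    @app.task(bind=True)
    def task_a(self, user_id):
        with user_lock(user_id) as acquired:
            if not acquired:
                # Another task for this user is running; retry shortly.
                raise self.retry(countdown=10)
            ...  # the actual per-user work for TaskA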
