Scheduling celery tasks with large ETA - python

I am currently experimenting with future tasks in celery using the ETA feature and a redis broker. One of the known issues with using a redis broker has to do with the visibility timeout:
If a task isn’t acknowledged within the Visibility Timeout the task will be redelivered to another worker and executed.
This causes problems with ETA/countdown/retry tasks where the time to execute exceeds the visibility timeout; in fact if that happens it will be executed again, and again in a loop.
Some tasks that I can envision will have an ETA on the timescale of weeks/months. Setting the visibility timeout large enough to encompass these tasks is probably unwise.
Are there any paths forward for processing these tasks with a redis broker? I am aware of this question. Is changing brokers the only option?

I am doing this with redis in the following way:
We have customers that can schedule a release of some of their content. We store the release in our database with the time it should be executed at.
Then we use celery beat to run a periodic task (hourly, or whatever interval suits you) that checks our releases table for releases scheduled within the next period (again an hour, or whatever suits you). If any are found, we schedule a celery task for them. This keeps the ETA short.
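A minimal sketch of that polling pattern, assuming a Django-style Release model with publish_at/published fields and a hypothetical publish_release task (the names are illustrative, not from the original answer):

from datetime import timedelta

from celery import shared_task
from django.utils import timezone

from myapp.models import Release          # hypothetical model holding the scheduled releases
from myapp.tasks import publish_release   # hypothetical task that actually releases the content

@shared_task
def schedule_upcoming_releases():
    """Runs hourly via celery beat; hands off anything due within the next hour with a short ETA."""
    now = timezone.now()
    window_end = now + timedelta(hours=1)
    due = Release.objects.filter(published=False, publish_at__gte=now, publish_at__lt=window_end)
    for release in due:
        # the ETA is at most an hour away, so it stays well inside Redis' visibility timeout
        publish_release.apply_async(args=[release.id], eta=release.publish_at)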

Related

Preventing duplication while scheduling tasks with celery beat

I have a task that I run periodically (every minute) via Celery Beat. Occasionally, the task takes longer than a minute to finish its execution, which results in the scheduler adding that task to the queue while the task is already running.
Is there a way I can avoid the scheduler adding tasks to the queue if those tasks are already running?
Edit: I have seen Celery Beat: Limit to single task instance at a time
Note that my question is different. I'm asking how to avoid my task being enqueued, while that question is asking how to avoid the task being run multiple times.
I haven't had this particular problem, but a similar one where I had to avoid tasks being applied when a task of the same kind was already running or queued, though without Celery Beat. I went down a similar route with a locking mechanism, as in the answer you've linked. Unfortunately it won't be that easy here, since you want to avoid queueing in the first place.
As far as I know, Celery doesn't support anything like this out of the box. Your best bet is probably a custom scheduler that inherits from Scheduler and overrides the apply_entry or apply_async method. In there you'd need a locking mechanism to check whether the task is already running: acquire a lock before enqueueing, release it in the task, and skip the entry in apply_entry/apply_async while the lock is held. You could use RedLock if you already have Redis running.
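A rough sketch of that idea, assuming a plain Redis client and using the task name as the lock key (the scheduler class name, lock key format and expiry are illustrative, not from the original answer):

import redis
from celery.beat import Scheduler

r = redis.Redis()  # assumes Redis is already available

class SkipIfRunningScheduler(Scheduler):
    """Beat scheduler that skips an entry while the lock for that task is still held."""

    def apply_entry(self, entry, producer=None):
        lock_key = "beat-lock:%s" % entry.task
        # NX set only succeeds if no lock exists; EX guards against a crashed worker holding it forever
        if not r.set(lock_key, "1", nx=True, ex=3600):
            return  # the previous run is still queued or executing, so don't enqueue again
        super(SkipIfRunningScheduler, self).apply_entry(entry, producer=producer)

The task itself then releases the lock when it finishes (e.g. r.delete("beat-lock:%s" % self.name) in a finally block of a bind=True task), and beat is started with the custom class via celery beat -S myapp.schedulers:SkipIfRunningScheduler.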

Celery: why is there a multi-second time gap from when a task is accepted and when it starts to execute?

I have a celery task:
@app.task(bind=True, soft_time_limit=FreeSWITCHConstants.EXECUTE_ATTEMPTS_LOCAL_SOFT_TIME_LIMIT)
def execute_attempt_local(self, attempt_id, provider_id, **kwargs):
print "PERF - entering execute_attempt_local"
...
that is processed by a (remote) worker with the following config:
celery -A mycompany.web.taskapp worker n -Q execute_attempts-2 --autoscale=4,60
This task gets spawned thousands at a time and has historically completed in 1-3s (it's a mostly I/O bound task).
Recently, as our app's overall usage has increased, this task's completion time has increased to 5-8s on average, and I'm trying to understand what's taking up the extra time. For many of the tasks taking 5-8s, roughly 4s elapses between the task being accepted by the worker process and the first line of the task body executing:
[2019-09-24 13:15:16,627: DEBUG/MainProcess] Task accepted: mycompany.ivr.freeswitch.tasks.execute_attempt_local[d7585570-e0c9-4bbf-b3b1-63c8c5cd88cc] pid:7086
...
[2019-09-24 13:15:22,180: WARNING/ForkPoolWorker-60] PERF - entering execute_attempt_local
What is happening in that 4s? I'm assuming I have a Celery config issue and somewhere there is a lack of resources for these tasks to process quicker. Any ideas what could be slowing them down?
There are several possible reasons why this is happening. It may take some time for the autoscaler to kick in, so depending on your load you may not have enough worker processes to run your tasks when they are sent, and they wait in the queue (possibly for minutes or even hours) until a worker process becomes available.
You can easily monitor this by looking at how many tasks are waiting in the queue. If the queue is always empty, your tasks are executed immediately. If not, you may want to add new workers to your cluster.
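A quick way to get a feel for this (assuming the Celery app instance is importable; the module path and queue name below come from the worker command in the question):

from mycompany.web.taskapp import app  # assumption: this is where the Celery app lives

insp = app.control.inspect()
print(insp.active())     # tasks currently executing on each worker
print(insp.reserved())   # tasks already prefetched by workers but not yet started

# If the broker is Redis, the backlog is just the length of the queue's list key:
#   redis-cli llen execute_attempts-2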

How to discard scheduled celery tasks if their execution gets delayed in the queue past a maximum time?

Scenario:
Tasks get queued in celery through the celerybeat schedule at a specific time.
In certain rare scenarios, the celery task queues might be long, or tasks may take longer than normal to finish.
Let's assume a few tasks get scheduled in the queue at 3 am and are mostly finished by the workers by 4 am, and take 5 am as the time limit.
In certain scenarios, the execution of the aforementioned tasks stretches to 6 am.
How can all the currently queued tasks be discarded if the time is now past 5 am? I.e. a task that was queued some time back should not run if the current time has crossed a time_of_day condition.
An obvious method would be to use the datetime module within the celery task code and return early if the time_of_day condition has passed.
Is there any built-in method/parameter in celery to control this behaviour? What would be a better way to achieve this?
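For reference, the in-task check described above might look like the sketch below (the task name and cutoff are illustrative). Celery does also have a built-in mechanism for this: apply_async accepts an expires argument (a number of seconds or a datetime), and a worker that only picks the task up after that point revokes it instead of executing it.

from datetime import datetime, time

from myapp.celery import app  # hypothetical module holding the Celery app

CUTOFF = time(5, 0)  # 5 am, per the scenario above

@app.task
def nightly_job():
    # silently discard the task if it was only picked up after the cutoff
    if datetime.now().time() >= CUTOFF:
        return
    do_the_work()  # placeholder for the real work

# Alternatively, set an expiry when queueing so late tasks are revoked by the worker:
#   nightly_job.apply_async(expires=7200)  # seconds, or pass an absolute datetime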

pika connection times out during execution of long task (3+ minutes)

I have a process in which I need to assign long running tasks amongst a pool of workers, in python. So far I have been using RabbitMQ to queue the tasks (input is a nodejs frontend); a python worker subscribes to the queue, obtains a task and executes it. Each task takes several minutes minimum.
After an update this process started breaking, and I eventually discovered this was due to RabbitMQ version 3.6.10 having changed the way it handles timeouts. I now believe I need to rethink my method of assigning tasks, but I want to make sure I do it the right way.
Until now I only had one worker (the task is to control a sequence of actions in a VM - I couldn't afford a new Windows license for a while, so until recently I had no practical way of testing parallel task execution); I suspect if I'd had two before I would have noticed this sooner. The worker attaches to a VM using libvirt to control it. The way my code is written currently implies that I would run one instance of the script per VM that I wish to control.
I suspect that part of my problem is the use of BlockingConnection - I think I need a way for the worker to disconnect from the queue when it has received and validated a task (this part takes less than 1 sec), then reconnect once it has completed the actions, but I haven't figured out how to do this yet. Is this correct? If so, how should I do this, and if not, what should I do instead?
One other idea I've had is that instead of running a script per VM I could have a global control script that on receiving a task would spin off a thread which would handle the task. This would solve the problem of the connection timing out during task execution, but the timeout would just have moved to a different stage: I would potentially receive tasks while there were no idle VMs, and I would have to come up with a way to make the script await an available VM without breaking the RabbitMQ connection.
My current code can be seen here:
https://github.com/scherma/antfarm/blob/master/src/runmanager/runmanager.py#L342
Any thoughts folks?
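One pattern that keeps the broker connection alive during long tasks (not from the original post; assumes pika 1.x and illustrative names) is to run the work in a separate thread while the BlockingConnection's I/O loop keeps consuming and sending heartbeats, then acknowledge from the connection's thread via add_callback_threadsafe once the work is done:

import functools
import threading

import pika

def run_task(connection, channel, delivery_tag, body):
    do_long_running_work(body)  # placeholder for the VM control sequence
    # acks must be issued from the connection's own thread, so schedule it there
    connection.add_callback_threadsafe(
        functools.partial(channel.basic_ack, delivery_tag=delivery_tag))

def on_message(channel, method, properties, body, connection):
    worker = threading.Thread(target=run_task,
                              args=(connection, channel, method.delivery_tag, body))
    worker.start()

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.basic_qos(prefetch_count=1)
channel.basic_consume(queue="tasks",
                      on_message_callback=functools.partial(on_message, connection=connection))
channel.start_consuming()  # heartbeats keep flowing while the thread does the slow part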

Celery running tasks from synchronous queues

Assume that I have a periodic task that fetches unprocessed messages from customers and then runs tasks to process those messages per user.
I can write a router for this processing task so that process("customer1_msgs") goes to the queue "customer1_queue", but then the celery worker(s) will pick up those tasks concurrently whenever they have a free slot. I don't want to process messages for the same customer at the same time; is there a way to make workers run only one task per queue at a time?
example flow:
periodic_task runs process("customer1_msgs"), process("customer2_msgs").
celery is processing customer1 and customer2 (this can take a long time)
then after some time the periodic task runs again and calls process("customer2_msgs"), process("customer3_msgs")
assuming the first process("customer2_msgs") is still processing messages, a celery worker will start the new tasks, and poof, I have a conflict with the still-running process("customer2_msgs") from the first run of the periodic task
(the periodic task runs fast; processing the messages takes a long time).
TL;DR version:
how to implement synchronous queues in celery? (tasks inside a queue run one after another, but other queues can still process tasks in parallel, one task per queue)
Using python 2.7, django 1.4, celery (soon the latest stable version), RabbitMQ as the message broker (redis is also available if needed), running on linux debian 7 wheezy
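The question is left unanswered here, but a common way to get this one-task-at-a-time-per-customer behaviour (an assumption on my part, not from the thread) is to keep the per-customer routing and give each customer queue its own worker with a concurrency of one:

from myapp.tasks import process  # the processing task from the question

# In the periodic task, send each customer's work to that customer's own queue:
for customer in ("customer1", "customer2"):
    process.apply_async(args=["%s_msgs" % customer], queue="%s_queue" % customer)

# Each queue then gets a dedicated single-process worker, so tasks in that queue
# run strictly one after another while other queues keep working in parallel:
#   celery worker -A myapp -Q customer1_queue --concurrency=1
#   celery worker -A myapp -Q customer2_queue --concurrency=1

With many customers this stops scaling, in which case a per-customer lock inside the task is the other common option.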
