So basically, I want to manage the workers for one particular queue based on the queue size and the time of day. I want to run fewer workers for that queue during peak hours and increase them when the load comes down.
You can check out the documentation for celery.worker.autoscale. It lets you set a minimum and maximum concurrency, and Celery handles growing and shrinking the worker pool. Not 100% sure how it works to be honest, but it seems like a good starting point:
http://docs.celeryproject.org/en/latest/internals/reference/celery.worker.autoscale.html
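If you want to try it, the autoscaler can also be enabled from the command line. A minimal sketch, assuming your app module is called proj and the queue name is a placeholder; the two numbers are the maximum and minimum number of pool processes:

celery -A proj worker -Q your_queue --autoscale=10,3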
Related
I implemented a script in which I process several URLs every day and perform many I/O operations; I am subclassing threading.Thread and starting a number of threads (say 32).
The workload varies from day to day, but as soon as processing starts I am sure that no more tasks will be added to the input queue.
Also, my script does not serve any front-end (at least for now).
I feel, though, that this solution will not scale easily to multiple processes / machines, and I would like to give Celery (or any distributed task queue) a shot, but I keep reading that it is better suited for long-running tasks running in the background to avoid blocking a UI.
On the other hand, I have also read that having many small tasks is not a problem with Celery.
What are your thoughts on this? Would it be easy to scale Celery workers across processes / machines?
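For what it's worth, here is a minimal sketch of how the same per-URL work might map onto Celery tasks; the broker URL, app name, and function bodies are placeholders, not anything from the original script:

from celery import Celery

# Illustrative only: assumes a local Redis broker.
app = Celery('urls', broker='redis://localhost:6379/0')

@app.task
def process_url(url):
    # the per-URL I/O work that each thread currently performs
    ...

def enqueue_all(urls):
    # one small task per URL; workers on any number of machines consume them
    for url in urls:
        process_url.delay(url)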
I was wondering whether it is possible to set a different prefetch multiplier per queue.
I have 2 queues: one has really short-running tasks, the other slightly longer ones. The queue for shorter tasks needs to be prioritised over the other.
To make the prioritisation work reliably, this has to be set in the Celery config:
task_acks_late = True
worker_prefetch_multiplier = 1
However, that really hurts performance for the fast task queue. Would it be possible to configure things so that when a worker is fetching from the fast task queue, worker_prefetch_multiplier is 4, and when it is fetching from the slow task queue, worker_prefetch_multiplier is 1?
I am not sure if it is possible to define different prefetch limits per queue since the Celery documentation seems to set these limits per worker.
However, we solve this issue by starting a separate worker for each queue. You can define different prefetch limits per worker, so if one worker consumes only one queue you effectively get different prefetch limits, as well as worker concurrencies, per queue. This also has the added benefit that your long-running tasks will not block worker processing time for the short-running tasks.
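As a rough illustration (app, queue, and node names are made up), the per-queue workers could be started like this, each with its own prefetch multiplier and concurrency:

celery -A proj worker -Q fast_tasks --prefetch-multiplier=4 --concurrency=8 -n fast@%h
celery -A proj worker -Q slow_tasks --prefetch-multiplier=1 --concurrency=2 -n slow@%h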
If you are by any chance thinking about using celery-batches to speed up processing of the short-running tasks even further, separating the queues into different workers becomes even more important, since you will then want quite a high prefetch limit for that worker (note: you will eventually run out of memory if your prefetch multiplier is 0, i.e. unlimited, and you have a very full queue).
In our case, we run our workers in a containerized environment. This also lets us define the resource allocation (memory / CPU) independently for each worker / queue.
Suppose all my tasks on a Celery queue are hitting a 3rd-party API. However, the API has a rate limit which I keep track of (there is a daily limit and an hourly limit that I need to respect). As soon as I hit the rate limit, I want to pause consumption of new tasks, and then resume when I know I am good.
I achieved this by using the following two tasks:
@celery.task()
def cancel_api_queue(minutes_to_resume):
    # schedule the consumer to be re-added after the given delay
    resume_api_queue.apply_async(countdown=minutes_to_resume * 60, queue='celery')
    # stop consuming from the rate-limited queue
    celery.control.cancel_consumer('third_party', reply=True)

@celery.task(default_retry_delay=300, max_retries=5)
def resume_api_queue():
    celery.control.add_consumer('third_party', destination=['y@local'])
Then I can keep submitting my 3rd-party API tasks, and as soon as my consumer is added back, all my tasks get consumed. Great.
However, since I have no consumer on this queue, this seems to mean I can no longer see the jobs being submitted in Flower (until my consumer is added back).
Is there something I am doing wrong? Can I achieve this 'pause' another way that lets me continue to see submitted jobs in Flower?
p.s. maybe this is related to this issue, but not 100% sure: https://github.com/celery/celery/issues/1452
I am using the amqp broker, if that makes a difference.
thanks girls and boys.
I'd suspect that peeking into the contents of queued messages before a worker picks them up is not really part of Flower's intended design. Therefore, if you stop consuming tasks from a queue, the best Flower can do is show you how many of them have been enqueued, as a single number on the "Broker" pane.
One hackish way to observe the incoming tasks could be to add an intermediate dummy "forwarding" task, which simply forwards the message from one queue (let us call it query_inbox) to another (say, query_processing).
E.g. something like:
@celery.task(queue='query_inbox')
def query(params):
    # forward the task to the rate-limited processing queue
    process_query.delay(params)

@celery.task(queue='query_processing')
def process_query(params):
    ...  # do rate-limited stuff
Now you may stop consuming tasks from query_processing, but you will still be able to observe their parameters as they flow through the query_inbox worker.
I have a basic understanding of Celery and how it works. In my current project, I have run into a need to prioritise tasks: if there are two kinds of tasks, say A and B, in the Celery queue, Celery should prioritise task B irrespective of which task is at the head of the queue. Is there a way to do that?
Queue prioritisation is also fine with me, meaning that I can make 2 different queues, say high_priority_queue and low_priority_queue, and Celery should always execute the tasks in high_priority_queue first and only then move on to low_priority_queue.
I also know that we can assign different workers to the two queues, but that would mean tasks in both queues are executed concurrently. I need the tasks in high_priority_queue to be executed first. Any ideas?
Thanks
Usually the multiple-worker approach with multiple queues is recommended, but as you pointed out, the low-priority worker will be working concurrently next to the high-priority one. This is an interesting setup if you have a lot of small tasks that you want executed rather soon: you put them in the high-priority queue while the longer tasks get pushed to the low-priority queue. You could also give more resources (or a better machine) to the high-priority worker.
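As a rough illustration (app, queue, and node names are made up), that could mean starting one worker per queue and giving the high-priority worker more processes:

celery -A proj worker -Q high_priority_queue --concurrency=8 -n high@%h
celery -A proj worker -Q low_priority_queue --concurrency=2 -n low@%h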
Since you would like a different solution, I am going to suggest the priority parameter of apply_async. You need a bit of setup for that, as pointed out in a different question I answered recently, and it only works for certain brokers (for RabbitMQ it works since version 3.5.0). After setting x-max-priority on your queue and the additional settings pointed out in the referenced answer, you can simply put a priority on a task like this:
your_task.apply_async(queue="your_queue_that_can_handle_priority", priority=10)
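For reference, the queue declaration with x-max-priority might look roughly like this (a sketch only; the queue name is the placeholder from above and app is your Celery instance):

from kombu import Exchange, Queue

# Declare the queue with a maximum priority level (requires RabbitMQ 3.5.0+).
app.conf.task_queues = [
    Queue(
        'your_queue_that_can_handle_priority',
        Exchange('your_queue_that_can_handle_priority'),
        routing_key='your_queue_that_can_handle_priority',
        queue_arguments={'x-max-priority': 10},
    ),
]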
Problem
We run several calculations on geographical data from user input (called a "system"). Sometimes one system needs 10 locations to do calculations for, sometimes 1000+. One location takes approximately 1 second to calculate; hopefully we can speed this up in the future. We currently do this using a multiprocessing Pool (from billiard) from within a Celery worker. This works in that it utilises all cores 100%, but there are two problems:
There are lingering connections (pipes, probably to the child processes) that cause the worker to hang when it reaches the max open file limit (investigated, but I haven't found a solution after more than a day of work).
We can't spread the calculations over multiple machines.
To solve these problems, I could run each calculation as a separate Celery task. However, we also want to schedule these calculations "fairly" for our users, so that:
Users working on small systems (say <50 locations) don't have to wait until a large system (>1000 locations) is finished. The larger the system, the less the increased waiting time matters to the user (they are doing something else anyway, and can get a notification). So this would be something akin to weighted fair queueing.
I have not been able to find a distributed task runner that implements this possibility of prioritisation. Did I miss one? I looked at Celery, RQ, Huey, MRQ, Pulsar Queue and some more, as well as into data processing pipelines like Luigi and Pinball, but none seem to easily enable this.
Most of these suggest creating priority by adding more workers for higher-priority queues. However, that wouldn't work here, as the workers would start fighting for CPU time. (RQ does it differently, by completely emptying the first queue passed in before moving on to the next.)
Proposed architecture
What I imagine would work is running a multiprocessing program, with a process per CPU, that fetches, in a WFQ fashion, from multiple Redis lists, each representing a certain queue.
Would this be the right approach? Of course there is quite some work to be done to make the queue configuration dynamic (for example, also storing it in Redis and reloading it after every couple of processed tasks), and to get event monitoring for insight.
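To make the idea concrete, here is a minimal sketch of such a fetcher; the queue names, weights, and handler are made up, and the weighted random pick is only a rough stand-in for true weighted fair queueing:

import multiprocessing
import random

import redis  # assumes the redis-py client

# Hypothetical queues and their scheduling weights.
QUEUES = {'systems_small': 8, 'systems_large': 1}

def handle(payload):
    # placeholder for the per-location calculation
    ...

def worker_loop():
    conn = redis.Redis()
    names = list(QUEUES)
    weights = [QUEUES[n] for n in names]
    while True:
        # Prefer a queue according to its weight, but fall back to the others
        # so a process never idles while any queue still holds work.
        first = random.choices(names, weights=weights, k=1)[0]
        order = [first] + [n for n in names if n != first]
        item = conn.blpop(order, timeout=5)
        if item is not None:
            _queue, payload = item
            handle(payload)

if __name__ == '__main__':
    # One process per CPU, as in the proposed architecture.
    for _ in range(multiprocessing.cpu_count()):
        multiprocessing.Process(target=worker_loop).start()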
Additional thoughts:
Each task needs around 3 MB of data from Postgres, which is the same for each location in the system (or at least per couple of hundred locations). With the current approach, this resides in shared memory, and each process can access it quickly. I'll probably have to set up a local Redis instance on each machine to cache this data in, so not every process fetches it over and over again.
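A small sketch of what that per-machine cache could look like (the key format, TTL, and load_from_postgres helper are all hypothetical):

import json
import redis

cache = redis.Redis()  # assumed local Redis instance on each machine

def get_system_data(system_id, ttl=3600):
    # Cache the ~3 MB per-system blob so each process does not
    # repeatedly fetch it from Postgres.
    key = f'system_data:{system_id}'
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)
    data = load_from_postgres(system_id)  # hypothetical loader
    cache.setex(key, ttl, json.dumps(data))
    return data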
I keep coming across ZeroMQ, and it has a lot of enticing possibilities, but besides maybe the monitoring it doesn't seem to be a good fit. Or am I wrong?
What would make more sense: running each worker as a separate program and managing it with something like supervisor, or starting a single program that forks a child for each CPU (no CPU-count configuration necessary) and maybe also monitors its children for stuck processes?
We already run both RabbitMQ and Redis, so I could also use RabbitMQ for the queues. It seems to me the only thing gained by using RabbitMQ is not losing tasks on a worker crash thanks to acknowledgements, at the cost of a more difficult library and a more complicated protocol.
Any other advice?