I have a Celery cluster made up of machines with 8-core processors. Each machine has one worker that is set to a concurrency factor of 8 (-c8).
I often see nodes with a lot of reserved tasks, but only one or two are running simultaneously. My tasks are often long-running with a lot of compute and I/O.
Any ideas as to why this is happening, and what I can do to increase the number of tasks simultaneously running? Does Celery throttle the number of active tasks based on system load? I looked through the documentation but came up short.
Thanks to banana, I think I found the answer.
Some of my tasks were spawning subprocesses, which Celery counts against its concurrency limit.
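For reference, here is a minimal sketch of the kind of task that can eat a worker slot this way; the task name, external tool, and broker URL are made up for illustration:

    import subprocess
    from celery import Celery

    app = Celery("proj", broker="redis://localhost:6379/0")  # broker is an assumption

    @app.task
    def run_external_job(args):
        # The worker process blocks here until the child exits, so this one
        # task holds a pool slot for however long the external tool runs.
        result = subprocess.run(["my-tool"] + args, capture_output=True, text=True)
        return result.returncode

With a few of these in flight, the worker can have most of its slots tied up even though the worker processes themselves show very little CPU.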
I implemented a script that processes several URLs every day and performs many I/O operations; it subclasses threading.Thread and starts a number of threads (say 32).
The workload varies day by day but as soon as the processing starts I am sure that no more tasks will be added to the input queue.
Also, my script is not supporting any front-end (at least for now).
I feel, though, that this solution will not scale easily to multiple processes / machines, so I would like to give Celery (or any distributed task queue) a shot. However, I always read that it is better suited for long-running tasks running in the background to avoid blocking a UI.
On the other hand, I have also read that having many small tasks is not a problem with Celery.
What are your thoughts on this? Would it be easy to scale Celery workers across processes / machines?
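For a rough idea of what the move would look like, here is a minimal sketch of mapping the per-URL work onto Celery tasks; the task body, broker URL, and the use of requests are assumptions rather than part of the original setup:

    import requests
    from celery import Celery

    app = Celery("urljobs", broker="redis://localhost:6379/0")

    @app.task
    def process_url(url):
        # One URL = one small task; the broker takes over the role of the
        # input queue and the worker pool replaces the threading.Thread subclasses.
        resp = requests.get(url, timeout=30)
        return url, resp.status_code, len(resp.content)

Scaling out is then mostly a matter of starting more workers, on the same machine or on others, all pointed at the same broker (for example celery -A urljobs worker -c 32 to mirror the 32 threads).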
I have a few Celery workers that perform tasks that are not always that fast. The tasks are usually a bunch of HTTP requests and DB queries (using psycopg2 behind SQLAlchemy). I'm running in Kubernetes and the CPU usage is always fairly low (0.01 or so). Celery automatically set the concurrency to 2 (the number of cores of a single node), but I was wondering whether it would make sense to manually increase this number.
I always read that the concurrency (processes?) should be the same as the number of cores, but if the worker does not use a whole core, couldn't it be more? Like concurrency=10? Or would that make no difference, and am I just missing the point of processes and concurrency?
I couldn't find information on that. Thanks.
Everything you said is true. Celery automatically sets the concurrency to the number of cores, as it assumes that you will use the entire core (CPU-intensive tasks).
Sounds like you can increase the concurrency, as your tasks are mostly I/O bound (and the CPU is idle).
To be on the safe side, I would do it gradually: increase to 5 first, monitor, make sure the CPU is fine, and then go to 10.
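As a concrete (hypothetical) sketch of that gradual bump, assuming the standard prefork pool and a Python-side config; the app name and broker URL are placeholders:

    from celery import Celery

    app = Celery("api_tasks", broker="redis://localhost:6379/0")

    # Override the default of one prefork process per core: start at 5, watch
    # CPU and memory, then raise it to 10 if the node stays healthy.
    app.conf.worker_concurrency = 5

The same thing can be done at startup with --concurrency=5 (or -c 5) on the worker command line.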
I am researching Celery as a background worker for my Flask application. The application is hosted on a shared Linux server (I am not very sure what this means) on the Linode platform. The description says that the server has 1 CPU and 2GB RAM. I read that a Celery worker starts worker processes under it, and that their number is equal to the number of cores on the machine - which is 1 in my case.
I would have situations where I have users asking for multiple background jobs to be run. They would all be placed in a redis/rabbitmq queue (not decided yet). So if I start Celery with concurrency greater than 1 (say --concurrency 4), then would it be of any use? Or will the other workers be useless in this case as I have a single CPU?
The tasks would mostly involve reading information to and from Google Sheets and the application database. These interactions can get heavy at times, taking about 5-15 minutes. Based on this, will the answer to the above question change, since there might be times when the CPU is not being utilized?
Any help on this would be great, as I don't want one job to keep waiting for the previous one to finish before it can start. Or is the only solution to pay for a better machine?
Thanks
This is a common scenario, so do not worry. If your tasks are not CPU heavy, you can always overutilise the CPU like you plan to do. If all they do is I/O, then you can pick an even higher number than 4 and it will all work just fine.
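Concretely, on a 1-CPU box doing I/O-bound work, starting the worker along these lines (the app module name is a placeholder) is a reasonable first step:

    celery -A app worker --concurrency=4 --loglevel=info

If the tasks really are almost pure waiting on Google Sheets and the database, the gevent or eventlet pools (--pool=gevent) can push a single CPU well past 4 concurrent tasks.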
I'm looking at Celery to perform a defined set of tasks spread over multiple machines. Each machine can process any one of several tasks, but some of the tasks will require more machine resources than others. Is there a way to manage these resources using Celery?
Celery doesn't provide a means of measuring current/past resource utilization of workers and adjusting the amount of work they perform based on those measurements. However, you do have a few knobs to turn with Celery that can result in more predictable and more evenly distributed resource utilization (YMMV).
If you have tasks that have no performance requirement, you might consider limiting the number of tasks that can be performed over a given period of time with rate limiting.
Another option is to use Celery queues to your advantage. Depending on your needs, you might create a queue for light tasks and one for heavy tasks, and then have workers with more horsepower listen to the heavy queue and those with less listen to the light queue (or more workers listening on heavy, fewer on light).
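A minimal sketch of that split; the queue names, task names, and broker below are placeholders rather than anything from the question:

    from celery import Celery

    app = Celery("farm", broker="amqp://localhost")

    # Send expensive work to a "heavy" queue and cheap work to a "light" queue.
    app.conf.task_routes = {
        "tasks.transcode_video": {"queue": "heavy"},
        "tasks.send_notification": {"queue": "light"},
    }

Beefier machines then run celery -A farm worker -Q heavy while smaller ones run celery -A farm worker -Q light, and the rate limiting mentioned above can be layered on per task with rate_limit="10/m" in the task decorator.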
I have a python (2.6.5 64-bit, Windows 2008 Server R2) app that launches worker processes. The parent process puts jobs in a job queue, from which workers pick them up. Similarly it has a results queue. Each worker performs its job by querying a server. CPU usage by the workers is low.
When the number of workers grows, CPU usage on the servers actually shrinks. The servers themselves are not the bottleneck, as I can load them up further from other applications.
Anyone else seen similar behavior? Is there an issue with python multiprocessing queues when a large number of processes are reading or writing to the same queues?
Two different ideas for performance constraints:
The bottleneck is the workers fighting each other and the parent for access to the job queue.
The bottleneck is connection rate-limits (syn-flood protection) on the servers.
Gathering more information:
Profile the amount of work done: tasks completed per second, use this as your core performance metric.
Use packet capture to view the network activity for network-level delays.
Have your workers document how long they wait for access to the job queue.
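One low-tech way to get that number is a small wrapper around the blocking get(); this is only a sketch, assuming you can swap it in for the existing call:

    import time

    def timed_get(job_q, wait_times):
        # Measure how long this worker sits blocked on the shared job queue
        # before a job arrives, and record it for later reporting.
        t0 = time.time()
        job = job_q.get()
        wait_times.append(time.time() - t0)
        return job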
Possible improvements:
Have your workers use persistent connections if available/applicable (e.g. HTTP).
Split the tasks into multiple job queues fed to pools of workers.
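And a rough sketch of the multiple-queue idea, written for modern Python 3 multiprocessing rather than the 2.6.5 in the question; do_job() is a stand-in for the real server query:

    import multiprocessing as mp

    def do_job(job):
        return job * 2  # placeholder for the real work of querying a server

    def worker(job_q, result_q):
        # Pull jobs until the shutdown sentinel (None) arrives.
        for job in iter(job_q.get, None):
            result_q.put(do_job(job))

    if __name__ == "__main__":
        n_queues, workers_per_queue, n_jobs = 4, 8, 1000
        result_q = mp.Queue()
        job_qs = [mp.Queue() for _ in range(n_queues)]
        procs = [mp.Process(target=worker, args=(q, result_q))
                 for q in job_qs for _ in range(workers_per_queue)]
        for p in procs:
            p.start()
        for i in range(n_jobs):
            job_qs[i % n_queues].put(i)        # spread jobs across the queues
        for q in job_qs:
            for _ in range(workers_per_queue):
                q.put(None)                    # one sentinel per worker on that queue
        results = [result_q.get() for _ in range(n_jobs)]
        for p in procs:
            p.join()

With several queues there are fewer readers fighting over any single queue's lock.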
It's hard to say exactly what is going on without all the details.
However, remember that the real concurrency is bounded by the actual number of hardware threads. If the number of processes launched is much larger than the actual number of hardware threads, at some point the context-switching overhead will be more than the benefit of having more concurrent processes.
Creating a new thread is a very expensive operation.
One of the simplest ways to control a lot of parallel network connections is to use lightweight (stackless) threads with support for asynchronous sockets. Python has great support and a bunch of libraries for that.
My favorite is gevent, which has a great and completely transparent monkey-patching utility.
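A minimal gevent sketch (the URLs and pool size are made up): after monkey-patching, the blocking socket calls inside requests become cooperative, so one process can keep many connections in flight without OS threads:

    from gevent import monkey
    monkey.patch_all()  # must run before importing anything that uses sockets

    import gevent.pool
    import requests

    def fetch(url):
        return url, requests.get(url, timeout=10).status_code

    pool = gevent.pool.Pool(50)  # 50 concurrent greenlets
    urls = ["https://example.com/page/%d" % i for i in range(200)]
    for url, status in pool.imap_unordered(fetch, urls):
        print(url, status)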