I have a Flask-Restful App with a pretty standard server stack
WSGI Server : Gunicorn
Async worker class : Gevent / sync
When I start/restart my Flask app via supervisorctl, the CPU load goes very high until the app has loaded, which takes around 10-20 seconds.
I've tried running the app on the following instance configs
8 core, 15 GB RAM
2 core, 7.5 GB RAM
But on both instances the behaviour is quite similar: CPU usage rises drastically to ~35-40%.
I'm not able to find the root cause of this load. Can there be any issue with Gunicorn combined with Flask-Restful?
Any help would be appreciated.
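For reference, the stack above corresponds roughly to a gunicorn config like the sketch below (the module name, bind address, and worker count are placeholders, not the exact values from my setup):
# gunicorn.conf.py - rough sketch of the stack described above (values are placeholders)
import multiprocessing

bind = "0.0.0.0:8000"
worker_class = "gevent"                        # async worker class (gevent), as described
workers = multiprocessing.cpu_count() * 2 + 1  # common formula from the gunicorn docs
preload_app = False                            # when True, the app is imported once in the master before forking
supervisord then runs something like gunicorn -c gunicorn.conf.py app:app, where app:app stands in for the real WSGI entrypoint.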
Related
I am currently working on a service that is supposed to provide an HTTP endpoint in Cloud Run and I don't have much experience. I am currently using flask + gunicorn and can also call the service. My main problem now is optimising for multiple simultaneous requests. Currently, the service in Cloud Run has 4GB of memory and 1 CPU allocated to it. When it is called once, the instance that is started directly consumes 3.7GB of memory and about 40-50% of the CPU (I use a neural network to embed my data). Currently, my settings are very basic:
memory: 4096M
CPU: 1
min-instances: 0
max-instances: 1
concurrency: 80
Workers: 1 (Gunicorn)
Threads: 1 (Gunicorn)
Timeout: 0 (Gunicorn, as recommended by Google)
If I up the number of workers to two, I would need to raise the memory to 8GB. If I do that, my service should be able to work on two requests simultaneously with one instance, provided the 1 allocated CPU has more than one core. But what happens if there is a third request? I would like to think that Cloud Run will start a second instance. Does the new instance also get 1 CPU and 8GB of memory, and if not, what is the best practice for me?
One of the best practices is to let Cloud Run scale automatically instead of trying to optimize each instance. Using 1 worker is a good idea to limit the memory footprint and reduce the cold start time.
I recommend playing with the number of threads, typically setting it to 8 or 16, to leverage the concurrency parameter.
If you set those values too low, Cloud Run's internal load balancer will route requests to the instance, thinking it will be able to serve them, but if Gunicorn can't accept new requests, you will have issues.
Tune your service not only with the right CPU and memory parameters, but also with the thread count and concurrency, to find the correct combination. Hey is a useful tool to stress your service and observe what happens when you scale.
The best practice so far is: for environments with multiple CPU cores, increase the number of workers to be equal to the cores available. Timeout is set to 0 to disable worker timeouts and allow Cloud Run to handle instance scaling. Adjust the number of workers and threads on a per-application basis. For example, try using a number of workers equal to the cores available, make sure there is a performance improvement, then adjust the number of threads, i.e.
CMD exec gunicorn --bind :$PORT --workers 1 --threads 8 --timeout 0 main:app
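For reference, the same settings can also be kept in a gunicorn config file instead of the CMD line (a sketch, assuming the entrypoint is still main:app and that PORT is injected by Cloud Run):
# gunicorn.conf.py - config-file equivalent of the CMD above (sketch)
import os

bind = f":{os.environ.get('PORT', '8080')}"  # Cloud Run injects PORT into the container
workers = 1                                  # small memory footprint, faster cold start
threads = 8                                  # handle concurrent requests within one worker
timeout = 0                                  # let Cloud Run handle instance scaling
The Dockerfile line then reduces to CMD exec gunicorn -c gunicorn.conf.py main:app.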
I have an issue with gunicorn behind an nginx ingress controller.
I have a microservice written in Python with aiohttp, and I am using gunicorn. The microservice is deployed in a Kubernetes cluster. I decided to test my app with some stress testing; for this purpose I used Locust. The problem is: when I run my app in a Docker container locally, it shows pretty good results, but when I stress test it in the Kubernetes cluster, I see high memory usage by the pod where my app is running. I thought it was a memory leak, so I checked docker stats while stress testing my app locally, and it was using 80-90 MiB of RAM. But when I stress test within the cluster, I see growing memory usage on the Grafana dashboard. Memory usage reaches up to 1.2 GB, and when I stop Locust it does not stabilize; it just jumps between 600 MB and 1.2 GB and I see spikes on the graph.
The pod is given 1 CPU and unlimited memory for now.
This is my gunicorn config:
workers = 1
bind = f"{SERVICE_HOST}:{SERVICE_PORT}"
worker_class = "aiohttp.GunicornUVLoopWebWorker"
#worker_connections = 4096
#max_requests = 4096
#max_requests_jitter = 100
I have tried different gunicorn configurations with 3 workers (2*nCPU + 1) and max_requests with jitter to restart workers, but haven't got good results.
One thing I discovered: when I apply a high load (500 users simultaneously), Locust shows client timeouts with 'Remote disconnected'. I have read in the gunicorn docs that it is good practice to put gunicorn behind nginx, because nginx can buffer the responses. And when I am testing locally or within the cluster, I do not have errors like that.
The main question I have not figured out yet is: why does the memory usage differ locally and within the cluster?
With 1 worker, when testing locally, docker stats shows 80-90 MiB, but the Grafana graph shows what I have already described...
First of all thanks to #moonkotte for trying to help!
Today I found out what the cause of this problem is.
So, the problem is related to gunicorn workers and the prometheus_multiproc_dir env variable, which sets the path where counter data is saved. I don't actually know yet why this is happening, but I just deleted this env variable and everything worked fine, except Prometheus of course :). I think this relates to this issue and these limitations. I will dig deeper to solve this.
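For anyone hitting the same combination, the prometheus_client docs describe a multiprocess mode for gunicorn in which per-worker counter files are written under that directory and cleaned up via a gunicorn hook; a rough sketch of that pattern (not the exact setup from my service) looks like this:
# gunicorn.conf.py - sketch of prometheus_client multiprocess cleanup (not my exact config)
from prometheus_client import multiprocess

def child_exit(server, worker):
    # tidy up the dead worker's counter files so they do not keep accumulating
    multiprocess.mark_process_dead(worker.pid)
PROMETHEUS_MULTIPROC_DIR (prometheus_multiproc_dir in older versions) must then point to an empty directory that is wiped between service restarts.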
I am running a Flask application and hosting it on Kubernetes from a Docker container. Gunicorn is managing workers that reply to API requests.
The following warning message is a regular occurrence, and it seems like requests are being canceled for some reason. On Kubernetes, the pod is showing no odd behavior or restarts and stays within 80% of its memory and CPU limits.
[2021-03-31 16:30:31 +0200] [1] [WARNING] Worker with pid 26 was terminated due to signal 9
How can we find out why these workers are killed?
I encountered the same warning message.
[WARNING] Worker with pid 71 was terminated due to signal 9
I came across this FAQ, which says that "A common cause of SIGKILL is when OOM killer terminates a process due to low memory condition."
I used dmesg and realized that it was indeed killed because it was running out of memory.
Out of memory: Killed process 776660 (gunicorn)
In our case the application was taking around 5-7 minutes to load ML models and dictionaries into memory, so adding a timeout period of 600 seconds solved the problem for us.
gunicorn main:app \
--workers 1 \
--worker-class uvicorn.workers.UvicornWorker \
--bind 0.0.0.0:8443 \
--timeout 600
I encountered the same warning message when I limited the Docker container's memory, e.g. with -m 3000m.
See docker-memory and gunicorn - Why are Workers Silently Killed?
The simple way to avoid this is to give Docker a higher memory limit, or not to set one at all.
I was using AWS Elastic Beanstalk to deploy my Flask application and I had a similar error.
In the log I saw:
web: MemoryError
[CRITICAL] WORKER TIMEOUT
[WARNING] Worker with pid XXXXX was terminated due to signal 9
I was using the t2.micro instance and when I changed it to t2.medium my app worked fine. In addition to this I changed the timeout in my nginx config file.
In my case the problem was a long application startup caused by ML model warm-up (over 3s).
It may be that your liveness check in kubernetes is killing your workers.
If your liveness check is configured as an http request to an endpoint in your service, your main request may block the health check request, and the worker gets killed by your platform because the platform thinks that the worker is unresponsive.
That was my case. I have a gunicorn app with a single uvicorn worker, which only handles one request at a time. It worked fine locally but would have the worker sporadically killed when deployed to kubernetes. It would only happen during a call that takes about 25 seconds, and not every time.
It turned out that my liveness check was configured to hit the /health route every 10 seconds, time out in 1 second, and retry 3 times. So this call would time out sometimes, but not always.
If this is your case, a possible solution is to reconfigure your liveness check (or whatever health check mechanism your platform uses) so it waits long enough for your typical request to finish. Or allow for more threads - something that makes sure the health check is not blocked long enough to trigger a worker kill.
You can see that adding more workers may help with (or hide) the problem.
Also, see this reply to a similar question: https://stackoverflow.com/a/73993486/2363627
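To illustrate the idea of not letting one long request starve the health check, here is a rough sketch (assuming an ASGI app such as FastAPI behind uvicorn workers; the route names and the slow function are made up): the blocking work is pushed off the event loop so /health can still be answered.
# sketch: keep /health responsive while a slow request runs (FastAPI/uvicorn assumed)
import asyncio
import time

from fastapi import FastAPI

app = FastAPI()

def slow_prediction() -> str:
    time.sleep(25)           # stand-in for ~25 s of blocking work
    return "done"

@app.get("/health")
async def health():
    return {"status": "ok"}  # answered immediately as long as the event loop is free

@app.post("/predict")
async def predict():
    # run the blocking call in a thread so the event loop (and /health) stays responsive
    result = await asyncio.to_thread(slow_prediction)
    return {"result": result}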
Check memory usage
In my case, I could not use the dmesg command, so I checked memory usage with a docker command:
sudo docker stats <container-id>
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
289e1ad7bd1d funny_sutherland 0.01% 169MiB / 1.908GiB 8.65% 151kB / 96kB 8.23MB / 21.5kB 5
In my case, the workers were not being terminated because of memory.
I encountered the same problem too, and it was because Docker's memory usage was limited to 2GB. If you are using Docker Desktop, you just need to go to Resources and increase the memory portion dedicated to Docker (if not, you need to find the Docker command line option to do that).
If that doesn't solve the problem, then it might be the timeout that kills the worker; you will need to add the timeout argument to the gunicorn command:
CMD ["gunicorn","--workers", "3", "--timeout", "1000", "--bind", "0.0.0.0:8000", "wsgi:app"]
I have a Django project which makes predictions on a VM with 2 CPU cores and 8 GB of RAM. When my Django app starts it loads a large file (2.5GB, load time: 10 sec.) with information that the app needs. My app can handle a large number of concurrent requests before I get an error, but it only uses 50% of my CPU (1 core); in order to use 100% of my machine's power I need to activate the second core through threading.
How can I set up my app so it can handle user requests through different threads?
Is there a recommended way to do that or an example ?
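Not an answer from the original thread, but a common pattern with gunicorn is the threaded gthread worker class combined with preload_app, so the large file is loaded once in the master and shared with forked workers via copy-on-write; a sketch under those assumptions (module name and numbers are placeholders):
# gunicorn.conf.py - sketch for thread-based concurrency with a Django app (values are placeholders)
bind = "0.0.0.0:8000"
worker_class = "gthread"   # threaded workers
workers = 2                # one per core on the 2-core VM
threads = 4                # threads per worker for concurrent requests
preload_app = True         # load the 2.5GB file once in the master, share it copy-on-write
It would then be started with something like gunicorn -c gunicorn.conf.py myproject.wsgi:application, where myproject.wsgi:application stands in for the real WSGI entrypoint.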
I'm running a Flask app on Heroku (on the free tier) and running into some trouble when scheduling tasks using apply_async. If I schedule more than two tasks, I get a long stacktrace with the exception:
AccessRefused(403, u"ACCESS_REFUSED - access to exchange 'celeryresults' in vhost 'rthtwchf' refused for user 'rthtwchf'", (40, 10), 'Exchange.declare')
The odd thing is the first two tasks (before restarting all of my processes) always seem to complete with no issue.
A little bit of search engine sleuthing leads me to https://stackoverflow.com/questions/21071906/celery-cannot-connect-remote-worker-with-new-username which makes it look like a permissions issue, but I'd assume that the Heroku CloudAMQP service would have taken care of that already.
Any advice is appreciated!
I think your connections are exceeding 3 (the free plan limit). Set BROKER_POOL_LIMIT to 1 and it will work.
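For example, with the old-style uppercase Celery settings that would look roughly like this (the broker URL is a placeholder for whatever CloudAMQP provides):
# Celery config sketch: cap the broker connection pool (old-style uppercase settings)
BROKER_URL = "amqp://user:password@host/vhost"  # placeholder for the CloudAMQP URL
BROKER_POOL_LIMIT = 1                           # stay under the free plan's connection limit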