I am using Django with Nginx and want to serve multiple requests in parallel.
We run in Docker, and one pod has 10 cores. I am trying to create multiple uWSGI workers like this:
uwsgi --socket /tmp/main.sock --module main.wsgi --enable-threads --master --processes=10 --threads=1 --chmod-socket=666
A request first lands in the view, which then calls a service file that does the heavy work.
Specifically, the service file uses the OpenCV library and loops over all pixels to remove the colored ones (pretty time consuming).
I also tried a single worker with multiple threads:
uwsgi --socket /tmp/main.sock --module main.wsgi --enable-threads --master --processes=1 --threads=10 --chmod-socket=666
But performance still did not improve. I think this is due to the GIL being acquired during the heavy I/O operations, but I am not sure how to work around it. Or is there some other efficient way to use all the cores? TIA!
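A per-pixel loop written in Python is CPU-bound, so threads inside one process take turns holding the GIL, while separate processes do not share it. Below is a minimal sketch of splitting the rows of an image across processes with the standard library. The rule for what counts as a "colored" pixel, the white replacement value, and all function names are assumptions for illustration; the image is modeled as a plain list of rows of (B, G, R) tuples instead of an OpenCV array. Vectorized NumPy/OpenCV operations are often a bigger win than any of this, but they are not shown here.

```python
from concurrent.futures import ProcessPoolExecutor

WHITE = (255, 255, 255)  # assumed replacement value for removed pixels


def is_colored(pixel, tolerance=30):
    """Hypothetical rule: a pixel is 'colored' when its channels differ noticeably."""
    return max(pixel) - min(pixel) > tolerance


def remove_colored_rows(rows):
    """CPU-bound per-pixel loop over a chunk of rows (pure Python, holds the GIL)."""
    return [[WHITE if is_colored(p) else p for p in row] for row in rows]


def split_rows(rows, parts):
    """Split rows into at most `parts` contiguous chunks of near-equal size."""
    size = max(1, -(-len(rows) // parts))  # ceiling division
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def remove_colored_parallel(rows, workers):
    """Process each chunk in its own process so chunks can run on separate cores."""
    chunks = split_rows(rows, workers)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        results = pool.map(remove_colored_rows, chunks)
        return [row for chunk in results for row in chunk]
```

A call such as remove_colored_parallel(image_rows, workers=10) would be made from the service code. Keep in mind that --processes=10 already gives one process per concurrent request, so this only helps when a single request should use several cores.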
I have deployed my Flask app on AWS using the Gunicorn server.
This is my Gunicorn configuration in the Dockerfile:
CMD gunicorn api:app -w 1 --threads 2 -b 0.0.0.0:8000
So I have one worker, and that worker has 2 threads. The problem is that the server sometimes gets stuck and stops processing requests; when I redeploy the app, it starts processing requests again.
I can increase the number of threads or the number of workers to work around this. But my question is how to get information about the threads running in Gunicorn, that is, which thread is processing which request.
Thanks in advance!
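One possible way to see which thread handles which request is to log the process ID and thread name from a small WSGI middleware. This is only a sketch: the class name and logger name are made up for illustration, and it relies on nothing Gunicorn-specific, only on the standard WSGI calling convention.

```python
import logging
import os
import threading

log = logging.getLogger("request_trace")  # hypothetical logger name


class ThreadTraceMiddleware:
    """Wrap a WSGI app and record which process and thread handles each request."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        # Log before delegating, so a request that hangs still leaves a trace.
        log.info(
            "pid=%s thread=%s %s %s",
            os.getpid(),
            threading.current_thread().name,
            environ.get("REQUEST_METHOD"),
            environ.get("PATH_INFO"),
        )
        return self.app(environ, start_response)
```

With Flask this could be wired in as app.wsgi_app = ThreadTraceMiddleware(app.wsgi_app). If both threads are stuck, the last two log lines show which requests they were handling.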
I want to know how to control the number of processes and threads in OpenVINO.
I ran the following command, following the documentation.
docker run --name [my_server_name] --network [my_network] -d -u $(id -u):$(id -g) -p 9000:9000 -p 8000:8000 \
[my_repository_name] --model_path /models/model1 --model_name models --port 9000 --rest_port 8000 \
--plugin_config '{"CPU_THROUGHPUT_STREAMS": "2","CPU_BIND_THREAD": "NUMA","CPU_THREADS_NUM": "3"}' --shape "(1,3,704,576)"
Although I specified '--plugin_config', none of the parameters seem to be applied: the output of 'ps -efL' shows 1 process and 80 threads.
Does anyone know the cause of this result?
Do not run the Docker image in detached mode, so that you can confirm the server starts successfully.
If the server runs successfully, the parameters are adopted and shown in the serving info.
The plugin config parameters can define the number of threads used by the inference engine. Those are the threads dedicated to the inference execution. OpenVINO™ Model Server process will use more threads because some of them are related to gRPC and REST API servers.
The number of those extra threads depends on the number of available CPU cores. The number of REST server threads can be tuned with the --rest_workers parameter.
gRPC threads are not configurable for now. They are set to the number of CPU cores, but it is possible to pass --grpc_channel_arguments to tune the gRPC server behavior.
I am running a Django application (built on Django REST Framework) on a DigitalOcean server with the following characteristics:
4 GB RAM
2 CPUs
60 GB drive
I am using Gunicorn to run the Django app and Celery to manage the queue. The database is MySQL.
As far as I can see, CPU usage is really low, but memory usage seems high.
After deploying, I noticed that the python3 process uses even more memory (around 75%). Whenever I deploy, I run an after_deploy script, which contains the following:
service nginx restart
service gunicorn restart
chmod +x /mnt/myapplication/current/myapplication/setup/restart.sh
source /mnt/env/bin/activate
cd /mnt/myapplication/current/
pip3 install -r requirements.txt
python3 manage.py migrate --noinput >> /mnt/migrations/migrations.log
rm -f celerybeat.pid
rm -f celeryd.pid
celery -A myapplication beat -l info -f /var/log/celery/celery.log --detach
celery -A myapplication worker -l info -f /var/log/celery/celery.log --detach
Are these numbers expected? And if not, how can I investigate what is going wrong?
Python processes tend to retain allocated memory, so if one of your python processes allocates a lot of memory for a given operation (a Django view, a celery task...) it will indeed keep it as long as it's running.
As long as memory usage stays mostly stable (I mean: grows to a certain amount after process startup then stays at this amount) and your server doesn't swap, there's usually nothing to worry about, as the processes will keep on reusing the already allocated memory.
Now, if you find that memory use keeps growing without bound, you may indeed have a memory leak somewhere.
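One way to tell stable usage from steady growth is to compare two tracemalloc snapshots taken some time apart. The sketch below uses only the standard library; the growing `cache` list is a made-up stand-in for whatever your code might be accumulating, and `top_growth` is a hypothetical helper name.

```python
import tracemalloc


def top_growth(before, after, limit=5):
    """Return the source lines whose allocated size grew most between two snapshots."""
    stats = after.compare_to(before, "lineno")
    return [s for s in stats if s.size_diff > 0][:limit]


tracemalloc.start()
before = tracemalloc.take_snapshot()

cache = []  # hypothetical global that keeps growing between snapshots
for i in range(10000):
    cache.append("item-%d" % i)

after = tracemalloc.take_snapshot()
for stat in top_growth(before, after):
    print(stat)  # each entry shows file, line number, and size difference
tracemalloc.stop()
```

In a real process you would take the snapshots minutes or hours apart (for example from a management command or a signal handler) and look for lines that keep showing up with a positive difference.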
Beware that running Celery (or Django, FWIW) with settings.DEBUG enabled will cause memory leaks, but you should never run your production processes with the settings.DEBUG flag set anyway, as this is also a security issue.
If that's not your case, then you can start searching here and elsewhere on the net for "debugging python memory leak". You may find a good starting point here:
It’s not so easy for a Python application to leak memory. Usually
there are three scenarios:
some low level C library is leaking
your Python code has global lists or dicts that grow over time, and you forgot to remove the objects after use
there are some reference cycles in your app
and here:
For celery in particular, you can roll the celery worker processes
regularly. This is exactly what the CELERYD_MAX_TASKS_PER_CHILD setting does.
This post is in continuation with my previous post - celery how to implement single queue with multiple workers executing in parallel?
I implemented celery to work with eventlet using this command :-
celery -A project worker -P eventlet -l info --concurrency=4
I can see that my tasks are moved to the active list faster (in Flower), but I am not sure whether they are executing in parallel. I have a 4-core server for production, but I am not utilizing all the cores at the same time.
My question is :-
how can I use all 4 cores to execute tasks in parallel?
Both the eventlet and gevent worker types provide a great solution for concurrency, at the cost of limiting parallelism to 1. To get truly parallel task execution and utilise all cores, run several Celery instances on the same machine.
I know this goes counter to what popular Linux distros have in mind, so just ignore the system packages and roll your own configuration from scratch. A systemd service template is your friend.
Another option is to run Celery with the prefork pool: you get parallelism, at the cost of limiting concurrency to the number of worker processes.
I'm creating a REST API for an application using Falcon. When launching two or more requests to the API on different endpoints, there's no multi-threaded execution (One request has to be finished to execute the next one)
The problem comes from a POST endpoint that executes a complex machine learning process (it takes dozens of seconds to finish). The whole API is blocked while the process is running, because the endpoint waits for it to complete before returning results.
I'm using wsgiref simple_server to serve the requests:
from wsgiref import simple_server

if __name__ == '__main__':
    httpd = simple_server.make_server('127.0.0.1', 8000, app)
    httpd.serve_forever()
Is there any way to make the execution parallel, so that multiple requests are served at the same time?
Probably the server is not running in multiprocess or multithreaded mode.
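For reference, wsgiref's simple server handles one request at a time by default. A minimal sketch of a threaded variant, built from the standard library's ThreadingMixIn, is shown below. The class name, `demo_app`, and `make_threaded_server` are hypothetical names for illustration; a Falcon app object is a WSGI callable and could be passed in place of `demo_app`.

```python
import threading
from socketserver import ThreadingMixIn
from wsgiref.simple_server import WSGIServer, make_server


class ThreadingWSGIServer(ThreadingMixIn, WSGIServer):
    """WSGIServer variant that handles each request in its own thread."""

    daemon_threads = True  # do not block interpreter exit on open requests


def demo_app(environ, start_response):
    """Tiny stand-in WSGI app that reports which thread served the request."""
    body = ("handled by " + threading.current_thread().name).encode()
    headers = [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))]
    start_response("200 OK", headers)
    return [body]


def make_threaded_server(host, port, app):
    """Build a wsgiref server that uses the threaded server class."""
    return make_server(host, port, app, server_class=ThreadingWSGIServer)
```

Calling make_threaded_server('127.0.0.1', 8000, app).serve_forever() would replace the original two lines. Threads let other endpoints respond while one request waits, but a CPU-heavy model still competes for the GIL, so this is a development convenience and not a production setup.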
But even if it was, it is not a good idea to occupy the web server for long-running tasks. The long running tasks should be run by some other worker processes.
Take a look at Celery
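The idea of handing long-running work to a worker can be sketched with the standard library alone: the endpoint submits a job and returns an ID right away, and the client polls for the result. Everything here (`slow_model`, `submit_job`, `job_status`, the in-memory `jobs` dict) is a hypothetical stand-in; with Celery the queue and the workers live outside the web process, which this toy version does not do.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=2)  # stand-in for external workers
jobs = {}  # job_id -> Future; an in-memory stand-in for a result backend


def slow_model(payload):
    """Placeholder for the long-running machine learning step."""
    return {"echo": payload}


def submit_job(payload):
    """Queue the work and return immediately with an ID the client can poll."""
    job_id = uuid.uuid4().hex
    jobs[job_id] = executor.submit(slow_model, payload)
    return job_id


def job_status(job_id):
    """Report whether the job has finished, including the result once it has."""
    future = jobs[job_id]
    if not future.done():
        return {"status": "pending"}
    return {"status": "done", "result": future.result()}
```

The POST handler would call submit_job and respond with the ID, and a separate GET endpoint would call job_status.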
zaher, ideally you should use Celery as giorgosp mentioned, but if the API request must return the result directly, you can use Gunicorn:
gunicorn --workers 3 -b localhost:8000 main:app --reload
In the command above I have specified 3 workers, so 3 requests can be processed at a time.
A common rule of thumb for the number of workers is
cpu_count * 2 + 1
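Since Gunicorn config files are plain Python, the rule of thumb can be computed instead of hard-coded. A minimal sketch, assuming a config file named gunicorn.conf.py that is passed with gunicorn -c gunicorn.conf.py main:app (the bind address simply repeats the one used above):

```python
# gunicorn.conf.py (assumed file name; pass it with `-c gunicorn.conf.py`)
import multiprocessing

bind = "localhost:8000"

# Rule of thumb from above: two workers per core, plus one.
workers = multiprocessing.cpu_count() * 2 + 1
```

On a 2-core machine this gives 5 workers. For CPU-heavy or memory-heavy requests a smaller number may work better, so treat it as a starting point.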
You can use any port number you like, but make sure that it is above 1024 and it's not used by any other program.
The main:app option tells Gunicorn to invoke the application object app available in the file main.py.
Gunicorn provides an optional --reload switch that tells Gunicorn to detect any code changes on the fly. This way you can change your code without having to restart Gunicorn.
And if this approach is not suitable for your needs, then I think you should use Tornado instead of Falcon.
Let me know if any further clarification is needed.
This can be easily achieved by coupling Falcon with Gunicorn. With Gunicorn, multi-threading/multi-processing is relatively easy to set up without needing to implement Celery (although nothing is stopping you from implementing it. Celery is awesome!)
gunicorn -b localhost:8000 main:app --threads 3 --workers 3 --reload
The above command will spin up 3 workers, each with 3 threads. You as a developer can tweak the number of workers and threads as required. I would strongly advise understanding the difference between multithreading and multiprocessing before tweaking these settings.