If all the FastAPI endpoints are defined as async def, then there will only be one thread running, right? (Assuming a single uvicorn worker.)
Just wanted to confirm that in such a setup we will never contend for Python's Global Interpreter Lock. If the same were done in Flask with multiple threads for a single gunicorn worker, we would be facing the GIL, which hinders true parallelism between threads.
So basically, in the FastAPI setup above, parallelism is limited to 1 since there is only one thread. And to make use of all the cores, we would need to increase the number of workers, either with gunicorn or uvicorn.
Is my understanding correct?
Your understanding is correct. When using one uvicorn worker, only one process is run, which means there is only one thread that can take the lock on the interpreter running your application. Due to its asynchronous nature, your FastAPI app will be able to handle many simultaneous requests concurrently, but not in parallel.
If you want multiple instances of your application running in parallel, you can increase the number of workers. This will spin up multiple processes (all single-threaded, as above), and uvicorn will distribute the requests among them.
Note that you cannot have shared global variables across workers. They are separate instances of your FastAPI app and do not communicate with each other. See this answer for more on that, and on how to use a database or cache to work around it.
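As a minimal sketch of that setup (the module name main, the route, and the timings are illustrative assumptions, not from the question):

```python
# main.py -- hedged sketch; the endpoint and its delay are made up for illustration
import asyncio

from fastapi import FastAPI

app = FastAPI()

@app.get("/slow")
async def slow():
    # An I/O-bound await: the single event-loop thread is free to
    # interleave other requests while this one waits.
    await asyncio.sleep(1)
    return {"done": True}

if __name__ == "__main__":
    import uvicorn

    # workers=4 spins up four separate single-threaded processes; with
    # multiple workers the app must be given as an import string.
    uvicorn.run("main:app", host="127.0.0.1", port=8000, workers=4)
```

Each worker process has its own interpreter and its own GIL, so adding workers is what buys actual parallelism across cores.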
Related
I am working on a Django application which uses Celery for distributed async processing. Now I have been tasked with integrating a process that was originally written with concurrent.futures. So my question is: can this concurrent.futures job work inside the Celery task queue? Would it cause any problems, and if so, what would be the best way forward? The process as written is resource-intensive, and very fast, because it avoids the GIL: it uses a concurrent.futures.ProcessPoolExecutor, and inside it another few (<5) concurrent.futures.ThreadPoolExecutor jobs.
So the real question is: should we extract all the core functions of the process and rewrite them as separate Celery tasks, or keep the original code and run it as one big piece of code within the Celery queue?
As per the design of the system, a user can submit several such Celery tasks, each containing the concurrent.futures code.
Any help will be appreciated.
Your library should work without modification. There is no harm in having threaded code running within Celery, unless, for example, you are mixing gevent with gevent-incompatible code.
Reasons to break the code up would be resource management (reducing memory/CPU overhead). With threading, the thing to monitor is CPU load. Once your concurrency generates enough load (e.g. many threads doing CPU-intensive work), the OS starts context-switching between threads, and your processing gets slower, not faster.
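A minimal sketch of the "keep it as one big task" option (the broker URL, pool sizes, and work functions are illustrative assumptions):

```python
# tasks.py -- hedged sketch; names and workloads are made up for illustration
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

from celery import Celery

app = Celery("tasks", broker="redis://localhost:6379/0")

def cpu_heavy(chunk):
    # CPU-bound work in separate processes sidesteps the GIL.
    return sum(x * x for x in chunk)

def io_bound(result):
    # I/O-bound work is fine in threads; the GIL is released while waiting.
    pass

@app.task
def big_job(chunks):
    # The original concurrent.futures code runs unchanged inside the task.
    with ProcessPoolExecutor(max_workers=4) as procs:
        results = list(procs.map(cpu_heavy, chunks))
    with ThreadPoolExecutor(max_workers=4) as threads:
        list(threads.map(io_bound, results))
    return results
```

One caveat worth checking in your setup: Celery's default prefork pool runs tasks in daemonized child processes, which may not be allowed to spawn their own children; running the worker with --pool=threads or --pool=solo avoids that restriction.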
What exactly does passing threaded = True to app.run() do?
My application processes input from the user, and takes a bit of time to do so. During this time, the application is unable to handle other requests. I have tested my application with threaded=True and it allows me to handle multiple requests concurrently.
As of Flask 1.0, the WSGI server included with Flask is run in threaded mode by default.
Prior to 1.0, or if you disable threading, the server runs in single-threaded mode and can only handle one request at a time. Any parallel requests will have to wait until they can be handled, which can lead to issues if you try to contact your own server from within a request.
With threaded=True, each request is handled in a new thread. How many threads your server can handle concurrently depends entirely on your OS and the limits it sets on the number of threads per process. The implementation uses the standard library's socketserver.ThreadingMixIn class (SocketServer in Python 2), which sets no limit on the number of threads it can spin up.
Note that the Flask server is designed for development only. It is not a production-ready server. Don't rely on it to run your site on the wider web. Use a proper WSGI server (like gunicorn or uWSGI) instead.
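A minimal sketch of the behaviour in question (the route and the sleep are illustrative assumptions):

```python
# app.py -- hedged sketch; the slow handler just simulates processing time
import time

from flask import Flask

app = Flask(__name__)

@app.route("/work")
def work():
    time.sleep(2)  # simulate slow input processing
    return "done"

if __name__ == "__main__":
    # threaded=True (the default since Flask 1.0) gives each request its
    # own thread, so one slow handler no longer blocks the others.
    app.run(threaded=True)
```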
How many requests will my application be able to handle concurrently with this statement?
This depends drastically on your application. Each new request has a thread launched for it, so the limit is how many threads your machine can handle. I don't see an option to cap the number of threads (like uWSGI offers in a production deployment).
What are the downsides to using this? If i'm not expecting more than a few requests concurrently, can I just continue to use this?
Switching from single-threaded to multi-threaded execution can lead to concurrency bugs. If you use this, be careful about how you handle global objects (see the g object in the documentation!) and shared mutable state, for example as sketched below.
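A hedged sketch of the kind of state that needs protecting once the server is threaded (the counter is an illustrative assumption):

```python
# hedged sketch: synchronizing shared module-level state in a threaded Flask app
import threading

from flask import Flask

app = Flask(__name__)
counter = 0
counter_lock = threading.Lock()

@app.route("/hit")
def hit():
    global counter
    # Without the lock, two concurrent requests can read the same value
    # and write back the same increment, losing a count.
    with counter_lock:
        counter += 1
        current = counter
    return str(current)
```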
In the documentation I see the following:
There is only one limiting factor regarding scaling in Flask which are the context local proxies. They depend on context which in Flask is defined as being either a thread, process or greenlet. If your server uses some kind of concurrency that is not based on threads or greenlets, Flask will no longer be able to support these global proxies. However the majority of servers are using either threads, greenlets or separate processes to achieve concurrency which are all methods well supported by the underlying Werkzeug library.
My question: What other concurrent mechanisms are there other than these 3 methods?
One pretty interesting concurrency mechanism is the asynchronous model. You have a single process with a single thread running the whole show, with all the I/O and other lengthy tasks being asynchronous and callback-based. This method scales really well for I/O-bound services; servers in this category easily handle the C10K problem.
See Tornado or node.js for examples.
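As a minimal sketch of the same model in modern Python (asyncio coroutines here stand in for the callback style those servers popularized):

```python
# hedged sketch: one process, one thread, one event loop, many connections
import asyncio

async def handle(reader, writer):
    data = await reader.read(1024)   # yields to the event loop while waiting
    writer.write(data.upper())
    await writer.drain()
    writer.close()
    await writer.wait_closed()

async def main():
    server = await asyncio.start_server(handle, "127.0.0.1", 8888)
    async with server:
        # Thousands of mostly-idle connections are cheap: no thread per client.
        await server.serve_forever()

asyncio.run(main())
```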
I am running Django on Twisted in a WSGI container. Obviously I am avoiding all the async Deferred machinery inside my Django code because, according to the documentation, Twisted's async capabilities are not allowed inside WSGI apps.
However, I would like to use twisted.words inside my WSGI app to send requests to a Jabber server. Does this count as async code, or can I use it inside my app? What could happen if I sent twisted.words Jabber requests to an XMPP server from inside a WSGI app anyway?
Moreover, I have a more general question: is there any reason Twisted's WSGI container is multithreaded (is it multithreaded?), given that it is well known that Python's GIL only reduces the overall performance of a threaded script?
Thanks for any replies.
To call a function in Twisted's main event loop (the I/O thread) from another thread (a non-I/O thread, i.e., a WSGI application thread), you can use reactor.callFromThread(). If you'd like to wait for the result, use threads.blockingCallFromThread(). That way you can call functions that use twisted.words. See Using Threads in Twisted.
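A hedged sketch of that pattern (the message-sending function is a stand-in; only reactor.callFromThread and threads.blockingCallFromThread come from the answer):

```python
# hedged sketch: calling reactor-bound code from a WSGI worker thread
from twisted.internet import reactor, threads

def send_jabber_message(recipient, body):
    # Stand-in for twisted.words code that must run in the reactor thread.
    pass

def wsgi_app(environ, start_response):
    # Fire-and-forget: schedule the call in the reactor (I/O) thread.
    reactor.callFromThread(send_jabber_message, "user@example.com", "hello")

    # Or block this WSGI thread until the reactor-side call completes:
    threads.blockingCallFromThread(
        reactor, send_jabber_message, "user@example.com", "hello"
    )

    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"sent"]
```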
To find out whether a WSGI container is multithreaded, inspect environ['wsgi.multithread']; it should be true for the Twisted container.
WSGI containers are multithreaded to support more than one request at a time (it is not strictly necessary, but it makes life easier when reusing existing software). Otherwise, unless you solve it by other means, your whole server blocks while a request handler waits for an answer from a database. Some people find it simpler to write request handlers without worrying about blocking other requests, which works well enough when there are not many concurrent requests.
When performance matters, Python functions that perform CPU-intensive jobs can use libraries that release the GIL during calculations, or offload the work to other processes. The network and disk I/O that dominate web apps are usually much slower than the CPU anyway.
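A minimal sketch of the offloading option (the work function is an illustrative assumption); a process pool gives each calculation its own interpreter and GIL:

```python
# hedged sketch: offloading CPU-bound work so the GIL stops being the bottleneck
from concurrent.futures import ProcessPoolExecutor

def fib(n):
    # Pure-Python CPU-bound work; in one process, threads would serialize on the GIL.
    return n if n < 2 else fib(n - 1) + fib(n - 2)

if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:
        # Each call runs in its own process, so the calculations run in parallel.
        print(list(pool.map(fib, [30, 31, 32, 33])))
```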