If I deploy Django using Gunicorn with the Eventlet worker type, and only use one process, what happens under the hood to service the 1000 (by default) worker connections? What parts of Django are copied into each thread? Are any parts copied?
If you set workers = 1 in your gunicorn configuration, two processes will be created: 1 master process and 1 worker process.
When you use worker_class = eventlet, the simultaneous connections are handled by green threads. Green threads are not OS threads; in simple terms, they are functions (coroutines) that yield whenever they encounter an I/O operation.
So nothing is copied. You just need to worry about making every I/O operation 'green'.
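For illustration, a minimal gunicorn config sketch for that setup might look like this (the bind address and the gunicorn.conf.py file name are my own assumptions, not from the question):

# gunicorn.conf.py -- one worker process using the eventlet worker class
bind = "0.0.0.0:8000"
workers = 1                  # 1 master + 1 worker process
worker_class = "eventlet"
worker_connections = 1000    # max simultaneous green threads per worker (the default)

The eventlet worker monkey-patches the standard library when it starts, so most blocking calls (sockets, sleep, etc.) yield cooperatively; I/O done in C extensions that bypass the patched modules, such as some database drivers, can still block the whole worker.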
If all the FastAPI endpoints are defined as async def, then there will only be 1 thread running, right? (assuming a single uvicorn worker).
Just wanted to confirm that in such a setup we will never hit contention on Python's Global Interpreter Lock. If the same were done in Flask with multiple threads for a single gunicorn worker, we would be facing the GIL, which hinders true parallelism between threads.
So basically, in the above FastAPI setup, the parallelism is limited to 1 since there is only one thread. To make use of all the cores, we would need to increase the number of workers, either with gunicorn or uvicorn.
Is my understanding correct?
Your understanding is correct. When using one worker with uvicorn, only one process runs, which means there is only one thread that can hold the lock on the interpreter running your application. Due to the asynchronous nature of your FastAPI app, it will be able to handle multiple simultaneous requests concurrently, but not in parallel.
If you want multiple instances of your application run in parallel, you can increase your workers. This will spin up multiple processes (all single threaded as above) and Uvicorn will distribute the requests among them.
Note that you cannot have shared global variables across workers. These are separate instances of your FastAPI app and do not communicate with each other. See this answer for more info on that and how to use databases or caches to work around that.
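As a rough sketch of that setup (the file name main.py, the port and the worker count are my own assumptions), each worker is a separate single-threaded process:

# main.py -- several single-threaded worker processes, each with its own event loop
import uvicorn
from fastapi import FastAPI

app = FastAPI()

@app.get("/")
async def read_root():
    # handled on the event loop of whichever worker process received the request
    return {"status": "ok"}

if __name__ == "__main__":
    # spawns 4 separate processes; they share no Python objects,
    # so module-level "globals" are per worker, not application-wide
    uvicorn.run("main:app", host="0.0.0.0", port=8000, workers=4)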
I have to set up a worker which handles some data after a certain event happens. I know I can start the worker with python manage.py runworker my_worker, but what I would need is to start the worker in the same process as the main Django app on a separate thread.
Why do I need it in a separate thread and not in a separate process? Because the worker would perform a pretty lightweight job that would not overload the server's resources, and, moreover, the effort of setting up a new process in production is not worth the performance gain. In other words, I would prefer to keep it in Django's process if possible.
Why not perform the job synchronously? Because it is a separate piece of logic that needs to be extensible, and it is outside the main HTTP request-reply scope. It is a post-processing task which doesn't interfere with the main logic. I need to decouple this task from an infrastructural point of view, not only a logical one (e.g. with plain signals).
Is there any possibility provided by Django Channels to run a worker in such a way?
Would there be any downsides to start the worker manually on a separate thread?
Right now I have the setup for a message broker consumer thread (without using Channels), so I have the entry point for starting a new worker thread. But as I've seen from Channels' runworker command, it loads the whole application, so it doesn't seem like a naïve worker.run() call is the proper way to do it (I might be wrong about this one).
I found an answer to my question.
The answer is no, you can't just start a worker within the same process. This is because the consumer needs to run inside an event loop thread and it is not good at all to have more than one event loop thread in the same process (Django WSGI application already runs the main thread with an event loop).
The best you can do is to start the worker in a separate process. As I mentioned in my question, I started a message broker consumer on a separate thread, which was not a good approach either, so I changed my configuration to start the consumers as separate processes.
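For reference, a minimal sketch of that separate-process setup with Channels (the channel name my_worker comes from the question; EventConsumer and do_post_processing are hypothetical names):

# consumers.py -- the background consumer that the worker process will run
from channels.consumer import SyncConsumer

class EventConsumer(SyncConsumer):
    def handle_event(self, message):
        # invoked for messages sent with {"type": "handle.event", ...}
        do_post_processing(message["payload"])  # hypothetical post-processing helper

# asgi.py -- route the "my_worker" channel to that consumer
from channels.routing import ProtocolTypeRouter, ChannelNameRouter

application = ProtocolTypeRouter({
    "channel": ChannelNameRouter({"my_worker": EventConsumer.as_asgi()}),
})

# in the Django app (e.g. a view or signal handler) -- hand the event off
from asgiref.sync import async_to_sync
from channels.layers import get_channel_layer

async_to_sync(get_channel_layer().send)(
    "my_worker", {"type": "handle.event", "payload": {"id": 42}}
)

The consumer then runs in its own process, started with python manage.py runworker my_worker.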
In an environment with 8 cores, Celery should be able to process 8 incoming tasks in parallel by default. But sometimes, when new tasks are received, Celery places them behind a long-running task.
I played around with default configuration, letting one worker consume from one queue.
celery -A proj worker --loglevel=INFO --concurrency=8
Is my understanding wrong that one worker with a concurrency of 8 is able to process 8 tasks from one queue in parallel?
What is the preferred way to set up Celery to prevent the behaviour described above?
To put it simply: concurrency is the number of jobs running on a worker, while prefetch is the number of jobs sitting in a queue on the worker itself. You have one of two options here. The first is to set the prefetch multiplier down to 1, which means the worker will only keep, in your case, 8 additional jobs in its queue. The second, which I would recommend, is to create 2 different queues: one for your short-running tasks and another for your long-running tasks.
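Both options could be sketched roughly like this (the proj module path and the task/queue names are placeholders, not from the question):

# celery app configuration -- option 1: stop workers from prefetching extra jobs
from celery import Celery

app = Celery("proj", broker="amqp://localhost")
app.conf.worker_prefetch_multiplier = 1   # each pool process reserves only one task ahead
app.conf.task_acks_late = True            # commonly paired with the above for long tasks

# option 2: route short and long tasks to separate queues
app.conf.task_routes = {
    "proj.tasks.generate_report": {"queue": "long"},   # hypothetical task names
    "proj.tasks.send_email": {"queue": "short"},
}

and then run one worker per queue, for example:
celery -A proj worker -Q short --concurrency=6
celery -A proj worker -Q long --concurrency=2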
I generally don't need to explicitly use threads in my Django application-level programming (i.e. views). But I've noticed a library that looks interesting which handles server-side analytics via threading.
During a Django view, you would use their Python client to batch HTTP POSTs to their web service in a separate (non-daemon) thread. Normally I would go with RabbitMQ for something like this instead of threads, but they wanted to lower the startup costs for the library.
My question is, are there any downsides to this approach? Threads have some additional memory footprint, but I'm not too worried about that. It obviously depends on the number of requests/threads started.
Is the fact that the threads are non-daemon and potentially long-running an issue? I assume that the Gunicorn process is the main thread of execution and runs in an infinite loop, so it generally doesn't matter if it has to wait on the non-daemon threads to exit. Is that correct?
Kind of an open question but the main point is understanding the impact of non-daemon threads in Django/Gunicorn apps.
Gunicorn uses a pre-fork worker model. The Master process spawns and manages Worker processes. For non-Tornado uses, there are two kinds of Workers: Sync (default) and Async.
In normal operations, these Workers run in a loop until the Master either tells them to graceful shutdown or kills them. Workers will periodically issue a heartbeat to the Master to indicate that they are still alive and working. If a heartbeat timeout occurs, then the Master will kill the Worker and restart it.
Therefore, daemon and non-daemon threads that do not interfere with the Worker's main loop should have no impact. If the thread does interfere with the Worker's main loop, such as a scenario where the thread is performing work and will provide results to the HTTP Response, then consider using an Async Worker. Async Workers allow for the TCP connection to remain alive for a long time while still allowing the Worker to issue heartbeats to the Master.
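For concreteness, the pattern under discussion is roughly this (the analytics URL is made up, and the requests library is assumed to be installed):

# views.py -- fire-and-forget analytics POST from a non-daemon thread
import threading

import requests
from django.http import HttpResponse

def _send_analytics(payload):
    # the network I/O happens outside the request/response cycle;
    # being non-daemon, the thread only delays interpreter shutdown, not the response
    requests.post("https://analytics.example.com/track", json=payload, timeout=5)

def my_view(request):
    threading.Thread(target=_send_analytics, args=({"path": request.path},)).start()
    return HttpResponse("ok")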
We are running Celery behind Supervisor and start it with
celeryd --events --loglevel=INFO --concurrency=2
This, however, creates a process graph that is up to three layers deep and contains up to 7 celeryd processes (Supervisor spawns one celeryd, which spawns several others, which again spawn processes). Our machine has two CPU cores.
Are all of these processes working on tasks? Are maybe some of them just worker pools? How is the --concurrency setting connected to the number of processes actually spawned?
You shouldn't have 7 processes if --concurrency is 2.
The actual processes started are:

- The main consumer process, which delegates work to the worker pool
- The worker pool (this is the number that --concurrency decides)

So that is 3 processes with a concurrency of two.
In addition, a very lightweight process used to clean up semaphores is started if force_execv is enabled (which it is by default if you're using some transport other than redis or rabbitmq).
Note that in some cases process listings also include threads: the worker may start several threads if using transports other than rabbitmq/redis, including one Mediator thread that is always started unless CELERY_DISABLE_RATE_LIMITS is enabled.
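If you want to see what the pool actually looks like at runtime, the inspection API reports it; a small sketch (assuming the Celery app object lives at proj.celery):

# print each worker's configured concurrency and the PIDs of its pool processes
from proj.celery import app

stats = app.control.inspect().stats() or {}
for worker_name, info in stats.items():
    print(worker_name, info["pool"]["max-concurrency"], info["pool"].get("processes"))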