Does Python 3 asyncio use a work-stealing scheduler like Rust Tokio?

Does Python 3 asyncio use a work-stealing scheduler like Rust Tokio? What's the behavior of the default scheduler? Is it documented somewhere?

"Work-stealing" is a property of multi-threaded executors. Python asyncio's executor (event loop) is single-threaded, so it's by definition not work-stealing. The behavior of the asyncio event loop wrt threads is documented (among other places) in the Concurrency and Multithreading section of the documentation.
As for the algorithm used for scheduling, it's intentionally unspecified, but the stdlib implementation uses:
a deque to store callbacks that are ready to run (those scheduled with call_soon() or create_task()) as well as those associated with file descriptors that are ready to read/write, and
a binary heap to store callbacks scheduled for a particular time, ordered by the absolute time at which they're supposed to fire. This covers callbacks scheduled with loop.call_later() and loop.call_at(), but also continuations of coroutines suspended by asyncio.sleep(), which is implemented with loop.call_later().
On each iteration the loop waits for activity on the file descriptors it monitors, setting the wait's timeout so that it wakes up in time for the nearest time-based callback in case nothing interesting happens earlier. It then invokes the ready callbacks, along with any timed callbacks whose deadline is at or before the current time. This is repeated until the event loop is instructed to stop.
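To make that shape concrete, here is a heavily simplified toy sketch of the deque-plus-heap structure described above. This is illustrative only, not asyncio's actual code: it ignores file descriptors, cancellation, and everything else the real loop handles.

    import time
    import heapq
    import itertools
    from collections import deque

    class ToyLoop:
        def __init__(self):
            self.ready = deque()             # callbacks ready to run now
            self.timers = []                 # heap of (when, tiebreak, callback)
            self._count = itertools.count()  # tiebreaker so the heap never compares callbacks

        def call_soon(self, cb):
            self.ready.append(cb)

        def call_later(self, delay, cb):
            heapq.heappush(self.timers, (time.monotonic() + delay, next(self._count), cb))

        def run_once(self):
            # The real loop waits on file descriptors here, with the timeout
            # set to the nearest timer; this toy just sleeps.
            if not self.ready and self.timers:
                time.sleep(max(0.0, self.timers[0][0] - time.monotonic()))
            # Move expired timers onto the ready queue...
            now = time.monotonic()
            while self.timers and self.timers[0][0] <= now:
                self.ready.append(heapq.heappop(self.timers)[2])
            # ...then run everything that is ready right now.
            for _ in range(len(self.ready)):
                self.ready.popleft()()

    loop = ToyLoop()
    loop.call_soon(lambda: print("ready callback"))
    loop.call_later(0.1, lambda: print("timed callback"))
    loop.run_once()   # runs the ready callback immediately
    loop.run_once()   # sleeps ~0.1s, then runs the timed callback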

Related

Asyncio: legitimate multiple threads with event loops use cases?

Granted, given the GIL, classic asyncio design focuses on "a single main thread with a single event loop in it." Nonetheless, are there legitimate "multiple threads with multiple event loops" use cases that bring some architectural or performance advantage over the singular case? Please share.
I have never compared the performance of using multiple event loops in multiple threads.
As far as I know, asynchronous code is an event-driven architecture in which a single event loop relies on a single thread, and the functions running in that event loop wait for a trigger before they get their turn to run. This can be faster than threading (in theory) because we are no longer paying for resource management (memory, CPU, etc.).
Threading basically tries to make work run concurrently by switching which thread gets to use the resources.
Both are meant to make a program progress on several things at once, even if nothing executes at literally the same moment. And in terms of safety, asynchronous code is more thread-safe, since it all runs in a single thread.
Found: one such use case is the QEventLoop <-> asyncio.EventLoop "mutual stop-and-go hand-over" pattern in PySide6, where the Qt event loop stops itself, then lets the asyncio event loop run for a while, then takes back control, all the while sharing the same thread.

asyncio run from sync function in multiple workers

I am really struggling to understand the interaction between asyncio event loop and multiple workers/threads/processes.
I am using Dash, which internally uses Flask, served with Gunicorn.
Say I have two functions:
    import asyncio

    async def async_download_multiple_files(files):
        # This function uses async just so that it can concurrently send
        # multiple requests to different webservers and return data.
        ...

    def sync_callback_dash(files):
        # This is a sync function that is called from a dash callback to get data
        return asyncio.run(async_download_multiple_files(files))
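For concreteness, the body of async_download_multiple_files might look something like this sketch; it assumes the third-party aiohttp library and URL inputs, neither of which is specified in the question:

    import asyncio
    import aiohttp

    async def async_download_multiple_files(urls):
        async with aiohttp.ClientSession() as session:

            async def fetch(url):
                async with session.get(url) as resp:
                    return await resp.read()

            # All downloads run concurrently inside one event loop.
            return await asyncio.gather(*(fetch(u) for u in urls))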
As I understand it, asyncio.run runs the async function in an event loop and blocks the calling thread until it completes:
From Python Docs
While a Task is running in the event loop, no other Tasks can run in the same thread.
But what happens when I run a WSGI server like Gunicorn with multiple workers?
Say there are 2 requests coming in simultaneously, presumably there will be multiple calls to sync_callback_dash which will happen in parallel because of multiple Gunicorn workers.
Can both request 1 and request 2 try to execute asyncio.run in parallel in different threads/processes? Will one block the other?
If they can run in parallel, what is the use of having asyncio workers that Gunicorn offers?
I answered this question assuming there are some gaps in the fundamentals of threads, processes, and the async loop. If there are none, forgive me for the amount of detail.
First thing to note is that processes and threads are two separate concepts. This answer might give you some context. To expand:
Processes are scheduled onto the CPU by the operating system, and if the CPU has multiple cores, processes can run in parallel. Threads run inside processes: there is always at least one thread per process, but there can be more. If there are more, execution switches between threads every few milliseconds (dictated by things outside the scope of this question), so threads are not run in absolute parallel but are constantly swapped in and out of the CPU. In Python specifically this is reinforced by something called the GIL, which allows only one thread at a time to execute Python bytecode. The async loop runs inside a single thread, and switches context specifically around I/O-bound instructions (more on this below).
Regarding this question, it's worth noting that Gunicorn workers are processes, not threads (though you can increase the number of threads per worker).
The intention of asynchronous code (with the use of async def, await, and asyncio) is to speed up performance as it specifically relates to I/O-bound tasks: getting a file from disk, sending/receiving a network request, or anything that requires a physical piece of your computer other than the CPU (the SSD, the network card) to do some work. It can also be used around large CPU-bound instructions, but that is usually where threads come in. Note that I/O-bound instructions are much slower than CPU-bound instructions, because the work happens in far slower hardware and involves extra steps at the hardware level (to keep things simple).
These tasks waste the CPU's time (or, more specifically, the current process's time) on simply waiting for a reply. Asynchronous code runs with the help of a loop that manages the context switching between I/O-bound and normal CPU-bound instructions (driven by the await keyword), by leveraging the idea that a function can "yield" control back to the loop and let the loop run other pieces of code while it waits. When async code issues an I/O-bound instruction (e.g. grab the latest packet from the network card), instead of sitting still and waiting for a reply it switches to the next task in its list to speed up overall execution time (and adds the pending I/O call to that list to check back on later). There is more to this, but this is the general gist as it relates to your question.
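A minimal illustration of that yielding behavior, with asyncio.sleep standing in for an I/O wait:

    import asyncio
    import time

    async def fetch(name, delay):
        # Awaiting hands control back to the loop, which can run other
        # tasks while this one waits for its "I/O" to complete.
        await asyncio.sleep(delay)
        return name

    async def main():
        start = time.monotonic()
        results = await asyncio.gather(fetch("a", 1), fetch("b", 1))
        # The two waits overlap, so this takes about 1 second, not 2.
        print(results, round(time.monotonic() - start, 1))

    asyncio.run(main())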
This is what it means when the docs says:
While a Task is running in the event loop, no other Tasks can run in the same thread.
The async loop is not running things in parallel, but rather constantly switching context between different instructions for a more optimized CPU + I/O relationship/execution.
Processes, on the other hand, are run in parallel by your CPU, assuming you have multiple cores. Gunicorn workers, as mentioned earlier, are processes. When you run multiple async workers with Gunicorn you are effectively running multiple asyncio loops in multiple (independent, and parallel-running) processes. This should answer your question on:
Can both request 1 and request 2 try to execute asyncio.run in parallel in different threads/processes? Will one block the other?
If there is ever the case that one worker gets stuck on some extremely long I/O bound (or even non-async computation) instruction(s), other workers are there to take care of the next request(s).
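For a concrete picture: a Dash app is typically served by exposing its underlying Flask instance (conventionally server = app.server in app.py; the file and variable names here are the usual convention, not from the question), and four parallel worker processes can then be started with:

    gunicorn --workers 4 app:server

Each worker is a separate process with its own interpreter, so each call to asyncio.run inside a worker gets its own, independent event loop.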
With asyncio it is possible to run a separate event loop in each thread. Both will run in parallel (to the extent the Python Interpreter is capable). There are some restrictions. Communication between those loops must use threadsafe methods. Signals and subprocesses can be handled in the main thread only.
Calling asyncio.run in a callback will block until the asyncio part completely finishes. It is not clear from your question if this is what you want.
Alternatively, you could start a long running event loop in one thread and use asyncio.run_coroutine_threadsafe from other threads. Read the docs with an example here.
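A minimal sketch of that second pattern (a long-running loop in a background thread, fed from the calling thread):

    import asyncio
    import threading

    loop = asyncio.new_event_loop()

    def run_loop():
        asyncio.set_event_loop(loop)
        loop.run_forever()

    # The loop runs forever in its own daemon thread.
    threading.Thread(target=run_loop, daemon=True).start()

    async def work(x):
        await asyncio.sleep(0.1)
        return x * 2

    # Any thread can now submit coroutines to that loop.
    future = asyncio.run_coroutine_threadsafe(work(21), loop)
    print(future.result())   # blocks this thread until the coroutine finishes: 42

    loop.call_soon_threadsafe(loop.stop)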

How are threads (and asyncio tasks) scheduled in Python?

I'm trying to understand concurrency in Python and am confused about how threads are scheduled and how tasks (in asyncio library) are scheduled to run/wait.
Suppose a thread tries to acquire a Lock and is blocked. Does the Python interpreter immediately put that thread into the 'blocked' queue? How is this blocked thread put back into the running state? Is there busy waiting involved?
How is this different when a task (the equivalent of a thread) in the asyncio library is blocked on an async mutex?
What is the advantage of asyncio, if there is no busy waiting involved in either of the above two cases?
Suppose a thread tries to acquire a Lock and is blocked. Does the Python interpreter immediately put that thread into the 'blocked' queue?
Python creates real operating system threads, so no queuing or scheduling needs to be done by the interpreter.
The one possible exception is the global lock used by the interpreter to serialize execution of Python code and access to Python objects. This lock is released not only before acquiring a threading lock, but also before any (potentially) blocking operation, such as reading from an IO handle or sleeping.
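As a small illustration (a sketch, not a proof of the mechanism): a thread blocked in Lock.acquire() is parked by the operating system rather than spinning in Python code, and the GIL is released while it waits, so other threads keep running:

    import threading
    import time

    lock = threading.Lock()
    lock.acquire()          # main thread takes the lock

    def worker():
        lock.acquire()      # blocks in the OS; the GIL is released while waiting
        print("worker acquired the lock")

    threading.Thread(target=worker).start()
    time.sleep(0.5)         # main thread keeps running while the worker is parked
    lock.release()          # wakes the blocked worker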
What is the advantage of asyncio, if there is no busy waiting involved in either of the above two cases?
The advantage is that asyncio doesn't require a new OS thread for each coroutine it runs concurrently. OS threads are expensive, and asyncio tasks are quite lightweight. Also, asyncio makes the potential switch points visible (the await keyword), so there's less potential for race conditions.
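To make "lightweight" concrete, here is a small sketch that runs ten thousand concurrent tasks in a single OS thread; creating that many OS threads would be far more expensive:

    import asyncio

    async def tick(i):
        await asyncio.sleep(0.1)    # all 10,000 waits overlap
        return i

    async def main():
        results = await asyncio.gather(*(tick(i) for i in range(10_000)))
        print(len(results))         # 10000, after roughly 0.1 seconds

    asyncio.run(main())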
You can think of asyncio as a successor to Twisted, but with a modern API and using suspendable coroutines instead of explicit callback chaining.

Understanding the difference between Async/Await and Task

In the Python documentation it describes how to start and use coroutines.
This section describes how to use a Task.
In the Task section, it states:
Tasks are used to schedule coroutines concurrently
I'm failing to understand: what is happening when I start a coroutine without using Task? Is the code running asynchronously but not concurrently? Does it mean that when the code sees an await it goes and does something else?
When I use a Task, is it like starting two threads and calling join()? I start two or more tasks and wait for the result, correct?
For simple cases, creating Tasks manually is somewhat similar to threads: you can create them, the event loop will eventually run them, and you will eventually get a result or an exception.
But in most cases, your code is built around await coro(), nothing low-level. This means that your code may perform some I/O operation inside coro, so the event loop is free to suspend your implicitly created task, queue it, and resume its execution later.
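The difference shows up in a small sketch (timings are approximate): awaiting a coroutine directly runs it to completion before moving on, while wrapping coroutines in Tasks lets the event loop interleave them:

    import asyncio
    import time

    async def job(delay):
        await asyncio.sleep(delay)

    async def main():
        start = time.monotonic()
        await job(1)                        # runs to completion first...
        await job(1)                        # ...then this one runs
        print("awaited directly:", round(time.monotonic() - start))  # ~2

        start = time.monotonic()
        t1 = asyncio.create_task(job(1))    # scheduled immediately
        t2 = asyncio.create_task(job(1))
        await t1                            # while we wait, t2 runs too
        await t2
        print("as Tasks:", round(time.monotonic() - start))          # ~1

    asyncio.run(main())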

parallel computations with task manager

I need to run some parallel computations in python. The only compatible approach I can think of is the multiprocess/fork model, which is less than ideal for several reasons:
from what I understand, process creation on Windows is expensive (there is no cheap fork)
fine-grained process management (signals, i.e. SIGSTOP/SIGCONT) is clunky (i.e. outside the language)
These are the task requirements:
tasks may spawn new tasks
tasks must be registered with the task manager
tasks do not require shared state
tasks must return a value (python object)
The task manager is responsible for scheduling and limiting the number of concurrent tasks. These are the task manager requirements:
when a new task is started, the task manager may suspend other tasks based on a predetermined limit
when a task returns, the task manager may continue other suspended tasks
when the return value of a task is requested, the task manager may reorganize the task priority (prevent deadlocks)
So you see, the task manager doesn't need to be a parallel/concurrent process. Each task may make synchronous calls to the task manager on starting or stopping. Tasks waiting on other tasks may also make synchronous calls.
I can't seem to think of any other approaches:
asyncio can start parallel processes within a limited pool, but that approach is better suited to data parallelism than to task pre-emption. Externally pre-empting (suspending) a task isn't compatible with cooperatively scheduled events. Correct me if I'm wrong, but while I could use asyncio, it wouldn't make my life easier (an abstraction without benefit), as I would still be required to use processes, and signals on "task-start/stop" events?
Stackless Python might be suitable, but it isn't really Python?
Any ideas?
P.S. My end-goal is to automatically parallelize (decorated) function calls. The task manager limits the number of tasks executing in parallel (i.e. recursive functions) to avoid thrashing (fork bombs). I need to use Python, even though a lazy (task waiting), pure (no shared state) and stackless (lightweight threads) language might be more suitable...
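For illustration, the scheduling/limiting and return-value requirements can be sketched with just the stdlib; this says nothing about the suspend/resume or deadlock-reordering requirements, and all names here are illustrative:

    import concurrent.futures

    def crunch(n):
        # A pure, stateless task that returns a Python object.
        return sum(i * i for i in range(n))

    if __name__ == "__main__":
        # The executor plays the "task manager": it queues the calls
        # and caps the number running in parallel at max_workers.
        with concurrent.futures.ProcessPoolExecutor(max_workers=4) as pool:
            futures = [pool.submit(crunch, 10**6) for _ in range(8)]
            print([f.result() for f in futures])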
Wow, this question is old and I'm surprised a Stackless Python user hasn't chimed in...
Then again, Stackless Python was/is way ahead of its time and there's very few of us out there putting it into use.
Stackless Python is indeed Python. It is a little more than just Python, but it is Python nonetheless.
Stackless Python Wiki
I think it would suit your needs very well. It is still up-to-date and maintained with a commit as recent as this month. It's rather solid and has worked wonderfully for my needs.
