I found that in Python 3.4, there are few different libraries for multiprocessing/threading: multiprocessing vs threading vs asyncio.
But I don't know which one to use or is the "recommended one". Do they do the same thing, or are different? If so, which one is used for what? I want to write a program that uses multicores in my computer. But I don't know which library I should learn.
TL;DR
Making the Right Choice:
We have walked through the most popular forms of concurrency. But the question remains - when should choose which one? It really depends on the use cases. From my experience (and reading), I tend to follow this pseudo code:
if io_bound:
if io_very_slow:
print("Use Asyncio")
else:
print("Use Threads")
else:
print("Multi Processing")
CPU Bound => Multi Processing
I/O Bound, Fast I/O, Limited Number of Connections => Multi Threading
I/O Bound, Slow I/O, Many connections => Asyncio
Reference
[NOTE]:
If you have a long call method (e.g. a method containing a sleep time or lazy I/O), the best choice is asyncio, Twisted or Tornado approach (coroutine methods), that works with a single thread as concurrency.
asyncio works on Python3.4 and later.
Tornado and Twisted are ready since Python2.7
uvloop is ultra fast asyncio event loop (uvloop makes asyncio 2-4x faster).
[UPDATE (2019)]:
Japranto (GitHub) is a very fast pipelining HTTP server based on uvloop.
They are intended for (slightly) different purposes and/or requirements. CPython (a typical, mainline Python implementation) still has the global interpreter lock so a multi-threaded application (a standard way to implement parallel processing nowadays) is suboptimal. That's why multiprocessing may be preferred over threading. But not every problem may be effectively split into [almost independent] pieces, so there may be a need in heavy interprocess communications. That's why multiprocessing may not be preferred over threading in general.
asyncio (this technique is available not only in Python, other languages and/or frameworks also have it, e.g. Boost.ASIO) is a method to effectively handle a lot of I/O operations from many simultaneous sources w/o need of parallel code execution. So it's just a solution (a good one indeed!) for a particular task, not for parallel processing in general.
In multiprocessing you leverage multiple CPUs to distribute your calculations. Since each of the CPUs runs in parallel, you're effectively able to run multiple tasks simultaneously. You would want to use multiprocessing for CPU-bound tasks. An example would be trying to calculate a sum of all elements of a huge list. If your machine has 8 cores, you can "cut" the list into 8 smaller lists and calculate the sum of each of those lists separately on separate core and then just add up those numbers. You'll get a ~8x speedup by doing that.
In (multi)threading you don't need multiple CPUs. Imagine a program that sends lots of HTTP requests to the web. If you used a single-threaded program, it would stop the execution (block) at each request, wait for a response, and then continue once received a response. The problem here is that your CPU isn't really doing work while waiting for some external server to do the job; it could have actually done some useful work in the meantime! The fix is to use threads - you can create many of them, each responsible for requesting some content from the web. The nice thing about threads is that, even if they run on one CPU, the CPU from time to time "freezes" the execution of one thread and jumps to executing the other one (it's called context switching and it happens constantly at non-deterministic intervals). So if your task is I/O bound - use threading.
asyncio is essentially threading where not the CPU but you, as a programmer (or actually your application), decide where and when does the context switch happen. In Python you use an await keyword to suspend the execution of your coroutine (defined using async keyword).
This is the basic idea:
Is it IO-BOUND ? -----------> USE asyncio
IS IT CPU-HEAVY ? ---------> USE multiprocessing
ELSE ? ----------------------> USE threading
So basically stick to threading unless you have IO/CPU problems.
Many of the answers suggest how to choose only 1 option, but why not be able to use all 3? In this answer I explain how you can use asyncio to manage combining all 3 forms of concurrency instead as well as easily swap between them later if need be.
The short answer
Many developers that are first-timers to concurrency in Python will end up using processing.Process and threading.Thread. However, these are the low-level APIs which have been merged together by the high-level API provided by the concurrent.futures module. Furthermore, spawning processes and threads has overhead, such as requiring more memory, a problem which plagued one of the examples I showed below. To an extent, concurrent.futures manages this for you so that you cannot as easily do something like spawn a thousand processes and crash your computer by only spawning a few processes and then just re-using those processes each time one finishes.
These high-level APIs are provided through concurrent.futures.Executor, which are then implemented by concurrent.futures.ProcessPoolExecutor and concurrent.futures.ThreadPoolExecutor. In most cases, you should use these over the multiprocessing.Process and threading.Thread, because it's easier to change from one to the other in the future when you use concurrent.futures and you don't have to learn the detailed differences of each.
Since these share a unified interfaces, you'll also find that code using multiprocessing or threading will often use concurrent.futures. asyncio is no exception to this, and provides a way to use it via the following code:
import asyncio
from concurrent.futures import Executor
from functools import partial
from typing import Any, Callable, Optional, TypeVar
T = TypeVar("T")
async def run_in_executor(
executor: Optional[Executor],
func: Callable[..., T],
/,
*args: Any,
**kwargs: Any,
) -> T:
"""
Run `func(*args, **kwargs)` asynchronously, using an executor.
If the executor is None, use the default ThreadPoolExecutor.
"""
return await asyncio.get_running_loop().run_in_executor(
executor,
partial(func, *args, **kwargs),
)
# Example usage for running `print` in a thread.
async def main():
await run_in_executor(None, print, "O" * 100_000)
asyncio.run(main())
In fact it turns out that using threading with asyncio was so common that in Python 3.9 they added asyncio.to_thread(func, *args, **kwargs) to shorten it for the default ThreadPoolExecutor.
The long answer
Are there any disadvantages to this approach?
Yes. With asyncio, the biggest disadvantage is that asynchronous functions aren't the same as synchronous functions. This can trip up new users of asyncio a lot and cause a lot of rework to be done if you didn't start programming with asyncio in mind from the beginning.
Another disadvantage is that users of your code will also become forced to use asyncio. All of this necessary rework will often leave first-time asyncio users with a really sour taste in their mouth.
Are there any non-performance advantages to this?
Yes. Similar to how using concurrent.futures is advantageous over threading.Thread and multiprocessing.Process for its unified interface, this approach can be considered a further abstraction from an Executor to an asynchronous function. You can start off using asyncio, and if later you find a part of it you need threading or multiprocessing, you can use asyncio.to_thread or run_in_executor. Likewise, you may later discover that an asynchronous version of what you're trying to run with threading already exists, so you can easily step back from using threading and switch to asyncio instead.
Are there any performance advantages to this?
Yes... and no. Ultimately it depends on the task. In some cases, it may not help (though it likely does not hurt), while in other cases it may help a lot. The rest of this answer provides some explanations as to why using asyncio to run an Executor may be advantageous.
- Combining multiple executors and other asynchronous code
asyncio essentially provides significantly more control over concurrency at the cost of you need to take control of the concurrency more. If you want to simultaneously run some code using a ThreadPoolExecutor along side some other code using a ProcessPoolExecutor, it is not so easy managing this using synchronous code, but it is very easy with asyncio.
import asyncio
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
async def with_processing():
with ProcessPoolExecutor() as executor:
tasks = [...]
for task in asyncio.as_completed(tasks):
result = await task
...
async def with_threading():
with ThreadPoolExecutor() as executor:
tasks = [...]
for task in asyncio.as_completed(tasks):
result = await task
...
async def main():
await asyncio.gather(with_processing(), with_threading())
asyncio.run(main())
How does this work? Essentially asyncio asks the executors to run their functions. Then, while an executor is running, asyncio will go run other code. For example, the ProcessPoolExecutor starts a bunch of processes, and then while waiting for those processes to finish, the ThreadPoolExecutor starts a bunch of threads. asyncio will then check in on these executors and collect their results when they are done. Furthermore, if you have other code using asyncio, you can run them while waiting for the processes and threads to finish.
- Narrowing in on what sections of code needs executors
It is not common that you will have many executors in your code, but what is a common problem that I have seen when people use threads/processes is that they will shove the entirety of their code into a thread/process, expecting it to work. For example, I once saw the following code (approximately):
from concurrent.futures import ThreadPoolExecutor
import requests
def get_data(url):
return requests.get(url).json()["data"]
urls = [...]
with ThreadPoolExecutor() as executor:
for data in executor.map(get_data, urls):
print(data)
The funny thing about this piece of code is that it was slower with concurrency than without. Why? Because the resulting json was large, and having many threads consume a huge amount of memory was disastrous. Luckily the solution was simple:
from concurrent.futures import ThreadPoolExecutor
import requests
urls = [...]
with ThreadPoolExecutor() as executor:
for response in executor.map(requests.get, urls):
print(response.json()["data"])
Now only one json is unloaded into memory at a time, and everything is fine.
The lesson here?
You shouldn't try to just slap all of your code into threads/processes, you should instead focus in on what part of the code actually needs concurrency.
But what if get_data was not a function as simple as this case? What if we had to apply the executor somewhere deep in the middle of the function? This is where asyncio comes in:
import asyncio
import requests
async def get_data(url):
# A lot of code.
...
# The specific part that needs threading.
response = await asyncio.to_thread(requests.get, url, some_other_params)
# A lot of code.
...
return data
urls = [...]
async def main():
tasks = [get_data(url) for url in urls]
for task in asyncio.as_completed(tasks):
data = await task
print(data)
asyncio.run(main())
Attempting the same with concurrent.futures is by no means pretty. You could use things such as callbacks, queues, etc., but it would be significantly harder to manage than basic asyncio code.
Already a lot of good answers. Can't elaborate more on the when to use each one. This is more an interesting combination of two. Multiprocessing + asyncio: https://pypi.org/project/aiomultiprocess/.
The use case for which it was designed was highio, but still utilizing as many of the cores available. Facebook used this library to write some kind of python based File server. Asyncio allowing for IO bound traffic, but multiprocessing allowing multiple event loops and threads on multiple cores.
Ex code from the repo:
import asyncio
from aiohttp import request
from aiomultiprocess import Pool
async def get(url):
async with request("GET", url) as response:
return await response.text("utf-8")
async def main():
urls = ["https://jreese.sh", ...]
async with Pool() as pool:
async for result in pool.map(get, urls):
... # process result
if __name__ == '__main__':
# Python 3.7
asyncio.run(main())
# Python 3.6
# loop = asyncio.get_event_loop()
# loop.run_until_complete(main())
Just and addition here, would not working in say jupyter notebook very well, as the notebook already has a asyncio loop running. Just a little note for you to not pull your hair out.
I’m not a professional Python user, but as a student in computer architecture I think I can share some of my considerations when choosing between multi processing and multi threading. Besides, some of the other answers (even among those with higher votes) are misusing technical terminology, so I thinks it’s also necessary to make some clarification on those as well, and I’ll do it first.
The fundamental difference between multiprocessing and multithreading is whether they share the same memory space. Threads share access to the same virtual memory space, so it is efficient and easy for threads to exchange their computation results (zero copy, and totally user-space execution).
Processes on the other hand have separate virtual memory spaces. They cannot directly read or write the other process’ memory space, just like a person cannot read or alter the mind of another person without talking to him. (Allowing so would be a violation of memory protection and defeat the purpose of using virtual memory. ) To exchange data between processes, they have to rely on the operating system’s facility (e.g. message passing), and for more than one reasons this is more costly to do than the “shared memory” scheme used by threads. One reason is that invoking the OS’ message passing mechanism requires making a system call which will switch the code execution from user mode to kernel mode, which is time consuming; another reason is likely that OS message passing scheme will have to copy the data bytes from the senders’ memory space to the receivers’ memory space, so non-zero copy cost.
It is incorrect to say a multithread program can only use one CPU. The reason why many people say so is due to an artifact of the CPython implementation: global interpreter lock (GIL). Because of the GIL, threads in a CPython process are serialized. As a result, it appears that the multithreaded python program only uses one CPU.
But multi thread computer programs in general are not restricted to one core, and for Python, implementations that do not use the GIL can indeed run many threads in parallel, that is, run on more than one CPU at the same time. (See https://wiki.python.org/moin/GlobalInterpreterLock).
Given that CPython is the predominant implementation of Python, it’s understandable why multithreaded python programs are commonly equated to being bound to a single core.
With Python with GIL, the only way to unleash the power of multicores is to use multiprocessing (there are exceptions to this as mentioned below). But your problem better be easily partition-able into parallel sub-problems that have minimal intercommunication, otherwise a lot of inter-process communication will have to take place and as explained above, the overhead of using the OS’ message passing mechanism will be costly, sometimes so costly the benefits of parallel processing are totally offset. If the nature of your problem requires intense communication between concurrent routines, multithreading is the natural way to go. Unfortunately with CPython, true, effectively parallel multithreading is not possible due to the GIL. In this case you should realize Python is not the optimal tool for your project and consider using another language.
There’s one alternative solution, that is to implement the concurrent processing routines in an external library written in C (or other languages), and import that module to Python. The CPython GIL will not bother to block the threads spawned by that external library.
So, with the burdens of GIL, is multithreading in CPython any good? It still offers benefits though, as other answers have mentioned, if you’re doing IO or network communication. In these cases the relevant computation is not done by your CPU but done by other devices (in the case of IO, the disk controller and DMA (direct memory access) controller will transfer the data with minimal CPU participation; in the case of networking, the NIC (network interface card) and DMA will take care of much of the task without CPU’s participation), so once a thread delegates such task to the NIC or disk controller, the OS can put that thread to a sleeping state and switch to other threads of the same program to do useful work.
In my understanding, the asyncio module is essentially a specific case of multithreading for IO operations.
So:
CPU-intensive programs, that can easily be partitioned to run on multiple processes with limited communication: Use multithreading if GIL does not exist (eg Jython), or use multiprocess if GIL is present (eg CPython).
CPU-intensive programs, that requires intensive communication between concurrent routines: Use multithreading if GIL does not exist, or use another programming language.
Lot’s of IO: asyncio
Multiprocessing can be run parallelly.
Multithreading and asyncio cannot be run parallelly.
With Intel(R) Core(TM) i7-8700K CPU # 3.70GHz and 32.0 GB RAM, I timed how many prime numbers are between 2 and 100000 with 2 processes, 2 threads and 2 asyncio tasks as shown below. *This is CPU bound calculation:
Multiprocessing
Multithreading
asyncio
23.87 seconds
45.24 seconds
44.77 seconds
Because multiprocessing can be run parallelly so multiprocessing is double more faster than multithreading and asyncio as shown above.
I used 3 sets of code below:
Multiprocessing:
# "process_test.py"
from multiprocessing import Process
import time
start_time = time.time()
def test():
num = 100000
primes = 0
for i in range(2, num + 1):
for j in range(2, i):
if i % j == 0:
break
else:
primes += 1
print(primes)
if __name__ == "__main__": # This is needed to run processes on Windows
process_list = []
for _ in range(0, 2): # 2 processes
process = Process(target=test)
process_list.append(process)
for process in process_list:
process.start()
for process in process_list:
process.join()
print(round((time.time() - start_time), 2), "seconds") # 23.87 seconds
Result:
...
9592
9592
23.87 seconds
Multithreading:
# "thread_test.py"
from threading import Thread
import time
start_time = time.time()
def test():
num = 100000
primes = 0
for i in range(2, num + 1):
for j in range(2, i):
if i % j == 0:
break
else:
primes += 1
print(primes)
thread_list = []
for _ in range(0, 2): # 2 threads
thread = Thread(target=test)
thread_list.append(thread)
for thread in thread_list:
thread.start()
for thread in thread_list:
thread.join()
print(round((time.time() - start_time), 2), "seconds") # 45.24 seconds
Result:
...
9592
9592
45.24 seconds
Asyncio:
# "asyncio_test.py"
import asyncio
import time
start_time = time.time()
async def test():
num = 100000
primes = 0
for i in range(2, num + 1):
for j in range(2, i):
if i % j == 0:
break
else:
primes += 1
print(primes)
async def call_tests():
tasks = []
for _ in range(0, 2): # 2 asyncio tasks
tasks.append(test())
await asyncio.gather(*tasks)
asyncio.run(call_tests())
print(round((time.time() - start_time), 2), "seconds") # 44.77 seconds
Result:
...
9592
9592
44.77 seconds
Multiprocessing
Each process has its own Python interpreter and can run on a separate core of a processor. Python multiprocessing is a package that supports spawning processes using an API similar to the threading module. The multiprocessing package offers true parallelism, effectively side-stepping the Global Interpreter Lock by using sub processes instead of threads.
Use multiprocessing when you have CPU intensive tasks.
Multithreading
Python multithreading allows you to spawn multiple threads within the process. These threads can share the same memory and resources of the process. In CPython due to Global interpreter lock at any given time only a single thread can run, hence you cannot utilize multiple cores. Multithreading in Python does not offer true parallelism due to GIL limitation.
Asyncio
Asyncio works on co-operative multitasking concepts. Asyncio tasks run on the same thread so there is no parallelism, but it provides better control to the developer instead of the OS which is the case in multithreading.
There is a nice discussion on this link regarding the advantages of asyncio over threads.
There is a nice blog by Lei Mao on Python concurrency here
Multiprocessing VS Threading VS AsyncIO in Python Summary
I'm building a web scraping API, and most of my scraping is done with AsyncIO coroutines, like this:
async def parse_event(self):
do scraping
# call the func
asyncio.run(b.parse_event())
This works perfectly fine, but as I'm scraping multiple websites at the same time, I was using concurrent.futures.ThreadPoolExecutor at first to scrape with multiple threads.
But since I've implemented the coroutine logic, I cannot now use the asyncio.run method in my thread directly.
Before (without coroutine):
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
w1_future = executor.submit(self.w1.parse_event)
w2_future = executor.submit(self.w2.parse_event)
w3_future = executor.submit(self.w3.parse_event)
After, I would have expected something like below
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
w1_future = executor.submit(asyncio.run(self.w1.parse_event))
w2_future = executor.submit(asyncio.run(self.w2.parse_event))
w3_future = executor.submit(asyncio.run(self.w3.parse_event))
Unfortunately it is not working.
Both asyncio and threading are a means to use a single core for concurrent operations. However, this works via different mechanisms: asyncio uses the cooperative concurrency of async/await whereas threading uses the preemptive concurrency of the GIL.
Mixing the two does not speed up execution since both still use the same single core; instead, overhead of both mechanisms will slow down the program and the interaction of the mechanisms complicates writing correct code.
To achieve concurrency between multiple tasks, submit them all to a single asyncio event loop. The equivalent to executor.submit is asyncio.create_task; multiple tasks can be submitted at once using asyncio.gather. Note that both are called inside the loop as opposed to outside the executor.
async def parse_all():
return await asyncio.gather(
# all the parsing tasks that should run concurrently
self.w1.parse_event,
self.w2.parse_event,
self.w3.parse_event,
)
asyncio.run(parse_all())
If you absolutely do want to use separate, threaded event loops for each parse, you must use executor.submit(func, *args) instead of executor.submit(func(args)).
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
w1_future = executor.submit(asyncio.run, self.w1.parse_event())
w2_future = executor.submit(asyncio.run, self.w2.parse_event())
w3_future = executor.submit(asyncio.run, self.w3.parse_event())
Note that mixing asyncio and threading adds complexity and constraints. You might want to use debug mode to detect some thread and context safety issues. However, thread and context safety constraints/guarantees are often not documented; manually test or inspect the operations for safety if needed.
I have a program workflow as follows: 1. IO-bound (web page fetch) -> 2. cpu-bound (processing information) -> 3. IO-bound (writing results to database).
I'm presently using aiohttp to fetch the web pages. I am currently using asyncio.as_completed to gather Step 1 tasks and pass them to Step 2 as completed. My concern is that this may be interfering with the completion of Step 1 tasks by consuming cpu resources and blocking program flow in Step 2.
I've tried to use ProcessPoolExecutor to farm out the Step 2 tasks to other processes, but Step 2 tasks uses non-pickleable data structures and functions. I've tried ThreadPoolExecutor, and while that worked (e.g. it didn't crash), it is my understanding that doing so for CPU-bound tasks is counter-productive.
Because the workflow has an intermediate cpu-bound task, would it be more efficient to use asyncio.gather (instead of asyncio.as_completed) to complete all of the Step 1 processes before moving on to Step 2?
Sample asyncio.as_completed code:
async with ClientSession() as session:
tasks = {self.fetch(session, url) for url in self.urls}
for task in asyncio.as_completed(tasks):
raw_data = await asyncio.shield(task)
data = self.extract_data(*raw_data)
await self.store_data(data)
Sample asyncio.gather code:
async with ClientSession() as session:
tasks = {self.fetch(session, url) for url in self.urls}
results = await asyncio.gather(*tasks)
for result in results:
data = self.extract_data(*result)
await self.store_data(data)
Preliminary tests with limited samples show as_completed to be slightly more efficient than gather: ~2.98s (as_completed) vs ~3.15s (gather). But is there an asyncio conceptual issue that would favor one solution over another?
"I've tried ThreadPoolExecutor, [...] it is my understanding that doing so for CPU-bound tasks is counter-productive." - it is countrproductiv in a sense you won't have two such asks running Python code in parallel, using multiple CPU cores - but otherwise, it will work to free up your asyncio Loop to continue working, if only munching code for one task at a time.
If you can't pickle things to a subprocess, running the CPU bound tasks in a ThreadPoolExecutor is good enough.
Otherwise, just sprinkle you cpu code with some await asyncio.sleep(0) (inside the loops) and run them normally as coroutines: that is enough for a cpu bound task not to lock the asyncio loop.
I am trying to understand asyncio module and spend about one hour with run_coroutine_threadsafe function, I even came to the working example, it works as expected, but works with several limitations.
First of all I do not understand how should I properly call asyncio loop in main (any other) thread, in the example I call it with run_until_complete and give it a coroutine to make it busy with something until another thread will not give it a coroutine. What are other options I have?
What are situations when I have to mix asyncio and threading (in Python) in real life? Since as far as I understand asyncio is supposed to take place of threading in Python (due to GIL for not IO ops), if I am wrong, do not be angry and share your suggestions.
Python version is 3.7/3.8
import asyncio
import threading
import time
async def coro_func():
return await asyncio.sleep(3, 42)
def another_thread(_loop):
coro = coro_func() # is local thread coroutine which we would like to run in another thread
# _loop is a loop which was created in another thread
future = asyncio.run_coroutine_threadsafe(coro, _loop)
print(f"{threading.current_thread().name}: {future.result()}")
time.sleep(15)
print(f"{threading.current_thread().name} is Finished")
if __name__ == '__main__':
loop = asyncio.get_event_loop()
main_th_cor = asyncio.sleep(10)
# main_th_cor is used to make loop busy with something until another_thread will not send coroutine to it
print("START MAIN")
x = threading.Thread(target=another_thread, args=(loop, ), name="Some_Thread")
x.start()
time.sleep(1)
loop.run_until_complete(main_th_cor)
print("FINISH MAIN")
First of all I do not understand how should I properly call asyncio loop in main (any other) thread, in the example I call it with run_until_complete and give it a coroutine to make it busy with something until another thread will not give it a coroutine. What are other options I have?
This is a good use case for loop.run_forever(). The loop will run and serve the coroutines you submit using run_coroutine_threadsafe. (You can even submit such coroutines from multiple threads in parallel; you never need to instantiate more than one event loop.)
You can stop the loop from a different thread by calling loop.call_soon_threadsafe(loop.stop).
What are situations when I have to mix asyncio and threading (in Python) in real life?
Ideally there should be none. But in the real world, they do crop up; for example:
When you are introducing asyncio into an existing large program that uses threads and blocking calls and cannot be converted to asyncio all at once. run_coroutine_threadsafe allows regular blocking code to make use of asyncio.
When you are dealing with older "async" APIs which use threads under the hood and call the user-supplied APIs from other threads. There are many examples, such as Python's own multiprocessing.
When you need to call blocking functions that have no async equivalent from asyncio - e.g. CPU-bound functions, legacy database drivers, things like that. This is not a use case for run_coroutine_threadsafe, here you'd use run_in_executor, but it is another example of mixing threads and asyncio.
In the Python docs, it states:
Application developers should typically use the high-level asyncio
functions, such as asyncio.run(), and should rarely need to reference
the loop object or call its methods.
This section is intended mostly for authors of lower-level code, libraries, and frameworks, who need finer control over the event loop behavior.
When using both async and a threadpoolexecutor, as show in the example code (from the docs):
import asyncio
import concurrent.futures
def blocking_io():
# File operations (such as logging) can block the
# event loop: run them in a thread pool.
with open('/dev/urandom', 'rb') as f:
return f.read(100)
async def main():
loop = asyncio.get_running_loop()
# 2. Run in a custom thread pool:
with concurrent.futures.ThreadPoolExecutor() as pool:
result = await loop.run_in_executor(
pool, blocking_io)
print('custom thread pool', result)
asyncio.run(main())
Do I need to call loop.close(), or would the asyncio.run() close the loop for me?
Is using both asyncio and threadpoolexecutor together, one of those situations where you need finer control over the event loop? Can using both, asyncio and threadpoolexecutor together be done without referencing the loop?
For question #1, The coroutines and tasks documentation linked in the Event Loop documentation you reference indicates that asyncio.run closes the loop:
asyncio.run(coro, *, debug=False)
Execute the coroutine coro and return the result.
This function runs the passed coroutine, taking care of managing the
asyncio event loop and finalizing asynchronous generators.
This function cannot be called when another asyncio event loop is
running in the same thread.
If debug is True, the event loop will be run in debug mode.
This function always creates a new event loop and closes it at the
end. It should be used as a main entry point for asyncio programs, and
should ideally only be called once.
For #2, that use of get_running_loop with ThreadExecutor is a way to run blocking code without blocking the OS thread. In Developing with asyncio, they indicate:
The loop.run_in_executor() method can be used with a concurrent.futures.ThreadPoolExecutor to execute blocking code in a different OS thread without blocking the OS thread that the event loop runs in.
It is an exception to the general warning they give in the Event documentation around calling the lower-level methods. So the answer to question #2, the answer is yes, it is one of those situations where you need a small amount of finer control in order to execute this recipe that handles blocking code in async scenarios.