I'm building a library that leverages asyncio internally.
While the user shouldn't be aware of it, the internal implementation currently wraps the async code with the asyncio.run() porcelain wrapper.
However, some users will be executing this library code from a jupyter notebook, and I'm struggling to replace the asyncio.run() with a wrapper that's safe for either environment.
Here's what I've tried:
ASYNC_IO_NO_RUNNING_LOOP_MSG = 'no running event loop'
def jupyter_safe_run_coroutine(async_coroutine, _test_mode: bool = False):
    try:
        loop = asyncio.get_running_loop()
        task = loop.create_task(async_coroutine)
        result = loop.run_until_complete(task)  # <- fails as loop is already running
        # OR
        asyncio.wait_for(task, timeout=None, loop=loop)  # <- fails as this is an async method
        result = task.result()
    except RuntimeError as e:
        if _test_mode:
            raise e
        if ASYNC_IO_NO_RUNNING_LOOP_MSG in str(e):
            return asyncio.run(async_coroutine)
    except Exception as e:
        raise e
Requirements
We use Python 3.8, so we can't use the asyncio.Runner context manager (added in Python 3.11)
We can't use threading, so the solution suggested here would not work
Problem:
How can I wait/await for the async_coroutine, or the task/future provided by loop.create_task(async_coroutine) to be completed?
None of the methods above actually does the waiting, for the reasons stated in the comments.
Update
I've found the nest_asyncio library, which is built to solve exactly this problem:
ASYNC_IO_NO_RUNNING_LOOP_MSG = 'no running event loop'
HAS_BEEN_RUN = False

def jupyter_safe_run_coroutine(async_coroutine, _test_mode: bool = False):
    global HAS_BEEN_RUN
    if not HAS_BEEN_RUN:
        _apply_nested_asyncio_patch()
        HAS_BEEN_RUN = True
    return asyncio.run(async_coroutine)

def _apply_nested_asyncio_patch():
    try:
        loop = asyncio.get_running_loop()
        logger.info(f'As get_running_loop() returned {loop}, this environment has its own event loop.\n'
                    f'Patching with nest_asyncio')
        import nest_asyncio
        nest_asyncio.apply()
    except RuntimeError as e:
        if ASYNC_IO_NO_RUNNING_LOOP_MSG in str(e):
            logger.info(f'As get_running_loop() raised {e}, this environment does not have its own event loop.\n'
                        f'No patching necessary')
        else:
            raise e
Still, there are some issues I'm facing with it:
As per this SO answer, there might be starvation issues
Any logs written in the async_coroutine are not printed in the jupyter notebook
The jupyter notebook kernel occasionally crashes upon completion of the task
Edit
For context, the library internally calls external APIs for data enrichment of a user-provided dataframe:
# user code using the library
import pandas as pd
import my_lib

df = pd.DataFrame(data=['some data'])
enriched_df = my_lib.enrich(df)
It's usually a good idea to expose the asynchronous function. This way you give your users more flexibility.
If some of your users can't (or don't want to) use asynchronous calls to your functions, they will be able to call the async function using asyncio.run(your_function()). Or, in the rare situation where they have an event loop running but can't make async calls, they could use the create_task + add_done_callback method described here. (I really have no idea why such a use case would happen, but for the sake of the argument I included it.)
Hiding the asynchronous interface from your users is not the best idea because it limits their capabilities. They will probably fork your package to patch it and make the exposed function async, or call the hidden async function directly. Neither is good news for you (harder to document / track bugs). I would really suggest sticking to the simplest solution and providing the async functions as the main entry points.
Suppose the following package code, followed by 3 different usages of it:
async def package_code():
    return "package"
Client codes
Typical clients will probably just use it this way:
async def client_code_a():
    print(await package_code())

# asyncio.run(client_code_a())
For some people, the following might make sense, for example if your package is the only asynchronous thing they will ever use. Or maybe they are not yet comfortable using async code (these you can probably convince to try client_code_a instead):
def client_code_b():
    print(asyncio.run(package_code()))

# client_code_b()
The very few (I'm tempted to say none):
async def client_code_c():
    # asyncio.run() cannot be called from a running event loop:
    # print(asyncio.run(package_code()))
    loop = asyncio.get_running_loop()
    task = loop.create_task(package_code())
    task.add_done_callback(lambda t: print(t.result()))

# asyncio.run(client_code_c())
I'm still not sure I understand what your goal is, but I'll describe with code what I tried to explain in my comment, so you can tell me where your issue lies in the following.
If your package requires the user to call some functions (your_package_function in the example) that take coroutines as arguments, then you shouldn't worry about the event loop.
That means the package shouldn't call asyncio.run nor loop.run_until_complete. The client should (in almost all cases) be responsible for starting the event loop.
Your package code should assume there is an event loop running. Since I don't know your package's goal, I just made a function that feeds a "test" argument to any coroutine the client is passing:
import asyncio

async def your_package_function(coroutine):
    print("- Package internals start")
    task = asyncio.create_task(coroutine("test"))
    await asyncio.sleep(.5)  # Simulates slow tasks within your package
    print("- Package internals completed other task")
    x = await task
    print("- Package internals end")
    return x
The client (package user) should then call the following:
async def main():
    x = await your_package_function(return_with_delay)
    print(f"Computed value = {x}")

async def return_with_delay(value):
    print("+ User function start")
    await asyncio.sleep(.2)
    print("+ User function end")
    return value

await main()
# or asyncio.run(main()) if needed
This would print:
- Package internals start
- Package internals completed other task
+ User function start
+ User function end
- Package internals end
Computed value = test
Related
Apologies for any incorrect terminology or poor description - I'm new to asyncio. Please feel free to edit/correct/improve my question.
I have a scenario where my asyncio application needs to use run_in_executor to run a sync function from a third-party library. This in turn calls a sync function in my code at certain times. I want this function to be able to then add tasks to my main loop.
I tried creating a new shorter-lived mini-loop within the sync function/thread, but it turns out another library (tortoise-orm with sqlite) requires the task to be in the main loop (it uses an asyncio lock).
I'm wondering if there is a best-practice way of achieving this, my attempts so far are messy and have mixed results. I'm not sure if I'm thinking correctly that this is a valid use of contextvar / asyncio.run_coroutine_threadsafe.
Any tips would be appreciated.
A simplified example:
def add_item_to_database(item: dict) -> None:
    # Is there a way to obtain the "main" loop here? contextvar?
    # Do I use this with asyncio.run_coroutine_threadsafe?
    # TODO
    raise NotImplementedError(
        "How do I now schedule/submit an async function back "
        "on my main loop?"
    )
import asyncio

async def my_coro() -> None:
    loop = asyncio.get_running_loop()
    # third party sync function will be long-running,
    # and occasionally call my sync "callback"
    # add_item_to_database function from another module.
    await loop.run_in_executor(
        None,
        third_party_sync_function_that_will_call_add_item_to_database
    )

async def main() -> None:
    # other 'proper' asyncio coroutine tasks also exist - omitted here.
    tasks = [asyncio.create_task(my_coro())]
    await asyncio.gather(*tasks)

asyncio.run(main())
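For reference, one common pattern for this situation (a sketch only, not an accepted answer; _store and fake_third_party_lib are hypothetical stand-ins): capture the main loop before entering the executor, then have the sync callback hand coroutines back to it with asyncio.run_coroutine_threadsafe:

import asyncio

_main_loop = None  # captured in main(); a contextvar would work equally well

async def _store(item: dict) -> None:
    # hypothetical stand-in for the real async DB write (e.g. tortoise-orm)
    print(f"stored {item}")

def add_item_to_database(item: dict) -> None:
    # runs in the executor thread: schedule the coroutine on the main loop
    future = asyncio.run_coroutine_threadsafe(_store(item), _main_loop)
    future.result()  # optional: blocks only this worker thread, not the loop

def fake_third_party_lib() -> None:
    # hypothetical stand-in for the long-running third-party sync function
    for i in range(3):
        add_item_to_database({"id": i})

async def main() -> None:
    global _main_loop
    _main_loop = asyncio.get_running_loop()
    await _main_loop.run_in_executor(None, fake_third_party_lib)

asyncio.run(main())

The key point is that run_coroutine_threadsafe is the only sanctioned way to submit work to a loop from another thread; calling loop.create_task directly from the executor thread would not be thread-safe.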
I've written a Python library that currently makes several independent HTTP requests in serial. I'd like to parallelize these requests without altering how the library is called by users, or requiring users to be aware that calls are being made asynchronously under the hood. The library is meant for novice/intermediate Python users mostly using Jupyter, and I'd like it to work without introducing them to unfamiliar async/await semantics.
The following example, which works in Jupyter, illustrates what I'd like to achieve but requires use of await to invoke the code on the final line:
import asyncio

async def first_request():
    await asyncio.sleep(2)  # Simulate request time
    return "First request response"

async def second_request():
    await asyncio.sleep(2)
    return "Second request response"

async def make_requests_in_parallel():
    """Make requests in parallel and return the responses."""
    return await asyncio.gather(first_request(), second_request())

results = await make_requests_in_parallel()  # Undesirable use of `await`
I've found previous answers describing how to call async code from synchronous code using asyncio.run(). In the Jupyter example above, I can replace the final line with the following to create a working, importable Python module:
def main():
    """Make results available to async-naive users"""
    return asyncio.run(make_requests_in_parallel())

results = main()  # No `await` needed to get results -- good!
This seems to be what I want. However, in Jupyter, the code will produce an error:
RuntimeError: asyncio.run() cannot be called from a running event loop
A comment on the same answer above explains that because Jupyter runs its own async event loop, there is no need (or, apparently, option) to start another one, so async code can "simply" be called using await. In my situation, though, avoiding await is why I wanted to use asyncio.run() in the first place.
This seems to suggest that existing synchronous libraries cannot, by any means, internally parallelize any operation using asyncio without altering their public API to require use of await. Is this true?
If so, are there more practical alternatives to asyncio that would let me parallelize a group of requests in an internal function without educating my users about async/await?
I found a great solution for this: nest_asyncio.
Once installed, the working solution in Jupyter is as follows:
import asyncio
import nest_asyncio

nest_asyncio.apply()

async def first_request():
    await asyncio.sleep(2)  # Simulate request time
    return "First request response"

async def second_request():
    await asyncio.sleep(2)
    return "Second request response"

async def make_requests_in_parallel():
    """Make requests in parallel and return the responses."""
    return await asyncio.gather(first_request(), second_request())

def main():
    """Make results available to async-naive users"""
    return asyncio.run(make_requests_in_parallel())

results = main()  # No `await` needed to get results
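An alternative worth noting, sketched under the assumption that spawning a short-lived worker thread per call is acceptable (this is not part of the original answer): detect whether a loop is already running, and if so, run the coroutine via asyncio.run on a separate thread instead of patching the loop.

import asyncio
import concurrent.futures

def main():
    """Make results available to async-naive users, Jupyter included."""
    coro = make_requests_in_parallel()
    try:
        asyncio.get_running_loop()  # raises RuntimeError if no loop is running
    except RuntimeError:
        return asyncio.run(coro)  # plain script/module: no loop to collide with
    # Jupyter (or any running loop): run the coroutine on its own loop in a thread
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(asyncio.run, coro).result()

This avoids nest_asyncio's re-entrant patching at the cost of one thread per call, which is usually acceptable for a user-facing convenience wrapper.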
I am trying to learn to use asyncio in Python to optimize scripts.
My example raises a "coroutine was never awaited" warning; can you help me understand it and find how to solve it?
import time
import datetime
import random
import asyncio
import aiohttp
import requests

def requete_bloquante(num):
    print(f'Get {num}')
    uid = requests.get("https://httpbin.org/uuid").json()['uuid']
    print(f"Res {num}: {uid}")

def faire_toutes_les_requetes():
    for x in range(10):
        requete_bloquante(x)

print("Bloquant : ")
start = datetime.datetime.now()
faire_toutes_les_requetes()
exec_time = (datetime.datetime.now() - start).seconds
print(f"Pour faire 10 requêtes, ça prend {exec_time}s\n")

async def requete_sans_bloquer(num, session):
    print(f'Get {num}')
    async with session.get("https://httpbin.org/uuid") as response:
        uid = (await response.json()['uuid'])
        print(f"Res {num}: {uid}")

async def faire_toutes_les_requetes_sans_bloquer():
    loop = asyncio.get_event_loop()
    with aiohttp.ClientSession() as session:
        futures = [requete_sans_bloquer(x, session) for x in range(10)]
        loop.run_until_complete(asyncio.gather(*futures))
    loop.close()
    print("Fin de la boucle !")

print("Non bloquant : ")
start = datetime.datetime.now()
faire_toutes_les_requetes_sans_bloquer()
exec_time = (datetime.datetime.now() - start).seconds
print(f"Pour faire 10 requêtes, ça prend {exec_time}s\n")
The first classic part of the code runs correctly, but the second half only produces:
synchronicite.py:43: RuntimeWarning: coroutine 'faire_toutes_les_requetes_sans_bloquer' was never awaited
You made faire_toutes_les_requetes_sans_bloquer an awaitable function, a coroutine, by using async def.
When you call an awaitable function, you create a new coroutine object. The code inside the function won't run until you then await on the function or run it as a task:
>>> async def foo():
... print("Running the foo coroutine")
...
>>> foo()
<coroutine object foo at 0x10b186348>
>>> import asyncio
>>> asyncio.run(foo())
Running the foo coroutine
You want to keep that function synchronous, because you don't start the loop until inside that function:
def faire_toutes_les_requetes_sans_bloquer():
    loop = asyncio.get_event_loop()
    # ...
    loop.close()
    print("Fin de la boucle !")
However, you are also trying to use an aiohttp.ClientSession() object. That's an asynchronous context manager: you are expected to use it with async with, not plain with, and so it has to be used inside an awaitable task. If you use with instead of async with, a TypeError("Use async with instead") exception will be raised.
That all means you need to move the loop.run_until_complete() call out of your faire_toutes_les_requetes_sans_bloquer() function, so you can keep that as the main task to be run; you can then call and await on asyncio.gather() directly:
async def faire_toutes_les_requetes_sans_bloquer():
    async with aiohttp.ClientSession() as session:
        futures = [requete_sans_bloquer(x, session) for x in range(10)]
        await asyncio.gather(*futures)
    print("Fin de la boucle !")
print("Non bloquant : ")
start = datetime.datetime.now()
asyncio.run(faire_toutes_les_requetes_sans_bloquer())
exec_time = (datetime.datetime.now() - start).seconds
print(f"Pour faire 10 requêtes, ça prend {exec_time}s\n")
I used the new asyncio.run() function (Python 3.7 and up) to run the single main task. This creates a dedicated loop for that top-level coroutine and runs it until complete.
Next, you need to move the closing ) parenthesis on the await response.json() expression:
uid = (await response.json())['uuid']
You want to access the 'uuid' key on the result of the await, not the coroutine that response.json() produces.
With those changes your code works, but the asyncio version finishes in sub-second time; you may want to print microseconds:
exec_time = (datetime.datetime.now() - start).total_seconds()
print(f"Pour faire 10 requêtes, ça prend {exec_time:.3f}s\n")
On my machine, the synchronous requests code completes in about 4-5 seconds, and the asyncio code in under 0.5 seconds.
Do not call loop.run_until_complete inside an async function. The purpose of that method is to run an async function in a sync context. Anyway, here's how you should change the code:
async def faire_toutes_les_requetes_sans_bloquer():
    async with aiohttp.ClientSession() as session:
        futures = [requete_sans_bloquer(x, session) for x in range(10)]
        await asyncio.gather(*futures)
    print("Fin de la boucle !")
loop = asyncio.get_event_loop()
loop.run_until_complete(faire_toutes_les_requetes_sans_bloquer())
Note that a bare faire_toutes_les_requetes_sans_bloquer() call creates a coroutine that has to be either awaited via an explicit await (for which you have to be inside an async context) or passed to some event loop. If you leave it alone, Python complains about it. In your original code you do neither.
Not sure if this was the issue for you, but for me the response from the coroutine was another coroutine, so my code started warning me (note: not actually crashing) that I had created coroutines that weren't being awaited. After I actually awaited them, the warning went away (although I didn't really use the response).
Note the main code I added was:
content_from_url_as_str: list[str] = await asyncio.gather(*content_from_url, return_exceptions=True)
inspired after I saw:
response: str = await content_from_url[0]
Full code:
"""
-- Notes from [1]
Threading and asyncio both run on a single processor and therefore only run one at a time [1]. It's cooperative concurrency.
Note: threads.py has a very good block with good definitions for io-bound, cpu-bound if you need to recall it.
Note: coroutine is an important definition to understand before proceeding. Definition provided at the end of this tutorial.
General idea for asyncio is that there is a general event loop that controls how and when each tasks gets run.
The event loop is aware of each task and knows what states they are in.
For simplicity of exposition assume there are only two states:
a) Ready state
b) Waiting state
a) indicates that a task has work to do and can be run - while b) indicates that a task is waiting for a response from an
external thing (e.g. io, printer, disk, network, coq, etc). This simplified event loop has two lists of tasks
(ready_to_run_lst, waiting_lst) and runs things from the ready to run list. Once a task runs it is in complete control
until it cooperatively hands back control to the event loop.
The way it works is that the task that was run does what it needs to do (usually an io operation, or an interleaved op
or something like that) but crucially it gives control back to the event loop when the running task (which has control) thinks it best.
(Note that this means the task might not have fully completed getting what it needs.
This is probably useful when the user wants to implement the interleaving himself.)
Once the task cooperatively gives back control to the event loop it is placed by the event loop in either the
ready to run list or waiting list (depending how fast the io ran, etc). Then the event loop goes through the waiting
list to see if anything waiting has "returned".
Once all the tasks have been sorted into the right list the event loop is able to choose what to run next (e.g. by
choosing the one that has been waiting the longest). This repeats until the event loop code you wrote is done.
The crucial point (and distinction with threads) that we want to emphasize is that in asyncio, an operation is never
interrupted in the middle and every switching/interleaving is done deliberately by the programmer.
In a way you don't have to worry about making your code thread safe.
For more details see [2], [3].
Asyncio syntax:
i) await = this is where the code you wrote calls an expensive function (e.g. an io) and thus hands back control to the
event loop. Then the event loop will likely put it in the waiting loop and runs some other task. Likely eventually
the event loop comes back to this function and runs the remaining code given that we have the value from the io now.
await = the key word that does (mainly) two things 1) gives control back to the event loop to see if there is something
else to run if we called it on a real expensive io operation (e.g. calling network, printer, etc) 2) gives control to
the new coroutine (code that might give up control cooperatively) that it is awaiting. If this is your own code with async
then it means it will go into this new async function (coroutine) you defined.
No real async benefits are being experienced until you call (await) a real io e.g. asyncio.sleep is the typical debug example.
todo: clarify, I think await doesn't actually give control back to the event loop but instead runs the "coroutine" this
await is pointing to. This means that if it's a real IO then it will actually give it back to the event loop
to do something else. In this case it is actually doing something "in parallel" in the async way.
Otherwise, it is your own python coroutine and thus gives it the control but "no true async parallelism" happens.
ii) async = approximately a flag that tells python the defined function might use await. This is not strictly true but
it gives you a simple model while you're getting started. todo - clarify async.
async = defines a coroutine. This doesn't define a real io, it only defines a function that can give up and give the
execution power to other coroutines or the (asyncio) event loop.
todo - context manager with async
iii) awaiting = when you call something (e.g. a function) that usually requires waiting for the io response/return/value.
todo: though it seems it's also the python keyword to give control to a coroutine you wrote in python or give
control to the event loop assuming your awaiting an actual io call.
iv) async with = this creates a context manager from an object you would normally await - i.e. an object you would
wait to get the return value from an io. So usually we swap out (switch) from this object.
todo - e.g.
Note: - any function that calls await needs to be marked with async or you’ll get a syntax error otherwise.
- a task never gives up control without intentionally doing so e.g. never in the middle of an op.
Cons: - note how this also requires thinking more carefully (but feels less dangerous than threading due to no pre-emptive
switching) due to the concurrency. Another disadvantage is again the idiosyncrasies of using this in python + learning
new syntax and details for it to actually work.
- understanding the semantics of new syntax + learning where to really put the syntax to avoid semantic errors.
- we needed a special asyncio compatible lib for requests, since the normal requests is not designed to inform
the event loop that it's blocking (or done blocking)
- if one of the tasks doesn't cooperate properly then the whole code can be a mess and slow it down.
- not all libraries support the async IO paradigm in python (e.g. asyncio, trio, etc).
Pro: + despite learning where to put await and async might be annoying, it forces you to think carefully about your code
which in itself can be an advantage (e.g. better, faster, less bugs due to thinking carefully)
+ often faster...? (skeptical)
1. https://realpython.com/python-concurrency/
2. https://realpython.com/async-io-python/
3. https://stackoverflow.com/a/51116910/6843734
todo - read [2] later (or [3] but thats not a tutorial and its more details so perhaps not a priority).
asynchronous = 1) dictionary def: not happening at the same time
e.g. happening independently 2) computing def: happening independently of the main program flow
coroutine = computer program components that generalize subroutines for non-preemptive multitasking, by allowing execution to be suspended and resumed.
So basically it's a routine/"function" that can give up control in "a controlled way" (i.e. not randomly like with threads).
Usually they are associated with a single process -- so it's concurrent but not parallel.
Interesting note: Coroutines are well-suited for implementing familiar program components such as cooperative tasks, exceptions, event loops, iterators, infinite lists and pipes.
Likely we have an event loop in this document as an example. I guess yield and operators too are good examples!
Interesting contrast with subroutines: Subroutines are special cases of coroutines.[3] When subroutines are invoked, execution begins at the start,
and once a subroutine exits, it is finished; an instance of a subroutine only returns once, and does not hold state between invocations.
By contrast, coroutines can exit by calling other coroutines, which may later return to the point where they were invoked in the original coroutine;
from the coroutine's point of view, it is not exiting but calling another coroutine.
Coroutines are very similar to threads. However, coroutines are cooperatively multitasked, whereas threads are typically preemptively multitasked.
event loop = event loop is a programming construct or design pattern that waits for and dispatches events or messages in a program.
Appendix:
For I/O-bound problems, there’s a general rule of thumb in the Python community:
“Use asyncio when you can, threading when you must.”
asyncio can provide the best speed up for this type of program, but sometimes you will require critical libraries that
have not been ported to take advantage of asyncio.
Remember that any task that doesn’t give up control to the event loop will block all of the other tasks
-- Notes from [2]
see asyncio_example2.py file.
The sync file should have taken longer; e.g. in one run the async file took:
Downloaded 160 sites in 0.4063692092895508 seconds
While the sync option took:
Downloaded 160 in 3.351937770843506 seconds
"""
import asyncio
from asyncio import Task
from asyncio.events import AbstractEventLoop
import aiohttp
from aiohttp import ClientResponse
from aiohttp.client import ClientSession
from typing import Coroutine
import time
async def download_site(session: ClientSession, url: str) -> str:
    async with session.get(url) as response:
        print(f"Read {response.content_length} from {url}")
        return response.text()  # note: not awaited, so this returns a coroutine

async def download_all_sites(sites: list[str]) -> list[str]:
    # async with = this creates a context manager from an object you would normally await - i.e. an object you would wait to get the return value from an io. So usually we swap out (switch) from this object.
    async with aiohttp.ClientSession() as session:  # we will usually await session.FUNCS
        # create all the download coroutines/tasks to be later managed/run by the event loop
        tasks: list[Task] = []
        for url in sites:
            # creates a task from a coroutine. todo: basically it seems it creates a callable coroutine? (i.e. a function that is able to give up control cooperatively or runs an external io and thus also gives back control cooperatively to the event loop). read more? https://stackoverflow.com/questions/36342899/asyncio-ensure-future-vs-baseeventloop-create-task-vs-simple-coroutine
            task: Task = asyncio.ensure_future(download_site(session, url))
            tasks.append(task)
        # runs tasks/coroutines in the event loop and aggregates the results. todo: does this halt until all coroutines have returned? I think so due to the paradigm of how async code works.
        content_from_url: list[ClientResponse.text] = await asyncio.gather(*tasks, return_exceptions=True)
        assert isinstance(content_from_url[0], Coroutine)  # note: all responses are coroutines
        print(f'result after aggregating/doing all coroutine tasks/jobs = {content_from_url=}')
        # this is needed since the response is in a coroutine object for some reason
        content_from_url_as_str: list[str] = await asyncio.gather(*content_from_url, return_exceptions=True)
        print(f'result after getting response from coroutines that hold the text = {content_from_url_as_str=}')
        return content_from_url_as_str
if __name__ == "__main__":
    # - args
    num_sites: int = 80
    sites: list[str] = ["https://www.jython.org", "http://olympus.realpython.org/dice"] * num_sites
    start_time: float = time.time()

    # - run the same 160 tasks but without the async paradigm; should be slower!
    # note: you can't actually do this here because your functions have async definitions.
    # to test the synchronous version see the synchronous.py file, then compare the two run times.
    # await download_all_sites(sites)
    # download_all_sites(sites)

    # - Execute the coroutine coro and return the result.
    asyncio.run(download_all_sites(sites))

    # - run the event loop manager and run all tasks with cooperative concurrency
    # asyncio.get_event_loop().run_until_complete(download_all_sites(sites))

    # makes explicit the creation of the event loop that manages the coroutines & external ios
    # event_loop: AbstractEventLoop = asyncio.get_event_loop()
    # asyncio.run(download_all_sites(sites))

    # making the creation of the not-yet-run coroutine with its args explicit
    # event_loop: AbstractEventLoop = asyncio.get_event_loop()
    # download_all_sites_coroutine: Coroutine = download_all_sites(sites)
    # asyncio.run(download_all_sites_coroutine)

    # - print stats about the content download and duration
    duration = time.time() - start_time
    print(f"Downloaded {len(sites)} sites in {duration} seconds")
    print('Success.\a')
How can I set a blocking function to be run in an executor, in a way that the result doesn't matter, so the main thread doesn't wait for it or get slowed by it?
To be honest I'm not sure if this is even the right solution for it, all I want is to have some type of processing queue separated from the main process so that it doesn't block the server application from returning requests, as this type of web server runs one worker for many requests.
Preferably I would like to keep away from solutions like Celery, but if that's the most optimal I would be willing to learn it.
The context here is a async web server that generates pdf files with large images.
app = Sanic()
# App "global" worker
executor = ProcessPoolExecutor(max_workers=5)

@app.route('/')
async def getPdf(request):
    asyncio.create_task(renderPdfsInExecutor(request.json))
    # This should be returned "instantly" regardless of pdf generation time
    return response.text('Pdf being generated, it will be sent to your email when done')

async def renderPdfsInExecutor(json):
    asyncio.get_running_loop().run_in_executor(executor, syncRenderPdfs, json)

def syncRenderPdfs(json):
    # Some PDF library that downloads images synchronously
    pdfs = somePdfLibrary.generatePdfsFromJson(json)
    sendToDefaultMail(pdfs)
The above code gives the error (Yes, it is running as admin) :
PermissionError [WinError 5] Access denied
Future exception was never retrieved
Bonus question: Do I gain anything by running an asyncio loop inside the executor? So that if it is handling several PDF requests at once it will distribute the processing between them. If yes, how do I do it?
Ok, so first of all there is a misunderstanding. This
async def getPdf(request):
    asyncio.create_task(renderPdfsInExecutor(request.json))
    ...

async def renderPdfsInExecutor(json):
    asyncio.get_running_loop().run_in_executor(executor, syncRenderPdfs, json)
is redundant. It is enough to do
async def getPdf(request):
    asyncio.get_running_loop().run_in_executor(executor, syncRenderPdfs, request.json)
    ...
or (since you don't want to await) even better
async def getPdf(request):
    executor.submit(syncRenderPdfs, request.json)
    ...
Now the problem you get is because syncRenderPdfs throws PermissionError. It is not handled so Python warns you "Hey, some background code threw an error. But the code is not owned by anyone so what the heck?". That's why you get Future exception was never retrieved. You have a problem with the pdf library itself, not with asyncio. Once you fix that inner problem it is also a good idea to be safe:
def syncRenderPdfs(json):
    try:
        # Some PDF library that downloads images synchronously
        pdfs = somePdfLibrary.generatePdfsFromJson(json)
        sendToDefaultMail(pdfs)
    except Exception:
        logger.exception('Something went wrong')  # or whatever
Your "permission denied" issue is a whole different thing and you should debug it and/or post a separate question for that.
As for the final question: yes, executor will queue and evenly distribute tasks between workers.
EDIT: As we've talked about in the comments, the actual problem might be with the Windows environment you work on, or more precisely with the ProcessPoolExecutor, i.e. spawning processes may change permissions. I advise using ThreadPoolExecutor, assuming it works fine on the platform.
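A minimal sketch of that swap, reusing app, response, and syncRenderPdfs from the question's code (those names are assumed, not redefined here):

from concurrent.futures import ThreadPoolExecutor

# Same fire-and-forget pattern, but with threads instead of processes,
# which sidesteps the process-spawning permission issue on Windows.
executor = ThreadPoolExecutor(max_workers=5)

@app.route('/')
async def getPdf(request):
    executor.submit(syncRenderPdfs, request.json)
    return response.text('Pdf being generated, it will be sent to your email when done')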
You can look at asyncio.gather(*tasks) to run multiple tasks in parallel.
Remember that parallel tasks only work well if they are io bound and not blocking.
An example from python docs (https://docs.python.org/3/library/asyncio-task.html):
import asyncio

async def factorial(name, number):
    f = 1
    for i in range(2, number + 1):
        print(f"Task {name}: Compute factorial({number}), currently i={i}...")
        await asyncio.sleep(1)
        f *= i
    print(f"Task {name}: factorial({number}) = {f}")
    return f

async def main():
    # Schedule three calls *concurrently*:
    L = await asyncio.gather(
        factorial("A", 2),
        factorial("B", 3),
        factorial("C", 4),
    )
    print(L)

asyncio.run(main())
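Running it makes the interleaving visible: all three tasks advance during each shared one-second sleep. The expected output, as given on the linked docs page:

Task A: Compute factorial(2), currently i=2...
Task B: Compute factorial(3), currently i=2...
Task C: Compute factorial(4), currently i=2...
Task A: factorial(2) = 2
Task B: Compute factorial(3), currently i=3...
Task C: Compute factorial(4), currently i=3...
Task B: factorial(3) = 6
Task C: Compute factorial(4), currently i=4...
Task C: factorial(4) = 24
[2, 6, 24]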
Let's assume I'm new to asyncio. I'm using async/await to parallelize my current project, and I've found myself passing all of my coroutines to asyncio.ensure_future. Lots of stuff like this:
coroutine = my_async_fn(*args, **kwargs)
task = asyncio.ensure_future(coroutine)
What I'd really like is for a call to an async function to return an executing task instead of an idle coroutine. I created a decorator to accomplish what I'm trying to do.
def make_task(fn):
    def wrapper(*args, **kwargs):
        return asyncio.ensure_future(fn(*args, **kwargs))
    return wrapper

@make_task
async def my_async_func(*args, **kwargs):
    # usually making a request of some sort
    pass
Does asyncio have a built-in way of doing this I haven't been able to find? Am I using asyncio wrong if I'm led to this problem to begin with?
asyncio had a @task decorator in very early pre-release versions, but we removed it.
The reason is that the decorator has no knowledge of what loop to use.
asyncio doesn't instantiate a loop on import; moreover, the test suite usually creates a new loop per test for the sake of test isolation.
Does asyncio have a built-in way of doing this I haven't been able to
find?
No, asyncio doesn't have a decorator to cast coroutine functions into tasks.
Am I using asyncio wrong if I'm lead to this problem to begin with?
It's hard to say without seeing what you're doing, but I think it may happen to be true. While creating tasks is a usual operation in asyncio programs, I doubt you've created this many coroutines that should always be tasks.
Awaiting a coroutine is a way to "call some function asynchronously", but it blocks the current execution flow until it finishes:
await some()
# you'll reach this line *only* when some() is done
A task, on the other hand, is a way to "run a function in the background"; it won't block the current execution flow:
task = asyncio.ensure_future(some())
# you'll reach this line immediately
When we write asyncio programs we usually need the first way, since we usually need the result of one operation before starting the next:
text = await request(url)
links = parse_links(text) # we need to reach this line only when we got 'text'
Creating a task, on the other hand, usually means that the code that follows doesn't depend on the task's result. But again, that isn't always the case.
Since ensure_future returns immediately, some people try to use it as a way to run some coroutines concurrently:
# wrong way to run concurrently:
asyncio.ensure_future(request(url1))
asyncio.ensure_future(request(url2))
asyncio.ensure_future(request(url3))
The correct way to achieve this is to use asyncio.gather:
# correct way to run concurrently:
await asyncio.gather(
    request(url1),
    request(url2),
    request(url3),
)
Maybe this is what you want?
Upd:
I think using tasks in your case is a good idea. But I don't think you should use a decorator: the coroutine's functionality (making a request) is still separate from its concrete usage detail (that it will be used as a task). If controlling the synchronization of requests is separate from their main functionality, it also makes sense to move the synchronization into a separate function. I would do something like this:
import asyncio

async def request(i):
    print(f'{i} started')
    await asyncio.sleep(i)
    print(f'{i} finished')
    return i

async def when_ready(conditions, coro_to_start):
    await asyncio.gather(*conditions, return_exceptions=True)
    return await coro_to_start

async def main():
    t = asyncio.ensure_future
    t1 = t(request(1))
    t2 = t(request(2))
    t3 = t(request(3))
    t4 = t(when_ready([t1, t2], request(4)))
    t5 = t(when_ready([t2, t3], request(5)))
    await asyncio.gather(t1, t2, t3, t4, t5)

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    try:
        loop.run_until_complete(main())
    finally:
        loop.run_until_complete(loop.shutdown_asyncgens())
        loop.close()