I've written a library of objects, many of which make HTTP / IO calls. I've been looking at moving over to asyncio due to the mounting overheads, but I don't want to rewrite the underlying code.
I've been hoping to wrap asyncio around my code in order to perform functions asynchronously without replacing all of my deep / low level code with await / yield.
I began by attempting the following:
async def my_function1(some_object, some_params):
    # Lots of existing code which uses existing objects
    # No await statements
    return output_data

async def my_function2():
    # Does more stuff

while True:
    loop = asyncio.get_event_loop()
    tasks = my_function1(some_object, some_params), my_function2()
    output_data = loop.run_until_complete(asyncio.gather(*tasks))
    print(output_data)
I quickly realised that while this code runs, nothing actually happens asynchronously; the functions complete synchronously. I'm very new to asynchronous programming, but I think this is because neither of my functions uses the keyword await or yield. They never suspend, and thus never provide an opportunity to switch to a different coroutine. Please correct me if I am wrong.
My question is, is it possible to wrap complex functions (where deep within they make HTTP / IO calls ) in an asyncio await keyword, e.g.
async def my_function():
    print("Welcome to my function")
    data = await bigSlowFunction()
UPDATE - Following Karlson's Answer
Following, and thanks to, Karlson's accepted answer, I used the following code, which works nicely:
from concurrent.futures import ThreadPoolExecutor
import time

# Some vars
a_var_1 = 0
a_var_2 = 10

pool = ThreadPoolExecutor(3)
future = pool.submit(my_big_function, object, a_var_1, a_var_2)
while not future.done():
    print("Waiting for future...")
    time.sleep(0.01)
print("Future done")
print(future.result())
This works really nicely, and the future.done() / sleep loop gives you an idea of how many CPU cycles you get to use by going async.
The short answer is, you can't have the benefits of asyncio without explicitly marking the points in your code where control may be passed back to the event loop. This is done by turning your IO heavy functions into coroutines, just like you assumed.
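To make that point concrete, here is a minimal sketch comparing a coroutine that never yields with one that does. The worker names and the delays are made up for illustration; time.sleep() stands in for any blocking call buried in existing code.

```python
import asyncio
import time


async def blocking_worker(delay):
    # time.sleep() never yields to the event loop, so nothing else can run.
    time.sleep(delay)


async def cooperative_worker(delay):
    # await asyncio.sleep() suspends this coroutine and lets others proceed.
    await asyncio.sleep(delay)


async def timed(worker, delay, count):
    """Run `count` copies of `worker` through gather(); return elapsed seconds."""
    start = time.perf_counter()
    await asyncio.gather(*(worker(delay) for _ in range(count)))
    return time.perf_counter() - start


blocking_elapsed = asyncio.run(timed(blocking_worker, 0.1, 3))
cooperative_elapsed = asyncio.run(timed(cooperative_worker, 0.1, 3))
print(f"blocking: {blocking_elapsed:.2f}s, cooperative: {cooperative_elapsed:.2f}s")
```

The blocking variant takes roughly the sum of the delays because each coroutine runs to completion before the next starts, while the cooperative variant takes roughly one delay.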
Without changing existing code you might achieve your goal with greenlets (have a look at eventlet or gevent).
Another possibility would be to use Python's Future implementation: submit calls to your already written functions to a ThreadPoolExecutor and await the resulting Future. Be aware that this comes with all the caveats of multi-threaded programming, though.
Something along the lines of
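import asyncio
from concurrent.futures import ThreadPoolExecutor
from thinair import big_slow_function

executor = ThreadPoolExecutor(max_workers=5)

async def big_slow_coroutine():
    # A concurrent.futures.Future is not awaitable by itself;
    # asyncio.wrap_future() adapts it to an asyncio future.
    await asyncio.wrap_future(executor.submit(big_slow_function))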
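As a variation on the snippet above, the event loop can submit the call to the executor itself through loop.run_in_executor(), which returns an awaitable directly. The following is a sketch only: big_slow_function here is a hypothetical stand-in that sleeps, and the names and delays are assumptions.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def big_slow_function(name, delay):
    # Hypothetical stand-in for existing synchronous, IO-bound code.
    time.sleep(delay)
    return f"{name} done"


async def main():
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=2) as executor:
        # run_in_executor() returns an awaitable asyncio future for each call.
        futures = [
            loop.run_in_executor(executor, big_slow_function, "a", 0.2),
            loop.run_in_executor(executor, big_slow_function, "b", 0.2),
        ]
        return await asyncio.gather(*futures)


start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
print(results, f"{elapsed:.2f}s")
```

Both calls overlap in the thread pool, so the total time is close to one delay rather than two.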
As of Python 3.9 you can wrap a blocking (non-async) function in a coroutine to make it awaitable using asyncio.to_thread(). The example given in the official documentation is:
import asyncio
import time

def blocking_io():
    print(f"start blocking_io at {time.strftime('%X')}")
    # Note that time.sleep() can be replaced with any blocking
    # IO-bound operation, such as file operations.
    time.sleep(1)
    print(f"blocking_io complete at {time.strftime('%X')}")

async def main():
    print(f"started main at {time.strftime('%X')}")
    await asyncio.gather(
        asyncio.to_thread(blocking_io),
        asyncio.sleep(1))
    print(f"finished main at {time.strftime('%X')}")

asyncio.run(main())

# Expected output:
#
# started main at 19:50:53
# start blocking_io at 19:50:53
# blocking_io complete at 19:50:54
# finished main at 19:50:54
This seems like a more joined up approach than using concurrent.futures to make a coroutine, but I haven't tested it extensively.
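For existing functions that take arguments and return values, asyncio.to_thread() forwards positional and keyword arguments and hands back the return value. A small sketch, where fetch_blocking is a hypothetical placeholder for real blocking code (Python 3.9 or later is assumed):

```python
import asyncio
import time


def fetch_blocking(name, delay=0.1):
    # Hypothetical blocking call; stands in for an HTTP request or file read.
    time.sleep(delay)
    return name.upper()


async def main():
    # Positional and keyword arguments are passed through to the function.
    return await asyncio.gather(
        asyncio.to_thread(fetch_blocking, "first"),
        asyncio.to_thread(fetch_blocking, "second", delay=0.2),
    )


results = asyncio.run(main())
print(results)
```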
Related
I am trying to learn to use asyncio in Python to optimize scripts.
My example produces a "coroutine was never awaited" warning. Can you help me understand it and find out how to solve it?
import time
import datetime
import random
import asyncio
import aiohttp
import requests
def requete_bloquante(num):
    print(f'Get {num}')
    uid = requests.get("https://httpbin.org/uuid").json()['uuid']
    print(f"Res {num}: {uid}")

def faire_toutes_les_requetes():
    for x in range(10):
        requete_bloquante(x)

print("Bloquant : ")
start = datetime.datetime.now()
faire_toutes_les_requetes()
exec_time = (datetime.datetime.now() - start).seconds
print(f"Pour faire 10 requêtes, ça prend {exec_time}s\n")

async def requete_sans_bloquer(num, session):
    print(f'Get {num}')
    async with session.get("https://httpbin.org/uuid") as response:
        uid = (await response.json()['uuid'])
        print(f"Res {num}: {uid}")

async def faire_toutes_les_requetes_sans_bloquer():
    loop = asyncio.get_event_loop()
    with aiohttp.ClientSession() as session:
        futures = [requete_sans_bloquer(x, session) for x in range(10)]
        loop.run_until_complete(asyncio.gather(*futures))
    loop.close()
    print("Fin de la boucle !")

print("Non bloquant : ")
start = datetime.datetime.now()
faire_toutes_les_requetes_sans_bloquer()
exec_time = (datetime.datetime.now() - start).seconds
print(f"Pour faire 10 requêtes, ça prend {exec_time}s\n")
The first classic part of the code runs correctly, but the second half only produces:
synchronicite.py:43: RuntimeWarning: coroutine 'faire_toutes_les_requetes_sans_bloquer' was never awaited
You made faire_toutes_les_requetes_sans_bloquer an awaitable function, a coroutine, by using async def.
When you call an awaitable function, you create a new coroutine object. The code inside the function won't run until you then await on the function or run it as a task:
>>> async def foo():
...     print("Running the foo coroutine")
...
>>> foo()
<coroutine object foo at 0x10b186348>
>>> import asyncio
>>> asyncio.run(foo())
Running the foo coroutine
You want to keep that function synchronous, because you don't start the loop until inside that function:
def faire_toutes_les_requetes_sans_bloquer():
    loop = asyncio.get_event_loop()
    # ...
    loop.close()
    print("Fin de la boucle !")
However, you are also trying to use an aiohttp.ClientSession() object, and that's an asynchronous context manager: you are expected to use it with async with, not just with, so it has to be run inside an awaitable task. If you use with instead of async with, a TypeError("Use async with instead") exception will be raised.
That all means you need to move the loop.run_until_complete() call out of your faire_toutes_les_requetes_sans_bloquer() function, so you can keep that as the main task to be run; you can then call and await asyncio.gather() directly:
async def faire_toutes_les_requetes_sans_bloquer():
    async with aiohttp.ClientSession() as session:
        futures = [requete_sans_bloquer(x, session) for x in range(10)]
        await asyncio.gather(*futures)
    print("Fin de la boucle !")

print("Non bloquant : ")
start = datetime.datetime.now()
asyncio.run(faire_toutes_les_requetes_sans_bloquer())
exec_time = (datetime.datetime.now() - start).seconds
print(f"Pour faire 10 requêtes, ça prend {exec_time}s\n")
I used the new asyncio.run() function (Python 3.7 and up) to run the single main task. This creates a dedicated loop for that top-level coroutine and runs it until complete.
Next, you need to move the closing ) parenthesis on the await response.json() expression:
uid = (await response.json())['uuid']
You want to access the 'uuid' key on the result of the await, not the coroutine that response.json() produces.
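The precedence rule behind this can be checked without a network call. In the sketch below, fetch_json is a hypothetical stand-in for response.json(); the subscript binds tighter than await, so the unparenthesized form tries to index the coroutine object and raises TypeError.

```python
import asyncio


async def fetch_json():
    # Hypothetical stand-in for response.json(): a coroutine returning a dict.
    await asyncio.sleep(0)
    return {"uuid": "1234"}


async def main():
    # Correct: await first, then index the resulting dict.
    good = (await fetch_json())["uuid"]

    # Incorrect: the subscript is applied to the coroutine object itself,
    # which is not subscriptable, so TypeError is raised before any await.
    coro = fetch_json()
    try:
        await coro["uuid"]
    except TypeError as exc:
        error = type(exc).__name__
    finally:
        coro.close()  # avoid a "never awaited" warning for the unused coroutine
    return good, error


print(asyncio.run(main()))
```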
With those changes your code works, but the asyncio version finishes in sub-second time; you may want to print microseconds:
exec_time = (datetime.datetime.now() - start).total_seconds()
print(f"Pour faire 10 requêtes, ça prend {exec_time:.3f}s\n")
On my machine, the synchronous requests code runs in about 4-5 seconds, and the asyncio code completes in under 0.5 seconds.
Do not call loop.run_until_complete inside an async function. The purpose of that method is to run an async function from a sync context. Anyway, here's how you should change the code:
async def faire_toutes_les_requetes_sans_bloquer():
    async with aiohttp.ClientSession() as session:
        futures = [requete_sans_bloquer(x, session) for x in range(10)]
        await asyncio.gather(*futures)
    print("Fin de la boucle !")

loop = asyncio.get_event_loop()
loop.run_until_complete(faire_toutes_les_requetes_sans_bloquer())
Note that calling faire_toutes_les_requetes_sans_bloquer() on its own only creates a coroutine object, which has to be either awaited explicitly (for that you have to be inside an async context) or passed to an event loop. When it is left alone, Python complains. Your original code does neither.
Not sure if this was the issue for you, but for me the response from the coroutine was another coroutine, so my code started warning me (note: not actually crashing) that I had created coroutines that were never awaited. After I actually awaited them (although I didn't really use the response), the warning went away.
The main line I added was:
content_from_url_as_str: list[str] = await asyncio.gather(*content_from_url, return_exceptions=True)
inspired after I saw:
response: str = await content_from_url[0]
Full code:
"""
-- Notes from [1]
Threading and asyncio both run on a single processor and therefore only run one at a time [1]. It's cooperative concurrency.
Note: threads.py has a very good block with good definitions for io-bound, cpu-bound if you need to recall it.
Note: coroutine is an important definition to understand before proceeding. Definition provided at the end of this tutorial.
General idea for asyncio is that there is a general event loop that controls how and when each task gets run.
The event loop is aware of each task and knows what states they are in.
For simplicity of exposition assume there are only two states:
a) Ready state
b) Waiting state
a) indicates that a task has work to do and can be run - while b) indicates that a task is waiting for a response from an
external thing (e.g. io, printer, disk, network, coq, etc). This simplified event loop has two lists of tasks
(ready_to_run_lst, waiting_lst) and runs things from the ready to run list. Once a task runs it is in complete control
until it cooperatively hands back control to the event loop.
The way it works is that the task that was run does what it needs to do (usually an io operation, or an interleaved op
or something like that) but crucially it gives control back to the event loop when the running task (with control) thinks is best.
(Note that this means the task might not have fully completed getting what it "fully needs".
This is probably useful when the user wants to implement the interleaving himself.)
Once the task cooperatively gives back control to the event loop it is placed by the event loop in either the
ready to run list or waiting list (depending how fast the io ran, etc). Then the event loop goes through the waiting
list to see if anything waiting has "returned".
Once all the tasks have been sorted into the right list the event loop is able to choose what to run next (e.g. by
choosing the one that has been waiting to be run the longest). This repeats until the event loop code you wrote is done.
The crucial point (and distinction with threads) that we want to emphasize is that in asyncio, an operation is never
interrupted in the middle and every switching/interleaving is done deliberately by the programmer.
In a way you don't have to worry about making your code thread safe.
For more details see [2], [3].
Asyncio syntax:
i) await = this is where the code you wrote calls an expensive function (e.g. an io) and thus hands back control to the
event loop. Then the event loop will likely put it in the waiting list and run some other task. Likely eventually
the event loop comes back to this function and runs the remaining code given that we have the value from the io now.
await = the key word that does (mainly) two things 1) gives control back to the event loop to see if there is something
else to run if we called it on a real expensive io operation (e.g. calling network, printer, etc) 2) gives control to
the new coroutine (code that might give up control cooperatively) that it is awaiting. If this is your own code with async
then it means it will go into this new async function (coroutine) you defined.
No real async benefits are being experienced until you call (await) a real io e.g. asyncio.sleep is the typical debug example.
todo: clarify, I think await doesn't actually give control back to the event loop but instead runs the "coroutine" this
await is pointing to. This means that if it's a real IO then it will actually give it back to the event loop
to do something else. In this case it is actually doing something "in parallel" in the async way.
Otherwise, it is your own python coroutine and thus gives it the control but "no true async parallelism" happens.
ii) async = approximately a flag that tells python the defined function might use await. This is not strictly true but
it gives you a simple model while you're getting started. todo - clarify async.
async = defines a coroutine. This doesn't define a real io, it only defines a function that can give up and give the
execution power to other coroutines or the (asyncio) event loop.
todo - context manager with async
iii) awaiting = when you call something (e.g. a function) that usually requires waiting for the io response/return/value.
todo: though it seems it's also the python keyword to give control to a coroutine you wrote in python or give
control to the event loop assuming you're awaiting an actual io call.
iv) async with = this creates a context manager from an object you would normally await - i.e. an object you would
wait to get the return value from an io. So usually we swap out (switch) from this object.
todo - e.g.
Note: - any function that calls await needs to be marked with async or you’ll get a syntax error otherwise.
- a task never gives up control without intentionally doing so e.g. never in the middle of an op.
Cons: - note how this also requires thinking more carefully (but feels less dangerous than threading due to no pre-emptive
switching) due to the concurrency. Another disadvantage is again the idiosyncrasies of using this in python + learning
new syntax and details for it to actually work.
- understanding the semantics of new syntax + learning where to really put the syntax to avoid semantic errors.
- we needed a special asyncio compatible lib for requests, since the normal requests is not designed to inform
the event loop that it's blocked (or done blocking)
- if one of the tasks doesn't cooperate properly then the whole code can be a mess and slow it down.
- not all libraries support the async IO paradigm in python (e.g. asyncio, trio, etc).
Pro: + although learning where to put await and async might be annoying, it forces you to think carefully about your code
which in itself can be an advantage (e.g. better, faster, fewer bugs due to thinking carefully)
+ often faster...? (skeptical)
1. https://realpython.com/python-concurrency/
2. https://realpython.com/async-io-python/
3. https://stackoverflow.com/a/51116910/6843734
todo - read [2] later (or [3] but thats not a tutorial and its more details so perhaps not a priority).
asynchronous = 1) dictionary def: not happening at the same time
e.g. happening independently 2) computing def: happening independently of the main program flow
coroutine = coroutines are computer program components that generalize subroutines for non-preemptive multitasking, by allowing execution to be suspended and resumed.
So basically it's a routine/"function" that can give up control in "a controlled way" (i.e. not randomly like with threads).
Usually they are associated with a single process -- so it's concurrent but not parallel.
Interesting note: Coroutines are well-suited for implementing familiar program components such as cooperative tasks, exceptions, event loops, iterators, infinite lists and pipes.
Likely we have an event loop in this document as an example. I guess yield and operators too are good examples!
Interesting contrast with subroutines: Subroutines are special cases of coroutines.[3] When subroutines are invoked, execution begins at the start,
and once a subroutine exits, it is finished; an instance of a subroutine only returns once, and does not hold state between invocations.
By contrast, coroutines can exit by calling other coroutines, which may later return to the point where they were invoked in the original coroutine;
from the coroutine's point of view, it is not exiting but calling another coroutine.
Coroutines are very similar to threads. However, coroutines are cooperatively multitasked, whereas threads are typically preemptively multitasked.
event loop = event loop is a programming construct or design pattern that waits for and dispatches events or messages in a program.
Appendix:
For I/O-bound problems, there’s a general rule of thumb in the Python community:
“Use asyncio when you can, threading when you must.”
asyncio can provide the best speed up for this type of program, but sometimes you will require critical libraries that
have not been ported to take advantage of asyncio.
Remember that any task that doesn’t give up control to the event loop will block all of the other tasks
-- Notes from [2]
see asyncio_example2.py file.
The sync file should have taken longer e.g. in one run the async file took:
Downloaded 160 sites in 0.4063692092895508 seconds
While the sync option took:
Downloaded 160 in 3.351937770843506 seconds
"""
import asyncio
from asyncio import Task
from asyncio.events import AbstractEventLoop
import aiohttp
from aiohttp import ClientResponse
from aiohttp.client import ClientSession
from typing import Coroutine
import time
async def download_site(session: ClientSession, url: str) -> str:
    async with session.get(url) as response:
        print(f"Read {response.content_length} from {url}")
        return response.text()

async def download_all_sites(sites: list[str]) -> list[str]:
    # async with = this creates a context manager from an object you would normally await - i.e. an object you would wait to get the return value from an io. So usually we swap out (switch) from this object.
    async with aiohttp.ClientSession() as session:  # we will usually await session.FUNCS
        # create all the download code as coroutines/tasks to be later managed/run by the event loop
        tasks: list[Task] = []
        for url in sites:
            # creates a task from a coroutine todo: basically it seems it creates a callable coroutine? (i.e. function that is able to give up control cooperatively or runs an external io and also thus gives back control cooperatively to the event loop). read more? https://stackoverflow.com/questions/36342899/asyncio-ensure-future-vs-baseeventloop-create-task-vs-simple-coroutine
            task: Task = asyncio.ensure_future(download_site(session, url))
            tasks.append(task)
        # runs tasks/coroutines in the event loop and aggregates the results. todo: does this halt until all coroutines have returned? I think so due to the paradigm of how async code works.
        content_from_url: list[ClientResponse.text] = await asyncio.gather(*tasks, return_exceptions=True)
        assert isinstance(content_from_url[0], Coroutine)  # note all responses are coroutines
        print(f'result after aggregating/doing all coroutine tasks/jobs = {content_from_url=}')
        # this is needed since the response is in a coroutine object for some reason
        content_from_url_as_str: list[str] = await asyncio.gather(*content_from_url, return_exceptions=True)
        print(f'result after getting response from coroutines that hold the text = {content_from_url_as_str=}')
        return content_from_url_as_str
if __name__ == "__main__":
    # - args
    num_sites: int = 80
    sites: list[str] = ["https://www.jython.org", "http://olympus.realpython.org/dice"] * num_sites
    start_time: float = time.time()

    # - run the same 160 tasks but without async paradigm, should be slower!
    # note: you can't actually do this here because you have the async definitions to your functions.
    # to test the synchronous version see the synchronous.py file. Then compare the two run times.
    # await download_all_sites(sites)
    # download_all_sites(sites)

    # - Execute the coroutine coro and return the result.
    asyncio.run(download_all_sites(sites))

    # - run event loop manager and run all tasks with cooperative concurrency
    # asyncio.get_event_loop().run_until_complete(download_all_sites(sites))

    # makes explicit the creation of the event loop that manages the coroutines & external ios
    # event_loop: AbstractEventLoop = asyncio.get_event_loop()
    # asyncio.run(download_all_sites(sites))

    # makes explicit the creation of the coroutine, with its args, before it has been run
    # event_loop: AbstractEventLoop = asyncio.get_event_loop()
    # download_all_sites_coroutine: Coroutine = download_all_sites(sites)
    # asyncio.run(download_all_sites_coroutine)

    # - print stats about the content download and duration
    duration = time.time() - start_time
    print(f"Downloaded {len(sites)} sites in {duration} seconds")
    print('Success.\a')
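A likely reason the gathered results are coroutines is that text() is itself a coroutine method, so returning response.text() without await hands back an un-run coroutine. The sketch below reproduces that behaviour without aiohttp: FakeResponse is a hypothetical stand-in, not the real aiohttp class. Awaiting inside the function would presumably make the second gather unnecessary.

```python
import asyncio


class FakeResponse:
    """Hypothetical stand-in for an aiohttp response; text() is a coroutine method."""

    def __init__(self, body):
        self._body = body

    async def text(self):
        await asyncio.sleep(0)  # pretend to read from the network
        return self._body


async def download_without_await(response):
    return response.text()  # returns an un-run coroutine object


async def download_with_await(response):
    return await response.text()  # returns the str itself


async def main():
    pending = await download_without_await(FakeResponse("hello"))
    is_coroutine = asyncio.iscoroutine(pending)
    pending.close()  # discard it cleanly instead of leaving it un-awaited
    text = await download_with_await(FakeResponse("hello"))
    return is_coroutine, text


print(asyncio.run(main()))
```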
Let's assume I'm new to asyncio. I'm using async/await to parallelize my current project, and I've found myself passing all of my coroutines to asyncio.ensure_future. Lots of stuff like this:
coroutine = my_async_fn(*args, **kwargs)
task = asyncio.ensure_future(coroutine)
What I'd really like is for a call to an async function to return an executing task instead of an idle coroutine. I created a decorator to accomplish what I'm trying to do.
def make_task(fn):
    def wrapper(*args, **kwargs):
        return asyncio.ensure_future(fn(*args, **kwargs))
    return wrapper

@make_task
async def my_async_func(*args, **kwargs):
    # usually making a request of some sort
    pass
Does asyncio have a built-in way of doing this I haven't been able to find? Am I using asyncio wrong if I'm led to this problem to begin with?
asyncio had a @task decorator in very early pre-release versions, but we removed it.
The reason is that the decorator has no knowledge of which loop to use.
asyncio doesn't instantiate a loop on import; moreover, a test suite usually creates a new loop per test for the sake of test isolation.
Does asyncio have a built-in way of doing this I haven't been able to find?
No, asyncio doesn't have a decorator to turn coroutine functions into tasks.
Am I using asyncio wrong if I'm led to this problem to begin with?
It's hard to say without seeing what you're doing, but I think it may well be true. While creating tasks is a usual operation in asyncio programs, I doubt you have that many coroutines that should always be tasks.
Awaiting a coroutine is a way to "call some function asynchronously", but it blocks the current execution flow until the coroutine has finished:
await some()
# you'll reach this line *only* when some() is done
A task, on the other hand, is a way to "run a function in the background"; it won't block the current execution flow:
task = asyncio.ensure_future(some())
# you'll reach this line immediately
When we write asyncio programs we usually need the first way, since we usually need the result of one operation before starting the next:
text = await request(url)
links = parse_links(text)  # we need to reach this line only when we have 'text'
Creating a task, on the other hand, usually means that the code that follows doesn't depend on the task's result. But again, that isn't always the case.
Since ensure_future returns immediately, some people try to use it as a way to run several coroutines concurrently:
# wrong way to run concurrently:
asyncio.ensure_future(request(url1))
asyncio.ensure_future(request(url2))
asyncio.ensure_future(request(url3))
The correct way to achieve this is to use asyncio.gather:
# correct way to run concurrently:
await asyncio.gather(
    request(url1),
    request(url2),
    request(url3),
)
Maybe this is what you want?
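The difference between awaiting one coroutine after another and handing them all to asyncio.gather can be measured with a small sketch. The request function below is hypothetical; its sleep stands in for network latency.

```python
import asyncio
import time


async def request(url, delay=0.1):
    # Hypothetical request; the sleep stands in for network latency.
    await asyncio.sleep(delay)
    return f"response from {url}"


async def sequential(urls):
    # Each await finishes before the next request starts.
    return [await request(url) for url in urls]


async def concurrent(urls):
    # gather() schedules all requests at once and preserves argument order.
    return await asyncio.gather(*(request(url) for url in urls))


async def timed(fn, urls):
    start = time.perf_counter()
    results = await fn(urls)
    return results, time.perf_counter() - start


urls = ["url1", "url2", "url3"]
seq_results, seq_time = asyncio.run(timed(sequential, urls))
con_results, con_time = asyncio.run(timed(concurrent, urls))
print(f"sequential: {seq_time:.2f}s, concurrent: {con_time:.2f}s")
```

Both variants return the same results in the same order; only the elapsed time differs.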
Upd:
I think using tasks in your case is a good idea. But I don't think you should use a decorator: the coroutine's functionality (making a request) is still separate from the concrete detail of how it is used (as a task). If controlling the synchronization of requests is separate from their main functionality, it also makes sense to move the synchronization into a separate function. I would do something like this:
import asyncio

async def request(i):
    print(f'{i} started')
    await asyncio.sleep(i)
    print(f'{i} finished')
    return i

async def when_ready(conditions, coro_to_start):
    await asyncio.gather(*conditions, return_exceptions=True)
    return await coro_to_start

async def main():
    t = asyncio.ensure_future
    t1 = t(request(1))
    t2 = t(request(2))
    t3 = t(request(3))
    t4 = t(when_ready([t1, t2], request(4)))
    t5 = t(when_ready([t2, t3], request(5)))
    await asyncio.gather(t1, t2, t3, t4, t5)

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    try:
        loop.run_until_complete(main())
    finally:
        loop.run_until_complete(loop.shutdown_asyncgens())
        loop.close()
I am trying to perform several non-blocking tasks with asyncio and aiohttp and I don't think the way I am doing it is efficient. I think it would be best to use await instead of yield. Can anyone help?
def __init__(self):
    self.event_loop = asyncio.get_event_loop()

def run(self):
    tasks = [
        asyncio.ensure_future(self.subscribe()),
        asyncio.ensure_future(self.getServer()),
    ]
    self.event_loop.run_until_complete(asyncio.gather(*tasks))
    try:
        self.event_loop.run_forever()

@asyncio.coroutine
def getServer(self):
    server = yield from self.event_loop.create_server(handler, ip, port)
    return server

@asyncio.coroutine
def subscribe(self):
    while True:
        yield from asyncio.sleep(10)
        self.sendNotification(self.sub.recieve())

def sendNotification(self, msg):
    # send message as a client
I have to listen to a server and subscribe to listen to broadcasts and depending on the broadcasted message POST to a different server.
According to the PEP 492:
await , similarly to yield from , suspends execution of read_data
coroutine until db.fetch awaitable completes and returns the result
data.
It uses the yield from implementation with an extra step of validating
its argument. await only accepts an awaitable , which can be one of:
So I don't see an efficiency problem in your code, as they use the same implementation.
However, I do wonder why you return the server but never use it.
The main design mistake I see in your code is that you use both:
self.event_loop.run_until_complete(asyncio.gather(*tasks))
try:
    self.event_loop.run_forever()
From what I can see you just need the run_forever()
Some extra tips:
In my implementations using asyncio I usually make sure that the loop is closed in case of error, or this can cause a massive leak depending on your app type.
try:
    loop.run_until_complete(asyncio.gather(*tasks))
finally:  # close the loop no matter what or you leak FDs
    loop.close()
I also use Uvloop instead of the builtin one, according to benchmarks it's much more efficient.
import uvloop
...
loop = uvloop.new_event_loop()
asyncio.set_event_loop(loop)
await will not be more efficient than yield from. It may be more Pythonic, but
async def foo():
    await some_future
and
@asyncio.coroutine
def foo():
    yield from some_future
are approximately the same. Certainly in terms of efficiency, they are very close. await is implemented using logic very similar to yield from. (There's an additional method call to __await__ involved, but that is typically lost in the noise.)
In terms of efficiency, removing the explicit sleep and polling in your subscribe method seems like the primary target in this design. Rather than sleeping for a fixed period of time, it would be better to get a future that indicates when the receive call will succeed, and to run subscribe's task only when receive has data.
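One way to get that behaviour is an asyncio.Queue: the subscriber awaits queue.get() and is resumed only when a message is available. This is a sketch under assumptions, not the asker's design; the publisher coroutine, the message values and the None sentinel are all invented for illustration.

```python
import asyncio


async def publisher(queue, messages):
    # Hypothetical broadcaster; puts messages on the queue as they "arrive".
    for msg in messages:
        await asyncio.sleep(0.01)
        await queue.put(msg)
    await queue.put(None)  # sentinel: no more messages


async def subscribe(queue, handled):
    # No fixed sleep: the task is suspended until queue.get() has data.
    while True:
        msg = await queue.get()
        if msg is None:
            break
        handled.append(msg)


async def main():
    queue = asyncio.Queue()
    handled = []
    await asyncio.gather(
        publisher(queue, ["a", "b", "c"]),
        subscribe(queue, handled),
    )
    return handled


print(asyncio.run(main()))
```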
I would like to use the asyncio module in Python to run request tasks in parallel, because my current request tasks work in sequence, which means they block.
I have read the documentation of the asyncio module, and I have written some simple code as follows; however, it doesn't work as I expected.
import asyncio

class Demo(object):
    def demo(self):
        loop = asyncio.get_event_loop()
        tasks = [task1.version(), task2.version()]
        result = loop.run_until_complete(asyncio.wait(tasks))
        loop.close()
        print(result)

class Task():
    @asyncio.coroutine
    def version(self):
        print('before')
        result = yield from differenttask.GetVersion()
        # result = yield from asyncio.sleep(1)
        print('after')
I found that all the examples they give use asyncio functions to make the non-blocking behaviour work. How do I make my own function work with asyncio?
What I want to achieve is that a task executes its request and, instead of waiting for the response, switches to the next task. When I tried this, I got RuntimeError: Task got bad yield: 'hostname', where hostname is one item in my expected result.
So, as @AndrewSvetlov said, differenttask.GetVersion() is a regular synchronous function. I have tried the second method suggested in a similar post, the one that begins "Keep your synchronous implementation of searching...":
@asyncio.coroutine
def version(self):
    return (yield from asyncio.get_event_loop().run_in_executor(None, self._proxy.GetVersion()))
And it still doesn't work. Now the error is:
Task exception was never retrieved
future: <Task finished coro=<Task.version() done, defined at /root/syi.py:34> exception=TypeError("'dict' object is not callable",)>
I'm not sure if I understand it right, please advise.
Change to
@asyncio.coroutine
def version(self):
    return (yield from asyncio.get_event_loop()
            .run_in_executor(None, self._proxy.GetVersion))
Please note that self._proxy.GetVersion is not called here; a reference to the function is passed to the loop's executor.
Now all IO performed by GetVersion() is still synchronous but executed in a thread pool.
It may have benefits for you or may not.
If the whole program only uses a thread-pool-based solution, you probably need concurrent.futures.ThreadPoolExecutor rather than asyncio.
If most of the application is built on top of asynchronous libraries and only a relatively small part uses thread pools, that's fine.
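The same pattern in async/await syntax, including how to pass arguments, might look like the sketch below. get_version is a hypothetical stand-in for the blocking GetVersion() call; run_in_executor accepts positional arguments after the callable, and functools.partial covers keyword arguments.

```python
import asyncio
import functools


def get_version(host, timeout=1.0):
    # Hypothetical blocking RPC call returning a dict, as in the question.
    return {"hostname": host, "timeout": timeout}


async def main():
    loop = asyncio.get_running_loop()
    # Pass the callable and its positional arguments separately...
    first = await loop.run_in_executor(None, get_version, "alpha")
    # ...and use functools.partial when keyword arguments are needed.
    second = await loop.run_in_executor(
        None, functools.partial(get_version, "beta", timeout=5.0)
    )
    return first, second


print(asyncio.run(main()))
```

Writing get_version("alpha") inside the run_in_executor call would instead run the function immediately on the event loop thread and pass its dict result as the "callable", which is what produces the 'dict' object is not callable error.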
I have successfully built a RESTful microservice with Python asyncio and aiohttp that listens to a POST event to collect realtime events from various feeders.
It then builds an in-memory structure to cache the last 24h of events in a nested defaultdict/deque structure.
Now I would like to periodically checkpoint that structure to disc, preferably using pickle.
Since the memory structure can be >100MB I would like to avoid holding up my incoming event processing for the time it takes to checkpoint the structure.
I'd rather create a snapshot copy (e.g. deepcopy) of the structure and then take my time to write it to disk and repeat on a preset time interval.
I have been searching for examples on how to combine threads (and is a thread even the best solution for this?) and asyncio for that purpose but could not find something that would help me.
Any pointers to get started are much appreciated!
It's pretty simple to delegate a method to a thread or sub-process using BaseEventLoop.run_in_executor:
import asyncio
import time
from concurrent.futures import ProcessPoolExecutor

def cpu_bound_operation(x):
    time.sleep(x)  # This is some operation that is CPU-bound

@asyncio.coroutine
def main():
    # Run cpu_bound_operation in the ProcessPoolExecutor
    # This will make your coroutine block, but won't block
    # the event loop; other coroutines can run in meantime.
    yield from loop.run_in_executor(p, cpu_bound_operation, 5)

loop = asyncio.get_event_loop()
p = ProcessPoolExecutor(2)  # Create a ProcessPool with 2 processes
loop.run_until_complete(main())
As for whether to use a ProcessPoolExecutor or ThreadPoolExecutor, that's kind of hard to say; pickling a large object will definitely eat some CPU cycles, which initially would make you think ProcessPoolExecutor is the way to go. However, passing your 100MB object to a Process in the pool would require pickling the instance in your main process, sending the bytes to the child process via IPC, unpickling it in the child, and then pickling it again so you can write it to disk. Given that, my guess is the pickling/unpickling overhead will be large enough that you're better off using a ThreadPoolExecutor, even though you're going to take a performance hit because of the GIL.
That said, it's very simple to test both ways and find out for sure, so you might as well do that.
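Applied to the checkpointing question, the thread-pool variant could be sketched as follows. The cache layout, the file name and the helper names are assumptions for illustration. Note that copy.deepcopy() itself still runs on the event loop thread and will hold up event processing for as long as the copy takes; only the pickling and the disk write are moved to the worker thread.

```python
import asyncio
import copy
import os
import pickle
import tempfile
from collections import defaultdict, deque
from concurrent.futures import ThreadPoolExecutor


def write_checkpoint(snapshot, path):
    # Runs in a worker thread; write to a temp file, then rename atomically.
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as fh:
        pickle.dump(snapshot, fh, protocol=pickle.HIGHEST_PROTOCOL)
    os.replace(tmp_path, path)


async def checkpoint(cache, path, executor):
    # The copy is taken on the event loop thread, so no handler can mutate
    # the cache while it is being copied; the write happens in the executor.
    snapshot = copy.deepcopy(cache)
    loop = asyncio.get_running_loop()
    await loop.run_in_executor(executor, write_checkpoint, snapshot, path)


async def main(path):
    cache = defaultdict(deque)  # hypothetical in-memory event cache
    cache["feeder-1"].append({"event": 1})
    with ThreadPoolExecutor(max_workers=1) as executor:
        await checkpoint(cache, path, executor)
    with open(path, "rb") as fh:
        return pickle.load(fh)


checkpoint_path = os.path.join(tempfile.mkdtemp(), "events.pickle")
restored = asyncio.run(main(checkpoint_path))
print(restored)
```

A defaultdict whose factory is a lambda cannot be pickled, so a nested structure may need a module-level factory function instead.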
I also used run_in_executor, but I found this function kinda gross under most circumstances, since it requires partial() for keyword args and I'm never calling it with anything other than a single executor and the default event loop. So I made a convenience wrapper around it with sensible defaults and automatic keyword argument handling.
from time import sleep
import asyncio as aio

loop = aio.get_event_loop()

class Executor:
    """In most cases, you can just use the 'execute' instance as a
    function, i.e. y = await execute(f, a, b, k=c) => run f(a, b, k=c) in
    the executor, assign result to y. The defaults can be changed, though,
    with your own instantiation of Executor, i.e. execute =
    Executor(nthreads=4)"""
    def __init__(self, loop=loop, nthreads=1):
        from concurrent.futures import ThreadPoolExecutor
        self._ex = ThreadPoolExecutor(nthreads)
        self._loop = loop
    def __call__(self, f, *args, **kw):
        from functools import partial
        return self._loop.run_in_executor(self._ex, partial(f, *args, **kw))

execute = Executor()
...
def cpu_bound_operation(t, alpha=30):
    sleep(t)
    return 20*alpha

async def main():
    y = await execute(cpu_bound_operation, 5, alpha=-2)

loop.run_until_complete(main())
Another alternative is to use loop.call_soon_threadsafe along with an asyncio.Queue as the intermediate channel of communication.
The current documentation for Python 3 also has a section on Developing with asyncio - Concurrency and Multithreading:
import asyncio

# This method represents your blocking code
def blocking(loop, queue):
    import time
    while True:
        loop.call_soon_threadsafe(queue.put_nowait, 'Blocking A')
        time.sleep(2)
        loop.call_soon_threadsafe(queue.put_nowait, 'Blocking B')
        time.sleep(2)

# This method represents your async code
async def nonblocking(queue):
    await asyncio.sleep(1)
    while True:
        queue.put_nowait('Non-blocking A')
        await asyncio.sleep(2)
        queue.put_nowait('Non-blocking B')
        await asyncio.sleep(2)

# The main sets up the queue as the communication channel and synchronizes them
async def main():
    queue = asyncio.Queue()
    loop = asyncio.get_running_loop()
    blocking_fut = loop.run_in_executor(None, blocking, loop, queue)
    nonblocking_task = loop.create_task(nonblocking(queue))
    running = True  # use whatever exit condition
    while running:
        # Get messages from both blocking and non-blocking in parallel
        message = await queue.get()
        # You could send any messages, and do anything you want with them
        print(message)

asyncio.run(main())
How to send asyncio tasks to loop running in other thread may also help you.
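For the opposite direction, handing a coroutine from ordinary threaded code to a loop running in another thread, the standard library provides asyncio.run_coroutine_threadsafe(). A minimal sketch, in which the double coroutine and the helper start_background_loop are made up for illustration:

```python
import asyncio
import threading


async def double(value):
    await asyncio.sleep(0.01)
    return value * 2


def start_background_loop():
    # Create a loop and run it forever in a daemon thread.
    loop = asyncio.new_event_loop()
    thread = threading.Thread(target=loop.run_forever, daemon=True)
    thread.start()
    return loop, thread


loop, thread = start_background_loop()

# From ordinary threaded code: submit a coroutine and block on its result.
future = asyncio.run_coroutine_threadsafe(double(21), loop)
result = future.result(timeout=1)
print(result)

# Stop the loop from the calling thread, then clean up.
loop.call_soon_threadsafe(loop.stop)
thread.join(timeout=1)
loop.close()
```

run_coroutine_threadsafe() returns a concurrent.futures.Future, so the calling thread can wait on it with a timeout without touching the loop directly.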
If you need a more "powerful" example, check out my Wrapper to launch async tasks from threaded code. It will handle the thread safety part for you (for the most part) and let you do things like this:
# See https://gist.github.com/Lonami/3f79ed774d2e0100ded5b171a47f2caf for the full example
async def async_main(queue):
    # your async code can go here
    while True:
        command = await queue.get()
        if command.id == 'print':
            print('Hello from async!')
        elif command.id == 'double':
            await queue.put(command.data * 2)

with LaunchAsync(async_main) as queue:
    # your threaded code can go here
    queue.put(Command('print'))
    queue.put(Command('double', 7))
    response = queue.get(timeout=1)
    print('The result of doubling 7 is', response)