I want to use an httpx client as a class member, but the __del__ method cannot call await client.aclose().
e.g.:
import httpx

class Foo(object):
    def __init__(self):
        self.client = httpx.AsyncClient()

    def __del__(self):
        await self.client.aclose()  # SyntaxError: 'await' outside async function
Reference: https://www.python-httpx.org/async/#opening-and-closing-clients
How can I safely call aclose()?
Although this is an older question, I may have something compelling to share as I had a similar situation. To @Isabi's point (answered 2020-12-28), you need to use an event loop to decouple the client from your operations and then manually control its lifecycle.
In my case, I need more control over the client such that I can separate the Request from the sending and decide when the client is closed, so I can take advantage of session pooling, etc. The example provided below shows how to use httpx.AsyncClient as a class member and close it on exit.
In figuring this out, I bumped into an Asyncio learning curve but quickly discovered that it's ... actually not too bad. It's not as clean as Go[lang] but it starts making sense after an hour or two of fiddling around with it. Full disclosure: I still question whether this is 100% correct.
The critical pieces are in the __init__, close, and __del__ methods. What, to me, remains to be answered is whether using the httpx.AsyncClient in a context manager actually resets connections, etc. I can only assume it does because that's what makes sense to me. I can't help but wonder: is this even necessary?
import asyncio
import httpx
import time
from typing import Callable, List
from rich import print

class DadJokes:
    headers = dict(Accept='application/json')

    def __init__(self):
        """
        Since we want to reuse the client, we can't use a context manager that closes it.
        We need to use a loop to exert more control over when the client is closed.
        """
        self.client = httpx.AsyncClient(headers=self.headers)
        self.loop = asyncio.get_event_loop()

    async def close(self):
        # httpx.AsyncClient.aclose must be awaited!
        await self.client.aclose()

    def __del__(self):
        """
        A destructor is provided to ensure that the client and the event loop are closed at exit.
        """
        # Use the loop to call async close, then stop/close loop.
        self.loop.run_until_complete(self.close())
        self.loop.close()

    async def _get(self, url: str, idx: int = None):
        start = time.time()
        response = await self.client.get(url)
        print(response.json(), int((time.time() - start) * 1000), idx)

    def get(self, url: str):
        self.loop.run_until_complete(self._get(url))

    def get_many(self, urls: List[str]):
        start = time.time()
        group = asyncio.gather(*(self._get(url, idx=idx) for idx, url in enumerate(urls)))
        self.loop.run_until_complete(group)
        print("Runtime: ", int((time.time() - start) * 1000))

url = 'https://www.icanhazdadjoke.com'
dj = DadJokes()
dj.get_many([url for x in range(4)])
Since I've been using Go as of late, I originally wrote some of these methods with closures as they seemed to make sense; in the end I was able to (IMHO) strike a nice balance between separation / encapsulation / isolation by converting the closures to class methods.
The resulting usage interface feels approachable and easy to read - I see myself writing class-based async code moving forward.
The problem might be due to the fact that client.aclose() returns an awaitable, which cannot be awaited in a normal def function.
It could be worth giving a try with asyncio.run(self.client.aclose()). Here an exception might occur, complaining that you are using a different event loop (or the same, I don't know much about your context so I can't tell) from the currently running one. In this case you could get the currently running event loop and run the function from there.
See https://docs.python.org/3/library/asyncio-eventloop.html for more information on how you could accomplish it.
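A minimal sketch of that idea (the aclose helper method here is just for illustration, and whether this is actually safe depends on which loop the client was used with):

import asyncio
import httpx

class Foo:
    def __init__(self):
        self.client = httpx.AsyncClient()

    async def aclose(self):
        # httpx.AsyncClient.aclose() is a coroutine and must be awaited.
        await self.client.aclose()

    def __del__(self):
        try:
            # If a loop is already running in this thread, schedule the close on it.
            asyncio.get_running_loop().create_task(self.aclose())
        except RuntimeError:
            # No running loop: spin one up just to close the client.
            asyncio.run(self.aclose())

Even so, relying on __del__ for async cleanup stays fragile (for example during interpreter shutdown), so an explicit await client.aclose() or an async with block is generally safer.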
I'm building a library that leverages asyncio internally.
While the user shouldn't be aware of it, the internal implementation currently wraps the async code with the asyncio.run() porcelain wrapper.
However, some users will be executing this library code from a jupyter notebook, and I'm struggling to replace the asyncio.run() with a wrapper that's safe for either environment.
Here's what I've tried:
ASYNC_IO_NO_RUNNING_LOOP_MSG = 'no running event loop'

def jupyter_safe_run_coroutine(async_coroutine, _test_mode: bool = False):
    try:
        loop = asyncio.get_running_loop()
        task = loop.create_task(async_coroutine)
        result = loop.run_until_complete(task)  # <- fails as loop is already running
        # OR
        asyncio.wait_for(task, timeout=None, loop=loop)  # <- fails as this is an async method
        result = task.result()
    except RuntimeError as e:
        if _test_mode:
            raise e
        if ASYNC_IO_NO_RUNNING_LOOP_MSG in str(e):
            return asyncio.run(async_coroutine)
    except Exception as e:
        raise e
Requirements
We use Python 3.8, so we can't use the asyncio.Runner context manager
We can't use threading, so the solution suggested here would not work
Problem:
How can I wait/await for the async_coroutine, or the task/future provided by loop.create_task(async_coroutine) to be completed?
None of the methods above actually do the waiting, and for the reasons stated in the comments.
Update
I've found this nest_asyncio library that's built to solve this problem exactly:
ASYNC_IO_NO_RUNNING_LOOP_MSG = 'no running event loop'
HAS_BEEN_RUN = False

def jupyter_safe_run_coroutine(async_coroutine, _test_mode: bool = False):
    global HAS_BEEN_RUN
    if not HAS_BEEN_RUN:
        _apply_nested_asyncio_patch()
        HAS_BEEN_RUN = True
    return asyncio.run(async_coroutine)

def _apply_nested_asyncio_patch():
    try:
        loop = asyncio.get_running_loop()
        logger.info(f'as get_running_loop() returned {loop}, this environment has its own event loop.\n'
                    f'Patching with nest_asyncio')
        import nest_asyncio
        nest_asyncio.apply()
    except RuntimeError as e:
        if ASYNC_IO_NO_RUNNING_LOOP_MSG in str(e):
            logger.info(f'as get_running_loop() raised {e}, this environment does not have its own event loop.\n'
                        f'No patching necessary')
        else:
            raise e
Still, there are some issues I'm facing with it:
As per this SO answer, there might be starvation issues
Any logs written in the async_coroutine are not printed in the jupyter notebook
The jupyter notebook kernel occasionally crashes upon completion of the task
Edit
For context, the library internally calls external APIs for data enrichment of a user-provided dataframe:
# user code using the library
import pandas as pd
import my_lib

df = pd.DataFrame(data='some data')
enriched_df = my_lib.enrich(df)
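Internally, the library's public entry point then wraps its async implementation with the jupyter_safe_run_coroutine wrapper above; a rough sketch (the _enrich_async coroutine is hypothetical):

# inside my_lib - a sketch of how enrich() might use the wrapper above
import pandas as pd

async def _enrich_async(df: pd.DataFrame) -> pd.DataFrame:
    # hypothetical: call the external enrichment APIs concurrently,
    # then attach the results to the dataframe
    ...
    return df

def enrich(df: pd.DataFrame) -> pd.DataFrame:
    # public synchronous entry point, intended to work in scripts and notebooks
    return jupyter_safe_run_coroutine(_enrich_async(df))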
It's usually a good idea to expose the asynchronous function. This way you will give your users more flexibility.
If some of your users can't (or don't want to) use asynchronous calls to your functions, they will be able to call the async function using asyncio.run(your_function()). Or, in the rare situation where they have an event loop running but can't make async calls, they could use the create_task + add_done_callback method described here. (I really have no idea why such a use case may happen, but for the sake of the argument I included it.)
Hiding the asynchronous interface from your users is not the best idea because it limits their capabilities. They will probably fork your package to patch it and make the exposed function async, or call the hidden async function directly. Neither of which is good news for you (harder to document / track bugs). I would really suggest sticking to the simplest solution and providing the async functions as the main entry points.
Suppose the following package code followed by 3 different usage of it:
async def package_code():
    return "package"
Client code
Typical clients will probably just use it this way:
async def client_code_a():
    print(await package_code())

# asyncio.run(client_code_a())
For some people, the following might make sense. For example, if your package is the only asynchronous thing they will ever use. Or maybe they are not yet comfortable using async code (these you can probably convince to try client_code_a instead):
def client_code_b():
    print(asyncio.run(package_code()))

# client_code_b()
The very few (I'm tempted to say none):
async def client_code_c():
    # asyncio.run() cannot be called from a running event loop:
    # print(asyncio.run(package_code()))
    loop = asyncio.get_running_loop()
    task = loop.create_task(package_code())
    task.add_done_callback(lambda t: print(t.result()))

# asyncio.run(client_code_c())
I'm still not sure to understand what your goal is, but I'll describe with code what I tried to explain in my comment so you can tell me where your issue lies in the following.
If your package requires the user to call some functions (your_package_function in the example) that take coroutines as arguments, then you shouldn't worry about the event loop.
That means the package shouldn't call asyncio.run nor loop.run_until_complete. The client should (in almost all cases) be responsible for starting the event loop.
Your package code should assume there is an event loop running. Since I don't know your package's goal I just made a function that feeds a "test" argument to any coroutine the client is passing:
import asyncio

async def your_package_function(coroutine):
    print("- Package internals start")
    task = asyncio.create_task(coroutine("test"))
    await asyncio.sleep(.5)  # Simulates slow tasks within your package
    print("- Package internals completed other task")
    x = await task
    print("- Package internals end")
    return x
The client (package user) should then call the following:
async def main():
    x = await your_package_function(return_with_delay)
    print(f"Computed value = {x}")

async def return_with_delay(value):
    print("+ User function start")
    await asyncio.sleep(.2)
    print("+ User function end")
    return value

await main()
# or asyncio.run(main()) if needed
This would print:
- Package internals start
- Package internals completed other task
+ User function start
+ User function end
- Package internals end
Computed value = test
My ThreadPoolExecutor/gen.coroutine(tornado v4.x) solution to circumvent blocking the webserver is not working anymore with tornado version 6.x.
A while back I started to develop an online browser game using a Tornado webserver (v4.x) and websockets. Whenever user input is expected, the game sends the question to the client and waits for the response. Back then I used gen.coroutine and a ThreadPoolExecutor to make this task non-blocking. Now that I have started refactoring the game, it is not working with tornado v6.x and the task is blocking the server again. I searched for possible solutions, but so far I have been unable to get it working again. It is not clear to me how to change my existing code to be non-blocking again.
server.py:
class PlayerWebSocket(tornado.websocket.WebSocketHandler):
    executor = ThreadPoolExecutor(max_workers=15)

    @run_on_executor
    def on_message(self, message):
        params = message.split(':')
        self.player.callbacks[int(params[0])] = params[1]

if __name__ == '__main__':
    application = Application()
    application.listen(9999)
    tornado.ioloop.IOLoop.instance().start()
player.py:
@gen.coroutine
def send(self, message):
    self.socket.write_message(message)

def create_choice(self, id, choices):
    d = {}
    d['id'] = id
    d['choices'] = choices
    self.choice[d['id']] = d
    self.send('update', self)
    while not d['id'] in self.callbacks:
        pass
    del self.choice[d['id']]
    return self.callbacks[d['id']]
Whenever a choice is to be made, the create_choice function creates a dict with a list (choices) and an id and stores it in the players self.callbacks. After that it just stays in the while loop until the websocket.on_message function puts the received answer (which looks like this: id:Choice_id, so for example 1:12838732) into the callbacks dict.
The WebSocketHandler.write_message method is not thread-safe, so it can only be called from the IOLoop's thread, and not from a ThreadPoolExecutor (This has always been true, but sometimes it might have seemed to work anyway).
The simplest way to fix this code is to save IOLoop.current() in a global variable from the main thread (the current() function accesses a thread-local variable, so you can't call it from the thread pool) and use ioloop.add_callback(self.socket.write_message, message) (and remove @gen.coroutine from send - it doesn't do any good to make functions coroutines if they contain no yield expressions).
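A rough sketch of that fix, based on the question's code (Application, self.player, and self.socket come from the original snippets; the main_ioloop global and importing the server module from player.py are assumptions about where the loop is stored):

# server.py
import tornado.ioloop
import tornado.websocket
from concurrent.futures import ThreadPoolExecutor
from tornado.concurrent import run_on_executor

main_ioloop = None  # saved from the main thread before the loop starts

class PlayerWebSocket(tornado.websocket.WebSocketHandler):
    executor = ThreadPoolExecutor(max_workers=15)

    @run_on_executor
    def on_message(self, message):
        params = message.split(':')
        self.player.callbacks[int(params[0])] = params[1]

if __name__ == '__main__':
    main_ioloop = tornado.ioloop.IOLoop.current()  # capture it on the main thread
    application = Application()
    application.listen(9999)
    main_ioloop.start()

# player.py (no @gen.coroutine needed on send)
def send(self, message):
    # write_message is not thread-safe, so hand it over to the IOLoop's thread
    server.main_ioloop.add_callback(self.socket.write_message, message)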
I am confused about how to play around with the asyncio module in Python 3.4. I have a searching API for a search engine, and want each search request to be run either in parallel or asynchronously, so that I don't have to wait for one search to finish before starting another.
Here is my high-level searching API to build some objects with the raw search results. The search engine itself is using some kind of asyncio mechanism, so I won't bother with that.
# No asyncio module used here now
class search(object):
    ...
    self.s = some_search_engine()
    ...
    def searching(self, *args, **kwargs):
        ret = {}
        # do some raw searching according to args and kwargs and build the wrapped results
        ...
        return ret
To try to make the requests asynchronous, I wrote the following test case to see how my code can interact with the asyncio module.
# Here is my testing script
@asyncio.coroutine
def handle(f, *args, **kwargs):
    r = yield from f(*args, **kwargs)
    return r

s = search()
loop = asyncio.get_event_loop()
loop.run_until_complete(handle(s.searching, arg1, arg2, ...))
loop.close()
By running with pytest, it raises RuntimeError: Task got bad yield: {results from searching...} when it hits the line r = yield from ....
I also tried another way.
# same handle as above
def handle(..):
    ....

s = search()
loop = asyncio.get_event_loop()
tasks = [
    asyncio.async(handle(s.searching, arg11, arg12, ...)),
    asyncio.async(handle(s.searching, arg21, arg22, ...)),
    ...
]
loop.run_until_complete(asyncio.wait(tasks))
loop.close()
By running this test case with pytest, it passes, but some weird exception from the search engine is raised, and it says Future/Task exception was never retrieved.
Things I wish to ask:
For my 1st try, is that the right way to use yield from, by returning the actual result from a function call?
I think I need to add some sleep to my 2nd test case to wait for the tasks to finish, but how should I do that? And how can I get my function calls to return in my 2nd test case?
Is that a good way to implement asyncio with an existing module, by creating an async handler to handle requests?
If the answer to question 2 is NO, does every client call to the search class need to include loop = get_event_loop() and similar boilerplate to make the requests async?
The problem is that you can't just call existing synchronous code as if it was an asyncio.coroutine and get asynchronous behavior. When you call yield from searching(...), you're only going to get asynchronous behavior if searching itself is actually an asyncio.coroutine, or at least returns an asyncio.Future. Right now, searching is just a regular synchronous function, so calling yield from searching(...) is just going to throw an error, because it doesn't return a Future or coroutine.
To get the behavior you want, you'll need to have an asynchronous version of searching in addition to a synchronous version (or just drop the synchronous version altogether if you don't need it). You have a few options to support both:
Rewrite searching as an asyncio.coroutine so that it uses asyncio-compatible calls to do its I/O, rather than blocking I/O. This will make it work in an asyncio context, but it means you won't be able to call it directly in a synchronous context anymore. Instead, you'd also need to provide an alternative synchronous searching method that starts an asyncio event loop and calls return loop.run_until_complete(self.searching(...)). See this question for more details on that. A rough sketch of this approach follows these options.
Keep your synchronous implementation of searching, and provide an alternative asynchronous API that uses BaseEventLoop.run_in_executor to run your searching method in a background thread:
class search(object):
    ...
    self.s = some_search_engine()
    ...
    def searching(self, *args, **kwargs):
        ret = {}
        ...
        return ret

    @asyncio.coroutine
    def searching_async(self, *args, **kwargs):
        loop = kwargs.get('loop', asyncio.get_event_loop())
        try:
            del kwargs['loop']  # assuming searching doesn't take loop as an arg
        except KeyError:
            pass
        # Passing None tells asyncio to use the default ThreadPoolExecutor
        r = yield from loop.run_in_executor(None, self.searching, *args)
        return r
Testing script:
s = search()
loop = asyncio.get_event_loop()
loop.run_until_complete(s.searching_async(arg1, arg2, ...))
loop.close()
This way, you can keep your synchronous code as is, and at least provide methods that can be used in asyncio code without blocking the event loop. It's not as clean a solution as it would be if you actually used asynchronous I/O in your code, but it's better than nothing.
Provide two completely separate versions of searching, one that uses blocking I/O, and one that's asyncio-compatible. This gives ideal implementations for both contexts, but requires twice the work.
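For the first option, here is a rough, hypothetical sketch of the coroutine plus its synchronous wrapper; the actual search-engine calls are elided, with asyncio.sleep standing in for asyncio-compatible I/O:

import asyncio

class search(object):
    @asyncio.coroutine
    def searching_async(self, *args, **kwargs):
        # Asyncio-compatible implementation: all I/O is done via "yield from"
        # calls instead of blocking calls.
        yield from asyncio.sleep(0)  # stands in for real asyncio-compatible search I/O
        ret = {}
        return ret

    def searching(self, *args, **kwargs):
        # Synchronous wrapper for callers that are not running an event loop.
        loop = asyncio.get_event_loop()
        return loop.run_until_complete(self.searching_async(*args, **kwargs))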
I switched most of my threading implementations to multiprocessing today and everything went great -- except for louie dispatcher messages. Granted, that's probably not the latest publish/subscribe module, but I use it because I already have to use it with python-openzwave. I imagine this has something to do with messages not being able to be sent across processes. My question is, is there a way to do this with louie? If not -- is there a publish/subscribe message module that allows it? Thanks.
EDIT, was asked to post the code:
For example, here is a process that continually runs in the background and performs some computer/network/security checks:
The call to start the check class:
_ = utilities.Environment()
The environment class (just the init and the main function):
class Environment(object):
    def __init__(self):
        self.logger = logging.getLogger(genConfig.LOGGER_NAME)
        self.process = multiprocessing.Process(target=self.run_tests)
        self.process.daemon = True
        self.process.start()

    def run_tests(self):
        self.zwaveReceived = False
        while True:
            self.comp_test()
            self.net_test()
            self.server_test()
            self.audio_test()
            self.security_test()
            self.ups_test()
            self.zwave_test()
            time.sleep(genConfig.SYS_CHECKS_INTERVAL)
Within self.comp_test, the publish at the end (I've printed from here and know it is getting here):
if compTest > 0:
    wx.CallAfter(dispatcher.send, eventConfig.SYSCHK_LISTENER, orders=eventConfig.EVT_COMP_OFF)
else:
    wx.CallAfter(dispatcher.send, eventConfig.SYSCHK_LISTENER, orders=eventConfig.EVT_COMP_ON)
And one of the subscribers:
dispatcher.connect(self.flip_sys_btns, eventConfig.SYSCHK_LISTENER)
Like I said, I've print-stepped through and I get to where the publish is made, but I don't get to the subscriber side. The code worked well when I was using threads; nothing has changed except I switched to multiprocessing.
I am making a web application using Python + Tornado which basically serves files to users. I have no database.
The files are either directly picked up and served if they are available, or generated on the fly if not.
I want the clients to be served in an async manner, because some files may already be available, while others need to be generated (thus they need to wait, and I don't want them to block other users).
I have a class that manages the picking or generation of files, and I just need to call it from Tornado.
What is the best way (most efficient on CPU and RAM) to achieve that? Should I use a thread? A sub process? A simple gen.Task like this one?
Also, I would like my implementation to work on Google App Engines (I think they do not allow sub processes to be spawned?).
I'm relatively new to the async web servicing, so any help is welcome.
I've found the answers to my questions: the gen.Task example is indeed the best way to implement an async call, because the example uses a Python coroutine, which I didn't understand at first glance since I thought yield was only used to return a value from generators.
Concrete example:
class MyHandler(tornado.web.RequestHandler):
    @asynchronous
    @gen.engine
    def get(self):
        response = yield gen.Task(self.dosomething, 'argument')
What is important here is the combination of two things:
yield, which in fact spawns a coroutine (or pseudo-thread), which is very efficient and designed to be highly concurrency-friendly.
http://www.python.org/dev/peps/pep-0342/
gen.Task(), which is a non-blocking (async) function, because if you spawn a coroutine on a blocking function, it won't be async. gen.Task() is provided by Tornado specifically to work with the coroutine syntax of Python. More info:
http://www.tornadoweb.org/documentation/gen.html
So a canonical example of an async call in Python using coroutines:
response = yield non_blocking_func(**kwargs)
Now the documentation has a solution.
Simple example:
import os.path
import tornado.web
from tornado import gen

class MyHandler(tornado.web.RequestHandler):
    @gen.coroutine
    def get(self, filename):
        result = yield self.some_usefull_process(filename)
        self.write(result)

    @gen.coroutine
    def some_usefull_process(self, filename):
        if not os.path.exists(filename):
            status = yield self.generate_file(filename)
            result = 'File created'
        else:
            result = 'File exists'
        raise gen.Return(result)

    @gen.coroutine
    def generate_file(self, filename):
        fd = open(filename, 'w')
        fd.write('created')
        fd.close()
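For completeness, a minimal sketch of how such a handler might be wired into a Tornado application; the URL pattern and port are assumptions:

import tornado.ioloop
import tornado.web

if __name__ == '__main__':
    # The capture group provides the filename argument to MyHandler.get.
    application = tornado.web.Application([
        (r'/files/(.*)', MyHandler),
    ])
    application.listen(8888)
    tornado.ioloop.IOLoop.instance().start()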