Python asyncio task got bad yield - python

I am confused about how to use the asyncio module in Python 3.4. I have a search API for a search engine, and I want each search request to run either in parallel or asynchronously, so that I don't have to wait for one search to finish before starting another.
Here is my high-level searching API to build some objects with the raw search results. The search engine itself is using some kind of asyncio mechanism, so I won't bother with that.
# No asyncio module used here now
class search(object):
    ...
    self.s = some_search_engine()
    ...
    def searching(self, *args, **kwargs):
        ret = {}
        # do some raw searching according to args and kwargs and build the wrapped results
        ...
        return ret
To make the requests asynchronous, I wrote the following test case to see how my code can interact with the asyncio module.
# Here is my testing script
@asyncio.coroutine
def handle(f, *args, **kwargs):
    r = yield from f(*args, **kwargs)
    return r

s = search()
loop = asyncio.get_event_loop()
loop.run_until_complete(handle(s.searching, arg1, arg2, ...))
loop.close()
When I run this with pytest, it raises RuntimeError: Task got bad yield: {results from searching...} when it hits the line r = yield from ....
I also tried another way.
# same handle as above
def handle(..):
    ....

s = search()
loop = asyncio.get_event_loop()
tasks = [
    asyncio.async(handle(s.searching, arg11, arg12, ...)),
    asyncio.async(handle(s.searching, arg21, arg22, ...)),
    ...
]
loop.run_until_complete(asyncio.wait(tasks))
loop.close()
When I run this test case with pytest, it passes, but the search engine raises some odd exception, and the message says Future/Task exception was never retrieved.
Things I wish to ask:
For my 1st try, is that the right way to use yield from, by returning the actual result from a function call?
I think I need to add some sleep to my 2nd test case to wait for the tasks to finish, but how should I do that? And how can I get the return values of my function calls in my 2nd test case?
Is creating an async handler to handle requests a good way to use asyncio with an existing module?
If the answer to question 2 is NO, does every client call to the class search need to include loop = get_event_loop() and similar boilerplate to make the requests asynchronous?

The problem is that you can't just call existing synchronous code as if it was an asyncio.coroutine and get asynchronous behavior. When you call yield from searching(...), you're only going to get asynchronous behavior if searching itself is actually an asyncio.coroutine, or at least returns an asyncio.Future. Right now, searching is just a regular synchronous function, so calling yield from searching(...) is just going to throw an error, because it doesn't return a Future or coroutine.
To get the behavior you want, you'll need to have an asynchronous version of searching in addition to a synchronous version (or just drop the synchronous version altogether if you don't need it). You have a few options to support both:
Rewrite searching as an asyncio.coroutine so that it uses asyncio-compatible calls to do its I/O, rather than blocking I/O. This will make it work in an asyncio context, but it means you won't be able to call it directly in a synchronous context anymore. Instead, you'd need to also provide an alternative synchronous searching method that starts an asyncio event loop and calls return loop.run_until_complete(self.searching(...)). See this question for more details on that.
Keep your synchronous implementation of searching, and provide an alternative asynchronous API that uses BaseEventLoop.run_in_executor to run the searching method in a background thread:
class search(object):
    ...
    self.s = some_search_engine()
    ...
    def searching(self, *args, **kwargs):
        ret = {}
        ...
        return ret

    @asyncio.coroutine
    def searching_async(self, *args, **kwargs):
        # pop 'loop' so it is not forwarded (assuming searching doesn't take loop as an arg)
        loop = kwargs.pop('loop', None) or asyncio.get_event_loop()
        # run_in_executor forwards positional arguments only, so keyword
        # arguments are bound with functools.partial (needs "import functools")
        func = functools.partial(self.searching, *args, **kwargs)
        # Passing None tells asyncio to use the default ThreadPoolExecutor
        r = yield from loop.run_in_executor(None, func)
        return r
Testing script:
s = search()
loop = asyncio.get_event_loop()
loop.run_until_complete(s.searching_async(arg1, arg2, ...))
loop.close()
This way, you can keep your synchronous code as is, and at least provide methods that can be used in asyncio code without blocking the event loop. It's not as clean a solution as it would be if you actually used asynchronous I/O in your code, but it's better than nothing.
Provide two completely separate versions of searching, one that uses blocking I/O, and one that's asyncio-compatible. This gives ideal implementations for both contexts, but requires twice the work.
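As a rough sketch of the second option in current Python syntax (async def / await rather than the @asyncio.coroutine / yield from form that Python 3.4 used), the wrapper below runs a blocking search in the default thread pool and gathers several requests at once. blocking_search is a hypothetical stand-in for searching, and the sleep only simulates I/O.

```python
import asyncio
import functools
import time


def blocking_search(query, delay=0.1):
    # Hypothetical stand-in for the synchronous searching() method.
    time.sleep(delay)  # simulates blocking I/O
    return {"query": query}


async def search_async(query, **kwargs):
    loop = asyncio.get_running_loop()
    # run_in_executor only forwards positional arguments, so keyword
    # arguments are bound with functools.partial first.
    call = functools.partial(blocking_search, query, **kwargs)
    # None selects the loop's default ThreadPoolExecutor.
    return await loop.run_in_executor(None, call)


async def search_many(queries):
    # gather() runs the wrapped calls concurrently and returns the
    # results in the same order as the inputs.
    return await asyncio.gather(*(search_async(q) for q in queries))


if __name__ == "__main__":
    print(asyncio.run(search_many(["a", "b", "c"])))
```

Because gather returns the results directly, this also covers the "how do I get my function calls to return" part of the question without adding any sleep.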

Related

Python asyncio ensure_future decorator

Let's assume I'm new to asyncio. I'm using async/await to parallelize my current project, and I've found myself passing all of my coroutines to asyncio.ensure_future. Lots of stuff like this:
coroutine = my_async_fn(*args, **kwargs)
task = asyncio.ensure_future(coroutine)
What I'd really like is for a call to an async function to return an executing task instead of an idle coroutine. I created a decorator to accomplish what I'm trying to do.
def make_task(fn):
    def wrapper(*args, **kwargs):
        return asyncio.ensure_future(fn(*args, **kwargs))
    return wrapper

@make_task
async def my_async_func(*args, **kwargs):
    # usually making a request of some sort
    pass
Does asyncio have a built-in way of doing this I haven't been able to find? Am I using asyncio wrong if I'm led to this problem to begin with?
asyncio had a @task decorator in very early pre-release versions, but we removed it.
The reason is that a decorator has no knowledge of which loop to use.
asyncio doesn't instantiate a loop on import; moreover, a test suite usually creates a new loop per test for the sake of test isolation.
Does asyncio have a built-in way of doing this I haven't been able to
find?
No, asyncio doesn't have a decorator to turn coroutine functions into tasks.
Am I using asyncio wrong if I'm led to this problem to begin with?
It's hard to say without seeing what you're doing, but I think that may be the case. While creating tasks is a common operation in asyncio programs, I doubt you have that many coroutines that should always be tasks.
Awaiting a coroutine is a way to "call some function asynchronously", but it blocks the current execution flow until the coroutine has finished:
await some()
# you'll reach this line *only* when some() done
A task, on the other hand, is a way to "run a function in the background"; it won't block the current execution flow:
task = asyncio.ensure_future(some())
# you'll reach this line immediately
When we write asyncio programs we usually need the first way, since we usually need the result of one operation before starting the next:
text = await request(url)
links = parse_links(text) # we need to reach this line only when we got 'text'
Creating a task, on the other hand, usually means that the code that follows doesn't depend on the task's result. But again, that isn't always the case.
Since ensure_future returns immediately, some people try to use it as a way to run coroutines concurrently:
# wrong way to run concurrently:
asyncio.ensure_future(request(url1))
asyncio.ensure_future(request(url2))
asyncio.ensure_future(request(url3))
The correct way to achieve this is to use asyncio.gather:
# correct way to run concurrently:
await asyncio.gather(
request(url1),
request(url2),
request(url3),
)
Maybe this is what you want?
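To make the difference between awaiting one coroutine after another and gathering them concrete, here is a small timing sketch. request is a hypothetical stand-in for a network call (it only sleeps), and timed is a helper made up for this example.

```python
import asyncio
import time


async def request(name, delay):
    # Hypothetical stand-in for a network request.
    await asyncio.sleep(delay)
    return name


async def sequentially():
    # Each await finishes before the next request starts.
    return [await request("a", 0.1), await request("b", 0.1)]


async def concurrently():
    # Both requests are in flight at the same time.
    return await asyncio.gather(request("a", 0.1), request("b", 0.1))


def timed(coro_fn):
    # Runs the coroutine function in a fresh loop and measures wall time.
    start = time.monotonic()
    result = asyncio.run(coro_fn())
    return result, time.monotonic() - start
```

Both variants return the same values in the same order; the gathered version should take roughly the time of the slowest request rather than the sum of all of them.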
Upd:
I think using tasks in your case is a good idea. But I don't think you should use a decorator: the coroutine's functionality (making a request) is still separate from the concrete detail of how it is used (as a task). If controlling how requests are synchronized is separate from their main functionality, it also makes sense to move the synchronization into a separate function. I would do something like this:
import asyncio

async def request(i):
    print(f'{i} started')
    await asyncio.sleep(i)
    print(f'{i} finished')
    return i

async def when_ready(conditions, coro_to_start):
    await asyncio.gather(*conditions, return_exceptions=True)
    return await coro_to_start

async def main():
    t = asyncio.ensure_future
    t1 = t(request(1))
    t2 = t(request(2))
    t3 = t(request(3))
    t4 = t(when_ready([t1, t2], request(4)))
    t5 = t(when_ready([t2, t3], request(5)))
    await asyncio.gather(t1, t2, t3, t4, t5)

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    try:
        loop.run_until_complete(main())
    finally:
        loop.run_until_complete(loop.shutdown_asyncgens())
        loop.close()

multiple nonblocking tasks using asyncio and aiohttp

I am trying to perform several non blocking tasks with asyncio and aiohttp and I don't think the way I am doing it is efficient. I think it would be best to use await instead of yield. can anyone help?
def __init__(self):
    self.event_loop = asyncio.get_event_loop()

def run(self):
    tasks = [
        asyncio.ensure_future(self.subscribe()),
        asyncio.ensure_future(self.getServer()),]
    self.event_loop.run_until_complete(asyncio.gather(*tasks))
    try:
        self.event_loop.run_forever()

@asyncio.coroutine
def getServer(self):
    server = yield from self.event_loop.create_server(handler, ip, port)
    return server

@asyncio.coroutine
def subscribe(self):
    while True:
        yield from asyncio.sleep(10)
        self.sendNotification(self.sub.recieve())

def sendNotification(self, msg):
    # send message as a client
I have to listen to a server and subscribe to listen to broadcasts and depending on the broadcasted message POST to a different server.
According to PEP 492:
await , similarly to yield from , suspends execution of read_data
coroutine until db.fetch awaitable completes and returns the result
data.
It uses the yield from implementation with an extra step of validating
its argument. await only accepts an awaitable , which can be one of:
So I don't see an efficiency problem in your code, as they use the same implementation.
However, I do wonder why you return the server but never use it.
The main design mistake I see in your code is that you use both:
self.event_loop.run_until_complete(asyncio.gather(*tasks))
try:
    self.event_loop.run_forever()
From what I can see you just need the run_forever()
Some extra tips:
In my implementations using asyncio I usually make sure that the loop is closed in case of error, or this can cause a massive leak depending on your app type.
try:
    loop.run_until_complete(asyncio.gather(*tasks))
finally:  # close the loop no matter what or you leak FDs
    loop.close()
I also use uvloop instead of the built-in loop; according to benchmarks it's much more efficient.
import uvloop
...
loop = uvloop.new_event_loop()
asyncio.set_event_loop(loop)
Await will not be more efficient than yield from. It may be more pythonic, but
async def foo():
    await some_future
and
@asyncio.coroutine
def foo():
    yield from some_future
are approximately the same. Certainly in terms of efficiency, they are very close. Await is implemented using logic very similar to yield from. (There's an additional method call to __await__ involved, but that is typically lost in the noise.)
In terms of efficiency, removing the explicit sleep and polling in your subscribe method seems like the primary target in this design. Rather than sleeping for a fixed period of time, it would be better to get a future that indicates when the receive call will succeed, and to run subscribe's task only when receive has data.
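One way to get that behavior, sketched under the assumption that the broadcast source can push messages into an asyncio.Queue (the question does not say whether it can): the subscriber suspends on queue.get() and wakes only when a message arrives. The None sentinel and the handle callback are inventions of this sketch.

```python
import asyncio


async def subscribe(queue, handle):
    # Suspends until a message is available; no sleep-and-poll loop.
    while True:
        msg = await queue.get()
        if msg is None:  # sentinel used by this sketch to stop the task
            break
        handle(msg)


async def main():
    queue = asyncio.Queue()
    received = []
    # Start the subscriber in the background.
    task = asyncio.ensure_future(subscribe(queue, received.append))
    for msg in ("one", "two"):
        await queue.put(msg)
    await queue.put(None)  # ask the subscriber to stop
    await task
    return received
```

If the receiving library only offers a blocking receive call, wrapping that call with run_in_executor would be the fallback.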

Make my own function as asyncio function in python

I would like to use the asyncio module in Python to run request tasks in parallel, because my current request tasks run in sequence, which means each one blocks.
I have read the documentation of the asyncio module and have written some simple code as follows; however, it doesn't work as I expected.
import asyncio

class Demo(object):
    def demo(self):
        loop = asyncio.get_event_loop()
        tasks = [task1.version(), task2.version()]
        result = loop.run_until_complete(asyncio.wait(tasks))
        loop.close()
        print(result)

class Task():
    @asyncio.coroutine
    def version(self):
        print('before')
        result = yield from differenttask.GetVersion()
        # result = yield from asyncio.sleep(1)
        print('after')
I found that all the examples they give use asyncio's own functions to make the non-blocking behavior work. How do I make my own function work as an asyncio one?
What I want to achieve is that a task sends its request and, instead of waiting for the response, switches to the next task. When I tried this, I got RuntimeError: Task got bad yield: 'hostname', where hostname is one item in my expected result.
So, as @AndrewSvetlov said, differenttask.GetVersion() is a regular synchronous function. I have tried the second method suggested in a similar post, the one that begins "Keep your synchronous implementation of searching...":
@asyncio.coroutine
def version(self):
    return (yield from asyncio.get_event_loop().run_in_executor(None, self._proxy.GetVersion()))
And it still doesn't work. Now the error is
Task exception was never retrieved
future: <Task finished coro=<Task.version() done, defined at /root/syi.py:34> exception=TypeError("'dict' object is not callable",)>
I'm not sure I understand it correctly; please advise.
Change to
@asyncio.coroutine
def version(self):
    return (yield from asyncio.get_event_loop()
            .run_in_executor(None, self._proxy.GetVersion))
Please note that self._proxy.GetVersion is not called here; a reference to the function is passed to the loop's executor.
Now all I/O performed by GetVersion() is still synchronous, but it is executed in a thread pool.
That may or may not be a benefit for you.
If the whole program uses only the thread-pool-based solution, you probably need concurrent.futures.ThreadPoolExecutor, not asyncio.
If most of the application is built on top of asynchronous libraries and only a relatively small part uses thread pools, that's fine.
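The difference between passing the function and passing its result can be reproduced in a few lines. This is only a sketch in current async/await syntax; get_version is a hypothetical stand-in for self._proxy.GetVersion.

```python
import asyncio


def get_version():
    # Hypothetical stand-in for self._proxy.GetVersion.
    return {"hostname": "example"}


async def wrong():
    loop = asyncio.get_running_loop()
    # get_version() runs right here, and its dict result is handed to the
    # executor as if it were the function to call.
    return await loop.run_in_executor(None, get_version())


async def right():
    loop = asyncio.get_running_loop()
    # The function object itself is passed; the executor calls it.
    return await loop.run_in_executor(None, get_version)
```

Awaiting wrong() fails with a TypeError because a dict is not callable, which matches the error in the question, while right() returns the dict.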

gevent to Tornado ioloop - Structure code with coroutines/generators

I'm trying to convert some fairly straightforward gevent code to use the async facilities of Tornado. The sample code below uses the ZMQ library to do a very simple request-response.
import zmq.green as zmq

def fun():
    i = zmq.Context.instance()
    sock = i.socket(zmq.REQ)
    sock.connect('tcp://localhost:9005')
    sock.send('Ping')
    return sock.recv()
I can run this as fun() anywhere in my code. The .recv() call blocks while waiting for a reply, and the gevent hub can schedule the other parts of the code. When values are received, the function returns the value.
I read the problems that can arise with these implicit returns, and I want to run this using the Tornado IOLoop (also because I want to run it within the IPython Notebook). The following is an option, where recv_future() returns a Future that contains the result:
@gen.coroutine
def fun():
    i = zmq.Context.instance()
    sock = i.socket(zmq.REQ)
    sock.connect('tcp://localhost:9005')
    sock.send('Ping')
    msg = yield recv_future(sock)
    print "Received {}".format(msg[0])
    raise gen.Return(msg)

def recv_future(socket):
    zmqstream = ZMQStream(socket)  # Required for ZMQ
    future = Future()
    def _finish(reply):
        future.set_result(reply)
    zmqstream.on_recv(_finish)
    return future
The problem is that now fun() is not a function, but is a generator. So if I need to call it from another function, I need to use yield fun(). But then the calling function also becomes a generator!
What is the right way to structure code that uses Python generators? Do I have to make every function a generator to make it work? What if I need to call one of these functions from __init__()? Should that also become a generator?
What if I need to call one of these functions from __init__()? Should
that also become a generator?
This is one of the currently unsolved issues with explicit asynchronous programming with yield /yield from (on Python 3.3+). Magic methods don't support them. You can read some interesting thoughts from a Python core developer on asynchronous programming that touches on this issue here.
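One workaround people commonly reach for, which the answer itself does not propose, is to keep __init__ synchronous and move the asynchronous setup into a factory coroutine. The sketch below uses the standard library's asyncio with async/await rather than Tornado, and fetch_greeting and Client are hypothetical names.

```python
import asyncio


async def fetch_greeting():
    # Hypothetical coroutine standing in for real asynchronous setup work.
    await asyncio.sleep(0)
    return "pong"


class Client:
    def __init__(self, greeting):
        # __init__ stays synchronous and only stores ready-made values.
        self.greeting = greeting

    @classmethod
    async def create(cls):
        # The asynchronous work happens here, before __init__ is called.
        greeting = await fetch_greeting()
        return cls(greeting)
```

Callers then write client = await Client.create() from inside another coroutine instead of calling the constructor directly.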
What is the right way to structure code that uses Python generators?
Do I have to make every function a generator to make it work?
Not every function, but every function that you want to call a coroutine, and wait for that coroutine to finish before continuing. When you switch to an explicit asynchronous programming model, you generally want to go all-in with it - your entire program runs inside the tornado ioloop. So, with this toy example, you would just do:
import zmq
from zmq.eventloop.zmqstream import ZMQStream
from tornado.ioloop import IOLoop
from tornado import gen  # the code below uses gen.coroutine and gen.Return
from tornado.concurrent import Future

@gen.coroutine
def fun():
    i = zmq.Context.instance()
    sock = i.socket(zmq.REQ)
    sock.connect('tcp://localhost:9005')
    sock.send('Ping')
    msg = yield recv_future(sock)
    print "Received {}".format(msg[0])
    raise gen.Return(msg)

def recv_future(socket):
    zmqstream = ZMQStream(socket)  # Required for ZMQ
    future = Future()
    def _finish(reply):
        future.set_result(reply)
    zmqstream.on_recv(_finish)
    return future

if __name__ == "__main__":
    ioloop = IOLoop.instance()
    ioloop.add_callback(fun)
    ioloop.start()  # This will run fun, and then block forever.
    #ioloop.run_sync(fun)  # This will start the ioloop, run fun, then stop the ioloop
It looks like you might be able to get access to the ioloop IPython is using via the IPython.kernel API:
In [4]: from IPython.kernel.ioloop import manager
In [5]: manager.ioloop.IOLoop.instance()
Out[5]: <zmq.eventloop.ioloop.ZMQIOLoop at 0x4249ac8>

Simple async example with tornado python

I want to find a simple async server example.
I have a function that involves a lot of waiting, database transactions, etc.:
def blocking_task(n):
    for i in xrange(n):
        print i
        sleep(1)
    return i
I need to run this function in a separate process without blocking. Is that possible?
Tornado is designed to run all your operations in a single thread, but utilize asynchronous I/O to avoid blocking as much as possible. If the DB you're using has asynchronous Python bindings (ideally ones geared for Tornado specifically, like Motor for MongoDB or momoko for Postgres), then you'll be able to run your DB queries without blocking the server; no separate processes or threads needed.
To address the exact example you gave, where time.sleep(1) is called, you could use this approach to do it asynchronously via tornado coroutines:
#!/usr/bin/python
import tornado.web
from tornado.ioloop import IOLoop
from tornado import gen
import time

@gen.coroutine
def async_sleep(seconds):
    yield gen.Task(IOLoop.instance().add_timeout, time.time() + seconds)

class TestHandler(tornado.web.RequestHandler):
    @gen.coroutine
    def get(self):
        for i in xrange(100):
            print i
            yield async_sleep(1)
        self.write(str(i))
        self.finish()

application = tornado.web.Application([
    (r"/test", TestHandler),
])
application.listen(9999)
IOLoop.instance().start()
The interesting part is async_sleep. That method is creating an asynchronous Task, which is calling the ioloop.add_timeout method. add_timeout will run a specified callback after a given number of seconds, without blocking the ioloop while waiting for the timeout to expire. It expects two arguments:
add_timeout(deadline, callback) # deadline is the time at which to run the callback (here an absolute timestamp, time.time() + seconds); callback is the method to call once the deadline is reached.
As you can see in the example above, we're only actually providing one parameter to add_timeout explicitly in the code, which means we end up with this:
add_timeout(time.time() + seconds, ???)
We're not providing the expected callback parameter. In fact, when gen.Task executes add_timeout, it appends a callback keyword argument to the end of the explicitly provided parameters. So this:
yield gen.Task(loop.add_timeout, time.time() + seconds)
Results in this being executed inside gen.Task():
loop.add_timeout(time.time() + seconds, callback=gen.Callback(some_unique_key))
When gen.Callback is executed after the timeout, it signals that the gen.Task is complete, and the program execution will continue on to the next line. This flow is kind of difficult to fully understand, at least at first (it certainly was for me when I first read about it). It'll probably be helpful to read over the Tornado gen module documentation a few times.
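The same callback-to-future bridging can be sketched with the standard library's asyncio instead of Tornado's gen.Task. This is a swapped-in illustration, not the answer's code: loop.call_later plays the role of add_timeout, and the callback resolves a future that the coroutine is waiting on.

```python
import asyncio


async def async_sleep(seconds):
    loop = asyncio.get_running_loop()
    done = loop.create_future()
    # call_later is the callback-style API; the callback resolves the future.
    loop.call_later(seconds, done.set_result, None)
    # Awaiting the future suspends this coroutine without blocking the loop.
    await done


async def main():
    loop = asyncio.get_running_loop()
    start = loop.time()
    await async_sleep(0.1)
    # Elapsed time as seen by the event loop's clock.
    return loop.time() - start
```

The coroutine resumes on the line after await done only once the timer callback has fired, which is the flow described above for gen.Task and gen.Callback.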
import tornado.web
from tornado.ioloop import IOLoop
from tornado import gen
from tornado.concurrent import run_on_executor
from concurrent.futures import ThreadPoolExecutor  # `pip install futures` for python2

MAX_WORKERS = 16

class TestHandler(tornado.web.RequestHandler):
    executor = ThreadPoolExecutor(max_workers=MAX_WORKERS)

    """
    In below function goes your time consuming task
    """
    @run_on_executor
    def background_task(self):
        sm = 0
        for i in range(10 ** 8):
            sm = sm + 1
        return sm

    @tornado.gen.coroutine
    def get(self):
        """ Request that asynchronously calls background task. """
        res = yield self.background_task()
        self.write(str(res))

class TestHandler2(tornado.web.RequestHandler):
    @gen.coroutine
    def get(self):
        self.write('Response from server')
        self.finish()

application = tornado.web.Application([
    (r"/A", TestHandler),
    (r"/B", TestHandler2),
])
application.listen(5000)
IOLoop.instance().start()
When you run the code above, you can start a computationally expensive operation at http://127.0.0.1:5000/A that does not block the server; you can see this by visiting http://127.0.0.1:5000/B immediately after you visit http://127.0.0.1:5000/A.
Here is an update for Tornado 5.0, which adds a new method, IOLoop.run_in_executor. From the "Calling blocking functions" section of the Coroutine patterns chapter:
The simplest way to call a blocking function from a coroutine is to use IOLoop.run_in_executor, which returns Futures that are compatible with coroutines:
@gen.coroutine
def call_blocking():
    # first argument is the executor; None selects the default one
    yield IOLoop.current().run_in_executor(None, blocking_func, args)
Also, the documentation of run_on_executor says:
This decorator should not be confused with the similarly-named IOLoop.run_in_executor. In general, using run_in_executor when calling a blocking method is recommended instead of using this decorator when defining a method. If compatibility with older versions of Tornado is required, consider defining an executor and using executor.submit() at the call site.
In version 5.0, IOLoop.run_in_executor is recommended for the use case of calling blocking functions.
Python 3.5 introduced the async and await keywords (functions using these keywords are also called “native coroutines”). For compatibility with older versions of Python, you can use “decorated” or “yield-based” coroutines using the tornado.gen.coroutine decorator.
Native coroutines are the recommended form whenever possible. Only use decorated coroutines when compatibility with older versions of Python is required. Examples in the Tornado documentation will generally use the native form.
Translation between the two forms is generally straightforward:
# Decorated:                    # Native:

# Normal function declaration
# with decorator                # "async def" keywords
@gen.coroutine
def a():                        async def a():
    # "yield" all async funcs       # "await" all async funcs
    b = yield c()                   b = await c()
    # "return" and "yield"
    # cannot be mixed in
    # Python 2, so raise a
    # special exception.            # Return normally
    raise gen.Return(b)             return b
Other differences between the two forms of coroutine are outlined below.
Native coroutines:
are generally faster.
can use async for and async with statements which make some patterns much simpler.
do not run at all unless you await or yield them. Decorated coroutines can start running “in the background” as soon as they are called. Note that for both kinds of coroutines it is important to use await or yield so that any exceptions have somewhere to go.
Decorated coroutines:
have additional integration with the concurrent.futures package, allowing the result of executor.submit to be yielded directly. For native coroutines, use IOLoop.run_in_executor instead.
support some shorthand for waiting on multiple objects by yielding a list or dict. Use tornado.gen.multi to do this in native coroutines.
can support integration with other packages including Twisted via a registry of conversion functions. To access this functionality in native coroutines, use tornado.gen.convert_yielded.
always return a Future object. Native coroutines return an awaitable object that is not a Future. In Tornado the two are mostly interchangeable.
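The point that native coroutines do not run until awaited can be checked with a short standard-library sketch. It uses plain asyncio, not Tornado, so it only illustrates the native side of the comparison; work and the log list are made up for the example.

```python
import asyncio


async def work(log, name):
    log.append(name)
    return name


async def main():
    log = []
    coro = work(log, "native")   # creates a coroutine object; nothing runs yet
    created = list(log)          # snapshot: still empty
    # ensure_future schedules the coroutine to run in the background.
    task = asyncio.ensure_future(work(log, "task"))
    await asyncio.sleep(0)       # yield to the loop once so the task can run
    after_yield = list(log)      # the task ran, the bare coroutine did not
    await coro                   # now the coroutine body executes
    await task
    return created, after_yield, list(log)
```

The bare coroutine object stays idle until the await, while the object wrapped with ensure_future runs as soon as the loop gets control.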
Worth to see:
Simplest async/await example
