I am new to Tornado and have some questions about Tornado's coroutines.
If I have a call stack that looks like:
func_a => func_b => func_c => func_d
and func_d is an asynchronous function in which I use yield and the @gen.coroutine decorator, like this:
@gen.coroutine
def redis_data(self, id):
    ret = yield asyn_function()
    raise gen.Return(ret)
Must I use yield and @gen.coroutine with func_c, func_b, and func_a as well?
Yes, all your coroutine's callers must also be coroutines, and they must yield the result of your coroutine.
Why? No coroutine can do I/O without executing a yield statement. Look at your code: might it need to talk to the server? Then it must yield. So must its caller, and so on up the chain, so that ultimately you have yielded to the event loop. Otherwise the loop cannot make progress and the I/O does not complete.
This is both a technical requirement of coroutine code, and an advantage of coroutines over threads. You always know by looking at your code when you can be interrupted:
https://glyph.twistedmatrix.com/2014/02/unyielding.html
For more on refactoring coroutines, see:
http://emptysqua.re/blog/refactoring-tornado-coroutines/
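The chain can be sketched with asyncio's native async/await syntax (function names and the simulated I/O below are illustrative, not from the original code); the same shape applies with Tornado's @gen.coroutine and yield:

```python
import asyncio

async def func_d():
    # the only function that actually performs (simulated) I/O
    await asyncio.sleep(0)
    return "data"

async def func_c():
    # awaits func_d, so it must itself be a coroutine...
    return await func_d()

async def func_b():
    # ...and so must every caller up the chain
    return await func_c()

async def func_a():
    return await func_b()

print(asyncio.run(func_a()))  # data
```

Removing any one of the await/async pairs breaks the chain: the coroutine object would be created but never driven by the event loop.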
Related
I'm transitioning from old-style coroutines (where 'yield' returns a value supplied by 'send', but
which are otherwise essentially generators) to new-style coroutines with 'async def' and 'await'.
There are a couple of things that really puzzle me.
Consider the following old-style coroutine that computes the running average of numbers supplied to
it by 'send', at each point returning the mean-so-far. (This example is from Chapter 16 of Fluent
Python by Luciano Ramalho.)
def averager():
    total = 0.0
    count = 0
    average = None
    while True:
        term = yield average
        total += term
        count += 1
        average = total/count
If I now create and prime a coroutine object, I can send it numbers and it will return the running
average:
>>> coro_avg = averager()
>>> next(coro_avg)
>>> coro_avg.send(10)
10.0
>>> coro_avg.send(30)
20.0
>>> coro_avg.send(5)
15.0
...and so forth. The question is, how would such a coroutine be written with async/await? There
are three points that confuse me. Do I understand them correctly?
1) In the old style, anybody can send numbers to the same instance of the averager. I can pass
around the value coro_avg above and every time .send(N) is called, no matter from where, N is added to the same running
total. With async/await, however, there's no way to "send in a value". Each time you 'await' a
coroutine you await a new instance with its own context, its own variable values.
2) It seems that the only way for an 'async def' coroutine to hand a value back to the thing awaiting
it is to 'return' and hence lose context. You can't call 'yield' from inside an 'async
def' coroutine (or rather if you do you've created an async generator which
can't be used with await). So an 'async def' coroutine can't compute a value and hand
it out while maintaining context, as averager does.
3) Almost the same as (1): When a coroutine calls 'await' it waits for a single, specific awaitable,
namely the argument to await. This is very unlike old-style coroutines, which give up control and
sit around waiting for anyone to send something to them.
I realize that the new coroutines are a distinct coding paradigm from the old ones: They're used
with event loops, and you use data structures like queues to have the coroutine emit a value without
returning and losing context. It's kind of unfortunate and somewhat confusing that new and old share the same
name---coroutine---given that their call/return protocols are so different.
It is possible, and perhaps instructive, to relate the two models fairly directly. Modern coroutines are actually implemented, like the old ones, in terms of the (generalized) iterator protocol. The difference is that yielded values are automatically propagated upward through any number of coroutine callers (via an implicit yield from), whereas actual return values are packaged into StopIteration exceptions.
The purpose of this choreography is to inform the driver (the presumed “event loop”) of the conditions under which a coroutine may be resumed. That driver may resume the coroutine from an unrelated stack frame and may send data back into the execution—via the awaited object, since it is the only channel known to the driver—again, much as send communicates transparently through a yield from.
An example of such bidirectional communication:
class Send:
    def __call__(self, x):
        self.value = x
    def __await__(self):
        yield self          # same object for awaiter and driver
        return self.value   # a plain return; raising StopIteration in a generator is disallowed since PEP 479
async def add(x):
    return await Send() + x

def plus(a, b):  # a driver
    c = add(b)
    # Equivalent to next(c.__await__())
    c.send(None)(a)    # call the yielded Send to supply a value
    try:
        c.send(None)   # resume; the await now evaluates to a
    except StopIteration as si:
        return si.value
    raise RuntimeError("Didn't resume/finish")
A real driver would of course decide to call the result of send only after recognizing it as a Send.
Practically speaking, you don’t want to drive modern coroutines yourself; they are syntax optimized for exactly the opposite approach. It would, however, be straightforward to use a queue to handle one direction of the communication (as you already noted):
async def avg(q):
    n = s = 0
    while True:
        x = await q.get()
        if x is None:
            break
        n += 1; s += x
        yield s/n
import asyncio

async def test():
    q = asyncio.Queue()
    i = iter([10, 30, 5])
    await q.put(next(i))
    async for a in avg(q):
        print(a)
        await q.put(next(i, None))
Providing values that way is a bit painful, but it’s easy if they’re coming from another Queue or so.
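Alternatively, since async generators do support asend, the original averager can be kept almost verbatim as an async generator and driven with asend (a sketch; the priming call mirrors the old next()):

```python
import asyncio

async def averager():
    total = 0.0
    count = 0
    average = None
    while True:
        term = yield average
        total += term
        count += 1
        average = total / count

async def main():
    coro_avg = averager()
    await coro_avg.asend(None)       # prime it, like next() on the old version
    print(await coro_avg.asend(10))  # 10.0
    print(await coro_avg.asend(30))  # 20.0
    print(await coro_avg.asend(5))   # 15.0
    await coro_avg.aclose()

asyncio.run(main())
```

The instance keeps its context across sends, just as before; the difference is that every asend must itself be awaited inside a coroutine.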
I have this Cython code (simplified):
class Callback:
    async def foo(self):
        print('called')

cdef void call_foo(void* callback):
    print('call_foo')
    asyncio.wait_for(<object>callback.foo())

async def py_call_foo():
    call_foo(Callback())

async def example():
    loop.run_until_complete(py_call_foo())
What happens though: I get RuntimeWarning: coroutine Callback.foo was never awaited. And, in fact, it is never called. However, call_foo is called.
Any idea what's going on / how to get it to actually wait for Callback.foo to complete?
Extended version
In the example above some important details are missing: in particular, it is really difficult to get hold of the return value of call_foo. The real project setup has this:
A Bison parser that has rules. Rules are given a reference to a specially crafted struct, let's call it ParserState. This struct contains references to callbacks, which are called by the parser when rules match.
In Cython code, there's a class, let's call it Parser, that users of the package are supposed to extend to make their custom parsers. This class has methods which then need to be called from callbacks of ParserState.
Parsing is supposed to happen like this:
async def parse_file(file, parser):
    cdef ParserState state = allocate_parser_state(
        rule_callbacks,
        parser,
        file,
    )
    parse_with_bison(state)
The callbacks are of a general shape:
ctypedef void (*callback)(char* text, void* parser)
I have to admit I don't know how exactly asyncio implements await, and so I don't know if it is in general possible to do this with the setup that I have. My ultimate goal though is that multiple Python functions be able to iteratively parse different files, all at the same time more or less.
TLDR:
Coroutines must be await'ed or run by an event loop. A cdef function cannot await, but it can construct and return a coroutine.
Your actual problem is mixing synchronous with asynchronous code. Case in point:
async def example():
    loop.run_until_complete(py_call_foo())
This is similar to putting a subroutine in a Thread, but never starting it.
Even when started, this is a deadlock: the synchronous part would prevent the asynchronous part from running.
Asynchronous code must be awaited
An async def coroutine is similar to a def ...: yield generator: calling it only instantiates it. You must interact with it to actually run it:
def foo():
    print('running!')
    yield 1

bar = foo()       # no output!
print(next(bar))  # prints `running!` followed by `1`
Similarly, when you have an async def coroutine, you must either await it or schedule it in an event loop. Since asyncio.wait_for produces a coroutine, and you never await or schedule it, it is not run. This is the cause of the RuntimeWarning.
Note that the purpose of putting a coroutine into asyncio.wait_for is purely to add a timeout. It produces an asynchronous wrapper which must be await'ed.
async def call_foo(callback):
    print('call_foo')
    await asyncio.wait_for(callback.foo(), timeout=2)

asyncio.get_event_loop().run_until_complete(call_foo(Callback()))
Asynchronous functions need asynchronous instructions
The key for asynchronous programming is that it is cooperative: Only one coroutine executes until it yields control. Afterwards, another coroutine executes until it yields control. This means that any coroutine blocking without yielding control blocks all other coroutines as well.
In general, if something performs work without an await context, it is blocking. Notably, loop.run_until_complete is blocking. You have to call it from a synchronous function:
loop = asyncio.get_event_loop()

# async def function uses await
async def py_call_foo():
    await call_foo(Callback())

# non-await function is not async
def example():
    loop.run_until_complete(py_call_foo())

example()
Return values from coroutines
A coroutine can return results like a regular function.
async def make_result():
    await asyncio.sleep(0)
    return 1
If you await it from another coroutine, you directly get the return value:
async def print_result():
    result = await make_result()
    print(result)  # prints 1

asyncio.get_event_loop().run_until_complete(print_result())
To get the value from a coroutine inside a regular subroutine, use run_until_complete to run the coroutine:
def print_result():
    result = asyncio.get_event_loop().run_until_complete(make_result())
    print(result)

print_result()
A cdef/cpdef function cannot be a coroutine
Cython supports coroutines via yield from and await only for Python (def) functions; even a classical generator-based coroutine cannot be cdef:
Error compiling Cython file:
------------------------------------------------------------
cdef call_foo(callback):
    print('call_foo')
    yield from asyncio.wait_for(callback.foo(), timeout=2)
   ^
------------------------------------------------------------
testbed.pyx:10:4: 'yield from' not supported here
You are perfectly fine calling a synchronous cdef function from a coroutine. You are perfectly fine scheduling a coroutine from a cdef function.
But you cannot await from inside a cdef function, nor await a cdef function. If you need to do that, as in your example, use a regular def function.
You can however construct and return a coroutine in a cdef function. This allows you to await the result in an outer coroutine:
# inner coroutine
async def pingpong(what):
    print('pingpong', what)
    await asyncio.sleep(0)
    return what

# cdef layer to instantiate and return the coroutine
cdef make_pingpong():
    print('make_pingpong')
    return pingpong('nananana')

# outer coroutine
async def play():
    for i in range(3):
        result = await make_pingpong()
        print(i, '=>', result)

asyncio.get_event_loop().run_until_complete(play())
Note that despite the await, make_pingpong is not a coroutine. It is merely a factory for coroutines.
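That distinction can be checked from pure Python, with a plain def standing in for the cdef layer (it behaves the same in this respect):

```python
import asyncio
import inspect

async def pingpong(what):
    await asyncio.sleep(0)
    return what

def make_pingpong():  # stand-in for the cdef factory
    return pingpong('nananana')

print(inspect.iscoroutinefunction(make_pingpong))  # False: just a factory
print(asyncio.run(make_pingpong()))                # nananana
```

The factory itself is an ordinary function; only the object it returns is a coroutine, and that object must still be awaited or scheduled.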
I've been using py3.4's generator-based coroutines and in several places I've chained them by simply having one coroutine call return inner_coroutine() (like in the example below). However, I'm now converting them to use py3.5's native coroutines and I've found that no longer works as the inner coroutine doesn't get to run (see output from running the example below). In order for the native inner coroutine to run I need to use a return await inner_coroutine() instead of the original return inner_coroutine().
I expected chaining of native coroutines to work in the same way as the generator-based ones, and can't find any documentation stating otherwise. Am I missing something or is this an actual limitation of native coroutines?
import asyncio

@asyncio.coroutine
def coro():
    print("Inside coro")

@asyncio.coroutine
def outer_coro():
    print("Inside outer_coro")
    return coro()

async def native_coro():
    print("Inside native_coro")

async def native_outer_coro():
    print("Inside native_outer_coro")
    # return await native_coro()  # this works!
    return native_coro()

loop = asyncio.get_event_loop()
loop.run_until_complete(outer_coro())
loop.run_until_complete(native_outer_coro())
And the output from running that example:
Inside outer_coro
Inside coro
Inside native_outer_coro
foo.py:26: RuntimeWarning: coroutine 'native_coro' was never awaited
loop.run_until_complete(native_outer_coro())
This is the same content as another answer, but stated in a way that I think will be easier to understand as a response to the question.
The way python determines whether something is a generator or a normal function is whether it contains a yield statement.
This creates an ambiguity with #asyncio.coroutine.
Whether your coroutine executes immediately or whether it waits until the caller calls next on the resulting generator object depends on whether your code actually happens to include a yield statement.
Native coroutines are, by design, unambiguously generator-like (that is, lazy) even if they do not happen to include any await expressions. This provides predictable behavior, but does not permit the form of chaining you are using.
You can, as you pointed out, write
return await inner_coroutine()
Note, however, that with the await syntax the inner coroutine is constructed while the outer coroutine is executing in the event loop, whereas with the generator-based approach and no yield, the inner coroutine is constructed while the outer coroutine is being submitted to the event loop.
In most circumstances this difference does not matter.
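For completeness, a minimal native-coroutine chain using return await (illustrative names, not from the original code):

```python
import asyncio

async def inner():
    await asyncio.sleep(0)
    return 42

async def outer():
    # the inner coroutine must be awaited explicitly before returning its result
    return await inner()

print(asyncio.run(outer()))  # 42
```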
Your old version had incorrect logic and worked only because of the imperfect generator-based implementation. The new syntax made it possible to close this loophole and make asyncio more consistent.
The idea of coroutines is to work like this:
c = coro_func()      # create a coroutine object
coro_res = await c   # await this object to get its result
In this example...
@asyncio.coroutine
def outer():
    return inner()
...awaiting outer() should return the inner() coroutine object, not that object's result. But due to the imperfect implementation it awaits inner() (as if yield from inner() had been written).
With the new syntax asyncio works exactly as it should: it returns the coroutine object instead of its result. And since this coroutine object is never awaited (which usually indicates a mistake), you get the warning.
You can change your code like this to see it all clearly:
loop = asyncio.get_event_loop()
print('old res:', loop.run_until_complete(outer_coro()))
print('new res:', loop.run_until_complete(native_outer_coro()))
In tornado 4.3 + python3, if I have many layers of async function, e.g.:
@gen.coroutine
def layer_2():
    return(yield async_func())

@gen.coroutine
def layer_1():
    return(yield layer_2())

@gen.coroutine
def main():
    return(yield layer_1())
Since an async function returns a Future (and yielding this Future returns its result), to get the return value of async_func in main, I have to:
In each callee, wrap the yielded Future in a return statement.
In each caller, to pass the value up the calling chain, yield the callee and wrap the returned value in a return statement again.
Is there any way to avoid this pattern?
This is the correct way to call coroutines from coroutines in Tornado. There is no "way to avoid this pattern", indeed this is how coroutines work by design!
For more info, see the Tornado coroutines guide or my Refactoring Tornado Coroutines.
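For comparison, the same layered pattern written with native async/await is just as explicit at every level (async_func here is a stand-in that simulates I/O):

```python
import asyncio

async def async_func():
    await asyncio.sleep(0)
    return "value"

async def layer_2():
    return await async_func()

async def layer_1():
    return await layer_2()

async def main():
    return await layer_1()

print(asyncio.run(main()))  # value
```

Each layer awaits its callee and returns the result; there is no shortcut that skips a layer.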
I have a WebSocketHandler in my Tornado application.
I am not sure whether this is the right way to make the code asynchronous.
class MyHandler(WebSocketHandler):
    def open(self):
        # do something ...
        self.my_coroutine_method()

    @gen.coroutine
    def my_coroutine_method(self):
        user = yield db.user.find_one()  # call the Motor asynchronous engine
        self.write_message(user)
Yes, this is correct. However, in some cases simply calling a coroutine without yielding can cause exceptions to be handled in unexpected ways, so I recommend using IOLoop.current().spawn_callback(self.my_coroutine_method) when calling a coroutine from a non-coroutine like this.
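In asyncio the analogous pattern is to schedule the coroutine as a task rather than calling it bare; the task then owns any exception instead of it being silently dropped. A sketch with the database call replaced by a sleep (names illustrative):

```python
import asyncio

async def my_coroutine_method():
    await asyncio.sleep(0)  # stands in for the Motor query
    return "user"

async def open_handler():
    # fire-and-forget scheduling; exceptions surface through the task
    task = asyncio.ensure_future(my_coroutine_method())
    print(await task)  # awaited here only to make the example deterministic

asyncio.run(open_handler())
```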