Tornado async HTTP client blocks - Python

import tornado.gen
import tornado.httpclient
import tornado.ioloop
import tornado.queues

theQueue = tornado.queues.Queue()
theQueue.put_nowait('http://www.baidu.com')
theQueue.put_nowait('http://www.google.com')
theQueue.put_nowait('http://cn.bing.com/')

@tornado.gen.coroutine
def Test1():
    def cb(response):
        print str(response)
    while True:
        item = yield theQueue.get()
        print item
        tmp = tornado.httpclient.AsyncHTTPClient(force_instance=True)
        tmp.fetch(item, callback=cb)

@tornado.gen.coroutine
def Test2():
    while True:
        item = yield theQueue.get()
        print item
        tmp = tornado.httpclient.AsyncHTTPClient(force_instance=True)
        response = yield tmp.fetch(item)
        print str(response)

#Test1()
Test2()
tornado.ioloop.IOLoop.instance().start()
Python 2.6 and Tornado 4.2.
In Test1, it first prints the 3 items, then prints the 3 responses.
But in Test2, it prints each item and its response one by one.
I'm confused: why isn't Test2 asynchronous?

Test2() is asynchronous, but in a different way: the coroutine way.
A Tornado coroutine suspends when it reaches a yield, waiting for the asynchronous operation (in your case, fetching a web page via the HTTP client) to complete. Tornado switches to other runnable coroutines while the current one is suspended.
With coroutines, your code looks synchronous, and (if there is only one coroutine) it effectively runs synchronously too.
You can easily test the asynchronous behavior of Tornado's coroutines by using two or more coroutines:
@tornado.gen.coroutine
def Test2():
    while True:
        item = yield theQueue.get()
        print 'Test2:', item
        tmp = tornado.httpclient.AsyncHTTPClient(force_instance=True)
        response = yield tmp.fetch(item)
        print 'Test2:', str(response)

# Another test function, Test3, that does exactly the same thing as Test2.
@tornado.gen.coroutine
def Test3():
    while True:
        item = yield theQueue.get()
        print 'Test3:', item
        tmp = tornado.httpclient.AsyncHTTPClient(force_instance=True)
        response = yield tmp.fetch(item)
        print 'Test3:', str(response)

Test2()
Test3()
tornado.ioloop.IOLoop.instance().start()
You will see Test2 and Test3 run concurrently (though not truly in parallel) in this example.
The ability to switch between different routines to perform concurrent operations: that's what coroutine-based asynchrony means.
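If you want Test1's all-at-once behavior while keeping the coroutine style, Tornado's gen.coroutine also accepts a list of futures: start every fetch first, then yield the whole list to wait for them in parallel. A minimal sketch for Tornado 4.x; Test4 and the URL list are purely illustrative:
import tornado.gen
import tornado.httpclient
import tornado.ioloop

@tornado.gen.coroutine
def Test4():
    client = tornado.httpclient.AsyncHTTPClient()
    urls = ['http://www.baidu.com', 'http://www.google.com', 'http://cn.bing.com/']
    # fetch() with no callback returns a Future immediately; nothing has been
    # waited on yet, so all three requests are in flight at the same time.
    futures = [client.fetch(url) for url in urls]
    # Yielding a list makes the coroutine wait for all futures in parallel.
    responses = yield futures
    for response in responses:
        print response.code, response.effective_url

tornado.ioloop.IOLoop.instance().run_sync(Test4)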

Related

Beginner async/await question for API requests

I want to speed up some API requests... to figure out how, I copied some code that runs asynchronously, but when I try my own code it's no longer asynchronous. Can someone spot the mistake?
Copied code (found on Stack Overflow, I think):
#!/usr/bin/env python3
import asyncio

@asyncio.coroutine
def func_normal():
    print('A')
    yield from asyncio.sleep(5)
    print('B')
    return 'saad'

@asyncio.coroutine
def func_infinite():
    for i in range(10):
        print("--%d" % i)
    return 'saad2'

loop = asyncio.get_event_loop()
tasks = func_normal(), func_infinite()
a, b = loop.run_until_complete(asyncio.gather(*tasks))
print("func_normal()={a}, func_infinite()={b}".format(**vars()))
loop.close()
My "own" code (I need at the end a list returned and merge the results of all functions):
import asyncio
import time

@asyncio.coroutine
def say_after(start, count, say, yep=True):
    retl = []
    if yep:
        time.sleep(5)
    for x in range(start, count):
        retl.append(x)
    print(say)
    return retl

def main():
    print(f"started at {time.strftime('%X')}")
    loop = asyncio.get_event_loop()
    tasks = say_after(10, 20, "a"), say_after(20, 30, "b", False)
    a, b = loop.run_until_complete(asyncio.gather(*tasks))
    print("func_normal()={a}, func_infinite()={b}".format(**vars()))
    loop.close()
    c = a + b
    #print(c)
    print(f"finished at {time.strftime('%X')}")

main()
Or am I completely wrong and should I solve this with multithreading? What would be the best way to make API requests that each return a list I need to merge?
I added a comment to each section that needs improvement, and removed some code to simplify it.
In fact, I didn't find any performance uplift from wrapping range() in a coroutine and using async def; it might be worth it for heavier operations.
import asyncio
import time

# @asyncio.coroutine IS DEPRECATED since Python 3.8.
@asyncio.coroutine
def say_after(wait=True):
    result = []
    if wait:
        print("I'm sleeping!")
        # This BLOCKS the thread. It releases the GIL, so another thread could
        # run, but asyncio runs in ONE thread, so this still harms concurrency.
        time.sleep(5)
        print("'morning!")
    # A normal for loop is a BLOCKING operation too.
    for i in range(5):
        result.append(i)
        print(i, end='')
    print()
    return result

def main():
    start = time.time()
    # The loop argument will be DEPRECATED from Python 3.10.
    # Make main() a coroutine instead, then use asyncio.run(main()):
    # it runs inside the asyncio event loop without explicitly passing a loop.
    loop = asyncio.get_event_loop()
    tasks = say_after(), say_after(False)
    # As we will use asyncio.run(main()) from now on, this should be await-ed.
    a, b = loop.run_until_complete(asyncio.gather(*tasks))
    print(f"Took {time.time() - start:5f}")
    loop.close()

main()
Better way:
import asyncio
import time

async def say_after(wait=True):
    result = []
    if wait:
        print("I'm sleeping!")
        await asyncio.sleep(2)  # 'await' a coroutine version of it instead.
        print("'morning!")

    # Wrap the iterator in a generator - or coroutine...
    async def asynchronous_range(end):
        for _i in range(end):
            yield _i

    # ...and use it with async for.
    async for i in asynchronous_range(5):
        result.append(i)
        print(i, end='')
    print()
    return result

async def main():
    start = time.time()
    tasks = say_after(), say_after(False)
    a, b = await asyncio.gather(*tasks)
    print(f"Took {time.time() - start:5f}")

asyncio.run(main())
Result
Your code:
DeprecationWarning: "@coroutine" decorator is deprecated since Python 3.8, use "async def" instead
  def say_after(wait=True):
I'm sleeping!
'morning!
01234
01234
Took 5.003802
Better async code:
I'm sleeping!
01234
'morning!
01234
Took 2.013863
Note that the fixed code now finishes its job while the other task is sleeping.
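Back to the original goal (speeding up API requests): the same pattern applies. Each request becomes a coroutine, you gather them, and then merge the returned lists. A minimal sketch, assuming the aiohttp library is available; the URLs are purely illustrative and each endpoint is assumed to return a JSON list:
import asyncio
import aiohttp

async def fetch_list(session, url):
    # One API call; assumes the endpoint returns a JSON list.
    async with session.get(url) as resp:
        return await resp.json()

async def main(urls):
    async with aiohttp.ClientSession() as session:
        tasks = [fetch_list(session, url) for url in urls]
        results = await asyncio.gather(*tasks)  # requests run concurrently
    merged = []
    for lst in results:
        merged += lst
    return merged

# Illustrative URLs only:
urls = ["https://example.com/api/a", "https://example.com/api/b"]
merged = asyncio.run(main(urls))
print(merged)
Each request suspends at the await while waiting on the network, so the requests overlap instead of running back to back.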

Using threading.Timer with asyncio

I'm new to Python's asyncio feature and I have a server that processes websocket requests from a browser. Here's a simplified version of how it works:
@asyncio.coroutine
def web_client_connected(self, websocket):
    self.web_client_socket = websocket
    while True:
        request = yield from self.web_client_socket.recv()
        json_val = process_request(request)
        yield from self.socket_queue.put(json_val)

@asyncio.coroutine
def push_from_web_client_json_queue(self):
    while True:
        json_val = yield from self.socket_queue.get()
        yield from self.web_client_socket.send(json_val)
You have one loop looking for websocket requests coming in from the client. When it gets one, it processes it and puts the value onto a queue. Another loop looks for values on that queue, and when it finds one it sends the processed value back out on the websocket. Pretty straightforward, and it works.
What I want to do now is introduce a timer. When a request comes in and is done processing, instead of putting the response back on the queue immediately, I want to start a timer for 1 minute. When the timer finishes, then I want to put the response on the queue.
I've tried something like:
@asyncio.coroutine
def web_client_connected(self, websocket):
    self.web_client_socket = websocket
    while True:
        request = yield from self.web_client_socket.recv()
        json_val = process_request(request)
        t = threading.Timer(60, self.timer_done, json_val)
        t.start()

@asyncio.coroutine
def timer_done(self, args):
    yield from self.socket_queue.put(args)
It doesn't work though. The timer_done method is never called. If I remove the @asyncio.coroutine decorator and the yield from, then timer_done does get called, but then the call to self.socket_queue.put(args) doesn't work.
I think I'm misunderstanding something fundamental here. How do you do this?
Instead of a timer, use asyncio.ensure_future() and asyncio.sleep():
@asyncio.coroutine
def web_client_connected(self, websocket):
    self.web_client_socket = websocket
    while True:
        request = yield from self.web_client_socket.recv()
        json_val = process_request(request)
        asyncio.ensure_future(self.web_client_timer(json_val))
        yield

@asyncio.coroutine
def web_client_timer(self, json_val):
    yield from asyncio.sleep(60)
    yield from self.socket_queue.put(json_val)
Working example:
import asyncio

@asyncio.coroutine
def foo():
    print("enter foo")
    timers = []
    for i in range(10):
        print("Start foo", i)
        yield from asyncio.sleep(0.5)
        print("Got foo", i)
        timers.append(asyncio.ensure_future(timer(i)))
        yield
    print("foo waiting")
    # wait for all timers to finish
    yield from asyncio.wait(timers)
    print("exit foo")

@asyncio.coroutine
def timer(i):
    print("Setting timer", i)
    yield from asyncio.sleep(2)
    print("**** Timer", i)

loop = asyncio.get_event_loop()
resp = loop.run_until_complete(foo())
loop.close()
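As an aside, asyncio also has a built-in one-shot timer, loop.call_later(), which runs a plain (non-coroutine) callback after a delay; the callback can then schedule a coroutine with create_task(). A minimal standalone sketch; the names and the 1-second delay are illustrative:
import asyncio

@asyncio.coroutine
def put_on_queue(queue, json_val):
    yield from queue.put(json_val)

def timer_done(loop, queue, json_val):
    # Plain callback: don't 'yield from' here; schedule a coroutine instead.
    # (queue.put_nowait(json_val) would also work for an unbounded queue.)
    loop.create_task(put_on_queue(queue, json_val))

@asyncio.coroutine
def demo():
    loop = asyncio.get_event_loop()
    queue = asyncio.Queue()
    loop.call_later(1, timer_done, loop, queue, 'response')
    print('waiting...')
    value = yield from queue.get()
    print('got', value)

loop = asyncio.get_event_loop()
loop.run_until_complete(demo())
loop.close()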

Asynchronous RabbitMQ consumer with aioamqp

I'm trying to write an asynchronous consumer using asyncio/aioamqp. My problem is that the callback coroutine (below) blocks. I set the channel to do a basic_consume() and assign callback() as the callback. The callback has a "yield from asyncio.sleep" statement (to simulate "work"), which takes an integer from the publisher and sleeps for that amount of time before printing the message.
If I publish two messages, one with a time of "10" immediately followed by one with a time of "1", I expect the second message to print first, since it has a shorter sleep time. Instead, the callback blocks for 10 seconds, prints the first message, and then prints the second.
It appears that either basic_consume or the callback is blocking somewhere. Is there another way this could be handled?
import asyncio
import random
import aioamqp

@asyncio.coroutine
def callback(body, envelope, properties):
    yield from asyncio.sleep(int(body))
    print("consumer {} recved {} ({})".format(envelope.consumer_tag, body, envelope.delivery_tag))

@asyncio.coroutine
def receive_log():
    try:
        transport, protocol = yield from aioamqp.connect('localhost', 5672, login="login", password="password")
    except:
        print("closed connections")
        return
    channel = yield from protocol.channel()
    exchange_name = 'cloudstack-events'
    exchange_name = 'test-async-exchange'
    queue_name = 'async-queue-%s' % random.randint(0, 10000)
    yield from channel.exchange(exchange_name, 'topic', auto_delete=True, passive=False, durable=False)
    yield from asyncio.wait_for(channel.queue(queue_name, durable=False, auto_delete=True), timeout=10)
    binding_keys = ['mykey']
    for binding_key in binding_keys:
        print("binding", binding_key)
        yield from asyncio.wait_for(channel.queue_bind(exchange_name=exchange_name,
                                                       queue_name=queue_name,
                                                       routing_key=binding_key), timeout=10)
    print(' [*] Waiting for logs. To exit press CTRL+C')
    yield from channel.basic_consume(queue_name, callback=callback)

loop = asyncio.get_event_loop()
loop.create_task(receive_log())
loop.run_forever()
For those interested, I figured out a way to do this. I'm not sure if it's best practice, but it accomplishes what I need.
Rather than do the "work" (in this case, asyncio.sleep) inside the callback, I create a new task on the loop and schedule a separate coroutine, do_work(), to run. Presumably this works because it frees callback() up to return immediately.
I loaded up a few hundred events in Rabbit with different sleep timers, and they were interleaved when printed by the code below. So it seems to be working. Hope this helps someone!
import asyncio
import random
import aioamqp

@asyncio.coroutine
def do_work(envelope, body):
    yield from asyncio.sleep(int(body))
    print("consumer {} recved {} ({})".format(envelope.consumer_tag, body, envelope.delivery_tag))

@asyncio.coroutine
def callback(body, envelope, properties):
    loop = asyncio.get_event_loop()
    loop.create_task(do_work(envelope, body))

@asyncio.coroutine
def receive_log():
    try:
        transport, protocol = yield from aioamqp.connect('localhost', 5672, login="login", password="password")
    except:
        print("closed connections")
        return
    channel = yield from protocol.channel()
    exchange_name = 'cloudstack-events'
    exchange_name = 'test-async-exchange'
    queue_name = 'async-queue-%s' % random.randint(0, 10000)
    yield from channel.exchange(exchange_name, 'topic', auto_delete=True, passive=False, durable=False)
    yield from asyncio.wait_for(channel.queue(queue_name, durable=False, auto_delete=True), timeout=10)
    binding_keys = ['mykey']
    for binding_key in binding_keys:
        print("binding", binding_key)
        yield from asyncio.wait_for(channel.queue_bind(exchange_name=exchange_name,
                                                       queue_name=queue_name,
                                                       routing_key=binding_key), timeout=10)
    print(' [*] Waiting for logs. To exit press CTRL+C')
    yield from channel.basic_consume(queue_name, callback=callback)

loop = asyncio.get_event_loop()
loop.create_task(receive_log())
loop.run_forever()
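The pattern itself doesn't depend on RabbitMQ. Here's a broker-free sketch of the same idea (names and delays are illustrative): the callback returns immediately, so a slow message no longer blocks a fast one.
import asyncio

@asyncio.coroutine
def do_work(tag, delay):
    yield from asyncio.sleep(delay)  # stands in for the real "work"
    print('finished', tag, 'after', delay, 'seconds')

def on_message(tag, delay):
    # Returns immediately; the work runs in the background.
    asyncio.get_event_loop().create_task(do_work(tag, delay))

@asyncio.coroutine
def main():
    on_message('first', 3)    # slow message arrives first
    on_message('second', 1)   # fast message arrives second but finishes first
    yield from asyncio.sleep(4)  # keep the loop alive for the demo

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
loop.close()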

Calling a coroutine from asyncio.Protocol.data_received

This is similar to Calling coroutines in asyncio.Protocol.data_received but I think it warrants a new question.
I have a simple server set up like this:
loop.create_unix_server(lambda: protocol, path=serverSocket)
It works fine if I do this:
def data_received(self, data):
    data = b'data reply'
    self.send(data)
my client gets the reply. But I can't get it to work with any sort of asyncio call. I tried all of the following and none of them worked.
@asyncio.coroutine
def go(self):
    yield from asyncio.sleep(1, result=b'data reply')

def data_received(self, data):
    print('Data Received', flush=True)
    task = asyncio.get_event_loop().create_task(self.go())
    data = yield from asyncio.wait_for(task, 10)
    self.send(data)
That one hung and printed nothing (and if I decorate data_received with @asyncio.coroutine, I get a warning that the coroutine is never yielded from). OK, I get that using yield in data_received isn't right.
If I try a new event loop, as below, that hangs in run_until_complete
loop = asyncio.new_event_loop()
task = loop.create_task(self.go())
loop.run_until_complete(task)
data = task.result()
self.send(data)
If I use a Future, that also hangs in run_until_complete
@asyncio.coroutine
def go(self, future):
    yield from asyncio.sleep(1)
    future.set_result(b'data reply')

def data_received(self, data):
    print('Data Received', flush=True)
    loop = asyncio.new_event_loop()
    future = asyncio.Future(loop=loop)
    asyncio.async(self.go(future))
    loop.run_until_complete(future)
    data = future.result()
    self.send(data)
The following gets close, but it returns immediately and the result is of type asyncio.coroutines.CoroWrapper, implying that the wait_for line returned immediately with the unfinished task?
@asyncio.coroutine
def go(self):
    return (yield from asyncio.sleep(3, result=b'data reply'))

@asyncio.coroutine
def go2(self):
    task = asyncio.get_event_loop().create_task(self.go())
    res = yield from asyncio.wait_for(task, 10)
    return res

def data_received(self, data):
    print('Data Received', flush=True)
    data = self.go2()
    self.send(data)
I'm a bit stuck really, and would appreciate some pointers about what to look at.
You need to add your coroutine to the event loop, and then use Future.add_done_callback to handle the result when the coroutine completes:
@asyncio.coroutine
def go(self):
    return (yield from asyncio.sleep(3, result=b'data reply'))

def data_received(self, data):
    print('Data Received', flush=True)
    task = asyncio.async(self.go())  # or asyncio.get_event_loop().create_task()
    task.add_done_callback(self.handle_go_result)

def handle_go_result(self, task):
    data = task.result()
    self.send(data)
Calling a coroutine directly in data_received simply isn't possible, since the caller isn't going to yield from it, and creating/running a new event loop inside data_received will always end up blocking the main event loop until the inner event loop finishes its work.
You just want to schedule some work with your main event loop (asyncio.async / loop.create_task()), and schedule a callback to run when the work is done (add_done_callback).
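Putting that together, a self-contained sketch of the whole protocol might look like this (a TCP echo-style server on an illustrative port; go() and the reply format are placeholders):
import asyncio

@asyncio.coroutine
def go(data):
    # Placeholder for real asynchronous work.
    yield from asyncio.sleep(1)
    return b'reply to ' + data

class MyProtocol(asyncio.Protocol):
    def connection_made(self, transport):
        self.transport = transport

    def data_received(self, data):
        # Schedule the coroutine on the already-running loop...
        task = asyncio.ensure_future(go(data))
        # ...and send the result from a callback once it finishes.
        task.add_done_callback(self.handle_go_result)

    def handle_go_result(self, task):
        self.transport.write(task.result())

loop = asyncio.get_event_loop()
server = loop.run_until_complete(loop.create_server(MyProtocol, '127.0.0.1', 8888))
try:
    loop.run_forever()
finally:
    server.close()
    loop.close()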

What is the best way to refactor generators pipeline as coroutines?

Consider this code:
#!/usr/bin/env python
# coding=utf-8
from string import letters

def filter_upper(letters):
    for letter in letters:
        if letter.isupper():
            yield letter

def filter_selected(letters, selected):
    selected = set(map(str.lower, selected))
    for letter in letters:
        if letter.lower() in selected:
            yield letter

def main():
    stuff = filter_selected(filter_upper(letters), ['a', 'b', 'c'])
    print(list(stuff))

main()
This is an illustration of a pipeline constructed from generators. I often use this pattern in practice to build data processing flows; it's like UNIX pipes.
What is the most elegant way to refactor these generators into coroutines that suspend execution at every yield?
UPDATE
My first try was like this:
#!/usr/bin/env python
# coding=utf-8
import asyncio

@asyncio.coroutine
def coro():
    for e in ['a', 'b', 'c']:
        future = asyncio.Future()
        future.set_result(e)
        yield from future

@asyncio.coroutine
def coro2():
    a = yield from coro()
    print(a)

loop = asyncio.get_event_loop()
loop.run_until_complete(coro2())
But for some reason it doesn't work: variable a becomes None.
UPDATE #1
What I came up with recently:
Server:
#!/usr/bin/env python
# coding=utf-8
"""Server that accepts a client and sends it strings from user input."""
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
host = ''
port = 5555
s.bind((host, port))
s.listen(1)
print('Listening...')
conn, addr = s.accept()
print('Client ({}) connected.'.format(addr))
while True:
    conn.send(raw_input('Enter data to send: '))
Client:
#!/usr/bin/env python
# coding=utf-8
"""Client that demonstrates a processing pipeline."""
import trollius as asyncio
from trollius import From

@asyncio.coroutine
def upper(input, output):
    while True:
        char = yield From(input.get())
        print('Got char: ', char)
        yield From(output.put(char.upper()))

@asyncio.coroutine
def glue(input, output):
    chunk = []
    while True:
        e = yield From(input.get())
        chunk.append(e)
        print('Current chunk: ', chunk)
        if len(chunk) == 3:
            yield From(output.put(chunk))
            chunk = []

@asyncio.coroutine
def tcp_echo_client(loop):
    reader, writer = yield From(asyncio.open_connection('127.0.0.1', 5555,
                                                        loop=loop))
    q1 = asyncio.Queue()
    q2 = asyncio.Queue()
    q3 = asyncio.Queue()

    @asyncio.coroutine
    def printer():
        while True:
            print('Pipeline output: ', (yield From(q3.get())))

    asyncio.async(upper(q1, q2))
    asyncio.async(glue(q2, q3))
    asyncio.async(printer())
    while True:
        data = yield From(reader.read(100))
        print('Data: ', data)
        for byte in data:
            yield From(q1.put(byte))
    print('Close the socket')
    writer.close()

@asyncio.coroutine
def background_stuff():
    while True:
        yield From(asyncio.sleep(3))
        print('Other background stuff...')

loop = asyncio.get_event_loop()
asyncio.async(background_stuff())
loop.run_until_complete(tcp_echo_client(loop))
loop.close()
The advantage over "David Beazley's coroutines" is that you can use all the asyncio machinery inside such processing units, with input and output queues.
The disadvantage here is that a lot of queue instances are needed to connect the pipeline units. That can be fixed by using a data structure more advanced than asyncio.Queue.
Another disadvantage is that such processing units do not propagate their exceptions to the parent stack frame, because they are "background tasks", whereas "David Beazley's coroutines" do.
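One way to at least make those background failures visible (a sketch, not a real fix for propagation into the parent frame) is to attach a done-callback to each background task and inspect task.exception():
import asyncio

def report_exception(task):
    # task.exception() is None on success; otherwise it's the exception
    # that killed the background task.
    if not task.cancelled() and task.exception() is not None:
        print('pipeline unit failed:', task.exception())

@asyncio.coroutine
def failing_unit():
    yield from asyncio.sleep(0.1)
    raise RuntimeError('boom')

loop = asyncio.get_event_loop()
task = asyncio.ensure_future(failing_unit())
task.add_done_callback(report_exception)
loop.run_until_complete(asyncio.sleep(0.5))
loop.close()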
UPDATE #2
Here is what I came up with:
https://gist.github.com/AndrewPashkin/04c287def6d165fc2832
I think the answer here is "you don't". I'm guessing that you're getting this idea from David Beazley's famous coroutine/generator tutorial. In his tutorials, he's using coroutines as basically a reversed generator pipeline. Instead of pulling the data through the pipeline by iterating over generators, you push data through the pipeline using gen_object.send(). Your first example would look something like this using this notion of coroutines:
from string import letters

def coroutine(func):
    def start(*args, **kwargs):
        cr = func(*args, **kwargs)
        cr.next()
        return cr
    return start

@coroutine
def filter_upper(target):
    while True:
        letter = yield
        if letter.isupper():
            target.send(letter)

@coroutine
def filter_selected(selected):
    selected = set(map(str.lower, selected))
    out = []
    try:
        while True:
            letter = yield
            if letter.lower() in selected:
                out.append(letter)
    except GeneratorExit:
        print out

def main():
    filt = filter_upper(filter_selected(['a', 'b', 'c']))
    for letter in letters:
        filt.send(letter)
    filt.close()

if __name__ == "__main__":
    main()
Now, the coroutines in asyncio are similar in that they're suspendable generator objects which can have data sent into them, but they really aren't meant for the data pipelining use-case at all. They're meant to be used to enable concurrency when you're executing blocking I/O operations. The yield from suspension points allow control to return to the event loop while the I/O happens, and the event loop will restart the coroutine when it completes, sending the data returned by the I/O call into the coroutine. There's really no practical reason to try to use them for this kind of use case, since there's no blocking I/O happening at all.
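To make that concrete, this is the use-case asyncio coroutines are built for: the yield from suspension points let two simulated I/O operations overlap. A minimal sketch, with asyncio.sleep standing in for real blocking I/O:
import asyncio

@asyncio.coroutine
def fake_io(name, seconds):
    print(name, 'starts I/O')
    yield from asyncio.sleep(seconds)  # control returns to the event loop here
    print(name, 'got its result')
    return name

loop = asyncio.get_event_loop()
results = loop.run_until_complete(
    asyncio.gather(fake_io('a', 2), fake_io('b', 1)))
print(results)  # ['a', 'b'] after ~2 seconds total, not 3
loop.close()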
Also, the problem with your attempt at using asyncio is that a = yield from coro() is assigning a to the return value of coro. But you're not actually returning anything from coro. You're caught somewhere between treating coro as an actual coroutine and a generator. It looks like you're expecting yield from future to send the contents of future from coro to coro2, but that's not how coroutines work. yield from is used to pull data from a coroutine/Future/Task, and return is used to actually send an object back to the caller. So, for coro to actually return something to coro2, you need to do this:
@asyncio.coroutine
def coro():
    for e in ['a', 'b', 'c']:
        future = asyncio.Future()
        future.set_result(e)
        return future
But that's just going to end with 'a' being returned to coro2. I think to get the output you expect you'd need to do this:
@asyncio.coroutine
def coro():
    future = asyncio.Future()
    future.set_result(['a', 'b', 'c'])
    return future
Which maybe demonstrates why asyncio coroutines are not what you want here.
Edit:
Ok, given a case where you want to use pipelining in addition to actually making use of async I/O, I think the approach you used in your update is good. As you suggested, it can be made simpler by creating a data structure to help automate the queue management:
class Pipeline(object):
    def __init__(self, *nodes):
        if len(nodes) < 2:
            raise Exception("Need at least two nodes in the pipeline")
        self.start = asyncio.Queue()
        in_ = self.start
        for node in nodes:
            out = asyncio.Queue()
            asyncio.async(node(in_, out))
            in_ = out

    @asyncio.coroutine
    def put(self, val):
        yield from self.start.put(val)

# ... (most code is unchanged)

@asyncio.coroutine
def printer(input_, output):
    # For simplicity, I have the sink take an output queue. It's not being
    # used, but you could make the final output queue accessible from the
    # Pipeline object and then add a get() method to the Pipeline itself.
    while True:
        print('Pipeline output: ', (yield from input_.get()))

@asyncio.coroutine
def tcp_echo_client(loop):
    reader, writer = yield from asyncio.open_connection('127.0.0.1', 5555,
                                                        loop=loop)
    pipe = Pipeline(upper, glue, printer)
    while True:
        data = yield from reader.read(100)
        if not data:
            break
        print('Data: ', data)
        for byte in data.decode('utf-8'):
            yield from pipe.put(byte)  # Add to the pipe
    print('Close the socket')
    writer.close()
This simplifies Queue management, but doesn't address the exception handling issue. I'm not sure if much can be done about that...
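One possible extension (a sketch building on the Pipeline class above, in the same pre-3.5 style) is to keep references to the node tasks and attach done-callbacks, so a failing node is at least reported instead of dying silently:
class Pipeline(object):
    def __init__(self, *nodes):
        if len(nodes) < 2:
            raise Exception("Need at least two nodes in the pipeline")
        self.tasks = []
        self.start = asyncio.Queue()
        in_ = self.start
        for node in nodes:
            out = asyncio.Queue()
            task = asyncio.async(node(in_, out))
            # Report a node's exception instead of losing it silently.
            task.add_done_callback(self._node_done)
            self.tasks.append(task)
            in_ = out

    def _node_done(self, task):
        if not task.cancelled() and task.exception() is not None:
            print('pipeline node failed:', task.exception())

    @asyncio.coroutine
    def put(self, val):
        yield from self.start.put(val)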
