I am trying to play with this piece of code to understand #tornado.web.asynchronous. The code as intended should handle asynchronous web requests but it doesnt seem to work as intended. There are two end points:
1) http://localhost:5000/A (This is the time consuming request and
takes a few seconds)
2) http://localhost:5000/B (This is the fast request and takes no time to return.
However when I hit the browser to go to http://localhost:5000/A and then while that is running go to http://localhost:5000/B the second request is queued and runs only after A has finished.
In other words one task is time consuming but it blocks the other faster task. What am I doing wrong?
import tornado.web
from tornado.ioloop import IOLoop
import sys, random, signal
class TestHandler(tornado.web.RequestHandler):
"""
In below function goes your time consuming task
"""
def background_task(self):
sm = 0
for i in range(10 ** 8):
sm = sm + 1
return str(sm + random.randint(0, sm)) + "\n"
#tornado.web.asynchronous
def get(self):
""" Request that asynchronously calls background task. """
res = self.background_task()
self.write(str(res))
self.finish()
class TestHandler2(tornado.web.RequestHandler):
#tornado.web.asynchronous
def get(self):
self.write('Response from server: ' + str(random.randint(0, 100000)) + "\n")
self.finish()
def sigterm_handler(signal, frame):
# save the state here or do whatever you want
print('SIGTERM: got kill, exiting')
sys.exit(0)
def main(argv):
signal.signal(signal.SIGTERM, sigterm_handler)
try:
if argv:
print ":argv:", argv
application = tornado.web.Application([
(r"/A", TestHandler),
(r"/B", TestHandler2),
])
application.listen(5000)
IOLoop.instance().start()
except KeyboardInterrupt:
print "Caught interrupt"
except Exception as e:
print e.message
finally:
print "App: exited"
if __name__ == '__main__':
sys.exit(main(sys.argv))
According to the documentation:
To minimize the cost of concurrent connections, Tornado uses a
single-threaded event loop. This means that all application code
should aim to be asynchronous and non-blocking because only one
operation can be active at a time.
To achieve this goal you need to prepare the RequestHandler properly. Simply adding #tornado.web.asynchronous decorator to any of the functions (get, post, etc.) is not enough if the function performs only synchronous actions.
What does the #tornado.web.asynchronous decorator do?
Let's look at the get function. The statements are executed one after another in a synchronous manner. Once the work is done and the function returns the request is being closed. A call to self.finish() is being made under the hood. However, when we use the #tornado.web.asynchronous decorator the request is not being closed after the function returned. So the self.finish() must be called by the user to finish the HTTP request. Without this decorator the request is automatically finished when the get() method returns.
Look at the "Example 21" from this page - tornado.web.asynchronous:
#web.asynchronous
def get(self):
http = httpclient.AsyncHTTPClient()
http.fetch("http://example.com/", self._on_download)
def _on_download(self, response):
self.finish()
The get() function performs an asynchronous call to the http://example.com/ page. Let's assume this call is a long action. So the http.fetch() function is being called and a moment later the get() function returns (http.fetch() is still running the background). The Tornado's IOLoop can move forward to serve the next request while the data from the http://example.com/ is being fetched. Once the the http.fetch() function call is finished the callback function - self._on_download - is called. Then self.finish() is called and the request is finally closed. This is the moment when the user can see the result in the browser.
It's possible due to the httpclient.AsyncHTTPClient(). If you use a synchronous version of the httpclient.HTTPClient() you will need to wait for the call to http://example.com/ to finish. Then the get() function will return and the next request will be processed.
To sum up, you use #tornado.web.asynchronous decorator if you use asynchronous code inside the RequestHandler which is advised. Otherwise it doesn't make much difference to the performance.
EDIT: To solve your problem you can run your time-consuming function in a separate thread. Here's a simple example of your TestHandler class:
class TestHandler(tornado.web.RequestHandler):
def on_finish(self, response):
self.write(response)
self.finish()
def async_function(base_function):
#functools.wraps(base_function)
def run_in_a_thread(*args, **kwargs):
func_t = threading.Thread(target=base_function, args=args, kwargs=kwargs)
func_t.start()
return run_in_a_thread
#async_function
def background_task(self, callback):
sm = 0
for i in range(10 ** 8):
sm = sm + 1
callback(str(sm + random.randint(0, sm)))
#tornado.web.asynchronous
def get(self):
res = self.background_task(self.on_finish)
You also need to add those imports to your code:
import threading
import functools
import threading
async_function is a decorator function. If you're not familiar with the topic I suggest to read (e.g.: decorators) and try it on your own. In general, our decorator allows the function to return immediately (so the main program execution can go forward) and the processing to take place at the same time in a separate thread. Once the function in a thread is finished we call a callback function which writes out the results to the end user and closes the connection.
Related
There is a tricky post handler, sometimes it can take a lots of time (depending on a input values), sometimes not.
What I want is to write back whenever 1 second passes, dynamically allocating the response.
def post():
def callback():
self.write('too-late')
self.finish()
timeout_obj = IOLoop.current().add_timeout(
dt.timedelta(seconds=1),
callback,
)
# some asynchronous operations
if not self.request.connection.stream.closed():
self.write('here is your response')
self.finish()
IOLoop.current().remove_timeout(timeout_obj)
Turns out I can't do much from within callback.
Even raising an exception is suppressed by the inner context and won't be passed through the post method.
Any other ways to achieve the goal?
Thank you.
UPD 2020-05-15:
I found similar question
Thanks #ionut-ticus, using with_timeout() is much more convenient.
After some tries, I think I came really close to what i'm looking for:
def wait(fn):
#gen.coroutine
#wraps(fn)
def wrap(*args):
try:
result = yield gen.with_timeout(
dt.timedelta(seconds=20),
IOLoop.current().run_in_executor(None, fn, *args),
)
raise gen.Return(result)
except gen.TimeoutError:
logging.error('### TOO LONG')
raise gen.Return('Next time, bro')
return wrap
#wait
def blocking_func(item):
time.sleep(30)
# this is not a Subprocess.
# It is a file IO and DB
return 'we are done here'
Still not sure, should wait() decorator being wrapped in a
coroutine?
Some times in a chain of calls of a blocking_func(), there can
be another ThreadPoolExecutor. I have a concern, would this work
without making "mine" one global, and passing to the
Tornado's run_in_executor()?
Tornado: v5.1.1
An example of usage of tornado.gen.with_timeout. Keep in mind the task needs to be async or else the IOLoop will be blocked and won't be able to process the timeout:
#gen.coroutine
def async_task():
# some async code
#gen.coroutine
def get(self):
delta = datetime.timedelta(seconds=1)
try:
task = self.async_task()
result = yield gen.with_timeout(delta, task)
self.write("success")
except gen.TimeoutError:
self.write("timeout")
I'd advise to use https://github.com/aio-libs/async-timeout:
import asyncio
import async_timeout
def post():
try:
async with async_timeout.timeout(1):
# some asynchronous operations
if not self.request.connection.stream.closed():
self.write('here is your response')
self.finish()
IOLoop.current().remove_timeout(timeout_obj)
except asyncio.TimeoutError:
self.write('too-late')
self.finish()
Here is the handler for my login page, which i intend to use via ajax post requests.
from argon2 import PasswordHasher
from argon2.exceptions import VerifyMismatchError
class AdminLoginHandler(RequestHandler):
async def post(self):
username = self.get_argument("username")
password = self.get_argument("password")
db_hash = await self.settings['db'].users.find_one({"username":username}, {"password":1})
if not db_hash:
await self.settings['hasher'].verify("","")
self.write("wrong")
return
try:
print(db_hash)
pass_correct = await self.settings['hasher'].verify(db_hash['password'], password)
except VerifyMismatchError:
pass_correct = False
if pass_correct:
self.set_secure_cookie("user", username)
self.write("set?")
else:
self.write("wrong")
The settings includes this argument hasher=PasswordHasher().
I'm getting the following error TypeError: object bool can't be used in 'await' expression, i'm aware this is because the function i'm calling doesn't return a future object but a boolean.
My question is how do i use the hashing library asynchronously without tornado blocking for the full time of the hashing process, which i know by design takes a long time.
You can use a ThreadPoolExecutor or a ProcessPoolExecutor to run the time consuming code in separate threads/processes:
import math
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
import tornado.ioloop
import tornado.web
def blocking_task(number):
return len(str(math.factorial(number)))
class MainHandler(tornado.web.RequestHandler):
executor = ProcessPoolExecutor(max_workers=4)
# executor = ThreadPoolExecutor(max_workers=4)
async def get(self):
number = 54545 # factorial calculation takes about one second on my machine
# result = blocking_task(number) # use this line for classic (non-pool) function call
result = await tornado.ioloop.IOLoop.current().run_in_executor(self.executor, blocking_task, number)
self.write("result has %d digits" % result)
def make_app():
return tornado.web.Application([
(r"/", MainHandler),
])
if __name__ == "__main__":
app = make_app()
app.listen(8888)
tornado.ioloop.IOLoop.current().start()
I used a simple factorial calculation here to simulate a CPU intensive task and tested the above using wrk:
wrk -t2 -c4 -d30s http://127.0.0.1:8888/
Running 30s test # http://127.0.0.1:8888/
2 threads and 4 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 1.25s 34.16ms 1.37s 72.04%
Req/Sec 2.54 3.40 10.00 83.75%
93 requests in 30.04s, 19.89KB read
Requests/sec: 3.10
Transfer/sec: 678.00B
Without the executor I would get around 1 requests/sec; of course you need to tune the max_workers setting according to your setup.
If you're going to test using a browser, be aware of possible limitations.
Edit
I modified the code to easily allow for a process executor instead of a thread executor, but I doubt it will make a lot of difference in your case mainly because calls to argon2 should release the GIL but you should test it nonetheless.
I am newbie on tornado and python. A couple days ago i started to write a non-blocking rest api, but i couldn't accomplish the mission yet. When i send two request to this endpoint "localhost:8080/async" at the same time, the second request takes response after 20 seconds! That explains i am doing something wrong.
MAX_WORKERS = 4
class ASYNCHandler(tornado.web.RequestHandler):
executor = ThreadPoolExecutor(max_workers=MAX_WORKERS)
counter = 0
def pow_task(self, x, y):
time.sleep(10)
return pow(x,y)
async def background_task(self):
future = ASYNCHandler.executor.submit(self.pow_task, 2, 3)
return future
#gen.coroutine
def get(self, *args, **kwargs):
future = yield from self.background_task()
response= dumps({"result":future.result()}, default=json_util.default)
print(response)
application = tornado.web.Application([
('/async', ASYNCHandler),
('/sync', SYNCHandler),
], db=db, debug=True)
application.listen(8888)
tornado.ioloop.IOLoop.current().start()
Never use time.sleep in Tornado code! Use IOLoop.add_timeout to schedule a callback later, or in a coroutine yield gen.sleep(n).
http://www.tornadoweb.org/en/latest/faq.html#why-isn-t-this-example-with-time-sleep-running-in-parallel
That's strange that returning the ThreadPoolExecutor future, essentially blocks tornado's event loop. If anyone from the tornado team reads this and knows why that is, can they please give an explaination? I had planned to do some stuff with threads in tornado but after dealing with this question, I see that it's not going to be as simple as I originally anticipated. In any case, here is the code which does what you expect (I've trimmed your original example down a bit so that anyone can run it quickly):
from concurrent.futures import ThreadPoolExecutor
from json import dumps
import time
from tornado.platform.asyncio import to_tornado_future
from tornado.ioloop import IOLoop
from tornado import gen, web
MAX_WORKERS = 4
class ASYNCHandler(web.RequestHandler):
executor = ThreadPoolExecutor(max_workers=MAX_WORKERS)
counter = 0
def pow_task(self, x, y):
time.sleep(5)
return pow(x,y)
async def background_task(self):
future = self.executor.submit(self.pow_task, 2, 3)
result = await to_tornado_future(future) # convert to tornado future
return result
#gen.coroutine
def get(self, *args, **kwargs):
result = yield from self.background_task()
response = dumps({"result": result})
self.write(response)
application = web.Application([
('/async', ASYNCHandler),
], debug=True)
application.listen(8888)
IOLoop.current().start()
The main differences are in the background_tasks() method. I convert the asyncio future to a tornado future, wait for the result, then return the result. The code you provided in the question, blocked for some reason when yielding from background_task() and you were unable to await the result because the future wasn't a tornado future.
On a slightly different note, this simple example can easily be implemented using a single thread/async designs and chances are your code can also be done without threads. Threads are easy to implement but equally easy to get wrong and can lead to very sticky situations. When attempting to write threaded code please remember this photo :)
I have a Python script makes many async requests. The API I'm using takes a callback.
The main function calls run and I want it to block execution until all the requests have come back.
What could I use within Python 2.7 to achieve this?
def run():
for request in requests:
client.send_request(request, callback)
def callback(error, response):
# handle response
pass
def main():
run()
# I want to block here
I found that the simplest, least invasive way is to use threading.Event, available in 2.7.
import threading
import functools
def run():
events = []
for request in requests:
event = threading.Event()
callback_with_event = functools.partial(callback, event)
client.send_request(request, callback_with_event)
events.append(event)
return events
def callback(event, error, response):
# handle response
event.set()
def wait_for_events(events):
for event in events:
event.wait()
def main():
events = run()
wait_for_events(events)
In python I want to create an async method in a class that create a thread without blocking the main thread. When the new thread finish, I return a value from that function/thread.
For example the class is used for retrieve some information from web pages. I want run parallel processing in a function that download the page and return a object.
class WebDown:
def display(self, url):
print 'display(): ' + content
def download(self, url):
thread = Thread(target=self.get_info)
# thread join
print 'download(): ' + content
# return the info
def get_info(self, url):
# download page
# retrieve info
return info
if __name__ == '__main__':
wd = WebDown()
ret = wd.download('http://...')
wd.display('http://...')
I this example, in order I call download() for retrieve the info, after display() for print others information. The print output should be
display(): foo, bar, ....
download(): blue, red, ....
One way to write asynchronous, non blocking code in python involves using Python's Twisted. Twisted does not rely on multithreading but uses multiprocessing instead. It gives you convenient way to create Deferred objects, adding callbacks and errbacks to them. The example you give would look like this in Twisted, I'm using treq (Twisted Requests) library which makes generating requests a little quicker and easier:
from treq import get
from twisted.internet import reactor
class WebAsync(object):
def download(self, url):
request = get(url)
request.addCallback(self.deliver_body)
def deliver_body(self, response):
deferred = response.text()
deferred.addCallback(self.display)
return deferred
def display(self, response_body):
print response_body
reactor.stop()
if __name__ == "__main__":
web_client = WebAsync()
web_client.download("http://httpbin.org/html")
reactor.run()
Both 'download' and 'deliver_body' methods return deferreds, you add callbacks to them that are going to be executed when results is available.
I would simply use request and gevent called grequests.
import grequests
>>> urls = [
'http://...',
'http://...'
]
>>> rs = (grequests.get(u) for u in urls)
>>> grequests.map(rs)
[<Response [200]>, <Response [200]>]