How to debug a Tornado async operation in Python

I am new to the Tornado framework. Following the guide Asynchronous and non-Blocking I/O, I wrote the demo code below. Unfortunately, the synchronous HTTP client works, but the asynchronous one does not: it looks like the callback function I passed to AsyncHTTPClient.fetch never gets a chance to run. So my questions are:
Why doesn't Tornado's async API work for me?
How should I debug this kind of problem? Setting a breakpoint in my callback function is useless because it is never reached.
Any help is greatly appreciated. Below is my demo code:
from tornado.httpclient import AsyncHTTPClient
from tornado.httpclient import HTTPClient
import time

myUrl = 'a http url serving RESTful service'

def async_fetch(url, callback):
    http_client = AsyncHTTPClient()
    def handle_test(response):
        callback(response.body)
    http_client.fetch(url, handle_test)

def sync_fetch(url):
    http_client = HTTPClient()
    response = http_client.fetch(url)
    return response.body

def printResponse(data):
    print("response is:" + data)

def main():
    sync_fetch(myUrl)                  # this works
    async_fetch(myUrl, printResponse)  # this does not work

if __name__ == '__main__':
    main()
    print("begin of sleep!")
    time.sleep(2)
    print("end of sleep!")

You need to start the IOLoop, otherwise your asynchronous task never makes progress:
from tornado.ioloop import IOLoop

def printResponse(data):
    print("response is:" + data)
    IOLoop.current().stop()

def main():
    sync_fetch(myUrl)  # this works
    async_fetch(myUrl, printResponse)
    IOLoop.current().start()
In this example I stop the loop at the bottom of printResponse. In a real web server application you might never explicitly stop the loop.
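On newer Tornado versions (5.0 and later), the callback style above is deprecated in favor of coroutines. As a minimal sketch, assuming the same placeholder myUrl from the question, the whole flow can be driven with IOLoop.run_sync, which starts the loop, runs one coroutine to completion, and stops the loop for you:

from tornado.httpclient import AsyncHTTPClient
from tornado.ioloop import IOLoop

async def async_fetch(url):
    # With no callback argument, fetch() returns a Future we can await
    response = await AsyncHTTPClient().fetch(url)
    return response.body

def main():
    # run_sync starts the IOLoop, runs the coroutine, then stops the loop
    body = IOLoop.current().run_sync(lambda: async_fetch(myUrl))
    print("response is: " + body.decode("utf-8"))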

Related

Calling a different endpoint from the same running web service in Tornado Python

I have two endpoints in the same web service, and one needs to call the other. But because the first one has not finished yet, the second one is never called. Below is a demonstration of what I need to achieve.
import tornado.ioloop
import tornado.web
import tornado.escape
import time
import requests
import itertools
from requests.adapters import HTTPAdapter
from requests.packages.urllib3.util.retry import Retry

class SampleApi1(tornado.web.RequestHandler):
    def post(self):
        print("1")
        response = requests.post("http://localhost:5000/test2/")
        self.write(response)

class SampleApi2(tornado.web.RequestHandler):
    def post(self):
        print("2")
        self.write({"key": "value"})

def make_app():
    return tornado.web.Application([(r"/test1/", SampleApi1),
                                    (r"/test2/", SampleApi2)])

if __name__ == "__main__":
    app = make_app()
    app.listen(5000)
    print("Listening on port 5000")
    tornado.ioloop.IOLoop.current().start()
SampleApi1 calls SampleApi2, but SampleApi2 is never reached because SampleApi1 has not finished yet. I've read about gen.coroutine, but it didn't work. I don't need to call SampleApi2 in parallel; I just need to call it from SampleApi1. Thank you in advance!
Tornado is an asynchronous framework, but the requests library is synchronous and blocking (see the tornado user's guide for more on these concepts). You shouldn't use requests in Tornado applications because it blocks the main thread. Instead, use Tornado's own AsyncHTTPClient (or another async http client like aiohttp):
async def post(self):
    print("1")
    client = tornado.httpclient.AsyncHTTPClient()
    # Note the trailing slash to match the registered /test2/ route,
    # and write the response body, not the HTTPResponse object
    response = await client.fetch("http://localhost:5000/test2/", method="POST", body=b"")
    self.write(response.body)
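For completeness, here is a minimal sketch of the same handler using aiohttp, the alternative client mentioned above (this assumes Tornado 5+ running on the asyncio event loop and aiohttp 3.x installed; the route and port are from the question):

import aiohttp

class SampleApi1(tornado.web.RequestHandler):
    async def post(self):
        # aiohttp's client is also non-blocking, so the IOLoop stays free
        async with aiohttp.ClientSession() as session:
            async with session.post("http://localhost:5000/test2/") as resp:
                body = await resp.read()
        self.write(body)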

How to achieve asynchronous requests in a Tornado web service

Question: I set up a Tornado-based web service for my model, and as request concurrency increases, the service easily goes down.
What I have tried: I have already read some relevant code, such as the asynchronous parts of the Tornado framework, but I couldn't extract any effective ideas from it.
The web service code is as follows:
import json

import tornado.ioloop
import tornado.web

class mainhandler(tornado.web.RequestHandler):
    def get(self):
        self.write('hello')

    def post(self):
        in_data = json.loads(self.request.body.decode('utf-8'))
        res = predict(in_data)  # predict() is the model function

    def set_default_headers(self):
        self.set_header('content_type', 'application/json;charset=utf-8')

application = tornado.web.Application([(r'/', mainhandler)])

if __name__ == "__main__":
    application.listen(port=5000)
    tornado.ioloop.IOLoop.current().start()
In Tornado, anything that is slow must be called in a coroutine with await. If predict does some I/O of its own, it should be made into a coroutine so it can do that I/O asynchronously. If it is pure computation, you should run it on a thread pool:
async def post(self):
    in_data = json.loads(self.request.body.decode('utf-8'))
    res = await IOLoop.current().run_in_executor(None, predict, in_data)
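Putting it together, a minimal sketch of the full handler with a dedicated pool (the pool type and size are my assumptions; predict is the model function from the question):

import json
from concurrent.futures import ThreadPoolExecutor

import tornado.web
from tornado.ioloop import IOLoop

# Shared pool for blocking predict() calls; 4 workers is an arbitrary choice
executor = ThreadPoolExecutor(max_workers=4)

class mainhandler(tornado.web.RequestHandler):
    async def post(self):
        in_data = json.loads(self.request.body.decode('utf-8'))
        # Run the blocking computation off the event-loop thread
        res = await IOLoop.current().run_in_executor(executor, predict, in_data)
        self.write(json.dumps(res))

If predict is pure-Python CPU-bound work, the GIL limits what threads can do; a ProcessPoolExecutor passed in the same way may scale better.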

Background tasks in Flask

I am writing a web application that does some heavy work. With that in mind, I thought of making those tasks background (non-blocking) tasks so that other requests are not blocked by earlier ones.
I went with daemonizing the thread so that it doesn't exit once the main thread is finished (since I am using threaded=True). Now if a user sends a request, my code immediately tells them that their request is in progress; it runs in the background, and the application is ready to serve other requests.
My current application code looks something like this:
from flask import Flask
from flask import request, abort
import threading

class threadClass:
    def __init__(self):
        thread = threading.Thread(target=self.run, args=())
        thread.daemon = True  # Daemonize thread
        thread.start()        # Start the execution

    def run(self):
        # This might take several minutes to complete
        someHeavyFunction()

app = Flask(__name__)

@app.route('/start', methods=['POST'])
def start():
    try:
        begin = threadClass()
    except Exception:
        abort(500)
    return "Task is in progress"

def main():
    """
    Main entry point into program execution

    PARAMETERS: none
    """
    app.run(host='0.0.0.0', threaded=True)

main()
I just want it to be able to handle a few concurrent requests (it's not going to be used in production).
Could I have done this better? Did I miss anything? I was going through Python's multiprocessing package and found this:
multiprocessing is a package that supports spawning processes using an
API similar to the threading module. The multiprocessing package
offers both local and remote concurrency, effectively side-stepping
the Global Interpreter Lock by using subprocesses instead of threads.
Due to this, the multiprocessing module allows the programmer to fully
leverage multiple processors on a given machine. It runs on both Unix
and Windows.
Can I daemonize a process using multiprocessing? How can I achieve something better than what I have with the threading module?
EDIT
I went through Python's multiprocessing package; it is similar to threading.
from flask import Flask
from flask import request, abort
from multiprocessing import Process

class processClass:
    def __init__(self):
        p = Process(target=self.run, args=())
        p.daemon = True  # Daemonize it
        p.start()        # Start the execution

    def run(self):
        # This might take several minutes to complete
        someHeavyFunction()

app = Flask(__name__)

@app.route('/start', methods=['POST'])
def start():
    try:
        begin = processClass()
    except Exception:
        abort(500)
    return "Task is in progress"

def main():
    """
    Main entry point into program execution

    PARAMETERS: none
    """
    app.run(host='0.0.0.0', threaded=True)

main()
Does the above approach look good?
Best practice
The best way to implement background tasks in Flask is with Celery, as explained in this SO post. A good starting point is the official Flask documentation and the Celery documentation.
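For orientation, a minimal sketch of the Celery approach (the Redis broker URL and the task body are assumptions; a worker must be started separately, e.g. with celery -A yourmodule worker):

from celery import Celery
from flask import Flask

app = Flask(__name__)
# Assumes a Redis broker running locally; swap in your own broker URL
celery = Celery(__name__, broker="redis://localhost:6379/0",
                backend="redis://localhost:6379/0")

@celery.task
def some_heavy_function():
    ...  # the long-running work from the question goes here

@app.route('/start', methods=['POST'])
def start():
    result = some_heavy_function.delay()  # enqueue and return immediately
    return {"task_id": result.id}, 202

The request handler returns as soon as the task is queued; a separate worker process picks the task up and runs it outside the web server.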
Crazy way: Build your own decorator
As @MrLeeh pointed out in a comment, Miguel Grinberg presented a solution in his PyCon 2016 talk by implementing a decorator. I want to emphasize that I have the highest respect for his solution; he called it a "crazy solution" himself. The code below is a minor adaptation of his solution.
Warning!!!
Don't use this in production! The main reason is that this app has a memory leak by using the global tasks dictionary. Even if you fix the memory leak issue, maintaining this sort of code is hard. If you just want to play around or use this in a private project, read on.
Minimal example
Assume you have a long-running function call in your /foo endpoint. I mock this with a 10 second sleep. If you call the endpoint three times, it will take 30 seconds to finish.
Miguel Grinberg's decorator solution is implemented in flask_async. It runs a new thread in a Flask context which is identical to the current Flask context. Each thread is issued a new task_id. The result is saved in a global dictionary tasks[task_id]['result'].
With the decorator in place you only need to decorate the endpoint with @flask_async and the endpoint is asynchronous - just like that!
import threading
import time
import uuid
from functools import wraps

from flask import Flask, current_app, request, abort
from werkzeug.exceptions import HTTPException, InternalServerError

app = Flask(__name__)
tasks = {}

def flask_async(f):
    """
    This decorator transforms a sync route to asynchronous by running it in a background thread.
    """
    @wraps(f)
    def wrapped(*args, **kwargs):
        def task(app, environ):
            # Create a request context similar to that of the original request
            with app.request_context(environ):
                try:
                    # Run the route function and record the response
                    tasks[task_id]['result'] = f(*args, **kwargs)
                except HTTPException as e:
                    tasks[task_id]['result'] = current_app.handle_http_exception(e)
                except Exception as e:
                    # The function raised an exception, so we set a 500 error
                    tasks[task_id]['result'] = InternalServerError()
                    if current_app.debug:
                        # We want to find out if something happened so reraise
                        raise

        # Assign an id to the asynchronous task
        task_id = uuid.uuid4().hex

        # Record the task, and then launch it
        tasks[task_id] = {'task': threading.Thread(
            target=task, args=(current_app._get_current_object(), request.environ))}
        tasks[task_id]['task'].start()

        # Return a 202 response, with an id that the client can use to obtain task status
        return {'TaskId': task_id}, 202

    return wrapped

@app.route('/foo')
@flask_async
def foo():
    time.sleep(10)
    return {'Result': True}

@app.route('/foo/<task_id>', methods=['GET'])
def foo_results(task_id):
    """
    Return results of asynchronous task.
    If this request returns a 202 status code, it means that task hasn't finished yet.
    """
    task = tasks.get(task_id)
    if task is None:
        abort(404)
    if 'result' not in task:
        return {'TaskID': task_id}, 202
    return task['result']

if __name__ == '__main__':
    app.run(debug=True)
However, you need a little trick to get your results. The endpoint /foo will only return the HTTP code 202 and the task id, but not the result. You need another endpoint /foo/<task_id> to get the result. Here is an example for localhost:
import time
import requests

task_ids = [requests.get('http://127.0.0.1:5000/foo').json().get('TaskId')
            for _ in range(2)]
time.sleep(11)
results = [requests.get(f'http://127.0.0.1:5000/foo/{task_id}').json()
           for task_id in task_ids]
# [{'Result': True}, {'Result': True}]

Mocking an HTTP server in Python

I'm writing a REST client and I need to mock an HTTP server in my tests. What would be the most appropriate library to do that? It would be great if I could create expected HTTP requests and compare them to the actual ones.
Try HTTPretty, an HTTP client mock library for Python that helps you focus on the client side.
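A minimal sketch of how HTTPretty is typically used (assumes pip install httpretty; the example URL is made up):

import httpretty
import requests

@httpretty.activate
def test_list_users():
    # Register the expected request and a canned response
    httpretty.register_uri(httpretty.GET, "http://api.example.com/users",
                           body='[]', content_type="application/json")

    response = requests.get("http://api.example.com/users")
    assert response.json() == []
    # Inspect what the client actually sent
    assert httpretty.last_request().path == "/users"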
You can also create a small mock server on your own.
I am using a small web server called Flask.
import flask

app = flask.Flask(__name__)

def callback():
    return flask.jsonify(list())

# Flask URL rules must start with a leading slash
app.add_url_rule("/users", view_func=callback)
app.run()
This will spawn a server under http://localhost:5000/users executing the callback function.
I created a gist to provide a working example with a shutdown mechanism, etc.:
https://gist.github.com/eruvanos/f6f62edb368a20aaa880e12976620db8
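For tests, the same idea can be wrapped in a background thread; a rough sketch (the port is arbitrary, and the sleep is a crude readiness wait where real tests should poll the port):

import threading
import time

import flask
import requests

app = flask.Flask(__name__)
app.add_url_rule("/users", view_func=lambda: flask.jsonify([]))

# Daemon thread, so the test process can exit without joining the server
threading.Thread(target=lambda: app.run(port=5001), daemon=True).start()
time.sleep(0.5)  # crude wait for startup; poll the port in real tests

response = requests.get("http://localhost:5001/users")
assert response.json() == []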
You can do this without using any external library by just running a temporary HTTP server.
For example, mocking https://api.ipify.org?format=json:
"""Unit tests for ipify"""
import http.server
import threading
import unittest
import urllib.request

class MockIpifyHTTPRequestHandler(http.server.BaseHTTPRequestHandler):
    """HTTPServer mock request handler"""

    def do_GET(self):  # pylint: disable=invalid-name
        """Handle GET requests"""
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(b'{"ip":"1.2.3.45"}')

    def log_request(self, code=None, size=None):
        """Don't log anything"""

class UnitTests(unittest.TestCase):
    """Unit tests for urlopen"""

    def test_urlopen(self):
        """Test urlopen ipify"""
        server = http.server.ThreadingHTTPServer(
            ("127.0.0.127", 9999), MockIpifyHTTPRequestHandler
        )
        with server:
            server_thread = threading.Thread(target=server.serve_forever)
            server_thread.daemon = True
            server_thread.start()
            request = urllib.request.Request("http://127.0.0.127:9999/")
            with urllib.request.urlopen(request) as response:
                result = response.read()
            server.shutdown()
        self.assertEqual(result, b'{"ip":"1.2.3.45"}')
An alternative solution I found is in https://stackoverflow.com/a/34929900/15862
Mockintosh seems like another option.

Why is an asynchronous function in Tornado blocking?

Why are incoming requests not processed while another request is in the "waiting" state?
If you look at the code below, the get function runs a Tornado task with the yield keyword, which should mean "wait for a callback to be executed". In my code, the callback is never executed. If you make the same request a second time while the first is on hold, the second request is not processed. Any other requests are processed just fine.
So, my actions:
1. Start the application
2. GET localhost:8080/
- Application prints "incoming call"
3. GET localhost:8080/anotherrequest
- Application prints "another request"
4. GET localhost:8080/
- Application prints nothing, while I expect it to print "incoming call". Why?
So why does this piece of code block? The code sample is attached.
I was using Tornado 2.1 and Python 2.7 to run this sample.
Thank you
import tornado
import tornado.ioloop
import tornado.web
from tornado import gen

class AnotherHandler(tornado.web.RequestHandler):
    @tornado.web.asynchronous
    def get(self):
        print 'another request'
        self.finish()

class MainHandler(tornado.web.RequestHandler):
    def printStuff(*args, **kwargs):
        print 'incoming call'

    @tornado.web.asynchronous
    @tornado.gen.engine
    def get(self):
        result = yield tornado.gen.Task(self.printStuff)

application = tornado.web.Application([
    (r"/", MainHandler),
    (r"/anotherrequest", AnotherHandler)
])

if __name__ == "__main__":
    application.listen(8080)
    tornado.ioloop.IOLoop.instance().start()
Each new request to "localhost:8080/" will, in fact, cause your application to print "incoming call." However, requests to "localhost:8080/" will never finish. In order for the yield statement to be used, printStuff has to accept a callback and execute it. Also, an asynchronous get function must call self.finish:
class MainHandler(tornado.web.RequestHandler):
    def printStuff(self, callback):
        print 'incoming call'
        callback()

    @tornado.web.asynchronous
    @tornado.gen.engine
    def get(self):
        result = yield tornado.gen.Task(self.printStuff)
        self.finish()
It's easier to use Tornado's modern "coroutine" interface instead of gen.Task and gen.engine:
class MainHandler(tornado.web.RequestHandler):
    @gen.coroutine
    def printStuff(self):
        print 'incoming call'

    @gen.coroutine
    def get(self):
        result = yield self.printStuff()
        self.finish()
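On Python 3 with a current Tornado (gen.engine and tornado.web.asynchronous were removed in Tornado 6), the same handler is usually written with native coroutines; a minimal sketch:

class MainHandler(tornado.web.RequestHandler):
    async def print_stuff(self):
        print('incoming call')

    async def get(self):
        await self.print_stuff()
        # no explicit finish needed; Tornado finishes the request
        # when the coroutine returns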
Found the problem: it actually happens when requests are made from the browser. With curl everything works as expected. Sorry for the inconvenience.
