Using Tornado, I have a POST request that takes a long time as it makes many requests to another API service and processes the data. This can take minutes to fully complete. I don't want this to block the entire web server from responding to other requests, which it currently does.
I looked at multiple threads here on SO, but they are often 8 years old and the code does not work anylonger as tornado removed the "engine" component from tornado.gen.
Is there an easy way to kick off this long get call and not have it block the entire web server in the process? Is there anything I can put in the code to say.. "submit the POST response and work on this one function without blocking any concurrent server requests from getting an immediate response"?
Example:
main.py
def make_app():
return tornado.web.Application([
(r"/v1", MainHandler),
(r"/v1/addfile", AddHandler, dict(folderpaths = folderpaths)),
(r"/v1/getfiles", GetHandler, dict(folderpaths = folderpaths)),
(r"/v1/getfile", GetFileHandler, dict(folderpaths = folderpaths)),
])
if __name__ == "__main__":
app = make_app()
sockets = tornado.netutil.bind_sockets(8888)
tornado.process.fork_processes(0)
tornado.process.task_id()
server = tornado.httpserver.HTTPServer(app)
server.add_sockets(sockets)
tornado.ioloop.IOLoop.current().start()
addHandler.py
class AddHandler(tornado.web.RequestHandler):
def initialize(self, folderpaths):
self.folderpaths = folderpaths
def blockingFunction(self):
time.sleep(320)
post("AWAKE")
def post(self):
user = self.get_argument('user')
folderpath = self.get_argument('inpath')
outpath = self.get_argument('outpath')
workflow_value = self.get_argument('workflow')
status_code, status_text = validateInFolder(folderpath)
if (status_code == 200):
logging.info("Status Code 200")
result = self.folderpaths.add_file(user, folderpath, outpath, workflow_value)
self.write(result)
self.finish()
#At this point the path is validated.
#POST response should be send out. Internal process should continue, new
#requests should not be blocked
self.blockingFunction()
Idea is that if input-parameters are validated the POST response should be sent out.
Then internal process (blockingFunction()) should be started, that should not block the Tornado Server from processing another API POST request.
I tried defining the (blockingFunction()) as async, which allows me to process multiple concurrent user requests - however there was a warning about missing "await" with async method.
Any help welcome. Thank you
class AddHandler(tornado.web.RequestHandler):
def initialize(self, folderpaths):
self.folderpaths = folderpaths
def blockingFunction(self):
time.sleep(320)
post("AWAKE")
async def post(self):
user = self.get_argument('user')
folderpath = self.get_argument('inpath')
outpath = self.get_argument('outpath')
workflow_value = self.get_argument('workflow')
status_code, status_text = validateInFolder(folderpath)
if (status_code == 200):
logging.info("Status Code 200")
result = self.folderpaths.add_file(user, folderpath, outpath, workflow_value)
self.write(result)
self.finish()
#At this point the path is validated.
#POST response should be send out. Internal process should continue, new
#requests should not be blocked
await loop.run_in_executor(None, self.blockingFunction)
#if this had multiple parameters it would be
#await loop.run_in_executor(None, self.blockingFunction, param1, param2)
Thank you #xyres
Further read: https://www.tornadoweb.org/en/stable/faq.html
Related
I have just ran into a funny situation when testing my FastAPI Python application and thought it might be useful for some of the people who reuse sessions in their apps and want to test requests using the same app, but get stuck on weir errors like the one in the title.
Also I desire to know what is happening here.
Context
I have an async FastAPI application, that schedules multiple requests based on a unimportant configuration. After the list of request definitions is prepared, a session is created the requests are sent, possibly with delays so I can spread them in time.
To test if the requests are getting through, I have cretaed routes in my own app so I can send the testing requests back to my own application. The application basically talks to itself.
It was listening on 127.0.0.1:8000 at the time of testing.
I have following functions defined for building async tasks:
def optional_session(func):
async def wrapper(*args, **kwargs):
if 'session' not in kwargs or kwargs['session'] is None:
async with ClientSession() as session:
kwargs['session'] = session
return await func(*args, **kwargs)
else:
return await func(*args, **kwargs)
return wrapper
#optional_session
async def post_json_with_time_from_url(url: str, data: dict, session: ClientSession = None) -> Tuple[Union[dict, None], float]:
"""
A method that performs a request to a specified URL and reads the response as JSON data.
If the request is successful the data is returned. If an error occurs it is logged and the returned data is None.
:param data: data to send i the request
:param url: The URL to retrieve the image from
:return: A valid response or None
:param session:
"""
result = None, time.time()
try:
async with session.post(url, data=data) as response: # type: ClientResponse
# check if the response is valid
if response.status == 200:
try:
# we have to read the response before leaving the response context manager
result = await response.json(), time.time()
except Exception as e:
logger.error("...")
else:
logger.error(
"...")
except InvalidURL as e:
logger.error(f"...")
except Exception as e:
logger.error("...")
return result
def delay(func, seconds: int):
""""
This decorator adds a time delay to an async function.
"""
if seconds is None:
seconds = 0
async def wrapper(*args, **kwargs):
await asyncio.sleep(seconds)
return await func(*args, **kwargs)
return wrapper
def parse_get_post_request(config: ConfigContext, session: aiohttp.ClientSession = None) -> asyncio.Task:
"""
Parses the get/post request from the configuration dictionary and creates an async task for it.
"""
request_type = config.extract_key('request_type', True).lower()
delay_ = config.extract_key('delay')
url_base_ = config.extract_key('request_url_base', True)
url_suffix_ = config.extract_key('request_url_suffix', True)
url_ = urljoin(base=url_base_, url=url_suffix_)
if request_type == 'get':
return asyncio.ensure_future(
delay(get_json_with_time_from_url, delay_)(url=url_, session=session)
)
elif request_type == 'post':
return asyncio.ensure_future(
delay(post_json_with_time_from_url, delay_)(url=url_, session=session, data=config.extract_key('request_data'))
)
else:
raise ValueError(f"Unsupported request type: {request_type}")
I am creating an aiohttp session like this:
async with aiohttp.ClientSession() as session:
...
and then reusing it throughout the context code block somehting like this:
single_request_tasks = []
...
for config in configs:
single_request_tasks.append(parse_get_post_request(config=plan_config, session=session))
...
responses = await asyncio.gather(*single_request_tasks)
...
Problem
Somehow, when I send the requests altogether, and one of the requests arrives back to the app at the same time as another one, an exception is thrown:
ConnectionAbortedError: [WinError 10053] An established connection was aborted by the software in your host machine
It turns out, that for some reason, the session I share for all the requests is terminated when multiple requests arrive at the same time, using the same ClientSession instance.
I am not really sure why this happens exactly, apart from suspecting some port clash shanenigans,
but it is resolved, when I use separate session for each request or when I spread them in time with an interval of one second (for example)
Workaround
I have used separate sessions for each request when looping back to localhost.
I also avoided the issue, when I have spread the requests in time, so each one has time to complete before the other one is sent, but timing is not that reliable mechanism (since OS task scheduler, concurrency in asyncio, network latency, etc.)
This problem does not occur when sharing a session with a different host (for example when scraping images from imgur.com) so I believe the problem is related to the fact that I am looping back to the localhost.
Question
Why this happens exactly? Why is the session closed by the software in the situation I described?
Is there anything I am doing wrong with the session? How does Starlette handle loopback connections? Is this case-dependent and do I need to do more detective work somehow or is this a generally recognized, platform independent behaviour?
So i'm creating this application and a part of it is a web page where a trading algorithm is testing itself using live data. All that is working but the issue is if i leave (exit) the web page, it stops. I was wondering how i can keep it running in the background indefinitely as i want the algorithm to keep doing it's thing.
This is the route which i would like to run in the background.
#app.route('/live-data-source')
def live_data_source():
def get_live_data():
live_options = lo.Options()
while True:
live_options.run()
live_options.update_strategy()
trades = live_options.get_all_option_trades()
trades = trades[0]
json_data = json.dumps(
{'data': trades})
yield f"data:{json_data}\n\n"
time.sleep(5)
return Response(get_live_data(), mimetype='text/event-stream')
I've looked into multi-threading but not too sure if that's the right thing for the job. I am kind of still new to flask so hence the poor question. If you need more info, please do comment.
You can do it the following way - this is 100% working example below. Note, in production use Celery for such tasks, or write another one daemon app (another process) by yourself and feed it with tasks from http server with the help of message queue (e.g. RabbitMQ) or with the help of common database.
If any questions regarding code below, feel free to ask, it was quite good exercise for me:
from flask import Flask, current_app
import threading
from threading import Thread, Event
import time
from random import randint
app = Flask(__name__)
# use the dict to store events to stop other treads
# one event per thread !
app.config["ThreadWorkerActive"] = dict()
def do_work(e: Event):
"""function just for another one thread to do some work"""
while True:
if e.is_set():
break # can be stopped from another trhead
print(f"{threading.current_thread().getName()} working now ...")
time.sleep(2)
print(f"{threading.current_thread().getName()} was stoped ...")
#app.route("/long_thread", methods=["GET"])
def long_thread_task():
"""Allows to start a new thread"""
th_name = f"Th-{randint(100000, 999999)}" # not really unique actually
stop_event = Event() # is used to stop another thread
th = Thread(target=do_work, args=(stop_event, ), name=th_name, daemon=True)
th.start()
current_app.config["ThreadWorkerActive"][th_name] = stop_event
return f"{th_name} was created!"
#app.route("/stop_thread/<th_id>", methods=["GET"])
def stop_thread_task(th_id):
th_name = f"Th-{th_id}"
if th_name in current_app.config["ThreadWorkerActive"].keys():
e = current_app.config["ThreadWorkerActive"].get(th_name)
if e:
e.set()
current_app.config["ThreadWorkerActive"].pop(th_name)
return f"Th-{th_id} was asked to stop"
else:
return "Sorry something went wrong..."
else:
return f"Th-{th_id} not found"
#app.route("/", methods=["GET"])
def index_route():
text = ("/long_thread - create another thread. "
"/stop_thread/th_id - stop thread with a certain id. "
f"Available Threads: {'; '.join(current_app.config['ThreadWorkerActive'].keys())}")
return text
if __name__ == '__main__':
app.run(host="0.0.0.0", port=9999)
first time trying asyncio and aiohttp.
I have the following code that gets urls from the MySQL database for GET requests. Gets the responses and pushes them to MySQL database.
if __name__ == "__main__":
database_name = 'db_name'
company_name = 'company_name'
my_db = Db(database=database_name) # wrapper class for mysql.connector
urls_dict = my_db.get_rest_api_urls_for_specific_company(company_name=company_name)
update_id = my_db.get_updateid()
my_db.get_connection(dictionary=True)
for url in urls_dict:
url_id = url['id']
url = url['url']
table_name = my_db.make_sql_table_name_by_url(url)
insert_query = my_db.get_sql_for_insert(table_name)
r = requests.get(url=url).json() # make the request
args = [json.dumps(r), update_id, url_id]
my_db.db_execute_one(insert_query, args, close_conn=False)
my_db.close_conn()
This works fine but to speed it up How can I run it asynchronously?
I have looked here, here and here but can't seem to get my head around it.
Here is what I have tried based on #Raphael Medaer's answer.
async def fetch(url):
async with ClientSession() as session:
async with session.request(method='GET', url=url) as response:
json = await response.json()
return json
async def process(url, update_id):
table_name = await db.make_sql_table_name_by_url(url)
result = await fetch(url)
print(url, result)
if __name__ == "__main__":
"""Get urls from DB"""
db = Db(database="fuse_src")
urls = db.get_rest_api_urls() # This returns list of dictionary
update_id = db.get_updateid()
url_list = []
for url in urls:
url_list.append(url['url'])
print(update_id)
asyncio.get_event_loop().run_until_complete(
asyncio.gather(*[process(url, update_id) for url in url_list]))
I get an error in the process method:
TypeError: object str can't be used in 'await' expression
Not sure whats the problem?
Any code example specific to this would be highly appreciated.
Make this code asynchronous will not speed it up at all. Except if you consider to run a part of your code in "parallel". For instance you can run multiple (SQL or HTTP) queries in "same time". By doing asynchronous programming you will not execute code in "same time". Although you will get benefit of long IO tasks to execute other part of your code while you're waiting for IOs.
First of all, you'll have to use asynchronous libraries (instead of synchronous one).
mysql.connector could be replaced by aiomysql from aio-libs.
requests could be replaced by aiohttp
To execute multiple asynchronous tasks in "parallel" (for instance to replace your loop for url in urls_dict:), you have to read carefully about asyncio tasks and function gather.
I will not (re)write your code in an asynchronous way, however here are a few lines of pseudo code which could help you:
async def process(url):
result = await fetch(url)
await db.commit(result)
if __name__ == "__main__":
db = MyDbConnection()
urls = await db.fetch_all_urls()
asyncio.get_event_loop().run_until_complete(
asyncio.gather(*[process(url) for url in urls]))
I have 2 functions.
1st function stores the data received in a list and 2nd function writes the data into a csv file.
I'm using Flask. Whenever a web service has been called it will store the data and send response to it, as soon as it sends response it triggers the 2nd function.
My Code:
from flask import Flask, flash, request, redirect, url_for, session
import json
app = Flask(__name__)
arr = []
#app.route("/test", methods=['GET','POST'])
def check():
arr.append(request.form['a'])
arr.append(request.form['b'])
res = {'Status': True}
return json.dumps(res)
def trigger():
df = pd.DataFrame({'x': arr})
df.to_csv("docs/xyz.csv", index=False)
return
Obviously the 2nd function is not called.
Is there a way to achieve this?
P.S: My real life problem is different where trigger function is time consuming and I don't want user to wait for it to finish execution.
One solution would be to have a background thread that will watch a queue. You put your csv data in the queue and the background thread will consume it. You can start such a thread before first request:
import threading
from multiprocessing import Queue
class CSVWriterThread(threading.Thread):
def __init__(self, *args, **kwargs):
threading.Thread.__init__(self, *args, **kwargs)
self.input_queue = Queue()
def send(self, item):
self.input_queue.put(item)
def close(self):
self.input_queue.put(None)
self.input_queue.join()
def run(self):
while True:
csv_array = self.input_queue.get()
if csv_array is None:
break
# Do something here ...
df = pd.DataFrame({'x': csv_array})
df.to_csv("docs/xyz.csv", index=False)
self.input_queue.task_done()
time.sleep(1)
# Done
self.input_queue.task_done()
return
#app.before_first_request
def activate_job_monitor():
thread = CSVWriterThread()
app.csvwriter = thread
thread.start()
And in your code put the message in the queue before returning:
#app.route("/test", methods=['GET','POST'])
def check():
arr.append(request.form['a'])
arr.append(request.form['b'])
res = {'Status': True}
app.csvwriter.send(arr)
return json.dumps(res)
P.S: My real life problem is different where trigger function is time consuming and I don't want user to wait for it to finish execution.
Consider using celery which is made for the very problem you're trying to solve. From docs:
Celery is a simple, flexible, and reliable distributed system to process vast amounts of messages, while providing operations with the tools required to maintain such a system.
I recommend you integrate celery with your flask app as described here. your trigger method would then become a straightforward celery task that you can execute without having to worry about long response time.
Im actually working on another interesting case on my side where i pass the work off to a python worker that sends the job to a redis queue. There are some great blogs using redis with Flask , you basically need to ensure redis is running (able to connect on port 6379)
The worker would look something like this:
import os
import redis
from rq import Worker, Queue, Connection
listen = ['default']
redis_url = os.getenv('REDISTOGO_URL', 'redis://localhost:6379')
conn = redis.from_url(redis_url)
if __name__ == '__main__':
with Connection(conn):
worker = Worker(list(map(Queue, listen)))
worker.work()
In my example I have a function that queries a database for usage and since it might be a lengthy process i pass it off to the worker (running as a seperate script)
def post(self):
data = Task.parser.parse_args()
job = q.enqueue_call(
func=migrate_usage, args=(my_args),
result_ttl=5000
)
print("Job ID is: {}".format(job.get_id()))
job_key = job.get_id()
print(str(Job.fetch(job_key, connection=conn).result))
if job:
return {"message": "Job : {} added to queue".format(job_key)}, 201
Credit due to the following article:
https://realpython.com/flask-by-example-implementing-a-redis-task-queue/#install-requirements
You can try use streaming. See next example:
import time
from flask import Flask, Response
app = Flask(__name__)
#app.route('/')
def main():
return '''<div>start</div>
<script>
var xhr = new XMLHttpRequest();
xhr.open('GET', '/test', true);
xhr.onreadystatechange = function(e) {
var div = document.createElement('div');
div.innerHTML = '' + this.readyState + ':' + this.responseText;
document.body.appendChild(div);
};
xhr.send();
</script>
'''
#app.route('/test')
def test():
def generate():
app.logger.info('request started')
for i in range(5):
time.sleep(1)
yield str(i)
app.logger.info('request finished')
yield ''
return Response(generate(), mimetype='text/plain')
if __name__ == '__main__':
app.run('0.0.0.0', 8080, True)
All magic in this example in genarator where you can start response data, after do some staff and yield empty data to end your stream.
For details look at http://flask.pocoo.org/docs/patterns/streaming/.
You can defer route specific actions with limited context by combining after_this_request and response.call_on_close. Note that request and response context won't be available but the route function context remains available. So you'll need to copy any request/response data you'll need into local variables for deferred access.
I moved your array to a local var to show how the function context is preserved. You could change your csv write function to an append so you're not pushing data endlessly into memory.
from flask import Flask, flash, request, redirect, url_for, session
import json
app = Flask(__name__)
#app.route("/test", methods=['GET','POST'])
def check():
arr = []
arr.append(request.form['a'])
arr.append(request.form['b'])
res = {'Status': True}
#flask.after_this_request
def add_close_action(response):
#response.call_on_close
def process_after_request():
df = pd.DataFrame({'x': arr})
df.to_csv("docs/xyz.csv", index=False)
return response
return json.dumps(res)
On GAE, I need to make some REST calls to a PiCloud server, then create an output page based on PiCloud return values. However, it will take several minutes for PiCloud to process the model. Thus I am wondering if I could create a 'loading' page first, and after finishing calculation, present the real output page.
In detail, the questions is how do I keep checking the status of my REST service and then generate different HTML pages based upon.
I appreciate any suggestions and comments!
PS: jQuery BlockUI seems to be good example, but it requires to estimate the timeout duration, which I could not guess...
Function to Call REST Service:
def get_jid(pdf_t, pdf_nop, pdf_p):
response = urlfetch.fetch(url=url, payload=data, method=urlfetch.POST, headers=http_headers)
jid= json.loads(response.content)['jid']
output_st = "running"
while output_st!="done":
response_st = urlfetch.fetch(url='https://api.picloud.com/job/?jids=%s&field=status' %jid, headers=http_headers)
output_st = json.loads(response_st.content)['info']['%s' %jid]['status']
url_val = 'https://api.picloud.com/job/result/?jid='+str(jid)
response_val = urlfetch.fetch(url=url_val, method=urlfetch.GET, headers=http_headers)
output_val = json.loads(response_val.content)['result']
return(jid, output_st, output_val)
Generate HTML Page:
class pdfPage_loading(webapp.RequestHandler):
def post(self):
final_res=get_jid(pdf_t, pdf_nop, pdf_p)[2]
html = html + template.render(templatepath + 'popup_pdf_eco.html', {
'title':'Ubertool',
'model_page':'',
'model_attributes':'Please wait','text_paragraph':''})
self.response.out.write(html)
class pdfPage_done(webapp.RequestHandler):
def post(self):
final_res=get_jid(pdf_t, pdf_nop, pdf_p)[2]
html = html + template.render(templatepath + 'popup_pdf_eco.html', {
'title':'Ubertool',
'model_page':final_res,
'model_attributes':'Please download your PDF here','text_paragraph':''})
self.response.out.write(html)
app_loading = webapp.WSGIApplication([('/.*', pdfPage_loading)], debug=True)
app_done = webapp.WSGIApplication([('/.*', pdfPage_done)], debug=True)
def main():
##Here is the problematic part:
if get_jid(pdf_t, pdf_nop, pdf_p)!='done':
run_wsgi_app(app_pre)
else:
run_wsgi_app(app)
if __name__ == '__main__':
main()
First off, you do not need multiple WSGIApplication handlers. You can base requests off of URLs and GET/POST params.
I would use a mixture of tasks and the Channel API:
When a user visits the page:
Create a channel token for the user to use
Make the REST API calls
Kick off a task (described below) passing it the created token and the id of the PiCloud job
Show the "loading page" and have the user connect to the Channel API with the token created in step 1
In the task, have it check on the status of the PiCloud job. If the status is not done, have the task setup a new task to call itself again, and check on the status of the job. Once the job is complete, pass the data along through the Channel API to the user so that they can then load the contents of the page or redirect to a page that will have the contents ready for them.
Example status check code (this is using the example code from PiCloud, you can use your own requests as you have been to get the status through urlfetch - this is merely meant as an example so you can plug your code in to work as needed):
import cloud
from google.appengine.api import taskqueue
from google.appengine.api import channel
#<Your other hanlders here>
class CheckStatus(webapp.RequestHandler):
def post(self):
job_id = self.request.get('job_id')
token = self.request.get('token')
status = cloud.get('status')
if status != 'done':
taskqueue.add(url='/checkstatus', params={'job_id': job_id, 'token': token}, method="POST", countdown=3)
return
channel.send_message(token, '<some message here>')
app = webapp.WSGIApplication([('/done', pdfPage_done),
('/checkstatus', CheckStatus),
('/.*', pdfPage_loading)], debug=True)
def main():
run_wsgi_app(app)
if __name__ == '__main__':
main()