Tornado celery integration hacks - python

Since nobody provided a solution to this post plus the fact that I desperately need a workaround, here is my situation and some abstract solutions/ideas for debate.
My stack:
Tornado
Celery
MongoDB
Redis
RabbitMQ
My problem: Find a way for Tornado to dispatch a celery task ( solved ) and then asynchronously gather the result ( any ideas? ).
Scenario 1: (request/response hack plus webhook)
Tornado receives a (user)request, then saves in local memory (or in Redis) a { jobID : (user)request} to remember where to propagate the response, and fires a celery task with jobID
When celery completes the task, it performs a webhook at some url and tells tornado that this jobID has finished ( plus the results )
Tornado retrieves the (user)request and forwards a response to the (user)
Can this happen? Does it have any logic?
Scenario 2: (tornado plus long-polling)
Tornado dispatches the celery task and returns some primary json data to the client (jQuery)
jQuery does some long-polling upon receipt of the primary json, say, every x microseconds, and tornado replies according to some database flag. When the celery task completes, this database flag is set to True, then jQuery "loop" is finished.
Is this efficient?
Any other ideas/schemas?

My solution involves polling from tornado to celery:
class CeleryHandler(tornado.web.RequestHandlerr):
#tornado.web.asynchronous
def get(self):
task = yourCeleryTask.delay(**kwargs)
def check_celery_task():
if task.ready():
self.write({'success':True} )
self.set_header("Content-Type", "application/json")
self.finish()
else:
tornado.ioloop.IOLoop.instance().add_timeout(datetime.timedelta(0.00001), check_celery_task)
tornado.ioloop.IOLoop.instance().add_timeout(datetime.timedelta(0.00001), check_celery_task)
Here is post about it.

Here is our solution to the problem. Since we look for result in several handlers in our application we made the celery lookup a mixin class.
This also makes code more readable with the tornado.gen pattern.
from functools import partial
class CeleryResultMixin(object):
"""
Adds a callback function which could wait for the result asynchronously
"""
def wait_for_result(self, task, callback):
if task.ready():
callback(task.result)
else:
# TODO: Is this going to be too demanding on the result backend ?
# Probably there should be a timeout before each add_callback
tornado.ioloop.IOLoop.instance().add_callback(
partial(self.wait_for_result, task, callback)
)
class ARemoteTaskHandler(CeleryResultMixin, tornado.web.RequestHandler):
"""Execute a task asynchronously over a celery worker.
Wait for the result without blocking
When the result is available send it back
"""
#tornado.web.asynchronous
#tornado.web.authenticated
#tornado.gen.engine
def post(self):
"""Test the provided Magento connection
"""
task = expensive_task.delay(
self.get_argument('somearg'),
)
result = yield tornado.gen.Task(self.wait_for_result, task)
self.write({
'success': True,
'result': result.some_value
})
self.finish()

I stumbled upon this question and hitting the results backend repeatedly did not look optimal to me. So I implemented a Mixin similar to your Scenario 1 using Unix Sockets.
It notifies Tornado as soon as the task finishes (to be accurate, as soon as next task in chain runs) and only hits results backend once. Here is the link.

Now, https://github.com/mher/tornado-celery comes to rescue...
class GenAsyncHandler(web.RequestHandler):
#asynchronous
#gen.coroutine
def get(self):
response = yield gen.Task(tasks.sleep.apply_async, args=[3])
self.write(str(response.result))
self.finish()

Related

how to redirect to another page and call a python function when clicking on submit button [duplicate]

I am writing an application in Flask, which works really well except that WSGI is synchronous and blocking. I have one task in particular which calls out to a third party API and that task can take several minutes to complete. I would like to make that call (it's actually a series of calls) and let it run. while control is returned to Flask.
My view looks like:
#app.route('/render/<id>', methods=['POST'])
def render_script(id=None):
...
data = json.loads(request.data)
text_list = data.get('text_list')
final_file = audio_class.render_audio(data=text_list)
# do stuff
return Response(
mimetype='application/json',
status=200
)
Now, what I want to do is have the line
final_file = audio_class.render_audio()
run and provide a callback to be executed when the method returns, whilst Flask can continue to process requests. This is the only task which I need Flask to run asynchronously, and I would like some advice on how best to implement this.
I have looked at Twisted and Klein, but I'm not sure they are overkill, as maybe Threading would suffice. Or maybe Celery is a good choice for this?
I would use Celery to handle the asynchronous task for you. You'll need to install a broker to serve as your task queue (RabbitMQ and Redis are recommended).
app.py:
from flask import Flask
from celery import Celery
broker_url = 'amqp://guest#localhost' # Broker URL for RabbitMQ task queue
app = Flask(__name__)
celery = Celery(app.name, broker=broker_url)
celery.config_from_object('celeryconfig') # Your celery configurations in a celeryconfig.py
#celery.task(bind=True)
def some_long_task(self, x, y):
# Do some long task
...
#app.route('/render/<id>', methods=['POST'])
def render_script(id=None):
...
data = json.loads(request.data)
text_list = data.get('text_list')
final_file = audio_class.render_audio(data=text_list)
some_long_task.delay(x, y) # Call your async task and pass whatever necessary variables
return Response(
mimetype='application/json',
status=200
)
Run your Flask app, and start another process to run your celery worker.
$ celery worker -A app.celery --loglevel=debug
I would also refer to Miguel Gringberg's write up for a more in depth guide to using Celery with Flask.
Threading is another possible solution. Although the Celery based solution is better for applications at scale, if you are not expecting too much traffic on the endpoint in question, threading is a viable alternative.
This solution is based on Miguel Grinberg's PyCon 2016 Flask at Scale presentation, specifically slide 41 in his slide deck. His code is also available on github for those interested in the original source.
From a user perspective the code works as follows:
You make a call to the endpoint that performs the long running task.
This endpoint returns 202 Accepted with a link to check on the task status.
Calls to the status link returns 202 while the taks is still running, and returns 200 (and the result) when the task is complete.
To convert an api call to a background task, simply add the #async_api decorator.
Here is a fully contained example:
from flask import Flask, g, abort, current_app, request, url_for
from werkzeug.exceptions import HTTPException, InternalServerError
from flask_restful import Resource, Api
from datetime import datetime
from functools import wraps
import threading
import time
import uuid
tasks = {}
app = Flask(__name__)
api = Api(app)
#app.before_first_request
def before_first_request():
"""Start a background thread that cleans up old tasks."""
def clean_old_tasks():
"""
This function cleans up old tasks from our in-memory data structure.
"""
global tasks
while True:
# Only keep tasks that are running or that finished less than 5
# minutes ago.
five_min_ago = datetime.timestamp(datetime.utcnow()) - 5 * 60
tasks = {task_id: task for task_id, task in tasks.items()
if 'completion_timestamp' not in task or task['completion_timestamp'] > five_min_ago}
time.sleep(60)
if not current_app.config['TESTING']:
thread = threading.Thread(target=clean_old_tasks)
thread.start()
def async_api(wrapped_function):
#wraps(wrapped_function)
def new_function(*args, **kwargs):
def task_call(flask_app, environ):
# Create a request context similar to that of the original request
# so that the task can have access to flask.g, flask.request, etc.
with flask_app.request_context(environ):
try:
tasks[task_id]['return_value'] = wrapped_function(*args, **kwargs)
except HTTPException as e:
tasks[task_id]['return_value'] = current_app.handle_http_exception(e)
except Exception as e:
# The function raised an exception, so we set a 500 error
tasks[task_id]['return_value'] = InternalServerError()
if current_app.debug:
# We want to find out if something happened so reraise
raise
finally:
# We record the time of the response, to help in garbage
# collecting old tasks
tasks[task_id]['completion_timestamp'] = datetime.timestamp(datetime.utcnow())
# close the database session (if any)
# Assign an id to the asynchronous task
task_id = uuid.uuid4().hex
# Record the task, and then launch it
tasks[task_id] = {'task_thread': threading.Thread(
target=task_call, args=(current_app._get_current_object(),
request.environ))}
tasks[task_id]['task_thread'].start()
# Return a 202 response, with a link that the client can use to
# obtain task status
print(url_for('gettaskstatus', task_id=task_id))
return 'accepted', 202, {'Location': url_for('gettaskstatus', task_id=task_id)}
return new_function
class GetTaskStatus(Resource):
def get(self, task_id):
"""
Return status about an asynchronous task. If this request returns a 202
status code, it means that task hasn't finished yet. Else, the response
from the task is returned.
"""
task = tasks.get(task_id)
if task is None:
abort(404)
if 'return_value' not in task:
return '', 202, {'Location': url_for('gettaskstatus', task_id=task_id)}
return task['return_value']
class CatchAll(Resource):
#async_api
def get(self, path=''):
# perform some intensive processing
print("starting processing task, path: '%s'" % path)
time.sleep(10)
print("completed processing task, path: '%s'" % path)
return f'The answer is: {path}'
api.add_resource(CatchAll, '/<path:path>', '/')
api.add_resource(GetTaskStatus, '/status/<task_id>')
if __name__ == '__main__':
app.run(debug=True)
You can also try using multiprocessing.Process with daemon=True; the process.start() method does not block and you can return a response/status immediately to the caller while your expensive function executes in the background.
I experienced similar problem while working with falcon framework and using daemon process helped.
You'd need to do the following:
from multiprocessing import Process
#app.route('/render/<id>', methods=['POST'])
def render_script(id=None):
...
heavy_process = Process( # Create a daemonic process with heavy "my_func"
target=my_func,
daemon=True
)
heavy_process.start()
return Response(
mimetype='application/json',
status=200
)
# Define some heavy function
def my_func():
time.sleep(10)
print("Process finished")
You should get a response immediately and, after 10s you should see a printed message in the console.
NOTE: Keep in mind that daemonic processes are not allowed to spawn any child processes.
Flask 2.0
Flask 2.0 supports async routes now. You can use the httpx library and use the asyncio coroutines for that. You can change your code a bit like below
#app.route('/render/<id>', methods=['POST'])
async def render_script(id=None):
...
data = json.loads(request.data)
text_list = data.get('text_list')
final_file = await asyncio.gather(
audio_class.render_audio(data=text_list),
do_other_stuff_function()
)
# Just make sure that the coroutine should not having any blocking calls inside it.
return Response(
mimetype='application/json',
status=200
)
The above one is just a pseudo code, but you can checkout how asyncio works with flask 2.0 and for HTTP calls you can use httpx. And also make sure the coroutines are only doing some I/O tasks only.
If you are using redis, you can use Pubsub event to handle background tasks.
See more: https://redis.com/ebook/part-2-core-concepts/chapter-3-commands-in-redis/3-6-publishsubscribe/

Background tasks in flask

I am writing a web application which would do some heavy work. With that in mind I thought of making the tasks as background tasks(non blocking) so that other requests are not blocked by the previous ones.
I went with demonizing the thread so that it doesn't exit once the main thread (since I am using threaded=True) is finished, Now if a user sends a request my code will immediately tell them that their request is in progress, it'll be running in the background, and the application is ready to serve other requests.
My current application code looks something like this:
from flask import Flask
from flask import request
import threading
class threadClass:
def __init__(self):
thread = threading.Thread(target=self.run, args=())
thread.daemon = True # Daemonize thread
thread.start() # Start the execution
def run(self):
#
# This might take several minutes to complete
someHeavyFunction()
app = Flask(__name__)
#app.route('/start', methods=['POST'])
try:
begin = threadClass()
except:
abort(500)
return "Task is in progress"
def main():
"""
Main entry point into program execution
PARAMETERS: none
"""
app.run(host='0.0.0.0',threaded=True)
main()
I just want it to be able to handle a few concurrent requests (it's not gonna be used in production)
Could I have done this better? Did I miss anything? I was going through python's multi-threading package and found this
multiprocessing is a package that supports spawning processes using an
API similar to the threading module. The multiprocessing package
offers both local and remote concurrency, effectively side-stepping
the Global Interpreter Lock by using subprocesses instead of threads.
Due to this, the multiprocessing module allows the programmer to fully
leverage multiple processors on a given machine. It runs on both Unix
and Windows.
Can I demonize a process using multi-processing? How can I achieve better than what I have with threading module?
##EDIT
I went through the multi-processing package of python, it is similar to threading.
from flask import Flask
from flask import request
from multiprocessing import Process
class processClass:
def __init__(self):
p = Process(target=self.run, args=())
p.daemon = True # Daemonize it
p.start() # Start the execution
def run(self):
#
# This might take several minutes to complete
someHeavyFunction()
app = Flask(__name__)
#app.route('/start', methods=['POST'])
try:
begin = processClass()
except:
abort(500)
return "Task is in progress"
def main():
"""
Main entry point into program execution
PARAMETERS: none
"""
app.run(host='0.0.0.0',threaded=True)
main()
Does the above approach looks good?
Best practice
The best way to implement background tasks in flask is with Celery as explained in this SO post. A good starting point is the official Flask documentation and the Celery documentation.
Crazy way: Build your own decorator
As #MrLeeh pointed out in a comment, Miguel Grinberg presented a solution in his Pycon 2016 talk by implementing a decorator. I want to emphasize that I have the highest respect for his solution; he called it a "crazy solution" himself. The below code is a minor adaptation of his solution.
Warning!!!
Don't use this in production! The main reason is that this app has a memory leak by using the global tasks dictionary. Even if you fix the memory leak issue, maintaining this sort of code is hard. If you just want to play around or use this in a private project, read on.
Minimal example
Assume you have a long running function call in your /foo endpoint. I mock this with a 10 second sleep timer. If you call the enpoint three times, it will take 30 seconds to finish.
Miguel Grinbergs decorator solution is implemented in flask_async. It runs a new thread in a Flask context which is identical to the current Flask context. Each thread is issued a new task_id. The result is saved in a global dictionary tasks[task_id]['result'].
With the decorator in place you only need to decorate the endpoint with #flask_async and the endpoint is asynchronous - just like that!
import threading
import time
import uuid
from functools import wraps
from flask import Flask, current_app, request, abort
from werkzeug.exceptions import HTTPException, InternalServerError
app = Flask(__name__)
tasks = {}
def flask_async(f):
"""
This decorator transforms a sync route to asynchronous by running it in a background thread.
"""
#wraps(f)
def wrapped(*args, **kwargs):
def task(app, environ):
# Create a request context similar to that of the original request
with app.request_context(environ):
try:
# Run the route function and record the response
tasks[task_id]['result'] = f(*args, **kwargs)
except HTTPException as e:
tasks[task_id]['result'] = current_app.handle_http_exception(e)
except Exception as e:
# The function raised an exception, so we set a 500 error
tasks[task_id]['result'] = InternalServerError()
if current_app.debug:
# We want to find out if something happened so reraise
raise
# Assign an id to the asynchronous task
task_id = uuid.uuid4().hex
# Record the task, and then launch it
tasks[task_id] = {'task': threading.Thread(
target=task, args=(current_app._get_current_object(), request.environ))}
tasks[task_id]['task'].start()
# Return a 202 response, with an id that the client can use to obtain task status
return {'TaskId': task_id}, 202
return wrapped
#app.route('/foo')
#flask_async
def foo():
time.sleep(10)
return {'Result': True}
#app.route('/foo/<task_id>', methods=['GET'])
def foo_results(task_id):
"""
Return results of asynchronous task.
If this request returns a 202 status code, it means that task hasn't finished yet.
"""
task = tasks.get(task_id)
if task is None:
abort(404)
if 'result' not in task:
return {'TaskID': task_id}, 202
return task['result']
if __name__ == '__main__':
app.run(debug=True)
However, you need a little trick to get your results. The endpoint /foo will only return the HTTP code 202 and the task id, but not the result. You need another endpoint /foo/<task_id> to get the result. Here is an example for localhost:
import time
import requests
task_ids = [requests.get('http://127.0.0.1:5000/foo').json().get('TaskId')
for _ in range(2)]
time.sleep(11)
results = [requests.get(f'http://127.0.0.1:5000/foo/{task_id}').json()
for task_id in task_ids]
# [{'Result': True}, {'Result': True}]

tornado one handler blocks for another

Using python/tornado I wanted to set up a little "trampoline" server that allows two devices to communicate with each other in a RESTish manner. There's probably vastly superior/simpler "off the shelf" ways to do this. I'd welcome those suggestions, but I still feel it would be educational to figure out how to do my own using tornado.
Basically, the idea was that I would have the device in the role of server doing a longpoll with a GET. The client device would POST to the server, at which point the POST body would be transferred as the response of the blocked GET. Before the POST responded, it would block. The server side then does a PUT with the response, which is transferred to the blocked POST and return to the device. I thought maybe I could do this with tornado.queues. But that appears to not have worked out. My code:
import tornado
import tornado.web
import tornado.httpserver
import tornado.queues
ToServerQueue = tornado.queues.Queue()
ToClientQueue = tornado.queues.Queue()
class Query(tornado.web.RequestHandler):
def get(self):
toServer = ToServerQueue.get()
self.write(toServer)
def post(self):
toServer = self.request.body
ToServerQueue.put(toServer)
toClient = ToClientQueue.get()
self.write(toClient)
def put(self):
ToClientQueue.put(self.request.body)
self.write(bytes())
services = tornado.web.Application([(r'/query', Query)], debug=True)
services.listen(49009)
tornado.ioloop.IOLoop.instance().start()
Unfortunately, the ToServerQueue.get() does not actually block until the queue has an item, but rather returns a tornado.concurrent.Future. Which is not a legal value to pass to the self.write() call.
I guess my general question is twofold:
1) How can one HTTP verb invocation (e.g. get, put, post, etc) block and then be signaled by another HTTP verb invocation.
2) How can I share data from one invocation to another?
I've only really scratched the simple/straightforward use cases of making little REST servers with tornado. I wonder if the coroutine stuff is what I need, but haven't found a good tutorial/example of that to help me see the light, if that's indeed the way to go.
1) How can one HTTP verb invocation (e.g. get, put, post,u ne etc) block and then be signaled by another HTTP verb invocation.
2) How can I share data from one invocation to another?
The new RequestHandler object is created for every request. So you need some coordinator e.g. queues or locks with state object (in your case it would be re-implementing queue).
tornado.queues are queues for coroutines. Queue.get, Queue.put, Queue.join return Future objects, that need to be "resolved" - scheduled task done either with success or exception. To wait until future is resolved you should yielded it (just like in the doc examples of tornado.queues). The verbs method also need to be decorated with tornado.gen.coroutine.
import tornado.gen
class Query(tornado.web.RequestHandler):
#tornado.gen.coroutine
def get(self):
toServer = yield ToServerQueue.get()
self.write(toServer)
#tornado.gen.coroutine
def post(self):
toServer = self.request.body
yield ToServerQueue.put(toServer)
toClient = yield ToClientQueue.get()
self.write(toClient)
#tornado.gen.coroutine
def put(self):
yield ToClientQueue.put(self.request.body)
self.write(bytes())
The GET request will last (wait in non-blocking manner) until something will be available on the queue (or timeout that can be defined as Queue.get arg).
tornado.queues.Queue provides also get_nowait (there is put_nowait as well) that don't have to be yielded - returns immediately item from queue or throws exception.

Making an asynchronous task in Flask

I am writing an application in Flask, which works really well except that WSGI is synchronous and blocking. I have one task in particular which calls out to a third party API and that task can take several minutes to complete. I would like to make that call (it's actually a series of calls) and let it run. while control is returned to Flask.
My view looks like:
#app.route('/render/<id>', methods=['POST'])
def render_script(id=None):
...
data = json.loads(request.data)
text_list = data.get('text_list')
final_file = audio_class.render_audio(data=text_list)
# do stuff
return Response(
mimetype='application/json',
status=200
)
Now, what I want to do is have the line
final_file = audio_class.render_audio()
run and provide a callback to be executed when the method returns, whilst Flask can continue to process requests. This is the only task which I need Flask to run asynchronously, and I would like some advice on how best to implement this.
I have looked at Twisted and Klein, but I'm not sure they are overkill, as maybe Threading would suffice. Or maybe Celery is a good choice for this?
I would use Celery to handle the asynchronous task for you. You'll need to install a broker to serve as your task queue (RabbitMQ and Redis are recommended).
app.py:
from flask import Flask
from celery import Celery
broker_url = 'amqp://guest#localhost' # Broker URL for RabbitMQ task queue
app = Flask(__name__)
celery = Celery(app.name, broker=broker_url)
celery.config_from_object('celeryconfig') # Your celery configurations in a celeryconfig.py
#celery.task(bind=True)
def some_long_task(self, x, y):
# Do some long task
...
#app.route('/render/<id>', methods=['POST'])
def render_script(id=None):
...
data = json.loads(request.data)
text_list = data.get('text_list')
final_file = audio_class.render_audio(data=text_list)
some_long_task.delay(x, y) # Call your async task and pass whatever necessary variables
return Response(
mimetype='application/json',
status=200
)
Run your Flask app, and start another process to run your celery worker.
$ celery worker -A app.celery --loglevel=debug
I would also refer to Miguel Gringberg's write up for a more in depth guide to using Celery with Flask.
Threading is another possible solution. Although the Celery based solution is better for applications at scale, if you are not expecting too much traffic on the endpoint in question, threading is a viable alternative.
This solution is based on Miguel Grinberg's PyCon 2016 Flask at Scale presentation, specifically slide 41 in his slide deck. His code is also available on github for those interested in the original source.
From a user perspective the code works as follows:
You make a call to the endpoint that performs the long running task.
This endpoint returns 202 Accepted with a link to check on the task status.
Calls to the status link returns 202 while the taks is still running, and returns 200 (and the result) when the task is complete.
To convert an api call to a background task, simply add the #async_api decorator.
Here is a fully contained example:
from flask import Flask, g, abort, current_app, request, url_for
from werkzeug.exceptions import HTTPException, InternalServerError
from flask_restful import Resource, Api
from datetime import datetime
from functools import wraps
import threading
import time
import uuid
tasks = {}
app = Flask(__name__)
api = Api(app)
#app.before_first_request
def before_first_request():
"""Start a background thread that cleans up old tasks."""
def clean_old_tasks():
"""
This function cleans up old tasks from our in-memory data structure.
"""
global tasks
while True:
# Only keep tasks that are running or that finished less than 5
# minutes ago.
five_min_ago = datetime.timestamp(datetime.utcnow()) - 5 * 60
tasks = {task_id: task for task_id, task in tasks.items()
if 'completion_timestamp' not in task or task['completion_timestamp'] > five_min_ago}
time.sleep(60)
if not current_app.config['TESTING']:
thread = threading.Thread(target=clean_old_tasks)
thread.start()
def async_api(wrapped_function):
#wraps(wrapped_function)
def new_function(*args, **kwargs):
def task_call(flask_app, environ):
# Create a request context similar to that of the original request
# so that the task can have access to flask.g, flask.request, etc.
with flask_app.request_context(environ):
try:
tasks[task_id]['return_value'] = wrapped_function(*args, **kwargs)
except HTTPException as e:
tasks[task_id]['return_value'] = current_app.handle_http_exception(e)
except Exception as e:
# The function raised an exception, so we set a 500 error
tasks[task_id]['return_value'] = InternalServerError()
if current_app.debug:
# We want to find out if something happened so reraise
raise
finally:
# We record the time of the response, to help in garbage
# collecting old tasks
tasks[task_id]['completion_timestamp'] = datetime.timestamp(datetime.utcnow())
# close the database session (if any)
# Assign an id to the asynchronous task
task_id = uuid.uuid4().hex
# Record the task, and then launch it
tasks[task_id] = {'task_thread': threading.Thread(
target=task_call, args=(current_app._get_current_object(),
request.environ))}
tasks[task_id]['task_thread'].start()
# Return a 202 response, with a link that the client can use to
# obtain task status
print(url_for('gettaskstatus', task_id=task_id))
return 'accepted', 202, {'Location': url_for('gettaskstatus', task_id=task_id)}
return new_function
class GetTaskStatus(Resource):
def get(self, task_id):
"""
Return status about an asynchronous task. If this request returns a 202
status code, it means that task hasn't finished yet. Else, the response
from the task is returned.
"""
task = tasks.get(task_id)
if task is None:
abort(404)
if 'return_value' not in task:
return '', 202, {'Location': url_for('gettaskstatus', task_id=task_id)}
return task['return_value']
class CatchAll(Resource):
#async_api
def get(self, path=''):
# perform some intensive processing
print("starting processing task, path: '%s'" % path)
time.sleep(10)
print("completed processing task, path: '%s'" % path)
return f'The answer is: {path}'
api.add_resource(CatchAll, '/<path:path>', '/')
api.add_resource(GetTaskStatus, '/status/<task_id>')
if __name__ == '__main__':
app.run(debug=True)
You can also try using multiprocessing.Process with daemon=True; the process.start() method does not block and you can return a response/status immediately to the caller while your expensive function executes in the background.
I experienced similar problem while working with falcon framework and using daemon process helped.
You'd need to do the following:
from multiprocessing import Process
#app.route('/render/<id>', methods=['POST'])
def render_script(id=None):
...
heavy_process = Process( # Create a daemonic process with heavy "my_func"
target=my_func,
daemon=True
)
heavy_process.start()
return Response(
mimetype='application/json',
status=200
)
# Define some heavy function
def my_func():
time.sleep(10)
print("Process finished")
You should get a response immediately and, after 10s you should see a printed message in the console.
NOTE: Keep in mind that daemonic processes are not allowed to spawn any child processes.
Flask 2.0
Flask 2.0 supports async routes now. You can use the httpx library and use the asyncio coroutines for that. You can change your code a bit like below
#app.route('/render/<id>', methods=['POST'])
async def render_script(id=None):
...
data = json.loads(request.data)
text_list = data.get('text_list')
final_file = await asyncio.gather(
audio_class.render_audio(data=text_list),
do_other_stuff_function()
)
# Just make sure that the coroutine should not having any blocking calls inside it.
return Response(
mimetype='application/json',
status=200
)
The above one is just a pseudo code, but you can checkout how asyncio works with flask 2.0 and for HTTP calls you can use httpx. And also make sure the coroutines are only doing some I/O tasks only.
If you are using redis, you can use Pubsub event to handle background tasks.
See more: https://redis.com/ebook/part-2-core-concepts/chapter-3-commands-in-redis/3-6-publishsubscribe/

Simple threading in Django

I need to implement threading in Django. I require three simple APIs:
/work?process=data1&jobid=1&jobtype=nonasync
/status
/kill?jobid=1
The API descriptions are:
The work api will take a process and spawn a thread that processes it. For now, we can assume it to be a simple sleep(10) method. It will name the thread as jobid-1. The thread should be retrievable by this name. A new thread cannot be created if a jobid already exists. The jobtype could be async i.e, api call will immediately return http status code 200 after spawning a thread. Or it could be nonasync such that the api waits for the server to complete the thread and return result.
status api should just show the statues of each running processes.
kill api should kill a process based on jobid. status api should not show this job any longer.
Here is my Django code:
processList = []
class Processes(threading.Thread):
""" The work api can instantiate a process object and monitor it completion"""
threadBeginTime = time.time()
def __init__(self, timeout, threadName, jobType):
threading.Thread.__init__(self)
self.totalWaitTime = timeout
self.threadName = threadName
self.jobType = jobtype
def beginThread(self):
self.thread = threading.Thread(target=self.execution,
name = self.threadName)
self.thread.start()
def execution(self):
time.sleep(self.totalWaitTime)
def calculatePercentDone(self):
"""Gets the current percent done for the thread."""
temp = time.time()
secondsDone = float(temp - self.threadBeginTime)
percentDone = float((secondsDone) * 100 / self.totalWaitTime)
return (secondsDone, percentDone)
def killThread(self):
pass
# time.sleep(self.totalWaitTime)
def work(request):
""" Django process initiation view """
data = {}
timeout = int(request.REQUEST.get('process'))
jobid = int(request.REQUEST.get('jobid'))
jobtype = int(request.REQUEST.get('jobtype'))
myProcess = Processes(timeout, jobid, jobtype)
myProcess.beginThread()
processList.append(myProcess)
return render_to_response('work.html',{'data':data}, RequestContext(request))
def status(request):
""" Django process status view """
data = {}
for p in processList:
print p.threadName, p.calculatePercentDone()
return render_to_response('server-status.html',{'data':data}, RequestContext(request))
def kill(request):
""" Django process kill view """
data = {}
jobid = int(request.REQUEST.get('jobid'))
# find jobid in processList and kill it
return render_to_response('server-status.html',{'data':data}, RequestContext(request))
There are several implementation issues in the above code. The thread spawning is not done in a proper way. I am not able to retrieve the processes status in status function. Also, the kill function is still implemented as I could not grab thread from its job id. Need help refactoring.
Update: I am doing this example for learning purposes, not for writing production code. Hence will not favour any off-the-shelf queueing libraries. The objective here is to understand how a multithreading works in conjunction with a web framework and what edge cases are there to be dealt.
As #Daniel Roseman mentioned above -- doing threading INSIDE of a Django request / response cycle is a very bad idea for many reasons.
What you're actually looking for here is task queueing.
There are a few libraries out there which make this sort of thing fairly simple -- I'll list them below in order of ease-of-use (the simplest ones are listed first):
Django-RQ (https://github.com/ui/django-rq) -- A very awesome, simple API that uses Redis to handle queueing and asynchronous tasks.
Celery (http://docs.celeryproject.org/en/latest/django/first-steps-with-django.html) -- A very powerful, flexible, and large projects which handles queuing and supports many different technologies for backends. I'd recommend this for large projects, but for everything else I'd use RQ as it's quite a bit simpler.
Just my two cents.

Categories

Resources