I have couple of simple tasks which could take maximum 20 seconds to complete, so I decided to use separate thread to accomplish them. I want thread to do the job and update database with result.
While it works (no exceptions yet) I have lack of understanding Flask internals and how it works with WSGI server. I'm not quite sure that on some amount of parallel requests it won't end with some database access error.
Simplified code:
from time import time, sleep
from threading import Thread
from peewee import *
from playhouse.shortcuts import model_to_dict
from flask import Flask, abort, jsonify
db = SqliteDatabase("test.db")
Flask(__name__)
class Task(Model):
status = IntegerField(default=0)
result = TextField(null=True)
class Meta:
database = db
def do_smth(task_id):
start = time()
sleep(10)
# DATABASE UPDATE HERE
Task.update({Task.status: 1, Task.result: f"{start} - {time()}"})\
.where(Task.id == task_id).execute()
#app.route("/new")
def new_task():
try:
task = Task.create()
except IntegrityError:
abort(500)
else:
Thread(target=do_smth, args=(task.id,)).start()
return jsonify(model_to_dict(task))
#app.route("/get/<int:task_id>")
def get_task(task_id):
try:
task = Task.get(Task.id == task_id)
except Task.DoesNotExist:
abort(404)
else:
return jsonify(model_to_dict(task))
#app.before_request
def before_request():
db.connect()
#app.after_request
def after_request(response):
db.close()
return response
if __name__ == "__main__":
with db:
db.create_tables([Task])
app.run(host="127.0.0.1", port=5000)
As it suggested in peewee tutorial I added custom Flask.before_request and Flask.after_request which open and close database connection.
So the question is how to update database from separate thread safely? I have had an idea to add route which will update database and send request from thread, but I find it kinda dumb.
P.S. I've tried my best trying to be precise, but if something is unclear I will try to clarify it, just ask it in comments section.
This is a good question:
how to update database from separate thread safely?
With Sqlite you have to remember that it only allows one writer at a time. So you have to manage your connections carefully to ensure that you are only doing a write txn when you have to, and that you're committing it as soon as you're done with it.
Since you're opening and closing the DB during the lifetime of a request, and running your DB operations in separate thread(s), you should be OK for a smallish number of operations (100?). I think the main thing I'd be careful about is, during your task body, be sure you're only holding that write txn open for as short a time as possible:
def do_smth(task_id):
# open a database connection. it will be read-only for now.
with db.connection_context():
start = time()
sleep(10)
with db.atomic() as txn: # here is write tx, keep this brief!
Task.update({Task.status: 1, Task.result: f"{start} - {time()}"})\
.where(Task.id == task_id).execute()
See the first section on transactions: https://charlesleifer.com/blog/going-fast-with-sqlite-and-python/
For a more drastic approach, you can try this: https://charlesleifer.com/blog/multi-threaded-sqlite-without-the-operationalerrors/
Related
I'm new to Python and I'm working on a FastAPI & peewee application. I want to manage the database connection pool explicitly so according to the framework integration documentation for FastAPI I use the startup/shutdown events.
database = PooledSqliteDatabase('foobar.db', autoconnect=False, stale_timeout=60)
with database:
database.create_tables([models.Foobar])
app = FastAPI()
app.include_router(foobar_router)
#app.on_event("startup")
def startup():
database.connect()
#app.on_event("shutdown")
def shutdown():
if not database.is_closed():
database.close()
The problem is that these events do not correspond to "open the connection when a request is received, then close it when the response is returned", but when the whole app starts and stops. More importantly I don't understand why I get "Error, database connection not opened." when I access the database in my router as it should still be opened at that point? It only seems to work when I add async to my routes.
I guess I could switch to some proper "before/after request" middleware, but at this point I'm wondering if my approach is just completely wrong.
Edit: I tried using middleware, but while this unsurprisingly works as intended, I still have the issue that I get "Error, database connection not opened." unless I add async to my routes.
#app.middleware("http")
async def with_database(request: Request, call_next):
with database:
response = await call_next(request)
return response
Note your pool probably won't work as expected in an asyncio environment. Peewee's connection model, including the pool, is built around a thread-per-connection. With async the connections are pooled, but since everything is running in the same thread, all your coroutines will be sharing the same connection. This doesn't present any problems for standard threaded or gevent code (which patches thread local to be green-thread local).
That said, I don't use FastAPI but extrapolating from their (insane) sqlalchemy doc, you might try the following approach if you want to avoid a per-request middleware:
from fastapi import Depends, FastAPI, HTTPException
app = FastAPI()
database = PooledSqliteDatabase('foobar.db', autoconnect=False, stale_timeout=60)
with database:
database.create_tables([models.Foobar])
def ensure_connection():
database.connect()
try:
yield database
finally:
database.close()
#app.post("/users/", ...)
def create_user(..., database=Depends(ensure_connection)):
...
I think a per-request middleware is probably cleaner to implement, however. Given my caveat at the beginning of the comment, I'd strongly suggest you analyze your application and verify that it's using connections properly -- very likely (and this will depend on when you yield to the event loop) it is not, though.
I am writing an application in Flask, which works really well except that WSGI is synchronous and blocking. I have one task in particular which calls out to a third party API and that task can take several minutes to complete. I would like to make that call (it's actually a series of calls) and let it run. while control is returned to Flask.
My view looks like:
#app.route('/render/<id>', methods=['POST'])
def render_script(id=None):
...
data = json.loads(request.data)
text_list = data.get('text_list')
final_file = audio_class.render_audio(data=text_list)
# do stuff
return Response(
mimetype='application/json',
status=200
)
Now, what I want to do is have the line
final_file = audio_class.render_audio()
run and provide a callback to be executed when the method returns, whilst Flask can continue to process requests. This is the only task which I need Flask to run asynchronously, and I would like some advice on how best to implement this.
I have looked at Twisted and Klein, but I'm not sure they are overkill, as maybe Threading would suffice. Or maybe Celery is a good choice for this?
I would use Celery to handle the asynchronous task for you. You'll need to install a broker to serve as your task queue (RabbitMQ and Redis are recommended).
app.py:
from flask import Flask
from celery import Celery
broker_url = 'amqp://guest#localhost' # Broker URL for RabbitMQ task queue
app = Flask(__name__)
celery = Celery(app.name, broker=broker_url)
celery.config_from_object('celeryconfig') # Your celery configurations in a celeryconfig.py
#celery.task(bind=True)
def some_long_task(self, x, y):
# Do some long task
...
#app.route('/render/<id>', methods=['POST'])
def render_script(id=None):
...
data = json.loads(request.data)
text_list = data.get('text_list')
final_file = audio_class.render_audio(data=text_list)
some_long_task.delay(x, y) # Call your async task and pass whatever necessary variables
return Response(
mimetype='application/json',
status=200
)
Run your Flask app, and start another process to run your celery worker.
$ celery worker -A app.celery --loglevel=debug
I would also refer to Miguel Gringberg's write up for a more in depth guide to using Celery with Flask.
Threading is another possible solution. Although the Celery based solution is better for applications at scale, if you are not expecting too much traffic on the endpoint in question, threading is a viable alternative.
This solution is based on Miguel Grinberg's PyCon 2016 Flask at Scale presentation, specifically slide 41 in his slide deck. His code is also available on github for those interested in the original source.
From a user perspective the code works as follows:
You make a call to the endpoint that performs the long running task.
This endpoint returns 202 Accepted with a link to check on the task status.
Calls to the status link returns 202 while the taks is still running, and returns 200 (and the result) when the task is complete.
To convert an api call to a background task, simply add the #async_api decorator.
Here is a fully contained example:
from flask import Flask, g, abort, current_app, request, url_for
from werkzeug.exceptions import HTTPException, InternalServerError
from flask_restful import Resource, Api
from datetime import datetime
from functools import wraps
import threading
import time
import uuid
tasks = {}
app = Flask(__name__)
api = Api(app)
#app.before_first_request
def before_first_request():
"""Start a background thread that cleans up old tasks."""
def clean_old_tasks():
"""
This function cleans up old tasks from our in-memory data structure.
"""
global tasks
while True:
# Only keep tasks that are running or that finished less than 5
# minutes ago.
five_min_ago = datetime.timestamp(datetime.utcnow()) - 5 * 60
tasks = {task_id: task for task_id, task in tasks.items()
if 'completion_timestamp' not in task or task['completion_timestamp'] > five_min_ago}
time.sleep(60)
if not current_app.config['TESTING']:
thread = threading.Thread(target=clean_old_tasks)
thread.start()
def async_api(wrapped_function):
#wraps(wrapped_function)
def new_function(*args, **kwargs):
def task_call(flask_app, environ):
# Create a request context similar to that of the original request
# so that the task can have access to flask.g, flask.request, etc.
with flask_app.request_context(environ):
try:
tasks[task_id]['return_value'] = wrapped_function(*args, **kwargs)
except HTTPException as e:
tasks[task_id]['return_value'] = current_app.handle_http_exception(e)
except Exception as e:
# The function raised an exception, so we set a 500 error
tasks[task_id]['return_value'] = InternalServerError()
if current_app.debug:
# We want to find out if something happened so reraise
raise
finally:
# We record the time of the response, to help in garbage
# collecting old tasks
tasks[task_id]['completion_timestamp'] = datetime.timestamp(datetime.utcnow())
# close the database session (if any)
# Assign an id to the asynchronous task
task_id = uuid.uuid4().hex
# Record the task, and then launch it
tasks[task_id] = {'task_thread': threading.Thread(
target=task_call, args=(current_app._get_current_object(),
request.environ))}
tasks[task_id]['task_thread'].start()
# Return a 202 response, with a link that the client can use to
# obtain task status
print(url_for('gettaskstatus', task_id=task_id))
return 'accepted', 202, {'Location': url_for('gettaskstatus', task_id=task_id)}
return new_function
class GetTaskStatus(Resource):
def get(self, task_id):
"""
Return status about an asynchronous task. If this request returns a 202
status code, it means that task hasn't finished yet. Else, the response
from the task is returned.
"""
task = tasks.get(task_id)
if task is None:
abort(404)
if 'return_value' not in task:
return '', 202, {'Location': url_for('gettaskstatus', task_id=task_id)}
return task['return_value']
class CatchAll(Resource):
#async_api
def get(self, path=''):
# perform some intensive processing
print("starting processing task, path: '%s'" % path)
time.sleep(10)
print("completed processing task, path: '%s'" % path)
return f'The answer is: {path}'
api.add_resource(CatchAll, '/<path:path>', '/')
api.add_resource(GetTaskStatus, '/status/<task_id>')
if __name__ == '__main__':
app.run(debug=True)
You can also try using multiprocessing.Process with daemon=True; the process.start() method does not block and you can return a response/status immediately to the caller while your expensive function executes in the background.
I experienced similar problem while working with falcon framework and using daemon process helped.
You'd need to do the following:
from multiprocessing import Process
#app.route('/render/<id>', methods=['POST'])
def render_script(id=None):
...
heavy_process = Process( # Create a daemonic process with heavy "my_func"
target=my_func,
daemon=True
)
heavy_process.start()
return Response(
mimetype='application/json',
status=200
)
# Define some heavy function
def my_func():
time.sleep(10)
print("Process finished")
You should get a response immediately and, after 10s you should see a printed message in the console.
NOTE: Keep in mind that daemonic processes are not allowed to spawn any child processes.
Flask 2.0
Flask 2.0 supports async routes now. You can use the httpx library and use the asyncio coroutines for that. You can change your code a bit like below
#app.route('/render/<id>', methods=['POST'])
async def render_script(id=None):
...
data = json.loads(request.data)
text_list = data.get('text_list')
final_file = await asyncio.gather(
audio_class.render_audio(data=text_list),
do_other_stuff_function()
)
# Just make sure that the coroutine should not having any blocking calls inside it.
return Response(
mimetype='application/json',
status=200
)
The above one is just a pseudo code, but you can checkout how asyncio works with flask 2.0 and for HTTP calls you can use httpx. And also make sure the coroutines are only doing some I/O tasks only.
If you are using redis, you can use Pubsub event to handle background tasks.
See more: https://redis.com/ebook/part-2-core-concepts/chapter-3-commands-in-redis/3-6-publishsubscribe/
I am writing a web application which would do some heavy work. With that in mind I thought of making the tasks as background tasks(non blocking) so that other requests are not blocked by the previous ones.
I went with demonizing the thread so that it doesn't exit once the main thread (since I am using threaded=True) is finished, Now if a user sends a request my code will immediately tell them that their request is in progress, it'll be running in the background, and the application is ready to serve other requests.
My current application code looks something like this:
from flask import Flask
from flask import request
import threading
class threadClass:
def __init__(self):
thread = threading.Thread(target=self.run, args=())
thread.daemon = True # Daemonize thread
thread.start() # Start the execution
def run(self):
#
# This might take several minutes to complete
someHeavyFunction()
app = Flask(__name__)
#app.route('/start', methods=['POST'])
try:
begin = threadClass()
except:
abort(500)
return "Task is in progress"
def main():
"""
Main entry point into program execution
PARAMETERS: none
"""
app.run(host='0.0.0.0',threaded=True)
main()
I just want it to be able to handle a few concurrent requests (it's not gonna be used in production)
Could I have done this better? Did I miss anything? I was going through python's multi-threading package and found this
multiprocessing is a package that supports spawning processes using an
API similar to the threading module. The multiprocessing package
offers both local and remote concurrency, effectively side-stepping
the Global Interpreter Lock by using subprocesses instead of threads.
Due to this, the multiprocessing module allows the programmer to fully
leverage multiple processors on a given machine. It runs on both Unix
and Windows.
Can I demonize a process using multi-processing? How can I achieve better than what I have with threading module?
##EDIT
I went through the multi-processing package of python, it is similar to threading.
from flask import Flask
from flask import request
from multiprocessing import Process
class processClass:
def __init__(self):
p = Process(target=self.run, args=())
p.daemon = True # Daemonize it
p.start() # Start the execution
def run(self):
#
# This might take several minutes to complete
someHeavyFunction()
app = Flask(__name__)
#app.route('/start', methods=['POST'])
try:
begin = processClass()
except:
abort(500)
return "Task is in progress"
def main():
"""
Main entry point into program execution
PARAMETERS: none
"""
app.run(host='0.0.0.0',threaded=True)
main()
Does the above approach looks good?
Best practice
The best way to implement background tasks in flask is with Celery as explained in this SO post. A good starting point is the official Flask documentation and the Celery documentation.
Crazy way: Build your own decorator
As #MrLeeh pointed out in a comment, Miguel Grinberg presented a solution in his Pycon 2016 talk by implementing a decorator. I want to emphasize that I have the highest respect for his solution; he called it a "crazy solution" himself. The below code is a minor adaptation of his solution.
Warning!!!
Don't use this in production! The main reason is that this app has a memory leak by using the global tasks dictionary. Even if you fix the memory leak issue, maintaining this sort of code is hard. If you just want to play around or use this in a private project, read on.
Minimal example
Assume you have a long running function call in your /foo endpoint. I mock this with a 10 second sleep timer. If you call the enpoint three times, it will take 30 seconds to finish.
Miguel Grinbergs decorator solution is implemented in flask_async. It runs a new thread in a Flask context which is identical to the current Flask context. Each thread is issued a new task_id. The result is saved in a global dictionary tasks[task_id]['result'].
With the decorator in place you only need to decorate the endpoint with #flask_async and the endpoint is asynchronous - just like that!
import threading
import time
import uuid
from functools import wraps
from flask import Flask, current_app, request, abort
from werkzeug.exceptions import HTTPException, InternalServerError
app = Flask(__name__)
tasks = {}
def flask_async(f):
"""
This decorator transforms a sync route to asynchronous by running it in a background thread.
"""
#wraps(f)
def wrapped(*args, **kwargs):
def task(app, environ):
# Create a request context similar to that of the original request
with app.request_context(environ):
try:
# Run the route function and record the response
tasks[task_id]['result'] = f(*args, **kwargs)
except HTTPException as e:
tasks[task_id]['result'] = current_app.handle_http_exception(e)
except Exception as e:
# The function raised an exception, so we set a 500 error
tasks[task_id]['result'] = InternalServerError()
if current_app.debug:
# We want to find out if something happened so reraise
raise
# Assign an id to the asynchronous task
task_id = uuid.uuid4().hex
# Record the task, and then launch it
tasks[task_id] = {'task': threading.Thread(
target=task, args=(current_app._get_current_object(), request.environ))}
tasks[task_id]['task'].start()
# Return a 202 response, with an id that the client can use to obtain task status
return {'TaskId': task_id}, 202
return wrapped
#app.route('/foo')
#flask_async
def foo():
time.sleep(10)
return {'Result': True}
#app.route('/foo/<task_id>', methods=['GET'])
def foo_results(task_id):
"""
Return results of asynchronous task.
If this request returns a 202 status code, it means that task hasn't finished yet.
"""
task = tasks.get(task_id)
if task is None:
abort(404)
if 'result' not in task:
return {'TaskID': task_id}, 202
return task['result']
if __name__ == '__main__':
app.run(debug=True)
However, you need a little trick to get your results. The endpoint /foo will only return the HTTP code 202 and the task id, but not the result. You need another endpoint /foo/<task_id> to get the result. Here is an example for localhost:
import time
import requests
task_ids = [requests.get('http://127.0.0.1:5000/foo').json().get('TaskId')
for _ in range(2)]
time.sleep(11)
results = [requests.get(f'http://127.0.0.1:5000/foo/{task_id}').json()
for task_id in task_ids]
# [{'Result': True}, {'Result': True}]
I've created a web application using Twisted and SQLAlchemy. Since SQLAlchemy doesn't work together very well with Twisted's callback-based design (Twisted + SQLAlchemy and the best way to do it), I use deferToThread() within the root resource in order to run every request within its own thread. While this does generally work, about 10% of the requests get "stuck". This means that when I click a link in the browser, the request is handled by Twisted and the code for the respective resource runs and generates HTML output. But for whatever reason, that output is never sent back to the browser. Instead, Twisted sends the HTTP headers (along with the correct Content-Length), but never sends the body. The connection just stays open indefinitely with the browser showing the spinner icon. No errors are generated by Twisted in the logfile.
Below is a minimal example. If you want to run it, save it with a .tac extension, then run twistd -noy example.tac. On my server, the issue seems to occur relatively infrequently in this particular piece of code. Use something like while true; do wget -O- 'http://server.example.com:8080' >/dev/null; done to test it.
from twisted.web.server import Site
from twisted.application import service, internet
from twisted.web.resource import Resource
from twisted.internet import threads
from twisted.web.server import NOT_DONE_YET
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker
from sqlalchemy import create_engine, Column, Integer, String
Base = declarative_base()
class User(Base):
'''A user account.'''
__tablename__ = 'user'
id = Column(Integer, primary_key=True)
login = Column(String(64))
class WebInterface(Resource):
def __init__(self):
Resource.__init__(self)
db_url = "mysql://user:password#mysql-server.example.com/myapp?charset=utf8"
db_engine = create_engine(db_url, echo=False, pool_recycle=300) #discard connections after 300 seconds
self.DBSession = sessionmaker(bind=db_engine)
def on_request_done(self, _, request):
'''All actions that need to be done after a request has been successfully handled.'''
request.db_session.close()
print('Session closed') #does get printed, so session should get closed properly
def on_request_failed(self, err, call):
'''What happens if the request failed on a network level, for example because the user aborted the request'''
call.cancel()
def on_error(self, err, request):
'''What happens if an exception occurs during processing of the request'''
request.setResponseCode(500)
self.on_request_done(None, request)
request.finish()
return err
def getChild(self, name, request):
'''We dispatch all requests to ourselves in order to be able to do the processing in separate threads'''
return self
def render(self, request):
'''Dispatch the real work to a thread'''
d = threads.deferToThread(self.do_work, request)
d.addCallbacks(self.on_request_done, errback=self.on_error, callbackArgs=[request], errbackArgs=[request])
#If the client aborts the request, we need to cancel it to avoid error messages from twisted
request.notifyFinish().addErrback(self.on_request_failed, d)
return NOT_DONE_YET
def do_work(self, request):
'''This method runs in thread context.'''
db_session = self.DBSession()
request.db_session = db_session
user = db_session.query(User).first()
body = 'Hello, {} '.format(user.login) * 1024 #generate some output data
request.write(body)
request.finish()
application = service.Application("My Testapp")
s = internet.TCPServer(8080, Site(WebInterface()), interface='0.0.0.0')
s.setServiceParent(application)
Its possible you are not closing your database connection or some dead lock situation in the database using SQLAlchemy? I've had flask lock up on me before from not closing connections / not ending transactions.
I've solved the issue. #beiller, you were pretty close to it with your guess. As can be seen in the source code of my question, the DB session gets opened after request processing has started, but the two are closed in the same (instead of the reverse) order. Close the session before calling request.finish(), and everything's fine.
I'm using Flask-Testing which says:
Another gotcha is that Flask-SQLAlchemy also removes the session
instance at the end of every request (as should any threadsafe
application using SQLAlchemy with scoped_session). Therefore the
session is cleared along with any objects added to it every time you
call client.get() or another client method.
However, I'm not seeing that. This test fails:
from flask import Flask
from flask.ext.sqlalchemy import SQLAlchemy
from flask.ext.testing import TestCase
app = Flask(__name__)
app.config['SQLALCHEMY_DATABASE_URI'] = 'sqlite:////tmp/test.db'
db = SQLAlchemy(app)
#app.route('/')
def index():
print 'before request:', `db.session`
u = db.session.query(User).first()
u.name = 'bob'
return ''
class User(db.Model):
id = db.Column(db.Integer, primary_key=True)
name = db.Column(db.String)
class SessionTest(TestCase):
def create_app(self):
return app
def test_remove(self):
db.drop_all()
db.create_all()
u = User()
u.name = 'joe'
db.session.add(u)
db.session.commit()
client = app.test_client()
client.get('/')
print 'after request:', `db.session`
print u.name
assert u not in db.session
(Run with $ nosetests test_file.py to see it in action.)
stdout:
-------------------- >> begin captured stdout << ---------------------
before request: <sqlalchemy.orm.scoping.ScopedSession object at 0x10224c610>
after request: <sqlalchemy.orm.scoping.ScopedSession object at 0x10224c610>
bob
--------------------- >> end captured stdout << ----------------------
According to the docs, user u should not be in the session after a get request, but it is! Does anybody know why this is happening?
Furthermore, u.name is bob and not joe, even though the request never committed! (So I'm convinced it's the same session.)
For the record,
$ pip freeze | grep Flask
Flask==0.10.1
Flask-Bcrypt==0.5.2
Flask-DebugToolbar==0.8.0
Flask-Failsafe==0.1
Flask-SQLAlchemy==0.16
Flask-Script==0.6.2
Flask-Testing==0.4
Flask-Uploads==0.1.3
Flask-WTF==0.8
I'm pretty sure the confusion comes from the fact that sessions in SQLAlchemy are scoped, meaning that each request handler creates and destroys its own session.
This is necessary because web servers can be multi-threaded, so multiple requests might be served at the same time, each working with a different database session.
For this reason, the session that you used outside of the context of a request is likely not the same session that the view function that handles the '/' route gets and then destroys at the end.
UPDATE: I dug around a bit and figured this thing out.
Flask-SQLAlchemy installs a hook on app.teardown_appcontext, and here is where it calls db.session.remove().
The testing environment does not fully replicate the environment of a real request because it does not push/pop the application context. Because of that the session is never removed at the end of the request.
As a side note, keep in mind that functions registered with before_request and after_request are also not called when you call client.get().
You can force an application context push and pop with a small change to your test:
def test_remove(self):
db.drop_all()
db.create_all()
u = User()
u.name = 'joe'
db.session.add(u)
db.session.commit()
with app.app_context():
client = app.test_client()
client.get('/')
print 'after request:', `db.session`
print u.name
assert u not in db.session
with this change the test passes.
The documentation for Flask-Testing seems to be wrong or more likely outdated. Maybe things worked like they describe at some point, but that isn't accurate for current Flask and Flask-SQLAlchemy versions.
I hope this helps!
FlaskClient works with request context while Flask-SQLAlchemy calls it's shutdown_session on app.teardown_appcontext since Flask version 0.9. Thats why nothing happens after test client call, bacause app context started by flask.ext.testing.TestCase even before test's setUp and will be closed after tearDown.
I got the same problem when tried to use Flask-Manage to run my tests. Running tests in a separate thread solved the problem.
import threading
# some code omited
runner = unittest.TextTestRunner(verbosity=2)
t = threading.Thread(target=runner.run, args=[test_suite])
t.start()
t.join()
# other code omited