Twisted webapp gets stuck when generating HTTP response - python

I've created a web application using Twisted and SQLAlchemy. Since SQLAlchemy doesn't work together very well with Twisted's callback-based design (Twisted + SQLAlchemy and the best way to do it), I use deferToThread() within the root resource in order to run every request within its own thread. While this does generally work, about 10% of the requests get "stuck". This means that when I click a link in the browser, the request is handled by Twisted and the code for the respective resource runs and generates HTML output. But for whatever reason, that output is never sent back to the browser. Instead, Twisted sends the HTTP headers (along with the correct Content-Length), but never sends the body. The connection just stays open indefinitely with the browser showing the spinner icon. No errors are generated by Twisted in the logfile.
Below is a minimal example. If you want to run it, save it with a .tac extension, then run twistd -noy example.tac. On my server, the issue seems to occur relatively infrequently in this particular piece of code. Use something like while true; do wget -O- 'http://server.example.com:8080' >/dev/null; done to test it.
from twisted.web.server import Site
from twisted.application import service, internet
from twisted.web.resource import Resource
from twisted.internet import threads
from twisted.web.server import NOT_DONE_YET
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker
from sqlalchemy import create_engine, Column, Integer, String
Base = declarative_base()

class User(Base):
    '''A user account.'''
    __tablename__ = 'user'
    id = Column(Integer, primary_key=True)
    login = Column(String(64))

class WebInterface(Resource):
    def __init__(self):
        Resource.__init__(self)
        db_url = "mysql://user:password@mysql-server.example.com/myapp?charset=utf8"
        db_engine = create_engine(db_url, echo=False, pool_recycle=300) #discard connections after 300 seconds
        self.DBSession = sessionmaker(bind=db_engine)

    def on_request_done(self, _, request):
        '''All actions that need to be done after a request has been successfully handled.'''
        request.db_session.close()
        print('Session closed') #does get printed, so the session should get closed properly

    def on_request_failed(self, err, call):
        '''Called if the request failed on the network level, for example because the user aborted it'''
        call.cancel()

    def on_error(self, err, request):
        '''Called if an exception occurs while processing the request'''
        request.setResponseCode(500)
        self.on_request_done(None, request)
        request.finish()
        return err

    def getChild(self, name, request):
        '''We dispatch all requests to ourselves in order to be able to do the processing in separate threads'''
        return self

    def render(self, request):
        '''Dispatch the real work to a thread'''
        d = threads.deferToThread(self.do_work, request)
        d.addCallbacks(self.on_request_done, errback=self.on_error, callbackArgs=[request], errbackArgs=[request])
        #If the client aborts the request, we need to cancel it to avoid error messages from Twisted
        request.notifyFinish().addErrback(self.on_request_failed, d)
        return NOT_DONE_YET

    def do_work(self, request):
        '''This method runs in thread context.'''
        db_session = self.DBSession()
        request.db_session = db_session
        user = db_session.query(User).first()
        body = 'Hello, {} '.format(user.login) * 1024 #generate some output data
        request.write(body)
        request.finish()

application = service.Application("My Testapp")
s = internet.TCPServer(8080, Site(WebInterface()), interface='0.0.0.0')
s.setServiceParent(application)

Is it possible you are not closing your database connection, or that there's some deadlock situation in the database when using SQLAlchemy? I've had Flask lock up on me before from not closing connections / not ending transactions.

I've solved the issue. @beiller, you were pretty close with your guess. As can be seen in the source code in my question, the DB session is opened after request processing has started, but the two are closed in the same (instead of the reverse) order. Close the session before calling request.finish(), and everything's fine.
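A minimal sketch of the corrected teardown order, using stand-in classes (FakeRequest and FakeSession are illustrative stubs, not Twisted or SQLAlchemy objects):

```python
# Sketch of the fix: close the session *before* finishing the request, so
# teardown happens in the reverse order of setup. Once request.finish() runs,
# nothing else should touch per-request resources.
events = []

class FakeSession:
    def close(self):
        events.append("session closed")

class FakeRequest:
    def write(self, body):
        events.append("body written")
    def finish(self):
        events.append("request finished")

def do_work(request, db_session):
    request.write("Hello")
    db_session.close()   # close first ...
    request.finish()     # ... then hand the response back

do_work(FakeRequest(), FakeSession())
print(events)  # -> ['body written', 'session closed', 'request finished']
```

The original code instead closed the session in a callback that ran after finish(), i.e. in the same order as setup rather than the reverse.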

Related

Safe database query (peewee) from thread Flask

I have a couple of simple tasks which can take up to 20 seconds to complete, so I decided to run them in a separate thread. I want the thread to do the job and update the database with the result.
While it works (no exceptions yet), I lack an understanding of Flask's internals and how it interacts with the WSGI server. I'm not quite sure that, under some number of parallel requests, it won't end with some database access error.
Simplified code:
from time import time, sleep
from threading import Thread
from peewee import *
from playhouse.shortcuts import model_to_dict
from flask import Flask, abort, jsonify

db = SqliteDatabase("test.db")
app = Flask(__name__)

class Task(Model):
    status = IntegerField(default=0)
    result = TextField(null=True)

    class Meta:
        database = db

def do_smth(task_id):
    start = time()
    sleep(10)
    # DATABASE UPDATE HERE
    Task.update({Task.status: 1, Task.result: f"{start} - {time()}"})\
        .where(Task.id == task_id).execute()

@app.route("/new")
def new_task():
    try:
        task = Task.create()
    except IntegrityError:
        abort(500)
    else:
        Thread(target=do_smth, args=(task.id,)).start()
        return jsonify(model_to_dict(task))

@app.route("/get/<int:task_id>")
def get_task(task_id):
    try:
        task = Task.get(Task.id == task_id)
    except Task.DoesNotExist:
        abort(404)
    else:
        return jsonify(model_to_dict(task))

@app.before_request
def before_request():
    db.connect()

@app.after_request
def after_request(response):
    db.close()
    return response

if __name__ == "__main__":
    with db:
        db.create_tables([Task])
    app.run(host="127.0.0.1", port=5000)
As suggested in the peewee tutorial, I added custom Flask.before_request and Flask.after_request handlers which open and close the database connection.
So the question is: how do I update the database from a separate thread safely? I had the idea of adding a route which updates the database and sending a request to it from the thread, but I find that kinda dumb.
P.S. I've tried my best to be precise, but if something is unclear I will try to clarify it; just ask in the comments section.
This is a good question:
how to update database from separate thread safely?
With SQLite you have to remember that it only allows one writer at a time. So you have to manage your connections carefully, to ensure that you are only doing a write transaction when you have to, and that you're committing it as soon as you're done with it.
Since you're opening and closing the DB during the lifetime of a request, and running your DB operations in separate thread(s), you should be OK for a smallish number of concurrent operations (100?). The main thing I'd be careful about is, in your task body, to hold the write transaction open for as short a time as possible:
def do_smth(task_id):
    # open a database connection. it will be read-only for now.
    with db.connection_context():
        start = time()
        sleep(10)
        with db.atomic() as txn:  # here is the write txn, keep this brief!
            Task.update({Task.status: 1, Task.result: f"{start} - {time()}"})\
                .where(Task.id == task_id).execute()
See the first section on transactions: https://charlesleifer.com/blog/going-fast-with-sqlite-and-python/
For a more drastic approach, you can try this: https://charlesleifer.com/blog/multi-threaded-sqlite-without-the-operationalerrors/
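The linked article's core advice can be sketched with the stdlib sqlite3 module; peewee exposes the same settings via its pragmas= argument to SqliteDatabase. The file path here is just an example (WAL mode requires a file-backed database, not :memory:):

```python
import os
import sqlite3
import tempfile

# In WAL journal mode, readers no longer block the single writer, and
# busy_timeout makes a blocked connection wait instead of raising
# "database is locked" immediately.
path = os.path.join(tempfile.mkdtemp(), "test.db")
conn = sqlite3.connect(path)
mode = conn.execute("PRAGMA journal_mode=wal;").fetchone()[0]
conn.execute("PRAGMA busy_timeout=1000;")  # wait up to 1s for the lock
conn.close()
print(mode)  # -> wal
```

With peewee, the equivalent is roughly SqliteDatabase("test.db", pragmas={"journal_mode": "wal", "busy_timeout": 1000}).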

Flask SQLAlchemy sessions out of sync

I have a Flask REST API running behind a gunicorn/nginx stack. There is a global SQLAlchemy session set up once for each thread that the API runs on. I set up an endpoint, /test/, for running the API's unit tests. One test makes a POST request to add something to the database, then has a finally: clause to clean up:
def test_something():
    try:
        url = "http://myposturl"
        data = {"content" : "test post"}
        headers = {'content-type': 'application/json'}
        result = requests.post(url, json=data, headers=headers).json()
        validate(result, myschema)
    finally:
        db.sqlsession.query(MyTable).filter(MyTable.content == "test post").delete()
        db.sqlsession.commit()
The problem is that the thread to which the POST request was made now has a "test post" object in its session, but the database has no such object, because the thread that ran the tests deleted it from the database. So when I make a GET request to the server, about 1 in 4 times (I have 4 gunicorn workers) I get the "test post" object, and 3 in 4 times I do not. This is because the threads each have their own session object, and those sessions are getting out of sync, but I don't really know what to do about it.
Here is my setup for my SQLAlchemy session:
def connectSQLAlchemy():
    import sqlalchemy
    import sqlalchemy.orm
    engine = sqlalchemy.create_engine(connection_string(DBConfig.USER, DBConfig.PASSWORD, DBConfig.HOST, DBConfig.DB))
    session_factory = sqlalchemy.orm.sessionmaker(bind=engine)
    Session = sqlalchemy.orm.scoped_session(session_factory)
    return Session()

# Create a global session for everyone
sqlsession = connectSQLAlchemy()
Please use Flask-SQLAlchemy if you're using Flask; it takes care of the session's lifecycle for you.
If you insist on doing it yourself, the correct pattern is to create a session for each request instead of having a global session. You should be doing
Session = scoped_session(session_factory, scopefunc=flask._app_ctx_stack.__ident_func__)
return Session
instead of
Session = scoped_session(session_factory)
return Session()
And do
session = Session()
every time you need a session. By virtue of the scoped_session and the scopefunc, this will return you a different session in each request, but the same session in the same request.
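The registry behavior that scoped_session provides can be sketched with the stdlib alone. This is a simplified model of the mechanism, not SQLAlchemy's actual implementation:

```python
import threading

# Simplified model of scoped_session's registry: one object per scope, where
# the scope key comes from scopefunc (thread ident by default; the current
# app/request context in the Flask pattern above).
class ScopedRegistry:
    def __init__(self, factory, scopefunc=threading.get_ident):
        self.factory = factory
        self.scopefunc = scopefunc
        self._store = {}

    def __call__(self):
        key = self.scopefunc()
        if key not in self._store:
            self._store[key] = self.factory()
        return self._store[key]

    def remove(self):
        # Like Session.remove(): discard the current scope's object.
        self._store.pop(self.scopefunc(), None)

Session = ScopedRegistry(factory=object)
a, b = Session(), Session()
print(a is b)          # -> True: same scope returns the same object
Session.remove()
print(Session() is a)  # -> False: after remove(), a fresh object is created
```

This is why calling Session() repeatedly within one request is safe, and why the teardown hook must call remove() so the next request in the same scope starts clean.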
Figured it out. What I did was to add a setup and teardown to the request in my app's __init__.py:
@app.before_request
def startup_session():
    db.session = db.connectSQLAlchemy()

@app.teardown_request
def shutdown_session(exception=None):
    db.session.close()
still using the global session object in my db module:
db.py:
....
session = None
....
The scoped_session handles the different threads, I think...
Please advise if this is a terrible way to do this for some reason. =c)

sqlalchemy session not getting removed properly in flask testing

I'm using Flask-Testing which says:
Another gotcha is that Flask-SQLAlchemy also removes the session
instance at the end of every request (as should any threadsafe
application using SQLAlchemy with scoped_session). Therefore the
session is cleared along with any objects added to it every time you
call client.get() or another client method.
However, I'm not seeing that. This test fails:
from flask import Flask
from flask.ext.sqlalchemy import SQLAlchemy
from flask.ext.testing import TestCase

app = Flask(__name__)
app.config['SQLALCHEMY_DATABASE_URI'] = 'sqlite:////tmp/test.db'
db = SQLAlchemy(app)

@app.route('/')
def index():
    print 'before request:', `db.session`
    u = db.session.query(User).first()
    u.name = 'bob'
    return ''

class User(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String)

class SessionTest(TestCase):
    def create_app(self):
        return app

    def test_remove(self):
        db.drop_all()
        db.create_all()
        u = User()
        u.name = 'joe'
        db.session.add(u)
        db.session.commit()
        client = app.test_client()
        client.get('/')
        print 'after request:', `db.session`
        print u.name
        assert u not in db.session
(Run with $ nosetests test_file.py to see it in action.)
stdout:
-------------------- >> begin captured stdout << ---------------------
before request: <sqlalchemy.orm.scoping.ScopedSession object at 0x10224c610>
after request: <sqlalchemy.orm.scoping.ScopedSession object at 0x10224c610>
bob
--------------------- >> end captured stdout << ----------------------
According to the docs, user u should not be in the session after a get request, but it is! Does anybody know why this is happening?
Furthermore, u.name is bob and not joe, even though the request never committed! (So I'm convinced it's the same session.)
For the record,
$ pip freeze | grep Flask
Flask==0.10.1
Flask-Bcrypt==0.5.2
Flask-DebugToolbar==0.8.0
Flask-Failsafe==0.1
Flask-SQLAlchemy==0.16
Flask-Script==0.6.2
Flask-Testing==0.4
Flask-Uploads==0.1.3
Flask-WTF==0.8
I'm pretty sure the confusion comes from the fact that sessions in SQLAlchemy are scoped, meaning that each request handler creates and destroys its own session.
This is necessary because web servers can be multi-threaded, so multiple requests might be served at the same time, each working with a different database session.
For this reason, the session that you used outside of the context of a request is likely not the same session that the view function that handles the '/' route gets and then destroys at the end.
UPDATE: I dug around a bit and figured this thing out.
Flask-SQLAlchemy installs a hook on app.teardown_appcontext, and here is where it calls db.session.remove().
The testing environment does not fully replicate the environment of a real request because it does not push/pop the application context. Because of that the session is never removed at the end of the request.
As a side note, keep in mind that functions registered with before_request and after_request are also not called when you call client.get().
You can force an application context push and pop with a small change to your test:
def test_remove(self):
    db.drop_all()
    db.create_all()
    u = User()
    u.name = 'joe'
    db.session.add(u)
    db.session.commit()
    with app.app_context():
        client = app.test_client()
        client.get('/')
    print 'after request:', `db.session`
    print u.name
    assert u not in db.session
with this change the test passes.
The documentation for Flask-Testing seems to be wrong or more likely outdated. Maybe things worked like they describe at some point, but that isn't accurate for current Flask and Flask-SQLAlchemy versions.
I hope this helps!
FlaskClient works with the request context, while Flask-SQLAlchemy has called its shutdown_session on app.teardown_appcontext since Flask version 0.9. That's why nothing happens after the test client call: the app context is started by flask.ext.testing.TestCase even before the test's setUp and will be closed only after tearDown.
I got the same problem when I tried to use Flask-Manage to run my tests. Running the tests in a separate thread solved the problem.
import threading
# some code omitted
runner = unittest.TextTestRunner(verbosity=2)
t = threading.Thread(target=runner.run, args=[test_suite])
t.start()
t.join()
# other code omitted

Twisted: Wait for a deferred to 'finish'

How can I 'throw' deferreds into the reactor so they get handled somewhere down the road?
Situation
I have 2 programs running on localhost.
A twisted jsonrpc service (localhost:30301)
A twisted webservice (localhost:4000)
When someone connects to the webservice, it needs to send a query to the jsonrpc service, wait for it to come back with a result, then display the result in the user's web browser (returning the value of the jsonrpc call).
I can't seem to figure out how to return the value of the deferred jsonrpc call. When I visit the webservice with my browser I get an HTTP 500 error (did not return any bytes) and Value: <Deferred at 0x3577b48>.
It returns the deferred object and not the actual value of the callback.
I've been looking around for a couple of hours and tried a lot of different variations before asking.
from txjsonrpc.web.jsonrpc import Proxy
from twisted.web import resource
from twisted.web.server import Site
from twisted.internet import reactor

class Rpc():
    def __init__(self, child):
        self._proxy = Proxy('http://127.0.0.1:30301/%s' % child)

    def execute(self, function):
        return self._proxy.callRemote(function)

class Server(resource.Resource):
    isLeaf = True

    def render_GET(self, request):
        rpc = Rpc('test').execute('test')
        def test(result):
            return '<h1>%s</h1>' % result
        rpc.addCallback(test)
        return rpc

site = Site(Server())
reactor.listenTCP(4000, site)
print 'Running'
reactor.run()
The problem you're having here is that web's IResource is a very old interface, predating even Deferred.
The quick solution to your problem is to use Klein, which provides a nice convenient high-level wrapper around twisted.web for writing web applications, among other things, adding lots of handling for Deferreds throughout the API.
The slightly more roundabout way to address it is to read the chapter of the Twisted documentation that is specifically about asynchronous responses in twisted.web.
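The underlying issue can be modeled with stdlib futures: a Deferred, like a Future, is a promise of a value, so returning it where twisted.web expects bytes fails. The continuation has to be attached as a callback instead. A rough stdlib analogy (not Twisted itself):

```python
from concurrent.futures import ThreadPoolExecutor

def rpc_call():
    # Stand-in for the jsonrpc round trip to localhost:30301.
    return "test"

results = []
with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(rpc_call)  # like Proxy.callRemote -> a promise
    # Attach the continuation rather than returning the promise itself,
    # analogous to Deferred.addCallback followed by request.write/finish.
    future.add_done_callback(
        lambda f: results.append("<h1>%s</h1>" % f.result()))
print(results[0])  # -> <h1>test</h1>
```

In twisted.web terms, render_GET would return NOT_DONE_YET and the callback would call request.write() and request.finish(); Klein hides exactly this plumbing.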

SQLAlchemy+Tornado: How to create a scopefunc for SQLAlchemy's ScopedSession?

Using tornado, I want to create a bit of middleware magic that ensures that my SQLAlchemy sessions get properly closed/cleaned up so that objects aren't shared from one request to the next. The trick is that, since some of my tornado handlers are asynchronous, I can't just share one session for each request.
So I am left trying to create a ScopedSession that knows how to create a new session for each request. All I need to do is define a scopefunc for my code that can turn the currently executing request into a unique key of some sort; however, I can't seem to figure out how to get the current request at any one point in time (outside the scope of the current RequestHandler, which my function doesn't have access to either).
Is there something I can do to make this work?
You might want to associate the Session with the request itself (i.e. don't use scoped_session if it's not convenient). Then you can just say request.session. You still need hooks at the start/end of the request for setup/teardown.
edit: custom scoping function
def get_current_tornado_request():
    # TODO: ask on the Tornado mailing list how
    # to acquire the request currently being invoked
    raise NotImplementedError

Session = scoped_session(sessionmaker(), scopefunc=get_current_tornado_request)
(This is a 2017 answer to a 2011 question.) As @Stefano Borini pointed out, the easiest way in Tornado 4 is to just let the RequestHandler implicitly pass the session around. Tornado will track the handler instance state when using coroutine decorator patterns:
import logging
_logger = logging.getLogger(__name__)

from sqlalchemy import create_engine, exc as sqla_exc
from sqlalchemy.orm import sessionmaker, exc as orm_exc
from tornado import gen
from tornado.web import RequestHandler

from my_models import SQLA_Class

Session = sessionmaker(bind=create_engine(...))

class BaseHandler(RequestHandler):
    @gen.coroutine
    def prepare(self):
        self.db_session = Session()

    def on_finish(self):
        self.db_session.close()

class MyHandler(BaseHandler):
    @gen.coroutine
    def post(self):
        SQLA_Object = self.db_session.query(SQLA_Class)...
        SQLA_Object.attribute = ...
        try:
            self.db_session.commit()
        except sqla_exc.SQLAlchemyError:
            _logger.exception("Couldn't commit")
            self.db_session.rollback()
If you really really need to asynchronously reference a SQL Alchemy session inside a declarative_base (which I would consider an anti-pattern since it over-couples the model to the application), Amit Matani has a non-working example here.
