Is this an acceptable way to make threaded SQLAlchemy queries from Twisted?

I've been doing some reading on using SQLAlchemy's ORM in the context of a Twisted application. It's a lot of information to digest, so I'm having a bit of trouble putting all the pieces together. So far, I've gathered the following absolute truths:
One session implies one thread. Always.
scoped_session, by default, provides us with a way of constraining sessions to a given thread. In other words, I am sure that by using scoped_session, I will not pass sessions to other threads (unless I do so explicitly, which I won't).
I also gathered that there are some issues relating to lazy/eager-loading and that one possible approach is to dissociate ORM objects from a session and reattach them to another session when changing threads. I'm quite fuzzy on the details, but I also concluded that scoped_session renders many of these points moot.
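For concreteness, my rough understanding of that detach/reattach step looks like the following sketch (session_a and session_b are hypothetical plain sessions, one per thread):
# Sketch only: hand an ORM object from one session to another.
obj = session_a.query(Attribute).first()
session_a.expunge(obj)        # detach obj from session A
obj = session_b.merge(obj)    # merge returns a copy attached to session B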
My first question is whether or not I am severely mistaken in my above conclusions.
Beyond that, I've crafted this approach, which I hope is satisfactory.
I begin by creating a scoped_session object...
Session = scoped_session(sessionmaker(bind=_my_engine))
... which I will then use from a context manager, in order to handle exceptions and clean-up gracefully:
from contextlib import contextmanager

@contextmanager
def transaction_context():
    session = Session()
    try:
        yield session
        session.commit()
    except:
        session.rollback()
        raise
    finally:
        session.remove()  # dispose of the session
Now all I need to do is to use the above context manager in a function that is deferred to a separate thread. I've thrown together a decorator to make things a bit prettier:
from functools import wraps
from twisted.internet.threads import deferToThread

def threaded(fn):
    @wraps(fn)
    def wrapper(*args, **kwargs):
        return deferToThread(fn, *args, **kwargs)
    return wrapper
Here is an example of how I intend to use the whole shebang. Below is a function that performs a DB lookup using the SQLAlchemy ORM:
@threaded
def get_some_attributes(group):
    with transaction_context() as session:
        return session.query(Attribute).filter(Attribute.group == group)
My second question is whether or not this approach is viable.
Am I making any fundamentally flawed assumptions?
Are there any caveats?
Is there a better way?
Edit: Here is a related question concerning the unexpected error in my context manager.

Right now I work on this exact problem, and I think I found a solution.
Indeed, you must defer all database access functions to a thread. But in your solution you remove the session right after querying the database, so all the resulting ORM objects will be detached and you won't have access to their fields.
You can't use scoped_session with its default thread-local scope, because in Twisted we have only one main thread (apart from code that runs inside deferToThread). We can, however, use scoped_session with a custom scopefunc.
In Twisted there is a great thing known as ContextTracker:
provides a way to pass arbitrary key/value data up and down a call
stack without passing them as parameters to the functions on that call
stack.
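A minimal sketch of that mechanism on its own (no web machinery assumed):
from twisted.python import context

def inner():
    # context.get reads whatever the nearest enclosing context.call bound
    print(context.get("uuid"))

context.call({"uuid": "1234"}, inner)  # prints: 1234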
In my Twisted web app, in the render_GET method, I set a uuid parameter:
call = context.call({"uuid": str(uuid.uuid4())}, self._render, request)
and then I call the _render method to do the actual work (work with the db, render the html, and so on).
I create the scoped_session like this:
scopefunc = functools.partial(context.get, "uuid")
Session = scoped_session(session_factory, scopefunc=scopefunc)
Now within any function called from _render I can get the session with:
Session()
and at the end of _render I have to call Session.remove() to remove the session.
It works with my web app, and I think it can work for other tasks too.
Here is a completely standalone example showing how it all works together.
from twisted.internet import reactor, threads
from twisted.web.resource import Resource
from twisted.web.server import Site, NOT_DONE_YET
from twisted.python import context
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import sessionmaker, scoped_session
from sqlalchemy.ext.declarative import declarative_base
import uuid
import functools

engine = create_engine(
    'sqlite:///test.sql',
    connect_args={'check_same_thread': False},
    echo=False)
session_factory = sessionmaker(bind=engine)
scopefunc = functools.partial(context.get, "uuid")
Session = scoped_session(session_factory, scopefunc=scopefunc)
Base = declarative_base()

class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    name = Column(String)

Base.metadata.create_all(bind=engine)

class TestPage(Resource):
    isLeaf = True

    def render_GET(self, request):
        context.call({"uuid": str(uuid.uuid4())}, self._render, request)
        return NOT_DONE_YET

    def render_POST(self, request):
        return self.render_GET(request)

    def work_with_db(self):
        user = User(name="TestUser")
        Session.add(user)
        Session.commit()
        return user

    def _render(self, request):
        print "session: ", id(Session())
        d = threads.deferToThread(self.work_with_db)

        def success(result):
            html = "added user with name - %s" % result.name
            request.write(html.encode('UTF-8'))
            request.finish()
            Session.remove()

        call = functools.partial(context.call, {"uuid": scopefunc()}, success)
        d.addBoth(call)
        return d

if __name__ == "__main__":
    reactor.listenTCP(8888, Site(TestPage()))
    reactor.run()
I print out the id of the session, and you can see that it is different for each request. If you remove scopefunc from the scoped_session constructor and make two simultaneous requests (insert a time.sleep into work_with_db), you will get one shared session for those two requests.
The scoped_session object by default uses threading.local() as storage, so that a single Session is maintained for all who call upon the scoped_session registry, but only within the scope of a single thread.
The problem here is that in Twisted we have only one thread for all requests. That's why we have to create our own scopefunc that distinguishes between requests.
Another problem is that Twisted doesn't pass the context to callbacks, so we have to wrap the callback and hand the current context to it:
call = functools.partial(context.call, {"uuid": scopefunc()}, success)
I still don't know how to make this work with defer.inlineCallbacks, which I use everywhere in my code.

Related

FastAPI refuses to let me create a mongoengine document

I have FastAPI Python application with routes that operate on a MongoDB instance.
The connection works fine, and I can query documents for my GET endpoints, but creating a new document from within FastAPI seems impossible.
I consistently get:
You have not defined a default connection
I have a standalone script that handles some data migration tasks and it uses the exact same DB class and Document models that the FastAPI app does, and that script is able to save documents to mongo perfectly fine. There is no difference in how the DB object is instantiated between the API and the script.
The DB class:
from os import getenv
from mongoengine import connect
from pymongo import MongoClient
from pymongo.errors import ServerSelectionTimeoutError
class Mongo:
    @property
    def target_db(self):
        return 'some_db'

    @property
    def uri(self) -> str:
        env_uri = getenv('MONGODB', None)
        if env_uri is None:
            raise DBError('MONGODB environment variable missing')  # DBError: custom exception defined elsewhere
        return env_uri.strip()

    def connect(self) -> MongoClient:
        try:
            return connect(host=self.uri, db=self.target_db, alias=self.target_db)
        except ServerSelectionTimeoutError as e:
            raise ServerSelectionTimeoutError(e)
All of my DB models have meta attributes defining exactly what DB and collection to use:
class Thing(Document):
    meta = {'db_alias': 'some_db',
            'collection': 'things'}
Queries on existing documents succeed inside of a route definition:
results = Thing.objects.filter(**query)
# This returns things that I can iterate over
Document creation fails inside of a route definition:
new_thing = Thing(**creation_args)
new_thing.save()
Error:
mongoengine.connection.ConnectionFailure: You have not defined a default connection
What does that even mean? I know that I'm connected because I can query the db.
How is it possible that I can successfully query documents from Mongo but not save them?
Every suggestion I have seen online points to not having defined a db or alias in the call to mongoengine.connect, but I clearly have in my Mongo object, and even if that were true, surely I wouldn't be able to retrieve documents from the collection at all...
It turned out that the mongoengine Document models had malformed meta attributes:
class Accounts(Document):
    meta = {'db_alias': 'some_db',
            'collection': 'things'}
Solution: db_alias needed to be changed to db.
I didn't find this through documentation, and definitely not through the extremely unhelpful error messages. I just tried it on a whim. Now everything works using the FastAPI framework.
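For what it's worth, a hedged alternative that is often suggested for this error (untested here, and not what fixed it above) is to also register the connection under mongoengine's default alias, which is what operations fall back to when no matching alias can be resolved:
from mongoengine import connect

# Hypothetical sketch: connecting without an explicit alias registers the
# connection under the "default" alias; the host value is a placeholder.
connect(host='mongodb://localhost:27017', db='some_db')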

Using Flask SQLAlchemy from worker threads

I have a python app that uses Flask RESTful as well as Flask SQLAlchemy. Part of the API I'm writing has the side effect of spinning off Timer objects. When a Timer expires, it executes some database queries. I'm seeing an issue in which code that is supposed to update rows in the database (a sqlite backend) is actually not issuing any UPDATE statements. I have verified this by turning the SQLALCHEMY_ECHO flag on to log the SQL statements. Whether or not the code works seems to be random. About half the time it fails to issue the UPDATE statement. See full example below.
My guess here is that Flask-SQLAlchemy does not work properly when called from a worker thread. I think part of the point of Flask-SQLAlchemy is to manage the SQLAlchemy sessions for you per API request. Obviously, since there are no API requests going on when the Timer expires, I can see why things may not work properly.
Just to test this, I went ahead and wrote a simple data access layer using python's sqlite3 interface and it seems to solve the problem.
I'd really rather not have to rewrite a bunch of data access code though. Is there a way to get Flask SQLAlchemy to work properly in this case?
Sample code
Here's where I set up the flask app and save off the SQLAlchemy db object:
from flask import Flask
from flask_restful import Api
from flask.ext.sqlalchemy import SQLAlchemy
from flask_cors import CORS
import db_conn
flask_app = Flask(__name__)
flask_app.config.from_object('config')
CORS(flask_app)
api = Api(flask_app)
db_conn.db = SQLAlchemy(flask_app)
api.add_resource(SomeClass, '/abc/<some_id>/def')
Here's how I create the ORM models:
import db_conn
db = db_conn.db
class MyTable(db.Model):
    __tablename__ = 'my_table'

    id = db.Column(db.Integer, primary_key=True)
    phase = db.Column(db.Integer, nullable=False, default=0)

    def set_phase(self, phase):
        self.phase = phase
        db.session.commit()
Here's the API handler with timer and the database call that is failing:
from flask_restful import Resource
from threading import Timer
from models import MyTable
import db_conn
import global_store
class SomeClass(Resource):
    def put(self, some_id):
        global_store.saved_id = some_id
        self.timer = Timer(60, self.callback)
        return '', 204

    def callback(self):
        row = MyTable.query.filter_by(id=global_store.saved_id).one()
        # sometimes this works, sometimes it doesn't
        row.set_phase(1)
        db_conn.db.session.commit()
I'm guessing that in your callback you aren't actually changing the value of the object. SQLAlchemy won't issue DB UPDATE calls if the session state is not dirty, so if the phase is already 1 for some reason, there is nothing to do.
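A minimal sketch of that behaviour (assuming the MyTable model and db_conn from the question, and SQLALCHEMY_ECHO turned on):
row = MyTable.query.filter_by(id=1).one()
row.phase = row.phase                         # a set with no net change
print(db_conn.db.session.is_modified(row))    # False: nothing to flush
db_conn.db.session.commit()                   # the echo log shows no UPDATE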

SQLAlchemy ORM Event hook for attribute persisted

I am working on finding a way, using SQLAlchemy events, to call an external API when an attribute is updated and persisted to the database. Here is my context:
A User model with an attribute named birthday. When an instance of the User model gets updated and saved, I want to call an external API to update this user's birthday accordingly.
I've tried Attribute Events; however, they generate too many hits, and there is no way to guarantee that a set/remove attribute event will eventually be persisted (autocommit is set to False, and the transaction gets rolled back when errors occur).
Session Events would not work either, because they require a Session/SessionFactory as a parameter, and there are just too many places in the code base where sessions are used.
I have been looking at all the possible SQLAlchemy ORM event hooks in the official documentation, but I couldn't find one that satisfies my requirement.
I wonder if anyone else has any insight into how to implement this kind of combined event trigger in SQLAlchemy. Thanks.
You can do this by combining multiple events. The specific events you need to use depend on your particular application, but the basic idea is this:
[InstanceEvents.load] when an instance is loaded, note down the fact that it was loaded and not added to the session later (we only want to save the initial state if the instance was loaded)
[AttributeEvents.set/append/remove] when an attribute changes, note down the fact that it was changed, and, if necessary, what it was changed from (these first two steps are optional if you don't need the initial state)
[SessionEvents.before_flush] when a flush happens, note down which instances are actually being saved
[SessionEvents.before_commit] before a commit completes, note down the current state of the instance (because you may not have access to it anymore after the commit)
[SessionEvents.after_commit] after a commit completes, fire off the custom event handler and clear the instances that you saved
An interesting challenge is the ordering of the events. If you do a session.commit() without doing a session.flush(), you'll notice that the before_commit event fires before the before_flush event, which is different from the scenario where you do a session.flush() before session.commit(). The solution is to call session.flush() in your before_commit call to force the ordering. This is probably not 100% kosher, but it works for me in production.
Here's a (simple) diagram of the ordering of events:
begin
load
(save initial state)
set attribute
...
flush
set attribute
...
flush
...
(save modified state)
commit
(fire off "object saved and changed" event)
Complete Example
from itertools import chain
from weakref import WeakKeyDictionary, WeakSet

from sqlalchemy import Column, String, Integer, create_engine
from sqlalchemy import event
from sqlalchemy.orm import sessionmaker, object_session
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()
engine = create_engine("sqlite://")
Session = sessionmaker(bind=engine)

class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    birthday = Column(String)

@event.listens_for(User.birthday, "set", active_history=True)
def _record_initial_state(target, value, old, initiator):
    session = object_session(target)
    if session is None:
        return
    if target not in session.info.get("loaded_instances", set()):
        return
    initial_state = session.info.setdefault("initial_state", WeakKeyDictionary())
    # this is where you save the entire object's state, not necessarily just the birthday attribute
    initial_state.setdefault(target, old)

@event.listens_for(User, "load")
def _record_loaded_instances_on_load(target, context):
    session = object_session(target)
    loaded_instances = session.info.setdefault("loaded_instances", WeakSet())
    loaded_instances.add(target)

@event.listens_for(Session, "before_flush")
def track_instances_before_flush(session, context, instances):
    modified_instances = session.info.setdefault("modified_instances", WeakSet())
    for obj in chain(session.new, session.dirty):
        if session.is_modified(obj) and isinstance(obj, User):
            modified_instances.add(obj)

@event.listens_for(Session, "before_commit")
def set_pending_changes_before_commit(session):
    session.flush()  # IMPORTANT
    initial_state = session.info.get("initial_state", {})
    modified_instances = session.info.get("modified_instances", set())
    del session.info["modified_instances"]
    pending_changes = session.info["pending_changes"] = []
    for obj in modified_instances:
        initial = initial_state.get(obj)
        current = obj.birthday
        pending_changes.append({
            "initial": initial,
            "current": current,
        })
        initial_state[obj] = current

@event.listens_for(Session, "after_commit")
def after_commit(session):
    pending_changes = session.info.get("pending_changes", {})
    del session.info["pending_changes"]
    for changes in pending_changes:
        print(changes)  # this is where you would fire your custom event
    loaded_instances = session.info["loaded_instances"] = WeakSet()
    for v in session.identity_map.values():
        if isinstance(v, User):
            loaded_instances.add(v)

def main():
    engine = create_engine("sqlite://", echo=False)
    Base.metadata.create_all(bind=engine)
    session = Session(bind=engine)

    user = User(birthday="foo")
    session.add(user)
    user.birthday = "bar"
    session.flush()
    user.birthday = "baz"
    session.commit()  # prints: {"initial": None, "current": "baz"}

    user.birthday = "foobar"
    session.commit()  # prints: {"initial": "baz", "current": "foobar"}
    session.close()

if __name__ == "__main__":
    main()
As you can see, it's a little complicated and not very ergonomic. It would be nicer if it were integrated into the ORM, but I also understand there may be reasons for not doing so.

Moving some database logic to its own helper module in Flask-Sqlalchemy?

I am trying to separate some of my database logic into its own helper module. This is because I have several routes that perform the same database functions, and I don't want to keep repeating the same code. I'm a bit confused on the db session scopes.
From the SQLAlchemy docs:
Some web frameworks include infrastructure to assist in the task of aligning the lifespan of a Session with that of a web request. This includes products such as Flask-SQLAlchemy, for usage in conjunction with the Flask web framework...
I think this means my db session scope is contained within a particular route since I'm using Flask and Flask-SQLAlchemy, so I came up with the following:
init.py
app = Flask(__name__)
db = SQLAlchemy(app)
routes.py
from init import db
@app.route('/one')
def one():
    form = MyForm()
    if form.validate_on_submit():
        myhelper.saveStuff1(form.stuff1.data)
        myhelper.saveStuff2(form.stuff2.data)
        db.session.commit()
    return render_template(...)

@app.route('/two')
def two():
    form = MyForm()
    if form.validate_on_submit():
        myhelper.saveStuff1(form.stuff1.data)
        myhelper.saveStuff2(form.stuff2.data)
        myhelper.saveStuff3(form.stuff3.data)
        db.session.commit()
    return render_template(...)
myhelper.py
from init import db
# Add new Item
def saveStuff1(formdata):
    db.session.add(Item(name=formdata))

# Update Item
def saveStuff2(formdata):
    item = Item.query.filter_by(name=formdata).first()
    item.description = 'default'
    db.session.add(item)

# etc...
Would this be the correct way to structure my helpers? I'm worried that from init import db will cause problems with scoping since it's imported in both files, or that this overall code pattern will cause other problems.
SQLAlchemy's session scope is not related to Python's variable scope. So no, importing db in multiple places as you've shown won't cause problems. Regarding the session scope, Flask-SQLAlchemy takes care of that for you, so you can ignore (or not worry about) the discussion of scope in the SQLAlchemy docs.
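As a quick sanity check (a sketch reusing the module names from the question), you can confirm that both modules end up holding the very same object:
import init
import myhelper

# "from init import db" binds another name to the existing object; nothing is
# copied, so routes.py and myhelper.py share one SQLAlchemy instance and one
# session registry.
assert myhelper.db is init.db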

SQLAlchemy freezing application

I am using SQLAlchemy in my Python command-line app. The app basically reads a set of URLs and does inserts into a postgresql database based on the data.
After about the same number of inserts (give or take a few), the entire app freezes.
Having seen "python sqlalchemy + postgresql program freezes", I am assuming I am doing something wrong with the SQLAlchemy Session (although I am not using drop_all(), which seemed to be the cause of that issue). I've tried a couple of things, but thus far they have had no impact.
Any hints or help would be welcome. If my integration of SQLAlchemy into my app is incorrect, a pointer to a good example of doing it right would also be welcome.
My code is as follows:
Set up the SQLAlchemy Base:
from sqlalchemy.ext.declarative import declarative_base
Base = declarative_base()
Create the session info and attach it to the Base:
from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker
engine = create_engine("sqlite:///myapp.db")
db_session = scoped_session(sessionmaker(bind=engine))
Base.query = db_session.query_property()
Base.scoped_db_session = db_session
Create my model from Base and make use of the session:
class Person(Base):
    def store(self):
        if self.is_new():
            self.scoped_db_session.add(self)
        self.scoped_db_session.commit()
If I create enough objects of type Person and call store(), the app eventually freezes.
I managed to solve the problem. It turns out that my implementation is specifically on the "don't do it this way" list (see the "don't do this" example at http://docs.sqlalchemy.org/en/latest/orm/session_basics.html#session-frequently-asked-questions), and I was not managing the session correctly.
To solve my problem I moved the session out of the model into a separate class, so instead of having calls like:
mymodel.store()
I now have:
db.current_session.store(mymodel)
where db is an instance of my custom DBContext below:
from contextlib import contextmanager
from sqlalchemy.orm import scoped_session, sessionmaker

class DbContext(object):
    def __init__(self, engine, session=None):
        self._engine = engine
        self._session = session or scoped_session(
            sessionmaker(autocommit=False, autoflush=False, bind=self._engine))
        self.query = self._session.query_property()
        self.current_session = None

    def start_session(self):
        self.current_session = self._session()

    def end_session(self):
        if self.current_session:
            self.current_session.commit()
            self.current_session.close()
            self.current_session = None

    @contextmanager
    def new_session(self):
        try:
            self.start_session()
            yield
        finally:
            self.end_session()
When you want to store one or more model objects, call DbContext.start_session() to start a clean session. When you finally want to commit, call DbContext.end_session().
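For illustration, a hedged sketch of how the DbContext above might be used via its new_session context manager (the engine URL and the Person model are placeholders from the question):
from sqlalchemy import create_engine

engine = create_engine("sqlite:///myapp.db")
db = DbContext(engine)

# new_session() starts a clean session and guarantees end_session() runs,
# which commits and closes the current session.
with db.new_session():
    person = Person()
    db.current_session.add(person)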
