Two nested sessions in SQLAlchemy - Python

I have an outer global session (via Flask-SQLAlchemy), and within a function I create another session which commits some data to the database (MariaDB backend). The data is, however, not accessible from the outer session until that session is closed.
Example:
db = SQLAlchemy()

def func():
    s = db.Session()
    with s.no_autoflush:
        obj = models.MyObj(var="test")
        s.add(obj)
        # This inner session is needed because we can't commit the outer
        # session (db.session) at this point, but we still do some inserts
        # via the outer session.
        # Finally we commit the inner session to the database.
        s.commit()
    # This assertion will fail because the data is not accessible
    # in the outer session.
    # db.session.close() here would help, but it is not desirable.
    assert db.session.query(models.MyObj).filter_by(var="test").first()  # -> this fails
How could I create the inner session such that it is within the same transaction as the outer session (db.session), so that data committed in the inner session is accessible in the outer session?
Update:
Here is a minimal, complete, and verifiable example; hopefully it explains the problem better. Flask/Flask-SQLAlchemy is not needed.
import sqlalchemy as sa
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

engine = sa.create_engine('mysql+mysqldb://user:password@mariadb/mydatabase')
#engine = sa.create_engine('sqlite:///:memory:')
Session = sessionmaker(bind=engine)
global_session = Session()

Model = declarative_base()

class MyTable(Model):
    __tablename__ = 'mytable'
    id = sa.Column(sa.Integer, primary_key=True)
    var = sa.Column(sa.String(length=255), unique=True, nullable=False)

def func():
    internal_session = Session()
    with internal_session.no_autoflush:
        # We add objects to internal_session, but we can't flush them yet
        # (their linkage etc. will be built gradually here).
        obj = MyTable(var="test")
        internal_session.add(obj)

        # At the same time we add some objects via global_session.
        obj2 = MyTable(var='test2')
        global_session.add(obj2)
        global_session.commit()

        # If we perform any select query on global_session here, we will
        # face problems later (at [*]). If we comment out this line,
        # [*] is fine.
        assert not global_session.query(MyTable).filter_by(var='whatever!').first()

        # Finally we commit the inner session to the database.
        internal_session.commit()

    # This assertion will fail because the data is not accessible
    # in the outer session.
    # global_session.close() here would help, but it is not desirable.
    assert global_session.query(MyTable).filter_by(var='test').first()  # [*]: this fails

if __name__ == '__main__':
    try:
        Model.metadata.drop_all(engine)
    except Exception:
        pass
    Model.metadata.create_all(engine)
    func()
    print('Ready')

Changing the transaction isolation level to READ COMMITTED for the global_session helps.
In the above example, change the definition of global_session to the following:

global_session = Session(
    bind=engine.execution_options(isolation_level='READ COMMITTED'))
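This works because MariaDB/InnoDB defaults to REPEATABLE READ: a transaction takes a consistent snapshot at its first read, so once global_session has issued any SELECT, rows committed later by internal_session stay invisible until global_session's transaction ends. READ COMMITTED takes a fresh snapshot per statement instead. If the goal is literally what the question asks for (both sessions inside one transaction), a sketch of that approach, not from this thread, is to bind the inner session to the outer session's connection:

def func_shared_transaction():
    # Sketch (an assumption, not the accepted fix): Session.connection()
    # returns the Connection of global_session's current transaction;
    # a second session bound to it joins that same transaction.
    internal_session = Session(bind=global_session.connection())
    internal_session.add(MyTable(var="test"))
    internal_session.flush()  # the INSERT goes out on the shared connection
    # Visible immediately: same connection means same transaction,
    # so the isolation level no longer matters between the two sessions.
    assert global_session.query(MyTable).filter_by(var='test').first()
    # Committing or rolling back is left to global_session,
    # which owns the transaction.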

Related

How to reproduce an error caused by sqlalchemy session caching

I'm trying to reproduce a bug locally which I think is caused by a race condition where an update relies on stale data (due to synchronize_session=False); essentially something like the following:
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import create_engine
from sqlalchemy import Column, Integer, Boolean, CheckConstraint
from sqlalchemy.orm.session import sessionmaker

Base = declarative_base()

# change this to your actual postgres url
db_string = "postgres://max:steve@localhost/test"
db = create_engine(db_string)

class User(Base):
    __tablename__ = 'users4'
    id = Column(Integer, primary_key=True)
    deleted = Column(Boolean)
    super_user = Column(Boolean, CheckConstraint('NOT (super_user AND deleted)', name='check1'))

Base.metadata.create_all(db)
Session = sessionmaker(bind=db)
session = Session()
session.autoflush = False

# Create a user
session.add(User(id=1, deleted=False, super_user=False))

# Delete that user
session.query(User).filter(User.id == 1).update(
    {'deleted': True}, synchronize_session=False)

# Make all non-deleted users into super users.
# Will violate the CHECK constraint if the previous query hasn't
# been flushed.
session.query(User).filter(User.deleted == False).update({'super_user': True})
Is there a way I can force SQLAlchemy to use the cached session state (maybe through mocking or some such) so that this code will violate the constraint and raise an IntegrityError?
The docs for synchronize_session say that
... updated objects may still remain in the session with stale values on their attributes, which can lead to confusing results.
This is the situation that I want to reproduce.
The last update query does not use the stale session data: query.update() emits an UPDATE statement directly against the database, so it never consults the in-memory attributes. A case like the following, where Python logic acts on the stale attributes, will trigger the check constraint when a flush finally does occur:
# Create a user
user1 = User(id=1, deleted=False, super_user=False)
session.add(user1)
session.flush()  # make sure the row exists in the DB before the bulk UPDATE

# Delete that user
session.query(User).filter(User.id == 1).update(
    {'deleted': True}, synchronize_session=False)

# Make all non-deleted users into super users.
# Will violate the CHECK constraint, because the previous query
# hasn't been synchronized back onto user1.
if not user1.deleted:
    user1.super_user = True
session.flush()
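If you want the reproduction to fail loudly in a test, the final flush can be wrapped; a minimal sketch, assuming the session and mapping above:

from sqlalchemy.exc import IntegrityError

# Sketch: user1.deleted is still False in memory, so the guard above
# passes, and the flush writes super_user=True onto a row whose deleted
# column is already True in the database, firing the CHECK constraint.
try:
    session.flush()
except IntegrityError as exc:
    session.rollback()
    print("reproduced:", exc.orig)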

SQLAlchemy refresh() not working after committing from a different session

I'm experiencing an issue with sqlalchemy where an update to a record in one session is not reflected in a second session even after committing and refreshing the object.
To demonstrate, consider this (complete) example:
import logging

from sqlalchemy import create_engine, Column, Boolean, Integer
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

logging.basicConfig(level=logging.INFO)
logging.getLogger("sqlalchemy.engine").setLevel(logging.INFO)

# works with this
#engine = create_engine("sqlite://")
# fails with this
engine = create_engine("mysql+mysqldb://{user}:{pass}@{host}:{port}/{database}?charset=utf8mb4".format(**DB_SETTINGS))

Session = sessionmaker(bind=engine)
Base = declarative_base()

class Foo(Base):
    __tablename__ = "foo"
    id = Column(Integer, primary_key=True, autoincrement=True)
    flag = Column(Boolean)

    def __repr__(self):
        return "Foo(id={0.id}, flag={0.flag})".format(self)

# create the table
Base.metadata.create_all(engine)

# add a row
session = Session()
foo = Foo(id=1, flag=False)
session.add(foo)
session.commit()

# fetch the row in a different session
session2 = Session()
foo2 = session2.query(Foo).filter_by(id=1).one()
logging.info("SESSION2: Got {0}".format(foo2))

# update the row in first session and commit
foo.flag = True
session.commit()

# refresh the row in second session
logging.info("SESSION2: Refreshing...")
session2.refresh(foo2)
logging.info("SESSION2: After refresh: {0}".format(foo2))
# does "flag" come back as True?
# does "flag" come back as True?
When I run this with the mysql+mysqldb:// engine pointed at my remote MySQL instance, the change to foo.flag is not reflected in session2.
But if I uncomment the line that creates an engine using a simple sqlite:// in-memory database, the change to foo.flag is reflected in session2.
What is it about my MySQL server configuration that could cause an UPDATE in one session, followed immediately by a SELECT in another session, to return different data?
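No answer was recorded for this one, but the symptoms match the REPEATABLE READ behavior discussed in the first question: session2's transaction pinned its snapshot of the row at the first SELECT, so refresh() inside the same transaction re-reads that same snapshot. A sketch of the fix that diagnosis suggests (my assumption, mirroring the earlier workaround, not an answer from this thread):

# Assumption: sessions created from this factory take a fresh snapshot
# per statement, so session2.refresh(foo2) sees the committed UPDATE.
Session2 = sessionmaker(
    bind=engine.execution_options(isolation_level="READ COMMITTED"))
session2 = Session2()

# Alternatively, end session2's transaction before refreshing:
# session2.commit()  # or session2.rollback()
# session2.refresh(foo2)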

SQLAlchemy ORM Event hook for attribute persisted

I am working on finding a way, using SQLAlchemy events, to call an external API when an attribute is updated and persisted to the database. Here is my context:
A User model has an attribute named birthday. When an instance of the User model is updated and saved, I want to call an external API to update the user's birthday accordingly.
I've tried Attribute Events; however, they generate too many hits, and there is no way to guarantee that a set/remove attribute event will eventually be persisted (autocommit is set to False, and the transaction gets rolled back when errors occur).
Session Events would not work either, because they require a Session/SessionFactory as a parameter, and there are just too many places in the code base where sessions are used.
I have been looking at all the possible SQLAlchemy ORM event hooks in the official documentation, but I couldn't find any that satisfy my requirement.
I wonder if anyone else has insight into how to implement this kind of combined event trigger in SQLAlchemy. Thanks.
You can do this by combining multiple events. The specific events you need to use depend on your particular application, but the basic idea is this:
[InstanceEvents.load] when an instance is loaded, note down the fact that it was loaded and not added to the session later (we only want to save the initial state if the instance was loaded)
[AttributeEvents.set/append/remove] when an attribute changes, note down the fact that it was changed, and, if necessary, what it was changed from (these first two steps are optional if you don't need the initial state)
[SessionEvents.before_flush] when a flush happens, note down which instances are actually being saved
[SessionEvents.before_commit] before a commit completes, note down the current state of the instance (because you may not have access to it anymore after the commit)
[SessionEvents.after_commit] after a commit completes, fire off the custom event handler and clear the instances that you saved
An interesting challenge is the ordering of the events. If you do a session.commit() without doing a session.flush(), you'll notice that the before_commit event fires before the before_flush event, which is different from the scenario where you do a session.flush() before session.commit(). The solution is to call session.flush() in your before_commit call to force the ordering. This is probably not 100% kosher, but it works for me in production.
Here's a (simple) diagram of the ordering of events:
begin
load
(save initial state)
set attribute
...
flush
set attribute
...
flush
...
(save modified state)
commit
(fire off "object saved and changed" event)
Complete Example
from itertools import chain
from weakref import WeakKeyDictionary, WeakSet

from sqlalchemy import Column, String, Integer, create_engine
from sqlalchemy import event
from sqlalchemy.orm import sessionmaker, object_session
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()
engine = create_engine("sqlite://")
Session = sessionmaker(bind=engine)

class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    birthday = Column(String)

@event.listens_for(User.birthday, "set", active_history=True)
def _record_initial_state(target, value, old, initiator):
    session = object_session(target)
    if session is None:
        return
    if target not in session.info.get("loaded_instances", set()):
        return
    initial_state = session.info.setdefault("initial_state", WeakKeyDictionary())
    # this is where you save the entire object's state,
    # not necessarily just the birthday attribute
    initial_state.setdefault(target, old)

@event.listens_for(User, "load")
def _record_loaded_instances_on_load(target, context):
    session = object_session(target)
    loaded_instances = session.info.setdefault("loaded_instances", WeakSet())
    loaded_instances.add(target)

@event.listens_for(Session, "before_flush")
def track_instances_before_flush(session, context, instances):
    modified_instances = session.info.setdefault("modified_instances", WeakSet())
    for obj in chain(session.new, session.dirty):
        if session.is_modified(obj) and isinstance(obj, User):
            modified_instances.add(obj)

@event.listens_for(Session, "before_commit")
def set_pending_changes_before_commit(session):
    session.flush()  # IMPORTANT: forces before_flush to fire before we read state
    initial_state = session.info.get("initial_state", {})
    modified_instances = session.info.pop("modified_instances", set())
    pending_changes = session.info["pending_changes"] = []
    for obj in modified_instances:
        initial = initial_state.get(obj)
        current = obj.birthday
        pending_changes.append({
            "initial": initial,
            "current": current,
        })
        initial_state[obj] = current

@event.listens_for(Session, "after_commit")
def after_commit(session):
    pending_changes = session.info.pop("pending_changes", [])
    for changes in pending_changes:
        print(changes)  # this is where you would fire your custom event
    loaded_instances = session.info["loaded_instances"] = WeakSet()
    for v in session.identity_map.values():
        if isinstance(v, User):
            loaded_instances.add(v)

def main():
    engine = create_engine("sqlite://", echo=False)
    Base.metadata.create_all(bind=engine)
    session = Session(bind=engine)

    user = User(birthday="foo")
    session.add(user)
    user.birthday = "bar"
    session.flush()
    user.birthday = "baz"
    session.commit()  # prints: {"initial": None, "current": "baz"}

    user.birthday = "foobar"
    session.commit()  # prints: {"initial": "baz", "current": "foobar"}
    session.close()

if __name__ == "__main__":
    main()
As you can see, it's a little complicated and not very ergonomic. It would be nicer if it were integrated into the ORM, but I also understand there may be reasons for not doing so.

What's the best way to scope a session for use with updates and queries

I'm using SQLAlchemy and the suggested session_scope() context manager.
I've just found that using the scope to issue a query causes the resulting object(s) to somehow reattach to the session, perhaps in the session.commit() call. I've narrowed it down to this test case, which fails with a DetachedInstanceError:
from contextlib import contextmanager

import sqlalchemy.ext.declarative
import sqlalchemy as sql
import sqlalchemy.orm as sqlorm

DeclarativeBase = sqlalchemy.ext.declarative.declarative_base()

class Test(DeclarativeBase):
    __tablename__ = 'test'
    id = sql.Column(sql.Integer, sql.Sequence('id_seq'), primary_key=True)
    value = sql.Column(sql.Integer)

engine = sql.create_engine('sqlite:///:memory:')
DeclarativeBase.metadata.create_all(engine)
Session = sqlorm.sessionmaker(bind=engine)

@contextmanager
def session_scope():
    """Provide a transactional scope around a series of operations."""
    session = Session()
    try:
        yield session
        session.commit()
    except:
        session.rollback()
        raise
    finally:
        session.close()

with session_scope() as session:
    session.add(Test(value=10))

with session_scope() as session:
    items = session.query(Test).all()

print(items[0].value)  # raises DetachedInstanceError
I have managed to get it to work by adding a commit parameter to session_scope, and changing session.commit() to if commit: session.commit(), but this is kind of ugly and error-prone. It seems like commit should be a no-op if only queries have been executed. Is there a better, standard way to do this?
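No answer is included here, but the standard explanation is that session.commit() expires all loaded attribute state and session.close() then detaches the instances, so the later items[0].value triggers a lazy load on a detached object. A common remedy, as a sketch (expire_on_commit is a real sessionmaker parameter; applying it here is my suggestion, not the thread's):

# Sketch: with expire_on_commit=False, commit() leaves loaded attribute
# values intact, so objects read inside the scope remain readable after
# the session is closed (detached, but fully populated).
Session = sqlorm.sessionmaker(bind=engine, expire_on_commit=False)

with session_scope() as session:
    items = session.query(Test).all()

print(items[0].value)  # prints 10 instead of raising DetachedInstanceError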

SQLAlchemy session.refresh does not refresh object

I have the following mapping (straight from the SA examples):

class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    fullname = Column(String)
    password = Column(String)
I'm working with a MySQL DB and the table uses the InnoDB engine.
I have a single record in my table:
1|'user1'|'user1 test'|'password'
I've opened a session with the following code:
from sqlalchemy.orm.session import sessionmaker
from sqlalchemy.engine import create_engine
from sqlalchemy.orm.scoping import scoped_session
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()
db_engine = create_engine('mysql://...@localhost/test_db?charset=utf8', echo=False, pool_recycle=1800)
session_factory = sessionmaker(bind=db_engine, autocommit=False, autoflush=False)
session_maker = scoped_session(session_factory)
session = session_maker()

user_1 = session.query(User).filter(User.id == 1).one()
user_1.name  # This prints: u'user1'
Now, when I change the record's name in the DB to 'user1_change', commit it, and then refresh the object like this:

session.refresh(user_1)
user_1.name

it still prints u'user1' and not u'user1_change'.
What am I missing (or setting up wrong) here?
Thanks!
From the docs:
Note that a highly isolated transaction will return the same values as were previously read in that same transaction, regardless of changes in database state outside of that transaction
SQLAlchemy uses a transactional unit of work model, wherein each transaction is assumed to be internally consistent. A session is an interface on top of a transaction. Since a transaction is assumed to be internally consistent, SQLAlchemy will only (well, not quite, but for ease of explanation...) retrieve a given piece of data from the database and update the state of the associated objects once per transaction. Since you already queried for the object in the same session transaction, SQLAlchemy will not update the data in that object from the database again within that transaction scope. If you want to poll the database, you'll need to do it with a fresh transaction each time.
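For example, a polling loop along these lines (my sketch, not the answerer's) re-reads the row in a new transaction on each pass:

import time

# Sketch: rollback() ends the current transaction and expires all loaded
# state, so the next query opens a fresh transaction and sees changes
# committed by other sessions or clients in the meantime.
while True:
    session.rollback()
    user_1 = session.query(User).filter(User.id == 1).one()
    if user_1.name == u'user1_change':
        break
    time.sleep(1)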
session.refresh() didn't work for me either. Even though I saw a low-level SELECT, the object was not updated after the refresh.
This answer https://stackoverflow.com/a/11121788/562267 hints at doing an actual commit/rollback to reset the session, and that worked for me:
user_1 = session.query(User).filter(User.id==1).one()
user_1.name # This prints: u'user1'
# update the database from another client here
session.commit()
user_1 = session.query(User).filter(User.id==1).one()
user_1.name # Should be updated now.
Did you try "expire", as described in the official docs?
http://docs.sqlalchemy.org/en/rel_0_8/orm/session.html#refreshing-expiring

# expire user_1: its attributes will be reloaded on the next access
session.expire(user_1)
session.refresh(user_1)

Using expire on an object results in a reload that will occur upon the next access.
Merge the session.
u = session.query(User).get(id)
u.name = 'user1_changed'
u = session.merge(u)
This will update the database and return the newer object.
