SQLAlchemy session.refresh does not refresh object - Python

I have the following mapping (straight from SA examples):
class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    fullname = Column(String)
    password = Column(String)
I'm working with a MySQL DB and the table uses the InnoDB engine.
I have a single record in my table:
1|'user1'|'user1 test'|'password'
I've opened a session with the following code:
from sqlalchemy.orm.session import sessionmaker
from sqlalchemy.engine import create_engine
from sqlalchemy.orm.scoping import scoped_session
from sqlalchemy.ext.declarative import declarative_base
Base = declarative_base()
db_engine = create_engine('mysql://...@localhost/test_db?charset=utf8', echo=False, pool_recycle=1800)
session_factory = sessionmaker(bind=db_engine,autocommit=False,autoflush=False)
session_maker = scoped_session(session_factory)
session = session_maker()
user_1 = session.query(User).filter(User.id==1).one()
user_1.name # This prints: u'user1'
Now, when I change the record's name in the DB to 'user1_change', commit the change, and then refresh the object like this:
session.refresh(user_1)
user_1.name
It still prints: u'user1' and not u'user1_change'.
What am I missing (or setting up wrong) here?
Thanks!

From the docs:
Note that a highly isolated transaction will return the same values as were previously read in that same transaction, regardless of changes in database state outside of that transaction
SQLAlchemy uses a transactional unit of work model, wherein each transaction is assumed to be internally consistent. A session is an interface on top of a transaction. Since a transaction is assumed to be internally consistent, SQLAlchemy will only (well, not quite, but for ease of explanation...) retrieve a given piece of data from the database and update the state of the associated objects once per transaction. Since you already queried for the object in the same session transaction, SQLAlchemy will not update the data in that object from the database again within that transaction scope. If you want to poll the database, you'll need to do it with a fresh transaction each time.
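For example, a minimal sketch of polling with a fresh transaction, reusing the names from the question (either commit() or rollback() ends the current transaction, so the next query starts a new one):
session.commit()  # or session.rollback() to discard local changes
user_1 = session.query(User).filter(User.id == 1).one()
user_1.name  # now reflects changes committed by other clients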

session.refresh() didn't work for me either. Even though I saw a low-level SELECT, the object was not updated after the refresh.
This answer https://stackoverflow.com/a/11121788/562267 hints at doing an actual commit/rollback to reset the session, and that worked for me:
user_1 = session.query(User).filter(User.id==1).one()
user_1.name # This prints: u'user1'
# update the database from another client here
session.commit()
user_1 = session.query(User).filter(User.id==1).one()
user_1.name # Should be updated now.

Did you try with "expire" as described in the official doc:
http://docs.sqlalchemy.org/en/rel_0_8/orm/session.html#refreshing-expiring
# expire user_1; its attributes will be reloaded
# on the next access:
session.expire(user_1)
session.refresh(user_1)
Using expire on an object results in a reload that will occur upon the next access.
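If you want every object in the session to reload on next access, rather than one instance at a time, a session-wide variant also exists:
session.expire_all()  # every loaded object reloads on its next access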

Merge the object back into the session.
u = session.query(User).get(id)
u.name = 'user1_changed'
u = session.merge(u)
This will update the database and return the newer object.

Related

How to reproduce an error caused by sqlalchemy session caching

I'm trying to reproduce a bug locally which I think is caused by a race condition where an update is relying on stale data (due to synchronize_session=False), essentially something like the following:
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import create_engine
from sqlalchemy import Column, Integer, Boolean, CheckConstraint
from sqlalchemy.orm.session import sessionmaker
Base = declarative_base()
# change this to your actual postgres url
db_string = "postgres://max:steve@localhost/test"
db = create_engine(db_string)
class User(Base):
    __tablename__ = 'users4'
    id = Column(Integer, primary_key=True)
    deleted = Column(Boolean)
    super_user = Column(Boolean, CheckConstraint('NOT (super_user AND deleted)', name='check1'))
Base.metadata.create_all(db)
Session = sessionmaker(bind=db)
session = Session()
session.autoflush = False
# Create a user
session.add(User(id=1, deleted=False, super_user=False))
# Delete that user
session.query(User).filter(User.id == 1).update(
    {'deleted': True}, synchronize_session=False)
# Make all non-deleted users into super users
# Will violate the CHECK constraint if the previous query hasn't
# been flushed
session.query(User).filter(User.deleted == False).update({'super_user': True})
Is there a way I can force sqlalchemy to use the cached session (maybe through mocking or some such) so that this code will violate the constraint and raise an IntegrityError?
The docs for synchronize_session say that
... updated objects may still remain in the session with stale values on their attributes, which can lead to confusing results.
This is the situation that I want to reproduce.
The last update query does not utilize the stale session data. I think a case like this, where logic acts on the stale attributes, will trigger the check constraint when a flush finally does occur:
# Create a user
user1 = User(id=1, deleted=False, super_user=False)
session.add(user1)
# Delete that user
session.query(User).filter(User.id == 1).update(
    {'deleted': True}, synchronize_session=False)
# Make all non-deleted users into super users
# Will violate the CHECK constraint if the previous query hasn't
# been flushed
if not user1.deleted:
    user1.super_user = True
session.flush()

How do I make SQLAlchemy set values for a foreign key by passing a related entity in the constructor?

When using SQLAlchemy I would like the foreign key fields to be filled in on the Python object when I pass in a related object. For example, assume you have network devices with ports, and assume that the device has a composite primary key in the database.
If I already have a reference to a "Device" instance and want to create a new "Port" instance linked to that device without knowing if it already exists in the database I would use the merge operation in SA. However, only setting the device attribute on the port instance is insufficient. The fields of the composite foreign key will not be propagated to the port instance and SA will be unable to determine the existence of the row in the database and unconditionally issue an INSERT statement instead of an UPDATE.
The following code examples demonstrate the issue. They should be run as one .py file so we have the same in-memory SQLite instance! They have only been split for readability.
Model Definition
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import Column, Unicode, ForeignKeyConstraint, create_engine
from sqlalchemy.orm import sessionmaker, relation
from textwrap import dedent
Base = declarative_base()
class Device(Base):
    __tablename__ = 'device'
    hostname = Column(Unicode, primary_key=True)
    scope = Column(Unicode, primary_key=True)
    poll_ip = Column(Unicode, primary_key=True)
    notes = Column(Unicode)
    ports = relation('Port', backref='device')

class Port(Base):
    __tablename__ = 'port'
    __table_args__ = (
        ForeignKeyConstraint(
            ['hostname', 'scope', 'poll_ip'],
            ['device.hostname', 'device.scope', 'device.poll_ip'],
            onupdate='CASCADE', ondelete='CASCADE'
        ),
    )
    hostname = Column(Unicode, primary_key=True)
    scope = Column(Unicode, primary_key=True)
    poll_ip = Column(Unicode, primary_key=True)
    name = Column(Unicode, primary_key=True)
engine = create_engine('sqlite://', echo=True)
Base.metadata.bind = engine
Base.metadata.create_all()
Session = sessionmaker(bind=engine)
The model defines a Device class with a composite PK with three fields. The Port class references Device through a composite FK on those three columns. Device also has a relationship to Port which will use that FK.
Using the model
First, we add a new device and port. As we're using an in-memory SQLite DB, these will be the only two entries in the DB. And by inserting one device into the database, we have something in the device table that we expect to be loaded on the subsequent merge in session "sess2".
sess1 = Session()
d1 = Device(hostname='d1', scope='s1', poll_ip='pi1')
p1 = Port(device=d1, name='port1')
sess1.add(d1)
sess1.commit()
sess1.close()
Working example
This block works, but it is not written in a way I would expect it to behave. More precisely, the instance "d1" is instantiated with "hostname", "scope" and "poll_ip", and that instance is passed to the "Port" instance "p2". I would expect that "p2" would "receive" those 3 values through the foreign key. But it doesn't. I am forced to manually assign the values to "p2" before calling "merge". If the values are not assigned, SA does not find the identity and tries to run an "INSERT" query for "p2" which will conflict with the already existing instance.
sess2 = Session()
d1 = Device(hostname='d1', scope='s1', poll_ip='pi1')
p2 = Port(device=d1, name='port1')
p2.hostname = d1.hostname
p2.poll_ip = d1.poll_ip
p2.scope = d1.scope
p2 = sess2.merge(p2)
sess2.commit()
sess2.close()
Broken example (but expecting it to work)
This block shows how I would expect it to work. I would expect that assigning a value to "device" when creating the Port instance should be enough.
sess3 = Session()
d1 = Device(hostname='d1', scope='s1', poll_ip='pi1')
p2 = Port(device=d1, name='port1')
p2 = sess3.merge(p2)
sess3.commit()
sess3.close()
How can I make this last block work?
The FK of the child object isn't updated until you issue a flush() either explicitly or through a commit(). I think the reason for this is that if the parent object of a relationship is also a new instance with an auto-increment PK, SQLAlchemy needs to get the PK from the database before it can update the FK on the child object (but I stand to be corrected!).
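A quick way to see this, as a sketch reusing the model and Session defined in the question (the second device 'd2' is made up so the INSERT doesn't collide with the existing row):
sessX = Session()
d = Device(hostname='d2', scope='s1', poll_ip='pi1')
p = Port(device=d, name='port1')
sessX.add(d)
print(p.hostname)  # None - the composite FK is not populated yet
sessX.flush()      # the flush propagates the PK values to the child
print(p.hostname)  # u'd2'
sessX.rollback()
sessX.close()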
According to the docs, a merge():
examines the primary key of the instance. If it’s present, it attempts
to locate that instance in the local identity map. If the load=True
flag is left at its default, it also checks the database for this
primary key if not located locally.
If the given instance has no primary key, or if no instance can be
found with the primary key given, a new instance is created.
As you are merging before flushing, there is incomplete PK data on your p2 instance, and so the line p2 = sess3.merge(p2) returns a new Port instance with the same attribute values as the p2 you previously created, which is tracked by the session. Then sess3.commit() finally issues the flush, where the FK data is populated onto p2, and the integrity error is raised when it tries to write to the port table. Inserting a sess3.flush() would only raise the integrity error earlier, not avoid it.
Something like this would work:
def existing_or_new(sess, kls, **kwargs):
    inst = sess.query(kls).filter_by(**kwargs).one_or_none()
    if not inst:
        inst = kls(**kwargs)
    return inst
id_data = dict(hostname='d1', scope='s1', poll_ip='pi1')
sess3 = Session()
d1 = Device(**id_data)
p2 = existing_or_new(sess3, Port, name='port1', **id_data)
d1.ports.append(p2)
sess3.commit()
sess3.close()
This question has more thorough examples of existing_or_new style functions for SQLAlchemy.

SQLAlchemy refresh() not working after committing from a different session

I'm experiencing an issue with sqlalchemy where an update to a record in one session is not reflected in a second session even after committing and refreshing the object.
To demonstrate, consider this (complete) example:
import logging
from sqlalchemy import create_engine, Column, Boolean, Integer
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker
logging.basicConfig(level=logging.INFO)
logging.getLogger("sqlalchemy.engine").setLevel(logging.INFO)
# works with this
#engine = create_engine("sqlite://")
# fails with this
engine = create_engine("mysql+mysqldb://{user}:{pass}#{host}:{port}/{database}?charset=utf8mb4".format(**DB_SETTINGS))
Session = sessionmaker(bind=engine)
Base = declarative_base()
class Foo(Base):
    __tablename__ = "foo"
    id = Column(Integer, primary_key=True, autoincrement=True)
    flag = Column(Boolean)

    def __repr__(self):
        return "Foo(id={0.id}, flag={0.flag})".format(self)
# create the table
Base.metadata.create_all(engine)
# add a row
session = Session()
foo = Foo(id=1, flag=False)
session.add(foo)
session.commit()
# fetch the row in a different session
session2 = Session()
foo2 = session2.query(Foo).filter_by(id=1).one()
logging.info("SESSION2: Got {0}".format(foo2))
# update the row in first session and commit
foo.flag = True
session.commit()
# refresh the row in second session
logging.info("SESSION2: Refreshing...")
session2.refresh(foo2)
logging.info("SESSION2: After refresh: {0}".format(foo2))
# does "flag" come back as True?
When I run this with the mysql+mysqldb:// engine to connect to my remote MySQL instance, the change to foo.flag is not reflected in session2.
But if I uncomment the line that creates an engine using a simple sqlite:// in-memory database, the change to foo.flag is reflected in session2.
What is it about my MySQL server configuration that could cause an UPDATE command in one session, followed immediately by a SELECT query in another session, to return different data?
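The usual culprit here, tying in with the transactional explanation in the first answer above, is MySQL's default REPEATABLE READ isolation level: session2's snapshot stays fixed for the life of its transaction, whereas SQLite shows committed changes. As a hedged sketch (the URL and the choice of READ COMMITTED are assumptions, not the asker's confirmed fix), lowering the isolation level makes each statement see the latest committed data:
engine = create_engine(
    "mysql+mysqldb://user:password@localhost/test_db?charset=utf8mb4",
    isolation_level="READ COMMITTED",  # assumption: sufficient for this case
)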

SQLAlchemy - Multiple Database Issues

We are making a game server using SQLAlchemy.
Because game servers must be very fast, we have decided to separate databases depending on user ID (integer).
So, for example, I did it successfully like the following.
from threading import Thread
from sqlalchemy import Column, Integer, String, DateTime, create_engine
from sqlalchemy.ext.declarative import declarative_base, DeferredReflection
from sqlalchemy.orm import sessionmaker
DeferredBase = declarative_base(cls=DeferredReflection)
class BuddyModel(DeferredBase):
    __tablename__ = 'test_x'
    id = Column(Integer(), primary_key=True, autoincrement=True)
    value = Column(String(50), nullable=False)
The next code will create multiple databases; there will be test0 ~ test9 databases.
for i in range(10):
    url = 'mysql://user@localhost/'
    engine = create_engine(url, encoding='UTF-8', pool_recycle=300)
    con = engine.connect()
    con.execute('create database test%d' % i)
The following code will create 10 separate engines.
The get_engine() function will give you an engine depending on the user ID (an integer).
engines = []
for i in range(10):
    url = 'mysql://user@localhost/test%d' % i
    engine = create_engine(url, encoding='UTF-8', pool_recycle=300)
    DeferredBase.metadata.bind = engine
    DeferredBase.metadata.create_all()
    engines.append(engine)

def get_engine(user_id):
    index = user_id % 10
    return engines[index]
By running the prepare() function, the BuddyModel class will be prepared and mapped to the engine.
def prepare(user_id):
    engine = get_engine(user_id)
    DeferredBase.prepare(engine)
The next code will do exactly what I want:
for user_id in range(100):
    prepare(user_id)
    engine = get_engine(user_id)
    session = sessionmaker(engine)()
    buddy = BuddyModel()
    buddy.value = 'user_id: %d' % user_id
    session.add(buddy)
    session.commit()
But the problem is that when I do it in multiple threads, it just raises errors:
class MetalMultidatabaseThread(Thread):
    def run(self):
        for user_id in range(100):
            prepare(user_id)
            engine = get_engine(user_id)
            session = sessionmaker(engine)()
            buddy = BuddyModel()
            buddy.value = 'user_id: %d' % user_id
            session.add(buddy)
            session.commit()

threads = []
for i in range(100):
    t = MetalMultidatabaseThread()
    t.setDaemon(True)
    t.start()
    threads.append(t)
for t in threads:
    t.join()
The error message is:
ArgumentError: Class '<class '__main__.BuddyModel'>' already has a primary mapper defined. Use non_primary=True to create a non primary Mapper. clear_mappers() will remove *all* current mappers from all classes.
So my question is: how can I do multiple databases with the above architecture using SQLAlchemy?
This is called horizontal sharding, and it's a bit of a tricky use case. The version you have, where you make a session after picking the engine, will work fine. There are two variants of this which you may like.
One is to use the horizontal sharding extension. This extension allows you to create a Session to automatically select the correct node.
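A hedged sketch of the extension, reusing the engines list built in the question (the chooser callbacks below are assumptions; since BuddyModel carries no shard key, they mostly fan out to every shard):
from sqlalchemy.ext.horizontal_shard import ShardedSession
from sqlalchemy.orm import sessionmaker

shards = {str(i): engines[i] for i in range(10)}

def shard_chooser(mapper, instance, clause=None):
    # Assumption: without a real shard key on the row, route inserts to '0'.
    return '0'

def id_chooser(query, ident):
    # A primary key alone doesn't identify a shard; search all of them.
    return list(shards)

def query_chooser(query):
    # Without more context, send the query to every shard.
    return list(shards)

ShardSession = sessionmaker(class_=ShardedSession)
session = ShardSession(shards=shards, shard_chooser=shard_chooser,
                       id_chooser=id_chooser, query_chooser=query_chooser)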
The other is more or less what you have, just less verbose: build a Session class with a routing function, so you can at least share a single session and say session.using_bind('engine1') for a query instead of making a whole new session.
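A minimal sketch of that routing idea (RoutingSession and using_bind are modeled on the SQLAlchemy wiki recipe, not a built-in API; get_engine is the function from the question):
from sqlalchemy.orm import Session, sessionmaker

class RoutingSession(Session):
    _routed_engine = None

    def using_bind(self, engine):
        # Remember which engine to route to; return self for chaining.
        self._routed_engine = engine
        return self

    def get_bind(self, mapper=None, clause=None):
        if self._routed_engine is not None:
            return self._routed_engine
        return super(RoutingSession, self).get_bind(mapper=mapper, clause=clause)

RoutedSession = sessionmaker(class_=RoutingSession)
session = RoutedSession()
user_id = 114  # example user ID
session.using_bind(get_engine(user_id)).add(BuddyModel(value='hello'))
session.commit()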
I have found an answer to my question.
To build multiple databases keyed on user ID (an integer), just use the session.
Before explaining this, I want to expound on the database architecture a bit more.
For example, if user ID 114 connects to the server, the server will determine where to retrieve the user's information by using something like this:
user_id%10 # <-- 4th database
Architecture
DATABASES
- DB0 <-- save all user data whose ID ends with 0
- DB1 <-- save all user data whose ID ends with 1
.
.
.
- DB9 <-- save all user data whose ID ends with 9
Here is the answer.
First, do not use the bind parameter; simply leave it empty.
Base = declarative_base()
Declare the model:
class BuddyModel(Base):
    __tablename__ = 'test_x'
    id = Column(Integer(), primary_key=True, autoincrement=True)
    value = Column(String(50), nullable=False)
When you want to do CRUD, make a session:
engine = get_engine_by_user_id(user_id)
session = sessionmaker(bind=engine)()
buddy = BuddyModel()
buddy.value = 'This is Sparta!! %d' % user_id
session.add(buddy)
session.commit()
The engine should be the one matched to the user ID.
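get_engine_by_user_id() itself is not shown in the answer; a plausible sketch, mirroring the get_engine() function from the question:
def get_engine_by_user_id(user_id):
    # Pick the shard whose index is the last digit of the user ID.
    return engines[user_id % 10]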

SQLAlchemy: add table versioning to existing tables

Imagine that I have one table in my project with some rows in it.
For example:
# -*- coding: utf-8 -*-
import sqlalchemy as sa
from app import db
class Article(db.Model):
    __tablename__ = 'article'
    id = sa.Column(sa.Integer, primary_key=True, autoincrement=True)
    name = sa.Column(sa.Unicode(255))
    content = sa.Column(sa.UnicodeText)
I'm using Flask-SQLAlchemy, so db.session is scoped session object.
I saw https://github.com/zzzeek/sqlalchemy/blob/master/examples/versioned_history/history_meta.py, but I can't understand how to use it with my existing tables, or even how to get it started. (I get an ArgumentError: "Session event listen on a scoped_session requires that its creation callable is associated with the Session class." error when I pass db.session to the versioned_session function.)
From versioning I need the following:
1) query for old versions of an object
2) query old versions by the date range in which they changed
3) revert an existing object to an old state
4) add additional info to the history table when a version is created (for example, editor user_id, date_edit, remote_ip)
Please tell me what the best practices are for my case and, if you can, add a little working example.
You can work around that error by attaching the event handler to the SignallingSession class[1] instead of the created session object:
from flask.ext.sqlalchemy import SignallingSession
from history_meta import versioned_session, Versioned
# Create your Flask app...
versioned_session(SignallingSession)
db = SQLAlchemy(app)
class Article(Versioned, db.Model):
    __tablename__ = 'article'
    id = sa.Column(sa.Integer, primary_key=True, autoincrement=True)
    name = sa.Column(sa.Unicode(255))
    content = sa.Column(sa.UnicodeText)
The sample code creates parallel tables with a _history suffix and an additional changed datetime column. Querying for old versions is just a matter of looking in that table.
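For example, a sketch of querying old versions via the history class the recipe generates (the __history_mapper__ attribute and the version/changed columns come from history_meta; some_article_id, start_date and end_date are placeholders):
ArticleHistory = Article.__history_mapper__.class_
# 1) all old versions of one article, oldest first
old_versions = (db.session.query(ArticleHistory)
                .filter(ArticleHistory.id == some_article_id)
                .order_by(ArticleHistory.version)
                .all())
# 2) versions filtered by the date range in which they changed
in_range = (db.session.query(ArticleHistory)
            .filter(ArticleHistory.changed.between(start_date, end_date))
            .all())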
For managing the extra fields, I would put them on your main table, and they'll automatically be kept track of in the history table.
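A sketch of that, amending the Article definition above (the audit column names and types are assumptions taken from the question's wish list):
class Article(Versioned, db.Model):
    __tablename__ = 'article'
    id = sa.Column(sa.Integer, primary_key=True, autoincrement=True)
    name = sa.Column(sa.Unicode(255))
    content = sa.Column(sa.UnicodeText)
    # Assumed audit columns; because they live on the main table, the
    # recipe copies them into article_history automatically:
    editor_user_id = sa.Column(sa.Integer)
    date_edit = sa.Column(sa.DateTime)
    remote_ip = sa.Column(sa.Unicode(45))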
[1] Note, if you override SQLAlchemy.create_session() to use a different session class, you should adjust the class you pass to versioned_session.
I think the problem is you're running into this bug: https://github.com/mitsuhiko/flask-sqlalchemy/issues/182
One workaround would be to stop using flask-sqlalchemy and configure sqlalchemy yourself.
