Using session.query to read uncommitted data in SQLAlchemy - python

Summary
I'm trying to write integration tests against a series of database operations, and I want to be able to use a SQLAlchemy session as a staging environment in which to validate and roll back a transaction.
Is it possible to retrieve uncommitted data using session.query(Foo) instead of session.execute(text('select * from foo'))?
Background and Research
These results were observed using SQLAlchemy 1.2.10, Python 2.7.13, and Postgres 9.6.11.
I've looked at related StackOverflow posts but haven't found an explanation as to why the two operations below should behave differently.
SQLalchemy: changes not committing to db
Tried with and without session.flush() before every session.query. No success.
sqlalchemy update not commiting changes to database. Using single connection in an app
Checked to make sure I am using the same session object throughout
Sqlalchemy returns different results of the SELECT command (query.all)
N/A: My target workflow is to assess a series of CRUD operations within the staging tables of a single session.
Querying objects added to a non committed session in SQLAlchemy
Seems to be the most related issue, but my motivation for avoiding session.commit() is different, and I didn't quite find the explanation I'm looking for.
Reproducible Example
1) I establish a connection to the database and define a model object; no issues so far:
from sqlalchemy import text
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import Column, Integer, String, ForeignKey
#####
# Prior DB setup:
# CREATE TABLE foo (id int PRIMARY KEY, label text);
#####
# from https://docs.sqlalchemy.org/en/13/orm/mapping_styles.html#declarative-mapping
Base = declarative_base()
class Foo(Base):
    __tablename__ = 'foo'
    id = Column(Integer, primary_key=True)
    label = Column(String)
# from https://docs.sqlalchemy.org/en/13/orm/session_basics.html#getting-a-session
some_engine = create_engine('postgresql://username:password@endpoint/database')
Session = sessionmaker(bind=some_engine)
2) I perform some updates without committing the result, and I can see the staged data by executing a select statement within the session:
session = Session()
sql_insert = text("INSERT INTO foo (id, label) VALUES (1, 'original')")
session.execute(sql_insert)
sql_read = text("SELECT * FROM foo WHERE id = 1")
res = session.execute(sql_read).first()
print res.label
sql_update = text("UPDATE foo SET label = 'updated' WHERE id = 1")
session.execute(sql_update)
res2 = session.execute(sql_read).first()
print res2.label
sql_update2 = text("""
INSERT INTO foo (id, label) VALUES (1, 'second_update')
ON CONFLICT (id) DO UPDATE
SET (label) = (EXCLUDED.label)
""")
session.execute(sql_update2)
res3 = session.execute(sql_read).first()
print res3.label
session.rollback()
# prints expected values: 'original', 'updated', 'second_update'
3) I attempt to replace select statements with session.query, but I can't see the new data:
session = Session()
sql_insert = text("INSERT INTO foo (id, label) VALUES (1, 'original')")
session.execute(sql_insert)
res = session.query(Foo).filter_by(id=1).first()
print res.label
sql_update = text("UPDATE foo SET label = 'updated' WHERE id = 1")
session.execute(sql_update)
res2 = session.query(Foo).filter_by(id=1).first()
print res2.label
sql_update2 = text("""
INSERT INTO foo (id, label) VALUES (1, 'second_update')
ON CONFLICT (id) DO UPDATE
SET (label) = (EXCLUDED.label)
""")
session.execute(sql_update2)
res3 = session.query(Foo).filter_by(id=1).first()
print res3.label
session.rollback()
# prints: 'original', 'original', 'original'
I expect the printed output of Step 3 to be 'original', 'updated', 'second_update'.

The root cause is that the raw SQL queries and the ORM do not mix automatically in this case. While the Session is not a cache, meaning it does not cache queries, it does store objects based on their primary key in the identity map. When a Query returns a row for a mapped object, the existing object is returned. This is why you do not observe the changes you made in the 3rd step. This might seem like a rather poor way to handle the situation, but SQLAlchemy is operating based on some assumptions about transaction isolation, as described in "When to Expire or Refresh":
Transaction Isolation
...[So] as a best guess, it assumes that within the scope of a transaction, unless it is known that a SQL expression has been emitted to modify a particular row, there’s no need to refresh a row unless explicitly told to do so.
The whole note about transaction isolation is a worthwhile read. The way to make such changes known to SQLAlchemy is to perform updates using the Query API, if possible, and to manually expire changed objects, if all else fails. With this in mind, your 3rd step could look like:
session = Session()
sql_insert = text("INSERT INTO foo (id, label) VALUES (1, 'original')")
session.execute(sql_insert)
res = session.query(Foo).filter_by(id=1).first()
print(res.label)
session.query(Foo).filter_by(id=1).update({Foo.label: 'updated'},
                                          synchronize_session='fetch')
# This query is actually redundant, `res` and `res2` are the same object
res2 = session.query(Foo).filter_by(id=1).first()
print(res2.label)
sql_update2 = text("""
INSERT INTO foo (id, label) VALUES (1, 'second_update')
ON CONFLICT (id) DO UPDATE
SET label = EXCLUDED.label
""")
session.execute(sql_update2)
session.expire(res)
# Again, this query is redundant and fetches the same object that needs
# refreshing anyway
res3 = session.query(Foo).filter_by(id=1).first()
print(res3.label)
session.rollback()
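If rewriting the raw SQL with the Query API is not an option, a blunter alternative (a sketch, not part of the original answer, reusing the sql_insert and sql_update statements from step 3) is to expire the entire session after each out-of-band statement:
session = Session()
session.execute(sql_insert)                        # raw SQL; invisible to the ORM
res = session.query(Foo).filter_by(id=1).first()
print(res.label)                                   # 'original'
session.execute(sql_update)                        # raw UPDATE, again invisible to the ORM
session.expire_all()                               # discard all cached attribute state
res2 = session.query(Foo).filter_by(id=1).first()  # attributes reload within the same transaction
print(res2.label)                                  # 'updated'
session.rollback()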

Related

"Maximum number of parameters" error with filter .in_(list) using pyodbc

One of our queries that worked in Python 2 + mxODBC is not working in Python 3 + pyodbc; it raises an error like "Maximum number of parameters in the sql query is 2100" while connecting to SQL Server. Since both of the printed queries have 3000 params, I thought it should fail in both environments, but clearly that isn't the case. In the Python 2 environment both MSODBC 11 and MSODBC 17 work, so I immediately ruled out a driver-related issue.
So my questions are:
1. Is it correct for SQLAlchemy to send a list as multiple params, so that the param count is proportional to the length of the list? It looks a bit strange to me; I would have preferred concatenating the list into a single string, because the DB doesn't understand the list datatype.
2. Are there any hints on why it works in mxODBC but not pyodbc? Does mxODBC optimize something that pyodbc does not? Please let me know if there are any pointers - I can try to paste more info here. (I am still new to debugging SQLAlchemy.)
Footnote: I have seen a lot of answers that suggest chunking the data, but because of 1 and 2, I wonder if I am doing the correct thing in the first place.
(Since it seems to be related to pyodbc, I have raised an issue in the official repository.)
import sqlalchemy
import sqlalchemy.orm
from sqlalchemy import MetaData, Table
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm.session import Session

Base = declarative_base()

create_tables = """
CREATE TABLE products (
    idn NUMERIC(8) PRIMARY KEY
);
"""

check_tables = """
SELECT * FROM products;
"""

insert_values = """
INSERT INTO products
    (idn)
VALUES
    (1),
    (2);
"""

delete_tables = """
DROP TABLE products;
"""

engine = sqlalchemy.create_engine('mssql+pyodbc://user:password@dsn')
connection = engine.connect()
cursor = engine.raw_connection().cursor()
Session = sqlalchemy.orm.sessionmaker(bind=connection)
session = Session()
session.execute(create_tables)
metadata = MetaData(connection)

class Products(Base):
    __table__ = Table('products', metadata, autoload=True)

try:
    session.execute(check_tables)
    session.execute(insert_values)
    session.commit()
    query = session.query(Products).filter(
        Products.idn.in_(list(range(0, 3000)))
    )
    query.all()
    f = open("query.sql", "w")
    f.write(str(query))
    f.close()
finally:
    session.execute(delete_tables)
    session.commit()
When you do a straightforward .in_(list_of_values) SQLAlchemy renders the following SQL ...
SELECT team.prov AS team_prov, team.city AS team_city
FROM team
WHERE team.prov IN (?, ?)
... where each value in the IN clause is specified as a separate parameter value. pyodbc sends this to SQL Server as ...
exec sp_prepexec @p1 output,N'@P1 nvarchar(4),@P2 nvarchar(4)',N'SELECT team.prov AS team_prov, team.city AS team_city, team.team_name AS team_team_name
FROM team
WHERE team.prov IN (@P1, @P2)',N'AB',N'ON'
... so you hit the limit of 2100 parameters if your list is very long. Presumably, mxODBC inserted the parameter values inline before sending it to SQL Server, e.g.,
SELECT team.prov AS team_prov, team.city AS team_city
FROM team
WHERE team.prov IN ('AB', 'ON')
You can get SQLAlchemy to do that for you with
provinces = ["AB", "ON"]
stmt = (
session.query(Team)
.filter(
Team.prov.in_(sa.bindparam("p1", expanding=True, literal_execute=True))
)
.statement
)
result = list(session.query(Team).params(p1=provinces).from_statement(stmt))
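If literal_execute is not available in your SQLAlchemy version, a common fallback (a hedged sketch, reusing Team and provinces from above) is to chunk the value list so that each statement stays safely under the 2100-parameter limit:
def query_in_chunks(session, column, values, chunk_size=2000):
    # run the IN query in batches so no single statement exceeds the limit
    rows = []
    for i in range(0, len(values), chunk_size):
        rows.extend(session.query(Team).filter(column.in_(values[i:i + chunk_size])).all())
    return rows

teams = query_in_chunks(session, Team.prov, provinces)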

SQLAlchemy does not update/expire model instances with external changes

Recently I came across strange behavior in SQLAlchemy regarding refreshing/populating model instances with changes that were made outside of the current session. I created the following minimal working example and was able to reproduce the problem with it.
from time import sleep

from sqlalchemy import orm, create_engine, Column, BigInteger, Integer
from sqlalchemy.ext.declarative import declarative_base

DATABASE_URI = "postgresql://{user}:{password}@{host}:{port}/{name}".format(
    user="postgres",
    password="postgres",
    host="127.0.0.1",
    name="so_sqlalchemy",
    port="5432",
)

class SQLAlchemy:
    def __init__(self, db_url, autocommit=False, autoflush=True):
        self.engine = create_engine(db_url)
        self.session = None
        self.autocommit = autocommit
        self.autoflush = autoflush

    def connect(self):
        session_maker = orm.sessionmaker(
            bind=self.engine,
            autocommit=self.autocommit,
            autoflush=self.autoflush,
            expire_on_commit=True
        )
        self.session = orm.scoped_session(session_maker)

    def disconnect(self):
        self.session.flush()
        self.session.close()
        self.session.remove()
        self.session = None

BaseModel = declarative_base()

class TestModel(BaseModel):
    __tablename__ = "test_models"
    id = Column(BigInteger, primary_key=True, nullable=False)
    field = Column(Integer, nullable=False)

def loop(db):
    while True:
        with db.session.begin():
            t = db.session.query(TestModel).with_for_update().get(1)
            if t is None:
                print("No entry in db, creating...")
                t = TestModel(id=1, field=0)
                db.session.add(t)
                db.session.flush()
            print(f"t.field value is {t.field}")
            t.field += 1
            print(f"t.field value before flush is {t.field}")
            db.session.flush()
            print(f"t.field value after flush is {t.field}")
        print(f"t.field value after transaction is {t.field}")
        print("Sleeping for 2 seconds.")
        sleep(2.0)

def main():
    db = SQLAlchemy(DATABASE_URI, autocommit=True, autoflush=True)
    db.connect()
    try:
        loop(db)
    except KeyboardInterrupt:
        print("Canceled")

if __name__ == '__main__':
    main()
My requirements.txt file looks like this:
alembic==1.0.10
psycopg2-binary==2.8.2
sqlalchemy==1.3.3
If I run the script (I use Python 3.7.3 on my laptop running Ubuntu 16.04), it will nicely increment a value every two seconds as expected:
t.field value is 0
t.field value before flush is 1
t.field value after flush is 1
t.field value after transaction is 1
Sleeping for 2 seconds.
t.field value is 1
t.field value before flush is 2
t.field value after flush is 2
t.field value after transaction is 2
Sleeping for 2 seconds.
...
Now at some point I open a postgres database shell and begin another transaction:
so_sqlalchemy=# BEGIN;
BEGIN
so_sqlalchemy=# UPDATE test_models SET field=100 WHERE id=1;
UPDATE 1
so_sqlalchemy=# COMMIT;
COMMIT
As soon as I press Enter after the UPDATE query, the script blocks as expected, since it is issuing a SELECT ... FOR UPDATE query. However, when I commit the transaction in the database shell, the script continues from the previous value (say, 27) and does not detect that the external transaction changed the value of field in the database to 100.
My question is: why does this happen at all? Several factors seem to contradict the current behavior:
I'm using the expire_on_commit setting set to True, which seems to imply that every model instance that has been used in a transaction will be marked as expired after the transaction has been committed. (Quoting the documentation: "When True, all instances will be fully expired after each commit(), so that all attribute/object access subsequent to a completed transaction will load from the most recent database state.")
I'm not accessing some old model instance but rather issuing a completely new query every time. As far as I understand, this should lead to a direct query to the database, not to a cached instance. I can confirm that this is indeed the case if I turn the sqlalchemy debug log on.
The quick and dirty fix for this problem is to call db.session.expire_all() right after the transaction has begun, but this seems very inelegant and counter-intuitive. I would be very glad to understand what's wrong with the way I'm working with sqlalchemy here.
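For reference, that quick-and-dirty fix looks like this inside the loop above (a sketch):
with db.session.begin():
    db.session.expire_all()  # force the next access to reload from the database
    t = db.session.query(TestModel).with_for_update().get(1)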
I ran into a very similar situation with MySQL. I needed to "see" changes to the table that were coming from external sources in the middle of my code's database operations. I ended up having to set autocommit=True in my session call and use the begin() / commit() methods of the session to "see" data that was updated externally.
The SQLAlchemy docs say this is a legacy configuration:
Warning
“autocommit” mode is a legacy mode of use and should not be considered for new projects.
but also say in the next paragraph:
Modern usage of “autocommit mode” tends to be for framework integrations that wish to control specifically when the “begin” state occurs
So it doesn't seem to be clear which statement is correct.
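For illustration, a minimal sketch of that legacy pattern, assuming an engine and the TestModel from the example above (a workaround, not a recommendation):
session = orm.sessionmaker(bind=engine, autocommit=True)()
session.begin()                                        # explicitly open a transaction
t = session.query(TestModel).with_for_update().get(1)  # fresh SELECT ... FOR UPDATE
t.field += 1
session.commit()                                       # the next begin() starts from a fresh snapshot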

Why Does SQLAlchemy Label Columns in Query

When I make a query in SQLAlchemy, I noticed that the queries use the AS keyword for each column. It sets the alias_name = column_name for every column.
For example, if I run the command print(session.query(DefaultLog)), it returns:
Note: DefaultLog is my table object.
SELECT default_log.id AS default_log_id, default_log.msg AS default_log_msg, default_log.logger_time AS default_log_logger_time, default_log.logger_line AS default_log_logger_line, default_log.logger_filepath AS default_log_logger_filepath, default_log.level AS default_log_level, default_log.logger_name AS default_log_logger_name, default_log.logger_method AS default_log_logger_method, default_log.hostname AS default_log_hostname
FROM default_log
Why does it use an alias = original name? Is there some way I can disable this behavior?
Thank you in advance!
Query.statement:
The full SELECT statement represented by this Query.
The statement by default will not have disambiguating labels applied
to the construct unless with_labels(True) is called first.
Using this model:
import sqlalchemy as sa

class DefaultLog(Base):
    # __tablename__ added here so the snippet is complete; it matches the SQL below
    __tablename__ = 'defaultlog'
    id = sa.Column(sa.Integer, primary_key=True)
    msg = sa.Column(sa.String(128))
    logger_time = sa.Column(sa.DateTime)
    logger_line = sa.Column(sa.Integer)
print(session.query(DefaultLog).statement) shows:
SELECT defaultlog.id, defaultlog.msg, defaultlog.logger_time, defaultlog.logger_line
FROM defaultlog
print(session.query(DefaultLog).with_labels().statement) shows:
SELECT defaultlog.id AS defaultlog_id, defaultlog.msg AS defaultlog_msg, defaultlog.logger_time AS defaultlog_logger_time, defaultlog.logger_line AS defaultlog_logger_line
FROM defaultlog
You asked:
Why does it use an alias = original name?
From Query.with_labels docs:
...this is commonly used to disambiguate columns from multiple tables which have the same name.
So if you want to issue a single query that calls upon multiple tables, there is nothing stopping those tables having columns that share the same name.
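For example (a hypothetical pair of models User and Order, both defining an id column), the labels keep the two id columns apart in a joined query:
# without labels the SELECT would contain two ambiguous "id" columns;
# with labels they come back as users_id and orders_id
for user, order in session.query(User, Order).join(Order, Order.user_id == User.id):
    print(user.id, order.id)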
Is there some way I can disable this behavior?
Also from the Query.with_labels docs:
When the Query actually issues SQL to load rows, it always uses column
labeling.
All of the methods that retrieve rows (get(), one(), one_or_none(), all() and iterating over the Query) route through the Query.__iter__() method:
def __iter__(self):
    context = self._compile_context()
    context.statement.use_labels = True
    if self._autoflush and not self._populate_existing:
        self.session._autoflush()
    return self._execute_and_instances(context)
... where this line hard codes the label usage: context.statement.use_labels = True. So it is "baked in" and can't be disabled.
You can execute the statement without labels:
session.execute(session.query(DefaultLog).statement)
... but that takes the ORM out of the equation.
It is possible to hack the SQLAlchemy Query class to not add labels, but be aware that this will break when a table is used twice in the query, e.g. a self join or a join through another table.
from sqlalchemy.orm import Query

class MyQuery(Query):
    def __iter__(self):
        """Patch to disable auto labels"""
        context = self._compile_context(labels=False)
        context.statement.use_labels = False
        if self._autoflush and not self._populate_existing:
            self.session._autoflush()
        return self._execute_and_instances(context)
And then use it as in mtth's answer:
sessionmaker(bind=engine, query_cls=MyQuery)
Printing an SQLAlchemy query is tricky and produces output that is not human-friendly: not only are the columns labeled, but the bind params show up as placeholders rather than values.
Here's how to do it correctly:
qry = session.query(SomeTable)
compiled = qry.statement.compile(dialect=session.bind.dialect, compile_kwargs={"literal_binds": True})
print(compiled)
Here's how to fix it for all your future work:
from sqlalchemy.orm import Query

class MyQuery(Query):
    def __str__(self):
        dialect = self.session.bind.dialect
        compiled = self.statement.compile(dialect=dialect, compile_kwargs={"literal_binds": True})
        return str(compiled)
To use:
session = sessionmaker(bind=engine, query_cls=MyQuery)()

Close SQLAlchemy connection

I have the following function in python:
def add_odm_object(obj, table_name, primary_key, unique_column):
    db = create_engine('mysql+pymysql://root:@127.0.0.1/mydb')
    metadata = MetaData(db)
    t = Table(table_name, metadata, autoload=True)
    s = t.select(t.c[unique_column] == obj[unique_column])
    rs = s.execute()
    r = rs.fetchone()
    if not r:
        i = t.insert()
        i_res = i.execute(obj)
        v_id = i_res.inserted_primary_key[0]
        return v_id
    else:
        return r[primary_key]
This function checks whether the object obj is in the database, and if it is not found, it saves it to the DB. Now, I have a problem. I call the above function in a loop many times, and after a few hundred iterations I get an error: user root has exceeded the max_user_connections resource (current value: 30). I searched for answers; for example, the question How to close sqlalchemy connection in MySQL recommends creating a conn = db.connect() object, where db is the engine, and calling conn.close() after my query is completed.
But, where should I open and close the connection in my code? I am not working with the connection directly, but I'm using the Table() and MetaData functions in my code.
The engine is an expensive-to-create factory for database connections. Your application should call create_engine() exactly once per database server.
Similarly, the MetaData and Table objects describe a fixed schema object within a known database. These are also configurational constructs that in most cases are created once, just like classes, in a module.
In this case, your function seems to want to load up tables dynamically, which is fine; the MetaData object acts as a registry, which has the convenience feature that it will give you back an existing table if it already exists.
Within a Python function and especially within a loop, for best performance you typically want to refer to a single database connection only.
Taking these things into account, your module might look like:
# module level variable. can be initialized later,
# but generally just want to create this once.
db = create_engine('mysql+pymysql://root:@127.0.0.1/mydb')

# module level MetaData collection.
metadata = MetaData()

def add_odm_object(obj, table_name, primary_key, unique_column):
    with db.begin() as connection:
        # will load table_name exactly once, then store it persistently
        # within the above MetaData
        t = Table(table_name, metadata, autoload=True, autoload_with=connection)
        s = t.select(t.c[unique_column] == obj[unique_column])
        rs = connection.execute(s)
        r = rs.fetchone()
        if not r:
            i_res = connection.execute(t.insert(), obj)
            v_id = i_res.inserted_primary_key[0]
            return v_id
        else:
            return r[primary_key]
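A hypothetical call, assuming obj is a dict of column values keyed by column name:
obj = {"name": "widget", "sku": "W-1"}  # hypothetical row data
new_or_existing_id = add_odm_object(obj, "products", "id", "name")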

How to execute raw SQL in Flask-SQLAlchemy app

How do you execute raw SQL in SQLAlchemy?
I have a python web app that runs on flask and interfaces to the database through SQLAlchemy.
I need a way to run the raw SQL. The query involves multiple table joins along with Inline views.
I've tried:
connection = db.session.connection()
connection.execute( <sql here> )
But I keep getting gateway errors.
Have you tried:
result = db.engine.execute("<sql here>")
or:
from sqlalchemy import text
sql = text('select name from penguins')
result = db.engine.execute(sql)
names = [row[0] for row in result]
print names
Note that db.engine.execute() is "connectionless" execution, which was deprecated in SQLAlchemy 1.4 and removed in 2.0.
SQL Alchemy session objects have their own execute method:
result = db.session.execute('SELECT * FROM my_table WHERE my_column = :val', {'val': 5})
All your application queries should be going through a session object, whether they're raw SQL or not. This ensures that the queries are properly managed by a transaction, which allows multiple queries in the same request to be committed or rolled back as a single unit. Going outside the transaction using the engine or the connection puts you at much greater risk of subtle, possibly hard to detect bugs that can leave you with corrupted data. Each request should be associated with only one transaction, and using db.session will ensure this is the case for your application.
Also take note that execute is designed for parameterized queries. Use parameters, like :val in the example, for any inputs to the query to protect yourself from SQL injection attacks. You can provide the value for these parameters by passing a dict as the second argument, where each key is the name of the parameter as it appears in the query. The exact syntax of the parameter itself may be different depending on your database, but all of the major relational databases support them in some form.
Assuming it's a SELECT query, this will return an iterable of RowProxy objects.
You can access individual columns with a variety of techniques:
for r in result:
    print(r[0])  # Access by positional index
    print(r['my_column'])  # Access by column name as a string
    r_dict = dict(r.items())  # convert to dict keyed by column names
Personally, I prefer to convert the results into namedtuples:
from collections import namedtuple
Record = namedtuple('Record', result.keys())
records = [Record(*r) for r in result.fetchall()]
for r in records:
    print(r.my_column)
    print(r)
If you're not using the Flask-SQLAlchemy extension, you can still easily use a session:
import sqlalchemy
from sqlalchemy.orm import sessionmaker, scoped_session
engine = sqlalchemy.create_engine('my connection string')
Session = scoped_session(sessionmaker(bind=engine))
s = Session()
result = s.execute('SELECT * FROM my_table WHERE my_column = :val', {'val': 5})
docs: SQL Expression Language Tutorial - Using Text
example:
from sqlalchemy.sql import text
connection = engine.connect()
# recommended
cmd = 'select * from Employees where EmployeeGroup = :group'
employeeGroup = 'Staff'
employees = connection.execute(text(cmd), group=employeeGroup)

# or - a wee bit more difficult to interpret the command
employeeGroup = 'Staff'
employees = connection.execute(
    text('select * from Employees where EmployeeGroup = :group'),
    group=employeeGroup)

# or - notice the requirement to quote 'Staff'
employees = connection.execute(
    text("select * from Employees where EmployeeGroup = 'Staff'"))

for employee in employees:
    logger.debug(employee)
# output
(0, 'Tim', 'Gurra', 'Staff', '991-509-9284')
(1, 'Jim', 'Carey', 'Staff', '832-252-1910')
(2, 'Lee', 'Asher', 'Staff', '897-747-1564')
(3, 'Ben', 'Hayes', 'Staff', '584-255-2631')
You can get the results of SELECT SQL queries using from_statement() and text() as shown here. You don't have to deal with tuples this way. As an example for a class User having the table name users you can try,
from sqlalchemy.sql import text
user = session.query(User).from_statement(
    text("""SELECT * FROM users where name=:name""")
).params(name="ed").all()
return user
For SQLAlchemy ≥ 1.4
Starting in SQLAlchemy 1.4, connectionless or implicit execution has been deprecated, i.e.
db.engine.execute(...) # DEPRECATED
as well as bare strings as queries.
The new API requires an explicit connection, e.g.
from sqlalchemy import text
with db.engine.connect() as connection:
    result = connection.execute(text("SELECT * FROM ..."))
    for row in result:
        ...  # process each row
Similarly, it’s encouraged to use an existing Session if one is available:
result = session.execute(sqlalchemy.text("SELECT * FROM ..."))
or using parameters:
session.execute(sqlalchemy.text("SELECT * FROM a_table WHERE a_column = :val"),
                {'val': 5})
See "Connectionless Execution, Implicit Execution" in the documentation for more details.
result = db.engine.execute(text("<sql here>"))
executes the <sql here> but doesn't commit it unless you're on autocommit mode. So, inserts and updates wouldn't reflect in the database.
To commit after the changes, do
result = db.engine.execute(text("<sql here>").execution_options(autocommit=True))
This is a simplified answer of how to run SQL query from Flask Shell
First, point Flask at your module (if your module/app is manage.py in the project root and you are on a UNIX-like operating system), run:
export FLASK_APP=manage
Run Flask shell
flask shell
Import what we need:
from flask import Flask
from flask_sqlalchemy import SQLAlchemy
db = SQLAlchemy(app)  # `flask shell` makes `app` available automatically
from sqlalchemy import text
Run your query:
result = db.engine.execute(text("<sql here>").execution_options(autocommit=True))
This uses the database connection that the application already holds.
Flask-SQLAlchemy v: 3.0.x / SQLAlchemy v: 1.4
users = db.session.execute(db.select(User).order_by(User.title.desc()).limit(150)).scalars()
So basically, for the latest stable version of Flask-SQLAlchemy, the documentation suggests using the session.execute() method in conjunction with db.select(Object).
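For raw SQL under the same 2.0-style API, the equivalent is a text() construct passed to session.execute() (a sketch; the table and column names are placeholders):
from sqlalchemy import text

result = db.session.execute(text("SELECT * FROM my_table WHERE my_column = :val"),
                            {"val": 5})
rows = result.all()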
Have you tried using connection.execute(text( <sql here> ), <bind params here> ) and bind parameters as described in the docs? This can help solve many parameter formatting and performance problems. Maybe the gateway error is a timeout? Bind parameters tend to make complex queries execute substantially faster.
If you want to avoid tuples, another way is by calling the first, one or all methods:
query = db.engine.execute("SELECT * FROM blogs "
                          "WHERE id = 1 ")
assert query.first().name == "Welcome to my blog"
