PyQt, SQLAlchemy - what are sessions good for?

I have this class which I use to create and manage an SQLite database:

from sqlalchemy import (create_engine, MetaData, Table, Column,
                        Integer, String, Float, ForeignKey)

class DbManager(object):  # imports and class header restored; the class name is a guess
    def __init__(self, db_file=None, parent=None):
        self.db = None
        self.db_connection = None
        self.db_file = str(db_file)

    def db_open(self):
        self.db = create_engine('sqlite:///' + self.db_file)
        self.db_connection = self.db.connect()

    def db_close(self):
        self.db_connection.close()

    def db_create_voltdrop(self):
        metadata = MetaData()
        tb_cable_brands = Table('cable_brands', metadata,
            Column('id', Integer, primary_key=True),
            Column('brand', String)
        )
        tb_cable_types = Table('cable_types', metadata,
            Column('id', Integer, primary_key=True),
            Column('brand_id', None, ForeignKey('cable_brands.id')),
            Column('type', String),
            Column('alpha', String)
        )
        tb_cable_data = Table('cable_data', metadata,
            Column('id', Integer, primary_key=True),
            Column('type_id', None, ForeignKey('cable_types.id')),
            Column('size', String),
            Column('resistance', Float)
        )
        metadata.create_all(self.db)
I instantiate this class when my MainWindow opens, use the DB and close the DB when the program quits.
I've just started learning SQLAlchemy, and that code works fine. Then I came across sessions in SQLAlchemy which are also used to create and manage databases. Which way is better? What advantage do sessions have over the above approach? Thank you.

The best practice for session management is to configure it once at global scope and use it from there.
The SQLAlchemy documentation says:
When you write your application, place the result of the sessionmaker() call at the global level. The resulting Session class, configured for your application, should then be used by the rest of the application as the source of new Session instances.
So you should create the session factory at package level. See the linked documentation for details.
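A minimal sketch of that pattern (the module layout and database URL are assumptions):

# mypackage/db.py -- application-wide configuration, created once at import
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

engine = create_engine('sqlite:///app.db')  # assumed URL
Session = sessionmaker(bind=engine)

# elsewhere in the application:
#   from mypackage.db import Session
#   session = Session()  # a fresh Session instance per unit of work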

Short answer: for your example (where create_all issues DDL) it is not really important (and I am not even sure whether SA supports DDL transactions), but whenever you add/delete/modify/query the objects themselves, Sessions are the way to go. See Using the Session in the docs for more info.
More info:
Technically, the following statement is not correct: ... then I came across sessions in SQLAlchemy which are also *used to create and manage databases*.
Sessions are not used to create and manage databases, but rather to provide a UnitOfWork pattern for database operations.
A simple view is to see sessions as SQL transactions: SA Sessions are to SA objects what SQL transactions are to DML (data modification) statements. Your particular example generates DDL (data definition) statements, and many RDBMS do not even support transactions for DDL (you cannot roll back a CREATE TABLE statement; you use DROP TABLE to undo your work).
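As a minimal sketch of that analogy (it assumes an engine plus a mapped class CableBrand, neither of which exists in the Core-only code above):

from sqlalchemy.orm import sessionmaker

Session = sessionmaker(bind=engine)   # e.g. the engine created in db_open()
session = Session()

session.add(CableBrand(brand='Acme'))  # tracked by the unit of work, not yet written
session.rollback()                     # DML can be rolled back...

session.add(CableBrand(brand='Acme'))
session.commit()                       # ...or committed as one transaction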

Related

How to initialize SQL Alchemy engine, session and table globally

I'm developing a python application where most of its functions will interact (create, read, update and delete) with a specific table in a MySQL database. I know that I can query this specific table with the following code:
from sqlalchemy import create_engine, MetaData, Table
from sqlalchemy.orm import sessionmaker

engine = create_engine(
    f"mysql+pymysql://{username}:{password}@{host}:{port}",
    pool_pre_ping=True
)
meta = MetaData(engine)
my_table = Table(
    'my_table',
    meta,
    autoload=True,
    schema=db_name
)
dbsession = sessionmaker(bind=engine)
session = dbsession()

# example query to table
results = session.query(my_table).filter(my_table.columns.id >= 1)
results.all()
However, I do not understand how to make these definitions (engine, meta, table, session) global to all of my functions. Should I define these things in my __init__.py and then pass them along as function arguments? Should I define a big class and initialize them during the class init?
My goal is to be able to query that table in any of my functions at any time without having to worry if the connection has gone away. According to the SQL Alchemy docs:
Just one time, somewhere in your application’s global scope. It should be looked upon as part of your application’s configuration. If your application has three .py files in a package, you could, for example, place the sessionmaker line in your __init__.py file; from that point on your other modules say “from mypackage import Session”. That way, everyone else just uses Session(), and the configuration of that session is controlled by that central point.
Ok, but what about the engine, table and meta? Do I need to worry about those?
If you are working with a single table, then the reflected table instance (my_table) and the engine should be all you need to expose globally:
- the metadata object (meta) is not required for querying, but is available as my_table.metadata if needed
- sessions are not required, because you do not appear to be using the SQLAlchemy ORM
The engine maintains a pool of connections, which you can check out to run queries (don't forget to close them though). This example code uses context managers to ensure that transactions are committed and connections are closed:
from sqlalchemy import select

# Check out a connection
with engine.connect() as conn:
    # Start a transaction
    with conn.begin():
        q = select(my_table).where(my_table.c.id >= 1)
        result = conn.execute(q)
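To make these available everywhere, one common pattern (module name and layout are assumptions, not from the docs) is a small module that builds the engine and reflected table once at import time:

# db.py -- hypothetical module; the rest of the app does `from db import engine, my_table`
from sqlalchemy import create_engine, MetaData, Table

# username, password, host, port and db_name come from your configuration
engine = create_engine(
    f"mysql+pymysql://{username}:{password}@{host}:{port}",
    pool_pre_ping=True
)
meta = MetaData(engine)
my_table = Table('my_table', meta, autoload=True, schema=db_name)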

local "merge" in SqlAlchemy

I'm wondering if there's a way to merge new mappings with database data, as with session.merge, but without updating the database? Like when I do a pull with git: I get a local state which is a merge of the remote and the previous local state (which might contain unpushed commits), but the remote state is not updated. Here, I want to get a local "view" of the state that would result from doing a session.merge.
Maybe making savepoint(with session.begin_nested), then doing a session.merge and later on a session.rollback would accomplish this, but is there a way that doesn't require this kind of transaction management(which can imply actual undo operations on the db, so not terribly efficient for my purposes)?
Would using session.no_autoflush do the trick?
Example code for what I want to do:
local_merge = session.merge_local(Model(...))
# do stuff with local_merge...
remotes = session.query(Model).all()
# remotes should remain "old" db versions, as no data was pushed
return something
Edit: So I think I may be wrong about the rollback method being inefficient. At least, as long as no commits are emitted, there shouldn't be expensive undo operations, only discarding the transaction.
Merge will only update the database because of the auto-flush. You can turn that off temporarily using the session.no_autoflush context manager, or by setting autoflush=False on your session. You can also pass autoflush=False to your sessionmaker.
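For reference, a quick sketch of those three options (using the names from the example below):

# 1. temporarily, inside a block
with session.no_autoflush:
    stale = session.query(User).all()

# 2. for a single session instance
session = Session(autoflush=False)

# 3. for every session the factory creates
Session = sessionmaker(bind=engine, autoflush=False)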
One thing to watch out for is how the results of the session.query(Model).all() will interact with the unflushed, changed, local objects.
Because the session maintains a record of unique objects (against primary keys) in an Identity Map, you will not be able to have two versions of the same object in the same session.
Here's an example which shows how local changes interact with autoflush=False:
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

engine = create_engine('sqlite:///:memory:', echo=True)
Base = declarative_base()

class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    name = Column(String)

    def __repr__(self):
        return "<User(name='%s')>" % self.name

Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)
session = Session()

ed_user = User(name='ed')
session.add(ed_user)
session.commit()

ed_again = session.query(User).get(1)
ed_again.name = "not ed"

with session.no_autoflush:
    ed_three = session.query(User).get(1)
    all_eds = session.query(User).all()

print(ed_again, id(ed_again))
print(ed_three, id(ed_three))
print(all_eds, id(all_eds[0]))
<User(name='not ed')> 139664151068624
<User(name='not ed')> 139664151068624
[<User(name='not ed')>] 139664151068624
Yep, it's not able to get the original Ed from the database, even with no_autoflush - this is to be expected for get(), since it will check the identity map first before the database, and won't bother querying the DB if it finds it in the identity map. But with query.all(), it queries the database, finds that one of the objects it gets back was already in the identity map, and returns that reference instead so as to maintain uniqueness of objects in the session (which was also my hunch, but I couldn't find this explicitly spelled out in the docs).
You could do something like expunge objects from a session, but I think the easiest way to have an old and new copy of the merged objects is to use two separate sessions, one where the changes will be merged and possibly committed and one which you can use to check the existing state of objects in the database.
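A minimal sketch of that two-session approach (Model and its columns are the asker's hypothetical class):

Session = sessionmaker(bind=engine)
work_session = Session()    # merge (and perhaps eventually commit) local changes here
check_session = Session()   # separate identity map: an untouched view of the DB

local_merge = work_session.merge(Model(id=1))   # local, unflushed state
db_version = check_session.query(Model).get(1)  # still the persisted state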

Get Primary Key column name from table in sqlalchemy (Core)

I am using SQLAlchemy Core, so I am not using a declarative base class as in other similar questions.
How to get the primary key of a table using the engine?
So, I just ran into the same problem. You need to create a Table object that reflects the table for which you are looking to find the primary key.
from sqlalchemy import create_engine, Table, MetaData
dbUrl = 'postgres://localhost:5432/databaseName' #will be different for different db's
engine = create_engine(dbUrl)
meta = MetaData()
table = Table('tableName', meta, autoload=True, autoload_with=engine)
primaryKeyColName = table.primary_key.columns.values()[0].name
The Table construct above is useful for a number of different functions. I use it quite a bit for managing geospatial tables since I do not really like any of the current packages out there.
In your comment you mention that you are not defining a table... I think that means that you aren't creating a SQLAlchemy model of the table. With the approach above, you don't need to do that and can gather all sorts of information from a table in a more dynamic fashion. Especially useful if you are handed someone else's messy database!
Hope that helps.
I'd like to comment, but I do not have enough reputation for that.
Building on greenbergé's answer:
from sqlalchemy import create_engine, Table, MetaData
dbUrl = 'postgres://localhost:5432/databaseName' #will be different for different db's
engine = create_engine(dbUrl)
meta = MetaData()
table = Table('tableName', meta, autoload=True, autoload_with=engine)
table.primary_key.columns.values() is a list, so [0] is only the whole primary key when the PK consists of a single column.
To get all the columns of a multi-column PK you could use:
primaryKeyColNames = [pk_column.name for pk_column in table.primary_key.columns.values()]
The two answers above retrieve the primary key from a metadata object. Even though that works well, sometimes you may want to retrieve the primary key from an instance of a SQLAlchemy model, without even knowing what the model class actually is (for example, if you want a helper function, say get_primary_key, that accepts an instance of a DB model and outputs the primary keys).
For this we can use the inspect function from the inspection module:
from sqlalchemy import inspect

def get_primary_key(model_instance):
    inspector = inspect(model_instance)
    model_columns = inspector.mapper.columns
    return [c.description for c in model_columns if c.primary_key]
You could also use the __mapper__ object directly:

def get_primary_key(model_instance):
    model_columns = model_instance.__mapper__.columns
    return [c.description for c in model_columns if c.primary_key]
For a reflected table this works:
insp = inspect(self.db.engine)
pk_temp = insp.get_pk_constraint(self.__tablename__)['constrained_columns']
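For reference, a self-contained sketch of that inspector approach (connection URL and table name are placeholders):

from sqlalchemy import create_engine, inspect

engine = create_engine('postgresql://localhost:5432/databaseName')
insp = inspect(engine)
pk_cols = insp.get_pk_constraint('tableName')['constrained_columns']
print(pk_cols)  # e.g. ['id'], or several names for a composite key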

User defined function creation in SQLAlchemy

SQLAlchemy provides a very clean interface for defining database tables:
from sqlalchemy import (create_engine, MetaData, Table, Column,
                        Integer, String, ForeignKey)

engine = create_engine('sqlite:///:memory:')
metadata = MetaData()

user = Table('user', metadata,
    Column('user_id', Integer, primary_key=True),
    Column('user_name', String(16), nullable=False),
    Column('email_address', String(60), key='email'),
    Column('password', String(20), nullable=False)
)

user_prefs = Table('user_prefs', metadata,
    Column('pref_id', Integer, primary_key=True),
    Column('user_id', Integer, ForeignKey("user.user_id"), nullable=False),
    Column('pref_name', String(40), nullable=False),
    Column('pref_value', String(100))
)
And once these tables have been defined, it is very easy to create these tables with the metadata.create_all(engine) function. This is especially nice for testing, when you want to create tables that will not interfere with the existing tables being used in production.
A project I am currently working on relies heavily on user defined functions in postgres. Is there a clean way to define these functions using SQLAlchemy in order for the metadata.create_all(engine) to properly create the functions along with the appropriate tables?
I've been working on something similar today. So far the best way I've found to do it is using SQLAlchemy's before_create event listeners. For general functions you can bind the events to the metadata, but for table-specific functions you could bind to the tables. For example:
import sqlalchemy
from sqlalchemy.schema import DDL

sqlalchemy.event.listen(
    metadata,
    'before_create',
    DDL('CREATE OR REPLACE FUNCTION myfunc() ...')
)
You can just replace metadata with your table if you want to create the function before a particular table is created (a sketch of this variant follows below).
This DDL seems to run every time you call metadata.create_all so it's important to use CREATE OR REPLACE. If you wanted a bit more control over when functions are created, you might be better looking into migrations with alembic or similar.
Some uses of DDL are described in the sqlalchemy docs here: http://docs.sqlalchemy.org/en/latest/core/ddl.html#custom-ddl
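For instance, a sketch of the table-bound variant (the function body here is only a placeholder):

import sqlalchemy
from sqlalchemy.schema import DDL

create_fn = DDL("""
CREATE OR REPLACE FUNCTION user_prefs_default() RETURNS integer AS $$
    SELECT 0;
$$ LANGUAGE sql;
""")

# fires just before CREATE TABLE user_prefs each time create_all() creates it
sqlalchemy.event.listen(user_prefs, 'before_create', create_fn)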

Why does this SQLAlchemy example commit changes to the DB?

This example illustrates a mystery I encountered in an application I am building. The application needs to support an option allowing the user to exercise the code without actually committing changes to the DB. However, when I added this option, I discovered that changes were persisted to the DB even when I did not call the commit() method.
My specific question can be found in the code comments. The underlying goal is to have a clearer understanding of when and why SQLAlchemy will commit to the DB.
My broader question is whether my application should (a) use a global Session instance, or (b) use a global Session class, from which particular instances would be instantiated. Based on this example, I'm starting to think that the correct answer is (b). Is that right? Edit: this SQLAlchemy documentation suggests that (b) is recommended.
import sys
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import sessionmaker
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    age = Column(Integer)

    def __init__(self, name, age=0):
        self.name = name
        self.age = age

    def __repr__(self):
        return "<User(name='{0}', age={1})>".format(self.name, self.age)

engine = create_engine('sqlite://', echo=False)
Base.metadata.create_all(engine)
Session = sessionmaker()
Session.configure(bind=engine)

global_session = Session()  # A global Session instance.
commit_ages = False         # Whether to commit in modify_ages().
use_global = True           # If True, modify_ages() will commit, regardless
                            # of the value of commit_ages. Why?

def get_session():
    return global_session if use_global else Session()

def add_users(names):
    s = get_session()
    s.add_all(User(nm) for nm in names)
    s.commit()

def list_users():
    s = get_session()
    for u in s.query(User): print ' ', u

def modify_ages():
    s = get_session()
    n = 0
    for u in s.query(User):
        n += 10
        u.age = n
    if commit_ages: s.commit()

add_users(('A', 'B', 'C'))
print '\nBefore:'
list_users()
modify_ages()
print '\nAfter:'
list_users()
tl;dr - The updates are not actually committed to the database-- they are part of an uncommitted transaction in progress.
I made 2 separate changes to your call to create_engine(). (Other than this one line, I'm using your code exactly as posted.)
The first was
engine = create_engine('sqlite://', echo = True)
This provides some useful information. I'm not going to post the entire output here, but notice that no SQL update commands are issued until after the second call to list_users() is made:
...
After:
xxxx-xx-xx xx:xx:xx,xxx INFO sqlalchemy.engine.base.Engine.0x...d3d0 UPDATE users SET age=? WHERE users.id = ?
xxxx-xx-xx xx:xx:xx,xxx INFO sqlalchemy.engine.base.Engine.0x...d3d0 (10, 1)
...
This is a clue that the data is not persisted, but kept around in the session object.
The second change I made was to persist the database to a file with
engine = create_engine('sqlite:///db.sqlite', echo = True)
Running the script again provides the same output as before for the second call to list_users():
<User(name='A', age=10)>
<User(name='B', age=20)>
<User(name='C', age=30)>
However, if you now open the db we just created and query its contents, you can see that the added users were persisted to the database, but the age modifications were not:
$ sqlite3 db.sqlite "select * from users"
1|A|0
2|B|0
3|C|0
So, the second call to list_users() is getting its values from the session object, not from the database, because there is a transaction in progress that hasn't been committed yet. To prove this, add the following lines to the end of your script:
s = get_session()
s.rollback()
print '\nAfter rollback:'
list_users()
Since you state you are actually using MySQL on the system where you see the problem, check the engine type the table was created with. The default is MyISAM, which does not support ACID transactions. Make sure you are using the InnoDB engine, which does support ACID transactions.
You can see which engine a table is using with
show create table users;
You can change the db engine for a table with alter table:
alter table users engine="InnoDB";
1. The example: just to make sure that (or to check whether) the session does not commit the changes, it is enough to call expunge_all on the session object. This will most probably show that the changes are not actually committed:
....
print '\nAfter:'
get_session().expunge_all()
list_users()
2. MySQL: As you already mentioned, the sqlite example might not reflect what you actually see when using MySQL. As documented in sqlalchemy - MySQL - Storage Engines, the most likely reason for your problem is the use of a non-transactional storage engine (like MyISAM), which results in an autocommit mode of execution.
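If you let SQLAlchemy create the tables, you can request InnoDB up front; a sketch using the declarative model from the question:

class User(Base):
    __tablename__ = 'users'
    __table_args__ = {'mysql_engine': 'InnoDB'}  # applied on MySQL, ignored by SQLite

    id = Column(Integer, primary_key=True)
    name = Column(String)
    age = Column(Integer)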
3. Session scope: Although having one global session sounds like asking for trouble, using a new session for every tiny little request is not a great idea either. You should think of a session as a transaction/unit-of-work. I find the usage of contextual sessions the best of both worlds, where you do not have to pass the session object down the hierarchy of method calls, and at the same time you are given pretty good safety in a multi-threaded environment. I do use a local session once in a while where I know I do not want to interact with the currently running transaction (session).
Note that the defaults of create_session() are the opposite of that of sessionmaker(): autoflush and expire_on_commit are False, autocommit is True.
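A sketch of the contextual (thread-local) session pattern mentioned in point 3:

from sqlalchemy.orm import scoped_session, sessionmaker

Session = scoped_session(sessionmaker(bind=engine))

# anywhere in the code, the proxy resolves to this thread's session
Session.add(User('D'))
Session.commit()
Session.remove()  # dispose of the thread-local session when the work is done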
global_session is already instantiated when you call modify_ages() and you've already committed to the database. If you re-instantiate global_session after you commit, it should start a new transaction.
My guess is that since you've already committed and are re-using the same object, each additional modification is automatically committed.
