Working with engines in SQLAlchemy - Python

I am curious about the proper way to close a connection when reading a query through pandas with read_sql_query. I've been using:
from sqlalchemy import create_engine
import pandas as pd
engine = create_engine('credentials')
data = pd.read_sql_query(sql_query, engine)
though it seems like the traditional usage is this:
engine = create_engine('credentials')
connection = engine.connect()
result = connection.execute(users_table.select())
for row in result:
    # ....
connection.close()
If I am not creating "connection" with engine.connect() as in the second approach, how do I close my connection? Or, is it closed after pd.read_sql_query is finished?

From http://docs.sqlalchemy.org/en/latest/core/connections.html
The Engine is intended to normally be a permanent fixture established up-front and maintained throughout the lifespan of an application. It is not intended to be created and disposed on a per-connection basis; it is instead a registry that maintains both a pool of connections as well as configurational information about the database and DBAPI in use, as well as some degree of internal caching of per-database resources.
The Engine object lazily allocates Connections on demand from an internal pool. These connections aren't necessarily closed when you call the close method of the individual Connection objects, just returned to that pool, or "checked in".
If you explicitly need the connections to be closed, you should check in all connections and then call engine.dispose(), or you may need to change the Pooling strategy your Engine object uses, see http://docs.sqlalchemy.org/en/latest/core/pooling.html#pool-switching.
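To answer the original question concretely: when you pass an Engine, pd.read_sql_query checks a connection out of the engine's pool for the duration of the query and checks it back in afterwards, so nothing leaks, but the pooled DBAPI connection stays open. A sketch of making the lifetime explicit (the URL and query are placeholders):
from sqlalchemy import create_engine
import pandas as pd
engine = create_engine('postgresql://user:password@localhost:5432/mydb')  # placeholder URL
# checking the connection out yourself makes its lifetime obvious;
# the with-block returns it to the engine's pool on exit
with engine.connect() as connection:
    data = pd.read_sql_query('SELECT 1 AS x', connection)  # placeholder query
# when the application is completely done with the database,
# dispose() closes every pooled connection
engine.dispose()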

Related

How to use multiple cursors in mysql.connector?

I want to execute multiple queries without each blocking the other. I created multiple cursors and did the following, but got mysql.connector.errors.OperationalError: 2013 (HY000): Lost connection to MySQL server during query
import mysql.connector as mc
from threading import Thread
conn = mc.connect()  # ...username, password
cur1 = conn.cursor()
cur2 = conn.cursor()
e1 = Thread(target=cur1.execute, args=("do sleep(30)",)) # A 'time taking' task
e2 = Thread(target=cur2.execute, args=("show databases",)) # A simple task
e1.start()
e2.start()
But I got that OperationalError. Reading a few other questions, some answers suggest that using multiple connections is better than multiple cursors. So shall I use multiple connections?
I don't have the full context of your situation to understand the performance considerations. Yes, starting a new connection could be considered heavy if you are operating under timing constraints that are short relative to the time it takes to start a new connection and you are forced to do that for every query...
But you can mitigate that with a shared connection pool that you create ahead of time, and then distribute your queries (in separate threads) over those connections as resources allow.
On the other hand, if all of your query times are fairly long relative to the time it takes to create a new connection, and you aren't looking to run more than a handful of queries in parallel, then it can be a reasonable option to create connections on demand. Just be aware that you will run into limits with the number of open connections if you try to go too far, as well as resource limitations on the database system itself. You probably don't want to do something like that against a shared database. Again, this is only a reasonable option within some very specific contexts.
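A sketch of the pooled approach using mysql.connector's own pooling module (the pool size and credentials are placeholder assumptions); each thread borrows its own connection instead of sharing one:
from mysql.connector import pooling
from threading import Thread
# created once, ahead of time; size and credentials are placeholders
pool = pooling.MySQLConnectionPool(pool_name="mypool", pool_size=5,
                                   host="localhost", user="user", password="password")
def run_query(sql):
    conn = pool.get_connection()  # each thread checks out its own connection
    try:
        cur = conn.cursor()
        cur.execute(sql)
        if cur.with_rows:
            cur.fetchall()  # drain the result set before handing the connection back
    finally:
        conn.close()  # returns the connection to the pool rather than destroying it
e1 = Thread(target=run_query, args=("do sleep(30)",))  # the 'time taking' task
e2 = Thread(target=run_query, args=("show databases",))  # the simple task
e1.start()
e2.start()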

SQLAlchemy - work with connections to DB and Sessions (not clear behavior and part in documentation)

I use SQLAlchemy (a really good ORM, but its documentation is not clear enough) to communicate with PostgreSQL.
Everything was great until Postgres "crashed" because the maximum connection limit was reached: no more connections allowed (max_client_conn).
That case made me think I was doing something wrong. After a few experiments I figured out how not to face that issue again, but some questions remain.
Below you'll see code examples (Python 3+, default PostgreSQL settings) without and with the mentioned issue, and what I'd like to hear eventually is answers to the following questions:
What exactly does the context manager do with connections and sessions? Close the session and dispose of the connection, or what?
Why does the first (working) example behave like the failing example when NullPool is not passed as the poolclass in the connect function?
Why did the first example use only one connection to the DB for all queries, while the second example opened a separate connection for each query? (please correct me if I misunderstood; I was checking this with pgbouncer)
What are the best practices for opening and closing connections (and/or working with Sessions) when using SQLAlchemy with PostgreSQL from multiple instances of a script (or separate threads in a script) that listen for requests and need a separate session each? (I mean raw SQLAlchemy, not Flask-SQLAlchemy or the like)
Working example of code (without the issue):
making the connection to the DB:
import sqlalchemy
from sqlalchemy.pool import NullPool  # does not work without NullPool, why?

def connect(user, password, db, host='localhost', port=5432):
    """Returns a connection and a metadata object"""
    url = 'postgresql://{}:{}@{}:{}/{}'.format(user, password, host, port, db)
    temp_con = sqlalchemy.create_engine(url, client_encoding='utf8', poolclass=NullPool)
    temp_meta = sqlalchemy.MetaData(bind=temp_con, reflect=True)
    return temp_con, temp_meta
function to get session to work with DB:
from contextlib import contextmanager
from sqlalchemy.orm import sessionmaker

@contextmanager
def session_scope():
    """Provide a transactional scope around a series of operations."""
    con_loc, meta_loc = connect(db_user, db_pass, db_instance, 'localhost')
    Session = sessionmaker(bind=con_loc)
    session = Session()
    try:
        yield session
        session.commit()
    except:
        session.rollback()
        raise
query example:
with session_scope() as session:
    entity = session.query(SomeEntity).first()
Failing example of code:
function to get session to work with DB:
def create_session():
    # connect method the same as in first example
    con, meta = connect(db_user, db_pass, db_instance, 'localhost')
    Session = sessionmaker(bind=con)
    session = Session()
    return session
query example:
session = create_session()
entity = session.query(SomeEntity).first()
Hope you got the main idea
First of all you should not create engines repeatedly in your connect() function. The usual practice is to have a single global Engine instance per database URL in your application. The same goes for the Session class created by the sessionmaker().
What exactly does the context manager do with connections and sessions? Close the session and dispose of the connection, or what?
What you've programmed it to do, and if this seems unclear, read about context managers in general. In this case it commits the session, or rolls it back if an exception was raised within the block governed by the with statement. Both actions return the connection used by the session to the pool, which in your case is a NullPool, so the connection is simply closed.
Why does the first (working) example behave like the failing example when NullPool is not passed as the poolclass in the connect function?
and
from sqlalchemy.pool import NullPool # does not work without NullPool, why?
Without NullPool the engines you repeatedly create also pool connections, so if they for some reason do not go out of scope, or their refcounts are otherwise not zeroed, they will hold on to the connections even after the sessions return them. It is unclear if the sessions go out of scope in a timely fashion in the second example, so they might also be holding on to the connections.
Why did the first example use only one connection to the DB for all queries, while the second example opened a separate connection for each query? (please correct me if I misunderstood; I was checking this with pgbouncer)
The first example ends up closing the connection due to the use of the context manager that handles transactions properly and the NullPool, so the connection is returned to the bouncer, which is another pool layer.
The second example might never close the connections because it lacks the transaction handling, but that's unclear due to the example given. It also might be holding on to connections in the separate engines that you create.
The 4th point of your question set is pretty much covered by the official documentation in "Session Basics", especially "When do I construct a Session, when do I commit it, and when do I close it?" and "Is the session thread-safe?".
There's one exception: multiple instances of the script. You should not share an engine between processes, so in order to pool connections between them you need an external pool such as PgBouncer.
What exactly does the context manager do with connections and sessions? Close the session and dispose of the connection, or what?
The context manager in Python is used to create a runtime context for use with the with statement. Simply put, when you run the code:
with session_scope() as session:
    entity = session.query(SomeEntity).first()
session is the yielded session. So, to answer your question of what the context manager does with the connections and sessions, all you have to do is look at what happens after the yield. In this case it's just:
try:
    yield session
    session.commit()
except:
    session.rollback()
    raise
If you trigger no exceptions, it will be session.commit(), which according to the SQLAlchemy docs will "Flush pending changes and commit the current transaction."
Why does the first (working) example behave like the failing example when NullPool is not passed as the poolclass in the connect function?
The poolclass argument is just telling SQLAlchemy which subclass of Pool to use. However, in the case where you pass NullPool here, you are telling SQLAlchemy to not use a pool. You're effectively disabling pooling connections when you pass in NullPool. From the docs: "to disable pooling, set poolclass to NullPool instead." I can't say for sure but using NullPool is probably contributing to your max_connection issues.
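For illustration, here are the two configurations side by side (a sketch; the URL is a placeholder and pool_size/max_overflow are just example values):
from sqlalchemy import create_engine
from sqlalchemy.pool import NullPool
url = 'postgresql://user:password@localhost:5432/mydb'  # placeholder
# default behaviour: a QueuePool keeps up to pool_size connections open
# and allows max_overflow extra connections under load
pooled_engine = create_engine(url, pool_size=5, max_overflow=10)
# NullPool: no pooling at all; every checkout opens a fresh DBAPI
# connection and every check-in closes it again
unpooled_engine = create_engine(url, poolclass=NullPool)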
Why did the first example use only one connection to the DB for all queries, while the second example opened a separate connection for each query? (please correct me if I misunderstood; I was checking this with pgbouncer)
I'm not exactly sure. I think it has to do with the fact that in the first example you use a context manager, so everything within the with block works with the one yielded session. In your second example, you created a function that initializes a new Session and returns it, so you're not getting back a generator. I also think your use of NullPool, which prevents connection pooling, plays a part: with NullPool, each query execution acquires a connection of its own.
What are the best practices for opening and closing connections (and/or working with Sessions) when using SQLAlchemy with PostgreSQL from multiple instances of a script (or separate threads in a script) that listen for requests and need a separate session each? (I mean raw SQLAlchemy, not Flask-SQLAlchemy or the like)
See the section Is the session thread-safe? for this, but you need to take a "share nothing" approach to your concurrency. So in your case, each instance of the script needs to share nothing with the others.
You probably want to check out Working with Engines and Connections. I don't think messing with sessions is where you want to be if concurrency is what you're working on. There's more information about the NullPool and concurrency there:
For a multiple-process application that uses the os.fork system call,
or for example the Python multiprocessing module, it’s usually
required that a separate Engine be used for each child process. This
is because the Engine maintains a reference to a connection pool that
ultimately references DBAPI connections - these tend to not be
portable across process boundaries. An Engine that is configured not
to use pooling (which is achieved via the usage of NullPool) does not
have this requirement.
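A minimal sketch of that "one Engine per child process" pattern (the URL is a placeholder):
import multiprocessing
from sqlalchemy import create_engine, text
def worker(url):
    # each child builds its own Engine; pooled DBAPI connections
    # are not portable across process boundaries
    engine = create_engine(url)
    with engine.connect() as conn:
        conn.execute(text("SELECT 1"))  # placeholder work
    engine.dispose()  # close this child's own pool before exiting
if __name__ == '__main__':
    url = 'postgresql://user:password@localhost:5432/mydb'  # placeholder
    procs = [multiprocessing.Process(target=worker, args=(url,)) for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()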
@Ilja Everilä's answer was mostly helpful.
I'll leave the edited code here; maybe it'll help someone.
The new code that works as I expected is the following:
making the connection to the DB:
import sqlalchemy
from sqlalchemy.pool import NullPool  # will work even without NullPool in the code

def connect(user, password, db, host='localhost', port=5432):
    """Returns a connection and a metadata object"""
    url = 'postgresql://{}:{}@{}:{}/{}'.format(user, password, host, port, db)
    temp_con = sqlalchemy.create_engine(url, client_encoding='utf8', poolclass=NullPool)
    temp_meta = sqlalchemy.MetaData(bind=temp_con, reflect=True)
    return temp_con, temp_meta
one instance of the connection and sessionmaker per app, for example in the module with your main function:
from sqlalchemy.orm import sessionmaker

# create one connection and sessionmaker per instance of the app
# (to avoid creating them repeatedly)
con, meta = connect(db_user, db_pass, db_instance, db_host)
session_maker = sessionmaker(bind=con)
function to get a session using the with statement:
from contextlib import contextmanager
from sqlalchemy.orm import Session
from some_place import session_maker

@contextmanager
def session_scope() -> Session:
    """Provide a transactional scope around a series of operations."""
    session = session_maker()  # create a session from the SQLAlchemy sessionmaker
    try:
        yield session
        session.commit()
    except:
        session.rollback()
        raise
wrap transaction and use session:
with session_scope() as session:
    entity = session.query(SomeEntity).first()

Triggering connection pools with SQLAlchemy in Flask

I am using Flask + SQLAlchemy (DB is Postgres) for my server, and am wondering how connection pooling happens. I know that it is enabled by default with a pool size of 5, but I don't know if my code triggers it.
Assuming I use the default Flask-SQLAlchemy bridge:
db = SQLAlchemy(app)
And then use that object to place database calls like
db.session.query(......)
How does Flask-SQLAlchemy manage the connection pool behind the scenes? Does it grab a new session every time I access db.session? When is this object returned to the pool (assuming I don't store it in a local variable)?
What is the correct pattern to write code to maximize concurrency + performance? If I access the DB multiple times in one serial method, is it a good idea to use db.session every time?
I was unable to find documentation on this matter, so I don't know what is happening behind the scenes (the code works, but will it scale?)
Thanks!
You can use event registration - http://docs.sqlalchemy.org/en/latest/core/event.html#event-registration
There are many different event types that can be monitored: checkout, checkin, connect, etc. - http://docs.sqlalchemy.org/en/latest/core/events.html
Here is a basic example from the docs that prints a message when a new connection is established.
from sqlalchemy.event import listen
from sqlalchemy.pool import Pool

def my_on_connect(dbapi_con, connection_record):
    print("New DBAPI connection:", dbapi_con)

listen(Pool, 'connect', my_on_connect)
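To observe the pooling behaviour asked about in the question, you can listen for checkouts and check-ins the same way (a sketch; the event names and handler signatures are SQLAlchemy's, the messages are just illustrative):
def on_checkout(dbapi_con, connection_record, connection_proxy):
    # fires each time a connection is borrowed from the pool
    print("connection checked out:", dbapi_con)
def on_checkin(dbapi_con, connection_record):
    # fires each time a connection is returned to the pool
    print("connection checked in:", dbapi_con)
listen(Pool, 'checkout', on_checkout)
listen(Pool, 'checkin', on_checkin)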

How to close a SQLAlchemy session?

Following what we commented in How to close sqlalchemy connection in MySQL, I am checking the connections that SQLAlchemy creates in my database, and I cannot manage to close them without exiting Python.
If I run this code in a Python console, it keeps the session open until I exit Python:
from sqlalchemy.orm import sessionmaker
from models import OneTable, get_engine
engine = get_engine(database="mydb")
session = sessionmaker(bind=engine)()
results = session.query(OneTable.company_name).all()
# some work with the data #
session.close()
and the only workaround I found to close it is to call engine.dispose() at the end.
As per the comments in the link I gave above, my questions are now:
Why is engine.dispose() necessary to close sessions?
Doesn't session.close() suffice?
There's a central confusion here over the word "session". I'm not sure, but it appears you may be confusing the SQLAlchemy Session with a MySQL @@session, which refers to the scope of when you first make a connection to MySQL and when you disconnect.
These two concepts are not the same. A SQLAlchemy Session generally represents the scope of one or more transactions, upon a particular database connection.
Therefore, the answer to your question as literally asked, is to call session.close(), that is, "how to properly close a SQLAlchemy session".
However, the rest of your question indicates you'd like some functionality whereby when a particular Session is closed, you'd like the actual DBAPI connection to be closed as well.
What this basically means is that you wish to disable connection pooling, which, as other answers mention, is easy enough: use NullPool.
session.close() will give the connection back to the connection pool of Engine and doesn't close the connection.
engine.dispose() will close all connections of the connection pool.
The Engine will not use a connection pool if you set poolclass=NullPool, so the underlying DBAPI connection will be closed directly after session.close().
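A minimal sketch of that combination (the URL is a placeholder; OneTable is the model from the question):
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from sqlalchemy.pool import NullPool
# placeholder URL; poolclass=NullPool disables pooling entirely
engine = create_engine('mysql://user:password@localhost/mydb', poolclass=NullPool)
Session = sessionmaker(bind=engine)
session = Session()
results = session.query(OneTable.company_name).all()
# some work with the data #
session.close()  # with NullPool there is no pool to check into,
                 # so the DBAPI connection is closed right here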
In LogicBank, I had a series of unittest tests. Each test copied a sqlite database prior to running, like this:
copyfile(src=nw_source, dst=nw_loc)
Each test passed when run individually, but failed in discover mode. It became apparent that somehow the database copy was not happening.
It appeared that perhaps the unittests were not being run serially. Not so: unittests do, in fact, run serially. So that was not the problem (logging that here, to perhaps save somebody some time).
After a remarkable amount of thrashing, it appears this was because the database was not completely closed by the prior test. Somehow that interfered with the copy above. Mine is not to wonder why...
Thanks to the posts above, I resolved it like this:
import sqlalchemy.orm
from datetime import datetime

def tearDown(file: str, started_at: str, engine: sqlalchemy.engine.base.Engine,
             session: sqlalchemy.orm.session.Session):
    """
    close session & engine, print a banner

    :param file: caller, usually __file__
    :param started_at: e.g., str(datetime.now())
    :param engine: e.g., from nw.logic import session, engine
    :param session: from nw.logic import session, engine
    :return:
    """
    session.close()
    engine.dispose()  # NOTE: close required before dispose!
    print("\n")
    print("**********************")
    print("** Test complete, SQLAlchemy session/engine closed for: " + file)
    print("** Started: " + started_at + "  Ended: " + str(datetime.now()))
    print("**********************")

Proper scoping to instantiate a Connection() object in PyMongo

I'm running a Flask-based web app that uses MongoDB (with PyMongo for use in Python). Nearly every view accesses the database, so I want to make the most effective use of memory and CPU resources. I'm unsure of the most efficient method for instantiating pymongo's Connection() object, which is used to access and manipulate the database. Right now, I declare from pymongo import Connection at the top of my file, and then at the beginning of each view function I have:
def sampleViewFunction():
    myCollection = Connection()['myDB']['myCollection']
    ## then use myCollection to manipulate the database
    ## more code...
The other way I could do it is to declare at the top of my file:
from pymongo import Connection
myCollection = Connection()['myDB']['myCollection']
And then later on, your code would just read:
def sampleViewFunction():
    ## no declaration of myCollection since it's a global variable
    ## then use myCollection to manipulate the database
    ## more code...
So the only difference is the declaration scope of myCollection. How do these two methods differ in the way memory is handled and CPU consumed? Since this is a web application, I'm thinking about scenarios where multiple users are on the site simultaneously. I imagine there's a difference in the lifespan of the connection to the database, which I'm guessing could impact performance.
You should use the second method. When you create a connection in pymongo, you create a connection pool by default (see the documentation for more details). This is the correct way of doing things. The default max_pool_size is 10, so this will give you 10 connections to your mongod instance(s). If you did it the other way and created a pool per function call, you would:
Be creating and destroying a connection with each function call, which is wasteful of resources (both RAM and CPU).
Have no control over how many connections your code is going to create to the mongod: you could flood the mongod with connections.
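A sketch of the recommended module-scoped pattern with the pool size made explicit (host, port, and pool size are placeholder assumptions; Connection is the old pymongo API the question uses):
from pymongo import Connection
# one shared, pooled connection for the whole process, created at import time
connection = Connection('localhost', 27017, max_pool_size=10)  # placeholders
myCollection = connection['myDB']['myCollection']
def sampleViewFunction():
    # each call borrows a socket from the shared pool instead of
    # building a brand-new connection
    doc = myCollection.find_one()
    ## more code...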
