I have recently been seeing "MySQL server has gone away" errors in my application logs for a daemon I have running that uses SQLAlchemy.
I wrap every database query or update in a decorator that should close all the sessions after finishing. In theory, that should also close the connection.
My decorator looks like
def dbop(meth):
    @wraps(meth)
    def nf(self, *args, **kwargs):
        self.session = self.sm()
        res = meth(self, *args, **kwargs)
        self.session.commit()
        self.session.close()
        return res
    return nf
I also initialize the database at the top of my Python script with:
def initdb(self):
    engine = create_engine(db_url)
    Base.metadata.create_all(engine)
    self.sm = sessionmaker(bind=engine,
                           autocommit=False,
                           autoflush=False,
                           expire_on_commit=False)
To my understanding, I am getting that error because my connection is timing out. Why would that be the case if I wrap each method in the decorator above? Is it because expire_on_commit causes queries to be issued even after the connection is closed, possibly reopening it? Or is it because Base.metadata.create_all executes SQL that opens a connection which is never closed?
Your session is bound to an "engine", which in turn uses a connection pool. Each time SQLAlchemy needs a connection it checks one out from the pool, and when it is done with it, the connection is returned to the pool, but it is not closed! This is a common strategy to reduce the overhead of opening and closing connections. All the options you set above affect only the session, not the connection!
By default, the connections in the pool are kept open indefinitely.
But MySQL will automatically close the connection after a certain amount of inactivity (See wait_timeout).
The issue here is that your Python process will not be informed by the MySQL server that the connection was closed when it hits the inactivity timeout. Instead, the next time a query is sent over that connection, Python will discover that the connection is no longer available. A similar thing can happen if the connection is lost for other reasons, for example forced service restarts which don't wait for open connections to be cleanly closed (such as the "immediate" option in Postgres restarts).
This is when you run into the exception.
SQLAlchemy gives you various strategies for dealing with this, which are well documented in the "Dealing with Disconnects" section as mentioned by @lukas-graf.
If you jump through some hoops you can get a reference to the connection which is currently in use by the session. You could close it that way, but I strongly recommend against this. Instead, refer to the "Dealing with Disconnects" section above and let SQLAlchemy deal with this for you transparently. In your case, setting the pool_recycle option might solve your problem.
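For illustration, a minimal sketch (db_url is the same placeholder as in your initdb): pool_recycle discards pooled connections before MySQL's wait_timeout closes them, and on SQLAlchemy 1.2+ pool_pre_ping additionally tests each connection on checkout:

from sqlalchemy import create_engine

engine = create_engine(
    db_url,
    pool_recycle=3600,   # replace pooled connections older than one hour
    pool_pre_ping=True,  # SQLAlchemy 1.2+: ping the connection before each checkout
)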
Related
I am using sqlalchemy with pandas.to_sql() to copy some data into SQL server. After the copying is done and engine.dispose() is called, I see the following INFO message in logs:
[INFO] sqlalchemy.pool.impl.QueuePool: Pool recreating
I was wondering if this message means that, even though I dispose of the engine, the connection is still being kept alive. And if so, what would be the safe and correct way to close it?
The connection is not alive. But you can restart the connection with the help of the Pool object.
This is described in detail in the documentation:
The Engine has logic which can detect disconnection events and refresh the pool automatically.
When the Connection attempts to use a DBAPI connection, and an exception is raised that corresponds to a “disconnect” event, the connection is invalidated. The Connection then calls the Pool.recreate() method, effectively invalidating all connections not currently checked out so that they are replaced with new ones upon next checkout.
Also check out the code example in the link. It is really neat.
If there is a connection which is already checked out from the pool, those connections will still be alive as they are being referenced by something.
You may refer to the following links for detailed information.
https://github.com/sqlalchemy/sqlalchemy/blob/master/lib/sqlalchemy/engine/base.py#L2512-L2539
https://docs.sqlalchemy.org/en/13/core/connections.html#engine-disposal
https://docs.sqlalchemy.org/en/13/core/connections.html#sqlalchemy.engine.Engine.dispose
If you are using QueuePool (the default if you don't specify any poolclass when creating the engine object) and don't want any connections to be kept alive, you can close the connection [conn.close() or session.close()], which in turn returns the connection to the pool (a checked-in connection). Later, when you call engine.dispose() after your copy job is done, the checked-in connections are actually closed and none are kept alive.
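As a minimal sketch of that pattern (the URL, table name and DataFrame are placeholders, not from the original question):

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql://user:password@localhost/mydb")  # placeholder URL

df = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})
df.to_sql("my_table", engine, if_exists="append", index=False)  # connection is checked back into the pool afterwards

engine.dispose()  # now the checked-in connections are really closed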
I wrote a Python app which connects to Postgres DB using SQLAlchemy. The engine and session are defined in db.py
engine = create_engine(URL(**settings.DATABASE))
session = scoped_session(sessionmaker(bind=engine))
Most of the db operations are in service.py which imports session from db.py
from app.db import engine, session
def get_door_ids():
    result = session.query(ControllerDetail.door_id).distinct().all()
    ids = [c[0] for c in result]
    return ids

def get_last_health_cd(door_id):
    result = session.query(ControllerDetail.door_health_cd).filter(
        ControllerDetail.door_id == door_id).order_by(
        ControllerDetail.etl_insert_ts.desc()).first()
    return result[0]
Now everything works great, but the problem is I need to run the same thing every couple minutes repeatedly. So I have the following code in my main module:
try:
    while True:
        run_task()
        time.sleep(120)
except KeyboardInterrupt:
    print('Manual break by user')
The DB times out idle connections every minute, so I get an error every time the process goes to sleep for more than a minute.
psycopg2.InternalError: terminating connection due to idle-in-transaction timeout
SSL connection has been closed unexpectedly
I wonder if there is a way to close the session and reopen it after time.sleep(120) so it won't get timed out. Maybe in the main module I could import session from db as a global variable and somehow pass it to the methods in services. How do I do that? I can't import session into services from main instead of db, since main imports functions from services.
First and foremost, you should end your transactions in a timely manner by committing or rolling back. Closing the session will also implicitly roll back. In the case of a scoped session, remove() will close the session (and remove it from the registry). All in all, "When do I construct a Session, when do I commit it, and when do I close it?" is a good read at this point.
What might surprise some people is that the Session starts a new transaction as soon as you start communicating with the database. It is then your job to end that transaction. Without seeing what run_task() actually does it's hard to say which part should be handling the session's lifetime, but it's safe to say that if the task runs every couple minutes, you should not leave a transaction hanging for that long.
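As a minimal sketch, assuming run_task() uses the scoped session from db.py (the helper name run_task_once is made up here), each iteration ends its transaction and removes the session before sleeping, so nothing is left idle in a transaction during time.sleep(120):

import time
from app.db import session

def run_task_once():
    try:
        run_task()        # your existing work, using the scoped session
        session.commit()  # end the transaction before sleeping
    except Exception:
        session.rollback()
        raise
    finally:
        session.remove()  # close and discard this thread's session

try:
    while True:
        run_task_once()
        time.sleep(120)
except KeyboardInterrupt:
    print('Manual break by user')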
Increase the timeout
create_engine(URL(**settings.DATABASE), connect_args={'connect_timeout': 10})
Create a new session when you get an error
You can close the connection with session.close().
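For example, a rough sketch of that idea (the retry helper is an assumption, not part of the original answer): close the stale session on an OperationalError and retry once on a fresh connection:

from sqlalchemy import exc
from app.db import session

def run_with_retry(fn, *args, **kwargs):
    try:
        return fn(*args, **kwargs)
    except exc.OperationalError:
        session.rollback()  # discard the dead transaction
        session.close()     # a fresh connection is used on the next query
        return fn(*args, **kwargs)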
I use SQLAlchemy (a really good ORM, but its documentation is not clear enough) to communicate with PostgreSQL.
Everything was great until one case when Postgres "crashed" because the maximum connection limit was reached: no more connections allowed (max_client_conn).
That case makes me think that I'm doing something wrong. After a few experiments I figured out how to avoid that issue, but some questions remain.
Below you'll see code examples (in Python 3+, PostgreSQL settings are default) without and with the mentioned issue, and what I'd like to hear eventually are answers to the following questions:
What exactly does the context manager do with connections and sessions? Closing the session and disposing the connection, or what?
Why does the first working example of code behave like the example with the issue without NullPool as poolclass in the "connect" method?
Why in the first example did I get only 1 connection to the db for all queries, but in the second example I got a separate connection for each query? (please correct me if I understood it wrong, I was checking it with "pgbouncer")
What are the best practices to open and close connections (and/or work with Session) when you use SQLAlchemy and a PostgreSQL DB for multiple instances of a script (or separate threads in a script) that listen for requests and have to have a separate session for each of them? (I mean raw SQLAlchemy, not Flask-SQLAlchemy or the like)
Working example of code without issue:
making connection to DB:
from sqlalchemy.pool import NullPool  # does not work without NullPool, why?

def connect(user, password, db, host='localhost', port=5432):
    """Returns a connection and a metadata object"""
    url = 'postgresql://{}:{}@{}:{}/{}'.format(user, password, host, port, db)
    temp_con = sqlalchemy.create_engine(url, client_encoding='utf8', poolclass=NullPool)
    temp_meta = sqlalchemy.MetaData(bind=temp_con, reflect=True)
    return temp_con, temp_meta
function to get session to work with DB:
from contextlib import contextmanager

@contextmanager
def session_scope():
    """Provide a transactional scope around a series of operations."""
    con_loc, meta_loc = connect(db_user, db_pass, db_instance, 'localhost')
    Session = sessionmaker(bind=con_loc)
    session = Session()
    try:
        yield session
        session.commit()
    except:
        session.rollback()
        raise
query example:
with session_scope() as session:
    entity = session.query(SomeEntity).first()
Failing example of code:
function to get session to work with DB:
def create_session():
    # connect method is the same as in the first example
    con, meta = connect(db_user, db_pass, db_instance, 'localhost')
    Session = sessionmaker(bind=con)
    session = Session()
    return session
query example:
session = create_session()
entity = session.query(SomeEntity).first()
Hope you got the main idea
First of all you should not create engines repeatedly in your connect() function. The usual practice is to have a single global Engine instance per database URL in your application. The same goes for the Session class created by the sessionmaker().
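As an illustration of that advice, a minimal sketch (the module name db.py and the URL are assumptions, not from the question): one Engine and one sessionmaker created once at import time and shared by the whole application:

# db.py
import sqlalchemy
from sqlalchemy.orm import sessionmaker

url = 'postgresql://user:password@localhost:5432/mydb'  # placeholder
engine = sqlalchemy.create_engine(url, client_encoding='utf8')
Session = sessionmaker(bind=engine)

# elsewhere in the app:
#     from db import Session
#     session = Session()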
What exactly does context manager do with connections and sessions? Closing session and disposing connection or what?
What you've programmed it to do, and if this seems unclear, read about context managers in general. In this case it commits or rolls back the session if an exception was raised within the block governed by the with-statement. Both actions return the connection used by the session to the pool, which in your case is a NullPool, so the connection is simply closed.
Why does first working example of code behave as example with issue without NullPool as poolclass in "connect" method?
and
from sqlalchemy.pool import NullPool # does not work without NullPool, why?
Without NullPool the engines you repeatedly create also pool connections, so if they for some reason do not go out of scope, or their refcounts are otherwise not zeroed, they will hold on to the connections even if the sessions return them. It is unclear if the sessions go out of scope timely in the second example, so they might also be holding on to the connections.
Why in the first example I got only 1 connection to db for all queries but in second example I got separate connection for each query? (please correct me if I understood it wrong, was checking it with "pgbouncer")
The first example ends up closing the connection due to the use of the context manager that handles transactions properly and the NullPool, so the connection is returned to the bouncer, which is another pool layer.
The second example might never close the connections because it lacks the transaction handling, but that's unclear due to the example given. It also might be holding on to connections in the separate engines that you create.
The 4th point of your question set is pretty much covered by the official documentation in "Session Basics", especially "When do I construct a Session, when do I commit it, and when do I close it?" and "Is the session thread-safe?".
There's one exception: multiple instances of the script. You should not share an engine between processes, so in order to pool connections between them you need an external pool such as the PgBouncer.
What exactly does context manager do with connections and sessions?
Closing session and disposing connection or what?
The context manager in Python is used to create a runtime context for use with the with statement. Simply, when you run the code:
with session_scope() as session:
    entity = session.query(SomeEntity).first()
session is the yielded session. So, to your question of what the context manager does with the connections and sessions, all you have to do is look at what happens after the yield to see what happens. In this case it's just:
try:
    yield session
    session.commit()
except:
    session.rollback()
    raise
If you trigger no exceptions, it will be session.commit(), which according to the SQLAlchemy docs will "Flush pending changes and commit the current transaction."
Why does first working example of code behave as example with issue
without NullPool as poolclass in "connect" method?
The poolclass argument is just telling SQLAlchemy which subclass of Pool to use. However, in the case where you pass NullPool here, you are telling SQLAlchemy to not use a pool. You're effectively disabling pooling connections when you pass in NullPool. From the docs: "to disable pooling, set poolclass to NullPool instead." I can't say for sure but using NullPool is probably contributing to your max_connection issues.
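For comparison, a rough sketch (the numbers are illustrative, not recommendations) of keeping the default QueuePool but bounding how many connections it may hold, which avoids both NullPool's connection-per-use behavior and unbounded growth toward max_client_conn:

from sqlalchemy import create_engine

url = 'postgresql://user:password@localhost:5432/mydb'  # placeholder
engine = create_engine(
    url,
    pool_size=5,      # connections kept open in the pool
    max_overflow=10,  # extra connections allowed under load
    pool_timeout=30,  # seconds to wait for a free connection
)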
Why in the first example I got only 1 connection to db for all queries
but in second example I got separate connection for each query?
(please correct me if I understood it wrong, was checking it with
"pgbouncer")
I'm not exactly sure. I think this has to do with how in the first example, you are using a context manager so everything within the with block will use a session generator. In your second example, you created a function that initializes a new Session and returns it, so you're not getting back a generator. I also think this has to do with your NullPool use which prevents connection pooling. With NullPool each query execution is acquiring a connection on its own.
What is the best practices to open and close connections(and/or work
with Session) when you use SQLAlchemy and PostgreSQL DB for multiple
instances of script (or separate threads in script) that listens
requests and has to have separate session to each of them? (I mean raw
SQLAlchemy not Flask-SQLAlchemy or smth like this)
See the section Is the session thread-safe? for this, but you need to take a "share nothing" approach to your concurrency. So in your case, you need each instance of a script to share nothing between each other.
You probably want to check out Working with Engines and Connections. I don't think messing with sessions is where you want to be if concurrency is what you're working on. There's more information about the NullPool and concurrency there:
For a multiple-process application that uses the os.fork system call,
or for example the Python multiprocessing module, it’s usually
required that a separate Engine be used for each child process. This
is because the Engine maintains a reference to a connection pool that
ultimately references DBAPI connections - these tend to not be
portable across process boundaries. An Engine that is configured not
to use pooling (which is achieved via the usage of NullPool) does not
have this requirement.
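As a rough sketch of that requirement (the worker function and URL are made up for illustration), each child process creates and disposes of its own Engine rather than inheriting pooled connections from the parent:

import multiprocessing
from sqlalchemy import create_engine, text

url = 'postgresql://user:password@localhost:5432/mydb'  # placeholder

def worker():
    engine = create_engine(url)        # each process gets its own pool
    with engine.connect() as conn:
        conn.execute(text('SELECT 1'))
    engine.dispose()

if __name__ == '__main__':
    procs = [multiprocessing.Process(target=worker) for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()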
@Ilja Everilä's answer was mostly helpful.
I'll leave the edited code here, maybe it'll help someone.
The new code that works as I expected is the following:
making a connection to the DB:
from sqlalchemy.pool import NullPool  # will work even without NullPool in the code

def connect(user, password, db, host='localhost', port=5432):
    """Returns a connection and a metadata object"""
    url = 'postgresql://{}:{}@{}:{}/{}'.format(user, password, host, port, db)
    temp_con = sqlalchemy.create_engine(url, client_encoding='utf8', poolclass=NullPool)
    temp_meta = sqlalchemy.MetaData(bind=temp_con, reflect=True)
    return temp_con, temp_meta
one instance of the connection and sessionmaker per app, for example near your main function:
from sqlalchemy.orm import sessionmaker

# create one connection and sessionmaker per instance of the app (to avoid creating them repeatedly)
con, meta = connect(db_user, db_pass, db_instance, db_host)
session_maker = sessionmaker(bind=con)
function to get a session with the with statement:
from contextlib import contextmanager
from some_place import session_maker

@contextmanager
def session_scope() -> Session:
    """Provide a transactional scope around a series of operations."""
    session = session_maker()  # create session from the SQLAlchemy sessionmaker
    try:
        yield session
        session.commit()
    except:
        session.rollback()
        raise
wrap transaction and use session:
with session_scope() as session:
    entity = session.query(SomeEntity).first()
I'm experiencing some strange bugs which seem to be caused by the connections used by SQLAlchemy, and which I can't pin down exactly. I was hoping someone has a clue about what's going on here.
We're working on a Pyramid (version 1.5b1) application and use SQLAlchemy (version 0.9.6) for all our database connectivity. Sometimes we get errors related to the db connection or session; most of the time this would be a cursor already closed or This Connection is closed error, but we get other related exceptions too:
(OperationalError) connection pointer is NULL
(InterfaceError) cursor already closed
Parent instance <...> is not bound to a Session, and no contextual session is established; lazy load operation of attribute '...' cannot proceed
A conflicting state is already present in the identity map for key (<class '...'>, (1001L,))
This Connection is closed (original cause: ResourceClosedError: This Connection is closed)
(InterfaceError) cursor already closed
Parent instance <...> is not bound to a Session; lazy load operation of attribute '...' cannot proceed
Parent instance <...> is not bound to a Session, and no contextual session is established; lazy load operation of attribute '...' cannot proceed
'NoneType' object has no attribute 'twophase'
(OperationalError) connection pointer is NULL
This session is in 'prepared' state; no further
There is no silver bullet to reproduce them; only by refreshing many times are they bound to happen at some point. So I made a script using multi-mechanize to spam different urls concurrently and see where and when it happens.
It appears the url triggered doesn't really matter, the errors happen when there are concurrent requests that span a longer time (and other requests get served in between). This seems to indicate there is some kind of threading problem; that either the session or connection is shared among different threads.
After googling for these issues I found a lot of topics, most of them saying to use scoped sessions, but the thing is we do use them already:
db_session = scoped_session(sessionmaker(extension=ZopeTransactionExtension(), autocommit=False, autoflush=False))
db_meta = MetaData()
We have a BaseModel for all our orm objects:
BaseModel = declarative_base(cls=BaseModelObj, metaclass=BaseMeta, metadata=db_meta)
We use the pyramid_tm tween to handle transactions during the request
We hook db_session.remove() to the pyramid NewResponse event (which is fired after everything has run). I also tried putting it in a separate tween running after pyramid_tm, or even not doing it at all; none of these seemed to have an effect, so the response event seemed like the cleanest place to put it.
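For reference, a minimal sketch of that subscriber (assuming cleanup_db_session is the function registered in the config snippet below):

from project.db import db_session

def cleanup_db_session(event):
    # remove the thread-local scoped session after every response
    db_session.remove()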
We create the engine in our main entrypoint of our pyramid project and use a NullPool and leave connection pooling to pgbouncer. We also configure the session and the bind for our BaseModel here:
engine = engine_from_config(config.registry.settings, 'sqlalchemy.', poolclass=NullPool)
db_session.configure(bind=engine, query_cls=FilterQuery)
BaseModel.metadata.bind = engine
config.add_subscriber(cleanup_db_session, NewResponse)
return config.make_wsgi_app()
In our app we access all db operations using:
from project.db import db_session
...
db_session.query(MyModel).filter(...)
db_session.execute(...)
We use psycopg2==2.5.2 to handle the connection to postgres with pgbouncer in between
I made sure no references to db_session or connections are saved anywhere (which could result in other threads reusing them)
I also tried the spamming test using different webservers: using waitress and cogen I got the errors very easily, while using wsgiref (which is single-threaded) we unsurprisingly had no errors. Using uwsgi and gunicorn (4 workers, gevent) I didn't get any errors.
Given the differences in the webservers used, I thought it either has to do with some webservers handling requests in threads and some using new processes (maybe a forking problem)? To complicate matters even more, as time went on and I did some new tests, the problem had gone away in waitress but now happened with gunicorn (when using gevent)! I have no clue how to go about debugging this...
Finally, to test what happens to the connection, I attached an attribute to the connection at the start of the cursor execute and tried to read the attribute back at the end of the execute:
@event.listens_for(Engine, "before_cursor_execute")
def _before_cursor_execute(conn, cursor, stmt, params, context, execmany):
    conn.pdtb_start_timer = time.time()

@event.listens_for(Engine, "after_cursor_execute")
def _after_cursor_execute(conn, cursor, stmt, params, context, execmany):
    print conn.pdtb_start_timer
Surprisingly this sometimes raised an exception: 'Connection' object has no attribute 'pdtb_start_timer'
Which struck me as very strange.. I found one discussion about something similar: https://groups.google.com/d/msg/sqlalchemy/GQZSjHAGkWM/rDflJvuyWnEJ
And I tried adding strategy='threadlocal' to the engine, which from what I understand should force 1 connection per thread. But it didn't have any effect on the errors I'm seeing... (besides some unittests failing because I need two different sessions/connections for some tests and this forces 1 connection to be associated)
Does anyone have any idea what might go on here or have some more pointers on how to attack this problem?
Thanks in advance!
Matthijs Blaas
Update: The errors were caused by multiple commands being sent in one prepared SQL statement. Psycopg2 seems to allow this, but apparently it can cause strange issues. The pg8000 connector is stricter and bailed out on the multiple commands; sending one command at a time fixed the issue!
Following what we commented in How to close sqlalchemy connection in MySQL, I am checking the connections that SQLAlchemy creates into my database and I cannot manage to close them without exiting from Python.
If I run this code in a python console, it keeps the session opened until I exit from python:
from sqlalchemy.orm import sessionmaker
from models import OneTable, get_engine
engine = get_engine(database="mydb")
session = sessionmaker(bind=engine)()
results = session.query(OneTable.company_name).all()
# some work with the data #
session.close()
and the only workaround I found to close it is to call engine.dispose() at the end.
As per the comments in the link I gave above, my question are now:
Why is engine.dispose() necessary to close sessions?
Doesn't session.close() suffice?
There's a central confusion here over the word "session". I'm not sure here, but it appears like you may be confusing the SQLAlchemy Session with a MySQL @@session, which refers to the scope of when you first make a connection to MySQL and when you disconnect.
These two concepts are not the same. A SQLAlchemy Session generally represents the scope of one or more transactions, upon a particular database connection.
Therefore, the answer to your question as literally asked, is to call session.close(), that is, "how to properly close a SQLAlchemy session".
However, the rest of your question indicates you'd like some functionality whereby when a particular Session is closed, you'd like the actual DBAPI connection to be closed as well.
What this basically means is that you wish to disable connection pooling. Which, as other answers mention, is easy enough: use NullPool.
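A minimal sketch of that setup (the connection URL is a placeholder, OneTable is the model from your snippet): with NullPool there is no pool to return the connection to, so closing the Session really closes the DBAPI connection:

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from sqlalchemy.pool import NullPool

engine = create_engine('mysql://user:password@localhost/mydb', poolclass=NullPool)
session = sessionmaker(bind=engine)()

results = session.query(OneTable.company_name).all()
# some work with the data #
session.close()  # the underlying connection is closed, not checked back into a pool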
session.close() will give the connection back to the connection pool of Engine and doesn't close the connection.
engine.dispose() will close all connections of the connection pool.
Engine will not use a connection pool if you set poolclass=NullPool, so the underlying connection will be closed directly after session.close().
In LogicBank, I had a series of unittest tests. Each test copied a sqlite database prior to running, like this:
copyfile(src=nw_source, dst=nw_loc)
Each test ran individually, but failed in discover mode. It became apparent that somehow the database copy was not happening.
It appeared that perhaps unittest were not run serially. Not so - unittests do, in fact, run serially. So that was not the problem (logging that, to perhaps save somebody some time).
After a remarkable amount of thrashing, it appears that this was because the database was not completely closed from the prior test. Somehow that interfered with the copy, above. Mine is not to wonder why...
Thanks to the posts above, I resolved it like this:
def tearDown(file: str, started_at: str, engine: sqlalchemy.engine.base.Engine, session: sqlalchemy.orm.session.Session):
    """
    close session & engine, banner

    :param file: caller, usually __file__
    :param started_at: eg, str(datetime.now())
    :param engine: eg, from nw.logic import session, engine
    :param session: from nw.logic import session, engine
    :return:
    """
    session.close()
    engine.dispose()  # NOTE: close required before dispose!

    print("\n")
    print("**********************")
    print("** Test complete, SQLAlchemy session/engine closed for: " + file)
    print("** Started: " + started_at + "  Ended: " + str(datetime.now()))
    print("**********************")