I'm playing around with SQLAlchemy core in Python, and I've read over the documentation numerous times and still need clarification about engine.execute() vs connection.execute().
As I understand it, engine.execute() is the same as doing connection.execute(), followed by connection.close().
The tutorials I've followed let me to use this in my code:
Initial setup in script
try:
engine = db.create_engine("postgres://user:pass#ip/dbname", connect_args={'connect_timeout': 5})
connection = engine.connect()
metadata = db.MetaData()
except exc.OperationalError:
print_error(f":: Could not connect to {db_ip}!")
sys.exit()
Then, I have functions that handle my database access, for example:
def add_user(a_username):
query = db.insert(table_users).values(username=a_username)
connection.execute(query)
Am I supposed to be calling connection.close() before my script ends? Or is that handled efficiently enough by itself? Would I be better off closing the connection at the end of add_user(), or is that inefficient?
If I do need to be calling connection.close() before the script ends, does that mean interrupting the script will cause hanging connections on my Postgres DB?
I found this post helpful to better understand the different interaction paradigms in sqlalchemy, in case you haven't read it yet.
Regarding your question as to when to close your db connection: It is indeed very inefficient to create and close connections for every statement execution. However you should make sure that your application does not have connection leaks in it's global flow.
Related
I'm experiencing some strange bugs which seem to be caused by connections used by Sqlalchemy, which i can't pin down exactly.. i was hoping someone has a clue whats going on here.
We're working on a Pyramid (version 1.5b1) and use Sqlalchemy (version 0.9.6) for all our database connectivity. Sometimes we get errors related to the db connection or session, most of the time this would be a cursor already closed or This Connection is closed error, but we get other related exceptions too:
(OperationalError) connection pointer is NULL
(InterfaceError) cursor already closed
Parent instance <...> is not bound to a Session, and no contextual session is established; lazy load operation of attribute '...' cannot proceed
A conflicting state is already present in the identity map for key (<class '...'>, (1001L,))
This Connection is closed (original cause: ResourceClosedError: This Connection is closed)
(InterfaceError) cursor already closed
Parent instance <...> is not bound to a Session; lazy load operation of attribute '...' cannot proceed
Parent instance <...> is not bound to a Session, and no contextual session is established; lazy load operation of attribute '...' cannot proceed
'NoneType' object has no attribute 'twophase'
(OperationalError) connection pointer is NULL
This session is in 'prepared' state; no further
There is no silver bullet to reproduce them, only by refreshing many times they are bound to happen one at some point. So i made a script using multi-mechanize to spam different urls concurrently and see where and when it happens.
It appears the url triggered doesn't really matter, the errors happen when there are concurrent requests that span a longer time (and other requests get served in between). This seems to indicate there is some kind of threading problem; that either the session or connection is shared among different threads.
After googling for these issues I found a lot of topics, most of them tell to use scoped sessions, but the thing is we do use them already:
db_session = scoped_session(sessionmaker(extension=ZopeTransactionExtension(), autocommit=False, autoflush=False))
db_meta = MetaData()
We have a BaseModel for all our orm objects:
BaseModel = declarative_base(cls=BaseModelObj, metaclass=BaseMeta, metadata=db_meta)
We use the pyramid_tm tween to handle transactions during the request
We hook db_session.remove() to the pyramid NewResponse event (which is fired after everything has run). I also tried putting it in a seperate tween running after pyramid_tm or even not doing it at all, none of these seem to have effect, so the response event seemed like the most clean place to put it.
We create the engine in our main entrypoint of our pyramid project and use a NullPool and leave connection pooling to pgbouncer. We also configure the session and the bind for our BaseModel here:
engine = engine_from_config(config.registry.settings, 'sqlalchemy.', poolclass=NullPool)
db_session.configure(bind=engine, query_cls=FilterQuery)
BaseModel.metadata.bind = engine
config.add_subscriber(cleanup_db_session, NewResponse)
return config.make_wsgi_app()
In our app we access all db operation using:
from project.db import db_session
...
db_session.query(MyModel).filter(...)
db_session.execute(...)
We use psycopg2==2.5.2 to handle the connection to postgres with pgbouncer in between
I made sure no references to db_session or connections are saved anywhere (which could result in other threads reusing them)
I also tried the spamming test using different webservers, using waitress and cogen i got the errors very easily, using wsgiref we unsurprisingly have no errors (which is singlethreaded). Using uwsgi and gunicorn (4 workers, gevent) i didn't get any errors.
Given the differences in the webserver used, I thought it either has to do with some webservers handling requests in threads and some using new processes (maybe a forking problem)? To complicate matters even more, when time went on and i did some new tests, the problem had gone away in waitress but now happened with gunicorn (when using gevent)! I have no clue on how to go debugging this...
Finally, to test what happens to the connection, i attached an attribute to the connection at the start of the cursor execute and tried to read the attribute out at the end of the execute:
#event.listens_for(Engine, "before_cursor_execute")
def _before_cursor_execute(conn, cursor, stmt, params, context, execmany):
conn.pdtb_start_timer = time.time()
#event.listens_for(Engine, "after_cursor_execute")
def _after_cursor_execute(conn, cursor, stmt, params, context, execmany):
print conn.pdtb_start_timer
Surprisingly this sometimes raised an exception: 'Connection' object has no attribute 'pdtb_start_timer'
Which struck me as very strange.. I found one discussion about something similar: https://groups.google.com/d/msg/sqlalchemy/GQZSjHAGkWM/rDflJvuyWnEJ
And tried adding strategy='threadlocal' to the engine, which from what i understand should force 1 connection for the tread. But it didn't have any effect on the errors im seeing.. (besides some unittests failing because i need two different sessions/connections for some tests and this forces 1 connection to be associated)
Does anyone have any idea what might go on here or have some more pointers on how to attack this problem?
Thanks in advance!
Matthijs Blaas
Update: The errors where caused by multiple commands that where send in one prepared sql statement. Psycopg2 seems to allow this, but apparently it can cause strange issues. The PG8000 connector is more strict and bailed out on the multiple commands, sending one command fixed the issue!
Following what we commented in How to close sqlalchemy connection in MySQL, I am checking the connections that SQLAlchemy creates into my database and I cannot manage to close them without exiting from Python.
If I run this code in a python console, it keeps the session opened until I exit from python:
from sqlalchemy.orm import sessionmaker
from models import OneTable, get_engine
engine = get_engine(database="mydb")
session = sessionmaker(bind=engine)()
results = session.query(OneTable.company_name).all()
# some work with the data #
session.close()
and the only workaround I found to close it is to call engine.dispose() at the end.
As per the comments in the link I gave above, my question are now:
Why is engine.dispose() necessary to close sessions?
Doesn't session.close() suffice?
There's a central confusion here over the word "session". I'm not sure here, but it appears like you may be confusing the SQLAlchemy Session with a MySQL ##session, which refers to the scope of when you first make a connection to MySQL and when you disconnect.
These two concepts are not the same. A SQLAlchemy Session generally represents the scope of one or more transactions, upon a particular database connection.
Therefore, the answer to your question as literally asked, is to call session.close(), that is, "how to properly close a SQLAlchemy session".
However, the rest of your question indicates you'd like some functionality whereby when a particular Session is closed, you'd like the actual DBAPI connection to be closed as well.
What this basically means is that you wish to disable connection pooling. Which as other answers mention, easy enough, use NullPool.
session.close() will give the connection back to the connection pool of Engine and doesn't close the connection.
engine.dispose() will close all connections of the connection pool.
Engine will not use connection pool if you set poolclass=NullPool. So the connection (SQLAlchemy session) will close directly after session.close().
In LogicBank, I had a series of unittest tests. Each test copied a sqlite database prior to running, like this:
copyfile(src=nw_source, dst=nw_loc)
Each test ran individually, but failed in discover mode. It became apparent that somehow the database copy was not happening.
It appeared that perhaps unittest were not run serially. Not so - unittests do, in fact, run serially. So that was not the problem (logging that, to perhaps save somebody some time).
After a remarkable amount of thrashing, it appears that this was because the database was not completely closed from the prior test. Somehow that interfered with the copy, above. Mine is not to wonder why...
Thanks to the posts above, I resolved it like this:
def tearDown(file: str, started_at: str, engine: sqlalchemy.engine.base.Engine, session: sqlalchemy.orm.session.Session):
"""
close session & engine, banner
:param file: caller, usually __file__
:param started_at: eg, str(datetime.now())
:param engine: eg, nw.logic import session, engine
:param session: from nw.logic import session, engine
:return:
"""
session.close()
engine.dispose(). # NOTE: close required before dispose!
print("\n")
print("**********************")
print("** Test complete, SQLAlchemy session/engine closed for: " + file)
print("** Started: " + started_at + " Ended: " + str(datetime.now()))
print("**********************")
I am writing a web service based on web.py, using storm as an ORM layer for querying records from a MySQL database. The web service is deployed via mod_wsgi using Apache2 on a Linux box. I create a connection to the MySQL database server when the script is started using storm's create_database() method. This is also the point where I create a Store object, which is used later on to perform queries when a request comes in.
After some hours of inactivity, store.find() throws a DisconnectionError: (2006, 'MySQL server has gone away'). I am not surprised that the database connection is dropped as Apache/mod_wsgi reuses the Python processes without reinitializing them for a long time. My question is how to correctly deal with this?
I have tried setting up a mechanism to keep alive the connection to the MySQL server by sending it a recurring "SELECT 1" (every 300 seconds). Unfortunately that fixed the problem on our testing machine, but not on our demo deployments (ouch) while both share the same MySQL configuration (wait_timeout is set to 8 hours).
I have searched for solutions for reconnecting the storm store to the database, but didn't find anything sophisticated. The only recommendation seems to be that one has to catch the exception, treat it like an inconsistency, call rollback() on the store and then retry. However, this would imply that I either have to wrap the whole Store class or implement the same retry-mechanism over and over. Is there a better solution or am I getting something completely wrong here?
Update: I have added a web.py processor that gracefully handles the disconnection error by recreating the storm Store if the exception is caught and then retrying the operation like recommended by Andrey. However, this is an incomplete and suboptimal solution as (a) the store is referenced by a handful of objects for re-use, which requires an additional mechanism to re-wire the store reference on each of these objects, and (b), it doesn't cover transaction handling (rollbacks) when performing writes on the database. However, at least it's an acceptable fix for all read operations on the store for now.
Perhaps you can use web.py's application processor to wrap your controller methods and catch DisconnectionError from them. Something like this:
def my_processor(handler):
tries = 3
while True:
try:
return handler()
except DisconnectionError:
tries -= 1
if tries == 0:
raise
Or you may check cookbook entry of how application processor is used to have SqlAlchemy with web.py: http://webpy.org/cookbook/sqlalchemy and make something similar for storm:
def load_storm(handler):
web.ctx.store = Store(database)
try:
return handler()
except web.HTTPError:
web.ctx.store.commit()
raise
except:
web.ctx.store.rollback()
raise
finally:
web.ctx.store.commit()
app.add_processor(load_storm)
Trying to serve database query results to adhoc client requests, but do not want to open a connection for each individual query. I'm not sure if i'm doing it right.
Current solution is something like this on the "server" side (heavily cut down for clarity):
import rpyc
from rpyc.utils.server import ThreadedServer
import cx_Oracle
conn = cx_Oracle.conect('whatever connect string')
cursor = conn.cursor()
def get_some_data(barcode):
# do something
return cursor.execute("whatever query",{'barcode':barcode})
class data_service(rpyc.Service):
def exposed_get_some_data(self, brcd):
return get_some_data(brcd)
if __name__ == '__main__':
s = ThreadedServer(data_service, port=12345, auto_register=False)
s.start()
This runs okay for a while. However from time to time the program crashes and so far i haven't been able to track when it does that.
What i wish to confirm, is see how the database connection is created outside of the data_service class. Is this in itself likely to cause problems?
Many thanks any thoughts appreciated.
I don't think the problem is that you're creating the connection outside of the class, that should be fine.
I think the problem is that you are creating just one cursor and using it for a long time, which as far as I understand is not how cursors are meant to be used.
You can use conn.execute without manually creating a cursor, which should be fine for how you're using the database. If I remember correctly, behind the scenes this creates a new cursor for each SQL command. You could also do this yourself in get_some_data(): create a new cursor, use it once, and then close it before returning the data.
In the long run, if you wish your server to be more robust, you'll need to add some error-handling for when database operations fail or the connection is lost.
A final note: Essentially you've written a very basic database proxy server. There are probably various existing solutions for this already, which already handle many issues you are likely to run in to. I recommend at least considering using an existing solution.
I'm using psycopg2 for the cherrypy app I'm currently working on and cli & phpgadmin to handle some operations manually. Here's the python code :
#One connection per thread
cherrypy.thread_data.pgconn = psycopg2.connect("...")
...
#Later, an object is created by a thread :
class dbobj(object):
def __init__(self):
self.connection=cherrypy.thread_data.pgconn
self.curs=self.connection.cursor(cursor_factory=psycopg2.extras.DictCursor)
...
#Then,
try:
blabla
self.curs.execute(...)
self.connection.commit()
except:
self.connection.rollback()
lalala
...
#Finally, the destructor is called :
def __del__(self):
self.curs.close()
I'm having a problem with either psycopg or postgres (altough I think the latter is more likely). After having sent a few queries, my connections drop dead. Similarly, phpgadmin -usually- gets dropped as well ; it prompts me to reconnect after having made requests several times. Only the CLI remains persistent.
The problem is, these happen very randomly and I can't even track down what the cause is. I can either get locked down after a few page requests or never really encounter anything after having requested hundreds of pages. The only error I've found in postgres log, after terminating the app is :
...
LOG: unexpected EOF on client connection
LOG: could not send data to client: Broken pipe
LOG: unexpected EOF on client connection
...
I thought of creating a new connection every time a new dbobj instance is created but I absolutely don't want to do this.
Also, I've read that one may run into similar problems unless all transactions are committed : I use the try/except block for every single INSERT/UPDATE query, but I never use it for SELECT queries nor do I want to write even more boilerplate code (btw, do they need to be committed ?). Even if that's the case, why would phpgadmin close down ?
max_connections is set to 100 in the .conf file, so I don't think that's the reason either. A single cherrypy worker has only 10 threads.
Does anyone have an idea where I should look first ?
Psycopg2 needs a commit or rollback after every transaction, including SELECT queries, or it leaves the connections "IDLE IN TRANSACTION". This is now a warning in the docs:
Warning: By default, any query execution, including a simple SELECT will start a transaction: for long-running programs, if no further action is taken, the session will remain “idle in transaction”, an undesirable condition for several reasons (locks are held by the session, tables bloat...). For long lived scripts, either ensure to terminate a transaction as soon as possible or use an autocommit connection.
It's a bit difficult to see exactly where you're populating and accessing cherrypy.thread_data. I'd recommend investigating psycopg2.pool.ThreadedConnectionPool instead of trying to bind one conn to each thread yourself.
Even though I don't have any idea why successful SELECT queries block the connection, spilling .commit() after pretty much every single query that doesn't have to work in conjunction with another solved the problem.