I am using sqlalchemy with pandas.to_sql() to copy some data into SQL server. After the copying is done and engine.dispose() is called, I see the following INFO message in logs:
[INFO] sqlalchemy.pool.impl.QueuePool: Pool recreating
I was wondering if this message means that, even though I dispose of the engine, the connection is still being kept alive. And if so, what would be the safe and correct way to handle it?
The connection is not kept alive. But you can re-establish connections with the help of the Pool object.
This is described in detail in the documentation:
The Engine has logic which can detect disconnection events and refresh the pool automatically.
When the Connection attempts to use a DBAPI connection, and an exception is raised that corresponds to a “disconnect” event, the connection is invalidated. The Connection then calls the Pool.recreate() method, effectively invalidating all connections not currently checked out so that they are replaced with new ones upon next checkout.
Also check out the code example in the link. It is really neat.
If any connections were already checked out from the pool, those connections will stay alive, since they are still being referenced by whatever checked them out.
You may refer to the following links for detailed information.
https://github.com/sqlalchemy/sqlalchemy/blob/master/lib/sqlalchemy/engine/base.py#L2512-L2539
https://docs.sqlalchemy.org/en/13/core/connections.html#engine-disposal
https://docs.sqlalchemy.org/en/13/core/connections.html#sqlalchemy.engine.Engine.dispose
If you are using QueuePool (the default when you don't specify any poolclass while creating the engine) and don't want any connections kept alive, close the connection [conn.close() or session.close()], which returns it to the pool (a "checked-in" connection). When you then call engine.dispose() after your copy job is done, it will really close the checked-in connections and won't keep any of them alive.
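Here's a minimal sketch of that lifecycle (the URL is a hypothetical stand-in; substitute your actual SQL Server connection string):

from sqlalchemy import create_engine

engine = create_engine("mssql+pyodbc://user:pass@mydsn")  # hypothetical URL;
                                                          # QueuePool is the default

conn = engine.connect()   # checks a connection out of the pool
# ... df.to_sql(..., con=conn) or other work goes here ...
conn.close()              # checks the connection back in; it stays open in the pool

engine.dispose()          # really closes all checked-in connections and
                          # recreates the pool, logging "Pool recreating"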
Related
I'm running a Tornado HTTPS server across multiple processes, using the first method described here: http://www.tornadoweb.org/en/stable/guide/running.html (server.start(n)).
The server is connected to a local MySQL instance, and I would like to have an independent MySQL connection per Tornado process.
However, right now I only have one MySQL connection according to the output of SHOW PROCESSLIST. I guess this happens because I establish the connection before calling server.start(n) and IOLoop.current().start(), right?
What I don't really understand is whether the processes created after calling server.start(n) share some data (for instance, global variables within the same module) or are totally independent.
Should I establish the connection after calling server.start(n)? Or after calling IOLoop.current().start()? If I do so, will I have one MySQL connection per Tornado process?
Thanks
Each child process gets a copy of the variables that existed in the parent process when start(n) was called. For things like connections, this will usually cause problems. When using multi-process mode, it's important to do as little as possible before starting the child processes, so don't create the MySQL connections until after start(n) (but before IOLoop.start(), since IOLoop.start() doesn't return until the server is stopped).
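Here's a minimal ordering sketch (make_app(), connect_to_mysql(), and ssl_options are hypothetical stand-ins; the point is only when the connection gets created):

import tornado.httpserver
import tornado.ioloop

app = make_app()                         # hypothetical: build the Application
server = tornado.httpserver.HTTPServer(app, ssl_options=ssl_options)
server.bind(443)
server.start(4)                          # forks the child processes here

db = connect_to_mysql()                  # hypothetical: runs in each child,
                                         # so one connection per process
tornado.ioloop.IOLoop.current().start()  # blocks until the server is stopped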
I have SQLAlchemy connecting to Postgres via PGPool. PGPool is configured to recycle connections that are about 60s old.
I have two problems:
1) Sometimes we get a huge query that takes more than 60s (I know that's bad... we're working on improving it), and subsequent queries fail because they rely on the same old connection, which is no longer valid.
2) Similarly, when I start my Pyramid app under iPython, the connections go stale whenever I stop to think for a moment.
When attempting to perform a query with a session with a stale connection, I get an exception saying:
OperationalError: (psycopg2.OperationalError) connection terminated due to client idle limit reached
ERROR: connection terminated due to client idle limit reached
SQLAlchemy's pessimistic disconnect handling docs recommend testing the connection when you get it out of the pool. However, the connection is becoming stale after being checked out, so this wouldn't help much.
I think the right solution would be to refresh the session's connection upon getting this type of error:
session = MySession()  # using scoped_session here
query = session.query(...)
try:
    rows = [r for r in query]
except OperationalError:
    # somehow tell query.session to use a new connection here and try again?
How can I do this?
For me, executing
session.close_all()
makes the session able to run queries again, at least until it idles out once more.
Interestingly, running session.remove() or session.close(), which the SQLAlchemy documentation seems to imply should work, doesn't: future queries then raise InvalidRequestError: Can't reconnect until invalid transaction is rolled back (which, of course, session.rollback() doesn't fix) until session.close_all() is called.
I hope somebody can provide insight into why session.close_all() does the trick. It may not be an appropriate solution for production, but it should at least spare you from restarting the whole app in your iPython session.
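As a stopgap, you can wrap the query in a retry along these lines (MyModel is a hypothetical mapped class; MySession is the scoped_session from the question):

from sqlalchemy.exc import OperationalError

session = MySession()
try:
    rows = session.query(MyModel).all()
except OperationalError:
    session.close_all()                  # drop the stale pooled connections
    session = MySession()
    rows = session.query(MyModel).all()  # retry on a fresh connection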
I have recently been seeing MySQL server has gone away in my application logs for a daemon I have running that uses SQLAlchemy.
I wrap every database query or update in a decorator that should close all the sessions after finishing. In theory, that should also close the connection.
My decorator looks like this:
from functools import wraps

def dbop(meth):
    @wraps(meth)
    def nf(self, *args, **kwargs):
        self.session = self.sm()
        res = meth(self, *args, **kwargs)
        self.session.commit()
        self.session.close()
        return res
    return nf
I also initialize the database at the top of my Python script with:
def initdb(self):
    engine = create_engine(db_url)
    Base.metadata.create_all(engine)
    self.sm = sessionmaker(bind=engine,
                           autocommit=False,
                           autoflush=False,
                           expire_on_commit=False)
To my understanding, I am getting that error because my connection is timing out. Why would that be the case if I wrap each method in the decorator above? Is it because expire_on_commit causes queries even after the session is closed and might reopen connections? Is it because Base.metadata.create_all executes SQL that opens a connection which is never closed?
Your session is bound to an "engine", which in turn uses a connection pool. Each time SQLAlchemy needs a connection, it checks one out of the pool; when it is done with it, the connection is returned to the pool, but it is not closed! This is a common strategy to reduce the overhead of opening and closing connections. All the options you set above only affect the session, not the connection!
By default, the connections in the pool are kept open indefinitely.
But MySQL will automatically close the connection after a certain amount of inactivity (See wait_timeout).
The issue here is that your Python process will not be informed by the MySQL server when the connection is closed because it hit the inactivity timeout. Instead, the next time a query is sent over that connection, Python will discover that the connection is no longer available. A similar thing can happen if the connection is lost for other reasons, for example a forced service restart that doesn't wait for open connections to be cleanly closed (such as the "immediate" shutdown mode in Postgres).
This is when you run into the exception.
SQLAlchemy gives you various strategies for dealing with this, which are well documented in the "Dealing with Disconnects" section mentioned by @lukas-graf.
If you jump through some hoops you can get a reference to the connection currently in use by the session, and you could close it that way, but I strongly recommend against this. Instead, refer to the "Dealing with Disconnects" section above and let SQLAlchemy deal with this for you transparently. In your case, setting the pool_recycle option might solve your problem.
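For example, a hedged sketch of the two usual knobs (db_url is a hypothetical stand-in):

from sqlalchemy import create_engine

db_url = "mysql://user:pass@localhost/app"         # hypothetical URL

# Recycle pooled connections before MySQL's wait_timeout can close them:
engine = create_engine(db_url, pool_recycle=3600)

# Or, on SQLAlchemy 1.2+, ping each connection at checkout instead:
engine = create_engine(db_url, pool_pre_ping=True)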
I am working on an online judge. I am using Python 2.7 and MySQL (I am working on the back-end part).
My Method:
I create a main thread which pulls submissions from the database (10 at a time) and puts them in a queue. Then I have multiple threads that take submissions from the queue, evaluate them, and write the results back to the database.
Now I have some doubts (I know they touch on different topics, but guidance on any of them is highly appreciated).
Currently, when I start the threads, I give each of them its own DB connection, which it uses. Is it good practice to give one connection per thread? Does sharing connections between threads create problems? How should I go about this?
My main thread uses a single connection, as its only work is to pull submissions from the DB and put them in the queue (and also update their status in the DB to "Assessing Submission"). But sometimes I get the error: Lost connection to Mysql server while querying. I keep getting it even when I stop the program and start it again. What should I do about it? Also, should I implement a pool of connections just for the main thread?
Also, does a DB connection stay alive forever? What should I do when its session memory etc. gets exhausted, and how do I handle that?
Use a connection pool. Sharing a database connection between threads is not always bad, but you have to be careful about it. You can try SQLAlchemy to manage a lot of this for you (a short per-thread sketch follows below): http://docs.sqlalchemy.org/en/rel_0_8/orm/session.html#unitofwork-contextual
The server might be out of connections, or your connection might have been killed because it used too many resources, etc. A connection pool could help you solve this.
It all depends; in theory a connection could stay alive indefinitely, but usually there is a timeout somewhere.
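Here's a sketch of the per-thread setup (db_url and evaluate() are hypothetical stand-ins): scoped_session hands each worker thread its own session, each backed by a connection from the shared pool.

import threading
from Queue import Queue                    # Python 2; it's "queue" on Python 3
from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker

db_url = "mysql://user:pass@localhost/judge"   # hypothetical URL
engine = create_engine(db_url, pool_size=10)   # one shared engine and pool
Session = scoped_session(sessionmaker(bind=engine))
q = Queue()

def worker():
    while True:
        submission = q.get()
        session = Session()        # scoped: this thread gets its own session
        try:
            evaluate(submission, session)      # hypothetical judge logic
            session.commit()
        except Exception:
            session.rollback()
        finally:
            Session.remove()       # check the connection back into the pool
            q.task_done()

for _ in range(4):
    t = threading.Thread(target=worker)
    t.daemon = True
    t.start()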
If you give the same connection to every thread, the threads will not be able to query the database safely and race conditions will occur. So you need to provide a separate connection to every thread, and indeed that is a good idea. Use a connection pool for this purpose; it will help you get a different connection for each thread.
A connection pool will surely help.
Release the connection once your work is over. There is also a limit on how long a connection may be held, termed the connection timeout. You may need a third-party library to handle that; c3p0 is a good library that can help you with this.
Please refer the below link to configure it:
Best configuration of c3p0
I'm using psycopg2 for the CherryPy app I'm currently working on, and the CLI & phpgadmin to handle some operations manually. Here's the Python code:
# One connection per thread
cherrypy.thread_data.pgconn = psycopg2.connect("...")
...
# Later, an object is created by a thread:
class dbobj(object):
    def __init__(self):
        self.connection = cherrypy.thread_data.pgconn
        self.curs = self.connection.cursor(cursor_factory=psycopg2.extras.DictCursor)
...
# Then,
try:
    blabla
    self.curs.execute(...)
    self.connection.commit()
except:
    self.connection.rollback()
    lalala
...
# Finally, the destructor is called:
    def __del__(self):
        self.curs.close()
I'm having a problem with either psycopg2 or Postgres (although I think the latter is more likely). After having sent a few queries, my connections drop dead. Similarly, phpgadmin usually gets dropped as well; it prompts me to reconnect after I have made several requests. Only the CLI remains persistent.
The problem is that these failures happen very randomly and I can't even track down the cause. I can either get locked out after a few page requests, or never encounter anything after having requested hundreds of pages. The only error I've found in the Postgres log after terminating the app is:
...
LOG: unexpected EOF on client connection
LOG: could not send data to client: Broken pipe
LOG: unexpected EOF on client connection
...
I thought of creating a new connection every time a new dbobj instance is created, but I absolutely don't want to do this.
Also, I've read that one may run into similar problems unless all transactions are committed: I use the try/except block for every single INSERT/UPDATE query, but I never use it for SELECT queries, nor do I want to write even more boilerplate code (by the way, do SELECTs need to be committed?). Even if that's the case, why would phpgadmin close down?
max_connections is set to 100 in the .conf file, so I don't think that's the reason either. A single cherrypy worker has only 10 threads.
Does anyone have an idea where I should look first?
Psycopg2 needs a commit or rollback after every transaction, including SELECT queries, or it leaves the connections "IDLE IN TRANSACTION". This is now a warning in the docs:
Warning: By default, any query execution, including a simple SELECT will start a transaction: for long-running programs, if no further action is taken, the session will remain “idle in transaction”, an undesirable condition for several reasons (locks are held by the session, tables bloat...). For long lived scripts, either ensure to terminate a transaction as soon as possible or use an autocommit connection.
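If your SELECTs really don't need transactions, the autocommit route from that warning looks roughly like this (the DSN is a hypothetical stand-in):

import psycopg2

conn = psycopg2.connect("dbname=app user=app")  # hypothetical DSN
conn.autocommit = True     # each statement commits immediately; the session
                           # never sits "idle in transaction"

curs = conn.cursor()
curs.execute("SELECT 1")   # no explicit commit needed now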
It's a bit difficult to see exactly where you're populating and accessing cherrypy.thread_data. I'd recommend investigating psycopg2.pool.ThreadedConnectionPool instead of trying to bind one connection to each thread yourself.
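A minimal sketch of that approach (the DSN and pool sizes are hypothetical stand-ins):

from psycopg2.pool import ThreadedConnectionPool

pool = ThreadedConnectionPool(1, 10, "dbname=app user=app")  # hypothetical DSN

conn = pool.getconn()           # borrow a connection for the current thread
try:
    curs = conn.cursor()
    curs.execute("SELECT 1")
    conn.commit()               # commit (or rollback) even after SELECTs
finally:
    pool.putconn(conn)          # return it to the pool; it stays open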
Even though I have no idea why successful SELECT queries were blocking the connection, sprinkling .commit() after pretty much every query that doesn't have to work in conjunction with another solved the problem.