Why does SQLAlchemy/MySQL keep timing out on me?

I have two functions that need to be executed, and the first takes about 4 hours to run. Both use SQLAlchemy:

def first():
    session = DBSession
    rows = session.query(Mytable).order_by(Mytable.col1.desc())[:150]
    for i, row in enumerate(rows):
        time.sleep(100)
        print i, row.accession

def second():
    print "going onto second function"
    session = DBSession
    new_row = session.query(Anothertable).order_by(Anothertable.col1.desc()).first()
    print 'New Row: ', new_row.accession

first()
second()
And here is how I define DBSession:
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import scoped_session, sessionmaker
from sqlalchemy import create_engine

engine = create_engine('mysql://blah:blah@blah/blahblah', echo=False, pool_recycle=3600*12)
DBSession = scoped_session(sessionmaker(autocommit=False, autoflush=False, bind=engine))
Base = declarative_base()
Base.metadata.bind = engine
first() finishes fine (it takes about 4 hrs), and I see "going onto second function" printed, then it immediately gives me this error:
sqlalchemy.exc.OperationalError: (OperationalError) (2006, 'MySQL server has gone away')
From reading the docs I thought assigning session = DBSession would get me two different session instances, so second() wouldn't time out. I've also tried playing with pool_recycle, and that doesn't seem to have any effect here. In the real world I can't split first() and second() into two scripts: second() has to execute immediately after first().

Your engine (not your session) keeps a pool of connections. When a MySQL connection has not been used for several hours, the MySQL server closes the socket; the next time you use that connection you get a "MySQL server has gone away" error. If you have a simple single-threaded script, then calling create_engine with pool_size=1 will probably do the trick. If not, you can use events to ping the connection when it is checked out of the pool. This great answer has all the details:
SQLAlchemy error MySQL server has gone away
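For reference, here is a minimal sketch of the checkout-ping recipe that the linked answer (and SQLAlchemy's "pessimistic disconnect handling" docs) describe; it assumes the engine and pool defined above:

from sqlalchemy import event, exc
from sqlalchemy.pool import Pool

@event.listens_for(Pool, "checkout")
def ping_connection(dbapi_connection, connection_record, connection_proxy):
    # Run a cheap statement every time a connection is checked out of the pool.
    cursor = dbapi_connection.cursor()
    try:
        cursor.execute("SELECT 1")
    except Exception:
        # Tell the pool this connection is dead; it will retry with a fresh one
        # (up to three times) instead of surfacing "MySQL server has gone away".
        raise exc.DisconnectionError()
    cursor.close()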

"assigning session=DBSession would get two different session instances"
That simply isn't true. session = DBSession is a local variable assignment, and you cannot override local variable assignment in Python (you can override instance member assignment, but that's unrelated).
Another thing to note is that scoped_session produces, by default, a thread-local session (i.e. all code running in the same thread gets the same session). Since you call first() and second() in the same thread, they use one and the same session.
One thing you can do is use a regular (unscoped) session: manage the session scope manually and create a new session in each function, as in the sketch below. Alternatively, check the docs on how to define a custom session scope.
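A minimal sketch of the per-function approach, reusing the question's models and engine; the plain Session factory name here is illustrative:

from sqlalchemy.orm import sessionmaker

Session = sessionmaker(bind=engine)  # regular, unscoped factory

def first():
    session = Session()  # a brand-new Session for this unit of work
    try:
        rows = session.query(Mytable).order_by(Mytable.col1.desc())[:150]
        for i, row in enumerate(rows):
            print i, row.accession
    finally:
        session.close()  # return the connection to the pool

def second():
    session = Session()  # a fresh Session, created hours later
    try:
        new_row = session.query(Anothertable).order_by(Anothertable.col1.desc()).first()
        print 'New Row: ', new_row.accession
    finally:
        session.close()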

It doesn't look like you're getting separate Session instances. If the first query is successfully committing, then your Session could be expiring after that commit.
Try setting auto-expire to false for your session:
DBSession = scoped_session(sessionmaker(expire_on_commit=False, autocommit=False, autoflush=False, bind=engine))
and then commit later.

Related

SQLAlchemy session.refresh() spawns new connections

I am using SQLAlchemy for the ORM in my project. My problem is that every time I use session.refresh(obj), a new DB connection is used, and those connections are held until session.close() is called.
So when I want to refresh multiple objects I quickly run out of connections.
Session maker:
session = session_maker()
try:
    yield session
    session.commit()
    for obj in session:
        session.refresh(obj)
except Exception as e:
    session.rollback()
    raise e
finally:
    session.close()
Usage:
with make_session(...) as session:
    for mapped in [self._mapper.map(obj) for obj in objects]:
        saved_entities.append(mapped)
        session.add(mapped)
    session.flush()
I am using refresh because I have columns that are filled on update, and I want to return the current values.
The curious thing is that when I do this:
for obj in session:
    session.commit()
    session.refresh(obj)
only two connections are used (which is fine), but the objects have no data.
Use scoped_session; see http://docs.sqlalchemy.org/en/latest/orm/contextual.html
If you do, you will get the same session (and connection) each time you request one. Also, you don't need to call refresh(): add() and flush() should be enough. The updated values are available after the flush() and before the commit(), but only if you read them through the same session (i.e. the same database transaction), which is why you need a scoped_session. A short sketch follows.
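A minimal sketch of that suggestion, assuming the question's session_maker; MyEntity is a hypothetical mapped class used only for illustration:

from sqlalchemy.orm import scoped_session

Session = scoped_session(session_maker)  # one session per thread

session = Session()
entity = MyEntity(name='example')  # hypothetical mapped class
session.add(entity)
session.flush()   # the INSERT is emitted; database-generated values (e.g. the id) become loadable
print entity.id   # readable before commit, inside the same transaction
session.commit()
Session.remove()  # discard the thread-local session when done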

try-finally with SqlAlchemy: is this a good habit?

I'm used to doing this:

from sqlalchemy.orm import sessionmaker
from sqlalchemy.engine import create_engine

Session = sessionmaker()
engine = create_engine("some connection db string", echo=False)
Session.configure(bind=engine)

db_con = Session()
try:
    # DB MANIPULATION
finally:
    db_con.close()

Is this a good habit? If so, why doesn't SQLAlchemy simply let you do this:

with Session() as db_con:
    # DB MANIPULATION

?
No, this isn't good practice. It's easy to forget, and will make the code more confusing.
Instead, you can use the contextlib.closing context manager, and make that the only way to get a session.
import contextlib

# Wrapped in a custom context manager for better readability
@contextlib.contextmanager
def get_session():
    with contextlib.closing(Session()) as session:
        yield session

with get_session() as session:
    session.add(...)
Firstly, when you are done with the session object you should close it: session.close() returns the connection back to the engine's pool, and if you are exiting the program you should dispose of the pool with engine.dispose().
Now to your question. In most cases sessions are used in long-running applications like web servers, where it makes sense to centralize session management. For example, in Flask-SQLAlchemy a session is created at the start of each web request and closed when the request is over.
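A minimal sketch of that lifecycle for a short-lived script, reusing the Session and engine defined above (the work inside the try block is just a placeholder):

db_con = Session()
try:
    # DB MANIPULATION
    db_con.commit()
finally:
    db_con.close()  # the connection goes back to the engine's pool

engine.dispose()    # at program exit: close every pooled connection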

SQLAlchemy connection hangs on AWS MySQL RDS reboot with failover

We have a Python server which uses SQLAlchemy to read/write data from an AWS MySQL Multi-AZ RDS instance.
We're experiencing a behavior we'd like to avoid: whenever we trigger a failover reboot, a connection which was already open and then issues a statement hangs indefinitely. While this is something to expect according to the AWS documentation, we would expect the Python MySQL connector to be able to cope with this situation.
The closest case we've found on the web is this Google Groups thread, which talks about the issue and offers a solution for a Postgres RDS.
For example, the script below hangs indefinitely when a failover reboot is initiated (adapted from the above-mentioned Google Groups thread).
from datetime import datetime
from time import time, sleep
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import sessionmaker
from sqlalchemy.orm.scoping import scoped_session
from sqlalchemy.ext.declarative import declarative_base
import logging

current_milli_time = lambda: int(round(time() * 1000))

Base = declarative_base()

logging.basicConfig(format='%(asctime)s %(filename)s %(lineno)s %(process)d %(levelname)s: %(message)s', level="INFO")

class Message(Base):
    __tablename__ = 'message'
    id = Column(Integer, primary_key=True)
    body = Column(String(450), nullable=False)

engine = create_engine('mysql://<username>:<password>@<db_host>/<db_name>', echo=False, pool_recycle=1800)
session_maker = scoped_session(sessionmaker(bind=engine, autocommit=False, autoflush=False))
session = session_maker()

while True:
    try:
        ids = ''
        start = current_milli_time()
        for msg in session.query(Message).order_by(Message.id.desc()).limit(5):
            ids += str(msg.id) + ', '
        logging.info('({!s}) (took {!s} ms) fetched ids: {!s}'.format(datetime.now().time().isoformat(), current_milli_time() - start, ids))

        start = current_milli_time()
        m = Message()
        m.body = 'some text'
        session.add(m)
        session.commit()
        logging.info('({!s}) (took {!s} ms) inserted new message'.format(datetime.now().time().isoformat(), current_milli_time() - start))
    except Exception as e:
        logging.exception(e)
        session.rollback()
    finally:
        session_maker.remove()
    sleep(0.25)
We've tried playing with the connection timeouts, but it seems the issue is related to an already-open connection which simply hangs once AWS switches over to the failover instance.
Our question is: has anyone encountered this issue, or does anyone have directions worth checking?
IMHO, using SQL connector timeouts to handle the switchover is like black magic. Each connector acts differently, and the behavior is difficult to diagnose.
If you read @univerio's comment again: AWS will reassign a new IP address to the SAME RDS endpoint name. While the switchover is happening, the RDS endpoint name with the old IP address is still in your server instance's DNS cache. So this is a DNS caching issue, and that's why AWS asks you to "clean up....".
Unless you restart SQLAlchemy so that it resolves DNS again, there is no way for the session to know that something happened and switch over dynamically. Worse, the issue can also happen inside the connector used by SQLAlchemy.
IMHO, it isn't worth the effort to deal with the switchover inside the code. I would just subscribe to an AWS service, such as Lambda, that can act on switchover events and trigger the app server to restart its connections, which should pick up the new IP address.
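If you still want a defensive measure in the code, one option (not part of the answer above, and assuming the mysqlclient/MySQLdb driver, which accepts these connect arguments) is to give the driver socket-level timeouts so a hung connection errors out instead of blocking forever:

engine = create_engine(
    'mysql://<username>:<password>@<db_host>/<db_name>',
    echo=False,
    pool_recycle=1800,
    connect_args={
        'connect_timeout': 10,  # seconds allowed to establish a connection
        'read_timeout': 30,     # seconds to wait for a query result
        'write_timeout': 30,    # seconds to wait while sending data
    },
)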

SQLAlchemy multithreading without Flask

I want to access an SQLite database file from the main thread and a background thread. The problem is that no matter how I change my code, I always get the following error:
ProgrammingError: SQLite objects created in a thread can only be used in that same thread.The object was created in thread id -1250925472 and this is thread id -1225814016
My code looks something like this:
engine = create_engine('sqlite:///data/storage.sqlite', poolclass=NullPool)
Footprint.Base.metadata.create_all(engine)
session_factory = sessionmaker(bind=engine)
Session = scoped_session(session_factory)

def storeData(fp):
    s = Session()
    s.add(fp)
    s.commit()
Does anybody have an idea how to fix this annoying problem?

Does this thread-local Flask-SQLAlchemy session cause a "MySQL server has gone away" error?

I have a web application that runs long jobs that are independent of user sessions. To achieve this, I have an implementation for a thread-local Flask-SQLAlchemy session. The problem is a few times a day, I get a MySQL server has gone away error when I visit my site. The site always loads upon refresh. I think the issue is related to these thread-local sessions, but I'm not sure.
This is my implementation of a thread-local session scope:
@contextmanager
def thread_local_session_scope():
    """Provides a transactional scope around a series of operations.
    Context is local to current thread.
    """
    # See this StackOverflow answer for details:
    # http://stackoverflow.com/a/18265238/1830334
    Session = scoped_session(session_factory)
    threaded_session = Session()
    try:
        yield threaded_session
        threaded_session.commit()
    except:
        threaded_session.rollback()
        raise
    finally:
        Session.remove()
And here is my standard Flask-SQLAlchemy session:
@contextmanager
def session_scope():
    """Provides a transactional scope around a series of operations.
    Context is HTTP request thread using Flask-SQLAlchemy.
    """
    try:
        yield db.session
        db.session.commit()
    except Exception as e:
        print 'Rolling back database'
        print e
        db.session.rollback()
    # Flask-SQLAlchemy handles closing the session after the HTTP request.
Then I use both session context managers like this:
def build_report(tag):
    report = _save_report(Report())
    thread = Thread(target=_build_report, args=(report.id,))
    thread.daemon = True
    thread.start()
    return report.id

# This executes in the main thread.
def _save_report(report):
    with session_scope() as session:
        session.add(report)
        session.commit()
        return report

# This executes in a separate thread.
def _build_report(report_id):
    with thread_local_session_scope() as session:
        report = do_some_stuff(report_id)
        session.merge(report)
EDIT: Engine configurations
app.config['SQLALCHEMY_DATABASE_URI'] = 'mysql://<username>:<password>@<server>:3306/<db>?charset=utf8'
app.config['SQLALCHEMY_POOL_RECYCLE'] = 3600
app.config['SQLALCHEMY_TRACK_MODIFICATIONS'] = False
Try adding a teardown_request handler (a function decorated with @app.teardown_request, taking exception=None), which executes at the end of each request. I am currently experiencing a similar issue, and it seems as if today I have actually resolved it using this:
@app.teardown_request
def teardown_request(exception=None):
    Session.remove()
    if exception and Session.is_active:
        print(exception)
        Session.rollback()
I do not use Flask-SQLAlchemy, only raw SQLAlchemy, so there may be differences for you.
From the Docs
The teardown callbacks are special callbacks in that they are executed at a different point. Strictly speaking, they are independent of the actual request handling, as they are bound to the lifecycle of the RequestContext object. When the request context is popped, the teardown_request() functions are called.
In my case, I open a new scoped_session for each request, which requires me to remove it at the end of each request (Flask-SQLAlchemy may not need this). Also, the teardown_request function is passed the exception, if one occurred during the request. In this scenario, if an exception occurred (possibly leaving the transaction un-removed or in need of a rollback), we check for it and roll back.
If this doesn't work in my own testing, the next thing I was going to try was a session.commit() at each teardown, just to make sure everything is flushed.
UPDATE: it also appears MySQL invalidates connections after 8 hours, causing the Session to be corrupted.
Set pool_recycle=3600 in your engine configuration, or to a value below the MySQL timeout. This, in conjunction with proper session scoping (closing sessions), should do it.
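For reference, a minimal sketch of that engine configuration; pool_pre_ping is an extra safeguard available in SQLAlchemy 1.2+ and is not part of the answer above:

engine = create_engine(
    'mysql://<username>:<password>@<server>:3306/<db>?charset=utf8',
    pool_recycle=3600,   # recycle connections well before MySQL's 8-hour wait_timeout
    pool_pre_ping=True,  # test each connection on checkout and reconnect if it is stale
)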

Categories

Resources