SQLAlchemy: Can't reconnect until invalid transaction is rolled back - python

I have a weird problem.
I have a simple Python 3 app which uses SQLAlchemy.
After several hours of running, it raises this error:
(sqlalchemy.exc.InvalidRequestError) Can't reconnect until invalid transaction is rolled back
My init part:
self.db_engine = create_engine(self.db_config, pool_pre_ping=True) # echo=True if needed to see background SQL
Session = sessionmaker(bind=self.db_engine)
self.db_session = Session()
The query (this is the only query that happens):
while True:
    device_id = self.db_session.query(Device).filter(Device.owned_by == msg['user_id']).first()
    sleep(20)
The whole script runs in an infinite loop and is single threaded (it reads messages from SQS). Has anybody dealt with this problem?

The solution:
Don't keep your session open for a long time.
The SQLAlchemy documentation recommends the same approach: see Session Basics.
from contextlib import contextmanager

@contextmanager
def session_scope(self):
    self.db_engine = create_engine(self.db_config, pool_pre_ping=True)  # echo=True if needed to see background SQL
    Session = sessionmaker(bind=self.db_engine)
    session = Session()
    try:
        # this is where the "work" happens!
        yield session
        # always commit changes!
        session.commit()
    except:
        # if any kind of exception occurs, rollback transaction
        session.rollback()
        raise
    finally:
        session.close()
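A minimal sketch of how the context manager above could be used inside the question's polling loop; Device, msg and sleep are taken from the question itself:

while True:
    with self.session_scope() as session:
        device = session.query(Device).filter(Device.owned_by == msg['user_id']).first()
    # the session (and its pooled connection) is released before we sleep
    sleep(20)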

Related

Python SQLalchemy - can I pass the connection object between functions?

I have a Python application that reads from MySQL/MariaDB, uses that data to fetch results from an API, and then inserts the results into another table.
I had set up a module with a function that connects to the database and returns the connection object, which is then passed to other functions/modules. However, I believe this might not be the correct approach. The idea was to have a small module that I could just call whenever I needed to connect to the db.
Also note that I am using the same connection object inside loops (passing it to the db_update module within the loop) and calling close() when everything is done.
I am also sometimes getting warnings from the db; they mostly happen at the point where I call db_conn.close(), so I guess I am not handling the connection or session/engine correctly. Also, the connection ids in the log warnings keep increasing, which is another hint that I am doing something wrong.
[Warning] Aborted connection 351 to db: 'some_db' user: 'some_user' host: '172.28.0.3' (Got an error reading communication packets)
Here is some pseudo code that represents the structure I currently have:
################
## db_connect.py
################
# imports ...
from sqlalchemy import create_engine

def db_connect():
    # get env ...
    db_string = f"mysql+pymysql://{db_user}:{db_pass}@{db_host}:{db_port}/{db_name}"
    try:
        engine = create_engine(db_string)
    except Exception as e:
        return None
    db_conn = engine.connect()
    return db_conn
################
## db_update.py
################
# imports ...
from sqlalchemy import text

def db_insert(db_conn, api_result):
    # ...
    ins_qry = "INSERT INTO target_table (attr_a, attr_b) VALUES (:a, :b);"
    ins_qry = text(ins_qry)
    ins_qry = ins_qry.bindparams(a=value_a, b=value_b)
    try:
        db_conn.execute(ins_qry)
    except Exception as e:
        print(e)
        return None
    return True
################
## main.py
################
from sqlalchemy import text
from db_connect import db_connect
from db_update import db_insert

def run():
    try:
        db_conn = db_connect()
        if not db_conn:
            return False
    except Exception as e:
        print(e)
    qry = """SELECT *
             FROM some_table
             WHERE some_attr IN (:some_value);"""
    qry = text(qry)
    search_run_qry = qry.bindparams(
        some_value='abc'
    )
    result_list = db_conn.execute(search_run_qry).fetchall()
    for result_item in result_list:
        ## do stuff like fetching data from api for every record in the query result
        api_result = get_api_data(...)
        ## insert into db:
        db_ins_status = db_insert(db_conn, api_result)
    ## ...
    db_conn.close()

run()
EDIT: Another question:
a) In a loop that does an update on every iteration, is it OK to keep using the same connection, or would it be wiser to pass the engine to the run() function and call db_conn = engine.connect() and db_conn.close() just before and after each update?
b) I am thinking about using ThreadPoolExecutor instead of the loop for the API calls. Would this have implications for how to use the connection, i.e. can I use the same connection in multiple threads that are doing updates to the same table?
Note: I am not using the ORM feature, mostly because I have a strong DWH/SQL background (though not so much as a DBA) and I am used to writing even complex SQL queries. I am thinking about switching to just using the PyMySQL connector for that reason.
Thanks in advance!
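As a reference point, here is a minimal sketch of the "one engine per application, short-lived connection per operation" pattern asked about in (a). It assumes SQLAlchemy 1.4+; the URL, table and column names are placeholders, not taken from the original code.

from sqlalchemy import create_engine, text

# one engine for the whole application; it owns the connection pool
engine = create_engine(
    "mysql+pymysql://user:password@host:3306/dbname",
    pool_pre_ping=True,   # check connections are alive before handing them out
    pool_recycle=3600,    # recycle connections before MySQL's wait_timeout
)

def insert_result(a_value, b_value):
    # a fresh connection is checked out from the pool and returned on exit;
    # engine.begin() also commits on success and rolls back on error
    with engine.begin() as conn:
        conn.execute(
            text("INSERT INTO target_table (attr_a, attr_b) VALUES (:a, :b)"),
            {"a": a_value, "b": b_value},
        )

Because each call checks out its own connection, the same sketch also works with a ThreadPoolExecutor (question b), as long as a single connection is never shared between threads.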
Yes, you can return/pass the connection object as a parameter, but what is the purpose of the db_connect method other than testing the connection? As I see it, this db_connect method serves no real purpose, so I would recommend doing it the way I have done it before.
I would like to share a code snippet from one of my projects.
def create_record(sql_query: str, data: tuple):
    try:
        connection = mysql_obj.connect()
        db_cursor = connection.cursor()
        db_cursor.execute(sql_query, data)
        connection.commit()
        return db_cursor, connection
    except Exception as error:
        print(f'Connection failed error message: {error}')
and then using it like this for another one of my needs:
db_cursor, connection, query_data = fetch_data(sql_query, query_data)
and after all my needs are done, closing the connection with this method:
def close_connection(connection, db_cursor):
    """
    This method is used to close the SQL server connection.
    """
    db_cursor.close()
    connection.close()
and the method call:
close_connection(connection, db_cursor)
I am not sure whether I can share my GitHub here, but please check this link. Under model.py you can see the database methods, and main.py shows how they are called.
Best,
Hasan.

SQLAlchemy session.refresh() spawns new connections

I am using SQLAlchemy for ORM in my project. My problem is that every time I use session.refresh(obj), a new db connection is used, and these connections are held until session.close() is called.
So when I want to refresh multiple objects, I quickly run out of connections.
Session maker:
session = session_maker()
try:
    yield session
    session.commit()
    for obj in session:
        session.refresh(obj)
except Exception as e:
    session.rollback()
    raise e
finally:
    session.close()
Usage:
with make_session(...) as session:
    for mapped in [self._mapper.map(obj) for obj in objects]:
        saved_entities.append(mapped)
        session.add(mapped)
    session.flush()
I am using refresh() because I have columns that are filled on update, and I want to return the current values.
The curious thing is that when I do this:
for obj in session:
    session.commit()
    session.refresh(obj)
only two connections are used (which is fine) but the objects have no data.
Use scoped_session, see http://docs.sqlalchemy.org/en/latest/orm/contextual.html
If you do, you will get the same session (connection ID) each time you request it. Also, you don't need to call refresh(): add() and flush() should be enough, the updated values should be available after the flush() and before the commit(), but only if you're using the same session ID (database transaction) to look them up (hence, you need a scoped_session).
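A minimal sketch of the suggested scoped_session approach; engine and objects are placeholders assumed to exist already:

from sqlalchemy.orm import scoped_session, sessionmaker

session_factory = sessionmaker(bind=engine)   # engine is assumed to be created elsewhere
Session = scoped_session(session_factory)

session = Session()              # every call in this thread returns the same session
try:
    for obj in objects:          # placeholder for the mapped entities being saved
        session.add(obj)
    session.flush()              # server-generated values become visible in this transaction
    session.commit()
finally:
    Session.remove()             # dispose of the thread-local session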

Does this thread-local Flask-SQLAchemy session cause a "MySQL server has gone away" error?

I have a web application that runs long jobs independent of user sessions. To achieve this, I have an implementation of a thread-local Flask-SQLAlchemy session. The problem is that a few times a day I get a "MySQL server has gone away" error when I visit my site. The site always loads upon refresh. I think the issue is related to these thread-local sessions, but I'm not sure.
This is my implementation of a thread-local session scope:
@contextmanager
def thread_local_session_scope():
    """Provides a transactional scope around a series of operations.
    Context is local to current thread.
    """
    # See this StackOverflow answer for details:
    # http://stackoverflow.com/a/18265238/1830334
    Session = scoped_session(session_factory)
    threaded_session = Session()
    try:
        yield threaded_session
        threaded_session.commit()
    except:
        threaded_session.rollback()
        raise
    finally:
        Session.remove()
And here is my standard Flask-SQLAlchemy session:
@contextmanager
def session_scope():
    """Provides a transactional scope around a series of operations.
    Context is HTTP request thread using Flask-SQLAlchemy.
    """
    try:
        yield db.session
        db.session.commit()
    except Exception as e:
        print('Rolling back database')
        print(e)
        db.session.rollback()
    # Flask-SQLAlchemy handles closing the session after the HTTP request.
Then I use both session context managers like this:
def build_report(tag):
    report = _save_report(Report())
    thread = Thread(target=_build_report, args=(report.id,))
    thread.daemon = True
    thread.start()
    return report.id

# This executes in the main thread.
def _save_report(report):
    with session_scope() as session:
        session.add(report)
        session.commit()
        return report

# This executes in a separate thread.
def _build_report(report_id):
    with thread_local_session_scope() as session:
        report = do_some_stuff(report_id)
        session.merge(report)
EDIT: Engine configurations
app.config['SQLALCHEMY_DATABASE_URI'] = 'mysql://<username>:<password>@<server>:3306/<db>?charset=utf8'
app.config['SQLALCHEMY_POOL_RECYCLE'] = 3600
app.config['SQLALCHEMY_TRACK_MODIFICATIONS'] = False
Try adding an
app.teardown_request(exception=None)
decorator, which executes at the end of each request. I am currently experiencing a similar issue, and it seems as if today I have actually resolved it using:
@app.teardown_request
def teardown_request(exception=None):
    Session.remove()
    if exception and Session.is_active:
        print(exception)
        Session.rollback()
I do not use Flask-SQLAlchemy, only raw SQLAlchemy, so there may be differences for you.
From the docs:
The teardown callbacks are special callbacks in that they are executed at a different point. Strictly speaking they are independent of the actual request handling as they are bound to the lifecycle of the RequestContext object. When the request context is popped, the teardown_request() functions are called.
In my case, I open a new scoped_session for each request, requiring me to remove it at the end of each request (Flask-SQLAlchemy may not need this). Also, the teardown_request function is passed an exception if one occurred during the context. In this scenario, if an exception occurred (possibly leaving the transaction unremoved or in need of a rollback), we check for it and roll back.
If this doesn't work in my own testing, the next thing I was going to do was a session.commit() at each teardown, just to make sure everything is flushing.
UPDATE: it also appears MySQL invalidates connections after 8 hours, causing the Session to be corrupted.
Set pool_recycle=3600 on your engine configuration, or to a setting below the MySQL timeout. This, in conjunction with proper session scoping (closing sessions), should do it.
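For reference, a hedged sketch of how these pool options can be passed through Flask-SQLAlchemy. SQLALCHEMY_ENGINE_OPTIONS requires Flask-SQLAlchemy 2.4+, so treat the exact key as an assumption; older versions use SQLALCHEMY_POOL_RECYCLE, as the question's config already does.

app.config['SQLALCHEMY_ENGINE_OPTIONS'] = {
    'pool_recycle': 3600,    # recycle pooled connections before MySQL's 8-hour wait_timeout
    'pool_pre_ping': True,   # test each connection with a lightweight ping before use
}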

SQLAlchemy error MySQL server has gone away

Error: OperationalError: (OperationalError) (2006, 'MySQL server has gone away'). I have already received this error when I coded a project on Flask, but I can't understand why I am getting it now.
I have code like this (yes, if the code is small and executes fast, then there are no errors):
db_engine = create_engine('mysql://root@127.0.0.1/mind?charset=utf8', pool_size=10, pool_recycle=7200)
Base.metadata.create_all(db_engine)
Session = sessionmaker(bind=db_engine, autoflush=True)
Session = scoped_session(Session)
session = Session()
# there many classes and functions
session.close()
And this code gives me the 'MySQL server has gone away' error, but only after some time, when I use pauses in my script.
I use MySQL from openserver.ru (a web server stack similar to WAMP).
Thanks.
Looking at the mysql docs, we can see that there are a bunch of reasons why this error can occur. However, the two main reasons I've seen are:
1) The most common reason is that the connection has been dropped because it hasn't been used in more than 8 hours (default setting)
By default, the server closes the connection after eight hours if nothing has happened. You can change the time limit by setting the wait_timeout variable when you start mysqld
I'll just mention for completeness the two ways to deal with that, but they've already been mentioned in other answers:
A: I have a very long running job and so my connection is stale. To fix this, I refresh my connection:
create_engine(conn_str, pool_recycle=3600) # recycle every hour
B: I have a long running service and long periods of inactivity. To fix this I ping mysql before every call:
create_engine(conn_str, pool_pre_ping=True)
2) My packet size is too large, which should throw this error:
_mysql_exceptions.OperationalError: (1153, "Got a packet bigger than 'max_allowed_packet' bytes")
I've only seen this buried in the middle of the trace, though often you'll only see the generic _mysql_exceptions.OperationalError (2006, 'MySQL server has gone away'), so it's hard to catch, especially if logs are in multiple places.
The above doc says the max packet size is 64MB by default, but it's actually 16MB, which can be verified with SELECT @@max_allowed_packet.
To fix this, decrease packet size for INSERT or UPDATE calls.
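A small sketch of how one might check that value from SQLAlchemy; the connection URL here is a placeholder:

from sqlalchemy import create_engine, text

engine = create_engine("mysql+pymysql://user:password@localhost/test")
with engine.connect() as conn:
    # returns the server's max_allowed_packet setting in bytes
    packet_size = conn.execute(text("SELECT @@max_allowed_packet")).scalar()
    print(f"max_allowed_packet: {packet_size} bytes")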
SQLAlchemy now has a great write-up on how you can use pinging to be pessimistic about your connection's freshness:
http://docs.sqlalchemy.org/en/latest/core/pooling.html#disconnect-handling-pessimistic
From there,
from sqlalchemy import exc
from sqlalchemy import event
from sqlalchemy.pool import Pool
@event.listens_for(Pool, "checkout")
def ping_connection(dbapi_connection, connection_record, connection_proxy):
    cursor = dbapi_connection.cursor()
    try:
        cursor.execute("SELECT 1")
    except:
        # optional - dispose the whole pool
        # instead of invalidating one at a time
        # connection_proxy._pool.dispose()
        # raise DisconnectionError - pool will try
        # connecting again up to three times before raising.
        raise exc.DisconnectionError()
    cursor.close()
And a test to make sure the above works:
from sqlalchemy import create_engine
e = create_engine("mysql://scott:tiger#localhost/test", echo_pool=True)
c1 = e.connect()
c2 = e.connect()
c3 = e.connect()
c1.close()
c2.close()
c3.close()
# pool size is now three.
print "Restart the server"
raw_input()
for i in xrange(10):
c = e.connect()
print c.execute("select 1").fetchall()
c.close()
From the documentation, you can also use the pool_recycle parameter:
from sqlalchemy import create_engine
e = create_engine("mysql://scott:tiger@localhost/test", pool_recycle=3600)
I just faced the same problem and solved it with some effort. I hope my experience is helpful to others.
Following some suggestions, I used a connection pool and set pool_recycle to less than wait_timeout, but it still didn't work.
Then I realized that a global session may just keep reusing the same connection, so the connection pool didn't help. To avoid a global session, generate a new session for each request and remove it with Session.remove() after processing, as in the sketch below.
Finally, all is well.
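A minimal sketch of that per-request pattern, assuming a Flask app and a scoped_session; the names and URL below are placeholders, not taken from the answer:

from flask import Flask
from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker

app = Flask(__name__)
engine = create_engine("mysql+pymysql://user:password@localhost/mind", pool_recycle=3600)
Session = scoped_session(sessionmaker(bind=engine))

@app.teardown_appcontext
def remove_session(exception=None):
    # each request gets its own session; drop it when the request ends
    Session.remove()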
One more point to keep in mind is to manually push the Flask application context when initializing the database. This should resolve the issue.
from flask import Flask
from flask_sqlalchemy import SQLAlchemy
db = SQLAlchemy()
app = Flask(__name__)
with app.app_context():
    db.init_app(app)
See the documentation on optimistic disconnect handling: https://docs.sqlalchemy.org/en/latest/core/pooling.html#disconnect-handling-optimistic
from sqlalchemy.exc import OperationalError

def sql_read(cls, sql, connection):
    """sql for read action like select
    """
    LOG.debug(sql)
    try:
        result = connection.engine.execute(sql)
        header = result.keys()
        for row in result:
            yield dict(zip(header, row))
    except OperationalError as e:
        # the connection was lost; recreate the pool and retry once
        LOG.info("recreate pool due to %s" % e)
        connection.engine.pool.recreate()
        result = connection.engine.execute(sql)
        header = result.keys()
        for row in result:
            yield dict(zip(header, row))
    except Exception as ee:
        LOG.error(ee)
        raise SqlExecuteError()

sqlalchemy: stopping a long-running query

I have a seemingly straight-forward situation, but can't find a straight-forward solution.
I'm using sqlalchemy to query postgres. If a client timeout occurs, I'd like to stop/cancel the long running postgres queries from another thread. The thread has access to the Session or Connection object.
At this point I've tried:
session.bind.raw_connection().close()
and
session.connection().close()
and
session.close
and
session.transaction.close()
But no matter what I try, the postgres query still continues until its end. I know this from watching pg in top. Shouldn't this be fairly easy to do? Am I missing something? Is this impossible without getting the pid and sending a stop signal directly?
This seems to work well, so far:
def test_close_connection(self):
    import threading
    from psycopg2.extensions import QueryCanceledError
    from sqlalchemy.exc import DBAPIError

    session = Session()
    conn = session.connection()
    sql = self.get_raw_sql_for_long_query()
    seconds = 5
    t = threading.Timer(seconds, conn.connection.cancel)
    t.start()
    try:
        conn.execute(sql)
    except DBAPIError as e:
        if type(e.orig) == QueryCanceledError:
            print('Long running query was cancelled.')
    t.cancel()
source
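The helper get_raw_sql_for_long_query is not shown in the answer; for testing, it can be as simple as a server-side sleep. A hypothetical sketch:

from sqlalchemy import text

def get_raw_sql_for_long_query(self):
    # a query that simply sleeps on the Postgres server for 10 minutes
    return text("SELECT pg_sleep(600)")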
For those MySQL folks that may have ended up here, a modified version of this answer that kills the query from a second connection can work. Essentially the following, assuming pymysql under the hood:
thread_id = conn1.connection.thread_id()
t = threading.Timer(seconds, lambda: conn2.execute("kill {}".format(thread_id)))
The original connection will raise pymysql.err.OperationalError. See this other answer for a neat way to create a long running query for testing.
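conn1 and conn2 above are two separate connections from the same engine; a hedged sketch of how the whole thing might be wired up, with the URL as a placeholder:

import threading
from sqlalchemy import create_engine, text

engine = create_engine("mysql+pymysql://user:password@localhost/test")
conn1 = engine.connect()   # runs the long query
conn2 = engine.connect()   # used only to kill conn1's query

thread_id = conn1.connection.thread_id()   # pymysql exposes the server-side thread id
t = threading.Timer(5, lambda: conn2.execute(text("KILL {}".format(thread_id))))
t.start()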
I found that on MySQL you can specify query optimizer hints.
One such hint is MAX_EXECUTION_TIME, which specifies how long a query may execute before termination.
You can add this in your app.py:
@event.listens_for(engine, 'before_execute', retval=True)
def intercept(conn, clauseelement, multiparams, params):
    from sqlalchemy.sql.selectable import Select

    # check if it's a select statement
    if isinstance(clauseelement, Select):
        # 'froms' represents the list of tables the statement is querying
        table = clauseelement.froms[0]
        # update the timeout here in ms (1s = 1000ms)
        timeout_ms = 4000
        # add the optimizer hint to the statement
        clauseelement = clauseelement.prefix_with(f"/*+ MAX_EXECUTION_TIME({timeout_ms}) */", dialect="mysql")
    return clauseelement, multiparams, params
References:
SQLAlchemy query API not working correctly with hints
MySQL reference
