SQLite & SQLAlchemy SingletonThreadPool - can I share a connection object? - python

I'm getting errors of the form:
sqlite3.ProgrammingError: SQLite objects created in a thread can only be used in that same thread.The object was created in thread id 139661426296576 and this is thread id 139662493492992
in my multithreaded application.
I'm instantiating my engine with:
from sqlalchemy.pool import SingletonThreadPool
db_path = "sqlite:///" + cwd + "/data/data.db"
create_engine(db_path, poolclass=SingletonThreadPool, pool_size=50)
I had expected the SingletonThreadPool to solve this problem. What am I missing?
(bonus question: for the sake of reduced headache, should I move to MySQL?)

If you are using SQLite via the built-in sqlite3 driver, you just have to pass the check_same_thread parameter, as below:
create_engine(db_path, connect_args={'check_same_thread': False})
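A minimal sketch of how the two suggestions might be combined, keeping the engine setup from the question. Note that check_same_thread=False only disables sqlite3's own ownership check, so the application (or the pool) still has to avoid using one connection from two threads at the same time; using os.getcwd() for cwd is an assumption.
import os
from sqlalchemy import create_engine
from sqlalchemy.pool import SingletonThreadPool

cwd = os.getcwd()  # assumption: cwd is the current working directory, as in the question
db_path = "sqlite:///" + cwd + "/data/data.db"
engine = create_engine(
    db_path,
    poolclass=SingletonThreadPool,
    pool_size=50,
    connect_args={"check_same_thread": False},  # let SQLAlchemy hand connections across threads
)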

Related

QueuePool limit reached for Flask application

I am running an application with Flask and Flask-SQLAlchemy.
from config import FlaskDatabaseConfig
from flask import Flask
from flask import request
from flask_migrate import Migrate
from flask_sqlalchemy import SQLAlchemy
application = Flask(__name__)
application.config.from_object(FlaskDatabaseConfig())
db = SQLAlchemy(application)
@application.route("/queue/request", methods=["POST"])
def handle_queued_request():
    stuff()
    return ""

def stuff():
    # Includes database queries and updates and a call to db.session.commit()
    # db.session.begin() and db.session.close() are not called
    pass

if __name__ == "__main__":
    application.run(debug=False, port=5001)
Now, from my understanding, by using Flask-SQLAlchemy I do not need to manage sessions on my own. So why am I getting the following error if I send several requests to my endpoint one after another?
sqlalchemy.exc.TimeoutError: QueuePool limit of size 5 overflow 10 reached, connection timed out, timeout 30 (Background on this error at: http://sqlalche.me/e/3o7r)
I've tried using db.session.close(), but then, instead of this error, my database updates are not committed properly. What am I doing incorrectly? Do I need to manually close connections to the database once a request has been handled?
I have found a solution to this. The issue was that I had a lot of processes that were "idle in transaction" because I did not call db.session.commit() after making certain database SELECT statements using Query.first().
To investigate this, I queried my (development) PostgreSQL database directly using:
SELECT * FROM pg_stat_activity
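To make that fix concrete, here is a hedged sketch that reuses the question's application and db objects plus a placeholder Product model: end the transaction explicitly even when the request only reads, so the backend is not left "idle in transaction".
@application.route("/queue/request", methods=["POST"])
def handle_queued_request():
    # the SELECT below implicitly opens a transaction on the session's connection
    product = db.session.query(Product).first()
    # ... updates based on `product` ...
    db.session.commit()  # ends the transaction even if only reads happened
    return ""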
Just remove the session every time you finish querying the db:
products = db.session.query(Product).limit(20).all()
db.session.remove()
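A hedged variant of the same idea inside a request handler (the /products route and list_products name are made up for illustration): make sure the session is removed even if the query raises.
@application.route("/products")
def list_products():
    try:
        products = db.session.query(Product).limit(20).all()
        return str(len(products))
    finally:
        db.session.remove()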

SQLAlchemy connection hangs on AWS MySQL RDS reboot with failover

We have a Python server which uses SQLAlchemy to read/write data from an AWS MySQL MultiAZ RDS instance.
We're experiencing a behavior we'd like to avoid: whenever we trigger a failover reboot, a connection that was already open and then issues a statement hangs indefinitely. While this is to be expected according to the AWS documentation, we would expect the Python MySQL connector to be able to cope with this situation.
The closest case we've found on the web is this Google Groups thread, which talks about the issue and offers a solution for a Postgres RDS.
For example, the script below will hang indefinitely when a failover reboot is initiated (adapted from the above-mentioned Google Groups thread).
from datetime import datetime
from time import time, sleep
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import sessionmaker
from sqlalchemy.orm.scoping import scoped_session
from sqlalchemy.ext.declarative import declarative_base
import logging
current_milli_time = lambda: int(round(time() * 1000))
Base = declarative_base()
logging.basicConfig(format='%(asctime)s %(filename)s %(lineno)s %(process)d %(levelname)s: %(message)s', level="INFO")
class Message(Base):
    __tablename__ = 'message'
    id = Column(Integer, primary_key=True)
    body = Column(String(450), nullable=False)

engine = create_engine('mysql://<username>:<password>@<db_host>/<db_name>', echo=False, pool_recycle=1800)
session_maker = scoped_session(sessionmaker(bind=engine, autocommit=False, autoflush=False))
session = session_maker()

while True:
    try:
        ids = ''
        start = current_milli_time()
        for msg in session.query(Message).order_by(Message.id.desc()).limit(5):
            ids += str(msg.id) + ', '
        logging.info('({!s}) (took {!s} ms) fetched ids: {!s}'.format(datetime.now().time().isoformat(), current_milli_time() - start, ids))
        start = current_milli_time()
        m = Message()
        m.body = 'some text'
        session.add(m)
        session.commit()
        logging.info('({!s}) (took {!s} ms) inserted new message'.format(datetime.now().time().isoformat(), current_milli_time() - start))
    except Exception as e:
        logging.exception(e)
        session.rollback()
    finally:
        session_maker.remove()
        sleep(0.25)
We've tried playing with the connection timeouts but it seems the issue is related to an already opened connection which simply hangs once AWS switches to the failover instance.
Our question is: has anyone encountered this issue, or can anyone suggest directions worth checking?
IMHO, using the SQL connector's timeout to handle a switchover is black magic. Every connector acts differently and is difficult to diagnose.
If you read @univerio's comment again, AWS reassigns a new IP address to the SAME RDS endpoint name. While the switch is happening, the RDS endpoint name and the old IP address are still in your server instance's DNS cache. So this is a DNS caching issue, and that's why AWS asks you to "clean up....".
Unless you restart SQLAlchemy so that DNS is resolved again, there is no way for the session to know something happened and switch over dynamically. Worse, the issue can also sit in the connector used by SQLAlchemy.
IMHO, it isn't worth the effort to deal with the switchover inside your code. I would subscribe to an AWS service such as Lambda that can act on switchover events and trigger the app server to restart its connections, which should then resolve the new IP address.
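If a code-level mitigation is still wanted, a hedged sketch is to make dead connections fail fast instead of hanging: enable pool_pre_ping and set connector-level timeouts. This assumes SQLAlchemy >= 1.2 (for pool_pre_ping) and PyMySQL as the driver (connect_timeout, read_timeout and write_timeout are PyMySQL connect arguments); it does not make failover seamless, it only turns an indefinite hang into a quick error that retry logic can handle.
from sqlalchemy import create_engine

engine = create_engine(
    'mysql+pymysql://<username>:<password>@<db_host>/<db_name>',
    pool_recycle=1800,
    pool_pre_ping=True,         # test each connection with a lightweight ping on checkout
    connect_args={
        'connect_timeout': 10,  # seconds to establish the TCP connection
        'read_timeout': 30,     # seconds to wait for a result before raising an error
        'write_timeout': 30,
    },
)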

SQLAlchemy multithreading without Flask

I want to access a SQLite database file from the main thread and a background thread. The problem is that no matter how I change my code, I always get the following error:
ProgrammingError: SQLite objects created in a thread can only be used in that same thread.The object was created in thread id -1250925472 and this is thread id -1225814016
My code looks something like this:
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker, scoped_session
from sqlalchemy.pool import NullPool

engine = create_engine('sqlite:///data/storage.sqlite', poolclass=NullPool)
Footprint.Base.metadata.create_all(engine)
session_factory = sessionmaker(bind=engine)
Session = scoped_session(session_factory)

def storeData(fp):
    s = Session()
    s.add(fp)
    s.commit()
Does anybody have an idea how to fix this annoying problem?
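There is no accepted answer in this thread, but a hedged sketch following the check_same_thread suggestion from the first question on this page might look like the following; the flag only disables sqlite3's own check, so the application still has to serialize writes.
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker, scoped_session
from sqlalchemy.pool import NullPool

engine = create_engine(
    'sqlite:///data/storage.sqlite',
    poolclass=NullPool,
    connect_args={'check_same_thread': False},  # do not reject cross-thread use
)
Session = scoped_session(sessionmaker(bind=engine))

def storeData(fp):
    s = Session()
    try:
        s.add(fp)
        s.commit()
    finally:
        Session.remove()  # give each thread a clean session afterwards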

sqlalchemy: stopping a long-running query

I have a seemingly straight-forward situation, but can't find a straight-forward solution.
I'm using sqlalchemy to query postgres. If a client timeout occurs, I'd like to stop/cancel the long running postgres queries from another thread. The thread has access to the Session or Connection object.
At this point I've tried:
session.bind.raw_connection().close()
and
session.connection().close()
and
session.close
and
session.transaction.close()
But no matter what I try, the postgres query still continues until its end. I know this from watching pg in top. Shouldn't this be fairly easy to do? Am I missing something? Is this impossible without getting the pid and sending a stop signal directly?
This seems to work well, so far:
def test_close_connection(self):
    import threading
    from psycopg2.extensions import QueryCanceledError
    from sqlalchemy.exc import DBAPIError

    session = Session()
    conn = session.connection()
    sql = self.get_raw_sql_for_long_query()
    seconds = 5
    t = threading.Timer(seconds, conn.connection.cancel)
    t.start()
    try:
        conn.execute(sql)
    except DBAPIError as e:
        if type(e.orig) == QueryCanceledError:
            print('Long running query was cancelled.')
    t.cancel()
source
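A related server-side alternative for Postgres, offered as a hedged sketch rather than a drop-in replacement: set statement_timeout on the connection so the server itself cancels anything that runs too long, which psycopg2 then surfaces as QueryCanceledError just like the manual cancel above. The DSN below is a placeholder and the 5-second value is arbitrary.
from sqlalchemy import create_engine

engine = create_engine(
    'postgresql+psycopg2://user:password@localhost/dbname',  # placeholder DSN
    connect_args={'options': '-c statement_timeout=5000'},   # milliseconds
)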
For those MySQL folks that may have ended up here, a modified version of this answer that kills the query from a second connection can work. Essentially the following, assuming pymysql under the hood:
thread_id = conn1.connection.thread_id()
t = threading.Timer(seconds, lambda: conn2.execute("kill {}".format(thread_id)))
The original connection will raise pymysql.err.OperationalError. See this other answer for a neat way to create a long running query for testing.
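A slightly more complete, hedged version of that MySQL sketch, with the assumptions spelled out: a mysql+pymysql engine (placeholder DSN), SQLAlchemy 1.x-style string execution as used elsewhere in this thread, and SELECT SLEEP(60) standing in for the real long-running query.
import threading
from sqlalchemy import create_engine
from sqlalchemy.exc import OperationalError

engine = create_engine('mysql+pymysql://<username>:<password>@<db_host>/<db_name>')
conn1 = engine.connect()   # runs the long query
conn2 = engine.connect()   # issues the KILL

thread_id = conn1.connection.thread_id()  # PyMySQL exposes the server-side connection id
t = threading.Timer(5, lambda: conn2.execute("KILL {}".format(thread_id)))
t.start()

try:
    conn1.execute("SELECT SLEEP(60)")  # stand-in for the real query
except OperationalError as e:
    print("Query was killed:", e)      # SQLAlchemy wraps pymysql.err.OperationalError
finally:
    t.cancel()
    conn1.close()
    conn2.close()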
I found that on MySQL you can specify query optimizer hints.
One such hint is MAX_EXECUTION_TIME, which specifies how long a query may execute before it is terminated.
You can add this in your app.py:
@event.listens_for(engine, 'before_execute', retval=True)
def intercept(conn, clauseelement, multiparams, params):
    from sqlalchemy.sql.selectable import Select

    # check if it's a select statement
    if isinstance(clauseelement, Select):
        # 'froms' represents the list of tables the statement is querying
        table = clauseelement.froms[0]
        # update the timeout here in ms (1 s = 1000 ms)
        timeout_ms = 4000
        # prepend the optimizer hint to the SELECT
        clauseelement = clauseelement.prefix_with(f"/*+ MAX_EXECUTION_TIME({timeout_ms}) */", dialect="mysql")
    return clauseelement, multiparams, params
See also: "SQLAlchemy query API not working correctly with hints" and the MySQL reference manual.

Pylons: Sharing SQLAlchemy MySQL connection with external library

I am running Pylons using SQLAlchemy to connect to MySQL, so when I want to use a database connection in a controller, I can do this:
from myapp.model.meta import Session

class SomeController(BaseController):
    def index(self):
        conn = Session.connection()
        rows = conn.execute('SELECT whatever')
        ...
Say my controller needs to call up an external library, that also needs a database connection, and I want to provide the connection for it from the SQLAlchemy MySQL connection that is already established:
from myapp.model.meta import Session
import mymodule

class SomeController(BaseController):
    def index(self):
        conn = Session.connection()
        myobject = mymodule.someobject(DATABASE_OBJECT)
        ...
        conn.close()
What should DATABASE_OBJECT be? Possibilities:
1. Pass Session, and then open and close Session.connection() in the module code
2. Pass conn, and then call conn.close() in the controller
3. Just pass the connection parameters, and have the module code set up its own connection
There is another wrinkle, which is that I need to instantiate some objects in app_globals.py, and these objects need a database connection as well. It seems that app_globals.py cannot use Session's SQLAlchemy connection yet -- it's not bound yet.
Is my architecture fundamentally unsound? Should I not be trying to share connections between Pylons and external libraries this way? Thanks!
You should not manage connections yourself - it's all done by SQLAlchemy. Just use the scoped session object everywhere, and you will be fine:
def init_model(engine):
    sm = orm.sessionmaker(autoflush=False, autocommit=False, expire_on_commit=False, bind=engine)
    meta.engine = engine
    meta.Session = orm.scoped_session(sm)

def index(self):
    rows = Session.execute('SELECT ...')
You can pass the Session object to your external library and run queries there as you wish. There is no need to call .close() on it.
Regarding app_globals, I solved that by adding another method to the Globals class, which is called from environment.py after the database has been initialized:
class Globals(...):
    def init_model(self, config):
        self.some_persistent_db_object = Session.execute('...')

def load_environment(...):
    ...
    config['pylons.app_globals'].init_model(config)
    return config
What should DATABASE_OBJECT be? Possibilities:
4. Pass a "proxy" or "helper" object with a higher-level abstraction interface.
Unless the external library really needs direct access to the SQLAlchemy session, you could provide it with an object that has methods like get_account(account_no) instead of execute(sql). Doing so would keep the SQLAlchemy-specific code more isolated, and the code would also be easier to test.
Sorry that this is not so much an answer to your original question, more a design suggestion.
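For what it's worth, a minimal sketch of what option 4 could look like; AccountRepository, get_account and the account table are made-up names for illustration only.
class AccountRepository(object):
    def __init__(self, session):
        self._session = session  # the Pylons scoped Session

    def get_account(self, account_no):
        return self._session.execute(
            'SELECT * FROM account WHERE account_no = :no',
            {'no': account_no},
        ).fetchone()

# in the controller:
# myobject = mymodule.someobject(AccountRepository(Session))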
