I'm getting the following error (which I assume is because of the forking in my application), "This result object does not return rows".
Traceback
---------
File "/opt/miniconda/envs/analytical-engine/lib/python2.7/site-packages/dask/async.py", line 263, in execute_task
result = _execute_task(task, data)
File "/opt/miniconda/envs/analytical-engine/lib/python2.7/site-packages/dask/async.py", line 245, in _execute_task
return func(*args2)
File "/opt/miniconda/envs/analytical-engine/lib/python2.7/site-packages/smg/analytics/services/impact_analysis.py", line 140, in _do_impact_analysis_mp
Correlation.user_id.in_(user_ids)).all())
File "/opt/miniconda/envs/analytical-engine/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2241, in all
return list(self)
File "/opt/miniconda/envs/analytical-engine/lib/python2.7/site-packages/sqlalchemy/orm/loading.py", line 65, in instances
fetch = cursor.fetchall()
File "/opt/miniconda/envs/analytical-engine/lib/python2.7/site-packages/sqlalchemy/engine/result.py", line 752, in fetchall
self.cursor, self.context)
File "/opt/miniconda/envs/analytical-engine/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1027, in _handle_dbapi_exception
util.reraise(*exc_info)
File "/opt/miniconda/envs/analytical-engine/lib/python2.7/site-packages/sqlalchemy/engine/result.py", line 746, in fetchall
l = self.process_rows(self._fetchall_impl())
File "/opt/miniconda/envs/analytical-engine/lib/python2.7/site-packages/sqlalchemy/engine/result.py", line 715, in _fetchall_impl
self._non_result()
File "/opt/miniconda/envs/analytical-engine/lib/python2.7/site-packages/sqlalchemy/engine/result.py", line 720, in _non_result
"This result object does not return rows. "
I'm using dask and it's multiprocessing scheduler (which uses multiprocessing.Pool).
As I understand it (based on the documentation), sessions created from a scoped session object (created via scoped_session()), are threadsafe. This is because they are threadlocal. This would lead me to believe that when I call Session() (or using the proxy Session) I'm getting a session object that only exists and is only accessible from the thread it was called from.
This seems pretty straight forward.
What I am confused about, is what is why I'm having issues when forking the process. I understand that you can't
re-use an engine across processes, so I've followed the event-based solution verbatim from the docs and done this:
class _DB(object):
_engine = None
#classmethod
def _get_engine(cls, force_new=False):
if cls._engine is None or force_new is True:
cfg = Config.get_config()
user = cfg['USER']
host = cfg['HOST']
password = cfg['PASSWORD']
database = cfg['DATABASE']
engine = create_engine(
'mysql://{}:{}#{}/{}?local_infile=1&'
'unix_socket=/var/run/mysqld/mysqld.sock'.
format(user, password, host, database),
pool_size=5, pool_recycle=3600)
cls._engine = engine
return cls._engine
# From the docs, handles multiprocessing
#event.listens_for(_DB._get_engine(), "connect")
def connect(dbapi_connection, connection_record):
connection_record.info['pid'] = os.getpid()
#From the docs, handles multiprocessing
#event.listens_for(_DB._get_engine(), "checkout")
def checkout(dbapi_connection, connection_record, connection_proxy):
pid = os.getpid()
if connection_record.info['pid'] != pid:
connection_record.connection = connection_proxy.connection = None
raise exc.DisconnectionError(
"Connection record belongs to pid %s, "
"attempting to check out in pid %s" %
(connection_record.info['pid'], pid)
)
# The following is how I create the scoped session object.
Session = scoped_session(sessionmaker(
bind=_DB._get_engine(), autocommit=False, autoflush=False))
Base = declarative_base()
Base.query = Session.query_property()
So my assumptions (based on the docs) are the following:
Using a session object created from a scoped session object, it must always give me a threadlocal session (which in my case would just be the main thread of the child process). Although not in the docs I imagine this should apply even if the scoped session object was created in another process.
The threadlocal session will get a connection from the pool via the engine, if the connection was not created within this process it will create a new one (based on the above connection() and checkout() implementations.)
If both of these things were true, then everything should "just work" (AFAICT). That's not the case though.
I managed to get it to work by creating a new scoped session object in each new process, and using it
in all subsequent calls using a session.
BTW the Base.query attribute needed to be updated from this new scoped session object as well.
I imagine that my #1 assumption above is incorrect. Can anyone help me understand why I need to create a new scoped session object in each process?
Cheers.
It is not clear when your fork happens but the most common issue is that the engine is created before the fork, which initializes a TCP connections to the database with your pool_size=5 which then gets copied over to the new processes and results in multiple processes interacting with the same physical sockets => troubles.
Options are to:
Disable the pool and use an on demand connection: poolclass=NullPool
Re-create the pool after fork: sqla_engine.dispose()
Delay the create_engine until after the fork
Related
After a long search, I could not find an answer to my question and if what I desire is even possible. My question concerns a MySQL connection implementation for a Flask API. What I desire to implement is as follows:
When the Flask app is started, a create_db_connection method is called, which created a number of mysql connections in a pooling object.
For each incoming request, a get_connection method is called, to get one connection from the poule
And of course when the request is ended, a method close_connection is called to close the connection and mark it available in the pool.
The problem I'm having concerns persistently storing the connection pool such that it can be re-used for each request.
create_db_connection method:
def create_db_connection():
print("-----INITIALISING-----")
db_pool = mysql.connector.pooling.MySQLConnectionPool(pool_name = "BestLiar_Public_API",
pool_size = 10,
autocommit = True,
pool_reset_session = True,
user = 'user',
password = 'pass',
host = 'hostel',
database =' db')
print("-----DB POOL INITIALISED-----")
// SOLUTION TO PERSISTENTLY SAVE db_pool OBJECT
get_connection method:
def __enter__(self):
try:
// SOLUTION TO FETCH THE db_pool OBJECT > NAMED AS db_pool IN LINE BELOW
self.con = db_pool.get_connection()
self.cur = self.con.cursor(dictionary=True)
if self.con.is_connected():
return {'cur': self.cur, 'con': self.con}
else:
raise NoConnectionError("No database connection", "Pool connection not connected")
except mysql.connector.PoolError:
raise SystemOverload("Too many requests, could not process","No pool connection available")
except:
raise NoConnectionError("No database connection", "Unknown reason")
close_connection method:
def __exit__(self, type, value, traceback):
if self.con:
self.cur.close()
self.con.close()
I have tried storing the db_pool object as a global variable (undesirable) and have tried the flask global object (only works for one request).
Anyone who has the key to the solution?
I have a multi tenancy python falcon app. Every tenant have their own database. On incoming request, I need to connect to tenant database.
But there is a situation here. Database configs are stored on another service and configs changing regularly.
I tried session create before process resource. But sql queries slowing down after this change. To make this faster, what should I do?
P.S. : I use PostgreSQL
from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session
from sqlalchemy.orm import sessionmaker
import config
import json
import requests
class DatabaseMiddleware:
def __init__(self):
pass
def process_resource(self, req, resp, resource, params):
engineConfig = requests.get('http://database:10003/v1/databases?loadOnly=config&appId=06535108-111a-11e9-ab14-d663bd873d93').text
engineConfig = json.loads(engineConfig)
engine = create_engine(
'{dms}://{user}:{password}#{host}:{port}/{dbName}'.format(
dms= engineConfig[0]['config']['dms'],
user= engineConfig[0]['config']['user'],
password= engineConfig[0]['config']['password'],
host= engineConfig[0]['config']['host'],
port= engineConfig[0]['config']['port'],
dbName= engineConfig[0]['config']['dbName']
))
session_factory = sessionmaker(bind=engine,autoflush=True)
databaseSession = scoped_session(session_factory)
resource.databaseSession = databaseSession
def process_response(self, req, resp, resource, req_succeeded):
if hasattr(resource, 'mainDatabase'):
if not req_succeeded:
resource.databaseSession.rollback()
self.databaseSession.remove()
Your approach is probably wrong since it is against the intended usage pattern of engine instances described in engine disposal. The lifetime of engine instance should be the same as for the instance of your middleware.
The Engine refers to a connection pool, which means under normal circumstances, there are open database connections present while the Engine object is still resident in memory. When an Engine is garbage collected, its connection pool is no longer referred to by that Engine, and assuming none of its connections are still checked out, the pool and its connections will also be garbage collected, which has the effect of closing out the actual database connections as well. But otherwise, the Engine will hold onto open database connections assuming it uses the normally default pool implementation of QueuePool.
The Engine is intended to normally be a permanent fixture established up-front and maintained throughout the lifespan of an application. It is not intended to be created and disposed on a per-connection basis; it is instead a registry that maintains both a pool of connections as well as configurational information about the database and DBAPI in use, as well as some degree of internal caching of per-database resources.
In conjunction with SQLAlchemy, I use SQLService as an interface layer to SQLAlchemy's session manager and ORM layer, which nicely centralizes the core functionality of SQLAlchemy.
Here is my middleware component definition:
class DatabaseSessionComponent(object):
""" Initiates a new Session for incoming request and closes it in the end. """
def __init__(self, sqlalchemy_database_uri):
self.sqlalchemy_database_uri = sqlalchemy_database_uri
def process_resource(self, req, resp, resource, params):
resource.db = sqlservice.SQLClient(
self.sqlalchemy_database_uri,
model_class=BaseModel
)
def process_response(self, req, resp, resource):
if hasattr(resource, "db"):
resource.db.disconnect()
With its instantiation within the API's instantiation here:
api = falcon.API(
middleware=[
DatabaseSessionComponent(os.environ["SQLALCHEMY_DATABASE_URI"]),
]
)
I have a problem with every insert query (little query) which is executed in celery tasks asynchronously.
In sync mode when i do insert all done great, but when it executed in apply_async() i get this:
OperationTimedOut('errors=errors=errors={}, last_host=***.***.*.***, last_host=None, last_host=None',)
Traceback:
Traceback (most recent call last):
File "/var/nfs_www/***/env_v0/local/lib/python2.7/site-packages/celery/app/trace.py", line 240, in trace_task
R = retval = fun(*args, **kwargs)
File "/var/nfs_www/***/env_v0/local/lib/python2.7/site-packages/celery/app/trace.py", line 437, in __protected_call__
return self.run(*args, **kwargs)
File "/var/nfs_www/***/www_v1/app/mods/news_feed/tasks.py", line 26, in send_new_comment_reply_notifications
send_new_comment_reply_notifications_method(comment_id)
File "/var/nfs_www/***www_v1/app/mods/news_feed/methods.py", line 83, in send_new_comment_reply_notifications
comment_type='comment_reply'
File "/var/nfs_www/***/www_v1/app/mods/news_feed/models/storage.py", line 129, in add
CommentsFeed(**kwargs).save()
File "/var/nfs_www/***/env_v0/local/lib/python2.7/site-packages/cqlengine/models.py", line 531, in save
consistency=self.__consistency__).save()
File "/var/nfs_www/***/env_v0/local/lib/python2.7/site-packages/cqlengine/query.py", line 907, in save
self._execute(insert)
File "/var/nfs_www/***/env_v0/local/lib/python2.7/site-packages/cqlengine/query.py", line 786, in _execute
tmp = execute(q, consistency_level=self._consistency)
File "/var/nfs_www/***/env_v0/local/lib/python2.7/site-packages/cqlengine/connection.py", line 95, in execute
result = session.execute(query, params)
File "/var/nfs_www/***/env_v0/local/lib/python2.7/site-packages/cassandra/cluster.py", line 1103, in execute
result = future.result(timeout)
File "/var/nfs_www/***/env_v0/local/lib/python2.7/site-packages/cassandra/cluster.py", line 2475, in result
raise OperationTimedOut(errors=self._errors, last_host=self._current_host)
OperationTimedOut: errors={}, last_host=***.***.*.***
Does anyone have ideas about problem?
I found this When cassandra-driver was executing the query, cassandra-driver returned error OperationTimedOut, but my query is very little and problem only in celery tasks.
UPDATE:
I made a test task and it raises this error too.
#celery.task()
def test_task_with_cassandra():
from app import cassandra_session
cassandra_session.execute('use news_feed')
return 'Done'
UPDATE 2:
Made this:
#celery.task()
def test_task_with_cassandra():
from cqlengine import connection
connection.setup(app.config['CASSANDRA_SERVERS'], port=app.config['CASSANDRA_PORT'],
default_keyspace='test_keyspace')
from .models import Feed
Feed.objects.count()
return 'Done'
Got this:
NoHostAvailable('Unable to connect to any servers', {'***.***.*.***': OperationTimedOut('errors=errors=Timed out creating connection, last_host=None, last_host=None',)})
From shell i can connect to it
UPDATE 3:
From deleted thread on github issue (found this in my emails): (this worked for me too)
Here's how, in substance, I plug CQLengine to Celery:
from celery import Celery
from celery.signals import worker_process_init, beat_init
from cqlengine import connection
from cqlengine.connection import (
cluster as cql_cluster, session as cql_session)
def cassandra_init():
""" Initialize a clean Cassandra connection. """
if cql_cluster is not None:
cql_cluster.shutdown()
if cql_session is not None:
cql_session.shutdown()
connection.setup()
# Initialize worker context for both standard and periodic tasks.
worker_process_init.connect(cassandra_init)
beat_init.connect(cassandra_init)
app = Celery()
This is crude, but works. Should we add this snippet in the FAQ ?
I had a similar issue. It seemed to be related to sharing the Cassandra session between tasks. I solved it by creating a session per thread. Make sure you call get_session() from you tasks and then do this:
thread_local = threading.local()
def get_session():
if hasattr(thread_local, "cassandra_session"):
return thread_local.cassandra_session
cluster = Cluster(settings.CASSANDRA_HOSTS)
session = cluster.connect(settings.CASSANDRA_KEYSPACE)
thread_local.cassandra_session = session
return session
Inspired by Ron's answer, I come up with the following code to put in tasks.py:
import threading
from django.conf import settings
from cassandra.cluster import Cluster
from celery.signals import worker_process_init,worker_process_shutdown
thread_local = threading.local()
#worker_process_init.connect
def open_cassandra_session(*args, **kwargs):
cluster = Cluster([settings.DATABASES["cassandra"]["HOST"],], protocol_version=3)
session = cluster.connect(settings.DATABASES["cassandra"]["NAME"])
thread_local.cassandra_session = session
#worker_process_shutdown.connect
def close_cassandra_session(*args,**kwargs):
session = thread_local.cassandra_session
session.shutdown()
thread_local.cassandra_session = None
This neat solution will automatically open/close cassandra sessions when celery worker process starts and stops.
Side note: protocol_version=3, because Cassandra 2.1 only supports protocol versions 3 and lower.
The other answers didn't work for me, but the question's 'update 3' did. Here's what I ended up with (small updates to the suggestion within the question):
from celery.signals import worker_process_init
from cassandra.cqlengine import connection
from cassandra.cqlengine.connection import (
cluster as cql_cluster, session as cql_session)
def cassandra_init(*args, **kwargs):
""" Initialize a clean Cassandra connection. """
if cql_cluster is not None:
cql_cluster.shutdown()
if cql_session is not None:
cql_session.shutdown()
connection.setup(settings.DATABASES["cassandra"]["HOST"].split(','), settings.DATABASES["cassandra"]["NAME"])
# Initialize worker context (only standard tasks)
worker_process_init.connect(cassandra_init)
Using django-cassandra-engine the following resolved the issue for me:
db_connection = connections['cassandra']
#worker_process_init.connect
def connect_db(**_):
db_connection.reconnect()
#worker_shutdown.connect
def disconnect(**_):
db_connection.connection.close_all()
look at here
Update 3/4:
I've done some testing and proved that using checkout event handler to check disconnects works with Elixir. Beginning to think my problem has something to do with calling session.commit() from a subprocess? Update: I just disproved myself by calling session.commit() in a subprocess, updated example below. I'm using the multiprocessing module to create the subprocess.
Here's the code that shows how it should work (without even using pool_recycle!):
from sqlalchemy import exc
from sqlalchemy import event
from sqlalchemy.pool import Pool
from elixir import *
import multiprocessing as mp
class SubProcess(mp.Process):
def run(self):
a3 = TestModel(name="monkey")
session.commit()
class TestModel(Entity):
name = Field(String(255))
#event.listens_for(Pool, "checkout")
def ping_connection(dbapi_connection, connection_record, connection_proxy):
cursor = dbapi_connection.cursor()
try:
cursor.execute("SELECT 1")
except:
# optional - dispose the whole pool
# instead of invalidating one at a time
# connection_proxy._pool.dispose()
# raise DisconnectionError - pool will try
# connecting again up to three times before raising.
raise exc.DisconnectionError()
cursor.close()
from sqlalchemy import create_engine
metadata.bind = create_engine("mysql://foo:bar#localhost/some_db", echo_pool=True)
setup_all(True)
subP = SubProcess()
a1 = TestModel(name='foo')
session.commit()
# pool size is now three.
print "Restart the server"
raw_input()
subP.start()
#a2 = TestModel(name='bar')
#session.commit()
Update 2:
I'm forced to find another solution as post 1.2.2 versions of MySQL-python drops support for the reconnect param. Anyone got a solution? :\
Update 1 (old-solution, doesn't work for MySQL-python versions > 1.2.2):
Found a solution: passing connect_args={'reconnect':True} to the create_engine call fixes the problem, automagically reconnects. Don't even seem to need the checkout event handler.
So, in the example from the question:
metadata.bind = create_engine("mysql://foo:bar#localhost/db_name", pool_size=100, pool_recycle=3600, connect_args={'reconnect':True})
Original question:
Done quite a bit of Googling for this problem and haven't seem to found a solution specific to Elixir - I'm trying to use the "Disconnect Handling - Pessimistic" example from the SQLAlchemy docs to handle MySQL disconnects. However, when I test this (by restarting the MySQL server), the "MySQL server has gone away" error is raised before before my checkout event handler.
Here's the code I use to initialize elixir:
##### Initialize elixir/SQLAlchemy
# Disconnect handling
from sqlalchemy import exc
from sqlalchemy import event
from sqlalchemy.pool import Pool
#event.listens_for(Pool, "checkout")
def ping_connection(dbapi_connection, connection_record, connection_proxy):
logging.debug("***********ping_connection**************")
cursor = dbapi_connection.cursor()
try:
cursor.execute("SELECT 1")
except:
logging.debug("######## DISCONNECTION ERROR #########")
# optional - dispose the whole pool
# instead of invalidating one at a time
# connection_proxy._pool.dispose()
# raise DisconnectionError - pool will try
# connecting again up to three times before raising.
raise exc.DisconnectionError()
cursor.close()
metadata.bind= create_engine("mysql://foo:bar#localhost/db_name", pool_size=100, pool_recycle=3600)
setup_all()
I create elixir entity objects and save them with session.commit(), during which I see the "ping_connection" message generated from the event defined above. However, when I restart the mysql server and test it again, it fails with the mysql server has gone away message just before the ping connection event.
Here's the stack trace starting from the relevant lines:
File "/usr/local/lib/python2.6/dist-packages/elixir/entity.py", line 1135, in get_by
return cls.query.filter_by(*args, **kwargs).first()
File "/usr/local/lib/python2.6/dist-packages/sqlalchemy/orm/query.py", line 1963, in first
ret = list(self[0:1])
File "/usr/local/lib/python2.6/dist-packages/sqlalchemy/orm/query.py", line 1857, in __getitem__
return list(res)
File "/usr/local/lib/python2.6/dist-packages/sqlalchemy/orm/query.py", line 2032, in __iter__
return self._execute_and_instances(context)
File "/usr/local/lib/python2.6/dist-packages/sqlalchemy/orm/query.py", line 2047, in _execute_and_instances
result = conn.execute(querycontext.statement, self._params)
File "/usr/local/lib/python2.6/dist-packages/sqlalchemy/engine/base.py", line 1399, in execute
params)
File "/usr/local/lib/python2.6/dist-packages/sqlalchemy/engine/base.py", line 1532, in _execute_clauseelement
compiled_sql, distilled_params
File "/usr/local/lib/python2.6/dist-packages/sqlalchemy/engine/base.py", line 1640, in _execute_context
context)
File "/usr/local/lib/python2.6/dist-packages/sqlalchemy/engine/base.py", line 1633, in _execute_context
context)
File "/usr/local/lib/python2.6/dist-packages/sqlalchemy/engine/default.py", line 330, in do_execute
cursor.execute(statement, parameters)
File "/usr/lib/pymodules/python2.6/MySQLdb/cursors.py", line 166, in execute
self.errorhandler(self, exc, value)
File "/usr/lib/pymodules/python2.6/MySQLdb/connections.py", line 35, in defaulterrorhandler
raise errorclass, errorvalue
OperationalError: (OperationalError) (2006, 'MySQL server has gone away')
The final workaround was calling session.remove() in the start of methods before manipulating and loading elixir entities. What this does is it will return the connection to the pool, so that when it's used again, the pool's checkout event will be fired, and our handler will detect the disconnection. From SQLAlchemy docs:
It’s not strictly necessary to remove the session at the end of the request - other options include calling Session.close(), Session.rollback(), Session.commit() at the end so that the existing session returns its connections to the pool and removes any existing transactional context. Doing nothing is an option too, if individual controller methods take responsibility for ensuring that no transactions remain open after a request ends.
Quite an important little piece of information I wish it were mentioned in the elixir docs. But then I guess it assumes prior knowledge with SQLAlchemy?
the actual problem is sqlalchemy giving you the same session every time you call the sessionmaker factory. Due to this it can happen that a later query is performed with a much earlier opened session as long as you did not call session.remove() on the session. Having to remember calling remove() every time you request a session however is no fun and sqlalchemy provides a much simpler thing: contexual "scoped" sessions.
To create a scoped session simply wrap your sessionmaker:
from sqlalchemy.orm import scoped_session, sessionmaker
Session = scoped_session(sessionmaker())
This way you get a contexual bound session every time you call the factory, meaning sqlalchemy calls the session.remove() for you as soon as the calling function exits. See here: sqlalchemy - lifespan of a contextual session
Are you using the same session for both (before and after mysqld restart) operations? If so, the "checkout" event occurs only when new transaction is started. When you call commit() the new transaction is started (unless you use autocommit mode) and connection is checked out. So you restart mysqld after checkout.
The simple hack with commit() or rollback() call just before the second operation (and after restarting mysqld) should solve your problem. Otherwise consider using new fresh session each time you wait long time after previous commit.
I'm not sure if this is the same problem that I had, but here goes:
When I encountered MySQL server has gone away, I solved it using create_engine(..., pool_recycle=3600), see http://www.sqlalchemy.org/docs/dialects/mysql.html#connection-timeouts
I'm using sqlobject in Python. I connect to the database with
conn = connectionForURI(connStr)
conn.makeConnection()
This succeeds, and I can do queries on the connection:
g_conn = conn.getConnection()
cur = g_conn.cursor()
cur.execute(query)
res = cur.fetchall()
This works as intended. However, I also defined some classes, e.g:
class User(SQLObject):
class sqlmeta:
table = "gui_user"
username = StringCol(length=16, alternateID=True)
password = StringCol(length=16)
balance = FloatCol(default=0)
When I try to do a query using the class:
User.selectBy(username="foo")
I get an exception:
...
File "c:\python25\lib\site-packages\SQLObject-0.12.4-py2.5.egg\sqlobject\main.py", line 1371, in selectBy
conn = connection or cls._connection
File "c:\python25\lib\site-packages\SQLObject-0.12.4-py2.5.egg\sqlobject\dbconnection.py", line 837, in __get__
return self.getConnection()
File "c:\python25\lib\site-packages\SQLObject-0.12.4-py2.5.egg\sqlobject\dbconnection.py", line 850, in getConnection
"No connection has been defined for this thread "
AttributeError: No connection has been defined for this thread or process
How do I define a connection for a thread? I just realized I can pass in a connection keyword which I can give conn to to make it work, but how do I get it to work if I weren't to do that?
Do:
from sqlobject import sqlhub, connectionForURI
sqlhub.processConnection = connectionForURI(connStr)