Do queries executed with the same SQLAlchemy session object use the same underlying connection? If not, is there a way to ensure this?
Some background: I have a need to use MySQL's named lock feature, i.e. GET_LOCK() and RELEASE_LOCK() functions. As far as the MySQL server is concerned, only the connection that obtained the lock can release it - so I have to make sure that I either execute these two commands within the same connection or the connection dies to ensure the lock is released.
To make things nicer, I have created a "locked" context like so:
from contextlib import contextmanager

from sqlalchemy import text

@contextmanager
def mysql_named_lock(session, name, timeout):
    """Get a named MySQL lock on a session."""
    lock = session.execute(text("SELECT GET_LOCK(:name, :timeout)"),
                           {"name": name, "timeout": timeout}).scalar()
    if lock:
        try:
            yield session
        finally:
            session.execute(text("SELECT RELEASE_LOCK(:name)"), {"name": name})
    else:
        e = "Could not obtain named lock {} within {} seconds".format(
            name, timeout)
        raise RuntimeError(e)
def my_critical_section(session):
    with mysql_named_lock(session, __name__, 10) as lockedsession:
        thing = lockedsession.query(MyStuff).one()
    return thing
I want to make sure that the two execute calls in mysql_named_lock happen on the same underlying connection or the connection is closed.
Can I assume this would "just work" or is there anything I need to be aware of here?
it will "just work" if (a) your session is a scoped_session and (b) you are using it in a non-concurrent fashion (same pid / thread). If you're too paranoid, make sure (assert) you're using the same connection ID via
session.connection().connection.thread_id()
Also, there is no point in passing the session as an argument. Initialize it once, somewhere in your application's global scope, then call it anywhere in your code and you will get the same connection ID.
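For illustration, a minimal sketch of that assertion (assuming a MySQLdb-backed engine, whose raw DBAPI connection exposes thread_id()):

# Sketch: capture the MySQL connection/thread id once, then assert the
# session is still on that same connection inside the locked block.
expected_id = session.connection().connection.thread_id()

with mysql_named_lock(session, __name__, 10) as lockedsession:
    assert lockedsession.connection().connection.thread_id() == expected_id
    thing = lockedsession.query(MyStuff).one()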
Related
I have an (not web) application that's continuously (once every few seconds) scanning a MySQL table for commands to handle certain jobs. Those jobs can take a while and/or may start delayed. For that reason I'm starting a thread for each job. The thread first removes the command from the table to avoid more threads for the same job to be created and then (or after a given delay) begins with the execution of the job. It is my understanding that for this purpose Contextual/Thread-local Sessions (https://docs.sqlalchemy.org/en/14/orm/contextual.html) are a good way to handle it.
So far so good. My question is about when and where to use Session.remove(), because while the article says that "if the application thread itself ends, the “storage” for that thread is also garbage collected", I don't think relying on garbage collection is good practice, and I should clean up properly myself whenever possible.
# database.py
from sqlalchemy.orm import sessionmaker, scoped_session
session_factory = sessionmaker(bind=some_engine)
Session = scoped_session(session_factory)
# Model definitions below
# command_handler.py
from threading import Thread
from time import sleep

from database import Session
from classes import Helper, Table1, Commands

def job(id):
    session = Session()                           # creating a thread-local session
    cmd = session.query(Commands).get(id)         # for this unique job thread
    command, timestamp = cmd.command, cmd.timestamp  # get contents of the command row
    session.delete(cmd)                           # before deleting it from the db
    session.commit()

    # potential delay via sleep() function here if timestamp is in the future

    for element in session.query(Table1).all():   # running a static class method
        Helper.some_method(element)               # on a class that's imported
    Session.remove()                              # remove the session after
                                                  # the job is done

while True:
    with Session() as session:                    # create a thread-local session
        commands = session.query(Commands).all()  # for the __main__ thread
        for cmd in commands:
            if cmd.command.startswith('SOMECOMMAND'):
                Thread(target=job, args=[cmd.id]).start()  # create a thread to handle the job
            # Other commands can be added here and functions
            # like job could also be imported from other files
    sleep(10)                                     # wait 10 sec before checking again
# helper.py
from database import Session
from classes import Table2
class Helper:
    @classmethod
    def some_method(cls, element):
        session = Session()              # should be the same session object
        row = Table2(name=element.name)  # as in the calling job function (?)
        session.add(row)                 # do some arbitrary stuff
        session.commit()                 # no Session.remove() required here (?)
Now I wonder if this is the correct way to handle it. The way I understand it, the advantage of having thread-local sessions is not having to pass them as function arguments since the same thread will always get the same session object when Session() is called, right?
If that's right then only one Session.remove() at the end of the job function should be required to clean up the session that was used within that particular thread.
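As a side note, the "same session per thread" behaviour is easy to verify with a small sketch (reusing the Session registry from database.py above):

from threading import Thread, get_ident

from database import Session

def show_identity():
    # Within one thread, repeated calls to the scoped_session registry
    # return the very same Session instance.
    s1, s2 = Session(), Session()
    print(get_ident(), s1 is s2)   # prints True in every thread
    Session.remove()               # discard this thread's session afterwards

for _ in range(3):
    Thread(target=show_identity).start()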
Is there an easy way to check how many sessions the scoped_session registry contains at any given point in time?
I have a generic multiprocess script that could run any task in a multiprocess set up. I inject the task as a command line argument and use getattr to call the functions in the injected code.
taskModule = importlib.import_module(taskFile.replace(".py", ""))
taskContext = getattr(taskModule, 'init')()
response = pool.map_async(getattr(taskModule, 'run'), inputList)
The init() function creates all relevant variables for the task to execute and returns them as a dict object - the taskContext. inputList is a list of dict objects, each dict containing both the taskContext object as well as the specific item to be processed, so that each process gets a unique item to process along with a copy of the context required by the task.
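For reference, inputList might be assembled something like this (the key names and the itemsToProcess list are hypothetical; the actual structure depends on the task):

# Hypothetical sketch: one dict per work item, pairing the shared task context
# with the specific item, then hand the whole list to the worker pool.
taskContext = getattr(taskModule, 'init')()
inputList = [{'context': taskContext, 'item': item} for item in itemsToProcess]
response = pool.map_async(getattr(taskModule, 'run'), inputList)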
One of those tasks is meant for FTP and the taskContext in that case contains information on the FTP server along with other details. The run function in the FTP task pretty much opens a connection using the context variables, uploads the required files and closes it, and this works perfectly.
However, I think it would be good if I could set up a connection pool with multiple FTP connections at the start, as part of the init() function when the context is created, and then use them in an as-available fashion within the run method - very similar to a DB connection pool that avoids the need to open and close a database connection every time.
Is this even feasible? If so, what's the best way to go about doing it?
I put together a connection_pool module as part of a proof of concept. I'm not sure how robust it is.
I added connection closing to this which is a bugfix of this.
I was able to set up connection pooling of FTP and SFTP connections transferring a few thousand files over 10-20 threads.
You can install my version from conda:
conda install -c jmeppley connectionpool
Creating an FTP pool looks something like this:
# this is from snakemake
def connect(*args_to_use, **kwargs_to_use):
ftp_base_class = (
ftplib.FTP_TLS if kwargs_to_use["encrypt_data_channel"] else ftplib.FTP
)
ftp_session_factory = ftputil.session.session_factory(
base_class=ftp_base_class,
port=kwargs_to_use["port"],
encrypt_data_channel=kwargs_to_use["encrypt_data_channel"],
debug_level=None,
)
return ftputil.FTPHost(
kwargs_to_use["host"],
kwargs_to_use["username"],
kwargs_to_use["password"],
session_factory=ftp_session_factory,
)
# a function to create a pool using the connect() method
def create_ftp_pool(pool_size, *args_to_use, **kwargs_to_use):
create_callback = partial(connect, *args_to_use, **kwargs_to_use)
connection_pool = ConnectionPool(create_callback,
close=lambda c: c.close(),
max_size=pool_size)
# create a pool with the arguments you'd use to create a connection
pool_size = 10
ftp_pool = create_ftp_pool(pool_size, host=...)
# use item() as a context manager
with ftp_pool.item() as connection:
...
I'm trying to update a really large amount of data in a MySQL db and, at the same time, watch the process list to see what it is doing.
So I made the following script:
I have a custom MySQL DB class that takes care of connecting. Everything works fine unless I use multiprocessing; if I do, at some point I get a "Lost connection to database" error.
The script is like:
from mysql import DB
import multiprocessing

def check_writing(db):
    result = db.execute("show full processlist").fetchall()
    for i in result:
        if i['State'] == "updating":
            print(i['Info'])

def main(db):
    # some work to create a big list of tuples called `data`
    sql = "update `table_name` set `field` = %s where `primary_key_id` = %s"
    monitor = multiprocessing.Process(target=check_writing, args=(db,))  # I create the monitor process
    monitor.start()
    db.execute_many(sql, data)  # I start to modify the table
    monitor.terminate()
    monitor.join()

if __name__ == "__main__":
    db = DB(host, user, password, database_name)  # this way I create the connected object
    main(db)
    db.close()
And part of my MySQL class is:
import MySQLdb

class DB:
    def __init__(self, host, user, password, db_name):
        self.db = MySQLdb.connect(host=host, ...)  # etc.

    def execute_many(self, sql, data):
        c = self.db.cursor()
        c.executemany(sql, data)
        c.close()
        self.db.commit()
As I said before, if I don't run check_writing, the script works fine!
Can someone explain what the cause is and how I can overcome it? I also have problems when trying to write to MySQL from a thread pool using map (or map_async).
Am I missing something related to MySQL?
There is a better way to approach that:
Connector/Python Connection Pooling:
The mysql.connector.pooling module implements pooling.
A pool opens a number of connections and handles thread safety when providing connections to requesters.
The size of a connection pool is configurable at pool creation time. It cannot be resized thereafter.
It is possible to have multiple connection pools. This enables applications to support pools of connections to different MySQL servers, for example.
Check the documentation here.
I think your parallel processes are exhausting your mysql connections.
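A minimal sketch of creating and using such a pool with Connector/Python (the connection parameters are placeholders):

from mysql.connector import pooling

# Placeholder connection parameters - replace with your own.
dbconfig = {
    "host": "localhost",
    "user": "user",
    "password": "password",
    "database": "database_name",
}

# One pool, created once at startup; each worker asks it for a connection.
pool = pooling.MySQLConnectionPool(
    pool_name="mypool",
    pool_size=5,          # fixed at creation time
    **dbconfig,
)

cnx = pool.get_connection()      # borrow a pooled connection
try:
    cur = cnx.cursor()
    cur.execute("SELECT 1")
    print(cur.fetchall())
    cur.close()
finally:
    cnx.close()                  # returns the connection to the pool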
I'm using gevent to handle API I/O on a Django-based web system.
I've monkey-patched using:
import gevent.monkey; gevent.monkey.patch_socket()
I've patched psycopg using:
import psycogreen; psycogreen.gevent.patch_psycopg()
Nonetheless, certain Django calls such as Model.save() are failing with the error "Asynchronous Connection Failed." Do I need to do something else to make Postgres greenlet-safe in the Django environment? Is there something else I'm missing?
There is an article on this problem; unfortunately it's in Russian. Let me quote the final part:

All the connections are stored in django.db.connections, which is an instance of django.db.utils.ConnectionHandler. Every time the ORM is about to issue a query, it requests a DB connection by calling connections['default']. In turn, ConnectionHandler.__getattr__ checks if there is a connection in ConnectionHandler._connections, and creates a new one if it is empty.

All opened connections should be closed after use. There is a signal, request_finished, which is run by django.http.HttpResponseBase.close. Django closes DB connections at the very last moment, when nobody could use them anymore - and that seems reasonable.

Yet there is a tricky part in how ConnectionHandler stores DB connections. It uses threading.local, which becomes gevent.local.local after monkeypatching. Declared once, this structure behaves as if it were unique in every greenlet. The controller some_view started its work in one greenlet, and now we've got a connection in ConnectionHandler._connections. Then we create a few more greenlets, which get an empty ConnectionHandler._connections, and they get connections from the pool. Once the new greenlets are done, the contents of their local() are gone, and the DB connections are gone with them without being returned to the pool. At some moment, the pool becomes empty.

Developing with Django+gevent you should always keep that in mind and close the DB connection by calling django.db.close_connection. It should be called on exceptions as well; you can use a decorator for that, something like:
class autoclose(object):
    def __init__(self, f=None):
        self.f = f

    def __call__(self, *args, **kwargs):
        with self:
            return self.f(*args, **kwargs)

    def __enter__(self):
        pass

    def __exit__(self, exc_type, exc_info, tb):
        from django.db import close_connection
        close_connection()
        return exc_type is None
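Hypothetical usage - decorate any function that touches the ORM from a greenlet, so its DB connection is closed (and thus not leaked from the pool) when the function exits:

# Made-up example function; the decorator works on views or helpers alike.
@autoclose
def fetch_users():
    from django.contrib.auth.models import User
    return list(User.objects.all())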
Scenario:
A .NET-based application server (Wonderware IAS/System Platform) hosts automation objects that communicate with various equipment on the factory floor.
CPython is hosted inside this application server (using Python for .NET).
The automation objects have scripting functionality built-in (using a custom, .NET-based language). These scripts call Python functions.
The Python functions are part of a system to track Work-In-Progress on the factory floor. The purpose of the system is to track the produced widgets along the process, ensure that the widgets go through the process in the correct order, and check that certain conditions are met along the process. The widget production history and widget state is stored in a relational database, this is where SQLAlchemy plays its part.
For example, when a widget passes a scanner, the automation software triggers the following script (written in the application server's custom scripting language):
' widget_id and scanner_id provided by automation object
' ExecFunction() takes care of calling a CPython function
retval = ExecFunction("WidgetScanned", widget_id, scanner_id);
' if the python function raises an Exception, ErrorOccured will be true
' in this case, any errors should cause the production line to stop.
if (retval.ErrorOccured) then
    ProductionLine.Running = False;
    InformationBoard.DisplayText = "ERROR: " + retval.Exception.Message;
    InformationBoard.SoundAlarm = True;
end if;
The script calls the WidgetScanned python function:
# pywip/functions.py
from pywip.database import session
from pywip.model import Widget, WidgetHistoryItem
from pywip import validation, StatusMessage
from datetime import datetime
def WidgetScanned(widget_id, scanner_id):
    widget = session.query(Widget).get(widget_id)
    validation.validate_widget_passed_scanner(widget, scanner_id)  # raises an exception on error
    widget.history.append(WidgetHistoryItem(timestamp=datetime.now(), action=u"SCANNED", scanner_id=scanner_id))
    widget.last_scanner = scanner_id
    widget.last_update = datetime.now()
    return StatusMessage("OK")

# ... there are a dozen similar functions
My question is: How do I best manage SQLAlchemy sessions in this scenario? The application server is a long-running process, typically running months between restarts. The application server is single-threaded.
Currently, I do it the following way:
I apply a decorator to the functions I make available to the application server:
# pywip/iasfunctions.py
from pywip import functions
from pywip.database import session

def ias_session_handling(func):
    def _ias_session_handling(*args, **kwargs):
        try:
            retval = func(*args, **kwargs)
            session.commit()
            return retval
        except:
            session.rollback()
            raise
    return _ias_session_handling
# ... actually I populate this module with decorated versions of all the functions in pywip.functions dynamically
WidgetScanned = ias_session_handling(functions.WidgetScanned)
Question: Is the decorator above suitable for handling sessions in a long-running process? Should I call session.remove()?
The SQLAlchemy session object is a scoped session:
# pywip/database.py
from sqlalchemy.orm import scoped_session, sessionmaker
session = scoped_session(sessionmaker())
I want to keep the session management out of the basic functions. For two reasons:
There is another family of functions, sequence functions. The sequence functions call several of the basic functions. One sequence function should equal one database transaction (see the sketch after this list).
I need to be able to use the library from other environments. a) From a TurboGears web application. In that case, session management is done by TurboGears. b) From an IPython shell. In that case, commit/rollback will be explicit.
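To illustrate the first reason, a sequence function could be wrapped by the same decorator so that the basic functions it calls share one session and a single commit. This is a hypothetical sketch; WidgetMoved is a made-up basic function:

# pywip/sequences.py -- hypothetical sketch, not part of the original code
from pywip import functions, StatusMessage
from pywip.iasfunctions import ias_session_handling

def _WidgetScannedAndMoved(widget_id, scanner_id, location_id):
    # Both basic functions run on the same scoped session; the decorator
    # below issues a single commit (or rollback) for the whole sequence.
    functions.WidgetScanned(widget_id, scanner_id)
    functions.WidgetMoved(widget_id, location_id)   # made-up basic function
    return StatusMessage("OK")

WidgetScannedAndMoved = ias_session_handling(_WidgetScannedAndMoved)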
(I am truly sorry for the long question. But I felt I needed to explain the scenario. Perhaps not necessary?)
The described decorator is suitable for long-running applications, but you can run into trouble if you accidentally share objects between requests. To make errors appear earlier and avoid corrupting anything, it is better to discard the session with session.remove():
try:
    try:
        retval = func(*args, **kwargs)
        session.commit()
        return retval
    except:
        session.rollback()
        raise
finally:
    session.remove()
Or, if you can, use the with context manager:
try:
    with session.registry().transaction:
        return func(*args, **kwargs)
finally:
    session.remove()
By the way, you might want to use .with_lockmode('update') on the query so your validate doesn't run on stale data.
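For example (with_lockmode is the older spelling; newer SQLAlchemy versions use with_for_update() instead):

# Inside WidgetScanned: take a row lock on the widget so the validation
# doesn't run against stale data while another scan is in flight.
widget = session.query(Widget).with_lockmode('update').get(widget_id)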
Ask your Wonderware administrator to give you access to the Wonderware Historian; you can track the values of the tags pretty easily via MSSQL calls over SQLAlchemy that you can poll every so often.
Another option is to use the ArchestrA toolkit to listen for the internal tag updates and have a server deployed as a platform in the galaxy that you can listen from.
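A rough polling sketch for the first option - the DSN, database and view/column names below are placeholders, so check them against your Historian's actual schema:

import time
from sqlalchemy import create_engine, text

# Placeholder connection string; point it at the Historian's SQL Server.
engine = create_engine("mssql+pyodbc://user:password@historian_dsn")

while True:
    with engine.connect() as conn:
        rows = conn.execute(text(
            "SELECT TagName, DateTime, Value FROM Runtime.dbo.Live "   # placeholder view
            "WHERE TagName IN ('Line1.Running', 'Line1.WidgetCount')"  # placeholder tags
        ))
        for row in rows:
            print(row)
    time.sleep(10)   # poll every so often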