SQLAlchemy session management in long-running process

SQLAlchemy session management in long-running process - python

Scenario:
A .NET-based application server (Wonderware IAS/System Platform) hosts automation objects that communicate with various equipment on the factory floor.
CPython is hosted inside this application server (using Python for .NET).
The automation objects have scripting functionality built-in (using a custom, .NET-based language). These scripts call Python functions.
The Python functions are part of a system to track Work-In-Progress on the factory floor. The purpose of the system is to track the produced widgets along the process, ensure that the widgets go through the process in the correct order, and check that certain conditions are met along the process. The widget production history and widget state is stored in a relational database, this is where SQLAlchemy plays its part.
For example, when a widget passes a scanner, the automation software triggers the following script (written in the application server's custom scripting language):
' wiget_id and scanner_id provided by automation object
' ExecFunction() takes care of calling a CPython function
retval = ExecFunction("WidgetScanned", widget_id, scanner_id);
' if the python function raises an Exception, ErrorOccured will be true
' in this case, any errors should cause the production line to stop.
if (retval.ErrorOccured) then
ProductionLine.Running = False;
InformationBoard.DisplayText = "ERROR: " + retval.Exception.Message;
InformationBoard.SoundAlarm = True
end if;
The script calls the WidgetScanned python function:
# pywip/functions.py
from pywip.database import session
from pywip.model import Widget, WidgetHistoryItem
from pywip import validation, StatusMessage
from datetime import datetime
def WidgetScanned(widget_id, scanner_id):
widget = session.query(Widget).get(widget_id)
validation.validate_widget_passed_scanner(widget, scanner) # raises exception on error
widget.history.append(WidgetHistoryItem(timestamp=datetime.now(), action=u"SCANNED", scanner_id=scanner_id))
widget.last_scanner = scanner_id
widget.last_update = datetime.now()
return StatusMessage("OK")
# ... there are a dozen similar functions
My question is: How do I best manage SQLAlchemy sessions in this scenario? The application server is a long-running process, typically running months between restarts. The application server is single-threaded.
Currently, I do it the following way:
I apply a decorator to the functions I make avaliable to the application server:
# pywip/iasfunctions.py
from pywip import functions
def ias_session_handling(func):
def _ias_session_handling(*args, **kwargs):
try:
retval = func(*args, **kwargs)
session.commit()
return retval
except:
session.rollback()
raise
return _ias_session_handling
# ... actually I populate this module with decorated versions of all the functions in pywip.functions dynamically
WidgetScanned = ias_session_handling(functions.WidgetScanned)
Question: Is the decorator above suitable for handling sessions in a long-running process? Should I call session.remove()?
The SQLAlchemy session object is a scoped session:
# pywip/database.py
from sqlalchemy.orm import scoped_session, sessionmaker
session = scoped_session(sessionmaker())
I want to keep the session management out of the basic functions. For two reasons:
There is another family of functions, sequence functions. The sequence functions call several of the basic functions. One sequence function should equal one database transaction.
I need to be able to use the library from other environments. a) From a TurboGears web application. In that case, session management is done by TurboGears. b) From an IPython shell. In that case, commit/rollback will be explicit.
(I am truly sorry for the long question. But I felt I needed to explain the scenario. Perhaps not necessary?)

The described decorator is suitable for long running applications, but you can run into trouble if you accidentally share objects between requests. To make the errors appear earlier and not corrupt anything it is better to discard the session with session.remove().
try:
try:
retval = func(*args, **kwargs)
session.commit()
return retval
except:
session.rollback()
raise
finally:
session.remove()
Or if you can use the with context manager:
try:
with session.registry().transaction:
return func(*args, **kwargs)
finally:
session.remove()
By the way, you might want to use .with_lockmode('update') on the query so your validate doesn't run on stale data.

Ask your WonderWare administrator to give you access to the Wonderware Historian, you can track the values of the tags pretty easily via MSSQL calls over sqlalchemy that you can poll every so often.
Another option is to use the archestra toolkit to listen for the internal tag updates and have a server deployed as a platform in the galaxy which you can listen from.

Related

How to handle long SQL query in Flask?

I have a Flask web app that might need to execute some heavy SQL queries via Sqlalchemy given user input. I would like to set a timeout for the query, let's say 20 seconds so if a query takes more than 20 seconds, the server will display an error message to user so they can try again later or with smaller inputs.
I have tried with both multiprocessing and threading modules, both with the Flask development server and Gunicorn without success: the server keeps blocking and no error message is returned. You'll find below an excerpt of the code.
How do you handle slow SQL query in Flask in a user friendly way?
Thanks.
from multiprocessing import Process
#app.route("/long_query")
def long_query():
query = db.session(User)
def run_query():
nonlocal query
query = query.all()
p = Process(target=run_query)
p.start()
p.join(20) # timeout of 20 seconds
if p.is_alive():
p.terminate()
return render_template("error.html", message="please try with smaller input")
return render_template("result.html", data=query)

I would recommend using Celery or something similar (people use python-rq for simple workflows).
Take a look at Flask documentation regarding Celery: http://flask.pocoo.org/docs/0.12/patterns/celery/
As for dealing with the results of a long-running query: you can create an endpoint for requesting task results and have client application periodically checking this endpoint until the results are available.

As leovp mentioned Celery is the way to go if you are working on a long-term project. However, if you are working on a small project where you want something easy to setup, I would suggest going with RQ and it's flask plugin. Also think very seriously about terminating processes while they are querying the database, since they might not be able to clean up after themselves (e.g. releasing locks they have on the db)
Well if you really want to terminate queries on a timeout, I would suggest you use a database that supports it (PostgreSQL is one). I will assume you use PostgreSQL for this section.
from sqlalchemy.interfaces import ConnectionProxy
class ConnectionProxyWithTimeouts(ConnectionProxy):
def cursor_execute(self, execute, cursor, statement, parameters, context, executemany):
timeout = context.execution_options.get('timeout', None)
if timeout:
c = cursor._parent.cursor()
c.execute('SET statement_timeout TO %d;' % int(timeout * 1000))
c.close()
ret = execute(cursor, statement, parameters, context)
c = cursor._parent.cursor()
c.execute('SET statement_timeout TO 0')
c.close()
return ret
else:
return execute(cursor, statement, parameters, context)
Then when you created an engine you would your own connection proxy
engine = create_engine(URL, proxy=TimeOutProxy(), pool_size=1, max_overflow=0)
And you could query then like this
User.query.execution_options(timeout=20).all()
If you want to use the code above, use it only as a base for your own implementation, since I am not 100% sure it's bug free.

SQLAlchemy session and connection relationship

Do queries executed with the same SQLAlchemy session object use the same underlying connection? If not, is there a way to ensure this?
Some background: I have a need to use MySQL's named lock feature, i.e. GET_LOCK() and RELEASE_LOCK() functions. As far as the MySQL server is concerned, only the connection that obtained the lock can release it - so I have to make sure that I either execute these two commands within the same connection or the connection dies to ensure the lock is released.
To make things nicer, I have created a "locked" context like so:
#contextmanager
def mysql_named_lock(session, name, timeout):
"""Get a named mysql lock on a session
"""
lock = session.execute("SELECT GET_LOCK(:name, :timeout)",
name=name, timeout=timeout).scalar()
if lock:
try:
yield session
finally:
session.execute("SELECT RELEASE_LOCK(:name)", name=name)
else:
e = "Count not obtain named lock {} within {} sections".format(
name, timeout)
raise RuntimeError(e)
def my_critical_section(session):
with mysql_named_lock(session, __name__, 10) as lockedsession:
thing = lockedsession.query(MyStuff).one()
return thing
I want to make sure that the two execute calls in mysql_named_lock happen on the same underlying connection or the connection is closed.
Can I assume this would "just work" or is there anything I need to be aware of here?

it will "just work" if (a) your session is a scoped_session and (b) you are using it in a non-concurrent fashion (same pid / thread). If you're too paranoid, make sure (assert) you're using the same connection ID via
session.connection().connection.thread_id()
also, there is no point to pass session as an argument. Init it once, somewhere in your application’s global scope, then call anywhere in a code, you will get the same connection ID.

Pyramid: Multi-threaded Database Operation

My application receives one or more URLs (typically 3-4 URLs) from the user, scrapes certain data from those URLs and writes those data to the database. However, because scraping those data take a little while, I was thinking of running each of those scraping in a separate thread so that the scraping + writing to the database can keep on going in the background so that the user does not have to keep on waiting.
To implement that, I have (relevant parts only):
#view_config(route_name="add_movie", renderer="templates/add_movie.jinja2")
def add_movie(request):
post_data = request.POST
if "movies" in post_data:
movies = post_data["movies"].split(os.linesep)
for movie_id in movies:
movie_thread = Thread(target=store_movie_details, args=(movie_id,))
movie_thread.start()
return {}
def store_movie_details(movie_id):
movie_details = scrape_data(movie_id)
new_movie = Movie(**movie_details) # Movie is my model.
print new_movie # Works fine.
print DBSession.add(movies(**movie_details)) # Returns None.
While the line new_movie does print correct scrapped data, DBSession.add() doesn't work. In fact, it just returns None.
If I remove the threads and just call the method store_movie_details(), it works fine.
What's going on?

Firstly, the SA docs on Session.add() do not mention anything about the method's return value, so I would assume it is expected to return None.
Secondly, I think you meant to add new_movie to the session, not movies(**movie_details), whatever that is :)
Thirdly, the standard Pyramid session (the one configured with ZopeTransactionExtension) is tied to Pyramid's request-response cycle, which may produce unexpected behavior in your situation. You need to configure a separate session which you will need to commit manually in store_movie_details. This session needs to use scoped_session so the session object is thread-local and is not shared across threads.
from sqlalchemy.orm import scoped_session
from sqlalchemy.orm import sessionmaker
session_factory = sessionmaker(bind=some_engine)
AsyncSession = scoped_session(session_factory)
def store_movie_details(movie_id):
session = AsyncSession()
movie_details = scrape_data(movie_id)
new_movie = Movie(**movie_details) # Movie is my model.
session.add(new_movie)
session.commit()
And, of course, this approach is only suitable for very light-weight tasks, and if you don't mind occasionally losing a task (when the webserver restarts, for example). For anything more serious have a look at Celery etc. as Antoine Leclair suggests.

The transaction manager closes the transaction when the response is returned. DBSession has no transaction in the other threads when the response has been returned. Also, it's probably not a good idea to share a transaction across threads.
This is a typical use case for a worker. Check out Celery or RQ.

gevent and posgres: Asynchronous connection failed

I'm using gevent to handle API I/O on a Django-based web system.
I've monkey-patched using:
import gevent.monkey; gevent.monkey.patch_socket()
I've patched psychopg using:
import psycogreen; psycogreen.gevent.patch_psycopg()
Nonetheless, certain Django calls so Model.save() are failing with the error: "Asynchronous Connection Failed." Do I need to do something else to make postgres greenlet-safe in the Django environment? Is there something else I'm missing?

there is an article on this promblem, unfortunately it's in Russian. Let me quote the final part:
All the connections are stored in django.db.connections, which is
the instance of django.db.utils.ConnectionHandler. Every time ORM
is about to issue a query, it requests a DB connection by calling
connections['default']. In turn, ConnectionHandler.__getattr__ checks if there is a connection in
ConnectionHandler._connections, and creates a new one if it is
empty.
All opened connections should be closed after use. There is a signal
request_finished, which is run by
django.http.HttpResponseBase.close. Django closes DB connections
at the very last moment, when nobody could use it anymore - and it
seems reasonable.
Yet there is tricky part about how ConnectionHandler stores DB
connections. It uses threading.local, which becomes
gevent.local.local after monkeypatching. Declared once, this
structure works just as it was unique at every greenlet. Controller
*some_view* started its work in one greenlet, and now we've got a connection in *ConnectionHandler._connections*. Then we create few
more greenlets and which get an empty
*ConnectionHandlers._connections*, and they've got connectinos from pool. After new greenlets done, the content of their local() is gone,
and DB connections gone withe them without being returned to pool. At
some moment, pool becomes empty
Developing Django+gevent you should always keep that in mind and close
the DB connection by calling django.db.close_connection. It
should be called at exception as well, you can use a decorator for
that, something like:
class autoclose(object):
def __init__(self, f=None):
self.f = f
def __call__(self, *args, **kwargs):
with self:
return self.f(*args, **kwargs)
def __enter__(self):
pass
def __exit__(self, exc_type, exc_info, tb):
from django.db import close_connection
close_connection()
return exc_type is None

Need to communicate update events to multiple running instances of the same python script

I need to communicate update events to all running instances of my python script, and i would like to keep the code as simple as possible. I have zero experience with communicating between running processes. Up until now, i have been reading/writing configuration files, which each instance will read and/or update.
Here is some pseudo code i have written (sort of a simple template) to wrap my head around how to solve this problem. Please try to help me fill in the blanks. And remember, i have no experience with sockets, threads, etc...
import process # imaginary module
class AppA():
def __init__(self):
# Every instance that opens will need to attach
# itself to the "Instance Manager". If no manager
# exists, then we need to spawn it. Only one manager
# will ever exist no matter how many instances are
# running.
try:
hm = process.get_handle(AppA_InstanceManager)
except NoSuchProgError:
hm.spawn_instance(AppA_InstanceManager)
finally:
hm.register(self)
self.instance_manager = hm
def state_update(self):
# This method won't exist in the real code, however,
# it emulates internal state changes for the sake of
# explaination.
#
# When any internal state changes happen, we will then
# propagate the changes outward by calling the
# appropriate method of "self.instance_manager".
self.instance_manager.propagate_state()
def cb_state_update(self):
# Called from the "Instance Manager" only!
#
# This may be as simple as reading a known
# config file. Or could simply pass data
# to this method.
class AppA_InstanceManager():
def __init__(self):
self.instances = []
def register_instance(self, instance):
self.instances.append(instance)
def unregister_instance(self, instance):
# nieve example for now.
self.instances.remove(instance)
def propagate_state(self):
for instance in self.instances:
instance.cb_state_update(data)
if __name__ == '__main__':
app = AppA()
Any Suggestions?

There are a few options for this kind of design.
You could use a message queue, its made for this kind of stuff, e.g. AMQP or some ZeroMQ or something like it.
Or you could use something like Redis or some other (in-memory) database for synchronization.
If you don't want to use something like that, you could use the multiprocessing modules synchronization stuff.
Or use a platform specific IPC system, e.g. shared memory via mmap, sysv sockets, etc.
If you want to do things the way you explained, you could have a look at Twisteds perspective broker.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

SQLAlchemy session management in long-running process - python

Related

How to handle long SQL query in Flask?

SQLAlchemy session and connection relationship

Pyramid: Multi-threaded Database Operation

gevent and posgres: Asynchronous connection failed

Need to communicate update events to multiple running instances of the same python script

Categories

Resources