I am trying to use SQLAlchemy with Flask (without the Flask-SQLAlchemy extension). I am struggling with how to integrate SQLAlchemy's session object with Flask.
I have seen several approaches:
1) Create a new session every time I want to query the database and close it at the end of the request (in the request function itself rather than taking help from Flask). This does not scale well, as it quickly exhausts the SQLAlchemy connection pool as the number of requests increases.
2) Use a scoped session, query the database, and remove the scoped session at the end of the request using Flask's app.after_request decorator. This approach works, but I am confused about how to make use of it: some places recommend using Flask's global request variable g to keep track of which session is allocated to which request, while others do not use it at all.
I have decided to use the second option, as the first is not scalable for large applications.
I would like to know if there are any pitfalls with the second approach, such as concurrent sessions handling the same objects, with or without the g variable. Also, does not using the g variable have any effect on SQLAlchemy's scoped_session?
Any help would be appreciated.
Related
As I cannot use Flask-SQLAlchemy, because my model definitions and the database part of the app are also used in contexts other than Flask, I found several ways to manage sessions and I am not sure what to do.
One thing that everyone (including me) seems to agree on is that a new session should be created at the beginning of each request and be committed and closed when the request has been processed and the response is ready to be sent back to the client.
Currently, I implemented the session management that way:
I have a database initialization Python script which creates the engine (engine = create_engine(app.config["MYSQL_DATABASE_URI"])) and defines the session maker: Session = sessionmaker(bind=engine, expire_on_commit=False).
In another file I defined two functions decorated with Flask's before_request and teardown_request application decorators:
from flask import g  # Session is the sessionmaker from the initialization script

@app.before_request
def create_db_session():
    g.db_session = Session()

@app.teardown_request
def close_db_session(exception):
    try:
        g.db_session.commit()
    except Exception:
        g.db_session.rollback()
    finally:
        g.db_session.close()
I then use g.db_session when I need to perform queries: g.db_session.query(models.User.user_id).filter_by(username=username)
Is this a correct way to manage sessions?
I also took a look at the scoped sessions proposed by SQLAlchemy, and this might be another way of doing things, but I am not sure how to change my system to use them.
If I understand it correctly, I would not use the g variable; instead I would always refer to the Session definition declared by Session = scoped_session(sessionmaker(bind=engine, expire_on_commit=False)), and I would not need to initialize a new session explicitly when a request arrives.
I could just perform my queries as usual with Session.query(models.User.user_id).filter_by(username=username) and I would just need to remove the session when the request ends:
@app.teardown_request
def close_db_session(exception):
    Session.commit()
    Session.remove()
I am a bit lost with this session management topic and would need help understanding how to manage sessions. Is there a real difference between the two approaches above?
Your approach of managing the session via flask.g is completely acceptable from my point of view. Whatever we are trying to do with SQLAlchemy, one must remember the basic principles:
Always clean up after yourself. At web application runtime, if you spawn a lot of sessions without .close()-ing them, you will eventually exhaust the connections available on your DB instance. You are handling this by calling session.close() in a finally block.
Maintain session independence. It's not good if various application contexts (requests, threads, etc.) share the same session instance, because that is not deterministic. You are doing this by ensuring only one session runs per request.
The scoped_session can be considered an alternative to flask.g: it ensures that within one thread, each call to Session() returns the same object (https://docs.sqlalchemy.org/en/13/orm/contextual.html#unitofwork-contextual).
It's SQLAlchemy's batteries-included version of your session management code.
As long as you are using Flask, which is a synchronous framework, I don't think you will have any issues with this setup.
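For reference, a minimal sketch of the scoped_session wiring, reusing the engine configuration from the question (teardown_appcontext is used here instead of teardown_request, following Flask's documented SQLAlchemy pattern; adapt as needed):

from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker

engine = create_engine(app.config["MYSQL_DATABASE_URI"])
Session = scoped_session(sessionmaker(bind=engine, expire_on_commit=False))

@app.teardown_appcontext
def cleanup_session(exception):
    if exception is None:
        Session.commit()
    else:
        Session.rollback()
    # remove() closes the thread-local session and returns its
    # connection to the pool; the next Session() call starts fresh.
    Session.remove()

Any code running inside a request can then call Session.query(...) directly, without touching g.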
I have seen in many places that using thread-local storage to store data in a Django application is not a good practice.
But this is the only way I could store my request object. I need to store it because my application has a complex structure, and I can't keep passing the request object through every function call or class initialization.
I need the cookies and headers from my request object to be passed to some API calls I'm making in different places in the application.
I'm using this for reference:
https://blndxp.wordpress.com/2016/03/04/django-get-current-user-anywhere-in-your-code-using-a-middleware/
So I'm using a middleware, as mentioned in the reference.
This is how the request is stored:
from threading import local
_thread_locals = local()
_thread_locals.request = request
And this is how the data is fetched:
getattr(_thread_locals, "request", None)
So is the data stored in thread locals local to that particular request? Or if another request takes place at the same time, do both of them use the same data? (Which is certainly not what I want.)
Or is there any new way of dealing with this old problem (storing the request object globally)?
Note: I'm also using async in places in my Django application (if that matters).
Yes, using thread-local storage in Django is safe.
Django uses one thread to handle each request. Django also uses thread-local data itself, for instance to store the currently activated locale. While app servers such as Gunicorn and uWSGI can be configured to use multiple threads, each request will still be handled by a single thread.
However, there have been conflicting opinions on whether using thread-locals is an elegant and well-designed solution. The reasons against using thread-locals boil down to the same reasons why global variables are considered bad practice. This answer discusses a number of them.
Still, storing the request object in thread-local data has become a widely used pattern in the Django community. There is even an app Django-CRUM that contains a CurrentRequestUserMiddleware class and the functions get_current_user() and get_current_request().
Note that as of version 3.0, Django has started to implement asynchronous support. I'm not sure what its implications are for apps like Django-CRUM. For the foreseeable future, however, thread-locals can safely be used with Django.
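For completeness, a minimal sketch of the middleware pattern from the linked post (the class and helper names here are my own):

from threading import local

_thread_locals = local()

class ThreadLocalRequestMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        _thread_locals.request = request
        try:
            return self.get_response(request)
        finally:
            # Clear the reference so the next request handled by this
            # thread never sees stale data.
            _thread_locals.request = None

def get_current_request():
    # Returns the request for the current thread, or None outside one.
    return getattr(_thread_locals, "request", None)

Register the middleware class in MIDDLEWARE, then call get_current_request() anywhere downstream.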
Assume a Flask application that allows building an object (server-side) through a number of steps (wizard-like; client-side).
I would like to create an initial object server-side and build it up step by step from the client-side input, keeping the object 'alive' throughout the whole build process. A unique id will be associated with the creation of each new object / wizard.
Serving the Flask application through WSGI on Apache, requests can go through multiple instances of the Flask application / multiple threads.
How do I keep this object alive server-side, or in other words how to keep some kind of global state?
I'd like to keep the object in memory, not serialize/deserialize it to/from disk. No cookies either.
Edit:
I'm aware of the Flask.g object, but since it exists on a per-request basis it is not a valid solution.
Perhaps it is possible to use some kind of cache layer, e.g.:
from werkzeug.contrib.cache import SimpleCache
cache = SimpleCache()
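For concreteness, this is roughly how I would use it (hypothetical wizard functions, keyed by the wizard's unique id):

def start_wizard(wizard_id):
    # Create the initial server-side object for this wizard.
    cache.set(wizard_id, {"step": 0}, timeout=3600)

def advance_wizard(wizard_id, data):
    # Fetch, mutate, and store the object again on each step.
    state = cache.get(wizard_id)
    state.update(data)
    cache.set(wizard_id, state, timeout=3600)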
Is this a valid solution? Does this layer live across multiple app instances?
You're looking for sessions.
You said you don't want to use cookies, but did you mean you don't want to store the data in a cookie, or are you avoiding cookies entirely? For the former case, take a look at server-side sessions, e.g. Flask-KVSession:
Instead of storing data on the client, only a securely generated ID is stored on the client, while the actual session data resides on the server.
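A minimal sketch of that wiring, based on the Flask-KVSession docs (DictStore keeps data in the local process only; swap in a Redis-backed simplekv store so the session data survives across multiple app instances):

from flask import Flask, session
from flask_kvsession import KVSessionExtension
from simplekv.memory import DictStore

app = Flask(__name__)
app.secret_key = "change-me"  # signs the session id cookie

store = DictStore()
KVSessionExtension(store, app)  # replaces Flask's cookie-based sessions

@app.route("/wizard/<wizard_id>/step", methods=["POST"])
def wizard_step(wizard_id):
    # The wizard object now lives server-side; only the session id
    # travels in the cookie.
    wizards = session.setdefault("wizards", {})
    wizards[wizard_id] = {"step": wizards.get(wizard_id, {}).get("step", 0) + 1}
    session.modified = True
    return "ok"

The hypothetical /wizard route just shows that handlers keep using the normal session object.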
I have an MS-SQL deployed on AWS RDS, that I'm writing a Flask front end for.
I've been following some intro Flask tutorials, all of which seem to pass the DB credentials in the connection string URI. I'm following the tutorial here:
https://medium.com/@rodkey/deploying-a-flask-application-on-aws-a72daba6bb80#.e6b4mzs1l
For deployment, do I prompt for the DB login info and add it to the connection string? If so, where? Using SQLAlchemy, I don't see any calls to create_engine (using the code in the tutorial); I just see an initialization using config.from_object, referencing the config.py where the SQLALCHEMY_DATABASE_URI is stored, which points to the DB location. Calling config.update(dict(UID='****', PASSWORD='******')) from my application has no effect, and looking in the config dict doesn't reveal any applicable entries to set for this purpose. What am I doing wrong?
Or should I be authenticating using Flask-User, and then get rid of the DB level authentication? I'd prefer authenticating at the DB layer, for ease of use.
The tutorial you are using relies on Flask-SQLAlchemy to abstract the database setup; that's why you don't see any calls to create_engine or engine.connect().
Extensions like Flask-SQLAlchemy are designed around the idea that you create a connection pool to the database on launch and share that pool amongst your various worker threads. You will not be able to use that for what you are doing: it takes care of initializing the session and engine early in the process.
Because of your requirements, I don't know that you'll be able to make any use of things like connection pooling. Instead, you'll have to handle that yourself. The actual connection isn't too hard...
from sqlalchemy import create_engine

# Placeholder URI: substitute the real dialect, credentials, and host.
engine = create_engine('dialect://username:password@host/db')
connection = engine.connect()
result = connection.execute("SOME SQL QUERY")
for row in result:
    pass  # do something with each row
connection.close()
The issue is that you're going to have to do that in every endpoint. A database connection isn't something you can store in the session; you'll have to store the credentials there and do a connect/disconnect loop in every endpoint you write. Worse, you'll have to figure out either encrypted sessions or server-side sessions (without a DB connection!) to keep those credentials in the session from becoming a horrible security leak.
I promise you, it will be easier both now and in the long run to figure out a simple way to authenticate users so that they can share a connection pool that is abstracted out of your app endpoints. But if you HAVE to do it this way, this is how you will do it (make sure you are closing those connections every time!).
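If you do go down that road, a rough sketch of what every endpoint ends up looking like (the helper name, dialect, host, and session keys below are hypothetical, and this assumes the credentials were stashed in a server-side session at login):

from flask import Flask, session
from sqlalchemy import create_engine

app = Flask(__name__)
app.secret_key = "change-me"

def open_connection():
    # Hypothetical helper: rebuild the engine from the per-user
    # credentials kept in the session, on every single request.
    uri = "mssql+pyodbc://{u}:{p}@myhost/mydb".format(
        u=session["db_user"], p=session["db_password"])
    return create_engine(uri).connect()

@app.route("/users")
def list_users():
    connection = open_connection()
    try:
        result = connection.execute("SELECT username FROM users")
        return ", ".join(row[0] for row in result)
    finally:
        connection.close()  # close every time, without exception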
I'm using NDB and Python on Google App Engine. What is the proper way to update a property on multiple entities with the same value? The NDB equivalent of:
UPDATE notifications SET read = true WHERE user_id = 123.
The use case is that I have these fan-out notifications, and a specific user wants to mark all of their notifications as read (potentially hundreds). I know that I could use get_async and put_async to fetch each unread notification and set it as read, but I'm worried about the latency created by potentially hundreds of fetches and serializations/deserializations.
Any advice is greatly appreciated.
You can call a function for each entity with the map() method of Query. For best performance, don't forget the _async variants.
But one of the most useful services of GAE is Task Queues, especially in cases like this. If you combine query cursors and the deferred library, you can easily process any number of entities.
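A sketch of the map() approach with async writes (assuming a Notification model with user_id and read properties):

from google.appengine.ext import ndb

@ndb.tasklet
def mark_read(notification):
    # Called for each query result; yielding the future lets NDB
    # pipeline the writes instead of waiting on each one in turn.
    notification.read = True
    yield notification.put_async()

query = Notification.query(Notification.user_id == 123)
query.map(mark_read)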