Global state in a WSGI-hosted Flask application - python

Assume a Flask application that lets a client build an object server-side through a number of wizard-like steps.
I would like to create an initial object server-side and build it up step by step from the client-side input, keeping the object 'alive' throughout the whole build process. A unique id will be associated with the creation of each new object / wizard.
Serving the Flask application through WSGI on Apache, requests can go through multiple instances of the Flask application / multiple threads.
How do I keep this object alive server-side, or in other words how to keep some kind of global state?
I'd like to keep the object in memory, not serialize/deserialize it to/from disk. No cookies either.
Edit:
I'm aware of the Flask.g object, but since it lives on a per-request basis it is not a valid solution.
Perhaps it is possible to use some kind of cache layer, e.g.:
from werkzeug.contrib.cache import SimpleCache
cache = SimpleCache()
Is this a valid solution? Does this layer live across multiple app instances?

You're looking for sessions.
You said you don't want to use cookies, but did you mean you don't want to store the data in a cookie, or are you avoiding cookies entirely? For the former case, take a look at server-side sessions, e.g. Flask-KVSession:
Instead of storing data on the client, only a securely generated ID is stored on the client, while the actual session data resides on the server.
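For example, a server-side session store backed by Redis could look roughly like this (a minimal sketch assuming the Flask-KVSession and simplekv packages and a Redis server on localhost):
import redis
from flask import Flask
from simplekv.memory.redisstore import RedisStore
from flask_kvsession import KVSessionExtension

app = Flask(__name__)
app.secret_key = 'change-me'  # still needed to sign the session id cookie
store = RedisStore(redis.StrictRedis())
KVSessionExtension(store, app)  # replaces Flask's cookie-based session with server-side storage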

Related

Flask - Application/Request Context and Sessions [duplicate]

When using sessions, Flask requires a secret key. In every example I've seen, the secret key is somehow generated and then stored either in the source code or in a configuration file.
What is the reason to store it permanently? Why not simply generate it when the application starts?
app.secret_key = os.urandom(50)
The secret key is used to sign the session cookie. If you had to restart your application and regenerated the key, all the existing sessions would be invalidated. That's probably not what you want (or at least, not the right way to go about invalidating sessions). A similar case could be made for anything else that relies on the secret key, such as tokens generated by itsdangerous to provide password reset URLs (for example).
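For instance, a token signed with itsdangerous only verifies while the key that signed it is still in use (an illustrative sketch):
from itsdangerous import URLSafeTimedSerializer

serializer = URLSafeTimedSerializer(app.secret_key)
token = serializer.dumps({'user_id': 42})  # e.g. embedded in a password reset URL
# serializer.loads(token) succeeds only as long as the same secret key is configured;
# after a restart with a freshly generated key it raises BadSignature.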
The application might need to be restarted because of a crash, or because the server rebooted, or because you are pushing a bug fix or new feature, or because the server you're using spawns new processes, etc. So you can't rely on the server being up forever.
The standard practice is to have some throwaway key committed to the repo (so that there's something there for dev machines) and then to set the real key in the local config when deploying. This way, the key isn't leaked and doesn't need to be regenerated.
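A minimal sketch of that practice (the environment variable name is only an example):
import os
from flask import Flask

app = Flask(__name__)
# throwaway default committed to the repo so dev machines work out of the box;
# deployments override it with a real key from the environment or a local config file
app.config['SECRET_KEY'] = os.environ.get('SECRET_KEY', 'dev-only-not-secret')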
There's also the case of running secondary systems that depend on the app context, such as Celery for running background tasks, or multiple load balanced instances of the application. If each running instance of the application has different settings, they may not work together correctly in some cases.

Is storing data in "thread local storage" in a Django application safe, in cases of concurrent requests?

I have seen in many places that using thread-local storage to store data in a Django application is not good practice.
But this is the only way I could store my request object. I need to store it because my application has a complex structure, and I can't keep passing the request object to every function call or class initialization.
I need the cookies and headers from my request object to be passed to some API calls I'm making at different places in the application.
I'm using this for reference:
https://blndxp.wordpress.com/2016/03/04/django-get-current-user-anywhere-in-your-code-using-a-middleware/
So I'm using a middleware, as mentioned in the reference.
And this is how the request is stored (inside the middleware, for each incoming request):
from threading import local

_thread_locals = local()
_thread_locals.request = request
And this is how the data is fetched:
getattr(_thread_locals, "request", None)
So is the data stored in the thread local to that particular request? Or if another request is handled at the same time, do both of them use the same data? (Which is certainly not what I want.)
Or is there any new way of dealing with this old problem (storing the request object globally)?
Note: I'm also using async in places in my Django application (if that matters).
Yes, using thread-local storage in Django is safe.
Django uses one thread to handle each request. Django also uses thread-local data itself, for instance for storing the currently activated locale. While appservers such as Gunicorn and uwsgi can be configured to utilize multiple threads, each request will still be handled by a single thread.
However, there have been conflicting opinions on whether using thread-locals is an elegant and well-designed solution. The reasons against using thread-locals boil down to the same reasons why global variables are considered bad practice. This answer discusses a number of them.
Still, storing the request object in thread-local data has become a widely used pattern in the Django community. There is even an app Django-CRUM that contains a CurrentRequestUserMiddleware class and the functions get_current_user() and get_current_request().
Note that as of version 3.0, Django has started to implement asynchronous support. I'm not sure what its implications are for apps like Django-CRUM. For the foreseeable future, however, thread-locals can safely be used with Django.
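A minimal sketch of that middleware pattern (the names are illustrative, not Django-CRUM's actual API):
import threading

_thread_locals = threading.local()

class ThreadLocalRequestMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        _thread_locals.request = request
        try:
            return self.get_response(request)
        finally:
            # clear it so the request does not leak into the next request handled by this thread
            _thread_locals.request = None

def get_current_request():
    return getattr(_thread_locals, 'request', None)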

Django, global variables and tokens

I'm using Django to develop a website. On the server side, I need to transfer some data that must be processed on a second server (on a different machine). I then need a way to retrieve the processed data. I figured the simplest approach would be to send the Django server a POST request, which would then be handled by a view dedicated to that job.
But I would like to add some minimal security to this process: when I transfer the data to the other machine, I want to attach a randomly generated token to it. When I get the processed data back, I expect to also get back the same token; otherwise the request is ignored.
My problem is the following: How do I store the generated token on the Django server?
I could use a global variable, but from what I've read here and there on the web, global variables should not be used for safety reasons (not that I really understand why).
I could store the token on disk or in the database, but it seems like an unjustified waste of performance (even if in practice it would probably not change much).
Is there a third solution, or a canonical way to do such a thing in Django?
You can store your token in the Django cache; it will be faster than database or disk storage in most cases.
Another approach is to use Redis.
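A minimal sketch of the cache approach (job_id, token and received_token are placeholder names):
from django.core.cache import cache
from django.http import HttpResponseForbidden

# before sending the data to the second server
cache.set('processing-token:%s' % job_id, token, timeout=600)

# in the view that receives the processed data back
expected = cache.get('processing-token:%s' % job_id)
if expected is None or expected != received_token:
    return HttpResponseForbidden()  # ignore the request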
You can also calculate your token:
save a shared secret token in the settings of both servers
calculate the token based on the current timestamp rounded to 10 seconds, for example:
import hashlib, time

rounded_timestamp = int(time.time()) // 10 * 10  # round down to a 10-second window
token = hashlib.sha1(secret_token.encode())
token.update(str(rounded_timestamp).encode())
token = token.hexdigest()
If the token generated on the remote server when POSTing the request matches the token generated on the local server when the response comes back, the request is valid and can be processed.
The simple, obvious solution would be to store the token in your database. Other possible solutions are Redis or something similar. Finally, you can have a look at distributed async task queues like Celery...

Socket.io connections distribution between several servers

I'm working on a DB design tool (python, gevent-socket.io). In this tool, multiple users can discuss one DB model, receiving changes at runtime. To support this feature, I'm using socket.io. I'd like to be able to easily scale out the number of servers that handle socket.io connections. The simplest way to do it is to configure nginx to choose a server based on the model ID.
I'd like a modulo approach, where the model ID is divided by the number of servers. So if I have 3 nodes, model 1 will be handled on the first, 2 on the second, 3 on the third, 4 on the first again, etc.
My request for model loading looks like /models/, so no problem here: the argument can be parsed to find the server to handle it. But after the model page is loaded, JS tries to establish a connection:
var socket = io.connect('/models', {
    'reconnection limit': 4000
});
It accesses the default endpoint, so the server receives requests like the following:
http://example.com/socket.io/1/xhr-pooling/111111?=1111111
To handle it, I create the application this way:
SocketIOServer((app.config['HOST'], app.config['PORT']), app, resource='socket.io', transports=transports).serve_forever()
and then
@bp.route('/<path:remaining>')
def socketio(remaining):
    app = current_app._get_current_object()
    try:
        # Hack: pass app instead of request to make it available in the namespace.
        socketio_manage(request.environ, {'/models': ModelsNamespace}, app)
    except:
        app.logger.error("Exception while handling socket.io connection", exc_info=True)
    return Response()
I'd like to change it to
http://example.com/socket.io/<model_id>/1/xhr-pooling/111111?=1111111
to be able to choose the right server in nginx. How do I do that?
UPDATE
I'd also like to check user permissions when a client tries to establish a connection. I'd like to do it in the socketio(remaining) method but, again, I need to know which model the client is trying to access.
UPDATE 2
I implemented a permission validator, taking model_id from HTTP_REFERER. It seems to be the only part of the request that contains the identifier of the model (example value: http://example.com/models/1/).
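For reference, extracting the model id from the referer can be as simple as the sketch below (assuming referer values like http://example.com/models/1/):
import re

referer = request.environ.get('HTTP_REFERER', '')
match = re.search(r'/models/(\d+)/', referer)
model_id = int(match.group(1)) if match else None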
The first idea is to tell the client side which servers are available at the current time.
Furthermore, you can generate the server list for the client side by priority: just put the servers in a JavaScript-generated array, in order.
This approach assumes that your servers can answer for any model; you can control server load by changing the server ordering in the list generated for new clients.
I think this is the more flexible way. But if you want, you can parse the query string in nginx and route the request to any underlying server; just keep a table of "model id - server port" relations.
Update: thinking about your task some more, here is another solution. When you generate the client web page, you can inline the server count in the JS somewhere. Then, when requesting model updates, just use another parameter computed as
serverId = modelId % ServersCount;
which will be the server identifier used for routing in nginx.
Then in the nginx config you can parse the query string and route the request to the server identified by the serverId parameter.
in "metalanguage" it will be
get parameter serverId to var $servPortSuffix
route request to localhost:80$servPortSuffix
or another routing idea.
You can add additional GET parameters to socket.io via
io.connect(url, {query: "foo=bar"})

Sharing a session store on Redis for a Django and a Express.js Application

I want to create a Django application with some logged-in users. On another side, since I want some real-time capabilities, I want to use an Express.js application.
Now, the problem is, I don't want unauthenticated users to access the Express.js application's data. So I have to share a session store between the Express.js and Django applications.
I thought using Redis would be a good idea, since volatile keys are a perfect fit for this, and I already use Redis for another part of the application.
On the Express.js application, I'd have this kind of code :
[...]
this.sessionStore = new RedisStore;
this.use(express.session({
// Private crypting key
secret: 'keyboard cat', // I'm worried about this for session sharing
store: this.sessionStore,
cookie: {
maxAge: 1800000
}
}))
[...]
On the Django side, I'd think of using the django-redis-sessions app.
So, is this a good idea? Won't there be any problems? I'm especially unsure about the secret key: I'm not sure they will both share the same sessions.
You will have to write a custom session store for either Express or Django. Django, by default (as well as in django-redis-sessions) stores sessions as pickled Python objects. Express stores sessions as JSON strings. Express, with connect-redis, stores sessions under the key sess:sessionId in redis, while Django (not totally sure about this) seems to store them under the key sessionId. You might be able to use django-redis-sessions as a base, and override encode, decode, _get_session_key, _set_session_key and perhaps a few others. You would also have to make sure that cookies are stored and encrypted in the same way.
Obviously, it will be way harder to create a session store for Express that can pickle and unpickle Python objects.
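As a rough sketch of the Django side, assuming django-redis-sessions exposes its backend as redis_sessions.session.SessionStore (key prefixes and cookie signing would still have to be aligned separately):
import json
from redis_sessions.session import SessionStore as RedisSessionStore

class JSONSessionStore(RedisSessionStore):
    # store session data as plain JSON so the Express side can read it
    def encode(self, session_dict):
        return json.dumps(session_dict)

    def decode(self, session_data):
        try:
            return json.loads(session_data)
        except (TypeError, ValueError):
            return {}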
