Where can I increase the limit of what a session can store?
The project sometimes needs to pass parameters from one page to another, so I store the parameter data in the session.
I chose this method because I could only come up with two ways to let another page get the first page's data: either via the query string, or via the session. Since I think the query string can't hold much data, I chose the second option. Is there any other way to achieve this?
Sometimes the data can reach about 25,000 items (a little more than 20k), and the website won't pass it on.
I think this is because the session's limit is 20k, but I don't know where to change it.
I'm using Flask with Python 3.5.
The default session implementation in Flask stores data in a browser-side cookie. It's a base64-encoded string containing an (optionally compressed) JSON string that is cryptographically signed to prevent tampering.
How large this cookie gets depends on the nature of your data, as compression can bring the size down considerably. The limits of what you can store in a cookie are relatively low and depend on the browser, but are typically around 4 KB. See http://browsercookielimits.iain.guru/. Suffice it to say that you can't raise this limit.
If you need to store more data, you'll need to pick a different session implementation. Take a look at Flask-Session, which lets you tie a small identifier cookie to server-side stored data (in memcached, redis, the filesystem or a database). This will let you track much more data per browser session.
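For example, a minimal sketch of server-side sessions with Flask-Session backed by Redis (the Redis connection details and the data stored are assumptions for the example, not something from your setup):

# Minimal sketch: server-side sessions with Flask-Session backed by Redis.
# The Redis connection and the data stored are assumptions for the example.
import redis
from flask import Flask, session
from flask_session import Session

app = Flask(__name__)
app.config["SECRET_KEY"] = "change-me"
app.config["SESSION_TYPE"] = "redis"
app.config["SESSION_REDIS"] = redis.Redis(host="localhost", port=6379)
Session(app)

@app.route("/store")
def store():
    # The browser only receives a small session-id cookie; the list itself
    # lives in Redis, so it is not limited by the ~4 KB cookie size.
    session["items"] = list(range(25000))
    return "stored"

@app.route("/read")
def read():
    return str(len(session.get("items", [])))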
Related
Suppose I have a mobile app which will send filled-in form data (which also contains images) to a commercial software product using its API, and this data should be committed all at once.
Since the mobile device does not have enough memory to send the whole dataset at once, I need to send it as a multipart batch.
I use transactions in cases where I want to perform a bunch of operations on the database, but I kind of need them to be performed all at once, meaning that I don't want the database to change out from under me while I'm in the middle of making my changes. And if I'm making a bunch of changes, I don't want users to be able to read my set of documents in that partially changed state. And I certainly don't want a set of operations failing halfway through, leaving me in a weird and inconsistent state forever. It's got to be all or nothing.
I know that Firebase provides a batch write operation which does exactly what I need. However, I need to do this with a local database (like Redis or Postgres).
The first approach I considered is using POST requests identified by a main session_ID.
- POST /session -> returns new SESSION_ID
- POST [image1] /session/<session_id> -> returns new IMG_ID
- POST [image2] /session/<session_id> -> returns new IMG_ID
- PUT /session/<session_id> -> validate/update metadata
However, it does not seem very robust for handling errors.
The second approach I was considering is combining an SQLAlchemy session with a Celery task, using Flask or FastAPI. I am not sure if it is common to do this to solve this issue; I just found this question. What would you recommend for this second approach (sending all the data parts first, then committing all at once)?
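To make the second approach concrete, this is roughly what I have in mind, without the Celery part (a rough sketch only; the models and routes are made up for illustration):

# Rough sketch of the "stage parts, commit once" idea with Flask + SQLAlchemy.
# All names (UploadSession, UploadPart, the routes) are made up for illustration.
import uuid

from flask import Flask, jsonify, request
from flask_sqlalchemy import SQLAlchemy

app = Flask(__name__)
app.config["SQLALCHEMY_DATABASE_URI"] = "postgresql:///staging"
db = SQLAlchemy(app)

class UploadSession(db.Model):
    id = db.Column(db.String(36), primary_key=True)
    committed = db.Column(db.Boolean, default=False)

class UploadPart(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    session_id = db.Column(db.String(36), db.ForeignKey("upload_session.id"))
    payload = db.Column(db.LargeBinary)

@app.route("/session", methods=["POST"])
def create_session():
    sid = str(uuid.uuid4())
    db.session.add(UploadSession(id=sid))
    db.session.commit()
    return jsonify(session_id=sid)

@app.route("/session/<sid>", methods=["POST"])
def add_part(sid):
    part = UploadPart(session_id=sid, payload=request.get_data())
    db.session.add(part)
    db.session.commit()
    return jsonify(part_id=part.id)

@app.route("/session/<sid>/commit", methods=["PUT"])
def commit_session(sid):
    # Pushing the staged parts to their final destination happens inside a
    # single transaction: if anything fails, nothing is kept.
    try:
        upload = UploadSession.query.get(sid)
        # ... push the staged parts to the commercial API / final tables here ...
        upload.committed = True
        db.session.commit()
    except Exception:
        db.session.rollback()
        raise
    return jsonify(status="committed")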
My idea is to create a hash of a queryset result. For example, product inventory.
Each update of this stock would generate a hash.
The intended use is that this queryset would only be requested from the API when there is a change (for example, a new product in the inventory).
Example of this use:
- no change, same hash: no request is made to fetch the queryset
- change detected, different hash: a request will be made
This would be a feature designed for those who are consuming the data, not for the Django application that is serving it.
Does this make any sense? I saw that in Python there is a way to generate a hash from a tuple; in my case I would build a frozenset and generate the hash from that. I don't know if it's a good idea.
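Roughly what I have in mind (just an illustration; Product is a made-up model with id, name and quantity fields):

# Illustration of the idea: hash a snapshot of the inventory queryset.
# Product is a made-up Django model with id, name and quantity fields.
from myapp.models import Product  # hypothetical app/model

def inventory_hash():
    rows = Product.objects.values_list("id", "name", "quantity")
    # tuples are hashable, so a frozenset of them can be hashed directly
    return hash(frozenset(rows))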
I would comment, but I'm waiting on the 50 rep to be able to do that. It sounds like you're trying to cache results so you aren't re-querying data that hasn't changed. If you're not familiar with caching, the idea is to save hard-to-compute answers in memory for frequently queried endpoints/functions.
For example, if I had a program that calculated the first n digits of pi, I may choose to save a map of [digit count -> value] so that if 10 people asked me for the first thousand, I would only calculate it once. Redis is a popular option for caching, and I believe it exists for Django. It allows you to cache some information, set a time before expiration on it, and then wipe specific parts of that information (to force it to recalculate) every time something specific changes (like a new product in inventory).
Everybody should try writing their own cache at least once, like what you're describing, but the de facto professional option is to use a caching library. Your idea is good and will definitely work; you will probably want a dict of [hash -> result], where result is the information you would send back over your API. If you plan to save data so it persists across multiple program starts, remember that Python randomizes the hash seed for each process, so built-in hash() values are not consistent between runs. Check out this post for more info.
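If you need the fingerprint to be stable across restarts, something like hashlib over a sorted snapshot of the rows sidesteps the hash randomization. A rough sketch (Product is a made-up model, as in your example):

# Sketch: a deterministic fingerprint of a queryset using hashlib instead of
# the built-in hash(), so the value survives process restarts.
import hashlib
import json

from myapp.models import Product  # hypothetical app/model

def inventory_fingerprint():
    rows = list(Product.objects.values_list("id", "name", "quantity").order_by("id"))
    payload = json.dumps(rows, default=str)  # stable ordering and encoding
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()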
After reading about how to ensure that "remember me" tokens are kept secure and reading the source code for psecio's Gatekeeper PHP library, I've come up with the following strategy for keeping things secure, and I wanted to find out if this is going to go horribly wrong. I'm basically doing the following things:
When a user logs in, generate a cryptographically secure string using the system's random number generator (random.SystemRandom() in Python). The string is built by picking random characters from all lower- and uppercase ASCII letters and digits, ''.join(_random_gen.choice(_random_chars) for i in range(length)), similar to how Django does it (_random_gen being the secure random number generator).
The generated token is inserted into a RethinkDB database along with the user ID it belongs to and an expiration time 1 minute in the future. A cookie value is then created from the unique ID that RethinkDB generates for that entry and the sha256-hashed token from before; basically ':'.join([unique_id, sha256_crypt.encrypt(token)]). sha256_crypt is from Python's passlib library.
When a user accesses a page that would require them to be logged in, the actual cookie value is retrieved from the database using the ID that was stored. The hashed cookie is then verified against the actual cookie using sha256_crypt.verify.
If the verification passes and the time value previously stored is less than the current time, then the previous entry in the database is removed and a new ID/token pair is generated to be stored as a cookie.
Is this a good strategy, or is there an obvious flaw that I'm not seeing?
EDIT: After re-reading some Stack Overflow posts that I linked in a comment, I have changed the process above so that the database stores the hashed token, and the actual token is sent back as a cookie. (which will only happen over https, of course)
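For reference, the token generation step described above looks roughly like this (the length of 32 characters is just an assumption for the example):

# Roughly the token generation described above; the 32-character length
# is an assumption for this example.
import random
import string

_random_gen = random.SystemRandom()
_random_chars = string.ascii_letters + string.digits

def generate_token(length=32):
    return ''.join(_random_gen.choice(_random_chars) for _ in range(length))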
You should make sure you generate enough characters in your secure string. I would aim for 64 bits of entropy; with 62 possible characters (about 5.95 bits per character), that means you need at least 11 characters in your string to prevent any practical brute force.
This is as per OWASP's recommendation for Session Identifiers:
With a very large web site, an attacker might try 10,000 guesses per second with 100,000 valid session identifiers available to be guessed. Given these assumptions, the expected time for an attacker to successfully guess a valid session identifier is greater than 292 years.
Given 292 years, generating a new one every minute seems a little excessive. Maybe you could change this to refresh it once per day.
I would also add a system-wide salt to your hashed, stored value (known as a pepper). This will prevent precomputed rainbow tables from extracting the original session value if an attacker manages to gain access to your session table. Create a 16-byte (128-bit) cryptographically secure random value to use as your pepper.
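A quick sketch of what I mean, using an HMAC as one way to mix the pepper into the stored hash (the pepper value here is a placeholder; it would come from configuration, not source control):

# Sketch: hash the token together with a system-wide secret (pepper) before
# storing it. PEPPER is a placeholder; load the real value from configuration.
import hashlib
import hmac

PEPPER = b"replace-with-16-random-bytes-from-config"

def hash_token(token):
    return hmac.new(PEPPER, token.encode("utf-8"), hashlib.sha256).hexdigest()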
Apart from this, I don't see any inherent problems with what you've described. The usual advice applies though: Also use HSTS, TLS/SSL and Secure cookie flags.
Summary: is there a race condition in Django sessions, and how do I prevent it?
I have an interesting problem with Django sessions which I think involves a race condition due to simultaneous requests by the same user.
It has occurred in a script for uploading several files at the same time, being tested on localhost. I think this makes simultaneous requests from the same user quite likely (low response times due to localhost, long requests due to file uploads). It's still possible for normal requests outside localhost, though, just less likely.
I am sending several (file post) requests that I think do this:
1. Django automatically retrieves the user's session
2. Unrelated code that takes some time
3. Get request.session['files'] (a dictionary)
4. Append data about the current file to the dictionary
5. Store the dictionary in request.session['files'] again
6. Check that it has indeed been stored
7. More unrelated code that takes time
8. Django automatically stores the user's session
Here the check at step 6 indicates that the information has indeed been stored in the session. However, future requests show that sometimes it has, and sometimes it has not.
What I think is happening is that two of these requests (A and B) run simultaneously. Request A retrieves request.session['files'] first, then B does the same, changes it and stores it. When A finally finishes, it overwrites the session changes made by B.
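A stripped-down version of the upload view, just to show where the session read and write happen (the names are simplified; this is not the real code):

# Stripped-down sketch of the upload view; names are simplified and the
# "unrelated code" is elided, this is not the real view.
from django.http import HttpResponse

def upload(request):
    # ... unrelated code that takes some time (step 2) ...
    files = request.session.get('files', {})        # step 3: read the dict
    uploaded = request.FILES['file']
    files[uploaded.name] = {'size': uploaded.size}   # step 4: append file info
    request.session['files'] = files                 # step 5: write it back
    assert 'files' in request.session                # step 6: the check
    # ... more unrelated code that takes time (step 7) ...
    return HttpResponse('ok')  # step 8: the middleware saves the session after this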
Two questions:
Is this indeed what is happening? Is the Django development server multithreaded? On Google I'm finding pages about making it multithreaded, which suggests that by default it is not. Otherwise, what could be the problem?
If this race condition is the problem, what would be the best way to solve it? It's an inconvenience but not a security concern, so I'd already be happy if the chance can be decreased significantly.
Retrieving the session data right before the changes and saving it right after should decrease the chance significantly, I think. However, I have not found a way to do this for request.session, only a workaround using django.contrib.sessions.backends.db.SessionStore directly. And I figure that if I change it that way, Django will just overwrite it with request.session at the end of the request.
So I need a request.session.reload() and request.session.commit(), basically.
Yes, it is possible for a request to start before another has finished. You can check this by printing something at the start and end of a view and launching a bunch of requests at the same time.
Indeed the session is loaded before the view and saved after the view. You can reload the session using request.session = engine.SessionStore(session_key) and save it using request.session.save().
Reloading the session, however, discards any data added to the session before that point (in the view or before it). Saving before reloading would defeat the point of loading late. A better way would be to save the files to the database as a new model.
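For completeness, reloading and saving explicitly inside a view would look roughly like this (a sketch only; the data added to the session is made up):

# Sketch: reload the session as late as possible and save it immediately,
# to shrink the window in which another request can overwrite it.
from importlib import import_module

from django.conf import settings
from django.http import HttpResponse

engine = import_module(settings.SESSION_ENGINE)

def upload(request):
    # ... slow, unrelated work ...
    request.session = engine.SessionStore(request.session.session_key)  # reload
    files = request.session.get('files', {})
    files['example.jpg'] = {'size': 123}   # made-up data for the example
    request.session['files'] = files
    request.session.save()                 # persist right away
    # ... more work; the middleware will save again at the end of the request ...
    return HttpResponse('ok')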
The essence of the answer is in the discussion of Thomas' answer, which was incomplete so I've posted the complete answer.
Mark just nailed it; my only minor addition is how to load that session:
for key in list(session.keys()):  # if you have potential removals
    del session[key]
session.update(session.load())    # re-read the session data from the backend
session.modified = False          # just making it clean
The loop at the top is optional; you only need it if certain values might have been removed from the session in the meantime.
The last line is also optional; if you update the session afterwards, it does not really matter.
That is true. You can confirm it by having a look at the django.contrib.sessions.middleware.SessionMiddleware.
Basically, request.session is loaded before the request hits your view (in process_request), and it is updated in the session backend (if needed) after the response has left your view (in process_response).
If what I mean is unclear, you might want to have a look at the Django documentation for middleware.
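Heavily simplified, the middleware does something like this (not the actual Django source, just the shape of it):

# Heavily simplified sketch of what SessionMiddleware does; this is not the
# real Django source, it just shows when the session is loaded and saved.
from django.contrib.sessions.backends.db import SessionStore

class SimplifiedSessionMiddleware:
    def process_request(self, request):
        session_key = request.COOKIES.get('sessionid')
        request.session = SessionStore(session_key)   # loaded before your view runs

    def process_response(self, request, response):
        if request.session.modified:
            request.session.save()                    # saved after your view returns
        return response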
The best way to solve the issue will depend on what you're trying to achieve with that information. I'll update my answer if you provide that information!
I have a Python Flask app I'm writing, and I'm about to start on the backend. The main part of it involves users POSTing data to the backend, usually a small piece of data every second or so, to later be retrieved by other users. The data will always be retrieved within under an hour, and could be retrieved in as low as a minute. I need a database or storage solution that can constantly take in and store the data, purge all data that was retrieved, and also perform a purge on data that's been in storage for longer than an hour.
I do not need any relational system; JSON/key-value storage should be able to handle both incoming and outgoing data. There will also be constant reading, writing, and deleting.
Should I go with something like MongoDB? Should I use a database system at all, and instead write to a directory full of .json files constantly, or something? (Using only files is probably a bad idea, but it's kind of the extent of what I need.)
You might look at mongoengine; we use it in production with Flask (there's an extension) and it has suited our needs well. There's also mongoalchemy, which I haven't tried but which seems to be decently popular.
The downside to using Mongo is that there is no automatic expiry. Having said that, you might take a look at Redis, which has the ability to auto-expire items. There are a few ORMs out there that might suit your needs.
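A quick sketch of the Redis route with an hour-long expiry (the key naming and the localhost connection are assumptions for the example):

# Quick sketch: store each piece of data in Redis with a one-hour expiry and
# delete it once it has been retrieved. Key names and connection details are
# assumptions for the example.
import json
import redis

r = redis.Redis(host='localhost', port=6379)

def store(item_id, data):
    r.setex('item:%s' % item_id, 3600, json.dumps(data))  # auto-expires in 1 hour

def retrieve(item_id):
    raw = r.get('item:%s' % item_id)
    if raw is None:
        return None
    r.delete('item:%s' % item_id)  # purge once it has been read
    return json.loads(raw)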