Secure cookie strategy

After reading about how to ensure that "remember me" tokens are kept secure and reading the source code for psecio's Gatekeeper PHP library, I've come up with the following strategy for keeping things secure, and I wanted to find out if this is going to go horribly wrong. I'm basically doing the following things:
When a user logs in, generate a cryptographically secure string using the system's random number generator (random.SystemRandom() in Python). The string is built by picking random characters from all lowercase and uppercase ASCII letters and digits: ''.join(_random_gen.choice(_random_chars) for i in range(length)), which is how Django does the same thing. _random_gen is the secure random number generator.
The generated token is inserted into a RethinkDB database along with the userid it belongs to and an expiration time 1 minute into the future. A cookie value is then created from the unique ID that RethinkDB generates to identify that entry and the sha256-hashed token from before. Basically: ':'.join((unique_id, sha256_crypt.encrypt(token))). sha256_crypt is from the passlib library for Python.
When a user accesses a page that requires them to be logged in, the actual token is retrieved from the database using the ID stored in the cookie. The hashed value in the cookie is then verified against the actual token using sha256_crypt.verify.
If the verification passes and the time value previously stored is less than the current time, then the previous entry in the database is removed and a new ID/token pair is generated to be stored as a cookie.
Is this a good strategy, or is there an obvious flaw that I'm not seeing?
EDIT: After re-reading some Stack Overflow posts that I linked in a comment, I have changed the process above so that the database stores the hashed token, and the actual token is sent back as a cookie. (which will only happen over https, of course)

You should make sure you generate enough characters in your secure string. I would aim for 64 bits of entropy, which means you need at least 11 characters in your string to prevent any type of practical brute force.
This is as per OWASP's recommendation for Session Identifiers:
With a very large web site, an attacker might try 10,000 guesses per
second with 100,000 valid session identifiers available to be guessed.
Given these assumptions, the expected time for an attacker to
successfully guess a valid session identifier is greater than 292
years.
Given 292 years, generating a new one every minute seems a little excessive. Maybe you could change this to refresh it once per day.
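The figures above can be checked with a few lines of arithmetic. This is only a sketch: the function names are made up for illustration, the 62-character alphabet is the one described in the question, and the attack model (10,000 guesses per second against 100,000 valid identifiers) is the one from the OWASP quote.

```python
import math
import string

# Alphabet from the question: upper and lower case ASCII letters plus digits.
ALPHABET = string.ascii_letters + string.digits  # 62 symbols


def chars_needed(bits, alphabet_size=len(ALPHABET)):
    """Smallest string length that carries at least `bits` bits of entropy."""
    return math.ceil(bits / math.log2(alphabet_size))


def expected_guess_years(bits, guesses_per_second=10_000, valid_ids=100_000):
    """Expected time to hit any one valid identifier by blind guessing."""
    # On average half the space is searched, split across all valid targets.
    expected_guesses = 2 ** (bits - 1) / valid_ids
    seconds = expected_guesses / guesses_per_second
    return seconds / (365.25 * 24 * 3600)


print(chars_needed(64))
print(expected_guess_years(64))
```

Each character drawn from 62 symbols carries a little under 6 bits, which is where the 11-character minimum for 64 bits comes from.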
I would also add a system-wide salt (known as a pepper) to your hashed, stored value. This will prevent any precomputed rainbow tables from extracting the original session value if an attacker manages to gain access to your session table. Create a 16-byte (128-bit) cryptographically secure random value to use as your pepper.
Apart from this, I don't see any inherent problems with what you've described. The usual advice applies though: Also use HSTS, TLS/SSL and Secure cookie flags.
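As a rough illustration of the edited scheme (raw token in the cookie, peppered hash in the database), here is a minimal standard-library sketch. It swaps passlib's sha256_crypt for HMAC-SHA256 keyed with the pepper, which is an assumption of this example and not what the question uses. The function names, the 22-character token length, and the way the pepper is created are all hypothetical; a real pepper would be loaded from configuration kept outside the database.

```python
import hashlib
import hmac
import os
import random
import string

_random_gen = random.SystemRandom()  # OS-backed generator, as in the question
_random_chars = string.ascii_letters + string.digits

# Hypothetical pepper: 16 random bytes, generated here only for the demo.
PEPPER = os.urandom(16)


def generate_token(length=22):
    """Build a random token from letters and digits."""
    return ''.join(_random_gen.choice(_random_chars) for _ in range(length))


def hash_token(token, pepper=PEPPER):
    """Value to store in the database instead of the raw token."""
    return hmac.new(pepper, token.encode('utf-8'), hashlib.sha256).hexdigest()


def verify_token(token, stored_hash, pepper=PEPPER):
    """Constant-time comparison of a presented token against the stored hash."""
    return hmac.compare_digest(hash_token(token, pepper), stored_hash)


token = generate_token()    # sent to the browser in the cookie
stored = hash_token(token)  # saved next to the user id and expiry time
print(verify_token(token, stored))
```

Because the token is already high-entropy, a fast keyed hash is usually considered adequate here; a slow password hash matters mainly for low-entropy secrets.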

Related

Is it possible to generate hash from a queryset?

My idea is to create a hash of a queryset result. For example, product inventory.
Each update of this stock would generate a hash.
The intent is for API clients to request this queryset only when something has changed (for example, a new product in inventory).
Example for this use:
no change, same hash - no request to get queryset
there was change, different hash. Then a request will be made.
This would be a feature designed for those who are consuming the data and not for the Django that is serving.
Does this make any sense? I saw that in python there is a way to generate a hash from a tuple, in my case it would be to use the frozenset and generate the hash. I don't know if it's a good idea.

I would comment, but I'm waiting on the 50 rep to be able to do that. It sounds like you're trying to cache results so you aren't querying on data that hasn't been changed. If you're not familiar with caching, the idea is to save hard-to-compute answers in memory for frequently queried endpoints/functions.
For example, if I had a program that calculated the first n digits of pi, I may choose to save a map of [digit count -> value] so that if 10 people asked me for the first thousand, I would only calculate it once. Redis is a popular option for caching, and I believe it exists for Django. It allows you to cache some information, set a time before expiration on it, and then wipe specific parts of that information (to force it to recalculate) every time something specific changes (like a new product in inventory).
Everybody should try writing their own cache at least once, like what you're describing, but the de facto professional option is to use a caching library. Your idea is good, it will definitely work, and you will probably want a dict of [hash->result], where result is the information you would send back over your API. If you plan to save data so it persists across multiple program starts, remember that Python randomizes the hash seed for str and bytes objects on each process start (unless PYTHONHASHSEED is set), so values from the built-in hash() are not consistent between runs. Check out this post for more info.
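One way around the per-process hash seed is to fingerprint the data with hashlib instead of the built-in hash(). The sketch below assumes the queryset has already been turned into plain dicts (for example with queryset.values()); the function name and the sample rows are invented for illustration.

```python
import hashlib
import json


def fingerprint(rows):
    """Stable SHA-256 fingerprint of an iterable of dict rows.

    Rows are serialized with sorted keys and then sorted themselves, so the
    result does not depend on key order or row order. default=str is a
    fallback for values json cannot encode natively (dates, decimals).
    """
    serialized = sorted(
        json.dumps(row, sort_keys=True, default=str) for row in rows
    )
    payload = "\n".join(serialized).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


inventory = [{"id": 1, "qty": 5}, {"id": 2, "qty": 0}]
print(fingerprint(inventory))
```

A client can keep the last fingerprint it saw and fetch the full queryset only when the server reports a different one.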

How to set session's limit in flask?

Where can I increase the limit of what a session can store?
The project sometimes needs to pass parameters from one page to another, so I store the parameter data in the session.
I chose this method because I can only come up with two ways to let another page get the first page's data: via the query string, or via the session. I think the query string can't hold much data, so I chose the second method. Is there any other way to achieve this?
Sometimes the data's length can reach 25000 items (a little more than 20k), and the website won't pass this on.
I think this is because the session's limit is 20k, but I don't know where to set it.
I'm using Flask with Python 3.5.

The default Session implementation in Flask stores data in a browser-side cookie. It's a base64-encoded string with an (optionally compressed) JSON string, that is cryptographically signed to prevent tampering.
How large this cookie gets depends on the nature of your data, as compression can bring down the size considerably. The limits on what you can store in a cookie are relatively low and depend on the browser, but are typically around 4 KB. See http://browsercookielimits.iain.guru/. Suffice it to say that you can't raise this limit.
If you need to store more data, you'll need to pick a different session implementation. Take a look at Flask-Session, which lets you tie a small identifier cookie to server-side stored data (in memcached, redis, the filesystem or a database). This will let you track much more data per browser session.
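To get a feel for why 25000 items cannot fit in a cookie, the following sketch estimates the payload size using the same ingredients the answer lists (JSON, optional compression, base64). It is only an approximation and not Flask's actual serializer: it ignores the signature and timestamp that get appended, and the function name and the 4096-byte limit are assumptions made for this example.

```python
import base64
import json
import zlib

COOKIE_LIMIT_BYTES = 4096  # typical per-cookie browser limit, assumed here


def approx_cookie_payload_size(data):
    """Rough size in bytes of `data` as a compressed, base64-encoded cookie."""
    raw = json.dumps(data, separators=(",", ":")).encode("utf-8")
    compressed = zlib.compress(raw)
    # Keep whichever form is smaller, mirroring "optionally compressed".
    body = min(raw, compressed, key=len)
    return len(base64.urlsafe_b64encode(body))


small = approx_cookie_payload_size({"page": 2})
large = approx_cookie_payload_size(list(range(25000)))
print(small <= COOKIE_LIMIT_BYTES, large <= COOKIE_LIMIT_BYTES)
```

Even with compression, a list of that length ends up far larger than a single cookie can carry, which is why a server-side session store is the usual answer.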

Why is there a difference in format of encrypted passwords in Django

I am using Django 1.9.7. The encrypted passwords are significantly different (in terms of the format).
Some passwords are of the format <algorithm>$<iterations>$<salt>$<hash>:
pbkdf2_sha256$24000$61Rm3LxOPsCA$5kV2bzD32bpXoF6OO5YuyOlr5UHKUPlpNKwcNVn4Bt0=
While others look like this:
!9rPYViI1oqrSMfkDCZSDeJxme4juD2niKcyvKdpB
Passwords are set using either User.objects.create_user() or user.set_password(). Is this difference expected?

You'll be fine. You just have some blank passwords in your database.
Going back as far as v0.95, Django has used $ separators to delimit algorithm/salt/hash. These days, Django pulls out the algorithm first by looking at what is in front of the first $ and then passes the whole lot to the hasher to decode. This allows for a wider set of formats, including the one for PBKDF2, which adds an extra iterations parameter to this list (as per your first example).
However, it also recognises that some users may not be allowed to login and/or have no password. This is encoded using the second format you've seen. As you can see here:
If password is None then a concatenation of UNUSABLE_PASSWORD_PREFIX and a random string will be returned which disallows logins.
You can also see that the random string is exactly 40 characters long - just like your second example.
In short, then, this is all as expected.
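The two formats can be told apart with a few lines of plain Python. This is an illustrative sketch, not Django's own code: the function name and the returned dict layout are made up, and only the "!" prefix value matches Django's UNUSABLE_PASSWORD_PREFIX. In a real project, django.contrib.auth.hashers.is_password_usable is the supported way to check.

```python
UNUSABLE_PASSWORD_PREFIX = "!"  # same marker Django puts in front of unusable passwords


def describe_password(encoded):
    """Classify a stored password string by its format."""
    if encoded.startswith(UNUSABLE_PASSWORD_PREFIX):
        suffix = encoded[len(UNUSABLE_PASSWORD_PREFIX):]
        return {"usable": False, "suffix_length": len(suffix)}
    # Everything before the first $ names the algorithm; the hasher
    # interprets the remaining $-separated fields.
    algorithm, _, remainder = encoded.partition("$")
    return {"usable": True, "algorithm": algorithm, "fields": remainder.split("$")}


print(describe_password(
    "pbkdf2_sha256$24000$61Rm3LxOPsCA$5kV2bzD32bpXoF6OO5YuyOlr5UHKUPlpNKwcNVn4Bt0="
))
print(describe_password("!9rPYViI1oqrSMfkDCZSDeJxme4juD2niKcyvKdpB"))
```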

There is no significant difference between User.objects.create_user() and user.set_password(), since the first uses the second.
Basically, passwords are stored as strings in the format <algorithm>$<iterations>$<salt>$<hash>, according to the docs. The differences might come from the PASSWORD_HASHERS settings variable. Maybe one password was created with one hasher and another password with a different one. But if you keep those hashers in the variable mentioned above, all should be fine: users will be able to change their passwords, and so on. You can read about it in the short note after the bcrypt section.
The docs for the django.contrib.auth package might be helpful too.
UPDATE:
If you find the documentation for an old Django version (1.3, for example), you will see that
Previous Django versions, such as 0.90, used simple MD5 hashes without password salts. For backwards compatibility, those are still supported; they'll be converted automatically to the new style the first time check_password() works correctly for a given user.
So I think the answer might be somewhere here. It really depends on how old your project is, so you can decide whether this is normal. In any case, you can call check_password() to be sure, or you can just email your users a "please change your password" notification. There are many factors involved.

Create a receipt for a user form submission

There is a requirement that our users should complete and submit a form once a month. So, each month we should have a form that will contain data for the triplet (username, month, year). I want our users to be able to certify that they did actually submit the form for that particular month by creating a receipt for them. So, for each month there will be a report containing the data the user submitted along with the receipt. I don't want the users to be able to create that receipt by themselves though.
What I was thinking was to create a string that contains username, month, year, secret_word and give the md5 hash of that string to the users as their receipt. That way, because the users won't have the secret word, they won't be able to generate the md5 hash. However, my users will probably complain when they see the complexity of that md5 hash. Also, if they find out the secret word they will be able to create receipts for everybody.
Is there a standard way of doing what I ask? Could you recommend any other possible solutions?
I am using Python but some pseudocode or link to the appropriate methods would be ok.

@Serafeim, your approach is very good for the situation. Here are some ideas for extending it:
Make sure that the secret_word (in hashing terms it is called salt) is long enough.
Make the end function a bit more complex, e.g.
hash = h(h(username) + month + year + h(salt))
Use a bit more complex hash function, e.g. SHA1
Don't give the end user the whole hash value. E.g. md5 hex digest contains 32 digits, but it would be enough to have first 5-10 digits of the hash in the report.
Updated:
If you have the resources, generate a random salt per user. Then even if a user somehow learns their salt and the hash function, it will still be useless against the others.

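The ideas above (a long secret, a keyed hash, and a shortened digest) can be combined using Python's standard hmac module. Note that this sketch swaps in HMAC-SHA256 for the nested h(...) construction suggested in the answer; the function names, the message layout, and the 10-digit receipt length are assumptions made for illustration.

```python
import hashlib
import hmac

RECEIPT_DIGITS = 10  # hypothetical length; shorter receipts are easier to guess


def make_receipt(username, month, year, secret_key):
    """Short receipt for one (username, month, year) triplet."""
    message = f"{username}|{year:04d}-{month:02d}".encode("utf-8")
    digest = hmac.new(secret_key, message, hashlib.sha256).hexdigest()
    return digest[:RECEIPT_DIGITS]


def check_receipt(receipt, username, month, year, secret_key):
    """Recompute the receipt and compare in constant time."""
    expected = make_receipt(username, month, year, secret_key)
    return hmac.compare_digest(receipt.encode("utf-8"), expected.encode("utf-8"))


key = b"replace-with-a-long-random-secret"  # placeholder, not a real secret
receipt = make_receipt("serafeim", 3, 2024, key)
print(receipt, check_receipt(receipt, "serafeim", 3, 2024, key))
```

Using a separate key per user, as the update suggests, only requires passing a different secret_key for each username.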
Safe using user input as key_name?

I would like to use a string that was input by the user in a web form as part of a key name:
user_input = self.request.POST.get('foo')
if user_input:
    foo = db.get_or_insert(db.Key('Foo', user_input[:100], parent=my_parent))
Is this safe? Or should I do some inexpensive encoding or hash? If yes, which one?

It's safe as long as you don't care about a malicious user filling up your database with junk. get_or_insert won't let them overwrite existing entries, just add new ones.
Make sure you limit its length (both in the UI and after it's been received), even if you do no other validation on it, so that at least they can't give you crazy big keys, either to fill up the database quickly or to crash your app.
Edit: You just commented that you do, in fact, verify that it's a reasonable key. In that case, yes, it's safe.
Keep in mind that the user can probably still figure out which keys are already in your database, based on how long it takes you to respond to what they've provided. You still need to make sure they're authorized to see whatever content they request, or limit them to a small number of requests so they can't just brute-force retrieve all the information linked to the keys you're generating.
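A minimal sketch of the kind of server-side check described above, written in plain Python so it does not depend on the datastore API. The function name, the 100-character limit (borrowed from the slice in the question), and the allowed character set are all assumptions. Unlike the question's code, it rejects over-long input and does not truncate it.

```python
import re

MAX_KEY_NAME_LENGTH = 100
# Conservative, hypothetical whitelist: start with a letter or digit,
# then letters, digits, underscores or hyphens.
_KEY_NAME_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9_-]*")


def clean_key_name(user_input):
    """Return a validated key name, or None if the input is unacceptable."""
    if not user_input:
        return None
    candidate = user_input.strip()
    if len(candidate) > MAX_KEY_NAME_LENGTH:
        return None
    if not _KEY_NAME_RE.fullmatch(candidate):
        return None
    return candidate


print(clean_key_name("widget-42"))
print(clean_key_name("x" * 500))
```

The caller would only build the key when the result is not None, and would otherwise return a validation error to the user.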
