So an emergency project was dumped on me to merge a MySQL user database into an existing Django user database.
I've figured just about everything out except how to handle the passwords as they use different hashes. I don't know Python, the Django backend, or very much about hashing techniques.
I do have a way to verify users with their emails, I just need a way to take the passwords they give me and save them into the database in a Django-acceptable way. It will be have to be done in Perl since that's the only language I know on the server.
I found this page talking about how Django handles passwords, but I sadly don't understand most of what they're saying. Also, I don't know if it's any help, but the admin area of the Django site gives the "hint" of
"Use '[algo]$[salt]$[hexdigest]'" for the password.
That doesn't mean much to me either, but maybe it does to one of you?
There are basically two ways to handle this: convert existing passwords to a format acceptable by Django, or write your own Django password hasher.
For the first way, as you found, the password field consists of three parts, each separated by a $. (Django 1.6 passwords may have 4 parts, but let's ignore that extra part for now, since Django 1.6 also supports the more traditional 3-part passwords.) The parts are
algorithm, which describes the password hashing algorithm; it will look like md5, pbkdf2, etc.
salt, the salt for the hash algorithm
hexdigest, the hashed password
So, assuming your passwords are already salted and hashed, your script needs to take the hashed/salted passwords in your existing database, separate the salt from the hash, then store them into the database with the appropriate algorithm string prefixed. There should be Perl modules for doing password hashing using various algorithms. Django's recommended algorithm is PBKDF2. bcrypt is also good. Any hash algorithm is fine, though, as far as Django is concerned, as long as it has a built-in hasher for that algorithm (Django has hashers for the most common hashing algorithms).
If your existing passwords are not salted and hashed…well, now would be a good time to do that. ;-)
The alternative way is to just copy the passwords over to the new database as-is, and write your own password hasher to handle them in your Django app. Of course, that would require writing some Python code.
Related
I have an old database where user passwords were hashed with md5 without salt. Now I am converting the project into django and need to update passwords without asking users to log in.
I wrote this hasher:
from django.contrib.auth.hashers import PBKDF2PasswordHasher
class PBKDF2WrappedMD5PasswordHasher(PBKDF2PasswordHasher):
algorithm = 'pbkdf2_wrapped_md5'
def encode_md5_hash(self, md5_hash, salt):
return super().encode(md5_hash, salt)
and converting password like:
for data in old_user_data:
hasher = PBKDF2WrappedMD5PasswordHasher()
random_salt = get_random_string(length=8)
# data['password'] is e.g. '972131D979FF69F96DDFCC7AE3769B31'
user.password = hasher.encode_md5_hash(data['password'], random_salt)
but I can't login with my test-user.
any ideas? :/
I'm afraid you cannot do what you want with this. Hashing is strictly one-way, so there is no way to convert from one hash to another. You WILL have to update these passwords to the new hash one-by-one as users log in.
A decent strategy for implementing this change is:
Mark all of your existing hashes as md5. You can just use some kind of boolean flag/column, but there is an accepted standard for this: https://passlib.readthedocs.io/en/stable/modular_crypt_format.html
When the user logs in, authenticate them by first checking which type of hash they have, and then calculating that hash. If they are still md5, calculate the md5 to log them in; if they are now using pbkdf2, calculate that hash instead.
After authenticating the password, if they are still flagged as md5, calculate the new format hash and replace it - making sure to now flag this as pbkdf2.
IMPORTANT: You will want to test this thoroughly before you release it to the wild. If you make a mistake, you might destroy the credentials of any user logging in. I would recommend temporarily retaining a copy of the old md5 hashes until you confirm production is stable, but make absolutely certain you destroy this copy completely. Your users passwords are not safe as long as the md5 hashes exist whatsoever.
I am using Django 1.97. The encrypted passwords are significantly different (in terms of the format).
Some passwords are of format $$$:
pbkdf2_sha256$24000$61Rm3LxOPsCA$5kV2bzD32bpXoF6OO5YuyOlr5UHKUPlpNKwcNVn4Bt0=
While others are of format :
!9rPYViI1oqrSMfkDCZSDeJxme4juD2niKcyvKdpB
Passwords are set either using User.objects.create_user() or user.set_password(). Is this difference an expected one ?
You'll be fine. You just have some blank passwords in your database.
Going back as far as V0.95, django used the $ separators for delimiting algorithm/salt/hash. These days, django pulls out the algorithm first by looking at what is in front of the first $ and then passes the whole lot to the hasher to decode. This allows for a wider set of formats, including the one for PBKDF2 which adds an extra iterations parameter in this list (as per your first example).
However, it also recognises that some users may not be allowed to login and/or have no password. This is encoded using the second format you've seen. As you can see here:
If password is None then a concatenation of UNUSABLE_PASSWORD_PREFIX and a random string will be returned which disallows logins.
You can also see that the random string is exactly 40 characters long - just like your second example.
In short, then, this is all as expected.
There is no significant difference between User.objects.create_user() and user.set_password() since first uses second.
Basically, passwords are in string with format <algorithm>$<iterations>$<salt>$<hash> according to docs. The differences might come from PASSWORD_HASHERS settings variable. May be one password was created with one hasher and other password with another. But if you'll keep those hashers in variable mentioned above all should be fine, users will able to change it etc. You can read about it in little notice after bcrypt section.
Also docs for django.contrib.auth package might be helpful too. Link.
UPDATE:
If you find documentation of an old django versions (1.3 for example), you will see that
Previous Django versions, such as 0.90, used simple MD5 hashes without password salts. For backwards compatibility, those are still supported; they'll be converted automatically to the new style the first time check_password() works correctly for a given user.
So I think that the answer might be somewhere here. But it really depends on how legacy your project is, so you can decide if it's normal or what. Anyway you can issue check_password() to be sure. Or you can just email your user with "change password please" notification. There are many factors involved really.
I'm using WTForms-Alchemy to define forms from model objects. I defined a field as a password thus:
password = db.Column(PasswordType(schemes=['pbkdf2_sha512']), nullable=True)
I persist the form to PostgreSQL and I always end up with the wrong hash in the database. Interestingly, this method worked flawlessly on a previous project that used MySQL.
I've now decided to encrypt my passwords by hand by calling pbkdf2_sha512.encrypt and pbkdf2_sha512.verify manually and the hashes are stored correctly.
Am I missing a configuration parameter? Could this be a bug?
I'm not entirely sure what the issue here is, but I wanted to mention that using pbkdf2 is not recommended now-a-days -- if you're storing user password hashes you should preferably be storing passwords using bcrypt. bcrypt is a cpu hard hashing algorithm, which makes it much harder to brute force for potential attackers.
I am experimenting with the passlib.hash.sha256_crypt algorithm in an App Engine app and it seems rather simple to implement.
Is this secure enough with it's default parameters of autogenerated salt and 80,000 rounds? Should it first be padded with random chars?
Password is being posted in from a form and encrypted as above.
I can't judge what "secure enough" means.
Last I read the best options were scrypt, bcrypt, or pbkdf2. If you can't use those then I recommend sha512 with many thousands of iterations. One of the benefits of using passlib CryptContext is you can update your scheme later as needed (and as better implementations become available) while keeping easy compatibility with previously stored passwords. sha512_crypt is very easy to implement on GAE with passlib CryptContext.
I'm not sure padding with characters (on top of salting) adds anything.
I made a topic about the built-in python hash function: Old python hashing done left to right - why is it bad?
The previous topic was about why it was bad for encryption, because we have an application called Gruyere which is filled with security holes, and it uses the hash() to encrypt cookies.
# global cookie_secret; only use positive hash values
h_data = str(hash(cookie_secret + c_data) & 0x7FFFFFF)
c_data is a username; cookie_secret is salt (which is just '' by default)
I have implemented a more secure encryption method using md5 hashing with salt, but one excercise is to beat this old encryption and I still cannot understand how :-( I've read the string_hash code from python sourcecode but it's not documented and I can't figure it out.
EDIT: The idea is to write a program which can create a valid cookie any valid user, so I think I need to find out cookie_secret somehow
Zack described the answer already in your last question: It's easy to find a collision.
Let's say you save hash("pwd") in the database (that you actually do something different doesn't matter. Now, if you enter "pwd" in the site, you can enter. But how is this checked? Again, the hash of "pwd" is token, and compared to the value in the database. But what if there is a second string, say "hello", and hash("hello") == hash("pwd")? Then you could also use "hello" as password. So to beat the encryption, you don't need to find "pwd", you just need any string which has the same hash-value. You can just search for such a string brute-force (and I guess you can do some optimizations based on the knowledge of the source of hash)