Storing and Hashing Passwords flask - Python [duplicate]

This question already has an answer here:
werkzeug.security generate_password_hash alternative without SHA-1
(1 answer)
Closed 3 years ago.
If I'm creating a password for something that is open source (Python - Flask) and I'm hashing passwords, is it secure to just hash them like I have below? Or should I create a config file on the server that isn't in the git repo that stores a salt? Is it less safe when people can see exactly how someone is hashing a password? If someone was able to get their hands on the database and knew the exact method that was used to hash a password, like the code below, would they be able to reverse it easily? Or is there something that I can add to make that difficult?
login_details_dict['account_id'] = account.account_id
login_details_dict['account_password'] = sha256_crypt.hash(account_password)
login_details = login_schema.load(login_details_dict)
login_details.save_to_db()

No; hashing passwords using only one round of SHA-256 for any web application is not secure!
SHA-256 is a hashing algorithm primarily designed for data integrity verification. This means it was optimized for speed. Being optimized for speed makes it vulnerable to bruteforce and dictionary attacks, which consist of guessing the password many times.
Suppose that your login database was leaked, something that happens even to the largest webapps like banks or large corporations. Your password hashes would be exposed to an adversary. What would they do to get to those juicy passwords? They would constantly guess passwords until they found the right one. Bitcoin mining uses a similar mechanism of "hash guessing" for mining, and there are SHA256 ASICs that can perform terahashes per second. Would you feel comfortable with an attacker being able to guess your password trillions of times per second?
A more secure approach would be to use a modern KDF, like Scrypt or Argon2. Modern KDFs are designed to be memory heavy, which limits hashing to the speed of RAM and makes it very difficult to build efficient ASICs for them. Because KDFs are slow, it is best to execute the KDF on the client side, then send the KDF hash to the server, and hash the KDF hash one last time on the server side with a fast algorithm like SHA-256. This would allow you to offload the processing to the clients without hashes from a leaked database being usable as passwords.
Note: JavaScript key derivation can be slow. If you want client-side key derivation to be faster, you could potentially use WebAssembly to accelerate it. Try not to reduce parameters too much; it will make the algorithm easier to bruteforce.
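A minimal sketch of the client-then-server scheme described above, under stated assumptions: both halves are simulated in Python here (in the answer's scheme the KDF step would run in the browser), the function names are made up for illustration, and the scrypt parameters are an example rather than a recommendation.

```python
import hashlib
import hmac
import os


def client_side_kdf(password: str, salt: bytes) -> bytes:
    """Slow, memory-hard step; in the answer's scheme this runs on the client."""
    return hashlib.scrypt(
        password.encode("utf-8"), salt=salt, n=2**14, r=8, p=1, dklen=32
    )


def server_side_digest(client_hash: bytes) -> bytes:
    """Fast final hash on the server, so the stored value is not a login token."""
    return hashlib.sha256(client_hash).digest()


# Registration: the server keeps only the salt and the final digest.
salt = os.urandom(16)
stored = server_side_digest(client_side_kdf("correct horse", salt))

# Login: the client recomputes the KDF output, the server hashes it once more.
submitted = client_side_kdf("correct horse", salt)
login_ok = hmac.compare_digest(server_side_digest(submitted), stored)

# Replaying the leaked database value as if it were the client hash fails.
replay_ok = hmac.compare_digest(server_side_digest(stored), stored)
```

The last two lines are the point of the final server-side hash: a value copied out of a leaked database cannot be sent back as a valid login.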
Furthermore, storing a salt in a Git repo sounds like you're planning to have one salt for the entire web application. This is a bad idea, as it means that an attacker can use one iteration of your hashing function/KDF to guess a single password for all your database entries. It's best to generate a random salt for each password, and store it with the password in the database.
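As a rough illustration of per-password salts, the sketch below generates a fresh random salt for every password and stores it next to the digest. It assumes the standard library's hashlib.scrypt is available (it needs a Python build linked against a recent OpenSSL); the "salt$digest" record layout and the parameters are choices made for this example only.

```python
import base64
import hashlib
import os


def hash_password(password: str) -> str:
    """Return 'salt$digest' (both base64) using a fresh random salt."""
    salt = os.urandom(16)
    digest = hashlib.scrypt(
        password.encode("utf-8"), salt=salt, n=2**14, r=8, p=1, dklen=32
    )
    return (
        base64.b64encode(salt).decode("ascii")
        + "$"
        + base64.b64encode(digest).decode("ascii")
    )


# The same password hashed twice yields two different records,
# so one guess cannot be checked against every row at once.
record_a = hash_password("hunter2")
record_b = hash_password("hunter2")
```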
While we're here, you might want to protect against timing attacks as well. When comparing the hashes, a comparison whose running time depends on its input, such as a simple "==", can let attackers recover the hash one character at a time and use it to log in to the system. Using a constant-time comparison function like itsdangerous.constant_time_compare(), or hmac.compare_digest() from the standard library, would protect against this type of attack.
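For the comparison step, a small sketch using hmac.compare_digest from the Python standard library, which is designed for constant-time comparison of secrets. The helper name and the sample values are illustrative only.

```python
import hashlib
import hmac


def hashes_match(stored_digest: bytes, candidate_digest: bytes) -> bool:
    """Compare two digests without leaking where the first difference is."""
    return hmac.compare_digest(stored_digest, candidate_digest)


stored = hashlib.sha256(b"example stored value").digest()
same = hashes_match(stored, hashlib.sha256(b"example stored value").digest())
different = hashes_match(stored, hashlib.sha256(b"another value").digest())
```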
Exposing your source code to the world should not make it insecure if you are using modern security practices. Please mind the best security practices listed above along with others when making your web application.

Related

How do I store Credentials in Django [duplicate]

I'm working on a python/django app which, among other things, syncs data to a variety of other services, including samba shares, ssh(scp) servers, Google apps, and others. As such, it needs to store the credentials to access these services. Storing them as unencrypted fields would be, I presume, a Bad Idea, as an SQL injection attack could retrieve the credentials. So I would need to encrypt the creds before storage - are there any reliable libraries to achieve this?
Once the creds are encrypted, they would need to be decrypted before being usable. There are two use cases for my app:
One is interactive - in this case the user would provide the password to unlock the credentials.
The other is an automated sync - this is started by a cron job or similar. Where would I keep the password in order to minimise risk of exploits here?
Or is there a different approach to this problem I should be taking?

I have the same problem and have been researching this for the past few days. The solution presented by @Rostislav is pretty good, but it's incomplete and a bit outdated.
On the Algorithm Layer
First, there's a new library for cryptography called, appropriately enough, Cryptography. There are a good number of reasons to use this library instead of PyCrypto, but the main ones that attracted me are:
A core goal is for you to be unable to shoot yourself in the foot. For example, it doesn't have severely outdated hash algos like MD2.
It has strong institutional support
500,000 tests with continuous integration on various platforms!
Their documentation website has a better SSL configuration (near-perfect A+ score instead of a mediocre B rating)
They have a disclosure policy for vulnerabilities.
You can read more about the reasons for creating the new library on LWN.
Second, the other answer recommends using SHA1 as the encryption key. SHA1 is dangerously weak and getting weaker. The replacement for SHA1 is SHA2, and on top of that, you should really be salting your hash and stretching it using either bcrypt or PBKDF2. Salting is important as a protection against rainbow tables and stretching is an important protection against brute forcing.
(Bcrypt is designed to be slow, with an adjustable work factor, though unlike scrypt it does not use much memory; PBKDF2 is also designed to be slow and is recommended by NIST. In my implementation, I use PBKDF2. If you want more on the differences, read this.)
For encryption AES in CBC mode with a 128-bit key should be used, as mentioned above – that hasn't changed, although it's now rolled up into a spec called Fernet. The initialization vector will be generated for you automatically in this library, so you can safely forget about that.
On the Key Generation and Storage Layer
The other answers are quite right to suggest that you need to carefully consider key handling and opt for something like OAuth, if you can. But assuming that's not possible (it isn't in my implementation), you have two use cases: Cron jobs and Interactive.
The cron job use case boils down to the fact that you need to keep a key somewhere safe and use it to run cron jobs. I haven't studied this, so I won't opine here. I think there are a lot of good ways to do this, but I don't know the easiest way.
For the Interactive use case, what you need to do is collect a user's password, use that to generate a key, and then use that key to decrypt the stored credentials.
Bringing it home
Here's how I would do all of the above, using the Cryptography library:
import base64
import os
from cryptography.fernet import Fernet
from cryptography.hazmat.primitives.hashes import SHA256
from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC
from cryptography.hazmat.backends import default_backend
secret = b"Some secret"  # Fernet works on bytes, not str
user_pwd = b"the user's password"  # collected from the user, as bytes
# Generate a salt for use in the PBKDF2 hash
salt = base64.b64encode(os.urandom(12)) # Recommended method from cryptography.io
# Set up the hashing algo
kdf = PBKDF2HMAC(
    algorithm=SHA256(),
    length=32,
    salt=salt,  # must be bytes
    iterations=100000, # This stretches the hash against brute forcing
    backend=default_backend(), # Typically this is OpenSSL
)
# Derive a binary hash and encode it with URL-safe base 64, which Fernet requires
hashed_pwd = base64.urlsafe_b64encode(kdf.derive(user_pwd))
# Set up AES in CBC mode using the hash as the key
f = Fernet(hashed_pwd)
encrypted_secret = f.encrypt(secret)
# Store the safe inputs in the DB, but do NOT include a hash of the
# user's password, as that is the key to the encryption! Only store
# the salt, the algo and the number of iterations.
# (db is a placeholder for your own storage layer.)
db.store(
    user='some-user',
    secret=encrypted_secret,
    algo='pbkdf2_sha256',
    iterations='100000',
    salt=salt
)
Decryption then looks like:
# Get the data back from your database
encrypted_secret, algo, iterations, salt = db.get('some-user')
# Set up the key derivation function (PBKDF2)
kdf = PBKDF2HMAC(
    algorithm=SHA256(),
    length=32,
    salt=salt,  # the same bytes that were stored
    iterations=int(iterations),
    backend=default_backend(),
)
# Generate the key from the user's password
key = base64.urlsafe_b64encode(kdf.derive(user_pwd))
# Set up the AES encryption again, using the key
f = Fernet(key)
# Decrypt the secret!
secret = f.decrypt(encrypted_secret)
print("Your secret is: %s" % secret.decode("utf-8"))
Attacks?
Let's assume your DB is leaked to the Internet. What can an attacker do? Well, the key we used for encryption took the 100,000th SHA256 hash of your user's salted password. We stored the salt and our encryption algo in your database. An attacker must therefore either:
Attempt brute force of the hash: Combine the salt with every possible password and hash it 100,000 times. Take that hash and try it as the decryption key. The attacker will have to do 100,000 hashes just to try one password. Against a strong password, this is basically impossible.
Try every possible hash directly as the decryption key. This is basically impossible.
Try a rainbow table with pre-computed hashes? Nope, not when random salts are involved.
I think this is pretty much solid.
There is, however, one other thing to think about. PBKDF2 is designed to be slow. It requires a lot of CPU time. This means that you are opening yourself up to denial-of-service (DoS) attacks if there's a way for users to trigger PBKDF2 hashes at will. Be prepared for this.
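To get a feel for that cost, here is a small timing sketch using hashlib.pbkdf2_hmac from the standard library (not the Cryptography library used above). The salt and password values are placeholders, and the numbers printed depend entirely on the machine running it.

```python
import hashlib
import time


def time_pbkdf2(iterations: int) -> float:
    """Return the seconds taken by one PBKDF2-HMAC-SHA256 derivation."""
    start = time.perf_counter()
    hashlib.pbkdf2_hmac("sha256", b"guess", b"example-salt-16b", iterations)
    return time.perf_counter() - start


for iterations in (1_000, 100_000):
    elapsed_ms = time_pbkdf2(iterations) * 1000
    print(f"{iterations:>7} iterations: {elapsed_ms:.1f} ms per derivation")
```

The same cost that slows an attacker's guesses is paid by your server on every login, which is why rate limiting the endpoints that trigger a derivation is worth considering.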
Postscript
All of this said, I think there are libraries that will do some of this for you. Google around for things like django encrypted field. I can't make any promises about those implementations, but perhaps you'll learn something about how others have done this.

First, storing on a server enough credentials to log in to a multitude of systems looks like a nightmare. Compromised code on your server will leak them all, whatever the encryption.
You should store only the credentials that are necessary to perform your task (i.e. file sync). For servers you should consider using a synchronization server like rsync, and for Google, protocols like OAuth. This way, if your server is compromised, this will only leak the data, not the access to the systems.
The next thing is encrypting these credentials. For cryptography I advise you to use PyCrypto.
Generate all random numbers you use in your cryptography with Crypto.Random (or some other strong method) to be sure they are strong enough.
You should not encrypt different credentials with the same key. The method I would recommend is this:
Your server should have its master secret M (derived from /dev/random). Store it in a file owned by root and readable by root only.
When your server starts with root privileges, it reads the file into memory and drops its privileges before serving clients. That's normal practice for web servers and other daemons.
When you are to write a new credential (or update existing one) generate a random block S. Take the first half and calculate hash K=H(S1,M). That would be your encryption key.
Use CBC mode to encrypt your data. Take the initialization vector (IV) from S2.
Store S alongside with encrypted data.
When you need to decrypt just take out S create the K and decrypt with the same IV.
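The key-derivation part of the steps above can be sketched with the standard library alone. This is an illustration under assumptions: HMAC-SHA256 is swapped in for the hash H, the block S is taken to be 32 bytes so that each half is 16 bytes, and the AES-CBC step itself is left out because the standard library has no AES.

```python
import hashlib
import hmac
import os


def derive_key_and_iv(master_secret: bytes, block: bytes):
    """Split S into halves: S1 feeds the key hash, S2 becomes the CBC IV."""
    s1, s2 = block[:16], block[16:]
    key = hmac.new(master_secret, s1, hashlib.sha256).digest()  # K = H(S1, M)
    return key, s2


master_secret = os.urandom(32)  # M; the answer reads this from a root-only file
block = os.urandom(32)          # S; stored alongside the encrypted data
key, iv = derive_key_and_iv(master_secret, block)
```

Because S is random for every credential, each credential gets its own key and IV even though the master secret M is shared.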
For hash I would advise SHA1, for encryption — AES. Hashes and symmetric cyphers are fast enough so going for larger key sizes wouldn't hurt.
This scheme is a bit overshot in some places but again this wouldn't hurt.
But remember again, best way to store credentials is not to store credentials, and when you have to, use the least privileged ones that will allow you to accomplish the task.

Maybe you can rely on a multi-user scheme, by creating:
A user running Django (e.g. django) who does not have the permission to access the credentials
A user having those permissions (e.g. sync).
Both of them can be in the django group, to allow them to access the app. After that, make a script (a Django command, such as manage.py sync-external, for instance) that syncs what you want.
That way, the django user will have access to the app and the sync script, but not the credentials, because only the sync user does. If anyone tries to run that script without the credentials, it will of course result in an error.
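One way the sync script could guard against a misconfigured setup is to check the credentials file's permissions before reading it. This is only a sketch: the helper name is made up, and it assumes a POSIX permission model.

```python
import os
import stat


def is_private_file(path: str) -> bool:
    """True if neither the group nor other users have any access to the file."""
    mode = os.stat(path).st_mode
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
```

A sync command could call is_private_file() on its credentials path and exit with an error when it returns False, so that a file accidentally made group-readable does not go unnoticed.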
Relying on Linux permission model is in my opinion a "Good Idea", but I'm not a security expert, so bear that in mind. If anyone has anything to say about what's above, don't hesitate!

should I store all encrypted password to database using generate_password_hash()

When I use the generate_password_hash() function, I get an encrypted password string which contains a random salt.
>>> from werkzeug.security import generate_password_hash, check_password_hash
>>> generate_password_hash('password')
'pbkdf2:sha1:1000$3j8Brovx$9acddcd67da9e4c913817231c882a0f757e2d095'
If I store this string in the database and someone else hacks into my database and gets this string, it's easy to get the original password using brute force cracking because the encrypted password contains the salt.
check_password_hash('pbkdf2:sha1:1000$9HycZ0Qa$94f08a91fba1c040c5bffb6c7e1ab5a6ad4818de', 'password')
Should I encrypt the original password using my own salt first before using generate_password_hash() or is there a better solution?
Thanks.

"it's easy to get the original password using brute force cracking because the encrypted password contains the salt."
No, it's "easy" to brute force because you have a low iteration count of 1000.
"Should I encrypt the original password using my own salt first before using generate_password_hash() or is there a better solution?"
No, encryption is reversible, and since a lost database also means that the encryption key is probably lost too, the additional encryption would be useless.
An easy fix would be to increase the number of iterations to a million or 10 million, depending on what your server can afford without driving users away with a slow authentication procedure.
generate_password_hash('password', method='pbkdf2:sha256:1000000')
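To see why the salt inside the stored string is not the weak point, it helps to pull the string apart. The sketch below parses the format shown in the question's example ("method:hash:iterations$salt$digest"); the parser is a hypothetical helper written for this illustration, and other Werkzeug versions or methods may use a different layout.

```python
def parse_pbkdf2_hash(stored: str) -> dict:
    """Split a 'pbkdf2:<hash>:<iterations>$<salt>$<digest>' string into fields."""
    method, salt, digest = stored.split("$", 2)
    kdf_name, hash_name, iterations = method.split(":")
    return {
        "kdf": kdf_name,
        "hash": hash_name,
        "iterations": int(iterations),
        "salt": salt,
        "digest": digest,
    }


info = parse_pbkdf2_hash(
    "pbkdf2:sha1:1000$3j8Brovx$9acddcd67da9e4c913817231c882a0f757e2d095"
)
```

Everything an attacker needs to test a guess is in the record by design; the only field that changes how expensive each guess is, is the iteration count.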
The problem with PBKDF2 is that it can be easily parallelized, because it doesn't need much memory. There are alternatives such as scrypt and Argon2 which can be configured to require much memory. Memory is currently the main limitation of dedicated password brute forcing machines based on ASICs.
Ultimately, nothing you do will lead to a secure authentication system if your users are using "password1" as their password. You should require your users to use complicated passwords with at least 12 characters including uppercase letters, lowercase letters and numbers (optionally including special characters). Those should also not be part of a dictionary.
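The password rules above could be checked with something like the following sketch. The COMMON_PASSWORDS set is a tiny hypothetical stand-in for a real dictionary or breached-password list, and the function name is made up for this example.

```python
# Hypothetical stand-in for a real dictionary of known-bad passwords.
COMMON_PASSWORDS = {"password1", "Password12345"}


def is_acceptable(password: str) -> bool:
    """At least 12 characters, with upper case, lower case and a digit."""
    return (
        len(password) >= 12
        and any(c.isupper() for c in password)
        and any(c.islower() for c in password)
        and any(c.isdigit() for c in password)
        and password not in COMMON_PASSWORDS
    )
```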
See more: How to securely hash passwords?

When you store password hashes, the main assumption is that it is too difficult to retrieve the password using brute force. If you want it to be safer, go for slower hash algorithms and longer passwords.
Encryption is worse than a hash because hash is irreversible and brute force is the only way to retrieve the password. With encryption, brute force is just one of the options.
Once that is clear, you have the option to have a "secret" salt in the code, or the salt can be saved with the hash. Saving the salt with the password is safer! Why? Because you have a different salt for each password, so the intruder has to brute force each password separately. If you have one global salt value, brute force can be done for all passwords in the database in one go.
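The difference is easy to see in a small sketch. It uses hashlib.pbkdf2_hmac from the standard library; the user names, the password and the iteration count are example values only.

```python
import hashlib
import os


def slow_hash(password: str, salt: bytes) -> bytes:
    """PBKDF2-HMAC-SHA256 with an example iteration count."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


# One global salt: two users with the same password get the same hash,
# so a single guess is tested against every row at once.
GLOBAL_SALT = b"one-salt-for-everyone"
global_alice = slow_hash("letmein", GLOBAL_SALT)
global_bob = slow_hash("letmein", GLOBAL_SALT)

# A random salt per password: the hashes differ, and every row has to be
# attacked separately.
per_user_alice = slow_hash("letmein", os.urandom(16))
per_user_bob = slow_hash("letmein", os.urandom(16))
```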

How to encrypt and decrypt passwords for selenium testing?

The context is testing of a web app with selenium while using a number of virtual user accounts we created for this very purpose. And so the testing process needs to access our sites and log-on with the virtual user's id and password.
None of these accounts are critical and they are flagged as testing accounts so no damage can be done. Still, it would probably be a good idea to encrypt the passwords and decrypt them prior to use.
If it matters, our test app is written in Python and Django and uses PostgreSQL for the database. It runs on a small Linode instance.
What might best practices be for something like this?
EDIT 1
The other thought I had was to store the credentials on a second machine and access them through an API while only allowing that access to happen from a known server's non-public IP. In other words, get two instances at Linode and create a private machine-to-machine connection within the data center.
In this scenario, access to the first machine would allow someone to potentially make requests to the second machine if they are able to de-obfuscate the API code. If someone really wants the data they can certainly get it.
We could add two factor authentication as a way to gate the tests. In other words, even if you had our unencrypted test_users table you couldn't do anything with them because of the 2FA mechanism in place just for these users.
Since this is for testing purposes only, I am starting to think the best solution might very well be to populate the test_users table with valid passwords only while running a test. We could keep the data safe elsewhere and have a script that uploads the data to the test server when we want to run a test suite. Someone with access to this table could not do anything with it because all the passwords would be invalid. In fact, we could probably use this fact to detect such a breach.
I just hate the idea of storing unencrypted passwords even if it is for test users that can't really do any damage to the actual app (their transactions being virtual).
EDIT 2
An improvement to that would be to go ahead and encrypt the data and keep it in the test server. However, every time the tests are run the system would reach out to us for the crypto key. And, perhaps, after the test is run the data is re-encrypted with a new key. A little convoluted but it would allow for encrypted passwords (and even user id's, just to make it harder) on the test server. The all-important key would be nowhere near the server and it would self-destruct after each use.

What is generally done in a case like this is to put the password through a cryptographic hash function, and store the hashed password.
To verify a login, hash the provided password and compare the calculated hash to the stored version.
The idea behind this is that it is considered impossible to reverse a good cryptographic hash function. So it doesn't matter if an attacker could read the hashed passwords.
Example in Python3:
In [1]: import hashlib
In [2]: hashlib.sha256('This is a test'.encode('utf8')).hexdigest()
Out[2]: 'c7be1ed902fb8dd4d48997c6452f5d7e509fbcdbe2808b16bcf4edce4c07d14e'
In [3]: hashlib.sha256('This is a tist'.encode('utf8')).hexdigest()
Out[3]: 'f80b4162fc28f1f67d1a566da60c6c5c165838a209e89f590986333d62162cba'
In [4]: hashlib.sha256('This is a tst.'.encode('utf8')).hexdigest()
Out[4]: '1133d07c24ef5f46196ff70026b68c4fa703d25a9f12405ff5384044db4e2adf'
(for Python2, just leave out the encode.)
As you can see, even one-letter changes lead to a big change in the hash value.
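To put a number on that, the following sketch counts how many of the 256 output bits differ between the hashes of two inputs. The helper name is made up for this example.

```python
import hashlib


def differing_bits(a: str, b: str) -> int:
    """Count the bits that differ between the SHA-256 digests of a and b."""
    digest_a = int.from_bytes(hashlib.sha256(a.encode("utf8")).digest(), "big")
    digest_b = int.from_bytes(hashlib.sha256(b.encode("utf8")).digest(), "big")
    return bin(digest_a ^ digest_b).count("1")


changed = differing_bits("This is a test", "This is a tist")
print("%d of 256 bits differ" % changed)
```

For a good hash function, roughly half of the bits are expected to flip for any change in the input, however small.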

SHA 512 Password with webapp2 and App Engine?

If you are using webapp2 with Google App Engine, you can see there is only one way to create a user: the "create_user" method [auth/models.py line:364].
But that method calls the "security.generate_password_hash" method, where it is not possible to use SHA 512.
Q1: I would like to know what is the best way to create a SHA 512 password with webapp2 and App Engine Python?
Q2: Is it a good idea to use SHA 512 instead of the encryption offered by webapp2 (SHA1), or is that enough?

As you observe, the default User model doesn't provide any way to customize the hash function being used. You could subclass it and redefine the problematic methods to take a hash parameter, or file a feature request with the webapp2 project.
Webapp2's password hashing has much bigger issues, though, as it doesn't do password stretching. While it optionally(!) salts the hash, it doesn't iterate it, making brute force attacks more practical than they should be for an attacker. It should implement a proper password primitive such as PBKDF2, SCrypt, or BCrypt.
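As an illustration of stretching with SHA-512, here is a minimal sketch using hashlib.pbkdf2_hmac from the Python standard library. It is not webapp2's API; the function name and the iteration count are assumptions made for this example.

```python
import hashlib
import os


def stretch_sha512(password: str, salt: bytes, iterations: int = 200_000) -> bytes:
    """Salted, iterated PBKDF2-HMAC-SHA512 digest of the password."""
    return hashlib.pbkdf2_hmac(
        "sha512", password.encode("utf-8"), salt, iterations
    )


salt = os.urandom(16)  # store this next to the digest
digest = stretch_sha512("example password", salt)
```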
To answer your question about relative strengths of hash functions: SHA1 is showing weakness against collisions (a practical collision was demonstrated in 2017), but nobody has demonstrated a preimage attack. Further, the HMAC construction can result in secure HMACs even with a hash function that's weak against collision attacks; arguably even MD5 would work here.
Of course, attacks only ever get better, never worse, so it's a good idea to prepare for the future. If you're concerned about security, though, you should be much more concerned about the lack of stretching than the choice of hash function. And if you're really concerned about security, you shouldn't be doing authentication yourself - you should be using the Users API or OAuth, so someone else can have the job of securely storing passwords.

Using my own gensalt() - Is it safe enough?

Since the mechanism of bcrypt is:
>>> myhash = bcrypt.hashpw('testpassword', bcrypt.gensalt(12))
>>> myhash
'$2a$12$K1hnCm5z74QtXaynv4.S8.i1FK9xjRr7JSPCRCyB9zpv8xZznZGFi'
>>> bcrypt.hashpw('testpassword', myhash)
'$2a$12$K1hnCm5z74QtXaynv4.S8.i1FK9xjRr7JSPCRCyB9zpv8xZznZGFi'
I want to use it for auth. The problem is that I want to make it from the client, so I need the salt part in the client.
I thought, if I use my own gensalt(username) — which generates a salt from a user name — it could be good for the client to always use the same salt, different from other users.
Is that a good approximation to bcrypt and for my project, or am I breaking the security in bcrypt mechanism?
I’m thinking that if someone wants to decrypt the password, it can’t be possible using rainbow tables because (s)he must use one for each user. I’m not experienced enough in security issues to know if that would be good. Maybe the hashpw is fast enough to do brute force on a PC.

The short answer is: No, what you are describing isn't secure at all.
First of all, bcrypt is not an encryption function, and therefore the results of this function cannot be "decrypted". bcrypt is a message digest function built using Blowfish. Hashes produced by a message digest function are cracked, not decrypted.
It is very problematic for a client to authenticate using a message digest function. Microsoft's NTLM uses a message digest function for authentication and it has been broken many times. I think that this approach to authentication is flawed and should be avoided.
The reason why message digest functions are used is as a defense-in-depth measure, security in layers. If an attacker is able to find a SQL injection vulnerability, you want to force them to spend resources to break the hash before they can log in. If I can pull the hash out of the database and use it to log in, then your system is totally worthless. Replay attacks are a huge concern when a client authenticates with a hash. If I can sniff the network and replay the login sequence, then this system is totally worthless.
Generate a random salt; bcrypt.gensalt(12) is probably fine. Store the hash and the salt in your database. You must authenticate using a secure transport layer. Make sure you read OWASP A9.
