I have a function for encrypting with SHA-1 in Python, using hashlib. I take a file and encrypt the contents with this hash.
If I set a password for an encrypted text file, can I use this password to decrypt it and restore the file with the original text?
Hashing functions are different from normal crypto algorithms. They are often referred to as one-way functions, because the transformation the data goes through is irreversible.
Unlike symmetric and asymmetric encryption, hashes are used by comparing the hashed values themselves, instead of decrypting and comparing the plain-text values. To validate logins when you're using hashes, you hash the password the user just attempted to log in with and compare it with the hash you have in your DB. If they match, the login is successful.
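A minimal sketch of that login flow using only Python's standard library. It assumes PBKDF2-HMAC-SHA256 (`hashlib.pbkdf2_hmac`) as the hash rather than plain SHA-1, since a deliberately slow hash is the usual choice for passwords; `register`, `login` and the iteration count are hypothetical, illustrative choices.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor, not a recommendation

def hash_password(password: str, salt: bytes) -> bytes:
    """One-way: derive a fixed-size hash from the password and salt."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)

def register(password: str):
    """Return the (salt, hash) pair that would be stored in the DB."""
    salt = os.urandom(16)
    return salt, hash_password(password, salt)

def login(attempt: str, stored_salt: bytes, stored_hash: bytes) -> bool:
    """Hash the login attempt the same way and compare it with the stored hash."""
    candidate = hash_password(attempt, stored_salt)
    # compare_digest compares the two hashes in constant time
    return hmac.compare_digest(candidate, stored_hash)

salt, stored = register("correct horse")
print(login("correct horse", salt, stored))  # True
print(login("wrong guess", salt, stored))    # False
```

Nothing is ever decrypted: the stored hash is only ever compared against a freshly computed one.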
Cracking hashes involves guessing: hashing many different strings and trying to match the results against hashes illegally obtained from a DB. There are lists available on the internet with millions of precomputed hashes to make cracking easier; these are known as Rainbow Tables, and they can be easily countered with the use of Salts.
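To make the salt argument concrete, here is a naive precomputed lookup table (a real rainbow table uses hash chains to save space, but the principle is the same). SHA-1 is used only to match the question; the password list is hypothetical.

```python
import hashlib
import os

def sha1_hex(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()

# Tiny stand-in for a precomputed table: hash -> original password.
common_passwords = [b"123456", b"password", b"qwerty", b"letmein"]
lookup_table = {sha1_hex(p): p for p in common_passwords}

# Unsalted hash stolen from a DB: a single dictionary lookup recovers it.
stolen_unsalted = sha1_hex(b"letmein")
print(lookup_table.get(stolen_unsalted))  # b'letmein'

# Salted hash: the table was built without this salt, so nothing matches.
salt = os.urandom(16)
stolen_salted = sha1_hex(salt + b"letmein")
print(lookup_table.get(stolen_salted))  # None
```

With a random salt per user, an attacker would need a separate table for every salt, which defeats the point of precomputing.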
It's also worth noting that since hashing algorithms digest GBs of data into much smaller fixed-size strings, mathematically there must be distinct inputs that share the same hash. Even though running into such a pair by accident is very rare, it is a real problem, and it's known as a Hash Collision.
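Collisions for full SHA-1 are far too rare to find by chance, but shrinking the output shows the pigeonhole effect directly. This sketch truncates SHA-1 to 16 bits (an artificial, deliberately weak hash) and hashes distinct inputs until two collide. With only 65,536 possible outputs, a repeat is guaranteed within 65,537 inputs, and the birthday effect typically produces one after a few hundred.

```python
import hashlib
from itertools import count

def tiny_hash(data: bytes, n_bytes: int = 2) -> bytes:
    """SHA-1 truncated to n_bytes (16 bits by default): deliberately weak."""
    return hashlib.sha1(data).digest()[:n_bytes]

def find_collision(n_bytes: int = 2):
    """Hash distinct inputs until two of them share the same truncated hash."""
    seen = {}
    for i in count():
        msg = str(i).encode()
        h = tiny_hash(msg, n_bytes)
        if h in seen:
            return seen[h], msg
        seen[h] = msg

a, b = find_collision()
print(a, b, tiny_hash(a).hex(), tiny_hash(b).hex())
```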
If hashing were reversible, hard drives would be redundant, since we could hash thousands of GBs into a small string of text and reverse it whenever we pleased. That would allow data compression and storage in ways that information theory rules out.
Related Wikipedia Articles:
Hashing Algorithms: http://en.wikipedia.org/wiki/Hash_function
Rainbow Tables: http://en.wikipedia.org/wiki/Rainbow_table
Salts: http://en.wikipedia.org/wiki/Salt_(cryptography)
Collision: http://en.wikipedia.org/wiki/Collision_(computer_science)
Symmetric Encryption: http://en.wikipedia.org/wiki/Symmetric-key_algorithm
Asymmetric Encryption: http://en.wikipedia.org/wiki/Public-key_cryptography
SHA-1 is not an encryption algorithm; it's a hashing algorithm. By definition, you can't "decrypt" anything that was hashed with SHA-1, because the function has no inverse.
If you have an arbitrary hashed password, there's very little you can do to retrieve the original password. If you're lucky, the password is in a database of reverse hashes, but that's as far as you can go.
And the message extraction algorithm expects the original password to perform the verification: the algorithm hashes the provided plain-text password and compares it against the stored hashed password, and only if they're equal is the plain-text message revealed.
Hash functions are one-way tickets. You cannot use them for encryption.
Hash function algorithms are built from modular arithmetic, XOR and other familiar operations, combined so that the overall function is one-way.
You may try to find which input was used to generate a hash, but in theory you can never be 100% sure it is the correct value.
For example, try a really simple (and cryptographically useless) hash function: modulo 10. This function returns only ten different values. If the result is 7, the input might have been 7, 137 or 1234567. The same holds for MD5, SHA-1 and better ones.
As you can see, when you use a hash function that returns only 20 bytes (SHA-1's output, 40 characters in hex) on files that are much bigger (perhaps a few hundred megabytes), in theory there are astronomically many possible files for each possible hash (infinitely many, if input length is unbounded).
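A quick check with Python's `hashlib` shows that SHA-1's output size is fixed at 20 bytes (40 hex characters) regardless of how much data goes in, which is exactly why many inputs must map to each output.

```python
import hashlib

# SHA-1's output size is fixed, no matter how much data goes in.
for data in (b"", b"hello", b"x" * 10_000_000):
    d = hashlib.sha1(data)
    print(len(data), d.digest_size, len(d.hexdigest()))
# 0 20 40
# 5 20 40
# 10000000 20 40

# Number of distinct SHA-1 outputs: 2**160, about 1.46e48.
print(2 ** 160)
```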
Related
I am looking at different alternatives to hash passwords in a Python app. First I was settling for Flask-bcrypt (https://github.com/maxcountryman/flask-bcrypt), but then decided to use Argon2. The most popular Argon2 binding for Python is argon2-cffi (https://github.com/hynek/argon2-cffi).
According to its docs (https://argon2-cffi.readthedocs.io/en/stable/api.html), all I need to do is use 3 methods:
hash to hash a password
verify to compare a password to a hash
check_needs_rehash to see if a password should be rehashed after a change in the hashing parameters
Two things puzzle me.
1) The salt is random, using os.urandom. I thus wonder if the verify method is somehow able to extract the salt from the hash? Or in other words, since I have no say in what the salt is and cannot save it, how can the verify method actually ever compare any password to a password that was hashed with a random salt? Am I supposed to somehow parse the salt from the return value of hash myself, and store it separately from the hashed value? Or is the hash supposed to be stored as is in the docs, untouched, and somehow Argon2 is capable of verifying a password against it? And if indeed Argon2 can extract the salt out of the hash, how is using a salt any safer in that case since a hostile entity who gets a hashed password should then also be able to extract the salt?
2) By default I do not supply any secret to the hash method and instead the password itself seems to be used as a secret. Is this secure? What are the downsides for me not supplying a secret to the hashing method?
1) The salt is random, using os.urandom. I thus wonder if the verify method is somehow able to extract the salt from the hash?
The hash method returns a string that encodes the salt, the parameters, and the password hash itself, as shown in the documentation:
>>> from argon2 import PasswordHasher
>>> ph = PasswordHasher()
>>> hash = ph.hash("s3kr3tp4ssw0rd")
>>> hash
'$argon2id$v=19$m=102400,t=2,p=8$tSm+JOWigOgPZx/g44K5fQ$WDyus6py50bVFIPkjA28lQ'
>>> ph.verify(hash, "s3kr3tp4ssw0rd")
True
The format is summarized in the Argon2 reference implementation; perhaps there are other references. In this case:
$argon2id$...
The hash is Argon2id, which is the specific Argon2 variant that everyone should use (combining the side-channel resistance of Argon2i with the more difficult-to-crack Argon2d).
...$v=19$...
The version of the hash is 0x13 (19 decimal), meaning Argon2 v1.3, the version adopted by the Password Hashing Competition.
...$m=102400,t=2,p=8$...
The memory use is 100 MiB (102400 KiB), the time is 2 iterations, and the parallelism is 8 ways.
...$tSm+JOWigOgPZx/g44K5fQ$...
The salt is tSm+JOWigOgPZx/g44K5fQ (base64), or b5 29 be 24 e5 a2 80 e8 0f 67 1f e0 e3 82 b9 7d (hexadecimal).
...$WDyus6py50bVFIPkjA28lQ
The password hash itself is WDyus6py50bVFIPkjA28lQ (base64), or 58 3c ae b3 aa 72 e7 46 d5 14 83 e4 8c 0d bc 95 (hexadecimal).
The verify method takes this string and a candidate password, recomputes the password hash with all the encoded parameters, and compares it to the encoded password hash.
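A small standard-library sketch that splits the encoded string from the session above into its fields and decodes the salt and hash, confirming that everything verify needs is inside the string. It assumes the six-field `$variant$v=..$m=..,t=..,p=..$salt$hash` layout shown here, and it does not recompute the Argon2 hash itself; `parse_argon2` is a hypothetical helper, not part of argon2-cffi.

```python
import base64

def b64_nopad_decode(s: str) -> bytes:
    """The encoded string uses base64 without '=' padding; add it back before decoding."""
    return base64.b64decode(s + "=" * (-len(s) % 4))

def parse_argon2(encoded: str) -> dict:
    """Split an encoded Argon2 string into its fields (no hashing happens here)."""
    _, variant, version, params, salt_b64, hash_b64 = encoded.split("$")
    return {
        "variant": variant,
        "version": int(version.split("=")[1]),
        "params": {k: int(v) for k, v in (p.split("=") for p in params.split(","))},
        "salt": b64_nopad_decode(salt_b64),
        "hash": b64_nopad_decode(hash_b64),
    }

fields = parse_argon2(
    "$argon2id$v=19$m=102400,t=2,p=8$tSm+JOWigOgPZx/g44K5fQ$WDyus6py50bVFIPkjA28lQ"
)
print(fields["variant"], fields["version"], fields["params"])
print(fields["salt"].hex())  # b529be24e5a280e80f671fe0e382b97d
print(fields["hash"].hex())  # 583caeb3aa72e746d51483e48c0dbc95
```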
And if indeed Argon2 can extract the salt out of the hash, how is using a salt any safer in that case since a hostile entity who gets a hashed password should then also be able to extract the salt?
The purpose of the salt is to mitigate the batch advantage of multi-target attacks by simply being different for each user.
If everyone used the same salt, then an adversary trying to find the first of $n$ passwords given hashes would need to spend only about $1/n$ the cost that an adversary trying to find a single specific password given its hash would have to spend. Alternatively, an adversary could accelerate breaking individual passwords by doing an expensive precomputation (rainbow tables).
But if everyone uses a different salt, then that batch advantage or precomputation advantage goes away.
Choosing the salt uniformly at random among 32-byte strings is just an easy way to guarantee, with overwhelming probability, that every user has a distinct salt. In principle, one could imagine an authority handing out a consecutive number to everyone in the world to use as their Argon2 salt, but that system doesn't scale very well. Not only would your application have to use the counting authority; every application in the world would have to use the same one, and I think the Count is too busy at Sesame Street to take on that job.
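The batch advantage can be counted directly. In this sketch (hypothetical users and guesses; fast SHA-256 stands in for a real password hash only to keep it quick), the attacker hashes each distinct (salt, guess) pair once. With one shared salt, 5 guesses cost 5 hashes to test all 4 users; with per-user salts the same attack costs 20, i.e. $n = 4$ times as much, for the same result.

```python
import hashlib
import os

def h(salt: bytes, password: bytes) -> bytes:
    # Fast SHA-256 keeps the demo quick; a real system would use a slow password hash.
    return hashlib.sha256(salt + password).digest()

# Hypothetical users and attacker guess list.
users = {"alice": b"hunter2", "bob": b"dragon", "carol": b"monkey", "dave": b"letmein"}
guesses = [b"123456", b"password", b"dragon", b"qwerty", b"letmein"]

def make_db(shared_salt=None):
    """Leaked DB of user -> (salt, hash): one shared salt, or a fresh one per user."""
    db = {}
    for user, pw in users.items():
        salt = shared_salt if shared_salt is not None else os.urandom(16)
        db[user] = (salt, h(salt, pw))
    return db

def attack(db):
    """Try every guess on every user, hashing each distinct (salt, guess) pair once."""
    computed, cracked = {}, set()
    for guess in guesses:
        for user, (salt, digest) in db.items():
            if (salt, guess) not in computed:
                computed[(salt, guess)] = h(salt, guess)
            if computed[(salt, guess)] == digest:
                cracked.add(user)
    return sorted(cracked), len(computed)  # len(computed) = hashes actually computed

print(attack(make_db(shared_salt=b"same-salt-for-all")))  # (['bob', 'dave'], 5)
print(attack(make_db()))                                  # (['bob', 'dave'], 20)
```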
2) By default I do not supply any secret to the hash method and instead the password itself seems to be used as a secret. Is this secure? What are the downsides for me not supplying a secret to the hashing method?
Generally the password is the secret: if someone knows the password then they're supposed to be able to log in; if they don't know the password, they're supposed to be shown the door!
That said, Argon2 also supports a secret key, which is separate from the salt and separate from the password.
If there is a meaningful security boundary between your password database and your application so that it's plausible an adversary might compromise one but not the other, then the application can pick a uniform random 32-byte string as a secret key, and use that with Argon2 so that the password hash is a secret function of the secret password.
That way, an adversary who dumps the password database but not the application's secret key won't even be able to test a guess for a password because they don't know the secret key needed to compute a password's hash.
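The same idea can be sketched with the standard library. Note this is a swapped-in technique, not Argon2's own secret-key input: the password is first keyed with HMAC using an application secret (often called a "pepper"), then run through a salted PBKDF2. `APP_SECRET_KEY`, `peppered_hash` and `verify_with_key` are hypothetical names.

```python
import hashlib
import hmac
import os
import secrets

# Hypothetical application secret: lives in app config, NOT in the password DB.
APP_SECRET_KEY = secrets.token_bytes(32)

def peppered_hash(password: str, salt: bytes, key: bytes = APP_SECRET_KEY) -> bytes:
    """Key the password with HMAC first, then run a slow salted hash over it."""
    keyed = hmac.new(key, password.encode("utf-8"), hashlib.sha256).digest()
    return hashlib.pbkdf2_hmac("sha256", keyed, salt, 100_000)

# What ends up in the database: only the salt and the hash.
salt = os.urandom(16)
stored = peppered_hash("correct horse", salt)

def verify_with_key(guess: str, key: bytes) -> bool:
    return hmac.compare_digest(peppered_hash(guess, salt, key), stored)

print(verify_with_key("correct horse", APP_SECRET_KEY))           # True: app has the key
print(verify_with_key("correct horse", secrets.token_bytes(32)))  # False: key unknown to DB thief
```

Even with the right password, someone holding only the database cannot confirm the guess without the key.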
The output of hash is actually an encoding of the hash, hash parameters, and salt. You don't need to do anything special with it, just store it normally.
Argon2 is a password hashing algorithm. It doesn't (usually) require any secret. This is secure by design. It's possible to use it with a secret value in addition to the password, which should almost never add any security. It's also possible to use it as a key derivation function, which is almost always wasteful. Neither of these things would reduce security, but they're unnecessary so don't bother.
A little late, but pyargon2 is a valid alternative to overcome this. First, install the package:
pip install pyargon2
then use:
from pyargon2 import hash
password = 'a strong password'
salt = 'a unique salt'
hex_encoded_hash = hash(password, salt)
More information:
https://github.com/ultrahorizon/pyargon2
Credit: https://github.com/jwsi
I've got passwords in a datastore that were hashed using the method SecureSocialPasswordHasher.passwordHash from the package securesocial.utils.SecureSocialPasswordHasher of SecureSocial, and I have to validate them through Python.
Therefore, the use of SecureSocial (or the whole Play Framework) is out of the question. The question is: What does it use for hashing when calling that method? From the documentation it seems it is Bcrypt, but it wasn't clear enough for me to be sure.
---------EDIT---------
I've been told on the SecureSocial forums that it does indeed use Bcrypt, with a default work factor of 10. However, that doesn't match what I see in the datastore.
There are 2 columns there, one for the salt and another one for the hashed password. Neither of them has the Bcrypt header (such as $2a$10$). Also, the salt is only 11 characters long, and the hashed password is only 22 characters long (with no sign of the salt inside the string).
Found out the default for hashing passwords on SecureSocial is indeed Bcrypt.
The default implementation of its hash method is:
def hash(plainPassword: String): PasswordInfo = {
PasswordInfo(id, BCrypt.hashpw(plainPassword, BCrypt.gensalt(logRounds)))
}
This applies to the latest version of SecureSocial.
In my specific case, the main issue was that nobody told me the code I was dealing with used an older version of SecureSocial, and that the hash method had been overridden.
So an emergency project was dumped on me to merge a MySQL user database into an existing Django user database.
I've figured just about everything out except how to handle the passwords as they use different hashes. I don't know Python, the Django backend, or very much about hashing techniques.
I do have a way to verify users with their emails; I just need a way to take the passwords they give me and save them into the database in a Django-acceptable way. It will have to be done in Perl, since that's the only language I know on the server.
I found this page talking about how Django handles passwords, but I sadly don't understand most of what they're saying. Also, I don't know if it's any help, but the admin area of the Django site gives the "hint" of
"Use '[algo]$[salt]$[hexdigest]'" for the password.
That doesn't mean much to me either, but maybe it does to one of you?
There are basically two ways to handle this: convert existing passwords to a format acceptable by Django, or write your own Django password hasher.
For the first way, as you found, the password field consists of three parts, each separated by a $. (Django 1.6 passwords may have 4 parts, but let's ignore that extra part for now, since Django 1.6 also supports the more traditional 3-part passwords.) The parts are
algorithm, which describes the password hashing algorithm; it will look like md5, pbkdf2, etc.
salt, the salt for the hash algorithm
hexdigest, the hashed password
So, assuming your passwords are already salted and hashed, your script needs to take the hashed/salted passwords in your existing database, separate the salt from the hash, then store them into the database with the appropriate algorithm string prefixed. There should be Perl modules for doing password hashing using various algorithms. Django's recommended algorithm is PBKDF2. bcrypt is also good. Any hash algorithm is fine, though, as far as Django is concerned, as long as it has a built-in hasher for that algorithm (Django has hashers for the most common hashing algorithms).
If your existing passwords are not salted and hashed…well, now would be a good time to do that. ;-)
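Since the rest of this thread uses Python, here is a Python sketch of the 4-part pbkdf2_sha256 layout (algorithm$iterations$salt$hash, with the hash base64-encoded) that, as far as I know, Django's PBKDF2 hasher stores. The iteration count and 12-character salt are assumptions; compare the output against a password your Django install actually wrote before migrating anything. A Perl port would follow the same steps: PBKDF2-HMAC-SHA256, base64-encode, join with `$`.

```python
import base64
import hashlib
import hmac
import secrets
import string

ITERATIONS = 260_000  # illustrative; use your Django version's default or higher

def make_salt(length: int = 12) -> str:
    """Random alphanumeric salt, so it can never contain the '$' separator."""
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))

def encode_pbkdf2_sha256(password: str, salt: str, iterations: int = ITERATIONS) -> str:
    """Build 'pbkdf2_sha256$<iterations>$<salt>$<base64 hash>'."""
    dk = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt.encode("utf-8"), iterations)
    return "pbkdf2_sha256$%d$%s$%s" % (iterations, salt, base64.b64encode(dk).decode("ascii"))

def check_password(password: str, encoded: str) -> bool:
    """Re-derive the hash with the stored iterations and salt, then compare."""
    algorithm, iterations, salt, _ = encoded.split("$", 3)
    if algorithm != "pbkdf2_sha256":
        raise ValueError("unsupported algorithm: " + algorithm)
    expected = encode_pbkdf2_sha256(password, salt, int(iterations))
    return hmac.compare_digest(expected.encode("ascii"), encoded.encode("ascii"))

encoded = encode_pbkdf2_sha256("s3cret", make_salt())
print(encoded)
print(check_password("s3cret", encoded), check_password("wrong", encoded))  # True False
```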
The alternative way is to just copy the passwords over to the new database as-is, and write your own password hasher to handle them in your Django app. Of course, that would require writing some Python code.
I am experimenting with the passlib.hash.sha256_crypt algorithm in an App Engine app and it seems rather simple to implement.
Is this secure enough with its default parameters of an autogenerated salt and 80,000 rounds? Should the password first be padded with random chars?
Password is being posted in from a form and encrypted as above.
I can't judge what "secure enough" means.
Last I read, the best options were scrypt, bcrypt or PBKDF2. If you can't use those, then I recommend SHA-512 with many thousands of iterations. One of the benefits of using passlib's CryptContext is that you can update your scheme later as needed (and as better implementations become available) while keeping easy compatibility with previously stored passwords. sha512_crypt is very easy to implement on GAE with passlib CryptContext.
I'm not sure padding with characters (on top of salting) adds anything.
I made a topic about the built-in python hash function: Old python hashing done left to right - why is it bad?
The previous topic was about why it was bad for encryption, because we have an application called Gruyere which is filled with security holes, and it uses hash() to encrypt cookies.
# global cookie_secret; only use positive hash values
h_data = str(hash(cookie_secret + c_data) & 0x7FFFFFF)
c_data is a username; cookie_secret is salt (which is just '' by default)
I have implemented a more secure encryption method using MD5 hashing with a salt, but one exercise is to beat this old encryption, and I still cannot understand how :-( I've read the string_hash code in the Python source code, but it's not documented and I can't figure it out.
EDIT: The idea is to write a program that can create a valid cookie for any valid user, so I think I need to find out cookie_secret somehow.
Zack already described the answer in your last question: it's easy to find a collision.
Let's say you save hash("pwd") in the database (that you actually do something different doesn't matter). Now, if you enter "pwd" on the site, you get in. But how is this checked? Again, the hash of "pwd" is taken and compared to the value in the database. But what if there is a second string, say "hello", with hash("hello") == hash("pwd")? Then you could also use "hello" as the password. So to beat the encryption, you don't need to find "pwd"; you just need any string that has the same hash value. You can simply search for such a string by brute force (and I guess you can do some optimizations based on knowledge of the source of hash).
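The brute-force idea can be sketched without reproducing the old string_hash: Python 3 randomizes str hashes per process (see PYTHONHASHSEED), so this example swaps in SHA-256 truncated to 16 bits as a hypothetical weak hash. The Gruyere code keeps 27 bits (`& 0x7FFFFFF`), about 134 million possible values, so the same loop would need many more tries but is still very feasible.

```python
import hashlib
from itertools import count

def weak_hash(data: bytes) -> bytes:
    """Stand-in for a weak hash: SHA-256 cut down to 16 bits."""
    return hashlib.sha256(data).digest()[:2]

def find_second_preimage(target: bytes) -> bytes:
    """Brute-force any input, other than target, with the same weak hash."""
    wanted = weak_hash(target)
    for i in count():
        candidate = b"guess%d" % i
        if candidate != target and weak_hash(candidate) == wanted:
            return candidate

stored = weak_hash(b"pwd")               # what the site keeps
impostor = find_second_preimage(b"pwd")  # a different "password"
print(impostor, weak_hash(impostor) == stored)
```

The site cannot tell the impostor string from "pwd", because it only ever compares hashes.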