I previously asked a question about Python's built-in hash function: Old python hashing done left to right - why is it bad?
That question was about why hash() is bad for encryption. We have an application called Gruyere, which is filled with security holes, and it uses hash() to encrypt cookies.
# global cookie_secret; only use positive hash values
h_data = str(hash(cookie_secret + c_data) & 0x7FFFFFF)
c_data is a username; cookie_secret is a salt (which is just '' by default).
I have implemented a more secure encryption method using MD5 hashing with a salt, but one exercise is to beat this old encryption, and I still cannot understand how :-( I've read the string_hash code in the Python source, but it's not documented and I can't figure it out.
EDIT: The idea is to write a program that can create a valid cookie for any valid user, so I think I need to find out cookie_secret somehow.
Zack already described the answer in your last question: it's easy to find a collision.
Let's say you save hash("pwd") in the database (that you actually do something slightly different doesn't matter). Now, if you enter "pwd" on the site, you get in. But how is this checked? Again, the hash of "pwd" is computed and compared to the value in the database. But what if there is a second string, say "hello", with hash("hello") == hash("pwd")? Then you could also use "hello" as the password. So to beat the encryption, you don't need to find "pwd"; you just need any string with the same hash value. You can search for such a string by brute force (and I guess you can do some optimizations based on knowledge of the source code of hash).
I want to code a custom key generator in Python. This key will be used as an input (along with the plain text) to the AES algorithm for encryption (I will probably use the pycrypto or m2crypto libraries for that).
But the key generator has to be custom, as it would generate the key based on a string supplied by the user.
str = date + case-id + name
where:
date = current date when a case was submitted
(we work on separate security analysis cases, submitted on our ticketing tool)
name = person handling the case
case-id = the ticket id with which it was submitted.
This same key needs to be known to the decryptor (on a different system) so that it can decrypt the data.
So the key has to be deterministic: a specific date, name and case-id in a specific order must always produce the same key, and the key should change only if any of these 3 values, or their order, changes. It should not be random every time.
I've gone through some Stack Overflow answers, where it is suggested to use
random_key = os.urandom(16)
but I don't believe this will serve my purpose.
Suggestions for articles to start with if I want to design a key generator from scratch, or pointers to existing libraries, would be highly appreciated.
You're looking for a Password hashing algorithm, such as Argon2 or PBKDF2. It will allow you to deterministically extend the 'password' generated from the input values into a suitable key.
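A minimal sketch of that idea, using PBKDF2 from Python's standard hashlib (Argon2 would need a third-party package). Assumptions, all hypothetical: both systems agree on the same fixed APP_SALT, the same iteration count, and the same string format for the date; derive_aes_key is an illustrative name, not a library function.

```python
import hashlib

# Hypothetical application-wide salt: it must be fixed and identical on the
# encrypting and the decrypting system, otherwise they derive different keys.
APP_SALT = b"example-case-key-salt-v1"
DEFAULT_ITERATIONS = 600_000  # assumption: tune as high as your hardware tolerates


def derive_aes_key(date: str, case_id: str, name: str,
                   iterations: int = DEFAULT_ITERATIONS, length: int = 32) -> bytes:
    """Deterministically derive an AES key (32 bytes = AES-256) from the three fields."""
    # Join with a control character assumed never to occur in the fields, so that
    # ("ab", "c") and ("a", "bc") do not collapse into the same password string.
    password = "\x1f".join((date, case_id, name)).encode("utf-8")
    return hashlib.pbkdf2_hmac("sha256", password, APP_SALT, iterations, dklen=length)
```

The same inputs in the same order always give the same key, and swapping the order gives a different one. The weakness described below still applies: the KDF only slows down guessing, it cannot add entropy the inputs do not have.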
However, note that your passwords may still be very weak. I suspect there is a strong correlation between case-id and date, and the names probably come from a small list of people that is easy to find out. Also, isn't this data sent along with the encrypted data by your system? That would make using it as a password a bad idea.
I am using Django 1.9.7. The stored passwords come in noticeably different formats.
Some passwords are in the format <algorithm>$<iterations>$<salt>$<hash>:
pbkdf2_sha256$24000$61Rm3LxOPsCA$5kV2bzD32bpXoF6OO5YuyOlr5UHKUPlpNKwcNVn4Bt0=
while others look like this:
!9rPYViI1oqrSMfkDCZSDeJxme4juD2niKcyvKdpB
Passwords are set using either User.objects.create_user() or user.set_password(). Is this difference expected?
You'll be fine. You just have some blank passwords in your database.
Going back as far as v0.95, Django has used $ separators to delimit algorithm/salt/hash. These days, Django pulls out the algorithm first by looking at what precedes the first $ and then passes the whole string to the hasher to decode. This allows a wider set of formats, including the one for PBKDF2, which adds an extra iterations field to this list (as in your first example).
However, Django also recognises that some users may not be allowed to log in and/or have no password. This is encoded using the second format you've seen, as the docstring of make_password() in django/contrib/auth/hashers.py explains:
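To illustrate the pbkdf2_sha256$<iterations>$<salt>$<hash> layout, here is a standard-library sketch that rebuilds and checks such a string. It assumes the hasher computes PBKDF2-HMAC-SHA256 over the UTF-8 password and salt and stores the base64-encoded 32-byte digest, which matches the 44-character hash in the example above; encode_pbkdf2_sha256 and verify_pbkdf2_sha256 are hypothetical names. In a real project, use django.contrib.auth.hashers.check_password instead.

```python
import base64
import hashlib
import hmac


def encode_pbkdf2_sha256(password: str, salt: str, iterations: int) -> str:
    """Build an '<algorithm>$<iterations>$<salt>$<hash>' string (hypothetical helper)."""
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                                 salt.encode("utf-8"), iterations)
    hash_b64 = base64.b64encode(digest).decode("ascii")
    return f"pbkdf2_sha256${iterations}${salt}${hash_b64}"


def verify_pbkdf2_sha256(password: str, encoded: str) -> bool:
    """Split on '$', recompute with the stored parameters, compare in constant time."""
    parts = encoded.split("$", 3)
    if len(parts) != 4 or parts[0] != "pbkdf2_sha256":
        return False  # not this format, e.g. a value without any '$' separators
    _algorithm, iterations, salt, _hash = parts
    expected = encode_pbkdf2_sha256(password, salt, int(iterations))
    return hmac.compare_digest(encoded.encode("utf-8"), expected.encode("utf-8"))
```

Because the algorithm name and iteration count travel inside the string, older hashes with fewer iterations keep verifying after the default is raised.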
If password is None then a concatenation of UNUSABLE_PASSWORD_PREFIX and a random string will be returned which disallows logins.
You can also see that the random string is exactly 40 characters long, just like your second example.
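The shape of such a value can be mimicked with the standard library. This is only a sketch of the format ("!" followed by 40 random letters and digits); the constants and function names below are hypothetical stand-ins, and in Django itself you would use make_password(None), user.set_unusable_password() and is_password_usable().

```python
import secrets
import string

UNUSABLE_PREFIX = "!"   # the leading character seen in the second format
SUFFIX_LENGTH = 40      # the 40 random characters described above
ALPHABET = string.ascii_letters + string.digits


def make_unusable_password() -> str:
    """Return '!' plus 40 random alphanumerics: a marker meaning 'no login allowed'."""
    return UNUSABLE_PREFIX + "".join(secrets.choice(ALPHABET) for _ in range(SUFFIX_LENGTH))


def is_usable(encoded: str) -> bool:
    # Any stored value starting with the prefix is treated as "no password".
    return not encoded.startswith(UNUSABLE_PREFIX)
```

The random suffix only makes each marker unique; there is no algorithm name before a "$", so no password can ever verify against it.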
In short, then, this is all as expected.
There is no significant difference between User.objects.create_user() and user.set_password(), since the former calls the latter.
Basically, according to the docs, passwords are stored as strings in the format <algorithm>$<iterations>$<salt>$<hash>. The differences might come from the PASSWORD_HASHERS setting: maybe one password was created with one hasher and the other with another. But as long as you keep those hashers in that setting, everything should be fine, and users will still be able to change their passwords, etc. You can read about this in the short note after the bcrypt section.
The docs for the django.contrib.auth package might also be helpful.
UPDATE:
If you look at the documentation for an older Django version (1.3, for example), you will see that
Previous Django versions, such as 0.90, used simple MD5 hashes without password salts. For backwards compatibility, those are still supported; they'll be converted automatically to the new style the first time check_password() works correctly for a given user.
So I think the answer might be somewhere here. But it really depends on how old your project is, so you can decide whether this is normal. In any case, you can call check_password() to be sure, or you can just email your users a "please change your password" notification. There are really many factors involved.
I found a Python package for encrypting data and saw this in the Python cryptography documentation:
It is possible to use passwords with Fernet (symmetric key). To do this, you need to run the password through a key derivation function such as PBKDF2HMAC, bcrypt or scrypt.
But it turns out that a password works the same way as a key (you use the password/key to encrypt and decrypt). So why bother using a password instead of the key itself?
I mean why not just use key itself:
from cryptography.fernet import Fernet
key = Fernet.generate_key()
token = Fernet(key).encrypt(b"my deep dark secret")
Fernet(key).decrypt(token)
A password is something a person can remember, whereas a key usually is not, because it is long (at least 128 bits, i.e. 32 hex characters) and is supposed to be truly random (indistinguishable from random noise). If you want to encrypt something with a key, but that key cannot be transmitted using asymmetric cryptography and instead has to be given over the phone or must never be written down anywhere, then you can't simply generate a random key and use it. You need a password or passphrase to derive the key from.
Example 1:
A personal password safe like KeePass needs a key for encryption/decryption. A user will not be able to simply remember that key, therefore we have a much shorter password, which can be remembered. Now, the security lies in the fact that a slow key derivation function is used to derive a key from the password, so an attacker still has trouble brute-forcing the key even though it depends on a much shorter password.
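A sketch of that derivation step for Fernet. The cryptography documentation uses its own PBKDF2HMAC class for this; the version below computes the same PBKDF2-HMAC-SHA256 with the standard library's hashlib.pbkdf2_hmac so it runs without extra packages. The function name, the iteration count and the example passphrase are assumptions.

```python
import base64
import hashlib
import os


def fernet_key_from_password(password: str, salt: bytes, iterations: int = 480_000) -> bytes:
    """Derive a Fernet-style key: url-safe base64 of 32 PBKDF2-HMAC-SHA256 bytes."""
    # The high iteration count is what makes brute-forcing the short password slow.
    raw = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations, dklen=32)
    return base64.urlsafe_b64encode(raw)


# The salt is not secret: generate it once and store it next to the encrypted data.
salt = os.urandom(16)
key = fernet_key_from_password("correct horse battery staple", salt)
```

The resulting key has the same form as the output of Fernet.generate_key(), so it can be passed to Fernet(key). The person decrypting only needs to remember the password; the salt can travel with the token.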
Example 2:
You compress a document and use the compression software's built-in encryption. You can send the archive by e-mail, but you can't send the password along with it. So you call the person you sent the e-mail to and tell them the password. A password is much easier to transmit this way than a long, random key.
I've got passwords in a datastore that were hashed using the method SecureSocialPasswordHasher.passwordHash from the package securesocial.utils.SecureSocialPasswordHasher of SecureSocial, and I have to validate them in Python.
Therefore, using SecureSocial (or the Play Framework at all) is out of the question. The question is: what does that method use for hashing? The documentation suggests Bcrypt, but it wasn't clear enough for me to be sure.
---------EDIT---------
I've been told on the SecureSocial forums that it does indeed use Bcrypt, with a default work factor of 10. However, that doesn't match what I see in the datastore.
There are 2 columns there, one for the salt and another for the hashed password. Neither of them has the Bcrypt header (such as $2a$10$). Also, the salt is only 11 characters long, and the hashed password only 22 characters long (with no sign of the salt inside the string).
I found out that the default for hashing passwords in SecureSocial is indeed Bcrypt.
The default implementation of its hash method is:
def hash(plainPassword: String): PasswordInfo = {
PasswordInfo(id, BCrypt.hashpw(plainPassword, BCrypt.gensalt(logRounds)))
}
This applies to the latest version of SecureSocial.
In my specific case, the main issue was that nobody told me the code I was dealing with used an older version of SecureSocial, and that the hash method had been overridden.
I have a function for encrypting with SHA-1 in Python, using hashlib. I take a file and encrypt the contents with this hash.
If I set a password for an encrypted text file, can I use this password to decrypt it and restore the file with the original text?
Hash functions are different from normal encryption algorithms. They are often referred to as one-way functions, because the transformation the data goes through is irreversible.
Unlike symmetric and asymmetric encryption, hashes are used by comparing the hashed values themselves, instead of decrypting and comparing the plain-text values. To validate logins when you're using hashes, you hash the password the user just attempted to log in with and compare it with the hash stored in your DB. If they match, the login is successful.
Cracking hashes involves hashing many different candidate strings and trying to match the results against hashes illegally obtained from a DB. Lists with millions of precomputed hashes are available on the internet to make cracking easier; these are known as rainbow tables, and they can easily be countered by using salts.
It's also worth noting that since hashing algorithms digest GBs of data into much smaller strings, some different inputs must, mathematically, have identical hashes. For a good hash function such pairs are extremely hard to find, but they do exist; this is known as a hash collision.
If hashing were reversible, hard drives would be redundant, since we could hash thousands of GBs into a short string of text and reverse it whenever we pleased. That would be data compression and storage beyond what is mathematically possible.
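A minimal sketch of the salted hash-and-compare login check described above. The function names are hypothetical, and a single SHA-256 is used only to keep the illustration short: real systems should use a deliberately slow function such as PBKDF2, bcrypt or scrypt.

```python
import hashlib
import hmac
import os
from typing import Tuple


def hash_password(password: str, salt: bytes) -> bytes:
    # Illustration only: one fast SHA-256 pass helps attackers guess quickly.
    return hashlib.sha256(salt + password.encode("utf-8")).digest()


def register(password: str) -> Tuple[bytes, bytes]:
    salt = os.urandom(16)  # per-user random salt makes precomputed tables useless
    return salt, hash_password(password, salt)


def login(attempt: str, salt: bytes, stored_hash: bytes) -> bool:
    # Nothing is decrypted: hash the attempt the same way and compare the hashes.
    return hmac.compare_digest(hash_password(attempt, salt), stored_hash)
```

Because each user gets a different salt, two users with the same password end up with different stored hashes.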
Related Wikipedia Articles:
Hashing Algorithms: http://en.wikipedia.org/wiki/Hash_function
Rainbow Tables: http://en.wikipedia.org/wiki/Rainbow_table
Salts: http://en.wikipedia.org/wiki/Salt_(cryptography)
Collision: http://en.wikipedia.org/wiki/Collision_(computer_science)
Symmetric Encryption: http://en.wikipedia.org/wiki/Symmetric-key_algorithm
Asymmetric Encryption: http://en.wikipedia.org/wiki/Public-key_cryptography
SHA-1 is not an encryption algorithm, it's a hashing algorithm. By definition, you can't "decrypt" anything that was hashed with the SHA-1 function, it doesn't have an inverse.
If you have an arbitrary hashed password, there's very little you can do to retrieve the original password. If you're lucky, the password might be in a database of reverse hashes, but that's as far as you can go.
And the message extraction algorithm needs the original password to perform the verification: it hashes the provided plain-text password and compares it against the stored hashed password, and only if they are equal is the plain-text message revealed.
Hash functions are one way tickets. You cannot use them for encryption.
Hash functions are built from modular arithmetic, XOR and other familiar operations that, combined, discard information, so they cannot be run backwards.
You may try to find what argument was used to generate hash but in theory you will never be 100% sure it is the correct value.
For example, try a really simple hash function (useless in cryptography): modulo 10. This function returns only ten different values. If the result is 7, you may guess the input was 7, or 137, or 1234567. The same applies to MD5, SHA-1 and better hash functions.
As you can see, when you use a hash function that returns only 20 bytes (40 hex characters) on files that are much bigger (maybe even a few hundred megabytes), there are, in theory, an enormous number of possible files for each hash value (infinitely many if file length is unbounded).
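The modulo-10 example takes only a few lines. toy_hash is a hypothetical name for this deliberately useless hash.

```python
def toy_hash(n: int) -> int:
    """A deliberately useless 'hash': keep only the last decimal digit."""
    return n % 10


# Every input ending in 7 maps to 7, so the hash alone cannot tell them apart.
preimages = [n for n in range(200) if toy_hash(n) == 7]
```

Already among the first 200 integers there are 20 inputs that all "hash" to 7, and extending the range only adds more.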
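The fixed output size is easy to observe with hashlib: a 1-byte input and a 10 MB input both produce a 20-byte SHA-1 digest.

```python
import hashlib

small = hashlib.sha1(b"x").digest()               # 1 byte of input
large = hashlib.sha1(b"x" * 10_000_000).digest()  # 10 MB of input
hex_form = hashlib.sha1(b"x").hexdigest()         # the same digest as hex text
```

Both digests have the same length, so the 10 MB input cannot be recovered from its 20 bytes; many other inputs share each digest.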