Old python hashing done left to right - why is it bad?


I am trying to learn how to defend against security attacks on websites. The link below shows a good tutorial, but I am puzzled by one statement:
In http://google-gruyere.appspot.com/part3#3__client_state_manipulation , under "Cookie manipulation", Gruyere says Python's hash is insecure because it hashes from left to right.
The Gruyere application uses this to protect (sign) the cookie data:
# global cookie_secret; only use positive hash values
h_data = str(hash(cookie_secret + c_data) & 0x7FFFFFF)
c_data is a username; cookie_secret is a static string (which is just '' by default)
I understand that in more secure hash functions one change generates a completely new result, but I don't understand why this is insecure, because different c_data values generate completely different hashes!
EDIT: How would one go about beating a hash like this?

What the comment may be trying to get at is that for many simple hash functions (including Python's old string hash), if you are given HASH(m) and know the length of m, then it is easy to calculate HASH(m . x) for any x (where . is concatenation).
Therefore, if you are user ro, and the server sends you HASH(secret . ro), then you can easily calculate HASH(secret . root) and log in as a different user.
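To make the length-extension idea concrete, here is a sketch against a simplified model of the old (pre-randomization) CPython 2 string hash, which multiplies its running state by 1000003, XORs in each byte, and finally XORs in the length. The model is an assumption for illustration: it uses 64-bit wraparound arithmetic and ignores the special-casing of -1, and today's hash() (randomized by default since Python 3.3, SipHash since 3.4) will not reproduce it. The names old_str_hash and extend_hash and the secret value are hypothetical; the attacker only needs to know (or guess) the length of secret + username. The 27-bit mask from the Gruyere code does not help, because the low bits of a product or XOR depend only on the low bits of the operands.

```python
# Simplified model of CPython 2's pre-randomization string hash (assumption):
# 64-bit wraparound arithmetic, no randomization prefix/suffix, -1 -> -2 ignored.
MASK64 = (1 << 64) - 1
GRUYERE_MASK = 0x7FFFFFF  # the mask applied in the Gruyere code


def old_str_hash(data: bytes) -> int:
    if not data:
        return 0
    x = (data[0] << 7) & MASK64
    for byte in data:
        x = ((1000003 * x) ^ byte) & MASK64
    return x ^ len(data)


def extend_hash(known_hash: int, known_len: int, suffix: bytes) -> int:
    """Predict old_str_hash(m + suffix) knowing only old_str_hash(m) and len(m)."""
    # Undo the final length XOR to recover the internal state after m
    x = known_hash ^ known_len
    for byte in suffix:
        x = ((1000003 * x) ^ byte) & MASK64
    return x ^ (known_len + len(suffix))


# The attacker registers as "ro" and receives the masked hash of secret + b"ro".
secret = b"s3cret"  # hypothetical; only its length matters to the attacker
cookie_hash = old_str_hash(secret + b"ro") & GRUYERE_MASK
forged = extend_hash(cookie_hash, len(secret) + 2, b"ot") & GRUYERE_MASK
print(forged == old_str_hash(secret + b"root") & GRUYERE_MASK)  # True
```

The attacker never learns the secret itself; recovering the internal state from the published hash is enough to continue hashing.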

I think that's just a bad explanation there. Python's hash() is insecure because it's easy to find collisions, but "hashes from left to right" has nothing to do with why it's easy to find collisions. Cryptographically secure hashes also process data strictly in sequence; they typically operate on fixed-size blocks (SHA-256, for example, processes 512 bits at a time) rather than one byte at a time, but that's just a detail of the implementation.
(It should be said that hash() being insecure is not a bug in Python, because that's not what it's for. It's an exposed detail of the implementation of Python's dictionaries as hash tables, and you generally don't want a secure hash function for your hash table, because that would slow it down so much that it would defeat the purpose. Python does provide secure hash functions in the hashlib module.)
(The use of an insecure hash is not the only problem with the code you show, but it is by far the most important problem.)
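For the Gruyere cookie specifically, the usual replacement for hash(secret + data) is a keyed MAC such as HMAC-SHA256 from the standard library, which is not subject to the length-extension trick. A minimal sketch, assuming the "hash|data" cookie layout from the tutorial; sign_cookie, verify_cookie and COOKIE_SECRET are hypothetical names:

```python
import hashlib
import hmac
import secrets
from typing import Optional

# Hypothetical server-side key: random, never sent to the client, never empty.
COOKIE_SECRET = secrets.token_bytes(32)


def _tag(c_data: str) -> str:
    return hmac.new(COOKIE_SECRET, c_data.encode("utf-8"), hashlib.sha256).hexdigest()


def sign_cookie(c_data: str) -> str:
    """Return 'tag|data', mirroring Gruyere's 'hash|data' cookie layout."""
    return _tag(c_data) + "|" + c_data


def verify_cookie(cookie: str) -> Optional[str]:
    """Return the data if the tag is valid, otherwise None."""
    tag, sep, c_data = cookie.partition("|")  # hex tags never contain '|'
    if not sep:
        return None
    # compare_digest avoids leaking how many leading characters matched
    if hmac.compare_digest(tag, _tag(c_data)):
        return c_data
    return None


cookie = sign_cookie("ro")
print(verify_cookie(cookie))                              # ro
print(verify_cookie(cookie.split("|")[0] + "|root"))      # None
```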

Python's default hashing algorithm (for all types, but it has the most severe consequences for strings, as those are commonly hashed for security) is geared towards running fast and playing nicely with the implementation of dicts. It's not a cryptographic hash function, so you shouldn't use it for security. Use hashlib for this.

The Python built-in hash function is not intended for secure, cryptographic hashing. Its purpose is to make storing Python objects in dictionaries efficient.
The internal hash implementations are too predictable (too many collisions) for secure uses. For example, in Python 2 without hash randomization (string hashes have been randomized by default since Python 3.3), the following comparisons are all true:
hash('a') < hash('b')
hash('b') < hash('c')
hash('c') < hash('d')
This sequential nature makes for great dictionary storage behaviour, for which it was designed.
To create a secure hash, use the hashlib library instead.
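By contrast, a cryptographic hash from hashlib shows no such ordering: inputs that differ in a single character should produce digests that differ in roughly half of their bits. A small sketch comparing SHA-256 of "a" and "b" (bit_difference is a hypothetical helper name; the exact count is not predicted here):

```python
import hashlib


def bit_difference(a: bytes, b: bytes) -> int:
    """Count how many bits differ between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


d_a = hashlib.sha256(b"a").digest()
d_b = hashlib.sha256(b"b").digest()
print(hashlib.sha256(b"a").hexdigest())
print(hashlib.sha256(b"b").hexdigest())
print(bit_difference(d_a, d_b), "of", len(d_a) * 8, "bits differ")
```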

One would go about "beating" a hash like that by appending their data to the end of the string being hashed and predicting the hash function output. Let me illustrate this:
Python 2.7.5 (default, May 15 2013, 22:43:36) [MSC v.1500 32 bit (Intel)] on win32
>>> data = 'root|admin|author'
>>> str(hash('' + data) & 0x7FFFFFF)
'116042699'
>>> data = 'root|admin|authos'
>>> str(hash('' + data) & 0x7FFFFFF)
'116042698'
The empty string ('') stands for the cookie secret, which you mentioned is empty by default. In this particular example, though not really exploitable, one can see that the hash changed by 1 when the last byte of the data changed "by one" too. This example is not really an exploit (setting aside the fact that creating a username of the form anything_here|admin makes that user an admin), because some data follows the username (left to right), so even if you create a username that's very close to the one being attacked, the rest of the string changes the hash in a way that is much harder to predict. However, if the cookie were in the form 105770185|user07 instead of 105770185|user07||author, you could easily create a user "user08" or "user06" and predict the hash (homework: what's the hash for "user08"?).
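The "changed by one" observation can be explained with a simplified model of the old CPython 2 string hash (an assumption for illustration, not today's randomized hash()). Its last step is x = (1000003 * state) ^ last_byte, followed by an XOR with the length, so replacing only the final byte of a string of length 2 or more changes the hash by old_byte ^ new_byte. That is consistent with the transcript, where 'r' (0x72) became 's' (0x73) and only the lowest bit flipped. The sketch assumes the hashed string ends with the username; swap_last_byte is a hypothetical helper name.

```python
MASK64 = (1 << 64) - 1


def old_str_hash(data: bytes) -> int:
    """Simplified model of CPython 2's pre-randomization string hash (assumption)."""
    if not data:
        return 0
    x = (data[0] << 7) & MASK64
    for byte in data:
        x = ((1000003 * x) ^ byte) & MASK64
    return x ^ len(data)


def swap_last_byte(known_hash: int, old_last: int, new_last: int) -> int:
    """Predict the hash after replacing the final byte (string length >= 2).

    Everything before the last byte is unchanged, so the two hashes differ
    exactly by old_last ^ new_last; this survives the 0x7FFFFFF mask too.
    """
    return known_hash ^ old_last ^ new_last


h07 = old_str_hash(b"user07") & 0x7FFFFFF
h08 = swap_last_byte(h07, ord("7"), ord("8"))
print(h08 == old_str_hash(b"user08") & 0x7FFFFFF)  # True
```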

Related

Deterministic asymmetric encryption

Although it is not considered safe, I need a Python library that always generates the same ciphertext for the same plaintext using an asymmetric encryption scheme.
That is, given a plaintext m and a public key k, encrypting m with k should always yield the same ciphertext c.
It would be even better if there were a way to do this with the Python library "cryptography".

You can't find this because public-key encryption cannot possibly be deterministic. Any deterministic public-key encryption scheme is subject to a very simple attack: given a ciphertext, guess what the plaintext might be, and verify the guess by encrypting with the public key. Anyone can carry out this attack since the encryption key is public.
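The guess-and-verify attack can be illustrated with textbook RSA (no padding), which is deterministic. The sketch below uses tiny toy parameters (p = 61, q = 53, e = 17) chosen purely for illustration; it is completely insecure and is not how any real library encrypts. Real schemes such as RSA-OAEP add randomness precisely to defeat this attack. encrypt and guess_plaintext are hypothetical names.

```python
# Toy textbook RSA: deterministic, unpadded, far too small. Illustration only.
P, Q = 61, 53
N = P * Q  # public modulus (3233)
E = 17     # public exponent


def encrypt(m: int) -> int:
    """Deterministic public-key encryption: the same m always gives the same c."""
    return pow(m, E, N)


def guess_plaintext(ciphertext: int, candidates):
    """Anyone holding the public key (N, E) can test guesses; no private key needed."""
    for guess in candidates:
        if encrypt(guess) == ciphertext:
            return guess
    return None


# e.g. an encrypted age: only ~130 plausible values to try
c = encrypt(42)
print(guess_plaintext(c, range(130)))  # 42
```

Because textbook RSA with these parameters is a permutation of the integers modulo N, each guess either matches exactly or not at all, which is what makes the attack so cheap.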
This is completely different from public-key signatures, which can be deterministic because being able to tell that two signatures are from the same message doesn't change anything about the signature's validity. With encryption, being able to tell that two plaintexts are the same message does break the whole purpose of encryption.
There is one scenario in which public-key encryption could be deterministic, and that's if the plaintexts are randomly generated, or derived from randomly generated data, and it's impossible to guess potential plaintexts. However, with such input restrictions, you shouldn't look for an “asymmetric encryption” scheme, but for a lower-level primitive: a trapdoor permutation. This is not a directly usable primitive, but it can be a building block of a cryptographic mechanism (such as a public-key encryption mechanism). So you can't expect libraries to offer that as their interface. Furthermore, typical protocols are not generic in the way they might use a trapdoor permutation. So your protocol definition would call for a specific primitive, not for a “deterministic asymmetric encryption” primitive.
If you think you need “deterministic asymmetric encryption”, you're designing your own cryptographic scheme, and it does not stand a chance of being correct. Don't do that. If you need help solving a problem, ask about your actual problem instead of the dead end you've reached trying to solve it.

Will PKCS11 always find objects in the same order?

I have observed that both the bash command and what is probably the corresponding method in the Python PyKCS11 library always seem to find objects in the same order. My code relies on this being true, but I have not read it anywhere; I have only observed it.
In the terminal:
$ pkcs11-tool --list-objects
Using slot 0 with a present token (0x0)
Public Key Object; RSA 2048 bits
label: bob_key
ID: afe438bbe0e0c2784c5385b8fbaa9146c75d704a
Usage: encrypt, verify, wrap
Public Key Object; RSA 2048 bits
label: alice_key
ID: b03a4f6c375e8a8a53bd7a35947511e25cbdc34b
Usage: encrypt, verify, wrap
With Python:
objects = session.findObjects([(CKA_CLASS, CKO_PUBLIC_KEY)])
for obj in objects:
    d = obj.to_dict()
    print(d['CKA_LABEL'])
output:
bob_key
alice_key
objects is of type list and each element in objects is of type <class 'PyKCS11.CK_OBJECT_HANDLE'>
Will session.findObjects([(CKA_CLASS, CKO_PRIVATE_KEY)]), when run from a logged-in session, also always return a list in exactly the same order as the expression above? In this case, with two keys, I would never want to see Alice come before Bob.

(Wanted to write a comment, but it got quite long...)
PKCS#11 does not guarantee any specific order of returned object handles so it is up to the particular implementation.
Even though your implementation might seem to be consistently giving the same order of objects there are some examples when this could unexpectedly change:
key renewal (keys do not last forever. You will need to generate some new keys in the future)
middleware upgrade (newer implementations might return objects in a different order)
HSM firmware upgrade (major upgrades might change the way objects are stored and change object enumeration order)
HSM recovery from backup (object order can change after HSM restore)
host OS data recovery (some implementations store HSM objects encrypted in external folders, and the object search order might be the same as the directory listing order, which could change without warning)
HSM change (are you sure that you will be using the same device for the whole lifetime of your application?)
Relying on undefined behaviour is bad practice in general. In security especially, you should be very cautious.
It is definitely worth the time to stay on the safe side.
I would recommend performing a separate search for each required object (using some strong identifier, e.g. the label); this way you can perform additional checks (e.g. enforce the expected object type, ensure that the object is unique, etc.).
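A sketch of that per-object lookup, keyed on the label and refusing zero or multiple matches. It assumes a PyKCS11-style session object exposing findObjects(template), as used in the question, and that the label can be given as a Python string in the template. find_objects_by_label is a hypothetical helper, and the CKA_/CKO_ constants are passed in by the caller rather than imported, so the logic can be exercised without a token.

```python
from typing import Any, Dict, Sequence


def find_objects_by_label(session: Any, labels: Sequence[str], cka_class: Any,
                          cka_label: Any, obj_class: Any) -> Dict[str, Any]:
    """Look up each object with its own search instead of relying on enumeration order.

    session: assumed to be a PyKCS11 session (anything with findObjects(template)).
    cka_class, cka_label, obj_class: the library's attribute/class constants.
    """
    found: Dict[str, Any] = {}
    for label in labels:
        handles = session.findObjects([(cka_class, obj_class), (cka_label, label)])
        # Enforce uniqueness rather than trusting whatever the token returns
        if len(handles) != 1:
            raise LookupError(
                f"expected exactly one object labelled {label!r}, found {len(handles)}")
        found[label] = handles[0]
    return found
```

With PyKCS11 this would presumably be called as find_objects_by_label(session, ["bob_key", "alice_key"], PyKCS11.CKA_CLASS, PyKCS11.CKA_LABEL, PyKCS11.CKO_PRIVATE_KEY); the result is indexed by label, so keys["bob_key"] refers to Bob's key whatever order the token enumerates objects in.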
A similar example is Cryptoki object handle re-use. PKCS#11 states that an object handle is bound to a particular session (i.e. if you obtained an object handle in session A you should not use it in session B, even if both sessions are running in the same application).
There are implementations that preserve object handle for the same object across sessions. There are even implementations that preserve the same object handle in different applications (i.e. if you get object handle 123 in application A you will get object handle 123 in application B for the same object).
This behaviour is even described in the respective developer manual. But if you ask the vendor if you can rely on it you are told that there are some corner cases for some setups and that you must perform additional checks to be 100% sure that it will work as expected...
Good luck with your project!

How to quickly generate an OpenPGP key pair using GnuPG for testing purposes?

I'm testing some code that uses python-gnupg to encrypt/sign/decrypt some plaintext, and I'd like to generate a key pair on the fly. GnuPG is (of course) super paranoid in generating the key pair, and it sucks a lot of entropy from my system.
I found this answer on unix.stackexchange.com, but using rngd to have /dev/random pull from /dev/urandom sounds like a bad idea.
Since I'm testing I don't need high security, I just need the key pair to be generated as quickly as possible.
An idea is to pre-generate some keys offline, and use those keys on my tests. Anyway, I'd like to programmatically generate my temporary key pairs while executing the tests.
This is the code I'm using now (which is, again, super slow and not good for testing):
from tempfile import mkdtemp
import gnupg

def temp_identity():
    # Each test identity gets its own throwaway keyring directory
    identity = gnupg.GPG(gnupghome=mkdtemp())
    # Build the batch key-generation parameters with the same GPG instance
    input_data = identity.gen_key_input(key_type='RSA', key_length=1024)
    identity.gen_key(input_data)
    return identity

Using any method to make /dev/random pull from /dev/urandom is totally fine once the entropy pool has been initialized with a proper random state (which is not a problem on x86 hardware, but might require discussion for other devices). I strongly recommend watching "The plain simple reality of entropy -- Or how I learned to stop worrying and love urandom", a lecture at 32C3.
If you want to speed up on-the-fly key generation, consider going for smaller key sizes like RSA 512 (1k keys aren't really secure, either). This will render the keys insecure, but if that's fine for testing, go for it. Using another algorithm (for example elliptic curves, if you already have GnuPG 2.1) might also speed up key generation.
If you really want to stick with /dev/random and smaller key sizes don't provide adequate performance, you can very well pre-generate keys, export them using gpg --export-secret-keys and import them instead of creating new ones.
gpg-agent also knows the option --debug-quick-random, which seems to fit your use case, but I've never used it before. From man gpg-agent:
--debug-quick-random
This option inhibits the use of the very secure random quality level (Libgcrypt’s GCRY_VERY_STRONG_RANDOM) and degrades all request down to standard random quality. It is only used for testing and shall not be used for any production quality keys. This option is only effective when given on the command line.

Is there an md5 decrypt function in Python? [duplicate]

This question already has answers here:
Is it possible to decrypt MD5 hashes?
Closed 2 years ago.
I used m = md5.new(); m.update("aaa"); m.digest()
to form an MD5 hash of the data "aaa". How do I get back the data using Python?

You cannot decode an md5 hash, as hashing is a process that is best thought of as one-way encoding (that is to say what is hashed cannot be de-hashed; one can only determine what was hashed, either by examining a list of known hashes, or by hashing a set of inputs and matching the resulting hashes with the hash you are trying to "decode").
Quoting Wikipedia, the key features of such a hashing algorithm are:
it is infeasible to find a message that has a given hash,
it is infeasible to modify a message without changing its hash,
it is infeasible to find two different messages with the same hash.
The most common uses of such algorithms today are:
Storing passwords
Verifying the contents of files.
If you want two-way encryption of the data, you need to look at other cryptographic libraries for Python (as usual, Stack Overflow has a recommendation).

You can't. That's the point - a hash is one-way, it's not the same as an encryption.

I don't know about Python, but hash functions are irreversible.
First of all, note that hash functions produce a constant-length output, meaning that information is thrown away (you can hash a 3 MB file and still get a result of less than 1 kB).
Additionally, hash functions are designed not to be reversible, so if you need encryption, use encryption rather than hashing. A major application of hashing is that when a database containing password hashes leaks, the passwords themselves have not been compromised (there are more examples, but this is the most obvious one).
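The constant-length point is easy to check with hashlib (Python 3, where inputs must be bytes): a 3-byte input and a 3 MB input both produce a 16-byte MD5 digest, so most of the information in the larger input cannot survive.

```python
import hashlib

small = hashlib.md5(b"aaa").digest()
large = hashlib.md5(b"\x00" * (3 * 1024 * 1024)).digest()  # a 3 MB input
print(len(small), len(large))  # 16 16
```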

If you want to break a hash, such as a password hash, then you need a very large lookup table. John the Ripper is commonly used to break passwords using a dictionary; this is a very good method, especially if it's a salted password hash.
Another approach is to use a rainbow table; however, these take a long time to generate. There are free rainbow tables accessible online.
Here is a python script to perform an md5() brute force attack.
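For readers without access to that script, here is a separate minimal sketch of the same idea (not the linked script): hash every candidate string and compare with the target digest. The alphabet and maximum length are assumptions for illustration; the work grows as len(alphabet) ** length, which is why only short or guessable inputs can be recovered this way. brute_force_md5 is a hypothetical name.

```python
import hashlib
import itertools
import string
from typing import Optional


def brute_force_md5(target_hex: str, alphabet: str = string.ascii_lowercase,
                    max_len: int = 4) -> Optional[str]:
    """Try every string over `alphabet` up to `max_len` characters; return a match or None."""
    for length in range(1, max_len + 1):
        for chars in itertools.product(alphabet, repeat=length):
            candidate = "".join(chars)
            if hashlib.md5(candidate.encode("ascii")).hexdigest() == target_hex:
                return candidate
    return None


target = hashlib.md5(b"aaa").hexdigest()
print(brute_force_md5(target))  # aaa
```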

To add to everyone else's point, MD5 is a one-way hash. The common usage is to hash two input values and if the hashed values match, then the input should be the same. Going from an MD5 hashed value to the hash input is nonsensical. What you are probably after is a symmetric encryption algorithm - see two-way keyed encryption/hash algorithm for a good discussion on the subject.

In general, the answers from BlueRaja and Sean are correct. MD5 (and other hash functions) are one-way, you can't reverse the process.
However, if the data is small, you can try to search for a piece of data (the original or a different one) that has the same hash, i.e. a hash collision.

Hashes map a bunch of data to a finite (albeit large) set of numeric values/strings.
It is a many-to-one mapping, so that decoding a hash is not only "difficult" in the cryptographic sense, but also conceptually impossible in that even if you could, you would get an infinite set of possible input strings.
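The many-to-one nature becomes visible if the output is made small. The sketch below truncates MD5 to 16 bits (an artificial choice for illustration): by the pigeonhole principle, among 2**16 + 1 distinct inputs at least two must share a value, and in practice a collision shows up after a few hundred inputs. Full MD5 obeys the same argument with 2**128 outputs; the sets of inputs behind each output are just too large to enumerate. truncated_md5 and find_collision are hypothetical names.

```python
import hashlib
from itertools import count
from typing import Dict, Tuple


def truncated_md5(data: bytes, n_bytes: int = 2) -> bytes:
    """Keep only the first n_bytes of the MD5 digest (a 16-bit 'hash' when n_bytes=2)."""
    return hashlib.md5(data).digest()[:n_bytes]


def find_collision(n_bytes: int = 2) -> Tuple[bytes, bytes]:
    """Hash b'0', b'1', ... until two different inputs share a truncated digest."""
    seen: Dict[bytes, bytes] = {}
    for i in count():
        msg = str(i).encode("ascii")
        h = truncated_md5(msg, n_bytes)
        if h in seen:
            return seen[h], msg
        seen[h] = msg


a, b = find_collision()
print(a, b, truncated_md5(a).hex(), truncated_md5(b).hex())
```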

arguments to cryptographic functions

I'm a bit confused that the argument to crypto functions is a string. Should I simply wrap non-string arguments with str() e.g.
hashlib.sha256(str(user_id)+str(expiry_time))
hmac.new(str(random.randbits(256)))
(ignore for the moment that random.randbits() might not be cryptographically good).
edit: I realise that the hmac example is silly because I'm not storing the key anywhere!

Well, hash functions (and cryptographic functions generally) usually work on bytes. In Python 2, str objects are basically byte strings. If you want to compute the hash of some object, you have to convert it to a string representation. Just make sure to apply the same operation later if you want to check whether the hash is correct, and make sure that your string representation doesn't contain any changing data that you don't want to be checked.
Edit: By popular request, a short reminder that Python unicode strings don't contain bytes but Unicode code points. Each code point is stored internally using 2 or 4 bytes, depending on how the Python interpreter was compiled. Python 2 str objects contain only bytes, so str is the type most similar to an array of bytes.
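One pitfall with str(user_id) + str(expiry_time) is that the concatenation is ambiguous: different field values can produce the same bytes, and therefore the same hash. A sketch for Python 3 (where hashlib only accepts bytes), assuming both fields are non-negative integers below 2**64; naive_input and framed_input are hypothetical names, and fixed-width packing with struct is just one way to make the encoding unambiguous.

```python
import hashlib
import struct


def naive_input(user_id: int, expiry_time: int) -> bytes:
    # str(a) + str(b): (1, 23) and (12, 3) both become b"123"
    return (str(user_id) + str(expiry_time)).encode("ascii")


def framed_input(user_id: int, expiry_time: int) -> bytes:
    # Fixed-width, big-endian unsigned 64-bit fields: each value has one encoding
    return struct.pack(">QQ", user_id, expiry_time)


print(naive_input(1, 23) == naive_input(12, 3))    # True: ambiguous
print(framed_input(1, 23) == framed_input(12, 3))  # False
print(hashlib.sha256(framed_input(1, 23)).hexdigest())
```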

You can.
However, for the HMAC, you actually want to store the key somewhere. Without the key, there is no way for you to verify the hash value later. :-)

Oh, and SHA-256 on its own isn't a good way to protect passwords or other critical data (although unfortunately it's used that way on many sites): it is a strong cryptographic hash, but it is designed to be fast, which also makes guessing cheap. It is more than good enough for generating temporary tokens.
Edit: As mentioned, SHA-256 needs at least some salt. Without salt, SHA-256 has a low barrier to being cracked with a dictionary attack (time-wise), and there are plenty of rainbow tables to use as well. Personally I'd not use anything less than bcrypt (the Blowfish-based password hash) for passwords (but that's because I'm paranoid).
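For password storage with only the standard library, a salted, deliberately slow key-derivation function can be used. The sketch below swaps in PBKDF2-HMAC-SHA256 via hashlib.pbkdf2_hmac, since bcrypt needs a third-party package. The iteration count is an assumed work factor to tune for your hardware, and hash_password and verify_password are hypothetical names.

```python
import hashlib
import hmac
import os
from typing import Tuple

ITERATIONS = 600_000  # assumed work factor; tune so one hash takes noticeable time


def hash_password(password: str, iterations: int = ITERATIONS) -> Tuple[bytes, bytes]:
    """Return (salt, derived_key) using a fresh random salt per password."""
    salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, key


def verify_password(password: str, salt: bytes, key: bytes,
                    iterations: int = ITERATIONS) -> bool:
    """Re-derive with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return hmac.compare_digest(candidate, key)
```

Store the salt (and the iteration count) alongside the derived key; because every password gets its own salt, precomputed rainbow tables do not apply.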
