This question already has answers here:
Is it possible to decrypt MD5 hashes?
(24 answers)
Closed 2 years ago.
I used md5.new(), md5.update("aaa") and md5.digest()
to form an MD5 hash of the data "aaa". How can I get the data back using Python?
You cannot decode an md5 hash, as hashing is a process that is best thought of as one-way encoding (that is to say what is hashed cannot be de-hashed; one can only determine what was hashed, either by examining a list of known hashes, or by hashing a set of inputs and matching the resulting hashes with the hash you are trying to "decode").
Quoting Wikipedia, the key features of such a hashing algorithm are:
it is infeasible to find a message that has a given hash,
it is infeasible to modify a message without changing its hash,
it is infeasible to find two different messages with the same hash.
The most common uses of such algorithms today are:
Storing passwords
Verifying the contents of files.
If you want to two-way encrypt the data, you need to look at other cryptographic libraries for Python (as usual, Stack Overflow has a recommendation).
You can't. That's the point - a hash is one-way, it's not the same as an encryption.
I don't know about Python, but hash functions are irreversible.
First of all, note that hash functions provide a constant length output - meaning that information will be thrown away (you can hash a file of 3 MB and still only get a result of less than 1 kB).
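The fixed-length output is easy to check. The following is a minimal sketch using the standard hashlib module (the md5 module from the question is its older Python 2 equivalent); the 3 MB input is just an illustrative stand-in for a file.

```python
import hashlib

# A 3-byte input and a roughly 3 MB input both produce a 16-byte MD5 digest,
# so almost all of the information in the large input is thrown away.
short_digest = hashlib.md5(b"aaa").digest()
long_digest = hashlib.md5(b"a" * 3_000_000).digest()

print(len(short_digest), len(long_digest))
```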
Additionally, hash functions are designed to be irreversible. If you need encryption, use encryption rather than hashing. A major application of hashing is password storage: if a database that contained only hashes is leaked, the passwords themselves have not been directly compromised (there are more examples, but this is the most obvious one).
If you want to break a hash, such as a password hash, you need a very large lookup table. John the Ripper is commonly used to break passwords using a dictionary; this is a very good method, especially if it is a salted password hash.
Another approach is to use a rainbow table; however, these take a long time to generate. There are free rainbow tables accessible online.
Here is a python script to perform an md5() brute force attack.
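The linked script is not reproduced here. As a rough sketch of the idea (not that script), a brute-force search hashes every candidate and compares it with the target; the function name `brute_force_md5`, the lowercase alphabet and the length limit are assumptions made for this illustration.

```python
import hashlib
import itertools
import string


def brute_force_md5(target_hex, alphabet=string.ascii_lowercase, max_len=4):
    """Return the first candidate whose MD5 hex digest equals target_hex, or None."""
    for length in range(1, max_len + 1):
        # Try every string of this length over the chosen alphabet.
        for letters in itertools.product(alphabet, repeat=length):
            candidate = "".join(letters)
            if hashlib.md5(candidate.encode("utf-8")).hexdigest() == target_hex:
                return candidate
    return None


target = hashlib.md5(b"aaa").hexdigest()
print(brute_force_md5(target))
```

This only works because the search space is tiny; each extra character multiplies the work by the alphabet size.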
To add to everyone else's point, MD5 is a one-way hash. The common usage is to hash two input values and if the hashed values match, then the input should be the same. Going from an MD5 hashed value to the hash input is nonsensical. What you are probably after is a symmetric encryption algorithm - see two-way keyed encryption/hash algorithm for a good discussion on the subject.
In general, the answers from BlueRaja and Sean are correct. MD5 (and other hash functions) are one-way, you can't reverse the process.
However, if you have a small size of data, you can try to search for a hash collision (another, or the same, piece of data) having the same hash.
Hashes map a bunch of data to a finite (albeit large) set of numeric values/strings.
It is a many-to-one mapping, so that decoding a hash is not only "difficult" in the cryptographic sense, but also conceptually impossible in that even if you could, you would get an infinite set of possible input strings.
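The many-to-one property can be made visible by shrinking the output space. The sketch below uses a deliberately tiny toy hash (the first 2 bytes of MD5, a hypothetical helper named `tiny_hash`) so that a collision must appear within 65,537 inputs; real digests behave the same way, just with an output space far too large to exhaust.

```python
import hashlib


def tiny_hash(data: bytes) -> bytes:
    """Toy hash: the first 2 bytes of MD5, so only 65,536 outputs exist."""
    return hashlib.md5(data).digest()[:2]


def find_collision():
    """Hash 65,537 distinct inputs; at least two must share an output."""
    seen = {}
    for i in range(65537):
        msg = str(i).encode("ascii")
        h = tiny_hash(msg)
        if h in seen:
            return seen[h], msg, h
        seen[h] = msg
    raise AssertionError("unreachable: more inputs than possible outputs")


first, second, shared = find_collision()
print(first, second, shared.hex())
```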
Although it is not considered safe, I need a Python library that always generates for the same plaintext the same ciphertext using asymmetric encryption scheme.
Meaning that given a plaintext m and a public key k when encrypting m using k I will always get a constant ciphertext c.
It will be even better if there is a way to use the Python library "cryptography" to do so.
You can't find this because public-key encryption cannot possibly be deterministic. Any deterministic public-key encryption scheme is subject to a very simple attack: given a ciphertext, guess what the plaintext might be, and verify the guess by encrypting with the public key. Anyone can carry out this attack since the encryption key is public.
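The guess-and-verify attack can be sketched with unpadded "textbook" RSA, which is deterministic. The parameters below (p = 61, q = 53, e = 17) are tiny illustrative values chosen for this example, not anything a real library would use, and `guess_plaintext` is a hypothetical helper name.

```python
# Textbook RSA with toy parameters: insecure, for illustration only.
p, q = 61, 53
n = p * q   # 3233; (n, e) is the public key
e = 17      # public exponent, coprime to (p - 1) * (q - 1)


def encrypt(m: int) -> int:
    """Deterministic encryption: the same m always yields the same ciphertext."""
    return pow(m, e, n)


def guess_plaintext(ciphertext: int, candidates):
    """Attack using only the public key: encrypt each guess and compare."""
    for m in candidates:
        if encrypt(m) == ciphertext:
            return m
    return None


intercepted = encrypt(1234)  # the attacker sees only this value
print(guess_plaintext(intercepted, range(n)))
```

Randomized padding defeats this attack because the same plaintext no longer encrypts to the same ciphertext.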
This is completely different from public-key signatures, which can be deterministic because being able to tell that two signatures are from the same message doesn't change anything about the signature's validity. With encryption, being able to tell that two plaintexts are the same message does break the whole purpose of encryption.
There is one scenario in which public-key encryption could be deterministic, and that's if the plaintexts are randomly generated, or derived from randomly generated data, and it's impossible to guess potential plaintexts. However, with such input restrictions, you shouldn't look for an “asymmetric encryption” scheme, but for a lower-level primitive: a trapdoor permutation. This is not a directly usable primitive, but it can be a building block of a cryptographic mechanism (such as a public-key encryption mechanism). So you can't expect libraries to offer that as their interface. Furthermore, typical protocols are not generic in the way they might use a trapdoor permutation. So your protocol definition would call for a specific primitive, not for a “deterministic asymmetric encryption” primitive.
If you think you need “deterministic asymmetric encryption”, you're designing your own cryptographic scheme, and it does not stand a chance of being correct. Don't do that. If you need help solving a problem, ask about your actual problem instead of the dead end you've reached trying to solve it.
So the problem is that I have some secrets (TOTP/HOTP keys) that need to be used consistently by my program, but I don't want a memory dump to just show them all. I'm talking about common people whose computers can be compromised by malware.
If one thinks of encrypting these strings in memory with some symmetric encryption algorithm with a random bytes key, then it's probably harder than just reading the memory (AES-CBC, etc...), and this will require finding the random key in memory, then finding this data in memory that is decryptable with this.
Maybe I can make it even harder by splitting this key into multiple variables?
Well, I'm not a security specialist, and my question is to those who have faced this kind of problem before: What's the best practice to store secret information in memory? Is my concern legitimate, or is it paranoid, and all programs just put their passwords in memory?
I am trying to learn how to defend against security attacks on websites. The link below shows a good tutorial, but I am puzzled by one statement:
In http://google-gruyere.appspot.com/part3#3__client_state_manipulation , under "Cookie manipulation", Gruyere says Python's hash is insecure since it hashes from left to right.
The Gruyere application is using this to encrypt data:
# global cookie_secret; only use positive hash values
h_data = str(hash(cookie_secret + c_data) & 0x7FFFFFF)
c_data is a username; cookie_secret is a static string (which is just '' by default)
I understand that in more secure hash functions, one change generates a whole new result, but I don't understand why this is insecure, because different c_data generates completely different hashes!
EDIT: How would one go about beating a hash like this?
What the comment may be trying to get at is that for most hash functions, if you are given HASH(m) then it is easy to calculate HASH(m . x), for any x (where . is concatenation).
Therefore, if you are user ro, and the server sends you HASH(secret . ro), then you can easily calculate HASH(secret . root), and login as a different user.
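The extension idea can be sketched with a toy hash whose output is simply its final internal state. The FNV-1a-style function below is an assumption made for illustration; it is not Python's hash() and not the Gruyere code. Real iterated hashes such as MD5 or SHA-1 additionally require the attacker to account for padding, which this sketch leaves out.

```python
MASK = 0xFFFFFFFF


def toy_hash(data: bytes, state: int = 0x811C9DC5) -> int:
    """Toy FNV-1a-style hash: consumes bytes left to right, returns its final state."""
    for byte in data:
        state = ((state ^ byte) * 0x01000193) & MASK
    return state


secret = b"server-secret"            # known only to the server
token_ro = toy_hash(secret + b"ro")  # what the server hands to user "ro"

# The attacker never sees `secret`; they resume hashing from the published value.
forged = toy_hash(b"ot", state=token_ro)
print(forged == toy_hash(secret + b"root"))
```

A keyed construction such as HMAC is designed to prevent this kind of extension.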
I think that's just a bad explanation there. Python's hash() is insecure because it's easy to find collisions, but "hashes from left to right" has nothing to do with why it's easy to find collisions. Cryptographically secure hashes also process data strictly in sequence; they're likely to operate on data 128 or 256 bits at a time rather than one byte at a time, but that's just a detail of the implementation.
(It should be said that hash() being insecure is not a bug in Python, because that's not what it's for. It's an exposed detail of the implementation of Python's dictionaries as hash tables, and you generally don't want a secure hash function for your hash table, because that would slow it down so much that it would defeat the purpose. Python does provide secure hash functions in the hashlib module.)
(The use of an insecure hash is not the only problem with the code you show, but it is by far the most important problem.)
Python's default hashing algorithm (for all types, but it has the most severe consequences for strings as those are commonly hashed for security) is geared towards running fast and playing nice with the implementation of dicts. It's not a cryptographic hashing function, you shouldn't use it for security. Use hashlib for this.
The Python built-in hash function is not intended for secure, cryptographic hashing. Its purpose is to facilitate storing Python objects in dictionaries efficiently.
The internal hash implementations are too predictable (too many collisions) for secure uses. For example, the following assertions are all true:
hash('a') < hash('b')
hash('b') < hash('c')
hash('c') < hash('d')
This sequential nature makes for great dictionary storage behaviour, for which it was designed.
To create a secure hash, use the hashlib library instead.
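As a minimal sketch of what that could look like for the cookie case in the question, the standard hmac and hashlib modules can sign and verify c_data. The names `sign` and `verify` and the secret value are hypothetical, and this is not the actual Gruyere fix.

```python
import hashlib
import hmac

# Hypothetical secret: in practice this should be long, random and kept private.
COOKIE_SECRET = b"replace-with-a-long-random-secret"


def sign(c_data: str) -> str:
    """Return an HMAC-SHA256 tag for the cookie data."""
    return hmac.new(COOKIE_SECRET, c_data.encode("utf-8"), hashlib.sha256).hexdigest()


def verify(c_data: str, h_data: str) -> bool:
    """Check a tag in constant time to avoid leaking information through timing."""
    return hmac.compare_digest(sign(c_data), h_data)


tag = sign("user07")
print(verify("user07", tag), verify("user08", tag))
```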
One would go about "beating" a hash like that by appending their data to the end of the string being hashed and predicting the hash function output. Let me illustrate this:
Python 2.7.5 (default, May 15 2013, 22:43:36) [MSC v.1500 32 bit (Intel)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> data = 'root|admin|author'
>>> str(hash('' + data) & 0x7FFFFFF);
'116042699'
>>> data = 'root|admin|authos'
>>> str(hash('' + data) & 0x7FFFFFF);
'116042698'
>>>
The empty string ('') is the cookie secret, which you mentioned is empty by default. In this particular example, one can see that the hash changed by 1 when the last byte of data changed "by one" too. This is not really an exploit yet (leaving aside the fact that creating a username of the form anything_here|admin makes that user an admin), because there is some data after the username (reading left to right): even if you create a username that is very close to the one being attacked, the rest of the string changes the hash in a completely undesirable manner. However, if the cookie were in the form 105770185|user07 instead of 105770185|user07||author, then you could easily create a user "user08" or "user06" and compute/predict the hash (homework: what's the hash for "user08"?).
Project Euler
I have recently begun to solve some of the Project Euler riddles. I found the discussion forum on the site a bit frustrating (most of the discussions are closed and poorly threaded), so I have decided to publish my Python solutions on Launchpad for discussion.
The problem is that it seems quite unethical to publish these solutions, as it would let other people gain reputation without doing the programming work, which the site deeply discourages.
My Encryption problem
I want to encrypt my answers so that only those who have already solved the riddles can see my code. The logical key would be the answer to the riddle, which is always numeric.
In order to prevent brute-force attacks on my answers, I want to find an encryption algorithm that takes a significantly long time (few seconds) to run.
Do you know any such algorithm? I would fancy a Python package, which I can attach to the code, over an external program that might have portability issues.
Thanks,
Adam
It sounds like people will have to write their own decryption utility, or use something off-the-shelf, or use off-the-shelf components to decrypt your posts.
PBKDF2 is a standardized algorithm for password-based key derivation, defined in PKCS #5. Basically, you can tune "iterations" parameter so that deriving the key from a password (the answer to the Euler problem) would take several seconds. The key can then be used for any common symmetric encryption algorithm, like AES-128.
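A minimal sketch of the key-derivation step, using hashlib.pbkdf2_hmac from the standard library. The helper name `derive_key`, the example answer and the iteration count are assumptions; the count would need tuning on the target machine to reach the desired delay, and the resulting 16-byte key would then be passed to an AES-128 implementation from a separate library.

```python
import hashlib
import os


def derive_key(answer: str, salt: bytes, iterations: int = 200_000) -> bytes:
    """Derive a 16-byte (AES-128 sized) key from the numeric answer with PBKDF2."""
    return hashlib.pbkdf2_hmac(
        "sha256", answer.encode("utf-8"), salt, iterations, dklen=16
    )


# The salt is not secret; it would be published alongside the ciphertext.
salt = os.urandom(16)
key = derive_key("1234567", salt)
print(key.hex())
```

Raising `iterations` slows every guess by the same factor, which is the whole point here, but it cannot make up for a very small set of possible answers.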
This has the advantage that most crypto libraries already support PBKDF2. In fact, you might find mail clients that support password-based encryption for S/MIME messages. Then you could just post an S/MIME and people could read it with the mail client. Unfortunately, my mail client (Thunderbird) only supports public-key encryption.
I think Yin Zhu pegged the social aspect of it and Whirlwind the technical. Using your preferred approach of:
python decrypt.py --problem=123 --key=1234567
the key number is readily available to Google, and even without that, slamming through a million keys (assuming a median key length of 5 decimal digits yields less than 20 bits of key) is pretty fast. If I wanted to be more clever I could use plain-text assumptions (e.g. import, for) and vastly reduce my search space.
For all the trouble you're probably best off using something really complicated like:
>>> print codecs.getencoder('rot_13')('import codecs')[0]
vzcbeg pbqrpf
And if you want the solution to Project Euler problem 123, you'll have to beat it out of me...
Yes, you can do this with virtually any symmetric encryption algorithm: DES or AES, for example. Just use the integer as the key, pad the key out to the length the algorithm requires, and use that key to decrypt the answer.
Keep in mind that if you extend a short key, the encryption won't be very good. The strength of the encryption has a lot more to do with key length and the algorithm itself than how long it takes to run.
This question seems to have some examples of libraries to use with python.
Just use triple DES with a different key for each iteration, and use the number to generate each of the 3 keys. Pad up the key length with some text, and you're good.
Triple DES was designed to increase effectiveness against brute force.
It's not the world's most secure option, but it'll keep most brute-forcers at bay.
If you encrypt your answers, those who have solved the problem will simply not bother to go to such effort to see them, given that they already have plenty of answers to see on the answer page. Those who haven't solved it cannot see them. Your work then becomes less useful.
By the way, there are many places providing answers to Project Euler, e.g. Haskell answers, Clojure answers, F# answers. If somebody only wants the answer to a question, he/she could simply run the program. Given that Python is so popular, googling "Python Euler xx" would give you plenty of blogs solving a specific problem.
The simplest approach would be to hash the answer using a secure hash function such as SHA-1, then provide the hash so users can verify their answer. If you want to make brute-forcing more difficult, iterate the hash - eg, provide the result of n recursive applications of SHA1, where n is some parameter you choose to make it difficult to brute-force.
If the number of possible answers is small, though, it'll be difficult to impossible to prevent someone from brute-forcing it even with an expensive hash function.
Edit: Sorry, I misread your original question. If you want to encrypt your answer, you could do that by using the resulting hash, above, as the encryption key for your answer, rather than posting the hash.
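The iterated-hash idea can be sketched as follows. The function names, the example answer and n = 100,000 are illustrative assumptions; n would be chosen by timing the function on typical hardware.

```python
import hashlib


def iterated_sha1(answer: str, n: int) -> bytes:
    """Apply SHA-1 n times, feeding each digest into the next round."""
    digest = answer.encode("utf-8")
    for _ in range(n):
        digest = hashlib.sha1(digest).digest()
    return digest


N = 100_000
# Published value that lets solvers verify their answer.
published = iterated_sha1("1234567", N).hex()


def check(answer: str) -> bool:
    """Return True if the candidate answer matches the published hash."""
    return iterated_sha1(answer, N).hex() == published


print(check("1234567"), check("7654321"))
```

When the digest is used as an encryption key instead, as the edit above suggests, the verification hash must not also be published, since it would be the key itself.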
If you want an encryption routine that is easy to use and distribute, I recommend Paul Rubin's p3.py. It's probably on the fast side, for how secure it is, but since you seem to be in need of a hurdle to be jumped rather than a siege-resistant wall, it may be a good choice for your purposes.
You could also look into rijndael.py, which is an implementation of AES, and slower than p3.py.
I'm a bit confused that the argument to crypto functions is a string. Should I simply wrap non-string arguments with str() e.g.
hashlib.sha256(str(user_id)+str(expiry_time))
hmac.new(str(random.getrandbits(256)))
(ignore for the moment that random.getrandbits() might not be cryptographically good).
edit: I realise that the hmac example is silly because I'm not storing the key anywhere!
Well, usually hash-functions (and cryptographic functions generally) work on bytes. The Python strings are basically byte-strings. If you want to compute the hash of some object you have to convert it to a string representation. Just make sure to apply the same operation later if you want to check if the hash is correct. And make sure that your string representation doesn't contain any changing data that you don't want to be checked.
Edit: Due to popular request a short reminder that Python unicode strings don't contain bytes but unicode code points. Each unicode code point contains multiple bytes (2 or 4, depending on how the Python interpreter was compiled). Python strings only contain bytes. So Python strings (type str) are the type most similar to an array of bytes.
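The answer above describes Python 2. In Python 3, str holds Unicode text and hashlib accepts only bytes, so the values have to be encoded explicitly. The sketch below assumes UTF-8 and a "|" separator, both choices made for this example; the separator keeps (1, 23) and (12, 3) from producing the same input.

```python
import hashlib

user_id = 42
expiry_time = 1700000000

# Build an unambiguous text representation, then encode it to bytes.
message = f"{user_id}|{expiry_time}".encode("utf-8")
token = hashlib.sha256(message).hexdigest()


def hashing_str_fails() -> bool:
    """Return True if hashlib rejects an unencoded str, as Python 3 does."""
    try:
        hashlib.sha256("not bytes")
    except TypeError:
        return True
    return False


print(token, hashing_str_fails())
```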
You can.
However, for the HMAC, you actually want to store the key somewhere. Without the key, there is no way for you to verify the hash value later. :-)
Oh, and plain SHA-256 on its own isn't a good way to protect passwords or other critical data (although unfortunately it's used that way quite commonly on many sites), because it is fast to compute. It is more than good enough for generating temporary tokens.
Edit: As mentioned, SHA-256 needs at least some salt. Without salt, SHA-256 has a low barrier to being cracked with a dictionary attack (time-wise), and there are plenty of rainbow tables to use as well. Personally I'd not use anything less than Blowfish-based hashing (bcrypt) for passwords (but that's because I'm paranoid).