Unhashing a hashed (MD5) email address

I know that hashing, by definition, loses information. However, email addresses are constrained: from the information available I would know a likely domain for the address, and that it must contain an @. Do these constraints change anything about the problem? Or is the best approach simply to make a guess and check whether the hash matches? Also, MD5 is no longer as secure as it once was.
Thanks

The point of MD5 hashing is that even a minute change in the input changes the hash completely, so these constraints do not let you work backwards from the hash.
However, since you said that it is an email address and that you know the potential domain, you can try this technique.
Generate a list of candidate addresses: assume the part before the @ uses only the 26 lowercase letters and has some maximum length, say around 10 characters.
Then compute the MD5 of each candidate and check whether it equals the hash you have.
import hashlib
import time
from itertools import combinations

start = time.time()
your_md5_hash = 'your_md5_hash'  # the hex digest you are trying to match
letters = 'abcdefghijklmnopqrstuvwxyz'
max_len = 9  # maximum length of the part before the @

# Note: combinations() only yields strings whose letters are distinct and in
# alphabetical order; itertools.product(letters, repeat=r) covers every string.
possible_words = []
for r in range(1, max_len + 1):
    for combo in combinations(letters, r):
        possible_words.append(''.join(combo))
possible_words = [x + '@domain.com' for x in possible_words]
print(len(possible_words))

for x in possible_words:
    res = hashlib.md5(x.encode()).hexdigest()  # compare hex strings, not hash objects
    if res == your_md5_hash:
        print(res)
        print(x)
        print("RESULT_FOUND")
        break
print(time.time() - start)
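This is a brute-force approach, and it can work if you know the length of the address. Note that if you do not know the length, the number of possibilities grows exponentially with the maximum length you allow.
For instance, the code above generates 5,658,536 candidates, which took my basic laptop 6 seconds to process. That count is small only because combinations() skips every string with repeated or out-of-order letters; the full set of lowercase strings of length 1 to 9 has about 5.6 × 10^12 members.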
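A full search over every lowercase local part would use itertools.product instead of combinations, and would hash candidates as they are generated rather than storing them in a list. The following is only a minimal sketch under those assumptions; the function name crack_email_md5, the domain example.com and the small max_len are illustrative choices, not part of the original answer.

```python
import hashlib
from itertools import product


def crack_email_md5(target_hex, domain,
                    alphabet="abcdefghijklmnopqrstuvwxyz", max_len=4):
    """Return the address whose MD5 hex digest equals target_hex, or None.

    Tries every string over `alphabet` of length 1..max_len as the local part.
    """
    target_hex = target_hex.lower()
    for length in range(1, max_len + 1):
        # product() includes repeated and out-of-order letters,
        # and yields lazily, so nothing is held in memory.
        for chars in product(alphabet, repeat=length):
            candidate = "".join(chars) + "@" + domain
            digest = hashlib.md5(candidate.encode("utf-8")).hexdigest()
            if digest == target_hex:
                return candidate
    return None


# Hypothetical target: hash a known address, then recover it.
target = hashlib.md5(b"bob@example.com").hexdigest()
print(crack_email_md5(target, "example.com"))
```

The work grows as 26^max_len, so each extra character multiplies the running time by 26.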

Related

Password Cracking

I've asked this question here once before and didn't get the answer I was looking for, but I did get another method of doing it, so here I go again.
(The problem with the previous answer was that it was extremely efficient, a bit too efficient. I couldn't count comparisons, and I wanted the most bare-bones way of finding the password so that I could build a rating system on top of it.)
I am looking to make a password rating program that rates a password based on the length of time and the number of comparisons the program had to make in order to find the correct password.
I'm looking to use several different methods to crack the input, e.g. comparing the input to a database of common words, and generating all possibilities for a password starting with the first character.
Problem:
I can't find a way to do the following. Start with one element in a list; we will call it A.
Run A through chr(32) - chr(127) ('space' - '~'), and then add a second element, B, to the list.
For the second loop, set A = chr(32), and B would then run through chr(32) - chr(127). A would then become chr(33) and B would run through all characters again, and so forth, until all possible options have been compared. For the next loop, it would add another element to the list and continue the search, starting with chr(32), chr(32), chr(32-127), continuing this pattern until it finds the correct password.
This is the closest I could get to something that would work (I know it's terrible).
while ''.join(passCheck) != ''.join(usrPassword):
    for i in range(0, len(usrPassword)):
        if ''.join(passCheck) != ''.join(usrPassword):
            passCheck.append(' ')
            for j in range(32, 127):
                if ''.join(passCheck) != ''.join(usrPassword):
                    passCheck[i] = chr(j)
                    for k in range(0, len(passCheck)):
                        for l in range(32, 127):
                            if ''.join(passCheck) != ''.join(usrPassword):
                                passCheck[k] = chr(l)
                                print(passCheck)
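The enumeration described above (all one-character strings, then all two-character strings, and so on) can be sketched with itertools.product while counting comparisons and elapsed time. This is only an illustration under assumptions: the function name brute_force and the max_len cutoff are hypothetical, and the character set is chr(32) to chr(126), matching range(32, 127) in the attempt above.

```python
import time
from itertools import product

# Printable ASCII from ' ' (32) to '~' (126), i.e. range(32, 127).
CHARSET = [chr(c) for c in range(32, 127)]


def brute_force(password, charset=CHARSET, max_len=4):
    """Try every string of length 1, 2, ... in order.

    Returns (comparisons, seconds); comparisons is None if the password
    was not found within max_len characters.
    """
    start = time.perf_counter()
    comparisons = 0
    for length in range(1, max_len + 1):
        # The rightmost character changes fastest, like an odometer.
        for candidate in product(charset, repeat=length):
            comparisons += 1
            if "".join(candidate) == password:
                return comparisons, time.perf_counter() - start
    return None, time.perf_counter() - start


count, seconds = brute_force("a!")
print(count, seconds)
```

With 95 characters the search space grows as 95^length, so this is only practical for very short inputs.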

The answer to this question is that you probably want to ask a different one, unfortunately.
Checking password strength is not as simple as calculating all possible combinations, Shannon entropy, etc.
What matters is what's called "guessing entropy", which is a devilishly complex mix of raw bruteforce entropy, known password lists and password leak lists, rules applied to those lists, Levenshtein distance from the elements in those lists, human-generated strings and phrases, keyboard walks, etc.
Further, password strength is deeply rooted in the question "How was this password generated?" ... which is very hard indeed to automatically detect and calculate after the fact. As humans, we can approximate this in many cases by reverse-engineering the psychology of how people select passwords ... but this is still an art, not a science.
For example, 'itsmypartyandillcryifiwantto' is a terrible password, even though its Shannon entropy is quite large. And 'WYuNLDcp0yhsZXvstXko' is a randomly-generated password ... but now that it's public, it's a bad one. And until you know how passwords like 'qscwdvefbrgn' or 'ji32k7au4a83' were generated, they look strong ... but they are definitely not.
So if you apply the strict answer to your question, you're likely to get a tool that dramatically overestimates the strength of many passwords. If your goal is to actually encourage your users to create passwords resistant to bruteforce, you should instead encourage them to use randomly generated passphrases, and ensure that your passwords are stored using a very slow hash (Argon2 family, scrypt, bcrypt, etc. - but be sure to bench these for UX and server performance before choosing one).
References and further reading (disclaimer: some are my answers or have my comments):
https://security.stackexchange.com/a/4631/6203
https://security.stackexchange.com/questions/4630/how-can-we-accurately-measure-a-password-entropy-range
https://security.stackexchange.com/questions/127434/what-are-possible-methods-for-calculating-password-entropy
https://security.stackexchange.com/a/174555/6203

If you check candidates in alphabetical order, you need only one loop to calculate how many comparisons it would take to reach the password.
You have 96 characters (127-32+1).
password = 'ABC'
rate = 0
for char in password:
    rate = rate*96 + (ord(char)-31)
print(rate)
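As a sanity check, the same formula can be written for an arbitrary alphabet and compared against an actual shortest-first enumeration. The names guess_rank and enumerate_rank are hypothetical helpers for this sketch; the 96-character alphabet is the chr(32) to chr(127) range assumed in the answer above.

```python
from itertools import product


def guess_rank(password, alphabet):
    """1-based position of password when candidates are tried
    shortest-first and, within a length, in alphabet order."""
    base = len(alphabet)
    rank = 0
    for char in password:
        # Each character acts as a digit from 1 to base.
        rank = rank * base + (alphabet.index(char) + 1)
    return rank


def enumerate_rank(password, alphabet):
    """Same value obtained the slow way, by generating every candidate."""
    count = 0
    length = 1
    while True:
        for candidate in product(alphabet, repeat=length):
            count += 1
            if "".join(candidate) == password:
                return count
        length += 1


ascii_96 = [chr(c) for c in range(32, 128)]  # chr(32)..chr(127)
print(guess_rank("ABC", ascii_96))           # 316740
print(guess_rank("cab", "abc"), enumerate_rank("cab", "abc"))  # 32 32
```

The closed form gives the comparison count instantly, whereas the enumeration takes time proportional to the count itself.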

Using hashlib.sha256 to create a unique id; is this guaranteed to be unique?

I am trying to create a unique record id using the following function:
import hashlib
import uuid
from base64 import b64encode

def make_uid(salt, pepper, key):
    s = b64encode(salt)
    p = b64encode(pepper)
    k = b64encode(key)
    return hashlib.sha256(s + p + k).hexdigest()
Where pepper is set like this:
uuid_pepper = uuid.uuid4()
pepper = str(uuid_pepper).encode('ascii')
And salt and key are the same values for every request.
My question is: because of the unique nature of the pepper, will make_uid in this instance always return a unique value, or is there a chance that it can create a duplicate?
The suggested answer is different because I'm not asking about the uniqueness of various uuid types, I'm wondering whether it's at all possible for a sha256 hash to create a collision between two distinct inputs.

I think what you want to know is whether SHA-256 is guaranteed to generate a unique hash result. The answer is yes and no. The following comes from my own research; it is not 100% accurate, but close.
In theory, SHA-256 will collide. It has 2^256 possible outputs, so if we hash 2^256 + 1 distinct inputs, there must be a collision. Even worse, by the birthday paradox, the probability of at least one collision among 2^130 hashes is over 99%.
But you probably won't generate one during your lifetime. Assume we have a computer that can calculate 10,000 hashes per second. It would take this computer about 4 * 10^27 years to finish 2^130 hashes. To give an idea of how large this number is: it is about 2 * 10^22 times as long as humans have existed on earth. That means that even if you had been hashing since the first day humans were on earth, the probability of a collision would still be very, very small.
Hope that answers your question.
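The figures above can be reproduced with the usual birthday approximation p ≈ 1 - exp(-n^2 / (2N)), where n is the number of hashes and N = 2^256 the number of possible outputs. This is a rough sketch, assuming the hash behaves like a uniform random function; collision_probability is a hypothetical helper name.

```python
import math


def collision_probability(num_hashes, output_bits):
    """Birthday approximation: p ~ 1 - exp(-n^2 / (2 * 2^bits))."""
    exponent = num_hashes * num_hashes / (2 * 2 ** output_bits)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for tiny x.
    return -math.expm1(-exponent)


print(collision_probability(2 ** 130, 256))   # above 0.99

# Time to compute 2^130 hashes at 10,000 hashes per second, in years.
seconds = 2 ** 130 / 10_000
print(seconds / (365.25 * 24 * 3600))         # roughly 4e27

# A uuid4 pepper has 122 random bits, so it repeats long before SHA-256 does.
print(collision_probability(10 ** 12, 122))
print(collision_probability(10 ** 12, 256))
```

In the make_uid setup from the question, the last two lines suggest that the uniqueness of the uuid4 pepper, not SHA-256 itself, is the limiting factor, and even that is negligible at realistic record counts.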

How to generate random numbers that are unique forever in python

I have written a script where I need a unique random number every time I run it. To explain: suppose that I run my script 5 times. I want the numbers generated across all of those runs to be unique.
I have found a lot of information about random number uniqueness, but it all applies within a single run only.
If you think this is not possible, is there an alternative?

You could use uuid to generate RFC 4122 UUIDs (Universally Unique IDentifiers). For example uuid4 generates a random UUID:
In [1]: import uuid
In [2]: u = uuid.uuid4()
In [3]: u
Out[3]: UUID('deb1064a-e885-4ebc-9afc-f5291120edf8')
To get the number, access the int attribute of uuid.UUID objects:
In [4]: u.int
Out[4]: 242844464987419882393579831900689854160

Unique and random are contradictory. For anything that's genuinely random there is a (small, maybe infinitesimal) chance of repetition.
If you want something less unwieldy (but less universally unique) than UUIDs, you can roll your own combination of a random number (with a small chance of repetition) and a number derived from the time (for example the Unix timestamp in seconds, which won't ever repeat for a single instance if the script is run less often than once per second).
If the random number is used as part of (say) a filename you can generate a name and then check whether the file already exists. If it does, then reject the random number as already used, and generate another one. Or if you really need to, you could store all random numbers already used somewhere. Load them before each run, add the new number and save after each run.
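The "store all random numbers already used" idea could look roughly like the following. It is a minimal sketch, assuming a single process at a time and a JSON file as the store; next_unique_number and the file name are hypothetical.

```python
import json
import os
import random
import tempfile


def next_unique_number(store_path, low=0, high=10**9):
    """Return a random integer in [low, high] that this store file
    has never handed out before, and record it."""
    used = set()
    if os.path.exists(store_path):
        with open(store_path) as f:
            used = set(json.load(f))
    if len(used) >= high - low + 1:
        raise RuntimeError("every number in the range has been used")
    while True:
        candidate = random.randint(low, high)
        if candidate not in used:
            break
    used.add(candidate)
    with open(store_path, "w") as f:
        json.dump(sorted(used), f)
    return candidate


# Ten calls over a ten-number range must return each number exactly once.
with tempfile.TemporaryDirectory() as tmp:
    store = os.path.join(tmp, "used_numbers.json")
    numbers = [next_unique_number(store, low=0, high=9) for _ in range(10)]
    print(sorted(numbers))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Each call stands in for one run of the script; the file is what carries the history from one run to the next.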
Finally there are pseudo-"random" generators of the form X(n+1) = (X(n)*a + b) mod M. These are hopeless for security / cryptography because given a few members of the sequence, you can discover the algorithm and predict all future numbers. However, if that predictability is unimportant, then with appropriate constants you can guarantee no repeats until all M members of the sequence have been generated. The numbers are not at all random, but they may appear random to a casual observer.
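A generator of this form can be sketched as below. The "appropriate constants" are those satisfying the Hull-Dobell conditions: b and M share no common factor, a - 1 is divisible by every prime factor of M, and a - 1 is divisible by 4 whenever M is. The function names and the small constants (a=5, b=3, M=16) are illustrative assumptions chosen so the full cycle is easy to inspect.

```python
import math
from itertools import islice


def lcg_sequence(seed, a, b, m):
    """Yield X(n+1) = (X(n)*a + b) mod m forever, starting from seed."""
    x = seed % m
    while True:
        x = (x * a + b) % m
        yield x


def satisfies_hull_dobell(a, b, m):
    """True if the constants give a full period of m values."""
    if math.gcd(b, m) != 1:
        return False
    # Collect the prime factors of m by trial division.
    n, p, factors = m, 2, set()
    while p * p <= n:
        while n % p == 0:
            factors.add(p)
            n //= p
        p += 1
    if n > 1:
        factors.add(n)
    if any((a - 1) % q != 0 for q in factors):
        return False
    if m % 4 == 0 and (a - 1) % 4 != 0:
        return False
    return True


values = list(islice(lcg_sequence(seed=7, a=5, b=3, m=16), 16))
print(values)
print(len(set(values)) == 16)  # True: no repeats within one full cycle
```

To get numbers that stay unique across runs of a script, the last value produced would have to be saved and used as the seed for the next run.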

Cheap mapping of string to small fixed-length string

Just for debugging purposes I would like to map a big string (a session_id, which is difficult to visualize) to a, let's say, 6 character "hash". This hash does not need to be secure in any way, just cheap to compute, and of fixed and reduced length (md5 is too long). The input string can have any length.
How would you implement this "cheap_hash" in python so that it is not expensive to compute? It should generate something like this:
def compute_cheap_hash(txt, length=6):
    # do some computation
    return cheap_hash

print(compute_cheap_hash("SDFSGSADSADFSasdfgsadfSDASAFSAGAsaDSFSA2345435adfdasgsaed"))
aBxr5u

I can't recall if MD5 is uniformly distributed, but it is designed to change a lot even for the smallest difference in the input.
Don't trust my math, but I guess the chance that two given inputs collide is 1 in 16^6 for the first 6 digits of the MD5 hexdigest, which is about 1 in 17 million.
So you can just use cheap_hash = lambda input: hashlib.md5(input.encode()).hexdigest()[:6] (on Python 3, md5 needs bytes, hence the encode()).
After that you can use hash = cheap_hash(any_input) anywhere.
PS: Any algorithm can be used; MD5 is slightly cheaper to compute but hashlib.sha256 is also a popular choice.
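The "1 in 16^6" figure is the chance for one specific pair of inputs. With many session ids in play, the birthday approximation n ≈ sqrt(2 N ln(1/(1-p))) gives a rough idea of how many distinct inputs it takes before some pair shares a 6-digit prefix. The sketch below assumes Python 3 and uniformly distributed digests; strings_for_collision is a hypothetical helper.

```python
import hashlib
import math


def cheap_hash(text, length=6):
    """First `length` hex digits of the MD5 of text (debugging only)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()[:length]


def strings_for_collision(length, probability=0.5):
    """Approximate number of random inputs needed before at least two
    share the same `length`-digit hex prefix with this probability."""
    space = 16 ** length
    return math.sqrt(2 * space * math.log(1 / (1 - probability)))


print(cheap_hash("abc"))                # 900150
print(round(strings_for_collision(6)))  # about 4800
```

For debugging a few hundred sessions, 6 hex digits is usually plenty; for tens of thousands, a longer prefix is the simple fix.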

def cheaphash(string, length=6):
    digest = hashlib.sha256(string.encode()).hexdigest()  # Python 3: hash bytes, not str
    if length < len(digest):
        return digest[:length]
    else:
        raise Exception("Length too long. Length of {y} when hash length is {x}.".format(x=len(digest), y=length))
This should do what you need. It simply uses the hashlib module, so make sure to import it before using this function.

I found this similar question: https://stackoverflow.com/a/6048639/647991
So here is the function:
import hashlib

def compute_cheap_hash(txt, length=6):
    # This is just a hash for debugging purposes.
    # It does not need to be unique, just fast and short.
    hash = hashlib.sha1()
    hash.update(txt.encode())  # Python 3: update() needs bytes, not str
    return hash.hexdigest()[:length]

Django - make_random_password method, is it truly random?

I am using the following method to create a random code for users as part of a booking process:
User.objects.make_random_password()
When the users turn up at the venue, they will present the password.
Is it safe to assume that two people won't end up with the same code?
Thanks

No, it's not safe to assume that two people can't have the same code. Random doesn't mean unique. It may be unlikely and rare, depending on the length you specify and number of users you are dealing with. But you can't rely on its uniqueness.
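If uniqueness matters, one option is to check each new code against the codes already issued and draw again on a clash. The sketch below is plain Python rather than Django: issue_unique_code is a hypothetical helper, the issued codes are assumed to be available as a set, and the standard-library secrets module stands in for Django's generator.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits  # 62 characters


def issue_unique_code(issued_codes, length=12):
    """Generate a random code that is not in issued_codes, then record it."""
    while True:
        code = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if code not in issued_codes:
            issued_codes.add(code)
            return code


issued = set()
first = issue_unique_code(issued)
second = issue_unique_code(issued)
print(first != second)  # True: a repeat would have been rejected and redrawn
```

In a real booking system the set would typically be replaced by a lookup against the stored codes, ideally backed by a uniqueness constraint in the database.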

It depends on how many users you have, the password length you choose, and how you use User.objects.make_random_password(). For the defaults, the chance is essentially zero, IMO.
This method is implemented using get_random_string(). From the django github repo:
def get_random_string(length=12,
                      allowed_chars='abcdefghijklmnopqrstuvwxyz'
                                    'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'):
    """
    Returns a securely generated random string.
    The default length of 12 with the a-z, A-Z, 0-9 character set returns
    a 71-bit value. log_2((26+26+10)^12) =~ 71 bits
    """
    if not using_sysrandom:
        # This is ugly, and a hack, but it makes things better than
        # the alternative of predictability. This re-seeds the PRNG
        # using a value that is hard for an attacker to predict, every
        # time a random string is required. This may change the
        # properties of the chosen random sequence slightly, but this
        # is better than absolute predictability.
        random.seed(
            hashlib.sha256(
                "%s%s%s" % (
                    random.getstate(),
                    time.time(),
                    settings.SECRET_KEY)
                ).digest())
    return ''.join([random.choice(allowed_chars) for i in range(length)])
According to github, the current code uses a 12 character password from a string of 62 characters (lower- and uppercase letters and numbers) by default. This makes for 62**12 or 3226266762397899821056 (3.22e21) possible different passwords. This is much larger than the current world population (around 7e9).
The letters are picked from this list of characters by the random.choice() function. The question now becomes how likely it is that the repeated calling of random.choice() returns the same sequence twice?
As you can see from the implementation of get_random_string(), the code tries hard to avoid predictability. When not using the OS's pseudo-random value generator (which on Linux and *BSD gathers real randomness from e.g. the times at which ethernet packets or keypresses arrive), it re-seeds the random module's Mersenne Twister predictable PRNG at each call with a combination of the current random state, the current time and (presumably constant) secret key.
So for two identical passwords to be generated, both the state of the random generator (which is about 8 kiB in python) and the time at which they are generated (measured in seconds since the epoch, as per time.time()) have to be identical. If the system's time is well-maintained and you are running one instance of the password generation program, the chance of that happening is essentially zero. If you start two or more instances of this password generating program at exactly the same time and with exactly the same seed for the PRNG, and combine their output, one would expect some passwords to appear more than once.
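The last point, that identical seeds give identical passwords, is easy to demonstrate with the standard random module. This is an illustration only: password_from_seed is a hypothetical helper that mimics the random.choice() loop shown earlier, with the seed passed in directly instead of being derived from state, time and SECRET_KEY.

```python
import random

ALLOWED = ("abcdefghijklmnopqrstuvwxyz"
           "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789")


def password_from_seed(seed, length=12):
    """Build a password from a private Mersenne Twister seeded with `seed`."""
    rng = random.Random(seed)
    return "".join(rng.choice(ALLOWED) for _ in range(length))


same_a = password_from_seed(1234)
same_b = password_from_seed(1234)
other = password_from_seed(1235)

print(same_a == same_b)  # True: same seed, same sequence of choices
print(same_a == other)   # a different seed gives an unrelated sequence
```

This is why the re-seeding step mixes in values that differ between calls and between instances.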
