Django - make_random_password method, is it truly random? - python

I am using the following method to create a random code for users as part of a booking process:
User.objects.make_random_password()
When the users turn up at the venue, they will present the password.
Is it safe to assume that two people won't end up with the same code?
Thanks

No, it's not safe to assume that two people can't end up with the same code. Random doesn't mean unique. A collision may be unlikely, depending on the length you specify and the number of users you are dealing with, but you can't rely on uniqueness.

It depends on how many users you have, the password length you choose, and how you use User.objects.make_random_password(). For the defaults, the chance is essentially zero, in my opinion.
This method is implemented using get_random_string(). From the django github repo:
def get_random_string(length=12,
                      allowed_chars='abcdefghijklmnopqrstuvwxyz'
                                    'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'):
    """
    Returns a securely generated random string.

    The default length of 12 with the a-z, A-Z, 0-9 character set returns
    a 71-bit value. log_2((26+26+10)^12) =~ 71 bits
    """
    if not using_sysrandom:
        # This is ugly, and a hack, but it makes things better than
        # the alternative of predictability. This re-seeds the PRNG
        # using a value that is hard for an attacker to predict, every
        # time a random string is required. This may change the
        # properties of the chosen random sequence slightly, but this
        # is better than absolute predictability.
        random.seed(
            hashlib.sha256(
                "%s%s%s" % (
                    random.getstate(),
                    time.time(),
                    settings.SECRET_KEY)
            ).digest())
    return ''.join([random.choice(allowed_chars) for i in range(length)])
According to github, the current code uses a 12 character password from a string of 62 characters (lower- and uppercase letters and numbers) by default. This makes for 62**12 or 3226266762397899821056 (3.22e21) possible different passwords. This is much larger than the current world population (around 7e9).
The letters are picked from this list of characters by the random.choice() function. The question now becomes how likely it is that the repeated calling of random.choice() returns the same sequence twice?
As you can see from the implementation of get_random_string(), the code tries hard to avoid predictability. When not using the OS's pseudo-random value generator (which on Linux and *BSD gathers real randomness from e.g. the times at which ethernet packets or keypresses arrive), it re-seeds the random module's Mersenne Twister predictable PRNG at each call with a combination of the current random state, the current time and (presumably constant) secret key.
So for two identical passwords to be generated, both the state of the random generator (which is about 8 kiB in python) and the time at which they are generated (measured in seconds since the epoch, as per time.time()) have to be identical. If the system's time is well-maintained and you are running one instance of the password generation program, the chance of that happening is essentially zero. If you start two or more instances of this password generating program at exactly the same time and with exactly the same seed for the PRNG, and combine their output, one would expect some passwords to appear more than once.
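To put a number on that, a birthday-bound estimate is enough. The sketch below is my own illustration, not Django code; it assumes the default 12-character codes drawn from the 62-character alphabet and approximates the probability that at least two out of n independently generated codes collide.

import math

def collision_probability(n, length=12, alphabet=62):
    """Approximate birthday-bound probability that at least two of n
    independently generated random codes are identical."""
    space = alphabet ** length  # 62**12, about 3.2e21 possible codes
    # For n much smaller than space: P(collision) ~ 1 - exp(-n*(n-1) / (2*space))
    return 1 - math.exp(-n * (n - 1) / (2 * space))

print(collision_probability(1_000_000))  # ~1.5e-10 even for a million booking codes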

Related

Hashing Algorithm to use for short unique content ID

I was wondering what would be the best hashing algorithm to use to create short + unique IDs for a list of content items. Each content item is an ASCII file on the order of 100-500 KB.
The requirements I have are:
Must be as short as possible; I have very limited space to store the IDs and would like to keep them to, say, < 10 characters each (when represented as ASCII)
Must be unique, i.e. no collisions or at least a negligible chance of collisions
I don't need it to be cryptographically secure
I don't need it to be overly fast (each content item is pretty small)
I am trying to implement this in Python, so preferably an algorithm that has a Python implementation.
In lieu of any other recommendation, I've currently decided on the following approach. I use the BLAKE2 hashing algorithm to create a cryptographically secure hash of the file contents, so as to minimise the chance of collisions. I then base64-encode it to map it to an ASCII character set, and take the first 8 characters.
Under the assumption that these characters are perfectly randomised, that gives 64^8 combinations the hash can take. I predict the upper limit to the number of content items I'll ever have is 50k, which gives a probability of at least one collision of about 0.00044%, which I think is acceptably low for my use case (I can always go up to 9 or 10 characters if needed in the future).
import hashlib
import base64

def get_hash(byte_content, size=8):
    hash_bytes = hashlib.blake2b(byte_content, digest_size=size * 3).digest()
    hash64 = base64.b64encode(hash_bytes).decode("utf-8")[:size]
    return hash64

# Example of use
get_hash(b"some random binary object")

Password Cracking

I've asked this question once before on here and didn't get the answer I was looking for, but I was given another method of doing it, so here I go again.
(The problem with the previously found answer was that it was extremely efficient, a bit too efficient: I couldn't count comparisons, and I wanted the most bare-bones way of finding the password so that I can implement a rating system on top of it.)
I am looking to make a password rating program that rates a password based on the length of time and the number of comparisons the program had to make in order to find the correct password.
I'm looking to use multiple different methods to crack the input, e.g. comparing the input to a database of common words, and generating all possibilities for a password starting with the first character.
Problem :
I can't find a way to start with one element in a list; call it A.
A runs through chr(32) - chr(127) ('space' - '~'), and then a second element, B, is added to the list.
For the second loop, A is set to chr(32) while B runs through chr(32) - chr(127); A then moves to chr(33) and B runs through all the characters again, and so forth, until all possible options have been compared. The next loop adds another element to the list and continues the search, starting with chr(32), chr(32), chr(32-127). This pattern continues until the correct password is found.
This is the closest I could get to something that would work (I know it's terrible).
while ''.join(passCheck) != ''.join(usrPassword):
    for i in range(0, len(usrPassword)):
        if ''.join(passCheck) != ''.join(usrPassword):
            passCheck.append(' ')
            for j in range(32, 127):
                if ''.join(passCheck) != ''.join(usrPassword):
                    passCheck[i] = chr(j)
                    for k in range(0, len(passCheck)):
                        for l in range(32, 127):
                            if ''.join(passCheck) != ''.join(usrPassword):
                                passCheck[k] = chr(l)
                                print(passCheck)
The answer to this question is that you probably want to ask a different one, unfortunately.
Checking password strength is not as simple as calculating all possible combinations, Shannon entropy, etc.
What matters is what's called "guessing entropy", which is a devilishly complex mix of raw bruteforce entropy, known password lists and password leak lists, rules applied to those lists, Levenshtein distance from the elements in those lists, human-generated strings and phrases, keyboard walks, etc.
Further, password strength is deeply rooted in the question "How was this password generated?" ... which is very hard indeed to automatically detect and calculate after the fact. As humans, we can approximate this in many cases by reverse-engineering the psychology of how people select passwords ... but this is still an art, not a science.
For example, 'itsmypartyandillcryifiwantto' is a terrible password, even though its Shannon entropy is quite large. And 'WYuNLDcp0yhsZXvstXko' is a randomly-generated password ... but now that it's public, it's a bad one. And until you know how passwords like 'qscwdvefbrgn' or 'ji32k7au4a83' were generated, they look strong ... but they are definitely not.
So if you apply the strict answer to your question, you're likely to get a tool that dramatically overestimates the strength of many passwords. If your goal is to actually encourage your users to create passwords resistant to bruteforce, you should instead encourage them to use randomly generated passphrases, and ensure that your passwords are stored using a very slow hash (Argon2 family, scrypt, bcrypt, etc. - but be sure to bench these for UX and server performance before choosing one).
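As an illustration of the "very slow hash" point, here is a minimal sketch using scrypt from Python's standard library; the parameters and the (salt, key) storage layout are my own choices, not a drop-in authentication system.

import hashlib
import hmac
import os

def hash_password(password):
    # scrypt parameters are illustrative; tune n/r/p against your server's budget.
    salt = os.urandom(16)
    key = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
    return salt, key

def verify_password(password, salt, key):
    candidate = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
    return hmac.compare_digest(candidate, key)

salt, key = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, key))  # True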
References and further reading (disclaimer: some are my answers or have my comments):
https://security.stackexchange.com/a/4631/6203
https://security.stackexchange.com/questions/4630/how-can-we-accurately-measure-a-password-entropy-range
https://security.stackexchange.com/questions/127434/what-are-possible-methods-for-calculating-password-entropy
https://security.stackexchange.com/a/174555/6203
If you check candidates in alphabetical order, then you need only one loop to calculate how many checks it would take to reach a given password.
You have 96 chars (127-32+1):
password = 'ABC'
rate = 0
for char in password:
    rate = rate * 96 + (ord(char) - 31)
print(rate)

How to generate random numbers that are unique forever in python

I have written a script where I need a unique random number every time I run it. Just for explanation: suppose I run my script 5 times; I want the numbers generated across all of those runs to be unique.
I have found a lot of information about random number uniqueness, but it only covers uniqueness within a single run.
If you think this is not possible, is there any alternative way to achieve it?
You could use uuid to generate RFC 4122 UUIDs (Universally Unique IDentifiers). For example uuid4 generates a random UUID:
In [1]: import uuid
In [2]: u = uuid.uuid4()
In [3]: u
Out[3]: UUID('deb1064a-e885-4ebc-9afc-f5291120edf8')
To get the number, access the int attribute of uuid.UUID objects:
In [4]: u.int
Out[4]: 242844464987419882393579831900689854160
Unique and random are contradictory. For anything that's genuinely random there is a (small, maybe infinitesimal) chance of repetition.
If you want something less unwieldy (but less universally unique) than UUIDs you can roll your own combination of a random number (with a small chance of repetition) and a number derived from the time (for example the Unix epoch, which won't ever repeat for a single instance if the script is run less often than once per second).
If the random number is used as part of (say) a filename you can generate a name and then check whether the file already exists. If it does, then reject the random number as already used, and generate another one. Or if you really need to, you could store all random numbers already used somewhere. Load them before each run, add the new number and save after each run.
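A rough sketch of that approach (the name format and retry limit are my own choices, not anything prescribed above):

import os
import random
import time

def unique_name(directory, max_tries=100):
    # Combine the epoch second with a random suffix; retry if the name is taken.
    for _ in range(max_tries):
        candidate = "%d-%06d" % (int(time.time()), random.randrange(10**6))
        if not os.path.exists(os.path.join(directory, candidate)):
            return candidate
    raise RuntimeError("could not find an unused name")

print(unique_name("."))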
Finally there are pseudo-"random" generators of the form X(n+1) = (X(n)*a + b) mod M. These are hopeless for security / cryptography because given a few members of the sequence, you can discover the algorithm and predict all future numbers. However, if that predictability is unimportant, then with appropriate constants you can guarantee no repeats until all M members of the sequence have been generated. The numbers are not at all random, but they may appear random to a casual observer.
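For illustration, here is a minimal sketch of such a generator with constants chosen to satisfy the full-period (Hull-Dobell) conditions; these particular constants are the well-known Numerical Recipes ones, my choice rather than anything prescribed above.

class FullPeriodLCG:
    # X(n+1) = (A*X(n) + C) mod M with a full period of M = 2**32:
    # every value in [0, 2**32) appears exactly once before the sequence repeats.
    M = 2**32
    A = 1664525
    C = 1013904223

    def __init__(self, seed=0):
        self.state = seed % self.M

    def next(self):
        self.state = (self.A * self.state + self.C) % self.M
        return self.state

gen = FullPeriodLCG(seed=42)
print([gen.next() for _ in range(3)])  # deterministic, no repeats for 2**32 draws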

What is an efficient way to write password cracking algorithm (python)

This problem might be relatively simple, but I'm given two text files. One contains passwords encrypted via crypt.crypt in Python. The other contains over 400k normal dictionary words.
The assignment provides three different functions: one transforms a string into every permutation of capitalization, one transforms a letter into a look-alike number (e.g. G -> 6, B -> 8), and one reverses a string. Given the 10-20 encrypted passwords in the password file, what is the most efficient way, in Python, to run those functions over each dictionary word in the words file? It is given that every password in the file is the encryption of some transformed dictionary word.
Here is the function which checks whether a given string, when encrypted, matches the encrypted password passed in:
import crypt

def check_pass(plaintext, encrypted):
    crypted_pass = crypt.crypt(plaintext, encrypted)
    if crypted_pass == encrypted:
        return True
    else:
        return False
Thanks in advance.
Without knowing details about the underlying hash algorithm and its possible weaknesses, all you can do is run a brute-force attack, trying all possible transformations of the words in your word list.
The only way to speed up such a brute-force attack is to get more powerful hardware and to split the task and run the cracker in parallel.
On my slow laptop, crypt.crypt takes about 20 microseconds:
$ python -mtimeit -s'import crypt' 'crypt.crypt("foobar", "zappa")'
10000 loops, best of 3: 21.8 usec per loop
so, the brute force approach (really the only sensible one) is "kinda" feasible. By applying your transformation functions you'll get (ballpark estimate) about 100 transformed words per dictionary word (mostly from the capitalization changes), so, about 40 million transformed words out of your whole dictionary. At 20 microseconds each, that will take about 800 seconds, call it 15 minutes, for the effort of trying to crack one of the passwords that doesn't actually correspond to any of the variations; expected time about half that, to crack a password that does correspond.
So, if you have 10 passwords to crack, and they all do correspond to a transformed dictionary word, you should be done in an hour or two. Is that OK? Because there isn't much else you can do except distribute this embarrassingly parallel problem over as many nodes and cores as you can grasp (oh, and, use a faster machine in the first place -- that might buy you perhaps a factor of two or thereabouts).
There is no deep optimization trick that you can add, so the general logic will be that of a triple-nested loop: one level loops over the encrypted passwords, one over the words in the dictionary, one over the variants of each dictionary word. There isn't much difference regarding how you nest things (except the loop on the variants must come within the loop on the words, for simplicity). I recommend encapsulating "give me all variants of this word" as a generator (for simplicity, not for speed) and otherwise minimizing the number of function calls (e.g. there is no reason to use that check_pass function since the inline code is just as clear, and will be microscopically faster).
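A sketch of that loop structure, with a variants() generator standing in for the three assignment-provided transforms (the transforms shown here are simplified placeholders, not the real assignment functions):

import crypt       # Unix-only standard library module
import itertools

def variants(word):
    # Placeholder transforms: every capitalization, the reversed string,
    # and a look-alike digit substitution.
    for caps in itertools.product(*((c.lower(), c.upper()) for c in word)):
        candidate = ''.join(caps)
        yield candidate
        yield candidate[::-1]
        yield candidate.translate(str.maketrans('oObBgG', '008866'))

def crack(encrypted_passwords, dictionary_words):
    cracked = {}
    for word in dictionary_words:
        for candidate in variants(word):
            for enc in encrypted_passwords:
                if enc not in cracked and crypt.crypt(candidate, enc) == enc:
                    cracked[enc] = candidate
        if len(cracked) == len(encrypted_passwords):
            break  # stop early once every password is cracked
    return cracked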

Best seed for parallel process

I need to run Monte Carlo simulations in parallel on different machines. The code is in C++, but the program is set up and launched with a Python script that sets a lot of things, in particular the random seed. The function setseed takes a 4-byte unsigned integer.
Using a simple
import time
setseed(int(time.time()))
is not very good because I submit the jobs to a queue on a cluster; they remain pending for some minutes and then start, but the start time is unpredictable, and two jobs can start at the same time (same second), so I switched to:
setseed(int(time.time()*100))
but I'm not happy. What is the best solution? Maybe I can combine information from time, machine id and process id. Or maybe the best solution is to read from /dev/random (Linux machines)?
How to read 4 bytes from /dev/random?
f = open("/dev/random","rb")
f.read(4)
gives me a string, but I want an integer!
Reading from /dev/random is a good idea. Just convert the 4-byte string into an integer:
f = open("/dev/random","rb")
rnd_str = f.read(4)
Either using struct:
import struct
rand_int = struct.unpack('I', rnd_str)[0]
Update: the uppercase 'I' (unsigned int) is needed.
Or multiply and add:
rand_int = 0
for c in rnd_str:
    rand_int <<= 8
    rand_int += ord(c)  # Python 2; on Python 3 iterating bytes yields ints, so use rand_int += c
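On Python 3 the whole conversion can be done in one step with int.from_bytes, and os.urandom avoids opening the device file by hand (my addition, not part of the original answer):

import os

# Four random bytes as an unsigned 32-bit integer, suitable for a setseed() call.
rand_int = int.from_bytes(os.urandom(4), byteorder='big')
print(rand_int)  # 0 .. 2**32 - 1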
You could simply copy over the four bytes into an integer, that should be the least of your worries.
But parallel pseudo-random number generation is a rather complex topic and very often not done well. Usually you generate seeds on one machine and distribute them to the others.
Take a look at SPRNG, which handles exactly your problem.
If this is Linux or a similar OS, you want /dev/urandom -- it always produces data immediately.
/dev/random may stall waiting for the system to gather randomness. It does produce cryptographic-grade random numbers, but that is overkill for your problem.
You can use a random number as the seed, which has the advantage of being operating-system agnostic (no /dev/random needed), with no conversion from string to int:
Why not simply use
random.randrange(-2**31, 2**31)
as the seed of each process? Slightly different starting times give wildly different seeds this way…
You could also alternatively use the random.jumpahead method, if you know roughly how many random numbers each process is going to use (the documentation of random.WichmannHill.jumpahead is useful).
