How to generate random numbers that are unique forever in Python

I have written a script where I need a unique random number every time I run it. Just for explanation: suppose I run my script 5 times; I want the number generated on each run to be unique.
I have found a lot of information about random number uniqueness, but it all concerns uniqueness within a single run.
If you think it is not possible, is there an alternative approach?

You could use uuid to generate RFC 4122 UUIDs (Universally Unique IDentifiers). For example uuid4 generates a random UUID:
In [1]: import uuid
In [2]: u = uuid.uuid4()
In [3]: u
Out[3]: UUID('deb1064a-e885-4ebc-9afc-f5291120edf8')
To get the number, access the int attribute of uuid.UUID objects:
In [4]: u.int
Out[4]: 242844464987419882393579831900689854160

Unique and random are contradictory. For anything that is genuinely random there is a (small, maybe infinitesimal) chance of repetition.
If you want something less unwieldy (but less universally unique) than UUIDs you can roll your own combination of a random number (with a small chance of repetition) and a number derived from the time (for example seconds since the Unix epoch, which will never repeat for a single instance if the script is run no more than once per second).
If the random number is used as part of (say) a filename you can generate a name and then check whether the file already exists. If it does, then reject the random number as already used, and generate another one. Or if you really need to, you could store all random numbers already used somewhere. Load them before each run, add the new number and save after each run.
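A minimal sketch of the filename idea (the `run-{n}.txt` naming pattern and the retry limit are arbitrary choices for illustration):

```python
import os
import random
import tempfile

def unique_random_name(directory, tries=1000):
    """Pick a random number whose filename is not already used in `directory`."""
    for _ in range(tries):
        n = random.randrange(10**9)            # candidate random number
        path = os.path.join(directory, f"run-{n}.txt")
        if not os.path.exists(path):           # already used -> try another
            return n, path
    raise RuntimeError("could not find an unused number")

# Example of use: claim the name immediately so later runs see it as taken.
d = tempfile.mkdtemp()
n, path = unique_random_name(d)
open(path, "w").close()
```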
Finally there are pseudo-"random" generators of the form X(n+1) = (X(n)*a + b) mod M. These are hopeless for security / cryptography because given a few members of the sequence, you can discover the algorithm and predict all future numbers. However, if that predictability is unimportant, then with appropriate constants you can guarantee no repeats until all M members of the sequence have been generated. The numbers are not at all random, but they may appear random to a casual observer.
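As a sketch of such a generator, here is a tiny full-period LCG; the constants a=5, b=1, M=2**16 are example values chosen to satisfy the Hull-Dobell conditions (b coprime to M; a-1 divisible by every prime factor of M, and by 4 since M is), which guarantee no repeats until all M values have appeared:

```python
def lcg(seed, a=5, b=1, m=2**16):
    """X(n+1) = (X(n)*a + b) mod m.

    With these constants the sequence visits every value in [0, m)
    exactly once before repeating (Hull-Dobell theorem). Not random
    in any cryptographic sense -- just guaranteed repeat-free.
    """
    x = seed % m
    while True:
        x = (x * a + b) % m
        yield x

gen = lcg(seed=42)
first = [next(gen) for _ in range(5)]
```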

Related

Hashing Algorithm to use for short unique content ID

I was wondering what would be the best hashing algorithm to use to create short + unique IDs for a list of content items. Each content item is an ASCII file on the order of 100-500 kB.
The requirements I have are:
Must be as short as possible; I have very limited space to store the IDs and would like to keep them to, say, < 10 characters each (when represented as ASCII)
Must be unique, i.e. no collisions or at least a negligible chance of collisions
I don't need it to be cryptographically secure
I don't need it to be overly fast (each content item is pretty small)
I am trying to implement this in Python, so preferably an algorithm that has a Python implementation.
In lieu of any other recommendation, I've currently decided on the following approach. I use the BLAKE2 hashing algorithm to create a cryptographically strong hash of the file contents, to minimise the chance of collisions. I then base64-encode it to map it to an ASCII character set, of which I keep just the first 8 characters.
Under the assumption that these characters are perfectly randomised, that gives 64^8 combinations the hash can take. I predict the upper limit on the number of content items I'll ever have is 50k, which gives a probability of at least one collision of about 0.00044%, which I think is acceptably low for my use case (I can always go up to 9 or 10 characters if needed in the future).
import hashlib
import base64

def get_hash(byte_content, size=8):
    # digest_size of size * 3 bytes yields 4 * size base64 characters,
    # comfortably more than the `size` characters kept after truncation
    hash_bytes = hashlib.blake2b(byte_content, digest_size=size * 3).digest()
    hash64 = base64.b64encode(hash_bytes).decode("utf-8")[:size]
    return hash64
# Example of use
get_hash(b"some random binary object")
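The 0.00044% figure can be checked with the standard birthday approximation, p ≈ 1 - exp(-n²/(2N)):

```python
import math

n = 50_000          # expected number of content items
N = 64 ** 8         # possible 8-character base64 prefixes
p = 1 - math.exp(-n * n / (2 * N))   # birthday approximation
print(f"{p * 100:.5f}%")             # about 0.00044%
```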

Using hashlib.sha256 to create a unique id; is this guaranteed to be unique?

I am trying to create a unique record id using the following function:
import hashlib
import uuid   # used below to generate the pepper
from base64 import b64encode

def make_uid(salt, pepper, key):
    s = b64encode(salt)
    p = b64encode(pepper)
    k = b64encode(key)
    return hashlib.sha256(s + p + k).hexdigest()
Where pepper is set like this:
uuid_pepper = uuid.uuid4()
pepper = str(uuid_pepper).encode('ascii')
And salt and key are the same values for every request.
My question is: because of the unique nature of the pepper, will make_uid in this instance always return a unique value, or is there a chance that it can create a duplicate?
The suggested answer is different because I'm not asking about the uniqueness of various uuid types, I'm wondering whether it's at all possible for a sha256 hash to create a collision between two distinct inputs.
I think what you want to know is whether SHA256 is guaranteed to generate a unique hash result. The answer is yes and no. The following is from my research; it is not 100% precise, but it is close.
In theory, SHA256 will collide. It has 2^256 possible outputs, so if we hash 2^256 + 1 distinct inputs, there must be a collision (pigeonhole principle). Even worse, by the birthday bound, the probability of a collision within about 2^130 hashes is already 99%.
But you probably won't generate one in your lifetime. Assume a computer that can calculate 10,000 hashes per second: it would take about 4 × 10^27 years to finish 2^130 hashes. To get a sense of scale, that is about 2 × 10^22 times longer than humans have existed on Earth. Even if we had been hashing since the first day humans appeared, the probability of a collision by now would still be very, very small.
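The arithmetic can be checked directly:

```python
hashes_per_second = 10_000
seconds_per_year = 3600 * 24 * 365.25
years = 2**130 / hashes_per_second / seconds_per_year
print(f"{years:.1e}")   # about 4.3e+27 years
```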
Hope that answers your question.

Will changing seed in rnd.randrange prevent identical results? Python 3.3

I am writing a script where I have a list of 7 integers ranging from 66 to 95. I want this list to be pseudo-randomly different each time I call it, but I want to be sure that it never returns the same arrangement of numbers twice.
This is nestled earlier in the file:
rnd = random.Random()
If I set x to 1 in the seed and call the function like so:
rnd.seed(x)
results = [rnd.randrange(66, 96) for i in range(7)]
and then if I call the function over and over, incrementally adding 1 to x in:
rnd.seed(x)
Will I be guaranteed a different arrangement of numbers for each seed, until the possible arrangements run out and the cycle repeats? When I write this down I feel like the answer is no. And if the answer is no, is there a way to do this? The program will be opened and closed often, and if this works I would start it on the seed it last closed on.
There's absolutely no way to guarantee that a freshly generated random list of 7 integers is unique, short of checking it against a record of all the lists you've already generated. Random is random: each of the 30^7 = 21,870,000,000 possible arrangements (30 possible values for each of 7 positions) can come up again at any time, so without some sort of verification of uniqueness, repeats will always be possible.
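A minimal sketch of that check-against-history approach (persisting `seen` between runs, e.g. with pickle, is left out):

```python
import random

rnd = random.Random()
seen = set()          # arrangements generated so far

def unique_arrangement():
    """Generate a 7-number arrangement not produced before in this session."""
    while True:
        results = tuple(rnd.randrange(66, 96) for _ in range(7))
        if results not in seen:     # reject repeats and draw again
            seen.add(results)
            return list(results)

a = unique_arrangement()
b = unique_arrangement()
```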

Django - make_random_password method, is it truly random?

I am using the following method to create a random code for users as part of a booking process:
User.objects.make_random_password()
When the users turn up at the venue, they will present the password.
Is it safe to assume that two people won't end up with the same code?
Thanks
No, it's not safe to assume that two people can't have the same code. Random doesn't mean unique. It may be unlikely and rare, depending on the length you specify and number of users you are dealing with. But you can't rely on its uniqueness.
It depends on how many users you have, the password length you choose, and how you use User.objects.make_random_password(). For the defaults, the chance is essentially zero, IMO.
This method is implemented using get_random_string(). From the django github repo:
def get_random_string(length=12,
                      allowed_chars='abcdefghijklmnopqrstuvwxyz'
                                    'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'):
    """
    Returns a securely generated random string.
    The default length of 12 with the a-z, A-Z, 0-9 character set returns
    a 71-bit value. log_2((26+26+10)^12) =~ 71 bits
    """
    if not using_sysrandom:
        # This is ugly, and a hack, but it makes things better than
        # the alternative of predictability. This re-seeds the PRNG
        # using a value that is hard for an attacker to predict, every
        # time a random string is required. This may change the
        # properties of the chosen random sequence slightly, but this
        # is better than absolute predictability.
        random.seed(
            hashlib.sha256(
                "%s%s%s" % (
                    random.getstate(),
                    time.time(),
                    settings.SECRET_KEY)
            ).digest())
    return ''.join([random.choice(allowed_chars) for i in range(length)])
According to GitHub, the current code uses a 12-character password from a string of 62 characters (lower- and uppercase letters and numbers) by default. This makes for 62**12, or 3226266762397899821056 (3.22e21), possible different passwords. This is much larger than the current world population (around 7e9).
The letters are picked from this list of characters by the random.choice() function. The question now becomes how likely it is that the repeated calling of random.choice() returns the same sequence twice?
As you can see from the implementation of get_random_string(), the code tries hard to avoid predictability. When not using the OS's pseudo-random value generator (which on Linux and *BSD gathers real randomness from e.g. the times at which ethernet packets or keypresses arrive), it re-seeds the random module's Mersenne Twister predictable PRNG at each call with a combination of the current random state, the current time and (presumably constant) secret key.
So for two identical passwords to be generated, both the state of the random generator (which is about 8 kiB in python) and the time at which they are generated (measured in seconds since the epoch, as per time.time()) have to be identical. If the system's time is well-maintained and you are running one instance of the password generation program, the chance of that happening is essentially zero. If you start two or more instances of this password generating program at exactly the same time and with exactly the same seed for the PRNG, and combine their output, one would expect some passwords to appear more than once.
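To put a number on "essentially zero": assuming the passwords were drawn uniformly at random, the birthday approximation for a hypothetical one million issued passwords gives:

```python
import math

n = 1_000_000       # hypothetical number of issued passwords
N = 62 ** 12        # possible 12-character passwords
p = 1 - math.exp(-n * n / (2 * N))   # birthday approximation
print(f"{p:.1e}")   # about 1.5e-10
```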

Maximal Length of List to Shuffle with Python random.shuffle?

I have a list which I shuffle with the Python built in shuffle function (random.shuffle)
However, the Python reference states:
Note that for even rather small len(x), the total number of permutations of x is larger than the period of most random number generators; this implies that most permutations of a long sequence can never be generated.
Now, I wonder what this "rather small len(x)" means. 100, 1000, 10000,...
TL;DR: It "breaks" on lists with over 2080 elements, but don't worry too much :)
Complete answer:
First of all, notice that "shuffling" a list can be understood (conceptually) as generating all possible permutations of the elements of the lists, and picking one of these permutations at random.
Then, you must remember that all self-contained computerised random number generators are actually "pseudo"-random. That is, they are not truly random, but rely on a series of factors to try to generate a number that is hard to guess in advance or purposefully reproduce. Among these factors is usually the previously generated number. So, in practice, if you use a random generator continuously, you'll eventually start getting the same sequence all over again (this is the "period" that the documentation refers to).
Finally, the docstring on Lib/random.py (the random module) says that "The period [of the random number generator] is 2**19937-1."
So, given all that, if your list is such that there are 2**19937 or more permutations, some of these will never be obtained by shuffling the list. You'd (again, conceptually) generate all permutations of the list, then generate a random number x, and pick the xth permutation. Next time, you generate another random number y, and pick the yth permutation. And so on. But, since there are more permutations than you'll get random numbers (because, at most after 2**19937-1 generated numbers, you'll start getting the same ones again), you'll start picking the same permutations again.
So, you see, it's not exactly a matter of how long your list is (though that does enter into the equation). Also, 2**19937-1 is quite a long number. But, still, depending on your shuffling needs, you should bear all that in mind. On a simplistic case (and with a quick calculation), for a list without repeated elements, 2081 elements would yield 2081! permutations, which is more than 2**19937.
I wrote that comment in the Python source originally, so maybe I can clarify ;-)
When the comment was introduced, Python's Wichmann-Hill generator had a much shorter period, and we couldn't even generate all the permutations of a deck of cards.
The period is astronomically larger now, and 2080 is correct for the current upper bound. The docs could be beefed up to say more about that - but they'd get awfully tedious.
There's a very simple explanation: A PRNG of period P has P possible starting states. The starting state wholly determines the permutation produced. Therefore a PRNG of period P cannot generate more than P distinct permutations (and that's an absolute upper bound - it may not be achieved). That's why comparing N! to P is the correct computation here. And, indeed:
>>> import math
>>> math.factorial(2080) > 2**19937 - 1
False
>>> math.factorial(2081) > 2**19937 - 1
True
What they mean is that the number of permutations of n objects (written n!) grows absurdly high very fast.
Basically n! = n × (n-1) × ... × 1; for example, 5! = 5 × 4 × 3 × 2 × 1 = 120, which means there are 120 possible ways of shuffling a 5-item list.
The same Python documentation page gives 2^19937-1 as the period, which is roughly 4.3 × 10^6001. Based on the Wikipedia page on factorials, I'd guess the crossover point is somewhere around 2000!. (Sorry, I didn't find the exact figure.)
So basically there are so many possible permutations the shuffle will take from that there's probably no real reason to worry about those it won't.
But if it really is an issue (pesky customer asking for a guarantee of randomness perhaps?), you could also offload the task to some third-party; see http://www.random.org/ for example.
