How do I determine what PRNG python.random is using? - python

I'm looking for more information about random. Primarily what was the seed and how many steps have happened in that pseudo random sequence & Was that seed a datetime, i.e. did random use the system provided random or default to time?
I looked at dir(random) and didn't see anything super promising...

The documentation states:
Almost all module functions depend on the basic function random(),
which generates a random float uniformly in the semi-open range [0.0,
1.0). Python uses the Mersenne Twister as the core generator. It produces 53-bit precision floats and has a period of 2**19937-1. The
underlying implementation in C is both fast and threadsafe. The
Mersenne Twister is one of the most extensively tested random number
generators in existence. However, being completely deterministic, it
is not suitable for all purposes, and is completely unsuitable for
cryptographic purposes.
You can study the source code here, in particular the default implementation "seeds from current time or from an operating system specific randomness source if available"

Related

In what situations do we specifiy a static seed value when using the random module in Python? [duplicate]

From the docs:
random.seed(a=None, version=2) Initialize the random number generator.
If a is omitted or None, the current system time is used. If
randomness sources are provided by the operating system, they are used
instead of the system time (see the os.urandom() function for details
on availability).
But...if it's truly random...(and I thought I read it uses Mersenne, so it's VERY random)...what's the point in seeding it? Either way the outcome is unpredictable...right?
The default is probably best if you want different random numbers with each run. If for some reason you need repeatable random numbers, in testing for instance, use a seed.
The module actually seeds the generator (with OS-provided random data from urandom if possible, otherwise with the current date and time) when you import the module, so there's no need to manually call seed().
(This is mentioned in the Python 2.7 documentation but, for some reason, not the 3.x documentation. However, I confirmed in the 3.x source that it's still done.)
If the automatic seeding weren't done, you'd get the same sequence of numbers every time you started your program, same as if you manually use the same seed every time.
But...if it's truly random
No, it's pseudo random. If it uses Mersenne Twister, that too is a PRNG.
It's basically an algorithm that generates the exact same sequence of pseudo random numbers out of a given seed. Generating truly random numbers requires special hardware support, it's not something you can do by a pure algorithm.
You might not need to seed it since it seeds itself on first use, unless you have some other or better means of providing a seed than what is time based.
If you use the random numbers for things that are not security related, a time based seed is normally fine. If you use if for security/cryptography, note what the docs say: "and is completely unsuitable for cryptographic purposes"
If you want to reproduce your results, you seed the generator with a known value so you get the same sequence every time.
A Mersenne twister, the random number generator, used by Python is seeded by the operating system served random numbers by default on those platforms where it is possible (Unixen, Windows); however on other platforms the seed defaults to the system time which means very repeatable values if the system time has a bad precision. On such systems seeding with known better random values is thus beneficial. Note that on Python 3 specifically, if version 2 is used, you can pass in any str, bytes, or bytearray to seed the generator; thus taking use of the Mersenne twister's large state better.
Another reason to use a seed value is indeed to guarantee that you get the same sequence of random numbers again and again - by reusing the known seed. Quoting the docs:
Sometimes it is useful to be able to reproduce the sequences given by
a pseudo random number generator. By re-using a seed value, the same
sequence should be reproducible from run to run as long as multiple
threads are not running.
Most of the random module’s algorithms and seeding functions are
subject to change across Python versions, but two aspects are
guaranteed not to change:
If a new seeding method is added, then a backward compatible seeder will be offered.
The generator’s random() method will continue to produce the same sequence when the compatible seeder is given the same seed.
For this however, you mostly want to use the random.Random instances instead of using module global methods (the multiple threads issue, etc).
Finally also note that the random numbers produced by Mersenne twister are unsuitable for cryptographical use; whereas they appear very random, it is possible to fully recover the internal state of the random generator by observing only some hundreds of values from the generator. For cryptographical algorithms, you want to use the SystemRandom class.
In most cases I would say there is no need to care about. But if someone is really willing to do something wired and (s)he could roughly figure out your system time when your code was running, they might be able to brute force replay your random numbers and see which series fits. But I would say this is quite unlikely in most cases.

If you run python random function in a loop a hundred/thousands can numbers repeat themselves? I'm aware of how seeding works

for I in range(100):
print(random()) # everything being imported and using the system time as seed.
# is it possible to get repeating random values?
According to the official documentation:
Almost all module functions depend on the basic function random(), which generates a random float uniformly in the semi-open range [0.0, 1.0). Python uses the Mersenne Twister as the core generator. It produces 53-bit precision floats and has a period of 2**19937-1. The underlying implementation in C is both fast and threadsafe. The Mersenne Twister is one of the most extensively tested random number generators in existence. However, being completely deterministic, it is not suitable for all purposes, and is completely unsuitable for cryptographic purposes.
It means there is a possibility to getting same values in each iteration. But it is so small possibility.
If you don't require any specific value range, you can add i variable to each iteration to ensure that the values ​​are not repeated. but it doesn't make much sense if you use random().
Otherwise each iteration can be repeated without problems, that is the meaning of random.
import random
for i in range(100):
print(random.random() + i)

Best practices for seeding random and numpy.random in the same program

In order to make random simulations we run reproducible later, my colleagues and I often explicitly seed the random or numpy.random modules' random number generators using the random.seed and np.random.seed methods. Seeding with an arbitrary constant like 42 is fine if we're just using one of those modules in a program, but sometimes, we use both random and np.random in the same program. I'm unsure whether there are any best practices I should be following about how to seed the two RNGs together.
In particular, I'm worried that there's some sort of trap we could fall into where the two RNGs together behave in a "non-random" way, such as both generating the exact same sequence of random numbers, or one sequence trailing the other by a few values (e.g. the kth number from random is always the k+20th number from np.random), or the two sequences being related to each other in some other mathematical way. (I realise that pseudo-random number generators are all imperfect simulations of true randomness, but I want to avoid exacerbating this with poor seed choices.)
With this objective in mind, are there any particular ways we should or shouldn't seed the two RNGs? I've used, or seen colleagues use, a few different tactics, like:
Using the same arbitrary seed:
random.seed(42)
np.random.seed(42)
Using two different arbitrary seeds:
random.seed(271828)
np.random.seed(314159)
Using a random number from one RNG to seed the other:
random.seed(42)
np.random.seed(random.randint(0, 2**32))
... and I've never noticed any strange outcomes from any of these approaches... but maybe I've just missed them. Are there any officially blessed approaches to this? And are there any possible traps that I can spot and raise the alarm about in code review?
I will discuss some guidelines on how multiple pseudorandom number generators (PRNGs) should be seeded. I assume you're not using random-behaving numbers for information security purposes (if you are, only a cryptographic generator is appropriate and this advice doesn't apply).
To reduce the risk of correlated pseudorandom numbers, you can use PRNG algorithms, such as SFC and other so-called "counter-based" PRNGs (Salmon et al., "Parallel Random Numbers: As Easy as 1, 2, 3", 2011), that support independent "streams" of pseudorandom numbers. There are other strategies as well, and I explain more about this in "Seeding Multiple Processes".
If you can use NumPy 1.17, note that that version introduced a new PRNG system and added SFC (SFC64) to its repertoire of PRNGs. For NumPy-specific advice on parallel pseudorandom generation, see "Parallel Random Number Generation" in the NumPy documentation.
You should avoid seeding PRNGs (especially several at once) with timestamps.
You mentioned this question in a comment, when I started writing this answer. The advice there is not to seed multiple instances of the same kind of PRNG. This advice, however, doesn't apply as much if the seeds are chosen to be unrelated to each other, or if a PRNG with a very big state (such as Mersenne Twister) or a PRNG that gives each seed its own nonoverlapping pseudorandom number sequence (such as SFC) is used. The accepted answer there (at the time of this writing) demonstrates what happens when multiple instances of .NET's System.Random, with sequential seeds, are used, but not necessarily what happens with PRNGs of a different design, PRNGs of multiple designs, or PRNGs initialized with unrelated seeds. Moreover, .NET's System.Random is a poor choice for a PRNG precisely because it allows only seeds no more than 32 bits long (so the number of pseudorandom sequences it can produce is limited), and also because it has implementation bugs (if I understand correctly) that have been preserved for backward compatibility.

Should I seed the random number generator?

From the docs:
random.seed(a=None, version=2) Initialize the random number generator.
If a is omitted or None, the current system time is used. If
randomness sources are provided by the operating system, they are used
instead of the system time (see the os.urandom() function for details
on availability).
But...if it's truly random...(and I thought I read it uses Mersenne, so it's VERY random)...what's the point in seeding it? Either way the outcome is unpredictable...right?
The default is probably best if you want different random numbers with each run. If for some reason you need repeatable random numbers, in testing for instance, use a seed.
The module actually seeds the generator (with OS-provided random data from urandom if possible, otherwise with the current date and time) when you import the module, so there's no need to manually call seed().
(This is mentioned in the Python 2.7 documentation but, for some reason, not the 3.x documentation. However, I confirmed in the 3.x source that it's still done.)
If the automatic seeding weren't done, you'd get the same sequence of numbers every time you started your program, same as if you manually use the same seed every time.
But...if it's truly random
No, it's pseudo random. If it uses Mersenne Twister, that too is a PRNG.
It's basically an algorithm that generates the exact same sequence of pseudo random numbers out of a given seed. Generating truly random numbers requires special hardware support, it's not something you can do by a pure algorithm.
You might not need to seed it since it seeds itself on first use, unless you have some other or better means of providing a seed than what is time based.
If you use the random numbers for things that are not security related, a time based seed is normally fine. If you use if for security/cryptography, note what the docs say: "and is completely unsuitable for cryptographic purposes"
If you want to reproduce your results, you seed the generator with a known value so you get the same sequence every time.
A Mersenne twister, the random number generator, used by Python is seeded by the operating system served random numbers by default on those platforms where it is possible (Unixen, Windows); however on other platforms the seed defaults to the system time which means very repeatable values if the system time has a bad precision. On such systems seeding with known better random values is thus beneficial. Note that on Python 3 specifically, if version 2 is used, you can pass in any str, bytes, or bytearray to seed the generator; thus taking use of the Mersenne twister's large state better.
Another reason to use a seed value is indeed to guarantee that you get the same sequence of random numbers again and again - by reusing the known seed. Quoting the docs:
Sometimes it is useful to be able to reproduce the sequences given by
a pseudo random number generator. By re-using a seed value, the same
sequence should be reproducible from run to run as long as multiple
threads are not running.
Most of the random module’s algorithms and seeding functions are
subject to change across Python versions, but two aspects are
guaranteed not to change:
If a new seeding method is added, then a backward compatible seeder will be offered.
The generator’s random() method will continue to produce the same sequence when the compatible seeder is given the same seed.
For this however, you mostly want to use the random.Random instances instead of using module global methods (the multiple threads issue, etc).
Finally also note that the random numbers produced by Mersenne twister are unsuitable for cryptographical use; whereas they appear very random, it is possible to fully recover the internal state of the random generator by observing only some hundreds of values from the generator. For cryptographical algorithms, you want to use the SystemRandom class.
In most cases I would say there is no need to care about. But if someone is really willing to do something wired and (s)he could roughly figure out your system time when your code was running, they might be able to brute force replay your random numbers and see which series fits. But I would say this is quite unlikely in most cases.

Reproducibility of python pseudo-random numbers across systems and versions?

I need to generate a controlled sequence of pseudo-random numbers, given an initial parameter. For that I'm using the standard python random generator, seeded by this parameter. I'd like to make sure that I will generate the same sequence across systems (Operating system, but also Python version).
In summary: Does python ensure the reproducibility / portability of it's pseudo-random number generator across implementation and versions?
No, it doesn't. There's no such promise in the random module's documentation.
What the docs do contain is this remark:
Changed in version 2.3: MersenneTwister replaced Wichmann-Hill as the default generator
So a different RNG was used prior to Python 2.3.
So far, I've been using numpy.random.RandomState for reproducible pseudo-randomness, though it too does not make the formal promise you're after.
If you want full reproducibility, you might want to include a copy of random's source in your program, or hack together a "P²RNG" (pseudo-pseudo-RNG) from hashlib.
Not necessarily.
As described in the documentation, the random module has used the Mersenne twister to generate random numbers since version 2.3, but used Wichmann-Hill before that.
(If a seed is not provided, the method of obtaining the seed also does depend on the operating system, the Python version, and factors such as the system time).
#reubano - 3.2 changed the integer functions in random, to produce more evenly distributed (which inevitably means different) output.
That change was discussed in Issue9025, where the team discuss whether they have an obligation to stick to the previous output, even when it was defective. They conclude that they do not. The docs for the module guarantee consistency for random.random() - one might assume that the functions which call it (like random.randrange()) are implicitly covered under that guarantee, but that doesn't seem to be the case.
Just as a heads up: in addition to the 2.3 change, python 3 gives numbers from python 2.x from randrange and probably other functions, even if the numbers from random.random are similar.
I just found out that there is also a difference between python3.7 and python3.8.
The following code behaves the same
from random import Random
seed = 317
rand = Random(seed)
rand.getrandbits(64)
but if you use from _random import Random instead, it behaves differently.

Categories

Resources