Storing a Random state

I'm designing a program which:
Includes randomness
Can stop executing and save its state at certain points (in XML)
Can start executing starting from a saved state
Is deterministic (so the program can run from the same state twice and produces the same result)
The problem here is saving the randomness. I can initialize it at start, but from state to state I may generate anywhere from 0 to 1000 random numbers.
Therefore, I see three options:
Store the seed and the number of random numbers generated so far; then, when loading the state, run the random number generator that many times.
On state save, increment the seed by N
On state save, randomly generate the next seed
The problem with option 1 is the run time, which makes it pretty infeasible.
However, I'm unsure whether 2 or 3 will produce good random results. If I run two random generators, one seeded with X, the other seeded with X+1, how different will their results be? What if the first is seeded with X, and the second is seeded with X.random()?
In case it makes a difference, I'm using Python 3.

You can save the state of the PRNG using random.getstate() (then, e.g., use pickle to save it to disk). Later, random.setstate(state) will return your PRNG to exactly the state it was in.
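A minimal sketch of that approach, using a dedicated random.Random instance rather than the module-level functions. Because the question saves its checkpoints as XML, the sketch also round-trips the state through plain text with repr() and ast.literal_eval(); the helper names state_to_text and text_to_state are hypothetical, and pickle works just as well if bytes on disk are enough.

```python
import ast
import pickle
import random


def state_to_text(state):
    # getstate() returns a tuple of ints plus None or a float,
    # so repr() yields a literal that can be stored in an XML text node.
    return repr(state)


def text_to_state(text):
    # literal_eval rebuilds the nested tuple without executing arbitrary code.
    return ast.literal_eval(text)


rng = random.Random(12345)
for _ in range(500):                               # arbitrary work before the checkpoint
    rng.random()

checkpoint_text = state_to_text(rng.getstate())    # e.g. written into the XML file
checkpoint_bytes = pickle.dumps(rng.getstate())    # or pickled to disk

expected = [rng.random() for _ in range(5)]        # what the original run does next

resumed_from_text = random.Random()
resumed_from_text.setstate(text_to_state(checkpoint_text))
from_text = [resumed_from_text.random() for _ in range(5)]

resumed_from_pickle = random.Random()
resumed_from_pickle.setstate(pickle.loads(checkpoint_bytes))
from_pickle = [resumed_from_pickle.random() for _ in range(5)]

print(from_text == expected, from_pickle == expected)  # True True
```

Restoring the state costs the same no matter how many numbers were drawn before the checkpoint, so it avoids the replay cost of option 1 and sidesteps the seed-arithmetic questions of options 2 and 3.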

Is there any way to calculate the next seed knowing the previous seed?

I'm trying to develop a program in Python to predict the outcome of a Pseudo Random Number Generator.
I already have a program that gets the seed of the previously generated number using seed = random.getstate(). My question is whether there is any way to calculate the next seed that will be used, so I can predict the next number.

The reason that pseudorandom number generators are so named is that they're deterministic; they generate a sequence of numbers that appear to be random, but which aren't really. If you start a PRNG with the same seed, you'll get the same sequence every time.
I already have a program that gets the seed of the previously generated number using seed = random.getstate()
You're not really getting a seed here, but rather the internal state of the PRNG. You could save that state and set it again later. That could be useful for testing, or just to continue with the same sequence.
Now, my question is whether there is any way to calculate the next seed that will be used, so I can predict the number.
Again, not really a seed, which is the initial value that you supply to start a PRNG sequence. What you're getting is the internal state of the PRNG. But yes, if you have that state, then it's trivial to predict the next number: just call random.setstate(...) with the state that you got, generate the next number, and then call random.setstate(...) again to put the PRNG back in the same state so that it'll again generate that same number.
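The peek-and-rewind trick described above looks like this with the module-level functions; the seed 2024 and the ten warm-up draws are arbitrary choices for the sketch.

```python
import random

random.seed(2024)
for _ in range(10):
    random.random()            # advance the generator to some arbitrary point

state = random.getstate()      # the internal state, not a seed

predicted = random.random()    # peek at the next value...
random.setstate(state)         # ...then rewind to the saved state

actual = random.random()       # the "real" next draw
print(predicted == actual)     # True
```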

Python opencv cv2.KMEANS_RANDOM_CENTERS

Based on the cv2.kmeans function, I have written a function "F(Image)" with "label" as output.
ret,label,center=cv2.kmeans(Image,K,None,criteria,10,cv2.KMEANS_RANDOM_CENTERS)
The output of F(Image), "label", is later used for other image processing.
However, I need to run F(Image) for numerous images. I noticed that the labels are different if I run, say, F(Image1) and F(Image2) consecutively versus F(Image1) and F(Image2) separately.
My suspicion is that every time cv2.KMEANS_RANDOM_CENTERS is run, it starts from a different random number.
Without going into the source code of cv2.KMEANS_RANDOM_CENTERS, is there any way to ensure that the labels are the same every time I run the code? Or to run F(Image1) and F(Image2) consecutively but get the same labels as when they are run separately?

cv2.kmeans() takes two types of flags: cv2.KMEANS_PP_CENTERS and cv2.KMEANS_RANDOM_CENTERS.
cv2.KMEANS_RANDOM_CENTERS:
With this flag enabled, the method always starts with a random set of initial samples and tries to converge from there, depending on your TermCriteria.
Pros:
Saves computation time.
Cons:
Doesn't guarantee same labels for the exact same image.
cv2.KMEANS_PP_CENTERS:
With this flag enabled, the method first iterates the whole image to determine the probable centers and then starts to converge.
Pros:
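Tends to yield better and more consistent results for the same input image (although k-means++ seeding itself draws random numbers, so identical labels are still not strictly guaranteed).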
Cons:
Takes extra computation time for iterating all the pixels and determining probable samples.
Note: I have also read about another flag, cv::KMEANS_USE_INITIAL_LABELS, with which you can pass your own initial labels for the method to converge from. However, the linked documentation does not mention that flag, so I'm not sure whether it has been deprecated or the documentation is simply out of date.

The method keeps only the best labels across its attempts (the argument after criteria, set to 10 in the question). So if the number of attempts is high enough, let's say cv2.kmeans(Image,K,None,criteria,100,cv2.KMEANS_RANDOM_CENTERS), the output should be similar from run to run.
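The behaviour in the question (labels for Image2 depending on whether F(Image1) ran first) is what you get whenever a function draws from a shared, global RNG. The sketch below reproduces that pattern with Python's random module as a stand-in; pick_initial_centers is hypothetical and only mimics choosing random starting centers. In OpenCV, the analogous fix would presumably be calling cv2.setRNGSeed(...) with a fixed value right before each cv2.kmeans call, which reseeds OpenCV's default RNG; treat that as an assumption to verify against your OpenCV version.

```python
import random

# Stand-in for a library-wide RNG that every call shares.
_shared_rng = random.Random()


def pick_initial_centers(data, k, seed=None):
    """Hypothetical k-means initialisation: choose k random starting centers."""
    if seed is not None:
        # Reset the shared RNG so this call no longer depends on earlier calls.
        _shared_rng.seed(seed)
    return _shared_rng.sample(data, k)


image1 = list(range(100))
image2 = list(range(100, 200))

# Without a seed, image2's centers depend on how many numbers image1 consumed.
# With a fixed seed per call, call order no longer matters:
pick_initial_centers(image1, 3, seed=42)
after_image1 = pick_initial_centers(image2, 3, seed=42)  # consecutive run
alone = pick_initial_centers(image2, 3, seed=42)         # "separate" run
print(after_image1 == alone)  # True
```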

Should I seed the random number generator?

From the docs:
random.seed(a=None, version=2) Initialize the random number generator. If a is omitted or None, the current system time is used. If randomness sources are provided by the operating system, they are used instead of the system time (see the os.urandom() function for details on availability).
But...if it's truly random...(and I thought I read it uses Mersenne, so it's VERY random)...what's the point in seeding it? Either way the outcome is unpredictable...right?

The default is probably best if you want different random numbers with each run. If for some reason you need repeatable random numbers, in testing for instance, use a seed.
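For the testing case, a minimal sketch: simulate is a hypothetical function under test, and 42 is an arbitrary seed. Passing an explicit random.Random into the code under test keeps the seed local to the test.

```python
import random


def simulate(rng, n=5):
    # Hypothetical code under test that draws from an explicit generator.
    return [rng.randint(1, 6) for _ in range(n)]


# Same seed -> same sequence, which is what makes the test repeatable.
first = simulate(random.Random(42))
second = simulate(random.Random(42))
print(first == second)  # True
```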

The module actually seeds the generator (with OS-provided random data from urandom if possible, otherwise with the current date and time) when you import the module, so there's no need to manually call seed().
(This is mentioned in the Python 2.7 documentation but, for some reason, not the 3.x documentation. However, I confirmed in the 3.x source that it's still done.)
If the automatic seeding weren't done, you'd get the same sequence of numbers every time you started your program, same as if you manually use the same seed every time.

But...if it's truly random
No, it's pseudo random. If it uses Mersenne Twister, that too is a PRNG.
It's basically an algorithm that generates the exact same sequence of pseudo random numbers from a given seed. Generating truly random numbers requires special hardware support; it's not something you can do with a pure algorithm.
You might not need to seed it since it seeds itself on first use, unless you have some other or better means of providing a seed than what is time based.
If you use the random numbers for things that are not security related, a time-based seed is normally fine. If you use it for security/cryptography, note what the docs say: "and is completely unsuitable for cryptographic purposes"

If you want to reproduce your results, you seed the generator with a known value so you get the same sequence every time.

The Mersenne Twister random number generator used by Python is seeded by default with random numbers served by the operating system on platforms where that is possible (Unixen, Windows); on other platforms, however, the seed defaults to the system time, which means very repeatable values if the system time has poor precision. On such systems, seeding with known better random values is beneficial. Note that on Python 3 specifically, if version 2 is used, you can pass any str, bytes, or bytearray to seed the generator, thus making better use of the Mersenne Twister's large state.
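A short sketch of both points: a str seed with the default version=2 is reproducible, and on a platform with a weak time-based default you can pass high-entropy bytes yourself. The seed text "experiment-7" and the 32-byte length are arbitrary choices.

```python
import os
import random

# With the default version=2, str/bytes/bytearray seeds are accepted and reproducible.
a = random.Random("experiment-7")
b = random.Random("experiment-7")
print(a.random() == b.random())  # True

# Explicit high-entropy seed from the OS (32 bytes is an arbitrary choice).
strong = random.Random(os.urandom(32))
value = strong.random()
print(0.0 <= value < 1.0)  # True
```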
Another reason to use a seed value is indeed to guarantee that you get the same sequence of random numbers again and again - by reusing the known seed. Quoting the docs:
Sometimes it is useful to be able to reproduce the sequences given by a pseudo random number generator. By re-using a seed value, the same sequence should be reproducible from run to run as long as multiple threads are not running.
Most of the random module’s algorithms and seeding functions are subject to change across Python versions, but two aspects are guaranteed not to change:
If a new seeding method is added, then a backward compatible seeder will be offered.
The generator’s random() method will continue to produce the same sequence when the compatible seeder is given the same seed.
For this, however, you usually want to use random.Random instances instead of the module-level global functions (because of the multiple-threads issue, etc.).
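A minimal sketch of that advice with threads: each worker owns its own random.Random, so other threads' draws cannot interleave with its sequence the way they can with the shared module-level generator. The worker function and the seeds 100 to 103 are illustrative assumptions.

```python
import random
import threading

results = {}


def worker(name, seed):
    # Each thread owns its generator; no other thread advances it.
    rng = random.Random(seed)
    results[name] = [rng.random() for _ in range(3)]


threads = [threading.Thread(target=worker, args=(f"t{i}", 100 + i)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Re-running a worker's seed reproduces its sequence regardless of thread timing.
check = random.Random(100)
expected_t0 = [check.random() for _ in range(3)]
print(results["t0"] == expected_t0)  # True
```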
Finally, also note that the random numbers produced by the Mersenne Twister are unsuitable for cryptographic use; although they appear very random, it is possible to fully recover the internal state of the generator by observing only a few hundred of its outputs. For cryptographic algorithms, you want to use the SystemRandom class.
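A small sketch of random.SystemRandom: it offers the same methods as random.Random but draws from os.urandom(), so it has no reproducible state; seed() is a no-op and getstate() raises NotImplementedError. (On Python 3.6+, the secrets module wraps the same source for tokens and passwords.)

```python
import random

sysrand = random.SystemRandom()   # backed by os.urandom(); no saveable state

# Same interface as random.Random...
roll = sysrand.randint(1, 6)
token_bits = sysrand.getrandbits(128)

# ...but seeding is ignored and there is no state to save or restore.
sysrand.seed(42)
has_state = True
try:
    sysrand.getstate()
except NotImplementedError:
    has_state = False
print(has_state)  # False
```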

In most cases I would say there is no need to care about it. But if someone were really willing to do something weird and could roughly figure out the system time when your code was running, they might be able to brute-force replay your random numbers and see which series fits. I would say this is quite unlikely in most cases, though.
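To make the brute-force idea concrete, a sketch that only applies when a generator is explicitly seeded from a whole-second timestamp (or on a platform where the default falls back to the time). make_token is hypothetical, and the ±60-second window is an assumed attacker estimate.

```python
import random
import time


def make_token(seed):
    # Hypothetical token generator that (unwisely) seeds from the clock.
    return random.Random(seed).getrandbits(64)


issued_at = int(time.time())   # whole-second precision
token = make_token(issued_at)

# An attacker who knows the time to within a minute needs only ~120 guesses.
recovered = None
for guess in range(issued_at - 60, issued_at + 61):
    if make_token(guess) == token:
        recovered = guess
        break

print(recovered == issued_at)  # True
```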

What is pseudo random?

I was reading the docs for the random module and noticed that they say "pseudo random". Doesn't pseudo mean false? I was wondering what it means when it says that.
For example:
import random
print(random.randint(1, 2))
print(random.randint(1, 3))
Does this still mean that the first print statement has a 50% chance of printing 1 and a 50% chance of printing 2, and that the second print statement has a 33% chance each of printing 1, 2, or 3?
If not, then how are the pseudo random numbers generated?

To produce true randomness requires specialized hardware that measures random events, such as radioactive decay (random) or Brownian motion (also essentially random). Most computers obviously don't have these, so instead you have to use a really complex, evenly distributed, hard-to-predict 'pseudorandom' algorithm that starts with a number determined by, for example, the current timestamp. Such algorithms are plenty good enough for standard use cases needing 'randomness' as long as you're careful not to seed two random number generators with the same timestamp (by starting them at the same time on different threads, for example), which will make them do identical things. A common example of such a random number generator is Mersenne Twister: http://en.wikipedia.org/wiki/Mersenne_twister
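A quick empirical check of both points in the question and the answer, using an arbitrary fixed seed so the run is repeatable: randint(1, 3) still lands on each value about a third of the time, and two generators started from the same seed make identical "random" choices.

```python
import random
from collections import Counter

rng = random.Random(1)     # arbitrary fixed seed so the run is repeatable
n = 100_000
counts = Counter(rng.randint(1, 3) for _ in range(n))

for value in (1, 2, 3):
    # Each share should come out close to 1/3 (about 0.333).
    print(value, round(counts[value] / n, 3))

# Two generators started from the same seed produce the same sequence.
g1, g2 = random.Random(1234), random.Random(1234)
same = [g1.randint(1, 2) for _ in range(10)] == [g2.randint(1, 2) for _ in range(10)]
print(same)  # True
```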
A site that offers truly random values, explains a lot about randomness and pseudorandomness, and has some yummy statistics about its randomness is http://www.random.org/ (see Learn More and Statistics). (It seems to rely on measuring tiny fluctuations in a chaotic system, e.g. atmospheric noise, but the statistics show that it is so close to true randomness that you can't tell the two apart!)

Seeding random in django

In a view in django I use random.random(). How often do I have to call random.seed()?
One time for every request?
One time for every session?
One time while the webserver is running?

Don't set the seed.
The only time you want to set the seed is if you want to make sure that the same events keep happening. For example, if you don't want to let players cheat in your game, you can save the seed and then set it when they load their game. Then no matter how many times they save and reload, it still gives the same outcomes.

Call random.seed() rarely if at all.
To be random, you must allow the random number generator to run without touching the seed. The sequence of numbers is what's random. If you change the seed, you start a new sequence. The seed values may not be very random, leading to problems.
Depending on how many numbers you need, you can consider resetting the seed from /dev/random periodically.
You should try to reset the seed just before you've used up the previous seed. You don't get the full 32 bits of randomness, so you might want to reset the seed after generating 2**28 numbers.

It really depends on what you need the random numbers for. Experiment to find out whether it makes any difference. You should also consider that there is actually a pattern to pseudo-random numbers. Does it matter to you if someone can possibly guess the next random number? If not, seed it once at the start of a session or when the server first starts up.
Seeding once at the start of the session would probably make the most sense, IMO. This way the user will get a set of pseudo-random numbers throughout their session. If you seed every time a page is served, they aren't guaranteed this.
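One way to sketch "seed once per session" is a separate generator per session id, created and seeded the first time the session is seen. The in-process dictionary and the rng_for_session helper are hypothetical: this does not use Django's session framework, is not shared between worker processes, and is lost on restart.

```python
import random
import secrets

# Hypothetical in-process registry: one generator per session id.
_session_rngs = {}


def rng_for_session(session_id):
    rng = _session_rngs.get(session_id)
    if rng is None:
        # Seed once, when the session is first seen.
        rng = random.Random(secrets.randbits(64))
        _session_rngs[session_id] = rng
    return rng


a = rng_for_session("session-a")
print(a is rng_for_session("session-a"))   # True: same generator for the whole session
print(a is rng_for_session("session-b"))   # False: other sessions get their own
```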
