I want to generate multiple streams of random numbers in Python.
I am writing a program to simulate a queueing system and want one stream for the inter-arrival times, another stream for the service times, and so on.
The numpy.random functions draw random numbers from a single global stream.
In MATLAB there is something called RandStream which enables me to create multiple streams.
Is there any way to create something like RandStream in Python?
Both NumPy and the built-in random module provide generator classes you can instantiate.
For the built-in random module:
import random
random_generator = random.Random()
random_generator.random()
#>>> 0.9493959884174072
And for NumPy:
import numpy
random_generator = numpy.random.RandomState()
random_generator.uniform(0, 1, 10)
#>>> array([ 0.98992857, 0.83503764, 0.00337241, 0.76597264, 0.61333436,
#>>> 0.0916262 , 0.52129459, 0.44857548, 0.86692693, 0.21150068])
You do not need the RandomGen package. Simply initializing two streams suffices. For example:
import numpy as np
prng1 = np.random.RandomState()
prng2 = np.random.RandomState()
prng1.seed(1)
prng2.seed(1)
Now if you advance both streams using prngX.rand(), you will find that the two streams give identical results, which shows they are independent streams that merely share the same seed.
To use the random package, simply swap out np.random.RandomState() for random.Random().
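As a usage sketch for the queueing simulation in the question (the exponential distributions and rates are made up for illustration), you can dedicate one seeded stream to each quantity:
import numpy as np

arrival_stream = np.random.RandomState(1)   # stream for inter-arrival times
service_stream = np.random.RandomState(2)   # stream for service times

inter_arrivals = arrival_stream.exponential(scale=2.0, size=5)   # hypothetical mean inter-arrival time
service_times = service_stream.exponential(scale=1.5, size=5)    # hypothetical mean service time
Because each quantity has its own stream, drawing more service times does not disturb the arrival sequence.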
For the sake of reproducibility you can pass a seed directly to random.Random() and draw values from that instance. Each instance then runs independently of the others. For example, if you run:
import random
rg1 = random.Random(1)
rg2 = random.Random(2)
rg3 = random.Random(1)
for i in range(5): print(rg1.random())
print('')
for i in range(5): print(rg2.random())
print('')
for i in range(5): print(rg3.random())
You'll get:
0.134364244112
0.847433736937
0.763774618977
0.255069025739
0.495435087092
0.956034271889
0.947827487059
0.0565513677268
0.0848719951589
0.835498878129
0.134364244112
0.847433736937
0.763774618977
0.255069025739
0.495435087092
Veedrac's answer did not address how one might generate independent streams.
The best way I could find to generate independent streams is to use a replacement for numpy's RandomState. This is provided by the RandomGen package.
It supports independent random streams, but these use one of three random number generators: PCG64, ThreeFry or Philox. If you want to use the more conventional MT19937, you can rely on jumping instead.
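As a hedged sketch of the jumping approach (assuming NumPy 1.17+, where MT19937 and its jumped() method are built into numpy.random), following the pattern from the NumPy docs:
from numpy.random import Generator, MT19937

# Each jumped() call returns a new bit generator whose state has been
# advanced as if 2**128 draws were made, so the streams do not overlap.
bit_gen = MT19937(12345)
bit_generators = [bit_gen]
for _ in range(9):
    bit_gen = bit_gen.jumped()
    bit_generators.append(bit_gen)

streams = [Generator(bg) for bg in bit_generators]
samples = [g.uniform(0, 1, 5) for g in streams]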
NumPy added a feature for generating independent streams of random numbers using SeedSequence. A SeedSequence processes a user-provided seed, typically an integer of some size, and converts it into an initial state for a BitGenerator. It uses hashing techniques to ensure that low-quality seeds are turned into high-quality initial states (at least, with very high probability).
from numpy.random import SeedSequence, default_rng
ss = SeedSequence(12345)
# Spawn off 10 child SeedSequences to pass to child processes.
child_seeds = ss.spawn(10)
streams = [default_rng(s) for s in child_seeds]
Each stream is a PCG64 generator. Random numbers can then be drawn from each stream, for example:
K = 5
samples = [[s.uniform() for s in streams] for _ in range(K)]   # one draw from every stream, K times
There are more ways to generate independent streams of random numbers; check the NumPy docs.
Related
I've used numpy.random.randint(lower limit, upper limit, size) to make a 2D array of random numbers within a given range. Now I want to freeze this randomly generated array for the follow-up steps, so that the numbers don't change every time I run the entire script. Is there a way to do this?
Thanks.
Set a seed so that the random numbers generated are the same every time you run it.
numpy.random.seed(0)
Docs
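A minimal sketch of that (the bounds and shape are placeholders for whatever you used):
import numpy as np

np.random.seed(0)                          # fix the global seed once, at the top of the script
grid = np.random.randint(0, 10, (3, 4))    # this 3x4 array is now identical on every run
print(grid)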
By seeding the generator by hand you get the same random numbers every time you run it.
You can seed the generator using the seed() function; the same seed input will always produce the same output.
from numpy import random
random.seed(1)
first = random.randint(10)
random.seed(1)
second = random.randint(10)
In this code both first and second will be the same.
I've searched around but couldn't find an explanation. Please help me. Thanks.
I understand that Python will use the system's time if a seed is not provided for random (to the best of my knowledge). My question is: how does Python use this time? Is it the timestamp or some other format?
I ran the following code;
from time import time
import random
t1 = time() #this gave 1590236721.1549928
data = [random.randint(0, 100) for x in range(10)]
t2 = time() #this also gave 1590236721.1549928
Since t1 == t2, I guessed that if the UNIX timestamp is used as the seed, it should be t1, but after trying it like so:
random.seed(t1)
data1 = [random.randint(0, 100) for x in range(10)]
I got different values: data != data1.
I need more explanation/clarification. Thanks.
Python 2
In this Q&A (for Python 2.7): random: what is the default seed?, you can see that Python is not using the result of the time() function "as-is" to get the initial seed (usually it first tries to get urandom values from the OS if it can; see https://docs.python.org/2/library/os.html#os.urandom):
try:
    # Seed with enough bytes to span the 19937 bit
    # state space for the Mersenne Twister
    a = long(_hexlify(_urandom(2500)), 16)
except NotImplementedError:
    import time
    a = long(time.time() * 256) # use fractional seconds
Python 3
a) As in Python 2, if your OS provides random numbers (via urandom), as on *nix systems, Python will try to use this (see https://docs.python.org/3/library/random.html#bookkeeping-functions). On Windows, it uses the Win32 API's CryptGenRandom.
b) Even if it is using time(), it may be using the time at the start of your program, which can differ from the first time() call you make, so I don't think you can rely on your method of testing.
Last word of general advice: if you want reproducibility, you should explicitly set the seed yourself, and not rely on those implementation details.
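A minimal sketch of that advice (the seed value is arbitrary); using a dedicated Random instance also keeps you off the global generator:
import random

rng = random.Random(20200523)                   # explicit, arbitrary seed
print([rng.randint(0, 100) for _ in range(3)])  # same list on every run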
When you run the program more than once without setting the seed, it is seeded from the current time (down to fractions of a second), so you get different results each time; every nanosecond gives you a different seed.
try this:
import random
print ("Random number with seed 42")
random.seed(42)
print ("first Number ", random.randint(1,99))
random.seed(42)
print ("Second Number ", random.randint(1,99))
random.seed(42)
print ("Third Number ", random.randint(1,99))
Say I have some python code:
import random
r=random.random()
Where is the value of r seeded from in general?
And what if my OS has no randomness source, where is it seeded from then?
Why isn't this recommended for cryptography? Is there some way to know what the random number is?
Follow da code.
To see where the random module "lives" in your system, you can just do in a terminal:
>>> import random
>>> random.__file__
'/usr/lib/python2.7/random.pyc'
That gives you the path to the .pyc ("compiled") file, which is usually located side by side to the original .py where readable code can be found.
Let's see what's going on in /usr/lib/python2.7/random.py:
You'll see that it creates an instance of the Random class and then (at the bottom of the file) "promotes" that instance's methods to module functions. Neat trick. When the random module is imported anywhere, a new instance of that Random class is created, its values are then initialized and the methods are re-assigned as functions of the module, making it quite random on a per-import (erm... or per-python-interpreter-instance) basis.
_inst = Random()
seed = _inst.seed
random = _inst.random
uniform = _inst.uniform
triangular = _inst.triangular
randint = _inst.randint
The only thing this Random class does in its __init__ method is seed itself:
class Random(_random.Random):
    ...
    def __init__(self, x=None):
        self.seed(x)
    ...
_inst = Random()
seed = _inst.seed
So... what happens if x is None (no seed has been specified)? Well, let's check that self.seed method:
def seed(self, a=None):
    """Initialize internal state from hashable object.

    None or no argument seeds from current time or from an operating
    system specific randomness source if available.

    If a is not None or an int or long, hash(a) is used instead.
    """

    if a is None:
        try:
            a = long(_hexlify(_urandom(16)), 16)
        except NotImplementedError:
            import time
            a = long(time.time() * 256) # use fractional seconds

    super(Random, self).seed(a)
    self.gauss_next = None
The comments already tell what's going on... This method tries to use the default random generator provided by the OS, and if there's none, then it'll use the current time as the seed value.
But, wait... What the heck is that _urandom(16) thingy then?
Well, the answer lies at the beginning of this random.py file:
from os import urandom as _urandom
from binascii import hexlify as _hexlify
Tadaaa... The seed is a 16-byte number that came from os.urandom.
Let's say we're in a civilized OS, such as Linux (with a real random number generator). The seed used by the random module is the same as doing:
>>> long(binascii.hexlify(os.urandom(16)), 16)
46313715670266209791161509840588935391L
The reason why specifying a seed value is considered not so great is that the random functions are not really "random"... They just produce a very weird-looking sequence of numbers, and that sequence will be the same given the same seed. You can try this yourself:
>>> import random
>>> random.seed(1)
>>> random.randint(0,100)
13
>>> random.randint(0,100)
85
>>> random.randint(0,100)
77
No matter when, how, or even where you run that code (as long as the algorithm used to generate the random numbers remains the same), if your seed is 1 you will always get the integers 13, 85, 77... which kind of defeats the purpose (see this about Pseudorandom number generation). On the other hand, there are use cases where this is actually a desirable feature.
That's why it is considered "better" to rely on the operating system's random number generator. Those values are usually calculated from hardware interrupts, which are very, very random (they include interrupts from hard drive reads, keystrokes typed by the human user, moving a mouse around...). In Linux, that OS generator is /dev/random. Or, being a tad picky, /dev/urandom (that's what Python's os.urandom actually uses internally). The difference is that (as mentioned before) /dev/random uses hardware interrupts to generate the random sequence; if there are no interrupts, /dev/random can be exhausted and you might have to wait a little bit until you can get the next random number. /dev/urandom uses /dev/random internally, but it guarantees that it will always have random numbers ready for you.
If you're using Linux, just run cat /dev/random in a terminal (and prepare to hit Ctrl+C, because it will start outputting really, really random stuff):
borrajax#borrajax:/tmp$ cat /dev/random
_+�_�?zta����K�����q�ߤk��/���qSlV��{�Gzk`���#p$�*C�F"�B9��o~,�QH���ɭ�f��̬po�2o�(=��t�0�p|m�e
���-�5�߁ٵ�ED�l�Qt�/��,uD�w&m���ѩ/��;��5Ce�+�M����
~ �4D��XN��?ס�d��$7Ā�kte▒s��ȿ7_���- �d|����cY-�j>�
�b}#�W<դ���8���{�1»
. 75���c4$3z���/̾�(�(���`���k�fC_^C
Python uses the OS random generator or the time as a seed. This means that the only place where I could imagine a potential weakness in Python's random module is when it's used:
In an OS without an actual random number generator, and
In a device where time.time is always reporting the same time (has a broken clock, basically)
If you are concerned about the actual randomness of the random module, you can either go directly to os.urandom or use the random number generator in the pycrypto cryptographic library. Those are probably more random.
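For illustration, the direct os.urandom route looks like this (the byte count is arbitrary):
import os
import binascii

token = os.urandom(16)            # 16 bytes straight from the OS randomness source
print(binascii.hexlify(token))    # hex view of the random bytes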
Python, NumPy and R all use the same algorithm (Mersenne Twister) for generating random number sequences. Thus, theoretically speaking, setting the same seed should result in the same random number sequences in all three. This is not the case. I think the three implementations use different parameters, causing this behavior.
R
>set.seed(1)
>runif(5)
[1] 0.2655087 0.3721239 0.5728534 0.9082078 0.2016819
Python
In [3]: random.seed(1)
In [4]: [random.random() for x in range(5)]
Out[4]:
[0.13436424411240122,
0.8474337369372327,
0.763774618976614,
0.2550690257394217,
0.49543508709194095]
NumPy
In [23]: import numpy as np
In [24]: np.random.seed(1)
In [25]: np.random.rand(5)
Out[25]:
array([ 4.17022005e-01, 7.20324493e-01, 1.14374817e-04,
3.02332573e-01, 1.46755891e-01])
Is there some way the NumPy and Python implementations could produce the same random number sequence? Of course, as some comments and answers point out, one could use rpy. What I am specifically looking for is to fine-tune the parameters in the respective calls in Python and NumPy to get the same sequence.
Context: The concern comes from an EDX course offering in which R is used. In one of the forums, it was asked if Python could be used and the staff replied that some assignments would require setting specific seeds and submitting answers.
Related:
Comparing Matlab and Numpy code that uses random number generation: from this it seems that the underlying NumPy and Matlab implementations are similar.
python vs octave random generator: This question does come fairly close to the intended answer. Some sort of wrapper around the default state generator is required.
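For the Python-vs-NumPy half, here is a hedged sketch (untested here) of the "wrapper around the default state generator" idea: both generators expose the raw MT19937 state, so the state of Python's seeded generator can be copied into a legacy RandomState, after which both construct doubles the same way and should emit the same sequence:
import random
import numpy as np

random.seed(1)
version, state, gauss_next = random.getstate()   # state holds 624 words plus a position

rs = np.random.RandomState()
rs.set_state(('MT19937', np.array(state[:-1], dtype=np.uint32), state[-1]))

print(random.random())       # from Python's generator
print(rs.random_sample())    # drawn from the copied state; should match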
Use rpy2 to call R from Python. Here is a demo; the NumPy array data shares memory with x in R:
import numpy as np
import rpy2.robjects as robjects

data = robjects.r("""
set.seed(1)
x <- runif(5)
""")
print(np.array(data))

data[1] = 1.0
print(robjects.r["x"])
I realize this is an old question, but I've stumbled upon the same problem recently, and created a solution which can be useful to others.
I've written a random number generator in C, and linked it to both R and Python. This way, the random numbers are guaranteed to be the same in both languages since they are generated using the same C code.
The program is called SyncRNG and can be found here: https://github.com/GjjvdBurg/SyncRNG.
I am currently experimenting with actor concurrency (in Python) because I want to learn more about it. I therefore chose pykka, but when I test it, it's more than twice as slow as a normal function.
The code is only meant to check whether it works; it's not meant to be elegant. :)
Maybe I did something wrong?
from pykka.actor import ThreadingActor
import numpy as np
class Adder(ThreadingActor):
    def add_one(self, i):
        l = []
        for j in i:
            l.append(j+1)
        return l

if __name__ == '__main__':
    data = np.random.random(1000000)
    adder = Adder.start().proxy()
    adder.add_one(data)
    adder.stop()
This does not run so fast:
time python actor.py
real 0m8.319s
user 0m8.185s
sys 0m0.140s
And now the dummy 'normal' function:
def foo(i):
    l = []
    for j in i:
        l.append(j+1)
    return l

if __name__ == '__main__':
    data = np.random.random(1000000)
    foo(data)
Gives this result:
real 0m3.665s
user 0m3.348s
sys 0m0.308s
So what is happening here is that your functional version creates two very large lists, which is the bulk of the time. When you introduce actors, mutable data like lists must be copied before being sent to the actor to maintain proper concurrency. The list created inside the actor must also be copied when it is sent back to the sender. This means that instead of two very large lists being created, we have four.
Consider designing things so that the data is constructed and maintained by the actor and then queried by calls to the actor, minimizing the size of the messages passed back and forth. Try to apply the principle of minimal data movement. Passing the list in the functional case is only efficient because the data is not actually moving, thanks to the shared memory space. If the actor were on a different machine, we would not have the benefit of a shared memory space even if the message data were immutable and didn't need to be copied.
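A rough sketch of that idea using the same pykka primitives as the question (the actor method and the scalar summary it returns are made up for illustration):
import numpy as np
from pykka import ThreadingActor

class DataHolder(ThreadingActor):
    def __init__(self):
        super().__init__()
        # The large array lives inside the actor; it is never sent as a message.
        self.data = np.random.random(1000000)

    def add_one_sum(self):
        # Heavy work happens where the data lives; only a single float
        # crosses the actor boundary.
        return float((self.data + 1.0).sum())

if __name__ == '__main__':
    holder = DataHolder.start().proxy()
    result = holder.add_one_sum().get()   # proxy calls return futures
    print(result)
    holder.stop()
Building the array inside the actor avoids the copy on the way in, and returning a scalar avoids the copy on the way out.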