Python: where is random.random() seeded? - python

Say I have some python code:
import random
r=random.random()
Where is the value of r seeded from in general?
And what if my OS has no source of randomness? Where is it seeded from then?
Why isn't this recommended for cryptography? Is there some way to know what the random number will be?

Follow da code.
To see where the random module "lives" in your system, you can just run this in a Python shell:
>>> import random
>>> random.__file__
'/usr/lib/python2.7/random.pyc'
That gives you the path to the .pyc ("compiled") file, which is usually located right next to the original .py file, where the readable code can be found.
Let's see what's going on in /usr/lib/python2.7/random.py:
You'll see that it creates an instance of the Random class and then (at the bottom of the file) "promotes" that instance's methods to module functions. Neat trick. The first time the random module is imported in an interpreter, a new instance of that Random class is created, its state is initialized, and its methods are re-assigned as functions of the module, so the seeding happens once per Python interpreter instance (not once per import statement).
_inst = Random()
seed = _inst.seed
random = _inst.random
uniform = _inst.uniform
triangular = _inst.triangular
randint = _inst.randint
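A quick way to check this from the outside is to look at what the module-level functions are bound to. This is only a sketch for Python 3 (the listing above is from Python 2.7); the seed 1234 is an arbitrary illustrative value.

```python
import random

# The module-level functions are bound methods of one hidden Random instance.
shared = random.random.__self__
print(type(shared).__name__)            # Random
print(random.seed.__self__ is shared)   # True: the same instance sits behind both

# A private instance keeps its own state, so creating and using it
# does not disturb the module-level generator.
state_before = random.getstate()
private = random.Random(1234)           # 1234 is an arbitrary illustrative seed
private.random()
print(random.getstate() == state_before)  # True
```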
The only thing that this Random class does in its __init__ method is seed itself:
class Random(_random.Random):
    ...
    def __init__(self, x=None):
        self.seed(x)
    ...
_inst = Random()
seed = _inst.seed
So... what happens if x is None (no seed has been specified)? Well, let's check that self.seed method:
def seed(self, a=None):
    """Initialize internal state from hashable object.
    None or no argument seeds from current time or from an operating
    system specific randomness source if available.
    If a is not None or an int or long, hash(a) is used instead.
    """
    if a is None:
        try:
            a = long(_hexlify(_urandom(16)), 16)
        except NotImplementedError:
            import time
            a = long(time.time() * 256) # use fractional seconds
    super(Random, self).seed(a)
    self.gauss_next = None
The docstring already tells what's going on... This method tries to use the randomness source provided by the OS, and if there's none, then it'll use the current time as the seed value.
But, wait... What the heck is that _urandom(16) thingy then?
Well, the answer lies at the beginning of this random.py file:
from os import urandom as _urandom
from binascii import hexlify as _hexlify
Tadaaa... The seed is a 16-byte number that came from os.urandom
Let's say we're in a civilized OS, such as Linux (with a real random number generator). The seed used by the random module is the same as doing:
>>> long(binascii.hexlify(os.urandom(16)), 16)
46313715670266209791161509840588935391L
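For readers on Python 3, where long no longer exists, the same seeding logic can be sketched as follows. default_seed is a hypothetical helper written for illustration, not a function of the random module; it only mirrors the try/except shown in the Python 2.7 source above.

```python
import binascii
import os
import time


def default_seed(nbytes=16):
    """Hypothetical helper mirroring the fallback logic quoted above."""
    try:
        # Preferred path: bytes from the operating system's generator.
        return int.from_bytes(os.urandom(nbytes), "big")
    except NotImplementedError:
        # Fallback path: current time, keeping 1/256 s of the fraction.
        return int(time.time() * 256)


# hexlify + int(..., 16) and int.from_bytes(..., "big") build the same number.
raw = bytes(range(16))
via_hexlify = int(binascii.hexlify(raw), 16)
via_from_bytes = int.from_bytes(raw, "big")
print(via_hexlify == via_from_bytes)        # True
print(default_seed().bit_length() <= 128)   # True: at most 16 bytes of seed
```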
The reason why specifying a fixed seed value is considered not so great is that the random functions are not really "random"... They produce a deterministic sequence of numbers that merely looks random, and that sequence will be the same given the same seed. You can try this yourself:
>>> import random
>>> random.seed(1)
>>> random.randint(0,100)
13
>>> random.randint(0,100)
85
>>> random.randint(0,100)
77
No matter when, how, or even where you run that code (as long as the algorithm used to generate the random numbers remains the same), if your seed is 1, you will always get the integers 13, 85, 77... which kind of defeats the purpose (see this about Pseudorandom number generation). The session above is from Python 2.7; Python 3 draws integers differently, so it prints a different but equally repeatable sequence. On the other hand, there are use cases where this repeatability can actually be a desirable feature.
That's why relying on the operating system's random number generator is considered "better". Its output is usually derived from hardware interrupts, which are very, very hard to predict (they include interrupts for hard drive reads, keystrokes typed by the human user, moving a mouse around...). In Linux, that OS generator is /dev/random. Or, being a tad picky, /dev/urandom (that's what Python's os.urandom has traditionally read from). The difference is that /dev/random has traditionally blocked whenever the kernel estimated that it had not gathered enough entropy from those interrupts, so you might have to wait a little bit until you can get the next random number. /dev/urandom draws on the same kernel entropy pool, but it never blocks: it guarantees that it will always have random numbers ready for you.
If you're using Linux, just run cat /dev/random in a terminal (and prepare to hit Ctrl+C, because it will start printing really, really random stuff)
borrajax@borrajax:/tmp$ cat /dev/random
_+�_�?zta����K�����q�ߤk��/���qSlV��{�Gzk`���#p$�*C�F"�B9��o~,�QH���ɭ�f�޺�̬po�2o𷿟�(=��t�0�p|m�e
���-�5�߁ٵ�ED�l�Qt�/��,uD�w&m���ѩ/��;��5Ce�+�M����
~ �4D��XN��?ס�d��$7Ā�kte▒s��ȿ7_���- �d|����cY-�j>�
�b}#�W<դ���8���{�1»
. 75���c4$3z���/̾�(�(���`���k�fC_^C
Python uses the OS random generator or the current time as a seed. This means that the only place where I could imagine a potential weakness in how Python's random module is seeded is when it's used:
In an OS without an actual random number generator, and
In a device where time.time is always reporting the same time (has a broken clock, basically)
If you are concerned about the actual randomness of the random module, you can either go directly to os.urandom or use the random number generator in a cryptographic library such as pycrypto. Those are suitable for security-sensitive uses, whereas the Mersenne Twister behind the random module is not: its internal state can be reconstructed from 624 consecutive 32-bit outputs, after which every later value is predictable.
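As a rough sketch of what "going directly to os.urandom" can look like in Python 3: random.SystemRandom exposes the familiar random API on top of os.urandom, and the standard-library secrets module (Python 3.6+, not mentioned in the answer above) wraps the same source for tokens. The sizes and ranges below are arbitrary illustrative choices.

```python
import os
import random
import secrets

# 16 bytes straight from the operating system's generator.
raw = os.urandom(16)

# Same methods as the random module, but every value comes from os.urandom,
# so there is no seed to set and no state to replay.
sys_rng = random.SystemRandom()
roll = sys_rng.randint(1, 6)

# A 16-byte token rendered as 32 hexadecimal characters.
token = secrets.token_hex(16)

print(len(raw), roll, len(token))
```

Note that calling seed() on a SystemRandom instance has no effect, which is exactly the point: its output is not meant to be reproducible.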

Related

Seeding the random generator for tests

I made it work using factory-boy's get_random_state/set_random_state, although it wasn't easy. And the biggest downside is that the values are big. So the thing that comes to mind is to write it to a file. But then if I accidentally run the tests not telling it to seed from the file, the value is lost. Now that I think about it, I can display the value too (think tee). But still I'd like to reduce it to 4-5 digits.
My idea is as follows. Normally when you run tests it somewhere says, "seed: 4215." Then to reproduce the same result I've got to do SEED=4215 ./manage.py test or something.
I did some experiments with factory-boy, but then I realized that I can't achieve this even with the random module itself. I tried different ideas. All of them failed so far. The simplest is this:
import random
import os
if os.getenv('A'):
    random.seed(os.getenv('A'))
else:
    seed = random.randint(0, 1000)
    random.seed(seed)
    print('seed: {}'.format(seed))
print(random.random())
print(random.random())
/app $ A= python a.py
seed: 62
0.9279915658776743
0.17302689004804395
/app $ A=62 python a.py
0.461603098412836
0.7402019819205794
Why do the results differ? And how to make them equal?
Currently your types are different:
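if os.getenv('A'):
    random.seed(os.getenv('A'))
else:
    seed = random.randint(0, 1000)
    random.seed(seed)
    print('seed: {}'.format(seed))
In the first case, you seed with a str (environment variables are always strings), and in the second with an int. The two are turned into different internal states, so seeding with '62' and with 62 gives different sequences. You can fix this by converting the value to an int in the first case: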
random.seed(int(os.getenv("A")))
I'm also not entirely following your need to seed random directly; I think with Factory Boy you can use factory.random.reseed_random (source).
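Putting the fix together, a minimal sketch of the "print a short seed, replay it from the environment" workflow might look like this. resolve_seed, the SEED variable name and the 0..99999 range are assumptions made for illustration (the range just keeps the seed to at most five digits, as the question asks).

```python
import os
import random


def resolve_seed(environ=os.environ, name="SEED"):
    """Return the seed from the environment as an int, or pick a fresh one."""
    raw = environ.get(name)
    if raw:
        return int(raw)  # convert: environment values are always strings
    # No seed given: draw a short one from the OS so it is easy to retype.
    return random.SystemRandom().randint(0, 99999)


seed = resolve_seed()
print("seed: {}".format(seed))  # always printed, so a failing run can be replayed

rng = random.Random(seed)
first_run = [rng.random() for _ in range(2)]

# Replaying with the printed seed reproduces the same values.
replay = random.Random(resolve_seed({"SEED": str(seed)}))
print(first_run == [replay.random() for _ in range(2)])  # True
```

Running it as SEED=4215 python a.py would then reuse 4215 instead of drawing a new seed.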

How does Python's random use system time?

I've searched around but couldn't find an explanation. Please help me. Thanks.
I understand that Python will use system's time if a seed is not provided for random (to the best of my knowledge). My question is: How does Python use this time? Is it the timestamp or some other format?
I ran the following code;
from time import time
import random
t1 = time() #this gave 1590236721.1549928
data = [random.randint(0, 100) for x in range(10)]
t2 = time() #this also gave 1590236721.1549928
Since t1 == t2, I guessed that if UNIX timestamp is used as seed, it should be t1 but after trying it like so;
random.seed(t1)
data1 = [random.randint(0, 100) for x in range(10)]
I got different values: data != data1.
I need more explanations/ clarifications. Thanks.
Python 2
In this Q&A (for Python 2.7), random: what is the default seed?, you can see that Python does not use the result of the time() function "as-is" to get the initial seed. Usually, it first tries to get urandom values from the OS if it can (see https://docs.python.org/2/library/os.html#os.urandom).
try:
    # Seed with enough bytes to span the 19937 bit
    # state space for the Mersenne Twister
    a = long(_hexlify(_urandom(2500)), 16)
except NotImplementedError:
    import time
    a = long(time.time() * 256) # use fractional seconds
Python 3
a) As in Python 2, if your OS provides random numbers (with urandom), as on *nix systems, Python will try to use them (see https://docs.python.org/3/library/random.html#bookkeeping-functions). On Windows, it uses the Win32 API's CryptGenRandom.
b) Even if it is using time(), it may be using the time at the start of your program (when the random module is first imported), which may differ from the first time() call you make yourself, so I don't think you can rely on your method of testing.
Last word of general advice: if you want reproducibility, you should explicitly set the seed yourself, and not rely on those implementation details.
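If the goal is only to reproduce a list like data from the question, one option that does not require knowing the seed at all is to snapshot the generator state before drawing and restore it afterwards. A small sketch using random.getstate and random.setstate:

```python
import random

# Snapshot the generator before drawing anything.
state = random.getstate()
data = [random.randint(0, 100) for _ in range(10)]

# Rewind to the snapshot and draw again: the same values come out.
random.setstate(state)
data1 = [random.randint(0, 100) for _ in range(10)]

print(data == data1)  # True
```

The state object is large (it holds the whole Mersenne Twister state), so for logging or sharing, an explicit small integer seed is usually more convenient.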
When you run the program more than once without setting the seed, the generator is seeded from a fresh value each time (OS randomness or, failing that, the current time, which changes constantly), so you get different results on each run.
Once you set the seed yourself, the results repeat.
Try this:
import random
print ("Random number with seed 42")
random.seed(42)
print ("first Number ", random.randint(1,99))
random.seed(42)
print ("Second Number ", random.randint(1,99))
random.seed(42)
print ("Third Number ", random.randint(1,99))

What are use cases to hand over different numbers in random.seed( 0 )

What are use cases to hand over different numbers in random.seed(0)?
import random
random.seed(0)
random.random()
For example, to use random.seed(17) or random.seed(9001) instead of always using random.seed(0). Both return the same "pseudo" random numbers that can be used for testing.
import random
random.seed(17)
random.random()
Why not always use random.seed(0)?
The seed is saying "random, but always the same randomness". If you want to randomize, e.g., search results, but not differently for every search, you could pass the current day as the seed.
If you want to randomize per user, you could use a user ID, and so on.
An application should specify its own seed (e.g., with random.seed()) only if it needs reproducible "randomness"; examples include unit tests, games that display a "code" based on the seed to players, and simulations. Specifying a seed this way is not appropriate where information security is involved. See also my article on this matter.
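The per-day and per-user ideas above can be sketched with a dedicated Random instance keyed on the value of interest. shuffled_for, the sample results list and the user ID 1001 are hypothetical names and values chosen for illustration, and, as noted above, this is not suitable where security matters.

```python
import datetime
import random


def shuffled_for(items, key):
    """Return a copy of items shuffled deterministically for the given key."""
    rng = random.Random(key)  # private generator: global state is untouched
    result = list(items)
    rng.shuffle(result)
    return result


results = ["a", "b", "c", "d", "e"]

# Same order for everyone all day long, a new order tomorrow.
today = datetime.date.today().toordinal()
print(shuffled_for(results, today))

# Same order every time for one user (1001 is a made-up ID).
user_id = 1001
print(shuffled_for(results, user_id) == shuffled_for(results, user_id))  # True
```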

Generate multiple independent random streams in python

I want to generate multiple streams of random numbers in python.
I am writing a program for simulating queues system and want one stream for the inter-arrival time and another stream for the service time and so on.
The functions in numpy.random generate random numbers from a global stream.
In MATLAB there is something called RandStream which enables me to create multiple streams.
Is there any way to create something like RandStream in Python?
Both Numpy and the internal random generators have instantiatable classes.
For just random:
import random
random_generator = random.Random()
random_generator.random()
#>>> 0.9493959884174072
And for Numpy:
import numpy
random_generator = numpy.random.RandomState()
random_generator.uniform(0, 1, 10)
#>>> array([ 0.98992857, 0.83503764, 0.00337241, 0.76597264, 0.61333436,
#>>> 0.0916262 , 0.52129459, 0.44857548, 0.86692693, 0.21150068])
You do not need to use the RandomGen package. Simply creating two streams would suffice. For example:
import numpy as np
prng1 = np.random.RandomState()
prng2 = np.random.RandomState()
prng1.seed(1)
prng2.seed(1)
Now if you advance both streams using prngX.rand(), you will find that the two streams give you identical results, which shows that they are separate generators, each keeping its own state (here started from the same seed).
To use the random package, simply swap out np.random.RandomState() for random.Random().
For the sake of reproducibility you can pass a seed directly to random.Random() and then draw values from that instance. Each instance created this way runs independently of the others. For example, if you run:
import random
rg1 = random.Random(1)
rg2 = random.Random(2)
rg3 = random.Random(1)
for i in range(5): print(rg1.random())
print('')
for i in range(5): print(rg2.random())
print('')
for i in range(5): print(rg3.random())
You'll get:
0.134364244112
0.847433736937
0.763774618977
0.255069025739
0.495435087092
0.956034271889
0.947827487059
0.0565513677268
0.0848719951589
0.835498878129
0.134364244112
0.847433736937
0.763774618977
0.255069025739
0.495435087092
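Applied to the queue simulation from the question, this could look like the sketch below: one Random instance per quantity, so that drawing extra service times never shifts the arrival times. draw_arrivals, the rates and the seeds are illustrative assumptions, and giving two Mersenne Twister instances different seeds keeps their state separate but is not by itself a guarantee of statistical independence.

```python
import random


def draw_arrivals(seed, n, extra_service_draws):
    """Draw n inter-arrival times while also consuming the service stream."""
    arrival_rng = random.Random(seed)      # stream for inter-arrival times
    service_rng = random.Random(seed + 1)  # separate stream for service times
    for _ in range(extra_service_draws):
        service_rng.expovariate(1.5)       # consumes only the service stream
    return [arrival_rng.expovariate(1.0) for _ in range(n)]


# The arrival times do not depend on how much the service stream was used.
print(draw_arrivals(1, 3, 0) == draw_arrivals(1, 3, 100))  # True
```

With a single shared generator the two calls would differ, because every service draw would advance the stream that the arrivals come from.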
Veedrac's answer did not address how one might generate independent streams.
The best way I could find to generate independent streams is to use a replacement for numpy's RandomState. This is provided by the RandomGen package.
It supports independent random streams, but these use one of three random number generators: PCG64, ThreeFry or Philox. If you want to use the more conventional MT19937, you can rely on jumping instead.
NumPy has since added a feature for generating independent streams of random numbers using SeedSequence. It processes a user-provided seed, typically an integer of some size, and converts it into an initial state for a BitGenerator. It uses hashing techniques to ensure that low-quality seeds are turned into high-quality initial states (at least, with very high probability).
from numpy.random import SeedSequence, default_rng
ss = SeedSequence(12345)
# Spawn off 10 child SeedSequences to pass to child processes.
child_seeds = ss.spawn(10)
streams = [default_rng(s) for s in child_seeds]
Each stream is a PCG64 generator. Random numbers can be drawn from the streams in turn as follows:
K = 5  # number of rounds to draw (illustrative value)
instance = [[s.uniform() for s in streams] for _ in range(K)]
There are more ways to generate independent streams of random numbers; see the NumPy docs.

Python's random: What happens if I don't use seed(someValue)?

a) In this case, does the random number generator use the system's clock (making the seed change) on each run?
b) Is the seed used to generate the pseudo-random values of expovariate(lambda)?
"Use the Source, Luke!"...;-). Studying https://svn.python.org/projects/python/trunk/Lib/random.py will rapidly reassure you;-).
What happens when seed isn't set (that's the "a is None" case):
if a is None:
    try:
        a = long(_hexlify(_urandom(16)), 16)
    except NotImplementedError:
        import time
        a = long(time.time() * 256) # use fractional seconds
and the expovariate:
random = self.random
u = random()
while u <= 1e-7:
    u = random()
return -_log(u)/lambd
obviously uses the same underlying random generator as every other method, and so is identically affected by the seeding or lack thereof (really, how else would it have been done?-)
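This is easy to confirm empirically. The sketch below (Python 3, with 42 and the rate 2.0 as arbitrary illustrative values) checks that identically seeded generators give identical expovariate values, and that an expovariate call advances the same stream that random() reads from. The second check relies on CPython consuming one underlying value per call here, which is an implementation detail rather than a documented guarantee.

```python
import random

# Same seed, same expovariate values.
rng_a = random.Random(42)
rng_b = random.Random(42)
draws_a = [rng_a.expovariate(2.0) for _ in range(5)]
draws_b = [rng_b.expovariate(2.0) for _ in range(5)]
print(draws_a == draws_b)  # True

# expovariate draws from the same stream as random(): after one call of
# either kind, both generators are at the same position in the sequence.
rng_c = random.Random(42)
rng_d = random.Random(42)
rng_c.expovariate(2.0)
rng_d.random()
print(rng_c.random() == rng_d.random())  # True on CPython 3
```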
a) It uses the system clock only when the OS provides no randomness source (see the docstring below). The clock on some systems may only have ms precision, so seeding twice in quick succession may then result in the same value.
seed(self, a=None)
Initialize internal state from hashable object.
None or no argument seeds from current time or from an operating
system specific randomness source if available.
http://pydoc.org/2.5.1/random.html#Random-seed
b) I would imagine expovariate does, but I can't find any proof. It would be silly if it didn't.
If x is omitted or None, current system time is used; current system time is also used to initialize the generator when the module is first imported. If randomness sources are provided by the operating system, they are used instead of the system time (see the os.urandom() function for details on availability).
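To see why a clock-based seed can repeat, it helps to look at the resolution of the fallback expression time.time() * 256 from the source quoted earlier. time_seed is a hypothetical helper written for this illustration, and the timestamp is an arbitrary example value.

```python
import random


def time_seed(timestamp):
    """Clock-based fallback seed: the timestamp in units of 1/256 second."""
    return int(timestamp * 256)


t = 1590236721.1549928  # arbitrary example timestamp

# Two moments half a millisecond apart collapse to the same seed...
print(time_seed(t) == time_seed(t + 0.0005))      # True
# ...while moments one full 1/256 s step apart do not.
print(time_seed(t) == time_seed(t + 1.0 / 256))   # False

# Same seed, same "random" numbers.
same = random.Random(time_seed(t)).random() == random.Random(time_seed(t + 0.0005)).random()
print(same)  # True
```

So two generators seeded from the clock within the same 1/256 s window would produce identical sequences, which is one reason the OS source is tried first.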
Random Docs
