I've searched around but couldn't find an explanation. Please help me. Thanks.
I understand that Python will use system's time if a seed is not provided for random (to the best of my knowledge). My question is: How does Python use this time? Is it the timestamp or some other format?
I ran the following code;
from time import time
import random
t1 = time() #this gave 1590236721.1549928
data = [random.randint(0, 100) for x in range(10)]
t2 = time() #this also gave 1590236721.1549928
Since t1 == t2, I guessed that if UNIX timestamp is used as seed, it should be t1 but after trying it like so;
random.seed(t1)
data1 = [random.randint(0, 100) for x in range(10)]
I got different values: data != data1.
I need more explanations/ clarifications. Thanks.
Python 2
In this Q&A : (for python2.7) random: what is the default seed? You can see that python is not using the result of the time() function "as-is" at all to get the initial seed (and usually, it tries to get urandom values if it can from the OS, first, see https://docs.python.org/2/library/os.html#os.urandom.
try:
# Seed with enough bytes to span the 19937 bit
# state space for the Mersenne Twister
a = long(_hexlify(_urandom(2500)), 16)
except NotImplementedError:
import time
a = long(time.time() * 256) # use fractional seconds
Python 3
a) As in Python 2, if your OS provides random numbers (with urandom), like in *Nix systems, Python will try to use this (see https://docs.python.org/3/library/random.html#bookkeeping-functions). On Windows, it's using Win32 API's CryptGenRandom
b) even if it is using time(), maybe it's using the time at the start of your program, which may be different than the first time() call you use, so I don't think you can rely on your method of testing.
Last word of general advice: if you want reproducibility, you should explicitly set the seed yourself, and not rely on those implementation details.
when you use the program more than once without hitting in the seed you're using the utc down to the second, so you get a different results each time.
every nano second will give you a different time.
try this:
import random
print ("Random number with seed 42")
random.seed(42)
print ("first Number ", random.randint(1,99))
random.seed(42)
print ("Second Number ", random.randint(1,99))
random.seed(42)
print ("Third Number ", random.randint(1,99))
Related
So, really I'm just confused, I've been learning python, and I was given an exercise to find the performance speed of a function, however after finishing the code I received an error in the time output, it was 3.215000000000856e-06, this value varies with every time I run the program though so you probably won't get the same output.(in reality it was less then a second.) I went through the video where it explained how to write how they did it and changed a how I wrote a statement, now my code is Identical, to theirs but with different variable names, I ran the program and the same problem, however they didn't experience this issue, heres the code:
import time
SetContainer = {I for I in range(1002)}
ListContainer = [I for I in range(1002)]
def Search(Value, Container):
if Value in Container:
return True
else:
return False
def Function_Speed(Func, HMT = 1, **arg):
sum = 0
for I in range(HMT):
start = time.perf_counter()
print (Func(**arg))
end = time.perf_counter()
sum = sum + (end - start)
return (sum, )
print (Function_Speed(Search, Value = 402,
Container = SetContainer))
Possible Answers?:
Could it be my hardware? My version of Python is no longer supported(The video is over a year old I'm using 3.6) or it turns out I screwed up.
(Edit:) By the way it does work when the function is printed so this example works, but without print (Func(**arg)) and instead Func(**arg) it doesn't work.
First, your capitalization of variable and function violates convention. Though not a syntax error, it makes it difficult for Python programmers to follow your code.
Second, the result you got makes sense. A single iteration of the search on a modern computer takes very little time. If your printed result was in the form 1.23456e-05, then that is a valid number so small that the default representation shifted to scientific notation.
Add a value for HMT, starting with 100000, and see what is output.
I'm slicing a quite big pandas series (~5M) using .loc and I stumble upon some weird behavior when checking times in an attempt to optimize my code.
It's weird that the first slicing attempt like series_object.loc[some_indexes] is taking 100x longer than the following ones.
When I try timeit it does not reflect this behaviour, but when checking the partial laps using `time``, we can see that the first lap is taking much longer than the following ones.
Is .loc using some sort of cacheing? if that's so, how does garbage collection is not influencing this?
Is timeit doing the cacheing even with garbage collector disabled and not behaving as it's suppose?
Which time should I trust that my app in production will take when running in a live environment?
I tried this on windows and linux machines using different versions of python (3.6, 3.7 and 2.7) and the behavior is always the same.
Thanks in advance for you help. This thing is banging my head for a week already and I miss not doubting %timeit :)
to reproduce:
Save the following code to a python file eg.:test_loc_times.py
import pandas as pd
import numpy as np
import timeit
import time, gc
def get_data():
ids = np.arange(size_bigseries)
big_series = pd.Series(index=ids, data=np.random.rand(len(ids)), name='{} elements series'.format(len(ids)))
small_slice = np.arange(size_slice)
return big_series, small_slice
# Method to test: a simple pandas slicing with .loc
def basic_loc_indexing(pd_series, slice_ids):
return pd_series.loc[slice_ids].dropna()
# method to time it
def timing_it(func, n, *args):
gcold = gc.isenabled()
gc.disable()
times = []
for i in range(n):
s = time.time()
func(*args)
times.append((time.time()-s)*1000)
if gcold:
gc.enable()
return times
if __name__ == '__main__':
import sys
n_tries = int(sys.argv[1]) if len(sys.argv)>1 and sys.argv[1] is not None else 1000
size_bigseries = int(sys.argv[2]) if len(sys.argv)>2 and sys.argv[2] is not None else 5000000 #5M
size_slice = int(sys.argv[3]) if len(sys.argv)>3 and sys.argv[3] is not None else 100 #5M
#1: timeit()
big_series, small_slice = get_data()
time_with_timeit = timeit.timeit('basic_loc_indexing(big_series, small_slice)',"gc.disable(); from __main__ import basic_loc_indexing, big_series, small_slice",number=n_tries)
print("using timeit: {:.6f}ms".format(time_with_timeit/n_tries*1000))
del big_series, small_slice
#2: time()
big_series, small_slice = get_data()
time_with_time = timing_it(basic_loc_indexing, n_tries, big_series, small_slice)
print("using time: {:.6f}ms".format(np.mean(time_with_time)))
print('head detail: {}\n'.format(time_with_time[:5]))
try out:
Run
python test_loc_times.py 1000 5000000 100
This will run timeit and time 1000 laps on slicing 100 elements from a 5M pandas.Series.
you can try it yourself with other values and the first run it always taking longer.
stdout:
>>> using timeit: 0.789754ms
>>> using time: 0.829869ms
>>> head detail: [145.02716064453125, 0.7691383361816406, 0.7028579711914062, 0.5738735198974609, 0.6380081176757812]
Weird right?
edit:
I found this answer which might be related. What do you think?
This code is likely not idempotent (has side effects that impact its execution).
timeit will run the code once first to measure the time and deduce the number of loops and runs it should use. If your code is not idempotent (has side effects, like cashing) then that first run (not recorded) will be longer and the subsequent (faster runs) will be measured and reported.
You can take a look at the arguments you can pass to timeit (see the doc) to specify the number of loops and forgo that initial run.
Also note that (taken from the doc linked above):
The times reported by %timeit will be slightly higher than those reported by the timeit.py script when variables are accessed. This is due to the fact that %timeit executes the statement in the namespace of the shell, compared with timeit.py, which uses a single setup statement to import function or create variables. Generally, the bias does not matter as long as results from timeit.py are not mixed with those from %timeit.
Edit: Missed the fact that you were passing the number of runs to timeit. In that case, only the latter part of my answer applies, but the numbers you are seeing seem to point to another issue...
I am very new to python, and am running into an issue I don't fully understand. I am trying to get a random variable to run multiple times, but for some reason it just returns the same random value x times.
I am not entirely certain what to try aside from the code I have already done.
lowTreasureList = "50 gold", "Healing Potion", "10x Magic Arrows", "+1 Magic Weapon"
def ranLowLoot(lowLootGiven):
# This function returns a random string from the passed list of strings.
lootIndex = random.randint(0, len(lowLootGiven) - 1)
return lowLootGiven[lootIndex]
lowLoot = ranLowLoot(lowTreasureList)
treasureSelection = int(input())
if treasureSelection == 1:
numLowTreasure = int(input('How many treasures? '))
for i in range(numLowTreasure):
ranLowLoot(lowTreasureList)
print(lowLoot)
When I do this I get the same random treasure (numLowTreasure) times, but I am trying to get it to select a new random treasure each time.
If you haven't already, it will help to read the documentation on the random module.
There are three alternatives to random.randint that are more suited to your purpose:
random.randrange(start, stop, [step]): step is optional and defaults to one. This will save you the len(...) - 1 you are using to get lootIndex, since stop is an exclusive bound.
random.randrange(stop): uses a default start of zero and default step of 1, which will save you passing 0 as your start index.
random.choice(seq): you can pass your function's parameter lowLootGiven to this as seq, which will save you from using indices and writing your own function entirely.
As for why you're getting the repeated treasure, that's because you aren't updating your variable lowLoot in your for loop. You should write:
for i in range(numLowTreasure):
lowLoot = ranLowLoot(lowTreasureList)
print(lowLoot)
Last thing I want to say is that python is nice for writing simple things quickly. Even if there was some bigger context that you were writing this code in, I might have written it like this:
lowTreasureList = ("50 gold", "Healing Potion", "10x Magic Arrows", "+1 Magic Weapon")
if int(input()) == 1:
for i in range(int(input('How many treasures? '))):
print(random.choice(lowTreasureList))
Using the round parentheses around the tuple declaration like I did isn't necessary in this case, but I like to use it because if you want to make the tuple declaration span multiple lines, it won't work without them.
Reading documentation on standard libraries is something I almost always find helpful. I think Python's documentation is great, and if it's bit too much to digest early on, I found tutorialspoint to be a good place to start.
The problem is that in the main loop you are discarding the result of the call to ranLowLoot(). As a minimal fix, in the main loop assign the result of that function call. Use:
lowLoot = ranLowLoot(lowTreasureList)
rather than simply:
ranLowLoot(lowTreasureList)
As a better fix, ditch your function completely and just use random.choice() (which does what you are trying to do, with much less fuss):
import random
lowTreasureList = ["50 gold", "Healing Potion", "10x Magic Arrows", "+1 Magic Weapon"]
treasureSelection = int(input())
if treasureSelection == 1:
numLowTreasure = int(input('How many treasures? '))
for i in range(numLowTreasure):
lowLoot = random.choice(lowTreasureList)
print(lowLoot)
Say I have some python code:
import random
r=random.random()
Where is the value of r seeded from in general?
And what if my OS has no random, then where is it seeded?
Why isn't this recommended for cryptography? Is there some way to know what the random number is?
Follow da code.
To see where the random module "lives" in your system, you can just do in a terminal:
>>> import random
>>> random.__file__
'/usr/lib/python2.7/random.pyc'
That gives you the path to the .pyc ("compiled") file, which is usually located side by side to the original .py where readable code can be found.
Let's see what's going on in /usr/lib/python2.7/random.py:
You'll see that it creates an instance of the Random class and then (at the bottom of the file) "promotes" that instance's methods to module functions. Neat trick. When the random module is imported anywhere, a new instance of that Random class is created, its values are then initialized and the methods are re-assigned as functions of the module, making it quite random on a per-import (erm... or per-python-interpreter-instance) basis.
_inst = Random()
seed = _inst.seed
random = _inst.random
uniform = _inst.uniform
triangular = _inst.triangular
randint = _inst.randint
The only thing that this Random class does in its __init__ method is seeding it:
class Random(_random.Random):
...
def __init__(self, x=None):
self.seed(x)
...
_inst = Random()
seed = _inst.seed
So... what happens if x is None (no seed has been specified)? Well, let's check that self.seed method:
def seed(self, a=None):
"""Initialize internal state from hashable object.
None or no argument seeds from current time or from an operating
system specific randomness source if available.
If a is not None or an int or long, hash(a) is used instead.
"""
if a is None:
try:
a = long(_hexlify(_urandom(16)), 16)
except NotImplementedError:
import time
a = long(time.time() * 256) # use fractional seconds
super(Random, self).seed(a)
self.gauss_next = None
The comments already tell what's going on... This method tries to use the default random generator provided by the OS, and if there's none, then it'll use the current time as the seed value.
But, wait... What the heck is that _urandom(16) thingy then?
Well, the answer lies at the beginning of this random.py file:
from os import urandom as _urandom
from binascii import hexlify as _hexlify
Tadaaa... The seed is a 16 bytes number that came from os.urandom
Let's say we're in a civilized OS, such as Linux (with a real random number generator). The seed used by the random module is the same as doing:
>>> long(binascii.hexlify(os.urandom(16)), 16)
46313715670266209791161509840588935391L
The reason of why specifying a seed value is considered not so great is that the random functions are not really "random"... They're just a very weird sequence of numbers. But that sequence will be the same given the same seed. You can try this yourself:
>>> import random
>>> random.seed(1)
>>> random.randint(0,100)
13
>>> random.randint(0,100)
85
>>> random.randint(0,100)
77
No matter when or how or even where you run that code (as long as the algorithm used to generate the random numbers remains the same), if your seed is 1, you will always get the integers 13, 85, 77... which kind of defeats the purpose (see this about Pseudorandom number generation) On the other hand, there are use cases where this can actually be a desirable feature, though.
That's why is considered "better" relying on the operative system random number generator. Those are usually calculated based on hardware interruptions, which are very, very random (it includes interruptions for hard drive reading, keystrokes typed by the human user, moving a mouse around...) In Linux, that O.S. generator is /dev/random. Or, being a tad picky, /dev/urandom (that's what Python's os.urandom actually uses internally) The difference is that (as mentioned before) /dev/random uses hardware interruptions to generate the random sequence. If there are no interruptions, /dev/random could be exhausted and you might have to wait a little bit until you can get the next random number. /dev/urandom uses /dev/random internally, but it guarantees that it will always have random numbers ready for you.
If you're using linux, just do cat /dev/random on a terminal (and prepare to hit Ctrl+C because it will start output really, really random stuff)
borrajax#borrajax:/tmp$ cat /dev/random
_+�_�?zta����K�����q�ߤk��/���qSlV��{�Gzk`���#p$�*C�F"�B9��o~,�QH���ɭ�f��̬po�2o�(=��t�0�p|m�e
���-�5�߁ٵ�ED�l�Qt�/��,uD�w&m���ѩ/��;��5Ce�+�M����
~ �4D��XN��?ס�d��$7Ā�kte▒s��ȿ7_���- �d|����cY-�j>�
�b}#�W<դ���8���{�1»
. 75���c4$3z���/̾�(�(���`���k�fC_^C
Python uses the OS random generator or a time as a seed. This means that the only place where I could imagine a potential weakness with Python's random module is when it's used:
In an OS without an actual random number generator, and
In a device where time.time is always reporting the same time (has a broken clock, basically)
If you are concerned about the actual randomness of the random module, you can either go directly to os.urandom or use the random number generator in the pycrypto cryptographic library. Those are probably more random. I say more random because...
Image inspiration came from this other SO answer
I want to generate multiple streams of random numbers in python.
I am writing a program for simulating queues system and want one stream for the inter-arrival time and another stream for the service time and so on.
numpy.random() generates random numbers from a global stream.
In matlab there is something called RandStream which enables me to create multiple streams.
Is there any way to create something like RandStream in Python
Both Numpy and the internal random generators have instantiatable classes.
For just random:
import random
random_generator = random.Random()
random_generator.random()
#>>> 0.9493959884174072
And for Numpy:
import numpy
random_generator = numpy.random.RandomState()
random_generator.uniform(0, 1, 10)
#>>> array([ 0.98992857, 0.83503764, 0.00337241, 0.76597264, 0.61333436,
#>>> 0.0916262 , 0.52129459, 0.44857548, 0.86692693, 0.21150068])
You do not need to use the RandomGen package. Simply initiate two streams would suffice. For example:
import numpy as np
prng1 = np.random.RandomState()
prng2 = np.random.RandomState()
prng1.seed(1)
prng2.seed(1)
Now if you progress both streams using prngX.rand(), you will find that the two streams will give you identical results, which means they are independent streams with the same seed.
To use the random package, simply swap out np.random.RandomState() for random.Random().
For the sake of reproducibility you can pass a seed directly to random.Random() and then call variables from there. Each initiated instance would then run independently from the other. For example, if you run:
import random
rg1 = random.Random(1)
rg2 = random.Random(2)
rg3 = random.Random(1)
for i in range(5): print(rg1.random())
print('')
for i in range(5): print(rg2.random())
print('')
for i in range(5): print(rg3.random())
You'll get:
0.134364244112
0.847433736937
0.763774618977
0.255069025739
0.495435087092
0.956034271889
0.947827487059
0.0565513677268
0.0848719951589
0.835498878129
0.134364244112
0.847433736937
0.763774618977
0.255069025739
0.495435087092
Veedrac's answer did not address how one might generate independent streams.
The best way I could find to generate independent streams is to use a replacement for numpy's RandomState. This is provided by the RandomGen package.
It supports independent random streams, but these use one of three random number generators: PCG64, ThreeFry or Philox. If you want to use the more conventional MT19937, you can rely on jumping instead.
numpy added feature to generate independent streams of Random Numbers using SeedSequence. This process a user-provided seed, typically as an integer of some size, and to convert it into an initial state for a BitGenerator. It uses hashing techniques to ensure that low-quality seeds are turned into high quality initial states (at least, with very high probability).
from numpy.random import SeedSequence, default_rng
ss = SeedSequence(12345)
# Spawn off 10 child SeedSequences to pass to child processes.
child_seeds = ss.spawn(10)
streams = [default_rng(s) for s in child_seeds]
each stream is PCG64 generator. Random numbers can be generated sequentially as follows -
for i in 1:K
instance[i] = [s.uniform() for s in streams]
There are more ways to generate independent streams of random numbers, check numpydocs.