Freezing output after running numpy.random.randint() in Python

I've used numpy.random.randint(lower limit, upper limit, size) to make a 2D array with random numbers within the given range. Now I want to freeze this randomly generated array for the follow-up steps, so that the numbers don't change every time I run the entire script. Is there a way to do this?
Thanks.

Set a seed so that the random numbers generated are the same every time you run the script.
numpy.random.seed(0)

By seeding the generator by hand, you get the same random numbers every time you run the code.
You can seed numpy's random functions with seed(). Its argument is the seed, and the same seed always produces the same sequence of outputs.
from numpy import random
random.seed(1)
first = random.randint(10)
random.seed(1)
second = random.randint(10)
In this code, first and second will be the same.
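Applied to the 2D array from the question, the same idea can be sketched as follows. The bounds 0 and 10 and the shape (3, 4) are assumptions chosen for illustration, and the second half uses numpy.random.default_rng, the newer Generator interface, as an alternative to the global seed.

```python
import numpy as np

# Legacy global-state approach: seed first, then draw the 2D array.
np.random.seed(0)
frozen = np.random.randint(0, 10, size=(3, 4))

# Re-seeding with the same value reproduces the same array.
np.random.seed(0)
again = np.random.randint(0, 10, size=(3, 4))

# Newer Generator API: the seed is held by the generator object itself.
rng_a = np.random.default_rng(0)
rng_b = np.random.default_rng(0)
gen_first = rng_a.integers(0, 10, size=(3, 4))
gen_second = rng_b.integers(0, 10, size=(3, 4))

print(np.array_equal(frozen, again))
print(np.array_equal(gen_first, gen_second))
```

As long as the seed is set before the array is drawn, every run of the script should produce the same array.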

Related

Passing random variables to sklearn random search (RandomizedSearchCV)

What is the use of reciprocal() and expon() in the below code?
svm_grid_R = {'kernel':["linear","rbf"], 'C': reciprocal(20,200000), "gamma" : expon(scale=1.0)}
Why can't we just use range()? What range does expon(scale=1.0) and reciprocal(20,200000) signify?
For context the code which uses these parameters is given below:
svm_reg = SVR()
rnd_search = RandomizedSearchCV(svm_reg, param_distributions=svm_grid_R,
n_iter=50, cv=5, scoring='neg_mean_squared_error',
verbose=2, random_state=42)
rnd_search.fit(housing_prepared, housing_labels)
I suggest you check the part of your script where the functions are imported in order to figure out what they are. From your question, I infer the following:
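reciprocal is most likely imported with from scipy.stats import reciprocal, which gives you a reciprocal (log-uniform) random variable; reciprocal(20, 200000) takes values between 20 and 200000.
expon is most likely imported with from scipy.stats import expon, which gives you an exponential random variable; expon(scale=1.0) has mean 1 and can take any non-negative value.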
In your code, you are passing these random variables as the C and gamma parameters to the random search. This means that the random parameters used by the search will be sampled from these two distributions.
Technically, you could also use range to tell the search to randomly sample the numbers from a given sequence. Another way to do this is to pass the search a random variable from which to sample random parameters. Your code takes the second approach.
To better understand what the second approach is all about, try the following:
# Import the distribution
from scipy.stats import expon
# Initialize a random variable with lambda=1 (scale=1)
exponential_rv = expon(scale=1)
# Draw a random sample from this distribution
exponential_rv.rvs()
> 0.780028923390962
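In this specific case, your search would be passing gamma=0.780028923390962 to your support vector machine, since expon is the distribution assigned to gamma; C is sampled from the reciprocal distribution in the same way.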
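To get a feel for the ranges the two distributions cover, a rough sketch like the one below draws many samples from each. The sample size of 1000 and the random_state value are arbitrary choices made here for illustration, not something taken from the question.

```python
from scipy.stats import expon, reciprocal

# Frozen distributions matching the question's parameter grid.
c_dist = reciprocal(20, 200000)   # log-uniform between 20 and 200000
gamma_dist = expon(scale=1.0)     # exponential with mean 1

# Draw reproducible samples; random_state fixes the stream.
c_samples = c_dist.rvs(size=1000, random_state=42)
gamma_samples = gamma_dist.rvs(size=1000, random_state=42)

print(c_samples.min(), c_samples.max())
print(gamma_samples.min(), gamma_samples.mean())
```

A plain range() would sample C uniformly over integers, whereas the reciprocal distribution spreads samples evenly across orders of magnitude, which is often what one wants for a parameter like C.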

Running same python code multiple times and getting inconsistent results

I am new to Python, so I am not sure if this problem is due to my inexperience or whether this is a glitch.
I am running this code multiple times on the same data (no random number generation) and getting different results. This has occurred with more than one variable so far, and obviously I cannot proceed with the analysis until I figure out which results are trustworthy. Here is a short sample of the results I have obtained after running the code four times. Why is there such a discrepancy between these outputs? I am puzzled and greatly appreciate your advice.
Linear Regression
from scipy.stats import linregress
import scipy.stats
from scipy.signal import welch
import matplotlib
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import scipy.signal as signal
part_022_o = pd.read_excel(r'C:\Users\Me\Desktop\Behavioral Data Processed\part_022_combined_other.xlsx')
distance_o = part_022_o["distance"]
fs = 200
f, Pwelch_spec = signal.welch(distance_o, fs=fs, window='hanning',nperseg=400, noverlap=200, scaling='density', average='mean')
log_f = np.log(f, where=f>0)
log_pwelch = np.log(Pwelch_spec, where=Pwelch_spec>0)
idx = np.isfinite(log_f) & np.isfinite(log_pwelch)
polynomial_coefficients = np.polyfit(log_f[idx],log_pwelch[idx],1)
print(polynomial_coefficients)
scipy.stats.linregress(log_f[idx], log_pwelch[idx])
Results First Attempt
[ 0.00324568 -2.82962602]
Results Second Attempt
[-2.70137164 6.97117509]
Results Third Attempt
[-2.70137164 6.97117509]
Results Fourth Attempt
[-2.28028005 5.53839502]
The same thing happens when I use scipy.stats.linregress().
Thank you,
Confused
Edit: full code added.
Also, the issue appears to be related to np.log(), since only the values of "log_f" array seem to be changing with the different outputs. It is hard to be certain that nothing else is changing (e.g. log_pwelch), but differences in output clearly correspond to differences in the first value of the "log_f" array.
Edit: I have narrowed the issue down to np.log(f, where=f>0). The first value in the f array is zero. According to the documentation of numpy log, "...Note that if an uninitialized out array is created via the default out=None, locations within it where the condition is False will remain uninitialized." Apparently this means that the value or variable is unpredictable and can vary from trial to trial, which is exactly what I am observing. Given my inexperience with Python, I am not sure what the best solution is (e.g. specifying the out-array in the log function, use a random seed, just note the regression coefficients whenever the value of zero is unchanged after log, etc.)
Try to use a random seed to reproduce results. Do this with the following code at the top of your program:
import numpy as np
np.random.seed(123)  # or any number you want
see here for more info: https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.random.seed.html
A random seed ensures you get repeatable results when some part of your program is generating numbers at random.
Try finding out what the functions (np.polyfit(), np.log()) are actually doing using documentation.
Using a seed value is standard practice in scikit-learn and machine learning work.
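The question's second edit traces the changing results to np.log(f, where=f>0) leaving some entries uninitialized. No random number generator is involved there, so a seed would not be expected to help with that part. One possible fix, sketched here with a small made-up frequency array standing in for the Welch output, is to pass an explicitly pre-filled out array:

```python
import numpy as np

# Hypothetical frequencies; the first bin is zero, as in the question.
f = np.array([0.0, 0.5, 1.0, 2.0])

# Pre-fill the output with NaN so entries where the condition is False
# hold a defined value instead of whatever was in memory.
log_f = np.full_like(f, np.nan)
np.log(f, out=log_f, where=f > 0)

# The finite mask now excludes the zero-frequency bin on every run.
idx = np.isfinite(log_f)
print(log_f, idx)
```

With the output pre-filled, the isfinite mask drops the zero-frequency bin deterministically, so the fit should see the same points on every run.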

What are use cases to hand over different numbers in random.seed( 0 )

What are use cases to hand over different numbers in random.seed(0)?
import random
random.seed(0)
random.random()
For example, to use random.seed(17) or random.seed(9001) instead of always using random.seed(0). Both return the same "pseudo" random numbers that can be used for testing.
import random
random.seed(17)
random.random()
Why not always use random.seed(0)?
The seed says "random, but always the same randomness". If you want to randomize something, e.g. search results, but not differently for every search, you could pass the current day as the seed.
If you want to randomize per user you could use a user ID and so on.
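A minimal sketch of that idea, using a separate random.Random instance so the global generator is left alone. The helper name shuffled_for, the fixed date, and the user ID are all made up for illustration.

```python
import datetime
import random

def shuffled_for(items, seed):
    """Return a copy of items shuffled deterministically for the given seed."""
    rng = random.Random(seed)
    result = list(items)
    rng.shuffle(result)
    return result

results = ["a", "b", "c", "d", "e"]

# Same order for everyone on a given day (fixed date used for illustration).
today = datetime.date(2024, 1, 15)
by_day = shuffled_for(results, today.toordinal())

# Same order for a given user on every visit (hypothetical user ID).
by_user = shuffled_for(results, 4711)

print(by_day)
print(by_user)
```

Different seeds give different but repeatable orderings, which is the reason to pass something other than 0.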
An application should specify its own seed (e.g., with random.seed()) only if it needs reproducible "randomness"; examples include unit tests, games that display a "code" based on the seed to players, and simulations. Specifying a seed this way is not appropriate where information security is involved. See also my article on this matter.

Numpy random seed valid for entire jupyter notebook

I'm using functions from numpy.random on a Jupyter Lab notebook and I'm trying to set the seed using numpy.random.seed(333). This works as expected only when the seed setting is in the same notebook cell as the code. For example, if I have a script like this:
import numpy as np
np.random.seed(44)
ll = [3.2,77,4535,123,4]
print(np.random.choice(ll))
print(np.random.choice(ll))
The output of the two np.random.choice(ll) calls will be the same on every run, because the seed is set:
# python seed.py
4.0
123.0
# python seed.py
4.0
123.0
Now, if I try to do the same on the Jupyter notebook, I get different results:
# in [11]
import numpy as np
# even if I set the seed here the other cells don't see it
np.random.seed(333)
# in [12]
np.random.choice([1,23,44,3,2])
23
# gets the same numbers
# in [13]
np.random.choice([1,23,44,3,2])
44
# gets different numbers every time I run this cell again
Is there a way to set the numpy random seed globally in a Jupyter lab notebook?
Because you're repeatedly calling np.random.choice, it generates different numbers each time. It's important to note that a seed does not make the function return the same number on every call; rather, it makes the same sequence of numbers appear if you repeat the same number of calls after seeding. So re-running from the seed gives you the same sequence of numbers every time, not the same single number.
If you want the same random number every single time, re-seed in that specific cell before each call. Otherwise, you can expect to consistently get the same sequence of numbers, but not the same number on every call.
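The difference between "same sequence" and "same number" can be checked with a short sketch like this one, which reuses the list and seed from the question; drawing three values per run is an arbitrary choice.

```python
import numpy as np

values = [1, 23, 44, 3, 2]

# Seed once, then draw several values: this is one reproducible sequence.
np.random.seed(333)
first_run = [np.random.choice(values) for _ in range(3)]

# Re-seeding restarts the sequence from the beginning.
np.random.seed(333)
second_run = [np.random.choice(values) for _ in range(3)]

print(first_run)
print(second_run)
print(first_run == second_run)
```

Within one run the draws can differ from each other, but the two runs should match element by element.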
This happens because you run np.random.choice() in a different cell from np.random.seed(). Run np.random.seed() and np.random.choice() in the same cell and you'll get the same number each time.
# in [11]
import numpy as np
# set the seed in the same cell as the call
np.random.seed(333)
np.random.choice([1,23,44,3,2])
2
# gets the same numbers
# in [12]
import numpy as np
# set the seed in the same cell as the call
np.random.seed(333)
np.random.choice([1,23,44,3,2])
2
# gets the same numbers

Generate multiple independent random streams in python

I want to generate multiple streams of random numbers in python.
I am writing a program for simulating queues system and want one stream for the inter-arrival time and another stream for the service time and so on.
The functions in numpy.random generate random numbers from a global stream.
In MATLAB there is something called RandStream which enables me to create multiple streams.
Is there any way to create something like RandStream in Python?
Both Numpy and the internal random generators have instantiatable classes.
For just random:
import random
random_generator = random.Random()
random_generator.random()
#>>> 0.9493959884174072
And for Numpy:
import numpy
random_generator = numpy.random.RandomState()
random_generator.uniform(0, 1, 10)
#>>> array([ 0.98992857, 0.83503764, 0.00337241, 0.76597264, 0.61333436,
#>>> 0.0916262 , 0.52129459, 0.44857548, 0.86692693, 0.21150068])
You do not need to use the RandomGen package. Simply instantiating two streams suffices. For example:
import numpy as np
prng1 = np.random.RandomState()
prng2 = np.random.RandomState()
prng1.seed(1)
prng2.seed(1)
Now if you advance both streams using prngX.rand(), you will find that they give identical results, which shows that they are separate streams, each with its own state, started from the same seed.
To use the random package, simply swap out np.random.RandomState() for random.Random().
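One way to convince yourself that each RandomState instance keeps its own state is to advance one stream and check that the other is unaffected. This is only a sketch; the draw counts are arbitrary.

```python
import numpy as np

# Two separate generators started from the same seed.
prng1 = np.random.RandomState(1)
prng2 = np.random.RandomState(1)

a = prng1.rand(3)
prng1.rand(100)    # advance only the first stream
b = prng2.rand(3)  # the second stream still starts from its own beginning

print(np.array_equal(a, b))
```

Drawing from prng1 does not consume numbers from prng2, so the first three values of both streams should match.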
For the sake of reproducibility you can pass a seed directly to random.Random() and then draw values from that instance. Each instance then runs independently of the others. For example, if you run:
import random
rg1 = random.Random(1)
rg2 = random.Random(2)
rg3 = random.Random(1)
for i in range(5): print(rg1.random())
print('')
for i in range(5): print(rg2.random())
print('')
for i in range(5): print(rg3.random())
You'll get:
0.134364244112
0.847433736937
0.763774618977
0.255069025739
0.495435087092
0.956034271889
0.947827487059
0.0565513677268
0.0848719951589
0.835498878129
0.134364244112
0.847433736937
0.763774618977
0.255069025739
0.495435087092
Veedrac's answer did not address how one might generate independent streams.
The best way I could find to generate independent streams is to use a replacement for numpy's RandomState. This is provided by the RandomGen package.
It supports independent random streams, but these use one of three random number generators: PCG64, ThreeFry or Philox. If you want to use the more conventional MT19937, you can rely on jumping instead.
NumPy has added a feature for generating independent streams of random numbers using SeedSequence. SeedSequence takes a user-provided seed, typically an integer of some size, and converts it into an initial state for a BitGenerator. It uses hashing techniques to ensure that low-quality seeds are turned into high-quality initial states (at least, with very high probability).
from numpy.random import SeedSequence, default_rng
ss = SeedSequence(12345)
# Spawn off 10 child SeedSequences to pass to child processes.
child_seeds = ss.spawn(10)
streams = [default_rng(s) for s in child_seeds]
Each stream is a PCG64 generator. Random numbers can be drawn from the streams in turn as follows:
K = 5  # number of rounds to draw (example value)
instance = [[s.uniform() for s in streams] for _ in range(K)]
There are more ways to generate independent streams of random numbers; see the NumPy documentation.
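For the queueing simulation described in the question, the SeedSequence approach might look roughly like this. The exponential distributions and their mean values are assumptions made for illustration; the question does not say which distributions it uses.

```python
import numpy as np
from numpy.random import SeedSequence, default_rng

# One root seed, two spawned child seeds: one per stream.
root = SeedSequence(12345)
arrival_seed, service_seed = root.spawn(2)

arrival_rng = default_rng(arrival_seed)
service_rng = default_rng(service_seed)

# Assumed distributions: exponential inter-arrival and service times.
inter_arrival = arrival_rng.exponential(scale=2.0, size=5)
service = service_rng.exponential(scale=1.5, size=5)

print(inter_arrival)
print(service)
```

Drawing more service times does not change the inter-arrival stream, and re-running with the same root seed should reproduce both streams.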
