Numpy random seed valid for entire jupyter notebook - python

I'm using functions from numpy.random in a JupyterLab notebook, and I'm trying to set the seed using numpy.random.seed(333). This works as expected only when the seed-setting call is in the same notebook cell as the code that uses it. For example, if I have a script like this:
import numpy as np
np.random.seed(44)
ll = [3.2,77,4535,123,4]
print(np.random.choice(ll))
print(np.random.choice(ll))
The output of the two np.random.choice(ll) calls is the same on every run of the script, because the seed is set:
# python seed.py
4.0
123.0
# python seed.py
4.0
123.0
Now, if I try to do the same on the Jupyter notebook, I get different results:
# in [11]
import numpy as np
# even if I set the seed here the other cells don't see it
np.random.seed(333)
# in [12]
np.random.choice([1,23,44,3,2])
23
# gets the same numbers
# in [13]
np.random.choice([1,23,44,3,2])
44
# gets different numbers every time I run this cell again
Is there a way to set the numpy random seed globally in a Jupyter lab notebook?

Because you're repeatedly calling np.random.choice, it draws a different number each time. It's important to note that seeding does not make the function consistently return the same number; rather, it ensures that the same sequence of numbers is produced if you make the same calls the same number of times. Therefore, you get the same sequence of numbers every time you restart and re-run the whole notebook, rather than the same number from every call.
Re-seeding in that specific cell, before each np.random.choice call, will work if you want the same random number every single time. Otherwise, you can expect to consistently reproduce the same sequence of numbers, but not to get the same number on every call.
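To illustrate the difference (a minimal sketch; the exact values drawn depend on your numpy version):
import numpy as np

choices = [1, 23, 44, 3, 2]

# Seeding once reproduces the *sequence*: a and b are generally
# different from each other, but the pair (a, b) is the same on
# every fresh run of this script.
np.random.seed(333)
a = np.random.choice(choices)
b = np.random.choice(choices)

# Re-seeding before each call pins every draw to the start of the
# sequence, so c and d are always equal.
np.random.seed(333)
c = np.random.choice(choices)
np.random.seed(333)
d = np.random.choice(choices)

print(a, b, c, d)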

That happens because you run np.random.choice() in different cells from np.random.seed(). If you run np.random.seed() and np.random.choice() in the same cell, you'll get the same number every time:
# in [11]
import numpy as np
# seed and draw in the same cell
np.random.seed(333)
np.random.choice([1,23,44,3,2])
2
# gets the same number every time this cell is run
# in [12]
import numpy as np
# re-seeding restarts the sequence
np.random.seed(333)
np.random.choice([1,23,44,3,2])
2
# gets the same number again

Related

Freezing output after running numpy.random.randint() in Python

I've used numpy.random.randint(low, high, size) to make a 2D array of random numbers within the given range. Now I want to freeze this randomly generated array for the follow-up steps, so that the numbers don't change every time I run the entire script. Is there a way to do this?
Thanks.
Set a seed so that the random numbers generated are the same every time you run it.
numpy.random.seed(0)
Docs
By seeding the generator by hand, you can get the same random numbers whenever you run the code. You can seed numpy's random module using the seed() function; the same seed input will result in the same output.
from numpy import random
random.seed(1)
first = random.randint(10)
random.seed(1)
second = random.randint(10)
In this code, both first and second will be the same.
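Applied to the 2-D case in the question (a sketch; the bounds and shape here are made-up placeholders):
import numpy as np

np.random.seed(0)  # fix the seed once, at the top of the script
arr = np.random.randint(0, 100, size=(4, 5))  # 4x5 ints in [0, 100)
print(arr)  # identical output on every run of the script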

Running same python code multiple times and getting inconsistent results

I am new to Python, so I am not sure if this problem is due to my inexperience or whether this is a glitch.
I am running this code multiple times on the same data (no random number generation) and getting different results. This has occurred with more than one variable so far, and obviously I cannot proceed with the analysis until I figure out which results are trustworthy. Here is a short sample of the results I have obtained after running the code four times. Why is there such a discrepancy between these outputs? I am puzzled and greatly appreciate your advice.
Linear Regression
from scipy.stats import linregress
import scipy.stats
from scipy.signal import welch
import matplotlib
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import scipy.signal as signal
part_022_o = pd.read_excel(r'C:\Users\Me\Desktop\Behavioral Data Processed\part_022_combined_other.xlsx')
distance_o = part_022_o["distance"]
fs = 200
f, Pwelch_spec = signal.welch(distance_o, fs=fs, window='hanning',nperseg=400, noverlap=200, scaling='density', average='mean')
log_f = np.log(f, where=f>0)
log_pwelch = np.log(Pwelch_spec, where=Pwelch_spec>0)
idx = np.isfinite(log_f) & np.isfinite(log_pwelch)
polynomial_coefficients = np.polyfit(log_f[idx],log_pwelch[idx],1)
print(polynomial_coefficients)
scipy.stats.linregress(log_f[idx], log_pwelch[idx])
Results First Attempt
[ 0.00324568 -2.82962602]
Results Second Attempt
[-2.70137164 6.97117509]
Results Third Attempt
[-2.70137164 6.97117509]
Results Fourth Attempt
[-2.28028005 5.53839502]
The same thing happens when I use scipy.stats.linregress().
Thank you,
Confused
Edit: full code added.
Also, the issue appears to be related to np.log(), since only the values of "log_f" array seem to be changing with the different outputs. It is hard to be certain that nothing else is changing (e.g. log_pwelch), but differences in output clearly correspond to differences in the first value of the "log_f" array.
Edit: I have narrowed the issue down to np.log(f, where=f>0). The first value in the f array is zero. According to the documentation of numpy log, "...Note that if an uninitialized out array is created via the default out=None, locations within it where the condition is False will remain uninitialized." Apparently this means that the value is unpredictable and can vary from trial to trial, which is exactly what I am observing. Given my inexperience with Python, I am not sure what the best solution is (e.g. specifying the out array in the log function, using a random seed, or just noting the regression coefficients whenever the value of zero is unchanged after the log, etc.)
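A sketch of one fix (an editorial addition, using only documented numpy behavior): pass an explicitly initialized out array, so positions where the condition is False hold a defined value instead of arbitrary memory:
import numpy as np

f = np.array([0.0, 1.0, 2.0, 4.0])  # illustrative; the first bin is zero

# Pre-fill the output with NaN so entries masked out by where= are
# defined instead of left uninitialized.
log_f = np.log(f, out=np.full_like(f, np.nan), where=f > 0)
print(log_f)  # [nan 0. 0.6931... 1.3862...] on every run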
Try using a random seed to reproduce results. Do this with the following code at the top of your program:
import numpy as np
np.random.seed(123)  # or any number you want
See here for more info: https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.random.seed.html
A random seed ensures you get repeatable results when some part of your program is generating numbers at random.
Try finding out what the functions (np.polyfit(), np.log()) are actually doing by reading their documentation.
Using a seed value is standard practice in scikit-learn and ML generally.

TensorFlow: Resetting the seed to a constant value does not yield repeating results

I'm getting non-repeating results in TensorFlow (version 1.4.0), even though I'm resetting the seed to the same value each time:
import tensorflow as tf
sess = tf.InteractiveSession()
for i in range(5):
    tf.set_random_seed(1234)
    print(sess.run(tf.random_uniform([1])))
The output is:
[ 0.96046877]
[ 0.85591054]
[ 0.20277488]
[ 0.81463408]
[ 0.75180626]
I don't understand how this is consistent with the documentation:
If the graph-level seed is set, but the operation seed is not: The system deterministically picks an operation seed in conjunction with the graph-level seed so that it gets a unique random sequence.
Doesn't this mean that if I set the graph-level seed (as I did, using set_random_seed), and I don't set the operation seed (as in my case, where I didn't specify the seed argument in random_uniform), I should expect to get repeating results?
A similar issue is addressed in this question, but the main emphasis here is to understand what the documentation means.
Additional details:
>> tf.__version__
'1.4.0-rc0'
>> tf.__git_version__
'v1.3.0-rc1-3732-g2dd1015'
EDIT 1:
I have a conjecture as to why this happens.
The sentence "The system deterministically picks an operation seed in conjunction with the graph-level seed" does not mean that the op seed is a function of the graph seed only. Rather, it is also a function of some internal counter variable. The reason this makes sense is that otherwise, running
tf.set_random_seed(1234)
print(sess.run(tf.random_uniform([1])))
print(sess.run(tf.random_uniform([1])))
would yield two identical results (this is due to the fact(?) that the random number generator is stateless, and each op has its own seed).
To confirm this, I found that after terminating Python and opening it again, the whole sequence of code at the beginning of the question does yield repeating results. Also, delving into the TensorFlow source, I see that the file random_seed.py has the lines
if graph_seed is not None:
    if op_seed is None:
        op_seed = ops.get_default_graph()._last_id
    seeds = _truncate_seed(graph_seed), _truncate_seed(op_seed)
which shows that the seed of an op is a combination of two seeds: the graph seed and the _last_id property of the graph, which is a counter that increases on each op that is added to the graph.
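Given that reading, a hedged workaround for the loop in the question (a sketch for TF 1.x, consistent with this interpretation of the docs) is to set the operation-level seed explicitly, so the op created on each iteration no longer depends on the graph's op counter:
import tensorflow as tf

sess = tf.InteractiveSession()
for i in range(5):
    # Pin the operation-level seed: each iteration still creates a new
    # op, but its seed no longer depends on _last_id, so every op
    # (run once each) should yield the same value.
    print(sess.run(tf.random_uniform([1], seed=1234)))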

Differences between IPython %run and prompt

I have the following simple, erroneous code:
from numpy import random, sqrt
points = random.randn(20,3)
points = points / sqrt(sum(points**2,1))
In IPython (with %autoreload 2), if I copy and paste it into the terminal, I get a ValueError, as one would expect. If I save it as a file and use %run, it runs without error (it shouldn't).
What's going on here?
I just figured it out, but I had written the question and it might be useful to someone else.
It is a difference between the numpy sum and the native sum. Changing the first line to
from numpy import random, sqrt, sum
fixes it, as %run uses the built-in version by default (at least with my settings). The built-in sum does not take an axis parameter, but it does not throw an error either, because the second positional argument is a start parameter, in effect just an offset added to the sum. So,
>>> sum([1,2,3],10000)
10006
for the built-in version, and an "axis out of bounds" error for the numpy one.
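A short sketch of the difference, using the shapes from the question:
import numpy as np

points = np.random.randn(20, 3)

# Built-in sum: the second positional argument is a start value, so this
# adds the 20 rows together and then adds 1 -- shape (3,), no error,
# but not a per-row sum.
builtin = sum(points**2, 1)

# numpy sum: the second positional argument is the axis -- shape (20,),
# the sum of squares of each row. Dividing points by its square root
# raises the ValueError mentioned above unless it is reshaped to
# broadcast along the rows:
row_norms = np.sqrt(np.sum(points**2, 1))
unit_points = points / row_norms[:, None]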

Reusing module references in Python (Matplotlib)

I think I may have misunderstood something here... But here goes.
I'm using the psd method in matplotlib inside a loop. I'm not making it plot anything; I just want the numerical result:
import pylab as pyl
...
psdResults = pyl.psd(inputData, NFFT=512, Fs=sampleRate, window=blackman)
But that's being looped 36 times every time I run the function it's in.
I'm getting a slow memory leak when I run my program over time, so I used 'heapy' to monitor this, and every time I run the function, it adds 36 entries to each of these 3 heaps:
dict matplotlib.lines.Line2D
dict matplotlib.transforms.CompositeAffine2D
dict matplotlib.path.Path
I can only conclude that each time I use the psd method, it adds the result to some dictionary somewhere, whereas I want to effectively wipe the memory, i.e. reset pylab each loop so it doesn't store anything.
I could be misinterpreting heapy, but it seems pretty clear that pylab grows each loop even though I only want to use its psd method. I don't want it saving the results anywhere itself!
Cheers
Try this:
from matplotlib import mlab
psdResults = mlab.psd(inputData, NFFT=512, Fs=sampleRate, window=blackman)
Does that improve the situation?
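A possible explanation (my reading, not confirmed in the thread): pylab's psd both computes the spectrum and draws a line into the current axes, so each call leaves Line2D, Path, and transform objects behind, while matplotlib.mlab.psd only does the computation. A sketch of the loop rewritten that way, with placeholder data and parameters:
import numpy as np
from matplotlib import mlab

sampleRate = 1000                        # placeholder
results = []
for i in range(36):
    inputData = np.random.randn(4096)    # placeholder for the real signal
    # mlab.psd returns (Pxx, freqs) without touching any figure state,
    # so nothing accumulates across iterations.
    Pxx, freqs = mlab.psd(inputData, NFFT=512, Fs=sampleRate,
                          window=np.blackman(512))
    results.append(Pxx)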
