I have a script using np.random.randint that was seeded withnp.random.seed(0). Now I'd like to find how I can get the same numbers by creating a separate rng object rng = np.random.default_rng(0). Naively I would have expected these two ways to be identical, but apparently this is not the case. The reason is that I'm trying to reproduce some error from some script that used the built in method, but unfortunately now another script also influences the state of that built in RNG, which is why I'd like to decouple these.
Can anyone tell me what I have to do to get the same numbers form the rng as I do get from np.random's built in RNG?
The reason
import numpy as np
def builtin(seed=0):
np.random.seed(seed)
print(np.random.randint(0, 10, 5))
def default(seed=0):
rng = np.random.default_rng(seed)
print(rng.integers(0, 10, 5))
# unfortunately do not produce the same results:
builtin()
default()
numpy.random.default_rng uses the new-style Generator API. This is mostly superior to the old functionality, and should be preferred in new code unless you have a specific reason not to use it, but it is not backward compatible with the old API.
The old API's way of creating a new RNG object is numpy.random.RandomState:
rng = np.random.RandomState(0)
print(rng.randint(0, 10, 5))
Related
I am new to Python, so I am not sure if this problem is due to my inexperience or whether this is a glitch.
I am running this code multiple times on the same data (no random number generation) and getting different results. This has occurred with more than one variable so far, and obviously I cannot proceed with the analysis until I figure out which results are trustworthy. Here is a short sample of the results I have obtained after running the code four times. Why is there such a discrepancy between these outputs? I am puzzled and greatly appreciate your advice.
Linear Regression
from scipy.stats import linregress
import scipy.stats
from scipy.signal import welch
import matplotlib
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import scipy.signal as signal
part_022_o = pd.read_excel(r'C:\Users\Me\Desktop\Behavioral Data Processed\part_022_combined_other.xlsx')
distance_o = part_022_o["distance"]
fs = 200
f, Pwelch_spec = signal.welch(distance_o, fs=fs, window='hanning',nperseg=400, noverlap=200, scaling='density', average='mean')
log_f = np.log(f, where=f>0)
log_pwelch = np.log(Pwelch_spec, where=Pwelch_spec>0)
idx = np.isfinite(log_f) & np.isfinite(log_pwelch)
polynomial_coefficients = np.polyfit(log_f[idx],log_pwelch[idx],1)
print(polynomial_coefficients)
scipy.stats.linregress(log_f[idx], log_pwelch[idx])
Results First Attempt
[ 0.00324568 -2.82962602]
Results Second Attempt
[-2.70137164 6.97117509]
Results Third Attempt
[-2.70137164 6.97117509]
Results Fourth Attempt
[-2.28028005 5.53839502]
The same thing happens when I use scipy.stats.linregress().
Thank you,
Confused
Edit: full code added.
Also, the issue appears to be related to np.log(), since only the values of "log_f" array seem to be changing with the different outputs. It is hard to be certain that nothing else is changing (e.g. log_pwelch), but differences in output clearly correspond to differences in the first value of the "log_f" array.
Edit: I have narrowed the issue down to np.log(f, where=f>0). The first value in the f array is zero. According to the documentation of numpy log, "...Note that if an uninitialized out array is created via the default out=None, locations within it where the condition is False will remain uninitialized." Apparently this means that the value or variable is unpredictable and can vary from trial to trial, which is exactly what I am observing. Given my inexperience with Python, I am not sure what the best solution is (e.g. specifying the out-array in the log function, use a random seed, just note the regression coefficients whenever the value of zero is unchanged after log, etc.)
Try to use a random seed to reproduce results. Do this with the following code at the top of your program:
import numpy as np
np.random.seed(123) or any number you want
see here for more info: https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.random.seed.html
A random seed ensures you get repeatable results when some part of your program is generating numbers at random.
Try finding out what the functions (np.polyfit(), np.log()) are actually doing using documentation.
This is standard practice for scikit-learn and ML to use a seed value.
I'm looking for a way to output multiple values using the generic_filter module in scipy.ndimage like so:
import numpy as np
from scipy import ndimage
a = np.array([range(1,5),range(5,9),range(9,13),range(13,17)])
def summary(a):
minVal = np.min(a)
maxVal = np.max(a)
return [minVal,maxVal]
[arrMin, arrMax] = ndimage.generic_filter(a, summary, footprint=np.ones((3,3)))
But I keep getting the error that a float is expected.
I've played with the 'output' parameter, like so:
arrMin = np.zeros(np.shape(a))
arrMax = np.zeros(np.shape(a))
ndimage.generic_filter(a, summary, footprint=np.ones((3,3)), output = [arrMin, arrMax])
to no avail. I've also tried returning a named tuple, a class, or a dictionary, as per this question none of which have worked.
Based on the comments, you want to perform multiple filters simultaneously rather than performing them separately.
Unfortunately I do not think this filter works that way. It expects you to return a single filtered output value for each corresponding input value. I looked for a way to do simultaneous filters with numpy/scipy but couldn't find anything.
If you can manage a data flow that allows you to load the image, filter, process and produce some small result data in separate parallel paths (one for each filter), then you may get some benefit from using multiprocessing but if you use it naively it's likely to take more time than doing everything sequentially. If you really have a bottleneck that multiprocessing solves you should also look into sharing your input array rather than loading it in each process.
I'm fairly new to programming, but this problem happens in python and in excel as well.
I'm using the following formulas for the RC transfer function
s/(s+1) for High Pass
1/(s+1) for Low Pass
with s = jwRC
below is the code I used in python
from pylab import *
from numpy import *
from cmath import *
"""
Generating a transfer function for RC filters.
Importing modules for complex math and plotting.
"""
f = arange(1, 5000, 1)
w = 2.0j*pi*f
R=100
C=1E-5
hp_tf = (w*R*C)/(w*R*C+1) # High Pass Transfer function
lp_tf = 1/(w*R*C+1) # Low Pass Transfer function
plot(f, hp_tf) # plot high pass transfer function
plot(f, lp_tf, '-r') # plot low pass transfer function
xscale('log')
I can't post images yet so I can't show the plot. But the issue here is the cutoff frequency is different for each one. They should cross at y=0.707, but they actually cross at about 0.5.
I figure my formula or method is wrong somewhere, but I can't find the mistake can anyone help me out?
Also, on a related note, I tried to convert to dB scale and I get the following error:
TypeError: only length-1 arrays can be converted to Python scalars
I'm using the following
debl=20*log(hp_tf)
This is a classical example why you should avoid pylab and more generally imports of the form
from module import *
unless you know exactly what it does, since it hopelessly clutters the name space.
Using,
import matplotlib.pyplot as plt
import numpy as np
and then calling np.log and plt.plot etc. will solve your problem.
Furether explanations
What's happening here is that,
from pylab import *
defines a log function from numpy that operate on arrays (the one you want).
However, the later import,
from cmath import *
overwrites it with a version that only accepts scalars, hence your error.
Python, NumPy and R all use the same algorithm (Mersenne Twister) for generating random number sequences. Thus, theoretically speaking, setting the same seed should result in same random number sequences in all 3. This is not the case. I think the 3 implementations use different parameters causing this behavior.
R
>set.seed(1)
>runif(5)
[1] 0.2655087 0.3721239 0.5728534 0.9082078 0.2016819
Python
In [3]: random.seed(1)
In [4]: [random.random() for x in range(5)]
Out[4]:
[0.13436424411240122,
0.8474337369372327,
0.763774618976614,
0.2550690257394217,
0.49543508709194095]
NumPy
In [23]: import numpy as np
In [24]: np.random.seed(1)
In [25]: np.random.rand(5)
Out[25]:
array([ 4.17022005e-01, 7.20324493e-01, 1.14374817e-04,
3.02332573e-01, 1.46755891e-01])
Is there some way, where NumPy and Python implementation could produce the same random number sequence? Ofcourse as some comments and answers point out, one could use rpy. What I am specifically looking for is to fine tune the parameters in the respective calls in Python and NumPy to get the sequence.
Context: The concern comes from an EDX course offering in which R is used. In one of the forums, it was asked if Python could be used and the staff replied that some assignments would require setting specific seeds and submitting answers.
Related:
Comparing Matlab and Numpy code that uses random number generation From this it seems that the underlying NumPy and Matlab implementation are similar.
python vs octave random generator: This question does come fairly close to the intended answer. Some sort of wrapper around the default state generator is required.
use rpy2 to call r in python, here is a demo, the numpy array data is sharing memory with x in R:
import rpy2.robjects as robjects
data = robjects.r("""
set.seed(1)
x <- runif(5)
""")
print np.array(data)
data[1] = 1.0
print robjects.r["x"]
I realize this is an old question, but I've stumbled upon the same problem recently, and created a solution which can be useful to others.
I've written a random number generator in C, and linked it to both R and Python. This way, the random numbers are guaranteed to be the same in both languages since they are generated using the same C code.
The program is called SyncRNG and can be found here: https://github.com/GjjvdBurg/SyncRNG.
I am interested in using python to compute a confidence interval from a student t.
I am using the StudentTCI() function in Mathematica and now need to code the same function in python http://reference.wolfram.com/mathematica/HypothesisTesting/ref/StudentTCI.html
I am not quite sure how to build this function myself, but before I embark on that, is this function in python somewhere? Like numpy? (I haven't used numpy and my advisor advised not using numpy if possible).
What would be the easiest way to solve this problem? Can I copy the source code from the StudentTCI() in numpy (if it exists) into my code as a function definition?
edit: I'm going to need to build the Student TCI using python code (if possible). Installing scipy has turned into a dead end. I am having the same problem everyone else is having, and there is no way I can require Scipy for the code I distribute if it takes this long to set up.
Anyone know how to look at the source code for the algorithm in the scipy version? I'm thinking I'll refactor it into a python definition.
I guess you could use scipy.stats.t and its interval method:
In [1]: from scipy.stats import t
In [2]: t.interval(0.95, 10, loc=1, scale=2) # 95% confidence interval
Out[2]: (-3.4562777039298762, 5.4562777039298762)
In [3]: t.interval(0.99, 10, loc=1, scale=2) # 99% confidence interval
Out[3]: (-5.338545334351676, 7.338545334351676)
Sure, you can make your own function if you like. Let's make it look like in Mathematica:
from scipy.stats import t
def StudentTCI(loc, scale, df, alpha=0.95):
return t.interval(alpha, df, loc, scale)
print StudentTCI(1, 2, 10)
print StudentTCI(1, 2, 10, 0.99)
Result:
(-3.4562777039298762, 5.4562777039298762)
(-5.338545334351676, 7.338545334351676)