Estimated Gaussian distribution parameters are wrong - python

I'm experimenting with Gaussian distribution and its likelihood.
To find the maximum likelihood estimates I differentiate the log-likelihood with respect to mu (the mean) and sigma (the standard deviation), which gives data.mean() and data.std() respectively.
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.mlab as mlab
import math
from scipy.stats import norm

def calculate_likelihood(x, mu, sigma):
    n = len(x)
    # negative log-likelihood of the data x under N(mu, sigma^2)
    likelihood = n/2.0 * np.log(2 * np.pi) + n/2.0 * math.log(sigma**2) + 1/(2*sigma**2) * sum([(x_i - mu)**2 for x_i in x])
    return likelihood

def estimate_gaussian_parameters_from_data(data):
    return data.mean(), data.std()

def main():
    mu = 0
    sigma = 2
    x_values = np.linspace(mu - 3*sigma, mu + 3*sigma, 1000)
    y_values_1 = mlab.normpdf(x_values, mu, sigma)  # note: mlab.normpdf was removed in newer matplotlib; norm.pdf does the same
    estimated_mu, estimated_sigma = estimate_gaussian_parameters_from_data(y_values_1)

if __name__ == "__main__":
    main()
I expected estimated_mu and estimated_sigma to be approximately equal to mu and sigma, but that's not the case: instead of 0 and 2 I get 0.083 and 0.069. Am I misunderstanding something?

mlab.normpdf is a pdf: it returns the probability density at x, not samples from the distribution. Since the mean is 0, the densities are largest for x values near 0. So y_values_1 contains probability densities evaluated on a grid, and taking the mean and std of those densities tells you nothing about mu and sigma.
s = np.random.normal(0, 2, 1000)
The line above draws 1000 samples from a normal distribution with mean 0 and std 2. For one such sample:
np.mean(s) == 0.018308805079364696 and np.std(s) == 1.9467605916031896
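Estimating from a sample like this, rather than from pdf values, recovers the parameters. A minimal sketch reusing the function from the question:

import numpy as np

def estimate_gaussian_parameters_from_data(data):
    return data.mean(), data.std()

s = np.random.normal(0, 2, 1000)                  # draw data points, not pdf values
print(estimate_gaussian_parameters_from_data(s))  # roughly (0, 2), as in the numbers above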

Related

Phase Shift between two noisy signals of Stochastic Resonance

For task 3 I want to calculate the phase shift between my noisy signal and the periodic forcing. I used the cross-correlation method, but the phase shift comes out very small and does not make sense. I also tried filtering the noisy signal with different methods, but that did not work. Do you have any idea how I can calculate the phase shift accurately?
First I plotted the noisy signal, then I calculated the phase shift between the noisy signal (x) and the periodic forcing (x2):
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import scipy.signal as signal
simulation_time = 100.0
sigma = 0.3
A =10
T=100
omega = 0.1
dt = 0.001
sqdt = np.sqrt(dt) #Precompute
time = np.arange(0, simulation_time, dt)
x = np.empty(len(time))
x2 = np.empty(len(time))
# Initial conditions
x[0] = 1.0
x2[0] = A*np.cos(0.0)  # forcing at t = 0 (otherwise the first entry is left uninitialised)
i = 1
for t in time[1:]:
    fx = x[i-1] - x[i-1]*x[i-1]*x[i-1] + A*np.cos(omega*t)
    x2[i] = A*np.cos(omega*t)
    gx = sigma
    x[i] = x[i-1] + dt*fx + sqdt*gx*np.random.standard_normal()
    i += 1
plt.figure()
plt.plot(time, x)
plt.plot(time, x2)
plt.xlabel('time')
plt.ylabel('signal')
plt.show()
#cross corelation method
output = np.correlate(x,x2*0.15,mode='same')
lags = time-50
plt.plot(lags,output)
maximum = np.max(output)
phase_shift = lags[output==maximum][0]
phase_shift
0.05799999999999983
When in doubt, verify against an FFT. This is not as efficient, but is more intuitive to interpret:
import numpy as np
import matplotlib.pyplot as plt
simulation_time = 100.0
sigma = 0.3
A = 10
T = 100
omega = 0.1
dt = 0.001
sqdt = np.sqrt(dt) # Precompute
time = np.arange(0, simulation_time, dt)
x = np.empty(len(time))
x2 = np.empty(len(time))
# Initial conditions
x[0] = 1.0
x2[0] = A*np.cos(0.0)  # forcing at t = 0
i = 1
for t in time[1:]:
    fx = x[i-1] - x[i-1]*x[i-1]*x[i-1] + A*np.cos(omega*t)
    x2[i] = A*np.cos(omega*t)
    gx = sigma
    x[i] = x[i-1] + dt*fx + sqdt*gx*np.random.standard_normal()
    i += 1
spectrum = np.fft.rfft(x)
freqs = np.fft.rfftfreq(len(x), d=dt)
i_peak = np.argmax(spectrum)
phase = np.angle(spectrum[i_peak])
shift = phase+np.pi/2
print(f'Peak frequency: {freqs[i_peak]} Hz - should be close to {omega/2/np.pi:.6}')
print(f'Peak phase: {phase:.6} rad')
print(f'Phase shift from cosine: {shift:.3} rad, or {shift/2/np.pi:.1%}')
plt.figure()
plt.plot(time, x)
plt.plot(time, x2)
plt.xlabel('time')
plt.ylabel('signal')
plt.show()
Peak frequency: 0.02 Hz - should be close to 0.0159155
Peak phase: -1.4046 rad
Phase shift from cosine: 0.166 rad, or 2.6%
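A follow-up note (my addition, not part of the original answer): the FFT phase is in radians, while np.correlate returns a time lag in seconds, so the two numbers are not directly comparable. Dividing the phase by the angular frequency converts it to a time shift, and subtracting the means before correlating usually makes the cross-correlation peak easier to trust. A sketch reusing x, x2, time, omega and shift from above:

import numpy as np

# Convert the FFT phase shift (radians) to an equivalent time shift (seconds)
time_shift = shift / omega
print(f'Equivalent time shift: {time_shift:.3} s')

# Zero-mean cross-correlation, for comparison
xc = x - x.mean()
x2c = x2 - x2.mean()
corr = np.correlate(xc, x2c, mode='same')
lags = time - time[len(time)//2]   # lag axis in seconds, centred on zero
lag = lags[np.argmax(corr)]        # time lag at the correlation peak
print(f'Cross-correlation lag: {lag:.3} s, phase: {omega*lag:.3} rad')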

SKlearn Gaussian Process with constant, manually set correlation

I want to use a Gaussian Process approximation for a simple 1D test function to illustrate a few things. I want to iterate over a few different values for the correlation matrix (since this is 1D it is just a single value) and show the effect different values have on the approximation. My understanding is that "theta" is the parameter for this, so I want to set the theta value manually and not have any optimization change it. I thought the ConstantKernel and the clone_with_theta function might get me what I want, but I didn't get it to work. Here is what I have so far:
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel as ConstantKernel
def f(x):
    """The function to predict."""
    return x/2 + ((1/10 + x) * np.sin(5*x - 1))/(1 + x**2 * (np.sin(x - (1/2))**2))
# ----------------------------------------------------------------------
# Data Points
X = np.atleast_2d(np.delete(np.linspace(-1,1, 7),4)).T
y = f(X).ravel()
# Instantiate a Gaussian Process model
kernel = ConstantKernel(constant_value=1, constant_value_bounds='fixed')
theta = np.array([0.5,0.5])
kernel = kernel.clone_with_theta(theta)
gp = GaussianProcessRegressor(kernel=kernel, optimizer=None)
# Fit to data using Maximum Likelihood Estimation of the parameters
gp.fit(X, y)
# Make the prediction on the meshed x-axis (ask for MSE as well)
y_pred, sigma = gp.predict(x, return_std=True)
# Plot
# ...
I have now programmed a simple implementation myself, which lets me set the correlation (here 'b') manually:
import numpy as np
from numpy.linalg import inv

def f(x):
    """The function to predict."""
    return x/2 + ((1/10 + x) * np.sin(5*x - 1))/(1 + x**2 * (np.sin(x - (1/2))**2))

def kriging_approx(x, xt, yt, b, mu, R_inv):
    N = yt.size
    one = np.matrix(np.ones((yt.size))).T
    r = np.zeros((N))
    for i in range(0, N):
        r[i] = np.exp(-b * (xt[i] - x)**2)
    y = mu + np.matmul(np.matmul(r.T, R_inv), yt - mu*one)
    y = y[0, 0]
    return y

def calc_R(x, b):
    N = x.size
    # setup R
    R = np.zeros((N, N))
    for i in range(0, N):
        for j in range(0, N):
            R[i][j] = np.exp(-b * (x[i] - x[j])**2)
    R_inv = inv(R)
    return R, R_inv

def calc_mu_sig(yt, R_inv):
    N = yt.size
    one = np.matrix(np.ones((N))).T
    mu = np.matmul(np.matmul(one.T, R_inv), yt) / np.matmul(np.matmul(one.T, R_inv), one)
    mu = mu[0, 0]
    sig2 = (np.matmul(np.matmul((yt - mu*one).T, R_inv), yt - mu*one)) / N
    sig2 = sig2[0, 0]
    return mu, sig2

# ----------------------------------------------------------------------
# Data Points
xt = np.linspace(-1, 1, 7)
yt = np.matrix((f(xt))).T

# Correlation parameter, set manually (example value)
b = 1.0

# Calc R
R, R_inv = calc_R(xt, b)

# Calc mu and sigma
mu_dach, sig_dach2 = calc_mu_sig(yt, R_inv)

# Point to get approximation for
x = 1
y_approx = kriging_approx(x, xt, yt, b, mu_dach, R_inv)
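For the sklearn route, one approach that may do what the question asks is to fix the kernel hyperparameter directly via 'fixed' bounds and pass optimizer=None, so fit() only computes the posterior without touching theta. A minimal sketch; the length-scale values and the prediction grid are illustrative assumptions, not part of the original code:

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def f(x):
    """The function to predict."""
    return x/2 + ((1/10 + x) * np.sin(5*x - 1))/(1 + x**2 * (np.sin(x - (1/2))**2))

X = np.atleast_2d(np.delete(np.linspace(-1, 1, 7), 4)).T
y = f(X).ravel()
x_plot = np.atleast_2d(np.linspace(-1, 1, 200)).T  # prediction grid (assumed)

for length_scale in [0.1, 0.3, 1.0]:  # values to compare, chosen arbitrarily
    # 'fixed' bounds keep the hyperparameter out of optimization,
    # and optimizer=None skips the optimization loop entirely
    kernel = RBF(length_scale=length_scale, length_scale_bounds='fixed')
    gp = GaussianProcessRegressor(kernel=kernel, optimizer=None)
    gp.fit(X, y)
    y_pred, sigma = gp.predict(x_plot, return_std=True)
    # plot y_pred against x_plot here to compare the approximations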

Exponential Fit to Data Favours Smaller Values?

I am trying to apply an exponential fit to my data to determine the point at which the value drops to 1/e of its initial value. When plotted, the fit seems to favour smaller values and does not portray the true relationship.
import numpy as np
import matplotlib
matplotlib.use("TkAgg")  # need to set the TkAgg backend explicitly, otherwise it introduced a low-level error
from matplotlib import pyplot as plt
from scipy.optimize import curve_fit

def autoCorrelation(sample, longTime, temp, plotTau=False):
    # compute empirical autocovariance with lag tau averaged over time longTime
    sample.takeTimeStep(timesteps=1500)  # 1500 timesteps to let sample reach equilibrium
    M = np.zeros(longTime)
    for tau in range(longTime):
        M[tau] = sample.calcMagnetisation()
        sample.takeTimeStep()
    M_ave = np.average(M)  # time-average
    M = M - M_ave
    autocorrelation = np.correlate(M, M, mode='full')
    autocorrelation /= autocorrelation.max()  # normalise such that max autocorrelation is 1
    autocorrelationArray = autocorrelation[int(len(autocorrelation)/2):]
    x = np.arange(0, len(autocorrelationArray), 1)

    # apply exponential fit
    def exponenial(x, a, b):
        return a * np.exp(-b * x)

    popt, pcov = curve_fit(exponenial, x, np.absolute(autocorrelationArray))  # array, 2d array
    yy = exponenial(x, *popt)
    plt.plot(x, np.absolute(autocorrelationArray), 'o', x, yy)
    plt.title('Exponential Fit of Magnetisation Autocorrelation against Time for Temperature = ' + str(T) + ' J/k')
    plt.xlabel('Time / Number of Iterations')
    plt.ylabel('Magnetisation Autocorrelation')
    plt.show()

    # prints tau_e value b from exponential a * np.exp(-b * x)
    print('tau_e is ' + str(1/popt[1]))  # units converted to time steps by taking reciprocal
if __name__ == '__main__':
    # plot autocorrelation against time
    longTime = 100
    temp = [1, 2, 2.3, 2.6, 3, 4]
    for T in temp:
        magnet = Ising(30, T)  # (N, temp)
        autoCorrelation(magnet, longTime, temp)
Note: Ising is a class in another .py file containing the functions takeTimeStep and calcMagnetisation.
I expected greater values of tau_e.
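One thing that may help (a sketch of a common remedy, not tied to the Ising class above): with a decaying autocorrelation most of the points sit in the noisy tail near zero, so an unweighted least-squares fit is dominated by those small values. Restricting the fit to the initial decay, and giving curve_fit an initial guess via p0, often yields a more representative tau_e. The helper fit_initial_decay and the 0.05 cutoff below are hypothetical choices:

import numpy as np
from scipy.optimize import curve_fit

def exponential(t, a, b):
    return a * np.exp(-b * t)

def fit_initial_decay(acf, cutoff=0.05):
    """Fit only the initial decay of a normalised autocorrelation array.

    Points after the autocorrelation first drops below `cutoff`
    (a hypothetical threshold) are excluded from the fit.
    """
    acf = np.absolute(acf)
    below = np.where(acf < cutoff)[0]
    end = below[0] if below.size else len(acf)
    t = np.arange(end)
    popt, pcov = curve_fit(exponential, t, acf[:end], p0=(1.0, 0.1))
    return popt  # a, b with tau_e = 1 / b

# Example with synthetic data: a noisy exponential decay
rng = np.random.default_rng(0)
acf = np.exp(-np.arange(100) / 20.0) + 0.02 * rng.standard_normal(100)
a, b = fit_initial_decay(acf)
print('tau_e estimate:', 1 / b)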

Is there an easy way to get probability density of normal distribution with the help of numpy?

I know, I can use my own function like this:
import math
from math import pi

def gauss(x, mu, sigma):
    return (2*pi)**(-0.5) * sigma**(-1) * math.exp(-0.5 * ((x - mu) / sigma)**2)
Does anyone know of a standard numpy or scipy function that does the same?
Thanks!
You can use scipy:
import numpy as np
from scipy.stats import norm

x = np.arange(20)
mu = 5
sigma = 3
mypdf = norm.pdf(x=x, loc=mu, scale=sigma)
You can also use numpy to generate a sample from a normal distribution:
import numpy as np
mu = 0.0
sigma = 0.1
rand_vector = np.random.normal(mu, sigma, (4, 1))
print(rand_vector)
prints:
[[-0.0003717 ]
[ 0.11439928]
[-0.11803113]
[ 0.01302493]]
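If you specifically want to stay in numpy, the formula from the question also vectorizes directly (a small sketch; norm.pdf above remains the standard route):

import numpy as np
from scipy.stats import norm

def gauss_np(x, mu, sigma):
    # Vectorized normal pdf: x can be a scalar or an array
    x = np.asarray(x, dtype=float)
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

x = np.arange(20)
print(np.allclose(gauss_np(x, 5, 3), norm.pdf(x, loc=5, scale=3)))  # True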

Python: Random number generator with mean and Standard Deviation

I need to know how to generate 1000 random numbers between 500 and 600 that have a mean of 550 and a standard deviation of 30 in Python.
import pylab
import random

xrandn = pylab.zeros(1000, float)
for j in range(500, 601):
    xrandn[j] = pylab.randn()
???????
You are looking for stats.truncnorm:
import scipy.stats as stats
a, b = 500, 600
mu, sigma = 550, 30
dist = stats.truncnorm((a - mu) / sigma, (b - mu) / sigma, loc=mu, scale=sigma)
values = dist.rvs(1000)
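A quick sanity check on the result (my addition; exact numbers vary run to run): note that truncating to [500, 600] pulls the sample standard deviation somewhat below the scale parameter of 30, so if the std must come out at exactly 30 you may need to adjust the parameters.

print(values.mean(), values.std())  # mean close to 550; std below 30 because of the truncation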
There are other choices for your problem too. Wikipedia has a list of continuous distributions with bounded intervals, depending on the distribution you may be able to get your required characteristics with the right parameters. For example, if you want something like "a bounded Gaussian bell" (not truncated) you can pick the (scaled) beta distribution:
import numpy as np
import scipy.stats
import matplotlib.pyplot as plt
def my_distribution(min_val, max_val, mean, std):
    scale = max_val - min_val
    location = min_val
    # Mean and standard deviation of the unscaled beta distribution
    unscaled_mean = (mean - min_val) / scale
    unscaled_var = (std / scale) ** 2
    # Computation of alpha and beta can be derived from mean and variance formulas
    t = unscaled_mean / (1 - unscaled_mean)
    beta = ((t / unscaled_var) - (t * t) - (2 * t) - 1) / ((t * t * t) + (3 * t * t) + (3 * t) + 1)
    alpha = beta * t
    # Not all parameters may produce a valid distribution
    if alpha <= 0 or beta <= 0:
        raise ValueError('Cannot create distribution for the given parameters.')
    # Make scaled beta distribution with computed parameters
    return scipy.stats.beta(alpha, beta, scale=scale, loc=location)
np.random.seed(100)
min_val = 1.5
max_val = 35
mean = 9.87
std = 3.1
my_dist = my_distribution(min_val, max_val, mean, std)
# Plot distribution PDF
x = np.linspace(min_val, max_val, 100)
plt.plot(x, my_dist.pdf(x))
# Stats
print('mean:', my_dist.mean(), 'std:', my_dist.std())
# Get a large sample to check bounds
sample = my_dist.rvs(size=100000)
print('min:', sample.min(), 'max:', sample.max())
Output:
mean: 9.87 std: 3.100000000000001
min: 1.9290674232087306 max: 25.03903889816994
Probability density function plot: (figure not shown)
Note that not every possible combination of bounds, mean and standard deviation will produce a valid distribution in this case, though, and depending on the resulting values of alpha and beta the probability density function may look like an "inverted bell" instead (even though mean and standard deviation would still be correct).
I'm not exactly sure what the OP wanted, but if the goal is simply an array xrandn with roughly the requested distribution, here are the steps:
First, create a normal (Gaussian) distribution; the easiest way is probably numpy:
import numpy as np
random_nums = np.random.normal(loc=550, scale=30, size=1000)
And then you keep only the numbers within the desired range with a list comprehension:
random_nums_filtered = [i for i in random_nums if i>500 and i<600]
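One caveat (my addition, not part of the original answer): the filtering step discards samples, so you end up with fewer than 1000 numbers, and the mean and std of the survivors drift away from 550 and 30. A sketch that keeps drawing until 1000 in-range values are collected (effectively rejection sampling, equivalent to the truncnorm approach above):

import numpy as np

rng = np.random.default_rng()
kept = []
while len(kept) < 1000:
    draws = rng.normal(loc=550, scale=30, size=1000)
    kept.extend(d for d in draws if 500 < d < 600)
kept = np.array(kept[:1000])
print(len(kept), kept.mean(), kept.std())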
