Why does simple gradient descent diverge? - python

This is my second attempt at implementing gradient descent in one variable and it always diverges. Any ideas?
This is simple linear regression for minimizing the residual sum of squares in one variable.
def gradient_descent_wtf(xvalues, yvalues):
    tolerance = 0.1

    #y=mx+b
    #some line to predict y values from x values
    m=1.
    b=1.

    #a predicted y-value has value mx + b

    for i in range(0,10):
        #calculate y-value predictions for all x-values
        predicted_yvalues = list()
        for x in xvalues:
            predicted_yvalues.append(m*x + b)

        # predicted_yvalues holds the predicted y-values

        #now calculate the residuals = y-value - predicted y-value for each point
        residuals = list()
        number_of_points = len(yvalues)
        for n in range(0,number_of_points):
            residuals.append(yvalues[n] - predicted_yvalues[n])

        ## calculate the residual sum of squares from the residuals, that is,
        ## square each residual and add them all up. we will try to minimize
        ## the residual sum of squares later.
        residual_sum_of_squares = 0.
        for r in residuals:
            residual_sum_of_squares += r**2
        print("RSS = %s" % residual_sum_of_squares)

        #now make a version of the residuals which is multiplied by the x-values
        residuals_times_xvalues = list()
        for n in range(0,number_of_points):
            residuals_times_xvalues.append(residuals[n] * xvalues[n])

        #now create the sums for the residuals and for the residuals times the x-values
        residuals_sum = sum(residuals)
        residuals_times_xvalues_sum = sum(residuals_times_xvalues)

        # now multiply the sums by a positive scalar and add each to m and b.
        residuals_sum *= 0.1
        residuals_times_xvalues_sum *= 0.1
        b += residuals_sum
        m += residuals_times_xvalues_sum

        #and repeat until convergence.
        #convergence occurs when ||sum vector|| < some tolerance.
        # ||sum vector|| = sqrt( residuals_sum**2 + residuals_times_xvalues_sum**2 )
        #check for convergence
        magnitude_of_sum_vector = (residuals_sum**2 + residuals_times_xvalues_sum**2)**0.5
        if magnitude_of_sum_vector < tolerance:
            break

    return (b, m)
Result:
gradient_descent_wtf([1,2,3,4,5,6,7,8,9,10],[6,23,8,56,3,24,234,76,59,567])
RSS = 370433.0
RSS = 300170125.7
RSS = 4.86943013045e+11
RSS = 7.90447409339e+14
RSS = 1.28312217794e+18
RSS = 2.08287421094e+21
RSS = 3.38110045417e+24
RSS = 5.48849288217e+27
RSS = 8.90939341376e+30
RSS = 1.44624932026e+34
Out[108]:
(-3.475524066284303e+16, -2.4195981188763203e+17)

The gradients are huge -- hence you are following large vectors for long distances (0.1 times a large number is large). Find unit vectors in the appropriate direction. Something like this (with comprehensions replacing your loops):
def gradient_descent_wtf(xvalues, yvalues):
    tolerance = 0.1

    m=1.
    b=1.

    for i in range(0,10):
        predicted_yvalues = [m*x+b for x in xvalues]
        residuals = [y-y_hat for y,y_hat in zip(yvalues,predicted_yvalues)]
        residual_sum_of_squares = sum(r**2 for r in residuals) #only needed for debugging purposes
        print("RSS = %s" % residual_sum_of_squares)

        residuals_times_xvalues = [r*x for r,x in zip(residuals,xvalues)]
        residuals_sum = sum(residuals)
        residuals_times_xvalues_sum = sum(residuals_times_xvalues)

        # (residuals_sum, residuals_times_xvalues_sum) is a vector which points in the negative
        # gradient direction. *Find a unit vector which points in the same direction*
        magnitude = (residuals_sum**2 + residuals_times_xvalues_sum**2)**0.5
        residuals_sum /= magnitude
        residuals_times_xvalues_sum /= magnitude

        b += residuals_sum * (0.1)
        m += residuals_times_xvalues_sum * (0.1)

        #check for convergence -- this needs work!
        magnitude_of_sum_vector = (residuals_sum**2 + residuals_times_xvalues_sum**2)**0.5
        if magnitude_of_sum_vector < tolerance:
            break

    return (b, m)
For example:
>>> gradient_descent_wtf([1,2,3,4,5,6,7,8,9,10],[6,23,8,56,3,24,234,76,59,567])
RSS = 370433.0
RSS = 368732.1655050716
RSS = 367039.18363896786
RSS = 365354.0543519137
RSS = 363676.7775934381
RSS = 362007.3533123621
RSS = 360345.7814567845
RSS = 358692.061974069
RSS = 357046.1948108295
RSS = 355408.17991291644
(1.1157111313023558, 1.9932828425473605)
which is certainly much more plausible.
It isn't a trivial matter to make a numerically stable gradient-descent algorithm. You might want to consult a decent textbook in numerical analysis.

First, your code is basically right.
But you should think about the math when you do linear regression.
For example, if a residual is -205.8 and your learning rate is 0.1, you take a huge descent step of -20.58.
The step is so large that you overshoot and can never get back to the correct m and b. You have to make your step small enough.
There are two ways to make the gradient descent step reasonable (see the sketch below):
use a small learning rate, such as 0.001 or 0.0003;
divide your step by the number of input values.
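For illustration, here is a minimal sketch (not the questioner's code) combining both suggestions with the same model y = m*x + b as in the question; the function name, default learning rate, iteration count and tolerance are made-up example values:

def gradient_descent_scaled(xvalues, yvalues, learning_rate=0.001,
                            iterations=10000, tolerance=1e-6):
    # same model y = m*x + b and residuals as in the question
    m, b = 1.0, 1.0
    n = len(yvalues)
    for _ in range(iterations):
        residuals = [y - (m*x + b) for x, y in zip(xvalues, yvalues)]
        grad_b = sum(residuals) / n                           # averaged over n, not summed
        grad_m = sum(r*x for r, x in zip(residuals, xvalues)) / n
        b += learning_rate * grad_b
        m += learning_rate * grad_m
        if (grad_b**2 + grad_m**2)**0.5 < tolerance:
            break
    return (b, m)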

Related

What drives numerical instability in eigenvalue computations in python?

Let's say I have a data matrix X with num_samples = 1600 and dim_data = 2, from which I can build a 1600*1600 similarity matrix S using the RBF kernel. I can normalize each row of the matrix by multiplying all entries of the row by (1 / sum of the entries of the row). This procedure gives me a (square) right stochastic matrix, which we expect to have an eigenvalue equal to 1 associated with a constant eigenvector full of 1s.
We can easily check that this is indeed an eigenvector by taking its product with the matrix. However, the eigenvector that scipy.linalg.eig associates with eigenvalue 1 is only piecewise constant.
I have tried scipy.linalg.eig on similarly sized matrices with randomly generated data, which I transformed into stochastic matrices, and consistently obtained a constant eigenvector associated with eigenvalue 1.
My question is then: what factors may cause numerical instabilities when computing eigenvalues of stochastic matrices using scipy.linalg.eig?
Reproducible example:
import numpy as np
import scipy.linalg

def kernel(sigma,X):
    """
    param sigma: variance
    param X: (num_samples,data_dim)
    """
    squared_norm = np.expand_dims(np.sum(X**2,axis=1),axis=1) + np.expand_dims(np.sum(X**2,axis=1),axis=0) - 2*np.einsum('ni,mi->nm',X,X)
    return np.exp(-0.5*squared_norm/sigma**2)

def normalize(array):
    degrees = []
    M = array.shape[0]
    for i in range(M):
        norm = sum(array[i,:])
        degrees.append(norm)
    degrees_matrix = np.diag(np.array(degrees))
    P = np.matmul(np.linalg.inv(degrees_matrix),array)
    return P

#generate the data (params is defined elsewhere in the original script)
points = np.linspace(0,4*np.pi,1600)
Z = np.zeros((1600,2))
Z[0:800,:] = np.array([2.2*np.cos(points[0:800]),2.2*np.sin(points[0:800])]).T
Z[800:,:] = np.array([4*np.cos(points[0:800]),4*np.sin(points[0:800])]).T
X = np.zeros((1600,2))
X[:,0] = np.where(Z[:,1] >= 0, Z[:,0] + .8 + params[1], Z[:,0] - .8 + params[2])
X[:,1] = Z[:,1] + params[0]

#create the stochastic matrix P
P = normalize(kernel(.05,X))

#inspect the eigenvectors (plot_array is a plotting helper defined elsewhere)
e,v = scipy.linalg.eig(P)
p = np.flip(np.argsort(e))
e = e[p]
v = v[:,p]
plot_array(v[:,0])

#check on synthetic data:
Y = np.random.normal(size=(1600,2))
P = normalize(kernel(.05,Y))   # sigma value assumed; the original snippet omitted it

#inspect the eigenvectors
e,v = scipy.linalg.eig(P)
p = np.flip(np.argsort(e))
e = e[p]
v = v[:,p]
plot_array(v[:,0])
Using the code provided by Ahmed AEK, here are some results on the divergence of the obtained eigenvector from the constant eigenvector.
[-1.36116641e-05 -1.36116641e-05 -1.36116641e-05 ... 5.44472888e-06
5.44472888e-06 5.44472888e-06]
norm = 0.9999999999999999
max difference = 0.04986484253966891
max difference / element value = -3663.3906291852545
UPDATE:
I have observed that a low value of sigma in the construction of the kernel matrix produces a less sharp decay in the (sorted) eigenvalues. In fact, for sigma=0.05, the first 4 eigenvalues produced by scipy.linalg.eig are rounded up to 1. This may be linked to the imprecision in the eigenvectors. When sigma is increased to 0.5, I do obtain a constant eigenvector.
[Figure: first 5 eigenvectors in the sigma=0.05 case]
[Figure: first 5 eigenvectors in the sigma=0.5 case]
A 64-bit float carries only about 14-16 significant decimal digits, which means that any result can only be expected to be accurate to roughly 14 digits.
Using the code below you can check this result:
Y = np.random.normal(size=(1600,2))
P = normalize(kernel(5,Y))
P = P / np.sum(P,axis=1)

#inspect the eigenvectors
e,v = np.linalg.eig(P)
p = np.flip(np.argsort(e))
a = np.isclose(e,1)
e1 = e[a]
v1 = v[:,a]
v11 = v1[:,0]
print(v11)
print('norm = ',np.sum(v11**2))
print('max difference = ',np.amax(np.abs(np.diff(v11))))
print('max difference / element value =',np.amax(np.abs(np.diff(v11)))/v11[0])
result is:
[0.025+0.j 0.025+0.j 0.025+0.j ... 0.025+0.j 0.025+0.j 0.025+0.j]
norm = (1+0j)
max difference = 1.97758476261356e-16
max difference / element value = (7.91033905045416e-15+0j)
As you can see, the differences are accurate to within 8e-15, which is around 14 digits of precision; the norm will sometimes come out as 0.99999999999998, which is likewise within 14 digits of precision.
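As a quick sanity check of that figure (not part of the original answer), numpy can report the relative spacing of 64-bit floats directly:

import numpy as np

# machine epsilon of float64: the relative spacing between adjacent
# representable numbers, about 2.2e-16, i.e. roughly 15-16 significant digits
print(np.finfo(np.float64).eps)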

Smooth a curve in Python while preserving the value and slope at the end points

I actually have two solutions to this problem; they are both applied to a test case below. The thing is that neither of them is perfect: the first one only takes the two end points into account, and the other cannot be made "arbitrarily smooth": there is a limit to the amount of smoothness one can achieve (the one I am showing).
I am sure there is a better solution, one that goes gradually from the first solution to the second and all the way to no smoothing at all. It may already be implemented somewhere. Maybe by solving a minimization problem with an arbitrary number of equidistributed splines?
Thank you very much for your help
Ps: the seed used is a challenging one
import sys
import matplotlib.pyplot as plt
from scipy import interpolate
from scipy.signal import savgol_filter
import numpy as np

def scipy_bspline(cv, n=100, degree=3, periodic=False):
    """ Calculate n samples on a bspline
        cv :      Array of control vertices
        n  :      Number of samples to return
        degree:   Curve degree
        periodic: whether the curve is closed (open curve assumed here)
    """
    cv = np.asarray(cv)
    count = cv.shape[0]
    degree = np.clip(degree,1,count-1)
    kv = np.clip(np.arange(count+degree+1)-degree,0,count-degree)

    # Return samples
    max_param = count - (degree * (1-periodic))
    spl = interpolate.BSpline(kv, cv, degree)
    return spl(np.linspace(0,max_param,n))

def round_up_to_odd(f):
    return int(np.ceil(f / 2.) * 2 + 1)

def generateRandomSignal(n=1000, seed=None):
    """
    Parameters
    ----------
    n : integer, optional
        Number of points in the signal. The default is 1000.

    Returns
    -------
    sig : numpy array
    """
    np.random.seed(seed)
    print("Seed was:", seed)
    steps = np.random.choice(a=[-1, 0, 1], size=(n-1))
    roughSig = np.concatenate([np.array([0]), steps]).cumsum(0)
    sig = savgol_filter(roughSig, round_up_to_odd(n/10), 6)
    return sig

# Generate a random signal to illustrate my point
n = 1000
t = np.linspace(0, 10, n)
seed = 45136  # Challenging seed
sig = generateRandomSignal(n=1000, seed=seed)
sigInit = np.copy(sig)

# Add noise to the signal
mean = 0
std = sig.max()/3.0
num_samples = n//5
idxMin = n//2 - 100
idxMax = idxMin + num_samples
tCut = t[idxMin+1:idxMax]
noise = np.random.normal(mean, std, size=num_samples-1) + 2*std*np.sin(2.0*np.pi*tCut/0.4)
sig[idxMin+1:idxMax] += noise

# Define filtering range enclosing the noisy area of the signal
idxMin -= 20
idxMax += 20

# Extreme filtering solution
# Spline between first and last points, the points in between have no influence
sigTrim = np.delete(sig, np.arange(idxMin,idxMax))
tTrim = np.delete(t, np.arange(idxMin,idxMax))
f = interpolate.interp1d(tTrim, sigTrim, kind='quadratic')
sigSmooth1 = f(t)

# My attempt. Not bad but not perfect because there is a limit in the maximum
# amount of smoothing we can add (degree=len(tSlice) is the maximum).
# If I could do degree=10*len(tSlice) and converge to the first solution
# I would be done!
sigSlice = sig[idxMin:idxMax]
tSlice = t[idxMin:idxMax]
cv = np.stack((tSlice, sigSlice)).T
p = scipy_bspline(cv, n=len(tSlice), degree=len(tSlice))
tSlice = p.T[0]
sigSliceSmooth = p.T[1]
sigSmooth2 = np.copy(sig)
sigSmooth2[idxMin:idxMax] = sigSliceSmooth

# Plot
plt.figure()
plt.plot(t, sig, label="Signal")
plt.plot(t, sigSmooth1, label="Solution 1")
plt.plot(t, sigSmooth2, label="Solution 2")
plt.plot(t[idxMin:idxMax], sigInit[idxMin:idxMax], label="What I'd want (kind of, smoother will be even better actually)")
plt.plot([t[idxMin],t[idxMax]], [sig[idxMin],sig[idxMax]],"o")
plt.legend()
plt.show()
sys.exit()
Yes, a minimization is a good way to approach this smoothing problem.
Least squares problem
Here is a suggestion for a least squares formulation: let s[0], ..., s[N] denote the N+1 samples of the given signal to smooth, and let L and R be the desired slopes to preserve at the left and right endpoints. Find the smoothed signal u[0], ..., u[N] as the minimizer of
min_u (1/2) sum_n (u[n] - s[n])² + (λ/2) sum_n (u[n+1] - 2 u[n] + u[n-1])²
subject to
s[0] = u[0], s[N] = u[N] (value constraints),
L = u[1] - u[0], R = u[N] - u[N-1] (slope constraints),
where in the minimization objective, the sums are over n = 1, ..., N-1 and λ is a positive parameter controlling the smoothing strength. The first term tries to keep the solution close to the original signal, and the second term penalizes u for bending to encourage a smooth solution.
The slope constraints require that
u[1] = L + u[0] = L + s[0] and u[N-1] = u[N] - R = s[N] - R. So we can consider the minimization as over only the interior samples u[2], ..., u[N-2].
Finding the minimizer
The minimizer satisfies the Euler–Lagrange equations
(u[n] - s[n]) / λ + (u[n+2] - 4 u[n+1] + 6 u[n] - 4 u[n-1] + u[n-2]) = 0
for n = 2, ..., N-2.
An easy way to find an approximate solution is by gradient descent: initialize u = np.copy(s), set u[1] = L + s[0] and u[N-1] = s[N] - R, and do 100 iterations or so of
u[2:-2] -= 0.05 * ((u - s)[2:-2] / λ + np.convolve(u, [1, -4, 6, -4, 1])[4:-4])
But with some more work, it is possible to do better than this by solving the E–L equations directly. For each n, move the known quantities to the right-hand side: s[n] and also the endpoints u[0] = s[0], u[1] = L + s[0], u[N-1] = s[N] - R, u[N] = s[N]. Then you have a linear system "A u = b", where the matrix A has rows like
0, ..., 0, 1, -4, (6 + 1/λ), -4, 1, 0, ..., 0.
Finally, solve the linear system to find the smoothed signal u. You could use numpy.linalg.solve to do this if N is not too large, or if N is large, try an iterative method like conjugate gradients.
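As a sketch of that last step (my own illustration, not the answerer's code), the system can be assembled densely with numpy and solved with numpy.linalg.solve; the function name smooth_with_constraints and the variable lam (playing the role of λ) are made up for this example:

import numpy as np

def smooth_with_constraints(s, L, R, lam=1e3):
    """Assemble and solve the system A u = b described above.
    s: samples s[0..N]; L, R: end slopes; lam: smoothing strength (lambda)."""
    s = np.asarray(s, dtype=float)
    N = len(s) - 1
    u = s.copy()
    # endpoint values and slopes are imposed directly
    u[0], u[N] = s[0], s[N]
    u[1], u[N-1] = s[0] + L, s[N] - R

    M = N - 3                          # interior unknowns u[2..N-2] (assumes N is not tiny)
    A = np.zeros((M, M))
    b = s[2:N-1] / lam                 # s[n]/lam for n = 2..N-2
    for i in range(M):                 # row i corresponds to n = i + 2
        A[i, i] = 6.0 + 1.0/lam
        if i >= 1:   A[i, i-1] = -4.0
        if i >= 2:   A[i, i-2] = 1.0
        if i <= M-2: A[i, i+1] = -4.0
        if i <= M-3: A[i, i+2] = 1.0
    # move the known boundary samples u[0], u[1], u[N-1], u[N] to the right-hand side
    b[0]  += 4.0*u[1] - u[0]
    b[1]  -= u[1]
    b[-2] -= u[N-1]
    b[-1] += 4.0*u[N-1] - u[N]

    u[2:N-1] = np.linalg.solve(A, b)
    return u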
You can apply a simple smoothing method and plot the smoothed curves with different smoothness values to see which one works best.
def smoothing(data, smoothness=0.5):
    last = data[0]
    new_data = [data[0]]
    for datum in data[1:]:
        new_value = smoothness * last + (1 - smoothness) * datum
        new_data.append(new_value)
        last = datum
    return new_data
You can plot this curve for multiple values of smoothness and pick the one that suits your needs. You can also apply this method to only a range of values in the actual curve by defining a start and end index.
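For instance, a possible usage sketch (not part of the answer) comparing a few smoothness values on synthetic data, reusing the smoothing function above:

import numpy as np
import matplotlib.pyplot as plt

t = np.linspace(0, 10, 500)
noisy = np.sin(t) + 0.3*np.random.randn(500)

plt.plot(t, noisy, label="raw")
for s in (0.3, 0.6, 0.9):
    # larger smoothness keeps more of the previous value, giving a flatter curve
    plt.plot(t, smoothing(noisy, smoothness=s), label="smoothness=%s" % s)
plt.legend()
plt.show()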

How can I stop my Runge-Kutta2 (Heun) method from exploding?

I am currently trying to write some python code to solve an arbitrary system of first order ODEs, using a general explicit Runge-Kutta method defined by the values alpha, gamma (both vectors of dimension m) and beta (lower triangular matrix of dimension m x m) of the Butcher table which are passed in by the user. My code appears to work for single ODEs, having tested it on a few different examples, but I'm struggling to generalise my code to vector valued ODEs (i.e. systems).
In particular, I try to solve a Van der Pol oscillator ODE (reduced to a first order system) using Heun's method defined by the Butcher Tableau values given in my code, but I receive the errors
"RuntimeWarning: overflow encountered in double_scalars f = lambda t,u: np.array(... etc)" and
"RuntimeWarning: invalid value encountered in add kvec[i] = f(t+alpha[i]*h,y+h*sum)"
followed by my solution vector that is clearly blowing up. Note that the commented out code below is one of the examples of single ODEs that I tried and is solved correctly. Could anyone please help? Here is my code:
import numpy as np

def rk(t,y,h,f,alpha,beta,gamma):
    '''Runge-Kutta iteration'''
    return y + h*phi(t,y,h,f,alpha,beta,gamma)

def phi(t,y,h,f,alpha,beta,gamma):
    '''Phi function for the Runge-Kutta iteration'''
    m = len(alpha)
    count = np.zeros(len(f(t,y)))
    kvec = k(t,y,h,f,alpha,beta,gamma)
    for i in range(1,m+1):
        count = count + gamma[i-1]*kvec[i-1]
    return count

def k(t,y,h,f,alpha,beta,gamma):
    '''returns a vector containing each stage k_{i} of the m-stage Runge-Kutta method'''
    m = len(alpha)
    kvec = np.zeros((m,len(f(t,y))))
    kvec[0] = f(t,y)
    for i in range(1,m):
        sum = np.zeros(len(f(t,y)))
        for l in range(1,i+1):
            sum = sum + beta[i][l-1]*kvec[l-1]
        kvec[i] = f(t+alpha[i]*h,y+h*sum)
    return kvec

def timeLoop(y0,N,f,alpha,beta,gamma,h,rk):
    '''function that loops through time using the RK method'''
    t = np.zeros([N+1])
    y = np.zeros([N+1,len(y0)])
    y[0] = y0
    t[0] = 0
    for i in range(1,N+1):
        y[i] = rk(t[i-1],y[i-1], h, f,alpha,beta,gamma)
        t[i] = t[i-1]+h
    return t,y

#################################################################

'''f = lambda t,y: (c-y)**2
Y = lambda t: np.array([(1+t*c*(c-1))/(1+t*(c-1))])
h0 = 1
c = 1.5
T = 10

alpha = np.array([0,1])
gamma = np.array([0.5,0.5])
beta = np.array([[0,0],[1,0]])

eff_rk = compute(h0,Y(0),T,f,alpha,beta,gamma,rk, Y,11)'''

#constants
mu = 100
T = 1000
h = 0.01
N = int(T/h)

#initial conditions
y0 = 0.02
d0 = 0
init = np.array([y0,d0])

#Butcher Tableau for Heun's method
alpha = np.array([0,1])
gamma = np.array([0.5,0.5])
beta = np.array([[0,0],[1,0]])

#rhs of the ode system
f = lambda t,u: np.array([u[1],mu*(1-u[0]**2)*u[1]-u[0]])

#solving the system
time, sol = timeLoop(init,N,f,alpha,beta,gamma,h,rk)
print(sol)
Your step size is not small enough. The Van der Pol oscillator with mu=100 is a fast-slow system with very sharp turns at the switching of the modes, so rather stiff. With explicit methods this requires small step sizes, the smallest sensible step size is 1e-5 to 1e-6. You get a solution on the limit cycle already for h=0.001, with resulting velocities up to 150.
You can reduce some of that stiffness by using a different velocity/impulse variable. In the equation
x'' - mu*(1-x^2)*x' + x = 0
you can combine the first two terms into a derivative,
mu*v = x' - mu*(1-x^2/3)*x
so that
x' = mu*(v+(1-x^2/3)*x)
v' = -x/mu
The second equation is now uniformly slow close to the limit cycle, while the first has long relatively straight jumps when v leaves the cubic v=x^3/3-x.
This integrates nicely with the original h=0.01, keeping the solution inside the box [-3,3]x[-2,2], even if it shows some strange oscillations that are not present for smaller step sizes or in the exact solution.
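For example, here is a sketch (not from the answer) of the transformed right-hand side plugged into the question's timeLoop, reusing the question's rk, alpha, beta and gamma; v(0) is chosen so that x'(0) matches the original initial condition:

import numpy as np

# transformed system from the answer:
#   x' = mu*(v + (1 - x**2/3)*x)
#   v' = -x/mu
mu = 100
f2 = lambda t, u: np.array([mu*(u[1] + (1 - u[0]**2/3)*u[0]), -u[0]/mu])

h = 0.01
T = 1000
N = int(T/h)

# choose v(0) consistent with the original initial condition x(0)=0.02, x'(0)=0
x0, dx0 = 0.02, 0.0
v0 = dx0/mu - (1 - x0**2/3)*x0
init2 = np.array([x0, v0])

time2, sol2 = timeLoop(init2, N, f2, alpha, beta, gamma, h, rk)
print(sol2)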

How to implement this statistical error for Monte Carlo sampling 2D Ising model?

I'm having a problem with the Python implementation of a statistical error equation that I found in "Monte Carlo Simulations in Statistical Physics" by Binder (2014, page 94). First of all, the main body of code of concern for my algorithm looks like this:
SCALE_11 = DISCARDED_SAMPS/(MC_STEPS*L*L)
SCALE_22 = DISCARDED_SAMPS*DISCARDED_SAMPS*SCALE_11/MC_STEPS

for k in range(NT):
    E1 = E2 = M1 = M2 = np.float_(0.0)
    inv_T = 1.0 / T[k]  # beta = 1/(k_B*Temperature), with k_B = 1
    prob = np.exp(-4*inv_T*np.array([0,1,2], dtype=np.uint8))
    p_i = []
    lattice = 2*np.random.randint(2, size=(L, L), dtype=np.int16)-1

    for i in range(EQ_STEPS):            # equilibrate
        sweep(lattice, prob)             # 1 sweep = L**2 Monte Carlo moves per EQ_STEP

    for i in range(MC_STEPS):
        sweep(lattice, prob)
        if i % DISCARDED_SAMPS == 0:     # observe every DISCARDED_SAMPS sweeps
            energy = hamiltonian(lattice)   # calculate the energy
            mag = abs(np.sum(lattice))      # calculate the magnetisation
            E1 += energy                    # the estimator
            M1 += mag                       # the estimator
            E2 += energy*energy             # the estimator
            M2 += mag*mag                   # the estimator

    E[k] = SCALE_11*E1
    M[k] = SCALE_11*M1
    C[k] = (SCALE_11*E2 - SCALE_22*E1*E1)*inv_T*inv_T
    X[k] = (SCALE_11*M2 - SCALE_22*M1*M1)*inv_T
Basically, in my algorithm I iterate over a range of different temperatures and do the Monte Carlo simulation at each one. At each temperature my first loop equilibrates the system and my second loop sweeps it further, making an observation every DISCARDED_SAMPS (DS) sweeps because consecutive samples are correlated.
E, M, C, X are the energy, magnetisation, specific heat and susceptibility (all per spin), respectively. The specific heat and susceptibility have the fluctuation form C = (<E^2> - <E>^2)/(k_B*T^2) and X = (<M^2> - <M>^2)/(k_B*T), with k_B = 1, which is what the code above computes.
In my algorithm <E^2> and <E> are accumulated in the code above, so computing C and X was quite straightforward. But now I'm looking to implement a statistical error equation from the aforementioned textbook [equation image from the book, not reproduced here], where A is an observable which can be E, M, etc. I'm having trouble implementing this equation if we take n as the number of times i % DISCARDED_SAMPS == 0 happened, A^2 and A as averages over the MC_STEPS, and A_0 as the observable before the first time i % DISCARDED_SAMPS == 0 happened. How could one implement this in Python?

Why does the code at the end give a singular matrix error?

I tried the mixture of Gaussians model and am facing several problems. I have pasted my whole code at ideone: https://ideone.com/dNYtZ2
When I try to run fitMixGauss(data, k), I get a singular matrix error from the function below.
def fitMixGauss(data, k):
    """
    Estimate a k MoG model that would fit the data. Incrementally plots the outcome.

    Keyword arguments:
    data -- d by n matrix containing data points.
    k -- scalar representing the number of gaussians to use in the MoG model.

    Returns:
    mixGaussEst -- dict containing the estimated MoG parameters.
    """
    # MAIN E-M ROUTINE
    # In the E-M algorithm, we calculate a complete posterior distribution over
    # the (nData) hidden variables in the E-Step.
    # In the M-Step, we update the parameters of the Gaussians (mean, cov, w).
    nDims, nData = data.shape
    postHidden = np.zeros(shape=(k, nData))

    # we will initialize the values to random values
    mixGaussEst = dict()
    mixGaussEst['d'] = nDims
    mixGaussEst['k'] = k
    mixGaussEst['weight'] = (1 / k) * np.ones(shape=(k))
    mixGaussEst['mean'] = 2 * np.random.randn(nDims, k)
    mixGaussEst['cov'] = np.zeros(shape=(nDims, nDims, k))
    for cGauss in range(k):
        mixGaussEst['cov'][:, :, cGauss] = 2.5 + 1.5 * np.random.uniform() * np.eye(nDims)

    # calculate current likelihood
    # TO DO - fill in this routine
    logLike = getMixGaussLogLike(data, mixGaussEst)
    print('Log Likelihood Iter 0 : {:4.3f}\n'.format(logLike))

    nIter = 30
    logLikeVec = np.zeros(shape=(2 * nIter))
    boundVec = np.zeros(shape=(2 * nIter))

    fig, ax = plt.subplots(1, 1)

    for cIter in range(nIter):
        # ===================== =====================
        # Expectation step
        # ===================== =====================
        curCov = mixGaussEst['cov']
        curWeight = mixGaussEst['weight']
        curMean = mixGaussEst['mean']
        num = np.zeros(shape=(k, nData))
        for cData in range(nData):
            # TO DO (g): fill in column of 'hidden' - calculate posterior probability that
            # this data point came from each of the Gaussians
            # replace this:
            thisData = data[:, cData]
            #for c in range(k):
            #    num[c] = mixGaussEst['weight'][c] * (1/((2*np.pi)**(nDims)*np.linalg.det(mixGaussEst['cov'][:,:,c]))**(1/2))*np.exp(-0.5*(np.transpose(thisData-mixGaussEst['mean'][:,c])))#np.linalg.inv(mixGaussEst['cov'][:,:,c])#(thisData-mixGaussEst['mean'][:,c])
            denominatorExp = 0
            for j in range(k):
                mu = curMean[:, j]
                sigma = curCov[:, :, j]
                curNorm = (1/((2*np.pi)**(nDims)*np.linalg.det(sigma))**(1/2))*np.exp(-0.5*(np.transpose(thisData-mu)))#np.linalg.inv(sigma)#(mu)
                num[j, cData] = curWeight[j]*curNorm
                denominatorExp = denominatorExp + num[j, cData]
            postHidden[:, cData] = num[:, cData]/denominatorExp

        # ===================== =====================
        # Maximization Step
        # ===================== =====================
        # for each constituent Gaussian
        for cGauss in range(k):
            # TO DO (h): Update weighting parameters mixGauss.weight based on the total
            # posterior probability associated with each Gaussian. Replace this:
            #mixGaussEst['weight'][cGauss] = mixGaussEst['weight'][cGauss]
            sum_Kth_Gauss_Resp = np.sum(postHidden[cGauss, :])
            mixGaussEst['weight'][cGauss] = sum_Kth_Gauss_Resp / np.sum(postHidden)
            #mixGaussEst['weight'][cGauss] = np.sum(postHidden[cGauss,:])/sum(sum(postHidden[:,:]))

            # TO DO (i): Update mean parameters mixGauss.mean by weighted average
            # where weights are given by posterior probability associated with
            # the Gaussian. Replace this:
            #mixGaussEst['mean'][:,cGauss] = mixGaussEst['mean'][:,cGauss]
            numerator = 0
            for j in range(nData):
                numerator = numerator + postHidden[cGauss, j]*data[:, j]
            numerator = np.dot(postHidden[cGauss, :], data[0, :])
            mixGaussEst['mean'][:, cGauss] = numerator / sum_Kth_Gauss_Resp

            # TO DO (j): Update covariance parameter based on weighted average of
            # square distance from the updated mean, where weights are given by
            # posterior probability associated with the Gaussian.
            #mixGaussEst['cov'][:,:,cGauss] = mixGaussEst['cov'][:,:,cGauss]
            muMatrix = mixGaussEst['mean'][:, cGauss]
            muMatrix = muMatrix.reshape((2, 1))
            numerator = 0
            for j in range(nData):
                kk = data[:, j]
                kk.reshape((2, 1))
                numerator_i = postHidden[cGauss, j]*(kk-muMatrix)#np.transpose(kk-muMatrix)
                numerator = numerator + numerator_i
            mixGaussEst['cov'][:, :, cGauss] = numerator / sum_Kth_Gauss_Resp

        # draw the new solution
        drawEMData2d(data, mixGaussEst)
        time.sleep(0.7)
        fig.canvas.draw()

        # calculate the log likelihood
        logLike = getMixGaussLogLike(data, mixGaussEst)
        print('Log Likelihood After Iter {} : {:4.3f}\n'.format(cIter, logLike))

    return mixGaussEst
The log likelihood comes out as nan. Why does the whole code (on ideone) give a singular matrix error at the end?
