Make a standard deviation always positive in Metropolis-Hastings - python

I am trying Bayesian regression using Metropolis-Hastings. The test data is generated as follows (Python code, I didn't copy the entire code):
trueA = 5 ; trueB = 7 ;trueSD = 10 ; sample_size = 261
x = np.arange(-sample_size/8, sample_size/8, (sample_size*2/8)/sample_size)
y = trueA *x + trueB + npr.normal(loc=0, scale=trueSD, size=sample_size)
I defined log likelihood as follows:
def likelihood(param):
a = param[0][0] ; b = param[0][1] ; sd = param[0][2] ; pred = a*x + b
sumSqError = np.power((y - pred), 2).sum()
likelihoodsum = ((sample_size/2)*(np.log(1)-np.log(np.power(sd,2)))) + (- 1/(2*np.power(sd,2)) * sumSqError)
return likelihoodsum
To make next points, I prepared the following function:
def next_param(param, param_index):
a_next = param[0][0] ; b_next = param[0][1] ; sd_next = param[0][2]
if param_index == 0:
a_next = param[0][0] + npr.normal(0, 0.1)
elif param_index == 1:
b_next = param[0][1] + npr.normal(0, 0.1)
elif param_index == 2:
sd_next = param[0][2] + npr.normal(0, 0.1)
return np.array([[a_next, b_next, sd_next]])
This code works well (acceptance rate is high enough and I can estimate the parameters), though I know sd_next can go negative in the above code, which is weird.
So, I decided to use log for sd_next:
elif param_index == 2:
sd_next = np.log(param[0][2]) + npr.normal(0, 0.1)
return np.array([[a_next, b_next, np.exp(sd_next)]])
However, the estimated parameters are far from the true values. How can I make a standard deviation always positive in Metropolis-Hastings?
JFI, here is MCMC part:
num_sampling = 1000
chain = np.zeros((num_sampling, 1, 3))
chain[0][0][0] = 20 # starting value for a
chain[0][0][1] = 15 # starting value for b
chain[0][0][2] = 15 # starting value for sd
num_accepted = 0
for i in range(num_sampling-1):
chain_previous = chain[i][:]
chain_new = np.zeros((1, 1, 3))
for p in range(3):
proposal = next_param(chain_previous, p)
probab = likelihood(proposal) - likelihood(chain_previous)
if 0 < probab:
chain_new[0][0][p] = proposal[0][p]
num_accepted += 1
else:
chain_new[0][0][p] = chain[i][0][p]
chain[i+1] = chain_new[0][:]

It is not weird at all that you get a negative standard deviation $\sigma$ when your proposal is a Normal distribution, whose support is $(-\infty,+\infty)$.
The Metropolis-Hastings accept-reject step should also include the prior distribution on the three parameters, as well as the Jacobian of the transform when the proposal is made on $\log\sigma$.
As written, the Metropolis-Hastings accept-reject step is incorrect!
if 0 < probab:
is not the right condition for accepting a move to the proposed value: one should compare the (log-)probability with a (log-)uniform. In its current form, the chain converges to a maximum of the likelihood.
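A minimal sketch of what a corrected update could look like, using only the standard library. It assumes flat (improper) priors on a, b and sd > 0, which is my assumption and not something stated in the question; the names log_likelihood and mh_update are hypothetical. The proposal for sd is a random walk on log(sd), so the acceptance ratio carries the Jacobian term log(sd_new) - log(sd_old), and the move is accepted by comparing against the log of a uniform draw.

```python
import math
import random


def log_likelihood(a, b, sd, xs, ys):
    """Gaussian log-likelihood of y = a*x + b + noise, up to an additive constant."""
    sse = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    return -len(xs) * math.log(sd) - sse / (2.0 * sd ** 2)


def mh_update(params, index, xs, ys, rng, step=0.1):
    """One Metropolis-Hastings update of params[index]; returns the new state."""
    proposal = list(params)
    log_hastings = 0.0
    if index == 2:
        # Random walk on log(sd): the proposed sd is positive by construction.
        proposal[2] = params[2] * math.exp(rng.gauss(0.0, step))
        # Jacobian of the log transform: q(old|new) / q(new|old) = new / old.
        log_hastings = math.log(proposal[2]) - math.log(params[2])
    else:
        proposal[index] = params[index] + rng.gauss(0.0, step)
    # Flat priors (an assumption of this sketch) add nothing to the ratio.
    log_ratio = (log_likelihood(*proposal, xs, ys)
                 - log_likelihood(*params, xs, ys) + log_hastings)
    # Accept with probability min(1, exp(log_ratio)); 1 - random() lies in (0, 1].
    if math.log(1.0 - rng.random()) < log_ratio:
        return proposal
    return list(params)


rng = random.Random(0)
xs = [i / 4.0 for i in range(-50, 50)]
ys = [5.0 * x + 7.0 + rng.gauss(0.0, 10.0) for x in xs]

chain = [[20.0, 15.0, 15.0]]  # same starting values as in the question
for _ in range(3000):
    state = chain[-1]
    for p in range(3):
        state = mh_update(state, p, xs, ys, rng)
    chain.append(state)
```

With a proper prior, its log density difference would simply be added to log_ratio.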

Invalid Syntax error on addition when running FEniCS in Windows subsystem

I am currently working on a project where we are solving a system of PDEs in FEniCS. I have written the following code to solve the system, but I get an invalid syntax error on
a = a0 + a1
I am not that good at Python and I have never used FEniCS before. I am also running it through a Windows subsystem, which makes it harder for me to understand any error I might have made. I appreciate any suggestions you may have, and I apologize in advance if I ask obvious questions!
from fenics import *
# Create mesh and define function space
mesh = Mesh (" circle.xml ")
# Construct the finite element space
V = VectorFunctionSpace (mesh , 'P', 1)
# Define parameters :
T = 150
dt = 0.5
alpha = 0.4
beta = 2
gamma = 0.8
delta = 1
# Class representing the intial conditions
class InitialConditions ( UserExpression ):
def eval (self , values , x):
values [0] = Expression(("(4/25)-2*pow(10,-7)*(x[0]-0.1*x[1]-225)*(x[0]-0.1*x[1]-675)"),degree=2)
values [1] = Expression(("(22/45)-3*pow(10,-5)*(x[0]-450)-1.2*pow(10,-4)*(x[1]-150)"),degree=2)
def value_shape ( self ):
return (2 ,)
# Define initial condition
indata = InitialConditions(degree =2)
u0 = Function (V)
u0 = interpolate (indata,V)
# Test and trial functions
u,v = TrialFunction(V), TestFunction(V)
# Create bilinear and linear forms
a0 = (u[0]*v[0]*dx) + (0.5*delta*dt*inner(grad(u[0]),grad(v[0]))*dx)
a1 = (u[1]*v[1]*dx) + (0.5*delta*dt*inner(grad(u[1]),grad(v[1]))*dx)
L0 = (u0[0]*v[0]*dx) - (0.5*delta*dt*inner(grad(u0[0]),grad(v[0]))*dx) - (dt*u0[0]*v[0]*dx*(((u0[0]*u0[1])/(u0[0]+alpha))-u0[0]*(1-u0[0]))*dx)
L1 = (u0[1]*v[1]*dx) - (0.5*delta*dt*inner(grad(u0[1]),grad(v[1]))*dx) - (dt*u0[1]*v[1]*dx*(-beta*((u0[0]*u0[1])/(u0[0]+alpha))-gamma*u0[1]*dx)
a = a0 + a1
L = L0 + L1
#Set up boundary condition
g = Constant([0.0,0.0])
bc = DirichletBC(V,u_initial,DirichletBoundary())
bc = [] #NEUMANN
#Assemble matrix
A = assemble(a)
# Set an output file
out_file = File("Results.pvd","compressed")
# Set initial condition
u = Function(V)
u.assign(u0)
t = 0.0
out_file << (u,t)
u_initial = Function(V)
u_initial.assign(u0)
t_save = 0
num_samples = 20
# Time - stepping
while t < T:
# assign u0
u0.assign(u)
#Assemble vector and apply boundary conditions
A = assemble(a)
b = asseble(L)
t_save += dt
if t_save > T/ num_samples or t >= T-dt:
print("Saving!")
#Save the solution to file
out_file << (uv,t)
t_save = 0
#Move to next interval and adjust boundary condition
t += dt
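There's a typo in your line b = asseble(L); it should be b = assemble(L).
Perhaps this tiny error is giving you the issue? That said, a misspelled name would normally raise a NameError at run time rather than a syntax error, so I'd imagine the error message would be more descriptive.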
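Since Python often reports a syntax error on the line after an unclosed bracket, it may also be worth counting the parentheses on the lines just above a = a0 + a1. A rough sketch of such a check (the helper name unclosed_parentheses is hypothetical, and it ignores parentheses inside string literals) suggests that the line defining L1 opens one more parenthesis than it closes:

```python
def unclosed_parentheses(line):
    """Return the net number of '(' in `line` that are never closed."""
    depth = 0
    for ch in line:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
    return depth


# The L1 line from the question, copied as a plain string.
l1_line = ("L1 = (u0[1]*v[1]*dx) - (0.5*delta*dt*inner(grad(u0[1]),grad(v[1]))*dx)"
           " - (dt*u0[1]*v[1]*dx*(-beta*((u0[0]*u0[1])/(u0[0]+alpha))-gamma*u0[1]*dx)")

print(unclosed_parentheses(l1_line))        # 1
print(unclosed_parentheses("a = a0 + a1"))  # 0
```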

On the implementation of a simple ant colony algorithm

In this paper, a very simple model is described to illustrate how the ant colony algorithm works. In short, it assumes two nodes which are connected via two links one of which is shorter. Then, given a pheromone increment and a pheromone evaporation dynamics, one expects that all ants eventually pick the shorter path.
Now, I'm trying to replicate the simulation of this paper corresponding to scenario above whose result should be (more or less) like below.
Here is an implementation of mine (taking the same specification as that of the test above).
import random
import matplotlib.pyplot as plt
N = 10
l1 = 1
l2 = 2
ru = 0.5
Q = 1
tau1 = 0.5
tau2 = 0.5
epochs = 150
success = [0 for x in range(epochs)]
def compute_probability(tau1, tau2):
return tau1/(tau1 + tau2), tau2/(tau1 + tau2)
def select_path(prob1, prob2):
if prob1 > prob2:
return 1
if prob1 < prob2:
return 2
if prob1 == prob2:
return random.choice([1,2])
def update_accumulation(link_id):
global tau1
global tau2
if link_id == 1:
tau1 += Q / l1
return tau1
if link_id == 2:
tau2 += Q / l2
return tau2
def update_evapuration():
global tau1
global tau2
tau1 *= (1-ru)
tau2 *= (1-ru)
return tau1, tau2
def report_results(success):
plt.plot(success)
plt.show()
for epoch in range(epochs-1):
temp = 0
for ant in range(N-1):
prob1, prob2 = compute_probability(tau1, tau2)
selected_path = select_path(prob1,prob2)
if selected_path == 1:
temp += 1
update_accumulation(selected_path)
update_evapuration()
success[epoch] = temp
report_results(success)
However, what I get is fairly weird as below.
It seems that my understanding of how pheromone should be updated is flawed.
So, can one address what I am missing in this implementation?
Three problems in the proposed approach:
1. As @Mark mentioned in his comment, you need a weighted random choice. Otherwise the proposed approach will likely always pick one of the paths, and the plot will be a straight line as you show above. However, this is only part of the solution: even with a weighted choice you will likely still get a straight line because of early convergence, which leads to problem two.
2. Ant Colony Optimization is a metaheuristic that needs several (hyper)parameters configured to guide the search for a solution (e.g., tau from above or the number of ants). Fine-tuning these parameters is important because you can converge early on a particular result (which is fine to some extent, if you want to use it as a heuristic). But the purpose of a metaheuristic is to provide some middle ground between exact and heuristic algorithms, which makes continuous exploration/exploitation an important part of how it works. This means the parameters need to be carefully optimised for your problem size/type.
3. Given that ACO uses a probabilistic approach for guiding the search (and as the plot from the referenced paper shows), you will need to run the experiment several times and compute some statistic on those numbers. In my case below, I computed the average over multiple samples (set by the samples variable).
import random
import matplotlib.pyplot as plt

N = 10
l1 = 1.1
l2 = 1.5
ru = 0.05
Q = 1
tau1 = 0.5
tau2 = 0.5
samples = 10
epochs = 150
success = [0 for x in range(epochs)]

def compute_probability(tau1, tau2):
    return tau1/(tau1 + tau2), tau2/(tau1 + tau2)

def weighted_random_choice(choices):
    # named "total" rather than "max" to avoid shadowing the built-in
    total = sum(choices.values())
    pick = random.uniform(0, total)
    current = 0
    for key, value in choices.items():
        current += value
        if current > pick:
            return key

def select_path(prob1, prob2):
    choices = {1: prob1, 2: prob2}
    return weighted_random_choice(choices)

def update_accumulation(link_id):
    global tau1
    global tau2
    if link_id == 1:
        tau1 += Q / l1
    else:
        tau2 += Q / l2

def update_evaporation():
    global tau1
    global tau2
    tau1 *= (1-ru)
    tau2 *= (1-ru)

def report_results(success):
    plt.ylim(0.0, 1.0)
    plt.xlim(0, 150)
    plt.plot(success)
    plt.show()

for sample in range(samples):
    for epoch in range(epochs):
        temp = 0
        for ant in range(N):
            prob1, prob2 = compute_probability(tau1, tau2)
            selected_path = select_path(prob1, prob2)
            if selected_path == 1:
                temp += 1
            update_accumulation(selected_path)
        update_evaporation()
        ratio = ((temp + 0.0) / N)
        success[epoch] += ratio
    # reset pheromone values here to evaluate new sample
    tau1 = 0.5
    tau2 = 0.5

success = [x / samples for x in success]
for x in success:
    print(x)
report_results(success)
The code above should return something close to the desired plot.
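As a side note, on Python 3.6 and later the weighted draw can also be delegated to random.choices from the standard library. A minimal sketch; the rng parameter is my own addition so that a seeded generator can be passed in for reproducible runs:

```python
import random


def select_path(prob1, prob2, rng=random):
    """Pick path 1 or 2 with probability proportional to prob1 and prob2."""
    return rng.choices([1, 2], weights=[prob1, prob2], k=1)[0]


rng = random.Random(42)
picks = [select_path(0.7, 0.3, rng) for _ in range(10000)]
share_of_path_1 = picks.count(1) / len(picks)  # should be close to 0.7
```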

Integrating in Python

I've been meaning to do numerical integration in Python, but I don't use SciPy, NumPy, or any of the other packages that can be plugged into Python. I'm pretty much a novice when it comes to coding, and I need help integrating. I've copied a small piece of code, but I still need to improve it.
def LeftEndSum(startingx, endingx, numberofRectangles) :
    width = (float(endingx) - float(startingx)) / numberofRectangles
    runningSum = 0
    for i in range(numberofRectangles) :
        height = f(startingx + i*width)
        area = height * width
        runningSum += area
    return runningSum
I'm trying to integrate, but I want to get a list of data points that I can then plot at the end of the integration.
My idea is to define an interval [a,b] and a step $\Delta n$ (the change in the number of boxes between points), and to use a convergence test to stop the loop that collects the points. The convergence test would be
$|I(n + \Delta n) - I(n)| / |I(n)| < \epsilon$
where $\epsilon = 1 \times 10^{-6}$ and $I(n)$ is the integral estimate with $n$ boxes; once successive estimates satisfy this, the loop breaks.
Let's say you want to integrate y = x*x from 0.0 to 10.0, we know the answer is 333.33333. Here's some code to do that:
def y_equals_x_squared(x):
y = x*x
return y
def LeftEndSum(startingx, endingx, numberofRectangles) :
width = (float(endingx) - float(startingx)) / numberofRectangles
#print "width = " + str(width)
runningSum = 0
i = 1
while i <= numberofRectangles:
x = (endingx - startingx)/(numberofRectangles) * (i - 1) + startingx
height = y_equals_x_squared(x)
area = height * width
#print "i, x , height, area = " + str(i) + ", " + str(x) + ", " + str(height) + ", " + str(area)
runningSum += area
i += 1
return runningSum
#-----------------------------------------------------------------------------
startingx = 0.0
endingx = 10.0
#
numberofRectangles = 3
old_answer = LeftEndSum(startingx, endingx, numberofRectangles)
#
numberofRectangles = 4
new_answer = LeftEndSum(startingx, endingx, numberofRectangles)
#
delta_answer = abs(new_answer - old_answer)
#
tolerance = 0.0001
max_iterations = 500
iteration_count = 0
iterations = []
answers = []
while delta_answer > tolerance:
numberofRectangles += 100
new_answer = LeftEndSum(startingx, endingx, numberofRectangles)
delta_answer = abs(new_answer - old_answer)
old_answer = new_answer
iteration_count += 1
iterations.append(iteration_count)
answers.append(new_answer)
print "iteration_count, new_answer = " + str(iteration_count) + ", " + str(new_answer)
if(iteration_count > max_iterations):
print "reached max_iterations, breaking"
break
#
OutputFile = "Integration_Results.txt"
with open(OutputFile, 'a') as the_file:
for index in range(len(answers)):
the_file.write(str(index) + " " + str(answers[index]) + "\n")
#
import matplotlib.pyplot as plt
#
fig, ax = plt.subplots()
ax.plot(iterations, answers, 'r-', label = "Increasing # Rectangles")
title_temp = "Simple Integration"
plt.title(title_temp, fontsize=12, fontweight='bold', color='green')
ax.legend(loc='best', ncol=1, fancybox=True, shadow=True)
plt.xlabel('Number of Iterations')
plt.ylabel('Answer')
ax.grid(True)
plt.show(block=True)
Notice that we plot the answer versus the number of iterations at the end, and it approaches the real answer very slowly as the number of iterations increases. There are other integration methods that do better than simple rectangles, such as the trapezoidal rule. When you put in a while loop and check for a tolerance, always put in a max_iterations check so you don't get stuck in an infinite loop.
You can check your answer here:
http://www.integral-calculator.com/
That's how we know the answer is 333.3333
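For reference, the relative-change stopping test described in the question could be sketched in Python 3 as follows. The function names and the default values for n_start and delta_n are my own choices for illustration; the loop records every (n, estimate) pair so they can be plotted afterwards, and it is capped by max_iterations as recommended above.

```python
def left_riemann_sum(f, a, b, n):
    """Left-endpoint Riemann sum of f over [a, b] with n rectangles."""
    width = (b - a) / n
    return width * sum(f(a + i * width) for i in range(n))


def integrate_until_converged(f, a, b, n_start=10, delta_n=100,
                              epsilon=1e-6, max_iterations=500):
    """Add delta_n rectangles per step until the relative change drops below epsilon."""
    ns = [n_start]
    estimates = [left_riemann_sum(f, a, b, n_start)]
    for _ in range(max_iterations):
        n = ns[-1] + delta_n
        old = estimates[-1]
        new = left_riemann_sum(f, a, b, n)
        ns.append(n)
        estimates.append(new)
        # Relative change between successive estimates; skip the test if old is 0.
        if old != 0 and abs(new - old) / abs(old) < epsilon:
            break
    return ns, estimates


ns, estimates = integrate_until_converged(lambda x: x * x, 0.0, 10.0)
```

The two returned lists can be passed straight to ax.plot(ns, estimates) to see how the estimate approaches 333.33.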
It makes more sense to include f as a parameter in the function. Why hardwire in a fixed formula? Furthermore, the code can be drastically simplified by using the built-in function sum applied to a generator:
def riemannSum(f,a,b,n,sample = 'L'):
    """computes Riemann sum of f over [a,b] using n rectangles and left ('L'), right ('R') or midpoints ('M')"""
    h = (b-a)/float(n)
    if sample.upper() == 'L':
        s = a #s = first sample point
    elif sample.upper() == 'R':
        s = a + h
    else:
        s = a + h/2.0
    return h*sum(f(s+i*h) for i in range(n))
You can define functions explicitly and then integrate them:
>>> def reciprocal(x): return 1/x
>>> riemannSum(reciprocal,1,2,100)
0.6956534304818242
(the exact value is the natural log of 2, which is approximately 0.693147)
Or, you can use anonymous functions (lambda expressions):
>>> riemannSum(lambda x: x**2,0,1,100,'m')
0.333325
Or, you can use functions already in the math module:
>>> riemannSum(math.sin,0,math.pi,10)
1.9835235375094546
None of these methods is very accurate. A more accurate one is Simpson's Rule, which is also pretty easy to do in Python:
def simpsonsRule(f,a,b,n):
    if n%2 == 1:
        return "Not applicable"
    else:
        h = (b-a)/float(n)
        s = f(a) + sum((4 if i%2 == 1 else 2)*f(a+i*h) for i in range(1,n)) + f(b)
        return s*h/3.0
For example:
>>> simpsonsRule(math.sin,0,math.pi,10)
2.0001095173150043
This is much more accurate than the Riemann sum with 10 rectangles (the true value is 2).

Perceptron problems

I am trying to make a training set of data points by drawing a line (perceptron) f and labelling the points on one side +1 and those on the other side -1. I then make a new line g and try to get it as close to f as possible by updating with w = w + y(t)x(t), where w is the weight vector, y(t) is +1 or -1, and x(t) is the coordinates of a misclassified point. After implementing this, though, I am not getting a very good fit from g to f. Here is my code and some sample outputs.
import random
random.seed()
points = [ [1, random.randint(-25, 25), random.randint(-25,25), 0] for k in range(1000)]
weights = [.1,.1,.1]
misclassified = []
############################################################# Function f
interceptf = (0,random.randint(-5,5))
slopef = (random.randint(-10, 10),random.randint(-10,10))
point1f = ((interceptf[0] + slopef[0]),(interceptf[1] + slopef[1]))
point2f = ((interceptf[0] - slopef[0]),(interceptf[1] - slopef[1]))
############################################################# Function G starting
interceptg = (-weights[0],weights[2])
slopeg = (-weights[1],weights[2])
point1g = ((interceptg[0] + slopeg[0]),(interceptg[1] + slopeg[1]))
point2g = ((interceptg[0] - slopeg[0]),(interceptg[1] - slopeg[1]))
#############################################################
def isLeft(a, b, c):
return ((b[0] - a[0])*(c[1] - a[1]) - (b[1] - a[1])*(c[0] - a[0])) > 0
for i in points:
if isLeft(point1f,point2f,i):
i[3]=1
else:
i[3]=-1
for i in points:
if (isLeft(point1g,point2g,i)) and (i[3] == -1):
misclassified.append(i)
if (not isLeft(point1g,point2g,i)) and (i[3] == 1):
misclassified.append(i)
print len(misclassified)
while misclassified:
first = misclassified[0]
misclassified.pop(0)
a = [first[0],first[1],first[2]]
b = first[3]
a[:] = [x*b for x in a]
weights = [(x + y) for x, y in zip(weights,a)]
interceptg = (-weights[0],weights[2])
slopeg = (-weights[1],weights[2])
point1g = ((interceptg[0] + slopeg[0]),(interceptg[1] + slopeg[1]))
point2g = ((interceptg[0] - slopeg[0]),(interceptg[1] - slopeg[1]))
check = 0
for i in points:
if (isLeft(point1g,point2g,i)) and (i[3] == -1):
check += 1
if (not isLeft(point1g,point2g,i)) and (i[3] == 1):
check += 1
print weights
print check
117 <--- number of original missclassifieds with g
[-116.9, -300.9, 190.1] <--- final weights
617 <--- number of original missclassifieds with g after algorithm
956 <--- number of original missclassifieds with g
[-33.9, -12769.9, -572.9] <--- final weights
461 <--- number of original missclassifieds with g after algorithm
There are at least a few problems with your algorithm:
Your "while" condition is wrong: perceptron learning is not about iterating once through the initially misclassified points, as you do now. The algorithm should iterate through all the points for as long as any of them is misclassified. In particular, each update can turn some correctly classified point into a misclassified one, so you always have to iterate through all of them and check that everything is fine.
I am pretty sure that what you actually wanted is an update rule of the form (y(i)-p(i))x(i), where p(i) is the predicted label and y(i) is the true label (but this obviously degenerates to your method, up to a constant factor, if you only update on misclassified points).
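The loop structure described above could be sketched like this. It is a plain-Python illustration, not a fix of the original script: the function name train_perceptron, the zero initial weights and the max_epochs cap are my own assumptions, and the demo data is a small grid labelled by the line x1 = x2.

```python
def train_perceptron(points, labels, max_epochs=1000):
    """points: list of (x1, x2); labels: list of +1/-1. Returns (weights, converged)."""
    w = [0.0, 0.0, 0.0]  # bias weight first, then one weight per coordinate
    for _ in range(max_epochs):
        updated = False
        # Sweep over ALL points, not only those misclassified at the start.
        for (x1, x2), y in zip(points, labels):
            x = (1.0, x1, x2)
            activation = sum(wi * xi for wi, xi in zip(w, x))
            if y * activation <= 0:  # misclassified (or exactly on the boundary)
                w = [wi + y * xi for wi, xi in zip(w, x)]
                updated = True
        if not updated:  # a full pass without updates: every point is classified
            return w, True
    return w, False


# Demo data: integer grid, labelled +1 where x1 > x2 and -1 where x1 < x2.
points = [(x1, x2) for x1 in range(-5, 6) for x2 in range(-5, 6) if x1 != x2]
labels = [1 if x1 > x2 else -1 for x1, x2 in points]
weights, converged = train_perceptron(points, labels)
```

The max_epochs cap matters for data that is not linearly separable, where the sweep would otherwise never stop.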

Python Implementation of Viterbi Algorithm

I'm doing a Python project in which I'd like to use the Viterbi Algorithm. Does anyone know of a complete Python implementation of the Viterbi algorithm? The correctness of the one on Wikipedia seems to be in question on the talk page. Does anyone have a pointer?
Here's mine. It's paraphrased directly from the pseudocode implementation on Wikipedia. It uses numpy for the convenience of its ndarray but is otherwise a pure Python 3 implementation.
import numpy as np

def viterbi(y, A, B, Pi=None):
    """
    Return the MAP estimate of state trajectory of Hidden Markov Model.

    Parameters
    ----------
    y : array (T,)
        Observation state sequence. int dtype.
    A : array (K, K)
        State transition matrix. See HiddenMarkovModel.state_transition for
        details.
    B : array (K, M)
        Emission matrix. See HiddenMarkovModel.emission for details.
    Pi: optional, (K,)
        Initial state probabilities: Pi[i] is the probability x[0] == i. If
        None, uniform initial distribution is assumed (Pi[:] == 1/K).

    Returns
    -------
    x : array (T,)
        Maximum a posteriori probability estimate of hidden state trajectory,
        conditioned on observation sequence y under the model parameters A, B,
        Pi.
    T1: array (K, T)
        the probability of the most likely path so far
    T2: array (K, T)
        the x_j-1 of the most likely path so far
    """
    # Cardinality of the state space
    K = A.shape[0]
    # Initialize the priors with default (uniform dist) if not given by caller
    Pi = Pi if Pi is not None else np.full(K, 1 / K)
    T = len(y)
    T1 = np.empty((K, T), 'd')
    T2 = np.empty((K, T), 'B')

    # Initialize the tracking tables from first observation
    T1[:, 0] = Pi * B[:, y[0]]
    T2[:, 0] = 0

    # Iterate through the observations updating the tracking tables
    for i in range(1, T):
        T1[:, i] = np.max(T1[:, i - 1] * A.T * B[np.newaxis, :, y[i]].T, 1)
        T2[:, i] = np.argmax(T1[:, i - 1] * A.T, 1)

    # Build the output, optimal model trajectory
    x = np.empty(T, 'B')
    x[-1] = np.argmax(T1[:, T - 1])
    for i in reversed(range(1, T)):
        x[i - 1] = T2[x[i], i]

    return x, T1, T2
I found the following code in the example repository of Artificial Intelligence: A Modern Approach. Is something like this what you're looking for?
def viterbi_segment(text, P):
"""Find the best segmentation of the string of characters, given the
UnigramTextModel P."""
# best[i] = best probability for text[0:i]
# words[i] = best word ending at position i
n = len(text)
words = [''] + list(text)
best = [1.0] + [0.0] * n
## Fill in the vectors best, words via dynamic programming
for i in range(n+1):
for j in range(0, i):
w = text[j:i]
if P[w] * best[i - len(w)] >= best[i]:
best[i] = P[w] * best[i - len(w)]
words[i] = w
## Now recover the sequence of best words
sequence = []; i = len(words)-1
while i > 0:
sequence[0:0] = [words[i]]
i = i - len(words[i])
## Return sequence of best words and overall probability
return sequence, best[-1]
Hmm, I can post mine. It's not pretty though, so please let me know if you need clarification. I wrote this relatively recently, specifically for part-of-speech tagging.
class Trellis:
trell = []
def __init__(self, hmm, words):
self.trell = []
temp = {}
for label in hmm.labels:
temp[label] = [0,None]
for word in words:
self.trell.append([word,copy.deepcopy(temp)])
self.fill_in(hmm)
def fill_in(self,hmm):
for i in range(len(self.trell)):
for token in self.trell[i][1]:
word = self.trell[i][0]
if i == 0:
self.trell[i][1][token][0] = hmm.e(token,word)
else:
max = None
guess = None
c = None
for k in self.trell[i-1][1]:
c = self.trell[i-1][1][k][0] + hmm.t(k,token)
if max == None or c > max:
max = c
guess = k
max += hmm.e(token,word)
self.trell[i][1][token][0] = max
self.trell[i][1][token][1] = guess
def return_max(self):
tokens = []
token = None
for i in range(len(self.trell)-1,-1,-1):
if token == None:
max = None
guess = None
for k in self.trell[i][1]:
if max == None or self.trell[i][1][k][0] > max:
max = self.trell[i][1][k][0]
token = self.trell[i][1][k][1]
guess = k
tokens.append(guess)
else:
tokens.append(token)
token = self.trell[i][1][token][1]
tokens.reverse()
return tokens
I have just corrected the pseudo implementation of Viterbi in Wikipedia. From the initial (incorrect) version, it took me a while to figure out where I was going wrong but I finally managed it, thanks partly to Kevin Murphy's implementation of the viterbi_path.m in the MatLab HMM toolbox.
In the context of an HMM object with variables as shown:
hmm = HMM()
hmm.priors = np.array([0.5, 0.5]) # pi = prior probs
hmm.transition = np.array([[0.75, 0.25], # A = transition probs. / 2 states
[0.32, 0.68]])
hmm.emission = np.array([[0.8, 0.1, 0.1], # B = emission (observation) probs. / 3 obs modes
[0.1, 0.2, 0.7]])
The Python function to run Viterbi (best-path) algorithm is below:
def viterbi (self,observations):
"""Return the best path, given an HMM model and a sequence of observations"""
# A - initialise stuff
nSamples = len(observations[0])
nStates = self.transition.shape[0] # number of states
c = np.zeros(nSamples) #scale factors (necessary to prevent underflow)
viterbi = np.zeros((nStates,nSamples)) # initialise viterbi table
psi = np.zeros((nStates,nSamples)) # initialise the best path table
best_path = np.zeros(nSamples); # this will be your output
# B- appoint initial values for viterbi and best path (bp) tables - Eq (32a-32b)
viterbi[:,0] = self.priors.T * self.emission[:,observations(0)]
c[0] = 1.0/np.sum(viterbi[:,0])
viterbi[:,0] = c[0] * viterbi[:,0] # apply the scaling factor
psi[0] = 0;
# C- Do the iterations for viterbi and psi for time>0 until T
for t in range(1,nSamples): # loop through time
for s in range (0,nStates): # loop through the states #(t-1)
trans_p = viterbi[:,t-1] * self.transition[:,s]
psi[s,t], viterbi[s,t] = max(enumerate(trans_p), key=operator.itemgetter(1))
viterbi[s,t] = viterbi[s,t]*self.emission[s,observations(t)]
c[t] = 1.0/np.sum(viterbi[:,t]) # scaling factor
viterbi[:,t] = c[t] * viterbi[:,t]
# D - Back-tracking
best_path[nSamples-1] = viterbi[:,nSamples-1].argmax() # last state
for t in range(nSamples-1,0,-1): # states of (last-1)th to 0th time step
best_path[t-1] = psi[best_path[t],t]
return best_path
This is an old question, but none of the other answers were quite what I needed because my application doesn't have specific observed states.
Taking after @Rhubarb, I've also re-implemented Kevin Murphy's MATLAB implementation (see viterbi_path.m), but I've kept it closer to the original. I've included a simple test case as well.
import numpy as np
def viterbi_path(prior, transmat, obslik, scaled=True, ret_loglik=False):
'''Finds the most-probable (Viterbi) path through the HMM state trellis
Notation:
Z[t] := Observation at time t
Q[t] := Hidden state at time t
Inputs:
prior: np.array(num_hid)
prior[i] := Pr(Q[0] == i)
transmat: np.ndarray((num_hid,num_hid))
transmat[i,j] := Pr(Q[t+1] == j | Q[t] == i)
obslik: np.ndarray((num_hid,num_obs))
obslik[i,t] := Pr(Z[t] | Q[t] == i)
scaled: bool
whether or not to normalize the probability trellis along the way
doing so prevents underflow by repeated multiplications of probabilities
ret_loglik: bool
whether or not to return the log-likelihood of the best path
Outputs:
path: np.array(num_obs)
path[t] := Q[t]
'''
num_hid = obslik.shape[0] # number of hidden states
num_obs = obslik.shape[1] # number of observations (not observation *states*)
# trellis_prob[i,t] := Pr((best sequence of length t-1 goes to state i), Z[1:(t+1)])
trellis_prob = np.zeros((num_hid,num_obs))
# trellis_state[i,t] := best predecessor state given that we ended up in state i at t
trellis_state = np.zeros((num_hid,num_obs), dtype=int) # int because its elements will be used as indicies
path = np.zeros(num_obs, dtype=int) # int because its elements will be used as indicies
trellis_prob[:,0] = prior * obslik[:,0] # element-wise mult
if scaled:
scale = np.ones(num_obs) # only instantiated if necessary to save memory
scale[0] = 1.0 / np.sum(trellis_prob[:,0])
trellis_prob[:,0] *= scale[0]
trellis_state[:,0] = 0 # arbitrary value since t == 0 has no predecessor
for t in xrange(1, num_obs):
for j in xrange(num_hid):
trans_probs = trellis_prob[:,t-1] * transmat[:,j] # element-wise mult
trellis_state[j,t] = trans_probs.argmax()
trellis_prob[j,t] = trans_probs[trellis_state[j,t]] # max of trans_probs
trellis_prob[j,t] *= obslik[j,t]
if scaled:
scale[t] = 1.0 / np.sum(trellis_prob[:,t])
trellis_prob[:,t] *= scale[t]
path[-1] = trellis_prob[:,-1].argmax()
for t in range(num_obs-2, -1, -1):
path[t] = trellis_state[(path[t+1]), t+1]
if not ret_loglik:
return path
else:
if scaled:
loglik = -np.sum(np.log(scale))
else:
p = trellis_prob[path[-1],-1]
loglik = np.log(p)
return path, loglik
if __name__=='__main__':
# Assume there are 3 observation states, 2 hidden states, and 5 observations
priors = np.array([0.5, 0.5])
transmat = np.array([
[0.75, 0.25],
[0.32, 0.68]])
emmat = np.array([
[0.8, 0.1, 0.1],
[0.1, 0.2, 0.7]])
observations = np.array([0, 1, 2, 1, 0], dtype=int)
obslik = np.array([emmat[:,z] for z in observations]).T
print viterbi_path(priors, transmat, obslik) #=> [0 1 1 1 0]
print viterbi_path(priors, transmat, obslik, scaled=False) #=> [0 1 1 1 0]
print viterbi_path(priors, transmat, obslik, ret_loglik=True) #=> (array([0, 1, 1, 1, 0]), -7.776472586614755)
print viterbi_path(priors, transmat, obslik, scaled=False, ret_loglik=True) #=> (array([0, 1, 1, 1, 0]), -8.0120386579275227)
Note that this implementation does not use emission probabilities directly but uses a variable obslik. Generally, emissions[i,j] := Pr(observed_state == j | hidden_state == i) for a particular observed state j, making emissions.shape == (num_hidden_states, num_obs_states).
However, given a sequence observations[t] := observation at time t, all the Viterbi algorithm requires is the likelihood of that observation for each hidden state. Hence, obslik[i,t] := Pr(observations[t] | hidden_state == i). The actual value of the observed state isn't necessary.
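As an alternative to scaling, underflow can also be avoided by working with log-probabilities throughout. The following is a pure-Python sketch under the same obslik convention; the name viterbi_log and the nested-list inputs are my own assumptions, not part of Murphy's toolbox. It reuses the test case above.

```python
import math


def _log(p):
    """log(p), with log(0) mapped to -inf so impossible moves are never chosen."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi_log(prior, transmat, obslik):
    """prior[i], transmat[i][j], obslik[i][t] as nested lists.

    Returns (path, log-probability of the best path).
    """
    num_hid, num_obs = len(prior), len(obslik[0])
    score = [[_log(prior[i]) + _log(obslik[i][0]) for i in range(num_hid)]]
    backptr = [[0] * num_hid]  # row 0 is unused: t == 0 has no predecessor
    for t in range(1, num_obs):
        row, ptr = [], []
        for j in range(num_hid):
            # cand[i] = best log-score of reaching state j at t via state i at t-1
            cand = [score[t - 1][i] + _log(transmat[i][j]) for i in range(num_hid)]
            best = max(range(num_hid), key=cand.__getitem__)
            ptr.append(best)
            row.append(cand[best] + _log(obslik[j][t]))
        score.append(row)
        backptr.append(ptr)
    last = max(range(num_hid), key=score[-1].__getitem__)
    path = [last]
    for t in range(num_obs - 1, 0, -1):  # follow the back-pointers to t == 0
        path.append(backptr[t][path[-1]])
    path.reverse()
    return path, score[-1][last]


emmat = [[0.8, 0.1, 0.1], [0.1, 0.2, 0.7]]
observations = [0, 1, 2, 1, 0]
obslik = [[emmat[i][z] for z in observations] for i in range(2)]
path, loglik = viterbi_log([0.5, 0.5], [[0.75, 0.25], [0.32, 0.68]], obslik)
print(path)  # [0, 1, 1, 1, 0]
```

The returned log-probability matches the unscaled log-likelihood from the test case above (about -8.012).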
I have modified @Rhubarb's answer for the case where the marginal probabilities are already known (e.g. by running the Forward-Backward algorithm).
def viterbi (transition_probabilities, conditional_probabilities):
# Initialise everything
num_samples = conditional_probabilities.shape[1]
num_states = transition_probabilities.shape[0] # number of states
c = np.zeros(num_samples) #scale factors (necessary to prevent underflow)
viterbi = np.zeros((num_states,num_samples)) # initialise viterbi table
best_path_table = np.zeros((num_states,num_samples)) # initialise the best path table
best_path = np.zeros(num_samples).astype(np.int32) # this will be your output
# B- appoint initial values for viterbi and best path (bp) tables - Eq (32a-32b)
viterbi[:,0] = conditional_probabilities[:,0]
c[0] = 1.0/np.sum(viterbi[:,0])
viterbi[:,0] = c[0] * viterbi[:,0] # apply the scaling factor
# C- Do the iterations for viterbi and psi for time>0 until T
for t in range(1, num_samples): # loop through time
for s in range (0,num_states): # loop through the states #(t-1)
trans_p = viterbi[:, t-1] * transition_probabilities[:,s] # transition probs of each state transitioning
best_path_table[s,t], viterbi[s,t] = max(enumerate(trans_p), key=operator.itemgetter(1))
viterbi[s,t] = viterbi[s,t] * conditional_probabilities[s][t]
c[t] = 1.0/np.sum(viterbi[:,t]) # scaling factor
viterbi[:,t] = c[t] * viterbi[:,t]
## D - Back-tracking
best_path[num_samples-1] = viterbi[:,num_samples-1].argmax() # last state
for t in range(num_samples-1,0,-1): # states of (last-1)th to 0th time step
best_path[t-1] = best_path_table[best_path[t],t]
return best_path
