Perceptron problems - python

I am trying to make a training set of data points by making a line (perceptron) f and labeling the points +1 on one side and -1 on the other. Then I make a new line g and try to get it as close to f as possible by updating with w = w + y(t)x(t), where w is the weight vector, y(t) is the label (+1 or -1), and x(t) is the coordinates of a misclassified point. After implementing this, though, I am not getting a very good fit from g to f. Here is my code and some sample outputs.
import random

random.seed()
points = [[1, random.randint(-25, 25), random.randint(-25, 25), 0] for k in range(1000)]
weights = [.1, .1, .1]
misclassified = []

############################################################# Function f
interceptf = (0, random.randint(-5, 5))
slopef = (random.randint(-10, 10), random.randint(-10, 10))
point1f = ((interceptf[0] + slopef[0]), (interceptf[1] + slopef[1]))
point2f = ((interceptf[0] - slopef[0]), (interceptf[1] - slopef[1]))

############################################################# Function g starting
interceptg = (-weights[0], weights[2])
slopeg = (-weights[1], weights[2])
point1g = ((interceptg[0] + slopeg[0]), (interceptg[1] + slopeg[1]))
point2g = ((interceptg[0] - slopeg[0]), (interceptg[1] - slopeg[1]))
#############################################################

def isLeft(a, b, c):
    return ((b[0] - a[0])*(c[1] - a[1]) - (b[1] - a[1])*(c[0] - a[0])) > 0

for i in points:
    if isLeft(point1f, point2f, i):
        i[3] = 1
    else:
        i[3] = -1

for i in points:
    if isLeft(point1g, point2g, i) and (i[3] == -1):
        misclassified.append(i)
    if (not isLeft(point1g, point2g, i)) and (i[3] == 1):
        misclassified.append(i)

print(len(misclassified))

while misclassified:
    first = misclassified[0]
    misclassified.pop(0)
    a = [first[0], first[1], first[2]]
    b = first[3]
    a[:] = [x*b for x in a]
    weights = [(x + y) for x, y in zip(weights, a)]
    interceptg = (-weights[0], weights[2])
    slopeg = (-weights[1], weights[2])
    point1g = ((interceptg[0] + slopeg[0]), (interceptg[1] + slopeg[1]))
    point2g = ((interceptg[0] - slopeg[0]), (interceptg[1] - slopeg[1]))

check = 0
for i in points:
    if isLeft(point1g, point2g, i) and (i[3] == -1):
        check += 1
    if (not isLeft(point1g, point2g, i)) and (i[3] == 1):
        check += 1

print(weights)
print(check)
117 <--- number of misclassified points with the initial g
[-116.9, -300.9, 190.1] <--- final weights
617 <--- number of misclassified points with g after the algorithm
956 <--- number of misclassified points with the initial g
[-33.9, -12769.9, -572.9] <--- final weights
461 <--- number of misclassified points with g after the algorithm

There are at least a few problems with your algorithm:

Your "while" condition is wrong - perceptron learning is not about iterating once through the currently misclassified points, as you do now. The algorithm should keep iterating through all the points for as long as any of them is misclassified. In particular, each update can turn a correctly classified point into a misclassified one, so you always have to go back through all of them and check that everything is fine.

I am pretty sure that what you actually wanted is the update rule of the form (y(i) - p(i))x(i), where p(i) is the predicted label and y(i) is the true label (though this degenerates to your method if you only update on misclassified points). A minimal sketch of the corrected training loop follows.
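A minimal sketch of such a loop (hypothetical names; it assumes points are stored as [bias, x, y, label] with labels in {+1, -1}, as in your code):

import random

def train_perceptron(points, max_epochs=1000):
    """Sweep all points until none is misclassified (or max_epochs is hit)."""
    w = [0.0, 0.0, 0.0]
    for _ in range(max_epochs):  # cap the sweeps in case the data is not separable
        misclassified = [p for p in points
                         if (w[0]*p[0] + w[1]*p[1] + w[2]*p[2] > 0) != (p[3] > 0)]
        if not misclassified:
            break  # every point is classified correctly
        p = random.choice(misclassified)  # update on one misclassified point
        w = [wi + p[3]*xi for wi, xi in zip(w, p[:3])]
    return w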

Related

How to include numbers we need in a list which is generated by some stochastic algorithm

I need to implement a stochastic algorithm that outputs the times and the states of a dynamic system at the corresponding time points. We include randomness in defining the time points by drawing a random number from the uniform distribution. What I want to do is find the state at time points 0, 1, 2, ..., 24. Given the randomness of the algorithm, the time points 1, 2, 3, ..., 24 are not necessarily hit. We may include rounding at two decimal places, but even with rounding I cannot find/insert all of these time points. The question is, how do I change the code so that the list of time points includes the numbers 1, 2, ..., 24 while preserving the stochasticity of the algorithm? Thanks for any suggestion.
import numpy as np
import random
import math as m

np.random.seed(seed=5)

# Stoichiometric matrix
S = np.array([(-1, 0), (1, -1)])

# Reaction parameters
ke = 0.3; ka = 0.5
k = [ke, ka]

# Initial state vector at time t0
X1 = [200]; X2 = [0]
# We will update it for each time.
X = [X1, X2]

# Initial time is t0 = 0, which we will update.
t = [0]
# End time
tfinal = 24

# The propensity vector R concerning the last/updated value of time
def ReactionRates(k, X1, X2):
    R = np.zeros((2, 1))
    R[0] = k[1] * X1[-1]
    R[1] = k[0] * X2[-1]
    return R

# We implement the Gillespie (SSA) algorithm
while True:
    # Reaction propensities/rates
    R = ReactionRates(k, X1, X2)
    propensities = R
    propensities_sum = sum(R)[0]
    if propensities_sum == 0:
        break
    # we include randomness
    u1 = np.random.uniform(0, 1)
    delta_t = (1/propensities_sum) * m.log(1/u1)
    if t[-1] + delta_t > tfinal:
        break
    t.append(t[-1] + delta_t)
    b = [0, R[0], R[1]]
    u2 = np.random.uniform(0, 1)
    # Choose j
    lambda_u2 = propensities_sum * u2
    for j in range(len(b)):
        if sum(b[0:j-1+1]) < lambda_u2 <= sum(b[1:j+1]):
            break  # out of for j
    # make j zero based
    j -= 1
    # We update the state vector
    X1.append(X1[-1] + S.T[j][0])
    X2.append(X2[-1] + S.T[j][1])

# round t values
t = [round(tt, 2) for tt in t]
print("The time steps:", t)
print("The second component of the state vector:", X2)
After playing with your model, I conclude that interpolation works fine.
Basically, just append the following lines to your code:
ts = np.arange(tfinal+1)
xs = np.interp(ts, t, X2)
and if you have matplotlib installed, you can visualize the result using
import matplotlib.pyplot as plt
plt.plot(t, X2)
plt.plot(ts, xs)
plt.show()
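One caveat, offered as an assumption about your model rather than a requirement: an SSA trajectory is piecewise constant between reaction events, so a "previous value" (step) lookup can be more faithful than linear interpolation. A sketch with np.searchsorted:

import numpy as np

# t and X2 come from the SSA loop above; shown here as hypothetical arrays
t = np.array([0.0, 0.8, 2.3, 5.1])
X2 = np.array([200, 199, 198, 199])

ts = np.arange(25)  # the target grid 0, 1, ..., 24
idx = np.searchsorted(t, ts, side='right') - 1  # last event time <= each grid point
idx = np.clip(idx, 0, len(t) - 1)
xs_step = np.asarray(X2)[idx]  # state held constant between events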

Cannot pass one test case using Bellman Ford Algorithm

Problem Description
Task. Given a directed graph with possibly negative edge weights and with n vertices and m edges, check whether it contains a cycle of negative weight.
Input Format. A graph is given in the standard format.
Constraints. 1 ≤ n ≤ 10^3, 0 ≤ m ≤ 10^4, edge weights are integers of absolute value at most 10^3.
Output Format. Output 1 if the graph contains a cycle of negative weight and 0 otherwise.
Here is my solution:
import sys

def negative_cycle(adj, cost):
    #write your code here
    distance = [float('inf')] * len(adj)
    distance[0] = 0
    edges = []
    for i in range(len(adj)):
        for j in adj[i]:
            edges.append([i, j])
    for _ in range(len(adj) - 1):
        for i in edges:
            a = i[0]
            b = adj[a].index(i[1])
            if distance[i[1]] > distance[a] + cost[a][b] and distance[a] != float('inf'):
                distance[i[1]] = distance[a] + cost[a][b]
    for i in edges:
        a, b = i[0], adj[i[0]].index(i[1])
        if distance[i[1]] > distance[a] + cost[a][b] and distance[a] != float('inf'):
            return 1
    return 0

if __name__ == '__main__':
    input = sys.stdin.read()
    data = list(map(int, input.split()))
    n, m = data[0:2]
    data = data[2:]
    edges = list(zip(zip(data[0:(3 * m):3], data[1:(3 * m):3]), data[2:(3 * m):3]))
    data = data[3 * m:]
    adj = [[] for _ in range(n)]
    cost = [[] for _ in range(n)]
    for ((a, b), w) in edges:
        adj[a - 1].append(b - 1)
        cost[a - 1].append(w)
    print(negative_cycle(adj, cost))
My code works for most test cases but fails on one test case.
Failed case #12/19: Wrong answer
(Time used: 0.23/10.00, memory used: 14229504/2147483648.)
The input format is:
Number of vertices, number of edges
Vertex 1, vertex 2, weight of edge
... (one line per edge)
May I know what the error in this code is?
cost[a - 1].append(w)
Here, you're not taking the cost of the edges properly. It should be cost[a - 1][b - 1] = w, the cost of the edge going from A -> B. A sketch of an alternative bookkeeping follows.
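One robust alternative, offered as a sketch rather than a confirmed diagnosis of the failing test: carry the weight with each edge instead of recovering it via adj[a].index(i[1]) (index returns the first occurrence, so it mismatches when a vertex appears more than once in an adjacency list). Initializing every distance to 0 acts like a virtual source connected to all vertices, so negative cycles are found even in components unreachable from vertex 0.

def negative_cycle(n, edge_list):
    # edge_list holds explicit (u, v, w) triples, 0-based
    dist = [0] * n
    for _ in range(n - 1):
        for u, v, w in edge_list:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    # one extra pass: any further relaxation implies a negative cycle
    for u, v, w in edge_list:
        if dist[u] + w < dist[v]:
            return 1
    return 0

# with the problem's 1-based input pairs:
# edge_list = [(a - 1, b - 1, w) for ((a, b), w) in edges]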

Use loop for np.random or reshape array to matrix?

I am new to programming in general; however, for a project I am trying to randomly choose outcomes, depending on the probability of each outcome, for lotteries that I have generated, and I would like to use a loop to get random numbers each time.
This is my code:
import numpy as np

p = np.arange(0.01, 1, 0.001, dtype=float)

alpha = 0.5
alpha = float(alpha)
alpha = np.zeros((1, len(p))) + alpha

def w(alpha, p):
    return np.exp(-(-np.log(p))**alpha)

w = w(alpha, p)

def P(w):
    return np.exp(np.log2(w))

prob_win = P(w)
prob_lose = 1 - prob_win

E = 10
E = float(E)
E = np.zeros((1, len(p))) + E

b = 0
b = float(b)
b = np.zeros((1, len(p))) + b

def A(E, b, prob_win):
    return (E - b * (1 - prob_win)) / prob_win

a = A(E, b, prob_win)
a = a.squeeze()

prob_array = (prob_win, prob_lose)
prob_matrix = np.vstack(prob_array).T.squeeze()
outcomes_array = (a, b)
outcomes_matrix = np.vstack(outcomes_array).T
outcome_pairs = np.vsplit(outcomes_matrix, len(p))
outcome_pairs = np.array(outcome_pairs).astype(float)  # np.float is deprecated; use the builtin
prob_pairs = np.vsplit(prob_matrix, len(p))
prob_pairs = np.array(prob_pairs)
nominalized_prob_pairs = [outcome_pairs / np.sum(outcome_pairs)
                          for outcome_pairs in np.vsplit(prob_pairs, len(p))]
The code works fine, but I would like to use a loop or something similar for the next line of code, as I want to get 5 realizations for each row/pair of probabilities. When I use size=5 I just get a really long list, and I do not know which values still belong to which pair, unlike when size=1:
realisations = np.concatenate([np.random.choice(outcome_pairs[i].ravel(),
                               size=1, p=nominalized_prob_pairs[i].ravel()) for i in range(len(outcome_pairs))])
Or, if I use size=5 as below, how can I match the realizations to the initial probabilities? Do I need to cut the array after every 5th element and then store the values in a matrix with 5 columns and a new row for every 5th element of the initial array? If yes, how could I do this? (A sketch of this reshaping appears after the code below.)
realisations = np.concatenate([np.random.choice(outcome_pairs[i].ravel(),
                               size=5, p=nominalized_prob_pairs[i].ravel()) for i in range(len(outcome_pairs))])
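For reference, a sketch of that per-pair grouping (names follow the question's code): draw all 5 realizations inside the loop and stack the results, so row i holds the draws for pair i.

realisations_matrix = np.vstack([
    np.random.choice(outcome_pairs[i].ravel(),
                     size=5,
                     p=nominalized_prob_pairs[i].ravel())
    for i in range(len(outcome_pairs))
])
# realisations_matrix[i, :] are the 5 draws for probability pair i;
# a flat size=5 result could equally be reshaped with .reshape(-1, 5)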
What are you trying to produce, exactly? Be more concise.
Here is some clean starter code with which you can produce linear data.
import numpy as np

def generate_data(n_samples, variance):
    # generate 2D data
    X = np.random.random((n_samples, 1))
    # adding a vector of ones to ease calculus
    X = np.concatenate((np.ones((n_samples, 1)), X), axis=1)
    # generate two random coefficients
    W = np.random.random((2, 1))
    # construct targets with our data and weights
    y = X @ W
    # add some noise to our data
    y += np.random.normal(0, variance, (n_samples, 1))
    return X, y, W

if __name__ == "__main__":
    X, Y, W = generate_data(10, 0.5)
    # check random value of x for example
    for x in X:
        print(x, end=' --> ')
        if x[1] <= 0.4:
            print('prob <= 0.4')
        else:
            print('prob > 0.4')

Adaptive mesh refinement - Python

I'm currently incredibly stuck on what isn't working in my code and have been staring at it for hours. I have created some functions to adaptively approximate the solution to the Laplace equation using the finite element method, and then to estimate its error using the dual weighted residual. The error function should give a vector of errors (one error for each element); I then choose the biggest errors, add more elements around them, solve again and re-check the error. However, I have no idea why my error estimate isn't changing!
My first 4 functions are correct, but I will include them (with the imports they need) in case someone wants to try the code:
import math
import numpy as np
import scipy.sparse.linalg
from scipy.sparse import diags
from scipy.interpolate import interp1d

def Poisson_Stiffness(x0):
    """Finds the Poisson equation stiffness matrix with any non uniform mesh x0"""
    x0 = np.array(x0)
    N = len(x0) - 1  # The amount of elements; x0, x1, ..., xN
    h = x0[1:] - x0[:-1]
    a = np.zeros(N+1)
    a[0] = 1  # BOUNDARY CONDITIONS
    a[1:-1] = 1/h[1:] + 1/h[:-1]
    a[-1] = 1/h[-1]
    a[N] = 1  # BOUNDARY CONDITIONS
    b = -1/h
    b[0] = 0  # BOUNDARY CONDITIONS
    c = -1/h
    c[N-1] = 0  # BOUNDARY CONDITIONS: DIRICHLET
    data = [a.tolist(), b.tolist(), c.tolist()]
    Positions = [0, 1, -1]
    Stiffness_Matrix = diags(data, Positions, (N+1, N+1))
    return Stiffness_Matrix

def NodalQuadrature(x0):
    """Finds the Nodal Quadrature Approximation of sin(pi x)"""
    x0 = np.array(x0)
    h = x0[1:] - x0[:-1]
    N = len(x0) - 1
    approx = np.zeros(len(x0))
    approx[0] = 0  # BOUNDARY CONDITIONS
    for i in range(1, N):
        approx[i] = math.sin(math.pi*x0[i])
        approx[i] = (approx[i]*h[i-1] + approx[i]*h[i])/2
    approx[N] = 0  # BOUNDARY CONDITIONS
    return approx

def Solver(x0):
    Stiff_Matrix = Poisson_Stiffness(x0)
    NodalApproximation = NodalQuadrature(x0)
    NodalApproximation[0] = 0
    U = scipy.sparse.linalg.spsolve(Stiff_Matrix, NodalApproximation)
    return U

def Dualsolution(rich_mesh, qoi_rich_node):  # BOUNDARY CONDITIONS?
    """Find Z from stiffness matrix Z = K^-1 Q over richer mesh"""
    K = Poisson_Stiffness(rich_mesh)
    Q = np.zeros(len(rich_mesh))
    Q[qoi_rich_node] = 1.0
    Z = scipy.sparse.linalg.spsolve(K, Q)
    return Z
My error indicator function takes in an approximation Uh, with the mesh it is solved over, and finds eta = (f - Bu)z.
def Error_Indicators(Uh, U_mesh, Z, Z_mesh, f):
    """Take in U, Interpolate to same mesh as Z then solve for eta vector"""
    u_inter = interp1d(U_mesh, Uh)  # Interpolation of old mesh
    U2 = u_inter(Z_mesh)  # New function u for the new mesh to use in
    Bz = Poisson_Stiffness(Z_mesh)
    Bz = Bz.tocsr()
    eta = np.empty(len(Z_mesh))
    for i in range(len(Z_mesh)):
        for j in range(len(Z_mesh)):
            eta[i] += (f[i] - Bz[i,j]*U2[j])
    for i in range(len(Z)):
        eta[i] = eta[i]*Z[i]
    return eta
My next function seems to adapt the mesh very well to the given error indicator! I just have no idea why the indicator seems to stay the same regardless.
def Mesh_Refinement(base_mesh, tolerance, refinement, z_mesh, QOI_z_mesh):
    """Solve for U on a normal mesh, Take in Z, Find error indicators, adapt. OUTPUT NEW MESH"""
    New_mesh = base_mesh
    Z = Dualsolution(z_mesh, QOI_z_mesh)  # Solve dual solution only once
    f = np.empty(len(z_mesh))
    for i in range(len(z_mesh)):
        f[i] = math.sin(math.pi*z_mesh[i])
    U = Solver(New_mesh)
    eta = Error_Indicators(U, base_mesh, Z, z_mesh, f)
    while max(abs(k) for k in eta) > tolerance:
        orderedeta = np.sort(eta)  # Sort error indicators LENGTH 40
        biggest = np.flipud(orderedeta[int((1-refinement)*len(eta)):len(eta)])
        position = np.empty(len(biggest))
        ratio = float(len(New_mesh))/float(len(z_mesh))
        for i in range(len(biggest)):
            position[i] = eta.tolist().index(biggest[i])*ratio  # GIVES WHAT NUMBER NODE TO REFINE
        refine = np.zeros(len(position))
        for i in range(len(position)):
            refine[i] = math.floor(position[i])+0.5  # AT WHAT NODE TO PUT NEW ELEMENT 5.5 ETC
        refine = np.flipud(sorted(set(refine)))
        for i in range(len(refine)):
            New_mesh = np.insert(New_mesh, refine[i]+0.5, (New_mesh[refine[i]+0.5]+New_mesh[refine[i]-0.5])/2)
        U = Solver(New_mesh)
        eta = Error_Indicators(U, New_mesh, Z, z_mesh, f)
        print(eta)
An example input for this would be:
Mesh_Refinement(np.linspace(0,1,3),0.1,0.2,np.linspace(0,1,60),20)
I understand there is a lot of code here, but I am at a loss - I have no idea where to turn!
Please consider this piece of code from def Error_Indicators:

eta = np.empty(len(Z_mesh))
for i in range(len(Z_mesh)):
    for j in range(len(Z_mesh)):
        eta[i] = (f[i] - Bz[i,j]*U2[j])

Here you override eta[i] on each j iteration, so the inner loop proves useless and you could go directly to the last possible j. Did you mean to compute the sum of the (f[i] - Bz[i,j]*U2[j]) series?

eta = np.zeros(len(Z_mesh))  # zero-initialise so the += accumulation starts from 0
for i in range(len(Z_mesh)):
    for j in range(len(Z_mesh)):
        eta[i] += (f[i] - Bz[i,j]*U2[j])
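If the sum is what's intended, the double loop can also be collapsed into a matrix-vector product (a sketch, assuming Bz, f, U2 and Z are aligned on Z_mesh as above):

n = len(Z_mesh)
eta = (n * f - Bz @ U2) * Z  # sum_j (f[i] - Bz[i,j]*U2[j]) = n*f[i] - (Bz @ U2)[i]
# note the loop adds f[i] once per j; if the intent was the residual f - B u,
# drop the factor n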

Python Implementation of Viterbi Algorithm

I'm doing a Python project in which I'd like to use the Viterbi Algorithm. Does anyone know of a complete Python implementation of the Viterbi algorithm? The correctness of the one on Wikipedia seems to be in question on the talk page. Does anyone have a pointer?
Here's mine. It's paraphrased directly from the pseudocode implementation on Wikipedia. It uses numpy for the convenience of its ndarray, but is otherwise a pure Python 3 implementation.
import numpy as np

def viterbi(y, A, B, Pi=None):
    """
    Return the MAP estimate of state trajectory of Hidden Markov Model.

    Parameters
    ----------
    y : array (T,)
        Observation state sequence. int dtype.
    A : array (K, K)
        State transition matrix. See HiddenMarkovModel.state_transition for
        details.
    B : array (K, M)
        Emission matrix. See HiddenMarkovModel.emission for details.
    Pi: optional, (K,)
        Initial state probabilities: Pi[i] is the probability x[0] == i. If
        None, uniform initial distribution is assumed (Pi[:] == 1/K).

    Returns
    -------
    x : array (T,)
        Maximum a posteriori probability estimate of hidden state trajectory,
        conditioned on observation sequence y under the model parameters A, B,
        Pi.
    T1: array (K, T)
        the probability of the most likely path so far
    T2: array (K, T)
        the x_j-1 of the most likely path so far
    """
    # Cardinality of the state space
    K = A.shape[0]
    # Initialize the priors with default (uniform dist) if not given by caller
    Pi = Pi if Pi is not None else np.full(K, 1 / K)
    T = len(y)
    T1 = np.empty((K, T), 'd')
    T2 = np.empty((K, T), 'B')

    # Initialize the tracking tables from first observation
    T1[:, 0] = Pi * B[:, y[0]]
    T2[:, 0] = 0

    # Iterate through the observations updating the tracking tables
    for i in range(1, T):
        T1[:, i] = np.max(T1[:, i - 1] * A.T * B[np.newaxis, :, y[i]].T, 1)
        T2[:, i] = np.argmax(T1[:, i - 1] * A.T, 1)

    # Build the output, optimal model trajectory
    x = np.empty(T, 'B')
    x[-1] = np.argmax(T1[:, T - 1])
    for i in reversed(range(1, T)):
        x[i - 1] = T2[x[i], i]

    return x, T1, T2
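A quick usage sketch with made-up parameters (two hidden states, three observation symbols), just to show the expected shapes:

A = np.array([[0.75, 0.25],     # state transition matrix (K, K)
              [0.32, 0.68]])
B = np.array([[0.8, 0.1, 0.1],  # emission matrix (K, M)
              [0.1, 0.2, 0.7]])
y = np.array([0, 1, 2, 1, 0])   # observation sequence (T,)

x, T1, T2 = viterbi(y, A, B)
print(x)  # most likely hidden-state sequence, e.g. [0 1 1 1 0]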
I found the following code in the example repository of Artificial Intelligence: A Modern Approach. Is something like this what you're looking for?
def viterbi_segment(text, P):
    """Find the best segmentation of the string of characters, given the
    UnigramTextModel P."""
    # best[i] = best probability for text[0:i]
    # words[i] = best word ending at position i
    n = len(text)
    words = [''] + list(text)
    best = [1.0] + [0.0] * n
    ## Fill in the vectors best, words via dynamic programming
    for i in range(n+1):
        for j in range(0, i):
            w = text[j:i]
            if P[w] * best[i - len(w)] >= best[i]:
                best[i] = P[w] * best[i - len(w)]
                words[i] = w
    ## Now recover the sequence of best words
    sequence = []; i = len(words)-1
    while i > 0:
        sequence[0:0] = [words[i]]
        i = i - len(words[i])
    ## Return sequence of best words and overall probability
    return sequence, best[-1]
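A tiny usage sketch, substituting a plain defaultdict for the UnigramTextModel (hypothetical probabilities; unseen substrings get a small floor so P[w] never raises a KeyError):

from collections import defaultdict

P = defaultdict(lambda: 1e-9, {'hello': 0.4, 'world': 0.4, 'hell': 0.1})
print(viterbi_segment('helloworld', P))  # e.g. (['hello', 'world'], 0.16000...)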
Hmm, I can post mine. It's not pretty though - please let me know if you need clarification. I wrote this relatively recently, specifically for part-of-speech tagging.
import copy

class Trellis:
    trell = []

    def __init__(self, hmm, words):
        self.trell = []
        temp = {}
        for label in hmm.labels:
            temp[label] = [0, None]
        for word in words:
            self.trell.append([word, copy.deepcopy(temp)])
        self.fill_in(hmm)

    def fill_in(self, hmm):
        for i in range(len(self.trell)):
            for token in self.trell[i][1]:
                word = self.trell[i][0]
                if i == 0:
                    self.trell[i][1][token][0] = hmm.e(token, word)
                else:
                    max = None
                    guess = None
                    c = None
                    for k in self.trell[i-1][1]:
                        c = self.trell[i-1][1][k][0] + hmm.t(k, token)
                        if max == None or c > max:
                            max = c
                            guess = k
                    max += hmm.e(token, word)
                    self.trell[i][1][token][0] = max
                    self.trell[i][1][token][1] = guess

    def return_max(self):
        tokens = []
        token = None
        for i in range(len(self.trell)-1, -1, -1):
            if token == None:
                max = None
                guess = None
                for k in self.trell[i][1]:
                    if max == None or self.trell[i][1][k][0] > max:
                        max = self.trell[i][1][k][0]
                        token = self.trell[i][1][k][1]
                        guess = k
                tokens.append(guess)
            else:
                tokens.append(token)
                token = self.trell[i][1][token][1]
        tokens.reverse()
        return tokens
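A hypothetical usage sketch with a toy stand-in for the HMM (labels is the tag set; e and t are assumed to return log-probability-like scores, since fill_in adds rather than multiplies them):

class ToyHMM:
    labels = ['NOUN', 'VERB']
    def e(self, token, word):  # emission score for a (tag, word) pair
        return -1.0 if (token, word) in {('NOUN', 'dogs'), ('VERB', 'bark')} else -5.0
    def t(self, prev, cur):    # transition score for a tag bigram
        return -1.0 if (prev, cur) == ('NOUN', 'VERB') else -3.0

trellis = Trellis(ToyHMM(), ['dogs', 'bark'])
print(trellis.return_max())  # e.g. ['NOUN', 'VERB']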
I have just corrected the pseudo implementation of Viterbi in Wikipedia. From the initial (incorrect) version, it took me a while to figure out where I was going wrong, but I finally managed it, thanks partly to Kevin Murphy's implementation of viterbi_path.m in the MatLab HMM toolbox.
In the context of an HMM object with variables as shown:
hmm = HMM()
hmm.priors = np.array([0.5, 0.5])  # pi = prior probs
hmm.transition = np.array([[0.75, 0.25],  # A = transition probs. / 2 states
                           [0.32, 0.68]])
hmm.emission = np.array([[0.8, 0.1, 0.1],  # B = emission (observation) probs. / 3 obs modes
                         [0.1, 0.2, 0.7]])
The Python function to run the Viterbi (best-path) algorithm is below:

# assumes numpy as np and the operator module are imported;
# observations is a 1-D sequence of observation indices
def viterbi(self, observations):
    """Return the best path, given an HMM model and a sequence of observations"""
    # A - initialise stuff
    nSamples = len(observations)
    nStates = self.transition.shape[0]  # number of states
    c = np.zeros(nSamples)  # scale factors (necessary to prevent underflow)
    viterbi = np.zeros((nStates, nSamples))  # initialise viterbi table
    psi = np.zeros((nStates, nSamples))  # initialise the best path table
    best_path = np.zeros(nSamples, dtype=int)  # this will be your output; int so it can index psi

    # B - appoint initial values for viterbi and best path (bp) tables - Eq (32a-32b)
    viterbi[:, 0] = self.priors.T * self.emission[:, observations[0]]
    c[0] = 1.0/np.sum(viterbi[:, 0])
    viterbi[:, 0] = c[0] * viterbi[:, 0]  # apply the scaling factor
    psi[0] = 0

    # C - Do the iterations for viterbi and psi for time>0 until T
    for t in range(1, nSamples):  # loop through time
        for s in range(0, nStates):  # loop through the states #(t-1)
            trans_p = viterbi[:, t-1] * self.transition[:, s]
            psi[s, t], viterbi[s, t] = max(enumerate(trans_p), key=operator.itemgetter(1))
            viterbi[s, t] = viterbi[s, t]*self.emission[s, observations[t]]
        c[t] = 1.0/np.sum(viterbi[:, t])  # scaling factor
        viterbi[:, t] = c[t] * viterbi[:, t]

    # D - Back-tracking
    best_path[nSamples-1] = viterbi[:, nSamples-1].argmax()  # last state
    for t in range(nSamples-1, 0, -1):  # states of (last-1)th to 0th time step
        best_path[t-1] = psi[best_path[t], t]
    return best_path
This is an old question, but none of the other answers were quite what I needed because my application doesn't have specific observed states.
Taking after @Rhubarb, I've also re-implemented Kevin Murphy's Matlab implementation (see viterbi_path.m), but I've kept it closer to the original. I've included a simple test case as well.
import numpy as np

def viterbi_path(prior, transmat, obslik, scaled=True, ret_loglik=False):
    '''Finds the most-probable (Viterbi) path through the HMM state trellis

    Notation:
        Z[t] := Observation at time t
        Q[t] := Hidden state at time t

    Inputs:
        prior: np.array(num_hid)
            prior[i] := Pr(Q[0] == i)
        transmat: np.ndarray((num_hid,num_hid))
            transmat[i,j] := Pr(Q[t+1] == j | Q[t] == i)
        obslik: np.ndarray((num_hid,num_obs))
            obslik[i,t] := Pr(Z[t] | Q[t] == i)
        scaled: bool
            whether or not to normalize the probability trellis along the way
            doing so prevents underflow by repeated multiplications of probabilities
        ret_loglik: bool
            whether or not to return the log-likelihood of the best path

    Outputs:
        path: np.array(num_obs)
            path[t] := Q[t]
    '''
    num_hid = obslik.shape[0]  # number of hidden states
    num_obs = obslik.shape[1]  # number of observations (not observation *states*)

    # trellis_prob[i,t] := Pr((best sequence of length t-1 goes to state i), Z[1:(t+1)])
    trellis_prob = np.zeros((num_hid, num_obs))
    # trellis_state[i,t] := best predecessor state given that we ended up in state i at t
    trellis_state = np.zeros((num_hid, num_obs), dtype=int)  # int because its elements will be used as indices
    path = np.zeros(num_obs, dtype=int)  # int because its elements will be used as indices

    trellis_prob[:, 0] = prior * obslik[:, 0]  # element-wise mult
    if scaled:
        scale = np.ones(num_obs)  # only instantiated if necessary to save memory
        scale[0] = 1.0 / np.sum(trellis_prob[:, 0])
        trellis_prob[:, 0] *= scale[0]

    trellis_state[:, 0] = 0  # arbitrary value since t == 0 has no predecessor
    for t in range(1, num_obs):
        for j in range(num_hid):
            trans_probs = trellis_prob[:, t-1] * transmat[:, j]  # element-wise mult
            trellis_state[j, t] = trans_probs.argmax()
            trellis_prob[j, t] = trans_probs[trellis_state[j, t]]  # max of trans_probs
            trellis_prob[j, t] *= obslik[j, t]
        if scaled:
            scale[t] = 1.0 / np.sum(trellis_prob[:, t])
            trellis_prob[:, t] *= scale[t]

    path[-1] = trellis_prob[:, -1].argmax()
    for t in range(num_obs-2, -1, -1):
        path[t] = trellis_state[(path[t+1]), t+1]

    if not ret_loglik:
        return path
    else:
        if scaled:
            loglik = -np.sum(np.log(scale))
        else:
            p = trellis_prob[path[-1], -1]
            loglik = np.log(p)
        return path, loglik


if __name__ == '__main__':
    # Assume there are 3 observation states, 2 hidden states, and 5 observations
    priors = np.array([0.5, 0.5])
    transmat = np.array([
        [0.75, 0.25],
        [0.32, 0.68]])
    emmat = np.array([
        [0.8, 0.1, 0.1],
        [0.1, 0.2, 0.7]])
    observations = np.array([0, 1, 2, 1, 0], dtype=int)
    obslik = np.array([emmat[:, z] for z in observations]).T
    print(viterbi_path(priors, transmat, obslik))                                #=> [0 1 1 1 0]
    print(viterbi_path(priors, transmat, obslik, scaled=False))                  #=> [0 1 1 1 0]
    print(viterbi_path(priors, transmat, obslik, ret_loglik=True))               #=> (array([0, 1, 1, 1, 0]), -7.776472586614755)
    print(viterbi_path(priors, transmat, obslik, scaled=False, ret_loglik=True)) #=> (array([0, 1, 1, 1, 0]), -8.0120386579275227)
Note that this implementation does not use emission probabilities directly but uses a variable obslik. Generally, emissions[i,j] := Pr(observed_state == j | hidden_state == i) for a particular observed state j, making emissions.shape == (num_hidden_states, num_obs_states).
However, given a sequence observations[t] := observation at time t, all the Viterbi algorithm requires is the likelihood of that observation for each hidden state. Hence, obslik[i,t] := Pr(observations[t] | hidden_state == i). The actual value of the observed state isn't necessary.
I have modified @Rhubarb's answer for the condition where the marginal probabilities are already known (e.g. by computing the Forward-Backward algorithm).
import operator
import numpy as np

def viterbi(transition_probabilities, conditional_probabilities):
    # Initialise everything
    num_samples = conditional_probabilities.shape[1]
    num_states = transition_probabilities.shape[0]  # number of states
    c = np.zeros(num_samples)  # scale factors (necessary to prevent underflow)
    viterbi = np.zeros((num_states, num_samples))  # initialise viterbi table
    best_path_table = np.zeros((num_states, num_samples))  # initialise the best path table
    best_path = np.zeros(num_samples).astype(np.int32)  # this will be your output

    # B - appoint initial values for viterbi and best path (bp) tables - Eq (32a-32b)
    viterbi[:, 0] = conditional_probabilities[:, 0]
    c[0] = 1.0/np.sum(viterbi[:, 0])
    viterbi[:, 0] = c[0] * viterbi[:, 0]  # apply the scaling factor

    # C - Do the iterations for viterbi and psi for time>0 until T
    for t in range(1, num_samples):  # loop through time
        for s in range(0, num_states):  # loop through the states #(t-1)
            trans_p = viterbi[:, t-1] * transition_probabilities[:, s]  # transition probs of each state transitioning
            best_path_table[s, t], viterbi[s, t] = max(enumerate(trans_p), key=operator.itemgetter(1))
            viterbi[s, t] = viterbi[s, t] * conditional_probabilities[s][t]
        c[t] = 1.0/np.sum(viterbi[:, t])  # scaling factor
        viterbi[:, t] = c[t] * viterbi[:, t]

    # D - Back-tracking
    best_path[num_samples-1] = viterbi[:, num_samples-1].argmax()  # last state
    for t in range(num_samples-1, 0, -1):  # states of (last-1)th to 0th time step
        best_path[t-1] = best_path_table[best_path[t], t]
    return best_path
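A usage sketch under the same assumptions (the per-state, per-time marginals here are hypothetical stand-ins for forward-backward output; columns need not be normalised):

transition_probabilities = np.array([[0.75, 0.25],
                                     [0.32, 0.68]])
conditional_probabilities = np.array([[0.9, 0.2, 0.1, 0.3, 0.8],
                                      [0.1, 0.8, 0.9, 0.7, 0.2]])
print(viterbi(transition_probabilities, conditional_probabilities))  # e.g. [0 1 1 1 0]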
