How to run hidden Markov models in Python with hmmlearn? - python

I tried to use hmmlearn from GitHub to run a binary hidden markov model. This does not work:
import hmmlearn.hmm as hmm
import numpy as np
transmat = np.array([[0.7, 0.3],
[0.3, 0.7]])
emitmat = np.array([[0.9, 0.1],
[0.2, 0.8]])
obs = np.array([0, 0, 1, 0, 0])
startprob = np.array([0.5, 0.5])
h = hmm.MultinomialHMM(n_components=2, startprob=startprob,
transmat=transmat)
h.emissionprob_ = emitmat
# fails
h.fit([0, 0, 1, 0, 0])
# fails
h.decode([0, 0, 1, 0, 0])
print h
I get this error:
ValueError: zero-dimensional arrays cannot be concatenated
What is the right way to use this module? Note I am using the version of hmmlearn that was separated from sklearn, because apparently sklearn doesn't maintain hmmlearn anymore.

fit accepts a list of sequences, not a single sequence, because in general you can have multiple independent sequences observed from different runs of your experiment. So simply put your sequence inside another list:
import hmmlearn.hmm as hmm
import numpy as np
transmat = np.array([[0.7, 0.3],
[0.3, 0.7]])
emitmat = np.array([[0.9, 0.1],
[0.2, 0.8]])
startprob = np.array([0.5, 0.5])
h = hmm.MultinomialHMM(n_components=2, startprob=startprob,
transmat=transmat)
h.emissionprob_ = emitmat
# works fine
h.fit([[0, 0, 1, 0, 0]])
# h.fit([[0, 0, 1, 0, 0], [0, 0], [1, 1, 1]])
# The nested-list syntax exists so that you can fit to multiple sequences.
print h.decode([0, 0, 1, 0, 0])
print h
gives
(-4.125363362578882, array([1, 1, 1, 1, 1]))
MultinomialHMM(algorithm='viterbi',
init_params='abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ',
n_components=2, n_iter=10,
params='abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ',
random_state=<mtrand.RandomState object at 0x7fe245ac7510>,
startprob=None, startprob_prior=1.0, thresh=0.01, transmat=None,
transmat_prior=1.0)
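To make it clearer what decode computes, here is a minimal NumPy sketch of Viterbi decoding for a discrete HMM. It is not hmmlearn's implementation; the function name `viterbi` is made up for illustration, and it uses the question's matrices as given, without the parameter re-estimation that fit performs.

```python
import numpy as np

def viterbi(obs, startprob, transmat, emitmat):
    """Return (log-probability, most likely state path) for one sequence."""
    obs = np.asarray(obs)
    n_steps, n_states = len(obs), len(startprob)
    # Work in log space to avoid underflow on long sequences.
    with np.errstate(divide="ignore"):
        log_start = np.log(startprob)
        log_trans = np.log(transmat)
        log_emit = np.log(emitmat)

    delta = np.empty((n_steps, n_states))           # best score ending in each state
    back = np.zeros((n_steps, n_states), dtype=int)  # best predecessor state
    delta[0] = log_start + log_emit[:, obs[0]]
    for t in range(1, n_steps):
        # scores[i, j]: best path ending in state i at t-1, then moving to j
        scores = delta[t - 1][:, None] + log_trans
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_emit[:, obs[t]]

    # Trace the best path backwards from the best final state.
    path = np.empty(n_steps, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(n_steps - 2, -1, -1):
        path[t] = back[t + 1, path[t + 1]]
    return delta[-1].max(), path

transmat = np.array([[0.7, 0.3], [0.3, 0.7]])
emitmat = np.array([[0.9, 0.1], [0.2, 0.8]])
startprob = np.array([0.5, 0.5])

logprob, path = viterbi([0, 0, 1, 0, 0], startprob, transmat, emitmat)
print(logprob, path)  # path is [0 0 1 0 0] for these unfitted parameters
```

The path differs from the output shown above because that output was produced after fit had re-estimated the model parameters.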

Related

Building a numpy structured array from a list of tuples is slow because extend() is called many times. I want to make that part faster

I have a numpy structured array that is built from a list of tuples assembled in a for loop, like "res" in this code. (Variable names and contents are simplified from the actual code.)
The for loop runs once per row of arr_2 and calls extend() on the list each time. Processing becomes extremely slow when arr_2 gets long.
Is there a way to build the array faster?
# -*- coding: utf-8 -*-
import numpy as np
arr_1 = np.array([[0, 0, 1], [0, 0.5, -1], [-1, 0, -1], [0, -0.5, -1], [1, 0, -1]])
arr_2 = np.array([[0, 1, 2], [0, 1, 2]])
all_arr = []
for p in arr_2:
    # Tuples for the current row of arr_2, kept separate from the full list
    chunk = [
        (arr_1[0], p), (arr_1[1], p), (arr_1[2], p),
        (arr_1[0], p), (arr_1[1], p), (arr_1[4], p),
        (arr_1[0], p), (arr_1[2], p), (arr_1[3], p),
        (arr_1[0], p), (arr_1[3], p), (arr_1[4], p),
        (arr_1[1], p), (arr_1[2], p), (arr_1[4], p),
        (arr_1[2], p), (arr_1[3], p), (arr_1[4], p)]
    all_arr.extend(chunk)
vtype = [('type_a', np.float32, 3), ('type_b', np.float32, 3)]
res = np.array(all_arr, dtype=vtype)
print(res)
I couldn't figure out why you used this indexing for arr_1, so I just copied it:
import numpy as np
arr_1 = np.array([[0, 0, 1], [0, 0.5, -1], [-1, 0, -1], [0, -0.5, -1], [1, 0, -1]])
arr_2 = np.array([[0, 1, 2], [0, 1, 2]])
weird_idx = np.array([0,1,2,0,1,4,0,2,3,0,3,4,1,2,4,2,3,4])
weird_arr1 = arr_1[weird_idx]
all_arr = [(weird_arr1[i], arr_2[j]) for j in range(len(arr_2)) for i in range(len(weird_arr1))]
vtype = [('type_a', np.float32, 3), ('type_b', np.float32, 3)]
res = np.array(all_arr, dtype=vtype)
You can also repeat the arrays:
arr1_rep = np.tile(weird_arr1.T,2).T
arr2_rep = np.repeat(arr_2,weird_arr1.shape[0],0)
res = np.empty(arr1_rep.shape[0],dtype=vtype)
res['type_a']=arr1_rep
res['type_b']=arr2_rep
Often with structured arrays it is faster to assign by field instead of the list of tuples approach:
In [388]: idx = [0,1,2,0,1,4,0,2,3,0,3,4,1,2,4,2,3,4]
In [400]: res1 = np.zeros(36, dtype=vtype)
In [401]: res1['type_a'][:18] = arr_1[idx]
In [402]: res1['type_a'][18:] = arr_1[idx]
In [403]: res1['type_b'][:18] = arr_2[0]
In [404]: res1['type_b'][18:] = arr_2[1]
In [405]: np.allclose(res['type_a'], res1['type_a'])
Out[405]: True
In [406]: np.allclose(res['type_b'], res1['type_b'])
Out[406]: True
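The field-assignment idea can be generalized to any number of rows in arr_2. The following is a sketch only; the function name `build_structured` is made up for illustration, and it assumes every row of arr_2 is paired with the same index pattern, as in the question.

```python
import numpy as np

def build_structured(arr_1, arr_2, idx):
    """Pair every row of arr_1[idx] with every row of arr_2, without a Python loop."""
    vtype = [('type_a', np.float32, 3), ('type_b', np.float32, 3)]
    picked = arr_1[idx]                      # shape (len(idx), 3)
    n_rows, n_blocks = len(picked), len(arr_2)
    res = np.empty(n_rows * n_blocks, dtype=vtype)
    # Repeat the whole picked block once per row of arr_2 ...
    res['type_a'] = np.tile(picked, (n_blocks, 1))
    # ... and repeat each row of arr_2 once per picked row.
    res['type_b'] = np.repeat(arr_2, n_rows, axis=0)
    return res

arr_1 = np.array([[0, 0, 1], [0, 0.5, -1], [-1, 0, -1], [0, -0.5, -1], [1, 0, -1]])
arr_2 = np.array([[0, 1, 2], [0, 1, 2]])
idx = [0, 1, 2, 0, 1, 4, 0, 2, 3, 0, 3, 4, 1, 2, 4, 2, 3, 4]

res = build_structured(arr_1, arr_2, idx)
print(res.shape)  # (36,)
```

Because both fields are filled with whole-array assignments, the cost of the Python-level loop no longer grows with the length of arr_2.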

Masking Using Pixel Statistics

I'm trying to mask bad pixels in a dataset taken from a detector. In my attempt to come up with a general way to do this so I can run the same code across different images, I tried a few different methods, but none of them ended up working. I'm pretty new with coding and data analysis in Python, so I could use a hand putting things in terms that the computer will understand.
As an example, consider the matrix
A = np.array([[3,5,50],[30,2,6],[25,1,1]])
What I'm wanting to do is set any element in A that is two standard deviations away from the mean equal to zero. The reason for this is that later in the code, I'm defining a function that only uses the nonzero values for the calculation, since the zeros are part of the mask.
I know this masking technique works, but I tried extending the following code to work with the standard deviation:
mask = np.ones(np.shape(A))
mask.flat[A.flat > 20] = 0
What I tried was:
mask = np.ones(np.shape(A))
for i,j in A:
    mask.flat[A[i,j] - 2*np.std(A) < np.mean(A) < A[i,j] + 2*np.std(A)] = 0
Which throws the error:
ValueError: too many values to unpack (expected 2)
If anyone has a better technique to statistically remove bad pixels in an image, I'm all ears. Thanks for the help!
==========
EDIT
After some trial and error, I got to a place that could help clarify my question. The new code is:
for i in A:
    for j in i:
        mask.flat[ j - 2*np.std(A) < np.mean(A) < j + 2*np.std(A)] = 0
This throws an error saying 'unsupported iterator index'. What I want is for the for loop to iterate over each element in the array, check whether it is more than 2 standard deviations away from the mean, and, if it is, set it to zero.
Here is an approach that will be slightly faster on larger images:
import numpy as np
import matplotlib.pyplot as plt
# generate dummy image
a = np.random.randint(1,5, (5,5))
# generate dummy outliers
a[4,4] = 20
a[2,3] = -6
# initialise mask
mask = np.ones_like(a)
# subtract mean and normalise to standard deviation.
# then any pixel in the resulting array that has an absolute value > 2
# is more than two standard deviations away from the mean
cond = (a-np.mean(a))/np.std(a)
# find those pixels and set them to zero.
mask[abs(cond) > 2] = 0
Inspection:
a
array([[ 1, 1, 3, 4, 2],
[ 1, 2, 4, 1, 2],
[ 1, 4, 3, -6, 1],
[ 2, 2, 1, 3, 2],
[ 4, 1, 3, 2, 20]])
np.round(cond, 2)
array([[-0.39, -0.39, 0.11, 0.36, -0.14],
[-0.39, -0.14, 0.36, -0.39, -0.14],
[-0.39, 0.36, 0.11, -2.12, -0.39],
[-0.14, -0.14, -0.39, 0.11, -0.14],
[ 0.36, -0.39, 0.11, -0.14, 4.32]])
mask
array([[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1],
[1, 1, 1, 0, 1],
[1, 1, 1, 1, 1],
[1, 1, 1, 1, 0]])
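The same idea can be wrapped in a small reusable function and applied to the matrix A from the question. This is a sketch; the name `sigma_mask` and the `n_sigma` parameter are made up for illustration, and it assumes the image is not constant (otherwise the standard deviation is zero).

```python
import numpy as np

def sigma_mask(image, n_sigma=2.0):
    """Return 1 for pixels within n_sigma standard deviations of the mean, else 0."""
    image = np.asarray(image, dtype=float)
    z = (image - image.mean()) / image.std()
    return (np.abs(z) <= n_sigma).astype(int)

A = np.array([[3, 5, 50], [30, 2, 6], [25, 1, 1]])
mask = sigma_mask(A)
masked = A * mask  # outlier pixels become zero

print(mask)
# [[1 1 0]
#  [1 1 1]
#  [1 1 1]]
```

Only the value 50 is flagged here: the mean of A is about 13.7 and the standard deviation about 16.4, so 30 and 25 are still within two standard deviations.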
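Your A is two-dimensional with three columns, so iterating over it yields rows of three values each, which cannot be unpacked into the two variables i and j. Use nested loops to visit each element, like below.
A = np.array([[3,5,50],[30,2,6],[25,1,1]])
for i in A:
    for j in i:
        print(j)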

Building an upper triangular matrix recursively

I've been breaking my head over trying to come up with a recursive way to build the following matrix in Python. It is quite a challenge without pointers. Could anyone maybe help me out?
The recursion is the following:
T0 = 1,
Tn+1 = [[Tn, Tn],
[ 0, Tn]]
I have tried many iterations of some recursive function, but I cannot wrap my head around it.
def T(n, arr):
    n=int(n)
    if n == 0:
        return 1
    else:
        c = 2**(n-1)
        Tn = np.zeros((c,c))
        Tn[np.triu_indices(n=c)] = self.T(n=n-1, arr=arr)
        return Tn
arr = np.zeros((8,8))
T(arr=arr, n=3)
It's not hard to do this, but you need to be careful about the meaning of the zero in the recursion. This isn't really precise for larger values of n:
Tn+1 = [[Tn, Tn],
[ 0, Tn]]
That is because the zero can represent a whole block of zeros. For example, on the second iteration you have this:
[1, 1, 1, 1],
[0, 1, 0, 1],
[0, 0, 1, 1],
[0, 0, 0, 1]
Those four zeros in the bottom-left are all represented by the one zero in the formula. The block of zeros needs to be the same shape as the blocks around it.
After that it's a matter of making NumPy put things in the right order and shape for you. numpy.block is really handy for this and makes it pretty simple:
import numpy as np
def makegasket(n):
    if n == 0:
        return np.array([1], dtype=int)
    else:
        node = makegasket(n-1)
        return np.block([[node, node], [np.zeros(node.shape, dtype=int), node]])
makegasket(3)
Result:
array([[1, 1, 1, 1, 1, 1, 1, 1],
[0, 1, 0, 1, 0, 1, 0, 1],
[0, 0, 1, 1, 0, 0, 1, 1],
[0, 0, 0, 1, 0, 0, 0, 1],
[0, 0, 0, 0, 1, 1, 1, 1],
[0, 0, 0, 0, 0, 1, 0, 1],
[0, 0, 0, 0, 0, 0, 1, 1],
[0, 0, 0, 0, 0, 0, 0, 1]])
If you use larger n you might enjoy matplotlib.pyplot.imshow for display:
from matplotlib.pyplot import imshow
# ....
imshow(makegasket(7))
You don't really need a recursive function to implement this recursion. The idea is to start with the UR corner and build outward. You can even start with the UL corner to avoid some of the book-keeping and flip the matrix along either axis, but this won't be as efficient in the long run.
def build_matrix(n):
    size = 2**n
    # Depending on the application, even dtype=bool might work
    # (np.int and np.bool were removed from recent NumPy versions)
    matrix = np.zeros((size, size), dtype=int)
    # This is t[0]
    matrix[0, -1] = 1
    for i in range(n):
        k = 2**i
        # Copy the current block to its left and below it
        matrix[:k, -2 * k:-k] = matrix[k:2 * k, -k:] = matrix[:k, -k:]
    return matrix
Just for fun, here is a plot of timing results for this implementation vs @Mark Meyer's answer. It shows the slight timing advantage (also memory) of using a looping approach in this case:
Both algorithms run out of memory around n=15 on my machine, which is not too surprising.
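As a further alternative not mentioned in the answers above, the recursion Tn+1 = [[Tn, Tn], [0, Tn]] is the Kronecker product of the 2x2 matrix [[1, 1], [0, 1]] with Tn, so numpy.kron can build it in a short loop. This is a sketch; the function name `build_with_kron` is made up for illustration.

```python
import numpy as np

def build_with_kron(n):
    """Build T_n by repeatedly taking the Kronecker product with [[1, 1], [0, 1]]."""
    base = np.array([[1, 1], [0, 1]], dtype=int)
    result = np.array([[1]], dtype=int)  # T_0
    for _ in range(n):
        # kron(base, T) lays out the blocks [[T, T], [0, T]]
        result = np.kron(base, result)
    return result

print(build_with_kron(2))
# [[1 1 1 1]
#  [0 1 0 1]
#  [0 0 1 1]
#  [0 0 0 1]]
```

The memory limit is the same as for the other approaches, since the dense result still has 4**n entries.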

Coefficients of Charpoly using Sympy in Python

I am new to the SymPy library. I need to extract all coefficients of the characteristic polynomial for later use.
For example, my code is:
import sympy as sp
M = sp.Matrix([[0, 0, 0, 1, 0, 1], [0, 0, 0, 0, 1, 0], [0, 1, 0, 1, 0, -1], [1, 0, -1, 0, 1, 0], [0, 0, 0, 1, 0, 0], [-1, 0, 1, 0, 0, 0]])
lamda = sp.symbols('lamda')
p = M.charpoly(lamda)
print(p)
print(p.coeffs())
which gives output:
PurePoly(lamda**6 + lamda**4 - lamda**2, lamda, domain='ZZ')
[1, 1, -1]
However, I need [1, 0, 1, 0, -1, 0, 0], which includes the zero coefficients of the lamda terms with exponents 5, 3, 1, and 0. I would normally use a for loop to iterate over the equation to see which terms are missing so that a zero can be inserted into the appropriate spot in the array of coefficients. However, when I attempted to do so, I received an error saying the PurePoly type doesn't support indexing. So, I was wondering if anyone knows how to make SymPy include the zeros, or a way to do it myself? I will eventually have to incorporate this code into a loop over lots of matrices, so I can't do it manually.
Thanks.
When I have questions like this I hope for some sort of intelligent naming of methods for objects and look through the directory of the object:
>>> print([w for w in dir(p) if 'coeff' in w])
['all_coeffs', 'as_coeff_Add', 'as_coeff_Mul', ...]
That all_coeffs is the one you want:
>>> help(p.all_coeffs)
Help on method all_coeffs in module sympy.polys.polytools:
all_coeffs(f) method of sympy.polys.polytools.PurePoly instance
Returns all coefficients from a univariate polynomial ``f``.
>>> p.all_coeffs()
[1, 0, 1, 0, -1, 0, 0]
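Since the question mentions looping over many matrices, here is a minimal sketch of how all_coeffs could be used in that setting. The helper name `charpoly_coeffs` and the two small example matrices are made up for illustration.

```python
import sympy as sp

def charpoly_coeffs(matrix, symbol=None):
    """Return all characteristic-polynomial coefficients, highest degree first."""
    lam = symbol if symbol is not None else sp.symbols('lamda')
    # all_coeffs() keeps zero coefficients, unlike coeffs()
    return matrix.charpoly(lam).all_coeffs()

matrices = [
    sp.Matrix([[2, 0], [0, 3]]),  # lamda**2 - 5*lamda + 6
    sp.Matrix([[0, 1], [1, 0]]),  # lamda**2 - 1, with a zero linear term
]
all_results = [charpoly_coeffs(m) for m in matrices]
print(all_results)  # [[1, -5, 6], [1, 0, -1]]
```

For an n x n matrix the returned list always has n + 1 entries, which makes it convenient to stack the results of many matrices into one array.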

Matrix Expression from String

Context: I'm doing a bunch of simulations that require me to implement different Hamiltonians. These Hamiltonians are just matrices, built out of Kronecker products of some common elements, with some prefactors that I have to calculate based on the system parameters. E.g, using ⊗ for the Kronecker product
H = w1(a,b,c) * sigmax ⊗ I + w2(x,y,z)*I ⊗ sigmay
I was hoping I could make a simple parser that could read in the values of a,b,c,x,y,z and an expression for the Hamiltonian and construct the necessary matrix. Sympy seems like an obvious candidate, but I can't get a matrix expression to build using strings.
from sympy import symbols, Matrix, MatrixSymbol, eye, sympify
from sympy.physics import msigma
from sympy.physics.quantum import TensorProduct
w1,w2 = symbols('w1 w2')
X1 = MatrixSymbol('X1',4,4)
X2 = MatrixSymbol('X2',4,4)
x = msigma(1)
x_1 = TensorProduct(eye(2),x)
x_2 = TensorProduct(x,eye(2))
exp = w1*X1 + w2*X2
exp.subs([(w1,0.5),(w2,2),(X1,x_1),(X2,x_2)]).as_explicit()
will work. But, trying
exp = MatrixExpr('w1*X1+w2*X2')
or
exp = MatrixExpr(sympify('w1*X1+w2*X2'))
or even
exp = sympify('w1*X1 + w2*X2')
exp.subs([(w1,0.5),(w2,2),(X1,x_1),(X2,x_2)])
won't.
It also won't work if I change w1 or w2 to be 1x1 instances of a MatrixSymbol.
What am I doing wrong here? This is my first time using sympy so I'm very clear that I may just be missing something.
Let's look at what's going on in a simpler case:
exp = sympify('w1*X1'); right_exp = w1*X1
type(exp), type(right_exp)
Out[47]: (sympy.core.mul.Mul, sympy.matrices.expressions.matmul.MatMul)
It looks like sympify doesn't understand that X1 is a matrix. So, if we state it explicitly, everything will be all right:
exp = sympify("w1*MatrixSymbol('X1',4,4)")
exp.subs([(w1,0.5),(X1,x_1)]).as_explicit()
Out[49]:
Matrix([
[ 0, 0.5, 0, 0],
[0.5, 0, 0, 0],
[ 0, 0, 0, 0.5],
[ 0, 0, 0.5, 0]])
right_exp.subs([(w1,0.5),(X1,x_1)]).as_explicit()
Out[50]:
Matrix([
[ 0, 0.5, 0, 0],
[0.5, 0, 0, 0],
[ 0, 0, 0, 0.5],
[ 0, 0, 0.5, 0]])
And the final statement:
exp = sympify("w1*MatrixSymbol('X1',4,4)+w2*MatrixSymbol('X2',4,4)")
exp.subs([(w1,0.5),(w2,2),(X1,x_1),(X2,x_2)]).as_explicit()
Out[63]:
Matrix([
[ 0, 0.5, 2, 0],
[0.5, 0, 0, 2],
[ 2, 0, 0, 0.5],
[ 0, 2, 0.5, 0]])
What's going on? If you read Basics of expressions in SymPy, you can find the statement that "matrices aren’t sympifiable", so sympify interprets X1 as a plain symbol.
It's hard to say how it will behave in other situations. There are notes in the docs that warn:
Sometimes autosimplification during sympification results in
expressions that are very different in structure than what was
entered.
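Another option, not covered in the answer above, is to keep the short expression string and tell sympify what the names mean through its `locals` argument. The sketch below assumes the same X1, X2, w1, and w2 as in the question and imports msigma from sympy.physics.matrices; treat it as one possible approach rather than the recommended one.

```python
from sympy import MatrixSymbol, Rational, eye, symbols, sympify
from sympy.physics.matrices import msigma
from sympy.physics.quantum import TensorProduct

w1, w2 = symbols('w1 w2')
X1 = MatrixSymbol('X1', 4, 4)
X2 = MatrixSymbol('X2', 4, 4)

# Map the names in the string to MatrixSymbol objects so that the parser
# does not create plain scalar symbols for them.
expr = sympify('w1*X1 + w2*X2', locals={'X1': X1, 'X2': X2})

x = msigma(1)
x_1 = TensorProduct(eye(2), x)
x_2 = TensorProduct(x, eye(2))

# Substitute numeric prefactors and explicit matrices, then expand.
result = expr.subs({w1: Rational(1, 2), w2: 2, X1: x_1, X2: x_2}).as_explicit()
print(result)
```

With this approach the Hamiltonian string stays readable, and the dictionary passed as `locals` can be built once and reused for every expression that is parsed.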
