I'm trying to use hmmlearn to get the most likely hidden state sequence from a Hidden Markov Model, given start probabilities, transition probabilities, and emission probabilities.
I have two hidden states and four possible emission values, so I'm doing this:
import numpy as np
from hmmlearn import hmm

num_states = 2
num_observations = 4
start_probs = np.array([0.2, 0.8])
trans_probs = np.array([[0.75, 0.25], [0.1, 0.9]])
emission_probs = np.array([[0.3, 0.2, 0.2, 0.3], [0.3, 0.3, 0.3, 0.1]])
model = hmm.MultinomialHMM(n_components=num_states)
model.startprob_ = start_probs
model.transmat_ = trans_probs
model.emissionprob_ = emission_probs
seq = np.array([[3, 3, 2, 2]]).T
model.fit(seq)
log_prob, state_seq = model.decode(seq)
My stack trace points to the decode call and throws this error:
ValueError: too many values to unpack (expected 2)
I thought decode (looking at the docs) returns a log probability and the state sequence, so I'm confused.
Any idea?
Thanks!
The call model.fit(seq) requires seq to be a list of lists, which is how you have correctly set it up.
However, model.decode(seq) requires seq to be a plain list, not a list of lists. Thus,
model.fit([[3, 3, 2, 2]])
log_prob, state_seq = model.decode([3, 3, 2, 2])
should work without throwing an error.
See also here.
The error ValueError: too many values to unpack (expected 2) is thrown from a function called by a function called by a function... inside decode. So the error does not mean that decode returned the wrong number of objects; it comes from unpacking framelogprob.shape somewhere inside base.py. A more meaningful error message would make life easier here.
I had the same issue and it drove me crazy. Hope my post helps somebody.
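As a cross-check that does not depend on hmmlearn's input conventions, the most likely state sequence for the parameters in the question can be computed with a small hand-written Viterbi routine. This is a minimal NumPy sketch, not hmmlearn's implementation; the function name viterbi and its signature are made up for illustration.

```python
import numpy as np

def viterbi(obs, start_probs, trans_probs, emission_probs):
    """Hypothetical helper: most likely state path for a discrete HMM."""
    obs = np.asarray(obs)
    n_steps, n_states = len(obs), len(start_probs)
    # Work in log space; log(0) = -inf is acceptable here, so silence the warning.
    with np.errstate(divide="ignore"):
        log_start = np.log(start_probs)
        log_trans = np.log(trans_probs)
        log_emit = np.log(emission_probs)

    score = np.empty((n_steps, n_states))            # best log prob ending in each state
    back = np.zeros((n_steps, n_states), dtype=int)  # best predecessor of each state
    score[0] = log_start + log_emit[:, obs[0]]
    for t in range(1, n_steps):
        # cand[i, j]: best path ending in state i at t-1, then moving to state j
        cand = score[t - 1][:, None] + log_trans
        back[t] = cand.argmax(axis=0)
        score[t] = cand.max(axis=0) + log_emit[:, obs[t]]

    # Trace the best path backwards from the best final state.
    states = np.empty(n_steps, dtype=int)
    states[-1] = score[-1].argmax()
    for t in range(n_steps - 2, -1, -1):
        states[t] = back[t + 1, states[t + 1]]
    return score[-1].max(), states

start_probs = np.array([0.2, 0.8])
trans_probs = np.array([[0.75, 0.25], [0.1, 0.9]])
emission_probs = np.array([[0.3, 0.2, 0.2, 0.3], [0.3, 0.3, 0.3, 0.1]])
obs = [3, 3, 2, 2]

log_prob, state_seq = viterbi(obs, start_probs, trans_probs, emission_probs)
print(log_prob, state_seq)
```

Note that this uses the probabilities exactly as given, whereas calling model.fit re-estimates them from the data, so the two results are not guaranteed to agree.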
I'm trying to implement a fast entropy calculation for a float list of probabilities.
Instead of looping through a list and checking for zero each time, I'm attempting to mask zeros using numpy's built-in masking functionality. It works absolutely fine, unless I try to put it into a function, at which point it breaks. Any suggestions?
import numpy as np

# Works fine!!
distribution = np.array([0.20, 0.3, 0.25, 0.25, 0])
log_dist = np.log2(distribution, out=np.zeros_like(distribution), where=(distribution!=0))
entropy = -np.sum(distribution * log_dist)
print(entropy)
# Breaks!
def calculate_entropy(distribution):
    log_dist = np.log2(distribution, out=np.zeros_like(distribution), where=(distribution!=0))
    entropy = -np.sum(distribution * log_dist)
    return entropy

calculate_entropy([0.20, 0.3, 0.25, 0.25, 0])
output:
nan
Error message:
/var/folders/bt/vk3t9rnn2jz5d1wgj2rc3v200000gn/T/ipykernel_61321/2272953976.py:3: RuntimeWarning: divide by zero encountered in log2
log_dist = np.log2(distribution, out=np.zeros_like(distribution), where=(distribution!=0))
/var/folders/bt/vk3t9rnn2jz5d1wgj2rc3v200000gn/T/ipykernel_61321/2272953976.py:4: RuntimeWarning: invalid value encountered in multiply
entropy = -np.sum(distribution * log_dist)
I was expecting the function to work exactly the same, what am I missing?
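Ugh, I'm an idiot. I forgot to convert the list into a numpy array. With a plain list, distribution != 0 evaluates to a single True rather than an element-wise mask, so log2 is computed for the zero entry as well. Fix: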
def calculate_entropy(distribution):
    distribution = np.array(distribution)
    log_dist = np.log2(distribution, out=np.zeros_like(distribution), where=(distribution!=0))
    entropy = -np.sum(distribution * log_dist)
    return entropy

calculate_entropy([0.20, 0.3, 0.25, 0.25, 0])
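A slightly more defensive variant of the same fix, as a sketch: np.asarray with dtype=float accepts lists, tuples, and integer arrays alike, and avoids a copy when the input is already a float array. The dtype=float part is my addition, not something from the post; it is there so that the out buffer created by np.zeros_like is always a float array.

```python
import numpy as np

def calculate_entropy(distribution):
    # Coerce any array-like input to a float array so that `p != 0`
    # is an element-wise mask and the `out` buffer can hold log values.
    p = np.asarray(distribution, dtype=float)
    # Entries where p == 0 keep the 0.0 from the `out` buffer,
    # so they contribute nothing to the sum.
    log_p = np.log2(p, out=np.zeros_like(p), where=(p != 0))
    return -np.sum(p * log_p)

entropy_from_list = calculate_entropy([0.20, 0.3, 0.25, 0.25, 0])
entropy_from_array = calculate_entropy(np.array([0.20, 0.3, 0.25, 0.25, 0]))
print(entropy_from_list, entropy_from_array)
```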
I'm trying to run a model in Python (not built by me) and I get this error:
This comes from:
from seirsplus.models import *
import networkx
numNodes = 10000
baseGraph = networkx.barabasi_albert_graph(n=numNodes, m=9)
G_normal = custom_exponential_graph(baseGraph, scale=100)
# Social distancing interactions:
G_distancing = custom_exponential_graph(baseGraph, scale=10)
# Quarantine interactions:
G_quarantine = custom_exponential_graph(baseGraph, scale=5)
model = SEIRSNetworkModel(G=G_normal, beta=0.155, sigma=1/5.2, gamma=1/12.39, mu_I=0.0004, p=0.5,
Q=G_quarantine, beta_D=0.155, sigma_D=1/5.2, gamma_D=1/12.39, mu_D=0.0004,
theta_E=0.02, theta_I=0.02, phi_E=0.2, phi_I=0.2, psi_E=1.0, psi_I=1.0, q=0.5,
initI=10)
checkpoints = {'t': [20, 100], 'G': [G_distancing, G_normal], 'p': [0.1, 0.5], 'theta_E': [0.02, 0.02], 'theta_I': [0.02, 0.02], 'phi_E': [0.2, 0.2], 'phi_I': [0.2, 0.2]}
model.run(T=300, checkpoints=checkpoints)
model.figure_infections()
I've attached an image showing the highlighted part.
From what I understand, this has to do with the way the class SEIRSNetworkModel is constructed. I already forked the original repository:
https://github.com/ryansmcgee/seirsplus/wiki/SEIRSNetworkModel-class
but I don't know where to look for this constructor, or what to search for in order to fix this problem. This may be too simple, but I can't find my way.
I'd appreciate any help, as simple as possible please since you can see I don't know how to navigate in here.
I am working on a detailed code that requires optimization, which I have simplified for the MWE. I am trying to find the optimal value of arg_opt that minimizes the value that is obtained from a different function.
I believe it is a simple error, or my understanding is wrong. But wouldn't the final optimized solution be independent of the initial guess (for small variations, as in this case)? For this MWE, I get the same minimized value, but the final value of x is different. I would have expected only minor differences. What is the source of this discrepancy?
MWE
import numpy as np
from scipy import optimize
def fn_cubic(arg_1, arg_2, arg_3, data):
    return (arg_1 ** 3 + arg_2 ** 2 + arg_3 + np.sum(np.exp(data))) / 100
arg_opt_1 = np.ones(shape=(3)) * 2
arg_opt_2 = np.ones(shape=(3)) * 3
data_original = [1, 5, 4, 10, 3, 9, 6, 3]
data = np.zeros(shape=len(data_original))
pos_data = np.array([1, 3, 2])
def function_to_optimize(arg_opt, arg_1, arg_2, arg_3):
    for x, y in enumerate(arg_opt):
        data[pos_data[x]] = data_original[pos_data[x]] * y
    value = fn_cubic(arg_1, arg_2, arg_3, data)
    return value
opt_sol_1 = optimize.minimize(function_to_optimize, arg_opt_1, args=(0.1, 0.2, 0.3))
opt_sol_2 = optimize.minimize(function_to_optimize, arg_opt_2, args=(0.1, 0.2, 0.3))
print(' 1:', opt_sol_1.x, '\n','2:', opt_sol_2.x)
Output
1: [-1.10240891e+03 -9.28714306e-01 -1.17584215e+02]
2: [-1.98936327e+03 -9.68415948e-01 -1.53438039e+03]
There's no particular guarantee about the relationship between the initial guess and the point found by the optimizer. You can even get different x values by giving the same initial guess and using different solver methods.
One thing to keep in mind is that the function you're choosing to optimize is kind of weird. The only way it uses the "data" is to exponentiate it and sum it. This means that it is "collapsing" a lot of potential variation in the argument. (For instance, permuting the values of data will not change the value of the objective function.)
In other words, there are many different "data" values that will give the same result, so it's not surprising that the solver sometimes finds different ones.
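To make this concrete, here is a small sketch that evaluates the objective from the question at the two reported solutions. The function objective is my own rewrite of function_to_optimize (it builds a fresh data array instead of mutating a global one, which should be equivalent here), and floor is a name I introduce for the value the objective approaches when all three exponentials vanish. Since each exponential keeps shrinking as its argument goes toward minus infinity, the objective only approaches that value, and the solver presumably stops wherever the slope drops below its tolerance.

```python
import numpy as np

data_original = np.array([1, 5, 4, 10, 3, 9, 6, 3], dtype=float)
pos_data = np.array([1, 3, 2])

def objective(arg_opt, arg_1=0.1, arg_2=0.2, arg_3=0.3):
    # Same computation as in the question, without the global `data` array.
    data = np.zeros(len(data_original))
    data[pos_data] = data_original[pos_data] * np.asarray(arg_opt)
    return (arg_1 ** 3 + arg_2 ** 2 + arg_3 + np.sum(np.exp(data))) / 100

# The two solutions printed in the question.
x_1 = [-1.10240891e+03, -9.28714306e-01, -1.17584215e+02]
x_2 = [-1.98936327e+03, -9.68415948e-01, -1.53438039e+03]

f_1 = objective(x_1)
f_2 = objective(x_2)

# Value approached when the three scaled entries contribute exp(-inf) = 0;
# the five untouched entries each contribute exp(0) = 1.
floor = (0.1 ** 3 + 0.2 ** 2 + 0.3 + 5) / 100

print(f_1, f_2, abs(f_1 - f_2))
print(f_1 - floor, f_2 - floor)
```

Both points sit just above the same floor, even though the x values differ by hundreds or more in two of the three components.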
I'm using statsmodels' weighted least squares regression, but getting some really huge values.
Here's my code:
X = np.array([[1,2,3],[1,2,3],[4,5,6],[1,2,3],[4,5,6],[1,2,3],[1,2,3],[4,5,6],[4,5,6],[1,2,3]])
y = np.array([1, 1, 0, 1, 0, 1, 1, 0, 0, 1])
w = np.array([0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5])
temp_g = sm.WLS(y, X, w).fit()
Now, what I understand is that in WLS regression, just like in any linear regression problem, we provide the endog vector and the exog vector and the function can find the line of the best fit and tell us what the coefficients/regression parameters for each observation ought to be. For example, in my data, where each observation consists of 3 features, I'm expecting there to be 3 parameters.
So I fetch them like this:
parameters = temp_g.params # I'm hoping I've got this right! Or do I need to use "fittedvalues" instead?
The issue is that I'm getting really huge values like this:
temp g params :
[ -7.66645036e+198 -9.01935337e+197 5.86257969e+198]
or this:
temp g params :
[-2.77777778 -0.44444444 1.88888889]
Which is creating problems in further usage of these parameters, especially since I have some exponents to work with as well, and I need to raise e to the power of some of the regression parameters, which is proving impossible, given such big numbers. Because I keep getting overflow errors when using exp().
Is this normal? Am I doing something wrong? Or is there a specific way to make them useful?
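One thing that can be checked directly on the data above, offered as a diagnostic rather than a confirmed explanation: the three columns of X are linearly dependent, so the design matrix has rank 2 rather than 3. Whether this is what produces the huge parameter values is an assumption on my part; the sketch below only verifies the rank using NumPy.

```python
import numpy as np

X = np.array([[1, 2, 3], [1, 2, 3], [4, 5, 6], [1, 2, 3], [4, 5, 6],
              [1, 2, 3], [1, 2, 3], [4, 5, 6], [4, 5, 6], [1, 2, 3]])

# Rank of the design matrix compared with the number of columns.
rank = np.linalg.matrix_rank(X)
n_columns = X.shape[1]
print(rank, n_columns)

# One exact linear relation between the columns: col0 - 2*col1 + col2 == 0.
dependency = X[:, 0] - 2 * X[:, 1] + X[:, 2]
print(dependency)
```

When the rank is lower than the number of columns, there is no unique set of three coefficients that fits the data best.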
Hi, I'm a scikit-learn newbie. I'm trying to train a classifier that, given an array of floats, decides between 3 classes. I labeled the classes as 0, 0.5, and 1. I also tried 0, 1.0, and 2.0. I still get the following error:
File "/Library/Python/2.7/site-packages/sklearn/utils/multiclass.py", line 85, in unique_labels
raise ValueError("Mix type of y not allowed, got types %s" % ys_types)
ValueError: Mix type of y not allowed, got types set(['continuous', 'multiclass'])
I have no idea what that error means
Try using integer types for your target labels. Or, perhaps better, use string labels like ['a', 'b', 'c'] but with more descriptive names.
If you check the code for this file multiclass.py (code is here) and look for the function type_of_target, you'll see that it is well-documented for this case.
Because some of the data are treated as float type (when 0.5 is included), it will believe you've got continuous-valued outputs, which won't do for multiclass discrete classification.
On the other hand, it will look at [0, 1.0, 2.0] like it is one integer and two floats, which is why you get both continuous and multiclass. Switching the last example to [0, 1, 2] should work. The documentation makes it sound like switching to [0.0, 1.0, 2.0] would also work, but be careful and test that first.
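The suggestion to use integer labels can be applied without retyping the targets by hand. A minimal sketch, assuming the labels are stored in a NumPy array (the names y_float, classes, and y_int are made up for illustration): np.unique with return_inverse=True maps each distinct value to an integer code.

```python
import numpy as np

# Float-valued class labels like the ones in the question.
y_float = np.array([0, 0.5, 1, 0.5, 0, 1])

# `classes` holds the sorted distinct labels; `y_int` holds, for each sample,
# the index of its label in `classes`, i.e. an integer class code.
classes, y_int = np.unique(y_float, return_inverse=True)

print(classes)
print(y_int)

# The original labels can be recovered from the codes when needed.
recovered = classes[y_int]
```

The integer codes in y_int can then be passed as the target, and classes[prediction] translates predictions back to the original values.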
It's hard to tell for sure without the code, but my guess is that the shape of your y data is not what is expected.
For example when my code threw this error it was because I was trying to pass y data into classification_report in the shape of (60000, 10, 2) when it was expecting it to be in the shape of (60000, 10)
I was re-running cells where I called to_categorical(y_test) more than once... When I loaded my code into a proper script and ran it it worked fine :)