I am trying to predict the wind power of a wind farm, using activepower, temperature, winddirection, and windspeed to train my model. This is my first time working with HMMs, and I am confused about how to make a good prediction using continuous observations.
I am also confused about how the mixture coefficient should be used in this prediction. In the following code, the mixture coefficient was left at 1, which is the default.
Also, should I be calculating the covariance matrix, mean vector, state transition matrix, and observation matrix myself? And how can this be done?
import numpy as np
from hmmlearn.hmm import GaussianHMM

# stack the four observation variables into one feature matrix per dataset
features = np.column_stack((activepower, temperature, winddirection, windspeed))
test_data = np.column_stack((activepower_2, temperature_2, winddirection_2, windspeed_2))

# fit() estimates means, covariances, and the transition matrix automatically
features_model = GaussianHMM(n_components=4)
features_model.fit(features)
results = features_model.score(test_data)  # log-likelihood of the test data
forecast, pred_states = features_model.sample(1008)  # sample 1008 steps from the model
The code above gives me a prediction with a root mean square error (RMSE) of 598.37. I know this can be improved by switching from a hold-out method to rolling-window prediction. I am also using 4 hidden states for my model, since that gave me the lowest RMSE.
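One way to address the mixture-coefficient question is hmmlearn's GMMHMM, whose n_mix parameter sets the number of Gaussian mixture components per hidden state; fit() estimates all of the parameters asked about above, so none need to be computed by hand. Below is a minimal sketch, also showing a simple rolling-window loop; the window sizes are placeholders, not values from the question:

import numpy as np
from hmmlearn.hmm import GMMHMM

# n_mix > 1 replaces the single Gaussian per state with a mixture;
# the learned mixture coefficients end up in model.weights_
model = GMMHMM(n_components=4, n_mix=3, covariance_type="diag")
model.fit(features)
print(model.transmat_)  # state transition matrix
print(model.means_)     # mean vectors, per state and mixture component
print(model.covars_)    # covariance matrices
print(model.weights_)   # mixture coefficients per state

# rolling-window evaluation: refit on a sliding training slice and
# score the block that immediately follows it
window, horizon = 5000, 1008
scores = []
for start in range(0, len(features) - window - horizon + 1, horizon):
    m = GMMHMM(n_components=4, n_mix=3, covariance_type="diag")
    m.fit(features[start:start + window])
    scores.append(m.score(features[start + window:start + window + horizon]))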
I am running a multinomial logistic regression in sklearn, using sklearn.linear_model.LogisticRegression(multi_class="multinomial"). The dependent categorical variable has 3 options: Agree, Disagree, Unsure. The independent variables are two categorical variables: Education and Gender (binary gender for simplicity in this example). I get different results when I hand-calculate the probabilities from the regression coefficients versus using the built-in predict_proba().
import pandas as pd
from sklearn import preprocessing
from sklearn.linear_model import LogisticRegression

mnlr = LogisticRegression(multi_class="multinomial")
mnlr.fit(
    pd.get_dummies(df[["Education", "Gender"]]),  # one-hot encode the predictors
    preprocessing.LabelEncoder().fit_transform(df["statement"])  # encode the 3 classes
)
I concatenate the outputs of mnlr.intercept_ and mnlr.coef_ into a regression coefficients table that looks like this:
Using mnlr.predict_proba(), I get results that I cast into a dataframe to which I add the independent variables like this:
These sum to 1 across the 3 potential categories for each data point.
However, I cannot seem to reproduce these results when I try to calculate the predicted probabilities by hand from the logistic regression coefficients.
First, for each Gender x Education combination, I calculate the logit (aka log-odds, if I understand correctly) by simply adding the intercept and the relevant variable terms. For example, to get the logit for a Woman with a Bachelor's degree with the Agree regression: 0.88076 + 0.21827 + 0.21687 = 1.31590. The table of logits looks like this:
From this table, as I understand it, I should be able to convert these logits (log-odds) to predicted probabilities: p = e^logit / (1 + e^logit) for a given model and respondent (e.g., the probability that Women with a Bachelor's Agree with the statement). When I try this, however, I get very different results from .predict_proba(), and the hand-calculated probabilities do not sum to 1, as indicated in the table below:
For example, Women with a Bachelor's here have a 0.78850 probability of Agreeing with the statement, instead of the 0.7819 probability returned by .predict_proba(). Additionally, the hand-calculated probabilities across the 3 categories do not sum to 1, but rather to 1.47146.
I am almost certain this is a basic error on my part, but I cannot for the life of me figure it out. What am I doing incorrectly?
I figured this one out eventually. The answer is probably obvious to folks who really know multinomial logistic regression. The struggle I was having was that I needed to apply the softmax function (also known more descriptively as the normalized exponential function) to the logits. This function involves exponentiating the logit (log-odds) for each class and then dividing it by the sum of exponentiated logits for all classes. In this example, for Women with a Bachelor's degree, this would mean:
p(Agree) = e^(z_Agree) / (e^(z_Agree) + e^(z_Disagree) + e^(z_Unsure)) = 0.737007424626824
where each z is that class's logit from the table above.
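A minimal numpy sketch of that calculation (only the 1.31590 Agree logit comes from the example above; the other two logits are made-up placeholders):

import numpy as np

# logits for one respondent, one per class: [Agree, Disagree, Unsure]
logits = np.array([1.31590, 0.40, -0.20])  # last two values are placeholders

# softmax: exponentiate each logit and normalize by the sum over all classes
probs = np.exp(logits) / np.exp(logits).sum()
print(probs, probs.sum())  # the three probabilities sum to 1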
Hopefully this will be helpful to anyone else trying to understand how to do this by hand! (Which for me is really useful for trying to apply model-based inference as an alternative to design-based inference in sample surveys).
Sources that got me here:
How do I correctly manually recreate sklearn (python) logistic regression predict_proba outcome for multiple classification
https://en.wikipedia.org/wiki/Softmax_function
I'm running a multivariate regression in statsmodels. However, I would like to manually alter one of the coefficients for an independent variable prior to predicting. How would I go about doing that?
For example, say I train my data on a 2 year time period starting 4 years back. I return coefficients for wind, rain, and sun.
Now say that I train my data on the most recent 2 years of data and again get the coefficients in the regression output.
If I want to use the wind coefficient from the first regression output with the rain and sun coefficients from the second regression, how do I manually change wind prior to using predict?
EDIT:
Regression code/parameters:
import statsmodels.api as sm

model = sm.OLS(y[:train], X[:train]).fit()  # fit on the training window
predictions = model.predict(X[-test:])      # predict on the held-out window

Where X is [['rain','sun','wind']] and y is ['growth'].
The prediction in OLS is just a linear function of the explanatory variables, x dot params.
my_params = model.params.copy()  # copy so the fitted results stay untouched
my_params["wind"] = -99999       # overwrite the wind coefficient (placeholder value), assuming X is a DataFrame with named columns
my_predict = X[-test:].dot(my_params)
I recommend not changing any numbers directly in the model, because then any inferential results are invalid for the changed model.
If you have known parameters, then you can estimate a restricted model, e.g. with GLM.fit_constrained, or add them to the offset in GLM.
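A minimal sketch of the fit_constrained route, assuming X is a DataFrame with a 'wind' column and using 0.5 as a placeholder for the known coefficient:

import statsmodels.api as sm

# a Gaussian family makes the GLM equivalent to OLS
glm = sm.GLM(y[:train], X[:train], family=sm.families.Gaussian())

# pin the wind coefficient; the constraint string refers to the column names of X
res = glm.fit_constrained("wind = 0.5")
predictions = res.predict(X[-test:])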
I have been teaching myself artificial intelligence for several months through a project on character recognition and transcription of handwriting. Until now I have successfully used Keras, Theano, and TensorFlow, implementing CNN and CTC neural networks.
Today, I am trying to use Gaussian mixture models, the first step towards hidden Markov models with Gaussian emissions. To do so, I used sklearn.mixture with PCA reduction to select the best model with the Akaike and Bayesian information criteria: full covariance for AIC, which provides a nice U-curve, and tied for BIC, because with full covariance BIC gives just a linear curve. With 12,000 samples, I get the best model at 60 components for AIC and 120 components for BIC.
My input images are 64 pixels on a side and represent only the capital letters of the English alphabet: 26 categories, numbered from 0 to 25.
The fit method of sklearn's GaussianMixture ignores the labels, and the predict method returns the index of the most probable component (0 to 59 or 0 to 119), not the original label.
How can I retrieve the original label (the position of the character in a list) using sklearn's GaussianMixture?
So, you want to use GaussianMixture in a generative classifier. You need to compute P(Y|X) for each label and estimate the label according to these probabilities. To do so, you keep one GMM per label and train it with the data from the corresponding label. The score_samples method then gives you the log-likelihood, log P(X|Y), of the given data. If you multiply the likelihood by the prior, you get the posterior, P(Y|X). For each label you get a posterior, e.g. P(Y=0|X), P(Y=1|X), ..., and the label with the maximum posterior probability is reported as the estimated label.
You can get some hints from the code sample below. (Here it is assumed that the prior probabilities are equal; you need to account for unequal priors in your own implementation.)
import numpy as np
from sklearn.mixture import GaussianMixture

n_labels = 10  # 10 labels in this example; use 26 for the capital letters

score = np.empty((X_test.shape[0], n_labels))
predictor_list = []
for i in range(n_labels):
    # one GMM per label, trained only on that label's samples
    predictor = GaussianMixture()  # tune n_components per label as needed
    predictor.fit(X[Y == i])
    predictor_list.append(predictor)
    score[:, i] = predictor.score_samples(X_test)  # log P(X|Y=i) per sample

# the label whose model assigns the highest likelihood wins
Y_predicted = np.argmax(score, axis=1)
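If the priors are not equal, one way to fold them in is to work in log space. A sketch, assuming counts is a hypothetical array holding the number of training samples per label:

log_prior = np.log(counts / counts.sum())       # log P(Y=i) from class frequencies
log_posterior = score + log_prior               # log P(X|Y) + log P(Y), proportional to log P(Y|X)
Y_predicted = np.argmax(log_posterior, axis=1)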
I'm using a linear regression model to predict the weather data for one year. The prediction is done using Python's sklearn library. The problem is that I need to find the accuracy of the prediction. After a quick internet search I found out that r^2 is the way to measure accuracy. I calculated the r value as follows:
r value
0.0919309031356
Coefficients:
[-20.01071429 0. ]
Residual sum of squares: 19331.78
Variance score: -0.23
The problem is that I need to show the accuracy as a percentage. How do I do that? Do I need to use a tool to find the accuracy?
Maybe this question is more complicated than I think, but why not just
r = str((r**2) * 100) + '%'
For regression problems, you can use the following metrics to determine the quality of the fit (http://scikit-learn.org/stable/modules/model_evaluation.html#regression-metrics):
Mean squared error. Fit is good when the value is as low as possible
R^2 score. Fit is good when the value is 1 or close to it.
You can also calculate prediction error with:
(Actual value - Predicted value)/Actual value.
However, I am not sure if this is a common metric to evaluate a linear regression fit.
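A minimal sketch of computing these with sklearn, assuming y_true and y_pred hold the actual and predicted values:

import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

mse = mean_squared_error(y_true, y_pred)  # lower is better
r2 = r2_score(y_true, y_pred)             # 1.0 is a perfect fit

# per-sample relative prediction error, expressed as a percentage
pct_error = np.abs((y_true - y_pred) / y_true) * 100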
In neural networks, the number of samples used for training is 5000, and before the data is given for training it is normalized using the formula
y' = (y - mean(y)) / stdev(y)
Now I want to de-normalise the data after getting the predicted output. Generally, a separate test set is used for prediction, here 2000 samples. In order to de-normalise, the following formula is used
y = y' * stdev(y) + mean(y)
This approach is taken from the following thread:
How to denormalise (de-standardise) neural net predictions after normalising input data
Could anyone explain how the same mean and standard deviation used to normalize the training data (5000x2100) can be used to de-normalise the predicted data? For prediction the test data (2000x2100) is used, and the two sample counts are different.
The denormalization equation is simple algebra: it is the same equation as normalization, but solved for y instead of y'. Its purpose is to reverse the normalization process, recovering the "shape" of the original data; that is why you have to use the original stdev and mean.
Normalization is a process of shifting the data to center on 0 (using the mean), and then squeezing the distribution to a standard normal curve (for a new stdev of 1.0). To return to the original shape, you have to un-shift and un-squeeze the same amounts as the original distribution.
Note that we expect the predicted data to have a mean of 0 and a stdev around 1.0 (with some variation due to the central limit theorem). Your worry is not silly: we do have a different sample count for the stdev.
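A minimal numpy sketch of the round trip, assuming y_train holds the 5000 training targets and y_pred_norm holds the normalised predictions for the 2000 test samples:

import numpy as np

# the statistics come from the training targets only
mu, sigma = y_train.mean(), y_train.std()

y_train_norm = (y_train - mu) / sigma   # normalise for training

# de-normalise predictions with the SAME mu and sigma;
# the test set's own statistics are never used
y_pred = y_pred_norm * sigma + mu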