So I am attempting to do some time series analysis with the statsmodels package in Python. I have some code that was given to me in a class, but it doesn't work! I've narrowed down the problem to the function below, but am getting a strange error that I can't solve.
def model_ARIMA_2(ts, order):
    from statsmodels.tsa.arima_model import ARIMA
    from statsmodels.tsa.arima_model import ARIMAResults
    model = ARIMA(ts, order = order)
    model_fit = model.fit(disp=0, method='mle', trend='nc')
    BIC = ARIMAResults(model_fit, order).bic
    print('Testing model of order: ' + str(order) + ' with BIC = ' + str(BIC))
    return(BIC, order, model_fit)
order = (1,1,1)
model_ARIMA_2(decomp.resid[6:-6], order)
And I get the error: AttributeError: 'ARIMAResults' object has no attribute 'endog'
My data looks like:
I've tried searching this online and haven't found anything helpful. Does anyone know why this error is cropping up and what the solution may be?
Thanks!
It looks like the error occurs when you are trying to extract the BIC.
When you fit an ARIMA model, in your case model_fit = model.fit(disp=0, method='mle', trend='nc'), Statsmodels returns an ARIMAResults object (see the documentation for the fit method). So you are attempting to create an ARIMAResults object from an ARIMAResults object, which is causing your error.
You should be able to get the BIC directly from the object returned when you fit the model (i.e. BIC = model_fit.bic), along with all the other fit statistics statsmodels reports.
It will be useful to become familiar with the methods and attributes of ARIMAResults objects, which are listed in the statsmodels documentation.
Best of luck!
I am having problems using sklearn's SequentialFeatureSelector for feature selection prior to clustering.
Please see my code:
Shooting_ST_array = np.array(Shooting_ST)
kmeans1 = KMeans(n_clusters=4)
kmeans1 = kmeans1.fit(Shooting_ST_array)
sfs = SequentialFeatureSelector(estimator=kmeans1, n_features_to_select=5, direction='backward')
sfs.fit(Shooting_ST_array, y=None)
sfs.get_feature_names_out(Shooting_ST.columns)
Depending on whether I used forward or backward selection, the algorithm returns the first 5 or last 5 columns.
I am also getting the following attribute error:
AttributeError: 'NoneType' object has no attribute 'split'
Does anyone have an idea what I am doing wrong? I have unfortunately not found any example implementations of the method for unlabelled data.
I have divided my data into training and validation samples and have successfully fit my model with three types of linear models. What I cannot figure out how to do is apply the model to the validation sample to evaluate the fit. When I attempt to apply the model to the holdout sample (sorry, I know that this isn't a reproducible example but I think that the issue is pretty clear. I'm just putting this snippet here for completeness. Please be gentle!):
valid = validation.loc[:, x + [ "sale_amt"]]
holdout1 = m1.predict(valid)
I get the following error message:
AttributeError Traceback (most recent call last)
in ()
8
9 valid = validation.loc[:, x + [ "sale_amt"]]
---> 10 holdout1 = m1.predict(valid)
AttributeError: 'OLS' object has no attribute 'predict'
Other Python OLS regression packages have a 'predict' method, but it doesn't seem that PySAL does. I realize that the function coefficients (betas) are available and will pursue applying them to my validation data directly, but I was hoping that there is a simple answer that I just missed.
I apologize if it is bad form to answer my own question, but I did come up with a solution. I contacted Daniel Arribas-Bel, one of the PySAL developers, and he helped guide me to the result I was seeking. Note that my PySAL OLS object is named m1, and my validation dataframe is called 'validation':
m1 = ps.model.spreg.OLS(...)
m1.intercept = m1.betas[0] # Get the intercept from the betas array
m1.coefficients = m1.betas[1:len(m1.betas)] # Get the coefficients from the betas array
validation['predicted_price'] = m1.intercept + validation.loc[:, x].dot( m1.coefficients)
Note that this is the method I would use for a non-spatial model adapted for the KNN model I built in PySAL and might not be technically fully correct for a spatial model. Caveat emptor.
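The same arithmetic can be sketched with plain NumPy, assuming the betas are stored as a column vector with the intercept first (the layout the snippet above relies on). The numbers are made-up placeholders, not the asker's data.

```python
import numpy as np

# Hypothetical betas in the assumed layout: shape (k + 1, 1), intercept first.
betas = np.array([[10.0], [2.0], [-1.0]])

# Hypothetical validation features: one row per observation, k columns.
X_valid = np.array([[1.0, 3.0],
                    [4.0, 0.5]])

intercept = betas[0, 0]        # scalar intercept
coefficients = betas[1:, 0]    # 1-D array of slope coefficients

# Linear prediction: intercept + X . coefficients
predicted = intercept + X_valid @ coefficients
print(predicted)
```

Flattening the betas to a scalar and a 1-D array keeps the prediction one-dimensional, which makes it easy to assign to a single DataFrame column.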
I am fitting my function with experimental data. The function is complicated enough that I am unable to post here, but my fitting module looks like this:
out_put = scipy.optimize.leastsq(func, initial_parameter, full_output=True, ftol=0.001, xtol=0.001, gtol=0.001)
fitter_sol = out_put[0]
error = np.sqrt(out_put[1].diagonal())
The last line of code raises an error when executed:
AttributeError: 'NoneType' object has no attribute 'diagonal'
What could be the potential source of this error?
The docs say the second result of leastsq is:
None if a singular matrix encountered (indicates very flat curvature in some direction).
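So the matrix that leastsq tried to invert for your fit is singular: at the solution, the residuals barely change along at least one direction in parameter space (for example because of redundant or poorly constrained parameters), so no covariance estimate is returned.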
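A hedged sketch of how the None case could be guarded against. The straight-line model, the data, and the helper name parameter_errors are all illustrative assumptions, since the asker's function is not shown. The scaling of cov_x by the residual variance follows the leastsq documentation.

```python
import numpy as np
from scipy.optimize import leastsq


def residuals(params, x, y):
    """Residuals of a straight-line model y = a * x + b (illustrative only)."""
    a, b = params
    return y - (a * x + b)


def parameter_errors(cov_x, resid, n_params):
    """Return 1-sigma parameter errors, or None when cov_x is unavailable."""
    if cov_x is None:
        # leastsq could not produce a covariance estimate (singular matrix).
        return None
    # Scale cov_x by the residual variance to get the parameter covariance.
    dof = len(resid) - n_params
    return np.sqrt(np.diag(cov_x) * (resid @ resid) / dof)


x = np.linspace(0.0, 10.0, 50)
rng = np.random.default_rng(0)
y = 3.0 * x + 1.0 + rng.normal(scale=0.1, size=x.size)

params, cov_x, infodict, mesg, ier = leastsq(
    residuals, [1.0, 0.0], args=(x, y), full_output=True
)
errors = parameter_errors(cov_x, infodict['fvec'], len(params))
print(params, errors)
```

If errors comes back as None for the real model, it is worth checking whether two parameters have the same effect on the residuals or whether the initial guess sits in a flat region.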
Can Label Propagation be used for semi-supervised regression tasks in scikit-learn?
According to its API, the answer is YES.
http://scikit-learn.org/stable/modules/label_propagation.html
However, I got the error message when I tried to run the following code.
from sklearn import datasets
from sklearn.semi_supervised import label_propagation
import numpy as np
rng=np.random.RandomState(0)
boston = datasets.load_boston()
X=boston.data
y=boston.target
y_30=np.copy(y)
y_30[rng.rand(len(y))<0.3]=-999
label_propagation.LabelSpreading().fit(X,y_30)
It shows that "ValueError: Unknown label type: 'continuous'" in the label_propagation.LabelSpreading().fit(X,y_30) line.
How should I solve the problem? Thanks a lot.
It looks like an error in the documentation; the code itself is clearly classification-only (beginning of the .fit call of the BasePropagation class):
check_classification_targets(y)
# actual graph construction (implementations should override this)
graph_matrix = self._build_graph()
# label construction
# construct a categorical distribution for classification only
classes = np.unique(y)
classes = (classes[classes != -1])
In theory you could remove the "check_classification_targets" call and use it in a "regression-like" way, but it would not be true regression: you would never "propagate" any value that was not encountered in the training set, because each regression value would simply be treated as a class identifier. You would also be unable to use the value "-1", since it is the marker for "unlabeled".
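One possible workaround, sketched under stated assumptions: bin the continuous target into a few classes and mark unlabeled samples with -1 (not -999). This is classification on discretized targets, not regression. The data is synthetic because load_boston has been removed from recent scikit-learn releases, and the choice of 4 quantile bins is arbitrary.

```python
import numpy as np
from sklearn.semi_supervised import LabelSpreading

rng = np.random.RandomState(0)

# Synthetic stand-in data: 200 samples, 3 features, continuous target.
X = rng.rand(200, 3)
y = 10.0 * X[:, 0] + rng.normal(scale=0.5, size=200)

# Discretize the target into 4 quantile bins (class ids 0..3).
edges = np.quantile(y, [0.25, 0.5, 0.75])
y_binned = np.digitize(y, edges)

# Hide roughly 30% of the labels; -1 is the marker for "unlabeled".
y_partial = y_binned.copy()
unlabeled = rng.rand(len(y)) < 0.3
y_partial[unlabeled] = -1

model = LabelSpreading().fit(X, y_partial)

# transduction_ holds the class assigned to every training sample.
predicted_bins = model.transduction_
print(set(predicted_bins))
```

Each predicted bin can be mapped back to a representative value (for example the bin's mean target), but the resolution is limited by the number of bins.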
I am a complete beginner, and I'm currently doing this tutorial about logit regression models in Python 3.4, with statsmodels 0.6.1 and PyCharm Community Edition 4.5.1:
http://blog.yhathq.com/posts/logistic-regression-and-python.html
It runs smoothly. I try to add my own lines, to try out a few things.
After the part when I fit the data
train_cols = data.columns[1:]
logit = sm.Logit(data['admit'], data[train_cols])
result = logit.fit()
and I print out the summary
print(result.summary())
I tried to take a little detour from the tutorial, to print only the Goodness of Fit measurement (in this case, it is a pseudo R-squared value). According to the documentation it is a method of result object (same as summary), so it should work like this:
print(result.prsquared())
However, running this code results in a TypeError on a line containing only print(result.prsquared()):
TypeError: 'numpy.float64' object is not callable
It really bugs me, because if I were to compare several models, pseudo R-squared would be my first choice for doing it.
prsquared is an attribute, not a function. Try:
print(result.prsquared)