I'm building a dynamic factor model using the excellent Python package statsmodels, and I would like to pickle an estimated parameter vector so I can build the model again later and load those params into it. (Cf. this notebook built by Chad Fulton: https://github.com/ChadFulton/tsa-notebooks/blob/master/dfm_coincident.ipynb.)
In the following block of code, initial parameters are estimated with mod.fit() (using the Powell algorithm) and then passed back to mod.fit() as starting values (initial_res.params, a pandas Series) to complete the estimation (using the EM algorithm).
mod = sm.tsa.DynamicFactor(endog, k_factors=1, factor_order=2, error_order=2)
initial_res = mod.fit(method='powell', disp=False)
res = mod.fit(initial_res.params)
I would like to pickle res.params (again, a small Pandas Series, and a small disk footprint). Then later build the model from scratch again, and load my saved parameters into it without having to re-estimate the model. Anyone know how that can be done?
Examples I have seen suggest pickling the results object res, but that can be a pretty big save. Building it from scratch is pretty simple, but estimation takes a while. It may be that estimation starting from the saved optimal params is quicker; but still, that's pretty amateurish, right?
TIA,
Drew
You can use the smooth method on any state space model to construct a results object from specific parameters. In your example:
mod = sm.tsa.DynamicFactor(endog, k_factors=1, factor_order=2, error_order=2)
initial_res = mod.fit(method='powell', disp=False)
res = mod.fit(initial_res.params)
res.params.to_csv(...)
# ...later...
params = pd.read_csv(..., index_col=0).iloc[:, 0]  # read_csv returns a DataFrame; pull the Series back out
res = mod.smooth(params)
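One detail worth checking in practice: Series.to_csv writes an index/value table and pd.read_csv returns a DataFrame, so the parameter vector should be squeezed back into a Series (or a plain array) before reuse. A minimal round trip with pandas alone, using a hypothetical parameter Series in place of res.params (the names below are made up for illustration):

```python
import io

import pandas as pd

# Hypothetical parameter vector standing in for res.params
params = pd.Series([0.9, -0.2, 0.05],
                   index=['loading.f1', 'L1.f1.f1', 'sigma2.e1'])

buf = io.StringIO()          # stands in for a file on disk
params.to_csv(buf)           # writes one "name,value" row per parameter
buf.seek(0)

# read_csv yields a one-column DataFrame; take that column back out as a Series
loaded = pd.read_csv(buf, index_col=0).iloc[:, 0]

print(list(loaded) == list(params))   # True: values survive the round trip
```

The same Series could equally be pickled directly (params.to_pickle / pd.read_pickle), which avoids the DataFrame detour; CSV just keeps the file human-readable.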
I am comparing SARIMAX fitting results between R (3.3.1) forecast package (7.3) and Python's (3.5.2) statsmodels (0.8).
The R-code is:
library(forecast)
data("AirPassengers")
Arima(AirPassengers, order=c(2,1,1), seasonal=list(order=c(0,1,0),
period=12))$aic
[1] 1017.848
The Python code is:
from statsmodels.tsa.statespace import sarimax
import pandas as pd
AirlinePassengers = pd.Series(
    [112,118,132,129,121,135,148,148,136,119,104,118,115,126,
     141,135,125,149,170,170,158,133,114,140,145,150,178,163,
     172,178,199,199,184,162,146,166,171,180,193,181,183,218,
     230,242,209,191,172,194,196,196,236,235,229,243,264,272,
     237,211,180,201,204,188,235,227,234,264,302,293,259,229,
     203,229,242,233,267,269,270,315,364,347,312,274,237,278,
     284,277,317,313,318,374,413,405,355,306,271,306,315,301,
     356,348,355,422,465,467,404,347,305,336,340,318,362,348,
     363,435,491,505,404,359,310,337,360,342,406,396,420,472,
     548,559,463,407,362,405,417,391,419,461,472,535,622,606,
     508,461,390,432])
AirlinePassengers.index = pd.DatetimeIndex(end='1960-12-31',
periods=len(AirlinePassengers), freq='1M')
print(sarimax.SARIMAX(AirlinePassengers,order=(2,1,1),
seasonal_order=(0,1,0,12)).fit().aic)
This throws an error: ValueError: Non-stationary starting autoregressive parameters found with enforce_stationarity set to True.
If I set enforce_stationarity (and enforce_invertibility, which is also required) to False, the model fit works but AIC is very poor (>1400).
Using some other model orders for the same data, e.g. ARIMA(0,1,1)(0,0,1)[12], I can get identical results from R and Python with the stationarity and invertibility checks enabled in Python.
My main question is: what explains the difference in behavior for some model orders? Are statsmodels' invertibility checks different from those in forecast's Arima, and is one of them somehow "more correct"?
I also found a pull request related to fixing an invertibility calculation bug in statsmodels: https://github.com/statsmodels/statsmodels/pull/3506
After re-installing statsmodels with the latest source code from GitHub, I still get the same error with the code above, but with enforce_stationarity=False and enforce_invertibility=False I get an AIC of around 1010, which is lower than in the R case. The fitted model parameters, however, are also vastly different.
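For context on what the ValueError is complaining about: an AR process is stationary only if all roots of the AR polynomial 1 - phi_1 z - ... - phi_p z^p lie outside the unit circle, and statsmodels checks the starting coefficients against this condition. A small NumPy sketch of that test (illustrative only; statsmodels performs the check internally):

```python
import numpy as np

def ar_is_stationary(phi):
    """Return True if AR coefficients phi define a stationary process.

    The AR polynomial is 1 - phi[0]*z - phi[1]*z**2 - ...; the process is
    stationary iff every root of that polynomial lies outside the unit circle.
    """
    poly = np.r_[1, -np.asarray(phi, dtype=float)]  # coefficients, increasing powers of z
    roots = np.roots(poly[::-1])                    # np.roots expects decreasing powers
    return bool(np.all(np.abs(roots) > 1))

print(ar_is_stationary([0.5, 0.3]))   # True: stationary AR(2)
print(ar_is_stationary([1.2]))        # False: explosive AR(1), root at 1/1.2
```

Invertibility of the MA part is the mirror-image condition on the MA polynomial, which is why both enforce_stationarity and enforce_invertibility had to be disabled above.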
I am trying to get the predictive distribution from my model, whose likelihood is a custom-defined probability: a mixture of Normals.
with RE2:
    trace = pm.variational.sample_vp(v_params, draws=5000)
trigger.set_value(triggers_test)
cc.set_value(cc_test)
y_output.set_value(np.zeros((len(y_test),)))
ppc = pm.sample_ppc(trace, model=RE2, samples=2000)
y_pred = ppc['R'].mean(axis=0)[:,None]
However, I get the error: AttributeError: 'DensityDist' object has no attribute 'random'. Is there a way to sample from the distribution? I am able to get the trace, and I can play around with this a bit, but I'm hoping that there is something better.
If it helps:
R = pm.DensityDist('R', logp_nmix(mus, stds, pi), observed=y_output)
I was able to get the posterior properly (i.e., pm.sample_ppc worked) when pm.DensityDist was applied to a latent variable rather than an observed variable.
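Since the mixture density here is known in closed form, one workaround while DensityDist lacks a random method is to draw posterior predictive samples by hand: pick a mixture component according to the weights, then draw from that component's Normal. A NumPy sketch with made-up parameters (mus, stds, and pi below are placeholders; in practice they would come from the trace):

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder mixture parameters; in practice, take these from the trace
mus = np.array([-2.0, 3.0])
stds = np.array([0.5, 1.0])
pi = np.array([0.3, 0.7])     # mixture weights, must sum to 1

def sample_normal_mixture(mus, stds, pi, size):
    """Draw `size` samples from a mixture of Normals."""
    comp = rng.choice(len(pi), size=size, p=pi)   # component index per draw
    return rng.normal(mus[comp], stds[comp])      # draw from the chosen component

draws = sample_normal_mixture(mus, stds, pi, size=5000)
print(draws.mean())   # close to the mixture mean 0.3*(-2) + 0.7*3 = 1.5
```

Looping this over posterior draws of (mus, stds, pi) from the trace approximates what sample_ppc would produce if the observed DensityDist had a random method.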
I'm using an AR model to fit my data and I think that I have done so successfully, but now I want to see what the fitted model parameters are, and I am running into some trouble. Here is my code:
model = ar.AR(df['price'], freq='M')
ar_res = model.fit(maxlags=50, ic='bic')
which runs without any error. However, when I try to print the model parameters with the following code
print(ar_res.params)
I get the error
AssertionError: Index length did not match values
I am unable to reproduce this with current master.
import statsmodels.api as sm
from pandas.util import testing
df = testing.makeTimeDataFrame()
mod = sm.tsa.AR(df['A'])
res = mod.fit(maxlags=10, ic='bic')
res.params