I am trying to perform multiple linear regression using the statsmodels.formula.api package in Python and have listed the code I used below.
auto_1= pd.read_csv("Auto.csv")
formula = 'mpg ~ ' + " + ".join(auto_1.columns[1:-1])
results = smf.ols(formula, data=auto_1).fit()
print(results.summary())
The data consists of the following variables: mpg, cylinders, displacement, horsepower, weight, acceleration, year, origin and name. When the summary prints, it shows multiple rows for the horsepower column and the regression results are also not correct. I'm not sure why.
[screenshot of repeated horsepower rows]
It's likely because of the data type of the horsepower column. If its values are categories or plain strings, the model will apply treatment (dummy) coding to them by default, producing one coefficient row per distinct value, which is what you are seeing. Check the data type (run auto_1.dtypes) and cast the column to a numeric type, either when first reading the CSV file (the dtype= parameter of read_csv()) or afterwards with pd.to_numeric().
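For example, a minimal sketch of the check and fix, assuming the non-numeric entries are placeholder strings such as '?' (which some versions of this dataset use for missing horsepower):
import pandas as pd

auto_1 = pd.read_csv("Auto.csv")
print(auto_1.dtypes)  # 'object' for horsepower means it was read as strings

# Coerce placeholder strings to NaN, then drop the affected rows
auto_1['horsepower'] = pd.to_numeric(auto_1['horsepower'], errors='coerce')
auto_1 = auto_1.dropna(subset=['horsepower'])

# Alternatively, handle it at read time:
# auto_1 = pd.read_csv("Auto.csv", na_values='?')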
Here is an example where a column with numeric values is cast (i.e. converted) to strings (or categories):
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
df = pd.DataFrame(
    {
        'mpg': np.random.randint(20, 40, 50),
        'horsepower': np.random.randint(100, 200, 50)
    }
)
# convert integers to strings (or categories)
df['horsepower'] = (
    df['horsepower'].astype('str')  # same result with .astype('category')
)
formula = 'mpg ~ horsepower'
results = smf.ols(formula, df).fit()
print(results.summary())
Output (dummy coding):
OLS Regression Results
==============================================================================
Dep. Variable: mpg R-squared: 0.778
Model: OLS Adj. R-squared: -0.207
Method: Least Squares F-statistic: 0.7901
Date: Sun, 18 Sep 2022 Prob (F-statistic): 0.715
Time: 20:17:51 Log-Likelihood: -110.27
No. Observations: 50 AIC: 302.5
Df Residuals: 9 BIC: 380.9
Df Model: 40
Covariance Type: nonrobust
=====================================================================================
coef std err t P>|t| [0.025 0.975]
-------------------------------------------------------------------------------------
Intercept 32.0000 5.175 6.184 0.000 20.294 43.706
horsepower[T.103] -4.0000 7.318 -0.547 0.598 -20.555 12.555
horsepower[T.112] -1.0000 7.318 -0.137 0.894 -17.555 15.555
horsepower[T.116] -9.0000 7.318 -1.230 0.250 -25.555 7.555
horsepower[T.117] 6.0000 7.318 0.820 0.433 -10.555 22.555
horsepower[T.118] 2.0000 7.318 0.273 0.791 -14.555 18.555
horsepower[T.120] -4.0000 6.338 -0.631 0.544 -18.337 10.337
etc.
Now, converting the strings back to integers:
df['horsepower'] = pd.to_numeric(df.horsepower)
# or df['horsepower'] = df['horsepower'].astype('int')
results = smf.ols(formula, df).fit()
print(results.summary())
Output (as expected):
OLS Regression Results
==============================================================================
Dep. Variable: mpg R-squared: 0.011
Model: OLS Adj. R-squared: -0.010
Method: Least Squares F-statistic: 0.5388
Date: Sun, 18 Sep 2022 Prob (F-statistic): 0.466
Time: 20:24:54 Log-Likelihood: -147.65
No. Observations: 50 AIC: 299.3
Df Residuals: 48 BIC: 303.1
Df Model: 1
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
Intercept 31.7638 3.663 8.671 0.000 24.398 39.129
horsepower -0.0176 0.024 -0.734 0.466 -0.066 0.031
==============================================================================
Omnibus: 3.529 Durbin-Watson: 1.859
Prob(Omnibus): 0.171 Jarque-Bera (JB): 1.725
Skew: 0.068 Prob(JB): 0.422
Kurtosis: 2.100 Cond. No. 834.
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
I'm trying to do a linear regression to predict the count of a dataframe based on the count of another dataframe. I am using statsmodels. I have tried the following:
X = df1.count()
Y = df2.count()
from statsmodels.formula.api import ols
fit = ols(Y ~ X, data=kak).fit()
fit.summary()
Using the X and Y variables in the OLS formula is not allowed and I have no idea what to fill in at the data= keyword argument. How would I go about doing this?
Let's assume you have two 1D arrays X and Y:
import numpy as np
X = np.arange(100)
Y = 2*X + 5
Then you can run a linear regression using the line below. There are two important things:
The first argument to ols is a str containing a formula: "Y ~ X", not Y ~ X (note the quotes).
The second argument, data, can be any Python object as long as it has keys data["X"] and data["Y"]. I used a dict here, but it would also work with a DataFrame. It lets statsmodels resolve the names X and Y in the formula you gave it.
ols("Y ~ X", {"X": X, "Y": Y}).fit().summary()
Output:
OLS Regression Results
==============================================================================
Dep. Variable: Y R-squared: 1.000
Model: OLS Adj. R-squared: 1.000
Method: Least Squares F-statistic: 1.916e+33
Date: Fri, 21 May 2021 Prob (F-statistic): 0.00
Time: 14:01:48 Log-Likelihood: 3055.1
No. Observations: 100 AIC: -6106.
Df Residuals: 98 BIC: -6101.
Df Model: 1
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
Intercept 5.0000 2.62e-15 1.91e+15 0.000 5.000 5.000
X 2.0000 4.57e-17 4.38e+16 0.000 2.000 2.000
==============================================================================
Omnibus: 220.067 Durbin-Watson: 0.015
Prob(Omnibus): 0.000 Jarque-Bera (JB): 9.816
Skew: 0.159 Prob(JB): 0.00739
Kurtosis: 1.498 Cond. No. 114.
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
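Since data can be any mapping-like object, the same call works with a DataFrame, as noted above:
import pandas as pd

df = pd.DataFrame({"X": X, "Y": Y})
ols("Y ~ X", data=df).fit().summary()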
I get completely different results with the same datasets in R and Python, and I cannot understand why.
R:
library(RcppCNPy)
d <- npyLoad("/home/vvkovalchuk/bin/src/python/asks1.npy")
datas = npyLoad('/home/vvkovalchuk/bin/src/python/bids2.npy')
m <- lm(d ~ datas)
summary(m)
Python:
import time
import numpy
import statsmodels.api as sm
from math import log
Y = numpy.load('./asks1.npy', allow_pickle=True)
X = numpy.load('./bids2.npy', allow_pickle=True)
X3 = sm.add_constant(X)
res_ols = sm.OLS(Y, X3).fit()
print(res_ols.params)
What am I doing wrong?
Results:
R:
Call:
lm(formula = d ~ datas)
Residuals:
Min 1Q Median 3Q Max
-6.089e+06 8.797e+07 2.163e+08 2.179e+08 1.122e+10
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.561e+00 2.253e+06 0 1
datas 3.809e+03 2.164e+09 0 1
Residual standard error: 208100000 on 14639 degrees of freedom
Multiple R-squared: 0.2735, Adjusted R-squared: 0.2735
F-statistic: 5512 on 1 and 14639 DF, p-value: < 2.2e-16
Python:
OLS Regression Results
Dep. Variable: y R-squared: 0.112
Model: OLS Adj. R-squared: 0.112
Method: Least Squares F-statistic: 1846.
Date: Thu, 25 Mar 2021 Prob (F-statistic): 0.00
Time: 13:08:43 Log-Likelihood: 1.6948e+05
No. Observations: 14641 AIC: -3.390e+05
Df Residuals: 14639 BIC: -3.389e+05
Df Model: 1
Covariance Type: nonrobust
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 0.0004 3.07e-06 126.136 0.000 0.000 0.000
x1 0.1478 0.003 42.969 0.000 0.141 0.155
Omnibus: 3251.130 Durbin-Watson: 0.004
Prob(Omnibus): 0.000 Jarque-Bera (JB): 14606.605
Skew: 1.019 Prob(JB): 0.00
Kurtosis: 7.449 Cond. No. 1.83e+05
I also tried swapping the arguments to the OLS function, but I still get incorrect results. Could this be related to NAs?
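To check the NA hypothesis, I can inspect both arrays before fitting (a minimal diagnostic sketch):
import numpy as np

Y = np.load('./asks1.npy', allow_pickle=True)
X = np.load('./bids2.npy', allow_pickle=True)

# object-dtype arrays and NaNs change what actually gets fit
print(Y.dtype, X.dtype, Y.shape, X.shape)
print(np.isnan(Y.astype(float)).sum(), np.isnan(X.astype(float)).sum())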
I'm trying to figure out how to incorporate lagged dependent variables into statsmodels or scikit-learn to forecast time series with AR terms, but cannot seem to find a solution.
The general linear equation looks something like this:
y = B1*y(t-1) + B2*x1(t) + B3*x2(t-3) + e
I know I can use pd.Series.shift(t) to create lagged variables and then add it to be included in the model and generate parameters, but how can I get a prediction when the code does not know which variable is a lagged dependent variable?
In SAS's Proc Autoreg, you can designate which variable is a lagged dependent variable and will forecast accordingly, but it seems like there are no options like that in Python.
Any help would be greatly appreciated and thank you in advance.
Since you've already mentioned statsmodels in your tags, you may want to take a look at statsmodels - ARIMA, i.e.:
from statsmodels.tsa.arima.model import ARIMA  # the old statsmodels.tsa.arima_model path is removed in recent versions
model = ARIMA(endog=y, order=(2, 0, 0))  # p=2, d=0, q=0 for an AR(2), where y is your time series
fit = model.fit()
fit.summary()
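If you also need exogenous regressors alongside the AR terms, as in your equation, AutoReg accepts an exog argument and handles the lagged dependent variable internally, so forecasting works out of the box. A minimal sketch, assuming y is your series and x is an aligned DataFrame of regressors:
from statsmodels.tsa.ar_model import AutoReg

model = AutoReg(endog=y, lags=1, exog=x)  # AR(1) in y plus exogenous terms
fit = model.fit()
print(fit.summary())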
But like you mentioned, you could create new variables manually the way you described (I used some random data):
import numpy as np
import pandas as pd
import statsmodels.api as sm
df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/a10.csv', parse_dates=['date'])
df['random_variable'] = np.random.randint(0, 10, len(df))
df['y'] = np.random.rand(len(df))
df.index = df['date']
df = df[['y', 'value', 'random_variable']]
df.columns = ['y', 'x1', 'x2']
shifts = 3
for variable in df.columns.values:
    for t in range(1, shifts + 1):
        df[f'{variable} AR({t})'] = df.shift(t)[variable]
df = df.dropna()
>>> df.head()
y x1 x2 ... x2 AR(1) x2 AR(2) x2 AR(3)
date ...
1991-10-01 0.715115 3.611003 7 ... 5.0 7.0 7.0
1991-11-01 0.202662 3.565869 3 ... 7.0 5.0 7.0
1991-12-01 0.121624 4.306371 7 ... 3.0 7.0 5.0
1992-01-01 0.043412 5.088335 6 ... 7.0 3.0 7.0
1992-02-01 0.853334 2.814520 2 ... 6.0 7.0 3.0
[5 rows x 12 columns]
I'm using the model you describe in your post:
model = sm.OLS(df['y'], df[['y AR(1)', 'x1', 'x2 AR(3)']])
fit = model.fit()
>>> fit.summary()
<class 'statsmodels.iolib.summary.Summary'>
"""
OLS Regression Results
==============================================================================
Dep. Variable: y R-squared: 0.696
Model: OLS Adj. R-squared: 0.691
Method: Least Squares F-statistic: 150.8
Date: Tue, 08 Oct 2019 Prob (F-statistic): 6.93e-51
Time: 17:51:20 Log-Likelihood: -53.357
No. Observations: 201 AIC: 112.7
Df Residuals: 198 BIC: 122.6
Df Model: 3
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
y AR(1) 0.2972 0.072 4.142 0.000 0.156 0.439
x1 0.0211 0.003 6.261 0.000 0.014 0.028
x2 AR(3) 0.0161 0.007 2.264 0.025 0.002 0.030
==============================================================================
Omnibus: 2.115 Durbin-Watson: 2.277
Prob(Omnibus): 0.347 Jarque-Bera (JB): 1.712
Skew: 0.064 Prob(JB): 0.425
Kurtosis: 2.567 Cond. No. 41.5
==============================================================================
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
"""
Hope this helps you get started.
I am trying to calculate the dual beta in Python using a statsmodels regression. Unfortunately I am getting an error message.
The regression equation for dual betas is given here: [image: Dual Beta Formula]
I am neglecting the risk-free rate (rf) for now, but the implementation should be similar once I add it. For now my code looks as follows, where my 'spx.xlsx' file simply has two columns with returns, called 'SPXrets' and 'AAPLrets' (plus one column with dates):
import pandas as pd
from pandas import ExcelWriter
from pandas import ExcelFile
import statsmodels.api as sm
import statsmodels.formula.api as smf
import numpy as np
df = pd.read_excel('spx.xlsx')
print(df.columns)
mod = smf.ols(formula='AAPLrets ~ SPXrets', data=df)
res = mod.fit()
print(res.summary())
This prompts a patsy error:
PatsyError: intercept term cannot interact with anything else
AAPLrets ~ SPXrets:c(D) + SPXrets:(1-c(D))
Grateful for any help - many thanks!
Edit:
After my initial suggestions, the OP has changed both the title and the provided code snippet. My suggestions have since been edited accordingly.
New suggestion:
I suspect you're experiencing some problems with your dataset.
I suggest that you tell us a little more about the data source: how you've loaded the data, what it looks like (structure), and what types your columns have (string, float etc.).
What I can tell you right now, is that your snippet runs fine with some sample data like this:
Sample data:
CONret DAXret:c(D) DAXret:(1-c(D)) AAPLrets SPXrets dummy
2017-01-08 109 107 122 101 100 0
2017-01-09 117 108 124 113 147 0
2017-01-10 142 108 130 107 103 1
2017-01-11 106 121 149 103 104 1
2017-01-12 124 149 143 112 126 0
Output:
OLS Regression Results
==============================================================================
Dep. Variable: AAPLrets R-squared: 0.095
Model: OLS Adj. R-squared: 0.004
Method: Least Squares F-statistic: 1.044
Date: Thu, 14 Feb 2019 Prob (F-statistic): 0.331
Time: 16:00:01 Log-Likelihood: -48.388
No. Observations: 12 AIC: 100.8
Df Residuals: 10 BIC: 101.7
Df Model: 1
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
Intercept 84.3198 31.143 2.708 0.022 14.929 153.711
SPXrets 0.2635 0.258 1.022 0.331 -0.311 0.838
==============================================================================
Omnibus: 5.649 Durbin-Watson: 1.882
Prob(Omnibus): 0.059 Jarque-Bera (JB): 2.933
Skew: 1.202 Prob(JB): 0.231
Kurtosis: 3.290 Cond. No. 872.
==============================================================================
Here's the whole thing:
# imports
import statsmodels.formula.api as smf
import pandas as pd
import numpy as np
import statsmodels.api as sm
# sample data
np.random.seed(1)
rows = 12
listVars= ['CONret','DAXret:c(D)', 'DAXret:(1-c(D))', 'AAPLrets', 'SPXrets']
rng = pd.date_range('1/1/2017', periods=rows, freq='D')
df = pd.DataFrame(np.random.randint(100,150,size=(rows, len(listVars))), columns=listVars)
df = df.set_index(rng)
df['dummy'] = np.random.randint(2, size=df.shape[0])
mod = smf.ols(formula='AAPLrets ~ SPXrets', data=df)
res = mod.fit()
res.summary()
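Incidentally, the patsy error itself comes from the (1-c(D)) term: patsy parses the 1 as the intercept, and the intercept cannot interact with other terms. Wrapping the arithmetic in I() avoids that. A minimal sketch of a dual-beta style formula, assuming dummy is a 0/1 up-market indicator column like the one above:
# separate slopes for up- and down-market days; I() makes patsy evaluate
# the arithmetic instead of parsing it as formula syntax
mod = smf.ols(formula='AAPLrets ~ SPXrets:dummy + SPXrets:I(1 - dummy)', data=df)
res = mod.fit()
print(res.summary())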
Another suggestion:
Personally, I'd feel much more comfortable without patsy.
The snippet below will let you run a linear regression and select whether to return the model summary, or a dataframe with other details like coefficient p-values and r-squared.
# Imports
import pandas as pd
import numpy as np
import statsmodels.api as sm
# sample data
np.random.seed(1)
rows = 12
listVars= ['CONret','DAXret:c(D)', 'DAXret:(1-c(D))', 'AAPLrets', 'SPXrets']
rng = pd.date_range('1/1/2017', periods=rows, freq='D')
df = pd.DataFrame(np.random.randint(100,150,size=(rows, len(listVars))), columns=listVars)
df = df.set_index(rng)
df['dummy'] = np.random.randint(2, size=df.shape[0])
def LinReg(df, y, x, const, results):
    betas = x.copy()
    # Model with or without a constant
    if const:
        X = sm.add_constant(df[x])
        model = sm.OLS(df[y], X).fit()
    else:
        model = sm.OLS(df[y], df[x]).fit()
    # Estimates of R2 and p
    res1 = {'Y': [y],
            'R2': [format(model.rsquared, '.4f')],
            'p': [model.pvalues.tolist()],
            'start': [df.index[0]],
            'stop': [df.index[-1]],
            'obs': [df.shape[0]],
            'X': [betas]}
    df_res1 = pd.DataFrame(data=res1)
    # Regression coefficients
    theParams = model.params[0:]
    coefs = theParams.to_frame()
    df_coefs = pd.DataFrame(coefs.T)
    xNames = list(df_coefs)
    xValues = list(df_coefs.loc[0].values)
    xValues2 = ['%.2f' % elem for elem in xValues]
    res2 = {'Independent': [xNames],
            'beta': [xValues2]}
    df_res2 = pd.DataFrame(data=res2)
    # All results
    df_res = pd.concat([df_res1, df_res2], axis=1)
    df_res = df_res.T
    df_res.columns = ['results']
    if results == 'summary':
        return model.summary()
    else:
        return df_res
df_regression = LinReg(df = df, y = 'CONret', x = ['DAXret:c(D)', 'DAXret:(1-c(D))', 'dummy'], const = True, results = 'summary')
print(df_regression)
Test run 1:
df_regression = LinReg(df = df, y = 'CONret', x = ['DAXret:c(D)', 'DAXret:(1-c(D))'], const = True, results = '')
Output 1:
results
Y CONret
R2 0.0813
p [0.13194822614949883, 0.45726622261432304, 0.9...
start 2017-01-01 00:00:00
stop 2017-01-12 00:00:00
obs 12
X [DAXret:c(D), DAXret:(1-c(D)), dummy]
Independent [const, DAXret:c(D), DAXret:(1-c(D)), dummy]
beta [88.94, 0.24, -0.01, 2.20]
Test run 2:
df_regression = LinReg(df = df, y = 'CONret', x = ['DAXret:c(D)', 'DAXret:(1-c(D))', 'dummy'], const = True, results = 'summary')
Output 2:
OLS Regression Results
==============================================================================
Dep. Variable: CONret R-squared: 0.081
Model: OLS Adj. R-squared: -0.263
Method: Least Squares F-statistic: 0.2361
Date: Thu, 14 Feb 2019 Prob (F-statistic): 0.869
Time: 16:04:02 Log-Likelihood: -47.138
No. Observations: 12 AIC: 102.3
Df Residuals: 8 BIC: 104.2
Df Model: 3
Covariance Type: nonrobust
===================================================================================
coef std err t P>|t| [0.025 0.975]
-----------------------------------------------------------------------------------
const 88.9438 53.019 1.678 0.132 -33.318 211.205
DAXret:c(D) 0.2350 0.301 0.781 0.457 -0.459 0.929
DAXret:(1-c(D)) -0.0060 0.391 -0.015 0.988 -0.908 0.896
dummy 2.2005 8.973 0.245 0.812 -18.490 22.891
==============================================================================
Omnibus: 1.025 Durbin-Watson: 2.354
Prob(Omnibus): 0.599 Jarque-Bera (JB): 0.720
Skew: 0.540 Prob(JB): 0.698
Kurtosis: 2.477 Cond. No. 2.15e+03
==============================================================================