I'm trying to do a two-stage least squares (2SLS) regression in Python using the statsmodels library:
from statsmodels.sandbox.regression.gmm import IV2SLS
# Arguments in order: dependent variable, regressors (exog), instruments
resultIV = IV2SLS(dietdummy['Log Income'],
                  dietdummy.drop(columns=['Log Income', 'Diabetes']),
                  dietdummy.drop(columns=['Log Income', 'Reads Nutri'])).fit()
'Reads Nutri' is my endogenous variable, my instrument is 'Diabetes', and my dependent variable is 'Log Income'.
Did I do this right? It is very different from the way I would do it in Stata.
Also, when I call resultIV.summary(), I get a TypeError (something to do with the F statistic being NoneType). How can I resolve this?
I found this question when I wanted to run an IV2SLS regression myself and ran into the same problem, so here is what I found for anyone else who lands here.
The statsmodels documentation shows how to use this command. The arguments are endog, exog, and instrument, in that order: exog contains all regressors, including the instrumented (endogenous) variables and the other control variables, and instrument contains the instruments together with the other control variables. In that sense, your model is fine.
The TypeError you found is a known open bug in versions 0.6.0 and 0.8.1 and, according to the milestone, will be fixed in 0.9.0.
Update (28.06.2018): Version 0.9.0 was released on 15 May and should include a fix for the aforementioned bug.
Personally, I found the IV2SLS function in linearmodels 4.5 to be more intuitive than the statsmodels version, as it has separate parameters for the dependent variable and the endogenous variable(s), whereas the statsmodels version doesn't. The results I got from the linearmodels function lined up with what I would get with an Excel add-in I got through school.
If you choose to use the linearmodels function, this guide should also help. For instance, it showed me that I needed to add in a constant for my function to produce the correct output.
Related
Recently, I saw a post about obtaining Stata-like margins using statsmodels:
statsmodels get_margeff() for OLS
A user ("tmck") was working on an implementation in statsmodels.
I tried to comment under the post to ask whether there were any new developments, but I could not (apparently, one needs a certain track record of posts before commenting is allowed).
Does anyone know more? Has something already been developed? When will it be possible to calculate margins with statsmodels in the same way as with Stata or the 'margins' package in R (https://cran.r-project.org/web/packages/margins/vignettes/Introduction.html)?
Best and thanks in advance
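In the meantime, statsmodels already provides get_margeff() on the results of discrete-choice models such as Logit and Probit. For OLS, the marginal effect of a variable that enters only linearly is simply its coefficient; when it also enters nonlinearly (a square, an interaction), an average marginal effect in the spirit of Stata's `margins, dydx(x)` can be approximated by central finite differences on the fitted model's predictions. The helper `average_marginal_effect` below is hypothetical (it is not part of statsmodels), the data are simulated, and it returns a point estimate only, without the delta-method standard errors that Stata reports.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf


def average_marginal_effect(result, data, var, h=1e-4):
    """Hypothetical helper: average of d(prediction)/d(var) over the sample,
    computed by central finite differences on a formula-based results object."""
    up = data.assign(**{var: data[var] + h})
    down = data.assign(**{var: data[var] - h})
    derivs = (np.asarray(result.predict(up)) - np.asarray(result.predict(down))) / (2 * h)
    return derivs.mean()


# Simulated data where x enters both linearly and quadratically.
rng = np.random.default_rng(1)
data = pd.DataFrame({"x": rng.normal(size=500), "z": rng.normal(size=500)})
data["y"] = 1 + 0.5 * data["x"] + 0.3 * data["x"] ** 2 - 0.2 * data["z"] + rng.normal(size=500)

res = smf.ols("y ~ x + I(x**2) + z", data=data).fit()
ame_x = average_marginal_effect(res, data, "x")
print("Average marginal effect of x:", ame_x)
```

For this quadratic specification the result equals b_x + 2·b_x²·mean(x) up to rounding, which is a handy check before applying the helper to less transparent models.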
I am running a fixed-effects panel regression using the PanelOLS() function in linearmodels 4.5.
When I add 'entity_effects=True' and 'time_effects=True' to the model estimation, it raises an 'AbsorbingEffectError':
The model cannot be estimated. The included effects have fully absorbed
one or more of the variables. This occurs when one or more of the dependent
variable is perfectly explained using the effects included in the model.
How can I fix the 'AbsorbingEffectError'?
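import statsmodels.api as sm
from linearmodels.panel import PanelOLS

panel = panel.set_index(['firm', 'Date'])
exog_vars = panel[['ex_mkt', 'MV', 'ROA', 'BTM', 'leverage', '2nd']]
exog = sm.add_constant(exog_vars)
y = panel[['ex_firm']]
# Pass the constant-augmented matrix (exog), as in the documentation example
model = PanelOLS(y, exog, entity_effects=True).fit(cov_type='clustered', cluster_entity=True)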
I am following exactly the same steps as the fixed-effects example in the documentation: https://bashtage.github.io/linearmodels/doc/panel/examples/examples.html#
I think G.mc and TiTo have a good point and I had the same issue today.
It appears that this problem occurs in Python if you include a 'constant' variable, i.e., one with no variation.
I tried the same model in Stata as well, and it seems to work even though constants are included.
By constant I mean the usual 'c' included in the analysis as well as any other variable that is in fact static over the time period.
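More precisely, entity effects absorb any regressor that does not vary within each entity (a constant, or a time-invariant firm characteristic), and time effects absorb any regressor that takes the same value for every entity on a given date (for instance, a market-wide series). The helper `absorbed_columns` below is hypothetical, not part of linearmodels; it flags such columns in a numeric DataFrame indexed by ('firm', 'Date'). It only catches these simple cases, not absorption through linear combinations of several variables, and the demo data are made up.

```python
import pandas as pd


def absorbed_columns(df, entity_level="firm", time_level="Date", tol=1e-12):
    """Hypothetical helper: return (entity_absorbed, time_absorbed) lists of
    numeric columns with no variation within every entity / every time period."""
    within_entity = df.groupby(level=entity_level).std(ddof=0).fillna(0.0)
    within_time = df.groupby(level=time_level).std(ddof=0).fillna(0.0)
    entity_absorbed = [c for c in df.columns if (within_entity[c] <= tol).all()]
    time_absorbed = [c for c in df.columns if (within_time[c] <= tol).all()]
    return entity_absorbed, time_absorbed


# Made-up panel: 2 firms x 3 dates.
idx = pd.MultiIndex.from_product([["a", "b"], [1, 2, 3]], names=["firm", "Date"])
demo = pd.DataFrame(
    {
        "const": [1.0] * 6,                           # no variation at all
        "industry": [1.0, 1.0, 1.0, 2.0, 2.0, 2.0],   # fixed within each firm
        "mkt": [0.1, 0.2, 0.3, 0.1, 0.2, 0.3],        # same for all firms on a date
        "x": [1.0, 5.0, 2.0, 3.0, 0.0, 7.0],          # varies in both dimensions
    },
    index=idx,
)
entity_cols, time_cols = absorbed_columns(demo)
print("absorbed by entity effects:", entity_cols)
print("absorbed by time effects:  ", time_cols)
```

Non-constant columns flagged here are candidates to drop (or to keep only with the matching effect switched off) before refitting.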
As I understand it, a mixed-effects regression model is used when observations within particular groups of a feature are dependent. I've linked the Wikipedia article because it explains this better than I can (https://en.wikipedia.org/wiki/Mixed_model).
Although I believe there are many occasions on which we need to account for mixed effects, not many modules support this.
R has lme4 and Python seems to have a similar module, but both are statistically driven; they do not use cost-function-based algorithms such as gradient boosting.
In a machine learning setting, how would you handle a situation in which you need to account for mixed effects? Are there any other models that can handle longitudinal data with mixed effects (random effects)?
(R seems to have a package that supports mixed effects: https://rd.springer.com/article/10.1007%2Fs10994-011-5258-3)
But I am looking for a Python solution.
There are, at least, two ways to handle longitudinal data with mixed-effects in Python:
statsmodels for linear mixed-effects models;
MERF for mixed-effects random forests.
If you go for statsmodels, I'd recommend working through some of the examples provided here. If you go for MERF, I'd say the best starting point is here.
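As a starting point on the statsmodels side, here is a minimal sketch of a linear mixed-effects model with a random intercept per group, fitted through the formula interface (`smf.mixedlm`). The data and the column names `y`, `x`, `group` are simulated placeholders; MERF has its own API, which is not shown here.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated longitudinal data: 30 groups (e.g. subjects), 20 observations each.
rng = np.random.default_rng(0)
n_groups, n_per_group = 30, 20
group = np.repeat(np.arange(n_groups), n_per_group)
# Each group gets its own random intercept, which induces within-group dependence.
group_intercepts = rng.normal(0.0, 2.0, size=n_groups)
x = rng.normal(size=group.size)
y = 1.0 + 2.0 * x + group_intercepts[group] + rng.normal(size=group.size)
data = pd.DataFrame({"y": y, "x": x, "group": group})

# Fixed effects: intercept and slope on x; random effect: one intercept per group.
model = smf.mixedlm("y ~ x", data, groups=data["group"])
result = model.fit()
print(result.summary())
print(result.fe_params)  # fixed-effect estimates
print(result.cov_re)     # estimated variance of the random intercepts
```

Passing `re_formula="~x"` to `smf.mixedlm` would additionally give each group its own random slope on `x`.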
I hope it helps!
I have applied Boruta to my dataset to determine the importance of features with respect to a target variable. However, it is unable to determine the importance of several features; they are shown as tentative.
Is there a function like TentativeRoughFix in Python? The TentativeRoughFix function exists in R. If there is such a function, could anybody point me to it? Any suggestions on how to change the status of variables from "tentative" to "important" or "not important" in Python would also be much appreciated.
There are plenty of options for feature selection in scikit-learn (see the documentation).
There is also a Python implementation of Boruta, Boruta_py, but I have never tested it.
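R's TentativeRoughFix confirms a tentative attribute if its median importance is higher than the median importance of the maximal shadow attribute, and rejects it otherwise. A rough Python port of that rule is sketched below. The function `tentative_rough_fix` is hypothetical, and it assumes you have recorded, for each Boruta iteration, the importances of the real features and the maximum shadow-feature importance; whether and how Boruta_py exposes those histories is not assumed here. Like the R original, this is a heuristic, not a statistical test.

```python
import numpy as np


def tentative_rough_fix(importance_history, shadow_max_history, tentative_mask):
    """Hypothetical port of R's TentativeRoughFix rule.

    importance_history: (n_iterations, n_features) importances of real features
    shadow_max_history: (n_iterations,) max shadow-feature importance per iteration
    tentative_mask:     (n_features,) True where Boruta left the feature tentative
    Returns boolean masks (confirmed, rejected) over the tentative features.
    """
    imp = np.asarray(importance_history, dtype=float)
    shadow = np.asarray(shadow_max_history, dtype=float)
    tentative = np.asarray(tentative_mask, dtype=bool)

    median_importance = np.nanmedian(imp, axis=0)  # per-feature median importance
    threshold = np.nanmedian(shadow)               # median of the best shadow feature
    confirmed = tentative & (median_importance > threshold)
    rejected = tentative & ~confirmed
    return confirmed, rejected


# Made-up histories for 3 features over 3 iterations.
history = [[0.5, 0.10, 0.30],
           [0.6, 0.20, 0.20],
           [0.4, 0.15, 0.25]]
shadow_max = [0.20, 0.30, 0.22]
confirmed, rejected = tentative_rough_fix(history, shadow_max, [False, True, True])
print("confirmed:", confirmed, "rejected:", rejected)
```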
I am trying to use GARCH(1,1) to find the hedge ratio as described in this paper http://search.livjm.ac.uk/AFE/AFE_docs/cibef0402.pdf. However, Python does not offer packages for GARCH(1,1), so I think I have to implement it myself.
The data I have for the index and the futures are their daily returns. I would like to write a function that takes in the daily returns and outputs the beta of the GARCH model as the hedge ratio. However, I am at a loss as to where to start writing the GARCH function. Could anyone outline the GARCH(1,1) algorithm step by step for this case?
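A compact statement of the usual GARCH(1,1) setup, as a general outline rather than the exact specification of the linked paper: demean the returns to get residuals ε_t, build σ²_t with the recursion, choose ω, α, β to maximize the Gaussian log-likelihood, and then form the minimum-variance hedge ratio from the conditional moments. The last line needs the conditional covariance of spot (s) and futures (f) returns, which is why time-varying hedge ratios are usually estimated with a bivariate GARCH model (for example a diagonal VECH or BEKK specification) rather than two separate univariate ones.

```latex
\begin{aligned}
r_t &= \mu + \varepsilon_t, \qquad \varepsilon_t = \sigma_t z_t, \quad z_t \sim N(0,1) \\
\sigma_t^2 &= \omega + \alpha\,\varepsilon_{t-1}^2 + \beta\,\sigma_{t-1}^2,
  \qquad \omega > 0,\ \alpha, \beta \ge 0,\ \alpha + \beta < 1 \\
\ell(\omega, \alpha, \beta) &= -\tfrac{1}{2} \sum_{t=1}^{T}
  \left( \log 2\pi + \log \sigma_t^2 + \frac{\varepsilon_t^2}{\sigma_t^2} \right) \\
h_t &= \frac{\operatorname{Cov}_{t-1}\!\left(r^{s}_t, r^{f}_t\right)}{\operatorname{Var}_{t-1}\!\left(r^{f}_t\right)}
  = \frac{\sigma_{sf,t}}{\sigma_{f,t}^2}
\end{aligned}
```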
There is an implementation of this in the Python statsmodels library. The source code is available here.
There is also the arch package for ARCH models in Python.
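To see what such libraries do under the hood, here is a minimal maximum-likelihood sketch of a univariate GARCH(1,1) with a constant mean and Gaussian errors, using only NumPy and SciPy. The function names are hypothetical, and a dedicated library such as arch is more robust (standard errors, better optimizers, more distributions). The sketch demeans the returns, runs the variance recursion, and minimizes the negative log-likelihood with Nelder-Mead, returning a large penalty outside ω > 0, α, β ≥ 0, α + β < 1. On its own it does not yield a hedge ratio; that requires the conditional covariance between index and futures returns, for example from a bivariate GARCH model.

```python
import numpy as np
from scipy.optimize import minimize


def garch11_filter(params, eps):
    """Conditional variances: sigma2[t] = omega + alpha*eps[t-1]**2 + beta*sigma2[t-1]."""
    omega, alpha, beta = params
    sigma2 = np.empty_like(eps)
    sigma2[0] = eps.var()  # start the recursion at the sample variance (a common choice)
    for t in range(1, eps.size):
        sigma2[t] = omega + alpha * eps[t - 1] ** 2 + beta * sigma2[t - 1]
    return sigma2


def garch11_neg_loglik(params, eps):
    """Negative Gaussian log-likelihood, with a penalty outside the valid region."""
    omega, alpha, beta = params
    if omega <= 0 or alpha < 0 or beta < 0 or alpha + beta >= 1:
        return 1e10
    sigma2 = garch11_filter(params, eps)
    return 0.5 * np.sum(np.log(2 * np.pi) + np.log(sigma2) + eps ** 2 / sigma2)


def fit_garch11(returns):
    """Fit a constant-mean GARCH(1,1); return (omega, alpha, beta) and sigma2."""
    r = np.asarray(returns, dtype=float)
    eps = r - r.mean()  # constant-mean model: residuals are the demeaned returns
    x0 = np.array([0.1 * eps.var(), 0.05, 0.85])  # stationary starting values
    res = minimize(garch11_neg_loglik, x0, args=(eps,), method="Nelder-Mead",
                   options={"maxiter": 5000, "xatol": 1e-8, "fatol": 1e-8})
    return res.x, garch11_filter(res.x, eps)


def simulate_garch11(n, omega, alpha, beta, seed=0):
    """Simulate returns from a GARCH(1,1) with Gaussian innovations (for checking)."""
    rng = np.random.default_rng(seed)
    r = np.empty(n)
    sigma2 = omega / (1 - alpha - beta)  # unconditional variance
    for t in range(n):
        r[t] = np.sqrt(sigma2) * rng.standard_normal()
        sigma2 = omega + alpha * r[t] ** 2 + beta * sigma2
    return r
```

For example, `params, sigma2 = fit_garch11(simulate_garch11(2000, 0.1, 0.1, 0.8, seed=1))` should give a persistence α + β in the neighbourhood of the true 0.9. With real daily returns, scaling them to percentages often helps the optimizer.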