I am using tsa.DynamicFactor from statsmodels with multiple factors. I would like to include constraints that impose that some factors can affect only some series. How can I do that?
I couldn't find anything online.
Thanks a lot!
Instead of DynamicFactor you need to use DynamicFactorMQ, which allows you to specify which observed variables load on which factors via a combination of the factors, factor_orders and factor_multiplicities arguments. The docs have some usage examples.
I have a dataset in a pandas DataFrame. I built a function which returns a DataFrame that looks like this:
Feature_name_1 Feature_name_2 corr_coef p-value
ABC DCA 0.867327 0.02122
So it takes pairs of independent variables and returns their correlation coefficient.
Is there an easy way I can check for a non-linear relationship in the same fashion?
In the above case I used scipy's Pearson correlation, but I cannot find how to check for a non-linear relationship. I found only more sophisticated methods, and I would like something easy to implement, as above.
Any method will do as long as it is easy to implement; it doesn't have to come from scipy or another specific package.
Regress your dependent variable on your independent variables and examine the residuals. If the residuals show a pattern, there is likely a non-linear relationship.
It may also be the case that your model is missing a cross term or could benefit from a transformation, or something along those lines. I might be wrong, but I'm not aware of a cut-and-dried test for non-linearity.
A quick Google search turned up this page, which seems like it might be useful for you.
https://stattrek.com/regression/residual-analysis.aspx
Edit: Per the comment below, this is a very general method that helps verify the linear regression assumptions.
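A small sketch of the residual check described above, using synthetic data where the true relationship is quadratic (the data here is invented for illustration):

```python
import numpy as np

# Hypothetical data: y depends on x quadratically, plus a little noise.
rng = np.random.default_rng(1)
x = np.linspace(-3, 3, 200)
y = x**2 + rng.normal(scale=0.1, size=x.size)

# Fit a straight line and inspect the residuals.
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (slope * x + intercept)

# A systematic pattern in the residuals signals a non-linear relationship.
# Here they correlate almost perfectly with x**2:
pattern = np.corrcoef(residuals, x**2)[0, 1]
print(pattern)
```

Plotting residuals against each predictor (or against fitted values) makes the same pattern visible by eye.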
I am trying to translate some Matlab code into Python. The following is the particular line I want to translate.
options = optimset('Display','off','Diagnostics','off','MaxIter',2000,'TolFun',1e-10,'TolX',1e-10 )
I was wondering if there existed a similar structure in Python. Can someone also explain more about what optimset actually does in this particular case?
With optimset you set the options for an optimization problem solver. Here you can find details about the options.
'Display','off' - display no output.
'Diagnostics','off' - do not print any diagnostics.
'MaxIter',2000 - maximum number of iterations allowed is set to 2000.
'TolFun',1e-10 - termination tolerance on the function value.
'TolX',1e-10 - termination tolerance on x.
How to do this in Python depends on the package you want to use. You can, for example, use scipy, which provides a wide range of solvers for all kinds of problems.
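As a sketch, scipy.optimize.minimize takes an options dict that maps fairly directly onto the optimset settings above. For the Nelder-Mead method the counterparts of TolX and TolFun are xatol and fatol; the objective function here is made up for illustration:

```python
from scipy.optimize import minimize

def f(x):
    # Toy objective with minimum at (1, -2).
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2

res = minimize(f, x0=[0.0, 0.0], method='Nelder-Mead',
               options={'disp': False,     # ~ 'Display','off'
                        'maxiter': 2000,   # ~ 'MaxIter',2000
                        'xatol': 1e-10,    # ~ 'TolX',1e-10
                        'fatol': 1e-10})   # ~ 'TolFun',1e-10
print(res.x)
```

Note that the option names differ between solvers; other methods use e.g. gtol or ftol instead, so check the method-specific documentation.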
I'm learning Python to build models these days. I read the documentation of scipy.optimize.fmin, which also recommends scipy.optimize.minimize. It seems that scipy.optimize.minimize is a more advanced method. I really wonder what the difference is between these two.
scipy.optimize.minimize is a high-level interface that lets you choose from a broad range of solvers, one of which is Nelder–Mead. scipy.optimize.fmin is a specialized solver that implements only Nelder–Mead. For the documentation of minimize with Nelder–Mead specifically, see here.
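A quick sketch showing the two calls side by side on a toy objective (invented for illustration); both run the same Nelder–Mead algorithm and land on the same minimum:

```python
from scipy.optimize import fmin, minimize

def f(x):
    # Toy objective with minimum at x = 3.
    return (x[0] - 3.0) ** 2

# Legacy interface: always Nelder-Mead, returns just the solution array.
x_fmin = fmin(f, x0=[0.0], disp=False)

# Unified interface: same algorithm selected by name, returns a rich
# OptimizeResult object (x, fun, success, number of iterations, ...).
res = minimize(f, x0=[0.0], method='Nelder-Mead')

print(x_fmin, res.x)
```

The practical difference is mostly the interface: minimize lets you swap in other methods (BFGS, L-BFGS-B, Powell, ...) without changing the surrounding code.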
A mixed-effects regression model is used when I believe there is dependency within a particular group of a feature. I've attached the Wiki link because it explains this better than I can. (https://en.wikipedia.org/wiki/Mixed_model)
Although I believe there are many occasions in which we need to account for mixed effects, not many modules support this.
R has lme4 and Python seems to have a similar module, but both are statistics-driven; they do not use cost-function-based algorithms such as gradient boosting.
In a machine learning setting, how would you handle a situation where you need to account for mixed effects? Are there any other models that can handle longitudinal data with mixed (random) effects?
R seems to have a package that supports mixed effects: https://rd.springer.com/article/10.1007%2Fs10994-011-5258-3
But I am looking for a Python solution.
There are, at least, two ways to handle longitudinal data with mixed-effects in Python:
statsmodels for linear mixed effects;
MERF for mixed effects random forest.
If you go for statsmodels, I'd recommend working through some of the examples provided here. If you go for MERF, I'd say the best starting point is here.
I hope it helps!
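For the statsmodels route, here is a minimal sketch of a random-intercept model using the formula interface; the data and the variable names (y, x, g) are synthetic, invented just to show the call:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic longitudinal data: 20 groups of 10 observations each,
# with a group-specific random intercept and a true slope of 2.
rng = np.random.default_rng(2)
groups = np.repeat(np.arange(20), 10)
group_effect = rng.normal(scale=1.0, size=20)[groups]
x = rng.standard_normal(200)
y = 2.0 * x + group_effect + rng.normal(scale=0.5, size=200)
df = pd.DataFrame({'y': y, 'x': x, 'g': groups})

# Random intercept per group; 'groups' defines the random-effects grouping.
model = smf.mixedlm('y ~ x', data=df, groups=df['g'])
result = model.fit()
print(result.params['x'])  # fixed-effect slope estimate
```

MERF follows a different pattern (a scikit-learn-style estimator wrapping a random forest for the fixed part), so its API is best picked up from its own examples.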
Recently, I have been looking through some Python modules to understand their behavior and how optimized their implementations are. Can anyone tell me what algorithm Python uses to perform the set difference operation? One possible way to achieve set difference is by using hash tables, which involves extra O(N) space. I tried to find the source code of the set operations but could not locate it. Please help.
A set in Python is itself a hash table, so implementing difference for it is not as hard as you might imagine. Looking at it from a higher level, how does one implement set difference? Iterate over one of the collections and add to the result all elements that are not present in the other. In CPython the implementation lives in Objects/setobject.c.
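The idea can be sketched in a few lines of Python; CPython's C implementation follows the same scheme (the helper name here is made up for illustration):

```python
def set_difference(a, b):
    # Membership tests against a hash table are O(1) on average,
    # so the whole operation runs in O(len(a)) time. The extra space
    # is only for the result set, not an auxiliary structure.
    return {x for x in a if x not in b}

print(set_difference({1, 2, 3, 4}, {2, 4}))  # {1, 3}
```

This is equivalent to the built-in `a - b` / `a.difference(b)`, which you should prefer in real code.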