Recently I've been learning to do ANOVA in Python. I have already found an example of using Tukey's HSD test for the post-hoc comparisons after an ANOVA, but I want to use the least significant difference (LSD) method instead. Does anyone know how to do it, or is there an example?
Thanks.
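Fisher's LSD amounts to a set of pairwise t-tests that reuse the pooled mean squared error (MSE) and the error degrees of freedom from the one-way ANOVA. Below is a minimal sketch with SciPy, assuming a one-way design with the groups passed as a dict; `fisher_lsd` is a hypothetical helper, not a SciPy function. By convention ("protected" LSD), the pairwise results are only interpreted when the overall ANOVA F-test is significant.

```python
import itertools

import numpy as np
from scipy import stats


def fisher_lsd(groups, alpha=0.05):
    """Hypothetical helper: Fisher's LSD for a one-way design.

    groups: dict mapping group name -> 1-D sequence of observations.
    Returns the ANOVA p-value and one tuple per pair:
    (name_a, name_b, mean difference, LSD threshold, p-value, significant?).
    """
    names = list(groups)
    data = [np.asarray(groups[name], dtype=float) for name in names]
    _, p_anova = stats.f_oneway(*data)

    # Pooled within-group error term, exactly as in the ANOVA table
    df_error = sum(len(d) for d in data) - len(data)
    sse = sum(((d - d.mean()) ** 2).sum() for d in data)
    mse = sse / df_error
    t_crit = stats.t.ppf(1 - alpha / 2, df_error)

    results = []
    for (i, a), (j, b) in itertools.combinations(enumerate(data), 2):
        diff = a.mean() - b.mean()
        se = np.sqrt(mse * (1 / len(a) + 1 / len(b)))
        lsd = t_crit * se  # smallest mean difference significant at alpha
        p = 2 * stats.t.sf(abs(diff) / se, df_error)  # two-sided p-value
        results.append((names[i], names[j], diff, lsd, p, abs(diff) > lsd))
    return p_anova, results
```

With only two groups this reduces to the ordinary pooled-variance two-sample t-test; with more groups, the difference from separate t-tests is that every comparison shares the ANOVA's MSE and degrees of freedom.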
I've been trying to identify seasonal customers in a dataset. As a first approach, I used the seasonal_decompose() function of the statsmodels package; this is useful for visualizing individual customers, but it doesn't scale to the whole dataset, as I have almost 8000 different time series, one for each client.
Then I decided to try the ADF test. The problem here was that it only detects whether my series is stationary or not, and because of the trend it won't work in my case. I also tried combining it with the KPSS test (which tests for trend stationarity), but the results are still poor.
Now, I have thought about four alternatives:
Find a way to evaluate it manually using a mean/variance approach
Try using CHTest
Try using the Darts package
Detrend my data and apply those tests (or others) again
The thing is that I couldn't find good examples of any of these in Python; most of the solutions I found for my problem are written in R. Is there a suitable way of doing this in Python, or should I give up, export my series, and use R?
Could you help me with some tips? I would also really appreciate reading suggestions. Thanks!
n-sample test for equality of proportions in Python
This statistical test seems pretty straightforward in R: http://sphweb.bumc.bu.edu/otlt/MPH-Modules/BS/R/R6_CategoricalDataAnalysis/R6_CategoricalDataAnalysis6.html
I looked at scipy, but it doesn't provide statistical tools for tests with more than 2 samples.
I'm looking for a Python library capable of such an advanced statistical test.
I assume that what you want to do is a chi-squared test for independence of the proportions between the different classes (see https://en.wikipedia.org/wiki/Pearson%27s_chi-squared_test#Testing_for_statistical_independence). You can do this in Python using the function scipy.stats.chi2_contingency (see https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.chi2_contingency.html for documentation).
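A minimal sketch of that approach: put the successes and failures of each group into a 2 x k table and pass it to `scipy.stats.chi2_contingency`. The counts below are illustrative only.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Illustrative counts: number of successes and group sizes for four groups
successes = np.array([83, 90, 129, 70])
group_sizes = np.array([86, 93, 136, 82])

# 2 x k contingency table: row 0 = successes, row 1 = failures
table = np.vstack([successes, group_sizes - successes])
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.3f}, dof = {dof}, p = {p:.4g}")
```

For k groups the test has k - 1 degrees of freedom, and a small p-value means the proportions are unlikely to be all equal. Yates' continuity correction (the `correction` parameter) only kicks in when there is a single degree of freedom, i.e. with two groups.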
As I understand it, a mixed-effects regression model is used when I believe there is dependency among observations that share a group of a particular feature. The Wikipedia article explains it better than I can: (https://en.wikipedia.org/wiki/Mixed_model)
Although I believe there are many occasions in which we need to consider mixed effects, there aren't many modules that support them.
R has lme4 and Python seems to have a similar module, but both are statistically driven; they do not use cost-function-based algorithms such as gradient boosting.
In a machine learning setting, how would you handle a situation in which you need to consider mixed effects? Are there any other models that can handle longitudinal data with mixed effects (random effects)?
R seems to have a package that supports mixed effects (https://rd.springer.com/article/10.1007%2Fs10994-011-5258-3), but I am looking for a Python solution.
There are, at least, two ways to handle longitudinal data with mixed-effects in Python:
statsmodels for linear mixed effects;
MERF for mixed-effects random forests.
If you go for statsmodels, I'd recommend working through some of the examples provided here. If you go for MERF, I'd say that the best starting point is here.
I hope it helps!
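A minimal sketch of the statsmodels route: a linear mixed model with a random intercept per subject, fitted with `statsmodels.formula.api.mixedlm`. The data is simulated and the column names (`y`, `x`, `subject`) are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n_subjects, n_obs = 30, 10
subject = np.repeat(np.arange(n_subjects), n_obs)
x = rng.normal(size=n_subjects * n_obs)

# Hypothetical simulated longitudinal data: each subject has its own intercept
random_intercept = rng.normal(scale=2.0, size=n_subjects)[subject]
noise = rng.normal(scale=0.5, size=n_subjects * n_obs)
y = 1.0 + 1.5 * x + random_intercept + noise
data = pd.DataFrame({"y": y, "x": x, "subject": subject})

# Fixed effect for x, random intercept per subject
# (pass re_formula="~x" to also give each subject a random slope)
model = smf.mixedlm("y ~ x", data, groups=data["subject"])
result = model.fit()
print(result.summary())
```

`result.fe_params` holds the fixed-effect estimates and `result.cov_re` the estimated random-effects variance. This is still the statistically driven, likelihood-based approach; MERF is the option that replaces the linear fixed-effects part with a random forest.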
I am trying to use GARCH(1,1) to find the hedge ratio as described in this paper: http://search.livjm.ac.uk/AFE/AFE_docs/cibef0402.pdf. However, Python does not seem to offer packages for GARCH(1,1), so I think I have to implement it myself.
The data I have for the index and the futures are their daily returns. I would like to write a function that takes the daily returns and outputs the GARCH beta as the hedge ratio. However, I am at a loss as to where to start writing the GARCH function. Could anyone outline the GARCH(1,1) algorithm step by step for this case?
There is an implementation of this in the Python statsmodels library. The source code is available here.
There is also the arch package ("ARCH models in Python").
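For the step-by-step part, univariate GARCH(1,1) estimation comes down to: (1) demean the returns to get residuals eps_t; (2) run the variance recursion s2_t = omega + alpha * eps_{t-1}^2 + beta * s2_{t-1}; (3) evaluate the Gaussian log-likelihood; (4) maximize it over (omega, alpha, beta) subject to omega > 0, alpha >= 0, beta >= 0, alpha + beta < 1. Below is a minimal NumPy/SciPy sketch, assuming a constant mean and normal errors; the helper names are hypothetical. It is only the univariate building block: a time-varying minimum-variance hedge ratio is usually the conditional covariance of spot and futures returns divided by the conditional variance of futures returns, which requires a bivariate GARCH model, so the exact specification should be taken from the paper.

```python
import numpy as np
from scipy.optimize import minimize


def garch11_variance(params, eps):
    """Conditional variance: s2[t] = omega + alpha*eps[t-1]**2 + beta*s2[t-1]."""
    omega, alpha, beta = params
    s2 = np.empty_like(eps)
    s2[0] = eps.var()  # start the recursion at the sample variance
    for t in range(1, len(eps)):
        s2[t] = omega + alpha * eps[t - 1] ** 2 + beta * s2[t - 1]
    return s2


def garch11_neg_loglik(params, eps):
    """Negative Gaussian log-likelihood; infeasible parameters get +inf."""
    omega, alpha, beta = params
    if omega <= 0 or alpha < 0 or beta < 0 or alpha + beta >= 1:
        return np.inf
    s2 = garch11_variance(params, eps)
    return 0.5 * np.sum(np.log(2 * np.pi) + np.log(s2) + eps ** 2 / s2)


def fit_garch11(returns):
    """Fit a constant-mean GARCH(1,1) by maximum likelihood (Nelder-Mead)."""
    returns = np.asarray(returns, dtype=float)
    eps = returns - returns.mean()  # residuals of the constant-mean model
    # Starting values whose implied unconditional variance equals the sample variance
    x0 = np.array([0.1 * eps.var(), 0.1, 0.8])
    res = minimize(garch11_neg_loglik, x0, args=(eps,), method="Nelder-Mead",
                   options={"maxiter": 5000, "xatol": 1e-8, "fatol": 1e-8})
    return res.x, garch11_variance(res.x, eps)
```

Scaling daily returns to percent (multiplying by 100) usually makes the optimization better conditioned. For real use, the arch package's `arch_model(returns, vol="GARCH", p=1, q=1).fit()` performs the same kind of estimation and also reports standard errors.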
I am currently using the Pyevolve package to solve some genetic algorithm problems. I am wondering whether there are any examples of Pareto ranking in Pyevolve, since I have multiple evaluation functions.
If not, could you please provide some pseudocode for Pareto ranking algorithms? I want to implement it myself.
Thank you!
Based on the latest release's documentation, there indeed doesn't seem to be any Pareto ranking package in Pyevolve.
If you want to implement it yourself, you should look at NSGA-II, which is one of the best-known and best-performing algorithms for multi-objective optimization. The original article, containing pseudocode, can be found here: http://sci2s.ugr.es/docencia/doctobio/2002-6-2-DEB-NSGA-II.pdf
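The Pareto-ranking core of NSGA-II is the fast non-dominated sort from that paper: for each solution, count how many others dominate it and record which ones it dominates, then peel off successive fronts. A minimal pure-Python sketch, assuming every objective is minimized (negate any objective you maximize); it omits NSGA-II's crowding distance and selection steps, and the function names are hypothetical.

```python
def dominates(a, b):
    """True if a Pareto-dominates b: no worse in every objective, better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def fast_non_dominated_sort(objs):
    """Return a list of fronts (lists of indices); front 0 is the Pareto front."""
    n = len(objs)
    dominated = [[] for _ in range(n)]  # indices that solution p dominates
    dom_count = [0] * n                 # how many solutions dominate p
    fronts = [[]]
    for p in range(n):
        for q in range(n):
            if dominates(objs[p], objs[q]):
                dominated[p].append(q)
            elif dominates(objs[q], objs[p]):
                dom_count[p] += 1
        if dom_count[p] == 0:
            fronts[0].append(p)
    i = 0
    while fronts[i]:
        next_front = []
        for p in fronts[i]:
            for q in dominated[p]:
                dom_count[q] -= 1
                if dom_count[q] == 0:  # only dominated by earlier fronts
                    next_front.append(q)
        i += 1
        fronts.append(next_front)
    fronts.pop()  # the loop always ends with one empty front
    return fronts


def pareto_ranks(objs):
    """Rank of each solution: 0 for the first front, 1 for the second, ..."""
    ranks = [0] * len(objs)
    for rank, front in enumerate(fast_non_dominated_sort(objs)):
        for idx in front:
            ranks[idx] = rank
    return ranks


print(pareto_ranks([(1, 5), (2, 3), (3, 1), (2, 4), (4, 4)]))  # → [0, 0, 0, 1, 2]
```

The sort costs O(M N^2) dominance comparisons for N solutions and M objectives, which matches the complexity stated for this step in the NSGA-II paper.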
If you want to develop multi-objective genetic algorithms in Python, and since Pyevolve development is fairly moribund, I would recommend checking out a more versatile framework named DEAP: http://deap.googlecode.com/. The framework already includes everything you need to build multi-objective GAs and provides many examples of how this can be done (NSGA-II is already implemented in DEAP). The transition from Pyevolve should be easy, as the documentation is quite complete. You can also get in touch with the developers; they answer questions very quickly.