n-sample test for equality of proportions in Python
This statistical test seems pretty straightforward in R: http://sphweb.bumc.bu.edu/otlt/MPH-Modules/BS/R/R6_CategoricalDataAnalysis/R6_CategoricalDataAnalysis6.html
I looked at scipy, but it doesn't seem to provide this test for more than two samples.
I'm looking for a Python library capable of such a statistical test.
I assume that what you want to do is a chi-squared test for independence of the proportions between the different classes (see https://en.wikipedia.org/wiki/Pearson%27s_chi-squared_test#Testing_for_statistical_independence). You can do this in Python using the function scipy.stats.chi2_contingency (see https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.chi2_contingency.html for documentation).
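For illustration, here is a minimal sketch with made-up counts, arranged as a groups-by-outcomes contingency table (the three groups and their success/failure counts are invented for the example):
import numpy as np
from scipy.stats import chi2_contingency

# Rows are the n samples/groups, columns are success/failure counts (made-up data)
table = np.array([[45, 55],
                  [60, 40],
                  [50, 50]])
chi2, p, dof, expected = chi2_contingency(table)
print(p)  # a small p-value suggests the proportions differ across the groups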
The property-based testing framework QuickCheck can be instructed to measure how often a particular test case is generated, using the collect and measure utility functions (for example: how often the same person on average places an order, or how often empty orders are placed). Is there a way to adjust the distribution of test cases generated by a rule-based state machine in the Hypothesis framework, as in QuickCheck?
You can see the frequency of custom events using the event() function and the --hypothesis-show-statistics argument to pytest.
Our stateful testing doesn't support user-defined distributions, which we have found are usually counter-productive, but we do automatically use Swarm Testing to give you an empirically better-than-naive-random distribution - see https://hypothesis.readthedocs.io/en/latest/changes.html#v4-49-0 - among other tricks.
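For the frequency-measuring part, a minimal sketch (the test body and labels are invented for illustration):
from hypothesis import event, given, strategies as st

@given(st.lists(st.integers()))
def test_order_labels(order):
    # Label each generated case; running pytest with --hypothesis-show-statistics
    # then reports how often each label occurred.
    event("empty order" if len(order) == 0 else "non-empty order")
    assert isinstance(order, list)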
Recently I've been learning to use Python for ANOVA. I have already found an example that uses Tukey's test for post-hoc comparisons after ANOVA, but I want to use the least significant difference (LSD) method instead. Does anyone know how to do this, or have an example?
Thanks.
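To my knowledge scipy has no built-in LSD function, but Fisher's LSD amounts to pairwise t-tests that reuse the pooled within-group (error) variance from the one-way ANOVA, so it is easy to write by hand. A sketch (the function name and return format are my own):
import numpy as np
from itertools import combinations
from scipy import stats

def fisher_lsd(groups, alpha=0.05):
    # groups: list of 1-D numpy arrays, one per treatment group
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    df_error = n_total - k
    # Pooled within-group (error) mean square from the one-way ANOVA
    mse = sum(((g - g.mean()) ** 2).sum() for g in groups) / df_error
    results = []
    for (i, a), (j, b) in combinations(enumerate(groups), 2):
        diff = a.mean() - b.mean()
        se = np.sqrt(mse * (1.0 / len(a) + 1.0 / len(b)))
        p = 2 * stats.t.sf(abs(diff) / se, df_error)
        results.append((i, j, diff, p, p < alpha))
    return results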
A mixed-effects regression model is used when you believe there is dependency within particular groups of a feature. I've attached the Wikipedia link because it explains this better than I can: https://en.wikipedia.org/wiki/Mixed_model
Although I believe there are many occasions on which we need to consider mixed effects, there aren't many modules that support this.
R has lme4 and Python seems to have a similar module, but they are both statistics-driven; they do not use cost-function-based algorithms such as gradient boosting.
In a machine-learning setting, how would you handle a situation where you need to consider mixed effects? Are there any other models that can handle longitudinal data with mixed (random) effects?
(R seems to have a package that supports mixed effects: https://rd.springer.com/article/10.1007%2Fs10994-011-5258-3, but I am looking for a Python solution.)
There are at least two ways to handle longitudinal data with mixed effects in Python:
statsmodels for linear mixed effects;
MERF for mixed-effects random forests.
If you go for statsmodels, I'd recommend working through some of the examples provided here. If you go for MERF, I'd say the best starting point is here.
I hope it helps!
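To give a flavour of the statsmodels route, here is a minimal random-intercept example adapted from the dietox example in the statsmodels documentation (repeated weight measurements of pigs over time):
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Download an example longitudinal dataset: pig weights measured repeatedly
data = sm.datasets.get_rdataset("dietox", "geepack").data

# Fixed effect for Time, random intercept for each Pig (the grouping variable)
model = smf.mixedlm("Weight ~ Time", data, groups=data["Pig"])
result = model.fit()
print(result.summary())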
Is there any Python library with functions to perform fixed- or random-effects meta-analysis?
I have searched through Google, PyPI and other sources, but it seems that the most popular Python stats libraries lack this functionality.
It would be great if it also provided graphical output, such as funnel plots and forest plots.
[Forest plot example image]
I thought of something similar to the R package rmeta.
I've found some people writing their own functions manually, but that isn't an actual library. In addition, metasoft looked promising, but it uses Python only to convert between formats.
Just to say, it seems the most widely used tool is R's metafor, which provides seemingly every possible method and includes the essential plotting functions.
In Python, PythonMeta is the backend for the web-based tool PyMeta, which offers many of the methods (fixed and random effects, various data types) found in metafor.
The PyMARE project is still under development but already provides various fixed- and random-effects meta-analysis estimators (it is a spin-off from the rather more mature NiMARE tool for neuroimaging meta-analysis).
statsmodels now also offers some options for meta-analysis and visualization of its results, more information here:
https://www.statsmodels.org/devel/examples/notebooks/generated/metaanalysis1.html
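As a quick illustration of the statsmodels option, a minimal sketch with invented effect sizes and variances (the numbers are made up, not real studies):
import numpy as np
from statsmodels.stats.meta_analysis import combine_effects

# Hypothetical per-study effect sizes and their sampling variances
effects = np.array([0.10, 0.30, 0.35, 0.20, 0.25])
variances = np.array([0.04, 0.03, 0.05, 0.02, 0.06])

res = combine_effects(effects, variances, method_re="dl",
                      row_names=["study %d" % (i + 1) for i in range(5)])
print(res.summary_frame())  # fixed- and random-effects pooled estimates
fig = res.plot_forest()     # forest plot of individual and pooled effects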
I'd like to run a chi-squared test in Python. I've created code to do this, but I don't know if what I'm doing is right, because the scipy docs are quite sparse.
Background first: I have two groups of users. My null hypothesis is that there is no difference between the two groups in how likely their members are to use desktop, mobile, or tablet.
These are the observed frequencies in the two groups:
[['desktop', 14452], ['mobile', 4073], ['tablet', 4287]]
[['desktop', 30864], ['mobile', 11439], ['tablet', 9887]]
Here is my code using scipy.stats.chi2_contingency:
import numpy as np
from scipy import stats

obs = np.array([[14452, 4073, 4287], [30864, 11439, 9887]])
chi2, p, dof, expected = stats.chi2_contingency(obs)
print(p)
This gives me a p-value of 2.02258737401e-38, which is clearly significant.
My question is: does this code look valid? In particular, I'm not sure whether I should be using scipy.stats.chi2_contingency or scipy.stats.chisquare, given the data I have.
I can't comment too much on the use of the function; however, the issue at hand may be statistical in nature. The very small p-value you are seeing is most likely a result of your data containing large frequencies (on the order of tens of thousands). When sample sizes are this large, even tiny differences become statistically significant, hence the small p-value. The tests you are using are very sensitive to sample size. See here for more details.
You are using chi2_contingency correctly. If you feel uncertain about the appropriate use of a chi-squared test or how to interpret its result (i.e. your question is about statistical testing rather than coding), consider asking it over at the Cross Validated site: https://stats.stackexchange.com/
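To address the chi2_contingency vs. chisquare part of the question: scipy.stats.chisquare is a one-sample goodness-of-fit test that compares one set of observed frequencies against expected frequencies you supply, whereas chi2_contingency tests independence across the rows of a contingency table, which is what your two-group setup calls for. A sketch of what chisquare is for instead (the target proportions here are made up):
from scipy import stats

# Goodness of fit: does one group's device split match assumed proportions?
observed = [14452, 4073, 4287]
total = sum(observed)
expected = [total * p for p in (0.6, 0.2, 0.2)]  # hypothetical target split
chi2, p = stats.chisquare(observed, f_exp=expected)
print(p)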