Is there any Python library with functions to perform fixed- or random-effects meta-analysis?
I have searched through Google, PyPI and other sources, but it seems that the most popular Python stats libraries lack this functionality.
It would be great if it also provided graphical functions to produce funnel plots and forest plots.
Forest plot example: [image omitted]
I thought of something similar to the R package rmeta.
I've found some people creating their own functions manually, but that isn't an actual library. In addition, metasoft was promising, but it uses Python only to convert between formats.
Just to say, it seems the most widely used tool is R's metafor, which provides seemingly every possible method and includes the essential plotting functions.
In Python, PythonMeta is the backend for the web-based tool PyMeta, which offers many of the methods (fixed and random effects, various data types) found in metafor.
The PyMARE project is still under development but does provide various fixed- and random-effects meta-analysis estimators (it is a spin-off from the rather more mature NiMARE tool for neuroimaging meta-analysis).
statsmodels now also offers some options for meta-analysis and visualization of its results; more information here:
https://www.statsmodels.org/devel/examples/notebooks/generated/metaanalysis1.html
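As a rough sketch of what the statsmodels interface looks like, here is a minimal random-effects example using combine_effects from statsmodels.stats.meta_analysis (available in statsmodels >= 0.12; the effect sizes and variances below are made-up numbers, not real study data):

import numpy as np
from statsmodels.stats.meta_analysis import combine_effects

eff = np.array([0.10, 0.30, 0.35, 0.15])   # per-study effect sizes (hypothetical)
var = np.array([0.01, 0.02, 0.015, 0.04])  # per-study sampling variances (hypothetical)

res = combine_effects(eff, var, method_re="iterated",  # iterated (Paule-Mandel) tau^2 estimate
                      row_names=["study1", "study2", "study3", "study4"])
print(res.summary_frame())  # fixed- and random-effects pooled estimates
fig = res.plot_forest()     # forest plot of study-level and pooled effects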
I'm a newbie to the data world. Any help at a high level understanding the differences between these tools (Python, R, Tableau, pandas, ggplot2) would be greatly appreciated.
Thanks
You can draw distinctions on several levels. On the first level: what is the difference between Python, R and Tableau? These are basically three entirely different tools.
Python is an open-source, general-purpose programming language. In other words, Python is suitable for a lot of tasks: you can program desktop applications and websites/servers, but it is also heavily used in data science, the computational/natural sciences, ...
R, on the other hand, is also considered a programming language, but one aimed specifically at data science and statistics. This means that, out of the box, R has a lot of statistical tools and functionality, whereas Python requires some third-party installation.
Tableau is not a programming language but rather a visualisation software package. You can make so-called dashboards with it, aimed at summarizing your complex data into human-interpretable information. The main advantage is that programming knowledge is not really necessary.
Now, pandas is a third-party, open-source Python library for data science and is best known for introducing dataframes. Dataframes are popular because they are an intuitive way to work with your data: you can read and write multiple file formats, perform operations on your data frames, calculate some statistics, make visualizations, ... (a small sketch follows below). Note that R does not need a similar library, since dataframes are already included in base R.
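To make that concrete, here is a tiny pandas example (the column names and values are invented for illustration):

import pandas as pd

# A toy dataframe; real data would typically come from pd.read_csv() etc.
df = pd.DataFrame({"height": [1.62, 1.75, 1.80], "weight": [58, 72, 80]})
print(df.describe())                          # summary statistics per column
ax = df.plot.scatter(x="height", y="weight")  # plotting uses matplotlib under the hood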
ggplot2 is an open-source R library specifically aimed at plotting data. The closest equivalent library in Python would be matplotlib, which pandas also uses to generate its plots.
Which tool is best for you depends on what you would like to do. If you aren't interested in learning how to program, go for Tableau. If you do want to learn how to code, you could go with Python and/or R. If you are only interested in statistics and data science, you can go with R. However, if you are interested in other areas as well, such as machine learning, web development, ..., you're better off with Python.
ggplot2 is a full-featured R library for making publication-quality statistical graphics. It has a very expressive style that makes it useful for exploratory analysis as well as final graphics for publication.
Pandas is a Python library that brings the R "data frame" to Python. It is not a plotting library, but it interfaces with the matplotlib library to make publication-quality graphics.
Tableau is a standalone application for building dashboards and interactive graphics that are linked to databases. It has a mostly graphical interface with less coding than R/Python. The dashboards made in Tableau can also be built in R/Python, but that involves other libraries as well.
Is there a Python package handling binary decision diagrams (BDDs) and zero-suppressed binary decision diagrams (ZDDs), as in Knuth volume 4?
I know networkx can handle DAGs cleanly, but I'm looking for something that handles the internal bookkeeping of ZDDs, constructions from the algebra of set families (Knuth), construction of BDDs from other types of decision diagrams, and maybe some primitive ZDDs and queries (such as sampling and counting).
There are some packages in other languages: Java and C++. (Edits extending this list would be welcome.)
Edit -- several promising tools are listed here: https://github.com/johnyf/tool_lists/blob/master/bdd.md
Edit 2 -- Graphillion, a Python package recommended by Minato himself in these slides, may be the canonical answer, especially as it comes with this endearing tutorial video (which has this backstory).
Yes. Graphillion should do it.
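As a quick illustration of the kind of set-family algebra Graphillion exposes, here is a minimal sketch (the triangle graph is a made-up example; the GraphSet calls reflect my understanding of the 1.x API, so check the current docs):

from graphillion import GraphSet

# Universe: the three edges of a triangle on vertices 1, 2, 3.
GraphSet.set_universe([(1, 2), (2, 3), (1, 3)])

paths = GraphSet.paths(1, 3)   # the family of all simple 1-3 paths, stored as a ZDD
print(len(paths))              # counting is cheap on the ZDD representation: 2
print(paths.choice())          # sample one member set of edges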
What's your source for the BDD/ZDD? If, for example, the source is BLIF, then the BLIF first has to be translated into some intermediate representation x, and x then has to be translated into the ZDD.
In your case, you have to figure out what x is.
Does anyone know if there is a Python-based procedure to decompose time series using the STL (Seasonal-Trend decomposition using Loess) method?
I saw references to a wrapper program to call the stl function in R, but I found that to be unstable and cumbersome from the environment set-up perspective (Python and R together). Also, the link was four years old.
Can someone point out something more recent (e.g. sklearn, scipy, etc.)?
I haven't tried STLDecompose, but I took a peek at it and I believe it uses a general-purpose loess smoother. This is hard to do right and tends to be inefficient. See the defunct STL-Java repo.
The pyloess package provides a Python wrapper to the same underlying Fortran that is used by the original R version. You definitely don't need to go through a bridge to R to get this functionality! The package is not actively maintained, and I've occasionally had trouble getting it to build on some platforms (thus the fork here). But once built, it does work and is the fastest implementation you're likely to find. I've been tempted to modify it to include some new features, but I just can't bring myself to modify the Fortran (which is pre-processed RATFOR - very assembly-language-like Fortran - and I can't find a RATFOR preprocessor anywhere).
I wrote a native Java implementation, stl-decomp-4j, that can be called from Python using the pyjnius package. This started as a direct port of the original Fortran, refactored to a more modern programming style. I then extended it to allow quadratic loess interpolation and to support post-decomposition smoothing of the seasonal component, features that are described in the original paper but were not put into the Fortran/R implementation. (They apparently are in the S-plus implementation, but few of us have access to that.) The key to making this efficient is that the loess smoothing simplifies when the points are equidistant, so the point-by-point smoothing is done by simply modifying the weights used for the interpolation.
The stl-decomp-4j examples include a Jupyter notebook demonstrating how to call this package from Python. I should probably formalize that as a Python package, but I haven't had time. Quite willing to accept pull requests. ;-)
I'd love to see a direct port of this approach to python/numpy. Another thing on my "if I had some spare time" list.
Here you can find an example of Seasonal-Trend decomposition using LOESS (STL) from statsmodels.
Basically, it works this way:
from statsmodels.tsa.seasonal import STL

# TimeSeries should be a pandas Series with a periodic DatetimeIndex
# (otherwise pass period= explicitly); seasonal= is the seasonal smoother length
stl = STL(TimeSeries, seasonal=13)
res = stl.fit()
fig = res.plot()  # plots the observed, trend, seasonal and residual components
There is indeed:
https://github.com/jrmontag/STLDecompose
In the repo you will find a Jupyter notebook demonstrating usage of the package.
RSTL is a Python port of R's STL: https://github.com/ericist/rstl. It works pretty well, except that it is 3-5 times slower than R's STL, according to the author.
If you just want to get a lowess trend line, you can use statsmodels' lowess function:
https://www.statsmodels.org/dev/generated/statsmodels.nonparametric.smoothers_lowess.lowess.html.
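For example, a minimal sketch (the noisy sine data is synthetic, purely for illustration):

import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

x = np.linspace(0, 10, 200)
y = np.sin(x) + np.random.normal(scale=0.3, size=x.size)

smoothed = lowess(y, x, frac=0.2)  # returns an (n, 2) array of sorted (x, fitted) pairs
trend = smoothed[:, 1]             # the lowess trend line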
A mixed-effects regression model is used when I believe there is dependency within a particular group of a feature. I've attached the Wikipedia link because it explains this better than I can: https://en.wikipedia.org/wiki/Mixed_model
Although I believe there are many occasions on which we need to consider mixed effects, there aren't many modules that support this.
R has lme4, and Python seems to have a similar module, but they are both statistics-driven; they do not use cost-function algorithms such as gradient boosting.
In a machine-learning setting, how would you handle a situation where you need to consider mixed effects? Are there any other models that can handle longitudinal data with mixed effects (random effects)?
(R seems to have a package that supports mixed effects: https://rd.springer.com/article/10.1007%2Fs10994-011-5258-3, but I am looking for a Python solution.)
There are at least two ways to handle longitudinal data with mixed effects in Python:
statsmodels for linear mixed effects;
MERF for mixed-effects random forests.
If you go for statsmodels, I'd recommend working through some of the examples provided here (a small sketch follows below). If you go for MERF, I'd say the best starting point is here.
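A minimal sketch of the statsmodels route, with a random intercept per subject (the synthetic longitudinal data is invented just to make the snippet self-contained):

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic longitudinal data: 20 subjects, 5 observations each.
rng = np.random.default_rng(0)
n_subjects, n_obs = 20, 5
df = pd.DataFrame({
    "subject": np.repeat(np.arange(n_subjects), n_obs),
    "x": rng.normal(size=n_subjects * n_obs),
})
subject_effect = rng.normal(scale=0.5, size=n_subjects)  # per-subject random intercepts
df["y"] = 1.0 + 2.0 * df["x"] + subject_effect[df["subject"]] + rng.normal(scale=0.3, size=len(df))

# Fixed effect for x, random intercept grouped by subject.
result = smf.mixedlm("y ~ x", df, groups=df["subject"]).fit()
print(result.summary())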
I hope it helps!
I will soon be starting a final-year engineering project consisting of the real-time tracking of objects moving on a 2D surface. The objects will be registered by my algorithm using feature extraction.
I am trying to do some research to decide whether I should use MATLAB or Python with NumPy (Numerical Python). Some of the factors I am taking into account:
1.) Experience
I have reasonable experience in both, but perhaps more experience in image processing using Numpy. However, I have always found MATLAB to be very intuitive and easy to pick up.
2.) Real-Time abilities
It is very important that my choice be able to support the real-time acquisition of video data from an external camera. I found this link for MATLAB showing how to do it. I am sure the same would be possible in Python, perhaps using the OpenCV library?
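(For reference, frame grabbing in Python is indeed typically done with OpenCV's cv2.VideoCapture; a minimal capture loop looks like the sketch below, where camera index 0, i.e. the first attached device, is an assumption.)

import cv2

cap = cv2.VideoCapture(0)           # open the first attached camera
while True:
    ok, frame = cap.read()          # grab one frame as a NumPy array
    if not ok:
        break
    cv2.imshow("live", frame)       # display; feature extraction would go here
    if cv2.waitKey(1) == ord("q"):  # press q to quit
        break
cap.release()
cv2.destroyAllWindows()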
3.) Performance
I have heard, although never used it myself, that MATLAB can easily split independent calculations across multiple cores. I should think this would be very useful, and I am not sure whether the same is equally simple with NumPy?
4.) Price
I know that there is a cost associated with MATLAB, but I will be working at a university and thus will have access to full MATLAB without any cost to myself, so price is not a factor.
I would greatly appreciate any input from anyone who has done something similar, and what your experience was.
Thanks!
Python (with NumPy, SciPy and Matplotlib) is the new Matlab. So I strongly recommend Python over Matlab.
I made the change over a year ago and I am very happy with the results.
Here is a short pro/con list for Python and Matlab.
Python pros:
Object Oriented
Easy to write large and "real" programs
Open Source (so it's completely free to use)
Fast (most of the heavy computation algorithms have a Python wrapper to connect with C libraries, e.g. NumPy, SciPy, SciKits, libSVM, libLINEAR)
Comfortable, highly configurable environment (IPython, Python module for VIM, ...)
Fast growing community of Python users. Tons of documentation and people willing to help
Python cons:
Could be a pain to install (especially some modules in OS X)
Plot manipulation is not as nice/easy as in Matlab, especially 3D plots or animations
It's still a scripting language, so only use it for (fast) prototyping
Python is not designed for multicore programming
Matlab pros:
Very easy to install
Powerful Toolboxes (e.g. SignalProcessing, Systems Biology)
Unified documentation, and personalized support as long as you buy the licence
Easy to have plot animations and interactive graphics (that I find really useful for running experiments)
Matlab cons:
Not free (and expensive)
Based on Java + X11, which looks extremely ugly (ok, I accept I'm completely biased here)
Difficult to write large and extensible programs
A lot of Matlab users are switching to Python :)
I would recommend python.
I switched from MATLAB to Python about halfway through my PhD, and I do not regret it. At the most simplistic level, Python is a much nicer language, has real objects, etc.
If you expect to be doing any part of your code in C/C++, I would definitely recommend Python. The MEX interface works, but if your build gets complicated/big it starts to be a pain, and I never sorted out how to debug it effectively. I also had great difficulty getting MEX code that allocates large blocks of memory to interact with MATLAB's memory management (my inability to fix that issue is what drove me to switch).
As a side note/self-promotion, I have Crocker-Grier in C++ (with SWIG wrappers) and in pure Python.
If you're experienced with both languages it's not really a decision criterion.
Matlab has problems coping with real-time settings, especially since most computer vision algorithms are very costly. This is the advantage of using a tried-and-tested library such as OpenCV, where many of the algorithms you'll be using are efficiently implemented. Matlab offers the possibility of compiling code into MEX-files, but that is a lot of work.
Matlab has parallel for loops (parfor), which makes multicore processing easy (or at least easier). But the question is whether that will suffice to reach real-time speeds.
No comment.
The main advantage of Matlab is that you'll obtain a running program very quickly due to its good documentation. But I found that code reusability is bad with Matlab unless you put a heavy emphasis on it.
I think the final decision has to be whether you have to/can run your algorithm in real time, which I doubt is feasible in Matlab, but that depends on what methods you're planning to use.
Others have made a lot of great comments (I've opined on this topic before in another answer: https://stackoverflow.com/a/5065585/392949), but I just wanted to point out that Python has a number of really excellent tools for parallel computing/splitting up work across multiple cores (a small multiprocessing sketch follows the list). Here's a short and by no means comprehensive list:
IPython Parallel toolkit: http://ipython.org/ipython-doc/dev/parallel/index.html
mpi4py: https://code.google.com/p/mpi4py
The multiprocessing module in the standard library: http://docs.python.org/library/multiprocessing.html
pyzmq: http://zeromq.github.com/pyzmq/ (what the IPython parallel toolkit is based on)
parallel python (pp): http://www.parallelpython.com/
Cython's wrapping of openmp: http://docs.cython.org/src/userguide/parallelism.html
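As a taste of how simple the standard-library route can be, here is a minimal multiprocessing sketch (the per-frame squaring is a made-up stand-in for real image-processing work):

from multiprocessing import Pool

def process_frame(frame_id):
    # Placeholder for per-frame feature extraction.
    return frame_id * frame_id

if __name__ == "__main__":
    with Pool() as pool:  # one worker process per CPU core by default
        results = pool.map(process_frame, range(100))
    print(results[:5])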
You will also probably find Cython to be a vastly superior tool compared to what Matlab has to offer if you ever need to interface with external C libraries or write C extensions, and it has excellent NumPy support built right in.
There is a list with a bunch of other options here:
http://wiki.python.org/moin/ParallelProcessing