I was searching the SciPy library for a built-in module for Bayesian curve fitting and I was not able to find one. All I found is:
scipy.optimize.curve_fit
But the description at that link says this is a non-linear least squares fit. My question is: do we have to implement our own module for Bayesian curve fitting, or is there such a module that I might have missed?
Bayesian inference is not part of the SciPy library - it is simply out of scope for SciPy. There are a number of separate Python modules that deal with it, and it seems that you have indeed missed quite a few of them - most notably pymc and emcee, implementations of Markov chain Monte Carlo (MCMC) algorithms that are probably the most widely used MCMC packages. They are both relatively straightforward to set up, but in my opinion emcee is easier to get started with.
As with everything, the devil is in the details with Bayesian curve fitting - I highly recommend reading through this overview to get a feel for the subtleties of line fitting.
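For illustration, here is a minimal sketch of Bayesian line fitting with emcee; the toy data, flat prior bounds, and sampler settings are all assumptions made up for this example:

    import numpy as np
    import emcee

    # Toy data: y = m*x + b with Gaussian noise
    rng = np.random.default_rng(42)
    x = np.linspace(0, 10, 50)
    y = 2.0 * x + 1.0 + rng.normal(0, 0.5, x.size)
    yerr = 0.5

    def log_prob(theta, x, y, yerr):
        m, b = theta
        if not (-10 < m < 10 and -10 < b < 10):   # flat priors within broad bounds
            return -np.inf
        model = m * x + b
        return -0.5 * np.sum(((y - model) / yerr) ** 2)   # Gaussian log-likelihood

    ndim, nwalkers = 2, 32
    p0 = rng.normal([2.0, 1.0], 0.1, size=(nwalkers, ndim))   # initial walker positions
    sampler = emcee.EnsembleSampler(nwalkers, ndim, log_prob, args=(x, y, yerr))
    sampler.run_mcmc(p0, 2000)
    samples = sampler.get_chain(discard=500, flat=True)   # flatten chains after burn-in
    print("posterior means of m, b:", samples.mean(axis=0))

The posterior samples give you full uncertainty estimates for the fit parameters, which is the main payoff over a single least-squares point estimate.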
I need to create a general linear polynomial model in Python. Since definitions for this type of model vary, I'd note that I am referring to this reference by NI. I believe that MATLAB's implementation is quite similar.
I am particularly interested in creating an Output-Error (OE) model with its initialization handled by the Prediction Error Method (PEM).
I've been looking through scikit-learn, statsmodels, and some time-series and statistics libraries on GitHub, but failed to find one that addresses this particular task.
I would be grateful for both:
suggestions of ready-made modules/libs
(if none exists) advice on creating my own lib: perhaps building on top of NumPy, SciPy, or one of the libraries mentioned above.
Thank you.
P.S.: A module just for OE/PEM would be sufficient, but I doubt this exists separately from other linear polynomial model libraries.
My groupmates and I were doing an assignment that involves running a regression on the Fama-French 3-factor model. I used Python's statsmodels module and they used Stata, and we shared the same data set. For Ordinary Least Squares regression we got the same answers, but the robust regression results for some reason don't agree.
Here is the result from Stata: [output image omitted]
Here is the result from Statsmodels: [output image omitted]
Just wondering what could be the cause of this discrepancy? Is there any way to resolve it? I also tried different methods (HuberT, RamsayE, etc.) in statsmodels and none of them matched the Stata results. Any help is appreciated.
The equivalent of Stata's

    regress ..., robust

in statsmodels is

    OLS(...).fit(cov_type='HC1')
The options for the robust sandwich covariance matrices are documented at http://www.statsmodels.org/devel/generated/statsmodels.regression.linear_model.RegressionResults.get_robustcov_results.html, but they are used through the fit keywords.
There is an incomplete FAQ answer on differences in robust standard errors between Stata and statsmodels: https://github.com/statsmodels/statsmodels/issues/1923
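A minimal sketch of the difference, on synthetic data (the design matrix and coefficients here are made up for illustration):

    import numpy as np
    import statsmodels.api as sm

    # Synthetic design: intercept plus three factor columns
    rng = np.random.default_rng(0)
    X = sm.add_constant(rng.normal(size=(200, 3)))
    y = X @ np.array([0.1, 0.5, -0.3, 0.2]) + rng.normal(size=200)

    ols_res = sm.OLS(y, X).fit()                    # classical standard errors
    robust_res = sm.OLS(y, X).fit(cov_type='HC1')   # Stata's `regress ..., robust`

    print(ols_res.bse)      # conventional standard errors
    print(robust_res.bse)   # heteroskedasticity-robust (HC1) standard errors

The point estimates are identical in both fits; only the standard errors differ.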
statsmodels.robust and RLM refer to outlier-robust estimation. This is an M-estimator, and the covariance has the original Huber sandwich form.
Here is the main page for statsmodels.robust
http://www.statsmodels.org/devel/rlm.html
and the documentation for RLM
http://www.statsmodels.org/devel/generated/statsmodels.robust.robust_linear_model.RLM.html
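For contrast, here is a short sketch of outlier-robust M-estimation with RLM and the HuberT norm, reusing the synthetic X and y from the sketch above:

    import statsmodels.api as sm

    # M-estimation downweights outlying observations during the fit,
    # rather than adjusting the covariance of an ordinary OLS fit.
    rlm_res = sm.RLM(y, X, M=sm.robust.norms.HuberT()).fit()
    print(rlm_res.params)
    print(rlm_res.bse)

This answers a different question than cov_type='HC1', which is why the RLM methods you tried did not reproduce Stata's regress ..., robust.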
I have a functional such as S[f] = \int_\Omega f^2(x)\,dx - if you're familiar with physics, it's the action. This object takes in a function defined on a domain \Omega and gives back a number; the math jargon for such an object is "functional".
Now I need to minimize this thing with respect to f. I know SciPy has an optimize package that lets one minimize multivariable functions, but I am curious whether there is a better way, since using it would mean minimizing over ~10,000 variables (the functions are essentially just lists of 10,000 numbers).
Do I have any other options?
You could use symbolic regression to find the function.
There are several packages available:
deap
glyph
gplearn
monkeys
Here is a good paper on symbolic regression by Schmidt and Lipson.
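As a rough illustration, here is a minimal symbolic-regression sketch with gplearn; the toy target expression and all settings are assumptions for the example:

    import numpy as np
    from gplearn.genetic import SymbolicRegressor

    # Toy data generated from a known expression
    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, (200, 2))
    y = X[:, 0] ** 2 - X[:, 1] + 0.5

    est = SymbolicRegressor(population_size=1000, generations=10, random_state=0)
    est.fit(X, y)
    print(est._program)   # the best evolved symbolic expression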
Although it is designed more for neural-network work, TensorFlow sounds like it would work for you. It can differentiate vector expressions and also optimize them using gradient descent.
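A minimal sketch of that idea with the TensorFlow 2.x API, using the functional and grid size from the question (note that without boundary conditions or constraints the minimizer of \int_\Omega f^2(x)\,dx is trivially f = 0, so this only demonstrates the mechanics):

    import tensorflow as tf

    n = 10_000
    dx = 1.0 / n
    f = tf.Variable(tf.random.normal([n]))       # f sampled on a uniform grid
    opt = tf.keras.optimizers.SGD(learning_rate=0.1)

    for _ in range(100):
        with tf.GradientTape() as tape:
            action = tf.reduce_sum(f ** 2) * dx  # discretized S[f] = \int f^2(x) dx
        grads = tape.gradient(action, [f])
        opt.apply_gradients(zip(grads, [f]))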
I am looking for a Python online learning/incremental learning algorithm of 'reasonable' complexity.
In scikit-learn I have found a few algorithms with the partial_fit method, namely ['BernoulliNB', 'GaussianNB', 'MiniBatchKMeans', 'MultinomialNB', 'PassiveAggressiveClassifier', 'PassiveAggressiveRegressor', 'Perceptron', 'SGDClassifier', 'SGDRegressor'].
All these algorithms form simple decision boundaries as far as I can see. Do we have out-of-the-box online algorithms somewhere in Python which can model more complex decision boundaries?
Correction: as noted below, K-means of course does not have a simple decision boundary. What I was looking for are supervised algorithms capable of learning, e.g., XOR.
One general approach is to combine a linear classifier with a kernel-approximation technique, e.g.:
SGD-based SVM/Logistic Regression with:
Nystroem
RBFSampler / Random Kitchen Sinks
Chain the two steps and you can still use partial_fit on the classifier (note that scikit-learn's Pipeline object itself does not expose partial_fit, so apply the transform step manually, as in the sketch below).
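Here is a sketch of that combination on an XOR-style stream (all data and settings are toy assumptions). RBFSampler only draws random projection weights in fit, so fitting it on the first mini-batch is enough; later batches reuse the same feature map:

    import numpy as np
    from sklearn.kernel_approximation import RBFSampler
    from sklearn.linear_model import SGDClassifier

    rng = np.random.default_rng(0)
    rbf = RBFSampler(gamma=1.0, n_components=100, random_state=0)
    clf = SGDClassifier()

    fitted = False
    for _ in range(50):                             # stream of mini-batches
        X = rng.uniform(-1, 1, (32, 2))
        y = (X[:, 0] * X[:, 1] > 0).astype(int)     # XOR-style labels
        if not fitted:
            rbf.fit(X)      # draws the random feature weights
            fitted = True
        clf.partial_fit(rbf.transform(X), y, classes=[0, 1])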
One more remark (regarding your list of algorithms): neither KMeans nor k-nearest neighbors forms a linear decision boundary!
Is there a way to give an x, y pair dataset to a function that will return a list of curve-fit models and their coefficients? The program DataFit does this with about 200 different models - from exponential to inverse polynomial, etc. - but we are looking for a Pythonic way.
I have seen many posts about manually using scipy to type out each model, but this is not feasible for the number of models we want to test.
The closest thing I found was pyeq2, but it does not return the list of functions and seems to be a rabbit hole to code against.
If R has this available we could use that, but Python is really the goal.
Below is an example of the data; we want to find the best way to describe this curve.
You can try the splines library in R. I have used it for higher-order curve fitting of univariate data. You can vary the fit and compare the corresponding R^2 errors.
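If you want to stay in Python rather than R, a comparable smoothing-spline approach uses scipy.interpolate.UnivariateSpline - my substitution for R's splines, sketched here on toy data:

    import numpy as np
    from scipy.interpolate import UnivariateSpline

    # Toy exponential-decay data with noise
    x = np.linspace(0, 10, 50)
    y = np.exp(-x / 3.0) + np.random.default_rng(0).normal(0, 0.02, x.size)

    spl = UnivariateSpline(x, y, k=3, s=0.01)   # cubic spline, smoothing factor s
    r2 = 1 - np.var(y - spl(x)) / np.var(y)     # crude goodness-of-fit measure
    print("R^2:", r2)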
You can decide to do one of the following:
Choose a model and fit its parameters. The model should be based on a single independent variable, and the fit can be done with SciPy's scipy.optimize.curve_fit function. You could choose something like a hyperbola (see the sketch after this list).
Choose a complex model that likely represents an underlying mechanism at work, like the system of ODEs from an SIR disease model. Fitting the parameters will be no easy task; it is done with Markov chain Monte Carlo (MCMC) methods and is VERY difficult.
Realise that you have data and can use machine learning via scikit-learn to predict from your data. This approach doesn't require a parametric model.
Machine learning and neural networks don't fit an explicit formula and can't really tell you about the underlying mechanism, but they can make predictions just as a best-fit model would... dare I say even better.
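For the first option, a minimal curve_fit sketch; the hyperbola form and toy data are illustrative assumptions:

    import numpy as np
    from scipy.optimize import curve_fit

    def hyperbola(x, a, b):
        return a / (x + b)

    # Toy data generated from the model plus noise
    xdata = np.linspace(1, 10, 40)
    ydata = hyperbola(xdata, 5.0, 2.0) + np.random.default_rng(1).normal(0, 0.05, 40)

    popt, pcov = curve_fit(hyperbola, xdata, ydata, p0=[1.0, 1.0])
    perr = np.sqrt(np.diag(pcov))   # one-sigma parameter uncertainties
    print(popt, perr)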
In the end, we found that the Eureqa software was able to achieve this: https://www.nutonian.com/products/eureqa/