I need to create a general-linear polynomial model with Python. Since definitions of this type of model vary, I should note that I am following this reference by NI. I guess that Matlab's implementation is quite similar.
I am particularly interested in creating an Output-Error (OE) model with its initialization handled by the Prediction Error Method (PEM).
I've been looking through scikit, statsmodels and some time-series and stats libraries on GitHub, but failed to find a suite that addresses this very task.
I would be grateful for both:
suggestions of ready-made modules/libs
(if none exists) advice on creating my own lib: perhaps, building on top of Numpy, Scipy or one of the mentioned above.
Thank you.
P.S.: A module just for OE/PEM would be sufficient but I doubt this exists separately from other linear polynomial model libs.
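If no ready-made module turns up, an OE fit can be sketched directly on top of NumPy and SciPy. The sketch below is only an illustration under stated assumptions: the model is y(t) = B(q)/F(q) u(t) + e(t) with a one-sample input delay, the starting point comes from a plain ARX least-squares fit (my choice here, not necessarily what NI or Matlab do), and the helper names simulate_oe, arx_initial_guess and fit_oe are hypothetical.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.signal import lfilter


def simulate_oe(theta, u, nb, nf):
    """Simulate y_hat = B(q)/F(q) u, assuming a one-sample input delay."""
    b = np.concatenate(([0.0], theta[:nb]))         # B(q) = b1 q^-1 + ... + b_nb q^-nb
    f = np.concatenate(([1.0], theta[nb:nb + nf]))  # F(q) = 1 + f1 q^-1 + ... + f_nf q^-nf
    return lfilter(b, f, u)


def arx_initial_guess(u, y, nb, nf):
    """Linear least-squares ARX fit, used only as a starting point (an assumption)."""
    n0 = max(nb, nf)
    rows = [np.concatenate((u[t - nb:t][::-1], -y[t - nf:t][::-1]))
            for t in range(n0, len(y))]
    theta, *_ = np.linalg.lstsq(np.array(rows), y[n0:], rcond=None)
    return theta  # ordered as [b1..b_nb, f1..f_nf]


def fit_oe(u, y, nb, nf):
    """Minimize the simulated output error with nonlinear least squares."""
    theta0 = arx_initial_guess(u, y, nb, nf)
    result = least_squares(lambda th: simulate_oe(th, u, nb, nf) - y, theta0)
    return result.x[:nb], result.x[nb:]


# Synthetic check: true system is 0.5 q^-1 / (1 - 0.8 q^-1) plus white output noise.
rng = np.random.default_rng(0)
u = rng.standard_normal(500)
y = lfilter([0.0, 0.5], [1.0, -0.8], u) + 0.05 * rng.standard_normal(500)
b_hat, f_hat = fit_oe(u, y, nb=1, nf=1)
print("B coefficients:", b_hat, "F coefficients:", f_hat)
```

Because the residual is the difference between the measured output and the simulated (not one-step-predicted) output, the cost is non-convex in the F coefficients, which is why a reasonable initial guess matters.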
For a project I am working on, I need to find a model for the data graphed below that includes a sine or cosine component (hard to tell from the image but the data does follow a trig-like function for each period, although the amplitude/max/mins are changing).
data
I originally planned on finding a simple regression model for my data using Desmos before I saw how complex the data was, but I do not think I can determine what equation to use without the help of Python. I don't have much experience with regression in Python; I've only done basic linear modeling where I knew the type of equation and was just determining the coefficients/constants. Could anyone offer a guiding example, git code, or resources that would be useful for this?
Your question is pretty generic, and from the graph alone we cannot tell enough about the data to give you a more detailed answer, but I'd say have a look at OLS:
https://www.statsmodels.org/dev/generated/statsmodels.regression.linear_model.OLS.html
You could also look at scikit learn for the various regression models it provides.
http://scikit-learn.org/stable/modules/linear_model.html
Essentially, these packages will help you figure out the equation you are looking for to describe your data.
Also, it looks like your graph has an outlier. Please note that least-squares regression is very sensitive to outliers, so you may want to handle those data points before fitting the model.
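To make the OLS suggestion concrete for a trig-like series: if the period is known, a sinusoid is linear in its sine and cosine coefficients, so ordinary least squares is enough. The sketch below uses synthetic data (not the data from the question), assumes a constant amplitude and a known period, and uses numpy.linalg.lstsq, which solves the same least-squares problem an OLS routine would. The function name fit_sinusoid is hypothetical.

```python
import numpy as np


def fit_sinusoid(t, y, period):
    """Least-squares fit of y ~ offset + a*sin(w t) + b*cos(w t) for a known period."""
    w = 2.0 * np.pi / period
    X = np.column_stack((np.ones_like(t), np.sin(w * t), np.cos(w * t)))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    offset, a, b = coef
    amplitude = np.hypot(a, b)
    phase = np.arctan2(b, a)  # so that y ~ offset + amplitude * sin(w t + phase)
    return offset, amplitude, phase


# Synthetic example data; replace with your own t and y arrays.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 20.0, 200)
y = 1.5 + 2.0 * np.sin(2.0 * np.pi * t / 5.0 + 0.3) + 0.1 * rng.standard_normal(t.size)

offset, amplitude, phase = fit_sinusoid(t, y, period=5.0)
print(offset, amplitude, phase)
```

If the amplitude changes from period to period, as in the question, this model is too simple as written; extra columns (for example t*sin(w t) and t*cos(w t)) would be one way to extend it while staying linear.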
I was searching the scipy library for any built-in modules for Bayesian curve fitting, and I'm not able to find one. All I found is:
scipy.optimize.curve_fit
But the description at this link says that it is a nonlinear least-squares fit. My question is: do we have to implement our own module for Bayesian curve fitting, or is there any such module that I might have missed?
Bayesian inference is not part of the SciPy library; it is simply out of scope for scipy. There are a number of separate Python modules that deal with it, and it seems that you have indeed missed quite a few of those, most notably the Markov chain Monte Carlo implementations pymc and emcee, which are probably the most widely used MCMC packages. They are both relatively straightforward to set up, but in my opinion emcee is easier to get started with.
As with everything, the devil is in the details with Bayesian curve fitting. I highly recommend reading through this overview to get a feel for the subtleties of line fitting.
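To give a feel for what those packages do under the hood, here is a minimal hand-rolled random-walk Metropolis sampler for a straight-line fit. It is not pymc or emcee code, just a NumPy sketch under simplifying assumptions: flat priors, Gaussian noise with known sigma, synthetic data, and hand-picked step sizes. The function names are hypothetical.

```python
import numpy as np


def log_posterior(params, x, y, sigma):
    """Gaussian log-likelihood with flat priors (an assumption of this sketch)."""
    slope, intercept = params
    resid = y - (slope * x + intercept)
    return -0.5 * np.sum((resid / sigma) ** 2)


def metropolis(log_post, start, step, n_steps, rng):
    """Random-walk Metropolis: propose a Gaussian jump, accept with prob min(1, ratio)."""
    chain = np.empty((n_steps, len(start)))
    current = np.asarray(start, dtype=float)
    current_lp = log_post(current)
    for i in range(n_steps):
        proposal = current + step * rng.standard_normal(len(current))
        proposal_lp = log_post(proposal)
        if np.log(rng.uniform()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        chain[i] = current  # a rejected proposal repeats the current state
    return chain


# Synthetic data: y = 2 x + 1 with Gaussian noise of known standard deviation.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 50)
sigma = 0.5
y = 2.0 * x + 1.0 + sigma * rng.standard_normal(x.size)

chain = metropolis(lambda p: log_posterior(p, x, y, sigma),
                   start=[0.0, 0.0], step=np.array([0.02, 0.1]),
                   n_steps=20000, rng=rng)
samples = chain[5000:]  # discard burn-in
slope_mean, intercept_mean = samples.mean(axis=0)
slope_sd, intercept_sd = samples.std(axis=0)
print(f"slope = {slope_mean:.3f} +/- {slope_sd:.3f}")
print(f"intercept = {intercept_mean:.3f} +/- {intercept_sd:.3f}")
```

The payoff over curve_fit is that you get a whole sample of plausible parameter values rather than a single best fit; the dedicated packages add better samplers, priors and diagnostics on top of this idea.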
I'm a beginner to using statsmodels & I'm also open to using other Python based methods of solving my problem:
I have a data set with ~ 85 features some of which are highly correlated.
When I run the OLS method I get a helpful 'strong multicollinearity problems' warning as I might expect.
I've previously run this data through Weka, which as part of the regression classifier has an eliminateColinearAttributes option.
How can I do the same thing, i.e. get the model to choose which attributes to use instead of having them all in the model?
Thanks!
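Note that scipy.stats.linregress handles only a single explanatory variable, so for multivariate regression you need an OLS routine that accepts a full design matrix, such as the statsmodels OLS you are already running. Check out this nice example which has a good explanation.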
The eliminateColinearAttributes option in the software you've mentioned is just one algorithm implemented in that software to fight the problem. Here, you need to implement an iterative algorithm yourself: among the highly correlated variables, eliminate the one with the highest p-value, then run the regression again and repeat until the multicollinearity is gone.
There's no single right way here; there are different techniques. It is also good practice to choose manually which variable to omit from each set of highly correlated variables, so that the choice also makes sense for your problem.
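One way such an iterative elimination loop could look is sketched below. Note the swap: instead of the p-value criterion described above, it drops the feature with the largest variance inflation factor (VIF) at each round, which needs only NumPy. The threshold of 10 is a common rule of thumb rather than a fixed rule, the data are synthetic, and the function names are hypothetical.

```python
import numpy as np


def variance_inflation_factors(X):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j on the rest."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize so each column has unit variance
    vifs = np.empty(Z.shape[1])
    for j in range(Z.shape[1]):
        others = np.delete(Z, j, axis=1)
        coef, *_ = np.linalg.lstsq(others, Z[:, j], rcond=None)
        resid = Z[:, j] - others @ coef
        r_squared = 1.0 - resid.var()
        vifs[j] = 1.0 / max(1.0 - r_squared, 1e-12)  # guard against exact collinearity
    return vifs


def drop_collinear(X, threshold=10.0):
    """Repeatedly drop the column with the highest VIF until all are below threshold."""
    kept = list(range(X.shape[1]))
    while len(kept) > 1:
        vifs = variance_inflation_factors(X[:, kept])
        worst = int(np.argmax(vifs))
        if vifs[worst] <= threshold:
            break
        del kept[worst]
    return kept


# Synthetic example: column 2 is almost a copy of column 0.
rng = np.random.default_rng(0)
x0 = rng.standard_normal(200)
x1 = rng.standard_normal(200)
x2 = x0 + 0.01 * rng.standard_normal(200)
X = np.column_stack((x0, x1, x2))

kept = drop_collinear(X)
print("columns kept:", kept)
```

The automatic choice between two near-duplicates is essentially arbitrary, which is exactly why picking by hand, as suggested above, is often the better option.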
I am trying to train a Hidden Markov Model (HMM) using the GHMM library. So far, I have been able to train both a discrete model, and a continuous model using a single Gaussian for each of the states.
There are really good examples on how to do it here.
However, I would like to train a continuous HMM with a single covariance matrix tied across all states (instead of having one for each state). Is that possible with GHMM lib? If it is, I would love to see some examples. If not, could somebody point me to some other code, or refer me to another HMM python/c library that can actually do it?
Thank you!
So, I have found this great package in C that has an HMM implementation exactly the way I wanted: Queen Mary Digital Signal Processing Library. More specifically, the HMM implementation is in these files. So, no need to use GHMM lib anymore.
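For anyone wondering what "tied" changes in practice: in Baum-Welch re-estimation, the only difference is that the covariance update pools the weighted scatter over all states instead of computing one matrix per state. The NumPy sketch below shows just that update. It is not taken from GHMM or the Queen Mary library, and it assumes the state posteriors gamma have already been computed by a forward-backward pass; the function name is hypothetical.

```python
import numpy as np


def tied_covariance_update(X, gamma):
    """M-step for Gaussian emissions sharing one covariance matrix.

    X     : (T, D) observations
    gamma : (T, K) state posteriors from forward-backward, rows summing to 1
    """
    state_weight = gamma.sum(axis=0)               # (K,) expected time in each state
    means = (gamma.T @ X) / state_weight[:, None]  # (K, D) per-state means
    dim = X.shape[1]
    scatter = np.zeros((dim, dim))
    for k in range(gamma.shape[1]):
        diff = X - means[k]
        scatter += (gamma[:, k, None] * diff).T @ diff  # weighted scatter of state k
    return means, scatter / gamma.sum()            # pooled over all states


# Tiny example with hard assignments: two points per state.
X = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [12.0, 10.0]])
gamma = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
means, tied_cov = tied_covariance_update(X, gamma)
print(means)
print(tied_cov)
```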
I'm wondering what the set_weights method of the Maxent class in NLTK is used for (or more specifically how to use it). As I understand, it allows you to manually assign weights to certain features? Could somebody provide a basic example of the type of parameter that would be passed into it?
Thanks
Alex
It apparently allows you to set the coefficient matrix of the classifier. This may be useful if you have an external MaxEnt/logistic regression learning package from which you can export the coefficients. The train_maxent_classifier_with_gis and train_maxent_classifier_with_iis learning algorithms call this function.
If you don't know what a coefficient matrix is, it's the β mentioned in Wikipedia's treatment of MaxEnt.
(To be honest, it looks like NLTK is either leaking implementation details here, or has a very poorly documented API.)
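To illustrate what such a coefficient matrix does, here is a plain NumPy sketch of how β turns a feature vector into label probabilities in a MaxEnt (multinomial logistic regression) model. It does not use NLTK and makes no claim about how NLTK lays out its weights internally; the labels, features and numbers are made up for the example.

```python
import numpy as np


def maxent_probabilities(beta, features):
    """P(label | features) proportional to exp(beta[label] . features)."""
    scores = beta @ features   # one score per label
    scores -= scores.max()     # subtract the max for numerical stability
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum()


# Hypothetical model: 2 labels, 3 features, one row of beta per label.
labels = ["spam", "ham"]
beta = np.array([[2.0, -1.0, 0.0],
                 [-1.0, 1.5, 0.0]])
features = np.array([1.0, 0.0, 1.0])

probs = maxent_probabilities(beta, features)
print(dict(zip(labels, probs)))
```

Coefficients exported from an external logistic regression package would play the role of beta here; mapping them onto a particular classifier's weight vector depends on that classifier's feature encoding.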