How to use scikit-learn's SVM with histograms as features? - python

I wish to use scikit-learn's SVM with a chi-squared kernel, as shown here. In that example the kernel operates on histograms, which is how my data is represented. However, I can't find an example of these used with histograms. What is the proper way to do this?
Is the correct approach simply to treat the histogram as a vector, where each element of the vector corresponds to a bin of the histogram?
Thank you in advance

There is an example of using an approximate feature map here. It is for the RBF kernel, but it works just the same.
The example above uses a Pipeline, but you can also just apply the transform to your data before handing it to a linear classifier, as AdditiveChi2Sampler doesn't actually fit to the data in any way.
Keep in mind that this is just an approximation of the kernel map (one that I found to work quite well), and if you want to use the exact kernel, you should go with ogrisel's answer.
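For concreteness, here is a minimal sketch of that approximate route, with made-up histogram data: AdditiveChi2Sampler transforms the histograms and a LinearSVC is trained on the transformed features.

import numpy as np
from sklearn.kernel_approximation import AdditiveChi2Sampler
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# made-up histogram data: 100 histograms with 16 bins each, L1-normalized
rng = np.random.RandomState(0)
X = rng.rand(100, 16)
X /= X.sum(axis=1, keepdims=True)
y = rng.randint(0, 2, 100)  # made-up binary labels

# approximate chi-squared feature map followed by a linear SVM
clf = make_pipeline(AdditiveChi2Sampler(sample_steps=2), LinearSVC())
clf.fit(X, y)
print(clf.score(X, y))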

sklearn.svm.SVC accepts custom kernels in two ways:
arbitrary Python functions passed as the kernel argument to the constructor
a precomputed kernel matrix passed as the first argument to fit, with kernel='precomputed' in the constructor
The former can be much slower but does not require allocating the whole kernel matrix in advance (which can be prohibitive for large n_samples).
There are more details and links to examples in the documentation on custom kernels.
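As a concrete illustration of the second (precomputed) route, here is a minimal sketch with made-up histogram data, using the exact chi-squared kernel provided by sklearn.metrics.pairwise.chi2_kernel; the first option would simply be SVC(kernel=chi2_kernel).

import numpy as np
from sklearn.metrics.pairwise import chi2_kernel
from sklearn.svm import SVC

# made-up histograms: one row per sample, one column per bin
rng = np.random.RandomState(0)
X_train = rng.rand(80, 16)
y_train = rng.randint(0, 2, 80)
X_test = rng.rand(20, 16)

# exact chi-squared kernel, precomputed
K_train = chi2_kernel(X_train, gamma=0.5)            # (80, 80) Gram matrix
svm = SVC(kernel='precomputed')
svm.fit(K_train, y_train)

K_test = chi2_kernel(X_test, X_train, gamma=0.5)     # kernel between test and training samples
pred = svm.predict(K_test)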

Related

Issues with Modified Gaussian Fit

I have a curve-fitting problem. I am measuring a system's response over a range of input values (both input and output values are scalar and real).
I change a particular parameter for the system between trials that results in different outputs for the same inputs.
This behavior is illustrated in the figure in the posting.
I need to fit a model that essentially takes that parameter and the x values and produces the observed y values as closely as possible.
I am trying to fit a Gaussian function as follows:
I am modeling $f_\alpha(c)$ as $a_0 + a_1 c$.
I am modeling $f_\beta(c)$ as $b_0 + b_1 c$.
Essentially, this makes the peak height and the mean location shift as functions of the parameter $c$.
The problem is that I am having convergence issues related to $f_\beta(c)$. If I just set the mean to a constant I am able to estimate $a_0$ and $a_1$, but obviously the fit is poor.
I am using scipy.optimize.curve_fit
So my question is basically: is there a better way to tackle this problem? For example, a function that can model this better, or better forms for $f_\alpha$ and $f_\beta$?
Sample data here: https://drive.google.com/file/d/1QPjrpxaDnnj3pmqzjZgjJJLKWnQjLikc
Sample code here: https://drive.google.com/open?id=135P9euXYAoa9CR3hrLOg6ja_6XfGndDN
(Dataset is slightly different than what I have shown in the figure in the posting)
Two questions:
1. Is there a methodical way to make the initial guesses such that I am guaranteed to converge?
2. Are there better functions that I could use to approximate my observations?
Thanks for any help in advance
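A hedged sketch of how the model described above could be set up with scipy.optimize.curve_fit, assuming a Gaussian whose amplitude is $f_\alpha(c)$ and whose mean is $f_\beta(c)$; the functional form, data, and initial guesses below are illustrative assumptions, not the poster's actual code.

import numpy as np
from scipy.optimize import curve_fit

def model(X, a0, a1, b0, b1, sigma):
    x, c = X
    amp = a0 + a1 * c        # f_alpha(c): peak height
    mu = b0 + b1 * c         # f_beta(c): peak location
    return amp * np.exp(-(x - mu) ** 2 / (2.0 * sigma ** 2))

# made-up data with the same structure: x sweeps repeated for several values of c
rng = np.random.RandomState(0)
x = np.tile(np.linspace(0, 10, 50), 3)
c = np.repeat([1.0, 2.0, 3.0], 50)
y = model((x, c), 1.0, 0.5, 2.0, 1.0, 1.5) + 0.05 * rng.randn(x.size)

# starting guesses matter for convergence: put the mean near the overall maximum
p0 = [y.max(), 0.0, x[np.argmax(y)], 0.0, 1.0]
popt, pcov = curve_fit(model, (x, c), y, p0=p0)
print(popt)

Fitting all sweeps jointly like this, with sensible starting values for $b_0$ and $\sigma$, is often enough to get the shared parameters to converge.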

What does the parameter "mds" mean in the pyLDAvis.sklearn.prepare() function?

I want to visualize the topic modeling done with the LDA algorithm. I am using the Python module pyLDAvis, with Jupyter Notebook as my environment.
import pyLDAvis.sklearn
...
pyLDAvis.sklearn.prepare(lda_tf, dtm_tf, tf_vectorizer)
pyLDAvis.sklearn.prepare(lda_tf, dtm_tf, tf_vectorizer, mds='mmds')
pyLDAvis.sklearn.prepare(lda_tf, dtm_tf, tf_vectorizer, mds='tsne')
It works fine, but I don't really understand the mds parameter, even after reading the documentation:
mds : function or a string representation of function
A function that takes topic_term_dists as an input and outputs a n_topics by 2 distance matrix. The output approximates the distance between topics. See js_PCoA() for details on the default function. A string representation currently accepts pcoa (or upper case variant), mmds (or upper case variant) and tsne (or upper case variant), if sklearn package is installed for the latter two.
Does somebody know what the differences between mds='pcoa', mds='mmds', and mds='tsne' are?
Thanks!
All three options perform dimension reduction via Jensen-Shannon divergence combined with one of the following:
pcoa: Principal Coordinate Analysis (aka Classical Multidimensional Scaling)
mmds: Metric Multidimensional Scaling
tsne: t-distributed Stochastic Neighbor Embedding
Simply put: text data, when transformed into numeric tabular data, is usually high-dimensional. On the other hand, visualizations on a screen are two-dimensional (2D). Thus, a method of dimension reduction is required to bring the number of dimensions down to 2.
mds stands for multidimensional scaling. The possible values of that argument are:
mmds (Metric Multidimensional Scaling),
tsne (t-distributed Stochastic Neighbor Embedding), and
pcoa (Principal Coordinate Analysis).
All of them are dimension reduction methods.
Another method of dimension reduction that may be more familiar to you, but is not listed above, is PCA (principal component analysis). They all share the same basic idea of reducing dimensionality without losing too much information, backed by different theories and implementations.
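As a rough illustration (a conceptual sketch, not pyLDAvis's actual implementation), the mmds-style reduction can be thought of as: compute pairwise Jensen-Shannon distances between topic-term distributions, then embed them in 2D with metric MDS. The topic distributions below are made up.

import numpy as np
from scipy.spatial.distance import jensenshannon
from sklearn.manifold import MDS

# made-up topic-term distributions: 6 topics over a vocabulary of 50 terms
rng = np.random.RandomState(0)
topic_term_dists = rng.dirichlet(np.ones(50), size=6)

# pairwise Jensen-Shannon distances between topics
n = len(topic_term_dists)
dist = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        dist[i, j] = jensenshannon(topic_term_dists[i], topic_term_dists[j])

# metric MDS on the precomputed distance matrix gives one 2D point per topic
coords = MDS(n_components=2, dissimilarity='precomputed', random_state=0).fit_transform(dist)
print(coords)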

Scipy: Comparison of different kernel density estimation methods?

In Python, there are several ways of doing kernel density estimation. I want to know the differences between them so I can make a good choice.
They are:
scipy.stats.gaussian_kde,
http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.gaussian_kde.html
sklearn.neighbors.KernelDensity, http://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KernelDensity.html#sklearn.neighbors.KernelDensity
statsmodel
http://statsmodels.sourceforge.net/stable/nonparametric.html#kernel-density-estimation
I think we can compare them on 1D and 2D use, bandwidth selection, implementation, and performance.
I only have experience with sklearn.neighbors.KernelDensity. Here is what I know:
It is generally fast and works in multiple dimensions, but it has no helper for choosing the bandwidth.
I looked over scipy.stats.gaussian_kde; it seems to have a bandwidth selection method.
It looks like the article Kernel Density Estimation in Python is precisely what you are looking for:
I'm going to focus here on comparing the actual implementations of KDE currently available in Python. (...) four KDE implementations I'm aware of in the SciPy/Scikits stack:
In SciPy: gaussian_kde.
In Statsmodels: KDEUnivariate and KDEMultivariate.
In Scikit-learn: KernelDensity.
Each has advantages and disadvantages, and each has its area of applicability.
The "sklearn way" of choosing model hyperparameters is grid search, with cross-validation to choose the best values. Take a look at http://mark-kay.net/2013/12/24/kernel-density-estimation/ for an example of how to apply this to kernel density estimation.
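A minimal sketch of that approach with sklearn.neighbors.KernelDensity, assuming made-up 1D data and an arbitrary bandwidth grid:

import numpy as np
from sklearn.neighbors import KernelDensity
from sklearn.model_selection import GridSearchCV

# made-up 1D data: a mixture of two Gaussians
rng = np.random.RandomState(0)
x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 0.5, 200)])[:, None]

# cross-validated grid search over the bandwidth; the score used is the
# held-out log-likelihood that KernelDensity.score provides
grid = GridSearchCV(KernelDensity(kernel='gaussian'),
                    {'bandwidth': np.logspace(-1, 1, 20)},
                    cv=5)
grid.fit(x)

kde = grid.best_estimator_
print(grid.best_params_['bandwidth'])
log_dens = kde.score_samples(np.linspace(-6, 6, 100)[:, None])  # log-density on a grid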

How can Latent Semantic Indexing be used for feature selection?

I am studying some machine-learning and I have come across, in several places, that Latent Semantic Indexing may be used for feature selection. Can someone please provide a brief, simplified explanation of how this is done? Ideally both theoretically and in commented code. How does it differ from Principal Component Analysis?
What language it is written in doesn't really worry me, just that I can understand both code and theory.
LSA is conceptually similar to PCA, but it is used in different settings.
The goal of PCA is to transform data into a new, possibly lower-dimensional space. For example, if you wanted to recognize faces and used 640x480 pixel images (i.e. vectors in a 307200-dimensional space), you would probably try to reduce this space to something reasonable, both to make it computationally simpler and to make the data less noisy. PCA does exactly this: it "rotates" the axes of your high-dimensional space and assigns a "weight" to each of the new axes, so that you can throw away the least important of them.
LSA, on the other hand, is used to analyze the semantic similarity of words. It can't handle images, or bank data, or some other custom dataset. It is designed specifically for text processing, and works specifically with term-document matrices. Such matrices, however, are often considered too large, so they are reduced to lower-rank matrices in a way very similar to PCA (both of them use SVD). Feature selection, though, is not performed here. Instead, what you get is a feature vector transformation. SVD provides you with a transformation matrix (let's call it S), which, when multiplied by an input vector x, gives a new vector x' in a smaller space with a more important basis.
This new basis consists of your new features. They are not selected, though, but rather obtained by transforming the old, larger basis.
For more details on LSA, as well as implementation tips, see this article.
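A minimal sketch of that transformation in scikit-learn, assuming a made-up toy corpus: TruncatedSVD applied to a tf-idf term-document matrix is a common way to compute an LSA projection.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

# made-up toy corpus
docs = [
    "the cat sat on the mat",
    "dogs and cats are pets",
    "stock markets fell sharply today",
    "investors sold shares as markets dropped",
]

tfidf = TfidfVectorizer()
X = tfidf.fit_transform(docs)        # document-term matrix (documents x terms)

lsa = TruncatedSVD(n_components=2)   # SVD-based low-rank projection (LSA)
X_lsa = lsa.fit_transform(X)         # each document becomes a 2-dimensional vector

print(X_lsa.shape)                   # (4, 2): transformed features, not a selected subset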

GMM clustering algorithm with equal weight and shared diagonal covariance

I'm looking for a Gaussian mixture model clustering algorithm that would allow me to set equal component weights and shared diagonal covariances. I need to analyze a set of data and I don't have the time to try to write the code myself.
In Python you can use scikit-learn's GMM. It's easy to do; see the docs:
http://scikit-learn.sourceforge.net/dev/modules/generated/sklearn.mixture.GMM.html
Re your specific needs:
thegmm = GMM(covariance_type='tied', params='mc')
thegmm.fit(mydata)
Meaning:
shared diagonal covariances: use covariance_type='tied' in the constructor
equal component weights: use params='mc' in the constructor (rather than the default 'wmc', which lets the weights update).
Actually, I'm not sure whether 'tied' implies diagonal covariances. According to the docs it looks like you can choose 'tied' or 'diag' but not both. Can anyone confirm?
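A fuller sketch of that snippet, assuming an older scikit-learn release that still ships sklearn.mixture.GMM (recent versions replace it with GaussianMixture, which has no params argument); the data is made up:

import numpy as np
from sklearn.mixture import GMM  # removed in recent scikit-learn releases

# made-up 2D data drawn from two clusters
rng = np.random.RandomState(0)
mydata = np.vstack([rng.normal(0, 1, (100, 2)),
                    rng.normal(5, 1, (100, 2))])

thegmm = GMM(n_components=2,
             covariance_type='tied',  # one covariance matrix shared by all components
             params='mc')             # update only means and covariances during EM
thegmm.fit(mydata)

print(thegmm.weights_)  # stay at their (uniform) initial values because 'w' is excluded
print(thegmm.means_)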
Looks like the standard Matlab GMM tool will work: set the 'CovType' option to 'diagonal' and the 'SharedCov' option to true.
