Python: How can Kernel PCA be performed via SVD?

Since standard PCA can be performed via either eigenvalue decomposition (numpy.linalg.eig) or singular value decomposition (numpy.linalg.svd), how can I perform numpy-based kernel PCA via SVD? Say I have the kernel matrix K; via eigenvalue decomposition I do:
import numpy as np

# Technically untested, modded from the kernel-PCA package source
# Sometimes eigenvalues are negative for some reason
eVals, eVecs = np.linalg.eig(K)
# pair each eigenvalue with its eigenvector scaled by 1/sqrt(eigenvalue),
# sorted by eigenvalue in descending order
normed = sorted(
    ((val, vec / np.sqrt(val)) for val, vec in zip(eVals, eVecs.T)),
    key=lambda pair: pair[0], reverse=True)
What would the SVD equivalent be? For standard PCA we can SVD the input matrix X, or eigendecompose the covariance and Gram matrices to get the equivalents of U, S and V.
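For a symmetric positive semi-definite kernel matrix K, the SVD coincides with the eigendecomposition (up to ordering and signs), so a minimal sketch of the SVD route would be the following (assuming K has already been centred; zero or negative eigenvalues would need to be filtered out first):
import numpy as np

# Sketch: for a symmetric PSD K, numpy.linalg.svd returns U equal to the
# eigenvector matrix (up to signs) and singular values equal to the eigenvalues,
# so the same 1/sqrt(eigenvalue) normalisation as in the eig-based version applies.
U, S, Vt = np.linalg.svd(K)     # S is already sorted in descending order
alphas = U / np.sqrt(S)         # each column scaled by 1/sqrt(eigenvalue)
projections = K @ alphas        # kernel-PCA projections of the training points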

Related

In which scenario would one use another matrix than the identity matrix for finding eigenvalues?

The scipy.linalg.eigh function can take two matrices as arguments: first the matrix a, whose eigenvalues and eigenvectors we want, and optionally a matrix b, which defaults to the identity matrix when it is left blank.
In what scenario would someone like to use this b matrix?
Some more context: I am trying to use xdawn covariances from the pyRiemann package. This uses the scipy.linalg.eigh function with a covariance matrix a and a baseline covariance matrix b. You can find the implementation here. This yields an error, as the b matrix in my case is not positive definite and thus not usable in the scipy.linalg.eigh function. Removing this matrix and just using the identity matrix, however, solves this problem and yields relatively nice results... The problem is that I do not really understand what I changed, and maybe I am doing something I should not be doing.
This is the code from the pyRiemann package I am using (modified to avoid using functions defined in other parts of the package):
# X are samples (EEG data), y are labels
# shape of X is (1000, 64, 2459): trials x channels x time samples
# shape of y is (1000,)
import numpy as np
import sklearn.covariance
from scipy.linalg import eigh

Nt, Ne, Ns = X.shape  # trials, channels, samples
tmp = X.transpose((1, 2, 0))
b = np.matrix(sklearn.covariance.empirical_covariance(tmp.reshape(Ne, Ns * Nt).T))
for c in self.classes_:
    # Prototyped response for each class
    P = np.mean(X[y == c, :, :], axis=0)
    # Covariance matrix of the prototyped response & signal
    a = np.matrix(sklearn.covariance.empirical_covariance(P.T))
    # Spatial filters
    evals, evecs = eigh(a, b)
    # I am now using the following instead, disregarding the b matrix:
    # evals, evecs = eigh(a)
If A and B are both symmetric matrices, that does not imply that inv(B)*A is symmetric. So if I had to solve a generalised eigenvalue problem Ax = lambda Bx, I would use eig(A, B) rather than eig(inv(B)*A), so that the symmetry isn't lost.
One practical application is in finding the natural frequencies of a dynamic mechanical system from differential equations of the form M (d²x/dt²) + Kx = 0, where M is a positive definite matrix known as the mass matrix, K is the stiffness matrix, x is the displacement vector, and d²x/dt² is the acceleration vector (the second derivative of the displacement). To find the natural frequencies, x can be substituted with x0 sin(ωt), where ω is a natural frequency. The equation reduces to Kx = ω²Mx. Now, one could use eig(inv(M)*K), but that might break the symmetry of the resultant matrix, so I would use eig(K, M) instead.
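A minimal sketch of this point (with small made-up matrices standing in for stiffness K and mass M), showing that scipy.linalg.eigh(a, b) solves the generalised problem a x = lambda b x and agrees with the explicit inv(M)*K formulation up to round-off:
import numpy as np
from scipy.linalg import eigh, eig

# Hypothetical small symmetric positive definite matrices standing in for K and M
K = np.array([[4.0, 1.0],
              [1.0, 3.0]])   # "stiffness"
M = np.array([[2.0, 0.5],
              [0.5, 1.0]])   # "mass", positive definite

# Generalised symmetric problem K x = w^2 M x, keeping the symmetry intact
w2, modes = eigh(K, M)

# Equivalent but symmetry-breaking formulation: eigenvalues of inv(M) @ K
w2_alt = np.sort(np.real(eig(np.linalg.inv(M) @ K)[0]))

print(np.allclose(np.sort(w2), w2_alt))   # True, up to floating point error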
In the generalised problem (A - lambda B)x = 0, x is not expressed in the same basis as in the ordinary problem with the covariance matrix alone.
If the B matrix is not positive definite, it means that there are vectors that can be flipped by your B.
I hope it was helpful.

Recovering transformation matrix from metric_learning LMNN algorithm

I am using the LMNN module from the metric-learn package (http://contrib.scikit-learn.org/metric-learn/index.html), and I am attempting to recover the linear transformation matrix (L.T) from the learned Mahalanobis matrix (M).
The reason I am trying to recover this linear transformation is that I fit my dataset using cloud compute but test it on a local machine. This means I cannot save or recover the LMNN model after fitting on cloud compute, but I can save the learned M matrix, use a decomposition to find the learned linear transformation, and then apply that transformation to my test sets on the local machine.
The problem is that I can't seem to reconcile the results from the LMNN module's built in transformation with the learned linear transformation from the decomposed M matrix. Here's an example:
import numpy as np
from metric_learn import LMNN
from sklearn.datasets import load_iris
iris_data = load_iris()
X = iris_data['data']
Y = iris_data['target']
lmnn = LMNN(k=5, learn_rate=1e-6)
X_transformed = lmnn.fit_transform(X, Y)
M_matrix = lmnn.get_mahalanobis_matrix()
array([[ 2.47937397, 0.36313715, -0.41243858, -0.78715282],
[ 0.36313715, 1.69818843, -0.90042673, -0.0740197 ],
[-0.41243858, -0.90042673, 2.37024271, 2.18292864],
[-0.78715282, -0.0740197 , 2.18292864, 2.9531315 ]])
# symmetric square root of M_matrix via eigendecomposition (not an actual Cholesky factorization)
eigvalues, eigcolvectors = np.linalg.eig(M_matrix)
eigvalues_diag = np.diag(eigvalues)
eigvalues_diag_sqrt = np.sqrt(eigvalues_diag)
L = eigcolvectors.dot(eigvalues_diag_sqrt.dot(np.linalg.inv(eigcolvectors)))
L_transpose = np.transpose(L)
L_transpose.dot(L) # check to confirm that matches M_matrix
array([[ 2.47937397, 0.36313715, -0.41243858, -0.78715282],
[ 0.36313715, 1.69818843, -0.90042673, -0.0740197 ],
[-0.41243858, -0.90042673, 2.37024271, 2.18292864],
[-0.78715282, -0.0740197 , 2.18292864, 2.9531315 ]])
# test fit_transform() vs. transform() using LMNN functions
lmnn.transform(X[0:4, :])
array([[8.2487 , 4.41337015, 0.14988465, 0.52629361],
[7.87314906, 3.77220291, 0.36015873, 0.525688 ],
[7.59410008, 4.03369392, 0.17339877, 0.51350962],
[7.41676205, 3.82012155, 0.47312948, 0.68515535]])
X_transformed[0:4, :]
array([[8.2487 , 4.41337015, 0.14988465, 0.52629361],
[7.87314906, 3.77220291, 0.36015873, 0.525688 ],
[7.59410008, 4.03369392, 0.17339877, 0.51350962],
[7.41676205, 3.82012155, 0.47312948, 0.68515535]])
# test manual transform of X[0:4, :]
X[0:4, :].dot(L_transpose)
array([[8.22608756, 4.45271327, 0.24690081, 0.51206068],
[7.85071271, 3.81054846, 0.45442718, 0.51144826],
[7.57310259, 4.06981377, 0.26240745, 0.50067674],
[7.39356544, 3.85511015, 0.55776916, 0.67615584]])
As seen above, the first four rows of the original dataset X[0:4, :], when transformed by the LMNN module (using either fit_transform(X, Y) or transform(X[0:4, :])), give different results from the manual transformation.
I believe my decomposition of the M matrix is correct as I can replicate the M matrix using L.T.dot(L).
The learned linear transformation is L.T as per the github code: https://github.com/scikit-learn-contrib/metric-learn/blob/master/metric_learn/base_metric.py
class MetricTransformer(six.with_metaclass(ABCMeta)):

    @abstractmethod
    def transform(self, X):
        """Applies the metric transformation.

        Parameters
        ----------
        X : (n x d) matrix
            Data to transform.

        Returns
        -------
        transformed : (n x d) matrix
            Input data transformed to the metric space by :math:`XL^{\\top}`
        """


class MahalanobisMixin(six.with_metaclass(ABCMeta, BaseMetricLearner,
                                          MetricTransformer)):
    r"""Mahalanobis metric learning algorithms.

    Algorithm that learns a Mahalanobis (pseudo) distance :math:`d_M(x, x')`,
    defined between two column vectors :math:`x` and :math:`x'` by: :math:`d_M(x,
    x') = \sqrt{(x-x')^T M (x-x')}`, where :math:`M` is a learned symmetric
    positive semi-definite (PSD) matrix. The metric between points can then be
    expressed as the euclidean distance between points embedded in a new space
    through a linear transformation. Indeed, the above matrix can be decomposed
    into the product of two transpose matrices (through SVD or Cholesky
    decomposition): :math:`d_M(x, x')^2 = (x-x')^T M (x-x') = (x-x')^T L^T L
    (x-x') = (L x - L x')^T (L x- L x')`
    """
What am I missing here?
Thanks!
metric-learn contributor here: @BeginnersMindTruly, you're right, for LMNN we indeed learn the L matrix directly during training, from which we compute M at the end, so computing back the transformation L from M may lead to numerical differences.
As for your particular use case of accessing directly the learned matrix L, you should be able to do that using the components_ attribute of your metric learner, at the end of training.
Per contributors from the team who built the module (http://contrib.scikit-learn.org/metric-learn/index.html), this discrepancy is due to floating point precision errors. The LMNN module first learns the linear transformation L.T directly, then computes the M matrix as L.T.dot(L). So any attempt to recover the original transformation loses precision both during the computation of M and again during the factorization of M back into L.
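Following the contributor's suggestion, a minimal sketch of that workflow (assuming the learned transformation is exposed as the components_ attribute; the file name and the X_test array are hypothetical, just for illustration):
import numpy as np

# On the cloud machine, after fitting: save the L actually learned during training
L_learned = lmnn.components_                  # learned linear transformation L
np.save('lmnn_components.npy', L_learned)     # hypothetical file name

# On the local machine: load L and apply the same convention as transform(), i.e. X L^T
L_learned = np.load('lmnn_components.npy')
X_test_transformed = X_test.dot(L_learned.T)  # X_test is a placeholder for your local test set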

Eigensystem of parametrized, hermitian matrix in python

Suppose we are interested in the eigenvalues and eigenvectors of a hermitian matrix h(t) that depends on a parameter t. My matrix is large and sparse and hence needs to be treated numerically.
A naive approach is to evaluate the matrix h(t_k) at discretized parameter values t_k. Is it possible to sort the eigenvectors and eigenvalues according to the "character of the eigenvector"?
Let me illustrate what I mean by "character of the eigenvector" with the following simple example (i denotes the imaginary unit).
h(t) = {{1, it}, {-it, 1}}
The eigenvalues are 1-t and 1+t with the corresponding eigenvectors {-i, 1} and {i, 1}. Hence sorting according to the "eigenvector character" the eigenvalues should cross at t = 0. However most eigensolvers sort them by increasing eigenvalues exchanging the eigenvectors from negative to positive t (see code and output plot).
import numpy as np
import scipy.sparse.linalg as sla
import matplotlib.pyplot as plt

def h(t):
    # parametrized hermitian matrix
    return np.array([[1, t*1j], [-t*1j, 1]])

def eigenvalues(t):
    # convert to tuple for np.vectorize to work
    return tuple(sla.eigsh(h(t), k=2, return_eigenvectors=False))

eigenvalues = np.vectorize(eigenvalues)

t = np.linspace(-1, 1, num=200)
ev0, ev1 = eigenvalues(t)

plt.plot(t, ev0, 'r')
plt.plot(t, ev1, 'g')
plt.xlabel('t')
plt.ylabel('eigenvalues')
plt.show()
Idea
Most eigensolvers iteratively approximate the eigenvectors and eigenvalues. By feeding the eigensystem of the matrix h(t_k) to the solver as an initial guess for diagonalizing h(t_{k+1}), one might obtain a result ordered by the "character of the eigenvector".
Is it possible to achieve this with scipy, or more generally with python? Preferably the heavy diagonalization should be delegated to a dedicated compiled library (e.g. LAPACK via scipy). Is there a suitable LAPACK routine, maybe already wrapped in scipy?
Is there an alternative method that achieves the same? How can it be implemented in python?
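One possible approach (not a scipy built-in; a dense minimal sketch using the 2x2 example above): diagonalize at each t_k and reorder the new eigenpairs so that each eigenvector has maximal overlap with its predecessor from the previous step.
import numpy as np
import matplotlib.pyplot as plt

def h(t):
    # parametrized hermitian matrix from the question
    return np.array([[1, 1j * t], [-1j * t, 1]])

t_vals = np.linspace(-1, 1, 200)
tracked_evals = []
prev_vecs = None
for t in t_vals:
    evals, evecs = np.linalg.eigh(h(t))               # columns of evecs are eigenvectors
    if prev_vecs is not None:
        # match each previous eigenvector to the new one it overlaps most with;
        # a proper one-to-one assignment (e.g. the Hungarian algorithm) is safer
        # near (avoided) crossings, but argmax suffices for this example
        overlap = np.abs(prev_vecs.conj().T @ evecs)  # |<v_prev_i | v_new_j>|
        order = np.argmax(overlap, axis=1)
        evals, evecs = evals[order], evecs[:, order]
    tracked_evals.append(evals)
    prev_vecs = evecs

tracked_evals = np.array(tracked_evals)
plt.plot(t_vals, tracked_evals[:, 0], 'r')
plt.plot(t_vals, tracked_evals[:, 1], 'g')
plt.xlabel('t')
plt.ylabel('eigenvalues (sorted by eigenvector character)')
plt.show()
With this reordering the two eigenvalue branches cross at t = 0 instead of being exchanged, because each curve follows a fixed eigenvector rather than the sorted eigenvalue order.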

How to use Robust PCA output as principal-component (eigen)vectors from traditional PCA

I am using PCA to reduce the dimensionality of a N-dimensional dataset, but I want to build in robustness to large outliers, so I've been looking into Robust PCA codes.
For traditional PCA, I'm using python's sklearn.decomposition.PCA which nicely returns the principal components as vectors, onto which I can then project my data (to be clear, I've also coded my own versions using SVD so I know how the method works). I found a few pre-coded RPCA python codes out there (like https://github.com/dganguli/robust-pca and https://github.com/jkarnows/rpcaADMM).
The 1st code is based on the Candes et al. (2009) method, and returns low rank L and sparse S matrices for a dataset D. The 2nd code uses the ADMM method of matrix decomposition (Parikh, N., & Boyd, S. 2013) and returns X_1, X_2, X_3 matrices. I must admit, I'm having a very hard time figuring out how to connect these to the principal axes that are returned by a standard PCA algorithm. Can anyone provide any guidance?
Specifically, in one dataset X, I have a cloud of N 3-D points. I run it through PCA:
import sklearn.decomposition
pca = sklearn.decomposition.PCA(n_components=3)
pca.fit(X)
comps = pca.components_
and these 3 components are 3-D vectors that define the new basis onto which I project all my points. With Robust PCA, I get the matrices L + S = X. Does one then run pca.fit(L)? I would have thought that RPCA would give me back the eigenvectors, but with internal steps that throw out outliers as part of building the covariance matrix or performing SVD.
Maybe what I think of as "Robust PCA" isn't how other people are using/coding it?
The robust-pca code factors the data matrix D into two matrices, L and S which are "low-rank" and "sparse" matrices (see the paper for details). L is what's mostly constant between the various observations, while S is what varies. Figures 2 and 3 in the paper give a really nice example from a couple of security cameras, picking out the static background (L) and variability such as passing people (S).
If you just want the eigenvectors, treat the S as junk (the "large outliers" you're wanting to clip out) and do an eigenanalysis on the L matrix.
Here's an example using the robust-pca code:
L, S = RPCA(data).fit()
rcomp, revals, revecs = pca(L)
print("Normalised robust eigenvalues: %s" % (revals/np.sum(revals),))
Here, the pca function is:
import numpy as np

def pca(data, numComponents=None):
    """Principal Components Analysis

    From: http://stackoverflow.com/a/13224592/834250

    Parameters
    ----------
    data : `numpy.ndarray`
        numpy array of data to analyse
    numComponents : `int`
        number of principal components to use

    Returns
    -------
    comps : `numpy.ndarray`
        Principal components
    evals : `numpy.ndarray`
        Eigenvalues
    evecs : `numpy.ndarray`
        Eigenvectors
    """
    m, n = data.shape
    data -= data.mean(axis=0)
    R = np.cov(data, rowvar=False)
    # use 'eigh' rather than 'eig' since R is symmetric,
    # the performance gain is substantial
    evals, evecs = np.linalg.eigh(R)
    idx = np.argsort(evals)[::-1]
    evecs = evecs[:, idx]
    evals = evals[idx]
    if numComponents is not None:
        evecs = evecs[:, :numComponents]
    # carry out the transformation on the data using eigenvectors
    # and return the re-scaled data, eigenvalues, and eigenvectors
    return np.dot(evecs.T, data.T).T, evals, evecs
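To then use the robust components the way you would use pca.components_ from sklearn, project the data onto the leading eigenvectors. A minimal sketch, assuming the L, revecs and the original data matrix X from above (the choice of mean used for centring is a judgment call; here X is centred with its own mean):
import numpy as np

# Keep the top k robust components and project the data onto them,
# analogous to sklearn's pca.transform(X)
k = 3
X_proj = (X - X.mean(axis=0)).dot(revecs[:, :k])   # coordinates in the robust principal basis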

How to whiten matrix in PCA

I'm working with Python and I've implemented the PCA using this tutorial.
Everything works great: I computed the covariance, did a successful transform, and brought it back to the original dimensions with no problem.
But how do I perform whitening? I tried dividing the eigenvectors by the eigenvalues:
S, V = numpy.linalg.eig(cov)
V = V / S[:, numpy.newaxis]
and used V to transform the data but this led to weird data values.
Could someone please shed some light on this?
Here's a numpy implementation of some Matlab code for matrix whitening I got from here.
import numpy as np

def whiten(X, fudge=1E-18):
    # the matrix X should be observations-by-components

    # get the covariance matrix
    Xcov = np.dot(X.T, X)

    # eigenvalue decomposition of the covariance matrix
    d, V = np.linalg.eigh(Xcov)

    # a fudge factor can be used so that eigenvectors associated with
    # small eigenvalues do not get overamplified.
    D = np.diag(1. / np.sqrt(d + fudge))

    # whitening matrix
    W = np.dot(np.dot(V, D), V.T)

    # multiply by the whitening matrix
    X_white = np.dot(X, W)

    return X_white, W
You can also whiten a matrix using SVD:
def svd_whiten(X):
    U, s, Vt = np.linalg.svd(X, full_matrices=False)

    # U and Vt are the singular matrices, and s contains the singular values.
    # Since the rows of both U and Vt are orthonormal vectors, then U * Vt
    # will be white
    X_white = np.dot(U, Vt)

    return X_white
The second way is a bit slower, but probably more numerically stable.
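A quick way to sanity-check both functions above (the rng and X here are made-up random test data, not from the question): a whitened matrix should satisfy X_white.T @ X_white ≈ I.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
X -= X.mean(axis=0)            # both functions assume the data are already centred

Xw_eig, W = whiten(X)
Xw_svd = svd_whiten(X)

print(np.allclose(Xw_eig.T @ Xw_eig, np.eye(4), atol=1e-6))   # True
print(np.allclose(Xw_svd.T @ Xw_svd, np.eye(4), atol=1e-6))   # True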
If you use python's scikit-learn library for this, you can just set the built-in whiten parameter:
from sklearn.decomposition import PCA
pca = PCA(whiten=True)
whitened = pca.fit_transform(X)
check the documentation.
I think you need to transpose V and take the square root of S. So the formula is
matrix_to_multiply_with_data = transpose(V) * S^(-1/2)
Use ZCA mapping instead:
function [Xw] = whiten(X)
    % Compute and apply the ZCA mapping
    mu_X = mean(X, 1);
    X = bsxfun(@minus, X, mu_X);
    Xw = X / sqrtm(cov(X));
end
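For reference, a rough numpy equivalent of that ZCA mapping (a sketch, assuming X is an observations-by-features array; zca_whiten is just an illustrative name):
import numpy as np
from scipy.linalg import sqrtm

def zca_whiten(X):
    # Compute and apply the ZCA mapping: centre, then multiply by the
    # inverse matrix square root of the covariance
    X_centred = X - X.mean(axis=0)
    cov = np.cov(X_centred, rowvar=False)
    return X_centred @ np.linalg.inv(sqrtm(cov))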
