Eig in Python giving different Eigenvalues?

Essentially, the problem is that the eig function in MATLAB and Python gives me different results. I am reproducing data from a paper in order to confirm that my numerical method is correct (so I know the expected answers; I have them from MATLAB).
I have tried eigh, still no improvement.
Below is the data matrix used:
2852 170.380000000000 77.3190000000000 -51.0710000000000 -191.560000000000 105.410000000000 240.950000000000 102.700000000000
2842 169.640000000000 76.6120000000000 -50.3980000000000 -191.310000000000 105.660000000000 240.850000000000 102.960000000000
2838.80000000000 176.950000000000 80.4150000000000 -51.5700000000000 -192.190000000000 104.870000000000 239.700000000000 104.110000000000
2837.40000000000 182.930000000000 88.4070000000000 -54.1410000000000 -194.460000000000 104.230000000000 238.760000000000 105.020000000000
2890.80000000000 167.270000000000 122 -67.7490000000000 -275.150000000000 160.960000000000 248.010000000000 95.9470000000000
2962.10000000000 113.910000000000 177.060000000000 -98.9930000000000 -259.270000000000 80.7860000000000 262.890000000000 80.9180000000000
3013.90000000000 72.9740000000000 225.260000000000 -135.700000000000 -233.520000000000 0.0469300000000000 272.110000000000 71.5160000000000
3026.50000000000 112.420000000000 243.020000000000 -169.460000000000 -218.060000000000 0.0465190000000000 271.250000000000 71.8280000000000
3367.10000000000 -0.310680000000000 479.870000000000 0.494350000000000 -0.603940000000000 -0.147820000000000 282.700000000000 -64.1680000000000
import scipy.io as sc
import math as m
import numpy as np
from numpy import diag, power
from scipy.linalg import expm, sinm, cosm
import matplotlib.pyplot as plt
import pandas as pd
###########################. Import Data from Excel Sheet.
###################################
df = pd.read_excel('DataCompanionMatrix.xlsx', header=None)
data = np.array(df)
###########################. FUNCTION DEFINE.
#################################################
m = data.shape[0]
n = data.shape[1]
x = data[0:-1,:]
y = data[-1,:]
A = np.dot(x,np.transpose(x))
xx = np.dot(x,np.transpose(y))
Co_values = np.dot(np.linalg.pinv(A),xx)
C = np.zeros((n,n))
for i in range(0,n-1):
    C[i,i-1] = 1
C[:,n-1] = Co_values
eigV,eigW = np.linalg.eig(C)
print(eigV)
The data is a 9x8 matrix, x is an 8x8 matrix, y is a 1x8 array, A is 8x8, C is 8x8, and Co_values is a 1x8 array.
In MATLAB the eigenvalues are a 1x8 array of complex values. In Python, I get a 1x8 array containing 7 zeros and a single nonzero value.
I expect to plot the eigenvalues and they should sit on the unit circle; I have already done this in MATLAB.
(Screenshots in the original post: the C matrix, which reportedly looks the same in MATLAB and Python, plus the Python eigenvalues and the MATLAB eigenvalues.)

The array C you create in Python does not correspond to the one you have in MATLAB.
If I modify your Python code as follows, I get the same array C and the same eigenvalues:
C = np.zeros((n,n))
for i in range(0,n-1):
    C[i+1,i] = 1  # This is where the differences are!
C[:,n-1] = Co_values
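As a quick check (a minimal sketch, assuming C has been rebuilt with the corrected subdiagonal as above), the eigenvalues can be plotted against the unit circle mentioned in the question:
import numpy as np
import matplotlib.pyplot as plt

eigV, eigW = np.linalg.eig(C)            # C built with C[i+1, i] = 1

theta = np.linspace(0, 2 * np.pi, 200)
plt.plot(np.cos(theta), np.sin(theta), 'k--', label='unit circle')
plt.plot(eigV.real, eigV.imag, 'o', label='eigenvalues')
plt.axis('equal')
plt.legend()
plt.show()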

Related

Similar matrix computation using numpy

I am trying to find a matrix B similar to a 3 x 3 matrix A, using a random invertible matrix P:
B = P_inv.A.P
import numpy as np
from scipy import linalg as LA
from numpy.linalg import inv
A = np.random.randint(1,10,9).reshape(3,3)
P = np.random.randn(3,3)
P_inv = inv(P)
eig1 = LA.eigvalsh(A)
eig1 = np.sort(eig1)
B1 = P_inv.dot(A)
B = B1.dot(P)
eig2 = LA.eigvalsh(B)
eig2 = np.sort(eig2)
print(np.round(eig1 ,3))
print(np.round(eig2,3))
However, I notice that eig1 & eig2 are never equal.
What am I missing, or is it a numerical error?
Thanks
Kedar
You're using eigvalsh, which requires that the matrix be real symmetric (or complex Hermitian), which your randomly generated matrix is not.
Deleting the h and using eigvals instead fixes this.
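For example, a minimal sketch of the corrected comparison (same setup as the question, with eigvals in place of eigvalsh):
import numpy as np
from numpy.linalg import inv
from scipy import linalg as LA

A = np.random.randint(1, 10, 9).reshape(3, 3).astype(float)
P = np.random.randn(3, 3)
B = inv(P).dot(A).dot(P)

eig1 = np.sort(LA.eigvals(A))        # general (possibly complex) eigenvalues
eig2 = np.sort(LA.eigvals(B))
print(np.round(eig1, 3))
print(np.round(eig2, 3))             # agrees with eig1 up to floating-point error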

How to find Eigenspace of a matrix using python

I have a matrix for which I found the eigenvalues and eigenvectors, but now I want to solve for the eigenspace, that is, find a basis for each of the corresponding eigenspaces, and I don't know how to start. I tried finding the null space with scipy and solving with rref(), but it didn't work. Please help!
this is the code I am using
# import packages
import numpy as np
from numpy import linalg as LA
from scipy.linalg import null_space
# define matrix and vector
M = np.array([[0.82, 0.1],[0.18,0.9]])
v0 = np.array([[15000],[800]])
eigenVal, eigenVec = LA.eig(M)
print(eigenVal)
# Based on the characteristic polynomial formula:
# (A - lambda*I) v = 0
identity = np.identity(2, dtype=float)
lambdaI = eigenVal*identity
## Apply the characteristic polynomial formula using the M matrix
char_poly = M - lambdaI
print(char_poly)
Here I am stuck !
The np.linalg.eig function already returns the eigenvectors, which are exactly the basis vectors for your eigenspaces. More precisely:
v1 = eigenVec[:,0]
v2 = eigenVec[:,1]
span the corresponding eigenspaces for the eigenvalues lambda1 = eigenVal[0] and lambda2 = eigenVal[1].
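If you do want a basis computed explicitly from the definition, scipy.linalg.null_space (already imported in the question) can be applied to M - lambda*I for each eigenvalue; a minimal sketch using the matrix from the question:
import numpy as np
from numpy import linalg as LA
from scipy.linalg import null_space

M = np.array([[0.82, 0.10],
              [0.18, 0.90]])

eigenVal, eigenVec = LA.eig(M)
for lam in eigenVal:
    basis = null_space(M - lam * np.identity(2))   # basis of the eigenspace for this eigenvalue
    print(lam, basis.ravel())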

Principal component analysis dimension reduction in python

I have to implement my own PCA function, Y, V = PCA(data, M, whitening), that computes the first M principal components and transforms the data so that y_n = U^T x_n. The function should also return V, the amount of variance explained by the transformation.
I have to reduce the data from dimension D=4 to M=2; the given function skeleton is below:
def PCA(data, nr_dimensions=None, whitening=False):
    """ perform PCA and reduce the dimension of the data (D) to nr_dimensions
    Input:
        data... samples, nr_samples x D
        nr_dimensions... dimension after the transformation, scalar
        whitening... False -> standard PCA, True -> PCA with whitening
    Returns:
        transformed data... nr_samples x nr_dimensions
        variance_explained... amount of variance explained by the first nr_dimensions principal components, scalar"""
    if nr_dimensions is not None:
        dim = nr_dimensions
    else:
        dim = 2
what I have done is the following:
import numpy as np
import matplotlib.cm as cm
import matplotlib.mlab as mlab
import matplotlib.pyplot as plt
import scipy.stats as stats
from scipy.stats import multivariate_normal
import pdb
import sklearn
from sklearn import datasets
#covariance matrix
mean_vec = np.mean(data, axis=0)   # column (feature) means
cov_mat = (data - mean_vec).T.dot(data - mean_vec) / (data.shape[0] - 1)
print('Covariance matrix \n%s' % cov_mat)
#now the eigendecomposition of the cov matrix
cov_mat = np.cov(data.T)
eig_vals, eig_vecs = np.linalg.eig(cov_mat)
print('Eigenvectors \n%s' % eig_vecs)
print('\nEigenvalues \n%s' % eig_vals)
# Make a list of (eigenvalue, eigenvector) tuples
eig_pairs = [(np.abs(eig_vals[i]), eig_vecs[:,i]) for i in range(len(eig_vals))]
This is the point where I am stuck: I don't know how to reduce the dimension from here.
Any help would be welcome! :)
Here is a simple example for the case where the initial matrix A, which contains the samples and features, has shape [samples, features]:
from numpy import array
from numpy import mean
from numpy import cov
from numpy.linalg import eig
# define a matrix
A = array([[1, 2], [3, 4], [5, 6]])
print(A)
# calculate the mean of each column, since each column is assumed to be a variable/feature
M = mean(A.T, axis=1)
print(M)
# center columns by subtracting column means
C = A - M
print(C)
# calculate covariance matrix of centered matrix
V = cov(C.T)
print(V)
# eigendecomposition of covariance matrix
values, vectors = eig(V)
print(vectors)
print(values)
# project data
P = vectors.T.dot(C.T)
print(P.T)
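The projection above keeps all of the components. To address the dimension-reduction part of the question, one way (a small sketch reusing C, values, and vectors from the block above, with nr_dimensions as the target dimension, M=2 for the question's D=4 data) is to sort by eigenvalue, keep the leading columns, and report the explained-variance ratio:
import numpy as np

nr_dimensions = 2
order = np.argsort(values)[::-1]               # eig() gives no particular order, so sort descending
W = vectors[:, order[:nr_dimensions]]          # features x nr_dimensions projection matrix
Y = C.dot(W)                                   # centered samples projected onto the leading components
variance_explained = values[order[:nr_dimensions]].sum() / values.sum()
print(Y)
print(variance_explained)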
PCA is essentially the same computation as the singular value decomposition, so you can either use numpy.linalg.svd directly:
import numpy as np
def PCA(U, ndim, whitening=False):
    L, G, R = np.linalg.svd(U, full_matrices=False)
    if not whitening:
        L = L * G                 # scale the left singular vectors by the singular values
    Y = L[:, :ndim]               # transformed data, nr_samples x ndim
    return Y, G[:ndim]
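For instance, a hypothetical call on the question's data array (assuming it has shape nr_samples x 4) would look like:
Y, G = PCA(data, 2)      # keep the first two principal components
print(Y.shape)           # (nr_samples, 2)
print(G)                 # singular values of the first two components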
Or, if you want to use the eigenvalue problem: assuming that the number of samples is higher than the number of features (otherwise your data would be underfit), it is inefficient to calculate the spatial correlations (left eigenvectors) directly. Instead, compute the right eigenvectors and reconstruct the left ones from them:
def PCA(U, ndim, whitening=False):
    K = U.T @ U                       # small feature-by-feature matrix for the right eigenvectors
    G, R = np.linalg.eigh(K)          # eigh returns eigenvalues in ascending order
    G = G[::-1]                       # reverse so the largest eigenvalues come first
    R = R[:, ::-1]                    # reverse the eigenvector columns accordingly
    L = U @ R                         # reconstruct the left eigenvectors
    nrm = np.linalg.norm(L, axis=0, keepdims=True)   # normalize them
    L /= nrm
    if not whitening:
        L = L * nrm                   # scale back by the singular values (sqrt of the eigenvalues of K)
    Y = L[:, :ndim]                   # transformed data, nr_samples x ndim
    return Y, G[:ndim]

creating a large pdf matrix efficiently

I have a dataset of 60,000 examples of the form:
mu1 mu2 std1 std2
0 -0.745 0.729 0.0127 0.0149
1 -0.711 0.332 0.1240 0.0433
...
They are essentially parameters of 2-dimensional normal distributions. What I want to do is create a (NxN) matrix P such that P_ij = Normal( mu_i | mean=mu_j, cov=diagonal(std_j)), where mu_i is (mu1, mu2) for data 'i'.
I can do this with the following code for example:
from scipy import stats
import numpy as np
mu_all = data[['mu1', 'mu2']].values
std_all = data[['std1', 'std2']].values
P = []
for i in range(len(data)):
    mu_i = mu_all[i, :]
    std_i = std_all[i, :]
    prob_i = stats.multivariate_normal.pdf(mu_all, mean=mu_i, cov=np.diag(std_i))
    P.append(prob_i)
P = np.array(P).T
But this is too expensive (my machine freezes). How can I do this more efficiently? My guess is that scipy cannot handle computing the pdf over all 60,000 rows at once. Is there an alternative?
Just realized that creating a matrix of that size (60,000 x 60,000) cannot be handled in Python:
Very large matrices using Python and NumPy
So I don't think this can be done
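For a rough sense of scale (simple arithmetic, not from the original post): a dense 60,000 x 60,000 matrix of float64 values needs roughly 28.8 GB, which is why building P in one piece exhausts memory on a typical machine:
n = 60_000
bytes_needed = n * n * 8           # float64 takes 8 bytes per entry
print(bytes_needed / 1e9)          # ~28.8 GB for the full dense matrix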

Is there a Python equivalent to the mahalanobis() function in R? If not, how can I implement it?

I have the following code in R that calculates the mahalanobis distance on the Iris dataset and returns a numeric vector with 150 values, one for every observation in the dataset.
x=read.csv("Iris Data.csv")
mean<-colMeans(x)
Sx<-cov(x)
D2<-mahalanobis(x,mean,Sx)
I tried to implement the same in Python using the scipy.spatial.distance.mahalanobis(u, v, VI) function, but it seems this function only takes one-dimensional arrays as parameters.
I used the Iris dataset from R, I suppose it is the same you are using.
First, this is my R benchmark, for comparison:
x <- read.csv("IrisData.csv")
x <- x[,c(2,3,4,5)]
mean<-colMeans(x)
Sx<-cov(x)
D2<-mahalanobis(x,mean,Sx)
Then, in python you can use:
from scipy.spatial.distance import mahalanobis
import scipy as sp
import pandas as pd
x = pd.read_csv('IrisData.csv')
x = x.ix[:,1:]
Sx = x.cov().values
Sx = sp.linalg.inv(Sx)
mean = x.mean().values
def mahalanobisR(X,meanCol,IC):
m = []
for i in range(X.shape[0]):
m.append(mahalanobis(X.iloc[i,:],meanCol,IC) ** 2)
return(m)
mR = mahalanobisR(x,mean,Sx)
I defined a function so you can use it on other datasets (note that I use pandas DataFrames as inputs).
Comparing results:
In R
> D2[c(1,2,3,4,5)]
[1] 2.134468 2.849119 2.081339 2.452382 2.462155
In Python:
In [43]: mR[0:5]
Out[45]:
[2.1344679233248431,
2.8491186861585733,
2.0813386639577991,
2.4523816316796712,
2.4621545347140477]
Just be careful that what you get in R is the squared Mahalanobis distance.
A simpler solution would be:
import numpy as np
from scipy.spatial.distance import cdist

x = ...  # the 150 x 4 iris data as a NumPy array
mean = x.mean(axis=0).reshape(1, -1)  # make sure 2D
vi = np.linalg.inv(np.cov(x.T))
cdist(mean, x, 'mahalanobis', VI=vi)
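Note that, as pointed out above, R's mahalanobis() returns the squared distance while cdist returns the distance itself, so the result needs to be squared before comparing (a small sketch reusing mean, x, and vi from the snippet above):
D = cdist(mean, x, 'mahalanobis', VI=vi)   # plain Mahalanobis distances, shape (1, 150)
D2 = D.ravel() ** 2                        # squared distances, comparable to R's mahalanobis()
print(np.round(D2[:5], 6))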
