How is the numpy.cov() function implemented? - python

I have my own implementation of the covariance function based on the equation:
'''
Calculate the covariance coefficient between two variables.
'''
import numpy as np
X = np.array([171, 184, 210, 198, 166, 167])
Y = np.array([78, 77, 98, 110, 80, 69])
# Expected value function.
def E(X, P):
    expectedValue = 0
    for i in np.arange(0, np.size(X)):
        expectedValue += X[i] * (P[i] / np.size(X))
    return expectedValue
# Covariance coefficient function.
def covariance(X, Y):
    '''
    Calculate the product of the multiplication for each pair of variable
    values.
    '''
    XY = X * Y
    # Calculate the expected values for each variable and for XY.
    EX = E(X, np.ones(np.size(X)))
    EY = E(Y, np.ones(np.size(Y)))
    EXY = E(XY, np.ones(np.size(XY)))
    # Calculate the covariance coefficient.
    return EXY - (EX * EY)
# Display matrix of the covariance coefficient values.
covMatrix = np.array([[covariance(X, X), covariance(X, Y)],
                      [covariance(Y, X), covariance(Y, Y)]])
print("My function:", covMatrix)
# Display standard numpy.cov() covariance coefficient matrix.
print("Numpy.cov() function:", np.cov([X, Y]))
But the problem is that I'm getting different values from my function and from numpy.cov(), i.e.:
My function: [[ 273.88888889  190.61111111]
 [ 190.61111111  197.88888889]]
Numpy.cov() function: [[ 328.66666667  228.73333333]
 [ 228.73333333  237.46666667]]
Why is that? How is the numpy.cov() function implemented? If numpy.cov() is implemented correctly, what am I doing wrong? I'll just note that the results from my covariance() function are consistent with the worked examples for calculating the covariance coefficient found on the internet, e.g. http://www.naukowiec.org/wzory/statystyka/kowariancja_11.html.

The numpy function uses a different normalization from yours as its default setting. Try instead:
>>> np.cov([X, Y], ddof=0)
array([[ 273.88888889,  190.61111111],
       [ 190.61111111,  197.88888889]])
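The difference is only the divisor: the E-based formula above divides by n (a biased estimate, equivalent to ddof=0), while np.cov divides by n - 1 by default. A quick sketch of that relationship, reusing X and Y from the question:

import numpy as np

X = np.array([171, 184, 210, 198, 166, 167])
Y = np.array([78, 77, 98, 110, 80, 69])
n = X.size

biased = np.mean(X * Y) - np.mean(X) * np.mean(Y)  # divides by n, matches covariance(X, Y)
print(biased)                # ~190.611, equals np.cov([X, Y], ddof=0)[0, 1]
print(biased * n / (n - 1))  # ~228.733, equals the default np.cov([X, Y])[0, 1]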
References:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.cov.html
http://en.wikipedia.org/wiki/Covariance#Calculating_the_sample_covariance

Related

Principal Component Analysis (PCA) in Python numpy using the Snapshot method

I am trying to implement PCA using numpy to mimic the results from sklearn's decomposition.PCA.
I am using as input N flattened images of fixed size M = 128x192 (the image dimensions), joined horizontally into a single matrix D of dimensions MxN.
I am aiming to use the Snapshot method, as other implementations (see here and here) crash my build while computing np.cov, since the covariance matrix would be C = D(D^T), which is MxM.
The snapshot method first computes C_acute = (D^T)D, then computes the (acute) eigenvectors and eigenvalues of this NxN matrix. This gives eigenvectors of the form (D^T)v and the same eigenvalues.
To retrieve the eigenvectors v from the (acute) eigenvectors, we simply compute v = (1/eigenvalue) * D(v_acute).
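For reference, a minimal standalone sketch of that recovery step (with hypothetical shapes and random data, not the image matrix from this question) could look like the following; note that np.linalg.eigh returns eigenvectors as the columns of its second output:

import numpy as np

rng = np.random.default_rng(0)
M, N = 50, 4                                  # M features (pixels), N samples
D = rng.standard_normal((M, N))
D -= D.mean(axis=1, keepdims=True)            # center each feature

C_acute = D.T @ D                             # N x N instead of M x M
evals, evecs_acute = np.linalg.eigh(C_acute)  # eigenvectors are the COLUMNS

# C (D v') = D (D^T D) v' = lambda (D v'), so D v' is an eigenvector of C = D D^T.
# Columns belonging to near-zero eigenvalues are meaningless and should be discarded.
evecs = D @ evecs_acute
evecs /= np.linalg.norm(evecs, axis=0)        # normalize each column

# sanity check against the direct M x M decomposition (top eigenvector, up to sign)
_, evecs_full = np.linalg.eigh(D @ D.T)
print(np.allclose(np.abs(evecs[:, -1]), np.abs(evecs_full[:, -1])))  # True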
Here is the reference implementation I am using adapted from this SO post (which is known to work):
class TemplatePCA:
    def __init__(self, n_components=None):
        self.n_components = n_components

    def fit_transform(self, X):
        X -= np.mean(X, axis=0)
        R = np.cov(X, rowvar=False)
        # calculate eigenvectors & eigenvalues of the covariance matrix
        evals, evecs = np.linalg.eig(R)
        # sort eigenvalues in decreasing order
        idx = np.argsort(evals)[::-1]
        # sort eigenvectors according to the same index
        evecs = evecs[:, idx]
        evals = evals[idx]
        # select the first n eigenvectors (n is the desired dimension
        # of the rescaled data array, or dims_rescaled_data)
        evecs = evecs[:, :self.n_components]
        # carry out the transformation on the data using eigenvectors
        # and return the re-scaled data
        return -1 * np.dot(X, evecs)
Here is the implementation I have so far.
class MyPCA:
    def __init__(self, n_components=None):
        self.n_components = n_components

    def fit_transform(self, X):
        X -= np.mean(X, axis=0)
        D = X.T
        M, N = D.shape
        D_T = X  # D.T == (X.T).T == X
        C_acute = np.dot(D_T, D)
        eigen_values, eigen_vectors_acute = np.linalg.eig(C_acute)
        eigen_vectors = []
        for i in range(eigen_vectors_acute.shape[0]):  # for each eigenvector
            v = np.dot(D, eigen_vectors_acute[i]) / eigen_values[i]
            eigen_vectors.append(v)
        eigen_vectors = np.array(eigen_vectors)
        # sort eigenvalues and eigenvectors in decreasing order
        idx = np.argsort(eigen_values)[::-1]
        eigen_vectors = eigen_vectors[:, idx]
        eigen_values = eigen_values[idx]
        # select the first n_components eigenvectors
        eigen_vectors = eigen_vectors[:, :self.n_components]
        # carry out the transformation on the data using eigenvectors
        # return the re-scaled data (projection)
        return np.dot(C_acute, eigen_vectors)
The reference text I am using notes that:
The eigenvector is now (D^T)v, so to do face detection we first multiply our test image vector by (D^T) before projecting onto the eigenimages.
I am not sure whether it is possible to retrieve exactly the same principal components (i.e. eigenvectors) using this method. It would even seem impossible to get the same eigenvectors back, since the size of eigen_vectors_acute is only (4, 6) (meaning there are only 4 vectors), compared to the other method where it is (6, 6) (there are 6).
Running both on an input:
x = np.array([
    [0.387, 123, 789, 256, 4878, 5.42],
    [0.723, 9.78, 1.90, 1234, 12104, 5.25],
    [1, 123, 67.98, 7.91, 12756, 5.52],
    [1.524, 1.34, 23.456, 1.23, 6787, 3.94],
])
# These two are the same
print(sklearn.decomposition.PCA(n_components=3).fit_transform(x))
print(TemplatePCA(n_components=3).fit_transform(x))
# This one is different
print(MyPCA(n_components=3).fit_transform(x))
Output:
[[ 4282.20163145 147.84415964 -267.73483211]
[-3025.62452358 683.58580386 67.76941319]
[-3599.15380006 -569.33984612 -148.62757658]
[ 2342.57669218 -262.09011737 348.5929955 ]]
[[-4282.20163145 -147.84415964 267.73483211]
[ 3025.62452358 -683.58580386 -67.76941319]
[ 3599.15380006 569.33984612 148.62757658]
[-2342.57669218 262.09011737 -348.5929955 ]]
[[ 3.35535639e+15, -5.70493660e+17, -8.57482740e+17],
[-2.45510474e+15, 4.17428591e+17, 6.27417685e+17],
[-2.82475918e+15, 4.80278997e+17, 7.21885236e+17],
[ 1.92450753e+15, -3.27213928e+17, -4.91820181e+17]]

Why does my sigmoid function return values not in the interval ]0,1[?

I am implementing logistic regression in Python with numpy. I have generated the following data set:
# class 0:
# covariance matrix and mean
cov0 = np.array([[5,-4],[-4,4]])
mean0 = np.array([2.,3])
# number of data points
m0 = 1000
# class 1
# covariance matrix
cov1 = np.array([[5,-3],[-3,3]])
mean1 = np.array([1.,1])
# number of data points
m1 = 1000
# generate m gaussian distributed data points with
# mean and cov.
r0 = np.random.multivariate_normal(mean0, cov0, m0)
r1 = np.random.multivariate_normal(mean1, cov1, m1)
X = np.concatenate((r0,r1))
Now I have implemented the sigmoid function with the aid of the following methods:
def logistic_function(x):
    """ Applies the logistic function to x, element-wise. """
    return 1.0 / (1 + np.exp(-x))

def logistic_hypothesis(theta):
    return lambda x: logistic_function(np.dot(generateNewX(x), theta.T))

def generateNewX(x):
    x = np.insert(x, 0, 1, axis=1)
    return x
After applying logistic regression, I found out that the best thetas are:
best_thetas = [-0.9673200946417307, -1.955812236119612, -5.060885703369424]
However, when I apply the logistic function with these thetas, the output contains numbers that are not inside the interval [0,1].
Example:
data = logistic_hypothesis(np.asarray(best_thetas))(X)
print(data)
This gives the following result:
[2.67871968e-11 3.19858822e-09 3.77845881e-09 ... 5.61325410e-03
2.19767618e-01 6.23288747e-01]
Can someone help me understand what has gone wrong with my implementation? I cannot understand why I am getting such big values. Isn't the sigmoid function supposed to give results only in the [0,1] interval?
It does, it's just in scientific notation.
'e' Exponent notation. Prints the number in scientific notation using the letter 'e' to indicate the exponent.
>>> a = [2.67871968e-11, 3.19858822e-09, 3.77845881e-09, 5.61325410e-03]
>>> [0 <= i <= 1 for i in a]
[True, True, True, True]
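If the scientific notation itself is the distraction, numpy can also be told to print plain decimals; a small illustrative snippet:
>>> import numpy as np
>>> np.set_printoptions(suppress=True)
>>> np.array([2.67871968e-11, 3.19858822e-09, 5.61325410e-03, 6.23288747e-01])
array([0.        , 0.        , 0.00561325, 0.62328875])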

Improve performance of autograd jacobian

I'm wondering how the following code could be made faster. At the moment, it seems unreasonably slow, and I suspect I may be using the autograd API incorrectly. The output I expect is the Jacobian of f evaluated at each element of timeline, which I do get, but it takes a long time:
import numpy as np
from autograd import jacobian
def f(params):
    mu_, log_sigma_ = params
    Z = timeline * mu_ / log_sigma_
    return Z
timeline = np.linspace(1, 100, 40000)
gradient_at_mle = jacobian(f)(np.array([1.0, 1.0]))
I would expect the following:
jacobian(f) returns a function that represents the gradient vector w.r.t. the parameters.
jacobian(f)(np.array([1.0, 1.0])) is the Jacobian evaluated at the point (1, 1). To me, this should behave like a vectorized numpy function, so it should execute very fast, even for 40k-length arrays. However, this is not what is happening.
Even something like the following has the same poor performance:
import numpy as np
from autograd import jacobian
def f(params, t):
    mu_, log_sigma_ = params
    Z = t * mu_ / log_sigma_
    return Z
timeline = np.linspace(1, 100, 40000)
gradient_at_mle = jacobian(f)(np.array([1.0, 1.0]), timeline)
From https://github.com/HIPS/autograd/issues/439 I gathered that there is an undocumented function autograd.make_jvp which calculates the jacobian with a fast forward mode.
The link states:
Given a function f, vectors x and v in the domain of f, make_jvp(f)(x)(v) computes both f(x) and the Jacobian of f evaluated at x, right multiplied by the vector v.
To get the full Jacobian of f you just need to write a loop to evaluate make_jvp(f)(x)(v) for each v in the standard basis of f's domain. Our reverse mode Jacobian operator works in the same way.
From your example:
import autograd.numpy as np
from autograd import make_jvp
def f(params):
    mu_, log_sigma_ = params
    Z = timeline * mu_ / log_sigma_
    return Z
timeline = np.linspace(1, 100, 40000)
gradient_at_mle = make_jvp(f)(np.array([1.0, 1.0]))
# loop through each basis vector
# [1, 0] evaluates (f(x), first column of the jacobian)
# [0, 1] evaluates (f(x), second column of the jacobian)
for basis in (np.array([1, 0]), np.array([0, 1])):
    val_of_f, col_of_jacobian = gradient_at_mle(basis)
    print(col_of_jacobian)
Output:
[ 1. 1.00247506 1.00495012 ... 99.99504988 99.99752494
100. ]
[ -1. -1.00247506 -1.00495012 ... -99.99504988 -99.99752494
-100. ]
This runs in ~0.005 seconds on Google Colab.
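To assemble the full (40000, 2) Jacobian from those per-basis columns, a small follow-up sketch reusing gradient_at_mle from the block above:

cols = [gradient_at_mle(basis)[1] for basis in np.eye(2)]  # each call returns (value, column); keep the column
J = np.stack(cols, axis=1)
print(J.shape)  # (40000, 2)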
Edit:
Functions like cdf aren't defined for the regular jvp yet, but you can use another undocumented function, make_jvp_reversemode, where they are defined. Usage is similar, except that the output is only the column and not the value of the function:
import autograd.numpy as np
from autograd.scipy.stats.norm import cdf
from autograd.differential_operators import make_jvp_reversemode
def f(params):
    mu_, log_sigma_ = params
    Z = timeline * cdf(mu_ / log_sigma_)
    return Z
timeline = np.linspace(1, 100, 40000)
gradient_at_mle = make_jvp_reversemode(f)(np.array([1.0, 1.0]))
# loop through each basis vector
# [1, 0] evaluates the first column of the jacobian
# [0, 1] evaluates the second column of the jacobian
for basis in (np.array([1, 0]), np.array([0, 1])):
    col_of_jacobian = gradient_at_mle(basis)
    print(col_of_jacobian)
Output:
[0.05399097 0.0541246 0.05425823 ... 5.39882939 5.39896302 5.39909665]
[-0.05399097 -0.0541246 -0.05425823 ... -5.39882939 -5.39896302 -5.39909665]
Note that make_jvp_reversemode will be slightly faster than make_jvp by a constant factor due to its use of caching.

Numerical computation of softmax cross entropy gradient

I implemented the softmax() function, softmax_crossentropy() and the derivative of softmax cross entropy: grad_softmax_crossentropy(). Now I wanted to compute the derivative of the softmax cross entropy function numerically. I tried to do this by using the finite difference method but the function returns only zeros. Here is my code with some random data:
import numpy as np
batch_size = 3
classes = 10
# random preactivations
a = np.random.randint(1,100,(batch_size,classes))
# random labels
y = np.random.randint(0,np.size(a,axis=1),(batch_size,1))
def softmax(a):
    epowa = np.exp(a - np.max(a, axis=1, keepdims=True))
    return epowa / np.sum(epowa, axis=1, keepdims=True)
print(softmax(a))
def softmax_crossentropy(a, y):
    y_one_hot = np.eye(classes)[y[:, 0]]
    return -np.sum(y_one_hot * np.log(softmax(a)), axis=1)
print(softmax_crossentropy(a, y))
def grad_softmax_crossentropy(a, y):
    y_one_hot = np.eye(classes)[y[:, 0]]
    return softmax(a) - y_one_hot
print(grad_softmax_crossentropy(a, y))
# Finite difference approach to compute grad_softmax_crossentropy()
eps = 1e-5
print((softmax_crossentropy(a+eps,y)-softmax_crossentropy(a,y))/eps)
What did I do wrong?
Here's how you could do it. I think you're referring to the gradient w.r.t. the activations indicated by y's indicator matrix.
First, I instantiate a as float so that individual items can be changed:
a = np.random.randint(1,100,(batch_size,classes)).astype("float")
Then,
np.diag(grad_softmax_crossentropy(a, y)[:, y.flatten()])
array([ -1.00000000e+00, -1.00000000e+00, -4.28339542e-04])
But also
b = a.copy()
for i, o in zip(y.max(axis=1), range(y.shape[0])):
    b[o, i] += eps
(softmax_crossentropy(b, y) - softmax_crossentropy(a, y)) / eps
[ -1.00000000e+00 -1.00000000e+00 -4.28125536e-04]
So basically you have to perturb a single a_i going into the softmax, not the entirety of a.
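A hedged sketch of the full numerical check, perturbing one entry of a at a time (reusing a as float, y, eps, and the functions defined above):

num_grad = np.zeros_like(a)
for i in range(a.shape[0]):
    for j in range(a.shape[1]):
        a_plus = a.copy()
        a_plus[i, j] += eps  # perturb a single activation
        # only row i's loss depends on a[i, j]
        num_grad[i, j] = (softmax_crossentropy(a_plus, y)[i]
                          - softmax_crossentropy(a, y)[i]) / eps

print(np.allclose(num_grad, grad_softmax_crossentropy(a, y), atol=1e-4))  # should print True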

Python PCA - projection into lower dimensional space

I am trying to implement PCA, which worked well regarding intermediate results such as eigenvalues and eigenvectors. Yet when I try to project the (3-dimensional) data into a 2D principal-component space, the result is wrong.
I spent a lot of time comparing my code to other implementations such as:
http://sebastianraschka.com/Articles/2014_pca_step_by_step.html
Yet after a long time there has been no progress and I cannot find the mistake. Given the correct intermediate results, I assume the problem is a simple coding mistake.
Thanks in advance for anyone who actually read this question and thanks even more to those who give helpful comments/answers.
My code is as follows:
import numpy as np

class PCA():
    def __init__(self, X):
        # center the data
        X = X - X.mean(axis=0)
        # calculate covariance matrix based on X where data points are represented in rows
        C = np.cov(X, rowvar=False)
        # get eigenvectors and eigenvalues
        d, u = np.linalg.eigh(C)
        # sort both eigenvectors and eigenvalues descending regarding the eigenvalue
        # the output of np.linalg.eigh is sorted ascending, therefore both are turned
        # around to reach a descending order
        self.U = np.asarray(u).T[::-1]
        self.D = d[::-1]

    # **problem starts here**
    def project(self, X, m):
        # use the top m eigenvectors with the highest eigenvalues for the transformation matrix
        Z = np.dot(X, np.asmatrix(self.U[:m]).T)
        return Z
The result of my code is:
([[ 0.03463706, -2.65447128],
[-1.52656731, 0.20025725],
[-3.82672364, 0.88865609],
[ 2.22969475, 0.05126909],
[-1.56296316, -2.22932369],
[ 1.59059825, 0.63988429],
[ 0.62786254, -0.61449831],
[ 0.59657118, 0.51004927]])
The correct result, e.g. from sklearn's PCA, is:
([[ 0.26424835, -2.25344912],
[-1.29695602, 0.60127941],
[-3.59711235, 1.28967825],
[ 2.45930604, 0.45229125],
[-1.33335186, -1.82830153],
[ 1.82020954, 1.04090645],
[ 0.85747383, -0.21347615],
[ 0.82618248, 0.91107143]])
The input is defined as follows:
X = np.array([
[-2.133268233289599,0.903819474847349,2.217823388231679,-0.444779660856219,-0.661480010318842,-0.163814281248453,-0.608167714051449, 0.949391996219125],
[-1.273486742804804,-1.270450725314960,-2.873297536940942, 1.819616794091556,-2.617784834189455, 1.706200163080549,0.196983250752276,0.501491995499840],
[-0.935406638147949,0.298594472836292,1.520579082270122,-1.390457671168661,-1.180253547776717,-0.194988736923602,-0.645052874385757,-1.400566775105519]]).T
You need to center your data by subtracting the mean before you project it onto the new basis:
mu = X.mean(0)
C = np.cov(X - mu, rowvar=False)
d, u = np.linalg.eigh(C)
U = u.T[::-1]
Z = np.dot(X - mu, U[:2].T)
print(Z)
# [[ 0.26424835 -2.25344912]
# [-1.29695602 0.60127941]
# [-3.59711235 1.28967825]
# [ 2.45930604 0.45229125]
# [-1.33335186 -1.82830153]
# [ 1.82020954 1.04090645]
# [ 0.85747383 -0.21347615]
# [ 0.82618248 0.91107143]]
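Folding that into the original class, a minimal sketch that stores the training mean so project() can reuse it:

import numpy as np

class PCA():
    def __init__(self, X):
        # remember the mean so it can be reused at projection time
        self.mu = X.mean(axis=0)
        C = np.cov(X - self.mu, rowvar=False)
        d, u = np.linalg.eigh(C)
        # eigh returns ascending order; flip to descending
        self.U = u.T[::-1]
        self.D = d[::-1]

    def project(self, X, m):
        # center with the stored mean before projecting onto the top m eigenvectors
        return np.dot(X - self.mu, self.U[:m].T)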
