how to use varimax rotation in python? - python

I have a varimax rotation code from wikipedia
def varimax(Phi, gamma = 1, q = 20, tol = 1e-6):
from numpy import eye, asarray, dot, sum, diag
from numpy.linalg import svd
p,k = Phi.shape
R = eye(k)
d=0
for i in xrange(q):
d_old = d
Lambda = dot(Phi, R)
u,s,vh = svd(dot(Phi.T,asarray(Lambda)**3 - (gamma/p) * dot(Lambda, diag(diag(dot(Lambda.T,Lambda))))))
R = dot(u,vh)
d = sum(s)
if d/d_old < tol: break
return dot(Phi, R)
and I use it this way:
varimax(X) ## X is a numpy array
but it returns numbers like this: 2.4243244e-15 !! that's not my expected answer
should I change other arguments? for example gamma or q??
I'm not familiar with varimax rotation

Can you post an example of what you're using as the inputs for X and what kind of outputs you're expecting?
I tested your code by fixing up the indenting in your code, like this:
from numpy import eye, asarray, dot, sum, diag
from numpy.linalg import svd
def varimax(Phi, gamma = 1, q = 20, tol = 1e-6):
p,k = Phi.shape
R = eye(k)
d=0
for i in xrange(q):
d_old = d
Lambda = dot(Phi, R)
u,s,vh = svd(dot(Phi.T,asarray(Lambda)**3 - (gamma/p) * dot(Lambda, diag(diag(dot(Lambda.T,Lambda))))))
R = dot(u,vh)
d = sum(s)
if d/d_old < tol: break
return dot(Phi, R)
And making some dummy components to test it like this:
import numpy as np
comps = np.linalg.svd(
np.random.randn(100,10),
full_matrices=False
)[0]
rot_comps = varimax(comps)
print("Original components dimension {}".format(comps.shape))
print("Component norms")
print(np.sum(comps**2, axis=0))
print("Rotated components dimension {}".format(rot_comps.shape))
print("Rotated component norms")
print(np.sum(rot_comps**2, axis=0))
The inputs and outputs are 100 x 10 arrays with unit norm, just as you'd expect.

Related

How to compute the derivative graph of a Python plot

I am working on the following code, which solves a system of coupled differential equations. I have been able to solve them, and I plotted one of them. I am curious how to compute and plot the derivative of this graph numerically (I know the derivative is given in the first function, but suppose I didn't have that). I was thinking that I could use a for-loop, but is there a faster way?
import numpy as np
from scipy.integrate import odeint
from scipy.interpolate import interp1d
import matplotlib.pyplot as plt
import math
def hiv(x,t):
kr1 = 1e5
kr2 = 0.1
kr3 = 2e-7
kr4 = 0.5
kr5 = 5
kr6 = 100
h = x[0] # Healthy Cells -- function of time
i= x[1] #Infected Cells -- function of time
v = x[2] # Virus -- function of time
p = kr3 * h * v
dhdt = kr1 - kr2*h - p
didt = p - kr4*i
dvdt = -p -kr5*v + kr6*i
return [dhdt, didt, dvdt]
print(hiv([1e6, 0, 100], 0))
x0 = [1e6, 0, 100] #initial conditions
t = np.linspace(0,15,1000) #time in years
x = odeint(hiv, x0, t) #vector of the functions H(t), I(t), V(t)
h = x[:,0]
i = x[:,1]
v = x[:,2]
plt.semilogy(t,h)
plt.show()

more accuracy with sicpy interp1d

I am trying to implement a non parametric estimation of the KL divergence shown in this paper
Here is my code:
import numpy as np
import math
import itertools
import random
from scipy.interpolate import interp1d
def log(x):
if x > 0: return math.log(x)
else: return 0
g = lambda x, inp,N : sum(0.5 + 0.5 * np.sign(x-inp))/N
def ecdf(x,N):
out = [g(i,x,N) for i in x]
fun = interp1d(x, out, kind='linear', bounds_error = False, fill_value = (0,1))
return fun
def KL_est(x,y):
ex = min(np.diff(sorted(np.unique(x))))
ey = min(np.diff(sorted(np.unique(y))))
e = min(ex,ey) * 0.9
N = len(x)
x.sort()
y.sort()
P = ecdf(x,N)
Q = ecdf(y,N)
KL = sum(log(v) for v in ((P(x)-P(x-e))/(Q(x)-Q(x-e))) ) / N
return KL
My trouble is with scipy interp1d. I am using the function returned from interp1d to find the value of new inputs. The problem is, some of the input values are very close (10^-5 apart) and the function returns the same value for both. In my code above, Q(x) - Q(x-e) leads to a divide by zero error.
Here is some test code that reproduces the problem:
x = np.random.normal(0, 1, 10)
y = np.random.normal(0, 1, 10)
ex = min(np.diff(sorted(np.unique(x))))
ey = min(np.diff(sorted(np.unique(y))))
e = min(ex,ey) * 0.9
N = len(x)
x.sort()
y.sort()
P = ecdf(x,N)
Q = ecdf(y,N)
KL = sum(log(v) for v in ((P(x)-P(x-e))/(Q(x)-Q(x-e))) ) / N
How would I go about getting a more accurate interpolation?
As e gets small you are effectively trying to compute the ratio of derivatives of P and Q numerically. As you are finding, you run out of precision really quickly in floating point doing it this way.
An alternate approach would be to use an interpolation function that can return derivatives directly. For example, you could try scipy.interpolate.InterpolatedUnivariateSpline. You were saying kind='linear' to interp1d, so the equivalent is k=1. Once you construct it, the spline has method derivatives() that gives you all the derivatives at different points. For small values of e you could switch to using the derivative.

Eigenstates with specific eigenvalue in python

I have a very large matrix, but I only want to find the eigenvectors (more than 1) with one specific eigenvalue. How can I get this without solving the whole eigenvalues and eigenvectors of this matrix in python?
One option could be perhaps to use shift-invert method. The method eigs in scipy has an optional parameter sigma using which it is possible to specify the value close to which it should search for eigenvalues:
import numpy as np
from scipy.sparse.linalg import eigs
np.random.seed(42)
N = 10
A = np.random.random_sample((N, N))
A += A.T
A += N*np.identity(N)
#get N//2 largest eigenvalues
l,_ = eigs(A, N//2)
print(l)
#get 2 eigenvalues closest in magnitude to 12
l,_ = eigs(A, 2, sigma = 12)
print(l)
This produces:
[ 19.52479260+0.j 12.28842653+0.j 11.43948696+0.j 10.89132148+0.j
10.79397596+0.j]
[ 12.28842653+0.j 11.43948696+0.j]
EDIT:
In case you know the eigenvalues in advance, then you could try to calculate the basis of the corresponding nullspace. For example:
import numpy as np
from numpy.linalg import eig, svd, norm
from scipy.sparse.linalg import eigs
from scipy.linalg import orth
def nullspace(A, atol=1e-13, rtol=0):
A = np.atleast_2d(A)
u, s, vh = svd(A)
tol = max(atol, rtol * s[0])
nnz = (s >= tol).sum()
ns = vh[nnz:].conj().T
return ns
np.random.seed(42)
eigen_values = [1,2,3,3,4,5]
N = len(eigen_values)
D = np.matrix(np.diag(eigen_values))
#generate random unitary matrix
U = np.matrix(orth(np.random.random_sample((N, N))))
#construct test matrix - it has the same eigenvalues as D
A = U.T * D * U
#get eigenvectors corresponding to eigenvalue 3
Omega = nullspace(A - np.eye(N)*3)
_,M = Omega.shape
for i in range(0, M):
v = Omega[:,i]
print(i, norm(A*v - 3*v))

Density of multivariate t distribution in Python for large number of observations

I am trying to evaluate the density of multivariate t distribution of a 13-d vector. Using the dmvt function from the mvtnorm package in R, the result I get is
[1] 1.009831e-13
When i tried to write the function by myself in Python (thanks to the suggestions in this post:
multivariate student t-distribution with python), I realized that the gamma function was taking very high values (given the fact that I have n=7512 observations), making my function going out of range.
I tried to modify the algorithm, using the math.lgamma() and np.linalg.slogdet() functions to transform it to the log scale, but the result I got was
8.97669876e-15
This is the function that I used in python is the following:
def dmvt(x,mu,Sigma,df,d):
'''
Multivariate t-student density:
output:
the density of the given element
input:
x = parameter (d dimensional numpy array or scalar)
mu = mean (d dimensional numpy array or scalar)
Sigma = scale matrix (dxd numpy array)
df = degrees of freedom
d: dimension
'''
Num = math.lgamma( 1. *(d+df)/2 ) - math.lgamma( 1.*df/2 )
(sign, logdet) = np.linalg.slogdet(Sigma)
Denom =1/2*logdet + d/2*( np.log(pi)+np.log(df) ) + 1.*( (d+df)/2 )*np.log(1 + (1./df)*np.dot(np.dot((x - mu),np.linalg.inv(Sigma)), (x - mu)))
d = 1. * (Num - Denom)
return np.exp(d)
Any ideas why this functions does not produce the same results as the R equivalent?
Using as x = (0,0) produces similar results (up to a point, die to rounding) but with x = (1,1)1 I get a significant difference!
I finally managed to 'translate' the code from the mvtnorm package in R and the following script works without numerical underflows.
import numpy as np
import scipy.stats
import math
from math import lgamma
from numpy import matrix
from numpy import linalg
from numpy.linalg import slogdet
import scipy.special
from scipy.special import gammaln
mu = np.array([3,3])
x = np.array([1, 1])
Sigma = np.array([[1, 0], [0, 1]])
p=2
df=1
def dmvt(x, mu, Sigma, df, log):
'''
Multivariate t-student density. Returns the density
of the function at points specified by x.
input:
x = parameter (n x d numpy array)
mu = mean (d dimensional numpy array)
Sigma = scale matrix (d x d numpy array)
df = degrees of freedom
log = log scale or not
'''
p = Sigma.shape[0] # Dimensionality
dec = np.linalg.cholesky(Sigma)
R_x_m = np.linalg.solve(dec,np.matrix.transpose(x)-mu)
rss = np.power(R_x_m,2).sum(axis=0)
logretval = lgamma(1.0*(p + df)/2) - (lgamma(1.0*df/2) + np.sum(np.log(dec.diagonal())) \
+ p/2 * np.log(math.pi * df)) - 0.5 * (df + p) * math.log1p((rss/df) )
if log == False:
return(np.exp(logretval))
else:
return(logretval)
print(dmvt(x,mu,Sigma,df,True))
print(dmvt(x,mu,Sigma,df,False))

Inverse of numpy.dot

I can easily calculate something like:
R = numpy.column_stack([A,np.ones(len(A))])
M = numpy.dot(R,[k,m0])
where A is a simple array and k,m0 are known values.
I want something different. Having fixed R, M and k, I need to obtain m0.
Is there a way to calculate this by an inverse of the function numpy.dot()?
Or it is only possible by rearranging the matrices?
M = numpy.dot(R,[k,m0])
is performing matrix multiplication. M = R * x.
So to compute the inverse, you could use np.linalg.lstsq(R, M):
import numpy as np
A = np.random.random(5)
R = np.column_stack([A,np.ones(len(A))])
k = np.random.random()
m0 = np.random.random()
M = R.dot([k,m0])
(k_inferred, m0_inferred), residuals, rank, s = np.linalg.lstsq(R, M)
assert np.allclose(m0, m0_inferred)
assert np.allclose(k, k_inferred)
Note that both k and m0 are determined, given M and R (assuming len(M) >= 2).

Categories

Resources