I am trying to find the elements of a matrix inverse for an ill-conditioned matrix
Consider the complex non-Hermitian matrix M, I know this matrix has one zero eigenvalue, and is therefor singular. However, I need to find the sum of the matrix elements: v#f(M)#u, where u and v are both vectors and f(x)=1/x (effectively the matrix inverse). I know that the zeroth eigenvalue does not contribute to this sum, so there is no explicit issue with the singularity. However, my code is very numerically unstable, I presume this is a consequence of an error in finding the eigenvalues of the system.
Starting by building the preliminary matrices:
import numpy as np
import scipy as sc
g0 = np.array([0,0,1])
g1 = np.array([0,1,0])
e0 = np.array([1,0,0])
sm = np.outer(g0, e0)
sp = np.outer(e0, g0)
def spre(op):
return np.kron(np.eye(op.shape[0]),op)
def spost(op):
return np.kron(op.T,np.eye(op.shape[0]))
def sprepost(op1,op2):
return np.kron(op1.T,op2)
sm_reg = spre(sm)
sp_reg = spre(sp)
spsm_reg=spre(sp#sm)
hil_dim = int(g0.shape[0])
cav_proj= np.eye(hil_dim).reshape(hil_dim**2,)
rho0 =(np.outer(e0,e0)).reshape(hil_dim**2,)
def ham(g):
return g * (np.outer(g1,e0) + np.outer(e0, g1))
def lind_op(A):
L = 2 * sprepost(A,A.conj().T) - spre(A.conj().T # A)
L += - spost(A.conj().T # A)
return L
def JC_lio(g, kappa, gamma):
unit = -1j * (spre(ham(g)) - spost(ham(g)))
lind = gamma * lind_op(np.outer(g0 , e0)) + kappa * lind_op(np.outer(g0 , g1))
return unit + lind
Now define a function that first finds the left and right eigenvalues, and then finds the sum of the matrix elements:
def power_int(g, kappa, gamma):
# Construct the non-Hermitian matrix of interest
lio = JC_lio(g,kappa,gamma)
#Find its left and right eigenvectors:
ev, left, right = scipy.linalg.eig(lio, left=True,right=True)
# Find the appropriate normalisation factors
norm = np.array([(left.conj().T[ii]).dot(right.conj().T[ii]) for ii in range(len(ev))])
#Find the similarity transformation for the problem
P = right
Pinv = (left/norm).conj().T
#find the projectors for the Eigenbasis
Proj = [np.outer(P.conj().T[ii],Pinv[ii]) for ii in range(len(ev))]
#Find the relevant matrix elements between the Eigenbasis and the projectors --- this is where the zero eigenvector gets removed
PowList = [(spsm_reg# Proj[ii] # rho0).dot(cav_proj) for ii in range(len(ev))]
#apply the function
Pow = 0
for ii in range(len(ev)):
if PowList[ii]==0:
Pow = Pow
else:
Pow += PowList[ii]/ev[ii]
return -np.pi * np.real(Pow)
#example run:
grange = np.linspace(0.001,10,40)
dat = np.array([power_int(g, 1, 1) for g in grange])
Running this code leads to extremely oscillatory results where I expect a smooth curve. I suspect this error is due to poor accuracy in determining the eigenvectors, but I can't seem to find any documentation on this. Any insights would be welcome.
Related
I would like to numerically compute this ODE from time 0 -> T :
ODE equation where all of the sub-matrix are numerically given in a paper. Here are all of the variables :
import numpy as np
T = 1
eta = np.diag([2e-7, 2e-7])
R = [[0.33, 3.95],
[-2.52, 10.23]]
R = np.array(R)
gamma = 2e-5
GAMMA = 100
S_bar = [54.23, 27.45]
cov = [[0.47, 0.2],
[0.2, 0.14]]
cov = np.array(cov)
shape = cov.shape
Q = 0.5*np.block([[gamma*cov, R],
[np.transpose(R), np.zeros(shape)]])
Y = np.block([[np.zeros(shape), np.zeros(shape)],
[gamma*cov, R]])
U = np.block([[-linalg.inv(eta), np.zeros(shape)],
[np.zeros(shape), 2*gamma*cov]])
P_T = np.block([[-GAMMA*np.ones(shape), np.zeros(shape)],
[np.zeros(shape), np.zeros(shape)]])
Now I define de function f so that P' = f(t, P) :
n = len(P_T)
def f(t, X):
X = X.reshape([n, n])
return (Q + np.transpose(Y)#X + X#Y + X#U#X).reshape(-1)
Now my goal is to numerically solve this ODE, im trying to figure out the right function solve so that if I integrate the ODE from T to 0, then using the final value I get, I integrate back from 0 to T, the two matrices I get are actually (nearly) the same. Here is my solve function :
from scipy import integrate
def solve(interval, initial_value):
return integrate.solve_ivp(f, interval, initial_value, method="LSODA", max_step=1e-4)
Now I can test wether the computation is right :
solv = solve([T, 0], P_T.reshape(-1))
y = np.array(solv.y)
solv2 = solve([0, T], y[:, -1])
y2 = np.array(solv2.y)
# print(solv.status)
# print(solv2.status)
# this lines shows the diffenrence between the initial matrix at T and the final matrix computed at T
# the smallest is the value, the better is the computation
print(sum(sum(abs((P_T - y2[:, -1].reshape([n, n]))))))
My issue is : No matter what "solve" function im using (using different methods, different step sizes, testing all the parameters...) I always get either errors or a very bad convergence (the difference between the two matrices is too high).
Knowing that according to the paper where this ODE comes from ( (23) in https://arxiv.org/pdf/2103.13773v4.pdf) there exists a solution, how can I numerically compute it?
I have two solutions to this problem actually, they are both applied below to a test case. The thing is that none of them is perfect: first one only take into account the two end points, the other one can't be made "arbitrarily smooth": there is a limit in the amount of smoothness one can achieve (the one I am showing).
I am sure there is a better solution, that kind-of go from the first solution to the other and all the way to no smoothing at all. It may already be implemented somewhere. Maybe solving a minimization problem with an arbitrary number of splines equidistributed?
Thank you very much for your help
Ps: the seed used is a challenging one
import matplotlib.pyplot as plt
from scipy import interpolate
from scipy.signal import savgol_filter
import numpy as np
import random
def scipy_bspline(cv, n=100, degree=3):
""" Calculate n samples on a bspline
cv : Array ov control vertices
n : Number of samples to return
degree: Curve degree
"""
cv = np.asarray(cv)
count = cv.shape[0]
degree = np.clip(degree,1,count-1)
kv = np.clip(np.arange(count+degree+1)-degree,0,count-degree)
# Return samples
max_param = count - (degree * (1-periodic))
spl = interpolate.BSpline(kv, cv, degree)
return spl(np.linspace(0,max_param,n))
def round_up_to_odd(f):
return np.int(np.ceil(f / 2.) * 2 + 1)
def generateRandomSignal(n=1000, seed=None):
"""
Parameters
----------
n : integer, optional
Number of points in the signal. The default is 1000.
Returns
-------
sig : numpy array
"""
np.random.seed(seed)
print("Seed was:", seed)
steps = np.random.choice(a=[-1, 0, 1], size=(n-1))
roughSig = np.concatenate([np.array([0]), steps]).cumsum(0)
sig = savgol_filter(roughSig, round_up_to_odd(n/10), 6)
return sig
# Generate a random signal to illustrate my point
n = 1000
t = np.linspace(0, 10, n)
seed = 45136. # Challenging seed
sig = generateRandomSignal(n=1000, seed=seed)
sigInit = np.copy(sig)
# Add noise to the signal
mean = 0
std = sig.max()/3.0
num_samples = n/5
idxMin = n/2-100
idxMax = idxMin + num_samples
tCut = t[idxMin+1:idxMax]
noise = np.random.normal(mean, std, size=num_samples-1) + 2*std*np.sin(2.0*np.pi*tCut/0.4)
sig[idxMin+1:idxMax] += noise
# Define filtering range enclosing the noisy area of the signal
idxMin -= 20
idxMax += 20
# Extreme filtering solution
# Spline between first and last points, the points in between have no influence
sigTrim = np.delete(sig, np.arange(idxMin,idxMax))
tTrim = np.delete(t, np.arange(idxMin,idxMax))
f = interpolate.interp1d(tTrim, sigTrim, kind='quadratic')
sigSmooth1 = f(t)
# My attempt. Not bad but not perfect because there is a limit in the maximum
# amount of smoothing we can add (degree=len(tSlice) is the maximum)
# If I could do degree=10*len(tSlice) and converging to the first solution
# I would be done!
sigSlice = sig[idxMin:idxMax]
tSlice = t[idxMin:idxMax]
cv = np.stack((tSlice, sigSlice)).T
p = scipy_bspline(cv, n=len(tSlice), degree=len(tSlice))
tSlice = p.T[0]
sigSliceSmooth = p.T[1]
sigSmooth2 = np.copy(sig)
sigSmooth2[idxMin:idxMax] = sigSliceSmooth
# Plot
plt.figure()
plt.plot(t, sig, label="Signal")
plt.plot(t, sigSmooth1, label="Solution 1")
plt.plot(t, sigSmooth2, label="Solution 2")
plt.plot(t[idxMin:idxMax], sigInit[idxMin:idxMax], label="What I'd want (kind of, smoother will be even better actually)")
plt.plot([t[idxMin],t[idxMax]], [sig[idxMin],sig[idxMax]],"o")
plt.legend()
plt.show()
sys.exit()
Yes, a minimization is a good way to approach this smoothing problem.
Least squares problem
Here is a suggestion for a least squares formulation: let s[0], ..., s[N] denote the N+1 samples of the given signal to smooth, and let L and R be the desired slopes to preserve at the left and right endpoints. Find the smoothed signal u[0], ..., u[N] as the minimizer of
min_u (1/2) sum_n (u[n] - s[n])² + (λ/2) sum_n (u[n+1] - 2 u[n] + u[n-1])²
subject to
s[0] = u[0], s[N] = u[N] (value constraints),
L = u[1] - u[0], R = u[N] - u[N-1] (slope constraints),
where in the minimization objective, the sums are over n = 1, ..., N-1 and λ is a positive parameter controlling the smoothing strength. The first term tries to keep the solution close to the original signal, and the second term penalizes u for bending to encourage a smooth solution.
The slope constraints require that
u[1] = L + u[0] = L + s[0] and u[N-1] = u[N] - R = s[N] - R. So we can consider the minimization as over only the interior samples u[2], ..., u[N-2].
Finding the minimizer
The minimizer satisfies the Euler–Lagrange equations
(u[n] - s[n]) / λ + (u[n+2] - 4 u[n+1] + 6 u[n] - 4 u[n-1] + u[n-2]) = 0
for n = 2, ..., N-2.
An easy way to find an approximate solution is by gradient descent: initialize u = np.copy(s), set u[1] = L + s[0] and u[N-1] = s[N] - R, and do 100 iterations or so of
u[2:-2] -= (0.05 / λ) * (u - s)[2:-2] + np.convolve(u, [1, -4, 6, -4, 1])[4:-4]
But with some more work, it is possible to do better than this by solving the E–L equations directly. For each n, move the known quantities to the right-hand side: s[n] and also the endpoints u[0] = s[0], u[1] = L + s[0], u[N-1] = s[N] - R, u[N] = s[N]. The you will have a linear system "A u = b", and matrix A has rows like
0, ..., 0, 1, -4, (6 + 1/λ), -4, 1, 0, ..., 0.
Finally, solve the linear system to find the smoothed signal u. You could use numpy.linalg.solve to do this if N is not too large, or if N is large, try an iterative method like conjugate gradients.
you can apply a simple smoothing method and plot the smooth curves with different smoothness values to see which one works best.
def smoothing(data, smoothness=0.5):
last = data[0]
new_data = [data[0]]
for datum in data[1:]:
new_value = smoothness * last + (1 - smoothness) * datum
new_data.append(new_value)
last = datum
return new_data
You can plot this curve for multiple values of smoothness and pick the curve which suits your needs. You can also apply this method only on a range of values in the actual curve by defining start and end
I have coded Isomap function starting with computing the eulidean distance matrix (using scipy.spatial.distance.cdist), next basing on K-nearest neighbors method and Dijkstra algorithm (to determinate the shortest path) I have Computed the full distance matrix over all paths, finally I have did map computations, following by the dimensionality reduction.
BUT, I want to use epsilon instead of K-nearest neighbors like in the following :
Y = isomap (X, epsilon, d)
• X is an n × m matrix which corresponds to n points with m attributes.
• epsilon is an anonymous function of the distance matrix used to find the parameters of neighborhood. (The neighborhood graph must be formed by eliminating the edges whose width is greater to epsilon of the complete distance graph).
• d is a parameter which signifies the output dimension.
• Y is an n × d matrix, which signifies the embedding resulting from isomap.
THANKS in advance
import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial.distance import cdist
def distance_Matrix(X):
return cdist(X,X,'euclidean')
def Dijkstra(h):
q = h.copy()
for i in range(ndata):
for j in range(ndata):
k = np.argmin(q[i,:])
while not(np.isinf(q[i,k])):
q[i,k] = np.inf
for l in neighbours[k,:]:
possible = h[i,l] + h[l,k]
if possible < h[i,k]:
h[i,k] = possible
k = np.argmin(q[i,:])
return h
def MDS(D,newdim=2):
n = D.shape[0]
# Torgerson formula
I = np.eye(n)
J = np.ones(D.shape)
J = I-(1/n)*J
B = (-1/2)*np.dot(np.dot(J,D),np.dot(D,J)) # B = -(1/2).JD²J
#
eigenval, eigenvec = np.linalg.eig(B)
indices = np.argsort(eigenval)[::-1]
eigenval = eigenval[indices]
eigenvec = eigenvec[:, indices]
# dimension reduction
K = eigenvec[:, :newdim]
L = np.diag(eigenval[:newdim])
# result
Y = K # L **(1/2)
return np.real(Y)
def isomap(data,newdim=2,K=12):
ndata = np.shape(data)[0]
ndim = np.shape(data)[1]
d = distance_Matrix(X)
# replace begin
# K-nearest neighbours
indices = d.argsort()
#notneighbours = indices[:,K+1:]
neighbours = indices[:,:K+1]
# replace end
h = np.ones((ndata,ndata),dtype=float)*np.inf
for i in range(ndata):
h[i,neighbours[i,:]] = d[i,neighbours[i,:]]
h = Dijkstra(h)
return MDS(h,newdim)
Try sklearn.neighbors.radius_neighbors_graph for your distance matrix
I am very new to scipy and doing data analysis in python. I am trying to solve the following regularized optimization problem and unfortunately I haven't been able to make too much sense from the scipy documentation. I am looking to solve the following constrained optimization problem using scipy.optimize
Here is the function I am looking to minimize:
here A is an m X n matrix , the first term in the minimization is the residual sum of squares, the second is the matrix frobenius (L2 norm) of a sparse n X n matrix W, and the third one is an L1 norm of the same matrix W.
In the function A is an m X n matrix , the first term in the minimization is the residual sum of squares, the second term is the matrix frobenius (L2 norm) of a sparse n X n matrix W, and the third one is an L1 norm of the same matrix W.
I would like to know how to minimize this function subject to the constraints that:
wj >= 0
wj,j = 0
I would like to use coordinate descent (or any other method that scipy.optimize provides) to solve the above problem. I would like so direction on how to achieve this as I have no idea how to take the frobenius norm or how to tune the parameters beta and lambda or whether the scipy.optimize will tune and return the parameters for me. Any help regarding these questions would be much appreciated.
Thanks in advance!
How large is m and n?
Here is a basic example for how to use fmin:
from scipy import optimize
import numpy as np
m = 5
n = 3
a = np.random.rand(m, n)
idx = np.arange(n)
def func(w, beta, lam):
w = w.reshape(n, n)
w2 = np.abs(w)
w2[idx, idx] = 0
return 0.5*((a - np.dot(a, w2))**2).sum() + lam*w2.sum() + 0.5*beta*(w2**2).sum()
w = optimize.fmin(func, np.random.rand(n*n), args=(0.1, 0.2))
w = w.reshape(n, n)
w[idx, idx] = 0
w = np.abs(w)
print w
If you want to use coordinate descent, you can implement it by theano.
http://deeplearning.net/software/theano/
Your problem seems tailor-made for cvxopt - http://cvxopt.org/
and in particular
http://cvxopt.org/userguide/solvers.html#problems-with-nonlinear-objectives
using fmin would likely be slower, since it does not take advantage of gradient / Hessian information.
The code in HYRY's answer also has the drawback that as far as fmin is concerned the diagonal W is a variable and fmin would try to move the W-diagonal values around until it realizes that they don't do anything (since the objective function resets them to zero). Here is the implementation in cvxopt of HYRY's code that explicitly enforces the zero-constraints and uses gradient info, WARNING: I couldn't derive the Hessian for your objective... and you might double-check the gradient as well:
'''CVXOPT version:'''
from numpy import *
from cvxopt import matrix, mul
''' warning: CVXOPT uses column-major order (Fortran) '''
m = 5
n = 3
n_active = (n)*(n-1)
A = matrix(random.rand(m*n),(m,n))
ids = arange(n)
beta = 0.1;
lam = 0.2;
W = matrix(zeros(n*n), (n,n));
def cvx_objective_func(w=None, z=None):
if w is None:
num_nonlinear_constraints = 0;
w_0 = matrix(1, (n_active,1), 'd');
return num_nonlinear_constraints, w_0
#main call:
'calculate objective:'
'form W matrix, warning _w is column-major order (Fortran)'
'''column-major order!'''
_w = matrix(w, (n, n-1))
for k in xrange(n):
W[k, 0:k] = _w[k, 0:k]
W[k, k+1:n] = _w[k, k:n-1]
squared_error = A - A*W
objective_value = .5 * sum( mul(squared_error,squared_error)) +\
.5* beta*sum(mul(W,W)) +\
lam * sum(abs(W));
'not sure if i calculated this right...'
_Df = -A.T*(squared_error) + beta*W + lam;
'''column-major order!'''
Df = matrix(0., (1, n*(n-1)))
for jdx in arange(n):
for idx in list(arange(0,jdx)) + list(arange(jdx+1,n)):
idx = int(idx);
jdx = int(jdx)
Df[0, jdx*(n-1) + idx] = _Df[idx, jdx]
if z is None:
return objective_value, Df
'''Also form hessian of objective+non-linear constraints
(but there are no nonlinear constraints) :
This is the trickiest part...
WARNING: H is for sure coded wrong'''
H = matrix(1., (n_active, n_active))
return objective_value, Df, H
m, w_0 = cvx_objective_func()
print cvx_objective_func(w_0)
G = -matrix(diag(ones(n_active),), (n_active,n_active))
h = matrix(0., (n_active,1), 'd')
from cvxopt import solvers
print solvers.cp(cvx_objective_func, G=G, h=h)
having said that, the tricks to eliminate the equality/inequality constraints in HYRY's code are quite cute
I have another question that I was hoping someone could help me with.
I'm using the Jensen-Shannon-Divergence to measure the similarity between two probability distributions. The similarity scores appear to be correct in the sense that they fall between 1 and 0 given that one uses the base 2 logarithm, with 0 meaning that the distributions are equal.
However, I'm not sure whether there is in fact an error somewhere and was wondering whether someone might be able to say 'yes it's correct' or 'no, you did something wrong'.
Here is the code:
from numpy import zeros, array
from math import sqrt, log
class JSD(object):
def __init__(self):
self.log2 = log(2)
def KL_divergence(self, p, q):
""" Compute KL divergence of two vectors, K(p || q)."""
return sum(p[x] * log((p[x]) / (q[x])) for x in range(len(p)) if p[x] != 0.0 or p[x] != 0)
def Jensen_Shannon_divergence(self, p, q):
""" Returns the Jensen-Shannon divergence. """
self.JSD = 0.0
weight = 0.5
average = zeros(len(p)) #Average
for x in range(len(p)):
average[x] = weight * p[x] + (1 - weight) * q[x]
self.JSD = (weight * self.KL_divergence(array(p), average)) + ((1 - weight) * self.KL_divergence(array(q), average))
return 1-(self.JSD/sqrt(2 * self.log2))
if __name__ == '__main__':
J = JSD()
p = [1.0/10, 9.0/10, 0]
q = [0, 1.0/10, 9.0/10]
print J.Jensen_Shannon_divergence(p, q)
The problem is that I feel that the scores are not high enough when comparing two text documents, for instance. However, this is purely a subjective feeling.
Any help is, as always, appreciated.
Note that the scipy entropy call below is the Kullback-Leibler divergence.
See: http://en.wikipedia.org/wiki/Jensen%E2%80%93Shannon_divergence
#!/usr/bin/env python
from scipy.stats import entropy
from numpy.linalg import norm
import numpy as np
def JSD(P, Q):
_P = P / norm(P, ord=1)
_Q = Q / norm(Q, ord=1)
_M = 0.5 * (_P + _Q)
return 0.5 * (entropy(_P, _M) + entropy(_Q, _M))
Also note that the test case in the Question looks erred?? The sum of the p distribution does not add to 1.0.
See: http://www.itl.nist.gov/div898/handbook/eda/section3/eda361.htm
Since the Jensen-Shannon distance (distance.jensenshannon) has been included in Scipy 1.2, the Jensen-Shannon divergence can be obtained as the square of the Jensen-Shannon distance:
from scipy.spatial import distance
distance.jensenshannon([1.0/10, 9.0/10, 0], [0, 1.0/10, 9.0/10]) ** 2
# 0.5306056938642212
Get some data for distributions with known divergence and compare your results against those known values.
BTW: the sum in KL_divergence may be rewritten using the zip built-in function like this:
sum(_p * log(_p / _q) for _p, _q in zip(p, q) if _p != 0)
This does away with lots of "noise" and is also much more "pythonic". The double comparison with 0.0 and 0 is not necessary.
A general version, for n probability distributions, in python
import numpy as np
from scipy.stats import entropy as H
def JSD(prob_distributions, weights, logbase=2):
# left term: entropy of misture
wprobs = weights * prob_distributions
mixture = wprobs.sum(axis=0)
entropy_of_mixture = H(mixture, base=logbase)
# right term: sum of entropies
entropies = np.array([H(P_i, base=logbase) for P_i in prob_distributions])
wentropies = weights * entropies
sum_of_entropies = wentropies.sum()
divergence = entropy_of_mixture - sum_of_entropies
return(divergence)
# From the original example with three distributions:
P_1 = np.array([1/2, 1/2, 0])
P_2 = np.array([0, 1/10, 9/10])
P_3 = np.array([1/3, 1/3, 1/3])
prob_distributions = np.array([P_1, P_2, P_3])
n = len(prob_distributions)
weights = np.empty(n)
weights.fill(1/n)
print(JSD(prob_distributions, weights))
#0.546621319446
Explicitly following the math in the Wikipedia article:
def jsdiv(P, Q):
"""Compute the Jensen-Shannon divergence between two probability distributions.
Input
-----
P, Q : array-like
Probability distributions of equal length that sum to 1
"""
def _kldiv(A, B):
return np.sum([v for v in A * np.log2(A/B) if not np.isnan(v)])
P = np.array(P)
Q = np.array(Q)
M = 0.5 * (P + Q)
return 0.5 * (_kldiv(P, M) +_kldiv(Q, M))