Jensen-Shannon Divergence

Jensen-Shannon Divergence - python

I have another question that I was hoping someone could help me with.
I'm using the Jensen-Shannon-Divergence to measure the similarity between two probability distributions. The similarity scores appear to be correct in the sense that they fall between 1 and 0 given that one uses the base 2 logarithm, with 0 meaning that the distributions are equal.
However, I'm not sure whether there is in fact an error somewhere and was wondering whether someone might be able to say 'yes it's correct' or 'no, you did something wrong'.
Here is the code:
from numpy import zeros, array
from math import sqrt, log
class JSD(object):
def __init__(self):
self.log2 = log(2)
def KL_divergence(self, p, q):
""" Compute KL divergence of two vectors, K(p || q)."""
return sum(p[x] * log((p[x]) / (q[x])) for x in range(len(p)) if p[x] != 0.0 or p[x] != 0)
def Jensen_Shannon_divergence(self, p, q):
""" Returns the Jensen-Shannon divergence. """
self.JSD = 0.0
weight = 0.5
average = zeros(len(p)) #Average
for x in range(len(p)):
average[x] = weight * p[x] + (1 - weight) * q[x]
self.JSD = (weight * self.KL_divergence(array(p), average)) + ((1 - weight) * self.KL_divergence(array(q), average))
return 1-(self.JSD/sqrt(2 * self.log2))
if __name__ == '__main__':
J = JSD()
p = [1.0/10, 9.0/10, 0]
q = [0, 1.0/10, 9.0/10]
print J.Jensen_Shannon_divergence(p, q)
The problem is that I feel that the scores are not high enough when comparing two text documents, for instance. However, this is purely a subjective feeling.
Any help is, as always, appreciated.

Note that the scipy entropy call below is the Kullback-Leibler divergence.
See: http://en.wikipedia.org/wiki/Jensen%E2%80%93Shannon_divergence
#!/usr/bin/env python
from scipy.stats import entropy
from numpy.linalg import norm
import numpy as np
def JSD(P, Q):
_P = P / norm(P, ord=1)
_Q = Q / norm(Q, ord=1)
_M = 0.5 * (_P + _Q)
return 0.5 * (entropy(_P, _M) + entropy(_Q, _M))
Also note that the test case in the Question looks erred?? The sum of the p distribution does not add to 1.0.
See: http://www.itl.nist.gov/div898/handbook/eda/section3/eda361.htm

Since the Jensen-Shannon distance (distance.jensenshannon) has been included in Scipy 1.2, the Jensen-Shannon divergence can be obtained as the square of the Jensen-Shannon distance:
from scipy.spatial import distance
distance.jensenshannon([1.0/10, 9.0/10, 0], [0, 1.0/10, 9.0/10]) ** 2
# 0.5306056938642212

Get some data for distributions with known divergence and compare your results against those known values.
BTW: the sum in KL_divergence may be rewritten using the zip built-in function like this:
sum(_p * log(_p / _q) for _p, _q in zip(p, q) if _p != 0)
This does away with lots of "noise" and is also much more "pythonic". The double comparison with 0.0 and 0 is not necessary.

A general version, for n probability distributions, in python
import numpy as np
from scipy.stats import entropy as H
def JSD(prob_distributions, weights, logbase=2):
# left term: entropy of misture
wprobs = weights * prob_distributions
mixture = wprobs.sum(axis=0)
entropy_of_mixture = H(mixture, base=logbase)
# right term: sum of entropies
entropies = np.array([H(P_i, base=logbase) for P_i in prob_distributions])
wentropies = weights * entropies
sum_of_entropies = wentropies.sum()
divergence = entropy_of_mixture - sum_of_entropies
return(divergence)
# From the original example with three distributions:
P_1 = np.array([1/2, 1/2, 0])
P_2 = np.array([0, 1/10, 9/10])
P_3 = np.array([1/3, 1/3, 1/3])
prob_distributions = np.array([P_1, P_2, P_3])
n = len(prob_distributions)
weights = np.empty(n)
weights.fill(1/n)
print(JSD(prob_distributions, weights))
#0.546621319446

Explicitly following the math in the Wikipedia article:
def jsdiv(P, Q):
"""Compute the Jensen-Shannon divergence between two probability distributions.
Input
-----
P, Q : array-like
Probability distributions of equal length that sum to 1
"""
def _kldiv(A, B):
return np.sum([v for v in A * np.log2(A/B) if not np.isnan(v)])
P = np.array(P)
Q = np.array(Q)
M = 0.5 * (P + Q)
return 0.5 * (_kldiv(P, M) +_kldiv(Q, M))

Related

Smooth a curve in Python while preserving the value and slope at the end points

I have two solutions to this problem actually, they are both applied below to a test case. The thing is that none of them is perfect: first one only take into account the two end points, the other one can't be made "arbitrarily smooth": there is a limit in the amount of smoothness one can achieve (the one I am showing).
I am sure there is a better solution, that kind-of go from the first solution to the other and all the way to no smoothing at all. It may already be implemented somewhere. Maybe solving a minimization problem with an arbitrary number of splines equidistributed?
Thank you very much for your help
Ps: the seed used is a challenging one
import matplotlib.pyplot as plt
from scipy import interpolate
from scipy.signal import savgol_filter
import numpy as np
import random
def scipy_bspline(cv, n=100, degree=3):
""" Calculate n samples on a bspline
cv : Array ov control vertices
n : Number of samples to return
degree: Curve degree
"""
cv = np.asarray(cv)
count = cv.shape[0]
degree = np.clip(degree,1,count-1)
kv = np.clip(np.arange(count+degree+1)-degree,0,count-degree)
# Return samples
max_param = count - (degree * (1-periodic))
spl = interpolate.BSpline(kv, cv, degree)
return spl(np.linspace(0,max_param,n))
def round_up_to_odd(f):
return np.int(np.ceil(f / 2.) * 2 + 1)
def generateRandomSignal(n=1000, seed=None):
"""
Parameters
----------
n : integer, optional
Number of points in the signal. The default is 1000.
Returns
-------
sig : numpy array
"""
np.random.seed(seed)
print("Seed was:", seed)
steps = np.random.choice(a=[-1, 0, 1], size=(n-1))
roughSig = np.concatenate([np.array([0]), steps]).cumsum(0)
sig = savgol_filter(roughSig, round_up_to_odd(n/10), 6)
return sig
# Generate a random signal to illustrate my point
n = 1000
t = np.linspace(0, 10, n)
seed = 45136. # Challenging seed
sig = generateRandomSignal(n=1000, seed=seed)
sigInit = np.copy(sig)
# Add noise to the signal
mean = 0
std = sig.max()/3.0
num_samples = n/5
idxMin = n/2-100
idxMax = idxMin + num_samples
tCut = t[idxMin+1:idxMax]
noise = np.random.normal(mean, std, size=num_samples-1) + 2*std*np.sin(2.0*np.pi*tCut/0.4)
sig[idxMin+1:idxMax] += noise
# Define filtering range enclosing the noisy area of the signal
idxMin -= 20
idxMax += 20
# Extreme filtering solution
# Spline between first and last points, the points in between have no influence
sigTrim = np.delete(sig, np.arange(idxMin,idxMax))
tTrim = np.delete(t, np.arange(idxMin,idxMax))
f = interpolate.interp1d(tTrim, sigTrim, kind='quadratic')
sigSmooth1 = f(t)
# My attempt. Not bad but not perfect because there is a limit in the maximum
# amount of smoothing we can add (degree=len(tSlice) is the maximum)
# If I could do degree=10*len(tSlice) and converging to the first solution
# I would be done!
sigSlice = sig[idxMin:idxMax]
tSlice = t[idxMin:idxMax]
cv = np.stack((tSlice, sigSlice)).T
p = scipy_bspline(cv, n=len(tSlice), degree=len(tSlice))
tSlice = p.T[0]
sigSliceSmooth = p.T[1]
sigSmooth2 = np.copy(sig)
sigSmooth2[idxMin:idxMax] = sigSliceSmooth
# Plot
plt.figure()
plt.plot(t, sig, label="Signal")
plt.plot(t, sigSmooth1, label="Solution 1")
plt.plot(t, sigSmooth2, label="Solution 2")
plt.plot(t[idxMin:idxMax], sigInit[idxMin:idxMax], label="What I'd want (kind of, smoother will be even better actually)")
plt.plot([t[idxMin],t[idxMax]], [sig[idxMin],sig[idxMax]],"o")
plt.legend()
plt.show()
sys.exit()

Yes, a minimization is a good way to approach this smoothing problem.
Least squares problem
Here is a suggestion for a least squares formulation: let s[0], ..., s[N] denote the N+1 samples of the given signal to smooth, and let L and R be the desired slopes to preserve at the left and right endpoints. Find the smoothed signal u[0], ..., u[N] as the minimizer of
min_u (1/2) sum_n (u[n] - s[n])² + (λ/2) sum_n (u[n+1] - 2 u[n] + u[n-1])²
subject to
s[0] = u[0], s[N] = u[N] (value constraints),
L = u[1] - u[0], R = u[N] - u[N-1] (slope constraints),
where in the minimization objective, the sums are over n = 1, ..., N-1 and λ is a positive parameter controlling the smoothing strength. The first term tries to keep the solution close to the original signal, and the second term penalizes u for bending to encourage a smooth solution.
The slope constraints require that
u[1] = L + u[0] = L + s[0] and u[N-1] = u[N] - R = s[N] - R. So we can consider the minimization as over only the interior samples u[2], ..., u[N-2].
Finding the minimizer
The minimizer satisfies the Euler–Lagrange equations
(u[n] - s[n]) / λ + (u[n+2] - 4 u[n+1] + 6 u[n] - 4 u[n-1] + u[n-2]) = 0
for n = 2, ..., N-2.
An easy way to find an approximate solution is by gradient descent: initialize u = np.copy(s), set u[1] = L + s[0] and u[N-1] = s[N] - R, and do 100 iterations or so of
u[2:-2] -= (0.05 / λ) * (u - s)[2:-2] + np.convolve(u, [1, -4, 6, -4, 1])[4:-4]
But with some more work, it is possible to do better than this by solving the E–L equations directly. For each n, move the known quantities to the right-hand side: s[n] and also the endpoints u[0] = s[0], u[1] = L + s[0], u[N-1] = s[N] - R, u[N] = s[N]. The you will have a linear system "A u = b", and matrix A has rows like
0, ..., 0, 1, -4, (6 + 1/λ), -4, 1, 0, ..., 0.
Finally, solve the linear system to find the smoothed signal u. You could use numpy.linalg.solve to do this if N is not too large, or if N is large, try an iterative method like conjugate gradients.

you can apply a simple smoothing method and plot the smooth curves with different smoothness values to see which one works best.
def smoothing(data, smoothness=0.5):
last = data[0]
new_data = [data[0]]
for datum in data[1:]:
new_value = smoothness * last + (1 - smoothness) * datum
new_data.append(new_value)
last = datum
return new_data
You can plot this curve for multiple values of smoothness and pick the curve which suits your needs. You can also apply this method only on a range of values in the actual curve by defining start and end

Improve speed of gradient descent

I am trying to maximize a target function f(x) with function scipy.optimize.minimum. But it usually takes 4-5 hrs to run the code because the function f(x) involves a lot of computation of complex matrix. To improve its speed, I want to use gpu. And I've already tried tensorflow package. Since I use numpy to define f(x), I have to convert it into tensorflow's format. However, it doesn't support the computation of complex matrix. What else package or means I can use? Any suggestions?
To specific my problem, I will show calculate scheme below:
Calculate the expectation :
-where H=x*H_0, x is the parameter
Let \phi go through the dynamics of Schrödinger equation
-Different H is correspond to a different \phi_end. Thus, parameter x determines the expectation
Change x, calculate the corresponding expectation
Find a specific x that minimize the expectation
Here is a simple example of part of my code:
import numpy as np
import cmath
from scipy.linalg import expm
import scipy.optimize as opt
# create initial complex matrixes
N = 2 # Dimension of matrix
H = np.array([[1.0 + 1.0j] * N] * N) # a complex matrix with shape(N, N)
A = np.array([[0.0j] * N] * N)
A[0][0] = 1.0 + 1j
# calculate the expectation
def value(phi):
exp_H = expm(H) # put the matrix in the exp function
new_phi = np.linalg.linalg.matmul(exp_H, phi)
# calculate the expectation of the matrix
x = np.linalg.linalg.matmul(H, new_phi)
expectation = np.inner(np.conj(phi), x)
return expectation
# Contants
tmax = 1
dt = 0.1
nstep = int(tmax/dt)
phi_init = [1.0 + 1.0j] * N
# 1st derivative of Schrödinger equation
def dXdt(t, phi, H): # 1st derivative of the function
return -1j * np.linalg.linalg.matmul(H, phi)
def f(X):
phi = [[0j] * N] * nstep # store every time's phi
phi[0] = phi_init
# phi go through the dynamics of Schrödinger equation
for i in range(nstep - 1):
phi[i + 1] = phi[i] - dXdt(i * dt, X[i] * H, phi[i]) * dt
# calculate the corresponding value
f_result = value(phi[-1])
return f_result
# Initialize the parameter
X0 = np.array(np.ones(nstep))
results = opt.minimize(f, X0) # minimize the target function
opt_x = results.x
PS:
Python Version: 3.7
Operation System: Win 10

Elements of a matrix inverse for ill-conditioned matrix

I am trying to find the elements of a matrix inverse for an ill-conditioned matrix
Consider the complex non-Hermitian matrix M, I know this matrix has one zero eigenvalue, and is therefor singular. However, I need to find the sum of the matrix elements: v#f(M)#u, where u and v are both vectors and f(x)=1/x (effectively the matrix inverse). I know that the zeroth eigenvalue does not contribute to this sum, so there is no explicit issue with the singularity. However, my code is very numerically unstable, I presume this is a consequence of an error in finding the eigenvalues of the system.
Starting by building the preliminary matrices:
import numpy as np
import scipy as sc
g0 = np.array([0,0,1])
g1 = np.array([0,1,0])
e0 = np.array([1,0,0])
sm = np.outer(g0, e0)
sp = np.outer(e0, g0)
def spre(op):
return np.kron(np.eye(op.shape[0]),op)
def spost(op):
return np.kron(op.T,np.eye(op.shape[0]))
def sprepost(op1,op2):
return np.kron(op1.T,op2)
sm_reg = spre(sm)
sp_reg = spre(sp)
spsm_reg=spre(sp#sm)
hil_dim = int(g0.shape[0])
cav_proj= np.eye(hil_dim).reshape(hil_dim**2,)
rho0 =(np.outer(e0,e0)).reshape(hil_dim**2,)
def ham(g):
return g * (np.outer(g1,e0) + np.outer(e0, g1))
def lind_op(A):
L = 2 * sprepost(A,A.conj().T) - spre(A.conj().T # A)
L += - spost(A.conj().T # A)
return L
def JC_lio(g, kappa, gamma):
unit = -1j * (spre(ham(g)) - spost(ham(g)))
lind = gamma * lind_op(np.outer(g0 , e0)) + kappa * lind_op(np.outer(g0 , g1))
return unit + lind
Now define a function that first finds the left and right eigenvalues, and then finds the sum of the matrix elements:
def power_int(g, kappa, gamma):
# Construct the non-Hermitian matrix of interest
lio = JC_lio(g,kappa,gamma)
#Find its left and right eigenvectors:
ev, left, right = scipy.linalg.eig(lio, left=True,right=True)
# Find the appropriate normalisation factors
norm = np.array([(left.conj().T[ii]).dot(right.conj().T[ii]) for ii in range(len(ev))])
#Find the similarity transformation for the problem
P = right
Pinv = (left/norm).conj().T
#find the projectors for the Eigenbasis
Proj = [np.outer(P.conj().T[ii],Pinv[ii]) for ii in range(len(ev))]
#Find the relevant matrix elements between the Eigenbasis and the projectors --- this is where the zero eigenvector gets removed
PowList = [(spsm_reg# Proj[ii] # rho0).dot(cav_proj) for ii in range(len(ev))]
#apply the function
Pow = 0
for ii in range(len(ev)):
if PowList[ii]==0:
Pow = Pow
else:
Pow += PowList[ii]/ev[ii]
return -np.pi * np.real(Pow)
#example run:
grange = np.linspace(0.001,10,40)
dat = np.array([power_int(g, 1, 1) for g in grange])
Running this code leads to extremely oscillatory results where I expect a smooth curve. I suspect this error is due to poor accuracy in determining the eigenvectors, but I can't seem to find any documentation on this. Any insights would be welcome.

Adjusting the mean and standard deviation of a list of random numbers?

I have to create a list of random numbers (with decimals) in the range between -3 and 3. The problem is that the list must have a mean of 0 and a standard deviation of 1. How can I adjust the mean and standard deviation parameters? Is there a function I can use?
I was already able to create a list of random numbers between -3 and 3.
import random
def lista_aleatorios(n):
lista = [0] * n
for i in range(n):
lista[i] = random.uniform(-3, 3)
return lista
print("\nHow many numbers do you want?: ")
n = int(input())
print (lista_aleatorios(n))

The function random.normalvariate(mu, sigma) allows you to specify the mean and the stdev for normally distributed random variables.

Use random.gauss, then scale:
import numpy as np
from random import gauss
def bounded_normal(n, mean, std, lower_bound, upper_bound):
# generate numbers between lower_bound and upper_bound
result = []
for i in range(n):
while True:
value = gauss(mean, std)
if lower_bound < value < upper_bound:
break
result.append(value)
# modify the mean and standard deviation
actual_mean = np.mean(result)
actual_std = np.std(result)
mean_difference = mean - actual_mean
std_difference = std / actual_std
new_result = [(element + mean_difference) * std_difference for element in result]
return new_result

Ok, here is quick way to solution (if you want to use truncated gaussian). Set boundaries and desired stddev. I assume mean is 0. Then quick-and-crude code to do binary search for distribution sigma, solving for non-linear root (brentq() should be used in production code). All formulas are taken from Wiki page on Truncated Normal. It (sigma) shall be larger than desired stddev due to the fact, that truncation removes random values which contribute to large stddev. Then we do quick sampling test - and mean and stddev are close to desired values but never exactly equal to them. Code (Python-3.7, Anaconda, Win10 x64)
import numpy as np
from scipy.special import erf
from scipy.stats import truncnorm
def alpha(a, sigma):
return a/sigma
def beta(b, sigma):
return b/sigma
def xi(x, sigma):
return x/sigma
def fi(xi):
return 1.0/np.sqrt(2.0*np.pi) * np.exp(-0.5*xi*xi)
def Fi(x):
return 0.5*(1.0 + erf(x/np.sqrt(2.0)))
def Z(al, be):
return Fi(be) - Fi(al)
def Variance(sigma, a, b):
al = alpha(a, sigma)
be = beta(b, sigma)
ZZ = Z(al, be)
return sigma*sigma*(1.0 + (al*fi(al) - be*fi(be))/ZZ - ((fi(al)-fi(be))/ZZ)**2)
def stddev(sigma, a, b):
return np.sqrt(Variance(sigma, a, b))
m = 0.0 # mean
s = 1.0 # this is what we want
a = -3.0 # left boundary
b = 3.0 # right boundary
#print(stddev(s , a, b))
#print(stddev(s + 0.1, a, b))
slo = 1.0
shi = 1.1
stdlo = stddev(slo, a, b)
stdhi = stddev(shi, a, b)
sigma = -1.0
while True: # binary search for sigma
sme = (slo + shi) / 2.0
stdme = stddev(sme, a, b)
if stdme - s == 0.0:
sigma = stdme
break
elif stdme - s < 0.0:
slo = sme
else:
shi = sme
if shi - slo < 0.0000001:
sigma = (shi + slo) / 2.0
break
print(sigma) # we got it, shall be slightly bigger than s, desired stddev
np.random.seed(73123457)
rvs = truncnorm.rvs(a, b, loc=m, scale=sigma, size=1000000) # quick sampling test
print(np.mean(rvs))
print(np.std(rvs))
For me it printed
sigma = 1.0153870105743408
mean = -0.000400729471992301
stddev = 1.0024267696681475
with different seed or sequence length you might get output like
1.0153870105743408
-0.00015923177289006116
0.9999974266369461

more accuracy with sicpy interp1d

I am trying to implement a non parametric estimation of the KL divergence shown in this paper
Here is my code:
import numpy as np
import math
import itertools
import random
from scipy.interpolate import interp1d
def log(x):
if x > 0: return math.log(x)
else: return 0
g = lambda x, inp,N : sum(0.5 + 0.5 * np.sign(x-inp))/N
def ecdf(x,N):
out = [g(i,x,N) for i in x]
fun = interp1d(x, out, kind='linear', bounds_error = False, fill_value = (0,1))
return fun
def KL_est(x,y):
ex = min(np.diff(sorted(np.unique(x))))
ey = min(np.diff(sorted(np.unique(y))))
e = min(ex,ey) * 0.9
N = len(x)
x.sort()
y.sort()
P = ecdf(x,N)
Q = ecdf(y,N)
KL = sum(log(v) for v in ((P(x)-P(x-e))/(Q(x)-Q(x-e))) ) / N
return KL
My trouble is with scipy interp1d. I am using the function returned from interp1d to find the value of new inputs. The problem is, some of the input values are very close (10^-5 apart) and the function returns the same value for both. In my code above, Q(x) - Q(x-e) leads to a divide by zero error.
Here is some test code that reproduces the problem:
x = np.random.normal(0, 1, 10)
y = np.random.normal(0, 1, 10)
ex = min(np.diff(sorted(np.unique(x))))
ey = min(np.diff(sorted(np.unique(y))))
e = min(ex,ey) * 0.9
N = len(x)
x.sort()
y.sort()
P = ecdf(x,N)
Q = ecdf(y,N)
KL = sum(log(v) for v in ((P(x)-P(x-e))/(Q(x)-Q(x-e))) ) / N
How would I go about getting a more accurate interpolation?

As e gets small you are effectively trying to compute the ratio of derivatives of P and Q numerically. As you are finding, you run out of precision really quickly in floating point doing it this way.
An alternate approach would be to use an interpolation function that can return derivatives directly. For example, you could try scipy.interpolate.InterpolatedUnivariateSpline. You were saying kind='linear' to interp1d, so the equivalent is k=1. Once you construct it, the spline has method derivatives() that gives you all the derivatives at different points. For small values of e you could switch to using the derivative.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Jensen-Shannon Divergence - python

Related

Smooth a curve in Python while preserving the value and slope at the end points

Improve speed of gradient descent

Elements of a matrix inverse for ill-conditioned matrix

Adjusting the mean and standard deviation of a list of random numbers?

more accuracy with sicpy interp1d

Categories

Resources