Euclidean distance with weights - python

I am currently using SciPy to calculate the Euclidean distance:
dis = scipy.spatial.distance.euclidean(A, B)
where A and B are 5-dimensional bit vectors. It works fine now, but if I add weights for each dimension, is it still possible to use SciPy?
What I have now: sqrt((a1-b1)^2 + (a2-b2)^2 + ... + (a5-b5)^2)
What I want: sqrt(w1(a1-b1)^2 + w2(a2-b2)^2 + ... + w5(a5-b5)^2), using SciPy or NumPy or any other efficient way to do this.
Thanks

The suggestion (below) of writing your own weighted L2 norm is a good one, but the calculation provided in that answer is incorrect: it weights each term by w^2 rather than w. If the intention is to calculate
sqrt(w1(a1-b1)^2 + w2(a2-b2)^2 + ... + wn(an-bn)^2)
then this should do the job:
def weightedL2(a, b, w):
    q = a - b
    return np.sqrt((w * q * q).sum())

Simply define it yourself. Something like this should do the trick:
import numpy as np

def mynorm(A, B, w):
    q = np.asarray(w * (A - B))
    return np.sqrt((q * q).sum())

If you want to keep using the SciPy function, you can pre-process the vectors like this:
def weighted_euclidean(a, b, w):
    A = a * np.sqrt(w)
    B = b * np.sqrt(w)
    return scipy.spatial.distance.euclidean(A, B)
However, it looks slower than:
def weightedL2(a, b, w):
    q = a - b
    return np.sqrt((w * q * q).sum())
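If you want to check the speed claim on your own data, a rough timeit comparison might look like this; it assumes the two functions above are already defined in the session, and the array size here is just illustrative:
import timeit
import numpy as np

a, b, w = np.random.rand(3, 5)

# the sqrt-preprocessing version calls into SciPy and scales both vectors
print(timeit.timeit(lambda: weighted_euclidean(a, b, w), number=100000))
# the direct NumPy version does a single weighted sum
print(timeit.timeit(lambda: weightedL2(a, b, w), number=100000))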

The current version of SciPy (v1.9.3 as of writing) supports a weighted Euclidean distance directly. From the scipy.spatial.distance.euclidean documentation:
w : (N,) array_like, optional
The weights for each value in u and v. Default is None, which gives each value a weight of 1.0.
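So with a recent SciPy, the weighted distance from the question can be computed directly; a quick sanity check against the hand-rolled version (the values here are made up):
import numpy as np
from scipy.spatial.distance import euclidean

a = np.array([1, 0, 1, 0, 1])
b = np.array([0, 1, 1, 0, 0])
w = np.array([0.5, 1.0, 2.0, 1.0, 0.5])

# SciPy applies the weights inside the sum: sqrt(sum(w * (a - b)**2))
d_scipy = euclidean(a, b, w=w)
d_manual = np.sqrt((w * (a - b)**2).sum())
print(np.isclose(d_scipy, d_manual))  # True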

I have problems with this scipy.optimize.minimize task: Newton-CG vs BFGS vs L-BFGS

Consider the domain D = [-5, 10] x [0, 15]. The task below must be performed on this domain, in a Jupyter notebook.
Draw N = 100 random points uniformly distributed over D. For each point, run a local minimization of f using scipy.optimize.minimize with the following methods: CG, BFGS, Newton-CG, L-BFGS-B. For this task, you will have to write two other functions, one that returns the Jacobian of f and one that returns the Hessian of f. Store the answers in an array of shape N x 6, each row of which holds:
(x1, y1, x2, y2, v, c),
where (x1, y1) and (x2, y2) are respectively the starting and final points of the optimization, v is the final value of f, and the last element c is a code for the method used, according to this correspondence: CG: 1, BFGS: 2, Newton-CG: 3, L-BFGS-B: 4.
Here is the code I have:
import numpy as np
from scipy import optimize
from scipy.optimize import minimize
# from numdifftools import Jacobian, Hessian

# method
mmm = 'CG'
#mmm = 'BFGS'
#mmm = 'Newton-CG'
#mmm = 'L-BFGS-B'

# Branin function constants
a = 1
b = 5.1/(4*np.pi**2)
c = 5/np.pi
r = 6
s = 10
t = 1/(8*np.pi)

# domain D = [-5, 10] x [0, 15]
a1 = -5
b1 = 10
c1 = 0
d1 = 15

def f_brain(xx):
    return a*(xx[1] - b*xx[0]**2 + c*xx[0] - r)**2 + s*(1 - t)*np.cos(xx[0]) + s

def fun_der(xx):
    # gradient of f_brain
    aa1 = 2*a*(xx[1] - b*xx[0]**2 + c*xx[0] - r)*(c - 2*b*xx[0]) - s*(1 - t)*np.sin(xx[0])
    aa2 = 2*a*(xx[1] - b*xx[0]**2 + c*xx[0] - r)
    return np.array([aa1, aa2])

def fun_hess(xx):
    # Hessian of f_brain (note c*xx[0], not c*xx[1], and the factor -4*a*b)
    b11 = 2*a*(c - 2*b*xx[0])**2 - 4*a*b*(xx[1] - b*xx[0]**2 + c*xx[0] - r) - s*(1 - t)*np.cos(xx[0])
    b12 = 2*a*(c - 2*b*xx[0])
    b22 = 2*a
    return np.array([[b11, b12], [b12, b22]])

x0 = np.array([a1, c1])
x_bound = (a1, b1)
y_bound = (c1, d1)
# jac and hess must be passed as callables, not evaluated arrays;
# of these methods only Newton-CG uses the Hessian, and only L-BFGS-B takes bounds
result = optimize.minimize(f_brain, x0=x0, method=mmm,
                           options={'maxiter': 100}, jac=fun_der,
                           hess=fun_hess if mmm == 'Newton-CG' else None,
                           bounds=(x_bound, y_bound) if mmm == 'L-BFGS-B' else None)
print(result)
print("Method is " + mmm + " and the minimum occurs at:", result['x'])
x00 = result['x']
print("Jacobian", fun_der(x00))
print("Hessian", fun_hess(x00))

Using scipy.linalg.solve with symmetric coefficient matrix (assume_a='sym')

I am trying to solve a system of linear equations A * x = b for the unknown x using scipy's linalg.solve function. Here is an example that works fine:
import numpy as np
import scipy.linalg as linalg
A = np.array([[ 0.18666667, 0.06222222, -0.01777778],
[ 0.01777778, 0.18666667, 0.01777778],
[-0.01777778, 0.06222222, 0.18666667]])
b = np.array([0.26666667, -0.26666667, -0.4])
x = linalg.solve(A, b, assume_a='gen')
It results in x = [1.77194417, -1.4555256, -1.48892533], which is a correct solution. This can be verified by computing A.dot(x), which results in [0.26666667, -0.26666667, -0.4]. As this is the same as b, the solution is correct.
However, the matrix of coefficients A is symmetric, i.e., the values above and below the main diagonal are the same. If I understand the documentation correctly, the solve function allows setting the argument assume_a='sym' to solve such a problem more efficiently. Unfortunately, using the following code (with the same A and b) results in an incorrect solution being found:
x = linalg.solve(A, b, assume_a='sym')
It results in x = [1.88811181, -1.88811181, -1.78321672], which is different from the solution above. Computing A.dot(x) results in [0.26666667, -0.35058274, -0.48391607]. As this is different from b, the solution seems to be incorrect.
I am wondering if there is a problem with my code, or if my understanding of symmetric matrices or the expected result is simply wrong. Maybe the matrix must satisfy additional constraints to be used with assume_a='sym'?
I appreciate your answers. Thanks in advance!
I don't think there is a problem with the solver. Here is a short explanation.
Non-symmetric A
import numpy as np
import scipy.linalg as linalg
A = np.array([[ 0.18666667, 0.06222222, -0.01777778],
[ 0.01777778, 0.18666667, 0.01777778],
[-0.01777778, 0.06222222, 0.18666667]])
b = np.array([0.26666667, -0.26666667, -0.4])
x = linalg.solve(A, b, assume_a='gen')
np.allclose(A @ x, b)
Out:
True
Which shows the solver works well.
Symmetric A
# use your upper-triangular A to get a symmetric matrix
A_symm = np.triu(A) + np.triu(A).T - np.diag(A.diagonal())
# solve the equations
x = linalg.solve(A_symm, b, assume_a='sym')
np.allclose(A_symm @ x, b)
Out:
True
It still works.
If you pass a non-symmetric matrix A to the solver and specify assume_a='sym', the solver will only use the upper triangle of A, see below:
x = linalg.solve(A, b, assume_a='sym')
np.allclose(A @ x, b), x
Out:
(False, array([ 1.88811181, -1.88811181, -1.78321672]))
The result shows that the solver appears to work "wrong", but x is in fact the same as the result of linalg.solve(A_symm, b, assume_a='sym').
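In other words, assume_a='sym' on a non-symmetric A behaves as if you had symmetrized A from its upper triangle first; a quick check, reusing A, A_symm and b from above:
x_sym = linalg.solve(A, b, assume_a='sym')       # silently uses triu(A)
x_gen = linalg.solve(A_symm, b, assume_a='gen')  # general solve on symmetrized A
print(np.allclose(x_sym, x_gen))  # True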

python: Initial condition in solving differential equation

I want to solve this differential equation:
y'' + 2y' + 2y = cos(2x)
with three different sets of conditions:
1. y(1) = 2, y'(2) = 0.5
2. y'(1) = 1, y'(2) = 0.8
3. y(1) = 0, y(2) = 1
and my code so far is:
import numpy as np
from scipy.integrate import odeint
import matplotlib.pyplot as plt

def dU_dx(U, x):
    # U = [y, y']; returns [y', y''] using y'' = -2y' - 2y + cos(2x)
    return [U[1], -2*U[1] - 2*U[0] + np.cos(2*x)]

U0 = [1, 0]
xs = np.linspace(0, 10, 200)
Us = odeint(dU_dx, U0, xs)
ys = Us[:, 0]
plt.xlabel("x")
plt.ylabel("y")
plt.title("Damped harmonic oscillator")
plt.plot(xs, ys)
How can I impose these conditions?
Your conditions are not initial conditions, as they give values at two different points; they are all boundary conditions.
def bc1(u1, u2): return [u1[0] - 2.0, u2[1] - 0.5]
def bc2(u1, u2): return [u1[1] - 1.0, u2[1] - 0.8]
def bc3(u1, u2): return [u1[0] - 0.0, u2[0] - 1.0]
You need a BVP solver to solve these boundary value problems.
You can either build your own solver using the shooting method, in case 1 as
from scipy.optimize import fsolve

def shoot(b): return odeint(dU_dx, [2, b], [1, 2])[-1, 1] - 0.5

b = fsolve(shoot, 0)[0]   # fsolve returns an array; take the scalar entry
T = np.linspace(1, 2, 101)
U = odeint(dU_dx, [2, b], T)
or use the secant method instead of scipy.optimize.fsolve; as the problem is linear, this should converge in 1, at most 2, steps.
Or you can use the scipy.integrate.solve_bvp solver (which is perhaps newer than the question). Your task is similar to the documented examples. Note that the argument order in its ODE function is (x, u), switched relative to odeint; even in odeint you can get this order with the option tfirst=True.
def dudx(x, u): return [u[1], np.cos(2*x) - 2*(u[1] + u[0])]
Solutions generated with solve_bvp; the nodes are the automatically generated subdivision of the integration interval, and their density indicates how "non-flat" the ODE is in that region.
from scipy.integrate import solve_bvp

xplot = np.linspace(1, 2, 161)
for k, bc in enumerate([bc1, bc2, bc3]):
    res = solve_bvp(dudx, bc, [1.0, 2.0], [[0, 0], [0, 0]], tol=1e-5)
    print(res.message)
    l, = plt.plot(res.x, res.y[0], 'x')
    c = l.get_color()
    plt.plot(xplot, res.sol(xplot)[0], c=c, label="%d." % (k+1))
Solutions generated using the shooting method, taking the initial values at x=0 as unknown parameters and then computing the solution trajectories on the interval [0, 3]:
x = np.linspace(0, 3, 301)
for k, bc in enumerate([bc1, bc2, bc3]):
    def shoot(u0):
        u = odeint(dudx, u0, [0, 1, 2], tfirst=True)
        return bc(u[1], u[2])
    u0 = fsolve(shoot, [0, 0])
    u = odeint(dudx, u0, x, tfirst=True)
    l, = plt.plot(x, u[:, 0], label="%d." % (k+1))
    c = l.get_color()
    plt.plot(x[::100], u[::100, 0], 'x', c=c)
You can also use the scipy.integrate.ode class. It is similar to scipy.integrate.odeint but accepts a jac parameter, the Jacobian of the right-hand side with respect to the state (here, dF/dU).
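A minimal sketch of that approach for this equation; the integrator choice ('vode' with BDF) and the step size are just illustrative, and since the right-hand side is linear in U, the Jacobian is constant:
import numpy as np
from scipy.integrate import ode

def rhs(x, U):
    # U = [y, y']
    return [U[1], np.cos(2*x) - 2*U[1] - 2*U[0]]

def jac(x, U):
    # d(rhs)/dU, constant for this linear ODE
    return [[0.0, 1.0], [-2.0, -2.0]]

solver = ode(rhs, jac).set_integrator('vode', method='bdf')
solver.set_initial_value([2.0, 0.0], 1.0)  # an IVP demo; the BCs above still need shooting
while solver.successful() and solver.t < 2.0:
    solver.integrate(solver.t + 0.1)
    print(solver.t, solver.y[0])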

Is there a Python equivalent to the mahalanobis() function in R? If not, how can I implement it?

I have the following code in R that calculates the mahalanobis distance on the Iris dataset and returns a numeric vector with 150 values, one for every observation in the dataset.
x=read.csv("Iris Data.csv")
mean<-colMeans(x)
Sx<-cov(x)
D2<-mahalanobis(x,mean,Sx)
I tried to implement the same in Python using the scipy.spatial.distance.mahalanobis(u, v, VI) function, but it seems this function takes only one-dimensional arrays as parameters.
I used the Iris dataset from R; I suppose it is the same one you are using.
First, this is my R benchmark, for comparison:
x <- read.csv("IrisData.csv")
x <- x[,c(2,3,4,5)]
mean<-colMeans(x)
Sx<-cov(x)
D2<-mahalanobis(x,mean,Sx)
Then, in Python you can use:
from scipy.spatial.distance import mahalanobis
import scipy as sp
import pandas as pd

x = pd.read_csv('IrisData.csv')
x = x.iloc[:, 1:]          # .ix is deprecated; use .iloc
Sx = x.cov().values
Sx = sp.linalg.inv(Sx)
mean = x.mean().values

def mahalanobisR(X, meanCol, IC):
    m = []
    for i in range(X.shape[0]):
        m.append(mahalanobis(X.iloc[i, :], meanCol, IC) ** 2)  # squared, to match R
    return m

mR = mahalanobisR(x, mean, Sx)
I defined it as a function so you can reuse it on other datasets (note that I use pandas DataFrames as inputs).
Comparing results:
In R
> D2[c(1,2,3,4,5)]
[1] 2.134468 2.849119 2.081339 2.452382 2.462155
In Python:
In [43]: mR[0:5]
Out[45]:
[2.1344679233248431,
2.8491186861585733,
2.0813386639577991,
2.4523816316796712,
2.4621545347140477]
Just be careful that what you get in R is the squared Mahalanobis distance.
A simpler solution would be:
import numpy as np
from scipy.spatial.distance import cdist

x = ...
mean = x.mean(axis=0).reshape(1, -1)  # make sure it is 2-D
vi = np.linalg.inv(np.cov(x.T))
cdist(mean, x, 'mahalanobis', VI=vi)
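For completeness, a self-contained sketch of that approach with synthetic stand-in data (the Iris CSV is not included here); square the result if you want to match R's mahalanobis():
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
x = rng.normal(size=(150, 4))          # stand-in for the 150 x 4 Iris matrix

mean = x.mean(axis=0).reshape(1, -1)   # cdist needs 2-D inputs
vi = np.linalg.inv(np.cov(x.T))
D2 = cdist(mean, x, 'mahalanobis', VI=vi)[0] ** 2  # squared, as R returns
print(D2[:5])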

How to minimize a multivariable function using scipy

So I have the function
f(x) = I_0 * (exp(Q*x / (n*K*T)) - 1)
where Q, K and T are constants; for the sake of clarity I'll add the values:
Q = 1.6e-19
K = 1.38e-23
T = 77.6
and n and I_0 are the two parameters that I'm trying to fit.
My xdata and ydata each contain 25 experimental data points. So far, this is my code:
from __future__ import division
import scipy.optimize as optimize
import numpy
xdata = numpy.array([1.07,1.07994,1.08752,1.09355,
1.09929,1.10536,1.10819,1.11321,
1.11692,1.12099,1.12435,1.12814,
1.13181,1.13594,1.1382,1.14147,
1.14443,1.14752,1.15023,1.15231,
1.15514,1.15763,1.15985,1.16291,1.16482])
ydata = [0.00205,
0.004136,0.006252,0.008252,0.010401,
0.012907,0.014162,0.016498,0.018328,
0.020426,0.022234,0.024363,0.026509,
0.029024,0.030457,0.032593,0.034576,
0.036725,0.038703,0.040223,0.042352,
0.044289,0.046043,0.048549,0.050146]
# experimental data: xdata is voltage, ydata is current
def f(x, I0, N):
    # I0 = 7.85E-07
    # N = 3.185413895
    Q = 1.66E-19
    K = 1.38065E-23
    T = 77.3692
    return I0 * (numpy.exp((Q * x) / (N * K * T)) - 1)

result = optimize.curve_fit(f, xdata, ydata)  # trying to fit I0 and N
But the answer doesn't give suitably optimized parameters.
Any help would be hugely appreciated; I realize there may be something obvious I am missing, I just can't see what it is!
I have tried this, and for some reason if you throw out those constants, so the function becomes
def f(x, I0, N):
    return I0 * (numpy.exp(x / N) - 1)
you get something reasonable:
1.86901114e-13, 4.41838309e-02
It's true that it works better when you get rid of the constants. Define the function as:
def f(x, A, B):
    return A * (np.exp(B * x) - 1)
and fit it with curve_fit; you'll get A, which is exactly I0 (A = I0), and B, from which you can obtain N simply as N = Q / (B*K*T). I managed to get a pretty good fit.
I think that with too many hard-coded constants the problem becomes badly scaled and the algorithm gets confused.
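A sketch of that recipe using the xdata and ydata from the question; the starting guess p0 is illustrative, and N is recovered from B afterwards:
import numpy as np
from scipy.optimize import curve_fit

def g(x, A, B):
    return A * (np.exp(B * x) - 1)

# p0 is a rough, illustrative starting guess for (A, B)
popt, pcov = curve_fit(g, xdata, ydata, p0=(1e-12, 20))
A, B = popt

Q = 1.66E-19
K = 1.38065E-23
T = 77.3692
I0 = A                 # A = I0
N = Q / (B * K * T)    # from B = Q / (N*K*T)
print(I0, N)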
