I've been writing code to fit y = ax + b to given values of x and y, i.e. to determine a and b.
For example, for y = 2x + 1 I have
xexp =[ 0., 0.1111111, 0.2222222, 0.3333333, 0.4444444, 0.5555556, 0.6666667, 0.7777778, 0.8888889, 1.] and
yexp=[1., 1.2222222, 1.4444444, 1.6666667, 1.8888889, 2.1111111, 2.3333333, 2.5555556, 2.7777778, 3.]
I need a loop that computes every ycalc[i] and returns, as the objective function, the mean error over all the given data.
from scipy.optimize import differential_evolution
import numpy as np
from matplotlib import pyplot
from scipy.optimize import LinearConstraint
from scipy.optimize import NonlinearConstraint
#2*x+1 // a*x+b --> have xexp and yexp need to calc a and b when xcalc = xexp
errototal=[]
ycalc=[]
erro=[]
#experimental values
xexp =[ 0., 0.1111111, 0.2222222, 0.3333333, 0.4444444, 0.5555556, 0.6666667, 0.7777778, 0.8888889, 1.]
yexp=[1., 1.2222222, 1.4444444, 1.6666667, 1.8888889, 2.1111111, 2.3333333, 2.5555556, 2.7777778, 3.]
#obj function
def func(xcalc):
    return np.mean(sum(errototal))
bounds = [(-5, 5), (-5, 5)]
plot1=[]
def condition(xk, convergence):
    for i in range(0, 9):
        ycalc.append(xk[0]*xexp[i] + xk[1])
        # ycalc.append(xk[0]*xk[2]+xk[1])
        erro.append(abs(yexp[i] - ycalc[i]))
        errototal.append(erro[i])
        plot1.append(func(sum(errototal)))
    print(xk, errototal, ycalc, yexp)
    print(sum(errototal))
result = differential_evolution(func, bounds, disp=True, callback=condition)
# line plot of best objective function values
pyplot.plot(plot1, '.-')
pyplot.xlabel('Improvement Number')
pyplot.ylabel('Evaluation f(x)')
pyplot.show()
My code stops in the first step. Any tips?
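For reference, here is a minimal working sketch (my illustration, not the original code): the objective function has to compute the error directly from its argument xcalc = (a, b), rather than relying on lists filled by the callback:
import numpy as np
from scipy.optimize import differential_evolution

xexp = np.array([0., 0.1111111, 0.2222222, 0.3333333, 0.4444444,
                 0.5555556, 0.6666667, 0.7777778, 0.8888889, 1.])
yexp = np.array([1., 1.2222222, 1.4444444, 1.6666667, 1.8888889,
                 2.1111111, 2.3333333, 2.5555556, 2.7777778, 3.])

def func(xcalc):
    a, b = xcalc
    ycalc = a * xexp + b                   # model prediction for all data points
    return np.mean(np.abs(yexp - ycalc))   # mean absolute error over the data

result = differential_evolution(func, [(-5, 5), (-5, 5)])
print(result.x)  # should converge near (2, 1)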
I have a series of observations y_obs and p, taken at times t. I fit a model to the data that is an integral function of both y_obs and p in dt, $\int f(y_{\mathrm{obs}}, p)\, dt$. To evaluate the model outside the range of the observations, I create x_fit = np.linspace(min(t)-10., max(t)+10., int(1e4)). I now want to create corresponding arrays y_obs_modified and p_modified that are zero at x_fit times not present in the original t (say, where the difference between the x_fit element and t is larger than 1), but keep their original values at corresponding times (where the difference is less than 1). How do I do that?
import numpy as np
import matplotlib.pyplot as plt
import scipy.integrate as it
t=np.asarray([8418, 8422, 8424, 8426, 8428, 8430, 8540, 8542, 8650, 8654, 8656, 8658, 8660, 8662, 8664, 8666, 8668, 8670, 8672, 8674, 8768, 8770, 8772, 8774, 8776, 8778, 8780, 8782, 8784, 8786, 8788, 8790, 8792, 8794, 8883, 8884, 8886, 8888, 8890, 8890, 8892, 8894, 8896, 8898, 8904])
y_obs =np.asarray([ 0.00393986,0.00522288,0.00820794,0.01102782,0.00411525,0.00297762, 0.00463183,0.00602662,0.0114886, 0.00176694,0.01241464,0.01316199, 0.01108201, 0.01056611, 0.0107585, 0.00723887,0.0082614, 0.01239229, 0.00148118,0.00407329,0.00626722,0.01026926,0.01408419,0.02638901, 0.02284189, 0.02142943, 0.02274845, 0.01315814, 0.01155898, 0.00985705, 0.00476936,0.00130343,0.00350376,0.00463576, 0.00610933, 0.00286234, 0.00845177,0.00849791,0.0151215, 0.0151215, 0.00967625,0.00802465, 0.00291534, 0.00819779,0.00366089])
y_obs_err = np.asarray([6.12189334e-05, 6.07487598e-05, 4.66365096e-05, 4.48781264e-05, 5.55250430e-05, 6.18699105e-05, 6.35339947e-05, 6.21108524e-05, 5.55636135e-05, 7.66087180e-05, 4.34256323e-05, 3.61131000e-05, 3.30783270e-05, 2.41312040e-05, 2.85080015e-05, 2.96644612e-05, 4.58662869e-05, 5.19419065e-05, 6.00479888e-05, 6.62586953e-05, 3.64830945e-05, 2.58120956e-05, 1.83249104e-05, 1.59433858e-05, 1.33375408e-05, 1.29714326e-05, 1.26025166e-05, 1.47293107e-05, 2.17933175e-05, 2.21611713e-05, 2.42946630e-05, 3.61296843e-05, 4.23009806e-05, 7.23405476e-05, 5.59390368e-05, 4.68144974e-05, 3.44773949e-05, 2.32907036e-05, 2.23491451e-05, 2.23491451e-05, 2.92956472e-05, 3.28665479e-05, 4.41214301e-05, 4.88142073e-05, 7.19116984e-05])
p= np.asarray([ 2.82890497,3.75014266,5.89347542,7.91821558,2.95484056,2.13799544, 3.32575733,4.32724456,8.2490644, 1.26870083,8.91397925,9.45059128, 7.95712563, 7.58669608, 7.72483557,5.19766853,5.93186433,8.89793105, 1.06351782,2.92471065,4.49999613,7.37354766, 10.11275281, 18.94787684, 16.40097363, 15.38679306, 16.33387783, 9.44782842, 8.29959664,7.07757293, 3.42450524,0.93588962,2.515773,3.32857547,7.180216, 2.05522399, 6.06855409,6.1016838,10.8575614,10.8575614, 6.94775991,5.76187014, 2.09327787, 5.88619335,2.62859611])
Following the suggested approach does not lead to the desired result:
print(y_obs_modified)                      # [8418. 0. 0. ... 0. 0. 8904.]
print(y_obs_modified[y_obs_modified > 0])  # [8418. 8904.]
Use np.where and np.isin. You need to use them like this:
y_obs_modified = np.where(np.isin(x_fit, t), x_fit, 0)
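Note that np.isin matches exact values only, and it fills the matches with the x_fit values rather than the observations. A sketch of the tolerance-based variant the question describes (my assumption: nearest-neighbour matching within a distance of 1), placing the y_obs and p values instead:
x_fit = np.linspace(t.min() - 10., t.max() + 10., 10000)
dist = np.abs(x_fit[:, None] - t[None, :])   # pairwise |x_fit - t|, shape (10000, 45)
nearest = dist.argmin(axis=1)                # index of the closest observation time
y_obs_modified = np.where(dist.min(axis=1) < 1, y_obs[nearest], 0.0)
p_modified = np.where(dist.min(axis=1) < 1, p[nearest], 0.0)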
I'm wondering how the following code could be faster. At the moment it seems unreasonably slow, and I suspect I may be using the autograd API wrong. The output I expect is the Jacobian of f, with one row per element of timeline, which I do get, but it takes a long time:
import numpy as np
from autograd import jacobian
def f(params):
    mu_, log_sigma_ = params
    Z = timeline * mu_ / log_sigma_
    return Z
timeline = np.linspace(1, 100, 40000)
gradient_at_mle = jacobian(f)(np.array([1.0, 1.0]))
I would expect the following:
jacobian(f) returns a function that represents the gradient vector w.r.t. the parameters.
jacobian(f)(np.array([1.0, 1.0])) is the Jacobian evaluated at the point (1, 1). To me, this should behave like a vectorized numpy function, so it should execute very fast, even for 40k-length arrays. However, this is not what happens.
Even something like the following has the same poor performance:
import numpy as np
from autograd import jacobian
def f(params, t):
    mu_, log_sigma_ = params
    Z = t * mu_ / log_sigma_
    return Z
timeline = np.linspace(1, 100, 40000)
gradient_at_mle = jacobian(f)(np.array([1.0, 1.0]), timeline)
From https://github.com/HIPS/autograd/issues/439 I gathered that there is an undocumented function autograd.make_jvp which computes the Jacobian in fast forward mode. (Reverse mode, which jacobian uses, needs one backward pass per output element, so 40,000 passes here; forward mode needs only one pass per input, so two.)
The link states:
Given a function f, vectors x and v in the domain of f, make_jvp(f)(x)(v) computes both f(x) and the Jacobian of f evaluated at x, right multiplied by the vector v.
To get the full Jacobian of f you just need to write a loop to evaluate make_jvp(f)(x)(v) for each v in the standard basis of f's domain. Our reverse mode Jacobian operator works in the same way.
From your example:
import autograd.numpy as np
from autograd import make_jvp
def f(params):
    mu_, log_sigma_ = params
    Z = timeline * mu_ / log_sigma_
    return Z
timeline = np.linspace(1, 100, 40000)
gradient_at_mle = make_jvp(f)(np.array([1.0, 1.0]))
# loop through each basis vector
# [1, 0] evaluates (f(x), first column of the jacobian)
# [0, 1] evaluates (f(x), second column of the jacobian)
for basis in (np.array([1, 0]), np.array([0, 1])):
    val_of_f, col_of_jacobian = gradient_at_mle(basis)
    print(col_of_jacobian)
Output:
[ 1. 1.00247506 1.00495012 ... 99.99504988 99.99752494
100. ]
[ -1. -1.00247506 -1.00495012 ... -99.99504988 -99.99752494
-100. ]
This runs in ~0.005 seconds on Google Colab.
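If you need the full 40000×2 Jacobian matrix rather than its columns, the basis loop can be collected with np.column_stack (a small addition of mine, not part of the original answer):
cols = [gradient_at_mle(basis)[1] for basis in (np.array([1, 0]), np.array([0, 1]))]
full_jacobian = np.column_stack(cols)   # shape (40000, 2)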
Edit:
Functions like cdf aren't defined for the regular jvp yet, but you can use another undocumented function, make_jvp_reversemode, where it is defined. Usage is similar, except that the output is only the column, not the value of the function:
import autograd.numpy as np
from autograd.scipy.stats.norm import cdf
from autograd.differential_operators import make_jvp_reversemode
def f(params):
    mu_, log_sigma_ = params
    Z = timeline * cdf(mu_ / log_sigma_)
    return Z
timeline = np.linspace(1, 100, 40000)
gradient_at_mle = make_jvp_reversemode(f)(np.array([1.0, 1.0]))
# loop through each basis vector
# [1, 0] evaluates the first column of the jacobian
# [0, 1] evaluates the second column of the jacobian
for basis in (np.array([1, 0]), np.array([0, 1])):
    col_of_jacobian = gradient_at_mle(basis)
    print(col_of_jacobian)
Output:
[0.05399097 0.0541246 0.05425823 ... 5.39882939 5.39896302 5.39909665]
[-0.05399097 -0.0541246 -0.05425823 ... -5.39882939 -5.39896302 -5.39909665]
Note that make_jvp_reversemode will be slightly faster than make_jvp by a constant factor due to its use of caching.
I was wondering why the values of the Weibull PDF from the prebuilt function dweibull.pdf are roughly half what they should be.
I ran a test: for the same x I computed the Weibull PDF for A=10 and K=2 twice, once by writing the formula myself and once with the prebuilt dweibull function.
import numpy as np
from scipy.stats import dweibull
import matplotlib.pyplot as plt
K=2.0
A=10.0
x=np.arange(0.,20.,1)
#own function
def weib(data, a, k):
    return (k / a) * (data / a)**(k - 1) * np.exp(-(data / a)**k)
pdf1=weib(x,A,K)
print(sum(pdf1))
#prebuilt function
dist=dweibull(K,1,A)
pdf2=dist.pdf(x)
print(sum(pdf2))
f=plt.figure()
suba=f.add_subplot(121)
suba.plot(x,pdf1)
suba.set_title('pdf own function')
subb=f.add_subplot(122)
subb.plot(x,pdf2)
subb.set_title('pdf dweibull')
f.show()
It seems that with dweibull the PDF values are about half, but that must be wrong, as the sum should be around 1, not around 0.5 as it is with dweibull. With my own formula the sum is around 1.
scipy.stats.dweibull implements the double Weibull distribution. Its support is the real line. Your function weib corresponds to the PDF of scipy's weibull_min distribution.
Compare your function weib to weibull_min.pdf:
In [128]: from scipy.stats import weibull_min
In [129]: x = np.arange(0, 20, 1.0)
In [130]: K = 2.0
In [131]: A = 10.0
Your implementation:
In [132]: weib(x, A, K)
Out[132]:
array([ 0. , 0.019801 , 0.03843158, 0.05483587, 0.0681715 ,
0.07788008, 0.08372116, 0.0857677 , 0.08436679, 0.08007445,
0.07357589, 0.0656034 , 0.05686266, 0.04797508, 0.03944036,
0.03161977, 0.02473752, 0.01889591, 0.014099 , 0.0102797 ])
scipy.stats.weibull_min.pdf:
In [133]: weibull_min.pdf(x, K, scale=A)
Out[133]:
array([ 0. , 0.019801 , 0.03843158, 0.05483587, 0.0681715 ,
0.07788008, 0.08372116, 0.0857677 , 0.08436679, 0.08007445,
0.07357589, 0.0656034 , 0.05686266, 0.04797508, 0.03944036,
0.03161977, 0.02473752, 0.01889591, 0.014099 , 0.0102797 ])
By the way, there is a mistake in this line of your code:
dist=dweibull(K,1,A)
The order of the parameters is shape, location, scale, so you are setting the location parameter to 1. That's why the values in your second plot are shifted by one. That line should have been
dist = dweibull(K, 0, A)
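This also explains the factor of two in the question: the double Weibull puts half of its probability mass on the negative axis, so for x >= 0 its PDF is exactly half of weibull_min's. A quick check (my addition, not part of the original answer):
import numpy as np
from scipy.stats import dweibull, weibull_min

x = np.arange(0, 20, 1.0)
K, A = 2.0, 10.0
# dweibull.pdf equals half of weibull_min.pdf on the non-negative axis
print(np.allclose(dweibull.pdf(x, K, scale=A),
                  0.5 * weibull_min.pdf(x, K, scale=A)))   # expected: True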
I would like to calculate the p-value of the fit I got from numpy.linalg.lstsq. Here is a toy example:
import numpy as np
x = np.array([[ 58295.62187335],[ 45420.95483714],[ 3398.64920064],[ 977.22166306],[ 5515.32801851],[ 14184.57621022],[ 16027.2803392 ],[ 15313.01865824],[ 6443.2448182 ]])
y = np.array([ 143547.79123381, 22996.69597427, 2591.56411049, 661.93115277, 8826.96549102, 17735.13549851, 11629.13003263, 14438.33177173, 6997.89334741])
a, res, rank, s = np.linalg.lstsq(x, y)
From a previous question (get the R^2 value from scipy.linalg.lstsq) I know how to get R², but I would also like to compute the p-value.
Thanks in advance.
You could use scipy.stats:
import numpy as np
from scipy.stats import pearsonr
x = np.array([ 58295.62187335, 45420.95483714, 3398.64920064, 977.22166306, 5515.32801851, 14184.57621022, 16027.2803392 , 15313.01865824, 6443.2448182 ])
y = np.array([ 143547.79123381, 22996.69597427, 2591.56411049, 661.93115277, 8826.96549102, 17735.13549851, 11629.13003263, 14438.33177173, 6997.89334741])
r, p = pearsonr(x,y)
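Alternatively (a sketch, not part of the original answer), scipy.stats.linregress fits the line and reports the p-value for the slope in one call:
from scipy.stats import linregress
slope, intercept, r_value, p_value, std_err = linregress(x, y)
print(r_value**2, p_value)   # R² and the two-sided p-value for a zero-slope null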
I am solving an ODE as follows:
import numpy as np
import scipy as sp
import math
from scipy.integrate import odeint
import matplotlib.pyplot as plt
def g(y, x):
    y0 = y[0]
    return x #formula##
# Initial conditions on y, y' at x=0
init = 0 #value##
# First integrate from 0 to 100
times = np.linspace(4, 8, 4) #linspacefunction
xplotval = odeint(g, init, times)  # ODE solution, one row per time point
print(xplotval)
I am getting output as:
[[ 7. ]
[ 5.76455273 ]
[ 5.41898906 ]
[ 6.49185668 ]]
I'd like to output a single dimensional array as follows:
[7., 5.76455273, 5.41898906, 6.49185668]
How can I do that?
Maybe you want flatten:
print(xplotval.flatten())
Unless you actually want the transposed vector, which you would get with numpy.transpose:
print(np.transpose(xplotval))
You can simply use a list comprehension, something like:
oneD = [l[0] for l in xplotval]
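Equivalently, slicing the first column keeps the result as a NumPy array (a minor addition of mine):
oneD = xplotval[:, 0]   # first (and only) column, shape (4,)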