Linear regression with leastsq() and global minimum not found - python

In Python, scipy.optimize.leastsq() is normally used for non-linear regression. However, leastsq() should in principle be expected to work with linear fitting functions as well. Below is what appears to be a simple linear regression problem that leastsq() apparently fails to solve properly. The data are fitted with the line y = mx.
The code sample is at the bottom of the post. When plot_real_data = False, 100 points of linearly correlated data are generated randomly, and leastsq() effectively finds the minimum of the sum-squared error function:
Graph of correct solution
However, when plot_real_data = True, 100 data points are taken from a real data set, and leastsq() cannot, for some unknown reason, find the minimum of the sum-squared error function:
Graph of incorrect solution
leastsq() consistently reports an optimal gradient parameter m=1.082, regardless of the initial guess of the gradient. However m=1.082 is not the global minimum. The proper value is closer to m=1.25:
print sum(errorfunc([1.0], x, y))
3.9511006207
print sum(errorfunc([1.08], x, y))
3.59052114948
print sum(errorfunc([1.25], x, y))
3.37109033259 (near the minimum)
print sum(errorfunc([1.4], x, y))
3.79503789072
This is puzzling behaviour. In this case, the sum squared error function is a simple quadratic and there is no risk of local minima.
I know that direct methods exist for linear regression, but any ideas on this issue with leastsq()?
Python 2.7.11 :: Anaconda 4.0.0 (64-bit)
Scipy version 0.17.0
CODE:
from __future__ import division
import matplotlib.pyplot as plt
import numpy
import random
from scipy.optimize import leastsq
def errorfunc(params, x_data, y_data) :
    """
    Return error at each x point, to a straight line of gradient m
    This 1-parameter error function has a clearly defined minimum
    """
    squared_errors = []
    for i, lm in enumerate(x_data) :
        predicted_um = lm * params[0]
        squared_errors.append((y_data[i] - predicted_um)**2)
    return squared_errors
plt.figure()
###################################################################
# STEP 1: make a scatter plot of the data
plot_real_data = True
###################################################################
# 100 points of real data (always defined; overridden below when plot_real_data is False)
x = [0.85772, 0.17135, 0.03401, 0.17227, 0.17595, 0.1742, 0.22454, 0.32792, 0.19036, 0.17109, 0.16936, 0.17357, 0.6841, 0.24588, 0.22913, 0.28291, 0.19845, 0.3324, 0.66254, 0.1766, 0.47927, 0.47999, 0.50301, 0.16035, 0.65964, 0.0, 0.14308, 0.11648, 0.10936, 0.1983, 0.13352, 0.12471, 0.29475, 0.25212, 0.08334, 0.07697, 0.82263, 0.28078, 0.24192, 0.25383, 0.26707, 0.26457, 0.0, 0.24843, 0.26504, 0.24486, 0.0, 0.23914, 0.76646, 0.66567, 0.62966, 0.61771, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.79157, 0.06889, 0.07669, 0.1372, 0.11681, 0.11103, 0.13577, 0.07543, 0.10636, 0.09176, 0.10941, 0.08327, 1.19903, 0.20987, 0.21103, 0.21354, 0.26011, 0.28862, 0.28441, 0.2424, 0.29196, 0.20248, 0.1887, 0.20045, 1.2041, 0.20687, 0.22448, 0.23296, 0.25434, 0.25832, 0.25722, 0.24378, 0.24035, 0.17912, 0.18058, 0.13556, 0.97535, 0.25504, 0.20418, 0.22241]
y = [1.13085, 0.19213, 0.01827, 0.20984, 0.21898, 0.12174, 0.38204, 0.31002, 0.26701, 0.2759, 0.26018, 0.24712, 1.18352, 0.29847, 0.30622, 0.5195, 0.30406, 0.30653, 1.13126, 0.24761, 0.81852, 0.79863, 0.89171, 0.19251, 1.33257, 0.0, 0.19127, 0.13966, 0.15877, 0.19266, 0.12997, 0.13133, 0.25609, 0.43468, 0.09598, 0.08923, 1.49033, 0.27278, 0.3515, 0.38368, 0.35134, 0.37048, 0.0, 0.3566, 0.36296, 0.35054, 0.0, 0.32712, 1.23759, 1.02589, 1.02413, 0.9863, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.19224, 0.12192, 0.12815, 0.2672, 0.21856, 0.14736, 0.20143, 0.1452, 0.15965, 0.14342, 0.15828, 0.12247, 0.5728, 0.10603, 0.08939, 0.09194, 0.1145, 0.10313, 0.13377, 0.09734, 0.12124, 0.11429, 0.09536, 0.11457, 0.76803, 0.10173, 0.10005, 0.10541, 0.13734, 0.12192, 0.12619, 0.11325, 0.1092, 0.11844, 0.11373, 0.07865, 1.28568, 0.25871, 0.22843, 0.26608]
if not plot_real_data :
    # 100 points of test data with noise added (overrides the real data above)
    x_clean = numpy.linspace(0,1.2,100)
    y_clean = [ i * 1.38 for i in x_clean ]
    x = [ i + random.uniform(-1 * random.uniform(0, 0.1), random.uniform(0, 0.1)) for i in x_clean ]
    y = [ i + random.uniform(-1 * random.uniform(0, 0.5), random.uniform(0, 0.5)) for i in y_clean ]
plt.subplot(2,1,1)
plt.scatter(x,y); plt.xlabel('x'); plt.ylabel('y')
# STEP 2: vary gradient m of a y = mx fitting line
# plot sum squared error with respect to gradient m
# here you can see by eye, the optimal gradient of the fitting line
plt.subplot(2,1,2)
try_m = numpy.linspace(0.1,4,200)
sse = [ sum(errorfunc([m], x, y)) for m in try_m ]
plt.plot(try_m,sse); plt.xlabel('line gradient, m'); plt.ylabel('sum-squared error')
# STEP 3: use leastsq() to find optimal gradient m
params = [2] # start with initial guess of 2 for gradient
params_fitted, cov, infodict, mesg, ier = leastsq(errorfunc, params[:], args=(x, y), full_output=1)
optimal_m = params_fitted[0]
print optimal_m
# optimal gradient m should be the minimum of the error function
plt.subplot(2,1,2)
plt.plot([optimal_m,optimal_m],[0,100], 'r')
# optimal gradient m should give best fit straight line
plt.subplot(2,1,1)
plt.plot([0, 1.2],[0, 1.2 * optimal_m],'r')
plt.show()
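For reference, the "direct method" for this particular one-parameter model has a one-line closed form: the slope of the through-origin line y = mx that minimises the sum-squared error is m = sum(x*y) / sum(x*x). A minimal sketch (an editorial addition, not part of the original post, reusing the x and y lists defined in the script above):
import numpy
# closed-form least-squares slope for y = m*x, i.e. the minimiser of sum((y - m*x)**2)
x_arr = numpy.asarray(x, dtype=float)
y_arr = numpy.asarray(y, dtype=float)
m_hat = x_arr.dot(y_arr) / x_arr.dot(x_arr)
print(m_hat)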


Scipy method SLSQP with vector returns in constraints and objective function

Recently I have been working with the Nelson Siegel Svensson (NSS) yield curve model, and I ran into a problem: searching for the best-fit parameters of the model.
I use a simple dataset consisting of periods (the const vector) and yield values (the y_real vector) to calibrate the parameters b0, b1, b2, b3, t1 and t2 of the original function (represented in the model objective function by x[0], x[1], x[2], x[3], x[4] and x[5] respectively). The goal is to minimise the difference between the observed yield values and the yield values estimated by the NSS model, so that in the end I can interpolate and extrapolate yield values for specific periods.
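For reference, the yield formula that the model function below implements, written out from the code with x[0]..x[5] playing the roles of b0, b1, b2, b3, t1, t2 and t the period, is:
y(t) = b_0 + b_1\,\frac{1 - e^{-t/t_1}}{t/t_1} + b_2\left(\frac{1 - e^{-t/t_1}}{t/t_1} - e^{-t/t_1}\right) + b_3\left(\frac{1 - e^{-t/t_2}}{t/t_2} - e^{-t/t_2}\right)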
Note: the constraint function is defined with the premise fun(x) == 0 (type 'eq'), i.e. it should drive the difference between y_real and the model output (in vector form) to zero. For example:
If y_real = [1.1, 1.4, 1.3] and the model output is [0.2, 1.7, 3.3], the difference is [0.9, -0.3, -2], and it is necessary to keep iterating until the difference vector is approximately zero, for example [1.0e-22, 1.0e-21, 1.0e-25].
I developed a trial solution with SciPy's least_squares (Calibrate parameters of Yield Curve Nelson Siegel Svensson), but it is a very simple approach and I need more accuracy; for that, some people recommended the SLSQP method of scipy.optimize.minimize.
This is my code for searching for the calibrated parameters, which are then used in an NSS yield curve:
from numpy import array, append
from scipy.optimize import minimize
from math import exp as EXP
const = [30,90,180,270,365,730,1095,1460,1825,2190,2555,2920,3285,3650,4015,4380,4745,5110,5475,5840,6205,6570,6935,7300]
y_real = [3.11826,3.71463,3.74677,3.83900,4.00049,4.40666,4.52346,4.64026,4.75706,4.87386,4.99066,5.10746,5.22426,
5.34106,5.44522,5.54669,5.64816,5.74963,5.85110,5.88607,5.91162,5.93717,5.96272,5.98827]
def model(x, const, y_real):
    arr = array([])
    for val in const:
        arr = append(arr,(x[0])+(x[1]*((1-EXP(-val/x[4]))/(val/x[4])))+(x[2]*((((1-EXP(-val/x[4]))/(val/x[4])))-(EXP(-val/x[4]))))+x[3]*((((1-EXP(-val/x[5]))/(val/x[5])))-(EXP(-val/x[5]))))
    return array(y_real) - arr

def fun(x, const, y_real):
    eval = model(x, const, y_real)
    leval = array([0 if val < 1.0e-20 else 1 for val in eval])
    return leval
con = {'type': 'eq', 'fun': fun, 'args' : (const, y_real)}
x0 = array([0.001, 0.001, 0.001, 0.001, 1.0e-10, 1.0e-10])
#bounds_ = [(0.001,8),(0.001,8),(0.001,8),(0.001,8),(1.0e-15,3),(1.0e-15,3)] bounds=bounds_
res = minimize(model, x0, method='SLSQP', constraints=[con] , args=(const, y_real))
print(res)
but I got the following error:
in _minimize_slsqp
w = zeros(len_w)
ValueError: negative dimensions are not allowed
How can I reach a solution with the SLSQP method (or another, better option) in a comprehensible way, without this error?
Thanks in advance.
First of all, your problem is probably that your model doesn't return a scalar value.
It's highly recommended to make yourself familiar with numpy. For instance, given const and y_real are numpy arrays, you don't need loops and append to implement your model function. It can be written as follows (note that it returns the sum of the squared differences):
import numpy as np

const = np.array([...])   # your values here
y_real = np.array([...])  # your values here

def model(x, const, y_real):
    arr = (
        x[0] + (x[1] * ((1 - np.exp(-const / x[4])) / (const / x[4])))
        + (x[2] * ((((1 - np.exp(-const / x[4])) / (const / x[4])))
                   - (np.exp(-const / x[4]))))
        + x[3] * ((((1 - np.exp(-const / x[5])) / (const / x[5])))
                  - (np.exp(-const / x[5])))
    )
    return np.sum((y_real - arr)**2)
In addition, there are a few more things that do not make sense and it's still not 100% clear to me what you are trying to achieve:
Subtracting a vector of zeros from the evaluated model inside fun
Subtracting a vector of zeros from leval inside fun
Adding the constraint fun(x) >= 0. What's the purpose of this constraint? The function returns a vector consisting of zeros or ones, so this constraint is always fulfilled.
By ignoring the constraint fun (which, by the way, is not differentiable and contradicts the mathematical assumptions of the SLSQP algorithm), you can write:
from scipy.optimize import minimize
x0 = np.array([0.001, 0.001, 0.001, 0.001, 1.0e-10, 1.0e-10])
res = minimize(lambda x: model(x, const, y_real), x0, method="SLSQP")
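As a usage note for the snippet above (assuming const and y_real have been filled in), the fitted parameters and the remaining objective value can then be read from the result object:
print(res.x)    # calibrated (b0, b1, b2, b3, t1, t2)
print(res.fun)  # sum of squared differences at the optimum
print(res.success, res.message)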
Team,
For this case I worked out part of a functional solution, using the perspective and some good tips from @joni in the previous answer (many thanks for that). It is only a partial solution because it works well for roughly the first third of the dataset; for the rest of the dataset you can use the code from the earlier question (Calibrate parameters of Yield Curve Nelson Siegel Svensson).
I am sharing it so it can be reused and to give an idea of one possible two-part solution; please comment if you have a more sophisticated solution.
from scipy.optimize import minimize
import numpy as np
from math import trunc
from nelson_siegel_svensson import NelsonSiegelSvenssonCurve
import pandas as pd
import matplotlib.pyplot as plt
# days
const = np.array([30,90,270,548,913,1278,1643,2008,2373,2738,3103,3468,3833,4198,4563,4928,5293,5658,6023,6388,6935,7300,7665,8030])/365
# empty values represented with 0
y_real = np.array([3.33156,3.44928,3.62778,3.74313,3.96015,4.384,4.4705,4.55701,4.63817,4.69949,4.76081,4.82213,4.87285,4.8681,4.86336,4.85861,4.85387,4.84912,4.87039,4.89833,4.94286,4.98739,5.03192,5.07645])
def model(x, const, y_real):
    arr = np.array([])
    for val in const:
        arr = np.append(arr,(x[0])+(x[1]*((1-np.exp(-val/x[4]))/(val/x[4])))+(x[2]*((((1-np.exp(-val/x[4]))/(val/x[4])))-(np.exp(-val/x[4]))))+x[3]*((((1-np.exp(-val/x[5]))/(val/x[5])))-(np.exp(-val/x[5]))))
    return np.sum((y_real - arr)**2)
# initial coefficients of Nelson Siegel Svensson
x0 = np.array([0.001, 0.001, 0.001, 0.001, 1.0e-10, 1.0e-10])
res = minimize(model, x0, method='SLSQP', args=(const, y_real),
options={'maxiter': 10000, 'ftol': 1e-190})
print(res.x)
X_fix = np.linspace(start=const[0], stop=const[-1], num=(const.size*20))
NSS = NelsonSiegelSvenssonCurve(beta0=res.x[0], beta1=res.x[1], beta2=res.x[2], beta3=res.x[3], tau1=res.x[4], tau2=res.x[5])
pd_interpolation = pd.DataFrame(columns=['Period','Value'])
font = {'family': 'serif',
'color': '#1F618D',
'weight': 'bold',
'size': 14,
}
font_x_y = {'family': 'serif',
'color': '#C70039',
'weight': 'bold',
'size': 13,
'style': 'oblique'
}
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(10, 6))
minx = -const[1]*3
config_manager = plt.get_current_fig_manager()
config_manager.set_window_title("Visualización " )
screen_x, screen_y = config_manager.window.wm_maxsize()
anchura = str(trunc(screen_x/12))
altura = str(trunc(screen_y/8))
middle_window = "+" + anchura + "+" + altura
config_manager.window.wm_geometry(middle_window)
plt.title('Interpolación ', fontdict=font, loc='center')
plt.xlabel('Periodo', fontdict=font_x_y)
plt.ylabel('Aproximación', fontdict=font_x_y, rotation=90, labelpad=10)
ax.set_ylim(ymin=0, ymax=(np.amax(y_real)*1.1))
ax.set_xlim(xmin=minx, xmax=(np.amax(const)*1.03))
ax.plot(const, y_real, 'ro', label='Dato real')
ax.plot(X_fix, NSS(X_fix),'--', label='Dato interpolado')
ax.legend(loc='lower right', frameon=False)
plt.show()
And the resulting graph looks like this:

Simulating a sub-system in a Python (non real-time) simulation environment through a transfer function

I am coding a dynamic system simulation (fixed step, non real-time; it runs on my desktop) and I want to model some of the system's components (e.g. filters...) through the tools made available by scipy.signal (e.g. dlsim). For those components, I know their representation in the classical form of transfer functions.
With scipy.signal, it is pretty easy and straightforward to simulate the output of the transfer functions "statically", that is, once the time and input vectors are already known; on the other hand, I couldn't find a way to compute it within each simulation step. My simulator also includes some closed-loop controllers, so the outputs change dynamically as the simulation moves forward.
Any ideas?
PS I found this thread which seems to be quite similar, but I must admit that I do not understand the solution given by the author...: How to simulate one step to a transfer function in python
The solution described in the question How to simulate one step to a transfer function in python works as follows. You generate only two simulation steps for the inputs U (array-like) and T (also array-like). With those variables and your system you call scipy.signal.lsim (or scipy.signal.dlsim for discrete systems), also passing the initial values of the system states X. As a result you get the output values and the new states, which you store back in the state variable X.
In the next loop iteration you take the last values of U and T, append the next input and time step, and call lsim again, this time with the states X from the last iteration, and so on.
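For a discrete-time transfer function the same pattern works with scipy.signal.dlsim. Below is a minimal sketch of the per-step idea using a hypothetical first-order low-pass filter (not the poster's system), kept in state-space form so that dlsim also returns the state trajectory:
import numpy as np
from scipy import signal

h = 0.01                   # fixed simulation step
a = np.exp(-h / 0.1)       # pole of a first-order lag with tau = 0.1 s (hypothetical)
sys_d = signal.StateSpace([[a]], [[1.0 - a]], [[1.0]], [[0.0]], dt=h)

x = np.zeros(1)            # filter state, carried from step to step
u_prev = 0.0
y_log = []
for k in range(1, 500):
    u_k = 1.0              # in a real simulator this comes from the closed-loop controller
    # simulate exactly one step: two samples, starting from the stored state
    _, y, x_traj = signal.dlsim(sys_d, u=[[u_prev], [u_k]], x0=x)
    x = x_traj[-1]         # keep the newest state for the next step
    y_log.append(y[-1, 0])
    u_prev = u_k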
Here is some sample code for a two-mass system (sorry, it's not beautiful, but it works):
import math
import numpy as np
import scipy.signal as signal
import matplotlib.pyplot as plt
class TwoMassSystem:
    def __init__(self):
        # Source: Feiler2003
        JM = 0.166  # kgm^2 moment of inertia (drive-mass)
        JL = 0.333  # kgm^2 moment of inertia (load-mass)
        d = 0.025   # Nms/rad damping coefficient (elastic shaft)
        c = 410.0   # NM/rad stiffness (elastic shaft)
        self.A = np.array([[-d/JM, -c/JM,  d/JM],
                           [  1.0,   0.0,  -1.0],
                           [ d/JL,  c/JL, -d/JL]])
        self.B = np.array([[1/JM, 0.0,   0.0],
                           [ 0.0, 0.0,   0.0],
                           [ 0.0, 0.0, -1/JL]])
        self.C = np.array([0.0, 0.0, 1.0])
        self.D = np.array([0.0, 0.0, 0.0])
        self.X = np.array([0.0, 0.0, 0.0])
        self.T = np.array([0.0])
        self.U = np.array([[0.0, 0.0, 0.0]])
        self.sys1 = signal.StateSpace(self.A, self.B, self.C, self.D)

    def resetStates(self):
        self.X = np.array([0.0, 0.0, 0.0])
        self.T = np.array([0.0])
        self.U = np.array([[0.0, 0.0, 0.0]])

    def test_sim(self):
        self.resetStates()
        h = 0.1
        ts = np.arange(0, 10, h)
        u = []
        t = []
        for i in ts:
            uM = 1.0
            if i > 1:
                uL = 1.0
            else:
                uL = 0.0
            u.append([uM, 0.0, uL])
            t.append(i)
        tout, y, x = signal.lsim(self.sys1, u, t, self.X)
        return t, y

    def test_step(self, uM, uL, tn):
        """
        test_step(uM, uL, tn)

        The call of the object instance simulates the two mass system with
        the given input values for a discrete time step.

        Parameters
        ----------
        uM : float
            input drive torque
        uL : float
            input load torque
        tn : float
            time step

        Returns
        -------
        nM : float
            angular velocity of the drive
        nL : float
            angular velocity of the load
        """
        u_new = [uM, 0.0, uL]
        self.T = np.array([self.T[-1], tn])
        self.U = np.array([self.U[-1], u_new])
        tout, y, x = signal.lsim(self.sys1, self.U, self.T, self.X)
        # x and y contain 2 simulation points; the newer one is the correct one.
        self.X = x[-1]  # update state
        return y[-1]

if __name__ == "__main__":
    a = TwoMassSystem()
    tsim, ysim = a.test_sim()
    h = 0.1
    ts = np.arange(h, 10, h)
    ys = []
    a.resetStates()
    for i in ts:
        uM = 1.0
        if i > 1:
            uL = 1.0
        else:
            uL = 0.0
        ys.append(a.test_step(uM, uL, i))
    plt.plot(tsim, ysim, ts, ys)
    plt.show()
However, as you can see in the plot, the result isn't identical, and that is the problem, because I don't know why. Therefore, I created a new question: Why does the result of a simulated step differ from the complete simulation?
Sources:
#InProceedings{Feiler2003,
author = {Feiler, M. and Westermaier, C. and Schroder, D.},
booktitle = {Proceedings of 2003 IEEE Conference on Control Applications, 2003. CCA 2003.},
title = {Adaptive speed control of a two-mass system},
year = {2003},
pages = {1112-1117 vol.2},
volume = {2},
doi = {10.1109/CCA.2003.1223166}
}

How to apply linear regression with fixed x intercept in python?

I've found quite a few examples of fitting a linear regression with zero intercept.
However, I would like to fit a linear regression with a fixed x-intercept. In other words, the regression will start at a specific x.
I have the following code for plotting.
import numpy as np
import matplotlib.pyplot as plt
xs = np.array([0.1, 0.2, 0.4, 0.6, 0.8, 1.0, 2.0, 4.0, 6.0, 8.0, 10.0,
20.0, 40.0, 60.0, 80.0])
ys = np.array([0.50505332505407008, 1.1207373784533172, 2.1981844719020001,
3.1746209003398689, 4.2905482471260044, 6.2816226678076958,
11.073788414382639, 23.248479770546009, 32.120462301367183,
44.036117671229206, 54.009003143831116, 102.7077685684846,
185.72880217806673, 256.12183145545811, 301.97120103079675])
def best_fit_slope_and_intercept(xs, ys):
    # m = xs.dot(ys)/xs.dot(xs)
    m = (((np.average(xs)*np.average(ys)) - np.average(xs*ys)) /
         ((np.average(xs)*np.average(xs)) - np.average(xs*xs)))
    b = np.average(ys) - m*np.average(xs)
    return m, b

def rSquaredValue(ys_orig, ys_line):
    def sqrdError(ys_orig, ys_line):
        return np.sum((ys_line - ys_orig) * (ys_line - ys_orig))
    yMeanLine = np.average(ys_orig)
    sqrtErrorRegr = sqrdError(ys_orig, ys_line)
    sqrtErrorYMean = sqrdError(ys_orig, yMeanLine)
    return 1 - (sqrtErrorRegr/sqrtErrorYMean)
m, b = best_fit_slope_and_intercept(xs, ys)
regression_line = m*xs+b
r_squared = rSquaredValue(ys, regression_line)
print(r_squared)
plt.plot(xs, ys, 'bo')
# Normal best fit
plt.plot(xs, m*xs+b, 'r-')
# Zero intercept
plt.plot(xs, m*xs, 'g-')
plt.show()
And I want something like the following, where the regression line starts at (5, 0).
Thank You. Any and all help is appreciated.
I have been thinking for some time and I've found a possible workaround to the problem.
If I understood correctly, you want to find the slope and intercept of a linear regression model with a fixed x-axis intercept.
Provided that's the case (imagine you want the x-axis intercept to take the value forced_intercept), it's as if you "moved" all the points by -forced_intercept along the x-axis and then forced scikit-learn to use a y-axis intercept equal to 0. You would then have the slope. To find the intercept, just isolate b from y=ax+b and force the line through the point (forced_intercept, 0). When you do that, you get b=-a*forced_intercept (where a is the slope). In code (notice the reshaping of xs):
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
xs = np.array([0.1, 0.2, 0.4, 0.6, 0.8, 1.0, 2.0, 4.0, 6.0, 8.0, 10.0,
20.0, 40.0, 60.0, 80.0]).reshape((-1,1)) # notice you must reshape your array or you will get a ValueError
ys = np.array([0.50505332505407008, 1.1207373784533172, 2.1981844719020001,
3.1746209003398689, 4.2905482471260044, 6.2816226678076958,
11.073788414382639, 23.248479770546009, 32.120462301367183,
44.036117671229206, 54.009003143831116, 102.7077685684846,
185.72880217806673, 256.12183145545811, 301.97120103079675])
forced_intercept = 5 #as you provided in your example of (5,0)
new_xs = xs - forced_intercept #here we "move" all the points
model = LinearRegression(fit_intercept=False).fit(new_xs, ys) #force an intercept of 0
r = model.score(new_xs,ys)
a = model.coef_
b = -1 * a * forced_intercept # here we find the intercept so that the line contains (forced_intercept, 0)
print(r,a,b)
plt.plot(xs,ys,'o')
plt.plot(xs,a*xs+b)
plt.show()
Hope this is what you were looking for.
Maybe this approach will be useful.
import numpy as np
import matplotlib.pyplot as plt
xs = np.array([0.1, 0.2, 0.4, 0.6, 0.8, 1.0, 2.0, 4.0, 6.0, 8.0, 10.0,
20.0, 40.0, 60.0, 80.0])
ys = np.array([0.50505332505407008, 1.1207373784533172, 2.1981844719020001,
3.1746209003398689, 4.2905482471260044, 6.2816226678076958,
11.073788414382639, 23.248479770546009, 32.120462301367183,
44.036117671229206, 54.009003143831116, 102.7077685684846,
185.72880217806673, 256.12183145545811, 301.97120103079675])
# At first we add this anchor point to the points set.
xs = np.append(xs, [5.])
ys = np.append(ys, [0.])
# Then we prepare the coefficient matrix according to the docs
# https://docs.scipy.org/doc/numpy/reference/generated/numpy.linalg.lstsq.html
A = np.vstack([xs, np.ones(len(xs))]).T
# Then we prepare weights for these points. All weights are equal
# except the last one (for the added anchor point).
# In this example its weight is 1000 times larger than the others.
W = np.diag(np.ones([len(xs)]))
W[-1,-1] = 1000.
# And we find least-squares solution.
m, c = np.linalg.lstsq(np.dot(W, A), np.dot(W, ys), rcond=None)[0]
plt.plot(xs, ys, 'o', label='Original data', markersize=10)
plt.plot(xs, m * xs + c, 'r', label='Fitted line')
plt.show()
If you use scikit-learn for the linear regression task, you can access the fitted intercept(s) via the intercept_ attribute.
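For completeness, if the anchor point has to be honoured exactly rather than approximately via a large weight, the shifting idea from the first answer reduces the problem to a one-column least-squares fit. A minimal sketch on hypothetical data (not the data from the question):
import numpy as np

# hypothetical data lying roughly on a line through (5, 0) with slope 2
rng = np.random.default_rng(0)
xs = np.linspace(6.0, 80.0, 15)
ys = 2.0 * (xs - 5.0) + rng.normal(0.0, 1.0, xs.size)

forced_intercept = 5.0
# model y = m * (x - forced_intercept): a single-column least-squares problem
A = (xs - forced_intercept).reshape(-1, 1)
m = np.linalg.lstsq(A, ys, rcond=None)[0][0]
print(m)  # slope of the line passing exactly through (forced_intercept, 0)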

scipy curve_fit returns initial estimates

To fit a hyperbolic function I am trying to use the following code:
import numpy as np
from scipy.optimize import curve_fit
def hyperbola(x, s_1, s_2, o_x, o_y, c):
    # x   > Input x values
    # s_1 > slope of line 1
    # s_2 > slope of line 2
    # o_x > x offset of crossing of asymptotes
    # o_y > y offset of crossing of asymptotes
    # c   > curvature of hyperbola
    b_2 = (s_1 + s_2) / 2
    b_1 = (s_2 - s_1) / 2
    return o_y + b_1 * (x - o_x) + b_2 * np.sqrt((x - o_x) ** 2 + c ** 2 / 4)
min_fit = np.array([-3.0, 0.0, -2.0, -10.0, 0.0])
max_fit = np.array([0.0, 3.0, 3.0, 0.0, 10.0])
guess = np.array([-2.5/3.0, 4/3.0, 1.0, -4.0, 0.5])
vars, covariance = curve_fit(f=hyperbola, xdata=n_step, ydata=n_mean, p0=guess, bounds=(min_fit, max_fit))
Where n_step and n_mean are measurement values generated earlier on. The code runs fine and gives no error message, but it only returns the initial guess with a very small change. Also, the covariance matrix contains only zeros. I tried to do the same fit with a better initial guess, but that does not have any influence.
Further, I plotted the exact same function with the initial guess as input and that gives me indeed a function which is close to the real values. Does anyone know where I make a mistake here? Or do I use the wrong function to make my fit?
The issue must lie with n_step and n_mean (which are not given in the question as currently stated); when trying to reproduce the issue with some arbitrarily chosen set of input parameters, the optimization works as expected. Let's try it out.
First, let's define some arbitrarily chosen input parameters in the given parameter space by
params = [-0.1, 2.95, -1, -5, 5]
Let's see what that looks like:
import matplotlib.pyplot as plt
xs = np.linspace(-30, 30, 100)
plt.plot(xs, hyperbola(xs, *params))
Based on this, let us define some rather crude inputs for xdata and ydata by
xdata = np.linspace(-30, 30, 10)
ydata = hyperbola(xdata, *params)
With these, let us run the optimization and see if we match our given parameters:
vars, covariance = curve_fit(f=hyperbola, xdata=xdata, ydata=ydata, p0=guess, bounds=(min_fit, max_fit))
print(vars) # [-0.1 2.95 -1. -5. 5. ]
That is, the fit is perfect even though our params are rather different from our guess. In other words, if we are free to choose n_step and n_mean, then the method works as expected.
In order to try to challenge the optimization slightly, we could also try to add a bit of noise:
np.random.seed(42)
xdata = np.linspace(-30, 30, 10)
ydata = hyperbola(xdata, *params) + np.random.normal(0, 10, size=len(xdata))
vars, covariance = curve_fit(f=hyperbola, xdata=xdata, ydata=ydata, p0=guess, bounds=(min_fit, max_fit))
print(vars) # [ -1.18173287e-01 2.84522636e+00 -1.57023215e+00 -6.90851334e-12 6.14480856e-08]
plt.plot(xdata, ydata, '.')
plt.plot(xs, hyperbola(xs, *vars))
Here we note that the optimum ends up different from both our provided params and the guess, is still within the bounds given by min_fit and max_fit, and still provides a good fit.

Scipy optimize minimize initial guess using SLSQP

In part of my code, I want to find the 5th-degree polynomial that best fits my data and is non-decreasing at some given points.
The sample code is:
import numpy as np
import scipy.optimize as optimize
def make_const(points):
    constr = []
    for point in points:
        c = {'type' : 'ineq', 'fun' : der, 'args' : (point,)}
        constr.append(c)
    return constr

def der(args_pol, bod):
    a, b, c, d, e, f = args_pol
    return (5*a*bod**4 + 4*b*bod**3 + 3*c*bod**2 + 2*d*bod + e)

def squares(args_pol, x, y):
    a, b, c, d, e, f = args_pol
    return ((y-(a*x**5 + b*x**4 + c*x**3 + d*x**2 + e*x + f))**2).sum()

def ecdf(arr):
    arr = np.array(arr)
    F = [len(arr[arr<=t]) / len(arr) for t in arr]
    return np.array(F)
pH = np.array([8,8,8,7,7,7,7,7,7,7,7,6,3,2,2,2,1])
pH = np.sort(pH)
e = ecdf(pH)
ppoints = [ 1., 2.75, 4.5, 6.25, 8. ]
constraints1 = make_const(ppoints)
p1 = optimize.minimize(squares, [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
                       method = 'SLSQP', args = (pH, e), constraints = constraints1)
p2 = optimize.minimize(squares, [-1.0, -1.0, -1.0, -1.0, -1.0, -1.0],
                       method = 'SLSQP', args = (pH, e), constraints = constraints1)
Here p1 fails to optimize, while p2 terminates successfully. In addition, if I have no constraints, i.e. ppoints = [], p1 terminates successfully while p2 fails. The message when the optimization fails is always:
'Inequality constraints incompatible'
The problem is obviously in the initial guess passed to optimize.minimize. I thought the parameters of that guess had to meet my constraints, but here the initial guess [1.0, 1.0, 1.0, 1.0, 1.0, 1.0] does meet my constraints. Can anybody please explain where the problem is?
Yes, your initial point satisfies the constraints. But SLSQP works with linearized constraints, and is looking for a search direction that is compatible with all the linearizations (described here). Those may end up either incompatible, or poorly compatible in the sense that there is only a tiny range of directions that qualify, and the search fails to find them.
The starting point [1, 1, 1, 1, 1, 1] is not a good one. Consider that at x=8 the contribution of the leading coefficient 1 to the polynomial is 8**5, and since it gets squared in the objective function, you get about 8**10. This dwarfs the contribution of lower-order coefficients, which are nonetheless important for satisfying the constraints at points close to 0. So the algorithm is presented with a badly scaled problem when the initial point is all-ones.
Using np.zeros((6, )) as a starting point is a better idea; the search succeeds from there. Scaling the initial point as [7**(d-5) for d in range(6)] also works but just barely (replacing 7 by 6 or 8 yields another kind of error, "Positive directional derivative for linesearch").
So the summary is: the optimization problem has poor scaling, making the search difficult; and the error message is not very explicit about what actually went wrong.
Besides changing the initial point, you can try providing the Jacobians of the objective function and the constraints (both are important, as the method works with the Lagrangian).
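As a rough sketch of that last suggestion, the gradient of squares with respect to the six coefficients can be written out by hand and passed via the jac argument (an illustration only, untested against the data above):
import numpy as np

def squares_jac(args_pol, x, y):
    # gradient of sum((y - p(x))**2) with respect to the coefficients (a, b, c, d, e, f)
    a, b, c, d, e, f = args_pol
    resid = (a*x**5 + b*x**4 + c*x**3 + d*x**2 + e*x + f) - y
    return np.array([2.0 * np.sum(resid * x**p) for p in (5, 4, 3, 2, 1, 0)])

# for example:
# p1 = optimize.minimize(squares, np.zeros(6), jac=squares_jac,
#                        method='SLSQP', args=(pH, e), constraints=constraints1)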
