I am using Python 2.7. I am wondering why the optimize function of SciPy doesn't converge to the right function when the target is a high frequency sinus wave.
import numpy as np
from scipy import optimize
test_func = lambda x: 5*np.sin(15*x+3)+1
t = linspace(0,25,100000)
y_t = test_func(t)
plot(t,y_t)
fitfunc = lambda p, x: p[0]*np.sin(p[1]*x+p[2])+p[3]
errfunc = lambda p, x, y: fitfunc(p, x) - y
p0 = [max(y_t),10,2,0]
p1, success = optimize.leastsq(errfunc, p0, args=(t,y_t))
plot(t,fitfunc(p1,t))
One can clearly see that the end solution diverges from the target clearly. Am I doing something wrong ? Is the error function ill adapted here ?
Thanks for any input
Your problem is that there are a large number of local minima in your residuals function as the phase and frequency shift with respect to their true values; without really good initial guesses for the phase and frequency you will converge into one instead of falling into the much deeper, global minimum:
If you don't have any more information about the phase and frequency, you can either estimate them from a FFT of the data or rewrite your formula as
Asin(bx + phi) + d = Acos(phi)sin(bx) + Asin(phi)cos(bx) + d
which has only one nonlinear parameter (b): you can use a grid-search for b and much faster and more reliable linear least-squares fitting for the rest (a1 = Acos(phi), a2 = Asin(phi) and d).
Here's a plot of the rms residual as the frequency, b varies, showing the various minima:
Related
For this particular work, I am using scipy optimize to try find the best parameters that fit two different models at the same time.
model_func_par = lambda t, total, r0, theta: np.multiply((total/3),(1+2*r0),np.exp(-t/theta))
model_func_perp = lambda t, total, r0, theta: np.multiply((total/3),(1-r0),np.exp(-t/theta))
After this I create two error functions by subtractig the raw data, and plug it into scipy.optimize.leastsq(). As you can see I have two different equations with the same r0 and theta parameters - I have to find the parameters that fit best both equations (in theory, r0 and theta should be the same for both equations, but because of noise and experimental errors etc I am sure this won't be quite the case.
I guess I could do a separate optimization for each equation and perhaps take an average of the two results, but I wanted to see if anyone knows of a way to do one optimization for both.
Thanks in advance!
Is there any specific reason to use np.multiply? Since the typical mathematical operators are overloaded for np.ndarrays, it's more convenient to write (and read):
model1 = lambda t, total, r0, theta: (total/3) * (1+2*r0) * np.exp(-t/theta)
model2 = lambda t, total, r0, theta: (total/3) * (1-r0) * np.exp(-t/theta)
To answer your question: AFAIK this isn't possible with scipy.optimize.least_squares. However, a very simple approach would be to minimize the sum of least squares residuals
min || model1(xdata, *coeffs) - ydata ||^2 + || model2(xdata, *coeffs) - ydata ||^2
like this:
import numpy as np
from scipy.optimize import minimize
from scipy.linalg import norm
# your xdata and ydata as np.ndarrays
# xdata = np.array([...])
# ydata = np.array([...])
# the objective function to minimize
def obj(coeffs):
return norm(model1(xdata,*coeffs)-ydata)**2 + norm(model2(xdata,*coeffs)-ydata)**2
# the initial point (coefficients)
coeffs0 = np.ones(3)
# res.x contains your coefficients
res = minimize(obj, x0=coeffs0)
However, note that this isn't a complete answer. A better approach would be to formulate it as a multi-objective optimization problem. You may take a look at pymoo for this purpose.
I am asked to write an implementation of the gradient descent in python with the signature gradient(f, P0, gamma, epsilon) where f is an unknown and possibly multivariate function, P0 is the starting point for the gradient descent, gamma is the constant step and epsilon the stopping criteria.
What I find tricky is how to evaluate the gradient of f at the point P0 without knowing anything on f. I know there is numpy.gradient but I don't know how to use it in the case where I don't know the dimensions of f. Also, numpy.gradient works with samples of the function, so how to choose the right samples to compute the gradient at a point without any information on the function and the point?
I'm assuming here, So how can i choose a generic set of samples each time I need to compute the gradient at a given point? means, that the dimension of the function is fixed and can be deduced from your start point.
Consider this a demo, using scipy's approx_fprime, which is an easier to use wrapper-method for numerical-differentiation and also used in scipy's optimizers when a jacobian is needed, but not given.
Of course you can't ignore the parameter epsilon, which can make a difference depending on the data.
(This code is also ignoring optimize's args-parameter which is usually a good idea; i'm using the fact that A and b are inside the scope here; surely not best-practice)
import numpy as np
from scipy.optimize import approx_fprime, minimize
np.random.seed(1)
# Synthetic data
A = np.random.random(size=(1000, 20))
noiseless_x = np.random.random(size=20)
b = A.dot(noiseless_x) + np.random.random(size=1000) * 0.01
# Loss function
def fun(x):
return np.linalg.norm(A.dot(x) - b, 2)
# Optimize without any explicit jacobian
x0 = np.zeros(len(noiseless_x))
res = minimize(fun, x0)
print(res.message)
print(res.fun)
# Get numerical-gradient function
eps = np.sqrt(np.finfo(float).eps)
my_gradient = lambda x: approx_fprime(x, fun, eps)
# Optimize with our gradient
res = res = minimize(fun, x0, jac=my_gradient)
print(res.message)
print(res.fun)
# Eval gradient at some point
print(my_gradient(np.ones(len(noiseless_x))))
Output:
Optimization terminated successfully.
0.09272331925776327
Optimization terminated successfully.
0.09272331925776327
[15.77418041 16.43476772 15.40369129 15.79804516 15.61699104 15.52977276
15.60408688 16.29286766 16.13469887 16.29916573 15.57258797 15.75262356
16.3483305 15.40844536 16.8921814 15.18487358 15.95994091 15.45903492
16.2035532 16.68831635]
Using:
# Get numerical-gradient function with a way too big eps-value
eps = 1e-3
my_gradient = lambda x: approx_fprime(x, fun, eps)
shows that eps is a critical parameter resulting in:
Desired error not necessarily achieved due to precision loss.
0.09323354898565098
I want to fit a function with vector output using Scipy's curve_fit (or something more appropriate if available). For example, consider the following function:
import numpy as np
def fmodel(x, a, b):
return np.vstack([a*np.sin(b*x), a*x**2 - b*x, a*np.exp(b/x)])
Each component is a different function but they share the parameters I wish to fit. Ideally, I would do something like this:
x = np.linspace(1, 20, 50)
a = 0.1
b = 0.5
y = fmodel(x, a, b)
y_noisy = y + 0.2 * np.random.normal(size=y.shape)
from scipy.optimize import curve_fit
popt, pcov = curve_fit(f=fmodel, xdata=x, ydata=y_noisy, p0=[0.3, 0.1])
But curve_fit does not work with functions with vector output, and an error Result from function call is not a proper array of floats. is thrown. What I did instead is to flatten out the output like this:
def fmodel_flat(x, a, b):
return fmodel(x[0:len(x)/3], a, b).flatten()
popt, pcov = curve_fit(f=fmodel_flat, xdata=np.tile(x, 3),
ydata=y_noisy.flatten(), p0=[0.3, 0.1])
and this works. If instead of a vector function I am actually fitting several functions with different inputs as well but which share model parameters, I can concatenate both input and output.
Is there a more appropriate way to fit vector function with Scipy or perhaps some additional module? A main consideration for me is efficiency - the actual functions to fit are much more complex and fitting can take some time, so if this use of curve_fit is mangled and is leading to excessive runtimes I would like to know what I should do instead.
If I can be so blunt as to recommend my own package symfit, I think it does precisely what you need. An example on fitting with shared parameters can be found in the docs.
Your specific problem stated above would become:
from symfit import variables, parameters, Model, Fit, sin, exp
x, y_1, y_2, y_3 = variables('x, y_1, y_2, y_3')
a, b = parameters('a, b')
a.value = 0.3
b.value = 0.1
model = Model({
y_1: a * sin(b * x),
y_2: a * x**2 - b * x,
y_3: a * exp(b / x),
})
xdata = np.linspace(1, 20, 50)
ydata = model(x=xdata, a=0.1, b=0.5)
y_noisy = ydata + 0.2 * np.random.normal(size=(len(model), len(xdata)))
fit = Fit(model, x=xdata, y_1=y_noisy[0], y_2=y_noisy[1], y_3=y_noisy[2])
fit_result = fit.execute()
Check out the docs for more!
I think what you're doing is perfectly fine from an efficiency stand point. I'll try to look at the implementation and come up with something more quantitative, but for the time being here is my reasoning.
What you're doing during curve fitting is optimizing the parameters (a,b) such that
res = sum_i |f(x_i; a,b)-y_i|^2
is minimal. By this I mean that you have data points (x_i,y_i) of arbitrary dimensionality, two parameters (a,b) and a fitting model that approximates the data at query points x_i.
The curve fitting algorithm starts from a starting (a,b) pair, puts this into a black box that computes the above square error, and tries to come up with a new (a',b') pair that produces a smaller error. My point is that the error above is really a black box for the fitting algorithm: the configurational space of the fitting is defined merely by the (a,b) parameters. If you imagine how you'd implement a simple curve fitting function, you could imagine that you try to do, say, a gradient descent, with the square error as cost function.
Now, it should be irrelevant to the fitting procedure how the black box computes the error. It's easy to see that the dimensionality of x_i is really irrelevant for scalar functions, since it doesn't matter if you have 1000 1d query points to fit for, or a 10x10x10 grid in 3d space. What matters is that you have 1000 points x_i for which you need to compute f(x_i) ~ y_i from the model.
The only subtlety that should further be noted is that in case of a vector-valued function, the calculation of the error is not trivial. In my opinion, it's fine to define the error at each x_i point using the 2-norm of the vector-valued function. But hey: in this case, the square error at point x_i is
|f(x_i; a,b)-y_i|^2 == sum_k (f(x_i; a,b)[k]-y_i[k])^2
which implies that the square error for each component is accumulated. This just means that what you're doing right now is just right: by replicating your x_i points and taking into account each component of the function individually, your square error will contain exactly the 2-norm of the error at each point.
So my point is what you're doing is mathematically correct, and I don't expect any behaviour of the fitting procedure to depend on the way how multivariate/vector-valued functions are handled.
I am trying to find the x0parameter which fits as much as possible the blue model on the green curve (x0control the width of the crenel; see below).
Here is my attempt:
from pylab import *
from scipy.optimize import curve_fit
x=linspace(0,2*pi,1000)
def crenel(x):return sign(sin(x))
def inverter(x,x0): return (crenel(x-x0)+crenel(x+x0))/2
p,e = curve_fit(inverter,x,sin(x),1)
plot(x,inverter(x,*p),x,sin(x))
ylim(-1.5,1.5)
By hand, the optimal value is x0 = arcsin(1/2) # 0.523598, but curve_fit doesn't estimate any value ( "OptimizeWarning: Covariance of the parameters could not be estimated") . I suspect the stiffness of the model. The docs inform :
The algorithm uses the Levenberg-Marquardt algorithm through leastsq. Additional keyword arguments are passed directly to that algorithm.
So my question is : Is there keyword arguments that can help curve_fit to estimate the parameter in this case ? or another approach ?
Thanks for any advice.
The problem is that the objective function that curve_fit tries to minimize is not continuous. x0 controls the location of the discontinuities in the inverter function. When a discontinuity crosses one of the grid points in x, there is a jump in the objective function. Between these points, the objective function is constant. curve_fit (actually, leastsq, the function used by curve_fit) is not designed to handle such a function.
The following function sse is (in effect) the function that curve_fit tries to minimize, with x being the same x defined in your example, and y = sin(x):
def sse(x0, x, y):
f = inverter(x, x0)
diff = y - f
s = (diff**2).sum()
return s
If you plot this function on a fine grid with code such as
xx = np.linspace(0, 1, 10000)
yy = [sse(x0, x, y) for x0 in xx]
plot(xx, yy)
and zoom in, you'll see
To use scipy to find your optimal value, you can use fmin with a smooth objective function. For example, here's the continuous objective function, using only the interval [0, pi/2] (quad is scipy.integrate.quad):
def func(x0):
s0, e0 = quad(lambda x: np.sin(x)**2, 0, x0)
s1, e0 = quad(lambda x: (1 - np.sin(x))**2, x0, 0.5*np.pi)
return s0 + s1
scipy.optimize.fmin can be used to find the minimum of that function, as in this snippet from an ipython session:
In [202]: fmin(func, 0.3, xtol=1e-8)
Optimization terminated successfully.
Current function value: 0.100545
Iterations: 28
Function evaluations: 56
Out[202]: array([ 0.52359878])
In [203]: np.arcsin(0.5)
Out[203]: 0.52359877559829882
I have some data that follow a sigmoid distribution as you can see in the following image:
After normalizing and scaling my data, I have adjusted the curve at the bottom using scipy.optimize.curve_fit and some initial parameters:
popt, pcov = curve_fit(sigmoid_function, xdata, ydata, p0 = [0.05, 0.05, 0.05])
>>> print popt
[ 2.82019932e+02 -1.90996563e-01 5.00000000e-02]
So popt, according to the documentation, returns *"Optimal values for the parameters so that the sum of the squared error of f(xdata, popt) - ydata is minimized". I understand here that there is no calculation of the slope with curve_fit, because I do not think the slope of this gentle curve is 282, neither is negative.
Then I tried with scipy.optimize.leastsq, because the documentation says it returns "The solution (or the result of the last iteration for an unsuccessful call).", so I thought the slope would be returned. Like this:
p, cov, infodict, mesg, ier = leastsq(residuals, p_guess, args = (nxdata, nydata), full_output=True)
>>> print p
Param(x0=281.73193626250207, y0=-0.012731420027056234, c=1.0069006606656596, k=0.18836680131910222)
But again, I did not get what I expected. curve_fit and leastsq returned almost the same values, with is not surprising I guess, as curve_fit is using an implementation of the least squares method within to find the curve. But no slope back...unless I overlooked something.
So, how to calculate the slope in a point, say, where X = 285 and Y = 0.5?
I am trying to avoid manual methods, like calculating the derivative in, say, (285.5, 0.55) and (284.5, 0.45) and subtract and divide results and so. I would like to know if there is a more automatic method for this.
Thank you all!
EDIT #1
This is my "sigmoid_function", used by curve_fit and leastsq methods:
def sigmoid_function(xdata, x0, k, p0): # p0 not used anymore, only its components (x0, k)
# This function is called by two different methods: curve_fit and leastsq,
# this last one through function "residuals". I don't know if it makes sense
# to use a single function for two (somewhat similar) methods, but there
# it goes.
# p0:
# + Is the initial parameter for scipy.optimize.curve_fit.
# + For residuals calculation is left empty
# + It is initialized to [0.05, 0.05, 0.05]
# x0:
# + Is the convergence parameter in X-axis and also the shift
# + It starts with 0.05 and ends up being around ~282 (days in a year)
# k:
# + Set up either by curve_fit or leastsq
# + In least squares it is initially fixed at 0.5 and in curve_fit
# + to 0.05. Why? Just did this approach in two different ways and
# + it seems it is working.
# + But honestly, I have no clue on what it represents
# xdata:
# + Positions in X-axis. In this case from 240 to 365
# Finally I changed those parameters as suggested in the answer.
# Sigmoid curve has 2 degrees of freedom, therefore, the initial
# guess only needs to be this size. In this case, p0 = [282, 0.5]
y = np.exp(-k*(xdata-x0)) / (1 + np.exp(-k*(xdata-x0)))
return y
def residuals(p_guess, xdata, ydata):
# For the residuals calculation, there is no need of setting up the initial parameters
# After fixing the initial guess and sigmoid_function header, remove []
# return ydata - sigmoid_function(xdata, p_guess[0], p_guess[1], [])
return ydata - sigmoid_function(xdata, p_guess[0], p_guess[1], [])
I am sorry if I made mistakes while describing the parameters or confused technical terms. I am very new with numpy and I have not studied maths for years, so I am catching up again.
So, again, what is your advice to calculate the slope of X = 285, Y = 0.5 (more or less the midpoint) for this dataset? Thanks!!
EDIT #2
Thanks to Oliver W., I updated my code as he suggested and understood a bit better the problem.
There is a final detail I do not fully get. Apparently, curve_fit returns a popt array (x0, k) with the optimum parameters for the fitting:
x0 seems to be how shifted is the curve by indicating the central point of the curve
k parameter is the slope when y = 0.5, also in the center of the curve (I think!)
Why if the sigmoid function is a growing one, the derivative/slope in popt is negative? Does it make sense?
I used sigmoid_derivative to calculate the slope and, yes, I obtained the same results that popt but with positive sign.
# Year 2003, 2005, 2007. Slope in midpoint.
k = [-0.1910, -0.2545, -0.2259] # Values coming from popt
slope = [0.1910, 0.2545, 0.2259] # Values coming from sigmoid_derivative function
I know this is being a bit peaky because I could use both. The relevant data is in there but with negative sign, but I was wondering why is this happening.
So, the calculation of the derivative function as you suggested, is only required if I need to know the slope in other points than y = 0.5. Only for midpoint, I can use popt.
Thanks for your help, it saved me a lot of time. :-)
You're never using the parameter p0 you're passing to your sigmoid function. Hence, curve fitting will not have any good measure to find convergence, because it can take any value for this parameter. You should first rewrite your sigmoid function like this:
def sigmoid_function(xdata, x0, k):
y = np.exp(-k*(xdata-x0)) / (1 + np.exp(-k*(xdata-x0)))
return y
This means your model (the sigmoid) has only two degrees of freedom. This will be returned in popt:
initial_guess = [282, 1] # (x0, k): at x0, the sigmoid reaches 50%, k is slope related
popt, pcov = curve_fit(sigmoid_function, xdata, ydata, p0=initial_guess)
Now popt will be a tuple (or array of 2 values), being the best possible x0 and k.
To get the slope of this function at any point, to be honest, I would just calculate the derivative symbolically as the sigmoid is not such a hard function. You will end up with:
def sigmoid_derivative(x, x0, k):
f = np.exp(-k*(x-x0))
return -k / f
If you have the results from your curve fitting stored in popt, you could pass this easily to this function:
print(sigmoid_derivative(285, *popt))
which will return for you the derivative at x=285. But, because you ask specifically for the midpoint, so when x==x0 and y==.5, you'll see (from the sigmoid_derivative) that the derivative there is just -k, which can be observed immediately from the curve_fit output you've already obtained. In the output you've shown, that's about 0.19.