I currently want to fit data with errors in x and y and im using the scipy.odr package to get my results. Im just wondering about the correct use of errors sx and sy. So here is an example.
Lets assume im measuring a voltage V for different currents I.
So I have measured 1V with an error of +- 0.1V and so on. So if I'm assuming no error in current measurement I could use spipy.curve_fit as follows. Absolute_sigma is set True because my absolute error of +- 0.1V.
So I'm getting:
from scipy.optimize import curve_fit
y=[1,2.5,3,4]
x=[1,2,3,4]
yerr=[0.1,0.1,0.1,0.1]
def func(x, a, b):
return a*x+b
popt, pcov = curve_fit(func, x, y,sigma=yerr,absolute_sigma=True)
print(popt)
print(np.sqrt(np.diag(pcov)))
[0.95 0.25]
[0.04472136 0.12247449]
In a second step I want to use the odr-package with errors in both, current and voltage.
According to the documentation it should be used as follows: sx and sy are the errors for my measurement data.
So I should assume to get similar results to curve_fit if I'm using a very small error for sx.
from scipy.odr import *
x_err = [0.0000000001]*x.__len__()
y_err = yerr
def linear_func(p, x):
m, c = p
return m*x+c
linear_model = Model(linear_func)
data = RealData(x, y, sx=x_err, sy=y_err)
odr = ODR(data, linear_model, beta0=[0.4, 0.4])
out = odr.run()
out.pprint()
Beta: [0.94999996 0.24999994]
Beta Std Error: [0.13228754 0.36228459]
But as you can see, the result is different from the curve_fit above with absolute_sigma=True.
Using the same data with curve_fit and absolute_sigma=False leads to the same results as the ODR-Fit .
popt, pcov = curve_fit(func, x, y,sigma=yerr,absolute_sigma=False)
print(popt)
print(np.sqrt(np.diag(pcov)))
[0.95 0.25]
[0.13228756 0.36228442]
So I guess the ODR-Fit is not really taking care of my absolute errors as curve_fit and absolute_sigma=True does. Is there any way to do that or am I missing something?
The option absolute_sigma=True in curve_fit() gives out the real covariance matrix in that sense that np.sqrt(np.diag(pcov)) yields the 1-sigma standard deviation as error as defined in, e.g., Numerical Recipes, i.e., it could have a unit, like meter or so. A very helpful summary comes with kmpfit
ODR, however, gives out the standard error derived from a scaled covariance matrix. See here, How to compute standard error from ODR results? or the example below.
This scaling by ODR is performed such that a reduced chi2 - calculated with the input weights being scaled in the same manner - yields approx 1. Or from the curve_fit docs:
This constant is set by demanding that the reduced chisq for the optimal parameters popt when using the scaled sigma equals unity.
The remaining question is now: What does sd_beta really mean?
It's the standard error, see e.g., Standard error of mean versus standard deviation
(there may exist conditions where the magnitude of both are the same. See comments below)
see also the preceding discussion here: https://mail.python.org/pipermail/scipy-user/2013-February/034196.html
Now, via scaling pcov with the reduced chi2 one obtains for the parameters errors the same output of a) curve_fit(..., absolute_sigma=False) and b) ODR which are a priory relative errors:
# calculate chi2
residuals = (y - func(x, *popt))
chi_arr = residuals / yerr
chi2_red = (chi_arr**2).sum() / (len(x)-len(popt))
print('red. chi2:\t\t\t\t', chi2_red)
print('np.sqrt(np.diag(pcov) * chi2_red):\t', np.sqrt(np.diag(pcov) * chi2_red))
yielding:
red. chi2: 8.75
np.sqrt(np.diag(pcov) * chi2_red): [0.13228756 0.36228442]
Note, that, however, for curve_fit(..., absolute_sigma=True) and ODR the covariance matrices pcov from curve_fitand cov_beta from the ODR output are still the same. Here, the disadvantage of back- and forth rescaling becomes perceivable!
The relative error now of course implies, that if the input errors are scaled, the magnitude of the relative errors remains:
# scaled uncertainty
yerr = np.asfarray([0.1, 0.1, 0.1, 0.1]) * 2
popt, pcov = curve_fit(func, x, y, sigma=yerr, absolute_sigma=False)
print('popt:\t\t\t', popt)
print('np.sqrt(np.diag(pcov)):\t', np.sqrt(np.diag(pcov)))
with the same output of:
popt: [0.95 0.25]
np.sqrt(np.diag(pcov)): [0.13228756 0.36228442]
Related
I'm currently comparing several traditional methods for selecting an exponential trend in a series of points. Trends are often selected in my industry without concern for measures of fit, and this means that a common method is to simply measure yr/yr change and average these results over a period of time. This produces a factor, but doesn't produce a fit, so it's difficult to compare it to an exponential regression or other approach. So my question:
If I have a pre-selected, fixed trend factor for an exponential curve, is there a simple method for optimizing the 'intercept' value which would minimize the squared error of the overall fit to a set of data? Consider the following:
import numpy as np
from sklearn.metrics import r2_score
from scipy.optimize import curve_fit
#Define exponential function
def ex(x,a,b):
return a*b**x
#Seed data with normally distributed error
x=np.linspace(1,20,20)
np.random.seed(100)
y=ex(x,100,1.01)+3*np.random.randn(20)
#Exponential regression for trend value, fitted values, and r_sq measure
popt, pcov = curve_fit(ex, x, y)
trend,fit,r_sq=(popt[1])-1, ex(x,*popt), r2_score(y,ex(x,*popt))
#Mean Yr/Yr change as an alternative measure of trend
trend_yryr=np.mean(np.diff(y)/y[1:])
print(trend)
print(trend_yryr)
The mean year/year change produces a different trend value for the data, and I'd like to compare it to the exponential regression's selected trend. My goal is to find the intercept which would minimize squared error for this alternative trend value over the data, and measure that squared error for comparison. Thanks for the help.
One way of fixing a parameter when using curve_fit is to pass a lambda function that hardcodes the parameters you want to fix. Here's how I did it:
# ... all your preamble, but importing matplotlib
new_b = trend_yryr + 1
popt2, pcov2 = curve_fit(lambda x, a: ex(x, a, new_b), x, y)
fit2, r_sq2 = ex(x, *popt2, new_b), r2_score(y, ex(x, *popt2, new_b))
popt is array([100.24989416, 1.00975864]) and popt2 is array([99.09513612]). Plotting the results gives me this:
You could use lmfit to do this perhaps a bit more elegantly, but it's essentially up to you.
from lmfit import Parameters, minimize
def residual(params, x, y):
a = params['a']
b = params['b']
model = ex(x, a, b)
return model - y
p1 = Parameters()
p1.add('a', value=1, vary=True)
p1.add('b', value=1, vary=True)
p2 = Parameters()
p2.add('a', value=1, vary=True)
p2.add('b', value=np.mean(np.diff(y)/y[1:]) + 1, vary=False) # important
out1 = minimize(residual, p1, args=(x, y))
out2 = minimize(residual, p2, args=(x, y))
The outputs of both methods are essentially the same so I won't post them again
I am having a hard time trying to understand why my Gaussian fit to a set of data (ydata) does not work well if I shift the interval of x-values corresponding to that data (xdata1 to xdata2). The Gaussian is written as:
where A is just an amplitude factor. Changing some of the values of the data, it is easy to make it work for both cases, but one can also easily find cases in which it does not work well for xdata1 and also in which covariance of the parameters is not estimated.
I am using scipy.optimize.curve_fit in Spyder with Python 3.7.1 on Windows 7.
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
xdata1 = np.linspace(-9,4,20, endpoint=True) # works fine
xdata2 = xdata1+2
ydata = np.array([8,9,15,12,14,20,24,40,54,94,160,290,400,420,300,130,40,10,8,4])
def gaussian(x, amp, mean, sigma):
return amp*np.exp(-(((x-mean)**2)/(2*sigma**2)))/(sigma*np.sqrt(2*np.pi))
popt1, pcov1 = curve_fit(gaussian, xdata1, ydata)
popt2, pcov2 = curve_fit(gaussian, xdata2, ydata)
fig, ([ax1, ax2]) = plt.subplots(nrows=1, ncols=2,figsize=(9, 4))
ax1.plot(xdata1, ydata, 'b+:', label='xdata1')
ax1.plot(xdata1, gaussian(xdata1, *popt1), 'r-', label='fit')
ax1.legend()
ax2.plot(xdata2, ydata, 'b+:', label='xdata2')
ax2.plot(xdata2, gaussian(xdata2, *popt2), 'r-', label='fit')
ax2.legend()
The problem is your second attempt at fitting a gaussian is getting stuck in a local minimum while searching parameter space: curve_fit is a wrapper for least_squares which uses gradient descent to minimize the cost function and this is liable to get stuck in local minima.
You should try providing reasonable starting parameters (by using the p0 argument of curve_fit) to avoid this:
#... your code
y_max = np.max(y_data)
max_pos = ydata[ydata==y_max][0]
initial_guess = [y_max, max_pos, 1] # amplitude, mean, std
popt2, pcov2 = curve_fit(gaussian, xdata2, ydata, p0=initial_guess)
Which as you can see provides a reasonable fit:
You should write a function which can provide reasonable estimates of the starting parameters. Here I just found the maximum y value and used this to determine the initial parameters. I've found this works well for the fitting normal distributions but you could consider other methods.
Edit:
You can also solve the problem by scaling the amplitude: the amplitude is so large the parameter space is distorted and the gradient descent simply follows the direction of greatest change in the amplitude and effectively ignores the sigma. Consider the following plot in parameter space (Colour is the sum of the squared residuals of the fit for given parameters and the white cross shows the optimal solution):
Make sure to make note of the different scales for the x and y axis.
One needs to make a large number of 'unit' sized steps in y (amplitude) to get to the minimum from the point x,y = (0,0), where as you only need less than one 'unit' sized step to get to the minimum in x (sigma). The algorithm simply takes steps in amplitude as this is the steepest gradient. When it gets to the amplitude which minimises the cost function it simply stops the algorithm as it appears to have converged and makes little or no changes in the sigma parameter.
One way to fix this is to scale your ydata to un-distort the parameter space: divide your ydata by 100 and you will see your fit works without providing any starting parameters!
I want to fit a function with vector output using Scipy's curve_fit (or something more appropriate if available). For example, consider the following function:
import numpy as np
def fmodel(x, a, b):
return np.vstack([a*np.sin(b*x), a*x**2 - b*x, a*np.exp(b/x)])
Each component is a different function but they share the parameters I wish to fit. Ideally, I would do something like this:
x = np.linspace(1, 20, 50)
a = 0.1
b = 0.5
y = fmodel(x, a, b)
y_noisy = y + 0.2 * np.random.normal(size=y.shape)
from scipy.optimize import curve_fit
popt, pcov = curve_fit(f=fmodel, xdata=x, ydata=y_noisy, p0=[0.3, 0.1])
But curve_fit does not work with functions with vector output, and an error Result from function call is not a proper array of floats. is thrown. What I did instead is to flatten out the output like this:
def fmodel_flat(x, a, b):
return fmodel(x[0:len(x)/3], a, b).flatten()
popt, pcov = curve_fit(f=fmodel_flat, xdata=np.tile(x, 3),
ydata=y_noisy.flatten(), p0=[0.3, 0.1])
and this works. If instead of a vector function I am actually fitting several functions with different inputs as well but which share model parameters, I can concatenate both input and output.
Is there a more appropriate way to fit vector function with Scipy or perhaps some additional module? A main consideration for me is efficiency - the actual functions to fit are much more complex and fitting can take some time, so if this use of curve_fit is mangled and is leading to excessive runtimes I would like to know what I should do instead.
If I can be so blunt as to recommend my own package symfit, I think it does precisely what you need. An example on fitting with shared parameters can be found in the docs.
Your specific problem stated above would become:
from symfit import variables, parameters, Model, Fit, sin, exp
x, y_1, y_2, y_3 = variables('x, y_1, y_2, y_3')
a, b = parameters('a, b')
a.value = 0.3
b.value = 0.1
model = Model({
y_1: a * sin(b * x),
y_2: a * x**2 - b * x,
y_3: a * exp(b / x),
})
xdata = np.linspace(1, 20, 50)
ydata = model(x=xdata, a=0.1, b=0.5)
y_noisy = ydata + 0.2 * np.random.normal(size=(len(model), len(xdata)))
fit = Fit(model, x=xdata, y_1=y_noisy[0], y_2=y_noisy[1], y_3=y_noisy[2])
fit_result = fit.execute()
Check out the docs for more!
I think what you're doing is perfectly fine from an efficiency stand point. I'll try to look at the implementation and come up with something more quantitative, but for the time being here is my reasoning.
What you're doing during curve fitting is optimizing the parameters (a,b) such that
res = sum_i |f(x_i; a,b)-y_i|^2
is minimal. By this I mean that you have data points (x_i,y_i) of arbitrary dimensionality, two parameters (a,b) and a fitting model that approximates the data at query points x_i.
The curve fitting algorithm starts from a starting (a,b) pair, puts this into a black box that computes the above square error, and tries to come up with a new (a',b') pair that produces a smaller error. My point is that the error above is really a black box for the fitting algorithm: the configurational space of the fitting is defined merely by the (a,b) parameters. If you imagine how you'd implement a simple curve fitting function, you could imagine that you try to do, say, a gradient descent, with the square error as cost function.
Now, it should be irrelevant to the fitting procedure how the black box computes the error. It's easy to see that the dimensionality of x_i is really irrelevant for scalar functions, since it doesn't matter if you have 1000 1d query points to fit for, or a 10x10x10 grid in 3d space. What matters is that you have 1000 points x_i for which you need to compute f(x_i) ~ y_i from the model.
The only subtlety that should further be noted is that in case of a vector-valued function, the calculation of the error is not trivial. In my opinion, it's fine to define the error at each x_i point using the 2-norm of the vector-valued function. But hey: in this case, the square error at point x_i is
|f(x_i; a,b)-y_i|^2 == sum_k (f(x_i; a,b)[k]-y_i[k])^2
which implies that the square error for each component is accumulated. This just means that what you're doing right now is just right: by replicating your x_i points and taking into account each component of the function individually, your square error will contain exactly the 2-norm of the error at each point.
So my point is what you're doing is mathematically correct, and I don't expect any behaviour of the fitting procedure to depend on the way how multivariate/vector-valued functions are handled.
I have some data that follow a sigmoid distribution as you can see in the following image:
After normalizing and scaling my data, I have adjusted the curve at the bottom using scipy.optimize.curve_fit and some initial parameters:
popt, pcov = curve_fit(sigmoid_function, xdata, ydata, p0 = [0.05, 0.05, 0.05])
>>> print popt
[ 2.82019932e+02 -1.90996563e-01 5.00000000e-02]
So popt, according to the documentation, returns *"Optimal values for the parameters so that the sum of the squared error of f(xdata, popt) - ydata is minimized". I understand here that there is no calculation of the slope with curve_fit, because I do not think the slope of this gentle curve is 282, neither is negative.
Then I tried with scipy.optimize.leastsq, because the documentation says it returns "The solution (or the result of the last iteration for an unsuccessful call).", so I thought the slope would be returned. Like this:
p, cov, infodict, mesg, ier = leastsq(residuals, p_guess, args = (nxdata, nydata), full_output=True)
>>> print p
Param(x0=281.73193626250207, y0=-0.012731420027056234, c=1.0069006606656596, k=0.18836680131910222)
But again, I did not get what I expected. curve_fit and leastsq returned almost the same values, with is not surprising I guess, as curve_fit is using an implementation of the least squares method within to find the curve. But no slope back...unless I overlooked something.
So, how to calculate the slope in a point, say, where X = 285 and Y = 0.5?
I am trying to avoid manual methods, like calculating the derivative in, say, (285.5, 0.55) and (284.5, 0.45) and subtract and divide results and so. I would like to know if there is a more automatic method for this.
Thank you all!
EDIT #1
This is my "sigmoid_function", used by curve_fit and leastsq methods:
def sigmoid_function(xdata, x0, k, p0): # p0 not used anymore, only its components (x0, k)
# This function is called by two different methods: curve_fit and leastsq,
# this last one through function "residuals". I don't know if it makes sense
# to use a single function for two (somewhat similar) methods, but there
# it goes.
# p0:
# + Is the initial parameter for scipy.optimize.curve_fit.
# + For residuals calculation is left empty
# + It is initialized to [0.05, 0.05, 0.05]
# x0:
# + Is the convergence parameter in X-axis and also the shift
# + It starts with 0.05 and ends up being around ~282 (days in a year)
# k:
# + Set up either by curve_fit or leastsq
# + In least squares it is initially fixed at 0.5 and in curve_fit
# + to 0.05. Why? Just did this approach in two different ways and
# + it seems it is working.
# + But honestly, I have no clue on what it represents
# xdata:
# + Positions in X-axis. In this case from 240 to 365
# Finally I changed those parameters as suggested in the answer.
# Sigmoid curve has 2 degrees of freedom, therefore, the initial
# guess only needs to be this size. In this case, p0 = [282, 0.5]
y = np.exp(-k*(xdata-x0)) / (1 + np.exp(-k*(xdata-x0)))
return y
def residuals(p_guess, xdata, ydata):
# For the residuals calculation, there is no need of setting up the initial parameters
# After fixing the initial guess and sigmoid_function header, remove []
# return ydata - sigmoid_function(xdata, p_guess[0], p_guess[1], [])
return ydata - sigmoid_function(xdata, p_guess[0], p_guess[1], [])
I am sorry if I made mistakes while describing the parameters or confused technical terms. I am very new with numpy and I have not studied maths for years, so I am catching up again.
So, again, what is your advice to calculate the slope of X = 285, Y = 0.5 (more or less the midpoint) for this dataset? Thanks!!
EDIT #2
Thanks to Oliver W., I updated my code as he suggested and understood a bit better the problem.
There is a final detail I do not fully get. Apparently, curve_fit returns a popt array (x0, k) with the optimum parameters for the fitting:
x0 seems to be how shifted is the curve by indicating the central point of the curve
k parameter is the slope when y = 0.5, also in the center of the curve (I think!)
Why if the sigmoid function is a growing one, the derivative/slope in popt is negative? Does it make sense?
I used sigmoid_derivative to calculate the slope and, yes, I obtained the same results that popt but with positive sign.
# Year 2003, 2005, 2007. Slope in midpoint.
k = [-0.1910, -0.2545, -0.2259] # Values coming from popt
slope = [0.1910, 0.2545, 0.2259] # Values coming from sigmoid_derivative function
I know this is being a bit peaky because I could use both. The relevant data is in there but with negative sign, but I was wondering why is this happening.
So, the calculation of the derivative function as you suggested, is only required if I need to know the slope in other points than y = 0.5. Only for midpoint, I can use popt.
Thanks for your help, it saved me a lot of time. :-)
You're never using the parameter p0 you're passing to your sigmoid function. Hence, curve fitting will not have any good measure to find convergence, because it can take any value for this parameter. You should first rewrite your sigmoid function like this:
def sigmoid_function(xdata, x0, k):
y = np.exp(-k*(xdata-x0)) / (1 + np.exp(-k*(xdata-x0)))
return y
This means your model (the sigmoid) has only two degrees of freedom. This will be returned in popt:
initial_guess = [282, 1] # (x0, k): at x0, the sigmoid reaches 50%, k is slope related
popt, pcov = curve_fit(sigmoid_function, xdata, ydata, p0=initial_guess)
Now popt will be a tuple (or array of 2 values), being the best possible x0 and k.
To get the slope of this function at any point, to be honest, I would just calculate the derivative symbolically as the sigmoid is not such a hard function. You will end up with:
def sigmoid_derivative(x, x0, k):
f = np.exp(-k*(x-x0))
return -k / f
If you have the results from your curve fitting stored in popt, you could pass this easily to this function:
print(sigmoid_derivative(285, *popt))
which will return for you the derivative at x=285. But, because you ask specifically for the midpoint, so when x==x0 and y==.5, you'll see (from the sigmoid_derivative) that the derivative there is just -k, which can be observed immediately from the curve_fit output you've already obtained. In the output you've shown, that's about 0.19.
I have a data set which in theory is described by a polynomial of the second degree. I would like to fit this data and I have used numpy.polyfit to do this. However, the down side is that the error on the returned coefficients is not available. Therefore I decided to also fit the data using scipy.odr. The weird thing was that the coefficients for the polynomial deviated from each other.
I do not understand this and therefore decided to test both fitting routines on a set of data that I produce my self:
import numpy
import scipy.odr
import matplotlib.pyplot as plt
x = numpy.arange(-20, 20, 0.1)
y = 1.8 * x**2 -2.1 * x + 0.6 + numpy.random.normal(scale = 100, size = len(x))
#Define function for scipy.odr
def fit_func(p, t):
return p[0] * t**2 + p[1] * t + p[2]
#Fit the data using numpy.polyfit
fit_np = numpy.polyfit(x, y, 2)
#Fit the data using scipy.odr
Model = scipy.odr.Model(fit_func)
Data = scipy.odr.RealData(x, y)
Odr = scipy.odr.ODR(Data, Model, [1.5, -2, 1], maxit = 10000)
output = Odr.run()
#output.pprint()
beta = output.beta
betastd = output.sd_beta
print "poly", fit_np
print "ODR", beta
plt.plot(x, y, "bo")
plt.plot(x, numpy.polyval(fit_np, x), "r--", lw = 2)
plt.plot(x, fit_func(beta, x), "g--", lw = 2)
plt.tight_layout()
plt.show()
An example of an outcome is as follows:
poly [ 1.77992643 -2.42753714 3.86331152]
ODR [ 3.8161735 -23.08952492 -146.76214989]
In the included image, the solution of numpy.polyfit (red dashed line) corresponds pretty well. The solution of scipy.odr (green dashed line) is basically completely off. I do have to note that the difference between numpy.polyfit and scipy.odr was less in the actual data set I wanted to fit. However, I do not understand where the difference between the two comes from, why in my own testing example the difference is extremely big, and which fitting routine is better?
I hope you can provide answers that might help me give a better understanding between the two fitting routines and in the process provide answers to the questions I have.
In the way you are using ODR it does a full orthogonal distance regression. To have it do a normal nonlinear least squares fit add
Odr.set_job(fit_type=2)
before starting the optimization and you will get what you expected.
The reason that the full ODR fails so badly is due to not specifying weights/standard deviations. Obviously it does hard to interpret that point cloud and assumes equal wheights for x and y. If you provide estimated standard deviations, odr will yield a good (though different of course) result, too.
Data = scipy.odr.RealData(x, y, sx=0.1, sy=10)
The actual problem is that the odr output has the beta coefficients in the opposite order than numpy.polyfit has. So the green curve is not calculated correctly. To plot it, use instead
plt.plot(x, fit_func(beta[::-1], x), "g--", lw = 2)