Fitting a step function - python

I am trying to fit a step function using scipy.optimize.leastsq. Consider the following example:
import numpy as np
from scipy.optimize import leastsq
def fitfunc(p, x):
y = np.zeros(x.shape)
y[x < p[0]] = p[1]
y[p[0] < x] = p[2]
return y
errfunc = lambda p, x, y: fitfunc(p, x) - y # Distance to the target function
x = np.arange(1000)
y = np.random.random(1000)
y[x < 250.] -= 10
p0 = [500.,0.,0.]
p1, success = leastsq(errfunc, p0, args=(x, y))
print p1
the parameters are the location of the step and the level on either side. What's strange is that the first free parameter never varies, if you run that scipy will give
[ 5.00000000e+02 -4.49410173e+00 4.88624449e-01]
when the first parameter would be optimal when set to 250 and the second to -10.
Does anyone have any insight as to why this might not be working and how to get it to work?
If I run
print np.sum(errfunc(p1, x, y)**2.)
print np.sum(errfunc([250.,-10.,0.], x, y)**2.)
I find:
12547.1054663
320.679545235
where the first number is what leastsq is finding, and the second is the value for the actual optimal function it should be finding.

It turns out that the fitting is much better if I add the epsfcn= argument to leastsq:
p1, success = leastsq(errfunc, p0, args=(x, y), epsfcn=10.)
and the result is
[ 248.00000146 -8.8273455 0.40818216]
My basic understanding is that the first free parameter has to be moved more than the spacing between neighboring points to affect the square of the residuals, and epsfcn has something to do with how big steps to use to find the gradient, or something similar.

I don't think that least squares fitting is the way to go about coming up with an approximation for a step. I don't believe it will give you a satisfactory description of the discontinuity. Least squares would not be my first thought when attacking this problem.
Why wouldn't you use a Fourier series approximation instead? You'll always be stuck with Gibbs' phenomenon at the discontinuity, but the rest of the function can be approximated as well as you and your CPU can afford.
What exactly are you going to use this for? Some context might help.

I propose approximating the step function. Instead of
inifinite slope at the "change point" make it linear over
one x distance (1.0 in the example). E.g. if the x
parameter, xp, for the function is defined as the midpoint
on this line then the value at xp-0.5 is the lower y value
and the value at xp+0.5 is the higher y value and
intermediate values of the function in the
interval [xp-0.5; xp+0.5] is a linear
interpolation between these two points.
If it can be assumed that the step function (or its
approximation) goes from a lower value to a higher value
then I think the initial guess for the last two parameters
should be the lowest y value and the highest y value
respectively instead of 0.0 and 0.0.
I have 2 corrections:
1) np.random.random() returns random numbers in the range
0.0 to 1.0. Thus the mean is +0.5 and is also the value of
the third parameter (instead 0.0). And the second paramter
is then -9.5 (+0.5 - 10.0) instead of -10.0.
Thus
print np.sum(errfunc([250.,-10.,0.], x, y)**2.)
should be
print np.sum(errfunc([250.,-9.5,0.5], x, y)**2.)
2) In the original fitfunc() one value of y becomes 0.0 if x
is exactly equal to p[0]. Thus it is not a step function in
that case (more like a sum of two step functions). E.g. this
happens when the start value of the first parameter is 500.

Most probably your optimization is stuck in a local minima. I don't know what leastsq really works like, but if you give it an initial estimate of (0, 0, 0), it gets stuck there, too.
You can check the gradient at the initial estimate numerically (evaluate at +/- epsilon for a very small epsilon and divide bei 2*epsilon, take difference) and I bet it will be sth around 0.

use statsmodel ols. ols uses ordinary least square for curve fitting

Related

How to minimize the error to a given dataset

lets assume a function
f(x,y) = z
Now I want to choose x so that the output of f matches real data, and y decreases in equidistant steps to zero starting from 1. The output is calculated in the function f by a set of differential equations.
How can I select x so that the error to the real outputs is as small as possible. Assuming I know a set of z - values, namely
f(x,1) = z_1
f(x,0.9) = z_2
f(x,0.8) = z_3
now find x, that the error to the real data z_1,z_2,z_3 is minimal.
How can one do this?
A common method of optimizing is least squares fitting, in which you would basically try to find params such that the sum of squares: sum (f(params,xdata_i) - ydata_i))^2 is minimized for given xdata and ydata. In your case: params would be x, xdata_i would be 1, 0.9 and 0.8 and ydata_i z_1, z_2 and z_3.
You should consider the package scipy.optimize. It's used in finding parameters for a function. I think this page gives quite a good example on how to use it.

SciPy + Numpy: Finding the slope of a sigmoid curve

I have some data that follow a sigmoid distribution as you can see in the following image:
After normalizing and scaling my data, I have adjusted the curve at the bottom using scipy.optimize.curve_fit and some initial parameters:
popt, pcov = curve_fit(sigmoid_function, xdata, ydata, p0 = [0.05, 0.05, 0.05])
>>> print popt
[ 2.82019932e+02 -1.90996563e-01 5.00000000e-02]
So popt, according to the documentation, returns *"Optimal values for the parameters so that the sum of the squared error of f(xdata, popt) - ydata is minimized". I understand here that there is no calculation of the slope with curve_fit, because I do not think the slope of this gentle curve is 282, neither is negative.
Then I tried with scipy.optimize.leastsq, because the documentation says it returns "The solution (or the result of the last iteration for an unsuccessful call).", so I thought the slope would be returned. Like this:
p, cov, infodict, mesg, ier = leastsq(residuals, p_guess, args = (nxdata, nydata), full_output=True)
>>> print p
Param(x0=281.73193626250207, y0=-0.012731420027056234, c=1.0069006606656596, k=0.18836680131910222)
But again, I did not get what I expected. curve_fit and leastsq returned almost the same values, with is not surprising I guess, as curve_fit is using an implementation of the least squares method within to find the curve. But no slope back...unless I overlooked something.
So, how to calculate the slope in a point, say, where X = 285 and Y = 0.5?
I am trying to avoid manual methods, like calculating the derivative in, say, (285.5, 0.55) and (284.5, 0.45) and subtract and divide results and so. I would like to know if there is a more automatic method for this.
Thank you all!
EDIT #1
This is my "sigmoid_function", used by curve_fit and leastsq methods:
def sigmoid_function(xdata, x0, k, p0): # p0 not used anymore, only its components (x0, k)
# This function is called by two different methods: curve_fit and leastsq,
# this last one through function "residuals". I don't know if it makes sense
# to use a single function for two (somewhat similar) methods, but there
# it goes.
# p0:
# + Is the initial parameter for scipy.optimize.curve_fit.
# + For residuals calculation is left empty
# + It is initialized to [0.05, 0.05, 0.05]
# x0:
# + Is the convergence parameter in X-axis and also the shift
# + It starts with 0.05 and ends up being around ~282 (days in a year)
# k:
# + Set up either by curve_fit or leastsq
# + In least squares it is initially fixed at 0.5 and in curve_fit
# + to 0.05. Why? Just did this approach in two different ways and
# + it seems it is working.
# + But honestly, I have no clue on what it represents
# xdata:
# + Positions in X-axis. In this case from 240 to 365
# Finally I changed those parameters as suggested in the answer.
# Sigmoid curve has 2 degrees of freedom, therefore, the initial
# guess only needs to be this size. In this case, p0 = [282, 0.5]
y = np.exp(-k*(xdata-x0)) / (1 + np.exp(-k*(xdata-x0)))
return y
def residuals(p_guess, xdata, ydata):
# For the residuals calculation, there is no need of setting up the initial parameters
# After fixing the initial guess and sigmoid_function header, remove []
# return ydata - sigmoid_function(xdata, p_guess[0], p_guess[1], [])
return ydata - sigmoid_function(xdata, p_guess[0], p_guess[1], [])
I am sorry if I made mistakes while describing the parameters or confused technical terms. I am very new with numpy and I have not studied maths for years, so I am catching up again.
So, again, what is your advice to calculate the slope of X = 285, Y = 0.5 (more or less the midpoint) for this dataset? Thanks!!
EDIT #2
Thanks to Oliver W., I updated my code as he suggested and understood a bit better the problem.
There is a final detail I do not fully get. Apparently, curve_fit returns a popt array (x0, k) with the optimum parameters for the fitting:
x0 seems to be how shifted is the curve by indicating the central point of the curve
k parameter is the slope when y = 0.5, also in the center of the curve (I think!)
Why if the sigmoid function is a growing one, the derivative/slope in popt is negative? Does it make sense?
I used sigmoid_derivative to calculate the slope and, yes, I obtained the same results that popt but with positive sign.
# Year 2003, 2005, 2007. Slope in midpoint.
k = [-0.1910, -0.2545, -0.2259] # Values coming from popt
slope = [0.1910, 0.2545, 0.2259] # Values coming from sigmoid_derivative function
I know this is being a bit peaky because I could use both. The relevant data is in there but with negative sign, but I was wondering why is this happening.
So, the calculation of the derivative function as you suggested, is only required if I need to know the slope in other points than y = 0.5. Only for midpoint, I can use popt.
Thanks for your help, it saved me a lot of time. :-)
You're never using the parameter p0 you're passing to your sigmoid function. Hence, curve fitting will not have any good measure to find convergence, because it can take any value for this parameter. You should first rewrite your sigmoid function like this:
def sigmoid_function(xdata, x0, k):
y = np.exp(-k*(xdata-x0)) / (1 + np.exp(-k*(xdata-x0)))
return y
This means your model (the sigmoid) has only two degrees of freedom. This will be returned in popt:
initial_guess = [282, 1] # (x0, k): at x0, the sigmoid reaches 50%, k is slope related
popt, pcov = curve_fit(sigmoid_function, xdata, ydata, p0=initial_guess)
Now popt will be a tuple (or array of 2 values), being the best possible x0 and k.
To get the slope of this function at any point, to be honest, I would just calculate the derivative symbolically as the sigmoid is not such a hard function. You will end up with:
def sigmoid_derivative(x, x0, k):
f = np.exp(-k*(x-x0))
return -k / f
If you have the results from your curve fitting stored in popt, you could pass this easily to this function:
print(sigmoid_derivative(285, *popt))
which will return for you the derivative at x=285. But, because you ask specifically for the midpoint, so when x==x0 and y==.5, you'll see (from the sigmoid_derivative) that the derivative there is just -k, which can be observed immediately from the curve_fit output you've already obtained. In the output you've shown, that's about 0.19.

Fitting Fresnel Equations Using Scipy

I am attempting a non-linear fit of Fresnel equations with data of reflectance against angle of incidence. Found on this site http://en.wikipedia.org/wiki/Fresnel_equations are two graphs that have a red and blue line. I need to basically fit the blue line when n1 = 1 to my data.
Here I use the following code where th is theta, the angle of incidence.
def Rperp(th, n, norm, constant):
numerator = np.cos(th) - np.sqrt(n**2.0 - np.sin(th)**2.0)
denominator = 1.0 * np.cos(th) + np.sqrt(n**2.0 - np.sin(th)**2.0)
return ((numerator / denominator)**2.0) * norm + constant
The parameters I'm looking for are:
the index of refraction n
some normalization to multiply by and
a constant to shift the baseline of the graph.
My attempt is the following:
xdata = angle[1:] * 1.0 # angle of incidence
ydata = greenDD[1:] # reflectance
params = curve_fit(Rperp, xdata, ydata)
What I get is a division of zero apparently and gives me [1, 1, 1] for the parameters. The Fresnel equation itself is the bit without the normalizer and the constant in Rperp. Theta in the equation is the angle of incidence also. Overall I am just not sure if I am doing this right at all to get the parameters.
The idea seems to be the first parameter in the function is the independent variable and the rest are the dependent variables going to be found. Then you just plug into scipy's curve_fit and it will give you a fit to your data for the parameters. If it is just getting around division of zero, which I had though might be integer division, then it seems like I should be set. Any help is appreciated and let me know if things need to be clarified (such as np is numpy).
Make sure to pass the arguments to the trigonometric functions, like sine, in radians, not degrees.
As for why you're getting a negative refractive index returned: it is because in your function, you're always squaring the refractive index. The curve_fit algorithm might end up in a local minimum state where (by accident) n is negative, because it has the same value as n positive.
Ideally, you'd add constraints to the minimization problem, but for this (simple) problem, just observe your formula and remember that a result of negative n is simply solved by changing the sign, as you did.
You could also try passing an initial guess to the algorithm and you might observe that it will not end up in the local minimum with negative value.

Integral of Intensity function in python

There is a function which determine the intensity of the Fraunhofer diffraction pattern of a circular aperture... (more information)
Integral of the function in distance x= [-3.8317 , 3.8317] must be about 83.8% ( If assume that I0 is 100) and when you increase the distance to [-13.33 , 13.33] it should be about 95%.
But when I use integral in python, the answer is wrong.. I don't know what's going wrong in my code :(
from scipy.integrate import quad
from scipy import special as sp
I0=100.0
dist=3.8317
I= quad(lambda x:( I0*((2*sp.j1(x)/x)**2)) , -dist, dist)[0]
print I
Result of the integral can't be bigger than 100 (I0) because this is the diffraction of I0 ... I don't know.. may be scaling... may be the method! :(
The problem seems to be in the function's behaviour near zero. If the function is plotted, it looks smooth:
However, scipy.integrate.quad complains about round-off errors, which is very strange with this beautiful curve. However, the function is not defined at 0 (of course, you are dividing by zero!), hence the integration does not go well.
You may use a simpler integration method or do something about your function. You may also be able to integrate it to very close to zero from both sides. However, with these numbers the integral does not look right when looking at your results.
However, I think I have a hunch of what your problem is. As far as I remember, the integral you have shown is actually the intensity (power/area) of Fraunhofer diffraction as a function of distance from the center. If you want to integrate the total power within some radius, you will have to do it in two dimensions.
By simple area integration rules you should multiply your function by 2 pi r before integrating (or x instead of r in your case). Then it becomes:
f = lambda(r): r*(sp.j1(r)/r)**2
or
f = lambda(r): sp.j1(r)**2/r
or even better:
f = lambda(r): r * (sp.j0(r) + sp.jn(2,r))
The last form is best as it does not suffer from any singularities. It is based on Jaime's comment to the original answer (see the comment below this answer!).
(Note that I omitted a couple of constants.) Now you can integrate it from zero to infinity (no negative radii):
fullpower = quad(f, 1e-9, np.inf)[0]
Then you can integrate from some other radius and normalize by the full intensity:
pwr = quad(f, 1e-9, 3.8317)[0] / fullpower
And you get 0.839 (which is quite close to 84 %). If you try the farther radius (13.33):
pwr = quad(f, 1e-9, 13.33)
which gives 0.954.
It should be noted that we introduce a small error by starting the integration from 1e-9 instead of 0. The magnitude of the error can be estimated by trying different values for the starting point. The integration result changes very little between 1e-9 and 1e-12, so they seem to be safe. Of course, you could use, e.g., 1e-30, but then there may be numerical instability in the division. (In this case there isn't, but in general singularities are numerically evil.)
Let us do one thing still:
import matplotlib.pyplot as plt
import numpy as np
x = linspace(0.01, 20, 1000)
intg = np.array([ quad(f, 1e-9, xx)[0] for xx in x])
plt.plot(x, intg/fullpower)
plt.grid('on')
plt.show()
And this is what we get:
At least this looks right, the dark fringes of the Airy disk are clearly visible.
What comes to the last part of the question: I0 defines the maximum intensity (the units may be, e.g. W/m2), whereas the integral gives total power (if the intensity is in W/m2, the total power is in W). Setting the maximum intensity to 100 does not guarantee anything about the total power. That is why it is important to calculate the total power.
There actually exists a closed form equation for the total power radiated onto a circular area:
P(x) = P0 ( 1 - J0(x)^2 - J1(x)^2 ),
where P0 is the total power.
Note that you also can get a closed form solution for your integration using Sympy:
import sympy as sy
sy.init_printing() # LaTeX like pretty printing in IPython
x,d = sy.symbols("x,d", real=True)
I0=100
dist=3.8317
f = I0*((2*sy.besselj(1,x)/x)**2) # the integrand
F = f.integrate((x, -d, d)) # symbolic integration
print(F.evalf(subs={d:dist})) # numeric evalution
F evaluates to:
1600*d*besselj(0, Abs(d))**2/3 + 1600*d*besselj(1, Abs(d))**2/3 - 800*besselj(1, Abs(d))**2/(3*d)
with besselj(0,r) corresponding to sp.j0(r).
They might be a singularity in the integration algorithm when doing the jacobian at x = 0. You can exclude this points from the integration with "points":
f = lambda x:( I0*((2*sp.j1(x)/x)**2))
I = quad(f, -dist, dist, points = [0])
I get then the following result (is this your desired result?)
331.4990321315221

Fitting curve: why small numbers are better?

I spent some time these days on a problem. I have a set of data:
y = f(t), where y is very small concentration (10^-7), and t is in second. t varies from 0 to around 12000.
The measurements follow an established model:
y = Vs * t - ((Vs - Vi) * (1 - np.exp(-k * t)) / k)
And I need to find Vs, Vi, and k. So I used curve_fit, which returns the best fitting parameters, and I plotted the curve.
And then I used a similar model:
y = (Vs * t/3600 - ((Vs - Vi) * (1 - np.exp(-k * t/3600)) / k)) * 10**7
By doing that, t is a number of hour, and y is a number between 0 and about 10. The parameters returned are of course different. But when I plot each curve, here is what I get:
http://i.imgur.com/XLa4LtL.png
The green fit is the first model, the blue one with the "normalized" model. And the red dots are the experimental values.
The fitting curves are different. I think it's not expected, and I don't understand why. Are the calculations more accurate if the numbers are "reasonnable" ?
The docstring for optimize.curve_fit says,
p0 : None, scalar, or M-length sequence
Initial guess for the parameters. If None, then the initial
values will all be 1 (if the number of parameters for the function
can be determined using introspection, otherwise a ValueError
is raised).
Thus, to begin with, the initial guess for the parameters is by default 1.
Moreover, curve fitting algorithms have to sample the function for various values of the parameters. The "various values" are initially chosen with an initial step size on the order of 1. The algorithm will work better if your data varies somewhat smoothly with changes in the parameter values that on the order of 1.
If the function varies wildly with parameter changes on the order of 1, then the algorithm may tend to miss the optimum parameter values.
Note that even if the algorithm uses an adaptive step size when it tweaks the parameter values, if the initial tweak is so far off the mark as to produce a big residual, and if tweaking in some other direction happens to produce a smaller residual, then the algorithm may wander off in the wrong direction and miss the local minimum. It may find some other (undesired) local minimum, or simply fail to converge. So using an algorithm with an adaptive step size won't necessarily save you.
The moral of the story is that scaling your data can improve the algorithm's chances of of finding the desired minimum.
Numerical algorithms in general all tend to work better when applied to data whose magnitude is on the order of 1. This bias enters into the algorithm in numerous ways. For instance, optimize.curve_fit relies on optimize.leastsq, and the call signature for optimize.leastsq is:
def leastsq(func, x0, args=(), Dfun=None, full_output=0,
col_deriv=0, ftol=1.49012e-8, xtol=1.49012e-8,
gtol=0.0, maxfev=0, epsfcn=None, factor=100, diag=None):
Thus, by default, the tolerances ftol and xtol are on the order of 1e-8. If finding the optimum parameter values require much smaller tolerances, then these hard-coded default numbers will cause optimize.curve_fit to miss the optimize parameter values.
To make this more concrete, suppose you were trying to minimize f(x) = 1e-100*x**2. The factor of 1e-100 squashes the y-values so much that a wide range of x-values (the parameter values mentioned above) will fit within the tolerance of 1e-8. So, with un-ideal scaling, leastsq will not do a good job of finding the minimum.
Another reason to use floats on the order of 1 is because there are many more (IEEE754) floats in the interval [-1,1] than there are far away from 1. For example,
import struct
def floats_between(x, y):
"""
http://stackoverflow.com/a/3587987/190597 (jsbueno)
"""
a = struct.pack("<dd", x, y)
b = struct.unpack("<qq", a)
return b[1] - b[0]
In [26]: floats_between(0,1) / float(floats_between(1e6,1e7))
Out[26]: 311.4397707054894
This shows there are over 300 times as many floats representing numbers between 0 and 1 than there are in the interval [1e6, 1e7].
Thus, all else being equal, you'll typically get a more accurate answer if working with small numbers than very large numbers.
I would imagine it has more to do with the initial parameter estimates you are passing to curve fit. If you are not passing any I believe they all default to 1. Normalizing your data makes those initial estimates closer to the truth. If you don't want to use normalized data just pass the initial estimates yourself and give them reasonable values.
Others have already mentioned that you probably need to have a good starting guess for your fit. In cases like this is, I usually try to find some quick and dirty tricks to get at least a ballpark estimate of the parameters. In your case, for large t, the exponential decays pretty quickly to zero, so for large t, you have
y == Vs * t - (Vs - Vi) / k
Doing a first-order linear fit like
[slope1, offset1] = polyfit(t[t > 2000], y[t > 2000], 1)
you will get slope1 == Vs and offset1 == (Vi - Vs) / k.
Subtracting this straight line from all the points you have, you get the exponential
residual == y - slope1 * t - offset1 == (Vs - Vi) * exp(-t * k)
Taking the log of both sides, you get
log(residual) == log(Vs - Vi) - t * k
So doing a second fit
[slope2, offset2] = polyfit(t, log(y - slope1 * t - offset1), 1)
will give you slope2 == -k and offset2 == log(Vs - Vi), which should be solvable for Vi since you already know Vs. You might have to limit the second fit to small values of t, otherwise you might be taking the log of negative numbers. Collect all the parameters you obtained with these fits and use them as the starting points for your curve_fit.
Finally, you might want to look into doing some sort of weighted fit. The information about the exponential part of your curve is contained in just the first few points, so maybe you should give those a higher weight. Doing this in a statistically correct way is not trivial.

Categories

Resources