How to fit any non-linear function in Python?

I have already checked post1, post2, post3 and post4, but they didn't help.
I have data about a specific plant, including two variables called "Age" and "Height". The correlation between them is non-linear.
To fit a model, one solution I assume is as follows:
If the non-linear function is
Height = a * Age^2 + b * Age + c
then we can bring in a new variable K where
K = Age^2
so we have changed the first non-linear function into a multilinear regression one. Based on this, I have the following code:
from sklearn.linear_model import LinearRegression

data['K'] = data["Age"].pow(2)   # K = Age^2
x = data[["Age", "K"]]
y = data["Height"]
model = LinearRegression().fit(x, y)
print(model.score(x, y))  # = 0.9908571840250205
Am I doing this correctly? And how would I do the same with cubic and exponential functions?
Thanks.

For cubic polynomials, add the higher powers as extra columns:
data['x2'] = data["Age"].pow(2)
data['x3'] = data["Age"].pow(3)
x = data[["Age", "x2","x3"]]
y = data["Height"]
model = LinearRegression().fit(x, y)
print(model.score(x, y))
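As a side note, scikit-learn can generate the power columns automatically. A minimal sketch, assuming the same data frame as in the question:

from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

X = data[["Age"]]
y = data["Height"]
# PolynomialFeatures builds the Age, Age^2, Age^3 columns for you
cubic_model = make_pipeline(PolynomialFeatures(degree=3), LinearRegression())
cubic_model.fit(X, y)
print(cubic_model.score(X, y))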
You can handle exponential data by fitting log(y).
Alternatively, find a library that can fit polynomials automatically, e.g.: https://numpy.org/doc/stable/reference/generated/numpy.polyfit.html
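For the exponential case, here is a minimal sketch of the log-transform trick (assuming the same data frame, and strictly positive heights): if Height ≈ a * exp(b * Age), then log(Height) = log(a) + b * Age, which is an ordinary linear regression.

import numpy as np
from sklearn.linear_model import LinearRegression

x = data[["Age"]]
log_y = np.log(data["Height"])  # requires Height > 0
model = LinearRegression().fit(x, log_y)
b = model.coef_[0]
a = np.exp(model.intercept_)
height_pred = a * np.exp(b * data["Age"])  # predictions on the original scale

# and numpy.polyfit fits polynomials directly:
coeffs = np.polyfit(data["Age"], data["Height"], deg=3)
height_poly = np.polyval(coeffs, data["Age"])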

Hopefully you don't have a religious fervor for using sklearn here, because the answer I'm going to suggest ignores it completely.
If you're interested in doing regression analysis where you get complete autonomy over the fitting function, I'd suggest cutting directly down to the least-squares optimization algorithm that drives a lot of this type of work, which you can do using scipy:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import leastsq

x, y = np.array([0, 1, 2, 3, 4, 5]), np.array([0, 1, 4, 9, 16, 25])

# initial_guess[i] maps to p[i] in function_to_fit; it must be reasonable
initial_guess = [1, 1, 1]

def function_to_fit(x, p):
    return pow(p[0]*x, 2) + p[1]*x + p[2]

def residuals(p, y, x):
    return y - function_to_fit(x, p)

cnsts = leastsq(
    residuals,
    initial_guess,
    args=(y, x)
)[0]

fig, ax = plt.subplots()
ax.plot(x, y, 'o')
xi = np.arange(0, 10, 0.1)
ax.plot(xi, [function_to_fit(xv, cnsts) for xv in xi])
plt.show()
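As a side note, scipy's curve_fit wraps this same least-squares machinery behind a friendlier interface and also returns a covariance estimate. A sketch of the equivalent fit, reusing the quadratic model above:

from scipy.optimize import curve_fit

def f(x, p0, p1, p2):
    return pow(p0*x, 2) + p1*x + p2

popt, pcov = curve_fit(f, x, y, p0=initial_guess)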
Now, this is a numeric approach to the solution, so I would recommend taking a moment to make sure you understand the limitations of such an approach. But for problems like these, I've found it more than adequate for functionalizing non-linear data sets, without trying to do some hand-waving to make it fit inside a linearizable manifold.

Related

How do I obtain exact value from scipy interpolation/matplotlib? [duplicate]

I am trying to invert an interpolated function using scipy's interpolate module. Let's say I create an interpolated function:
import scipy.interpolate as interpolate
interpolatedfunction = interpolate.interp1d(xvariable, data, kind='cubic')
Is there some function that can find x when I specify a:
interpolatedfunction(x) == a
In other words, "I want my interpolated function to equal a; what is the value of xvariable such that my function is equal to a?"
I appreciate I can do this with some numerical scheme, but is there a more straightforward method? What if the interpolated function is multivalued in xvariable?
There are dedicated methods for finding the roots of cubic splines. The simplest to use is the .roots() method of an InterpolatedUnivariateSpline object:
from scipy.interpolate import InterpolatedUnivariateSpline

spl = InterpolatedUnivariateSpline(x, y)
roots = spl.roots()
This finds all of the roots at once, unlike generic solvers (fsolve, brentq, newton, bisect, etc.), which return only one root at a time.
import numpy as np

x = np.arange(20)
y = np.cos(np.arange(20))
spl = InterpolatedUnivariateSpline(x, y)
print(spl.roots())
outputs array([ 1.56669456, 4.71145244, 7.85321627, 10.99554642, 14.13792756, 17.28271674])
However, you want to equate the spline to some arbitrary number a, rather than 0. One option is to rebuild the spline from shifted data (you can't just subtract a from the spline object):
solutions = InterpolatedUnivariateSpline(x, y - a).roots()
Note that none of this will work with the function returned by interp1d; it does not have a roots method. For that function, using generic methods like fsolve is an option, but you will only get one root at a time. In any case, why use interp1d for cubic splines when there are more powerful ways to do the same kind of interpolation?
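As a sketch of one such alternative: scipy's CubicSpline subclasses PPoly, which has a solve method for spline(x) == a directly, with no rebuilding needed.

import numpy as np
from scipy.interpolate import CubicSpline

x = np.arange(20)
y = np.cos(x)
spl = CubicSpline(x, y)
print(spl.solve(0.5))  # all x in range where the spline equals 0.5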
Non-object-oriented way
Instead of rebuilding the spline after subtracting a from data, one can directly subtract a from spline coefficients. This requires us to drop down to non-object-oriented interpolation methods. Specifically, sproot takes in a tck tuple prepared by splrep, as follows:
from scipy.interpolate import splrep, sproot

tck = splrep(x, y, k=3, s=0)
tck_mod = (tck[0], tck[1] - a, tck[2])
solutions = sproot(tck_mod)
I'm not sure if messing with tck is worth the gain here, as it's possible that the bulk of computation time will be in root-finding anyway. But it's good to have alternatives.
After creating an interpolated function interp_fn, you can find the value of x where interp_fn(x) == a by finding the roots of the offset function
interp_fn2 = lambda x: interp_fn(x) - a
There are a number of options to find the roots in scipy.optimize. For instance, to use Newton's method with the initial value starting at 10:
from scipy import optimize
optimize.newton(interp_fn2, 10)
Actual example
Create an interpolated function and then find the roots where fn(x) == 5
import numpy as np
import matplotlib.pyplot as plt
from scipy import interpolate, optimize
x = np.arange(10)
y = 1 + 6*np.arange(10) - np.arange(10)**2
y2 = 5*np.ones_like(x)
plt.scatter(x,y)
plt.plot(x,y)
plt.plot(x,y2,'k-')
plt.show()
# create the interpolated function, and then the offset
# function used to find the roots
interp_fn = interpolate.interp1d(x, y, 'quadratic')
interp_fn2 = lambda x: interp_fn(x)-5
# to find the roots, we need to supply a starting value
# because there are more than 1 root in our range, we need
# to supply multiple starting values. They should be
# fairly close to the actual root
root1, root2 = optimize.newton(interp_fn2, 1), optimize.newton(interp_fn2, 5)
root1, root2
# returns:
(0.76393202250021064, 5.2360679774997898)
If your data are monotonic you might also try the following:
inversefunction = interpolate.interp1d(data, xvariable, kind='cubic')
Mentioning another option because I found this page in a Google search, and this option works for my simple use case. Hopefully it'll be of use to someone.
If the function you're interpolating is very simple and always has a 1:1 relationship between y and x, then you can simply swap x and y when you pass them into interp1d, and then call the interpolation function in that direction.
Adapting code from https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.interp1d.html
import numpy as np
import matplotlib.pyplot as plt
from scipy import interpolate
x = np.arange(0, 10)
y = np.exp(-x/3.0)
f = interpolate.interp1d(x, y)
xnew = np.arange(0, 9, 0.1)
ynew = f(xnew)
plt.plot(x, y, 'o', xnew, ynew, '-')
plt.show()
When x and y have been swapped, you can call the swapped interpolation function f(a) to get the x value at which that y value would occur.
f = interpolate.interp1d(y, x)
xnew = np.arange(np.exp(-9/3), np.exp(0), 0.01)
ynew = f(xnew)
plt.plot(y, x, 'o', xnew, ynew, '-')
plt.title("Inverted")
plt.show()
Of course, if the function ever has multiple x values for a given y value (like sine or a parabola) then this will not work because it will no longer be a 1:1 function from x to y, and the above answers are necessary. This is just a simplification in a limited use case.

Finding a better scipy curve_fit! -- exponential function?

I am trying to fit data using scipy curve_fit. I believe that negative exponential is probably best, as this works well for some of my other (similarly generated) data -- but I am achieving sub-optimal results.
I've normalized the dataset to avoid supplying initial values and am applying an exponential function as follows:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
data = np.array([[0.32,0.38],[0.61,0.32],[0.28,0.50],[0.60,0.32],[0.26,0.45],[0.19,0.57],[0.61,0.32],[0.59,0.29],[0.39,0.42],[0.61,0.32],[0.20,0.46],[0.24,0.45],[0.59,0.29],[0.39,0.42],[0.56,0.39],[0.32,0.43],[0.38,0.44],[0.54,0.34],[0.61,0.32],[0.20,0.46],[0.28,0.51],[0.54,0.34],[0.60,0.32],[0.30,0.42],[0.28,0.43],[0.14,0.57],[0.24,0.54],[0.39,0.42],[0.20,0.56],[0.56,0.39],[0.24,0.54],[0.33,0.37],[0.33,0.51],[0.20,0.46],[0.32,0.39],[0.20,0.56],[0.19,0.57],[0.32,0.39],[0.30,0.42],[0.33,0.50],[0.54,0.34],[0.28,0.50],[0.32,0.39],[0.28,0.43],[0.27,0.42],[0.56,0.39],[0.19,0.57],[0.19,0.57],[0.60,0.32],[0.44,0.41],[0.27,0.42],[0.19,0.57],[0.24,0.38],[0.24,0.54],[0.61,0.32],[0.39,0.40],[0.30,0.41],[0.19,0.57],[0.14,0.57],[0.32,0.43],[0.14,0.57],[0.59,0.29],[0.44,0.41],[0.30,0.41],[0.32,0.38],[0.61,0.32],[0.20,0.46],[0.20,0.56],[0.30,0.41],[0.33,0.36],[0.14,0.57],[0.19,0.57],[0.46,0.38],[0.36,0.44],[0.61,0.32],[0.31,0.48],[0.60,0.32],[0.39,0.40],[0.14,0.57],[0.44,0.41],[0.24,0.49],[0.41,0.40],[0.19,0.57],[0.19,0.57],[0.31,0.49],[0.31,0.43],[0.35,0.35],[0.20,0.46],[0.54,0.34],[0.20,0.56],[0.39,0.44],[0.33,0.36],[0.20,0.56],[0.30,0.41],[0.56,0.39],[0.31,0.48],[0.28,0.51],[0.14,0.57],[0.61,0.32],[0.30,0.50],[0.20,0.56],[0.19,0.57],[0.59,0.31],[0.20,0.56],[0.27,0.42],[0.29,0.48],[0.56,0.39],[0.32,0.39],[0.20,0.56],[0.59,0.29],[0.24,0.49],[0.56,0.39],[0.60,0.32],[0.35,0.35],[0.28,0.50],[0.46,0.38],[0.14,0.57],[0.54,0.34],[0.32,0.38],[0.26,0.45],[0.26,0.45],[0.39,0.42],[0.19,0.57],[0.28,0.51],[0.27,0.42],[0.33,0.50],[0.54,0.34],[0.39,0.40],[0.19,0.57],[0.33,0.36],[0.22,0.44],[0.33,0.51],[0.61,0.32],[0.28,0.51],[0.25,0.50],[0.39,0.40],[0.34,0.35],[0.59,0.31],[0.31,0.49],[0.20,0.46],[0.39,0.46],[0.20,0.50],[0.32,0.39],[0.30,0.41],[0.23,0.44],[0.29,0.53],[0.28,0.50],[0.31,0.48],[0.61,0.32],[0.54,0.34],[0.28,0.53],[0.56,0.39],[0.19,0.57],[0.14,0.57],[0.59,0.29],[0.29,0.48],[0.44,0.41],[0.27,0.51],[0.50,0.29],[0.14,0.57],[0.60,0.32],[0.32,0.39],[0.19,0.57],[0.24,0.38],[0.56,0.39],[0.14,0.57],[0.54,0.34],[0.61,0.38],[0.27,0.53],[0.20,0.46],[0.61,0.32],[0.27,0.42],[0.27,0.42],[0.20,0.56],[0.30,0.41],[0.31,0.51],[0.32,0.39],[0.31,0.51],[0.29,0.48],[0.20,0.46],[0.33,0.51],[0.31,0.43],[0.30,0.41],[0.27,0.44],[0.31,0.51],[0.29,0.48],[0.35,0.35],[0.46,0.38],[0.28,0.51],[0.61,0.38],[0.31,0.49],[0.33,0.51],[0.59,0.29],[0.14,0.57],[0.31,0.51],[0.39,0.40],[0.32,0.39],[0.20,0.56],[0.55,0.31],[0.56,0.39],[0.24,0.49],[0.56,0.39],[0.27,0.50],[0.60,0.32],[0.54,0.34],[0.19,0.57],[0.28,0.51],[0.54,0.34],[0.56,0.39],[0.19,0.57],[0.59,0.31],[0.37,0.45],[0.19,0.57],[0.44,0.41],[0.32,0.43],[0.35,0.48],[0.24,0.49],[0.26,0.45],[0.14,0.57],[0.59,0.30],[0.26,0.45],[0.26,0.45],[0.14,0.57],[0.20,0.50],[0.31,0.45],[0.27,0.51],[0.30,0.41],[0.19,0.57],[0.30,0.41],[0.27,0.50],[0.34,0.35],[0.30,0.42],[0.27,0.42],[0.27,0.42],[0.34,0.35],[0.35,0.35],[0.14,0.57],[0.45,0.36],[0.26,0.45],[0.56,0.39],[0.34,0.35],[0.19,0.57],[0.30,0.41],[0.19,0.57],[0.26,0.45],[0.26,0.45],[0.59,0.29],[0.19,0.57],[0.26,0.45],[0.32,0.39],[0.30,0.50],[0.28,0.50],[0.32,0.39],[0.59,0.29],[0.32,0.51],[0.56,0.39],[0.59,0.29],[0.61,0.38],[0.33,0.51],[0.22,0.44],[0.33,0.36],[0.27,0.42],[0.20,0.56],[0.28,0.51],[0.31,0.48],[0.20,0.56],[0.61,0.32],[0.24,0.54],[0.59,0.29],[0.32,0.43],[0.61,0.32],[0.19,0.57],[0.61,0.38],[0.55,0.31],[0.19,0.57],[0.31,0.46],[0.32,0.52],[0.30,0.41],[0.28,0.51],[0.28,0.50],[0.60,0.32],[0.61,0.32],[0.27,0.50],[0.59,0.29],[0.41,0.47],[0.39,0.42],[0.20,0.46],[0.19,0.57],[0.14,0.57],[0.23,0.47],[0.54,0.34],[0.28,0.51],[0.19,0.57],[0.33,0.37],[0.46,0.38
],[0.27,0.42],[0.20,0.56],[0.39,0.42],[0.30,0.47],[0.26,0.45],[0.61,0.32],[0.61,0.38],[0.35,0.35],[0.14,0.57],[0.35,0.35],[0.28,0.51],[0.61,0.32],[0.24,0.54],[0.54,0.34],[0.28,0.43],[0.24,0.54],[0.30,0.41],[0.56,0.39],[0.23,0.52],[0.14,0.57],[0.26,0.45],[0.30,0.42],[0.32,0.43],[0.19,0.57],[0.45,0.36],[0.27,0.42],[0.29,0.48],[0.28,0.43],[0.27,0.51],[0.39,0.44],[0.32,0.49],[0.24,0.49],[0.56,0.39],[0.20,0.56],[0.30,0.42],[0.24,0.38],[0.46,0.38],[0.28,0.50],[0.26,0.45],[0.27,0.50],[0.23,0.47],[0.39,0.42],[0.28,0.51],[0.24,0.49],[0.27,0.42],[0.26,0.45],[0.60,0.32],[0.32,0.43],[0.39,0.42],[0.28,0.50],[0.28,0.52],[0.61,0.32],[0.32,0.39],[0.24,0.50],[0.39,0.40],[0.33,0.36],[0.24,0.38],[0.54,0.33],[0.19,0.57],[0.61,0.32],[0.33,0.36],[0.19,0.57],[0.30,0.41],[0.19,0.57],[0.34,0.35],[0.24,0.42],[0.27,0.42],[0.54,0.34],[0.54,0.34],[0.24,0.49],[0.27,0.42],[0.56,0.39],[0.19,0.57],[0.20,0.50],[0.14,0.57],[0.30,0.41],[0.30,0.41],[0.33,0.36],[0.26,0.45],[0.26,0.45],[0.23,0.47],[0.32,0.39],[0.27,0.53],[0.30,0.41],[0.20,0.46],[0.34,0.35],[0.34,0.35],[0.14,0.57],[0.46,0.38],[0.27,0.42],[0.36,0.44],[0.17,0.51],[0.60,0.32],[0.27,0.42],[0.20,0.56],[0.24,0.49],[0.41,0.40],[0.61,0.38],[0.19,0.57],[0.28,0.50],[0.23,0.52],[0.61,0.32],[0.39,0.46],[0.33,0.51],[0.19,0.57],[0.39,0.44],[0.56,0.39],[0.35,0.35],[0.28,0.43],[0.54,0.34],[0.36,0.44],[0.14,0.57],[0.61,0.38],[0.46,0.38],[0.61,0.32],[0.19,0.57],[0.54,0.34],[0.27,0.53],[0.33,0.51],[0.31,0.51],[0.59,0.29],[0.24,0.42],[0.28,0.43],[0.56,0.39],[0.28,0.50],[0.61,0.32],[0.29,0.48],[0.20,0.46],[0.50,0.29],[0.56,0.39],[0.20,0.50],[0.24,0.38],[0.32,0.39],[0.32,0.43],[0.28,0.50],[0.22,0.44],[0.20,0.56],[0.27,0.42],[0.61,0.38],[0.31,0.49],[0.20,0.46],[0.27,0.42],[0.24,0.38],[0.61,0.32],[0.26,0.45],[0.23,0.44],[0.59,0.30],[0.56,0.39],[0.33,0.44],[0.27,0.42],[0.31,0.51],[0.27,0.53],[0.32,0.39],[0.28,0.51],[0.30,0.42],[0.46,0.38],[0.27,0.42],[0.30,0.47],[0.39,0.40],[0.28,0.43],[0.30,0.42],[0.32,0.39],[0.59,0.31],[0.36,0.44],[0.54,0.34],[0.34,0.35],[0.30,0.41],[0.32,0.49],[0.32,0.43],[0.31,0.51],[0.32,0.52],[0.60,0.32],[0.19,0.57],[0.41,0.47],[0.32,0.39],[0.28,0.43],[0.28,0.51],[0.32,0.51],[0.56,0.39],[0.24,0.45],[0.55,0.31],[0.24,0.43],[0.61,0.38],[0.33,0.51],[0.30,0.41],[0.32,0.47],[0.32,0.38],[0.33,0.51],[0.39,0.40],[0.19,0.57],[0.27,0.42],[0.54,0.33],[0.59,0.29],[0.28,0.51],[0.61,0.38],[0.19,0.57],[0.30,0.41],[0.14,0.57],[0.32,0.39],[0.34,0.35],[0.54,0.34],[0.24,0.54],[0.56,0.39],[0.24,0.49],[0.61,0.32],[0.61,0.38],[0.61,0.32],[0.19,0.57],[0.14,0.57],[0.54,0.34],[0.59,0.29],[0.28,0.43],[0.19,0.57],[0.61,0.32],[0.32,0.43],[0.29,0.48],[0.56,0.39],[0.19,0.57],[0.56,0.39],[0.59,0.29],[0.59,0.29],[0.59,0.30],[0.14,0.57],[0.23,0.44],[0.28,0.50],[0.29,0.48],[0.31,0.45],[0.27,0.51],[0.24,0.45],[0.61,0.38],[0.24,0.49],[0.14,0.57],[0.61,0.32],[0.39,0.40],[0.33,0.44],[0.54,0.33],[0.33,0.51],[0.20,0.50],[0.19,0.57],[0.25,0.50],[0.28,0.43],[0.17,0.51],[0.19,0.57],[0.27,0.42],[0.20,0.56],[0.24,0.38],[0.19,0.57],[0.28,0.50],[0.28,0.50],[0.27,0.42],[0.26,0.45],[0.39,0.42],[0.23,0.47],[0.28,0.43],[0.32,0.39],[0.32,0.39],[0.24,0.54],[0.33,0.36],[0.29,0.53],[0.27,0.42],[0.44,0.41],[0.27,0.42],[0.33,0.36],[0.24,0.43],[0.61,0.38],[0.20,0.50],[0.55,0.31],[0.31,0.46],[0.60,0.32],[0.30,0.41],[0.41,0.47],[0.39,0.40],[0.27,0.53],[0.61,0.38],[0.46,0.38],[0.28,0.43],[0.44,0.41],[0.35,0.35],[0.24,0.49],[0.31,0.43],[0.27,0.42],[0.61,0.38],[0.29,0.48],[0.54,0.34],[0.61,0.32],[0.20,0.56],[0.24,0.49],[0.39,0.40],[0.27,0.42],[0.59,0.29],[0.59,0.29],[0.19,0.57],[0.24,0.54],[0.59,0.31],[0.24,0.38],[0.33,0.51],[
0.23,0.44],[0.20,0.46],[0.24,0.45],[0.29,0.48],[0.28,0.50],[0.61,0.32],[0.19,0.57],[0.22,0.44],[0.19,0.57],[0.39,0.44],[0.19,0.57],[0.28,0.50],[0.30,0.41],[0.44,0.41],[0.28,0.52],[0.28,0.43],[0.54,0.33],[0.28,0.50],[0.19,0.57],[0.14,0.57],[0.30,0.41],[0.26,0.45],[0.56,0.39],[0.27,0.51],[0.20,0.46],[0.24,0.38],[0.32,0.38],[0.26,0.45],[0.61,0.32],[0.59,0.29],[0.19,0.57],[0.43,0.45],[0.14,0.57],[0.35,0.35],[0.56,0.39],[0.34,0.35],[0.19,0.57],[0.56,0.39],[0.27,0.42],[0.19,0.57],[0.60,0.32],[0.24,0.54],[0.54,0.34],[0.61,0.38],[0.33,0.51],[0.27,0.42],[0.32,0.39],[0.34,0.35],[0.20,0.56],[0.26,0.45],[0.32,0.51],[0.33,0.51],[0.35,0.35],[0.31,0.43],[0.56,0.39],[0.59,0.29],[0.28,0.43],[0.30,0.42],[0.27,0.44],[0.28,0.53],[0.29,0.48],[0.33,0.51],[0.60,0.32],[0.54,0.33],[0.19,0.57],[0.33,0.49],[0.30,0.41],[0.54,0.34],[0.27,0.53],[0.19,0.57],[0.19,0.57],[0.32,0.39],[0.20,0.56],[0.35,0.35],[0.30,0.42],[0.46,0.38],[0.54,0.34],[0.54,0.34],[0.14,0.57],[0.33,0.51],[0.32,0.39],[0.14,0.57],[0.59,0.29],[0.59,0.31],[0.30,0.41],[0.26,0.45],[0.32,0.38],[0.32,0.39],[0.59,0.31],[0.20,0.56],[0.20,0.46],[0.29,0.48],[0.59,0.29],[0.39,0.40],[0.28,0.50],[0.32,0.39],[0.28,0.53],[0.44,0.41],[0.20,0.50],[0.24,0.49],[0.20,0.46],[0.28,0.52],[0.24,0.50],[0.32,0.43],[0.39,0.40],[0.38,0.44],[0.60,0.32],[0.54,0.33],[0.61,0.32],[0.19,0.57],[0.59,0.29],[0.33,0.49],[0.28,0.43],[0.24,0.38],[0.30,0.41],[0.27,0.51],[0.35,0.48],[0.61,0.32],[0.43,0.45],[0.20,0.50],[0.24,0.49],[0.20,0.50],[0.20,0.56],[0.29,0.48],[0.14,0.57],[0.14,0.57],[0.26,0.45],[0.26,0.45],[0.39,0.40],[0.33,0.36],[0.56,0.39],[0.59,0.29],[0.27,0.42],[0.35,0.35],[0.30,0.41],[0.20,0.50],[0.19,0.57],[0.29,0.48],[0.39,0.42],[0.37,0.45],[0.30,0.41],[0.20,0.56],[0.30,0.42],[0.41,0.47],[0.28,0.43],[0.14,0.57],[0.27,0.53],[0.32,0.39],[0.30,0.41],[0.34,0.35],[0.32,0.47],[0.33,0.51],[0.20,0.56],[0.56,0.39],[0.60,0.32],[0.28,0.52],[0.56,0.39],[0.44,0.41],[0.27,0.42],[0.00,1.00],[0.29,0.49],[0.89,0.06],[0.22,0.66],[0.18,0.70],[0.67,0.22],[0.14,0.79],[0.58,0.17],[0.67,0.12],[0.95,0.05],[0.46,0.26],[0.15,0.54],[0.16,0.67],[0.48,0.31],[0.41,0.29],[0.18,0.66],[0.10,0.71],[0.11,0.72],[0.65,0.15],[0.94,0.03],[0.17,0.67],[0.44,0.29],[0.32,0.38],[0.79,0.10],[0.52,0.26],[0.25,0.59],[0.89,0.04],[0.69,0.13],[0.43,0.34],[0.75,0.07],[0.16,0.65],[0.02,0.70],[0.38,0.33],[0.57,0.23],[0.75,0.07],[0.25,0.58],[0.94,0.02],[0.55,0.22],[0.58,0.17],[0.14,0.79],[0.20,0.56],[0.10,0.88],[0.15,0.79],[0.11,0.77],[0.67,0.22],[0.07,0.87],[0.43,0.33],[0.08,0.84],[0.05,0.67],[0.07,0.77],[0.17,0.68],[1.00,0.00],[0.15,0.79],[0.08,0.77],[0.16,0.67],[0.69,0.13],[0.07,0.87],[0.15,0.54],[0.55,0.19],[0.14,0.63],[0.75,0.18],[0.25,0.63],[0.83,0.05],[0.55,0.50],[0.86,0.04],[0.73,0.18],[0.44,0.32],[0.70,0.15],[0.89,0.06],[0.17,0.67],[0.61,0.12],[0.55,0.50],[0.36,0.56],[0.03,0.86],[0.09,0.82],[0.09,0.82],[0.09,0.83],[0.17,0.68],[0.88,0.03],[0.64,0.22],[0.08,0.85],[0.74,0.16],[0.47,0.28],[0.05,0.84],[0.14,0.54],[0.01,0.93],[0.77,0.16],[0.17,0.60],[0.64,0.22],[0.84,0.05],[0.85,0.03],[0.23,0.67],[0.20,0.69],[0.00,0.87],[0.14,0.77],[0.11,0.69],[0.17,0.67],[0.56,0.27],[0.14,0.67],[0.37,0.31],[0.11,0.69],[0.35,0.52],[0.53,0.27],[0.50,0.21],[0.25,0.64],[0.36,0.56],[0.39,0.26],[0.02,0.83],[0.41,0.29],[0.07,0.77],[0.16,0.63],[0.92,0.03],[0.10,0.71],[0.83,0.05],[0.42,0.27],[0.62,0.12],[0.23,0.60],[0.19,0.61],[0.69,0.19],[0.21,0.65],[0.67,0.19],[0.18,0.69],[0.44,0.29],[0.14,0.65],[0.73,0.18],[0.15,0.66],[0.44,0.34],[0.74,0.10],[0.18,0.69],[0.25,0.61],[0.52,0.23],[0.06,0.82],[0.52,0.29],[0.22,0.68],[0.46,0.26],[0.14,0.54],[0.78,0.07],[0.8
0,0.05],[0.15,0.67],[0.10,0.82],[0.56,0.27],[0.64,0.22],[0.87,0.06],[0.14,0.66],[0.10,0.84],[0.88,0.05],[0.02,0.81],[0.62,0.15],[0.13,0.68],[0.50,0.28],[0.11,0.62],[0.46,0.32],[0.56,0.28],[0.43,0.28],[0.12,0.83],[0.11,0.80],[0.10,0.83],[0.90,0.04],[0.17,0.65],[0.15,0.63],[0.72,0.15],[0.64,0.26],[0.84,0.06],[0.09,0.83],[0.16,0.68],[0.09,0.63],[0.43,0.29],[0.88,0.05],[0.20,0.69],[0.73,0.09],[0.61,0.20],[0.67,0.13],[0.08,0.85],[0.73,0.16],[0.89,0.05],[0.41,0.25],[0.61,0.23],[0.58,0.22],[0.03,0.84],[0.58,0.24],[0.48,0.30],[0.25,0.54],[0.23,0.63],[0.41,0.46],[0.84,0.06],[0.45,0.29],[0.09,0.55],[0.54,0.26],[0.11,0.82],[0.69,0.18],[0.43,0.45],[0.43,0.28],[0.45,0.32],[0.07,0.78],[0.26,0.64],[0.92,0.04],[0.12,0.66],[0.32,0.51],[0.28,0.59],[0.70,0.18]])
x = data[:,0]
y = data[:,1]
def func(x, a, b, c):
    return a * np.exp(-b*x) + c
popt, pcov = curve_fit(func, x, y)
a, b, c = popt
x_line = np.arange(min(x), max(x), 0.01)
x_line = np.reshape(x_line, (-1, 1))
y_line = func(x_line, a, b, c)
y_line = np.reshape(y_line, (-1, 1))
plt.scatter(x,y)
plt.plot(x_line,y_line)
plt.show()
As you can see in the example plot, the fit deviates towards high x values. I know there are numerous similar questions out there, and I have read many, but my math skills are not phenomenal, so I am struggling to come up with a better solution for my particular problem.
I'm not tied to the exponential function - can anyone suggest something better?
I need to do this semi-automatically for hundreds of datasets, so ideally I want something as flexible as possible.
Any help greatly appreciated!
p.s. I am sorry about posting such a large sample dataset - but I figured this kind of question necessitates the actual data, and I didn't want to post links to suspicious looking files.. =)
This is not an optimal solution, but it should work for any kind of density distribution in your data. The idea is to resample the data by computing local averages along the x-axis, so that the points used for the fit are evenly distributed.
#!/usr/bin/python3.6
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
def my_floor(a, precision=0):
    return np.round(a - 0.5 * 10**(-precision), precision)
data = np.array([[0.32,0.38],[0.61,0.32],[0.28,0.50],[0.60,0.32],[0.26,0.45],[0.19,0.57],[0.61,0.32],[0.59,0.29],[0.39,0.42],[0.61,0.32],[0.20,0.46],[0.24,0.45],[0.59,0.29],[0.39,0.42],[0.56,0.39],[0.32,0.43],[0.38,0.44],[0.54,0.34],[0.61,0.32],[0.20,0.46],[0.28,0.51],[0.54,0.34],[0.60,0.32],[0.30,0.42],[0.28,0.43],[0.14,0.57],[0.24,0.54],[0.39,0.42],[0.20,0.56],[0.56,0.39],[0.24,0.54],[0.33,0.37],[0.33,0.51],[0.20,0.46],[0.32,0.39],[0.20,0.56],[0.19,0.57],[0.32,0.39],[0.30,0.42],[0.33,0.50],[0.54,0.34],[0.28,0.50],[0.32,0.39],[0.28,0.43],[0.27,0.42],[0.56,0.39],[0.19,0.57],[0.19,0.57],[0.60,0.32],[0.44,0.41],[0.27,0.42],[0.19,0.57],[0.24,0.38],[0.24,0.54],[0.61,0.32],[0.39,0.40],[0.30,0.41],[0.19,0.57],[0.14,0.57],[0.32,0.43],[0.14,0.57],[0.59,0.29],[0.44,0.41],[0.30,0.41],[0.32,0.38],[0.61,0.32],[0.20,0.46],[0.20,0.56],[0.30,0.41],[0.33,0.36],[0.14,0.57],[0.19,0.57],[0.46,0.38],[0.36,0.44],[0.61,0.32],[0.31,0.48],[0.60,0.32],[0.39,0.40],[0.14,0.57],[0.44,0.41],[0.24,0.49],[0.41,0.40],[0.19,0.57],[0.19,0.57],[0.31,0.49],[0.31,0.43],[0.35,0.35],[0.20,0.46],[0.54,0.34],[0.20,0.56],[0.39,0.44],[0.33,0.36],[0.20,0.56],[0.30,0.41],[0.56,0.39],[0.31,0.48],[0.28,0.51],[0.14,0.57],[0.61,0.32],[0.30,0.50],[0.20,0.56],[0.19,0.57],[0.59,0.31],[0.20,0.56],[0.27,0.42],[0.29,0.48],[0.56,0.39],[0.32,0.39],[0.20,0.56],[0.59,0.29],[0.24,0.49],[0.56,0.39],[0.60,0.32],[0.35,0.35],[0.28,0.50],[0.46,0.38],[0.14,0.57],[0.54,0.34],[0.32,0.38],[0.26,0.45],[0.26,0.45],[0.39,0.42],[0.19,0.57],[0.28,0.51],[0.27,0.42],[0.33,0.50],[0.54,0.34],[0.39,0.40],[0.19,0.57],[0.33,0.36],[0.22,0.44],[0.33,0.51],[0.61,0.32],[0.28,0.51],[0.25,0.50],[0.39,0.40],[0.34,0.35],[0.59,0.31],[0.31,0.49],[0.20,0.46],[0.39,0.46],[0.20,0.50],[0.32,0.39],[0.30,0.41],[0.23,0.44],[0.29,0.53],[0.28,0.50],[0.31,0.48],[0.61,0.32],[0.54,0.34],[0.28,0.53],[0.56,0.39],[0.19,0.57],[0.14,0.57],[0.59,0.29],[0.29,0.48],[0.44,0.41],[0.27,0.51],[0.50,0.29],[0.14,0.57],[0.60,0.32],[0.32,0.39],[0.19,0.57],[0.24,0.38],[0.56,0.39],[0.14,0.57],[0.54,0.34],[0.61,0.38],[0.27,0.53],[0.20,0.46],[0.61,0.32],[0.27,0.42],[0.27,0.42],[0.20,0.56],[0.30,0.41],[0.31,0.51],[0.32,0.39],[0.31,0.51],[0.29,0.48],[0.20,0.46],[0.33,0.51],[0.31,0.43],[0.30,0.41],[0.27,0.44],[0.31,0.51],[0.29,0.48],[0.35,0.35],[0.46,0.38],[0.28,0.51],[0.61,0.38],[0.31,0.49],[0.33,0.51],[0.59,0.29],[0.14,0.57],[0.31,0.51],[0.39,0.40],[0.32,0.39],[0.20,0.56],[0.55,0.31],[0.56,0.39],[0.24,0.49],[0.56,0.39],[0.27,0.50],[0.60,0.32],[0.54,0.34],[0.19,0.57],[0.28,0.51],[0.54,0.34],[0.56,0.39],[0.19,0.57],[0.59,0.31],[0.37,0.45],[0.19,0.57],[0.44,0.41],[0.32,0.43],[0.35,0.48],[0.24,0.49],[0.26,0.45],[0.14,0.57],[0.59,0.30],[0.26,0.45],[0.26,0.45],[0.14,0.57],[0.20,0.50],[0.31,0.45],[0.27,0.51],[0.30,0.41],[0.19,0.57],[0.30,0.41],[0.27,0.50],[0.34,0.35],[0.30,0.42],[0.27,0.42],[0.27,0.42],[0.34,0.35],[0.35,0.35],[0.14,0.57],[0.45,0.36],[0.26,0.45],[0.56,0.39],[0.34,0.35],[0.19,0.57],[0.30,0.41],[0.19,0.57],[0.26,0.45],[0.26,0.45],[0.59,0.29],[0.19,0.57],[0.26,0.45],[0.32,0.39],[0.30,0.50],[0.28,0.50],[0.32,0.39],[0.59,0.29],[0.32,0.51],[0.56,0.39],[0.59,0.29],[0.61,0.38],[0.33,0.51],[0.22,0.44],[0.33,0.36],[0.27,0.42],[0.20,0.56],[0.28,0.51],[0.31,0.48],[0.20,0.56],[0.61,0.32],[0.24,0.54],[0.59,0.29],[0.32,0.43],[0.61,0.32],[0.19,0.57],[0.61,0.38],[0.55,0.31],[0.19,0.57],[0.31,0.46],[0.32,0.52],[0.30,0.41],[0.28,0.51],[0.28,0.50],[0.60,0.32],[0.61,0.32],[0.27,0.50],[0.59,0.29],[0.41,0.47],[0.39,0.42],[0.20,0.46],[0.19,0.57],[0.14,0.57],[0.23,0.47],[0.54,0.34],[0.28,0.51],[0.19,0.57],[0.33,0.37],[0.46,0.38
],[0.27,0.42],[0.20,0.56],[0.39,0.42],[0.30,0.47],[0.26,0.45],[0.61,0.32],[0.61,0.38],[0.35,0.35],[0.14,0.57],[0.35,0.35],[0.28,0.51],[0.61,0.32],[0.24,0.54],[0.54,0.34],[0.28,0.43],[0.24,0.54],[0.30,0.41],[0.56,0.39],[0.23,0.52],[0.14,0.57],[0.26,0.45],[0.30,0.42],[0.32,0.43],[0.19,0.57],[0.45,0.36],[0.27,0.42],[0.29,0.48],[0.28,0.43],[0.27,0.51],[0.39,0.44],[0.32,0.49],[0.24,0.49],[0.56,0.39],[0.20,0.56],[0.30,0.42],[0.24,0.38],[0.46,0.38],[0.28,0.50],[0.26,0.45],[0.27,0.50],[0.23,0.47],[0.39,0.42],[0.28,0.51],[0.24,0.49],[0.27,0.42],[0.26,0.45],[0.60,0.32],[0.32,0.43],[0.39,0.42],[0.28,0.50],[0.28,0.52],[0.61,0.32],[0.32,0.39],[0.24,0.50],[0.39,0.40],[0.33,0.36],[0.24,0.38],[0.54,0.33],[0.19,0.57],[0.61,0.32],[0.33,0.36],[0.19,0.57],[0.30,0.41],[0.19,0.57],[0.34,0.35],[0.24,0.42],[0.27,0.42],[0.54,0.34],[0.54,0.34],[0.24,0.49],[0.27,0.42],[0.56,0.39],[0.19,0.57],[0.20,0.50],[0.14,0.57],[0.30,0.41],[0.30,0.41],[0.33,0.36],[0.26,0.45],[0.26,0.45],[0.23,0.47],[0.32,0.39],[0.27,0.53],[0.30,0.41],[0.20,0.46],[0.34,0.35],[0.34,0.35],[0.14,0.57],[0.46,0.38],[0.27,0.42],[0.36,0.44],[0.17,0.51],[0.60,0.32],[0.27,0.42],[0.20,0.56],[0.24,0.49],[0.41,0.40],[0.61,0.38],[0.19,0.57],[0.28,0.50],[0.23,0.52],[0.61,0.32],[0.39,0.46],[0.33,0.51],[0.19,0.57],[0.39,0.44],[0.56,0.39],[0.35,0.35],[0.28,0.43],[0.54,0.34],[0.36,0.44],[0.14,0.57],[0.61,0.38],[0.46,0.38],[0.61,0.32],[0.19,0.57],[0.54,0.34],[0.27,0.53],[0.33,0.51],[0.31,0.51],[0.59,0.29],[0.24,0.42],[0.28,0.43],[0.56,0.39],[0.28,0.50],[0.61,0.32],[0.29,0.48],[0.20,0.46],[0.50,0.29],[0.56,0.39],[0.20,0.50],[0.24,0.38],[0.32,0.39],[0.32,0.43],[0.28,0.50],[0.22,0.44],[0.20,0.56],[0.27,0.42],[0.61,0.38],[0.31,0.49],[0.20,0.46],[0.27,0.42],[0.24,0.38],[0.61,0.32],[0.26,0.45],[0.23,0.44],[0.59,0.30],[0.56,0.39],[0.33,0.44],[0.27,0.42],[0.31,0.51],[0.27,0.53],[0.32,0.39],[0.28,0.51],[0.30,0.42],[0.46,0.38],[0.27,0.42],[0.30,0.47],[0.39,0.40],[0.28,0.43],[0.30,0.42],[0.32,0.39],[0.59,0.31],[0.36,0.44],[0.54,0.34],[0.34,0.35],[0.30,0.41],[0.32,0.49],[0.32,0.43],[0.31,0.51],[0.32,0.52],[0.60,0.32],[0.19,0.57],[0.41,0.47],[0.32,0.39],[0.28,0.43],[0.28,0.51],[0.32,0.51],[0.56,0.39],[0.24,0.45],[0.55,0.31],[0.24,0.43],[0.61,0.38],[0.33,0.51],[0.30,0.41],[0.32,0.47],[0.32,0.38],[0.33,0.51],[0.39,0.40],[0.19,0.57],[0.27,0.42],[0.54,0.33],[0.59,0.29],[0.28,0.51],[0.61,0.38],[0.19,0.57],[0.30,0.41],[0.14,0.57],[0.32,0.39],[0.34,0.35],[0.54,0.34],[0.24,0.54],[0.56,0.39],[0.24,0.49],[0.61,0.32],[0.61,0.38],[0.61,0.32],[0.19,0.57],[0.14,0.57],[0.54,0.34],[0.59,0.29],[0.28,0.43],[0.19,0.57],[0.61,0.32],[0.32,0.43],[0.29,0.48],[0.56,0.39],[0.19,0.57],[0.56,0.39],[0.59,0.29],[0.59,0.29],[0.59,0.30],[0.14,0.57],[0.23,0.44],[0.28,0.50],[0.29,0.48],[0.31,0.45],[0.27,0.51],[0.24,0.45],[0.61,0.38],[0.24,0.49],[0.14,0.57],[0.61,0.32],[0.39,0.40],[0.33,0.44],[0.54,0.33],[0.33,0.51],[0.20,0.50],[0.19,0.57],[0.25,0.50],[0.28,0.43],[0.17,0.51],[0.19,0.57],[0.27,0.42],[0.20,0.56],[0.24,0.38],[0.19,0.57],[0.28,0.50],[0.28,0.50],[0.27,0.42],[0.26,0.45],[0.39,0.42],[0.23,0.47],[0.28,0.43],[0.32,0.39],[0.32,0.39],[0.24,0.54],[0.33,0.36],[0.29,0.53],[0.27,0.42],[0.44,0.41],[0.27,0.42],[0.33,0.36],[0.24,0.43],[0.61,0.38],[0.20,0.50],[0.55,0.31],[0.31,0.46],[0.60,0.32],[0.30,0.41],[0.41,0.47],[0.39,0.40],[0.27,0.53],[0.61,0.38],[0.46,0.38],[0.28,0.43],[0.44,0.41],[0.35,0.35],[0.24,0.49],[0.31,0.43],[0.27,0.42],[0.61,0.38],[0.29,0.48],[0.54,0.34],[0.61,0.32],[0.20,0.56],[0.24,0.49],[0.39,0.40],[0.27,0.42],[0.59,0.29],[0.59,0.29],[0.19,0.57],[0.24,0.54],[0.59,0.31],[0.24,0.38],[0.33,0.51],[
0.23,0.44],[0.20,0.46],[0.24,0.45],[0.29,0.48],[0.28,0.50],[0.61,0.32],[0.19,0.57],[0.22,0.44],[0.19,0.57],[0.39,0.44],[0.19,0.57],[0.28,0.50],[0.30,0.41],[0.44,0.41],[0.28,0.52],[0.28,0.43],[0.54,0.33],[0.28,0.50],[0.19,0.57],[0.14,0.57],[0.30,0.41],[0.26,0.45],[0.56,0.39],[0.27,0.51],[0.20,0.46],[0.24,0.38],[0.32,0.38],[0.26,0.45],[0.61,0.32],[0.59,0.29],[0.19,0.57],[0.43,0.45],[0.14,0.57],[0.35,0.35],[0.56,0.39],[0.34,0.35],[0.19,0.57],[0.56,0.39],[0.27,0.42],[0.19,0.57],[0.60,0.32],[0.24,0.54],[0.54,0.34],[0.61,0.38],[0.33,0.51],[0.27,0.42],[0.32,0.39],[0.34,0.35],[0.20,0.56],[0.26,0.45],[0.32,0.51],[0.33,0.51],[0.35,0.35],[0.31,0.43],[0.56,0.39],[0.59,0.29],[0.28,0.43],[0.30,0.42],[0.27,0.44],[0.28,0.53],[0.29,0.48],[0.33,0.51],[0.60,0.32],[0.54,0.33],[0.19,0.57],[0.33,0.49],[0.30,0.41],[0.54,0.34],[0.27,0.53],[0.19,0.57],[0.19,0.57],[0.32,0.39],[0.20,0.56],[0.35,0.35],[0.30,0.42],[0.46,0.38],[0.54,0.34],[0.54,0.34],[0.14,0.57],[0.33,0.51],[0.32,0.39],[0.14,0.57],[0.59,0.29],[0.59,0.31],[0.30,0.41],[0.26,0.45],[0.32,0.38],[0.32,0.39],[0.59,0.31],[0.20,0.56],[0.20,0.46],[0.29,0.48],[0.59,0.29],[0.39,0.40],[0.28,0.50],[0.32,0.39],[0.28,0.53],[0.44,0.41],[0.20,0.50],[0.24,0.49],[0.20,0.46],[0.28,0.52],[0.24,0.50],[0.32,0.43],[0.39,0.40],[0.38,0.44],[0.60,0.32],[0.54,0.33],[0.61,0.32],[0.19,0.57],[0.59,0.29],[0.33,0.49],[0.28,0.43],[0.24,0.38],[0.30,0.41],[0.27,0.51],[0.35,0.48],[0.61,0.32],[0.43,0.45],[0.20,0.50],[0.24,0.49],[0.20,0.50],[0.20,0.56],[0.29,0.48],[0.14,0.57],[0.14,0.57],[0.26,0.45],[0.26,0.45],[0.39,0.40],[0.33,0.36],[0.56,0.39],[0.59,0.29],[0.27,0.42],[0.35,0.35],[0.30,0.41],[0.20,0.50],[0.19,0.57],[0.29,0.48],[0.39,0.42],[0.37,0.45],[0.30,0.41],[0.20,0.56],[0.30,0.42],[0.41,0.47],[0.28,0.43],[0.14,0.57],[0.27,0.53],[0.32,0.39],[0.30,0.41],[0.34,0.35],[0.32,0.47],[0.33,0.51],[0.20,0.56],[0.56,0.39],[0.60,0.32],[0.28,0.52],[0.56,0.39],[0.44,0.41],[0.27,0.42],[0.00,1.00],[0.29,0.49],[0.89,0.06],[0.22,0.66],[0.18,0.70],[0.67,0.22],[0.14,0.79],[0.58,0.17],[0.67,0.12],[0.95,0.05],[0.46,0.26],[0.15,0.54],[0.16,0.67],[0.48,0.31],[0.41,0.29],[0.18,0.66],[0.10,0.71],[0.11,0.72],[0.65,0.15],[0.94,0.03],[0.17,0.67],[0.44,0.29],[0.32,0.38],[0.79,0.10],[0.52,0.26],[0.25,0.59],[0.89,0.04],[0.69,0.13],[0.43,0.34],[0.75,0.07],[0.16,0.65],[0.02,0.70],[0.38,0.33],[0.57,0.23],[0.75,0.07],[0.25,0.58],[0.94,0.02],[0.55,0.22],[0.58,0.17],[0.14,0.79],[0.20,0.56],[0.10,0.88],[0.15,0.79],[0.11,0.77],[0.67,0.22],[0.07,0.87],[0.43,0.33],[0.08,0.84],[0.05,0.67],[0.07,0.77],[0.17,0.68],[1.00,0.00],[0.15,0.79],[0.08,0.77],[0.16,0.67],[0.69,0.13],[0.07,0.87],[0.15,0.54],[0.55,0.19],[0.14,0.63],[0.75,0.18],[0.25,0.63],[0.83,0.05],[0.55,0.50],[0.86,0.04],[0.73,0.18],[0.44,0.32],[0.70,0.15],[0.89,0.06],[0.17,0.67],[0.61,0.12],[0.55,0.50],[0.36,0.56],[0.03,0.86],[0.09,0.82],[0.09,0.82],[0.09,0.83],[0.17,0.68],[0.88,0.03],[0.64,0.22],[0.08,0.85],[0.74,0.16],[0.47,0.28],[0.05,0.84],[0.14,0.54],[0.01,0.93],[0.77,0.16],[0.17,0.60],[0.64,0.22],[0.84,0.05],[0.85,0.03],[0.23,0.67],[0.20,0.69],[0.00,0.87],[0.14,0.77],[0.11,0.69],[0.17,0.67],[0.56,0.27],[0.14,0.67],[0.37,0.31],[0.11,0.69],[0.35,0.52],[0.53,0.27],[0.50,0.21],[0.25,0.64],[0.36,0.56],[0.39,0.26],[0.02,0.83],[0.41,0.29],[0.07,0.77],[0.16,0.63],[0.92,0.03],[0.10,0.71],[0.83,0.05],[0.42,0.27],[0.62,0.12],[0.23,0.60],[0.19,0.61],[0.69,0.19],[0.21,0.65],[0.67,0.19],[0.18,0.69],[0.44,0.29],[0.14,0.65],[0.73,0.18],[0.15,0.66],[0.44,0.34],[0.74,0.10],[0.18,0.69],[0.25,0.61],[0.52,0.23],[0.06,0.82],[0.52,0.29],[0.22,0.68],[0.46,0.26],[0.14,0.54],[0.78,0.07],[0.8
0,0.05],[0.15,0.67],[0.10,0.82],[0.56,0.27],[0.64,0.22],[0.87,0.06],[0.14,0.66],[0.10,0.84],[0.88,0.05],[0.02,0.81],[0.62,0.15],[0.13,0.68],[0.50,0.28],[0.11,0.62],[0.46,0.32],[0.56,0.28],[0.43,0.28],[0.12,0.83],[0.11,0.80],[0.10,0.83],[0.90,0.04],[0.17,0.65],[0.15,0.63],[0.72,0.15],[0.64,0.26],[0.84,0.06],[0.09,0.83],[0.16,0.68],[0.09,0.63],[0.43,0.29],[0.88,0.05],[0.20,0.69],[0.73,0.09],[0.61,0.20],[0.67,0.13],[0.08,0.85],[0.73,0.16],[0.89,0.05],[0.41,0.25],[0.61,0.23],[0.58,0.22],[0.03,0.84],[0.58,0.24],[0.48,0.30],[0.25,0.54],[0.23,0.63],[0.41,0.46],[0.84,0.06],[0.45,0.29],[0.09,0.55],[0.54,0.26],[0.11,0.82],[0.69,0.18],[0.43,0.45],[0.43,0.28],[0.45,0.32],[0.07,0.78],[0.26,0.64],[0.92,0.04],[0.12,0.66],[0.32,0.51],[0.28,0.59],[0.70,0.18]])
x = data[:,0]
y = data[:,1]
#---------------------------ADD THIS---------------------------
# Define how to resample the data
n_bins = 20 # choose how many samples to use
bin_size = (max(x) - min(x))/n_bins
# Prepare empty arrays for resampled x and y
x_res, y_res = [],[]
# Resample the data with consistent density
for i in range(n_bins-1):
    lower = x >= min(x) + i*bin_size
    higher = x < min(x) + (i+1)*bin_size
    x_res.append(np.mean(x[np.where(lower & higher)]))
    y_res.append(np.mean(y[np.where(lower & higher)]))
#------------------------------------------------------
def func(x, a, b, c):
    return a * np.exp(-b*x) + c
popt, pcov = curve_fit(func, x_res, y_res)
a, b, c = popt
x_line = np.arange(min(x), max(x), 0.01)
x_line = np.reshape(x_line,(-1,1))
y_line = func(x_line, a, b, c)
y_line = np.reshape(y_line,(-1,1))
plt.scatter(x,y,alpha=0.5)
plt.scatter(x_res, y_res)
plt.plot(x_line,y_line, c='red')
plt.show()
Which gives a plot of the raw data (faded), the evenly resampled points, and the fitted curve in red.
curve_fit does not give you a great deal of control over the fit; you may want to look at the much more general, but somewhat more complicated to use, least_squares:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.least_squares.html
where you can control a lot of things. curve_fit does give you a sigma parameter which allows you to weight the 'uncertainty' in your points. The trick here is to assign lower uncertainty to the points around x=1 where the fit is poor; by giving them lower uncertainty, the fitter will try harder to fit them.
After some experimenting, replacing your curve_fit line with
uncertainty = np.exp(-5*x*x)
popt, pcov = curve_fit(func, x, y, sigma = uncertainty)
I got the following fit (plot omitted). You can try to improve this by playing with the uncertainty vector above.
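For completeness, a hedged sketch of the least_squares route mentioned above, using one of its built-in robust losses; the starting point and loss choice are illustrative, not tuned values:

import numpy as np
from scipy.optimize import least_squares

def residuals(p, x, y):
    a, b, c = p
    return a * np.exp(-b*x) + c - y

# soft_l1 down-weights outliers relative to plain least squares
fit = least_squares(residuals, x0=[1.0, 1.0, 0.0], loss='soft_l1', args=(x, y))
a, b, c = fit.x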

Predicting on new data using locally weighted regression (LOESS/LOWESS)

How to fit a locally weighted regression in python so that it can be used to predict on new data?
There is statsmodels.nonparametric.smoothers_lowess.lowess, but it returns estimates only for the original data set; it seems to do fit and predict together, rather than separately as I expected.
scikit-learn always has a fit method that allows the object to be used later on new data with predict; but it doesn't implement lowess.
Lowess works great for predicting (when combined with interpolation)! I think the code is pretty straightforward-- let me know if you have any questions!
import matplotlib.pyplot as plt
%matplotlib inline
from scipy.interpolate import interp1d
import statsmodels.api as sm
# introduce some floats in our x-values
x = list(range(3, 33)) + [3.2, 6.2]
y = [1,2,1,2,1,1,3,4,5,4,5,6,5,6,7,8,9,10,11,11,12,11,11,10,12,11,11,10,9,8,2,13]
# lowess will return our "smoothed" data with a y value at every x-value
lowess = sm.nonparametric.lowess(y, x, frac=.3)
# unpack the lowess smoothed points to their values
lowess_x = list(zip(*lowess))[0]
lowess_y = list(zip(*lowess))[1]
# run scipy's interpolation (there is also extrapolation, I believe)
f = interp1d(lowess_x, lowess_y, bounds_error=False)
xnew = [i/10. for i in range(400)]
# this generates y values for our x-values using our interpolator;
# it will MISS values outside of the x window (less than 3, greater than 33).
# There might be a better approach, but you can run a for loop
# and, if the value is out of the range, use f(min(lowess_x)) or f(max(lowess_x))
ynew = f(xnew)
plt.plot(x, y, 'o')
plt.plot(lowess_x, lowess_y, '*')
plt.plot(xnew, ynew, '-')
plt.show()
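As an aside, the manual clamping loop mentioned in the comments can be avoided: interp1d accepts a (below, above) tuple as fill_value, so out-of-range queries are clamped to the edge values. A sketch reusing the names above:

f = interp1d(lowess_x, lowess_y, bounds_error=False,
             fill_value=(lowess_y[0], lowess_y[-1]))
ynew = f(xnew)  # values outside [3, 33] now get the edge y-values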
I've created a module called moepy that provides an sklearn-like API for a LOWESS model (incl. fit/predict). This enables predictions to be made using the underlying local regression models, rather than the interpolation method described in the other answers. A minimalist example is shown below.
# Imports
import numpy as np
import matplotlib.pyplot as plt
from moepy import lowess
# Data generation
x = np.linspace(0, 5, num=150)
y = np.sin(x) + (np.random.normal(size=len(x)))/10
# Model fitting
lowess_model = lowess.Lowess()
lowess_model.fit(x, y)
# Model prediction
x_pred = np.linspace(0, 5, 26)
y_pred = lowess_model.predict(x_pred)
# Plotting
plt.plot(x_pred, y_pred, '--', label='LOWESS', color='k', zorder=3)
plt.scatter(x, y, label='Noisy Sin Wave', color='C1', s=5, zorder=1)
plt.legend(frameon=False)
A more detailed guide on how to use the model (as well as its confidence and prediction interval variants) can be found here.
Consider using kernel regression instead; statsmodels has an implementation.
If you have too many data points, why not use scikit-learn's RadiusNeighborsRegressor and specify a tricube weighting function?
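A minimal sketch of the statsmodels kernel regression route; the data here is made up for illustration:

import numpy as np
from statsmodels.nonparametric.kernel_regression import KernelReg

x = np.linspace(0, 5, 150)
y = np.sin(x) + np.random.normal(scale=0.1, size=x.size)
kr = KernelReg(endog=y, exog=x, var_type='c')  # 'c' = continuous regressor
y_pred, marginal_fx = kr.fit(np.linspace(0, 5, 26))  # predict at new points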
It's not clear whether it's a good idea to have a dedicated LOESS object with separate fit/predict methods like what is commonly found in Scikit-Learn. By contrast, for neural networks, you could have an object which stores only a relatively small set of weights. The fit method would then optimize the "few" weights by using a very large training dataset. The predict method only needs the weights to make new predictions, and not the entire training set.
Predictions based on LOESS and nearest neighbors, on the other hand, need the entire training set to make new predictions. The only thing a fit method could do is store the training set in the object for later use. If x and y are the training data, and x0 are the points at which to make new predictions, this object-oriented fit/predict solution would look something like the following:
model = Loess()
model.fit(x, y) # No calculations. Just store x and y in model.
y0 = model.predict(x0) # Uses x and y just stored.
By comparison, in my localreg library, I opted for simplicity:
y0 = localreg(x, y, x0)
It really comes down to design choices, as the performance would be the same.
One advantage of the fit/predict approach is that you could have a unified interface like they do in Scikit-Learn, where one model can easily be swapped for another. The fit/predict approach also encourages a machine-learning way of thinking about it, but in that sense LOESS is not very efficient, since it requires storing and using all the data for every new prediction. The latter approach leans more towards the origins of LOESS as a scatterplot smoothing algorithm, which is how I prefer to think about it. This might also shed some light on why statsmodels does it the way it does.
Check out the loess class in scikit-misc. The fitted object has a predict method:
from skmisc.loess import loess

loess_fit = loess(x, y, span=.01)
loess_fit.fit()
preds = loess_fit.predict(x_new).values
https://has2k1.github.io/scikit-misc/stable/generated/skmisc.loess.loess.html

numpy.polyfit versus scipy.odr

I have a data set which in theory is described by a polynomial of the second degree. I would like to fit this data, and I have used numpy.polyfit to do so. However, the downside is that the error on the returned coefficients is not available. I therefore decided to also fit the data using scipy.odr. The weird thing was that the coefficients for the polynomial deviated from each other.
I do not understand this and therefore decided to test both fitting routines on a set of data that I produced myself:
import numpy
import scipy.odr
import matplotlib.pyplot as plt
x = numpy.arange(-20, 20, 0.1)
y = 1.8 * x**2 -2.1 * x + 0.6 + numpy.random.normal(scale = 100, size = len(x))
#Define function for scipy.odr
def fit_func(p, t):
    return p[0] * t**2 + p[1] * t + p[2]
#Fit the data using numpy.polyfit
fit_np = numpy.polyfit(x, y, 2)
#Fit the data using scipy.odr
Model = scipy.odr.Model(fit_func)
Data = scipy.odr.RealData(x, y)
Odr = scipy.odr.ODR(Data, Model, [1.5, -2, 1], maxit = 10000)
output = Odr.run()
#output.pprint()
beta = output.beta
betastd = output.sd_beta
print "poly", fit_np
print "ODR", beta
plt.plot(x, y, "bo")
plt.plot(x, numpy.polyval(fit_np, x), "r--", lw = 2)
plt.plot(x, fit_func(beta, x), "g--", lw = 2)
plt.tight_layout()
plt.show()
An example of an outcome is as follows:
poly [ 1.77992643 -2.42753714 3.86331152]
ODR [ 3.8161735 -23.08952492 -146.76214989]
In the resulting plot, the numpy.polyfit solution (red dashed line) corresponds pretty well, while the scipy.odr solution (green dashed line) is basically completely off. I should note that the difference between numpy.polyfit and scipy.odr was smaller for the actual data set I wanted to fit. However, I do not understand where the difference between the two comes from, why the difference is so extreme in my own test example, and which fitting routine is better.
I hope you can provide answers that help me better understand the two fitting routines and, in the process, answer the questions I have.
In the way you are using ODR, it does a full orthogonal distance regression. To have it do a normal nonlinear least-squares fit instead, add
Odr.set_job(fit_type=2)
before starting the optimization, and you will get what you expected.
The reason the full ODR fails so badly is that no weights/standard deviations were specified. ODR obviously has a hard time interpreting that point cloud, and it assumes equal weights for x and y. If you provide estimated standard deviations, odr will yield a good (though of course different) result, too.
Data = scipy.odr.RealData(x, y, sx=0.1, sy=10)
The actual problem is that the ODR output has the beta coefficients in the opposite order from what numpy.polyfit returns, so the green curve is not calculated correctly. To plot it, use instead
plt.plot(x, fit_func(beta[::-1], x), "g--", lw = 2)
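Incidentally, regarding the original complaint that polyfit doesn't report coefficient errors: numpy.polyfit can return the covariance matrix of the coefficients when called with cov=True, from which one-sigma uncertainties follow directly:

fit_np, cov = numpy.polyfit(x, y, 2, cov=True)
errors = numpy.sqrt(numpy.diag(cov))  # 1-sigma errors on the coefficients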

Scipy fmin: Gaussian model to real data

I've been trying to solve this for a bit and really just haven't seen an example or anything that my brain is able to use to move forward.
The goal is to find a model Gaussian curve by minimizing the total chi-squared between the real data and the model produced by a set of unknown parameters that require sensible estimates (the Gaussian has unknown position, amplitude and width). scipy.optimize.fmin has come up, but I've never used it before and I'm still very new to Python...
Ultimately, I'd like to plot the original data along with the model. I have used pyplot before; it's just generating the model and using fmin that has me completely bewildered. Essentially, I'm here:
def gaussian(a, b, c, x):
    return a*np.exp(-(x-b)**2/(2*c**2))
I've seen multiple ways to generate a model, and this has left me confused, so I have no code! I have imported my data file with np.loadtxt.
Thanks to anyone who can suggest a framework or help at all.
There are basically four (or five) main steps involved in model fitting problems like this:
1. Define your forward model, yhat = F(P, x), that takes a set of parameters P and your independent variable x, and estimates your response variable y.
2. Define your loss function, loss = L(P, x, y), that you'd like to minimize over your parameters.
3. Optional: define a function that returns the Jacobian matrix, i.e. the partial derivatives of your loss function w.r.t. your model parameters.*
4. Make an initial guess at your model parameters.
5. Plug all these into one of the optimizers and get the fitted parameters for your model.
Here's a worked example to get you started:
import numpy as np
from scipy.optimize import minimize
from matplotlib import pyplot as pp
# function that defines the model we're fitting
def gaussian(P, x):
    a, b, c = P
    return a*np.exp(-(x-b)**2 / (2*c**2))

# objective function to minimize
def loss(P, x, y):
    yhat = gaussian(P, x)
    return ((y - yhat)**2).sum()
# generate a gaussian distribution with known parameters
amp = 1.3543
pos = 64.546
var = 12.234
P_real = np.array([amp, pos, var])
# we use the vector of real parameters to generate our fake data
x = np.arange(100)
y = gaussian(P_real, x)
# add some gaussian noise to make things harder
y_noisy = y + np.random.randn(y.size)*0.5
# minimize needs an initial guess at the model parameters
P_guess = np.array([1, 50, 25])
# minimize provides a unified interface to all of scipy's solvers. you
# can also access them individually in scipy.optimize, but the
# standalone versions have annoying differences in their syntax. for now
# we'll use the Nelder-Mead solver, which doesn't use the Jacobian. we
# also need to hand it x and y_noisy as additional args to loss()
res = minimize(loss, P_guess, method='Nelder-Mead', args=(x, y_noisy))
# res is a dict containing the results of the optimization. in particular we
# want the optimized model parameters:
P_fit = res['x']
# we can pass these to gaussian() to evaluate our fitted model
y_fit = gaussian(P_fit, x)
# now let's plot the results:
fig, ax = pp.subplots(1, 1)  # (ax.hold is gone from modern matplotlib; overlaying is the default)
ax.plot(x, y, '-r', lw=2, label='Real')
ax.plot(x, y_noisy, '-k', alpha=0.5, label='Noisy')
ax.plot(x, y_fit, '--b', lw=5, label='Fit')
ax.legend(loc=0, fancybox=True)
*Some solvers, e.g. conjugate gradient methods, take the Jacobian as an additional argument, and by and large these solvers are faster and more robust. But if you're feeling lazy and performance isn't all that critical, you can usually get away without providing the Jacobian, in which case the solver will use finite differences to estimate the gradients.
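For illustration, a hand-derived gradient of the squared-error loss above could be passed to a gradient-based solver such as BFGS via the jac argument. This is a sketch; verify the algebra before relying on it:

# analytic gradient of loss() w.r.t. P = (a, b, c)
def loss_grad(P, x, y):
    a, b, c = P
    e = np.exp(-(x - b)**2 / (2*c**2))
    r = y - a*e                                # residuals
    da = -2*np.sum(r * e)                      # d loss / d a
    db = -2*np.sum(r * a*e * (x - b)/c**2)     # d loss / d b
    dc = -2*np.sum(r * a*e * (x - b)**2/c**3)  # d loss / d c
    return np.array([da, db, dc])

res = minimize(loss, P_guess, method='BFGS', jac=loss_grad, args=(x, y_noisy))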
You can read more about the different solvers here
