Curve Does Not Fit (Python)

I made a curve fitting application, but the curve does not fit the data properly and I can't figure out why.
Here's my code:
import numpy as np
from scipy.optimize import curve_fit
from matplotlib import pyplot as plt
c = [0.3, 0.5, 1, 1.2, 2.1, 2.5 ,2.88 ]
d = [20.93, 25.03, 35.75, 40.37, 66.32, 81.41, 104.52 ]
x = np.array(c)
y = np.array(d)
def test(x, a, b):
    return a * np.sin(b * x)
param, param_cov = curve_fit(test, x, y)
print("Sine function coefficients:")
print(param)
print("Covariance of coefficients:")
print(param_cov)
ans = (param[0]*(np.sin(param[1]*x)))
plt.plot(x, y, 'o', color ='red', label ="data")
plt.plot(x, ans, '--', color ='blue', label ="fitted curve")
plt.legend()
plt.show()

The sine function is a bad choice for this fit, as you can see from the covariance values; an exponential function works much better. You have simply chosen the wrong model.
import numpy as np
from scipy.optimize import curve_fit
from matplotlib import pyplot as plt
c = [0.3, 0.5, 1, 1.2, 2.1, 2.5 ,2.88 ]
d = [20.93, 25.03, 35.75, 40.37, 66.32, 81.41, 104.52 ]
x = np.array(c)
y = np.array(d)
def test(x, a, b):
    return a * np.exp(-b * x)
param, param_cov = curve_fit(test, x, y)
print("Exp function coefficients:")
print(param)
print("Covariance of coefficients:")
print(param_cov)
ans = test(x, *param)
plt.plot(x, y, 'o', color ='red', label ="data")
plt.plot(x, ans, '--', color ='blue', label ="fitted curve")
plt.legend()
plt.show()
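To make the point about the covariance values concrete, here is a small addition of mine (not part of the original answer) that you can append to either script above; it turns the covariance matrix into parameter standard errors and prints the residual sum of squares, so the two models can be compared numerically:
# Parameter standard errors are the square roots of the diagonal of the
# covariance matrix returned by curve_fit (large values indicate a poor fit).
perr = np.sqrt(np.diag(param_cov))
print("Parameter standard errors:", perr)
# The residual sum of squares is another simple figure of merit to compare models.
residuals = y - test(x, *param)
print("Residual sum of squares:", np.sum(residuals**2))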

I am not sure exactly how scipy fits the curve here, but since you are using a sine function, several different fits can look equally optimal. Please check this post; at the bottom it explains how to use an evolutionary approach with SciPy, which might suit your case better: scipy curve_fit do not converge even if I iteratively change initial guess
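As a rough sketch of that idea (my addition; the sine model and the parameter bounds below are arbitrary assumptions, not taken from the linked post), you can let scipy.optimize.differential_evolution search for a good starting point and then hand it to curve_fit:
import numpy as np
from scipy.optimize import curve_fit, differential_evolution
c = [0.3, 0.5, 1, 1.2, 2.1, 2.5, 2.88]
d = [20.93, 25.03, 35.75, 40.37, 66.32, 81.41, 104.52]
x = np.array(c)
y = np.array(d)
def test(x, a, b):
    return a * np.sin(b * x)
# Sum of squared residuals, used as the objective for the evolutionary search
def sse(params):
    a, b = params
    return np.sum((y - test(x, a, b)) ** 2)
bounds = [(-200, 200), (-5, 5)]  # assumed parameter ranges, widen if needed
result = differential_evolution(sse, bounds, seed=3)
# Use the evolutionary result as the initial guess for the local fit
param, param_cov = curve_fit(test, x, y, p0=result.x)
print(param)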

I would also like to suggest a somewhat more automatic way to fit functions to data points (I don't know whether it applies well to your case), but you should check out and give numpy.polyfit a try; the documentation and minimal examples can be seen here.
Just to show how effective the library is, let's run it on your own data points with a third-order polynomial fit using the following simple script:
import matplotlib.pyplot as plt
import numpy as np
c = np.array([0.3, 0.5, 1, 1.2, 2.1, 2.5 ,2.88 ])
d = np.array([20.93, 25.03, 35.75, 40.37, 66.32, 81.41, 104.52 ])
z = np.polyfit(c, d, 3)
p = np.poly1d(z)
xp = np.linspace(0, 3, 100)
plt.plot(c, d, 'o', label = 'data points')
plt.plot(xp, p(xp), '-', label = 'fit pol. 3-order')
plt.legend()
plt.show()
That code returns the fitted polynomial drawn through your points, without your having to worry about picking a function that fits them well, as @blunova brilliantly explained and demonstrated in the other answer (this is especially useful when you're dealing with a lot of data points). You can even use higher-order polynomials to fit your data, but note that at some order they start to oscillate quite strongly and may end up being not so useful. You can use lower orders too.
Just as an example, here is a script with another order so you can compare:
import matplotlib.pyplot as plt
import numpy as np
import warnings
c = np.array([0.3, 0.5, 1, 1.2, 2.1, 2.5 ,2.88 ])
d = np.array([20.93, 25.03, 35.75, 40.37, 66.32, 81.41, 104.52 ])
z = np.polyfit(c, d, 3)
p = np.poly1d(z)
xp = np.linspace(0, 3, 100)
with warnings.catch_warnings():
    warnings.simplefilter('ignore', np.RankWarning)
    p30 = np.poly1d(np.polyfit(c, d, 30))
plt.plot(c, d, 'o', label = 'data points')
plt.plot(xp, p(xp), '-', label = 'fit pol. 3-order')
plt.plot(xp, p30(xp), '--', label = 'fit pol. 30-order')
plt.legend()
plt.show()
Output: a plot comparing the 3rd-order and 30th-order polynomial fits.

Related

How to perform a linear regression with a forced gradient in Python?

I am trying to do a linear regression on some limited and scattered data. I know from theory that the gradient should be 1, but it may have a y-offset. I found a lot of resources on how to force an intercept for linear regression, but never on forcing a gradient. I need the linear regression statistics to be reported and the gradient to be precisely 1.
Would I need to manually calculate the statistics? Or is there a way to use some packages like "statsmodels," "scipy," or "scikit-learn"? Or do I need to use a Bayesian approach with previous knowledge of the gradient?
Here is a graphical example of what I am trying to achieve.
import numpy as np
import matplotlib.pyplot as plt
# Generate random data to illustrate the point
n = 20
x = np.random.uniform(10, 20, n)
y = x - np.random.normal(1, 1, n) # Add noise to the 1:1 relationship
plt.scatter(x, y, ec="k", label="Measured data")
true_x = np.array((8, 20))
plt.plot(true_x, true_x, "k--") # 1:1 line
plt.plot(true_x, true_x-1, "r:", label="Forced gradient") # Theoretical line
m, c = np.polyfit(x, y, 1)
plt.plot(true_x, true_x*m + c, "g:", label="Linear regression")
plt.xlabel("Theoretical value")
plt.ylabel("Measured value")
plt.legend()
I suggest using scipy.optimize.curve_fit, which has the benefit of being flexible and easy to use, also for non-linear regressions. You just need to define a function that represents a line with the gradient fixed at the known value, so that the offset is the only free parameter:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
def func(x, a):
    gradient = 1  # fixed gradient, not optimized
    return gradient * x + a
xdata = np.linspace(0, 4, 50)
y = func(xdata, 2.5)
rng = np.random.default_rng()
y_noise = 0.2 * rng.normal(size=xdata.size)
ydata = y + y_noise
plt.plot(xdata, ydata, 'b-', label='data')
popt, pcov = curve_fit(func, xdata, ydata)
print(popt)
plt.plot(xdata, func(xdata, *popt), 'r-',
         label='fit: a=%5.3f' % tuple(popt))
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()
That generates a plot of the noisy data together with the fitted line (gradient fixed to 1).
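If you also need the usual regression statistics reported (standard errors, confidence intervals), a complementary sketch (my addition, not part of the answer above) is to note that fixing the gradient at 1 reduces the problem to estimating the offset of y - x, which statsmodels can do directly:
import numpy as np
import statsmodels.api as sm
rng = np.random.default_rng(0)
n = 20
x = rng.uniform(10, 20, n)
y = x - rng.normal(1, 1, n)  # noisy 1:1 relationship with an offset
# With the gradient fixed at 1, only the intercept of (y - x) has to be estimated
res = sm.OLS(y - x, np.ones_like(x)).fit()
print(res.summary())        # full regression statistics for the offset
print("offset:", res.params[0])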

Calculating the intersection points of 3 horizontal lines and a cubic spline

I have a problem similar to the ones posted here. I wanted to calculate the intersection points between one cubic spline and 3 horizontal lines. For each of these horizontal lines I know the y-value, and I want to find the corresponding x-value of the intersection.
I hope you can help me. I am sure it is very easy to solve for more experienced coders!
import matplotlib.pyplot as plt
from scipy import interpolate
import numpy as np
x = np.arange(0, 10)
y = np.exp(-x**2.0)
spline = interpolate.interp1d(x, y, kind = "cubic")
xnew = np.arange(0, 9, 0.1)
ynew = spline(xnew)
x1=np.arange(0,10)
y1=1/10*np.ones(10)
x2=np.arange(0,10)
y2=2/10*np.ones(10)
x3=np.arange(0,10)
y3=3/10*np.ones(10)
plt.plot(x,y,'o', xnew, ynew, '-', x1,y1, '-.', x2,y2, '-.', x3,y3, '-.')
plt.show()
list_idx = []
for i in range(1, 4):
    y_i = i / 10
    # look for sign changes of the spline values relative to each horizontal line
    idx = np.argwhere(np.diff(np.sign(ynew - y_i))).flatten()
    list_idx.append(idx)
print(list_idx)
You can use the roots() method of scipy.interpolate.InterpolatedUnivariateSpline to find the intersections. First subtract the y-value from the data and build the spline; its roots are then the x-values at which the original curve reaches that particular y-value.
import matplotlib.pyplot as plt
from scipy import interpolate
import numpy as np
x = np.arange(0, 10)
y = np.exp(-x**2.0)
spline = interpolate.interp1d(x, y, kind = "cubic")
xnew = np.arange(0, 9, 0.1)
ynew = spline(xnew)
x1=np.arange(0,10)
y1=1*np.ones(10)/10
x2=np.arange(0,10)
y2=2*np.ones(10)/10
x3=np.arange(0,10)
y3=3*np.ones(10)/10
plt.plot(x,y,'o', xnew, ynew, '-', x1,y1, '-.', x2,y2, '-.', x3,y3, '-.')
plt.show()
y_val = 0.2
func = y - y_val
sub_funct = interpolate.InterpolatedUnivariateSpline(x, func) # to find the roots we need to subtract y_val from the function
root = sub_funct.roots() # find roots here
print(root)
This prints the x value at which y equals 0.2:
[1.36192179]
EDIT
You can plot the output figure as follows.
plt.arrow(root[0], y_val, 0, -y_val, head_width=0.2, head_length=0.06)  # root is an array, take its first element
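Since the question asks for three horizontal lines, the same idea can be repeated in a small loop (my extension of the answer above, reusing the x and y arrays already defined):
# Repeat the root finding for all three horizontal lines (y = 0.1, 0.2, 0.3)
for y_val in (0.1, 0.2, 0.3):
    sub_funct = interpolate.InterpolatedUnivariateSpline(x, y - y_val)
    print(y_val, sub_funct.roots())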

How to get a robust nonlinear regression fit using scipy.optimize.least_squares?

My specific issue is that I cannot seem to get my data converted to floating point. I have data and simply want to fit a robust curve using my model equation:
y = a * e^(-b*z)
This cookbook is my reference: click
Below is my attempt. I am getting this:
TypeError: 'data type not understood'
which I believe is because my columns are strings, so I tried pd.Series.astype().
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import least_squares
for i in range(1):
    def model(z, a, b):
        y = a * np.exp(-b * z)
        return y
    data = pd.read_excel('{}.xlsx'.format(600+i), names = ['EdGnd','380','395','412','443','465','490','510','520','532','555','560','565','589','625','665','670','683','694','710','Temp','z','EdZTemp','Tilt','Roll','EdZVin'])
    data.dropna(axis = 0, how = 'any')
    data.astype('float')
    np.dtype(data)
    data.plot.scatter('z','380')
    def fun(x, z, y):
        return x[0] * np.exp(-x[1] * z) - y
    x0 = np.ones(3)
    rbst1 = least_squares(fun, x0, loss='soft_l1', f_scale=0.1, args=('z', 'ed380'))
    y_robust = model('z', *rbst1.x)
    plt.plot('z', y_robust, label='robust lsq')
    plt.xlabel('$z$')
    plt.ylabel('$Ed$')
    plt.legend();
I think the problem is that you pass 'z' in args which is a string and can therefore not be used in the multiplication.
Below is some code using curve_fit which uses least_squares but might be slightly easier to use:
import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import curve_fit
# your model definition
def model(z, a, b):
    return a * np.exp(-b * z)
# your input data
x = np.array([20, 30, 40, 50, 60])
y = np.array([5.4, 4.0, 3.0, 2.2, 1.6])
# do the fit with some initial values
popt, pcov = curve_fit(model, x, y, p0=(5, 0.1))
# prepare some data for a plot
xx = np.linspace(20, 60, 1000)
yy = model(xx, *popt)
plt.plot(x, y, 'o', xx, yy)
plt.title('Exponential Fit')
plt.show()
This will plot the data points together with the fitted exponential curve. You could try to adapt this code for your needs.
If you want to use f_scale you can use:
popt, pcov = curve_fit(model, x, y, p0=(5, 0.1), method='trf', f_scale=0.1)
See the documentation:
kwargs
Keyword arguments passed to leastsq for method='lm' or least_squares otherwise.
If you have an unbounded problem, by default method='lm' is used, which uses leastsq, and that does not accept f_scale as a keyword. Therefore we can use method='trf', which then uses least_squares, which does accept f_scale.
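For completeness, here is a minimal sketch (my addition, using the same toy data as above rather than the Excel file from the question) of calling scipy.optimize.least_squares directly with the robust loss, which is what the question originally attempted:
import numpy as np
from scipy.optimize import least_squares
# Toy data, same as in the curve_fit example above
z = np.array([20, 30, 40, 50, 60], dtype=float)
y = np.array([5.4, 4.0, 3.0, 2.2, 1.6])
# Residual function: the first argument holds the parameters (a, b)
def fun(params, z, y):
    return params[0] * np.exp(-params[1] * z) - y
x0 = np.array([5.0, 0.1])  # initial guess for (a, b)
res = least_squares(fun, x0, loss='soft_l1', f_scale=0.1, args=(z, y))
print("a, b =", res.x)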

Sigmoidal curve fit, how to get the value of x when y=0.5

I want to fit the following function and then find the value of x at which y = 0.5.
The function:
import numpy as np
from scipy.optimize import curve_fit
def sigmoid(x, b, c):
    y = 1 / (1 + c*np.exp(-b*x))
    return y
x_data = [4, 6, 8, 10]
y_data = [0.86, 0.73, 0.53, 0.3]
popt, pcov = curve_fit(sigmoid, x_data, y_data, (28.14, -0.25))
Please explain how you would carry this out using Python!
Thanks!
When I run your code I get a warning, and popt is the same as your initial guess, (28.14, -0.25). If you try plotting this you'll see that it's essentially a straight line at y == 1 that doesn't fit your data well at all:
from matplotlib import pyplot as plt
x = np.linspace(4, 10, 1000)
y = sigmoid(x, *popt)
fig, ax = plt.subplots(1, 1)
ax.scatter(x_data, y_data, s=50, zorder=20)
ax.plot(x, y, '-k', lw=2)
The problem is your initial guess for b. With b = 28.14, the term exp(-b*x) is essentially zero over your x range, so the denominator is 1 and the model is stuck at y == 1 regardless of c. Since your data decrease with x, you instead want a negative b and a small positive c:
popt2, pcov2 = curve_fit(sigmoid, x_data, y_data, (-0.5, 0.1))
y2 = sigmoid(x, *popt2)
ax.plot(x, y2, '-r', lw=2)
To get the value of x at y == 0.5 using nonlinear optimization you need to define an objective function, which could be the square of the difference between 0.5 and sigmoid(x, b, c):
def objective(x, b, c):
    return (0.5 - sigmoid(x, b, c)) ** 2
You can then use scipy.optimize.minimize or scipy.optimize.minimize_scalar to find the value of x that minimizes the objective function:
from scipy.optimize import minimize_scalar
res = minimize_scalar(objective, bracket=(4, 10), args=tuple(popt2))
ax.annotate("$y = 0.5$", (res.x, 0.5), (30, 30), textcoords='offset points',
            arrowprops=dict(facecolor='black', shrink=0.05), fontsize='x-large')
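As a side note (my addition, not part of the answer above): for this particular sigmoid the point y = 0.5 can also be found in closed form, because 1 / (1 + c*exp(-b*x)) = 0.5 implies c*exp(-b*x) = 1, so x = ln(c) / b. A quick check against the optimizer result:
b, c = popt2
x_half = np.log(c) / b   # valid as long as the fitted c is positive, which it is here
print("x at y = 0.5:", x_half)
print("check:", sigmoid(x_half, b, c))  # prints 0.5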

Calculate confidence band of least-square fit

I have a question that I have been fighting with for days now.
How do I calculate the (95%) confidence band of a fit?
Fitting curves to data is the everyday job of every physicist, so I think this should be implemented somewhere, but I can't find an implementation for it, nor do I know how to do it mathematically.
The only thing I found is seaborn, which does a nice job for linear least squares.
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns
import pandas as pd
x = np.linspace(0,10)
y = 3*np.random.randn(50) + x
data = {'x':x, 'y':y}
frame = pd.DataFrame(data, columns=['x', 'y'])
sns.lmplot(x='x', y='y', data=frame, ci=95)
plt.savefig("confidence_band.pdf")
But this is just linear least squares. When I want to fit e.g. a saturation curve like y = a*(1 - e^(b*x)), I'm stuck.
Sure, I can calculate the t-distribution from the standard errors of a least-squares method like scipy.optimize.curve_fit, but that is not what I'm looking for.
Thanks for any help!!
You can achieve this easily using StatsModels module.
Also see this example and this answer.
Here is an answer for your question:
import numpy as np
from matplotlib import pyplot as plt
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import summary_table
x = np.linspace(0,10)
y = 3*np.random.randn(50) + x
X = sm.add_constant(x)
res = sm.OLS(y, X).fit()
st, data, ss2 = summary_table(res, alpha=0.05)
fittedvalues = data[:,2]
predict_mean_se = data[:,3]
predict_mean_ci_low, predict_mean_ci_upp = data[:,4:6].T
predict_ci_low, predict_ci_upp = data[:,6:8].T
fig, ax = plt.subplots(figsize=(8,6))
ax.plot(x, y, 'o', label="data")
ax.plot(x, fittedvalues, 'r-', label='OLS')  # plot against x, not the design matrix X
ax.plot(x, predict_ci_low, 'b--')
ax.plot(x, predict_ci_upp, 'b--')
ax.plot(x, predict_mean_ci_low, 'g--')
ax.plot(x, predict_mean_ci_upp, 'g--')
ax.legend(loc='best');
plt.show()
kmpfit's confidence_band() calculates the confidence band for non-linear least squares. Here for your saturation curve:
from pylab import *
from kapteyn import kmpfit
def model(p, x):
    a, b = p
    return a*(1-np.exp(b*x))
x = np.linspace(0, 10, 100)
y = .1*np.random.randn(x.size) + model([1, -.4], x)
fit = kmpfit.simplefit(model, [.1, -.1], x, y)
a, b = fit.params
dfdp = [1-np.exp(b*x), -a*x*np.exp(b*x)]
yhat, upper, lower = fit.confidence_band(x, dfdp, 0.95, model)
scatter(x, y, marker='.', color='#0000ba')
for i, l in enumerate((upper, lower, yhat)):
    plot(x, l, c='g' if i == 2 else 'r', lw=2)
savefig('kmpfit confidence bands.png', bbox_inches='tight')
The dfdp are the partial derivatives ∂f/∂p of the model f = a*(1 - e^(b*x)) with respect to each parameter p (i.e., a and b); see my answer to a similar question for background links. The output plot shows the data points with the fitted curve and the upper and lower confidence bands.
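If you prefer to stay with scipy.optimize.curve_fit, here is a rough sketch of the same idea (my addition, not part of the kmpfit answer): the delta method propagates the parameter covariance through the partial derivatives of the model to get a confidence band for the fitted curve:
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
from scipy.optimize import curve_fit
def model(x, a, b):
    return a * (1 - np.exp(b * x))
x = np.linspace(0, 10, 100)
y = 0.1 * np.random.randn(x.size) + model(x, 1, -0.4)
popt, pcov = curve_fit(model, x, y, p0=[0.1, -0.1])
a, b = popt
yhat = model(x, *popt)
# Partial derivatives of the model with respect to a and b at each x
J = np.column_stack([1 - np.exp(b * x), -a * x * np.exp(b * x)])
# Delta method: variance of the fitted curve at each x, then a 95% band
var_yhat = np.sum((J @ pcov) * J, axis=1)
tval = stats.t.ppf(0.975, len(x) - len(popt))
band = tval * np.sqrt(var_yhat)
plt.plot(x, y, '.', x, yhat, '-')
plt.fill_between(x, yhat - band, yhat + band, alpha=0.3)
plt.show()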
