Trying to fit the Gaussian function to data - python

I'm trying to fit a Gaussian function to some approximately Gaussian data in Python using the curve_fit method. For the initial guesses for the parameters I've calculated the mean and standard deviation of the data. However, I'm getting a really bad fit and I'm not sure why. This is my code:
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
import numpy as np
import math
def func(x, mu, sig):
    return (1./(sig*math.sqrt(2*math.pi)))*np.exp(-np.power(x-mu,2.)/(2*np.power(sig, 2.)))
xdata = np.linspace(4, 14, 21)
ydata = np.array([0.2,0.8,1.8,1.9,5.9,7,11,12.6,14,13.3,11.8,9.3,5.2,3.1,1.5,0.7,0.4,0.1,0.3,0.1,0.1])
plt.plot(xdata, ydata, 'b-', label='data')
popt, pcov = curve_fit(func, xdata, ydata,[4.8,5.1])
plt.plot(xdata, func(xdata, *popt), 'r-', label='fit')
The fitted model (in red) looks like this: [plot not included]

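You're getting a nonsense answer because your function is the normalised Gaussian (a probability density with unit area), whose peak height is fixed at 1/(sig*sqrt(2*pi)). Your data is not normalised: it peaks at about 14, so no choice of mu and sig can match both its height and its width.
If you instead use the general form with three parameters (amplitude, centre and width), you get a much better fit.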
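import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
def generalised_gaussian(x, a, b, c):
    # a: peak height, b: centre, c: width (standard deviation)
    return a*np.exp(-np.power(x - b, 2) / (2*c**2))
xdata = np.linspace(4, 14, 21)
ydata = np.array([0.2,0.8,1.8,1.9,5.9,7,11,12.6,14,13.3,11.8,9.3,5.2,3.1,1.5,0.7,0.4,0.1,0.3,0.1,0.1])
plt.plot(xdata, ydata, 'b-', label='data')
# start from the observed peak; the default guess of all ones is far from this data
p0 = [ydata.max(), xdata[ydata.argmax()], 1.0]
popt, pcov = curve_fit(generalised_gaussian, xdata, ydata, p0=p0)
plt.plot(xdata, generalised_gaussian(xdata, *popt), 'r-', label='fit')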
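One way to check this diagnosis, and to get sensible starting values, is to compute the moments of x weighted by y instead of the plain mean and standard deviation of ydata (which is what the guesses 4.8 and 5.1 in the question appear to be). The sketch below assumes equally spaced x values; the names gaussian, area, mean_x and std_x are made up for this illustration, and this is only one possible way to seed the fit.

```python
import numpy as np
from scipy.optimize import curve_fit

def gaussian(x, a, b, c):
    # a: peak height, b: centre, c: standard deviation
    return a * np.exp(-(x - b) ** 2 / (2 * c ** 2))

xdata = np.linspace(4, 14, 21)
ydata = np.array([0.2, 0.8, 1.8, 1.9, 5.9, 7, 11, 12.6, 14, 13.3, 11.8,
                  9.3, 5.2, 3.1, 1.5, 0.7, 0.4, 0.1, 0.3, 0.1, 0.1])

# Rectangle-rule area under the data; a normalised density would give about 1.
dx = xdata[1] - xdata[0]
area = ydata.sum() * dx

# Moments of x weighted by y: estimates of the centre and the width.
mean_x = np.sum(xdata * ydata) / ydata.sum()
std_x = np.sqrt(np.sum(ydata * (xdata - mean_x) ** 2) / ydata.sum())

p0 = [ydata.max(), mean_x, std_x]
popt, pcov = curve_fit(gaussian, xdata, ydata, p0=p0)
print("area under data:", area)
print("fitted a, b, c:", popt)
```

The area comes out far above 1, which is why the two-parameter normalised form cannot reach the data.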

Related

How to perform a linear regression with a forced gradient in Python?

I am trying to do a linear regression on some limited and scattered data. I know from theory that the gradient should be 1, but it may have a y-offset. I found a lot of resources on how to force an intercept for linear regression, but never on forcing a gradient. I need the linear regression statistics to be reported and the gradient to be precisely 1.
Would I need to manually calculate the statistics? Or is there a way to use some packages like "statsmodels," "scipy," or "scikit-learn"? Or do I need to use a Bayesian approach with previous knowledge of the gradient?
Here is a graphical example of what I am trying to achieve.
import numpy as np
import matplotlib.pyplot as plt
# Generate random data to illustrate the point
n = 20
x = np.random.uniform(10, 20, n)
y = x - np.random.normal(1, 1, n) # Add noise to the 1:1 relationship
plt.scatter(x, y, ec="k", label="Measured data")
true_x = np.array((8, 20))
plt.plot(true_x, true_x, "k--") # 1:1 line
plt.plot(true_x, true_x-1, "r:", label="Forced gradient") # Theoretical line
m, c = np.polyfit(x, y, 1)
plt.plot(true_x, true_x*m + c, "g:", label="Linear regression")
plt.xlabel("Theoretical value")
plt.ylabel("Measured value")
plt.legend()
I suggest using scipy.optimize.curve_fit, which is flexible, easy to use, and also works for non-linear regressions. You just need to define a function that represents a line with a fixed gradient and takes the offset as its only fit parameter:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
def func(x, a):
    gradient = 1 # fixed gradient, not optimized
    return gradient * x + a
xdata = np.linspace(0, 4, 50)
y = func(xdata, 2.5)
rng = np.random.default_rng()
y_noise = 0.2 * rng.normal(size=xdata.size)
ydata = y + y_noise
plt.plot(xdata, ydata, 'b-', label='data')
popt, pcov = curve_fit(func, xdata, ydata)
print(popt)
plt.plot(xdata, func(xdata, *popt), 'r-',
         label='fit: a=%5.3f' % tuple(popt))
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()
That generates a plot of the noisy data with the fitted line overlaid and the fitted offset a shown in the legend.
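Because the gradient is fixed, the least-squares offset also has a closed form: it is the mean of y - x, and its standard error is the sample standard deviation of the residuals divided by sqrt(n). The sketch below uses seeded synthetic data in the spirit of the question (the seed and the names offset, offset_se and r_squared are assumptions for illustration) and checks the closed form against curve_fit.

```python
import numpy as np
from scipy.optimize import curve_fit

def line_unit_gradient(x, a):
    # gradient fixed at 1; only the offset a is fitted
    return x + a

rng = np.random.default_rng(0)
n = 20
x = rng.uniform(10, 20, n)
y = x - rng.normal(1, 1, n)  # 1:1 relationship with an offset near -1

# Fit with curve_fit; the standard error comes from the covariance matrix.
popt, pcov = curve_fit(line_unit_gradient, x, y)
offset_fit = popt[0]
offset_se_fit = np.sqrt(pcov[0, 0])

# Closed-form equivalents.
offset = np.mean(y - x)
residuals = y - (x + offset)
offset_se = residuals.std(ddof=1) / np.sqrt(n)
r_squared = 1 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)

print("offset:", offset_fit, "+/-", offset_se_fit)
print("closed form:", offset, "+/-", offset_se)
print("R^2:", r_squared)
```

Note that with a forced gradient R^2 is not guaranteed to lie between 0 and 1; it can be negative if the constrained line fits worse than a horizontal line at the mean of y.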

How to do Non-Linear Curve fitting using Python with user defined function?

I have the following dataset:
x = 2.5, 5, 7.5, 10, 20, 30
y = 18035.21768722, 18176.09871938, 18370.22289623, 18430.68522672, 18490.76110193, 18512.69861061
Now I want to plot this data, fit it with my own function f(x) = (A*K*x/1+K*x), and find the parameters A and K.
I wrote the following python script but it seems like it can't do the fitting I require:
import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import curve_fit
x = np.array([2.5, 5, 7.5, 10, 20, 30])
y = np.array([18035.21768722, 18176.09871938, 18370.22289623, 18430.68522672, 18490.76110193, 18512.69861061])
def func(x, A, K):
    return (A*K*x / 1+K*x)
plt.plot(x, y, 'b-', label='data')
popt, pcov = curve_fit(func, x, y)
plt.plot(x, func(x, *popt), 'r-', label='fit')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()
Can anyone help me with changes to this script, or with a new script, so that I can properly fit the data with my desired function?
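Try fixing your function definition as suggested by HS-nebula in the comments (the denominator needs parentheses, otherwise Python evaluates A*K*x/1 + K*x), and add an initial guess for the parameters, K in particular, using the p0 argument of curve_fit: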
import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import curve_fit
x = np.array([2.5, 5, 7.5, 10, 20, 30])
y = np.array([18035.21768722, 18176.09871938, 18370.22289623, 18430.68522672, 18490.76110193, 18512.69861061])
def func(x, A, K):
    return A*K*x / (1+K*x)
plt.plot(x, y, 'b-', label='data')
popt, pcov = curve_fit(func, x, y, p0=[1, 1e6])
plt.plot(x, func(x, *popt), 'r-', label='fit')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()
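The original expression goes wrong because of operator precedence: A*K*x / 1+K*x is parsed as (A*K*x / 1) + K*x. The small sketch below shows the difference with made-up numbers, then fits noise-free synthetic data (the true values A=5 and K=0.3 are assumptions chosen for illustration) using starting values read off the data.

```python
import numpy as np
from scipy.optimize import curve_fit

def buggy(x, A, K):
    return A*K*x / 1+K*x          # parsed as (A*K*x / 1) + K*x

def saturating(x, A, K):
    return A*K*x / (1 + K*x)      # intended form

print(buggy(4.0, 2.0, 3.0))       # 24 + 12 = 36.0
print(saturating(4.0, 2.0, 3.0))  # 24 / 13

# Noise-free synthetic data from the intended model.
x = np.linspace(0.5, 30, 20)
y = saturating(x, 5.0, 0.3)

# Rough starting values: A near the largest y, K near the inverse of a typical x.
p0 = [y.max(), 1.0 / x.mean()]
popt, pcov = curve_fit(saturating, x, y, p0=p0)
print(popt)
```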

How do I get a smooth fit for my data points, using "scipy.optimize.curve_fit"?

I want to fit some data points using scipy.optimize.curve_fit. Unfortunately I get an unsteady fit and I do not know why.
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
M = np.array([730,910,1066,1088,1150], dtype=float)
V = np.array([95.71581923, 146.18564513, 164.46723727, 288.49796413, 370.98703941], dtype=float)
def func(x, a, b, c):
    return a * np.exp(b * x) + c
popt, pcov = curve_fit(func, M, V, [0,0,1], maxfev=100000000)
print(*popt)
fig, ax = plt.subplots()
fig.dpi = 80
ax.plot(M, V, 'go', label='data')
ax.plot(M, func(M, *popt), '-', label='fit')
plt.xlabel("M")
plt.ylabel("V")
plt.grid()
plt.legend()
plt.show()
I would actually expect some kind of smooth curve. Can someone explain what I am doing wrong here?
You are only plotting the fit at the same x points as the original data in your call, so matplotlib just joins those five points with straight lines:
ax.plot(M, V, 'go', label='data')
ax.plot(M, func(M, *popt), '-', label='fit')
To fix this, evaluate the fitted function on a denser set of x values. Here we use every integer from 700 to 1199:
toplot = np.arange(700,1200)
ax.plot(toplot, func(toplot, *popt), '-', label='fit')
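A slightly more general variant builds the plotting grid from the range of the data with np.linspace, so it adapts if the data changes. In this sketch the values in popt_demo are hypothetical placeholders, not the result of the fit above.

```python
import numpy as np

def func(x, a, b, c):
    return a * np.exp(b * x) + c

M = np.array([730, 910, 1066, 1088, 1150], dtype=float)

# Hypothetical parameters standing in for the popt returned by curve_fit.
popt_demo = (1.0, 0.005, 50.0)

# 200 evenly spaced points spanning the measured range.
x_dense = np.linspace(M.min(), M.max(), 200)
y_dense = func(x_dense, *popt_demo)

# In the plotting code this would replace the fit line:
# ax.plot(x_dense, y_dense, '-', label='fit')
print(x_dense[:3], y_dense[:3])
```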

How do I fit my function with data to get fit parameters?

I have a set of data where x and y are the known quantities in my function (they appear in the function as x=x and y=x1), and I need to fit the data to get values for the unknown parameters (E, B0, S0).
I have this so far, but when I try to run it I get the error:
ValueError: x and y must have same first dimension, but have shapes (4L,) and (1L,)
This error happens when I try to plot the data against the fit curve. I also get this error regarding the bounds I have set up:
lb, ub = [np.asarray(b, dtype=float) for b in bounds]
ValueError: too many values to unpack
Here is my code:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
def func (x, x1, E, B0, S0):
    # function to optimize where x and x1 are known
    # E, B0, S0 need to be fitted
    return sum((x-np.power((E*B0*(1+((x1-S0)/(B0)))),(1/2)))**2)
#define the data to be fit
xdata = [0.00, 3.42, 4.56, 5.31] #distance
ydata = [335.4, 149.1, 167.1, 292.2] # beam size
plt.plot(xdata, ydata, 'b-', label='data')
pl.show()
# fit for parameters E, B0, and S0
popt, pcov = curve_fit(func, xdata, ydata)
plt.plot(xdata, func(xdata, *popt), 'r-', label='fit')
# put bounds on the optimization: 0.5<E<5, 1<S0<10, 0.1<B0<10
bnds= [(0.5,5.0),(0.1,10.0),(1,10)]
popt, pcov = curve_fit(func, xdata, ydata, bounds = [(0.5,5.0),(0.1,10.0),
                                                     (1.0,10.0)])
plt.plot(xdata,func(xdata, *popt),'g--', label='fit-with-bounds')
plt.xlabel('distance')
plt.ylabel('beam size')
plt.legend()
plt.show()
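It's not clear what the sum in func is supposed to do. curve_fit needs the model to return one value per data point, but sum collapses the result to a single number, which is also why the plot call complains about shapes (4,) and (1,). Leave it out to get rid of the first error.
Second, the bounds argument of curve_fit does apply to the parameters, but it must be a 2-tuple (lower bounds, upper bounds) with one entry per parameter, not a list of (min, max) pairs. Passing three pairs is what triggers "too many values to unpack". Leaving the bounds out, as in the code below, gets rid of the second error.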
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
def func (x, x1, E, B0, S0):
    # function to optimize where x and x1 are known
    # E, B0, S0 need to be fitted
    return (x-np.power((E*B0*(1.+((x1-S0)/(B0)))),(1/2.)))**2
#define the data to be fit
xdata = [0.00, 3.42, 4.56, 5.31] #distance
ydata = [335.4, 149.1, 167.1, 292.2] # beam size
plt.plot(xdata, ydata, 'b-', label='data')
# fit for parameters E, B0, and S0
popt, pcov = curve_fit(func, xdata, ydata)
plt.plot(xdata, func(xdata, *popt), 'r-', label='fit')
popt, pcov = curve_fit(func, xdata, ydata)
plt.plot(xdata,func(xdata, *popt),'g--', label='fit-with-bounds')
plt.xlabel('distance')
plt.ylabel('beam size')
plt.legend()
plt.show()
Since both calls are now identical, the "fit" and "fit-with-bounds" curves are obviously the same.
Edit: To fit for E, B0, S0 only, the fit function should only take those values as arguments.
funcwithx1 = lambda x,x1, E, B0, S0: (x-np.power((E*B0*(1.+((x1-S0)/(B0)))),(1/2.)))**2
x1 = 4.6
func = lambda x, E, B0, S0: funcwithx1(x, x1, E, B0, S0)
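For reference, if bounds on the parameters are wanted, curve_fit takes them as a 2-tuple (lower, upper) with one entry per fit parameter, rather than as a list of (min, max) pairs. The sketch below applies the limits from the question to a hypothetical stand-in model and noise-free synthetic data, since the asker's actual model is not fully specified; only the shape of the bounds argument is the point here.

```python
import numpy as np
from scipy.optimize import curve_fit

def stand_in_model(x, E, B0, S0):
    # Hypothetical model, used only to demonstrate the bounds format.
    return E * (x - S0) ** 2 + B0

# Synthetic data generated with E=2, B0=3, S0=4 (assumed values).
x = np.linspace(0.0, 8.0, 30)
y = stand_in_model(x, 2.0, 3.0, 4.0)

# One lower and one upper limit per parameter, in the order E, B0, S0.
lower = [0.5, 0.1, 1.0]
upper = [5.0, 10.0, 10.0]

popt, pcov = curve_fit(stand_in_model, x, y,
                       p0=[1.0, 1.0, 2.0],   # must lie inside the bounds
                       bounds=(lower, upper))
print(popt)
```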
The function is wrongly defined. You know both the independent and the dependent variable, but the fitted function should receive only the independent one, followed by the parameters:
y = func(x; params)
As it stands, your objective function has 4 parameters to be determined (x1, E, B0, S0), because curve_fit treats every argument after the first as a fit parameter.
Later on, when invoking curve_fit, you correctly supply both the independent and the dependent variable:
popt, pcov = curve_fit(func, xdata, ydata)
Thus popt is an array of length 4, which is probably causing part of your problems.
I don't know your exact objective function, so I won't attempt to fix that. I hope this guides you towards solving the issue.

scipy.optimize.curve_fit failing to fit curve

I am trying to fit some data using the following code:
xdata = [0.03447378, 0.06894757, 0.10342136, 0.13789514, 0.17236893,
0.20684271, 0.24131649, 0.27579028, 0.31026407, 0.34473785,
0.37921163, 0.41368542, 0.44815921, 0.48263299]
ydata = [ 2.5844 , 2.87449, 3.01929, 3.10584, 3.18305, 3.24166,
3.28897, 3.32979, 3.35957, 3.39193, 3.41662, 3.43956,
3.45644, 3.47135]
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
def func(x, a, b, c, d):
    return a + b*x - c*np.exp(-d*x)
popt, pcov = curve_fit(func, xdata, ydata))
plt.figure()
plt.plot(xdata, ydata, 'ko', label="Original Noised Data")
plt.plot(xdata, func(xdata, *popt), 'r-', label="Fitted Curve")
plt.legend()
plt.show()
The curve is not being fitted:
[Image: data fit with a straight line; it should be a curve]
What should I be doing to correctly fit the data?
It looks like the optimizer is getting stuck in a local minimum, or perhaps just in a very flat region of the objective function. A better fit can be found by changing the initial guess of the parameters that curve_fit starts from. For example, I get a reasonable-looking fit with p0=[1, 1, 1, 2.0] (the default is [1, 1, 1, 1]).
Here's the modified version of your script that I used:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
def func(x, a, b, c, d):
    return a + b*x - c*np.exp(-d*x)
xdata = np.array([0.03447378, 0.06894757, 0.10342136, 0.13789514, 0.17236893,
0.20684271, 0.24131649, 0.27579028, 0.31026407, 0.34473785,
0.37921163, 0.41368542, 0.44815921, 0.48263299])
ydata = np.array([ 2.5844 , 2.87449, 3.01929, 3.10584, 3.18305, 3.24166,
3.28897, 3.32979, 3.35957, 3.39193, 3.41662, 3.43956,
3.45644, 3.47135])
p0 = [1, 1, 1, 2.0]
popt, pcov = curve_fit(func, xdata, ydata, p0=p0)
print(popt)
plt.figure()
plt.plot(xdata, ydata, 'ko', label="Original Noised Data")
plt.plot(xdata, func(xdata, *popt), 'r-', label="Fitted Curve")
plt.legend(loc='best')
plt.show()
The printed output is:
[ 3.13903988 0.71827903 0.97047248 15.40936232]
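When it is not obvious which starting point to use, one common workaround is to try several initial guesses and keep the fit with the smallest sum of squared residuals. The sketch below does that for the data from the question. The list of trial values for d and the names best_popt and best_ssr are arbitrary choices for illustration, and this is a heuristic rather than a guarantee of finding the global minimum.

```python
import warnings
import numpy as np
from scipy.optimize import curve_fit, OptimizeWarning

def func(x, a, b, c, d):
    return a + b*x - c*np.exp(-d*x)

xdata = np.array([0.03447378, 0.06894757, 0.10342136, 0.13789514, 0.17236893,
                  0.20684271, 0.24131649, 0.27579028, 0.31026407, 0.34473785,
                  0.37921163, 0.41368542, 0.44815921, 0.48263299])
ydata = np.array([2.5844, 2.87449, 3.01929, 3.10584, 3.18305, 3.24166,
                  3.28897, 3.32979, 3.35957, 3.39193, 3.41662, 3.43956,
                  3.45644, 3.47135])

best_popt, best_ssr = None, np.inf
for d0 in [1.0, 2.0, 5.0, 10.0, 20.0]:   # trial starting values for d
    try:
        with warnings.catch_warnings():
            warnings.simplefilter("ignore", OptimizeWarning)
            popt, _ = curve_fit(func, xdata, ydata, p0=[1, 1, 1, d0])
    except RuntimeError:
        continue  # this start did not converge; try the next one
    ssr = np.sum((ydata - func(xdata, *popt)) ** 2)
    if ssr < best_ssr:
        best_popt, best_ssr = popt, ssr

print(best_popt, best_ssr)
```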
Please try to be more specific about the issue you're having.
Two things I noticed that will prevent your code from working as it is:
In the curve_fit() call, there is an extra right parenthesis at the end of the line.
xdata is a Python list, so func may fail when it multiplies x by a parameter; turn it into a NumPy array with
xdata = np.array(xdata)
If you fix these two issues, the fit should work.
Edit: Warren is of course right: even with the above issues fixed, the default initial guess still sends the optimizer towards a wrong minimum.
