Fitting two peaks with gauss in python

Fitting two peaks with gauss in python - python

Curve_fit is not fit properly. I'm trying to fit experimental data with curve_fit. The data is imported from a .txt file to a array:
d = np.loadtxt("data.txt")
data_x = np.array(d[:, 0])
data_y = np.array(d[:, 2])
data_y_err = np.array(d[:, 3])
Since i know there must be two peaks, my model is a sum of two gaussian curves:
def model_dGauss(x, xc, A, y0, w, dx):
P = A/(w*np.sqrt(2*np.pi))
mu1 = (x - (xc - dx/3))/(2*w**2)
mu2 = (x - (xc + 2*dx/3))/(2*w**2)
return y0 + P * ( np.exp(-mu1**2) + 0.5 * np.exp(-mu2**2))
Using values for the guess is very sensitive to my guess values. Where is the point of fitting data if just nearly perfect fitting parameter will provide a result? Or am I doing something completely wrong?
t = np.linspace(8.4, 10, 300)
guess_dG = [32, 1, 10, 0.1, 0.2]
popt, pcov = curve_fit(model_dGauss, data_x, data_y, p0=guess_dG, sigma=data_y_err, absolute_sigma=True)
A, xc, y0, w, dx = popt
Plotting the data
plt.scatter(data_x, data_y)
plt.plot(t, model_dGauss(t1,*popt))
plt.errorbar(data_x, data_y, yerr=data_y_err)
yields:
Plot result
The result is just a straight line at the bottom of my graph while the evaluated parameters are not that bad. How can that be?

A complete example of code is always appreciated (and, ahem, usually expected here on SO). To remove much of the confusion about using curve_fit here, allow me to suggest that you will have an easier time using lmfit (https://lmfit.github.io/lmfit-py) and especially its builtin model functions and its use of named parameters. With lmfit, your code for two Gaussians plus a constant offset might look like this:
from lmfit.models import GaussianModel, ConstantModel
# start with 1 Gaussian + Constant offset:
model = GaussianModel(prefix='p1_') + ConstantModel()
# this model will have parameters named:
# p1_amplitude, p1_center, p1_sigma, and c.
# here we give initial values to these parameters
params = model.make_params(p1_amplitude=10, p1_center=32, p1_sigma=0.5, c=10)
# optionally place bounds on parameters (probably not needed here):
params['p1_amplitude'].min = 0.
## params['p1_center'].vary = False # fix a parameter from varying in fit
# now do the fit (including weighting residual by 1/y_err):
result = model.fit(data_y, params, x=data_x, weights=1.0/data_y_err)
# print out param values, uncertainties, and fit statistics, or get best-fit
# parameters from `result.params`
print(result.fit_report())
# plot results
plt.errorbar(data_x, data_y, yerr=data_y_err, label='data')
plt.plot(data_x, result.best_fit, label='best fit')
plt.legend()
plt.show()
To add a second Gaussian, you could just do
model = GaussianModel(prefix='p1_') + GaussianModel(prefix='p2_') + ConstantModel()
# and then:
params = model.make_params(p1_amplitude=10, p1_center=32, p1_sigma=0.5, c=10,
p2_amplitude=2, p2_center=31.75, p1_sigma=0.5)
and so on.
Your model has the two gaussian sharing or at least having "linked" values - the sigma values should be the same for the two peaks and the amplitude of the 2nd is half that of the 1st. As defined so far, the 2-Gaussian model has all the parameters being independent. But lmfit has a mechanism for setting constraints on any parameter by giving an algebraic expression in terms of other parameters. So, for example, you could say
params['p2_sigma'].expr = 'p1_sigma'
params['p2_amplitude'].expr = 'p1_amplitude / 2.0'
Now, p2_amplitude and p2_sigma will not be independently varied in the fit but will be constrained to have those values.

Related

Trying to fit a trig function to data with scipy

I am trying to fit some data using scipy.optimize.curve_fit. I have read the documentation and also this StackOverflow post, but neither seem to answer my question.
I have some data which is simple, 2D data which looks approximately like a trig function. I want to fit it with a general trig function
using scipy.
My approach is as follows:
from __future__ import division
import numpy as np
from scipy.optimize import curve_fit
#Load the data
data = np.loadtxt('example_data.txt')
t = data[:,0]
y = data[:,1]
#define the function to fit
def func_cos(t,A,omega,dphi,C):
# A is the amplitude, omega the frequency, dphi and C the horizontal/vertical shifts
return A*np.cos(omega*t + dphi) + C
#do a scipy fit
popt, pcov = curve_fit(func_cos, t,y)
#Plot fit data and original data
fig = plt.figure(figsize=(14,10))
ax1 = plt.subplot2grid((1,1), (0,0))
ax1.plot(t,y)
ax1.plot(t,func_cos(t,*popt))
This outputs:
where blue is the data orange is the fit. Clearly I am doing something wrong. Any pointers?

If no values are provided for initial guess of the parameters p0 then a value of 1 is assumed for each of them. From the docs:
p0 : array_like, optional
Initial guess for the parameters (length N). If None, then the initial values will all be 1 (if the number of parameters for the function can be determined using introspection, otherwise a ValueError is raised).
Since your data has very large x-values and very small y-values an initial guess of 1 is far from the actual solution and hence the optimizer does not converge. You can help the optimizer by providing suitable initial parameter values that can be guessed / approximated from the data:
Amplitude: A = (y.max() - y.min()) / 2
Offset: C = (y.max() + y.min()) / 2
Frequency: Here we can estimate the number of zero crossing by multiplying consecutive y-values and check which products are smaller than zero. This number divided by the total x-range gives the frequency and in order to get it in units of pi we can multiply that number by pi: y_shifted = y - offset; oemga = np.pi * np.sum(y_shifted[:-1] * y_shifted[1:] < 0) / (t.max() - t.min())
Phase shift: can be set to zero, dphi = 0
So in summary, the following initial parameter guess can be used:
offset = (y.max() + y.min()) / 2
y_shifted = y - offset
p0 = (
(y.max() - y.min()) / 2,
np.pi * np.sum(y_shifted[:-1] * y_shifted[1:] < 0) / (t.max() - t.min()),
0,
offset
)
popt, pcov = curve_fit(func_cos, t, y, p0=p0)
Which gives me the following fit function:

Fitting Voigt function to data in Python

I recently got a script running to fit a gaussian to my absorption profile with help of SO. My hope was that things would work fine if I simply replace the Gauss function by a Voigt one, but this seems not to be the case. I think mainly due to the fact that it is a shifted voigt.
Edit: The profiles are absorption lines that vary in optical thickness. In practice they will be a mix between optically thick and thin features. Like the bottom part in this diagram. The current data will be more like the top image, but maybe the bottom is already flattened a bit. (And we only see the left side of the profile, a bit beyond the center)
For a Gauss it looks like this and as predicted the bottom seems to be less deep than the fit wants it to be, but still quite close. The profile itself should still be a voigt though. But now I realize that the central points might throw off the fit. So maybe a weight should be added based on wing position?
I'm mostly wondering if the shifted function could be mis-defined or if its my starting values.
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
import numpy as np
from scipy.special import wofz
x = np.arange(13)
xx = xx = np.linspace(0, 13, 100)
y = np.array([19699.959 , 21679.445 , 21143.195 , 20602.875 , 16246.769 ,
11635.25 , 8602.465 , 7035.493 , 6697.0337, 6510.092 ,
7717.772 , 12270.446 , 16807.81 ])
# weighted arithmetic mean (corrected - check the section below)
#mean = 2.4
sigma = 2.4
gamma = 2.4
def Gauss(x, y0, a, x0, sigma):
return y0 + a * np.exp(-(x - x0)**2 / (2 * sigma**2))
def Voigt(x, x0, y0, a, sigma, gamma):
#sigma = alpha / np.sqrt(2 * np.log(2))
return y0 + a * np.real(wofz((x - x0 + 1j*gamma)/sigma/np.sqrt(2))) / sigma /np.sqrt(2*np.pi)
popt, pcov = curve_fit(Voigt, x, y, p0=[8, np.max(y), -(np.max(y)-np.min(y)), sigma, gamma])
#p0=[8, np.max(y), -(np.max(y)-np.min(y)), mean, sigma])
plt.plot(x, y, 'b+:', label='data')
plt.plot(xx, Voigt(xx, *popt), 'r-', label='fit')
plt.legend()
plt.show()

I may be misunderstanding the model you're using, but I think you would need to include some sort of constant or linear background.
To do that with lmfit (which has Voigt, Gaussian, and many other models built in, and tries very hard to make these interchangeable), I would suggest starting with something like this:
import numpy as np
import matplotlib.pyplot as plt
from lmfit.models import GaussianModel, VoigtModel, LinearModel, ConstantModel
x = np.arange(13)
xx = np.linspace(0, 13, 100)
y = np.array([19699.959 , 21679.445 , 21143.195 , 20602.875 , 16246.769 ,
11635.25 , 8602.465 , 7035.493 , 6697.0337, 6510.092 ,
7717.772 , 12270.446 , 16807.81 ])
# build model as Voigt + Constant
## model = GaussianModel() + ConstantModel()
model = VoigtModel() + ConstantModel()
# create parameters with initial values
params = model.make_params(amplitude=-1e5, center=8,
sigma=2, gamma=2, c=25000)
# maybe place bounds on some parameters
params['center'].min = 2
params['center'].max = 12
params['amplitude'].max = 0.
# do the fit, print out report with results
result = model.fit(y, params, x=x)
print(result.fit_report())
# plot data, best fit, fit interpolated to `xx`
plt.plot(x, y, 'b+:', label='data')
plt.plot(x, result.best_fit, 'ko', label='fitted points')
plt.plot(xx, result.eval(x=xx), 'r-', label='interpolated fit')
plt.legend()
plt.show()
And, yes, you can simply replace VoigtModel() with GaussianModel() or LorentzianModel() and redo the fit and compare the fit statistics to see which model is better.
For the Voigt model fit, the printed report would be
[[Model]]
(Model(voigt) + Model(constant))
[[Fit Statistics]]
# fitting method = leastsq
# function evals = 41
# data points = 13
# variables = 4
chi-square = 17548672.8
reduced chi-square = 1949852.54
Akaike info crit = 191.502014
Bayesian info crit = 193.761811
[[Variables]]
amplitude: -173004.338 +/- 30031.4068 (17.36%) (init = -100000)
center: 8.06574198 +/- 0.16209266 (2.01%) (init = 8)
sigma: 1.96247322 +/- 0.23522096 (11.99%) (init = 2)
c: 23800.6655 +/- 1474.58991 (6.20%) (init = 25000)
gamma: 1.96247322 +/- 0.23522096 (11.99%) == 'sigma'
fwhm: 7.06743644 +/- 0.51511574 (7.29%) == '1.0692*gamma+sqrt(0.8664*gamma**2+5.545083*sigma**2)'
height: -18399.0337 +/- 2273.61672 (12.36%) == '(amplitude/(max(2.220446049250313e-16, sigma*sqrt(2*pi))))*wofz((1j*gamma)/(max(2.220446049250313e-16, sigma*sqrt(2)))).real'
[[Correlations]] (unreported correlations are < 0.100)
C(amplitude, c) = -0.957
C(amplitude, sigma) = -0.916
C(sigma, c) = 0.831
C(center, c) = -0.151
Note that by default gamma is constrained to be the same value as sigma. This constraint can be lifted and gamma made to vary independently with params['gamma'].set(expr=None, vary=True, min=1.e-9). I think that you may not have enough data points in this data set to robustly and independently determine gamma.
The plot for that fit would look like this:

I managed to get something, but not very satisfying. If you remove the offset as a parameter and add 20000 directly in the Voigt function, with starting values [8, 126000, 0.71, 2] (the particular values don't' matter much) you'll get something like
Now, the fit produces a value for gamma which is negative which I cannot really justify. I would expect gamma to always be positive, but maybe I'm wrong and it's completely fine.
One thing you could try is to mirror your data so that its a "positive" peak (and while at it removing the background) and/or normalize the values. That might help you in the convergence.
I have no idea why when using the offset as a parameter the solver has problems finding an optimum. Maybe you need a different optimizer routine.
Maybe it'll be a better option to use the lmfit package that it's a wrapper over scipy to fit nonlinear functions with many prebuilt lineshapes. There is even an example of fitting a Voigt profile.

Scipy optimize curve_fit gives different plots for same parameters when fitting custom function

I have a problem with fitting a custom function using scipy.optimize in Python and I do not know, why that is happening. I generate data from centered and normalized binomial distribution (Gaussian curve) and then fit a curve. The expected outcome is in the picture when I plot my function over the fitted data. But when I do the fitting, it fails.
I'm convinced it is a pythonic thing because it should give the parameter a = 1 (that's how I define it) and it gives it but then the fit is bad (see picture). However, if I change sigma to 0.65*sigma in:
p_halfg, p_halfg_cov = optimize.curve_fit(lambda x, a:piecewise_half_gauss(x, a, sigma = 0.65*sigma_fit), x, y, p0=[1])
, it gives almost perfect fit (a is then 5/3, as predicted by math). Those fits should be the same and they are not!
I give more comments bellow. Could you please tell me what is happening and where the problem could be?
Plot with a=1 and sigma = sigma_fit
Plot with sigma = 0.65*sigma_fit
I generate data from normalized binomial distribution (I can provide my code but the values are more important now). It is a distribution with N = 10 and p = 0.5 and I'm centering it and taking only the right side of the curve. Then I'm fitting it with my half-gauss function, which should be the same distribution as binomial if its parameter a = 1 (and the sigma is equal to the sigma of the distribution, sqrt(np(1-p)) ). Now the problem is first that it does not fit the data as shown in the picture despite getting the correct value of parameter a.
Notice weird stuff... if I set sigma = 3* sigma_fit, I get a = 1/3 and a very bad fit (underestimate). If I set it to 0.2*sigma_fit, I get also a bad fit and a = 1/0.2 = 5 (overestimate). And so on. Why? (btw. a=1/sigma so the fitting procedure should work).
import numpy as np
import matplotlib.pyplot as plt
import math
pi = math.pi
import scipy.optimize as optimize
# define my function
sigma_fit = 1
def piecewise_half_gauss(x, a, sigma=sigma_fit):
"""Half of normal distribution curve, defined as gaussian centered at 0 with constant value of preexponential factor for x < 0
Arguments: x values as ndarray whose numbers MUST be float type (use linspace or np.arange(start, end, step, dtype=float),
a as a parameter of width of the distribution,
sigma being the deviation, second moment
Returns: Half gaussian curve
Ex:
>>> piecewise_half_gauss(5., 1)
array(0.04839414)
>>> x = np.linspace(0,10,11)
... piecewise_half_gauss(x, 2, 3)
array([0.06649038, 0.06557329, 0.0628972 , 0.05867755, 0.05324133,
0.04698531, 0.04032845, 0.03366645, 0.02733501, 0.02158627,
0.01657952])
>>> piecewise_half_gauss(np.arange(0,11,1, dtype=float), 1, 2.4)
array([1.66225950e-01, 1.52405153e-01, 1.17463281e-01, 7.61037856e-02,
4.14488078e-02, 1.89766470e-02, 7.30345854e-03, 2.36286717e-03,
6.42616248e-04, 1.46914868e-04, 2.82345875e-05])
"""
return np.piecewise(x, [x >= 0, x < 0],
[lambda x: np.exp(-x ** 2 / (2 * ((a * sigma) ** 2))) / (np.sqrt(2 * pi) * sigma * a),
lambda x: 1 / (np.sqrt(2 * pi) * sigma)])
# Create normalized data for binomial distribution Bin(N,p)
n = 10
p = 0.5
x = np.array([0., 1., 2., 3., 4., 5.])
y = np.array([0.25231325, 0.20657662, 0.11337165, 0.0417071 , 0.01028484,
0.00170007])
# Get the estimate for sigma parameter
sigma_fit = (n*p*(1-p))**0.5
# Get fitting parameters
p_halfg, p_halfg_cov = optimize.curve_fit(lambda x, a:piecewise_half_gauss(x, a, sigma = sigma_fit), x, y, p0=[1])
print(sigma_fit, p_halfg, p_halfg_cov)
## Plot the result
# unpack fitting parameters
a = np.float64(p_halfg)
# unpack uncertainties in fitting parameters from diagonal of covariance matrix
#da = [np.sqrt(p_halfg_cov[j,j]) for j in range(p_halfg.size)] # if we fit more parameters
da = np.float64(np.sqrt(p_halfg_cov[0]))
# create fitting function from fitted parameters
f_fit = np.linspace(0, 10, 50)
y_fit = piecewise_half_gauss(f_fit, a)
# Create figure window to plot data
fig = plt.figure(1, figsize=(10,10))
plt.scatter(x, y, color = 'r', label = 'Original points')
plt.plot(f_fit, y_fit, label = 'Fit')
plt.xlabel('My x values')
plt.ylabel('My y values')
plt.text(5.8, .25, 'a = {0:0.5f}$\pm${1:0.6f}'.format(a, da))
plt.legend()
However, if I plot it manually, it fits EXACTLY!
plt.scatter(x, y, c = 'r', label = 'Original points')
plt.plot(np.linspace(0,5,50), piecewise_half_gauss(np.linspace(0,5,50), 1, sigma_fit), label = 'Fit')
plt.legend()
EDIT -- solved:
it is a plotting problem, need to use
y_fit = piecewise_half_gauss(f_fit, a, sigma = 0.6*sigma_fit)

The problem was in plotting and fitting the parameters -- if I fit it with different sigma, I also need to change it in the plotting section when I generate y_fit:
# Get fitting parameters
p_halfg, p_halfg_cov = optimize.curve_fit(lambda x, a:piecewise_half_gauss(x, a, sigma = 0.6*sigma_fit), x, y, p0=[1])
...
y_fit = piecewise_half_gauss(f_fit, a, sigma = 0.6*sigma_fit)

Gaussian fit for python with confidence interval

I'd like to make a Gaussian Fit for some data that has a rough gaussian fit. I'd like the information of data peak (A), center position (mu), and standard deviation (sigma), along with 95% confidence intervals for these values.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from scipy.stats import norm
# gaussian function
def gaussian_func(x, A, mu, sigma):
return A * np.exp( - (x - mu)**2 / (2 * sigma**2))
# generate toy data
x = np.arange(50)
y = [ 97.04421053, 96.53052632, 96.85684211, 96.33894737, 96.85052632,
96.30526316, 96.87789474, 96.75157895, 97.05052632, 96.73473684,
96.46736842, 96.23368421, 96.22526316, 96.11789474, 96.41263158,
96.32631579, 96.33684211, 96.44421053, 96.48421053, 96.49894737,
97.30105263, 98.58315789, 100.07368421, 101.43578947, 101.92210526,
102.26736842, 101.80421053, 101.91157895, 102.07368421, 102.02105263,
101.35578947, 99.83578947, 98.28, 96.98315789, 96.61473684,
96.82947368, 97.09263158, 96.82105263, 96.24210526, 95.95578947,
95.84210526, 95.67157895, 95.83157895, 95.37894737, 95.25473684,
95.32842105, 95.45684211, 95.31578947, 95.42526316, 95.30526316]
plt.scatter(x,y)
# initial_guess_of_parameters
# この値はソルバーとかで求めましょう．
parameter_initial = np.array([652, 2.9, 1.3])
# estimate optimal parameter & parameter covariance
popt, pcov = curve_fit(gaussian_func, x, y, p0=parameter_initial)
# plot result
xd = np.arange(x.min(), x.max(), 0.01)
estimated_curve = gaussian_func(xd, popt[0], popt[1], popt[2])
plt.plot(xd, estimated_curve, label="Estimated curve", color="r")
plt.legend()
plt.savefig("gaussian_fitting.png")
plt.show()
# estimate standard Error
StdE = np.sqrt(np.diag(pcov))
# estimate 95% confidence interval
alpha=0.025
lwCI = popt + norm.ppf(q=alpha)*StdE
upCI = popt + norm.ppf(q=1-alpha)*StdE
# print result
mat = np.vstack((popt,StdE, lwCI, upCI)).T
df=pd.DataFrame(mat,index=("A", "mu", "sigma"),
columns=("Estimate", "Std. Error", "lwCI", "upCI"))
print(df)
Data Plot with Fitted Curve
The data peak and center position seems correct, but the standard deviation is off. Any input is greatly appreciated.

Your scatter indeed looks similar to a gaussian distribution, but it is not centered around zero. Given the specifics of the Gaussian function it will therefor be hard to nicely fit a Gaussian distribution to the data the way you gave us. I would therefor propose by starting with demeaning the x series:
x = np.arange(0, 50) - 24.5
Next I would add one additional parameter to your gaussian function, the offset. Since the regular Gaussian function will always have its tails close to zero it is impossible to otherwise nicely fit your scatterplot:
def gaussian_function(x, A, mu, sigma, offset):
return A * np.exp(-np.power((x - mu)/sigma, 2.)/2.) + offset
Next you should define an error_loss_function to minimise:
def error_loss_function(params):
gaussian = gaussian_function(x, params[0], params[1], params[2], params[3])
errors = gaussian - y
return sum(np.power(errors, 2)) # You can also pick a different error loss function!
All that remains is fitting our curve now:
fit = scipy.optimize.minimize(fun=error_loss_function, x0=[2, 0, 0.2, 97])
params = fit.x # A: 6.57592661, mu: 1.95248855, sigma: 3.93230503, offset: 96.12570778
xd = np.arange(x.min(), x.max(), 0.01)
estimated_curve = gaussian_function(xd, params[0], params[1], params[2], params[3])
plt.plot(xd, estimated_curve, label="Estimated curve", color="b")
plt.legend()
plt.show(block=False)
Hopefully this helps. Looks like a fun project, let me know if my answer is not clear.

Failure of non linear fit to sine curve

I've been trying to fit the amplitude, frequency and phase of a sine curve given some generated two dimensional toy data. (Code at the end)
To get estimates for the three parameters, I first perform an FFT. I use the values from the FFT as initial guesses for the actual frequency and phase and then fit for them (row by row). I wrote my code such that I input which bin of the FFT I want the frequency to be in, so I can check if the fitting is working well. But there's some pretty strange behaviour. If my input bin is say 3.1 (a non integral bin, so the FFT won't give me the right frequency) then the fit works wonderfully. But if the input bin is 3 (so the FFT outputs the exact frequency) then my fit fails, and I'm trying to understand why.
Here's the output when I give the input bins (in the X and Y direction) as 3.0 and 2.1 respectively:
(The plot on the right is data - fit)
Here's the output when I give the input bins as 3.0 and 2.0:
Question: Why does the non linear fit fail when I input the exact frequency of the curve?
Code:
#! /usr/bin/python
# For the purposes of this code, it's easier to think of the X-Y axes as transposed,
# so the X axis is vertical and the Y axis is horizontal
import numpy as np
import matplotlib.pyplot as plt
import scipy.optimize as optimize
import itertools
import sys
PI = np.pi
# Function which accepts paramters to define a sin curve
# Used for the non linear fit
def sineFit(t, a, f, p):
return a * np.sin(2.0 * PI * f*t + p)
xSize = 18
ySize = 60
npt = xSize * ySize
# Get frequency bin from user input
xFreq = float(sys.argv[1])
yFreq = float(sys.argv[2])
xPeriod = xSize/xFreq
yPeriod = ySize/yFreq
# arrays should be defined here
# Generate the 2D sine curve
for jj in range (0, xSize):
for ii in range(0, ySize):
sineGen[jj, ii] = np.cos(2.0*PI*(ii/xPeriod + jj/yPeriod))
# Compute 2dim FFT as well as freq bins along each axis
fftData = np.fft.fft2(sineGen)
fftMean = np.mean(fftData)
fftRMS = np.std(fftData)
xFreqArr = np.fft.fftfreq(fftData.shape[1]) # Frequency bins along x
yFreqArr = np.fft.fftfreq(fftData.shape[0]) # Frequency bins along y
# Find peak of FFT, and position of peak
maxVal = np.amax(np.abs(fftData))
maxPos = np.where(np.abs(fftData) == maxVal)
# Iterate through peaks in the FFT
# For this example, number of loops will always be only one
prevPhase = -1000
for col, row in itertools.izip(maxPos[0], maxPos[1]):
# Initial guesses for fit parameters from FFT
init_phase = np.angle(fftData[col,row])
init_amp = 2.0 * maxVal/npt
init_freqY = yFreqArr[col]
init_freqX = xFreqArr[row]
cntr = 0
if prevPhase == -1000:
prevPhase = init_phase
guess = [init_amp, init_freqX, prevPhase]
# Fit each row of the 2D sine curve independently
for rr in sineGen:
(amp, freq, phs), pcov = optimize.curve_fit(sineFit, xDat, rr, guess)
# xDat is an linspace array, containing a list of numbers from 0 to xSize-1
# Subtract fit from original data and plot
fitData = sineFit(xDat, amp, freq, phs)
sub1 = rr - fitData
# Plot
fig1 = plt.figure()
ax1 = fig1.add_subplot(121)
p1, = ax1.plot(rr, 'g')
p2, = ax1.plot(fitData, 'b')
plt.legend([p1,p2], ["data", "fit"])
ax2 = fig1.add_subplot(122)
p3, = ax2.plot(sub1)
plt.legend([p3], ['residual1'])
fig1.tight_layout()
plt.show()
cntr += 1
prevPhase = phs # Update guess for phase of sine curve

I've tried to distill the important parts of your question into this answer.
First of all, try fitting a single block of data, not an array. Once you are confident that your model is sufficient you can move on.
Your fit is only going to be as good as your model, if you move on to something not "sine"-like you'll need to adjust accordingly.
Fitting is an "art", in that the initial conditions can greatly change the convergence of the error function. In addition there may be more than one minima in your fits, so you often have to worry about the uniqueness of your proposed solution.
While you were on the right track with your FFT idea, I think your implementation wasn't quite correct. The code below should be a great toy system. It generates random data of the type f(x) = a0*sin(a1*x+a2). Sometimes a random initial guess will work, sometimes it will fail spectacularly. However, using the FFT guess for the frequency the convergence should always work for this system. An example output:
import numpy as np
import pylab as plt
import scipy.optimize as optimize
# This is your target function
def sineFit(t, (a, f, p)):
return a * np.sin(2.0*np.pi*f*t + p)
# This is our "error" function
def err_func(p0, X, Y, target_function):
err = ((Y - target_function(X, p0))**2).sum()
return err
# Try out different parameters, sometimes the random guess works
# sometimes it fails. The FFT solution should always work for this problem
inital_args = np.random.random(3)
X = np.linspace(0, 10, 1000)
Y = sineFit(X, inital_args)
# Use a random inital guess
inital_guess = np.random.random(3)
# Fit
sol = optimize.fmin(err_func, inital_guess, args=(X,Y,sineFit))
# Plot the fit
Y2 = sineFit(X, sol)
plt.figure(figsize=(15,10))
plt.subplot(211)
plt.title("Random Inital Guess: Final Parameters: %s"%sol)
plt.plot(X,Y)
plt.plot(X,Y2,'r',alpha=.5,lw=10)
# Use an improved "fft" guess for the frequency
# this will be the max in k-space
timestep = X[1]-X[0]
guess_k = np.argmax( np.fft.rfft(Y) )
guess_f = np.fft.fftfreq(X.size, timestep)[guess_k]
inital_guess[1] = guess_f
# Guess the amplitiude by taking the max of the absolute values
inital_guess[0] = np.abs(Y).max()
sol = optimize.fmin(err_func, inital_guess, args=(X,Y,sineFit))
Y2 = sineFit(X, sol)
plt.subplot(212)
plt.title("FFT Guess : Final Parameters: %s"%sol)
plt.plot(X,Y)
plt.plot(X,Y2,'r',alpha=.5,lw=10)
plt.show()

The problem is due to a bad initial guess of the phase, not the frequency. While cycling through the rows of genSine (inner loop) you use the fit result of the previous line as initial guess for the next row which does not work always. If you determine the phase from an fft of the current row and use that as initial guess the fit will succeed.
You could change the inner loop as follows:
for n,rr in enumerate(sineGen):
fftx = np.fft.fft(rr)
fftx = fftx[:len(fftx)/2]
idx = np.argmax(np.abs(fftx))
init_phase = np.angle(fftx[idx])
print fftx[idx], init_phase
...
Also you need to change
def sineFit(t, a, f, p):
return a * np.sin(2.0 * np.pi * f*t + p)
to
def sineFit(t, a, f, p):
return a * np.cos(2.0 * np.pi * f*t + p)
since phase=0 means that the imaginary part of the fft is zero and thus the function is cosine like.
Btw. your sample above is still lacking definitions of sineGen and xDat.

Without understanding much of your code, according to http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html:
(amp2, freq2, phs2), pcov = optimize.curve_fit(sineFit, tDat,
sub1, guess2)
should become:
(amp2, freq2, phs2), pcov = optimize.curve_fit(sineFit, tDat,
sub1, p0=guess2)
Assuming that tDat and sub1 are x and y, that should do the trick. But, once again, it is quite difficult to understand such a complex code with so many interlinked variables and no comments at all. A code should always be build from bottom up, meaning that you don't do a loop of fits when a single one is not working, you don't add noise until the code works to fit the non-noisy examples... Good luck!

By "nothing fancy" I meant something like removing EVERYTHING that is not related with the fit, and doing a simplified mock example such as:
import numpy as np
import scipy.optimize as optimize
def sineFit(t, a, f, p):
return a * np.sin(2.0 * np.pi * f*t + p)
# Create array of x and y with given parameters
x = np.asarray(range(100))
y = sineFit(x, 1, 0.05, 0)
# Give a guess and fit, printing result of the fitted values
guess = [1., 0.05, 0.]
print optimize.curve_fit(sineFit, x, y, guess)[0]
The result of this is exactly the answer:
[1. 0.05 0.]
But if you change guess not too much, just enough:
# Give a guess and fit, printing result of the fitted values
guess = [1., 0.06, 0.]
print optimize.curve_fit(sineFit, x, y, guess)[0]
the result gives absurdly wrong numbers:
[ 0.00823701 0.06391323 -1.20382787]
Can you explain this behavior?

You can use curve_fit with a series of trigonometric functions, usually very robust and ajustable to the precision that you need just by increasing the number of terms... here is an example:
from scipy import sin, cos, linspace
def f(x, a0,s1,s2,s3,s4,s5,s6,s7,s8,s9,s10,s11,s12,
c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11,c12):
return a0 + s1*sin(1*x) + c1*cos(1*x) \
+ s2*sin(2*x) + c2*cos(2*x) \
+ s3*sin(3*x) + c3*cos(3*x) \
+ s4*sin(4*x) + c4*cos(4*x) \
+ s5*sin(5*x) + c5*cos(5*x) \
+ s6*sin(6*x) + c6*cos(6*x) \
+ s7*sin(7*x) + c7*cos(7*x) \
+ s8*sin(8*x) + c8*cos(8*x) \
+ s9*sin(9*x) + c9*cos(9*x) \
+ s10*sin(9*x) + c10*cos(9*x) \
+ s11*sin(9*x) + c11*cos(9*x) \
+ s12*sin(9*x) + c12*cos(9*x)
from scipy.optimize import curve_fit
pi/2. / (x.max() - x.min())
x_norm *= norm_factor
popt, pcov = curve_fit(f, x_norm, y)
x_fit = linspace(x_norm.min(), x_norm.max(), 1000)
y_fit = f(x_fit, *popt)
plt.plot( x_fit/x_norm, y_fit )

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Fitting two peaks with gauss in python - python

Related

Trying to fit a trig function to data with scipy

Fitting Voigt function to data in Python

Scipy optimize curve_fit gives different plots for same parameters when fitting custom function

Gaussian fit for python with confidence interval

Failure of non linear fit to sine curve

Categories

Resources