fit multiple gaussians to the data in python


I am wondering whether there is an easy way to fit Gaussian/Lorentzian profiles to 10 peaks, extract the FWHM of each, and determine where the half-maximum points fall on the x-axis. The complicated way is to separate the peaks, fit each one individually, and extract the FWHM.
The data is at [https://drive.google.com/file/d/0B6sUnnbyNGuOT2RZb2UwYXU4dlE/view?usp=sharing].
Any advice is greatly appreciated. Thanks.
from scipy.optimize import curve_fit
import numpy as np
import matplotlib.pyplot as plt
data = np.loadtxt('data.txt', delimiter=',')
x, y = data
plt.plot(x,y)
plt.show()
def func(x, *params):
    y = np.zeros_like(x)
    print len(params)
    for i in range(0, len(params), 3):
        ctr = params[i]
        amp = params[i+1]
        wid = params[i+2]
        y = y + amp * np.exp( -((x - ctr)/wid)**2)
guess = [0, 60000, 80, 1000, 60000, 80]
for i in range(12):
    guess += [60+80*i, 46000, 25]
popt, pcov = curve_fit(func, x, y, p0=guess)
print popt
fit = func(x, *popt)
plt.plot(x, y)
plt.plot(x, fit , 'r-')
plt.show()
Traceback (most recent call last):
  File "C:\Users\test.py", line 33, in <module>
    popt, pcov = curve_fit(func, x, y, p0=guess)
  File "C:\Python27\lib\site-packages\scipy\optimize\minpack.py", line 533, in curve_fit
    res = leastsq(func, p0, args=args, full_output=1, **kw)
  File "C:\Python27\lib\site-packages\scipy\optimize\minpack.py", line 368, in leastsq
    shape, dtype = _check_func('leastsq', 'func', func, x0, args, n)
  File "C:\Python27\lib\site-packages\scipy\optimize\minpack.py", line 19, in _check_func
    res = atleast_1d(thefunc(*((x0[:numinputs],) + args)))
  File "C:\Python27\lib\site-packages\scipy\optimize\minpack.py", line 444, in _general_function
    return function(xdata, *params) - ydata
TypeError: unsupported operand type(s) for -: 'NoneType' and 'float'

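This requires a non-linear fit. A good tool for this is scipy's curve_fit function. The TypeError in the question arises because func never returns y, so it implicitly returns None; the version below adds the missing return statement.
To use curve_fit, we need a model function, call it func, that takes x and our (guessed) parameters as arguments and returns the corresponding values for y. As our model, we use a sum of Gaussians: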
from scipy.optimize import curve_fit
import numpy as np
def func(x, *params):
    y = np.zeros_like(x)
    for i in range(0, len(params), 3):
        ctr = params[i]
        amp = params[i+1]
        wid = params[i+2]
        y = y + amp * np.exp( -((x - ctr)/wid)**2)
    return y
Now, let's create an initial guess for our parameters. This guess starts with peaks at x=0 and x=1,000 with amplitude 60,000 and e-folding widths of 80. Then, we add candidate peaks at x=60, 140, 220, ... with amplitude 46,000 and width of 25:
guess = [0, 60000, 80, 1000, 60000, 80]
for i in range(12):
    guess += [60+80*i, 46000, 25]
Now, we are ready to perform the fit:
popt, pcov = curve_fit(func, x, y, p0=guess)
fit = func(x, *popt)
To see how well we did, let's plot the actual y values (solid black curve) and the fit (dashed red curve) against x:
As you can see, the fit is fairly good.
Complete working code
from scipy.optimize import curve_fit
import numpy as np
import matplotlib.pyplot as plt
# load the two rows of data.txt as x and y
data = np.loadtxt('data.txt', delimiter=',')
x, y = data
plt.plot(x,y)
plt.show()
def func(x, *params):
    # sum of Gaussians; params is a flat list (ctr, amp, wid, ctr, amp, wid, ...)
    y = np.zeros_like(x)
    for i in range(0, len(params), 3):
        ctr = params[i]
        amp = params[i+1]
        wid = params[i+2]
        y = y + amp * np.exp( -((x - ctr)/wid)**2)
    return y
guess = [0, 60000, 80, 1000, 60000, 80]
for i in range(12):
    guess += [60+80*i, 46000, 25]
popt, pcov = curve_fit(func, x, y, p0=guess)
print(popt)
fit = func(x, *popt)
plt.plot(x, y)
plt.plot(x, fit , 'r-')
plt.show()
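The question also asks for the FWHM of each peak and where the half-maximum points fall on x. For the model used here, amp * exp(-((x - ctr)/wid)**2), half of the maximum is reached at |x - ctr| = wid * sqrt(ln 2), so FWHM = 2 * sqrt(ln 2) * |wid|. The following is a minimal sketch of that post-processing step. The synthetic two-peak data and the helper name peak_table are assumptions made for illustration; they stand in for data.txt and are not part of the original answer.

```python
import numpy as np
from scipy.optimize import curve_fit

def func(x, *params):
    # Sum of Gaussians; params is a flat sequence (ctr, amp, wid, ctr, amp, wid, ...)
    y = np.zeros_like(x, dtype=float)
    for i in range(0, len(params), 3):
        ctr, amp, wid = params[i:i + 3]
        y = y + amp * np.exp(-((x - ctr) / wid) ** 2)
    return y

# Synthetic stand-in for data.txt (assumed values, two peaks only)
rng = np.random.default_rng(0)
x = np.linspace(0.0, 200.0, 801)
true_params = [60.0, 46000.0, 25.0, 140.0, 30000.0, 15.0]
y = func(x, *true_params) + rng.normal(0.0, 200.0, x.size)

popt, pcov = curve_fit(func, x, y, p0=[55, 40000, 20, 145, 25000, 20])

def peak_table(popt):
    # Hypothetical helper: one row (center, amplitude, FWHM, left, right) per peak,
    # where left/right are the x positions at half maximum of that component.
    rows = []
    for ctr, amp, wid in np.reshape(popt, (-1, 3)):
        fwhm = 2.0 * np.sqrt(np.log(2.0)) * abs(wid)
        rows.append((ctr, amp, fwhm, ctr - fwhm / 2.0, ctr + fwhm / 2.0))
    return rows

for ctr, amp, fwhm, left, right in peak_table(popt):
    print("center=%.2f amp=%.0f FWHM=%.2f half-max at x=%.2f and x=%.2f"
          % (ctr, amp, fwhm, left, right))
```

Note that these values describe each fitted component on its own. Where neighbouring peaks overlap strongly, the half-maximum points of the summed curve will differ from the per-component values.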

@john1024's answer is good, but it requires a manual process to generate the initial guess. Here's an easy way to automate the starting guess. Replace the relevant three lines of john1024's code with the following (Npks, presumably the expected number of peaks, must be defined beforehand):
import scipy.signal
i_pk = scipy.signal.find_peaks_cwt(y, widths=range(3,len(x)//Npks))
DX = (np.max(x)-np.min(x))/float(Npks) # starting guess for component width
guess = np.ravel([[x[i], y[i], DX] for i in i_pk]) # starting guess for (x, amp, width) for each component

IMHO it is always advisable to plot the residual (data - model) in problems such as this. You will also want to look at the chi-squared (ChiSq) of the fit.
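As a rough illustration of that advice, the sketch below computes the residual and the chi-squared of a fit. The single-Gaussian model, the synthetic data, and the assumed known noise level noise_sigma are placeholders for illustration only; with real data you would substitute your own model and measurement uncertainties.

```python
import numpy as np
from scipy.optimize import curve_fit

def gauss(x, ctr, amp, wid):
    # Same Gaussian form as in the answer above
    return amp * np.exp(-((x - ctr) / wid) ** 2)

# Synthetic data with an assumed, known measurement uncertainty
rng = np.random.default_rng(1)
x = np.linspace(-5.0, 5.0, 400)
noise_sigma = 0.05
y = gauss(x, 0.5, 2.0, 1.2) + rng.normal(0.0, noise_sigma, x.size)

popt, pcov = curve_fit(gauss, x, y, p0=[0.0, 1.0, 1.0],
                       sigma=np.full(x.size, noise_sigma),
                       absolute_sigma=True)

# Residual: data minus model
residual = y - gauss(x, *popt)

# Chi-squared and reduced chi-squared (per degree of freedom)
chisq = np.sum((residual / noise_sigma) ** 2)
dof = x.size - len(popt)
reduced_chisq = chisq / dof
print("chi-squared = %.1f, reduced chi-squared = %.2f" % (chisq, reduced_chisq))
```

A reduced chi-squared near 1 suggests the model describes the data to within the stated uncertainties. Plotting residual against x (for example with plt.plot(x, residual)) shows whether any systematic structure, such as a missed peak, remains.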

Related

Runtime error fitting a binary 2d function using python

I am trying to fit a function to extract parameters from a binary 2D grating in Python.
Here is my code, which runs but does not deliver a proper output:
import numpy as np
import pylab as plt
from scipy.optimize import curve_fit
def grid(X, Y, P, FS):
    """
    function to calculate Z(X, Y) of a binary grating with
    period P and feature size FS
    input:
        X, Y (np.array) from numpy meshgrid, the domain of the function
        P (float, int): period of the grating
        FS (float, int): size of the grating features
    output:
        Z (np.array): binary height profile of the grating containing 0 and 1,
            same shape as X and Y
    """
    Z = np.ones_like(X)
    Z[X%P>FS] = 0
    Z[Y%P>FS] = 0
    return Z
# domain of the example
x = np.arange(0, 500)
y = np.arange(0, 500)
X, Y = np.meshgrid(x, y)
# plot of the example grating
Z = grid(X, Y, 93, 42)
plt.contourf(X, Y, Z)
plt.show()
# here starts the fit
# np.ravel is used in combination with scipy.optimize.curve_fit like in every example I found online
# goal: find the values of P and FS used to generate Z
xdata = np.vstack((X.ravel(), Y.ravel()))
ydata = Z.ravel()
def _grid(xdata, P, FS):
    """
    helper function to call grid(X, Y, P, FS) with the flattened input used
    for the curve_fit
    returns the result of Z in the same flattened manner
    """
    # unpack x, y and generate the meshgrid
    x, y = xdata
    x = np.unique(x)
    y = np.unique(y)
    X, Y = np.meshgrid(x, y)
    # call the original function and return the flattened result
    res = grid(X, Y, P, FS)
    return res.ravel()
# try to fit the parameters
popt, pcov = curve_fit(_grid, xdata, ydata, p0=[90, 40])
print(popt)
print(pcov)
Can someone spot the problem? Or is there a better way, or a better programming language, to do this simple fit?

3D- Gaussian Process Regression

I am very new to Gaussian processes and python as well.
I am trying to produce a very simple Gaussian regression for a 3d model.
I have a very simple Python code for a function:
import numpy as np
def exponential_cov(x, y, params):
    return params[0] * np.exp( -0.5 * params[1] * np.subtract.outer(x, y)**2)
def conditional(x_new, x, y, params):
    B = exponential_cov(x_new, x, params)
    C = exponential_cov(x, x, params)
    A = exponential_cov(x_new, x_new, params)
    mu = np.linalg.inv(C).dot(B.T).T.dot(y)
    sigma = A - B.dot(np.linalg.inv(C).dot(B.T))
    return(mu.squeeze(), sigma.squeeze())
import matplotlib.pylab as plt
# GP PRIOR
tu = [1, 10]
Si_tu = exponential_cov(0, 0, tu)
xpts = np.arange(-5, 5, step=0.01)
plt.errorbar(xpts, np.zeros(len(xpts)), yerr=Si_tu, capsize=0, color='#95daed', alpha=0.5, label='error') #error
plt.plot(xpts, np.zeros(len(xpts)), linestyle='dashed', color='#3105b2', linewidth=2.5, label='mu'); #mu
# GP FOR 1ST POINT
x = [1.]
y = np.sin(x)+np.cos(np.sqrt(15)*x)
Si_1 = exponential_cov(x, x, tu)
def predict(x, data, kernel, params, sigma, t):
    k = [kernel(x, y, params) for y in data]
    Sinv = np.linalg.inv(sigma)
    y_pred = np.dot(k, Sinv).dot(t)
    sigma_new = kernel(x, x, params) - np.dot(k, Sinv).dot(k)
    return y_pred, sigma_new
x_pred = np.linspace(-5, 5, 1000) #change step here!!
print("x_pred=")
print(x_pred)
predictions = [predict(i, x, exponential_cov, tu, Si_1, y) for i in x_pred]
y_pred, sigmas = np.transpose(predictions)
print("y_pred =")
print(y_pred)
print("sigmas =")
print(sigmas)
# GP FOR 2ND POINT
m, s = conditional([-1], x, y, tu)
y2 = np.sin(-1)+np.cos(np.sqrt(15)*(-1))
x.append(-1)
y=np.append(y,y2)
Si_2 = exponential_cov(x, x, tu)
predictions = [predict(i, x, exponential_cov, tu, Si_2, y) for i in x_pred]
y_pred, sigmas = np.transpose(predictions)
print("y_pred =")
print(y_pred)
print("sigmas =")
print(sigmas)
By using this code I get very nice fitting results for the function np.sin(x) + np.cos(np.sqrt(15) * x), but what I really want to do is to try the same Gaussian process for the function Z = np.sin(2*X) * np.cos(2*Y) / 2.
I know that the idea is basically the same, but I cannot adapt my python code to the [x,y] input to obtain z.
I will really appreciate your help, hints or links!

In the first case the input of your function is 1-D, whereas the new function takes a 2-D input. So you have to change the covariance function, for example by using an ARD-based kernel (please refer to the kernel cookbook). Alternatively, you can use an isotropic kernel in 2-D; just make sure you choose a suitable distance function (e.g. the L2 norm) and a single length scale.
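A minimal sketch of what that change could look like, assuming a squared-exponential kernel with one length scale per input dimension (ARD). The names ard_cov and target, the training grid, the length scales, and the jitter term are illustrative assumptions, not part of the question's code. Note that this sketch is parametrized by length scales, whereas the question's exponential_cov uses an inverse squared length scale in params[1].

```python
import numpy as np

def ard_cov(A, B, variance, lengthscales):
    # A has shape (n, d), B has shape (m, d); each input dimension is
    # divided by its own length scale before the squared L2 distance is taken.
    A = np.atleast_2d(A) / lengthscales
    B = np.atleast_2d(B) / lengthscales
    sqdist = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return variance * np.exp(-0.5 * sqdist)

def conditional(X_new, X, z, variance, lengthscales, jitter=1e-6):
    # Posterior mean and covariance at X_new given observations z at X.
    # The small jitter on the diagonal is an assumption added for numerical stability.
    K = ard_cov(X, X, variance, lengthscales) + jitter * np.eye(len(X))
    K_s = ard_cov(X_new, X, variance, lengthscales)
    K_ss = ard_cov(X_new, X_new, variance, lengthscales)
    mu = K_s.dot(np.linalg.solve(K, z))
    cov = K_ss - K_s.dot(np.linalg.solve(K, K_s.T))
    return mu, cov

def target(X):
    # The 2-D function from the question: Z = sin(2X) * cos(2Y) / 2
    return np.sin(2 * X[:, 0]) * np.cos(2 * X[:, 1]) / 2

# Training inputs on a coarse 9 x 9 grid (assumed domain)
g = np.linspace(-1.5, 1.5, 9)
X_train = np.array([(a, b) for a in g for b in g])
z_train = target(X_train)

# Predict at random test locations inside the same domain
rng = np.random.default_rng(0)
X_test = rng.uniform(-1.5, 1.5, size=(50, 2))
mu, cov = conditional(X_test, X_train, z_train, 1.0, np.array([0.7, 0.7]))
max_err = np.max(np.abs(mu - target(X_test)))
print("max abs error on test points:", max_err)
```

Setting both length scales to the same value gives the isotropic case mentioned above. The main difference from the 1-D code is that the distance is computed between rows of an (n, 2) array rather than with np.subtract.outer on scalars.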

Fitting points to a wrapped line

I have a set of points that is wrapped between -360 and 360 degrees. I am currently trying to fit a line through them without unwrapping the dataset. Is there a way to alter scikit-learn's LinearRegression model to do this? If not, what is the best way to write a line-fitting algorithm that accounts for the wrap in the data?

At interesting noise levels maybe brute force cannot be avoided.
Here are the squared errors (using wrap-around distance) as a function of the slope (best intercept is chosen at each point) for three models with noise levels 90, 180, 180 and 64, 96, 128 data points (cf. script below).
I'm not sure there is a smart way of reliably finding the global minima of those.
OTOH, brute force works reasonably well even in cases that look rather difficult, like the bottom one. Dashed line is true model without noise, dots are actual data generated by adding noise to true model, solid line is reconstruction.
Code:
import numpy as np
import scipy.optimize as so
from operator import attrgetter
from matplotlib import pylab
def setup(interc, slope, sigma, N):
    x = np.random.uniform(0.1, 2.0, (N,)).cumsum()
    y = (interc + x*slope + np.random.normal(0, sigma, (N,)) + 360) % 720 - 360
    return x, y
def err_model_full(params, x, y):
    interc, slope = params
    err = (interc + x*slope - y + 360) % 720 - 360
    return np.dot(err, err)
def err_model(interc, slope, x, y):
    err = (interc + x*slope - y + 360) % 720 - 360
    return np.dot(err, err)
for i, (interc, slope, sigma, N) in enumerate([(100, -12, 90, 64),
                                               (-30, 20, 180, 96),
                                               (66, -49, 180, 128)]):
    # create problem
    x, y = setup(interc, slope, sigma, N)
    # brute force through slopes
    slps = np.linspace(-128, 128, 257)
    ics, err = zip(*map(attrgetter('x', 'fun'), (so.minimize(err_model, (0,), args = (sl, x, y)) for sl in slps)))
    best = np.argmin(err)
    # polish
    res = so.minimize(err_model_full, (ics[best], slps[best]), args = (x, y))
    # plot
    pylab.figure(1)
    pylab.subplot(3, 1, i+1)
    pylab.plot(slps, err)
    pylab.figure(2)
    pylab.subplot(3, 1, i+1)
    pylab.plot(x, y, 'o')
    ic_rec, sl_rec = res.x
    pylab.plot(x, (ic_rec + x*sl_rec + 360) % 720 - 360)
    pylab.plot(x, (interc + x*slope + 360) % 720 - 360, '--')
    print('true (intercept, slope)', (interc, slope), 'reconstructed',
          tuple(res.x))
    print('noise level', sigma)
    print('squared error for true params', err_model_full((interc, slope), x, y))
    print('squared error for reconstructed params', err_model_full(res.x, x, y))
pylab.figure(1)
pylab.savefig('bf.png')
pylab.figure(2)
pylab.savefig('recon.png')

This is quite an interesting problem, because you've only got one feature as input that contains no information about the wrapping. The simplest way that comes to mind is just to use a nearest neighbours approach
from sklearn.neighbors import KNeighborsRegressor
import numpy as np
import matplotlib.pyplot as plt
####################
# Create some data
n_points = 100
X = np.linspace(0, 1, n_points) - 0.3
y = (X*720*2 % 720) - 360
y = y + np.random.normal(0, 15, n_points)
X = X.reshape(-1, 1)
#######################
knn = KNeighborsRegressor()
knn.fit(X, y)
lspace = np.linspace(0, 1, 1000) - 0.3
lspace = lspace.reshape(-1, 1)
plt.scatter(X, y)
# predict with the fitted nearest-neighbours model
plt.plot(lspace, knn.predict(lspace), color='C1')
plt.show()
However if you need it to be piecewise linear then I suggest you look at this blog post

How to fit an int list to a desired function

I have an int list x, like [43, 43, 46, ....., 487, 496, 502] (just an example).
x is a list of word counts. I want to convert it into a list of penalty scores for training a text classification model.
I'd like to use a curve function (maybe something like math.log?) to map values from x to y. The minimum value in x (43) should map to y = 0.8, the maximum value in x (502) should map to y = 0.08, and the other values in x should map to y values that follow the function.
For example:
x = [43, 43, 46, ....., 487, 496, 502]
y_bounds = [0.8, 0.08]
def creat_curve_func(x, y_bounds, curve_shape='log'):
    ...
func = creat_curve_func(x, y_bounds)
assert func(43) == 0.8
assert func(502) == 0.08
func(46)
>>> 0.78652 (just a fake result for example)
func(479)
>>> 0.097 (just a fake result for example)
I quickly found that I have to tune some parameters by hand, again and again, to get a curve function that fits my purpose.
Then I looked for a library to do this work and found scipy.optimize.curve_fit. But it needs at least three arguments: f (the function I want to fit), xdata, and ydata. I only have xdata and the y bounds (0.8 and 0.08).
Is there a good solution?
Update
I thought this was easy to understand, so I didn't post my failed curve_fit code. Is this the reason for the downvote?
Here is why I can't just use curve_fit:
x = sorted([43, 43, 46, ....., 487, 496, 502])
y = np.linspace(0.8, 0.08, len(x)) # can not set y as this way which lead to the wrong result
def func(x, a, b):
    return a * x +b # I want a curve function in fact, linear is simple to understand here
popt, pcov = curve_fit(func, x, y)
func(42, *popt)
0.47056348146450089 # I want 0.8 here

How about this way?
EDIT: added weights. If you don't need to put your end points exactly on the curve you could use weights:
import scipy.optimize as opti
import numpy as np
xdata = np.array([43, 56, 234, 502], float)
ydata = np.linspace(0.8, 0.08, len(xdata))
weights = np.ones_like(xdata, float)
weights[0] = 0.001
weights[-1] = 0.001
def fun(x, a, b, z):
    return np.log(z/x + a) + b
# a small sigma gives a point a large weight in the fit
popt, pcov = opti.curve_fit(fun, xdata, ydata, sigma=weights)
print(fun(xdata, *popt))
>>> [ 0.79999994 ... 0.08000009]
EDIT:
You can also play with these parameters, of course:
import scipy.optimize as opti
import numpy as np
xdata = np.array([43, 56, 234, 502], float)
xdata = np.round(np.sort(np.random.rand(100) * (502-43) + 43))
ydata = np.linspace(0.8, 0.08, len(xdata))
weights = np.ones_like(xdata, float)
weights[0] = 0.00001
weights[-1] = 0.00001
def fun(x, a, b, z):
    return np.log(z/x + a) + b
popt, pcov = opti.curve_fit(fun, xdata, ydata, sigma=weights)
print(fun(xdata, *popt))
>>>[ 0.8 ... 0.08 ]
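If the two end points must lie exactly on the curve, one alternative to a weighted fit is to pick a two-parameter curve and solve for its parameters in closed form. The sketch below does this for y = a * log(x) + b. This is a swapped-in technique, not the weighted curve_fit approach above; the function name create_curve_func is hypothetical, and only the x values actually shown in the question are used.

```python
import math

def create_curve_func(x_values, y_bounds):
    # Build y = a * log(x) + b passing through
    # (min(x), y_bounds[0]) and (max(x), y_bounds[1]).
    x_lo, x_hi = min(x_values), max(x_values)
    y_at_lo, y_at_hi = y_bounds
    a = (y_at_hi - y_at_lo) / (math.log(x_hi) - math.log(x_lo))
    b = y_at_lo - a * math.log(x_lo)

    def func(x):
        return a * math.log(x) + b

    return func

# Only the values visible in the question; the elided ones are left out
func = create_curve_func([43, 43, 46, 487, 496, 502], [0.8, 0.08])
print(func(43), func(46), func(502))
```

Because of floating-point rounding, comparisons such as func(43) == 0.8 are safer when written with a tolerance, for example abs(func(43) - 0.8) < 1e-12. A different curve shape would need its own closed-form solution, or a fit with more than two parameters as in the answer above.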

Fitting a 2D Gaussian function using scipy.optimize.curve_fit - ValueError and minpack.error

I intend to fit a 2D Gaussian function to images showing a laser beam to get its parameters like FWHM and position. So far I tried to understand how to define a 2D Gaussian function in Python and how to pass x and y variables to it.
I've written a little script which defines that function, plots it, adds some noise to it and then tries to fit it using curve_fit. Everything seems to work except the last step in which I try to fit my model function to the noisy data. Here is my code:
import scipy.optimize as opt
import numpy as np
import pylab as plt
#define model function and pass independant variables x and y as a list
def twoD_Gaussian((x,y), amplitude, xo, yo, sigma_x, sigma_y, theta, offset):
    xo = float(xo)
    yo = float(yo)
    a = (np.cos(theta)**2)/(2*sigma_x**2) + (np.sin(theta)**2)/(2*sigma_y**2)
    b = -(np.sin(2*theta))/(4*sigma_x**2) + (np.sin(2*theta))/(4*sigma_y**2)
    c = (np.sin(theta)**2)/(2*sigma_x**2) + (np.cos(theta)**2)/(2*sigma_y**2)
    return offset + amplitude*np.exp( - (a*((x-xo)**2) + 2*b*(x-xo)*(y-yo) + c*((y-yo)**2)))
# Create x and y indices
x = np.linspace(0, 200, 201)
y = np.linspace(0, 200, 201)
x,y = np.meshgrid(x, y)
#create data
data = twoD_Gaussian((x, y), 3, 100, 100, 20, 40, 0, 10)
# plot twoD_Gaussian data generated above
plt.figure()
plt.imshow(data)
plt.colorbar()
# add some noise to the data and try to fit the data generated beforehand
initial_guess = (3,100,100,20,40,0,10)
data_noisy = data + 0.2*np.random.normal(size=len(x))
popt, pcov = opt.curve_fit(twoD_Gaussian, (x,y), data_noisy, p0 = initial_guess)
Here is the error message I get when running the script using winpython 64-bit Python 2.7:
ValueError: object too deep for desired array
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python\WinPython-64bit-2.7.6.2\python-2.7.6.amd64\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 540, in runfile
    execfile(filename, namespace)
  File "E:/Work Computer/Software/Python/Fitting scripts/2D Gaussian function fit/2D_Gaussian_LevMarq_v2.py", line 39, in <module>
    popt, pcov = opt.curve_fit(twoD_Gaussian, (x,y), data_noisy, p0 = initial_guess)
  File "C:\Python\WinPython-64bit-2.7.6.2\python-2.7.6.amd64\lib\site-packages\scipy\optimize\minpack.py", line 533, in curve_fit
    res = leastsq(func, p0, args=args, full_output=1, **kw)
  File "C:\Python\WinPython-64bit-2.7.6.2\python-2.7.6.amd64\lib\site-packages\scipy\optimize\minpack.py", line 378, in leastsq
    gtol, maxfev, epsfcn, factor, diag)
minpack.error: Result from function call is not a proper array of floats.
What am I doing wrong? Is it how I pass the independent variables to the model function/curve_fit?

The output of twoD_Gaussian needs to be 1D. What you can do is add a .ravel() onto the end of the last line, like this:
def twoD_Gaussian(xy, amplitude, xo, yo, sigma_x, sigma_y, theta, offset):
    x, y = xy
    xo = float(xo)
    yo = float(yo)
    a = (np.cos(theta)**2)/(2*sigma_x**2) + (np.sin(theta)**2)/(2*sigma_y**2)
    b = -(np.sin(2*theta))/(4*sigma_x**2) + (np.sin(2*theta))/(4*sigma_y**2)
    c = (np.sin(theta)**2)/(2*sigma_x**2) + (np.cos(theta)**2)/(2*sigma_y**2)
    g = offset + amplitude*np.exp( - (a*((x-xo)**2) + 2*b*(x-xo)*(y-yo)
                                      + c*((y-yo)**2)))
    return g.ravel()
You'll obviously need to reshape the output for plotting, e.g:
# Create x and y indices
x = np.linspace(0, 200, 201)
y = np.linspace(0, 200, 201)
x, y = np.meshgrid(x, y)
#create data
data = twoD_Gaussian((x, y), 3, 100, 100, 20, 40, 0, 10)
# plot twoD_Gaussian data generated above
plt.figure()
plt.imshow(data.reshape(201, 201))
plt.colorbar()
Do the fitting as before:
# add some noise to the data and try to fit the data generated beforehand
initial_guess = (3,100,100,20,40,0,10)
data_noisy = data + 0.2*np.random.normal(size=data.shape)
popt, pcov = opt.curve_fit(twoD_Gaussian, (x, y), data_noisy, p0=initial_guess)
And plot the results:
data_fitted = twoD_Gaussian((x, y), *popt)
fig, ax = plt.subplots(1, 1)
#ax.hold(True) For older versions. This has now been deprecated and later removed
ax.imshow(data_noisy.reshape(201, 201), cmap=plt.cm.jet, origin='lower',
          extent=(x.min(), x.max(), y.min(), y.max()))
ax.contour(x, y, data_fitted.reshape(201, 201), 8, colors='w')
plt.show()
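Since the question asks for the FWHM and position of the beam, here is a minimal sketch of how those could be read off the fitted parameters. For a Gaussian written with a standard deviation sigma, as in twoD_Gaussian, FWHM = 2 * sqrt(2 ln 2) * sigma. The coarser 101-point grid, the seeded random generator, and the perturbed initial guess are assumptions chosen to keep the example small and self-contained.

```python
import numpy as np
import scipy.optimize as opt

def twoD_Gaussian(xy, amplitude, xo, yo, sigma_x, sigma_y, theta, offset):
    # Rotated 2D Gaussian; returns a flattened (1D) array as curve_fit expects
    x, y = xy
    a = np.cos(theta)**2/(2*sigma_x**2) + np.sin(theta)**2/(2*sigma_y**2)
    b = -np.sin(2*theta)/(4*sigma_x**2) + np.sin(2*theta)/(4*sigma_y**2)
    c = np.sin(theta)**2/(2*sigma_x**2) + np.cos(theta)**2/(2*sigma_y**2)
    g = offset + amplitude*np.exp(-(a*(x-xo)**2 + 2*b*(x-xo)*(y-yo)
                                    + c*(y-yo)**2))
    return g.ravel()

# Synthetic noisy beam image (assumed grid size and noise level)
x, y = np.meshgrid(np.linspace(0, 200, 101), np.linspace(0, 200, 101))
rng = np.random.default_rng(0)
data_noisy = (twoD_Gaussian((x, y), 3, 100, 100, 20, 40, 0, 10)
              + 0.2*rng.normal(size=x.size))

# Start from a deliberately imperfect guess
popt, pcov = opt.curve_fit(twoD_Gaussian, (x, y), data_noisy,
                           p0=(3, 95, 105, 25, 35, 0, 10))
amplitude, xo, yo, sigma_x, sigma_y, theta, offset = popt

# FWHM along the two principal axes of the fitted Gaussian
k = 2.0*np.sqrt(2.0*np.log(2.0))
fwhm_x, fwhm_y = k*abs(sigma_x), k*abs(sigma_y)
print("position: (%.2f, %.2f)" % (xo, yo))
print("FWHM along principal axes: %.2f, %.2f" % (fwhm_x, fwhm_y))
```

The two FWHM values refer to the principal axes of the Gaussian, which coincide with the image axes only when theta is a multiple of 90 degrees. The square roots of the diagonal of pcov give rough one-sigma uncertainties on the fitted parameters.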

To expand on Dietrich's answer a bit, I got the following error when running the suggested solution with Python 3.4 (on Ubuntu 14.04):
    def twoD_Gaussian((x, y), amplitude, xo, yo, sigma_x, sigma_y, theta, offset):
                      ^
SyntaxError: invalid syntax
Running 2to3 suggested the following simple fix:
def twoD_Gaussian(xdata_tuple, amplitude, xo, yo, sigma_x, sigma_y, theta, offset):
    (x, y) = xdata_tuple
    xo = float(xo)
    yo = float(yo)
    a = (np.cos(theta)**2)/(2*sigma_x**2) + (np.sin(theta)**2)/(2*sigma_y**2)
    b = -(np.sin(2*theta))/(4*sigma_x**2) + (np.sin(2*theta))/(4*sigma_y**2)
    c = (np.sin(theta)**2)/(2*sigma_x**2) + (np.cos(theta)**2)/(2*sigma_y**2)
    g = offset + amplitude*np.exp( - (a*((x-xo)**2) + 2*b*(x-xo)*(y-yo)
                                      + c*((y-yo)**2)))
    return g.ravel()
The reason for this is that tuple parameter unpacking in function signatures was removed in Python 3. For more information, see PEP 3113.

curve_fit() wants the dimension of xdata to be (2, n*m) and not (2, n, m). Likewise, ydata should have shape (n*m,) and not (n, m). So use ravel() to flatten your 2D arrays (xx and yy are assumed to be the meshgrid arrays, named x and y in the question):
xdata = np.vstack((xx.ravel(),yy.ravel()))
ydata = data_noisy.ravel()
popt, pcov = opt.curve_fit(twoD_Gaussian, xdata, ydata, p0=initial_guess)
By the way: I'm not sure the parametrization with the trigonometric terms is the best one. E.g., the one described here might be a bit more robust numerically and for large deviations.
