How do I fit my function to data to get fit parameters? - python

I have a set of data where x and y are the known quantities in my function (x enters the function as x and y as x1), and I need to fit the data so I can get values for the unknown parameters (E, B0, S0).
I have this so far but when I try to run this I get the error:
ValueError: x and y must have same first dimension, but have shapes (4L,) and (1L,)
This error happens when I try to plot the data against the fit curve. I also get this error regarding the bounds I have set up:
lb, ub = [np.asarray(b, dtype=float) for b in bounds]
ValueError: too many values to unpack
Here is my code:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
def func(x, x1, E, B0, S0):
    # function to optimize where x and x1 are known
    # E, B0, S0 need to be fitted
    return sum((x-np.power((E*B0*(1+((x1-S0)/(B0)))),(1/2)))**2)
#define the data to be fit
xdata = [0.00, 3.42, 4.56, 5.31] #distance
ydata = [335.4, 149.1, 167.1, 292.2] # beam size
plt.plot(xdata, ydata, 'b-', label='data')
plt.show()
# fit for parameters E, B0, and S0
popt, pcov = curve_fit(func, xdata, ydata)
plt.plot(xdata, func(xdata, *popt), 'r-', label='fit')
# put bounds on the optimization: 0.5 < E < 5, 1 < S0 < 10, 0.1 < B0 < 10
bnds = [(0.5, 5.0), (0.1, 10.0), (1.0, 10.0)]
popt, pcov = curve_fit(func, xdata, ydata, bounds=bnds)
plt.plot(xdata,func(xdata, *popt),'g--', label='fit-with-bounds')
plt.xlabel('distance')
plt.ylabel('beam size')
plt.legend()
plt.show()

It's not clear what the sum in func is supposed to do: it collapses the model output to a single number, which is why the plot complains about shapes (4L,) and (1L,). Leave it out to get rid of the first error.
Second, the bounds argument of curve_fit expects a 2-tuple of (lower_bounds, upper_bounds), one entry per parameter, not a list of (low, high) pairs per parameter; that mismatch is what raises the "too many values to unpack" error. Leave the bounds out (or pass them in the 2-tuple form, as in the sketch after the edit below) and you'll get rid of the second error.
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
def func(x, x1, E, B0, S0):
    # function to optimize where x and x1 are known
    # E, B0, S0 need to be fitted
    return (x-np.power((E*B0*(1.+((x1-S0)/(B0)))),(1/2.)))**2
#define the data to be fit
xdata = [0.00, 3.42, 4.56, 5.31] #distance
ydata = [335.4, 149.1, 167.1, 292.2] # beam size
plt.plot(xdata, ydata, 'b-', label='data')
# fit for parameters E, B0, and S0
popt, pcov = curve_fit(func, xdata, ydata)
plt.plot(xdata, func(xdata, *popt), 'r-', label='fit')
popt, pcov = curve_fit(func, xdata, ydata)
plt.plot(xdata,func(xdata, *popt),'g--', label='fit-with-bounds')
plt.xlabel('distance')
plt.ylabel('beam size')
plt.legend()
plt.show()
Now obviously "fit" and "fit-with-bounds" are the same.
Edit: To fit for E, B0, S0 only, the fit function should only take those values as arguments.
funcwithx1 = lambda x,x1, E, B0, S0: (x-np.power((E*B0*(1.+((x1-S0)/(B0)))),(1/2.)))**2
x1 = 4.6
func = lambda x, E, B0, S0: funcwithx1(x, x1, E, B0, S0)
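With that wrapper in place, the bounds from the question can also be applied; curve_fit wants a 2-tuple of lower and upper limits, one entry per parameter, here in the order (E, B0, S0). A minimal sketch, reusing xdata, ydata and the func lambda from above (whether these limits are physically sensible is for you to judge):
# bounds: one sequence of lower limits and one of upper limits, ordered (E, B0, S0)
popt, pcov = curve_fit(func, xdata, ydata,
                       bounds=([0.5, 0.1, 1.0], [5.0, 10.0, 10.0]))
print(popt)  # fitted E, B0, S0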

The function is wrongly defined. You know the independent and dependent variables, but you only supply the independent one to the fitted function.
y = func(x; params)
As it stands now, your objective function has 4 parameters to be determined.
Later on, when invoking curve_fit, you supply both the independent and dependent variables, as you correctly do in
popt, pcov = curve_fit(func, xdata, ydata)
Thus, popt is an array of length 4, which is probably causing part of your problems.
I don't know exactly your objective function, so I'll not attempt to fix that. Hope this guides you to solve the issue.

Related

Trying to fit the Gaussian function to data

I'm trying to fit some data that is approximately Gaussian to a Gaussian function in Python using the curve_fit method. For the initial guesses for the parameters I've calculated the mean and standard deviation of the data. However, I'm getting a really bad fit and I'm not sure why. This is my code:
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
import numpy as np
import math
def func(x, mu, sig):
    return (1./(sig*math.sqrt(2*math.pi)))*np.exp(-np.power(x-mu, 2.)/(2*np.power(sig, 2.)))
xdata = np.linspace(4, 14, 21)
ydata = np.array([0.2,0.8,1.8,1.9,5.9,7,11,12.6,14,13.3,11.8,9.3,5.2,3.1,1.5,0.7,0.4,0.1,0.3,0.1,0.1])
plt.plot(xdata, ydata, 'b-', label='data')
popt, pcov = curve_fit(func, xdata, ydata,[4.8,5.1])
plt.plot(xdata, func(xdata, *popt), 'r-', label='fit')
The fitted model (in red) looks like this: [plot omitted]
The reason you're getting a nonsense answer is that you're trying to fit the normalised form of the Gaussian, which has no free amplitude parameter and so cannot match the scale of your data.
Instead, if you use the generalised form with 3 parameters, you get a much better fit.
def generalised_gaussian(x, a, b, c):
    return a*np.exp(-np.power((x-b)/(2*c**2), 2))
xdata = np.linspace(4, 14, 21)
ydata = np.array([0.2,0.8,1.8,1.9,5.9,7,11,12.6,14,13.3,11.8,9.3,5.2,3.1,1.5,0.7,0.4,0.1,0.3,0.1,0.1])
plt.plot(xdata, ydata, 'b-', label='data')
popt, pcov = curve_fit(generalised_gaussian, xdata, ydata)
plt.plot(xdata, generalised_gaussian(xdata, *popt), 'r-', label='fit')
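If the default start values ever give trouble on similar data, a data-driven initial guess can be supplied as well. A small sketch, reusing xdata, ydata and generalised_gaussian from above; the particular guesses are assumptions, not part of the original answer:
# rough starting guesses: amplitude ~ peak height, centre ~ x at the peak,
# width ~ an order-of-magnitude value (all purely illustrative)
p0 = [ydata.max(), xdata[np.argmax(ydata)], 1.0]
popt, pcov = curve_fit(generalised_gaussian, xdata, ydata, p0=p0)
plt.plot(xdata, generalised_gaussian(xdata, *popt), 'g--', label='fit with p0')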

ValueError: Unable to determine number of fit parameters. "Problem in curve fitting"

I am new to Python, so my knowledge is limited.
I have a datafile named "tlove_cc_seq2_k2_NL3.dat" and I want to fit a curve to the data.
The code I am using is as follows:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
import math
import pandas as pd
import lmfit
from lmfit import Model
from array import *
def test(x, a, b, c):
    return (a + b*math.log(x) + c*math.log(x)**2)
func = np.vectorize(test)
data_k2_2fl_NL3=np.loadtxt('tlove_cc_seq2_k2_NL3.dat')
plt.plot(data_k2_2fl_NL3[:,8], data_k2_2fl_NL3[:,5], 'b-', label='data')
popt, pcov = curve_fit(func, data_k2_2fl_NL3[:,8], data_k2_2fl_NL3[:,5])
popt
plt.plot(data_k2_2fl_NL3[:,8], func(data_k2_2fl_NL3[:,8], *popt), 'r-',
label='fit: a=%5.3f, b=%5.3f, c=%5.3f' % tuple(popt))
popt, pcov = curve_fit(func, data_k2_2fl_NL3[:,8], data_k2_2fl_NL3[:,5],
                       bounds=(-20, [30., 30., 20.5]))
popt
plt.plot(data_k2_2fl_NL3[:,8], func(data_k2_2fl_NL3[:,8], *popt), 'g--',
label='fit: a=%5.3f, b=%5.3f, c=%5.3f' % tuple(popt))
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()
The error I am getting is as follows:
ValueError Traceback (most recent call last)
in
13 #y = data[:, 1]
14 plt.plot(data_k2_2fl_NL3[:,8], data_k2_2fl_NL3[:,5], 'b-', label='data')
---> 15 popt, pcov = curve_fit(func, data_k2_2fl_NL3[:,8], data_k2_2fl_NL3[:,5])
16 popt
17
~/anaconda3/lib/python3.7/site-packages/scipy/optimize/minpack.py in curve_fit(f, xdata,
ydata, p0, sigma, absolute_sigma, check_finite, bounds, method, jac, **kwargs)
678 args, varargs, varkw, defaults = _getargspec(f)
679 if len(args) < 2:
--> 680 raise ValueError("Unable to determine number of fit parameters.")
681 n = len(args) - 1
682 else:
ValueError: Unable to determine number of fit parameters.
How can I resolve this?
Thank you.
I think the problem is that the curve_fit function cannot determine the number of parameters by introspection because the function you are asking it to fit (test) is wrapped in the np.vectorize function.
I tried a minimal example where I used the test function un-vectorized and it worked:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
def test(x, a, b, c):
    return (a + b*np.log(x) + c*np.log(x)**2)
func = np.vectorize(test)
#Create some dummy data
x_data = list(range(1, 11))
y_data = np.log(x_data) + np.log(x_data)**2 + np.random.random(10)
plt.plot(x_data, y_data, 'b-', label='data')
popt, pcov = curve_fit(test, x_data, y_data)
popt
If you need vectorize for performance reasons, you can also pass p0, an array of initial parameters, e.g.:
popt, pcov = curve_fit(func, x_data, y_data, p0=[1,1,1])
It seems that most of the issues you had were with using numpy vs math. For completeness, and since you mentioned lmfit, here is how you could do this with lmfit:
import numpy as np
import matplotlib.pyplot as plt
from lmfit import Model
def test(x, a, b, c):
    return (a + b*np.log(x) + c*np.log(x)**2)
# create model from your model function
mymodel = Model(test)
# create initial set of named parameters from argument of your function
params = mymodel.make_params(a=0.5, b=1.1, c=0.5)
# Create some dummy data
x_data = np.linspace(1, 10, 10)
y_data = np.log(x_data) + np.log(x_data)**2 + np.random.random(len(x_data))
# run fit, get result
result = mymodel.fit(y_data, params, x=x_data)
# print out full fit report: fit statistics, best-fit values, uncertainties
print(result.fit_report())
# make a stacked plot of residual and data + fit
result.plot()
plt.show()
Note that curve_fit() will happily accept uninitialized parameters, assigning the impossible-to-justify default value of 1 to all of them. Lmfit does not allow this and forces you to set initial values explicitly. But it also gives better reports of fit statistics and uncertainties, and allows composition of more complex models.
For your example, the fit report will read
[[Model]]
Model(test)
[[Fit Statistics]]
# fitting method = leastsq
# function evals = 8
# data points = 10
# variables = 3
chi-square = 0.91573485
reduced chi-square = 0.13081926
Akaike info crit = -17.9061352
Bayesian info crit = -16.9983799
[[Variables]]
a: 0.69752193 +/- 0.34404583 (49.32%) (init = 0.5)
b: 1.17700278 +/- 0.59765274 (50.78%) (init = 1.1)
c: 0.85298657 +/- 0.23838141 (27.95%) (init = 0.5)
[[Correlations]] (unreported correlations are < 0.100)
C(b, c) = -0.961
C(a, b) = -0.782
C(a, c) = 0.607
and a plot of the data, best fit, and residuals.

Scipy sigmoid curve fitting

I have some data points and would like to find a fitting function; I guess a cumulative Gaussian (sigmoid) would fit, but I don't really know how to realize that.
This is what I have right now:
import numpy as np
import pylab
from scipy.optimize import curve_fit
def sigmoid(x, a, b):
    y = 1 / (1 + np.exp(-b*(x-a)))
    return y
xdata = np.array([400, 600, 800, 1000, 1200, 1400, 1600])
ydata = np.array([0, 0, 0.13, 0.35, 0.75, 0.89, 0.91])
popt, pcov = curve_fit(sigmoid, xdata, ydata)
print(popt)
x = np.linspace(-1, 2000, 50)
y = sigmoid(x, *popt)
pylab.plot(xdata, ydata, 'o', label='data')
pylab.plot(x,y, label='fit')
pylab.ylim(0, 1.05)
pylab.legend(loc='best')
pylab.show()
But I get the following warning:
.../scipy/optimize/minpack.py:779: OptimizeWarning: Covariance of the parameters could not be estimated
category=OptimizeWarning)
Can anyone help?
I'm also open for any other possibilities to do it! I just need a curve fit in any way to this data.
You could set some reasonable bounds for the parameters, for example:
def fsigmoid(x, a, b):
    return 1.0 / (1.0 + np.exp(-a*(x-b)))
popt, pcov = curve_fit(fsigmoid, xdata, ydata, method='dogbox', bounds=([0., 600.], [0.01, 1200.]))
I got the output
[7.27380294e-03 1.07431197e+03]
and the fitted curve looks like this: [plot omitted]
The first point at (400, 0) was removed as useless. You could add it back, though the result won't change much...
UPDATE: Note that bounds are set as ([low_a, low_b], [high_a, high_b]), so I asked for the scale to be within [0, 0.01] and the location to be within [600, 1200].
You may have noticed the resulting fit is completely incorrect.
Try passing some decent initial parameters to curve_fit, with the p0 argument:
popt, pcov = curve_fit(sigmoid, xdata, ydata, p0=[1000, 0.001])
should give a much better fit, and probably no warning either.
(The default starting parameters are [1, 1]; that is too far from the actual parameters to obtain a good fit.)
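If you prefer not to hard-code the guess, a rough one can be read off the data. A sketch reusing sigmoid, xdata and ydata from the question; the heuristics are assumptions, not part of the original answers:
# a ~ x where y is closest to 0.5 (the midpoint), b ~ a slope scale from the x range
a0 = xdata[np.argmin(np.abs(ydata - 0.5))]
b0 = 4.0 / (xdata.max() - xdata.min())
popt, pcov = curve_fit(sigmoid, xdata, ydata, p0=[a0, b0])
print(popt)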

Gaussian Fit on noisy and 'interesting' data set

I have some data (X-ray diffraction) that looks like this:
I want to fit a Gaussian to this data set to get the FWHM of the 'wider' portion. The double peak at around 7 degrees theta is not important information; it comes from unwanted sources.
To make myself more clear I want something like this (which I made in paint :) ):
I have tried to script something in python with the following code:
import math
from pylab import *
import numpy as np
import scipy as sp
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
data2=np.loadtxt('FWHM.spc')
x2,y2=data2[:,0],data2[:,7]
plt.title('Full Width Half Max of 002 Peak')
plt.plot(x2, y2, color='b')
plt.xlabel('$\\theta$', fontsize=10)
plt.ylabel('Intensity', fontsize=10)
plt.xlim([3,11])
plt.xticks(np.arange(3, 12, 1), fontsize=10)
plt.yticks(fontsize=10)
def func(x, a, x0, sigma):
    return a*np.exp(-(x-x0)**2/(2*sigma**2))
mean = sum(x2*y2)/sum(y2)
sigma2 = sqrt(abs(sum((x2-mean)**2*y2)/sum(y2)))
popt, pcov = curve_fit(func, x2, y2, p0 = [1, mean, sigma2])
ym = func(x2, popt[0], popt[1], popt[2])
plt.plot(x2, ym, c='r', label='Best fit')
FWHM = round(2*np.sqrt(2*np.log(2))*popt[2],4)
axvspan(popt[1]-FWHM/2, popt[1]+FWHM/2, facecolor='g', alpha=0.3, label='FWHM = %s'%(FWHM))
plt.legend(fontsize=10)
plt.show()
and I get the following output:
Clearly, this is way off from what is desired. Does anyone have some tips on how I can achieve this?
(I have enclosed the data here: https://justpaste.it/109qp)
As mentioned in the OP comments, one of the ways to constrain a signal in presence of unwanted data is to model it together with the desired signal. Of course, this approach is valid only when there is a valid model readily available for those contaminating data. For the data that you provide, one can consider a composite model that sums over the following components:
A linear baseline, because all your points are constantly offset from zero.
Two narrow Gaussian components that will model the double-peaked feature at the central part of your spectrum.
A narrow Gaussian component. This is the one you're actually trying to constrain.
All four components (the double peak counts twice) can be fitted simultaneously once you pass a reasonable starting guess to curve_fit:
def composite_spectrum(x,                  # data
                       a, b,               # linear baseline
                       a1, x01, sigma1,    # 1st line
                       a2, x02, sigma2,    # 2nd line
                       a3, x03, sigma3):   # 3rd line
    return (x*a + b + func(x, a1, x01, sigma1)
            + func(x, a2, x02, sigma2)
            + func(x, a3, x03, sigma3))
guess = [1, 200, 1000, 7, 0.05, 1000, 6.85, 0.05, 400, 7, 0.6]
popt, pcov = curve_fit(composite_spectrum, x2, y2, p0 = guess)
plt.plot(x2, composite_spectrum(x2, *popt), 'k', label='Total fit')
plt.plot(x2, func(x2, *popt[-3:])+x2*popt[0]+popt[1], c='r', label='Broad component')
FWHM = round(2*np.sqrt(2*np.log(2))*popt[10],4)
plt.axvspan(popt[9]-FWHM/2, popt[9]+FWHM/2, facecolor='g', alpha=0.3, label='FWHM = %s'%(FWHM))
plt.legend(fontsize=10)
plt.show()
In a case when the unwanted sources cannot be modeled properly, the unwanted feature can be masked out as suggested by Mad Physicist. In the simplest case you can even just mask out everything inside the [6.5, 7.4] interval, as in the sketch below.
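A minimal sketch of that masking idea, reusing x2, y2, func and the mean/sigma2 starting guesses from the question (whether a single Gaussian without a baseline term is then adequate depends on your data):
# keep only the points outside the contaminated 6.5-7.4 degree window
mask = (x2 < 6.5) | (x2 > 7.4)
popt_m, pcov_m = curve_fit(func, x2[mask], y2[mask], p0=[1, mean, sigma2])
plt.plot(x2, func(x2, *popt_m), 'm--', label='Fit with double peak masked')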

scipy.optimize.curve_fit failing to fit curve

I am trying to fit some data using the following code:
xdata = [0.03447378, 0.06894757, 0.10342136, 0.13789514, 0.17236893,
0.20684271, 0.24131649, 0.27579028, 0.31026407, 0.34473785,
0.37921163, 0.41368542, 0.44815921, 0.48263299]
ydata = [ 2.5844 , 2.87449, 3.01929, 3.10584, 3.18305, 3.24166,
3.28897, 3.32979, 3.35957, 3.39193, 3.41662, 3.43956,
3.45644, 3.47135]
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
def func(x, a, b, c, d):
    return a + b*x - c*np.exp(-d*x)
popt, pcov = curve_fit(func, xdata, ydata))
plt.figure()
plt.plot(xdata, ydata, 'ko', label="Original Noised Data")
plt.plot(xdata, func(xdata, *popt), 'r-', label="Fitted Curve")
plt.legend()
plt.show()
The curve is not being fitted: the result is a straight line where there should be a curve.
What should I be doing to correctly fit the data?
It looks like the optimizer is getting stuck in a local minimum, or perhaps just a very flat area of the objective function. A better fit can be found by tweaking the initial guess of the parameters that is used by curve_fit. For example, I get a reasonable-looking fit with p0=[1, 1, 1, 2.0] (the default is [1, 1, 1, 1]):
Here's the modified version of your script that I used:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
def func(x, a, b, c, d):
    return a + b*x - c*np.exp(-d*x)
xdata = np.array([0.03447378, 0.06894757, 0.10342136, 0.13789514, 0.17236893,
0.20684271, 0.24131649, 0.27579028, 0.31026407, 0.34473785,
0.37921163, 0.41368542, 0.44815921, 0.48263299])
ydata = np.array([ 2.5844 , 2.87449, 3.01929, 3.10584, 3.18305, 3.24166,
3.28897, 3.32979, 3.35957, 3.39193, 3.41662, 3.43956,
3.45644, 3.47135])
p0 = [1, 1, 1, 2.0]
popt, pcov = curve_fit(func, xdata, ydata, p0=p0)
print(popt)
plt.figure()
plt.plot(xdata, ydata, 'ko', label="Original Noised Data")
plt.plot(xdata, func(xdata, *popt), 'r-', label="Fitted Curve")
plt.legend(loc='best')
plt.show()
The printed output is:
[ 3.13903988 0.71827903 0.97047248 15.40936232]
Please try to be more specific with the issue you're having.
Two things I noticed that will prevent your code from working as it is:
line 15 (the curve_fit() call) has an additional right parenthesis at the end of the line
xdata is a Python list, so the arithmetic in func won't work once you try to multiply it by a parameter; turn it into a NumPy array with
xdata = np.array(xdata)
If you fix these two issues, the fit should work.
Edit: Warren is of course right - fixing the above issues will still leave you starting in the wrong minimum.
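For completeness, a minimal sketch of those two fixes applied to the question's snippet, reusing func, xdata and ydata from above; the p0 value is the one from Warren's answer:
import numpy as np
from scipy.optimize import curve_fit

xdata = np.asarray(xdata)   # lists -> arrays so the arithmetic in func is elementwise
ydata = np.asarray(ydata)
# note the single closing parenthesis (the question had an extra one)
popt, pcov = curve_fit(func, xdata, ydata, p0=[1, 1, 1, 2.0])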
