I am doing some curve fitting in Python with scipy.optimize.curve_fit. Normally I am satisfied with SciPy's default results.
However, this time I would like to display the fitted function with chi_squared as the uncertainty of my fit parameters, and I don't know how to do this.
I have tried absolute_sigma=True instead of the default absolute_sigma=False. According to my separate calculations, for absolute_sigma=False the uncertainty of the parameters is equal to reduced_chi_squared, but for absolute_sigma=True it isn't equal to chi_squared.
The documentation (https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html) mentions chi_squared a few times, but it doesn't state explicitly how to display it in the plot or how to use it as the uncertainty of the fit parameters.
My code is as follows:
# Necessary libraries
# Numpy will be used for the actual function (in this example it's not necessary)
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from uncertainties import *
#Some shortened data
Wave_lengths_500_est = [380, 447.1, 486, 492.2, 656, 700, 706.5]
Avraged_deflection_angles_500 = [11.965, 12.8, 13.93, 14.325, 19.37, 18.26, 18.335]
# The function to which the data are to be fitted
def lin_function(x, a, b):
    return a*x + b

def line_fit_lamp_1_500():
    # Angular uncertainty of 2 arcminutes, expressed in degrees
    angle_error = 0.03333
    # Add the data points to the plot
    plt.scatter(Wave_lengths_500_est, Avraged_deflection_angles_500, color='b')
    plt.errorbar(Wave_lengths_500_est, Avraged_deflection_angles_500, yerr=angle_error, fmt='o')
    # Fit lin_function to the data points
    Wave_length = np.array(Wave_lengths_500_est)
    Defletion_angle = np.array(Avraged_deflection_angles_500)
    popt, pcov = curve_fit(lin_function, Wave_length, Defletion_angle, absolute_sigma=True)
    perr = np.sqrt(np.diag(pcov))
    y = lin_function(Wave_length, *popt)
    # Plot styling (raw string so that \p in \pm is not read as an escape sequence)
    plt.plot(Wave_length, y, '--', color='g', label=r"fit with: $a={:.3f}\pm{:.5f}$, $b={:.3f}\pm{:.5f}$".format(popt[0], perr[0], popt[1], perr[1]))
    plt.legend()
    plt.xlabel('Wavelength [nm]')
    plt.ylabel('Refraction angle [degrees]')
    plt.show()

# Function call
line_fit_lamp_1_500()
Toggling between absolute_sigma=True and absolute_sigma=False changes the parameter uncertainties from a\pm0.00308 and b\pm1.74571 (absolute_sigma=True) to a\pm0.022 and b\pm1.41679 (absolute_sigma=False). The values I expected are a\pm0.0001027 and b\pm0.058132 for chi_squared, and a\pm0.002503 and b\pm1.4167 for reduced_chi_squared.
Additionally, could you please explain what the expression .format(popt[0], perr[0], popt[1], perr[1]) does?
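On the .format question: str.format substitutes its arguments, in order, into the {} placeholders of the string. The part after the colon is a format specification, so {:.3f} means fixed-point notation with three decimals. A small sketch with made-up numbers (the values of popt and perr below are hypothetical stand-ins, not the actual fit results):

```python
# Hypothetical values standing in for the fit output
popt = [0.0212, 4.5]
perr = [0.0031, 1.7457]

# Raw string so that the backslash in \pm is kept literally for Matplotlib's mathtext
template = r"fit with: $a={:.3f}\pm{:.5f}$, $b={:.3f}\pm{:.5f}$"

# Arguments fill the placeholders from left to right
label = template.format(popt[0], perr[0], popt[1], perr[1])
print(label)
```

So the label shows each parameter rounded to three decimals and its uncertainty rounded to five.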
All help is appreciated, thanks in advance.
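One way to check how the two settings relate is to compute chi-squared by hand. The following is a minimal sketch, assuming the 2-arcminute angle_error applies to every point and is passed to curve_fit through its sigma argument (the code above does not pass sigma, in which case curve_fit weights all points equally). The names chi_squared, reduced_chi_squared and dof are my own. Note that chi-squared is a goodness-of-fit number rather than a parameter uncertainty; as far as I understand the documentation, the covariance for absolute_sigma=False is the absolute_sigma=True covariance multiplied by the reduced chi-squared.

```python
import numpy as np
from scipy.optimize import curve_fit

def lin_function(x, a, b):
    return a * x + b

wavelengths = np.array([380, 447.1, 486, 492.2, 656, 700, 706.5])
angles = np.array([11.965, 12.8, 13.93, 14.325, 19.37, 18.26, 18.335])

# Assumption: the same measurement uncertainty for every data point
angle_error = 0.03333
sigma = np.full_like(angles, angle_error)

# Same fit twice, differing only in how the covariance is scaled
popt, pcov_abs = curve_fit(lin_function, wavelengths, angles,
                           sigma=sigma, absolute_sigma=True)
_, pcov_rel = curve_fit(lin_function, wavelengths, angles,
                        sigma=sigma, absolute_sigma=False)

# Chi-squared: sum of squared residuals in units of sigma
residuals = angles - lin_function(wavelengths, *popt)
chi_squared = np.sum((residuals / sigma) ** 2)
dof = len(angles) - len(popt)  # degrees of freedom
reduced_chi_squared = chi_squared / dof

perr_abs = np.sqrt(np.diag(pcov_abs))
perr_rel = np.sqrt(np.diag(pcov_rel))
print("chi2 =", chi_squared, "reduced chi2 =", reduced_chi_squared)
print("perr (absolute_sigma=True): ", perr_abs)
print("perr (absolute_sigma=False):", perr_rel)
```

If perr_rel / perr_abs comes out equal to sqrt(reduced_chi_squared) for both parameters, that confirms the scaling. The chi-squared value itself can then be added to the plot label as a separate number.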
Related
I am trying to do a simple linear curve fit with SciPy. Normally this method works fine for me, but this time, for a reason unknown to me, it doesn't.
(I suspect the numbers may be so big that they reach the limit of what can be stored in a given data type.)
Regardless of the reason, the idea is to make a plot that looks like this:
As you can see on the axes, the numbers there are of an ordinary order of magnitude. This time, however, I tried to fit much bigger data points, on the order of 1E10, using the following code (I present only the code for making a scatter plot and fitting a single data set).
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
ucrt_T = 2/np.sqrt(3)
ucrt_U = 0.1/np.sqrt(3)
T = [314.1, 325.1, 335.1, 345.1, 355.1, 365.1, 374.1, 384.1, 393.1]
T_to_4th = [9733560790.61, 11170378213.80, 12609495509.84, 14183383217.88, 15900203737.92, 17768359469.96, 19586229219.65, 21765930026.49, 23878782252.31]
ucrt_T_lst = [143130823.11, 158701221.00, 173801148.95, 189829733.26, 206814686.75, 224783722.22, 241820148.88, 261735288.93, 280568229.17]
UBlack = [1.9,3.1, 4.4, 5.6, 7.0, 8.7, 10.2, 11.8, 13.4]
def lin_function(x, a, b):
    return a*x + b

def line_fit_2():
    # Add the remaining points to the plot
    plt.scatter(UBlack, T_to_4th, color='blue')
    plt.errorbar(UBlack, T_to_4th, yerr=ucrt_T, fmt='o')
    # BLACK series
    VltBlack = np.array(UBlack)
    Tt4 = np.array(T_to_4th)
    popt, pcov = curve_fit(lin_function, VltBlack, Tt4, absolute_sigma=False)
    perr = np.sqrt(np.diag(pcov))
    y = lin_function(VltBlack, *popt)
    # Plot styling and appearance
    #plt.plot(Pressure1, y, '--', color = 'g', label="fit with: $a={:.3f}\pm{:.3f}$, $b={:.3f}\pm{:.3f}$" .format(popt[0], perr[0], popt[1], perr[1]))
    plt.plot(VltBlack, y, '--', color='green')
    plt.ylabel(r'$T^4$ in $[K^4]$')
    plt.xlabel(r'Thermometer voltage U in [mV]')
    plt.legend(['Fit', 'Data points'])
    plt.grid()
    plt.show()

line_fit_2()
If you run it, you will find that the scatter plot is created but the fit isn't executed properly: only a horizontal line is added. Additionally, the warning "OptimizeWarning: Covariance of the parameters could not be estimated" is raised.
I would be very happy to know what I am doing wrong or how to resolve this problem. All help is appreciated!
You've pretty much already answered your own question, so I'll just confirm your suspicion: the OptimizeWarning is raised because the underlying optimization algorithm doesn't work properly (it diverges) when the numbers involved are this large.
The solution is very simple: scale your input data before using the fitting tool. Just keep the scaling in mind when you label your x/y axes:
T_to_4th = np.array([9733560790.61, 11170378213.80, 12609495509.84, 14183383217.88, 15900203737.92, 17768359469.96, 19586229219.65, 21765930026.49, 23878782252.31])/10e6
ucrt_T_lst = np.array([143130823.11, 158701221.00, 173801148.95, 189829733.26, 206814686.75, 224783722.22, 241820148.88, 261735288.93, 280568229.17])/10e6
What I did is just divide the lists with big numbers by 10e6. Note that 10e6 equals 10^7, not 10^6, so the values are no longer in the original unit but in units of 10^7 of it. Dividing by 1e6 instead would be the analogue of going from kPa to mega kPa (which would be GPa).
To divide the entire list by the same value, first convert it to a NumPy array.
Hope this helps :)
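To make the scaling idea concrete, here is a minimal sketch that fits the scaled data and then converts the parameters and their uncertainties back to the original units. The scale factor of 1e9 and the variable names scale, popt_scaled and perr_scaled are my own choices for illustration; the closed-form np.polyfit result is included only as a cross-check.

```python
import numpy as np
from scipy.optimize import curve_fit

def lin_function(x, a, b):
    return a * x + b

UBlack = np.array([1.9, 3.1, 4.4, 5.6, 7.0, 8.7, 10.2, 11.8, 13.4])
T_to_4th = np.array([9733560790.61, 11170378213.80, 12609495509.84,
                     14183383217.88, 15900203737.92, 17768359469.96,
                     19586229219.65, 21765930026.49, 23878782252.31])

# Assumed scale factor: brings the y values to the order of 10
scale = 1e9

# Fit in scaled units
popt_scaled, pcov_scaled = curve_fit(lin_function, UBlack, T_to_4th / scale)
perr_scaled = np.sqrt(np.diag(pcov_scaled))

# Both a and b are linear in y, so they scale back by the same factor
popt = popt_scaled * scale
perr = perr_scaled * scale

# Cross-check against the closed-form linear least-squares solution
slope, intercept = np.polyfit(UBlack, T_to_4th, 1)
print("curve_fit:", popt, "+/-", perr)
print("polyfit:  ", slope, intercept)
```

Alternatively, the plot can simply be drawn in the scaled units, as long as the axis label states the factor.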
I have a function for fitting:
import cvxpy as cp
import numpy as np
from scipy.optimize import minimize
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from lmfit import Model, Parameters
def f(wdata, pr, pi, cr, ci):
    return ( np.arctan2(-2*ci*pi - 2*cr*pr, 2*cr*wdata) - np.arctan2((pi)**2 + (pr)**2 - (wdata)**2, -2*pr*wdata) )
wdata = (500000000.0, 520000000.0, 540000000.0, 560000000.0, 580000000.0, 600000000.0, 620000000.0, 640000000.0, 660000000.0, 680000000.0, 700000000.0, 720000000.0, 740000000.0, 760000000.0])
wdata= np.asarray(wdata)
ydata = f(wdata, -355574682.231318, -9040912422.93189, 31570159.4732856, -6238484.15663787)
fmodel = Model(f)
params = Parameters()
params.add('pr', value=-355574682.231318, vary=True)
params.add('pi', value=-9040912422.93189, vary=True)
params.add('cr', value=31570159.4732856, vary=True)
params.add('ci', expr='-((cr*pr)/pi) < ci < (cr*pr)/pi if pi<0 else ((cr*pr)/pi) < ci < -(cr*pr)/pi ', vary=True)
result = fmodel.fit(ydata, params, wdata=wdata)
print(result.fit_report())
plt.plot(wdata, ydata, 'bo')
plt.plot(wdata, result.init_fit, 'k--')
plt.plot(wdata, result.best_fit, 'r-')
plt.show()
As you can see, the parameter "ci" has to be bounded between other parameters. I put my constraints in an if statement; however, I got an error that name 'ci' is not defined. I think the reason is that I put the ci in two inequalities with other parameters. How can I tell my code that I want "ci" to be bounded? (with the bound that I've shown now in my code)
There are a number of odd things happening here that set off alarm bells and that you should probably fix:
Zeroth, don't name a parameter 'pi'. Code is meant to be read and that's just going to mess with people's minds. Below, I will call this 'phi'.
First, the initial values for your parameters do not need 15 significant digits.
Second, be careful to avoid variables whose values differ in scale by many orders of magnitude. If 'pr' is expected to be ~3e8 and 'phi' is expected to be ~9e9, consider "changing units" by 1e6 or 1e9 so that variable values are closer to unity.
OK, on to the actual question. I would try this:
params.add('pr', value=-3.6e8, vary=True)
params.add('phi', value=-9.0e9, vary=True)
params.add('cr', value=3.2e7, vary=True)
# add a new *internal* variable that is bound on [-pi/2, pi/2]
params.add('xangle', value=0.05, vary=True, min=-np.pi/2, max=np.pi/2)
# constrain 'ci' to be '(cr*pr/phi)*sin(xangle)'
params.add('ci', expr='(cr*pr/phi)*sin(xangle)')
Now, as xangle varies between -pi/2 and +pi/2, ci will be able to take any value that is between -cr*pr/phi and +cr*pr/phi.
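To see why this works, here is a small NumPy-only sketch of the mapping that the constraint expression performs. The helper ci_from_angle is hypothetical, written just for this illustration, and the parameter values are the rounded initial values from above. Keep in mind that the model function's argument would also need to be renamed from 'pi' to 'phi' so that it matches the parameter name.

```python
import numpy as np

def ci_from_angle(cr, pr, phi, xangle):
    # Same formula as the lmfit constraint expression for 'ci'
    return (cr * pr / phi) * np.sin(xangle)

# Rounded initial values from the parameter definitions
cr, pr, phi = 3.2e7, -3.6e8, -9.0e9

# The largest magnitude 'ci' is allowed to reach
limit = abs(cr * pr / phi)

# Sweep the bounded internal variable over its whole allowed range
xangles = np.linspace(-np.pi / 2, np.pi / 2, 201)
ci_values = ci_from_angle(cr, pr, phi, xangles)

print("limit:", limit)
print("ci range:", ci_values.min(), ci_values.max())
```

Because sin stays within [-1, 1], the derived 'ci' can never leave the interval, and the interval moves along automatically as cr, pr and phi change during the fit.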
I would like to know how to do a non-linear fit in Python 3.3. I am not finding any easy examples online. I am not well aware of these fitting techniques.
Any help will be welcome!
Thanks in advance.
To follow an easy example, visit http://www.walkingrandomly.com/?p=5215
Here is the code with explanations!
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

xdata = np.array([-2,-1.64,-1.33,-0.7,0,0.45,1.2,1.64,2.32,2.9])
ydata = np.array([0.69,0.70,0.69,1.0,1.9,2.4,1.9,0.9,-0.7,-1.4])

def func(x, p1, p2):
    return p1*np.cos(p2*x) + p2*np.sin(p1*x)
# Here you give the initial parameters for p0 which Python then iterates over
# to find the best fit
popt, pcov = curve_fit(func,xdata,ydata,p0=(1.0,0.3))
print(popt) # This contains your two best fit parameters
# Performing sum of squares
p1 = popt[0]
p2 = popt[1]
residuals = ydata - func(xdata,p1,p2)
fres = sum(residuals**2)
print(fres)
xaxis = np.linspace(-2,3,100) # we can plot with xdata, but fit will not look good
curve_y = func(xaxis,p1,p2)
plt.plot(xdata,ydata,'*')
plt.plot(xaxis,curve_y,'-')
plt.show()
I'm using scipy.interpolate.UnivariateSpline to smoothly interpolate a large amount of data. Works great. I get an object which acts like a function.
Now I want to save the spline points for later and use them in Matlab (and also Python, but that's less urgent), without needing the original data. How can I do this?
In scipy I have no clue; UnivariateSpline does not seem to offer a constructor with the previously-computed knots and coefficients.
In MATLAB, I've tried the Matlab functions spline() and pchip(), and while both come close, they have errors near the endpoints that look kind of like Gibbs ears.
Here is a sample set of data that I have, in Matlab format:
splinedata = struct('coeffs',[-0.0412739180955273 -0.0236463479425733 0.42393753107602 -1.27274336116436 0.255711720888164 1.93923263846732 -2.30438927604816 1.02078680231079 0.997156858475075 -2.35321792387215 0.667027554745454 0.777918416623834],...
'knots',[0 0.125 0.1875 0.25 0.375 0.5 0.625 0.75 0.875 0.9999],...
'y',[-0.0412739180955273 -0.191354308450615 -0.869601364377744 -0.141538578624065 0.895258135865578 -1.04292294390242 0.462652465278345 0.442550440125204 -1.03967756446455 0.777918416623834])
The coefficients and knots are the result of calling get_coeffs() and get_knots() on the scipy UnivariateSpline. The 'y' values are the values of the UnivariateSpline at the knots, or more precisely:
y = f(f.get_knots())
where f is my UnivariateSpline.
How can I use this data to make a spline that matches the behavior of the UnivariateSpline, without having to use the Curve-Fitting Toolbox? I don't need to do any data fitting in Matlab, I just need to know how to construct a cubic spline from knots/coefficients/spline values.
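You can do it with the attribute _eval_args and the method _from_tck() of the class UnivariateSpline. The first returns the spline parameters (knots, coefficients and degree), which you can store and later pass to the second to create an equivalent spline object. Both names start with an underscore, so they are private and may change between SciPy versions.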
Here is an example:
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import UnivariateSpline
x = np.linspace(-3, 3, 50)
y = np.exp(-x**2) + 0.1 * np.random.randn(50)
spl1 = UnivariateSpline(x, y, s=.5)
xi = np.linspace(-3, 3, 1000)
tck = spl1._eval_args
spl2 = UnivariateSpline._from_tck(tck)
plt.plot(x, y, 'ro', ms=5, label='data')
plt.plot(xi, spl1(xi), 'b', label='original spline')
plt.plot(xi, spl2(xi), 'y:', lw=4, label='recovered spline')
plt.legend()
plt.show()
In scipy, try scipy.interpolate.splev, which takes
tck: a sequence ... containing the knots, coefficients, and degree of the spline.
Added: the following python class creates spline functions:
init with (knots, coefs, degree),
then use it just like spline functions created by UnivariateSpline( x, y, s ):
from scipy.interpolate import splev
# http://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.splev.html

class Splinefunc:
    """ splinef = Splinefunc( knots, coefs, degree )
        ...
        y = splinef( x )  # __call__

        19june untested
    """
    def __init__( self, knots, coefs, degree ):
        self.knots = knots
        self.coefs = coefs
        self.degree = degree

    def __call__( self, x ):
        return splev( x, (self.knots, self.coefs, self.degree ))
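A variant that uses only public API is sketched below, assuming the stored data come from get_knots() and get_coeffs() as in the question. One detail to watch: get_knots() lists each boundary knot only once, while a B-spline knot vector repeats the boundary knots, so they are padded here before building a scipy.interpolate.BSpline. The names full_knots and rebuilt are my own, and the random data with a fixed seed is only there to make the sketch reproducible.

```python
import numpy as np
from scipy.interpolate import BSpline, UnivariateSpline

rng = np.random.default_rng(0)  # fixed seed, only for reproducibility
x = np.linspace(-3, 3, 50)
y = np.exp(-x**2) + 0.1 * rng.standard_normal(50)
spl = UnivariateSpline(x, y, s=0.5)

# What one would save to disk: knots, coefficients, and the degree
knots = spl.get_knots()
coeffs = spl.get_coeffs()
degree = 3  # the UnivariateSpline default (k=3)

# Repeat each boundary knot `degree` more times to get the full knot vector
full_knots = np.concatenate(([knots[0]] * degree, knots, [knots[-1]] * degree))

# Rebuild the spline without access to the original data
rebuilt = BSpline(full_knots, coeffs, degree)

xi = np.linspace(-3, 3, 1000)
max_diff = np.max(np.abs(spl(xi) - rebuilt(xi)))
print("largest difference between original and rebuilt spline:", max_diff)
```

The same padded knot vector is what a spline evaluator in another environment would need, which may explain endpoint artifacts when only the unpadded knots are used.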
Since I took a lecture on Python I wanted to use it to fit my data. Although I have been trying for a while now, I still have no idea why this is not working.
What I would like to do
Take one data-file after another from a subfolder (here called: 'Test'), transform the data a little bit and fit it with a Lorentzian function.
Problem description
When I run the code posted below, it does not fit anything and just returns my initial parameters after 4 function calls. I tried scaling the data and playing around with ftol and maxfev after checking the Python documentation over and over again, but nothing improved. I also tried explicitly converting the lists to numpy arrays, as well as the solution given to the question "scipy.optimize.leastsq returns best guess parameters not new best fit", namely x = x.astype(np.float64). No improvement. Strangely enough, for a few selected data files this same code worked at some point, but for the majority it never did. The data can definitely be fitted, since a Levenberg-Marquardt fitting routine gives reasonably good results in Origin.
Can someone tell me what is going wrong or point out alternatives...?
import numpy,math,scipy,pylab
from scipy.optimize import leastsq
import glob,os
for files in glob.glob("*.txt"):
    x=[]
    y=[]
    z=[]
    f = open(files, 'r')
    raw=f.readlines()
    f.close()
    del raw[0:8] #delete Header
    for columns in ( raw2.strip().split() for raw2 in raw ): #data columns
        x.append(float(columns[0]))
        y.append(float(columns[1]))
        z.append(10**(float(columns[1])*0.1)) #transform data for the fit
    def lorentz(p,x):
        return (1/(1+(x/p[0] - 1)**4*p[1]**2))*p[2]
    def errorfunc(p,x,z):
        return lorentz(p,x)-z
    p0=[3.,10000.,0.001]
    Params,cov_x,infodict,mesg,ier = leastsq(errorfunc,p0,args=(x,z),full_output=True)
    print(Params)
    print(ier)
Without seeing your data it is hard to tell what is going wrong. I generated some random noise and used your code to perform a fit to it. Everything works okay. This algorithm does not allow for parameter boundaries so you may run into problems if your p0 is close to zero. I did the following:
import numpy as np
from scipy.optimize import leastsq
import matplotlib.pyplot as plt
def lorentz(p,x):
    return p[2] / (1.0 + (x / p[0] - 1.0)**4 * p[1]**2)

def errorfunc(p,x,z):
    return lorentz(p,x)-z

p = np.array([0.5, 0.25, 1.0], dtype=np.double)
x = np.linspace(-1.5, 2.5, num=30, endpoint=True)
noise = np.random.randn(30) * 0.05
z = lorentz(p,x)
noisyz = z + noise
p0 = np.array([-2.0, -4.0, 6.8], dtype=np.double) #Initial guess
solp, ier = leastsq(errorfunc,
                    p0,
                    args=(x,noisyz),
                    Dfun=None,
                    full_output=False,
                    ftol=1e-9,
                    xtol=1e-9,
                    maxfev=100000,
                    epsfcn=1e-10,
                    factor=0.1)
plt.plot(x, z, 'k-', linewidth=1.5, alpha=0.6, label='Theoretical')
# Pass only color: recent Matplotlib versions reject c and color given together
plt.scatter(x, noisyz, marker='+', color='r', label='Measured Data')
plt.plot(x, lorentz(solp,x), 'g--', linewidth=2, label='leastsq fit')
plt.xlim((-1.5, 2.5))
plt.ylim((0.0, 1.2))
plt.grid(which='major')
plt.legend(loc=8)
plt.show()
This yielded a solution of:
solp = array([ 0.51779002, 0.26727697, 1.02946179])
Which is close to the theoretical value:
np.array([0.5, 0.25, 1.0])
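Since leastsq does not support parameter boundaries, one alternative is scipy.optimize.least_squares, which accepts a bounds argument. The following is a minimal sketch under my own assumptions: the bounds, the starting point, the noise level of 0.01 and the fixed random seed are all chosen for illustration and are not taken from the question's data.

```python
import numpy as np
from scipy.optimize import least_squares

def lorentz(p, x):
    return p[2] / (1.0 + (x / p[0] - 1.0)**4 * p[1]**2)

def errorfunc(p, x, z):
    return lorentz(p, x) - z

# Synthetic data with a fixed seed, for reproducibility only
rng = np.random.default_rng(1)
p_true = np.array([0.5, 0.25, 1.0])
x = np.linspace(-1.5, 2.5, num=30)
noisyz = lorentz(p_true, x) + 0.01 * rng.standard_normal(30)

# Assumed bounds: keep p[0] away from zero, since it appears in a denominator
lower = np.array([0.1, 0.0, 0.0])
upper = np.array([2.0, 10.0, 10.0])
p0 = np.array([0.4, 0.3, 0.8])  # initial guess inside the bounds

result = least_squares(errorfunc, p0, bounds=(lower, upper), args=(x, noisyz))
print("fitted parameters:", result.x)
print("status:", result.status, result.message)
```

The bounds keep the optimizer from wandering into regions where the model is ill-defined, which can matter more with real data than with this synthetic example.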