Piecewise fitting with two linear functions and a breaking point in Python

Piecewise fitting with two linear functions and a breaking point in Python - python

I want to fit my data using two linear functions (broken power law) with one breaking point which is user given. Currently Im using the curve_fit function from the scipy.optimize module. Here are my datasets frequencies, binned data, errors
Here is my code:
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
freqs=np.loadtxt('binf11.dat')
binys=np.loadtxt('binp11.dat')
errs=np.loadtxt('bine11.dat')
def brkPowLaw(xArray, breakp, slopeA, offsetA, slopeB):
returnArray = []
for x in xArray:
if x <= breakp:
returnArray.append(slopeA * x + offsetA)
elif x>breakp:
returnArray.append(slopeB * x + offsetA)
return returnArray
#define initial guesses, breakpoint=-3.2
a_fit,cov=curve_fit(brkPowLaw,freqs,binys,sigma=errs,p0=(-3.2,-2.0,-2.0,-2.0))
modelPredictions = brkPowLaw(freqs, *a_fit)
plt.errorbar(freqs, binys, yerr=errs, fmt='kp',fillstyle='none',elinewidth=1)
plt.xlim(-5,-2)
plt.plot(freqs,modelPredictions,'r')
The offset of the second linear function is set to be equal to the offset of the first one.
It looks like this works but I get this fit:
Now I thought that the condition in by brkPowLaw function should suffice but it does not. What I want is that the first linear equation is used to fit the data up to a chosen breaking point and then from this breaking point a second linear fit will be done, but without the hump as it shows in the plot because now there it looks like there are two breaking points instead of one and three linear functions for fitting which is not what I expected nor wanted.
What I want is that when the first linear fit ends the second one starts from the point where the first linear fit ended.
I have tried using the numpy.piecewise function with no plausible result, looked into some topics like this or this but I did not manage to make my script work
Thank you for your time

This would be my approach, not with linear but quadratic functions.
import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import curve_fit
def soft_step(x, s): ### not my usual np.tanh() ...OMG
return 1+ 0.5 * s * x / np.sqrt( 1 + ( s * x )**2 )
### for the looks of the data I decided to go for two parabolas with
### one discontinuity
def fit_func( x, a0, b0, c0, a1, b1, c1, x0, s ):
out = ( a0 * x**2 + b0 * x + c0 ) * ( 1 - soft_step( x - x0, s ) )
out += ( a1 * x**2 + b1 * x + c1 ) * soft_step( x - x0, s )
return out
### with global parameter for iterative fit
### if using least_squares one could avoid globals
def fit_short( x, a0, b0, c0, a1, b1, c1, x0 ):
global stepwidth
return fit_func( x, a0, b0, c0, a1, b1, c1, x0, stepwidth )
### getting data
xl = np.loadtxt( "binf11.dat" )
yl = np.loadtxt( "binp11.dat" )
el = np.loadtxt( "bine11.dat" )
### check for initial values
p0 = [ 0, -2,-11, 0, -2, -9, -3, 10 ]
xth = np.linspace( -5.5, -1.5, 250 )
yth = np.fromiter( ( fit_func(x, *p0 ) for x in xth ), np.float )
### initial fit
sol, pcov = curve_fit( fit_func, xl, yl, sigma=el, p0=p0, absolute_sigma=True )
yft = np.fromiter( ( fit_func( x, *sol ) for x in xth ), np.float )
sol=sol[: -1]
###iterating with fixed and decreasing softness in the step
for stepwidth in range(10,55,5):
sol, pcov = curve_fit( fit_short, xl, yl, sigma=el, p0=sol, absolute_sigma=True )
### printing the step position
print sol[-1]
yiter = np.fromiter( ( fit_short(x, *sol ) for x in xth ), np.float )
print sol
###plotting
fig = plt.figure()
ax = fig.add_subplot( 1, 1, 1 )
# ~ax.plot( xth, yth ) ### no need to show start parameters
ax.plot( xth, yft ) ### first fit with variable softness
ax.plot( xth, yiter ) ### last fit with fixed softness of 50
ax.errorbar( xl, yl, el, marker='o', ls='' ) ### data
plt.show()
This gives:
-3.1762721614559712
-3.1804393481217477
-3.1822672190583603
-3.183493292415725
-3.1846976088390333
-3.185974760198917
-3.1872472903175266
-3.188427041827035
-3.1894705102541843
[ -0.78797351 -5.33255174 -12.48258537 0.53024954 1.14252783 -4.44589397 -3.18947051]
and
putting the jump at -3.189

Related

Attempt of fit a Gaussian distribution: Error scipy/optimize/minpack.py", line 765, in curve_fit raise ValueError("`sigma` has incorrect shape.")

I have a pretty known issue but impossible to fix for the moment.
This is about curve_fit function. I get the error:
Error scipy/optimize/minpack.py", line 765, in curve_fit raise ValueError("sigma has incorrect shape.")
Here is the code, don't make caution to the loop for, it is just I would like 5 different histograms:
for i in range(5):
mean_o[i] = np.mean(y3[:,i])
sigma_o[i] = np.std(y3[:,i])
## Histograms
# Number of bins
Nbins=100
binwidth = np.zeros(5)
# Fitting curves
def gaussian(x, a, mean, sigma):
return a * np.exp(-((x - mean)**2 / (2 * sigma**2)))
for i in range(5):
binwidth[i] = (max(y3[:,i]) - min(y3[:,i]))/Nbins
bins_plot = np.arange(min(y3[:,i]), max(y3[:,i]) + binwidth[i], binwidth[i])
plt.title('Distribution of O observable for redshift bin = '+str(z_ph_fid[i]))
plt.hist(y3[:,i], bins=bins_plot, label='bin '+str(z_ph_fid[i]))
plt.legend(loc='upper right')
# Fitting and plot
range_fit = np.linspace(min(y3[:,i]), max(y3[:,i]), len(y3[:,i]))
popt, pcov = curve_fit(gaussian, range_fit, y3[:,i], mean_o[i], sigma_o[i])
plt.plot(range_fit, gaussian(range_fit, *popt))
# Save figure
plt.savefig('chi2_gaussian_bin_'+str(i+1)+'.png')
plt.close()
The first histogram i=0 look like:
I would like to plot a red Gaussian fit over the histogram.

There are two problems with the OP.
First problem is that the code tries to fit the random samples with a normal distribution. This is wrong. One can fit the output of the histogram, though. This is shown in the code below. Better would be the use of scipy.stats.norm.fit() which allows to fit the random samples. This is also shown.
Second problem is the sigma-shape. Here curve_fit is actually expecting errors on the y-data, which naturally needs the shape of the y-data. What should have been done is: providing start values for the fit. This is shown below as well.
Code looks like:
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import norm
from scipy.optimize import curve_fit
mean_o = list()
sigma_o = list()
y3 = list()
### generate some data
for i in range( 5 ):
y3.append( norm.rvs( size=150000 ) )
y3 = np.transpose( y3 )
for i in range(5):
mean_o.append( np.mean( y3[ :, i ] ) )
sigma_o.append( np.std( y3[ :, i ] ) )
## Histograms
# Number of bins
Nbins=100
binwidth = np.zeros(5)
# Fitting curves
def gaussian( x, a , mean, sigma ):
return a * np.exp( -( ( x - mean )**2 / (2 * sigma**2 ) ) )
fig = plt.figure()
ax = { i : fig.add_subplot( 2, 3, i+1) for i in range( 5 ) }
for i in range(5):
ymin = min(y3[:,i])
ymax = max(y3[:,i])
binwidth[i] = ( ymax - ymin) / Nbins
bins_plot = np.arange( ymin, ymax + binwidth[i], binwidth[i])
histdata = ax[i].hist(
y3[:,i],
bins=bins_plot,
label='bin '+str(i)
)
range_fit = np.linspace( ymin, ymax, 250)
# Fitting and plot version 1
popt, pcov = curve_fit(
gaussian,
0.5 * ( histdata[1][:-1] + histdata[1][1:] ),
histdata[0],
p0=( max(histdata[0]), mean_o[i], sigma_o[i] ) )
ax[i].plot(range_fit, gaussian( range_fit, *popt ) )
ax[i].axvline( x=mean_o[i], ls=':', c='r' )
# Fitting and plot version 2
params = norm.fit( y3[ ::, i ], loc=mean_o[i], scale=sigma_o[i] )
nth = gaussian(
range_fit,
len( y3[::, i]) * binwidth[i] / np.sqrt( 2 * np.pi ),
*params
)
ax[i].plot(range_fit, nth, ls="--" )
plt.tight_layout()
plt.show()

Smoothing a curve with many peaks with Gaussian

I have spectroscopy data with some very sharp peaks as seen in blue curve. I would like to make the peaks a bit more smooth like the orange curve in the plot.
I thought the easiest way to do this is to convolve my data points with Gaussians. I know both numpy and scipy have convolve functions but I am not sure if I need 1D or 2D convolution to get what I need. So far I tried convolve1d and gaussian_filter1d from scipy and convolve from numpy. None of them improved the sharp lines connecting the data points. I also don't know how to choose the correct sigma or weights...
The text file containing the data points is here.
The orange curve is generated from a visualisation programme and I wish to be able to generate it myself with python rather than using the programme.
EDIT:
New link for file

Looks like you want a "kernel density estimator" which is implemented by:
from scipy.stats import gaussian_kde
X = np.random.rand(50) * 3500
Y = np.random.rand(50) * 50
xi = linspace(0, 3500, 1000)
kde = gaussian_kde(X, weights = Y, bw_method = .01) #tune `bw_method` to get the bandwidth you want
plt.plot(xi, kde.pdf(xi))
You may also need to adjust the y scaling of the graph to match your requirements

This is manually reproducing the orange curve given in the OP. Turns out it is convoluted with a Lorentzian not Gaussian.
import matplotlib.pyplot as plt
import numpy as np
from scipy.signal import find_peaks
from scipy.optimize import curve_fit
def gs( x, x0, a, s ):
return a * np.exp( -( x - x0 )**2 / ( 2 * s**2 ) )
def cs( x, x0, a, s ):
return a / ( ( x - x0 )**2 + s**2 )
conrange = 40000
### gasiian is no good
# ~condata = np.fromiter( ( gs(x, 0, 1, 1800 ) for x in np.arange( -5000, 5000 ) ), np.float )
### Cauchy looks much better
condata = np.fromiter(
(
cs( x, 0, 1, 2000 ) for x in np.arange( -conrange, conrange )
), np.float
)
### shift can be zero.
### Amplitude does not matter as it will be scaled later anyway
### width matters of course, but is adjusted manually for the moment.
data = np.loadtxt("ir_data.txt")
xdata = data[:, 0]
ydata = data[:, 1]
xdataint = np.fromiter( ( int( x* 100 ) for x in xdata ), int )
xmin = xdataint[0]
xmax = xdataint[-1]
xfilled = np.arange( xmin , xdataint[-1] + 1 )
yfilled = np.zeros( len( xfilled ), dtype=np.float )
xfloat = np.fromiter( ( x / 100. for x in xfilled), float )
for x, y in zip( xdataint, ydata ):
yfilled[ x - xmin ] = y
### just putting a manual scale here, but the real one can be calculated
### from the convolution properties
yc = 1e6 * np.convolve( condata, yfilled, mode="full" )
xfull = np.arange(
-conrange + xmin, xmin + conrange + len( xfilled ) - 1
)
xfloat = np.fromiter( ( 0.01 * x for x in xfull ), float )
fig = plt.figure()
ax = fig.add_subplot( 1, 1, 1 )
ax.plot( xdata, ydata, ls='', marker='o', ms=2 )
ax.plot( xfloat, yc, ls='-')
plt.show()
Disclaimer
This are preliminary results and only posted due to request from the author of the OP. The might be some refinement.

Odeint and Curve fit of scipy give warning and gives wrong results

I'm troubled by the warnings of odeint and curve fit. So the thing I want to do is:
my 1st problem is that curve fit and odeint gives warnings as below repeatedly for the 3 data sets(6 warnings in total), but meanwhile curve_fit does give results seemingly correct.
828: OptimizeWarning: Covariance of the parameters could not be estimated
247: ODEintWarning: Excess work done on this call (perhaps wrong Dfun type). Run with full_output = 1 to get quantitative information.
** the 2nd problem is for integrated curves, with the EXACT SAME code just by executing it multiple times, it gives differents results. It's seems to have a period of repetition, after a few executions, it gives correct curves, but on the next execution, it become once again incorrect and so on. Maybe a problem of instability?**
import math
import numpy as np
import pathlib as pl
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from scipy.integrate import odeint
def loadData(path):
with open(path,"r") as fid:
res=np.loadtxt(fid,comments="#")
return res
def modeleeq(a,N,p,we,wp):
res=(we*a/p[0])**p[1] + (wp*a/p[2])**p[3]
return res
#get path
chemin=pl.Path(input("Paste the path to the directory containing data files\n"))
#get we and wp in array 3X2
wewp=loadData(chemin/'wewp.dat').transpose()
#plotting
plt.style.use('seaborn')
fig, (ax1,ax2) = plt.subplots(1,2)
colors = ['#79ccff', '#f78db4', '#a07ffb']
files = pl.Path(chemin).glob("a_dadN*")
para=[]
for i,f in enumerate(files):
res=loadData(f)
a=res[:,0]
dadN=res[:,1]
we=wewp[i,0]
wp=wewp[i,1]
#exp data
ax1.scatter(a,dadN,c=colors[i],marker="x", label = f"test {i+1}")
#evaluation of parameters
modele=lambda a, *p: (we*a/p[0])**p[1] + (wp*a/p[2])**p[3] #p=[ge,me,gp,mp]
p, pcov= curve_fit(modele,a,dadN, p0 = [2e5,0,1e5,0])
para.append(p)
ax1.plot(a,modele(a,*p),c=colors[i],label=f"test {i+1} identification")
#intergration
a0=a.min() #initial condition
N = np.linspace(0, 4000)
aitgr=odeint(modeleeq,a0,N,args=(p,we,wp))
ax2.plot(N,aitgr,c=colors[i],label = f"test {i+1} integration")
#the following code is just for adding titles and print the identified paras
#so I won't put them here
Big thanks to you all!

I think here are a few issues. First it is clear that the two addends are very similar. So it can, and actually does, happen that the data cannot distinguish between the two. Scaling is a second issue. Having fit parameters spanning orders of magnitude is usually not a good idea. In fact, the gamma just rescale the W. Hence, one can easily rewrite the function as ( f1 * a )**e1 One fits f1 and calculates gamma by knowing f1=W/gamma. Another issue is the possibility of negative powers of a negative number coming up. So one should use either abs( f1 ) or f1**2. With this in mind I modified the code and got the result below. From the fit results in the second and third case, one can see that either f1=0 or the exponents are almost equal. In such a case it is normal that the covariance matrix cannot be determined.
Finally, when it comes to integrating / plotting a(N) The differential equation is somewhat of type da / dN = a**k so we have da / a**k = dN integrating will give a**(-k+1) = N+N0 so a(N) = 1 / (N + N0 )**( 1 / (k-1) ). As the initial slope is positive N0 < 0 and the function diverges at N0. Numerical integration beyond this value does not make sense.
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from scipy.integrate import odeint
def loadData(path):
with open(path,"r") as fid:
res=np.loadtxt(fid,comments="#")
return res
def simplified_model(a, N, f1, e1, f2, e2 ):
dadN = ( np.fabs( f1 ) * a )**e1 + ( np.fabs( f2 ) * a )**e2
return dadN
plt.style.use('seaborn')
fig = plt.figure()
ax1 = fig.add_subplot( 1, 2, 1)
ax2 = fig.add_subplot( 1, 2, 2)
colors = ['#79ccff', '#f78db4', '#a07ffb']
para = list()
for i in range( 3 ):
res=loadData( "a_dadN_{}.dat".format( i + 1 ) )
a = res[ ::, 0 ]
dadN = res[ ::, 1 ]
# ~we = wewp[i, 0 ]
# ~wp = wewp[ i, 1 ]
def model_wrapper(a, ge, me, gp, mp ):
return simplified_model(a, 0, ge, me, gp, mp )
#exp data
ax1.scatter( a, dadN, c=colors[i], marker="x", label = f"test {i+1}" )
print(i)
al = np.linspace( min(a), max(a), 50 )
guess = [ .2 + i, 2.0, 2, 2.2 ]
dal = np.fromiter( ( model_wrapper( av, *guess ) for av in al ), np.float )
ax1.plot( al, dal, color=colors[i], ls=':')
p, pcov= curve_fit(model_wrapper, a, dadN, p0 = guess, maxfev=100000 )
print( p )
dalf = np.fromiter( ( model_wrapper( av, *p ) for av in al ), np.float )
ax1.plot( al, dalf, color=colors[i])
#intergration
a0 = a.min() #initial condition
N = np.linspace(0, 4000, 100000)
aitgr=odeint( simplified_model, a0, N,args=tuple(p) )
ax2.plot( N , aitgr, c=colors[i], label = f"test {i+1} integration")
ax2.set_ylim([1e-4,1e-2])
ax2.set_yscale("log")
plt.show()
providing

curve fitting equations python

I am trying to fit data with an admittance equation for an rlc circuit having 6 components. I am following an example given he[fit]1re and inserted my equation. The equation is the real part of admittance for the 6 component circuit simplified using Mathcad. In the figure attached the x axis is omega (w=2*pi*f) and y is admittance in milli Siemens.
The program runs but it doesn't do the fitting despite a good trial function. I appreciate any help why the fit is a straight line. I attached also a Gaussian fitting example.
this is what i get when I try to fit with the equation. The data is the one with a smaller peak on the left and the trial function is the dotted line. the fit is a straight line
from numpy import sqrt, pi, exp, linspace, loadtxt
from lmfit import Model
import matplotlib.pyplot as plt
data = loadtxt("C:/Users/susu/circuit_eq_real5.dat")
x = data[:, 0]
y = data[:, 1]
def circuit(x,C0,Cm,Lm,Rm,R0,Rs):
return ((C0**2*Cm**2*Lm**2*R0*x**4)+(Rs*C0**2*Cm**2*Lm**2*x**4)+(C0**2*Cm**2*R0**2*Rm*x**2)+(Rs*C0**2*Cm**2*R0**2*x**2)+(C0**2*Cm**2*R0*Rm**2*x**2)+(2*Rs*C0**2*Cm**2*R0*Rm*x**2)+(Rs*C0**2*Cm**2*Rm**2*x**2)-(2*C0**2*Cm*Lm*R0*x**2)-(2*Rs*C0**2*Cm*Lm*x**2)+(C0**2*R0)+(Rs*C0**2)-(2*Rs*C0*Cm**2*Lm*x**2)+(2*Rs*C0*Cm)+(Cm**2*Rm)+(Rs*Cm**2))/((C0**2*Cm**2*Lm**2*x**4)+(C0**2*Cm**2*R0**2*x**2)+(2*C0**2*Cm**2*R0*Rm*x**2)+(C0**2*Cm**2*Rm**2*x**2)-(2*C0**2*Cm*Lm*x**2)+(C0**2)-(2*C0*Cm**2*Lm*x**2)+(2*C0*Cm)+(Cm**2))
gmodel = Model(circuit)
result = gmodel.fit(y, x=x, C0=1.0408*10**(-12), Cm=5.953*10**(-14),
Lm=1.475*10**(-7), Rm=1.571, R0=2.44088, Rs=0.42)
print(result.fit_report())
plt.plot(x, y, 'bo')
plt.plot(x, result.init_fit, 'k--')
plt.plot(x, result.best_fit, 'r-')
plt.show()
Below is the Fit Report
[[Fit Statistics]]
# fitting method = leastsq
# function evals = 14005
# data points = 237
# variables = 6
chi-square = 32134074.5
reduced chi-square = 139108.548
Akaike info crit = 2812.71607
Bayesian info crit = 2833.52443
[[Variables]]
C0: -7.5344e-15 +/- 6.3081e-09 (83723736.65%) (init = 1.0408e-12)
Cm: -8.9529e-13 +/- 1.4518e-06 (162164237.47%) (init = 5.953e-14)
Lm: 2.4263e-06 +/- 1.94051104 (79978205.20%) (init = 1.475e-07)
Rm: -557.974399 +/- 1.3689e+09 (245334051.75%) (init = 1.571)
R0: -5178.53517 +/- 6.7885e+08 (13108904.45%) (init = 2.44088)
Rs: 2697.67659 +/- 7.3197e+08 (27133477.70%) (init = 0.42)
[[Correlations]] (unreported correlations are < 0.100)
C(R0, Rs) = -1.003
C(Rm, Rs) = -0.987
C(Rm, R0) = 0.973
C(C0, Lm) = 0.952
C(C0, Cm) = -0.502
C(Cm, R0) = -0.483
C(Cm, Rs) = 0.453
C(Cm, Rm) = -0.388
C(Cm, Lm) = -0.349
C(C0, R0) = 0.310
C(C0, Rs) = -0.248
C(C0, Rm) = 0.148
Thank you so much M Newville and Mikuszefski and others for your insights and feedback. I agreed what I put there is perhaps a mess to put in a program. It is apparent from the python code that I am not versed in Python or programming.
Mikuszefsky, thanks for posting the rlc example code. Your approach is neat and interesting. I didn't know Python does direct complex fitting.I will try your approach and see if can do the fit. I want to fit both the real and imaginary part of Y (admittance). I will definitely get stuck somewhere and will post my progress here.
Best,
Susu

Here a way to clean up RLC circuits with parallel and series connections. This avoids this super long line and hard to check function. It also avoids Matlab or similar programs, as it directly computes the circuit. Surely, it can be extended easily the OP's circuit. As pointed out by M Newville, the simple fit fails. If, on the other hand, units are scaled to natural units, it works even without initial parameters. Note, results are onnly correct by a scaling factor. One needs to know at least one components value.
import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import curve_fit
def r_l( w, l ):
return 1j * w * l
def r_c( w, c ):
return 1. / ( 1j * w * c )
def parallel( a, b ):
return( 1. / ( 1./ a + 1. / b ) )
def series( a, b ):
return a + b
# simple rlc band pass filter (to be extended)
def rlc_band( w , r, l, c ):
lc = parallel( r_c( w , c ), r_l( w, l ) )
return lc / series( r, lc )
def rlc_band_real( w , r, l, c ):
return rlc_band( w , r, l, c ).real
def rlc_band_real_milli_nano( w , r, l, c ):
return rlc_band_real( w , r, 1e-6 * l, 1e-9 * c ).real
wList = np.logspace( 5, 7, 25 )
wFullList = np.logspace( 5, 7, 500 )
rComplexList = np.fromiter( ( rlc_band(w, 12, 1.3e-5, 1e-7 ) for w in wList ), np.complex )
rList = np.fromiter( ( r.real for r in rComplexList ), np.float )
pList = np.fromiter( ( np.angle( r ) for r in rComplexList ), np.float )
fit1, pcov = curve_fit( rlc_band_real, wList, rList )
print fit1
print "does not work"
fit2, pcov = curve_fit( rlc_band_real_milli_nano, wList, rList )
print fit2
print "works, but is not unique (scaling is possible)"
print 12, fit2[1] * 12 / fit2[0], fit2[2] * fit2[0] / 12.
fig = plt.figure()
ax = fig.add_subplot( 1, 1, 1 )
ax.plot( wList, rList , ls='', marker='o', label='data')
#~ ax.plot( wList, pList )
ax.plot( wFullList, [ rlc_band_real( w , *fit1 ) for w in wFullList ], label='naive fit')
ax.plot( wFullList, [ rlc_band_real_milli_nano( w , *fit2 ) for w in wFullList ], label='scaled units')
ax.set_xscale('log')
ax.legend( loc=0 )
plt.show()
Providing:
>> /...minpack.py:785: OptimizeWarning: Covariance of the parameters could not be estimated category=OptimizeWarning)
>> [1. 1. 1.]
>> does not work
>> [ 98.869924 107.10908434 12.13715912]
>> works, but is not unique (scaling is possible)
>> 12 13.0 100.0

Providing a real link to a text file of the actual data you are using and/or a real plot of what you are actually seeing would be most helpful. Also, please provide an accurate and complete description of the results including the text of what is actually printed out by the print(result.fit_report()). Basically, ask yourself how you might try to help someone who asked such a question, and provide as much information as you can.
No one (including you) is ever going to be able to spell-check the implementation of your function. You will need thorough and robust testing of this function in order to convince anyone (including you, I hope) that it is doing what you think it should do. You should provide the results of those tests before worrying about why it is not working as a fitting function. You should definitely consider refactoring that mess of an equation into more manageable and readable pieces.
That said, I also strongly recommend that you do not work in units of Farads and Henrys but picoFarads or nanoFarads and microHenrys. That will make the values much closer to 1 (say, order 1e-6 to 1e+6), which will make it much easier for the fit to do its job.

Python power law fit with upper limits & asymmetric errors in data using ODR

I'm trying to fit some data to a power law using python. The problem is that some of my points are upper limits, which I don't know how to include in the fitting routine.
In the data, I have put the upper limits as errors in y equal to 1, when the rest is much smaller. You can put this errors to 0 and change the uplims list generator, but then the fit is terrible.
The code is the following:
import numpy as np
import matplotlib.pyplot as plt
from scipy.odr import *
# Initiate some data
x = [1.73e-04, 5.21e-04, 1.57e-03, 4.71e-03, 1.41e-02, 4.25e-02, 1.28e-01, 3.84e-01, 1.15e+00]
x_err = [1e-04, 1e-04, 1e-03, 1e-03, 1e-02, 1e-02, 1e-01, 1e-01, 1e-01]
y = [1.26e-05, 8.48e-07, 2.09e-08, 4.11e-09, 8.22e-10, 2.61e-10, 4.46e-11, 1.02e-11, 3.98e-12]
y_err = [1, 1, 2.06e-08, 2.5e-09, 5.21e-10, 1.38e-10, 3.21e-11, 1, 1]
# Define upper limits
uplims = np.ones(len(y_err),dtype='bool')
for i in range(len(y_err)):
if y_err[i]<1:
uplims[i]=0
else:
uplims[i]=1
# Define a function (power law in our case) to fit the data with.
def function(p, x):
m, c = p
return m*x**(-c)
# Create a model for fitting.
model = Model(function)
# Create a RealData object using our initiated data from above.
data = RealData(x, y, sx=x_err, sy=y_err)
# Set up ODR with the model and data.
odr = ODR(data, model, beta0=[1e-09, 2])
odr.set_job(fit_type=0) # 0 is full ODR and 2 is least squares; AFAIK, it doesn't change within errors
# more details in https://docs.scipy.org/doc/scipy/reference/generated/scipy.odr.ODR.set_job.html
# Run the regression.
out = odr.run()
# Use the in-built pprint method to give us results.
#out.pprint() #this prints much information, but normally we don't need it, just the parameters and errors; the residual variation is the reduced chi square estimator
print('amplitude = %5.2e +/- %5.2e \nindex = %5.2f +/- %5.2f \nchi square = %12.8f'% (out.beta[0], out.sd_beta[0], out.beta[1], out.sd_beta[1], out.res_var))
# Generate fitted data.
x_fit = np.linspace(x[0], x[-1], 1000) #to do the fit only within the x interval; we can always extrapolate it, of course
y_fit = function(out.beta, x_fit)
# Generate a plot to show the data, errors, and fit.
fig, ax = plt.subplots()
ax.errorbar(x, y, xerr=x_err, yerr=y_err, uplims=uplims, linestyle='None', marker='x')
ax.loglog(x_fit, y_fit)
ax.set_xlabel(r'$x$')
ax.set_ylabel(r'$f(x) = m·x^{-c}$')
ax.set_title('Power Law fit')
plt.show()
The result of the fit is:
amplitude = 3.42e-12 +/- 5.32e-13
index = 1.33 +/- 0.04
chi square = 0.01484021
As you can see in the plot, the two first and two last points are upper limits and the fit is not taking them into account. Moreover, in the penultimate point, the fit goes over it even though that would be strictly forbidden.
I need that the fit knows this limits are very strict, and not try to fit the point itself but only consider them just as limits. How could I do this with the odr routine (or any other code which makes me the fits and gives me a chi square-esque estimator)?
Please, take into account that I need to change the function to other generalizations easily, so things as the powerlaw module are not desirable.
Thanks!

This answer is related to this post, where I discuss fitting with x and y errors. This, hence does not require the ODR module, but can be done manually. Therefore, one can use leastsq or minimize. Concerning the constraints, I made clear in other posts that I try to avoid them if possible. This can be done here as well, although the details of programming and maths are a little cumbersome, especially if it is supposed to be stable and foolproof. I will just give a rough idea. Say we want y0 > m * x0**(-c). In log-form we can write this as eta0 > mu - c * xeta0. I.e. there is an alpha such that eta0 = mu - c * xeta0 + alpha**2. Same for the other inequalities. For the second upper limit you get a beta**2 but you can decide which one is the smaller one, so you automatically fulfil the other condition. Same thing works for the lower limits with a gamma**2 and a delta**2. Say we can work with alpha and gamma. We can combine the inequality conditions to relate those two as well. At the end we can fit a sigma and alpha = sqrt(s-t)* sigma / sqrt( sigma**2 + 1 ), where s and t are derived from the inequalities. The sigma / sqrt( sigma**2 + 1 ) function is just one option to let alpha vary in a certain range, i.e. alpha**2 < s-t The fact that the radicand may become negative, shows that there are cases without solution. With alpha known, mu and, therefore m are calculated. So fit parameters are c and sigma, which takes the inequalities into account and makes m depended. I tired it and it works, but the version at hand is not the most stable one. I'd post it upon request.
As we have a handmade residual function already, we have a second option, though. We just introduce our own chi**2 function and use minimize, whic allows constraints. As minimize and the constraints keyword solution are very flexible and the residual function is easily modified for other functions and not only for m * x**( -c ) the overall construction is quite flexible. It looks as follows:
import matplotlib.pyplot as plt
import numpy as np
from random import random, seed
from scipy.optimize import minimize,leastsq
seed(7563)
fig1 = plt.figure(1)
###for gaussion distributed errors
def boxmuller(x0,sigma):
u1=random()
u2=random()
ll=np.sqrt(-2*np.log(u1))
z0=ll*np.cos(2*np.pi*u2)
z1=ll*np.cos(2*np.pi*u2)
return sigma*z0+x0, sigma*z1+x0
###for plotting ellipses
def ell_data(a,b,x0=0,y0=0):
tList=np.linspace(0,2*np.pi,150)
k=float(a)/float(b)
rList=[a/np.sqrt((np.cos(t))**2+(k*np.sin(t))**2) for t in tList]
xyList=np.array([[x0+r*np.cos(t),y0+r*np.sin(t)] for t,r in zip(tList,rList)])
return xyList
###function to fit
def f(x,m,c):
y = abs(m) * abs(x)**(-abs(c))
#~ print y,x,m,c
return y
###how to rescale the ellipse to make fitfunction a tangent
def elliptic_rescale(x, m, c, x0, y0, sa, sb):
#~ print "e,r",x,m,c
y=f( x, m, c )
#~ print "e,r",y
r=np.sqrt( ( x - x0 )**2 + ( y - y0 )**2 )
kappa=float( sa ) / float( sb )
tau=np.arctan2( y - y0, x - x0 )
new_a=r*np.sqrt( np.cos( tau )**2 + ( kappa * np.sin( tau ) )**2 )
return new_a
###residual function to calculate chi-square
def residuals(parameters,dataPoint):#data point is (x,y,sx,sy)
m, c = parameters
#~ print "m c", m, c
theData = np.array(dataPoint)
best_t_List=[]
for i in range(len(dataPoint)):
x, y, sx, sy = dataPoint[i][0], dataPoint[i][1], dataPoint[i][2], dataPoint[i][3]
#~ print "x, y, sx, sy",x, y, sx, sy
###getthe point on the graph where it is tangent to an error-ellipse
ed_fit = minimize( elliptic_rescale, x , args = ( m, c, x, y, sx, sy ) )
best_t = ed_fit['x'][0]
best_t_List += [best_t]
#~ exit(0)
best_y_List=[ f( t, m, c ) for t in best_t_List ]
##weighted distance not squared yet, as this is done by scipy.optimize.leastsq
wighted_dx_List = [ ( x_b - x_f ) / sx for x_b, x_f, sx in zip( best_t_List,theData[:,0], theData[:,2] ) ]
wighted_dy_List = [ ( x_b - x_f ) / sx for x_b, x_f, sx in zip( best_y_List,theData[:,1], theData[:,3] ) ]
return wighted_dx_List + wighted_dy_List
def chi2(params, pnts):
r = np.array( residuals( params, pnts ) )
s = sum( [ x**2 for x in r] )
#~ print params,s,r
return s
def myUpperIneq(params,pnt):
m, c = params
x,y=pnt
return y - f( x, m, c )
def myLowerIneq(params,pnt):
m, c = params
x,y=pnt
return f( x, m, c ) - y
###to create some test data
def test_data(m,c, xList,const_sx,rel_sx,const_sy,rel_sy):
yList=[f(x,m,c) for x in xList]
xErrList=[ boxmuller(x,const_sx+x*rel_sx)[0] for x in xList]
yErrList=[ boxmuller(y,const_sy+y*rel_sy)[0] for y in yList]
return xErrList,yErrList
###some start values
mm_0=2.3511
expo_0=.3588
csx,rsx=.01,.07
csy,rsy=.04,.09,
limitingPoints=dict()
limitingPoints[0]=np.array([[.2,5.4],[.5,5.0],[5.1,.9],[5.7,.9]])
limitingPoints[1]=np.array([[.2,5.4],[.5,5.0],[5.1,1.5],[5.7,1.2]])
limitingPoints[2]=np.array([[.2,3.4],[.5,5.0],[5.1,1.1],[5.7,1.2]])
limitingPoints[3]=np.array([[.2,3.4],[.5,5.0],[5.1,1.7],[5.7,1.2]])
####some data
xThData=np.linspace(.2,5,15)
yThData=[ f(x, mm_0, expo_0) for x in xThData]
#~ ###some noisy data
xNoiseData,yNoiseData=test_data(mm_0, expo_0, xThData, csx,rsx, csy,rsy)
xGuessdError=[csx+rsx*x for x in xNoiseData]
yGuessdError=[csy+rsy*y for y in yNoiseData]
for testing in range(4):
###Now fitting with limits
zipData=zip(xNoiseData,yNoiseData, xGuessdError, yGuessdError)
estimate = [ 2.4, .3 ]
con0={'type': 'ineq', 'fun': myUpperIneq, 'args': (limitingPoints[testing][0],)}
con1={'type': 'ineq', 'fun': myUpperIneq, 'args': (limitingPoints[testing][1],)}
con2={'type': 'ineq', 'fun': myLowerIneq, 'args': (limitingPoints[testing][2],)}
con3={'type': 'ineq', 'fun': myLowerIneq, 'args': (limitingPoints[testing][3],)}
myResult = minimize( chi2 , estimate , args=( zipData, ), constraints=[ con0, con1, con2, con3 ] )
print "############"
print myResult
###plot that
ax=fig1.add_subplot(4,2,2*testing+1)
ax.plot(xThData,yThData)
ax.errorbar(xNoiseData,yNoiseData, xerr=xGuessdError, yerr=yGuessdError, fmt='none',ecolor='r')
testX = np.linspace(.2,6,25)
testY = np.fromiter( ( f( x, myResult.x[0], myResult.x[1] ) for x in testX ), np.float)
bx=fig1.add_subplot(4,2,2*testing+2)
bx.plot(xThData,yThData)
bx.errorbar(xNoiseData,yNoiseData, xerr=xGuessdError, yerr=yGuessdError, fmt='none',ecolor='r')
ax.plot(limitingPoints[testing][:,0],limitingPoints[testing][:,1],marker='x', linestyle='')
bx.plot(limitingPoints[testing][:,0],limitingPoints[testing][:,1],marker='x', linestyle='')
ax.plot(testX, testY, linestyle='--')
bx.plot(testX, testY, linestyle='--')
bx.set_xscale('log')
bx.set_yscale('log')
plt.show()
Providing results
############
status: 0
success: True
njev: 8
nfev: 36
fun: 13.782127248002116
x: array([ 2.15043226, 0.35646436])
message: 'Optimization terminated successfully.'
jac: array([-0.00377715, 0.00350225, 0. ])
nit: 8
############
status: 0
success: True
njev: 7
nfev: 32
fun: 41.372277637885716
x: array([ 2.19005695, 0.23229378])
message: 'Optimization terminated successfully.'
jac: array([ 123.95069313, -442.27114677, 0. ])
nit: 7
############
status: 0
success: True
njev: 5
nfev: 23
fun: 15.946621924326545
x: array([ 2.06146362, 0.31089065])
message: 'Optimization terminated successfully.'
jac: array([-14.39131606, -65.44189298, 0. ])
nit: 5
############
status: 0
success: True
njev: 7
nfev: 34
fun: 88.306027468763432
x: array([ 2.16834392, 0.14935514])
message: 'Optimization terminated successfully.'
jac: array([ 224.11848736, -791.75553417, 0. ])
nit: 7
I checked four different limiting points (rows). The result are displayed normally and in logarithmic scale (columns). With some additional work you could get errors as well.
Update on asymmetric errors
To be honest, at the moment I do not know how to handle this property. Naively, I'd define my own asymmetric loss function similar to this post.
With x and y errors I do it by quadrant instead of just checking positive or negative side. My error ellipse, hence, changes to four connected pieces.
Nevertheless, it is somewhat reasonable. For testing and to show how it works, I made an example with a linear function. I guess the OP can combine the two pieces of code according to his requirements.
In case of a linear fit it looks like this:
import matplotlib.pyplot as plt
import numpy as np
from random import random, seed
from scipy.optimize import minimize,leastsq
#~ seed(7563)
fig1 = plt.figure(1)
ax=fig1.add_subplot(2,1,1)
bx=fig1.add_subplot(2,1,2)
###function to fit, here only linear for testing.
def f(x,m,y0):
y = m * x +y0
return y
###for gaussion distributed errors
def boxmuller(x0,sigma):
u1=random()
u2=random()
ll=np.sqrt(-2*np.log(u1))
z0=ll*np.cos(2*np.pi*u2)
z1=ll*np.cos(2*np.pi*u2)
return sigma*z0+x0, sigma*z1+x0
###for plotting ellipse quadrants
def ell_data(aN,aP,bN,bP,x0=0,y0=0):
tPPList=np.linspace(0, 0.5 * np.pi, 50)
kPP=float(aP)/float(bP)
rPPList=[aP/np.sqrt((np.cos(t))**2+(kPP*np.sin(t))**2) for t in tPPList]
tNPList=np.linspace( 0.5 * np.pi, 1.0 * np.pi, 50)
kNP=float(aN)/float(bP)
rNPList=[aN/np.sqrt((np.cos(t))**2+(kNP*np.sin(t))**2) for t in tNPList]
tNNList=np.linspace( 1.0 * np.pi, 1.5 * np.pi, 50)
kNN=float(aN)/float(bN)
rNNList=[aN/np.sqrt((np.cos(t))**2+(kNN*np.sin(t))**2) for t in tNNList]
tPNList = np.linspace( 1.5 * np.pi, 2.0 * np.pi, 50)
kPN = float(aP)/float(bN)
rPNList = [aP/np.sqrt((np.cos(t))**2+(kPN*np.sin(t))**2) for t in tPNList]
tList = np.concatenate( [ tPPList, tNPList, tNNList, tPNList] )
rList = rPPList + rNPList+ rNNList + rPNList
xyList=np.array([[x0+r*np.cos(t),y0+r*np.sin(t)] for t,r in zip(tList,rList)])
return xyList
###how to rescale the ellipse to touch fitfunction at point (x,y)
def elliptic_rescale_asymmetric(x, m, c, x0, y0, saN, saP, sbN, sbP , getQuadrant=False):
y=f( x, m, c )
###distance to function
r=np.sqrt( ( x - x0 )**2 + ( y - y0 )**2 )
###angle to function
tau=np.arctan2( y - y0, x - x0 )
quadrant=0
if tau >0:
if tau < 0.5 * np.pi: ## PP
kappa=float( saP ) / float( sbP )
quadrant=1
else:
kappa=float( saN ) / float( sbP )
quadrant=2
else:
if tau < -0.5 * np.pi: ## PP
kappa=float( saN ) / float( sbN)
quadrant=3
else:
kappa=float( saP ) / float( sbN )
quadrant=4
new_a=r*np.sqrt( np.cos( tau )**2 + ( kappa * np.sin( tau ) )**2 )
if quadrant == 1 or quadrant == 4:
rel_a=new_a/saP
else:
rel_a=new_a/saN
if getQuadrant:
return rel_a, quadrant, tau
else:
return rel_a
### residual function to calculate chi-square
def residuals(parameters,dataPoint):#data point is (x,y,sxN,sxP,syN,syP)
m, c = parameters
theData = np.array(dataPoint)
bestTList=[]
qqList=[]
weightedDistanceList = []
for i in range(len(dataPoint)):
x, y, sxN, sxP, syN, syP = dataPoint[i][0], dataPoint[i][1], dataPoint[i][2], dataPoint[i][3], dataPoint[i][4], dataPoint[i][5]
### get the point on the graph where it is tangent to an error-ellipse
### i.e. smallest ellipse touching the graph
edFit = minimize( elliptic_rescale_asymmetric, x , args = ( m, c, x, y, sxN, sxP, syN, syP ) )
bestT = edFit['x'][0]
bestTList += [ bestT ]
bestA,qq , tau= elliptic_rescale_asymmetric( bestT, m, c , x, y, aN, aP, bN, bP , True)
qqList += [ qq ]
bestYList=[ f( t, m, c ) for t in bestTList ]
### weighted distance not squared yet, as this is done by scipy.optimize.leastsq or manual chi2 function
for counter in range(len(dataPoint)):
xb=bestTList[counter]
xf=dataPoint[counter][0]
yb=bestYList[counter]
yf=dataPoint[counter][1]
quadrant=qqList[counter]
if quadrant == 1:
sx, sy = sxP, syP
elif quadrant == 2:
sx, sy = sxN, syP
elif quadrant == 3:
sx, sy = sxN, syN
elif quadrant == 4:
sx, sy = sxP, syN
else:
assert 0
weightedDistanceList += [ ( xb - xf ) / sx, ( yb - yf ) / sy ]
return weightedDistanceList
def chi2(params, pnts):
r = np.array( residuals( params, pnts ) )
s = np.fromiter( ( x**2 for x in r), np.float ).sum()
return s
####...to make data with asymmetric error (fixed); for testing only
def noisy_data(xList,m0,y0, sxN,sxP,syN,syP):
yList=[ f(x, m0, y0) for x in xList]
gNList=[boxmuller(0,1)[0] for dummy in range(len(xList))]
xerrList=[]
for x,err in zip(xList,gNList):
if err < 0:
xerrList += [ sxP * err + x ]
else:
xerrList += [ sxN * err + x ]
gNList=[boxmuller(0,1)[0] for dummy in range(len(xList))]
yerrList=[]
for y,err in zip(yList,gNList):
if err < 0:
yerrList += [ syP * err + y ]
else:
yerrList += [ syN * err + y ]
return xerrList, yerrList
###some start values
m0=1.3511
y0=-2.2
aN, aP, bN, bP=.2,.5, 0.9, 1.6
#### some data
xThData=np.linspace(.2,5,15)
yThData=[ f(x, m0, y0) for x in xThData]
xThData0=np.linspace(-1.2,7,3)
yThData0=[ f(x, m0, y0) for x in xThData0]
### some noisy data
xErrList,yErrList = noisy_data(xThData, m0, y0, aN, aP, bN, bP)
###...and the fit
dataToFit=zip(xErrList,yErrList, len(xThData)*[aN], len(xThData)*[aP], len(xThData)*[bN], len(xThData)*[bP])
fitResult = minimize(chi2, (m0,y0) , args=(dataToFit,) )
fittedM, fittedY=fitResult.x
yThDataF=[ f(x, fittedM, fittedY) for x in xThData0]
### plot that
for cx in [ax,bx]:
cx.plot([-2,7], [f(x, m0, y0 ) for x in [-2,7]])
ax.errorbar(xErrList,yErrList, xerr=[ len(xThData)*[aN],len(xThData)*[aP] ], yerr=[ len(xThData)*[bN],len(xThData)*[bP] ], fmt='ro')
for x,y in zip(xErrList,yErrList)[:]:
xEllList,yEllList = zip( *ell_data(aN,aP,bN,bP,x,y) )
ax.plot(xEllList,yEllList ,c='#808080')
### rescaled
### ...as well as a scaled version that touches the original graph. This gives the error shortest distance to that graph
ed_fit = minimize( elliptic_rescale_asymmetric, 0 ,args=(m0, y0, x, y, aN, aP, bN, bP ) )
best_t = ed_fit['x'][0]
best_a,qq , tau= elliptic_rescale_asymmetric( best_t, m0, y0 , x, y, aN, aP, bN, bP , True)
xEllList,yEllList = zip( *ell_data( aN * best_a, aP * best_a, bN * best_a, bP * best_a, x, y) )
ax.plot( xEllList, yEllList, c='#4040a0' )
###plot the fit
bx.plot(xThData0,yThDataF)
bx.errorbar(xErrList,yErrList, xerr=[ len(xThData)*[aN],len(xThData)*[aP] ], yerr=[ len(xThData)*[bN],len(xThData)*[bP] ], fmt='ro')
for x,y in zip(xErrList,yErrList)[:]:
xEllList,yEllList = zip( *ell_data(aN,aP,bN,bP,x,y) )
bx.plot(xEllList,yEllList ,c='#808080')
####rescaled
####...as well as a scaled version that touches the original graph. This gives the error shortest distance to that graph
ed_fit = minimize( elliptic_rescale_asymmetric, 0 ,args=(fittedM, fittedY, x, y, aN, aP, bN, bP ) )
best_t = ed_fit['x'][0]
#~ print best_t
best_a,qq , tau= elliptic_rescale_asymmetric( best_t, fittedM, fittedY , x, y, aN, aP, bN, bP , True)
xEllList,yEllList = zip( *ell_data( aN * best_a, aP * best_a, bN * best_a, bP * best_a, x, y) )
bx.plot( xEllList, yEllList, c='#4040a0' )
plt.show()
which plots
The upper graph shows the original linear function and some data generated from this using asymmetric Gaussian errors. Error bars are plotted, as well as the piecewise error ellipses (grey...and rescaled to touch the linear function, blue). The lower graph additionally shows the fitted function as well as the rescaled piecewise ellipses, touching the fitted function.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Piecewise fitting with two linear functions and a breaking point in Python - python

Related

Attempt of fit a Gaussian distribution: Error scipy/optimize/minpack.py", line 765, in curve_fit raise ValueError("`sigma` has incorrect shape.")

Smoothing a curve with many peaks with Gaussian

Odeint and Curve fit of scipy give warning and gives wrong results

curve fitting equations python

Python power law fit with upper limits & asymmetric errors in data using ODR

Categories

Resources