lmfit minimize (or scipy.optimize leastsq) on complex equation/data - python

Edit:
Modeling and fitting with this approach work fine; the problem is the data here, which is not good.
I want to do curve fitting on a complex-valued dataset. After thorough reading and searching, I found that I can use a couple of methods (e.g. lmfit minimize, scipy leastsq).
But none of them gives me a good fit at all.
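Here is the fit equation (the Cole-Cole model, written out from the ColeCole function in the code below, with f the x values):
$$\sigma^*(f) = \sigma_0\left[1 + \frac{m}{1-m}\left(1 - \frac{1}{1 + (i\,2\pi f\,\tau)^{c}}\right)\right]$$
Here is the data to be fitted (list of complex y values):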
[(0.00011342104914066835+8.448890220616275e-07j),
(0.00011340386404065371+7.379293582429708e-07j),
(0.0001133540327309949+6.389834505824625e-07j),
(0.00011332170913939336+5.244566142401774e-07j),
(0.00011331311156154074+4.3841061618015007e-07j),
(0.00011329383047059048+3.6163513508002877e-07j),
(0.00011328700094846502+3.0542249453666894e-07j),
(0.00011327650033983806+2.548725558622188e-07j),
(0.00011327702539337786+2.2508174567697671e-07j),
(0.00011327342238146558+1.9607648998100523e-07j),
(0.0001132710747364799+1.721721661949941e-07j),
(0.00011326933241850936+1.5246061350710235e-07j),
(0.00011326798040984542+1.3614817802178457e-07j),
(0.00011326752037650585+1.233483784504962e-07j),
(0.00011326758290166552+1.1258801448459512e-07j),
(0.00011326813100914905+1.0284749122099354e-07j),
(0.0001132684076390416+9.45791423595816e-08j),
(0.00011326982474882009+8.733105218572698e-08j),
(0.00011327158639135678+8.212191452217794e-08j),
(0.00011327366823516856+7.747920115589205e-08j),
(0.00011327694366034208+7.227069986108343e-08j),
(0.00011327915327873038+6.819405851172907e-08j),
(0.00011328181165961218+6.468392148750885e-08j),
(0.00011328531688122571+6.151393311227958e-08j),
(0.00011328857849500441+5.811704586613896e-08j),
(0.00011329241716561626+5.596645863242474e-08j),
(0.0001132970129528527+5.4722461511610696e-08j),
(0.0001133002881788021+5.064523218904898e-08j),
(0.00011330507671740223+5.0307457368330284e-08j),
(0.00011331106068787993+4.7703959367963307e-08j),
(0.00011331577350707601+4.634615394867111e-08j),
(0.00011332064001939156+4.6914747648361504e-08j),
(0.00011333034985824086+4.4992151257444304e-08j),
(0.00011334188526870483+4.363662798446445e-08j),
(0.00011335491299924776+4.364164366097129e-08j),
(0.00011337451201475147+4.262881852644385e-08j),
(0.00011339778209066752+4.275096587356569e-08j),
(0.00011342832992628646+4.4463907608604945e-08j),
(0.00011346526768580432+4.35706649329342e-08j),
(0.00011351108008292451+4.4155812379491554e-08j),
(0.00011356967192325835+4.327004709646922e-08j),
(0.00011364164970635006+4.420660396556604e-08j),
(0.00011373150199883139+4.3672898914161596e-08j),
(0.00011384660942003356+4.326171366194325e-08j),
(0.00011399193321804955+4.1493065523925126e-08j),
(0.00011418043916260295+4.0762418512759096e-08j),
(0.00011443271767970721+3.91359909722939e-08j),
(0.00011479600563688605+3.845666332695652e-08j),
(0.0001153652105925112+3.6224677316584614e-08j),
(0.00011638635682516399+3.386843079212692e-08j),
(0.00011836223959714231+3.6692295450490655e-08j)]
Here is the list of x values (frequencies):
[999.9999960000001,
794.328231,
630.957342,
501.18723099999994,
398.107168,
316.22776400000004,
251.188642,
199.52623,
158.489318,
125.89254,
99.999999,
79.432823,
63.095734,
50.118722999999996,
39.810717,
31.622776,
25.118864000000002,
19.952623000000003,
15.848932000000001,
12.589253999999999,
10.0,
7.943282000000001,
6.309573,
5.011872,
3.981072,
3.1622779999999997,
2.511886,
1.9952619999999999,
1.584893,
1.258925,
1.0,
0.7943279999999999,
0.630957,
0.5011869999999999,
0.398107,
0.316228,
0.251189,
0.199526,
0.15848900000000002,
0.125893,
0.1,
0.079433,
0.063096,
0.050119,
0.039811,
0.031623000000000005,
0.025119,
0.019953,
0.015849000000000002,
0.012589,
0.01]
And here is the code, which runs but does not fit the way I want:
import numpy as np
import matplotlib.pyplot as plt
from lmfit import minimize, Parameters
#%% the equation
def ColeCole(params, fr): #fr is the array of x values and params holds the fitting parameters
    sig0 = params['sig0'].value
    m = params['m'].value
    tau = params['tau'].value
    c = params['c'].value
    w = fr*2*np.pi
    num = 1
    denom = 1+(1j*w*tau)**c
    sigComplex = sig0*(1.0+(m/(1-m))*(1-num/denom))
    return sigComplex
def res(params, fr, data): #calculating residuals of fit
    residual = ColeCole(params, fr) - data
    # reinterpret the complex128 array as float64: real and imaginary parts interleaved
    # (np.float was removed from NumPy, so use np.float64)
    return residual.view(np.float64)
#%% Adding model parameters and fitting
# fr2: the x values listed above; data: the complex y values listed above
fr2 = np.asarray(fr2)
data = np.asarray(data)
params = Parameters()
params.add('sig0', value=0.00166)
params.add('m', value=0.19)
params.add('tau', value=0.05386)
params.add('c', value=0.80)
params['tau'].min = 0 # these conditions must be met, but even if I remove them the fit is ugly!!
params['m'].min = 0
out = minimize(res, params, args=(fr2, data))
#%%plotting Imaginary part
fig, ax = plt.subplots()
plotX = fr2
plotY = data.imag
fitplot = ColeCole(out.params, fr2)
ax.semilogx(plotX,plotY,'o',label='imc')
ax.semilogx(plotX,fitplot.imag,label='fit')
ax.legend()
#%%plotting real part
fig2, ax2 = plt.subplots()
plotX2 = fr2
plotY2 = data.real
fitplot2 = ColeCole(out.params, fr2)
ax2.semilogx(plotX2,plotY2,'o',label='imc')
ax2.semilogx(plotX2,fitplot2.real,label='fit')
ax2.legend()
plt.show()
I might be doing it completely wrong; please help me if you know the proper way to do curve fitting on complex data.

I would suggest first converting the complex data to a NumPy array, splitting it into its real and imaginary parts, and then using an lmfit Model whose model function returns the same concatenated (real, imaginary) layout. Perhaps something like this:
import numpy as np
import matplotlib.pyplot as plt
from lmfit import Model
cdata = np.array((0.00011342104914066835+8.448890220616275e-07j,
0.00011340386404065371+7.379293582429708e-07j,
0.0001133540327309949+6.389834505824625e-07j,
0.00011332170913939336+5.244566142401774e-07j,
0.00011331311156154074+4.3841061618015007e-07j,
0.00011329383047059048+3.6163513508002877e-07j,
0.00011328700094846502+3.0542249453666894e-07j,
0.00011327650033983806+2.548725558622188e-07j,
0.00011327702539337786+2.2508174567697671e-07j,
0.00011327342238146558+1.9607648998100523e-07j,
0.0001132710747364799+1.721721661949941e-07j,
0.00011326933241850936+1.5246061350710235e-07j,
0.00011326798040984542+1.3614817802178457e-07j,
0.00011326752037650585+1.233483784504962e-07j,
0.00011326758290166552+1.1258801448459512e-07j,
0.00011326813100914905+1.0284749122099354e-07j,
0.0001132684076390416+9.45791423595816e-08j,
0.00011326982474882009+8.733105218572698e-08j,
0.00011327158639135678+8.212191452217794e-08j,
0.00011327366823516856+7.747920115589205e-08j,
0.00011327694366034208+7.227069986108343e-08j,
0.00011327915327873038+6.819405851172907e-08j,
0.00011328181165961218+6.468392148750885e-08j,
0.00011328531688122571+6.151393311227958e-08j,
0.00011328857849500441+5.811704586613896e-08j,
0.00011329241716561626+5.596645863242474e-08j,
0.0001132970129528527+5.4722461511610696e-08j,
0.0001133002881788021+5.064523218904898e-08j,
0.00011330507671740223+5.0307457368330284e-08j,
0.00011331106068787993+4.7703959367963307e-08j,
0.00011331577350707601+4.634615394867111e-08j,
0.00011332064001939156+4.6914747648361504e-08j,
0.00011333034985824086+4.4992151257444304e-08j,
0.00011334188526870483+4.363662798446445e-08j,
0.00011335491299924776+4.364164366097129e-08j,
0.00011337451201475147+4.262881852644385e-08j,
0.00011339778209066752+4.275096587356569e-08j,
0.00011342832992628646+4.4463907608604945e-08j,
0.00011346526768580432+4.35706649329342e-08j,
0.00011351108008292451+4.4155812379491554e-08j,
0.00011356967192325835+4.327004709646922e-08j,
0.00011364164970635006+4.420660396556604e-08j,
0.00011373150199883139+4.3672898914161596e-08j,
0.00011384660942003356+4.326171366194325e-08j,
0.00011399193321804955+4.1493065523925126e-08j,
0.00011418043916260295+4.0762418512759096e-08j,
0.00011443271767970721+3.91359909722939e-08j,
0.00011479600563688605+3.845666332695652e-08j,
0.0001153652105925112+3.6224677316584614e-08j,
0.00011638635682516399+3.386843079212692e-08j,
0.00011836223959714231+3.6692295450490655e-08j))
fr = np.array((999.9999960000001, 794.328231, 630.957342,
501.18723099999994, 398.107168, 316.22776400000004,
251.188642, 199.52623, 158.489318, 125.89254, 99.999999,
79.432823, 63.095734, 50.118722999999996, 39.810717,
31.622776, 25.118864000000002, 19.952623000000003,
15.848932000000001, 12.589253999999999, 10.0,
7.943282000000001, 6.309573, 5.011872, 3.981072,
3.1622779999999997, 2.511886, 1.9952619999999999, 1.584893,
1.258925, 1.0, 0.7943279999999999, 0.630957,
0.5011869999999999, 0.398107, 0.316228, 0.251189, 0.199526,
0.15848900000000002, 0.125893, 0.1, 0.079433, 0.063096,
0.050119, 0.039811, 0.031623000000000005, 0.025119, 0.019953,
0.015849000000000002, 0.012589, 0.01))
data = np.concatenate((cdata.real, cdata.imag))
# model function for lmfit
def colecole_function(x, sig0, m, tau, c):
    w = x*2*np.pi
    denom = 1+(1j*w*tau)**c
    sig = sig0*(1.0+(m/(1.0-m))*(1-1.0/denom))
    return np.concatenate((sig.real, sig.imag))
mod = Model(colecole_function)
params = mod.make_params(sig0=0.002, m=-0.19, tau=0.05, c=0.8)
params['tau'].min = 0
result = mod.fit(data, params, x=fr)
print(result.fit_report())
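You would then want to plot the results with something like (semilogx, since the frequencies span five decades):
nf = len(fr)
plt.semilogx(fr, data[:nf], 'o', label='data(real)')
plt.semilogx(fr, result.best_fit[:nf], label='fit(real)')
plt.legend()
and similarly, in a separate figure because the imaginary part is much smaller than the real part:
plt.figure()
plt.semilogx(fr, data[nf:], 'o', label='data(imag)')
plt.semilogx(fr, result.best_fit[nf:], label='fit(imag)')
plt.legend()
plt.show()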
Note that I think you're going to want to allow m to be negative (or maybe I misunderstand your model). I did not work carefully on getting a great fit, but I think this should get you started.
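Since the question title also mentions scipy.optimize, the same idea (stack the real and imaginary misfits into one real residual vector) can be sketched with scipy.optimize.least_squares. This is a minimal sketch on synthetic data with made-up parameter values, not the measured data above; the bounds (m in [0, 0.99], c in [0, 1], a small positive lower bound on tau) are assumptions loosely reflecting the constraints in the question.

```python
import numpy as np
from scipy.optimize import least_squares

def cole_cole(fr, sig0, m, tau, c):
    # same Cole-Cole form as ColeCole / colecole_function above
    w = 2 * np.pi * fr
    return sig0 * (1.0 + (m / (1.0 - m)) * (1.0 - 1.0 / (1.0 + (1j * w * tau) ** c)))

def residuals(p, fr, cdata):
    # least_squares needs a real vector: stack real and imaginary misfits
    r = cole_cole(fr, *p) - cdata
    return np.concatenate((r.real, r.imag))

# Hypothetical synthetic data with known parameters (NOT the measured data above)
fr_demo = np.logspace(-2, 3, 51)
true_p = np.array([1.0, 0.2, 0.05, 0.7])
cdata_demo = cole_cole(fr_demo, *true_p)

p0 = [0.9, 0.15, 0.03, 0.8]
lower = [0.0, 0.0, 1e-6, 0.0]        # assumed constraints: sig0, m >= 0; tau > 0; c >= 0
upper = [np.inf, 0.99, np.inf, 1.0]  # assumed: m < 1 (m/(1-m) blows up at 1), c <= 1
fit = least_squares(residuals, p0, args=(fr_demo, cdata_demo), bounds=(lower, upper))
print(fit.x)
```

For the measured data, whose values are around 1e-4, passing x_scale='jac' to least_squares or dividing the residuals by abs(cdata) may help the solver; that has not been tested here. To follow the suggestion above about a negative m, the lower bound on m would have to be relaxed.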

Related

lmfit - SineModel+ConstantModel appears inaccurate fit

I'm trying to fit a simple sine function to some experimental data using lmfit, and I find that SineModel with a ConstantModel offset returns what looks (to me) like an inaccurate fit to the data. It may help to highlight that I am most interested in the frequency of the peaks (I appreciate that I could simply use scipy.signal.find_peaks(), but I would prefer to show a fit to the data).
I use the function below for the lmfit model:
import numpy as np
from lmfit.models import SineModel, ConstantModel
def Sine(self, x_axis, y_axis):
    sine = SineModel()
    const = ConstantModel()
    x_fit = np.linspace(min(x_axis), max(x_axis), x_axis.size)
    guess_sine = sine.guess(y_axis, x=x_fit)
    pars = sine.guess(y_axis, x=x_fit)
    sine_offset = SineModel() + ConstantModel()
    pars.add('c', value=1, vary=True)
    result = sine_offset.fit(y_axis, pars, x=x_fit)
    return result
The Sine function output (graph and report results) was provided as an image: SineModel+ConstModel
I then tried to define my own function, defining my own parameters and evaluating in the same lmfit method, providing sensible "guess" initial values etc.
import numpy as np
from lmfit import Model
def Sine_User2(self, x_axis, y_axis):
    def sine_func(x, amplitude, freq, shift, c):
        return amplitude * np.sin(freq * x + shift) + c
    sinemodel = Model(sine_func)
    # Take a FFT of the data to provide a guess starting location for the curve fitting
    x = np.array(x_axis)
    y = np.array(y_axis)
    ff = np.fft.fftfreq(len(x), (x[1] - x[0])) # assume uniform spacing
    Fyy = abs(np.fft.fft(y))
    guess_freq = abs(ff[np.argmax(Fyy[1:]) + 1]) * 2. * np.pi
    guess_amp = np.std(y) * 2.**0.5
    guess_offset = np.mean(y)
    x_fit = np.linspace(min(x_axis), max(x_axis), x_axis.size)
    params = sinemodel.make_params(amplitude = guess_amp, freq = guess_freq, shift = 0, c = guess_offset )
    result = sinemodel.fit(y_axis, params, x = x_fit)
    return result
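The FFT-based starting guess in Sine_User2 can be checked on its own before any fitting. The sketch below repeats that heuristic with NumPy only; fft_sine_guess is a hypothetical helper name, and the demo signal is synthetic, not the measured data.

```python
import numpy as np

def fft_sine_guess(x, y):
    """Starting values for y ~ amplitude * sin(freq * x + shift) + c.

    Same heuristic as in Sine_User2: the strongest non-zero FFT bin gives
    the frequency. Assumes uniformly spaced x. Returns
    (guess_amp, guess_freq, guess_offset), with guess_freq in rad per unit x.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    ff = np.fft.fftfreq(len(x), x[1] - x[0])  # cycles per unit of x
    Fyy = np.abs(np.fft.fft(y))
    # skip bin 0 (the mean, i.e. the offset c) when looking for the peak
    guess_freq = abs(ff[np.argmax(Fyy[1:]) + 1]) * 2.0 * np.pi
    # over whole periods a sine's standard deviation is amplitude / sqrt(2)
    guess_amp = np.std(y) * np.sqrt(2.0)
    guess_offset = np.mean(y)
    return guess_amp, guess_freq, guess_offset

# Hypothetical synthetic signal: 1.5 cycles per unit x, amplitude 3, offset 5
x_demo = np.linspace(0.0, 10.0, 1000, endpoint=False)
y_demo = 3.0 * np.sin(2 * np.pi * 1.5 * x_demo + 0.3) + 5.0
amp0, freq0, c0 = fft_sine_guess(x_demo, y_demo)
print(amp0, freq0, c0)
```

One thing worth keeping in mind for the question's data: the wavelengths are around 1550 nm while the fringe period, read off the listed values, is only about 0.05 nm, so freq * x is of order 10^5 rad. A small change in freq then moves the phase by many radians, which can leave shift poorly constrained; fitting against x - x[0] instead of x is one thing that may be worth trying (an untested suggestion, not a confirmed diagnosis).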
The user-defined model appears to provide a much closer fit to the data; however, the report does not provide uncertainties and instead shows the warning "Uncertainties could not be estimated".
The Sine_User2 function outputs (graph and report results) were provided as an image: User Defined Model
I then tried to add min/max bounds to the parameters by replacing the "sinemodel.make_params" line with:
params = Parameters()
params.add('amplitude', value=guess_amp, min = 0)
params.add('freq', value=guess_freq, min=0)
params.add('shift', value=0, min=-2*np.pi, max=2*np.pi)
params.add('c', value=guess_offset)
But the results revert to the SineModel+ConstModel results seen in the first linked graph/report, so I suspect it has something to do with the way I'm setting the initial values.
The fit using the "Sine_User2" function appears to be better. Is there a way to improve the fit of the "Sine" function in the first block of code?
Why are the uncertainties not calculated in the second function "Sine_User2"?
Data (.csv):
Wavelength (nm),Power (dBm)
1549.9,-13.76008731
1549.905,-13.69423162
1549.91,-12.59004339
1549.915,-11.31061848
1549.92,-10.58731809
1549.925,-10.19024329
1549.93,-10.07301418
1549.935,-10.19513172
1549.94,-10.45582159
1549.945,-11.15984161
1549.95,-12.15876596
1549.955,-13.44674933
1549.96,-13.56388277
1549.965,-12.2513065
1549.97,-11.08699015
1549.975,-10.43829185
1549.98,-10.12861158
1549.985,-10.0962929
1549.99,-10.1852173
1549.995,-10.55438183
1550,-11.19555345
1550.005,-12.28715299
1550.01,-13.5153863
1550.015,-13.47019261
1550.02,-12.12394732
1550.025,-11.01946751
1550.03,-10.42138778
1550.035,-10.14438079
1550.04,-10.05681218
1550.045,-10.17148605
1550.05,-10.56046759
1550.055,-11.11621478
1550.06,-12.19930263
1550.065,-13.48428349
1550.07,-13.43424913
1550.075,-12.08019952
1550.08,-11.08731704
1550.085,-10.45730899
1550.09,-10.11278169
1550.095,-10.00651194

Problem with curve_fit using a hyperbolic function (sinh) of a numerical integral, scipy, Python 3

I am attempting to fit a model to observational data. The code uses data in the range 0.5 to 1.0 for the independent variable, with scipy curve_fit and numerical integration. The function to be integrated also includes an unknown parameter, and the result of the integration is then passed to the hyperbolic function sinh.
After applying curve_fit I get the error message "loop of ufunc does not support argument 0 of type function which has no callable sinh method". Have I hit a dead end with Python 3? Hope not.
The evaluation code is:
#O_m, Hu are unknown parameters to be estimated with model, data
def integr(x,O_m):
    return intg.quad(lambda x: 1/x(np.sqrt((O_m/x) + (1-O_m))) , x, 1, args=(0.02))[0]
O_m = 0.02 #Guess for value of O_m, which shall lie between 0.01 and 1.0
def funcX(x,O_m):
    result = np.asarray([integr(xx,O_m) for xx in x]) * np.sqrt(abs(1-O_m))
    return result
litsped=299793 #the constant speed of light in a vacuum (km/s)
def funcY(x,Hu,O_m):
    return (litsped/(x * Hu * np.sqrt(abs(1-O_m))))*np.sinh(funcX)
init_guess = [65,0.02]
bnds=([50,0.001],[80,1.0])
params, pcov = curve_fit(funcY, xdata, ydata, p0 = init_guess, bounds = bnds, sigma = error, absolute_sigma = True)
ans_Hu, ans_O_m = params
perr = np.sqrt(np.diag(pcov))
##################################
Complete code below - as far as I have gotten with this curve_fit.
import numpy as np
import csv
import matplotlib.pylab as plt
from scipy.optimize import curve_fit
from scipy import integrate as intg
with open("Riess_1998_D_L.csv",'r') as i: #SNe Ia data file
    rawdata = list(csv.reader(i,delimiter=",")) #make a data list
exmdata = np.array(rawdata[1:],dtype=float) #convert to data array
xdata = exmdata[:,1]
ydata = exmdata[:,2]
error = exmdata[:,3]
#plot of imported data
plt.title("Observed SNe Ia Data")
plt.figure(1,dpi=120)
plt.xlabel("Expansion factor")
plt.ylabel("Distance (Mpc)")
plt.plot(xdata,ydata,label = "Observed SNe Ia data")
plt.xlim(0.5,1)
plt.ylim(0.0,9000)
plt.xscale("linear")
plt.yscale("linear")
plt.errorbar(xdata, ydata, yerr=error, fmt='.k', capsize = 4)
# O_m and Hu are the unknown parameters which shall be estimated using the model and observational data
def integr(x,O_m):
    return intg.quad(lambda x: 1/x(np.sqrt((O_m/x) + (1-O_m))) , x, 1, args=(0.02))[0]
O_m = 0.02 # Guess for value of O_m, which are between 0.01 and 1.0
def funcX(x,O_m):
    result = np.asarray([integr(xx,O_m) for xx in x])* np.sqrt(abs(1-O_m))
    return result
litsped=299793 #the constant speed of light in a vacuum (km/s)
def funcY(x,Hu,O_m):
    return (litsped/(x*Hu*np.sqrt(abs(1-O_m))))*np.sinh(funcX)
init_guess = [65,0.02]
bnds=([50,0.001],[80,1.0])
params, pcov = curve_fit(funcY, xdata, ydata, p0 = init_guess, bounds = bnds, sigma = error, absolute_sigma = True)
ans_b, ans_c = params
perr = np.sqrt(np.diag(pcov))
TotalInt = intg.trapezoid(ydata,xdata) #Compute numerical integral to check data import (trapz was removed in newer SciPy)
print("The total area is: ", TotalInt)
########################
Some more information would be useful, e.g. what are your xdata/ydata? Could you rewrite your code as a minimal reproducible example?
P.S. You can format code on Stack Overflow by writing ``` before and after it, for better readability ;)
Your problem has nothing to do with the fitting procedure. It was a bit hard for me to understand the code, but if I understood correctly, I recommend something like this:
import numpy as np
import csv
import matplotlib.pylab as plt
from scipy.optimize import curve_fit
from scipy import integrate as intg
exmdata = np.array(np.random.random((10,4)),dtype=float) #convert to data array
xdata = exmdata[:,1]
ydata = exmdata[:,2]
error = exmdata[:,3]
#plot of imported data
plt.title("Observed SNe Ia Data")
plt.figure(1,dpi=120)
plt.xlabel("Expansion factor")
plt.ylabel("Distance (Mpc)")
plt.plot(xdata,ydata,label = "Observed SNe Ia data")
plt.xlim(0.5,1)
plt.ylim(0.0,9000)
plt.xscale("linear")
plt.yscale("linear")
plt.errorbar(xdata, ydata, yerr=error, fmt='.k', capsize = 4)
# O_m and Hu are the unknown parameters which shall be estimated using the model and observational data
def integr(x,O_m):
    return 5*x+3*O_m  # some analytical form (placeholder)
O_m = 0.02 # Guess for value of O_m, which are between 0.01 and 1.0
def funcX(x,O_m):
    result = integr(x,O_m)* np.sqrt(abs(1-O_m))
    return result
litsped=299793 #the constant speed of light in a vacuum (km/s)
def funcY(x,Hu,O_m):
    return (litsped/(x*Hu*np.sqrt(abs(1-O_m))))*np.sinh(funcX(x,O_m))
init_guess = np.array([65,0.02])
bnds=([50,0.001],[80,1.0])
params, pcov = curve_fit(funcY, xdata, ydata, p0 = init_guess, bounds = bnds, sigma = error, absolute_sigma = True)
You still need to put an analytical form of the integral into integr and replace my random arrays with your CSV file data. The error you referred to earlier was indeed caused by passing the whole function to np.sinh instead of the function evaluated at some point. Please first try to implement these steps and make sure that you can call your three functions independently without errors. It is quite hard to search for bugs if you tackle the whole program at once, so try to make the individual parts work first ;). If you still need help after you have implemented these changes, say for the actual fitting procedure, just ask me again ;).
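If no closed form for the integral is at hand, the numerical version can also be kept. Below is a minimal sketch, assuming the integrand intended in the question is 1/(x*sqrt(O_m/x + 1 - O_m)) (the original 1/x(...) tries to call x as a function). It keeps scipy.integrate.quad, passes O_m through args as a one-element tuple, evaluates funcX at (x, O_m) before taking np.sinh, and fits synthetic data generated with made-up parameters, not the Riess 1998 file. The upper bound on O_m is set just below 1 because sqrt(abs(1 - O_m)) appears in a denominator.

```python
import numpy as np
from scipy import integrate as intg
from scipy.optimize import curve_fit

litsped = 299793  # speed of light in km/s, as in the question

def integrand(a, O_m):
    # assumed intended form: 1 / (a * sqrt(O_m/a + (1 - O_m)))
    return 1.0 / (a * np.sqrt(O_m / a + (1.0 - O_m)))

def integr(x, O_m):
    # integral from x to 1; O_m is forwarded to the integrand via a one-element tuple
    return intg.quad(integrand, x, 1.0, args=(O_m,))[0]

def funcX(x, O_m):
    # quad needs scalar limits, so integrate once per x value
    x = np.atleast_1d(x)
    return np.array([integr(xx, O_m) for xx in x]) * np.sqrt(abs(1 - O_m))

def funcY(x, Hu, O_m):
    # evaluate funcX(x, O_m) *before* np.sinh; np.sinh(funcX) hands over the
    # function object itself, which raises the ufunc error from the question
    return (litsped / (x * Hu * np.sqrt(abs(1 - O_m)))) * np.sinh(funcX(x, O_m))

# Synthetic check with made-up parameters (not the Riess 1998 data)
x_demo = np.linspace(0.5, 0.95, 15)
y_demo = funcY(x_demo, 70.0, 0.3)
popt, pcov = curve_fit(funcY, x_demo, y_demo, p0=[65, 0.2],
                       bounds=([50, 0.001], [80, 0.99]))
print(popt)
```

With real data, xdata, ydata and error from the CSV file would replace the synthetic arrays, together with the sigma=error and absolute_sigma=True arguments from the question.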

How to get the confidence interval of a Weibull distribution using Python?

I want to perform a probability Weibull fit with 95% confidence bounds in Python. As test data, I use cycles to failure from a measurement, plotted against the reliability R(t).
So far, I found a way to perform the Weibull fit; however, I still do not manage to get the confidence bounds. The Weibull plot with the same test data set was already made with Origin, therefore I know which shape I would "expect" for the confidence interval. But I do not understand how to get there.
I found information about Weibull confidence intervals on reliawiki (cf. Bounds on Reliability based on Fisher Matrix confidence bounds) and used the description there to calculate the variance and the upper and lower confidence bounds (R_U and R_L).
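For reference, the Fisher-matrix bounds on reliability for a two-parameter Weibull distribution are usually written as below. This is a summary of the standard first-order form, worth checking against the reliawiki page itself:

$$u = \beta\,(\ln t - \ln\eta), \qquad R(t) = \exp\left(-e^{u}\right)$$

$$\operatorname{Var}(u) = (\ln t - \ln\eta)^2\,\operatorname{Var}(\hat\beta) + \frac{\beta^2}{\eta^2}\,\operatorname{Var}(\hat\eta) - \frac{2\beta}{\eta}\,(\ln t - \ln\eta)\,\operatorname{Cov}(\hat\beta,\hat\eta)$$

$$u_{U,L} = u \pm K_\alpha\sqrt{\operatorname{Var}(u)}, \qquad R_U = \exp\left(-e^{u_L}\right), \quad R_L = \exp\left(-e^{u_U}\right)$$

The variance terms come from the partial derivatives ∂u/∂β = ln t − ln η and ∂u/∂η = −β/η, and K_α is a standard normal quantile (about 1.96 for two-sided 95% bounds). Compared with the variance and confidence functions in the code below, two differences may be worth checking: the covariance term there enters with the opposite sign, and the multiplier there is 0.95 rather than K_α. Also note that the OLS fit estimates ln η and 1/β, so its covariance matrix does not directly give Var(β̂), Var(η̂) and Cov(β̂, η̂).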
Here is a working code example for my Weibull fit and my confidence bounds with the test data set, based on the description on reliawiki (cf. Bounds on Reliability). For the fit, I used an OLS model.
import os, sys
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
from scipy.optimize import curve_fit
import math
import statsmodels.api as sm
def weibull_ticks(y, pos):
    return "{:.0f}%".format(100 * (1 - np.exp(-np.exp(y))))
def loglog(x):
    return np.log(-np.log(1 - np.asarray(x)))
class weibull_example(object):
    def __init__(self, dat):
        self.fits = {}
        dat.index = np.arange(1, len(dat) + 1)
        dat.sort_values('data', inplace=True)
        #define yaxis-values
        dat['percentile'] = dat.index*1/len(dat)
        self.data = dat
        self.fit()
        self.plot_data()
    def fit(self):
        #fit the data points with the OLS model
        self.data=self.data[:-1]
        x0 = np.log(self.data.dropna()['data'].values)
        Y = loglog(self.data.dropna()['percentile'])
        Yx = sm.add_constant(Y)
        model = sm.OLS(x0, Yx)
        results = model.fit()
        yy = loglog(np.linspace(.001, .999, 100))
        YY = sm.add_constant(yy)
        XX = np.exp(results.predict(YY))
        self.eta = np.exp(results.params[0])
        self.beta = 1 / results.params[1]
        self.fits['syx'] = {'results': results, 'model': model,
                            'line': np.vstack([XX, yy]),  # np.row_stack is deprecated
                            'beta': self.beta,
                            'eta': self.eta}
        cov = results.cov_params()
        #get variance and covariance
        self.beta_var = cov[1, 1]
        self.eta_var = cov[0, 0]
        self.cov = cov[1, 0]
    def plot_data(self, fit='yx'):
        dat = self.data
        #plot data points
        plt.semilogx(dat['data'], loglog(dat['percentile']), 'o')
        fit = 's' + fit
        self.plot_fit(fit)
        ax = plt.gca()
        formatter = mpl.ticker.FuncFormatter(weibull_ticks)
        ax.yaxis.set_major_formatter(formatter)
        yt_F = np.array([0.001, 0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5,
                         0.6, 0.7, 0.8, 0.9, 0.95, 0.99])
        yt_lnF = loglog(yt_F)
        plt.yticks(yt_lnF)
        plt.ylim(loglog([.01, .99]))
    def plot_fit(self, fit='syx'):
        dat = self.fits[fit]['line']
        plt.plot(dat[0], dat[1])
        #calculate variance to get confidence bound
        def variance(x):
            return (math.log(x) - math.log(self.eta)) ** 2 * self.beta_var + \
                   (self.beta/self.eta) ** 2 * self.eta_var - \
                   2 * (math.log(x) - math.log(self.eta)) * (-self.beta/self.eta) * self.cov
        #calculate confidence bounds
        def confidence_upper(x):
            return 1-np.exp(-np.exp(self.beta*(math.log(x)-math.log(self.eta)) - 0.95*np.sqrt(variance(x))))
        def confidence_lower(x):
            return 1-np.exp(-np.exp(self.beta*(math.log(x)-math.log(self.eta)) + 0.95*np.sqrt(variance(x))))
        yvals_1 = list(map(confidence_upper, dat[0]))
        yvals_2 = list(map(confidence_lower, dat[0]))
        #plot confidence bounds
        plt.semilogx(dat[0], loglog(yvals_1), linestyle="solid", color="black", linewidth=2,
                     label="fit_u_1", alpha=0.8)
        plt.semilogx(dat[0], loglog(yvals_2), linestyle="solid", color="green", linewidth=2,
                     label="fit_l_1", alpha=0.8)
def main():
    fig, ax1 = plt.subplots()
    ax1.set_xlabel(r"$Cycles\ til\ Failure$")
    ax1.set_ylabel(r"$Weibull\ Percentile$")
    #my data points
    data = pd.DataFrame({'data': [1556, 2595, 11531, 38079, 46046, 57357]})
    weibull_example(data)
    plt.savefig("Weibull.png")
    plt.close(fig)
if __name__ == "__main__":
    main()
The confidence bounds in my plot do not look as I expected. I tried a lot of different 'variances', just to understand the function and to check whether the problem is just a typing error. By now, I am convinced that the problem is more general and that I misunderstood something in the description on reliawiki. Unfortunately, I really do not see what the problem is, and I do not know anyone else I can ask. On the internet and on different forums, I did not find an appropriate answer.
That's why I decided to ask this question here. It's the first time I ask a question in a forum. Therefore, I hope that I explained everything sufficiently and that the code example is useful.
Thank you very much :)
Apologies for the very late answer, but I'll provide it for any future readers.
Rather than trying to implement this yourself, you may want to consider using reliability, a package designed for exactly this.
Here is the example for your use case.
Remember to upvote this answer if it helps you :)
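For readers who would rather stay with the question's OLS fit, the delta-method idea behind Fisher-matrix bounds can be sketched directly with NumPy. This is an illustrative sketch, not the method used by the reliability package or by reliawiki: it propagates the OLS covariance of (ln η, 1/β) to u = β(ln t − ln η) instead of using a maximum-likelihood Fisher matrix, and weibull_reliability_bounds is a hypothetical name.

```python
import numpy as np

def weibull_reliability_bounds(t, ln_eta, inv_beta, cov_ab, k=1.96):
    """Approximate confidence bounds on R(t) for a 2-parameter Weibull fit.

    ln_eta, inv_beta: intercept and slope of an OLS fit of ln(t) on
    ln(-ln(1 - F)), i.e. ln(eta) and 1/beta, as in the question's fit().
    cov_ab: 2x2 covariance of (ln_eta, inv_beta), e.g. results.cov_params().
    k: standard normal quantile (1.96 for two-sided 95 % bounds).
    Returns (R, R_lower, R_upper) as arrays.
    """
    t = np.atleast_1d(np.asarray(t, dtype=float))
    beta = 1.0 / inv_beta
    u = beta * (np.log(t) - ln_eta)  # R(t) = exp(-exp(u))
    # gradient of u with respect to (ln_eta, inv_beta), one column per t:
    #   du/d(ln_eta) = -beta,   du/d(inv_beta) = -beta * u
    g = np.vstack((np.full_like(u, -beta), -beta * u))
    cov = np.asarray(cov_ab, dtype=float)
    var_u = np.einsum('in,ij,jn->n', g, cov, g)  # first-order (delta method) variance
    half_width = k * np.sqrt(var_u)
    R = np.exp(-np.exp(u))
    R_lower = np.exp(-np.exp(u + half_width))  # larger u means lower reliability
    R_upper = np.exp(-np.exp(u - half_width))
    return R, R_lower, R_upper

# With the question's OLS results this would be called as (not run here):
# R, R_lo, R_hi = weibull_reliability_bounds(t, results.params[0],
#                                            results.params[1], results.cov_params())
```

Working in (ln η, 1/β) avoids converting the OLS variances to β and η first; to first order it gives the same Var(u) as doing that conversion and then applying the reliawiki formula.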

Fitting a quadratic function in python without numpy polyfit

I am trying to fit a quadratic function to some data, and I'm trying to do this without using numpy's polyfit function.
Mathematically, I tried to follow this website https://neutrium.net/mathematics/least-squares-fitting-of-a-polynomial/ but I don't think I'm doing it right. If anyone could assist me, that would be great, or if you could suggest another way to do it, that would also be awesome.
What I've tried so far:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
ones = np.ones(3)
A = np.array( ((0,1),(1,1),(2,1)))
xfeature = A.T[0]
squaredfeature = A.T[0] ** 2
b = np.array( (1,2,0), ndmin=2 ).T
b = b.reshape(3)
features = np.concatenate((np.vstack(ones), np.vstack(xfeature), np.vstack(squaredfeature)), axis = 1)
featuresc = features.copy()
print(features)
m_det = np.linalg.det(features)
print(m_det)
determinants = []
for i in range(3):
    featuresc.T[i] = b
    print(featuresc)
    det = np.linalg.det(featuresc)
    determinants.append(det)
    print(det)
    featuresc = features.copy()
determinants = determinants / m_det
print(determinants)
plt.scatter(A.T[0],b)
u = np.linspace(0,3,100)
plt.plot(u, u**2*determinants[2] + u*determinants[1] + determinants[0] )
p2 = np.polyfit(A.T[0],b,2)
plt.plot(u, np.polyval(p2,u), 'b--')
plt.show()
As you can see, my curve doesn't compare well to NumPy's polyfit curve.
Update:
I went through my code and removed all the stupid mistakes, and now it works when I fit over 3 points, but I have no idea how to fit over more than three points.
This is the new code:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
ones = np.ones(3)
A = np.array( ((0,1),(1,1),(2,1)))
xfeature = A.T[0]
squaredfeature = A.T[0] ** 2
b = np.array( (1,2,0), ndmin=2 ).T
b = b.reshape(3)
features = np.concatenate((np.vstack(ones), np.vstack(xfeature), np.vstack(squaredfeature)), axis = 1)
featuresc = features.copy()
print(features)
m_det = np.linalg.det(features)
print(m_det)
determinants = []
for i in range(3):
    featuresc.T[i] = b
    print(featuresc)
    det = np.linalg.det(featuresc)
    determinants.append(det)
    print(det)
    featuresc = features.copy()
determinants = determinants / m_det
print(determinants)
plt.scatter(A.T[0],b)
u = np.linspace(0,3,100)
plt.plot(u, u**2*determinants[2] + u*determinants[1] + determinants[0] )
p2 = np.polyfit(A.T[0],b,2)
plt.plot(u, np.polyval(p2,u), 'r--')
plt.show()
Instead of using Cramer's Rule, actually solve the system using least squares. Remember that Cramer's Rule only works when the number of points you have equals the desired order of the polynomial plus 1.
If that is not the case, Cramer's Rule will not work, because it looks for an exact solution of a square system. With more points the method is unsuitable, since you get an overdetermined system of equations.
To adapt this to more points, numpy.linalg.lstsq is a better fit: it solves Ax = b in the least-squares sense, i.e. it computes the vector x that minimizes the Euclidean norm of b - Ax. Therefore, build the features matrix from the x values only (do not substitute the y values into it as Cramer's Rule does) and use numpy.linalg.lstsq to solve for the coefficients:
import numpy as np
import matplotlib.pyplot as plt
ones = np.ones(4)
xfeature = np.asarray([0,1,2,3])
squaredfeature = xfeature ** 2
b = np.asarray([1,2,0,3])
features = np.concatenate((np.vstack(ones),np.vstack(xfeature),np.vstack(squaredfeature)), axis = 1) # Change - remove the y values
determinants = np.linalg.lstsq(features, b, rcond=None)[0] # Change - use least squares (these are the polynomial coefficients)
plt.scatter(xfeature,b)
u = np.linspace(0,3,100)
plt.plot(u, u**2*determinants[2] + u*determinants[1] + determinants[0] )
plt.show()
I now get a plot that matches the dashed curve in your graph, i.e. the curve that numpy.polyfit gives you.
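Least-squares polynomial fitting can also be written through the normal equations (XᵀX)a = Xᵀy, where X has the columns 1, x and x². The sketch below is an alternative to the answer's lstsq call, not the answer's method; lstsq is generally better conditioned numerically, so this is mainly illustrative. quadfit_normal_equations is a hypothetical name.

```python
import numpy as np

def quadfit_normal_equations(x, y):
    """Least-squares quadratic fit via the normal equations (X^T X) a = X^T y.

    Returns coefficients [a0, a1, a2] for y ~ a0 + a1*x + a2*x**2.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack((np.ones_like(x), x, x**2))  # design matrix: 1, x, x^2
    return np.linalg.solve(X.T @ X, X.T @ y)

# the four points used in the answer above
coeffs = quadfit_normal_equations([0, 1, 2, 3], [1, 2, 0, 3])
print(coeffs)
```

np.polyfit returns the coefficients highest power first, so its result reversed should match coeffs.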

python: setting width to fit parameters

I have been trying to fit a data file with unknown fit parameters "ga" and "MA". What I want to do is set a range within which the value of "MA" will reside and then fit the data; for example, I want the fitted value of MA in the range [0.5, 0.8] while keeping "ga" as a free fit parameter. I am not sure how to do it. Here is my Python code:
#!/usr/bin/env python3
# Fit the data in "data_file", each line of which contains the data for one point: x_i, y_i, sigma_i.
import sys
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import chi2
fname = sys.argv[1] if len(sys.argv) > 1 else 'data.txt'
x, y, err = np.loadtxt(fname, unpack = True)
n = len(x)
p0 = [-1,1]
f = lambda x, ga, MA: ga/((1+x/(MA*MA))*(1+x/(MA*MA)))
p, covm = curve_fit(f, x, y, p0, err)
ga, MA = p
chisq = np.sum(((f(x, ga, MA) - y)/err)**2)
ndf = n - len(p)
Q = 1. - chi2.cdf(chisq, ndf)
chisq = chisq / ndf
gaerr, MAerr = np.sqrt(np.diag(covm)/chisq) # correct the error bars
print('ga = %10.4f +/- %7.4f' % (ga, gaerr))
print('MA = %10.4f +/- %7.4f' % (MA, MAerr))
print('chi squared / NDF = %7.4f' % chisq)
print(covm)
You might consider using lmfit (https://lmfit.github.io/lmfit-py) for this problem. Lmfit provides a higher-level interface to optimization and curve fitting, including treating Parameters as python objects that have bounds.
Your script might be translated to use lmfit as
import sys
import numpy as np
from lmfit import Model
fname = sys.argv[1] if len(sys.argv) > 1 else 'data.txt'
x, y, err = np.loadtxt(fname, unpack = True)
# define the fitting model function, similar to your `f`:
def f(x, ga, ma):
return ga/((1+x/(ma*ma))*(1+x/(ma*ma)))
# turn this model function into a Model:
mymodel = Model(f)
# now create parameters for this model, giving initial values
# note that the parameters will be *named* from the arguments of your model function:
params = mymodel.make_params(ga=-1, ma=1)
# params is now an ordered dict with parameter names ('ga', 'ma') as keys.
# you can set min/max values for any parameter:
params['ma'].min = 0.5
params['ma'].max = 2.0
# you can fix the value to not be varied in the fit:
# params['ga'].vary = False
# you can also constrain it to be a simple mathematical expression of other parameters
# now do the fit to your `y` data with `params` and your `x` data
# note that you pass in weights for the residual, so 1/err:
result = mymodel.fit(y, params, x=x, weights=1./err)
# print out fit report with fit statistics and best fit values
# and uncertainties and correlations for variables:
print(result.fit_report())
You can get access to the best-fit parameters as result.params; the initial params will not be changed by the fit. There are also routines to plot the best-fit result and/or residual.
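The bounds can also be set directly in the question's curve_fit call, without switching libraries: SciPy's curve_fit accepts a bounds=(lower, upper) argument and then uses the 'trf' method instead of 'lm'. A minimal sketch on synthetic, noise-free data (the values -1.5 and 0.7 are made up, not taken from data.txt; p0 has to lie inside the bounds):

```python
import numpy as np
from scipy.optimize import curve_fit

def f(x, ga, MA):
    # same model as in the question
    return ga / ((1 + x / (MA * MA)) * (1 + x / (MA * MA)))

# Hypothetical synthetic data (not the question's data.txt)
x = np.linspace(0.0, 2.0, 30)
err = np.full_like(x, 0.01)
y = f(x, -1.5, 0.7)

# ga unbounded, MA restricted to [0.5, 0.8]
p, covm = curve_fit(f, x, y, p0=[-1.0, 0.6], sigma=err,
                    bounds=([-np.inf, 0.5], [np.inf, 0.8]))
print(p)
```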
