I am a little out of my depth in terms of the math involved in my problem, so I apologise for any incorrect nomenclature.
I was looking at using the scipy function leastsq, but am not sure if it is the correct function.
I have the following equation:
eq = lambda PLP,p0,l0,kd : 0.5*(-1-((p0+l0)/kd) + np.sqrt(4*(l0/kd)+(((l0-p0)/kd)-1)**2))
I have data (8 sets) for all the terms except for kd (PLP,p0,l0). I need to find the value of kd by non-linear regression of the above equation.
From the examples I have read, leastsq seems to not allow for the inputting of the data, to get the output I need.
Thank you for your help
This is a bare-bones example of how to use scipy.optimize.leastsq:
import numpy as np
import scipy.optimize as optimize
import matplotlib.pylab as plt
def func(kd,p0,l0):
    return 0.5*(-1-((p0+l0)/kd) + np.sqrt(4*(l0/kd)+(((l0-p0)/kd)-1)**2))
leastsq minimizes the sum of the squares of the residuals, so we give it a function of kd that returns the residuals:
def residuals(kd,p0,l0,PLP):
    return PLP - func(kd,p0,l0)
Here I generate some random data. You'd want to load your real data here instead.
N=1000
kd_guess=3.5 # <-- You have to supply a guess for kd
p0 = np.linspace(0,10,N)
l0 = np.linspace(0,10,N)
PLP = func(kd_guess,p0,l0)+(np.random.random(N)-0.5)*0.1
kd,cov,infodict,mesg,ier = optimize.leastsq(
    residuals,kd_guess,args=(p0,l0,PLP),full_output=True)
print(kd)
yields something like
3.49914274899
This is the best fit value for kd found by optimize.leastsq.
Here we generate the value of PLP using the value for kd we just found:
PLP_fit=func(kd,p0,l0)
Below is a plot of PLP versus p0. The blue line is from data, the red line is the best fit curve.
plt.plot(p0,PLP,'-b',p0,PLP_fit,'-r')
plt.show()
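If you also want an uncertainty for kd, scipy.optimize.curve_fit solves the same least-squares problem and additionally returns a covariance estimate. The following is only a minimal sketch on synthetic data, not on the asker's measurements; the names binding_model and true_kd, the noise level, and the starting guess are assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

def binding_model(X, kd):
    # X stacks the two independent variables row-wise: X[0] = p0, X[1] = l0
    p0, l0 = X
    return 0.5*(-1 - (p0 + l0)/kd + np.sqrt(4*(l0/kd) + ((l0 - p0)/kd - 1)**2))

rng = np.random.default_rng(0)      # fixed seed so the run is repeatable
true_kd = 3.5                       # assumed value, used only to fake the data
p0 = np.linspace(0, 10, 200)
l0 = np.linspace(0, 10, 200)
X = np.vstack((p0, l0))
PLP = binding_model(X, true_kd) + rng.uniform(-0.05, 0.05, p0.size)

# the bounds keep kd positive so the division by kd stays well defined
popt, pcov = curve_fit(binding_model, X, PLP, p0=[2.0], bounds=(1e-6, np.inf))
kd_fit = popt[0]
kd_err = np.sqrt(pcov[0, 0])        # one-sigma standard error of kd
print("kd = {:.3f} +- {:.3f}".format(kd_fit, kd_err))
```

With real data you would replace the synthetic p0, l0 and PLP arrays with your eight measured sets.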
Another option is to use lmfit.
Its documentation provides a good example to get you started:
#!/usr/bin/env python
#<examples/doc_basic.py>
from lmfit import minimize, Minimizer, Parameters, Parameter, report_fit
import numpy as np
# create data to be fitted
x = np.linspace(0, 15, 301)
data = (5. * np.sin(2 * x - 0.1) * np.exp(-x*x*0.025) +
        np.random.normal(size=len(x), scale=0.2) )
# define objective function: returns the array to be minimized
def fcn2min(params, x, data):
    """ model decaying sine wave, subtract data"""
    amp = params['amp']
    shift = params['shift']
    omega = params['omega']
    decay = params['decay']
    model = amp * np.sin(x * omega + shift) * np.exp(-x*x*decay)
    return model - data
# create a set of Parameters
params = Parameters()
params.add('amp', value= 10, min=0)
params.add('decay', value= 0.1)
params.add('shift', value= 0.0, min=-np.pi/2., max=np.pi/2)
params.add('omega', value= 3.0)
# do fit, here with leastsq model
minner = Minimizer(fcn2min, params, fcn_args=(x, data))
kws = {'options': {'maxiter':10}}
result = minner.minimize()
# calculate final result
final = data + result.residual
# write error report
report_fit(result)
# try to plot results
try:
    import pylab
    pylab.plot(x, data, 'k+')
    pylab.plot(x, final, 'r')
    pylab.show()
except ImportError:
    pass
#<end of examples/doc_basic.py>
Related
How to find the covariance matrix of a fitted model in Python (the equivalent of the R function vcov())
lmfit <- lm(formula = Y ~ X, data=Data_df)
lmpred <- predict(lmfit, newdata=Data_df, se.fit=TRUE, interval = "prediction")
std_er <- sqrt(((X0) %*% vcov(lmfit)) %*% t(X0))
I am trying to convert the above code to Python, for which I need the covariance matrix of the fitted model, i.e. vcov.
I can't use np.cov(), because I need the covariance matrix of the model parameters, not of the data.
I have already tried statsmodels.regression.linear_model.OLSResults.cov_params(), but I am not getting the same values as in R.
The scipy ODR code can independently calculate the parameter covariance matrix, here is an example extracted from the source code of my zunzun.com online curve fitter:
from scipy.optimize import curve_fit
import numpy as np
import scipy.odr
import scipy.stats
x = np.array([5.357, 5.797, 5.936, 6.161, 6.697, 6.731, 6.775, 8.442, 9.861])
y = np.array([0.376, 0.874, 1.049, 1.327, 2.054, 2.077, 2.138, 4.744, 7.104])
def f(x,b0,b1):
    return b0 + (b1 * x)
def f_wrapper_for_odr(beta, x): # parameter order for odr
    return f(x, *beta)
parameters, cov= curve_fit(f, x, y)
model = scipy.odr.Model(f_wrapper_for_odr)
data = scipy.odr.Data(x,y)
myodr = scipy.odr.ODR(data, model, beta0=parameters, maxit=0)
myodr.set_job(fit_type=2)
parameterStatistics = myodr.run()
df_e = len(x) - len(parameters) # degrees of freedom, error
cov_beta = parameterStatistics.cov_beta # parameter covariance matrix from ODR
var_beta = parameterStatistics.sd_beta * parameterStatistics.sd_beta # parameter variances
t_df = scipy.stats.t.ppf(0.975, df_e)
ci = [] # 95% confidence interval for each parameter
for i in range(len(parameters)):
    ci.append([parameters[i] - t_df * parameterStatistics.sd_beta[i], parameters[i] + t_df * parameterStatistics.sd_beta[i]])
tstat_beta = parameters / parameterStatistics.sd_beta # coeff t-statistics
pstat_beta = (1.0 - scipy.stats.t.cdf(np.abs(tstat_beta), df_e)) * 2.0 # coef. p-values
for i in range(len(parameters)):
    print('parameter:', parameters[i])
    print(' conf interval:', ci[i][0], ci[i][1])
    print(' tstat:', tstat_beta[i])
    print(' pstat:', pstat_beta[i])
    print()
print('Covariance matrix:')
print(cov_beta)
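For an ordinary linear model, R's vcov() returns s^2 (X^T X)^-1, where X is the design matrix including the intercept column and s^2 is the residual sum of squares divided by the residual degrees of freedom. The sketch below computes this directly with NumPy on the x, y data above, then the standard error of a fitted value as in the R snippet from the question. The names X, vcov and the new point x0 are illustrative assumptions, not taken from the question.

```python
import numpy as np

x = np.array([5.357, 5.797, 5.936, 6.161, 6.697, 6.731, 6.775, 8.442, 9.861])
y = np.array([0.376, 0.874, 1.049, 1.327, 2.054, 2.077, 2.138, 4.744, 7.104])

# design matrix with an explicit intercept column, as lm(Y ~ X) builds it
X = np.column_stack((np.ones_like(x), x))
beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # OLS coefficients
resid = y - X @ beta
dof = len(y) - X.shape[1]                      # residual degrees of freedom
s2 = resid @ resid / dof                       # residual variance estimate
vcov = s2 * np.linalg.inv(X.T @ X)             # parameter covariance matrix

# standard error of the fitted mean at a hypothetical new point x = 7.0
x0 = np.array([[1.0, 7.0]])
std_er = np.sqrt(x0 @ vcov @ x0.T)
print(vcov)
print(std_er)
```

One common cause of a mismatch with statsmodels is a missing intercept column: sm.OLS does not add one automatically when it is given plain arrays, whereas R's formula interface does.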
Please provide specific details on what you're using.
Assuming you're using numpy arrays for your data, there's numpy.cov estimator
This works for when vcov() returns a 1x1 dataframe. I solved my function in Python using:
fit = scipy.optimize.minimize(fun, x0=x, method = 'L-BFGS-B')
Then, I specified the hessian inverse return value as follows:
vcov = fit['hess_inv'].todense().ravel()
This gave me the same result ~(±1e-3) as stats4::vcov() in R for scenarios where vcov() returns a 1x1 data frame.
I am trying to fit below mentioned two equations using python leastsq method but am not sure whether this is the right approach. First equation has incomplete gamma function in it while the second one is slightly complex, and along with an exponential function contains a term which is obtained by using a separate fitting formula.
J_mg = T_incomplete(hw/T_mag)
J_nmg = e^(-hw/T)*g(w,T)
Here g is a function of w and T and is calculated using a given fitting formula.
I am following the steps outlined in this question.
Here is what I have done
import numpy as np
from scipy.optimize import leastsq
from scipy.special import gammaincc
from scipy.special import gamma
from matplotlib.pyplot import plot
# generating data
NPTS = 10
hw = np.linspace(0.5, 10, NPTS)
j1 = np.linspace(0.001,10,NPTS)
j2 = np.linspace(0.003,10,NPTS)
T_mag = np.linspace(0.3,0.5,NPTS)
#defining functions
def calc_gaunt_factor(hw,T):
    fitting_coeff= np.loadtxt('fitting_coeff.txt', skiprows=1)
    #T is in KeV
    #K_b = 8.6173303(50)e−5 ev/K
    g = 0
    gamma = 0.0136/T
    theta= hw/T
    A= (np.log10(gamma**2) +0.5)*0.4
    B= (np.log10(theta)+1.5)*0.4
    for i in range(11):
        for j in range(11):
            g_ij = fitting_coeff[i][j]*(A**i)*(B**j)
            g = g_ij+g
    return g
def j_w_mag(hw,T_mag):
    order= 0.001
    return np.sqrt(1/T_mag)*gamma(order)*gammaincc(order,hw/T_mag)
def j_w_nonmag(hw,T):
    gamma = 0.0136/T
    theta= hw/T
    return np.sqrt(1/T)*np.exp((-hw)/T)*calc_gaunt_factor(hw,T)
def residual_func(T,T_mag,hw,j1,j2):
    err_unmag = np.nan_to_num(j1 - j_w_nonmag(hw,T))
    err_mag = np.nan_to_num(j2 - j_w_mag(hw,T_mag))
    err= np.concatenate((err_unmag, err_mag))
    return err
par_init = np.array([.35])
best, cov, info, message, ler = leastsq(residual_func,par_init,args=(T_mag,hw,j1,j2),full_output=True)
print("Best-Fit Parameters:")
print("T=%s" %(best[0]))
I am getting weird value for my fitting parameter, T. Is this the right approach? Thanks.
I have some data and want to fit a given psychometric function p.
I'm interested in the fit parameters and their errors as well. With the 'classical' method using the curve_fit function from the scipy package it's easy to get the parameters of p and the errors. However, I want to do the same using maximum likelihood estimation (MLE). From the output and the figure you can see that the two methods give slightly different parameters. Implementing the MLE is not the problem, but I don't know how to get the errors using this method. Is there an easy way to get them? My likelihood function L is:
I was not able to adapt the code described here http://rlhick.people.wm.edu/posts/estimating-custom-mle.html but this is probably a solution. How can I implement this? Or is there any other way?
A similar function is fitted here using scipy stats models: https://stats.stackexchange.com/questions/66199/maximum-likelihood-curve-model-fitting-in-python. However, the errors of the parameters are not calculated there either.
The negative log-likelihood function is correct, since it offers the right parameters, but I was wondering if this function depends on y-data? The negative log likelihood function l is obviously l = -ln(L).
Here is my code:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
## library
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from scipy.optimize import minimize
def p(x,x50,s50):
    """return y value of psychometric function p"""
    return 1./(1+np.exp(4.*s50*(x50-x)))
def initialparams(x,y):
    """return initial fit parameters for function p with given dataset"""
    midpoint = np.mean(x)
    slope = (np.max(y)-np.min(y))/(np.max(x)-np.min(x))
    return [midpoint, slope]
def cfit_error(pcov):
    """return errors of fit from covariance matrix"""
    return np.sqrt(np.diag(pcov))
def neg_loglike(params):
    """analytical negative log likelihood function. This function depends on the dataset (x and y) and the two parameters x50 and s50."""
    x50 = params[0]
    s50 = params[1]
    n = len(xdata)
    prod = 1.
    for i in range(n):
        #print prod
        prod *= p(xdata[i],x50,s50)**(ydata[i]*5) * (1-p(xdata[i],x50,s50))**((1.-ydata[i])*5)
    return -np.log(prod)
xdata = [0.,-7.5,-9.,-13.500001,-12.436171,-16.208617,-13.533123,-12.998025,-13.377527,-12.570075,-13.320075,-13.070075,-11.820075,-12.070075,-12.820075,-13.070075,-12.320075,-12.570075,-11.320075,-12.070075]
ydata = [1.,0.6,0.8,0.4,1.,0.,0.4,0.6,0.2,0.8,0.4,0.,0.6,0.8,0.6,0.2,0.6,0.,0.8,0.6]
intparams = initialparams(xdata, ydata)## guess some initial parameters
## normal curve fit using least squares algorithm
popt, pcov = curve_fit(p, xdata, ydata, p0=intparams)
print('scipy.optimize.curve_fit:')
print('x50 = {:f} +- {:f}'.format(popt[0], cfit_error(pcov)[0]))
print('s50 = {:f} +- {:f}\n'.format(popt[1], cfit_error(pcov)[1]))
## fitting using maximum likelihood estimation
results = minimize(neg_loglike, initialparams(xdata,ydata), method='Nelder-Mead')
print('MLE with self defined likelihood-function:')
print('x50 = {:f}'.format(results.x[0]))
print('s50 = {:f}'.format(results.x[1]))
#print results
## ploting the data and results
xfit = np.arange(-20,1,0.1)
fig = plt.figure()
ax = fig.add_subplot(1,1,1)
ax.plot(xdata, ydata, 'xb', label='measured data')
ax.plot(xfit, p(xfit, *popt), '-r', label='curve fit')
ax.plot(xfit, p(xfit, *results.x), '-g', label='MLE')
plt.legend()
plt.show()
The output is:
scipy.optimize.curve_fit:
x50 = -12.681586 +- 0.252561
s50 = 0.264371 +- 0.117911
MLE with self defined likelihood-function:
x50 = -12.406544
s50 = 0.107389
Both fits and measured data can be seen here:
My Python version is 2.7 on Debian Stretch. Thank you for your help.
Finally the method described by Rob Hicks (http://rlhick.people.wm.edu/posts/estimating-custom-mle.html) worked out. After installing numdifftools, I could calculate the errors of estimated parameters from the hessian matrix.
Installing numdifftools on Linux with su rights:
apt-get install python-pip
pip install numdifftools
A complete code example of my program from above is here:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
## library
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from scipy.optimize import minimize
import numdifftools as ndt
def p(x,x50,s50):
    """return y value of psychometric function p"""
    return 1./(1+np.exp(4.*s50*(x50-x)))
def initialparams(x,y):
    """return initial fit parameters for function p with given dataset"""
    midpoint = np.mean(x)
    slope = (np.max(y)-np.min(y))/(np.max(x)-np.min(x))
    return [midpoint, slope]
def cfit_error(pcov):
    """return errors of fit from covariance matrix"""
    return np.sqrt(np.diag(pcov))
def neg_loglike(params):
    """analytical negative log likelihood function. This function depends on the dataset (x and y) and the two parameters x50 and s50."""
    x50 = params[0]
    s50 = params[1]
    n = len(xdata)
    prod = 1.
    for i in range(n):
        #print prod
        prod *= p(xdata[i],x50,s50)**(ydata[i]*5) * (1-p(xdata[i],x50,s50))**((1.-ydata[i])*5)
    return -np.log(prod)
xdata = [0.,-7.5,-9.,-13.500001,-12.436171,-16.208617,-13.533123,-12.998025,-13.377527,-12.570075,-13.320075,-13.070075,-11.820075,-12.070075,-12.820075,-13.070075,-12.320075,-12.570075,-11.320075,-12.070075]
ydata = [1.,0.6,0.8,0.4,1.,0.,0.4,0.6,0.2,0.8,0.4,0.,0.6,0.8,0.6,0.2,0.6,0.,0.8,0.6]
intparams = initialparams(xdata, ydata)## guess some initial parameters
## normal curve fit using least squares algorithm
popt, pcov = curve_fit(p, xdata, ydata, p0=intparams)
print('scipy.optimize.curve_fit:')
print('x50 = {:f} +- {:f}'.format(popt[0], cfit_error(pcov)[0]))
print('s50 = {:f} +- {:f}\n'.format(popt[1], cfit_error(pcov)[1]))
## fitting using maximum likelihood estimation
results = minimize(neg_loglike, initialparams(xdata,ydata), method='Nelder-Mead')
## calculating errors from hessian matrix using numdifftools
Hfun = ndt.Hessian(neg_loglike, full_output=True)
hessian_ndt, info = Hfun(results.x)
se = np.sqrt(np.diag(np.linalg.inv(hessian_ndt)))
print('MLE with self defined likelihood-function:')
print('x50 = {:f} +- {:f}'.format(results.x[0], se[0]))
print('s50 = {:f} +- {:f}'.format(results.x[1], se[1]))
Generates the following output:
scipy.optimize.curve_fit:
x50 = -18.702375 +- 1.246728
s50 = 0.063620 +- 0.041207
MLE with self defined likelihood-function:
x50 = -18.572181 +- 0.779847
s50 = 0.078935 +- 0.028783
However, some RuntimeErrors occur when calculating the Hessian matrix with numdifftools: there is a division by zero somewhere. This may be caused by my self-defined neg_loglike function. In the end there are still some results for the errors. The method using "Extending Statsmodels" is probably more elegant, but I couldn't figure it out.
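One possible way to sidestep the division-by-zero problem is to sum the logarithms instead of taking the logarithm of a product, and to clip the probabilities away from exactly 0 and 1. The sketch below does that and estimates the Hessian with plain central differences, so numdifftools is not required. The helper numerical_hessian, the clipping threshold and the step size are assumptions made for this illustration, not part of the original program.

```python
import numpy as np
from scipy.optimize import minimize

xdata = np.array([0., -7.5, -9., -13.500001, -12.436171, -16.208617, -13.533123,
                  -12.998025, -13.377527, -12.570075, -13.320075, -13.070075,
                  -11.820075, -12.070075, -12.820075, -13.070075, -12.320075,
                  -12.570075, -11.320075, -12.070075])
ydata = np.array([1., 0.6, 0.8, 0.4, 1., 0., 0.4, 0.6, 0.2, 0.8,
                  0.4, 0., 0.6, 0.8, 0.6, 0.2, 0.6, 0., 0.8, 0.6])
n_trials = 5   # the factor 5 that appears in the exponents of the original code

def p(x, x50, s50):
    return 1./(1 + np.exp(4.*s50*(x50 - x)))

def neg_loglike(params):
    x50, s50 = params
    # clip so that log(0) never occurs when p saturates at 0 or 1
    prob = np.clip(p(xdata, x50, s50), 1e-12, 1 - 1e-12)
    return -np.sum(n_trials*(ydata*np.log(prob) + (1 - ydata)*np.log(1 - prob)))

def numerical_hessian(f, x, rel_step=1e-4):
    """Central-difference Hessian of the scalar function f at x (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    h = rel_step*np.maximum(1.0, np.abs(x))
    n = x.size
    H = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n)
            ej = np.zeros(n)
            ei[i] = h[i]
            ej[j] = h[j]
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej))/(4*h[i]*h[j])
    return H

start = [np.mean(xdata), (ydata.max() - ydata.min())/(xdata.max() - xdata.min())]
res = minimize(neg_loglike, start, method='Nelder-Mead')
hess = numerical_hessian(neg_loglike, res.x)
# standard errors: square roots of the diagonal of the inverse Hessian
se = np.sqrt(np.diag(np.linalg.inv(hess)))
print('x50 = {:f} +- {:f}'.format(res.x[0], se[0]))
print('s50 = {:f} +- {:f}'.format(res.x[1], se[1]))
```

The standard errors come from the inverse of the Hessian of the negative log-likelihood at the optimum, which is the same idea as in the numdifftools version above.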
My goal is to create a dataset of random points whose histogram looks like an exponential decay function and then plot an exponential decay function through those points.
First I tried to create a series of random numbers (but did not do so successfully since these should be points, not numbers) from an exponential distribution.
from pylab import *
from scipy.optimize import curve_fit
import random
import numpy as np
import pandas as pd
testx = pd.DataFrame(range(10)).astype(float)
testx = testx[0]
data = {}
for i in range(1,11):
    x = random.expovariate(15) # rate = 15 arrivals per second
    data[i] = [x]
testy = pd.DataFrame(data).T.astype(float)
testy = testy[0]; testy
plot(testx, testy, 'ko')
The result could look something like this.
And then I define a function to draw a line through my points:
def func(x, a, e):
    return a*np.exp(-a*x)+e
popt, pcov = curve_fit(f=func, xdata=testx, ydata=testy, p0 = None, sigma = None)
print(popt) # parameters
print(pcov) # covariance
plot(testx, testy, 'ko')
xx = np.linspace(0, 15, 1000)
plot(xx, func(xx,*popt))
plt.show()
What I'm looking for is: (1) a more elegant way to create an array of random numbers from an exponential (decay) distribution and (2) how to test that my function is indeed going through the data points.
I would guess that the following is close to what you want. You can generate some random numbers drawn from an exponential distribution with numpy,
data = numpy.random.exponential(5, size=1000)
You can then create a histogram of them using numpy.histogram and plot the histogram values. You may take the middle of each bin as the position of the point (this is an approximation, but it improves as you use more bins).
Fitting works as in the code from the question. You will find that the fit roughly recovers the parameter used for the data generation (approximately 5 in the example below).
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
data = np.random.exponential(5, size=1000)
hist,edges = np.histogram(data,bins="auto",density=True )
x = edges[:-1]+np.diff(edges)/2.
plt.scatter(x,hist)
func = lambda x,beta: 1./beta*np.exp(-x/beta)
popt, pcov = curve_fit(f=func, xdata=x, ydata=hist)
print(popt)
xx = np.linspace(0, x.max(), 101)
plt.plot(xx, func(xx,*popt), ls="--", color="k",
         label=r"fit, $\beta = ${}".format(popt))
plt.legend()
plt.show()
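If the raw samples are available, the binning step can be avoided altogether by estimating the scale by maximum likelihood. A minimal sketch using scipy.stats.expon.fit follows; the seed and sample size are arbitrary choices for the illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)           # fixed seed, arbitrary choice
data = rng.exponential(5, size=1000)     # scale (mean) 5, as in the example above

# floc=0 pins the location at zero so that only the scale is estimated
loc, scale = stats.expon.fit(data, floc=0)
print(scale)         # maximum likelihood estimate of the scale
print(data.mean())   # with loc fixed at 0 this estimate is the sample mean
```

Unlike the histogram fit, this estimate does not depend on the choice of bins.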
I think you are actually asking about a regression problem, which is what Praveen was suggesting.
You have a bog standard exponential decay that arrives at the y-axis at about y=0.27. Its equation is therefore y = 0.27*exp(-0.27*x). I can model gaussian error around the values of this function and plot the result using the following code.
import matplotlib.pyplot as plt
from math import exp
from scipy.stats import norm
x = range(0, 16)
Y = [0.27*exp(-0.27*_) for _ in x]
error = norm.rvs(0, scale=0.05, size=9)
simulated_data = [max(0, y+e) for (y,e) in zip(Y[:9],error)]
plt.plot(x, Y, 'b-')
plt.plot(x[:9], simulated_data, 'r.')
plt.show()
print (x[:9])
print (simulated_data)
Here's the plot. Notice that I save the output values for subsequent use.
Now I can calculate the nonlinear regression of the exponential decay values, contaminated with noise, on the independent variable, which is what curve_fit does.
from math import exp
from scipy.optimize import curve_fit
import numpy as np
def model(x, p):
    return p*np.exp(-p*x)
x = list(range(9))
Y = [0.22219001972988275, 0.15537454187341937, 0.15864069451825827, 0.056411162886672819, 0.037398831058143338, 0.10278251869912845, 0.03984605649260467, 0.0035360087611421981, 0.075855255999424692]
popt, pcov = curve_fit(model, x, Y)
print (popt[0])
print (pcov)
The bonus is that, not only does curve_fit calculate an estimate for the parameter — 0.207962159793 — it also offers an estimate for this estimate's variance — 0.00086071 — as an element of pcov. This would appear to be a fairly small value, given the small sample size.
Here's how to calculate the residuals. Notice that each residual is the difference between the data value and the value estimated from x using the parameter estimate.
residuals = [y-model(_, popt[0]) for (y, _) in zip(Y, x)]
print (residuals)
If you wanted to further 'test that my function is indeed going through the data points' then I would suggest looking for patterns in the residuals. But discussions like this might be beyond what's welcomed on stackoverflow: Q-Q and P-P plots, plots of residuals vs y or x, and so on.
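As a starting point for such checks, here is a small sketch that summarises the residuals numerically: the residual standard error, the correlation of the residuals with x (a visible trend would hint at a misfit), and a Shapiro-Wilk normality test. The starting guess p0=[0.3] and the choice of these particular diagnostics are my assumptions; with only nine points none of them is conclusive.

```python
import numpy as np
from scipy import stats
from scipy.optimize import curve_fit

def model(x, p):
    return p*np.exp(-p*x)

x = np.arange(9, dtype=float)
Y = np.array([0.22219001972988275, 0.15537454187341937, 0.15864069451825827,
              0.056411162886672819, 0.037398831058143338, 0.10278251869912845,
              0.03984605649260467, 0.0035360087611421981, 0.075855255999424692])

popt, pcov = curve_fit(model, x, Y, p0=[0.3])   # p0 is an assumed starting guess
residuals = Y - model(x, popt[0])

rss = np.sum(residuals**2)                            # residual sum of squares
resid_sd = np.sqrt(rss/(len(Y) - len(popt)))          # residual standard error
corr_with_x = np.corrcoef(x, residuals)[0, 1]         # trend of residuals with x
shapiro_stat, shapiro_p = stats.shapiro(residuals)    # rough normality check

print('residual standard error:', resid_sd)
print('correlation of residuals with x:', corr_with_x)
print('Shapiro-Wilk p-value:', shapiro_p)
```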
I agree with the solution of @ImportanceOfBeingErnest, but I'd like to add a (well known?) general solution for distributions. If you have a density function f with integral F (i.e. f = dF / dx), then you get the required distribution by mapping uniform random numbers through inv F, i.e. the inverse function of the integral. In the case of the exponential function, the integral is, again, an exponential and the inverse is the logarithm. So it can be done like this:
import matplotlib.pyplot as plt
import numpy as np
from random import random
def gen( a ):
    y=random()
    return( -np.log( y ) / a )
def dist_func( x, a ):
    return( a * np.exp( -a * x) )
data = [ gen(3.14) for x in range(20000) ]
fig = plt.figure()
ax = fig.add_subplot( 1, 1, 1 )
ax.hist(data, bins=80, density=True, histtype="step")
ax.plot(np.linspace(0,5,150), dist_func( np.linspace(0,5,150), 3.14 ) )
plt.show()
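The same idea can be written in vectorised form. For the exponential density a*exp(-a*x) the integral is F(x) = 1 - exp(-a*x), so the inverse is x = -ln(1 - u)/a for a uniform u. The sketch below uses 1 - u rather than u because NumPy's uniform generator can return exactly 0, which would give log(0). The function name sample_exponential and the seed are assumptions for this illustration.

```python
import numpy as np

def sample_exponential(rate, size, rng):
    # rng.random() returns values in [0, 1); using 1 - u keeps the argument
    # of the logarithm in (0, 1] and so avoids log(0)
    u = rng.random(size)
    return -np.log(1.0 - u)/rate

rng = np.random.default_rng(42)
samples = sample_exponential(3.14, 20000, rng)
print(samples.mean())   # should be close to 1/rate
```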
I'm using Python to fit a time series with a sinusoidal function. I found quite a good match and now I want to be able to predict future values. I'm at a loss here.
Here's what I've got:
timeSeries = [0.01146, 0.00724, 0.00460, 0.00192, 0.00145, 0.01559, 0.02585, 0.04118, 0.05073, 0.01966, 0.01486, 0.02784]
import numpy as np
from scipy.optimize import curve_fit
def createSinFromFit(x, freq, amplitude, phase, offset):
    return np.sin(x * freq + phase) * amplitude + offset
def sinRegr(series):
    t = np.linspace(0, 4*np.pi, len(series))
    guess_freq = 1
    guess_amplitude = 3*np.std(series)/(2**0.5)
    guess_phase = 0
    guess_offset = np.mean(series)
    p0=[guess_freq, guess_amplitude, guess_phase, guess_offset]
    fit = curve_fit(createSinFromFit, t, series, p0=p0)
    results = createSinFromFit(t,*fit[0])
    return results
plotThis = sinRegr(timeSeries)
This code produces the fitting you see in this picture:
How can I extend the sin function so that it predicts the future points of the series? i.e. how can I have the sine plot span on to the right, beyond the area covered by the 'known' data points?
You need to distinguish a data timeline (input) and a fit timeline (output). Once you do that, the approach is fairly clear. Below I called them tdata and tfit:
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
tdata = np.linspace(0, 10)
timeSeries = np.sin(tdata) + .4*np.random.random(tdata.shape)
def createSinFromFit(x, freq, amplitude, phase, offset):
    return np.sin(x * freq + phase) * amplitude + offset
def sinRegr(tdata, series):
    tfit = np.linspace(0, 6*np.pi, len(series))
    guess_freq = .2
    guess_amplitude = 3*np.std(series)/(2**0.5)
    guess_phase = 0
    guess_offset = np.mean(series)
    p0=[guess_freq, guess_amplitude, guess_phase, guess_offset]
    fit = curve_fit(createSinFromFit, tdata, series, p0=p0) # use tdata to create the fit
    results = createSinFromFit(tfit,*fit[0]) # use tfit to generate a new curve
    return tfit, results
tfit, plotThis = sinRegr(tdata, timeSeries)
plt.plot(tfit, plotThis)
plt.plot(tdata, timeSeries, "ro")
plt.show()
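To make the forecasting step explicit, the sketch below fits on a known time range and then evaluates the fitted function only at later times. It uses a noise-free synthetic series so the forecast can be compared with the true curve. The names sine_model, t_known and t_future, the true parameters and the starting guesses are all assumptions for this illustration; with real data, a frequency guess that is far from the true value can lead curve_fit to a poor local minimum.

```python
import numpy as np
from scipy.optimize import curve_fit

def sine_model(t, freq, amplitude, phase, offset):
    return np.sin(t*freq + phase)*amplitude + offset

# noise-free synthetic series; these parameters are assumed for the demo
true_params = (1.3, 2.0, 0.4, 0.5)
t_known = np.linspace(0, 10, 200)
y_known = sine_model(t_known, *true_params)

# starting guesses: frequency chosen near the true one, the rest from the data
p0 = [1.25, np.std(y_known)*2**0.5, 0.0, np.mean(y_known)]
popt, _ = curve_fit(sine_model, t_known, y_known, p0=p0)

# forecast: evaluate the fitted curve at times beyond the known data
t_future = np.linspace(10, 15, 50)
y_forecast = sine_model(t_future, *popt)
print(y_forecast[:5])
```

The forecast is only as good as the assumption that the series stays sinusoidal with the same parameters outside the fitted range.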