nonlinear least-square with Python statsmodels - python

Within the Python library statsmodels, is it possible to perform a nonlinear least-square fitting with nonlinear parameter? In other words, I would like to find the best fit (in term of least-square) for p in the following stat model:
y = ln(p)*x^2 + p
Let's assume I have a set of observation x and y, I can use the function scipy.optimize.leastsq to find the best fit. Here is an example:
import numpy as np
import scipy.optimize as spopt
import matplotlib.pyplot as plt
def fitFunc(p,x):
return np.log(p[0])*x**2 + p[0]
def errFunc(p,x,y):
return fitFunc(p,x)-y
# create x, y and y_true
nSample = 100
x = np.linspace(-10,10,nSample)
y_true = fitFunc([1.2],x)
err = np.random.normal(scale=7.0,size=nSample)
y = y_true+err
# find optimal p with least-square and plot it
p1, success = spopt.leastsq(errFunc, [2], args=(x,y))
plt.figure()
plt.scatter(x,y,label='obsevation')
plt.plot(x,y_true,label='true y')
plt.plot(x,fitFunc(p1,x),label='model')
plt.grid(True)
plt.legend()
Is it possible to perform such type of analysis with the library statsmodels?

Related

curve fitting different in Python than Matlab

I have a code in Matlab that I want to convert to python. In the Matlab code, I'm using the curve fitting toolbox to fit some data to the Fourier series of order 3. Here is how I did it in Matlab:
ft= fittype('fourier3');
myfit = fit(x,y,ft)
figure(20)
plot(y)
hold
figure(20)
plot(myfit)
And here is the plot of the data
So, to convert it to Python, I searched for equivalent library to the curve fitting toolbox, and found a library named 'symfit' which serves the same purpose. I looked the documentation and found example that can help, so I used the example as in the description with my data as follows:
from symfit import parameters, variables, sin, cos, Fit
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
def fourier_series(x, f, n=0):
"""
Returns a symbolic fourier series of order `n`.
:param n: Order of the fourier series.
:param x: Independent variable
:param f: Frequency of the fourier series
"""
# Make the parameter objects for all the terms
a0, *cos_a = parameters(','.join(['a{}'.format(i) for i in range(0, n + 1)]))
sin_b = parameters(','.join(['b{}'.format(i) for i in range(1, n + 1)]))
# Construct the series
series = a0 + sum(ai * cos(i * f * x) + bi * sin(i * f * x)
for i, (ai, bi) in enumerate(zip(cos_a, sin_b), start=1))
return series
T = pd.read_excel('data.xls')
A = pd.DataFrame(T)
x, y = variables('x, y')
w, = parameters('w')
model_dict = {y: fourier_series(x, f=w, n=3)}
print(model_dict)
xdata = np.array(A.iloc[:, 0])
ydata = np.array(A.iloc[:, 1])
# Define a Fit object for this model and data
fit = Fit(model_dict, x=xdata, y=ydata)
fit_result = fit.execute()
print(fit_result)
# Plot the result
plt.plot(xdata, ydata)
plt.plot(xdata, fit.model(x=xdata, **fit_result.params).y, ls=':')
plt.xlabel('x')
plt.ylabel('y')
plt.show()
But when running the code, here is the plot I get:
I don't know why the fitted data is a straight line. Can anyone help with that problem? I don't know whether I used the wrong algorithm or I'm plotting the data incorrectly.
Edit:
Here is the data file for those who would like to try: https://docs.google.com/spreadsheets/d/18lL1iMZ3kdaqUUtRDLNRK4A3uCPzOrXt/edit?usp=sharing&ouid=112684448221465330517&rtpof=true&sd=true

How to fit a specific exponential function with numpy

I'm trying to fit a series of data to a exponential equation, I've found some great answer here: How to do exponential and logarithmic curve fitting in Python? I found only polynomial fitting But it didn't contain the step forward that I need for this question.
I'm trying to fit y and x against a equation: y = -AeBx + A. The final A has proven to be a big trouble and I don't know how to transform the equation like log(y) = log(A) + Bx as if the final A was not there.
Any help is appreciated.
You can always just use scipy.optimize.curve_fit as long as your equation isn't too crazy:
import matplotlib.pyplot as plt
import numpy as np
import scipy.optimize as sio
def f(x, A, B):
return -A*np.exp(B*x) + A
A = 2
B = 1
x = np.linspace(0,1)
y = f(x, A, B)
scale = (max(y) - min(y))*.10
noise = np.random.normal(size=x.size)*scale
y += noise
fit = sio.curve_fit(f, x, y)
plt.scatter(x, y)
plt.plot(x, f(x, *fit[0]))
plt.show()
This produces:

Simultaneous numerical fit of two equations using Numpy least square method

I am trying to fit below mentioned two equations using python leastsq method but am not sure whether this is the right approach. First equation has incomplete gamma function in it while the second one is slightly complex, and along with an exponential function contains a term which is obtained by using a separate fitting formula.
J_mg = T_incomplete(hw/T_mag)
J_nmg = e^(-hw/T)*g(w,T)
Here g is a function of w and T and is calucated using a given fitting formula.
I am following the steps outlined in this question.
Here is what I have done
import numpy as np
from scipy.optimize import leastsq
from scipy.special import gammaincc
from scipy.special import gamma
from matplotlib.pyplot import plot
# generating data
NPTS = 10
hw = np.linspace(0.5, 10, NPTS)
j1 = np.linspace(0.001,10,NPTS)
j2 = np.linspace(0.003,10,NPTS)
T_mag = np.linspace(0.3,0.5,NPTS)
#defining functions
def calc_gaunt_factor(hw,T):
fitting_coeff= np.loadtxt('fitting_coeff.txt', skiprows=1)
#T is in KeV
#K_b = 8.6173303(50)e−5 ev/K
g = 0
gamma = 0.0136/T
theta= hw/T
A= (np.log10(gamma**2) +0.5)*0.4
B= (np.log10(theta)+1.5)*0.4
for i in range(11):
for j in range(11):
g_ij = fitting_coeff[i][j]*(A**i)*(B**j)
g = g_ij+g
return g
def j_w_mag(hw,T_mag):
order= 0.001
return np.sqrt(1/T_mag)*gamma(order)*gammaincc(order,hw/T_mag)
def j_w_nonmag(hw,T):
gamma = 0.0136/T
theta= hw/T
return np.sqrt(1/T)*np.exp((-hw)/T)*calc_gaunt_factor(hw,T)
def residual_func(T,T_mag,hw,j1,j2):
err_unmag = np.nan_to_num(j1 - j_w_nonmag(hw,T))
err_mag = np.nan_to_num(j2 - j_w_mag(hw,T_mag))
err= np.concatenate((err_unmag, err_mag))
return err
par_init = np.array([.35])
best, cov, info, message, ler = leastsq(residual_func,par_init,args=(T_mag,hw,j1,j2),full_output=True)
print("Best-Fit Parameters:")
print("T=%s" %(best[0]))
I am getting weird value for my fitting parameter, T. Is this the right approach? Thanks.

Curve fit in a log-log plot in matplotlib and getting the equation of line

I am trying to draw a best fit curve for my data. It is terribly bad sample of data, but for simplicity's sake let's say, I expect to draw a straight line as a best fit in log-log scale.
I think I already did that with regression and it returns me a reasonable fit line. But I want to double check it with curve fit function in scipy. And I also want to extract the equation of the fit line.
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
import scipy.optimize as optimization
x = np.array([ 1.72724547e-08, 1.81960233e-08, 1.68093027e-08, 2.22839973e-08,
2.23090589e-08, 4.28020801e-08, 2.30004711e-08, 2.48543008e-08,
1.08633065e-07, 3.24417303e-08, 3.22946248e-08, 3.82328031e-08,
3.97713860e-08, 3.44080732e-08, 3.81526816e-08, 3.30756706e-08
])
y = np.array([ 4.18793565e+12, 4.40554864e+12, 4.48745390e+12, 4.50816705e+12,
4.57088190e+12, 4.60256574e+12, 4.66659380e+12, 4.79733449e+12, 7.31139083e+12, 7.53355564e+12, 8.03526122e+12, 8.14704284e+12,
8.47227414e+12, 8.62978548e+12, 8.81048873e+12, 9.46237161e+12
])
# Regression Function
def regress(x, y):
"""Return a tuple of predicted y values and parameters for linear regression."""
p = sp.stats.linregress(x, y)
b1, b0, r, p_val, stderr = p
y_pred = sp.polyval([b1, b0], x)
return y_pred, p
# plotting z
allx, ally = x, y # data, non-transformed
y_pred, _ = regress(np.log(allx), np.log(ally)) # change here # transformed input
plt.loglog(allx, ally, marker='$\\star$',color ='g', markersize=5,linestyle='None')
plt.loglog(allx, np.exp(y_pred), "c--", label="regression") # transformed output
# Let's fit an exponential function.
# This looks like a line on a lof-log plot.
def myExpFunc(x, a, b):
return a * np.power(x, b)
popt, pcov = curve_fit(myExpFunc, x, y, maxfev=1000)
plt.plot(x, myExpFunc(x, *popt), 'r:',
label="({0:.3f}*x**{1:.3f})".format(*popt))
print "Exponential Fit: y = (a*(x**b))"
print "\ta = popt[0] = {0}\n\tb = popt[1] = {1}".format(*popt)
plt.show()
Again I apologize for a bad dataset. your help will be very appreciated.
My plot looks like this:
enter code here

Generate random numbers from exponential distribution and model using python

My goal is to create a dataset of random points whose histogram looks like an exponential decay function and then plot an exponential decay function through those points.
First I tried to create a series of random numbers (but did not do so successfully since these should be points, not numbers) from an exponential distribution.
from pylab import *
from scipy.optimize import curve_fit
import random
import numpy as np
import pandas as pd
testx = pd.DataFrame(range(10)).astype(float)
testx = testx[0]
for i in range(1,11):
x = random.expovariate(15) # rate = 15 arrivals per second
data[i] = [x]
testy = pd.DataFrame(data).T.astype(float)
testy = testy[0]; testy
plot(testx, testy, 'ko')
The result could look something like this.
And then I define a function to draw a line through my points:
def func(x, a, e):
return a*np.exp(-a*x)+e
popt, pcov = curve_fit(f=func, xdata=testx, ydata=testy, p0 = None, sigma = None)
print popt # parameters
print pcov # covariance
plot(testx, testy, 'ko')
xx = np.linspace(0, 15, 1000)
plot(xx, func(xx,*popt))
plt.show()
What I'm looking for is: (1) a more elegant way to create an array of random numbers from an exponential (decay) distribution and (2) how to test that my function is indeed going through the data points.
I would guess that the following is close to what you want. You can generate some random numbers drawn from an exponential distribution with numpy,
data = numpy.random.exponential(5, size=1000)
You can then create a histogram of them using numpy.hist and draw the histogram values into a plot. You may decide to take the middle of the bins as position for the point (this assumption is of course wrong, but gets the more valid the more bins you use).
Fitting works as in the code from the question. You will then find out that our fit roughly finds the parameter used for the data generation (in this case below ~5).
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
data = np.random.exponential(5, size=1000)
hist,edges = np.histogram(data,bins="auto",density=True )
x = edges[:-1]+np.diff(edges)/2.
plt.scatter(x,hist)
func = lambda x,beta: 1./beta*np.exp(-x/beta)
popt, pcov = curve_fit(f=func, xdata=x, ydata=hist)
print(popt)
xx = np.linspace(0, x.max(), 101)
plt.plot(xx, func(xx,*popt), ls="--", color="k",
label="fit, $beta = ${}".format(popt))
plt.legend()
plt.show()
I think you are actually asking about a regression problem, which is what Praveen was suggesting.
You have a bog standard exponential decay that arrives at the y-axis at about y=0.27. Its equation is therefore y = 0.27*exp(-0.27*x). I can model gaussian error around the values of this function and plot the result using the following code.
import matplotlib.pyplot as plt
from math import exp
from scipy.stats import norm
x = range(0, 16)
Y = [0.27*exp(-0.27*_) for _ in x]
error = norm.rvs(0, scale=0.05, size=9)
simulated_data = [max(0, y+e) for (y,e) in zip(Y[:9],error)]
plt.plot(x, Y, 'b-')
plt.plot(x[:9], simulated_data, 'r.')
plt.show()
print (x[:9])
print (simulated_data)
Here's the plot. Notice that I save the output values for subsequent use.
Now I can calculate the nonlinear regression of the exponential decay values, contaminated with noise, on the independent variable, which is what curve_fit does.
from math import exp
from scipy.optimize import curve_fit
import numpy as np
def model(x, p):
return p*np.exp(-p*x)
x = list(range(9))
Y = [0.22219001972988275, 0.15537454187341937, 0.15864069451825827, 0.056411162886672819, 0.037398831058143338, 0.10278251869912845, 0.03984605649260467, 0.0035360087611421981, 0.075855255999424692]
popt, pcov = curve_fit(model, x, Y)
print (popt[0])
print (pcov)
The bonus is that, not only does curve_fit calculate an estimate for the parameter — 0.207962159793 — it also offers an estimate for this estimate's variance — 0.00086071 — as an element of pcov. This would appear to be a fairly small value, given the small sample size.
Here's how to calculate the residuals. Notice that each residual is the difference between the data value and the value estimated from x using the parameter estimate.
residuals = [y-model(_, popt[0]) for (y, _) in zip(Y, x)]
print (residuals)
If you wanted to further 'test that my function is indeed going through the data points' then I would suggest looking for patterns in the residuals. But discussions like this might be beyond what's welcomed on stackoverflow: Q-Q and P-P plots, plots of residuals vs y or x, and so on.
I agree with the solution of #ImportanceOfBeingErnes, but I'd like to add a (well known?) general solution for distributions. If you have a distribution function f with integral F (i.e. f = dF / dx) then you get the required distribution by mapping random numbers with inv F i.e. the inverse function of the integral. In case of the exponential function, the integral is, again, an exponential and the inverse is the logarithm. So it can be done like this:
import matplotlib.pyplot as plt
import numpy as np
from random import random
def gen( a ):
y=random()
return( -np.log( y ) / a )
def dist_func( x, a ):
return( a * np.exp( -a * x) )
data = [ gen(3.14) for x in range(20000) ]
fig = plt.figure()
ax = fig.add_subplot( 1, 1, 1 )
ax.hist(data, bins=80, normed=True, histtype="step")
ax.plot(np.linspace(0,5,150), dist_func( np.linspace(0,5,150), 3.14 ) )
plt.show()

Categories

Resources