I'm trying to fit a series of data to an exponential equation. I found some great answers here: "How to do exponential and logarithmic curve fitting in Python? I found only polynomial fitting", but they don't cover the extra step I need for this question.
I'm trying to fit y against x with the equation y = -A*exp(B*x) + A. The trailing A has proven to be a big problem: I don't know how to transform the equation into something like log(y) = log(A) + B*x, as I could if the final A were not there.
Any help is appreciated.
You can always just use scipy.optimize.curve_fit as long as your equation isn't too crazy:
import matplotlib.pyplot as plt
import numpy as np
import scipy.optimize as sio
def f(x, A, B):
    return -A*np.exp(B*x) + A
A = 2
B = 1
x = np.linspace(0,1)
y = f(x, A, B)
scale = (max(y) - min(y))*.10
noise = np.random.normal(size=x.size)*scale
y += noise
fit = sio.curve_fit(f, x, y)
plt.scatter(x, y)
plt.plot(x, f(x, *fit[0]))
plt.show()
This produces:
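A hedged follow-up on why the nonlinear fit is the practical route here: rearranging gives log(A - y) = log(A) + B*x, which still has the unknown A on the left-hand side, so a plain log transform does not linearize the problem. The sketch below repeats the fit with a fixed seed and also reads one-sigma parameter uncertainties off the covariance matrix. The seed, the noise level and the names rng, y_true and perr are illustrative assumptions, not part of the original answer.

```python
import numpy as np
from scipy.optimize import curve_fit

def f(x, A, B):
    return -A * np.exp(B * x) + A

rng = np.random.default_rng(0)           # fixed seed (assumption) for repeatability
x = np.linspace(0, 1)
y_true = f(x, 2.0, 1.0)                  # same A=2, B=1 as in the answer
y = y_true + rng.normal(scale=0.02, size=x.size)

# curve_fit returns the best-fit parameters and their covariance matrix
popt, pcov = curve_fit(f, x, y, p0=(1.0, 1.0))
perr = np.sqrt(np.diag(pcov))            # one-sigma uncertainties

print("A = {:.3f} +/- {:.3f}".format(popt[0], perr[0]))
print("B = {:.3f} +/- {:.3f}".format(popt[1], perr[1]))
```

A and B are fairly strongly correlated for this model, so the individual uncertainties can look large even when the fitted curve follows the data closely.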
I am trying to draw a best-fit curve for my data. It is a terribly bad sample of data, but for simplicity's sake let's say I expect a straight line as the best fit on a log-log scale.
I think I already did that with regression, and it returns a reasonable fit line. But I want to double-check it with the curve_fit function in scipy, and I also want to extract the equation of the fit line.
import numpy as np
import matplotlib.pyplot as plt
import scipy as sp
import scipy.stats  # makes sp.stats available for the regression below
from scipy.optimize import curve_fit
import scipy.optimize as optimization
x = np.array([ 1.72724547e-08, 1.81960233e-08, 1.68093027e-08, 2.22839973e-08,
2.23090589e-08, 4.28020801e-08, 2.30004711e-08, 2.48543008e-08,
1.08633065e-07, 3.24417303e-08, 3.22946248e-08, 3.82328031e-08,
3.97713860e-08, 3.44080732e-08, 3.81526816e-08, 3.30756706e-08
])
y = np.array([ 4.18793565e+12, 4.40554864e+12, 4.48745390e+12, 4.50816705e+12,
4.57088190e+12, 4.60256574e+12, 4.66659380e+12, 4.79733449e+12, 7.31139083e+12, 7.53355564e+12, 8.03526122e+12, 8.14704284e+12,
8.47227414e+12, 8.62978548e+12, 8.81048873e+12, 9.46237161e+12
])
# Regression Function
def regress(x, y):
    """Return a tuple of predicted y values and parameters for linear regression."""
    p = sp.stats.linregress(x, y)
    b1, b0, r, p_val, stderr = p
    y_pred = np.polyval([b1, b0], x)  # np.polyval avoids the deprecated sp.polyval alias
    return y_pred, p
# plotting z
allx, ally = x, y # data, non-transformed
y_pred, _ = regress(np.log(allx), np.log(ally)) # change here # transformed input
plt.loglog(allx, ally, marker='$\\star$',color ='g', markersize=5,linestyle='None')
plt.loglog(allx, np.exp(y_pred), "c--", label="regression") # transformed output
# Let's fit a power-law function, y = a * x**b.
# This looks like a straight line on a log-log plot.
def myExpFunc(x, a, b):
    return a * np.power(x, b)
popt, pcov = curve_fit(myExpFunc, x, y, maxfev=1000)
plt.plot(x, myExpFunc(x, *popt), 'r:',
         label="({0:.3f}*x**{1:.3f})".format(*popt))
print("Power-law fit: y = (a*(x**b))")
print("\ta = popt[0] = {0}\n\tb = popt[1] = {1}".format(*popt))
plt.show()
Again, I apologize for the bad dataset. Your help will be much appreciated.
My plot looks like this:
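One possible cross-check, sketched under assumptions: curve_fit starts every parameter at 1 by default, which is very far from values around 1e12, so fitting the straight line in log space is often more robust for a power law y = a*x**b. The data below are synthetic stand-ins (a_true, b_true and the noise level are made up for the illustration), not the dataset from the question.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in data (assumption): y = a * x**b with multiplicative noise
a_true, b_true = 3.0e15, 0.4
x = np.logspace(-8, -7, 20)
y = a_true * x**b_true * np.exp(rng.normal(scale=0.02, size=x.size))

# A power law is a straight line in log-log space:
#   log(y) = b*log(x) + log(a)
b_fit, log_a_fit = np.polyfit(np.log(x), np.log(y), 1)
a_fit = np.exp(log_a_fit)

# The equation of the fit line, ready to print or use as a plot label
print("y = {:.3e} * x**{:.3f}".format(a_fit, b_fit))
```

If curve_fit is still wanted as a second opinion, passing the log-space result as its starting point, p0=(a_fit, b_fit), is a reasonable way to get it going.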
My goal is to create a dataset of random points whose histogram looks like an exponential decay function and then plot an exponential decay function through those points.
First I tried to create a series of random numbers (but did not do so successfully since these should be points, not numbers) from an exponential distribution.
from pylab import *
from scipy.optimize import curve_fit
import random
import numpy as np
import pandas as pd
testx = pd.DataFrame(range(10)).astype(float)
testx = testx[0]
data = {}  # container for the samples; it was undefined in the original snippet
for i in range(1,11):
    x = random.expovariate(15) # rate = 15 arrivals per second
    data[i] = [x]
testy = pd.DataFrame(data).T.astype(float)
testy = testy[0]; testy
plot(testx, testy, 'ko')
The result could look something like this.
And then I define a function to draw a line through my points:
def func(x, a, e):
    return a*np.exp(-a*x)+e
popt, pcov = curve_fit(f=func, xdata=testx, ydata=testy, p0 = None, sigma = None)
print(popt) # parameters
print(pcov) # covariance
plot(testx, testy, 'ko')
xx = np.linspace(0, 15, 1000)
plot(xx, func(xx,*popt))
plt.show()
What I'm looking for is: (1) a more elegant way to create an array of random numbers from an exponential (decay) distribution and (2) how to test that my function is indeed going through the data points.
I would guess that the following is close to what you want. You can generate some random numbers drawn from an exponential distribution with numpy,
data = numpy.random.exponential(5, size=1000)
You can then create a histogram of them using numpy.histogram and draw the histogram values into a plot. You may decide to take the middle of each bin as the position of the point (this assumption is of course wrong, but becomes more valid the more bins you use).
Fitting works as in the code from the question. You will then find that the fit roughly recovers the parameter used for the data generation (in the case below, about 5).
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
data = np.random.exponential(5, size=1000)
hist,edges = np.histogram(data,bins="auto",density=True )
x = edges[:-1]+np.diff(edges)/2.
plt.scatter(x,hist)
func = lambda x,beta: 1./beta*np.exp(-x/beta)
popt, pcov = curve_fit(f=func, xdata=x, ydata=hist)
print(popt)
xx = np.linspace(0, x.max(), 101)
plt.plot(xx, func(xx,*popt), ls="--", color="k",
         label=r"fit, $\beta$ = {:.3f}".format(popt[0]))
plt.legend()
plt.show()
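As a possible alternative that skips the binning step altogether, scipy.stats.expon.fit estimates the scale parameter directly from the raw samples by maximum likelihood. This is a sketch, not the method used in the answer above; the seed and the sample size are arbitrary choices, and floc=0 pins the location so that only the scale (the beta of the answer) is estimated.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)             # fixed seed (assumption)
data = rng.exponential(5, size=5000)       # scale beta = 5, as in the answer

# Maximum-likelihood fit on the raw samples; floc=0 keeps the location fixed
loc, scale = stats.expon.fit(data, floc=0)

print("estimated beta:", scale)
print("sample mean:   ", data.mean())
```

For an exponential distribution with the location fixed at zero, the maximum-likelihood scale is simply the sample mean, so the two printed numbers should agree.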
I think you are actually asking about a regression problem, which is what Praveen was suggesting.
You have a bog-standard exponential decay that meets the y-axis at about y=0.27. Its equation is therefore y = 0.27*exp(-0.27*x). I can model Gaussian error around the values of this function and plot the result using the following code.
import matplotlib.pyplot as plt
from math import exp
from scipy.stats import norm
x = range(0, 16)
Y = [0.27*exp(-0.27*_) for _ in x]
error = norm.rvs(0, scale=0.05, size=9)
simulated_data = [max(0, y+e) for (y,e) in zip(Y[:9],error)]
plt.plot(x, Y, 'b-')
plt.plot(x[:9], simulated_data, 'r.')
plt.show()
print (x[:9])
print (simulated_data)
Here's the plot. Notice that I save the output values for subsequent use.
Now I can calculate the nonlinear regression of the exponential decay values, contaminated with noise, on the independent variable, which is what curve_fit does.
from math import exp
from scipy.optimize import curve_fit
import numpy as np
def model(x, p):
    return p*np.exp(-p*x)
x = list(range(9))
Y = [0.22219001972988275, 0.15537454187341937, 0.15864069451825827, 0.056411162886672819, 0.037398831058143338, 0.10278251869912845, 0.03984605649260467, 0.0035360087611421981, 0.075855255999424692]
popt, pcov = curve_fit(model, x, Y)
print (popt[0])
print (pcov)
The bonus is that not only does curve_fit calculate an estimate for the parameter (0.207962159793), it also offers an estimate of this estimate's variance (0.00086071) as an element of pcov; its square root, about 0.029, is the standard error. This would appear to be a fairly small value, given the small sample size.
Here's how to calculate the residuals. Notice that each residual is the difference between the data value and the value estimated from x using the parameter estimate.
residuals = [y-model(_, popt[0]) for (y, _) in zip(Y, x)]
print (residuals)
If you wanted to further 'test that my function is indeed going through the data points' then I would suggest looking for patterns in the residuals. But discussions like this might be beyond what's welcomed on Stack Overflow: Q-Q and P-P plots, plots of residuals vs y or x, and so on.
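A minimal sketch of what a first look at the residuals could involve, reusing the nine data values from the answer above. The two summary numbers chosen here (root-mean-square error and the count of sign changes) are illustrative assumptions, not a formal goodness-of-fit test.

```python
import numpy as np
from scipy.optimize import curve_fit

def model(x, p):
    return p * np.exp(-p * x)

x = np.arange(9, dtype=float)
Y = np.array([0.22219001972988275, 0.15537454187341937, 0.15864069451825827,
              0.056411162886672819, 0.037398831058143338, 0.10278251869912845,
              0.03984605649260467, 0.0035360087611421981, 0.075855255999424692])

popt, pcov = curve_fit(model, x, Y)

# Residual = observed value minus the value predicted by the fitted model
residuals = Y - model(x, popt[0])
rmse = np.sqrt(np.mean(residuals**2))

# Long runs of same-sign residuals hint at a systematic misfit;
# frequent sign changes are what random scatter tends to produce.
sign_changes = int(np.sum(np.diff(np.sign(residuals)) != 0))

print("RMSE:", rmse)
print("sign changes:", sign_changes, "out of", len(residuals) - 1)
```

An RMSE of the same order as the noise that was put in (scale 0.05 in the simulation) is consistent with the model passing through the data as well as the noise allows.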
I agree with the solution of @ImportanceOfBeingErnes, but I'd like to add a (well-known?) general solution for distributions. If you have a density function f with integral F (i.e. f = dF/dx), then you get the required distribution by mapping uniform random numbers through inv F, i.e. the inverse function of the integral. In the case of the exponential function the integral is, again, an exponential, and the inverse is the logarithm. So it can be done like this:
import matplotlib.pyplot as plt
import numpy as np
from random import random
def gen( a ):
    y=random()
    return( -np.log( y ) / a )
def dist_func( x, a ):
    return( a * np.exp( -a * x) )
data = [ gen(3.14) for x in range(20000) ]
fig = plt.figure()
ax = fig.add_subplot( 1, 1, 1 )
ax.hist(data, bins=80, density=True, histtype="step")  # density replaces the removed normed argument
ax.plot(np.linspace(0,5,150), dist_func( np.linspace(0,5,150), 3.14 ) )
plt.show()
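The same inverse-transform idea can be sketched in vectorized form. For the density f(x) = a*exp(-a*x) the integral is F(x) = 1 - exp(-a*x), so inv F(u) = -log(1 - u)/a. The function name sample_exponential and the seed are made up for this illustration.

```python
import numpy as np

def sample_exponential(a, size, rng):
    """Draw `size` samples with density a*exp(-a*x) by inverse transform."""
    u = rng.random(size)            # uniform on [0, 1)
    # inv F(u) = -log(1 - u) / a; log1p(-u) is log(1 - u) and stays
    # finite because u is strictly less than 1
    return -np.log1p(-u) / a

rng = np.random.default_rng(3)      # fixed seed (assumption)
samples = sample_exponential(3.14, 20000, rng)

print("sample mean:", samples.mean(), "expected:", 1 / 3.14)
```

Using 1 - u rather than u gives the same distribution, since both are uniform, but it avoids taking the logarithm of zero when the generator returns exactly 0.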
I am trying to do a linear fit of some data, but I cannot get curve_fit in Python to give me anything but a slope and y-intercept of 1. Here is an example of my code:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
def func(x, a, b):
    return a*x + b
# This is merely a sample of some of my actual data
x = [290., 300., 310.]
y = [1.87e+21, 2.07e+21, 2.29e+21]
popt, pcov = curve_fit(func, x, y)
print(popt)
I have also tried giving curve_fit a "guess," but when I do that it gives me an overflow error, which I'm guessing is because the numbers are too large.
Another way of doing this without using curve_fit is to use numpy's polyfit.
import matplotlib.pyplot as plt
import numpy as np
# This is merely a sample of some of my actual data
x = [290., 300., 310.]
y = [1.87e+21, 2.07e+21, 2.29e+21]
xp = np.linspace(290, 310, 100)
z = np.polyfit(x, y, 1)
p = np.poly1d(z)
print (z)
fig, ax = plt.subplots()
ax.plot(x, y, '.')
ax.plot(xp, p(xp), '-')
plt.show()
This prints the coefficients as [2.10000000e+19 -4.22333333e+21] and produces the following graph:
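For comparison, a sketch of the same straight-line fit with numpy's newer Polynomial class, which rescales x to a fixed window internally before solving. Note that its coefficients come back lowest order first, the reverse of np.polyfit; the variable names are illustrative.

```python
import numpy as np
from numpy.polynomial import Polynomial

# This is merely a sample of the data from the question
x = [290., 300., 310.]
y = [1.87e+21, 2.07e+21, 2.29e+21]

# Fit in the scaled domain, then convert back to ordinary x coordinates
p = Polynomial.fit(x, y, deg=1)
intercept, slope = p.convert().coef   # lowest order first

print("slope:    ", slope)
print("intercept:", intercept)
```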
I got something in the same ballpark as Excel's linear fit by using scipy's basinhopping instead of curve_fit, with a large number of iterations. It takes a while to run the iterations and it also requires an error function, but it was done without scaling the original data. See the basinhopping docs.
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import basinhopping
def func( x0, x_data, y_data ):
    error = 0
    for x_val, y_val in zip(x_data, y_data):
        error += (y_val - (x0[0]*x_val + x0[1]))**2
    return error
x_data = [290., 300., 310.]
y_data = [1.87e+21, 2.07e+21, 2.29e+21]
a = 1
b = 1
x0 = [a, b]
minimizer_kwargs = { 'method': 'TNC', 'args': (x_data, y_data) }
res = basinhopping(func, x0, niter=1000000, minimizer_kwargs=minimizer_kwargs)
print(res)
This gives x: array([ 7.72723434e+18, -2.38554994e+20]) but if you try again, you'll see this has the problem of non-unique outcomes, although it will give similar ballpark values.
Here's a comparison of the fit with the Excel solution.
Confirmed correct results are returned using:
x = [290., 300., 310.]
y = [300., 301., 302.]
My guess is magnitudes ≅ 10²¹ are too large for the function to work well.
What you can try doing is taking the logarithm of both sides:
def func(x, a, b):
    # might need to check if ≤ 0.0
    return np.log(a*x + b)  # np.log works elementwise on arrays; math.log does not
# ... code omitted
y = [48.9802253837, 49.0818355602, 49.1828387704]
Then undo the transformation afterwards.
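Another option along the same lines, sketched here as an assumption rather than as the answerer's method: divide y by a constant so the values are of order one, fit, and multiply the parameters back. Because the model is linear in a and b, both scale by the same factor. The value y_scale = 1e21 is simply picked by eye from the data.

```python
import numpy as np
from scipy.optimize import curve_fit

def func(x, a, b):
    return a * x + b

x = np.array([290., 300., 310.])
y = np.array([1.87e+21, 2.07e+21, 2.29e+21])

y_scale = 1e21                          # assumed scale factor, chosen by eye
popt_scaled, pcov_scaled = curve_fit(func, x, y / y_scale)

# Undo the scaling: y = y_scale * (a_s*x + b_s) = (y_scale*a_s)*x + (y_scale*b_s)
a, b = popt_scaled * y_scale

print("slope:    ", a)
print("intercept:", b)
```

The result should agree with the polyfit coefficients quoted earlier in the thread, [2.10000000e+19 -4.22333333e+21].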
Also for simple linear approximation, there is an easy deterministic method.
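The deterministic method is presumably ordinary least squares in closed form (that reading is an assumption, since the original link is not preserved). A minimal sketch: the slope is the covariance of x and y divided by the variance of x, and the intercept follows from the means.

```python
import numpy as np

x = np.array([290., 300., 310.])
y = np.array([1.87e+21, 2.07e+21, 2.29e+21])

x_mean, y_mean = x.mean(), y.mean()

# Closed-form simple linear regression:
#   slope     = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean)**2)
#   intercept = y_mean - slope * x_mean
slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
intercept = y_mean - slope * x_mean

print("slope:    ", slope)
print("intercept:", intercept)
```

No iteration or starting guess is involved, so the size of the y values causes no convergence problems.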
Within the Python library statsmodels, is it possible to perform a nonlinear least-squares fit with a nonlinear parameter? In other words, I would like to find the best fit (in terms of least squares) for p in the following model:
y = ln(p)*x^2 + p
Let's assume I have a set of observations x and y. I can use the function scipy.optimize.leastsq to find the best fit. Here is an example:
import numpy as np
import scipy.optimize as spopt
import matplotlib.pyplot as plt
def fitFunc(p,x):
    return np.log(p[0])*x**2 + p[0]
def errFunc(p,x,y):
    return fitFunc(p,x)-y
# create x, y and y_true
nSample = 100
x = np.linspace(-10,10,nSample)
y_true = fitFunc([1.2],x)
err = np.random.normal(scale=7.0,size=nSample)
y = y_true+err
# find optimal p with least-square and plot it
p1, success = spopt.leastsq(errFunc, [2], args=(x,y))
plt.figure()
plt.scatter(x,y,label='observation')
plt.plot(x,y_true,label='true y')
plt.plot(x,fitFunc(p1,x),label='model')
plt.grid(True)
plt.legend()
Is it possible to perform such type of analysis with the library statsmodels?
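This sketch does not settle whether statsmodels offers such a fit. It only shows, with scipy alone, that curve_fit already returns the kind of summary one might want from a statistics package, namely the estimate together with a standard error. The names fit_func, p_hat and p_err, the seed and the lower bound on p are assumptions made for the illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

def fit_func(x, p):
    return np.log(p) * x**2 + p

rng = np.random.default_rng(4)              # fixed seed (assumption)
x = np.linspace(-10, 10, 100)
y = fit_func(x, 1.2) + rng.normal(scale=7.0, size=x.size)

# The lower bound keeps p positive so that log(p) stays defined
popt, pcov = curve_fit(fit_func, x, y, p0=[2.0], bounds=(1e-6, np.inf))
p_hat = popt[0]
p_err = np.sqrt(pcov[0, 0])                 # standard error of the estimate

print("p = {:.3f} +/- {:.3f}".format(p_hat, p_err))
```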
I am trying to fit the differential equation ay' + by'' = 0 to a curve by varying a and b. The following code does not work. The problem with curve_fit seems to be that the lack of an initial guess makes the fit fail. I also tried leastsq. Can anyone suggest other ways to fit such a differential equation? Without a good guess, curve_fit fails!
from scipy.integrate import odeint
from scipy.optimize import curve_fit
from numpy import linspace, random, array
time = linspace(0.0,10.0,100)
def deriv(time,a,b):
    dy=lambda y,t : array([ y[1], a*y[0]+b*y[1] ])
    yinit = array([0.0005,0.2]) # initial values
    Y=odeint(dy,yinit,time)
    return Y[:,0]
z = deriv(time, 2, 0.1)
zn = z + 0.1*random.normal(size=len(time))
popt, pcov = curve_fit(deriv, time, zn)
print(popt) # it only outputs the initial values of a, b!
Let's rewrite the equation:
ay' + by''=0
y'' = -a/b*y'
So this equation may be represented in this way
dy/dt = y'
d(y')/dt = -a/b*y'
The code in Python 2.7:
from scipy.integrate import odeint
from pylab import *
a = -2
b = -0.1
def deriv(Y,t):
    '''Get the derivatives of Y at the time moment t
    Y = [y, y' ]'''
    return array([ Y[1], -a/b*Y[1] ])
time = linspace(0.0,1.0,1000)
yinit = array([0.0005,0.2]) # initial values
y = odeint(deriv,yinit,time)
figure()
plot(time,y[:,0])
xlabel('t')
ylabel('y')
show()
You may compare the resultant plots with the plots in WolframAlpha
If your problem is the default initial guesses, read the curve_fit documentation to find out how to specify them manually via the p0 parameter. For instance, use curve_fit(deriv, time, zn, p0=(12, 0.23)) if you want a=12 and b=0.23 to be the initial guess.
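Putting the pieces together, here is a sketch under stated assumptions. In the rewritten form y'' = -a/b*y', only the ratio a/b enters, so the example fits a single parameter k = a/b instead of a and b separately. The initial values come from the question; k_true, the noise level, the seed, the tightened odeint tolerances and the diff_step value are choices made for this illustration. A larger finite-difference step is one way to keep the numerical Jacobian above the ODE solver's error level.

```python
import numpy as np
from scipy.integrate import odeint
from scipy.optimize import curve_fit

Y_INIT = [0.0005, 0.2]        # initial values [y, y'] taken from the question

def solve_y(t, k):
    """Integrate y'' = -k*y' (k stands for a/b) and return y(t)."""
    def rhs(Y, t):
        return [Y[1], -k * Y[1]]
    return odeint(rhs, Y_INIT, t, rtol=1e-10, atol=1e-12)[:, 0]

rng = np.random.default_rng(5)               # fixed seed (assumption)
t = np.linspace(0.0, 3.0, 60)
k_true = 2.0                                 # assumed value for the demo
y_noisy = solve_y(t, k_true) + rng.normal(scale=0.002, size=t.size)

# p0 is the explicit initial guess; method and diff_step are passed through
# to the underlying least_squares solver
popt, pcov = curve_fit(solve_y, t, y_noisy, p0=(1.0,),
                       method="trf", diff_step=1e-4)
k_fit = popt[0]

print("fitted a/b:", k_fit)
```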