Scipy curve_fit giving straight line - python

So I'm trying to get an exponential curve for some COVID data, but I can't seem to get my curve_fit function to show any sort of curve whatsoever. It's so bad it perfectly overlaps the regression line seaborn generated in my graph.
I've tried rescaling both my date and case data (making them smaller and bigger) before passing them to the curve_fit function, but I still get either a similar line or an optimization error. I even tried calculating my function manually, but that was (naturally) also way off.
#Plot scatter plot for total case count
x = df_sb['date_ordinal']
y1 = df_sb['totalcountconfirmed']
y2 = df_sb['totalcountdeaths']
plt.figure(figsize=(14,10))
ax = plt.subplot(1,1,1)
# Plot scatter plot along with linear regression line
sns.regplot(x='date_ordinal', y='totalcountconfirmed', data=df_sb)
# Formatting axes
ax.set_xlim(x.min() - 1, x.max() + 10)
ax.set_ylim(0, y1.max() + 1)
ax.set_xlabel('Date')
labels = [dt.date.fromordinal(int(item)) for item in ax.get_xticks()]
ax.set_xticklabels(labels)
plt.xticks(rotation = 45)
plt.ylabel("Total Confirmed Cases")
# Exponential Curve
from scipy.optimize import curve_fit
from scipy.special import expit
x_data = df_sb['date_ordinal'].to_numpy()
Y_data = df_sb['totalcountconfirmed'].to_numpy()
def func(x, a, b, c):
    return a * expit(-b * x) + c
popt, pcov = curve_fit(func, x_data, Y_data, maxfev=10000)
a, b, c = popt
fit_y = func(x_data, a, b, c)
plt.plot(x_data, fit_y)
plt.legend(['Total Cases (Linear)','Total Cases (Exponential)'])
# Inserting Significant Date Labels
add_sig_dates(df_sb, 'totalcountconfirmed')
plt.show()

Although you did not provide access to the data, just by looking at the plot I'm pretty sure you mean
def func(x, a, b, c):
    return a * np.exp(-b * x) + c
instead of
def func(x, a, b, c):
    return a * expit(-b * x) + c
Since it's an exponential fit, I think you should provide an initial guess for the parameters in order to achieve good results. This can be done with the p0 argument.
For example:
p0 = [2, 1, 0]  # <-- just an example; these are bad guesses
popt, pcov = curve_fit(func, x_data, Y_data, maxfev=10000, p0=p0)
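If you don't have a feel for the parameters, you can derive a rough guess from the data itself. Below is a minimal sketch, assuming the same df_sb columns and the plt figure from the question; the exact numbers are illustrative. Shifting the ordinal dates so they start at 0 also helps, because with raw ordinals around 737,000 the term np.exp(-b * x) easily overflows or underflows during the fit.
import numpy as np
from scipy.optimize import curve_fit
x_data = df_sb['date_ordinal'].to_numpy(dtype=float)
y_data = df_sb['totalcountconfirmed'].to_numpy(dtype=float)
x_shifted = x_data - x_data.min()   # keep the exponent in a sane range
def func(x, a, b, c):
    return a * np.exp(-b * x) + c
# rough, data-driven guesses: start near the first case count, assume no offset,
# and estimate the growth rate from the ratio of the last to the first value
a0 = max(y_data[0], 1.0)
c0 = 0.0
b0 = -np.log(max(y_data[-1], 1.0) / a0) / (x_shifted[-1] - x_shifted[0])
popt, pcov = curve_fit(func, x_shifted, y_data, p0=[a0, b0, c0], maxfev=10000)
plt.plot(x_data, func(x_shifted, *popt))   # plot against the original ordinals so the axes line up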

Related

Gaussian fitted curve showing a tail that does not go back to base-level

Based on existing topics on Stack Overflow, I have managed to fit a Gaussian curve to my dataset. However, the fitted Gaussian shows one tail that does not go back to base-level (i.e., in the example below, the right tail suddenly stops at a higher y-value compared to the left tail). This surprises me, since by definition a Gaussian should show a perfectly symmetrical bell-shaped curve. How can I generate a Gaussian curve of which both tails are equally long (i.e., the tails stop at the same width measured from the plume center-line) and end at the same base-level (i.e., the same y-value)? The reason I would like to have this is that in my data sometimes a second peak starts to arise while the first peak has not yet returned to base-level. I would like to separate these peaks by fitting a Gaussian that goes back to base-level, as theoretically each peak should return to its base-level. Thanks a lot in advance!
import numpy as np
from lmfit import Model
import matplotlib.pyplot as plt
from scipy.signal import find_peaks
x = np.array([-20.0,-17.0,-14.0,-11.0,-8.0,-5.0,-2.0,1.0,4.0,7.0,10.0,13.0,16.0,19.0,22.0,25.0,28.0,31.0,34.0,37.0,40.0,43.0,46.0,49.0,52.0,55.0,58.0,61.0,64.0,67.0,70.0,73.0,76.0,79.0,82.0])
y = np.array([1.90269,1.93535,2.62402,3.08949,2.82409,3.07588,3.22015,3.18884,5.14053,10.5111,18.6118,28.6343,37.7625,46.3641,53.9163,60.7622,66.5765,71.0596,74.4948,77.7177,80.373,82.5833,83.9021,83.4652,79.0229,71.4679,61.93,52.113,43.8517,36.211,29.3815,23.8966,19.31,15.5209,12.4532])
def gaussian(x, amp, cen, wid):
    return (amp / (np.sqrt(2*np.pi) * wid)) * np.exp(-(x-cen)**2 / (2*wid**2))
def line(x, slope, intercept):
    return slope*x + intercept
peak_index = find_peaks(y,height=27.6)[0][0]
mean = sum(x*y)/np.sum(y) #weighted arithmetic mean
mod = Model(gaussian) + Model(line)
pars = mod.make_params(amp=max(y), cen=x[peak_index],
                       wid=np.sqrt(sum((x-mean)**2 * y)/sum(y)), slope=0, intercept=1)
result = mod.fit(y, pars, x=x)
comps = result.eval_components()
plt.plot(x, y, 'bo')
plt.plot(x, comps['gaussian'], 'k--')
Edit: The following example hopefully illustrates why I am interested in this. I have a long data-set in which the signal of different sources are being measured. The data-set is processed such that it generates the arrays x_measured and y_measured that contain the measured values belonging to one source. My program automatically detects the plume that occurs within the measured values, and stores the values of this plume in arrays called x and y. To these x and y arrays, I perform a Gaussian fit.
However, sometimes the measured values show that two plumes are overlapping, so there is no measured plume that starts from and returns to base-level. An example is given in the code below. For these measured values, my program now gives a Gaussian fit whereby the right tail goes to around y=0, but the left tail of the Gaussian fit stops around y=4.5. I would like the left tail to also go back to around y=0. This is because theoretically I know that each plume should start from and return to the same base-level, and I want to compute the plume width of such a Gaussian plume. For the example below, the left tail does not go back to around y=0, hence I cannot determine the width of the plume. I would like a Gaussian fit of which both tails go back to the same base-level of y=0, so that I can determine the width of the plume.
x_measured = np.arange(-20,245,3)
y_measured = np.array([38.7586,38.2323,37.2958,35.9924,34.4196,32.7123,31.0257,29.5169,28.3244,27.5502,27.2458,27.4078,27.9815,28.8728,29.9643,31.1313,32.2545,33.2276,33.9594,34.373,34.4041,34.0009,33.1267,31.7649,29.9247,27.6458,24.9992,22.0845,19.0215,15.9397,12.966,10.2127,7.76834,5.69046,4.00296,2.69719,1.73733,1.06907,0.629744,0.358021,0.201123,0.11878,0.0839719,0.0813392,0.104295,0.151634,0.224209,0.321912,0.441478,0.575581,0.713504,0.843351,0.954777,1.04109,1.09974,1.13118,1.13683,1.11758,1.07369,1.0059,0.917066,0.81321,0.703288,0.597775,0.506678,0.437843,0.396256,0.384633,0.405147,0.461496,0.560387,0.71144,0.925262,1.21022,1.56925,1.99788,2.48458,3.01314,3.56626,4.12898,4.69031,5.24283,5.78014,6.29365,6.77004,7.19071,7.53399,7.78019,7.91889])
x = np.arange(10,104,3)
y = np.array([22.4548,23.4302,25.3389,27.9929,30.486,32.0528,33.5527,35.1304,35.9941,36.8606,37.1889,37.723,36.4069,35.9751,33.8824,31.0909,27.4247,23.3213,18.8772,14.3363,11.1075,7.68792,4.54899,2.2057,0,0,0,0,0,0,0.179834,0])
def gaussian(x, amp, cen, wid):
    return (amp / (np.sqrt(2*np.pi) * wid)) * np.exp(-(x-cen)**2 / (2*wid**2))
def line(x, slope, intercept):
    return slope*x + intercept
peak_index = find_peaks(y,height=27.6)[0][0]
mean = sum(x*y)/np.sum(y) #weighted arithmetic mean
mod = Model(gaussian) + Model(line)
pars = mod.make_params(amp=max(y), cen=x[peak_index],
                       wid=np.sqrt(sum((x-mean)**2 * y)/sum(y)), slope=0, intercept=1)
result = mod.fit(y, pars, x=x)
comps = result.eval_components()
plt.plot(x, y, 'bo')
plt.plot(x, comps['gaussian'], 'k--')
plt.plot(x_measured,y_measured)
It is unclear why you expect a bimodal fit with the model you defined. Use two different Gaussian functions for your fit, then evaluate the fitted functions for a longer interval x_fit to see the curves returning to baseline:
import numpy as np
from lmfit import Model
import matplotlib.pyplot as plt
from scipy.signal import find_peaks
x = np.array([-20.0,-17.0,-14.0,-11.0,-8.0,-5.0,-2.0,1.0,4.0,7.0,10.0,13.0,16.0,19.0,22.0,25.0,28.0,31.0,34.0,37.0,40.0,43.0,46.0,49.0,52.0,55.0,58.0,61.0,64.0,67.0,70.0,73.0,76.0,79.0,82.0])
y = np.array([1.90269,1.93535,2.62402,3.08949,2.82409,3.07588,3.22015,3.18884,5.14053,10.5111,18.6118,28.6343,37.7625,46.3641,53.9163,60.7622,66.5765,71.0596,74.4948,77.7177,80.373,82.5833,83.9021,83.4652,79.0229,71.4679,61.93,52.113,43.8517,36.211,29.3815,23.8966,19.31,15.5209,12.4532])
def gaussian1(x, amp1, cen1, wid1):
    return (amp1 / (np.sqrt(2*np.pi) * wid1)) * np.exp(-(x-cen1)**2 / (2*wid1**2))
def gaussian2(x, amp2, cen2, wid2):
    return (amp2 / (np.sqrt(2*np.pi) * wid2)) * np.exp(-(x-cen2)**2 / (2*wid2**2))
#peak_index = find_peaks(y,height=27.6)[0][0]
#mean = sum(x*y)/np.sum(y) #weighted arithmetic mean
mod = Model(gaussian1) + Model(gaussian2)
#I just filled in some start values, the details of educated guesses can be filled in later by you
pars = mod.make_params(amp1=30, amp2=40, cen1=20, cen2=40, wid1=2, wid2=2)
result = mod.fit(y, pars, x=x)
print(result.params)
x_fit=np.linspace(-30, 120, 500)
comps_elem = result.eval_components(x=x_fit)
comps_comb = result.eval(x=x_fit)
plt.plot(x, y, 'bo')
plt.plot(x_fit, comps_comb, 'k')
plt.plot(x_fit, comps_elem['gaussian1'], 'k-.')
plt.plot(x_fit, comps_elem['gaussian2'], 'k--')
plt.show()
Sample output:
The corresponding scipy.optimize.curve_fit version would look like this:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy.optimize import curve_fit
x = [-20.0,-17.0,-14.0,-11.0,-8.0,-5.0,-2.0,1.0,4.0,7.0,10.0,13.0,16.0,19.0,22.0,25.0,28.0,31.0,34.0,37.0,40.0,43.0,46.0,49.0,52.0,55.0,58.0,61.0,64.0,67.0,70.0,73.0,76.0,79.0,82.0]
y = [1.90269,1.93535,2.62402,3.08949,2.82409,3.07588,3.22015,3.18884,5.14053,10.5111,18.6118,28.6343,37.7625,46.3641,53.9163,60.7622,66.5765,71.0596,74.4948,77.7177,80.373,82.5833,83.9021,83.4652,79.0229,71.4679,61.93,52.113,43.8517,36.211,29.3815,23.8966,19.31,15.5209,12.4532]
def gauss(x, mu, sigma, A):
    return A*np.exp(-(x-mu)**2/2/sigma**2)
def bimodal(x, mu1, sigma1, A1, mu2, sigma2, A2):
    return gauss(x, mu1, sigma1, A1) + gauss(x, mu2, sigma2, A2)
expected = (20, 2, 30, 40, 2, 40)
params, cov = curve_fit(bimodal, x, y, expected)
sigma=np.sqrt(np.diag(cov))
x_fit = np.linspace(-20, 120, 500)
plt.plot(x_fit, bimodal(x_fit, *params), color='red', lw=3, label='model')
plt.plot(x_fit, gauss(x_fit, *params[:3]), color='red', lw=1, ls="--", label='distribution 1')
plt.plot(x_fit, gauss(x_fit, *params[3:]), color='red', lw=1, ls=":", label='distribution 2')
plt.scatter(x, y, marker="X", color="black", label="original data")
plt.legend()
print(pd.DataFrame(data={'params': params, 'sigma': sigma}, index=bimodal.__code__.co_varnames[1:]))
plt.show()
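Coming back to the plume width mentioned in the question edit: once the fit has converged, the width follows directly from the fitted sigma values, so the tails do not need to be sampled all the way down to base-level. A small sketch, reusing the params array from the curve_fit version above (ordered mu1, sigma1, A1, mu2, sigma2, A2); the full width at half maximum of a Gaussian is 2*sqrt(2*ln 2)*sigma:
import numpy as np
fwhm_factor = 2.0 * np.sqrt(2.0 * np.log(2.0))   # ≈ 2.355
fwhm1 = fwhm_factor * abs(params[1])             # width of the first plume
fwhm2 = fwhm_factor * abs(params[4])             # width of the second plume
print(f"plume 1: center {params[0]:.1f}, FWHM {fwhm1:.1f}")
print(f"plume 2: center {params[3]:.1f}, FWHM {fwhm2:.1f}")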

scipy.curve_fit() returns multiple lines

I am new to Python and was trying to fit a dataset distribution using the following code. The actual data is a list that contains two columns: predicted market price and actual market price. I was trying to use scipy.optimize.curve_fit(), but it gave me many lines plotted in the same place. Any help is appreciated.
# import the necessary modules and define a func.
from scipy.optimize import curve_fit
from matplotlib import pyplot as plt
def func(x, a, b, c):
    return a * x**b + c
# my data
pred_data = [3.0,1.0,1.0,7.0,6.0,1.0,7.0,4.0,9.0,3.0,5.0,5.0,2.0,6.0,8.0]
actu_data =[ 3.84,1.55,1.15,7.56,6.64,1.09,7.12,4.17,9.45,3.12,5.37,5.65,1.92,6.27,7.63]
popt, pcov = curve_fit(func, pred_data, actu_data)
#adjusting y
yaj = func(pred_data, popt[0],popt[1], popt[2])
# plot the data
plt.plot(pred_data,actu_data, 'ro', label = 'Data')
plt.plot(pred_data,yaj,'b--', label = 'Best fit')
plt.legend()
plt.show()
SciPy doesn't produce multiple lines; the strange output is caused by the way you present your unsorted data to matplotlib. Sort your x-values and you get the desired output:
from scipy.optimize import curve_fit
from matplotlib import pyplot as plt
def func(x, a, b, c):
    return a * x**b + c
# my data
pred_data = [3.0,1.0,1.0,7.0,6.0,1.0,7.0,4.0,9.0,3.0,5.0,5.0,2.0,6.0,8.0]
actu_data =[ 3.84,1.55,1.15,7.56,6.64,1.09,7.12,4.17,9.45,3.12,5.37,5.65,1.92,6.27,7.63]
popt, pcov = curve_fit(func, pred_data, actu_data)
#adjusting y
yaj = func(sorted(pred_data), *popt)
# plot the data
plt.plot(pred_data,actu_data, 'ro', label = 'Data')
plt.plot(sorted(pred_data),yaj,'b--', label = 'Best fit')
plt.legend()
plt.show()
A better way, of course, is to define an evenly spaced, high-resolution array for your x-values and evaluate the fit on this array to get a smoother representation of your fitted function:
from scipy.optimize import curve_fit
import numpy as np
from matplotlib import pyplot as plt
def func(x, a, b, c):
    return a * x**b + c
# my data
pred_data = [3.0,1.0,1.0,7.0,6.0,1.0,7.0,4.0,9.0,3.0,5.0,5.0,2.0,6.0,8.0]
actu_data =[ 3.84,1.55,1.15,7.56,6.64,1.09,7.12,4.17,9.45,3.12,5.37,5.65,1.92,6.27,7.63]
popt, pcov = curve_fit(func, pred_data, actu_data)
xaj = np.linspace(min(pred_data), max(pred_data), 1000)
yaj = func(xaj, *popt)
# plot the data
plt.plot(pred_data,actu_data, 'ro', label = 'Data')
plt.plot(xaj, yaj,'b--', label = 'Best fit')
plt.legend()
plt.show()
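If you do want to evaluate the fit at the original data points rather than on a dense grid, np.argsort is a convenient alternative to sorted(), because the same index array keeps any paired arrays aligned. A short sketch reusing pred_data, actu_data, func, and popt from above:
import numpy as np
order = np.argsort(pred_data)              # indices that put the x-values in order
x_sorted = np.asarray(pred_data)[order]
y_sorted = np.asarray(actu_data)[order]    # y reordered with the same indices, so the pairs stay matched
plt.plot(x_sorted, y_sorted, 'ro', label='Data')
plt.plot(x_sorted, func(x_sorted, *popt), 'b--', label='Best fit')
plt.legend()
plt.show()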

Specify a function with unknown coefficients as well as data and find coefficients in python

I have a function: f(theta) = a+b*cos(theta - c) as well as sampled data. I'd like to find the coefficients a, b, and c that minimize mean square error. Any idea if there's an efficient way to do this in python?
EDIT:
import numpy as np
from scipy.optimize import curve_fit
#definition of the function
def myfunc(x, a, b, c):
    return a + b * np.cos(x - c)
#sample data
x_data = [0, 60, 120, 180, 240, 300]
y_data = [25, 40, 70, 30, 10, 15]
#the actual curve fitting procedure, a, b, c are stored in popt
popt, _pcov = curve_fit(myfunc, x_data, y_data)
print(popt)
print(np.degrees(popt[2]))
#the rest is just a graphic representation of the data points and the fitted curve
from matplotlib import pyplot as plt
#x_fit = np.linspace(-1, 6, 1000)
y_fit = myfunc(x_data, *popt)
plt.plot(x_data, y_data, "ro")
plt.plot(x_data, y_fit, "b")
plt.xlabel(r'$\theta$ (degrees)');
plt.ylabel(r'$f(\theta)$');
plt.legend()
plt.show()
Here is a picture showing how the curve doesn't really fit the points. It seems like the amplitude should be higher. The local mins and maxes appear to be in the right places.
scipy.optimize.curve_fit makes it really easy to fit data points to your custom function:
import numpy as np
from scipy.optimize import curve_fit
#definition of the function
def myfunc(x, a, b, c):
    return a + b * np.cos(x - c)
#sample data
x_data = np.arange(5)
y_data = 2.34 + 1.23 * np.cos(x_data + .23)
#the actual curve fitting procedure, a, b, c are stored in popt
popt, _pcov = curve_fit(myfunc, x_data, y_data)
print(popt)
#the rest is just a graphic representation of the data points and the fitted curve
from matplotlib import pyplot as plt
x_fit = np.linspace(-1, 6, 1000)
y_fit = myfunc(x_fit, *popt)
plt.plot(x_data, y_data, "ro", label = "data points")
plt.plot(x_fit, y_fit, "b", label = "fitted curve\na = {}\nb = {}\nc = {}".format(*popt))
plt.legend()
plt.show()
Output:
[ 2.34 1.23 -0.23]
Edit:
Your question update introduces several problems. First, your x-values are in degrees, while np.cos expects radians, so we should convert the values with np.deg2rad (the reverse function is np.rad2deg).
Second, it is a good idea to fit for different frequencies as well, so let's introduce an additional parameter for that.
Third, fits are usually quite sensitive to initial guesses. You can provide them with the p0 parameter in scipy.
Fourth, you changed the resolution of the fitted curve down to the low resolution of your data points, which is why it looks so undersampled. If we address all these problems:
import numpy as np
from scipy.optimize import curve_fit
#sample data
x_data = [0, 60, 120, 180, 240, 300]
y_data = [25, 40, 70, 30, 10, 15]
#definition of the function with additional frequency value d
def myfunc(x, a, b, c, d):
    return a + b * np.cos(d * np.deg2rad(x) - c)
#initial guess of parameters a, b, c, d
p_initial = [np.average(y_data), np.average(y_data), 0, 1]
#the actual curve fitting procedure, a, b, c, d are stored in popt
popt, _pcov = curve_fit(myfunc, x_data, y_data, p0 = p_initial)
print(popt)
#we have to convert the phase shift back into degrees
print(np.rad2deg(popt[2]))
#graphic representation of the data points and the fitted curve
from matplotlib import pyplot as plt
#define x_values for a smooth curve representation
x_fit = np.linspace(np.min(x_data), np.max(x_data), 1000)
y_fit = myfunc(x_fit, *popt)
plt.plot(x_data, y_data, "ro", label = "data")
plt.plot(x_fit, y_fit, "b", label = "fit")
plt.xlabel(r'$\theta$ (degrees)');
plt.ylabel(r'$f(\theta)$');
plt.legend()
plt.show()
we get this output:
[34.31293761 26.92479369 2.20852009 1.18144319]
126.53888003953764
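One caveat: the amplitude and phase of a + b*cos(d*theta - c) are not unique, since a negative b describes the same curve as |b| with the phase shifted by pi, and c is only defined modulo 2*pi. If you want a canonical answer, you can normalize the fitted parameters afterwards; a small sketch reusing popt from the fit above:
import numpy as np
a, b, c, d = popt
if b < 0:                    # fold a negative amplitude into the phase: b*cos(u - c) == |b|*cos(u - (c + pi)) when b < 0
    b, c = -b, c + np.pi
c = np.mod(c, 2 * np.pi)     # wrap the phase into [0, 2*pi)
print(b, np.rad2deg(c))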

Wrong graph with scipy.optimize.curve_fit (similar to moving average)

I am trying to fit an exponential law into my data. My (x,y) sample is rather complicated to explain, so for general understanding and reproducibility I will say that: both variables are float and continuous, 0<=x<=100, and 0<=y<=1.
from scipy.optimize import curve_fit
import numpy
import matplotlib.pyplot as plt
#ydata=[...] is my list with y values, which contains 0 values
#xdata=[...] is my list with x values
transf_y=[]
for i in range(len(ydata)):
    transf_y.append(ydata[i]+0.00001) #Adding something to avoid zero values
x=numpy.array(xdata,dtype=float)
y=numpy.array(transf_y,dtype=float)
def func(x, a, c, d):
    return a * numpy.exp(-c*x)+d
popt, pcov = curve_fit(func, x, y,p0 = (1, 1e-6, 1))
print ("a = %s , c = %s, d = %s" % (popt[0], popt[1], popt[2]))
xx = numpy.linspace(300, 6000, 1000)
yy = func(xx, *popt)
plt.plot(x,y,label='Original Data')
plt.plot(xx, yy, label="Fitted Curve")
plt.legend(loc='upper left')
plt.show()
Now my fitted curve doesn't look anything like a fitted exponential curve. Rather, it looks like a moving-average curve, as if such a curve had been added as a trendline in Excel. What could be the problem? If necessary I'll find a way to make the datasets available to make the example reproducible.
This is what I get out of my code (I don't even know why I am getting three elements in the legend while only two are plotted, at least apparently):
A multitude of things:
your plot depicts the original data twice and no discernible fitted data
your data does not seem to be ordered; I assume that is why you get zigzag lines
in your example, the predicted plot will be in the range between 300 and 6000, whereas your raw data has 0 <= x <= 100
That aside, your code is more or less correct and works.
from scipy.optimize import curve_fit
import numpy
import matplotlib.pyplot as plt
xdata=[100.0, 0.0, 90.0, 20.0, 80.0] # my list with x values - edit: you need some raw data to fit, so I inserted some
ydata=[0.001, 1.0, 0.02, 0.56, 0.03] # my list with y values
transf_y=[]
for i in range(len(ydata)):
    transf_y.append(ydata[i]+0.00001) #Adding something to avoid zero values
x1=numpy.array(xdata,dtype=float)
y1=numpy.array(transf_y,dtype=float)
def func(x, a, c, d):
    return a * numpy.exp(-c*x)+d
popt, pcov = curve_fit(func, x1, y1,p0 = (1, 1e-6, 1))
print ("a = %s , c = %s, d = %s" % (popt[0], popt[1], popt[2]))
#ok, sorting your data
pairs = []
for i, j in zip(x1, y1):
pairs.append([i,j])
sortedList = sorted(pairs, key = lambda x:x[0])
sorted_x = numpy.array(sortedList)[:,0]
sorted_y = numpy.array(sortedList)[:,1]
#adjusting interval to the limits of your raw data
xx = numpy.linspace(0, 100.0, 1000)
yy = func(xx, *popt)
#and everything looks fine
plt.plot(sorted_x,sorted_y, 'o',label='Original Data')
plt.plot(xx,yy,label='Fitted Data')
plt.legend(loc='upper left')
plt.show()

Sigmoidal curve fit, how to get the value of x when y=0.5

I want to fit the following function and then, from the fit, get the value of x at which y=0.5.
The function:
import numpy as np
from scipy.optimize import curve_fit
def sigmoid(x, b, c):
    y = 1 / (1 + c*np.exp(-b*x))
    return y
x_data = [4, 6, 8, 10]
y_data = [0.86, 0.73, 0.53, 0.3]
popt, pcov = curve_fit(sigmoid, x_data, y_data,(28.14,-0.25))
Please explain how you would carry this out using Python!
Thanks!
When I run your code I get a warning, and popt is the same as your initial guess, (28.14, -0.25). If you try plotting this you'll see that it's essentially a straight line at y == 1 that doesn't fit your data well at all:
from matplotlib import pyplot as plt
x = np.linspace(4, 10, 1000)
y = sigmoid(x, *popt)
fig, ax = plt.subplots(1, 1)
# overplotting on the same axes is the default, so no hold() call is needed
ax.scatter(x_data, y_data, s=50, zorder=20)
ax.plot(x, y, '-k', lw=2)
The problem is the initial guess for the b parameter. With b = 28.14, the exponent -b*x is a large negative number over your x range, so c*np.exp(-b*x) is essentially zero and the model is pinned at y ≈ 1 no matter what c is. Since your data decreases with x, you instead want a negative initial value for b (so the exponential term grows with x) together with a small positive c:
popt2, pcov2 = curve_fit(sigmoid, x_data, y_data, (-0.5, 0.1))
y2 = sigmoid(x, *popt2)
ax.plot(x, y2, '-r', lw=2)
To get the value of x at y == 0.5 using nonlinear optimization you need to define an objective function, which could be the square of the difference between 0.5 and sigmoid(x, b, c):
def objective(x, b, c):
    return (0.5 - sigmoid(x, b, c)) ** 2
You can then use scipy.optimize.minimize or scipy.optimize.minimize_scalar to find the value of x that minimizes the objective function:
from scipy.optimize import minimize_scalar
res = minimize_scalar(objective, bracket=(4, 10), args=tuple(popt2))
ax.annotate("$y = 0.5$", (res.x, 0.5), (30, 30), textcoords='offset points',
            arrowprops=dict(facecolor='black', shrink=0.05), fontsize='x-large')
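For this particular two-parameter sigmoid there is also a closed-form cross-check: 1 / (1 + c*exp(-b*x)) = 0.5 implies c*exp(-b*x) = 1, i.e. x = ln(c) / b. A small sketch using the fitted popt2 from above (only meaningful when the fitted c is positive):
import numpy as np
b_fit, c_fit = popt2
if c_fit > 0:
    x_half = np.log(c_fit) / b_fit   # solves 1 / (1 + c*exp(-b*x)) = 0.5
    print(x_half)                    # should agree closely with res.x from minimize_scalar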
