Wrong graph with scipy.optimize.curve_fit (similar to moving average) - python
I am trying to fit an exponential law to my data. My (x,y) sample is rather complicated to explain, so for general understanding and reproducibility I will say that both variables are continuous floats, with 0<=x<=100 and 0<=y<=1.
from scipy.optimize import curve_fit
import numpy
import matplotlib.pyplot as plt
#ydata=[...] is my list with y values, which contains 0 values
#xdata=[...] is my list with x values
transf_y=[]
for i in range(len(ydata)):
    transf_y.append(ydata[i]+0.00001) #Adding something to avoid zero values
x=numpy.array(xdata,dtype=float)
y=numpy.array(transf_y,dtype=float)
def func(x, a, c, d):
    return a * numpy.exp(-c*x)+d
popt, pcov = curve_fit(func, x, y,p0 = (1, 1e-6, 1))
print ("a = %s , c = %s, d = %s" % (popt[0], popt[1], popt[2]))
xx = numpy.linspace(300, 6000, 1000)
yy = func(xx, *popt)
plt.plot(x,y,label='Original Data')
plt.plot(xx, yy, label="Fitted Curve")
plt.legend(loc='upper left')
plt.show()
Now my fitted curve doesn't look anything like an exponential curve. Rather, it looks like a moving-average curve, as if it had been added as a trendline in Excel. What could be the problem? If necessary, I'll find a way to make the datasets available so that the example is reproducible.
This is what I get out of my code [plot not included] (I don't even know why I am getting three entries in the legend while, at least apparently, only two lines are plotted).
A multitude of things:
your plot shows the original data twice and no discernible fitted data
your data does not seem to be ordered; I assume that is why you get zigzag lines
in your example, the predicted curve is evaluated between 300 and 6000, whereas your raw data lies within 0<=x<=100
That aside, your code is more or less correct and works.
from scipy.optimize import curve_fit
import numpy
import matplotlib.pyplot as plt
xdata=[100.0, 0.0, 90.0, 20.0, 80.0] #your list with x values - edit: you need some raw data to fit, so I inserted some
ydata=[0.001, 1.0, 0.02, 0.56, 0.03] #your list with y values, which may contain 0 values
transf_y=[]
for i in range(len(ydata)):
    transf_y.append(ydata[i]+0.00001) #Adding something to avoid zero values
x1=numpy.array(xdata,dtype=float)
y1=numpy.array(transf_y,dtype=float)
def func(x, a, c, d):
    return a * numpy.exp(-c*x)+d
popt, pcov = curve_fit(func, x1, y1,p0 = (1, 1e-6, 1))
print ("a = %s , c = %s, d = %s" % (popt[0], popt[1], popt[2]))
#ok, sorting your data
pairs = []
for i, j in zip(x1, y1):
    pairs.append([i,j])
sortedList = sorted(pairs, key = lambda x:x[0])
sorted_x = numpy.array(sortedList)[:,0]
sorted_y = numpy.array(sortedList)[:,1]
#adjusting interval to the limits of your raw data
xx = numpy.linspace(0, 100.0, 1000)
yy = func(xx, *popt)
#and everything looks fine
plt.plot(sorted_x,sorted_y, 'o',label='Original Data')
plt.plot(xx,yy,label='Fitted Data')
plt.legend(loc='upper left')
plt.show()
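The same fix can be sketched more compactly with NumPy. This is a minimal sketch, assuming hypothetical noise-free samples of a known decay as a stand-in for the unpublished data: it sorts x and y together with numpy.argsort, derives a rough starting guess p0 from the data instead of (1, 1e-6, 1), and evaluates the fit only over the range the data covers. The 1e-5 offset is dropped here because curve_fit has no problem with zero y-values; an offset like that would only matter for a log-transformed fit.

```python
import numpy as np
from scipy.optimize import curve_fit


def func(x, a, c, d):
    # exponential decay from a + d at x = 0 towards the offset d
    return a * np.exp(-c * x) + d


# Hypothetical unordered sample in 0 <= x <= 100 (stand-in for the real data)
rng = np.random.default_rng(0)
x = rng.uniform(0, 100, 50)
y = 0.9 * np.exp(-0.05 * x) + 0.02

# Sort once with argsort so every y value stays paired with its x value
order = np.argsort(x)
x_sorted, y_sorted = x[order], y[order]

# Rough starting guess: amplitude from the y range, decay rate ~ 1 / (x range),
# offset from the smallest y value
p0 = (
    y_sorted.max() - y_sorted.min(),
    1.0 / (x_sorted.max() - x_sorted.min()),
    y_sorted.min(),
)
popt, pcov = curve_fit(func, x_sorted, y_sorted, p0=p0)
print("a = %s , c = %s, d = %s" % tuple(popt))

# Evaluate the fitted curve only over the range covered by the data
xx = np.linspace(x_sorted.min(), x_sorted.max(), 1000)
yy = func(xx, *popt)
```

Plotting then works as in the answer: plt.plot(x_sorted, y_sorted, 'o') for the data and plt.plot(xx, yy) for the fit.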
Related
Gaussian fitted curve showing a tail that does not go back to base-level
Based upon existing topics on Stack Overflow, I have managed to fit a Gaussian curve to my dataset. However, the fitted Gaussian shows one tail that does not go back to base-level (i.e., in the example below, the right tail suddenly stops at a higher y-value than the left tail). This surprises me, as by definition a Gaussian should be a perfectly symmetrical bell-shaped curve. How can I generate a Gaussian curve whose two tails are equally long (i.e., the tails stop at the same distance from the plume center-line) and end at the same base-level (i.e., the same y-value)? The reason I would like this is that in my data a second peak sometimes starts to arise before the first peak has gone back to base-level. I would like to separate these peaks by fitting a Gaussian that goes back to base-level, as theoretically each peak should return to its base-level. Thanks a lot in advance!
import numpy as np
from lmfit import Model
import matplotlib.pyplot as plt
from scipy.signal import find_peaks

x = np.array([-20.0,-17.0,-14.0,-11.0,-8.0,-5.0,-2.0,1.0,4.0,7.0,10.0,13.0,16.0,19.0,22.0,25.0,28.0,31.0,34.0,37.0,40.0,43.0,46.0,49.0,52.0,55.0,58.0,61.0,64.0,67.0,70.0,73.0,76.0,79.0,82.0])
y = np.array([1.90269,1.93535,2.62402,3.08949,2.82409,3.07588,3.22015,3.18884,5.14053,10.5111,18.6118,28.6343,37.7625,46.3641,53.9163,60.7622,66.5765,71.0596,74.4948,77.7177,80.373,82.5833,83.9021,83.4652,79.0229,71.4679,61.93,52.113,43.8517,36.211,29.3815,23.8966,19.31,15.5209,12.4532])

def gaussian(x, amp, cen, wid):
    return (amp / (np.sqrt(2*np.pi) * wid)) * np.exp(-(x-cen)**2 / (2*wid**2))

def line(x, slope, intercept):
    return slope*x + intercept

peak_index = find_peaks(y,height=27.6)[0][0]
mean = sum(x*y)/np.sum(y) #weighted arithmetic mean
mod = Model(gaussian) + Model(line)
pars = mod.make_params(amp=max(y), cen=x[peak_index], wid=np.sqrt(sum((x-mean)**2 * y)/sum(y)), slope=0, intercept=1)
result = mod.fit(y, pars, x=x)
comps = result.eval_components()
plt.plot(x, y, 'bo')
plt.plot(x, comps['gaussian'], 'k--')
plt.show()
Edit: The following example hopefully illustrates why I am interested in this. I have a long dataset in which the signals of different sources are measured. The dataset is processed such that it generates the arrays x_measured and y_measured, which contain the measured values belonging to one source. My program automatically detects the plume that occurs within the measured values and stores the values of this plume in arrays called x and y. I then fit a Gaussian to these x and y arrays. However, sometimes the measured values show that two plumes overlap, so there is no measured plume going from and back to base-level. An example is given in the code below. For these measured values, my program now gives a Gaussian fit whose right tail goes to around y=0, but whose left tail stops around y=4.5. I would like the left tail to also go back to around y=0, because theoretically I know that each plume should start at and return to the same base-level, and I want to compute the plume width of such a Gaussian plume. In the example below the left tail does not go back to around y=0, so I cannot determine the width of the plume. I would like a Gaussian fit whose two tails both go back to the same base-level of y=0, so that I can determine the width of the plume.
x_measured = np.arange(-20,245,3)
y_measured = np.array([38.7586,38.2323,37.2958,35.9924,34.4196,32.7123,31.0257,29.5169,28.3244,27.5502,27.2458,27.4078,27.9815,28.8728,29.9643,31.1313,32.2545,33.2276,33.9594,34.373,34.4041,34.0009,33.1267,31.7649,29.9247,27.6458,24.9992,22.0845,19.0215,15.9397,12.966,10.2127,7.76834,5.69046,4.00296,2.69719,1.73733,1.06907,0.629744,0.358021,0.201123,0.11878,0.0839719,0.0813392,0.104295,0.151634,0.224209,0.321912,0.441478,0.575581,0.713504,0.843351,0.954777,1.04109,1.09974,1.13118,1.13683,1.11758,1.07369,1.0059,0.917066,0.81321,0.703288,0.597775,0.506678,0.437843,0.396256,0.384633,0.405147,0.461496,0.560387,0.71144,0.925262,1.21022,1.56925,1.99788,2.48458,3.01314,3.56626,4.12898,4.69031,5.24283,5.78014,6.29365,6.77004,7.19071,7.53399,7.78019,7.91889])
x = np.arange(10,104,3)
y = np.array([22.4548,23.4302,25.3389,27.9929,30.486,32.0528,33.5527,35.1304,35.9941,36.8606,37.1889,37.723,36.4069,35.9751,33.8824,31.0909,27.4247,23.3213,18.8772,14.3363,11.1075,7.68792,4.54899,2.2057,0,0,0,0,0,0,0.179834,0])

def gaussian(x, amp, cen, wid):
    return (amp / (np.sqrt(2*np.pi) * wid)) * np.exp(-(x-cen)**2 / (2*wid**2))

def line(x, slope, intercept):
    return slope*x + intercept

peak_index = find_peaks(y,height=27.6)[0][0]
mean = sum(x*y)/np.sum(y) #weighted arithmetic mean
mod = Model(gaussian) + Model(line)
pars = mod.make_params(amp=max(y), cen=x[peak_index], wid=np.sqrt(sum((x-mean)**2 * y)/sum(y)), slope=0, intercept=1)
result = mod.fit(y, pars, x=x)
comps = result.eval_components()
plt.plot(x, y, 'bo')
plt.plot(x, comps['gaussian'], 'k--')
plt.plot(x_measured,y_measured)
plt.show()
It is unclear why you expect a bimodal fit with the model you defined. Use two different Gaussian functions for your fit, then evaluate the fitted functions over a longer interval x_fit to see the curves return to baseline:
import numpy as np
from lmfit import Model
import matplotlib.pyplot as plt
from scipy.signal import find_peaks

x = np.array([-20.0,-17.0,-14.0,-11.0,-8.0,-5.0,-2.0,1.0,4.0,7.0,10.0,13.0,16.0,19.0,22.0,25.0,28.0,31.0,34.0,37.0,40.0,43.0,46.0,49.0,52.0,55.0,58.0,61.0,64.0,67.0,70.0,73.0,76.0,79.0,82.0])
y = np.array([1.90269,1.93535,2.62402,3.08949,2.82409,3.07588,3.22015,3.18884,5.14053,10.5111,18.6118,28.6343,37.7625,46.3641,53.9163,60.7622,66.5765,71.0596,74.4948,77.7177,80.373,82.5833,83.9021,83.4652,79.0229,71.4679,61.93,52.113,43.8517,36.211,29.3815,23.8966,19.31,15.5209,12.4532])

def gaussian1(x, amp1, cen1, wid1):
    return (amp1 / (np.sqrt(2*np.pi) * wid1)) * np.exp(-(x-cen1)**2 / (2*wid1**2))

def gaussian2(x, amp2, cen2, wid2):
    return (amp2 / (np.sqrt(2*np.pi) * wid2)) * np.exp(-(x-cen2)**2 / (2*wid2**2))

#peak_index = find_peaks(y,height=27.6)[0][0]
#mean = sum(x*y)/np.sum(y) #weighted arithmetic mean
mod = Model(gaussian1) + Model(gaussian2)
#I just filled in some start values, the details of educated guesses can be filled in later by you
pars = mod.make_params(amp1=30, amp2=40, cen1=20, cen2=40, wid1=2, wid2=2)
result = mod.fit(y, pars, x=x)
print(result.params)
x_fit=np.linspace(-30, 120, 500)
comps_elem = result.eval_components(x=x_fit)
comps_comb = result.eval(x=x_fit)
plt.plot(x, y, 'bo')
plt.plot(x_fit, comps_comb, 'k')
plt.plot(x_fit, comps_elem['gaussian1'], 'k-.')
plt.plot(x_fit, comps_elem['gaussian2'], 'k--')
plt.show()
Sample output: [plot not included]
The corresponding scipy curve_fit version would look like this:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy.optimize import curve_fit

x = [-20.0,-17.0,-14.0,-11.0,-8.0,-5.0,-2.0,1.0,4.0,7.0,10.0,13.0,16.0,19.0,22.0,25.0,28.0,31.0,34.0,37.0,40.0,43.0,46.0,49.0,52.0,55.0,58.0,61.0,64.0,67.0,70.0,73.0,76.0,79.0,82.0]
y = [1.90269,1.93535,2.62402,3.08949,2.82409,3.07588,3.22015,3.18884,5.14053,10.5111,18.6118,28.6343,37.7625,46.3641,53.9163,60.7622,66.5765,71.0596,74.4948,77.7177,80.373,82.5833,83.9021,83.4652,79.0229,71.4679,61.93,52.113,43.8517,36.211,29.3815,23.8966,19.31,15.5209,12.4532]

def gauss(x, mu, sigma, A):
    return A*np.exp(-(x-mu)**2/2/sigma**2)

def bimodal(x, mu1, sigma1, A1, mu2, sigma2, A2):
    return gauss(x, mu1, sigma1, A1) + gauss(x, mu2, sigma2, A2)

expected = (20, 2, 30, 40, 2, 40)
params, cov = curve_fit(bimodal, x, y, expected)
sigma=np.sqrt(np.diag(cov))
x_fit = np.linspace(-20, 120, 500)
plt.plot(x_fit, bimodal(x_fit, *params), color='red', lw=3, label='model')
plt.plot(x_fit, gauss(x_fit, *params[:3]), color='red', lw=1, ls="--", label='distribution 1')
plt.plot(x_fit, gauss(x_fit, *params[3:]), color='red', lw=1, ls=":", label='distribution 2')
plt.scatter(x, y, marker="X", color="black", label="original data")
plt.legend()
print(pd.DataFrame(data={'params': params, 'sigma': sigma}, index=bimodal.__code__.co_varnames[1:]))
plt.show()
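To get from the fitted components to a plume width, one option (an assumption about what "width" should mean here) is the full width at half maximum of each Gaussian, FWHM = 2*sqrt(2 ln 2)*sigma, roughly 2.355*sigma. The sketch below fits the answer's bimodal model to hypothetical, noise-free overlapping peaks sampled on the question's x grid, evaluates each component on a wider grid to confirm that both tails return to zero, and computes the FWHM of each component.

```python
import numpy as np
from scipy.optimize import curve_fit


def gauss(x, mu, sigma, A):
    return A * np.exp(-(x - mu) ** 2 / 2 / sigma ** 2)


def bimodal(x, mu1, sigma1, A1, mu2, sigma2, A2):
    return gauss(x, mu1, sigma1, A1) + gauss(x, mu2, sigma2, A2)


# Hypothetical overlapping peaks, sampled like the question's x grid
x = np.arange(-20.0, 83.0, 3.0)
true_params = (25.0, 8.0, 40.0, 48.0, 10.0, 60.0)
y = bimodal(x, *true_params)

expected = (22, 6, 35, 50, 8, 55)  # rough starting guesses for both peaks
params, cov = curve_fit(bimodal, x, y, p0=expected)

# Evaluate each component on a wider grid: both tails decay to ~0
x_fit = np.linspace(-60, 140, 500)
peak1 = gauss(x_fit, *params[:3])
peak2 = gauss(x_fit, *params[3:])

# Full width at half maximum of each component; sigma only enters squared,
# so it may come back negative, hence abs
fwhm = 2 * np.sqrt(2 * np.log(2)) * np.abs(params[[1, 4]])
print("FWHM of peak 1 and peak 2:", fwhm)
```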
How to plot error bars in python curve fit?
I'm trying to calculate error bars and plot them in Python. I'm a complete beginner in Python plotting. Could someone help me with how to do that? Here is my plot [plot not included] and here is my code. Basically, I want the slope and the intercept, and to fit the deviations to the function. Thanks!
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as mpl

""" Fitting Function"""
def func(x, a, b):
    y = a *np.exp(-1*b/x)
    return y

data = np.loadtxt("S005_CP_0011_N20.dat", skiprows=0, dtype=np.float128)
xData, yData = np.hsplit(data,2)
x = xData[:,0]
y = yData[:,0]
popt, pcov = curve_fit(func, x, y, sigma = None)
fig1= mpl.figure(figsize=(8,6))
mpl.plot(x, func(x, *popt), label="Fit function")
mpl.plot(x, y, 'r.', markersize=10, label="data")
The first part of this problem is calculating the error bars. You cannot really calculate error bars from the data you already have, because an error bar represents the accuracy of each data point. For example, if you were plotting age against height (just an arbitrary example), it would be up to you to find out how accurate your height measurements are; usually this is done by measuring several times and using the spread of the results (e.g. their standard deviation).
The next part is plotting the error bars. With Matplotlib this is quite simple: you can use plt.errorbar(x, y, yerr=error_array, fmt='o'), where error_array is the array containing the error bar height for each of your points, and fmt='o' sets the format of the data points (here, circular markers without a connecting line). For example:
import matplotlib.pyplot as plt

ages = [35, 12, 58, 43, 27, 39, 68]  # Age
heights = [1.75, 1.32, 1.65, 1.49, 1.80, 1.67, 1.83]  # Height
errors = [0.02, 0.1, 0.04, 0.03, 0.09, 0.12, 0.01]  # Error bar for each height
# Sort by age while keeping every height and its error paired with its age
X, Y, error_array = zip(*sorted(zip(ages, heights, errors)))
fig, ax = plt.subplots()
plt.scatter(X, Y)
plt.errorbar(X, Y, yerr=error_array)
plt.show()
EDIT: One thing I forgot to mention is that you should order your X data and keep your Y data in the corresponding order, so that the line graph makes sense. You can do this with Python's built-in sorted() function, sorting the (x, y) pairs together rather than each list separately (sorting the lists separately would break the pairing).
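Following the point that error bars have to come from the measurement itself, here is a minimal sketch assuming each y value was measured several times (the repeated values below are made up). The standard error of the mean of the repeats serves as the error bar, and the same errors are passed to curve_fit through its sigma argument with absolute_sigma=True, so that np.sqrt(np.diag(pcov)) gives one-standard-deviation uncertainties for a and b in the question's model.

```python
import numpy as np
from scipy.optimize import curve_fit


def func(x, a, b):
    # model from the question: a * exp(-b / x)
    return a * np.exp(-b / x)


# Hypothetical data: three repeated measurements of y at each x
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
repeats = np.array([
    [0.37, 0.61, 0.72, 0.78, 0.82],
    [0.36, 0.60, 0.71, 0.77, 0.81],
    [0.38, 0.62, 0.71, 0.79, 0.82],
])

y_mean = repeats.mean(axis=0)
# standard error of the mean = sample standard deviation / sqrt(number of repeats)
y_err = repeats.std(axis=0, ddof=1) / np.sqrt(repeats.shape[0])

popt, pcov = curve_fit(func, x, y_mean, p0=(1.0, 1.0),
                       sigma=y_err, absolute_sigma=True)
perr = np.sqrt(np.diag(pcov))  # 1-sigma uncertainties of a and b
print("a = %.3f +/- %.3f, b = %.3f +/- %.3f" % (popt[0], perr[0], popt[1], perr[1]))
```

The error bars would then be drawn with plt.errorbar(x, y_mean, yerr=y_err, fmt='o'), and the fitted curve with plt.plot(x, func(x, *popt)).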
Scipy curve_fit giving straight line
So I'm trying to get an exponential curve for some COVID data, but I can't get my curve_fit function to show any sort of curve whatsoever. It's so bad that it perfectly overlaps the regression line seaborn generated in my graph. I've tried making both my date and case data smaller/bigger before passing them to curve_fit, but I still get either a similar line or an optimization error. I even tried calculating my function manually, but that was (naturally) also way off.
#Plot scatter plot for total case count
x = df_sb['date_ordinal']
y1 = df_sb['totalcountconfirmed']
y2 = df_sb['totalcountdeaths']
plt.figure(figsize=(14,10))
ax = plt.subplot(1,1,1)

# Plot scatter plot along with linear regression line
sns.regplot(x='date_ordinal', y='totalcountconfirmed', data=df_sb)

# Formatting axes
ax.set_xlim(x.min() - 1, x.max() + 10)
ax.set_ylim(0, y1.max() + 1)
ax.set_xlabel('Date')
labels = [dt.date.fromordinal(int(item)) for item in ax.get_xticks()]
ax.set_xticklabels(labels)
plt.xticks(rotation = 45)
plt.ylabel("Total Confirmed Cases")

# Exponential Curve
from scipy.optimize import curve_fit
from scipy.special import expit

x_data = df_sb['date_ordinal'].to_numpy()
Y_data = df_sb['totalcountconfirmed'].to_numpy()

def func(x, a, b, c):
    return a * expit(-b * x) + c

popt, pcov = curve_fit(func, x_data, Y_data, maxfev=10000)
a, b, c = popt
fit_y = func(x_data, a, b, c)
plt.plot(x_data, fit_y)
plt.legend(['Total Cases (Linear)','Total Cases (Exponential)'])

# Inserting Significant Date Labels
add_sig_dates(df_sb, 'totalcountconfirmed')
plt.show()
Although you did not give access to the data, just by looking at the plot I'm pretty sure you mean
def func(x, a, b, c):
    return a * np.exp(-b * x) + c
instead of
def func(x, a, b, c):
    return a * expit(-b * x) + c
(scipy.special.expit is the logistic sigmoid, 1/(1+exp(-x)), not the exponential function.)
Since it's an exponential fit, I think you should provide an initial guess for the parameters in order to achieve good results. This can be done with the p0 argument. For example:
p0 = [2, 1, 0] # <-- just an example, these are bad guesses
popt, pcov = curve_fit(func, x_data, Y_data, maxfev=10000, p0=p0)
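One more thing that may be worth trying, stated as an assumption since the data is not available: date ordinals are around 737,000, so np.exp(-b * x) overflows or underflows unless |b| is below roughly 0.001, which leaves the optimizer with little to work with. Shifting x so that it counts days since the first sample keeps the exponent in a sensible range. The sketch below uses made-up, noise-free case counts, the answer's func, and a p0 whose growth rate is estimated from the first and last data points.

```python
import numpy as np
from scipy.optimize import curve_fit


def func(x, a, b, c):
    # exponential model from the answer; a negative b means growth
    return a * np.exp(-b * x) + c


# Hypothetical input: date ordinals (as from date.toordinal()) and case counts
x_ordinal = np.arange(737500, 737560, dtype=float)
days = x_ordinal - x_ordinal[0]              # days since the first sample
cases = 50.0 * np.exp(0.08 * days) + 10.0    # made-up, noise-free counts

# Rough starting guess from the data (assumes positive counts):
# growth rate from the first and last point, no offset
b0 = -np.log(cases[-1] / cases[0]) / (days[-1] - days[0])
p0 = (cases[0], b0, 0.0)
popt, pcov = curve_fit(func, days, cases, p0=p0, maxfev=10000)

# plot fit_y against x_ordinal so the existing date axis stays unchanged
fit_y = func(days, *popt)
```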
Specify a function with unknown coefficients as well as data and find coefficients in python
I have a function f(theta) = a+b*cos(theta - c) as well as sampled data. I'd like to find the coefficients a, b, and c that minimize the mean squared error. Is there an efficient way to do this in Python?
EDIT:
import numpy as np
from scipy.optimize import curve_fit

#definition of the function
def myfunc(x, a, b, c):
    return a + b * np.cos(x - c)

#sample data
x_data = [0, 60, 120, 180, 240, 300]
y_data = [25, 40, 70, 30, 10, 15]

#the actual curve fitting procedure, a, b, c are stored in popt
popt, _pcov = curve_fit(myfunc, x_data, y_data)
print(popt)
print(np.degrees(popt[2]))

#the rest is just a graphic representation of the data points and the fitted curve
from matplotlib import pyplot as plt

#x_fit = np.linspace(-1, 6, 1000)
y_fit = myfunc(x_data, *popt)
plt.plot(x_data, y_data, "ro")
plt.plot(x_data, y_fit, "b")
plt.xlabel(r'$\theta$ (degrees)'); plt.ylabel(r'$f(\theta)$'); plt.legend()
plt.show()
Here is a picture [not included] showing how the curve doesn't really fit the points. It seems like the amplitude should be higher. The local minima and maxima appear to be in the right places.
scipy.optimize.curve_fit makes it really easy to fit data points to your custom function:
import numpy as np
from scipy.optimize import curve_fit

#definition of the function
def myfunc(x, a, b, c):
    return a + b * np.cos(x - c)

#sample data
x_data = np.arange(5)
y_data = 2.34 + 1.23 * np.cos(x_data + .23)

#the actual curve fitting procedure, a, b, c are stored in popt
popt, _pcov = curve_fit(myfunc, x_data, y_data)
print(popt)

#the rest is just a graphic representation of the data points and the fitted curve
from matplotlib import pyplot as plt

x_fit = np.linspace(-1, 6, 1000)
y_fit = myfunc(x_fit, *popt)
plt.plot(x_data, y_data, "ro", label = "data points")
plt.plot(x_fit, y_fit, "b", label = "fitted curve\na = {}\nb = {}\nc = {}".format(*popt))
plt.legend()
plt.show()
Output:
[ 2.34  1.23 -0.23]
Edit: Your question update introduces several problems. First, your x-values are in degrees, while np.cos expects values in radians, so we should convert the values with np.deg2rad (the reverse function is np.rad2deg). Second, it is a good idea to fit for different frequencies as well, so let's introduce an additional parameter for that. Third, fits are usually quite sensitive to initial guesses; you can provide them with the p0 parameter in scipy. Fourth, you changed the resolution of the fitted curve to the low resolution of your data points, which is why it looks so undersampled.
If we address all these problems:
import numpy as np
from scipy.optimize import curve_fit

#sample data
x_data = [0, 60, 120, 180, 240, 300]
y_data = [25, 40, 70, 30, 10, 15]

#definition of the function with additional frequency value d
def myfunc(x, a, b, c, d):
    return a + b * np.cos(d * np.deg2rad(x) - c)

#initial guess of parameters a, b, c, d
p_initial = [np.average(y_data), np.average(y_data), 0, 1]

#the actual curve fitting procedure, a, b, c, d are stored in popt
popt, _pcov = curve_fit(myfunc, x_data, y_data, p0 = p_initial)
print(popt)

#we have to convert the phase shift back into degrees
print(np.rad2deg(popt[2]))

#graphic representation of the data points and the fitted curve
from matplotlib import pyplot as plt

#define x_values for a smooth curve representation
x_fit = np.linspace(np.min(x_data), np.max(x_data), 1000)
y_fit = myfunc(x_fit, *popt)
plt.plot(x_data, y_data, "ro", label = "data")
plt.plot(x_fit, y_fit, "b", label = "fit")
plt.xlabel(r'$\theta$ (degrees)'); plt.ylabel(r'$f(\theta)$'); plt.legend()
plt.show()
we get this output:
[34.31293761 26.92479369  2.20852009  1.18144319]
126.53888003953764
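For the original model without a frequency term, a + b*cos(theta - c), there is also a way to avoid initial guesses entirely. This is a different technique from curve_fit, offered as an alternative: the identity b*cos(theta - c) = b*cos(c)*cos(theta) + b*sin(c)*sin(theta) makes the model linear in a, p = b*cos(c) and q = b*sin(c), so an ordinary linear least-squares solve with np.linalg.lstsq gives the mean-squared-error minimum directly. The helper name fit_cosine is hypothetical.

```python
import numpy as np


def fit_cosine(theta_deg, y):
    """Least-squares fit of y = a + b*cos(theta - c) with theta in degrees.

    Returns a, b (>= 0) and the phase c in degrees.
    """
    theta = np.deg2rad(np.asarray(theta_deg, dtype=float))
    y = np.asarray(y, dtype=float)
    # Design matrix for the linear model y = a + p*cos(theta) + q*sin(theta)
    A = np.column_stack([np.ones_like(theta), np.cos(theta), np.sin(theta)])
    (a, p, q), *_ = np.linalg.lstsq(A, y, rcond=None)
    b = np.hypot(p, q)                  # amplitude b = sqrt(p**2 + q**2)
    c = np.rad2deg(np.arctan2(q, p))    # phase shift, back in degrees
    return a, b, c


x_data = [0, 60, 120, 180, 240, 300]
y_data = [25, 40, 70, 30, 10, 15]
a, b, c = fit_cosine(x_data, y_data)
print(a, b, c)
```

Because the fit is linear, the result does not depend on a starting point. It does, however, assume exactly one cycle per 360 degrees; fitting a different frequency is what the extra parameter d in the answer adds.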
Plotting confidence and prediction intervals with repeated entries
I have a correlation plot for two variables: the predictor variable (temperature) on the x-axis and the response variable (density) on the y-axis. My best-fit least-squares regression line is a 2nd-order polynomial. I would also like to plot confidence and prediction intervals. The method described in this answer seems perfect. However, my dataset (n=2340) has repeated entries for many (x,y) pairs. My resulting plot looks like this: [plot not included]
Here is my relevant code (slightly modified from the answer linked above):
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.sandbox.regression.predstd import wls_prediction_std
import statsmodels.formula.api as smf
from statsmodels.stats.outliers_influence import summary_table

d = {'temp': x, 'dens': y}
df = pd.DataFrame(data=d)
x = df.temp
y = df.dens

plt.figure(figsize=(6 * 1.618, 6))
plt.scatter(x,y, s=10, alpha=0.3)
plt.xlabel('temp')
plt.ylabel('density')

# points linearly spaced for predictor variable
x1 = pd.DataFrame({'temp': np.linspace(df.temp.min(), df.temp.max(), 100)})

# 2nd order polynomial
poly_2 = smf.ols(formula='dens ~ 1 + temp + I(temp ** 2.0)', data=df).fit()

# this correctly plots my single 2nd-order poly best-fit line:
plt.plot(x1.temp, poly_2.predict(x1), 'g-', label='Poly n=2 $R^2$=%.2f' % poly_2.rsquared, alpha=0.9)

prstd, iv_l, iv_u = wls_prediction_std(poly_2)
st, data, ss2 = summary_table(poly_2, alpha=0.05)

fittedvalues = data[:,2]
predict_mean_se = data[:,3]
predict_mean_ci_low, predict_mean_ci_upp = data[:,4:6].T
predict_ci_low, predict_ci_upp = data[:,6:8].T

# check we got the right things
print(np.max(np.abs(poly_2.fittedvalues - fittedvalues)))
print(np.max(np.abs(iv_l - predict_ci_low)))
print(np.max(np.abs(iv_u - predict_ci_upp)))

plt.plot(x, y, 'o')
plt.plot(x, fittedvalues, '-', lw=2)
plt.plot(x, predict_ci_low, 'r--', lw=2)
plt.plot(x, predict_ci_upp, 'r--', lw=2)
plt.plot(x, predict_mean_ci_low, 'r--', lw=2)
plt.plot(x, predict_mean_ci_upp, 'r--', lw=2)
The print statements evaluate to 0.0, as expected. However, I need single lines for the polynomial best-fit line and for the confidence and prediction intervals (rather than the multiple lines I currently have in my plot). Any ideas?
Update: Following the first answer from @kpie, I ordered my confidence and prediction interval arrays by temperature:
data_intervals = {'temp': x, 'predict_low': predict_ci_low, 'predict_upp': predict_ci_upp, 'conf_low': predict_mean_ci_low, 'conf_high': predict_mean_ci_upp}
df_intervals = pd.DataFrame(data=data_intervals)
df_intervals_sort = df_intervals.sort_values(by='temp')
This achieved the desired results. [plot not included]
You need to order your predicted values by temperature, I think. To get nice curved lines, you can use numpy.polynomial.polynomial.polyfit, which returns a list of coefficients (lowest degree first). You will have to split the x and y data into two lists so they can be passed to the function. You can then evaluate and plot the polynomial with:
import numpy as np
import matplotlib.pyplot as plt
from numpy.polynomial import polynomial as P

# xs, ys: your data split into two lists; degree: order of the polynomial (e.g. 2)
coeffs = P.polyfit(xs, ys, degree)  # least-squares coefficients, lowest degree first
x = np.linspace(-15, 45, 300)       # your smooth line will be made of 300 pieces
y = P.polyval(x, coeffs)            # evaluates sum(coeffs[n] * x**n)
plt.plot(x, y)
plt.show()
Just remember that every plotted line is really made of straight segments, so a curve only looks smooth when you evaluate it at enough points. The polynomial fitting itself is a least-squares fit. Good luck!
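Another way around the repeated entries, sketched here as an alternative rather than what either post does, is to compute the fitted line and both intervals on a sorted grid of temperatures (like x1 in the question) instead of at the raw data points, so there is nothing to sort. The function poly_intervals is hypothetical and uses the standard OLS formulas: with residual variance s^2 and a grid row g of the design matrix, the confidence half-width is t*sqrt(s^2 * g'(X'X)^-1 g) and the prediction half-width is t*sqrt(s^2 * (1 + g'(X'X)^-1 g)). Within statsmodels, poly_2.get_prediction(x1).summary_frame(alpha=0.05) should give the same bands for the question's model.

```python
import numpy as np
from scipy import stats


def poly_intervals(x, y, x_grid, degree=2, alpha=0.05):
    """OLS polynomial fit with confidence and prediction bands on x_grid.

    Returns fit, conf_low, conf_upp, pred_low, pred_upp, all evaluated on
    x_grid, so repeated or unsorted (x, y) pairs need no special handling.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.vander(x, degree + 1, increasing=True)     # columns 1, x, x**2, ...
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    n, p = X.shape
    resid = y - X @ beta
    s2 = resid @ resid / (n - p)                        # residual variance
    XtX_inv = np.linalg.inv(X.T @ X)

    G = np.vander(np.asarray(x_grid, dtype=float), degree + 1, increasing=True)
    fit = G @ beta
    lev = np.einsum("ij,jk,ik->i", G, XtX_inv, G)       # g'(X'X)^-1 g per grid point
    t = stats.t.ppf(1 - alpha / 2, n - p)
    conf = t * np.sqrt(s2 * lev)                        # band for the mean response
    pred = t * np.sqrt(s2 * (1 + lev))                  # band for a new observation
    return fit, fit - conf, fit + conf, fit - pred, fit + pred


# Hypothetical data with many repeated temperatures (stand-in for the real data)
rng = np.random.default_rng(1)
temp = np.repeat(np.linspace(-5, 30, 36), 5)
dens = 1.0 + 0.002 * temp - 0.0001 * temp ** 2 + rng.normal(0, 0.01, temp.size)

grid = np.linspace(temp.min(), temp.max(), 100)   # sorted, like x1 in the question
fit, conf_low, conf_upp, pred_low, pred_upp = poly_intervals(temp, dens, grid)
```

Each returned array can be passed straight to plt.plot(grid, ...), giving exactly one line per band.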