How to plot error bars in a Python curve fit?

I'm trying to calculate error bars and plot them in Python. I'm a complete beginner at Python plotting. Could someone help me with how to do that?
Here is my plot
Here is my code!! What I want are the slope and the intercept of the fit, along with the deviations of the fitted parameters. Thanks!!
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as mpl
""" Fitting Function"""
def func(x, a, b):
    y = a * np.exp(-1 * b / x)
    return y
data = np.loadtxt("S005_CP_0011_N20.dat", skiprows=0, dtype=np.float128)
xData, yData = np.hsplit(data,2)
x = xData[:,0]
y = yData[:,0]
popt, pcov = curve_fit(func, x, y, sigma = None)
fig1= mpl.figure(figsize=(8,6))
mpl.plot(x, func(x, *popt), label="Fit function")
mpl.plot(x, y, 'r.', markersize=10, label="data")
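Since the question also asks for the deviations of the fitted parameters: one common approach (a sketch, not part of the original post) is to read them off the covariance matrix that curve_fit already returns. Continuing the snippet above:
perr = np.sqrt(np.diag(pcov))  # one-sigma uncertainties of a and b
print("a = %g +/- %g" % (popt[0], perr[0]))
print("b = %g +/- %g" % (popt[1], perr[1]))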

The first part of this problem is calculating the error bars. Strictly speaking, there is no such thing as calculating an error bar: an error bar represents the accuracy of each data point, so you cannot derive it from the data you already have.
For example, if you were plotting age against height (just an arbitrary example) it would be on you to find out how accurate your measurement of height would be - usually this is done by taking an average of multiple measurements.
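For instance, if each height had been measured several times, the mean and the sample standard deviation of the repeats could serve as the plotted value and its error bar. A minimal sketch with made-up numbers:
import numpy as np

trials = np.array([[1.74, 1.76, 1.75],   # hypothetical repeated height measurements:
                   [1.31, 1.33, 1.32]])  # one row per person, three trials each
heights = trials.mean(axis=1)            # the value to plot
height_err = trials.std(axis=1, ddof=1)  # sample standard deviation as the error bar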
The next part is plotting an error bar. With Matplotlib this is quite simple, as you can just use plt.errorbar(x, y, yerr=error_array, fmt='o'), where error_array is the array containing the error bar height for each of your points, and fmt='o' draws each data point as a circle marker; the error bars themselves appear as vertical lines by default. For example:
import matplotlib.pyplot as plt
ages = [35, 12, 58, 43, 27, 39, 68]  # Age
heights = [1.75, 1.32, 1.65, 1.49, 1.80, 1.67, 1.83]  # Height
errors = [0.02, 0.1, 0.04, 0.03, 0.09, 0.12, 0.01]  # Error bar for height
# sort all three lists together by age, so points, line, and error bars stay paired
X, Y, error_array = zip(*sorted(zip(ages, heights, errors)))
fig, ax = plt.subplots()
plt.scatter(X, Y)
plt.errorbar(X, Y, yerr=error_array)
plt.show()
EDIT: Oh, one thing I forgot to mention is that you must order your X data and keep the Y data (and error values) matched to that order, so that the line graph makes sense. Do this with Python's built-in sorted(); sorting the lists independently would break the pairing, which is why the snippet above zips them together before sorting.

Related

Smoothing a lineplot: fix TypeError for _Interpolator1D _y_axis

I am trying to smooth a lineplot with scipy.interpolate. However, for some reason I get an error with this method.
This my code:
import numpy as np
from scipy import interpolate
import matplotlib.pyplot as plt
x = np.array([1348.4256 , 1342.99776, 1345.86432, 1352.97024, 1353.09312, 1355.0304])
y = np.array([232.2108 , 233.60184, 236.09988, 235.40544, 235.51776, 238.42728])
smooth = interpolate.interp1d(x, y, 'cubic')
y_range = np.linspace(min(y), max(y), 20)
plt.plot(smooth(y_range), y_range)
plt.plot(x, y, linewidth=1)
plt.plot(smooth(y_range), y_range)
plt.show()
The error I get is TypeError: descriptor '_y_axis' for '_Interpolator1D' objects doesn't apply to 'interp1d' object
My question is: what can I do to resolve this error and get a smooth plot over this line?
Your smooth() function works fine within the limits min(x) to max(x), where it returns interpolated y values. When you feed it any value that falls beyond those limits, you will get an error; here that happens because smooth() is being called with y values instead of x values.
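To make that concrete, here is a minimal sketch of the failure mode with made-up numbers (by default interp1d raises a ValueError for out-of-range inputs):
from scipy import interpolate
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([10.0, 20.0, 15.0, 30.0])
f = interpolate.interp1d(x, y, 'cubic')
f(2.5)  # fine: 2.5 lies inside [x.min(), x.max()]
# f(5.0) would raise ValueError: a value in x_new is above the interpolation range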
If you want to interpolate for x by specifying values of y, try this code:
smooth_for_y = interpolate.interp1d(y, x, 'cubic')
y_range = np.linspace(min(y), max(y), 20)
The interpolation:
smooth_for_y(y_range)
Output:
array([1348.4256 , 1344.64896129, 1342.64269324, 1342.09678777,
1342.70123678, 1344.14603221, 1346.12116596, 1348.31662995,
1350.4224161 , 1352.12851632, 1353.09777093, 1350.92268063,
1345.07380415, 1337.3796737 , 1329.67182218, 1323.78178249,
1321.54108752, 1324.78127019, 1335.33386338, 1355.0304 ])
Here is a possible solution based on some assumptions (mentioned in the code).
import numpy as np
from scipy import interpolate
import matplotlib.pyplot as plt
# raw data
x = np.array([1348.4256 , 1342.99776, 1345.86432, 1352.97024, 1353.09312, 1355.0304])
y = np.array([232.2108 , 233.60184, 236.09988, 235.40544, 235.51776, 238.42728])
smooth = interpolate.interp1d(x, y, 'cubic')
x_range = np.linspace(min(x), max(x), 20, endpoint=True)
y_smoothed = smooth(x_range)
# plot data as red dots
plt.scatter(x, y, linewidth=1, color="red")
# plot smoothed line across data
# assuming first point begins at xmin, ends at xmax, thus ...
# ignoring the sequence of the original data
plt.plot(x_range, y_smoothed)
plt.show()
The plot:

Q: how to calculate a slope with measurement errors in python?

I don't know much about Python, or math for that matter, but I need to use it for an assignment. I am trying to calculate the slope of a graph and I need to include the error (STD) in the calculation.
Let's say I calculated y for x (the results are shown in the code), and let's say every measurement has a +-2 error (STD). I found out how to show the error on the graph, but not how to include it in the calculation, so right now the STD is 0 and R^2 is 1, which is obviously wrong.
How do I calculate and show the inaccuracy of the results?
from matplotlib import pyplot as plt
import numpy as np
from scipy.stats import linregress
x = [1,2,3,4,5,6,7,8,9]
y = [1,2,3,4,5,6,7,8,9]
std = [2,2,2,2,2,2,2,2,2]
plt.errorbar(x, y, yerr=std, fmt='o', color='cadetblue', ecolor='red')
coef = np.polyfit(x,y,1)
poly1d_fn = np.poly1d(coef)
plt.plot(x, poly1d_fn(x), '--k', label='linear fit')
plt.legend(loc='upper left', fontsize=10)
plt.grid(color='black', linestyle='-', linewidth=0.1)
linregress(x,y)
thanks
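One way to fold the +-2 measurement error into the fit (a sketch, not from the original thread): pass it to scipy.optimize.curve_fit as sigma and read the parameter uncertainties off the covariance matrix.
import numpy as np
from scipy.optimize import curve_fit

x = np.arange(1, 10, dtype=float)
y = np.arange(1, 10, dtype=float)
std = np.full_like(y, 2.0)  # +-2 error (STD) on every measurement

def line(x, m, c):
    return m * x + c

# sigma weights the residuals; absolute_sigma=True treats the errors as
# absolute standard deviations rather than relative weights
popt, pcov = curve_fit(line, x, y, sigma=std, absolute_sigma=True)
perr = np.sqrt(np.diag(pcov))
print("slope = %.3f +/- %.3f" % (popt[0], perr[0]))
print("intercept = %.3f +/- %.3f" % (popt[1], perr[1]))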

Plotting contour lines that show percentage of particles

What I am trying to produce is something similar to this plot: a contour plot representing the 68%, 95%, and 99.7% levels of the particles contained in two data sets.
So far, I have tried to implement a Gaussian KDE estimate and to plot the resulting density as a contour.
Files are added here https://www.dropbox.com/sh/86r9hf61wlzitvy/AABG2mbmmeokIiqXsZ8P76Swa?dl=0
from scipy.stats import gaussian_kde
import matplotlib.pyplot as plt
import numpy as np
# My data
x = RelDist
y = RadVel
# Peform the kernel density estimate
k = gaussian_kde(np.vstack([RelDist, RadVel]))
xi, yi = np.mgrid[x.min():x.max():x.size**0.5*1j, y.min():y.max():y.size**0.5*1j]
zi = k(np.vstack([xi.flatten(), yi.flatten()]))
fig = plt.figure()
ax = fig.gca()
CS = ax.contour(xi, yi, zi.reshape(xi.shape), colors='darkslateblue')
plt.clabel(CS, inline=1, fontsize=10)
ax.set_xlim(20, 800)
ax.set_ylim(-450, 450)
ax.set_xscale('log')
plt.show()
Producing this:
Where 1) I do not know how to control the bin number in gaussian_kde, 2) the contour labels are all zero, and 3) I have no clue how to determine the percentiles.
Any help is appreciated.
taken from this example in the matplotlib documentation
you can transform your data zi to a relative 0-1 scale and then make the contour plot.
You can also manually determine the levels of the contour plot when you call plt.contour().
Below is an example with 2 randomly generated normal bivariate distributions:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import multivariate_normal

delta = 0.025
x = y = np.arange(-3.0, 3.01, delta)
X, Y = np.meshgrid(x, y)
pos = np.dstack((X, Y))
# mlab.bivariate_normal was removed from matplotlib, so
# scipy.stats.multivariate_normal is used here instead
Z1 = multivariate_normal([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]).pdf(pos)
Z2 = multivariate_normal([1.0, 1.0], [[1.5**2, 0.0], [0.0, 0.5**2]]).pdf(pos)
Z = 10 * (Z1 - Z2)
# transform Z to a 0-1 range
Z = (Z - Z.min()) / (Z.max() - Z.min())
levels = [0.68, 0.95, 0.997]
origin = 'lower'
CS = plt.contour(X, Y, Z, levels,
                 colors=('k',),
                 linewidths=(3,),
                 origin=origin)
plt.clabel(CS, fmt='%2.3f', colors='b', fontsize=14)
Using the data you provided the code works just as well:
from scipy.stats import gaussian_kde
import matplotlib.pyplot as plt
import numpy as np
RadVel = np.loadtxt('RadVel.txt')
RelDist = np.loadtxt('RelDist.txt')
x = RelDist
y = RadVel
k = gaussian_kde(np.vstack([RelDist, RadVel]))
xi, yi = np.mgrid[x.min():x.max():x.size**0.5*1j, y.min():y.max():y.size**0.5*1j]
zi = k(np.vstack([xi.flatten(), yi.flatten()]))
# set zi to a 0-1 scale
zi = (zi - zi.min()) / (zi.max() - zi.min())
zi = zi.reshape(xi.shape)
# set up the plot
origin = 'lower'
levels = [0, 0.1, 0.25, 0.5, 0.68, 0.95, 0.975, 1]
CS = plt.contour(xi, yi, zi, levels=levels,
                 colors=('k',),
                 linewidths=(1,),
                 origin=origin)
plt.clabel(CS, fmt='%.3f', colors='b', fontsize=8)
plt.xlim(10, 1000)
plt.xscale('log')
plt.ylim(-200, 200)
The answer from @Tkanno is programmatically correct but does not do exactly what was asked in the question.
The KDE returns the likelihood of a sample under the modeled distribution, so the contours are limits on the probability of a sample. The 0.1 contour would show the limit beyond which samples have less than a 10% chance of appearing under the modeled distribution. By normalising the z values as Tkanno proposes, relative probabilities are plotted instead: in Tkanno's answer the 0.1 contour is the limit beyond which samples are ten times less likely to appear than the most likely sample.
You could get very similar contour plots (yet not smoothed) by doing a 2d histogram, normalizing by the most frequent bin, and plotting the contours with the same levels; a sketch follows at the end of this answer.
This is not to be confused with a limit containing 90% of the data.
Contour plots that encompass a given fraction of the data are a bit more complicated to get (cf. https://stats.stackexchange.com/questions/68105/contours-containing-a-given-fraction-of-x-y-points and the solution with bag plots).
Apparently there is an implementation of bag plots in R; maybe someone has made, or will make, one for Python.
To illustrate the difficulty of the question: think of a dataset with 100 points. Any volume containing 95 of them and excluding the other 5 would technically answer it. What is probably implicitly asked for is the smallest volume containing 95 points (hence representing the highest density), and that is a combinatorial optimisation problem.
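A minimal sketch of that 2d-histogram variant, on synthetic normal data rather than the RelDist/RadVel files:
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = rng.normal(size=1000)
# 2d histogram of the samples, normalized by the most frequent bin
H, xedges, yedges = np.histogram2d(x, y, bins=30)
H = H / H.max()
# bin centers for contouring (histogram2d puts x along axis 0, hence the transpose)
xc = 0.5 * (xedges[:-1] + xedges[1:])
yc = 0.5 * (yedges[:-1] + yedges[1:])
plt.contour(xc, yc, H.T, levels=[0.1, 0.25, 0.5, 0.68, 0.95])
plt.show()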

Wrong graph with scipy.optimize.curve_fit (similar to moving average)

I am trying to fit an exponential law to my data. My (x, y) sample is rather complicated to explain, so for general understanding and reproducibility I will say that both variables are float and continuous, with 0 <= x <= 100 and 0 <= y <= 1.
from scipy.optimize import curve_fit
import numpy
import matplotlib.pyplot as plt
#ydata=[...] is my list with y values, which contains 0 values
#xdata=[...] is my list with x values
transf_y = []
for i in range(len(ydata)):
    transf_y.append(ydata[i] + 0.00001)  # Adding something to avoid zero values
x = numpy.array(xdata, dtype=float)
y = numpy.array(transf_y, dtype=float)
def func(x, a, c, d):
    return a * numpy.exp(-c*x) + d
popt, pcov = curve_fit(func, x, y, p0=(1, 1e-6, 1))
print ("a = %s , c = %s, d = %s" % (popt[0], popt[1], popt[2]))
xx = numpy.linspace(300, 6000, 1000)
yy = func(xx, *popt)
plt.plot(x,y,label='Original Data')
plt.plot(xx, yy, label="Fitted Curve")
plt.legend(loc='upper left')
plt.show()
Now my fitted curve doesn't look anything like a fitted exponential curve. Rather, it looks like a moving-average curve, as if such a curve had been added as a trendline in Excel. What could be the problem? If necessary I'll find a way to make the datasets available so the example is reproducible.
This is what I get out of my code (I don't even know why the legend has three entries when, as far as I can tell, only two things are plotted):
A multitude of things:
your plot depicts the original data twice and no discernible fitted data
your data does not seem to be ordered, which I assume is why you get zigzag lines
in your example the predicted plot covers the range 300 to 6000, whereas your raw data has 0 <= x <= 100
That aside, your code is more or less correct and works.
from scipy.optimize import curve_fit
import numpy
import matplotlib.pyplot as plt
xdata=[100.0, 0.0, 90.0, 20.0, 80.0] # your list with x values - edit: you need some raw data to fit, so I inserted some
ydata=[0.001, 1.0, 0.02, 0.56, 0.03] # your list with y values, which contains values close to zero
transf_y = []
for i in range(len(ydata)):
    transf_y.append(ydata[i] + 0.00001)  # Adding something to avoid zero values
x1=numpy.array(xdata,dtype=float)
y1=numpy.array(transf_y,dtype=float)
def func(x, a, c, d):
    return a * numpy.exp(-c*x) + d
popt, pcov = curve_fit(func, x1, y1, p0=(1, 1e-6, 1))
print ("a = %s , c = %s, d = %s" % (popt[0], popt[1], popt[2]))
# ok, sorting your data
pairs = []
for i, j in zip(x1, y1):
    pairs.append([i, j])
sortedList = sorted(pairs, key=lambda pair: pair[0])
sorted_x = numpy.array(sortedList)[:, 0]
sorted_y = numpy.array(sortedList)[:, 1]
#adjusting interval to the limits of your raw data
xx = numpy.linspace(0, 100.0, 1000)
yy = func(xx, *popt)
#and everything looks fine
plt.plot(sorted_x,sorted_y, 'o',label='Original Data')
plt.plot(xx,yy,label='Fitted Data')
plt.legend(loc='upper left')
plt.show()

Plotting confidence and prediction intervals with repeated entries

I have a correlation plot for two variables, the predictor variable (temperature) on the x-axis, and the response variable (density) on the y-axis. My best fit least squares regression line is a 2nd order polynomial. I would like to also plot confidence and prediction intervals. The method described in this answer seems perfect. However, my dataset (n=2340) has repeated entries for many (x,y) pairs. My resulting plot looks like this:
Here is my relevant code (slightly modified from linked answer above):
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.sandbox.regression.predstd import wls_prediction_std
import statsmodels.formula.api as smf
from statsmodels.stats.outliers_influence import summary_table
d = {'temp': x, 'dens': y}
df = pd.DataFrame(data=d)
x = df.temp
y = df.dens
plt.figure(figsize=(6 * 1.618, 6))
plt.scatter(x,y, s=10, alpha=0.3)
plt.xlabel('temp')
plt.ylabel('density')
# points linearly spaced for predictor variable
x1 = pd.DataFrame({'temp': np.linspace(df.temp.min(), df.temp.max(), 100)})
# 2nd order polynomial
poly_2 = smf.ols(formula='dens ~ 1 + temp + I(temp ** 2.0)', data=df).fit()
# this correctly plots my single 2nd-order poly best-fit line:
plt.plot(x1.temp, poly_2.predict(x1), 'g-',
         label='Poly n=2 $R^2$=%.2f' % poly_2.rsquared, alpha=0.9)
prstd, iv_l, iv_u = wls_prediction_std(poly_2)
st, data, ss2 = summary_table(poly_2, alpha=0.05)
fittedvalues = data[:,2]
predict_mean_se = data[:,3]
predict_mean_ci_low, predict_mean_ci_upp = data[:,4:6].T
predict_ci_low, predict_ci_upp = data[:,6:8].T
# check we got the right things
print(np.max(np.abs(poly_2.fittedvalues - fittedvalues)))
print(np.max(np.abs(iv_l - predict_ci_low)))
print(np.max(np.abs(iv_u - predict_ci_upp)))
plt.plot(x, y, 'o')
plt.plot(x, fittedvalues, '-', lw=2)
plt.plot(x, predict_ci_low, 'r--', lw=2)
plt.plot(x, predict_ci_upp, 'r--', lw=2)
plt.plot(x, predict_mean_ci_low, 'r--', lw=2)
plt.plot(x, predict_mean_ci_upp, 'r--', lw=2)
The print statements evaluate to 0.0, as expected.
However, I need single lines for the polynomial best-fit line and for the confidence and prediction intervals (rather than the multiple lines currently in my plot). Any ideas?
Update:
Following the first answer from @kpie, I ordered my confidence and prediction interval arrays according to temperature:
data_intervals = {'temp': x, 'predict_low': predict_ci_low, 'predict_upp': predict_ci_upp, 'conf_low': predict_mean_ci_low, 'conf_high': predict_mean_ci_upp}
df_intervals = pd.DataFrame(data=data_intervals)
df_intervals_sort = df_intervals.sort_values(by='temp')  # DataFrame.sort was removed from pandas; sort_values is the current API
This achieved the desired results:
You need to order your predicted values based on temperature, I think.*
So to get nice curvy lines you will have to use numpy.polynomial.polynomial.polyfit, which returns a list of coefficients. You will have to split the x and y data into two lists so they fit into the function.
You can then plot this function with:
import numpy as np
import matplotlib.pyplot as plt

def strPolynomialFromArray(coeffs):
    # build a string like "c0*x**0+c1*x**1+..." from the fitted coefficients
    return "".join(str(k) + "*x**" + str(n) + "+" for n, k in enumerate(coeffs))[0:-1]

coeffs = np.polynomial.polynomial.polyfit(xs, ys, degree)  # xs, ys, degree: your split data and fit order
x = np.linspace(-15, 45, 300)  # your smooth line will be made of 300 smooth pieces
y = eval(strPolynomialFromArray(coeffs))  # eval, not exec: exec always returns None
plt.plot(x, y)
You can look more into plotting smooth lines here; just remember all lines are linear splines, because continuous curvature is irrational.
I believe that the polynomial fitting is done with least squares fitting (process described here)
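As an aside, numpy can also evaluate the fitted polynomial directly, which avoids the string-building and eval step entirely. A sketch under the same assumed xs, ys and degree:
import numpy as np
import matplotlib.pyplot as plt

coeffs = np.polynomial.polynomial.polyfit(xs, ys, degree)  # xs, ys, degree as above
x = np.linspace(-15, 45, 300)
y = np.polynomial.polynomial.polyval(x, coeffs)  # evaluates sum(c[n] * x**n)
plt.plot(x, y)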
Good Luck!
