How Can I calculate the uncertainty of a fourth degree polynomial fit?

How Can I calculate the uncertainty of a fourth degree polynomial fit? - python

I have an experimental data with relative uncertainties. And I would applied a fourth degree polynomial considering a specified model with logarithms. The issue im facing is how can I get an uncertainty on the fit considering the experimental data. I have no idea how can I do this. Any suggestion would be appreciated.
Thank you.
import numpy as np
import matplotlib.pyplot as plt
x = [121.78, 244.69, 344.278, 411.1165, 778.904, 867.38, 964.079, 1112.076, 1212.948, 1299.142, 1408.013]
y = [2.98E-03, 2.18E-03, 1.64E-03, 1.79E-03, 8.09E-04, 7.54E-04, 6.80E-04, 6.11E-04, 5.83E-04, 5.49E-04, 5.01E-04]
error_abs = [4.72E-05, 3.64E-05, 2.63E-05, 3.38E-05, 1.38E-05, 1.50E-05, 1.15E-05, 1.06E-05, 1.66E-05, 1.51E-05, 8.41E-06] # y_err
# x and y in logarithms
t1 = np.log(x)
t2 = np.log(y)
l = np.linspace(np.min(t1),np.max(t1),1000)
p,cov = np.polyfit(t1,t2,4, cov=True)
fit = np.polyval(p,l)
plt.errorbar(x, y, yerr=error_abs, lw=0.5, capsize=2, capthick=0.5, linestyle='none', marker='s',markersize=3, label='efficiency', color='orange')
plt.plot(np.exp(l),np.exp(fit), label ='Fit')
plt.show()
print(np.sqrt(np.diag(cov)))

Related

Curve fitting using python

I am trying to fit a curve for a set of points using numpy and scipy libraries but am getting a closed curve as shown below.
Could anyone let me know how to fit a curve without closing curve?
The code I followed is:
import numpy as np
from scipy.interpolate import splprep, splev
import matplotlib.pyplot as plt
coords = np.array([(3,8),(3,9),(4,10),(5,11),(6,11), (7,13), (9,13),(10,14),(11,14),(12,14),(14,16),(16,17),(17,18),(18,18),(19,18), (20,19),
(21,19),(22,20),(23,20),(24,21),(26,21),(27,21),(28,21),(30,21),(32,20),(33,20),(32,17),(33,16),(33,15),(34,12), (34,10),(33,10),
(33,9),(33,8),(33,6),(34,6),(34,5)])
tck, u = splprep(coords.T, u=None, s=0.0, per=1)
u_new = np.linspace(u.min(), u.max(), 1000)
x_new, y_new = splev(u_new, tck, der=0)
plt.plot(coords[:,1], coords[:,0], 'ro')
plt.plot(y_new, x_new, 'b--')
plt.show()
Output:
I need output without joining the 1st and last point.
Thank you.

Just set per parameter to 0 in scipy.interpolate.splprep:
tck, u = splprep(coords.T, u=None, s=0.0, per=0)

Fitting a line through 3D x,y,z scatter plot data

I have a handful of data points that cluster along a line in 3d space. I have the x,y,z data in a csv file that I want to import. I would like to find an equation that represents that line, or the plane perpendicular to that line, or whatever is mathematically correct. These data are independent of each other. Maybe there are better ways to do this than what I tried to do but...
I attempted to replicate an old post here that seemed to be doing exactly what I'm trying to do
Fitting a line in 3D
but it seems that maybe updates over the past decade have left the second part of the code not working? Or maybe I'm just doing something wrong. I've included the entire thing that I frankensteined together from this at the bottom. There are two lines that seem to be giving me a problem.
I've snippeted them out here...
import numpy as np
pts = np.add.accumulate(np.random.random((10,3)))
x,y,z = pts.T
# this will find the slope and x-intercept of a plane
# parallel to the y-axis that best fits the data
A_xz = np.vstack((x, np.ones(len(x)))).T
m_xz, c_xz = np.linalg.lstsq(A_xz, z)[0]
# again for a plane parallel to the x-axis
A_yz = np.vstack((y, np.ones(len(y)))).T
m_yz, c_yz = np.linalg.lstsq(A_yz, z)[0]
# the intersection of those two planes and
# the function for the line would be:
# z = m_yz * y + c_yz
# z = m_xz * x + c_xz
# or:
def lin(z):
x = (z - c_xz)/m_xz
y = (z - c_yz)/m_yz
return x,y
#verifying:
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
fig = plt.figure()
ax = Axes3D(fig)
zz = np.linspace(0,5)
xx,yy = lin(zz)
ax.scatter(x, y, z)
ax.plot(xx,yy,zz)
plt.savefig('test.png')
plt.show()
They return this, but no values...
FutureWarning: rcond parameter will change to the default of machine precision times max(M, N) where M and N are the input matrix dimensions.
To use the future default and silence this warning we advise to pass rcond=None, to keep using the old, explicitly pass rcond=-1.
m_xz, c_xz = np.linalg.lstsq(A_xz, z)[0]
FutureWarning: rcond parameter will change to the default of machine precision times max(M, N) where M and N are the input matrix dimensions.
To use the future default and silence this warning we advise to pass rcond=None, to keep using the old, explicitly pass rcond=-1.
m_yz, c_yz = np.linalg.lstsq(A_yz, z)[0]
I don't know where to go from here. I don't even actually need the plot, I just needed an equation and am ill-equipped to move forward. If anyone knows an easier way to do this, or can point me in the right direction, I'm willing to learn, but I'm very, very lost. Thank you in advance!!
Here is my entire frankensteined code in case that is what is causing the issue.
import pandas as pd
import numpy as np
mydataset = pd.read_csv('line1.csv')
x = mydataset.iloc[:,0]
y = mydataset.iloc[:,1]
z = mydataset.iloc[:,2]
data = np.concatenate((x[:, np.newaxis],
y[:, np.newaxis],
z[:, np.newaxis]),
axis=1)
# Calculate the mean of the points, i.e. the 'center' of the cloud
datamean = data.mean(axis=0)
# Do an SVD on the mean-centered data.
uu, dd, vv = np.linalg.svd(data - datamean)
# Now vv[0] contains the first principal component, i.e. the direction
# vector of the 'best fit' line in the least squares sense.
# Now generate some points along this best fit line, for plotting.
# we want it to have mean 0 (like the points we did
# the svd on). Also, it's a straight line, so we only need 2 points.
linepts = vv[0] * np.mgrid[-100:100:2j][:, np.newaxis]
# shift by the mean to get the line in the right place
linepts += datamean
# Verify that everything looks right.
import matplotlib.pyplot as plt
import mpl_toolkits.mplot3d as m3d
ax = m3d.Axes3D(plt.figure())
ax.scatter3D(*data.T)
ax.plot3D(*linepts.T)
plt.show()
# this will find the slope and x-intercept of a plane
# parallel to the y-axis that best fits the data
A_xz = np.vstack((x, np.ones(len(x)))).T
m_xz, c_xz = np.linalg.lstsq(A_xz, z)[0]
# again for a plane parallel to the x-axis
A_yz = np.vstack((y, np.ones(len(y)))).T
m_yz, c_yz = np.linalg.lstsq(A_yz, z)[0]
# the intersection of those two planes and
# the function for the line would be:
# z = m_yz * y + c_yz
# z = m_xz * x + c_xz
# or:
def lin(z):
x = (z - c_xz)/m_xz
y = (z - c_yz)/m_yz
return x,y
print(x,y)
#verifying:
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
fig = plt.figure()
ax = Axes3D(fig)
zz = np.linspace(0,5)
xx,yy = lin(zz)
ax.scatter(x, y, z)
ax.plot(xx,yy,zz)
plt.savefig('test.png')
plt.show()

As was proposed in the old post you refer to, you could also make use of principal component analysis instead of a least squares approach. For that I suggest sklearn.decomposition.PCA from the sklearn package.
An example can be found below using the csv-file you provided.
import pandas as pd
import numpy as np
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
mydataset = pd.read_csv('line1.csv')
x = mydataset.iloc[:,0]
y = mydataset.iloc[:,1]
z = mydataset.iloc[:,2]
coords = np.array((x, y, z)).T
pca = PCA(n_components=1)
pca.fit(coords)
direction_vector = pca.components_
print(direction_vector)
# Create plot
origin = np.mean(coords, axis=0)
euclidian_distance = np.linalg.norm(coords - origin, axis=1)
extent = np.max(euclidian_distance)
line = np.vstack((origin - direction_vector * extent,
origin + direction_vector * extent))
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(coords[:, 0], coords[:, 1], coords[:,2])
ax.plot(line[:, 0], line[:, 1], line[:, 2], 'r')

You can get rid of the complaint from leastsquares by adding rcond=None like this:
m_xz, c_xz = np.linalg.lstsq(A_xz, z, rcond=None)[0]
Is this the right decision for your situation? I have no idea. But there's more about it in the docs.
When I run your code with your inputs it seems to run just fine and I get values assigned to m_xz, c_xz, etc. If you don't call them explicitly with print('m_xz') (or whatever) then you won't see them.
m_xz
Out[42]: 5.186132604596112
c_xz
Out[43]: 62.5764694106141
Also, you reference your data in kind of two different ways. You get x, y, and z from your csv, but also put it into a numpy array. You can get rid of the duplication and pandas by just using numpy:
data = np.genfromtxt('line1.csv', delimiter=',', skip_header=1)
x = data[:,0]
y = data[:,1]
z = data[:,2]

Plotting and modeling data with lmfit - Fit doesn't match data. What am I doing wrong?

I have some data I'm trying to model with lmfit's Model.
Specifically, I'm measuring superconducting resistors. I'm trying fit the experimental data (resistance vs. temperature) to a model which incorporates the critical temperature Tc (material dependent), the resistance below Tc (nominally 0), and the resistance above Tc (structure dependent).
Here's a simplified version (with simulated data) of the code I'm using to plot my data, along with the output plot.
I'm not getting any errors but, as you can see, I'm also not getting a fit that matches my data.
What am I doing wrong? This is my first time using lmfit and Model, so I may be making a newbie mistake. I thought I was following the lmfit example but, as I said, I'm obviously doing something wrong.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from lmfit import Model
def main():
x = np.linspace(0, 12, 50)
x_ser = pd.Series(x) # Simulated temperature data
y1 = [0] * 20
y2 = [10] * 30
y1_ser = pd.Series(y1) # Simulated resistance data below Tc
y2_ser = pd.Series(y2) # Simulated resistance data above Tc (
y_ser = y1_ser.append(y2_ser, ignore_index=True)
xcrit_model = Model(data_equation)
params = xcrit_model.make_params(y1_guess=0, y2_guess=12, xcrit_guess=9)
print('params: {}'.format(params))
result = xcrit_model.fit(y_ser, params, x=x_ser)
print(result.fit_report())
plt.plot(x_ser, y_ser, 'bo', label='simulated data')
plt.plot(x_ser, result.init_fit, 'k.', label='initial fit')
plt.plot(x_ser, result.best_fit, 'r:', label='best fit')
plt.legend()
plt.show()
def data_equation(x, y1_guess, y2_guess, xcrit_guess):
x_lt_xcrit = x[x < xcrit_guess]
x_ge_xcrit = x[x >= xcrit_guess]
y1 = [y1_guess] * x_lt_xcrit.size
y1_ser = pd.Series(data=y1)
y2 = [y2_guess] * x_ge_xcrit.size
y2_ser = pd.Series(data=y2)
y = y1_ser.append(y2_ser, ignore_index=True)
return y
if __name__ == '__main__':
main()

lmfit (and basically all similar solvers) work with continuous variables and investigate how they alter the result by making tiny changes in the parameter values and seeing how that effects this fit.
But your xcrit_guess parameter is used only as a discrete variable. If its value changes from 9.0000 to 9.00001, the fit will not change at all.
So, basically, don't do:
x_lt_xcrit = x[x < xcrit_guess]
x_ge_xcrit = x[x >= xcrit_guess]
Instead, you should use a smoother sigmoidal step function. In fact, lmfit has one of these built-in. So you might try something like this (note, there is no point in converting numpy.arrays to pandas.Series - the code will just turn these back to numpy arrays anyway):
import numpy as np
from lmfit.models import StepModel
import matplotlib.pyplot as plt
x = np.linspace(0, 12, 50)
y = 9.5*np.ones(len(x))
y[:26] = 0.0
y = y + np.random.normal(size=len(y), scale=0.0002)
xcrit_model = StepModel(form='erf')
params = xcrit_model.make_params(amplitude=4, center=5, sigma=1)
result = xcrit_model.fit(y, params, x=x)
print(result.fit_report())
plt.plot(x, y, 'bo', label='simulated data')
plt.plot(x, result.init_fit, 'k', label='initial fit')
plt.plot(x, result.best_fit, 'r:', label='best fit')
plt.legend()
plt.show()

Gaussian fitting in Python

I am trying to fit some data with Gaussian fit. This data from lateral flow image. The fitted line (red) does not cover data. Please check my code. In the code x is just index. y actually real data.
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
y = np.array([2.22097081, 2.24776432, 2.35519896, 2.43780396, 2.49708355,
2.54224971, 2.58350984, 2.62965057, 2.68644093, 2.75454015,
2.82912617, 2.90423835, 2.97921199, 3.05864617, 3.14649922,
3.2430853 , 3.3471892 , 3.45919857, 3.58109399, 3.71275641,
3.84604379, 3.94884214, 3.94108998, 3.72148453, 3.28407665,
2.7651018 ])
x = np.linspace(1,np.mean(y),len(y))
n = len(x)
mean = sum(x*y)/n
sigma = np.sqrt(sum(y*(x-mean)**2)/n)
def gaus(x,a,x0,sigma):
return a*np.exp(-(x-x0)**2/(2*sigma**2))/(sigma*np.sqrt(2*np.pi))
popt,pcov = curve_fit(gaus,x,y,p0=[1,mean,sigma])
plt.figure()
plt.plot(x,y,'b+:',label='data')
plt.plot(x,gaus(x,*popt),'ro:',label='fit')
plt.legend()
plt.xlabel('Index')
plt.ylabel('Row Mean')

identify graph uptrend or downtrend

I am attempting to read in data and plot them on to a graph using python (standard line graph). Can someone please advise on how I can classify whether certain points in a graph are uptrends or downtrends programmatically? Which would be the most optimal way to achieve this? Surely this is a solved problem and a mathematical equation exists to identify this?
here is some sample data with some up trends and downtrends
x = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30]
y = [2,5,7,9,10,13,16,18,21,22,21,20,19,18,17,14,10,9,7,5,7,9,10,12,13,15,16,17,22,27]
thanks in advance

A simple way would be to look at the 'rate in change of y with respect to x', known as the derivative. This usually works better with continuous (smooth) functions, and so you could implement it with your data by interpolating your data with an n-th order polynomial as already suggested. A simple implementation would look something like this:
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import interp1d
from scipy.misc import derivative
x = np.array([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,\
16,17,18,19,20,21,22,23,24,25,26,27,28,29,30])
y = np.array([2,5,7,9,10,13,16,18,21,22,21,20,19,18,\
17,14,10,9,7,5,7,9,10,12,13,15,16,17,22,27])
# Simple interpolation of x and y
f = interp1d(x, y)
x_fake = np.arange(1.1, 30, 0.1)
# derivative of y with respect to x
df_dx = derivative(f, x_fake, dx=1e-6)
# Plot
fig = plt.figure()
ax1 = fig.add_subplot(211)
ax2 = fig.add_subplot(212)
ax1.errorbar(x, y, fmt="o", color="blue", label='Input data')
ax1.errorbar(x_fake, f(x_fake), label="Interpolated data", lw=2)
ax1.set_xlabel("x")
ax1.set_ylabel("y")
ax2.errorbar(x_fake, df_dx, lw=2)
ax2.errorbar(x_fake, np.array([0 for i in x_fake]), ls="--", lw=2)
ax2.set_xlabel("x")
ax2.set_ylabel("dy/dx")
leg = ax1.legend(loc=2, numpoints=1,scatterpoints=1)
leg.draw_frame(False)
You see that when the plot transitions from an 'upwards trend' (positive gradient) to a 'downwards trend' (negative gradient) the derivative (dy/dx) goes from positive to negative. The transition of this happens at dy/dx = 0, which is shown by the green dashed line. For the scipy routines you can look at:
http://docs.scipy.org/doc/scipy/reference/generated/scipy.misc.derivative.html
http://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html
NumPy's diff/gradient should also work, and not require the interpolation, but I showed the above so you could get the idea. For a complete mathemetical description of differentiation/calculus, look at wikipedia.

I found this topic very important and interesting. I would like to extend the above-mentioned answer:
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import interp1d
from scipy.misc import derivative
x = np.array([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,\
16,17,18,19,20,21,22,23,24,25,26,27,28,29,30])
y = np.array([2,5,7,9,10,13,16,18,21,22,21,20,19,18,\
17,14,10,9,7,5,7,9,10,12,13,15,16,17,22,27])
# Simple interpolation of x and y
f = interp1d(x, y, fill_value="extrapolate")
x_fake = np.arange(1.1, 30, 0.1)
# derivative of y with respect to x
df_dx = derivative(f, x_fake, dx=1e-6)
plt.plot(x,y, label = "Data")
plt.plot(x_fake,df_dx,label = "Trend")
plt.legend()
plt.show()
average = np.average(df_dx)
if average > 0 :
print("Uptrend", average)
elif average < 0:
print("Downtrend", average)
elif average == 0:
print("No trend!", average)
print("Max trend measure is:")
print(np.max(df_dx))
print("min trend measure is:")
print(np.min(df_dx))
print("Overall trend measure:")
print(((np.max(df_dx))-np.min(df_dx)-average)/((np.max(df_dx))-np.min(df_dx)))
extermum_list_y = []
extermum_list_x = []
for i in range(0,df_dx.shape[0]):
if df_dx[i] < 0.001 and df_dx[i] > -0.001:
extermum_list_x.append(x_fake[i])
extermum_list_y.append(df_dx[i])
plt.scatter(extermum_list_x, extermum_list_y, label="Extermum", marker = "o", color = "green")
plt.plot(x,y, label = "Data")
plt.plot(x_fake, df_dx, label="Trend")
plt.legend()
plt.show()
So, in overall the total trend is upward for this graph!
This approach is also nice when you want to find the x where the slope is zero; for example, the extremum in the curves. The local minimum and maximum points are found with the best accuracy and computation time.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

How Can I calculate the uncertainty of a fourth degree polynomial fit? - python

Related

Curve fitting using python

Fitting a line through 3D x,y,z scatter plot data

Plotting and modeling data with lmfit - Fit doesn't match data. What am I doing wrong?

Gaussian fitting in Python

identify graph uptrend or downtrend

Categories

Resources