Related
]2
I want to plot the regression line and calculate the regression coefficient. Along with that the R square/Goodness of fit.
I have determined the regression line on the curve and calculated the coefficient of that regression line but not be able to calculate the goodness of fit for those.
Kindly help me to solve that problem.
The code calculated the regression coefficient and draw the regression line on the plot. But i am not be able to calculate the R squared/Goodness of fit.
Code of goodness of fit is also written but giving an error "TypeError: can't convert type 'ndarray' to numerator/denominator"
The error is related with the "def coefficient of determination part"
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt
from statistics import mean
from matplotlib import style
style.use('ggplot')
#loading of imageries
im1 = Image.open('D:\\code\\test\\subset\\image1.tif')
im2 = Image.open('D:\\code\\test\\subset\\image2.tif')
#im.show()
#Conversion into array
img1 = np.array(im1)
img2 = np.array(im2)
def estimate_coef(x, y):
# number of observations/points
n = np.size(x)
# mean of x and y vector
m_x, m_y = np.mean(x), np.mean(y)
# calculating cross-deviation and deviation about x
SS_xy = np.sum(y*x) - n*m_y*m_x
SS_xx = np.sum(x*x) - n*m_x*m_x
# calculating regression coefficients
b_1 = SS_xy / SS_xx
b_0 = m_y - b_1*m_x
return(b_0, b_1)
def squared_error(ys_orig,ys_line):
return sum((ys_line - ys_orig) * (ys_line - ys_orig))
def coefficient_of_determination(ys_orig,ys_line):
y_mean_line = [mean(ys_orig) for y in ys_orig]
squared_error_regr = squared_error(ys_orig, ys_line)
squared_error_y_mean = squared_error(ys_orig, ys_mean_line)
return 1 - (squared_error_regr/squared_error_y_mean)
def plot_regression_line(x, y, b):
# plotting the actual points as scatter plot
plt.scatter(x, y, color = "m",
marker = "o", s = 30)
# predicted response vector
y_pred = b[0] + b[1]*x
# plotting the regression line
plt.plot(x, y_pred, color = "g")
# putting labels
plt.xlabel('x')
plt.ylabel('y')
# function to show plot
plt.show()
b_0, b_1 = estimate_coef(img1, img2)
regression_line = [(b_0*x)+b_1 for x in img1]
r_squared = coefficient_of_determination(img2,regression_line)
print(r_squared)
def main():
# observations
x = img1
y = img2
# estimating coefficients
b = estimate_coef(x, y)
print("Estimated coefficients:\nb_0 = {} \ \nb_1 = {}".format(b[0], b[1]))
# plotting regression line
plot_regression_line(x, y, b)
if __name__ == "__main__":
main()
I'd like to make a Gaussian Fit for some data that has a rough gaussian fit. I'd like the information of data peak (A), center position (mu), and standard deviation (sigma), along with 95% confidence intervals for these values.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from scipy.stats import norm
# gaussian function
def gaussian_func(x, A, mu, sigma):
return A * np.exp( - (x - mu)**2 / (2 * sigma**2))
# generate toy data
x = np.arange(50)
y = [ 97.04421053, 96.53052632, 96.85684211, 96.33894737, 96.85052632,
96.30526316, 96.87789474, 96.75157895, 97.05052632, 96.73473684,
96.46736842, 96.23368421, 96.22526316, 96.11789474, 96.41263158,
96.32631579, 96.33684211, 96.44421053, 96.48421053, 96.49894737,
97.30105263, 98.58315789, 100.07368421, 101.43578947, 101.92210526,
102.26736842, 101.80421053, 101.91157895, 102.07368421, 102.02105263,
101.35578947, 99.83578947, 98.28, 96.98315789, 96.61473684,
96.82947368, 97.09263158, 96.82105263, 96.24210526, 95.95578947,
95.84210526, 95.67157895, 95.83157895, 95.37894737, 95.25473684,
95.32842105, 95.45684211, 95.31578947, 95.42526316, 95.30526316]
plt.scatter(x,y)
# initial_guess_of_parameters
# この値はソルバーとかで求めましょう.
parameter_initial = np.array([652, 2.9, 1.3])
# estimate optimal parameter & parameter covariance
popt, pcov = curve_fit(gaussian_func, x, y, p0=parameter_initial)
# plot result
xd = np.arange(x.min(), x.max(), 0.01)
estimated_curve = gaussian_func(xd, popt[0], popt[1], popt[2])
plt.plot(xd, estimated_curve, label="Estimated curve", color="r")
plt.legend()
plt.savefig("gaussian_fitting.png")
plt.show()
# estimate standard Error
StdE = np.sqrt(np.diag(pcov))
# estimate 95% confidence interval
alpha=0.025
lwCI = popt + norm.ppf(q=alpha)*StdE
upCI = popt + norm.ppf(q=1-alpha)*StdE
# print result
mat = np.vstack((popt,StdE, lwCI, upCI)).T
df=pd.DataFrame(mat,index=("A", "mu", "sigma"),
columns=("Estimate", "Std. Error", "lwCI", "upCI"))
print(df)
Data Plot with Fitted Curve
The data peak and center position seems correct, but the standard deviation is off. Any input is greatly appreciated.
Your scatter indeed looks similar to a gaussian distribution, but it is not centered around zero. Given the specifics of the Gaussian function it will therefor be hard to nicely fit a Gaussian distribution to the data the way you gave us. I would therefor propose by starting with demeaning the x series:
x = np.arange(0, 50) - 24.5
Next I would add one additional parameter to your gaussian function, the offset. Since the regular Gaussian function will always have its tails close to zero it is impossible to otherwise nicely fit your scatterplot:
def gaussian_function(x, A, mu, sigma, offset):
return A * np.exp(-np.power((x - mu)/sigma, 2.)/2.) + offset
Next you should define an error_loss_function to minimise:
def error_loss_function(params):
gaussian = gaussian_function(x, params[0], params[1], params[2], params[3])
errors = gaussian - y
return sum(np.power(errors, 2)) # You can also pick a different error loss function!
All that remains is fitting our curve now:
fit = scipy.optimize.minimize(fun=error_loss_function, x0=[2, 0, 0.2, 97])
params = fit.x # A: 6.57592661, mu: 1.95248855, sigma: 3.93230503, offset: 96.12570778
xd = np.arange(x.min(), x.max(), 0.01)
estimated_curve = gaussian_function(xd, params[0], params[1], params[2], params[3])
plt.plot(xd, estimated_curve, label="Estimated curve", color="b")
plt.legend()
plt.show(block=False)
Hopefully this helps. Looks like a fun project, let me know if my answer is not clear.
I have a bunch of x, y points that represent a sigmoidal function:
x=[ 1.00094909 1.08787635 1.17481363 1.2617564 1.34867881 1.43562284
1.52259341 1.609522 1.69631283 1.78276102 1.86426648 1.92896789
1.9464453 1.94941586 2.00062852 2.073691 2.14982808 2.22808316
2.30634034 2.38456905 2.46280126 2.54106611 2.6193345 2.69748825]
y=[-0.10057627 -0.10172142 -0.10320428 -0.10378959 -0.10348456 -0.10312503
-0.10276956 -0.10170055 -0.09778279 -0.08608644 -0.05797392 0.00063599
0.08732999 0.16429878 0.2223306 0.25368884 0.26830932 0.27313931
0.27308756 0.27048902 0.26626313 0.26139534 0.25634544 0.2509893 ]
I use scipy.interpolate.UnivariateSpline() to fit to some cubic spline as follows:
from scipy.interpolate import UnivariateSpline
s = UnivariateSpline(x, y, k=3, s=0)
xfit = np.linspace(x.min(), x.max(), 200)
plt.scatter(x,y)
plt.plot(xfit, s(xfit))
plt.show()
This is what I get:
Since I specify s=0, the spline adheres completely to the data, but there are too many wiggles. Using a higher k value leads to even more wiggles.
So my questions are --
How should I correctly use scipy.interpolate.UnivariateSpline() to fit my data? More precisely, how do I make the spline minimise its wiggling?
Is this even the correct choice for this kind of a sigmoidal function? Should I be using something like scipy.optimize.curve_fit() with a trial tanh(x) function instead?
There are several options, I list a few below. The last one seems to give the best output. Whether you should use a spline or an actual function depends on what you want to do with the output; I list two analytical functions below that could be used but I don't know in which context the data were derived so it is hard to find the best one for you.
You can play with s, e.g. for s=0.005, the plot looks like this (still not extremely pretty but you could further adjust):
But I would indeed use a "proper" function and fit using e.g. curve_fit. The function below is still not ideal as it is monotonically increasing, so we miss the decrease at the end; the plot looks as follows:
This is the entire code, for both the spline and the actual fit:
from scipy.interpolate import UnivariateSpline
import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import curve_fit
def func(x, ymax, n, k, c):
return ymax * x ** n / (k ** n + x ** n) + c
x=np.array([ 1.00094909, 1.08787635, 1.17481363, 1.2617564, 1.34867881, 1.43562284,
1.52259341, 1.609522, 1.69631283, 1.78276102, 1.86426648, 1.92896789,
1.9464453, 1.94941586, 2.00062852, 2.073691, 2.14982808, 2.22808316,
2.30634034, 2.38456905, 2.46280126, 2.54106611, 2.6193345, 2.69748825])
y=np.array([-0.10057627, -0.10172142, -0.10320428, -0.10378959, -0.10348456, -0.10312503,
-0.10276956, -0.10170055, -0.09778279, -0.08608644, -0.05797392, 0.00063599,
0.08732999, 0.16429878, 0.2223306, 0.25368884, 0.26830932, 0.27313931,
0.27308756, 0.27048902, 0.26626313, 0.26139534, 0.25634544, 0.2509893 ])
popt, pcov = curve_fit(func, x, y, p0=[y.max(), 2, 2, -0.1], bounds=([0, 0, 0, -0.2], [0.4, 45, 2000, 10]))
xfit = np.linspace(x.min(), x.max(), 200)
plt.scatter(x, y)
plt.plot(xfit, func(xfit, *popt))
plt.show()
s = UnivariateSpline(x, y, k=3, s=0.005)
xfit = np.linspace(x.min(), x.max(), 200)
plt.scatter(x, y)
plt.plot(xfit, s(xfit))
plt.show()
A third option is to use a more advanced function that can also reproduce the decrease at the end and differential_evolution for the fit; that seems to give the best fit:
The code is as follows (using the same data as above):
from scipy.optimize import curve_fit, differential_evolution
def sigmoid_with_decay(x, a, b, c, d, e, f):
return a * (1. / (1. + np.exp(-b * (x - c)))) * (1. / (1. + np.exp(d * (x - e)))) + f
def error_sigmoid_with_decay(parameters, x_data, y_data):
return np.sum((y_data - sigmoid_with_decay(x_data, *parameters)) ** 2)
res = differential_evolution(error_sigmoid_with_decay,
bounds=[(0, 10), (0, 25), (0, 10), (0, 10), (0, 10), (-1, 0.1)],
args=(x, y),
seed=42)
xfit = np.linspace(x.min(), x.max(), 200)
plt.scatter(x, y)
plt.plot(xfit, sigmoid_with_decay(xfit, *res.x))
plt.show()
The fit is quite sensitive regarding the bounds, so be careful when you play with that...
This illustrates the result of fitting two halves of the data to different functions, the lower half to all data with X < 2.0 and the upper half to all data with X >= 1.9, so that there is overlap in the data for the fitted curves. The code switches from one equation to another at the center of the overlap region, X = 1.95.
import numpy, matplotlib
import matplotlib.pyplot as plt
xData=numpy.array([ 1.00094909, 1.08787635, 1.17481363, 1.2617564, 1.34867881, 1.43562284,
1.52259341, 1.609522, 1.69631283, 1.78276102, 1.86426648, 1.92896789,
1.9464453, 1.94941586, 2.00062852, 2.073691, 2.14982808, 2.22808316,
2.30634034, 2.38456905, 2.46280126, 2.54106611, 2.6193345, 2.69748825])
yData=numpy.array([-0.10057627, -0.10172142, -0.10320428, -0.10378959, -0.10348456, -0.10312503,
-0.10276956, -0.10170055, -0.09778279, -0.08608644, -0.05797392, 0.00063599,
0.08732999, 0.16429878, 0.2223306, 0.25368884, 0.26830932, 0.27313931,
0.27308756, 0.27048902, 0.26626313, 0.26139534, 0.25634544, 0.2509893 ])
# function for x < 1.95 (fitted up to 2.0 for overlap)
def lowerFunc(x_in): # Bleasdale-Nelder Power With Offset
# coefficients
a = -1.1431476643503597E+03
b = 3.3819340844164983E+21
c = -6.3633178925040745E+01
d = 3.1481973843740194E+00
Offset = -1.0300724909782859E-01
temp = numpy.power(a + b * numpy.power(x_in, c), -1.0 / d)
temp += Offset
return temp
# function for x >= 1.95 (fitted down to 1.9 for overlap)
def upperFunc(x_in): # rational equation with Offset
# coefficients
a = -2.5294212380048242E-01
b = 1.4262697377369586E+00
c = -2.6141935706529118E-01
d = -8.8730045918252121E-02
Offset = -4.8283287597672708E-01
temp = (a * numpy.power(x_in, 2) + b * numpy.log(x_in)) # numerator
temp /= (1.0 + c * numpy.power(numpy.log(x_in), -1) + d * numpy.exp(x_in)) # denominator
temp += Offset
return temp
def combinedFunc(x_in):
returnVal = []
for x in x_in:
if x < 1.95:
returnVal.append(lowerFunc(x))
else:
returnVal.append(upperFunc(x))
return returnVal
modelPredictions = combinedFunc(xData)
absError = modelPredictions - yData
SE = numpy.square(absError) # squared errors
MSE = numpy.mean(SE) # mean squared errors
RMSE = numpy.sqrt(MSE) # Root Mean Squared Error, RMSE
Rsquared = 1.0 - (numpy.var(absError) / numpy.var(yData))
print('RMSE:', RMSE)
print('R-squared:', Rsquared)
##########################################################
# graphics output section
def ModelAndScatterPlot(graphWidth, graphHeight):
f = plt.figure(figsize=(graphWidth/100.0, graphHeight/100.0), dpi=100)
axes = f.add_subplot(111)
# first the raw data as a scatter plot
axes.plot(xData, yData, 'D')
# create data for the fitted equation plot
xModel = numpy.linspace(min(xData), max(xData))
yModel = combinedFunc(xModel)
# now the model as a line plot
axes.plot(xModel, yModel)
axes.set_xlabel('X Data') # X axis data label
axes.set_ylabel('Y Data') # Y axis data label
plt.show()
plt.close('all') # clean up after using pyplot
graphWidth = 800
graphHeight = 600
ModelAndScatterPlot(graphWidth, graphHeight)
I am very new to Gaussian processes and python as well.
I am trying to produce a very simple Gaussian regression for a 3d model.
I have a very simple Python code for a function:
import numpy as np
def exponential_cov(x, y, params):
return params[0] * np.exp( -0.5 * params[1] * np.subtract.outer(x, y)**2)
def conditional(x_new, x, y, params):
B = exponential_cov(x_new, x, params)
C = exponential_cov(x, x, params)
A = exponential_cov(x_new, x_new, params)
mu = np.linalg.inv(C).dot(B.T).T.dot(y)
sigma = A - B.dot(np.linalg.inv(C).dot(B.T))
return(mu.squeeze(), sigma.squeeze())
import matplotlib.pylab as plt
# GP PRIOR
tu = [1, 10]
Si_tu = exponential_cov(0, 0, tu)
xpts = np.arange(-5, 5, step=0.01)
plt.errorbar(xpts, np.zeros(len(xpts)), yerr=Si_tu, capsize=0, color='#95daed', alpha=0.5, label='error') #error
plt.plot(xpts, np.zeros(len(xpts)), linestyle='dashed', color='#3105b2', linewidth=2.5, label='mu'); #mu
# GP FOR 1ST POINT
x = [1.]
y = np.sin(x)+np.cos(np.sqrt(15)*x)
Si_1 = exponential_cov(x, x, tu)
def predict(x, data, kernel, params, sigma, t):
k = [kernel(x, y, params) for y in data]
Sinv = np.linalg.inv(sigma)
y_pred = np.dot(k, Sinv).dot(t)
sigma_new = kernel(x, x, params) - np.dot(k, Sinv).dot(k)
return y_pred, sigma_new
x_pred = np.linspace(-5, 5, 1000) #change step here!!
print "x_pred="
print(x_pred)
predictions = [predict(i, x, exponential_cov, tu, Si_1, y) for i in x_pred]
y_pred, sigmas = np.transpose(predictions)
print "y_pred ="
print(y_pred )
print "sigmas ="
print(sigmas )
# GP FOR 2ND POINT
m, s = conditional([-1], x, y, tu)
y2 = np.sin(-1)+np.cos(np.sqrt(15)*(-1))
x.append(-1)
y=np.append(y,y2)
Si_2 = exponential_cov(x, x, tu)
predictions = [predict(i, x, exponential_cov, tu, Si_2, y) for i in x_pred]
y_pred, sigmas = np.transpose(predictions)
print "y_pred ="
print(y_pred )
print "sigmas ="
print(sigmas )
By using this code I get very nice fitting results for the function np.sin(x) + np.cos(np.sqrt(15) * x), but what I really want to do is to try the same Gaussian process for the function Z = np.sin(2*X) * np.cos(2*Y) / 2.
I know that the idea is basically the same, but I cannot adapt my python code to the [x,y] input to obtain z.
I will really appreciate your help, hints or links!
In the previous, the input of your function is 1-D, and then the new function is 2-D. So you have to change the covariance function, for example, use ard-based kernel, please refer to cook book for kernel. Also, you can do the isotropic kernel for 2-D, just make sure the suitable distance function (e.g. L2-norm) and the single lengthscale you choose.
I am trying to fit some data points with y uncertainties in python. The data are labeled in python as x,y and yerr.
I need to do a linear fit on that data in loglog scale. As a reference if the fit results are properly, i compare the python results with the ones from Scidavis
I tried curve_fit with
def func(x, a, b):
return np.exp(a* np.log(x)+np.log(b))
popt, pcov = curve_fit(func, x, y,sigma=yerr)
as well as kmpfit with
def funcL(p, x):
a,b = p
return ( np.exp(a*np.log(x)+np.log(b)) )
def residualsL(p, data):
a,b=p
x, y, errorfit = data
return (y-funcL(p,x)) / errorfit
a0=1
b0=0.1
p0 = [a0,b0]
fitterL = kmpfit.Fitter(residuals=residualsL, data=(x,y,yerr))
fitterL.parinfo = [{}, {}]
fitterL.fit(params0=p0)
and when i am trying to fit the data with one of those without uncertainties (i.e setting yerr=1), everything works just fine and the results are identical with the ones from scidavis. But if i set yerr to the uncertainties of the data file i get some disturbing results.
In python i get i.e. a=0.86 and in scidavis a=0.14. I read something about that the errors are included as weights. Do i have to change anything, in order to calculate the fit correctly? Or what am i doing wrong?
edit: here is an example of a data file (x,y,yerr)
3.942387e-02 1.987800e+00 5.513165e-01
6.623142e-02 7.126161e+00 1.425232e+00
9.348280e-02 1.238530e+01 1.536208e+00
1.353088e-01 1.090471e+01 7.829126e-01
2.028446e-01 1.023087e+01 3.839575e-01
3.058446e-01 8.403626e+00 1.756866e-01
4.584524e-01 7.345275e+00 8.442288e-02
6.879677e-01 6.128521e+00 3.847194e-02
1.032592e+00 5.359025e+00 1.837428e-02
1.549152e+00 5.380514e+00 1.007010e-02
2.323985e+00 6.404229e+00 6.534108e-03
3.355974e+00 9.489101e+00 6.342546e-03
4.384128e+00 1.497998e+01 2.273233e-02
and the result:
in python:
without uncertainties: a=0.06216 +/- 0.00650 ; b=8.53594 +/- 1.13985
with uncertainties: a=0.86051 +/- 0.01640 ; b=3.38081 +/- 0.22667
in scidavis:
without uncertainties: a = 0.06216 +/- 0.08060; b = 8.53594 +/- 1.06763
with uncertainties: a = 0.14154 +/- 0.005731; b = 7.38213 +/- 2.13653
I must be misunderstanding something. Your posted data does not look anything like
f(x,a,b) = np.exp(a*np.log(x)+np.log(b))
The red line is the result of scipy.optimize.curve_fit,
the green line is the result of scidavis.
My guess is that neither algorithm is converging toward a good fit, so it is not surprising that the results do not match.
I can't explain how scidavis finds its parameters, but according to the definitions as I understand them, scipy is finding parameters with lower least squares residuals than scidavis:
import numpy as np
import matplotlib.pyplot as plt
import scipy.optimize as optimize
def func(x, a, b):
return np.exp(a* np.log(x)+np.log(b))
def sum_square(residuals):
return (residuals**2).sum()
def residuals(p, x, y, sigma):
return 1.0/sigma*(y - func(x, *p))
data = np.loadtxt('test.dat').reshape((-1,3))
x, y, yerr = np.rollaxis(data, axis = 1)
sigma = yerr
popt, pcov = optimize.curve_fit(func, x, y, sigma = sigma, maxfev = 10000)
print('popt: {p}'.format(p = popt))
scidavis = (0.14154, 7.38213)
print('scidavis: {p}'.format(p = scidavis))
print('''\
sum of squares for scipy: {sp}
sum of squares for scidavis: {d}
'''.format(
sp = sum_square(residuals(popt, x = x, y = y, sigma = sigma)),
d = sum_square(residuals(scidavis, x = x, y = y, sigma = sigma))
))
plt.plot(x, y, 'bo', x, func(x,*popt), 'r-', x, func(x, *scidavis), 'g-')
plt.errorbar(x, y, yerr)
plt.show()
yields
popt: [ 0.86051258 3.38081125]
scidavis: (0.14154, 7.38213)
sum of squares for scipy: 53249.9915654
sum of squares for scidavis: 239654.84276