Fit a sine curve to my data in Python, matplotlib

Here is my data in Excel. I want to fit this data to a sine curve. Here is my code:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

# Fitting function
def func(x, offset, A, freq, phi):
    return offset + A * np.sin(freq * x + phi)

# Experimental x and y data points
# test_df is the input Excel DataFrame
x_data = test_df['x_data']
y_data = test_df['y_data']
# Plot input data points
plt.plot(x_data, y_data, 'bo', label='experimental-data')

# Initial guess for the parameters
initial_guess = [.38, 2.3, .76, 2.77]

# Perform the curve fit
popt, pcov = curve_fit(func, x_data, y_data, p0=initial_guess)
print(popt)

# x values for the fitted function
x_fit = np.arange(0.0, 31, 0.01)

# Plot the fitted function
plt.plot(x_fit, func(x_fit, *popt), 'r', label='fit')
plt.legend()
plt.show()
This is the resulting graph.
I think this is not the best fit, and I would like suggestions to improve the curve fit.

Well, your data does not seem to describe a mathematical function: for example, at argument value 15 you have multiple y values (so what does f(15) equal?). Thus, it won't be a classical interpolation in this case. If you could normalize the data somehow, i.e. make a function out of it, then you could use numpy.
The simplest approach would be to add a small disturbance where argument values are equal. Let's look at an example from your data:
4 0.0326
4 0.014
4 -0.0086
4 0.0067
So, as you can see, you can't tell what the relation's value is at f(4). If you disturb the arguments a bit, e.g.:
3.9 -0.0086
3.95 0.0067
4 0.014
4.05 0.0326
And so on for all such cases in your data file. The simplest approach would be to group these values by their x argument, sort, and disturb them, as in the sketch below. That would obviously introduce some error, but, well... you are curve fitting anyway, right?
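A minimal sketch of that idea, assuming the data sits in a pandas DataFrame with the x_data/y_data columns from the question above (the eps spacing is an arbitrary choice you would tune):
import numpy as np
import pandas as pd

def disturb_duplicates(df, eps=0.05):
    """Spread duplicated x values evenly over a small interval around x."""
    out = df.sort_values(['x_data', 'y_data']).reset_index(drop=True)
    for x_val, idx in out.groupby('x_data').groups.items():
        n = len(idx)
        if n > 1:
            # e.g. four points at x=4 become 3.925, 3.975, 4.025, 4.075 for eps=0.05
            out.loc[idx, 'x_data'] = x_val + eps * (np.arange(n) - (n - 1) / 2)
    return out

test_df = disturb_duplicates(test_df)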
To formulate a sine, you have to know the amplitude, frequency, and phase: f(x) = A * sin(F*x + p), where A is the amplitude, F is the frequency, and p is the phase. NumPy has dedicated methods for this if you've got a properly prepared data set:
How do I fit a sine curve to my data with pylab and numpy?
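Building on that, one standard trick (my addition, not from the linked answer) is to take the initial frequency guess from the dominant FFT peak of the data before calling curve_fit; this sketch assumes the x values from the question are roughly evenly spaced once sorted:
import numpy as np
from scipy.optimize import curve_fit

def func(x, offset, A, freq, phi):
    return offset + A * np.sin(freq * x + phi)

x = np.asarray(x_data, dtype=float)
y = np.asarray(y_data, dtype=float)
order = np.argsort(x)
x, y = x[order], y[order]

# Dominant FFT frequency of the de-meaned signal -> angular frequency guess
dx = np.mean(np.diff(x))
freqs = np.fft.rfftfreq(len(x), d=dx)
spectrum = np.abs(np.fft.rfft(y - y.mean()))
freq_guess = 2 * np.pi * freqs[np.argmax(spectrum[1:]) + 1]  # skip the DC bin

guess = [y.mean(), np.sqrt(2) * y.std(), freq_guess, 0.0]
popt, pcov = curve_fit(func, x, y, p0=guess)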

Related

Finding a better scipy curve_fit! -- exponential function?

I am trying to fit data using scipy curve_fit. I believe that a negative exponential is probably best, as this works well for some of my other (similarly generated) data, but I am achieving sub-optimal results.
I've normalized the dataset to avoid supplying initial values and am applying an exponential function as follows:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
data = np.array([[0.32,0.38],[0.61,0.32],[0.28,0.50],[0.60,0.32],[0.26,0.45],[0.19,0.57],[0.61,0.32],[0.59,0.29],[0.39,0.42],[0.61,0.32],[0.20,0.46],[0.24,0.45],[0.59,0.29],[0.39,0.42],[0.56,0.39],[0.32,0.43],[0.38,0.44],[0.54,0.34],[0.61,0.32],[0.20,0.46],[0.28,0.51],[0.54,0.34],[0.60,0.32],[0.30,0.42],[0.28,0.43],[0.14,0.57],[0.24,0.54],[0.39,0.42],[0.20,0.56],[0.56,0.39],[0.24,0.54],[0.33,0.37],[0.33,0.51],[0.20,0.46],[0.32,0.39],[0.20,0.56],[0.19,0.57],[0.32,0.39],[0.30,0.42],[0.33,0.50],[0.54,0.34],[0.28,0.50],[0.32,0.39],[0.28,0.43],[0.27,0.42],[0.56,0.39],[0.19,0.57],[0.19,0.57],[0.60,0.32],[0.44,0.41],[0.27,0.42],[0.19,0.57],[0.24,0.38],[0.24,0.54],[0.61,0.32],[0.39,0.40],[0.30,0.41],[0.19,0.57],[0.14,0.57],[0.32,0.43],[0.14,0.57],[0.59,0.29],[0.44,0.41],[0.30,0.41],[0.32,0.38],[0.61,0.32],[0.20,0.46],[0.20,0.56],[0.30,0.41],[0.33,0.36],[0.14,0.57],[0.19,0.57],[0.46,0.38],[0.36,0.44],[0.61,0.32],[0.31,0.48],[0.60,0.32],[0.39,0.40],[0.14,0.57],[0.44,0.41],[0.24,0.49],[0.41,0.40],[0.19,0.57],[0.19,0.57],[0.31,0.49],[0.31,0.43],[0.35,0.35],[0.20,0.46],[0.54,0.34],[0.20,0.56],[0.39,0.44],[0.33,0.36],[0.20,0.56],[0.30,0.41],[0.56,0.39],[0.31,0.48],[0.28,0.51],[0.14,0.57],[0.61,0.32],[0.30,0.50],[0.20,0.56],[0.19,0.57],[0.59,0.31],[0.20,0.56],[0.27,0.42],[0.29,0.48],[0.56,0.39],[0.32,0.39],[0.20,0.56],[0.59,0.29],[0.24,0.49],[0.56,0.39],[0.60,0.32],[0.35,0.35],[0.28,0.50],[0.46,0.38],[0.14,0.57],[0.54,0.34],[0.32,0.38],[0.26,0.45],[0.26,0.45],[0.39,0.42],[0.19,0.57],[0.28,0.51],[0.27,0.42],[0.33,0.50],[0.54,0.34],[0.39,0.40],[0.19,0.57],[0.33,0.36],[0.22,0.44],[0.33,0.51],[0.61,0.32],[0.28,0.51],[0.25,0.50],[0.39,0.40],[0.34,0.35],[0.59,0.31],[0.31,0.49],[0.20,0.46],[0.39,0.46],[0.20,0.50],[0.32,0.39],[0.30,0.41],[0.23,0.44],[0.29,0.53],[0.28,0.50],[0.31,0.48],[0.61,0.32],[0.54,0.34],[0.28,0.53],[0.56,0.39],[0.19,0.57],[0.14,0.57],[0.59,0.29],[0.29,0.48],[0.44,0.41],[0.27,0.51],[0.50,0.29],[0.14,0.57],[0.60,0.32],[0.32,0.39],[0.19,0.57],[0.24,0.38],[0.56,0.39],[0.14,0.57],[0.54,0.34],[0.61,0.38],[0.27,0.53],[0.20,0.46],[0.61,0.32],[0.27,0.42],[0.27,0.42],[0.20,0.56],[0.30,0.41],[0.31,0.51],[0.32,0.39],[0.31,0.51],[0.29,0.48],[0.20,0.46],[0.33,0.51],[0.31,0.43],[0.30,0.41],[0.27,0.44],[0.31,0.51],[0.29,0.48],[0.35,0.35],[0.46,0.38],[0.28,0.51],[0.61,0.38],[0.31,0.49],[0.33,0.51],[0.59,0.29],[0.14,0.57],[0.31,0.51],[0.39,0.40],[0.32,0.39],[0.20,0.56],[0.55,0.31],[0.56,0.39],[0.24,0.49],[0.56,0.39],[0.27,0.50],[0.60,0.32],[0.54,0.34],[0.19,0.57],[0.28,0.51],[0.54,0.34],[0.56,0.39],[0.19,0.57],[0.59,0.31],[0.37,0.45],[0.19,0.57],[0.44,0.41],[0.32,0.43],[0.35,0.48],[0.24,0.49],[0.26,0.45],[0.14,0.57],[0.59,0.30],[0.26,0.45],[0.26,0.45],[0.14,0.57],[0.20,0.50],[0.31,0.45],[0.27,0.51],[0.30,0.41],[0.19,0.57],[0.30,0.41],[0.27,0.50],[0.34,0.35],[0.30,0.42],[0.27,0.42],[0.27,0.42],[0.34,0.35],[0.35,0.35],[0.14,0.57],[0.45,0.36],[0.26,0.45],[0.56,0.39],[0.34,0.35],[0.19,0.57],[0.30,0.41],[0.19,0.57],[0.26,0.45],[0.26,0.45],[0.59,0.29],[0.19,0.57],[0.26,0.45],[0.32,0.39],[0.30,0.50],[0.28,0.50],[0.32,0.39],[0.59,0.29],[0.32,0.51],[0.56,0.39],[0.59,0.29],[0.61,0.38],[0.33,0.51],[0.22,0.44],[0.33,0.36],[0.27,0.42],[0.20,0.56],[0.28,0.51],[0.31,0.48],[0.20,0.56],[0.61,0.32],[0.24,0.54],[0.59,0.29],[0.32,0.43],[0.61,0.32],[0.19,0.57],[0.61,0.38],[0.55,0.31],[0.19,0.57],[0.31,0.46],[0.32,0.52],[0.30,0.41],[0.28,0.51],[0.28,0.50],[0.60,0.32],[0.61,0.32],[0.27,0.50],[0.59,0.29],[0.41,0.47],[0.39,0.42],[0.20,0.46],[0.19,0.57],[0.14,0.57],[0.23,0.47],[0.54,0.34],[0.28,0.51],[0.19,0.57],[0.33,0.37],[0.46,0.38
],[0.27,0.42],[0.20,0.56],[0.39,0.42],[0.30,0.47],[0.26,0.45],[0.61,0.32],[0.61,0.38],[0.35,0.35],[0.14,0.57],[0.35,0.35],[0.28,0.51],[0.61,0.32],[0.24,0.54],[0.54,0.34],[0.28,0.43],[0.24,0.54],[0.30,0.41],[0.56,0.39],[0.23,0.52],[0.14,0.57],[0.26,0.45],[0.30,0.42],[0.32,0.43],[0.19,0.57],[0.45,0.36],[0.27,0.42],[0.29,0.48],[0.28,0.43],[0.27,0.51],[0.39,0.44],[0.32,0.49],[0.24,0.49],[0.56,0.39],[0.20,0.56],[0.30,0.42],[0.24,0.38],[0.46,0.38],[0.28,0.50],[0.26,0.45],[0.27,0.50],[0.23,0.47],[0.39,0.42],[0.28,0.51],[0.24,0.49],[0.27,0.42],[0.26,0.45],[0.60,0.32],[0.32,0.43],[0.39,0.42],[0.28,0.50],[0.28,0.52],[0.61,0.32],[0.32,0.39],[0.24,0.50],[0.39,0.40],[0.33,0.36],[0.24,0.38],[0.54,0.33],[0.19,0.57],[0.61,0.32],[0.33,0.36],[0.19,0.57],[0.30,0.41],[0.19,0.57],[0.34,0.35],[0.24,0.42],[0.27,0.42],[0.54,0.34],[0.54,0.34],[0.24,0.49],[0.27,0.42],[0.56,0.39],[0.19,0.57],[0.20,0.50],[0.14,0.57],[0.30,0.41],[0.30,0.41],[0.33,0.36],[0.26,0.45],[0.26,0.45],[0.23,0.47],[0.32,0.39],[0.27,0.53],[0.30,0.41],[0.20,0.46],[0.34,0.35],[0.34,0.35],[0.14,0.57],[0.46,0.38],[0.27,0.42],[0.36,0.44],[0.17,0.51],[0.60,0.32],[0.27,0.42],[0.20,0.56],[0.24,0.49],[0.41,0.40],[0.61,0.38],[0.19,0.57],[0.28,0.50],[0.23,0.52],[0.61,0.32],[0.39,0.46],[0.33,0.51],[0.19,0.57],[0.39,0.44],[0.56,0.39],[0.35,0.35],[0.28,0.43],[0.54,0.34],[0.36,0.44],[0.14,0.57],[0.61,0.38],[0.46,0.38],[0.61,0.32],[0.19,0.57],[0.54,0.34],[0.27,0.53],[0.33,0.51],[0.31,0.51],[0.59,0.29],[0.24,0.42],[0.28,0.43],[0.56,0.39],[0.28,0.50],[0.61,0.32],[0.29,0.48],[0.20,0.46],[0.50,0.29],[0.56,0.39],[0.20,0.50],[0.24,0.38],[0.32,0.39],[0.32,0.43],[0.28,0.50],[0.22,0.44],[0.20,0.56],[0.27,0.42],[0.61,0.38],[0.31,0.49],[0.20,0.46],[0.27,0.42],[0.24,0.38],[0.61,0.32],[0.26,0.45],[0.23,0.44],[0.59,0.30],[0.56,0.39],[0.33,0.44],[0.27,0.42],[0.31,0.51],[0.27,0.53],[0.32,0.39],[0.28,0.51],[0.30,0.42],[0.46,0.38],[0.27,0.42],[0.30,0.47],[0.39,0.40],[0.28,0.43],[0.30,0.42],[0.32,0.39],[0.59,0.31],[0.36,0.44],[0.54,0.34],[0.34,0.35],[0.30,0.41],[0.32,0.49],[0.32,0.43],[0.31,0.51],[0.32,0.52],[0.60,0.32],[0.19,0.57],[0.41,0.47],[0.32,0.39],[0.28,0.43],[0.28,0.51],[0.32,0.51],[0.56,0.39],[0.24,0.45],[0.55,0.31],[0.24,0.43],[0.61,0.38],[0.33,0.51],[0.30,0.41],[0.32,0.47],[0.32,0.38],[0.33,0.51],[0.39,0.40],[0.19,0.57],[0.27,0.42],[0.54,0.33],[0.59,0.29],[0.28,0.51],[0.61,0.38],[0.19,0.57],[0.30,0.41],[0.14,0.57],[0.32,0.39],[0.34,0.35],[0.54,0.34],[0.24,0.54],[0.56,0.39],[0.24,0.49],[0.61,0.32],[0.61,0.38],[0.61,0.32],[0.19,0.57],[0.14,0.57],[0.54,0.34],[0.59,0.29],[0.28,0.43],[0.19,0.57],[0.61,0.32],[0.32,0.43],[0.29,0.48],[0.56,0.39],[0.19,0.57],[0.56,0.39],[0.59,0.29],[0.59,0.29],[0.59,0.30],[0.14,0.57],[0.23,0.44],[0.28,0.50],[0.29,0.48],[0.31,0.45],[0.27,0.51],[0.24,0.45],[0.61,0.38],[0.24,0.49],[0.14,0.57],[0.61,0.32],[0.39,0.40],[0.33,0.44],[0.54,0.33],[0.33,0.51],[0.20,0.50],[0.19,0.57],[0.25,0.50],[0.28,0.43],[0.17,0.51],[0.19,0.57],[0.27,0.42],[0.20,0.56],[0.24,0.38],[0.19,0.57],[0.28,0.50],[0.28,0.50],[0.27,0.42],[0.26,0.45],[0.39,0.42],[0.23,0.47],[0.28,0.43],[0.32,0.39],[0.32,0.39],[0.24,0.54],[0.33,0.36],[0.29,0.53],[0.27,0.42],[0.44,0.41],[0.27,0.42],[0.33,0.36],[0.24,0.43],[0.61,0.38],[0.20,0.50],[0.55,0.31],[0.31,0.46],[0.60,0.32],[0.30,0.41],[0.41,0.47],[0.39,0.40],[0.27,0.53],[0.61,0.38],[0.46,0.38],[0.28,0.43],[0.44,0.41],[0.35,0.35],[0.24,0.49],[0.31,0.43],[0.27,0.42],[0.61,0.38],[0.29,0.48],[0.54,0.34],[0.61,0.32],[0.20,0.56],[0.24,0.49],[0.39,0.40],[0.27,0.42],[0.59,0.29],[0.59,0.29],[0.19,0.57],[0.24,0.54],[0.59,0.31],[0.24,0.38],[0.33,0.51],[
0.23,0.44],[0.20,0.46],[0.24,0.45],[0.29,0.48],[0.28,0.50],[0.61,0.32],[0.19,0.57],[0.22,0.44],[0.19,0.57],[0.39,0.44],[0.19,0.57],[0.28,0.50],[0.30,0.41],[0.44,0.41],[0.28,0.52],[0.28,0.43],[0.54,0.33],[0.28,0.50],[0.19,0.57],[0.14,0.57],[0.30,0.41],[0.26,0.45],[0.56,0.39],[0.27,0.51],[0.20,0.46],[0.24,0.38],[0.32,0.38],[0.26,0.45],[0.61,0.32],[0.59,0.29],[0.19,0.57],[0.43,0.45],[0.14,0.57],[0.35,0.35],[0.56,0.39],[0.34,0.35],[0.19,0.57],[0.56,0.39],[0.27,0.42],[0.19,0.57],[0.60,0.32],[0.24,0.54],[0.54,0.34],[0.61,0.38],[0.33,0.51],[0.27,0.42],[0.32,0.39],[0.34,0.35],[0.20,0.56],[0.26,0.45],[0.32,0.51],[0.33,0.51],[0.35,0.35],[0.31,0.43],[0.56,0.39],[0.59,0.29],[0.28,0.43],[0.30,0.42],[0.27,0.44],[0.28,0.53],[0.29,0.48],[0.33,0.51],[0.60,0.32],[0.54,0.33],[0.19,0.57],[0.33,0.49],[0.30,0.41],[0.54,0.34],[0.27,0.53],[0.19,0.57],[0.19,0.57],[0.32,0.39],[0.20,0.56],[0.35,0.35],[0.30,0.42],[0.46,0.38],[0.54,0.34],[0.54,0.34],[0.14,0.57],[0.33,0.51],[0.32,0.39],[0.14,0.57],[0.59,0.29],[0.59,0.31],[0.30,0.41],[0.26,0.45],[0.32,0.38],[0.32,0.39],[0.59,0.31],[0.20,0.56],[0.20,0.46],[0.29,0.48],[0.59,0.29],[0.39,0.40],[0.28,0.50],[0.32,0.39],[0.28,0.53],[0.44,0.41],[0.20,0.50],[0.24,0.49],[0.20,0.46],[0.28,0.52],[0.24,0.50],[0.32,0.43],[0.39,0.40],[0.38,0.44],[0.60,0.32],[0.54,0.33],[0.61,0.32],[0.19,0.57],[0.59,0.29],[0.33,0.49],[0.28,0.43],[0.24,0.38],[0.30,0.41],[0.27,0.51],[0.35,0.48],[0.61,0.32],[0.43,0.45],[0.20,0.50],[0.24,0.49],[0.20,0.50],[0.20,0.56],[0.29,0.48],[0.14,0.57],[0.14,0.57],[0.26,0.45],[0.26,0.45],[0.39,0.40],[0.33,0.36],[0.56,0.39],[0.59,0.29],[0.27,0.42],[0.35,0.35],[0.30,0.41],[0.20,0.50],[0.19,0.57],[0.29,0.48],[0.39,0.42],[0.37,0.45],[0.30,0.41],[0.20,0.56],[0.30,0.42],[0.41,0.47],[0.28,0.43],[0.14,0.57],[0.27,0.53],[0.32,0.39],[0.30,0.41],[0.34,0.35],[0.32,0.47],[0.33,0.51],[0.20,0.56],[0.56,0.39],[0.60,0.32],[0.28,0.52],[0.56,0.39],[0.44,0.41],[0.27,0.42],[0.00,1.00],[0.29,0.49],[0.89,0.06],[0.22,0.66],[0.18,0.70],[0.67,0.22],[0.14,0.79],[0.58,0.17],[0.67,0.12],[0.95,0.05],[0.46,0.26],[0.15,0.54],[0.16,0.67],[0.48,0.31],[0.41,0.29],[0.18,0.66],[0.10,0.71],[0.11,0.72],[0.65,0.15],[0.94,0.03],[0.17,0.67],[0.44,0.29],[0.32,0.38],[0.79,0.10],[0.52,0.26],[0.25,0.59],[0.89,0.04],[0.69,0.13],[0.43,0.34],[0.75,0.07],[0.16,0.65],[0.02,0.70],[0.38,0.33],[0.57,0.23],[0.75,0.07],[0.25,0.58],[0.94,0.02],[0.55,0.22],[0.58,0.17],[0.14,0.79],[0.20,0.56],[0.10,0.88],[0.15,0.79],[0.11,0.77],[0.67,0.22],[0.07,0.87],[0.43,0.33],[0.08,0.84],[0.05,0.67],[0.07,0.77],[0.17,0.68],[1.00,0.00],[0.15,0.79],[0.08,0.77],[0.16,0.67],[0.69,0.13],[0.07,0.87],[0.15,0.54],[0.55,0.19],[0.14,0.63],[0.75,0.18],[0.25,0.63],[0.83,0.05],[0.55,0.50],[0.86,0.04],[0.73,0.18],[0.44,0.32],[0.70,0.15],[0.89,0.06],[0.17,0.67],[0.61,0.12],[0.55,0.50],[0.36,0.56],[0.03,0.86],[0.09,0.82],[0.09,0.82],[0.09,0.83],[0.17,0.68],[0.88,0.03],[0.64,0.22],[0.08,0.85],[0.74,0.16],[0.47,0.28],[0.05,0.84],[0.14,0.54],[0.01,0.93],[0.77,0.16],[0.17,0.60],[0.64,0.22],[0.84,0.05],[0.85,0.03],[0.23,0.67],[0.20,0.69],[0.00,0.87],[0.14,0.77],[0.11,0.69],[0.17,0.67],[0.56,0.27],[0.14,0.67],[0.37,0.31],[0.11,0.69],[0.35,0.52],[0.53,0.27],[0.50,0.21],[0.25,0.64],[0.36,0.56],[0.39,0.26],[0.02,0.83],[0.41,0.29],[0.07,0.77],[0.16,0.63],[0.92,0.03],[0.10,0.71],[0.83,0.05],[0.42,0.27],[0.62,0.12],[0.23,0.60],[0.19,0.61],[0.69,0.19],[0.21,0.65],[0.67,0.19],[0.18,0.69],[0.44,0.29],[0.14,0.65],[0.73,0.18],[0.15,0.66],[0.44,0.34],[0.74,0.10],[0.18,0.69],[0.25,0.61],[0.52,0.23],[0.06,0.82],[0.52,0.29],[0.22,0.68],[0.46,0.26],[0.14,0.54],[0.78,0.07],[0.8
0,0.05],[0.15,0.67],[0.10,0.82],[0.56,0.27],[0.64,0.22],[0.87,0.06],[0.14,0.66],[0.10,0.84],[0.88,0.05],[0.02,0.81],[0.62,0.15],[0.13,0.68],[0.50,0.28],[0.11,0.62],[0.46,0.32],[0.56,0.28],[0.43,0.28],[0.12,0.83],[0.11,0.80],[0.10,0.83],[0.90,0.04],[0.17,0.65],[0.15,0.63],[0.72,0.15],[0.64,0.26],[0.84,0.06],[0.09,0.83],[0.16,0.68],[0.09,0.63],[0.43,0.29],[0.88,0.05],[0.20,0.69],[0.73,0.09],[0.61,0.20],[0.67,0.13],[0.08,0.85],[0.73,0.16],[0.89,0.05],[0.41,0.25],[0.61,0.23],[0.58,0.22],[0.03,0.84],[0.58,0.24],[0.48,0.30],[0.25,0.54],[0.23,0.63],[0.41,0.46],[0.84,0.06],[0.45,0.29],[0.09,0.55],[0.54,0.26],[0.11,0.82],[0.69,0.18],[0.43,0.45],[0.43,0.28],[0.45,0.32],[0.07,0.78],[0.26,0.64],[0.92,0.04],[0.12,0.66],[0.32,0.51],[0.28,0.59],[0.70,0.18]])
x = data[:,0]
y = data[:,1]

def func(x, a, b, c):
    return a * np.exp(-b*x) + c
popt, pcov = curve_fit(func, x, y)
a, b, c = popt
x_line = np.arange(min(x), max(x), 0.01)
x_line =np.reshape(x_line,(-1,1))
y_line = func(x_line, a, b, c)
y_line = np.reshape(y_line,(-1,1))
plt.scatter(x,y)
plt.plot(x_line,y_line)
plt.show()
As you can see, the fit deviates towards high x values (see the example plot). I know there are numerous similar questions out there, and I have read many, but my math skills are not phenomenal, so I am struggling to come up with a better solution for my particular problem.
I'm not tied to the exponential function - can anyone suggest something better?
I need to do this semi-automatically for hundreds of datasets, so ideally I want something as flexible as possible.
Any help greatly appreciated!
p.s. I am sorry about posting such a large sample dataset - but I figured this kind of question necessitates the actual data, and I didn't want to post links to suspicious looking files.. =)
This is not an optimal solution, but this should work for any kind of density distribution in your data. The idea is to resample the data a given number of times by computing local averages along the x-axis to have evenly distributed points.
#!/usr/bin/python3.6
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
def my_floor(a, precision=0):
    return np.round(a - 0.5 * 10**(-precision), precision)
data = np.array(...)  # the same data array as in the question above
x = data[:,0]
y = data[:,1]

#---------------------------ADD THIS---------------------------
# Define how to resample the data
n_bins = 20  # choose how many samples to use
bin_size = (max(x) - min(x)) / n_bins

# Prepare empty arrays for resampled x and y
x_res, y_res = [], []

# Resample the data with consistent density
for i in range(n_bins - 1):
    lower = x >= min(x) + i * bin_size
    higher = x < min(x) + (i + 1) * bin_size
    x_res.append(np.mean(x[np.where(lower & higher)]))
    y_res.append(np.mean(y[np.where(lower & higher)]))
#------------------------------------------------------

def func(x, a, b, c):
    return a * np.exp(-b*x) + c
popt, pcov = curve_fit(func, x_res, y_res)
a, b, c = popt
x_line = np.arange(min(x), max(x), 0.01)
x_line = np.reshape(x_line,(-1,1))
y_line = func(x_line, a, b, c)
y_line = np.reshape(y_line,(-1,1))
plt.scatter(x,y,alpha=0.5)
plt.scatter(x_res, y_res)
plt.plot(x_line,y_line, c='red')
plt.show()
Which gives the output:
curve_fit does not give you a great deal of control over the fit; you may want to look at the much more general, but somewhat more complicated to use, least_squares, where you can control a lot of things:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.least_squares.html
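For reference, a minimal least_squares version of the same exponential fit; the x0 starting point and the robust soft_l1 loss are illustrative choices, not tuned values:
from scipy.optimize import least_squares

def residuals(params, x, y):
    a, b, c = params
    return a * np.exp(-b * x) + c - y

# loss='soft_l1' down-weights outliers; f_scale sets where that kicks in
res = least_squares(residuals, x0=[1.0, 1.0, 0.0], args=(x, y),
                    loss='soft_l1', f_scale=0.1)
a, b, c = res.x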
curve_fit does, however, give you a sigma parameter which allows you to weight the 'uncertainty' in your points. The trick here is to assign lower uncertainty to the points around x=1, where the fit is poor; by giving them lower uncertainty, the fitter will try harder to fit them.
After some experimenting, replacing your curve_fit line with
uncertainty = np.exp(-5*x*x)
popt, pcov = curve_fit(func, x, y, sigma=uncertainty)
I got the following fit.
You can try to improve this by playing with the uncertainty vector above.

Non-linear least-square regression in Python

I have to calculate a non-linear least-squares regression for my ~30 data points, following the formula y = A*x / (1 - x/C).
I tried the curve_fit function out of scipy.optimize using the following code
def func(x, p1, p2):
    return p1*x/(1-x/p2)
popt, pcov = curve_fit(func, CSV[:,1], CSV[:,0])
p1 = popt[0]
p2 = popt[1]
with p1 and p2 being equivalent to A and C, respectively, and CSV being my data array. The function runs without an error message, but the outcome is not as expected. I've plotted the outcome of the function together with the original data points. I was not expecting this nearly straight line (red line in the plot), but something closer to the green line, which is simply a second-order polynomial fit from Excel. The green dashed line shows just a quick manual attempt to get closer to the polynomial fit.
Wrong calculation of the fit function, together with the original data points.
Does anyone have an idea how to make the calculation run as I want it to?
Your code is fine. The data, though, is not easy to fit to: there are too few points on the right side of the chart and too much noise on the left-hand side. This is why curve_fit fails.
Some ways to improve the solution could be:
raising the maxfev parameter for curve_fit() (see here)
giving starting values to curve_fit() (see the same place)
adding more data points
using more parameters in the function, or a different function
curve_fit() may not be the strongest tool; see if you can get better results with other regression-type tools.
Below is the best I could get with your initial data and formula:
import pandas as pd
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

df = pd.read_csv("c:\\temp\\data.csv", header=None, dtype='float')
df.columns = ('x', 'y')

def func(x, p1, p2):
    return p1*x/(1-x/p2)

popt, pcov = curve_fit(func, df.x, df.y, maxfev=3000)
print('p1,p2:', popt)
p1, p2 = popt
y_pred = [p1*x/(1-x/p2) for x in range(0, 140, 5)]
plt.scatter(df.x, df.y)
plt.scatter(range(0, 140, 5), y_pred)
plt.show()
p1,p2: [-8.60771432e+02 1.08755430e-05]
I think I've figured out the best way to solve this problem, by using the lmfit package (https://lmfit.github.io/lmfit-py/). It worked best when I fit the non-linear least-squares regression not to the original data but to the fitting function provided by Excel (not very elegant, though).
from lmfit import Model
import matplotlib.pyplot as plt
import numpy as np
def func(x, o1, o2):
    return o1*x/(1-x/o2)
xt = np.arange(0, 0.12, 0.005)
yt = 2.2268*np.exp(40.755*xt)
model = Model(func)
result = model.fit(yt, x=xt, o1=210, o2=0.118)
print(result.fit_report())
plt.plot(xt, yt, 'bo')
plt.plot(xt, result.init_fit, 'k--', label='initial fit')
plt.plot(xt, result.best_fit, 'r-', label='best fit')
plt.legend(loc='best')
plt.show()
The results look pretty nice, and the package is really easy to use (I've left out the final plot):
[[Fit Statistics]]
# fitting method = leastsq
# function evals = 25
# data points = 24
# variables = 2
chi-square = 862.285318
reduced chi-square = 39.1947872
Akaike info crit = 89.9567771
Bayesian info crit = 92.3128848
[[Variables]]
o1: 310.243771 +/- 12.7126811 (4.10%) (init = 210)
o2: 0.13403974 +/- 0.00120453 (0.90%) (init = 0.118)
[[Correlations]] (unreported correlations are < 0.100)
C(o1, o2) = 0.930

Gaussian data fit varying depending on position of x data

I am having a hard time trying to understand why my Gaussian fit to a set of data (ydata) does not work well if I shift the interval of x-values corresponding to that data (xdata1 to xdata2). The Gaussian is written as
f(x) = A * exp(-(x - mean)^2 / (2*sigma^2)) / (sigma * sqrt(2*pi))
where A is just an amplitude factor. By changing some of the values of the data, it is easy to make it work for both cases, but one can also easily find cases in which it does not work well for xdata1 and in which the covariance of the parameters is not estimated.
I am using scipy.optimize.curve_fit in Spyder with Python 3.7.1 on Windows 7.
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
xdata1 = np.linspace(-9,4,20, endpoint=True) # works fine
xdata2 = xdata1+2
ydata = np.array([8,9,15,12,14,20,24,40,54,94,160,290,400,420,300,130,40,10,8,4])
def gaussian(x, amp, mean, sigma):
    return amp*np.exp(-(((x-mean)**2)/(2*sigma**2)))/(sigma*np.sqrt(2*np.pi))
popt1, pcov1 = curve_fit(gaussian, xdata1, ydata)
popt2, pcov2 = curve_fit(gaussian, xdata2, ydata)
fig, ([ax1, ax2]) = plt.subplots(nrows=1, ncols=2,figsize=(9, 4))
ax1.plot(xdata1, ydata, 'b+:', label='xdata1')
ax1.plot(xdata1, gaussian(xdata1, *popt1), 'r-', label='fit')
ax1.legend()
ax2.plot(xdata2, ydata, 'b+:', label='xdata2')
ax2.plot(xdata2, gaussian(xdata2, *popt2), 'r-', label='fit')
ax2.legend()
The problem is that your second attempt at fitting a Gaussian is getting stuck in a local minimum while searching parameter space: curve_fit is a wrapper for least_squares, which uses a local gradient-based optimizer to minimize the cost function, and this is liable to get stuck in local minima.
You should try providing reasonable starting parameters (by using the p0 argument of curve_fit) to avoid this:
# ... your code
y_max = np.max(ydata)
max_pos = xdata2[ydata == y_max][0]  # x position of the peak
initial_guess = [y_max, max_pos, 1]  # amplitude, mean, std
popt2, pcov2 = curve_fit(gaussian, xdata2, ydata, p0=initial_guess)
Which, as you can see, provides a reasonable fit:
You should write a function which can provide reasonable estimates of the starting parameters. Here I just found the maximum y value and used it to determine the initial parameters. I've found this works well for fitting normal distributions, but you could consider other methods; a sketch of one such helper follows.
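For instance, a small sketch of such an estimator (treating the y values as weights along x to get a width estimate is my assumption, not necessarily the best method):
def estimate_gaussian_p0(x, y):
    """Rough (amp, mean, sigma) starting values from the data."""
    amp = np.max(y)
    mean = x[np.argmax(y)]  # x position of the highest point
    sigma = np.sqrt(np.sum(y * (x - mean)**2) / np.sum(y))  # weighted spread
    return [amp, mean, sigma]

popt2, pcov2 = curve_fit(gaussian, xdata2, ydata, p0=estimate_gaussian_p0(xdata2, ydata))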
Edit:
You can also solve the problem by scaling the amplitude: the amplitude is so large that the parameter space is distorted, and the gradient descent simply follows the direction of greatest change in the amplitude, effectively ignoring sigma. Consider the following plot in parameter space (colour is the sum of the squared residuals of the fit for the given parameters, and the white cross shows the optimal solution):
Make sure to note the different scales for the x and y axes.
One needs to make a large number of 'unit'-sized steps in y (amplitude) to get to the minimum from the point (x, y) = (0, 0), whereas you need less than one 'unit'-sized step to get to the minimum in x (sigma). The algorithm simply takes steps in amplitude, as this is the steepest gradient. When it reaches the amplitude which minimises the cost function, it stops there, as it appears to have converged, and makes little or no change in the sigma parameter.
One way to fix this is to scale your ydata to un-distort the parameter space: divide your ydata by 100 and you will see your fit works without providing any starting parameters!
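A minimal version of that rescaling; since the gaussian above is linear in the amplitude, that is the only parameter that needs scaling back afterwards:
scale = 100.0
popt2, pcov2 = curve_fit(gaussian, xdata2, ydata / scale)  # now converges without p0
popt2[0] *= scale  # undo the scaling on the fitted amplitude; mean and sigma are unchanged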

Python Spinmob curve_fit works but fitter does not

I'm trying to fit data with a Gaussian.
The raw data itself displays a very obvious peak.
When I attempt fitting using curve_fit, the fit identifies the peak but it does not have a curved top.
I am trying to fit the data now with spinmob's fitter as well. However, this fitting just gives a straight line.
I've tried changing several parameters of the fitter, the Gaussian function definition, and the initial parameters for the fit but nothing seems to work.
Here is the code:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
import spinmob as s

x = x30      # x30 and ydata are defined earlier in my script
y = ydata

def gaussian(x, A, mu, sig):  # see http://mathworld.wolfram.com/GaussianFunction.html
    return A/(sig * np.sqrt(2*np.pi)) * np.exp(-np.power(x-mu, 2) / (2 * np.power(sig, 2)))
popt,pcov = curve_fit(gaussian,x,y,p0=[1,7.688,0.005])
FWHM = 2*np.sqrt(2*np.log(2))*popt[2]
print("FWHM: {}".format(FWHM))
plt.plot(x,y,'bo',label='data')
plt.plot(x,gaussian(x,*popt),'r+-',label='fit')
plt.legend()
fitter = s.data.fitter()
fitter.set(subtract_bg=True, plot_guess_zoom=True)
fitter.set_functions(f=gaussian, p='A=1,mu=8.688,sig=0.001')
fitter.set_data(x, y, eydata = 0.03)
fitter.fit()
The curve_fit returns this plot:
Curve_fit plot
The spinmob fitter plot gives this:
Spinmob Fitter Plot
Assuming that spinmob actually uses scipy.curve_fit under the hood, I would guess (sorry) that the problem is that the initial values you give to it are so far off that it cannot possibly find a solution.
For sure, A=1 is not a very good guess for either scipy.curve_fit() or spinmob.fitter(). The peak is definitely negative, and you should be guessing a value more like -0.1 than +1. In fact you could probably assert that A must be < 0.
The initial value of 7.688 for mu that you give to curve_fit() is pretty good, and will allow a solution. I do not know whether it is a typo or not, but the initial value of 8.688 for mu that you give to spinmob.fitter() is very far off (that is, way outside the data range), and the fit will never be able to refine its way to the correct solution from there.
Initial values matter for curve-fitting and poor initial values can lead to bad results.
It might be viewed by some as a shameless plug, but allow me to encourage you to try lmfit (https://lmfit.github.io/lmfit-py/) (I am a lead author) for this kind of problem. Lmfit replaces the array of parameter values with named Parameter objects for better organization of fits. It also has a built-in Gaussian model (which also calculates FWHM, including an uncertainty). That is, with Lmfit, your script might look like:
import numpy as np
import matplotlib.pyplot as plt
from lmfit.models import GaussianModel
from lmfit.lineshapes import gaussian
# create fake data that looks like yours
xdata = 7.670 + np.arange(41)*0.0010
ydata = gaussian(xdata, amplitude=-0.196, center=7.6881, sigma=0.001)
ydata += np.random.normal(size=41, scale=10.0)
# create gaussian model
gmodel = GaussianModel()
# fit data, giving initial values for amplitude, center, and sigma
result = gmodel.fit(ydata, x=xdata, amplitude=-0.1, center=7.688, sigma=0.005)
# show results
print(result.fit_report())
plt.plot(xdata, ydata, 'bo', label='data')
plt.plot(xdata, result.best_fit, 'r+-', label='fit')
plt.legend()
plt.show()
This will print out a report like
[[Model]]
Model(gaussian)
[[Fit Statistics]]
# fitting method = leastsq
# function evals = 21
# data points = 41
# variables = 3
chi-square = 5114.87632
reduced chi-square = 134.602009
Akaike info crit = 203.879794
Bayesian info crit = 209.020510
[[Variables]]
sigma: 9.7713e-04 +/- 1.5456e-04 (15.82%) (init = 0.005)
center: 7.68822727 +/- 1.5484e-04 (0.00%) (init = 7.688)
amplitude: -0.19273945 +/- 0.02643400 (13.71%) (init = -0.1)
fwhm: 0.00230096 +/- 3.6396e-04 (15.82%) == '2.3548200*sigma'
height: -78.6917624 +/- 10.7894236 (13.71%) == '0.3989423*amplitude/max(1.e-15, sigma)'
[[Correlations]] (unreported correlations are < 0.100)
C(sigma, amplitude) = -0.577
and produce a plot of data and best fit like
which should be close to what you are trying to do.

How to do linear regression, taking errorbars into account?

I am doing a computer simulation for some physical system of finite size, and after this I am doing extrapolation to the infinity (Thermodynamic limit). Some theory says that data should scale linearly with system size, so I am doing linear regression.
The data I have is noisy, but for each data point I can estimate errorbars. So, for example data points looks like:
x_list = [0.3333333333333333, 0.2886751345948129, 0.25, 0.23570226039551587, 0.22360679774997896, 0.20412414523193154, 0.2, 0.16666666666666666]
y_list = [0.13250359351851854, 0.12098339583333334, 0.12398501145833334, 0.09152715, 0.11167239583333334, 0.10876248333333333, 0.09814170444444444, 0.08560799305555555]
y_err = [0.003306749165349316, 0.003818446389148108, 0.0056036878203831785, 0.0036635292592592595, 0.0037034897788415424, 0.007576672222222223, 0.002981084130692832, 0.0034913019065973983]
Let's say I am trying to do this in Python.
First way that I know is:
m, c, r_value, p_value, std_err = scipy.stats.linregress(x_list, y_list)
I understand this gives me errorbars of the result, but this does not take into account errorbars of the initial data.
Second way that I know is:
c, m = numpy.polynomial.polynomial.polyfit(x_list, y_list, 1, w=[1.0/ty for ty in y_err], full=False)  # coefficients are returned lowest degree first
Here we use the inverse of the errorbar for each point as a weight in the least-squares approximation. So if a point is not really that reliable, it will not influence the result a lot, which is reasonable.
But I cannot figure out how to get something that combines both of these methods.
What I really want is what second method does, meaning use regression when every point influences the result with different weight. But at the same time I want to know how accurate my result is, meaning, I want to know what are errorbars of the resulting coefficients.
How can I do this?
Not entirely sure if this is what you mean, but… using pandas, statsmodels, and patsy, we can compare an ordinary least-squares fit and a weighted least-squares fit which uses the inverse of the noise you provided as a weight matrix (statsmodels will complain about sample sizes < 20, by the way).
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
mpl.rcParams['figure.dpi'] = 300
import statsmodels.formula.api as sm
x_list = [0.3333333333333333, 0.2886751345948129, 0.25, 0.23570226039551587, 0.22360679774997896, 0.20412414523193154, 0.2, 0.16666666666666666]
y_list = [0.13250359351851854, 0.12098339583333334, 0.12398501145833334, 0.09152715, 0.11167239583333334, 0.10876248333333333, 0.09814170444444444, 0.08560799305555555]
y_err = [0.003306749165349316, 0.003818446389148108, 0.0056036878203831785, 0.0036635292592592595, 0.0037034897788415424, 0.007576672222222223, 0.002981084130692832, 0.0034913019065973983]
# put x and y into a pandas DataFrame, and the weights into a Series
ws = pd.DataFrame({
'x': x_list,
'y': y_list
})
weights = pd.Series(y_err)
wls_fit = sm.wls('x ~ y', data=ws, weights=1 / weights).fit()
ols_fit = sm.ols('x ~ y', data=ws).fit()
# show the fit summary by calling wls_fit.summary()
# wls fit r-squared is 0.754
# ols fit r-squared is 0.701
# let's plot our data
plt.clf()
fig = plt.figure()
ax = fig.add_subplot(111, facecolor='w')
ws.plot(
kind='scatter',
x='x',
y='y',
style='o',
alpha=1.,
ax=ax,
title='x vs y scatter',
edgecolor='#ff8300',
s=40
)
# weighted prediction
wp, = ax.plot(
wls_fit.predict(),
ws['y'],
color='#e55ea2',
lw=1.,
alpha=1.0,
)
# unweighted prediction
op, = ax.plot(
ols_fit.predict(),
ws['y'],
color='k',
ls='solid',
lw=1,
alpha=1.0,
)
leg = plt.legend(
(op, wp),
('Ordinary Least Squares', 'Weighted Least Squares'),
loc='upper left',
fontsize=8)
plt.tight_layout()
fig.set_size_inches(6.40, 5.12)
plt.show()
WLS residuals:
[0.025624005084707302,
0.013611438189866154,
-0.033569595462217161,
0.044110895217014695,
-0.025071632845910546,
-0.036308252199571928,
-0.010335514810672464,
-0.0081511479431851663]
The mean squared error of the residuals for the weighted fit (wls_fit.mse_resid or wls_fit.scale) is 0.22964802498892287, and the r-squared value of the fit is 0.754.
You can obtain a wealth of data about the fits by calling their summary() method, and/or doing dir(wls_fit), if you need a list of every available property and method.
I wrote a concise function to perform the weighted linear regression of a data set, which is a direct translation of GSL's "gsl_fit_wlinear" function. This is useful if you want to know exactly what your function is doing when it performs the fit.
import numpy as np

def wlinear_fit(x, y, w):
    """
    Fit (x,y,w) to a linear function, using exact formulae for weighted linear
    regression. This code was translated from the GNU Scientific Library (GSL);
    it is an exact copy of the function gsl_fit_wlinear.
    """
    # compute the weighted means and weighted deviations from the means
    # wm denotes a "weighted mean", wm(f) = (sum_i w_i f_i) / (sum_i w_i)
    W = np.sum(w)
    wm_x = np.average(x, weights=w)
    wm_y = np.average(y, weights=w)
    dx = x - wm_x
    dy = y - wm_y
    wm_dx2 = np.average(dx**2, weights=w)
    wm_dxdy = np.average(dx*dy, weights=w)
    # In terms of y = a + b x
    b = wm_dxdy / wm_dx2
    a = wm_y - wm_x*b
    cov_00 = (1.0/W) * (1.0 + wm_x**2/wm_dx2)
    cov_11 = 1.0 / (W*wm_dx2)
    cov_01 = -wm_x / (W*wm_dx2)
    # Compute chi^2 = \sum w_i (y_i - (a + b * x_i))^2
    chi2 = np.sum(w * (y - (a + b*x))**2)
    return a, b, cov_00, cov_11, cov_01, chi2
To perform your fit, you would do (converting the lists to NumPy arrays first, so the element-wise arithmetic on y_err works):
a, b, cov_00, cov_11, cov_01, chi2 = wlinear_fit(np.array(x_list), np.array(y_list), 1.0/np.array(y_err)**2)
This returns the best estimates for the coefficients a (the intercept) and b (the slope) of the linear regression, along with the elements of the covariance matrix: cov_00, cov_01, and cov_11. The best estimate of the error on a is then the square root of cov_00, and the one on b is the square root of cov_11. The weighted sum of the residuals is returned in the chi2 variable.
IMPORTANT: this function accepts inverse variances, not the inverse standard deviations as the weights for the data points.
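As a small usage sketch of that last point:
a_err = np.sqrt(cov_00)  # 1-sigma error on the intercept a
b_err = np.sqrt(cov_11)  # 1-sigma error on the slope b
print(f"a = {a:.5f} +/- {a_err:.5f}")
print(f"b = {b:.5f} +/- {b_err:.5f}")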
sklearn.linear_model.LinearRegression supports specification of weights during fit:
from sklearn.linear_model import LinearRegression

x_data = np.array(x_list).reshape(-1, 1)  # the model expects shape (n_samples, n_features)
y_data = np.array(y_list)
y_err = np.array(y_err)

model = LinearRegression()
model.fit(x_data, y_data, sample_weight=1/y_err)
Here the sample weight is specified as 1 / y_err. Different versions are possible, and often it's a good idea to clip these sample weights to a maximum value, in case y_err varies strongly or has small outliers:
sample_weight = 1 / y_err
sample_weight = np.minimum(sample_weight, MAX_WEIGHT)
where MAX_WEIGHT should be determined from your data (by looking at the y_err or 1 / y_err distributions; e.g., if they have outliers, they can be clipped).
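One simple heuristic (my assumption, not a general rule) is to cap the weights at a high percentile of their own distribution:
sample_weight = 1 / y_err
MAX_WEIGHT = np.percentile(sample_weight, 95)  # the 95th percentile is an arbitrary cutoff
sample_weight = np.minimum(sample_weight, MAX_WEIGHT)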
I found this document helpful in understanding and setting up my own weighted least squares routine (applicable for any programming language).
Typically learning and using optimized routines is the best way to go but there are times where understanding the guts of a routine is important.
