I need to do a simple curve fitting using scipy's curve_fit function. However, my data is in the form of a matrix. I can easily do this in numpy but I wanted to see the goodness of fit for scipy.
Problem:
AX = B --> given A, find X for least square error.
from scipy.optimize import curve_fit
def getXval():
a = 4; b = 3, c = 1;
f0 = a*pow(b, 2)*c
f1 = a*b/c
return [f0, f1]
def fit(x, a0, a1):
res = a0*x[0] + a1*x[1]
return [res]
x = getXval()
y = [0.15]
popt, pcov = curve_fit(fit, x, y)
This is, however, not working. Can someone point what is going on here?
Your code has a few problems.
1) Use numpy arrays instead of Python lists
2) your are missing values for y.
This works for me:
from scipy.optimize import curve_fit
import numpy as np
def getXval():
a = 4; b = 3; c = 1;
f0 = a*pow(b, 2)*c
f1 = a*b/c
return np.array([f0, f1])
def fit(x, a0, a1):
res = a0*x[0] + a1*x[1]
return np.array([res])
x = getXval()
y = np.array([0.15, 0.34])
popt, pcov = curve_fit(fit, x, y)
print popt, pcov
Related
I am trying to fit a curve to some data but the resulting curve looks like a scrambled mess. I don't know whether or not the coefficients are accurate. With this sample data set it prints something like a triangle and with my original data set it looks even worse. It's mostly tutorial. I tried removing the sympy code from an alternate tutorial, but doing so accomplished nothing.
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
import numpy as np
import sympy as sym
x = [0.0009425070688029959,
0.0009398496240601303,
0.0018779342723004293,
0.004694835680751241,
0.0009425070688029959,
0.004734848484848552,
0.0018993352326685255,
0.0009460737937558928]
y = [0.0028301886792453904,
0.003762935089369628,
0.001881467544684814,
0.0009433962264150743,
0.0028301886792453904,
0.0019029495718363059,
0.0038058991436727804,
0.0018939393939393534]
"""
Plot your data
"""
plt.plot(x, y, 'ro',label="Original Data")
"""
brutal force to avoid errors
"""
x = np.array(x, dtype=float) #transform your data in a numpy array of floats
y = np.array(y, dtype=float) #so the curve_fit can work
"""
create a function to fit with your data. a, b, c and d are the coefficients
that curve_fit will calculate for you.
In this part you need to guess and/or use mathematical knowledge to find
a function that resembles your data
"""
def func(x, b, c, d):
return b * x * x + c * x + d
"""
make the curve_fit
"""
popt, pcov = curve_fit(func, x, y)
"""
The result is:
popt[0] = a , popt[1] = b, popt[2] = c and popt[3] = d of the function,
so f(x) = popt[0]*x**3 + popt[1]*x**2 + popt[2]*x + popt[3].
"""
print("b = " + str(popt[0]) + " c = " + str(popt[1]) + " d = " + str(popt[2]))
"""t
Use sympy to generate the latex sintax of the function
"""
xs = sym.Symbol('\lambda')
tex = sym.latex(func(xs,*popt)).replace('$', '')
plt.title(r'$f(\lambda)= %s$' %(tex),fontsize=16)
"""
Print the coefficients and plot the funcion.
"""
plt.plot(x, func(x, *popt), label="Fitted Curve") #same as line above \/
#plt.plot(x, popt[0]*x**3 + popt[1]*x**2 + popt[2]*x + popt[3], label="Fitted Curve")
plt.legend(loc='upper left')
plt.show()
This is because Matplotlib will only draw lines between the few points in your original data (in the x and y arrays) and in the order they are defined. There are only 3 unique x values (plus some noise) which is why you see what looks like a triangle.
The fix is to create a new array with evenly spread, and ordered, x values across the range you're interested in. You can do that with the linspace function in numpy.
For example, try this for your second plot command:
x_eval = np.linspace(min(x), max(x), 100)
plt.plot(x_eval, func(x_eval, *popt), label="Fitted Curve")
x_eval above is a list of 100 evenly spread values between the minimum and maximum x value in your original data.
Looks like you need to sort on xdata.
Try inserting this:
x,y = zip(*sorted(zip(x, y)))
Such that
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
import numpy as np
import sympy as sym
x = [0.0009425070688029959,
0.0009398496240601303,
0.0018779342723004293,
0.004694835680751241,
0.0009425070688029959,
0.004734848484848552,
0.0018993352326685255,
0.0009460737937558928]
y = [0.0028301886792453904,
0.003762935089369628,
0.001881467544684814,
0.0009433962264150743,
0.0028301886792453904,
0.0019029495718363059,
0.0038058991436727804,
0.0018939393939393534]
"""
Plot your data
"""
plt.plot(x, y, 'ro',label="Original Data")
"""
brutal force to avoid errors
"""
x,y = zip(*sorted(zip(x, y)))
x = np.array(x, dtype=float) #transform your data in a numpy array of floats
y = np.array(y, dtype=float) #so the curve_fit can work
"""
create a function to fit with your data. a, b, c and d are the coefficients
that curve_fit will calculate for you.
In this part you need to guess and/or use mathematical knowledge to find
a function that resembles your data
"""
def func(x, b, c, d):
return b * x * x + c * x + d
"""
make the curve_fit
"""
popt, pcov = curve_fit(func, x, y)
"""
The result is:
popt[0] = a , popt[1] = b, popt[2] = c and popt[3] = d of the function,
so f(x) = popt[0]*x**3 + popt[1]*x**2 + popt[2]*x + popt[3].
"""
print("b = " + str(popt[0]) + " c = " + str(popt[1]) + " d = " + str(popt[2]))
"""t
Use sympy to generate the latex sintax of the function
"""
xs = sym.Symbol('\lambda')
tex = sym.latex(func(xs,*popt)).replace('$', '')
plt.title(r'$f(\lambda)= %s$' %(tex),fontsize=16)
"""
Print the coefficients and plot the funcion.
"""
plt.plot(x, func(x, *popt), label="Fitted Curve") #same as line above \/
#plt.plot(x, popt[0]*x**3 + popt[1]*x**2 + popt[2]*x + popt[3], label="Fitted Curve")
plt.legend(loc='upper left')
plt.show()
The plotted curve from the data above.
I have some data points to fit with a model. My model is not defined as an equation but as a numerical solution of 3 equations.
My model is defined as below:
def eq(q):
z1=q[0]
z2=q[1]
H=q[2]
F=empty((3))
F[0] = ((J*(1-(D*(1-(1-8*a*T/D**3)**(1/3)))**(b)/Lx))*sin(z1-z2))+(H*sin(z1-pi/4))+(((3.6*10**5)/2)*sin(2*z1))
F[1] = ((-J*(1-(D*(1-(1-8*a*T/D**3)**(1/3)))**(b)/Lx))*sin(z1-z2))+(H*sin(z2-pi/4))+(((3.6*10**5)/2)*sin(2*z2))
F[2] = cos(z1-pi/4)
return F
guess1=array([2.35,0.2,125000])
z=fsolve(eq,guess1)
Hc=z[2]*(1-(T/Tb)**(1/2))
that
D=10**(-8)
a=2.2*10**(-28)
Lx=4.28*10**(-9)
and J, b, Tb are parameters and z1, z2, H are variables
My data points are:
T=[10, 60, 110, 160, 210, 260, 300]
Hc=[0.58933, 0.5783, 0.57938, 0.58884, 0.60588, 0.62788, 0.6474]
how can I find J, b, Tb according to fitting model with data points?
You can use scipy.optimize.curve_fit. You need to understand that curve_fit will only care about the input and the output of your function, such that you need to define it this way :
def func(x, *params):
....
return y
Then you can apply curve_fit (https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html):
popt, pcov = curve_fit(func, x, y_data)
I wrote and example taking your case. There is 100 ways to write it more efficiently. However, I tried to make it clear so you can understand the link between what you gave and what I propose.
import numpy as np
from scipy.optimize import fsolve, curve_fit
def eq(q, T, J, b):
z1 = q[0]
z2 = q[1]
H = q[2]
D=10**(-8)
a=2.2*10**(-28)
Lx=4.28*10**(-9)
F = np.empty((3))
F[0] = ((J*(1-(D*(1-(1-8*a*T/D**3)**(1/3)))**(b)/Lx))*np.sin(z1-z2))+(H*np.sin(z1-np.pi/4))+(((3.6*10**5)/2)*np.sin(2*z1))
F[1] = ((-J*(1-(D*(1-(1-8*a*T/D**3)**(1/3)))**(b)/Lx))*np.sin(z1-z2))+(H*np.sin(z2-np.pi/4))+(((3.6*10**5)/2)*np.sin(2*z2))
F[2] = np.cos(z1-np.pi/4)
return F
def func(T,J,b,Tb):
Hc = []
guess1 = np.array([2.35,0.2,125000])
for t in T :
z = fsolve(eq, guess1, args = (t, J, b))
Hc.append(z[2]*(1-(t/Tb)**(1/2)))
return Hc
T = [10, 60, 110, 160, 210, 260, 300]
Hc_exp = [0.58933, 0.5783, 0.57938, 0.58884, 0.60588, 0.62788, 0.6474]
p0 = (1,1,100)
popt, pcov = curve_fit(func, T, Hc_exp, p0)
J = popt[0]
b = popt[1]
Tb = popt[2]
It seems that the fit is hard to be done. To improve that, you can add an initial guess to the parameters through p0, or adding bounds to the parameters (see curve_fit documentation). Once it converges, you can have the error on the estimated parameters.
Say I want to fit a sine function using scipy.optimize.curve_fit. I don't know any parameters of the function. To get the frequency, I do Fourier transform and guess all the other parameters - amplitude, phase, and offset. When running my program, I do get a fit but it does not make sense. What is the problem? Any help will be appreciated.
import numpy as np
import matplotlib.pyplot as plt
import scipy as sp
ampl = 1
freq = 24.5
phase = np.pi/2
offset = 0.05
t = np.arange(0,10,0.001)
func = np.sin(2*np.pi*t*freq + phase) + offset
fastfft = np.fft.fft(func)
freq_array = np.fft.fftfreq(len(t),t[0]-t[1])
max_value_index = np.argmax(abs(fastfft))
frequency = abs(freq_array[max_value_index])
def fit(a, f, p, o, t):
return a * np.sin(2*np.pi*t*f + p) + o
guess = (0.9, frequency, np.pi/4, 0.1)
params, fit = sp.optimize.curve_fit(fit, t, func, p0=guess)
a, f, p, o = params
fitfunc = lambda t: a * np.sin(2*np.pi*t*f + p) + o
plt.plot(t, func, 'r-', t, fitfunc(t), 'b-')
The main problem in your program was a misunderstanding, how scipy.optimize.curve_fit is designed and its assumption of the fit function:
ydata = f(xdata, *params) + eps
This means that the fit function has to have the array for the x values as the first parameter followed by the function parameters in no particular order and must return an array for the y values. Here is an example, how to do this:
import numpy as np
import matplotlib.pyplot as plt
import scipy.optimize
#t has to be the first parameter of the fit function
def fit(t, a, f, p, o):
return a * np.sin(2*np.pi*t*f + p) + o
ampl = 1
freq = 2
phase = np.pi/2
offset = 0.5
t = np.arange(0,10,0.01)
#is the same as fit(t, ampl, freq, phase, offset)
func = np.sin(2*np.pi*t*freq + phase) + offset
fastfft = np.fft.fft(func)
freq_array = np.fft.fftfreq(len(t),t[0]-t[1])
max_value_index = np.argmax(abs(fastfft))
frequency = abs(freq_array[max_value_index])
guess = (0.9, frequency, np.pi/4, 0.1)
#renamed the covariance matrix
params, pcov = scipy.optimize.curve_fit(fit, t, func, p0=guess)
a, f, p, o = params
#calculate the fit plot using the fit function
plt.plot(t, func, 'r-', t, fit(t, *params), 'b-')
plt.show()
As you can see, I have also changed the way the fit function for the plot is calculated. You don't need another function - just utilise the fit function with the parameter list, the fit procedure gives you back.
The other problem was that you called the covariance array fit - overwriting the previously defined function fit. I fixed that as well.
P.S.: Of course now you only see one curve, because the perfect fit covers your data points.
I want to fit the function f(x) = b + a / x to my data set. For that I found scipy leastsquares from optimize were suitable.
My code is as follows:
x = np.asarray(range(20,401,20))
y is distances that I calculated, but is an array of length 20, here is just random numbers for example
y = np.random.rand(20)
Initial guesses of the params a and b:
params = np.array([1,1])
Function to minimize
def funcinv(x):
return params[0]/x+params[1]
res = least_squares(funinv, params, args=(x, y))
Error given:
return np.atleast_1d(fun(x, *args, **kwargs))
TypeError: funinv() takes 1 positional argument but 3 were given
How can I fit my data?
To make a little of clarity. There are two related problems:
Minimizing a function
Fitting model to data
To fit a model to observed data is to find such parameters of a model which minimize some sort of error between model data and observed data.
least_squares method just minimizes a following function with respect to x (x can be a vector).
F(x) = 0.5 * sum(rho(f_i(x)**2), i = 0, ..., m - 1)
(rho is a loss function and default is rho(x) = x so don't mind it for now)
least_squares(func, x0) expects that call to func(x) will return a vector [a1, a2, a3, ...] for which a sum of squares will be computed: S = 0.5 * (a1^2 + a2^2 + a3^2 + ...).
least_squares will tweak x0 to minimize S.
Thus, in order to use it to fit model to data, one must construct a function of error between a model and actual data - residuals and then minimize that residuals function. In your case you can write it as follows:
import numpy as np
from scipy.optimize import least_squares
x = np.asarray(range(20,401,20))
y = np.random.rand(20)
params = np.array([1,1])
def funcinv(x, a, b):
return b + a/x
def residuals(params, x, data):
# evaluates function given vector of params [a, b]
# and return residuals: (observed_data - model_data)
a, b = params
func_eval = funcinv(x, a, b)
return (data - func_eval)
res = least_squares(residuals, params, args=(x, y))
This gives a result:
print(res)
...
message: '`gtol` termination condition is satisfied.'
nfev: 4
njev: 4 optimality: 5.6774618339971994e-10
status: 1
success: True
x: array([ 6.89518618, 0.37118815])
However, as a residuals function pretty much the same all the time (res = observed_data - model_data), there is a shortcut in scipy.optimize called curve_fit: curve_fit(func, xdata, ydata, x0). curve_fit builds residuals function automatically and you can simply write:
import numpy as np
from scipy.optimize import curve_fit
x = np.asarray(range(20,401,20))
y = np.random.rand(20)
params = np.array([1,1])
def funcinv(x, a, b):
return b + a/x
res = curve_fit(funcinv, x, y, params)
print(res) # ... array([ 6.89518618, 0.37118815]), ...
I am trying to do a linear fit of some data, but I cannot get curve_fit in Python to give me anything but a slope and y-intercept of 1. Here is an example of my code:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
def func(x, a, b):
return a*x + b
# This is merely a sample of some of my actual data
x = [290., 300., 310.]
y = [1.87e+21, 2.07e+21, 2.29e+21]
popt, pcov = curve_fit(func, x, y)
print popt
I have also tried giving curve_fit a "guess," but when I do that it gives me an overflow error, which I'm guessing is because the numbers are too large.
Another way of doing this without using curve_fit is to use numpy's polyfit.
import matplotlib.pyplot as plt
import numpy as np
# This is merely a sample of some of my actual data
x = [290., 300., 310.]
y = [1.87e+21, 2.07e+21, 2.29e+21]
xp = np.linspace(290, 310, 100)
z = np.polyfit(x, y, 1)
p = np.poly1d(z)
print (z)
fig, ax = plt.subplots()
ax.plot(x, y, '.')
ax.plot(xp, p(xp), '-')
plt.show()
This prints the coefficients as [2.10000000e+19 -4.22333333e+21] and produces the following graph:
I got something in the ballpark as Excel's linear fit by using scipy basinhopping instead of curve_fit with a large number of iterations. It takes a bit to run the iterations and it also requires an error function, but it was done without scaling the original data. Basinhopping docs.
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import basinhopping
def func( x0, x_data, y_data ):
error = 0
for x_val, y_val in zip(x_data, y_data):
error += (y_val - (x0[0]*x_val + x0[1]))**2
return error
x_data = [290., 300., 310.]
y_data = [1.87e+21, 2.07e+21, 2.29e+21]
a = 1
b = 1
x0 = [a, b]
minimizer_kwargs = { 'method': 'TNC', 'args': (x_data, y_data) }
res = basinhopping(func, x0, niter=1000000, minimizer_kwargs=minimizer_kwargs)
print res
This gives x: array([ 7.72723434e+18, -2.38554994e+20]) but if you try again, you'll see this has the problem of non-unique outcomes, although it will give similar ballpark values.
Here's a comparison of the fit with the Excel solution.
Confirmed correct results are returned using:
x = [290., 300., 310.]
y = [300., 301., 302.]
My guess is magnitudes ≅ 10²¹ are too large for the function to work well.
What you can try doing is taking the logarithm of both sides:
def func(x, a, b):
# might need to check if ≤ 0.0
return math.log(a*x + b)
# ... code omitted
y = [48.9802253837, 49.0818355602, 49.1828387704]
Then undo the transformation afterwards.
Also for simple linear approximation, there is an easy deterministic method.