I was wondering if there is a way I can use LMfit to fit datasets that have different sizes with the same model with a some shared and some independent parameters.
This link is the closest I found to a similar question but it assumes the same x for all y.
Thanks for all advice and input
Per your comment that a scipy example is OK:
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
y1 = np.array([ 16.00, 18.42, 20.84, 23.26])
y2 = np.array([-20.00, -25.50, -31.00, -36.50, -42.00])
comboY = np.append(y1, y2)
x1 = np.array([5.0, 6.1, 7.2, 8.3])
x2 = np.array([15.0, 16.1, 17.2, 18.3, 19.4])
comboX = np.append(x1, x2)
if len(y1) != len(x1):
raise(Exception('Unequal x1 and y1 data length'))
if len(y2) != len(x2):
raise(Exception('Unequal x2 and y2 data length'))
def function1(data, a, b, c): # not all parameters are used here, c is shared
return a * data + c
def function2(data, a, b, c): # not all parameters are used here, c is shared
return b * data + c
def combinedFunction(comboData, a, b, c):
# single data reference passed in, extract separate data
extract1 = comboData[:len(x1)] # first data
extract2 = comboData[len(x1):] # second data
result1 = function1(extract1, a, b, c)
result2 = function2(extract2, a, b, c)
return np.append(result1, result2)
# some initial parameter values
initialParameters = np.array([1.0, 1.0, 1.0])
# curve fit the combined data to the combined function
fittedParameters, pcov = curve_fit(combinedFunction, comboX, comboY, initialParameters)
# values for display of fitted function
a, b, c = fittedParameters
y_fit_1 = function1(x1, a, b, c) # first data set, first equation
y_fit_2 = function2(x2, a, b, c) # second data set, second equation
plt.plot(comboX, comboY, 'D') # plot the raw data
plt.plot(x1, y_fit_1) # plot the equation using the fitted parameters
plt.plot(x2, y_fit_2) # plot the equation using the fitted parameters
plt.show()
print('a, b, c:', fittedParameters)
Related
I was following a tutorial for data fitting, and when I just changed original data to my data the fit became not quadratic.
Here's my code, thanks a lot for help:
# fit a second degree polynomial to the economic data
import numpy as np
from numpy import arange
from pandas import read_csv
from scipy.optimize import curve_fit
from matplotlib import pyplot
x = np.array([1,2,3,4,5,6])
y = np.array([1,4,12,29,54,104])
# define the true objective function
def objective(x, a, b, c):
return a * x + b * x**2 + c
# load the dataset
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/longley.csv'
dataframe = read_csv(url, header=None)
data = dataframe.values
# choose the input and output variables
# curve fit
popt, _ = curve_fit(objective, x, y)
# summarize the parameter values
a, b, c = popt
print('y = %.5f * x + %.5f * x^2 + %.5f' % (a, b, c))
# plot input vs output
pyplot.scatter(x, y)
# define a sequence of inputs between the smallest and largest known inputs
x_line = arange(min(x), max(x), 1)
# calculate the output for the range
y_line = objective(x_line, a, b, c)
# create a line plot for the mapping function
pyplot.plot(x_line, y_line, '--', color='red')
pyplot.show()```
I tried python matplotlib quadratic data fit, and I expected quadratic function but visually it's not.
I rewrite code written in Matlab for Python and I can´t resolve correctly the fit function in Python.
Code in Matlab:
y = *Array 361x1*;
x = *Array 361x1*;
y1 = *Single value 1x1*;
x1 = *Single value 1x1*;
fo = fitoptions('Method','NonlinearLeastSquares','Lower',[-Inf,-Inf],'Upper',[Inf, Inf],'Startpoint',[0.0,0.0]);
ft = fittype( #(A,B, x) A.*x.^2 + B.*x + y1 - A.*x1.^2 - B.*x1, 'independent','x','dependent','y','options',fo);
[fitobject,gof,out] = fit(x,y,ft);
A = fitobject.A;
B = fitobject.B;
I tried the solution in Python through Scipy Least Squares and based on this article. I wrote the following code:
ft = least_squares(lambda coeffs: coeffs[0]*x**2 + coeffs[1]*x + y1 - coeffs[0]*x1**2 - coeffs[1]*x1, [0, 0], bounds=([-np.inf, -np.inf], [np.inf, np.inf]))
print(ft('x'))
Obviously it is not correct (array y is not considered in Python code) and I get different values for coefficients A and B. I´ve already tried difrferent functions like curve%fit, etc. But with no result...
You could use scipy.optimize's curve_fit. Assuming x and y are numpy.ndarrays containing the data:
from scipy.optimize import curve_fit
def fitme(x, *coeffs_and_args):
A, B, x1, y1 = coeffs_and_args
return A*x**2 + B*x + y1 - A*x1**2 - B*x1
popt, pcov = curve_fit(lambda x, A, B: fitme(x, A, B, x1, y1), x, y)
Then the array popt contains the optimal values for the parameters and pcov the estimated covariance of popt.
I am fitting a set of experimental data (sample) within two different experimental regions and can be expressed with two mathematical functions as follows:
1st region:
y = m*x + c ( the slope can be constrained to zero)
2nd region:
y = d*exp(-k*x)
the experimental data is shown below and I coded it in python as follows:
def func(x, m, c, d, k):
return m*x+ c + d*np.exp(-k*x)
popt, pcov = curve_fit(func, t, y)
Unfortunately, my data is not fitting properly and fitted (returned) parameters do not make sense (see picture below).
Any assistance will be appreciated.
Very interesting question. As said by a_guest, you will have to fit to the two regions separately. However, I think you probably also want the two regions to connect smoothly at the point t0, the point where we switch from one model to the other. In order to do this, we need to add the constraint that y1 == y2 at the point t0.
In order to do this with scipy, look at scipy.optimize.minimize with the SLSQP method. However, I wrote a scipy wrapper to make this kind of thing easier, called symfit. I will show you how to do this with symfit, because I think it's better suited to the task, but with this example you should also be able to implement it with pure scipy if you prefer.
from symfit import parameters, variables, Fit, Piecewise, exp, Eq
import numpy as np
import matplotlib.pyplot as plt
t, y = variables('t, y')
m, c, d, k, t0 = parameters('m, c, d, k, t0')
# Help the fit by bounding the switchpoint between the models
t0.min = 0.6
t0.max = 0.9
# Make a piecewise model
y1 = m * t + c
y2 = d * exp(- k * t)
model = {y: Piecewise((y1, t <= t0), (y2, t > t0))}
# As a constraint, we demand equality between the two models at the point t0
# to do this, we substitute t -> t0 and demand equality using `Eq`
constraints = [Eq(y1.subs({t: t0}), y2.subs({t: t0}))]
# Read the data
tdata, ydata = np.genfromtxt('Experimental Data.csv', delimiter=',', skip_header=1).T
fit = Fit(model, t=tdata, y=ydata, constraints=constraints)
fit_result = fit.execute()
print(fit_result)
plt.scatter(tdata, ydata)
plt.plot(tdata, fit.model(t=tdata, **fit_result.params).y)
plt.show()
Since your data shows different behavior in different regions you also need to fit the data on these different regions. That is instead of making a sum of the two models (functions) you should fit one the left region with y = m*x + c and separately on the right region with y = d*exp(-k*x). If you have trouble finding the boundary of the two regions you could assess this by comparing the goodness of fit.
popt_1, pcov_1 = curve_fit(lambda x, m, c: m*x + c, t[t < 0.8], y[t < 0.8], p0=(1, 0))
popt_2, pcov_2 = curve_fit(lambda x, d, k: d*exp(-k*x), t[t >= 0.8], y[t >= 0.8], p0=(400, 1))
Edit
Example code:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy.optimize import curve_fit
df = pd.read_csv('test.csv', index_col=None)
t = df.t.values
y = df.Y.values
boundary = t[y.argmax()]
t1 = t[t < boundary]
y1 = y[t < boundary]
t2 = t[t >= boundary]
y2 = y[t >= boundary]
f1 = lambda x, m, c: m*x + c
f2 = lambda x, d, k: d*np.exp(-k*x)
popt_1 ,pcov_1 = curve_fit(f1, t1, y1, p0=((y1[-1] - y1[0]) / (t1[-1] - t1[0]), y1[0]))
popt_2 ,pcov_2 = curve_fit(f2, t2, y2, p0=(y2[0], 1))
plt.title('Fitted data on two different domains')
plt.xlabel('t [a.u.]')
plt.ylabel('y [a.u.]')
plt.plot(t, y, '-o', label='Data')
plt.plot(t1, f1(t1, *popt_1), '--', color='#ff7f0e', lw=3, label='Fit')
plt.plot(t2, f2(t2, *popt_2), '--', color='#ff7f0e', lw=3, label='_nolegend_')
plt.grid()
plt.legend()
plt.show()
Which produces the following plot:
Note that the resulting "compound" function is not continuous at the boundary. If that is undesired you can resolve it by fixing one the fit parameters (e.g.k) before fitting the other domain (one way or the other). Alternatively you could fit both regions separately, then determine the value at the boundary as the average of the two separate functions (i.e. y_b = (f1(t1[-1], *popt_1) + f2(t2[0], *popt_2)) / 2) and then repeat the fitting by constraining the parameters such that this boundary condition is fulfilled.
For example fitting the linear function first and then fixing the d parameter of the exponential in order to have a continuous transition at the boundary (note that the linear function f1 is extrapolated outside its domain at t2[0] in order to ensure the continuity):
f1 = lambda x, m, c: m*x + c
popt_1, pcov_1 = curve_fit(f1, t1, y1, p0=((y1[-1] - y1[0]) / (t1[-1] - t1[0]), y1[0]))
d = f1(t2[0], *popt_1)
f2 = lambda x, k: d*np.exp(-k*(x - boundary))
popt_2, pcov_2 = curve_fit(f2, t2, y2, p0=(1,))
Which produces the following plot:
If you would prefer to use a single equation, I found that the Hocket-Sherby equation "y = b - (b-a) * exp(-c * (x**d))" seems like an OK fit to your data, yielding an R-squared of 0.99 and RMSE of 11.2 with parameters a = 1.1262189756312683E+01, b = 3.2040596733114870E+02, c = 3.9385197507261771E-01, and d = -4.7723382040098095E+00
I have two equations that originate from one single matrix equation:
[x,y] = [[cos(n), -sin(n)],[sin(n), cos(n)]]*[x', y'],
where x' = Acos(w1*t+ p1) and y' = Bcos(w2*t + p2).
This is just a single matrix equation for the vector [x,y], but it can be decomposed into 2 scalar equations: x = A*cos(s)*cos(w1*t+ p1)*x' - B*sin(s)*sin(w2*t + p2)*y' and y = A*sin(s)*cos(w1*t+ p1)*x' + B*cos(s)*sin(w2*t + p2)*y'.
So I am fitting two datasets, x vs. t and y vs. t, but there are some shared parameters in these fits, namely A, B and s.
1) Can I fit directly the matrix equation, or do I have to decompose it into scalar ones? The former would be more elegant.
2) Can I fit with shared parameters on curve_fit? All relevant questions use other packages.
The below example code fits two different equations with one shared parameter using curve_fit. This is not an answer to your question, but I cannot format code in a comment so I'm posting it here.
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
y1 = np.array([ 16.00, 18.42, 20.84, 23.26, 25.68])
y2 = np.array([-20.00, -25.50, -31.00, -36.50, -42.00])
comboY = np.append(y1, y2)
h = np.array([5.0, 6.1, 7.2, 8.3, 9.4])
comboX = np.append(h, h)
def mod1(data, a, b, c): # not all parameters are used here
return a * data + c
def mod2(data, a, b, c): # not all parameters are used here
return b * data + c
def comboFunc(comboData, a, b, c):
# single data set passed in, extract separate data
extract1 = comboData[:len(y1)] # first data
extract2 = comboData[len(y2):] # second data
result1 = mod1(extract1, a, b, c)
result2 = mod2(extract2, a, b, c)
return np.append(result1, result2)
# some initial parameter values
initialParameters = np.array([1.0, 1.0, 1.0])
# curve fit the combined data to the combined function
fittedParameters, pcov = curve_fit(comboFunc, comboX, comboY, initialParameters)
# values for display of fitted function
a, b, c = fittedParameters
y_fit_1 = mod1(h, a, b, c) # first data set, first equation
y_fit_2 = mod2(h, a, b, c) # second data set, second equation
plt.plot(comboX, comboY, 'D') # plot the raw data
plt.plot(h, y_fit_1) # plot the equation using the fitted parameters
plt.plot(h, y_fit_2) # plot the equation using the fitted parameters
plt.show()
print(fittedParameters)
I want to fit complex data set with a two functions which shared the same parameters. For this I used
def funcReal(x,a,b,c,d):
return np.real((a + 1j*b)*(np.exp(1j*k*x - kappa1*x) - np.exp(kappa2*x)) + (c + 1j*d)*(np.exp(-1j*k*x - kappa1*x) - np.exp(-kappa2*x)))
def funcImag(x,a,b,c,d):
return np.imag((a + 1j*b)*(np.exp(1j*k*x - kappa1*x) - np.exp(kappa2*x)) + (c + 1j*d)*(np.exp(-1j*k*x - kappa1*x) - np.exp(-kappa2*x)))`
poptReal, pcovReal = curve_fit(funcReal, x, yReal)
poptImag, pcovImag = curve_fit(funcImag, x, yImag)
Here funcReal is the real part of my model, funcImag the imaginary part, yReal the real part of the data and yImag the imaginary part of the data.
However, both fits does not give me the same parameters for the real and imaginary part.
My question is there a package or a method such that I can realized multi fits for multiple data sets and multiple functions with shared parameters?
To fit both the complex function given above, we can treat the real and imaginary components as a coordinate point, or as a vector. Since curve_fit doesn't care about the order at which data points are inserted in the vectors x (independent data) and y (dependent data), we can simply split the complex data and stack the real and imaginary components using hstack. See the example below.
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
kappa1 = np.pi
kappa2 = -0.01
def long_function(x, a, b, c, d):
return (a + 1j*b)*(np.exp(1j*k*x - kappa1*x) - np.exp(kappa2*x)) + (c + 1j*d)*(np.exp(-1j*k*x - kappa1*x) - np.exp(-kappa2*x))
def funcBoth(x, a, b, c, d):
N = len(x)
x_real = x[:N//2]
x_imag = x[N//2:]
y_real = np.real(long_function(x_real, a, b, c, d))
y_imag = np.imag(long_function(x_imag, a, b, c, d))
return np.hstack([y_real, y_imag])
# Create an independent variable with 100 measurements
N = 100
x = np.linspace(0, 10, N)
# True values of the dependent variable
y = long_function(x, a=1.1, b=0.3, c=-0.2, d=0.23)
# Add uniform complex noise (real + imaginary)
noise = (np.random.rand(N) + 1j * np.random.rand(N) - 0.5 - 0.5j) * 0.1
yNoisy = y + noise
# Split the measurements into a real and imaginary part
yReal = np.real(yNoisy)
yImag = np.imag(yNoisy)
yBoth = np.hstack([yReal, yImag])
# Find the best-fit solution
poptBoth, pcovBoth = curve_fit(funcBoth, np.hstack([x, x]), yBoth)
# Compute the best-fit solution
yFit = long_function(x, *poptBoth)
print(poptBoth)
# Plot the results
plt.figure(figsize=(9, 4))
plt.subplot(121)
plt.plot(x, np.real(yNoisy), "k.", label="Noisy y")
plt.plot(x, np.real(y), "r--", label="True y")
plt.plot(x, np.real(yFit), label="Best fit")
plt.ylabel("Real part of y")
plt.xlabel("x")
plt.legend()
plt.subplot(122)
plt.plot(x, np.imag(yNoisy), "k.")
plt.plot(x, np.imag(y), "r--")
plt.plot(x, np.imag(yFit))
plt.ylabel("Imaginary part of y")
plt.xlabel("x")
plt.tight_layout()
plt.show()
Result:
The best-fit parameters that were found in this example were a = 1.14, b = 0.375, c = -0.236, and d = 0.163, which are close enough to the true parameter values given the amplitude of the noise that I inserted here.