I am using scipy.optimize.curve_fit (https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html)
to get the coefficients of a curve-fitting function. The SciPy function takes the model function as its first argument,
so if I want to make a linear curve fit, I pass it the following function:
def objective(x, a, b):
    return a * x + b
If I want a polynomial curve fit of second degree, I pass the following:
def objective(x, a, b, c):
    return a * x + b * x**2 + c
And so on. What I want to achieve is to make this model function generic:
for example, if the user asks for a polynomial fit of 5th degree by entering 5, the function should change to
def objective(x, a, b, c, d, e, f):
    return (a * x) + (b * x**2) + (c * x**3) + (d * x**4) + (e * x**5) + f
while the code is running. Is this possible?
And if it is not possible with SciPy because it requires changing the function, is there any other way to achieve what I want?
If you really want to implement it on your own, you can either use a variable number of coefficients *coeffs:
def objective(x, *coeffs):
    result = coeffs[-1]
    for i, coeff in enumerate(coeffs[:-1]):
        result += coeff * x**(i+1)
    return result
or use np.polyval:
import numpy as np

def objective(x, *coeffs):
    # np.polyval expects the coefficients first (highest degree first), then x
    return np.polyval(coeffs, x)
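Note that with a *coeffs signature curve_fit cannot determine the number of parameters from the function itself, so you have to pass an initial guess p0 whose length fixes the polynomial degree. A minimal sketch of the call, assuming x and y are your data arrays (placeholder names):

from scipy.optimize import curve_fit
import numpy as np

degree = 5                    # e.g. taken from user input
p0 = np.ones(degree + 1)      # one coefficient per term, including the constant
popt, pcov = curve_fit(objective, x, y, p0=p0)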
However, note that there's no need to use curve_fit. You can directly use np.polyfit to do a least-squares polynomial fit.
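A minimal sketch of the np.polyfit route (x, y and the degree are placeholders); np.polyfit returns the coefficients ordered from the highest power down to the constant term:

import numpy as np

degree = 5
coeffs = np.polyfit(x, y, degree)   # highest-degree coefficient first
y_fit = np.polyval(coeffs, x)       # evaluate the fitted polynomial at x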
The task can be accomplished in a few ways. If you want to use SciPy, you can simply create a dictionary that maps the user's numerical input to a specific function:
import scipy.optimize as optimization
polnum = 2 # suppose this is input from user
def objective1(x, a, b):
    return a * x + b

def objective2(x, a, b, c):
    return a * x + b * x**2 + c
# Include some more functions
# Store the functions themselves in the dictionary (no parentheses); do not call them
object_dict = {
1: objective1,
2: objective2
# Include the numbers with corresponding functions
}
opt = optimization.curve_fit(object_dict[polnum], x, y) # Curve fitted
print(opt[0]) # Returns parameters
However, I would suggest a slightly better way, where you do not have to define each function by hand:
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LinearRegression
polnum = 2 # suppose this is input from user
# Create a polynomial model of degree polnum
model = make_pipeline(PolynomialFeatures(polnum), LinearRegression())
# You can swap LinearRegression for any other estimator offered in sklearn.linear_model
# x is 1D here; sklearn expects a 2D array of shape (n_samples, 1), hence the reshape below
model.fit(x[:, np.newaxis], y)
# Now you can use the model to check some other values
y_predict = model.predict(x_predict[:, np.newaxis])
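If you also need the fitted coefficients themselves (roughly the analogue of curve_fit's popt), you can read them from the regression step of the pipeline. A sketch, assuming the pipeline built with make_pipeline above (make_pipeline names the steps after the lowercased class names):

reg = model.named_steps['linearregression']
print(reg.intercept_)  # constant term
print(reg.coef_)       # coefficients for the polynomial features (the first entry belongs to the bias column)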
Related
I was following a tutorial on data fitting, and when I simply replaced the original data with my own data the fit no longer looked quadratic.
Here's my code; thanks a lot for any help:
# fit a second degree polynomial to the economic data
import numpy as np
from numpy import arange
from pandas import read_csv
from scipy.optimize import curve_fit
from matplotlib import pyplot
x = np.array([1,2,3,4,5,6])
y = np.array([1,4,12,29,54,104])
# define the true objective function
def objective(x, a, b, c):
    return a * x + b * x**2 + c
# load the dataset
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/longley.csv'
dataframe = read_csv(url, header=None)
data = dataframe.values
# choose the input and output variables
# curve fit
popt, _ = curve_fit(objective, x, y)
# summarize the parameter values
a, b, c = popt
print('y = %.5f * x + %.5f * x^2 + %.5f' % (a, b, c))
# plot input vs output
pyplot.scatter(x, y)
# define a sequence of inputs between the smallest and largest known inputs
x_line = arange(min(x), max(x), 1)
# calculate the output for the range
y_line = objective(x_line, a, b, c)
# create a line plot for the mapping function
pyplot.plot(x_line, y_line, '--', color='red')
pyplot.show()
I tried a quadratic data fit in Python/matplotlib and expected a quadratic-looking curve, but visually it is not.
I am a newbie with scipy.optimize. I have the function func shown below. I have x and y values given as lists and need to estimate the values of a, b and c. I could use curve_fit to get those estimates, but I want to explore the possibility of using least_squares.
When I run the following code, I get the error below. It would be great if anyone could point me in the right direction.
import numpy as np
from scipy.optimize import curve_fit
from scipy.optimize import least_squares
np.random.seed(0)
x = np.random.randint(0, 100, 100) # sample dataset for independent variables
y = np.random.randint(0, 100, 100) # sample dataset for dependent variables
def func(x, a, b, c):
    return a*x**2 + b*x + c

def result(list_x, list_y):
    popt = curve_fit(func, list_x, list_y)
sol = least_squares(result,x, args=(y,),method='lm',jac='2-point',max_nfev=2000)
TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be
safely coerced to any supported types according to the casting rule ''safe''
The following code uses the least_squares() routine for optimization. The most important change in comparison to your code is ensuring that func() returns a vector of residuals. I also compared the solution to the linear algebra result to ensure correctness.
import numpy as np
from scipy.optimize import curve_fit
from scipy.optimize import least_squares
np.random.seed(0)
x = np.random.randint(0, 100, 100) # sample dataset for independent variables
y = np.random.randint(0, 100, 100) # sample dataset for dependent variables
def func(theta, x, y):
    # Return residual = fit - observed
    return (theta[0]*x**2 + theta[1]*x + theta[2]) - y
# Initial parameter guess
theta0 = np.array([0.5, -0.1, 0.3])
# Compute solution providing initial guess theta0, x input, and y input
sol = least_squares(func, theta0, args=(x,y))
print(sol.x)
#------------------- OPTIONAL -------------------#
# Compare to linear algebra solution
temp = x.reshape((100,1))
X = np.hstack( (temp**2, temp, np.ones((100,1))) )
OLS = np.linalg.lstsq(X, y.reshape((100,1)), rcond=None)
print(OLS[0])
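For comparison, since the question also mentions curve_fit, the same quadratic can be fitted with a model-style function and curve_fit; a minimal sketch reusing the x and y arrays above (the name model is just a placeholder):

def model(x, a, b, c):
    return a*x**2 + b*x + c

popt, pcov = curve_fit(model, x, y)
print(popt)  # should agree closely with sol.x and the OLS result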
To use least_squares you need a residual function, not the model function you would pass to curve_fit. Also, least_squares requires an initial guess for the parameters you are fitting (i.e. a, b, c). In your case, if you want to use least_squares, you can write something like the following (I just used random values for the guess):
import numpy as np
from scipy.optimize import least_squares
np.random.seed(0)
x = np.random.randint(0, 100, 100) # sample dataset for independent variables
y = np.random.randint(0, 100, 100) # sample dataset for dependent variables
def func(x, a, b, c):
    return a*x**2 + b*x + c

def residual(p, x, y):
    return y - func(x, *p)

guess = np.random.rand(3)
sol = least_squares(residual, guess, args=(x, y), method='lm', jac='2-point', max_nfev=2000)
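The fitted parameters are then available as sol.x (the estimates for a, b and c), which you can inspect with print(sol.x).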
What I'm trying to do is fit a specific model to a dataset of x-y values and get the constants of the model. I can get the constant (in this case there is only one) and then the fitted y_opt. Below is a working example of doing so:
import pandas as pd
from scipy.optimize import curve_fit
data = pd.read_csv(r'')
x_measured = data['x[-]'].values
y_measured = data['y[-]'].values
def y_NH(x_eng, D):
    y_comp = D * x_eng*(x_eng**2 + 3 * x_eng + 3) / (1 + x_eng)**2
    return y_comp
D = curve_fit(y_NH, x_measured, y_measured)
y_opt = y_NH(x_measured, D[0])
This works, but it is not exactly what I need.
The formula for y_comp is something I had to derive manually: originally I had another variable, say Y_comp, and obtained y_comp by differentiating Y_comp (with respect to x_eng). What I would like to achieve is to feed my function Y_comp (because there will be more, like Z_comp, F_comp etc.), have it differentiate it to produce y_comp (z_comp, f_comp), and then fit the model to my dataset; the result would be the constant(s) of the particular model.
I started on this, but I am still not there and would appreciate some help on the topic. The buggy code is:
import sympy as sy
from sympy.utilities.lambdify import lambdify
def y_NH2(x_eng, D):
    lambda1 = sy.Symbol('lambda1')
    x_eng = sy.Symbol('x_eng')
    #Gi = sy.Symbol('Gi')
    lambda1 = x_eng + 1
    W = lambda1**2 + 2 / lambda1
    y_comp_symb = sy.diff(W, x_eng)
    y_comp = lambdify(x_eng, y_comp_symb, 'numpy')
    y_return = D / 2 * y_comp(x_eng)
    return y_return
y_p = y_NH2(x_measured, 12)
print(y_p)
D = curve_fit(y_NH2, x_measured, y_measured)
y_opt = y_NH2(x_measured, D[0])
This raises an error in curve_fit that is: "error: Result from function call is not a proper array of floats."
Could you please give me a hint?
Does statsmodels support nonlinear regression to an arbitrary equation? (I know that there are some forms that are already built in, e.g. for logistic regression, but I am after something more flexible)
In the solution https://stats.stackexchange.com/a/44249 to a question about non-linear regression,
the code is in R and uses the function nls. There it has the equation's parameters defined with start = list(a1=0, ...). These are of course just some initial guesses and not the final fitted values. But what is different here compared to lm is that the parameters don't need to be from the columns of the input data.
I've been able to use statsmodels.formula.api.ols as an equivalent of R's lm, but when I try to use it with an equation that has free parameters (rather than weights for the inputs / combinations of inputs), statsmodels complains about the parameters not being defined. It does not seem to have an argument equivalent to start=, so it isn't obvious how to introduce them.
Is there a different class or function in statsmodels that accepts definition of these initial parameter values?
EDIT:
My current attempt, and also a workaround using lmfit as suggested:
from statsmodels.formula.api import ols
import numpy as np
import pandas as pd
def eqn_poly(x, a, b):
    ''' simple polynomial '''
    return a*x**2.0 + b*x

def eqn_nl(x, a, b):
    ''' fractional equation '''
    return 1.0 / ((a+x)*b)
x = np.arange(0, 3, 0.1)
y1 = eqn_poly(x, 0.1, 0.5)
y2 = eqn_nl(x, 0.1, 0.5)
sigma =0.05
y1_noise = y1 + sigma * np.random.randn(*y1.shape)
y2_noise = y2 + sigma * np.random.randn(*y2.shape)
df1 = pd.DataFrame(np.vstack([x, y1_noise]).T, columns= ['x', 'y'])
df2 = pd.DataFrame(np.vstack([x, y2_noise]).T, columns= ['x', 'y'])
res1 = ols("y ~ 1 + x + I(x ** 2.0)", df1).fit()
print(res1.summary())
res3 = ols("y ~ 1 + x + I(x ** 2.0)", df2).fit()
#res2 = ols("y ~ eqn_nl(x, a, b)", df2).fit()
# ^^^ this fails if a, b are not initialised ^^^
# so initialise a, b
a,b = 1.0, 1.0
res2 = ols("y ~ eqn_nl(x, a, b)", df2).fit()
print(res2.summary())
# ===> and now the fitting is bad, it has an intercept -4.79, and a weight
# on the equation 15.7.
Giving lmfit the model function, it is able to find the parameters:
import lmfit
mod = lmfit.Model(eqn_nl)
lm_result = mod.fit(y2_noise, x=x, a=1.0, b=1.0)
print(lm_result.fit_report())
# ===> this one works fine, a=0.101, b=0.4977
But trying to put y1, x into ols doesn't seem to work ("PatsyError: model is missing required outcome variables"). I didn't really follow that suggestion.
Consider scipy.optimize.curve_fit as the desired R nls-like function.
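A minimal sketch of that suggestion, reusing eqn_nl, x and y2_noise from the code above (p0 plays the role of R's start= initial guesses):

from scipy.optimize import curve_fit

popt, pcov = curve_fit(eqn_nl, x, y2_noise, p0=[1.0, 1.0])
print(popt)  # fitted a, b; should be close to the lmfit result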
I want to fit the function f(x) = b + a / x to my data set. For that I found that least_squares from scipy.optimize was suitable.
My code is as follows:
x = np.asarray(range(20,401,20))
y contains distances that I calculated; it is an array of length 20 (here just random numbers as an example):
y = np.random.rand(20)
Initial guesses of the params a and b:
params = np.array([1,1])
Function to minimize
def funcinv(x):
    return params[0]/x + params[1]

res = least_squares(funcinv, params, args=(x, y))
Error given:
return np.atleast_1d(fun(x, *args, **kwargs))
TypeError: funcinv() takes 1 positional argument but 3 were given
How can I fit my data?
To clarify a little, there are two related problems:
Minimizing a function
Fitting a model to data
To fit a model to observed data means finding the model parameters that minimize some measure of error between the model's output and the observed data.
The least_squares method simply minimizes the following function with respect to x (x can be a vector):
F(x) = 0.5 * sum(rho(f_i(x)**2), i = 0, ..., m - 1)
(rho is a loss function; the default is rho(x) = x, so don't mind it for now.)
least_squares(func, x0) expects that a call to func(x) returns a vector [a1, a2, a3, ...], for which the sum of squares S = 0.5 * (a1^2 + a2^2 + a3^2 + ...) is computed.
least_squares will tweak x0 to minimize S.
Thus, in order to use it to fit a model to data, one must construct a function that returns the error between the model and the actual data (the residuals), and then minimize that residuals function. In your case you can write it as follows:
import numpy as np
from scipy.optimize import least_squares
x = np.asarray(range(20,401,20))
y = np.random.rand(20)
params = np.array([1,1])
def funcinv(x, a, b):
    return b + a/x

def residuals(params, x, data):
    # evaluates the model for the given parameter vector [a, b]
    # and returns the residuals: (observed_data - model_data)
    a, b = params
    func_eval = funcinv(x, a, b)
    return (data - func_eval)
res = least_squares(residuals, params, args=(x, y))
This gives a result:
print(res)
...
message: '`gtol` termination condition is satisfied.'
nfev: 4
njev: 4
optimality: 5.6774618339971994e-10
status: 1
success: True
x: array([ 6.89518618, 0.37118815])
However, since the residuals function is pretty much the same every time (res = observed_data - model_data), there is a shortcut in scipy.optimize called curve_fit: curve_fit(func, xdata, ydata, p0). curve_fit builds the residuals function automatically, and you can simply write:
import numpy as np
from scipy.optimize import curve_fit
x = np.asarray(range(20,401,20))
y = np.random.rand(20)
params = np.array([1,1])
def funcinv(x, a, b):
    return b + a/x
res = curve_fit(funcinv, x, y, params)
print(res) # ... array([ 6.89518618, 0.37118815]), ...
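Note that curve_fit actually returns a tuple of the optimal parameters and their estimated covariance matrix, so it is usually unpacked directly:

popt, pcov = curve_fit(funcinv, x, y, params)
a, b = popt  # fitted parameters of f(x) = b + a/x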