Related
I have searched and referenced the code for solving the SIR model from others on this website, but the fitting effect is very poor. Is there something wrong with my data? Still what? How should I predict given new data for this SIR model?
import numpy as np
import matplotlib.pyplot as plt
from scipy import integrate, optimize
import pandas as pd
y_total = [0.0, 0.0010131712259371835, 0.0035460992907801418, 0.00911854103343465,
0.008611955420466059, 0.021783181357649443, 0.00911854103343465, 0.07852077001013172, 0.4397163120567376,
0.21681864235055726, 0.232016210739615, 0.5278622087132725, 0.13576494427558258, 0.2988855116514691, 0.37436676798378926,
0.4209726443768997, 0.544579533941236, 0.7254305977710233, 1.0, 0.7740628166160081, 0.43617021276595747, 0.48226950354609927]
x_total = range(0,22)
ydata = np.array(y_total, dtype=float)
xdata = np.array(x_total, dtype=float)
# IO + SO + R0 is always 1 regardless of "value"
I0 = 0.3
S0 = 1 - I0
R0 = 0
def sir_model(y, x, beta, gamma):
S = -beta * y[0] * y[1] / N
R = gamma * y[1]
I = -(S + R)
return S, I, R
def fit_odeint(x, beta, gamma):
return integrate.odeint(sir_model, (S0, I0, R0), x, args=(beta, gamma))[:,1]
N = 1.0
I0 = ydata[0]
S0 = N - I0
R0 = 0.0
popt, pcov = optimize.curve_fit(fit_odeint, xdata, ydata)
fitted = fit_odeint(xdata, *popt)
plt.plot(xdata, ydata, 'o')
plt.plot(xdata, fitted)
plt.show()
Make the initial infected number also a variable, which is easily done by shift the computation of the initial state into the target function
def sir_model(y, x, beta, gamma):
N = sum(y)
S = -beta * y[0] * y[1] / N
R = gamma * y[1]
I = -(S + R)
return S, I, R
def fit_odeint(x, beta, gamma, I0):
# IO + SO + R0 is always 1 regardless of "value"
S0 = 1 - I0
R0 = 0
return integrate.odeint(sir_model, (S0, I0, R0), x, args=(beta, gamma))[:,1]
popt, pcov = optimize.curve_fit(fit_odeint, xdata, ydata,(1/5,1/8,0.1))
There were some other changes, especially adding an initial guess to curve_fit. Still this gets a warning on some difficulty in odeint. But a result is reached anyway, with popt = [0.36714402, 0.04176973, 0.01311579], 1/popt = [2.72372678, 23.94078424, 76.2439491]. Changing the initial guess to values close to this, (1/3, 1/24, 0.05), eliminates the warning.
I was wondering if it is possible to set bounds for the parameters in curve_fit() such that the bounds are dependent on another parameter. For example, say if I wanted to set the slope of a line to be greater than the intercept.
def linear(x, m, b):
return lambda x: (m*x) + b
def plot_linear(x, y):
B = ([b, -np.inf], [np.inf, np.inf])
p, v = curve_fit(linear, x, y, bounds = B)
xs = np.linspace(min(x), max(x), 1000)
plt.plot(x,y,'.')
plt.plot(xs, linear(xs, *p), '-')
I know that this doesn't work because the parameter b is not defined before it is called in the bounds, but I am not sure if there is a way to make this work?
We can always re-parameterize w.r.t. the specific curve-fitting problem. For example, if you wanted to fit y=mx+b s.t. m >= b, it can be re-written as m=b+k*k with another parameter k and we can optimize with the parameters b, k now as follows:
def linear(x, m, b):
return m*x + b
def linear2(x, k, b): # constrained fit, m = b + k**2 >= b
return (b+k**2)*x + b
def plot_linear(x, y):
p, v = curve_fit(linear, x, y)
print(p)
# [3.1675609 6.01025041]
p2, v2 = curve_fit(linear2, x, y)
print(p2)
# [2.13980283e-05 4.99368661e+00]
xs = np.linspace(min(x), max(x), 1000)
plt.plot(x,y,'.')
plt.plot(xs, linear(xs, *p), 'r-', label='unconstrained fit')
plt.plot(xs, linear2(xs, *p2), 'b-', label='constrained (m>b) fit')
plt.legend()
Now let's fit the curves on following data, using both the constrained and unconstrained fit functions (note the unconstrained optimal fit will have slope less than intercept)
x = np.linspace(0,1,100)
y = 3*x + 5 + 2*np.random.rand(len(x))
plot_linear(x, y)
To my understanding, Statsmodel WLS (weighted least squared) and curve_fit (which uses least-squares by default) should provide the same outcome.
When using non weighted values, it does (code below #1). However, when I used weights, the output values become totally different (code below #2).
Why is this?
_ #1 non weighted
import numpy as np
import statsmodels.api as sm
import pandas as pd
from scipy.optimize import curve_fit
Y = [-5,2,3,4,5,6,7]
X = [1,2,3,4,5,6,7]
W = [1,1,1,1,1,1,1]
def equation_func (x, m, c):
return m * x + c
popt, pcov = curve_fit(equation_func,X, Y, p0=[-1.6, 1.28], sigma=[1 / w for w in W],
absolute_sigma=True)
curve_results = [equation_func(x, popt[0], popt[1]) for x in X]
wls_model = sm.WLS(Y,sm.add_constant(X), weights=W)
results = wls_model.fit()
print(results.params)
print([popt[1], popt[0]])
#WLS - [-3.42857143 1.64285714]
#Curve_fit - [-3.42857143, 1.64285714]
_#2 Weighted
Y = [-5,2,3,4,5,6,7]
X = [1,2,3,4,5,6,7]
W = [1,2,3,4,5,6,7]
def equation_func (x, m, c):
return m * x + c
popt, pcov = curve_fit(equation_func,X, Y, p0=[-1.6, 1.28], sigma=[1 / w for w in W],
absolute_sigma=True)
curve_results = [equation_func(x, popt[0], popt[1]) for x in X]
wls_model = sm.WLS(Y,sm.add_constant(X), weights=W)
results = wls_model.fit()
print(results.params)
print([popt[1], popt[0]])
#WLS - [-1.64285714 1.28571429]
#Curve_fit - [-0.5840336134466501, 1.0966386554667582]
I have a function that looks like f(x, m, E, I) = m * (x - x ** 2) / (E * I), where I want to get a value for E. I have some data, which I call X and Y, and some uncertainty in the y data which I call yerr. Additionally the parameters m and I are physical quantities, and they have been measured with some uncertainty.
I want to fit the function f to my data X, Y, having into account the uncertainties of the quantities m, I. Right now this is the command I am using to do the fit:
m = some value
I = some other value
popt, pcov = curve_fit(lambda x, E: f(x, m, E, I), X, Y, p0=[1e9], sigma=yerr)
Of course this doesn't take into account the uncertainty in m and I. Is there any way to fit a curve having into account this uncertainties?
For instance, here they solve a ODE using the module uncertainties, I tried to copy the procedure but didn't work:
import uncertainties as u
def f(x, m, E, I):
return m * (x - x ** 2) / (E * I)
m = u.ufloat(3e-4, 0.1e-6)
I = u.ufloat(1e-10, 0.2e-12)
#u.wrap
def fit():
popt, pcov = curve_fit(lambda x, E: f(x, m, E, I), X, Y, p0=[1e9], sigma=yerr)
return popt, pcov
where X, Y, yerr are the data and the error in Y as mentioned before.
I'm trying to fit the distribution of some experimental values with a custom probability density function. Obviously, the integral of the resulting function should always be equal to 1, but the results of simple scipy.optimize.curve_fit(function, dataBincenters, dataCounts) never satisfy this condition.
What is the best way to solve this problem?
You can define your own residuals function, including a penalization parameter, like detailed in the code below, where it is known beforehand that the integral along the interval must be 2.. If you test without the penalization you will see that what your are getting is the conventional curve_fit:
import matplotlib.pyplot as plt
import scipy
from scipy.optimize import curve_fit, minimize, leastsq
from scipy.integrate import quad
from scipy import pi, sin
x = scipy.linspace(0, pi, 100)
y = scipy.sin(x) + (0. + scipy.rand(len(x))*0.4)
def func1(x, a0, a1, a2, a3):
return a0 + a1*x + a2*x**2 + a3*x**3
# here you include the penalization factor
def residuals(p, x, y):
integral = quad(func1, 0, pi, args=(p[0], p[1], p[2], p[3]))[0]
penalization = abs(2.-integral)*10000
return y - func1(x, p[0], p[1], p[2], p[3]) - penalization
popt1, pcov1 = curve_fit(func1, x, y)
popt2, pcov2 = leastsq(func=residuals, x0=(1., 1., 1., 1.), args=(x, y))
y_fit1 = func1(x, *popt1)
y_fit2 = func1(x, *popt2)
plt.scatter(x, y, marker='.')
plt.plot(x, y_fit1, color='g', label='curve_fit')
plt.plot(x, y_fit2, color='y', label='constrained')
plt.legend()
plt.xlim(-0.1, 3.5)
plt.ylim(0, 1.4)
print('Exact integral:', quad(sin, 0, pi)[0])
print('Approx integral1:', quad(func1, 0, pi, args=(popt1[0], popt1[1], popt1[2], popt1[3]))[0])
print('Approx integral2:', quad(func1, 0, pi, args=(popt2[0], popt2[1], popt2[2], popt2[3]))[0])
plt.show()
#Exact integral: 2.0
#Approx integral1: 2.60068579748
#Approx integral2: 2.00001911981
Other related questions:
SciPy LeastSq Goodness of Fit Estimator
Here is an almost-identical snippet which makes only use of curve_fit.
import matplotlib.pyplot as plt
import numpy as np
import scipy.optimize as opt
import scipy.integrate as integr
x = np.linspace(0, np.pi, 100)
y = np.sin(x) + (0. + np.random.rand(len(x))*0.4)
def Func(x, a0, a1, a2, a3):
return a0 + a1*x + a2*x**2 + a3*x**3
# modified function definition with Penalization
def FuncPen(x, a0, a1, a2, a3):
integral = integr.quad( Func, 0, np.pi, args=(a0,a1,a2,a3))[0]
penalization = abs(2.-integral)*10000
return a0 + a1*x + a2*x**2 + a3*x**3 + penalization
popt1, pcov1 = opt.curve_fit( Func, x, y )
popt2, pcov2 = opt.curve_fit( FuncPen, x, y )
y_fit1 = Func(x, *popt1)
y_fit2 = Func(x, *popt2)
plt.scatter(x,y, marker='.')
plt.plot(x,y_fit2, color='y', label='constrained')
plt.plot(x,y_fit1, color='g', label='curve_fit')
plt.legend(); plt.xlim(-0.1,3.5); plt.ylim(0,1.4)
print 'Exact integral:',integr.quad(np.sin ,0,np.pi)[0]
print 'Approx integral1:',integr.quad(Func,0,np.pi,args=(popt1[0],popt1[1],
popt1[2],popt1[3]))[0]
print 'Approx integral2:',integr.quad(Func,0,np.pi,args=(popt2[0],popt2[1],
popt2[2],popt2[3]))[0]
plt.show()
#Exact integral: 2.0
#Approx integral1: 2.66485028754
#Approx integral2: 2.00002116217
Following the example above here is more general way to add any constraints:
from scipy.optimize import minimize
from scipy.integrate import quad
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, np.pi, 100)
y = np.sin(x) + (0. + np.random.rand(len(x))*0.4)
def func_to_fit(x, params):
return params[0] + params[1] * x + params[2] * x ** 2 + params[3] * x ** 3
def constr_fun(params):
intgrl, _ = quad(func_to_fit, 0, np.pi, args=(params,))
return intgrl - 2
def func_to_minimise(params, x, y):
y_pred = func_to_fit(x, params)
return np.sum((y_pred - y) ** 2)
# Do the parameter fitting
#without constraints
res1 = minimize(func_to_minimise, x0=np.random.rand(4), args=(x, y))
params1 = res1.x
# with constraints
cons = {'type': 'eq', 'fun': constr_fun}
res2 = minimize(func_to_minimise, x0=np.random.rand(4), args=(x, y), constraints=cons)
params2 = res2.x
y_fit1 = func_to_fit(x, params1)
y_fit2 = func_to_fit(x, params2)
plt.scatter(x,y, marker='.')
plt.plot(x, y_fit2, color='y', label='constrained')
plt.plot(x, y_fit1, color='g', label='curve_fit')
plt.legend(); plt.xlim(-0.1,3.5); plt.ylim(0,1.4)
plt.show()
print(f"Constrant violation: {constr_fun(params1)}")
Constraint violation: -2.9179325622408214e-10
If you are able normalise your probability fitting function in advance then you can use this information to constrain your fit. A very simple example of this would be fitting a Gaussian to data. If one were to fit the following three-parameter (A, mu, sigma) Gaussian then it would be unnormalised in general:
however, if one instead enforces the normalisation condition on A:
then the Gaussian is only two parameter and is automatically normalised.
You could ensure that your fitted probability distribution is normalised via a numerical integration. For example, assuming that you have data x and y and that you have defined an unnormalised_function(x, a, b) with parameters a and b for your probability distribution, which is defined on the interval x1 to x2 (which could be infinite):
from scipy.optimize import curve_fit
from scipy.integrate import quad
# Define a numerically normalised function
def normalised_function(x, a, b):
normalisation, _ = quad(lambda x: unnormalised_function(x, a, b), x1, x2)
return unnormalised_function(x, a, b)/normalisation
# Do the parameter fitting
fitted_parameters, _ = curve_fit(normalised_function, x, y)