Fit a nonlinear model with python and gekko - python

I have a dataset of trees. Each row has the unique Plot number, a "Measurement" index giving the order in which the data were taken, the mean Height in meters, and the mean Age in years. Something like this:
head of data
Next, I define a model that predicts Height from Age:
Height = B0 * ((1 - exp(-B1 *Age))**B2)
My goal is to determine the values of B0, B1, and B2. To do this, I use the gekko package to estimate the model parameters with the following code:
num_p = data_gek.Plot.unique()
nmp = 5
number_p = (data_gek.Plot == num_p[nmp])
m = GEKKO()
xm = np.array(data_gek[number_p]['Age'])
x = m.Param(value=xm)
B0 = m.FV(value=38.2) #value=38.2
B0.STATUS = 1
B1 = m.FV(value=0.1) #value=0.1
B1.STATUS = 1
B2 = m.FV(value=2.08) #value=2.08
B2.STATUS = 1
ym = np.array(data_gek[number_p]['Height'])
z = m.CV(value=ym)
y = m.Var()
m.Equation(y==B0 * ((1 - m.exp(-B1 *x))**B2))
m.Obj(((y-z)/z)**2)
m.options.IMODE = 2
m.options.SOLVER = 1
m.solve(disp=False)
print(B0.value[0],B1.value[0],B2.value[0])
#output
27.787958561 0.0052435491089 0.21178326158
However, I am not sure that I am doing this the right way. Is it possible to do this without initial values for the parameters? I used values for B0, B1, and B2 taken from the literature.
If you want to see my dataset and my process, you can access this notebook in Google Colab.

Your script has just one problem: z needs to be defined as a Param or MV instead of a CV, for example z = m.Param(value=ym), because it is an input to your objective function.
You can also use the built-in objective if you define y as a CV instead of a Var. You just need to turn on the feedback status with FSTATUS=1 so that the objective minimizes the difference between the measurements and the model predictions. Here is a modified version of your script.
from gekko import GEKKO
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
url = 'http://apmonitor.com/pdc/uploads/Main/data_2nd_order.txt'
data = pd.read_csv(url)
m = GEKKO()
xm = np.array(data['time'])
x = m.Param(value=xm)
B0 = m.FV(1); B1 = m.FV(1); B2 = m.FV(1)
B0.STATUS = 1; B1.STATUS = 1; B2.STATUS = 1
ym = np.array(data['output (y)'])
y = m.CV(value=ym)
y.FSTATUS = 1
yi = m.Intermediate(B0 * ((1 - m.exp(-B1 *x))**B2))
m.Equation(y==yi)
m.options.IMODE = 2
m.options.SOLVER = 1
m.solve(disp=True)
print(B0.value[0],B1.value[0],B2.value[0])
plt.plot(xm,ym,'ro')
plt.plot(xm,y.value,'b--')
plt.show()
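A nonlinear least-squares fit like this one can depend on where the parameters start, which is why initial values matter. As a small illustration that does not need Gekko, the sketch below evaluates the model from the question and its relative squared-error objective with NumPy. The ages, the generating parameters, and the helper names `height_model` and `relative_sse` are assumptions made for the example; they are not taken from the tree dataset.

```python
import numpy as np

def height_model(age, b0, b1, b2):
    """Height = B0 * (1 - exp(-B1 * Age))**B2, as in the question."""
    return b0 * (1.0 - np.exp(-b1 * age)) ** b2

def relative_sse(params, age, height):
    """Sum of squared relative errors, the form of objective used in the question."""
    pred = height_model(age, *params)
    return float(np.sum(((pred - height) / height) ** 2))

# Synthetic data (assumed values, not from the tree dataset)
age = np.array([5.0, 10.0, 15.0, 20.0, 30.0, 40.0])
true_params = (30.0, 0.08, 1.5)
height = height_model(age, *true_params)

# Literature starting values quoted in the question
guess = (38.2, 0.1, 2.08)

print(relative_sse(true_params, age, height))  # objective at the generating parameters
print(relative_sse(guess, age, height))        # objective at the starting guess
```

Evaluating the objective at a few candidate starting points in this way is a cheap check on whether a guess is in a reasonable region before handing it to the solver.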

Related

Solving Receding Horizon Control in GEKKO

I'm trying to implement a receding horizon control (RHC) scheme using GEKKO in Python, and I'd like to check my formulation. The goal is to solve the OCP over some horizon from t=tk to t=tk+H-1, apply the control solution at tk, and discard the remaining values (u_k+1 to u_k+H-1). The following code appears to give the correct solution, but I want to verify I've used the correct functions in GEKKO, namely when "resetting" the states for the next horizon. I had a few issues trying to use the .VALUE function to reset x1 and x2, e.g. TypeError: 'float' object is not subscriptable.
import numpy as np
import matplotlib.pylab as plt
from gekko import GEKKO
if __name__ == '__main__':
    # Instantiate GEKKO
    m = GEKKO()
    # Constants
    nRHC = 21
    tRHC = 2
    m.time = np.linspace(0, tRHC, nRHC)
    # Control
    u = m.MV(value=0.0,fixed_initial=False)
    u.STATUS = 1
    u.DCOST = 0
    # Vars
    t = m.SV(value=0)
    x1 = m.SV(value=1)
    x2 = m.SV(value=0)
    # Equations
    m.Equation(t.dt() == 1)
    m.Equation(x1.dt() == x2)
    m.Equation(x2.dt() == (1 - x2*x2)*x1 - x2 + u)
    # Objective Function
    m.Minimize(10*x1**2 + 10*x2**2 + u**2)
    # Solve RHC
    m.options.IMODE = 6
    m.options.NODES = 11
    m.options.MV_TYPE = 2
    m.options.SOLVER = 3
    nTotal = 101
    tTotal = np.linspace(0, 10, nTotal)
    uStore = np.zeros((1,nTotal))
    xStore = np.zeros((2,nTotal))
    xStore[:,0] = [1, 0]
    for i in range(nTotal):
        print('Solving Step: ', i+1, ' of ', nTotal-1)
        if i == nTotal-1:
            break
        # Solve MPC over horizon
        m.solve(disp=False)
        # Update States
        t.VALUE = t[1]
        x1.MEAS = x1[1]
        x2.MEAS = x2[1]
        # Store
        uStore[:,i] = u.NEWVAL
        xStore[:,i+1] = np.array([x1[1], x2[1]])
    # Plot States
    f1, axs = plt.subplots(2)
    axs[0].plot(tTotal, xStore[0,:])
    axs[0].set_ylabel('x')
    axs[0].grid()
    axs[1].plot(tTotal, xStore[1,:])
    axs[1].set_ylabel('x_dot')
    axs[1].set_xlabel('time')
    axs[1].grid()
    # Show Plots
    plt.show()
Thank you!
There is no need to update the states because Gekko does this automatically.
# Update States
t.VALUE = t[1]
x1.MEAS = x1[1]
x2.MEAS = x2[1]
The state values are stored in run directory files (see m.path or open with m.open_folder()). The file is ctl.t0. At the next command m.solve(), that file is imported and time shifted to make the values at the next time step the initial conditions. The time shift is adjusted with m.options.TIME_SHIFT=1 (1 is the default). If you do want to override the initial condition, use x1.MEAS=x1.value[1] or x1.value=x1.value[1].
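To picture what the time shift does, here is a minimal NumPy sketch of the idea: the solved horizon is moved forward so that the value at the second time point becomes the new initial condition. This only illustrates the concept and is not Gekko's internal code; the helper name `time_shift` and the choice to repeat the last value at the end of the horizon are assumptions made for the example.

```python
import numpy as np

def time_shift(trajectory, shift=1):
    """Advance a horizon trajectory by `shift` steps.

    The final value is repeated to fill the tail (an assumption
    made for this illustration).
    """
    traj = np.asarray(trajectory, dtype=float)
    if shift <= 0:
        return traj.copy()
    shift = min(shift, traj.size)
    shifted = np.empty_like(traj)
    shifted[: traj.size - shift] = traj[shift:]
    shifted[traj.size - shift:] = traj[-1]
    return shifted

# Hypothetical state trajectory over one horizon
x1_horizon = np.array([1.0, 0.8, 0.5, 0.3, 0.2])
x1_next = time_shift(x1_horizon)
print(x1_next)  # the old second point is now the initial condition
```

With a shift of 0 the trajectory is returned unchanged, which corresponds to re-solving from the same initial condition.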

System of FOPDT equations in GEKKO - use more than one input

I want to use two or more inputs to create a more precise estimate of a variable. I already estimated it using only one input and one FOPDT equation, but when I add one more input with its own k, tau and theta, along with another equation, I get a "Solution Not Found" error. Can I create a system of equations this way?
The solver output is below. Even though I added more variables than equations when adding the second input, the problem ended up with negative degrees of freedom.
--------- APM Model Size ------------
Each time step contains
Objects : 2
Constants : 0
Variables : 15
Intermediates: 0
Connections : 4
Equations : 6
Residuals : 6
Number of state variables: 1918
Number of total equations: - 2151
Number of slack variables: - 0
---------------------------------------
Degrees of freedom : -233
* Warning: DOF <= 0
**********************************************
Dynamic Estimation with Interior Point Solver
**********************************************
Info: Exact Hessian
******************************************************************************
This program contains Ipopt, a library for large-scale nonlinear optimization.
Ipopt is released as open source code under the Eclipse Public License (EPL).
For more information visit http://projects.coin-or.org/Ipopt
******************************************************************************
This is Ipopt version 3.12.10, running with linear solver ma57.
Number of nonzeros in equality constraint Jacobian...: 6451
Number of nonzeros in inequality constraint Jacobian.: 0
Number of nonzeros in Lagrangian Hessian.............: 1673
Exception of type: TOO_FEW_DOF in file "IpIpoptApplication.cpp" at line 891:
Exception message: status != TOO_FEW_DEGREES_OF_FREEDOM evaluated false: Too few degrees of freedom (rethrown)!
EXIT: Problem has too few degrees of freedom.
An error occured.
The error code is -10
---------------------------------------------------
Solver : IPOPT (v3.12)
Solution time : 2.279999999154825E-002 sec
Objective : 0.000000000000000E+000
Unsuccessful with error code 0
---------------------------------------------------
Creating file: infeasibilities.txt
Use command apm_get(server,app,'infeasibilities.txt') to retrieve file
#error: Solution Not Found
And here's the code
from gekko import GEKKO
import numpy as np
import pandas as pd
import plotly.express as px
d19jc = d19jc.dropna()
d19jcSlice = d19jc.loc['2019-10-22 05:30:00':'2019-10-22 09:30:00'] #jc22
d19jcSlice.index = pd.to_datetime(d19jcSlice.index)
d19jcSliceGroupMin = d19jcSlice.groupby(pd.Grouper(freq='T')).mean()
data = d19jcSliceGroupMin.dropna()
xdf1 = data['Cond_PID_SP']
xdf2 = data['Front_PID_SP']
ydf1 = data['Cond_Center_Top_TC']
xms1 = pd.Series(xdf1)
xm1 = np.array(xms1)
xms2 = pd.Series(xdf2)
xm2 = np.array(xms2)
yms = pd.Series(ydf1)
ym = np.array(yms)
xm_r = len(xm1)
tm = np.linspace(0,xm_r-1,xm_r)
m = GEKKO()
m.options.IMODE=5
m.time = tm; time = m.Var(0); m.Equation(time.dt()==1)
k1 = m.FV(lb=0.1,ub=5); k1.STATUS=1
tau1 = m.FV(lb=1,ub=300); tau1.STATUS=1
theta1 = m.FV(lb=0,ub=30); theta1.STATUS=1
k2 = m.FV(lb=0.1,ub=5); k2.STATUS=1
tau2 = m.FV(lb=1,ub=300); tau2.STATUS=1
theta2 = m.FV(lb=0,ub=30); theta2.STATUS=1
# create cubic spline with t versus u
uc1 = m.Var(xm1); tc1 = m.Var(tm); m.Equation(tc1==time-theta1)
m.cspline(tc1,uc1,tm,xm1,bound_x=False)
# create cubic spline with t versus u
uc2 = m.Var(xm2); tc2 = m.Var(tm); m.Equation(tc2==time-theta2)
m.cspline(tc2,uc2,tm,xm2,bound_x=False)
x1 = m.Param(value=xm1)
x2 = m.Param(value=xm2)
y = m.Var(value=ym)
yObj = m.Param(value=ym)
m.Equation(tau1*y.dt()+(y-ym[0])==k1 * (uc1-xm1[0]))
m.Equation(tau2*y.dt()+(y-ym[0])==k2 * (uc2-xm2[0]))
m.Minimize((y-yObj)**2)
m.options.EV_TYPE=2
print('solve start')
m.solve(disp=True)
print('k1: ', k1.value[0])
print('tau1: ', tau1.value[0])
print('theta1: ', theta1.value[0])
print('k2: ', k2.value[0])
print('tau2: ', tau2.value[0])
print('theta2: ', theta2.value[0])
df_plot = pd.DataFrame({'DateTime' : data.index,
                        'Cond_Center_Top_TC' : np.array(yObj),
                        'Fit Cond_Center_Top_TC - Train' : np.array(y)})
figGekko = px.line(df_plot,
x='DateTime',
y=['Cond_Center_Top_TC','Fit Cond_Center_Top_TC - Train'],
labels={"value": "Degrees Celsius"},
title = "(Cond_PID_SP & Front_PID_SP) -> Cond_Center_Top_TC JC only - Train")
figGekko.update_layout(legend_title_text='')
figGekko.show()
You currently have only one variable and two equations.
y = m.Var(value=ym)
yObj = m.Param(value=ym)
m.Equation(tau1*y.dt()+(y-ym[0])==k1 * (uc1-xm1[0]))
m.Equation(tau2*y.dt()+(y-ym[0])==k2 * (uc2-xm2[0]))
You need to create two separate variables y1 and y2 for the two equations.
y1 = m.Var(value=ym)
y2 = m.Var(value=ym)
yObj = m.Param(value=ym)
m.Equation(tau1*y1.dt()+(y1-ym[0])==k1 * (uc1-xm1[0]))
m.Equation(tau2*y2.dt()+(y2-ym[0])==k2 * (uc2-xm2[0]))
You may also need to create two separate measurement vectors for ym1 and ym2 if you have different data for each.
Edit: Multiple Input, Single Output (MISO) System
For a MISO system, you need to adjust the equation as shown in the Process Dynamics and Control course.
y = m.Var(value=ym)
yObj = m.Param(value=ym)
m.Equation(tau*y.dt()+(y-ym[0])==k1 * (uc1-xm1[0]) + k2 * (uc2-xm2[0]))
There is only one time constant in this form. If they need different time constants, you can also add the two outputs together with:
y1 = m.Var(value=ym[0])
y2 = m.Var(value=ym[0])
y = m.Var(value=ym)
yObj = m.Param(value=ym)
m.Equation(tau1*y1.dt()+(y1-ym[0])==k1 * (uc1-xm1[0]))
m.Equation(tau2*y2.dt()+(y2-ym[0])==k2 * (uc2-xm2[0]))
m.Equation(y==y1+y2)
In this case, y is the variable that has data and y1 and y2 are unmeasured states.
Please see the example code below for a FOPDT model with two inputs and one output. The two transfer function models have different FOPDT parameters, including different dead times.
It runs in dynamic simulation mode (IMODE=4), but you can use it as a starting point and modify it for dynamic estimation (IMODE=5) and, later, MPC (IMODE=6).
from gekko import GEKKO
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
tf = 100
npt = 101
t = np.linspace(0,tf,npt)
u1 = np.zeros(npt)
u2 = np.zeros(npt)
u1[10:] = 5
u2[40:] = -5
m = GEKKO(remote=True)
m.time = t
time = m.Var(0)
m.Equation(time.dt()==1)
K1 = m.FV(1,lb=0,ub=1); K1.STATUS=1
tau1 = m.FV(5, lb=1,ub=300); tau1.STATUS=1
theta1 = m.FV(10, lb=2,ub=30); theta1.STATUS=1
K2 = m.FV(0.5,lb=0,ub=1); K2.STATUS=1
tau2 = m.FV(10, lb=1,ub=300); tau2.STATUS=1
theta2 = m.FV(20, lb=2,ub=30); theta2.STATUS=1
uc1 = m.Var(u1)
uc2 = m.Var(u2)
tc1 = m.Var(t)
tc2 = m.Var(t)
m.Equation(tc1==time-theta1)
m.Equation(tc2==time-theta2)
m.cspline(tc1,uc1,t,u1,bound_x=False)
m.cspline(tc2,uc2,t,u2,bound_x=False)
yp = m.Var()
yp1 = m.Var()
yp2 = m.Var()
m.Equation(yp1.dt() == -yp1/tau1 + K1*uc1/tau1)
m.Equation(yp2.dt() == -yp2/tau2 + K2*uc2/tau2)
m.Equation(yp == yp1 + yp2)
m.options.IMODE=4
m.solve()
print('K1: ', K1.value[0])
print('tau1: ', tau1.value[0])
print('theta1: ', theta1.value[0])
print('')
print('K2: ', K2.value[0])
print('tau2: ', tau2.value[0])
print('theta2: ', theta2.value[0])
plt.figure()
plt.subplot(2,1,1)
plt.plot(t,u1)
plt.plot(t,u2)
plt.legend([r'u1', r'u2'])
plt.ylabel('Inputs 1 & 2')
plt.subplot(2,1,2)
plt.plot(t,yp)
plt.legend([r'y1'])
plt.ylabel('Output')
plt.xlabel('Time')
plt.savefig('sysid.png')
plt.show()
K1: 1.0
tau1: 5.0
theta1: 10.0
K2: 0.5
tau2: 10.0
theta2: 20.0
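As a rough cross-check that does not need Gekko, the same two-input, one-output structure can be simulated with NumPy. The sketch below uses the exact discretization of a first-order response for an input held constant over each sample and rounds each dead time to a whole number of samples. Those simplifications and the helper name `fopdt_response` are assumptions made for the example, so the numbers will not match the collocation and cubic-spline solution above exactly.

```python
import numpy as np

def fopdt_response(u, k, tau, theta_steps, dt=1.0):
    """First-order-plus-dead-time response to a sampled input, starting at y = 0.

    Uses the exact discretization for an input held constant over each
    sample; the dead time is a whole number of samples.
    """
    u = np.asarray(u, dtype=float)
    # Delay the input by theta_steps samples, holding the initial value
    delayed = np.concatenate((np.full(theta_steps, u[0]), u))[: u.size]
    a = np.exp(-dt / tau)
    y = np.zeros(u.size)
    for i in range(u.size - 1):
        y[i + 1] = a * y[i] + (1.0 - a) * k * delayed[i]
    return y

npt = 101
u1 = np.zeros(npt)
u2 = np.zeros(npt)
u1[10:] = 5
u2[40:] = -5

# Same gains, time constants, and dead times as the initial values above
y1 = fopdt_response(u1, k=1.0, tau=5.0, theta_steps=10)
y2 = fopdt_response(u2, k=0.5, tau=10.0, theta_steps=20)
y = y1 + y2  # single output as the sum of the two unmeasured states

print(round(y[-1], 3))
```

The output stays at zero until the first delayed step arrives and then settles near K1*5 + K2*(-5), which is the behavior expected from summing the two first-order responses.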

Convex optimization in python

I use Excel to minimize a variable and recently started using cvxopt. I am trying to figure out how to minimize a value given two constraints. I have two returns data frames; I multiply each by its weights, w1 and w2, and subtract one result from the other. I want to minimize the Sharpe ratio of the difference of the returns by changing the weights. The constraints are sum of w1 = 1 and sum of w2 = 1.
In Excel I use the Solver add-in and add the constraints $S$4 = 1 and $S$5 = 1. I am trying to figure out how to do that in Python with cvxopt. Below is the code I have written with cvxopt for creating an efficient frontier. I would really appreciate any help.
import numpy as np
import matplotlib.pyplot as plt
import cvxopt as opt
from cvxopt import blas, solvers
import pandas as pd
def random_portfolio(returns1, returns2):
    # Returns the mean and standard deviation of returns for a random portfolio
    # (rand_weights, mu and C are not defined in the post)
    p1 = np.asmatrix(np.nanmean(returns1, axis=1))
    w1 = np.asmatrix(rand_weights(returns1.shape[0]))
    mu1 = w1 * p1.T
    p2 = np.asmatrix(np.nanmean(returns2, axis=1))
    w2 = np.asmatrix(rand_weights(returns2.shape[0]))
    mu2 = w2 * p2.T
    final = mu1 - mu2
    mean_ret = np.mean(final)
    voltality = np.std(final)
    sharpe = mean_ret/voltality
    n = len(returns1)
    G = -opt.matrix(np.eye(n)) # negative n x n identity matrix
    h = opt.matrix(0.0, (n ,1))
    A = opt.matrix(1.0, (1, n))
    b = opt.matrix(1.0)
    portfolios = solvers.qp(-sharpe, G, h, A, b)['x']
    returns = [blas.dot(mu, x) for x in portfolios]
    risks = [np.sqrt(blas.dot(x, C*x)) for x in portfolios]
    return mean_ret, voltality, sharpe
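The `A` and `b` matrices in the code above already express a sum-to-one constraint in the form A w = b. To show how such an equality constraint enters a quadratic problem, the sketch below solves a simpler problem than the one asked: it minimizes portfolio variance (not the Sharpe ratio) subject to sum(w) = 1, and it uses plain NumPy to solve the optimality (KKT) system instead of cvxopt. The covariance values and the helper name `min_variance_weights` are assumptions made for the example, and the non-negativity constraint from `G` and `h` is left out.

```python
import numpy as np

def min_variance_weights(cov):
    """Minimize w' C w subject to sum(w) = 1 by solving the KKT system.

    The system is [[C, 1], [1', 0]] [w, lam] = [0, 1], where lam is the
    Lagrange multiplier of the equality constraint.
    """
    n = cov.shape[0]
    ones = np.ones(n)
    kkt = np.zeros((n + 1, n + 1))
    kkt[:n, :n] = cov
    kkt[:n, n] = ones
    kkt[n, :n] = ones
    rhs = np.zeros(n + 1)
    rhs[n] = 1.0
    solution = np.linalg.solve(kkt, rhs)
    return solution[:n]

# Hypothetical covariance matrix of two assets
cov = np.array([[0.04, 0.01],
                [0.01, 0.09]])
w = min_variance_weights(cov)
print(w, w.sum())
```

With two weight vectors that must each sum to one, as in the question, the same idea applies with two rows in A, one per weight vector.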

Constrained regression in Python

I have this simple regression model:
y = a + b * x + c * z + error
with a constraint on parameters:
c = b - 1
There are similar questions posted on SO (like Constrained Linear Regression in Python). However, the constraints there are of the type lb <= parameter <= ub.
What are the available options to handle this specific constrained linear regression problem?
This is how it can be done using GLM:
import statsmodels
import statsmodels.api as sm
import numpy as np
# Set the link function to identity
statsmodels.genmod.families.links.identity()
# np.column_stack takes a single tuple of arrays
OLS_from_GLM = sm.GLM(y, sm.add_constant(np.column_stack((x, z))))
'''Setting the restrictions on parameters in the form of (R, q), where R
and q are constraints' matrix and constraints' values, respectively. As
for the restriction in the aforementioned regression model, i.e.,
c = b - 1 or b - c = 1, R = [0, 1, -1] and q = 1.'''
res_OLS_from_GLM = OLS_from_GLM.fit_constrained(([0, 1.0, -1.0], 1))
print(res_OLS_from_GLM.summary())
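Because the restriction is a linear equality, it can also be removed by substitution. Putting c = b - 1 into the model gives y + z = a + b * (x + z), which is an ordinary two-parameter least-squares problem. The sketch below shows this with NumPy on synthetic, noise-free data; the helper name `fit_with_substitution` and the generating values are assumptions made for the example.

```python
import numpy as np

def fit_with_substitution(x, z, y):
    """Fit y = a + b*x + c*z subject to c = b - 1.

    Substituting c gives (y + z) = a + b*(x + z), which is solved
    by ordinary least squares.
    """
    design = np.column_stack((np.ones_like(x), x + z))
    (a, b), *_ = np.linalg.lstsq(design, y + z, rcond=None)
    return a, b, b - 1.0

# Synthetic data that satisfies the constraint exactly (assumed values)
rng = np.random.default_rng(0)
x = rng.random(20)
z = rng.random(20)
y = 2.0 + 0.7 * x - 0.3 * z

a, b, c = fit_with_substitution(x, z, y)
print(a, b, c)
```

The constraint holds by construction here, so this is a convenient cross-check for the estimates returned by a constrained solver.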
There are a few constrained optimization packages in Python such as CVX, CASADI, GEKKO, Pyomo, and others that can solve the problem. I develop Gekko for linear, nonlinear, and mixed integer optimization problems with differential or algebraic constraints.
import numpy as np
from gekko import GEKKO
# Data
x = np.random.rand(10)
y = np.random.rand(10)
z = np.random.rand(10)
# Gekko for constrained regression
m = GEKKO(remote=False); m.options.IMODE=2
a,b,c = m.Array(m.FV,3)
a.STATUS=1; b.STATUS=1; c.STATUS=1
x=m.Param(x); z=m.Param(z)
# wrap the measured data before y is rebound to the model variable
ym=m.Param(y); y = m.Var()
m.Equation(y==a+b*x+c*z)
m.Equation(c==b-1)
m.Minimize((ym-y)**2)
m.options.SOLVER=1
m.solve(disp=True)
print(a.value[0],b.value[0],c.value[0])
Because the data are random, the solution will differ each time you run it. One run gives:
-0.021514129645 0.45830726553 -0.54169273447
The constraint c = b - 1 is satisfied with -0.54169273447 = 0.45830726553 - 1. Here is a comparison to other linear regression packages in Python, with and without constraints:
import numpy as np
from scipy.stats import linregress
import statsmodels.api as sm
import matplotlib.pyplot as plt
from gekko import GEKKO
# Data
x = np.array([4,5,2,3,-1,1,6,7])
y = np.array([0.3,0.8,-0.05,0.1,-0.8,-0.5,0.5,0.65])
# calculate R^2
def rsq(y1,y2):
    yresid= y1 - y2
    SSresid = np.sum(yresid**2)
    SStotal = len(y1) * np.var(y1)
    r2 = 1 - SSresid/SStotal
    return r2
# Method 1: scipy linregress
slope,intercept,r,p_value,std_err = linregress(x,y)
a = [slope,intercept]
print('R^2 linregress = '+str(r**2))
# Method 2: numpy polyfit (1=linear)
a = np.polyfit(x,y,1); print(a)
yfit = np.polyval(a,x)
print('R^2 polyfit = '+str(rsq(y,yfit)))
# Method 3: numpy linalg solution
# y = X a
# X^T y = X^T X a
X = np.vstack((x,np.ones(len(x)))).T
# matrix operations
XX = np.dot(X.T,X)
XTy = np.dot(X.T,y)
a = np.linalg.solve(XX,XTy)
# same solution with lstsq
a = np.linalg.lstsq(X,y,rcond=None)[0]
yfit = a[0]*x+a[1]; print(a)
print('R^2 matrix = '+str(rsq(y,yfit)))
# Method 4: statsmodels ordinary least squares
X = sm.add_constant(x,prepend=False)
model = sm.OLS(y,X).fit()
yfit = model.predict(X)
a = model.params
print(model.summary())
# Method 5: Gekko for constrained regression
m = GEKKO(remote=False); m.options.IMODE=2
c = m.Array(m.FV,2); c[0].STATUS=1; c[1].STATUS=1
c[1].lower=-0.5
xd = m.Param(x); yd = m.Param(y); yp = m.Var()
m.Equation(yp==c[0]*xd+c[1])
m.Minimize((yd-yp)**2)
m.solve(disp=False)
c = [c[0].value[0],c[1].value[1]]
print(c)
# plot data and regressed line
plt.plot(x,y,'ko',label='data')
xp = np.linspace(-2,8,100)
slope = str(np.round(a[0],2))
intercept = str(np.round(a[1],2))
eqn = 'LstSQ: y='+slope+'x'+intercept
plt.plot(xp,a[0]*xp+a[1],'r-',label=eqn)
slope = str(np.round(c[0],2))
intercept = str(np.round(c[1],2))
eqn = 'Constraint: y='+slope+'x'+intercept
plt.plot(xp,c[0]*xp+c[1],'b--',label=eqn)
plt.grid()
plt.legend()
plt.show()

Multiple linear regression python

I use multiple linear regression with one dependent variable (var) and several independent variables (varM1, varM2, ...).
I use this code in Python:
z = np.array([varM1, varM2, varM3], np.int32)
n = max(np.shape(var))
X = np.vstack([np.ones(n), z]).T
a = np.linalg.lstsq(X, var, rcond=None)[0]
How can I calculate the change in R-squared for every variable with Python? I would like to see how the regression changes if I add or remove predictor variables.
If the broadcasting is correct along the way, the following gives the norm of the residuals (the square root of the sum of squared errors), which measures the misfit rather than the correlation coefficient:
R = np.sqrt( ((var - X.dot(a))**2).sum() )
One full example that fits y against each variable in turn:
import numpy as np
x1 = np.array([1,2,3,4,5,6])
x2 = np.array([1,1.5,2,2.5,3.5,6])
x3 = np.array([6,5,4,3,2,1])
y = np.random.random(6)
nvar = 3
one = np.ones(x1.shape)
# one (n, 2) design matrix [x, 1] per variable, stacked to shape (nvar, n, 2)
A = np.stack([np.column_stack((x, one)) for x in (x1, x2, x3)])
for i,Ai in enumerate(A):
    a = np.linalg.lstsq(Ai,y,rcond=None)[0]
    R = np.sqrt( ((y - Ai.dot(a))**2).sum() )
    print(R)
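To get at the R-squared change asked about in the question, one option is to fit the full model and then refit with each predictor left out, reporting the drop in R-squared. The sketch below does this with NumPy, using R^2 = 1 - SS_res/SS_tot. The helper names `r_squared` and `r_squared_change` and the synthetic data are assumptions made for the example.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on the columns of X."""
    coef = np.linalg.lstsq(X, y, rcond=None)[0]
    ss_res = np.sum((y - X @ coef) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def r_squared_change(predictors, y):
    """Drop in R^2 when each predictor is removed from the full model."""
    n = len(y)
    full = np.column_stack([np.ones(n)] + predictors)
    r2_full = r_squared(full, y)
    changes = []
    for i in range(len(predictors)):
        kept = predictors[:i] + predictors[i + 1:]
        reduced = np.column_stack([np.ones(n)] + kept)
        changes.append(r2_full - r_squared(reduced, y))
    return r2_full, changes

# Synthetic data (assumed for the example): y depends on x1 and x2 only
rng = np.random.default_rng(0)
x1, x2, x3 = rng.random(50), rng.random(50), rng.random(50)
y = 1.0 + 2.0 * x1 - 1.0 * x2

r2_full, changes = r_squared_change([x1, x2, x3], y)
print(r2_full, changes)
```

A predictor that contributes nothing, such as x3 here, shows a change close to zero, while removing a predictor that drives y lowers R-squared noticeably.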
