I'm using GEKKO for a time-dependent differential equation model. I am tracking the concentrations of species in a well-mixed compartment model using GEKKO variables; however, I can't view the time-dependent arrays of the concentrations within the Spyder IDE. I can plot the concentrations and view the values as graphs, or use the values reported in the "results" Excel file, but it would be useful to see the time-dependent values within Spyder. Right now, in Spyder's variable explorer, the variables appear as type gk_variable.GKVariable.
This is a sample of the code I use:
from gekko import GEKKO
import numpy as np
import pandas as pd
myFile = pd.read_csv('Time_Array.csv', sep = ',')
myFile.index
myFile.columns
time = np.array(myFile)
m = GEKKO()
m.time = time
FCG = m.Var(value = 6.934045, lb = 0)
CMG = m.Var(value = 7.01148, lb = 0)
m.Equations([FCG.dt() == Qdiet/Vintestines + Qsynintestine + Kfclg*FCL*(Vliver/Vintestines) - Kfcex*FCG - Kfccm*FCG, CMG.dt() == Kfccm*FCG - Kcmgb * CMG])
m.options.IMODE = 4
m.solve(disp = True)
The above code is a snippet of the model. In the actual model, there are a lot more constants and equations; however, I'd basically like to view the time-dependent values of concentrations like FCG and CMG without leaving the Spyder IDE. I know that you can do this by inserting print statements in the code, but I was wondering if anyone knew of a cleaner fix.
Thanks so much!
The .value property is available to see the numeric values of a Gekko parameter or variable. This doesn't show up in the variable explorer so one work-around is to create a new Python variable.
fcg_values = FCG.value
You can also see the first value:
fcg_init = FCG.value[0]
the last value:
fcg_last = FCG.value[-1]
or also a range of values in between:
fcg_inner = FCG.value[2:5]
Related
I want to implement MLE (maximum likelihood estimation) with the gekko package in Python. Suppose that we have a DataFrame that contains two columns, ['Loss', 'Target'], and its length is equal to 500.
First we have to import packages that we need:
from gekko import GEKKO
import numpy as np
import pandas as pd
Then we simply create the DataFrame like this:
My_DataFrame = pd.DataFrame({"Loss":np.linspace(-555.795 , 477.841 , 500) , "Target":0.0})
My_DataFrame = My_DataFrame.sort_values(by=["Loss"] , ascending=False).reset_index(drop=True)
My_DataFrame
It will look like this:
Some components of the ['Target'] column should be calculated with the formula shown in the picture below (the rest remain zero; I explain more further down). The two main elements of the formula are 'Kasi' and 'Betaa'. I want to find the values of them that maximize the sum of My_DataFrame['Target']. So that is the idea of what is going to happen.
Now let me show you how I wrote the code for this purpose. First I define my objective function:
def obj_function(Array):
    """
    [Purpose]:
        + it will calculate each component of My_DataFrame["Target"] column! then i can maximize sum(My_DataFrame["Target"]) and find best 'Kasi' and 'Betaa' for it!
    [Parameters]:
        + This function gets Array that contains 'Kasi' and 'Betaa'.
          Array[0] represents 'Kasi' and Array[1] represents 'Betaa'
    [returns]:
        + returns a pandas.series.
          actually it returns new components of My_DataFrame["Target"]
    """
    # in following code if you don't know what is `qw`, just look at the next code cell right after this cell (I mean next section).
    # in following code np.where(My_DataFrame["Loss"] == item)[0][0] is telling me the row's index of item.
    for item in My_DataFrame[My_DataFrame["Loss"]>160]['Loss']:
        My_DataFrame.iloc[np.where(My_DataFrame["Loss"] == item)[0][0] , 1] = qw.log10((1/Array[1])*( 1 + (Array[0]*(item-160)/Array[1])**( (-1/Array[0]) - 1 )))
    return My_DataFrame["Target"]
If you are confused about what is happening in the for loop in obj_function, check the picture below, which contains a brief example. If not, skip this part:
Then we just need to go through the optimization. I use the gekko package for this purpose. Note that I want to find the best values of 'Kasi' and 'Betaa', so I have two main variables and no constraints.
So let’s get started:
# i have 2 variables : 'Kasi' and 'Betaa', so I put nd=2
nd = 2
qw = GEKKO()
# now i want to specify my variables ('Kasi' and 'Betaa') with initial values --> Kasi = 0.7 and Betaa = 20.0
x = qw.Array(qw.Var , nd , value = [0.7 , 20])
# So i guess now x[0] represents 'Kasi' and x[1] represents 'Betaa'
qw.Maximize(np.sum(obj_function(x)))
And then when I want to solve the optimization with qw.solve():
qw.solve()
But I got this error:
Exception: This steady-state IMODE only allows scalar values.
How can I fix this problem? (Complete script gathered in next section for the purpose of convenience)
from gekko import GEKKO
import numpy as np
import pandas as pd
My_DataFrame = pd.DataFrame({"Loss":np.linspace(-555.795 , 477.841 , 500) , "Target":0.0})
My_DataFrame = My_DataFrame.sort_values(by=["Loss"] , ascending=False).reset_index(drop=True)
def obj_function(Array):
    """
    [Purpose]:
        + it will calculate each component of My_DataFrame["Target"] column! then i can maximize sum(My_DataFrame["Target"]) and find best 'Kasi' and 'Betaa' for it!
    [Parameters]:
        + This function gets Array that contains 'Kasi' and 'Betaa'.
          Array[0] represents 'Kasi' and Array[1] represents 'Betaa'
    [returns]:
        + returns a pandas.series.
          actually it returns new components of My_DataFrame["Target"]
    """
    # in following code if you don't know what is `qw`, just look at the next code cell right after this cell (I mean next section).
    # in following code np.where(My_DataFrame["Loss"] == item)[0][0] is telling me the row's index of item.
    for item in My_DataFrame[My_DataFrame["Loss"]>160]['Loss']:
        My_DataFrame.iloc[np.where(My_DataFrame["Loss"] == item)[0][0] , 1] = qw.log10((1/Array[1])*( 1 + (Array[0]*(item-160)/Array[1])**( (-1/Array[0]) - 1 )))
    return My_DataFrame["Target"]
# i have 2 variables : 'Kasi' and 'Betaa', so I put nd=2
nd = 2
qw = GEKKO()
# now i want to specify my variables ('Kasi' and 'Betaa') with initial values --> Kasi = 0.7 and Betaa = 20.0
x = qw.Array(qw.Var , nd)
for i,xi in enumerate([0.7, 20]):
    x[i].value = xi
# So i guess now x[0] represents 'Kasi' and x[1] represents 'Betaa'
qw.Maximize(qw.sum(obj_function(x)))
The proposed potential script is here:
from gekko import GEKKO
import numpy as np
import pandas as pd
My_DataFrame = pd.read_excel("[<FILE_PATH_IN_YOUR_MACHINE>]\\Losses.xlsx")
# I'll put the link to the "Losses.xlsx" file at the end of my explanation
# so you can download it from my Google Drive.
loss = My_DataFrame["Loss"]
def obj_function(x):
    k,b = x
    target = []
    for iloss in loss:
        if iloss>160:
            t = qw.log((1/b)*(1+(k*(iloss-160)/b)**((-1/k)-1)))
            target.append(t)
    return target
qw = GEKKO(remote=False)
nd = 2
x = qw.Array(qw.Var,nd)
# initial values --> Kasi = 0.7 and Betaa = 20.0
for i,xi in enumerate([0.7, 20]):
    x[i].value = xi
# bounds
k,b = x
k.lower=0.1; k.upper=0.8
b.lower=10; b.upper=500
qw.Maximize(qw.sum(obj_function(x)))
qw.options.SOLVER = 1
qw.solve()
print('k = ',k.value[0])
print('b = ',b.value[0])
python output:
objective function = -1155.4861315885942
b = 500.0
k = 0.1
Note that in the Python output b represents "Betaa" and k represents "Kasi".
The output seems a bit strange, so I decided to test it with Microsoft Excel Solver. (I put the link to the Excel file at the end of my explanation so you can check it yourself if you want.) As you can see in the picture below, the optimization in Excel completed and an optimal solution was found successfully.
excel output:
objective function = -108.21
Betaa = 32.53161
Kasi = 0.436246
As you can see, there is a huge difference between the Python output and the Excel output, and Excel seems to be performing pretty well. So I guess the problem still stands and the proposed Python script is not performing well.
The Implementation_in_Excel.xls file with the optimization in Microsoft Excel is available here. (You can also see the optimization options in Data tab --> Analysis --> Solver.)
The data used for the optimization in Excel and Python is the same and is available here (it's pretty simple and contains 501 rows and 1 column).
*If you can't download the files, let me know and I'll update them.
The initialization is applying the values of [0.7, 20] to each parameter. A scalar should be used to initialize value instead such as:
x = qw.Array(qw.Var , nd)
for i,xi in enumerate([0.7, 20]):
    x[i].value = xi
Another issue is that gekko needs to use special functions to perform automatic differentiation for the solvers. For the objective function, switch to the gekko version of summation as:
qw.Maximize(qw.sum(obj_function(x)))
If loss is computed by changing the values of x, then the objective function has logical expressions that need special treatment for solution with gradient-based solvers. Try using the if3() function for a conditional statement or else slack variables (preferred). The objective function is evaluated once to build symbolic expressions that are then compiled to byte-code and solved with one of the solvers. The symbolic expressions are found in m.path in the gk0_model.apm file.
Response to Edit
Thanks for posting an edit with the complete code. Here is a potential solution:
from gekko import GEKKO
import numpy as np
import pandas as pd
loss = np.linspace(-555.795 , 477.841 , 500)
def obj_function(x):
    k,b = x
    target = []
    for iloss in loss:
        if iloss>160:
            t = qw.log((1/b)*(1+(k*(iloss-160)/b)**((-1/k)-1)))
            target.append(t)
    return target
qw = GEKKO(remote=False)
nd = 2
x = qw.Array(qw.Var,nd)
# initial values --> Kasi = 0.7 and Betaa = 20.0
for i,xi in enumerate([0.7, 20]):
    x[i].value = xi
# bounds
k,b = x
k.lower=0.6; k.upper=0.8
b.lower=10; b.upper=30
qw.Maximize(qw.sum(obj_function(x)))
qw.options.SOLVER = 1
qw.solve()
print('k = ',k.value[0])
print('b = ',b.value[0])
The solver reaches bounds at the solution. The bounds may need to be widened so that arbitrary limits are not the solution.
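One way to spot this situation is to compare each solved value with its limits. The helper below is only a sketch in plain Python: at_bound is a hypothetical name (it is not part of Gekko), and the numbers are the k and b values and limits reported in the question's edit, plus the Excel value of Kasi for contrast.

```python
def at_bound(value, lower, upper, tol=1e-6):
    """Return 'lower' or 'upper' if value sits on that bound, else None."""
    if abs(value - lower) <= tol:
        return "lower"
    if abs(value - upper) <= tol:
        return "upper"
    return None

# values and limits taken from the question's edit (k in [0.1, 0.8], b in [10, 500])
k_status = at_bound(0.1, 0.1, 0.8)
b_status = at_bound(500.0, 10.0, 500.0)
# the Excel result for Kasi lies strictly inside the same limits
excel_status = at_bound(0.436246, 0.1, 0.8)

print(k_status, b_status, excel_status)  # lower upper None
```

If both variables report a bound, as here, the limits rather than the objective are deciding the answer.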
Update
Here is a final solution. The objective function in the code had a problem that needed to be fixed. Here is the correct script:
from gekko import GEKKO
import numpy as np
import pandas as pd
My_DataFrame = pd.read_excel("<FILE_PATH_IN_YOUR_MACHINE>\\Losses.xlsx")
loss = My_DataFrame["Loss"]
def obj_function(x):
    k,b = x
    q = ((-1/k)-1)
    target = []
    for iloss in loss:
        if iloss>160:
            t = qw.log(1/b) + q* ( qw.log(b+k*(iloss-160)) - qw.log(b))
            target.append(t)
    return target
qw = GEKKO(remote=False)
nd = 2
x = qw.Array(qw.Var,nd)
# initial values --> Kasi = 0.7 and Betaa = 20.0
for i,xi in enumerate([0.7, 20]):
    x[i].value = xi
qw.Maximize(qw.sum(obj_function(x)))
qw.solve()
print('Kasi = ',x[0].value)
print('Betaa = ',x[1].value)
Output:
The final value of the objective function is 108.20609317143486
---------------------------------------------------
Solver : IPOPT (v3.12)
Solution time : 0.031200000000000006 sec
Objective : 108.20609317143486
Successful solution
---------------------------------------------------
Kasi = [0.436245842]
Betaa = [32.531632983]
Results are close to the optimization result from Microsoft Excel.
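Comparing the scripts, the change in the objective appears to be where the "1 +" sits relative to the power: the earlier versions raise only k*(iloss-160)/b to the exponent, while the log form in the final script corresponds to raising (1 + k*(iloss-160)/b). The check below is a plain-Python sketch of that reading, not part of the original answer; the function names and the three sample losses are my own choices, and k and b are the values printed above.

```python
import math

def log_term_direct(x, k, b, u=160.0):
    # log of (1/b) * (1 + k*(x-u)/b) ** (-1/k - 1), evaluated directly
    return math.log((1.0 / b) * (1.0 + k * (x - u) / b) ** (-1.0 / k - 1.0))

def log_term_expanded(x, k, b, u=160.0):
    # same quantity split into separate logs, as in the final script
    q = -1.0 / k - 1.0
    return math.log(1.0 / b) + q * (math.log(b + k * (x - u)) - math.log(b))

def log_term_earlier(x, k, b, u=160.0):
    # earlier scripts: the "1 +" is outside the power
    return math.log((1.0 / b) * (1.0 + (k * (x - u) / b) ** (-1.0 / k - 1.0)))

k, b = 0.436245842, 32.531632983
losses = [200.0, 300.0, 477.841]  # arbitrary sample values above 160

direct = [log_term_direct(x, k, b) for x in losses]
expanded = [log_term_expanded(x, k, b) for x in losses]
earlier = [log_term_earlier(x, k, b) for x in losses]

for d, e, old in zip(direct, expanded, earlier):
    print(f"direct={d:.6f} expanded={e:.6f} earlier={old:.6f}")
```

The first two columns agree to rounding, while the third differs, which is consistent with the earlier scripts optimizing a different function than the Excel sheet.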
qw.Maximize() only sets the objective of the optimization; you still need to call solve() on your model.
If I see correctly, My_DataFrame has been defined in the global scope.
The problem is that obj_function tries to access it (successful) and then modify its value (fails).
This is because you can't reassign a global variable from a local scope by default.
Fix:
At the beginning of the obj_function, add a line:
def obj_function(Array):
    # comments
    global My_DataFrame
    for item .... # remains same
This should fix your problem.
Additional Note:
If you just wanted to access My_DataFrame, it would work without any errors and you don't need to add the global keyword
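For reference, here is a small sketch of the scoping rule in plain Python, with a list standing in for My_DataFrame (the names data and counter are placeholders). It separates two kinds of change: assigning to the global name, which needs the global keyword, and changing the object in place, which does not. The .iloc[...] = ... assignment in the question looks like the in-place kind, so it may be worth checking which case applies before relying on global.

```python
data = [0, 0, 0]   # stand-in for My_DataFrame
counter = 0

def mutate_in_place():
    # changes the object the global name refers to; no `global` needed
    data[0] = 1

def rebind_without_global():
    # creates a new local name; the global `counter` is untouched
    counter = 5

def rebind_with_global():
    global counter
    counter = 5

mutate_in_place()
rebind_without_global()
after_local = counter
rebind_with_global()

print(data, after_local, counter)  # [1, 0, 0] 0 5
```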
Also, just wanted to appreciate the effort you put into this. There's a proper explanation of what you want to do, relevant background information, an excellent diagram (Whiteboard is pretty great too), and even a minimal working example.
This should be how all SO questions are, it would make everyone's lives easier
I'm trying to do some approximate Bayesian computing, and am able to use the pm.Simulator class to estimate functions with 2 or more parameters (where each parameter is actually an array of multiple values). However, when I try to estimate values of a single parameter function, I get an error.
The simplest working example (loosely based on the actual code):
# 2 parameter pm.Simulator snippet that *works*
import pymc3 as pm
import numpy as np
def get_mean_sig2(mu,sigma):
    multi_var = np.random.normal(mu,sigma)
    return multi_var
# create the observed data
obs2 = get_mean_sig2(np.array([10,5,2,1]), np.array([0.5,1,2,1]))
with pm.Model() as m91:
    mu = pm.Uniform('mu', lower=1, upper=15, shape=obs2.shape[0])
    sigma = pm.Uniform('sigma',lower=0.25, upper=3,shape=obs2.shape[0])
    sim = pm.Simulator('sim', get_mean_sig2,params=(mu,sigma),observed=obs2)
    jj = pm.sample_smc(kernel='ABC')
When I remove the 'sigma' parameter, and simplify the problem to only estimating the mean with this code:
# 1 parameter pm.Simulator snippet that doesn't work
def get_only_mean(mu):
    multi_var = np.random.normal(mu,0.2)
    return multi_var
obs = get_only_mean(np.array([10,5,2,1]))
with pm.Model() as m90:
    mu = pm.Uniform('mu', lower=1, upper=15, shape=obs.shape[0])
    sim = pm.Simulator('sim', get_only_mean,params=(mu),observed=obs)
    jj = pm.sample_smc(kernel='ABC')
I get the error message ValueError: Length of mu ~ Uniform cannot be determined. I have tried variations such as shape=(1,obs.shape[0]) or manually setting shape=4 for the 'shape' parameter, but these failed.
I'm unable to understand why this problem suddenly appears - any help would be appreciated.
My environment/system config is:
OS: Linux Mint 19.2
Python 3.8.5
numpy 1.19.5
pymc3 3.11.0
theano 1.1.0
The error disappears when the variable/s are put into a list rather than a tuple.
For the single-parameter example, using params=[mu] instead of params=(mu) solves the issue.
A list is a valid data type for multi-parameter situations too, e.g. params=(mu, sigma) is equivalent to params=[mu, sigma].
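A likely reason for the difference is that parentheses alone do not create a tuple in Python: (mu) is just mu, while (mu,) with a trailing comma is a one-element tuple. The sketch below shows this in plain Python, with a float as a stand-in for the PyMC3 variable and a hypothetical describe helper. Whether params=(mu,) also works with pm.Simulator is an assumption I have not verified.

```python
def describe(params):
    """Report the container type and, for sequences, the number of items."""
    is_seq = isinstance(params, (list, tuple))
    return type(params).__name__, (len(params) if is_seq else None)

mu = 3.5  # stand-in for the random variable in the question

bare = describe((mu))        # parentheses only group; this is still a float
one_tuple = describe((mu,))  # the trailing comma makes a tuple
one_list = describe([mu])    # a list, as in the fix above

print(bare)       # ('float', None)
print(one_tuple)  # ('tuple', 1)
print(one_list)   # ('list', 1)
```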
I've recently started trying out moving horizon estimation with GEKKO. My specified manipulated variables are used in a heat balance equation within my model, and I am having some issues with the matrix operations in the model.
Example code:
from gekko import GEKKO
import numpy as np
#creating a sample array of input values
nt = 51
u_meas = np.zeros(nt)
u_meas[3:10] = 1.0
u_meas[10:20] = 2.0
u_meas[20:40] = 0.5
u_meas[40:] = 3.0
p = GEKKO(remote=False)
p.time = np.linspace(0,10,nt)
n = 1 #process model order
#designating u as my input, and that I'm going to be using these measurements to estimate my parameters with MHE
p.u = p.MV(value=u_meas)
p.u.FSTATUS=1
#parameters I'm looking to modulate
p.K = p.FV(value=1, lb = 1, ub = 3) #gain
p.tau = p.FV(value=5, lb = 1, ub = 10) #time constant
p.x = [p.Intermediate(p.u)]
#constants within the model that do not change
X_O2 = 0.5
X_SiO2 = 0.25
X_N2 = 0.1
m_feed = 100
#creating an array with my feed separated into components. This creates a 1D array with the individual feed streams of my components.
mdot_F_i = np.tile(m_feed,3)*np.array([X_O2, X_SiO2, X_N2])
#at this point, I want to add my MV values to the end of my component feed array for later heat and mass balance equations. Normally, in my previous model without MHE, I would put
mdot_c_i = np.concatenate(mdot_F_i, x, (other MV variables after))
However, now that u is a specified MV in GEKKO, and not a set value, I get an error at the mdot_c_i line that says that the array at index 0 has 1 dimension, and the array at index 1 has 2 dimensions.
I'm guessing that I have to specify mdot_c_i as an intermediate variable within Gekko. I've tried a couple different variations, alternately specifying mdot_c_i as an intermediate and trying to use only the values of the MV; however, I keep getting that error.
Has anyone experienced similar issues?
Thank you!
You can resolve this by using np.append() instead of np.concatenate(). Try something like:
mdot_c_i = np.append(mdot_F_i, p.u)
Here is a minimum and complete example if you'd like to try it.
import numpy as np
from gekko import GEKKO
m = GEKKO(remote=False)
x = m.Array(m.Var,3,lb=-10,ub=10)
y = m.Var(5,lb=-5,ub=5)
z = np.append(x,y)
m.Minimize(np.dot([1,1,-1,1],z))
m.solve(disp=False)
print([zi.value[0] for zi in z])
# solution: [-10.0, -10.0, 10.0, -5.0]
Gekko variables need to be stored as objects, not as numerical values. The error may be because the np.concatenate() function is trying to access the length of the Gekko manipulated variable data p.u.value to concatenate those values instead of concatenating p.u as an object.
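The dimension mismatch itself can be reproduced with NumPy alone. np.concatenate() takes the arrays as one sequence and requires them to have the same number of dimensions, whereas np.append() flattens both inputs first when no axis is given. In this sketch the array names are placeholders, and the 2-D array only stands in for an input that NumPy has expanded into an extra dimension.

```python
import numpy as np

feed = np.array([50.0, 25.0, 10.0])  # 1-D, like mdot_F_i
extra = np.array([[1.0, 2.0]])       # 2-D stand-in for the expanded input

# concatenate refuses to join a 1-D and a 2-D array
try:
    np.concatenate((feed, extra))
    concat_error = None
except ValueError as err:
    concat_error = str(err)

# append (axis=None) ravels both inputs, then joins them
flat = np.append(feed, extra)

print(concat_error)
print(flat.shape)  # (5,)
```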
I have used Gekko from APM in Python to solve an optimization problem. The two main decision variables (DVs) are large arrays. The problem has converged successfully, however, I need the results of these tables in an excel worksheet for further work.
An example variable name is 's'. Since the arrays created within Gekko are GKVariable/Object variable types I cannot simply use:
pd.DataFrame(s).to_csv(r'C:\Users\...\s.csv')
because the result gives every cell of the array the label of each variable defined in the model (i.e. v1, v2, etc.)
Using print 's' within the kernel will show the numbers of the array from the optimization results but in a format that doesn't guarantee that each line is a new row of the matrix because of the many columns.
Is there another solution to copy just the resulting value of the DV 's' so it becomes a normal np.array instead of the object type variable? Open to any ideas for this.
You can use s[i].value[0] for steady-state problems (IMODE=1 or IMODE=3) or s[i].value[:] to access the array of values for all other IMODE options. Here is a simple example with writing the results to a file with pandas and numpy.
import numpy as np
from gekko import GEKKO
import pandas as pd
m = GEKKO(remote=False)
# Random 3x3
A = np.random.rand(3,3)
# Random 3x1
b = np.random.rand(3,1)
# Ax = b
y = m.axb(A,b)
m.solve()
yn = [y[i].value[0] for i in range(3)]
print(yn)
pd.DataFrame(yn).to_csv(r'y1.csv')
np.savetxt('y2.csv',yn,delimiter=',',comments='')
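Since the question mentions tables rather than vectors, the same idea can be extended to a 2-D array with a nested comprehension. This is a sketch under assumptions: SimpleNamespace objects with a .value list are hypothetical stand-ins for solved steady-state Gekko variables, so the snippet runs without a solver, and s is assumed to be a 2-D object array.

```python
import numpy as np
from types import SimpleNamespace

# hypothetical stand-ins for solved variables: each has a one-element .value list
s = np.array(
    [[SimpleNamespace(value=[float(10 * r + c)]) for c in range(3)] for r in range(2)],
    dtype=object,
)

# pull the scalar out of every cell to get an ordinary numeric array
s_values = np.array([[v.value[0] for v in row] for row in s], dtype=float)

print(s_values.shape)  # (2, 3)
print(s_values)
```

The resulting s_values can then go to pd.DataFrame(...).to_csv(...) or np.savetxt(...) as in the example above, with one matrix row per line.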
I am having an issue with this function. I am wanting to perform a cross-sectional regression on 25 portfolios ranked on value and size. I have 7 independent variables as the right side of the equation.
import pandas as pd
import numpy as np
from linearmodels import FamaMacBeth
#creating a multi_index of independent variables
ind_var = pd.read_excel('FAMA_MACBETH.xlsx')
ind_var['date'] = pd.to_datetime(ind_var['date'])
# dropping our dependent variables
ind_var = ind_var.drop(['Mkt_rf', 'div_innovations', 'term_innovations',
'def_innovations', 'rf_innovations', 'hml_innovations',
'smb_innovations'],axis = 1)
ind_var = pd.DataFrame(ind_var.set_index('date').stack())
ind_var.columns = ['x']
x = np.asarray(ind_var)
len(x)
11600
#creating a multi_index of dependent variables
# reading in our data
dep_var = pd.read_excel('FAMA_MACBETH.xlsx')
dep_var['date'] = pd.to_datetime(dep_var['date'])
# dropping our independent variables
dep_var = dep_var.drop(['SMALL_LoBM', 'ME1_BM2', 'ME1_BM3', 'ME1_BM4',
'SMALL_HiBM', 'ME2_BM1', 'ME2_BM2', 'ME2_BM3', 'ME2_BM4', 'ME2_BM5',
'ME3_BM1', 'ME3_BM2', 'ME3_BM3', 'ME3_BM4', 'ME3_BM5', 'ME4_BM1',
'ME4_BM2', 'ME4_BM3', 'ME4_BM4', 'ME4_BM5', 'BIG_LoBM', 'ME5_BM2',
'ME5_BM3', 'ME5_BM4', 'BIG_HiBM'],axis = 1)
dep_var = pd.DataFrame(dep_var.set_index('date').stack())
dep_var.columns = ['y']
y = np.asarray(dep_var)
len(y)
3248
mod = FamaMacBeth(y, x)
res = mod.fit(cov_type='kernel', kernel='Parzen')
output with tstats and errors ideally
I have tried numerous methods of getting this to work. I am really thinking of using SAS at this point. Really, I would prefer to get this running with pandas
I expect a cross-sectional regression output with standard errors and t stats
I got it to work in one go. See this site and run the lines of code for OLS below: "Here the difference is presented using the canonical Grunfeld data on investment."
(Note that this line is important: etdata = data.set_index(['firm','year']), else Python won't know the correct dimensions to run F&McB on.)
Then run:
from linearmodels import FamaMacBeth
FamaMacBeth(etdata.invest,etdata[['value','capital']]).fit()
Note: I updated linearmodels to the latest version, which got me access to the data.
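As far as I understand, the set_index step matters because the panel estimators in linearmodels expect a two-level MultiIndex with the entity first and the time period second. The pandas-only sketch below shows one way to get wide portfolio columns into that shape. The column names P1 and P2, the value name ret, and the numbers are made up for illustration, and it stops short of calling FamaMacBeth.

```python
import pandas as pd

# made-up wide table: one row per date, one column per portfolio
wide = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-31", "2020-02-29", "2020-03-31"]),
    "P1": [0.01, 0.02, -0.01],
    "P2": [0.03, -0.02, 0.00],
})

# long format with (entity, time) as the index levels
long = (
    wide.melt(id_vars="date", var_name="portfolio", value_name="ret")
        .set_index(["portfolio", "date"])
        .sort_index()
)

print(long)
```

With this layout every row is one portfolio on one date, so the dependent and independent variables can share the same index and have the same length.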