Ax.dev does not do anything with search space constraints? (Python)

I found some articles online that mentioned Ax.dev's ability to cope with a constrained search space (e.g. dimension_x + dimension_y <= bound). However, in my experience Ax.dev ignores/violates all constraints. I have tried a few different constraints on the Hartmann6d example. I assume Ax.dev models the constraints as soft constraints (not sure though; it could just as well be my coding skills...). So, my first question is: does the Ax.dev SearchSpace treat parameter_constraints as soft or hard constraints?
My second problem:
from ax import *
# number of parameters
...
c0 = SumConstraint(parameters=[ some parameters ], bound= some boundary)
c1...
space = SearchSpace(parameters=[ parameters ], parameter_constraints=[c0, c1])
exp = SimpleExperiment(
    name='EXPERIMENT5',
    search_space=space,
    evaluation_function=black_box_function,
    objective_name='BLABLA',
    minimize=False,
)
sobol = Models.SOBOL(exp.search_space)
for i in range(10):
    exp.new_trial(generator_run=sobol.gen(1))
    exp.trials[len(exp.trials) - 1].run()
returns
SearchSpaceExhausted: Rejection sampling error (specified maximum draws (100000) exhausted, without finding sufficiently many (1) candidates). This likely means that there are no new points left in the search space.
I have not been able to find useful information about this, despite all the promising articles online touting Ax.dev's benefits (such as a constrained parameter space!) :(

Meta-comment: this is probably better suited for GitHub issues (to my knowledge there is not much Ax help/documentation on Stack Overflow, but their GitHub issues are rich and generally have a lot of developer/community support).
Does Ax.dev SearchSpace use parameter_constraints as soft or hard constraint(s)?
I think parameter constraints are hard constraints (pretty sure that's the case at least for Sobol sampling, but I'm not sure about Bayesian models).
Outcome constraints, on the other hand, are soft constraints (penalties).
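The rejection-sampling error below is easier to understand with a toy version. The sketch is a simplified stand-in, not Ax's code: it uses plain uniform pseudo-random draws instead of Sobol points, and the function and exception names are hypothetical. It treats a linear constraint A x <= b as hard by discarding infeasible draws, so when the feasible region is a tiny fraction of the unit cube (for example x1 + ... + x6 <= 0.1 in the 6-D Hartmann domain, about 1.4e-9 of the volume), the draw budget runs out. The 100000 default mirrors the limit quoted in the error message.

```python
import numpy as np


class SearchSpaceExhaustedSketch(RuntimeError):
    """Hypothetical stand-in for Ax's SearchSpaceExhausted error."""


def rejection_sample(n, dim, A, b, max_draws=100_000, seed=0):
    """Draw n points from [0, 1]^dim satisfying A @ x <= b, by rejection.

    Constraints are hard: infeasible draws are simply discarded.
    Uses uniform pseudo-random draws (Ax's Sobol step uses Sobol points).
    """
    rng = np.random.default_rng(seed)
    draws = rng.random((max_draws, dim))
    # Keep only the draws that satisfy every constraint row
    feasible = np.all(draws @ A.T <= b, axis=1)
    accepted = draws[feasible]
    if len(accepted) < n:
        raise SearchSpaceExhaustedSketch(
            f"{max_draws} draws exhausted, only {len(accepted)} feasible "
            f"(needed {n})"
        )
    return accepted[:n]


# One SumConstraint-like row: x1 + ... + x6 <= bound
A = np.ones((1, 6))
loose = rejection_sample(10, 6, A, np.array([3.0]))  # half the cube is feasible
print(loose.sum(axis=1).max() <= 3.0)
try:
    rejection_sample(1, 6, A, np.array([0.1]))  # ~1.4e-9 of the cube is feasible
except SearchSpaceExhaustedSketch as err:
    print("exhausted:", err)
```

Sampling directly inside the feasible polytope, which is what the fallback_to_sample_polytope option suggests by its name, sidesteps this because infeasible points are never generated in the first place.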
SearchSpaceExhausted: Rejection sampling error
Search: https://github.com/facebook/Ax/issues?q=is%3Aissue+sort%3Aupdated-desc+specified+maximum+draws+is%3Aclosed
--> https://github.com/facebook/Ax/issues/694
--> https://github.com/facebook/Ax/issues/694#issuecomment-987353936
Since it's on master, first I install the latest version in a conda environment:
pip install 'git+https://github.com/facebook/Ax.git#egg=ax-platform'
The relevant imports are:
from ax.modelbridge.generation_strategy import GenerationStrategy, GenerationStep
from ax.modelbridge.registry import Models
Then, based on "1A. Manually configured generation strategy", I change the model_kwargs of the first GenerationStep from:
model_kwargs={"seed": 999}
to
model_kwargs={
    "seed": 999,
    "fallback_to_sample_polytope": True,
}
With the full generation strategy (gs) given by:
gs = GenerationStrategy(
    steps=[
        # 1. Initialization step (does not require pre-existing data and is well-suited for
        # initial sampling of the search space)
        GenerationStep(
            model=Models.SOBOL,
            num_trials=5,  # How many trials should be produced from this generation step
            min_trials_observed=3,  # How many trials need to be completed to move to next model
            max_parallelism=5,  # Max parallelism for this step
            model_kwargs={
                "seed": 999,
                "fallback_to_sample_polytope": True,
            },  # Any kwargs you want passed into the model
            model_gen_kwargs={},  # Any kwargs you want passed to `modelbridge.gen`
        ),
        # 2. Bayesian optimization step (requires data obtained from previous phase and learns
        # from all data available at the time of each new candidate generation call)
        GenerationStep(
            model=Models.GPEI,
            num_trials=-1,  # No limitation on how many trials should be produced from this step
            max_parallelism=3,  # Parallelism limit for this step, often lower than for Sobol
            # More on parallelism vs. required samples in BayesOpt:
            # https://ax.dev/docs/bayesopt.html#tradeoff-between-parallelism-and-total-number-of-trials
        ),
    ]
)
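Finally, as mentioned in the issue, pass gs in when creating the client:
from ax.service.ax_client import AxClient
AxClient(generation_strategy=gs)
Or, in the case of the Loop API:
from ax.service.managed_loop import optimize
optimize(..., generation_strategy=gs)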
Seems to work well for my use-case; thank you! I'll try to update the other relevant issues soon.

Related

IV2SLS output summary much different than previous OLS estimation

I ran a different version of Mincer's equation to estimate salary. First, I ran an OLS version without accounting for endogeneity, and the results are the following: Output summary
Salario is actually the natural log of salary.
After that, I wrote the following code to get a 2SLS estimate that addresses endogeneity in the variables No_Feliz and Años_Edu, using No_Dep and max_edu_padres as their respective instrumental variables. However, the output is a bit confusing and I don't know how to deal with it.
from statsmodels.sandbox.regression.gmm import IV2SLS
resultIV = IV2SLS(_dfb['Salario'], _dfb[['No_Feliz','Años_Edu']], _dfb[['No_Dep','Estatura','Años_Expe','Edad','Edad_2','Peso','Peso_2','DI','D_Hombre']]).fit()
resultIV.summary()
Results: Output summary IV2SLS
It's clear the output has some issues: R² is much higher than in the OLS result, there is no F-statistic, the coefficient of No_Feliz is positive while it is negative in the OLS estimation, and the other exogenous variables are not taken into account even though I included them.
I'd appreciate it if someone could help me fix it, or at least make things a bit clearer to me. Thank you very much!
Don't use statsmodels' sandbox version, which is not well tested. Instead, use linearmodels, which is fully tested and correct.
Start by installing it with pip install linearmodels
Then you need to be explicit about which variables are endogenous and need instrumenting, which are exogenous (in the model but not instrumented), and which are instruments only.
from linearmodels import IV2SLS
dep = _dfb['Salario']
exog = None
endog = _dfb[['No_Feliz','Años_Edu']]
instr = _dfb[['No_Dep','Estatura','Años_Expe','Edad','Edad_2','Peso','Peso_2','DI','D_Hombre']]
resultIV = IV2SLS(dep, exog, endog, instr).fit()
resultIV.summary # Note: this is a property so no ()
The documentation has more details, and there are also some examples.
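To make the three roles concrete, here is a minimal hand-rolled 2SLS in NumPy on simulated data. It is a sketch of the textbook two-stage procedure, not linearmodels' implementation, and the data-generating process and variable names are assumptions for illustration. The exogenous regressors (including a constant column) appear in both stages and act as their own instruments; only the endogenous regressor is replaced by its first-stage fitted values. It returns point estimates only.

```python
import numpy as np


def two_stage_least_squares(y, exog, endog, instruments):
    """Minimal 2SLS point estimates (no standard errors).

    exog: regressors assumed exogenous (include a constant column yourself);
          they act as their own instruments.
    endog: regressors to be instrumented.
    instruments: excluded instruments (appear only in the first stage).
    Returns coefficients ordered as [exog..., endog...].
    """
    X = np.column_stack([exog, endog])        # second-stage regressors
    Z = np.column_stack([exog, instruments])  # full instrument set
    # First stage: project every regressor onto the instrument space
    gamma = np.linalg.lstsq(Z, X, rcond=None)[0]
    X_hat = Z @ gamma
    # Second stage: OLS of y on the projected regressors
    return np.linalg.lstsq(X_hat, y, rcond=None)[0]


# Simulated data (an assumption, not the question's dataset)
rng = np.random.default_rng(0)
n = 5_000
z = rng.normal(size=n)                # instrument
u = rng.normal(size=n)                # unobserved confounder
w = rng.normal(size=n)                # exogenous control
x = 0.8 * z + u + rng.normal(size=n)  # endogenous regressor
y = 1.0 + 0.5 * w + 2.0 * x + 1.5 * u + rng.normal(size=n)

exog = np.column_stack([np.ones(n), w])  # constant + control
beta_iv = two_stage_least_squares(y, exog, x, z)
beta_ols = np.linalg.lstsq(np.column_stack([exog, x]), y, rcond=None)[0]
print("2SLS:", beta_iv.round(2))   # close to the true [1.0, 0.5, 2.0]
print("OLS: ", beta_ols.round(2))  # slope on x biased upward by the confounder
```

In the question's terms, this presumably means passing the control variables plus a constant column as exog (as far as I know, linearmodels does not add a constant automatically), No_Feliz and Años_Edu as endog, and only No_Dep and max_edu_padres as instruments. Note that the naive second-stage standard errors from this shortcut are not valid 2SLS standard errors, which is one more reason to let linearmodels do the fitting.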

PyGMO Batch fitness evaluation

My goal is to perform a parameter estimation (model calibration) using PyGMO. My model will be an external "black box" model (C code) that outputs the objective function J to be minimized (here J will be the Normalized Root Mean Square Error (NRMSE) between model outputs and measured data). To speed up the optimization (calibration), I would like to run my models/simulations on multiple cores/threads in parallel, so I would like to use a batch fitness evaluator (bfe) in PyGMO. I prepared a minimal example using a simple problem class in pure Python (no external model) and the Rosenbrock problem:
#!/usr/bin/env python
# coding: utf-8
import numpy as np
from fmpy import read_model_description, extract, simulate_fmu, freeLibrary
from fmpy.fmi2 import FMU2Slave
import pygmo as pg
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
from matplotlib import cm
import time
#-------------------------------------------------------
def main():
    # Optimization
    # Define problem
    class my_problem:
        def __init__(self, dim):
            self.dim = dim
        def fitness(self, x):
            J = np.zeros((1,))
            for i in range(len(x) - 1):
                J[0] += 100.*(x[i + 1]-x[i]**2)**2+(1.-x[i])**2
            return J
        def get_bounds(self):
            return (np.full((self.dim,),-5.),np.full((self.dim,),10.))
        def get_name(self):
            return "My implementation of the Rosenbrock problem"
        def get_extra_info(self):
            return "\nDimensions: " + str(self.dim)
        def batch_fitness(self, dvs):
            J = [123] * len(dvs)
            return J
    prob = pg.problem(my_problem(30))
    print('\n----------------------------------------------')
    print('\nProblem description: \n')
    print(prob)
    #-------------------------------------------------------
    dvs = pg.batch_random_decision_vector(prob, 1)
    print('\n----------------------------------------------')
    print('\nBatch fitness evaluation:')
    print('\ndvs length:' + str(len(dvs)))
    print('\ndvs:')
    print(dvs)
    udbfe = pg.default_bfe()
    b = pg.bfe(udbfe=udbfe)
    print('\nudbfe:')
    print(udbfe)
    print('\nbfe:')
    print(b)
    fvs = b(prob, dvs)
    print(fvs)
    #-------------------------------------------------------
    pop_size = 50
    gen_size = 1000
    algo = pg.algorithm(pg.sade(gen = gen_size)) # The algorithm: a self-adaptive form of Differential Evolution (sade, jDE variant)
    algo.set_verbosity(int(gen_size/10)) # We set the verbosity to 100 (i.e. each 100 gen there will be a log line)
    print('\n----------------------------------------------')
    print('\nOptimization:')
    start = time.time()
    pop = pg.population(prob, size = pop_size) # The initial population
    pop = algo.evolve(pop) # The actual optimization process
    best_fitness = pop.get_f()[pop.best_idx()] # Getting the best individual in the population
    print('\n----------------------------------------------')
    print('\nResult:')
    print('\nBest fitness: ', best_fitness)
    best_parameterset = pop.get_x()[pop.best_idx()] # Get the best parameter set
    print('\nBest parameter set: ',best_parameterset)
    print('\nTime elapsed for optimization: ', time.time() - start, ' seconds\n')
if __name__ == '__main__':
    main()
When I try to run this code I get the following error:
Exception has occurred: ValueError
function: bfe_check_output_fvs
where: C:\projects\pagmo2\src\detail\bfe_impl.cpp, 103
what: An invalid result was produced by a batch fitness evaluation: the number of produced fitness vectors, 30, differs from the number of input decision vectors, 1
By deleting or commenting out these two lines:
fvs = b(prob, dvs)
print(fvs)
the script can be run without errors.
My questions:
How to use the batch fitness evaluation? (I know this is a new capability of PyGMO and they are still working on the documentation...) Can anybody give a minimal example of how to implement this?
Is this the right way to go to speed up my model calibration problem? Or should I use islands and archipelagos? If I understand correctly, the islands in an archipelago do not communicate with each other, right? So if one performs, e.g., a Particle Swarm Optimization and wants to evaluate several objective function calls simultaneously (in parallel), then the batch fitness evaluator is the right choice?
Do I need to care about archipelagos and islands in this example? What exactly are they meant for? Is it worth running several optimizations with different initial x (input to the objective function) and then taking the best solution? Is this a common approach in optimization with GAs?
I am very new to the field of optimization and PyGMO, so thanks for helping!
Is this the right way to go to speed up my model calibration problem? Or should I use islands and archipelagos? If I got it right, the islands in an archipelago are not communicating with each other, right? So if one performs e.g. a Particle Swarm Optimization and wants to evaluate several objective function calls simultaneously (in parallel) then the batch fitness evaluator is the right choice?
There are 2 modes of parallelization in pagmo, the island model (i.e., coarse-grained parallelization) and the BFE machinery (i.e., fine-grained parallelization).
The island model works on any problem/algorithm combination, and it is based on the idea that multiple optimisations are run in parallel while exchanging information to accelerate the global convergence to a solution.
The BFE machinery, instead, parallelizes a single optimisation, and it requires explicit support in the solver to work. Currently in pagmo only a handful of solvers are able to take advantage of the BFE machinery. The BFE machinery can also be used to parallelise the initialisation of a population of individuals, which can be useful if your fitness function is particularly heavyweight.
Which parallelisation method is best for you depends on the properties of your problem. In my experience, users tend to prefer the BFE machinery (fine-grained parallelisation) if the fitness function is very heavy (e.g., it takes minutes or more to compute), because in such a situation fitness evaluations are so costly that in order to take advantage of the island model one would have to wait too long. The BFE is also in some sense easier to understand because you don't have to delve into the details of archipelagos, topologies, etc. On the other hand, the BFE works only with certain solvers (although we are trying to extend BFE support to other solvers as time goes by).
How to use the batch fitness evaluation? (I know this is a new capability of PyGMO and they are still working on the documentation...) Can anybody give a minimal example on how to implement this?
One way of using the BFE is what you did in your example, i.e., via the implementation of a batch_fitness() method in your problem. However, my suggestion would be to comment out the batch_fitness() method and try using one of the general-purpose batch fitness evaluators provided with pagmo. The easiest thing to do is to just default-construct an instance of the bfe class and then pass it to one of the algorithms that can use the BFE machinery. One such algorithm is nspso:
https://esa.github.io/pygmo2/algorithms.html#pygmo.nspso
So, something like this:
b = pg.bfe() # Construct a default BFE
uda = pg.nspso(gen = gen_size) # Construct the algorithm
uda.set_bfe(b) # Tell the UDA to use the BFE machinery
algo = pg.algorithm(uda) # Construct a pg.algorithm from the UDA
new_pop = algo.evolve(pop) # Evolve the population
This should use multiple processes to evaluate your fitness functions in parallel within the loop of the nspso algorithm.
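If you do keep a custom batch_fitness() method, the ValueError in the question appears to come from its output shape. As far as I understand the pygmo convention (and consistent with the error message, which reports 30 fitness vectors for 1 decision vector), batch_fitness() receives all decision vectors concatenated into one flat array and must return all fitness vectors concatenated the same way. With dim=30 and one decision vector, len(dvs) is 30, so [123] * len(dvs) yields 30 values. Below is a minimal, shape-correct sketch in pure NumPy, so it can be checked without pygmo; the per-vector loop is where calls to the external model would go.

```python
import numpy as np


class my_problem:
    """Rosenbrock UDP with a shape-correct batch_fitness (pure NumPy sketch)."""

    def __init__(self, dim):
        self.dim = dim

    def fitness(self, x):
        x = np.asarray(x, dtype=float)
        # Single objective, returned as a length-1 array
        return np.array(
            [np.sum(100.0 * (x[1:] - x[:-1] ** 2) ** 2 + (1.0 - x[:-1]) ** 2)]
        )

    def get_bounds(self):
        return (np.full((self.dim,), -5.0), np.full((self.dim,), 10.0))

    def batch_fitness(self, dvs):
        # dvs is flat: n decision vectors of length dim, concatenated
        xs = np.asarray(dvs, dtype=float).reshape(-1, self.dim)
        # Return n fitness vectors (length 1 each here), concatenated the same way
        return np.concatenate([self.fitness(x) for x in xs])


udp = my_problem(30)
dvs = np.concatenate([np.ones(30), np.zeros(30)])  # two decision vectors
fvs = udp.batch_fitness(dvs)
print(len(fvs), fvs)  # 2 values: 0 at x = 1 (the optimum), 29 at x = 0
```

As the answer suggests, it may still be simpler to drop batch_fitness() altogether and let a default pg.bfe() handle the parallel evaluation.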
If you need more help, please come over to our public users/devs chat room, where you should get assistance rather quickly (normally):
https://gitter.im/pagmo2/Lobby

How to remove constraint in ORTools

Is there any way to remove a defined constraint from the solver without clearing the solver and recreating all constraints from scratch?
Suppose my problem is to maximize the sum of 3 variables with two constraints:
constraint1: variable 2 should be between 8 and 10
constraint2: variable 3 should be between 5 and 10
from ortools.linear_solver import pywraplp
solver = pywraplp.Solver('SolveIntegerProblem',
                         pywraplp.Solver.CBC_MIXED_INTEGER_PROGRAMMING)
objective = solver.Objective()
# Three integer variables (variable names must be strings)
variable = [
    solver.IntVar(0, 5, 'variable 0'),
    solver.IntVar(0, 10, 'variable 1'),
    solver.IntVar(0, 20, 'variable 2'),
]
objective.SetCoefficient(variable[0], 1)
objective.SetCoefficient(variable[1], 1)
objective.SetCoefficient(variable[2], 1)
objective.SetMaximization()
constraints = []
constraints.append(solver.Constraint(8,10))  # constraint1: 8 <= variable[1] <= 10
constraints[0].SetCoefficient(variable[1],1)
constraints.append(solver.Constraint(5,10))  # constraint2: 5 <= variable[2] <= 10
constraints[1].SetCoefficient(variable[2],1)
Now, the second time I run my code, I want to remove constraint number 2, but I cannot find any operation for this; the only way seems to be to clear the solver and define the constraints from scratch.
In this pseudo-code the number of constraints is small, but in my real code there are many constraints and I cannot redefine them all from scratch.
I know this question is quite old but:
As far as I know, or-tools does not provide any interface that removes constraints or variables. From an engineering perspective, messing with the internal logic to remove them 'by hand' is dangerous.
I absolutely needed that feature for my tech stack and tried multiple Python linear programming libraries out there (really wrappers around clp/cbc), and I settled on or-tools despite that flaw for 2 main reasons: 1) it was the only library with the minimal feature support I required out of the box, and 2) at the time (~4-5 years ago) it was the only library using C bindings.
All the others interfaced with the cbc command line in one form or another, which is a ... horrible way to interface with Python. It is unscalable due to the overhead of writing and reading files on disk. Nasty nasty nasty. So, if I remember correctly, only pylp and or-tools had C bindings, and again if I remember correctly, pylp was NOT Python 3 compatible (and has been in limbo ever since), so I settled on or-tools.
So to answer your question: to 'remove' variables or constraints with or-tools, I had to build my own Python wrapper around or-tools. To deactivate a variable or a constraint, I would set its coefficients to zero, free its bounds (set them to +/- infinity), and set its costs to zero. In my wrapper, I would keep a list of deactivated constraints/variables and recycle them instead of creating new ones (creating new ones was shown to lead to both increased runtimes and memory leaks, because C++ + Python is a nightmare in those areas). I strongly suspect that I get floating-point noise from the recycling, but it's stable enough in practice for my needs.
So in your code example, to rerun without creating a new model from scratch, you need to do:
(...)
constr1 = solver.Constraint(8,10)
constraints.append(constr1)
constraints[0].SetCoefficient(variable[1],1)
constr2 = solver.Constraint(5,10)
constraints.append(constr2)
constraints[1].SetCoefficient(variable[2],1)
constr2.SetBounds(-solver.infinity(), solver.infinity())
constr2.SetCoefficient(variable[2], 0)
# constr2 is now deactivated. If you wanted to add a new constraint, you could
# change the bounds on constr2 to recycle it and add new variable
# coefficients
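The deactivate-and-recycle wrapper described above can be sketched as a small pool class. This is a hypothetical helper, not part of or-tools, and not the answerer's actual wrapper. It relies only on the pywraplp calls already used in this thread (solver.Constraint(lb, ub), solver.infinity(), SetBounds, SetCoefficient), and it takes coefficients as (variable, coefficient) pairs so it never needs to hash or compare solver variables. Pass back to remove() the same object that add() returned.

```python
class ConstraintPool:
    """Hypothetical helper (not part of or-tools): 'remove' constraints by
    deactivating them, and recycle deactivated ones for later additions.

    Relies only on pywraplp-style calls: solver.Constraint(lb, ub),
    solver.infinity(), constraint.SetBounds(lb, ub) and
    constraint.SetCoefficient(var, coeff).
    """

    def __init__(self, solver):
        self.solver = solver
        self._active = {}  # id(constraint) -> (constraint, variables with coefficients set)
        self._free = []    # deactivated constraints waiting to be reused

    def add(self, lb, ub, terms):
        """terms: iterable of (variable, coefficient) pairs."""
        if self._free:
            con = self._free.pop()  # recycle instead of creating a new row
            con.SetBounds(lb, ub)
        else:
            con = self.solver.Constraint(lb, ub)
        terms = list(terms)
        for var, coeff in terms:
            con.SetCoefficient(var, coeff)
        # Keep a reference so id(con) stays unique while it is tracked
        self._active[id(con)] = (con, [var for var, _ in terms])
        return con

    def remove(self, con):
        """Deactivate: free both bounds and zero every coefficient we set."""
        _, variables = self._active.pop(id(con))
        inf = self.solver.infinity()
        con.SetBounds(-inf, inf)
        for var in variables:
            con.SetCoefficient(var, 0)
        self._free.append(con)
```

With the question's objects this would look like pool = ConstraintPool(solver), c2 = pool.add(5, 10, [(variable[2], 1)]), then pool.remove(c2); a later pool.add(...) reuses c2's row instead of creating a new constraint.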
That being said, python-mip was released very recently; it supports removing both variables and constraints and has C bindings.
Did you try to use the MPConstraint::Clear() method?
Declaration: https://github.com/google/or-tools/blob/9487eb85f4620f93abfed64899371be88d65c6ec/ortools/linear_solver/linear_solver.h#L865
Definition: https://github.com/google/or-tools/blob/9487eb85f4620f93abfed64899371be88d65c6ec/ortools/linear_solver/linear_solver.cc#L101
In the Python SWIG wrapper, MPConstraint is exported as the Constraint object.
src: https://github.com/google/or-tools/blob/9487eb85f4620f93abfed64899371be88d65c6ec/ortools/linear_solver/python/linear_solver.i#L180
But the Constraint::Clear() method does not seem to be exposed:
https://github.com/google/or-tools/blob/9487eb85f4620f93abfed64899371be88d65c6ec/ortools/linear_solver/python/linear_solver.i#L270
You can try to patch the SWIG file and recompile with make python && make install_python

MCMC convergence in hierarchical model with (large) time^2 term in pymc3

I have a hierarchical logit that has observations over time. Following Carter 2010, I have included a time, time^2, and time^3 term. The model mixes using Metropolis or NUTS before I add the time variables. HamiltonianMC fails. NUTS and Metropolis also work with time. But NUTS and Metropolis fail with time^2 and time^3, and they fail differently and in a puzzling way. However, unlike in other models that fail for more obvious model specification reasons, ADVI still gives an estimate (and the ELBO is not inf).
NUTS either stalls early (last time it made it to 60 iterations), or progresses too quickly and returns an empty traceplot with an error about varnames.
Metropolis errors out immediately with a dimension mismatch error. It looks like the one in this GitHub issue, but I'm using a Bernoulli outcome, not a negative binomial. The end of the error looks like: ValueError: Input dimension mis-match. (input[0].shape[0] = 1, input[4].shape[0] = 18)
I get an empty trace when I try HamiltonianMC. It returns the starting values with no mixing.
ADVI gives me a mean and a standard deviation.
I increased the ADVI iterations by a lot. It gave pretty close to the same starting points and NUTS still failed.
I double checked that the fix for the github issue is in place in the version of pymc3 I'm running. It is.
My intuition is that this has something to do with how huge the time^2 and time^3 variables get, since I'm looking over a large time-frame. Time^3 starts at 0 and goes to 64,000.
Here is what I've tried for sampling so far. Note that I use small sample sizes while testing, since it takes so long to run (if it finishes) and I'm just trying to get it to sample at all. Once I find one that works, I'll increase the number of iterations.
with my_model:
    mu,sds,elbo = pm.variational.advi(n=500000,learning_rate=1e-1)
    print(mu['mu_b'])
    step = pm.NUTS(scaling=my_model.dict_to_array(sds)**2,
                   is_cov=True)
    my_trace = pm.sample(500,
                         step=step,
                         start=mu,
                         tune=100)
I've also done the above with tune=1000.
I've also tried Metropolis and HamiltonianMC:
with my_model:
    my_trace = pm.sample(5000,step=pm.Metropolis())
with my_model:
    my_trace = pm.sample(5000,step=pm.HamiltonianMC())
Questions:
Is my intuition about the size and spread of the time variables reasonable?
Are there ways to sample square and cubed terms more effectively? I've searched, but can you perhaps point me to a resource on this so I can learn more about it?
What's up with Metropolis and the dimension mismatch error?
What's up with the empty trace plots for NUTS? Usually when it stalls, the trace up until the stall works.
Are there alternative ways to handle time that might be easier to sample?
I haven't posted a toy model, because it's hard to replicate without the data. I'll add a toy model once I replicate with simulated data. But the actual model is below:
import pymc3 as pm
import theano.tensor as tt
# data (a DataFrame) and n_groups come from the asker's own setup
with pm.Model() as my_model:
    mu_b = pm.Flat('mu_b')
    sig_b = pm.HalfCauchy('sig_b',beta=2.5)
    b_raw = pm.Normal('b_raw',mu=0,sd=1,shape=n_groups)
    b = pm.Deterministic('b',mu_b + sig_b*b_raw)
    t1 = pm.Normal('t1',mu=0,sd=100**2,shape=1)
    t2 = pm.Normal('t2',mu=0,sd=100**2,shape=1)
    t3 = pm.Normal('t3',mu=0,sd=100**2,shape=1)
    est =(b[data.group.values]* data.x.values) +\
        (t1*data.t.values)+\
        (t2*data.t2.values)+\
        (t3*data.t3.values)
    y = pm.Bernoulli('y', p=tt.nnet.sigmoid(est), observed = data.y)
BREAKTHROUGH 1: Metropolis error
Weird syntax issue. Theano seemed to be confused about a model with both constant and random effects. I created a constant in data equal to 0, data['c']=0 and used it as an index for the time, time^2 and time^3 effects, as follows:
est =(b[data.group.values]* data.x.values) +\
(t1[data.c.values]*data.t.values)+\
(t2[data.c.values]*data.t2.values)+\
(t3[data.c.values]*data.t3.values)
I don't think this is the whole issue, but it's a step in the right direction. I bet this is why my asymmetric specification didn't work, and if so, I suspect it may now sample better.
UPDATE: It sampled! Will now try some of the suggestions for making it easier on the sampler, including using the specification suggested here. But at least it's working!
Without having the dataset to play with it is hard to give a definite answer, but here is my best guess:
To me, it is a bit unexpected to hear about the third order polynomial in there. I haven't read the paper, so I can't really comment on it, but I think this might be the reason for your problems. Even very small values for t3 will have a huge influence on the predictor. To keep this reasonable, I'd try to change the parametrization a bit: First make sure that your predictor is centered (something like data['t'] = data['t'] - data['t'].mean() and after that define data.t2 and data.t3). Then try to set a more reasonable prior on t2 and t3. They should be pretty small, so maybe try something like
t1 = pm.Normal('t1',mu=0,sd=1,shape=1)
t2 = pm.Normal('t2',mu=0,sd=1,shape=1)
t2 = t2 / 100
t3 = pm.Normal('t3',mu=0,sd=1,shape=1)
t3 = t3 / 1000
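To see why centering helps, here is a small NumPy sketch of building the polynomial time terms. The time grid 0..40 is an assumption chosen because it matches the question's "Time^3 starts at 0 and goes to 64,000"; rescaling to [-1, 1] after centering is an extra step beyond the answer's suggestion, and the helper name is hypothetical. Raw powers span several orders of magnitude and are strongly correlated; after centering and scaling every term is O(1), and t and t^2 become uncorrelated on a symmetric grid (t and t^3 remain correlated).

```python
import numpy as np


def polynomial_time_terms(t, degree=3):
    """Return columns t_s, t_s**2, ..., t_s**degree with t centered and scaled."""
    t = np.asarray(t, dtype=float)
    t_c = t - t.mean()             # centering, as suggested above
    t_s = t_c / np.abs(t_c).max()  # scaling to [-1, 1] (an extra assumption)
    return np.column_stack([t_s ** k for k in range(1, degree + 1)])


t = np.arange(41)  # assumed time grid 0..40, so t**3 reaches 64,000
raw = np.column_stack([t, t ** 2, t ** 3])
scaled = polynomial_time_terms(t)

print(raw.max(axis=0))             # maxima: 40, 1600, 64000
print(np.abs(scaled).max(axis=0))  # all 1.0
print(np.corrcoef(raw[:, 0], raw[:, 1])[0, 1])        # strongly correlated
print(np.corrcoef(scaled[:, 0], scaled[:, 1])[0, 1])  # ~0 on a symmetric grid
```

These columns would take the place of data.t, data.t2 and data.t3 in the model; the coefficients t1..t3 are then on the scaled-time scale, which makes unit-scale priors like the ones above more natural.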
If you want to look at other models, you could try to model your predictor as a GaussianRandomWalk or a Gaussian Process.
Updating pymc3 to the latest release candidate should also help; the sampler has improved quite a bit.
Update: I just noticed you don't have an intercept term in your model. Unless there is a good reason for that, you probably want to add
intercept = pm.Flat('intercept')
est = (intercept
+ b[..] * data.x
+ ...)

How to set multiple starting values in constrOptim()

I am trying to estimate a few parameters using constrained maximum likelihood in R, more specifically with constrOptim() from the stats package. I am programming in Python and using R via rpy2.
In my model, I assume that the data follow a Beta distribution, so I created a simulated dataset using prespecified values for the parameters, and now I am trying to estimate these parameters to verify that my estimation program works correctly.
What I have observed is that my estimation is quite sensitive to the initial parameters. For example, I have 11 parameters to estimate (let's call them pam1..pam11) and their true values are:
pam1=0.2 pam2=0.3 pam3=0.4 pam4=0.7 pam5=0.55 pam6=0.45 pam7=0.1 pam8=0.01 pam9=0.01 pam10=45 pam11=45
In the constrOptim() I am setting the starting parameters as:
start_param=FloatVector((pam1,pam2,pam3,pam4,pam5,pam6,pam7,pam8,pam9,pam10,pam11))
where pam1..pam11 are the starting values. I have observed that the results change when I use different sets of starting values. For example, when I use the set
start_param=FloatVector((0.2,0.3,0.4,0.6,0.7,0.8,0.3,0.011,0.011,15,15))
and I obtain the following estimates
$par
[1] 0.20851065 0.30348571 0.43616932 0.73695654 0.58287221 0.45541506
[7] 0.11191879 0.02233908 0.01988878 46.57249043 45.48544918
$value
[1] -215.9711
$convergence
[1] 0
but when I am using another set as for example:
start_param=FloatVector((0.2,0.3,0.4,0.75,0.55,0.45,0.3,0.05,0.05,59,59))
the results change and it seems that I am losing convergence
$par
[1] 0.17218738 0.27165359 0.48458978 0.80295773 0.62618983 0.43254786
[7] 0.12426385 0.02991442 0.01853252 57.78269692 59.35376216
$value
[1] -146.9858
$convergence
[1] 1
My question is the following:
I have seen that in Stata, there is an option that searches for better starting values for the numerical optimization algorithm. I tried to set multiple starting values by setting a matrix but this did not work.
Is there an option in constrOptim that will allow me to do something like this?
Many thanks in advance.
For additional information, the specification I use for the constrOptim() is:
import rpy2.robjects as ro
res=statsr.constrOptim(start_param,Rmaxlikelihood,grad=ro.NULL,ui=ui,ci=ci,method="Nelder-Mead",control=ro.ListVector({"maxit": 3000, "trace": False}))
I came across a function in R which does exactly what I was looking for.
The package ‘Rsolnp’ has the function "gosolnp" which is described to perform Random Initialization and Multiple Restarts of the solnp solver.
It is quite efficient and the documentation provides examples on how to use it.
More: http://cran.r-project.org/web/packages/Rsolnp/Rsolnp.pdf
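The same random-restart idea can be sketched directly in Python, which may be convenient since the rest of the pipeline already runs there. This is a swapped-in technique, not gosolnp itself: it uses scipy.optimize.minimize as the local solver instead of R's constrOptim, draws starting points uniformly inside user-supplied bounds, runs a local optimization from each, and keeps the best run that reports success. The bounds, number of starts, and toy objective are assumptions; for a likelihood you would pass the negative log-likelihood. Linear constraints of the constrOptim form ui %*% theta - ci >= 0 are not handled by this wrapper itself, but for methods that support them (e.g. SLSQP or COBYLA) they can be forwarded through minimize_kwargs.

```python
import numpy as np
from scipy.optimize import minimize


def multistart_minimize(fun, lower, upper, n_starts=20, seed=0, **minimize_kwargs):
    """Run a local optimizer from n_starts random points; keep the best success."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    best = None
    for _ in range(n_starts):
        x0 = rng.uniform(lower, upper)  # random starting values inside the box
        res = minimize(fun, x0, **minimize_kwargs)
        if res.success and (best is None or res.fun < best.fun):
            best = res
    return best


# Toy objective with two local minima (near x = -1 and x = +1); the left one is global
def double_well(x):
    return (x[0] ** 2 - 1.0) ** 2 + 0.3 * x[0]


best = multistart_minimize(double_well, [-2.0], [2.0], method="Nelder-Mead")
print(best.x, best.fun)
```

A single start to the right of the hump near x = 0.075 would converge to the worse local minimum near x = +1; with several random starts, at least one lands in the left basin and wins the comparison.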
