Optuna's FAQ has a clear answer when it comes to dynamically adjusting the range of a parameter during a study: it poses no problem, since each sampler is defined individually.
But what about adding and/or removing parameters? Is Optuna able to handle such adjustments?
One thing I noticed when doing this is that in the results dataframe these parameters get NaN entries for the other trials. Would there be any benefit in being able to set these NaNs to the (default) values they had when not being sampled? Is the study still sound with all these unknown values?
Question was answered here:
Thanks for the question. Optuna internally supports two types of sampling: optuna.samplers.BaseSampler.sample_independent and optuna.samplers.BaseSampler.sample_relative.
The former, optuna.samplers.BaseSampler.sample_independent, samples each parameter independently and is not affected by the addition or removal of parameters. Added parameters are taken into account from the point at which they are added.
The latter, optuna.samplers.BaseSampler.sample_relative, samples parameters jointly, taking their correlations into account, and is affected by the addition or removal of parameters. Optuna's default relative search space is the product of the domains of the parameters that have existed from the beginning of the hyperparameter tuning until the present. Developers implementing their own samplers can provide a custom search-space calculation via optuna.samplers.BaseSampler.infer_relative_search_space. This may allow correlations to be considered for hyperparameters that have been added or removed, but it depends on the sampling algorithm, so there is no API for regular users to modify this behavior.
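For illustration, here is a minimal sketch of a study in which a parameter is added partway through; the parameter names and the switch at trial 20 are made up for this example:

```python
import optuna

def objective(trial):
    x = trial.suggest_float("x", -10, 10)
    y = 0.0
    # Hypothetical switch: "y" only exists from trial 20 onward. Independent
    # samplers simply start sampling it then; relative samplers only include
    # parameters present from the beginning unless infer_relative_search_space
    # is overridden.
    if trial.number >= 20:
        y = trial.suggest_float("y", -10, 10)
    return (x - 2) ** 2 + y ** 2

study = optuna.create_study()
study.optimize(objective, n_trials=40)

# Trials that never suggested "y" show NaN for it in the results dataframe.
print(study.trials_dataframe()[["number", "params_x", "params_y"]].tail())
```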
Related
I am currently simulating a structural optimization problem in which the gradients of the responses are extracted from Nastran and provided to the SLSQP optimizer in OpenMDAO. The number of constraints changes between iterations, because the design variables include both shape and sizing variables, so a new mesh is generated every time. A constraint component is defined in OpenMDAO that reads the response data exported from Nastran. The issue is in defining the shape of its output variable "f_const". The shape of this output needs to adjust to the shape of the available response array, since outputs['f_const'] = np.loadtxt("nastran_const.dat"), where nastran_const.dat is the file containing the response data extracted from Nastran. The shape of this data is not known at the beginning of the design iterations and keeps changing in subsequent iterations. So if some shape for f_const is defined at the start, it does not change later and gives an error because of the mismatch in shapes.
In the doc of openmdao, I found https://openmdao.org/newdocs/versions/latest/features/experimental/dyn_shapes.html?highlight=varying%20shape
It explains that the shape of an input/output variable can be made dynamic by linking it to a connected or local variable whose shape is already known. This is different from my case, because the shape of the stress array is not known before the computation starts. The shape of f_const has to be defined in setup, and I cannot figure out how to change it later. Please guide me in this regard.
You can't have arrays that change shape like that. The "dynamic" shape you found in the docs refers to setup-time variation. Once setup has finished, sizes are fixed. So we need a way for your arrays to be of a fixed size.
If you really must re-mesh every time (which I don't recommend) then there are two possible solutions I can think of:
Over-allocation
Constraint Aggregation
Option 1 -- Over Allocation
This topic is covered in detail in this related question, but briefly what you could do is allocate an array big enough that you always have enough space. Then you can use one entry of the array to record how many active entries are in it. Any non-active entries would be set to a default value that won't violate your constraints.
You'll have to be very careful with the way you define the derivatives. For active array entries, the derivatives come from NASTRAN. For inactive ones, you could set them to 0, but note that you are creating a discrete discontinuity when an entry switches to active. This may very well give the optimizer fits when it's trying to converge and the derivatives of active constraints keep flipping between 0 and nonzero values.
I really don't recommend this approach, but if you absolutely must have "variable size" arrays then over-allocation is your best bet.
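As a rough illustration of the over-allocation idea (not the original code; the file name, the maximum size, and the safe padding value are all assumptions), the component might look something like this:

```python
import numpy as np
import openmdao.api as om

MAX_CONSTRAINTS = 500   # assumed upper bound; pick one you will never exceed
SAFE_VALUE = -1.0       # assumed "inactive" value that cannot violate the constraint

class NastranConstraints(om.ExplicitComponent):
    """Hypothetical over-allocated constraint component (names are illustrative)."""

    def setup(self):
        self.add_input("x", shape=1)  # placeholder design input
        self.add_output("f_const", shape=MAX_CONSTRAINTS)
        self.add_output("n_active", shape=1)
        # Real derivatives for the active rows would come from NASTRAN
        # (via compute_partials); FD is used here just to keep the sketch short.
        self.declare_partials("f_const", "x", method="fd")

    def compute(self, inputs, outputs):
        data = np.atleast_1d(np.loadtxt("nastran_const.dat"))  # this iteration's responses
        n = data.size
        if n > MAX_CONSTRAINTS:
            raise ValueError("MAX_CONSTRAINTS is too small for this mesh")
        full = np.full(MAX_CONSTRAINTS, SAFE_VALUE)
        full[:n] = data          # active entries; the rest stay at the safe default
        outputs["f_const"] = full
        outputs["n_active"] = n
```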
Option 2 -- Constraint Aggregation
The key idea here is to use an aggregation function to collapse all the stress constraints into a single value. For structural problems this is most often done with a KS function. OpenMDAO has a KSComp in its standard library that you can use.
The key is that this component requires a constant-sized input, so over-allocation would again be used here. In this case, you shouldn't need to track the number of active values in the array, because you are passing the whole array to the aggregation function. KS functions behave like smooth max functions, so a bunch of padded zero entries shouldn't affect the result.
Your problem still has a discontinuous operation going on with the re-meshing and the noisy constraint array. The KS function should smooth some of that, but not all of it. I still think you'll have trouble converging, but it should work better than raw over-allocation.
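A minimal sketch of wiring an over-allocated stress array into OpenMDAO's KSComp (the ExecComp stands in for the real stress calculation, and the sizes are assumptions):

```python
import numpy as np
import openmdao.api as om

MAX_CONSTRAINTS = 500  # same assumed over-allocated size as above

prob = om.Problem()
# Stand-in for the component that fills the padded stress array.
prob.model.add_subsystem("stresses", om.ExecComp("g = 2.0 * x",
                                                 g=np.zeros((1, MAX_CONSTRAINTS)),
                                                 x=np.zeros((1, MAX_CONSTRAINTS))))
# KSComp collapses the whole padded array into a single smooth-max value;
# the aggregated output "ks.KS" is what you would constrain in the driver.
prob.model.add_subsystem("ks", om.KSComp(width=MAX_CONSTRAINTS))
prob.model.connect("stresses.g", "ks.g")

prob.setup()
prob.run_model()
print(prob.get_val("ks.KS"))
```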
Option 3 -- The "right" answer
Find a way to fix your grid, so it never changes. I know this is hard if you're using VSP to generate your discretizations, and letting NASTRAN re-grid things from there ... but it's not impossible at all.
OpenVSP has a set of geometry-query functions that can be used to back-fit fixed meshes into the parametric space of the geometry. If you do that, then you can regenerate the geometry in VSP and use the parametric space to move your fixed grids with it. This is how the pyGeo tool from the University of Michigan MDO Lab does it, and it works very well.
It's a modest amount of work (though a lot less if you use pyGeo directly), but I think it's well worth it. You'll get faster components and a much more stable optimization.
Scipy's differential evolution implementation (https://docs.scipy.org/doc/scipy-0.17.0/reference/generated/scipy.optimize.differential_evolution.html) uses either a Latin hypercube or a random method for population initialization. Latin hypercube sampling tries to maximize coverage of the available parameter space. ‘random’ initializes the population randomly. I am wondering if it would be possible to specify starting values for each parameter, instead of relying on these default algorithms.
For complex models (particularly those that are mathematically intractable and thus need to be simulated), I have observed that two independent runs of SciPy's differential evolution will likely give different results after X iterations of the algorithm (I usually set X = 100 to avoid running the algorithm for several days). I think this is because (1) the population initialization is not identical between two independent runs (due to the stochastic nature of the 'random' and 'latinhypercube' initialization methods) and (2) there is noise in the model predictions. I am thus thinking of running ~10 independent runs of DE with 100 iterations, picking the best-fitting parameter set across the 10 runs, and using this set as the starting values for a final run with more iterations (say 200). The problem is that I see no way to manually enter these starting values in SciPy's DE implementation. I would be very grateful if somebody in the community could help me.
This has indeed been possible since version 1.1 of SciPy (note that you're referring to the dated 0.17.0 documentation). In particular, recent versions let you specify an arbitrary array for init, instead of just 'latinhypercube' or 'random'. From the documentation, a possible value of init is:
array specifying the initial population. The array should have shape (M, len(x)), where len(x) is the number of parameters. init is clipped to bounds before use.
https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.differential_evolution.html
If, for some reason, you're forced to use the old version, it's still possible to obtain what you want by just using the underlying DifferentialEvolutionSolver instead. There, you can either monkey patch one of the initializing functions, or just run them and manually override the population attribute post-initialization.
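For example, here is a rough sketch of the two-stage idea described in the question, assuming a recent SciPy and using the built-in Rosenbrock function as a stand-in for the real model:

```python
import numpy as np
from scipy.optimize import differential_evolution, rosen

bounds = [(-5, 5), (-5, 5)]
rng = np.random.default_rng(0)

# Stage 1: a few short, independent runs (here 3 runs of 100 iterations each).
short_results = [differential_evolution(rosen, bounds, maxiter=100, seed=s, tol=0)
                 for s in range(3)]
best_x = min(short_results, key=lambda r: r.fun).x

# Stage 2: build an explicit initial population around the best candidate
# (15 members is an arbitrary choice) and pass it via `init`.
init_pop = best_x + rng.normal(scale=0.1, size=(15, len(bounds)))
init_pop[0] = best_x  # keep the best point itself in the population

result = differential_evolution(rosen, bounds, init=init_pop, maxiter=200)
print(result.x, result.fun)
```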
The documentation of minimize_blockmodel_dl says
See peixoto-hierarchical-2014 for details on the algorithm.
However, the paper explicitly states:
However, in order to perform model selection, one first needs to find optimal partitions of the network for given values of B, which is the subproblem which we consider in detail in this work. Therefore, in the remainder of this paper we will assume that the value of B is a fixed parameter, unless otherwise stated, but the reader should be aware that this value itself can be determined at a later step via model selection, as described, e.g., in Refs. [19,26].
Hence, how exactly do minimize_blockmodel_dl and its variants decide B? Ultimately, I'd be interested in plotting the implied likelihoods for different values of B, but would first like to see what the algorithm has built in by default: Bayesian model selection?
You are confusing two different papers. The quote you show does not come from the paper you mention. The cited paper:
https://journals.aps.org/prx/abstract/10.1103/PhysRevX.4.011047
answers exactly your question, i.e. how the most appropriate number of groups is determined: via minimum description length. You can also read a more recent introduction to Bayesian inference of the stochastic block model, which deals with this issue at length:
https://arxiv.org/abs/1705.10225
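To make this concrete, here is a rough sketch of how one could look at the description length as a function of B, using graph-tool's bundled football network as an example. Note that the keyword for pinning B differs between graph-tool releases; the multilevel_mcmc_args form below assumes a recent one.

```python
import graph_tool.all as gt

g = gt.collection.data["football"]

# Default behaviour: B is selected by minimizing the description length.
state = gt.minimize_blockmodel_dl(g)
print("chosen B:", state.get_B(), "description length:", state.entropy())

# Illustrative scan over fixed B. In older graph-tool releases B_min/B_max were
# passed directly to minimize_blockmodel_dl; newer releases take them inside
# multilevel_mcmc_args, as assumed here.
for B in range(1, 15):
    s = gt.minimize_blockmodel_dl(g, multilevel_mcmc_args=dict(B_min=B, B_max=B))
    print(B, s.entropy())  # entropy() returns the description length for that fit
```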
I tried to use the option of putting monotonic constraints in XGBoost (without using the Scikit-Learn wrapper: see my previous post here),
but I now would like to check that this was correctly applied.
As this type of model is a kind of black box, I usually try to look at some KPIs related to the overall accuracy of the model (logloss, RMSE, etc.), but not directly at the effect of each feature.
Is there an easy way (or alternatively a complex one) to do so, and thus check that monotonicity was effectively applied?
At this stage, what comes to my mind is: 1/ take one observation from the test set, 2/ duplicate it, say 10 or 100 times, 3/ manually vary one of the features for which monotonicity should apply, 4/ make a graph of the predicted values (a rough code sketch of this is appended below). That is not straightforward (especially considering that I want to check it on ~25 features...) nor very robust (I do not change the values of the other features that are correlated with the one I am looking at).
Any suggestion is welcome!
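For what it's worth, here is a rough sketch of the procedure described above; the booster bst, the test set X_test, and the feature names are all assumed, not taken from the question:

```python
import numpy as np
import pandas as pd
import xgboost as xgb

def check_monotone(bst, X_test, feature, n_points=100, increasing=True):
    """Duplicate one observation, sweep one feature over its observed range,
    and check that the predictions move in a single direction."""
    row = X_test.iloc[[0]]                                     # 1/ one observation
    rows = pd.concat([row] * n_points, ignore_index=True)      # 2/ duplicate it
    rows[feature] = np.linspace(X_test[feature].min(),         # 3/ vary one feature
                                X_test[feature].max(), n_points)
    preds = bst.predict(xgb.DMatrix(rows))                     # 4/ predict and compare
    diffs = np.diff(preds)
    return np.all(diffs >= 0) if increasing else np.all(diffs <= 0)

# Example usage (assuming bst, X_test and the list of constrained features exist):
# for feat in constrained_features:
#     print(feat, check_monotone(bst, X_test, feat, increasing=True))
```

Repeating the check from several base observations, not just the first row, would make it somewhat more robust to interactions with correlated features.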
During model selection, the likelihood-ratio test or an analysis using the BIC (Bayesian Information Criterion) is often necessary. While I could definitely do it by hand, I was wondering: are there any SciPy functions designed to do this?
I am asking because I think there should be a way to do this type of analysis, or at least a function to get the likelihood value.
PS: I am not thinking about fitting a single distribution; instead, I am thinking about looking at some 1D data that changes with time (i.e. the model prediction changes with time).
Any help would be appreciated!
Example for this question:
I have some data that looks like this.
And now, I have two models - one with four parameters, another model nested in it with two parameters (fixing the other two).
I want to perform a BIC / likelihood-ratio test to see whether the two free parameters make a significant difference.
In statsmodels you can perform likelihood ratio and Wald tests. Different information criteria are also available for all of the models. There are a few other model selection techniques, but I'm going to need to know a little bit more about what you're doing to give specific answers. Meanwhile, our documentation should help http://statsmodels.sourceforge.net/devel/
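For instance, here is a small sketch with OLS on made-up time-varying data; the cubic "full" model and the linear "restricted" model are illustrative, not from the question:

```python
import numpy as np
import statsmodels.api as sm

# Toy data standing in for the time-varying 1-D measurements (illustrative only).
rng = np.random.default_rng(0)
t = np.linspace(0, 10, 200)
y = 1.0 + 0.5 * t + 0.05 * t**2 + rng.normal(scale=0.3, size=t.size)

# Full model: intercept, t, t**2, t**3 (four parameters).
X_full = sm.add_constant(np.column_stack([t, t**2, t**3]))
# Restricted (nested) model: intercept and t only, i.e. the last two fixed at zero.
X_restr = sm.add_constant(t)

full = sm.OLS(y, X_full).fit()
restr = sm.OLS(y, X_restr).fit()

# Likelihood-ratio test of the restricted model against the full one.
lr_stat, p_value, df_diff = full.compare_lr_test(restr)
print("LR stat:", lr_stat, "p-value:", p_value, "df:", df_diff)

# Information criteria and log-likelihoods are attributes of the fitted results.
print("BIC full:", full.bic, "BIC restricted:", restr.bic)
print("log-likelihood full:", full.llf, "restricted:", restr.llf)
```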