Running multiple Ray Tune jobs in parallel using a search algorithm

I want to queue 200+ tuning jobs on my Ray cluster. Each job needs to be guided by a search algorithm, as my actual objective function has 40+ parameters.
I can do this for a single job like this:
import ray
from ray import tune
from ray.tune import Tuner, TuneConfig
from ray.tune.search.optuna import OptunaSearch

ray.init()

def objective(config):
    ground_truth = [1, 2, 3, 4]
    yhat = [i*config['factor'] + config['constant'] for i in range(4)]
    abs_err = [abs(gt - yh) for gt, yh in zip(ground_truth, yhat)]
    mae = sum(abs_err)/len(abs_err)
    # The metric is named mean_accuracy but holds the mean absolute error.
    tune.report(mean_accuracy=mae)

config = {
    'factor': tune.quniform(0, 3, 1),
    'constant': tune.quniform(0, 3, 1)
}
algo = OptunaSearch()
tuner = tune.Tuner(
    objective,
    tune_config=TuneConfig(
        metric="mean_accuracy",
        mode="min",
        search_alg=algo,
        num_samples=100
    ),
    param_space=config
)
results = tuner.fit()
This works and gives the desired result for 1 of the 200 jobs.
Now I want to queue up to 200 jobs from a single run of a single script:
As far as I understood the documentation this is how that should work:
import ray
from ray import tune

ray.init()

def objective(config):
    ground_truth = [1, 2, 3, 4]
    yhat = [i*config['factor'] + config['constant'] for i in range(4)]
    abs_err = [abs(gt - yh) for gt, yh in zip(ground_truth, yhat)]
    mae = sum(abs_err)/len(abs_err)
    tune.report(mean_accuracy=mae)

config = {
    'factor': tune.quniform(0, 3, 1),
    'constant': tune.quniform(0, 3, 1)
}
experiments = []
for i in range(3):
    experiment_spec = tune.Experiment(
        name=f'{i}',
        run=objective,
        stop={"mean_accuracy": 0},
        config=config,
        num_samples=10
    )
    experiments.append(experiment_spec)
out = tune.run_experiments(experiments)
When I run this I get the message: Running with multiple concurrent experiments. All experiments will be using the same SearchAlgorithm.
I need to be able to specify the search algorithm, but I don't understand how. Additionally, these experiments appear to be part of one large optimization: out is a list of 30 objective objects. The parameter values chosen are drawn from a uniform distribution, without the quantization (the q). However, all 30 values fall in the specified range.
I must've misunderstood the purpose of run_experiments, please help.
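To make the desired structure concrete, here is a minimal sketch that does not use Ray at all: each job owns its own searcher instance, jobs run side by side, and results stay separate per job. QuantizedRandomSearcher and run_job are hypothetical names invented for this illustration, and the random sampler is only a stand-in for a real search algorithm such as OptunaSearch.

```python
import random
from concurrent.futures import ThreadPoolExecutor

GROUND_TRUTH = [1, 2, 3, 4]

def mae(config):
    """Mean absolute error of the linear model from the question."""
    yhat = [i * config["factor"] + config["constant"] for i in range(4)]
    return sum(abs(gt - yh) for gt, yh in zip(GROUND_TRUTH, yhat)) / len(GROUND_TRUTH)

class QuantizedRandomSearcher:
    """Hypothetical stand-in for a search algorithm; one instance per job."""

    def __init__(self, seed):
        self._rng = random.Random(seed)

    def suggest(self):
        # Mirrors quniform(0, 3, 1): only the values 0, 1, 2, 3 are proposed.
        return {"factor": self._rng.randint(0, 3),
                "constant": self._rng.randint(0, 3)}

def run_job(job_id, num_samples=10):
    # Each job builds its own searcher, so no search state is shared.
    searcher = QuantizedRandomSearcher(seed=job_id)
    trials = [searcher.suggest() for _ in range(num_samples)]
    best = min(trials, key=mae)
    return job_id, best, mae(best)

# Three independent jobs, analogous to the three experiments in the question.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(run_job, range(3)))

for job_id, best, score in results:
    print(job_id, best, score)
```

The point of the sketch is the ownership model: ten samples belong to each job rather than thirty samples belonging to one shared optimization.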

Related

How to use nevergrad to find the minimum of a cost function over the field of integers?

As part of our Poolkeh paper, we decided to use nevergrad. Sadly, it doesn't always return the same result, nor the optimal one.
We tried DiscreteOnePlusOne as an optimizer, but it didn't find the optimal results. OnePlusOne worked OK, but it didn't give the best solution and it needed some hints like this one:
if s1 < s2*(1+r0):
    return np.inf
We explored the case of pooling COVID-19 tests with two steps, here is the complete code:
!pip install nevergrad
import numpy as np

def optimal(r0: float, s1: int, s2: int):
    r0 = r0/100  # r0 is given in percent
    if s1 < s2*(1+r0):
        return np.inf  # np.Inf was removed in NumPy 2.0
    p1 = 1-np.power(1-r0, s1)
    r1 = r0/p1
    p2 = 1-np.power(1-r1, s2)
    return 1/s1 + p1/s2 + p1*p2

import nevergrad as ng

def findBestStategy(r0: float):
    '''
    r0 is in %
    '''
    parametrization = ng.p.Instrumentation(
        r0=r0,
        s1=ng.p.Scalar(lower=1, upper=100).set_integer_casting(),
        s2=ng.p.Scalar(lower=1, upper=100).set_integer_casting(),
    )
    optimizer = ng.optimizers.OnePlusOne(parametrization=parametrization, budget=2000, num_workers=1)
    recommendation = optimizer.minimize(optimal)
    return recommendation.kwargs

findBestStategy(1)
{'r0': 1, 's1': 23, 's2': 5}
This is not the optimum, but it is close:
optimal(1, 23,5)
0.13013924406458133
optimal(1, 24,5)
0.13007783167425113
How can we make nevergrad more robust?
Which optimizer should we use?
Is there a way to run nevergrad multiple times with different "initial conditions" and take the optimal results among all multiple tries?
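Since s1 and s2 are both integers between 1 and 100, the whole search space has only 10,000 points, so an exhaustive scan is a cheap way to get a reference answer to compare any optimizer against. The sketch below reimplements optimal with the standard library only; best_strategy_exhaustive is a hypothetical helper name, and this is a baseline check rather than a replacement for nevergrad on larger problems.

```python
import math

def optimal(r0, s1, s2):
    """Cost function from the question; r0 is given in percent."""
    r0 = r0 / 100
    if s1 < s2 * (1 + r0):
        return math.inf
    p1 = 1 - (1 - r0) ** s1
    r1 = r0 / p1
    p2 = 1 - (1 - r1) ** s2
    return 1 / s1 + p1 / s2 + p1 * p2

def best_strategy_exhaustive(r0, upper=100):
    """Evaluate every integer pair (s1, s2) in [1, upper] and keep the cheapest."""
    candidates = (
        (optimal(r0, s1, s2), s1, s2)
        for s1 in range(1, upper + 1)
        for s2 in range(1, upper + 1)
    )
    cost, s1, s2 = min(candidates)
    return {"r0": r0, "s1": s1, "s2": s2, "cost": cost}

result = best_strategy_exhaustive(1)
print(result)
```

The same idea extends to the "multiple tries" question: run the optimizer several times, evaluate optimal on each recommendation, and keep the recommendation with the lowest cost.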

Tensorflow Probability Error: OperatorNotAllowedInGraphError: iterating over `tf.Tensor` is not allowed

I am trying to estimate a model in tensorflow using NUTS by providing it a likelihood function. I have checked the likelihood function is returning reasonable values. I am following the setup here for setting up NUTS:
https://rlhick.people.wm.edu/posts/custom-likes-tensorflow.html
and some of the examples here for setting up priors, etc.:
https://github.com/tensorflow/probability/blob/master/tensorflow_probability/examples/jupyter_notebooks/Multilevel_Modeling_Primer.ipynb
My code is in a colab notebook here:
https://drive.google.com/file/d/1L9JQPLO57g3OhxaRCB29do2m808ZUeex/view?usp=sharing
I get the error: OperatorNotAllowedInGraphError: iterating over `tf.Tensor` is not allowed: AutoGraph did not convert this function. Try decorating it directly with @tf.function. This is my first time using TensorFlow and I am quite lost interpreting this error. It would also be ideal if I could pass the starting parameter values as a single input (the example I am working from doesn't do it, but I assume it is possible).
Update
It looks like I had to change the position of the @tf.function decorator. The sampler now runs, but it gives me the same value for all samples for each of the parameters. Is it a requirement that I pass a joint distribution through the log_prob() function? I am clearly missing something. I can run the likelihood through bfgs optimization and get reasonable results (I've estimated the model via maximum likelihood with fixed parameters in other software). It looks like I need to define the function to return a joint distribution and call log_prob(). I can do this if I set it up as a logistic regression (logit choice model is logistically distributed in differences). However, I lose the standard closed form.
My function is as follows:
@tf.function
def mmnl_log_prob(init_mu_b_time, init_sigma_b_time, init_a_car, init_a_train, init_b_cost, init_scale):
    # Create priors for hyperparameters
    mu_b_time = tfd.Sample(tfd.Normal(loc=init_mu_b_time, scale=init_scale), sample_shape=1).sample()
    # HalfCauchy distributions are too wide for logit discrete choice
    sigma_b_time = tfd.Sample(tfd.Normal(loc=init_sigma_b_time, scale=init_scale), sample_shape=1).sample()
    # Create priors for parameters
    a_car = tfd.Sample(tfd.Normal(loc=init_a_car, scale=init_scale), sample_shape=1).sample()
    a_train = tfd.Sample(tfd.Normal(loc=init_a_train, scale=init_scale), sample_shape=1).sample()
    # a_sm = tfd.Sample(tfd.Normal(loc=init_a_sm, scale=init_scale),sample_shape=1).sample()
    b_cost = tfd.Sample(tfd.Normal(loc=init_b_cost, scale=init_scale), sample_shape=1).sample()
    # Define a heterogeneous random parameter model with MultivariateNormalDiag()
    # Use MultivariateNormalDiagPlusLowRank() to define nests, etc.
    b_time = tfd.Sample(tfd.MultivariateNormalDiag(  # b_time
        loc=mu_b_time,
        scale_diag=sigma_b_time), sample_shape=num_idx).sample()
    # Definition of the utility functions
    V1 = a_train + tfm.multiply(b_time, TRAIN_TT_SCALED) + b_cost * TRAIN_COST_SCALED
    V2 = tfm.multiply(b_time, SM_TT_SCALED) + b_cost * SM_COST_SCALED
    V3 = a_car + tfm.multiply(b_time, CAR_TT_SCALED) + b_cost * CAR_CO_SCALED
    print("Vs", V1, V2, V3)
    # Definition of loglikelihood
    eV1 = tfm.multiply(tfm.exp(V1), TRAIN_AV_SP)
    eV2 = tfm.multiply(tfm.exp(V2), SM_AV_SP)
    eV3 = tfm.multiply(tfm.exp(V3), CAR_AV_SP)
    eVD = eV1 + eV2 + eV3
    print("eVs", eV1, eV2, eV3, eVD)
    l1 = tfm.multiply(tfm.truediv(eV1, eVD), tf.cast(tfm.equal(CHOICE, 1), tf.float32))
    l2 = tfm.multiply(tfm.truediv(eV2, eVD), tf.cast(tfm.equal(CHOICE, 2), tf.float32))
    l3 = tfm.multiply(tfm.truediv(eV3, eVD), tf.cast(tfm.equal(CHOICE, 3), tf.float32))
    ll = tfm.reduce_sum(tfm.log(l1+l2+l3))
    print("ll", ll)
    return ll
The function is called as follows:
nuts_samples = 1000
nuts_burnin = 500
chains = 4
## Initial step size
init_step_size = .3
init = [0., 0., 0., 0., 0., .5]
##
## NUTS (using inner step size averaging step)
##
@tf.function
def nuts_sampler(init):
    nuts_kernel = tfp.mcmc.NoUTurnSampler(
        target_log_prob_fn=mmnl_log_prob,
        step_size=init_step_size,
    )
    adapt_nuts_kernel = tfp.mcmc.DualAveragingStepSizeAdaptation(
        inner_kernel=nuts_kernel,
        num_adaptation_steps=nuts_burnin,
        step_size_getter_fn=lambda pkr: pkr.step_size,
        log_accept_prob_getter_fn=lambda pkr: pkr.log_accept_ratio,
        step_size_setter_fn=lambda pkr, new_step_size: pkr._replace(step_size=new_step_size)
    )
    samples_nuts_, stats_nuts_ = tfp.mcmc.sample_chain(
        num_results=nuts_samples,
        current_state=init,
        kernel=adapt_nuts_kernel,
        num_burnin_steps=100,
        parallel_iterations=5)
    return samples_nuts_, stats_nuts_

samples_nuts, stats_nuts = nuts_sampler(init)

I have an answer to my question! It is simply a matter of different nomenclature. I need to define my model as a softmax function, which I knew was what I would call a "logit model", but it just wasn't clicking for me. The following blog post gave me the epiphany:
http://khakieconomics.github.io/2019/03/17/Putting-it-all-together.html
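To illustrate the "logit model is a softmax" point, here is a minimal standard-library sketch of a multinomial logit log-likelihood written as a log-softmax over the available alternatives. It is not the TensorFlow Probability code from the linked post. choice_log_likelihood is a hypothetical name, choices are 0-based indices (the question uses 1, 2, 3), and the chosen alternative is assumed to be available.

```python
import math

def choice_log_likelihood(utilities, availability, choices):
    """Sum of log choice probabilities under a softmax (multinomial logit).

    utilities[n][j]    utility of alternative j for observation n
    availability[n][j] 1 if alternative j is available to observation n, else 0
    choices[n]         0-based index of the alternative chosen by observation n
    """
    total = 0.0
    for v_row, av_row, chosen in zip(utilities, availability, choices):
        available = [v for v, av in zip(v_row, av_row) if av]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        v_max = max(available)
        log_denominator = v_max + math.log(sum(math.exp(v - v_max) for v in available))
        total += v_row[chosen] - log_denominator
    return total

ll = choice_log_likelihood(
    utilities=[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
    availability=[[1, 1, 1], [1, 1, 0]],
    choices=[0, 1],
)
print(ll)
```

With equal utilities the first observation contributes log(1/3) and the second, which has only two alternatives available, contributes log(1/2).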

Multivariate linear mixed effects model in Python

I am playing around with this code which is for Univariate linear mixed effects modelling. The data set denotes:
students as s
instructors as d
departments as dept
service as service
In the syntax of R's lme4 package (Bates et al., 2015), the model implemented can be summarized as:
y ~ 1 + (1|students) + (1|instructor) + (1|dept) + service
where 1 denotes an intercept term, (1|x) denotes a random effect for x, and x denotes a fixed effect.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import edward as ed
import pandas as pd
import tensorflow as tf
import matplotlib.pyplot as plt
from edward.models import Normal
from observations import insteval
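# Assumes `data` and `metadata` were loaded beforehand (for example with the imported insteval helper).
data = pd.DataFrame(data, columns=metadata['columns'])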
train = data.sample(frac=0.8)
test = data.drop(train.index)
train.head()
s_train = train['s'].values
d_train = train['dcodes'].values
dept_train = train['deptcodes'].values
y_train = train['y'].values
service_train = train['service'].values
n_obs_train = train.shape[0]
s_test = test['s'].values
d_test = test['dcodes'].values
dept_test = test['deptcodes'].values
y_test = test['y'].values
service_test = test['service'].values
n_obs_test = test.shape[0]
n_s = max(s_train) + 1 # number of students
n_d = max(d_train) + 1 # number of instructors
n_dept = max(dept_train) + 1 # number of departments
n_obs = train.shape[0] # number of observations
# Set up placeholders for the data inputs.
s_ph = tf.placeholder(tf.int32, [None])
d_ph = tf.placeholder(tf.int32, [None])
dept_ph = tf.placeholder(tf.int32, [None])
service_ph = tf.placeholder(tf.float32, [None])
# Set up fixed effects.
mu = tf.get_variable("mu", [])
service = tf.get_variable("service", [])
sigma_s = tf.sqrt(tf.exp(tf.get_variable("sigma_s", [])))
sigma_d = tf.sqrt(tf.exp(tf.get_variable("sigma_d", [])))
sigma_dept = tf.sqrt(tf.exp(tf.get_variable("sigma_dept", [])))
# Set up random effects.
eta_s = Normal(loc=tf.zeros(n_s), scale=sigma_s * tf.ones(n_s))
eta_d = Normal(loc=tf.zeros(n_d), scale=sigma_d * tf.ones(n_d))
eta_dept = Normal(loc=tf.zeros(n_dept), scale=sigma_dept * tf.ones(n_dept))
yhat = (tf.gather(eta_s, s_ph) +
        tf.gather(eta_d, d_ph) +
        tf.gather(eta_dept, dept_ph) +
        mu + service * service_ph)
y = Normal(loc=yhat, scale=tf.ones(n_obs))
#Inference
q_eta_s = Normal(
    loc=tf.get_variable("q_eta_s/loc", [n_s]),
    scale=tf.nn.softplus(tf.get_variable("q_eta_s/scale", [n_s])))
q_eta_d = Normal(
    loc=tf.get_variable("q_eta_d/loc", [n_d]),
    scale=tf.nn.softplus(tf.get_variable("q_eta_d/scale", [n_d])))
q_eta_dept = Normal(
    loc=tf.get_variable("q_eta_dept/loc", [n_dept]),
    scale=tf.nn.softplus(tf.get_variable("q_eta_dept/scale", [n_dept])))
latent_vars = {
    eta_s: q_eta_s,
    eta_d: q_eta_d,
    eta_dept: q_eta_dept}
data = {
    y: y_train,
    s_ph: s_train,
    d_ph: d_train,
    dept_ph: dept_train,
    service_ph: service_train}
inference = ed.KLqp(latent_vars, data)
This works fine in the univariate case for Linear mixed effects modelling. I am trying to extend this approach to the multivariate case. Any ideas are more than welcome.

There are a number of ways to conduct linear mixed effects models in Python. It looks like you've adapted the Tensorflow approach but if that is not a hard requirement then there are several other potentially more convenient options.
You can use the Statsmodels implementation of LMER which is conveniently contained in Python but the syntax is a bit different from traditional formulaic expressions from R's LMER. It looks like you are using python to split your data to training and test sets so you can also write a loop to call the
You can also install R and rpy2 on your local machine and call the LMER packages from your Python environment. This lets you keep your familiarity with R while doing everything else in Python. All you have to do is use the rmagic %%R (or %R for inline) in your cell block in Jupyter Notebooks to pass variables and models between Python and R. This is useful if you are passing the train/test data you split in Python to R to run lmer and retrieve the parameters back in a loop.
Lastly, another option is to use Pymer4 which is a wrapper for rpy2 allowing you to directly call LMER in R but without having to deal with rmagic.
I wrote a tutorial on how to use LMER with each of these methods which also works on Cloud setups like Google Colab. These methods will all allow you to run the multivariate approach like you asked for using the LMER in R but from a Python environment.

Error in Threading SARIMAX model

I am using the threading library for the first time in order to speed up the training time of my SARIMAX model, but the code keeps failing with the following error:
Bad direction in the line search; refresh the lbfgs memory and restart the iteration.
This problem is unconstrained.
This problem is unconstrained.
This problem is unconstrained.
Following is my code:
import numpy as np
import pandas as pd
from statsmodels.tsa.arima_model import ARIMA
import statsmodels.tsa.api as smt
from threading import Thread

def process_id(ndata):
    # Hold out the last 7 observations as the test set.
    train = ndata[0:-7]
    test = ndata[len(train):]
    try:
        model = smt.SARIMAX(train.asfreq(freq='1d'), exog=None, order=(0, 1, 1), seasonal_order=(0, 1, 1, 7)).fit()
        pred = model.get_forecast(len(test))
        fcst = pred.predicted_mean
        fcst.index = test.index
        mapelist = []
        for i in range(len(fcst)):
            mapelist.insert(i, (np.absolute(test[i] - fcst[i])) / test[i])
        mape = np.mean(mapelist) * 100
        print(mape)
    except Exception:
        mape = 0
    return mape

def process_range(ndata, store=None):
    if store is None:
        store = {}
    for id in ndata:
        store[id] = process_id(ndata[id])
    return store

def threaded_process_range(nthreads, ndata):
    store = {}
    threads = []
    # create the threads
    k = 0
    tk = ndata.columns
    # Integer division keeps the slice bounds ints on Python 3;
    # any remainder columns are left unprocessed, as in the original.
    chunk = len(tk) // nthreads
    for i in range(nthreads):
        dk = tk[k:k + chunk]
        k = k + chunk
        t = Thread(target=process_range, args=(ndata[dk], store))
        threads.append(t)
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return store

outdata = threaded_process_range(4, ndata)
Few things I would like to mention:
Data is daily stock time series in a dataframe
Threading works for ARIMA model
SARIMAX model works when done in a for loop
Any insights would be highly appreciated thanks!

I got the same error with lbfgs. I'm not sure why lbfgs fails to do gradient evaluations, but I tried changing the optimizer. You can try this too by choosing any of these optimizers:
'newton' for Newton-Raphson
'nm' for Nelder-Mead
'bfgs' for Broyden-Fletcher-Goldfarb-Shanno (BFGS)
'lbfgs' for limited-memory BFGS with optional box constraints
'powell' for modified Powell's method
'cg' for conjugate gradient
'ncg' for Newton-conjugate gradient
'basinhopping' for global basin-hopping solver
Change this line in your code:
model = smt.SARIMAX(train.asfreq(freq='1d'), exog=None, order=(0, 1, 1), seasonal_order=(0, 1, 1, 7)).fit(method='cg')
It's an old question but still I'm answering it in case someone in future faces the same problem.
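Separately from the optimizer choice, the column-splitting step in the question's threaded_process_range can be sketched with the standard library so that every key is assigned to some worker, including any remainder. split_evenly, process_chunk and threaded_map are hypothetical helper names, and worker stands in for process_id. This shows the partitioning pattern only and is not a claim that threads will speed up SARIMAX fitting.

```python
from concurrent.futures import ThreadPoolExecutor

def split_evenly(items, n_chunks):
    """Split items into n_chunks contiguous chunks whose sizes differ by at most 1."""
    items = list(items)
    base, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks

def process_chunk(chunk, worker):
    """Apply worker to every key in one chunk and collect the results."""
    return {key: worker(key) for key in chunk}

def threaded_map(keys, worker, n_threads=4):
    """Run worker over all keys, one chunk per thread, and merge the results."""
    store = {}
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        chunks = split_evenly(keys, n_threads)
        for partial_result in pool.map(lambda c: process_chunk(c, worker), chunks):
            store.update(partial_result)
    return store

print(split_evenly(range(10), 4))
print(threaded_map(["a", "bb", "ccc"], len, n_threads=2))
```

Each thread returns its own dictionary and the merge happens in the calling thread, which avoids having several threads write to one shared store.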

How to pass objects into function which is optimized with hyperopt?

I'm new to the hyperopt package.
I want to optimize my LDA model, which is implemented in gensim. The LDA model is optimized to maximize the silhouette score over the training data.
My question is: how do I pass training data (a numpy.ndarray) to the objective function that hyperopt calls?
I looked at tutorials and some example code. They set the training data as a global variable, but in my situation that is difficult to do.
I wrote the following code to optimize LDA with hyperopt. I'm stuck on how to pass the training data to gensim_objective_function, because gensim_lda_optimaze will itself be called from within a larger system.
How can I do this?
import numpy
from numpy import ndarray
from scipy.sparse import csr_matrix
from hyperopt import fmin, hp, tpe, Trials

# I want to pass training data to this function!
# gensim_lda_tuning_training_corpus, gensim_lda_tuning_num_topic, gensim_lda_tuning_word2id is what I wanna pass
def gensim_objective_function(arg_dict):
    from .gensim_lda import evaluate_clustering
    from .gensim_lda import call_lda_single
    from .gensim_lda import get_topics_ids
    alpha = arg_dict['alpha']
    eta = arg_dict['eta']
    iteration = arg_dict['iteration']
    gamma_threshold = arg_dict['gamma_threshold']
    minimum_probability = arg_dict['minimum_probability']
    passes = arg_dict['passes']
    # train LDA model
    lda_model, gensim_corpus = call_lda_single(matrix=gensim_lda_tuning_training_corpus,
                                               num_topics=gensim_lda_tuning_num_topic,
                                               word2id_dict=gensim_lda_tuning_word2id,
                                               alpha=alpha, eta=eta,
                                               iteration=iteration,
                                               gamma_threshold=gamma_threshold,
                                               minimum_probability=minimum_probability,
                                               passes=passes)
    topic_ids = get_topics_ids(trained_lda_model=lda_model, gensim_corpus=gensim_corpus)
    labels = [t[0] for t in topic_ids]
    # get silhouette score with extracted label
    evaluation_score = evaluate_clustering(feature_matrix=gensim_lda_tuning_training_corpus, labels=numpy.array(labels))
    # hyperopt minimizes, so negate the score to maximize it
    return -1 * evaluation_score

def gensim_lda_optimaze(feature_matrix, num_topics, word2id_dict):
    assert isinstance(feature_matrix, (ndarray, csr_matrix))
    assert isinstance(num_topics, int)
    assert isinstance(word2id_dict, dict)
    parameter_space = {
        'alpha': hp.loguniform("alpha", numpy.log(0.1), numpy.log(1)),
        'eta': hp.loguniform("eta", numpy.log(0.1), numpy.log(1)),
        'iteration': 100,
        'gamma_threshold': 0.001,
        'minimum_probability': 0.01,
        'passes': 10
    }
    trials = Trials()
    best = fmin(
        gensim_objective_function,
        parameter_space,
        algo=tpe.suggest,
        max_evals=100,
        trials=trials
    )
    return best

You can always use functools.partial in Python.
from functools import partial
def foo(params, data):
    return params, data
goo = partial(foo, data=[1, 2, 3])
print(goo('ala'))
gives
('ala', [1, 2, 3])
In other words, you make a proxy function that has the data bound as a fixed argument, and you ask hyperopt to optimize this new function with the data already set.
Thus, in your case, you change gensim_objective_function to accept all your parameters:
def RAW_gensim_objective_function(arg_dict, gensim_lda_tuning_training_corpus,
                                  gensim_lda_tuning_num_topic,
                                  gensim_lda_tuning_word2id):
    ...  # same body as in the question
and create the actual function to optimize by passing your data in a different part of the code:
gensim_objective_function = partial(RAW_gensim_objective_function,
                                    gensim_lda_tuning_training_corpus=YOUR_CORPUS,
                                    gensim_lda_tuning_num_topic=YOUR_NUM_TOPICS,
                                    gensim_lda_tuning_word2id=YOUR_IDs)
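A closure achieves the same binding as partial and can read more naturally when the objective needs several pieces of data. The sketch below uses a toy squared-error loss in place of the LDA training; make_objective is a hypothetical factory name, and the returned function takes a single arg_dict, which is the call shape the question's objective already has.

```python
def make_objective(training_data):
    """Return an objective that remembers training_data without using a global."""
    def objective(arg_dict):
        alpha = arg_dict["alpha"]
        # Toy loss standing in for "train the model and score it".
        return sum((alpha - x) ** 2 for x in training_data)
    return objective

objective = make_objective([1.0, 2.0, 3.0])
print(objective({"alpha": 2.0}))  # prints 2.0
```

Inside gensim_lda_optimaze, the factory would be called with feature_matrix and the other inputs, and the returned function passed to the optimizer in place of the global-dependent one.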
