According to keras.io:
Once the model is created, you can config the model with losses and
metrics with model.compile().
But this explanation does not say much about what exactly compiling a model does. The API documentation is equally terse: "Configures the model for training."
Personally, I wouldn't call it compile, because what it does has nothing to do with compilation in the computer-science sense, and it is confusing to have to think about machine learning and compilation at the same time.
It's just a method that does configuration: it stores the arguments you pass it (optimizer, loss function, metrics, eager execution). You can call it multiple times; each call simply overwrites the settings from the previous one.
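For instance (a minimal sketch with an arbitrary toy model, just to show that recompiling overwrites the previous configuration):
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])

# First configuration.
model.compile(optimizer='sgd', loss='mse')

# Calling compile again simply replaces the previous settings.
model.compile(optimizer='adam', loss='mae', metrics=['mae'], run_eagerly=True)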
My suggestion to the TensorFlow developers would be to rename it to configure in the short term, and perhaps in the future (not that important) move to having one setter (or using the factory/builder pattern) for each configuration argument.
Here's the code for it (from the TensorFlow Keras source):
base_layer.keras_api_gauge.get_cell('compile').set(True)
with self.distribute_strategy.scope():
  if 'experimental_steps_per_execution' in kwargs:
    logging.warn('The argument `steps_per_execution` is no longer '
                 'experimental. Pass `steps_per_execution` instead of '
                 '`experimental_steps_per_execution`.')
    if not steps_per_execution:
      steps_per_execution = kwargs.pop('experimental_steps_per_execution')

  self._validate_compile(optimizer, metrics, **kwargs)
  self._run_eagerly = run_eagerly

  self.optimizer = self._get_optimizer(optimizer)
  self.compiled_loss = compile_utils.LossesContainer(
      loss, loss_weights, output_names=self.output_names)
  self.compiled_metrics = compile_utils.MetricsContainer(
      metrics, weighted_metrics, output_names=self.output_names)

  self._configure_steps_per_execution(steps_per_execution or 1)

  # Initializes attrs that are reset each time `compile` is called.
  self._reset_compile_cache()
  self._is_compiled = True

  self.loss = loss or {}  # Backwards compat.
model.compile is related to training your model: your weights need to be optimized, and this is where you specify how they will be optimized so that your accuracy increases. The optimizer is just one of its input parameters.
model.compile(
    optimizer='rmsprop', loss='sparse_categorical_crossentropy', metrics=['acc']
)
These are the main inputs. You can find more details in the TensorFlow documentation at the link below:
https://www.tensorflow.org/api_docs/python/tf/keras/Model#compile
I'm currently working on a solution via PyTorch. I'm not going to share the exact solution but I will provide code that reproduces the issue I'm having.
I have a model defined as follows:
import numpy as np
import torch
from torch import nn
from torch.optim import Adam

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(10, 4)

    def forward(self, x):
        return nn.functional.relu(self.fc1(x))
Then I create an instance: my_model = Net(). Next I create an Adam optimizer as such:
optim = Adam(my_model.parameters())
# create a random input
inputs = torch.tensor(np.array([1,1,1,1,1,2,2,2,2,2]),dtype=torch.float32,requires_grad=True)
# get the outputs
outputs = my_model(inputs)
# compute gradients / backprop via
outputs.backward(gradient=torch.tensor([1.,1.,1.,5.]))
# store parameters before optimizer step
before_step = list(my_model.parameters())[0].detach().numpy()
# update parameters via
optim.step()
# collect parameters again
after_step = list(my_model.parameters())[0].detach().numpy()
# Print if parameters are the same or not
print(np.array_equal(before_step,after_step)) # Prints True
I provided my model's parameters to the Adam optimizer, so I'm not exactly sure why the parameters aren't updating. I know in most cases one uses a loss function; however, I cannot do that in my case, but I assumed that if I passed the model parameters to the optimizer, it would know to connect the two.
Anyone know why the parameters aren't getting updated?
The problem is with detach (docs).
As noted at the bottom:
Returned Tensor shares the same storage with the original one. In-place modifications on either of them will be seen, and may trigger errors in correctness checks
So that is exactly what's happening here. To correctly compare the parameters, you need to clone (docs) them to get a real copy.
list(my_model.parameters())[0].clone().detach().numpy()
On a side note, it can be helpful to print the gradients after optim.step() with print(list(my_model.parameters())[0].grad) to check that the graph is intact. Also, don't forget to call optim.zero_grad().
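Putting it together, a minimal sketch of the corrected comparison (same setup as in the question):
before_step = list(my_model.parameters())[0].clone().detach().numpy()
optim.step()
after_step = list(my_model.parameters())[0].clone().detach().numpy()
print(np.array_equal(before_step, after_step))  # Now prints False: the weights did change
optim.zero_grad()  # clear the gradients before the next backward pass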
As the title states I'm wondering how I could access the privileged 'training' argument when I'm using the functional API.
So if I use subclassing, I can write something like:
class MyLayer(tf.keras.layers.Layer):
    def __init__(self):
        ...
        self.BN = tf.keras.layers.BatchNormalization()

    def call(self, inputs, training=None):
        return self.BN(inputs, training=training)
So I can control how my batchnorm behaves during training and prediction. But if I want to use the functional API:
input = tf.keras.Input(someshape)
normalized = tf.keras.layers.BatchNormalization()(input)
model = tf.keras.Model(inputs=input, outputs=normalized)
Now I can't really set the privileged 'training' argument for my batch norm anymore. I love the functional API; it's really fun to use, but having to build around this kind of thing is quite often a dealbreaker. I feel like I must be missing some important idea about how one would solve this.
I'm aware that I could create a tf.Input which holds the 'training' argument. But that would change it from a keyword arg to an element of a list, which makes for very inconsistent code. Any smarter solution to this?
Edit: Should make it clear that I'm looking for a general idea that can be used for the 'training' arg, not just tackling the BatchNormalization in particular.
When you instantiate the model with model = tf.keras.Model(inputs=input, outputs=normalized), the model has not yet been built. You will need to call the build method, which usually happens the first time you call fit, or when you do everything by hand using a gradient tape; at that point the weights are initialized. From then on, if you use the fit method, or call the model directly with output_tensors = mymodel(input_tensors, training=True), the training flag is set to True. Conversely, if you use the predict method, or call output_tensors = mymodel(input_tensors, training=False), it is set to False (which is obvious when you call the model directly).
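A minimal sketch of that last point (the input shape and data are made up for illustration):
import numpy as np
import tensorflow as tf

inp = tf.keras.Input(shape=(8,))
out = tf.keras.layers.BatchNormalization()(inp)
model = tf.keras.Model(inputs=inp, outputs=out)

x = np.random.rand(4, 8).astype('float32')

# Training behaviour: normalize with batch statistics and update the moving averages.
y_train = model(x, training=True)

# Inference behaviour: normalize with the stored moving averages.
y_infer = model(x, training=False)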
I have a trained neural network model developed using the Keras framework in a Jupyter notebook. It is a regression problem, where I am trying to predict an output variable using some 14 input variables or features.
As a next step, I would like to minimize my output and want to determine what configuration/values these 14 inputs would take to get to the minimal value of the output.
So, essentially, I would like to pass the trained model object as my objective function in a solver, and also a bunch of constraints on the input variables to optimize/minimize the objective.
What is the best Python solver that can help me get there?
Thanks in advance!
So you already have your trained model, which we can think of as f(x) = y.
The standard SciPy method to minimize this is appropriately named scipy.optimize.minimize.
To use it, you just need to adapt your f(x) = y function to fit the API that SciPy uses. That is, the first function argument is the list of params to optimize over. Any further arguments are passed through unchanged from the args tuple you give to minimize, and can hold whatever is fixed for the entire optimization (here, your trained model).
def score_trained_model(params, model):
    # `model` is the fixed extra argument passed via `args=(trained_model,)`.
    # Run the model on the params and return the value to be minimized.
    # (model_predict stands for however you evaluate your trained model.)
    return model_predict(model, params)
With this, plus an initial guess, you can use the minimize function now:
import scipy.optimize

# Nelder-Mead is my go-to to start with.
# But it doesn't take advantage of the gradient.
# Something that does, e.g. BFGS, may perform better for your case.
method = 'Nelder-Mead'

# All zeros is fine, but improving this initial guess can help.
guess_params = [0] * 14

# Given a trained model, optimize the inputs to minimize the output.
optim_params = scipy.optimize.minimize(
    score_trained_model,
    guess_params,
    args=(trained_model,),
    method=method,
)
# optim_params.x holds the input values that minimize the model output.
It is possible to supply constraints and bounds to some of the optimization methods. For Nelder-Mead constraints are not supported, but you can just return a very large error from the objective whenever a constraint is violated, for example:
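Here is a minimal sketch of that penalty trick (the 0-to-1 bound on every input is only an assumed constraint for illustration):
def score_with_penalty(params, model):
    # Hypothetical constraint: every input must lie in [0, 1].
    if any(p < 0 or p > 1 for p in params):
        return 1e12  # a very large error when the constraint is violated
    return model_predict(model, params)

optim_params = scipy.optimize.minimize(
    score_with_penalty,
    guess_params,
    args=(trained_model,),
    method='Nelder-Mead',
)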
Older answer.
OP wants to optimize the inputs, x, not the hyperparameters.
It sounds like you want to do hyperparameter optimization. My Python library of choice is hyperopt: https://github.com/hyperopt/hyperopt
Given that you already have some training and scoring code, for example:
def train_and_score(args):
    # Unpack args and train your model.
    model = make_model(**args)
    trained = train_model(model, **args)
    # Return the output you want to minimize.
    return score_model(trained)
You can easily use hyperopt to tune parameters like the learning rate, dropout, or choice of activations:
import numpy as np
from hyperopt import Trials, fmin, hp, tpe, space_eval

space = {
    'lr': hp.loguniform('lr', np.log(0.01), np.log(0.5)),
    'dropout': hp.uniform('dropout', 0, 1),
    'activation': hp.choice('activation', ['relu', 'sigmoid']),
}
# Minimize the training score over the space.
trials = Trials()
best = fmin(train_and_score, space, trials=trials, algo=tpe.suggest, max_evals=100)
# Print details about the best results and hyperparameters.
print(best)
print(space_eval(space, best))
There are also libraries that will help you directly integrate this with Keras. A popular choice is hyperas: https://github.com/maxpumperla/hyperas
So I have been working with a sequential Keras model in TensorFlow, and have come across an odd behavior where, if I compile a sequential model with the loss as a function, the result is different than if it were a string.
The network (convolutional in nature) is defined as such:
model = tf.keras.Sequential()
add_model_layers(model)  # Can provide if needed

adam_opt = tf.train.AdamOptimizer(learning_rate=0.001,
                                  beta1=0.9,
                                  beta2=0.999)
The network is then compiled:
loss_param1 = tf.keras.losses.categorical_crossentropy
loss_param2 = "categorical_crossentropy"

model.compile(optimizer=adam_opt,
              loss=loss_param,  # ADD NUMBER TO END
              metrics=[tf.keras.metrics.categorical_accuracy])
After which it is trained:
# Records are tf.data.Dataset based on TFRecord files
model.fit(train_records,
          epochs=400,
          use_multiprocessing=False,
          validation_data=validation_records)

# And for completeness, tested
model.evaluate(test_records)
If the loss parameter of model.compile is loss_param1, it begins training with a high loss value (in my case, after an epoch or 5 around 192). On the other hand, if loss_param2 is used, training begins at a much lower loss (around 41).
Does anyone know why this would be occurring?
(As an additional note, I am also running into a similar issue where, if the metric is given as a string, I get a different result. However, in model.fit, if use_multiprocessing is True the effect is negated (this also applies the other way around).)
Here is an example in PyTorch:
optimizer = optim.Adam([modifier_var], lr=0.0005)
And here in TensorFlow:
self.train = self.optimizer.minimize(self.loss, var_list=[self.modifier])
But Chainer's optimizers can only be used on a Link, so how can I apply an Optimizer to a Variable in Chainer?
In short, there is no way to directly assign a chainer.Variable (or even a chainer.Parameter) to a chainer.Optimizer.
The following is a longer explanation.
First, I re-define Variable and Parameter to avoid confusion.
Variable is (1) torch.Tensor in PyTorch v0.4, (2) torch.autograd.Variable in PyTorch v0.3, and (3) chainer.Variable in Chainer v4.
A Variable is an object that holds two tensors: .data and .grad. That is the necessary and sufficient condition, so a Variable is not necessarily a learnable parameter, which is what the optimizer targets.
In both libraries there is another class, Parameter, which is similar to but not the same as Variable. Parameter is torch.nn.Parameter in PyTorch and chainer.Parameter in Chainer.
Parameter must be a learnable parameter and should be optimized.
Therefore, there should be no case where you need to register a Variable (as opposed to a Parameter) with an Optimizer (although PyTorch allows registering a Variable with an Optimizer, this is just for backward compatibility).
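To make the distinction concrete, a small Chainer sketch (the layer sizes are arbitrary):
import chainer
import chainer.links as L

layer = L.Linear(3, 2)
print(type(layer.W))                          # chainer.Parameter
print(isinstance(layer.W, chainer.Variable))  # True: Parameter is a special kind of Variable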
Second, in PyTorch torch.optim.Optimizer directly optimizes Parameters, but in Chainer chainer.Optimizer DOES NOT optimize Parameters: instead, chainer.UpdateRule does. The Optimizer just registers UpdateRules to the Parameters in a Link.
Therefore, it is only natural that chainer.Optimizer does not take Parameters as its arguments: it is just a "delivery man" for UpdateRules.
If you want to attach a different UpdateRule to each Parameter, you should directly create an instance of an UpdateRule subclass and attach it to that Parameter.
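As a rough sketch (this assumes a model has already been registered with an optimizer via optimizer.setup, as in the example below, so that each Parameter carries an update_rule; the layer names and values are made up):
# Give one layer a smaller learning rate (Adam's learning rate is called alpha)
model.l1.W.update_rule.hyperparam.alpha = 1e-4
# Freeze another parameter entirely
model.l3.W.update_rule.enabled = False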
Below is an example of learning a regression task with a MyChain MLP model using the Adam optimizer in Chainer.
import numpy as np

import chainer
import chainer.functions as F
import chainer.links as L
from chainer import Chain

# Prepare your model (neural network) as a `Link` or `Chain`
class MyChain(Chain):
    def __init__(self):
        super(MyChain, self).__init__(
            l1=L.Linear(None, 30),
            l2=L.Linear(None, 30),
            l3=L.Linear(None, 1)
        )

    def __call__(self, x):
        h = self.l1(x)
        h = self.l2(F.sigmoid(h))
        return self.l3(F.sigmoid(h))

model = MyChain()

# Then you can instantiate the optimizer
optimizer = chainer.optimizers.Adam()
# Register the model with the optimizer (to indicate which parameters to update)
optimizer.setup(model)

# Toy training data (x and y must be defined; here the target is y = sin(x))
x = np.linspace(-1, 1, 100, dtype=np.float32).reshape(-1, 1)
y = np.sin(x)

# Calculate the loss, and update the parameters as follows.
def lossfun(x, y):
    loss = F.mean_squared_error(model(x), y)
    return loss

# This iteration is the "training" that fits the model to the desired function.
for i in range(300):
    optimizer.update(lossfun, x, y)
So, in summary, you need to set up the model with the optimizer; after that you can use the update function to calculate the loss and update the model's parameters.
The above code comes from here
Also, there are other ways to write the training code, using the Trainer module (a sketch is given after the links). For a more detailed tutorial of Chainer, please refer below:
chainer-handson
deep-learning-tutorial-with-chainer
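For completeness, a rough Trainer-based sketch (it reuses MyChain, x and y from the example above; the batch size, epoch count and output directory are arbitrary choices):
import chainer
import chainer.functions as F
import chainer.links as L
from chainer import training
from chainer.datasets import TupleDataset
from chainer.training import extensions

# Wrap the model in a link whose call returns the loss
# (L.Classifier also works for regression once accuracy is disabled).
wrapped = L.Classifier(MyChain(), lossfun=F.mean_squared_error)
wrapped.compute_accuracy = False

optimizer = chainer.optimizers.Adam()
optimizer.setup(wrapped)

train_iter = chainer.iterators.SerialIterator(TupleDataset(x, y), batch_size=32)
updater = training.StandardUpdater(train_iter, optimizer)
trainer = training.Trainer(updater, (300, 'epoch'), out='result')
trainer.extend(extensions.LogReport())
trainer.run()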