I have a network for transfer learning and want to train on two GPUs. I have just trained on one up to this point and am looking for ways to speed things up. I am getting conflicting answers about how to use it most efficiently.
strategy = tf.distribute.MirroredStrategy(devices=["/gpu:0", "/gpu:1"])
with strategy.scope():
    base_model = MobileNetV2(weights='imagenet', include_top=False, input_shape=(200,200,3))
    x = base_model.output
    x = GlobalAveragePooling2D(name="class_pool")(x)
    x = Dense(1024, activation='relu', name="class_dense1")(x)
    types = Dense(20,activation='softmax', name='Class')(x)
    model = Model(inputs=base_model.input, outputs=[types])
Then I set trainable layers:
for layer in model.layers[:160]:
    layer.trainable=False
for layer in model.layers[135:]:
    layer.trainable=True
Then I compile
optimizer = Adam(learning_rate=.0000001)
model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics='accuracy')
Should everything be nested inside strategy.scope()?
This tutorial shows compile within but this tutorial shows it is outside.
The first one shows it outside:
mirrored_strategy = tf.distribute.MirroredStrategy()
with mirrored_strategy.scope():
    model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(1,))])
model.compile(loss='mse', optimizer='sgd')
but says this right after
In this example we used MirroredStrategy so we can run this on a machine with multiple GPUs. strategy.scope() indicates to Keras which strategy to use to distribute the training. Creating models/optimizers/metrics inside this scope allows us to create distributed variables instead of regular variables. Once this is set up, you can fit your model like you would normally. MirroredStrategy takes care of replicating the model's training on the available GPUs, aggregating gradients, and more.
It does not matter where it goes because, under the hood, model.compile() would create the optimizer, loss, and accuracy metric variables under the strategy scope in use. Then you can call model.fit, which would also schedule a training loop under the same strategy scope.
I would suggest further searching as my answer does not have any experimental basis to it. It's just what I think.
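For what it's worth, here is a minimal sketch of the conservative layout that matches the quoted guide text: the model, the optimizer, and the compile call are all created inside strategy.scope(), and fit is called afterwards. The small Dense model is a stand-in chosen for brevity (an assumption for illustration, not the MobileNetV2 setup from the question), and train_ds in the last comment is a hypothetical dataset name. With no devices argument, MirroredStrategy uses whatever GPUs are visible and otherwise runs on a single device.

```python
import tensorflow as tf

# No `devices` argument: use all visible GPUs, or a single device if none are found.
strategy = tf.distribute.MirroredStrategy()

with strategy.scope():
    # Variables created here (layer weights, optimizer state, metrics) are mirrored.
    inputs = tf.keras.Input(shape=(16,))
    x = tf.keras.layers.Dense(8, activation="relu")(inputs)
    outputs = tf.keras.layers.Dense(20, activation="softmax", name="Class")(x)
    model = tf.keras.Model(inputs, outputs)

    # Freezing layers only flips a Python attribute; do it before compile.
    model.layers[1].trainable = False

    optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)
    model.compile(optimizer=optimizer,
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])

print("replicas in sync:", strategy.num_replicas_in_sync)
# Training is then started as usual, e.g. model.fit(train_ds, epochs=...),
# where `train_ds` is a hypothetical dataset batched with the global batch size.
```

Keeping compile inside the scope costs nothing and avoids having to reason about which variables compile creates, which is why it is used in this sketch.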
Related
I am training a large model on my computer which takes time. I have checkpoint and earlystopping callback. Checkpoint callback saves weights only and earlystopping is set to restore_best_weights.
I set my computer to train during the night when I go to bed. When I wake up, I see the results, close Jupyter Notebook because I have to write my thesis, and then I open Jupyter again before I go to bed to have it train again after I changed some things.
My question is, if I open the notebook again after I trained the model, I still have to execute all the cells to resume working. If I execute all the cells, like this one:
# Define ResNet50 base model with weights from imagenet. Do NOT include top classification layers.
base_model = tf.keras.applications.resnet_v2.ResNet50V2(input_shape=input_shape, include_top=False, weights='imagenet')
# Freeze base model
base_model.trainable = False
# Define input layer
inputs = tf.keras.Input(shape=input_shape)
# Apply Data Augmentation
x = data_augmentation(inputs)
# Preprocess input the same way the base model's training data was preprocessed
x = tf.keras.applications.resnet_v2.preprocess_input(x)
# Set training = False to disable Batch Norm layers from updating
x = base_model(x, training = False)
# Add average pooling
x = tfl.GlobalAveragePooling2D()(x)
# Add dropout layer for regularization
x = tfl.Dropout(0.2)(x)
# Add prediction/output layer with 3 neurons (Class Number = 3)
outputs = tfl.Dense(3, kernel_initializer = HeNormal())(x)
model = tf.keras.Model(inputs, outputs)
base_lr = 0.004
model.compile(optimizer=Adam(learning_rate = base_lr), loss=CategoricalCrossentropy(from_logits = True), metrics=["accuracy"])
Will my model get replaced, so that the weights it learnt are initialized again? Even if EarlyStopping restores the best weights, those will be overwritten, right?
My workflow is:
Define dataset
Define model with data augmentation and base_model.trainable = false
Compile model
Define callbacks such as EarlyStopping, Checkpoints, ReduceLR and tensorboard
model.fit
Now I want to apply fine-tuning but I think when I run my code again the model's state is gone. What I thought to do is after the first training session, I do:
model.get_weights() # So I can see what the weights are when I first train the model
And then when I restart my notebook I do model.get_weights() again to see if they are the same. If they are not, I check my checkpoint_dir, see which checkpoint has the lowest val_loss, and do:
# Load weights from the checkpoint with the lowest val_loss if model.get_weights() returns different weights
model.load_weights(r'.\checkpoints\ResNet50\run_23_07_2022-13_18_57\ckp_24-0.4337')
Basically, I am restoring the best weights manually.
Then continue with base_model.trainable = True, and carry on with compiling the model again and running model.fit to fine-tune it.
As you can tell, I am a bit confused on how Jupyter Notebook works and how to resume training my model after each session. Is my workflow correct, with manually loading the best checkpoint?
Do I have to keep Jupyter open until I completely finish with the project, so the variables are not gone?
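A hedged sketch of the situation described above: Python variables live only in the running kernel, so after a restart the model has to be rebuilt (which re-initializes its weights) and the trained weights reloaded from disk. The build_model helper, the layer sizes, and the file name best.weights.h5 are hypothetical stand-ins for the notebook cells and checkpoint path in the question; the .weights.h5 suffix is used because recent Keras versions require it for save_weights.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf


def build_model():
    """Hypothetical stand-in for the notebook cells that define the model."""
    inputs = tf.keras.Input(shape=(8,))
    x = tf.keras.layers.Dense(4, activation="relu")(inputs)
    outputs = tf.keras.layers.Dense(3)(x)
    return tf.keras.Model(inputs, outputs)


# "Session 1": the model is trained (omitted here) and its weights are saved.
trained = build_model()
ckpt_path = os.path.join(tempfile.mkdtemp(), "best.weights.h5")
trained.save_weights(ckpt_path)

# "Session 2": re-running the cells creates a freshly initialized model.
restored = build_model()
fresh_kernel = restored.get_weights()[0].copy()

# Loading the checkpoint puts the saved weights back into the new model.
restored.load_weights(ckpt_path)
same = all(np.array_equal(a, b)
           for a, b in zip(trained.get_weights(), restored.get_weights()))
print("weights restored from checkpoint:", same)
```

Under these assumptions, comparing model.get_weights() across sessions is not needed: a rebuilt model always starts from new initial weights, so loading the chosen checkpoint before fine-tuning is the step that matters.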
I have a sequential model with a custom loss function for training. For prediction and validation, however, I want to remove one layer. Is there any way to do this? The easiest thing I could think of would be a custom metric that can get the output of a previous layer without access to the input. Alternatively, I could run prediction and validation on a separate model, but I worry about constructing a separate model because I want the weights to be saved. Any suggestions? I have spent a lot of time on this, and anything I try runs into scope issues. I took a look at this: Keras, How to get the output of each layer? but every answer I see requires me to know the inputs.
You can create separate models. Each model will need to be compiled. My solution was of this form...
inputs = Input(input_shape)
model = Conv2D(32, [3,3])(inputs)
# pass the model through some layers
# finish the model
model = Model(inputs=inputs, outputs=model)
input_2 = Input(input_shape)
second_model = model(input_2)
# pass the second model through some layers
# the second model must be built from its own input tensor, input_2
second_model = Model(inputs=input_2, outputs=second_model)
model.compile(...)
second_model.compile(...)
Now any training done to second_model affects the weights of model, allowing you to do training off of second_model and predictions with model.
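A small runnable sketch of the pattern above, under assumed shapes: Dense layers and random data stand in for the real architecture, and the single Dense(1) layer plays the role of the training-only layer. It trains second_model and then checks that the weights seen through model have moved, since both models hold the same layer objects.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model

# Prediction model: the part that should be kept for inference.
inputs = layers.Input(shape=(8,))
features = layers.Dense(4, activation="tanh")(inputs)
model = Model(inputs=inputs, outputs=features)

# Training model: reuses `model` as a layer and adds the training-only layer.
input_2 = layers.Input(shape=(8,))
extra = layers.Dense(1)(model(input_2))
second_model = Model(inputs=input_2, outputs=extra)
second_model.compile(optimizer="sgd", loss="mse")

# Snapshot the shared weights, train the larger model, then compare.
before = [w.copy() for w in model.get_weights()]
x = np.random.rand(32, 8).astype("float32")
y = np.random.rand(32, 1).astype("float32")
second_model.fit(x, y, epochs=1, verbose=0)

changed = any(not np.array_equal(b, a)
              for b, a in zip(before, model.get_weights()))
print("weights of `model` changed by training `second_model`:", changed)

# Predictions come from the smaller model, without the extra layer.
predictions = model.predict(x, verbose=0)
```

Saving either model stores the shared weights, so the concern about losing them when building a separate model does not arise in this setup.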
Is it possible in Keras for the training of some outputs in a multi-output model to start at different epochs? For example, one of the outputs takes some other outputs as its input, but those outputs are quite premature at the beginning, and this brings a huge computational burden to the model. The output whose training I would like to postpone is a custom layer that applies some image processing operations to its input, which is an image generated by another output. At the beginning the generated image is quite meaningless, so I think applying this custom layer during the first epochs is just a waste of time. Is there a way to do that? Just as we have weights over each output's loss, can we have a different starting point for calculating each output's loss?
1. Build a model that does not contain the later output.
2. Train that model to the degree you want.
3. Build a new model that incorporates the old model into it.
4. Compile the new model with the new loss functions you want.
5. Train that model.
To elaborate on step 3: Keras models can be used like layers in Keras' functional API.
You can build a normal model like so:
input = Input((100,))
x = Dense(50)(input)
x = Dense(1, activation='sigmoid')(x)
model = Model(input, x)
However, if you have another standard Keras model, it can be used just like any other layer. For example, if we have a model (created with Sequential(), Model(), or keras.models.load_model()) called model1, we can put it in like this:
input = Input((100,))
x = model1(input)
x = Dense(1, activation='sigmoid')(x)
model = Model(input, x)
This would be the equivalent of putting in each layer in model1 individually.
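Putting the steps together, a sketch might look like the following. All shapes, layer names, and the random data are made up for illustration, and the plain Dense layer named "late" is only a stand-in for the custom image-processing layer from the question.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model

x = np.random.rand(32, 100).astype("float32")
y_early = np.random.rand(32, 10).astype("float32")
y_late = np.random.randint(0, 2, size=(32, 1)).astype("float32")

# Steps 1 and 2: build and train a model without the later output.
inp = layers.Input(shape=(100,))
hidden = layers.Dense(50)(inp)
early_out = layers.Dense(10, activation="sigmoid", name="early")(hidden)
stage1 = Model(inp, early_out, name="stage1")
stage1.compile(optimizer="adam", loss="mse")
stage1.fit(x, y_early, epochs=1, verbose=0)

# Step 3: wrap the trained model and add the postponed output on top of it.
inp2 = layers.Input(shape=(100,))
early = stage1(inp2)
late = layers.Dense(1, activation="sigmoid", name="late")(early)  # stand-in layer
full = Model(inp2, [early, late])

# Steps 4 and 5: compile with one loss per output and continue training.
full.compile(optimizer="adam", loss=["mse", "binary_crossentropy"])
full.fit(x, [y_early, y_late], epochs=1, verbose=0)
```

Because stage1 is reused as a layer, the second training phase starts from the weights learned in the first phase rather than from scratch.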
How to reset optimizer state in keras?
Looking at Optimizer class I can't see such a method:
https://github.com/keras-team/keras/blob/613aeff37a721450d94906df1a3f3cc51e2299d4/keras/optimizers.py#L60
Also what is actually self.updates and self.weights?
There isn't an "easy" way to reset the "states", but you can always simply recompile your model with a new optimizer (model's weights are preserved):
newOptimizer = Adadelta()
model.compile(optimizer=newOptimizer)
You can also use the method set_weights(weightsListInNumpy) (not recommended), in the base class Optimizer, but this would be rather cumbersome as you would need to know all initial values and shapes, which sometimes may not be trivial zeroes.
Now, the property self.weights doesn't do much, but the functions that save and load optimizers will save and load this property. It's a list of tensors and should not be changed directly. At most use K.set_value(...) in each entry of the list. You can see the weights in saving the optimizer in the _serialize_model method.
The self.updates are something a little more complex to understand. It stores the variables that will be updated with every batch that is processed by the model in training. But it's a symbolic graph variable.
The self.updates, as you can see in the code, is always appended with a K.update(var, value) or K.update_add(var, value). This is the correct way to tell the graph that these values should be updated every iteration.
Usually, the updated vars are iterations, params (the model's weights), moments, accumulators, etc.
I don't think there is a universal method for this, but you should be able to reset the state of your optimizer by initializing the variables holding it. This would need to be done with the TensorFlow API, though. The state variables depend on the specific kind of optimizer. For example, if you have an Adam optimizer (source), you could do the following:
import tensorflow as tf
from keras.optimizers import Adam
from keras import backend as K
optimizer = Adam(...)
# These depend on the optimizer class
optimizer_state = [optimizer.iterations, optimizer.lr, optimizer.beta_1,
                   optimizer.beta_2, optimizer.decay]
optimizer_reset = tf.variables_initializer(optimizer_state)
# Later when you want to reset the optimizer
K.get_session().run(optimizer_reset)
The optimizer is just adjusting the weights of your model, thus the information is stored in the model, not in the optimizer.
That means you can't reset an optimizer in the way you might think. You need to reset (or, maybe easier, recreate) your model.
That also means you can optimize your model with an optimizer A, stop after some epochs, and continue optimizing your model with optimizer B without losing the progress optimizer A already made.
I don't know exactly what self.updates and self.weights are there for. But because those are internal variables of the class, someone needs to read about the optimizer class itself and understand its code. Here we need to wait for someone who has dived deeper into the source code of Keras.
EDIT
You can just recreate your optimizer for example:
model = Sequential()
...
...
...
model.compile(optimizer=keras.optimizers.Adadelta(lr=5), loss='mean_squared_error')
model.fit(X, y, epochs=10)
model.compile(optimizer=keras.optimizers.Adadelta(lr=0.5), loss='mean_squared_error')
model.fit(X, y, epochs=10)
With the above code you train 10 epochs with learning rate 5, compile your model with a new optimizer, and continue for another 10 epochs with learning rate 0.5. The weights which you could also call your training progress do not get lost if you compile your model again.
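To check this claim on a toy case, the sketch below trains a one-layer model, recompiles it with a fresh optimizer, and compares the weights and the optimizer's step counter before and after. The model size, data, and learning rates are arbitrary choices made for illustration, and Adam is used here instead of Adadelta simply as an example.

```python
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(1),
])
x = np.random.rand(32, 4).astype("float32")
y = np.random.rand(32, 1).astype("float32")

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
              loss="mean_squared_error")
model.fit(x, y, batch_size=8, epochs=2, verbose=0)

# 32 samples / batch size 8 = 4 steps per epoch, over 2 epochs.
steps_before = int(model.optimizer.iterations.numpy())
weights_before = [w.copy() for w in model.get_weights()]

# Recompiling attaches a brand-new optimizer, so its state starts from zero...
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss="mean_squared_error")
steps_after = int(model.optimizer.iterations.numpy())

# ...while the model's own weights are left untouched.
weights_kept = all(np.array_equal(a, b)
                   for a, b in zip(weights_before, model.get_weights()))
print(steps_before, steps_after, weights_kept)
```

The step counter is only one piece of optimizer state; the point of the sketch is that the new optimizer object starts fresh while the trained weights carry over.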
I am trying to find the cost function in Keras. I am running an LSTM with the loss function categorical_crossentropy and I added a regularizer. How do I output what the cost function looks like after adding my regularizer, for my own analysis?
model = Sequential()
model.add(LSTM(
    NUM_HIDDEN_UNITS,
    return_sequences=True,
    input_shape=(PHRASE_LEN, SYMBOL_DIM),
    kernel_regularizer=regularizers.l2(0.01)
))
model.add(Dropout(0.3))
model.add(LSTM(NUM_HIDDEN_UNITS, return_sequences=False))
model.add(Dropout(0.3))
model.add(Dense(SYMBOL_DIM))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy',
              optimizer=RMSprop(lr=1e-03, rho=0.9, epsilon=1e-08))
How do I output what the cost function looks like after adding my regularizer, for my own analysis?
Surely you can achieve this by obtaining the output (yourlayer.output) of the layer you want to see and print it (see here). However there are better ways to visualize these things.
Meet Tensorboard.
This is a powerful visualization tool that enables you to track and visualize your metrics, outputs, architecture, kernel_initializations, etc. The good news is that there is already a Tensorboard Keras Callback that you can use for this purpose; you just have to import it. To use it just pass an instance of the Callback to your fit method, something like this:
from keras.callbacks import TensorBoard
# indicate folder to save, plus other options
tensorboard = TensorBoard(log_dir='./logs/run1', histogram_freq=1,
                          write_graph=True, write_images=False)
# save it in your callback list
callbacks_list = [tensorboard]
# then pass to fit as callback, remember to use validation_data also
model.fit(X, Y, callbacks=callbacks_list, epochs=64,
          validation_data=(X_test, Y_test), shuffle=True)
After that, start your Tensorboard server (it runs locally on your PC) by executing:
tensorboard --logdir=logs/run1
For example, this is what my Kernels look like on two different models I tested (to compare them you have to save separate runs and then start Tensorboard on the parent directory instead). This is on the Histograms tab, on my second layer:
The model on the left I initialized with kernel_initializer='random_uniform', thus its shape is the one of a Uniform Distribution. The model on the right I initialized with kernel_initializer='normal', thus why it appears as a Gaussian distribution throughout my epochs (about 30).
This way you could visualize how your kernels and layers "look like", in a more interactive and understandable way than printing outputs. This is just one of the great features Tensorboard has, and it can help you develop your Deep Learning models faster and better.
Of course there are more options to the Tensorboard Callback and for Tensorboard in general, so I do suggest you thoroughly read the links provided if you decide to attempt this. For more information you can check this question and also this one.
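If the goal is only to read the value of the regularization term rather than to plot it, Keras also collects the penalties in model.losses. The sketch below uses a single Dense layer with made-up sizes instead of the LSTM from the question, and recomputes the penalty by hand to show what the number means.

```python
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(5,)),
    tf.keras.layers.Dense(3, activation="softmax",
                          kernel_regularizer=tf.keras.regularizers.l2(0.01)),
])

# model.losses holds one scalar tensor per regularizer attached to the model.
penalty = float(tf.add_n(model.losses))

# Recompute the same penalty by hand: 0.01 * sum of squared kernel entries.
kernel = model.layers[-1].get_weights()[0]
expected = 0.01 * float(np.sum(np.square(kernel)))
print("penalty from model.losses:", penalty, "recomputed:", expected)
```

The loss reported during fit is the data loss plus these penalty terms, so subtracting the penalty gives the unregularized part.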
Edit: So, you commented that you want to know how your regularized loss "looks" analytically. Let's remember that by adding a regularizer to a loss function we are basically extending the loss function to include some "penalty" or preference in it. So, if you are using cross_entropy as your loss function and adding an l2 regularizer (that is, the squared Euclidean norm of the weights) with a weight of 0.01, your whole loss function would look something like:
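The original formula is missing here, so what follows is a reconstruction from the description, not the answer's own equation: total loss = -Σ_i y_i · log(ŷ_i) + 0.01 · Σ_j w_j², where y is the one-hot target, ŷ the softmax output, and w the regularized kernel weights. The plain-Python sketch below evaluates that expression for one sample; the target, prediction, and weight values are made up for illustration.

```python
import math


def categorical_crossentropy(y_true, y_pred):
    """Cross-entropy for one sample: -sum(y_i * log(p_i))."""
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred) if t > 0)


def l2_penalty(weights, factor=0.01):
    """Penalty term added by an l2 regularizer: factor * sum(w^2)."""
    return factor * sum(w * w for w in weights)


y_true = [0.0, 1.0, 0.0]            # one-hot target
y_pred = [0.2, 0.7, 0.1]            # softmax output
kernel_weights = [0.5, -1.0, 2.0]   # made-up regularized weights

data_loss = categorical_crossentropy(y_true, y_pred)
penalty = l2_penalty(kernel_weights)
total_loss = data_loss + penalty
print(data_loss, penalty, total_loss)
```

Only the layers that were given a kernel_regularizer contribute to the penalty, which in the question's model means the first LSTM layer.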