How to reset optimizer state in Keras?
Looking at the Optimizer class, I can't see such a method:
https://github.com/keras-team/keras/blob/613aeff37a721450d94906df1a3f3cc51e2299d4/keras/optimizers.py#L60
Also, what actually are self.updates and self.weights?
There isn't an "easy" way to reset the "states", but you can always simply recompile your model with a new optimizer (model's weights are preserved):
newOptimizer = Adadelta()
model.compile(optimizer=newOptimizer)
You can also use the method set_weights(weightsListInNumpy) (not recommended) from the base class Optimizer, but this would be rather cumbersome, as you would need to know all the initial values and shapes, which sometimes may not be trivial zeroes.
Now, the property self.weights doesn't do much, but the functions that save and load optimizers will save and load this property. It's a list of tensors and should not be changed directly. At most, use K.set_value(...) on each entry of the list. You can see how the weights are saved with the optimizer in the _serialize_model method.
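For illustration, a minimal sketch of that K.set_value approach, assuming the model has already been compiled and trained (so model.optimizer.weights is populated), and keeping in mind the caveat above that zeros are not always the right initial values:
import numpy as np
from keras import backend as K

# zero out every state tensor the optimizer tracks (iterations, moments, accumulators, ...)
for w in model.optimizer.weights:
    K.set_value(w, np.zeros_like(K.get_value(w)))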
self.updates is a little more complex to understand. It stores the variables that will be updated with every batch the model processes during training, but as symbolic graph operations.
self.updates, as you can see in the code, is always appended to with K.update(var, value) or K.update_add(var, value). This is the correct way to tell the graph that these values should be updated every iteration.
Usually, the updated vars are iterations, params (the model's weights), moments, accumulators, etc.
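As a rough illustration of that pattern, here is a simplified excerpt in the spirit of the SGD optimizer's get_updates (not the full implementation; params, grads and moments are lists the optimizer builds internally):
self.updates = [K.update_add(self.iterations, 1)]   # iteration counter bumped every batch
for p, g, m in zip(params, grads, moments):
    v = self.momentum * m - lr * g            # new velocity
    self.updates.append(K.update(m, v))       # update the moment accumulator
    self.updates.append(K.update(p, p + v))   # update the model weight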
I don't think there is a universal method for this, but you should be able to reset the state of your optimizer by initializing the variables holding it. This would need to be done with the TensorFlow API, though. The state variables depend on the specific kind of optimizer. For example, if you have an Adam optimizer (source), you could do the following:
import tensorflow as tf
from keras.optimizers import Adam
from keras import backend as K

optimizer = Adam(...)

# These depend on the optimizer class
optimizer_state = [optimizer.iterations, optimizer.lr, optimizer.beta_1,
                   optimizer.beta_2, optimizer.decay]
optimizer_reset = tf.variables_initializer(optimizer_state)

# Later, when you want to reset the optimizer
K.get_session().run(optimizer_reset)
The optimizer is just adjusting the weights of your model, so the information is stored in the model, not in the optimizer.
That means you can't reset an optimizer in the way you might think. You would need to reset (or maybe easier, recreate) your model.
That also means you can optimize your model with optimizer A, stop after some epochs, and continue optimizing your model with optimizer B without losing the progress optimizer A already made.
I don't know exactly what self.updates and self.weights are there for. Because those are internal variables of the class, someone would need to read the optimizer class itself and understand its code. Here we need to wait for someone who has dived deeper into the source code of Keras.
EDIT
You can just recreate your optimizer, for example:
model = Sequential()
...
...
...
model.compile(optimizer=keras.optimizers.Adadelta(lr=5), loss='mean_squared_error')
model.fit(X, y, epochs=10)
model.compile(optimizer=keras.optimizers.Adadelta(lr=0.5), loss='mean_squared_error')
model.fit(X, y, epochs=10)
With the above code you train for 10 epochs with learning rate 5, recompile your model with a new optimizer, and continue for another 10 epochs with learning rate 0.5. The weights, which you could also call your training progress, are not lost when you compile your model again.
Related
I have a network for transfer learning and want to train on two GPUs. I have just trained on one up to this point and am looking for ways to speed things up. I am getting conflicting answers about how to use it most efficiently.
strategy = tf.distribute.MirroredStrategy(devices=["/gpu:0", "/gpu:1"])
with strategy.scope():
    base_model = MobileNetV2(weights='imagenet', include_top=False, input_shape=(200, 200, 3))
    x = base_model.output
    x = GlobalAveragePooling2D(name="class_pool")(x)
    x = Dense(1024, activation='relu', name="class_dense1")(x)
    types = Dense(20, activation='softmax', name='Class')(x)
    model = Model(inputs=base_model.input, outputs=[types])
Then I set trainable layers:
for layer in model.layers[:160]:
    layer.trainable = False
for layer in model.layers[135:]:
    layer.trainable = True
Then I compile
optimizer = Adam(learning_rate=.0000001)
model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])
Should everything be nested inside strategy.scope()?
This tutorial shows compile within the scope, but this tutorial shows it outside.
The first one shows it outside:
mirrored_strategy = tf.distribute.MirroredStrategy()
with mirrored_strategy.scope():
    model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(1,))])
model.compile(loss='mse', optimizer='sgd')
but says this right after
In this example we used MirroredStrategy so we can run this on a machine with multiple GPUs. strategy.scope() indicates to Keras which strategy to use to distribute the training. Creating models/optimizers/metrics inside this scope allows us to create distributed variables instead of regular variables. Once this is set up, you can fit your model like you would normally. MirroredStrategy takes care of replicating the model's training on the available GPUs, aggregating gradients, and more.
It does not matter where it goes because, under the hood, model.compile() creates the optimizer, loss, and accuracy metric variables under the strategy scope in use. Then you can call model.fit, which also schedules a training loop under the same strategy scope.
I would suggest further searching as my answer does not have any experimental basis to it. It's just what I think.
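For what it's worth, a minimal sketch of the pattern being discussed, with the model built inside the scope and compile/fit outside (the toy data is just for illustration):
import numpy as np
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()

# variables created while building the model become distributed variables
with strategy.scope():
    model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(1,))])

# per the answer above, compile() and fit() pick up the strategy the model was built under
model.compile(loss='mse', optimizer='sgd')

x = np.random.rand(64, 1)
y = 2 * x
model.fit(x, y, epochs=2, batch_size=16)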
I finished building the DNN model for the Titanic Dataset. Given that, how do I make predictions on the X_test? My code can be accessed through my github:
https://github.com/isaac-altair/Titanic-Dataset
Thanks
When you trained your model you asked tensorflow to evaluate your train_op. Your train_op is your optimizer, e.g.:
train_op = tf.train.AdamOptimizer(...).minimize(cost)
You ran something like this to train the model:
sess.run([train_op], feed_dict={x:data, y:labels})
The train_op depends on things like the gradients and the operations that update the weights, so all of these things happened when you ran the train_op.
At inference time you simply ask it to perform different calculations. You can have the optimizer defined, but if you don't ask it to run the optimizer it won't perform any of the actions that the optimizer is dependent on. You probably have an output of the network called logits (you could call it anything, but logits is the most common and seen in most tutorials). You might also have defined an op called accuracy which computes the accuracy of the batch. You can get the value of those with a similar request to tensorflow:
sess.run([logits, accuracy], feed_dict={x:data, y:labels})
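For example, a minimal sketch of how such an accuracy op could be defined in this TF1-style setup, feeding into the sess.run call above (logits, x and y are the placeholders/ops already present in your graph):
# compare the predicted class against the one-hot label and average over the batch
correct_pred = tf.equal(tf.argmax(logits, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))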
Almost any tutorial will demonstrate this. My favorite tutorials are here: https://github.com/aymericdamien/TensorFlow-Examples
When I run the functional API in the model for k-fold cross-validation, the numbers in the dense layer names increase in the fitted model returned for each fold.
For example, in the first fold it's dense_2_acc, then in the 2nd fold it's dense_5_acc.
But my model summary shows my model is correct. Why is it changing the names in the fitted model's history object for each fold?
This is a really good question which shows something really important about Keras. The reason why names change in such a manner is that Keras does not clear previously defined variables even when you overwrite the model. You can easily check that variables are still in session.graph by calling:
from keras import backend as K
K.get_session().graph.get_collection('variables')
In order to clear previous model variables one may call:
K.clear_session()
However - be careful - as you might lose an existing model. If you want to keep the names the same, you can simply name your layers by adding the name parameter to your layer instantiation, e.g.:
Dense(10, activation='softmax', name='output')
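A minimal sketch of how this fits into a k-fold loop (build_model, kfold, X and y are placeholders for your own code):
from keras import backend as K

for fold, (train_idx, val_idx) in enumerate(kfold.split(X, y)):
    K.clear_session()        # wipe the old graph so layer names start from scratch
    model = build_model()    # layers now get the same names in every fold
    history = model.fit(X[train_idx], y[train_idx],
                        validation_data=(X[val_idx], y[val_idx]),
                        epochs=10)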
I am trying to find the cost function in Keras. I am running an LSTM with the categorical_crossentropy loss function and I added a regularizer. How do I output what the cost function looks like after adding my regularizer, for my own analysis?
model = Sequential()
model.add(LSTM(
    NUM_HIDDEN_UNITS,
    return_sequences=True,
    input_shape=(PHRASE_LEN, SYMBOL_DIM),
    kernel_regularizer=regularizers.l2(0.01)
))
model.add(Dropout(0.3))
model.add(LSTM(NUM_HIDDEN_UNITS, return_sequences=False))
model.add(Dropout(0.3))
model.add(Dense(SYMBOL_DIM))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy',
              optimizer=RMSprop(lr=1e-03, rho=0.9, epsilon=1e-08))
How do I output what the cost function looks like after adding my regularizer, for my own analysis?
Surely you can achieve this by obtaining the output (yourlayer.output) of the layer you want to see and printing it (see here). However, there are better ways to visualize these things.
Meet TensorBoard.
This is a powerful visualization tool that enables you to track and visualize your metrics, outputs, architecture, kernel initializations, etc. The good news is that there is already a TensorBoard Keras callback that you can use for this purpose; you just have to import it. To use it, just pass an instance of the callback to your fit method, something like this:
from keras.callbacks import TensorBoard

# indicate folder to save, plus other options
tensorboard = TensorBoard(log_dir='./logs/run1', histogram_freq=1,
                          write_graph=True, write_images=False)

# save it in your callback list
callbacks_list = [tensorboard]

# then pass to fit as callback, remember to use validation_data also
model.fit(X, Y, callbacks=callbacks_list, epochs=64,
          validation_data=(X_test, Y_test), shuffle=True)
After that, start your TensorBoard server (it runs locally on your PC) by executing:
tensorboard --logdir=logs/run1
For example, this is what my kernels look like on two different models I tested (to compare them you have to save separate runs and then start TensorBoard on the parent directory instead). This is on the Histograms tab, on my second layer:
The model on the left I initialized with kernel_initializer='random_uniform', so its shape is that of a uniform distribution. The model on the right I initialized with kernel_initializer='normal', which is why it appears as a Gaussian distribution throughout my epochs (about 30).
This way you can visualize how your kernels and layers "look", in a more interactive and understandable way than printing outputs. This is just one of the great features TensorBoard has, and it can help you develop your deep learning models faster and better.
Of course there are more options for the TensorBoard callback and for TensorBoard in general, so I do suggest you thoroughly read the links provided if you decide to attempt this. For more information you can check this and also this question.
Edit: So, you commented that you want to know what your regularized loss "looks" like analytically. Let's remember that by adding a regularizer to a loss function we are basically extending the loss function to include some "penalty" or preference in it. So, if you are using cross_entropy as your loss function and adding an l2 regularizer (that is, a squared Euclidean norm penalty) with a weight of 0.01, your whole loss function would look something like:
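(The formula itself appeared as an image in the original answer; reconstructed from the description above, it is roughly:)
# kernel_weights here stands schematically for the entries of the regularized LSTM kernel
total_loss = categorical_crossentropy(y_true, y_pred) + 0.01 * sum(w**2 for w in kernel_weights)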
I would like to train a GAN in Keras. My final target is BEGAN, but I'm starting with the simplest one. Understanding how to freeze weights properly is necessary here and that's what I'm struggling with.
During generator training the discriminator weights must not be updated. I would like to alternately freeze and unfreeze the discriminator, so I can train the generator and the discriminator in turn. The problem is that setting the trainable parameter to False on the discriminator model, or even on its weights, doesn't stop the model from training (and the weights from updating). On the other hand, when I compile the model after setting trainable to False, the weights do freeze, but then they can't be unfrozen again without another compile. I can't compile the model after each iteration because that negates the idea of the whole training.
Because of that problem, it seems that many Keras implementations are bugged, or they work because of some non-intuitive trick in an old version or something.
I tried this example code a couple of months ago and it worked:
https://github.com/fchollet/keras/blob/master/examples/mnist_acgan.py
It's not the simplest form of GAN, but as far as I remembered, it's not too difficult to remove the classification loss and turn the model into a GAN.
You don't need to turn on/off the discriminator's trainable property and recompile. Simply create and compile two model objects, one with trainable=True (discriminator in the code) and another one with trainable=False (combined in the code).
When you're updating the discriminator, call discriminator.train_on_batch(). When you're updating the generator, call combined.train_on_batch().
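A minimal sketch of that two-model pattern (the layer sizes, latent dimension of 5, and data dimension of 10 are just placeholders):
from keras.layers import Dense, Input
from keras.models import Model, Sequential
from keras.optimizers import Adam

# discriminator: compiled while trainable, updated via discriminator.train_on_batch()
discriminator = Sequential([Dense(64, activation='relu', input_shape=(10,)),
                            Dense(1, activation='sigmoid')])
discriminator.compile(optimizer=Adam(), loss='binary_crossentropy')

# generator: never compiled on its own, only trained through the combined model
generator = Sequential([Dense(64, activation='relu', input_shape=(5,)),
                        Dense(10)])

# combined model: the discriminator is frozen *before* this model is compiled,
# so combined.train_on_batch() only updates the generator
discriminator.trainable = False
z = Input(shape=(5,))
combined = Model(z, discriminator(generator(z)))
combined.compile(optimizer=Adam(), loss='binary_crossentropy')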
Can you use tf.stop_gradient to conditionally freeze weights?
Maybe your adversarial net (generator plus discriminator) is written as a 'Model'.
However, even if you set d.trainable=False, only the independent d net becomes non-trainable; the d inside the whole adversarial net is still trainable.
You can print d_on_g.summary() before and after setting d.trainable=False and you will see what I mean (pay attention to the trainable parameters).