keras rl - dqn model update - python

I am reading through the DQN implementation in keras-rl (rl/agents/dqn.py) and see that in the compile() step essentially three Keras models are instantiated:
self.model : provides q value predictions
self.trainable_model : same as self.model but has the loss function we want to train
self.target_model : target model which provides the q targets and is updated every k steps with the weights from self.model
The only model on which train_on_batch() is called is trainable_model, however - and this is what I don't understand - this also updates the weights of model.
In the definition of trainable_model one of the output tensors y_pred is referencing the output from model:
y_pred = self.model.output
y_true = Input(name='y_true', shape=(self.nb_actions,))
mask = Input(name='mask', shape=(self.nb_actions,))
loss_out = Lambda(clipped_masked_error, output_shape=(1,), name='loss')([y_true, y_pred, mask])
ins = [self.model.input] if type(self.model.input) is not list else self.model.input
trainable_model = Model(inputs=ins + [y_true, mask], outputs=[loss_out, y_pred])
When trainable_model.train_on_batch() is called, BOTH the weights in trainable_model and in model change. I am surprised because even though the two models reference the same output tensor object (trainable_model.y_pred = model.output), the instantiation of trainable_model = Model(...) should also instantiate a new set of weights, no?
Thanks for the help!

This is a small example to show that when you instantiate a new keras.models.Model() using the input and output tensors from another model, the weights of the two models are shared; they don't get re-initialized.
# keras version: 2.2.4
import numpy as np
from keras.models import Sequential, Model
from keras.layers import Dense, Input
from keras.optimizers import SGD
np.random.seed(123)
model1 = Sequential()
model1.add(Dense(1, input_dim=1, activation="linear", name="model1_dense1", weights=[np.array([[10]]),np.array([10])]))
model1.compile(optimizer=SGD(), loss="mse")
model2 = Model(inputs=model1.input, outputs=model1.output)
model2.compile(optimizer=SGD(), loss="mse")
x = np.random.normal(size=2000)
y = 2 * x + np.random.normal(size=2000)
print("model 1 weights", model1.get_weights())
print("model 2 weights", model2.get_weights())
model2.fit(x,y, epochs=3, batch_size=32)
print("model 1 weights", model1.get_weights())
print("model 2 weights", model2.get_weights())
Definitely something to keep in mind. Wasn't intuitive to me.
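If you actually need an independent copy with its own weights, one option (a minimal sketch, assuming keras.models.clone_model is available in your Keras version) is to clone the model, which rebuilds the layers and re-initializes the weights unless you copy them over explicitly:
from keras.models import clone_model
# clone_model() rebuilds the architecture with fresh layers, so the clone no
# longer shares parameters with model1 (sketch only, not part of keras-rl).
model3 = clone_model(model1)
model3.set_weights(model1.get_weights())  # optional: copy the current values over
model3.compile(optimizer=SGD(), loss="mse")
model3.fit(x, y, epochs=1, batch_size=32)
print("model 1 weights", model1.get_weights())  # unchanged
print("model 3 weights", model3.get_weights())  # updated independently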

Why am I receiving 'ValueError: This model has not yet been built'? [Following tutorial] [duplicate]

I have a Keras model defined as follows:
class ConvLayer(Layer):
    def __init__(self, nf, ks=3, s=2, **kwargs):
        self.nf = nf
        self.grelu = GeneralReLU(leak=0.01)
        self.conv = (Conv2D(filters = nf,
                            kernel_size = ks,
                            strides = s,
                            padding = "same",
                            use_bias = False,
                            activation = "linear"))
        super(ConvLayer, self).__init__(**kwargs)

    def rsub(self): return -self.grelu.sub
    def set_sub(self, v): self.grelu.sub = -v
    def conv_weights(self): return self.conv.weight[0]

    def build(self, input_shape):
        # No weight to train.
        super(ConvLayer, self).build(input_shape)  # Be sure to call this at the end

    def compute_output_shape(self, input_shape):
        output_shape = (input_shape[0],
                        input_shape[1]/2,
                        input_shape[2]/2,
                        self.nf)
        return output_shape

    def call(self, x):
        return self.grelu(self.conv(x))

    def __repr__(self):
        return f'ConvLayer(nf={self.nf}, activation={self.grelu})'

class ConvModel(tf.keras.Model):
    def __init__(self, nfs, input_shape, output_shape, use_bn=False, use_dp=False):
        super(ConvModel, self).__init__(name='mlp')
        self.use_bn = use_bn
        self.use_dp = use_dp
        self.num_classes = num_classes

        # backbone layers
        self.convs = [ConvLayer(nfs[0], s=1, input_shape=input_shape)]
        self.convs += [ConvLayer(nf) for nf in nfs[1:]]

        # classification layers
        self.convs.append(AveragePooling2D())
        self.convs.append(Dense(output_shape, activation='softmax'))

    def call(self, inputs):
        for layer in self.convs: inputs = layer(inputs)
        return inputs
I'm able to compile this model without any issues
>>> model.compile(optimizer=tf.keras.optimizers.Adam(lr=lr),
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
But when I query the summary for this model, I see this error
>>> model = ConvModel(nfs, input_shape=(32, 32, 3), output_shape=num_classes)
>>> model.summary()
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-220-5f15418b3570> in <module>()
----> 1 model.summary()
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/network.py in summary(self, line_length, positions, print_fn)
1575 """
1576 if not self.built:
-> 1577 raise ValueError('This model has not yet been built. '
1578 'Build the model first by calling `build()` or calling '
1579 '`fit()` with some data, or specify '
ValueError: This model has not yet been built. Build the model first by calling `build()` or calling `fit()` with some data, or specify an `input_shape` argument in the first layer(s) for automatic build.
I'm providing input_shape for the first layer of my model, so why is it throwing this error?
The error says what to do:
This model has not yet been built. Build the model first by calling build()
model.build(input_shape) # `input_shape` is the shape of the input data
# e.g. input_shape = (None, 32, 32, 3)
model.summary()
There is a big difference between a Keras subclassed model and the other Keras models (Sequential and Functional).
Sequential and Functional models are data structures that represent a DAG of layers. In simple words, a Functional or Sequential model is a static graph of layers built by stacking them one on top of the other like LEGO. So when you provide input_shape to the first layer, these (Functional and Sequential) models can infer the shapes of all the other layers and build the model. Then you can print the input/output shapes using model.summary().
On the other hand, a subclassed model is defined via the body of a Python method (call). For a subclassed model there is no graph of layers; we cannot know how the layers are connected to each other (because that is defined in the body of call, not as an explicit data structure), so we cannot infer the input/output shapes. For a subclassed model, the input/output shapes are unknown until it is first called with proper data, so compile() is deferred until real data is seen. In order for the model to infer the shapes of the intermediate layers, we need to run it on real data and then use model.summary(). Without running the model on data, it will throw the error you noticed. Please check the GitHub gist for the complete code.
The following is an example from Tensorflow website.
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

class ThreeLayerMLP(keras.Model):
    def __init__(self, name=None):
        super(ThreeLayerMLP, self).__init__(name=name)
        self.dense_1 = layers.Dense(64, activation='relu', name='dense_1')
        self.dense_2 = layers.Dense(64, activation='relu', name='dense_2')
        self.pred_layer = layers.Dense(10, name='predictions')

    def call(self, inputs):
        x = self.dense_1(inputs)
        x = self.dense_2(x)
        return self.pred_layer(x)

def get_model():
    return ThreeLayerMLP(name='3_layer_mlp')

model = get_model()

(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train.reshape(60000, 784).astype('float32') / 255
x_test = x_test.reshape(10000, 784).astype('float32') / 255

model.compile(loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              optimizer=keras.optimizers.RMSprop())

model.summary()  # This will throw an error as follows
# ValueError: This model has not yet been built. Build the model first by calling `build()` or calling `fit()` with some data, or specify an `input_shape` argument in the first layer(s) for automatic build.

# Need to run with real data to infer shape of different layers
history = model.fit(x_train, y_train,
                    batch_size=64,
                    epochs=1)

model.summary()
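Alternatively (a minimal sketch, not from the original answer, assuming TF 2.x and the ThreeLayerMLP class above), the subclassed model can also be built without training, either by calling build() with a batch input shape or by calling it once on a dummy batch:
# Option 1: declare the batch input shape explicitly (None = any batch size).
model = get_model()
model.build((None, 784))
model.summary()

# Option 2: run one dummy batch through the model so the shapes get inferred.
model = get_model()
_ = model(tf.zeros((1, 784)))
model.summary()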
Thanks!
Another method is to specify the input_shape argument in the first layer, like this:
model = Sequential()
model.add(Bidirectional(LSTM(n_hidden, return_sequences=False, dropout=0.25,
                             recurrent_dropout=0.1), input_shape=(n_steps, dim_input)))
# X is a train dataset with features excluding a target variable
input_shape = X.shape
model.build(input_shape)
model.summary()
Make sure you create your model properly. A small typo like the one in the following code may also cause a problem:
model = Model(some-input, some-output, "model-name")
while the correct code should be:
model = Model(some-input, some-output, name="model-name")
If your TensorFlow/Keras version is 2.5.0, then import Keras through the tensorflow package.
Not this:
from tensorflow import keras
from keras.models import Sequential
import tensorflow as tf
Like this:
from tensorflow import keras
from tensorflow.keras.models import Sequential
import tensorflow as tf
Version mismatches between TensorFlow and Keras can be the reason for this error.
I encountered the same problem while training an LSTM model for regression.
Error:
ValueError: This model has not yet been built. Build the model first
by calling build() or by calling the model on a batch of data.
Earlier:
from tensorflow.keras.models import Sequential
from tensorflow.python.keras.models import Sequential
Corrected:
from keras.models import Sequential
I was also facing the same error, so I removed model.summary() and the issue was resolved, since the error arises if summary() is called before the model is built.
Here is the link to the documentation, which states:
Raises:
ValueError: if `summary()` is called before the model is built.

How to return loss history of multi-output models in Keras?

I use Python 3.7 and Keras 2.2.4. I created a Keras model with two output layers:
self.df_model = Model(inputs=input, outputs=[out1,out2])
As the loss history only returns one loss value per epoch, I want to get the loss of each output layer. How is it possible to get two loss values per epoch, one for each output layer?
Each model in Keras has a default History callback which stores all the loss and metric values of all the epochs, both the aggregate values and the values per output layer. This callback creates a History object which is returned when model.fit() is called, and you can access all of these values via the history attribute of that object (it is actually a dictionary):
history = model.fit(...)
print(history.history) # <-- a dict which contains all the loss and metric values per epoch
A minimal reproducible example:
from keras import layers
from keras import Model
import numpy as np
inp = layers.Input((1,))
out1 = layers.Dense(2, name="output1")(inp)
out2 = layers.Dense(3, name="output2")(inp)
model = Model(inp, [out1, out2])
model.compile(loss='mse', optimizer='adam')
x = np.random.rand(2, 1)
y1 = np.random.rand(2, 2)
y2 = np.random.rand(2, 3)
history = model.fit(x, [y1,y2], epochs=5)
print(history.history)
#{'loss': [1.0881365537643433, 1.084699034690857, 1.081269383430481, 1.0781562328338623, 1.0747418403625488],
# 'output1_loss': [0.87154925, 0.8690172, 0.86648905, 0.8641926, 0.8616721],
# 'output2_loss': [0.21658726, 0.21568182, 0.2147803, 0.21396361, 0.2130697]}
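As a follow-up sketch (the output1_loss/output2_loss keys are taken from the printed dict above), the per-output curves can be pulled out of history.history directly, e.g. for plotting or logging:
# Pull the per-output loss curves out of history.history.
total_loss = history.history['loss']          # aggregate loss per epoch
out1_loss = history.history['output1_loss']   # loss of the "output1" layer
out2_loss = history.history['output2_loss']   # loss of the "output2" layer
for epoch, (l, l1, l2) in enumerate(zip(total_loss, out1_loss, out2_loss), start=1):
    print(f"epoch {epoch}: total={l:.4f} output1={l1:.4f} output2={l2:.4f}")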

What are other weight types for model wrappers in Keras?

from keras.applications.resnet50 import ResNet50
from keras.preprocessing import image
from keras.applications.resnet50 import preprocess_input, decode_predictions
import numpy as np
model = ResNet50(weights='imagenet')
In this code there is the "wrapper" (that's what it's referred to as) ResNet50. What other types of weights can I use for this? I tried looking around, but I don't even understand the source code, and there is nothing conclusive there either.
You can find it in the Keras docs or in the GitHub code.
There are only two options: either None if you just want the architecture without the weights, or 'imagenet' to load the ImageNet weights.
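For illustration, a minimal sketch of those two options:
from keras.applications.resnet50 import ResNet50
random_init = ResNet50(weights=None)        # architecture only, randomly initialized weights
pretrained = ResNet50(weights='imagenet')   # downloads and loads the ImageNet weights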
Edit: how to use your own weights:
# Take a DenseNet201
backbone = tf.keras.applications.DenseNet201(input_shape=input_shape, weights=None, include_top=False)

# Change the model a little bit, because why not
input_image = tf.keras.layers.Input(shape=input_shape)
x = backbone(input_image)
x = tf.keras.layers.Conv2D(classes, (3, 3), padding='same', name='final_conv')(x)
x = tf.keras.layers.Activation(activation, name=activation)(x)
model = tf.keras.Model(input_image, x)

#... some additional code

# training part
optimizer = tf.keras.optimizers.Adam(lr=FLAGS.learning_rate)
model.compile(loss=loss,
              optimizer=optimizer,
              metrics=['accuracy', f1_m, recall_m, precision_m])
callbacks = [tf.keras.callbacks.ModelCheckpoint(filepath=ckpt_name)]
model.fit(train_generator, validation_data=validation_generator, validation_freq=1, epochs=10, callbacks=callbacks)
# using the callback, weights will be saved to ckpt_name each epoch

# Inference part: just re-instantiate the model (the lines after the "Change the model" comment)
model.load_weights(ckpt_name)
results = model.predict(test_generator, verbose=1)
You don't need to change the model, obviously; you could have used x = backbone(input_image) directly and then model = tf.keras.Model(input_image, x).

Dropout layer directly in tensorflow: how to train?

After I created my model in Keras, I want to get the gradients and apply them directly in TensorFlow with the tf.train.AdamOptimizer class. However, since I am using a Dropout layer, I don't know how to tell the model whether it is in training mode or not. The training keyword is not accepted. This is the code:
net_input = Input(shape=(1,))
net_1 = Dense(50)
net_2 = ReLU()
net_3 = Dropout(0.5)
net = Model(net_input, net_3(net_2(net_1(net_input))))
#mycost = ...
optimizer = tf.train.AdamOptimizer()
gradients = optimizer.compute_gradients(mycost, var_list=[net.trainable_weights])
# perform some operations on the gradients
# gradients = ...
trainstep = optimizer.apply_gradients(gradients)
I get the same behavior with and without dropout layer, even with dropout rate=1. How to solve this?
As @Sharky already said, you can use the training argument when invoking the call() method of the Dropout class. However, if you want to train in TensorFlow graph mode, you need to pass a placeholder and feed it a boolean value during training. Here is an example of fitting Gaussian blobs, applicable to your case:
import tensorflow as tf
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import ReLU
from tensorflow.keras.layers import Input
from tensorflow.keras import Model

x_train, y_train = make_blobs(n_samples=10,
                              n_features=2,
                              centers=[[1, 1], [-1, -1]],
                              cluster_std=1)
x_train, x_test, y_train, y_test = train_test_split(
    x_train, y_train, test_size=0.2)

# `istrain` indicates whether it is inference or training
istrain = tf.placeholder(tf.bool, shape=())
y = tf.placeholder(tf.int32, shape=(None))

net_input = Input(shape=(2,))
net_1 = Dense(2)
net_2 = Dense(2)
net_3 = Dropout(0.5)
net = Model(net_input, net_3(net_2(net_1(net_input)), training=istrain))

xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=y, logits=net.output)
loss_fn = tf.reduce_mean(xentropy)
optimizer = tf.train.AdamOptimizer(0.01)
grads_and_vars = optimizer.compute_gradients(loss_fn,
                                             var_list=[net.trainable_variables])
trainstep = optimizer.apply_gradients(grads_and_vars)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    l1 = loss_fn.eval({net_input: x_train,
                       y: y_train,
                       istrain: True})   # apply dropout
    print(l1)  # 1.6264652
    l2 = loss_fn.eval({net_input: x_train,
                       y: y_train,
                       istrain: False})  # no dropout
    print(l2)  # 1.5676715
    sess.run(trainstep, feed_dict={net_input: x_train,
                                   y: y_train,
                                   istrain: True})  # train with dropout
Keras layers inherit from the tf.keras.layers.Layer class, and the Keras API handles this internally in model.fit(). When a Keras Dropout layer is used in a pure TensorFlow training loop, it supports a training argument in its call function.
So you can control it with
dropout = tf.keras.layers.Dropout(rate, noise_shape, seed)(prev_layer, training=is_training)
From official TF docs
Note: The following optional keyword arguments are reserved for specific uses:
- training: Boolean scalar tensor or Python boolean indicating whether the call is meant for training or inference.
- mask: Boolean input mask. If the layer's call method takes a mask argument (as some Keras layers do), its default value will be set to the mask generated for inputs by the previous layer (if the input did come from a layer that generated a corresponding mask, i.e. if it came from a Keras layer with masking support).
https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dropout#call
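For completeness, in TF 2.x with eager execution (a sketch, not part of the original answers), the same control is passed directly when calling the model inside a custom training loop:
import tensorflow as tf

# Minimal sketch, assuming TF 2.x eager execution.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(50, activation='relu'),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1),
])
optimizer = tf.keras.optimizers.Adam()

x = tf.random.normal((32, 1))
y = tf.random.normal((32, 1))

with tf.GradientTape() as tape:
    pred = model(x, training=True)   # dropout is active
    loss = tf.reduce_mean(tf.square(pred - y))
grads = tape.gradient(loss, model.trainable_variables)
optimizer.apply_gradients(zip(grads, model.trainable_variables))

pred_eval = model(x, training=False)  # dropout disabled at inference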

How to cache layer activations in Keras?

I am training a NN in Keras in which the first layers have fixed (non-trainable) weights.
The computation performed by these layers is quite intensive during training. It would make sense to cache the layer activations for each input and reuse them when the same input data is passed in the next epoch, to save computation time.
Is it possible to achieve this behaviour in Keras?
You could separate your model into two different models. For example, in the following snippet x_ would correspond to your intermediate activations:
from keras.models import Model
from keras.layers import Input, Dense
import numpy as np
nb_samples = 100
in_dim = 2
h_dim = 3
out_dim = 1
a = Input(shape=(in_dim,))
b = Dense(h_dim, trainable=False)(a)
model1 = Model(a, b)
model1.compile('sgd', 'mse')
c = Input(shape=(h_dim,))
d = Dense(out_dim)(c)
model2 = Model(c, d)
model2.compile('sgd', 'mse')
x = np.random.rand(nb_samples, in_dim)
y = np.random.rand(nb_samples, out_dim)
x_ = model1.predict(x) # Shape=(nb_samples, h_dim)
model2.fit(x_, y)
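At inference time (a short sketch building on the snippet above), the two halves can simply be chained, or wrapped back into a single functional model:
# Run new data through both halves at prediction time.
x_new = np.random.rand(5, in_dim)
y_pred = model2.predict(model1.predict(x_new))

# Or wrap the two trained halves back into one end-to-end model for convenience.
full_model = Model(a, model2(model1(a)))
y_pred_full = full_model.predict(x_new)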
