I am trying to get a fine-tuned MobileNetV3Small running in JavaScript. Unfortunately tfjs does not support the Rescaling layer yet. That shouldn't matter too much though, since I can rescale the image beforehand. Now I would like to get rid of the Rescaling layer in the model, but am failing to do so.
tf.keras' model.layers.pop does not seem to work (see e.g. here).
So I tried to disassemble the layers, like here, skip the Rescaling layer, and assemble them into a model again. The problem is that MobileNetV3 has skip connections that are merged by Add layers taking several inputs, so I end up with:
ValueError: A merge layer should be called on a list of inputs.
Any ideas on how to solve it? Any help would be greatly appreciated!
Here's the code I used for (dis)assembling:
import tensorflow as tf
from tensorflow import keras

# Create the model with the undesired layer
base = tf.keras.applications.MobileNetV3Small(input_shape=(224,224,3), include_top=False, weights='imagenet', minimalistic=True)
# `predictions` (the output tensor of the fine-tuned head) is not shown in this snippet
model = keras.Model(inputs=base.input, outputs=predictions)
# Disassemble
layers = [l for l in model.layers]
new_in = keras.Input((224,224,3))
x = new_in
# Assemble again, but skip layer no. 1, the Rescaling layer
for idx, l in enumerate(layers[2:]):
    l.trainable = False
    x = l(x)  # fails at the first Add layer, which expects a list of inputs
results = tf.keras.Model(inputs=new_in, outputs=x)
# Results in the mentioned error
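One possible way around rebuilding the graph by hand, offered only as a sketch: recent TensorFlow releases give the MobileNetV3 constructors an include_preprocessing argument, and setting it to False builds the network without the Rescaling layer. The snippet assumes a TensorFlow version that has this argument and uses weights=None purely to avoid a download. For a model that is already fine-tuned, the trained weights would still have to be transferred to the rebuilt model, which is not shown here.

```python
import tensorflow as tf

# Assumption: this TensorFlow version supports `include_preprocessing`.
# weights=None only avoids downloading the ImageNet weights for this sketch.
base = tf.keras.applications.MobileNetV3Small(
    input_shape=(224, 224, 3),
    include_top=False,
    weights=None,
    minimalistic=True,
    include_preprocessing=False,  # build the graph without the Rescaling layer
)

# Collect the class names of all layers and check that no Rescaling layer is left.
layer_types = [type(layer).__name__ for layer in base.layers]
has_rescaling = "Rescaling" in layer_types
print("Rescaling layer present:", has_rescaling)
```

With preprocessing disabled, the model expects inputs that are already scaled to the [-1, 1] range, so that scaling would have to happen on the JavaScript side.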
Ng's Convolutional Neural Network class's Week 2 Lab on using Transfer Learning with MobileNetV2 (summary: https://github.com/EhabR98/Transfer-Learning-with-MobileNetV2) and an additional tutorial (https://blog.roboflow.com/how-to-train-mobilenetv2-on-a-custom-dataset/) both begin like this:
base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE, include_top=False, weights='imagenet')
base_model.trainable = False
They then proceed to add one or more pooling layers, a Dropout layer and a 1-unit Dense layer to the end, apply a BinaryCrossentropy loss and some kind of optimizer, then train it on some custom data. Let's call this custom model "model2", as Ng's lab does.
Here's what the Coursera class model looks like. It's important to include it here because the variable base_model is assigned in two different scopes throughout the Coursera lab (before this, it was defined outside of any function, as base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE, include_top=True, weights='imagenet'); base_model.trainable = False)
# tfl is tensorflow.keras.layers; preprocess_input is tf.keras.applications.mobilenet_v2.preprocess_input
def alpaca_model(image_shape=IMG_SIZE, data_augmentation=data_augmenter()):
    input_shape = image_shape + (3,)
    base_model = tf.keras.applications.MobileNetV2(input_shape=input_shape, include_top=False, weights='imagenet')
    base_model.trainable = False
    inputs = tf.keras.Input(shape=input_shape)
    x = data_augmentation(inputs)
    x = preprocess_input(x)
    x = base_model(x, training=False)
    x = tfl.GlobalAveragePooling2D()(x)
    x = tfl.Dropout(0.2)(x)
    prediction_layer = tfl.Dense(1)
    outputs = prediction_layer(x)
    model = tf.keras.Model(inputs, outputs)
    return model
model2 = alpaca_model()
base_learning_rate = 0.001
initial_epochs = 5
model2.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=base_learning_rate), loss=tf.keras.losses.BinaryCrossentropy(from_logits=True), metrics=["accuracy"])
history = model2.fit(train_dataset, validation_data=validation_dataset, epochs=initial_epochs)
This performs OK, getting as much as 80% accuracy
Fine tuning -- Now in both the course lab and the tutorial, they then proceed to "unfreeze" some of the last layers of the internal network so that they can be trained, like so:
fine_tune_at = 120
base_model = model2.layers[4] #totally separate question, but I would love to hear in comments, what this does exactly. It is difficult to Google this.
base_model.trainable = True
print("#/layers in base model: ", len(base_model.layers))
for layer in base_model.layers[:fine_tune_at]:
    layer.trainable = False
loss_function = tf.keras.losses.BinaryCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.Adam(learning_rate=base_learning_rate*0.1)
metrics = ['accuracy']
fine_tune_epochs = 5
total_epochs = initial_epochs + fine_tune_epochs
Up until this point, I'm satisfied, I can clearly see what is going on, but then:
model2.compile(loss=loss_function,optimizer=optimizer,metrics=metrics)
history_fine = model2.fit(train_dataset, epochs=total_epochs, initial_epoch=history.epoch[-1], validation_data = validation_dataset)
This leads to a marked improvement in results, which confused me: I was expecting base_model to be passed in somehow. I didn't imagine that altering some other variable, one that was never passed into model2, would come into play.
So given all of that context, the question is: How is altering the base_model affecting model2?
If the above example from the Coursera lab is as confusing to you as it is to me, the example shown on https://blog.roboflow.com/how-to-train-mobilenetv2-on-a-custom-dataset/ as mentioned above is much simpler and contains much less ambiguity as base_model is defined only once. Regardless, the same dynamic applies and I'm equally confused on both. Thanks again for your time
Regarding your side question:
"totally separate question, but I would love to hear in comments, what this does exactly."
The following line gets the MobileNetV2 model:
base_model = model2.layers[4]
Why 4? Because the first layer is the input, the second is the data augmentation (a Sequential model), the third and fourth are the image preprocessing (divide by 127.5, then subtract 1, to get values between -1 and 1), and the fifth layer (index 4) is MobileNetV2. The remaining layers are your top-net.
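Regarding "How is altering the base_model affecting model2":
base_model = model2.layers[4] does not copy anything; it is a reference to the same MobileNetV2 object that lives inside model2. Changing trainable on it (or on its layers) therefore changes model2 itself, and the subsequent model2.compile() picks up the new settings. During the first pass (transfer learning), all layers of MNv2 are frozen, so their weights and biases remain intact. For the second pass (fine tuning), the last convolution layers (blocks 13 to 16 and the last Conv2D) are unfrozen so that training can modify the weights and biases of the base model. Therefore, the weights of those layers will change during training.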
To view the full model summary with nested models, use:
>>> model2.summary(line_length=125, expand_nested=True, show_trainable=True)
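The point that model2.layers[i] hands back the very same object, not a copy, can be checked with a much smaller model. The sketch below is only an illustration: tiny_base is a hypothetical two-layer stand-in for MobileNetV2 (so nothing is downloaded), and the layer sizes are arbitrary.

```python
import tensorflow as tf

# Hypothetical tiny stand-in for MobileNetV2.
inner = tf.keras.Sequential(
    [
        tf.keras.Input(shape=(4,)),
        tf.keras.layers.Dense(3),
        tf.keras.layers.Dense(2),
    ],
    name="tiny_base",
)
inner.trainable = False  # transfer-learning phase: base fully frozen

inputs = tf.keras.Input(shape=(4,))
x = inner(inputs, training=False)
outputs = tf.keras.layers.Dense(1)(x)
outer = tf.keras.Model(inputs, outputs)

# Index 0 is the InputLayer, index 1 is the nested base model.
base = outer.layers[1]
same_object = base is inner
frozen_count = len(outer.trainable_weights)  # only the head's kernel and bias

# Fine-tuning phase: unfreeze the base, then re-freeze its first layer.
base.trainable = True
for layer in base.layers[:1]:
    layer.trainable = False
unfrozen_count = len(outer.trainable_weights)  # head + last base layer

print(same_object, frozen_count, unfrozen_count)
```

Because base and inner are the same object, the outer model sees the change in trainable immediately; recompiling afterwards is what makes the training step use the new set of trainable weights.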
I'll go ahead and post an answer that speaks to the problem with Professor Ng's Convolutional Neural Networks course, Week 2 assignment, "Transfer Learning with MobileNet", in the hope that other students find it and realize that they are not crazy and that the lab was poorly coded.
I'm not sure how it appears to work in Jupyter, but the main reason I was having problems with this lab was that base_model was defined several times within it. It should have been defined only once. Even worse, base_model was redefined inside the alpaca_model() function, and that local variable is not accessible outside the function's scope. I'm not in the industry, but it is just plain terrible coding to redefine an already-defined variable inside a function and then use the name again outside of it.
Once I took base_model out of the function and defined it beforehand, everything worked perfectly, not just on the computer but in my head.
I have a Keras-model (let's call it full model), which was already trained and now I would like to create a new submodel using layers m to n of the full model.
E.g. full model has 10 layers and my submodel shall comprise layers 3 to 8
For the case that m=0, the task is trivial as one can use: (assume we want to go to layer 5)
full_model = ... # anything we load from a h5-file
submodel=tf.keras.Model(inputs=full_model.inputs, outputs=full_model.layers[5].output)
# =>
submodel.summary()
tf.keras.utils.plot_model(submodel, to_file = ...)
So, we can use the submodel, get its summary and also get the png-plot of the submodel-architecture.
The concrete problem now is that I don't know how to do this if, for example, we want to take the last layers of the model. I always get a GraphDisconnected error then.
The only workaround I found was to loop over the layers manually (as the function below, "create_submodel", does). In my case I cannot use this, because the model is quite complex: the layers do not simply follow one another but are nested, i.e. the architecture plot is not a straight series of layers but a "tree" with many different branches.
So: is there a way to create a submodel (from layer "m" to layer "n") of a "full" model without simple, naive looping through the layers (as demonstrated in the function below)?
Thanks very much!
def create_submodel(full_model, start_layer_number=None, end_layer_number=None):
    layers = tf.keras.layers
    if start_layer_number is None:
        start_layer_number = 0
    if end_layer_number is None:
        end_layer_number = len(full_model.layers)
    inp_shape = full_model.layers[start_layer_number].input.shape[1:]
    inp = layers.Input(shape=(inp_shape))
    x = inp
    for i in range(start_layer_number, end_layer_number):
        print(i, full_model.layers[i].name)
        x = full_model.layers[i](x)
    out = x
    sub_model = tf.keras.Model(inputs=inp, outputs=out)
    sub_model.summary()
    return sub_model
input_word = Input(shape=(max_len,))
model = Embedding(input_dim=num_words, output_dim=50, input_length=max_len)(input_word)
model = SpatialDropout1D(0.1)(model)
model = Bidirectional(LSTM(units=100, return_sequences=True, recurrent_dropout=0.1))(model)
out = TimeDistributed(Dense(num_tags, activation="softmax"))(model)
#out = Dense(num_tags, activation="softmax")(model)
model = Model(input_word, out)
model.summary()
I get the same result when I use just Dense layer or with TimeDistributed. In which case should I use TimeDistributed?
TimeDistributed is only necessary for certain layers that cannot handle additional dimensions in their implementation. E.g. MaxPool2D only works on 4D tensors (shape batch x height x width x channels) and will crash if you, say, add a time dimension:
import tensorflow as tf
tfkl = tf.keras.layers
a = tf.random.normal((16, 32, 32, 3))
tfkl.MaxPool2D()(a) # this works
a = tf.random.normal((16, 5, 32, 32, 3)) # added a 5th dimension
tfkl.MaxPool2D()(a) # this will crash
Here, adding TimeDistributed will fix it:
tfkl.TimeDistributed(tfkl.MaxPool2D())(a) # works with a being 5d!
However, many layers already support arbitrary input shapes and will automatically distribute the computations over those dimensions. One of these is Dense -- it is always applied to the last axis in your input and distributed over all others, so TimeDistributed isn't necessary. In fact, as you noted, it changes nothing about the output.
Still, it may change how exactly the computation is done. I'm not sure about this, but I would wager that not using TimeDistributed and relying on the Dense implementation itself may be more efficient.
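To make the "applied to the last axis and distributed over all others" claim concrete, here is a small NumPy sketch rather than Keras code. W and b are hypothetical stand-ins for a Dense layer's kernel and bias, and the explicit loop over time steps mimics what TimeDistributed does conceptually.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, time_steps, features, units = 2, 5, 3, 4

x = rng.normal(size=(batch, time_steps, features))
W = rng.normal(size=(features, units))  # stand-in for a Dense kernel
b = rng.normal(size=(units,))           # stand-in for a Dense bias

# Dense on a 3D input: contract the last axis with the kernel, broadcast the rest.
dense_out = x @ W + b  # shape (batch, time_steps, units)

# TimeDistributed(Dense), conceptually: apply the same weights to every time step.
per_step = np.stack(
    [x[:, t, :] @ W + b for t in range(time_steps)], axis=1
)

same = np.allclose(dense_out, per_step)
print(dense_out.shape, same)
```

Both paths use the same weights on every time step, which is why wrapping Dense in TimeDistributed does not change the output.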
According to the book Zero to Deep Learning by Francesco Mosconi in chapter 7:
If we want the model return an output sequence to be compared with the
sequence of values in the labels, we will use the TimeDistributed
layer wrapper around our output Dense layer. This method of training
is called Teacher Forcing. If we didn’t create output sequences we
wouldn't need Teacher Forcing(i.e. wouldn't need TimeDistributed wrapper).
I am training an autoencoder constructed using the Sequential API in Keras. I'd like to create separate models that implement the encoding and decoding functions. I know from examples how to do this with the functional API, but I can't find an example of how it's done with the Sequential API. The following sample code is my starting point:
input_dim = 2904
encoding_dim = 4
hidden_dim = 128
# instantiate model
autoencoder = Sequential()
# 1st hidden layer
autoencoder.add(Dense(hidden_dim, input_dim=input_dim, use_bias=False))
autoencoder.add(BatchNormalization())
autoencoder.add(Activation('elu'))
autoencoder.add(Dropout(0.5))
# encoding layer
autoencoder.add(Dense(encoding_dim, use_bias=False))
autoencoder.add(BatchNormalization())
autoencoder.add(Activation('elu'))
# autoencoder.add(Dropout(0.5))
# 2nd hidden layer
autoencoder.add(Dense(hidden_dim, use_bias=False))
autoencoder.add(BatchNormalization())
autoencoder.add(Activation('elu'))
autoencoder.add(Dropout(0.5))
# output layer
autoencoder.add(Dense(input_dim))
I realize I can select individual layers using autoencoder.layers[i], but I don't know how to associate a new model with a range of such layers. I naively tried the following:
encoder = Sequential()
for i in range(0,7):
    encoder.add(autoencoder.layers[i])
decoder = Sequential()
for i in range(7,12):
    decoder.add(autoencoder.layers[i])
print(encoder.summary())
print(decoder.summary())
which seemingly worked for the encoder part (a valid summary was shown), but the decoder part generated an error:
This model has not yet been built. Build the model first by calling build() or calling fit() with some data. Or specify input_shape or batch_input_shape in the first layer for automatic build.
Since the input shape of a middle layer (here, autoencoder.layers[7]) is not explicitly set, a model that has it as its first layer is not built automatically (building involves creating the weight tensors for the layers in the model). Therefore, you need to call the build method explicitly and set the input shape:
decoder.build(input_shape=(None, encoding_dim)) # note that batch axis must be included
As a side note, there is no need to call print on model.summary(), since it would print the result by itself.
Another way that also works:
input_img = Input(shape=(encoding_dim,))
previous_layer = input_img
for i in range(bottleneck_layer, len(autoencoder.layers)):  # bottleneck_layer = index of the bottleneck layer + 1
    next_layer = autoencoder.layers[i](previous_layer)
    previous_layer = next_layer
decoder = Model(input_img, next_layer)
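As a quick sanity check on the approaches above, the following sketch splits a small Sequential autoencoder into an encoder and a decoder that share the original layers, then confirms that chaining them reproduces the full model's output. The layer sizes and the split index are hypothetical and much smaller than in the question; BatchNormalization and Dropout are left out to keep the example short.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

input_dim, hidden_dim, encoding_dim = 20, 8, 4  # hypothetical sizes

autoencoder = tf.keras.Sequential([
    tf.keras.Input(shape=(input_dim,)),
    layers.Dense(hidden_dim, activation="elu"),
    layers.Dense(encoding_dim, activation="elu"),  # bottleneck
    layers.Dense(hidden_dim, activation="elu"),
    layers.Dense(input_dim),
])

split = 2  # number of layers that belong to the encoder

# Giving each sub-model its own Input means it is built right away,
# so no explicit build() call is needed.
encoder = tf.keras.Sequential(
    [tf.keras.Input(shape=(input_dim,))] + autoencoder.layers[:split]
)
decoder = tf.keras.Sequential(
    [tf.keras.Input(shape=(encoding_dim,))] + autoencoder.layers[split:]
)

x = np.random.default_rng(0).normal(size=(3, input_dim)).astype("float32")
full = np.asarray(autoencoder(x, training=False))
chained = np.asarray(decoder(encoder(x, training=False), training=False))
matches = np.allclose(full, chained, atol=1e-5)
print(matches)
```

The sub-models reuse the layer objects of the autoencoder, so they share its weights: training the autoencoder further also updates the encoder and decoder.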
I've trained the following model for some timeseries in Keras:
input_layer = Input(batch_shape=(56, 3864))
first_layer = Dense(24, input_dim=28, activation='relu',
                    activity_regularizer=None,
                    kernel_regularizer=None)(input_layer)
first_layer = Dropout(0.3)(first_layer)
second_layer = Dense(12, activation='relu')(first_layer)
second_layer = Dropout(0.3)(second_layer)
out = Dense(56)(second_layer)
model_1 = Model(input_layer, out)
Then I defined a new model with the trained layers of model_1 and added dropout layers with a different rate, drp, to it:
input_2 = Input(batch_shape=(56, 3864))
first_dense_layer = model_1.layers[1](input_2)
first_dropout_layer = model_1.layers[2](first_dense_layer)
new_dropout = Dropout(drp)(first_dropout_layer)
snd_dense_layer = model_1.layers[3](new_dropout)
snd_dropout_layer = model_1.layers[4](snd_dense_layer)
new_dropout_2 = Dropout(drp)(snd_dropout_layer)
output = model_1.layers[5](new_dropout_2)
model_2 = Model(input_2, output)
Then I get the prediction results of these two models as follows:
result_1 = model_1.predict(test_data, batch_size=56)
result_2 = model_2.predict(test_data, batch_size=56)
I was expecting to get completely different results, because the second model has new dropout layers and these two models are different (IMO), but that's not the case. Both generate the same result. Why is that happening?
As I mentioned in the comments, the Dropout layer is turned off in inference phase (i.e. test mode), so when you use model.predict() the Dropout layers are not active. However, if you would like to have a model that uses Dropout both in training and inference phase, you can pass training argument when calling it, as suggested by François Chollet:
# ...
new_dropout = Dropout(drp)(first_dropout_layer, training=True)
# ...
Alternatively, if you have already trained your model and now want to use it in inference mode while keeping the Dropout layers (and possibly other layers that behave differently in the training and inference phases, such as BatchNormalization) active, you can define a backend function that takes the model's inputs as well as the Keras learning phase:
from keras import backend as K
func = K.function(model.inputs + [K.learning_phase()], model.outputs)
# to use it pass 1 to set the learning phase to training mode
outputs = func([input_arrays] + [1.])
Your question has a simple solution in recent versions of TensorFlow: you can set the training argument of the call method to True.
You can run code like this:
model(input,training=True)
With training=True, TensorFlow applies the Dropout layers even when the model is used for inference.
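The effect of the training flag can be seen on a single Dropout layer. This is a minimal sketch with an arbitrary rate of 0.5 and an all-ones input, not part of the original models.

```python
import numpy as np
import tensorflow as tf

drop = tf.keras.layers.Dropout(0.5)
x = np.ones((1, 1000), dtype="float32")

off = np.asarray(drop(x, training=False))  # inference behaviour: identity
on = np.asarray(drop(x, training=True))    # dropout forced on

print("unchanged with training=False:", np.array_equal(off, x))
print("values seen with training=True:", np.unique(on))
```

With training=True, some entries are zeroed and the surviving ones are scaled up by 1 / (1 - rate), so the output differs from call to call.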
As there are already some working code solutions above, I will simply add a few more details regarding dropout during inference to prevent confusion.
Based on the original paper, Dropout layers randomly turn off neurons (set their outputs to zero) during training to reduce overfitting. However, once training is finished and we start testing the model, no neurons are dropped, so all units contribute to the decision at inference time. As a result, each downstream unit receives a larger total input than it saw during training. To compensate, a scaling factor is applied. To be more precise, if a unit is retained with probability p during training, the outgoing weights of that unit are multiplied by p at prediction time.
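The scaling argument can be checked numerically. The NumPy sketch below compares the paper's scheme (scale by p at test time) with the "inverted" variant that Keras' Dropout layer uses (scale by 1 / p during training, nothing at test time). Here p is the keep probability, so Keras' rate argument would correspond to 1 - p; the activation value and the sample count are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.8        # probability of keeping a unit (Keras `rate` would be 1 - p)
x = 3.0        # activation of a single unit
n = 200_000    # number of simulated training steps

keep = rng.random(n) < p  # one Bernoulli(p) keep/drop decision per step

# Paper-style dropout: drop during training, multiply by p at test time.
paper_train_mean = np.mean(keep * x)
paper_test = p * x

# Inverted dropout: divide by p during training, leave test time untouched.
inverted_train_mean = np.mean(keep * x / p)
inverted_test = x

print(paper_train_mean, paper_test)
print(inverted_train_mean, inverted_test)
```

In both schemes the average training-time output matches the test-time output, which is the purpose of the scaling factor; the two variants differ only in where the factor is applied.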