I am implementing a simple multi-task model in Keras, based on the code given in the documentation under the heading of shared layers.
I know that in multi-task learning we share some of the initial layers of the model and make the final layers specific to the individual tasks, as per the link.
I have the following two cases in the Keras API: in the first I use keras.layers.concatenate, while in the other I do not use keras.layers.concatenate at all.
I am posting the code as well as the resulting model for each case below.
Case-1 code
import keras
from keras.layers import Input, LSTM, Dense
from keras.models import Model
from keras.utils.vis_utils import plot_model
tweet_a = Input(shape=(280, 256))
tweet_b = Input(shape=(280, 256))
# This layer can take as input a matrix
# and will return a vector of size 64
shared_lstm = LSTM(64)
# When we reuse the same layer instance
# multiple times, the weights of the layer
# are also being reused
# (it is effectively *the same* layer)
encoded_a = shared_lstm(tweet_a)
encoded_b = shared_lstm(tweet_b)
# We can then concatenate the two vectors:
merged_vector = keras.layers.concatenate([encoded_a, encoded_b], axis=-1)
# And add a logistic regression on top
predictions1 = Dense(1, activation='sigmoid')(merged_vector)
predictions2 = Dense(1, activation='sigmoid')(merged_vector)
# We define a trainable model linking the
# tweet inputs to the predictions
model = Model(inputs=[tweet_a, tweet_b], outputs=[predictions1, predictions2])
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])
Case-1 Model
Case-2 code
import keras
from keras.layers import Input, LSTM, Dense
from keras.models import Model
from keras.utils.vis_utils import plot_model
tweet_a = Input(shape=(280, 256))
tweet_b = Input(shape=(280, 256))
# This layer can take as input a matrix
# and will return a vector of size 64
shared_lstm = LSTM(64)
# When we reuse the same layer instance
# multiple times, the weights of the layer
# are also being reused
# (it is effectively *the same* layer)
encoded_a = shared_lstm(tweet_a)
encoded_b = shared_lstm(tweet_b)
# And add a logistic regression on top
predictions1 = Dense(1, activation='sigmoid')(encoded_a)
predictions2 = Dense(1, activation='sigmoid')(encoded_b)
# We define a trainable model linking the
# tweet inputs to the predictions
model = Model(inputs=[tweet_a, tweet_b], outputs=[predictions1, predictions2])
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])
Case-2 Model
In both cases, only the LSTM layer is shared. In case-1 we have keras.layers.concatenate, but in case-2 we don't have any keras.layers.concatenate.
My question is: which one is multi-task learning, case-1 or case-2? Moreover, what is the function of keras.layers.concatenate in case-1?
Both are multi-task models, as this only depends on whether there are multiple outputs, with one task associated to each output.
The difference is that your first model explicitly concatenates the features produced by the shared layer, so both output tasks can consider information from both inputs. The second model only has a direct connection from each input to one of the outputs, without considering the other input. The only link between the two branches there is that they share the LSTM weights.
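For concreteness, here is a minimal sketch of how either model would be trained; the dummy arrays below are made-up stand-ins for real data and are not part of the original code:
import numpy as np
# Made-up data: 8 samples, each a (280, 256) tweet encoding
a = np.random.rand(8, 280, 256)
b = np.random.rand(8, 280, 256)
y1 = np.random.randint(0, 2, size=(8, 1))  # labels for task 1
y2 = np.random.randint(0, 2, size=(8, 1))  # labels for task 2
# One target array per output; Keras sums the two binary cross-entropy losses
model.fit([a, b], [y1, y2], epochs=2, batch_size=4)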
Related
I am working on a project where I must use four different images as inputs. I take the inputs, run them through a simple model, and detect one of two classes. I am really struggling with the actual setup of the model.
I am not sure if I am on the right track. I haven't been able to run the code, since I am still unsure of the model's architecture. The code below shows how my model is currently set up. I already have all the images: I took one image and split it into four, and then using the four images I want to detect one of two classes. If this doesn't make sense, or if I am taking the wrong direction with this, please help.
# import the necessary packages
import keras
from keras.models import Model
from keras.layers import Input, Dense, Flatten, Conv2D, MaxPooling2D
# define four sets of inputs
inputA = Input(shape=(200, 200, 3))
inputB = Input(shape=(200, 200, 3))
inputC = Input(shape=(200, 200, 3))
inputD = Input(shape=(200, 200, 3))
# merge all input images
merged = keras.layers.Concatenate(axis=1)([inputA, inputB, inputC, inputD])
# the first branch operates on the first input through the fourth input
dense1 = keras.layers.Conv2D(16, (2, 2), activation='relu')(merged)
output = keras.layers.Conv2D(16, (2, 2), activation='relu')(dense1)
# apply a FC layer and then a regression prediction on the
# combined outputs
z = keras.layers.Dense(128, activation="relu")(output)
z = keras.layers.Dense(9, activation="softmax")(z)
# build and compile the model over the four input branches
model = Model(inputs=[inputA, inputB, inputC, inputD], outputs=z)
model.compile(loss='categorical_crossentropy', optimizer="adam", metrics=['accuracy'])
model.summary()
I think the model is set up correctly to run using four images as inputs, but now I need to set up the classes. My idea is to set up a CNN that looks at four different pictures of the same tree and uses those four images to detect the type of tree. I want all four images to belong to one tree, then detect that tree, and after detecting it move on to the next four images for a different tree.
Thanks everyone I am very grateful for all your inputs and help.
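A minimal sketch of one way to wire this up, assuming four 200x200 RGB views per tree and two tree classes; all layer sizes here are illustrative guesses, not taken from the question:
import keras
from keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense, Concatenate
from keras.models import Model
inputA = Input(shape=(200, 200, 3))
inputB = Input(shape=(200, 200, 3))
inputC = Input(shape=(200, 200, 3))
inputD = Input(shape=(200, 200, 3))
# Stack the four views along the channel axis so one conv stack sees all of them
merged = Concatenate(axis=-1)([inputA, inputB, inputC, inputD])  # -> (200, 200, 12)
x = Conv2D(16, (3, 3), activation='relu')(merged)
x = MaxPooling2D((2, 2))(x)
x = Conv2D(32, (3, 3), activation='relu')(x)
x = MaxPooling2D((2, 2))(x)
x = Flatten()(x)  # flatten before the dense head
x = Dense(128, activation='relu')(x)
z = Dense(2, activation='softmax')(x)  # two tree classes
model = Model(inputs=[inputA, inputB, inputC, inputD], outputs=z)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])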
I would like to use the first layers of a pre-trained model, say Xception, up to and including the add_5 layer, to extract features from an input. Then I want to pass the output of the add_5 layer to a dense layer that will be trainable.
How can I implement this idea?
Generally you need to reuse the layers from one model, pass them as input to the remaining layers, and create a Model object with the input and output of the combined model specified. See, for example, alexnet.py from https://github.com/FHainzl/Visualizing_Understanding_CNN_Implementation.git.
They have
from keras.models import Model
from keras.layers import Input, Dense, Activation
from keras.layers.convolutional import Conv2D, MaxPooling2D, ZeroPadding2D

def alexnet_model():
    inputs = Input(shape=(3, 227, 227))
    conv_1 = Conv2D(96, 11, strides=4, activation='relu', name='conv_1')(inputs)
    …
    prediction = Activation("softmax", name="softmax")(dense_3)
    m = Model(inputs=inputs, outputs=prediction)
    return m
and then they take this returned model, the desired intermediate layer and make a model that returns this layer’s outputs:
def _sub_model(self):
    highest_layer_name = 'conv_{}'.format(self.highest_layer_num)
    highest_layer = self.base_model.get_layer(highest_layer_name)
    return Model(inputs=self.base_model.input,
                 outputs=highest_layer.output)
You will need a similar thing:
highest_layer = self.base_model.get_layer('add_5')
then continue it like
my_dense = Dense(..., name='my_dense')(highest_layer.output)
…
and finish with
return Model(inputs=self.base_model.input,
             outputs=my_prediction)
Since highest_layer is a layer (a node in the graph), not a tensor (an edge in the graph), you need to add .output to highest_layer to get the tensor it produces.
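Putting those pieces together for the Xception case, a minimal sketch could look like the following; it assumes the standard keras.applications Xception with a layer actually named 'add_5', and the pooling layer, head sizes, and class count are illustrative guesses:
from keras.applications.xception import Xception
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model
# Load Xception without its classification head
base_model = Xception(weights='imagenet', include_top=False)
# Freeze the pre-trained feature extractor (layers past add_5 are simply unused)
for layer in base_model.layers:
    layer.trainable = False
# Cut the graph at add_5 (assuming this layer name exists in your model)
highest_layer = base_model.get_layer('add_5')
# New trainable head on top of the add_5 features
x = GlobalAveragePooling2D()(highest_layer.output)  # collapse spatial dimensions
my_dense = Dense(256, activation='relu', name='my_dense')(x)
my_prediction = Dense(10, activation='softmax')(my_dense)  # 10 classes assumed
model = Model(inputs=base_model.input, outputs=my_prediction)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')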
I am not sure exactly how to combine models when the upper part is already a finished model. Maybe something like:
model_2_lowest_layer = model_2.get_layer(lowest_layer_name)
upper_part_model = Model(inputs=model_2_lowest_layer.input,
                         outputs=model_2.output)
upper_part = upper_part_model(highest_layer.output)
return Model(inputs=self.base_model.input,
             outputs=upper_part)
I've been following Towards Data Science's tutorial about word2vec and skip-gram models, but I stumbled upon a problem that I cannot solve, despite searching a lot and trying multiple unsuccessful solutions.
https://towardsdatascience.com/understanding-feature-engineering-part-4-deep-learning-methods-for-text-data-96c44370bbfa
The step where it shows how to build the skip-gram model architecture seems deprecated because of its use of the Merge layer from keras.layers.
What I tried to do was translate the author's code, which is implemented in the Sequential API of Keras, to the functional API in order to work around the deprecation of the Merge layer, replacing it with the keras.layers.Dot layer. However, I'm still stuck at this step of merging the two models (word and context) into the final model, whose architecture must be like this:
Here's the code that the author used:
from keras.layers import Merge
from keras.layers.core import Dense, Reshape
from keras.layers.embeddings import Embedding
from keras.models import Sequential

# build skip-gram architecture
word_model = Sequential()
word_model.add(Embedding(vocab_size, embed_size,
                         embeddings_initializer="glorot_uniform",
                         input_length=1))
word_model.add(Reshape((embed_size,)))

context_model = Sequential()
context_model.add(Embedding(vocab_size, embed_size,
                            embeddings_initializer="glorot_uniform",
                            input_length=1))
context_model.add(Reshape((embed_size,)))

model = Sequential()
model.add(Merge([word_model, context_model], mode="dot"))
model.add(Dense(1, kernel_initializer="glorot_uniform", activation="sigmoid"))
model.compile(loss="mean_squared_error", optimizer="rmsprop")
And here is my attempt to translate the Sequential code implementation into the Functional one:
from keras import models
from keras import layers
from keras import Input, Model
word_input = Input(shape=(1,))
word_x = layers.Embedding(vocab_size, embed_size, embeddings_initializer='glorot_uniform')(word_input)
word_reshape = layers.Reshape((embed_size,))(word_x)
word_model = Model(word_input, word_reshape)
context_input = Input(shape=(1,))
context_x = layers.Embedding(vocab_size, embed_size, embeddings_initializer='glorot_uniform')(context_input)
context_reshape = layers.Reshape((embed_size,))(context_x)
context_model = Model(context_input, context_reshape)
model_input = layers.dot([word_model, context_model], axes=1, normalize=False)
model_output = layers.Dense(1, kernel_initializer='glorot_uniform', activation='sigmoid')
model = Model(model_input, model_output)
However, when executed, the following error is returned:
ValueError: Layer dot_5 was called with an input that isn't a symbolic tensor. Received type: <class 'keras.engine.training.Model'>. Full input: [<keras.engine.training.Model object at 0x...>, <keras.engine.training.Model object at 0x...>]. All inputs to the layer should be tensors.
I'm a total beginner with the functional API of Keras, and I would be grateful for some guidance on how I could feed the context and word models into the dot layer to achieve the architecture in the image.
You are passing Model instances to the layer; however, as the error suggests, you need to pass Keras tensors (i.e. the outputs of layers or models) to layers in Keras. You have two options here. One is to use the .output attribute of the Model instance, like this:
dot_output = layers.dot([word_model.output, context_model.output], axes=1, normalize=False)
or equivalently, you can use the output tensors directly:
dot_output = layers.dot([word_reshape, context_reshape], axes=1, normalize=False)
Further, you need to apply the Dense layer that follows on dot_output, and pass instances of the Input layer as the inputs of Model. Therefore:
model_output = layers.Dense(1, kernel_initializer='glorot_uniform',
                            activation='sigmoid')(dot_output)
model = Model([word_input, context_input], model_output)
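Putting it all together, the corrected functional version reads as follows (vocab_size and embed_size are assumed to be defined as in your code):
from keras import layers
from keras import Input, Model

word_input = Input(shape=(1,))
word_x = layers.Embedding(vocab_size, embed_size,
                          embeddings_initializer='glorot_uniform')(word_input)
word_reshape = layers.Reshape((embed_size,))(word_x)

context_input = Input(shape=(1,))
context_x = layers.Embedding(vocab_size, embed_size,
                             embeddings_initializer='glorot_uniform')(context_input)
context_reshape = layers.Reshape((embed_size,))(context_x)

# dot() takes tensors, not Model instances
dot_output = layers.dot([word_reshape, context_reshape], axes=1, normalize=False)
model_output = layers.Dense(1, kernel_initializer='glorot_uniform',
                            activation='sigmoid')(dot_output)
model = Model([word_input, context_input], model_output)
model.compile(loss='mean_squared_error', optimizer='rmsprop')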
I am coming from TensorFlow and, learning more about Keras, I came across this notation. I looked in the documentation but couldn't find any examples. The syntax in question is a function followed by a variable in parentheses.
model_input = Input(shape=input_shape)
z = model_input
z = Dropout(dropout_prob[0])(z) # Not sure what this means
The only idea I had is that this may be a layer multiplication, but I am not sure. Thank you for your help.
It's part of the functional API of Keras; as stated in the docs here:
A layer instance is callable (on a tensor), and it returns a tensor.
Input tensor(s) and output tensor(s) can then be used to define a Model.
Such a model can be trained just like Keras Sequential models.
So, following up on your code (which is only a portion), you probably first imported
from keras.layers import Input, Dropout
Then the variable model_input holds a tensor:
model_input = Input(shape=input_shape)
And then, since a layer instance is callable on a tensor and returns a tensor:
z = model_input
z = Dropout(dropout_prob[0])(z) # This returns another tensor
After that, for example, you can follow with a model like this:
from keras.models import Model
model = Model(inputs=model_input, outputs=z)
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(data, labels) # starts training
So now, it is easy to reuse trained models: you can treat any model as if it were a layer, by calling it on a tensor, like this:
x = Input(shape=(784,))
y = model(x)
I am trying to fine-tune a model using Keras, according to this description: https://keras.io/applications/#inceptionv3
However, during training I discovered that the output of the network does not remain constant after training when using the same input, even though all relevant layers were frozen, which is not what I want.
I constructed the following toy example to investigate this:
import keras.applications.resnet50 as resnet50
from keras.layers import Dense, Flatten, Input
from keras.models import Model
from keras.utils import to_categorical
from keras import optimizers
from keras.preprocessing.image import ImageDataGenerator
import numpy as np
# data
i = np.random.rand(1,224,224,3)
X = np.random.rand(32,224,224,3)
y = to_categorical(np.random.randint(751, size=32), num_classes=751)
# model
base_model = resnet50.ResNet50(weights='imagenet', include_top=False, input_tensor=Input(shape=(224,224,3)))
layer = base_model.output
layer = Flatten(name='myflatten')(layer)
layer = Dense(751, activation='softmax', name='fc751')(layer)
model = Model(inputs=base_model.input, outputs=layer)
# freeze all layers
for layer in model.layers:
    layer.trainable = False
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
# features and predictions before training
feat0 = base_model.predict(i)
pred0 = model.predict(i)
weights0 = model.layers[-1].get_weights()
# before training output is consistent
feat00 = base_model.predict(i)
pred00 = model.predict(i)
print(np.allclose(feat0, feat00)) # True
print(np.allclose(pred0, pred00)) # True
# train
model.fit(X, y, batch_size=2, epochs=3, shuffle=False)
# features and predictions after training
feat1 = base_model.predict(i)
pred1 = model.predict(i)
weights1 = model.layers[-1].get_weights()
# these are not the same
print(np.allclose(feat0, feat1)) # False
# Optionally: printing shows they are in fact very different
# print(feat0)
# print(feat1)
# these are not the same
print(np.allclose(pred0, pred1)) # False
# Optionally: printing shows they are in fact very different
# print(pred0)
# print(pred1)
# these are the same and loss does not change during training
# so layers were actually frozen
print(np.allclose(weights0[0], weights1[0])) # True
# Check again if all layers were in fact untrainable
for layer in model.layers:
    assert layer.trainable == False  # All succeed
# Being overly cautious, also checking base_model
for layer in base_model.layers:
    assert layer.trainable == False  # All succeed
Since I froze all layers, I fully expected both the predictions and the features to be equal, but surprisingly they aren't.
So I am probably making some kind of mistake, but I can't figure out what. Any suggestions would be greatly appreciated!
So the problem seems to be that the model uses batch normalization layers, which update their internal state (i.e. their weights) based on the data seen during training. This happens even when their trainable flag has been set to False, and as their weights are updated, the output changes as well. You can check this by using the code in the question and changing the following lines:
this
weights0 = model.layers[-1].get_weights()
to
weights0 = model.layers[2].get_weights()
and this
weights1 = model.layers[-1].get_weights()
to
weights1 = model.layers[2].get_weights()
(or the index of any other batch normalization layer).
Because then the following assertion will no longer hold:
print(np.allclose(weights0, weights1)) # Now this is False
As far as I am aware, there is currently no solution for this.
See also my issue on Keras' Github page.
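For what it's worth, newer Keras versions (the tf.keras API in TensorFlow 2.x) let you force batch normalization into inference mode by calling the frozen base model with training=False, which stops the moving statistics from updating. A minimal sketch of that pattern, reusing the shapes from the question:
import tensorflow as tf

base_model = tf.keras.applications.ResNet50(
    weights='imagenet', include_top=False, input_shape=(224, 224, 3))
base_model.trainable = False  # freezes all weights, including BN gamma/beta

inputs = tf.keras.Input(shape=(224, 224, 3))
# training=False keeps BatchNorm in inference mode, so its moving
# mean/variance are not updated during fit()
x = base_model(inputs, training=False)
x = tf.keras.layers.Flatten()(x)
outputs = tf.keras.layers.Dense(751, activation='softmax')(x)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')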
One more reason for unstable training could be that you are using a very small batch size, i.e. batch_size=2. At the very least, use batch_size=32. This value is too small for batch normalization to reliably estimate the training distribution statistics (mean and variance). These mean and variance values are first used to normalize the distribution, followed by the learning of the beta and gamma parameters (the actual distribution).
Check the following links for more details:
In the introduction and related work, the authors criticize BatchNorm; do check Figure 1: https://arxiv.org/pdf/1803.08494.pdf
Nice article on "Curse of Batch Norm": https://towardsdatascience.com/curse-of-batch-normalization-8e6dd20bc304