I've been following Towards Data Science's tutorial about word2vec and skip-gram models, but I stumbled upon a problem that I cannot solve, despite searching about it a lot and trying multiple unsuccessful solutions.
https://towardsdatascience.com/understanding-feature-engineering-part-4-deep-learning-methods-for-text-data-96c44370bbfa
The step that shows how to build the skip-gram model architecture no longer works, because it relies on the deprecated Merge layer from keras.layers.
What I tried to do was translate the author's code, which uses the Sequential API of Keras, to the Functional API, replacing the deprecated Merge layer with the keras.layers.Dot layer. However, I'm still stuck at the step of merging the two models (word and context) into the final model, whose architecture must be like this:
Here's the code that the author used:
from keras.layers import Merge
from keras.layers.core import Dense, Reshape
from keras.layers.embeddings import Embedding
from keras.models import Sequential
# build skip-gram architecture
word_model = Sequential()
word_model.add(Embedding(vocab_size, embed_size,
embeddings_initializer="glorot_uniform",
input_length=1))
word_model.add(Reshape((embed_size, )))
context_model = Sequential()
context_model.add(Embedding(vocab_size, embed_size,
embeddings_initializer="glorot_uniform",
input_length=1))
context_model.add(Reshape((embed_size,)))
model = Sequential()
model.add(Merge([word_model, context_model], mode="dot"))
model.add(Dense(1, kernel_initializer="glorot_uniform", activation="sigmoid"))
model.compile(loss="mean_squared_error", optimizer="rmsprop")
And here is my attempt to translate the Sequential code implementation into the Functional one:
from keras import models
from keras import layers
from keras import Input, Model
word_input = Input(shape=(1,))
word_x = layers.Embedding(vocab_size, embed_size, embeddings_initializer='glorot_uniform')(word_input)
word_reshape = layers.Reshape((embed_size,))(word_x)
word_model = Model(word_input, word_reshape)
context_input = Input(shape=(1,))
context_x = layers.Embedding(vocab_size, embed_size, embeddings_initializer='glorot_uniform')(context_input)
context_reshape = layers.Reshape((embed_size,))(context_x)
context_model = Model(context_input, context_reshape)
model_input = layers.dot([word_model, context_model], axes=1, normalize=False)
model_output = layers.Dense(1, kernel_initializer='glorot_uniform', activation='sigmoid')
model = Model(model_input, model_output)
However, when executed, the following error is returned:
ValueError: Layer dot_5 was called with an input that isn't a symbolic
tensor. Received type: . Full
input: [,
]. All inputs to
the layer should be tensors.
I'm a total beginner with the Keras Functional API. I would be grateful for some guidance on how to feed the context and word models into the dot layer to achieve the architecture in the image.
You are passing Model instances to the layer; however, as the error suggests, layers in Keras must be given Keras tensors (i.e. outputs of layers or models). You have two options here. One is to use the .output attribute of the Model instance like this:
dot_output = layers.dot([word_model.output, context_model.output], axes=1, normalize=False)
or equivalently, you can use the output tensors directly:
dot_output = layers.dot([word_reshape, context_reshape], axes=1, normalize=False)
Further, you need to apply the Dense layer to dot_output, and pass the Input tensors as the inputs of Model. Therefore:
model_output = layers.Dense(1, kernel_initializer='glorot_uniform',
activation='sigmoid')(dot_output)
model = Model([word_input, context_input], model_output)
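Putting the pieces above together, the whole model can be sketched as follows. This is a minimal sketch, not the tutorial author's code: the values of vocab_size and embed_size are made up for illustration, the imports assume tensorflow.keras, and the layers.Dot class is used in place of the layers.dot function (the two are interchangeable here).

```python
import numpy as np
from tensorflow.keras import Input, Model, layers

# Hypothetical sizes, chosen only for illustration.
vocab_size = 50
embed_size = 8

# Word branch: one token id -> one embedding vector of length embed_size.
word_input = Input(shape=(1,), dtype="int32", name="word_input")
word_x = layers.Embedding(vocab_size, embed_size,
                          embeddings_initializer="glorot_uniform")(word_input)
word_vec = layers.Reshape((embed_size,))(word_x)

# Context branch: same structure, separate weights.
context_input = Input(shape=(1,), dtype="int32", name="context_input")
context_x = layers.Embedding(vocab_size, embed_size,
                             embeddings_initializer="glorot_uniform")(context_input)
context_vec = layers.Reshape((embed_size,))(context_x)

# Merge the two branches on tensors (not on Model objects).
dot_output = layers.Dot(axes=1)([word_vec, context_vec])
model_output = layers.Dense(1, kernel_initializer="glorot_uniform",
                            activation="sigmoid")(dot_output)

model = Model([word_input, context_input], model_output)
model.compile(loss="mean_squared_error", optimizer="rmsprop")

# Smoke test on random (word, context) id pairs.
words = np.random.randint(0, vocab_size, size=(4, 1))
contexts = np.random.randint(0, vocab_size, size=(4, 1))
preds = model.predict([words, contexts], verbose=0)
print(preds.shape)
```

The intermediate word_model and context_model objects from the question are not needed for this; the final Model traces the graph back from model_output to the two Input tensors.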
Is it possible to access pre-activation tensors in a Keras Model? For example, given this model:
import tensorflow as tf
image_ = tf.keras.Input(shape=[224, 224, 3], batch_size=1)
vgg19 = tf.keras.applications.VGG19(include_top=False, weights='imagenet', input_tensor=image_, input_shape=image_.shape[1:], pooling=None)
the usual way to access layers is:
intermediate_layer_model = tf.keras.models.Model(inputs=image_, outputs=[vgg19.get_layer('block1_conv2').output])
intermediate_layer_model.summary()
This gives the ReLU outputs for a layer, while I would like the ReLU inputs. I tried doing this:
graph = tf.function(vgg19, [tf.TensorSpec.from_tensor(image_)]).get_concrete_function().graph
outputs = [graph.get_tensor_by_name(tname) for tname in [
'vgg19/block4_conv3/BiasAdd:0',
'vgg19/block4_conv4/BiasAdd:0',
'vgg19/block5_conv1/BiasAdd:0'
]]
intermediate_layer_model = tf.keras.models.Model(inputs=image_, outputs=outputs)
intermediate_layer_model.summary()
but I get the error
ValueError: Unknown graph. Aborting.
The only workaround I've found is to edit the model file to manually expose the intermediates, turning every layer like this:
x = layers.Conv2D(256, (3, 3), activation="relu", padding="same", name="block3_conv1")(x)
into 2 layers where the 1st one can be accessed before activations:
x = layers.Conv2D(256, (3, 3), activation=None, padding="same", name="block3_conv1")(x)
x = layers.ReLU(name="block3_conv1_relu")(x)
Is there a way to access pre-activation tensors in a Model without essentially editing TensorFlow 2 source code, or reverting to TensorFlow 1, which had full flexibility in accessing intermediates?
There is a way to access pre-activation layers for pretrained Keras models using TF version 2.7.0. Here's how to access two intermediate pre-activation outputs from VGG19 in a single forward pass.
Initialize VGG19 model. We can omit top layers to avoid loading unnecessary parameters into memory.
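from copy import deepcopy
import numpy as np
import tensorflow as tf
from tensorflow.keras import Model
vgg19 = tf.keras.applications.VGG19(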
include_top=False,
weights="imagenet"
)
This is the important part: create a deepcopy of the intermediate layer from which you would like to take the features, change the activation of the conv layer to linear (i.e. no activation), rename the layer (otherwise two layers in the model will have the same name, which raises an error), and finally pass the output of the previous layer through the copied conv layer.
# for more intermediate features wrap a loop around it to avoid copy paste
b5c4_layer = deepcopy(vgg19.get_layer("block5_conv4"))
b5c4_layer.activation = tf.keras.activations.linear
b5c4_layer._name = b5c4_layer.name + str("_preact")
b5c4_preact_output = b5c4_layer(vgg19.get_layer("block5_conv3").output)
b2c2_layer = deepcopy(vgg19.get_layer("block2_conv2"))
b2c2_layer.activation = tf.keras.activations.linear
b2c2_layer._name = b2c2_layer.name + str("_preact")
b2c2_preact_output = b2c2_layer(vgg19.get_layer("block2_conv1").output)
Finally, get the outputs and check if they equal post-activation outputs when we apply ReLU-activation.
vgg19_features = Model(vgg19.input, [b2c2_preact_output, b5c4_preact_output])
vgg19_features_control = Model(vgg19.input, [vgg19.get_layer("block2_conv2").output, vgg19.get_layer("block5_conv4").output])
b2c2_preact, b5c4_preact = vgg19_features(tf.keras.applications.vgg19.preprocess_input(img))
b2c2, b5c4 = vgg19_features_control(tf.keras.applications.vgg19.preprocess_input(img))
print(np.allclose(tf.keras.activations.relu(b2c2_preact).numpy(),b2c2.numpy()))
print(np.allclose(tf.keras.activations.relu(b5c4_preact).numpy(),b5c4.numpy()))
True
True
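If deepcopy and the private _name attribute are a concern, a variation on the same idea is to rebuild the layer from its config and copy the weights across. The sketch below is an assumption-laden illustration rather than the method above: it uses a tiny made-up two-layer network (conv1, conv2) instead of VGG19 so that it runs without downloading weights, and random input instead of a real image.

```python
import numpy as np
from tensorflow.keras import Input, Model, layers

# Tiny stand-in network (hypothetical); the same steps would apply to VGG19 layers.
inputs = Input(shape=(8, 8, 3))
x = layers.Conv2D(4, 3, activation="relu", padding="same", name="conv1")(inputs)
outputs = layers.Conv2D(4, 3, activation="relu", padding="same", name="conv2")(x)
net = Model(inputs, outputs)

# Rebuild conv2 from its config with a linear activation and a new name.
orig = net.get_layer("conv2")
cfg = orig.get_config()
cfg["activation"] = "linear"
cfg["name"] = orig.name + "_preact"
preact_layer = layers.Conv2D.from_config(cfg)

# Calling the layer builds it; afterwards the original weights can be copied in.
preact_output = preact_layer(net.get_layer("conv1").output)
preact_layer.set_weights(orig.get_weights())

# One forward pass returns both the pre- and post-activation tensors.
features = Model(net.input, [preact_output, orig.output])
img = np.random.rand(2, 8, 8, 3).astype("float32")
preact, postact = features.predict(img, verbose=0)

# Applying ReLU by hand to the pre-activations should reproduce the original output.
matches = np.allclose(np.maximum(preact, 0.0), postact, atol=1e-5)
print(matches)
```

Note that the rebuilt layer holds a copy of the weights, so it would need a fresh set_weights call if the original layer were trained further.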
Here's a visualization similar to Fig. 6 of Wang et al. to see the effect in the feature space.
To get the output of each layer, you have to define a Keras function for each layer and evaluate it.
Please refer to the code shown below:
from tensorflow.keras import backend as K
inp = model.input # input
outputs = [layer.output for layer in model.layers] # all layer outputs
functors = [K.function([inp], [out]) for out in outputs] # evaluation functions
For more details on this, please refer to the linked SO answer.
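The same per-layer outputs can also be collected with a single multi-output Model instead of one backend function per layer, which may be more convenient in recent TensorFlow versions. This is a sketch on a small made-up model (the layer names hidden and out are hypothetical), not the code from the linked answer.

```python
import numpy as np
from tensorflow.keras import Input, Model, layers

# Small example model (hypothetical) standing in for `model`.
inputs = Input(shape=(4,))
h = layers.Dense(8, activation="relu", name="hidden")(inputs)
out = layers.Dense(2, activation="softmax", name="out")(h)
model = Model(inputs, out)

# Skip the InputLayer at index 0; it only echoes the input.
layer_names = [layer.name for layer in model.layers[1:]]
extractor = Model(model.input, [layer.output for layer in model.layers[1:]])

# One forward pass yields every layer's output.
batch = np.random.rand(3, 4).astype("float32")
activations = extractor.predict(batch, verbose=0)
shapes = {name: act.shape for name, act in zip(layer_names, activations)}
print(shapes)
```

Keep in mind that these are still post-activation outputs for layers that carry their own activation argument.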
I would like to use the first layers of a pre-trained model (say, Xception up to and including the add_5 layer) to extract features from an input, then pass the output of the add_5 layer to a dense layer that will be trainable.
How can I implement this idea?
Generally, you reuse the layers of one model by feeding their output into the remaining layers, and then create a Model object with the input and output of the combined model specified. For an example, see alexnet.py from https://github.com/FHainzl/Visualizing_Understanding_CNN_Implementation.git.
They have
from keras.models import Model
from keras.layers.convolutional import Conv2D, MaxPooling2D, ZeroPadding2D
def alexnet_model():
inputs = Input(shape=(3, 227, 227))
conv_1 = Conv2D(96, 11, strides=4, activation='relu', name='conv_1')(inputs)
…
prediction = Activation("softmax", name="softmax")(dense_3)
m = Model(input=inputs, output=prediction)
return m
and then they take this returned model, the desired intermediate layer and make a model that returns this layer’s outputs:
def _sub_model(self):
highest_layer_name = 'conv_{}'.format(self.highest_layer_num)
highest_layer = self.base_model.get_layer(highest_layer_name)
return Model(inputs=self.base_model.input,
outputs=highest_layer.output)
You will need something similar:
highest_layer = self.base_model.get_layer('add_5')
then continue it like
my_dense = Dense(..., name='my_dense')(highest_layer.output)
…
and finish with
return Model(inputs=self.base_model.input,
outputs=my_prediction)
Note that highest_layer is a layer object (a graph node), not the tensor it produces (a graph edge), so you need highest_layer.output to get the tensor that can be fed to the next layer.
Not sure how exactly to combine models if the upper one is also ready. Maybe something like
model_2_lowest_layer = model_2.get_layer(lowest_layer_name)
upper_part_model = Model(inputs= model_2_lowest_layer.input,
outputs=model_2.output)
upper_part = upper_part_model(highest_layer.output)
return Model(inputs=self.base_model.input,
outputs=upper_part)
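The first part of the recipe (cut at add_5, add a trainable dense layer) can be sketched end to end as below. To keep it small and runnable without downloading weights, a tiny made-up network stands in for Xception; only the layer name add_5 is taken from the question. The freezing loop and the GlobalAveragePooling2D layer are my own assumptions about what is wanted, not something the question specifies.

```python
from tensorflow.keras import Input, Model, layers

# Stand-in for the pre-trained network (hypothetical). With Xception the
# lookup by name would be the same, assuming it has a layer called "add_5".
base_in = Input(shape=(16, 16, 3))
a = layers.Conv2D(4, 3, padding="same", activation="relu")(base_in)
b = layers.Conv2D(4, 3, padding="same", activation="relu")(a)
s = layers.Add(name="add_5")([a, b])
top = layers.Conv2D(8, 3, activation="relu", name="unused_top")(s)
base_model = Model(base_in, top)

# Freeze the reused layers so that only the new head is trained.
for layer in base_model.layers:
    layer.trainable = False

# Take the tensor produced by add_5 and continue the graph from there.
features = base_model.get_layer("add_5").output
pooled = layers.GlobalAveragePooling2D()(features)  # assumption: flatten by pooling
my_prediction = layers.Dense(10, activation="softmax", name="my_dense")(pooled)

model = Model(inputs=base_model.input, outputs=my_prediction)
model.compile(optimizer="adam", loss="categorical_crossentropy")

new_layer_names = [layer.name for layer in model.layers]
print(new_layer_names)
```

Layers of the base model that come after add_5 (here unused_top) are simply not part of the new model, because Model only keeps the layers on the path from its inputs to its outputs.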
I am implementing a simple multitask model in Keras. I used the code given in the documentation under the heading of shared layers.
I know that in multitask learning, we share some of the initial layers in our model and the final layers are made individual to the specific tasks as per the link.
I have the following two cases in the Keras API: in the first I use keras.layers.concatenate, and in the second I do not.
I am posting the code and the model for each case below.
Case-1 code
import keras
from keras.layers import Input, LSTM, Dense
from keras.models import Model
from keras.models import Sequential
from keras.layers import Dense
from keras.utils.vis_utils import plot_model
tweet_a = Input(shape=(280, 256))
tweet_b = Input(shape=(280, 256))
# This layer can take as input a matrix
# and will return a vector of size 64
shared_lstm = LSTM(64)
# When we reuse the same layer instance
# multiple times, the weights of the layer
# are also being reused
# (it is effectively *the same* layer)
encoded_a = shared_lstm(tweet_a)
encoded_b = shared_lstm(tweet_b)
# We can then concatenate the two vectors:
merged_vector = keras.layers.concatenate([encoded_a, encoded_b], axis=-1)
# And add a logistic regression on top
predictions1 = Dense(1, activation='sigmoid')(merged_vector)
predictions2 = Dense(1, activation='sigmoid')(merged_vector)
# We define a trainable model linking the
# tweet inputs to the predictions
model = Model(inputs=[tweet_a, tweet_b], outputs=[predictions1, predictions2])
model.compile(optimizer='rmsprop',
loss='binary_crossentropy',
metrics=['accuracy'])
Case-1 Model
Case-2 code
import keras
from keras.layers import Input, LSTM, Dense
from keras.models import Model
from keras.models import Sequential
from keras.layers import Dense
from keras.utils.vis_utils import plot_model
tweet_a = Input(shape=(280, 256))
tweet_b = Input(shape=(280, 256))
# This layer can take as input a matrix
# and will return a vector of size 64
shared_lstm = LSTM(64)
# When we reuse the same layer instance
# multiple times, the weights of the layer
# are also being reused
# (it is effectively *the same* layer)
encoded_a = shared_lstm(tweet_a)
encoded_b = shared_lstm(tweet_b)
# And add a logistic regression on top
predictions1 = Dense(1, activation='sigmoid')(encoded_a )
predictions2 = Dense(1, activation='sigmoid')(encoded_b)
# We define a trainable model linking the
# tweet inputs to the predictions
model = Model(inputs=[tweet_a, tweet_b], outputs=[predictions1, predictions2])
model.compile(optimizer='rmsprop',
loss='binary_crossentropy',
metrics=['accuracy'])
Case-2 Model
In both cases, only the LSTM layer is shared. Case-1 uses keras.layers.concatenate, while case-2 does not.
My question is: which one is multitasking, case-1 or case-2? Moreover, what is the function of keras.layers.concatenate in case-1?
Both are multi-task models: that depends only on whether there are multiple outputs, with one task associated with each output.
The difference is that your first model explicitly concatenates features produced by the shared layer, so both output tasks can consider information from both inputs. The second model only has connections from one input directly to one of the outputs, without considering the other input. The only link between models here is that they share the LSTM weights.
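One way to see this difference is to change only the second input and watch the first output. The sketch below rebuilds both cases with much smaller, made-up shapes (5 timesteps, 3 features, 4 LSTM units) so it runs quickly; the helper names build and change_in_first_output are hypothetical.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import Input, Model, layers

tf.keras.utils.set_random_seed(0)  # reproducible random weights

def build(concat):
    """Small versions of the two cases (sizes are illustrative only)."""
    tweet_a = Input(shape=(5, 3))
    tweet_b = Input(shape=(5, 3))
    shared_lstm = layers.LSTM(4)
    encoded_a = shared_lstm(tweet_a)
    encoded_b = shared_lstm(tweet_b)
    if concat:   # case-1: both heads see both encodings
        feat1 = feat2 = layers.concatenate([encoded_a, encoded_b], axis=-1)
    else:        # case-2: each head sees only its own encoding
        feat1, feat2 = encoded_a, encoded_b
    p1 = layers.Dense(1, activation="sigmoid")(feat1)
    p2 = layers.Dense(1, activation="sigmoid")(feat2)
    return Model([tweet_a, tweet_b], [p1, p2])

rng = np.random.default_rng(0)
a = rng.normal(size=(2, 5, 3)).astype("float32")
b = rng.normal(size=(2, 5, 3)).astype("float32")
b_changed = b + 1.0

def change_in_first_output(model):
    """Largest change in predictions1 when only tweet_b is modified."""
    before = model.predict([a, b], verbose=0)[0]
    after = model.predict([a, b_changed], verbose=0)[0]
    return float(np.abs(after - before).max())

delta_case1 = change_in_first_output(build(concat=True))
delta_case2 = change_in_first_output(build(concat=False))
print(delta_case1, delta_case2)
```

In case-2 the change should be zero up to floating-point noise, since predictions1 has no path from tweet_b; in case-1 it should be clearly non-zero.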
I'm kind of a newbie to tensorflow and building neural networks.
I'm trying to make a neural network with the tf.keras api that will take a single input, and give 3 outputs. Here is my code:
import os
import tensorflow as tf
from tensorflow import keras
import numpy as np
train_times = np.array([[1],[2],[3],[4],[5],[6],[7],[8]])
train_sensors = np.array([[0.1,0.15,0.2],[0.25,0.3,0.35],[0.4,0.45,0.5],[0.55,0.6,0.65],[0.7,0.75,0.8],[0.85,0.9,0.95],[0.05,0.33,0.56],[0.8,0.35,0.9]])
test_times = np.array([[1],[2],[3],[4],[5],[6],[7],[8]])
test_sensors = np.array([[0.1,0.15,0.2],[0.25,0.3,0.35],[0.4,0.45,0.5],[0.55,0.6,0.65],[0.7,0.75,0.8],[0.85,0.9,0.95],[0.05,0.33,0.56],[0.8,0.35,0.9]])
print(train_sensors[0].shape)
def create_model():
model = tf.keras.models.Sequential([
keras.layers.Dense(5, activation=tf.nn.relu, input_shape=(1,), name="Input"),
keras.layers.Dense(10,activation=tf.nn.relu, name="Middle"),
keras.layers.Dropout(0.2),
keras.layers.Dense(3, activation=tf.nn.softmax, name="Out")
])
model.compile(optimizer=tf.keras.optimizers.Adam(), loss=tf.keras.losses.sparse_categorical_crossentropy,
metrics=['accuracy'])
return model
model = create_model()
model.summary()
checkpoint_path = "sensor_predict.ckpt"
checkpoint_dir = os.path.dirname(checkpoint_path)
cp_callback = tf.keras.callbacks.ModelCheckpoint(checkpoint_path,save_weights_only=True,verbose=1)
model.fit(x=train_times, y=train_sensors,epochs = 10,validation_data = (test_sensors, test_times), callbacks = [cp_callback])
I have specified that the last layer should have three outputs, but I get this error every time I run it:
ValueError: Error when checking target: expected Out to have shape (1,) but got array with shape (3,)
I can't figure out why it seems to think I want a single output from the network.
NOTE: The dataset I am using is not what I will actually use. I'm just trying to get a functional network, and then I'll generate the data later.
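Your loss function (tf.keras.losses.sparse_categorical_crossentropy) expects each target to be a single integer class index, which is why Keras asks for targets of shape (1,); categorical_crossentropy is the variant that takes one-hot vectors. Your targets are three continuous values per sample, so this is a regression problem: change the loss to tf.keras.losses.mse, for example, and I think it will work. You will probably also want a linear or sigmoid activation on the output layer instead of softmax, since softmax forces the three outputs to sum to 1. Note also that validation_data should be given as (inputs, targets), i.e. (test_times, test_sensors).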
See tensorflow docs for the definition.
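As a rough illustration of that change, here is a cut-down version of the model with a regression loss. It is only a sketch under assumptions: the sensor values are random placeholders, the output layer is made linear, and the checkpoint callback and dropout layer are left out for brevity.

```python
import numpy as np
from tensorflow import keras

# Placeholder data: one time value in, three sensor readings out.
train_times = np.arange(1, 9, dtype="float32").reshape(-1, 1)
train_sensors = np.random.rand(8, 3).astype("float32")

model = keras.Sequential([
    keras.Input(shape=(1,)),
    keras.layers.Dense(5, activation="relu", name="Input"),
    keras.layers.Dense(10, activation="relu", name="Middle"),
    keras.layers.Dense(3, name="Out"),  # linear output, one unit per sensor
])

# Mean squared error accepts float targets of shape (batch, 3).
model.compile(optimizer="adam", loss="mse")

# validation_data is ordered (inputs, targets).
history = model.fit(train_times, train_sensors, epochs=2, verbose=0,
                    validation_data=(train_times, train_sensors))
preds = model.predict(train_times, verbose=0)
print(preds.shape)
```

With sparse_categorical_crossentropy, by contrast, the targets would have to be integer class ids such as np.array([0, 2, 1, ...]), one per sample.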
I am coming from tensorflow, learning more about keras, and came across this notation. I looked in the documentation but couldn't find any examples. The syntax is a function call followed by a variable in parentheses.
model_input = Input(shape=input_shape)
z = model_input
z = Dropout(dropout_prob[0])(z) # Not sure what this means
My only idea is that this may be a layer multiplication, but I am not sure. Thank you for your help.
It's part of the Keras functional API; as stated in the docs:
A layer instance is callable (on a tensor), and it returns a tensor
Input tensor(s) and output tensor(s) can then be used to define a
Model
Such a model can be trained just like Keras Sequential models.
So following up your code (that is only a portion), first probably you imported
from keras.layers import Input, Dropout
Then the variable model_input holds the tensor returned by Input:
model_input = Input(shape=input_shape)
And then a layer instance is callable on a tensor, and returns a tensor
z = model_input
z = Dropout(dropout_prob[0])(z) # This returns another tensor
After that, for example, you can follow with a model like this:
from keras.models import Model
model = Model(inputs=model_input, outputs=z)
model.compile(optimizer='rmsprop',
loss='categorical_crossentropy',
metrics=['accuracy'])
model.fit(data, labels) # starts training
So now, it is easy to reuse trained models: you can treat any model as if it were a layer, by calling it on a tensor, like this:
x = Input(shape=(784,))
y = model(x)
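The Dropout(dropout_prob[0])(z) notation itself is ordinary Python: the first pair of parentheses constructs an object, and the second pair calls that object, which works for any class that defines __call__. The toy class below is purely hypothetical (it is not a Keras class) and only mimics the pattern.

```python
class Scale:
    """Toy stand-in for a layer (hypothetical, not part of Keras)."""

    def __init__(self, factor):
        # Configuration stored at construction time, like Dropout's rate.
        self.factor = factor

    def __call__(self, values):
        # Runs when the instance is used like a function.
        return [v * self.factor for v in values]


z = [1.0, 2.0, 3.0]
z = Scale(0.5)(z)            # construct the object, then call it on z

layer = Scale(0.5)           # the same thing written in two steps
z_two_step = layer([1.0, 2.0, 3.0])

print(z)  # [0.5, 1.0, 1.5]
```

So it is not a multiplication of layers: Dropout(dropout_prob[0]) creates the layer, and (z) applies it to the tensor z.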