I am trying to implement ADDA (Adversarial Discriminative Domain Adaptation) in Keras. Here is my code:
class ADDA_Images(object):
    def __init__(self, modelInput):
        self.img_rows = 28
        self.img_cols = 28
        self.channels = 3
        self.img_shape = (self.img_rows, self.img_cols, self.channels)
        optimizer = opt.Adam(0.001)

        self.source_generator = self.build_generator(modelInput)
        self.target_generator = self.build_generator(modelInput)

        outputFeatureExtraction = layers.Input(shape=self.target_generator.output_shape[1:])
        self.source_classificator = self.build_classifier(outputFeatureExtraction)
        self.discriminator_model = self.build_discriminator(outputFeatureExtraction)
        self.discriminator_model.compile(optimizer, loss='binary_crossentropy', metrics=['acc'])
        self.discriminator_model.name = 'disk'

        # source model: source generator + classifier
        input = layers.Input(shape=self.img_shape)
        fe_rep = self.source_generator(input)
        cl = self.source_classificator(fe_rep)
        self.source_model = Model(input, cl)
        self.source_model.compile(optimizer, loss='categorical_crossentropy', metrics=['acc'])

        # target model: target generator + shared classifier
        input = layers.Input(shape=self.img_shape)
        fe_rep = self.target_generator(input)
        cl = self.source_classificator(fe_rep)
        self.target_model = Model(input, cl)
        self.target_model.compile(optimizer, loss='categorical_crossentropy', metrics=['acc'])

        # combined model: target generator + frozen discriminator
        self.combined_model = Sequential()
        self.combined_model.add(self.target_generator)
        self.combined_model.add(self.discriminator_model)
        self.combined_model.get_layer('disk').trainable = False
        self.combined_model.compile(optimizer, loss='binary_crossentropy', metrics=['acc'])

        print('Source model')
        self.source_model.summary()
        print('Target model')
        self.target_model.summary()
        print('Discriminator')
        self.discriminator_model.summary()
        print('Combined model')
        self.combined_model.summary()

    def build_generator(self, modelInput):
        gen = layers.Conv2D(filters=20, kernel_size=5, padding='valid')(modelInput)
        gen = layers.MaxPooling2D(pool_size=2, strides=2)(gen)
        gen = layers.Conv2D(filters=50, kernel_size=5, padding='valid')(gen)
        gen = layers.MaxPooling2D(pool_size=2, strides=2)(gen)
        gen = layers.Flatten()(gen)
        model = Model(modelInput, gen)
        print('Generator summary')
        model.summary()
        return model

    def build_classifier(self, modelInput):
        cl = layers.Dense(3072, activation='relu')(modelInput)
        cl = layers.Dense(2048, activation='relu')(cl)
        cl = layers.Dense(10, activation='softmax')(cl)
        model = Model(modelInput, cl)
        print('Classifier summary')
        model.summary()
        return model

    def build_discriminator(self, modelInput):
        disc = layers.Dense(500, activation='relu')(modelInput)
        disc = layers.Dense(500, activation='relu')(disc)
        disc = layers.Dense(2, activation='softmax')(disc)
        model = Model(modelInput, disc)
        print('Discriminator summary')
        model.summary()
        return model
However, it seems that target_generator is not connected to the target model. I load the target model from the pretrained source model and then train the discriminator and the combined model in the ADDA way, but the target model never changes: it has the same predictions (accuracies and losses) as the source model all the time.
Here are the model summaries:
Source model
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_2 (InputLayer) (None, 28, 28, 3) 0
_________________________________________________________________
model_1 (Model) (None, 800) 26570
_________________________________________________________________
model_3 (Model) (None, 10) 8774666
=================================================================
Total params: 8,801,236
Trainable params: 8,801,236
Non-trainable params: 0
_________________________________________________________________
Target model
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_3 (InputLayer) (None, 28, 28, 3) 0
_________________________________________________________________
model_2 (Model) (None, 800) 26570
_________________________________________________________________
model_3 (Model) (None, 10) 8774666
=================================================================
Total params: 8,801,236
Trainable params: 8,801,236
Non-trainable params: 0
_________________________________________________________________
Discriminator
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 800) 0
_________________________________________________________________
dense_4 (Dense) (None, 500) 400500
_________________________________________________________________
dense_5 (Dense) (None, 500) 250500
_________________________________________________________________
dense_6 (Dense) (None, 2) 1002
=================================================================
Total params: 1,304,004
Trainable params: 652,002
Non-trainable params: 652,002
_________________________________________________________________
Combined model
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
model_2 (Model) (None, 800) 26570
_________________________________________________________________
disk (Model) (None, 2) 652002
=================================================================
Total params: 678,572
Trainable params: 26,570
Non-trainable params: 652,002
I compared the outputs of target_model's second layer (which should be target_generator, by construction) with the outputs of target_generator itself on the same input, and they are not the same. So it seems those two models are not connected the way the summaries report.
Can someone help me figure out what is wrong?
I am using Keras 2 with the TensorFlow backend.
The problem was in the training part: I loaded the pretrained source model into the target model with load_model, and that caused problems because it replaced the reference to the generator model. Instead of load_model, I should have used load_weights.
So, loading the pretrained model in a way that works and does not break the references is:
source_model = load_model(modelName)
target_model.set_weights(source_model.get_weights())
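As a sanity check, here is a minimal sketch (assuming adda is an instance of the ADDA_Images class above and modelName is the path to the saved source model) that copies the weights and then verifies the shared generator reference is left intact:
adda = ADDA_Images(modelInput)
source_model = load_model(modelName)
adda.target_model.set_weights(source_model.get_weights())  # copy weights without replacing model references
# the target model and the combined model should still point at the very same generator object
assert adda.target_model.layers[1] is adda.target_generator
assert adda.combined_model.layers[0] is adda.target_generator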
Related
I have a batch of images, hence the shape [None, 256, 256, 3] (the batch dimension is set to None for practical purposes).
I am trying to implement a layer that calculates the average of each of the images or frames in the batch, to produce the shape [None, 1] or [None, 1, 1, 1]. I looked into tf.keras.layers.Average, but apparently it calculates across the batch, returning a tensor of the same shape.
I tried implementing the following custom layer:
class ElementMean(tf.keras.layers.Layer):
    def __init__(self, **kwargs):
        super(ElementMean, self).__init__(**kwargs)

    def call(self, inputs):
        tensors = []
        for ii in range(inputs.shape[0] if inputs.shape[0] is not None else 1):
            tensors.append(inputs[ii, ...])
        return tf.keras.layers.Average()(tensors)
but when it is used:
import tensorflow as tf

x = tf.keras.Input([256, 256, 3], None)
y = ElementMean()(x)
model = tf.keras.Model(inputs=x, outputs=y)
model.compile()
model.summary()
tf.keras.utils.plot_model(
    model,
    show_shapes=True,
    show_dtype=True,
    show_layer_activations=True,
    show_layer_names=True
)
I get the result:
Model: "model"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) [(None, 256, 256, 3)] 0
element_mean (ElementMean) (256, 256, 3) 0
=================================================================
Total params: 0
Trainable params: 0
Non-trainable params: 0
_________________________________________________________________
This is entirely wrong.
I also tried this change on the call:
def call(self, inputs):
    tensors = []
    for ii in range(inputs.shape[0] if inputs.shape[0] is not None else 1):
        tensors.append(tf.reduce_mean(inputs[ii, ...]))
    return tf.convert_to_tensor(tensors)
This in turn results in:
Model: "model"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) [(None, 256, 256, 3)] 0
element_mean (ElementMean) (1,) 0
=================================================================
Total params: 0
Trainable params: 0
Non-trainable params: 0
_________________________________________________________________
This is also wrong.
You can play around with the axes like this:
import tensorflow as tf

class ElementMean(tf.keras.layers.Layer):
    def __init__(self, **kwargs):
        super(ElementMean, self).__init__(**kwargs)

    def call(self, inputs):
        # average over height, width, and channels for each sample in the batch
        return tf.reduce_mean(inputs, axis=(1, 2, 3), keepdims=True)

x = tf.keras.layers.Input([256, 256, 3], None)
em = ElementMean()
y = em(x)
model = tf.keras.Model(x, y)
model.summary()
Model: "model_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) [(None, 256, 256, 3)] 0
element_mean_1 (ElementMean) (None, 1, 1, 1) 0
=================================================================
Total params: 0
Trainable params: 0
Non-trainable params: 0
_________________________________________________________________
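As a quick check (a small sketch with illustrative values), the layer's output should match a per-image mean computed directly with tf.reduce_mean:
batch = tf.random.normal((4, 256, 256, 3))
out = ElementMean()(batch)                      # shape (4, 1, 1, 1), one mean per image
manual = tf.reduce_mean(batch, axis=(1, 2, 3))  # shape (4,)
print(out.shape)
print(tf.reduce_all(tf.abs(tf.squeeze(out) - manual) < 1e-6).numpy())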
There is another way, using segment means, which lets you compute means over heights, widths, and channels while keeping their structure.
Sample: Width x Height x Channels; the mean of each channel represents its data as a single value, and you may summarize them later.
import os
from os.path import exists

import tensorflow as tf
import tensorflow_io as tfio
import matplotlib.pyplot as plt

"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
: Variables
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
PATH = os.path.join('F:\\datasets\\downloads\\Actors\\train\\Pikaploy', '*.tif')
files = tf.data.Dataset.list_files(PATH)
list_file = []
for file in files.take(1):
    image = tf.io.read_file(file)
    image = tfio.experimental.image.decode_tiff(image, index=0)
    image = tf.image.resize(image, [28, 32], method='nearest')
    list_file.append(image)

"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
: Class / Definitions
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
class MyDenseLayer(tf.keras.layers.Layer):
    def __init__(self, num_outputs):
        super(MyDenseLayer, self).__init__()
        self.num_outputs = num_outputs

    def build(self, input_shape):
        self.kernel = self.add_weight("kernel",
                                      shape=[int(input_shape[-1]),
                                             self.num_outputs])

    def call(self, inputs):
        temp = tf.transpose(tf.constant(tf.cast(list_file, dtype=tf.int64), shape=(28, 32, 4), dtype=tf.int64))
        temp = tf.transpose(temp)
        mean = tf.constant(tf.math.segment_mean(temp, tf.ones([28], dtype=tf.int64)).numpy())
        temp = tf.image.rot90(temp)
        mean = tf.constant(tf.math.segment_mean(tf.constant(mean[1::], shape=(32, 4)), tf.ones([32], dtype=tf.int64)).numpy())
        return mean[1::]

layer = MyDenseLayer(10)
sample = tf.transpose(tf.constant(tf.cast(list_file, dtype=tf.int64), shape=(28, 32, 4), dtype=tf.int64))
data = layer(sample)
print(data)
Output: Rx Gx Bx Yx
tf.Tensor([[161 166 171 255]], shape=(1, 4), dtype=int64)
Can anyone help me figure out where the error is here? I am trying to train a model for image captioning in the Persian language. I tried to build my model; below are the code and the model summary.
embeddings_dim = 256
input_image_dim = X_train_image.shape[1]
keras.backend.clear_session()

# image model
input_image = keras.layers.Input(shape=(input_image_dim,), name='input_image')
#input_image_dropout = keras.layers.Dropout(0.4)(input_image)
image_embeddings = keras.layers.Dense(embeddings_dim, activation='tanh', name='image_embeddings')(input_image)

# text model
# Set up the decoder, using `image_embeddings` as initial state.
decoder_inputs = keras.layers.Input(shape=(max_len,))
embeddings = keras.layers.Embedding(len(word2index), embeddings_dim, mask_zero=True)(decoder_inputs)
#embeddings_dropout = keras.layers.Dropout(0.3)(embeddings)
gru = keras.layers.GRU(embeddings_dim)(embeddings, initial_state=image_embeddings)  # , return_sequences=True
#flat = keras.layers.Flatten()(gru)
dense = keras.layers.Dense(embeddings_dim, activation='relu')(gru)
#dense_dropout = keras.layers.Dropout(0.3)(dense)
decoder_outputs = keras.layers.Dense(len(word2index), activation='softmax')(dense)

seq2seq = keras.Model([input_image, decoder_inputs], decoder_outputs)
seq2seq.summary()

# prepare callback
#histories = My_Callback()
model_checkpoint_path = 'models/model.{epoch:02d}-{val_loss:.3f}--{b1:.3f}.hdf5'
checkpoint_callback = keras.callbacks.ModelCheckpoint(model_checkpoint_path, monitor='val_loss', verbose=1, save_best_only=False, save_weights_only=False, mode='auto', period=1)
callbacks = [TQDMNotebookCallback(), Bleu_Callback(), checkpoint_callback]  # [checkpoint_callback, TQDMNotebookCallback(), My_Callback()]

seq2seq.compile(optimizer=keras.optimizers.Adam(), loss='categorical_crossentropy')
seq2seq.fit([X_train_image, X_train_text], y_train_text,
            validation_data=([X_test_image, X_test_text], y_test_text),
            batch_size=1024,
            epochs=10,
            verbose=2,
            callbacks=callbacks)  # add My_Callback() to callbacks to calculate & display BLEU score after each epoch
This is the error after running my code; it gives me KeyError: 'metrics'. Thanks.
Model: "model"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) [(None, 20)] 0 []
input_image (InputLayer) [(None, 1000)] 0 []
embedding (Embedding) (None, 20, 256) 730112 ['input_1[0][0]']
image_embeddings (Dense) (None, 256) 256256 ['input_image[0][0]']
gru (GRU) (None, 256) 394752 ['embedding[0][0]',
'image_embeddings[0][0]']
dense (Dense) (None, 256) 65792 ['gru[0][0]']
dense_1 (Dense) (None, 2852) 732964 ['dense[0][0]']
==================================================================================================
Total params: 2,179,876
Trainable params: 2,179,876
Non-trainable params: 0
__________________________________________________________________________________________________
WARNING:tensorflow:`period` argument is deprecated. Please use `save_freq` to specify the frequency in number of batches seen.
Training: 0%
0/20 [00:00<?, ?it/s]
Epoch 0: 0%
0/237 [00:00<?, ?it/s]
Epoch 1/20
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-32-0fc16141d8f6> in <module>
32 seq2seq.compile(optimizer=keras.optimizers.Adam(), loss='categorical_crossentropy')
33
---> 34 seq2seq.fit([X_train_image, X_train_text], y_train_text,
35 validation_data=([X_test_image, X_test_text], y_test_text),
36 batch_size=1024,
~\anaconda3\lib\site-packages\keras\utils\traceback_utils.py in error_handler(*args, **kwargs)
65 except Exception as e: # pylint: disable=broad-except
66 filtered_tb = _process_traceback_frames(e.__traceback__)
---> 67 raise e.with_traceback(filtered_tb) from None
68 finally:
69 del filtered_tb
~\anaconda3\lib\site-packages\keras_tqdm\tqdm_callback.py in append_logs(self, logs)
134
135 def append_logs(self, logs):
--> 136 metrics = self.params['metrics']
137 for metric, value in six.iteritems(logs):
138 if metric in metrics:
KeyError: 'metrics'
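Judging from the traceback, the KeyError is raised inside keras_tqdm's TQDMNotebookCallback, whose append_logs reads self.params['metrics']; recent tf.keras versions no longer populate a 'metrics' key in the callback params, so that lookup fails. A minimal sketch of a workaround (keeping the other callbacks from the code above) is simply to drop that callback:
callbacks = [Bleu_Callback(), checkpoint_callback]  # omit the incompatible TQDMNotebookCallback
seq2seq.fit([X_train_image, X_train_text], y_train_text,
            validation_data=([X_test_image, X_test_text], y_test_text),
            batch_size=1024,
            epochs=10,
            verbose=2,
            callbacks=callbacks)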
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

class KerasSupervisedModelWrapper(keras.Model):
    def __init__(self, batch_size, **kwargs):
        super().__init__()
        self.batch_size = batch_size

    def summary(self, input_shape):  # temporary fix for a bug
        x = layers.Input(shape=input_shape)
        model = keras.Model(inputs=[x], outputs=self.call(x))
        return model.summary()

class ExampleModel(KerasSupervisedModelWrapper):
    def __init__(self, batch_size):
        super().__init__(batch_size)
        self.conv1 = layers.Conv2D(32, kernel_size=(3, 3), activation='relu')

    def call(self, x):
        x = self.conv1(x)
        return x

model = ExampleModel(15)
model.summary([28, 28, 1])
output:
Model: "model_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_2 (InputLayer) [(None, 28, 28, 1)] 0
conv2d_2 (Conv2D) (None, 26, 26, 32) 320
=================================================================
Total params: 320
Trainable params: 320
Non-trainable params: 0
_________________________________________________________________
I'm writing a wrapper for a Keras model to pre-define some useful methods and variables, as above.
I'd like to modify the wrapper so that it can take a list of layers and compose a model the way keras.Sequential does.
Therefore, I added a Sequential method that assigns a new call method, as below.
class KerasSupervisedModelWrapper(keras.Model):
    ...(continue)...

    @staticmethod
    def Sequential(layers, **kwargs):
        model = KerasSupervisedModelWrapper(**kwargs)
        pipe = keras.Sequential(layers)

        def call(self, x):
            return pipe(x)

        model.call = call
        return model
However, it does not work as I intended. Instead, it shows the error message below.
model = KerasSupervisedModelWrapper.Sequential([
    layers.Conv2D(32, kernel_size=(3, 3), activation="relu")
], batch_size=15)
model.summary((28, 28, 1))
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/tmp/ipykernel_91471/2826773946.py in <module>
1 # model.build((None, 28, 28, 1))
2 # model.compile('adam', loss=keras.losses.SparseCategoricalCrossentropy(), metrics=['accuracy'])
----> 3 model.summary((28, 28, 1))
/tmp/ipykernel_91471/3696340317.py in summary(self, input_shape)
10 def summary(self, input_shape): # temporary fix for a bug
11 x = layers.Input(shape=input_shape)
---> 12 model = keras.Model(inputs=[x], outputs=self.call(x))
13 return model.summary()
14
TypeError: call() missing 1 required positional argument: 'x'
What can I do so that the wrapper builds a keras.Sequential-style model while still using the other properties?
You could try something like this:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

class KerasSupervisedModelWrapper(keras.Model):
    def __init__(self, batch_size, **kwargs):
        super().__init__()
        self.batch_size = batch_size

    def summary(self, input_shape):  # temporary fix for a bug
        x = layers.Input(shape=input_shape)
        model = keras.Model(inputs=[x], outputs=self.call(x))
        return model.summary()

    @staticmethod
    def Sequential(layers, **kwargs):
        model = KerasSupervisedModelWrapper(**kwargs)
        pipe = keras.Sequential(layers)
        model.call = pipe  # assign the Sequential object itself as the call target
        return model

class ExampleModel(KerasSupervisedModelWrapper):
    def __init__(self, batch_size):
        super().__init__(batch_size)
        self.conv1 = layers.Conv2D(32, kernel_size=(3, 3), activation='relu')

    def call(self, x):
        x = self.conv1(x)
        return x

model = ExampleModel(15)
model.summary([28, 28, 1])

model = KerasSupervisedModelWrapper.Sequential([
    layers.Conv2D(32, kernel_size=(3, 3), activation="relu")
], batch_size=15)
model.summary((28, 28, 1))
print(model(tf.random.normal((1, 28, 28, 1))).shape)
Model: "model_9"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_14 (InputLayer) [(None, 28, 28, 1)] 0
conv2d_17 (Conv2D) (None, 26, 26, 32) 320
=================================================================
Total params: 320
Trainable params: 320
Non-trainable params: 0
_________________________________________________________________
Model: "model_10"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_15 (InputLayer) [(None, 28, 28, 1)] 0
sequential_8 (Sequential) (None, 26, 26, 32) 320
=================================================================
Total params: 320
Trainable params: 320
Non-trainable params: 0
_________________________________________________________________
(1, 26, 26, 32)
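The reason the original version fails and this one works: a plain function assigned to an instance attribute is not bound to the instance, so self.call(x) passed only x while def call(self, x) still expected a separate self argument, hence the TypeError. Assigning the keras.Sequential object itself makes model.call a callable that takes a single tensor, so both the summary helper and a direct call like model(tf.random.normal((1, 28, 28, 1))) route through the Sequential pipeline.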
I've defined a complex deep learning model, but for the purpose of this question, I'll use a simple one.
Consider the following:
import tensorflow as tf
from tensorflow.keras import layers, models

def simpleMLP(in_size, hidden_sizes, num_classes, dropout_prob=0.5):
    in_x = layers.Input(shape=(in_size,))
    hidden_x = models.Sequential(name="hidden_layers")
    for i, num_h in enumerate(hidden_sizes):
        hidden_x.add(layers.Dense(num_h, input_shape=(in_size,) if i == 0 else []))
        hidden_x.add(layers.Activation('relu'))
        hidden_x.add(layers.Dropout(dropout_prob))
    out_x = layers.Dense(num_classes, activation='softmax', name='baseline')
    return models.Model(inputs=in_x, outputs=out_x(hidden_x(in_x)))
I will call the function in the following manner:
mdl = simpleMLP(28*28, [500, 300], 10)
Now when I do mdl.summary() I get the following:
Model: "functional_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) [(None, 784)] 0
_________________________________________________________________
hidden_layers (Sequential) (None, 300) 542800
_________________________________________________________________
baseline (Dense) (None, 10) 3010
=================================================================
Total params: 545,810
Trainable params: 545,810
Non-trainable params: 0
_________________________________________________________________
The problem is that the Sequential block is condensed into a single row, showing only its final output shape but the sum total of its parameters.
In my complex model, I have multiple Sequential blocks that are all hidden.
Is there a way to make it be more verbose? Am I doing something wrong in the model definition?
Edit
When using pytorch I don't see the same behaviour, given the following example (taken from here):
import torch
import torch.nn as nn

class MyCNNClassifier(nn.Module):
    def __init__(self, in_c, n_classes):
        super().__init__()
        self.conv_block1 = nn.Sequential(
            nn.Conv2d(in_c, 32, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU()
        )
        self.conv_block2 = nn.Sequential(
            nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU()
        )
        self.decoder = nn.Sequential(
            nn.Linear(32 * 28 * 28, 1024),
            nn.Sigmoid(),
            nn.Linear(1024, n_classes)
        )

    def forward(self, x):
        x = self.conv_block1(x)
        x = self.conv_block2(x)
        x = x.view(x.size(0), -1)  # flatten
        x = self.decoder(x)
        return x
When printing it I get:
MyCNNClassifier(
(conv_block1): Sequential(
(0): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(conv_block2): Sequential(
(0): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(decoder): Sequential(
(0): Linear(in_features=25088, out_features=1024, bias=True)
(1): Sigmoid()
(2): Linear(in_features=1024, out_features=10, bias=True)
)
)
There is nothing wrong with the model summary in TensorFlow 2.x.
import tensorflow as tf
from tensorflow.keras import layers, models

def simpleMLP(in_size, hidden_sizes, num_classes, dropout_prob=0.5):
    in_x = layers.Input(shape=(in_size,))
    hidden_x = models.Sequential(name="hidden_layers")
    for i, num_h in enumerate(hidden_sizes):
        hidden_x.add(layers.Dense(num_h, input_shape=(in_size,) if i == 0 else []))
        hidden_x.add(layers.Activation('relu'))
        hidden_x.add(layers.Dropout(dropout_prob))
    out_x = layers.Dense(num_classes, activation='softmax', name='baseline')
    return models.Model(inputs=in_x, outputs=out_x(hidden_x(in_x)))

mdl = simpleMLP(28*28, [500, 300], 10)
mdl.summary()
Output:
Model: "functional_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) [(None, 784)] 0
_________________________________________________________________
hidden_layers (Sequential) (None, 300) 542800
_________________________________________________________________
baseline (Dense) (None, 10) 3010
=================================================================
Total params: 545,810
Trainable params: 545,810
Non-trainable params: 0
_________________________________________________________________
You can use get_layer to retrieve a layer by either its name or its index.
If both name and index are provided, index will take precedence.
Indices are based on the order of horizontal graph traversal (bottom-up).
Here, to get the details of the Sequential layer (i.e. the one indexed at 1 in mdl), you can try:
mdl.get_layer(index=1).summary()
Output:
Model: "hidden_layers"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_2 (Dense) (None, 500) 392500
_________________________________________________________________
activation_2 (Activation) (None, 500) 0
_________________________________________________________________
dropout_2 (Dropout) (None, 500) 0
_________________________________________________________________
dense_3 (Dense) (None, 300) 150300
_________________________________________________________________
activation_3 (Activation) (None, 300) 0
_________________________________________________________________
dropout_3 (Dropout) (None, 300) 0
=================================================================
Total params: 542,800
Trainable params: 542,800
Non-trainable params: 0
_________________________________________________________________
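If you want every nested block expanded at once, recent TensorFlow releases also accept an expand_nested flag on summary(); a small sketch, assuming the mdl defined above:
mdl.summary(expand_nested=True)  # expands nested Sequential/Model blocks in place (newer TF versions)
# fallback for older versions: print the summary of every nested sub-model explicitly
for layer in mdl.layers:
    if isinstance(layer, tf.keras.Model):  # Sequential is a subclass of Model
        layer.summary()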
I'm constructing an encoder-decoder using a BLSTM to model word inflection generation.
I'm not sure why I am getting the titular error message at the model.fit step. I am passing in a matrix of integer-encoded word vectors, but I was under the impression that my input would be converted to three dimensions when passed through the Embedding layer.
encoder_inputs = Input(shape=(enc_len,))
encoder_embedding = Embedding(vocab_size, 100, mask_zero=True)(encoder_inputs)
encoder_outputs = Bidirectional(LSTM(100))(encoder_embedding)
e = Dense(200)(encoder_outputs)
e = RepeatVector(35)(e)
decoder_inputs_lemm = Input(shape=(dec_len,))
decoder_inputs_infl = Input(shape=(dec_len,))
embedding_layer = Embedding(vocab_size, 100) # shared weights
decoder_embedding_lemm = embedding_layer(decoder_inputs_lemm)
decoder_embedding_infl = embedding_layer(decoder_inputs_infl)
concat = Concatenate()([decoder_embedding_lemm, decoder_embedding_infl, e])
decoder_outputs = LSTM(100, return_sequences=True)(concat)
decoder_outputs = TimeDistributed(Dense(dec_len, activation='softmax'))(decoder_outputs)
# prepare input data
enc_lemma = pad_sequences([x[0] for x in data['train']], enc_len, padding='pre')
dec_lemma = pad_sequences([x[0] for x in data['train']], dec_len, padding='post')
dec_infl_shifted = pad_sequences([x[1] for x in data['train']], enc_len, padding='post')
dec_infl_shifted = np.hstack((np.full((dec_infl_shifted.shape[0], 1), 2), dec_infl_shifted))
dec_infl_target = pad_sequences([x[1] for x in data['train']], enc_len, padding='post') # not shifted
dec_infl_target = np.hstack((dec_infl_target, np.full((dec_infl_target.shape[0], 1), 0)))
model = Model([encoder_inputs, decoder_inputs_lemm, decoder_inputs_infl], decoder_outputs)
model.compile(optimizer='adadelta', loss='categorical_crossentropy')
model.fit([enc_lemma, dec_lemma, dec_infl_shifted], dec_infl_target, epochs=30, verbose=1)
Here is the summary:
Model: "model_1"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) (None, 34) 0
__________________________________________________________________________________________________
embedding_1 (Embedding) (None, 34, 100) 6300 input_1[0][0]
__________________________________________________________________________________________________
bidirectional_1 (Bidirectional) (None, 200) 160800 embedding_1[0][0]
__________________________________________________________________________________________________
input_2 (InputLayer) (None, 35) 0
__________________________________________________________________________________________________
input_3 (InputLayer) (None, 35) 0
__________________________________________________________________________________________________
dense_1 (Dense) (None, 200) 40200 bidirectional_1[0][0]
__________________________________________________________________________________________________
embedding_2 (Embedding) (None, 35, 100) 6300 input_2[0][0]
input_3[0][0]
__________________________________________________________________________________________________
repeat_vector_1 (RepeatVector) (None, 35, 200) 0 dense_1[0][0]
__________________________________________________________________________________________________
concatenate_1 (Concatenate) (None, 35, 400) 0 embedding_2[0][0]
embedding_2[1][0]
repeat_vector_1[0][0]
__________________________________________________________________________________________________
lstm_2 (LSTM) (None, 35, 100) 200400 concatenate_1[0][0]
__________________________________________________________________________________________________
time_distributed_1 (TimeDistrib (None, 35, 35) 3535 lstm_2[0][0]
==================================================================================================
Total params: 417,535
Trainable params: 417,535
Non-trainable params: 0
__________________________________________________________________________________________________
None
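For what it's worth, the shapes above suggest where the mismatch is likely to be: the model output is 3-D, (None, 35, 35), from the TimeDistributed softmax, while dec_infl_target is a 2-D matrix of integer labels, and categorical_crossentropy expects targets with the same rank as the output. A minimal sketch of two possible ways to make the shapes agree (an assumption about the intended fix, reusing the variables from the code above):
# option 1: one-hot encode the integer targets to match the (None, 35, 35) softmax output
from keras.utils import to_categorical
dec_infl_target_onehot = to_categorical(dec_infl_target, num_classes=dec_len)  # dec_len mirrors the Dense(dec_len) output width
model.compile(optimizer='adadelta', loss='categorical_crossentropy')
model.fit([enc_lemma, dec_lemma, dec_infl_shifted], dec_infl_target_onehot, epochs=30, verbose=1)
# option 2: keep the integer targets, switch to the sparse loss, and give the targets a trailing axis
model.compile(optimizer='adadelta', loss='sparse_categorical_crossentropy')
model.fit([enc_lemma, dec_lemma, dec_infl_shifted],
          np.expand_dims(dec_infl_target, -1), epochs=30, verbose=1)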