How to apply Attention layer to LSTM model - python

I am training a machine learning model for speech emotion recognition.
I want to apply an attention layer to the model, but the documentation page is hard to understand.
import tensorflow as tf
from tensorflow.keras.layers import LSTM, Dense

def bi_duo_LSTM_model(X_train, y_train, X_test, y_test, num_classes, batch_size=68,
                      units=128, learning_rate=0.005, epochs=20, dropout=0.2,
                      recurrent_dropout=0.2):

    class myCallback(tf.keras.callbacks.Callback):
        def on_epoch_end(self, epoch, logs={}):
            if logs.get('accuracy') > 0.95:  # 'accuracy' is the key logged by Keras
                print("\nReached 95% accuracy so cancelling training!")
                self.model.stop_training = True

    callbacks = myCallback()

    model = tf.keras.models.Sequential()
    model.add(tf.keras.layers.Masking(mask_value=0.0,
                                      input_shape=(X_train.shape[1], X_train.shape[2])))
    model.add(tf.keras.layers.Bidirectional(LSTM(units, dropout=dropout,
                                                 recurrent_dropout=recurrent_dropout,
                                                 return_sequences=True)))
    model.add(tf.keras.layers.Bidirectional(LSTM(units, dropout=dropout,
                                                 recurrent_dropout=recurrent_dropout)))
    # model.add(tf.keras.layers.Bidirectional(LSTM(32)))
    model.add(Dense(num_classes, activation='softmax'))

    adamopt = tf.keras.optimizers.Adam(lr=learning_rate, beta_1=0.9, beta_2=0.999, epsilon=1e-8)
    RMSopt = tf.keras.optimizers.RMSprop(lr=learning_rate, rho=0.9, epsilon=1e-6)
    SGDopt = tf.keras.optimizers.SGD(lr=learning_rate, momentum=0.9, decay=0.1, nesterov=False)

    model.compile(loss='binary_crossentropy',
                  optimizer=adamopt,
                  metrics=['accuracy'])

    history = model.fit(X_train, y_train,
                        batch_size=batch_size,
                        epochs=epochs,
                        validation_data=(X_test, y_test),
                        verbose=1,
                        callbacks=[callbacks])

    score, acc = model.evaluate(X_test, y_test,
                                batch_size=batch_size)

    yhat = model.predict(X_test)

    return history, yhat
How can I apply it to my model?
And are use_scale, causal and dropout all the arguments?
If there is dropout in the attention layer, how do we deal with it, given that we already have dropout in the LSTM layers?

Attention can be interpreted as a soft vector retrieval.
You have some query vectors. For each query, you want to retrieve some
values and compute a weighted average of them,
where the weights are obtained by comparing the query with keys (the number of keys must be the same as the number of values, and often keys and values are the same vectors).
In sequence-to-sequence models, the query is the decoder state, and the keys and values are the encoder states.
In a classification task, you do not have such an explicit query. The easiest way to get around this is to train a "universal" query that is used to collect relevant information from the hidden states (something similar to what was originally described in this paper).
If you approach the problem as sequence labeling, assigning a label not to an entire sequence, but to individual time steps, you might want to use a self-attentive layer instead.
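For the classification setting in the question, a minimal sketch of such a trainable "universal" query on top of a BiLSTM could look like the following. LearnedQueryAttention is a hypothetical helper, not a Keras built-in; it wraps tf.keras.layers.Attention, whose constructor arguments (use_scale, dropout, and causal in some TF versions) are the ones asked about.

import tensorflow as tf
from tensorflow.keras import layers

class LearnedQueryAttention(layers.Layer):
    """Hypothetical helper: attention pooling with one trainable 'universal' query."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.supports_masking = True
        # use_scale / dropout (and causal in some TF versions) are the
        # constructor arguments of tf.keras.layers.Attention
        self.attention = layers.Attention(use_scale=True)

    def build(self, input_shape):
        d = int(input_shape[-1])  # width of the BiLSTM outputs (2 * units)
        self.query = self.add_weight(name="query", shape=(1, 1, d),
                                     initializer="glorot_uniform", trainable=True)

    def call(self, values, mask=None):
        batch = tf.shape(values)[0]
        q = tf.tile(self.query, [batch, 1, 1])      # (batch, 1, d)
        # keys default to the values; the mask comes from the Masking layer
        context = self.attention([q, values], mask=[None, mask])
        return tf.squeeze(context, axis=1)          # (batch, d)

    def compute_mask(self, inputs, mask=None):
        return None  # the sequence is pooled to a single vector

def bi_lstm_attention_model(timesteps, features, num_classes, units=128):
    inputs = tf.keras.Input(shape=(timesteps, features))
    x = layers.Masking(mask_value=0.0)(inputs)
    x = layers.Bidirectional(layers.LSTM(units, return_sequences=True,
                                         dropout=0.2, recurrent_dropout=0.2))(x)
    x = LearnedQueryAttention()(x)                  # attention pooling over time
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)

Regarding the dropout question: the dropout argument of tf.keras.layers.Attention (where your TF version exposes it) is applied to the attention scores only, so it is independent of the dropout/recurrent_dropout inside the LSTM; you can use one, both, or neither, and tune them separately.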

Related

Why are my loss and accuracy plots slightly shaky?

I built a Bi-LSTM model that tries to predict certain categories based on a given word. For example, the word "smile" should be predicted as "friendly".
However, after training the model with 100 samples per category across 10 categories (1000 in total), the accuracy and loss plots are continuously slightly shaky. Why does this occur? Increasing the number of samples causes underfitting.
Model
def build_model(vocab_size, embedding_dim=64, input_length=30, rnn_units=64):
    # rnn_units added as a parameter; it was referenced but not defined in the snippet
    print('\nbuilding the model...\n')
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(input_dim=(vocab_size + 1), output_dim=embedding_dim,
                                  input_length=input_length),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(rnn_units, return_sequences=True, dropout=0.2)),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(rnn_units, return_sequences=True, dropout=0.2)),
        tf.keras.layers.GlobalMaxPool1D(),
        tf.keras.layers.Dropout(0.1),
        tf.keras.layers.Dense(64, activation='tanh',
                              kernel_regularizer=tf.keras.regularizers.L2(l2=0.01)),
        # softmax output layer
        tf.keras.layers.Dense(10, activation='softmax')
    ])

    # optimizer & loss
    opt = 'RMSprop'  # tf.optimizers.Adam(learning_rate=1e-4)
    loss = 'categorical_crossentropy'

    # metrics
    metrics = ['accuracy', 'AUC', 'Precision', 'Recall']

    # compile model
    model.compile(optimizer=opt,
                  loss=loss,
                  metrics=metrics)
    model.summary()
    return model
Training
def train(model, x_train, y_train, x_validation, y_validation,
          epochs, batch_size=32, patience=5,
          verbose=2, monitor_es='accuracy', mode_es='auto', restore=True,
          monitor_mc='val_accuracy', mode_mc='max'):
    # callbacks
    early_stopping = tf.keras.callbacks.EarlyStopping(monitor=monitor_es,
                                                      verbose=1, mode=mode_es,
                                                      restore_best_weights=restore,
                                                      min_delta=1e-3, patience=patience)
    model_checkpoint = tf.keras.callbacks.ModelCheckpoint('tfjsmode.h5', monitor=monitor_mc,
                                                          mode=mode_mc, verbose=1,
                                                          save_best_only=True)
    keras_callbacks = [early_stopping, model_checkpoint]

    # train model
    history = model.fit(x_train, y_train,
                        batch_size=batch_size, epochs=epochs, verbose=verbose,
                        validation_data=(x_validation, y_validation),
                        callbacks=keras_callbacks)
    return history
ACCURACY & LOSS
BATCH SIZE
Currently the batch size is set to 16; if I increase the batch size to 64 with 2500 samples per category, the final plots will result in underfitting.
As pointed out in the comments, the smaller the batch size, the higher the variance of the per-batch statistics, which shows up as more fluctuation in the loss. I typically use a batch size of 80 since I have a fairly large memory capacity. You are using the ModelCheckpoint callback and saving the model with the best validation accuracy; it is better to save the model with the lowest validation loss. You say increasing the number of samples leads to underfitting. That seems rather strange; usually more samples results in better accuracy.
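To make the checkpointing suggestion concrete, a minimal sketch (the file name is illustrative):

import tensorflow as tf

# Sketch: save the model with the lowest validation loss rather than the
# highest validation accuracy.
model_checkpoint = tf.keras.callbacks.ModelCheckpoint(
    'best_by_val_loss.h5',
    monitor='val_loss',
    mode='min',
    save_best_only=True,
    verbose=1)

# then pass it to fit() as before:
# history = model.fit(..., callbacks=[early_stopping, model_checkpoint])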

Machine Learning with Keras: Different Validation Loss for the Same Model

I am trying to use keras to train a simple feedforward network. I tried two different methods of what I think is the same network, but one is performing significantly better. The first one and the better performing one is the following:
inputs = keras.Input(shape=(384,))
dense = layers.Dense(64, activation="relu")
x = dense(inputs)
x = layers.Dense(64, activation="relu")(x)
outputs = layers.Dense(384)(x)
model = keras.Model(inputs=inputs, outputs=outputs, name="simple_model")
model.compile(loss='mse',optimizer='Adam')
history = model.fit(X_train,
                    y_train_tf,
                    epochs=20,
                    validation_data=(X_test, y_test),
                    steps_per_epoch=100,
                    validation_steps=50)
and it settles on a validation loss of about 0.2. The second model performs much worse:
model = keras.models.Sequential()
model.add(Dense(64, input_shape=(384,), activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(384, activation='relu'))
optimizer = tf.keras.optimizers.Adam()
model.compile(loss='mse', optimizer=optimizer)
history = model.fit(X_train,
                    y_train_tf,
                    epochs=20,
                    validation_data=(X_test, y_test),
                    steps_per_epoch=100,
                    validation_steps=50)
and this has validation loss of around 5. But when I do model.summary, they look virtually the same. Is there something wrong with the second model?
I am not sure they are the same, since the second model has a relu activation on the last layer (384 units) and the first doesn't. This is likely the issue, since the default activation of a Keras Dense layer is None (i.e. linear).
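If that is the cause, the fix is simply to drop the relu on the 384-unit output layer of the second model; a sketch of the corrected Sequential version:

import tensorflow as tf
from tensorflow.keras.layers import Dense

# Same architecture as the functional model above: the output layer has no
# activation argument, so it defaults to linear.
model = tf.keras.models.Sequential([
    Dense(64, input_shape=(384,), activation='relu'),
    Dense(64, activation='relu'),
    Dense(384)  # linear output, matching the first model
])
model.compile(loss='mse', optimizer='Adam')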

Issues with Keras load_model function

I am building a CNN in Keras using a Tensorflow backend for speaker identification, and currently I am attempting to train the model and then save it as an .hdf5 file. The program trains the model for 100 epochs with early stopping and checkpoints, saving only the best model to a file, as illustrated in the code below:
class BuildModel:

    # Create First Model in Ensemble
    def createModel(self, model_input, n_outputs, first_session=True):
        if first_session != True:
            model = load_model('SI_ideal_model_fixed.hdf5')
            return model

        # Define Input Layer
        inputs = model_input

        # Define Densely Connected Layers
        conv = Dense(16, activation='relu')(inputs)
        conv = Dense(64, activation='relu')(conv)
        conv = Dense(16, activation='relu')(conv)
        conv = Reshape((conv.shape[1]*conv.shape[2]*conv.shape[3],))(conv)
        outputs = Dense(n_outputs, activation='softmax')(conv)

        # Create Model
        model = Model(inputs, outputs)
        model.summary()
        return model

    # Train the Model
    def evaluateModel(self, x_train, x_val, y_train, y_val, num_classes, first_session=True):
        # Model Parameters
        verbose, epochs, batch_size, patience = 1, 100, 64, 10

        # Determine Input and Output Dimensions
        x = x_train[0].shape[0]  # Number of MFCC rows
        y = x_train[0].shape[1]  # Number of MFCC columns
        c = 1                    # Number of channels

        # Create Model
        inputs = Input(shape=(x, y, c), name='input')
        model = self.createModel(model_input=inputs,
                                 n_outputs=num_classes,
                                 first_session=first_session)

        # Compile Model
        model.compile(loss='categorical_crossentropy',
                      optimizer='adam',
                      metrics=['accuracy'])

        # Callbacks
        es = EarlyStopping(monitor='val_loss',
                           mode='min',
                           verbose=verbose,
                           patience=patience,
                           min_delta=0.0001)  # Stop training at the right time
        mc = ModelCheckpoint('SI_ideal_model_fixed.hdf5',
                             monitor='val_accuracy',
                             verbose=verbose,
                             save_best_only=True,
                             mode='max')  # Save best model after each epoch
        reduce_lr = ReduceLROnPlateau(monitor='val_loss',
                                      factor=0.2,
                                      patience=patience//2,
                                      min_lr=1e-3)  # Reduce learning rate once learning stagnates

        # Evaluate Model
        model.fit(x_train, y=y_train, epochs=epochs,
                  callbacks=[es, mc, reduce_lr], batch_size=batch_size,
                  validation_data=(x_val, y_val))
        accuracy = model.evaluate(x=x_train, y=y_train,
                                  batch_size=batch_size,
                                  verbose=verbose)

        # Load Best Model
        model = load_model('SI_ideal_model_fixed.hdf5')

        return (accuracy[1], model)
However, it appears that the load_model function is not working properly, since the model achieved a validation accuracy of 0.56193 after the first training session but then only started with a validation accuracy of 0.2508 at the beginning of the second training session. (From what I have seen, the first epoch of the second training session should have a validation accuracy much closer to that of the best model.)
Moreover, I then attempted to test the trained model on a set of unseen samples with model.predict, and it failed on all six, often with high probabilities, which leads me to believe that it was using minimally trained (or untrained) weights.
So, my question is could this be an issue from loading and saving the models using the load_model and ModelCheckpoint functions? If so, what is the best alternative method? If not, what are some good troubleshooting tips for improving the model's prediction functionality?
I am not sure what you mean by training session. What I would do is first train for a few epochs and note the validation accuracy. Then, load the model and use evaluate() to get the same accuracy. If it differs, then yes, something is wrong with your loading. Here is what I would do:
def createModel(self, model_input, n_outputs):
    # Define Input Layer
    inputs = model_input

    # Define Densely Connected Layers
    conv = Dense(16, activation='relu')(inputs)
    conv2 = Dense(64, activation='relu')(conv)
    conv3 = Dense(16, activation='relu')(conv2)
    conv4 = Reshape((conv.shape[1]*conv.shape[2]*conv.shape[3],))(conv3)
    outputs = Dense(n_outputs, activation='softmax')(conv4)

    # Create Model
    model = Model(inputs, outputs)
    return model

# Train the Model
def evaluateModel(self, x_train, x_val, y_train, y_val, num_classes, first_session=True):
    # Model Parameters
    verbose, epochs, batch_size, patience = 1, 100, 64, 10

    # Determine Input and Output Dimensions
    x = x_train[0].shape[0]  # Number of MFCC rows
    y = x_train[0].shape[1]  # Number of MFCC columns
    c = 1                    # Number of channels

    # Create Model
    inputs = Input(shape=(x, y, c), name='input')
    model = self.createModel(model_input=inputs,
                             n_outputs=num_classes)

    # Compile Model
    model.compile(loss='categorical_crossentropy',
                  optimizer='adam',
                  metrics=['accuracy'])

    # Callbacks
    es = EarlyStopping(monitor='val_loss',
                       mode='min',
                       verbose=verbose,
                       patience=patience,
                       min_delta=0.0001)  # Stop training at the right time
    mc = ModelCheckpoint('SI_ideal_model_fixed.h5',
                         monitor='val_accuracy',
                         verbose=verbose,
                         save_best_only=True,
                         save_weights_only=False)  # Save best model after each epoch
    reduce_lr = ReduceLROnPlateau(monitor='val_loss',
                                  factor=0.2,
                                  patience=patience//2,
                                  min_lr=1e-3)  # Reduce learning rate once learning stagnates

    # Evaluate Model
    model.fit(x_train, y=y_train, epochs=5,
              callbacks=[es, mc, reduce_lr], batch_size=batch_size,
              validation_data=(x_val, y_val))
    accuracy = model.evaluate(x=x_val, y=y_val,
                              batch_size=batch_size,
                              verbose=verbose)

    # Load Best Model and evaluate it on the same data
    model2 = load_model('SI_ideal_model_fixed.h5')
    model2.evaluate(x=x_val, y=y_val,
                    batch_size=batch_size,
                    verbose=verbose)

    return (accuracy[1], model2)
The two evaluations should really print the same thing.
P.S. TF might change the order of your computations, so I used different names in the model (e.g. conv2, conv3, ...) to prevent that.
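A complementary sanity check is to compare the predictions of the in-memory model and the reloaded one directly; a sketch, assuming model and x_val from the code above (the file name is illustrative):

import numpy as np
from tensorflow.keras.models import load_model

# After saving and reloading, predictions should match the in-memory model
# up to floating-point noise.
model.save('roundtrip_check.h5')
reloaded = load_model('roundtrip_check.h5')

preds_before = model.predict(x_val)
preds_after = reloaded.predict(x_val)
print(np.allclose(preds_before, preds_after, atol=1e-5))  # expect True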

Vgg16 for gender detection (male,female)

We used VGG16, froze the top layers, and retrained the last 4 layers on a gender dataset of 12k male and 12k female images taken from the IMDB dataset. It gives very low accuracy, especially for males: on female test data it outputs female, but on male test data it gives the same output.
vgg_conv = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

# Freeze the layers except the last 4 layers
for layer in vgg_conv.layers[:-4]:
    layer.trainable = False

# Create the model
model = models.Sequential()

# Add the vgg convolutional base model
model.add(vgg_conv)

# Add new layers
model.add(layers.Flatten())
model.add(layers.Dense(4096, activation='relu'))
model.add(layers.Dense(4096, activation='relu'))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(2, activation='softmax'))

nTrain = 16850
nTest = 6667

train_datagen = image.ImageDataGenerator(rescale=1./255)
test_datagen = image.ImageDataGenerator(rescale=1./255)

batch_size = 12
batch_size1 = 12

train_generator = train_datagen.flow_from_directory(train_dir, target_size=(224, 224),
                                                    batch_size=batch_size,
                                                    class_mode='categorical', shuffle=False)
test_generator = test_datagen.flow_from_directory(test_dir, target_size=(224, 224),
                                                  batch_size=batch_size1,
                                                  class_mode='categorical', shuffle=False)

model.compile(optimizer=optimizers.RMSprop(lr=1e-6), loss='categorical_crossentropy',
              metrics=['acc'])

history = model.fit_generator(train_generator,
                              steps_per_epoch=train_generator.samples/train_generator.batch_size,
                              epochs=3,
                              validation_data=test_generator,
                              validation_steps=test_generator.samples/test_generator.batch_size,
                              verbose=1)

model.save('gender.h5')
Testing Code:
model=load_model('age.h5')
img=load_img('9358807_1980-12-28_2010.jpg', target_size=(224,224))
img=img_to_array(img)
img=img.reshape((1,img.shape[0],img.shape[1],img.shape[2]))
img=preprocess_input(img)
yhat=model.predict(img)
print(yhat.size)
label=decode_predictions(yhat)
label=label[0][0]
print('%s(%.2f%%)'% (label[1],label[2]*100))
Firstly, you are saving the model as gender.h5, but during testing you are loading age.h5. Probably you have pasted different code for the testing here.
Coming to improving the accuracy of the program:
Most importantly, you are using loss='categorical_crossentropy'; change it to loss='binary_crossentropy' in model.compile, as you have just 2 classes, so the compile call becomes model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy']).
Also change class_mode='categorical' to class_mode='binary' in flow_from_directory.
Since categorical_crossentropy goes hand in hand with a softmax activation in the last layer, if you change the loss to binary_crossentropy the last activation should also be changed to sigmoid. So the last layer should be Dense(1, activation='sigmoid').
You have added 2 Dense layers of 4096 units each; the connection between them alone adds 4096 * 4096 = 16,777,216 weights to be learnt by the model. Reduce them, maybe to 1024 and 512 respectively.
You have added a Dropout layer of 0.5, which turns off 50% of the neurons during training. That is a huge fraction; better to drop the Dropout layer and reintroduce it only if your model is overfitting.
Set batch_size = 1. As you have very little input data, let every epoch have as many steps as there are input records.
Use data augmentation techniques such as horizontal_flip, vertical_flip, shear_range and zoom_range of ImageDataGenerator to generate new batches of training and validation images every epoch.
Train your model for a larger number of epochs. You are training for only epochs=3, which is too few to learn the weights; train for epochs=50 and trim the number later. A sketch putting these points together follows.
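Putting these points together, a sketch of the modified head and data pipeline (layer sizes are illustrative; train_dir as in the question):

from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Frozen VGG16 base with a binary (sigmoid) head, per the points above.
vgg_conv = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
for layer in vgg_conv.layers[:-4]:
    layer.trainable = False

model = models.Sequential([
    vgg_conv,
    layers.Flatten(),
    layers.Dense(1024, activation='relu'),  # smaller than the original 4096
    layers.Dense(512, activation='relu'),
    layers.Dense(1, activation='sigmoid')   # single sigmoid unit for the 2 classes
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Data augmentation and class_mode='binary'.
train_datagen = ImageDataGenerator(rescale=1./255, horizontal_flip=True,
                                   shear_range=0.2, zoom_range=0.2)
train_generator = train_datagen.flow_from_directory(
    train_dir, target_size=(224, 224), class_mode='binary')

# history = model.fit(train_generator, epochs=50, validation_data=..., verbose=1)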
Hope this answers your question. Happy Learning.

Keras: Derivatives of output wrt each input

I am using a very simple MLP with just 1 hidden layer to estimate option prices.
In addition to the actual output of the neural network, I would also like to know the partial derivative of the output value (for each row of the data sample) with respect to one of the 6 input parameters, so that the resulting value can be interpreted as the percentage change of the output for a change in that input parameter.
As I am pretty new to Keras and Neural Networks in general I was not able to come up with a solution for the problem myself.
# Create Model
model = Sequential()
model.add(Dense(6, input_dim=6))          # input layer
model.add(Dense(10, activation='relu'))   # hidden layer
model.add(Dense(1, activation='linear'))  # output layer

# Compile Model
model.compile(loss='mse', optimizer='adam', metrics=['mae'])

# Train model
model.fit(X_train, Y_train, epochs=50, batch_size=10, verbose=2, validation_split=0.2)

# Predict Values
Y_pred = model.predict(X_test, batch_size=10)
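One common way to get such per-sample derivatives in TF 2.x is tf.GradientTape; below is a minimal sketch, assuming a tf.keras model and the X_test array (6 columns) from the snippet above:

import tensorflow as tf

# Per-sample gradient of the scalar output with respect to the 6 inputs.
x = tf.convert_to_tensor(X_test, dtype=tf.float32)

with tf.GradientTape() as tape:
    tape.watch(x)
    y = model(x)             # shape (n_samples, 1)

# Each output row depends only on its own input row, so the gradient of the
# summed output w.r.t. x gives the per-sample derivatives, shape (n_samples, 6).
grads = tape.gradient(y, x)

# One way to express a relative ("percentage-style") sensitivity: x * dy/dx / y
# (guard against y == 0 in practice).
relative_sensitivity = grads * x / y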
