Issues with Keras load_model function - python

I am building a CNN in Keras with a TensorFlow backend for speaker identification, and I am currently attempting to train the model and then save it as an .hdf5 file. The program trains the model for 100 epochs with early stopping and checkpoints, saving only the best model to a file, as illustrated in the code below:
class BuildModel:
    # Create First Model in Ensemble
    def createModel(self, model_input, n_outputs, first_session=True):
        if first_session != True:
            model = load_model('SI_ideal_model_fixed.hdf5')
            return model
        # Define Input Layer
        inputs = model_input
        # Define Densely Connected Layers
        conv = Dense(16, activation='relu')(inputs)
        conv = Dense(64, activation='relu')(conv)
        conv = Dense(16, activation='relu')(conv)
        conv = Reshape((conv.shape[1]*conv.shape[2]*conv.shape[3],))(conv)
        outputs = Dense(n_outputs, activation='softmax')(conv)
        # Create Model
        model = Model(inputs, outputs)
        model.summary()
        return model

    # Train the Model
    def evaluateModel(self, x_train, x_val, y_train, y_val, num_classes, first_session=True):
        # Model Parameters
        verbose, epochs, batch_size, patience = 1, 100, 64, 10
        # Determine Input and Output Dimensions
        x = x_train[0].shape[0]  # Number of MFCC rows
        y = x_train[0].shape[1]  # Number of MFCC columns
        c = 1                    # Number of channels
        # Create Model
        inputs = Input(shape=(x, y, c), name='input')
        model = self.createModel(model_input=inputs,
                                 n_outputs=num_classes,
                                 first_session=first_session)
        # Compile Model
        model.compile(loss='categorical_crossentropy',
                      optimizer='adam',
                      metrics=['accuracy'])
        # Callbacks
        es = EarlyStopping(monitor='val_loss',
                           mode='min',
                           verbose=verbose,
                           patience=patience,
                           min_delta=0.0001)  # Stop training at right time
        mc = ModelCheckpoint('SI_ideal_model_fixed.hdf5',
                             monitor='val_accuracy',
                             verbose=verbose,
                             save_best_only=True,
                             mode='max')  # Save best model after each epoch
        reduce_lr = ReduceLROnPlateau(monitor='val_loss',
                                      factor=0.2,
                                      patience=patience//2,
                                      min_lr=1e-3)  # Reduce learning rate once learning stagnates
        # Evaluate Model
        model.fit(x_train, y=y_train, epochs=epochs,
                  callbacks=[es, mc, reduce_lr], batch_size=batch_size,
                  validation_data=(x_val, y_val))
        accuracy = model.evaluate(x=x_train, y=y_train,
                                  batch_size=batch_size,
                                  verbose=verbose)
        # Load Best Model
        model = load_model('SI_ideal_model_fixed.hdf5')
        return (accuracy[1], model)
However, it appears that the load_model function is not working properly, since the model achieved a validation accuracy of 0.56193 after the first training session but then started with a validation accuracy of only 0.2508 at the beginning of the second training session. (From what I have seen, the first epoch of the second training session should have a validation accuracy much closer to that of the best model.)
Moreover, I then attempted to test the trained model on a set of unseen samples with model.predict, and it failed on all six, often with high confidence, which leads me to believe that it was using minimally trained (or untrained) weights.
So, my question is: could this be an issue with saving and loading the models via the load_model and ModelCheckpoint functions? If so, what is the best alternative method? If not, what are some good troubleshooting tips for improving the model's prediction functionality?

I am not sure what you mean by training session. What I would do is first train for a few epochs and note the validation accuracy. Then, load the model and use evaluate() to get the same accuracy. If it differs, then yes, something is wrong with your loading. Here is what I would do:
def createModel(self, model_input, n_outputs):
    # Define Input Layer
    inputs = model_input
    # Define Densely Connected Layers
    conv = Dense(16, activation='relu')(inputs)
    conv2 = Dense(64, activation='relu')(conv)
    conv3 = Dense(16, activation='relu')(conv2)
    conv4 = Reshape((conv.shape[1]*conv.shape[2]*conv.shape[3],))(conv3)
    outputs = Dense(n_outputs, activation='softmax')(conv4)
    # Create Model
    model = Model(inputs, outputs)
    return model

# Train the Model
def evaluateModel(self, x_train, x_val, y_train, y_val, num_classes, first_session=True):
    # Model Parameters
    verbose, epochs, batch_size, patience = 1, 100, 64, 10
    # Determine Input and Output Dimensions
    x = x_train[0].shape[0]  # Number of MFCC rows
    y = x_train[0].shape[1]  # Number of MFCC columns
    c = 1                    # Number of channels
    # Create Model
    inputs = Input(shape=(x, y, c), name='input')
    model = self.createModel(model_input=inputs,
                             n_outputs=num_classes)
    # Compile Model
    model.compile(loss='categorical_crossentropy',
                  optimizer='adam',
                  metrics=['accuracy'])
    # Callbacks
    es = EarlyStopping(monitor='val_loss',
                       mode='min',
                       verbose=verbose,
                       patience=patience,
                       min_delta=0.0001)  # Stop training at right time
    mc = ModelCheckpoint('SI_ideal_model_fixed.h5',
                         monitor='val_accuracy',
                         verbose=verbose,
                         save_best_only=True,
                         save_weights_only=False)  # Save best model after each epoch
    reduce_lr = ReduceLROnPlateau(monitor='val_loss',
                                  factor=0.2,
                                  patience=patience//2,
                                  min_lr=1e-3)  # Reduce learning rate once learning stagnates
    # Train briefly, then evaluate the in-memory model
    model.fit(x_train, y=y_train, epochs=5,
              callbacks=[es, mc, reduce_lr], batch_size=batch_size,
              validation_data=(x_val, y_val))
    accuracy = model.evaluate(x=x_val, y=y_val,
                              batch_size=batch_size,
                              verbose=verbose)
    # Load Best Model and evaluate it on the same data
    model2 = load_model('SI_ideal_model_fixed.h5')
    model2.evaluate(x=x_val, y=y_val,
                    batch_size=batch_size,
                    verbose=verbose)
    return (accuracy[1], model)
The two evaluations should really print the same thing.
P.S. TF might change the order of your computations, so I used different names in the model (e.g. conv2, conv3, ...) to prevent that.
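If you want a more direct sanity check that saving and loading round-trips the weights, a minimal sketch along these lines (assuming the model and the x_val/y_val arrays from the code above; the file name is only illustrative) should print two nearly identical results:

    # Evaluate the in-memory model, save it, reload it, and evaluate again
    before = model.evaluate(x_val, y_val, verbose=0)
    model.save('roundtrip_check.h5')                  # architecture + weights + optimizer state
    restored = load_model('roundtrip_check.h5')
    after = restored.evaluate(x_val, y_val, verbose=0)
    print(before, after)                              # should agree up to floating-point noise

If these two agree but the next training session still starts from ~0.25 validation accuracy, the problem is more likely in how the data or labels are prepared between sessions than in load_model itself.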

Related

Limit CPU Utilization when training Sequential model using Keras

I am trying to build a sequential model using Keras.
The model is working fine, but it consumes 100% of my system's CPU. I need to limit CPU usage to 80% while this code is running because it is triggering alarms.
I am attaching the methods I am using for training the model:
def get_model(n_inputs, n_outputs):
    model = Sequential()
    model.add(Dense(2500, input_dim=n_inputs, kernel_initializer='he_uniform', activation='relu'))
    model.add(Dense(n_outputs, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam')
    return model
Evaluate a model using repeated k-fold cross-validation:
def train_model(X, y):
    results = list()
    n_inputs, n_outputs = X.shape[1], y.shape[1]
    model = get_model(n_inputs, n_outputs)
    model.fit(X, y, verbose=1, epochs=100)
    # make a prediction on the test set
    yhat = model.predict(X)
    # round probabilities to class labels
    yhat = yhat.round()
    # calculate accuracy
    acc = accuracy_score(y, yhat)
    # store result
    print('>%.3f' % acc)
    results.append(acc)
    return model, results
Fitting the model:
model,results = train_model(X, y)
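One common way to rein in TensorFlow's CPU usage is to cap its thread pools before any other TensorFlow work happens in the process. This is only a sketch: the thread counts below are illustrative, and it limits parallelism rather than enforcing an exact 80% ceiling.

    import tensorflow as tf

    # Must run before any model is built or trained in this process
    tf.config.threading.set_intra_op_parallelism_threads(6)  # threads used inside a single op
    tf.config.threading.set_inter_op_parallelism_threads(2)  # ops that may run in parallel

With fewer threads available, the fit loop simply cannot saturate every core, which in practice keeps overall CPU utilization below 100%.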

Deep learning accuracy changes

Every time I change the dataset, I get a different accuracy: sometimes 97%, sometimes 50%, sometimes 92%. This is a text classification task. Why does this happen? The ~95% results come from two datasets of the same size, which give almost the same result.
#Split Data
X_train, X_test, label_train, label_test = train_test_split(X, Y, test_size=0.2, random_state=42)
#Size of train and test data:
print("Training:", len(X_train), len(label_train))
print("Testing: ", len(X_test), len(label_test))

#Function defined to test the models in the test set
def test_model(model, epoch_stop):
    model.fit(X_test,
              Y_test,
              epochs=epoch_stop,
              batch_size=batch_size,
              verbose=0)
    results = model.evaluate(X_test, Y_test)
    return results

maxlen = 300
#Bidirectional LSTM model
embedding_dim = 100
dropout = 0.5
opt = 'adam'
#embed_dim = 128 #dimension of the word embedding vector for each word in a sequence
lstm_out = 196 #no of lstm layers

lstm_model = Sequential()
#Adding dropout
#lstm_model.add(LSTM(lstm_out, dropout=0.2, recurrent_dropout=0.2))
lstm_model = Sequential()
lstm_model.add(layers.Embedding(input_dim=num_words,
                                output_dim=embedding_dim,
                                input_length=X_train.shape[1]))
#lstm_model.add(Bidirectional(LSTM(lstm_out, return_sequences=True, dropout=0.2, recurrent_dropout=0.2)))
#lstm_model.add(Bidirectional(LSTM(lstm_out, dropout=0.2, recurrent_dropout=0.2)))
#lstm_model.add(Bidirectional(LSTM(64, return_sequences=True)))
lstm_model.add(Bidirectional(LSTM(64, return_sequences=True)))
lstm_model.add(layers.GlobalMaxPool1D())
#Adding a regularized dense layer
lstm_model.add(layers.Dense(32, kernel_regularizer=regularizers.l2(0.001), activation='relu'))
lstm_model.add(layers.Dropout(0.25))
lstm_model.add(Dense(3, activation='softmax'))
lstm_model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
print(lstm_model.summary())

#TRAINING
history = lstm_model.fit(X_train, label_train,
                         epochs=4,
                         verbose=True,
                         validation_data=(X_test, label_test),
                         batch_size=64)
loss, accuracy = lstm_model.evaluate(X_train, label_train, verbose=True)
print("Training Accuracy: {:.4f}".format(accuracy))
loss_val, accuracy_val = lstm_model.evaluate(X_test, label_test, verbose=True)
print("Testing Accuracy: {:.4f}".format(accuracy_val))
ML models base their predictions on the data they were trained on, so it is only natural that the outcome will differ when the training data is changed. It may also be the case that a different dataset performs better with different hyperparameters.

How to apply Attention layer to LSTM model

I am training a model for speech emotion recognition.
I would like to apply an attention layer to the model. The instruction page is hard to understand.
def bi_duo_LSTM_model(X_train, y_train, X_test, y_test, num_classes, batch_size=68, units=128,
                      learning_rate=0.005, epochs=20, dropout=0.2, recurrent_dropout=0.2):

    class myCallback(tf.keras.callbacks.Callback):
        def on_epoch_end(self, epoch, logs={}):
            if (logs.get('acc') > 0.95):
                print("\nReached 99% accuracy so cancelling training!")
                self.model.stop_training = True

    callbacks = myCallback()

    model = tf.keras.models.Sequential()
    model.add(tf.keras.layers.Masking(mask_value=0.0, input_shape=(X_train.shape[1], X_train.shape[2])))
    model.add(tf.keras.layers.Bidirectional(LSTM(units, dropout=dropout, recurrent_dropout=recurrent_dropout, return_sequences=True)))
    model.add(tf.keras.layers.Bidirectional(LSTM(units, dropout=dropout, recurrent_dropout=recurrent_dropout)))
    # model.add(tf.keras.layers.Bidirectional(LSTM(32)))
    model.add(Dense(num_classes, activation='softmax'))

    adamopt = tf.keras.optimizers.Adam(lr=learning_rate, beta_1=0.9, beta_2=0.999, epsilon=1e-8)
    RMSopt = tf.keras.optimizers.RMSprop(lr=learning_rate, rho=0.9, epsilon=1e-6)
    SGDopt = tf.keras.optimizers.SGD(lr=learning_rate, momentum=0.9, decay=0.1, nesterov=False)

    model.compile(loss='binary_crossentropy',
                  optimizer=adamopt,
                  metrics=['accuracy'])

    history = model.fit(X_train, y_train,
                        batch_size=batch_size,
                        epochs=epochs,
                        validation_data=(X_test, y_test),
                        verbose=1,
                        callbacks=[callbacks])

    score, acc = model.evaluate(X_test, y_test,
                                batch_size=batch_size)
    yhat = model.predict(X_test)
    return history, yhat
How can I apply an attention layer to my model?
Are use_scale, causal and dropout the only arguments?
If there is dropout in the attention layer, how do we deal with it, given that we already have dropout in the LSTM layers?
Attention can be interpreted as soft vector retrieval.
You have some query vectors. For each query, you want to retrieve some values and compute a weighted sum of them, where the weights are obtained by comparing the query with keys (the number of keys must be the same as the number of values, and often they are the same vectors).
In sequence-to-sequence models, the query is the decoder state, and the keys and values are the encoder states.
In a classification task, you do not have such an explicit query. The easiest way to get around this is to train a "universal" query that is used to collect relevant information from the hidden states (similar to what was originally described in this paper).
If you approach the problem as sequence labeling, assigning a label not to the entire sequence but to individual time steps, you might want to use a self-attentive layer instead.
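As a concrete illustration of the "universal query" idea for classification, here is a minimal sketch of a pooling layer with a single trainable query vector that attends over the BiLSTM outputs. The layer name and its placement are illustrative, and it assumes the last recurrent layer is changed to return_sequences=True:

    import tensorflow as tf

    class LearnedQueryAttention(tf.keras.layers.Layer):
        """Pools a sequence of hidden states into one vector using a trainable query."""
        def __init__(self, **kwargs):
            super().__init__(**kwargs)
            self.supports_masking = True

        def build(self, input_shape):
            dim = int(input_shape[-1])
            # One "universal" query vector shared across all examples
            self.query = self.add_weight(name='query', shape=(dim,),
                                         initializer='glorot_uniform', trainable=True)
            super().build(input_shape)

        def call(self, hidden_states, mask=None):
            # Dot product between the query and every time step -> (batch, time)
            scores = tf.tensordot(hidden_states, self.query, axes=[[2], [0]])
            if mask is not None:
                # Ignore padded time steps (e.g. the ones zeroed out by Masking)
                scores += (1.0 - tf.cast(mask, scores.dtype)) * -1e9
            weights = tf.nn.softmax(scores, axis=-1)
            # Weighted sum of the hidden states -> (batch, features)
            return tf.reduce_sum(hidden_states * tf.expand_dims(weights, -1), axis=1)

        def compute_mask(self, inputs, mask=None):
            return None  # the time dimension is gone after pooling

In the model from the question this would slot in between the recurrent stack and the classifier, roughly:

    model.add(tf.keras.layers.Bidirectional(LSTM(units, dropout=dropout,
                                                 recurrent_dropout=recurrent_dropout,
                                                 return_sequences=True)))  # keep the full sequence
    model.add(LearnedQueryAttention())
    model.add(Dense(num_classes, activation='softmax'))

The built-in tf.keras.layers.Attention layer implements the same query/key/value comparison (use_scale, dropout and the causal flag are its main options), but it expects you to supply the query tensor yourself, which is why a small custom layer is often the simpler route for classification.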

Why are my loss and accuracy plots slightly shaky?

I built a Bi-LSTM model that tries to predict certain categories for a given word. For example, the word "smile" should be predicted as "friendly".
However, after training the model with 100 samples per category across 10 categories (1,000 in total), the accuracy and loss plots are continuously slightly shaky. Why does this occur? Increasing the number of samples causes underfitting.
Model
def build_model(vocab_size, embedding_dim=64, input_length=30):
    print('\nbuilding the model...\n')
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(input_dim=(vocab_size + 1), output_dim=embedding_dim, input_length=input_length),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(rnn_units, return_sequences=True, dropout=0.2)),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(rnn_units, return_sequences=True, dropout=0.2)),
        tf.keras.layers.GlobalMaxPool1D(),
        tf.keras.layers.Dropout(0.1),
        tf.keras.layers.Dense(64, activation='tanh', kernel_regularizer=tf.keras.regularizers.L2(l2=0.01)),
        # softmax output layer
        tf.keras.layers.Dense(10, activation='softmax')
    ])
    # optimizer & loss
    opt = 'RMSprop'  # tf.optimizers.Adam(learning_rate=1e-4)
    loss = 'categorical_crossentropy'
    # Metrics
    metrics = ['accuracy', 'AUC', 'Precision', 'Recall']
    # compile model
    model.compile(optimizer=opt,
                  loss=loss,
                  metrics=metrics)
    model.summary()
    return model
training
def train(model, x_train, y_train, x_validation, y_validation,
          epochs, batch_size=32, patience=5,
          verbose=2, monitor_es='accuracy', mode_es='auto', restore=True,
          monitor_mc='val_accuracy', mode_mc='max'):
    # callbacks
    early_stopping = tf.keras.callbacks.EarlyStopping(monitor=monitor_es,
                                                      verbose=1, mode=mode_es, restore_best_weights=restore,
                                                      min_delta=1e-3, patience=patience)
    model_checkpoint = tf.keras.callbacks.ModelCheckpoint('tfjsmode.h5', monitor=monitor_mc, mode=mode_mc,
                                                          verbose=1, save_best_only=True)
    keras_callbacks = [early_stopping, model_checkpoint]
    # train model
    history = model.fit(x_train, y_train,
                        batch_size=batch_size, epochs=epochs, verbose=verbose,
                        validation_data=(x_validation, y_validation),
                        callbacks=keras_callbacks)
    return history
ACCURACY & LOSS [plots not shown]
BATCH SIZE
Currently the batch size is set to 16; if I increase the batch size to 64 with 2,500 samples per category, the final plots show underfitting.
As pointed out in the comments, the smaller the batch size, the higher the variance of the per-batch mean, which then shows up as more fluctuation in the loss. I typically use a batch size of 80 since I have a fairly large memory capacity. You are using the ModelCheckpoint callback and saving the model with the best validation accuracy. It is better to save the model with the lowest validation loss. You say increasing the number of samples leads to underfitting. That seems rather strange; usually more samples results in better accuracy.
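The checkpointing change suggested above would look roughly like this (a sketch reusing the file name from the question, with val_loss monitored instead of val_accuracy):

    model_checkpoint = tf.keras.callbacks.ModelCheckpoint('tfjsmode.h5',
                                                          monitor='val_loss',
                                                          mode='min',
                                                          verbose=1,
                                                          save_best_only=True)

If EarlyStopping is also pointed at val_loss with restore_best_weights=True, the in-memory model at the end of training is left at that same best epoch.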

How can I restart model.fit after x epochs if loss remains high?

I sometimes have trouble getting a fit on my data, and when I restart the fit (with shuffle=True) I sometimes get a good fit.
See my previous question:
https://datascience.stackexchange.com/questions/62516/why-does-my-model-sometimes-not-learn-well-from-same-data
As a workaround, I want to automatically restart the fitting process if the loss is still high after x epochs. How can I achieve this?
I assume I would need a custom version of the EarlyStopping callback? How could I differentiate between early stopping because a low loss (< 0.5) was reached, so training is finished, and stopping because the loss is still above 0.5 after x epochs, so training needs to be restarted?
Here is a simplified structure:
def train_till_good():
    while not_finished:
        train()

def train():
    load_data()
    model = VerySimpleNet2()
    checkpoint = keras.callbacks.ModelCheckpoint(filepath=images_root + dataset_name + '\\CheckPoint.hdf5')
    myOpt = keras.optimizers.Adam(lr=0.001, decay=0.01)
    model.compile(optimizer=myOpt, loss='categorical_crossentropy', metrics=['accuracy'])
    LRS = CyclicLR(base_lr=0.000005, max_lr=0.0003, step_size=200.)
    tensorboard = keras.callbacks.TensorBoard(log_dir='C:\\Tensorflow', histogram_freq=0, write_graph=True, write_images=False)
    ES = keras.callbacks.EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=5)
    model.fit(train_images, train_labels, shuffle=True, epochs=num_epochs,
              callbacks=[checkpoint,
                         tensorboard,
                         ES,
                         LRS],
              validation_data=(test_images, test_labels))

def VerySimpleNet2():
    model = keras.Sequential([
        keras.layers.Dense(112, activation=tf.nn.relu, input_shape=(224, 224, 3)),
        keras.layers.Dropout(0.4),
        keras.layers.Flatten(),
        keras.layers.Dense(3, activation=tf.nn.softmax)
    ])
    return model
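One way to tell the two cases apart is a small custom callback that flags a restart when the loss is still high after a fixed number of epochs, plus an outer loop that rebuilds the model when the flag is set. This is only a sketch: the threshold of 0.5, the check epoch, and the reuse of VerySimpleNet2, train_images, train_labels, test_images, test_labels and num_epochs from the question are illustrative.

    from tensorflow import keras

    class RestartOnHighLoss(keras.callbacks.Callback):
        """Stops training and flags a restart if loss is still above `threshold`
        after `check_epoch` epochs; a run that converges never sets the flag."""
        def __init__(self, check_epoch=10, threshold=0.5):
            super().__init__()
            self.check_epoch = check_epoch
            self.threshold = threshold
            self.needs_restart = False

        def on_epoch_end(self, epoch, logs=None):
            loss = (logs or {}).get('loss')
            if epoch + 1 >= self.check_epoch and loss is not None and loss > self.threshold:
                self.needs_restart = True
                self.model.stop_training = True

    def train_till_good(max_attempts=5):
        for attempt in range(max_attempts):
            model = VerySimpleNet2()  # fresh weights on every attempt
            model.compile(optimizer=keras.optimizers.Adam(lr=0.001, decay=0.01),  # mirrors the question's settings
                          loss='categorical_crossentropy', metrics=['accuracy'])
            restart_cb = RestartOnHighLoss(check_epoch=10, threshold=0.5)
            model.fit(train_images, train_labels, shuffle=True, epochs=num_epochs,
                      validation_data=(test_images, test_labels),
                      callbacks=[restart_cb])
            if not restart_cb.needs_restart:
                return model  # loss dropped below the threshold, keep this run
        return model  # gave up after max_attempts restarts

Your existing EarlyStopping(monitor='val_loss') can stay in the callbacks list; it fires when a good run has converged, while the callback above only fires when the run never got going.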
