Keras LSTM Continue training after save - python

I'm working on an LSTM model and I'd like to save it and continue training later with extra data as it accumulates.
My problem is that after I save the model and load it again the next time I run the script, the predictions are completely wrong; the model just mimics the data I fed into it.
Here's the model initialization:
# create and fit the LSTM network
if retrain == 1:
    print "Creating a newly retrained network."
    model = Sequential()
    model.add(LSTM(inputDimension, input_shape=(1, inputDimension)))
    model.add(Dense(inputDimension, activation='relu'))
    model.compile(loss='mean_squared_error', optimizer='adam')
    model.fit(trainX, trainY, epochs=epochs, batch_size=batch_size, verbose=2)
    model.save("model.{}.h5".format(interval))
else:
    print "Using an existing network."
    model = load_model("model.{}.h5".format(interval))
    model.compile(loss='mean_squared_error', optimizer='adam')
    model.fit(trainX, trainY, epochs=epochs, batch_size=batch_size, verbose=2)
    model.save("model.{}.h5".format(interval))
    del model
    model = load_model("model.{}.h5".format(interval))
    model.compile(loss='mean_squared_error', optimizer='adam')
The first dataset, when retrain is set to 1, is around 10,000 entries, trained for around 3k epochs with a batch size of 5% of the data.
The second dataset is a single entry (one row), again with 3k epochs and batch_size=1.
Solved
I was reloading the scaler incorrectly:
scaler = joblib.load('scaler.{}.data'.format(interval))
dataset = scaler.fit_transform(dataset)
Correct:
scaler = joblib.load('scaler.{}.data'.format(interval))
dataset = scaler.transform(dataset)
fit_transform recalculates the scaling parameters, which means the reloaded scaler no longer matches the one used on the original training data, so the scaled values end up offset from the original data.
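For reference, here is a minimal sketch of the intended workflow (retrain, interval and dataset are the names from the snippets above; the choice of MinMaxScaler is only an assumption, use whatever scaler was fitted originally): fit the scaler once, persist it, and only call transform on later runs.
import joblib
from sklearn.preprocessing import MinMaxScaler

if retrain == 1:
    # first run: learn the scaling parameters and persist them
    scaler = MinMaxScaler(feature_range=(0, 1))
    dataset = scaler.fit_transform(dataset)
    joblib.dump(scaler, 'scaler.{}.data'.format(interval))
else:
    # later runs: reuse the stored parameters, never refit on new data
    scaler = joblib.load('scaler.{}.data'.format(interval))
    dataset = scaler.transform(dataset)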

From the Keras Model API documentation for model.fit():
initial_epoch: Integer. Epoch at which to start training (useful for resuming a previous training run).
Setting this parameter might solve your problem.
I think the source of the problem is the adaptive learning rate of Adam. During training the effective learning rate naturally declines, allowing finer tuning of the model. When you retrain the model with only one sample, the weight updates are much too large (because of the reset optimizer state), which can completely destroy your previous weights.
If initial_epoch does not help, then try starting your second training run with a lower learning rate.
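Putting both suggestions together, a minimal sketch (previous_epochs and extra_epochs are placeholder variables, and the learning rate value is only an example): reload the saved model, recompile it with a smaller Adam learning rate, and use initial_epoch to resume the epoch counter.
from keras.models import load_model
from keras.optimizers import Adam

model = load_model("model.{}.h5".format(interval))

# recompile with a smaller learning rate so single-sample updates stay gentle
model.compile(loss='mean_squared_error', optimizer=Adam(lr=1e-4))

# continue counting epochs from where the first run stopped
model.fit(trainX, trainY,
          epochs=previous_epochs + extra_epochs,  # index of the final epoch
          initial_epoch=previous_epochs,          # resume point
          batch_size=1, verbose=2)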

Related

Multi class image classification using CNN

I want to classify images belonging to five classes using a CNN. But whatever model I try, the training accuracy will not increase beyond 20%. Please, someone help me overcome this. The model mostly converges within 3 epochs, and increasing the number of epochs gives no further improvement in accuracy. Can anyone suggest a solution or a model, or tell me what the problem could be?
Below is one of the models I have used:
#defining training and test sets
x_train, x_val, y_train, y_val = train_test_split(x, y, test_size=0.2, random_state=42)
print('Training data and target sizes: \n{}, {}'.format(x_train.shape, y_train.shape))
print('Test data and target sizes: \n{}, {}'.format(x_val.shape, y_val.shape))

Training data and target sizes:
(2398, 224, 224, 3), (2398,)
Test data and target sizes:
(600, 224, 224, 3), (600,)

img_rows, img_cols, img_channel = 224, 224, 3
base_model = applications.inception_v3.InceptionV3(include_top=False, weights='imagenet', pooling='avg', input_shape=(img_rows, img_cols, img_channel))
print(base_model.summary())

#Adding custom Layers
add_model = Sequential()
add_model.add(Dense(1024, activation='relu', input_shape=base_model.output_shape[1:]))
add_model.add(Dropout(0.60))
add_model.add(Dense(1, activation='sigmoid'))
print(add_model.summary())

# creating the final model
model = Model(inputs=base_model.input, outputs=add_model(base_model.output))

# compile the model
opt = optimizers.Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)
reduce_lr = ReduceLROnPlateau(monitor='val_acc',
                              patience=5,
                              verbose=1,
                              factor=0.1,
                              cooldown=10,
                              min_lr=0.00001)
model.compile(
    loss='categorical_crossentropy',
    metrics=['acc'],
    optimizer='adam'
)
print(model.summary())

n_fold = 5
kf = model_selection.KFold(n_splits=n_fold, shuffle=True)
eval_fun = metrics.roc_auc_score

model.fit(x_train, y_train, epochs=50, batch_size=50, validation_data=(x_val, y_val))
Could you share the part of the code where you fit the model? It's not available in the post.
Since the output is not reproducible due to lack of data, I suggest you go through this link: https://www.kaggle.com/kenconstable/alzheimer-s-multi-class-classification
It's really well explained, and it covers best practices for multi-class classification based on transfer learning as well as from scratch. In case you don't find it helpful, it would help to share the training script, including the model.fit() code.
Okay, so here's the issue:
In your code you create a base model with InceptionV3, but you never actually add that base model to your add_model variable.
Your add_model variable is essentially a dense network, not a CNN. Another (smaller) issue is that you create your own optimizer opt but never use it in model.compile.
Can you please try this code out and let me know if it works:
# function to build the model
def build_transfer_model(conv_base, dropout, dense_node, learn_rate, metric):
    """
    Build and compile a transfer learning model.
    Input: a base model, the dropout rate, the number of units in the dense layer,
    the learning rate and the performance metrics.
    Output: a compiled CNN model.
    """
    # clear previous run
    backend.clear_session()

    # build the model
    model = Sequential()
    model.add(conv_base)
    model.add(Dropout(dropout))
    model.add(BatchNormalization())
    model.add(Flatten())
    model.add(Dense(dense_node, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))

    # compile the model
    model.compile(
        optimizer=tensorflow.keras.optimizers.Adam(lr=learn_rate),
        loss='categorical_crossentropy',
        metrics=metric)
    model.summary()
    return model

img_rows, img_cols, img_channel = 224, 224, 3
base_model = applications.inception_v3.InceptionV3(include_top=False, weights='imagenet', pooling='avg', input_shape=(img_rows, img_cols, img_channel))
model = build_transfer_model(conv_base=base_model, dropout=0.6, dense_node=1024, learn_rate=0.001, metric=['acc'])
print(model.summary())
model.fit(x_train, y_train, epochs=50, batch_size=50, validation_data=(x_val, y_val))
If you look at the function, the first thing we add to the Sequential() instance is the base model (InceptionV3 in your case). You were adding a dense layer directly. Although it may receive the output of the base InceptionV3, the result is a dense network, not a CNN. So please check this out.
I may have changed some variable names, although I tried not to. And please change the order of the layers in the build_transfer_model function according to your requirements.
In case it doesn't work, let me know.
Thanks.
You have to use model.fit() to actually train the model after compiling. Right now it has randomly initialized weights and is therefore making random predictions. Since you have five classes, the accuracy is approximately 1/5 = 20%. Training your model may take time depending on the model size and the amount of data you have.

How to apply Attention layer to LSTM model

I am training a speech emotion recognition model.
I wish to apply an attention layer to the model, but the instruction page is hard to understand.
def bi_duo_LSTM_model(X_train, y_train, X_test, y_test, num_classes, batch_size=68, units=128, learning_rate=0.005, epochs=20, dropout=0.2, recurrent_dropout=0.2):

    class myCallback(tf.keras.callbacks.Callback):
        def on_epoch_end(self, epoch, logs={}):
            if (logs.get('acc') > 0.95):
                print("\nReached 95% accuracy so cancelling training!")
                self.model.stop_training = True

    callbacks = myCallback()

    model = tf.keras.models.Sequential()
    model.add(tf.keras.layers.Masking(mask_value=0.0, input_shape=(X_train.shape[1], X_train.shape[2])))
    model.add(tf.keras.layers.Bidirectional(LSTM(units, dropout=dropout, recurrent_dropout=recurrent_dropout, return_sequences=True)))
    model.add(tf.keras.layers.Bidirectional(LSTM(units, dropout=dropout, recurrent_dropout=recurrent_dropout)))
    # model.add(tf.keras.layers.Bidirectional(LSTM(32)))
    model.add(Dense(num_classes, activation='softmax'))

    adamopt = tf.keras.optimizers.Adam(lr=learning_rate, beta_1=0.9, beta_2=0.999, epsilon=1e-8)
    RMSopt = tf.keras.optimizers.RMSprop(lr=learning_rate, rho=0.9, epsilon=1e-6)
    SGDopt = tf.keras.optimizers.SGD(lr=learning_rate, momentum=0.9, decay=0.1, nesterov=False)

    model.compile(loss='binary_crossentropy',
                  optimizer=adamopt,
                  metrics=['accuracy'])

    history = model.fit(X_train, y_train,
                        batch_size=batch_size,
                        epochs=epochs,
                        validation_data=(X_test, y_test),
                        verbose=1,
                        callbacks=[callbacks])

    score, acc = model.evaluate(X_test, y_test,
                                batch_size=batch_size)

    yhat = model.predict(X_test)

    return history, yhat
How can I apply it to my model?
And are use_scale, causal and dropout the only arguments?
If there is dropout in the attention layer, how do we deal with it, given that we already have dropout in the LSTM layers?
Attention can be interpreted as soft vector retrieval.
You have some query vectors. For each query, you want to retrieve some values and compute a weighted sum of them, where the weights are obtained by comparing the query with keys (the number of keys must be the same as the number of values, and often they are the same vectors).
In sequence-to-sequence models, the query is the decoder state and the keys and values are the encoder states.
In a classification task, you do not have such an explicit query. The easiest way to get around this is to train a "universal" query that is used to collect relevant information from the hidden states (something similar to what was originally described in this paper).
If you approach the problem as sequence labeling, i.e., assigning a label not to the entire sequence but to individual time steps, you might want to use a self-attentive layer instead.
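To make the "universal query" idea concrete, here is a minimal sketch using the names from your function (X_train, units, num_classes); AttentionPooling is a custom layer defined here, not a built-in Keras layer. A single trainable query vector scores the BiLSTM outputs, and the weighted sum replaces your final non-sequence LSTM before the softmax:
import tensorflow as tf

class AttentionPooling(tf.keras.layers.Layer):
    """Pools (batch, time, features) into (batch, features) with a single learned query vector."""
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.supports_masking = True

    def build(self, input_shape):
        # one "universal" query vector, compared against every time step
        self.query = self.add_weight(name="query", shape=(int(input_shape[-1]),),
                                     initializer="glorot_uniform", trainable=True)
        super().build(input_shape)

    def call(self, inputs, mask=None):
        scores = tf.einsum("btd,d->bt", inputs, self.query)        # similarity per time step
        if mask is not None:
            scores += (1.0 - tf.cast(mask, scores.dtype)) * -1e9   # ignore padded steps
        weights = tf.nn.softmax(scores, axis=-1)
        return tf.einsum("bt,btd->bd", weights, inputs)            # weighted sum over time

    def compute_mask(self, inputs, mask=None):
        return None  # the time dimension is pooled away

model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Masking(mask_value=0.0, input_shape=(X_train.shape[1], X_train.shape[2])))
model.add(tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(units, return_sequences=True)))
model.add(tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(units, return_sequences=True)))  # keep return_sequences=True
model.add(AttentionPooling())
model.add(tf.keras.layers.Dense(num_classes, activation='softmax'))
As for dropout: the dropout applied inside an attention layer affects the attention weights and is independent of the dropout inside the LSTM layers, so you can use both, or simply leave the attention dropout at 0.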

Time series classification using CNN

I am trying to build a convolutional neural network which classifies time series data into two classes. For the time being I only have a small dataset, so the first thing I need is to augment my data so I can feed it into a network.
For the data augmentation task, I found some very helpful methods in the https://github.com/uchidalab/time_series_augmentation repository. What I have tried so far is adding some Gaussian noise to my data, plus permutation, time warping, window slicing and window warping methods. These methods are applied on a (batches, batch_rows, channels) = (354, 400, 3) dataset to generate a (1770, 400, 3) dataset (including train and test datasets and their corresponding labels).
Given that I have a limited number of inputs, I would like to know if you have any suggestions for a 1D CNN structure that would perform well on these datasets.
What I have tried so far is this network:
verbose, epochs, batch_size = 0, 10, 8
n_timesteps, n_features, n_outputs = trainX.shape[1], trainX.shape[2], trainy.shape[1]
model = Sequential()
model.add(Conv1D(filters=16, kernel_size=3, activation='relu', input_shape=(n_timesteps,n_features)))
model.add(MaxPooling1D(pool_size=2))
model.add(Flatten())
model.add(Dense(n_outputs, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
# fit network
model.fit(trainX, trainy, epochs=epochs, batch_size=batch_size, verbose=verbose)
# evaluate model
_, accuracy = model.evaluate(testX, testy, batch_size=batch_size, verbose=0)
No matter what changes I make to the parameters and hyperparameters, I always get an accuracy around 50%, meaning the model is no better than chance as a binary classifier.
I would really appreciate it if anyone could tell me what the problem probably is. Does this happen due to poor data quality produced by the augmentation methods, or does it have to do with the network itself?
Thanks in advance.
If it's a classification between two classes, you should use binary_crossentropy as the loss function.
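A minimal sketch of that change, reusing the network from the question (n_timesteps, n_features, trainX, trainy, epochs, batch_size and verbose are the variables already defined above; this assumes trainy is a single 0/1 column rather than one-hot encoded, in which case categorical_crossentropy could stay):
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, MaxPooling1D, Flatten, Dense

model = Sequential()
model.add(Conv1D(filters=16, kernel_size=3, activation='relu', input_shape=(n_timesteps, n_features)))
model.add(MaxPooling1D(pool_size=2))
model.add(Flatten())
model.add(Dense(1, activation='sigmoid'))  # single probability for the positive class
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(trainX, trainy, epochs=epochs, batch_size=batch_size, verbose=verbose)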

Good training/validation accuracy but poor test accuracy

I've trained a model to classify 4 types of eye diseases using the VGG16 pretrained model. I am fairly new to machine learning, so I didn't know what to make of the results.
After training it for about 6 hours on 90,000 images:
training accuracy kept increasing and the loss kept decreasing (it went from roughly 2 to 0.8, ending with an accuracy of 88%)
validation loss kept fluctuating between 1-2 per epoch (validation accuracy did improve to 85%)
(I accidentally reran the cell so I can't see the output)
After looking at the confusion matrix, it seems my test set isn't performing well.
Image_height = 196
Image_width = 300
val_split = 0.2
batches_size = 10
lr = 0.0001
spe = 512
vs = 32
epoch = 10
#Creating batches
train_batches = ImageDataGenerator(preprocessing_function=tf.keras.applications.vgg16.preprocess_input, validation_split=val_split) \
    .flow_from_directory(directory=train_folder, target_size=(Image_height, Image_width), classes=['CNV', 'DME', 'DRUSEN', 'NORMAL'], batch_size=batches_size, class_mode="categorical",
                         subset="training")
validation_batches = ImageDataGenerator(preprocessing_function=tf.keras.applications.vgg16.preprocess_input, validation_split=val_split) \
    .flow_from_directory(directory=train_folder, target_size=(Image_height, Image_width), classes=['CNV', 'DME', 'DRUSEN', 'NORMAL'], batch_size=batches_size, class_mode="categorical",
                         subset="validation")
test_batches = ImageDataGenerator(preprocessing_function=tf.keras.applications.vgg16.preprocess_input) \
    .flow_from_directory(test_folder, target_size=(Image_height, Image_width),
                         classes=['CNV', 'DME', 'DRUSEN', 'NORMAL'], batch_size=batches_size, class_mode="categorical")

#Function to create model. We will be using a pretrained model
def create():
    vgg16_model = keras.applications.vgg16.VGG16(input_tensor=Input(shape=(Image_height, Image_width, 3)), input_shape=(Image_height, Image_width, 3), include_top=False)
    model = Sequential()
    model.add(vgg16_model)
    for layer in model.layers:
        layer.trainable = False
    model.add(Flatten())
    model.add(Dense(4, activation='softmax'))
    return model

model = create()
model.compile(Adam(lr=lr), loss="categorical_crossentropy", metrics=['accuracy'])

model.fit(train_batches, steps_per_epoch=spe,
          validation_data=validation_batches, validation_steps=vs, epochs=epoch)
Any suggestions on what I can improve on so the confusion matrix isn't doing so poorly? I also have the model saved if its possible to just retrain it with more layers.
A number of issues and recommendations. You are using the VGG16 model, which has a very large number of trainable parameters, so on a data set of 90,000 images your training time will be very long. I recommend you consider the MobileNet model instead: it has only about 4 million trainable parameters and is essentially just as accurate as VGG16. Documentation is [here.][1] Next, irrespective of which model you use, you should set the initial weights to the imagenet weights so your model starts off already trained on images. I also find I get better results by making all layers in the model trainable. Now, you say your model reached an accuracy of 88%. I do not think that is very good; I believe you need to achieve at least 95%. You can do that by using an adjustable learning rate. The Keras callback ReduceLROnPlateau makes doing that easy. Documentation is [here.][2] Set it up to monitor the validation loss and reduce the learning rate if it fails to decrease over consecutive epochs. Next, you want to save the model that has the lowest validation loss and use that to make predictions. The Keras callback ModelCheckpoint can be set up to monitor validation loss and save the model with the lowest loss. Documentation is [here.][3]
The code below shows how to implement the MobileNet model for your problem and how to define the callbacks. You will also have to change the generators to use the MobileNet preprocessing function and set the target size to (224, 224). Hope this helps.
mobile = tf.keras.applications.mobilenet.MobileNet(include_top=False,
                                                   input_shape=(224, 224, 3),
                                                   pooling='max', weights='imagenet',
                                                   alpha=1, depth_multiplier=1, dropout=.5)
x = mobile.layers[-1].output
x = keras.layers.BatchNormalization(axis=-1, momentum=0.99, epsilon=0.001)(x)
predictions = Dense(4, activation='softmax')(x)
model = Model(inputs=mobile.input, outputs=predictions)
for layer in model.layers:
    layer.trainable = True
model.compile(Adamax(lr=lr), loss='categorical_crossentropy', metrics=['accuracy'])

checkpoint = tf.keras.callbacks.ModelCheckpoint(filepath=save_loc, monitor='val_loss', verbose=0, save_best_only=True,
                                                save_weights_only=False, mode='auto', save_freq='epoch', options=None)
lr_adjust = tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=1, verbose=0, mode="auto",
                                                 min_delta=0.00001, cooldown=0, min_lr=0)
callbacks = [checkpoint, lr_adjust]
[1]: https://keras.io/api/applications/mobilenet/
[2]: https://keras.io/api/callbacks/reduce_lr_on_plateau/
[3]: https://keras.io/api/callbacks/model_checkpoint/
You don't train any layer except the last one.
You need to make the last few layers trainable as well, or add more trainable layers on top, as sketched below.
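A minimal sketch of that suggestion, applied to the vgg16_model created in your create() function (unfreezing the last 4 layers is an arbitrary example):
# freeze everything except the last few layers of the base
for layer in vgg16_model.layers[:-4]:
    layer.trainable = False
for layer in vgg16_model.layers[-4:]:
    layer.trainable = True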
Add
tf.keras.applications.VGG16(... weights='imagenet'... )
In your code, the weights are not pretrained on any dataset.
The available options are explained here:
https://www.tensorflow.org/api_docs/python/tf/keras/applications/VGG16
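A minimal sketch of that call, using the input shape from your code (note that weights='imagenet' is also the default, so passing it explicitly mainly documents the intent):
import tensorflow as tf

base_model = tf.keras.applications.VGG16(
    weights='imagenet',   # ImageNet-pretrained filters
    include_top=False,    # drop the 1000-class ImageNet head
    input_shape=(Image_height, Image_width, 3))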
While adding layers to the model, you have to remove the last dense layer: your model has four classes but VGG16 has 1000 classes, so remove the last dense layer and then add your own dense layers:
def create():
    vgg16_model = keras.applications.vgg16.VGG16(input_tensor=Input(shape=(Image_height, Image_width, 3)), input_shape=(Image_height, Image_width, 3), include_top=False)
    model = Sequential()
    for layer in vgg16_model.layers[:-1]:
        model.add(layer)
    model.summary()
    for layer in model.layers:
        layer.trainable = False
    model.add(Flatten())
    model.add(Dense(4, activation='softmax'))
    return model

Why the accuracy of the neural network stops increasing

I'm trying to solve the Titanic competition on Kaggle, but the model accuracy isn't going beyond 80%.
I tried changing the number of hidden nodes and the number of epochs, and also tried applying batch normalization, dropout and different weight initializations, but I still get the same 80%. What am I doing wrong?
This is my code below:
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Dense(10, input_shape=(5,), kernel_initializer='he_normal', activation='relu'))
model.add(tf.keras.layers.BatchNormalization())
model.add(tf.keras.layers.Dense(20, kernel_initializer='he_normal', activation='relu'))
model.add(tf.keras.layers.Dropout(0.3))
model.add(tf.keras.layers.BatchNormalization())
model.add(tf.keras.layers.Dense(2, kernel_initializer=tf.keras.initializers.GlorotNormal(), activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
train_scores = model.fit(train_features, train_labels, epochs=200, batch_size=64, verbose=2)
And here is the accuracy over the last few epochs (screenshot: model accuracy):
How can I improve it?
You can try normalising the data. Generally, with deep networks, normalisation matters less, but since we are only working with 3 layers here, normalising the data might help.
I would also suggest splitting your training data again into training and validation sets and using K-fold cross validation (I am not sure about this one; I too am new to this field).
But in general, I have seen that if the accuracy plateaus, the best approach is to work on the training data itself: normalise it, or impute NaN values with the mean rather than setting them to 0.
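A minimal sketch of the normalisation and imputation suggestion (train_features is the array passed to model.fit above; test_features and the use of scikit-learn are assumptions):
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# impute missing values with the column mean instead of 0
imputer = SimpleImputer(strategy='mean')
train_features = imputer.fit_transform(train_features)

# standardise each feature to zero mean and unit variance
scaler = StandardScaler()
train_features = scaler.fit_transform(train_features)

# apply the same fitted transforms to unseen data before predicting
test_features = scaler.transform(imputer.transform(test_features))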
