Different models do incremental fit for RNN model while hyperparameter tuning - python

I am quite new to deep learning and I was studying this RNN example.
After completing the tutorial, I decided to see the effect of various hyperparameters such as the number of nodes in each layer and dropout factor etc.
For each value in my lists, I create a new model from one set of parameters and test its performance on my dataset. Below is the basic code:
def build_model(MODELNAME, l1,l2,l3, l4, d):
    tf.global_variables_initializer()
    tf.reset_default_graph()
    model = Sequential(name = MODELNAME)
    model.reset_states
    model.add(CuDNNLSTM(l1, input_shape=(x_train.shape[1:]), return_sequences=True) )
    model.add(Dropout(d))
    model.add(BatchNormalization())
    model.add(CuDNNLSTM(l2, input_shape=(x_train.shape[1:]), return_sequences=True) )
    # Definition of other layers of the model ...
    model.compile(loss="sparse_categorical_crossentropy",
                  optimizer=opt,
                  metrics=['accuracy'])
    history = model.fit(x_train, y_train,
                        epochs=EPOCHS,
                        batch_size=BATCH_SIZE,
                        validation_data=(x_validation, y_validation))
    return model

layer1 = [64, 128, 256]
layer2,3,4 = [...]
drop = [0.2, 0.3, 0.4]
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.4
for l1 in layer1:
    #for l2, l3, l4 for layer2, layer3, layer4
    for d in drop:
        sess = tf.Session(config=config)
        set_session(sess)
        MODELNAME = 'RNN-l1={}-l2={}-l3={}-l4={}-drop={} '.format(l1, l2, l3, l4, d)
        print(MODELNAME)
        model = build_model(MODELNAME, l1,l2,l3, l4, d)
        sess.close()
        print('-----> training & validation loss & accuracies')
The problem is that when a new model is built with new parameters, training behaves as if it were the next epoch of the previous model rather than epoch 1 of a new one. Below are some of the results.
RNN-l1=64-l2=64-l3=64-l4=32-drop=0.2
Train on 90116 samples, validate on 4458 samples
Epoch 1/6
90116/90116 [==============================] - 139s 2ms/step - loss: 0.5558 - acc: 0.7116 - val_loss: 0.8857 - val_acc: 0.5213
... # results for other epochs
Epoch 6/6
RNN-l1=64-l2=64-l3=64-l4=32-drop=0.3
90116/90116 [==============================] - 140s 2ms/step - loss: 0.5233 - acc: 0.7369 - val_loss: 0.9760 - val_acc: 0.5336
Epoch 1/6
90116/90116 [==============================] - 142s 2ms/step - loss: 0.5170 - acc: 0.7403 - val_loss: 0.9671 - val_acc: 0.5310
... # results for other epochs
90116/90116 [==============================] - 142s 2ms/step - loss: 0.4953 - acc: 0.7577 - val_loss: 0.9587 - val_acc: 0.5354
Epoch 6/6
90116/90116 [==============================] - 143s 2ms/step - loss: 0.4908 - acc: 0.7614 - val_loss: 1.0319 - val_acc: 0.5397
# -------------------AFTER 31TH SET OF PARAMETERS
RNN-l1=64-l2=256-l3=128-l4=32-drop=0.2
Epoch 1/6
90116/90116 [==============================] - 144s 2ms/step - loss: 0.1080 - acc: 0.9596 - val_loss: 1.8910 - val_acc: 0.5372
As seen, the first epoch of the 31st set of parameters behaves as if it were the 181st epoch. Similarly, if I stop the run at some point and run it again, the accuracy and loss look as if training simply continued with the next epoch, as below.
Epoch 1/6
90116/90116 [==============================] - 144s 2ms/step - loss: 0.1053 - acc: 0.9621 - val_loss: 1.9120 - val_acc: 0.5375
I tried a bunch of things (as you can see in the code), such as setting model=None, reinitializing the variables, resetting the model's states, and closing the session in each iteration, but none of them helped. I searched for similar questions with no luck.
I am trying to understand what I am doing wrong.
Any help is appreciated,
Note: Title is not very explanatory, I am open to suggestions for a better title.

It looks like you are using Keras, which means you need to import the Keras backend and clear the session before you build and run each new model. It would be something like this:
from keras import backend as K
K.clear_session()
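To show where the call could sit in a search loop, here is a minimal sketch. The names config_grid, run_search and build_and_fit are hypothetical helpers invented for this illustration, not part of the question's code or of Keras. The sketch assumes build_and_fit creates the layers and the optimizer itself, since an optimizer object created once outside the loop would likely still belong to the cleared graph.

```python
from itertools import product


def config_grid(layer1, layer2, drop):
    """Yield one (name, params) pair per hyperparameter combination."""
    for l1, l2, d in product(layer1, layer2, drop):
        name = "RNN-l1={}-l2={}-drop={}".format(l1, l2, d)
        yield name, {"l1": l1, "l2": l2, "d": d}


def run_search(build_and_fit, grid):
    """Train one fresh model per configuration.

    build_and_fit is a hypothetical callable: (name, **params) -> result.
    It is assumed to build the model AND its optimizer from scratch.
    """
    # Imported lazily so the grid helper above works without Keras installed.
    from keras import backend as K

    results = {}
    for name, params in grid:
        # Drop the graph and state left over from the previous configuration.
        K.clear_session()
        results[name] = build_and_fit(name, **params)
    return results


# The grid itself can be inspected without training anything.
for name, params in config_grid([64, 128], [64], [0.2, 0.3]):
    print(name, params)
```

Using itertools.product instead of nested for loops keeps the loop body flat, so there is exactly one place where the session is cleared before each model is built.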

Related

Show validation accuracy while training the Keras CAPTCHA OCR model?

I am working on a project with the Keras CAPTCHA OCR model. The model performs text CAPTCHA recognition with CTC-encoded output and combines a CNN with an RNN.
I would like to see accuracy figures in the training output. How can I get the training accuracy and the validation accuracy?
Here is the training code from the Keras model:
epochs = 100
early_stopping_patience = 10
# Add early stopping
early_stopping = keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=early_stopping_patience, restore_best_weights=True
)
# Train the model
history = model.fit(
    train_dataset,
    validation_data=validation_dataset,
    epochs=epochs,
    callbacks=[early_stopping],
)
And this is the training output:
Epoch 1/100
59/59 [==============================] - 3s 53ms/step - loss: 21.5722 - val_loss: 16.3351
Epoch 2/100
59/59 [==============================] - 2s 27ms/step - loss: 16.3335 - val_loss: 16.3062
Epoch 3/100
59/59 [==============================] - 2s 27ms/step - loss: 16.3360 - val_loss: 16.3116
Epoch 4/100
59/59 [==============================] - 2s 27ms/step - loss: 16.3318 - val_loss: 16.3167
Epoch 5/100
Before calling model.fit just specify the metric you want to compute during training, in this case accuracy:
model.compile(optimizer= your_optimizer, loss= your_loss, metrics=['acc'])
Link to the documentation.

Why there's a bad accuracy on dataset when it's used both for validation and training?

I trained a model with ResNet50 and got an amazing accuracy of 95% on the training set.
I then used the same training set for validation, and the accuracy seems very bad (val_acc stays below 0.1).
from keras.preprocessing.image import ImageDataGenerator

# Class subdirectories, shared by both generators below.
classes = ['airplane','airport','baseball_diamond',
           'basketball_court','beach','bridge',
           'chaparral','church','circular_farmland',
           'commercial_area','dense_residential','desert',
           'forest','freeway','golf_course','ground_track_field',
           'harbor','industrial_area','intersection','island',
           'lake','meadow','medium_residential','mobile_home_park',
           'mountain','overpass','parking_lot','railway','rectangular_farmland',
           'roundabout','runway']

train_set = ImageDataGenerator(
    horizontal_flip=True, rescale=1./255, shear_range=0.2, zoom_range=0.2
).flow_from_directory(data, target_size=(256,256), classes=classes, batch_size=31)

from keras.applications import ResNet50
from keras.applications.resnet50 import preprocess_input
from keras import layers,Model

conv_base = ResNet50(
    include_top=False,
    weights='imagenet')
for layer in conv_base.layers:
    layer.trainable = False
x = conv_base.output
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dense(128, activation='relu')(x)
predictions = layers.Dense(31, activation='softmax')(x)
model = Model(conv_base.input, predictions)

# here you will write the path for train data or if you create your val data then you can test using that too.
# test_dir = ""
test_datagen = ImageDataGenerator(rescale=1. / 255)
test_generator = test_datagen.flow_from_directory(
    data,
    target_size=(256, 256), classes=classes, batch_size=1, shuffle=True)
model.compile(loss='categorical_crossentropy',optimizer='Adam',metrics=['accuracy'])
model.fit_generator(train_set, steps_per_epoch=1488//31, epochs=10, verbose=True,
                    validation_data=test_generator,
                    validation_steps=test_generator.samples // 31)
Epoch 1/10
48/48 [==============================] - 27s 553ms/step - loss: 1.9631 - acc: 0.4825 - val_loss: 4.3134 - val_acc: 0.0208
Epoch 2/10
48/48 [==============================] - 22s 456ms/step - loss: 0.6395 - acc: 0.8212 - val_loss: 4.7584 - val_acc: 0.0833
Epoch 3/10
48/48 [==============================] - 23s 482ms/step - loss: 0.4325 - acc: 0.8810 - val_loss: 5.3852 - val_acc: 0.0625
Epoch 4/10
48/48 [==============================] - 23s 476ms/step - loss: 0.2925 - acc: 0.9153 - val_loss: 6.0963 - val_acc: 0.0208
Epoch 5/10
48/48 [==============================] - 23s 477ms/step - loss: 0.2275 - acc: 0.9341 - val_loss: 5.6571 - val_acc: 0.0625
Epoch 6/10
48/48 [==============================] - 23s 478ms/step - loss: 0.1855 - acc: 0.9489 - val_loss: 6.2440 - val_acc: 0.0208
Epoch 7/10
48/48 [==============================] - 23s 483ms/step - loss: 0.1704 - acc: 0.9543 - val_loss: 7.4446 - val_acc: 0.0208
Epoch 8/10
48/48 [==============================] - 23s 487ms/step - loss: 0.1828 - acc: 0.9476 - val_loss: 7.5198 - val_acc: 0.0417
What could be the reason?!
You have configured train_set and test_datagen differently. In particular, one is flipped, sheared and zoomed while the other is not. As I mentioned in my comment, if it's the same data it will have the same accuracy. You can see that a model is overfitting when you use validation correctly, with unseen data for validation. Using the same data will always give the same accuracy for training and validation.
I am not sure exactly what is wrong, but it is NOT an overfitting issue. It is clear that your validation data (which is the same as your training data) is not going in correctly. For one thing, you set the validation batch size to 1 but set validation_steps = test_generator.samples // 31. If test_generator.samples = 1488, you have 48 steps, but with a batch size of 1 you will only validate on 48 samples. You want to set the batch size and the steps so that batch_size x validation_steps equals the total number of samples. That way you go through the validation set exactly once. I also recommend setting shuffle=False for the test generator.
Also, why bother entering all the class names? If your class directories are named 'airplane', 'airport', 'baseball_diamond', etc., you don't need to define the classes explicitly; flow_from_directory will do that for you automatically. See the documentation below.
classes: Optional list of class subdirectories (e.g. ['dogs', 'cats']). Default: None. If not provided, the list of classes will be automatically inferred from the subdirectory names/structure under directory, where each subdirectory will be treated as a different class (and the order of the classes, which will map to the label indices, will be alphanumeric). The dictionary containing the mapping from class names to class indices can be obtained via the attribute class_indices.
Your training data is actually different from your test (validation) data because you are using data augmentation in the training generator. That's OK: it may lead to a small difference between your training and validation accuracy, but the validation accuracy should be pretty close once you get the validation data to go in correctly.
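The batch size and step arithmetic above can be checked with a few lines of plain Python. This is only a sketch: samples_seen and steps_for_one_pass are helper names made up for the illustration, and 1488 is the sample count assumed in the answer.

```python
import math


def samples_seen(batch_size, steps):
    """Upper bound on how many samples are drawn in `steps` batches."""
    return batch_size * steps


def steps_for_one_pass(num_samples, batch_size):
    """Number of batches needed to cover every sample once."""
    return math.ceil(num_samples / batch_size)


num_samples = 1488  # assumed value of test_generator.samples

# The setup from the question: batch_size=1 with samples // 31 steps.
print(samples_seen(batch_size=1, steps=num_samples // 31))

# Steps needed for a full pass with batch sizes 31 and 1.
print(steps_for_one_pass(num_samples, 31))
print(steps_for_one_pass(num_samples, 1))
```

With a batch size of 1, the 48 steps from the question touch only 48 of the 1488 images, so either the batch size or validation_steps has to change to cover the whole set.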

Validation accuracy (val_acc) does not change over the epochs

Value of val_acc does not change over the epochs.
Summary:
I'm using a pre-trained (ImageNet) VGG16 from Keras;
from keras.applications import VGG16
conv_base = VGG16(weights='imagenet', include_top=True, input_shape=(224, 224, 3))
Database from ISBI 2016 (ISIC) - which is a set of 900 images of skin lesion used for binary classification (malignant or benign) for training and validation, plus 379 images for testing -;
I use the top dense layers of VGG16 except the last one (that classifies over 1000 classes), and use a binary output with sigmoid function activation;
conv_base.layers.pop() # Remove last one
conv_base.trainable = False
model = models.Sequential()
model.add(conv_base)
model.add(layers.Dense(1, activation='sigmoid'))
Unlock the dense layers setting them to trainable;
Fetch the data, which are in two different folders, one named "malignant" and the other "benign", within the "training data" folder;
from keras.preprocessing.image import ImageDataGenerator
from keras import optimizers
folder = 'ISBI2016_ISIC_Part3_Training_Data'
batch_size = 20
full_datagen = ImageDataGenerator(
    rescale=1./255,
    #rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    validation_split = 0.2, # 20% validation
    horizontal_flip=True)
train_generator = full_datagen.flow_from_directory( # Found 721 images belonging to 2 classes.
    folder,
    target_size=(224, 224),
    batch_size=batch_size,
    subset = 'training',
    class_mode='binary')
validation_generator = full_datagen.flow_from_directory( # Found 179 images belonging to 2 classes.
    folder,
    target_size=(224, 224),
    batch_size=batch_size,
    subset = 'validation',
    shuffle=False,
    class_mode='binary')
model.compile(loss='binary_crossentropy',
              optimizer=optimizers.SGD(lr=0.001), # High learning rate
              metrics=['accuracy'])
history = model.fit_generator(
    train_generator,
    steps_per_epoch=721 // batch_size+1,
    epochs=20,
    validation_data=validation_generator,
    validation_steps=180 // batch_size+1,
)
Then I fine-tune it with 100 more epochs and lower learning rate, setting the last convolutional layer to trainable.
I've tried many things such as:
Changing the optimizer (RMSprop, Adam and SGD);
Removing the top dense layers of the pre-trained VGG16 and adding mine;
model.add(layers.Flatten())
model.add(layers.Dense(128, activation='relu'))
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
Shuffle=True in validation_generator;
Changing batch size;
Varying the learning rate (0.001, 0.0001, 2e-5).
The results are similar to the following:
Epoch 1/100
37/37 [==============================] - 33s 900ms/step - loss: 0.6394 - acc: 0.7857 - val_loss: 0.6343 - val_acc: 0.8101
Epoch 2/100
37/37 [==============================] - 30s 819ms/step - loss: 0.6342 - acc: 0.8107 - val_loss: 0.6342 - val_acc: 0.8101
Epoch 3/100
37/37 [==============================] - 30s 822ms/step - loss: 0.6324 - acc: 0.8188 - val_loss: 0.6341 - val_acc: 0.8101
Epoch 4/100
37/37 [==============================] - 31s 840ms/step - loss: 0.6346 - acc: 0.8080 - val_loss: 0.6341 - val_acc: 0.8101
Epoch 5/100
37/37 [==============================] - 31s 833ms/step - loss: 0.6395 - acc: 0.7843 - val_loss: 0.6341 - val_acc: 0.8101
Epoch 6/100
37/37 [==============================] - 31s 829ms/step - loss: 0.6334 - acc: 0.8134 - val_loss: 0.6340 - val_acc: 0.8101
Epoch 7/100
37/37 [==============================] - 31s 834ms/step - loss: 0.6334 - acc: 0.8134 - val_loss: 0.6340 - val_acc: 0.8101
Epoch 8/100
37/37 [==============================] - 31s 829ms/step - loss: 0.6342 - acc: 0.8093 - val_loss: 0.6339 - val_acc: 0.8101
Epoch 9/100
37/37 [==============================] - 31s 849ms/step - loss: 0.6330 - acc: 0.8147 - val_loss: 0.6339 - val_acc: 0.8101
Epoch 10/100
37/37 [==============================] - 30s 812ms/step - loss: 0.6332 - acc: 0.8134 - val_loss: 0.6338 - val_acc: 0.8101
Epoch 11/100
37/37 [==============================] - 31s 839ms/step - loss: 0.6338 - acc: 0.8107 - val_loss: 0.6338 - val_acc: 0.8101
Epoch 12/100
37/37 [==============================] - 30s 807ms/step - loss: 0.6334 - acc: 0.8120 - val_loss: 0.6337 - val_acc: 0.8101
Epoch 13/100
37/37 [==============================] - 32s 852ms/step - loss: 0.6334 - acc: 0.8120 - val_loss: 0.6337 - val_acc: 0.8101
Epoch 14/100
37/37 [==============================] - 31s 826ms/step - loss: 0.6330 - acc: 0.8134 - val_loss: 0.6336 - val_acc: 0.8101
Epoch 15/100
37/37 [==============================] - 32s 854ms/step - loss: 0.6335 - acc: 0.8107 - val_loss: 0.6336 - val_acc: 0.8101
And goes on the same way, with constant val_acc = 0.8101.
When I use the test set after finishing training, the confusion matrix gives me 100% correct on benign lesions (304) and 0% on malignant, as so:
Confusion Matrix
[[304 0]
[ 75 0]]
What could I be doing wrong?
Thank you.
VGG16 was trained on RGB centered data. Your ImageDataGenerator does not enable featurewise_center, however, so you're feeding your net with raw RGB data. The VGG convolutional base can't process this to provide any meaningful information, so your net ends up universally guessing the more common class.
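To make "centered" concrete, here is a NumPy sketch of the Caffe-style preprocessing that, as far as I know, keras.applications.vgg16.preprocess_input applies: reorder RGB to BGR and subtract a per-channel ImageNet mean, without the 1/255 rescale. The function name vgg_style_preprocess and the mean constants are my assumptions for the illustration, so check them against your Keras version.

```python
import numpy as np

# Per-channel ImageNet means in BGR order (assumed values, see note above).
IMAGENET_MEAN_BGR = np.array([103.939, 116.779, 123.68], dtype=np.float32)


def vgg_style_preprocess(rgb_batch):
    """Center a batch of RGB images of shape (N, H, W, 3) with values in [0, 255]."""
    x = np.asarray(rgb_batch, dtype=np.float32)
    bgr = x[..., ::-1]              # RGB -> BGR channel order
    return bgr - IMAGENET_MEAN_BGR  # zero-center each channel, no rescaling


# A tiny batch of pure-red images: R=255, G=0, B=0.
batch = np.zeros((2, 4, 4, 3), dtype=np.float32)
batch[..., 0] = 255.0

out = vgg_style_preprocess(batch)
print(out[0, 0, 0])  # one pixel, now in BGR order and mean-centered
```

In a generator pipeline the usual route would be to pass the library function through ImageDataGenerator's preprocessing_function argument instead of rescale=1./255, rather than hand-rolling it like this.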
In general, when you see this type of problem (your net exclusively guessing the most common class), it means that there's something wrong with your data, not with the net. It can be caused by a preprocessing step like this or by a significant portion of "poisoned" anomalous training data that actively harms the training process.
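A quick way to confirm this failure mode is to compare the accuracy with the share of the most common class. The sketch below uses the confusion matrix from the question and assumes rows are true labels (benign, malignant) and columns are predictions.

```python
import numpy as np

# Confusion matrix from the question: rows = true class, columns = predicted class.
cm = np.array([[304, 0],
               [75, 0]])

accuracy = np.trace(cm) / cm.sum()                # correct predictions / all samples
majority_share = cm.sum(axis=1).max() / cm.sum()  # share of the most common true class
predicted_counts = cm.sum(axis=0)                 # how often each class was predicted

print("accuracy:       {:.4f}".format(accuracy))
print("majority share: {:.4f}".format(majority_share))
print("predictions per class:", predicted_counts.tolist())
```

When the accuracy equals the majority share and one column of the matrix is all zeros, the net is predicting a single class for every input. The frozen val_acc of 0.8101 in the logs is consistent with the same pattern, since 145 / 179 rounds to 0.8101.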

Keras: My model loss and accuracy randomly drop to zero

I have a rather complex sequence to sequence encoder decoder model. I run into an issue where my loss and accuracy drop to zero and I can't reproduce this error. It has nothing to do with the training data as it happens with different sets.
The model seems to be learning, as the loss slowly drops. This is the output just before the problem occurs:
Epoch 1/2
5000/5000 [==============================] - 235s 47ms/step - loss: 0.9825 - acc: 0.7077
Epoch 2/2
5000/5000 [==============================] - 235s 47ms/step - loss: 0.9443 - acc: 0.7177
And here is what it looks like during the next model.fit() iteration:
Epoch 1/2
2882/2882 [==============================] - 136s 47ms/step - loss: 0.7033 - acc: 0.4399
Epoch 2/2
2882/2882 [==============================] - 136s 47ms/step - loss: 1.1921e-07 - acc: 0.0000e+00
After this, the loss and accuracy remain the same:
Epoch 1/2
5000/5000 [==============================] - 278s 56ms/step - loss: 1.1921e-07 - acc: 0.0000e+00
Epoch 2/2
5000/5000 [==============================] - 279s 56ms/step - loss: 1.1921e-07 - acc: 0.0000e+00
The reason I have to train in such a manner is because I have variable input sizes and output sizes. So I have to make batches of my training data with fixed input size before I train.
sgd = optimizers.SGD(lr= 0.015, decay=0.002)
out2 = model.compile(loss='categorical_crossentropy',
                     optimizer=sgd,
                     metrics=['accuracy'])
I need to use curriculum learning to reach sentence level predictions, so I am doing the following:
I initially train my model to output a "1 word + end" token. Training on this works fine. When I start to train on "2 words + end", this problem starts to arise.
After training on 1 word, I save the model. Then I define a new model with output size for 2 words, and use the following:
new_model = createModel(...,num_output_words)
new_model.set_weights(old_model.get_weights())
I have to do this as I can't define a model with variable output length.
I can provide more information if needed. I can't find any information online.
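One observation about the logs above, offered as a hint rather than a diagnosis: the plateau value 1.1921e-07 is not an arbitrary small number. As far as I know, Keras clips predicted probabilities to [eps, 1 - eps] with a default eps of 1e-7 before taking the log, and in float32 that makes about 1.19e-07 the smallest cross-entropy it can report. The sketch below only reproduces the number in NumPy; the clipping behaviour and the eps value are assumptions about Keras, not something shown in the question.

```python
import numpy as np

eps = np.float32(1e-7)             # assumed default Keras backend epsilon
p_clipped = np.float32(1.0) - eps  # largest probability left after clipping (float32)
loss_floor = -np.log(p_clipped)    # cross-entropy when the true-class probability is clipped

print("{:.4e}".format(float(loss_floor)))
print("{:.4e}".format(float(np.finfo(np.float32).eps)))  # float32 machine epsilon
```

If that reading is right, a loss stuck at exactly this value would suggest that the true-class probability is pinned at 1 after clipping (for example saturated outputs, or a softmax taken over an axis of length 1) rather than genuine convergence. It may be worth checking the output and target shapes of the "2 words + end" model and looking for NaN or inf values in the weights after set_weights.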

Keras train and validation metric values are different even when using same data (Logistic regression)

I have been trying to better understand the train/validation sequence in the keras model fit() loop. So I tried out a simple training loop where I attempted to fit a simple logistic regression model with input data consisting of a single feature.
I feed the same data for both training and validation. Under those conditions, and with the batch size set equal to the total data size, one would expect to obtain exactly the same loss and accuracy. But this is not the case.
Here is my code:
Generate some random data with two classes:
N = 100
x = np.concatenate([np.random.randn(N//2, 1), np.random.randn(N//2, 1)+2])
y = np.concatenate([np.zeros(N//2), np.ones(N//2)])
And plotting the two class data distribution (one feature x):
data = pd.DataFrame({'x': x.ravel(), 'y': y})
sns.violinplot(x='x', y='y', inner='point', data=data, orient='h')
pyplot.tight_layout(0)
pyplot.show()
Build and fit the keras model:
model = tf.keras.Sequential([tf.keras.layers.Dense(1, activation='sigmoid', input_dim=1)])
model.compile(optimizer=tf.keras.optimizers.SGD(2), loss='binary_crossentropy', metrics=['accuracy'])
model.fit(x, y, epochs=10, validation_data=(x, y), batch_size=N)
Notice that I have specified the data x and targets y for both training and for validation_data. Also, the batch_size is same as total size batch_size=N.
The training results are:
100/100 [==============================] - 1s 5ms/step - loss: 1.4500 - acc: 0.2300 - val_loss: 0.5439 - val_acc: 0.7200
Epoch 2/10
100/100 [==============================] - 0s 18us/step - loss: 0.5439 - acc: 0.7200 - val_loss: 0.4408 - val_acc: 0.8000
Epoch 3/10
100/100 [==============================] - 0s 16us/step - loss: 0.4408 - acc: 0.8000 - val_loss: 0.3922 - val_acc: 0.8300
Epoch 4/10
100/100 [==============================] - 0s 16us/step - loss: 0.3922 - acc: 0.8300 - val_loss: 0.3659 - val_acc: 0.8400
Epoch 5/10
100/100 [==============================] - 0s 17us/step - loss: 0.3659 - acc: 0.8400 - val_loss: 0.3483 - val_acc: 0.8500
Epoch 6/10
100/100 [==============================] - 0s 16us/step - loss: 0.3483 - acc: 0.8500 - val_loss: 0.3356 - val_acc: 0.8600
Epoch 7/10
100/100 [==============================] - 0s 17us/step - loss: 0.3356 - acc: 0.8600 - val_loss: 0.3260 - val_acc: 0.8600
Epoch 8/10
100/100 [==============================] - 0s 18us/step - loss: 0.3260 - acc: 0.8600 - val_loss: 0.3186 - val_acc: 0.8600
Epoch 9/10
100/100 [==============================] - 0s 18us/step - loss: 0.3186 - acc: 0.8600 - val_loss: 0.3127 - val_acc: 0.8700
Epoch 10/10
100/100 [==============================] - 0s 23us/step - loss: 0.3127 - acc: 0.8700 - val_loss: 0.3079 - val_acc: 0.8800
The results show that val_loss and loss are not the same at the end of each epoch, and also acc and val_acc are not exactly the same. However, based on this setup, one would expect them to be the same.
I have been going through the code in keras, particularly this part:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/keras/engine/training.py#L1364
and so far, all I can say is that the difference comes from some difference in how the two are computed through the computation graph.
Does anyone have any idea why there would be such a difference?
So after looking more closely at the results, the loss and acc values from the training step are computed BEFORE the current batch is used to update the model.
Thus, in the case of a single batch per epoch, the train acc and loss are evaluated when the batch is fed in, then the model parameters are updated based on the provided optimizer. After the train step is finished, we compute loss and accuracy by feeding in the validation data, which is now evaluated using a new updated model.
This is evident from the training output, where the validation accuracy and loss in epoch 1 are equal to the training accuracy and loss in epoch 2, and so on.
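The same one-epoch lag can be reproduced without Keras. The following is a minimal NumPy sketch of full-batch logistic regression, written for this illustration only (it is not the Keras training loop): the "training" loss is recorded before the weight update and the "validation" loss after it, on the same data.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def bce(y, p):
    """Mean binary cross-entropy, with clipping for numerical safety."""
    p = np.clip(p, 1e-7, 1 - 1e-7)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))


rng = np.random.RandomState(0)
N = 100
x = np.concatenate([rng.randn(N // 2), rng.randn(N // 2) + 2])
y = np.concatenate([np.zeros(N // 2), np.ones(N // 2)])

w, b, lr = 0.0, 0.0, 2.0
train_losses, val_losses = [], []
for epoch in range(5):
    p = sigmoid(w * x + b)          # forward pass with the current weights
    train_losses.append(bce(y, p))  # "training" loss: measured before the update
    grad = p - y                    # per-sample gradient of the loss w.r.t. the logit
    w -= lr * np.mean(grad * x)     # one full-batch gradient step
    b -= lr * np.mean(grad)
    val_losses.append(bce(y, sigmoid(w * x + b)))  # "validation" loss: after the update

for e in range(5):
    print("epoch {}: loss {:.4f} - val_loss {:.4f}".format(e + 1, train_losses[e], val_losses[e]))
```

Each epoch's loss matches the previous epoch's val_loss, which is the pattern visible in the Keras output above.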
A quick check using tensorflow confirmed that values are fetched before variables are updated:
import tensorflow as tf
import numpy as np
np.random.seed(1)
x = tf.placeholder(dtype=tf.float32, shape=(None, 1), name="x")
y = tf.placeholder(dtype=tf.float32, shape=(None), name="y")
W = tf.get_variable(name="W", shape=(1, 1), dtype=tf.float32, initializer=tf.constant_initializer(0))
b = tf.get_variable(name="b", shape=1, dtype=tf.float32, initializer=tf.constant_initializer(0))
z = tf.matmul(x, W) + b
error = tf.square(z - y)
obj = tf.reduce_mean(error, name="obj")
opt = tf.train.MomentumOptimizer(learning_rate=0.025, momentum=0.9)
grads = opt.compute_gradients(obj)
train_step = opt.apply_gradients(grads)
N = 100
x_np = np.random.randn(N).reshape(-1, 1)
y_np = 2*x_np + 3 + np.random.randn(N)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(2):
        res = sess.run([obj, W, b, train_step], feed_dict={x: x_np, y: y_np})
        print('MSE: {}, W: {}, b: {}'.format(res[0], res[1][0, 0], res[2][0]))
Output:
MSE: 14.721437454223633, W: 0.0, b: 0.0
MSE: 13.372591018676758, W: 0.08826743811368942, b: 0.1636980175971985
Since the parameters W and b were initialized to 0, it is clear that the values fetched in the first run are still 0, even though the same session call also requested the gradient update.
