Addressing Saddle Points in Keras Model Training

My Keras model seems to have hit a saddle point in its training. Of course, this is just an assumption; I'm not really sure. In any case, the loss stops at .0025 and nothing I have tried has reduced it any further.
What I have tried so far:
Using Adam and RMSProp, with and without cyclical learning rates. The result is that the loss starts at .0989 and stays there. The learning rates for cyclical learning were .001 to .1.
After 4 or 5 epochs of the loss not moving, I tried SGD instead, and the loss steadily declined to .0025. This is where the loss stalls out. After about 5 epochs of no change, I tried SGD with cyclical learning rates enabled, hoping the loss would decrease, but I got the same result.
I have tried increasing the network capacity (as well as decreasing it), thinking the network might have hit its learning limit. I increased all 4 dense layers to 4096 units. That didn't change anything.
I've tried different batch sizes.
The most epochs I have trained the network for is 7. However, for 6 of those epochs neither the loss nor the validation loss changed. Do I need to train for more epochs, or could it be that .0025 is not a saddle point but the global minimum for my dataset? I would think there is more room for improvement. I tested the predictions of the network at .0025 and they aren't that great.
Any advice on how to continue? My code is below.
To start, my Keras model is similar in style to VGG-16:
# imports
# Install TensorFlow Addons first, from a shell or notebook cell:
#   pip install -q -U tensorflow_addons
import tensorflow_addons as tfa
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
def get_model(input_shape):
    inputs = keras.Input(shape=input_shape)
    # block 1: two 64-filter convolutions, each fed by the previous layer
    x = layers.Conv2D(filters=64, kernel_size=(3, 3), activation='relu', padding="same")(inputs)
    x = layers.Conv2D(filters=64, kernel_size=(3, 3), activation='relu', padding="same")(x)
    x = layers.MaxPooling2D(pool_size=(2, 2), strides=None, padding="same")(x)
    # block 2: two 128-filter convolutions
    x = layers.Conv2D(filters=128, kernel_size=(3, 3), activation='relu', padding="same")(x)
    x = layers.Conv2D(filters=128, kernel_size=(3, 3), activation='relu', padding="same")(x)
    x = layers.MaxPooling2D(pool_size=(2, 2), strides=None, padding="same")(x)
    # block 3: four 256-filter convolutions
    x = layers.Conv2D(filters=256, kernel_size=(3, 3), activation='relu', padding="same")(x)
    x = layers.Conv2D(filters=256, kernel_size=(3, 3), activation='relu', padding="same")(x)
    x = layers.Conv2D(filters=256, kernel_size=(3, 3), activation='relu', padding="same")(x)
    x = layers.Conv2D(filters=256, kernel_size=(3, 3), activation='relu', padding="same")(x)
    x = layers.MaxPooling2D(pool_size=(2, 2), strides=None, padding="same")(x)
    # block 4: four 512-filter convolutions
    x = layers.Conv2D(filters=512, kernel_size=(3, 3), activation='relu', padding="same")(x)
    x = layers.Conv2D(filters=512, kernel_size=(3, 3), activation='relu', padding="same")(x)
    x = layers.Conv2D(filters=512, kernel_size=(3, 3), activation='relu', padding="same")(x)
    x = layers.Conv2D(filters=512, kernel_size=(3, 3), activation='relu', padding="same")(x)
    x = layers.MaxPooling2D(pool_size=(2, 2), strides=None, padding="same")(x)
    # dense head with 9 sigmoid outputs
    x = layers.Flatten()(x)
    x = layers.Dense(4096, activation='relu')(x)
    x = layers.Dense(2048, activation='relu')(x)
    x = layers.Dense(1024, activation='relu')(x)
    x = layers.Dense(512, activation='relu')(x)
    output = layers.Dense(9, activation='sigmoid')(x)
    return keras.models.Model(inputs=inputs, outputs=output)
# define learning rate range
lr_range = [.001, .1]
epochs = 100
batch_size = 32
# based on https://www.tensorflow.org/addons/tutorials/optimizers_cyclicallearningrate
steps_per_epoch = len(training_data) // batch_size
clr = tfa.optimizers.CyclicalLearningRate(
    initial_learning_rate=lr_range[0],
    maximal_learning_rate=lr_range[1],
    scale_fn=lambda x: 1 / (2. ** (x - 1)),
    step_size=2 * steps_per_epoch,
)
optimizer = tf.keras.optimizers.Adam(clr)
model = get_model((224, 224, 3))
model.compile(optimizer=optimizer, loss='mean_squared_error')
# used tf.data.Dataset objects for model input; Keras expects a dataset to be
# batched already, so batch_size is not passed to fit()
model.fit(train_ds, validation_data=valid_ds, epochs=epochs)
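For reference, the schedule configured above can be traced by hand. The following is a minimal sketch of the triangular cyclical policy with a per-cycle scale function, which is, as far as I can tell, what tfa.optimizers.CyclicalLearningRate computes. The value steps_per_epoch = 100 is made up, since the question does not give the dataset size, and cyclical_lr is a hypothetical helper name.

```python
import math

def cyclical_lr(step, initial_lr, max_lr, step_size, scale_fn):
    """Triangular cyclical learning rate with a per-cycle scale factor."""
    # Index of the current cycle (1-based); one cycle is 2 * step_size steps.
    cycle = math.floor(1 + step / (2 * step_size))
    # Distance from the peak of the current cycle, in units of step_size.
    x = abs(step / step_size - 2 * cycle + 1)
    return initial_lr + (max_lr - initial_lr) * max(0.0, 1 - x) * scale_fn(cycle)

steps_per_epoch = 100            # hypothetical; the real value depends on the dataset
step_size = 2 * steps_per_epoch  # same relation as in the question
scale_fn = lambda c: 1 / (2.0 ** (c - 1))  # halves the peak on every new cycle

for step in (0, 100, 200, 400, 600):
    lr = cyclical_lr(step, 0.001, 0.1, step_size, scale_fn)
    print(f"step {step:4d}: lr = {lr:.4f}")
```

With these settings the rate climbs to .1 within the first two epochs, which is a hundred times Adam's default of .001. That may be worth ruling out as a reason why Adam and RMSProp sit at .0989 while plain SGD makes progress.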

Related

Why can't I overfit a convolutional autoencoder on one image?

I have this convolutional autoencoder:
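# imports assumed here; the original post does not show them
from tensorflow.keras import layers, Model
inputs = layers.Input(shape=(384, 128, 3))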
# coder
x = layers.Conv2D(8, (3, 3), activation=layers.LeakyReLU(alpha=0.1), padding="same")(inputs)
x = layers.MaxPooling2D((2, 2), padding="same")(x)
x = layers.Conv2D(16, (3, 3), activation=layers.LeakyReLU(alpha=0.1), padding="same")(x)
x = layers.MaxPooling2D((2, 2), padding="same")(x)
x = layers.Conv2D(32, (3, 3), activation=layers.LeakyReLU(alpha=0.1), padding="same")(x)
x = layers.MaxPooling2D((2, 2), padding="same")(x)
x = layers.Conv2D(64, (3, 3), activation=layers.LeakyReLU(alpha=0.1), padding="same")(x)
x = layers.MaxPooling2D((2, 2), padding="same")(x)
# decoder
x = layers.Conv2DTranspose(64, (3, 3), strides=2, activation=layers.LeakyReLU(alpha=0.1), padding="same")(x)
x = layers.Conv2DTranspose(32, (3, 3), strides=2, activation=layers.LeakyReLU(alpha=0.1), padding="same")(x)
x = layers.Conv2DTranspose(16, (3, 3), strides=2, activation=layers.LeakyReLU(alpha=0.1), padding="same")(x)
x = layers.Conv2DTranspose(8, (3, 3), strides=2, activation=layers.LeakyReLU(alpha=0.1), padding="same")(x)
x = layers.Conv2D(3, (3, 3), activation="sigmoid", padding="same")(x)
# autoencoder
autoencoder = Model(inputs, x)
autoencoder.compile(optimizer="adam", loss="binary_crossentropy", metrics=['accuracy'])
autoencoder.summary()
I have the following image:
I am trying to overfit my model on this single image, but I cannot get the loss lower than ~0.42, and the accuracy stays around ~0.79. The reconstruction is pretty decent (the image looks more or less the same, but blurry and without details), yet the loss doesn't fall and the accuracy doesn't rise. I tried a few things (data preprocessing, augmentation, using leaky ReLU instead of normal ReLU), but not a larger model (training would take a long time, and I am not sure that is the solution).
How can I overfit this model, increase the accuracy and lower the loss? Is this image too much for this model (I am still missing intuition on how large models should be for a specific type of data)?
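One thing that may be worth checking before enlarging the model: with binary_crossentropy and pixel targets that are not exactly 0 or 1, the loss has a floor above zero even for a perfect reconstruction, because its minimum is the entropy of the targets. The sketch below illustrates this with NumPy on random pixel values, which are a stand-in and not the image from the question.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean per-pixel binary cross-entropy, with predictions clipped away from 0 and 1."""
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return float(np.mean(-(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))))

rng = np.random.default_rng(0)
# Stand-in "image" with continuous pixel values in [0, 1)
target = rng.uniform(0.0, 1.0, size=(384, 128, 3))

# Loss when the prediction equals the target exactly
perfect = binary_crossentropy(target, target)
# Loss when every pixel is predicted as the mean intensity
constant = binary_crossentropy(target, np.full_like(target, target.mean()))

print("loss of a perfect reconstruction:", perfect)
print("loss of a constant prediction:   ", constant)
```

The first number is well above zero, so a loss around 0.42 does not by itself show that the model failed to memorize the image. Comparing the reconstruction with a pixel-wise error such as mean squared error may be more informative here.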

Adding convolution layers on top of inception V3 model

I need to train an image classifier using the Inception V3 model from Keras. The images pass through 5 Conv2D layers and 2 MaxPool2D layers before entering the pre-trained Inception V3 model. However, my code gives me this error: ValueError: Depth of input (64) is not a multiple of input depth of filter (3) for 'inception_v3_4/conv2d_123/convolution' (op: 'Conv2D') with input shapes: [?,2,2,224], [3,3,3,32]
I reckon the output shape of my previous layers is not compatible with the input shape required by Inception, but I am not able to fix it, and I don't know whether it is even possible to fix. I am a beginner in machine learning, and any light on this matter would be greatly appreciated.
My code is as follows:
inception_model = inception_v3.InceptionV3(weights='imagenet', include_top = False)
for layer in inception_model.layers:
    layer.trainable = False
input_layer = Input(shape=(224,224,3)) #Image resolution is 224x224 pixels
x = Conv2D(128, (7, 7), padding='same', activation='relu', strides=(2, 2))(input_layer)
x = Conv2D(128, (7, 7), padding='same', activation='relu', strides=(2, 2))(x)
x = Conv2D(64, (7, 7), padding='same', activation='relu', strides=(2, 2))(x)
x = MaxPool2D((3, 3), padding='same',strides=(2, 2))(x)
x = Conv2D(64, (7, 7), padding='same', activation='relu', strides=(2, 2))(x)
x = Conv2D(64, (7, 7), padding='same', activation='relu', strides=(2, 2))(x)
x = MaxPool2D((4, 4), padding='same', strides=(2, 2))(x)
x = inception_model (x) #Error in this line
x = GlobalAveragePooling2D()(x)
predictions = Dense(11, activation='softmax')(x) #I have 11 classes of image to classify
model = Model(inputs = input_layer, outputs=predictions)
model.compile(optimizer=Adam(lr=0.001), loss='categorical_crossentropy', metrics=['acc'])
model.summary()

As @CAFEBABE said, this is unlikely to be useful, because the feature map passed to Inception can have only 3 channels. If you still want to try it, you can do this:
x = Conv2D(3, (7, 7), padding='same', activation='relu', strides=(2, 2))(input_layer)
Keep in mind that you cannot stack 5 Conv2D and 2 MaxPooling layers in front as you did above: the Inception model has its own Conv2D and max-pooling layers, and together they shrink the spatial dimensions until they go negative and raise an error. I tried 2 Conv2D layers and got an error, so you can use at most 1.
Also, specify the input shape when you create the InceptionV3 model.
input_layer = Input(shape=(224,224,3)) #Image resolution is 224x224 pixels
# 3 filters, so the output has the 3 channels the ImageNet weights require
x = Conv2D(3, (7, 7), padding='same', activation='relu', strides=(2, 2))(input_layer)
# drop the batch dimension: input_shape is (height, width, channels)
inception_model = tf.keras.applications.InceptionV3(weights='imagenet', include_top = False, input_shape=tuple(x.shape[1:]))
for layer in inception_model.layers:
    layer.trainable = False
x = inception_model (x)
x = GlobalAveragePooling2D()(x)
predictions = Dense(11, activation='softmax')(x) #I have 11 classes of image to classify
model = Model(inputs = input_layer, outputs=predictions)
model.compile(optimizer=Adam(learning_rate=0.001), loss='categorical_crossentropy', metrics=['acc'])
model.summary()
This should work, but I doubt it will help the model. Still, try it; who knows what will happen.
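To see why the original stack fails, it helps to trace the spatial size through the layers in the question. With padding='same', a layer with stride s maps a side of length n to ceil(n / s). The following sketch does that arithmetic in plain Python; same_padding_output is a made-up helper name, and the 75-pixel minimum is the limit documented for the input_shape of Keras' InceptionV3.

```python
import math

def same_padding_output(size, stride):
    """Output side length of a conv/pool layer with padding='same'."""
    return math.ceil(size / stride)

# The seven stride-2 layers from the question, in order
layers_in_question = [
    ("Conv2D(128)", 2),
    ("Conv2D(128)", 2),
    ("Conv2D(64)", 2),
    ("MaxPool2D", 2),
    ("Conv2D(64)", 2),
    ("Conv2D(64)", 2),
    ("MaxPool2D", 2),
]

INCEPTION_MIN_SIDE = 75  # smallest width/height InceptionV3 accepts

size = 224
trace = []
for name, stride in layers_in_question:
    size = same_padding_output(size, stride)
    trace.append(size)
    status = "ok" if size >= INCEPTION_MIN_SIDE else "too small for InceptionV3"
    print(f"{name:12s} -> {size:3d} x {size:<3d} ({status})")
```

Only the first layer leaves a feature map that is still large enough, which matches the observation above that at most one stride-2 convolution can be placed in front of the model.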

How to convert keras sequential API to functional API

I am new to deep learning, and am trying to convert this sequential API into a functional API to run on the CIFAR 10 dataset. Below is the sequential API:
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10))
And here is my attempt at converting this into the functional API:
model_input = Input(shape=input_shape)
x = Conv2D(32, (3, 3), activation='relu',padding='valid')(model_input)
x = MaxPooling2D((2,2))(x)
x = Conv2D(32, (3, 3), activation='relu')(x)
x = MaxPooling2D((2,2))(x)
x = Conv2D(32, (3, 3))(x)
x = GlobalAveragePooling2D()(x)
x = Activation(activation='softmax')(x)
model = Model(model_input, x, name='nin_cnn')
x = layers.Flatten()
x = layers.Dense(64, activation='relu')
x = layers.Dense(10)
Here is the compile and train code:
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
history = model.fit(train_images, train_labels, epochs=10,
                    validation_data=(test_images, test_labels))
The original sequential model gets an accuracy of 0.7175999879837036, while the functional model gets an accuracy of 0.0502999983727932. I'm not sure where I went wrong when rewriting the code; any help would be appreciated. Thanks.

Your two models are not the same. The second and third convolutional layers have 64 filters in the sequential model but 32 in the functional model. You also did not include the fully connected layers in your functional model (you created those layers only after you constructed the model, and never connected them to it).
If in doubt in the future, you can run
model.summary()
on both models and compare the output to see whether they are the same.
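The comparison that model.summary() makes visible can also be done by hand. The sketch below counts the trainable parameters of both models from the question in plain Python, using the usual formulas for Conv2D, (kh * kw * c_in + 1) * filters, and Dense, (n_in + 1) * units. The helper names are made up for illustration.

```python
def conv2d_params(kernel, c_in, filters):
    """Weights plus biases of a Conv2D layer."""
    kh, kw = kernel
    return (kh * kw * c_in + 1) * filters

def dense_params(n_in, units):
    """Weights plus biases of a Dense layer."""
    return (n_in + 1) * units

def valid_conv(size, kernel=3):
    """Output side of a stride-1 convolution with padding='valid'."""
    return size - kernel + 1

def max_pool(size, pool=2):
    """Output side of a max-pooling layer with default strides."""
    return size // pool

# Sequential model: Conv(32) -> pool -> Conv(64) -> pool -> Conv(64) -> Flatten -> Dense(64) -> Dense(10)
side = 32
side = max_pool(valid_conv(side))  # 32 -> 30 -> 15
side = max_pool(valid_conv(side))  # 15 -> 13 -> 6
side = valid_conv(side)            # 6 -> 4
flat = side * side * 64            # units entering the first Dense layer

sequential_total = (
    conv2d_params((3, 3), 3, 32)
    + conv2d_params((3, 3), 32, 64)
    + conv2d_params((3, 3), 64, 64)
    + dense_params(flat, 64)
    + dense_params(64, 10)
)

# Functional attempt: Conv(32) -> pool -> Conv(32) -> pool -> Conv(32) -> GlobalAveragePooling -> softmax
# (pooling and activation layers add no weights; the Dense layers were never connected)
functional_total = (
    conv2d_params((3, 3), 3, 32)
    + conv2d_params((3, 3), 32, 32)
    + conv2d_params((3, 3), 32, 32)
)

print("sequential parameters:", sequential_total)
print("functional parameters:", functional_total)
```

The two totals differ by a wide margin, which is the kind of mismatch the summary output would show at a glance.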

In addition to what @adrtam mentioned, I want to add a few more points, since the asker is a beginner.
There are a couple of important differences between Sequential and Functional models. A Sequential model is single-input and single-output, and its layers are stacked one after another. The Functional API is more flexible and allows customization such as multiple inputs or outputs. So, in a way, we can say Sequential is a subset of the Functional API.
Coming to your case, here is the Sequential model:
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras import models
input_shape=(32, 32, 3)
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=input_shape))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10))
Here is the Functional model:
from tensorflow.keras import Model
from tensorflow.keras import layers
model_input = layers.Input(shape=input_shape)
x = layers.Conv2D(32, (3, 3), activation='relu',padding='valid')(model_input)
x = layers.MaxPooling2D((2,2))(x)
x = layers.Conv2D(64, (3, 3), activation='relu')(x)
x = layers.MaxPooling2D((2,2))(x)
x = layers.Conv2D(64, (3, 3), activation='relu')(x)
x = layers.Flatten()(x)
x = layers.Dense(64, activation='relu')(x)
x = layers.Dense(10)(x)
model2 = Model(model_input, x, name='nin_cnn')

Keras - Prediction accuracy of training data is worse?

I've trained my model and saved it to an .hdf5 file. (Training and validation accuracy are about 0.9.)
Below is my accuracy curve.
[Figure: training curve]
Because my data is imbalanced, I used SMOTE to oversample it and then split it into training and validation data.
sm = SMOTE(random_state=42)
X_resampled, y_resampled = sm.fit_resample(X, Y)
X_resampled = X_resampled.reshape(X_resampled.shape[0],128,128,3)
X_tr, X_tst, y_tr, y_tst = train_test_split(X_resampled, y_resampled, test_size=0.33,random_state=22)
And below is my model structure.
# hyperparameters (defined first, because the model below uses img_size and nb_classes)
img_size = 128
nb_classes = 6
batch_size = 256
nb_epoch=1000
savedModelName = 'M.hdf5'
lr = 0.00001
image_input = Input(shape=(img_size, img_size, 3))
conv_1 = Conv2D(64, (5, 5), padding='same',
                input_shape=(img_size, img_size, 3), activation='relu')(image_input)
drop_2 = Dropout(0.4)(conv_1)
conv_3 = Conv2D(64, (3, 3), padding='same', activation='relu')(drop_2)
drop_4 = Dropout(0.4)(conv_3)
max_5 = MaxPooling2D(pool_size=(2, 2))(drop_4)
conv_6 = Conv2D(32, (5, 5), padding='same', activation='relu')(max_5)
drop_7 = Dropout(0.4)(conv_6)
conv_8 = Conv2D(32, (3, 3), padding='same', activation='relu')(drop_7)
drop_9 = Dropout(0.4)(conv_8)
max_10= MaxPooling2D(pool_size=(2, 2))(drop_9)
conv_11 = Conv2D(32, (5, 5), padding='same', activation='relu')(max_10)
drop_12 = Dropout(0.4)(conv_11)
conv_13 = Conv2D(32, (3, 3), padding='same', activation='relu')(drop_12)
drop_14 = Dropout(0.4)(conv_13)
max_15= MaxPooling2D(pool_size=(2, 2))(drop_14)
flat_16 = Flatten()(max_15)
den_17= Dense(8,activation='relu')(flat_16)
output = Dense(nb_classes, activation='softmax')(den_17)
After training, I saved the model (using ModelCheckpoint with save_best_only, monitoring validation accuracy).
Then I used it to predict on the "same" data (same random_state).
sm = SMOTE(random_state=42)
X_resampled, y_resampled = sm.fit_resample(X, Y)
X_resampled = X_resampled.reshape(X_resampled.shape[0], 128, 128, 3)
X_tr, X_tst, y_tr, y_tst = train_test_split(X_resampled, y_resampled, test_size=0.33,random_state=22)
But the prediction accuracy I get is only about 0.3.
Why? Shouldn't it be 0.9?

Could you provide the code you used to fit the model?
Also, what happens if you predict on the first test set with your model? Could you provide a confusion matrix or the precision/recall values?
My first guess would be that your model is overfitting or not really learning.
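A confusion matrix and per-class precision/recall of the kind asked for above can be computed with a few lines of NumPy. This is a minimal sketch on made-up labels; y_true and y_pred are placeholders for the integer class labels and for the argmax of the model's softmax output.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)  # count every (true, predicted) pair
    return cm

def precision_recall(cm):
    """Per-class precision and recall; classes with no samples get 0."""
    tp = np.diag(cm).astype(float)
    predicted = cm.sum(axis=0)  # how often each class was predicted
    actual = cm.sum(axis=1)     # how often each class really occurs
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    return precision, recall

# Made-up example with 3 classes
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])

cm = confusion_matrix(y_true, y_pred, n_classes=3)
precision, recall = precision_recall(cm)
print(cm)
print("precision:", precision)
print("recall:   ", recall)
```

If the labels are one-hot encoded, both the labels and the predictions would first need np.argmax(..., axis=1). A mismatch in label encoding between training and evaluation is one possible cause of an accuracy gap like the one described, though that is only a guess without the fitting code.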

Keras: model.fit() getting error for multiple inputs in siamese_model

I am new to Keras and the Siamese network architecture. I have developed a Siamese network with three inputs and one output as follows.
def get_siamese_model(input_shape):
    # Define the tensors for the three input phrases
    anchor = Input(input_shape, name='anchor')
    positive = Input(input_shape, name='positive')
    negative = Input(input_shape, name='negative')
    # Convolutional Neural Network
    model = Sequential()
    model.add(Conv2D(64, kernel_size=(2, 2), activation='relu', input_shape=input_shape, padding='same'))
    model.add(Conv2D(32, kernel_size=(2, 2), activation='relu', padding='same'))
    model.add(Conv2D(16, kernel_size=(2, 2), activation='relu', padding='same'))
    model.add(Conv2D(8, kernel_size=(2, 2), activation='relu', padding='same'))
    model.add(Conv2D(4, kernel_size=(2, 2), activation='relu', padding='same'))
    model.add(Conv2D(2, kernel_size=(2, 2), activation='relu', padding='same'))
    model.add(Conv2D(1, kernel_size=(2, 2), activation='relu', padding='same'))
    model.add(MaxPooling2D(pool_size=(2,1)))
    model.add(Flatten())
    # Generate the encodings (feature vectors) for the three phrases
    anchor_out = model(anchor)
    positive_out = model(positive)
    negative_out = model(negative)
    # Add a customized layer to combine individual output
    concat = Lambda(lambda tensors:K.concatenate((tensors[0],tensors[1],tensors[2]),0))
    output = concat([anchor_out, positive_out, negative_out])
    # Connect the inputs with the outputs
    siamese_net = Model(inputs=[anchor,positive,negative],outputs=output)
    #plot the model
    plot_model(siamese_net, to_file='siamese_net.png',show_shapes=True, show_layer_names=True)
    #Error optimization
    siamese_net.compile(optimizer=Adam(),
                        loss=triplet_loss)
    # return the model
    return siamese_net
To call model.fit(), I have written the following code:
model = get_siamese_model(input_shape)
X = {
    'anchor' : anchor,
    'positive' : positive,
    'negative' : negative
}
model.fit(np.asarray(X), Y)
I am getting the following error message:
ValueError: Error when checking model input:
The list of Numpy arrays that you are passing to your model is not the size the model expected.
Expected to see 3 array(s), but instead got the following list of 1 arrays: [array({'anchor': array([[[[ 4.49218750e-02]...
Any help is appreciated. Thank you in advance.

The following code works for me. Because your names are (anchor, positive, negative), you can use those directly as the keys to your dictionary when passing input. Also, you should make use of the concatenate layer in Keras instead of defining a Lambda. Note that I changed the loss for purposes of this example.
from keras.layers import Input, Conv2D, MaxPooling2D, Flatten, concatenate
from keras.models import Model, Sequential
from keras.optimizers import Adam
from keras.losses import mean_squared_error
import numpy as np
def get_siamese_model(input_shape):
    # Define the tensors for the three input phrases
    anchor = Input(input_shape, name='anchor')
    positive = Input(input_shape, name='positive')
    negative = Input(input_shape, name='negative')
    # Convolutional Neural Network
    model = Sequential()
    model.add(Conv2D(64, kernel_size=(2, 2), activation='relu', input_shape=input_shape, padding='same'))
    model.add(Conv2D(32, kernel_size=(2, 2), activation='relu', padding='same'))
    model.add(Conv2D(16, kernel_size=(2, 2), activation='relu', padding='same'))
    model.add(Conv2D(8, kernel_size=(2, 2), activation='relu', padding='same'))
    model.add(Conv2D(4, kernel_size=(2, 2), activation='relu', padding='same'))
    model.add(Conv2D(2, kernel_size=(2, 2), activation='relu', padding='same'))
    model.add(Conv2D(1, kernel_size=(2, 2), activation='relu', padding='same'))
    model.add(MaxPooling2D(pool_size=(2,1)))
    model.add(Flatten())
    # Generate the encodings (feature vectors) for the three phrases
    anchor_out = model(anchor)
    positive_out = model(positive)
    negative_out = model(negative)
    # Add a concatenate layer
    output = concatenate([anchor_out, positive_out, negative_out])
    # Connect the inputs with the outputs
    siamese_net = Model(inputs=[anchor,positive,negative],outputs=output)
    # Error optimization
    siamese_net.compile(optimizer=Adam(), loss=mean_squared_error)
    # Summarize model
    siamese_net.summary()
    # Return the model
    return siamese_net
input_shape = (100, 100, 1)
model = get_siamese_model(input_shape)
X = {'anchor': np.ones((5, 100, 100, 1)), # define input as dictionary
'positive': np.ones((5, 100, 100, 1)),
'negative': np.ones((5, 100, 100, 1))}
Y = np.ones((5, 15000))
model.fit(X, Y) # use a dictionary
model.fit([i for i in X.values()], Y) # use a list
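The error message in the question follows from how NumPy treats a dictionary: np.asarray wraps the whole dict in a single zero-dimensional object array, so Keras sees one array instead of three. A small sketch with dummy data, using the same 5 x 100 x 100 x 1 shapes as the example above:

```python
import numpy as np

# Dummy inputs keyed by the names of the three Input layers
X = {name: np.ones((5, 100, 100, 1)) for name in ("anchor", "positive", "negative")}

# What the question passed to fit(): one 0-d object array that holds the dict
wrapped = np.asarray(X)
print(wrapped.shape, wrapped.dtype)

# What the model expects: the dict itself, or a list with one array per input
as_list = list(X.values())
print(len(as_list), as_list[0].shape)

# Width of the target used in the example: the (2, 1) max-pooling halves the
# height, Flatten gives height/2 * width * channels features per branch, and
# the three branches are concatenated along the feature axis.
height, width, channels = 100, 100, 1
features_per_branch = (height // 2) * width * channels
target_width = 3 * features_per_branch
print(target_width)
```

The last number is where the second dimension of Y in the example comes from. With a real triplet loss the target would be shaped differently, so treat that part as specific to the mean-squared-error stand-in.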
