Different overfitting for three models with the same structure

I want to design three models that have the same structure, but at the end one of them should overfit severely, another should overfit less, and the last one should not overfit at all.
The idea is that I want to see how much information exists in the last layer of each model for some test data. Let's say I'm using the MNIST dataset as the training and test set, and the structure of all models should be like this:
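from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Network architecture
network = Sequential()
# First hidden layer (receives the flattened 28x28 image)
network.add(Dense(512, activation='relu', input_shape=(28*28,)))
# Second hidden layer, named so its activations can be extracted later
network.add(Dense(64, activation='relu', name='features'))
# Output layer: one probability per digit class
network.add(Dense(10, activation='softmax'))
network.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# Train the model (train_img: flattened images, train_label: one-hot labels)
history = network.fit(train_img, train_label, epochs=50, batch_size=256, validation_split=0.2)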
So the question is: how should I change the training of this model to get three models with different degrees of overfitting?
I'm new to machine learning and I hope I have explained my question as well as possible.
Thanks in advance

Overfit:
The MNIST dataset is rather simple, so it should be easy to overfit with the model you are suggesting. Increase the number of epochs: eventually, your model will memorize the training data very well. If you struggle to overfit the data, you might need a more complex network - but I doubt that this will be the case.
Just right:
Probably the easiest way to obtain a model that is just right (neither overfit nor underfit) is to use a callback. Specifically, we can use early stopping: the callback stops training once the validation loss stops improving. For your code, all you have to do is modify the training as follows:
First define a callback
import tensorflow as tf
callback_es = tf.keras.callbacks.EarlyStopping(monitor='val_loss')
Add the callback to your training
history = network.fit(train_img, train_label, epochs=50, batch_size=256, validation_split=0.2, callbacks=[callback_es])
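To make the callback's behaviour concrete, here is a simplified, pure-Python sketch of patience-based early stopping on val_loss. It is not the Keras source; the function name `early_stopping_epoch` and the loss values are made up for illustration. With patience=0 (the Keras default), it stops at the first epoch that does not improve.

```python
def early_stopping_epoch(val_losses, patience=0, min_delta=0.0):
    """Return (stop_epoch, best_epoch), both 0-based, for per-epoch val_loss values.

    Simplified sketch (not the Keras source): training stops after `patience`
    consecutive epochs in which val_loss does not improve on the best value so
    far by more than `min_delta`; with patience=0 it stops at the first such epoch.
    """
    best = float("inf")
    best_epoch = 0
    wait = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            # Improvement: remember it and reset the counter.
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Never triggered: training runs for all epochs.
    return len(val_losses) - 1, best_epoch


# Hypothetical validation losses: improve, plateau for three epochs, then improve again.
val_losses = [0.50, 0.40, 0.35, 0.36, 0.37, 0.38, 0.30]
print(early_stopping_epoch(val_losses, patience=0))
print(early_stopping_epoch(val_losses, patience=3))
print(early_stopping_epoch(val_losses, patience=5))
```

With patience=0 the sketch stops at epoch 3 (0-based) even though the loss would have improved later; a larger patience lets training ride out short plateaus. Keras' EarlyStopping exposes the same idea through its `patience` and `min_delta` arguments, and `restore_best_weights=True` rolls the model back to the weights of the best epoch.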
Underfit:
The idea is similar to overfitting, except that here you want to stop training early on. Train your model for only a limited number of epochs. If you find that your model overfits too quickly, try lowering the learning rate.
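To check that the three models really differ, one option (an assumption on my part, not something Keras computes for you) is to compare the per-epoch losses that `fit` returns in `history.history`. The helper name `overfitting_report` and the histories below are hypothetical; with a real run you would call `overfitting_report(history.history)`.

```python
def overfitting_report(history):
    """Summarise overfitting from a Keras-style history dict.

    Assumes the layout of `History.history`: per-epoch lists under the keys
    'loss' and 'val_loss'.
    """
    loss, val_loss = history["loss"], history["val_loss"]
    best = min(range(len(val_loss)), key=val_loss.__getitem__)  # 0-based epoch of lowest val_loss
    return {
        "best_epoch": best + 1,                          # 1-based, as Keras prints epochs
        "final_gap": val_loss[-1] - loss[-1],            # large positive gap: memorising
        "val_loss_rise": val_loss[-1] - val_loss[best],  # > 0 once validation loss turns upward
    }


# Hypothetical histories for two of the three models.
overfit_run = {"loss": [0.30, 0.10, 0.02, 0.005], "val_loss": [0.20, 0.12, 0.15, 0.25]}
early_stopped_run = {"loss": [0.40, 0.30, 0.25], "val_loss": [0.42, 0.33, 0.29]}
for name, h in [("overfit", overfit_run), ("early-stopped", early_stopped_run)]:
    print(name, overfitting_report(h))
```

A strongly overfit model shows a large final gap and a validation loss that has risen well above its minimum; an early-stopped model should have its best epoch at or near the end.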

Related

Why doesn't Keras' ModelCheckpoint save my best model with the highest validation accuracy during training?

I am training a ResNet18 with Keras. As shown below, I used ModelCheckpoint to save the best model based on the validation accuracy.
import tensorflow as tf
from tensorflow.keras.callbacks import ModelCheckpoint
# ResNet18 and generator() are the asker's own definitions (not shown)
model = ResNet18(2)
model.build(input_shape=(None, 128, 128, 3))
model.summary()
model.save_weights('./Adam_resnet18_original.hdf5')
opt = tf.keras.optimizers.Adam(learning_rate=0.001)
model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])
mcp_save = ModelCheckpoint('Adam_resnet18_weights.hdf5', save_best_only=True, monitor='val_accuracy', mode='max')
batch_size = 128
model.fit(generator(batch_size, x_train, y_train),
          steps_per_epoch=len(x_train) // batch_size,
          validation_data=generator(batch_size, x_valid, y_valid),
          validation_steps=len(x_valid) // batch_size,
          callbacks=[mcp_save], epochs=300)
As shown in the plot below, the validation accuracy reached 0.8281 during training.
[Figure: Training History]
However, when I loaded the saved weights and computed the final validation accuracy with the code below, I got only 0.78109. Can anybody explain what the problem might be? Thanks a lot!
import numpy as np
from sklearn.metrics import accuracy_score
model.load_weights('Adam_resnet18_weights.hdf5')
predictions_validation = model.predict(generator(batch_size, x_valid, y_valid), steps=len(x_valid) // batch_size + 1)
predictions_validation_label = np.argmax(predictions_validation, axis=1)
Y_valid_label = np.argmax(Y_valid, axis=1)
accuracy_validation_conventional = accuracy_score(Y_valid_label, predictions_validation_label[:len(Y_valid_label)])
print(f'Accuracy on the validation set: {accuracy_validation_conventional}')

The biggest clue here is that the training accuracy is stuck at 1.000 for the last couple of epochs. From this, it appears that this model is overfitting. An intuitive way to understand overfitting is a student taking the exact same test over and over again, to the point where they just memorize the answers to each question and are unable to adapt to small changes in wording. The net has "memorized" the training data but is unable to adapt to the test data.
It's a little tricky to figure out what the best approach would be since I don't know the size of the dataset you are working with or the details of the model. I am under the assumption that the dataset is of a decent size (if not, try data augmentation) and you have defined a multi-layered net (if you are importing this model from Keras, your options may be a little more limited). Here are some suggestions though:
Stop earlier. Set the number of epochs to a smaller value to prevent overtraining. This is the simplest solution, and it makes sense in your case since the accuracy is already at 1.00 for the last several epochs. If you can plot your accuracy and loss over time, you will be able to visually pinpoint the epoch where overfitting begins, as you can see in this example. There are fancier ways to implement early stopping, but simply running for fewer epochs will probably be sufficient for your purposes.
Add dropout layers. Put simply, during training a dropout layer randomly "turns off" (sets to zero) a fraction of the outputs of the previous layer, which prevents the network from over-relying on a small subset of nodes. This is also a common technique to prevent overfitting.
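For intuition, here is a minimal NumPy sketch of what a dropout layer does to a layer's outputs, assuming the "inverted dropout" formulation in which survivors are scaled by 1 / (1 - rate) during training, as Keras' Dropout layer does. The function below is illustrative, not Keras code.

```python
import numpy as np


def dropout(activations, rate, rng, training=True):
    """Inverted dropout on a layer's outputs (illustrative, not Keras code).

    During training each unit is zeroed with probability `rate` and the
    survivors are scaled by 1 / (1 - rate), so the expected value of every
    unit is unchanged. At inference time the input passes through untouched.
    """
    if not training or rate == 0.0:
        return activations
    keep = rng.random(activations.shape) >= rate  # True where the unit survives
    return activations * keep / (1.0 - rate)


rng = np.random.default_rng(0)
h = np.ones((4, 8))  # pretend hidden-layer outputs
print(dropout(h, rate=0.5, rng=rng))                  # roughly half the entries 0, the rest 2.0
print(dropout(h, rate=0.5, rng=rng, training=False))  # unchanged
```

Because a different random mask is drawn for every batch, no single unit can be relied on, which is what discourages the network from memorizing through a few specific nodes.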
A fuller explanation along with other suggestions can be found here. Hope this was helpful!

Accuracy increase but loss also increases

I am using the model below. With this model, validation accuracy is increasing, but at the same time validation loss is also increasing. What is happening here?
from keras.layers import Dense
from keras.models import Sequential
# Newer Keras versions use the capitalised Adam class and learning_rate (not lr)
from keras.optimizers import Adam
model_alpha1 = Sequential()
model_alpha1.add(Dense(64, input_dim=96, activation='relu'))
model_alpha1.add(Dense(2, activation='softmax'))
opt_alpha1 = Adam(learning_rate=0.001)
model_alpha1.compile(loss='sparse_categorical_crossentropy', optimizer=opt_alpha1, metrics=['accuracy'])
history = model_alpha1.fit(x_train, y_train, validation_data=(x_test, y_test), epochs=200, verbose=1)
If you need any more details, leave a comment and I will provide them. Thank you

It seems that your model is overfitting. In other words, your model fits the training data too closely, which is why it no longer performs as well on the validation data.
Typical way to prevent overfitting is to use regularization techniques. For example:
Dropout layers https://keras.io/api/layers/regularization_layers/dropout/
Early stopping https://keras.io/api/callbacks/early_stopping/
Noise https://keras.io/api/layers/regularization_layers/gaussian_noise/
Try training a shallower network for your problem, or try dropout layers (or both, depending on how they affect the results). From your figure, we can see that overfitting starts after about 25 epochs.
Overfitting may be caused, for example, by using a model that is too complex for a dataset that is not large enough. Or you may simply be training the model for too long (here early stopping will fix the issue).
Here some regularization examples with TF: https://tensorflow.rstudio.com/tutorials/beginners/basic-ml/tutorial_overfit_underfit/

When training a classification model, it is not possible to optimize for accuracy directly since it is not a differentiable function. Therefore, we use cross-entropy as our loss function, which is highly correlated with accuracy. When inspecting our metrics it is important to remember that these are still two different metrics.
In terms of CE loss, your model is exhibiting textbook overfitting. However, in terms of accuracy, which is what you are actually interested in, it has simply "finished training". This is why we track not only the loss but also the metrics we ultimately care about, so that we can base our decisions on them.
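A small NumPy sketch with made-up predictions for five validation samples shows how both can rise at once: between two epochs the model gets one more sample right, but becomes very confidently wrong on another, and the logarithm in cross-entropy punishes that single mistake heavily. For two classes, the binary cross-entropy on the class-1 probability equals the sparse categorical cross-entropy of a 2-unit softmax, as in the question.

```python
import numpy as np


def binary_cross_entropy(y_true, p1, eps=1e-7):
    # Mean of -[y*log(p) + (1-y)*log(1-p)], clipped to avoid log(0).
    p1 = np.clip(p1, eps, 1 - eps)
    return float(-np.mean(y_true * np.log(p1) + (1 - y_true) * np.log(1 - p1)))


def accuracy(y_true, p1, threshold=0.5):
    # Fraction of samples whose thresholded prediction matches the label.
    return float(np.mean((p1 >= threshold).astype(int) == y_true))


y_val = np.array([1, 0, 1, 0, 1])
# Hypothetical predicted probabilities of class 1 at two different epochs.
p_epoch_a = np.array([0.6, 0.4, 0.6, 0.55, 0.45])
p_epoch_b = np.array([0.9, 0.1, 0.9, 0.45, 0.01])

for name, p in [("epoch A", p_epoch_a), ("epoch B", p_epoch_b)]:
    print(name, "accuracy:", accuracy(y_val, p), "loss:", round(binary_cross_entropy(y_val, p), 3))
```

Accuracy goes from 0.6 to 0.8 while the loss goes from about 0.63 to about 1.10, almost entirely because of the last sample, whose true-class probability dropped to 0.01.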

What is training accuracy and training loss and why we need to compute them?

I am new to LSTMs and machine learning, and I am trying to understand some of their concepts. Below is the code of my LSTM model.
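LSTM model:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dropout, Dense
from tensorflow.keras.callbacks import EarlyStopping
model = Sequential()
model.add(Embedding(vocab_size, 50, input_length=max_length-1))
model.add(LSTM(50))
model.add(Dropout(0.1))
model.add(Dense(vocab_size, activation='softmax'))
early_stopping = EarlyStopping(monitor='val_loss', patience=42)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
# The callback only takes effect if it is passed to fit
history = model.fit(X, y, validation_split=0.2, epochs=500, verbose=2, batch_size=20, callbacks=[early_stopping])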
Below is a sample of my output:
And the train/test accuracy and train/test loss diagrams:
My understanding (please correct me if I am wrong) is that val_loss and val_accuracy are the loss and accuracy on the test data. My question is: what are the training accuracy and training loss, and how are these values computed? Thank you.

1. loss and val_loss
In deep learning, the loss is the value that a neural network is trying to minimize: the network learns by adjusting its weights and biases in a way that reduces the loss.
loss and val_loss differ because the former is computed on the training set and the latter on the validation set (here, the 20% held out by validation_split). As such, the latter is a good indication of how the model performs on unseen data.
2. accuracy and val_accuracy
Once again, accuracy is measured on the training data and val_accuracy on the validation data. It's best to rely on val_accuracy for a fair representation of model performance, because a network with enough capacity can end up fitting the training data at 100% while still performing poorly on unseen data.
Training should be stopped when val_accuracy stops increasing, otherwise your model will probably overfit. You can use the EarlyStopping callback to stop training.
3. Why do we need training accuracy and loss?
Training accuracy is not a meaningful evaluation metric on its own, because a neural network with sufficient parameters can essentially memorize the labels of the training data and then perform no better than random guessing on previously unseen examples.
However, it can be useful to monitor the accuracy and loss at some fixed interval during training as it may indicate whether the backend is functioning as expected and if the training process needs to be stopped.
Refer here for a detailed explanation about earlystopping.
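4. How are accuracy and loss calculated?
Loss and accuracy are calculated during training, according to the loss and metrics specified when compiling the model. Before you train, you must compile your model to configure the learning process. This is where you specify the optimizer, loss function, and metrics, which is how the fit function knows which loss function to use and which metrics to track. The training loss and accuracy reported for each epoch are averages over the batches processed during that epoch (so they are computed while the weights are still changing), whereas val_loss and val_accuracy are computed on the validation data at the end of the epoch.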
The loss function (like binary cross entropy) documentation can be found here and the metrics (like accuracy) documentation can be found here.
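The NumPy sketch below illustrates how the per-epoch training loss is formed, assuming a toy one-parameter logistic model trained with plain mini-batch gradient descent (not Keras itself; all names are hypothetical). The reported training loss averages batch losses measured while the weights were still being updated, whereas the validation loss uses only the end-of-epoch weights, which is one reason the two numbers are not directly comparable.

```python
import numpy as np


def bce(y, p, eps=1e-7):
    # Mean binary cross-entropy, clipped to avoid log(0).
    p = np.clip(p, eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_one_epoch(w, b, x, y, batch_size, lr):
    """One epoch of mini-batch gradient descent on a 1-D logistic model.

    The returned epoch loss is a size-weighted mean of the batch losses, each
    measured with the weights as they were before that batch's update.
    """
    total, seen = 0.0, 0
    for start in range(0, len(x), batch_size):
        xb, yb = x[start:start + batch_size], y[start:start + batch_size]
        p = sigmoid(w * xb + b)
        total += bce(yb, p) * len(xb)  # this batch's loss with the current weights
        seen += len(xb)
        # Gradient step on the mean BCE of this batch.
        w -= lr * float(np.mean((p - yb) * xb))
        b -= lr * float(np.mean(p - yb))
    return w, b, total / seen


rng = np.random.default_rng(0)
x_train = rng.normal(size=200)
y_train = (x_train > 0).astype(float)
x_val = rng.normal(size=50)
y_val = (x_val > 0).astype(float)

w, b, train_loss = train_one_epoch(0.0, 0.0, x_train, y_train, batch_size=20, lr=0.5)
val_loss = bce(y_val, sigmoid(w * x_val + b))            # end-of-epoch weights only
end_train_loss = bce(y_train, sigmoid(w * x_train + b))  # same weights, training data
print(f"reported loss: {train_loss:.3f}  loss with final weights: {end_train_loss:.3f}  val_loss: {val_loss:.3f}")
```

Because early batches were scored with almost untrained weights, the reported training loss is higher than the training loss recomputed with the final weights.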

Cross validation with CNN

I would like to know if my code does what I want. To give you some background: I'm implementing a CNN for image classification, and I'm trying to use cross-validation to compare different neural network architectures.
Here is the code:
from keras.models import Sequential
from keras.layers import Conv2D, MaxPool2D, Flatten, Dense
# KerasClassifier lived here in older Keras 2.x releases
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import KFold, cross_val_score

def create_model():
    model = Sequential()
    model.add(Conv2D(24, kernel_size=3, padding='same', activation='relu',
                     input_shape=(96, 96, 1)))
    model.add(MaxPool2D())
    model.add(Conv2D(48, kernel_size=3, padding='same', activation='relu'))
    model.add(MaxPool2D())
    model.add(Conv2D(64, kernel_size=3, padding='same', activation='relu'))
    model.add(MaxPool2D())
    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    model.add(Dense(256, activation='relu'))
    model.add(Dense(12, activation='softmax'))
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
    return model

model = KerasClassifier(build_fn=create_model, epochs=5, batch_size=20, verbose=1)
# 3-fold cross-validation
kfold = KFold(n_splits=3, shuffle=True, random_state=2019)
results = cross_val_score(model, train_X, train_Y_one_hot, cv=kfold)
model.fit(train_X, train_Y_one_hot, validation_data=(valid_X, valid_label), class_weight=class_weights)
y_pred = model.predict(test_X)
# Note: evaluating against y_pred compares the predictions with themselves;
# the true test labels should be passed here instead.
test_eval = model.evaluate(test_X, y_pred, verbose=0)
I found the cross-validation part on the internet, but I have some trouble understanding it.
My questions: 1 => Can I use cross-validation to improve my accuracy? For example, I run my neural network 10 times and my model keeps the weights from the run where the best accuracy occurred.
2 => If I understand correctly, in the code above, cross_val_score runs my CNN 3 times and shows me the accuracy. But when I use model.fit, the model is run only once. Am I right?
Thanks for your help

Not really. Cross-validation is more a way to prevent overfitting to one particular split and to avoid being misled by abnormal results coming from a badly split dataset, i.e. a way to get a relevant estimate of your model's performance. If you want to tune the hyperparameters of your model, you would do better to use sklearn.model_selection.GridSearchCV or sklearn.model_selection.RandomizedSearchCV.
When you call cross_val_score, for each train/test split sklearn does a fit followed by a predict/evaluate on a fresh copy of the model. So for each new instance of the model, you have 1 fit and then 1 predict/evaluate.
Otherwise your cross-validation would not be valid, because each fold would depend on fitting done on previous folds (and maybe on their test data!).

There are two key terms here that you should get familiarized with:
Hyperparameters
Parameters
Hyperparameters control the general architecture of a model. These are what the programmer or data scientist controls. In the case of a CNN, this refers to the number of layers, their configurations, activations, optimizers etc. For a simple polynomial regression model this would be the degree of the polynomial.
Parameters refer to the actual values of weights or coefficients that the model ends up with after it solves the optimization using gradient descent or whatever method you use. In a CNN this would be the weights matrix for each layer. For a polynomial regression this would be the coefficients and bias.
Cross-validation is used to find the best set of hyperparameters. The best set of parameters is obtained by the optimizer (gradient descent, Adam etc.) for a given set of hyperparameters and data.
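The polynomial example can be sketched in NumPy with made-up data: the degree is the hyperparameter we pick by comparing validation error, while the coefficients returned by np.polyfit are the parameters the least-squares solver finds for each degree. For brevity this uses a single hold-out split; cross-validation would repeat the comparison over several folds and average.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 40)
y = 1.0 - 2.0 * x + 3.0 * x**2 + rng.normal(scale=0.1, size=x.size)  # true curve is quadratic

# Hold out every 4th point for validation.
val = np.arange(x.size) % 4 == 0
x_tr, y_tr, x_val, y_val = x[~val], y[~val], x[val], y[val]

val_mse = {}
for degree in range(1, 10):                      # hyperparameter: chosen by us
    coeffs = np.polyfit(x_tr, y_tr, deg=degree)  # parameters: found by the solver
    val_mse[degree] = float(np.mean((np.polyval(coeffs, x_val) - y_val) ** 2))

best_degree = min(val_mse, key=val_mse.get)
best_coeffs = np.polyfit(x_tr, y_tr, deg=best_degree)
print("best degree:", best_degree, "coefficients:", np.round(best_coeffs, 2))
```

Only the degree is searched over; for each candidate degree the solver alone decides the coefficients, mirroring how the optimizer picks a CNN's weights for a given architecture.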
To answer your questions:
You would run cross validation several times, each time with a different hyperparameter configuration (network architecture). That's the only thing you can control. At the end you pick the best architecture based on accuracy. The weights of the model would be different for each fold but finding the best weights is the optimizer's job, not yours.
Yes. In 3-fold CV, the model is trained 3 times and evaluated 3 times. When you call model.fit, the model is trained only once, on the data you pass to it.

How to fix (improve) a text classification model that uses word2vec

I'm a beginner in machine learning and neural networks, and I have a problem with text classification. I use an LSTM architecture with the Keras library.
My model reaches about 97% every time. I have a database of about 1 million records, of which 600k are positive and 400k are negative.
I have 2 labeled classes: 0 (negative) and 1 (positive). My database is split into a training set and a test set in an 80:20 ratio. For the NN input, I use Word2Vec trained on PubMed articles.
My network architecture:
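from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Activation
# emb_layer: the asker's embedding layer built from the Word2Vec vectors (defined elsewhere)
model = Sequential()
model.add(emb_layer)
model.add(LSTM(64, dropout=0.5))
model.add(Dense(2))
model.add(Activation('softmax'))
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=50, batch_size=32)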
How can I improve my model for this kind of text classification?

The problem with which we are dealing here is called overfitting.
First of all, make sure your input data is properly cleaned. One of the principles of machine learning is "Garbage In, Garbage Out". Next, you should balance your data, for example to 400k positive and 400k negative records. Then the dataset should be divided into training, validation and test sets (60%:20%:20%), for example using the scikit-learn library, as in the following example:
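from sklearn.model_selection import train_test_split
# First hold out 20% of all data as the test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
# 25% of the remaining 80% is 20% of all data, giving 60:20:20 overall
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25)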
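The balancing step can be sketched in NumPy, assuming random undersampling of the majority class (one possible way to balance; Keras' `class_weight` argument to `fit` is an alternative that keeps all the data). The helper names `undersample` and `split_60_20_20` are hypothetical, and the data is a scaled-down stand-in for the 600k/400k records.

```python
import numpy as np


def undersample(X, y, seed=0):
    """Randomly drop rows of the larger classes so every class has the same count."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n = counts.min()
    keep = np.concatenate([rng.choice(np.flatnonzero(y == c), size=n, replace=False)
                           for c in classes])
    rng.shuffle(keep)  # mix the classes again
    return X[keep], y[keep]


def split_60_20_20(X, y, seed=0):
    """Shuffle once, then cut into 60% train, 20% validation and 20% test."""
    idx = np.random.default_rng(seed).permutation(len(X))
    n_train, n_val = len(X) * 6 // 10, len(X) * 2 // 10
    tr, va, te = np.split(idx, [n_train, n_train + n_val])
    return (X[tr], y[tr]), (X[va], y[va]), (X[te], y[te])


# Scaled-down stand-in for 600k positive / 400k negative records.
y = np.array([1] * 6000 + [0] * 4000)
X = np.arange(len(y)).reshape(-1, 1)  # row ids instead of real features
X_bal, y_bal = undersample(X, y)
(train_X, train_y), (val_X, val_y), (test_X, test_y) = split_60_20_20(X_bal, y_bal)
print(len(train_y), len(val_y), len(test_y), "positives kept:", int(y_bal.sum()))
```

Undersampling discards data, so with a million records it is cheap, but on small datasets class weights are usually preferable.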
Then I would use a different neural network architecture and try to optimize the parameters.
Personally, I would suggest using a 2-layer LSTM neural network or a combination of a convolutional and a recurrent neural network (it is faster, and articles I have read report better results with it).
1) 2-layer LSTM:
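from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Activation
model = Sequential()
model.add(emb_layer)
# return_sequences=True so the second LSTM receives the full sequence
model.add(LSTM(64, dropout=0.5, recurrent_dropout=0.5, return_sequences=True))
model.add(LSTM(64, dropout=0.5, recurrent_dropout=0.5))
model.add(Dense(2))
model.add(Activation('sigmoid'))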
You can try using 2 layers with 64 hidden units each and add the recurrent_dropout parameter.
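The main reason we use the sigmoid function is that its output lies between 0 and 1. Therefore, it is especially suited to models where we have to predict a probability as the output. Since the probability of anything lies only between 0 and 1, sigmoid is the right choice. (Softmax outputs also lie between 0 and 1, but they are normalized to sum to 1 across the classes, whereas sigmoid treats each output independently.)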
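A small NumPy comparison with made-up logits: both functions produce values between 0 and 1, but a sigmoid squashes each output independently, so the two "probabilities" of a Dense(2) layer need not sum to 1, whereas softmax normalizes them across the classes. For two classes, softmax equals a sigmoid applied to the difference of the two logits, which is why a single sigmoid output (Dense(1)) is a common alternative for binary problems.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z):
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()


logits = np.array([2.0, 1.0])  # hypothetical raw outputs of a Dense(2) layer
print("sigmoid:", sigmoid(logits), "sum:", sigmoid(logits).sum())
print("softmax:", softmax(logits), "sum:", softmax(logits).sum())
# For two classes, softmax equals a sigmoid of the logit difference.
print(np.isclose(softmax(logits)[0], sigmoid(logits[0] - logits[1])))
```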
2) CNN + LSTM
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, MaxPool1D, Dropout, LSTM, Dense, Activation
model = Sequential()
model.add(emb_layer)
model.add(Conv1D(32, 3, padding='same'))
model.add(Activation('relu'))
model.add(MaxPool1D(pool_size=2))
model.add(Dropout(0.5))
model.add(LSTM(32, dropout=0.5, recurrent_dropout=0.5, return_sequences=True))
model.add(LSTM(64, dropout=0.5, recurrent_dropout=0.5))
model.add(Dense(2))
model.add(Activation('sigmoid'))
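You can try using a combination of a CNN and an RNN. With this architecture the model learns faster (up to 5 times faster, in my experience), partly because the max-pooling layer halves the length of the sequence that the LSTM layers have to process.
Then, in both cases, you need to choose an optimizer and a loss function when compiling the model.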
A good optimizer for both cases is the "Adam" optimizer.
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
In the last step, we validate our network on the validation set.
In addition, we use a callback that stops training when, for example, the validation loss (the quantity EarlyStopping monitors by default) has not improved for 3 consecutive epochs.
from keras.callbacks import EarlyStopping
early_stopping = EarlyStopping(patience=3)
model.fit(X_train, y_train, epochs=100, batch_size=32, validation_data=(X_val, y_val), callbacks=[early_stopping])
We can also control the overfitting using graphs. If you want to see how to do it, check here.
If you need further help, let me know in a comment.
