Fine tuning CNN hyperparameters for complex text classification

I'm working on a CNN model for complex text classification (mainly emails and messages). The dataset contains around 100k entries distributed across 10 different classes. My current Keras Sequential model has the following structure:
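from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (
    Embedding, Convolution1D, BatchNormalization,
    MaxPooling1D, Flatten, Dropout, Dense)

model = Sequential([
    Embedding(
        input_dim=10000,
        output_dim=150,
        input_length=400),
    Convolution1D(
        filters=128,
        kernel_size=4,
        padding='same',
        activation='relu'),
    BatchNormalization(),
    MaxPooling1D(),
    Flatten(),
    Dropout(0.4),
    Dense(100, activation='relu'),
    Dropout(0.4),
    # One output unit per class (y_train is one-hot encoded)
    Dense(len(y_train[0]), activation='softmax')])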
When compiling the model I use the Nadam optimizer and categorical_crossentropy loss with label smoothing set to 0.2.
For fitting, I use 30 epochs and a batch size of 512. I also use EarlyStopping, monitoring val_loss with patience set to 8 epochs. The test set is 25% of the dataset.
Currently training stops after 16 to 18 epochs: the values start to fluctuate a little after epoch 6 or 7 and continue that way until EarlyStopping halts training. On average the values look like this:
loss: 1.1673 - accuracy: 0.9674 - val_loss: 1.2464 - val_accuracy: 0.8964
with a testing accuracy reaching:
loss: 1.2461 - accuracy: 0.8951
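A side note on the loss values above: with label smoothing the cross-entropy cannot reach zero, so a loss above 1 alongside high accuracy is not surprising. The sketch below assumes the usual smoothing rule, in which each one-hot target y becomes y * (1 - alpha) + alpha / K for K classes; the function name is made up for illustration.

```python
import math

def smoothed_loss_floor(num_classes, alpha):
    """Lowest categorical cross-entropy reachable with label smoothing.

    Assumes targets are smoothed as y * (1 - alpha) + alpha / num_classes.
    The minimum is reached when the prediction equals the smoothed target,
    so the floor is the entropy of that target distribution.
    """
    on = 1.0 - alpha + alpha / num_classes   # target mass on the true class
    off = alpha / num_classes                # target mass on every other class
    return -(on * math.log(on) + (num_classes - 1) * off * math.log(off))

floor = smoothed_loss_floor(num_classes=10, alpha=0.2)
print(f"loss floor: {floor:.4f}")  # about 0.867 for 10 classes and alpha = 0.2
```

Under that assumption the reported training loss of 1.1673 sits roughly 0.30 above the floor, so the absolute loss value says little by itself; the gap between accuracy and val_accuracy is the more informative signal.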
Now I'd like to improve the accuracy of my CNN. I've tried different hyperparameters, but so far I haven't been able to get a higher value. Therefore I'm trying to figure out:
if there is still room for improvement (I bet so)
if the solution lies in fine-tuning my hyperparameters and, if so, which ones I should change
if going deeper by adding layers to the model could be of any use and, if so, how to improve my model
if there is any other deep-learning or neural-network approach besides a CNN that could lead to a better result
Thank you very much to anybody who will help! :)

There are many hyperparameter-tuning libraries, but I find Keras Tuner very flexible: https://github.com/keras-team/keras-tuner
Just install it with pip.
Here is your model rewritten for the tuner; feel free to choose the search ranges.
from tensorflow import keras
from tensorflow.keras import layers
# Older releases of the package use: from kerastuner.tuners import RandomSearch
from keras_tuner import RandomSearch

def build_model(hp):
    model = keras.Sequential()
    # input_dim must stay larger than the highest token index in your data
    model.add(layers.Embedding(
        input_dim=hp.Int('input_dim', min_value=5000, max_value=10000, step=1000),
        output_dim=hp.Int('output_dim', min_value=200, max_value=800, step=100),
        input_length=400))
    model.add(layers.Convolution1D(
        filters=hp.Int('filters', min_value=32, max_value=512, step=32),
        kernel_size=hp.Int('kernel_size', min_value=3, max_value=11, step=2),
        padding='same',
        activation='relu'))
    model.add(layers.BatchNormalization())
    model.add(layers.MaxPooling1D())
    model.add(layers.Flatten())
    model.add(layers.Dropout(0.4))
    model.add(layers.Dense(
        units=hp.Int('units', min_value=64, max_value=256, step=32),
        activation='relu'))
    model.add(layers.Dropout(0.4))
    # One output unit per class (y_train is one-hot encoded)
    model.add(layers.Dense(len(y_train[0]), activation='softmax'))
    model.compile(
        optimizer=keras.optimizers.Adam(
            hp.Choice('learning_rate', values=[1e-2, 1e-3, 1e-4])),
        loss='categorical_crossentropy',
        metrics=['accuracy'])
    return model

tuner = RandomSearch(
    build_model,
    objective='val_accuracy',
    max_trials=5,
    executions_per_trial=3,
    directory='my_dir',
    project_name='helloworld')
tuner.search_space_summary()

# Replace x, y, val_x and val_y with your own training and validation data
tuner.search(x, y,
             epochs=5,
             validation_data=(val_x, val_y))
models = tuner.get_best_models(num_models=2)
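To get a feel for how much of this search space five random trials can cover, the ranges above can be counted by hand. This is only a rough sketch: it assumes the integer ranges include their max_value, and the helper name int_choices is made up for illustration.

```python
import math

def int_choices(min_value, max_value, step):
    """Number of values in an integer range that includes max_value."""
    return len(range(min_value, max_value + 1, step))

# Ranges copied from build_model above.
space = {
    "input_dim": int_choices(5000, 10000, 1000),
    "output_dim": int_choices(200, 800, 100),
    "filters": int_choices(32, 512, 32),
    "kernel_size": int_choices(3, 11, 2),
    "units": int_choices(64, 256, 32),
    "learning_rate": 3,  # three explicit choices
}

total = math.prod(space.values())
print(space)
print(f"{total} combinations, max_trials=5 samples {5 / total:.4%} of them")
```

With tens of thousands of combinations, five trials only scratch the surface, so it may be worth raising max_trials or narrowing the ranges to the hyperparameters you care about most.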
You can try replacing the Conv1D layers with LSTM layers and observe if you get better performance.
LSTM(units = 512) https://keras.io/layers/recurrent/
If you want to extract more meaningful features, one approach I found promising is extracting pre-trained BERT features and then training a CNN/LSTM on them.
A great repository to get started is this one:
https://github.com/UKPLab/sentence-transformers
Once you have the sentence embeddings from BERT/XLNet, you can use those features to train another CNN similar to the one you are using, except that you can probably drop the embedding layer, as it's expensive.

Related

Choose layers on Keras neural network

I am trying to build a neural network that can detect fraudulent transactions. We are using this dataset from Kaggle. I am a beginner with neural networks and am trying to find out how to define the model in the best way. Currently the model is not able to detect any fraud at all, and all predictions are very close to 0. My code is included at the end. My questions are:
How should I choose the layers to optimize performance?
How should I compile the model and choose parameters such as "epoch" for optimal performance?
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, BatchNormalization, Dropout, Conv1D, Activation, Flatten
import tensorflow as tf
model = Sequential([
Dense(256, activation='relu', input_shape=(X_train.shape[1],)),
BatchNormalization(),
Dropout(0.3),
Dense(256, activation='relu'),
BatchNormalization(),
Dropout(0.3),
Dense(256, activation='relu'),
BatchNormalization(),
Dropout(0.3),
Dense(1, activation='sigmoid'),
])

I've implemented a version that reaches nearly 100% accuracy without overfitting. Please compare it with yours and see where changes have been made, especially in the model creation.
Kaggle Link: https://www.kaggle.com/gautamchettiar/credit-card-fraud
import pandas as pd
data = pd.read_csv("../input/creditcardfraud/creditcard.csv")
input_features = data.loc[:, data.columns != 'Class']
labels = data['Class']
Then I check the class split, which is very uneven in this particular case.
from collections import Counter
Counter(data['Class'])
Counter({0: 284315, 1: 492})
Now that all the data is ready, time to create an appropriate train_test_split for verifying later on.
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(input_features, labels, test_size=0.2)
A few imports before creating the model.
from tensorflow.keras import Sequential, layers
import tensorflow as tf
I have not changed much in the code; however, I suspect it is the Batch Normalization causing issues at your end (just my opinion, I may be wrong). Another thing you might want to check is how you've compiled your model.
model = Sequential(
[
layers.Dense(100, activation="relu", input_shape=(x_train.shape[-1],)),
layers.Dropout(0.1),
layers.Dense(100, activation="relu"),
layers.Dropout(0.1),
layers.Dense(50, activation="relu"),
layers.Dropout(0.1),
layers.Dense(50, activation="relu"),
layers.Dropout(0.1),
layers.Dense(1, activation="sigmoid"),
]
)
Then I chose Adam, because it is a pretty good default.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
loss=tf.keras.losses.BinaryCrossentropy(),
metrics=[tf.keras.metrics.BinaryAccuracy(),
tf.keras.metrics.FalseNegatives()])
The model training begins here then.
model.fit(x_train, y_train)
7121/7121 [==============================] - 24s 3ms/step - loss: 0.9938 - binary_accuracy: 0.9970 - false_negatives_9: 398.0000
<keras.callbacks.History at 0x7ff131da4090>
Then I evaluated the model on the test set, to check whether the score was due to overfitting.
scores = model.evaluate(x_test, y_test)
print(f"Accuracy on test set: {scores[1]}")
print(f"False Negatives on test set: {scores[2]}")
And for this, the final output is as shown below.
Accuracy on test set: 0.9983673095703125
False Negatives on test set: 93.0
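To put these numbers in context, it can help to compare them with what a trivial model would score on such an uneven split. The sketch below uses only the class counts printed earlier and the test_size of 0.2; the variable names are illustrative.

```python
# Class counts copied from the Counter output above.
n_legit, n_fraud = 284315, 492
total = n_legit + n_fraud

# Accuracy of a trivial model that labels every transaction as legitimate.
baseline_accuracy = n_legit / total

# Expected number of fraud cases in a random 20% test split (test_size=0.2).
expected_test_fraud = n_fraud * 0.2

print(f"always-0 baseline accuracy: {baseline_accuracy:.4f}")
print(f"expected fraud cases in test split: {expected_test_fraud:.1f}")
```

The baseline comes out at about 0.9983, very close to the reported test accuracy, and 93 false negatives is close to the roughly 98 fraud cases expected in the split. It may therefore be worth tracking recall or a confusion matrix alongside accuracy before drawing conclusions.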
Hope this helps!

Not able to find a proper CNN

I am using Keras/TensorFlow in Colab and working on the oxford_flowers102 dataset. The task is image classification, with quite a lot of categories (102) and not many images per class. I tried to build different neural networks, from simple to more complex ones, with and without image augmentation, dropout, hyperparameter tuning, batch size adjustment, optimizer adjustment and different image resize sizes. However, I was not able to find a CNN that gives me an acceptable val_accuracy and, finally, a good test accuracy. So far the best val_accuracy I have reached is a poor 0.3x. I am pretty sure that it is possible to get better results; I am just not finding the right CNN setup. My code so far:
import tensorflow as tf
from keras.models import Model
import tensorflow_datasets as tfds
import tensorflow_hub as hub
# update colab tensorflow_datasets to current version 3.2.0,
# otherwise tfds.load will lead to error when trying to load oxford_flowers102 dataset
!pip install tensorflow_datasets --upgrade
# restart runtime
oxford, info = tfds.load("oxford_flowers102", with_info=True, as_supervised=True)
train_data=oxford['train']
test_data=oxford['test']
validation_data=oxford['validation']
IMG_SIZE = 224
def format_example(image, label):
    image = tf.cast(image, tf.float32)
    image = image*1/255.0
    image = tf.image.resize(image, (IMG_SIZE, IMG_SIZE))
    return image, label
train = train_data.map(format_example)
validation = validation_data.map(format_example)
test = test_data.map(format_example)
BATCH_SIZE = 32
SHUFFLE_BUFFER_SIZE = 1000
train_batches = train.shuffle(SHUFFLE_BUFFER_SIZE).batch(BATCH_SIZE)
test_batches = test.batch(BATCH_SIZE)
validation_batches = validation.batch(BATCH_SIZE)
First model I tried:
model = tf.keras.Sequential([
tf.keras.layers.Conv2D(32, 3, activation='relu', input_shape=(IMG_SIZE, IMG_SIZE, 3)),
tf.keras.layers.MaxPooling2D(),
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(64, activation='relu'),
tf.keras.layers.Dense(102)
])
model.compile(optimizer='adam',
loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
metrics=['accuracy'])
history = model.fit(train_batches, validation_data=validation_batches, epochs=20)
Epoch 20/20 32/32 [==============================] - 4s 127ms/step - loss: 2.9830 - accuracy: 0.2686 - val_loss: 4.8426 - val_accuracy: 0.0637
When I run it for more epochs, it overfits, val_loss goes up, val_accuracy does not go up.
Second model (very simple one):
model = tf.keras.models.Sequential([
tf.keras.layers.Flatten(input_shape=(IMG_SIZE, IMG_SIZE, 3)),
tf.keras.layers.Dense(128,activation='relu'),
tf.keras.layers.Dense(102)
])
model.compile(optimizer='adam',
loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
metrics=['accuracy'])
history = model.fit(train_batches, validation_data=validation_batches, epochs=20)
Does not work at all, loss stays at 4.6250.
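The value 4.6250 is informative in itself. A quick check, assuming the loss is plain softmax cross-entropy over the 102 classes, shows what a network that predicts the same probability for every class would score:

```python
import math

num_classes = 102
# Cross-entropy when every class gets probability 1 / num_classes.
uniform_loss = -math.log(1.0 / num_classes)
print(f"{uniform_loss:.4f}")  # 4.6250
```

A loss stuck at this value suggests the model is outputting a (near) uniform distribution, that is, it has not picked up anything from the inputs yet.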
Third model:
model = tf.keras.Sequential([
tf.keras.layers.Conv2D(64, (3,3), activation='relu', input_shape=(IMG_SIZE, IMG_SIZE, 3)),
tf.keras.layers.MaxPooling2D(2,2),
tf.keras.layers.Conv2D(64, (3,3), activation='relu'),
tf.keras.layers.MaxPooling2D(2,2),
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(128, activation='relu'),
tf.keras.layers.Dense(102)
])
base_learning_rate = 0.0001
model.compile(optimizer=tf.optimizers.RMSprop(lr=base_learning_rate),
loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
metrics=['accuracy'])
history = model.fit(train_batches, validation_data=validation_batches, epochs=20)
Model overfits. Val_accuracy not above 0.15.
I added dropout layers to this model (trying different rates) and also adjusted the kernels. However, there was no real improvement. I also tried the adam optimizer.
Fourth model:
model = tf.keras.Sequential([
tf.keras.layers.Conv2D(128, (3,3), activation='relu', input_shape=(IMG_SIZE, IMG_SIZE, 3)),
tf.keras.layers.Dropout(0.4),
tf.keras.layers.MaxPooling2D(2,2),
tf.keras.layers.Conv2D(256, (3,3), activation='relu'),
tf.keras.layers.Dropout(0.4),
tf.keras.layers.MaxPooling2D(2,2),
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(512, activation='relu'),
tf.keras.layers.Dropout(0.4),
tf.keras.layers.Dense(102)
])
model.compile(optimizer='adam',
loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
metrics=['accuracy'])
history = model.fit(train_batches, validation_data=validation_batches, epochs=20)
Same problem again, no good val_accuracy. Also tried it with RMSprop optimizer. Not able to get a val_accuracy higher than 0.2.
Fifth model:
model = tf.keras.Sequential([
tf.keras.layers.Conv2D(64, (3,3), activation='relu', input_shape=(IMG_SIZE, IMG_SIZE, 3)),
tf.keras.layers.Dropout(0.2),
tf.keras.layers.MaxPooling2D(2,2),
tf.keras.layers.Conv2D(64, (2,2), activation='relu'),
tf.keras.layers.Dropout(0.2),
tf.keras.layers.GlobalAveragePooling2D(),
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(256, activation='relu'),
tf.keras.layers.Dropout(0.2),
tf.keras.layers.Dense(128, activation='relu'),
tf.keras.layers.Dropout(0.2),
tf.keras.layers.Dense(102)
])
base_learning_rate = 0.001
model.compile(optimizer=tf.optimizers.RMSprop(lr=base_learning_rate),
loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
metrics=['accuracy'])
history = model.fit(train_batches, validation_data=validation_batches, epochs=250)
val_accuracy at the highest around 0.3x. Also tried it with adam.
When I tried transfer learning with MobileNet, I immediately got 0.7x within 10 epochs. So I wondered why I am not able to get close to this with a self-built CNN. I do not expect 0.8 or to beat MobileNet. But where is my mistake? What would a self-built CNN look like that gets, let's say, 0.6-0.7 val_accuracy?

You can try using a predefined CNN and optimizing its parameters with a metaheuristic optimizer such as the Grey Wolf optimizer or PSO.

It's not entirely clear from your question: are you concerned that your model architecture is inferior to that of, say, MobileNet, or that your performance is not comparable to that of transfer learning with MobileNet?
In response to the first, in general, the popular architectures such as ResNet, MobileNet, AlexNet are very cleverly crafted networks and so are likely to better represent data than a hand-defined network unless you do something very clever yourself.
In response to the second, the more complex a model gets, the more data it needs to train it well so that it neither underfits nor overfits the data. This poses a problem on datasets such as yours (with a few thousand images) because it is difficult for a complex CNN to learn meaningful rules (kernels) for extracting information from images in general without instead learning rules for memorizing the limited set of training inputs. In summary, you want a larger model to make more accurate predictions, but this in turn requires more data, which sometimes you don't have. I suspect that if you compared an untrained MobileNet with your untrained network on the oxford_flowers102 dataset, you'd see similarly poor performance.
Enter transfer learning. By pretraining relatively large models on relatively huge datasets (most are pretrained on ImageNet, which has millions of images), the model is able to learn to extract relevant information from arbitrary images much better than it would on a smaller dataset. These general rules for feature extraction apply to your smaller dataset as well, so with just a bit of fine-tuning the transfer learning model will likely far outperform any model trained solely on your dataset.
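The imbalance between model size and data can be made concrete by counting the weights of the first model in the question by hand. This is a back-of-the-envelope sketch: the helper names are made up, and the figure of 1,020 training images is an assumption based on the 32 batches of up to 32 images shown in the training log.

```python
def conv2d_params(kernel, in_channels, filters):
    """Weights plus biases of a square-kernel Conv2D layer."""
    return kernel * kernel * in_channels * filters + filters

def dense_params(inputs, units):
    """Weights plus biases of a Dense layer."""
    return inputs * units + units

img = 224
conv_out = img - 3 + 1        # 3x3 kernel with 'valid' padding -> 222
pooled = conv_out // 2        # default 2x2 max pooling -> 111
flat = pooled * pooled * 32   # Flatten over 32 feature maps

total = (conv2d_params(3, 3, 32)
         + dense_params(flat, 64)
         + dense_params(64, 102))
train_images = 1020           # assumption, see the note above

print(total)                           # 25240998
print(round(total / train_images))     # parameters per training image
```

Almost all of those roughly 25 million weights sit in the first Dense layer after Flatten, which gives the network ample capacity to memorize about a thousand images. Shrinking the flattened feature map (more conv/pool stages or global pooling) is one way to reduce that.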

Good training/validation accuracy but poor test accuracy

I've trained a model to classify 4 types of eye diseases using the pretrained VGG16 model. I am fairly new to machine learning, so I don't know what to make of the results.
After training it for about 6 hours on 90,000 images:
training accuracy kept increasing while the training loss fell (from roughly 2 to 0.8, ending with an accuracy of 88%)
validation loss kept fluctuating between 1 and 2 per epoch (validation accuracy did improve to 85%)
(I accidentally reran the cell, so I can't see the output)
After looking at the confusion matrix, it seems the model isn't performing well on the test set.
Image_height = 196
Image_width = 300
val_split = 0.2
batches_size = 10
lr = 0.0001
spe = 512
vs = 32
epoch = 10
#Creating batches
train_batches = ImageDataGenerator(preprocessing_function=tf.keras.applications.vgg16.preprocess_input,validation_split=val_split) \
.flow_from_directory(directory=train_folder, target_size=(Image_height,Image_width), classes=['CNV','DME','DRUSEN','NORMAL'], batch_size=batches_size,class_mode="categorical",
subset="training")
validation_batches = ImageDataGenerator(preprocessing_function=tf.keras.applications.vgg16.preprocess_input,validation_split=val_split) \
.flow_from_directory(directory=train_folder, target_size=(Image_height,Image_width), classes=['CNV','DME','DRUSEN','NORMAL'], batch_size=batches_size,class_mode="categorical",
subset="validation")
test_batches = ImageDataGenerator(preprocessing_function=tf.keras.applications.vgg16.preprocess_input) \
.flow_from_directory(test_folder, target_size=(Image_height,Image_width),
classes=['CNV','DME','DRUSEN','NORMAL'], batch_size=batches_size,class_mode="categorical")
#Function to create model. We will be using a pretrained model
def create():
    vgg16_model = keras.applications.vgg16.VGG16(input_tensor=Input(shape=(Image_height, Image_width, 3)),input_shape=(Image_height,Image_width,3), include_top = False)
    model = Sequential()
    model.add(vgg16_model)
    for layer in model.layers:
        layer.trainable = False
    model.add(Flatten())
    model.add(Dense(4, activation='softmax'))
    return model
model = create()
model.compile(Adam(lr=lr),loss="categorical_crossentropy",metrics=['accuracy'])
model.fit(train_batches, steps_per_epoch=spe,
validation_data=validation_batches,validation_steps=vs, epochs=epoch)
Any suggestions on what I can improve so that the confusion matrix looks better? I also have the model saved, if it's possible to just retrain it with more layers.
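One thing worth checking in the settings above is how much data each epoch actually sees. The arithmetic below is a rough sketch that assumes the 90,000 images are the contents of train_folder before the validation split; the variable names mirror the question.

```python
total_images = 90_000     # figure quoted in the question (assumed to be train_folder)
val_split = 0.2
batch_size = 10
steps_per_epoch = 512
validation_steps = 32
epochs = 10

train_images = round(total_images * (1 - val_split))   # training subset
val_images = total_images - train_images               # validation subset

seen_per_epoch = steps_per_epoch * batch_size
seen_total = seen_per_epoch * epochs
coverage = seen_total / train_images

val_used = validation_steps * batch_size
val_fraction = val_used / val_images

print(f"images per epoch: {seen_per_epoch}, over {epochs} epochs: {seen_total}")
print(f"share of one full pass over the training subset: {coverage:.1%}")
print(f"validation images per epoch: {val_used} ({val_fraction:.1%} of the subset)")
```

If those assumptions hold, the ten epochs together cover only about 71% of a single pass over the training subset, and each validation score is computed on just 320 images, which could contribute to the fluctuating validation loss.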

A number of issues and recommendations.
You are using the VGG16 model, which is large: roughly 138 million parameters with its classification head and about 14.7 million in the convolutional base alone. On a data set of 90,000 images your training time will be very long, so I recommend you consider using the MobileNet model. It has only about 4 million parameters and is essentially just as accurate as VGG16. Documentation is [here.][1]
Next, irrespective of which model you use, you should set the initial weights to the imagenet weights, so your model starts off already trained on images. I find I get better results by making all layers in the model trainable.
Now, you say your model reached an accuracy of 88%. I do not think that is very good; I believe you need to achieve at least 95%. You can do that by using an adjustable learning rate. The Keras callback ReduceLROnPlateau makes doing that easy. Documentation is [here.][2] Set it up to monitor validation loss and reduce the learning rate if the loss fails to decrease on consecutive epochs.
Next, you want to save the model that has the lowest validation loss and use that to make predictions. The Keras callback ModelCheckpoint can be set up to monitor validation loss and save the model with the lowest loss. Documentation is [here.][3]
The code below shows how to implement the MobileNet model for your problem and define the callbacks. You will also have to change the generators to use MobileNet preprocessing and set the target size to (224, 224). Note that preprocessing_function takes the function object itself, so it is passed without parentheses, as in your generator code. Hope this helps.
mobile = tf.keras.applications.mobilenet.MobileNet( include_top=False,
input_shape=(224, 224,3),
pooling='max', weights='imagenet',
alpha=1, depth_multiplier=1,dropout=.5)
x=mobile.layers[-1].output
x=keras.layers.BatchNormalization(axis=-1, momentum=0.99, epsilon=0.001 )(x)
predictions=Dense (4, activation='softmax')(x)
model = Model(inputs=mobile.input, outputs=predictions)
for layer in model.layers:
    layer.trainable=True
model.compile(Adamax(lr=lr), loss='categorical_crossentropy', metrics=['accuracy'])
checkpoint=tf.keras.callbacks.ModelCheckpoint(filepath=save_loc, monitor='val_loss', verbose=0, save_best_only=True,
save_weights_only=False, mode='auto', save_freq='epoch', options=None)
lr_adjust=tf.keras.callbacks.ReduceLROnPlateau( monitor="val_loss", factor=0.5, patience=1, verbose=0, mode="auto",
min_delta=0.00001, cooldown=0, min_lr=0)
callbacks=[checkpoint, lr_adjust]
[1]: https://keras.io/api/applications/mobilenet/
[2]: https://keras.io/api/callbacks/reduce_lr_on_plateau/
[3]: https://keras.io/api/callbacks/model_checkpoint/
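To see what a reduce-on-plateau rule with factor=0.5 and patience=1 does to the learning rate, here is a small pure-Python simulation. It is a simplified model of the idea (no cooldown, minimising val_loss), not the Keras implementation itself, and the val_loss values are invented for the demo.

```python
def simulate_reduce_on_plateau(val_losses, lr, factor=0.5, patience=1,
                               min_delta=1e-5, min_lr=0.0):
    """Return the learning rate in effect after each epoch.

    The rate is multiplied by `factor` once `patience` consecutive epochs
    pass without val_loss improving on the best value by more than
    `min_delta`. Cooldown is left out to keep the sketch short.
    """
    best = float("inf")
    wait = 0
    history = []
    for loss in val_losses:
        if loss < best - min_delta:
            best, wait = loss, 0          # improvement: reset the counter
        else:
            wait += 1
            if wait >= patience:          # plateau: shrink the rate
                lr = max(lr * factor, min_lr)
                wait = 0
        history.append(lr)
    return history

# Invented validation losses: two improvements, two stalls, one more of each.
rates = simulate_reduce_on_plateau([1.0, 0.9, 0.95, 0.97, 0.8, 0.85], lr=1e-4)
print(rates)
```

With patience=1 every epoch that fails to improve halves the rate, so a noisy validation loss can shrink the learning rate quickly; a larger patience is a common way to make the schedule less jumpy.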

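You don't train any layer except the last one.
You need to make the last few layers trainable, or add more layers on top.
You can also pass the weights explicitly:
tf.keras.applications.VGG16(... weights='imagenet'... )
Note that weights already defaults to 'imagenet' in keras.applications, so your code does load pretrained weights; passing the argument only makes that explicit.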
The available options are explained here:
https://www.tensorflow.org/api_docs/python/tf/keras/applications/VGG16

When adding layers to the model, you have to remove the last dense layer of the base model: your model has four classes but VGG16 has 1000 classes, so you have to remove its last dense layer and then add your own dense layers:
def create():
    vgg16_model = keras.applications.vgg16.VGG16(input_tensor=Input(shape=(Image_height, Image_width, 3)),input_shape=(Image_height,Image_width,3), include_top = False)
    model = Sequential()
    for layer in vgg16_model.layers[:-1]:
        model.add(layer)
    model.summary()
    for layer in model.layers:
        layer.trainable = False
    model.add(Flatten())
    model.add(Dense(4, activation='softmax'))
    return model

Why the accuracy of the neural network stops increasing

I'm trying to solve the Titanic competition on Kaggle, but the model accuracy isn't going beyond 80%.
I tried changing the number of hidden nodes and the number of epochs, and also tried applying batch normalization, dropout and different weight initializations, but I still get the same 80%. What am I doing wrong?
This is my code below:
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Dense(10, input_shape=(5,), kernel_initializer='he_normal', activation='relu'))
model.add(tf.keras.layers.BatchNormalization())
model.add(tf.keras.layers.Dense(20, kernel_initializer='he_normal', activation='relu'))
model.add(tf.keras.layers.Dropout(0.3))
model.add(tf.keras.layers.BatchNormalization())
model.add(tf.keras.layers.Dense(2, kernel_initializer=tf.keras.initializers.GlorotNormal(), activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
train_scores = model.fit(train_features, train_labels, epochs=200, batch_size=64, verbose=2)
And here is the accuracy over the last few epochs: [image: model accuracy]
How can I improve it?

You can try normalising the data. With deep networks normalisation is sometimes skipped, but since we are working with only 3 layers here, I guess normalising the data might help.
I would suggest splitting your training data again into a training and a validation set and using K-fold cross-validation (I am not sure about this one; I too am new to this field).
In general, I have seen that if the accuracy stays constant, the best approach is to alter the training data (I mean normalise it, or try imputing NaN values with the mean rather than setting them to 0).
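The two data fixes mentioned above (mean imputation and normalisation) can be sketched in plain Python as follows. The sample values are made up and are not taken from the Titanic data; the function names are illustrative.

```python
from statistics import mean, pstdev

def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed ones."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def standardize(values):
    """Rescale to zero mean and unit variance (population standard deviation)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

ages = [22.0, None, 38.0, 26.0, None, 35.0]   # made-up sample with gaps
filled = impute_mean(ages)
scaled = standardize(filled)

print(filled)
print([round(v, 3) for v in scaled])
```

In a real pipeline the fill value, mean and standard deviation would normally be computed on the training split only and then reused for the validation and test data, so that no information leaks across the split.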

Validation accuracy is low and not increasing while training accuracy is increasing

I am a newbie to Keras and machine learning in general. I'm trying to build a classification model using the Sequential model. After some experiments, I see that my validation accuracy is very low and not increasing, although the training accuracy improves well. I added regularization parameters to the layers and also dropout in between the layers. Still, the behavior persists. Here's my code.
from keras.regularizers import l2
model = keras.models.Sequential()
model.add(keras.layers.Conv1D(filters=32, kernel_size=1, strides=1, padding="SAME", activation="relu", input_shape=[512,1],kernel_regularizer=keras.regularizers.l2(l=0.1))) # input shape must be specified
keras.layers.Dropout=0.35
model.add(keras.layers.MaxPool1D(pool_size=1,activity_regularizer=l2(0.01)))
model.add(keras.layers.Flatten())
model.add(keras.layers.Dense(256, activation="softmax",activity_regularizer=l2(0.01)))
model.compile(loss="sparse_categorical_crossentropy",
optimizer="adam",
metrics=["accuracy"])
Ahistory = model.fit(train_x, trainy, epochs=300,
validation_split = 0.2,
batch_size = 16)
And here are the final results I got.
What is the reason behind this? How do I fine-tune the model?
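One detail in the code above may be worth checking, independent of the regularisation strength: the line keras.layers.Dropout=0.35 is an assignment, so it rebinds the name Dropout instead of adding a layer to the model. The sketch below reproduces the effect with stand-in objects (Dropout, FakeModel and fake_layers here are hypothetical, not Keras classes), so it runs without TensorFlow.

```python
from types import SimpleNamespace

class Dropout:
    """Stand-in for a dropout layer; it only stores its rate."""
    def __init__(self, rate):
        self.rate = rate

class FakeModel:
    """Stand-in for a Sequential model; it only records added layers."""
    def __init__(self):
        self.layers = []

    def add(self, layer):
        self.layers.append(layer)

fake_layers = SimpleNamespace(Dropout=Dropout)
model = FakeModel()

# Pattern from the question: the assignment replaces the class with a float.
fake_layers.Dropout = 0.35
print(len(model.layers))       # 0, nothing was added to the model
print(fake_layers.Dropout)     # 0.35, the class is no longer reachable here

# Pattern that adds a layer: build it and hand it to model.add().
fake_layers.Dropout = Dropout  # restore the class for the demo
model.add(fake_layers.Dropout(0.35))
print(len(model.layers))       # 1
```

In the real code the equivalent would be model.add(keras.layers.Dropout(0.35)). It may also be worth noting that kernel_size=1 makes each convolution output depend on a single time step and pool_size=1 leaves the sequence unchanged, so larger values might be worth trying.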
