CNN model overfitting on multi-class classification - python

I am trying to use GloVe embeddings to train a CNN model based on this article (and also an RNN, which has the same issue). The dataset is labeled data: tweets, each with one of three labels (hate, offensive, or neither).
The problem is that the model performs well on the training set but poorly on the validation set.
Here is the model:
kernel_size = 2
filters = 256
pool_size = 2
gru_node = 64

model = Sequential()
model.add(Embedding(len(word_index) + 1,
                    EMBEDDING_DIM,
                    weights=[embedding_matrix],
                    input_length=MAX_SEQUENCE_LENGTH,
                    trainable=True))
model.add(Dropout(0.25))
model.add(Conv1D(filters, kernel_size, activation='relu'))
model.add(MaxPooling1D(pool_size=pool_size))
model.add(Conv1D(filters, kernel_size, activation='softmax'))
model.add(MaxPooling1D(pool_size=pool_size))
model.add(LSTM(gru_node, return_sequences=True, recurrent_dropout=0.2))
model.add(LSTM(gru_node, return_sequences=True, recurrent_dropout=0.2))
model.add(LSTM(gru_node, return_sequences=True, recurrent_dropout=0.2))
model.add(LSTM(gru_node, recurrent_dropout=0.2))
model.add(Dense(1024, activation='relu'))
model.add(Dense(nclasses))
model.add(Activation('softmax'))
model.compile(loss='sparse_categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
Fitting the model:
import numpy as np
from sklearn import metrics
from sklearn.model_selection import train_test_split

X = df.tweet
y = df['classifi']  # classes 0, 1, 2

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)
X_train_Glove, X_test_Glove, word_index, embeddings_index = loadData_Tokenizer(X_train, X_test)

model_RCNN = Build_Model_RCNN_Text(word_index, embeddings_index, 20)
model_RCNN.fit(X_train_Glove, y_train,
               validation_data=(X_test_Glove, y_test),
               epochs=15, batch_size=128, verbose=2)

predicted = model_RCNN.predict(X_test_Glove)
predicted = np.argmax(predicted, axis=1)
print(metrics.classification_report(y_test, predicted))
This is what the class distribution looks like (0: hate, 1: offensive, 2: neither): [class distribution plot]
[model summary]
Results:
[classification report]
Is this the correct approach, or am I missing something here?

Generally speaking, there are two sides from which you can tackle overfitting:
Improving the data:
- More unique data
- Oversampling (to balance the data)
Limiting the network structure:
- Dropout (you've implemented this)
- Fewer parameters (you might want to benchmark against a much smaller network)
- Regularization (e.g. L1 and L2)
I'd suggest trying significantly fewer parameters (because this is quick) and oversampling (because your data seems lopsided); a sketch of both ideas follows.
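As an illustration only (not the original author's code), here is a minimal sketch of both ideas. It assumes the same df, word_index, embedding_matrix, EMBEDDING_DIM, MAX_SEQUENCE_LENGTH, and nclasses as in the question; the layer sizes are placeholder values to benchmark against:

import pandas as pd
from sklearn.utils import resample
from tensorflow.keras import Sequential, layers, regularizers

# Oversample every class up to the size of the largest class before splitting.
majority_size = df['classifi'].value_counts().max()
balanced = pd.concat([
    resample(group, replace=True, n_samples=majority_size, random_state=42)
    for _, group in df.groupby('classifi')
])

# A much smaller, regularized network to benchmark against (illustrative sizes only).
small_model = Sequential([
    layers.Embedding(len(word_index) + 1, EMBEDDING_DIM,
                     weights=[embedding_matrix],
                     input_length=MAX_SEQUENCE_LENGTH,
                     trainable=False),                      # frozen embeddings cut trainable parameters drastically
    layers.Dropout(0.5),
    layers.Conv1D(64, 3, activation='relu',
                  kernel_regularizer=regularizers.l2(1e-4)),
    layers.GlobalMaxPooling1D(),
    layers.Dense(64, activation='relu'),
    layers.Dense(nclasses, activation='softmax'),
])
small_model.compile(loss='sparse_categorical_crossentropy',
                    optimizer='adam', metrics=['accuracy'])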
You can also try hyperparameter tuning: build a large number of networks with different parameters and then pick the best one.
Note: if you do hyperparameter tuning, make sure to keep a separate validation set, because you can easily overfit your test set this way.
Side note: when troubleshooting a neural network, it is sometimes helpful to set the optimizer to plain stochastic gradient descent. It slows training down considerably but makes the progression much clearer.
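For example, that is a one-line change to the compile call (the learning rate below is only a placeholder):

from tensorflow.keras.optimizers import SGD

# Plain SGD with no momentum: slower, but the training curve is easier to interpret.
model.compile(loss='sparse_categorical_crossentropy',
              optimizer=SGD(learning_rate=0.01),   # placeholder learning rate
              metrics=['accuracy'])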
Good luck!

Related

Keras sequential model results not reproducible with wildly inconsistent results on same dataset and parameters optimized using Optuna

I am running a Keras sequential model as a regressor with the TensorFlow backend. I am using Optuna to optimize its hyperparameters, minimizing the RMSE in the Optuna objective.
However, when I re-create the Keras model with the best parameters from Optuna and use the same dataset for re-fitting and predicting as the one used in the Optuna objective function, I get wildly inconsistent results.
I'm aware that neural nets are stochastic in nature, with an element of randomness. In order to make the run deterministic I tried setting the seeds for both NumPy and TensorFlow in the following manner at the beginning of my script, but it doesn't work:
from numpy.random import seed
seed(1)
import tensorflow
tensorflow.random.set_seed(2)
Here is my code and the output:
def create_model(trial):
    n_layers = trial.suggest_int("layers_number", 4, 8)  # 4
    model = keras.Sequential()
    for i in range(n_layers):
        num_hidden = trial.suggest_int("n_units_l_{}".format(i), 10, 16)
        activation = trial.suggest_categorical('activation_l_{}'.format(i), ['linear'])  # , 'relu', 'sigmoid', 'tanh', 'elu'
        model.add(layers.Dense(num_hidden, activation=activation, kernel_initializer='uniform'))
        dropout = trial.suggest_uniform("dropout_l_{}".format(i), 0.1, 0.4)
        model.add(layers.Dropout(dropout))
    model.add(layers.Dense(1, activation='linear'))
    lr = trial.suggest_loguniform("lr", 1e-5, 1e-1)
    model.compile(
        loss='mean_squared_error',
        optimizer=keras.optimizers.Adam(lr=lr),
        metrics=['mse']
    )
    return model

def objective(trial):
    keras.backend.clear_session()
    model = create_model(trial)
    epochs = trial.suggest_int("epochs", 3, 4)  # 50
    batch = trial.suggest_int("batch", 1, 2)
    model.fit(
        X_train.values,
        y_train.values,
        batch_size=batch,
        epochs=epochs,
        verbose=0,
        shuffle=False
    )
    y_pred_test = model.predict(X_test)
    test_copy['pred_scaled'] = y_pred_test
    rmse = inverse_transform(test_copy, y_pred_test, df_copy)  # inverse-transforms the scaled target and computes RMSE
    return rmse

study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=2)
Output: [best trial screenshot]
RMSE of best trial is 110.90926282554379
Refitting and predicting using the best params:
def KerasRegressor(parameters):
    print(parameters)
    model = keras.Sequential()
    layers_number = int(parameters['layers_number'])
    for i in range(layers_number):
        model.add(layers.Dense(int(parameters['n_units_l_' + str(i)]),
                               activation=parameters['activation_l_' + str(i)],
                               kernel_initializer='uniform'))
        model.add(layers.Dropout(int(parameters['dropout_l_' + str(i)])))
    model.add(layers.Dense(1, activation='linear'))
    model.compile(
        loss='mean_squared_error',
        optimizer=keras.optimizers.Adam(lr=float(parameters['lr'])),
        metrics=['mse'])
    return model

params = study.best_trial.params
epochs = params['epochs']
batch = params['batch']
del params['epochs']
del params['batch']

seed(1)
tensorflow.random.set_seed(2)

model = KerasRegressor(params)
model.fit(X_train.values, y_train.values, epochs=epochs, batch_size=batch, shuffle=False)

y_pred_test = model.predict(X_test)
test_copy['pred_scaled'] = y_pred_test
rmse = inverse_transform(test_copy, y_pred_test, df_copy)  # inverse-transforms the scaled target and computes RMSE
print(rmse)
New RMSE on the same dataset as used in the Optuna objective function, with the best hyperparameters:
New RMSE: 227892.23560327655
Small differences in RMSE are acceptable, but not a difference this large.
I have a different approach: I save the model to a file every time Optuna finds a new best metric. At prediction time I just load that model file and predict on the test set.
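A rough sketch of that idea, reusing the names from the question (best_model.h5 is just a hypothetical file name):

# Persist the model whenever a trial improves on the best RMSE seen so far,
# then reload that exact model for prediction instead of rebuilding it from params.
best_rmse = float('inf')

def objective(trial):
    global best_rmse
    keras.backend.clear_session()
    model = create_model(trial)
    epochs = trial.suggest_int("epochs", 3, 4)
    batch = trial.suggest_int("batch", 1, 2)
    model.fit(X_train.values, y_train.values,
              batch_size=batch, epochs=epochs, verbose=0, shuffle=False)
    y_pred_test = model.predict(X_test)
    test_copy['pred_scaled'] = y_pred_test
    rmse = inverse_transform(test_copy, y_pred_test, df_copy)
    if rmse < best_rmse:
        best_rmse = rmse
        model.save("best_model.h5")          # hypothetical file name
    return rmse

# Later, instead of re-fitting with study.best_trial.params:
best_model = keras.models.load_model("best_model.h5")
y_pred_test = best_model.predict(X_test)

This sidesteps the non-determinism entirely, because the best model is never re-trained.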
If you really want to debug, track down the sources of randomness in your system: fix the seed (you did that), fit on the same data in the same order, use the same layers and the same parameters, then test. Run again with exactly the same setup and test again. Are the two test results the same? Run multiple tests and check whether they all agree.

Time series classification using CNN

I am trying to build a convolutional neural network that classifies time series data into two classes. For the time being I only have a small dataset, so the first thing I need is to augment my data so I can feed it into a network.
For the data augmentation task, I found some very helpful methods in the https://github.com/uchidalab/time_series_augmentation repository. What I have tried so far is adding some Gaussian noise to my data, plus permutation, time warping, window slicing, and window warping methods. These methods are applied to a (batches, batch_rows, channels) = (354, 400, 3) dataset to generate a (1770, 400, 3) dataset (including the train and test sets and their corresponding labels).
Given that I have a limited number of inputs, I would like to know if you have any suggestions for a 1D CNN structure that performs well on these datasets.
What I have tried so far is this network:
verbose, epochs, batch_size = 0, 10, 8
n_timesteps, n_features, n_outputs = trainX.shape[1], trainX.shape[2], trainy.shape[1]
model = Sequential()
model.add(Conv1D(filters=16, kernel_size=3, activation='relu', input_shape=(n_timesteps,n_features)))
model.add(MaxPooling1D(pool_size=2))
model.add(Flatten())
model.add(Dense(n_outputs, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
# fit network
model.fit(trainX, trainy, epochs=epochs, batch_size=batch_size, verbose=verbose)
# evaluate model
_, accuracy = model.evaluate(testX, testy, batch_size=batch_size, verbose=0)
No matter what changes I make to the parameters and hyperparameters, I always get an accuracy of around 50%, meaning the model is doing no better than chance on a binary problem.
I would really appreciate it if anyone could tell me what the problem is likely to be. Does this happen due to poor data quality produced by the augmentation methods, or does it have to do with the network itself?
Thanks in advance.
If it's a classification between two classes, you should use binary_crossentropy as the loss function.
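For instance, a sketch with a single sigmoid output (this assumes trainy/testy are converted to a single 0/1 column rather than one-hot vectors):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, MaxPooling1D, Flatten, Dense

model = Sequential()
model.add(Conv1D(filters=16, kernel_size=3, activation='relu',
                 input_shape=(n_timesteps, n_features)))
model.add(MaxPooling1D(pool_size=2))
model.add(Flatten())
model.add(Dense(1, activation='sigmoid'))          # single probability for the positive class
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])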

Why the accuracy of the neural network stops increasing

I'm trying to solve the Titanic competition on Kaggle, but the model accuracy isn't going beyond 80%.
I tried changing the number of hidden nodes and the number of epochs, and also tried applying batch normalization, dropout, and different weight initializations, but it stays at the same 80%. What am I doing wrong?
This is my code:
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Dense(10, input_shape=(5,), kernel_initializer='he_normal', activation='relu'))
model.add(tf.keras.layers.BatchNormalization())
model.add(tf.keras.layers.Dense(20, kernel_initializer='he_normal', activation='relu'))
model.add(tf.keras.layers.Dropout(0.3))
model.add(tf.keras.layers.BatchNormalization())
model.add(tf.keras.layers.Dense(2, kernel_initializer=tf.keras.initializers.GlorotNormal(), activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
train_scores = model.fit(train_features, train_labels, epochs=200, batch_size=64, verbose=2)
And here is the accuracy in the last few epochs: [model accuracy screenshot]
How can I improve it?
You can try normalising the data. Generally, when implementing neural networks we don't need to normalise the data if the network is deep, but since here we are only working with 3 layers, I guess normalising the data might help.
I would also suggest splitting your training data again into training and validation sets and using K-fold cross-validation (I am not sure about this one; I too am new to this field).
But in general I have seen that if the accuracy is constant, the best approach is to alter the training data, i.e. normalise it, or try imputing NaN values with the mean (rather than setting them to 0).
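A small sketch of those two preprocessing steps (assuming train_features is a pandas DataFrame; test_features is a hypothetical name for the held-out features):

from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

imputer = SimpleImputer(strategy='mean')   # fill NaNs with the column mean instead of 0
scaler = StandardScaler()                  # rescale to zero mean / unit variance

train_features = scaler.fit_transform(imputer.fit_transform(train_features))
test_features = scaler.transform(imputer.transform(test_features))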

Validation accuracy is low and not increasing while training accuracy is increasing

I am a newbie to Keras and machine learning in general. I'm trying to build a classification model using the Sequential API. After some experiments, I see that my validation accuracy stays very low and does not increase, although the training accuracy looks fine. I added regularization parameters to the layers and also dropouts in between the layers. Still, the behaviour persists. Here's my code.
from keras.regularizers import l2

model = keras.models.Sequential()
model.add(keras.layers.Conv1D(filters=32, kernel_size=1, strides=1, padding="SAME",
                              activation="relu", input_shape=[512, 1],
                              kernel_regularizer=keras.regularizers.l2(l=0.1)))  # must specify the input shape
keras.layers.Dropout = 0.35
model.add(keras.layers.MaxPool1D(pool_size=1, activity_regularizer=l2(0.01)))
model.add(keras.layers.Flatten())
model.add(keras.layers.Dense(256, activation="softmax", activity_regularizer=l2(0.01)))
model.compile(loss="sparse_categorical_crossentropy",
              optimizer="adam",
              metrics=["accuracy"])
Ahistory = model.fit(train_x, trainy, epochs=300,
                     validation_split=0.2,
                     batch_size=16)
And here are the final results I got.
What is the reason behind this? How do I fine-tune the model?

How do I decrease loss on keras sequential model

I am hoping to get some guidance on what steps I should take next in my attempt to model a certain system. It contains 3 independent variables, 24 dependent variables, and about 21,000 rows. In my attempts to model it, I cannot get the accuracy above about 50% or the loss below about 6500. I've been using variations on the following code:
EPOCHS = 30
#OPTIMIZER = 'adam'
#OPTIMIZER = 'adagrad'
BATCH_SIZE = 10
OUTPUT_UNITS = len(y.columns)
print(f'OUTPUT_UNITS: {OUTPUT_UNITS}')
model = Sequential()
model.add(Dense(8, activation='relu', input_dim=3)) # 3 X parameters, with eng_speed removed
#model.add(Dense(8, activation='relu', input_dim=4)) # 4 X parameters
model.add(Dense(32, activation='relu' ))
#model.add(Dense(64, activation='relu' ))
#model.add(Dense(12, activation='relu' ))
model.add(Dense(OUTPUT_UNITS)) # number of predicted (y) values (labels)
model.summary()
adadelta = optimizers.Adadelta()
adam = optimizers.Adam(lr=0.001)
model.compile(optimizer=adadelta, loss='mse', metrics=['accuracy'])
#model.compile(optimizer=opt, loss='mse', metrics=['accuracy'])
history = model.fit(x=X_train, y=y_train, epochs=EPOCHS, batch_size=BATCH_SIZE)
I've tried removing and adding layers, changing the size of them, different optimizers, learning rates, etc. The following two graphs are typical of what I've been seeing--they both flatten very quickly, then don't improve:
I'm obviously new at this and would appreciate it if someone pointed me in the right direction: an approach to try, something to read up on, whatever. Thanks in advance.
Since (according to your MSE loss and your regression tag) you are in a regression setting, accuracy is meaningless (it is only used in classification settings); please see my own answer in "What function defines accuracy in Keras when the loss is mean squared error (MSE)?".
Given that, there is in principle absolutely no reason to consider a loss of 6500 as "high" and hence in need of improvement...
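If you still want a human-readable metric for a regression model, mean absolute error is a common choice, e.g. (a one-line sketch against the compile call from the question):

# MAE is in the same units as the target and replaces the meaningless 'accuracy' metric.
model.compile(optimizer=adadelta, loss='mse', metrics=['mae'])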
