I am a newbie to Keras and machine learning in general. I’m trying to build a classification model using the Sequential model. After some experiments, I see that my validation accuracy behavior is very low and not increasing, although the training accuracy works well. I added regularization parameters to the layers and dropouts also in between the layers. Still, the behavior exists. Here’s my code.
from keras.regularizers import l2
model = keras.models.Sequential()
model.add(keras.layers.Conv1D(filters=32, kernel_size=1, strides=1, padding="SAME", activation="relu", input_shape=[512,1],kernel_regularizer=keras.regularizers.l2(l=0.1))) # 一定要加 input shape
keras.layers.Dropout=0.35
model.add(keras.layers.MaxPool1D(pool_size=1,activity_regularizer=l2(0.01)))
model.add(keras.layers.Flatten())
model.add(keras.layers.Dense(256, activation="softmax",activity_regularizer=l2(0.01)))
model.compile(loss="sparse_categorical_crossentropy",
optimizer="adam",
metrics=["accuracy"])
Ahistory = model.fit(train_x, trainy, epochs=300,
validation_split = 0.2,
batch_size = 16)
And here is the final results I got.
What is the reason behind this.? How do I fine-tune the model.?
Related
I followed this
and this is the code of the model.
model = Sequential()
# Recurrent layer
model.add(LSTM(44, return_sequences=True, dropout=0.1,
recurrent_dropout=0.1))
# Fully connected layer
model.add(Dense(44, activation='relu'))
# Recurrent layer
model.add(LSTM(44, return_sequences=False, dropout=0.1,
recurrent_dropout=0.1))
# Fully connected layer
model.add(Dense(44, activation='relu'))
# Dropout for regularization
model.add(Dropout(0.5))
# Output layer
model.add(Dense(1, activation='sigmoid'))
# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
the number of training sets, test sets = 6158, 2640
history = model.fit(trainX, trainY, validation_split = 0.33, epochs=500, batch_size=20, verbose=1)
and below graphs show the accuracy and loss of the LSTM model.
epoch 50
epoch 500
Q. At epoch 50, it fluctuates without convergence, so I change the epoch to 500. So now it converges, but the fluctuation problem is the same. Also, additional problem arose. For training, the accuracy is high at first and then decreases. For validation, the accuracy is initially high, then decreases and then increases again. What is the problem with the model? Looking at the loss graph, it doesn't seem to be an overfitting problem. maybe a dataset problem?
I am trying to use keras to train a simple feedforward network. I tried two different methods of what I think is the same network, but one is performing significantly better. The first one and the better performing one is the following:
inputs = keras.Input(shape=(384,))
dense = layers.Dense(64, activation="relu")
x = dense(inputs)
x = layers.Dense(64, activation="relu")(x)
outputs = layers.Dense(384)(x)
model = keras.Model(inputs=inputs, outputs=outputs, name="simple_model")
model.compile(loss='mse',optimizer='Adam')
history = model.fit(X_train,
y_train_tf,
epochs=20,
validation_data=(X_test, y_test),
steps_per_epoch=100,
validation_steps=50)
and it settles on a validation loss of about 0.2. The second model performs much worse:
model = keras.models.Sequential()
model.add(Dense(64, input_shape=(384,), activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(384, activation='relu'))
optimizer = tf.keras.optimizers.Adam()
model.compile(loss='mse', optimizer=optimizer)
history = model.fit(X_train,
y_train_tf,
epochs=20,
validation_data=(X_test, y_test),
steps_per_epoch=100,
validation_steps=50)
and this has validation loss of around 5. But when I do model.summary, they look virtually the same. Is there something wrong with the second model?
I am not sure that they are the same since second model has relu activation after last layer (384 units) and first doesn't. This might be the issue since default activation of the Keras dense layer is None.
I'm trying to solve the Titanic competition on Kaggle. But the modelaccuracy isn't going beyond 80%.
I tried to change a number of hidden nodes, a number of epochs, also tried to apply batch normalization, dropout, changing the weights initializations, but there's the same 80%. What am I doing wrong?
This is my code below:
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Dense(10, input_shape=(5,), kernel_initializer='he_normal', activation='relu'))
model.add(tf.keras.layers.BatchNormalization())
model.add(tf.keras.layers.Dense(20, kernel_initializer='he_normal', activation='relu'))
model.add(tf.keras.layers.Dropout(0.3))
model.add(tf.keras.layers.BatchNormalization())
model.add(tf.keras.layers.Dense(2, kernel_initializer=tf.keras.initializers.GlorotNormal(), activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
train_scores = model.fit(train_features, train_labels, epochs=200, batch_size=64, verbose=2)
And here's on the picture accuracy in some last epochs:model accuracy
How can I improve it?
You can try normalising the data, Generally while implementing Neural Networks we don't need to normalise our data (if the network is deep) but since here we are only working with 3 layers only I guess normalising the data might help.
I would suggest to split your training data again into training and validation set and use K-fold cross validation ( I am not sure about this one!! I too am new in this field).
But in general I have seen if the accuracy is constant then the best approach is to alter the training data ( I mean normalise it or try imputing NaN values with the mean (rather than setting the to 0)).
I'm working on a CNN model for complex text classification (mainly emails and messages). The dataset contains around 100k entries distributed on 10 different classes. My actual Keras sequential model has the following structure:
model = Sequential(
[
Embedding(
input_dim=10000,
output_dim=150,
input_length=400),
Convolution1D(
filters=128,
kernel_size=4,
padding='same',
activation='relu'),
BatchNormalization(),
MaxPooling1D(),
Flatten(),
Dropout(0.4),
Dense(
100,
activation='relu'),
Dropout(0.4),
Dense(
len(y_train[0]),
activation='softmax')])
In compiling the model I'm using the Nadam optimizer, categorical_crossentropy loss with LabelSmoothing set to 0.2 .
In a model fit, I'm using 30 Epochs and Batch Size set to 512. I also use EarlyStopping to monitor val_loss and patience set to 8 epochs. The test size is set to 25% of the dataset.
Actually the training stops after 16/18 epochs with values that start to fluctuate a little after 6/7 epoch and then go on till being stopped by EarlyStopping. The values are like these on average:
loss: 1.1673 - accuracy: 0.9674 - val_loss: 1.2464 - val_accuracy: 0.8964
with a testing accuracy reaching:
loss: 1.2461 - accuracy: 0.8951
Now I'd like to improve the accuracy of my CNN, I've tried different hyperparameters but as for now, I wasn't able to get a higher value. Therefore I'm trying to figure out:
if there is still room for improvements (I bet so)
if the solution is in a fine-tuning of my hyperparameters and, if so, which ones should I change?
if going deeper by adding layers to the model could be of any use and, if so, how to improve my model
is there any other deep-learning/Neural networks approach rather than CNN that could lead to a better result?
Thank you very much to anybody who will help! :)
There are many libraries, but I find this one very flexible. https://github.com/keras-team/keras-tuner
Just install with pip.
Your updated model, feel free to choose the search range.
from tensorflow import keras
from tensorflow.keras import layers
from kerastuner.tuners import RandomSearch
def build_model(hp):
model = keras.Sequential()
model.add(layers.Embedding(input_dim=hp.Int('input_dim',
min_value=5000,
max_value=10000,
step = 1000),
output_dim=hp.Int('output_dim',
min_value=200,
max_value=800,
step = 100),
input_length = 400))
model.add(layers.Convolution1D(
filters=hp.Int('filters',
min_value=32,
max_value=512,
step = 32),
kernel_size=hp.Int('kernel_size',
min_value=3,
max_value=11,
step = 2),
padding='same',
activation='relu')),
model.add(layers.BatchNormalization())
model.add(layers.MaxPooling1D())
model.add(layers.Flatten())
model.add(layers.Dropout(0.4))
model.add(layers.Dense(units=hp.Int('units',
min_value=64,
max_value=256,
step=32),
activation='relu'))
model.add(layers.Dropout(0.4))
model.add(layers.Dense(y_train[0], activation='softmax'))
model.compile(
optimizer=keras.optimizers.Adam(
hp.Choice('learning_rate',
values=[1e-2, 1e-3, 1e-4])),
loss='categorical_crossentropy',
metrics=['accuracy'])
return model
tuner = RandomSearch(
build_model,
objective='val_accuracy',
max_trials=5,
executions_per_trial=3,
directory='my_dir',
project_name='helloworld')
tuner.search_space_summary()
## The following lines are based on your model
tuner.search(x, y,
epochs=5,
validation_data=(val_x, val_y))
models = tuner.get_best_models(num_models=2)
You can try replacing the Conv1D layers with LSTM layers and observe if you get better performance.
LSTM(units = 512) https://keras.io/layers/recurrent/
If you want to extract more meaningful features, one approach I found promising is by extracting pre-trained BERT features and then training using a CNN/LSTM.
A great repository to get started is this one -
https://github.com/UKPLab/sentence-transformers
Once you get the sentence embedding from the BERT/XLNet you can use those features to train another CNN similar to the one you are using except maybe get rid of the embedding layer as it's expensive.
I am trying to use GloVe embeddings to train a cnn model based on this article (also a rnn, which has this issue). The dataset is a labeled data: text (tweets) with labels (hate, offensive or neither).
The problem is that model performs well on train set but poorly on validation set.
here is the model:
kernel_size = 2
filters = 256
pool_size = 2
gru_node = 64
model = Sequential()
model.add(Embedding(len(word_index) + 1,
EMBEDDING_DIM,
weights=[embedding_matrix],
input_length=MAX_SEQUENCE_LENGTH,
trainable=True))
model.add(Dropout(0.25))
model.add(Conv1D(filters, kernel_size, activation='relu'))
model.add(MaxPooling1D(pool_size=pool_size))
model.add(Conv1D(filters, kernel_size, activation='softmax'))
model.add(MaxPooling1D(pool_size=pool_size))
model.add(LSTM(gru_node, return_sequences=True, recurrent_dropout=0.2))
model.add(LSTM(gru_node, return_sequences=True, recurrent_dropout=0.2))
model.add(LSTM(gru_node, return_sequences=True, recurrent_dropout=0.2))
model.add(LSTM(gru_node, recurrent_dropout=0.2))
model.add(Dense(1024,activation='relu'))
model.add(Dense(nclasses))
model.add(Activation('softmax'))
model.compile(loss='sparse_categorical_crossentropy',
optimizer='adam',
metrics=['accuracy'])
fitting the model:
X = df.tweet
y = df['classifi'] # classes 0,1,2
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, shuffle=False)
X_train_Glove,X_test_Glove, word_index,embeddings_index = loadData_Tokenizer(X_train,X_test)
model_RCNN = Build_Model_RCNN_Text(word_index,embeddings_index, 20)
model_RCNN.fit(X_train_Glove, y_train,validation_data=(X_test_Glove, y_test),
epochs=15,batch_size=128,verbose=2)
predicted = model_RCNN.predict(X_test_Glove)
predicted = np.argmax(predicted, axis=1)
print(metrics.classification_report(y_test, predicted))
this is what the distribution looks like (0:hate, 1:offensive, 2:neither)
model summary
Results:
classification report
is this the correct approach or am I missing something here
Generally speaking there are two sides that you can tackle overfitting:
Improving the data
More unique data
oversampling (to balance data)
Limiting the network structure
Dropout (You've implemented this)
Less parameters (You might want to benchmark against a much smaller network)
regularization (ex. L1 and L2)
I'd suggest trying with significantly fewer parameters (because this is quick) and oversampling (because your data seems lopsided).
Also, You can also try hyperparameter fitting. Making a large number of networks with different parameters than picking the best one.
Note: if you do hyper parameter fitting make sure to have an extra validation set because you can easily overfit your test set this way.
Side note: Sometimes when troubleshooting NN it is helpful to set the optimizer to a basic stochastic gradient descent. It slows the training down a bunch but makes the progression much clearer.
Good luck!