LSTM accuracy decrease/drop problem while training

LSTM accuracy decrease/drop problem while training - python

I followed this
and this is the code of the model.
model = Sequential()
# Recurrent layer
model.add(LSTM(44, return_sequences=True, dropout=0.1,
recurrent_dropout=0.1))
# Fully connected layer
model.add(Dense(44, activation='relu'))
# Recurrent layer
model.add(LSTM(44, return_sequences=False, dropout=0.1,
recurrent_dropout=0.1))
# Fully connected layer
model.add(Dense(44, activation='relu'))
# Dropout for regularization
model.add(Dropout(0.5))
# Output layer
model.add(Dense(1, activation='sigmoid'))
# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
the number of training sets, test sets = 6158, 2640
history = model.fit(trainX, trainY, validation_split = 0.33, epochs=500, batch_size=20, verbose=1)
and below graphs show the accuracy and loss of the LSTM model.
epoch 50
epoch 500
Q. At epoch 50, it fluctuates without convergence, so I change the epoch to 500. So now it converges, but the fluctuation problem is the same. Also, additional problem arose. For training, the accuracy is high at first and then decreases. For validation, the accuracy is initially high, then decreases and then increases again. What is the problem with the model? Looking at the loss graph, it doesn't seem to be an overfitting problem. maybe a dataset problem?

Related

NAN values with SGD optimizer in Keras for regression NN

I try to train a NN for regression. When using SGD optimizer class from Keras I suddently get NAN values as prediction from my network after the first step. Before I was running trainings with the Adam optimizer class and everything worked fine. I already tried to change the learning rate of SGD but still NAN values occure as model prediction after the first step and after compiling.
Since my training worked with Adam optimizer I don't believe my inputs are causing the NAN's. I already checked my input valus for NaNs and removed all of them. So what could cause this behavior?
Here is my code:
from keras.optimizers import Adam
from keras.optimizers import SGD
model = Sequential()
model.add(Dense(300,input_shape=(50,), kernel_initializer='glorot_uniform', activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(300, kernel_initializer='glorot_uniform', activation='relu')) model.add(Dropout(0.3))
model.add(Dense(500, kernel_initializer='glorot_uniform', activation='relu')) model.add(Dropout(0.3))
model.add(Dense(400, kernel_initializer='glorot_uniform', activation='relu')) model.add(Dense(1, kernel_initializer='glorot_uniform', activation='linear'))
opt = SGD(lr=0.001, decay=1e-6)
model.compile(loss='mse', optimizer=opt)
model.fit(x_train, y_train, epochs=100, batch_size=32, verbose=0, validation_data=(x_test, y_test))
#print(type(x_train)) ='pandas.core.frame.DataFrame'>
#print( x_train.shape) = (10000 , 50)

Using ANNs for regression is a bit tricky as outputs don't have an upper bound.
The NaNs in the loss function is mostly likely because you have exploding gradients.
The reason that it does not show NaN when you use Adam is that Adam adapts the learning rate. Adam works most of the times, so avoid using SGD as long as you don't have a specific reason.
I am not sure what your dataset contains but, you can try:
Adding L2 regularization
Normalizing inputs
Increasing batch size.

LSTM for 30 classes, badly overfitting, cannot go over 76% test accuracy

How to classify job descriptions into their respective industries?
I'm trying to classify text using LSTM, in particular converting job description
Into industry categories, unfortunately the things I've tried so far
Have only resulted in 76% accuracy.
What is an effective method to classify text for more than 30 classes using LSTM?
I have tried three alternatives
Model_1
Model_1 achieves test accuracy of 65%
embedding_dimension = 80
max_sequence_length = 3000
epochs = 50
batch_size = 100
model = Sequential()
model.add(Embedding(max_words, embedding_dimension, input_length=x_shape))
model.add(SpatialDropout1D(0.2))
model.add(LSTM(100, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(output_dim, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
Model_2
Model_2 achieves test accuracy of 64%
model = Sequential()
model.add(Embedding(max_words, embedding_dimension, input_length=x_shape))
model.add(LSTM(100))
model.add(Dropout(rate=0.5))
model.add(Dense(128, activation='relu', kernel_initializer='he_uniform'))
model.add(Dropout(rate=0.5))
model.add(Dense(64, activation='relu', kernel_initializer='he_uniform'))
model.add(Dropout(rate=0.5))
model.add(Dense(output_dim, activation='softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['acc'])
Model_3
Model_3 achieves test accuracy of 76%
model.add(Embedding(max_words, embedding_dimension, input_length= x_shape, trainable=False))
model.add(SpatialDropout1D(0.4))
model.add(LSTM(100, dropout=0.4, recurrent_dropout=0.4))
model.add(Dense(128, activation='sigmoid', kernel_initializer=RandomNormal(mean=0.0, stddev=0.039, seed=None)))
model.add(BatchNormalization())
model.add(Dense(64, activation='sigmoid', kernel_initializer=RandomNormal(mean=0.0, stddev=0.55, seed=None)) )
model.add(BatchNormalization())
model.add(Dense(32, activation='sigmoid', kernel_initializer=RandomNormal(mean=0.0, stddev=0.55, seed=None)) )
model.add(BatchNormalization())
model.add(Dense(output_dim, activation='softmax'))
model.compile(optimizer= "adam" , loss='categorical_crossentropy', metrics=['acc'])
I'd like to know how to improve the accuracy of the network.

Start with a minimal base line
You have a simple network at the top of your code, but try this one as your baseline
model = Sequential()
model.add(Embedding(max_words, embedding_dimension, input_length=x_shape))
model.add(LSTM(output_dim//4)),
model.add(Dense(output_dim, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
The intuition here is to see how much work LSTM can do. We don't need it to output the full 30 output_dims (the number of classes) but instead a smaller set of features base the decision of the classes on.
Your larger networks have layers like Dense(128) with 100 input. That's 100x128 = 12,800 connections to learn.
Improving imbalance right away
Your data may have a lot of imbalance so for the next step, let's address that with a loss function called the top_k_loss. This loss function will make your network only train on the training examples that it is having the most trouble on. This does a great job of handling class imbalance without any other plumbing
def top_k_loss(k=16):
#tf.function
def loss(y_true, y_pred):
y_error_of_true = tf.keras.losses.categorical_crossentropy(y_true=y_true,y_pred=y_pred)
topk, indexs = tf.math.top_k( y_error_of_true, k=tf.minimum(k, y_true.shape[0]) )
return topk
return loss
Use this with a batch size of 128 to 512. You add it to your model compile like so
model.compile(loss=top_k_loss(16), optimizer='adam', metrics=['accuracy']
Now, you'll see that using model.fit on this will return some dissipointing numbers. That's because it is only reporting THE WORST 16 out of each training batch. Recompile with your regular loss and run model.evaluate to find out how it does on the training and again on the test.
Train for 100 epochs, and at this point you should already see some good results.
Next Steps
Make the whole model generate and testing into a function like so
def run_experiment(lstm_layers=1, lstm_size=output_dim//4, dense_layers=0, dense_size=output_dim//4):
model = Sequential()
model.add(Embedding(max_words, embedding_dimension, input_length=x_shape))
for i in range(lstm_layers-1):
model.add(LSTM(lstm_size, return_sequences=True)),
model.add(LSTM(lstm_size)),
for i in range(dense_layers):
model.add(Dense(dense_size, activation='tanh'))
model.add(Dense(output_dim, activation='softmax'))
model.compile(loss=top_k_loss(16), optimizer='adam', metrics=['accuracy'])
model.fit(x=x,y=y,epochs=100)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
loss, accuracy = model.evaluate(x=x_test, y=y_test)
return loss
that can run a whole experiment for you. Now it is a matter of finding a better architecture by searching. One way to search is random. Random is actually really good. If you want to get fancy, I recommend hyperopt. Don't bother with grid search, random usually beats it for large search spaces.
best_loss = 10**10
best_config = []
for trial in range(100):
config = [
randint(1,4), # lstm layers
randint(8,64), # lstm_size
randint(0,8), # dense_layers
randint(8,64) # dense_size
]
result = run_experiment(*config)
if result < best_loss:
best_config = config
print('Found a better loss ',result,' from config ',config)

Machine Learning with Keras: Different Validation Loss for the Same Model

I am trying to use keras to train a simple feedforward network. I tried two different methods of what I think is the same network, but one is performing significantly better. The first one and the better performing one is the following:
inputs = keras.Input(shape=(384,))
dense = layers.Dense(64, activation="relu")
x = dense(inputs)
x = layers.Dense(64, activation="relu")(x)
outputs = layers.Dense(384)(x)
model = keras.Model(inputs=inputs, outputs=outputs, name="simple_model")
model.compile(loss='mse',optimizer='Adam')
history = model.fit(X_train,
y_train_tf,
epochs=20,
validation_data=(X_test, y_test),
steps_per_epoch=100,
validation_steps=50)
and it settles on a validation loss of about 0.2. The second model performs much worse:
model = keras.models.Sequential()
model.add(Dense(64, input_shape=(384,), activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(384, activation='relu'))
optimizer = tf.keras.optimizers.Adam()
model.compile(loss='mse', optimizer=optimizer)
history = model.fit(X_train,
y_train_tf,
epochs=20,
validation_data=(X_test, y_test),
steps_per_epoch=100,
validation_steps=50)
and this has validation loss of around 5. But when I do model.summary, they look virtually the same. Is there something wrong with the second model?

I am not sure that they are the same since second model has relu activation after last layer (384 units) and first doesn't. This might be the issue since default activation of the Keras dense layer is None.

Validation accuracy is low and not increasing while training accuracy is increasing

I am a newbie to Keras and machine learning in general. I’m trying to build a classification model using the Sequential model. After some experiments, I see that my validation accuracy behavior is very low and not increasing, although the training accuracy works well. I added regularization parameters to the layers and dropouts also in between the layers. Still, the behavior exists. Here’s my code.
from keras.regularizers import l2
model = keras.models.Sequential()
model.add(keras.layers.Conv1D(filters=32, kernel_size=1, strides=1, padding="SAME", activation="relu", input_shape=[512,1],kernel_regularizer=keras.regularizers.l2(l=0.1))) # 一定要加 input shape
keras.layers.Dropout=0.35
model.add(keras.layers.MaxPool1D(pool_size=1,activity_regularizer=l2(0.01)))
model.add(keras.layers.Flatten())
model.add(keras.layers.Dense(256, activation="softmax",activity_regularizer=l2(0.01)))
model.compile(loss="sparse_categorical_crossentropy",
optimizer="adam",
metrics=["accuracy"])
Ahistory = model.fit(train_x, trainy, epochs=300,
validation_split = 0.2,
batch_size = 16)
And here is the final results I got.
What is the reason behind this.? How do I fine-tune the model.?

different mse result for training set

I get different results for mse. During trainig I get 0.296 after my last training epoch and when I evaluate my model I get 0.112. Does any one know why that is so?
Here is the code:
model = Sequential()
model.add(Dropout(0.2))
model.add(LSTM(100, return_sequences=True,batch_input_shape=(batch_size,look_back,dim_x)))
model.add(Dropout(0.2))
model.add(LSTM(150,return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(100,return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(50,return_sequences=False))
model.add(Dropout(0.2))
model.add(Dense(1, activation='linear'))
model.compile(loss='mean_squared_error', optimizer='adam')
history=model.fit(x_train_r, y_train_r, validation_data=(x_test_r, y_test_r),\
epochs=epochs, batch_size=batch_size, callbacks=[es])
score_test = model.evaluate(x_test_r, y_test_r,batch_size=batch_size)
score_train = model.evaluate(x_train_r, y_train_r,batch_size=batch_size)
print("Score Training Data:")
print(score_train)
Batch size and everything stays the same. Does anyone knows why I get so different results for mse?

The reason for the discrepancy between the training loss and the loss obtained on the training data after the training is finished, is the existence of Dropout layer in the model. That's because this layer has different behavior during training and inference time. As I have mentioned in another answer, you can make this behavior the same either by passing training=True to dropout call, or using K.learning_phase() flag and a backend function.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

LSTM accuracy decrease/drop problem while training - python

Related

NAN values with SGD optimizer in Keras for regression NN

LSTM for 30 classes, badly overfitting, cannot go over 76% test accuracy

Machine Learning with Keras: Different Validation Loss for the Same Model

Validation accuracy is low and not increasing while training accuracy is increasing

different mse result for training set

Categories

Resources