Different MSE result for training set - Python

I get different results for MSE. During training I get 0.296 after my last training epoch, and when I evaluate my model I get 0.112. Does anyone know why that is?
Here is the code:
from keras.models import Sequential
from keras.layers import LSTM, Dense, Dropout

model = Sequential()
# the first layer of a Sequential model carries the input shape
model.add(Dropout(0.2, batch_input_shape=(batch_size, look_back, dim_x)))
model.add(LSTM(100, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(150, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(100, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(50, return_sequences=False))
model.add(Dropout(0.2))
model.add(Dense(1, activation='linear'))
model.compile(loss='mean_squared_error', optimizer='adam')
# es is an EarlyStopping callback defined elsewhere
history = model.fit(x_train_r, y_train_r, validation_data=(x_test_r, y_test_r),
                    epochs=epochs, batch_size=batch_size, callbacks=[es])
score_test = model.evaluate(x_test_r, y_test_r, batch_size=batch_size)
score_train = model.evaluate(x_train_r, y_train_r, batch_size=batch_size)
print("Score Training Data:")
print(score_train)
Batch size and everything else stays the same. Does anyone know why I get such different results for the MSE?

The reason for the discrepancy between the training loss and the loss obtained on the training data after training has finished is the Dropout layers in the model: Dropout behaves differently during training and at inference time. As I have mentioned in another answer, you can make the behavior the same either by passing training=True to the dropout call, or by using the K.learning_phase() flag with a backend function.
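For example, a minimal sketch of the first option with tf.keras, assuming the model, x_train_r, y_train_r, and batch_size objects from the question (a single batch is used here because the model fixes the batch dimension via batch_input_shape):
import numpy as np

x_batch, y_batch = x_train_r[:batch_size], y_train_r[:batch_size]
# training=True keeps Dropout active, as it is during fit()
y_pred = model(x_batch, training=True).numpy()
train_mode_mse = np.mean((np.asarray(y_batch).reshape(-1) - y_pred.reshape(-1)) ** 2)
print("train-mode MSE:", train_mode_mse)
# default inference behavior: Dropout is a no-op
print("inference-mode MSE:", model.evaluate(x_batch, y_batch, batch_size=batch_size, verbose=0))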

Related

LSTM accuracy decrease/drop problem while training

I followed this, and this is the code of the model.
from keras.models import Sequential
from keras.layers import LSTM, Dense, Dropout

model = Sequential()
# Recurrent layer
model.add(LSTM(44, return_sequences=True, dropout=0.1,
               recurrent_dropout=0.1))
# Fully connected layer
model.add(Dense(44, activation='relu'))
# Recurrent layer
model.add(LSTM(44, return_sequences=False, dropout=0.1,
               recurrent_dropout=0.1))
# Fully connected layer
model.add(Dense(44, activation='relu'))
# Dropout for regularization
model.add(Dropout(0.5))
# Output layer
model.add(Dense(1, activation='sigmoid'))
# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
Number of training samples: 6158; number of test samples: 2640.
history = model.fit(trainX, trainY, validation_split=0.33, epochs=500, batch_size=20, verbose=1)
[The attached graphs showed the accuracy and loss of the LSTM model at epoch 50 and at epoch 500.]
Q. At epoch 50 it fluctuates without converging, so I changed the number of epochs to 500. Now it converges, but the fluctuation problem remains, and an additional problem arose: for training, the accuracy is high at first and then decreases; for validation, the accuracy is initially high, then decreases, then increases again. What is the problem with the model? Looking at the loss graph, it doesn't seem to be an overfitting problem. Maybe a dataset problem?

How to solve constant model accuracy after each epoch

I am studying deep learning, and as an assignment I am doing a classification project with 17k records, 14 features, and a target variable that has 11 classes.
I tried to train a simple neural network:
from tensorflow import keras

# define the keras model
model1 = keras.Sequential()
model1.add(keras.layers.Dense(64, input_dim=14, activation='relu'))
model1.add(keras.layers.Dense(128, activation='relu'))
model1.add(keras.layers.Dense(64, activation='relu'))
model1.add(keras.layers.Dense(1, activation='softmax'))
# compile the keras model
model1.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
# fit the keras model on the dataset
performance1 = model1.fit(x_train, y_train, epochs=100, validation_split=0.2)
But the problem is that I get the same accuracy for each epoch; it doesn't seem that the model is learning at all.
I researched this problem, found similar questions on StackOverflow like this question, and tried the following things:
Applied StandardScaler
Increased/decreased the hidden layers and neurons
Added a dropout layer
Changed the optimizers, loss, and activation function
Changed the batch_size
None of them worked; of course the accuracy was different across trials, but each trial had the same problem.
A few of the trials are as follows:
# define the keras model
model1 = keras.Sequential()
model1.add(keras.layers.Dense(64, input_dim=14, activation='sigmoid'))
model1.add(keras.layers.Dense(128, activation='sigmoid'))
model1.add(keras.layers.Dense(64, activation='sigmoid'))
model1.add(keras.layers.Dense(1, activation='softmax'))
sgd = keras.optimizers.SGD(lr=0.01)
# compile the keras model
model1.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
# define the keras model
model1 = keras.Sequential()
model1.add(keras.layers.Dense(64, input_dim=14, activation='relu'))
model1.add(keras.layers.Dense(128, activation='relu'))
model1.add(keras.layers.Dropout(0.2))
model1.add(keras.layers.Dense(64, activation='relu'))
model1.add(keras.layers.Dropout(0.2))
model1.add(keras.layers.Dense(1, activation='softmax'))
# compile the keras model
model1.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
I don't know what the problem is here. Please let me know if you need more details. And please don't close this question; I know it stands a chance of being marked as a duplicate, but trust me, I have tried many things, as far as I can understand them as a beginner.
The problem is that softmax should be applied over an output array of logits, one per target class, to produce class probabilities. With Dense(1) there is only a single output, so softmax over it always returns 1.0 and the accuracy never changes. Hence you have to change this line
model1.add(keras.layers.Dense(1, activation='softmax'))
# TO
model1.add(keras.layers.Dense(df['Class'].nunique(), activation='softmax'))
EDIT:
# Let's say you have 11 unique values in your class; then your last layer will become
model1.add(keras.layers.Dense(11, activation='softmax'))
# Now your loss will be (assuming integer class labels)
model1.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(), optimizer='adam', metrics=[tf.keras.metrics.SparseCategoricalAccuracy()])
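If the labels are one-hot encoded rather than integer class IDs (the question doesn't show the label format, so this is an assumption), a minimal alternative keeps the original loss:
# one-hot labels: keep categorical_crossentropy instead of the sparse variant
model1.add(keras.layers.Dense(11, activation='softmax'))
model1.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])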

LSTM for 30 classes, badly overfitting, cannot go over 76% test accuracy

How to classify job descriptions into their respective industries?
I'm trying to classify text using an LSTM, in particular converting job descriptions into industry categories; unfortunately, the things I've tried so far have only resulted in 76% accuracy.
What is an effective method to classify text into more than 30 classes using an LSTM?
I have tried three alternatives
Model_1
Model_1 achieves test accuracy of 65%
embedding_dimension = 80
max_sequence_length = 3000
epochs = 50
batch_size = 100
from keras.models import Sequential
from keras.layers import Embedding, SpatialDropout1D, LSTM, Dense, Dropout, BatchNormalization
from keras.initializers import RandomNormal

model = Sequential()
model.add(Embedding(max_words, embedding_dimension, input_length=x_shape))
model.add(SpatialDropout1D(0.2))
model.add(LSTM(100, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(output_dim, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
Model_2
Model_2 achieves test accuracy of 64%
model = Sequential()
model.add(Embedding(max_words, embedding_dimension, input_length=x_shape))
model.add(LSTM(100))
model.add(Dropout(rate=0.5))
model.add(Dense(128, activation='relu', kernel_initializer='he_uniform'))
model.add(Dropout(rate=0.5))
model.add(Dense(64, activation='relu', kernel_initializer='he_uniform'))
model.add(Dropout(rate=0.5))
model.add(Dense(output_dim, activation='softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['acc'])
Model_3
Model_3 achieves test accuracy of 76%
model = Sequential()
model.add(Embedding(max_words, embedding_dimension, input_length=x_shape, trainable=False))
model.add(SpatialDropout1D(0.4))
model.add(LSTM(100, dropout=0.4, recurrent_dropout=0.4))
model.add(Dense(128, activation='sigmoid', kernel_initializer=RandomNormal(mean=0.0, stddev=0.039, seed=None)))
model.add(BatchNormalization())
model.add(Dense(64, activation='sigmoid', kernel_initializer=RandomNormal(mean=0.0, stddev=0.55, seed=None)) )
model.add(BatchNormalization())
model.add(Dense(32, activation='sigmoid', kernel_initializer=RandomNormal(mean=0.0, stddev=0.55, seed=None)) )
model.add(BatchNormalization())
model.add(Dense(output_dim, activation='softmax'))
model.compile(optimizer= "adam" , loss='categorical_crossentropy', metrics=['acc'])
I'd like to know how to improve the accuracy of the network.
Start with a minimal baseline
You have a simple network at the top of your code, but try this one as your baseline:
model = Sequential()
model.add(Embedding(max_words, embedding_dimension, input_length=x_shape))
model.add(LSTM(output_dim // 4))
model.add(Dense(output_dim, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
The intuition here is to see how much work the LSTM can do. We don't need it to output the full 30 output_dims (the number of classes), but instead a smaller set of features to base the class decision on.
Your larger networks have layers like Dense(128) fed by 100 inputs. That's 100 × 128 = 12,800 weights to learn (plus biases).
Addressing class imbalance right away
Your data may have a lot of class imbalance, so for the next step let's address that with a loss function, here called top_k_loss. This loss function makes your network train only on the training examples it is having the most trouble with; it does a great job of handling class imbalance without any other plumbing.
import tensorflow as tf

def top_k_loss(k=16):
    @tf.function
    def loss(y_true, y_pred):
        # per-example loss
        y_error_of_true = tf.keras.losses.categorical_crossentropy(y_true=y_true, y_pred=y_pred)
        # keep only the k largest losses in the batch (tf.shape handles a dynamic batch dimension)
        topk, indices = tf.math.top_k(y_error_of_true, k=tf.minimum(k, tf.shape(y_true)[0]))
        return topk
    return loss
Use this with a batch size of 128 to 512. You add it to your model compile like so:
model.compile(loss=top_k_loss(16), optimizer='adam', metrics=['accuracy'])
Now, you'll see that calling model.fit with this loss returns some disappointing numbers; that's because it only reports the loss on THE WORST 16 examples of each training batch. Recompile with your regular loss and run model.evaluate to find out how it does on the training set, and again on the test set, as sketched below.
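A minimal sketch of that recompile-and-evaluate step (assuming the x, y, x_test, y_test names used later in this answer):
# switch back to the plain loss so the reported numbers are comparable
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
train_loss, train_acc = model.evaluate(x, y)
test_loss, test_acc = model.evaluate(x_test, y_test)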
Train for 100 epochs, and at this point you should already see some good results.
Next Steps
Wrap the whole model generation and testing into a function, like so:
def run_experiment(lstm_layers=1, lstm_size=output_dim//4, dense_layers=0, dense_size=output_dim//4):
    model = Sequential()
    model.add(Embedding(max_words, embedding_dimension, input_length=x_shape))
    # stacked LSTMs return sequences, except for the last one
    for i in range(lstm_layers - 1):
        model.add(LSTM(lstm_size, return_sequences=True))
    model.add(LSTM(lstm_size))
    for i in range(dense_layers):
        model.add(Dense(dense_size, activation='tanh'))
    model.add(Dense(output_dim, activation='softmax'))
    model.compile(loss=top_k_loss(16), optimizer='adam', metrics=['accuracy'])
    model.fit(x=x, y=y, epochs=100)
    # recompile with the regular loss so evaluate reports comparable numbers
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    loss, accuracy = model.evaluate(x=x_test, y=y_test)
    return loss
This function runs a whole experiment for you. Now it is a matter of finding a better architecture by searching. One way to search is randomly, and random search is actually really good. If you want to get fancy, I recommend hyperopt. Don't bother with grid search; random usually beats it for large search spaces.
from random import randint

best_loss = 10**10
best_config = []
for trial in range(100):
    config = [
        randint(1, 4),   # lstm_layers
        randint(8, 64),  # lstm_size
        randint(0, 8),   # dense_layers
        randint(8, 64),  # dense_size
    ]
    result = run_experiment(*config)
    if result < best_loss:
        best_loss = result  # note: the original snippet never updated best_loss
        best_config = config
        print('Found a better loss', result, 'from config', config)

Machine Learning with Keras: Different Validation Loss for the Same Model

I am trying to use Keras to train a simple feedforward network. I tried two different versions of what I think is the same network, but one performs significantly better. The first, better-performing one is the following:
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(384,))
dense = layers.Dense(64, activation="relu")
x = dense(inputs)
x = layers.Dense(64, activation="relu")(x)
outputs = layers.Dense(384)(x)
model = keras.Model(inputs=inputs, outputs=outputs, name="simple_model")
model.compile(loss='mse', optimizer='Adam')
history = model.fit(X_train,
                    y_train_tf,
                    epochs=20,
                    validation_data=(X_test, y_test),
                    steps_per_epoch=100,
                    validation_steps=50)
and it settles at a validation loss of about 0.2. The second model performs much worse:
import tensorflow as tf
from tensorflow.keras.layers import Dense

model = keras.models.Sequential()
model.add(Dense(64, input_shape=(384,), activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(384, activation='relu'))
optimizer = tf.keras.optimizers.Adam()
model.compile(loss='mse', optimizer=optimizer)
history = model.fit(X_train,
                    y_train_tf,
                    epochs=20,
                    validation_data=(X_test, y_test),
                    steps_per_epoch=100,
                    validation_steps=50)
and this one has a validation loss of around 5. But when I call model.summary(), they look virtually the same. Is there something wrong with the second model?
I am not sure that they are the same, since the second model has a relu activation after the last layer (384 units) and the first doesn't. This might be the issue, since the default activation of the Keras Dense layer is None, i.e. linear.
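A minimal sketch of the fix under that reading: drop the relu from the last layer so the output is linear, matching the first model.
model = keras.models.Sequential()
model.add(Dense(64, input_shape=(384,), activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(384))  # no activation: linear output, as in the first model
model.compile(loss='mse', optimizer='Adam')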

Predicting in Stateful LSTMs

I have the following Keras model, although it could be generalised to a normal RNN using GRUs.
from keras.models import Sequential
from keras.layers import GRU, Dense
from sklearn.metrics import mean_squared_error

model = Sequential()
model.add(GRU(40, batch_input_shape=(batch_size, look_back, 1), stateful=True, return_sequences=True))
# the input shape is only needed on the first layer
model.add(GRU(10, stateful=True))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')

# Train model
n_iter = 10000
for i in range(n_iter):
    model.fit(trainX, trainY, epochs=1, batch_size=batch_size, verbose=0, shuffle=False)
    if i < (n_iter - 1):
        model.reset_states()

testPred = model.predict(testX, batch_size=batch_size)
print(mean_squared_error(testY, testPred))
If I don't include the if statement that resets the state, the mean squared error is always higher. Considering that the test set comes right after the training set, wouldn't it make sense to preserve the state of the last memory block?
This tutorial seems to suggest otherwise: http://machinelearningmastery.com/time-series-prediction-lstm-recurrent-neural-networks-python-keras/ (i.e. he simply doesn't have that if statement and doesn't explicitly mention anything about keeping the last state).
So I'm just wondering if I am correct about this.
