NAN values with SGD optimizer in Keras for regression NN

NAN values with SGD optimizer in Keras for regression NN - python

I try to train a NN for regression. When using SGD optimizer class from Keras I suddently get NAN values as prediction from my network after the first step. Before I was running trainings with the Adam optimizer class and everything worked fine. I already tried to change the learning rate of SGD but still NAN values occure as model prediction after the first step and after compiling.
Since my training worked with Adam optimizer I don't believe my inputs are causing the NAN's. I already checked my input valus for NaNs and removed all of them. So what could cause this behavior?
Here is my code:
from keras.optimizers import Adam
from keras.optimizers import SGD
model = Sequential()
model.add(Dense(300,input_shape=(50,), kernel_initializer='glorot_uniform', activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(300, kernel_initializer='glorot_uniform', activation='relu')) model.add(Dropout(0.3))
model.add(Dense(500, kernel_initializer='glorot_uniform', activation='relu')) model.add(Dropout(0.3))
model.add(Dense(400, kernel_initializer='glorot_uniform', activation='relu')) model.add(Dense(1, kernel_initializer='glorot_uniform', activation='linear'))
opt = SGD(lr=0.001, decay=1e-6)
model.compile(loss='mse', optimizer=opt)
model.fit(x_train, y_train, epochs=100, batch_size=32, verbose=0, validation_data=(x_test, y_test))
#print(type(x_train)) ='pandas.core.frame.DataFrame'>
#print( x_train.shape) = (10000 , 50)

Using ANNs for regression is a bit tricky as outputs don't have an upper bound.
The NaNs in the loss function is mostly likely because you have exploding gradients.
The reason that it does not show NaN when you use Adam is that Adam adapts the learning rate. Adam works most of the times, so avoid using SGD as long as you don't have a specific reason.
I am not sure what your dataset contains but, you can try:
Adding L2 regularization
Normalizing inputs
Increasing batch size.

Related

How to solve constant model accuracy after each epoch

I am studying deep learning and as an assignment, I am doing a classification project, which has 17k records with 14 features and a target variable that have 11 classes.
I tried to train a simple neural network
# define the keras model
model1 = keras.Sequential()
model1.add(keras.layers.Dense(64, input_dim=14, activation='relu'))
model1.add(keras.layers.Dense(128, activation='relu'))
model1.add(keras.layers.Dense(64, activation='relu'))
model1.add(keras.layers.Dense(1, activation='softmax'))
# compile the keras model
model1.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
# fit the keras model on the dataset
performance1 = model1.fit(x_train, y_train, epochs=100, validation_split=0.2)
But the problem here is I am getting the same accuracy for each epoch, it doesn't seem that model is even learning.
I tried to research this problem and found some similar problems on StackOverflow like this question and tried following things
Applied StandardScaler
Increased/Decreased the hidden layer and neurons
Added dropout layer
Changed the optimizers, loss, and activation function
I also tried to batch_size
But none of them worked, of course, the accuracy was different in the different trials (but has the same problem).
Few of trials are as follows:
# define the keras model
model1 = keras.Sequential()
model1.add(keras.layers.Dense(64, input_dim=14, activation='sigmoid'))
model1.add(keras.layers.Dense(128, activation='sigmoid'))
model1.add(keras.layers.Dense(64, activation='sigmoid'))
model1.add(keras.layers.Dense(1, activation='softmax'))
sgd = keras.optimizers.SGD(lr=0.01)
# compile the keras model
model1.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
# define the keras model
model1 = keras.Sequential()
model1.add(keras.layers.Dense(64, input_dim=14, activation='relu'))
model1.add(keras.layers.Dense(128, activation='relu'))
model1.add(keras.layers.Dropout(0.2))
model1.add(keras.layers.Dense(64, activation='relu'))
model1.add(keras.layers.Dropout(0.2))
model1.add(keras.layers.Dense(1, activation='softmax'))
# compile the keras model
model1.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
I don't know what's the problem here. Please let me know if you require more details to process this. And please don't close this question I know this question stands a chance to marked as a duplicate question but trust me I tried many things which I can understand as a beginner.

The problem is that the softmax should be applied on an output array to get probabilities and that output array from the model should represent the logits for each target class. hence you would have to change this line
model1.add(keras.layers.Dense(1, activation='softmax'))
# TO
model1.add(keras.layers.Dense(df['Class'].nunique(), activation='softmax'))
EDIT:
# Let's say you have 11 unique values in your class then you last layer will become
model1.add(keras.layers.Dense(11, activation='softmax'))
# Now your loss will be
model1.compile(loss=tf.keras.loss.SparseCategoricalCrossentropy(), optimizer='adam', metrics=[tf.keras.metrics.SparseCategoricalAccuracy()])

LSTM for 30 classes, badly overfitting, cannot go over 76% test accuracy

How to classify job descriptions into their respective industries?
I'm trying to classify text using LSTM, in particular converting job description
Into industry categories, unfortunately the things I've tried so far
Have only resulted in 76% accuracy.
What is an effective method to classify text for more than 30 classes using LSTM?
I have tried three alternatives
Model_1
Model_1 achieves test accuracy of 65%
embedding_dimension = 80
max_sequence_length = 3000
epochs = 50
batch_size = 100
model = Sequential()
model.add(Embedding(max_words, embedding_dimension, input_length=x_shape))
model.add(SpatialDropout1D(0.2))
model.add(LSTM(100, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(output_dim, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
Model_2
Model_2 achieves test accuracy of 64%
model = Sequential()
model.add(Embedding(max_words, embedding_dimension, input_length=x_shape))
model.add(LSTM(100))
model.add(Dropout(rate=0.5))
model.add(Dense(128, activation='relu', kernel_initializer='he_uniform'))
model.add(Dropout(rate=0.5))
model.add(Dense(64, activation='relu', kernel_initializer='he_uniform'))
model.add(Dropout(rate=0.5))
model.add(Dense(output_dim, activation='softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['acc'])
Model_3
Model_3 achieves test accuracy of 76%
model.add(Embedding(max_words, embedding_dimension, input_length= x_shape, trainable=False))
model.add(SpatialDropout1D(0.4))
model.add(LSTM(100, dropout=0.4, recurrent_dropout=0.4))
model.add(Dense(128, activation='sigmoid', kernel_initializer=RandomNormal(mean=0.0, stddev=0.039, seed=None)))
model.add(BatchNormalization())
model.add(Dense(64, activation='sigmoid', kernel_initializer=RandomNormal(mean=0.0, stddev=0.55, seed=None)) )
model.add(BatchNormalization())
model.add(Dense(32, activation='sigmoid', kernel_initializer=RandomNormal(mean=0.0, stddev=0.55, seed=None)) )
model.add(BatchNormalization())
model.add(Dense(output_dim, activation='softmax'))
model.compile(optimizer= "adam" , loss='categorical_crossentropy', metrics=['acc'])
I'd like to know how to improve the accuracy of the network.

Start with a minimal base line
You have a simple network at the top of your code, but try this one as your baseline
model = Sequential()
model.add(Embedding(max_words, embedding_dimension, input_length=x_shape))
model.add(LSTM(output_dim//4)),
model.add(Dense(output_dim, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
The intuition here is to see how much work LSTM can do. We don't need it to output the full 30 output_dims (the number of classes) but instead a smaller set of features base the decision of the classes on.
Your larger networks have layers like Dense(128) with 100 input. That's 100x128 = 12,800 connections to learn.
Improving imbalance right away
Your data may have a lot of imbalance so for the next step, let's address that with a loss function called the top_k_loss. This loss function will make your network only train on the training examples that it is having the most trouble on. This does a great job of handling class imbalance without any other plumbing
def top_k_loss(k=16):
#tf.function
def loss(y_true, y_pred):
y_error_of_true = tf.keras.losses.categorical_crossentropy(y_true=y_true,y_pred=y_pred)
topk, indexs = tf.math.top_k( y_error_of_true, k=tf.minimum(k, y_true.shape[0]) )
return topk
return loss
Use this with a batch size of 128 to 512. You add it to your model compile like so
model.compile(loss=top_k_loss(16), optimizer='adam', metrics=['accuracy']
Now, you'll see that using model.fit on this will return some dissipointing numbers. That's because it is only reporting THE WORST 16 out of each training batch. Recompile with your regular loss and run model.evaluate to find out how it does on the training and again on the test.
Train for 100 epochs, and at this point you should already see some good results.
Next Steps
Make the whole model generate and testing into a function like so
def run_experiment(lstm_layers=1, lstm_size=output_dim//4, dense_layers=0, dense_size=output_dim//4):
model = Sequential()
model.add(Embedding(max_words, embedding_dimension, input_length=x_shape))
for i in range(lstm_layers-1):
model.add(LSTM(lstm_size, return_sequences=True)),
model.add(LSTM(lstm_size)),
for i in range(dense_layers):
model.add(Dense(dense_size, activation='tanh'))
model.add(Dense(output_dim, activation='softmax'))
model.compile(loss=top_k_loss(16), optimizer='adam', metrics=['accuracy'])
model.fit(x=x,y=y,epochs=100)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
loss, accuracy = model.evaluate(x=x_test, y=y_test)
return loss
that can run a whole experiment for you. Now it is a matter of finding a better architecture by searching. One way to search is random. Random is actually really good. If you want to get fancy, I recommend hyperopt. Don't bother with grid search, random usually beats it for large search spaces.
best_loss = 10**10
best_config = []
for trial in range(100):
config = [
randint(1,4), # lstm layers
randint(8,64), # lstm_size
randint(0,8), # dense_layers
randint(8,64) # dense_size
]
result = run_experiment(*config)
if result < best_loss:
best_config = config
print('Found a better loss ',result,' from config ',config)

Machine Learning with Keras: Different Validation Loss for the Same Model

I am trying to use keras to train a simple feedforward network. I tried two different methods of what I think is the same network, but one is performing significantly better. The first one and the better performing one is the following:
inputs = keras.Input(shape=(384,))
dense = layers.Dense(64, activation="relu")
x = dense(inputs)
x = layers.Dense(64, activation="relu")(x)
outputs = layers.Dense(384)(x)
model = keras.Model(inputs=inputs, outputs=outputs, name="simple_model")
model.compile(loss='mse',optimizer='Adam')
history = model.fit(X_train,
y_train_tf,
epochs=20,
validation_data=(X_test, y_test),
steps_per_epoch=100,
validation_steps=50)
and it settles on a validation loss of about 0.2. The second model performs much worse:
model = keras.models.Sequential()
model.add(Dense(64, input_shape=(384,), activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(384, activation='relu'))
optimizer = tf.keras.optimizers.Adam()
model.compile(loss='mse', optimizer=optimizer)
history = model.fit(X_train,
y_train_tf,
epochs=20,
validation_data=(X_test, y_test),
steps_per_epoch=100,
validation_steps=50)
and this has validation loss of around 5. But when I do model.summary, they look virtually the same. Is there something wrong with the second model?

I am not sure that they are the same since second model has relu activation after last layer (384 units) and first doesn't. This might be the issue since default activation of the Keras dense layer is None.

How do I fix loss: nan for Keras in Python?

I'm trying to perform a regression using Keras in Python. During training, the loss, mean_squared_error, val_mean_squared_error are all nan. How can I fix this?
I've checked thoroughly, and there are no missing values (NaNs) or infinite values in my data. I've also tried gradient value clipping and gradient norm scaling, but none of them worked.
Below is my neural net model.
model = Sequential()
n_cols = train_X.shape[1]
early_stopping_monitor = EarlyStopping(patience=10)
model.add(Dense(400, activation='relu', input_shape=(n_cols,)))
model.add(Dense(400, activation='relu'))
model.add(Dense(400, activation='relu'))
model.add(Dense(400, activation='relu'))
model.add(Dense(1, activation='linear'))
model.compile(optimizer='adam', loss='mean_squared_error', metrics=['mean_squared_error'])
model.fit(train_X, train_y, validation_split=0.2, epochs=500, callbacks=[early_stopping_monitor])

Stock price predictions of keras multilayer LSTM model converge to a constant value

I've made a multilayer LSTM model that uses regression to predict next frame's values of the data. The model finishes after 20 epochs. I then get some predictions and compare them to my ground truth values. As you can see them in the picture above, predictions converge to a constant value. I don't know why this happens.
Here is my model so far:
from keras.models import Sequential
from keras.layers.core import Dense, Activation, Dropout
from keras.layers import LSTM, BatchNormalization
from tensorflow.python.keras.initializers import RandomUniform
init = RandomUniform(minval=-0.05, maxval= 0.05)
model = Sequential()
model.add(LSTM(kernel_initializer=init, activation='relu', return_sequences=True, units=800, dropout=0.5, recurrent_dropout=0.2, input_shape=(x_train.shape[1], x_train.shape[2]) ))
model.add(LSTM(kernel_initializer=init, activation='relu', return_sequences=False, units=500, dropout=0.5, recurrent_dropout=0.2 ))
model.add(Dense(1024, activation='linear', kernel_initializer=init))
model.add(BatchNormalization())
model.add(Dropout(0.5))
model.add(Dense(1, activation='linear', kernel_initializer= 'normal'))
model.compile(loss='mean_squared_error', optimizer='rmsprop' )
model.summary()
EDIT1:
I decreased epochs from 20 to 3. results are as follows:
By comparing 2 pictures, I can conclude that when the number of epochs increases, the predictions are more likely to converge to some specific value which is around -0.1.

So, after trying different number of LSTM units and different types of architectures, I realized that the current number of LSTM units causes the model to learns so slowly and 20 epochs were not sufficient for such huge model.For each layer, I changed the number of LSTM units to 64 and also removed Dense(1024)layer and increased the number of epochs from 20 to 400 and results were incredibly close to the ground truth values. I should mention that the dataset used in the new model was different from the former one because I encountered some problems with that dataset . here is the new model:
from keras.optimizers import RMSprop
from keras.initializers import glorot_uniform, glorot_normal, RandomUniform
init = glorot_normal(seed=None)
init1 = RandomUniform(minval=-0.05, maxval=0.05)
optimizer = RMSprop(lr=0.001, rho=0.9, epsilon=None, decay=0.0)
model = Sequential()
model.add(LSTM(units=64, dropout=0.2, recurrent_dropout=0.2,
input_shape=(x_train.shape[1], x_train.shape[2]),
return_sequences=True, kernel_initializer=init))
model.add(LSTM(units=64, dropout=0.2, recurrent_dropout=0.2,
return_sequences=False, kernel_initializer=init))
model.add(Dense(1, activation='linear', kernel_initializer= init1))
model.compile(loss='mean_squared_error', optimizer=optimizer )
model.summary()
you can see the predictions here:
It's still not the best model, but at least outperformed the former one.
If you have any further recommendation on how to improve it, it'll be greatly appreciated.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

NAN values with SGD optimizer in Keras for regression NN - python

Related

How to solve constant model accuracy after each epoch

LSTM for 30 classes, badly overfitting, cannot go over 76% test accuracy

Machine Learning with Keras: Different Validation Loss for the Same Model

How do I fix loss: nan for Keras in Python?

Stock price predictions of keras multilayer LSTM model converge to a constant value

Categories

Resources