Binary Classification Neural Network: both Nan loss and NaN predictions

Binary Classification Neural Network: both Nan loss and NaN predictions - python

This model tries to predict two states based on an array with 400 numbers. During the first training round the model starts with loss on the first +- 200 samples, and then goes into Nan loss. The accuracy stays around 50% and when I print the predictions for the test set, it only predicts NaN. My X_train has a shape of (1934, 400, 1) and my y_train a shape of (1934,). I already tried checking for NaNs in the dataset, but there were none.
My model looks like this:
model = Sequential()
model.add(LSTM(128, input_shape=(400,1), activation='relu', return_sequences=True))
model.add(Dropout(0,2))
model.add(LSTM(128, activation='relu'))
model.add(Dropout(0,2))
model.add(Dense(32, activation='relu'))
model.add(Dropout(0,2))
model.add(Dense(1, activation='sigmoid'))
opt = tf.keras.optimizers.Adam(lr=0.01)
# mean_squared_error = mse
model.compile(loss='binary_crossentropy',
optimizer=opt,
metrics=['accuracy'])
history = model.fit(X_train, y_train, epochs=3, validation_split = 0.1, shuffle=True, batch_size = 64)
edit: Solved by changing the activation function to tanh. Sigmoid stays sigmoid!

The problem was solved with changing the activation functions to "tanh". Seems that dropout should be 0.2 instead of 0,2 also, but this wasn't the cause of the problem.

Related

NAN values with SGD optimizer in Keras for regression NN

I try to train a NN for regression. When using SGD optimizer class from Keras I suddently get NAN values as prediction from my network after the first step. Before I was running trainings with the Adam optimizer class and everything worked fine. I already tried to change the learning rate of SGD but still NAN values occure as model prediction after the first step and after compiling.
Since my training worked with Adam optimizer I don't believe my inputs are causing the NAN's. I already checked my input valus for NaNs and removed all of them. So what could cause this behavior?
Here is my code:
from keras.optimizers import Adam
from keras.optimizers import SGD
model = Sequential()
model.add(Dense(300,input_shape=(50,), kernel_initializer='glorot_uniform', activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(300, kernel_initializer='glorot_uniform', activation='relu')) model.add(Dropout(0.3))
model.add(Dense(500, kernel_initializer='glorot_uniform', activation='relu')) model.add(Dropout(0.3))
model.add(Dense(400, kernel_initializer='glorot_uniform', activation='relu')) model.add(Dense(1, kernel_initializer='glorot_uniform', activation='linear'))
opt = SGD(lr=0.001, decay=1e-6)
model.compile(loss='mse', optimizer=opt)
model.fit(x_train, y_train, epochs=100, batch_size=32, verbose=0, validation_data=(x_test, y_test))
#print(type(x_train)) ='pandas.core.frame.DataFrame'>
#print( x_train.shape) = (10000 , 50)

Using ANNs for regression is a bit tricky as outputs don't have an upper bound.
The NaNs in the loss function is mostly likely because you have exploding gradients.
The reason that it does not show NaN when you use Adam is that Adam adapts the learning rate. Adam works most of the times, so avoid using SGD as long as you don't have a specific reason.
I am not sure what your dataset contains but, you can try:
Adding L2 regularization
Normalizing inputs
Increasing batch size.

LSTM for 30 classes, badly overfitting, cannot go over 76% test accuracy

How to classify job descriptions into their respective industries?
I'm trying to classify text using LSTM, in particular converting job description
Into industry categories, unfortunately the things I've tried so far
Have only resulted in 76% accuracy.
What is an effective method to classify text for more than 30 classes using LSTM?
I have tried three alternatives
Model_1
Model_1 achieves test accuracy of 65%
embedding_dimension = 80
max_sequence_length = 3000
epochs = 50
batch_size = 100
model = Sequential()
model.add(Embedding(max_words, embedding_dimension, input_length=x_shape))
model.add(SpatialDropout1D(0.2))
model.add(LSTM(100, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(output_dim, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
Model_2
Model_2 achieves test accuracy of 64%
model = Sequential()
model.add(Embedding(max_words, embedding_dimension, input_length=x_shape))
model.add(LSTM(100))
model.add(Dropout(rate=0.5))
model.add(Dense(128, activation='relu', kernel_initializer='he_uniform'))
model.add(Dropout(rate=0.5))
model.add(Dense(64, activation='relu', kernel_initializer='he_uniform'))
model.add(Dropout(rate=0.5))
model.add(Dense(output_dim, activation='softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['acc'])
Model_3
Model_3 achieves test accuracy of 76%
model.add(Embedding(max_words, embedding_dimension, input_length= x_shape, trainable=False))
model.add(SpatialDropout1D(0.4))
model.add(LSTM(100, dropout=0.4, recurrent_dropout=0.4))
model.add(Dense(128, activation='sigmoid', kernel_initializer=RandomNormal(mean=0.0, stddev=0.039, seed=None)))
model.add(BatchNormalization())
model.add(Dense(64, activation='sigmoid', kernel_initializer=RandomNormal(mean=0.0, stddev=0.55, seed=None)) )
model.add(BatchNormalization())
model.add(Dense(32, activation='sigmoid', kernel_initializer=RandomNormal(mean=0.0, stddev=0.55, seed=None)) )
model.add(BatchNormalization())
model.add(Dense(output_dim, activation='softmax'))
model.compile(optimizer= "adam" , loss='categorical_crossentropy', metrics=['acc'])
I'd like to know how to improve the accuracy of the network.

Start with a minimal base line
You have a simple network at the top of your code, but try this one as your baseline
model = Sequential()
model.add(Embedding(max_words, embedding_dimension, input_length=x_shape))
model.add(LSTM(output_dim//4)),
model.add(Dense(output_dim, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
The intuition here is to see how much work LSTM can do. We don't need it to output the full 30 output_dims (the number of classes) but instead a smaller set of features base the decision of the classes on.
Your larger networks have layers like Dense(128) with 100 input. That's 100x128 = 12,800 connections to learn.
Improving imbalance right away
Your data may have a lot of imbalance so for the next step, let's address that with a loss function called the top_k_loss. This loss function will make your network only train on the training examples that it is having the most trouble on. This does a great job of handling class imbalance without any other plumbing
def top_k_loss(k=16):
#tf.function
def loss(y_true, y_pred):
y_error_of_true = tf.keras.losses.categorical_crossentropy(y_true=y_true,y_pred=y_pred)
topk, indexs = tf.math.top_k( y_error_of_true, k=tf.minimum(k, y_true.shape[0]) )
return topk
return loss
Use this with a batch size of 128 to 512. You add it to your model compile like so
model.compile(loss=top_k_loss(16), optimizer='adam', metrics=['accuracy']
Now, you'll see that using model.fit on this will return some dissipointing numbers. That's because it is only reporting THE WORST 16 out of each training batch. Recompile with your regular loss and run model.evaluate to find out how it does on the training and again on the test.
Train for 100 epochs, and at this point you should already see some good results.
Next Steps
Make the whole model generate and testing into a function like so
def run_experiment(lstm_layers=1, lstm_size=output_dim//4, dense_layers=0, dense_size=output_dim//4):
model = Sequential()
model.add(Embedding(max_words, embedding_dimension, input_length=x_shape))
for i in range(lstm_layers-1):
model.add(LSTM(lstm_size, return_sequences=True)),
model.add(LSTM(lstm_size)),
for i in range(dense_layers):
model.add(Dense(dense_size, activation='tanh'))
model.add(Dense(output_dim, activation='softmax'))
model.compile(loss=top_k_loss(16), optimizer='adam', metrics=['accuracy'])
model.fit(x=x,y=y,epochs=100)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
loss, accuracy = model.evaluate(x=x_test, y=y_test)
return loss
that can run a whole experiment for you. Now it is a matter of finding a better architecture by searching. One way to search is random. Random is actually really good. If you want to get fancy, I recommend hyperopt. Don't bother with grid search, random usually beats it for large search spaces.
best_loss = 10**10
best_config = []
for trial in range(100):
config = [
randint(1,4), # lstm layers
randint(8,64), # lstm_size
randint(0,8), # dense_layers
randint(8,64) # dense_size
]
result = run_experiment(*config)
if result < best_loss:
best_config = config
print('Found a better loss ',result,' from config ',config)

Machine Learning with Keras: Different Validation Loss for the Same Model

I am trying to use keras to train a simple feedforward network. I tried two different methods of what I think is the same network, but one is performing significantly better. The first one and the better performing one is the following:
inputs = keras.Input(shape=(384,))
dense = layers.Dense(64, activation="relu")
x = dense(inputs)
x = layers.Dense(64, activation="relu")(x)
outputs = layers.Dense(384)(x)
model = keras.Model(inputs=inputs, outputs=outputs, name="simple_model")
model.compile(loss='mse',optimizer='Adam')
history = model.fit(X_train,
y_train_tf,
epochs=20,
validation_data=(X_test, y_test),
steps_per_epoch=100,
validation_steps=50)
and it settles on a validation loss of about 0.2. The second model performs much worse:
model = keras.models.Sequential()
model.add(Dense(64, input_shape=(384,), activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(384, activation='relu'))
optimizer = tf.keras.optimizers.Adam()
model.compile(loss='mse', optimizer=optimizer)
history = model.fit(X_train,
y_train_tf,
epochs=20,
validation_data=(X_test, y_test),
steps_per_epoch=100,
validation_steps=50)
and this has validation loss of around 5. But when I do model.summary, they look virtually the same. Is there something wrong with the second model?

I am not sure that they are the same since second model has relu activation after last layer (384 units) and first doesn't. This might be the issue since default activation of the Keras dense layer is None.

Why is my accuracy and loss, 0.000 and nan, in keras?

My understanding is that "sparse_categorical_crossentropy" fits my multi-classification without one-hot-encoding case. I also slowed the adam learning rate in case it is overshooting the predictions.
I am not sure what I am not understanding that I am doing incorrectly.
My input data looks similar to this:
My output prediction results are labels: [1 2 3 4 5 6 7 8 9 10] (not one-hot-encoded). Each number represents I want the network to end up choosing.
print(x_train.shape)
print(x_test.shape)
x_train = x_train.reshape(x_train.shape[0], round(x_train.shape[1]/5), 5)
x_test = x_test.reshape(x_test.shape[0], round(x_test.shape[1]/5), 5)
print(x_train.shape)
print(np.unique(y_train))
print(len(np.unique(y_train)))
input_shape = (x_train.shape[1], 5)
adam = keras.optimizers.Adam(learning_rate=0.0001)
model = Sequential()
model.add(Conv1D(512, 5, activation='relu', input_shape=input_shape))
model.add(Conv1D(512, 5, activation='relu'))
model.add(MaxPooling1D(3))
model.add(Conv1D(512, 5, activation='relu'))
model.add(Conv1D(512, 5, activation='relu'))
model.add(GlobalAveragePooling1D())
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='sparse_categorical_crossentropy', optimizer=adam, metrics=['accuracy'])
model.fit(x_train, y_train, batch_size=32, epochs=25, validation_data=(x_test, y_test))
print(model.summary())
Here is the error results:
Model Layers (if it helps):

I see two main problems in your approach
your labels are from 1 to 10... they must start from 0 in order to have them in the range 0-9. this can be achieved simply doing y_train-1 and y_test-1 (if y_test and y_train are numpy arrays)
the last layer of your network must be Dense(10, activation='softmax') where 10 is the number of class to predict and softmax is used to generate probabilities in multiclass problem
Use sparse_categorical_crossentropy is ok because you have integer encoded target

How to build a 1D CNN

I am trying to use a CNN for classification. My training data is shown in the picture below and has 9923 pieces of data with each piece containing 1k numeric values.
My current model has only around 10 percent accuracy and I am wondering if anyone knows if I am doing something wrong.
model = Sequential()
model.add(Conv1D(64,3, activation ='relu', input_shape= (1000, 1)))
model.add(MaxPooling1D(2))
model.add(Conv1D(64,3, activation ='relu'))
model.add(MaxPooling1D(pool_size=(2)))
model.add(Flatten())
model.add(Dense(64, activation='relu'))
model.add(Dense(28, activation='softmax'))
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, Y, epochs = 30, validation_split = 0.1)

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Binary Classification Neural Network: both Nan loss and NaN predictions - python

The problem was solved with changing the activation functions to "tanh". Seems that dropout should be 0.2 instead of 0,2 also, but this wasn't the cause of the problem.

Related

NAN values with SGD optimizer in Keras for regression NN

LSTM for 30 classes, badly overfitting, cannot go over 76% test accuracy

Machine Learning with Keras: Different Validation Loss for the Same Model

Why is my accuracy and loss, 0.000 and nan, in keras?

How to build a 1D CNN

Categories

Resources