I am following an example from a data science textbook and have run into an issue: I get NaN values for the loss when running a simple Keras neural network to find the optimal learning rate.
import numpy as np
import tensorflow as tf
from tensorflow import keras

# Get data and split into test/train/valid and normalize
(X_train_full, y_train_full), (X_test, y_test) = keras.datasets.mnist.load_data()
X_valid, X_train = X_train_full[:5000] / 255., X_train_full[5000:] / 255.
y_valid, y_train = y_train_full[:5000], y_train_full[5000:]
X_test = X_test / 255.
# Callback to grow the learning rate at each iteration.
# Also record learning rate and loss at each iteration.
K = keras.backend
class ExponentialLearningRate(keras.callbacks.Callback):
    def __init__(self, factor):
        self.factor = factor
        self.rates = []
        self.losses = []
    def on_batch_end(self, batch, logs):
        self.rates.append(K.get_value(self.model.optimizer.lr))
        self.losses.append(logs["loss"])
        K.set_value(self.model.optimizer.lr, self.model.optimizer.lr * self.factor)
# Define the model and compile/fit.
keras.backend.clear_session()
np.random.seed(42)
tf.random.set_seed(42)
model = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28, 28]),
    keras.layers.Dense(300, activation="relu"),
    keras.layers.Dense(100, activation="relu"),
    keras.layers.Dense(10, activation="softmax")
])
model.compile(loss="sparse_categorical_crossentropy",
              optimizer=keras.optimizers.SGD(lr=1e-3),
              metrics=["accuracy"])
expon_lr = ExponentialLearningRate(factor=1.005)
history = model.fit(X_train, y_train, epochs=1,
                    validation_data=(X_valid, y_valid),
                    callbacks=[expon_lr])
Running this gives an output of:
1719/1719 [==============================] - 6s 4ms/step - loss: nan - accuracy: 0.6030 - val_loss: nan - val_accuracy: 0.0958
Plotting the loss vs learning rate gives (top is my result, bottom is the expected result from the example I am following):
Notably, the example's loss is much noisier than mine and ranges from ~2.5 down to ~0.25. My loss only ranges from ~2.5 down to exactly 1, at which point it becomes NaN.
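(For reference, the curves were plotted from the rates and losses recorded by the callback, roughly like the following; the plotting code was not included above and matplotlib is assumed.)
import matplotlib.pyplot as plt

plt.semilogx(expon_lr.rates, expon_lr.losses)  # learning rate on a log-scaled x axis
plt.xlabel("Learning rate")
plt.ylabel("Loss")
plt.show()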
Perhaps something in Keras/TensorFlow has been updated since this example was written, but as I am new to Keras I am wondering what the issue might be here.
Your problem is the ExponentialLearningRate callback: your learning rate grows from 0.0010150751 to 5.237502, which is why your loss is exploding. Change the optimizer like this
optimizer=tf.keras.optimizers.Adam(0.001)
and remove the callback; your loss will be fine then.
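Concretely, the compile/fit calls would then look roughly like this (a sketch of the suggested change applied to the model above):
model.compile(loss="sparse_categorical_crossentropy",
              optimizer=tf.keras.optimizers.Adam(0.001),
              metrics=["accuracy"])
history = model.fit(X_train, y_train, epochs=1,
                    validation_data=(X_valid, y_valid))  # no ExponentialLearningRate callback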
I am working on a recommendation system using a deep autoencoder model. How can I define the mean absolute error (MAE) loss function and use it to measure the model's accuracy?
Here is the model:
model = deep_model(train_, layers, activation, last_activation, dropout, regularizer_encode, regularizer_decode)
model.compile(optimizer=Adam(lr=0.001), loss="mse", metrics=[])
model.summary()
Define the validation data:
data_valid = (train, validate)
hist_model = model.fit(x=train, y=train,
                       epochs=100,
                       batch_size=128,
                       validation_data=data_valid, verbose=2, shuffle=True)
You can define it yourself:
import keras.backend as K

def my_mae(y_true, y_pred):
    return K.mean(K.abs(y_pred - y_true), axis=-1)  # -1 is correct; using None gives a different result
Then do this:
model.compile(optimizer=Adam(learning_rate=1e-2), loss=my_mae)
But it is still a better idea to use the one implemented in Keras, like this:
model.compile(optimizer=Adam(learning_rate=1e-2), loss=tf.keras.losses.MeanAbsoluteError(name="mean_absolute_error"))
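If the goal is simply to report MAE alongside another loss, the built-in metric class can also be passed via metrics (a sketch against the model from the question; tf.keras.metrics.MeanAbsoluteError is the standard Keras metric):
model.compile(optimizer=Adam(lr=0.001),
              loss="mse",
              metrics=[tf.keras.metrics.MeanAbsoluteError(name="mae")])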
I think you can use the scikit-learn function here. This will return the MAE of your predictions.
I suggest fitting the model as:
model.compile(optimizer=Adam(lr=0.001), loss="mae", metrics=[])
# instead of loss="mse"
model_history = model.fit(
    X_train,  # instead of: x=train, y=train
    y_train,  # why x and y are the same as 'train'?
    epochs=100,
    batch_size=128,
    validation_data=(X_test, y_test))
After training your model, make a prediction:
prediction = model.predict(X_test)
And get the MAE by:
from sklearn.metrics import mean_absolute_error

mae_error = mean_absolute_error(y_test, prediction)
I used the RMSE to measure the model and the results were good. Below are the defined loss=masked_mse and metrics=[masked_rmse_clip] functions.
For the loss function:
def masked_mse(y_true, y_pred):
    # masked function
    mask_true = K.cast(K.not_equal(y_true, 0), K.floatx())
    # masked squared error
    masked_squared_error = K.square(mask_true * (y_true - y_pred))
    masked_mse = K.sum(masked_squared_error, axis=-1) / K.maximum(K.sum(mask_true, axis=-1), 1)
    return masked_mse
For the metric:
def masked_rmse_clip(y_true, y_pred):
    # masked function
    mask_true = K.cast(K.not_equal(y_true, 0), K.floatx())
    y_pred = K.clip(y_pred, 1, 5)
    # masked squared error
    masked_squared_error = K.square(mask_true * (y_true - y_pred))
    masked_rmse = K.sqrt(K.sum(masked_squared_error, axis=-1) / K.maximum(K.sum(mask_true, axis=-1), 1))
    return masked_rmse
The model:
model = deep_model(train, layers, activation, last_activation, dropout, regularizer_encode, regularizer_decode)
model.compile(optimizer=Adam(lr=0.001), loss=masked_mse, metrics=[masked_rmse_clip])
model.summary()
data_valid = (train, validate)
hist_model = model.fit(x=train, y=train,
                       epochs=100,
                       batch_size=128,
                       validation_data=data_valid, verbose=2, shuffle=True)
I get this output after 100 epochs:
Epoch 100/100
48/48 - 6s - loss: 0.9418 - masked_rmse_clip: 0.8024 - val_loss: 0.9853 - val_masked_rmse_clip: 0.8010
I want something like this for the MAE, so I need help with the loss and metric functions for MAE.
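A masked MAE can be written following exactly the same pattern as the masked MSE/RMSE above, just replacing the squared error with an absolute error. Here is a sketch (not from the original post; masked_mae and masked_mae_clip are illustrative names):
import keras.backend as K

def masked_mae(y_true, y_pred):
    # ignore entries where the true rating is 0 (i.e. missing)
    mask_true = K.cast(K.not_equal(y_true, 0), K.floatx())
    masked_abs_error = K.abs(mask_true * (y_true - y_pred))
    return K.sum(masked_abs_error, axis=-1) / K.maximum(K.sum(mask_true, axis=-1), 1)

def masked_mae_clip(y_true, y_pred):
    # same idea as masked_rmse_clip: clip predictions to the valid rating range [1, 5]
    mask_true = K.cast(K.not_equal(y_true, 0), K.floatx())
    y_pred = K.clip(y_pred, 1, 5)
    masked_abs_error = K.abs(mask_true * (y_true - y_pred))
    return K.sum(masked_abs_error, axis=-1) / K.maximum(K.sum(mask_true, axis=-1), 1)

model.compile(optimizer=Adam(lr=0.001), loss=masked_mae, metrics=[masked_mae_clip])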
I am currently working on a simple classification problem using a feature extractor and a classifier. For the feature extractor, I am using the pretrained ResNet50 model found in tf.keras.applications in TF 1.15.2. I am doing a two-stage training where I train the feature extractor in the first stage (no issues here) and train the classifier in the second stage with the trained feature extractor frozen (issues here). I am using a batch size of 128 and a learning rate of 0.3 for training the classifier. I have faced issues with the performance on the actual validation set, which is why I am checking the performance on the training set to better understand what the issue with my code is.
Here is my code to load the dataset
def make_generator(images, labels):
    def _generator():
        for image, label in zip(images, labels):
            yield image, np.array(label)
    return _generator

def create_dataset(dataset_dict, shuffle=False):
    images = []
    labels = []
    buffer_size = 0
    for k, v in dataset_dict.items():
        images += v
        labels += [k for _ in v]
        buffer_size += len(v)
    dataset = tf.data.Dataset.from_generator(
        make_generator(images, labels),
        (tf.float32, tf.uint8),
        output_shapes=(tf.TensorShape([HEIGHT, WIDTH, DEPTH]), tf.TensorShape([])))
    if shuffle:
        dataset = dataset.shuffle(buffer_size, reshuffle_each_iteration=True)
    dataset = dataset.batch(batch_size, drop_remainder=True) \
                     .prefetch(10)
    return dataset
where dataset_dict is a dictionary that contains {label: <preprocessed images from class>}
train_dataset = create_dataset(base_train_dict, shuffle=True)
Create model
def get_feature_extractor(pretrained=True):
    if pretrained:
        weights = 'imagenet'
    else:
        weights = None
    feat_extractor = tf.keras.applications.ResNet50(
        input_shape=(HEIGHT, WIDTH, DEPTH),
        include_top=False,
        layers=tf.keras.layers,
        pooling='avg',
        weights=weights)
    for layer in feat_extractor.layers[:25]:
        layer.trainable = False
    return feat_extractor
model_input = tf.keras.Input(shape=(HEIGHT, WIDTH, DEPTH), name='input')
feat_extractor = get_feature_extractor()
.
.
.
# feat_extractor is trained in first stage
output = feat_extractor(model.inputs)
output = Dense(num_classes, 'softmax')(output)
model = tf.keras.Model(model.inputs, output)
# freeze layers before classification layer
for layer in model.layers[:-1]:
    layer.trainable = False
Define metrics
loss = 'sparse_categorical_crossentropy'
if optimizer == 'SGD':
    opt = SGD(lr=learning_rate, momentum=momentum, clipnorm=5)
else:
    opt = Adam(lr=learning_rate, decay=weight_decay, clipnorm=5)

def sparse_top_3_categorical_accuracy(y_true, y_pred):
    return sparse_top_k_categorical_accuracy(y_true, y_pred, k=3)
metrics = ['sparse_categorical_accuracy', sparse_top_3_categorical_accuracy]
model.compile(loss=loss, optimizer=opt, metrics=metrics)
def scheduler(epoch, restart=50):
    if epoch % restart == 0:
        return learning_rate
    else:
        return learning_rate * (0.999 ** (epoch % restart))
monitor = 'val_sparse_categorical_accuracy'
callbacks = []
callbacks.append(
ModelCheckpoint('output/local_base_class.h5', monitor=monitor,
save_best_only=True,
mode='max'))
# callbacks.append(CustomTensorBoardCallback(log_dir=tensorboard_dir))
# callbacks.append(ReduceLROnPlateau(monitor='sparse_categorical_accuracy', patience=8, verbose=1))
callbacks.append(TerminateOnNaN())
callbacks.append(CSVLogger(f'classifier_base_{datetime.now()}.log'))
callbacks.append(LearningRateScheduler(scheduler))
epochs = 10
logging.info("Starting classifier training")
history = model.fit(train_dataset,
epochs=epochs,
validation_data=train_dataset,
callbacks=callbacks,
verbose=2)
I am facing the peculiar issue of obtaining drastically different results from my model.fit() call and from model.evaluate() on the same dataset, and I suspect it has to do with the BatchNormalization layers found in the ResNet50.
Epoch 10/10
82/82 - 22s - loss: 5.5539 - sparse_categorical_accuracy: 0.5319 - sparse_top_3_categorical_accuracy: 0.6624 - val_loss: 74.7229 - val_sparse_categorical_accuracy: 0.0033 - val_sparse_top_3_categorical_accuracy: 0.0109
I then ran an evaluation:
model.evaluate(train_dataset)
82/82 [==============================] - 10s 125ms/step - loss: 74.7229 - sparse_categorical_accuracy: 0.0033 - sparse_top_3_categorical_accuracy: 0.0109
So I attempted a fix by doing the following:
from tensorflow.keras import backend as K
from tensorflow.keras.models import load_model
dependencies = {
'sparse_top_3_categorical_accuracy': sparse_top_3_categorical_accuracy
}
K.clear_session()
K.set_learning_phase(1)
model = load_model('tmp.h5', custom_objects=dependencies)
But I obtained the same result
model.evaluate(train_dataset)
82/82 [==============================] - 11s 135ms/step - loss: 74.7229 - sparse_categorical_accuracy: 0.0033 - sparse_top_3_categorical_accuracy: 0.0109
How do I fix this issue so that model.evaluate() on the same dataset gives roughly the same level of performance as reported during model.fit()?
Essence of this question:
I'd like to find a proper way to calculate the F-score for the validation and training data after each epoch (not batch-wise).
For a binary classification task, I'd like to calculate the F-score after each epoch using a simple Keras model. But how best to calculate the F-score seems to be quite a discussion.
I know Keras works in batches, and one way to calculate the F-score for each batch would be https://stackoverflow.com/a/45305384/10053244 (F-score calculation: f1).
The batch-wise calculation can be quite confusing, and I prefer to calculate the F-score after each epoch. So just calling history.history['f1'] or history.history['val_f1'] does not do the trick, because it shows the batch-wise F-scores.
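To see why the batch-wise values cannot simply be averaged, here is a small made-up example (arbitrary labels, two "batches" of four samples): the mean of the per-batch F-scores differs from the F-score computed once over the whole epoch.
import numpy as np
from sklearn.metrics import f1_score

y_true = np.array([1, 1, 0, 0, 1, 1, 1, 0])
y_pred = np.array([1, 0, 0, 1, 1, 1, 1, 0])

f1_batch1 = f1_score(y_true[:4], y_pred[:4])   # 0.5
f1_batch2 = f1_score(y_true[4:], y_pred[4:])   # 1.0
print((f1_batch1 + f1_batch2) / 2)             # 0.75 -- mean of batch-wise F-scores
print(f1_score(y_true, y_pred))                # 0.80 -- F-score over the whole epoch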
I figured one way is to save each model using the ModelCheckpoint callback (from keras.callbacks import ModelCheckpoint):
1. Save the model weights after every epoch
2. Reload the model and use model.evaluate or model.predict
Edit:
Using the TensorFlow backend, I decided to track TruePositives, FalsePositives and FalseNegatives (as umbreon29 suggested).
But now comes the fun part: The results when reloading the model are different for the training data (TP, FP, FN are different) but not for the validation set!
So a simple model, storing the weights to rebuild each model and recalculate the TP, FN, FP (and finally the F-score), looks like:
from keras.metrics import TruePositives, TrueNegatives, FalseNegatives, FalsePositives

## simple keras model
sequence_input = Input(shape=(input_dim,), dtype='float32')
preds = Dense(1, activation='sigmoid', name='output')(sequence_input)
model = Model(sequence_input, preds)
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=[TruePositives(name='true_positives'),
                       TrueNegatives(name='true_negatives'),
                       FalseNegatives(name='false_negatives'),
                       FalsePositives(name='false_positives'),
                       f1])

# model checkpoints
filepath = "weights-improvement-{epoch:02d}-{val_f1:.2f}.hdf5"
checkpoint = ModelCheckpoint(os.path.join(savemodel, filepath), monitor='val_f1', verbose=1,
                             save_best_only=False, save_weights_only=True, mode='auto')
callbacks_list = [checkpoint]

history = model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=epoch, batch_size=batch,
                    callbacks=callbacks_list)
## Saving TP, FN, FP to calculate Fscore
tp.append(history.history['true_positives'])
fp.append(history.history['false_positives'])
fn.append(history.history['false_negatives'])
arr_train = np.stack((tp, fp, fn), axis=1)
## doing the same for tp_val, fp_val, fn_val
[...]
arr_val = np.stack((tp_val, fp_val, fn_val), axis=1)
## the following method just shows batch-wise F-scores and shouldn't be used:
## f1_sc.append(history.history['f1'])
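The epoch-wise F-score can then be recovered from these accumulated counts as F1 = 2*TP / (2*TP + FP + FN); a small helper (hypothetical, not part of the original code) applied to the lists above:
import numpy as np

def f1_from_counts(tp, fp, fn):
    # F1 = 2*TP / (2*TP + FP + FN), applied element-wise (one value per stored epoch)
    tp, fp, fn = (np.asarray(a, dtype=float) for a in (tp, fp, fn))
    return 2 * tp / (2 * tp + fp + fn)

f1_train = f1_from_counts(tp, fp, fn)
f1_val = f1_from_counts(tp_val, fp_val, fn_val)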
Reloading the model after each epoch to calculate the F-scores (using model.predict with the sklearn metric from sklearn.metrics import f1_score is equivalent to calculating the F-score from TP, FP, FN):
Fscore_val = []
fscorepredict_val_sklearn = []
Fscore_train = []
fscorepredict_train = []
## model_loads contains list of model-paths
for i in model_loads:
    ## rebuilding the model each time since only weights are stored
    sequence_input = Input(shape=(input_dim,), dtype='float32')
    preds = Dense(1, activation='sigmoid', name='output')(sequence_input)
    model = Model(sequence_input, preds)
    model.load_weights(i)

    # Compile model (required to make predictions)
    model.compile(loss='binary_crossentropy',
                  optimizer='adam',
                  metrics=[TruePositives(name='true_positives'),
                           TrueNegatives(name='true_negatives'),
                           FalseNegatives(name='false_negatives'),
                           FalsePositives(name='false_positives'),
                           f1])

    ### For the validation data
    ## using evaluate
    y_pred = model.evaluate(x_val, y_val, verbose=0)
    Fscore_val.append(y_pred)  ## contains (loss, tp, fp, fn, f1-batchwise)
    ## using predict
    y_pred = model.predict(x_val)
    val_preds = [1 if x > 0.5 else 0 for x in y_pred]
    cm = f1_score(y_val, val_preds)
    fscorepredict_val_sklearn.append(cm)  ## equivalent to the F-score calculated from Fscore_val's tp, fp, fn

    ### For the training data
    y_pred = model.evaluate(x_train, y_train, verbose=0)
    Fscore_train.append(y_pred)  ## also contains (loss, tp, fp, fn, f1-batchwise)
    y_pred = model.predict(x_train, verbose=0)  # gives probabilities
    train_preds = [1 if x > 0.5 else 0 for x in y_pred]
    cm = f1_score(y_train, train_preds)
    fscorepredict_train.append(cm)
Calculating the F-score from Fscore_val's tp, fn, and fp and comparing it to fscorepredict_val_sklearn: the two are equivalent, and both are identical to calculating it from arr_val.
However, the numbers of tp, fn, and fp are different when comparing Fscore_train and arr_train. Therefore, I also arrive at different F-scores. The numbers of tp, fn, fp should be the same, but they aren't. Is this a bug?
Which one should I trust? The fscorepredict_train values actually seem more trustworthy, since they start above the "always guessing class 1" F-score (where recall = 1): fscorepredict_train[0] = 0.6784 vs f_hist[0] = 0.5736 vs always-guessing-class-1 F-score = 0.6751.
[Note: Fscore_train[0] = [0.6853608025386962, 2220.0, 250.0, 111.0, 1993.0, 0.6730511784553528] (loss, tp, tn, fn, fp, batch-wise f1), leading to an F-score of 0.6784, so the F-score from Fscore_train equals fscorepredict_train.]
I provide a custom callback that computes the score (in your case F1 from sklearn) on ALL the data at the end of the epoch (for train and, optionally, validation):
import numpy as np
import tensorflow as tf
from sklearn.metrics import f1_score

class F1History(tf.keras.callbacks.Callback):

    def __init__(self, train, validation=None):
        super(F1History, self).__init__()
        self.validation = validation
        self.train = train

    def on_epoch_end(self, epoch, logs={}):
        logs['F1_score_train'] = float('-inf')
        X_train, y_train = self.train[0], self.train[1]
        y_pred = (self.model.predict(X_train).ravel() > 0.5) + 0
        score = f1_score(y_train, y_pred)
        if self.validation:
            logs['F1_score_val'] = float('-inf')
            X_valid, y_valid = self.validation[0], self.validation[1]
            y_val_pred = (self.model.predict(X_valid).ravel() > 0.5) + 0
            val_score = f1_score(y_valid, y_val_pred)
            logs['F1_score_train'] = np.round(score, 5)
            logs['F1_score_val'] = np.round(val_score, 5)
        else:
            logs['F1_score_train'] = np.round(score, 5)
here a dummy example:
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model
from tensorflow.keras.callbacks import EarlyStopping

x_train = np.random.uniform(0, 1, (30, 10))
y_train = np.random.randint(0, 2, 30)
x_val = np.random.uniform(0, 1, (20, 10))
y_val = np.random.randint(0, 2, 20)

sequence_input = Input(shape=(10,), dtype='float32')
preds = Dense(1, activation='sigmoid', name='output')(sequence_input)
model = Model(sequence_input, preds)

es = EarlyStopping(patience=3, verbose=1, min_delta=0.001, monitor='F1_score_val', mode='max', restore_best_weights=True)
model.compile(loss='binary_crossentropy', optimizer='adam')
model.fit(x_train, y_train, epochs=10,
          callbacks=[F1History(train=(x_train, y_train), validation=(x_val, y_val)), es])
The output prints:
Epoch 1/10
1/1 [==============================] - 0s 78ms/step - loss: 0.7453 - F1_score_train: 0.3478 - F1_score_val: 0.4762
Epoch 2/10
1/1 [==============================] - 0s 57ms/step - loss: 0.7448 - F1_score_train: 0.3478 - F1_score_val: 0.4762
Epoch 3/10
1/1 [==============================] - 0s 58ms/step - loss: 0.7444 - F1_score_train: 0.3478 - F1_score_val: 0.4762
Epoch 4/10
1/1 [==============================] - ETA: 0s - loss: 0.7439Restoring model weights from the end of the best epoch.
1/1 [==============================] - 0s 70ms/step - loss: 0.7439 - F1_score_train: 0.3478 - F1_score_val: 0.4762
I have TF 2.2 and it works without problems. I hope this helps.
I am trying the training and evaluation example on the TensorFlow website.
Specifically, this part:
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train.reshape(60000, 784).astype('float32') / 255
x_test = x_test.reshape(10000, 784).astype('float32') / 255
y_train = y_train.astype('float32')
y_test = y_test.astype('float32')
def get_uncompiled_model():
    inputs = keras.Input(shape=(784,), name='digits')
    x = layers.Dense(64, activation='relu', name='dense_1')(inputs)
    x = layers.BatchNormalization()(x)
    x = layers.Dense(64, activation='relu', name='dense_2')(x)
    outputs = layers.Dense(10, activation='softmax', name='predictions')(x)
    model = keras.Model(inputs=inputs, outputs=outputs)
    return model

def get_compiled_model():
    model = get_uncompiled_model()
    model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=1e-3),
                  loss='sparse_categorical_crossentropy',
                  metrics=['sparse_categorical_accuracy'])
    return model
sample_weight = np.ones(shape=(len(y_train),))
sample_weight[y_train == 5] = 2.
# Create a Dataset that includes sample weights
# (3rd element in the return tuple).
train_dataset = tf.data.Dataset.from_tensor_slices(
(x_train, y_train, sample_weight))
# Shuffle and slice the dataset.
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(64)
model = get_compiled_model()
model.fit(train_dataset, epochs=3)
It appears that if I add the batch normalization layer (this line: x = layers.BatchNormalization()(x)) I get the following error:
InvalidArgumentError: The second input must be a scalar, but it has shape [64]
[[{{node batch_normalization_2/cond/ReadVariableOp/Switch}}]]
Any ideas?
The same code works for me.
The only lines I changed are:
model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=1e-3)
to
model.compile(optimizer=keras.optimizers.RMSprop(lr=1e-3)
(which is version specific)
Then
model.fit(train_dataset, epochs=3) to model.fit(train_dataset, epochs=3, steps_per_epoch=30)
Reason: when using iterators as input to a model, you should specify the steps_per_epoch argument.
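Putting both changes together, the modified lines look roughly like this (a sketch of the suggested changes; the rest of the example stays the same):
model.compile(optimizer=keras.optimizers.RMSprop(lr=1e-3),
              loss='sparse_categorical_crossentropy',
              metrics=['sparse_categorical_accuracy'])
model.fit(train_dataset, epochs=3, steps_per_epoch=30)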
If you just want to use sample weights, you don't have to use tf.data.Dataset, you can simply run:
model.fit(x=x_train, y=y_train, sample_weight=sample_weight, batch_size=64, epochs=3)
and it works for me (when I change learning_rate to lr as #ASHu2 mentioned).
It gets 97% accuracy after 3 epochs:
...
57408/60000 [===========================>..] - ETA: 0s - loss: 0.1010 - sparse_categorical_accuracy: 0.9709
58816/60000 [============================>.] - ETA: 0s - loss: 0.1011 - sparse_categorical_accuracy: 0.9708
60000/60000 [==============================] - 2s 37us/sample - loss: 0.1007 - sparse_categorical_accuracy: 0.9709
I used TF 1.14.0 on Windows.
The problem was solved when I updated tensorflow from version 1.14.1 to 2.0.0-rc1.
I'm currently using a Naive Bayes algorithm to do my text classification.
My end goal is to be able to highlight parts of a big text document if the algorithm has decided the sentence belonged to a category.
Naive Bayes results are good, but I would like to train a NN for this problem, so I've followed this tutorial:
http://machinelearningmastery.com/sequence-classification-lstm-recurrent-neural-networks-python-keras/ to build my LSTM network on Keras.
All these notions are quite difficult for me to understand right now, so excuse me if you see some really stupid things in my code.
1/ Preparation of the training data
I have 155 sentences of different sizes that have each been tagged with a label.
All these tagged sentences are in a training.csv file:
8,9,1,2,3,4,5,6,7
16,15,4,6,10,11,12,13,14
17,18
22,19,20,21
24,20,21,23
(each integer representing a word)
And all the labels are in another labels.csv file:
6,7,17,15,16,18,4,27,30,30,29,14,16,20,21 ...
I have 155 lines in training.csv, and of course 155 integers in labels.csv.
My dictionary has 1038 words.
2/ The code
Here is my current code:
total_words = 1039
## fix random seed for reproducibility
numpy.random.seed(7)
datafile = open('training.csv', 'r')
datareader = csv.reader(datafile)
data = []
for row in datareader:
    data.append(row)
X = data
Y = numpy.genfromtxt("labels.csv", dtype="int", delimiter=",")
max_sentence_length = 500
X_train = sequence.pad_sequences(X, maxlen=max_sentence_length)
X_test = sequence.pad_sequences(X, maxlen=max_sentence_length)
# create the model
embedding_vecor_length = 32
model = Sequential()
model.add(Embedding(total_words, embedding_vecor_length, input_length=max_sentence_length))
model.add(LSTM(100, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.summary())
model.fit(X_train, Y, epochs=3, batch_size=64)
# Final evaluation of the model
scores = model.evaluate(X_train, Y, verbose=0)
print("Accuracy: %.2f%%" % (scores[1]*100))
This model never converges:
155/155 [==============================] - 4s - loss: 0.5694 - acc: 0.0000e+00
Epoch 2/3
155/155 [==============================] - 3s - loss: -0.2561 - acc: 0.0000e+00
Epoch 3/3
155/155 [==============================] - 3s - loss: -1.7268 - acc: 0.0000e+00
I would like to have one of the 24 labels as a result, or a list of probabilities for each label.
What am I doing wrong here?
Thanks for your help!
I've updated my code thanks to the great comments posted to my question.
Y_train = numpy.genfromtxt("labels.csv", dtype="int", delimiter=",")
Y_test = numpy.genfromtxt("labels_test.csv", dtype="int", delimiter=",")
Y_train = np_utils.to_categorical(Y_train)
Y_test = np_utils.to_categorical(Y_test)
max_review_length = 50
X_train = sequence.pad_sequences(X_train, maxlen=max_review_length)
X_test = sequence.pad_sequences(X_test, maxlen=max_review_length)
model = Sequential()
model.add(Embedding(top_words, 32, input_length=max_review_length))
model.add(LSTM(10, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(31, activation="softmax"))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=["accuracy"])
model.fit(X_train, Y_train, epochs=100, batch_size=30)
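With the softmax output above, each prediction is a vector of per-class probabilities, and the predicted label is simply its argmax. A small sketch using the variables from this snippet (probs and predicted_labels are just illustrative names):
probs = model.predict(X_test)             # per-class probabilities, shape (n_samples, 31)
predicted_labels = probs.argmax(axis=-1)  # one label per padded sentence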
I think I can play with LSTM size (10 or 100), number of epochs and batch size.
The model has very poor accuracy (40%). But currently I think it's because I don't have enough data (150 sentences for 24 labels).
I will put this project in standby mode until I get more data.
If someone has some ideas to improve this code, feel free to comment!