Why loss is NaN - python

I am learning about DNN and transformers and I have the following problem. Running the model on IMDB dataset this transformer NN works just fine. But when I try and run it on another dataset which instead of 2 classes has 4, the loss goes to NaN and the accuracy quickly goes to 0. Please if you are kind to help me. I am struggling for 2 days with the same problem.
I've tried clipping gradients, increasing batch size, increasing dropout, reducing lr.
If you are kind to help me I will be forever grateful.
embed_dim = 32 # Embedding size for each token
num_heads = 2 # Number of attention heads
ff_dim = 32 # Hidden layer size in feed forward network inside transformer
inputs = layers.Input(shape=(maxlen,))
embedding_layer = TokenAndPositionEmbedding(maxlen, vocab_size, embed_dim)
x = embedding_layer(inputs)
transformer_block = TransformerBlock(embed_dim, num_heads, ff_dim)
x = transformer_block(x)
# x = transformer_block(x)
x = layers.GlobalAveragePooling1D()(x)
x = layers.Dropout(0.5)(x)
x = layers.Dense(20, activation="relu")(x)
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(4, activation="softmax")(x)
model = keras.Model(inputs=inputs, outputs=outputs)
opt = tf.keras.optimizers.Adam(learning_rate=3e-04,clipvalue=0.5)
model.compile(optimizer = opt, loss="sparse_categorical_crossentropy", metrics=["accuracy"])
history = model.fit(
x_train, y_train, batch_size=128, epochs=10, validation_data=(x_val, y_val)
)

I was running on GPU and the error did not appeared. It was because I was having labels [1,2,4,5] and I should have labels between 0 and 4.

Related

Neural network written with keras outputs only 0

my neural network written in keras, for the problem of binary image classification, after selecting hyperparameters using the keras tuner, produces only zeros.
import keras_tuner
from kerastuner import BayesianOptimization
from keras_tuner import Objective
from tensorflow.keras.models import Model
from tensorflow.keras.applications import Xception
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import Flatten
from torch.nn.modules import activation
def build_model(hp):
# create the base pre-trained model
base_model = Xception(include_top=False, input_shape=(224, 224, 3))
base_model.trainable = False
x = base_model.output
x = Flatten()(x)
hp_units = hp.Int('units', min_value=32, max_value=4096, step=32)
x = Dense(units = hp_units, activation="relu")(x)
hp_rate = hp.Float('rate', min_value = 0.01, max_value=0.9, step=0.01)
x = Dropout(rate = hp_rate)(x)
predictions = Dense(1, activation='sigmoid')(x)
# this is the model we will train
model = Model(inputs=base_model.input, outputs=predictions)
hp_learning_rate = hp.Float('learning_rate', max_value = 1e-2, min_value = 1e-7, step = 0.0005)
optimizer = hp.Choice('optimizer', ['adam', 'sgd', 'adagrad', 'rmsprop'])
model.compile(optimizer,
loss=tf.keras.losses.BinaryCrossentropy(),
metrics=['accuracy'])
return model
stop_early = keras.callbacks.EarlyStopping(monitor='val_loss', patience=5, min_delta= 0.0001)
tuner = BayesianOptimization(
hypermodel = build_model,
objective = Objective(name="val_accuracy",direction="max"),
max_trials = 10,
directory='/content/best_model_s',
overwrite=False
)
tuner.search(train_batches,
validation_data = valid_batches,
epochs = 100,
callbacks=[stop_early]
)
best_hps=tuner.get_best_hyperparameters(num_trials=1)[0]
model = tuner.hypermodel.build(best_hps)
history = model.fit(train_batches, validation_data = valid_batches ,epochs=50)
val_acc_per_epoch = history.history['val_accuracy']
best_epoch = val_acc_per_epoch.index(max(val_acc_per_epoch)) + 1
print('Best epoch: %d' % (best_epoch,))
best_model = tuner.hypermodel.build(best_hps)
# Retrain the model
best_model.fit(train_batches, validation_data = valid_batches , epochs=best_epoch)
test_generator.reset()
predict = best_model.predict_generator(test_generator, steps = len(test_generator.filenames))
I'm guessing that maybe the problem is that the ImageDataGenerator is fed to train with 2 batches of 16 images each, and to test the ImageDataGenerator with 2 batches of 4 images (each batch has an equal number of class representatives).I also noticed that with a small number of epochs, the neural network produces values ​​from 0 to 1, but the more epochs, the closer the response of the neural network is to zero. For a solution, I tried to stop training as soon as the next 5 iterations do not decrease the loss on validation. Again, it seems to me that the matter is in the validation sample, it is very small.
Any advice?

What is the problem with this SGD loss graph?

I've been trying to train audio classification model. When i used SGD with learning_rate=0.01, momentum=0.0 and nesterov=False i get the following Loss and Accuracy graphs:
I can't figure out what what causes the instant decrease in loss at around epoch 750. I tried different learning rates, momentum values and their combinations, different batch sizes, initial layer weights etc. to get more appropriate graph but no luck at all. So if you have any knowledge about what causes this please let me know.
Code i used for this training is below:
# MFCCs Model
x = tf.keras.layers.Dense(units=512, activation="sigmoid")(mfcc_inputs)
x = tf.keras.layers.Dropout(0.5)(x)
x = tf.keras.layers.Dense(units=256, activation="sigmoid")(x)
x = tf.keras.layers.Dropout(0.5)(x)
# Spectrograms Model
y = tf.keras.layers.Conv2D(32, kernel_size=(3,3), strides=(2,2))(spec_inputs)
y = tf.keras.layers.AveragePooling2D(pool_size=(2,2), strides=(2,2))(y)
y = tf.keras.layers.BatchNormalization()(y)
y = tf.keras.layers.Activation("sigmoid")(y)
y = tf.keras.layers.Conv2D(64, kernel_size=(3,3), strides=(1,1), padding="same")(y)
y = tf.keras.layers.AveragePooling2D(pool_size=(2,2), strides=(2,2))(y)
y = tf.keras.layers.BatchNormalization()(y)
y = tf.keras.layers.Activation("sigmoid")(y)
y = tf.keras.layers.Conv2D(64, kernel_size=(3,3), strides=(1,1), padding="same")(y)
y = tf.keras.layers.AveragePooling2D(pool_size=(2,2), strides=(2,2))(y)
y = tf.keras.layers.BatchNormalization()(y)
y = tf.keras.layers.Activation("sigmoid")(y)
y = tf.keras.layers.Flatten()(y)
y = tf.keras.layers.Dense(units=256, activation="sigmoid")(y)
y = tf.keras.layers.Dropout(0.5)(y)
# Chroma Model
t = tf.keras.layers.Dense(units=512, activation="sigmoid")(chroma_inputs)
t = tf.keras.layers.Dropout(0.5)(t)
t = tf.keras.layers.Dense(units=256, activation="sigmoid")(t)
t = tf.keras.layers.Dropout(0.5)(t)
# Merge Models
concated = tf.keras.layers.concatenate([x, y, t])
# Dense and Output Layers
z = tf.keras.layers.Dense(64, activation="sigmoid")(concated)
z = tf.keras.layers.Dropout(0.5)(z)
z = tf.keras.layers.Dense(64, activation="sigmoid")(z)
z = tf.keras.layers.Dropout(0.5)(z)
z = tf.keras.layers.Dense(1, activation="sigmoid")(z)
mdl = tf.keras.Model(inputs=[mfcc_inputs, spec_inputs, chroma_inputs], outputs=z)
mdl.compile(optimizer=SGD(), loss="binary_crossentropy", metrics=["accuracy"])
mdl.fit([M_train, X_train, C_train], y_train, batch_size=8, epochs=1000, validation_data=([M_val, X_val, C_val], y_val), callbacks=[tensorboard_cb])
I'm not too sure myself, but as Frightera said, sigmoid activations in hidden layers can cause trouble since it is more sensitive to weight initialization, and if the weights aren't perfectly set, it can cause gradients to be very small. Perhaps the model eventually deals with the small sigmoid gradients and loss finally decreases around epoch 750, but just my hypothesis. If ReLU doesn't work, try using LeakyReLU since it doesn't have the dead neuron effect that ReLU does.

I am trying to combine LSTM and CNN in a multi-input model, but a shape error occurs

I am implementing a multi-input model while reading a book. This model combines CNN and LSTM.
The input of CNN and the input of LSTM are different. So, of course, the shapes must be different. Is there a way to fit the shapes without affecting the data values ​​??
Or is it that the input shape must be the same for input?
I don't know how to combine. Any help would be appreciated.
CNN MODEL
with K.tf_ops.device('/device:GPU:0'):
input_tensor = Input(shape=(64,64,1))
x= layers.Conv2D(64, kernel_size=(3,3),activation='relu')(input_tensor)
x= layers.Conv2D(128 ,kernel_size=(3,3),activation='relu')(x)
x= layers.MaxPooling2D(2)(x)
x= layers.Dropout(0.5)(x)
x= layers.Conv2D(256 ,kernel_size=(3,3),activation='relu')(x)
x= layers.MaxPooling2D(2)(x)
x= layers.Dropout(0.5)(x)
x= layers.Conv2D(512 ,kernel_size=(3,3),activation='relu')(x)
x= layers.MaxPooling2D(2)(x)
x= layers.Dropout(0.5)(x)
x= layers.Flatten()(x)
x= layers.Dropout(0.5)(x)
x= layers.Dense(512,activation='relu')(x)
x= layers.Dropout(0.5)(x)
last_1= layers.Dense(20,activation='softmax')(x)
max_word = 33
max_len =107
nb_classes = 20
LSTM
with K.tf_ops.device('/device:GPU:0'):
input_tensor_LSTM = Input(shape=(None,),dtype='int32',name='permission')
x_2=layers.Embedding(max_word,64,input_length=max_len)(input_tensor_LSTM)
x_2=layers.LSTM(60,return_sequences= True)(x_2)
x_2=layers.GlobalMaxPool1D()(x_2)
x_2= layers.Dropout(0.5)(x_2)
x_2= layers.Dense(50,activation='relu')(x_2)
x_2= layers.Dropout(0.5)(x_2)
last_2= layers.Dense(20,activation='softmax')(x_2)
merged_model = concatenate([last_1, last_2])
last = layers.Dense(20,activation='softmax')(merged_model)
model = Model([input_tensor,input_tensor_LSTM],last)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model_dir = './model'
if not os.path.exists(model_dir):
os.mkdir(model_dir)
model_path = model_dir + "/concatenate.model"
checkpoint = ModelCheckpoint(filepath=model_path, monitor="val_loss", verbose=1, save_best_only=True)
early_stopping = EarlyStopping(monitor='val_loss', patience=20)
history = model.fit([X_train_CNN,X_train], y_train, batch_size=128, epochs=100, validation_split=0.2,
callbacks=[checkpoint, early_stopping])
The code shown at the top is the model I implemented, and when I use the fit function, the following error appears.
-> Error when checking target: expected dense_89 to have 2 dimensions, but got array with shape (3686, 20, 2)

How to deactivate a dropout layer called with training=True in a Keras model?

I wish to view the final output of training a tf.keras model. In this case it would be an array of predictions from the softmax function, e.g. [0,0,0,1,0,1].
Other threads on here have suggested using model.predict(training_data), but this won't work for my situation since I am using dropout at training and validation, so neurons are randomly dropped and predicting again with the same data will give a different result.
def get_model():
inputs = tf.keras.layers.Input(shape=(input_dims,))
x = tf.keras.layers.Dropout(rate=dropout_rate)(inputs, training=True)
x = tf.keras.layers.Dense(units=29, activation='relu')(x)
x = tf.keras.layers.Dropout(rate=dropout_rate)(x, training=True)
x = tf.keras.layers.Dense(units=15, activation='relu')(x)
outputs = tf.keras.layers.Dense(2, activation='softmax')(x)
model = tf.keras.Model(inputs=inputs, outputs=outputs)
model.compile(optimizer='adam',
loss='sparse_categorical_crossentropy',
metrics=['sparse_categorical_accuracy'])
return model
myModel = get_model()
myModel.summary()
myModel.fit(X_train, y_train,
batch_size = batch_size,
epochs= epochs,
verbose = 1,
validation_data = (X_val, y_val))
In tensorflow, you can grab the output of a model after training quite easily. Here is an example from a Github repo:
input = tf.placeholder(tf.float32, shape=[None, INPUT_DIMS])
labels = tf.placeholder(tf.float32, shape=[None])
hidden = tf.nn.tanh(make_nn_layer(normalized, NUM_HIDDEN))
logits = make_nn_layer(hidden, NUM_CLASSES)
outputs = tf.argmax(logits, 1)
int_labels = tf.to_int64(labels)
cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits, int_labels, name='xentropy')
train_step = tf.train.AdamOptimizer().minimize(cross_entropy)
correct_prediction = tf.equal(outputs, int_labels)
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
with tf.Session() as sess:
sess.run(tf.initialize_all_variables())
validation_dict = {
input: validation_data[:,0:7],
labels: validation_data[:,7],}
for i in range(NUM_BATCHES):
batch = training_data[numpy.random.choice(training_size, BATCH_SIZE, False),:]
train_step.run({input: batch[:,0:7], labels: batch[:,7]})
if i % 100 == 0 or i == NUM_BATCHES - 1:
print('Accuracy %.2f%% at step %d' % (accuracy.eval(validation_dict) * 100, i))
output_data = outputs.eval({input: data_vector[:,0:7]})
The only output I can get from the trained model appears to be a history object. There is also a myModel.output object, but it is a tensor that I can't evaluate without putting data into it. Any ideas?
As far as I know, you can't turn off the dropout after passing training=True when calling the layers (unless you transfer the weights to a new model with the same architecture). However, instead you can build and train your model in normal case (i.e. without using training argument in the calls) and then selectively turn on and off the dropout layer in test phase by defining a backend function (i.e. keras.backend.function()) and setting the learning phase (i.e. keras.backend.learning_phase()):
# build your model normally (i.e. without using `training=True` argument)
# train your model...
from keras import backend as K
func = K.function(model.inputs + [K.learning_phase()], model.outputs)
# run the model with dropout layers being active, i.e. learning_phase == 1
preds = func(list_of_input_arrays + [1])
# run the model with dropout layers being inactive, i.e. learning_phase == 0
preds = func(list_of_input_arrays + [0])
Update: As I suggested above, another approach is to define a new model with the same architecture but without setting training=True, and then transfer the weights from the trained model to this new model. To achieve this, I just add a training argument to your get_model() function:
def get_model(training=None):
inputs = tf.keras.layers.Input(shape=(input_dims,))
x = tf.keras.layers.Dropout(rate=dropout_rate)(inputs, training=training)
x = tf.keras.layers.Dense(units=29, activation='relu')(x)
x = tf.keras.layers.Dropout(rate=dropout_rate)(x, training=training)
x = tf.keras.layers.Dense(units=15, activation='relu')(x)
outputs = tf.keras.layers.Dense(2, activation='softmax')(x)
model = tf.keras.Model(inputs=inputs, outputs=outputs)
model.compile(optimizer='adam',
loss='sparse_categorical_crossentropy',
metrics=['sparse_categorical_accuracy'])
return model
# build a model with dropout layers active in both training and test phases
myModel = get_model(training=True)
# train the model
myModel.fit(...)
# build a clone of the model with dropouts deactivated in test phase
myTestModel = get_model() # note: the `training` is `None` by default
# transfer the weights from the trained model to this model
myTestModel.set_weights(myModel.get_weights())
# use the new model in test phase; the dropouts would not be active
myTestModel.predict(...)

Rebuild Keras-model in Tensorflow

I'm new to Tensorflow and I'm trying to rebuild a simple network, that I've built in Keras (TF backend), with Tensorflows Python API. It is a simple function approximator (z = sin(x + y)).
I've tried different architectures, optimizers and learning rates, but I'm not getting the new network to train properly. However in my eyes, the networks seem to be identical. Both get the exact same feature vectors and labels:
# making training data
start = 0
end = 2*np.pi
samp = 1000
num_samp = samp**2
step = end / samp
x_train = np.arange(start, end, step)
y_train = np.arange(start, end, step)
data = np.array(np.meshgrid(x_train,y_train)).T.reshape(-1,2)
z_label = np.sin(data[:,0] + data[:,1])
Here is the Keras model:
#start model
model = Sequential()
#stack layers
model.add(Dense(units=128, activation='sigmoid', input_dim=2, name='dense_1'))
model.add(Dense(units=64, activation='sigmoid', input_dim=128, name='dense_2'))
model.add(Dense(units=1, activation='linear', name='output'))
#compile model
model.compile(loss='mean_squared_error',
optimizer='sgd',
metrics=['accuracy'])
checkpointer = ModelCheckpoint(filepath='./weights/weights.h5',
verbose=1, save_best_only=True)
tensorboard = TensorBoard(log_dir="logs/{}".format(time()))
model.fit(data, z_label, epochs=20, batch_size=32,
shuffle='true',validation_data=(data_val, z_label_val),
callbacks=[checkpointer, tensorboard])
Here is the new network, built with Tensorflows Python API:
# hyperparameter
n_inputs = 2
n_hidden1 = 128
n_hidden2 = 64
n_outputs = 1
learning_rate = 0.01
# construction phase
X = tf.placeholder(tf.float32, shape=(None, n_inputs), name='input')
y = tf.placeholder(tf.float32, shape=(None), name="target")
hidden1 = tf.layers.dense(X, n_hidden1, name="hidden1", activation=tf.nn.sigmoid)
hidden2 = tf.layers.dense(hidden1, n_hidden2, name="hidden2", activation=tf.nn.sigmoid)
logits = tf.layers.dense(hidden2, n_outputs, activation='linear', name='output')
loss = tf.reduce_mean(tf.square(logits - y), name='loss')
optimizer = tf.train.GradientDescentOptimizer(learning_rate)
training_op = optimizer.minimize(loss, name='train')
init = tf.global_variables_initializer()
saver = tf.train.Saver()
# --- execution phase ---
n_epochs = 40
batch_size = 32
n_batches = int(num_samp/batch_size)
with tf.Session() as sess:
init.run()
for epoch in range(n_epochs):
print("Epoch: ", epoch, " Running...")
loss_arr = np.array([])
for iteration in range( n_batches ):
start = iteration * batch_size
end = start + batch_size
sess.run(training_op, feed_dict={X: data[start:end], y: z_label[start:end] })
loss_arr = np.append(loss_arr, loss.eval(feed_dict={X: data[start:end, :], y: z_label[start:end]}))
mean_loss = np.mean(loss_arr)
print("Epoch: ", epoch, " Calculated ==> Loss: ", mean_loss)
While the Keras model train properly with a decreasing loss and proper test results, the new model converges pretty fast and stops learning. Accordingly the results are completely useless.
Am I building/training the the model incorrectly or is Keras doing anything in the background, that I'm not aware of?
Solved this issue. The problem was the shape of the label vector. It was a lying vector with shape (1000000,). While Keras is apparently capable of dealing with different shapes of output and label vectors, Tensorflow initialized the placeholder incorrectly and the loss function
loss = tf.reduce_mean(tf.square(logits - y), name='loss')
did't make sense anymore and thus training failed. Adding
z_label = z_label.reshape(-1,1)
reshaped the label vector to (1000000, 1) and solved it. Alternatively one can specify the shape of the placeholder more precisely
y = tf.placeholder(tf.float32, shape=(None,1), name="target")

Categories

Resources