Getting NaN value from loss function for k-fold validation - python

I am trying to implement MNIST using PyTorch Lightning. Here, I wanted to use k-fold cross-validation.
The problem is that I am getting NaN from the loss function (for at least one fold). As shown below, from the 3rd fold onward the loss function returned NaN.
Epoch 19: 100%|█████████████████████████████████| 110/110 [00:03<00:00, 29.24it/s, loss=0.963, v_num=287]
Testing: 100%|███████████████████████████████████████████████████████████| 28/28 [00:00<00:00, 39.94it/s]
Epoch 19: 100%|█████████████████████████████████| 110/110 [00:04<00:00, 25.69it/s, loss=0.825, v_num=288]
Testing: 100%|███████████████████████████████████████████████████████████| 28/28 [00:00<00:00, 41.19it/s]
Epoch 19: 100%|███████████████████████████████████| 110/110 [00:03<00:00, 30.19it/s, loss=nan, v_num=289]
Testing: 100%|███████████████████████████████████████████████████████████| 28/28 [00:00<00:00, 42.15it/s]
Or a very large loss value (terminated before completing all epochs):
Epoch 0: 44%|█████████████▉ | 48/110 [00:02<00:02, 22.87it/s, loss=2.08e+23, v_num=295]
The code I used for data preparation, the k-fold split, and the trainer is given below:
def prepare_data():
    transform = transforms.Compose([transforms.ToTensor(),
                                    transforms.Normalize((0.1307,), (0.3081,))])
    mnist_train = MNIST(os.getcwd(), train=True, download=True, transform=transform)
    mnist_test = MNIST(os.getcwd(), train=False, download=True, transform=transform)
    dataset = ConcatDataset([mnist_train, mnist_test])
    return dataset

k_folds = 5
epochs = 20
kfold = KFold(n_splits=k_folds, shuffle=True)
dataset = prepare_data()
model = LightningMNIST(lr_rate=0.01)

for fold, (train_idx, val_idx) in enumerate(kfold.split(dataset)):
    train_subsampler = torch.utils.data.SubsetRandomSampler(train_idx)
    val_subsampler = torch.utils.data.SubsetRandomSampler(val_idx)
    train_loader = torch.utils.data.DataLoader(dataset, num_workers=8, batch_size=512, sampler=train_subsampler)
    val_loader = torch.utils.data.DataLoader(dataset, num_workers=8, batch_size=512, sampler=val_subsampler)

    model.apply(reset_weights)  # reset model for every fold
    early_stopping = EarlyStopping('train_loss', mode='min', patience=5)
    model_checkpoint = ModelCheckpoint(dirpath=model_path+'mnist_{epoch}-{train_loss:.2f}',
                                       monitor='train_loss', mode='min', save_top_k=3)
    trainer = pl.Trainer(max_epochs=epochs, profiler=False, callbacks=[model_checkpoint], default_root_dir=model_path)
    trainer.fit(model, train_dataloader=train_loader)
    trainer.test(test_dataloaders=val_loader, ckpt_path=None)
The training step is given below
def training_step(self, train_batch, batch_idx):
    x, y = train_batch
    logits = self.forward(x)
    loss = self.error_loss(logits.squeeze(-1), y.float())
    self.log('train_loss', loss)
    return {'loss': loss}
I assume I am doing something wrong in the k-fold data preparation or in the training step; otherwise, getting NaN or a very large loss is not expected for such a simple problem and such a simple model.
I have gone through several posts like this, this, and that. Some of them suggested that it could happen because the dataset contains NaN (but I think MNIST downloaded directly through the module should not contain NaN), or because the learning rate is too big or too small (mine is 0.01, which seems reasonable). Moreover, I believe this post is not a duplicate (because here I am trying to use k-fold, even though the error looks the same).
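For reference, the kind of NaN check those posts suggest would look something like this (just a sketch, not part of my code above):
import torch

# sanity check: confirm the normalized inputs and labels contain no NaN/Inf values
for xb, yb in train_loader:
    assert torch.isfinite(xb).all(), "found NaN/Inf in inputs"
    assert torch.isfinite(yb.float()).all(), "found NaN/Inf in labels"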
Any suggestions?

Related

Tensorflow does not apply data augmentation properly

I'm trying to apply data augmentation to a dataset. I use the following code:
train_generator = keras.utils.image_dataset_from_directory(
    directory=train_dir,
    subset="training",
    image_size=(50, 50),
    batch_size=32,
    validation_split=0.3,
    seed=1337,
    labels="inferred",
    label_mode='binary'
)
validation_generator = keras.utils.image_dataset_from_directory(
    subset="validation",
    directory=validation_dir,
    image_size=(50, 50),
    batch_size=40,
    seed=1337,
    validation_split=0.3,
    labels="inferred",
    label_mode='binary'
)
data_augmentation = keras.Sequential([
    keras.layers.RandomFlip("horizontal"),
    keras.layers.RandomRotation(0.1),
    keras.layers.RandomZoom(0.1),
])
train_dataset = train_generator.map(lambda x, y: (data_augmentation(x, training=True), y))
But when I try to run training with this method, I get an "insufficient data" warning:
6/100 [>.............................] - ETA: 21s - loss: 0.7602 - accuracy: 0.5200WARNING:tensorflow:Your input ran out of data; interrupting training. Make sure that your dataset or generator can generate at least `steps_per_epoch * epochs` batches (in this case, 2000 batches). You may need to use the repeat() function when building your dataset.
WARNING:tensorflow:Your input ran out of data; interrupting training. Make sure that your dataset or generator can generate at least `steps_per_epoch * epochs` batches (in this case, 10 batches). You may need to use the repeat() function when building your dataset.
Yes, the original dataset is insufficient, but data augmentation should provide more than enough data for training.
Does anyone know what's going on ?
EDIT:
fit call:
history = model.fit(
    train_dataset,
    epochs=20,
    steps_per_epoch=100,
    validation_data=validation_generator,
    validation_steps=10,
    callbacks=callbacks_list)
This is the version I have using ImageDataGenerator:
train_datagen = keras.preprocessing.image.ImageDataGenerator(
    rescale=1/255, rotation_range=40, width_shift_range=0.2, height_shift_range=0.2,
    shear_range=0.2, zoom_range=0.2, horizontal_flip=True)
train_generator = train_datagen.flow_from_directory(
    directory=train_dir, target_size=(50, 50), batch_size=32, class_mode='binary')
val_datagen = keras.preprocessing.image.ImageDataGenerator(rescale=1/255)
validation_generator = val_datagen.flow_from_directory(
    directory=validation_dir, target_size=(50, 50), batch_size=40, class_mode='binary')
This specific code (with the same number of epochs, steps_per_epoch and batch size) was taken from the book Deep Learning with Python by François Chollet; it's the data augmentation example on page 141. As you may have guessed, this produces the same results as the other method shown above.
When we state that data augmentation increases the number of instances, we usually mean that an altered version of a sample is created for the model to process. It's just image preprocessing with randomness.
If you closely inspect your training log, you will find the cause, shown below. The main issue with your approach is discussed in this post.
WARNING:tensorflow:Your input ran out of data; interrupting training. Make sure that your dataset or generator can generate at least steps_per_epoch * epochs batches (in this case, 2000 batches). You may need to use the repeat() function when building your dataset.
So, to solve this, we can use the .repeat() function. To understand what it does, you can check this answer. Here is sample code that should work for you.
train_ds = keras.utils.image_dataset_from_directory(
    ...
)
train_ds = train_ds.map(
    lambda x, y: (data_augmentation(x, training=True), y)
)
val_ds = keras.utils.image_dataset_from_directory(
    ...
)

# using .repeat function
train_ds = train_ds.repeat().shuffle(8 * batch_size)
train_ds = train_ds.prefetch(buffer_size=tf.data.AUTOTUNE)
val_ds = val_ds.repeat()
val_ds = val_ds.prefetch(buffer_size=tf.data.AUTOTUNE)

# specify step per epoch
history = model.fit(
    train_ds,
    validation_data=val_ds,
    epochs=..,
    steps_per_epoch=train_ds.cardinality().numpy(),
    validation_steps=val_ds.cardinality().numpy(),
)
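One thing worth double-checking (my note, not part of the original answer): calling cardinality() on a dataset that has already been repeat()-ed without a count returns tf.data.INFINITE_CARDINALITY, so if you derive the step counts this way, take them from the datasets before calling repeat(), for example:
# sketch: record the per-epoch batch counts before repeating (names follow the code above)
steps_per_epoch = int(train_ds.cardinality().numpy())
validation_steps = int(val_ds.cardinality().numpy())
train_ds = train_ds.repeat().shuffle(8 * batch_size).prefetch(tf.data.AUTOTUNE)
val_ds = val_ds.repeat().prefetch(tf.data.AUTOTUNE)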

Gradient Accumulation with Custom model.fit in TF.Keras?

Please add a minimal comment on your thoughts so that I can improve my query. Thank you. :-)
I'm trying to train a tf.keras model with Gradient Accumulation (GA). But I don't want to use it in a custom training loop (like this); instead, I want to customize the .fit() method by overriding train_step. Is it possible? How can it be accomplished? The reason is that if we want to get the benefit of Keras built-in functionality like fit and callbacks, we don't want to write a custom training loop, but at the same time, if we want to override train_step for some reason (like GA or something else), we can customize the fit method and still get the leverage of those built-in functions.
Also, I know the pros of using GA, but what are the major cons of using it? Why doesn't it come as a default, rather than an optional, feature of the framework?
# overriding train step
# my attempt
# it's not appropriately implemented
# and need to fix
class CustomTrainStep(keras.Model):
    def __init__(self, n_gradients, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.n_gradients = n_gradients
        self.gradient_accumulation = [
            tf.zeros_like(this_var) for this_var in self.trainable_variables
        ]

    def train_step(self, data):
        x, y = data
        batch_size = tf.cast(tf.shape(x)[0], tf.float32)
        # Gradient Tape
        with tf.GradientTape() as tape:
            y_pred = self(x, training=True)
            loss = self.compiled_loss(
                y, y_pred, regularization_losses=self.losses
            )
        # Calculate batch gradients
        gradients = tape.gradient(loss, self.trainable_variables)
        # Accumulate batch gradients
        accum_gradient = [
            (acum_grad + grad) for acum_grad, grad in
            zip(self.gradient_accumulation, gradients)
        ]
        accum_gradient = [
            this_grad / batch_size for this_grad in accum_gradient
        ]
        # apply accumulated gradients
        self.optimizer.apply_gradients(
            zip(accum_gradient, self.trainable_variables)
        )
        # TODO: reset self.gradient_accumulation
        # update metrics
        self.compiled_metrics.update_state(y, y_pred)
        return {m.name: m.result() for m in self.metrics}
Please, run and check with the following toy setup.
# Model
size = 32
input = keras.Input(shape=(size, size, 3))
efnet = keras.applications.DenseNet121(
    weights=None,
    include_top=False,
    input_tensor=input
)
base_maps = keras.layers.GlobalAveragePooling2D()(efnet.output)
base_maps = keras.layers.Dense(
    units=10, activation='softmax',
    name='primary'
)(base_maps)
custom_model = CustomTrainStep(
    n_gradients=10, inputs=[input], outputs=[base_maps]
)

# bind all
custom_model.compile(
    loss=keras.losses.CategoricalCrossentropy(),
    metrics=['accuracy'],
    optimizer=keras.optimizers.Adam()
)

# data
(x_train, y_train), (_, _) = tf.keras.datasets.mnist.load_data()
x_train = tf.expand_dims(x_train, -1)
x_train = tf.repeat(x_train, 3, axis=-1)
x_train = tf.divide(x_train, 255)
x_train = tf.image.resize(x_train, [size, size])  # if we want to resize
y_train = tf.one_hot(y_train, depth=10)

# customized fit
custom_model.fit(x_train, y_train, batch_size=64, epochs=3, verbose=1)
Update
I've found that some others have also tried to achieve this and ended up with the same issue. One of them has a workaround, here, but it's too messy and I think there should be a better approach.
Update 2
The accepted answer (by Mr. For Example) is fine and works well with a single strategy. Now, I'd like to start a 2nd bounty to extend it to support multi-GPU, TPU, and mixed precision. There are some complications; see details.
Yes, it is possible to customize the .fit() method by overriding train_step without a custom training loop. The following simple example shows how to train a simple MNIST classifier with gradient accumulation:
import tensorflow as tf

class CustomTrainStep(tf.keras.Model):
    def __init__(self, n_gradients, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.n_gradients = tf.constant(n_gradients, dtype=tf.int32)
        self.n_acum_step = tf.Variable(0, dtype=tf.int32, trainable=False)
        self.gradient_accumulation = [
            tf.Variable(tf.zeros_like(v, dtype=tf.float32), trainable=False)
            for v in self.trainable_variables
        ]

    def train_step(self, data):
        self.n_acum_step.assign_add(1)
        x, y = data
        # Gradient Tape
        with tf.GradientTape() as tape:
            y_pred = self(x, training=True)
            loss = self.compiled_loss(y, y_pred, regularization_losses=self.losses)
        # Calculate batch gradients
        gradients = tape.gradient(loss, self.trainable_variables)
        # Accumulate batch gradients
        for i in range(len(self.gradient_accumulation)):
            self.gradient_accumulation[i].assign_add(gradients[i])
        # If n_acum_step reach the n_gradients then we apply accumulated gradients to update the variables otherwise do nothing
        tf.cond(tf.equal(self.n_acum_step, self.n_gradients), self.apply_accu_gradients, lambda: None)
        # update metrics
        self.compiled_metrics.update_state(y, y_pred)
        return {m.name: m.result() for m in self.metrics}

    def apply_accu_gradients(self):
        # apply accumulated gradients
        self.optimizer.apply_gradients(zip(self.gradient_accumulation, self.trainable_variables))
        # reset
        self.n_acum_step.assign(0)
        for i in range(len(self.gradient_accumulation)):
            self.gradient_accumulation[i].assign(tf.zeros_like(self.trainable_variables[i], dtype=tf.float32))

# Model
input = tf.keras.Input(shape=(28, 28))
base_maps = tf.keras.layers.Flatten(input_shape=(28, 28))(input)
base_maps = tf.keras.layers.Dense(128, activation='relu')(base_maps)
base_maps = tf.keras.layers.Dense(units=10, activation='softmax', name='primary')(base_maps)
custom_model = CustomTrainStep(n_gradients=10, inputs=[input], outputs=[base_maps])

# bind all
custom_model.compile(
    loss=tf.keras.losses.CategoricalCrossentropy(),
    metrics=['accuracy'],
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3))

# data
(x_train, y_train), (_, _) = tf.keras.datasets.mnist.load_data()
x_train = tf.divide(x_train, 255)
y_train = tf.one_hot(y_train, depth=10)

# customized fit
custom_model.fit(x_train, y_train, batch_size=6, epochs=3, verbose=1)
Outputs:
Epoch 1/3
10000/10000 [==============================] - 13s 1ms/step - loss: 0.5053 - accuracy: 0.8584
Epoch 2/3
10000/10000 [==============================] - 13s 1ms/step - loss: 0.1389 - accuracy: 0.9600
Epoch 3/3
10000/10000 [==============================] - 13s 1ms/step - loss: 0.0898 - accuracy: 0.9748
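A quick sanity check on those numbers (my own arithmetic, not from the answer): MNIST has 60,000 training images, so batch_size=6 gives 60000 / 6 = 10000 steps per epoch, matching the progress bars above; with n_gradients=10, one optimizer update is applied per 10 * 6 = 60 samples, i.e. an effective batch size of 60.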
Pros:
Gradient accumulation is a mechanism to split the batch of samples — used for training a neural network — into several mini-batches of samples that will be run sequentially.
Because GA calculates the loss and gradients after each mini-batch, but instead of updating the model parameters waits and accumulates the gradients over consecutive batches, it can overcome memory constraints, i.e., train the model with less memory than a correspondingly large batch size would require.
Example: If you run gradient accumulation with 5 steps and a batch size of 4 images, it serves almost the same purpose as running with a batch size of 20 images.
We can also parallelize the training when using GA, i.e., aggregate gradients from multiple machines.
Things to consider:
This technique works so well that it is widely used; there are a few things to consider before using it, but I don't think they should be called cons. After all, all GA does is turn 4 + 4 into 2 + 2 + 2 + 2.
If your machine already has sufficient memory for a batch size that is large enough, there is no need to use it: it is well known that an overly large batch size can lead to poor generalization, and it will certainly run slower if you use GA to reach the same batch size that your machine's memory can already handle.
Reference:
What is Gradient Accumulation in Deep Learning?
Thanks to @Mr. For Example for his convenient answer.
Usually, I have also observed that using gradient accumulation won't speed up training, since we are doing n_gradients forward passes and computing all the gradients. But it does speed up the convergence of the model. And I found that using the mixed_precision technique can be really helpful here. Details here.
policy = tf.keras.mixed_precision.Policy('mixed_float16')
tf.keras.mixed_precision.experimental.set_policy(policy)
Here is a complete gist.
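As a side note (my addition, not from the answer above): newer TF releases (2.4+) expose the policy setter without the experimental namespace:
import tensorflow as tf

tf.keras.mixed_precision.set_global_policy('mixed_float16')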

PyTorch Linear MNIST model training error

I am creating a binary classifier based on the MNIST dataset using PyTorch. I want my classifier to classify only between 0s and 1s; however, when I train it, the error doesn't decrease and the loss becomes negative.
Here's the error and loss at the first few iterations:
I was obviously expecting better results.
Here is the code I am using:
# Loading the MNIST data reduced to the 0/1 examples
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

mnist_train = datasets.MNIST("./data", train=True, download=True, transform=transforms.ToTensor())
mnist_test = datasets.MNIST("./data", train=False, download=True, transform=transforms.ToTensor())

train_idx = mnist_train.train_labels <= 1
try:
    mnist_train.train_data = mnist_train.train_data[train_idx]
except AttributeError:
    mnist_train._train_data = mnist_train.train_data[train_idx]
try:
    mnist_train.train_labels = mnist_train.train_labels[train_idx]
except AttributeError:
    mnist_train._train_labels = mnist_train.train_labels[train_idx]

test_idx = mnist_test.test_labels <= 1
try:
    mnist_test.test_data = mnist_test.test_data[test_idx]
except AttributeError:
    mnist_test._test_data = mnist_test.test_data[test_idx]
try:
    mnist_test.test_labels = mnist_test.test_labels[test_idx]
except AttributeError:
    mnist_test._test_labels = mnist_test.test_labels[test_idx]

train_loader = DataLoader(mnist_train, batch_size=100, shuffle=True)
test_loader = DataLoader(mnist_test, batch_size=100, shuffle=False)

# Creating a simple linear classifier
import torch
import torch.nn as nn
import torch.optim as optim

# do a single pass over the data
def epoch(loader, model, opt=None):
    total_loss, total_err = 0., 0.
    for X, y in loader:
        yp = model(X.view(X.shape[0], -1))[:, 0]
        loss = nn.BCEWithLogitsLoss()(yp, y.float())
        if opt:
            opt.zero_grad()
            loss.backward()
            opt.step()
        total_err += ((yp > 0) * (y == 0) + (yp < 0) * (y == 1)).sum().item()
        total_loss += loss.item() * X.shape[0]
    return total_err / len(loader.dataset), total_loss / len(loader.dataset)

model = nn.Linear(784, 1)
opt = optim.SGD(model.parameters(), lr=1)
print("Train Err", "Train Loss", "Test Err", "Test Loss", sep="\t")
for i in range(10):
    train_err, train_loss = epoch(train_loader, model, opt)
    test_err, test_loss = epoch(test_loader, model)
    print(*("{:.6f}".format(i) for i in (train_err, train_loss, test_err, test_loss)), sep="\t")
I don't know why my error does not decrease nor why my loss keeps getting more negative. Does anyone spot the error?
As the MNIST data consists of 10 different classes, change the model's output size to 10:
model = nn.Linear(784, 10)
Also change the loss to cross-entropy loss, reduce the learning rate to some smaller value (e.g. 0.001), and use a much deeper model.
The above changes should probably solve your problem.
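A sketch of what those suggestions would look like applied to the posted loop (my reading of the answer, not tested code from it):
# 10 outputs, cross-entropy loss, smaller learning rate
model = nn.Linear(784, 10)
opt = optim.SGD(model.parameters(), lr=0.001)
loss_fn = nn.CrossEntropyLoss()

# inside epoch(): logits have shape (batch, 10) and the targets stay as integer class labels
# logits = model(X.view(X.shape[0], -1))
# loss = loss_fn(logits, y)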
I found the error. My initial code to select only 1s and 0s from the MNIST dataset didn't work. So obviously, applying BCELoss to a non-binary dataset was making the model fail.
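For anyone hitting the same filtering problem, one way to restrict the dataset to the 0/1 classes without mutating its internal attributes is to wrap it in a Subset (a minimal sketch using the targets attribute of current torchvision MNIST):
from torch.utils.data import DataLoader, Subset
from torchvision import datasets, transforms

mnist_train = datasets.MNIST("./data", train=True, download=True,
                             transform=transforms.ToTensor())
keep_idx = (mnist_train.targets <= 1).nonzero(as_tuple=True)[0].tolist()
train_loader = DataLoader(Subset(mnist_train, keep_idx), batch_size=100, shuffle=True)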

tf.estimator.Estimator gives different test accuracy when trained epoch by epoch vs over all epochs

I have defined a straightforward CNN as my model_fn for a tf.estimator.Estimator and feed it with this input_fn:
def input_fn(features, labels, batch_size, epochs):
    dataset = tf.data.Dataset.from_tensor_slices((features))
    dataset = dataset.map(lambda x: tf.cond(tf.random_uniform([], 0, 1) > 0.5, lambda: dataset_augment(x), lambda: x),
                          num_parallel_calls=16).cache()
    dataset_labels = tf.data.Dataset.from_tensor_slices((labels))
    dataset = dataset.zip((dataset, dataset_labels))
    dataset = dataset.shuffle(30000)
    dataset = dataset.repeat(epochs)
    dataset = dataset.batch(batch_size)
    dataset = dataset.prefetch(-1)
    return dataset
When I train the estimator this way, I get 43% test accuracy after 10 epochs:
steps_per_epoch = data_train.shape[0] // batch_size
for epoch in range(1, epochs + 1):
    cifar100_classifier.train(lambda: input_fn(data_train, labels_train, batch_size, epochs=1), steps=steps_per_epoch)
But when I train it this way I get 32% test accuracy after 10 epochs:
steps_per_epoch = data_train.shape[0] // batch_size
max_steps = epochs * steps_per_epoch
cifar100_classifier.train(steps=max_steps,
                          input_fn=lambda: input_fn(data_train, labels_train, batch_size, epochs=epochs))
I just cannot understand why these two methods produce different results. Can anyone please explain?
Since you are calling the input_fn multiple times in the first example, it seems you are generating more augmented data through dataset_augment(x), as you do an augmentation coin-toss for every x in every epoch.
In the second example you only do these coin-tosses once and then train multiple epochs on that same data. So here your training set is effectively "smaller".
The .cache() doesn't really save you from this in the first example.
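If the goal is a fresh augmentation draw on every pass while still training in a single call, one option (my own sketch, reusing the question's dataset_augment and the same TF1-style ops) is to cache the raw tensors and apply the random map after repeat():
def input_fn(features, labels, batch_size, epochs):
    dataset = tf.data.Dataset.from_tensor_slices((features, labels))
    dataset = dataset.cache()                       # cache only the raw, un-augmented tensors
    dataset = dataset.shuffle(30000).repeat(epochs)
    dataset = dataset.map(                          # coin-toss re-sampled on every pass
        lambda x, y: (tf.cond(tf.random_uniform([], 0, 1) > 0.5,
                              lambda: dataset_augment(x),
                              lambda: x), y),
        num_parallel_calls=16)
    dataset = dataset.batch(batch_size)
    dataset = dataset.prefetch(-1)
    return dataset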
Are your model's weights initialized randomly? This may be the cause.

How to return history of validation loss in Keras

Using Anaconda Python 2.7 Windows 10.
I am training a language model using the Keras example:
print('Build model...')
model = Sequential()
model.add(GRU(512, return_sequences=True, input_shape=(maxlen, len(chars))))
model.add(Dropout(0.2))
model.add(GRU(512, return_sequences=False))
model.add(Dropout(0.2))
model.add(Dense(len(chars)))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')

def sample(a, temperature=1.0):
    # helper function to sample an index from a probability array
    a = np.log(a) / temperature
    a = np.exp(a) / np.sum(np.exp(a))
    return np.argmax(np.random.multinomial(1, a, 1))

# train the model, output generated text after each iteration
for iteration in range(1, 3):
    print()
    print('-' * 50)
    print('Iteration', iteration)
    model.fit(X, y, batch_size=128, nb_epoch=1)

    start_index = random.randint(0, len(text) - maxlen - 1)

    for diversity in [0.2, 0.5, 1.0, 1.2]:
        print()
        print('----- diversity:', diversity)

        generated = ''
        sentence = text[start_index: start_index + maxlen]
        generated += sentence
        print('----- Generating with seed: "' + sentence + '"')
        sys.stdout.write(generated)

        for i in range(400):
            x = np.zeros((1, maxlen, len(chars)))
            for t, char in enumerate(sentence):
                x[0, t, char_indices[char]] = 1.

            preds = model.predict(x, verbose=0)[0]
            next_index = sample(preds, diversity)
            next_char = indices_char[next_index]

            generated += next_char
            sentence = sentence[1:] + next_char

            sys.stdout.write(next_char)
            sys.stdout.flush()
        print()
According to Keras documentation, the model.fit method returns a History callback, which has a history attribute containing the lists of successive losses and other metrics.
hist = model.fit(X, y, validation_split=0.2)
print(hist.history)
After training my model, if I run print(model.history) I get the error:
AttributeError: 'Sequential' object has no attribute 'history'
How do I return my model history after training my model with the above code?
UPDATE
The issue was that the following had to be defined first:
from keras.callbacks import History
history = History()
and the callbacks option had to be passed to fit:
model.fit(X_train, Y_train, nb_epoch=5, batch_size=16, callbacks=[history])
But now if I print
print(history.History)
it returns
{}
even though I ran an iteration.
Just an example, starting from
history = model.fit(X, Y, validation_split=0.33, nb_epoch=150, batch_size=10, verbose=0)
You can use
print(history.history.keys())
to list all data in history.
Then, you can print the history of validation loss like this:
print(history.history['val_loss'])
It's been solved.
The losses are only saved to the History over the epochs. I was running iterations instead of using the Keras built-in epochs option.
So instead of doing 4 iterations I now have:
model.fit(......, nb_epoch=4)
Now it returns the loss for each epoch run:
print(hist.history)
{'loss': [1.4358016599558268, 1.399221191623641, 1.381293383180471, 1.3758836857303727]}
The following simple code works great for me:
seqModel = model.fit(x_train, y_train,
                     batch_size=batch_size,
                     epochs=num_epochs,
                     validation_data=(x_test, y_test),
                     shuffle=True,
                     verbose=0, callbacks=[TQDMNotebookCallback()])  # for visualization
Make sure you assign the fit function to an output variable. Then you can access that variable very easily
# visualizing losses and accuracy
train_loss = seqModel.history['loss']
val_loss = seqModel.history['val_loss']
train_acc = seqModel.history['acc']
val_acc = seqModel.history['val_acc']
xc = range(num_epochs)
plt.figure()
plt.plot(xc, train_loss)
plt.plot(xc, val_loss)
Hope this helps.
source: https://keras.io/getting-started/faq/#how-can-i-record-the-training-validation-loss-accuracy-at-each-epoch
The dictionary with the histories of "acc", "loss", etc. is available and saved in the hist.history variable.
I have also found that you can use verbose=2 to make Keras print out the losses:
history = model.fit(X, Y, validation_split=0.33, nb_epoch=150, batch_size=10, verbose=2)
And that would print nice lines like this:
Epoch 1/1
- 5s - loss: 0.6046 - acc: 0.9999 - val_loss: 0.4403 - val_acc: 0.9999
According to their documentation:
verbose: 0, 1, or 2. Verbosity mode. 0 = silent, 1 = progress bar, 2 = one line per epoch.
For plotting the loss directly, the following works:
import matplotlib.pyplot as plt
...
model_ = model.fit(X, Y, epochs= ..., verbose=1 )
plt.plot(list(model_.history.values())[0],'k-o')
Another option is CSVLogger: https://keras.io/callbacks/#csvlogger.
It creates a csv file appending the result of each epoch. Even if you interrupt training, you get to see how it evolved.
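A minimal usage sketch (the filename here is just an example):
from keras.callbacks import CSVLogger

csv_logger = CSVLogger('training.log', append=True)
history = model.fit(X, Y, validation_split=0.33, nb_epoch=150,
                    batch_size=10, verbose=0, callbacks=[csv_logger])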
Actually, you can also do it with the iteration method, because sometimes we might need to use the iteration method instead of the built-in epochs option to visualize the training results after each iteration.
history = []  # Creating an empty list for holding the loss later

for iteration in range(1, 3):
    print()
    print('-' * 50)
    print('Iteration', iteration)
    result = model.fit(X, y, batch_size=128, nb_epoch=1)  # Obtaining the loss after each training
    history.append(result.history['loss'])  # Now append the loss after the training to the list.
    start_index = random.randint(0, len(text) - maxlen - 1)

print(history)
This way allows you to get the loss you want while maintaining your iteration method.
Thanks to Alloush.
The following parameter must be included in model.fit():
validation_data = (x_test, y_test)
If it is not defined, val_acc and val_loss will not exist in the output.
For those who still get an error like me:
Convert model.fit_generator() to model.fit().
You can get the loss and metrics as below: the returned history object is a dictionary, and you can access the model loss (val_loss) or accuracy (val_accuracy) like this:
model_hist = model.fit(train_data, train_lbl, epochs=my_epoch,
                       batch_size=sel_batch_size, validation_data=val_data)
acc = model_hist.history['accuracy']
val_acc = model_hist.history['val_accuracy']
loss = model_hist.history['loss']
val_loss = model_hist.history['val_loss']
Don't forget that to get val_loss or val_accuracy, you should specify validation data in the "fit" function.
history = model.fit(partial_train_data, partial_train_targets,
                    validation_data=(val_data, val_targets),
                    epochs=num_epochs, batch_size=1, verbose=0)
mae_history = history.history['val_mean_absolute_error']
I had the same problem. The following code worked for me.
mae_history = history.history['val_mae']
