Keras GAN Batch Training - python

I have looked at some code/tutorials (tutorial: 1 and 2) for implementing a GAN in Keras.
Both do batch training as follows:
for epoch in range(epochs):
    # ---------------------
    #  Train Discriminator
    # ---------------------
    # Select a random batch of images
    # Generate a batch of new images
    # Train the discriminator
    # ---------------------
    #  Train Generator
    # ---------------------
In the above code (taken from line 92 in (2)), they loop over all epochs, but then for each epoch, only train on one batch. As I understand, for each epoch, we should train on many batches; so that we go through the whole dataset. For example, if we have 100 samples and a batch size of 10, then for each epoch, we train on 10 batches of size 10. Why is it that in this code, they only train on a single batch for each epoch? Sorry if this is a basic question; I am quite new to machine learning.

When you train a GAN, a few things change compared with normal neural network training.
Your input data evolves over time: the artificial images produced by the generator change at every update of the generator's weights.
You have to train both networks together. It is pointless to train the discriminator on a lot of data if you then update the generator, because that changes the data distribution the discriminator learns from. For this reason you want to update both networks frequently, so it is preferable to update both of them on every batch.
I don't know why they call each such update an epoch; you could disagree with the naming. But remember that "epoch" and "batch" only have their usual meaning when the training data is fixed. Here it is not, so maybe they call it an epoch for lack of a better word.
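As a rough illustration, here is a minimal sketch of what one such per-batch update of both networks could look like in Keras. The names generator, discriminator, combined, and x_train are assumptions for the sketch, not code from the tutorials above:

import numpy as np

batch_size = 32
latent_dim = 100
n_steps = 10_000
for step in range(n_steps):  # the tutorials call each of these steps an "epoch"
    # --- Train the discriminator on one real batch and one generated batch ---
    idx = np.random.randint(0, x_train.shape[0], batch_size)
    real_imgs = x_train[idx]
    noise = np.random.normal(0, 1, (batch_size, latent_dim))
    fake_imgs = generator.predict(noise)
    discriminator.train_on_batch(real_imgs, np.ones((batch_size, 1)))
    discriminator.train_on_batch(fake_imgs, np.zeros((batch_size, 1)))

    # --- Train the generator (through the combined model, with the discriminator frozen) ---
    noise = np.random.normal(0, 1, (batch_size, latent_dim))
    combined.train_on_batch(noise, np.ones((batch_size, 1)))

Each pass through this loop uses exactly one batch for each network, which matches what the tutorials label an epoch.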

Related

Difference between TensorFlow model fit and train_on_batch

I am building a vanilla DQN model to play the OpenAI gym Cartpole game.
However, in the training step, where I feed in the state as input and the target Q values as the labels, model.fit(x=states, y=target_q) works fine and the agent can eventually play the game well, but with model.train_on_batch(x=states, y=target_q) the loss won't decrease and the model won't play the game any better than a random policy.
I wonder what the difference is between fit and train_on_batch. To my understanding, fit calls train_on_batch with a batch size of 32 under the hood, which should make no difference, since specifying a batch size equal to the actual size of the data I feed in makes no difference.
The full code is here if more contextual information is needed to answer this question: https://github.com/ultronify/cartpole-tf
model.fit will train 1 or more epochs. That means it will train multiple batches. model.train_on_batch, as the name implies, trains only one batch.
To give a concrete example, imagine you are training a model on 10 images. Let's say your batch size is 2. model.fit will train on all 10 images, so it will update the gradients 5 times. (You can specify multiple epochs, so it iterates over your dataset.) model.train_on_batch will perform one update of the gradients, as you give the model only one batch. You would give model.train_on_batch two images if your batch size is 2.
And if we assume that model.fit calls model.train_on_batch under the hood (though I don't think it does), then model.train_on_batch would be called multiple times, likely in a loop. Here's pseudocode to explain.
def fit(x, y, batch_size, epochs=1):
    for epoch in range(epochs):
        # iterate over the data in chunks of batch_size
        for start in range(0, len(x), batch_size):
            batch_x = x[start:start + batch_size]
            batch_y = y[start:start + batch_size]
            model.train_on_batch(batch_x, batch_y)

I am trying to understand 'epochs' in neural network training. Are the following experiments equivalent?

Let's say I have training samples (with their corresponding training labels) for a defined neural network (the architecture of the network does not matter for this question). Let's call the neural network 'model'.
To avoid any misunderstandings, let's say that I provide the initial weights and biases for 'model'.
Experiment 1
I use the training samples and labels to train 'model' for 40 epochs. After training, the network will have a specific set of weights and biases; let's call it WB_Final_experiment1.
Experiment 2
I use the training samples and labels to train 'model' for 20 epochs. After training, the network will have a specific set of weights and biases; let's call it WB_Intermediate.
Now I load WB_Intermediate into 'model' and train for another 20 epochs. After training, the network will have a specific set of weights and biases; let's call it WB_Final_experiment2.
Considerations: every single parameter, hyperparameter, activation function, loss function, and so on is exactly the same for both experiments, except the number of epochs.
Question: are WB_Final_experiment1 and WB_Final_experiment2 exactly the same?
If you follow this tutorial here, you will find the results of the two experiments as given below.
[Training output screenshots for Experiment 1 and Experiment 2 omitted]
In the first experiment the model ran for 4 epochs, and in the second experiment the model ran for 2 epochs and then trained for 2 more epochs starting from the final weights of the previous run. You will find that the results vary, but only by a very small amount, and they will always vary due to the randomized initialization of weights. Still, the predictions of both models lie very close to each other.
If the models are initialized with the same weights, then the results at the end of 4 epochs will be the same for both models.
On the other hand, if you train for 2 epochs, shut down the training session without saving the weights, and then train for 2 more epochs after restarting the session, the predictions won't be the same. To avoid that, always load the saved weights before resuming training, using model.load_weights("path to model").
TL;DR
If the models are initialized with exactly the same weights, then the output at the end of the same number of training epochs will be the same. If they are randomly initialized, the output will vary only slightly.
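For completeness, a minimal sketch of what "save, then load and continue" might look like with Keras (the model object, data arrays, and file name are assumptions for the sketch):

# Train for the first 20 epochs, then save the weights.
model.fit(x_train, y_train, epochs=20)
model.save_weights("model_weights.h5")

# Later, possibly in a new session: rebuild the same architecture,
# load the saved weights, and train for 20 more epochs.
model.load_weights("model_weights.h5")
model.fit(x_train, y_train, epochs=20)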
If the operations you are doing are entirely deterministic, then yes. An epoch is implemented simply as the iteration count of a for loop around your training algorithm; you can see this in PyTorch implementations.
Typically no: the model weights will not be the same, because the optimiser accrues its own state during training. You will need to save that state too to truly resume from where you left off.
See the PyTorch documentation regarding saving and resuming here. But this concept is not limited to the PyTorch framework.
Specifically:
It is important to also save the optimizer’s state_dict, as this contains buffers and parameters that are updated as the model trains.
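In Keras, the analogous approach is to save the whole model rather than only the weights, since model.save also stores the optimizer state. A minimal sketch, with the file name and data arrays as assumptions:

from tensorflow import keras

# Saves the architecture, the weights, and the optimizer state in one file.
model.save("checkpoint.h5")

# Loading returns a compiled model whose optimizer buffers are intact,
# so training can genuinely resume from where it left off.
model = keras.models.load_model("checkpoint.h5")
model.fit(x_train, y_train, epochs=20)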

Importance of number of steps in an epoch for LSTM model training in Keras

What's the difference between two LSTM models A and B that are trained on the same data, with batches shuffled randomly for each epoch, where A has 14 steps per epoch and B has 132 steps per epoch?
Which one will perform better in validation?
An epoch consists of going through all your training samples once. And one step/iteration refers to training over a single minibatch. So if you have 1,000,000 training samples and use a batch size of 100, one epoch will be equivalent to 10,000 steps, with 100 samples per step.
A high-level neural network framework may let you set either the number of epochs or the total number of training steps, but you can't set both independently, since one directly determines the other.
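To illustrate the arithmetic from the example above with a tiny sketch (the commented fit call is an assumption about feeding data from a generator or tf.data pipeline):

num_samples = 1_000_000
batch_size = 100
steps_per_epoch = num_samples // batch_size   # 10,000 steps, each on a 100-sample minibatch

# With a generator/dataset that yields batches, Keras would be told one of the two, e.g.:
# model.fit(train_dataset, steps_per_epoch=steps_per_epoch, epochs=10)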
Effect of batch size on model behavior: a small batch size generally results in rapid learning but a volatile learning process with higher variance. Larger batch sizes slow down the learning process, but the final stages converge to a more stable model, exemplified by lower variance.

Predictions Incorrect For Trained Keras Model

I have trained an image classifier CNN for 50 epochs and achieved 65% accuracy with 64% accuracy on the validation data.
My problem is when using model.predict on a single sample (one image) the network behaves as though it is untrained.
I fed model.predict thousands of images, one at a time and the average classification accuracy was only 46%.
I tried using both model.save_model and saving the json model and weights separately, but there was no difference.
My only thought as to why this is occurring is that the BatchNorm layers in my model are affecting the consistency of the data.
My model contains six CNN layers, with three max pooling layers and one final fully connected layer, with a BatchNormalisation layer between every layer (seven in total). I used a batch size of 128 during training, but of course the batch size for the prediction samples is 1. I don't know much about BatchNorm, but I wonder if there is some kind of normalisation happening on the training and testing data that is not happening on the predictions?
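One illustrative check, sketched below with img and model as stand-ins for the actual image and network (an assumption about the setup, not a confirmed fix): predict expects a batch dimension even for a single image, and at inference time Keras BatchNormalization uses the moving statistics learned during training rather than per-batch statistics.

import numpy as np

# img is a single image preprocessed exactly like the validation data.
img_batch = np.expand_dims(img, axis=0)   # shape (1, height, width, channels)

# In inference mode (the default for predict), BatchNormalization uses the
# moving mean/variance accumulated during training, not the statistics of
# this one-sample batch, so a batch size of 1 alone should not break it.
probs = model.predict(img_batch)
predicted_class = np.argmax(probs, axis=-1)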

What does train_on_batch() do in keras model?

I saw a sample of code (too big to paste here) where the author used model.train_on_batch(in, out) instead of model.fit(in, out). The official documentation of Keras says:
Single gradient update over one batch of samples.
But I don't get it. Is it the same as fit(), but instead of doing many feed-forward and backprop steps, it does it once? Or am I wrong?
Yes, train_on_batch trains using a single batch, and only once.
fit, on the other hand, trains on many batches over many epochs (each batch causes an update of the weights).
The idea of using train_on_batch is probably to do more things yourself between each batch.
It is used when we want to inspect things or make custom changes after training on each batch.
A more precise use case is with GANs.
You have to update the discriminator, but while updating the GAN network you have to keep the discriminator untrainable. So you first train the discriminator, and then train the GAN while keeping the discriminator untrainable.
See this for more details:
https://medium.com/datadriveninvestor/generative-adversarial-network-gan-using-keras-ce1c05cfdfd3
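A minimal sketch of that freezing pattern (the latent_dim value, layer shapes, and the generator/discriminator models are assumptions; the point is only the discriminator.trainable toggle before compiling the combined model):

from tensorflow import keras

latent_dim = 100
noise_input = keras.Input(shape=(latent_dim,))

# Compile the discriminator so it can be trained on real/fake batches on its own.
discriminator.compile(optimizer="adam", loss="binary_crossentropy")

# Freeze the discriminator before building and compiling the combined model,
# so that training the combined model only updates the generator's weights.
discriminator.trainable = False
combined = keras.Model(noise_input, discriminator(generator(noise_input)))
combined.compile(optimizer="adam", loss="binary_crossentropy")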
The fit method trains the model for one pass (or more) through the data you give it. However, because of memory limitations (especially GPU memory), we can't train on a large number of samples at once, so we need to divide the data into small pieces called mini-batches (or just batches). The fit method of Keras models does this splitting for you and passes through all the data you gave it.
However, sometimes we need a more complicated training procedure, for example when we want to randomly select new samples to put into the batch at each iteration (e.g. GAN training and Siamese CNN training). In these cases we don't use the fancy and simple fit method; instead we use the train_on_batch method. To use this method, we generate a batch of inputs and a batch of outputs (labels) at each iteration and pass them to it; it trains the model on all the samples in the batch at once and gives us the loss and other metrics computed with respect to those batch samples.
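For instance, a minimal sketch of one such manual iteration (get_batch is a hypothetical stand-in for whatever custom sampling logic is used):

num_steps = 1000
for step in range(num_steps):
    # get_batch() is hypothetical custom sampling, e.g. for a GAN or a Siamese network
    x_batch, y_batch = get_batch()
    # train_on_batch returns the loss (and any compiled metrics) for this one batch
    metrics = model.train_on_batch(x_batch, y_batch)
    print(step, metrics)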
