I am building a vanilla DQN model to play the OpenAI gym Cartpole game.
However, in the training step where I feed in the state as input and the target Q values as the labels, if I use model.fit(x=states, y=target_q), it works fine and the agent can eventually play the game well, but if I use model.train_on_batch(x=states, y=target_q), the loss won't decrease and the model will not play the game anywhere better than a random policy.
I wonder what is the difference between fit and train_on_batch? To my understanding, fit calls train_on_batch with a batch size of 32 under the hood which should make no difference since specifying the batch size to equal the actual data size I feed in makes no difference.
The full code is here if more contextual information is needed to answer this question: https://github.com/ultronify/cartpole-tf
model.fit will train 1 or more epochs. That means it will train multiple batches. model.train_on_batch, as the name implies, trains only one batch.
To give a concrete example, imagine you are training a model on 10 images. Let's say your batch size is 2. model.fit will train on all 10 images, so it will update the gradients 5 times. (You can specify multiple epochs, so it iterates over your dataset.) model.train_on_batch will perform one update of the gradients, as you only give the model on batch. You would give model.train_on_batch two images if your batch size is 2.
And if we assume that model.fit calls model.train_on_batch under the hood (though I don't think it does), then model.train_on_batch would be called multiple times, likely in a loop. Here's pseudocode to explain.
def fit(x, y, batch_size, epochs=1):
for epoch in range(epochs):
for batch_x, batch_y in batch(x, y, batch_size):
model.train_on_batch(batch_x, batch_y)
Related
While doing transfer learning on VGG, with decent amount of data, and with the following configuration:
base_big_3 = tf.keras.applications.VGG19(include_top=False, weights='imagenet',input_shape=[IMG_SIZE,IMG_SIZE,3])
model_big_3 = tf.keras.Sequential()
model_big_3.add(base_big_3)
model_big_3.add(BatchNormalization(axis=-1))
model_big_3.add(GlobalAveragePooling2D())
model_big_3.add(Dense(5, activation='softmax'))
model_big_3.compile(loss=tf.keras.losses.CategoricalCrossentropy(), optimizer=tf.keras.optimizers.Adamax(learning_rate=0.01), metrics=['acc'])
history = model_big_3.fit(
train_generator,
steps_per_epoch=BATCH_SIZE,
epochs=100,
validation_data=valid_generator,
batch_size=BATCH_SIZE
)
The training loss and validation loss varies as below, wherein the training loss is constant throughout and validation loss spikes initially to become constant afterwards:
What I tried out
I tried the solutions given here one by one and decreased the learning rate from 0.01 to 0.0001.Now, this time training loss did go down slightly but then validation error still seems super fluctuating. The training loss and validation loss varies as below:
The above solution link also suggests to normalize the input, but in my opinion images doesn't need to be normalized because the data doesn't vary much and also that the VGG network already has batch normalization, please correct me if I'm wrong.Please point what is leading to this kind of behavior, what to change in the configs and how can I improve training?
One thing I see is you set steps_per_epoch = BATCH_SIZE. Assume you have 3200 training samples and the BATCH_SIZE=32. To go through all your training samples you would have to go through 3200/32=100 batches. But with steps_per_epoch=BATCH_SIZE=32 you only go through 1024 samples in an epoch. Set the steps_per_epoch as
steps_per_epoch =number_of_train samples//BATCH_SIZE
where BATCH_SIZE is whatever you specified in the generator. Alternatively you can leave it as None and model.fit will determine the right value internally.
As stated in the model.fit documentation located here. ,
Do not specify the batch_size if your data is in the form of datasets,
generators, or keras.utils.Sequence instances (since they generate batches).
Since in model.fit you use train_generator I assume this is a generator.
The VGG model was trained on imagenet images where the pixel values were rescaled within the range from -1 to +1. So somewhere in your input pipeline you should rescale the images. For example image=image/127.5-1 will do the job. What BATCH_SIZE did you use? Making it larger (within the limits of your memory size) may help smooth out the fluctuations.
I also recommend you use two keras callbacks, EarlyStopping and ReduceLROnPlateau. Documentation is here. Set them up to monitor validation loss. My suggested code is shown below
estop=tf.keras.callbacks.EarlyStopping(monitor="val_loss",patience=4,verbose=1,
restore_best_weights=True)
rlronp=tf.keras.callbacks.ReduceLROnPlateau( monitor="val_loss", factor=0.5,
patience=2, verbose=1)
callbacks=[estop, rlronp]
# in model.fit add callbacks=callbacks
Lets say I have a training sample (with their corresponding training labels) for a defined neural network (the architecture of the neural network does not matter for answering this question). Lets call the neural network 'model'.
In order to not create any missunderstandings, lets say that I introduce the initial weights and biases for 'model'.
Experiment 1.
I use the training sample and the training labels to train the 'model' for 40 epochs. After the training, the neural network will have a specific set of weights and biases for the entire neural network, lets call it WB_Final_experiment1.
Experiment 2
I use the training sample and the training labels to train 'model' for 20 epochs. After the training, the neural network will have a specific set of weights and biases for the entire neural network, lets call it WB_Intermediate.
Now I introduce WB_Intermediate in 'model' and train for another 20 epochs. After the training, the neural network will have a specific set of weights and biases for the entire neural network, lets call it WB__Final_experiment2.
Considerations. Every single parameter, hyperparameter, activation functions, loss functions....is exactly the same for both experiments, except the epochs.
Question: Are WB_Final_experiment1 and WB__Final_experiment2 exactly the same?
If you follow this tutorial here, you will find the results of the two experiments as given below -
Experiment 1
Experiment 2
In the first experiment the model ran for 4 epochs and in the second experiment, the model ran for 2 epochs and then trained for 2 more epochs using last weights of previous training. You will find that the results vary but to a very small amount. And they will always vary due to the randomized initialization of weights. But the prediction of both models will lie very near to each other.
If the models are initialized with the same weights then the results at the end of 4 epochs for both the models will remain same.
On the other hand if you trained for 2 epochs, then shut down your training session and the weights are not saved and if you train now for 2 epochs after restarting session, the prediction won't be the same. To avoid that before training, always load the saved weights to continue training using model.load_weights("path to model").
TL;DR
If models are initialized with the exact same weights, then the output at the end of same training epochs will remain same. If they are randomly initialized the output will only vary slightly.
If the operations you are doing are entirely deterministic, then yes. Epochs are implemented as an iteration number for a for loop around your training algorithm. You can see this in implementations in PyTorch.
Typically no, the model weights will not be the same as the optimisation will accrue its own values during training. You will need to save those too to truly resume from where you left off.
See the Pytorch documentation regarding saving and resuming here. But this concept is not limited to the Pytorch framework.
Specifically:
It is important to also save the optimizer’s state_dict, as this
contains buffers and parameters that are updated as the model trains.
I want to create a Model in Keras that can learn "sample by sample"; this type of machine is called online learning, a model that receives and fit data by data. My question is:
How can I do that in Keras? Is it possible to do this just by setting batch_size=1 while fitting?
In Keras batch size has nothing to do with how data is fed in. Batch size determines how many parallel samples are going to be fed into the network per gradient update. A more clear explanation of batch size depends on what the network is. For example in a stateful RNN, batch size of N means the input tensor contains N independent series. A single batch process moves forward on all N series by one sample. So, in each batch, N samples (1 of each N independent series) are processed and the gradient is updated.
Therefore, in your case, it seems that there's only one stream for samples, If the samples are of type time series data, so we definitely have batch_size=1, If before deploying the model you have a data set to train the model on, you can read them all in memory and fit the model, and after deployment as new observations is provided you can train_on_batch or fit the model again and again. There's no limit how many times you fit the model.
I have looked at some code/tutorials (tutorial: 1 and 2) for implementing a GAN in Keras.
Both do batch training as follows:
for epoch in range(epochs):
# ---------------------
# Train Discriminator
# ---------------------
# Select a random batch of images
# Generate a batch of new images
# Train the discriminator
# ---------------------
# Train Generator
# ---------------------
In the above code (taken from line 92 in (2)), they loop over all epochs, but then for each epoch, only train on one batch. As I understand, for each epoch, we should train on many batches; so that we go through the whole dataset. For example, if we have 100 samples and a batch size of 10, then for each epoch, we train on 10 batches of size 10. Why is it that in this code, they only train on a single batch for each epoch? Sorry if this is a basic question; I am quite new to machine learning.
When you do GAN there are few things that change from normal neural network training.
Your input data evolves through time. The artificial images from the Generator network change at each update of the weights in the network.
You have to train both networks simultaneously. It is pointless to train the discriminator on a lot of data, if you then update the generator. Because this changes the data distribution from which the discriminator learns. For this reason you might want to update both networks frequently. So it can be preferred to make updates of both networks each batch.
I don't know why they call this update an epoch, I guess you could disagree with the naming. But remember that epoch and batch have a meaning when the training data is fixed. In this case it is not, so maybe they just call it epoch because they lack of a better word.
I saw a sample of code (too big to paste here) where the author used model.train_on_batch(in, out) instead of model.fit(in, out). The official documentation of Keras says:
Single gradient update over one batch of samples.
But I don't get it. Is it the same as fit(), but instead of doing many feed-forward and backprop steps, it does it once? Or am I wrong?
Yes, train_on_batch trains using a single batch only and once.
While fit trains many batches for many epochs. (Each batch causes an update in weights).
The idea of using train_on_batch is probably to do more things yourself between each batch.
It is used when we want to understand and do some custom changes after each batch training.
A more precide use case is with the GANs.
You have to update discriminator but during update the GAN network you have to keep the discriminator untrainable. so you first train the discriminator and then train the gan keeping discriminator untrainable.
see this for more understanding:
https://medium.com/datadriveninvestor/generative-adversarial-network-gan-using-keras-ce1c05cfdfd3
The method fit of the model train the model for one pass through the data you gave it, however because of the limitations in memory (especially GPU memory), we can't train on a big number of samples at once, so we need to divide this data into small piece called mini-batches (or just batchs). The methode fit of keras models will do this data dividing for you and pass through all the data you gave it.
However, sometimes we need more complicated training procedure we want for example to randomly select new samples to put in the batch buffer each epoch (e.g. GAN training and Siamese CNNs training ...), in this cases we don't use the fancy an simple fit method but instead we use the train_on_batch method. To use this methode we generate a batch of inputs and a batch of outputs(labels) in each iteration and pass it to this method and it will train the model on the whole samples in the batch at once and gives us the loss and other metrics calculated with respect to the batch samples.