I saw a sample of code (too big to paste here) where the author used model.train_on_batch(in, out) instead of model.fit(in, out). The official documentation of Keras says:
Single gradient update over one batch of samples.
But I don't get it. Is it the same as fit(), except that instead of doing many feed-forward and backprop steps, it does them just once? Or am I wrong?
Yes, train_on_batch trains on a single batch, and does so only once.
fit, on the other hand, trains on many batches for many epochs (each batch causes one update of the weights).
The idea of using train_on_batch is probably to do more things yourself between batches.
It is used when you want to observe the training closely or make some custom changes after each batch.
A more precise use case is with GANs.
You have to update the discriminator, but while updating the GAN network you have to keep the discriminator untrainable. So you first train the discriminator, and then train the GAN while keeping the discriminator untrainable.
See this for more understanding:
https://medium.com/datadriveninvestor/generative-adversarial-network-gan-using-keras-ce1c05cfdfd3
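For illustration, here is a minimal sketch of that pattern (this is not the linked article's code; the toy models, layer sizes, and random data are made up just so the loop runs). The key point is the documented Keras rule that a model's trainable state is frozen into it when it is compiled:

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

latent_dim, batch_size = 32, 64

# Toy models; real GAN architectures would be much deeper.
generator = keras.Sequential(
    [layers.Dense(28 * 28, activation="sigmoid", input_shape=(latent_dim,))])
discriminator = keras.Sequential(
    [layers.Dense(1, activation="sigmoid", input_shape=(28 * 28,))])

# Compile the discriminator while it is trainable...
discriminator.compile(optimizer="adam", loss="binary_crossentropy")

# ...then freeze it BEFORE compiling the combined model, so that
# gan.train_on_batch only updates the generator's weights.
discriminator.trainable = False
gan = keras.Sequential([generator, discriminator])
gan.compile(optimizer="adam", loss="binary_crossentropy")

x_train = np.random.rand(1000, 28 * 28)  # stand-in for real images

for step in range(100):
    noise = np.random.normal(size=(batch_size, latent_dim))
    real = x_train[np.random.randint(0, len(x_train), batch_size)]
    fake = generator.predict(noise, verbose=0)

    # These calls still update the discriminator, because it was
    # compiled while trainable=True and has not been recompiled since.
    discriminator.train_on_batch(real, np.ones((batch_size, 1)))
    discriminator.train_on_batch(fake, np.zeros((batch_size, 1)))

    # This call only updates the generator, because the combined model
    # was compiled after the discriminator was frozen.
    gan.train_on_batch(noise, np.ones((batch_size, 1)))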
The fit method trains the model for one pass through the data you give it. However, because of memory limitations (especially GPU memory), we can't train on a large number of samples at once, so we need to divide the data into small pieces called mini-batches (or just batches). The fit method of Keras models does this dividing for you and passes through all the data you give it.
However, sometimes we need a more complicated training procedure: for example, we may want to randomly select new samples to put into the batch buffer at each iteration (e.g., GAN training, Siamese CNN training, ...). In these cases we don't use the fancy and simple fit method; instead we use the train_on_batch method. To use this method, we generate a batch of inputs and a batch of outputs (labels) in each iteration and pass them to the method; it trains the model on all the samples in the batch at once and returns the loss and other metrics computed on those batch samples.
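As a bare-bones illustration of such a loop (the tiny model and random data here are made up just so the sketch runs):

import numpy as np
from tensorflow import keras

# Toy data and model, only to make the loop concrete.
x_train = np.random.rand(1000, 20)
y_train = np.random.randint(0, 2, size=(1000, 1))
model = keras.Sequential(
    [keras.layers.Dense(1, activation="sigmoid", input_shape=(20,))])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])

batch_size = 32
for step in range(1000):
    # Draw a fresh random batch each iteration, instead of letting
    # fit() slice the dataset sequentially.
    idx = np.random.randint(0, len(x_train), size=batch_size)
    # Returns the loss and metrics computed on this batch only.
    loss, acc = model.train_on_batch(x_train[idx], y_train[idx])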
Related
I am building a vanilla DQN model to play the OpenAI Gym CartPole game.
However, in the training step, where I feed in the state as input and the target Q values as the labels: if I use model.fit(x=states, y=target_q), it works fine and the agent can eventually play the game well; but if I use model.train_on_batch(x=states, y=target_q), the loss won't decrease and the model won't play the game any better than a random policy.
I wonder what the difference between fit and train_on_batch is. To my understanding, fit calls train_on_batch with a batch size of 32 under the hood, which should make no difference here, since specifying a batch size equal to the actual size of the data I feed in makes no difference.
The full code is here if more contextual information is needed to answer this question: https://github.com/ultronify/cartpole-tf
model.fit will train for 1 or more epochs, which means it will train on multiple batches. model.train_on_batch, as the name implies, trains on only one batch.
To give a concrete example, imagine you are training a model on 10 images. Let's say your batch size is 2. model.fit will train on all 10 images, so it will update the gradients 5 times. (You can specify multiple epochs, so that it iterates over your dataset.) model.train_on_batch will perform one update of the gradients, as you only give the model one batch. You would give model.train_on_batch two images if your batch size is 2.
And if we assume that model.fit calls model.train_on_batch under the hood (though I don't think it does), then model.train_on_batch would be called multiple times, likely in a loop. Here's a simplified, runnable sketch to explain:
def fit(model, x, y, batch_size, epochs=1):
    # One epoch is one full pass over the data.
    for epoch in range(epochs):
        # Slice the data into consecutive mini-batches.
        for i in range(0, len(x), batch_size):
            model.train_on_batch(x[i:i + batch_size], y[i:i + batch_size])
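With 10 images and a batch size of 2, as in the example above, the inner loop would call model.train_on_batch 5 times per epoch, which is exactly the 5 gradient updates described earlier.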
I'm currently creating a model, and while creating it I came up with some questions. Does training the same model with the same data multiple times lead to better precision on those objects, since you're training it every time? And what could be the issue when the object sometimes gets 90% precision, but when I re-run the training it gets lower precision or even fails to predict the right object? Is it because TensorFlow is running on the GPU?
I will guess that you are doing image recognition and that you want to identify images (objects) using a neural network made with Keras. You should train it once, but during training you will run several epochs, meaning the algorithm adapts the weights in several rounds (epochs). In each round it goes over all the training images. Once trained, you can use the model to identify images/objects.
You can evaluate the accuracy of your trained model on the same training set, but it is better to use a separate set (see train_test_split from sklearn, for instance).
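For instance, a minimal sketch with scikit-learn (here x, y, and model are assumed to already exist, and the model to be compiled with an accuracy metric):

from sklearn.model_selection import train_test_split

# Hold out 20% of the data for evaluation.
x_train, x_val, y_train, y_val = train_test_split(
    x, y, test_size=0.2, random_state=42)

model.fit(x_train, y_train, epochs=10)
loss, acc = model.evaluate(x_val, y_val)  # accuracy on unseen data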
Training is a stochastic process, meaning that every time you train your network the result will be different in the end. Hence, you will get different accuracies. The stochasticity comes from different initial weights or from the use of stochastic gradient descent methods, for instance.
The question does not appear to have anything to do with Keras or TensorFlow, but with a basic understanding of how neural networks work. There is no connection to running TensorFlow on the GPU. You will also not get better precision by training on the same objects repeatedly. If you train your model on a dataset for a very long time (many epochs), you might run into overfitting. Then the accuracy of your model on the training dataset will be very high, but the model will have low accuracy on other datasets.
A common technique is to split your data into train and validation datasets, then train your model using EarlyStopping. This trains on the training dataset, calculates the loss against the validation dataset after each epoch, and keeps training until no further improvement is seen. You can set a patience parameter to wait X epochs without improvement before stopping training (and optionally save the best model).
https://machinelearningmastery.com/how-to-stop-training-deep-neural-networks-at-the-right-time-using-early-stopping/
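A minimal sketch of that setup, assuming model, x_train/y_train, and x_val/y_val already exist:

from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

callbacks = [
    # Stop once val_loss has not improved for 5 consecutive epochs,
    # and roll back to the best weights seen so far.
    EarlyStopping(monitor="val_loss", patience=5, restore_best_weights=True),
    # Optionally also keep the best model on disk (path is illustrative).
    ModelCheckpoint("best_model.h5", monitor="val_loss", save_best_only=True),
]

model.fit(x_train, y_train,
          validation_data=(x_val, y_val),
          epochs=200,  # upper bound; EarlyStopping usually ends sooner
          callbacks=callbacks)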
Another trick is image augmentation with ImageDataGenerator, which will generate synthetic data for you (rotations, shifts, mirror images, brightness adjustments, noise, etc.). This can effectively increase the amount of data you have to train with, thus reducing overfitting.
https://machinelearningmastery.com/how-to-configure-image-data-augmentation-when-training-deep-learning-neural-networks/
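A minimal sketch (the specific ranges are arbitrary choices; model, x_train, and y_train are assumed to exist, and recent Keras versions accept the generator directly in fit):

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Each epoch sees randomly transformed variants of the training images.
datagen = ImageDataGenerator(
    rotation_range=15,            # random rotations up to 15 degrees
    width_shift_range=0.1,        # random horizontal shifts
    height_shift_range=0.1,       # random vertical shifts
    horizontal_flip=True,         # mirror images
    brightness_range=(0.8, 1.2),  # brightness adjustments
)

model.fit(datagen.flow(x_train, y_train, batch_size=32), epochs=20)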
Let's say I have a training sample (with its corresponding training labels) for a defined neural network (the architecture of the network does not matter for answering this question). Let's call the neural network 'model'.
In order to avoid any misunderstandings, let's say that I introduce the initial weights and biases for 'model'.
Experiment 1
I use the training sample and the training labels to train 'model' for 40 epochs. After the training, the neural network will have a specific set of weights and biases for the entire network; let's call it WB_Final_experiment1.
Experiment 2
I use the training sample and the training labels to train 'model' for 20 epochs. After the training, the neural network will have a specific set of weights and biases for the entire network; let's call it WB_Intermediate.
Now I load WB_Intermediate into 'model' and train for another 20 epochs. After the training, the neural network will have a specific set of weights and biases for the entire network; let's call it WB_Final_experiment2.
Considerations: every single parameter, hyperparameter, activation function, loss function, etc. is exactly the same for both experiments, except the epochs.
Question: are WB_Final_experiment1 and WB_Final_experiment2 exactly the same?
If you follow this tutorial, you will find the results of the two experiments given below:
Experiment 1
Experiment 2
In the first experiment the model ran for 4 epochs, and in the second experiment the model ran for 2 epochs and then trained for 2 more epochs using the last weights of the previous training. You will find that the results vary, but only by a very small amount, and they will always vary due to the randomized initialization of weights. But the predictions of the two models will lie very close to each other.
If the models are initialized with the same weights, then the results at the end of 4 epochs will be the same for both models.
On the other hand, if you train for 2 epochs, shut down your training session without the weights being saved, and then train for 2 more epochs after restarting the session, the predictions won't be the same. To avoid that, always load the saved weights before resuming training, using model.load_weights("path to model").
TL;DR
If models are initialized with the exact same weights, then the output at the end of the same number of training epochs will be the same. If they are randomly initialized, the output will vary only slightly.
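In Keras terms, Experiment 2 made resumable would look roughly like this (paths are illustrative; note that this saves the weights only, which is exactly the limitation the next answer points out):

# First run: 20 epochs, then save WB_Intermediate.
model.fit(x_train, y_train, epochs=20)
model.save_weights("wb_intermediate.h5")

# Later, possibly in a new session with the same architecture:
model.load_weights("wb_intermediate.h5")
model.fit(x_train, y_train, epochs=20)  # continues from WB_Intermediate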
If the operations you are doing are entirely deterministic, then yes. Epochs are implemented as an iteration number for a for loop around your training algorithm. You can see this in implementations in PyTorch.
Typically no, the model weights will not be the same, because the optimizer accrues its own state during training. You will need to save that state too in order to truly resume from where you left off.
See the PyTorch documentation regarding saving and resuming here. But this concept is not limited to the PyTorch framework.
Specifically:
It is important to also save the optimizer’s state_dict, as this contains buffers and parameters that are updated as the model trains.
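Following the pattern from those docs, a sketch of saving and restoring both (epoch, model, and optimizer are the usual placeholder names, and the path is illustrative):

import torch

# Save model weights AND optimizer state.
torch.save({
    "epoch": epoch,
    "model_state_dict": model.state_dict(),
    "optimizer_state_dict": optimizer.state_dict(),
}, "checkpoint.pt")

# Resume: restore both before continuing training.
checkpoint = torch.load("checkpoint.pt")
model.load_state_dict(checkpoint["model_state_dict"])
optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
epoch = checkpoint["epoch"]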
I want to create a model in Keras that can learn sample by sample; this type of setup is called online learning, where a model receives and fits on one data point at a time. My question is:
How can I do that in Keras? Is it possible simply by setting batch_size=1 when fitting?
In Keras, batch size does not change how the data is fed in; it determines how many samples are fed through the network in parallel per gradient update. A clearer explanation of batch size depends on what the network is. For example, in a stateful RNN, a batch size of N means the input tensor contains N independent series. A single batch step moves forward on all N series by one sample, so in each batch, N samples (one from each of the N independent series) are processed and the gradient is updated once.
Therefore, in your case, it seems that there is only one stream of samples, so if the samples are time-series data, you definitely have batch_size=1. If you have a dataset available before deploying the model, you can read it all into memory and fit the model; after deployment, as new observations are provided, you can call train_on_batch or fit the model again and again. There is no limit to how many times you can fit the model.
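A rough sketch of that workflow (model, x_hist, y_hist, and the incoming x_new/y_new are placeholders, and on_new_observation is a hypothetical hook for wherever your new data arrives):

import numpy as np

# Before deployment: fit on the data you already have.
model.fit(x_hist, y_hist, epochs=10)

# After deployment: update on each new observation as it arrives.
def on_new_observation(x_new, y_new):
    # train_on_batch expects a batch dimension, hence the reshape
    # to a batch of size 1.
    model.train_on_batch(np.expand_dims(x_new, 0),
                         np.expand_dims(y_new, 0))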
Recently I used Keras to train a network to classify pictures, using the model.fit_generator() function to fit my model. fit_generator() automatically runs the model on the validation data and returns the validation accuracy when it finishes an epoch.
But an odd thing happened: when I used the model to predict on the validation data and compared the results with the correct classes, the validation accuracy was lower than what I got from fit_generator().
I have two assumptions:
1. I use a generator to get data from a dictionary, so I assume that within a single epoch the generator may repeatedly fetch data to which the model is already highly fitted, so the accuracy may come out higher.
2. Keras may use some tricks, or preprocess the data, when doing validation, thus enhancing the accuracy.
I tried looking through the Keras source code and documentation, but nothing helped. I would be very thankful if anyone could give me some advice about this problem.