I decided to build a large dataset with augmented images in order to save time during the training, which takes too long due to every image being augmented on the fly, thus reducing performance and GPU usage.
I was wondering if it is possible for every epoch to train on a subset of the dataset in order to save time (train on 4000 images instead of 40000). This is somehow similar to cross validation, but my aim is simply to reduce the portion of dataset on which the model is being trained every epoch, and alternating these small portions randomly. In cross validation of course I would not reduce my training dataset size, just alternate the validation set.
By definition an epoch means that the entire dataset is passed trought the model for training. However you can use mini-batch training, divide the entire dataset into batches and train one batch at the time using .next_batch() function or by iterating over the dataset.
When you define your dataset you can use .shuffle() if you want the data in your bacthes to be randomly selected at each epoch and .batch(batch_size) to define how many samples to use for each batch.
Related
I am using Tensorflow to train my model. I am routinely saving my model every 10 epochs. I have a limited number of samples to train, so I am augmenting my dataset to make a larger training dataset.
If I need to use my saved model to resume training after a power outage would it be best to resume training using the same dataset or to make a new dataset?
Your question very much depends on how you're augmenting your dataset. If your augmentation skews the statistical distribution of the underlying dataset then you should resume training with the pre-power outage dataset. Otherwise, you're assuming that your augmentation has not changed the distribution of the dataset.
It is a fairly safe assumption to make (assuming your augmentations do not change the data in an extremely significant way) that you are safe to resume training on a new dataset or the old dataset without significant change in accuracy.
I'm currently creating a model and while creating it I came with some questions. Does training the same model with the same data multiple times leads to better precision of those objects, since your training it every time? And what could be the issue when sometimes the object gets 90% precision and when I re-run it gets lower precision or even not predicting the right object? Is it because of Tensorflow running on the GPU?
I will guess that you are doing image recognition and that you want to identify images (objects) using a neuronal network made with Keras. You should train it once, but during training you will do several epochs, meaning the algorithm adapts the weights in several rounds (epochs). For each round it goes over all training images. Once trained, you can use the model to identify images/objects.
You can evaluate the accuracy of your trained model over the same training set, but it is better to use a different set (see train_test_split from sklearn for instance).
Training is a stochastic process, meaning that every time you train your network it will be different in the end. Hence, you will get different accurcies. The stochasticity comes from different initial weights or from using stochastic gradient descent methods for instance.
The question does not appear to have anything to do with Keras or TensorFlow but basic understandting of how neuronal networks work. There is no connection to running Tensorflow on the GPU. You will also not get better precision by training with the same objects. If you train your model on a dataset for a very long time (many epochs), you might get into overfitting. Then, the accuracy of your model on this training dataset will be very high, but the model will have low accuracy on other datasets.
A common technique is split your date in train and validation datasets, then repeatedly train your model using EarlyStopping. This will train on the training dataset, then calculate the loss against the validation dataset, and then keep training until no further improvement is seen. You can set a patience parameter to wait for X epochs without an improvement to stop training (and optionally save the best model)
https://machinelearningmastery.com/how-to-stop-training-deep-neural-networks-at-the-right-time-using-early-stopping/
Another trick is image augmentation with ImageDataGenerator which will generate synthetic data for you (rotations, shifts, mirror images, brightness adjusts, noise etc). This can effectively increase the amount of data you have to train with, thus reducing overfitting.
https://machinelearningmastery.com/how-to-configure-image-data-augmentation-when-training-deep-learning-neural-networks/
I am training a deep learning model using Mask RCNN from the following git repository: matterport/Mask_RCNN. I rely on a heavy augmentation of my dataset (original dataset: 59 images of 1988x1355x3 with each > 80 annotations), which I store locally (necessary to evaluate type/degree of augmentation vs validation metrics). The augmented dataset counts 6000 images. This dataset varies in x and y dimensions of the image, because of reducing resolution and affine transformations - I assume the different x,y-dimensions will not affect the final tests.
However, my Python kernel crashes whenever I load more than 'X' images to train the model.
Hence, I came up with the idea of splitting the dataset in sub-datasets and iterate through the sub-dataset, using the 'last' trained weights as starting point for the new round. But I am not sure if the results will be the same (read: same, taken the stochastic nature of 'stochastic gradient descent' into account)?
I wonder, if the results would be the same, if I don't iterate through the sub-datasets per epoch, but train Y epochs (eg. 20 for 'heads' only, 10 for 'all layers')?
Yet, I am sure this is not the most efficient way of solving this issues. Ideas for improvement are welcome.
Note, I am not using keras.preprocessing.image.ImageDataGenerator(), as I have understood it, it randomly generates data and feeds it to the model by replacing the input for the epoch, whereas I would like to feed the whole dataset to the model.
I came up with the idea of splitting the dataset in sub-datasets and iterate through the sub-dataset, using the 'last' trained weights as starting point for the new round. But I am not sure if the results will be the same?
You are doing the same thing as ImageDataGenerator is doing (creating your own mini-batches) but less optimally. The result will be same with respect to what?
If you mean with respect to a model that was trained with all the data in a single batch? - Most probably not. As a lower batch means slower convergence. But this can be solved by training for more epochs.
Another issue is reproducibility. If you want to reproduce your model with same results each time just use seeds.
import random
random.seed(1)
import numpy as np
np.random.seed(1)
import tensorflow
tensorflow.random.set_seed(1)
Another concept is gradient accumulation. It will help you train you with high batch size without keeping too many images in memory at a time.
https://github.com/CyberZHG/keras-gradient-accumulation
Finally, keras.preprocessing.image.ImageDataGenerator() in facts trains on the whole dataset, it just chooses a random sample at each step (you're doing the same thing with your so-called sub-datasets).
You can seed the ImageDataGenerator so it is reproducible and not entirely random.
Whats the difference between two LSTM models A and B that are trained on same data, but the batches are shuffled randomly for each epoch, that A has 14 steps per epoch and B has 132 steps per epoch?
Which one will perform better in validation?
An epoch consists of going through all your training samples once. And one step/iteration refers to training over a single minibatch. So if you have 1,000,000 training samples and use a batch size of 100, one epoch will be equivalent to 10,000 steps, with 100 samples per step.
A high-level neural network framework may let you set either the number of epochs or total number of training steps. But you can't set them both since one directly determines the value of the other.
Effect of Batch Size on Model Behavior: Small batch results generally in rapid learning but a volatile learning process with higher variance. Larger batch sizes slow down the learning process but the final stages result in a convergence to a more stable model exemplified by lower variance.
I saw a sample of code (too big to paste here) where the author used model.train_on_batch(in, out) instead of model.fit(in, out). The official documentation of Keras says:
Single gradient update over one batch of samples.
But I don't get it. Is it the same as fit(), but instead of doing many feed-forward and backprop steps, it does it once? Or am I wrong?
Yes, train_on_batch trains using a single batch only and once.
While fit trains many batches for many epochs. (Each batch causes an update in weights).
The idea of using train_on_batch is probably to do more things yourself between each batch.
It is used when we want to understand and do some custom changes after each batch training.
A more precide use case is with the GANs.
You have to update discriminator but during update the GAN network you have to keep the discriminator untrainable. so you first train the discriminator and then train the gan keeping discriminator untrainable.
see this for more understanding:
https://medium.com/datadriveninvestor/generative-adversarial-network-gan-using-keras-ce1c05cfdfd3
The method fit of the model train the model for one pass through the data you gave it, however because of the limitations in memory (especially GPU memory), we can't train on a big number of samples at once, so we need to divide this data into small piece called mini-batches (or just batchs). The methode fit of keras models will do this data dividing for you and pass through all the data you gave it.
However, sometimes we need more complicated training procedure we want for example to randomly select new samples to put in the batch buffer each epoch (e.g. GAN training and Siamese CNNs training ...), in this cases we don't use the fancy an simple fit method but instead we use the train_on_batch method. To use this methode we generate a batch of inputs and a batch of outputs(labels) in each iteration and pass it to this method and it will train the model on the whole samples in the batch at once and gives us the loss and other metrics calculated with respect to the batch samples.