I have trained a pre-trained ResNet18 model in PyTorch and saved it. While testing, the model gives different accuracy for different mini-batch sizes. Does anyone know why?
Yes, I think so.
ResNet contains batch normalisation layers. At evaluation time you need to fix these; otherwise the running means and variances keep being updated after every batch, which is why you get a different accuracy for each mini-batch size.
Try setting:
model.eval()
before evaluation. Note that before getting back into training, you should call model.train().
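For example, a minimal evaluation sketch (model and a test_loader DataLoader are assumed to already exist):
import torch

model.eval()                      # freeze BatchNorm running stats and disable Dropout
with torch.no_grad():             # gradients are not needed for evaluation
    for inputs, targets in test_loader:
        outputs = model(inputs)
        # ...accumulate predictions / accuracy here...
model.train()                     # switch back before resuming training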
Let's say I have a training sample (with its corresponding training labels) for a defined neural network (the architecture of the network does not matter for answering this question). Let's call the neural network 'model'.
To avoid any misunderstanding, let's say that I introduce the initial weights and biases for 'model'.
Experiment 1.
I use the training sample and the training labels to train 'model' for 40 epochs. After training, the network has a specific set of weights and biases for the entire network; let's call it WB_Final_experiment1.
Experiment 2.
I use the training sample and the training labels to train 'model' for 20 epochs. After training, the network has a specific set of weights and biases for the entire network; let's call it WB_Intermediate.
Now I load WB_Intermediate into 'model' and train for another 20 epochs. After training, the network has a specific set of weights and biases for the entire network; let's call it WB_Final_experiment2.
Considerations: every parameter, hyperparameter, activation function, and loss function is exactly the same for both experiments, except the number of epochs.
Question: are WB_Final_experiment1 and WB_Final_experiment2 exactly the same?
If you follow this tutorial, you will find the results of the two experiments given below.
(Screenshots of the Experiment 1 and Experiment 2 results omitted.)
In the first experiment the model ran for 4 epochs, and in the second experiment it ran for 2 epochs and was then trained for 2 more epochs starting from the last weights of the previous run. You will find that the results vary, but only by a very small amount, and they will always vary due to the random initialization of weights. The predictions of both models will, however, lie very close to each other.
If the models are initialized with the same weights, then the results at the end of 4 epochs will be the same for both models.
On the other hand, if you train for 2 epochs, then shut down your training session without saving the weights, and then train for 2 more epochs after restarting the session, the predictions won't be the same. To avoid that, always load the saved weights before continuing training, using model.load_weights("path to model").
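For example, in Keras (assuming a compiled model whose weights were saved earlier; "weights.h5" is a hypothetical path):
model.load_weights("weights.h5")         # restore the saved weights
model.fit(x_train, y_train, epochs=2)    # continue training from where you left off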
TL;DR
If the models are initialized with the exact same weights, then the outputs at the end of the same number of training epochs will be the same. If they are randomly initialized, the outputs will only vary slightly.
If the operations you are doing are entirely deterministic, then yes. An epoch is implemented as the iteration count of a for loop around your training algorithm; you can see this in PyTorch implementations.
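As a minimal PyTorch-style sketch of that loop (model, optimizer, criterion, train_loader, and num_epochs are assumed to exist; the seed only guarantees identical runs if every other operation is deterministic):
import torch

torch.manual_seed(0)                      # identical initialization and shuffling across runs
for epoch in range(num_epochs):           # an "epoch" is just one pass of this outer loop
    for inputs, targets in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()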
Typically no, the final weights will not be the same, because the optimiser accrues its own state (e.g. momentum buffers) during training. You will need to save that state too to truly resume from where you left off.
See the PyTorch documentation on saving and resuming here. But this concept is not limited to the PyTorch framework.
Specifically:
It is important to also save the optimizer’s state_dict, as this contains buffers and parameters that are updated as the model trains.
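A minimal sketch of that pattern, following the PyTorch docs (the "checkpoint.pt" path and the surrounding model, optimizer, and epoch objects are assumptions):
import torch

# Save the model and optimizer state together so training can resume exactly.
torch.save({
    'epoch': epoch,
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
}, 'checkpoint.pt')

# Later, to resume:
checkpoint = torch.load('checkpoint.pt')
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
start_epoch = checkpoint['epoch'] + 1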
I am trying to train a ResNet network using the Keras backend in TensorFlow. The feed dictionary for each batch update is written as:
feed_dict = {x: X_train[indices[start:end]], y: Y_train[indices[start:end]], keras.backend.learning_phase(): 1}
I am using the Keras backend (keras.backend.set_session(sess)) because the original ResNet network is defined with Keras. As the model contains dropout and batch_norm layers, it requires a learning phase to distinguish between training and testing.
I observe that whenever I set keras.backend.learning_phase(): 1, the model train/test accuracy hardly rises above 10%. In contrast, if the learning phase is not set, i.e. the feed dictionary is defined as:
feed_dict = {x: X_train[indices[start:end]], y: Y_train[indices[start:end]]}
then, as expected, the model accuracy keeps increasing with epochs in the standard way.
I would appreciate it if someone could clarify whether the learning phase is unnecessary or whether something else is wrong. The Keras 2.0 documentation seems to suggest using the learning phase with dropout and batch_norm layers.
Set the learning phase to 1 (training):
from keras import backend as K
K.set_learning_phase(1)
Then set training=False for all batch normalization layers:
for layer in model.layers:
    if layer.name.startswith('bn'):
        layer.call(layer.input, training=False)
I saw a sample of code (too big to paste here) where the author used model.train_on_batch(in, out) instead of model.fit(in, out). The official documentation of Keras says:
Single gradient update over one batch of samples.
But I don't get it. Is it the same as fit(), but instead of doing many feed-forward and backprop passes, it does a single one? Or am I wrong?
Yes, train_on_batch performs a single gradient update on a single batch, once.
fit, by contrast, trains on many batches over many epochs (each batch causes one update of the weights).
The idea of using train_on_batch is probably to do more things yourself between each batch.
It is used when we want to inspect things or make custom changes after each batch of training.
A more precise use case is with GANs.
You have to update the discriminator, but while updating the GAN network you have to keep the discriminator untrainable. So you first train the discriminator, and then train the GAN while keeping the discriminator untrainable, as sketched below.
See this for more detail:
https://medium.com/datadriveninvestor/generative-adversarial-network-gan-using-keras-ce1c05cfdfd3
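A minimal sketch of that alternating setup (the discriminator and gan models and the batch arrays are hypothetical; note that in Keras the trainable flag is captured when a model is compiled):
# Compile the discriminator while it is trainable.
discriminator.compile(optimizer='adam', loss='binary_crossentropy')
# Freeze it inside the combined model, then compile that model.
discriminator.trainable = False
gan.compile(optimizer='adam', loss='binary_crossentropy')

# Inside the training loop, alternate the two updates:
d_loss = discriminator.train_on_batch(x_batch, d_labels)  # discriminator still updates here
g_loss = gan.train_on_batch(noise_batch, g_labels)        # discriminator stays frozen here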
The fit method trains the model on the data you give it, but because of memory limitations (especially GPU memory) we can't train on a large number of samples at once, so the data is divided into small pieces called mini-batches (or just batches). The fit method of Keras models does this splitting for you and passes through all the data you gave it.
However, sometimes we need a more complicated training procedure; for example, we may want to randomly select new samples to put in the batch buffer each epoch (e.g. GAN training, Siamese CNN training, ...). In those cases we don't use the fancy and simple fit method; instead we use the train_on_batch method, as sketched below. To use this method we generate a batch of inputs and a batch of outputs (labels) in each iteration and pass them to it; it trains the model on all the samples in the batch at once and returns the loss and other metrics computed with respect to the batch samples.
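A minimal sketch of such a custom loop (model is a compiled Keras model; sample_batch and num_steps are hypothetical placeholders for your own batch-selection logic):
for step in range(num_steps):
    x_batch, y_batch = sample_batch()                # your own batch selection
    loss = model.train_on_batch(x_batch, y_batch)    # one gradient update on this batch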
I'm building a CNN model using TensorFlow, without any frontend API such as Keras. I'm creating a VGG-16 model, using the pre-trained weights, and I want to fine-tune the last layers for my purpose.
Following the tutorial here, http://cv-tricks.com/tensorflow-tutorial/training-convolutional-neural-network-for-image-classification/
I re-created the training script and modified it to my requirements. However, the model does not train: the training accuracy is stuck at 50.00% and the validation accuracy forms a repeating pattern of numbers.
Attached is the screenshot of the same.
I have been stuck on this for days now and can't seem to find the error. Any help is appreciated.
The code is pretty long and hence here is the gist file for the same
Your cross entropy is wrong: you are comparing your logits with the softmax of your logits.
This:
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits=layer_fc2,
labels=y_pred)
Should be:
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits=layer_fc2,
labels=y_true)
Some things to note: I would not train on a data point and then evaluate on that same data point, as your training accuracy will probably be biased by doing so. Another point to note is that tf.argmax(tf.softmax(logits)) is the same as tf.argmax(logits), since softmax preserves the ordering of the logits.
Recently I used Keras to train a network to classify pictures, using model.fit_generator() to fit my model. fit_generator() automatically runs the model on the validation data and returns a validation accuracy when an epoch finishes.
But an odd thing happened: when I used the model to predict on the validation data and compared the results with the correct classes, the validation accuracy was lower than what I got from fit_generator().
I have two assumptions:
1. I use a generator to get data from a dictionary, so I assume that within a single epoch the generator may repeatedly fetch data that the model already fits well, so the accuracy may come out higher.
2. Keras may use some tricks or preprocess the data when doing validation, thus enhancing the accuracy.
I tried looking through the Keras source code and documentation, but nothing helped. I would be very thankful if anyone could give me some advice about this problem.