I have two questions:
I am trying to train a convolutional neural network initialized with pre-trained weights (the network contains batch normalization layers as well), taking reference from here. Before training, I want to calculate a validation error using loss_fn = torch.nn.MSELoss().cuda().
In the reference, the author uses model.eval() before calculating the validation error. But with that, my CNN's output is far off from what it should be; when I comment out model.eval(), the output is good (what it should be with the pre-trained weights). What could be the reason behind this? I have read in many posts that model.eval() should be used before testing the model and model.train() before training it.
While calculating the validation error with pre-trained weights and the above-mentioned loss function, what should the batch size be? Shouldn't it be 1, since I want the output for each of my inputs, the error against the ground truth, and finally the average over all results? If I use a higher batch size, the error increases. So the question is: can I use a higher batch size, and if so, what is the right way? In the code below I compute err = float(loss_local) / num_samples, but I observed that even without averaging, i.e. err = float(loss_local), the error differs for different batch sizes. I am doing this without model.eval() right now.
batch_size = 1
data_path = 'path_to_data'
dtype = torch.FloatTensor
weight_file = 'path_to_weight_file'

val_loader = torch.utils.data.DataLoader(NyuDepthLoader(data_path, val_lists),
                                         batch_size=batch_size, shuffle=True, drop_last=True)

model = Model(batch_size)
model.load_state_dict(load_weights(model, weight_file, dtype))
loss_fn = torch.nn.MSELoss().cuda()

# model.eval()
loss_local = 0
num_samples = 0
with torch.no_grad():
    for input, depth in val_loader:
        input_var = Variable(input.type(dtype))
        depth_var = Variable(depth.type(dtype))

        output = model(input_var)

        input_rgb_image = input_var[0].data.permute(1, 2, 0).cpu().numpy().astype(np.uint8)
        input_gt_depth_image = depth_var[0][0].data.cpu().numpy().astype(np.float32)
        pred_depth_image = output[0].data.squeeze().cpu().numpy().astype(np.float32)

        print(format(type(depth_var)))

        pred_depth_image_resize = cv2.resize(pred_depth_image, dsize=(608, 456), interpolation=cv2.INTER_LINEAR)
        target_depth_transform = transforms.Compose([flow_transforms.ArrayToTensor()])
        pred_depth_image_tensor = target_depth_transform(pred_depth_image_resize)

        # both inputs to loss_fn are 'torch.Tensor'
        loss_local += loss_fn(pred_depth_image_tensor, depth_var)

        num_samples += 1
        print('num_samples {}'.format(num_samples))

err = float(loss_local) / num_samples
print('val_error before train:', err)
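As an aside on the second question: one way to make the accumulated error independent of the batch size is to sum the squared errors and the element counts over the whole validation set and divide only once at the end, instead of averaging per-batch means. A rough sketch under the assumption that the model output and the ground-truth depth have matching shapes (otherwise resize the prediction first, as above):

sum_sq_err = 0.0
n_elements = 0
with torch.no_grad():
    for input, depth in val_loader:
        output = model(input.type(dtype))
        # reduction='sum' returns the total squared error of this batch
        sum_sq_err += torch.nn.functional.mse_loss(output, depth.type(dtype), reduction='sum').item()
        n_elements += depth.numel()
err = sum_sq_err / n_elements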
What could be the reason behind it, as I have read in many posts that model.eval() should be used before testing the model and model.train() before training it?
Note: testing the model is called inference.
As explained in the official documentation:
Remember that you must call model.eval() to set dropout and batch normalization layers to evaluation mode before running inference. Failing to do this will yield inconsistent inference results.
So this code must be present once you load the model from a file and want to do inference:
# Model class must be defined somewhere
model = torch.load(PATH)
model.eval()
This is because dropout works as regularization to prevent overfitting during training; it is not needed for inference. The same goes for the batch norm layers, which in eval mode use the running (population) statistics instead of the current batch's statistics.
When you use eval(), this just sets the module's training flag to False, and it only affects certain types of modules, in particular Dropout and BatchNorm.
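To make the difference concrete, here is a minimal, self-contained sketch (a toy layer, not the asker's model) showing that a BatchNorm layer produces different outputs in train() and eval() mode, because eval() switches it to the running statistics accumulated during training:

import torch

bn = torch.nn.BatchNorm2d(3)
x = torch.randn(4, 3, 8, 8)

bn.train()
out_train = bn(x)   # normalizes with this batch's mean/var and updates the running stats

bn.eval()
out_eval = bn(x)    # normalizes with the running mean/var instead

print(torch.allclose(out_train, out_eval))  # False in general
print(bn.training)                          # False after eval()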
Related
While I'm able to understand how to use model.fit(x_train, y_train), I can't figure out how to make predictions on new data using TensorFlow's gradient tape. My GitHub repository with runnable code (up to an error) can be found here. What is currently working is that I get the trained model "network_output"; however, it appears that with gradient tape, argmax is being used on the model itself, whereas I'm used to model.fit() taking the test data as an input:
network_output = trained_network(input_images,input_number)
preds = np.argmax(network_output, axis=1)
Where "input_images" is an ndarray: (20,3,3,1) and "input_number" is an ndarray: (20,5).
Now I'm taking network_output as the trained model and would like to use it to predict similarly typed data of test_images, and test_number respectively.
The error 'tensorflow.python.framework.ops.EagerTensor' object has no attribute 'predict' here:
predicted_number = network_output.predict(test_images)
This is because I don't know how to use the tape to make predictions. However, once the prediction works, I would guess I can compare the resulting "predicted_number" against "test_number", as would usually be done when using the model.fit method.
acc = 0
for i in range(len(test_images)):
    if predicted_number[i] == test_number[i]:
        acc += 1
print("Accuracy: ", acc / len(test_images) * 100, "%")
In order to obtain predictions I usually iterate through the batches manually, like this:
predictions = []
for batch in range(num_batch):
    logits = trained_network(x_test[batch * batch_size: (batch + 1) * batch_size], training=False)
    # first obtain probabilities
    # (if the last layer of the network has no activation, otherwise skip the softmax here)
    prob = tf.nn.softmax(logits)
    # put the predictions for all batches back together
    predictions.extend(tf.argmax(input=prob, axis=1))
If you don't have a lot of data you can skip the loop; this is faster than using predict because you directly invoke the __call__ method of the model:
logits = trained_network(x_test, training=False)
prob = tf.nn.softmax(logits)
predictions = tf.argmax(input=prob, axis=1)
Finally, you could also use predict. In this case the batches are handled automatically. It is easier to use when you have lots of data since you don't have to create a loop to iterate through the batches. The result is a numpy array of predictions. It can be used like this:
predictions = trained_network.predict(x_test) # you can set a batch_size if you want
What you're doing wrong is this part:
network_output = trained_network(input_images,input_number)
predicted_number = network_output.predict(test_images)
You have to call predict directly on your model trained_network.
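For example, tying this back to the accuracy computation in the question (a sketch only; if your model really takes two inputs, pass them as a list such as [test_images, test_number], and this assumes test_number holds one-hot labels):

probs = trained_network.predict(test_images)        # call predict on the model itself
predicted_number = np.argmax(probs, axis=1)
true_number = np.argmax(test_number, axis=1)
print("Accuracy:", np.mean(predicted_number == true_number) * 100, "%")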
I want to train Google's VGGish network (Hershey et al. 2017) from scratch to predict classes specific to my own audio files.
For this I am using the vggish_train_demo.py script available on their GitHub repo, which uses TensorFlow. I've been able to modify the script to extract mel-spectrogram features from my own audio by changing the _get_examples_batch() function, and then train the model on the output of this function. This runs to completion and prints the loss at each epoch.
However, I've been unable to figure out how to get this trained model to generate predictions from new data. Can this be done with changes to the vggish_train_demo.py script?
For anyone who stumbles across this in the future, I wrote this script which does the job. You must save log-mel specs for the train and test data in the arrays X_train, y_train, X_test, y_test. X_train/X_test are arrays of the (n, 96, 64) features and y_train/y_test are arrays of shape (n, _NUM_CLASSES) for two classes, where n = the number of 0.96s audio segments and _NUM_CLASSES = the number of classes used.
See the function definition statement for more info, and the VGGish GitHub repo in my original post:
### Run the network and save the predictions and accuracy at each epoch
### Train NN, output results
r"""This uses the VGGish model definition within a larger model which adds two
layers on top, and then trains this larger model.

We input log-mel spectrograms (X_train) calculated above with associated labels
(y_train), and feed the batches into the model. Once the model is trained, it
is then executed on the test log-mel spectrograms (X_test), and the accuracy is
output, alongside a .csv file with the predictions for each 0.96s chunk and their
true class."""

def main(X):
    with tf.Graph().as_default(), tf.Session() as sess:
        # Define VGGish.
        embeddings = vggish_slim.define_vggish_slim(training=FLAGS.train_vggish)

        # Define a shallow classification model and associated training ops on top
        # of VGGish.
        with tf.variable_scope('mymodel'):
            # Add a fully connected layer with 100 units. Add an activation function
            # to the embeddings since they are pre-activation.
            num_units = 100
            fc = slim.fully_connected(tf.nn.relu(embeddings), num_units)

            # Add a classifier layer at the end, consisting of parallel logistic
            # classifiers, one per class. This allows for multi-class tasks.
            logits = slim.fully_connected(
                fc, _NUM_CLASSES, activation_fn=None, scope='logits')
            tf.sigmoid(logits, name='prediction')

            linear_out = slim.fully_connected(
                fc, _NUM_CLASSES, activation_fn=None, scope='linear_out')
            logits = tf.sigmoid(linear_out, name='logits')

        # Add training ops.
        with tf.variable_scope('train'):
            global_step = tf.train.create_global_step()

            # Labels are assumed to be fed as a batch of multi-hot vectors, with
            # a 1 in the position of each positive class label, and 0 elsewhere.
            labels_input = tf.placeholder(
                tf.float32, shape=(None, _NUM_CLASSES), name='labels')

            # Cross-entropy label loss.
            xent = tf.nn.sigmoid_cross_entropy_with_logits(
                logits=logits, labels=labels_input, name='xent')
            loss = tf.reduce_mean(xent, name='loss_op')
            tf.summary.scalar('loss', loss)

            # We use the same optimizer and hyperparameters as used to train VGGish.
            optimizer = tf.train.AdamOptimizer(
                learning_rate=vggish_params.LEARNING_RATE,
                epsilon=vggish_params.ADAM_EPSILON)
            train_op = optimizer.minimize(loss, global_step=global_step)

        # Initialize all variables in the model, and then load the pre-trained
        # VGGish checkpoint.
        sess.run(tf.global_variables_initializer())
        vggish_slim.load_vggish_slim_checkpoint(sess, FLAGS.checkpoint)

        # The training loop.
        features_input = sess.graph.get_tensor_by_name(
            vggish_params.INPUT_TENSOR_NAME)

        accuracy_scores = []
        for epoch in range(num_epochs):  # FLAGS.num_batches
            epoch_loss = 0
            i = 0
            while i < len(X_train):
                start = i
                end = i + batch_size
                batch_x = np.array(X_train[start:end])
                batch_y = np.array(y_train[start:end])
                _, c = sess.run([train_op, loss],
                                feed_dict={features_input: batch_x, labels_input: batch_y})
                epoch_loss += c
                i += batch_size

            # Print the epoch number and loss.
            print('Epoch', epoch + 1, 'completed out of', num_epochs, ', loss:', epoch_loss)

            # If these lines are left here, the model is evaluated on the test data
            # every epoch and the accuracy is printed (note this adds a small
            # computational cost). tf.argmax returns the index of the largest value
            # in each row, so a prediction is correct when its argmax matches the
            # argmax of the label vector.
            correct = tf.equal(tf.argmax(logits, 1), tf.argmax(labels_input, 1))
            accuracy = tf.reduce_mean(tf.cast(correct, 'float'))  # cast booleans to float
            accuracy1 = accuracy.eval({features_input: X_test, labels_input: y_test})
            accuracy_scores.append(accuracy1)
            print('Accuracy:', accuracy1)  # TF feeds X_test through the model for us here.

            # Save predictions for the test data.
            predictions_sigm = logits.eval(feed_dict={features_input: X_test})  # not really _sigm, change back later
            # print(predictions_sigm)  # shows the table of predictions, meaningless if saving at each epoch
            test_preds = pd.DataFrame(predictions_sigm, columns=col_names)  # convert predictions to a DataFrame
            true_class = np.argmax(y_test, axis=1)  # the true class of each segment
            test_preds['True class'] = true_class  # add the true class to the DataFrame
            # Save a .csv file of the test-data predictions.
            # NB: the header is not saved when using np.savetxt.
            np.savetxt("/content/drive/MyDrive/..." + "Epoch_" + str(epoch + 1) + "_Accuracy_" + str(accuracy1),
                       test_preds.values, delimiter=",")

if __name__ == '__main__':
    tf.app.run()
# The message 'An exception has occurred, use %tb to see the full traceback.' will
# appear when running this in a notebook; fear not, this just means the script has
# finished (tf.app.run() exits the interpreter once main returns).
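As a small follow-up sketch: to generate predictions for completely new audio (rather than X_test), you can evaluate the same logits tensor on new log-mel patches inside the session, after the training loop. Here new_specs is a hypothetical array of (n, 96, 64) patches prepared the same way as X_train/X_test:

new_probs = logits.eval(feed_dict={features_input: new_specs})  # per-class sigmoid scores
new_classes = np.argmax(new_probs, axis=1)                      # predicted class per 0.96s chunk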
I know this has been previously discussed, but I did not find a concrete answer, and some answers did not work when I tried them. The case is simple: I have a model that uses batch norm. The training accuracy reported by model.fit(training_data) is above 0.9 (it consistently increases, and the loss decreases), but after training, model.evaluate(training_data) (note: the same data) returns 0.09. Predictions are really bad too; the accuracy is equally low if I compute it manually from model.predict(training_data). I know the difference between training and testing time in batch norm, and I know some difference should be expected, but a drop from 0.9 to 0.09 seems just wrong (and the model is completely unusable). I tried some solutions from other threads:
use batch_size in .evaluate to be the same as in .fit: made no difference
set tf.keras.backend.set_learning_phase(0): got a message saying it is now deprecated, and it made no difference
set all batch norm layers to layer.trainable=False before .predict and .evaluate: made no difference
If I remove the batch norm layers, the report from model.fit(training_data) coincides with model.evaluate(training_data), but then training makes no progress (results are consistent but bad), so I need them.
Is this a major bug in TF 2.6?
Update: I also tested TF 2.5; the result is the same.
Sample code (omitting irrelevant code, like data reading and pre-processing):
### model definition
class CLS_BERT_Embedding(tf.keras.Model):
    """Will only use the CLS token"""
    def __init__(self, bert_trainable=False, number_filters=50, FNN_units=512,
                 number_clases=2, dropout_rate=0.1, name="dcnn"):
        super(CLS_BERT_Embedding, self).__init__(name)
        self.checkpoint_id = "CLS_BERT_Embedding_bn_3fc_{}filters_{}fc_units_berttrainable{}".format(
            number_filters, FNN_units, bert_trainable)

        # trainable=False so we don't fine-tune bert, just use it as an embedding layer
        self.bert_layer = hub.KerasLayer("https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/1",
                                         trainable=bert_trainable,
                                         input_shape=(3, 376))

        self.dense_1 = layers.Dense(units=FNN_units, activation="relu")
        self.bn1 = layers.BatchNormalization()

        self.dense_2 = layers.Dense(units=FNN_units, activation="relu")
        self.bn2 = layers.BatchNormalization()

        self.dense_3 = layers.Dense(units=FNN_units, activation="relu")
        self.bn3 = layers.BatchNormalization()

        self.dropout = layers.Dropout(rate=dropout_rate)

        if number_clases == 2:
            self.last_dense = layers.Dense(units=1, activation="sigmoid")
        else:
            self.last_dense = layers.Dense(units=number_clases, activation="softmax")

    def get_bert_embeddings(self, all_tokens):
        CLS_embedding, embeddings = self.bert_layer([all_tokens[:, 0, :],
                                                     all_tokens[:, 1, :],
                                                     all_tokens[:, 2, :]])
        return CLS_embedding, embeddings

    def call(self, inputs, training):
        CLS_embedding, x_seq = self.get_bert_embeddings(inputs)

        x = self.dense_1(CLS_embedding)
        x = self.bn1(x, training)
        x = self.dense_2(x)
        x = self.bn2(x, training)
        x = self.dense_3(x)
        x = self.bn3(x, training)

        output = self.last_dense(x)
        return output
#### config and hyper-params
NUMBER_FILTERS = 1024
FNN_UNITS = 2048
BERT_TRAINABLE = False
NUMBER_CLASSES = len(tokenizer.vocab)
DROPOUT_RATE = 0.2
NUMBER_EPOCHS = 3
LR = 0.001
DEVICE = '/GPU:0'
#### optimization definition
with tf.device(DEVICE):
    model = CLS_BERT_Embedding(
        bert_trainable=BERT_TRAINABLE,
        number_filters=NUMBER_FILTERS,
        FNN_units=FNN_UNITS,
        number_clases=NUMBER_CLASSES,
        dropout_rate=DROPOUT_RATE)

    if NUMBER_CLASSES == 2:
        loss = "binary_crossentropy"
        metrics = ["accuracy"]
    else:
        loss = "sparse_categorical_crossentropy"
        metrics = ["sparse_categorical_accuracy"]

    optimizer = tf.keras.optimizers.Adam(learning_rate=LR)
    loss = "sparse_categorical_crossentropy"

    model.compile(loss=loss, optimizer=optimizer, metrics=metrics)
### training
with tf.device(DEVICE):
    model.fit(train_dataset,
              batch_size=BATCH_SIZE,
              epochs=NUMBER_EPOCHS,
              shuffle=True,
              callbacks=[MyCustomCallback(),
                         tf.keras.callbacks.ReduceLROnPlateau(monitor="loss", patience=5),
                         tensorboard, lr_tensorboard])

### testing
train_results = model.evaluate(train_dataset, batch_size=BATCH_SIZE)
print(train_results)
Try running inference without adjusting the trainable flag at all, and verify that self.bn1.trainable is True. Then run forward prop by calling the model as a callable on each batch of your training data, but with training=True, evaluating each time. That would be something like:
for idx, d in enumerate(train_dataset):
    _ = model(d[0], training=True)   # forward pass in training mode updates the BN moving statistics
    model.evaluate(d)
    if idx > 100:
        break
If your loss starts dropping, then this is an example of your batch norm moving statistics not updating fast enough, which is plausible given that the BNs are not trained but your BERT model/layer is. If not, ignore the rest of this, because you may have a different issue.
If that's the case, you have two options. One is to keep calling the model like this to let the BN moving statistics stabilize.
The other is to statistically analyze the output of your BERT layer (get the mean and variance) and directly update the BN's moving-statistic weights. Probably the first BN is sufficient, given that your later Dense layers are Xavier Glorot initialized, but since you are using ReLU you might also try Kaiming He initialization on them.
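A minimal sketch of that second option (hypothetical variable names; it assumes you can first push a representative batch of tokenized inputs, sample_tokens, through the BERT and first Dense layer):

# Measure the statistics of the activations feeding the first BN layer,
# then write them into its moving_mean / moving_variance variables.
cls_embeddings, _ = model.get_bert_embeddings(sample_tokens)
pre_bn = model.dense_1(cls_embeddings)
model.bn1.moving_mean.assign(tf.reduce_mean(pre_bn, axis=0))
model.bn1.moving_variance.assign(tf.math.reduce_variance(pre_bn, axis=0))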
I agree with @Yaoshiang: it is likely that the internal statistics (moving average, moving variance) do not coincide with the per-batch mean and variance, hence a different normalization in the BN layer at training and at test time. Thinking about it, if we use the same batch size for training and testing, then we can keep training=True at test time without it being a real problem (when using predict or evaluate). Otherwise, we can force the training to use the moving average and moving variance for the normalization, rather than the per-batch mean and variance, while still estimating beta and gamma. (This implies a minor modification of the BatchNormalization class.)
I am getting different results when I run model.evaluate in TensorFlow more than once on the same validation set.
The model includes data augmentation layers, an EfficientNetB0 baseline, and a GlobalAveragePooling layer (see below). I am loading the validation dataset using a tf.data pipeline built from tensor slices of a dataframe, and it is not being shuffled, so the order is always the same.
def get_custom_model(input_shape, saved_model_path=None, training_base_model=True):
    input_layer = Input(shape=input_shape)
    data_augmentation = RandomFlip('horizontal')(input_layer, training=False)
    data_augmentation = RandomRotation(factor=(-0.2, 0.2))(data_augmentation, training=False)
    data_augmentation = RandomZoom(height_factor=(-0.2, 0.2))(data_augmentation, training=False)
    data_augmentation = RandomCrop(width=input_shape[0], height=input_shape[1])(data_augmentation, training=False)

    baseline_model = EfficientNetB0(include_top=False, weights='imagenet')
    baseline_model.trainable = training_base_model  # Added for bsg hypertuning
    baseline_output = baseline_model(data_augmentation, training=training_base_model)
    baseline_output = GlobalAveragePooling2D()(baseline_output)

    attributes_output = Dense(units=228, activation='sigmoid', name='attributes_output')(baseline_output)
    model = Model(inputs=[input_layer], outputs=[attributes_output])

    # Load weights
    if saved_model_path != None:
        model.load_weights(saved_model_path)  # .expect_partial()

    return model
I am aware that if I trained the model again the results might be different, because some layers are initialized with random weights, but I expected the evaluation of the same model to be identical. I am calling get_custom_model with the same saved_model_path every time, so the model always loads the same previously saved weights.
The metrics I am comparing, and which differ between runs, are loss, Precision, and Recall, in case that is relevant. The optimizer is rmsprop and the loss is BinaryCrossentropy. Also, I have tried setting training_base_model to False and the metrics are much poorer (almost as if the weights were random).
PS: During training I was using the same validation set to compute the validation metrics and save the best weights, but when I load those best weights again the results are not the same. For instance, I can get a Precision of 81.28% during validation in a training epoch and then 57% when loading those weights and running model.evaluate().
I've trained the following model for some time series in Keras:
input_layer = Input(batch_shape=(56, 3864))
first_layer = Dense(24, input_dim=28, activation='relu',
                    activity_regularizer=None,
                    kernel_regularizer=None)(input_layer)
first_layer = Dropout(0.3)(first_layer)
second_layer = Dense(12, activation='relu')(first_layer)
second_layer = Dropout(0.3)(second_layer)
out = Dense(56)(second_layer)
model_1 = Model(input_layer, out)
Then I defined a new model with the trained layers of model_1 and added dropout layers with a different rate, drp, to it:
input_2 = Input(batch_shape=(56, 3864))
first_dense_layer = model_1.layers[1](input_2)
first_dropout_layer = model_1.layers[2](first_dense_layer)
new_dropout = Dropout(drp)(first_dropout_layer)
snd_dense_layer = model_1.layers[3](new_dropout)
snd_dropout_layer = model_1.layers[4](snd_dense_layer)
new_dropout_2 = Dropout(drp)(snd_dropout_layer)
output = model_1.layers[5](new_dropout_2)
model_2 = Model(input_2, output)
Then I'm getting the prediction results of these two models as follows:
result_1 = model_1.predict(test_data, batch_size=56)
result_2 = model_2.predict(test_data, batch_size=56)
I was expecting to get completely different results because the second model has new dropout layers and these two models are different (IMO), but that's not the case. Both are generating the same result. Why is that happening?
As I mentioned in the comments, the Dropout layer is turned off in the inference phase (i.e. test mode), so when you use model.predict() the Dropout layers are not active. However, if you would like to have a model that uses Dropout both in the training and the inference phase, you can pass the training argument when calling it, as suggested by François Chollet:
# ...
new_dropout = Dropout(drp)(first_dropout_layer, training=True)
# ...
Alternatively, if you have already trained your model and now want to use it in inference mode while keeping the Dropout layers (and possibly other layers which behave differently in the training/inference phase, such as BatchNormalization) active, you can define a backend function that takes the model's inputs as well as the Keras learning phase:
from keras import backend as K
func = K.function(model.inputs + [K.learning_phase()], model.outputs)
# to use it pass 1 to set the learning phase to training mode
outputs = func([input_arrays] + [1.])
Your question has a simple solution in recent versions of TensorFlow: you can set the training argument of the call method to True.
You can run code like this:
model(input, training=True)
With training=True, TensorFlow keeps the Dropout layer active even when you are running inference.
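For example (a sketch in TF 2.x eager mode, not specific to the asker's model), repeated stochastic forward passes with dropout kept on are how Monte Carlo dropout style predictions are usually obtained:

preds = np.stack([model(input, training=True).numpy() for _ in range(10)])
mean_pred = preds.mean(axis=0)   # average prediction over the 10 passes
std_pred = preds.std(axis=0)     # the spread gives a rough uncertainty estimate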
As there are already some working code solutions above, I will simply add a few more details regarding dropout during inference to prevent confusion.
Based on the original paper, Dropout layers play the role of randomly turning off (zeroing the outputs of) neuron units during training to reduce overfitting. However, once we finish training and start testing the model, we do not 'touch' any neurons; all units take part in the decision when inferencing. Without correction, this would make the activations larger than what the following layers saw during training, when some units were dropped. To prevent this, a scaling factor is applied to balance the network: to be precise, if a unit is retained with probability p during training, the outgoing weights of that unit are multiplied by p at prediction time (most frameworks instead use inverted dropout and scale the kept activations by 1/p during training, which achieves the same balance).
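A tiny sketch illustrating that scaling behaviour with tf.nn.dropout, which uses inverted dropout (the 1/keep-probability scaling happens at training time, so nothing needs to change at inference):

import tensorflow as tf

x = tf.ones([1, 10])
dropped = tf.nn.dropout(x, rate=0.5)   # surviving elements are scaled by 1/(1 - 0.5) = 2
print(dropped)                         # roughly half the entries are 0.0, the rest are 2.0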