How to calculate the average of training accuracy in Keras? - python

I am trying to calculate the average of the training accuracy of my model, which is written with Keras and trained for 200 epochs. So at the end I want to sum the training accuracy of each epoch with the previous ones and divide the total by 200.
Here is my code:
num = 200
total_sum = 0
for n in range(num):
    avg_train = np.array(model.fit(x_train, y_train, epochs=200, batch_size=64, verbose=2))
    total_sum = avg_train + total_sum

avg = total_sum / num
score = model.evaluate(x_test, y_test, verbose=2)
print(score)
print('the average is', avg)
I am trying to store each accuracy value in a NumPy array so that I can use it in the summation, but it gives me the following error:
Traceback (most recent call last):
File "G:\Master Implementation\MLPADAM.py", line 87, in <module>
total_sum = avg_train + total_sum
TypeError: unsupported operand type(s) for +: 'History' and 'int'

There are several issues with your question...
To start with, your code will fit a model with 200 epochs 200 times, i.e. a total of 200*200 = 40,000 epochs.
Moreover, since model.fit in Keras is run incrementally, each call of model.fit in your loop will continue training from where the previous iteration stopped, so effectively at the end you will indeed have a model fitted with 40,000 epochs.
Assuming that this is not what you are trying to do, and that you simply want the average accuracy over your training, the answer is to use the History object returned by model.fit; from the model.fit docs:
Returns
A History object. Its History.history attribute is a record of training loss values and metrics values at successive epochs, as well as validation loss values and validation metrics values (if applicable).
So, here is a quick demonstration with MNIST and only 5 epochs (and forget the for loop!):
# your model definition
# your model.compile()

batch_size = 128
epochs = 5

hist = model.fit(x_train, y_train,
                 batch_size=batch_size,
                 epochs=epochs,
                 verbose=1,
                 validation_data=(x_test, y_test)  # optional
                 )
# output
Train on 60000 samples, validate on 10000 samples
Epoch 1/5
60000/60000 [==============================] - 76s - loss: 0.3367 - acc: 0.8974 - val_loss: 0.0765 - val_acc: 0.9742
Epoch 2/5
60000/60000 [==============================] - 73s - loss: 0.1164 - acc: 0.9656 - val_loss: 0.0516 - val_acc: 0.9835
Epoch 3/5
60000/60000 [==============================] - 74s - loss: 0.0866 - acc: 0.9741 - val_loss: 0.0411 - val_acc: 0.9863
Epoch 4/5
60000/60000 [==============================] - 73s - loss: 0.0730 - acc: 0.9781 - val_loss: 0.0376 - val_acc: 0.9871
Epoch 5/5
60000/60000 [==============================] - 73s - loss: 0.0639 - acc: 0.9810 - val_loss: 0.0354 - val_acc: 0.9881
hist.history is a dictionary containing the value of the metrics for each epoch:
hist.history
# result:
{'acc': [0.8973833333969117,
0.9656000000635783,
0.9740500000317891,
0.9780500000635783,
0.9810333334604899],
'loss': [0.3367467244784037,
0.11638248273332914,
0.08664042545557023,
0.07301943883101146,
0.06391783343354861],
'val_acc': [0.9742, 0.9835, 0.9863, 0.9871, 0.9881],
'val_loss': [0.07650674062222243,
0.051606363496184346,
0.04107686730045825,
0.03761903735231608,
0.03537947320453823]}
To get the training accuracy per epoch:
hist.history['acc']
# result:
[0.8973833333969117,
0.9656000000635783,
0.9740500000317891,
0.9780500000635783,
0.9810333334604899]
and the average value is simply
np.mean(hist.history['acc']) # numpy assumed imported as np
# 0.9592233334032695
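Note that the metric key differs across Keras versions ('acc' in older releases, 'accuracy' in newer ones), so a small defensive lookup can help; hist is the History object from above:
acc_key = 'acc' if 'acc' in hist.history else 'accuracy'   # key name depends on the Keras version
print('average training accuracy:', np.mean(hist.history[acc_key]))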

Related

CNN Model not training on whole training set in epochs [duplicate]

This question already has answers here:
TensorFlow Only running on 1/32 of the Training data provided [duplicate] (1 answer)
Keras not training on entire dataset (3 answers)
Closed 1 year ago.
# Resizing the images to make them suitable for applying convolutions
import numpy as np
img_size = 28
X_trainr = np.array(X_train).reshape(-1, img_size, img_size, 1)
X_testr = np.array(X_test).reshape(-1, img_size, img_size, 1)

# Model compilation
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
model.fit(X_trainr, y_train, epochs=5, validation_split=0.2)  # training the model
I loaded the MNIST dataset for a digit-recognition model and split it into training and test sets. Then I added a new dimension to the 3D training array and named the new array X_trainr, and compiled and fitted the model.
After fitting, the model was not using the whole training set (42000 samples); instead it showed only 1500. I also tried validation_split = 0.3, and then it showed 1313. Why is my model not training on the whole training set (42000 samples)?
Output
Epoch 1/5
1500/1500 [==============================] - 102s 63ms/step - loss: 0.2930 - accuracy: 0.9063 - val_loss: 0.1152 - val_accuracy: 0.9649
Epoch 2/5
1500/1500 [==============================] - 84s 56ms/step - loss: 0.0922 - accuracy: 0.9723 - val_loss: 0.0696 - val_accuracy: 0.9780
Epoch 3/5
1500/1500 [==============================] - 80s 53ms/step - loss: 0.0666 - accuracy: 0.9795 - val_loss: 0.0619 - val_accuracy: 0.9818
Epoch 4/5
1500/1500 [==============================] - 79s 52ms/step - loss: 0.0519 - accuracy: 0.9837 - val_loss: 0.0623 - val_accuracy: 0.9831
Epoch 5/5
1500/1500 [==============================] - 84s 56ms/step - loss: 0.0412 - accuracy: 0.9870 - val_loss: 0.0602 - val_accuracy: 0.9818
If X_trainr has 42,000 samples and you use a validation split of 0.2, then you are training on 33,600 samples. In model.fit you did not specify a batch size, so it defaults to 32. The number shown during training is NOT the number of samples; it is the number of training samples divided by the batch size, which should be 33600/32 = 1050. However, it shows 1500, so I suspect that X_trainr * 0.8 is actually 48,000, i.e. X_trainr = 48000/0.8 = 60,000 samples. Check the dimensions of X_trainr.
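To illustrate, a quick sanity check along those lines (names taken from the question; 32 is the documented default batch size of model.fit):
import math

batch_size = 32                                   # Keras default when batch_size is not passed to fit()
train_samples = int(len(X_trainr) * (1 - 0.2))    # samples left after validation_split=0.2
steps_per_epoch = math.ceil(train_samples / batch_size)
print(len(X_trainr), train_samples, steps_per_epoch)
# 1500 steps * 32 = 48000 training samples, i.e. X_trainr would hold 48000 / 0.8 = 60000 samples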

How to get the highest accuracy of a model after the training

I have run a model for 4 epochs using early stopping:
early_stopping = EarlyStopping(monitor='val_loss', mode='min', patience=2, restore_best_weights=True)
history = model.fit(trainX, trainY, validation_data=(testX, testY), epochs=4, callbacks=[early_stopping])
Epoch 1/4
812/812 [==============================] - 68s 13ms/sample - loss: 0.6072 - acc: 0.717 - val_loss: 0.554 - val_acc: 0.7826
Epoch 2/4
812/812 [==============================] - 88s 11ms/sample - loss: 0.5650 - acc: 0.807 - val_loss: 0.527 - val_acc: 0.8157
Epoch 3/4
812/812 [==============================] - 88s 11ms/sample - loss: 0.5456 - acc: 0.830 - val_loss: 0.507 - val_acc: 0.8244
Epoch 4/4
812/812 [==============================] - 51s 9ms/sample - loss: 0.658 - acc: 0.833 - val_loss: 0.449 - val_acc: 0.8110
The highest val_acc corresponds to the third epoch, and is 0.8244. However, the accuracy_score function will return the last val_acc value, which is 0.8110.
yhat = model.predict_classes(testX)
accuracy = accuracy_score(testY, yhat)
Is it possible to specify the epoch when calling predict_classes in order to get the highest accuracy (in this case, the one corresponding to the third epoch)?
It looks like early stopping isn't being triggered, because you're only training for 4 epochs and you've set early stopping to trigger when val_loss doesn't decrease over two epochs. If you look at your val_loss for each epoch, you can see it's still decreasing even on the fourth epoch.
So, simply put, your model just runs the full four epochs without early stopping ever firing, which is why it uses the weights learned in epoch 4 rather than the best ones in terms of val_acc.
To fix this, set monitor='val_acc' and run for a few more epochs. val_acc only starts to decrease after epoch 3, so early stopping won't trigger until epoch 5 at the earliest.
Alternatively, you could set patience=1 so it only checks a single epoch ahead.
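A sketch of that suggestion, using the names from the question and the tf.keras callback API (on newer Keras versions the monitored key is 'val_accuracy' rather than 'val_acc'):
from tensorflow.keras.callbacks import EarlyStopping

# monitor validation accuracy and keep the best weights when training stops
early_stopping = EarlyStopping(monitor='val_acc', mode='max',
                               patience=2, restore_best_weights=True)
history = model.fit(trainX, trainY, validation_data=(testX, testY),
                    epochs=10, callbacks=[early_stopping])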

Why is there bad accuracy on a dataset when it's used both for validation and training?

I trained a model with ResNet50 and got an amazing accuracy of 95% on the training set.
I used the same training set for validation, yet the validation accuracy seems very bad (below 0.05).
from keras.preprocessing.image import ImageDataGenerator

train_set = ImageDataGenerator(horizontal_flip=True, rescale=1./255, shear_range=0.2, zoom_range=0.2).flow_from_directory(
    data,
    target_size=(256, 256),
    classes=['airplane', 'airport', 'baseball_diamond',
             'basketball_court', 'beach', 'bridge',
             'chaparral', 'church', 'circular_farmland',
             'commercial_area', 'dense_residential', 'desert',
             'forest', 'freeway', 'golf_course', 'ground_track_field',
             'harbor', 'industrial_area', 'intersection', 'island',
             'lake', 'meadow', 'medium_residential', 'mobile_home_park',
             'mountain', 'overpass', 'parking_lot', 'railway', 'rectangular_farmland',
             'roundabout', 'runway'],
    batch_size=31)
from keras.applications import ResNet50
from keras.applications.resnet50 import preprocess_input
from keras import layers, Model

conv_base = ResNet50(
    include_top=False,
    weights='imagenet')

for layer in conv_base.layers:
    layer.trainable = False

x = conv_base.output
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dense(128, activation='relu')(x)
predictions = layers.Dense(31, activation='softmax')(x)
model = Model(conv_base.input, predictions)
# here you write the path for the train data, or, if you create your own validation data, you can test with that too.
# test_dir = ""
test_datagen = ImageDataGenerator(rescale=1. / 255)
test_generator = test_datagen.flow_from_directory(
    data,
    target_size=(256, 256),
    classes=['airplane', 'airport', 'baseball_diamond',
             'basketball_court', 'beach', 'bridge',
             'chaparral', 'church', 'circular_farmland',
             'commercial_area', 'dense_residential', 'desert',
             'forest', 'freeway', 'golf_course', 'ground_track_field',
             'harbor', 'industrial_area', 'intersection', 'island',
             'lake', 'meadow', 'medium_residential', 'mobile_home_park',
             'mountain', 'overpass', 'parking_lot', 'railway', 'rectangular_farmland',
             'roundabout', 'runway'],
    batch_size=1, shuffle=True)

model.compile(loss='categorical_crossentropy', optimizer='Adam', metrics=['accuracy'])
model.fit_generator(train_set,
                    steps_per_epoch=1488 // 31,
                    epochs=10,
                    verbose=True,
                    validation_data=test_generator,
                    validation_steps=test_generator.samples // 31)
Epoch 1/10
48/48 [==============================] - 27s 553ms/step - loss: 1.9631 - acc: 0.4825 - val_loss: 4.3134 - val_acc: 0.0208
Epoch 2/10
48/48 [==============================] - 22s 456ms/step - loss: 0.6395 - acc: 0.8212 - val_loss: 4.7584 - val_acc: 0.0833
Epoch 3/10
48/48 [==============================] - 23s 482ms/step - loss: 0.4325 - acc: 0.8810 - val_loss: 5.3852 - val_acc: 0.0625
Epoch 4/10
48/48 [==============================] - 23s 476ms/step - loss: 0.2925 - acc: 0.9153 - val_loss: 6.0963 - val_acc: 0.0208
Epoch 5/10
48/48 [==============================] - 23s 477ms/step - loss: 0.2275 - acc: 0.9341 - val_loss: 5.6571 - val_acc: 0.0625
Epoch 6/10
48/48 [==============================] - 23s 478ms/step - loss: 0.1855 - acc: 0.9489 - val_loss: 6.2440 - val_acc: 0.0208
Epoch 7/10
48/48 [==============================] - 23s 483ms/step - loss: 0.1704 - acc: 0.9543 - val_loss: 7.4446 - val_acc: 0.0208
Epoch 8/10
48/48 [==============================] - 23s 487ms/step - loss: 0.1828 - acc: 0.9476 - val_loss: 7.5198 - val_acc: 0.0417
What could be the reason?!
You have set up train_set and test_datagen differently; in particular, one is flipped and augmented (shear/zoom) where the other isn't. As I mentioned in my comment, if it's the same data it will have the same accuracy. You can only see whether a model is overfitting when you use validation correctly, i.e. with unseen data. Using the same data will always give the same accuracy for training and validation.
Not sure exactly what is wrong, but it is NOT an overfitting issue. It is clear your validation data (the same as the training data) is not going in correctly. For one thing, you set the validation batch size to 1, but you set the validation steps to validation_steps = test_generator.samples // 31. If test_generator.samples = 1488 then you have 48 steps, but with a batch size of 1 you will only validate 48 samples. You want to set the batch size and steps so that batch_size * validation_steps equals the total number of samples; that way you go through the validation set exactly once. I also recommend that for the test generator you set shuffle=False. Also, why do you bother entering all the class names? If your class directories are labeled 'airplane', 'airport', 'baseball_diamond', etc., then you don't need to define the classes explicitly; flow_from_directory will do that for you automatically. See the documentation below.
classes: Optional list of class subdirectories (e.g. ['dogs', 'cats']). Default: None. If not provided, the list of classes will be automatically inferred from the subdirectory names/structure under directory, where each subdirectory will be treated as a different class (and the order of the classes, which will map to the label indices, will be alphanumeric). The dictionary containing the mapping from class names to class indices can be obtained via the attribute class_indices.
Your training data is actually different from your validation data because you are using data augmentation in the training generator. That's OK; it may lead to a small difference between your training and validation accuracy, but your validation accuracy should be pretty close once you get the validation data to go in correctly.
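Putting these suggestions together, a sketch of the validation setup (names taken from the question; any batch size works as long as batch_size * validation_steps covers every sample exactly once):
test_datagen = ImageDataGenerator(rescale=1. / 255)
test_generator = test_datagen.flow_from_directory(
    data,
    target_size=(256, 256),
    batch_size=31,      # 31 divides the 1488 samples evenly
    shuffle=False)      # keep the order fixed for validation
print(test_generator.class_indices)   # classes inferred automatically from the subdirectory names

model.fit_generator(train_set,
                    steps_per_epoch=1488 // 31,
                    epochs=10,
                    validation_data=test_generator,
                    validation_steps=test_generator.samples // 31)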

EarlyStopping TensorFlow 2.0

I am running a code using Python 3.7.5 with TensorFlow 2.0 for MNIST classification.
I am using EarlyStopping from TensorFlow 2.0 and the callback I have for it is:
callbacks = [
tf.keras.callbacks.EarlyStopping(
monitor='val_loss', patience = 3,
min_delta=0.001
)
]
According to EarlyStopping - TensorFlow 2.0 page, the definition of min_delta parameter is as follows:
min_delta: Minimum change in the monitored quantity to qualify as an
improvement, i.e. an absolute change of less than min_delta, will
count as no improvement.
Here is the training log:
Train on 60000 samples, validate on 10000 samples
Epoch 1/15 60000/60000 [==============================] - 10s 173us/sample - loss: 0.2040 - accuracy: 0.9391 - val_loss: 0.1117 - val_accuracy: 0.9648
Epoch 2/15 60000/60000 [==============================] - 9s 150us/sample - loss: 0.0845 - accuracy: 0.9736 - val_loss: 0.0801 - val_accuracy: 0.9748
Epoch 3/15 60000/60000 [==============================] - 9s 151us/sample - loss: 0.0574 - accuracy: 0.9817 - val_loss: 0.0709 - val_accuracy: 0.9795
Epoch 4/15 60000/60000 [==============================] - 9s 149us/sample - loss: 0.0434 - accuracy: 0.9858 - val_loss: 0.0787 - val_accuracy: 0.9761
Epoch 5/15 60000/60000 [==============================] - 9s 151us/sample - loss: 0.0331 - accuracy: 0.9893 - val_loss: 0.0644 - val_accuracy: 0.9808
Epoch 6/15 60000/60000 [==============================] - 9s 150us/sample - loss: 0.0275 - accuracy: 0.9910 - val_loss: 0.0873 - val_accuracy: 0.9779
Epoch 7/15 60000/60000 [==============================] - 9s 151us/sample - loss: 0.0232 - accuracy: 0.9921 - val_loss: 0.0746 - val_accuracy: 0.9805
Epoch 8/15 60000/60000 [==============================] - 9s 151us/sample - loss: 0.0188 - accuracy: 0.9936 - val_loss: 0.1088 - val_accuracy: 0.9748
Now, if I look at the last three epochs (6, 7 and 8) and their validation loss ('val_loss') values:
0.0688, 0.0843 and 0.0847.
The differences between these three consecutive values are 0.0155 and 0.0004. But isn't the first difference greater than the 'min_delta' defined in the callback?
The code I came up with for EarlyStopping is as follows:
# list holding the last 'patience = 3' values of 'val_loss'
pv = [0.0688, 0.0843, 0.0847]

# differences between consecutive elements of 'pv'
differences = np.diff(pv, n=1)
differences
# array([0.0155, 0.0004])

# minimum change required to count as an improvement of the monitored metric
min_delta = 0.001

# check whether the consecutive differences are greater than the 'min_delta' parameter
check = differences > min_delta
check
# array([ True, False])

# condition to see whether all 3 'val_loss' differences are less than 'min_delta',
# in which case training should stop since EarlyStopping is called
if np.all(check == False):
    print("Stop Training - EarlyStopping is called")
    # stop training
But according to 'val_loss', not ALL of the differences between the last 3 epochs are smaller than 'min_delta'. For example, the first difference is greater than 0.001 (0.0843 - 0.0688), while the second difference is less than 0.001 (0.0847 - 0.0843).
Also, according to patience parameter definition of "EarlyStopping":
patience: Number of epochs with no improvement after which training will be stopped.
So, EarlyStopping should be called when there is no improvement in 'val_loss' for 3 consecutive epochs, where an absolute change of less than 'min_delta' does not count as an improvement.
Then why is EarlyStopping called?
Code for model definition and 'fit()' are:
import tensorflow as tf
import tensorflow_model_optimization as tfmot
from tensorflow_model_optimization.sparsity import keras as sparsity
import matplotlib.pyplot as plt
from tensorflow.keras.layers import AveragePooling2D, Conv2D
from tensorflow.keras import models, layers, datasets
from tensorflow.keras.layers import Dense, Flatten, Reshape, Input, InputLayer
from tensorflow.keras.models import Sequential, Model

num_classes = 10   # MNIST classes

# Specify the parameters to be used for layer-wise pruning; NO PRUNING is done here:
pruning_params_unpruned = {
    'pruning_schedule': sparsity.PolynomialDecay(
        initial_sparsity=0.0, final_sparsity=0.0,
        begin_step=0, end_step=0, frequency=100)
}

def pruned_nn(pruning_params):
    """
    Define the architecture of a neural network model following the
    300-100 architecture for the MNIST dataset, using the provided
    parameters to prune the model.

    Input: 'pruning_params' - Python 3 dictionary containing the parameters used for pruning
    Output: the designed and compiled neural network model
    """
    pruned_model = Sequential()
    pruned_model.add(InputLayer(input_shape=(784, )))
    pruned_model.add(Flatten())
    pruned_model.add(sparsity.prune_low_magnitude(
        Dense(units=300, activation='relu', kernel_initializer=tf.initializers.GlorotUniform()),
        **pruning_params))
    # pruned_model.add(layers.Dropout(0.2))
    pruned_model.add(sparsity.prune_low_magnitude(
        Dense(units=100, activation='relu', kernel_initializer=tf.initializers.GlorotUniform()),
        **pruning_params))
    # pruned_model.add(layers.Dropout(0.1))
    pruned_model.add(sparsity.prune_low_magnitude(
        Dense(units=num_classes, activation='softmax'),
        **pruning_params))

    # Compile the pruned model
    pruned_model.compile(
        loss=tf.keras.losses.categorical_crossentropy,
        # optimizer='adam',
        optimizer=tf.keras.optimizers.Adam(lr=0.001),
        metrics=['accuracy'])

    return pruned_model

batch_size = 32
epochs = 50

# Instantiate the (unpruned) NN
orig_model = pruned_nn(pruning_params_unpruned)

# Train the unpruned neural network
history_orig = orig_model.fit(
    x=X_train, y=y_train,
    batch_size=batch_size,
    epochs=epochs,
    verbose=1,
    callbacks=callbacks,
    validation_data=(X_test, y_test),
    shuffle=True)
The behaviour of the EarlyStopping callback depends on:
the metric or loss to be monitored;
min_delta, which is the minimum change between the monitored quantity in the current epoch and the best result so far that is considered an improvement;
patience, which is the number of epochs without improvement (taking into account that an improvement has to be a change greater than min_delta) before training is stopped.
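To make this concrete, here is a minimal sketch of that rule (not the actual Keras implementation), using the val_loss values from the training log above:
val_losses = [0.1117, 0.0801, 0.0709, 0.0787, 0.0644, 0.0873, 0.0746, 0.1088]
min_delta, patience = 0.001, 3

best = float('inf')
wait = 0
for epoch, current in enumerate(val_losses, start=1):
    if best - current > min_delta:   # improvement is measured against the best value seen so far
        best = current
        wait = 0
    else:
        wait += 1
        if wait >= patience:
            print("EarlyStopping triggered at epoch", epoch)
            break
# EarlyStopping triggered at epoch 8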
In your case, the best val_loss is 0.0644 (epoch 5), and a later epoch would have to reach a value lower than 0.0644 - 0.001 = 0.0634 to be registered as an improvement:
Epoch 6/15 val_loss: 0.0873 | Difference is: + 0.0229
Epoch 7/15 val_loss: 0.0746 | Difference is: + 0.0102
Epoch 8/15 val_loss: 0.1088 | Difference is: + 0.0444
Be aware that the quantities that are printed in the "training log", are approximated and you shouldn't base your assumptions on these values. You should rather take into consideration the true values from callbacks or the history of the training.
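For example, a sketch of that advice using the History object returned by fit() in the code above (history_orig), which holds the exact monitored values rather than the rounded log output:
for epoch, v in enumerate(history_orig.history['val_loss'], start=1):
    print("Epoch %d: val_loss = %.10f" % (epoch, v))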
Reference

Re-fitting a model with partial data results in lower accuracy than fitting the model with the full data

My dataset is large and cannot fit into RAM. My hypothesis is that I can divide the dataset into large chunks, train the model iteratively, and see accuracy similar to training on the undivided dataset (as described in #107). So I tested it in two steps with a sample set of 146244 elements.
First, I split the data into three chunks (50000, 50000 and 46244 elements) and fit the model with each chunk. To be on the safe side, I load the model saved in the previous iteration.
Second, I fit the model with the full data (146244 elements).
Here's the code for creating and fitting the model:
model = Sequential()
model.add(Flatten(input_shape=train_data.shape[1:]))
model.add(Dense(256, activation='relu'))
model.add(BatchNormalization())
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='sigmoid'))

optimizer = optimizers.adam(lr=0.0001)
model.compile(optimizer=optimizer,
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.save(top_model_path)

for train_data, train_labels in hdf5_generator('D:/_g/kaggle/cdisco/files/features_xception_1_100_90x90_tr1.hdf5', 'train', train_labels):
    model = load_model(top_model_path)
    model.fit(train_data, train_labels, epochs=2, batch_size=batch_size)
    model.save(top_model_path)

    (eval_loss, eval_accuracy) = model.evaluate(validation_data, validation_labels, batch_size=batch_size, verbose=1)
    print("Accuracy: {:.2f}%".format(eval_accuracy * 100))
    print("Loss: {}\n".format(eval_loss))
And, here is the code for the generator:
def hdf5_generator(file_name, dataset, labels):
    hf = h5py.File(file_name, 'r')
    i = 0
    batch_size = 50000
    while 1:
        start = i * batch_size
        end = (i + 1) * batch_size
        print("start: %d -- end: %d" % (start, end))
        yield (hf[dataset][start:end], labels[start:end])
        if end >= len(hf[dataset]):
            break
        i += 1
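For illustration, these are the chunk boundaries the generator yields for the 146244-element set with batch_size = 50000 (values taken from the question; slicing past the end of the HDF5 dataset simply clamps to its length):
total, chunk = 146244, 50000
i = 0
while True:
    start, end = i * chunk, (i + 1) * chunk
    print("start: %d -- end: %d (%d samples)" % (start, end, min(end, total) - start))
    if end >= total:
        break
    i += 1
# start: 0 -- end: 50000 (50000 samples)
# start: 50000 -- end: 100000 (50000 samples)
# start: 100000 -- end: 150000 (46244 samples)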
The first try, with chunked data, results in unstable accuracy between steps (22.90%, 59.93%, 51.17%):
start: 0 -- end: 50000
Epoch 1/2
50000/50000 [==============================] - 39s - loss: 2.4969 - acc: 0.5143
Epoch 2/2
50000/50000 [==============================] - 38s - loss: 1.5667 - acc: 0.6201
16156/16156 [==============================] - 33s
Accuracy: 22.90%
Loss: 5.976185762991436
start: 50000 -- end: 100000
Epoch 1/2
50000/50000 [==============================] - 38s - loss: 1.3759 - acc: 0.7211
Epoch 2/2
50000/50000 [==============================] - 39s - loss: 0.5446 - acc: 0.8621
16156/16156 [==============================] - 34s
Accuracy: 59.93%
Loss: 2.540657121840312
start: 100000 -- end: 150000
Epoch 1/2
46244/46244 [==============================] - 36s - loss: 0.2640 - acc: 0.9531
Epoch 2/2
46244/46244 [==============================] - 36s - loss: 0.1283 - acc: 0.9672
16156/16156 [==============================] - 34s
Accuracy: 51.17%
Loss: 3.8107337748964336
Second try (with batch_size=146244 in hdf5_generator) results in 77.49% accuracy:
start: 0 -- end: 146244
Epoch 1/2
146244/146244 [==============================] - 112s - loss: 1.8089 - acc: 0.6123
Epoch 2/2
146244/146244 [==============================] - 113s - loss: 1.0966 - acc: 0.7265
16156/16156 [==============================] - 39s
Accuracy: 77.49%
Loss: 0.8401890944788202
I expected to see similar accuracy results. However, the results were different, and the first run looks as if the model loses its parameters between iterations. How can I get results from re-fitting on the chunked dataset that are similar to a single fit on the full data?
I tried using HDF5Matrix, but it resulted in very low performance.
