# Reshaping the images (adding a channel dimension) to make them suitable for convolution
import numpy as np
img_size = 28
X_trainr = np.array(X_train).reshape(-1, img_size, img_size, 1)
X_testr = np.array(X_test).reshape(-1, img_size, img_size, 1)
# Model Compilation
model.compile(loss = "sparse_categorical_crossentropy", optimizer = "adam", metrics = ["accuracy"])
model.fit(X_trainr, y_train, epochs = 5, validation_split = 0.2) #training the model
I loaded the MNIST dataset for my digit-recognition code.
Then I split the dataset into a training and a test set.
Then I added a new dimension to the 3D training-set array and named the new array X_trainr.
Then I compiled and fitted the model.
After fitting, the model was not using the whole training set (42000 samples); instead it seemed to be using only 1500 samples.
I have tried setting validation_split = 0.3; then it trained on 1313 samples. Why is my model not using the whole training set (42000 samples)?
Output
Epoch 1/5
1500/1500 [==============================] - 102s 63ms/step - loss: 0.2930 - accuracy: 0.9063 - val_loss: 0.1152 - val_accuracy: 0.9649
Epoch 2/5
1500/1500 [==============================] - 84s 56ms/step - loss: 0.0922 - accuracy: 0.9723 - val_loss: 0.0696 - val_accuracy: 0.9780
Epoch 3/5
1500/1500 [==============================] - 80s 53ms/step - loss: 0.0666 - accuracy: 0.9795 - val_loss: 0.0619 - val_accuracy: 0.9818
Epoch 4/5
1500/1500 [==============================] - 79s 52ms/step - loss: 0.0519 - accuracy: 0.9837 - val_loss: 0.0623 - val_accuracy: 0.9831
Epoch 5/5
1500/1500 [==============================] - 84s 56ms/step - loss: 0.0412 - accuracy: 0.9870 - val_loss: 0.0602 - val_accuracy: 0.9818
If X_trainr has 42,000 samples initially and you use a validation split of 0.2, then you are training on 33,600 samples. In model.fit you did not specify a batch size, so it defaults to 32. The numbers shown during training are NOT the number of samples; they are the number of training samples divided by the batch size, which should be 33600/32 = 1050. However, it shows 1500, so I suspect the training portion (X_trainr × 0.8) is actually 48,000 samples, which means X_trainr = 48000/0.8 = 60,000. Check the dimension of X_trainr.
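To check this yourself, here is a minimal sketch of that arithmetic (assuming X_trainr is the array passed to model.fit; the variable names are only illustrative):
import numpy as np

batch_size = 32          # model.fit default when batch_size is not specified
validation_split = 0.2

n_total = X_trainr.shape[0]                      # check this dimension first
n_train = int(n_total * (1 - validation_split))  # samples actually trained on
steps_per_epoch = int(np.ceil(n_train / batch_size))
print(n_total, n_train, steps_per_epoch)         # 60000 -> 48000 -> 1500 would match your output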
Related
I am working on a project with the Keras Captcha OCR model. This model does text-captcha recognition with CTC-encoded output, combining a CNN and an RNN.
I am trying to see the accuracy numbers in the training output. How can I get the accuracy and validation accuracy?
Here is the training code from the Keras model:
epochs = 100
early_stopping_patience = 10
# Add early stopping
early_stopping = keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=early_stopping_patience, restore_best_weights=True
)
# Train the model
history = model.fit(
    train_dataset,
    validation_data=validation_dataset,
    epochs=epochs,
    callbacks=[early_stopping],
)
And this is the training output:
Epoch 1/100
59/59 [==============================] - 3s 53ms/step - loss: 21.5722 - val_loss: 16.3351
Epoch 2/100
59/59 [==============================] - 2s 27ms/step - loss: 16.3335 - val_loss: 16.3062
Epoch 3/100
59/59 [==============================] - 2s 27ms/step - loss: 16.3360 - val_loss: 16.3116
Epoch 4/100
59/59 [==============================] - 2s 27ms/step - loss: 16.3318 - val_loss: 16.3167
Epoch 5/100
Before calling model.fit just specify the metric you want to compute during training, in this case accuracy:
model.compile(optimizer=your_optimizer, loss=your_loss, metrics=['acc'])
Link to the documentation.
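For example, applied to the training code from the question, it could look roughly like this (a sketch only; your_optimizer and your_loss stand in for whatever the model already uses, and whether a plain accuracy metric is meaningful for a CTC-encoded output depends on how the model's targets are set up):
model.compile(optimizer=your_optimizer, loss=your_loss, metrics=["acc"])

history = model.fit(
    train_dataset,
    validation_data=validation_dataset,
    epochs=epochs,
    callbacks=[early_stopping],
)

# The per-epoch values then appear in the progress bar and in history.history;
# depending on the Keras version the keys are 'acc'/'val_acc' or 'accuracy'/'val_accuracy'.
print(history.history.keys())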
I trained a model with ResNet50 and got an amazing accuracy of 95% on the training set.
I used the same training set for validation, and the validation accuracy seems very bad (< 0.05).
from keras.preprocessing.image import ImageDataGenerator

train_set = ImageDataGenerator(
    horizontal_flip=True, rescale=1./255, shear_range=0.2, zoom_range=0.2
).flow_from_directory(
    data, target_size=(256, 256),
    classes=['airplane', 'airport', 'baseball_diamond',
             'basketball_court', 'beach', 'bridge',
             'chaparral', 'church', 'circular_farmland',
             'commercial_area', 'dense_residential', 'desert',
             'forest', 'freeway', 'golf_course', 'ground_track_field',
             'harbor', 'industrial_area', 'intersection', 'island',
             'lake', 'meadow', 'medium_residential', 'mobile_home_park',
             'mountain', 'overpass', 'parking_lot', 'railway', 'rectangular_farmland',
             'roundabout', 'runway'],
    batch_size=31)
from keras.applications import ResNet50
from keras.applications.resnet50 import preprocess_input
from keras import layers, Model

conv_base = ResNet50(
    include_top=False,
    weights='imagenet')

for layer in conv_base.layers:
    layer.trainable = False

x = conv_base.output
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dense(128, activation='relu')(x)
predictions = layers.Dense(31, activation='softmax')(x)
model = Model(conv_base.input, predictions)
# here you will write the path for train data or if you create your val data then you can test using that too.
# test_dir = ""
test_datagen = ImageDataGenerator(rescale=1. / 255)
test_generator = test_datagen.flow_from_directory(
    data,
    target_size=(256, 256),
    classes=['airplane', 'airport', 'baseball_diamond',
             'basketball_court', 'beach', 'bridge',
             'chaparral', 'church', 'circular_farmland',
             'commercial_area', 'dense_residential', 'desert',
             'forest', 'freeway', 'golf_course', 'ground_track_field',
             'harbor', 'industrial_area', 'intersection', 'island',
             'lake', 'meadow', 'medium_residential', 'mobile_home_park',
             'mountain', 'overpass', 'parking_lot', 'railway', 'rectangular_farmland',
             'roundabout', 'runway'],
    batch_size=1, shuffle=True)

model.compile(loss='categorical_crossentropy', optimizer='Adam', metrics=['accuracy'])
model.fit_generator(train_set, steps_per_epoch=1488 // 31, epochs=10, verbose=True,
                    validation_data=test_generator,
                    validation_steps=test_generator.samples // 31)
Epoch 1/10
48/48 [==============================] - 27s 553ms/step - loss: 1.9631 - acc: 0.4825 - val_loss: 4.3134 - val_acc: 0.0208
Epoch 2/10
48/48 [==============================] - 22s 456ms/step - loss: 0.6395 - acc: 0.8212 - val_loss: 4.7584 - val_acc: 0.0833
Epoch 3/10
48/48 [==============================] - 23s 482ms/step - loss: 0.4325 - acc: 0.8810 - val_loss: 5.3852 - val_acc: 0.0625
Epoch 4/10
48/48 [==============================] - 23s 476ms/step - loss: 0.2925 - acc: 0.9153 - val_loss: 6.0963 - val_acc: 0.0208
Epoch 5/10
48/48 [==============================] - 23s 477ms/step - loss: 0.2275 - acc: 0.9341 - val_loss: 5.6571 - val_acc: 0.0625
Epoch 6/10
48/48 [==============================] - 23s 478ms/step - loss: 0.1855 - acc: 0.9489 - val_loss: 6.2440 - val_acc: 0.0208
Epoch 7/10
48/48 [==============================] - 23s 483ms/step - loss: 0.1704 - acc: 0.9543 - val_loss: 7.4446 - val_acc: 0.0208
Epoch 8/10
48/48 [==============================] - 23s 487ms/step - loss: 0.1828 - acc: 0.9476 - val_loss: 7.5198 - val_acc: 0.0417
What could be the reason?!
You have configured train_set and test_datagen differently; in particular, one applies augmentation (flips, shear, zoom) where the other doesn't. As I mentioned in my comment, if it is the same data it will have the same accuracy. You can only see that a model is overfitting when you use validation correctly, with unseen data; using the same data will always give the same accuracy for training and validation.
Not sure what exactly is wrong, but it is NOT an overfitting issue. It is clear your validation data (the same as your training data) is not going in correctly. For one thing, you set the validation batch size to 1, but you set validation_steps = test_generator.samples // 31. If test_generator.samples = 1488 then you have 48 steps, but with a batch size of 1 you will only validate 48 samples. You want to set the batch size and steps so that batch_size x validation_steps equals the total number of samples; that way you go through the validation set exactly one time. I also recommend that for the test generator you set shuffle=False. Also, why do you bother entering all the class names? If your class directories are labeled 'airplane', 'airport', 'baseball_diamond', etc., then you don't need to specify the classes explicitly; flow_from_directory will infer them for you automatically. See the documentation below.
classes: Optional list of class subdirectories (e.g. ['dogs', 'cats']). Default: None. If not provided, the list of classes will be automatically inferred from the subdirectory names/structure under directory, where each subdirectory will be treated as a different class (and the order of the classes, which will map to the label indices, will be alphanumeric). The dictionary containing the mapping from class names to class indices can be obtained via the attribute class_indices.
Your training data is actually different from your test data because you are using data augmentation in the training generator. That's OK; it may lead to a small difference between your training and validation accuracy, but your validation accuracy should be pretty close once you get the validation data to go in correctly.
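Putting those points together, the validation generator could be set up roughly like this (a sketch only, assuming the validation directory also holds 1488 images so that 31 x 48 covers the whole set exactly once):
test_datagen = ImageDataGenerator(rescale=1. / 255)
test_generator = test_datagen.flow_from_directory(
    data,                    # class subdirectories are inferred automatically
    target_size=(256, 256),
    batch_size=31,           # 31 * 48 = 1488, one full pass over the validation set
    shuffle=False)

model.fit_generator(train_set,
                    steps_per_epoch=1488 // 31,
                    epochs=10,
                    verbose=True,
                    validation_data=test_generator,
                    validation_steps=test_generator.samples // 31)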
I have a rather complex sequence-to-sequence encoder-decoder model. I run into an issue where my loss and accuracy drop to zero, and I can't reproduce the error. It has nothing to do with the training data, as it happens with different sets.
It seems to be learning as the loss slowly drops. Below is what it is like just before:
Epoch 1/2
5000/5000 [==============================] - 235s 47ms/step - loss: 0.9825 - acc: 0.7077
Epoch 2/2
5000/5000 [==============================] - 235s 47ms/step - loss: 0.9443 - acc: 0.7177
And here is what it is like during the next model.fit() iteration:
Epoch 1/2
2882/2882 [==============================] - 136s 47ms/step - loss: 0.7033 - acc: 0.4399
Epoch 2/2
2882/2882 [==============================] - 136s 47ms/step - loss: 1.1921e-07 - acc: 0.0000e+00
After this, the loss and accuracy remain the same:
Epoch 1/2
5000/5000 [==============================] - 278s 56ms/step - loss: 1.1921e-07 - acc: 0.0000e+00
Epoch 2/2
5000/5000 [==============================] - 279s 56ms/step - loss: 1.1921e-07 - acc: 0.0000e+00
The reason I have to train in such a manner is because I have variable input sizes and output sizes. So I have to make batches of my training data with fixed input size before I train.
sgd = optimizers.SGD(lr=0.015, decay=0.002)
out2 = model.compile(loss='categorical_crossentropy',
                     optimizer=sgd,
                     metrics=['accuracy'])
I need to use curriculum learning to reach sentence-level predictions, so I am doing the following:
I initially train my model to output "1 word + end" tokens. Training on this works fine. When I start to train on "2 words + end", this problem starts to arise.
After training on 1 word, I save the model. Then I define a new model with the output size for 2 words, and use the following:
new_model = createModel(...,num_output_words)
new_model.set_weights(old_model.get_weights())
I have to do this as I can't define a model with variable output length.
I can provide more information if needed. I can't find any information online.
I have been trying to better understand the train/validation sequence in the Keras model fit() loop. So I tried out a simple training loop where I attempted to fit a simple logistic regression model with input data consisting of a single feature.
I feed the same data for both training and validation. Under those conditions, and with the batch size equal to the total data size, one would expect to obtain exactly the same loss and accuracy. But this is not the case.
Here is my code:
Generate some random data with two classes:
import numpy as np

N = 100
x = np.concatenate([np.random.randn(N//2, 1), np.random.randn(N//2, 1) + 2])
y = np.concatenate([np.zeros(N//2), np.ones(N//2)])
And plotting the two class data distribution (one feature x):
import pandas as pd
import seaborn as sns
from matplotlib import pyplot
data = pd.DataFrame({'x': x.ravel(), 'y': y})
sns.violinplot(x='x', y='y', inner='point', data=data, orient='h')
pyplot.tight_layout(0)
pyplot.show()
Build and fit the keras model:
model = tf.keras.Sequential([tf.keras.layers.Dense(1, activation='sigmoid', input_dim=1)])
model.compile(optimizer=tf.keras.optimizers.SGD(2), loss='binary_crossentropy', metrics=['accuracy'])
model.fit(x, y, epochs=10, validation_data=(x, y), batch_size=N)
Notice that I have specified the data x and targets y for both training and for validation_data. Also, the batch_size is same as total size batch_size=N.
The training results are:
Epoch 1/10
100/100 [==============================] - 1s 5ms/step - loss: 1.4500 - acc: 0.2300 - val_loss: 0.5439 - val_acc: 0.7200
Epoch 2/10
100/100 [==============================] - 0s 18us/step - loss: 0.5439 - acc: 0.7200 - val_loss: 0.4408 - val_acc: 0.8000
Epoch 3/10
100/100 [==============================] - 0s 16us/step - loss: 0.4408 - acc: 0.8000 - val_loss: 0.3922 - val_acc: 0.8300
Epoch 4/10
100/100 [==============================] - 0s 16us/step - loss: 0.3922 - acc: 0.8300 - val_loss: 0.3659 - val_acc: 0.8400
Epoch 5/10
100/100 [==============================] - 0s 17us/step - loss: 0.3659 - acc: 0.8400 - val_loss: 0.3483 - val_acc: 0.8500
Epoch 6/10
100/100 [==============================] - 0s 16us/step - loss: 0.3483 - acc: 0.8500 - val_loss: 0.3356 - val_acc: 0.8600
Epoch 7/10
100/100 [==============================] - 0s 17us/step - loss: 0.3356 - acc: 0.8600 - val_loss: 0.3260 - val_acc: 0.8600
Epoch 8/10
100/100 [==============================] - 0s 18us/step - loss: 0.3260 - acc: 0.8600 - val_loss: 0.3186 - val_acc: 0.8600
Epoch 9/10
100/100 [==============================] - 0s 18us/step - loss: 0.3186 - acc: 0.8600 - val_loss: 0.3127 - val_acc: 0.8700
Epoch 10/10
100/100 [==============================] - 0s 23us/step - loss: 0.3127 - acc: 0.8700 - val_loss: 0.3079 - val_acc: 0.8800
The results show that val_loss and loss are not the same at the end of each epoch, and also acc and val_acc are not exactly the same. However, based on this setup, one would expect them to be the same.
I have been going through the code in keras, particularly this part:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/keras/engine/training.py#L1364
and so far, all I can say is that the difference comes from some different computation through the computation graph.
Does anyone have any idea why there would be such a difference?
So after looking more closely at the results, the loss and acc values from the training step are computed BEFORE the current batch is used to update the model.
Thus, in the case of a single batch per epoch, the train acc and loss are evaluated when the batch is fed in, then the model parameters are updated based on the provided optimizer. After the train step is finished, we compute loss and accuracy by feeding in the validation data, which is now evaluated using a new updated model.
This is evident from the training output, where the validation accuracy and loss in epoch 1 are equal to the training accuracy and loss in epoch 2, and so on...
A quick check using tensorflow confirmed that values are fetched before variables are updated:
import tensorflow as tf
import numpy as np
np.random.seed(1)
x = tf.placeholder(dtype=tf.float32, shape=(None, 1), name="x")
y = tf.placeholder(dtype=tf.float32, shape=(None), name="y")
W = tf.get_variable(name="W", shape=(1, 1), dtype=tf.float32, initializer=tf.constant_initializer(0))
b = tf.get_variable(name="b", shape=1, dtype=tf.float32, initializer=tf.constant_initializer(0))
z = tf.matmul(x, W) + b
error = tf.square(z - y)
obj = tf.reduce_mean(error, name="obj")
opt = tf.train.MomentumOptimizer(learning_rate=0.025, momentum=0.9)
grads = opt.compute_gradients(obj)
train_step = opt.apply_gradients(grads)
N = 100
x_np = np.random.randn(N).reshape(-1, 1)
y_np = 2*x_np + 3 + np.random.randn(N)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(2):
        res = sess.run([obj, W, b, train_step], feed_dict={x: x_np, y: y_np})
        print('MSE: {}, W: {}, b: {}'.format(res[0], res[1][0, 0], res[2][0]))
Output:
MSE: 14.721437454223633, W: 0.0, b: 0.0
MSE: 13.372591018676758, W: 0.08826743811368942, b: 0.1636980175971985
Since the parameters W and b were initialized to 0, it is clear that the fetched values are still 0 even though the session was run with a gradient-update request...
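The same offset can also be seen without dropping down to raw TensorFlow. Here is a minimal Keras sketch (the data and model are made up for illustration): train_on_batch returns metrics computed with the pre-update weights, while a subsequent evaluate on the same data uses the updated weights, which mirrors what the fit() progress bar shows.
import numpy as np
import tensorflow as tf

np.random.seed(1)
x = np.random.randn(100, 1)
y = (x[:, 0] > 0).astype("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(1,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer=tf.keras.optimizers.SGD(2.0),
              loss="binary_crossentropy", metrics=["accuracy"])

for _ in range(3):
    # metrics evaluated on the batch BEFORE the weights are updated
    train_loss, train_acc = model.train_on_batch(x, y)
    # metrics on the same data AFTER the update
    val_loss, val_acc = model.evaluate(x, y, verbose=0)
    print(f"train: {train_loss:.4f}/{train_acc:.2f}   eval after update: {val_loss:.4f}/{val_acc:.2f}")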
I am trying to calculate the average of the training accuracy in my model, which is written with Keras; I have 200 epochs. So in the end I want to sum the training accuracy from each epoch and divide the total by 200.
Here is my code:
num = 200
total_sum = 0
for n in range(num):
    avg_train = np.array(model.fit(x_train, y_train, epochs=200, batch_size=64, verbose=2))
    total_sum = avg_train + total_sum

avg = total_sum/num
score = model.evaluate(x_test, y_test, verbose=2)
print(score)
print('the average is', avg)
I am trying to store each accuracy in a NumPy array so I can use it in the summation, but it gives me the following error:
Traceback (most recent call last):
File "G:\Master Implementation\MLPADAM.py", line 87, in <module>
total_sum = avg_train + total_sum
TypeError: unsupported operand type(s) for +: 'History' and 'int'
There are several issues with your question...
To start with, your code will fit a model with 200 epochs 200 times, i.e. a total of 200*200 = 40,000 epochs.
Moreover, since model.fit in Keras is run incrementally, each call of model.fit in your loop will continue training from where the previous iteration stopped, so effectively at the end you will indeed have a model fitted with 40,000 epochs.
Assuming that this is not what you are trying to do, and that you simply want the average accuracy during your training, the answer is to use the History object returned by model.fit; from the model.fit docs:
Returns
A History object. Its History.history attribute is a record of training loss values and metrics values at successive epochs, as well as validation loss values and validation metrics values (if applicable).
So, here is a quick demonstration with MNIST and only 5 epochs (and forget the for loop!):
# your model definition
# your model.compile()

batch_size = 128
epochs = 5

hist = model.fit(x_train, y_train,
                 batch_size=batch_size,
                 epochs=epochs,
                 verbose=1,
                 validation_data=(x_test, y_test)  # optional
                 )
# output
Train on 60000 samples, validate on 10000 samples
Epoch 1/5
60000/60000 [==============================] - 76s - loss: 0.3367 - acc: 0.8974 - val_loss: 0.0765 - val_acc: 0.9742
Epoch 2/5
60000/60000 [==============================] - 73s - loss: 0.1164 - acc: 0.9656 - val_loss: 0.0516 - val_acc: 0.9835
Epoch 3/5
60000/60000 [==============================] - 74s - loss: 0.0866 - acc: 0.9741 - val_loss: 0.0411 - val_acc: 0.9863
Epoch 4/5
60000/60000 [==============================] - 73s - loss: 0.0730 - acc: 0.9781 - val_loss: 0.0376 - val_acc: 0.9871
Epoch 5/5
60000/60000 [==============================] - 73s - loss: 0.0639 - acc: 0.9810 - val_loss: 0.0354 - val_acc: 0.9881
hist.history is a dictionary containing the value of the metrics for each epoch:
hist.history
# result:
{'acc': [0.8973833333969117,
0.9656000000635783,
0.9740500000317891,
0.9780500000635783,
0.9810333334604899],
'loss': [0.3367467244784037,
0.11638248273332914,
0.08664042545557023,
0.07301943883101146,
0.06391783343354861],
'val_acc': [0.9742, 0.9835, 0.9863, 0.9871, 0.9881],
'val_loss': [0.07650674062222243,
0.051606363496184346,
0.04107686730045825,
0.03761903735231608,
0.03537947320453823]}
To get the training accuracy per epoch:
hist.history['acc']
# result:
[0.8973833333969117,
0.9656000000635783,
0.9740500000317891,
0.9780500000635783,
0.9810333334604899]
and the average value is simply
np.mean(hist.history['acc']) # numpy assumed imported as np
# 0.9592233334032695