I followed this tutorial https://medium.com/the-andela-way/deep-learning-hello-world-e1fc53ea888 to experiment with Keras; the source code is here: https://github.com/sirghiny/mnist
However, I got a very low score and training was very short, as if the model was trained on very few samples. Here is the output in the terminal:
Epoch 1/5
300/300 [==============================] - 12s 39ms/step - loss: 2.3791 - categorical_accuracy: 0.0899 - val_loss: 2.3104 - val_categorical_accuracy: 0.0528
Epoch 2/5
300/300 [==============================] - 11s 38ms/step - loss: 2.3326 - categorical_accuracy: 0.1060 - val_loss: 2.2920 - val_categorical_accuracy: 0.0864
Epoch 3/5
300/300 [==============================] - 10s 32ms/step - loss: 2.2891 - categorical_accuracy: 0.1315 - val_loss: 2.2742 - val_categorical_accuracy: 0.1571
Epoch 4/5
300/300 [==============================] - 9s 31ms/step - loss: 2.2510 - categorical_accuracy: 0.1576 - val_loss: 2.2569 - val_categorical_accuracy: 0.2367
Epoch 5/5
300/300 [==============================] - 9s 30ms/step - loss: 2.2174 - categorical_accuracy: 0.1889 - val_loss: 2.2397 - val_categorical_accuracy: 0.3133
Evaluating the model...
1250/1250 [==============================] - 2s 2ms/step - loss: 2.2382 - categorical_accuracy: 0.3171
938/938 [==============================] - 2s 2ms/step - loss: 2.2369 - categorical_accuracy: 0.3232
Please tell me, what did I do wrong?
You updated your model weights only 1,500 times (epochs × number of batches per epoch).
You might want to increase the number of epochs and/or reduce the batch_size to perform more weight updates, since your logs show the network is still learning.
Additionally, you should find an up-to-date tutorial like this one, as TensorFlow has changed a lot recently.
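For example, a minimal sketch (assuming the usual MNIST arrays and the compiled model from the tutorial; the names here are illustrative, not the repository's exact code):
# Smaller batches and more epochs mean many more weight updates per run:
# updates = epochs * ceil(num_samples / batch_size)
model.fit(
    x_train, y_train,
    validation_split=0.2,
    epochs=20,       # more passes over the data than the original 5
    batch_size=32,   # smaller batches -> more updates per epoch
)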
The code and output when I execute it once:
model.fit(X, y, validation_split=0.2, epochs=10, batch_size=100)
Epoch 1/10
8/8 [==============================] - 1s 31ms/step - loss: 0.6233 - accuracy: 0.6259 - val_loss: 0.6333 - val_accuracy: 0.6461
Epoch 2/10
8/8 [==============================] - 0s 5ms/step - loss: 0.5443 - accuracy: 0.7722 - val_loss: 0.4803 - val_accuracy: 0.7978
Epoch 3/10
8/8 [==============================] - 0s 4ms/step - loss: 0.5385 - accuracy: 0.7904 - val_loss: 0.4465 - val_accuracy: 0.8202
Epoch 4/10
8/8 [==============================] - 0s 5ms/step - loss: 0.5014 - accuracy: 0.7932 - val_loss: 0.5228 - val_accuracy: 0.7753
Epoch 5/10
8/8 [==============================] - 0s 4ms/step - loss: 0.5283 - accuracy: 0.7736 - val_loss: 0.4284 - val_accuracy: 0.8315
Epoch 6/10
8/8 [==============================] - 0s 4ms/step - loss: 0.4936 - accuracy: 0.7989 - val_loss: 0.4309 - val_accuracy: 0.8539
Epoch 7/10
8/8 [==============================] - 0s 4ms/step - loss: 0.4700 - accuracy: 0.8045 - val_loss: 0.4622 - val_accuracy: 0.8146
Epoch 8/10
8/8 [==============================] - 0s 4ms/step - loss: 0.4732 - accuracy: 0.8087 - val_loss: 0.4159 - val_accuracy: 0.8202
Epoch 9/10
8/8 [==============================] - 0s 5ms/step - loss: 0.5623 - accuracy: 0.7764 - val_loss: 0.7438 - val_accuracy: 0.8090
Epoch 10/10
8/8 [==============================] - 0s 4ms/step - loss: 0.5886 - accuracy: 0.7806 - val_loss: 0.5889 - val_accuracy: 0.6798
Output when I execute the same line of code again in JupyterLab:
Epoch 1/10
8/8 [==============================] - 0s 9ms/step - loss: 0.5269 - accuracy: 0.7496 - val_loss: 0.4568 - val_accuracy: 0.8371
Epoch 2/10
8/8 [==============================] - 0s 4ms/step - loss: 0.4688 - accuracy: 0.8087 - val_loss: 0.4885 - val_accuracy: 0.7753
Epoch 3/10
8/8 [==============================] - 0s 4ms/step - loss: 0.4597 - accuracy: 0.8017 - val_loss: 0.4638 - val_accuracy: 0.7865
Epoch 4/10
8/8 [==============================] - 0s 4ms/step - loss: 0.4741 - accuracy: 0.7890 - val_loss: 0.4277 - val_accuracy: 0.8258
Epoch 5/10
8/8 [==============================] - 0s 5ms/step - loss: 0.4840 - accuracy: 0.8003 - val_loss: 0.4712 - val_accuracy: 0.7978
Epoch 6/10
8/8 [==============================] - 0s 4ms/step - loss: 0.4488 - accuracy: 0.8087 - val_loss: 0.4825 - val_accuracy: 0.7809
Epoch 7/10
8/8 [==============================] - 0s 5ms/step - loss: 0.4432 - accuracy: 0.8087 - val_loss: 0.4865 - val_accuracy: 0.8090
Epoch 8/10
8/8 [==============================] - 0s 4ms/step - loss: 0.4299 - accuracy: 0.8059 - val_loss: 0.4458 - val_accuracy: 0.8371
Epoch 9/10
8/8 [==============================] - 0s 4ms/step - loss: 0.4358 - accuracy: 0.8172 - val_loss: 0.5232 - val_accuracy: 0.8034
Epoch 10/10
8/8 [==============================] - 0s 5ms/step - loss: 0.4697 - accuracy: 0.8059 - val_loss: 0.4421 - val_accuracy: 0.8202
It continues from the previous fit. My question is: how can I make it start from the beginning again, without having to create a new model, so that the second execution of the line is independent of the first?
This is a little tricky without being able to see the code that initialises the model, and I am not sure why you would need to reset the weights without re-initialising the model.
If you save the model's weights before training, you can reset to those initial weights before you train again:
# Capture the weights right after the model is built (before training)
modelWeights = model.get_weights()
# Later, restore them to put the model back in its untrained state
model.set_weights(modelWeights)
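Put together, a minimal sketch of that workflow (assuming a compiled model and the X/y arrays from your question):
# Capture the freshly initialised weights before any training
initial_weights = model.get_weights()

# First training run
model.fit(X, y, validation_split=0.2, epochs=10, batch_size=100)

# Restore the initial weights so the next run starts from scratch
model.set_weights(initial_weights)

# Second, independent training run
model.fit(X, y, validation_split=0.2, epochs=10, batch_size=100)
Note that this only resets the layer weights; optimizer state (for example Adam's moment estimates) is kept unless you also re-compile the model.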
I am using TensorFlow 2.3 and trying to save a model checkpoint after every n epochs. n can be anything, but for now I am trying with 10.
Per this thread, I tried save_freq='epoch' with period=10, which works, but since the period parameter is deprecated, I wanted to try an alternative approach.
HEIGHT = 256
WIDTH = 256
CHANNELS = 3
EPOCHS = 100
BATCH_SIZE = 1
SAVE_PERIOD = 10
n_monet_samples = 21
checkpoint_filepath = "./model_checkpoints/cyclegan_checkpoints.{epoch:03d}"
model_checkpoint_callback = callbacks.ModelCheckpoint(
    filepath=checkpoint_filepath,
    save_freq=SAVE_PERIOD * (n_monet_samples // BATCH_SIZE)
)
If I use save_freq=SAVE_PERIOD * (n_monet_samples//BATCH_SIZE) in the checkpoint callback definition, I get this error:
ValueError: Unrecognized save_freq: 210
I am not sure why, since per the Keras callback code, save_freq should be accepted as long as it is either 'epoch' or an integer.
Please suggest.
It does not show any error for me when I try the same code with the same TensorFlow version (2.3):
checkpoint_path = "training_1/cp.ckpt"
checkpoint_dir = os.path.dirname(checkpoint_path)
BATCH_SIZE = 1
SAVE_PERIOD = 10
n_monet_samples = 21
# Create a callback that saves the model's weights
cp_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_path,
    save_weights_only=True,
    verbose=1,
    save_freq=SAVE_PERIOD * (n_monet_samples // BATCH_SIZE))
# Train the model with the new callback
model.fit(train_images,
          train_labels,
          epochs=20,
          validation_data=(test_images, test_labels),
          callbacks=[cp_callback])
Output:
Epoch 1/20
32/32 [==============================] - 0s 14ms/step - loss: 1.1152 - sparse_categorical_accuracy: 0.6890 - val_loss: 0.6934 - val_sparse_categorical_accuracy: 0.7940
Epoch 2/20
32/32 [==============================] - 0s 9ms/step - loss: 0.4154 - sparse_categorical_accuracy: 0.8840 - val_loss: 0.5317 - val_sparse_categorical_accuracy: 0.8330
Epoch 3/20
32/32 [==============================] - 0s 8ms/step - loss: 0.2787 - sparse_categorical_accuracy: 0.9270 - val_loss: 0.4854 - val_sparse_categorical_accuracy: 0.8400
Epoch 4/20
32/32 [==============================] - 0s 8ms/step - loss: 0.2230 - sparse_categorical_accuracy: 0.9420 - val_loss: 0.4525 - val_sparse_categorical_accuracy: 0.8590
Epoch 5/20
32/32 [==============================] - 0s 10ms/step - loss: 0.1549 - sparse_categorical_accuracy: 0.9620 - val_loss: 0.4275 - val_sparse_categorical_accuracy: 0.8650
Epoch 6/20
32/32 [==============================] - 0s 10ms/step - loss: 0.1110 - sparse_categorical_accuracy: 0.9770 - val_loss: 0.4251 - val_sparse_categorical_accuracy: 0.8630
Epoch 7/20
11/32 [=========>....................] - ETA: 0s - loss: 0.0936 - sparse_categorical_accuracy: 0.9886
Epoch 00007: saving model to training_1/cp.ckpt
32/32 [==============================] - 0s 14ms/step - loss: 0.0807 - sparse_categorical_accuracy: 0.9840 - val_loss: 0.4248 - val_sparse_categorical_accuracy: 0.8610
Epoch 8/20
32/32 [==============================] - 0s 10ms/step - loss: 0.0612 - sparse_categorical_accuracy: 0.9950 - val_loss: 0.4058 - val_sparse_categorical_accuracy: 0.8650
Epoch 9/20
32/32 [==============================] - 0s 8ms/step - loss: 0.0489 - sparse_categorical_accuracy: 0.9950 - val_loss: 0.4393 - val_sparse_categorical_accuracy: 0.8610
Epoch 10/20
32/32 [==============================] - 0s 6ms/step - loss: 0.0361 - sparse_categorical_accuracy: 1.0000 - val_loss: 0.4150 - val_sparse_categorical_accuracy: 0.8620
Epoch 11/20
32/32 [==============================] - 0s 10ms/step - loss: 0.0294 - sparse_categorical_accuracy: 1.0000 - val_loss: 0.4090 - val_sparse_categorical_accuracy: 0.8670
Epoch 12/20
32/32 [==============================] - 0s 7ms/step - loss: 0.0272 - sparse_categorical_accuracy: 0.9990 - val_loss: 0.4365 - val_sparse_categorical_accuracy: 0.8600
Epoch 13/20
32/32 [==============================] - 0s 8ms/step - loss: 0.0203 - sparse_categorical_accuracy: 1.0000 - val_loss: 0.4231 - val_sparse_categorical_accuracy: 0.8620
Epoch 14/20
1/32 [..............................] - ETA: 0s - loss: 0.0115 - sparse_categorical_accuracy: 1.0000
Epoch 00014: saving model to training_1/cp.ckpt
32/32 [==============================] - 0s 9ms/step - loss: 0.0164 - sparse_categorical_accuracy: 1.0000 - val_loss: 0.4263 - val_sparse_categorical_accuracy: 0.8650
Epoch 15/20
32/32 [==============================] - 0s 7ms/step - loss: 0.0128 - sparse_categorical_accuracy: 1.0000 - val_loss: 0.4260 - val_sparse_categorical_accuracy: 0.8690
Epoch 16/20
32/32 [==============================] - 0s 7ms/step - loss: 0.0120 - sparse_categorical_accuracy: 1.0000 - val_loss: 0.4194 - val_sparse_categorical_accuracy: 0.8740
Epoch 17/20
32/32 [==============================] - 0s 9ms/step - loss: 0.0110 - sparse_categorical_accuracy: 1.0000 - val_loss: 0.4302 - val_sparse_categorical_accuracy: 0.8710
Epoch 18/20
32/32 [==============================] - 0s 6ms/step - loss: 0.0090 - sparse_categorical_accuracy: 1.0000 - val_loss: 0.4331 - val_sparse_categorical_accuracy: 0.8660
Epoch 19/20
32/32 [==============================] - 0s 7ms/step - loss: 0.0084 - sparse_categorical_accuracy: 1.0000 - val_loss: 0.4320 - val_sparse_categorical_accuracy: 0.8760
Epoch 20/20
16/32 [==============>...............] - ETA: 0s - loss: 0.0074 - sparse_categorical_accuracy: 1.0000
Epoch 00020: saving model to training_1/cp.ckpt
32/32 [==============================] - 0s 13ms/step - loss: 0.0072 - sparse_categorical_accuracy: 1.0000 - val_loss: 0.4280 - val_sparse_categorical_accuracy: 0.8750
<tensorflow.python.keras.callbacks.History at 0x7f90f0082cd0>
As you already know, save_freq is either 'epoch' or an integer. When using 'epoch', the callback saves the model after each epoch. When using an integer, the callback saves the model at the end of that many batches (that many training steps).
With save_freq defined as above, a checkpoint is saved every 210 steps, which with 21 steps per epoch corresponds to every 10 epochs.
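For example, the arithmetic behind the 210 (variable names mirror your question; the dataset is assumed to yield 21 batches per epoch):
BATCH_SIZE = 1
SAVE_PERIOD = 10                                  # save every 10 epochs
n_monet_samples = 21

steps_per_epoch = n_monet_samples // BATCH_SIZE   # 21 batches per epoch
save_freq = SAVE_PERIOD * steps_per_epoch         # 210 batches == 10 epochs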
Please check this for more details on ModelCheckpoint Arguments.
My CNN is:
input: two 128x128 images, input1 and input2
output: the CNN returns a transformation matrix:
outLayer = Layers.Dense(3 * 2, activation=activations.linear, kernel_initializer="zeros", bias_initializer=output_bias)(d2)
After that, I append an STN (https://github.com/kevinzakka/spatial-transformer-network) to translate and rotate input1.
The complete network is the CNN plus the STN. It returns a 128x128 image, with a batch size of 32.
My loss is the MSE between input2 and the network's output (the target registration error), i.e. the MSE between the two images (input2 and output).
The network must find the transformation matrix such that output = input2.
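To make the structure clearer, here is a simplified sketch of what I mean (not my exact code; the layer sizes and the identity-style bias initialiser are only illustrative):
import tensorflow as tf
from tensorflow.keras import layers

# Two 128x128 grayscale inputs
input1 = layers.Input(shape=(128, 128, 1))
input2 = layers.Input(shape=(128, 128, 1))

# Small localisation CNN that looks at both images
x = layers.Concatenate()([input1, input2])
x = layers.Conv2D(32, 3, activation="relu", padding="same")(x)
x = layers.MaxPooling2D()(x)
x = layers.Flatten()(x)
d2 = layers.Dense(64, activation="relu")(x)

# 2x3 affine matrix; a zero kernel plus an identity bias means the network
# starts out predicting the identity transform
output_bias = tf.constant_initializer([1.0, 0.0, 0.0, 0.0, 1.0, 0.0])
outLayer = layers.Dense(3 * 2, activation="linear",
                        kernel_initializer="zeros",
                        bias_initializer=output_bias)(d2)

# outLayer then parameterises the STN, which warps input1; the loss is the
# MSE between the warped input1 and input2.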
Can I use the STN only to translate and rotate input1? I ask because I have a problem with val_loss (using TensorFlow):
Epoch 1/90
30/30 [==============================] - 81s 3s/step - loss: 16.0190 - val_loss: 24.3248
Epoch 2/90
30/30 [==============================] - 79s 3s/step - loss: 13.9868 - val_loss: 21.4465
Epoch 3/90
30/30 [==============================] - 73s 2s/step - loss: 13.2970 - val_loss: 21.3151
Epoch 4/90
30/30 [==============================] - 69s 2s/step - loss: 12.9244 - val_loss: 21.6154
Epoch 5/90
30/30 [==============================] - 67s 2s/step - loss: 12.6868 - val_loss: 20.0113
Epoch 6/90
30/30 [==============================] - 66s 2s/step - loss: 12.4998 - val_loss: 20.8911
Epoch 7/90
30/30 [==============================] - 69s 2s/step - loss: 12.3066 - val_loss: 21.4276
Epoch 8/90
30/30 [==============================] - 67s 2s/step - loss: 12.1034 - val_loss: 21.3593
Epoch 9/90
30/30 [==============================] - 69s 2s/step - loss: 11.7645 - val_loss: 20.8941
Epoch 10/90
30/30 [==============================] - 67s 2s/step - loss: 11.6544 - val_loss: 20.4768
Epoch 11/90
30/30 [==============================] - 70s 2s/step - loss: 11.5483 - val_loss: 21.7420
Epoch 12/90
30/30 [==============================] - 68s 2s/step - loss: 11.5680 - val_loss: 19.4531
Here is my code and the result of training.
batch_size = 100
epochs = 50
yale_history = yale_classifier.fit(x_train, y_train_oh, batch_size=batch_size, epochs=epochs, validation_data=(x_train, y_train_oh))
Epoch 1/50
20/20 [==============================] - 32s 2s/step - loss: 3.9801 - accuracy: 0.2071 - val_loss: 3.6919 - val_accuracy: 0.0245
Epoch 2/50
20/20 [==============================] - 30s 2s/step - loss: 1.2557 - accuracy: 0.6847 - val_loss: 4.1914 - val_accuracy: 0.0245
Epoch 3/50
20/20 [==============================] - 30s 2s/step - loss: 0.4408 - accuracy: 0.8954 - val_loss: 4.6284 - val_accuracy: 0.0245
Epoch 4/50
20/20 [==============================] - 30s 2s/step - loss: 0.1822 - accuracy: 0.9592 - val_loss: 4.9481 - val_accuracy: 0.0398
Epoch 5/50
20/20 [==============================] - 30s 2s/step - loss: 0.1252 - accuracy: 0.9760 - val_loss: 5.3728 - val_accuracy: 0.0276
Epoch 6/50
20/20 [==============================] - 30s 2s/step - loss: 0.0927 - accuracy: 0.9816 - val_loss: 5.7009 - val_accuracy: 0.0260
Epoch 7/50
20/20 [==============================] - 30s 2s/step - loss: 0.0858 - accuracy: 0.9837 - val_loss: 6.0049 - val_accuracy: 0.0260
Epoch 8/50
20/20 [==============================] - 30s 2s/step - loss: 0.0646 - accuracy: 0.9867 - val_loss: 6.3786 - val_accuracy: 0.0260
Epoch 9/50
20/20 [==============================] - 30s 2s/step - loss: 0.0489 - accuracy: 0.9898 - val_loss: 6.5156 - val_accuracy: 0.0260
You can see that I also used the training data as the validation data. It is weird that the training loss is not the same as the validation loss. Further, when I evaluated the model, it seemed as if it had not been trained at all, as follows:
yale_classifier.evaluate(x_train, y_train_oh)
62/62 [==============================] - 6s 96ms/step - loss: 7.1123 - accuracy: 0.0260
[7.112329483032227, 0.026020407676696777]
Do you have any recommendations for solving this problem?
I am new to deep learning, but I know that the loss values are usually small numbers like 0.4 or 0.6, whereas the loss values I get look quite different. Could someone please tell me what is happening?
In the epoch logs, the loss numbers look like 4736.9226.
code:
#3d model
from keras.layers import Conv3D, MaxPooling3D, BatchNormalization, Dropout, Dense, Flatten, concatenate
from keras.models import Model
from keras import Input
# 3D Convolutional Model:
input_model=Input(shape=(10,250,250,1))
layer=Conv3D(32,(3,3,3),strides=(1,1,1),activation='relu')(input_model)
layer=MaxPooling3D((2,2,2))(layer)
layer=Conv3D(64,(3,3,3),strides=(1,1,1),activation='relu')(layer)
layer=MaxPooling3D((2,2,2))(layer)
layer=BatchNormalization()(layer)
layer=Flatten()(layer)
layer=Dense(128,activation='relu')(layer)
layer=Dropout(0.1)(layer)
layer=Dense(64,activation='relu')(layer)
layer=Dropout(0.1)(layer)
layer=Dense(32,activation='relu')(layer)
layer=Dropout(0.1)(layer)
layer_output=Dense(2,activation='softmax')(layer)
model_3dConv=Model(input_model,layer_output)
model_3dConv.summary()
Epoch output:
Epoch 1/10
13/13 [==============================] - 91s 7s/step - loss: 4736.9226 - acc: 0.4918 - val_loss: 387258.4062 - val_acc: 0.5625
Epoch 2/10
13/13 [==============================] - 90s 7s/step - loss: 4021.6621 - acc: 0.5050 - val_loss: 246713.4844 - val_acc: 0.5625
Epoch 3/10
13/13 [==============================] - 89s 7s/step - loss: 3532.2977 - acc: 0.5936 - val_loss: 166724.2500 - val_acc: 0.5625
Whereas if I use a 2D model, this does not happen.
Update after 10 epochs:
Epoch 1/10
13/13 [==============================] - 91s 7s/step - loss: 4736.9226 - acc: 0.4918 - val_loss: 387258.4062 - val_acc: 0.5625
Epoch 2/10
13/13 [==============================] - 90s 7s/step - loss: 4021.6621 - acc: 0.5050 - val_loss: 246713.4844 - val_acc: 0.5625
Epoch 3/10
13/13 [==============================] - 89s 7s/step - loss: 3532.2977 - acc: 0.5936 - val_loss: 166724.2500 - val_acc: 0.5625
Epoch 4/10
13/13 [==============================] - 94s 7s/step - loss: 2712.2616 - acc: 0.5445 - val_loss: 112906.0078 - val_acc: 0.5625
Epoch 5/10
13/13 [==============================] - 89s 7s/step - loss: 3779.4980 - acc: 0.5557 - val_loss: 75516.1641 - val_acc: 0.5625
Epoch 6/10
13/13 [==============================] - 89s 7s/step - loss: 3778.8524 - acc: 0.5036 - val_loss: 53132.3477 - val_acc: 0.6875
Epoch 7/10
13/13 [==============================] - 91s 7s/step - loss: 3544.4086 - acc: 0.4869 - val_loss: 36817.3906 - val_acc: 0.6875
There is absolutely no reason the loss values must be small numbers like 0.4 or 0.6. I'm not sure which loss function you're using here, but, for example, MSE can be any number in [0, inf). All that matters is that your loss/val_loss goes down each epoch, which it looks like yours is doing.
Train for the full 10 epochs and check that val_loss keeps dropping and, hopefully, val_acc goes up.
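As a quick illustration that the magnitude by itself means nothing, the same 10% prediction error gives very different MSE values depending on the scale of the data (plain NumPy, unrelated to your model):
import numpy as np

y_true = np.array([100.0, 200.0, 300.0])
y_pred = y_true * 1.1                                 # 10% error on every sample

print(np.mean((y_true - y_pred) ** 2))                # ~466.67

# Same data rescaled to [0, 1]: identical relative error, tiny MSE
print(np.mean((y_true / 300 - y_pred / 300) ** 2))    # ~0.0052
So whether 4736 is "bad" depends on the scale of your data and the loss you chose, not on the number itself.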