I'm building a CNN model using TensorFlow, without using any frontend APIs such as Keras. I'm creating a VGG-16 model with the pre-trained weights, and I want to fine-tune the last layers to serve my purpose.
Following the tutorial here, http://cv-tricks.com/tensorflow-tutorial/training-convolutional-neural-network-for-image-classification/
I re-created the training script and modified it as per my requirements. However, the model does not train: the training accuracy is stuck at 50.00%, and the validation accuracy forms a repeating pattern of the same numbers.
Attached is a screenshot of this.
I have been stuck on this for days now and can't seem to find the error. Any help is appreciated.
The code is pretty long, so here is a gist with the full script.
Your cross-entropy is wrong: you are comparing your logits with the softmax of your logits.
This:
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits=layer_fc2,
                                                        labels=y_pred)
Should be:
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits=layer_fc2,
                                                        labels=y_true)
Some things to note: I would not train on a data point and then evaluate on that same data point; your training accuracy is probably going to be biased by doing so. Another point to note is that tf.argmax(tf.nn.softmax(logits)) is the same as tf.argmax(logits).
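As a quick illustration of that last point (a minimal sketch, assuming TF 2.x eager execution; with the TF 1.x style used in the question you would evaluate these tensors in a session):

import tensorflow as tf

# softmax is monotonic, so it never changes which class has the largest score
logits = tf.constant([[2.0, 1.0, 0.5],
                      [0.1, 3.0, 0.2]])
print(tf.argmax(tf.nn.softmax(logits), axis=1).numpy())  # [0 1]
print(tf.argmax(logits, axis=1).numpy())                 # [0 1]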
Related
I am building a two-layer neural network from scratch on the Fashion MNIST dataset, using ReLU as the activation in between and softmax cross-entropy on the last layer. I am getting the below learning curve between train and validation accuracy, which is obviously wrong. But if you look at my loss curve, it's decreasing, yet my model is not learning. I can't wrap my head around where I am going wrong. Could anyone explain these two graphs, i.e. where I could possibly be going wrong?
I don't know exactly what you are doing, and I don't know anything about your architecture, but it's wrong to use ReLU on the last layer.
Usually you leave the last layer linear (no activation). This produces the logits that enter the softmax, and the softmax output then approximates a probability distribution over the classes.
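For illustration, a minimal numpy sketch of that layout (W1, b1, W2, b2 are hypothetical weight names, not your actual variables):

import numpy as np

def forward(x, W1, b1, W2, b2):
    h = np.maximum(0.0, x @ W1 + b1)  # hidden layer: ReLU
    logits = h @ W2 + b2              # last layer: linear, no activation
    return logits

def softmax_cross_entropy(logits, y_onehot):
    # numerically stable log-softmax, then the cross-entropy
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -(y_onehot * log_probs).sum(axis=1).mean()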
This could be a reason for your results.
I am creating a CNN using TensorFlow, and while training I find that performance on the training dataset is still improving (i.e. the loss is still decreasing), while the test/validation loss has converged and is no longer improving. (Learning curve plot attached below.)
Does anyone know why this might be the case and how could I possibly fix it, to have the validation loss reduce along with the training? Would be greatly appreciated!
Plot of my model's learning curve:
The plot of losses is very typical. Your model appears to be performing very well, with a very low MSE loss. At this point you have essentially reached the limits of your model's performance. One thing which may help is to use an adjustable learning rate. The Keras callback ReduceLROnPlateau can be set up to monitor the validation loss: if the validation loss fails to decrease for a "patience" number of epochs, the learning rate will be reduced by a factor "factor", where factor is a number less than 1. Documentation is here.
You may also want to use the Keras EarlyStopping callback. This callback can be set to monitor validation loss and halt training if it fails to decrease for a "patience" number of epochs. If you set restore_best_weights=True, it will leave your model with the weights from the epoch with the lowest validation loss, which prevents you from ending up with an overfit model. My recommended code is shown below.
# halve the learning rate if val_loss does not improve for 1 epoch
rlronp = tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=1)
# stop training if val_loss does not improve for 3 epochs; keep the best weights
estop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=3, restore_best_weights=True)
callbacks = [rlronp, estop]
In model.fit, include callbacks=callbacks. I suspect neither of the above will provide much improvement; you will probably have to try some changes to your model as well. Adding a Dropout layer may help to some degree to reduce overfitting, as would including regularization. Documentation for that is here. Of course, the standard approach of getting a larger dataset may also help, but that is not always easy to achieve. If you are working with images, you could try image augmentation, using say the Keras ImageDataGenerator or the TensorFlow image augmentation layers. Documentation for that is here. One thing I found that helps in the case of images is to crop your images to just the region of interest (ROI). For example, if you were doing face recognition, cropping the images to just the face will help significantly.
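For example, a minimal sketch of a dense head with dropout and L2 regularization (the layer sizes and rates here are placeholders, not tuned recommendations):

import tensorflow as tf
from tensorflow.keras import layers, regularizers

model = tf.keras.Sequential([
    layers.Dense(128, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-4)),  # L2 weight penalty
    layers.Dropout(0.3),  # randomly zeroes 30% of activations during training
    layers.Dense(1),      # output layer
])
# compile as usual, then pass the callbacks defined above:
# model.fit(x_train, y_train, validation_data=(x_val, y_val), callbacks=callbacks)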
This means you're hitting your architecture's limit: training loss will keep decreasing (this is known as overfitting), which will eventually increase validation loss. Make changes to the parameters, or consider altering your layers (adding, removing, etc.), and maybe even look into ways you could alter the dataset.
When this happened to me a while ago, I added an LSTM layer to my CNN architecture and also incorporated K-means validation. This is not a walkthrough; you need to figure this out for your specific problem. Good luck.
I'm trying transfer learning with a VGG16 model pre-trained on ImageNet, on a dataset of retinal images, but I'm confused to be getting a graph like this. I don't know why the validation accuracy doesn't increase normally over the epochs, the way the training accuracy does. Is it a sign of overfitting? If yes, how can I overcome it?
My first suggestion would be to use a ResNet (a network that contains residual connections) as a first step towards improvement, in order to avoid the vanishing gradient problem.
VGGs have become less used and are no longer relevant for benchmarking. What you should use instead is a ResNet50, which is available in tensorflow.keras.applications alongside other relevant pre-trained neural networks.
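A minimal sketch of swapping in a pre-trained ResNet50 (the input shape, head, and num_classes are placeholders for your own setup):

import tensorflow as tf

num_classes = 2  # placeholder
base = tf.keras.applications.ResNet50(weights="imagenet", include_top=False,
                                      input_shape=(224, 224, 3))
base.trainable = False  # freeze the pre-trained backbone at first
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])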
Also, the validation accuracy fluctuates very much; in addition to the previously mentioned possible improvement, you may want to recheck the construction of your training and validation sets.
I'm trying to do binary classification with a deep neural network (specifically VGG16) in Keras. Unfortunately I have a very imbalanced dataset (15,000/1,800 images) and just can't find a way to circumvent that.
Results that I'm seeing
(on training and validation data)
Recall = 1
Precision = 0.1208 (which is exactly the ratio between class 0 and class 1 samples)
AUC = 0.88 (after ~30 epochs with SGD, which seems to be 1 - Precision)
What I've done
Switching from loss/accuracy metrics to AUC with this little helper
Utilizing class_weight as described here, which doesn't seem to help (see the sketch after this list)
Trying different optimizers (SGD, Adam, RMSProp)
Adding BatchNormalization layers to my (untrained) VGG16 and setting use_bias to False on the convolutional layers. See my whole network as a gist here.
Doing augmentation to enlarge the dataset with Keras' built-in ImageDataGenerator.
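For concreteness, a sketch of what such a class_weight setup typically looks like (the values here are just inverse class frequencies for the 15,000/1,800 split, not exactly what I ran):

# inverse-frequency weights for 15,000 (class 0) vs 1,800 (class 1) samples
n0, n1 = 15000, 1800
class_weight = {0: (n0 + n1) / (2.0 * n0),  # ~0.56, down-weights the majority class
                1: (n0 + n1) / (2.0 * n1)}  # ~4.67, up-weights the minority class
# then passed to training, e.g.:
# model.fit_generator(train_gen, steps_per_epoch=steps_per_epoch, epochs=30,
#                     class_weight=class_weight)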
What I think could help further (but did not try yet)
Doing more data augmentation for one class than the other. Unfortunately I'm using one ImageDataGenerator for my whole training data and I don't know how to augment one class more than the other.
Maybe a custom loss function which penalises false decisions more? How would I implement that (see the sketch after this list)? Currently I'm just using binary_crossentropy.
Theoretically I could adjust the class-membership-threshold for prediction but that doesn't help with training and would not improve the result, right?
Maybe decrease the batch size, as suggested here. But I don't really see why that should help. Currently I'm determining the batch size programmatically, to show all the training and validation data to the network in one epoch:
# one full pass over the training / validation files per epoch
steps_per_epoch = len(train_gen.filenames) // args.batch_size
validation_steps = len(val_gen.filenames) // args.batch_size
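As a note on the custom-loss idea above, a sketch of what a weighted binary cross-entropy could look like (untested for my case; pos_weight is a hypothetical knob, e.g. the ~8.3 class ratio):

from tensorflow.keras import backend as K

def weighted_bce(pos_weight):
    # penalises errors on the positive (minority) class pos_weight times harder
    def loss(y_true, y_pred):
        y_pred = K.clip(y_pred, K.epsilon(), 1.0 - K.epsilon())  # avoid log(0)
        bce = -(pos_weight * y_true * K.log(y_pred)
                + (1.0 - y_true) * K.log(1.0 - y_pred))
        return K.mean(bce)
    return loss

# model.compile(optimizer="sgd", loss=weighted_bce(8.3))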
What do you think should I tackle first or do you have a better idea? I'm also glad for every help with implementation details.
Thank you so much in advance!
Maybe try to prepare class-balanced batches (including doublings of class 1) as described in https://community.rstudio.com/t/ensure-balanced-mini-batches-while-training/7505 (RStudio). Also read Neural Network - Working with an imbalanced dataset and balancing an imbalanced dataset with keras image generator.
Another possibility is to perform feature extraction during pre-processing, i.e. running image-processing algorithms over the images to highlight characteristic features.
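A minimal sketch of that doubling idea (pos_files and neg_files are hypothetical lists of image paths for class 1 and class 0):

import random

# naive oversampling: repeat the minority-class paths until the classes are
# roughly balanced, then shuffle so every batch mixes both classes
reps = max(1, len(neg_files) // len(pos_files))  # e.g. 15000 // 1800 = 8
balanced_files = neg_files + pos_files * reps
random.shuffle(balanced_files)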
Recently, I used Keras to train a network to classify pictures, using model.fit_generator() to fit my model. fit_generator() automatically runs the model on the validation data and returns a validation accuracy when an epoch finishes.
But an odd thing happened: when I used the model to predict on the validation data and compared the results with the correct classes, the validation accuracy was lower than what I got from fit_generator().
I have two assumptions:
1. I use a generator to get data from a directory, so I assume that within a single epoch the generator may repeatedly fetch samples that the model already fits well, so the reported accuracy may be higher.
2. Keras may use some tricks or preprocess the data when doing validation, thus enhancing the accuracy.
I tried looking through the Keras source code and documentation, but nothing helped. I would be very thankful if anyone could give me some advice on this problem.