I'm using Keras to classify EEG signals, which are processed with different feature extraction methods. I constantly run into an issue where the validation loss is far greater than 1 regardless of the method used. I am using sparse categorical crossentropy as the loss function, and the network is a standard residual neural network with 50 layers (ResNet50). The validation set is made by choosing random samples from the training set. I hope someone has an idea, and I really hope it is caused by a silly mistake.
Related
I'm using a relatively simple neural network with fully connected layers in Keras. For some reason, the accuracy jumps almost to its final value after only one training epoch (likewise, the loss drops sharply). I've tried architectures with larger and smaller numbers of hidden layers too. This network also performs poorly on the testing data, so I am trying to find a better architecture or improve my training set accordingly.
It is trained on a set of 6500 1D array-like data, and I'm using a batch size of 512.
As Murilo said, it is hard to say much without more information, but it can come from several things:
Your network learns through the batches of each epoch, meaning that
your ~13 batches (6500/512 is about 12.7, rounded up) are already
enough to learn a good bit of classification.
Your weights are not well initialized and produce a huge
loss for the first epoch. The massive decrease in the loss is
actually the solver 'squishing' the weights. The best explanation I
found for this comes from A. Karpathy in his 'makemore' tutorial:
https://youtu.be/P6sfmUTpUmc?t=260
Now, this sudden decrease of the loss is not extreme here (from 0.5 to 0.2), so I would not worry much. I agree with Murilo that low accuracy in validation can come from too few samples in your validation set, or from bad shuffling between the train and validation sets.
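Both points can be checked with a few lines of NumPy. This is only a sketch: the number of classes, the logit scales and the sample count are assumptions chosen for illustration, not values from the question. It counts the weight updates in one epoch and compares the cross-entropy of near-zero logits (roughly ln of the number of classes) with the cross-entropy of large random logits, which is what a poorly scaled initialization tends to produce.

```python
import math
import numpy as np

def steps_per_epoch(num_samples: int, batch_size: int) -> int:
    """Number of weight updates in one epoch, counting the final partial batch."""
    return math.ceil(num_samples / batch_size)

def mean_cross_entropy(logits: np.ndarray, labels: np.ndarray) -> float:
    """Mean sparse categorical cross-entropy computed from raw logits."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())

rng = np.random.default_rng(0)
num_classes = 2  # assumption: the question does not state the number of classes
labels = rng.integers(0, num_classes, size=1000)

# Near-zero logits: the network is "unsure", loss is close to ln(num_classes).
small_logits = 0.01 * rng.standard_normal((1000, num_classes))
# Large random logits: the network is confidently wrong on many samples.
large_logits = 10.0 * rng.standard_normal((1000, num_classes))

updates = steps_per_epoch(6500, 512)
loss_small = mean_cross_entropy(small_logits, labels)
loss_large = mean_cross_entropy(large_logits, labels)

print(f"updates per epoch: {updates}")
print(f"loss with small logits: {loss_small:.3f} (ln {num_classes} = {math.log(num_classes):.3f})")
print(f"loss with large logits: {loss_large:.3f}")
```

If the first-epoch loss is far above ln(number of classes), most of the early drop is probably the optimizer shrinking the output scale rather than learning the task.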
Today I was working on a classifier to detect whether or not a mushroom was poisonous given its features. The data was in a .csv file (read into a pandas DataFrame), and the link to the data can be found at the end.
I used scikit-learn's train_test_split function to split the data into training and testing sets.
I then removed the column that specified whether or not the mushroom was poisonous from the training and testing data and assigned it to the yTrain and yTest variables as labels.
I then applied a one-hot-encoding (Using pd.get_dummies()) to the data since the parameters were categorical.
After this, I normalized the training and testing input data.
Essentially, the training and testing input data were distinct lists of one-hot-encoded parameters, and the output data was a list of ones and zeros representing the output (one meant poisonous, zero meant edible).
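A minimal sketch of this preprocessing is shown here, with a detail worth checking. The toy rows and column names are made up, and the split is positional rather than train_test_split, purely to keep the example deterministic. If pd.get_dummies() is applied to the training and testing frames separately, as the order of the steps above suggests, the two frames can end up with different columns whenever a category is missing from one of them; reindexing the test frame to the training columns is one way to keep them aligned.

```python
import pandas as pd

# Hypothetical toy rows standing in for the mushroom CSV; column names are made up.
df = pd.DataFrame({
    "cap_shape": ["x", "b", "x", "f", "b", "x"],
    "odor": ["p", "a", "n", "n", "a", "p"],
    "label": ["p", "e", "e", "e", "e", "p"],
})

# Labels: 1 = poisonous, 0 = edible.
y = (df["label"] == "p").astype(int)
X = df.drop(columns=["label"])

# Positional split for illustration only (the post used train_test_split).
X_train, X_test = X.iloc[:4], X.iloc[4:]
y_train, y_test = y.iloc[:4], y.iloc[4:]

# One-hot encode the training data, then force the test data onto the same columns.
X_train_enc = pd.get_dummies(X_train, dtype=int)
X_test_enc = pd.get_dummies(X_test, dtype=int).reindex(
    columns=X_train_enc.columns, fill_value=0
)

print(X_train_enc.columns.tolist())
print(X_test_enc.shape)
```

Encoding the full DataFrame before splitting avoids the mismatch as well, since every category is seen once.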
I used Keras and a simple feed-forward network for this project. The network consists of three layers: a Dense layer (a Linear layer, for PyTorch users) with 300 neurons, a Dense layer with 100 neurons, and a Dense layer with two neurons, which output the probabilities that the given mushroom is poisonous or edible. I used Adam as the optimizer and sparse categorical crossentropy as the loss function.
I trained my network for 60 epochs. After about 5 epochs the loss was basically zero, and my accuracy was 1. After training, I was worried that my network had overfitted, so I tried it on my distinct testing data. The results were the same as the training and validation data; the accuracy was at 100% and my loss was negligible.
My validation loss at the end of 50 epochs is 2.258996e-07, and my training loss is 1.998715e-07. My testing loss was 4.732502e-09. I am really confused by this: is the loss supposed to be this low? I don't think I am overfitting, and my validation loss is only a bit higher than my training loss, so I don't think that I am underfitting either.
Do any of you know the answer to this question? I am sorry if I have messed up in some silly way.
Link to dataset: https://www.kaggle.com/uciml/mushroom-classification
It seems that this Kaggle dataset is solvable, in the sense that you can create a model which gives the correct answer 100% of the time (if these results are to be believed). If you look at those results, you can see that the author was actually able to find models which give 100% accuracy using several methods, including decision trees.
I am building a predictive model to predict whether a package will be delivered on time (binary yes/no). If the package is not delivered on time, I also want to predict when it will be delivered, in categories of <7 days, <14 days, <21 days, >28 days after the expected date.
I have built and tested a model for binary classification and got an F-score of 0.92, which is satisfactory for my needs. However, when I train my categorical model, I start to see training accuracy and validation accuracy diverge (training accuracy is much better than validation accuracy). This is a sign of overfitting.
However, I have tried regularization with different values, plus dropout with different values, and the validation accuracy never gets above 0.7. My total training set has ~10k examples, with ~3k for validation, and whilst the categorical spread is not equal, there are sufficient examples of each category (I think). I am using a NN and have increased and decreased both the layers and the activations, and still no joy.
Any thoughts on where to go next? Thanks
Because you are using a NN, introduce dropout layers and see whether they help to reduce the overfitting problem. Also check out "How to choose the number of hidden layers and nodes in a feedforward neural network?"
A more complex network (more hidden layers, more neurons in each of them) also contributes to the overfitting problem.
The approach we have chosen is to carry out a linear regression with the expected duration as the target variable. We excluded some outliers and then took the differences between the actual and predicted days. We then took the maximum and minimum of those differences, so we now have a prediction with a tolerable range. We will keep working on the other techniques to see if we can improve. Thanks to everyone who suggested ideas.
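The idea can be sketched as follows. The data here is synthetic, and the single feature (a shipping distance), its coefficients and the helper predict_range are hypothetical; the outlier exclusion step is left out. The sketch fits an ordinary least-squares line, takes the differences between actual and predicted days, and uses their minimum and maximum as the lower and upper offsets around each new prediction.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical data: one feature and the actual delivery duration in days.
distance = rng.uniform(10, 500, size=200)
actual_days = 2.0 + 0.02 * distance + rng.normal(0.0, 1.0, size=200)

# Ordinary least squares with an intercept column.
A = np.column_stack([np.ones_like(distance), distance])
coef, *_ = np.linalg.lstsq(A, actual_days, rcond=None)
predicted_days = A @ coef

# Differences between actual and predicted days; min and max bound the range.
residuals = actual_days - predicted_days
low, high = residuals.min(), residuals.max()

def predict_range(new_distance: float) -> tuple[float, float, float]:
    """Return (lower bound, point prediction, upper bound) in days."""
    point = coef[0] + coef[1] * new_distance
    return point + low, point, point + high

lower, point, upper = predict_range(100.0)
print(f"predicted {point:.1f} days, range {lower:.1f} to {upper:.1f}")
```

A range built from the extreme residuals is sensitive to single unusual deliveries, which is presumably why the outliers were removed first; residual quantiles would be a less extreme alternative.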
I'm trying to do a binary classification with a deep neural network (specifically VGG16) in Keras. Unfortunately I have a very imbalanced dataset (15,000/1,800 images) but just can't find a way to get around that.
Results that I'm seeing
(on training and validation data)
Recall = 1
Precision = 0.1208 (which is exactly the ratio between class 0 and class 1 samples)
AUC = 0.88 (after ~30 epochs with SGD, which seems to be 1 - Precision)
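One pattern that produces exactly this combination (recall of 1 and precision equal to the share of positive samples) is a model that predicts the positive class for every input. Whether that is what happens here is an assumption; the counts below are hypothetical and chosen only so that the positive share is 0.1208.

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels, with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical evaluation set: 1208 positives and 8792 negatives.
y_true = [1] * 1208 + [0] * 8792
# A degenerate classifier that always answers "positive".
y_pred = [1] * len(y_true)

precision, recall = precision_recall(y_true, y_pred)
print(f"precision = {precision:.4f}, recall = {recall:.1f}")
```

Looking at the distribution of predicted probabilities on the validation set would show quickly whether the network has collapsed to a single class at the chosen threshold.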
What I've done
Switching from loss/accuracy metrics to AUC with this little helper
Utilizing class_weight like described here which doesn't seem to help
Trying different optimizers (SGD, Adam, RMSProp)
Adding BatchNormalization layers to my (untrained) VGG16 and set use_bias to False on Convolutional Layers. See my whole network as a gist here.
Using augmentation to enlarge the dataset with Keras' built-in ImageDataGenerator.
What I think could help further (but have not tried yet)
Doing more data augmentation for one class than the other. Unfortunately I'm using one ImageDataGenerator for my whole training data and I don't know how to augment one class more than the other.
Maybe a custom loss-function which penalises false decisions more? How would I implement that? Currently I'm just using binary_crossentropy.
Theoretically I could adjust the class-membership-threshold for prediction but that doesn't help with training and would not improve the result, right?
Maybe decrease the batch size as suggested here. But I don't really see why that should help. Currently I'm determining the number of steps programmatically so that all the training and validation data is shown to the network in one epoch:
steps_per_epoch = int(len(train_gen.filenames) / args.batch_size)
validation_steps = int(len(val_gen.filenames) / args.batch_size)
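On the custom loss idea, here is a minimal NumPy sketch of a class-weighted binary cross-entropy. The function name and the weighting scheme are assumptions for illustration: the weights follow the common "balanced" heuristic n_samples / (n_classes * class_count), and the minority class is assumed to be label 1. A version for Keras would need the same arithmetic written with tensor operations.

```python
import numpy as np

def weighted_binary_crossentropy(y_true, y_prob, weight_pos, weight_neg, eps=1e-7):
    """Binary cross-entropy where each class contributes with its own weight."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip probabilities so that log(0) never occurs.
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1.0 - eps)
    per_sample = -(
        weight_pos * y_true * np.log(y_prob)
        + weight_neg * (1.0 - y_true) * np.log(1.0 - y_prob)
    )
    return float(per_sample.mean())

# Class counts from the question; which class is the minority label is assumed.
n_major, n_minor = 15000, 1800
total = n_major + n_minor
weight_neg = total / (2 * n_major)  # weight for the majority class (label 0)
weight_pos = total / (2 * n_minor)  # weight for the minority class (label 1)

# Same confidence, opposite mistakes: a missed positive vs. a false alarm.
loss_miss = weighted_binary_crossentropy([1], [0.1], weight_pos, weight_neg)
loss_false_alarm = weighted_binary_crossentropy([0], [0.9], weight_pos, weight_neg)
print(f"weights: pos={weight_pos:.3f}, neg={weight_neg:.3f}")
print(f"missed positive: {loss_miss:.3f}, false alarm: {loss_false_alarm:.3f}")
```

Passing a class_weight dictionary to fit applies a comparable per-class scaling of the loss, so a custom loss mainly pays off when the penalty should depend on something other than the class label.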
What do you think I should tackle first, or do you have a better idea? I'm also grateful for any help with implementation details.
Thank you so much in advance!
Maybe try to prepare class-balanced batches (which include duplicates of class 1), as described in https://community.rstudio.com/t/ensure-balanced-mini-batches-while-training/7505 (RStudio). Also read "Neural Network - Working with a imbalanced dataset" and "balancing an imbalanced dataset with keras image generator".
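A rough sketch of balanced batch sampling is shown here. The helper name balanced_batch_indices is hypothetical, and it only produces sample indices; loading and augmenting the images behind those indices is left out. Each class contributes the same number of samples per batch, drawn with replacement, so minority images are repeated.

```python
import numpy as np

def balanced_batch_indices(labels, batch_size, rng):
    """Draw indices for one batch with an equal number of samples per class."""
    labels = np.asarray(labels)
    classes = np.unique(labels)
    per_class = batch_size // len(classes)
    picks = [
        # Sampling with replacement lets the small class fill its share.
        rng.choice(np.flatnonzero(labels == c), size=per_class, replace=True)
        for c in classes
    ]
    batch = np.concatenate(picks)
    rng.shuffle(batch)  # avoid a fixed class order inside the batch
    return batch

rng = np.random.default_rng(0)
# Class counts from the question: 15,000 samples of class 0 and 1,800 of class 1.
labels = np.array([0] * 15000 + [1] * 1800)

batch = balanced_batch_indices(labels, batch_size=32, rng=rng)
print("class 1 samples in batch:", int(labels[batch].sum()), "of", len(batch))
```

Because minority samples are repeated often, combining this with augmentation helps to keep the repeated images from being identical.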
Another possibility is to perform feature extraction during pre-processing, that is, to run image-processing algorithms over the images to highlight characteristic features.
Can anyone please help me out?
I am working on my thesis, which is about predicting Parkinson's disease. I want to build an LSTM model that adapts independently of patients. Currently I have implemented it using TensorFlow with my own loss function.
I am planning to include both labeled and unlabeled training data in every batch used to train the model. I want to apply my own loss function to both the labeled and unlabeled training data, and apply cross-entropy loss only to the labeled training data. Can I do this in TensorFlow?
So my question is: can I combine loss functions in a single model training, with each applied to a different set of training data?
From an implementation perspective, the short answer would be yes. However, I believe your question could be more specific, maybe what you mean is whether you could do it with tf.estimator?
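To illustrate why the answer is yes, here is a minimal NumPy sketch of the masking idea. Everything specific in it is an assumption: unlabeled samples are marked with the label -1, the entropy penalty is only a stand-in for the asker's own loss, and alpha is a hypothetical weighting factor. Cross-entropy is averaged over the labeled rows only, while the second term uses every row of the batch.

```python
import numpy as np

def cross_entropy_labeled_only(probs, labels, is_labeled, eps=1e-7):
    """Cross-entropy averaged over the labeled samples; unlabeled rows are masked."""
    probs = np.clip(probs, eps, 1.0)
    safe_labels = np.where(is_labeled, labels, 0)  # placeholder index for unlabeled rows
    per_sample = -np.log(probs[np.arange(len(probs)), safe_labels])
    mask = is_labeled.astype(float)
    return float((per_sample * mask).sum() / max(mask.sum(), 1.0))

def entropy_penalty(probs, eps=1e-7):
    """Hypothetical stand-in for the custom loss, applied to every sample."""
    probs = np.clip(probs, eps, 1.0)
    return float(-(probs * np.log(probs)).sum(axis=1).mean())

def combined_loss(probs, labels, is_labeled, alpha=0.5):
    """Supervised term on labeled rows plus a weighted term on all rows."""
    supervised = cross_entropy_labeled_only(probs, labels, is_labeled)
    return supervised + alpha * entropy_penalty(probs)

# One mixed batch: two labeled samples and two unlabeled ones (label -1).
probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5], [0.6, 0.4]])
labels = np.array([0, 1, -1, -1])
is_labeled = labels >= 0

ce = cross_entropy_labeled_only(probs, labels, is_labeled)
loss = combined_loss(probs, labels, is_labeled)
print(f"cross-entropy on labeled rows: {ce:.4f}, combined loss: {loss:.4f}")
```

In TensorFlow the same structure can be expressed with tensor operations, for example by multiplying the per-sample cross-entropy by a 0/1 mask or selecting rows with tf.boolean_mask before averaging, so that gradients from the cross-entropy term come only from labeled samples.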