My val-accuracy is far lower than the training accuracy. What might be the reasons for this?
Thank you.
Seems your problem is all about overfitting. To understand what are the causes behind overfitting problem, first is to understand what is overfitting.
One resource is in the link about overfitting.
To eliminate this issue, there are several things you should check. Especially for your model:
1) Are you using dropout? Check dropout.
2) Are you using regularization? Check regularization.
Note: These two are one of the two important things to utilize. This list may be a lot longer if you dig deeper.
Furthermore, there may be some problems in your dataset. For example:
Your test-train split may be not suitable for your case.
Your dataset may be too small to train a network. Maybe you should generate or collect more data.(Eg: if you're classifying images, you can flip the images or use some augmentation techniques to artificially increase the size of your dataset.
My overall suggestion is to understand What are the main reasons causing overfitting in machine learning? since the given answers are so limited.
I think overfitting problem, try to generalize your model more by Regulating and using Dropout layers on. You can do another task, maybe there are periodic variation of your inputted datasets, so try to shuffle on your both train and text datasets. I think the problem will solve. Thank you
Related
I'm currently working on a image classification task, involving a large datasets of grayscale images of cartoons and my CNN needs to classify them. Atm my model has a test accuracy of about 88% but I know a higher accuracy is possible.
I've tried:
improving / changing the actual model / architecture
using different meta parameters
different loss functions from the pytorch libraries
a bunch of different transforms
different optimizes from torch.optim
I've also tried a bunch of the standard models included in torchvision.models and am still getting sub 90% accuracy on the test set.
Do I just need to keep trying the above things to squeeze out better accuracy or are there any other avenues I can try? Would really appreciate any suggestions, the only other thing I can think of would be making my own custom loss function specific for the data set but I'm not exactly sure how much that would help?
From what you've described, it sounds like it might be worth spending some time on the data preparation. Here is a good article on how to do that for images. Some ideas you could try are:
Resizing all your images to a fixed size
Subtracting mean pixel values, i.e. normalizing the dataset
I don't really know the context of what you're doing but I would also consider adding additional features that may be relevant and seeing if that helps.
When I run my NN the only way to get any training to occur is if I divide X by 1000. The network also needs to be trained under 70000 times with a 0.03 training rate and if those values are larger the NN gets worse. I think this is a due to bad processing of data and maybe the lack of having biases, but I don't really know.
Code on Google Colab
In short: all of the problems you mentioned and more.
Scaling is essential, typically to 0 mean and a variance of 1. Otherwise, you will quickly saturate the hidden units, their gradients will be near zero and (almost) no learning will be possible.
Bias is mandatory for such ANN. It's like an offset for fitting linear function. If you drop it, getting good fit will be very difficult.
You seem to be checking accuracy on your training data.
You have very few training samples.
Sigmoid is proven to be poor choice. Use ReLU and check e.g. here for explanation.
Also, I'd recommend spending some time on learning Python before going into this. For starter, avoid using global, it can get you unforeseen behaviour if you're not careful.
I have a dataset with 11k instances containing 0s,1s and -1s. I heard that deep learning can be applied to feature values.Hence applied the same for my dataset but surprisingly it resulted in less accuracy (<50%) compared to traditional machine learning algos (RF,SVM,ELM). Is it appropriate to apply deep learning algos to feature values for classification task? Any suggestion is greatly appreciated.
First of all, Deep Learning isn't a mythical hammer you can throw at every problem and expect better results. It requires careful analysis of your problem, choosing the right method, crafting your network, properly setting up your training, and only then, with a lot of luck will you see significantly better results than classical methods.
From what you describe (and without any more details about your implementation), it seems to me that there could have been several things going wrong:
Your task is simply not designed for a neural network. Some tasks are still better solved with classical methods, since they manually account for patterns in your data, or distill your advanced reasoning/knowledge into a prediction. You might not be directly aware of it, but sometimes neural networks are just overkill.
You don't describe how your 11000 instances are distributed with respect to the target classes, how big the input is, what kind of preprocessing you are performing for either method, etc, etc. Maybe your data is simply processed wrong, your training is diverging due to unfortunate parameter setups, or plenty of other things.
To expect a reasonable answer, you would have to share at least a bit of code regarding the implementation of your task, and parameters you are using for training.
I want to ask if there is an optimal sequence length of a LSTM network in general, or in terms of time series prediction problems?
I read about vanishing gradient or exploding gradient problems that very long RNN networks had and LSTM tried to solve and succeeded to a certain extent.
I also heard about techniques to handle very large sequences with LSTM’s and RNN’s in general like: truncating sequences, summarizing sequences, truncating backpropagation through time or even using an Encoder-Decoder architecture.
I asked this question because I didn’t find a research paper about this, only this blog post that stated an optimal sequence length between 10-30.
Do some Model Selection.
TLDR: Just try it out.
Because training is already very computationally expensive, the easiest way to calculate how successful a model would be is to test it out. The combination that works best cannot be easily predetermined, especially not with such a vague description (or no description at all) of how the actual problem looks like.
From this answer:
It totally depends on the nature of your data and the inner correlations, there is no rule of thumb. However, given that you have a large amount of data a 2-layer LSTM can model a large body of time series problems / benchmarks.
So in your case, you might want to try sequence lengths from 10 - 30. But I'd also try and evaluate how your training algorithm performs outside of that recommendation by the post you linked.
I'm working on a training a neural network model using Python and Keras library.
My model test accuracy is very low (60.0%) and I tried a lot to rise it, but I couldn't. I'm using DEAP dataset (total 32 participants) to train the model. The splitting technique that I'm using is a fixed one. It was as the followings:28 participants for training, 2 for validation and 2 for testing.
For the model I'm using is as follows.
sequential model
Optimizer = Adam
With L2_regularizer, Gaussian noise, dropout, and Batch normalization
Number of hidden layers = 3
Activation = relu
Compile loss = categorical_crossentropy
initializer = he_normal
Now, I'm using train-test technique (fixed one also) to split the data and I got better results. However, I figured out that some of the participants are affecting the training accuracy in a negative way. Thus, I want to know if there is a way to study the effect of the each data (participant) on the accuracy (performance) of a model?
Best Regards,
From my Starting deep learning hands-on: image classification on CIFAR-10 tutorial, in which I insist on keeping track of both:
global metrics (log-loss, accuracy),
examples (correctly and incorrectly classifies cases).
The later may help us telling which kinds of patterns are problematic, and on numerous occasions helped me with changing the network (or supplementing training data, if it was the case).
And example how does it work (here with Neptune, though you can do it manually in Jupyter Notebook, or using TensorBoard image channel):
And then looking at particular examples, along with the predicted probabilities:
Full disclaimer: I collaborate with deepsense.ai, the creators or Neptune - Machine Learning Lab.
This is, perhaps, more broad an answer than you may like, but I hope it'll be useful nevertheless.
Neural networks are great. I like them. But the vast majority of top-performance, hyper-tuned models are ensembles; use a combination of stats-on-crack techniques, neural networks among them. One of the main reasons for this is that some techniques handle some situations better. In your case, you've run into a situation for which I'd recommend exploring alternative techniques.
In the case of outliers, rigorous value analyses are the first line of defense. You might also consider using principle component analysis or linear discriminant analysis. You could also try to chase them out with density estimation or nearest neighbors. There are many other techniques for handling outliers, and hopefully you'll find the tools I've pointed to easily implemented (with help from their docs); sklearn tends to readily accept data prepared for Keras.