Methods for increasing accuracy of a CNN for image classification - python

I'm currently working on an image classification task involving a large dataset of grayscale cartoon images that my CNN needs to classify. At the moment my model has a test accuracy of about 88%, but I know a higher accuracy is possible.
I've tried:
improving / changing the model architecture
using different hyperparameters
different loss functions from the PyTorch library
a bunch of different transforms
different optimizers from torch.optim
I've also tried a bunch of the standard models included in torchvision.models and am still getting sub-90% accuracy on the test set.
Do I just need to keep iterating on the above to squeeze out better accuracy, or are there other avenues I can try? I'd really appreciate any suggestions. The only other thing I can think of would be writing a custom loss function specific to the dataset, but I'm not exactly sure how much that would help.

From what you've described, it sounds like it might be worth spending some time on data preparation. Here is a good article on how to do that for images. Some ideas you could try are:
Resizing all your images to a fixed size
Subtracting mean pixel values, i.e. normalizing the dataset (a sketch follows below)
I don't really know the context of what you're doing but I would also consider adding additional features that may be relevant and seeing if that helps.
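Since the question mentions PyTorch, here is a minimal sketch of both preparation steps, assuming torchvision is available; the folder path "data/train" and the 128x128 target size are illustrative assumptions, and the grayscale statistics are computed from the training set itself.

    import torch
    from torchvision import datasets, transforms

    # Hypothetical training folder; images are grayscale cartoons as described.
    raw = datasets.ImageFolder(
        "data/train",
        transform=transforms.Compose([
            transforms.Grayscale(num_output_channels=1),
            transforms.Resize((128, 128)),  # fixed size for every image
            transforms.ToTensor(),
        ]),
    )

    # Estimate per-channel mean and std over the training set.
    loader = torch.utils.data.DataLoader(raw, batch_size=256)
    n, mean, sq_mean = 0, 0.0, 0.0
    for images, _ in loader:
        b = images.size(0)
        mean += images.mean(dim=(0, 2, 3)) * b
        sq_mean += (images ** 2).mean(dim=(0, 2, 3)) * b
        n += b
    mean /= n
    std = (sq_mean / n - mean ** 2).sqrt()

    # Final transform that normalizes with the computed statistics.
    train_tf = transforms.Compose([
        transforms.Grayscale(num_output_channels=1),
        transforms.Resize((128, 128)),
        transforms.ToTensor(),
        transforms.Normalize(mean.tolist(), std.tolist()),
    ])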

Related

Training Independent Object Detection Models

I'm working on a research project for detecting and segmenting two different defects in a material, given an input image of that material.
I started by focusing on one defect since it was predominant in the training set. I implemented the Mask R-CNN (Matterport) model and adapted it for PNG annotation masks. It works really well after some time spent fine-tuning it.
It might be naive/easy for most of you but my question is:
Is it preferable/advantageous to train independent models for each class of objects (two models, one per defect) if computational time/power is not a limitation? I would feed an image to both models in parallel; one would return the instances of defect 1 and the other the instances of defect 2.
The reason for this question is a feeling that if you train a single model for multi-class detection, then in minimizing the overall loss you optimize the weights to work reasonably well for both classes, rather than optimizing the weights and losses separately for each class, and you might lose some detection/segmentation accuracy.
A common approach would be to try both alternatives: 1. single model for both classes and 2. two independent models for two classes.
I will eventually implement both alternatives and compare them. However, I want to know if the second alternative has already been tested and what has been the experience in order to properly justify this alternative if a paper comes out of this research.
In most cases, training a separate model for each class improves performance when you have many classes and computational resources are not an issue. But since you have only two classes, training two different models is unlikely to give much improvement in accuracy. You can try both approaches, but the benefit mainly shows up when there are many classes to detect.
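If you do try the two-model alternative, the data side is mostly bookkeeping. As a rough sketch (plain Python, assuming COCO-style annotation dicts rather than Matterport's PNG masks), you would split the annotations per defect and train one single-class model on each subset:

    def split_annotations_by_class(annotations, class_ids):
        """Split COCO-style annotations into one subset per defect class,
        so each subset can train its own single-class detector."""
        per_class = {c: [] for c in class_ids}
        for ann in annotations:
            cid = ann["category_id"]
            if cid in per_class:
                # Each defect becomes category 1 in its own single-class dataset.
                per_class[cid].append({**ann, "category_id": 1})
        return per_class

    # Hypothetical usage, with the defects annotated as category ids 1 and 2:
    # subsets = split_annotations_by_class(coco["annotations"], class_ids=[1, 2])
    # Train one Mask R-CNN per subset, then run both models on each image.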

Is a VGG-based CNN model sometimes better for image classification than a modern architecture?

I have an image classification task to solve, but under quite simple/favorable conditions:
There are only two classes (either good or not good)
The images always show the same kind of piece (either with or without a fault)
That piece is always filmed from the same angle & distance
I have at least 1000 sample images for both classes
So I thought it should be easy to come up with a good CNN solution - and it was. I created a VGG16-based model with a custom classifier (Keras/TF). Via transfer learning I was able to achieve up to 100% validation accuracy during model training, so all is fine on that end.
Out of curiosity, and because the VGG-based approach seems a bit "slow", I also wanted to try a more modern model architecture as the base, so I did with ResNet50V2 and Xception. I trained them similarly to the VGG-based model and tried several hyperparameter modifications, etc. However, I was not able to achieve a validation accuracy better than 95%, so much worse than with the "old" VGG architecture.
Hence my question: given these "simple" (always the same) images and only two classes, is the VGG model perhaps a better base than a modern network like ResNet or Xception? Or is it more likely that I messed something up in my model or simply didn't get the training/hyperparameters right?
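For reference, here is a minimal sketch of the kind of ResNet50V2 transfer-learning setup described, assuming Keras/TF; the input size and head layers are illustrative. One detail worth checking in a comparison like this: each Keras application expects its own preprocess_input (VGG16 subtracts channel means, ResNet50V2 rescales to [-1, 1]), and mixing them up is a common source of exactly this kind of accuracy gap.

    import tensorflow as tf
    from tensorflow.keras import layers, models
    from tensorflow.keras.applications import ResNet50V2
    from tensorflow.keras.applications.resnet_v2 import preprocess_input

    # Frozen ImageNet base with a small binary-classification head.
    base = ResNet50V2(weights="imagenet", include_top=False,
                      input_shape=(224, 224, 3), pooling="avg")
    base.trainable = False

    inputs = layers.Input(shape=(224, 224, 3))
    x = preprocess_input(inputs)          # architecture-specific preprocessing
    x = base(x, training=False)
    x = layers.Dropout(0.2)(x)
    outputs = layers.Dense(1, activation="sigmoid")(x)

    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])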

How should I approach a 300 classes classification machine learning problem?

I am trying to build a multi-class classification application, but my dataset has 300 classes. Is it possible to train my model with all these classes on a normal PC?
Sure it is. You can even train ImageNet with 1000 categories or more, if you have enough time! ;)
You just have to think about which loss function you want (categorical crossentropy, sparse categorical crossentropy, or even binary crossentropy if you want to penalize each output node independently); apart from that, there's not really much difference between 10, 100, or 1000 classes.
Of course, you have to increase your model size to account for more classes, so RAM may be an issue, but then you can always decrease the batch size. If you are using images and convnets and your model is still too large, try downsampling the images, or use pooling layers or larger strides.
If your computer is too old and slow, you can also try Google Colab which offers free GPU and even TPU online!
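To make the loss-function point concrete, here is a minimal Keras sketch of a small convnet with 300 output classes; the 64x64 grayscale input shape is an assumption. Sparse categorical crossentropy lets the labels stay as integers 0..299 instead of one-hot vectors.

    import tensorflow as tf
    from tensorflow.keras import layers, models

    NUM_CLASSES = 300

    model = models.Sequential([
        layers.Input(shape=(64, 64, 1)),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),            # pooling keeps the model small
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(256, activation="relu"),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    # If RAM is tight, pass a smaller batch_size to model.fit().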
It is difficult to answer this question. The training time of your model depends on a number of factors. It might be best to train your model for a certain number of hours and evaluate the performance. You could also fit a learning curve, which can provide an estimate of how many data points you need to achieve a certain performance. After that, you could relate the required number of data points to computation time.
Here is an article that provides an algorithm for fitting a learning curve: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3307431/.
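The article linked above fits inverse power-law learning curves; a rough sketch of that idea with scipy (the accuracy numbers here are made up for illustration):

    import numpy as np
    from scipy.optimize import curve_fit

    # Accuracy measured on growing subsets of the training data (made-up values).
    train_sizes = np.array([500, 1000, 2000, 4000, 8000])
    accuracies = np.array([0.62, 0.70, 0.76, 0.81, 0.84])

    def power_law(n, a, b, c):
        """Inverse power-law learning curve: acc(n) = a - b * n**(-c)."""
        return a - b * n ** (-c)

    params, _ = curve_fit(power_law, train_sizes, accuracies,
                          p0=[0.9, 5.0, 0.5], maxfev=10000)

    # Extrapolate to estimate how much data a target accuracy would need.
    print("asymptotic accuracy estimate:", params[0])
    print("predicted accuracy at 50k samples:", power_law(50_000, *params))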

Neural network library for true-false based image recognition

I'm in need of an artificial neural network library (preferably in Python) for one (simple) task. I want to train it so that it can tell whether a thing is in an image. I would train it by feeding it lots of pictures and telling it whether it contains the thing I'm looking for or not:
These images contain this thing, return True (or probability of it containing the thing)
These images do not contain this thing, return False (or probability of it containing the thing)
Does such a library already exist? I'm fairly new to ANNs and image recognition; although I understand how they both work in principle, I find it quite hard to find an adequate library for this task, and even researching the field has proven rather frustrating. Any advice pointing me in the right direction is greatly appreciated.
There are several good neural network options in Python, including TensorFlow, Caffe, Lasagne, and sknn (Sci-kit Neural Network). sknn provides an easy, out-of-the-box solution, although in my opinion it is more difficult to customize and can be slow on large datasets.
One thing to consider is whether you want to use a CNN (Convolutional Neural Network) or a standard ANN. With an ANN you will most likely have to "unroll" your images into a vector, whereas a CNN expects the image as a cube (if in color; a square otherwise).
Here is a good resource on CNNs in Python.
However, since you aren't really doing multi-class image classification (for which CNNs are the current gold standard) but rather single-object recognition, you may consider a transformed-image approach, such as one using the Histogram of Oriented Gradients (HOG); see the sketch after this answer.
In any case, the accuracy of a neural network approach, especially when using CNNs, is highly dependent on successful hyperparameter tuning. Unfortunately, there isn't yet any general theory on what hyperparameter values (number and size of layers, learning rate, update rule, dropout percentage, batch size, etc.) are optimal in a given situation. So be prepared to set up proper training, validation, and test sets in order to fit a robust model.
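To illustrate the HOG suggestion, a minimal sketch using scikit-image and scikit-learn; the image size and HOG parameters are typical defaults, not tuned values:

    import numpy as np
    from skimage.feature import hog
    from skimage.transform import resize
    from sklearn.svm import LinearSVC

    def hog_features(image, size=(128, 128)):
        """Resize a grayscale image and extract its HOG descriptor."""
        image = resize(image, size)
        return hog(image, orientations=9, pixels_per_cell=(8, 8),
                   cells_per_block=(2, 2))

    # Hypothetical data: lists of grayscale arrays and True/False labels.
    # X = np.array([hog_features(img) for img in images])
    # y = np.array(labels, dtype=int)
    # clf = LinearSVC().fit(X, y)
    # clf.predict(new_X) then returns 1 (True) or 0 (False) per image.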
I am unaware of any library that can do this for you out of the box. I use Caffe a lot and can give you a workaround until you find a single library that does it for you.
I hope you know about ImageNet and that Caffe has a trained model based on ImageNet.
Here is the idea:
Define what the object is. Say object = "laptop".
Use Caffe's ImageNet-trained model and change the code to display the required output you want (you mentioned TRUE or FALSE) when the object is among the output labels.
Here is a link to the ImageNet tutorial which I wrote.
Here is what you might try:
Take a look here. It is a stripped down version of the ImageNet program which I used in a prediction engine.
In line 80 you'll get the top-1 predicted output label; in line 86 you'll get the top-5 predicted labels. Write a line of code to check whether object is in the output_label and return TRUE or FALSE accordingly.
I understand that you are looking for a specific library; I will keep looking for one, but this is something I would try out to begin with.
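The final True/False check described above is essentially a one-liner; contains_object here is a hypothetical helper, and the label strings mimic ImageNet's comma-separated synonym format:

    def contains_object(predicted_labels, target="laptop"):
        """Return True if the target appears among the top predicted labels
        (e.g. the top-5 output of an ImageNet-trained classifier)."""
        return any(target in label.lower() for label in predicted_labels)

    # Example with made-up top-5 labels:
    print(contains_object(["notebook, notebook computer",
                           "laptop, laptop computer",
                           "desk", "screen, CRT screen", "mouse"]))  # True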

Using Pybrain to detect malicious PDF files

I'm trying to build an ANN to classify a PDF file as either malicious or clean, using the 26,000 PDF samples (both clean and malicious) found on contagiodump. For each PDF file, I used PDFid.py to parse the file and return a vector of 42 numbers. The 26,000 vectors are then passed into pybrain: 50% for training and 50% for testing. This is my source code:
https://gist.github.com/sirpoot/6805938
After much tweaking of the dimensions and other parameters, I managed to get a false positive rate of about 0.90%. This is my output:
https://gist.github.com/sirpoot/6805948
My question is: is there any explicit way for me to decrease the false positive rate further? What do I have to do to reduce the rate to, say, 0.05%?
There are several things you can try to increase the accuracy of your neural network.
Use more of your data for training. This lets the network learn from a larger set of training samples. The drawback is that a smaller test set makes your error measurements noisier. As a rule of thumb, however, I find that 80%-90% of your data can be used in the training set, with the rest for testing.
Augment your feature representation. I'm not familiar with PDFid.py, but it only returns ~40 values for a given PDF file. It's possible that there are many more than 40 features that might be relevant in determining whether a PDF is malicious, so you could conceivably use a different feature representation that includes more values to increase the accuracy of your model.
Note that this can potentially involve a lot of work -- feature engineering is difficult! One suggestion I have if you decide to go this route is to look at the PDF files that your model misclassifies, and try to get an intuitive idea of what went wrong with those files. If you can identify a common feature that they all share, you could try adding that feature to your input representation (giving you a vector of 43 values) and re-train your model.
Optimize the model hyperparameters. You could try training several different models using training parameters (momentum, learning rate, etc.) and architecture parameters (weight decay, number of hidden units, etc.) chosen randomly from some reasonable intervals; see the sketch after this list. This is one way to do what is called "hyperparameter optimization" and, like feature engineering, it can involve a lot of work. However, unlike feature engineering, hyperparameter optimization can largely be done automatically and in parallel, provided you have access to many processing cores.
Try a deeper model. Deep models have become quite "hot" in the machine learning literature recently, especially for speech processing and some types of image classification. Using stacked RBMs, a second-order learning method (PDF), or a different nonlinearity like the rectified linear activation function, you can add multiple layers of hidden units to your model, and sometimes this will help improve your error rate.
These are the ones that come to mind right off the bat. Good luck!
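A rough sketch of the random-search idea from the hyperparameter suggestion above; train_and_evaluate is a hypothetical stand-in for training the PDF classifier with one configuration and returning its validation error:

    import random

    def sample_hyperparameters():
        """Draw one configuration at random from reasonable intervals."""
        return {
            "learning_rate": 10 ** random.uniform(-4, -1),
            "momentum": random.uniform(0.5, 0.99),
            "hidden_units": random.choice([16, 32, 64, 128]),
            "weight_decay": 10 ** random.uniform(-6, -3),
        }

    def train_and_evaluate(config):
        # Stand-in: train the network with these hyperparameters and return
        # the validation error. A dummy score is used so the sketch runs.
        return random.random()

    best_config, best_error = None, float("inf")
    for _ in range(20):                     # 20 independent random trials
        config = sample_hyperparameters()
        error = train_and_evaluate(config)
        if error < best_error:
            best_config, best_error = config, error
    print(best_config, best_error)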
Let me first say I am in no way an expert in neural networks. But I played with PyBrain once, and I used the .train() method in a while error > 0.001 loop to get the error rate I wanted. So you could try using all of your samples for training with that loop and then test the network on other files.
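For reference, a minimal sketch of that kind of loop with PyBrain; the hidden-layer size and the dummy data are assumptions, and trainer.train() runs one epoch and returns the average error:

    import numpy as np
    from pybrain.datasets import SupervisedDataSet
    from pybrain.supervised.trainers import BackpropTrainer
    from pybrain.tools.shortcuts import buildNetwork

    # 42-dimensional feature vectors, as in the question.
    ds = SupervisedDataSet(42, 1)
    for _ in range(100):                  # dummy samples so the sketch runs;
        vec = np.random.rand(42)          # replace with real PDFid.py vectors
        ds.addSample(vec, (float(vec.sum() > 21),))

    net = buildNetwork(42, 20, 1, bias=True)  # hidden-layer size is a guess
    trainer = BackpropTrainer(net, ds)

    for epoch in range(500):              # cap epochs so the loop terminates
        error = trainer.train()           # one epoch; returns average error
        if error < 0.001:                 # stop at the desired error rate
            break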
