Neural network library for true-false based image recognition

Neural network library for true-false based image recognition - python

I'm in need of an artificial neural network library (preferably in python) for one (simple) task. I want to train it so that it can tell wether a thing is in an image. I would train it by feeding it lots of pictures and telling it wether it contains the thing I'm looking for or not:
These images contain this thing, return True (or probability of it containing the thing)
These images do not contain this thing, return False (or probability of it containing the thing)
Does such a library already exist? I'm fairly new to ANNs and image recognition; although I understand how they both work in principle I find it quite hard to find an adequate library for this task, and even research in this field has proven to be kind of a frustration - any advice towards the right direction is greatly appreciated.

There are several good Neural Network approaches in Python, including TensorFlow, Caffe, Lasagne, and sknn (Sci-kit Neural Network). sknn provides an easy, out of the box solution, although in my opinion it is more difficult to customize and can be slow on large datasets.
One thing to consider is whether you want to use a CNN (Convolutional Neural Network) or a standard ANN. With an ANN you will mostly likely have to "unroll" your images into a vector whereas with a CNN, it expects the image to be a cube (if in color, a square otherwise).
Here is a good resource on CNNs in Python.
However, since you aren't really doing a multiclass image classification (for which CNNs are the current gold standard) and doing more of a single object recognition, you may consider a transformed image approach, such as one using the Histogram of Oriented Gradients (HOG).
In any case, the accuracy of a Neural Network approach, especially when using CNNs, is highly dependent on successful hyperparamter tuning. Unfortunately, there isn't yet any kind of general theory on what hyperparameter values (number and size of layers, learning rate, update rule, dropout percentage, batch size, etc.) are optimal in a given situation. So be prepared to have a nice Training, Validation, and Test set setup in order to fit a robust model.

I am unaware of any library which can do this for you. I use a lot of Caffe and can give you a solution till you find a single library which can do it for you.
I hope you know about ImageNet and that Caffe has a trained model based on ImageNet.
Here is the idea:
Define what the object is. Say object = "laptop".
Use Caffe's ImageNet trained model, change the code to display the required output you want (you mentioned TRUE or FALSE) when the object is in the output labels.
Here is a link to the ImageNet tutorial which I wrote.
Here is what you might try:
Take a look here. It is a stripped down version of the ImageNet program which I used in a prediction engine.
In line 80 you'll get the top-1 predicted output label. In line 86 you'll get the top-5 predicted labels. Write a line of code to check whether object is in the output_label and return TRUE or FALSE according to it.
I understand that you are looking for a specific library, I will look for it, but this is something I would try out in the beginning.

Related

Recognition of nipple exposure in the image, and Automatically cover nipple area

I'd like to implement something like the title, but I wonder if it's technically possible.
I know that it is possible to recognize pictures with CNN,
but I don't know if can be automatically covered nipple area.
If have library information about any related information,
I would like to get some advice.

CNNs are able to detect whatever you train them for, to varying degree of accuracy. What you would need are a lot of training samples (ie. samples of ground truths with the original image, and the labeled image) with which to train your models, and then some new data which you can test the accuracy of your model on. The point is, CNNs are not biased to innately learn a task, you have to tell them what to learn!
I can recommend the machine learning library Keras (https://keras.io/) if you plan to do some machine learning using CNNs, as it's pretty simple and somewhat beginner-friendly. Take some of the tutorials for CNNs, which are quite good.
Essentially, you have what I can only assume is a pretty niche problem. The main issue will come down to how much data you have to train your model. CNNs need a lot of training data, especially for a problem like this which isn't simple. A way which would make this simpler would be to have a model which detects the ahem area of interest and denotes it as such on a per-pixel basis. Then a simple mask could be applied to the source image to censor it. This relates to image segmentation, and there are many academic papers on the topic.

How to study the effect of each data on a deep neural network model?

I'm working on a training a neural network model using Python and Keras library.
My model test accuracy is very low (60.0%) and I tried a lot to rise it, but I couldn't. I'm using DEAP dataset (total 32 participants) to train the model. The splitting technique that I'm using is a fixed one. It was as the followings:28 participants for training, 2 for validation and 2 for testing.
For the model I'm using is as follows.
sequential model
Optimizer = Adam
With L2_regularizer, Gaussian noise, dropout, and Batch normalization
Number of hidden layers = 3
Activation = relu
Compile loss = categorical_crossentropy
initializer = he_normal
Now, I'm using train-test technique (fixed one also) to split the data and I got better results. However, I figured out that some of the participants are affecting the training accuracy in a negative way. Thus, I want to know if there is a way to study the effect of the each data (participant) on the accuracy (performance) of a model?
Best Regards,

From my Starting deep learning hands-on: image classification on CIFAR-10 tutorial, in which I insist on keeping track of both:
global metrics (log-loss, accuracy),
examples (correctly and incorrectly classifies cases).
The later may help us telling which kinds of patterns are problematic, and on numerous occasions helped me with changing the network (or supplementing training data, if it was the case).
And example how does it work (here with Neptune, though you can do it manually in Jupyter Notebook, or using TensorBoard image channel):
And then looking at particular examples, along with the predicted probabilities:
Full disclaimer: I collaborate with deepsense.ai, the creators or Neptune - Machine Learning Lab.

This is, perhaps, more broad an answer than you may like, but I hope it'll be useful nevertheless.
Neural networks are great. I like them. But the vast majority of top-performance, hyper-tuned models are ensembles; use a combination of stats-on-crack techniques, neural networks among them. One of the main reasons for this is that some techniques handle some situations better. In your case, you've run into a situation for which I'd recommend exploring alternative techniques.
In the case of outliers, rigorous value analyses are the first line of defense. You might also consider using principle component analysis or linear discriminant analysis. You could also try to chase them out with density estimation or nearest neighbors. There are many other techniques for handling outliers, and hopefully you'll find the tools I've pointed to easily implemented (with help from their docs); sklearn tends to readily accept data prepared for Keras.

Way to embed fixed length spectograms to tensor with CNN perhaps

I'm developing a way to compare two spectrograms and score their similarity.
I have been thinking for a long time how to do so, how to pick the whole model/approach.
Audioclips I'm using to make spectrograms are recordings from android phone, i convert them from .m4a to .wav and then process them to plot the spectrogram, all in python.
All audio recordings have same length
Thats something that really help because all the data can then be represented in the same dimensional space.
I filtered the audio using Butterworth Bandpass Filter, which is commonly used in voice filtering thanks to its steady behavior in the persisted part of signal. As cutoff freq i used 400Hz and 3500Hz
After this procedure the output looks like this
My first idea was to find region of interest using OpenCV on that spectrogram, so i filtered color and get this output, which can be roughly use to get the limits of the signal, but that will make every clip different lenght and i perhaps dont want that to happen
Now to get to my question - i was thinking about embedding those spectrograms to multidimensional points and simply score their accuracy as the distance to the most accurate sample, which would be visualisable thanks to dimensionality reduction in some cluster-like space. But thats seems to plain, doesnt involve training and thus making it hard to verify. SO
Is there any possibility to use Convolution Neural Network, or combination of networks like CNN -> delayed NN to embed this spectogram to multidim point and thus making it possible to not compare them directly but comparing output of the network?
If there is anything i missed in this question please comment, i would fix that right away, thank you very much for your time.
Josef K.
EDIT:
After tip from Nikolay Shmyrev i switched to using the Mel spectrogram:
That looks much more promising, but my question remains almost the same, can i use pretrained CNN models, like VGG16 to embed those spectrograms to tensors and thus being able to compare them ?? And if so, how? Just remove last fully connected layer and Flatten it instead?

In my opinion, and according to Yann Lecun, when you target speech recognition with Deep Neural Network you have two obligations:
You need to use a Recurrent Neural Network in order to have the memory ability (memory is really important for speech recognition...)
and
you will need a lot of training data 
You may try to use RNN on tensorflow, but you definitely need a lot of training data.
If you don't want (or can't) find or generate a lot training data, you have forget the deep learning to solve this ...
In that case (forget deep learning) you may take a look of how Shazam work (based on fingerprint algorithm)

You can use CNN of course, tensorflow has special classes for that for example as many other frameworks. You simply convert your image to a tensor and apply the network and as a result you get lower-dimensional vector you can compare.
You can train your own CNN too.
For best accuracy it is better to scale lower frequencies (bottom part) and compress higher frequencies in your picture since lower frequencies have more importance. You can read about Mel Scale for more information

Using Pybrain to detect malicious PDF files

I'm trying to make an ANN to classify a PDF file as either malicious or clean, by utilising the 26,000 PDF samples (both clean and malicious) found on contagiodump. For each PDF file, I used PDFid.py to parse the file and return a vector of 42 numbers. The 26000 vectors are then passed into pybrain; 50% for training and 50% for testing. This is my source code:
https://gist.github.com/sirpoot/6805938
After much tweaking with the dimensions and other parameters I managed to get a false positive rate of about 0.90%. This is my output:
https://gist.github.com/sirpoot/6805948
My question is, is there any explicit way for me to decrease the false positive rate further? What do I have to do to reduce the rate to perhaps 0.05%?

There are several things you can try to increase the accuracy of your neural network.
Use more of your data for training. This will permit the network to learn from a larger set of training samples. The drawback of this is that having a smaller test set will make your error measurements more noisy. As a rule of thumb, however, I find that 80%-90% of your data can be used in the training set, with the rest for test.
Augment your feature representation. I'm not familiar with PDFid.py, but it only returns ~40 values for a given PDF file. It's possible that there are many more than 40 features that might be relevant in determining whether a PDF is malicious, so you could conceivably use a different feature representation that includes more values to increase the accuracy of your model.
Note that this can potentially involve a lot of work -- feature engineering is difficult! One suggestion I have if you decide to go this route is to look at the PDF files that your model misclassifies, and try to get an intuitive idea of what went wrong with those files. If you can identify a common feature that they all share, you could try adding that feature to your input representation (giving you a vector of 43 values) and re-train your model.
Optimize the model hyperparameters. You could try training several different models using training parameters (momentum, learning rate, etc.) and architecture parameters (weight decay, number of hidden units, etc.) chosen randomly from some reasonable intervals. This is one way to do what is called "hyperparameter optimization" and, like feature engineering, it can involve a lot of work. However, unlike feature engineering, hyperparameter optimization can largely be done automatically and in parallel, provided you have access to a lot of processing cores.
Try a deeper model. Deep models have become quite "hot" in the machine learning literature recently, especially for speech processing and some types of image classification. By using stacked RBMs, a second-order learning method (PDF), or a different nonlinearity like a rectified linear activation function, then you can add multiple layers of hidden units to your model, and sometimes this will help improve your error rate.
These are the ones that come to mind right off the bat. Good luck !

Let me first say I am in no ways an expert in Neural Networks. But I played with pyBrain once and I used the .train() method in a while error < 0.001 loop to get the error rate I wanted. So you can try using all of them for training with that loop and test it with other files.

Can a neural network recognize a screen and replicate a finite set of actions?

I learned, that neural networks can replicate any function.
Normally the neural network is fed with a set of descriptors to its input neurons and then gives out a certain score at its output neuron. I want my neural network to recognize certain behaviours from a screen. Objects on the screen are already preprocessed and clearly visible, so recognition should not be a problem.
Is it possible to use the neural network to recognize a pixelated picture of the screen and make decisions on that basis? The amount of training data would be huge of course. Is there way to teach the ANN by online supervised learning?
Edit:
Because a commenter said the programming problem would be too general:
I would like to implement this in python first, to see if it works. If anyone could point me to a resource where i could do this online-learning thing with python, i would be grateful.

I would suggest
http://www.neuroforge.co.uk/index.php/getting-started-with-python-a-opencv
http://docs.opencv.org/doc/tutorials/ml/table_of_content_ml/table_of_content_ml.html
http://blog.damiles.com/2008/11/the-basic-patter-recognition-and-classification-with-opencv/
https://github.com/bytefish/machinelearning-opencv
openCV is basically an image processing library but also has some amazing helper classes that you you can use for almost any task. Its machine learning module is pretty easy to use and you can go through the source to see explanation and background theory about each function.
You could also use a pure python machine learning library like:
http://scikit-learn.org/stable/
But, before you feed in the data from your screen (i'm assuming thats in pixels?) to your ANN or SVM or whatever ML algorithm you choose, you need to perform "Feature Extraction" on your data. (which are the objects on the screen)
Feature Extraction can be thought of like representing the same data on the screen but with fewer numbers so i have less numbers to give to my ANN. You need to experiment with different features before you find a combination that works well for your particular scenario. a sample one could look something like this:
[x1,y1,x2,y2...,col]
This is basically a list of edge points that represent the area your object is in. a sort of ROI (Region of Interest) and perform egde detection, color detection and also extract any other relevant characteristics. The important thing is that now all your objects, their shape/color information is represented by a number of these lists, one for each object detected.
This is the data that can be provided as input to the neural network. but you'll have to define some meaningfull output parameters depending on your specific problem statements before you can train/test your system of course.
Hope this helps.

This is not entirely correct.
A 3-layer feedforward MLP can theoretically replicate any CONTINUOUS function.
If there are discontinuities, then you need a 4th layer.
Since you are dealing with pixelated screens and such, you probably would need to consider a fourth layer.
Finally, if you are looking at circular shapes, etc., than a radial basis function (RBF) network may be more suitable.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.