My final goal is to dynamically recognize different objects in video that are 2D and often look alike (cards in a 2D deck game). I have been studying the opencv-python tutorials, but there isn't any topic about this, so I want to know which topics, libraries, or functions I should learn to reach my goal. Thanks.
You can try several machine learning techniques. You may get inspiration from the Viola-Jones or Histogram of Oriented Gradients + SVM algorithms (even though those algorithms solve a problem that may differ from yours, I got plenty of insights from them). In other words, try "sliding" a window of a predefined aspect ratio along the horizontal and vertical axes and try to recognize each Region Of Interest with a model of your choice (CNN, SVM, logistic regression, etc.). The problem is that you will need to train a model, which may require a lot of data.
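To make the sliding-window idea concrete, here is a minimal pure-Python sketch. It assumes a grayscale frame stored as nested lists, and `bright_fraction` is only a hypothetical stand-in for whatever trained model (CNN, SVM, ...) you end up with; none of these names come from a library.

```python
from typing import Callable, Iterator, List, Tuple

Image = List[List[int]]  # grayscale image as rows of pixel values


def sliding_windows(image: Image, win_h: int, win_w: int,
                    step: int = 1) -> Iterator[Tuple[int, int, Image]]:
    """Yield (row, col, patch) for every window position in the image."""
    rows, cols = len(image), len(image[0])
    for r in range(0, rows - win_h + 1, step):
        for c in range(0, cols - win_w + 1, step):
            patch = [row[c:c + win_w] for row in image[r:r + win_h]]
            yield r, c, patch


def detect(image: Image, win_h: int, win_w: int,
           classify: Callable[[Image], float],
           threshold: float = 0.5, step: int = 1) -> List[Tuple[int, int, float]]:
    """Return (row, col, score) for every window whose score reaches the threshold."""
    hits = []
    for r, c, patch in sliding_windows(image, win_h, win_w, step):
        score = classify(patch)
        if score >= threshold:
            hits.append((r, c, score))
    return hits


def bright_fraction(patch: Image) -> float:
    """Hypothetical placeholder for a trained model: fraction of bright pixels."""
    pixels = [p for row in patch for p in row]
    return sum(p > 128 for p in pixels) / len(pixels)


frame = [
    [0, 0, 0, 0],
    [0, 255, 255, 0],
    [0, 255, 255, 0],
    [0, 0, 0, 0],
]
hits = detect(frame, 2, 2, bright_fraction, threshold=1.0)
print(hits)
```

In a real pipeline you would probably run this at several window sizes and merge overlapping hits, but that depends on your data.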
Or you can do template matching, which is more of an image processing problem than a machine learning one. It would not require a dataset or training, but it will be sensitive to noise, lighting, and position.
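As a rough illustration of template matching and of its sensitivity to lighting, here is a small sketch that scores every position by the sum of squared differences (SSD). It is pure Python on nested lists, and the pixel values are made up. In practice you would more likely reach for OpenCV's `cv2.matchTemplate`, which does the same kind of search much faster.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale image as rows of pixel values


def match_template(image: Image, template: Image) -> Tuple[int, int, int]:
    """Return (row, col, ssd) of the best match by sum of squared differences."""
    th, tw = len(template), len(template[0])
    best = None
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            ssd = sum(
                (image[r + i][c + j] - template[i][j]) ** 2
                for i in range(th) for j in range(tw)
            )
            if best is None or ssd < best[2]:
                best = (r, c, ssd)
    if best is None:
        raise ValueError("template is larger than the image")
    return best


image = [
    [10, 10, 10, 10],
    [10, 10, 200, 50],
    [10, 10, 50, 200],
    [10, 10, 10, 10],
]
template = [
    [200, 50],
    [50, 200],
]
exact = match_template(image, template)

# The same scene with every pixel 40 levels brighter: the location is still
# found here, but the match is no longer perfect (SSD is far from zero).
brighter = [[p + 40 for p in row] for row in image]
shifted = match_template(brighter, template)
print(exact, shifted)
```

The non-zero SSD on the brighter frame is the lighting sensitivity mentioned above; normalized scores reduce it but do not remove it.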
Good luck!
Related
As an example I have two pictures with a particular type of clothing of a certain brand.
I can download a lot of different images of this same piece of clothing in the same color.
I want to create a model which can recognize the item based on a picture.
I tried to do it using this example:
https://www.tensorflow.org/tutorials/keras/classification.
This can recognize the type of clothing (e.g. shirt, shoe, trousers, etc.), but not a specific item and color.
My goal is to have a model that can tell me that the person on my first picture is wearing the item of my second picture.
As mentioned I can upload a few variations of this same item to train my model, if that would be the best approach.
I also tried to use https://pillow.readthedocs.io
This can do something with color recognition but does not solve my initial goal.
I don't think a CNN can help you with your problem. Take a look at the SIFT technique (see this for more details). It is used for image matching, and I think it is a better fit in your case. If you don't want to get into too much detail, OpenCV is a Python (and C++, I think) library that has image matching functions that are easy to use (more details).
As mentioned by @nadji mansouri, I would use the SIFT technique, as it suits your need. But I just want to correct one thing: a CNN is also an option in this case. That being said, I wouldn't tackle the problem as a classification problem, but rather with Distance Metric Learning, i.e., training a model to generate embeddings that are close in the embedding space when the inputs are similar, and distant otherwise. But to do this you need a large, representative dataset.
In short, I suggest starting with SIFT, using OpenCV or open source implementations on GitHub, playing around with the parameters to see what fits your case best, and then deciding whether it's really necessary to switch to a neural network. In that case, tackle the problem as a metric learning task, maybe with something like Siamese networks.
Some definitions:
Metric learning is an approach based directly on a distance metric that aims to establish similarity or dissimilarity between data (images in your case). Deep Metric Learning on the other hand uses Neural Networks to automatically learn discriminative features from the data and then compute the metric. source.
The Scale-Invariant Feature Transform (SIFT) is a method used in computer vision to detect and describe local features in images. The algorithm is invariant to image scale and rotation, and robust to changes in illumination and affine distortion. SIFT features are represented by local image gradients, which are calculated at various scales and orientations, and are used to identify keypoints in an image. These keypoints and their associated descriptor vectors can then be used for tasks such as image matching, object recognition, and structure from motion. source, with modification.
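To illustrate how the metric learning definition would be used at query time, here is a small sketch. It assumes you already have some model that turns an image into an embedding vector; the three-dimensional vectors, the item names, and the threshold below are all made up for illustration.

```python
import math
from typing import Dict, List, Tuple


def euclidean(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_item(query: List[float],
                 gallery: Dict[str, List[float]]) -> Tuple[str, float]:
    """Return the gallery item whose embedding is closest to the query."""
    name = min(gallery, key=lambda k: euclidean(query, gallery[k]))
    return name, euclidean(query, gallery[name])


def is_same_item(query: List[float], reference: List[float],
                 threshold: float) -> bool:
    """Treat two images as the same item if their embeddings are close enough."""
    return euclidean(query, reference) <= threshold


# Hypothetical embeddings of the reference product photos.
gallery = {
    "red_jacket": [0.9, 0.1, 0.0],
    "blue_jacket": [0.1, 0.9, 0.0],
}
# Hypothetical embedding of the photo of a person wearing something.
query = [0.8, 0.2, 0.1]

best_name, best_distance = nearest_item(query, gallery)
print(best_name, round(best_distance, 3))
```

The threshold has to be tuned on held-out pairs; there is no universal value.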
I do side work writing/improving a research project web application for some political scientists. This application collects articles pertaining to the U.S. Supreme Court and runs analysis on them, and after nearly a year and a half, we have a database of around 10,000 articles (and growing) to work with.
One of the primary challenges of the project is being able to determine the "relevancy" of an article - that is, the primary focus is the federal U.S. Supreme Court (and/or its justices), and not a local or foreign supreme court. Since its inception, the way we've addressed it is to primarily parse the title for various explicit references to the federal court, as well as to verify that "supreme" and "court" are keywords collected from the article text. Basic and sloppy, but it actually works fairly well. That being said, irrelevant articles can find their way into the database - usually ones with headlines that don't explicitly mention a state or foreign country (the Indian Supreme Court is the usual offender).
I've reached a point in development where I can focus on this aspect of the project more, but I'm not quite sure where to start. All I know is that I'm looking for a method of analyzing article text to determine its relevance to the federal court, and nothing else. I imagine this will entail some machine learning, but I've basically got no experience in the field. I've done a little reading into things like tf-idf weighting, vector space modeling, and word2vec (+ CBOW and Skip-Gram models), but I'm not quite seeing a "big picture" yet that shows me just how applicable these concepts can be to my problem. Can anyone point me in the right direction?
Framing the Problem
When starting a novel machine learning project like this there are a few fundamental questions to think through that can help you refine the problem and lit review + experiment more effectively.
Do you have the right data to build a model? You have ~10,000 articles that will be your model input, however, to use a supervised learning approach you will need trustworthy labels for all articles that will be used in model training. It sounds like you already have done this.
What metric(s) should you use to quantify success? How can you measure whether your model is doing what you want? In your specific case this sounds like a binary classification problem: you want to be able to label articles as relevant or not. You could measure your success using a standard binary classification metric like area under the ROC curve. Or, since you have a specific issue with false positives, you could choose a metric like precision.
How well can you do with a random or naive approach? Once a dataset and metric have been established, you can quantify how well you can do at your task with a basic approach. This could be as simple as calculating your metric for a model that chooses at random, but in your case you have your keyword parser, which is a perfect way to set a benchmark. Quantify how well your keyword parsing approach does on your dataset so you can tell when a machine learning model is doing well.
Sorry if this was obvious and basic to you, but I wanted to make sure it was in the answer. In an innovative, open-ended project like this, diving straight into machine learning experiments without thinking through these fundamentals can be inefficient.
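For the benchmark step, something along these lines would put numbers on the existing keyword parser. The labels and predictions are made-up toy data, with 1 meaning "relevant" and 0 meaning "not relevant".

```python
from typing import List, Tuple


def precision_recall(y_true: List[int], y_pred: List[int]) -> Tuple[float, float]:
    """Compute precision and recall for binary labels (1 = relevant)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hand-checked labels for eight articles (toy data).
labels = [1, 1, 1, 0, 0, 1, 0, 1]
# What the current keyword parser decided for the same articles (toy data).
keyword_baseline = [1, 1, 1, 1, 0, 1, 1, 0]

precision, recall = precision_recall(labels, keyword_baseline)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Any machine learning model you try later should beat these baseline numbers on the same articles before it is worth the extra complexity.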
Machine Learning Approaches
As suggested by Evan Mata and Stefan G, the best approach is to first reduce your articles into features. This could be done without machine learning (e.g. a vector space model) or with machine learning (word2vec and the other examples you cited). For your problem I think something like bag-of-words (BOW) makes sense to try as a starting point.
Once you have a feature representation of your articles you are almost done and there are a number of binary classification models that will do well. Experiment from here to find the best solution.
Wikipedia has a nice example of a simple way to use this two step approach in spam filtering, an analogous problem (See the Example Usage section of the article).
Good luck, sounds like a fun project!
If you have sufficient labeled data - not only for "yes, this article is relevant" but also for "no, this article is not relevant" (you're basically making a binary model between y/n relevant, so I would research spam filters) - then you can train a fair model. I don't know if you actually have a decent quantity of no-data. If you do, you could train a relatively simple supervised model by doing the following (pseudocode):
corpus = preprocess(corpus)  # remove stop words, etc.
vectors = BOW(corpus)  # or TF-IDF, or whatever model you want to use
SomeModel.train(vectors[~3/4 of them], labels[corresponding 3/4])  # labels = 1 if relevant, 0 if not
SomeModel.evaluate(vectors[remainder], labels[remainder])  # make sure the model doesn't overfit
SomeModel.predict(BOW(new_document))  # vectorize new documents the same way as the corpus
The exact model will depend on your data. A simple Naive Bayes could (probably will) work fine if you can get a decent number of no-documents. One note: you imply that you have two kinds of no-documents, those that are reasonably close (Indian Supreme Court) and those that are completely irrelevant (say, taxes). You should compare two training setups and see which one comes out better: one that uses only the "close" erroneous cases, with the "far" erroneous cases filtered out beforehand as you do now, and one that uses both the "close" and the "far" erroneous cases.
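Here is one way that recipe could look as runnable code: a small bag-of-words Naive Bayes written with only the standard library. The four training sentences are invented, and a real project would more likely use an existing library implementation, so treat this as a sketch of the idea rather than a finished classifier.

```python
import math
import re
from collections import Counter, defaultdict
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes over word counts with add-one smoothing."""

    def fit(self, docs: List[str], labels: List[int]) -> "NaiveBayes":
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter(labels)        # label -> number of documents
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc: str) -> int:
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            score = math.log(n_docs / total_docs)  # log prior
            for word in tokenize(doc):
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log(
                        (counts[word] + 1) / (total_words + len(self.vocab))
                    )
            scores[label] = score
        return max(scores, key=scores.get)


# Toy training data: 1 = relevant (federal court), 0 = not relevant.
train_docs = [
    "Supreme Court justices hear arguments in Washington",
    "Chief Justice Roberts writes majority opinion for the Supreme Court",
    "Indian Supreme Court rules in New Delhi case",
    "Supreme Court of India hears petition in Delhi",
]
train_labels = [1, 1, 0, 0]

model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict("Justices in Washington issue opinion"))
print(model.predict("Delhi petition before India court"))
```

With real data you would hold out about a quarter of the labeled articles, as in the pseudocode, and compare the model against the keyword parser on that held-out part.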
There are many, many ways to do this, and the best method changes depending on the project. Perhaps the easiest way is to run a keyword search over your articles and then empirically choose a cutoff score. Although simple, this actually works pretty well, especially for a topic like this one, where you can think of a small list of words that are highly likely to appear somewhere in a relevant article.
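A keyword score with a cutoff could be sketched like this. The keyword list, the weights (including the negative ones), and the cutoff of 2.0 are all assumptions picked for the example; you would choose them empirically from your own articles.

```python
import re
from typing import Dict

# Hypothetical weights: positive for federal-court terms, negative for
# terms that usually signal a foreign court.
KEYWORD_WEIGHTS: Dict[str, float] = {
    "supreme court of the united states": 3.0,
    "scotus": 3.0,
    "chief justice": 1.0,
    "washington": 1.0,
    "india": -3.0,
    "delhi": -3.0,
}


def keyword_score(text: str, weights: Dict[str, float]) -> float:
    """Sum weight * number of whole-phrase occurrences for every keyword."""
    lowered = text.lower()
    return sum(
        weight * len(re.findall(r"\b" + re.escape(keyword) + r"\b", lowered))
        for keyword, weight in weights.items()
    )


def is_relevant(text: str, weights: Dict[str, float], cutoff: float) -> bool:
    """An article counts as relevant when its score reaches the cutoff."""
    return keyword_score(text, weights) >= cutoff


federal = "SCOTUS ruling: the Chief Justice wrote the opinion in Washington."
foreign = "The Supreme Court in New Delhi, India, heard the petition."

print(keyword_score(federal, KEYWORD_WEIGHTS), keyword_score(foreign, KEYWORD_WEIGHTS))
```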
When a topic is more broad with something like 'business' or 'sports', keyword search can be prohibitive and lacking. This is when a machine learning approach might start to become the better idea. If machine learning is the way you want to go, then there are two steps:
1. Embed your articles into feature vectors
2. Train your model
Step 1 can be something simple like a TFIDF vector. However, embedding your documents can also be deep learning on its own. This is where CBOW and Skip-Gram come into play. A popular way to do this is Doc2Vec (PV-DM). A fine implementation is in the Python Gensim library. Modern and more complicated character, word, and document embeddings are much more of a challenge to start with, but are very rewarding. Examples of these are ELMo embeddings or BERT.
Step 2 can be a typical model, as it is now just binary classification. You can try a multilayer neural network, either fully-connected or convolutional, or you can try simpler things like logistic regression or Naive Bayes.
My personal suggestion would be to stick with TFIDF vectors and Naive Bayes. From experience, I can say that this works very well, is by far the easiest to implement, and can even outperform approaches like CBOW or Doc2Vec depending on your data.
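To show what a TFIDF vector actually contains, here is a bare-bones version using only the standard library. It uses the plain textbook formula (term frequency times log of inverse document frequency); library implementations often use smoothed or normalized variants, so the exact numbers will differ. The three documents are invented.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs: List[str]) -> List[Dict[str, float]]:
    """Return one sparse {word: tf-idf weight} vector per document."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents does each word appear?
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            word: (count / len(tokens)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return vectors


docs = [
    "the court ruled today",
    "the court in delhi ruled",
    "the justices ruled",
]
vectors = tfidf_vectors(docs)
print(vectors[1])
```

Words that appear in every document ("the", "ruled") get weight zero, while a rare, informative word like "delhi" gets the largest weight in its document. These vectors are what you would then feed to Naive Bayes or logistic regression.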
I'd like to implement something like the title, but I wonder if it's technically possible.
I know that it is possible to recognize pictures with a CNN,
but I don't know whether the nipple area can be covered automatically.
If there is a library or any other related information,
I would like to get some advice.
CNNs are able to detect whatever you train them for, to varying degrees of accuracy. What you would need are a lot of training samples (i.e. ground-truth samples consisting of the original image and the labeled image) with which to train your models, and then some new data on which you can test the accuracy of your model. The point is, CNNs are not biased to innately learn a task; you have to tell them what to learn!
I can recommend the machine learning library Keras (https://keras.io/) if you plan to do some machine learning using CNNs, as it's pretty simple and somewhat beginner-friendly. Take some of the tutorials for CNNs, which are quite good.
Essentially, you have what I can only assume is a pretty niche problem. The main issue will come down to how much data you have to train your model. CNNs need a lot of training data, especially for a problem like this, which isn't simple. One way to make this simpler would be to have a model which detects the, ahem, area of interest and denotes it as such on a per-pixel basis. Then a simple mask could be applied to the source image to censor it. This relates to image segmentation, and there are many academic papers on the topic.
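The masking step itself is the easy part. As a sketch, assume the segmentation model outputs a per-pixel probability map of the same size as the image; the values and the 0.5 threshold below are invented, and the image is a tiny grayscale grid stored as nested lists.

```python
from typing import List

Image = List[List[int]]


def mask_from_probabilities(probs: List[List[float]],
                            threshold: float = 0.5) -> List[List[bool]]:
    """Turn a per-pixel probability map into a binary mask."""
    return [[p >= threshold for p in row] for row in probs]


def censor(image: Image, mask: List[List[bool]], fill: int = 0) -> Image:
    """Replace every masked pixel with the fill value."""
    if len(image) != len(mask) or any(
        len(img_row) != len(mask_row) for img_row, mask_row in zip(image, mask)
    ):
        raise ValueError("image and mask must have the same shape")
    return [
        [fill if masked else pixel for pixel, masked in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]


image = [
    [10, 20, 30],
    [40, 50, 60],
]
# Hypothetical output of a segmentation model for the same image.
probabilities = [
    [0.1, 0.9, 0.2],
    [0.3, 0.8, 0.7],
]
censored = censor(image, mask_from_probabilities(probabilities))
print(censored)
```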
I'm in need of an artificial neural network library (preferably in Python) for one (simple) task. I want to train it so that it can tell whether a thing is in an image. I would train it by feeding it lots of pictures and telling it whether each one contains the thing I'm looking for or not:
These images contain this thing, return True (or probability of it containing the thing)
These images do not contain this thing, return False (or probability of it containing the thing)
Does such a library already exist? I'm fairly new to ANNs and image recognition; although I understand how they both work in principle I find it quite hard to find an adequate library for this task, and even research in this field has proven to be kind of a frustration - any advice towards the right direction is greatly appreciated.
There are several good neural network libraries in Python, including TensorFlow, Caffe, Lasagne, and sknn (scikit-neuralnetwork). sknn provides an easy, out-of-the-box solution, although in my opinion it is more difficult to customize and can be slow on large datasets.
One thing to consider is whether you want to use a CNN (Convolutional Neural Network) or a standard ANN. With an ANN you will most likely have to "unroll" each image into a vector, whereas a CNN expects the image as a cube (height x width x color channels) if it is in color, or as a square (height x width) otherwise.
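The difference between the two input layouts is easy to see on a tiny example. This sketch uses a made-up 2 x 2 RGB image stored as nested lists (height x width x channels); the exact axis order a given library expects is something you would need to check in its documentation.

```python
from typing import List, Tuple

ColorImage = List[List[List[int]]]  # height x width x channels


def shape(image: ColorImage) -> Tuple[int, int, int]:
    """Return (height, width, channels) of a nested-list image."""
    return len(image), len(image[0]), len(image[0][0])


def unroll(image: ColorImage) -> List[int]:
    """Flatten the image cube into one long vector for a plain ANN."""
    return [value for row in image for pixel in row for value in pixel]


# A 2 x 2 RGB image: red, green / blue, white.
image = [
    [[255, 0, 0], [0, 255, 0]],
    [[0, 0, 255], [255, 255, 255]],
]

cube_shape = shape(image)   # what a CNN would consume
vector = unroll(image)      # what a plain ANN would consume
print(cube_shape, len(vector))
```

Unrolling throws away the information about which pixels are neighbors, which is one reason CNNs tend to do better on images.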
Here is a good resource on CNNs in Python.
However, since you aren't really doing a multiclass image classification (for which CNNs are the current gold standard) and doing more of a single object recognition, you may consider a transformed image approach, such as one using the Histogram of Oriented Gradients (HOG).
In any case, the accuracy of a neural network approach, especially when using CNNs, is highly dependent on successful hyperparameter tuning. Unfortunately, there isn't yet any kind of general theory on what hyperparameter values (number and size of layers, learning rate, update rule, dropout percentage, batch size, etc.) are optimal in a given situation. So be prepared to have a nice training, validation, and test set setup in order to fit a robust model.
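A training/validation/test setup can be as simple as the following sketch. The 60/20/20 proportions and the fixed seed are arbitrary choices for the example, and the "samples" are just integers standing in for (image, label) pairs.

```python
import random
from typing import List, Sequence, Tuple


def split_dataset(samples: Sequence, val_fraction: float = 0.2,
                  test_fraction: float = 0.2,
                  seed: int = 0) -> Tuple[List, List, List]:
    """Shuffle the samples and split them into train, validation, and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = round(len(shuffled) * test_fraction)
    n_val = round(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


samples = list(range(100))  # stand-ins for (image, label) pairs
train, val, test = split_dataset(samples)
print(len(train), len(val), len(test))
```

You would tune hyperparameters against the validation set only and touch the test set once, at the very end, to get an honest accuracy estimate.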
I am unaware of any library which can do this for you. I use Caffe a lot and can give you a workaround until you find a single library which can do it for you.
I hope you know about ImageNet and that Caffe has a trained model based on ImageNet.
Here is the idea:
Define what the object is. Say object = "laptop".
Use Caffe's ImageNet trained model, change the code to display the required output you want (you mentioned TRUE or FALSE) when the object is in the output labels.
Here is a link to the ImageNet tutorial which I wrote.
Here is what you might try:
Take a look here. It is a stripped down version of the ImageNet program which I used in a prediction engine.
In line 80 you'll get the top-1 predicted output label. In line 86 you'll get the top-5 predicted labels. Write a line of code to check whether object is in the output_label and return TRUE or FALSE according to it.
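The check itself could look something like this. I am assuming here that the predicted labels come back as strings of comma-separated synonyms (the way ImageNet class names are usually written); the five labels below are made up for illustration and are not real output from the linked program.

```python
from typing import List


def contains_object(target: str, predicted_labels: List[str]) -> bool:
    """Return True if the target matches a synonym in any predicted label."""
    target = target.strip().lower()
    for label in predicted_labels:
        # A label such as "laptop, laptop computer" lists several synonyms.
        synonyms = [s.strip().lower() for s in label.split(",")]
        if target in synonyms:
            return True
    return False


# Hypothetical top-5 output labels for one image.
top5_labels = [
    "notebook, notebook computer",
    "laptop, laptop computer",
    "desktop computer",
    "screen, CRT screen",
    "mouse, computer mouse",
]

print(contains_object("laptop", top5_labels))
print(contains_object("cat", top5_labels))
```

Matching whole synonyms rather than substrings avoids false hits such as "cat" matching "catamaran".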
I understand that you are looking for a specific library, and I will look for one, but this is something I would try out in the beginning.
I learned that neural networks can replicate any function.
Normally the neural network is fed with a set of descriptors to its input neurons and then gives out a certain score at its output neuron. I want my neural network to recognize certain behaviours from a screen. Objects on the screen are already preprocessed and clearly visible, so recognition should not be a problem.
Is it possible to use the neural network to recognize a pixelated picture of the screen and make decisions on that basis? The amount of training data would be huge, of course. Is there a way to teach the ANN by online supervised learning?
Edit:
Because a commenter said the programming problem would be too general:
I would like to implement this in Python first, to see if it works. If anyone could point me to a resource where I could do this online-learning thing with Python, I would be grateful.
I would suggest
http://www.neuroforge.co.uk/index.php/getting-started-with-python-a-opencv
http://docs.opencv.org/doc/tutorials/ml/table_of_content_ml/table_of_content_ml.html
http://blog.damiles.com/2008/11/the-basic-patter-recognition-and-classification-with-opencv/
https://github.com/bytefish/machinelearning-opencv
OpenCV is basically an image processing library, but it also has some amazing helper classes that you can use for almost any task. Its machine learning module is pretty easy to use, and you can go through the source to see the explanation and background theory for each function.
You could also use a pure python machine learning library like:
http://scikit-learn.org/stable/
But before you feed the data from your screen (I'm assuming that's in pixels?) to your ANN, SVM, or whatever ML algorithm you choose, you need to perform "feature extraction" on your data (that is, on the objects on the screen).
Feature extraction can be thought of as representing the same data on the screen with fewer numbers, so that there are fewer numbers to give to the ANN. You need to experiment with different features before you find a combination that works well for your particular scenario. A sample one could look something like this:
[x1,y1,x2,y2...,col]
This is basically a list of edge points that outline the area your object is in, a sort of ROI (Region of Interest), followed by a color value. To build it, perform edge detection and color detection, and extract any other relevant characteristics. The important thing is that all your objects, with their shape and color information, are now represented by a number of these lists, one for each object detected.
This is the data that can be provided as input to the neural network. But of course you'll have to define some meaningful output parameters, depending on your specific problem statement, before you can train and test your system.
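As a sketch of how such a list could be built, the function below flattens an object's edge points and one color value into the [x1,y1,x2,y2...,col] layout. Padding or truncating to a fixed number of points is my own assumption, added because a neural network needs inputs of a fixed length; the point coordinates and color are made up.

```python
from typing import List, Sequence, Tuple

Point = Tuple[int, int]


def object_features(points: Sequence[Point], color: float,
                    n_points: int = 4) -> List[float]:
    """Build [x1, y1, ..., xn, yn, col] with exactly n_points points."""
    pts = list(points)[:n_points]               # truncate if there are too many
    pts += [(0, 0)] * (n_points - len(pts))     # pad with zeros if too few
    features = [float(v) for point in pts for v in point]
    features.append(float(color))
    return features


# Hypothetical output of edge and color detection for one object.
edge_points = [(10, 20), (30, 20), (30, 40)]
mean_gray = 128

features = object_features(edge_points, mean_gray)
print(features)
```

One list like this per detected object gives you the network's input rows.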
Hope this helps.
This is not entirely correct.
A 3-layer feedforward MLP can theoretically approximate any CONTINUOUS function to arbitrary accuracy.
If there are discontinuities, then you need a 4th layer.
Since you are dealing with pixelated screens and such, you probably would need to consider a fourth layer.
Finally, if you are looking at circular shapes, etc., then a radial basis function (RBF) network may be more suitable.