Python: compare images of a piece of clothing (identification)

As an example, I have two pictures of a particular type of clothing from a certain brand.
I can download a lot of different images of this same piece of clothing, in the same color.
I want to create a model which can recognize the item based on a picture.
I tried to do it using this example:
https://www.tensorflow.org/tutorials/keras/classification.
This can recognize the type of clothing (e.g. shirt, shoe, trousers, etc.), but not a specific item and color.
My goal is to have a model that can tell me that the person on my first picture is wearing the item of my second picture.
As mentioned, I can upload a few variations of this same item to train my model, if that would be the best approach.
I also tried to use https://pillow.readthedocs.io.
This can do something with color recognition, but it does not solve my initial goal.

I don't think a CNN can help you with your problem; take a look at the SIFT technique (see this for more details). It is used for image matching and I think it fits your case better. If you're not looking to get into too much detail, OpenCV is a Python (and, I think, C++) library that has easy-to-use image-matching functions; more details.

As mentioned by @nadji mansouri, I would use the SIFT technique, as it suits your need. But I just want to correct something: a CNN is also an option in this case. That said, I wouldn't tackle the problem as classification, but rather with Distance Metric Learning, i.e., training a model to generate embeddings that are close together in the embedding space when the inputs are similar, and distant otherwise. To do this, though, you need a large, representative dataset.
In short, I suggest starting with SIFT, using OpenCV or open-source implementations on GitHub, playing around with the parameters to see what fits your case best, and then deciding whether it's really necessary to switch to a neural network; in that case you would tackle the problem as a metric-learning task, maybe with something like Siamese networks.
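To make the SIFT suggestion concrete, here is a minimal sketch using OpenCV's SIFT implementation together with Lowe's ratio test; the file names and the 0.75 ratio threshold are placeholder assumptions you would tune:

```python
# Minimal SIFT matching sketch (SIFT is in the main OpenCV package
# since 4.4; older versions need opencv-contrib-python).
import cv2

img1 = cv2.imread("person_wearing_item.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("reference_item.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# k-NN matching plus Lowe's ratio test to keep only distinctive matches.
matcher = cv2.BFMatcher()
matches = matcher.knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]

# Crude decision rule: many surviving matches suggests the same item.
print(f"{len(good)} good matches")
```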
Some definitions:
Metric learning is an approach based directly on a distance metric that aims to establish similarity or dissimilarity between data points (images, in your case). Deep metric learning, on the other hand, uses neural networks to automatically learn discriminative features from the data and then compute the metric. source.
The Scale-Invariant Feature Transform (SIFT) is a method used in computer vision to detect and describe local features in images. The algorithm is invariant to image scale and rotation, and robust to changes in illumination and affine distortion. SIFT features are represented by local image gradients, which are calculated at various scales and orientations, and are used to identify keypoints in an image. These keypoints and their associated descriptor vectors can then be used for tasks such as image matching, object recognition, and structure from motion. source, with modification.
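To illustrate the metric-learning idea without any training, here is a rough sketch that uses a generic pre-trained backbone (ResNet50, chosen purely as an example) as a feature extractor and compares two images by cosine similarity; a properly trained Siamese network would replace the generic backbone and give far more discriminative embeddings:

```python
# Embedding-similarity sketch with a pre-trained backbone; the file
# names are placeholders, and ImageNet features are only a rough
# proxy for "same clothing item".
import numpy as np
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input
from tensorflow.keras.preprocessing import image

model = ResNet50(weights="imagenet", include_top=False, pooling="avg")

def embed(path):
    img = image.load_img(path, target_size=(224, 224))
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    return model.predict(x)[0]

a = embed("person_wearing_item.jpg")
b = embed("reference_item.jpg")
cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(f"cosine similarity: {cosine:.3f}")  # closer to 1.0 = more similar
```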

Related

How to start recognizing objects in OpenCV

My final goal is to dynamically recognize, in video, different objects that are 2D and often have the same appearance (a 2D deck game). I was studying the opencv-python tutorial, but there isn't any topic about this, so I want to know which topics, libraries, or functions I should learn to reach my goal. Thanks.
You can try several machine learning techniques. You may get inspiration from the Viola-Jones or Histograms of Oriented Gradients (HOG) + SVM algorithms (even though those algorithms solve a problem that may differ from yours, I got plenty of insights from them). In other words, try "sliding" a window of predefined aspect ratio along the horizontal and vertical axes and recognizing the region of interest with a model of your choice (CNN, SVM, logistic regression, etc.), as in the sketch below. The catch is that you will need to train a model, which may require a lot of data.
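A rough sketch of that sliding-window idea, using HOG features from scikit-image; `clf` stands in for a classifier you would have trained beforehand, and the window size and step are arbitrary choices:

```python
# Sliding-window detection sketch: HOG features per window, scored by
# a hypothetical pre-trained classifier `clf` (not defined here).
from skimage.feature import hog
from skimage.transform import resize

def sliding_windows(img, win=(128, 64), step=32):
    """Yield top-left corners and crops of fixed-size windows."""
    h, w = img.shape[:2]
    for y in range(0, h - win[0] + 1, step):
        for x in range(0, w - win[1] + 1, step):
            yield (x, y), img[y:y + win[0], x:x + win[1]]

def detect(gray, clf):
    hits = []
    for (x, y), patch in sliding_windows(gray):
        feats = hog(resize(patch, (128, 64)))  # fixed-length descriptor
        if clf.predict([feats])[0] == 1:       # assume 1 = object class
            hits.append((x, y))
    return hits
```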
Or you can do template matching, which is more of an image-processing problem than a machine-learning one. It would not require a dataset or training, but it will be sensitive to noise, lighting, and position. For example:
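```python
# Template matching: slide the template over the scene and take the
# location with the highest normalized correlation score.
# File names are placeholders.
import cv2

scene = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)
template = cv2.imread("card.png", cv2.IMREAD_GRAYSCALE)

result = cv2.matchTemplate(scene, template, cv2.TM_CCOEFF_NORMED)
_, max_val, _, max_loc = cv2.minMaxLoc(result)
print(f"best match score {max_val:.2f} at {max_loc}")
```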
Good luck!

Clustering a set of images

I have a folder with hundreds/thousands of images, some of which look alike. I would like to create clusters separating those images (those which look alike going into the same cluster).
I can't determine the number of clusters that will be needed, it depends on the images.
Does anyone have an idea on how to do this using Python, OpenCV and which algorithm to use?
I've done some research and found that AffinityPropagation or DBSCAN could be useful, but I don't know where to start (how to encode my images, what to pass to those algorithms, etc.).
Unfortunately it is not that simple with images, since naive clustering would result in clusters of images with the same colors, not the same "content". You can use a neural network as a feature extractor for the images; I see two options:
Use a pre-trained network and get the features from an intermediate layer
Train an autoencoder on your dataset, and use the latent features
Option 1 is cheaper, since you can easily find pre-trained models; option 2 is much more computationally expensive but should work better, especially if there is no pre-trained model for your domain. A sketch of option 1 follows below.
This tutorial (randomly found on the internet) seems to be a good introduction to method 2.
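Here is a sketch of option 1 under some stated assumptions: ResNet50 as the pre-trained extractor, images in a local `images/` folder, and a DBSCAN `eps` value that you would have to tune for your data:

```python
# Option 1 sketch: embed images with a pre-trained network, then
# cluster with DBSCAN (no need to fix the number of clusters upfront).
import glob
import numpy as np
from sklearn.cluster import DBSCAN
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input
from tensorflow.keras.preprocessing import image

model = ResNet50(weights="imagenet", include_top=False, pooling="avg")

def embed(path):
    img = image.load_img(path, target_size=(224, 224))
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    return model.predict(x)[0]

paths = glob.glob("images/*.jpg")
features = np.array([embed(p) for p in paths])

# eps = 0.3 is a guess; with cosine distance it must be tuned per dataset.
labels = DBSCAN(eps=0.3, metric="cosine").fit_predict(features)
for path, label in zip(paths, labels):
    print(label, path)  # label -1 means "noise", i.e. unclustered
```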

Recognizing nipple exposure in an image, and automatically covering the nipple area

I'd like to implement something like the title describes, but I wonder if it's technically possible.
I know that it is possible to recognize pictures with a CNN,
but I don't know whether the nipple area can be covered automatically.
If you have any information about related libraries,
I would appreciate some advice.
CNNs are able to detect whatever you train them for, to varying degrees of accuracy. What you would need is a lot of training samples (i.e., ground-truth samples pairing the original image with the labeled image) with which to train your model, and then some new data on which you can test the model's accuracy. The point is, CNNs are not innately biased to learn a task; you have to tell them what to learn!
I can recommend the machine learning library Keras (https://keras.io/) if you plan to do some machine learning with CNNs, as it's pretty simple and somewhat beginner-friendly. Try some of its CNN tutorials, which are quite good.
Essentially, you have what I can only assume is a pretty niche problem. The main issue will come down to how much data you have to train your model. CNNs need a lot of training data, especially for a problem like this, which isn't simple. An approach that would make this simpler is a model that detects the, ahem, area of interest and marks it as such on a per-pixel basis. A simple mask could then be applied to the source image to censor it, as sketched below. This is image segmentation, and there are many academic papers on the topic.
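To illustrate just the masking step, here is a small sketch that assumes you already have a per-pixel binary mask from a segmentation model; producing that mask is the hard (and data-hungry) part, and is not shown:

```python
# Apply a censoring mask to an image; `mask.png` is assumed to be a
# grayscale image that is white (255) wherever the region was detected.
import cv2

img = cv2.imread("input.jpg")
mask = cv2.imread("mask.png", cv2.IMREAD_GRAYSCALE)

censored = img.copy()
censored[mask > 127] = 0  # black out; a blur or pixelation also works
cv2.imwrite("censored.jpg", censored)
```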

Image processing and neural networks approach

I am currently working on a project for identifying a person's mood/emotions.
As the first step, we are working on Python code for image recognition, detection, and tracking.
I went through various approaches to this problem and found two:
1) The Haar cascade method (fast, but no scope for recognition or reading expressions).
2) Neural networks (great at image recognition, i.e., at details such as smiles, anger, ...).
I am confused about the neural network approach.
We can first use Haar cascades to detect the faces with ease (really fast), then use either Canny edge detection or cropping to cut out part of the face.
After that is done, I have no clue how to proceed.
This is my idea of it:
continue using the Haar cascade method to detect the features of the face, like eyes, nose, cheeks, lips, ...;
then find the distances between them to compute ratios, which we can then use as inputs to a neural network.
Different internal layers would be used to detect different features.
We can use a differential (gradient-descent) method to optimize the cost by altering the weights of the synapses.
How good is this approach, and is there a better way to do it?
For example, we could use Canny edge detection to find the edges, build a new matrix out of just the edges, and then use that for training.
I don't know, I am really confused.
Anyway, thanks in advance for all answers.
Image processing libraries such as scikit-image or OpenCV are a good place to start. For example, here's an example of Canny edge detection in OpenCV.
Regarding neural networks, as lejlot pointed out, you've got to ask yourself how much you want to build from scratch.
For an example of building your own neural network based on some parameters (which you'd have to define for your facial features), I suggest you read through A Neural Network in 11 Lines of Python, which illustrates some of the problems you might face (especially Part 2, which is about image processing too).
What you seem to need are Convolutional Neural Networks (check http://cs231n.github.io/convolutional-networks/ to learn about them).
Convolutional Neural Networks (CNNs for short) are a kind of neural net that learns to extract visual features from an image and to relate those features in order to recognize what's in the image, so you don't need to detect all the features yourself; just give a CNN a bunch of labeled face pictures and it will learn to identify the mood of the person.
What you can do is detect the face in every picture (OpenCV is good enough at detecting faces), then crop and align each face so all the faces have the same size. Then feed the CNN all the faces, and it will gradually learn to recognize the emotions of a person. A sketch of that preprocessing step follows.
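Here is a sketch of that detect-crop-resize preprocessing with OpenCV's bundled Haar cascade; the 48x48 crop size is just a common choice for emotion-recognition CNNs, not a requirement:

```python
# Detect faces with a Haar cascade, then crop and resize each one so
# the CNN always receives inputs of the same size.
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

img = cv2.imread("photo.jpg")  # placeholder path
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

crops = [cv2.resize(gray[y:y + h, x:x + w], (48, 48))
         for (x, y, w, h) in faces]
```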

Can a neural network recognize a screen and replicate a finite set of actions?

I learned that neural networks can replicate any function.
Normally the neural network is fed with a set of descriptors to its input neurons and then gives out a certain score at its output neuron. I want my neural network to recognize certain behaviours from a screen. Objects on the screen are already preprocessed and clearly visible, so recognition should not be a problem.
Is it possible to use the neural network to recognize a pixelated picture of the screen and make decisions on that basis? The amount of training data would be huge, of course. Is there a way to teach the ANN by online supervised learning?
Edit:
Because a commenter said the programming problem would be too general:
I would like to implement this in Python first, to see if it works. If anyone could point me to a resource where I could do this online-learning thing with Python, I would be grateful.
I would suggest
http://www.neuroforge.co.uk/index.php/getting-started-with-python-a-opencv
http://docs.opencv.org/doc/tutorials/ml/table_of_content_ml/table_of_content_ml.html
http://blog.damiles.com/2008/11/the-basic-patter-recognition-and-classification-with-opencv/
https://github.com/bytefish/machinelearning-opencv
OpenCV is basically an image processing library, but it also has some amazing helper classes that you can use for almost any task. Its machine learning module is pretty easy to use, and you can go through the source to see the explanation and background theory for each function.
You could also use a pure python machine learning library like:
http://scikit-learn.org/stable/
But before you feed the data from your screen (I'm assuming that's in pixels?) to your ANN, SVM, or whatever ML algorithm you choose, you need to perform "feature extraction" on the data (i.e., on the objects on the screen).
Feature extraction can be thought of as representing the same data from the screen with fewer numbers, so there is less input to give to the ANN. You need to experiment with different features before you find a combination that works well for your particular scenario. A sample one could look something like this:
[x1,y1,x2,y2...,col]
This is basically a list of edge points that represent the area your object is in, a sort of ROI (region of interest), on which you perform edge detection, color detection, and extract any other relevant characteristics. The important thing is that now all your objects, with their shape/color information, are represented by a number of these lists, one for each object detected.
This is the data that can be provided as input to the neural network, but you'll have to define some meaningful output parameters, depending on your specific problem statement, before you can train/test your system, of course. A rough sketch of such an extraction step follows.
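Here is a rough sketch of extracting such lists with OpenCV (Canny edges, contours, and a mean color per object); note that the raw point lists vary in length between objects, so you would still have to normalize or resample them before feeding an ANN:

```python
# Per-object feature lists: contour points [x1, y1, x2, y2, ...] plus
# the object's mean BGR color, roughly matching the format above.
import cv2
import numpy as np

img = cv2.imread("screen.png")  # placeholder path
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 100, 200)
contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)

features = []
for c in contours:
    pts = c.reshape(-1, 2)                         # [[x1, y1], [x2, y2], ...]
    x, y, w, h = cv2.boundingRect(c)
    col = img[y:y + h, x:x + w].mean(axis=(0, 1))  # mean BGR color
    features.append(np.concatenate([pts.flatten(), col]))
```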
Hope this helps.
This is not entirely correct.
A 3-layer feedforward MLP can theoretically replicate any CONTINUOUS function.
If there are discontinuities, then you need a 4th layer.
Since you are dealing with pixelated screens and such, you probably would need to consider a fourth layer.
Finally, if you are looking at circular shapes, etc., then a radial basis function (RBF) network may be more suitable.
