Clustering images using unsupervised Machine Learning - python

I have a database of images that contains identity cards, bills and passports.
I want to classify these images into different groups (i.e. identity cards, bills, and passports).
From what I have read, one way to do this task is clustering (since it will be unsupervised).
The idea for me is like this: the clustering will be based on the similarity between images (i.e images that have similar features will be grouped together).
I also know that this process can be done with k-means.
So the problem for me is about features and using images with K-means.
If anyone has done this before, or has a clue about it, could you recommend some links to start with or suggest any features that might be helpful?

The simplest way to get good results is to break the problem into two parts:
Getting the features from the images: Using the raw pixels as features will give you poor results. Pass the images through a pre-trained CNN (you can find several of these online), then use the output of the last convolutional layer (just before the fully connected layers) as the image features.
Clustering the features: Once you have rich features for each image, you can run a clustering algorithm on them (such as k-means).
I would recommend doing step 1 with Keras (using an already implemented, pre-trained model) and step 2 with scikit-learn; a sketch of both steps follows.
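A minimal sketch of the two steps, assuming ResNet50 as the pre-trained CNN and a hypothetical images/ folder; the model choice, image size, and k=3 (identity cards, bills, passports) are illustrative, not prescriptive:

```python
import glob
import numpy as np
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input
from tensorflow.keras.preprocessing import image
from sklearn.cluster import KMeans

# Step 1: feature extraction with a pre-trained CNN, fully connected top removed;
# global average pooling yields one 2048-d vector per image.
model = ResNet50(weights="imagenet", include_top=False, pooling="avg")

def extract_features(path):
    img = image.load_img(path, target_size=(224, 224))
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    return model.predict(x)[0]

paths = sorted(glob.glob("images/*.jpg"))          # hypothetical folder
features = np.array([extract_features(p) for p in paths])

# Step 2: cluster the feature vectors; k=3 for cards, bills and passports.
kmeans = KMeans(n_clusters=3, random_state=0).fit(features)
for path, label in zip(paths, kmeans.labels_):
    print(path, "-> cluster", label)
```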

Label a few examples, and use classification.
Clustering is just as likely to give you clusters like "images with a blueish tint", "grayscale scans", and "warm color temperature". That would be a quite reasonable way to cluster such images.
Furthermore, k-means is very sensitive to outliers. And you probably have some in there.
Since you want your clusters to correspond to certain human concepts, classification is what you need to use.

I have implemented Unsupervised Clustering based on Image Similarity using Agglomerative Hierarchical Clustering.
My use case had images of people, so I extracted the face embedding (i.e. feature) vector from each image. I used dlib for the face embeddings, so each feature vector was 128-dimensional.
In general, a feature vector can be extracted for each image. A pre-trained VGG or other CNN, with its final classification layer removed, can be used for feature extraction.
A dictionary with the IMAGE_FILENAME as the KEY and the FEATURE_VECTOR as the VALUE can be created for all the images in the folder. This makes the correspondence between a filename and its feature vector easier to keep track of.
Then create a single feature matrix, say X, which stacks the individual feature vectors of every image in the folder/group that needs to be clustered.
In my use case, X had the shape (NUMBER OF IMAGES IN THE FOLDER, 128), where 128 is the size of each feature vector. For instance, shape of X: (50, 128).
This feature matrix can then be used to fit an agglomerative hierarchical clustering model. The distance threshold parameter needs to be fine-tuned empirically.
Finally, we can write code to identify which IMAGE_FILENAME belongs to which cluster (see the sketch below).
In my case, there were about 50 images per folder, so this was a manageable solution. This approach was able to group the images of a single person into a single cluster. For example, 15 images of PERSON1 belong to CLUSTER 0, 10 images of PERSON2 belong to CLUSTER 2, and so on…
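A minimal sketch of this approach, assuming the filename-to-vector dictionary described above (random 128-d vectors stand in for real embeddings here, and the distance_threshold value is only a placeholder to be tuned):

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Stand-in for the real dictionary {IMAGE_FILENAME: FEATURE_VECTOR};
# in practice these would be dlib/VGG embeddings, not random numbers.
rng = np.random.default_rng(0)
features = {f"img_{i}.jpg": rng.normal(size=128) for i in range(6)}

filenames = list(features.keys())
X = np.array([features[f] for f in filenames])     # shape: (n_images, 128)

# n_clusters=None lets the distance threshold decide how many clusters emerge.
clusterer = AgglomerativeClustering(n_clusters=None, distance_threshold=10.0)
labels = clusterer.fit_predict(X)

for f, label in zip(filenames, labels):
    print(f, "-> cluster", label)
```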


Python: compare images of a piece of clothing (identification)

As an example I have two pictures with a particular type of clothing of a certain brand.
I can download a lot of different images of this same piece and color of clothing.
I want to create a model which can recognize the item based on a picture.
I tried to do it using this example:
https://www.tensorflow.org/tutorials/keras/classification.
This can recognize the type of clothing (e.g. shirt, shoe, trousers, etc.), but not a specific item and color.
My goal is to have a model that can tell me that the person on my first picture is wearing the item of my second picture.
As mentioned I can upload a few variations of this same item to train my model, if that would be the best approach.
I also tried to use https://pillow.readthedocs.io
This can do something with color recognition but does not solve my initial goal.
I don't think a CNN can help you with your problem; take a look at the SIFT technique (see this for more details). It is used for image matching, and I think it is better suited to your case. If you are not looking to get into too much detail, OpenCV is a Python (and, I think, C++) library that has easy-to-use image matching functions (more details).
As mentioned by @nadji mansouri, I would use the SIFT technique as it suits your need. But I just want to correct something: a CNN is also an option in this case. That said, I wouldn't tackle this as a classification problem, but rather use Distance Metric Learning, i.e. training a model to generate embeddings that are close together in the embedding space when the inputs are similar, and far apart otherwise. But to do this you need a large, representative dataset.
In short, I suggest starting with SIFT, using OpenCV or open-source implementations on GitHub, playing around with the parameters to see what fits your case best, and then deciding whether it is really necessary to switch to a neural network, in which case you would tackle the problem as a metric learning task, maybe with something like Siamese networks (a minimal SIFT matching sketch follows the definitions below).
Some definitions:
Metric learning is an approach based directly on a distance metric that aims to establish similarity or dissimilarity between data (images in your case). Deep Metric Learning on the other hand uses Neural Networks to automatically learn discriminative features from the data and then compute the metric. source.
The Scale-Invariant Feature Transform (SIFT) is a method used in computer vision to detect and describe local features in images. The algorithm is invariant to image scale and rotation, and robust to changes in illumination and affine distortion. SIFT features are represented by local image gradients, which are calculated at various scales and orientations, and are used to identify keypoints in an image. These keypoints and their associated descriptor vectors can then be used for tasks such as image matching, object recognition, and structure from motion. source, with modification.
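A minimal SIFT matching sketch with OpenCV; the file names are placeholders, cv2.SIFT_create requires a reasonably recent opencv-python build, and the 0.75 ratio-test threshold follows Lowe's paper rather than anything specific to this question:

```python
import cv2

# Placeholder file names: a reference photo of the item and a photo of a person.
img1 = cv2.imread("item_reference.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("person_wearing_item.jpg", cv2.IMREAD_GRAYSCALE)

# Detect keypoints and compute SIFT descriptors in both images.
sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# Brute-force match descriptors and keep only matches that pass the ratio test.
matcher = cv2.BFMatcher()
matches = matcher.knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]

# More good matches -> the two images are more likely to show the same item.
print(f"{len(good)} good matches out of {len(matches)}")
```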

Clustering a set of images

I have a folder with hundreds/thousands of images, some of which look alike. I would like to create clusters separating those images (those which look alike going into the same cluster).
I can't determine the number of clusters that will be needed, it depends on the images.
Does anyone have an idea on how to do this using Python, OpenCV and which algorithm to use?
I've done some research and found that AffinityPropagation or DBSCAN can be useful for me, but I don't know where to start (how to encode my images, what I should pass to those algorithms, etc.).
Unfortunately it is not that simple with images, since clustering naively would result in clusters of images with the same colors, not the same "content". You can use a neural network as a feature extractor for the images; I see two options:
Use a pre-trained network and get the features from an intermediate layer
Train an autoencoder on your dataset, and use the latent features
Option 1 is cheaper since you can easily find pre-trained models; option 2 is much more computationally expensive but should work better, especially if there is no pre-trained model for your domain (a sketch of option 1 follows below).
This tutorial (randomly found on the internet) seems to be a good introduction to method 2.
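A rough sketch of option 1, assuming a hypothetical photos/ folder, VGG16 as the pre-trained feature extractor, and DBSCAN since the number of clusters is unknown; eps and min_samples are placeholders that need tuning:

```python
import glob
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.preprocessing import image
from sklearn.preprocessing import normalize
from sklearn.cluster import DBSCAN

# Pre-trained network used as a feature extractor (512-d vector per image).
model = VGG16(weights="imagenet", include_top=False, pooling="avg")

def features_of(path):
    img = image.load_img(path, target_size=(224, 224))
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    return model.predict(x)[0]

paths = sorted(glob.glob("photos/*.jpg"))                 # placeholder folder
X = normalize(np.array([features_of(p) for p in paths]))  # unit-length vectors

# DBSCAN does not need the number of clusters up front; -1 means "noise".
labels = DBSCAN(eps=0.5, min_samples=2).fit_predict(X)
for p, l in zip(paths, labels):
    print(p, "-> cluster", l)
```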

comparing HOG feature vectors without SVM

I am relatively new to computer vision and am currently doing a learning project on shape detection. I have a fixed region of interest (ROI) in all the images where the object is most likely present, and I have to compare the shapes to tell whether the object present in two input images is the same or not. There are slight translational, scale and illumination changes.
I am trying to compare the shape of the object between two input images and trying to provide an output value describing their similarity. If the similarity is above a certain threshold, I can tell that the same object is present in both input images.
I have tried contours, but they do not give reliable results (thresholding either gives too many details or misses some vital ones) and don't generalize well to all images. I am thinking of using global shape descriptors like HOG.
But I have problems understanding the feature vector values from the HOG descriptor. How can I compare HOG feature vectors (1D) for the two input images to find similarity without using SVM or machine learning? What is the best way to compare HOG feature vectors?
I don't understand how distance measures work for comparing feature vectors. I want to understand the physical meaning of how distances are used to compare feature vectors and histograms, and how to use them to compare HOG feature vectors.
Sorry, your question is actually hard to understand.
I think you are going in the wrong direction.
How to compare HOG feature vectors(1D) for the two input images to find similarity without using SVM or machine learning?
An SVM is a tool for comparing a vector against a learned model to find the best-matching class. For similarity, all you need is the distance between the two vectors that represent the images. Do not overthink it.
In your case, you use the HOG feature vector to represent each image, so calculate the Euclidean distance between the two vectors. That value is their (dis)similarity, as sketched below.
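A small sketch of that idea using scikit-image's HOG implementation; the file names, the resize dimensions, and the decision threshold are placeholders to be tuned on real data:

```python
import numpy as np
from skimage import io, transform
from skimage.feature import hog

def hog_vector(path, size=(128, 128)):
    # Same size for both images -> HOG vectors of the same length.
    img = io.imread(path, as_gray=True)
    img = transform.resize(img, size)
    return hog(img, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))

v1 = hog_vector("roi_image1.png")   # placeholder file names
v2 = hog_vector("roi_image2.png")

# Euclidean distance: smaller means more similar shapes.
distance = np.linalg.norm(v1 - v2)
print("distance:", distance, "-> same object" if distance < 5.0 else "-> different")
```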
You can look at MATLAB's pdist method for a list of easy-to-use distance measures.
The real problem here is not how to compare feature vectors; it is how to represent your image with a single vector. A better image representation leads to better performance, for example bag-of-words, CNN features, etc. There are tons of them; as a newcomer, start with bag-of-words.
Hope that helps, and welcome to the computer vision world.

Image Segmentation with TensorFlow

I am trying to see the feasibility of using TensorFlow to identify features in my image data. I have 50x50 px grayscale images of nuclei that I would like to have segmented; the desired output would be either a 0 or a 1 for each pixel: 0 for the background, 1 for the nucleus.
Example input: raw input data
Example label (what the "label"/real answer would be): output data (label)
Is it even possible to use TensorFlow to perform this type of machine learning on my dataset? I could potentially have thousands of images for the training set.
A lot of the examples have a label corresponding to a single category, for example a 10-element array [0,0,0,0,0,0,0,0,0,0] for the handwritten digit dataset, but I haven't seen many examples that would output a larger array. I would assume the label would be a 50x50 array?
Also, any ideas on the processing CPU time for this type of analysis?
Yes, this is possible with TensorFlow. In fact, there are many ways to approach it. Here's a very simple one:
Consider this to be a binary classification task. Each pixel needs to be classified as foreground or background. Choose a set of features by which each pixel will be classified. These features could be local features (such as a patch around the pixel in question) or global features (such as the pixel's location in the image). Or a combination of the two.
Then train a model of your choosing (such as a NN) on this dataset. Of course your results will be highly dependent upon your choice of features (a toy sketch of this approach follows below).
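A toy sketch of the "classify every pixel from a local patch" idea, assuming 50x50 grayscale images and 0/1 masks already loaded as NumPy arrays; the patch size and network size are arbitrary choices, not recommendations, and random arrays stand in for real data:

```python
import numpy as np
import tensorflow as tf

def to_patches(img, patch=5):
    """Return one flattened patch per pixel (SAME padding keeps 50x50 of them)."""
    x = tf.reshape(tf.constant(img, tf.float32), [1, 50, 50, 1])
    p = tf.image.extract_patches(x, sizes=[1, patch, patch, 1],
                                 strides=[1, 1, 1, 1],
                                 rates=[1, 1, 1, 1], padding="SAME")
    return tf.reshape(p, [-1, patch * patch]).numpy()

# Placeholders: images (n, 50, 50) in [0, 1], masks (n, 50, 50) with 0/1 labels.
images = np.random.rand(10, 50, 50)
masks = (np.random.rand(10, 50, 50) > 0.5).astype(np.float32)

X = np.concatenate([to_patches(im) for im in images])     # (n*2500, 25)
y = np.concatenate([m.reshape(-1) for m in masks])        # (n*2500,)

# Small dense network that classifies each pixel from its local patch.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(25,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),        # P(pixel is nucleus)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=2, batch_size=256)
```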
You could also take a graph-cut approach if you can represent that computation as a computational graph using the primitives that TensorFlow provides. You could then either not make use of TensorFlow's optimization functions such as backprop or if there are some differentiable variables in your computation you could use TF's optimization functions to optimize those variables.
SoftmaxWithLoss() works for your image segmentation problem, if you reshape the predicted label and true label map from [batch, height, width, channel] to [N, channel].
In your case, your final predicted map will have channel = 2, and after reshaping, N = batch × height × width; then you can use SoftmaxWithLoss() or a similar loss function in TensorFlow to run the optimization.
See this question that may help.
Try using convolutional filters for the model: a stack of convolution and downsampling layers. The input should be the normalized pixel image and the output should be the mask. The last layer should be a softmax-with-loss (or an equivalent per-pixel loss); a minimal sketch is below. HTH.
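A minimal fully convolutional sketch in tf.keras along those lines; the layer sizes are illustrative, and a sigmoid with binary cross-entropy stands in for the softmax-with-loss layer mentioned above since the mask is binary:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(50, 50, 1)),
    tf.keras.layers.Conv2D(16, 3, padding="same", activation="relu"),
    tf.keras.layers.MaxPooling2D(),                        # downsample to 25x25
    tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu"),
    tf.keras.layers.UpSampling2D(),                        # back to 50x50
    tf.keras.layers.Conv2D(1, 1, activation="sigmoid"),    # per-pixel foreground prob.
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()

# images: (n, 50, 50, 1) normalized pixels, masks: (n, 50, 50, 1) with 0/1 labels
# model.fit(images, masks, epochs=10, batch_size=32)
```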

Classify Signal Images using Python

I have the following signal images which I want to classify depending on their shape. Which algorithm is suited to do this? I have attached two images of each class.
You'll probably want to use sklearn. Assuming you want to classify these patterns based on images rather than the data from which the images were generated, you can use a simple k-nearest-neighbor (KNN) classifier.
KNN classification is a way you can classify many different types of data. First, the algorithm is trained using tagged data (images, in your case, that appear to fall into classes of differing frequencies). Then, the algorithm analyzes untagged data, i.e. the data you want to classify. The "nearest neighbor" part means that each new piece of data seen by the algorithm is classified based on the k nearest pieces of data that you trained the algorithm with. The idea is that new data of a certain category will be numerically similar to the training data of that same category. Here's a high-level workflow of how the algorithm works:
train_set = [(img1, 'low freq'), (img2, 'hi freq'), (img3, 'low freq'), (img4, 'med freq'), (img5, 'med freq')]
img_classifier = algorithm(train_set)
Then, you call your trained algorithm on new data to identify untagged images.
test = [img6, img7]
for i in test:
    img_classifier(i)
You'll want to use a LOT more than five training images, though. The value of k that you choose is important, too. Assuming you train with the same number of images for each class (let's say n), for a total of 3n training images, a good k to use is k = n/2. Too high and you risk misclassification because you take into account too much of the training data; too low and you may take into account too little.
There is an excellent tutorial here that you should definitely check out if you decide to use sklearn.
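A concrete version of that workflow with scikit-learn's KNeighborsClassifier; the file names are placeholders, and each image is simply resized and flattened into one feature vector (a crude but workable representation for such visually distinct classes):

```python
import numpy as np
from PIL import Image
from sklearn.neighbors import KNeighborsClassifier

def to_vector(path, size=(64, 64)):
    # Grayscale, fixed size, flattened to a 1D feature vector in [0, 1].
    img = Image.open(path).convert("L").resize(size)
    return np.asarray(img, dtype=np.float32).ravel() / 255.0

# Placeholder file names and labels for the tagged training images.
train_files = ["sig1.png", "sig2.png", "sig3.png", "sig4.png", "sig5.png"]
train_labels = ["low freq", "hi freq", "low freq", "med freq", "med freq"]

X_train = np.array([to_vector(f) for f in train_files])
clf = KNeighborsClassifier(n_neighbors=2).fit(X_train, train_labels)

# Classify untagged images.
X_test = np.array([to_vector(f) for f in ["sig6.png", "sig7.png"]])
print(clf.predict(X_test))
```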
Your images appear to be in very discrete classes. If you don't want to use sklearn, you might be able to classify your images based on the area of the image that your curve covers. If these are the only three classes, you can try some of the following to see if they give you a good threshold for image classification:
Calculate the area of the blue (light+dark) in the image--different frequencies may be covered by different areas.
Check out the ratio of light blue to dark blue, it may be different.
Calculate the maximum y-displacement of the dark blue from the center of the image (x-axis). This will easily separate the high-frequency from the mid and low frequency images, and then you can use the area calculation in the first method to differentiate the low and mid frequencies as they clearly cover different areas.
If you decide to go with the second method, definitely check out the Python Imaging Library. It's used in sklearn, actually, if I'm not mistaken.
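A rough sketch of the area-based idea with Pillow and NumPy, assuming the curve is drawn in blue on a light background; the RGB thresholds are guesses that would need adjusting to the actual colors in the plots:

```python
import numpy as np
from PIL import Image

def blue_stats(path):
    rgb = np.asarray(Image.open(path).convert("RGB"), dtype=np.int32)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    blue = b > r + 40                       # any blue-ish pixel (light or dark)
    dark_blue = blue & (b < 150)            # darker shade of blue
    total = rgb.shape[0] * rgb.shape[1]
    # Fraction of the image covered by blue, and the dark/total-blue split.
    return blue.sum() / total, dark_blue.sum() / max(blue.sum(), 1)

area_ratio, dark_fraction = blue_stats("signal1.png")   # placeholder file name
print("blue area ratio:", area_ratio, "dark blue fraction:", dark_fraction)
```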
