I am relatively new to computer vision and am currently working on a learning project on shape detection. All the images have a fixed region of interest (ROI) where the object is most likely present, and I have to compare the objects' shapes to decide whether the object present in two input images is the same or not. There are slight translation, scale, and illumination changes.
I am trying to compare the shape of the object between two input images and produce an output value describing their similarity. If the similarity is above a certain threshold, I can say that the same object is present in both input images.
I have tried contours, but they do not give reliable results (thresholding either gives too many details or misses some vital ones) and do not generalize well to all images. I am thinking of using a global shape descriptor like HOG.
But I have trouble understanding the feature vector values from the HOG descriptor. How do I compare the (1D) HOG feature vectors of two input images to find their similarity without using an SVM or machine learning? What is the best way to compare HOG feature vectors?
I don't understand how distance measures work for comparing feature vectors. I want to understand the physical meaning of how distances are used to compare feature vectors and histograms, and how to use them to compare HOG feature vectors.
Sorry, your question is actually hard to understand.
I think you are going in the wrong direction.
How to compare HOG feature vectors(1D) for the two input images to find similarity without using SVM or machine learning?
An SVM is a tool for comparing a vector against learned classes to find the best-matching answer. For similarity, all you need is the distance between the two vectors that represent the images. Do not overthink it.
In your case, you use the HOG feature vector as your image representation, so calculate the Euclidean distance between the two vectors. That value is their similarity (the smaller the distance, the more similar the images).
You can look at MATLAB's pdist function for a list of easy-to-use distance measures.
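For example, a rough Python sketch with OpenCV (the file names and the 64x128 HOG window are placeholders; any HOG implementation behaves the same way):

    # Compare two fixed-size ROIs via their HOG feature vectors.
    import cv2
    import numpy as np

    hog = cv2.HOGDescriptor()  # default 64x128 window; adjust winSize to your ROI

    img1 = cv2.imread("roi1.png", cv2.IMREAD_GRAYSCALE)  # placeholder paths
    img2 = cv2.imread("roi2.png", cv2.IMREAD_GRAYSCALE)
    img1 = cv2.resize(img1, (64, 128))
    img2 = cv2.resize(img2, (64, 128))

    v1 = hog.compute(img1).flatten()
    v2 = hog.compute(img2).flatten()

    # Euclidean distance: a small value means similar gradient structure.
    dist = np.linalg.norm(v1 - v2)

    # Cosine similarity: close to 1.0 means the vectors point the same way,
    # which is less sensitive to overall contrast than the raw distance.
    cos_sim = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    print(dist, cos_sim)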
The real problem here is not how to compare feature vectors, but how to represent your image as a single vector. A better image representation leads to better performance, for example Bag-of-Words, CNN features, etc. There are tons of them; as a newcomer, start with Bag-of-Words.
Hope that helps, and welcome to the computer vision world.
As an example, I have two pictures of a particular piece of clothing of a certain brand.
I can download a lot of different images of this same piece of clothing, in the same color.
I want to create a model which can recognize the item based on a picture.
I tried to do it using this example:
https://www.tensorflow.org/tutorials/keras/classification.
This can recognize the type of clothing (e.g. shirt, shoe, trousers, etc.), but not a specific item and color.
My goal is to have a model that can tell me that the person in my first picture is wearing the item in my second picture.
As mentioned I can upload a few variations of this same item to train my model, if that would be the best approach.
I also tried to use https://pillow.readthedocs.io
This can do something with color recognition but does not solve my initial goal.
I don't think that a CNN can help you with your problem. Take a look at the SIFT technique (see this for more details); it is used for image matching and I think it is better in your case. If you are not looking to go into too much detail, OpenCV is a Python (and C++, I think) library that has image matching functions that are easy to use (more details).
As mentioned by #nadji mansouri, I would use the SIFT technique as it suits your need. But I just want to correct something: a CNN is also an option in this case. That being said, I wouldn't tackle the problem as a classification problem, but rather use distance metric learning, i.e., training a model to generate embeddings that are close together in the embedding space when the inputs are similar, and distant otherwise. But to do this you need a large, representative dataset.
In short, I suggest starting with SIFT, using OpenCV or open-source implementations on GitHub, playing around with the parameters to see what fits your case best, and then deciding whether it is really necessary to switch to a neural network, in which case you would tackle the problem as a metric learning task, maybe with something like Siamese networks.
Some definitions:
Metric learning is an approach based directly on a distance metric that aims to establish similarity or dissimilarity between data (images in your case). Deep Metric Learning on the other hand uses Neural Networks to automatically learn discriminative features from the data and then compute the metric. source.
The Scale-Invariant Feature Transform (SIFT) is a method used in computer vision to detect and describe local features in images. The algorithm is invariant to image scale and rotation, and robust to changes in illumination and affine distortion. SIFT features are represented by local image gradients, which are calculated at various scales and orientations, and are used to identify keypoints in an image. These keypoints and their associated descriptor vectors can then be used for tasks such as image matching, object recognition, and structure from motion. source, with modification.
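As a rough illustration of the SIFT route, a matching sketch with OpenCV could look like this (the image paths are placeholders, and the 0.75 ratio-test value is just the usual default; the count of good matches serves as a crude similarity score between the two images):

    import cv2

    img1 = cv2.imread("query.jpg", cv2.IMREAD_GRAYSCALE)      # placeholder paths
    img2 = cv2.imread("candidate.jpg", cv2.IMREAD_GRAYSCALE)

    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)

    # Lowe's ratio test: keep a match only if its best distance is clearly
    # better than the second-best candidate.
    bf = cv2.BFMatcher(cv2.NORM_L2)
    matches = bf.knnMatch(des1, des2, k=2)
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]
    print(len(good), "good matches")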
I have a database of images that contains identity cards, bills and passports.
I want to classify these images into different groups (i.e identity cards, bills and passports).
From what I have read, one way to do this task is clustering (since it is going to be unsupervised).
The idea for me is like this: the clustering will be based on the similarity between images (i.e. images that have similar features will be grouped together).
I also know that this can be done using k-means.
So my problem is about the features and how to use images with k-means.
If anyone has done this before, or has a clue about it, could you please recommend some links to start with or suggest any features that could be helpful?
The simplest way to get good results is to break the problem down into two parts:
Getting the features from the images: using raw pixels as features will give you poor results. Pass the images through a pre-trained CNN (you can get several of those online), then use the last CNN layer (just before the fully connected layers) as the image features.
Clustering the features: having obtained rich features for each image, you can cluster them (e.g. with K-means).
I would recommend implementing step 1 in Keras and step 2 in scikit-learn, both of which provide ready-made pieces, as in the sketch below.
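A minimal sketch of that two-step pipeline, assuming VGG16, a 224x224 input size, and a placeholder file list (any pretrained backbone and any image set would do):

    import numpy as np
    from tensorflow.keras.applications import VGG16
    from tensorflow.keras.applications.vgg16 import preprocess_input
    from tensorflow.keras.preprocessing import image
    from sklearn.cluster import KMeans

    # include_top=False drops the fully connected classifier; pooling="avg"
    # turns the last convolutional layer into one fixed-length vector per image.
    model = VGG16(weights="imagenet", include_top=False, pooling="avg")

    def feature(path):
        img = image.load_img(path, target_size=(224, 224))
        x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
        return model.predict(x).flatten()

    paths = ["card1.jpg", "bill1.jpg", "passport1.jpg"]  # placeholder file list
    X = np.stack([feature(p) for p in paths])

    labels = KMeans(n_clusters=3, random_state=0).fit_predict(X)
    print(dict(zip(paths, labels)))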
Label a few examples, and use classification.
Clustering is just as likely to give you clusters such as "images with a blueish tint", "grayscale scans", and "warm color temperature". That is a quite reasonable way to cluster such images.
Furthermore, k-means is very sensitive to outliers. And you probably have some in there.
Since you want your clusters to correspond to certain human concepts, classification is what you need to use.
I have implemented Unsupervised Clustering based on Image Similarity using Agglomerative Hierarchical Clustering.
My use case had images of people, so I extracted a face embedding (i.e. feature) vector from each image. I used dlib for the face embeddings, so each feature vector was 128-dimensional.
In general, a feature vector can be extracted from each image. A pre-trained VGG or other CNN, with its final classification layer removed, can be used for feature extraction.
A dictionary with the IMAGE_FILENAME as the KEY and the FEATURE_VECTOR as the VALUE can be created for all the images in the folder. This makes the correspondence between each filename and its feature vector easier to track.
Then create a single feature matrix, say X, which comprises the individual feature vectors of each image in the folder/group that needs to be clustered.
In my use case, X had the shape (NUMBER OF IMAGES IN THE FOLDER, 128), i.e. (number of images, size of each feature vector). For instance, shape of X: (50, 128).
This feature matrix can then be used to fit an Agglomerative Hierarchical Clustering model. One needs to fine-tune the distance threshold parameter empirically.
Finally, we can write code to identify which IMAGE_FILENAME belongs to which cluster.
In my case, there were about 50 images per folder, so this was a manageable solution. This approach was able to group the images of a single person into a single cluster. For example, 15 images of PERSON1 belong to CLUSTER 0, 10 images of PERSON2 belong to CLUSTER 2, and so on…
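A minimal sketch of the clustering step, assuming the filename-to-embedding dictionary has already been built (the toy vectors and the distance threshold below are placeholders that stand in for real 128-d embeddings and an empirically tuned value):

    import numpy as np
    from sklearn.cluster import AgglomerativeClustering

    # Toy stand-in for real embeddings: two tight groups of 128-d vectors.
    rng = np.random.default_rng(0)
    embeddings = {f"person1_{i}.jpg": rng.normal(0.0, 0.05, 128) for i in range(25)}
    embeddings.update({f"person2_{i}.jpg": rng.normal(1.0, 0.05, 128) for i in range(25)})

    filenames = list(embeddings)
    X = np.stack([embeddings[f] for f in filenames])   # shape: (50, 128)

    # n_clusters=None plus distance_threshold lets the threshold decide how many
    # clusters emerge; the value below fits this toy data and must be tuned
    # empirically for real feature vectors.
    clusterer = AgglomerativeClustering(n_clusters=None, distance_threshold=3.0,
                                        linkage="average")
    labels = clusterer.fit_predict(X)

    for f, c in zip(filenames, labels):
        print(f, "-> cluster", c)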
I am looking for a machine learning approach to find the most likely class label (with a probability value) for a given feature vector. I have a training set for n classes, and most of the feature vector consists of boolean values. So far I have been thinking of counting the number of True values per feature and normalizing it (e.g. m = number of training samples with the value True for a feature, n = number of training samples, feat_val = m/n) to create a representative feature vector for each class. Once these are created, I would compute a similarity measure such as the cosine or Euclidean distance between the class representation vector and the given feature vector.
Can anyone suggest whether this approach will be worth implementing?
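For reference, this is roughly what I have in mind, sketched on toy data (purely illustrative):

    import numpy as np

    # Placeholder data: one boolean feature matrix per class (rows = samples).
    class_data = {
        "A": np.array([[1, 0, 1, 1], [1, 1, 1, 0], [1, 0, 1, 1]], dtype=float),
        "B": np.array([[0, 1, 0, 0], [0, 1, 1, 0]], dtype=float),
    }

    # One prototype per class: the fraction of True values per feature (m/n).
    prototypes = {c: X.mean(axis=0) for c, X in class_data.items()}

    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    query = np.array([1, 0, 1, 0], dtype=float)
    scores = {c: cosine(query, p) for c, p in prototypes.items()}
    print(scores)  # the highest score would be the predicted class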
The problem you are trying to solve is called classification and is a major part of supervised learning. A great place to start is the open-source library scikit-learn and its documentation (try this).
There are a lot of classification models available, but once you pick a specific one and train it, you simply use the predict_proba method to get probabilities for a given feature vector or matrix.
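For example (logistic regression and the toy data below are only placeholders; any scikit-learn classifier that exposes predict_proba works the same way):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Toy boolean training data: one row per sample, one label per row.
    X_train = np.array([[1, 0, 1, 1], [1, 1, 1, 0], [0, 1, 0, 0], [0, 1, 1, 0]])
    y_train = np.array(["A", "A", "B", "B"])

    clf = LogisticRegression().fit(X_train, y_train)

    query = np.array([[1, 0, 1, 0]])
    probs = clf.predict_proba(query)          # one probability per class
    print(dict(zip(clf.classes_, probs[0])))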
I'm very confused about the steps to follow to use BOVW for CBIR. I found a lot of literature about classification, machine learning and SVM but it is not quite what I'm looking for.
My problem is related to searching image similarity in a database with an image query.
My steps so far:
1. Extract features (e.g. ORB, BRISK, SIFT...).
2. Store all the images' features to disk.
3. Read the features and run K-means to obtain the centroids (my vocabulary, right?).
And now I'm stuck. I found many different ways to proceed.
This is my hypothesis:
4. For each descriptor, find its nearest centroid (FLANN?).
5. Build a histogram from the set of nearest centroids.
Do I also have to extract a dictionary for every single image and then index the images?
Why is vector quantization (steps 4 and 5) necessary?
Can you suggest a possible way to proceed, or any articles or tutorials on the topic?
NOTE: For the BOVW implementation I cannot use OpenCV because it does not work with binary descriptors, so I need to try the sklearn library.
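To make the question concrete, this is roughly the pipeline I have in mind with sklearn (ORB, the file names, and the vocabulary size are placeholders; treating the binary descriptors as floats for Euclidean K-means is a simplification, since Hamming distance would be the natural metric):

    import cv2
    import numpy as np
    from sklearn.cluster import KMeans

    # Placeholder database; in practice this is every image in the collection.
    paths = ["img1.jpg", "img2.jpg", "img3.jpg"]
    orb = cv2.ORB_create()

    # Steps 1-3: extract descriptors per image, stack them, learn the vocabulary.
    per_image = []
    for p in paths:
        img = cv2.imread(p, cv2.IMREAD_GRAYSCALE)
        _, des = orb.detectAndCompute(img, None)
        per_image.append(des.astype(np.float32))
    all_desc = np.vstack(per_image)

    k = 100  # vocabulary size, to be tuned
    kmeans = KMeans(n_clusters=k, random_state=0).fit(all_desc)

    # Steps 4-5 (vector quantization): map each descriptor of an image to its
    # nearest centroid ("visual word") and count the words. The resulting k-bin
    # histogram is the single fixed-length vector that represents the image.
    def bovw_histogram(des):
        words = kmeans.predict(des)
        hist = np.bincount(words, minlength=k).astype(float)
        return hist / hist.sum()

    db_hists = [bovw_histogram(d) for d in per_image]

    # Query time: build the query's histogram the same way (no separate
    # dictionary per image) and rank the database by distance to it.
    query_hist = db_hists[0]
    ranking = np.argsort([np.linalg.norm(query_hist - h) for h in db_hists])
    print(ranking)  # indices of the most similar images first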
Ok, this is pretty much what I was looking for:
https://stackoverflow.com/a/8549874/8894489
Hope that can be helpful for someone.
I am using scikit-learn to make predictions on a very large set of data. The data is very wide, but not very long, so I want to assign weights to parts of the data. If I know some parts of the data are more important than other parts, how should I inform scikit-learn of this, or does that kind of pre-teaching break the whole machine learning approach?
The most straightforward way of doing this is perhaps by using Principal Component Analysis (PCA) on your data matrix X. The principal vectors form an orthogonal basis of X, and each of them is a linear combination of the original features (normally the columns) of X. The decomposition is such that each principal vector has a corresponding eigenvalue (or singular value, depending on how you compute the PCA): a scalar that reflects how much of X can be reconstructed from that principal vector alone, in a least-squares sense.
The magnitudes of the coefficients of a principal vector can be interpreted as the importance of the individual features of your data, since each coefficient maps 1:1 to a feature, or column, of the matrix. By selecting one or two principal vectors and examining the magnitudes of their coefficients, you can get a preliminary insight into which columns are more relevant, of course only up to how well those vectors approximate the matrix.
This is the detailed scikit-learn API description. Again, PCA is simple, but it is just one way of doing it, among others.
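For instance, a small sketch with scikit-learn (the data matrix below is random placeholder data):

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    X = rng.normal(size=(30, 10))  # 30 samples, 10 features (wide-ish toy data)

    pca = PCA(n_components=2).fit(X)

    # Each row of components_ is a principal vector; the magnitude of its
    # coefficients tells how strongly each original column contributes to it.
    for i, (vec, var) in enumerate(zip(pca.components_, pca.explained_variance_ratio_)):
        top = np.argsort(np.abs(vec))[::-1][:3]
        print(f"PC{i + 1}: explains {var:.1%} of variance, dominant columns: {top}")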
This probably depends a bit on the machine learning algorithm you're using -- many will discover feature importances on their own (as exposed via the feature_importances_ attribute in random forests and others).
If you're using a distance-based measure (e.g. k-means, knn) you could manually weight the features differently by scaling the values of each feature accordingly (though it's possible scikit does some normalization...).
Alternatively, if you know some features really don't carry much information you could simply eliminate them, though you'd lose any diagnostic value these features might unexpectedly bring. There are some tools in scikit for feature selection that might help make this kind of judgement.
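Two quick sketches of those points (toy data; the hand-picked weights are only an assumption about which features matter):

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 4))
    y = (X[:, 0] + 0.1 * rng.normal(size=100) > 0).astype(int)  # column 0 drives the label

    # (a) Tree ensembles report importances on their own.
    rf = RandomForestClassifier(random_state=0).fit(X, y)
    print(rf.feature_importances_)  # column 0 should dominate

    # (b) For distance-based methods, scaling a column up makes it count more
    # in the Euclidean distance that K-means uses.
    weights = np.array([3.0, 1.0, 1.0, 0.5])  # assumed importances, chosen by hand
    labels = KMeans(n_clusters=2, random_state=0).fit_predict(X * weights)
    print(labels[:10])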