Bag of Visual Words (obtained from features) for CBIR. Steps? - python

I'm very confused about the steps to follow to use BOVW for CBIR. I found a lot of literature about classification, machine learning and SVM but it is not quite what I'm looking for.
My problem is related to searching image similarity in a database with an image query.
My steps until now:
1. Extract features (e.g. ORB, BRISK, SIFT...).
2. Store all the images' features to disk.
3. Read the features back and run k-means to obtain the centroids (my vocabulary, right?).
And now I'm stuck. I found many different ways to proceed.
This is my hypothesis:
4. For each image's descriptors, find the nearest centroid (FLANN?).
5. Build a histogram from the assigned centroids (visual words).
Do I also have to extract such a dictionary/histogram for every single image and then index the images?
Why is vector quantization (steps 4 and 5) necessary?
Can you suggest a possible way to proceed, or any article or tutorial on the topic?
NOTE: For the BOVW implementation I cannot use OpenCV because it does not work with binary descriptors, so I need to try with the sklearn library.

Ok, this is pretty much what I was looking for:
https://stackoverflow.com/a/8549874/8894489
Hope that can be helpful for someone.
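For anyone landing here later, this is roughly the pipeline in code, as a sketch only: it assumes ORB features extracted with OpenCV, a vocabulary built with scikit-learn's MiniBatchKMeans, and cosine similarity for ranking; the vocabulary size and the normalisation are placeholder choices, not a definitive implementation:

```python
import cv2
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.metrics.pairwise import cosine_similarity

orb = cv2.ORB_create()

def extract_descriptors(path):
    # Step 1: local features for one image (ORB here, could be BRISK/SIFT).
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    _, des = orb.detectAndCompute(img, None)
    return des.astype(np.float32) if des is not None else np.empty((0, 32), np.float32)

def build_vocabulary(all_descriptors, k=200):
    # Step 3: cluster all descriptors; the centroids are the visual words.
    return MiniBatchKMeans(n_clusters=k, random_state=0).fit(np.vstack(all_descriptors))

def bovw_histogram(des, kmeans):
    # Steps 4-5 (vector quantization): assign each descriptor to its nearest
    # centroid and count how often each visual word occurs in the image.
    words = kmeans.predict(des)
    hist, _ = np.histogram(words, bins=np.arange(kmeans.n_clusters + 1))
    return hist / max(hist.sum(), 1)  # normalise so image size doesn't matter

def query(db_hists, query_hist, top_n=5):
    # Retrieval: rank the database histograms (one per indexed image,
    # built offline) against the query histogram.
    sims = cosine_similarity(query_hist.reshape(1, -1), db_hists)[0]
    return np.argsort(-sims)[:top_n]
```

This also answers why quantization is needed: every image ends up as one fixed-length histogram, so images with different numbers of keypoints become directly comparable.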

Related

Python: compare images of a piece of clothing (identification)

As an example, I have two pictures showing a particular piece of clothing of a certain brand.
I can download a lot of different images of this same piece of clothing, in the same color.
I want to create a model which can recognize the item based on a picture.
I tried to do it using this example:
https://www.tensorflow.org/tutorials/keras/classification.
This can recognize the type of clothing (e.g. shirt, shoe, trousers, etc.), but not a specific item and color.
My goal is to have a model that can tell me that the person on my first picture is wearing the item of my second picture.
As mentioned, I can upload a few variations of this same item to train my model, if that would be the best approach.
I also tried https://pillow.readthedocs.io. It can do something with color recognition, but it does not solve my initial goal.
I don't think a CNN can help with your problem; take a look at the SIFT technique (see this for more details). It is used for image matching and I think it's a better fit in your case. If you're not looking to get into too much detail, OpenCV is a Python (and C++, I think) library that has image-matching functions that are easy to use (more details).
As mentioned by @nadji mansouri, I would use the SIFT technique as it suits your need. But I just want to correct something: a CNN is also an option in this case. That being said, I wouldn't tackle the problem as a classification problem, but rather use Distance Metric Learning, i.e. training a model to generate embeddings that are close in the embedding space when the inputs are similar, and distant otherwise. But to do this you need a large, representative dataset.
In short, I suggest starting with SIFT, using OpenCV or open-source implementations on GitHub, playing around with the parameters to see what fits your case best, and then seeing if it's really necessary to switch to a neural network; in that case, tackle the problem as a metric learning task, maybe with something like siamese networks.
Some definitions:
Metric learning is an approach based directly on a distance metric that aims to establish similarity or dissimilarity between data (images in your case). Deep Metric Learning on the other hand uses Neural Networks to automatically learn discriminative features from the data and then compute the metric. source.
The Scale-Invariant Feature Transform (SIFT) is a method used in computer vision to detect and describe local features in images. The algorithm is invariant to image scale and rotation, and robust to changes in illumination and affine distortion. SIFT features are represented by local image gradients, which are calculated at various scales and orientations, and are used to identify keypoints in an image. These keypoints and their associated descriptor vectors can then be used for tasks such as image matching, object recognition, and structure from motion. source, with modification.
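To make the SIFT suggestion concrete, here is a minimal matching sketch with OpenCV; the 0.75 ratio-test threshold and the way the score is normalised are assumptions you would tune for your own images:

```python
import cv2

def sift_similarity(path_a, path_b, ratio=0.75):
    img_a = cv2.imread(path_a, cv2.IMREAD_GRAYSCALE)
    img_b = cv2.imread(path_b, cv2.IMREAD_GRAYSCALE)

    sift = cv2.SIFT_create()
    _, des_a = sift.detectAndCompute(img_a, None)
    _, des_b = sift.detectAndCompute(img_b, None)

    # Match each descriptor in A to its two nearest neighbours in B,
    # then keep only matches that pass Lowe's ratio test.
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    matches = matcher.knnMatch(des_a, des_b, k=2)
    good = []
    for pair in matches:
        if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance:
            good.append(pair[0])

    # Score: fraction of keypoints in A that found a confident match in B.
    return len(good) / max(len(des_a), 1)

# A score close to 1 suggests the same item appears in both pictures.
# print(sift_similarity("person_wearing_item.jpg", "catalog_item.jpg"))
```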

What is an effective way to store and retrieve image features?

I am a beginner in computer vision. My goal is to find the K nearest neighbours for an image. I looked at Annoy, Faiss, and NMSLIB and decided to use Faiss for image similarity. I have a large dataset to check for nearest neighbours. I am currently using faiss.IndexFlatIP; since it is a brute-force approach, it gives good results at the cost of time.
Questions:
Instead of the brute-force approach, is there any other way to do this with good results?
Currently I am using pickle to save the large extracted features; is there any other way to store and retrieve those features effectively?
I am currently using SIFT for feature extraction; would a CNN model outperform it?
Any help would be greatly appreciated.
You might want to use a managed Faiss solution that supports feature retrieval.
Milvus is an open-source vector database built to power embedding similarity search and AI applications (it integrates Faiss/Annoy/HNSW).
Compared to raw Faiss, it's easier to manage embeddings with Milvus:
https://milvus.io/
https://github.com/milvus-io/milvus
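If you prefer to stay with plain Faiss, one option (a sketch, not a definitive answer) is to switch from IndexFlatIP to an approximate IVF index and use Faiss' own serialization instead of pickle; the dimensionality, nlist, and nprobe values below are assumptions to tune:

```python
import faiss
import numpy as np

d = 128                       # feature dimensionality (e.g. one SIFT descriptor)
features = np.random.rand(100_000, d).astype("float32")  # your extracted features
faiss.normalize_L2(features)  # so inner product behaves like cosine similarity

# IVF index: clusters the vectors into nlist cells and searches only nprobe
# of them at query time - much faster than IndexFlatIP, slightly approximate.
nlist = 1024
quantizer = faiss.IndexFlatIP(d)
index = faiss.IndexIVFFlat(quantizer, d, nlist, faiss.METRIC_INNER_PRODUCT)
index.train(features)
index.add(features)
index.nprobe = 16

# Store and reload the whole index without pickle.
faiss.write_index(index, "features.ivf")
index = faiss.read_index("features.ivf")

queries = features[:5]
scores, ids = index.search(queries, k=10)  # top-10 neighbours per query
```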

NLP - which technique to use to classify labels of a paragraph?

I'm fairly new to NLP and trying to learn the techniques that can help me get my job done.
Here is my task: I have to classify stages of a drilling process based on text memos.
I have to classify the labels for "Activity", "Activity Detail", and "Operation" based on what's written in the "Com" column.
I've been reading a lot of articles online and all the different kinds of techniques that I've read really confuses me.
The buzz words that I'm trying to understand are
Skip-gram (prediction based method, Word2Vec)
TF-IDF (frequency based method)
Co-Occurrence Matrix (frequency based method)
I have about 40,000 rows of data (pretty small, I know), and I came across an article that says neural-net-based models like skip-gram might not be a good choice when there is little training data. So I was also looking into frequency-based methods. Overall, I am unsure which technique is best for me.
Here's what I understand:
Skip-gram: a technique used to represent words in a vector space. But I don't understand what to do next once I've vectorized my corpus.
TF-IDF: tells how important each word is in each sentence. But I still don't know how it can be applied to my problem.
Co-occurrence matrix: I don't really understand what it is.
All three techniques numerically represent texts, but I am unsure what step I should take next to actually classify the labels.
What approach and sequence of techniques should I use to tackle my problem? If there's an open-source Jupyter notebook project, or a link to an article (hopefully with code) that does a similar job, please share it here.
Let's get things a bit clearer. Your task is to create a system that will predict labels for given texts, right? Label prediction (classification) can't be done on unstructured data (texts) directly, so you need to make your data structured first, and then train and run your classifier. Therefore, you need to build two separate components:
Text vectorizer (as you said, it helps to numerically represent texts).
Classifier (to predict the labels for numerically represented texts).
Skip-gram and the co-occurrence matrix are ways to vectorize your texts (here is a nice article that explains their difference). In the case of skip-gram, you can download and use a third-party model that already maps most words to vectors; in the case of a co-occurrence matrix, you need to build it from your own texts (if you have domain-specific vocabulary, this can work better). In such a matrix you can use different measures to represent the degree of co-occurrence of words with words, or documents with documents. TF-IDF is one such measure (it gives a score for every word-document pair); there are many others (PMI, BM25, etc.). This article should help you implement classification with a co-occurrence matrix on your data, and this one gives an idea of how to do the same with Word2Vec.
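For a first baseline, a minimal sketch of that vectorizer-plus-classifier pipeline with scikit-learn might look like the following; the "Com" and "Activity" column names come from the question, while the CSV path, the choice of TF-IDF plus logistic regression, and the example memo are assumptions (you would train one such model per label column):

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

df = pd.read_csv("drilling_memos.csv")  # hypothetical file with the memo data
X_train, X_test, y_train, y_test = train_test_split(
    df["Com"], df["Activity"], test_size=0.2, random_state=0)

# TF-IDF turns each memo into a sparse numeric vector (the "vectorizer" part),
# and the logistic regression learns to map those vectors to labels.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2), min_df=2),
                      LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

print("accuracy:", model.score(X_test, y_test))
print(model.predict(["circulate and condition mud"]))  # hypothetical memo
```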
Hope it helped!

Clustering a set of images

I have a folder with hundreds or thousands of images, some of which look alike. I would like to create clusters separating those images (those which look alike go in the same cluster).
I can't determine the number of clusters that will be needed, it depends on the images.
Does anyone have an idea of how to do this using Python and OpenCV, and which algorithm to use?
I've done some research and found that AffinityPropagation or DBSCAN could be useful, but I don't know where to start (how to encode my images, what to pass to those algorithms, etc.).
Unfortunately it is not that simple with images: naively clustering the raw pixels would result in clusters of images with the same colors, not the same "content". You can use a neural network as a feature extractor for the images; I see two options:
Use a pre-trained network and get the features from an intermediate layer
Train an autoencoder on your dataset, and use the latent features
Option 1 is cheaper since you can easily find pre-trained models; option 2 is much more computationally expensive but should work better, especially if there is no pre-trained model for your domain.
This tutorial (randomly found on the internet) seems to be a good introduction to option 2.
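For reference, a minimal sketch of option 1, assuming a pre-trained ResNet50 from Keras as the feature extractor and scikit-learn's DBSCAN for clustering; the eps and min_samples values are guesses you would need to tune on your own images:

```python
import numpy as np
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input
from tensorflow.keras.preprocessing import image
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import normalize

# Pre-trained network without the classification head; global average pooling
# turns each image into a single 2048-dimensional feature vector.
model = ResNet50(weights="imagenet", include_top=False, pooling="avg")

def embed(path):
    img = image.load_img(path, target_size=(224, 224))
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    return model.predict(x, verbose=0)[0]

paths = ["img_001.jpg", "img_002.jpg"]  # the images in your folder
features = normalize(np.array([embed(p) for p in paths]))  # unit length

# DBSCAN does not need the number of clusters in advance; images whose
# embeddings are close end up in the same cluster, outliers get label -1.
labels = DBSCAN(eps=0.5, min_samples=3, metric="euclidean").fit_predict(features)
print(labels)
```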

comparing HOG feature vectors without SVM

I am relatively new to computer vision and am currently doing a learning project on shape detection. I have a fixed region of interest (ROI) in all the images where the object is most likely present, and I have to compare the shapes to say whether the objects present in two input images are the same or not. There are slight translational, scale, and illumination changes.
I am trying to compare the shape of the object between two input images and provide an output value describing their similarity. If the similarity is above a certain threshold, I can say that the same object is present in both input images.
I have tried contours, but they do not give reliable results (thresholding either gives too many details or misses some vital ones) and don't generalize well to all images. I am thinking of using global shape descriptors like HOG.
But I have trouble understanding the feature vector values produced by the HOG descriptor. How do I compare the (1D) HOG feature vectors of two input images to find their similarity without using an SVM or machine learning? What is the best way to compare HOG feature vectors?
I don't understand how distance measures work for comparing feature vectors. I want to understand the physical meaning of how distances are used to compare feature vectors and histograms, and how to use them to compare HOG feature vectors.
Sorry, your question is actually hard to understand.
I think you are going in the wrong direction.
How to compare HOG feature vectors(1D) for the two input images to find similarity without using SVM or machine learning?
An SVM is a tool for comparing a vector against a learned model to pick the best-matching answer. For similarity, all you need is the distance between the two image-representation vectors. Don't overthink it.
In your case you use the HOG feature as your image-representation vector, so calculate the Euclidean distance between the two vectors; that value is their similarity (the smaller the distance, the more similar the images).
You can look at MATLAB's pdist method for a list of easy-to-use distance measures.
The real problem here is not how to compare feature vectors but how to represent your image by a single vector: a better image representation leads to better performance. For example: bag-of-words, CNN features, etc. There are tons of them; for a newbie, start with bag-of-words.
Hope that helps, and welcome to the computer vision world.
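To make the distance part concrete, here is a minimal sketch of comparing two ROIs with OpenCV's default HOG descriptor and plain NumPy distances; the fixed 64x128 window, the image paths, and the 0.8 similarity threshold are assumptions to adapt:

```python
import cv2
import numpy as np

hog = cv2.HOGDescriptor()  # default 64x128 window, 9 orientation bins

def hog_vector(path):
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    img = cv2.resize(img, (64, 128))  # HOG needs a fixed-size input
    return hog.compute(img).ravel()   # 1D feature vector

a = hog_vector("roi_image1.png")
b = hog_vector("roi_image2.png")

# Euclidean distance: small when the gradient-orientation histograms
# in corresponding cells are similar, i.e. the shapes roughly match.
euclidean = np.linalg.norm(a - b)

# Cosine similarity: 1.0 means identical orientation patterns, ~0 unrelated.
cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

same_object = cosine > 0.8  # hypothetical threshold
print(euclidean, cosine, same_object)
```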
