I understand that this is a popular question on Stack Overflow; however, I have not managed to find the best solution yet.
Background
I am trying to classify an image. I currently have 10,000 unique images that a given image can match with. For each image in my database, I only have a single image for training. So I have a DB of 10,000 images, and the possible output classes are also 10,000. E.g. let's say there are 10,000 unique objects and I have a single image of each.
The goal is to match an input image to the 'best' matching image in the DB.
I am currently using Python with OpenCV and the SIFT library to identify keypoints/descriptors, then applying the standard matching methods to see which image in the DB the input image best matches.
Code
I am using the following code to iterate over my database of images, find all the keypoints/descriptors, and save those descriptors to a file. This is to save time later on.
for i in tqdm(range(labels.shape[0])): # Use the length of the DB
    # Read img from DB
    img_path = 'data/'+labels['Image_Name'][i]
    img = cv2.imread(img_path)
    # Resize to ensure all images are equal for ROI
    dim = (734,1024)
    img = cv2.resize(img, dim, interpolation = cv2.INTER_AREA)
    # Grayscale
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # ROI
    img = img[150:630, 20:700]
    # SIFT
    sift = cv2.xfeatures2d.SIFT_create()
    keypoints_1, descriptors_1 = sift.detectAndCompute(img, None)
    # Save descriptors
    path = 'data/'+labels['Image_Name'][i].replace(".jpeg", "_descriptors.csv")
    savetxt(path, descriptors_1, delimiter=',')
Then, when I am ready to classify an image, I can read in all of the descriptors. This has proven to be 30% quicker.
# Array to store all of the descriptors from SIFT
descriptors = []
for i in tqdm(range(labels.shape[0])): # Use the length of the DB
    # Read in the descriptor file
    path = 'data/'+labels['Image_Name'][i].replace(".jpeg", "_descriptors.csv")
    descriptor = loadtxt(path, delimiter=',')
    # Add to array
    descriptors.append(descriptor)
Finally, I just need to read in an image, apply the SIFT method, and then find the best match.
# Calculate similarity
img = cv2.imread(PATH)
# Resize
dim = (734,1024)
img = cv2.resize(img, dim, interpolation = cv2.INTER_AREA)
# Grayscale
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# ROI
img = img[150:630, 20:700]
# SIFT
sift = cv2.xfeatures2d.SIFT_create()
keypoints_1, descriptors_1 = sift.detectAndCompute(img, None)
# Use FLANN (faster)
index_params = dict(algorithm=0, trees=5)
search_params = dict()
flann = cv2.FlannBasedMatcher(index_params, search_params)
# Store results
scoresdf = pd.DataFrame(columns=["index", "score"])
# Find best matches in DB
for i in tqdm(range(labels.shape[0])):
    # Load in data
    path = 'data/'+labels['Image_Name'][i].replace(".jpeg", "_descriptors.csv")
    # Get descriptors for both images to compare
    descriptors_2 = descriptors[i]
    descriptors_2 = np.float32(descriptors_2)
    # Find matches
    matches = flann.knnMatch(descriptors_1, descriptors_2, k=2)
    # Select the lower number of keypoints
    number_keypoints = 0
    if len(descriptors_1) <= len(descriptors_2):
        number_keypoints = len(descriptors_1)
    else:
        number_keypoints = len(descriptors_2)
    # Find 'good' matches (Lowe's ratio test)
    good_points = []
    ratio = 0.6
    for m, n in matches:
        if m.distance < ratio*n.distance:
            good_points.append(m)
    # Get similarity score
    score = len(good_points) / number_keypoints * 100
    scoresdf.loc[len(scoresdf)] = [i, score]
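For completeness, the best match can then be read out of scoresdf like this (assuming a higher score means a better match):

# Pick the DB image with the highest similarity score
best_row = scoresdf.sort_values("score", ascending=False).iloc[0]
best_index = int(best_row["index"])
best_image_name = labels['Image_Name'][best_index]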
This all works, but it takes some time, and I would like to find a match much more quickly.
Solutions?
I have read about the bag of words (BoW) method. However, I do not know if this will work given that there are 10,000 classes. Would I need to set K=10000?
Given that each descriptor is an array, is there a way to reduce my search space? Can I find the X closest arrays (descriptors) to the descriptor of my input image?
Any help would be greatly appreciated :)
Edit
Could I use a Bag of Words (BoW) method to create X clusters, then, when I read in a new image, work out which cluster it belongs to, and finally use SIFT matching on the images in that cluster to find the exact match? I am struggling to find many code examples for this.
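For reference, here is a rough sketch of what I imagine this could look like with OpenCV's BoW helpers. db_images and query_img are placeholders for the preprocessed (resized, grayscaled, cropped) images, descriptors is the list loaded above, and K would be the visual-vocabulary size rather than the number of classes:

# Rough sketch only: build a visual vocabulary, shortlist by BoW histogram,
# then run the full SIFT/FLANN scoring only on the shortlisted images.
K = 200  # vocabulary size (a guess; would need tuning)
sift = cv2.xfeatures2d.SIFT_create()

# 1. Cluster all DB descriptors into K visual words
bow_trainer = cv2.BOWKMeansTrainer(K)
for des in descriptors:
    bow_trainer.add(np.float32(des))
vocabulary = bow_trainer.cluster()

# 2. Represent every DB image as a K-dimensional histogram of visual words
bow_extractor = cv2.BOWImgDescriptorExtractor(sift, cv2.BFMatcher(cv2.NORM_L2))
bow_extractor.setVocabulary(vocabulary)
db_histograms = np.vstack([bow_extractor.compute(im, sift.detect(im, None))
                           for im in db_images])

# 3. Shortlist the DB images whose histograms are closest to the query's
query_hist = bow_extractor.compute(query_img, sift.detect(query_img, None))
distances = np.linalg.norm(db_histograms - query_hist, axis=1)
candidates = np.argsort(distances)[:50]  # keep the 50 closest images

# 4. Run the existing SIFT + FLANN ratio-test scoring only on `candidates`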
Related
I am trying to implement KAZE and A-KAZE using Python and OpenCV for Feature Detection and Description on an aerial image.
What is the code?
Also, what descriptor should go along with it for Feature Matching?
KAZE, like previous state-of-the-art methods such as SIFT and SURF, is a local feature descriptor, and in some ways it shows better performance in both detection and description than the SIFT descriptor. A-KAZE, on the other hand, is a local binary descriptor and presents excellent results in terms of speed and performance compared to state-of-the-art local feature descriptors (SIFT, SURF, and KAZE) and to local binary descriptors (ORB and BRISK).
Responding to your question, both of them can be used for feature matching, although the A-KAZE descriptor does not fit appropriately in smaller patches (e.g., very small images, such as a 32x32 patch); that is, in order to avoid returning keypoints without descriptors, A-KAZE normally removes those keypoints.
Therefore, the choice between KAZE and A-KAZE depends on the context of your application, but a priori A-KAZE performs better than KAZE.
In this example, I will show you Feature Detection and Matching with A-KAZE through the FLANN algorithm using Python and OpenCV.
First, load the input image and the image that will be used for training.
In this example, we are using those images:
image1:
image2:
# Imports
import cv2 as cv
import matplotlib.pyplot as plt
import numpy as np

# Open and convert the input and training-set image from BGR to GRAYSCALE
image1 = cv.imread(filename = 'image1.jpg',
                   flags = cv.IMREAD_GRAYSCALE)
image2 = cv.imread(filename = 'image2.jpg',
                   flags = cv.IMREAD_GRAYSCALE)
Note that when loading the images, we use the flags = cv.IMREAD_GRAYSCALE parameter, because in OpenCV the default color mode is BGR. Therefore, to work with descriptors, we need to convert the color mode from BGR to grayscale.
Now we will use the A-KAZE algorithm:
# Initiate A-KAZE descriptor
AKAZE = cv.AKAZE_create()
# Find the keypoints and compute the descriptors for input and training-set image
keypoints1, descriptors1 = AKAZE.detectAndCompute(image1, None)
keypoints2, descriptors2 = AKAZE.detectAndCompute(image2, None)
The features detected by the A-KAZE algorithm can be combined to find objects or patterns that are similar between different images.
Now we will use the FLANN algorithm:
# FLANN parameters
FLANN_INDEX_KDTREE = 1
index_params = dict(algorithm = FLANN_INDEX_KDTREE,
                    trees = 5)
search_params = dict(checks = 50)

# Convert to float32
descriptors1 = np.float32(descriptors1)
descriptors2 = np.float32(descriptors2)

# Create FLANN object
FLANN = cv.FlannBasedMatcher(indexParams = index_params,
                             searchParams = search_params)

# Matching descriptor vectors using FLANN Matcher
matches = FLANN.knnMatch(queryDescriptors = descriptors1,
                         trainDescriptors = descriptors2,
                         k = 2)
# Lowe's ratio test
ratio_thresh = 0.7

# "Good" matches
good_matches = []

# Filter matches
for m, n in matches:
    if m.distance < ratio_thresh * n.distance:
        good_matches.append(m)

# Draw only "good" matches
output = cv.drawMatches(img1 = image1,
                        keypoints1 = keypoints1,
                        img2 = image2,
                        keypoints2 = keypoints2,
                        matches1to2 = good_matches,
                        outImg = None,
                        flags = cv.DrawMatchesFlags_NOT_DRAW_SINGLE_POINTS)

plt.imshow(output)
plt.show()
And the output will be:
To perform the same example with the KAZE descriptor, just initialize this descriptor, changing:
AKAZE = cv.AKAZE_create()
To:
KAZE = cv.KAZE_create()
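As an alternative to FLANN, a brute-force matcher with the norm that matches the descriptor type is also common: A-KAZE's default descriptors are binary (hence the float32 conversion above before using the KD-tree index), while KAZE produces floating-point descriptors. A minimal sketch, using the descriptors exactly as returned by detectAndCompute (i.e., before the np.float32 conversion):

# Brute-force matching with the appropriate norm for each descriptor type
bf_akaze = cv.BFMatcher(cv.NORM_HAMMING)  # A-KAZE: binary descriptors
matches_akaze = bf_akaze.knnMatch(descriptors1, descriptors2, k = 2)

# With KAZE descriptors, use the L2 norm instead:
# bf_kaze = cv.BFMatcher(cv.NORM_L2)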
To learn more about Detection, Description, and Feature Matching techniques, Local Feature Descriptors, Local Binary Descriptors, and algorithms for Feature Matching, I recommend the following repositories on GitHub:
https://github.com/whoisraibolt/Feature-Detection-and-Description
https://github.com/whoisraibolt/Feature-Detection-and-Matching
I am working on a project which requires me to stitch images together. I decided to test this with buildings due to the large number of possible keypoints that can be calculated. I have been following several guides, but the one with the best results for 2-3 images has been this one: https://towardsdatascience.com/image-stitching-using-opencv-817779c86a83.
The way I decided to stitch multiple images is to stitch the first two, then take the output and stitch that with the third image, and so on. I am confident in the matching of descriptors for the images, but as I stitch more and more images, the previously stitched part gets pushed further and further into the -z axis, meaning it gets distorted and smaller. The code I use to accomplish this is as follows:
import cv2
import numpy as np
import os

os.chdir('images')

img_ = cv2.imread('Output.jpg', cv2.COLOR_BGR2GRAY)
img = cv2.imread('DJI_0019.jpg', cv2.COLOR_BGR2GRAY)

# Setting up ORB key point detector
orb = cv2.ORB_create()

# Using ORB to compute keypoints and descriptors
kp, des = orb.detectAndCompute(img_, None)
kp2, des2 = orb.detectAndCompute(img, None)
print(len(kp))

# Setting up BFMatcher
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=False)
matches = bf.knnMatch(des, des2, k=2)  # Find 2 best matches for each descriptor (this is required for the ratio test?)

# Using Lowe's ratio test as suggested in the paper at .7-.8
good = []
for m in matches:
    if m[0].distance < .8 * m[1].distance:
        good.append(m)
matches = np.asarray(good)  # matches is essentially a list of matching descriptors

# Aligning the images
if len(matches) >= 4:
    src = np.float32([kp[m.queryIdx].pt for m in matches[:, 0]]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in matches[:, 0]]).reshape(-1, 1, 2)
    # Creating the homography and mask
    H, masked = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    print(H)
else:
    print("Could not find 4 good matches to find homography")

dst = cv2.warpPerspective(img_, H, (img.shape[1] + 900, img.shape[0]))
dst[0:img.shape[0], 0:img.shape[1]] = img
cv2.imwrite("Output.jpg", dst)
With the output of the 4th+ stitch looking like this:
As you can see, the images get transformed further and further in a strange way. My theory for why this happens is the camera position and angle at which the images were taken, but I am not sure. If this is the case, are there optimal parameters that will produce the best images for stitching?
Is there a way to fix this issue so that the content is pushed "flush" against the x axis?
Edit: Adding source images: https://imgur.com/zycPQuV
I have an image dataset and want to extract its features so they can be compared with the features of a query image, selecting the best matches within a threshold. I am able to extract image features and select the best ones between two corresponding images with the following code:
img1 = cv2.imread("path\of\training\image")
img2 = cv2.imread("path\of\query\image")
# Initiate SIFT detector
sift = cv2.xfeatures2d.SIFT_create()
# find the key-points and descriptors with SIFT
kp1, des1 = sift.detectAndCompute(img1,None)
kp2, des2 = sift.detectAndCompute(img2,None)
# FLANN parameters
FLANN_INDEX_KDTREE = 0
index_params = dict(algorithm = FLANN_INDEX_KDTREE, trees = 5)
search_params = dict(checks=100) # or pass empty dictionary
flann = cv2.FlannBasedMatcher(index_params,search_params)
matches = flann.knnMatch(des1,des2,k=2)
# Need to draw only good matches, so create a mask
matchesMask = [[0,0] for i in range(len(matches))]
# ratio test as per Lowe's paper
for i,(m,n) in enumerate(matches):
if m.distance < 0.8*n.distance:
matchesMask[i]=[1,0]
draw_params = dict(matchColor = (0,255,0),
singlePointColor = (255,0,0),
matchesMask = matchesMask,
flags = 0)
img3 = cv2.drawMatchesKnn(img1,kp1,img2,kp2,matches,None,**draw_params)
plt.imshow(img3,),plt.show()
I want to compare the query image's features with the features of all images in the dataset and select the best ones in order to recognize the specific object. How can I combine all of the dataset features and compare them with the query image's features? Can anyone please help? Thanks.
The first and most naive idea for solving your problem is this: your query image has features represented by vectors; find the nearest neighbours to those vectors in your feature/vector dataset, and the result should be the image with the most features near the query features.
Basically, you have to calculate the distances between your vectors, and FAISS gives you some efficient ways to do that.
I found this site that may help you: https://waltyou.github.io/Faiss-In-Project-English/. The author faced the same situation as you did, and he used the approach above to get through it.
"To solve this problem, you can assign multiple ids to multiple vectors of an image when building a Faiss index. In this way, after searching with multiple vectors of a picture, in the returned result, only the number of times the associated id appears can be counted, and the similarity level can be obtained."
I found an example in C++:
http://docs.opencv.org/3.0-beta/doc/tutorials/features2d/akaze_matching/akaze_matching.html
But there isn't any example in Python showing how to use this feature detector (I also couldn't find anything more in the documentation about AKAZE; there is ORB, SIFT, SURF, etc., but not what I'm looking for):
http://docs.opencv.org/3.1.0/db/d27/tutorial_py_table_of_contents_feature2d.html#gsc.tab=0
Can someone share or show me where I can find information on how to match images in Python with AKAZE?
I am not sure where to find it; the way I made it work was through this function, which uses the brute-force matcher:
import cv2

def kaze_match(im1_path, im2_path):
    # Load the images and convert them to grayscale
    im1 = cv2.imread(im1_path)
    im2 = cv2.imread(im2_path)
    gray1 = cv2.cvtColor(im1, cv2.COLOR_BGR2GRAY)
    gray2 = cv2.cvtColor(im2, cv2.COLOR_BGR2GRAY)

    # Initialize the AKAZE descriptor, then detect keypoints and extract
    # local invariant descriptors from the images
    detector = cv2.AKAZE_create()
    (kps1, descs1) = detector.detectAndCompute(gray1, None)
    (kps2, descs2) = detector.detectAndCompute(gray2, None)

    print("keypoints: {}, descriptors: {}".format(len(kps1), descs1.shape))
    print("keypoints: {}, descriptors: {}".format(len(kps2), descs2.shape))

    # Match the features
    bf = cv2.BFMatcher(cv2.NORM_HAMMING)
    matches = bf.knnMatch(descs1, descs2, k=2)

    # Apply the ratio test
    good = []
    for m, n in matches:
        if m.distance < 0.9*n.distance:
            good.append([m])

    # cv2.drawMatchesKnn expects a list of lists as matches
    im3 = cv2.drawMatchesKnn(im1, kps1, im2, kps2, good[1:20], None, flags=2)
    cv2.imshow("AKAZE matching", im3)
    cv2.waitKey(0)
Remember that the feature vectors are binary vectors. Therefore, the similarity is based on the Hamming distance rather than the commonly used L2 norm or Euclidean distance.
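As a small illustration, the Hamming distance between two of the binary descriptors computed above can be checked directly (descs1 and descs2 come from detectAndCompute in the function above):

import numpy as np

# Hamming distance between two binary AKAZE descriptors
# (each descriptor row is a uint8 vector; the distance counts differing bits)
d_hamming = cv2.norm(descs1[0], descs2[0], cv2.NORM_HAMMING)

# The same value computed bit by bit with NumPy
bits1 = np.unpackbits(descs1[0])
bits2 = np.unpackbits(descs2[0])
d_hamming_np = int(np.sum(bits1 != bits2))  # same value as d_hamming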
I searched for the same tutorial and found that it is available in three languages: C++, Python, and Java. There are three hyperlinks for them before the start of the code area.
Try this: https://docs.opencv.org/3.4/db/d70/tutorial_akaze_matching.html
I'm currently doing 2D template matching using OpenCV's MatchTemplate function called from Python. I'm looking to extend my code into 3D but can't find any existing 3D cross-correlation programs. Can anyone help out?
Do you mean that you are currently looking for a known object somewhere in an image, that you are only able to handle that object being affine transformed (moved around on a 2D plane), and that you want to be able to handle it being perspective transformed?
You could try using a SURF or SIFT algorithm to find features in your reference and unknown images:
def GetSurfPoints(image, mask):
    surfDetector = cv2.FeatureDetector_create("SURF")
    surfExtractor = cv2.DescriptorExtractor_create("SURF")
    keyPoints = surfDetector.detect(image, mask)
    keyPoints, descriptions = surfExtractor.compute(image, keyPoints)
    return keyPoints, descriptions
Then use FLANN to find matching points (this is from one of the cv2 samples):
def MatchFlann(desc1, desc2, r_threshold = 0.6):
    FLANN_INDEX_KDTREE = 1  # bug: flann enums are missing
    flann_params = dict(algorithm = FLANN_INDEX_KDTREE, trees = 4)
    flann = cv2.flann_Index(desc2, flann_params)
    idx2, dist = flann.knnSearch(desc1, 2, params = {})  # bug: need to provide empty dict
    mask = dist[:,0] / dist[:,1] < r_threshold
    idx1 = numpy.arange(len(desc1))
    matches = numpy.int32(list(zip(idx1, idx2[:,0])))
    return matches[mask]
Now if you want to, you could use FindHomography to find a transformation that aligns the two images:
referencePoints = numpy.array([keyPoints[match[0]].pt for match in matches])
newPoints = numpy.array([keyPoints[match[1]].pt for match in matches])
transformMatrix, mask = cv2.findHomography(newPoints, referencePoints, method = cv2.cv.CV_LMEDS)
You could then use WarpPerspective and that matrix to align the images. Or you could do something else with the set of matched points found earlier.
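For example, the final alignment step could look like this (referenceImage and newImage are placeholders for the two images the keypoints came from):

# Warp the new image into the reference image's frame using the homography
# found above; here the output size simply matches the reference image.
height, width = referenceImage.shape[:2]
aligned = cv2.warpPerspective(newImage, transformMatrix, (width, height))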