I'm trying to compare images to each other to find out whether they are different. First I tried computing a Pearson correlation of the RGB values, which works quite well unless the pictures are slightly shifted. So if I have two 100% identical images but one is shifted a little, I get a bad correlation value.
Any suggestions for a better algorithm?
BTW, I'm talking about comparing thousands of images...
Edit:
Here is an example of my pictures (microscopic):
im1:
im2:
im3:
im1 and im2 are the same but slightly shifted/cropped; im3 should be recognized as completely different...
Edit:
The problem is solved with the suggestions of Peter Hansen! Works very well! Thanks for all the answers! Some results can be found here
http://labtools.ipk-gatersleben.de/image%20comparison/image%20comparision.pdf
A similar question was asked a year ago and has numerous responses, including one regarding pixelizing the images, which I was going to suggest as at least a pre-qualification step (as it would exclude very non-similar images quite quickly).
There are also links there to still-earlier questions which have even more references and good answers.
Here's an implementation using some of the ideas with Scipy, using your above three images (saved as im1.jpg, im2.jpg, im3.jpg, respectively). The final output shows im1 compared with itself, as a baseline, and then each image compared with the others.
>>> import scipy as sp
>>> from scipy.misc import imread
>>> from scipy.signal.signaltools import correlate2d as c2d
>>>
>>> def get(i):
...     # get JPG image as Scipy array, RGB (3 layer)
...     data = imread('im%s.jpg' % i)
...     # convert to grey-scale using W3C luminance calc
...     data = sp.inner(data, [299, 587, 114]) / 1000.0
...     # normalize per http://en.wikipedia.org/wiki/Cross-correlation
...     return (data - data.mean()) / data.std()
...
>>> im1 = get(1)
>>> im2 = get(2)
>>> im3 = get(3)
>>> im1.shape
(105, 401)
>>> im2.shape
(109, 373)
>>> im3.shape
(121, 457)
>>> c11 = c2d(im1, im1, mode='same') # baseline
>>> c12 = c2d(im1, im2, mode='same')
>>> c13 = c2d(im1, im3, mode='same')
>>> c23 = c2d(im2, im3, mode='same')
>>> c11.max(), c12.max(), c13.max(), c23.max()
(42105.00000000259, 39898.103896795357, 16482.883608327804, 15873.465425120798)
So note that im1 compared with itself gives a score of 42105, im2 compared with im1 is not far off that, but im3 compared with either of the others gives well under half that value. You'd have to experiment with other images to see how well this might perform and how you might improve it.
Run time is long... several minutes on my machine. I would try some pre-filtering to avoid wasting time comparing very dissimilar images, maybe with the "compare jpg file size" trick mentioned in responses to the other question, or with pixelization. The fact that you have images of different sizes complicates things, but you didn't give enough information about the extent of butchering one might expect, so it's hard to give a specific answer that takes that into account.
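Since run time is the main drawback, one thing worth trying is to compute the same correlation through the FFT. The following is only a sketch, not part of the session above: it assumes a SciPy version in which scipy.signal.correlate accepts method='fft', it uses synthetic arrays in place of im1/im2/im3, and the names normalize and peak_score are made up for illustration.

```python
import numpy as np
from scipy.signal import correlate

def normalize(a):
    """Zero-mean, unit-variance version of a 2-D array (same as get() above)."""
    a = np.asarray(a, dtype=float)
    return (a - a.mean()) / a.std()

def peak_score(a, b):
    """Peak of the 2-D cross-correlation of two images, computed via FFT."""
    return correlate(normalize(a), normalize(b), mode='same', method='fft').max()

# Synthetic stand-ins for the real images (assumption: grayscale arrays)
rng = np.random.default_rng(0)
base = rng.random((60, 80))
shifted = base[3:, 5:]            # cropped/shifted copy of base
other = rng.random((60, 80))      # unrelated image

self_score = peak_score(base, base)
shift_score = peak_score(base, shifted)
other_score = peak_score(base, other)
```

For a normalized image the self-correlation peak equals the number of pixels, which is why the baseline above comes out as 105 x 401 = 42105.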
I have once done this with an image histogram comparison. My basic algorithm was this:
Split the image into red, green and blue
Create normalized histograms for the red, green and blue channels and concatenate them into a vector (r0...rn, g0...gn, b0...bn) where n is the number of "buckets"; 256 should be enough
Subtract this histogram from the histogram of another image and calculate the distance
Here is some code with numpy and PIL:
r = numpy.asarray(im.convert( "RGB", (1,0,0,0, 1,0,0,0, 1,0,0,0) ))
g = numpy.asarray(im.convert( "RGB", (0,1,0,0, 0,1,0,0, 0,1,0,0) ))
b = numpy.asarray(im.convert( "RGB", (0,0,1,0, 0,0,1,0, 0,0,1,0) ))
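# fix the bin range so that histograms of different images use the same buckets
hr, h_bins = numpy.histogram(r, bins=256, range=(0, 255), density=True)
hg, h_bins = numpy.histogram(g, bins=256, range=(0, 255), density=True)
hb, h_bins = numpy.histogram(b, bins=256, range=(0, 255), density=True)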
hist = numpy.array([hr, hg, hb]).ravel()
if you have two histograms, you can get the distance like this:
diff = hist1 - hist2
distance = numpy.sqrt(numpy.dot(diff, diff))
If the two images are identical, the distance is 0, the more they diverge, the greater the distance.
It worked quite well for photos for me but failed on graphics like texts and logos.
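As a self-contained variant of the steps above, here is a minimal sketch that works directly on numpy arrays instead of PIL images. It assumes 8-bit RGB data of shape (height, width, 3); rgb_histogram and histogram_distance are names invented for this example.

```python
import numpy as np

def rgb_histogram(pixels, bins=256):
    """Concatenated, normalized per-channel histogram of an 8-bit RGB array."""
    pixels = np.asarray(pixels)
    channels = [
        np.histogram(pixels[..., c], bins=bins, range=(0, 256))[0]
        for c in range(3)
    ]
    hist = np.concatenate(channels).astype(float)
    return hist / pixels[..., 0].size   # each channel now sums to 1

def histogram_distance(a, b):
    """Euclidean distance between the histogram vectors of two images."""
    diff = rgb_histogram(a) - rgb_histogram(b)
    return float(np.sqrt(np.dot(diff, diff)))

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
moved = np.roll(img, 5, axis=1)               # same pixels, shifted sideways
dark = np.zeros((32, 32, 3), dtype=np.uint8)
bright = np.full((32, 32, 3), 255, dtype=np.uint8)
```

Note that the shifted copy has exactly the same histogram as the original, so the distance ignores where the pixels are. That is useful for shifted images but also explains why the method can fail on text and logos.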
You really need to specify the question better, but, looking at those 5 images, the organisms all seem to be oriented the same way. If this is always the case, you can try doing a normalized cross-correlation between the two images and taking the peak value as your degree of similarity. I don't know of a normalized cross-correlation function in Python, but there is a similar fftconvolve() function and you can do the circular cross-correlation yourself:
a = asarray(Image.open('c603225337.jpg').convert('L'))
b = asarray(Image.open('9b78f22f42.jpg').convert('L'))
f1 = rfftn(a)
f2 = rfftn(b)
g = f1 * f2
c = irfftn(g)
This won't work as written since the images are different sizes, and the output isn't weighted or normalized at all.
The location of the peak value of the output indicates the offset between the two images, and the magnitude of the peak indicates the similarity. There should be a way to weight/normalize it so that you can tell the difference between a good match and a poor match.
This isn't as good of an answer as I want, since I haven't figured out how to normalize it yet, but I'll update it if I figure it out, and it will give you an idea to look into.
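One possible way to fill in the missing pieces, offered as a sketch rather than as the normalization the answer was looking for: zero-pad both images to a common size, multiply one spectrum by the complex conjugate of the other (which gives a correlation rather than a convolution), and divide the peak by the product of the two image norms so that it cannot exceed 1. The function name and the choice of normalization are assumptions of this example.

```python
import numpy as np

def circular_xcorr_peak(a, b):
    """Return (score, (dy, dx)): normalized peak of the circular
    cross-correlation of a and b, and the offset at which it occurs."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a - a.mean()
    b = b - b.mean()
    # common size; rfft2 zero-pads the smaller image up to `shape`
    shape = (max(a.shape[0], b.shape[0]), max(a.shape[1], b.shape[1]))
    fa = np.fft.rfft2(a, s=shape)
    fb = np.fft.rfft2(b, s=shape)
    c = np.fft.irfft2(fa * np.conj(fb), s=shape)
    score = c.max() / (np.linalg.norm(a) * np.linalg.norm(b))
    dy, dx = np.unravel_index(np.argmax(c), c.shape)
    return float(score), (int(dy), int(dx))

rng = np.random.default_rng(0)
b_img = rng.random((64, 64))
a_img = np.roll(b_img, (4, 7), axis=(0, 1))   # circularly shifted copy
unrelated = rng.random((64, 64))

score_same, offset = circular_xcorr_peak(a_img, b_img)
score_other, _ = circular_xcorr_peak(unrelated, b_img)
```

With this normalization a perfect (circular) match scores 1.0 and unrelated images score close to 0, so a threshold somewhere in between separates good matches from poor ones. Cropped real images will score lower than 1 even when they match.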
If your problem is about shifted pixels, maybe you should compare against a frequency transform.
The FFT should be OK (numpy has an implementation for 2D matrices), but I'm always hearing that wavelets are better for this kind of task ^_^
About the performance: if all the images are the same size, and if I remember correctly, the FFTW package creates a specialised function for each FFT input size, so you can get a nice performance boost by reusing the same code... I don't know if numpy is based on FFTW, but if it's not, maybe you could investigate a little bit there.
Here you have a prototype... you can play a little bit with it to see which threshold fits with your images.
from PIL import Image
import numpy
import sys

def main():
    img1 = Image.open(sys.argv[1])
    img2 = Image.open(sys.argv[2])
    if img1.size != img2.size or img1.getbands() != img2.getbands():
        return -1
    # Image.size is (width, height), while numpy arrays are indexed (row, column)
    shape = img1.size[::-1]
    s = 0
    for band_index, band in enumerate(img1.getbands()):
        m1 = numpy.fft.fft2(numpy.array([p[band_index] for p in img1.getdata()]).reshape(shape))
        m2 = numpy.fft.fft2(numpy.array([p[band_index] for p in img2.getdata()]).reshape(shape))
        s += numpy.sum(numpy.abs(m1-m2))
    print(s)

if __name__ == "__main__":
    sys.exit(main())
Another way to proceed might be to blur the images and then subtract the pixel values of the two images. If the difference is nonzero, you can shift one of the images by 1 px in each direction and compare again; if the difference is lower than in the previous step, you can keep shifting in the direction of the gradient and subtracting until the difference drops below a certain threshold or increases again. That should work if the radius of the blurring kernel is larger than the shift between the images.
Also, you can try some of the tools that are commonly used in the photography workflow for blending multiple exposures or making panoramas, like the Pano Tools.
I took an image processing course long ago, and I remember that when matching I normally started by making the image grayscale and then sharpening the edges of the image so you only see edges. You (the software) can then shift and subtract the images until the difference is minimal.
If that difference is larger than the threshold you set, the images are not equal and you can move on to the next. Images whose difference falls below the threshold can then be analyzed further.
I do think that at best you can radically thin out possible matches, but will need to personally compare possible matches to determine they're really equal.
I can't really show code as it was a long time ago, and I used Khoros/Cantata for that course.
First off, correlation is a very CPU-intensive and rather inaccurate measure of similarity. Why not just go for the sum of the squares of the differences between individual pixels?
A simple solution, if the maximum shift is limited: generate all possible shifted images and find the one that is the best match. Make sure you calculate your match variable (i.e. correlation) only over the subset of pixels that can be matched in all shifted images. Also, your maximum shift should be significantly smaller than the size of your images.
If you want to use some more advanced image processing techniques, I suggest you look at SIFT. This is a very powerful method that (theoretically, anyway) can properly match items in images independent of translation, rotation and scale.
I guess you could do something like this:
estimate the vertical / horizontal displacement of the reference image vs. the comparison image; a simple SAD (sum of absolute differences) with motion vectors would do.
shift the comparison image accordingly
compute the Pearson correlation you were trying to do
Shift measurement is not difficult.
Take a region (say about 32x32) in the comparison image.
Shift it by x pixels in the horizontal and y pixels in the vertical direction.
Compute the SAD (sum of absolute differences) w.r.t. the original image
Do this for several values of x and y in a small range (-10, +10)
Find the place where the difference is minimum
Pick that value as the shift motion vector
Note:
If the SAD comes out very high for all values of x and y, then you can assume that the images are highly dissimilar and shift measurement is not necessary.
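The steps above can be sketched roughly as follows. This is an illustration under assumptions, not tested on the microscope images: both images are grayscale numpy arrays of the same size, the 32x32 block is taken from the centre of the comparison image, and the function names are made up.

```python
import numpy as np

def estimate_shift_sad(ref, cmp_img, block=32, search=10):
    """Find (dy, dx) such that cmp_img[y, x] best matches ref[y + dy, x + dx],
    by minimizing the sum of absolute differences of one central block."""
    ref = np.asarray(ref, dtype=float)
    cmp_img = np.asarray(cmp_img, dtype=float)
    y0 = (cmp_img.shape[0] - block) // 2
    x0 = (cmp_img.shape[1] - block) // 2
    patch = cmp_img[y0:y0 + block, x0:x0 + block]
    best = (np.inf, 0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = y0 + dy, x0 + dx
            if y < 0 or x < 0 or y + block > ref.shape[0] or x + block > ref.shape[1]:
                continue  # shifted block would fall outside the reference
            sad = np.abs(ref[y:y + block, x:x + block] - patch).sum()
            if sad < best[0]:
                best = (sad, dy, dx)
    return best

def aligned_pearson(ref, cmp_img, dy, dx):
    """Pearson correlation over the overlapping area after undoing the shift."""
    h, w = ref.shape
    ref_part = ref[max(dy, 0):h + min(dy, 0), max(dx, 0):w + min(dx, 0)]
    cmp_part = cmp_img[max(-dy, 0):h - max(dy, 0), max(-dx, 0):w - max(dx, 0)]
    return float(np.corrcoef(ref_part.ravel(), cmp_part.ravel())[0, 1])

rng = np.random.default_rng(1)
ref = rng.random((64, 64))
cmp_img = np.roll(ref, (-4, 6), axis=(0, 1))   # shifted copy of ref
sad, dy, dx = estimate_shift_sad(ref, cmp_img)
r = aligned_pearson(ref, cmp_img, dy, dx)
```

If the smallest SAD found is still large, the Pearson step can be skipped, as the note above says.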
To get the imports to work correctly on my Ubuntu 16.04 (as of April 2017), I installed python 2.7 and these:
sudo apt-get install python-dev
sudo apt-get install libtiff5-dev libjpeg8-dev zlib1g-dev libfreetype6-dev liblcms2-dev libwebp-dev tcl8.6-dev tk8.6-dev python-tk
sudo apt-get install python-scipy
sudo pip install pillow
Then I changed Snowflake's imports to these:
import scipy as sp
from scipy.ndimage import imread
from scipy.signal.signaltools import correlate2d as c2d
How awesome that Snowflake's script worked for me 8 years later!
I propose a solution based on the Jaccard index of similarity on the image histograms. See: https://en.wikipedia.org/wiki/Jaccard_index#Weighted_Jaccard_similarity_and_distance
You can compute the difference in the distribution of the pixel colors. This is indeed pretty invariant to translations.
from PIL.Image import Image
from typing import List
def jaccard_similarity(im1: Image, im2: Image) -> float:
    """Compute the similarity between two images.
    First, for each image a histogram of the pixel distribution is extracted.
    Then, the similarity between the histograms is compared using the weighted Jaccard index of similarity, defined as:
    Jsimilarity = sum(min(b1_i, b2_i)) / sum(max(b1_i, b2_i))
    where b1_i, and b2_i are the ith histogram bin of images 1 and 2, respectively.
    The two images must have same resolution and number of channels (depth).
    See: https://en.wikipedia.org/wiki/Jaccard_index
    Where it is also called Ruzicka similarity."""
    if im1.size != im2.size:
        raise Exception("Images must have the same size. Found {} and {}".format(im1.size, im2.size))
    n_channels_1 = len(im1.getbands())
    n_channels_2 = len(im2.getbands())
    if n_channels_1 != n_channels_2:
        raise Exception("Images must have the same number of channels. Found {} and {}".format(n_channels_1, n_channels_2))
    assert n_channels_1 == n_channels_2
    sum_mins = 0
    sum_maxs = 0
    hi1 = im1.histogram()  # type: List[int]
    hi2 = im2.histogram()  # type: List[int]
    # Since the two images have the same amount of channels, they must have the same amount of bins in the histogram.
    assert len(hi1) == len(hi2)
    for b1, b2 in zip(hi1, hi2):
        min_b = min(b1, b2)
        sum_mins += min_b
        max_b = max(b1, b2)
        sum_maxs += max_b
    jaccard_index = sum_mins / sum_maxs
    return jaccard_index
In contrast to the mean squared error, the Jaccard index always lies in the range [0,1], which allows comparisons among different image sizes.
Then, you can compare the two images, but after rescaling to the same size! Or pixel counts will have to be somehow normalized. I used this:
import sys
from skincare.common.utils import jaccard_similarity
import PIL.Image
from PIL.Image import Image
file1 = sys.argv[1]
file2 = sys.argv[2]
im1 = PIL.Image.open(file1) # type: Image
im2 = PIL.Image.open(file2) # type: Image
print("Image 1: mode={}, size={}".format(im1.mode, im1.size))
print("Image 2: mode={}, size={}".format(im2.mode, im2.size))
if im1.size != im2.size:
    print("Resizing image 2 to {}".format(im1.size))
    im2 = im2.resize(im1.size, resample=PIL.Image.BILINEAR)
j = jaccard_similarity(im1, im2)
print("Jaccard similarity index = {}".format(j))
Testing on your images:
$ python CompareTwoImages.py im1.jpg im2.jpg
Image 1: mode=RGB, size=(401, 105)
Image 2: mode=RGB, size=(373, 109)
Resizing image 2 to (401, 105)
Jaccard similarity index = 0.7238955686269157
$ python CompareTwoImages.py im1.jpg im3.jpg
Image 1: mode=RGB, size=(401, 105)
Image 2: mode=RGB, size=(457, 121)
Resizing image 2 to (401, 105)
Jaccard similarity index = 0.22785529941822316
$ python CompareTwoImages.py im2.jpg im3.jpg
Image 1: mode=RGB, size=(373, 109)
Image 2: mode=RGB, size=(457, 121)
Resizing image 2 to (373, 109)
Jaccard similarity index = 0.29066426814105445
You might also consider experimenting with different resampling filters (like NEAREST or LANCZOS), as they, of course, alter the color distribution when resizing.
Additionally, consider that swapping the images changes the results, as the second image might be downsampled instead of upsampled. (After all, cropping might suit your case better than rescaling.)
I'm trying to port some code that was originally written in scikit to OpenCV, as I already use OpenCV for some other tasks. I have these two images:
which are essentially the polar forms of two images that share a common center, where one image is a rotation of the other. I need to use phase correlation to determine what this angle is. In OpenCV, I'm doing:
import cv2
import numpy as np
im1 = np.float32(cv2.cvtColor(cv2.imread('polar-part.png'), cv2.COLOR_BGR2GRAY))
im2 = np.float32(cv2.cvtColor(cv2.imread('polar-template.png'), cv2.COLOR_BGR2GRAY))
print(cv2.phaseCorrelate(im1, im2))
Which produces the incorrect answer
((-0.07302320870314816, -0.19596856380076133), 0.03418491033860195)
In Scikit, I do
template_polar = rgb2gray(imread('polar-template.png'))
up_cam_polar = rgb2gray(imread('polar-part.png'))
print(phase_cross_correlation(up_cam_polar, template_polar, upsample_factor=20))
which produces the correct answer of
(array([ 1.3625e+02, -5.0000e-02]), 0.2080647049014251, 2.6434620698588315e-07)
The important number here is the y shift, which is about 136. This is the correct number of pixels to translate one image onto the other.
Why does OpenCV give back a drastically different answer?
As best I can tell from the documentation:
Skimage is returning the correct offsets for your images:
Returns the (y,x) shifts and the normalized rms error.
OpenCV is returning incorrect offsets for your images (phase correlation can be very sensitive to noise and non-cyclic images). Your only available argument would be to adjust the windowing:
Returns the (x,y) shifts and the sum of the elements of the correlation (r) within the 5x5 centroid around the peak location. It is normalized to a maximum of 1
Here is a different example:
Input 1:
Input 2:
Results:
opencv: ((20.20846249610655, 22.459076722413144), 0.5959940360504904)
skimage: (array([-22., -20.]), 0.31940809429051836, -2.0134398e-10)
which shows equivalent shifts. (The signs are opposite depending upon how the two methods assign the reference and target images). The correlation metric values returned are different, due to the different ways that they are computed and normalized.
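To see why the sign depends on which image is treated as the reference, here is a bare-bones phase correlation written with numpy only. It is a sketch for illustration, not the algorithm used inside OpenCV or skimage (no windowing, no subpixel refinement), and the function name is invented.

```python
import numpy as np

def phase_correlation_shift(reference, moving):
    """Integer (dy, dx) by which `moving` must be circularly shifted
    to line up with `reference`."""
    f_ref = np.fft.fft2(reference)
    f_mov = np.fft.fft2(moving)
    cross = f_ref * np.conj(f_mov)
    cross /= np.maximum(np.abs(cross), 1e-12)   # keep only the phase
    corr = np.fft.ifft2(cross).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # peaks past the midpoint correspond to negative shifts
    shift = [p - n if p > n // 2 else p for p, n in zip(peak, corr.shape)]
    return tuple(int(s) for s in shift)

rng = np.random.default_rng(0)
reference = rng.random((64, 64))
moving = np.roll(reference, (-22, -20), axis=(0, 1))

forward = phase_correlation_shift(reference, moving)
backward = phase_correlation_shift(moving, reference)
```

Swapping the two arguments flips the sign of the result, which matches the opposite signs seen in the opencv and skimage outputs above (keeping in mind that the two libraries also order the axes differently, (x,y) vs. (y,x)).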
I have a bunch of images (from the M.C. Escher collection) i want to organize, so first step i had in mind is to group them up, by comparing them (you know, some have different resolutions/shapes, etc).
i wrote a very brutal script to:
* read the files
* compute their histograms
* compare them
but the quality of the comparison is really low, like there are files matching that are absolutely different
take a look at what i wrote so far:
Preparing the histograms
files_hist = {}
for i, f in enumerate(files):
    try:
        frame = cv2.imread(f)
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        hist = cv2.calcHist([frame],[0],None,[4096],[0,4096])
        cv2.normalize(hist, hist, alpha=0, beta=1, norm_type=cv2.NORM_MINMAX)
        files_hist[f] = hist
    except Exception as e:
        print('ERROR:', f, e)
Comparing the histograms
pairs = list(itertools.combinations(files_hist.keys(), 2))
for i, (f1, f2) in enumerate(pairs):
    correl = cv2.compareHist(files_hist[f1], files_hist[f2], cv2.HISTCMP_CORREL)
    if correl >= 0.999:
        print('MATCH:', correl, f1, f2)
now, for example i get a match for these 2 files:
m._c._escher_244_(1933).jpg
and
m._c._escher_208_(1931).jpg
and their correlation, using the code above, is 0.9996699595530539 (so they're practically the same :( )
what am i doing wrong? how can i improve that code to avoid these false matches?
thanks!
Histograms are not a good way to compare images. In black and white images, for example, two images with the same number of black pixels have identical histograms, regardless of how those pixels are distributed in the image (that is why the images you mentioned are classified as almost equal).
There are better ways to quantify the difference between images, this post mentions a good option:
Load both images as arrays (scipy.misc.imread) and calculate an element-wise (pixel-by-pixel) difference. Calculate the norm of the difference.
edit:
Answering some questions:
I take the zero norm per-pixel is going to be 0.0-1.0 value, with values close to 0.0 meaning "images are the same", correct?
Values close to 0.0 means the pixels are the same. To compare the images as a whole you need to sum over all pixels. If the summed value is close to 0.0 this means the images are almost the same.
what if the 2 image sizes are different?
that's a good one. To calculate the norm difference the images must have the same size. I see two ways to achieve that:
the first would be resizing one of the images to the shape of the other one, the problem is that this can cause distortion in the image.
the second would be padding the smaller image with zeros until the sizes match.
Note: if you sum over the pixel-wise norm you will have a value between zero and the number of pixels in the image. This can be confusing if you are comparing multiple images. For example, suppose you are comparing images A and B and both have shape 50x50 (therefore, the images have 2500 pixels); values close to 2500 mean the images are completely different. Now suppose you are comparing images C and D and both have shape 1000x1000; in this case, values like 2500 would mean the images are similar. To overcome this problem you can divide the pixel-wise sum by the number of pixels in the image; this will result in a value between 0.0 and 1.0, 0.0 meaning the images are the same and 1.0 meaning they are completely different.
yeah here's the error i received when comparing 2 images with different size diff = image1 - image2 ValueError: operands could not be broadcast together with shapes (850,534) (663,650)
This happens because the images have different shapes. Resizing or padding can avoid this error (as mentioned above).
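A minimal sketch of the padding variant together with the per-pixel normalization described above. It assumes grayscale arrays already scaled to the range 0.0 to 1.0; pad_to and mean_abs_difference are hypothetical helper names.

```python
import numpy as np

def pad_to(img, shape):
    """Place img in the top-left corner of a zero array of the given shape."""
    out = np.zeros(shape, dtype=float)
    out[:img.shape[0], :img.shape[1]] = img
    return out

def mean_abs_difference(a, b):
    """Pixel-wise absolute difference, averaged over all pixels:
    0.0 means identical, 1.0 means completely different."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    shape = (max(a.shape[0], b.shape[0]), max(a.shape[1], b.shape[1]))
    diff = pad_to(a, shape) - pad_to(b, shape)
    return float(np.abs(diff).mean())

rng = np.random.default_rng(0)
image1 = rng.random((850, 534))
image2 = rng.random((663, 650))
d = mean_abs_difference(image1, image2)   # no broadcasting error
```

Keep in mind that the zero padding itself counts as a difference wherever it faces non-black pixels, so images of very different shapes will look dissimilar regardless of their content.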
This is how I tried:
(1) use PIL.Image to open the original(say 100*100) and target(say 20*20) image and convert them into np.array;
(2) start from every pixel in the original one as a starting position, crop a 20*20 area and compare every pixel RGB with the target.
(3) If the total difference is under certain given level, then stop and output the specific starting pixel position in the original one.
The problem is, step(3) costs over 10s which is much too long, even step(2) costs over 0.04s and I hope to optimize my program. In both steps I used For to iterate array, is there a more efficient way?
To compare two signals (or images) at different displacements one can use cross-correlation.
If you have the scipy package you can use 2D cross-correlation to measure how similar the two images are when you slide one image over the other.
This example is copied from the correlate2d function:
import numpy as np
from scipy import signal
from scipy import misc
lena = misc.lena() - misc.lena().mean()
template = np.copy(lena[235:295, 310:370]) # right eye
template -= template.mean()
lena = lena + np.random.randn(*lena.shape) * 50 # add noise
corr = signal.correlate2d(lena, template, boundary='symm', mode='same')
y, x = np.unravel_index(np.argmax(corr), corr.shape) # find the match
If you don't want to use a toolbox you could implement the cross-correlation yourself.
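If you would rather compare by pixel difference, as in the question, the explicit for loops can be replaced by a vectorized search. The sketch below is one way to do that, assuming a NumPy version that provides numpy.lib.stride_tricks.sliding_window_view (1.20 or later) and grayscale 2-D arrays; for RGB you would apply it per channel. find_template is a made-up name.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def find_template(image, template):
    """Return ((row, col), ssd) of the window in `image` that has the
    smallest sum of squared differences to `template`."""
    image = np.asarray(image, dtype=float)
    template = np.asarray(template, dtype=float)
    # windows[i, j] is the template-sized crop whose top-left corner is (i, j)
    windows = sliding_window_view(image, template.shape)
    ssd = ((windows - template) ** 2).sum(axis=(-2, -1))
    y, x = np.unravel_index(np.argmin(ssd), ssd.shape)
    return (int(y), int(x)), float(ssd[y, x])

rng = np.random.default_rng(0)
original = rng.random((100, 100))
target = original[30:50, 45:65].copy()     # 20x20 crop
position, score = find_template(original, target)
```

This still evaluates every starting position, but the inner loops run inside numpy. To stop at the first position under a given level, as in step (3), compare ssd against that level and take the first index where it holds.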
I have an image from an electron micrograph depicting dense and rare layers in a biological system, as shown below.
The layers in question are in the middle of the image, starting just near the label "re" and tapering up to the left. I would like to:
1) count the total number of dark/dense and light/rare layers
2) measure the width of each layer, given that the black scale bar in the bottom right is 1 micron long
I've been trying to do this in Python. If I crop the image beforehand so as to only contain parts of a few layers, such the 3 dark and 3 light layers shown here:
I am able to count the number of layers using the code:
import numpy as np
import matplotlib.pyplot as plt
from scipy import ndimage
from PIL import Image
tap = Image.open("VDtap.png").convert('L')
tap_a = np.array(tap)
tap_g = ndimage.gaussian_filter(tap_a, 1)
tap_norm = (tap_g - tap_g.min())/(float(tap_g.max()) - tap_g.min())
tap_norm[tap_norm < 0.5] = 0
tap_norm[tap_norm >= 0.5] = 1
result = 255 - (tap_norm * 255).astype(np.uint8)
tap_labeled, count = ndimage.label(result)
plt.imshow(tap_labeled)
plt.show()
However, I'm not sure how to incorporate the scale bar and measure the widths of these layers that I have counted. Even worse, when analyzing the entire image so as to include the scale bar I am having trouble even distinguishing the layers from everything else that is going on in the image.
I would really appreciate any insight in tackling this problem. Thanks in advance.
EDIT 1:
I've made a bit of progress on this problem so far. If I crop the image beforehand so as to contain just a bit of the layers, I've been able to use the following code to get at the thicknesses of each layer.
import numpy as np
import matplotlib.pyplot as plt
from scipy import ndimage
from PIL import Image
from skimage.measure import regionprops
tap = Image.open("VDtap.png").convert('L')
tap_a = np.array(tap)
tap_g = ndimage.gaussian_filter(tap_a, 1)
tap_norm = (tap_g - tap_g.min())/(float(tap_g.max()) - tap_g.min())
tap_norm[tap_norm < 0.5] = 0
tap_norm[tap_norm >= 0.5] = 1
result = 255 - (tap_norm * 255).astype(np.uint8)
tap_labeled, count = ndimage.label(result)
props = regionprops(tap_labeled)
ds = np.array([])
for i in range(len(props)):
    if i==0:
        ds = np.append(ds, props[i].bbox[1] - 0)
    else:
        ds = np.append(ds, props[i].bbox[1] - props[i-1].bbox[3])
    ds = np.append(ds, props[i].bbox[3] - props[i].bbox[1])
Essentially, I discovered the Python module skimage, which can take a labeled image array and return the four coordinates of a bounding box for each labeled object; the [1] and [3] positions give the x coordinates of the bounding box, so their difference yields the extent of each layer in the x-dimension. Also, the first part of the for loop (the if-else condition) is used to get the light/rare layers that precede each dark/dense layer, since only the dark layers get labeled by ndimage.label.
Unfortunately this is still not ideal. Firstly, I would like to not have to crop the image beforehand, as I intend to repeat this procedure for many such images. I've considered that perhaps the (rough) periodicity of the layers could be highlighted using some sort of filter, but I'm not sure if such a filter exists? Secondly, the code above really only gives me the relative width of each layer - I still haven't figured out a way to incorporate the scale bar so as to get the actual widths.
I don't want to be a party-pooper, but I think your problem is harder than you first thought. I can't post a working code snippet because there are so many parts of your post that require in-depth attention. I have worked in several bio/med labs and this work is usually done with a human to tag specific image points and a computer to calculate distances. That being said, one should probably try to automate =D.
To you, the problem is a simple, yet tedious job, of getting out a ruler and making a few hundred measurements. Perfect for a computer, right? Well, yes and no. The computer has no idea how to identify any of the bands in the picture and has to be told exactly what it's looking for, and that will be tricky.
Identifying the scale bar
What do you know about the scale bars in all your images? Are they always the same number of vertical and horizontal pixels? Are they always solid black? Is there always just one bar (what about the solid line for the letter r)? My suggestion is to try a wavelet transform. Imagine the 2d analog to the function
(probably helps to draw this function)
f(x) =
0 if |x| > 1
1 if 0.5 < |x| < 1
-1 if |x| < 0.5
Then when our wavelet f(x, y) is convolved over the image, the output image will have high values only when it finds the black scale bar. Also the length that I set to 1 can also be tuned for wavelets and that will help you find the scale bar too.
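In one dimension the idea looks roughly like this. It is a simplified sketch, not the full 2d wavelet: it assumes a single image row in which the background is bright (1.0) and the bar is dark (0.0), an even bar length that is known or guessed in advance, and a made-up function name.

```python
import numpy as np

def find_dark_bar(row, bar_length):
    """Slide a zero-sum box kernel (+1 flanks, -1 centre) along a row and
    return (start index of the best matching dark bar, response)."""
    half = bar_length // 2
    kernel = np.concatenate([np.ones(half), -np.ones(bar_length), np.ones(half)])
    # 'valid' keeps only positions where the kernel fits entirely inside the row
    response = np.correlate(np.asarray(row, dtype=float), kernel, mode="valid")
    start = int(np.argmax(response)) + half
    return start, float(response.max())

# Synthetic row: bright background with a 40-pixel dark bar
row = np.ones(200)
row[120:160] = 0.0
start, score = find_dark_bar(row, 40)

# If the bar is 1 micron long, the scale follows directly (assumed length)
microns_per_pixel = 1.0 / 40
```

On a uniform background the response is 0 because the kernel sums to zero, and it peaks where the dark centre is flanked by bright pixels. Trying several values of bar_length and keeping the strongest response is one way to tune the length mentioned above.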
Finding the ridges
I'd solve the above problem first because it seems easier and sets you up for this one. I'd construct another wavelet for this one, but just as a preprocessing step. For this wavelet I'd try a 2d 0-sum box function again, but this time try to match three (or more) boxes next to each other. Also, in addition to the height and width parameters for the box, we need a spacing and a tilt angle parameter. You probably don't have to get very close to the actual value, just close enough that the rest of the image blackens out.
Measuring the ridges
There are lots and lots of ways to do this, but let's use our previous step for simplicity. Take your 3 box wavelet answer and it should be centered at the middle ridge and report a box "width" that is the average width of those three ridges it has captured. Probably close enough considering how slowly the widths are changing!
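As one of those many ways, and a simpler one than the 3-box wavelet, you can measure run lengths along a thresholded intensity profile taken across the layers and convert them with the scale bar. This is only a sketch on synthetic data; the profile, the 40-pixel scale bar length and the function name are all assumptions.

```python
import numpy as np

def run_lengths(binary_profile):
    """Split a 0/1 profile into consecutive runs: [(value, length), ...]."""
    profile = np.asarray(binary_profile).astype(int)
    change = np.flatnonzero(np.diff(profile)) + 1   # indices where the value flips
    starts = np.concatenate(([0], change))
    ends = np.concatenate((change, [profile.size]))
    return [(int(profile[s]), int(e - s)) for s, e in zip(starts, ends)]

# 1 = dark/dense layer, 0 = light/rare layer (synthetic example)
profile = [0] * 5 + [1] * 10 + [0] * 7 + [1] * 3
runs = run_lengths(profile)

scale_bar_px = 40                       # assumed pixel length of the 1 micron bar
microns_per_pixel = 1.0 / scale_bar_px
widths_um = [length * microns_per_pixel for _, length in runs]
```

The number of runs with value 1 is the count of dark layers, and widths_um holds the width of every layer in microns. If the layers are tilted, the profile should be taken perpendicular to them, otherwise the widths are overestimated.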
Good hunting!
I'm trying to compare an image to a list of other images and return a selection of images (like Google search images) from this list with up to 70% similarity.
I got this code from this post and changed it for my context:
# Load the images
img =cv2.imread(MEDIA_ROOT + "/uploads/imagerecognize/armchair.jpg")
# Convert them to grayscale
imgg =cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
# SURF extraction
surf = cv2.FeatureDetector_create("SURF")
surfDescriptorExtractor = cv2.DescriptorExtractor_create("SURF")
kp = surf.detect(imgg)
kp, descritors = surfDescriptorExtractor.compute(imgg,kp)
# Setting up samples and responses for kNN
samples = np.array(descritors)
responses = np.arange(len(kp),dtype = np.float32)
# kNN training
knn = cv2.KNearest()
knn.train(samples,responses)
modelImages = [MEDIA_ROOT + "/uploads/imagerecognize/1.jpg", MEDIA_ROOT + "/uploads/imagerecognize/2.jpg", MEDIA_ROOT + "/uploads/imagerecognize/3.jpg"]
for modelImage in modelImages:
    # Now loading a template image and searching for similar keypoints
    template = cv2.imread(modelImage)
    templateg= cv2.cvtColor(template,cv2.COLOR_BGR2GRAY)
    keys = surf.detect(templateg)
    keys,desc = surfDescriptorExtractor.compute(templateg, keys)
    for h,des in enumerate(desc):
        des = np.array(des,np.float32).reshape((1,128))
        retval, results, neigh_resp, dists = knn.find_nearest(des,1)
        res,dist = int(results[0][0]),dists[0][0]
        if dist<0.1: # draw matched keypoints in red color
            color = (0,0,255)
        else: # draw unmatched in blue color
            #print dist
            color = (255,0,0)
        #Draw matched key points on original image
        x,y = kp[res].pt
        center = (int(x),int(y))
        cv2.circle(img,center,2,color,-1)
        #Draw matched key points on template image
        x,y = keys[h].pt
        center = (int(x),int(y))
        cv2.circle(template,center,2,color,-1)
    cv2.imshow('img',img)
    cv2.imshow('tm',template)
    cv2.waitKey(0)
    cv2.destroyAllWindows()
My question is, how can I compare the image with the list of images and get only the similar images? Is there any method to do this?
I suggest you take a look at the earth mover's distance (EMD) between the images.
This metric gives a feeling for how hard it is to transform one normalized grayscale image into another, and it can be generalized to color images. A very good analysis of this method can be found in the following paper:
robotics.stanford.edu/~rubner/papers/rubnerIjcv00.pdf
It can be done both on the whole image and on the histogram (which is much faster than the whole-image method). I'm not sure which method allows a full image comparison, but for histogram comparison you can use the cv.CalcEMD2 function.
The only problem is that this method does not define a percentage of similarity, but a distance that you can filter on.
I know that this is not a full working algorithm, but it is still a base for one, so I hope it helps.
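For the histogram case there is a shortcut that needs no solver: for two normalized one-dimensional histograms, with the distance between bins i and j taken as |i - j|, the EMD equals the sum of the absolute differences of their cumulative sums. The sketch below uses that fact with plain numpy; it is not the cv.CalcEMD2 API, and emd_1d is a made-up name.

```python
import numpy as np

def emd_1d(hist1, hist2):
    """Earth mover's distance between two 1-D histograms, in units of bins.
    Both histograms are normalized to sum to 1 first."""
    p = np.asarray(hist1, dtype=float)
    q = np.asarray(hist2, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    # the cumulative difference is the mass that still has to be carried past each bin
    return float(np.abs(np.cumsum(p - q)).sum())

all_dark = [1, 0, 0, 0]      # every pixel in bin 0
all_bright = [0, 0, 0, 1]    # every pixel in bin 3
mixed = [0.5, 0, 0, 0.5]
```

Here emd_1d(all_dark, all_bright) is 3.0 because all the mass has to move three bins, while emd_1d(all_dark, mixed) is 1.5. As said above, this is a distance to threshold on, not a percentage.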
EDIT:
Here is a sketch of how the EMD works in principle. The main idea is to take two normalized matrices (two grayscale images, each divided by its sum) and define a flux matrix that describes how to move the gray from each pixel of the first image to each pixel of the second in order to obtain the second image (it can be defined for non-normalized images too, but that is more difficult).
In mathematical terms the flow matrix is actually a four-dimensional tensor that gives the flow from the point (i,j) of the old image to the point (k,l) of the new one, but if you flatten your images you can turn it into an ordinary matrix, just a little harder to read.
This flow matrix has three constraints: each term must be non-negative, each row must sum to the value of the corresponding starting pixel, and each column must sum to the value of the corresponding destination pixel.
Given this, you have to minimize the cost of the transformation, given by the sum over all flows from (i,j) to (k,l) of the flow multiplied by the distance between (i,j) and (k,l).
It looks a little complicated in words, so here is the test code. The logic is correct; I'm not sure why the scipy solver complains about it (maybe you should look at OpenOpt or something similar):
import numpy as np
from scipy.optimize import minimize
#original data, two 2x2 images, normalized
x = np.random.rand(2,2)
x /= x.sum()
y = np.random.rand(2,2)
y /= y.sum()
#initial guess of the flux matrix
# just the product of the image x as row for the image y as column
#This is a working flux, but is not an optimal one
F = (y.flatten()*x.flatten().reshape((y.size,-1))).flatten()
#distance matrix, based on euclidean distance
row_x,col_x = np.meshgrid(range(x.shape[0]),range(x.shape[1]))
row_y,col_y = np.meshgrid(range(y.shape[0]),range(y.shape[1]))
rows = ((row_x.flatten().reshape((row_x.size,-1)) - row_y.flatten().reshape((-1,row_x.size)))**2)
cols = ((col_x.flatten().reshape((row_x.size,-1)) - col_y.flatten().reshape((-1,row_x.size)))**2)
D = np.sqrt(rows+cols)
D = D.flatten()
x = x.flatten()
y = y.flatten()
#COST=sum(F*D)
#cost function
fun = lambda F: np.sum(F*D)
jac = lambda F: D
#array of constraint
#the constraint of sum one is implicit given the later constraints
cons = []
#each row and columns should sum to the value of the start and destination array
#(i=i binds the loop index when the lambda is created; without it every lambda would use the last i)
cons += [ {'type': 'eq', 'fun': lambda F, i=i: np.sum(F.reshape((x.size,y.size))[i,:])-x[i]} for i in range(x.size) ]
cons += [ {'type': 'eq', 'fun': lambda F, i=i: np.sum(F.reshape((x.size,y.size))[:,i])-y[i]} for i in range(y.size) ]
#the values of F should be positive: one (min, max) pair per element of F
bnds = [(0, None)]*F.size
res = minimize(fun=fun, x0=F, method='SLSQP', jac=jac, bounds=bnds, constraints=cons)
The variable res contains the result of the minimization, but as I said, I'm not sure why it complains about a singular matrix.
The only problem with this algorithm is that it is not very fast, so you cannot run it on demand; you have to compute it patiently when the dataset is created and store the results somewhere.
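The transport problem above is a linear program, so one possible alternative to SLSQP is scipy.optimize.linprog. The sketch below is not the code from this answer: the function name emd_linprog and the way the constraint matrix is built are my own assumptions, and it expects two arrays that each sum to 1. A possible cause of the singular-matrix complaint is that the equality constraints are linearly dependent (the row sums and the column sums add up to the same total); the HiGHS solver used here tolerates that kind of redundancy.

```python
import numpy as np
from scipy.optimize import linprog

def emd_linprog(x, y):
    """Earth mover's distance between two normalized 2-D arrays (hypothetical helper)."""
    n, m = x.size, y.size
    # pixel coordinates in the same order as x.ravel() / y.ravel()
    px = np.indices(x.shape).reshape(x.ndim, -1).T.astype(float)
    py = np.indices(y.shape).reshape(y.ndim, -1).T.astype(float)
    # D[i, j] = Euclidean distance from source pixel i to destination pixel j
    D = np.linalg.norm(px[:, None, :] - py[None, :, :], axis=2)
    # the flow F is flattened row by row, so F[i, j] sits at index i*m + j
    A_rows = np.kron(np.eye(n), np.ones((1, m)))  # row i of F sums to x[i]
    A_cols = np.kron(np.ones((1, n)), np.eye(m))  # column j of F sums to y[j]
    A_eq = np.vstack([A_rows, A_cols])
    b_eq = np.concatenate([x.ravel(), y.ravel()])
    res = linprog(D.ravel(), A_eq=A_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    return res.fun, res.x.reshape(n, m)

# all mass in the top-left pixel versus all mass in the bottom-right pixel
x = np.array([[1.0, 0.0], [0.0, 0.0]])
y = np.array([[0.0, 0.0], [0.0, 1.0]])
cost, flow = emd_linprog(x, y)
print(cost)
```

The flow matrix has x.size * y.size entries, so this still only scales to small (for example heavily downsampled) images.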
You are embarking on a massive problem, referred to as "content based image retrieval", or CBIR. It's a massive and active field. There are no finished algorithms or standard approaches yet, although there are a lot of techniques all with varying levels of success.
Even Google image search doesn't do this (yet) - they do text-based image search - e.g., search for text in a page that's like the text you searched for. (And I'm sure they're working on using CBIR; it's the holy grail for a lot of image processing researchers)
If you have a tight deadline or need to get this done and working soon... yikes.
Here's a ton of papers on the topic:
http://scholar.google.com/scholar?q=content+based+image+retrieval
Generally you will need to do a few things:
Extract features (either at local interest points, or globally, or somehow, SIFT, SURF, histograms, etc.)
Cluster / build a model of image distributions
This can involve feature descriptors, image gists, multiple instance learning, etc.
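As a rough illustration of the feature extraction step only, here is a minimal sketch that uses a global intensity histogram as the feature and ranks candidates by L1 distance. The function names and the choice of 32 bins are assumptions made for this example; a real CBIR system would use much richer features (SIFT, SURF, etc.) and a proper index or model.

```python
import numpy as np

def histogram_feature(img, bins=32):
    """Global feature: normalized intensity histogram of an 8-bit image array."""
    hist, _ = np.histogram(img, bins=bins, range=(0, 256))
    return hist / hist.sum()

def rank_by_similarity(query, candidates, bins=32):
    """Return candidate indices sorted from most to least similar to the query."""
    q = histogram_feature(query, bins)
    dists = [np.abs(q - histogram_feature(c, bins)).sum() for c in candidates]
    return np.argsort(dists)

# toy data: a dark query, one bright candidate and one dark candidate
query = np.full((8, 8), 10, dtype=np.uint8)
candidates = [np.full((8, 8), 200, dtype=np.uint8),
              np.full((8, 8), 12, dtype=np.uint8)]
order = rank_by_similarity(query, candidates)
print(order)
```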
I wrote a program to do something very similar maybe 2 years ago using Python/Cython. Later I rewrote it in Go to get better performance. The base idea comes from findimagedupes IIRC.
It basically computes a "fingerprint" for each image, and then compares these fingerprints to match similar images.
The fingerprint is generated by resizing the image to 160x160, converting it to grayscale, adding some blur, normalizing it, then resizing it to 16x16 monochrome. At the end you have 256 bits of output: that's your fingerprint. This is very easy to do using convert:
convert path[0] -sample 160x160! -modulate 100,0 -blur 3x99 \
-normalize -equalize -sample 16x16 -threshold 50% -monochrome mono:-
(The [0] in path[0] is used to only extract the first frame of animated GIFs; if you're not interested in such images you can just remove it.)
After applying this to 2 images, you will have 2 (256-bit) fingerprints, fp1 and fp2.
The similarity score of these 2 images is then computed by XORing these 2 values and counting the bits set to 1. To do this bit counting, you can use the bitsoncount() function from this answer:
# fp1 and fp2 are stored as lists of 8 (32-bit) integers
score = 0
for n in range(8):
    score += bitsoncount(fp1[n] ^ fp2[n])
score will be a number between 0 and 256 indicating how similar your images are. In my application I divide it by 2.56 (normalize to 0-100) and I've found that images with a normalized score of 20 or less are often identical.
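If the fingerprints are kept as the raw 32 bytes that convert writes out, the comparison can be sketched without a separate bit-counting helper. This is not my original code: the function name hamming_score and the bytes representation are assumptions for the example, and it relies only on Python's built-in arbitrary-precision integers.

```python
def hamming_score(fp1: bytes, fp2: bytes) -> float:
    """Percentage of differing bits between two equal-length fingerprints."""
    if len(fp1) != len(fp2):
        raise ValueError("fingerprints must have the same length")
    # XOR the fingerprints as big integers, then count the bits set to 1
    diff = int.from_bytes(fp1, "big") ^ int.from_bytes(fp2, "big")
    bits = bin(diff).count("1")
    # normalize to 0-100 (for 256 bits this is the same as dividing by 2.56)
    return 100.0 * bits / (8 * len(fp1))

# two 256-bit fingerprints that differ in exactly one bit
fp_a = bytes(32)
fp_b = bytes(31) + b"\x01"
print(hamming_score(fp_a, fp_b))
```

A lower score means more similar images, matching the threshold of 20 mentioned above.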
If you want to implement this method and use it to compare lots of images, I strongly suggest you use Cython (or just plain C) as much as possible: XORing and bit counting is very slow with pure Python integers.
I'm really sorry but I can't find my Python code anymore. Right now I only have a Go version, but I'm afraid I can't post it here (tightly integrated in some other code, and probably a little ugly as it was my first serious program in Go...).
There's also a very good "find by similarity" function in GQView/Geeqie; its source is here.
For a simpler implementation of Earth Mover's Distance (aka Wasserstein Distance) in Python, you could use Scipy:
from keras.preprocessing.image import load_img, img_to_array
from scipy.stats import wasserstein_distance
import numpy as np

def get_histogram(img):
    '''
    Get the histogram of an image. For an 8-bit, grayscale image, the
    histogram will be a 256 unit vector in which the nth value indicates
    the fraction of the pixels in the image with the given darkness level.
    The histogram's values sum to 1.
    '''
    h, w = img.shape[:2]
    hist = [0.0] * 256
    for i in range(h):
        for j in range(w):
            hist[img[i, j]] += 1
    return np.array(hist) / (h * w)

# img_to_array returns floats with shape (h, w, 1); drop the channel axis
# and cast to int so that pixel values can be used as list indices
a = img_to_array(load_img('a.jpg', color_mode='grayscale'))[:, :, 0].astype(int)
b = img_to_array(load_img('b.jpg', color_mode='grayscale'))[:, :, 0].astype(int)
a_hist = get_histogram(a)
b_hist = get_histogram(b)
dist = wasserstein_distance(a_hist, b_hist)
print(dist)
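One thing to be aware of: called as above, wasserstein_distance treats the histogram values themselves as samples, so the result does not depend on which gray level each count belongs to. If you want the distance to reflect how far mass has to move along the gray-level axis, a possible variant is to pass the gray levels as values and the histograms as weights. The sketch below is an assumption about what is wanted rather than a fix to the code above; histogram_emd is a made-up name, and it uses plain NumPy arrays so it runs without Keras.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def histogram_emd(img_a, img_b):
    """EMD between the gray-level distributions of two 8-bit grayscale arrays."""
    levels = np.arange(256)
    # counts per gray level; the weights do not need to be normalized
    hist_a = np.bincount(np.asarray(img_a, dtype=np.uint8).ravel(), minlength=256)
    hist_b = np.bincount(np.asarray(img_b, dtype=np.uint8).ravel(), minlength=256)
    return wasserstein_distance(levels, levels,
                                u_weights=hist_a, v_weights=hist_b)

# toy images: every pixel is 0 in one and 10 in the other
img_a = np.zeros((4, 4), dtype=np.uint8)
img_b = np.full((4, 4), 10, dtype=np.uint8)
dist = histogram_emd(img_a, img_b)
print(dist)
```

Like any histogram comparison, this ignores where in the image the pixels are, so it is insensitive to shifts but also to rearranged content.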