How can I find specific points in binary images? - python

I'm starting a research project at my university on using AI to measure a region of the retina. The first step we defined was to segment two important parts of the retina using a U-Net. The second is to use the segmentation result to find the important points and perform a calculation.
So, in the image below, I show the segmentation output for each region, produced with the U-Net (the red annotation isn't part of the segmentation). In the first and second blocks I tried to mark the regions whose points I want to find. Once I have them, I can calculate the distance between these points after merging the two masks.
So my question is: what kind of technique can I use to read the pixels and find the coordinates of the points I marked?
Is OpenCV a library that could help me? This is the first time I've dealt with this kind of problem, so thanks for any suggestion or guidance.

Using OpenCV:
detecting the margins can be done with the connectedComponentsWithStats() method:
connectivity = 8
output = cv2.connectedComponentsWithStats(binary_img, connectivity, cv2.CV_32S)
stats = output[2]  # stat matrix; row 0 is the background, so the blobs start at row 1
first_blob_left = stats[1, cv2.CC_STAT_LEFT]
first_blob_right = first_blob_left + stats[1, cv2.CC_STAT_WIDTH]    # there is no CC_STAT_RIGHT; derive it from left + width
second_blob_left = stats[2, cv2.CC_STAT_LEFT]
second_blob_right = second_blob_left + stats[2, cv2.CC_STAT_WIDTH]
if first_blob_left < second_blob_left:
    dist = second_blob_left - first_blob_right
else:
    dist = first_blob_left - second_blob_right
detecting the deepest point can be done with the same method:
connectivity = 8
output = cv2.connectedComponentsWithStats(binary_img, connectivity, cv2.CV_32S)
stats = output[2]  # stat matrix; row 0 is the background, the blob is row 1
blob_top = stats[1, cv2.CC_STAT_TOP]
blob_height = stats[1, cv2.CC_STAT_HEIGHT]
deepest_point_y_position = blob_top + blob_height
Note: this code hasn't been tested and may contain typos, but the idea stays the same and should work without much effort.
Take a look at
labels = output[1]
"labels" is an array the site of the input image, where each pixel of a blob is labelled with the same value. This should help you to find the coordinates of the margins

How to judge if an image is part of another one in Python?

This is what I tried:
(1) use PIL.Image to open the original (say 100×100) and target (say 20×20) images and convert them into np.array;
(2) treat every pixel in the original as a starting position, crop a 20×20 area, and compare every pixel's RGB with the target;
(3) if the total difference is under a given level, stop and output the starting pixel position in the original.
The problem is that step (3) takes over 10 s, which is much too long, and even step (2) takes over 0.04 s, so I'd like to optimize my program. In both steps I used for loops to iterate over the arrays; is there a more efficient way?
To compare two signals (or images) at different displacements one can use cross-correlation.
If you have the scipy package you can use 2D cross-correlation to measure how similar the two images are when you slide one image over the other.
This example is copied from the correlate2d documentation:
import numpy as np
from scipy import signal
from scipy import misc

# Note: misc.lena() was removed from newer SciPy releases;
# scipy.datasets.face(gray=True) can be used as a substitute test image.
lena = misc.lena() - misc.lena().mean()
template = np.copy(lena[235:295, 310:370])  # right eye
template -= template.mean()
lena = lena + np.random.randn(*lena.shape) * 50  # add noise
corr = signal.correlate2d(lena, template, boundary='symm', mode='same')
y, x = np.unravel_index(np.argmax(corr), corr.shape)  # find the match
If you don't want to use a toolbox you could implement the cross-correlation yourself.
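For example, a rough sketch of a do-it-yourself FFT-based cross-correlation with plain NumPy (image and template are assumed to be 2-D float arrays):
import numpy as np

def cross_correlate(image, template):
    # Subtract the means so flat regions don't dominate the score.
    image = image - image.mean()
    template = template - template.mean()
    # Pad to the full linear-correlation size to avoid circular wrap-around.
    shape = (image.shape[0] + template.shape[0] - 1,
             image.shape[1] + template.shape[1] - 1)
    # Correlation is convolution with the flipped template (convolution theorem).
    f_img = np.fft.rfft2(image, s=shape)
    f_tpl = np.fft.rfft2(np.flipud(np.fliplr(template)), s=shape)
    return np.fft.irfft2(f_img * f_tpl, s=shape)

# The peak of the result marks the best alignment:
# y, x = np.unravel_index(np.argmax(corr), corr.shape)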

How can I extract this obvious event from this image?

EDIT: I have found a solution :D thanks for the help.
I've created an image-processing algorithm which extracts this image from the data. It's complex, so I won't go into detail, but this image is essentially a giant numpy array (it visualizes the angular dependence of pixel intensity of an object).
I want to write a program which automatically determines when the curves switch direction. I have the data and I also have this image, but it turns out doing something meaningful with either has been tricky. Thresholding fails because there are bands of different background color. Sobel operators and Hough Transforms also do not work well for this same reason.
This is really easy for humans to see when this switch happens, but not so easy to tell a computer. Any tips? Thanks!
Edit: Thanks all, I'm now fitting lines to this image after convolution with general gaussian and skeletonization of the result. Any pointers on doing this would be appreciated :)
You can take a weighted dot product of successive columns to get a one-dimensional signal that is much easier to work with. You might be able to extract the patterns using this signal:
import numpy as np
import pylab as plt

A = np.loadtxt("img.txt")
N = A.shape[0]
L = np.logspace(1, 2, N)

X = []
for c0, c1 in zip(A.T, A.T[1:]):
    x = c0.dot(c1 * L) / (np.linalg.norm(c0) * np.linalg.norm(c1))
    X.append(x)
X = np.array(X)

plt.matshow(A, alpha=.5)
plt.plot(X * 3 - X.mean(), 'k', lw=2)
plt.axis('tight')
plt.show()
This is absolutely not a complete answer to the question, but a useful observation that is too long for a comment. I'll delete if a better answer comes along.
With the help of Mark McCurry, I was able to get a good result.
Step 1: Load original image. Remove background by subtracting median of each vertical column from itself.
import numpy as np
import pylab as pl
from scipy.ndimage import median_filter  # used in step 6

no_background = []
for i in range(num_frames):
    no_background.append(orig[:, i] - np.median(orig, 1))
no_background = np.array(no_background).T
Step 2: Change negative values to 0.
clipped_background = no_background.clip(min=0)
Step 3: Extract a 1D signal. Take weighted sum of the vertical columns, which relates the max intensity in a column to its position.
def exp_func(x):
    return np.dot(np.arange(len(x)), np.power(x, 10)) / np.sum(np.power(x, 10))

weighted_sum = np.apply_along_axis(exp_func, 0, clipped_background)
Step 4: Take the derivative of 1D signal.
conv = np.convolve([-1.,1],weighted_sum, mode='same')
pl.plot(conv)
Step 5: Determine when the derivative changes sign.
signs=np.sign(conv)
pl.plot(signs)
pl.ylim(-1.2,1.2)
Step 6: Apply median filter to above signal.
filtered_signs = median_filter(signs, 5)  # pick the window size based on the result; the second arg must be an odd number
pl.plot(filtered_signs)
pl.ylim(-1.2,1.2)
Step 7: Find the indices (frame locations) of when the sign switches. Plot result.
def sign_switch(oneDarray):
    inds = []
    for ind in range(len(oneDarray) - 1):
        if (oneDarray[ind] < 0 and oneDarray[ind+1] > 0) or (oneDarray[ind] > 0 and oneDarray[ind+1] < 0):
            inds.append(ind)
    return np.array(inds)

switched_frames = sign_switch(filtered_signs)
For detecting tip positions or turning points, you might try using a corner detector on the original image (not the skeletonized one). The structure tensor could work as a corner detector here, and it is also useful for calculating the local orientation in an image.
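For instance, a minimal sketch with scikit-image's Harris corner detector, which is built on the structure tensor (image is assumed to be the original 2-D intensity array; sigma and min_distance would need tuning):
import numpy as np
from skimage.feature import corner_harris, corner_peaks

response = corner_harris(image, sigma=3)           # structure-tensor-based corner response
corners = corner_peaks(response, min_distance=10)  # (row, col) coordinates of strong corners
print(corners)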

Python Image Processing: Measuring Layer Widths from Electron Micrograph

I have an image from an electron micrograph depicting dense and rare layers in a biological system, as shown below.
The layers in question are in the middle of the image, starting just near the label "re" and tapering up to the left. I would like to:
1) count the total number of dark/dense and light/rare layers
2) measure the width of each layer, given that the black scale bar in the bottom right is 1 micron long
I've been trying to do this in Python. If I crop the image beforehand so that it only contains parts of a few layers, such as the 3 dark and 3 light layers shown here:
I am able to count the number of layers using the code:
import numpy as np
import matplotlib.pyplot as plt
from scipy import ndimage
from PIL import Image
tap = Image.open("VDtap.png").convert('L')
tap_a = np.array(tap)
tap_g = ndimage.gaussian_filter(tap_a, 1)
tap_norm = (tap_g - tap_g.min())/(float(tap_g.max()) - tap_g.min())
tap_norm[tap_norm < 0.5] = 0
tap_norm[tap_norm >= 0.5] = 1
result = 255 - (tap_norm * 255).astype(np.uint8)
tap_labeled, count = ndimage.label(result)
plt.imshow(tap_labeled)
plt.show()
However, I'm not sure how to incorporate the scale bar and measure the widths of these layers that I have counted. Even worse, when analyzing the entire image so as to include the scale bar I am having trouble even distinguishing the layers from everything else that is going on in the image.
I would really appreciate any insight in tackling this problem. Thanks in advance.
EDIT 1:
I've made a bit of progress on this problem so far. If I crop the image beforehand so as to contain just a bit of the layers, I've been able to use the following code to get at the thicknesses of each layer.
import numpy as np
import matplotlib.pyplot as plt
from scipy import ndimage
from PIL import Image
from skimage.measure import regionprops

tap = Image.open("VDtap.png").convert('L')
tap_a = np.array(tap)
tap_g = ndimage.gaussian_filter(tap_a, 1)
tap_norm = (tap_g - tap_g.min())/(float(tap_g.max()) - tap_g.min())
tap_norm[tap_norm < 0.5] = 0
tap_norm[tap_norm >= 0.5] = 1
result = 255 - (tap_norm * 255).astype(np.uint8)
tap_labeled, count = ndimage.label(result)
props = regionprops(tap_labeled)

ds = np.array([])
for i in range(len(props)):  # xrange in the original Python 2 code
    if i == 0:
        ds = np.append(ds, props[i].bbox[1] - 0)
    else:
        ds = np.append(ds, props[i].bbox[1] - props[i-1].bbox[3])
    ds = np.append(ds, props[i].bbox[3] - props[i].bbox[1])
Essentially, I discovered the Python module skimage, which can take a labeled image array and return the four coordinates of a bounding box for each labeled object; the [1] and [3] positions give the x coordinates of the bounding box, so their difference yields the extent of each layer in the x-dimension. Also, the first part of the for loop (the if-else condition) is used to get the light/rare layers that precede each dark/dense layer, since only the dark layers get labeled by ndimage.label.
Unfortunately this is still not ideal. Firstly, I would like to not have to crop the image beforehand, as I intend to repeat this procedure for many such images. I've considered that perhaps the (rough) periodicity of the layers could be highlighted using some sort of filter, but I'm not sure if such a filter exists? Secondly, the code above really only gives me the relative width of each layer - I still haven't figured out a way to incorporate the scale bar so as to get the actual widths.
I don't want to be a party-pooper, but I think your problem is harder than you first thought. I can't post a working code snippet because there are so many parts of your post that require in-depth attention. I have worked in several bio/med labs, and this work is usually done with a human tagging specific image points and a computer calculating the distances. That being said, one should probably try to automate =D.
To you, the problem is a simple, yet tedious, job of getting out a ruler and making a few hundred measurements. Perfect for a computer, right? Well, yes and no. The computer has no idea how to identify any of the bands in the picture and has to be told exactly what it's looking for, and that will be tricky.
Identifying the scale bar
What do you know about the scale bars in all your images? Are they always the same number of pixels vertically and horizontally? Are they always solid black? Is there always just one bar (and what about the solid stroke of the letter r)? My suggestion is to try a wavelet transform. Imagine the 2D analog of the function
(probably helps to draw this function)
f(x) =  0   if |x| > 1
        1   if 0.5 < |x| < 1
       -1   if |x| < 0.5
Then, when our wavelet f(x, y) is convolved over the image, the output image will have high values only where it finds the black scale bar. The length that I set to 1 can also be tuned, and that will help you find the scale bar too.
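As a rough sketch of that matched-filter idea (the kernel sizes below are placeholders; a rectangular kernel matched to the bar's real height and width would work better), with img as the grayscale micrograph in the 0-255 range:
import numpy as np
from scipy.signal import convolve2d

# Zero-sum box kernel: +1 in an inner patch, -1 in the surrounding band.
inner, outer = 5, 11
kernel = -np.ones((outer, outer))
pad = (outer - inner) // 2
kernel[pad:pad + inner, pad:pad + inner] = 1.0
kernel -= kernel.mean()  # force the kernel to sum exactly to zero

# Invert so the black scale bar becomes bright, then look for the strongest response.
response = convolve2d(255.0 - img, kernel, mode='same', boundary='symm')
y, x = np.unravel_index(np.argmax(response), response.shape)  # likely scale-bar location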
Finding the ridges
I'd solve the above problem first because it seems easier and sets you up for this one. I'd construct another wavelet for this too, but just as a preprocessing step. For this wavelet I'd try a 2D zero-sum box function again, but this time try to match three (or more) boxes next to each other. In addition to the height and width parameters for the box, we need a spacing and a tilt-angle parameter. You probably don't have to get very close to the actual values, just close enough that the rest of the image blackens out.
Measuring the ridges
There are lots and lots of ways to do this, but let's use our previous step for simplicity. Take your 3-box wavelet response: it should be centered on the middle ridge and report a box "width" that is the average width of the three ridges it has captured. That is probably close enough, considering how slowly the widths are changing!
Good hunting!

Rudimentary Computer Vision Techniques for Python Bot.

After completing several chapters of computer-vision books, I decided to apply those methods to create a primitive bot for a game. I chose Fling, which has almost no dynamics, so all I needed to do was find the balls. Balls come in 5 different colors, and they can also face any of 4 directions (depending on where the eyes are). I cropped each block in the field so that I can simply check whether each block contains a ball or not. My problem is that I'm not able to find the balls correctly.
My first attempt was the following: I sum the RGB colors for each ball and get an [R, G, B] array. Then I sum the RGB colors for each block in the field. If a block's array is similar to a ball's array, I assume that this block contains a ball.
The problem is that it's hard to find a good threshold for 'similarity'. Even different empty blocks vary significantly in such sums.
Second, I tried to use the OpenCV module's matchTemplate function. This function matches an image against a source image and, together with the minMaxLoc function, returns the best match value and its location. If that value is close to 1, the image is probably in the source image. I made all possible variations of balls (20 overall) and matched them against the entire field. This worked well, but unfortunately it sometimes misses some balls in the field or assigns two different types of ball (say green and yellow) to the same ball. I tried to improve the process by matching balls not against the entire field but against each block (this has the advantage that it checks each block and should detect the correct number of balls; matching against the entire field only gives one location per ball color, so if there are two balls of the same color, matchTemplate loses the information about the second one). Surprisingly, it still produces false negatives/positives.
There is probably a much easier way to solve this problem (maybe a library I don't know about yet), but so far I haven't found one. Any suggestions are welcome.
The balls seem pretty distinct in terms of colour. The problems you initially described seem to be related to some of the finer, random detail present in the image - especially in the background and in the different shading/poses of the ball.
On this basis, I would say you could simplify the task significantly by applying a set of pre-processing steps to "collapse" the range of colours in the image.
There are any number of more principled ways of achieving accurate colour segmentation (which is what, more formally, you want to achieve) - but taking a more pragmatic view, here are a few quick'n'dirty hacks.
So, for example, we can initially smooth the image to reduce higher frequency components...
Then, convert to a normalised RGB representation...
Before, finally posterizing it with the mean shift filtering step...
Here is the code in Python, using the OpenCV bindings, that does all this in order:
import cv  # legacy OpenCV 1.x Python bindings (removed in OpenCV 3 and later)

# get original image
orig = cv.LoadImage('fling.png')
# show original
cv.ShowImage("orig", orig)
# blur a bit to remove higher frequency variation
cv.Smooth(orig,orig,cv.CV_GAUSSIAN,5,5)
# normalise RGB
norm = cv.CreateImage(cv.GetSize(orig), 8, 3)
red = cv.CreateImage(cv.GetSize(orig), 8, 1)
grn = cv.CreateImage(cv.GetSize(orig), 8, 1)
blu = cv.CreateImage(cv.GetSize(orig), 8, 1)
total = cv.CreateImage(cv.GetSize(orig), 8, 1)
cv.Split(orig,red,grn,blu,None)
cv.Add(red,grn,total)
cv.Add(blu,total,total)
cv.Div(red,total,red,255.0)
cv.Div(grn,total,grn,255.0)
cv.Div(blu,total,blu,255.0)
cv.Merge(red,grn,blu,None,norm)
cv.ShowImage("norm", norm)
# posterize simply with mean shift filtering
post = cv.CreateImage(cv.GetSize(orig), 8, 3)
cv.PyrMeanShiftFiltering(norm,post,20,30)
cv.ShowImage("post", post)
Your task is simpler in several respects than the tasks that general computer-vision algorithms are designed for: you know exactly what to look for and you know exactly where to look for it. As such, I think involving an external library is an unnecessary complication, unless you're already familiar with it and can use it effectively as a tool to solve your own problem. In this post I will only use PIL.
First, distinguish the task into two simpler tasks:
Given a tile, determine whether there's a ball there.
Given a tile where we're pretty sure that there's a ball, identify the colour of the ball.
The second task should be simple and I won't spend time on it here. Basically, sample some pixels where the ball's main colour will be visible and compare the colours you find to the known ball colours.
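A minimal sketch of that colour check (the sample offsets and the KNOWN_COLOURS values below are made-up placeholders, not measured from the game):
# Sample a few pixels near the tile centre and pick the known ball colour
# with the smallest squared RGB distance.
KNOWN_COLOURS = {
    "red": (200, 40, 40),
    "green": (40, 180, 60),
    "blue": (50, 80, 200),
    "yellow": (220, 200, 40),
    "purple": (150, 60, 180),
}

def identify_colour(tile_pixels, size=32):
    cx = cy = size // 2
    samples = [tile_pixels[cx + dx, cy + dy] for dx in (-3, 0, 3) for dy in (-3, 0, 3)]
    r = sum(p[0] for p in samples) / len(samples)
    g = sum(p[1] for p in samples) / len(samples)
    b = sum(p[2] for p in samples) / len(samples)
    def sqdist(c):
        return (c[0] - r) ** 2 + (c[1] - g) ** 2 + (c[2] - b) ** 2
    return min(KNOWN_COLOURS, key=lambda name: sqdist(KNOWN_COLOURS[name]))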
So let's look at the first task.
First off, note that the balls don't extend to the edge of the tiles. Thus you can find a fairly representative sample of the background of a tile, whether or not there's a ball there, by sampling the pixels along the edge of the tile.
A simple way to proceed is to compare every pixel in a tile with this sample of the tile background, and to obtain some sort of measure of whether it's generally similar (no ball) or dissimilar (ball).
The following is one way to do this. The basic approach used here is to calculate the mean and the standard deviation of the background pixels -- separately for the red, green, and blue channels. For every pixel, we then calculate the number of standard deviations we are from the mean in every channel. We take this value for the most dissimilar channel as our measure of dissimilarity.
from PIL import Image  # "import Image" in the original, pre-Pillow PIL
import math

def fetch_pixels(col, row):
    img = Image.open("image.png")
    img = img.crop((col * 32, row * 32, (col + 1) * 32, (row + 1) * 32))
    return img.load()

def border_pixels(a):
    rv = [a[x, y] for x in range(32) for y in (0, 31)]
    rv.extend([a[x, y] for x in (0, 31) for y in range(1, 31)])
    return rv

def mean_and_stddev(xs):
    mean = float(sum(xs)) / len(xs)
    dev = math.sqrt(float(sum([(x - mean) ** 2 for x in xs])) / len(xs))
    return mean, dev

def calculate_deviations(cols=7, rows=8):
    outimg = Image.new("L", (cols * 32, rows * 32))
    pixels = outimg.load()
    for col in range(cols):
        for row in range(rows):
            rv = calculate_deviations_for(col, row, pixels)
            print(rv)
    outimg.save("image_output.png")

def calculate_deviations_for(col, row, opixels):
    a = fetch_pixels(col, row)
    border = border_pixels(a)
    bru, brd = mean_and_stddev([p[0] for p in border])
    bgu, bgd = mean_and_stddev([p[1] for p in border])
    bbu, bbd = mean_and_stddev([p[2] for p in border])
    rv = []
    for y in range(32):
        for x in range(32):
            r, g, b = a[x, y]
            dr = (bru - r) / brd
            dg = (bgu - g) / bgd
            db = (bbu - b) / bbd
            t = max(abs(dr), abs(dg), abs(db))
            opixel = 0
            limit, span = 2.5, 8.0
            if t > limit:
                v = min(1.0, (t - limit) / span)
                print(t, v)
                opixel = 127 + int(128 * v)
            opixels[col * 32 + x, row * 32 + y] = opixel
            rv.append(t)
    return sum(rv) / float(len(rv))
A visualization of the result is here:
Note that most of the non-ball pixels are pure black. It should now be possible to determine whether a ball is present or not by simply counting the black pixels. (Or more reliably: count the size of the largest single blob of non-black pixels.)
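A minimal sketch of that blob-size check, assuming deviation_tile is one tile of the output image above as a NumPy array with non-ball pixels at 0 (the min_blob_size threshold is a guess to be tuned):
import numpy as np
from scipy import ndimage

def has_ball(deviation_tile, min_blob_size=40):
    # Label connected regions of non-black pixels and keep the largest one.
    mask = deviation_tile > 0
    labeled, n = ndimage.label(mask)
    if n == 0:
        return False
    sizes = ndimage.sum(mask, labeled, index=range(1, n + 1))  # pixel count per blob
    return max(sizes) >= min_blob_size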
Now, this is a very ad-hoc method and I certainly don't make any claim that it's the best method. The "limit" value was determined by experimentation -- essentially, by trial and error. It's included here to illustrate the sort of method I think you should be exploring, and to give you a starting point to tweak from. (If you want a place to start experimenting, you could try to make it give a better result for the top purple ball. Can you think of weaknesses in the approach above that might make it give a result like that? Always keep in mind, however, that you don't need a perfect-looking result, just one that's good enough. The final answer you want is "ball" or "no ball", and you just want to be able to answer that reliably.)
Note that:
You need to make sure you take the screengrab when the balls have finished rolling and are lying still in the center of their tiles. This simplifies the problem immensely.
The game's background affects the problem -- if there are ocean-themed or desert-themed levels coming up, you will need to test and possibly tweak the recognizer to make sure it still reliably works.
Special effects and/or GUI elements that cover the playing field will complicate the problem. (E.g. consider if the game has a 'cloud' or 'smoke' effect that sometimes floats over the playing field.) You may want to tweak the recognizer to be able to return "no result" if it's not sure -- then you can try another screengrab later. You may want to take several screengrabs and average the results.
I have assumed that there are only balls and non-balls. If later levels have other kinds of objects, you will have to experiment more to find out how to best recognize those.
I haven't used the 'reference picture' approach. However, if you have an image containing all the objects in the game and you can exactly align the pixels with your tiles, that's likely going to be the most reliable approach. Instead of comparing the foreground to the sampled background, compare the foreground to a set of known foreground images.

Compare similarity of images using OpenCV with Python

I'm trying to compare an image to a list of other images and return a selection of images from this list (like Google image search) with up to 70% similarity.
I got this code from this post and changed it for my context:
import cv2
import numpy as np

# Note: this uses the old OpenCV 2.x API (FeatureDetector_create, KNearest, ...).

# Load the images
img = cv2.imread(MEDIA_ROOT + "/uploads/imagerecognize/armchair.jpg")

# Convert them to grayscale
imgg = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# SURF extraction
surf = cv2.FeatureDetector_create("SURF")
surfDescriptorExtractor = cv2.DescriptorExtractor_create("SURF")
kp = surf.detect(imgg)
kp, descritors = surfDescriptorExtractor.compute(imgg, kp)

# Setting up samples and responses for kNN
samples = np.array(descritors)
responses = np.arange(len(kp), dtype=np.float32)

# kNN training
knn = cv2.KNearest()
knn.train(samples, responses)

modelImages = [MEDIA_ROOT + "/uploads/imagerecognize/1.jpg",
               MEDIA_ROOT + "/uploads/imagerecognize/2.jpg",
               MEDIA_ROOT + "/uploads/imagerecognize/3.jpg"]

for modelImage in modelImages:
    # Now loading a template image and searching for similar keypoints
    template = cv2.imread(modelImage)
    templateg = cv2.cvtColor(template, cv2.COLOR_BGR2GRAY)
    keys = surf.detect(templateg)
    keys, desc = surfDescriptorExtractor.compute(templateg, keys)

    for h, des in enumerate(desc):
        des = np.array(des, np.float32).reshape((1, 128))
        retval, results, neigh_resp, dists = knn.find_nearest(des, 1)
        res, dist = int(results[0][0]), dists[0][0]

        if dist < 0.1:  # draw matched keypoints in red color
            color = (0, 0, 255)
        else:  # draw unmatched in blue color
            # print dist
            color = (255, 0, 0)

        # Draw matched key points on original image
        x, y = kp[res].pt
        center = (int(x), int(y))
        cv2.circle(img, center, 2, color, -1)

        # Draw matched key points on template image
        x, y = keys[h].pt
        center = (int(x), int(y))
        cv2.circle(template, center, 2, color, -1)

    cv2.imshow('img', img)
    cv2.imshow('tm', template)
    cv2.waitKey(0)
    cv2.destroyAllWindows()
My question is, how can I compare the image with the list of images and get only the similar images? Is there any method to do this?
I suggest you take a look at the earth mover's distance (EMD) between the images.
This metric gives a feeling for how hard it is to transform a normalized grayscale image into another, but it can be generalized to color images. A very good analysis of this method can be found in the following paper:
robotics.stanford.edu/~rubner/papers/rubnerIjcv00.pdf
It can be done both on the whole image and on the histogram (which is really faster than the whole-image method). I'm not sure which method allows a full-image comparison, but for histogram comparison you can use the cv.CalcEMD2 function.
The only problem is that this method does not define a percentage of similarity, but a distance that you can filter on.
I know that this is not a full working algorithm, but it is still a basis for one, so I hope it helps.
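For reference, a rough sketch of the histogram variant with the newer cv2.EMD binding (the grayscale histograms and the cv2.DIST_L2 ground distance here are just one possible choice):
import cv2
import numpy as np

def emd_between(img1_gray, img2_gray, bins=64):
    # EMD between the normalized grayscale histograms of two images.
    sigs = []
    for img in (img1_gray, img2_gray):
        hist = cv2.calcHist([img], [0], None, [bins], [0, 256]).flatten()
        hist /= hist.sum()
        # OpenCV signature format: each row is (weight, coordinate).
        sig = np.array([[w, float(i)] for i, w in enumerate(hist)], dtype=np.float32)
        sigs.append(sig)
    distance, _, _ = cv2.EMD(sigs[0], sigs[1], cv2.DIST_L2)
    return distance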
EDIT:
Here is a mock-up of how the EMD works in principle. The main idea is to have two normalized matrices (two grayscale images divided by their sum) and to define a flux matrix that describes how you move the gray from one pixel to another to turn the first image into the second (it can be defined even for non-normalized images, but it is more difficult).
In mathematical terms the flow matrix is actually a four-dimensional tensor that gives the flow from the point (i,j) of the old image to the point (k,l) of the new one, but if you flatten your images you can transform it into an ordinary matrix, which is just a little harder to read.
This flow matrix has three constraints: each term should be positive, the sum of each row should equal the value of the corresponding pixel of the starting image, and the sum of each column should equal the value of the corresponding pixel of the destination image (matching the equality constraints in the code below).
Given this, you have to minimize the cost of the transformation, given by the sum, over all pairs of points, of each flow from (i,j) to (k,l) multiplied by the distance between (i,j) and (k,l).
It looks a little complicated in words, so here is the test code. The logic is correct; I'm not sure why the scipy solver complains about it (you should maybe look at OpenOpt or something similar):
import numpy as np
from numpy.random import rand
from scipy.optimize import minimize

# original data, two 2x2 images, normalized
x = rand(2, 2)
x /= x.sum()
y = rand(2, 2)
y /= y.sum()

# initial guess of the flux matrix:
# just the product of the image x as row for the image y as column.
# This is a working flux, but it is not an optimal one.
F = (y.flatten() * x.flatten().reshape((y.size, -1))).flatten()

# distance matrix, based on euclidean distance
row_x, col_x = np.meshgrid(range(x.shape[0]), range(x.shape[1]))
row_y, col_y = np.meshgrid(range(y.shape[0]), range(y.shape[1]))
rows = (row_x.flatten().reshape((row_x.size, -1)) - row_y.flatten().reshape((-1, row_x.size))) ** 2
cols = (col_x.flatten().reshape((row_x.size, -1)) - col_y.flatten().reshape((-1, row_x.size))) ** 2
D = np.sqrt(rows + cols).flatten()

x = x.flatten()
y = y.flatten()

# COST = sum(F*D)
# cost function and its gradient
fun = lambda F: np.sum(F * D)
jac = lambda F: D

# array of constraints
# (the constraint of summing to one is implicit given the later constraints)
cons = []
# each row and column should sum to the value of the start and destination arrays
# (i=i binds the loop variable; otherwise every lambda would capture the last i)
cons += [{'type': 'eq', 'fun': lambda F, i=i: np.sum(F.reshape((x.size, y.size))[i, :]) - x[i]}
         for i in range(x.size)]
cons += [{'type': 'eq', 'fun': lambda F, i=i: np.sum(F.reshape((x.size, y.size))[:, i]) - y[i]}
         for i in range(y.size)]

# the values of F should be positive: one (low, high) pair per variable
bnds = [(0, None)] * F.size

res = minimize(fun=fun, x0=F, method='SLSQP', jac=jac, bounds=bnds, constraints=cons)
the variable res contains the result of the minimization...but as I said I'm not sure why it complains about a singular matrix.
The only problem with this algorithm is that it is not very fast, so you can't run it on demand; you have to run it patiently when building your dataset and store the results somewhere.
You are embarking on a massive problem, referred to as "content based image retrieval", or CBIR. It's a massive and active field. There are no finished algorithms or standard approaches yet, although there are a lot of techniques all with varying levels of success.
Even Google image search doesn't do this (yet) - they do text-based image search - e.g., search for text in a page that's like the text you searched for. (And I'm sure they're working on using CBIR; it's the holy grail for a lot of image processing researchers)
If you have a tight deadline or need to get this done and working soon... yikes.
Here's a ton of papers on the topic:
http://scholar.google.com/scholar?q=content+based+image+retrieval
Generally you will need to do a few things:
Extract features (either at local interest points, or globally, or somehow, SIFT, SURF, histograms, etc.)
Cluster / build a model of image distributions
This can involve feature descriptors, image gists, multiple-instance learning, etc.
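As a tiny illustrative sketch of the feature-extraction step, using ORB (a free alternative to SIFT/SURF) and a brute-force matcher; the file names are placeholders:
import cv2

img = cv2.imread("query.jpg", cv2.IMREAD_GRAYSCALE)
orb = cv2.ORB_create(nfeatures=500)
keypoints, descriptors = orb.detectAndCompute(img, None)

# A crude similarity signal: count cross-checked descriptor matches against another image.
other = cv2.imread("candidate.jpg", cv2.IMREAD_GRAYSCALE)
kp2, desc2 = orb.detectAndCompute(other, None)
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = matcher.match(descriptors, desc2)
print(len(matches), "matches")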
I wrote a program to do something very similar maybe 2 years ago using Python/Cython. Later I rewrote it to Go to get better performance. The base idea comes from findimagedupes IIRC.
It basically computes a "fingerprint" for each image, and then compares these fingerprints to match similar images.
The fingerprint is generated by resizing the image to 160x160, converting it to grayscale, adding some blur, normalizing it, then resizing it to 16x16 monochrome. At the end you have 256 bits of output: that's your fingerprint. This is very easy to do using convert:
convert path[0] -sample 160x160! -modulate 100,0 -blur 3x99 \
-normalize -equalize -sample 16x16 -threshold 50% -monochrome mono:-
(The [0] in path[0] is used to only extract the first frame of animated GIFs; if you're not interested in such images you can just remove it.)
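A rough Python/PIL equivalent of that pipeline might look like the sketch below; the blur radius and normalization differ from ImageMagick's, so the fingerprints won't be bit-identical, it only illustrates the idea:
from PIL import Image, ImageFilter, ImageOps

def fingerprint(path):
    # Resize, grayscale, blur, normalize, then threshold a 16x16 version: 256 bits.
    img = Image.open(path).convert("L").resize((160, 160))
    img = img.filter(ImageFilter.GaussianBlur(3))
    img = ImageOps.autocontrast(img)  # rough stand-in for -normalize/-equalize
    img = img.resize((16, 16))
    bits = 0
    for pixel in img.getdata():       # 256 pixels -> one 256-bit integer
        bits = (bits << 1) | (1 if pixel > 127 else 0)
    return bits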
After applying this to 2 images, you will have 2 (256-bit) fingerprints, fp1 and fp2.
The similarity score of these 2 images is then computed by XORing these 2 values and counting the bits set to 1. To do this bit counting, you can use the bitsoncount() function from this answer:
# fp1 and fp2 are stored as lists of 8 (32-bit) integers
score = 0
for n in range(8):
    score += bitsoncount(fp1[n] ^ fp2[n])
score will be a number between 0 and 256 indicating how similar your images are. In my application I divide it by 2.56 (normalize to 0-100) and I've found that images with a normalized score of 20 or less are often identical.
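If the fingerprints are kept as single Python integers (as in the PIL sketch above), bin(x).count("1") can stand in for bitsoncount() and the whole comparison collapses to a couple of lines:
def similarity_score(fp1, fp2):
    # Hamming distance between two 256-bit fingerprints: 0 = identical, 256 = opposite.
    return bin(fp1 ^ fp2).count("1")

normalized = similarity_score(fingerprint("a.png"), fingerprint("b.png")) / 2.56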
If you want to implement this method and use it to compare lots of images, I strongly suggest you use Cython (or just plain C) as much as possible: XORing and bit counting is very slow with pure Python integers.
I'm really sorry but I can't find my Python code anymore. Right now I only have a Go version, but I'm afraid I can't post it here (tightly integrated in some other code, and probably a little ugly as it was my first serious program in Go...).
There's also a very good "find by similarity" function in GQView/Geeqie; its source is here.
For a simpler implementation of Earth Mover's Distance (aka Wasserstein Distance) in Python, you could use Scipy:
from keras.preprocessing.image import load_img, img_to_array
from scipy.stats import wasserstein_distance
import numpy as np

def get_histogram(img):
    '''
    Get the histogram of an image. For an 8-bit, grayscale image, the
    histogram will be a 256 unit vector in which the nth value indicates
    the percent of the pixels in the image with the given darkness level.
    The histogram's values sum to 1.
    '''
    img = np.squeeze(img).astype(int)  # img_to_array returns floats with a trailing channel axis
    h, w = img.shape[:2]
    hist = [0.0] * 256
    for i in range(h):
        for j in range(w):
            hist[img[i, j]] += 1
    return np.array(hist) / (h * w)

a = img_to_array(load_img('a.jpg', grayscale=True))
b = img_to_array(load_img('b.jpg', grayscale=True))
a_hist = get_histogram(a)
b_hist = get_histogram(b)

# Use the 256 intensity levels as the support and the histograms as weights,
# so this is the 1-D earth mover's distance between the two intensity distributions.
bins = np.arange(256)
dist = wasserstein_distance(bins, bins, u_weights=a_hist, v_weights=b_hist)
print(dist)
