I am handling a set of data recorded by a 2D detector. The data are therefore represented by three arrays: x and y, labelling the coordinates of a pixel, and intensity, storing the measured signal.
For example, a 6x6 grid will give a set of data:
xraw = np.array([0,1,2,3,4,5,0,1,2,3,4,5,...])
yraw = np.array([0,0,0,0,0,0,1,1,1,1,1,1,...])
intensity = np.array([i_00,i_01,i_02,i_03,i_04,i_05,i_10,i_11,...])
For various reasons, such as pixel defects, some of the data points are discarded from the raw data. Therefore xraw, yraw and intensity have a size smaller than 36 (if that's a 6x6 grid), with, say, the point at (2,3) missing.
The intensity data needs further treatment by an element-wise multiplication with another array. This treatment array comes from a theoretical calculation, so it has a size of n x n (6x6 in this case). However, as some of the points in the real data are missing, the two arrays have different sizes.
I can use a loop to check for the missing points and eliminate the corresponding element in the treatment array. I wonder if there are some methods in numpy that take care of such operations. Thanks
First, construct the flat indices of the available and of all possible pixel positions:
avail_ind = yraw * w + xraw
all_ind = np.arange(0, h * w)
where h and w are the image's height and width in pixels (the row-major flat index of pixel (x, y) is y * w + x; for a square grid like the 6x6 example, h and w are interchangeable).
Then, find the indices of the missing pixels by
missing_ind = all_ind[~np.in1d(all_ind, avail_ind)]
Once you have the missing indices, use np.delete to construct a copy of treatment_array with the elements at those indices removed, then simply multiply that with your intensity array.
result = intensity * np.delete(treatment_array, missing_ind)
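Putting the pieces together, here is a minimal end-to-end sketch; the 6x6 grid, the missing pixel at (2, 3) and the random values are made up for illustration, and treatment_array is assumed to be flattened in the same row-major order as the raw data:

import numpy as np

h, w = 6, 6  # hypothetical 6x6 detector

# Build a full grid and drop the pixel at (x=2, y=3) to mimic a defect (illustrative)
full_y, full_x = np.divmod(np.arange(h * w), w)
keep = ~((full_x == 2) & (full_y == 3))
xraw, yraw = full_x[keep], full_y[keep]
intensity = np.random.rand(xraw.size)      # placeholder measurements
treatment_array = np.random.rand(h * w)    # placeholder theory values, row-major

# Flat indices of the available pixels and of all pixels
avail_ind = yraw * w + xraw
all_ind = np.arange(h * w)

# Indices present in all_ind but not in avail_ind
missing_ind = all_ind[~np.in1d(all_ind, avail_ind)]

# Remove the corresponding treatment values and multiply element-wise
result = intensity * np.delete(treatment_array, missing_ind)
print(result.shape)  # (35,)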
I have a dataset consisting of a bunch of PNG files with different heights and widths.
I read in these files with the following code to get a numpy array. In this case it is 2D, but I actually want to get a 3D array consisting of the number of images n, the height of the images h and the width w.
import os.path
import glob
import numpy as np

def open_images(images_directory):
    pattern_to_match = os.path.join(images_directory, "*.png")
    png_files = (x for x in glob.iglob(pattern_to_match)
                 if os.path.isfile(x))
    for current_png_filename in png_files:
        print("Opening file", current_png_filename)
        with open(current_png_filename, "rb") as current_png_file:
            data = current_png_file.read()
        return np.frombuffer(data, dtype=np.uint8, offset=16)\
               .reshape(-1, 3)\
               .astype(np.float32)
    pass

directory_to_search = r"C:\Users\tobis\OneDrive\Desktop\Masterarbeit\data\2017-IWT4S-HDR_LP-dataset\crop_h1"
open_images(directory_to_search)
At the moment, I get an array with a shape like this: (21559, 3). I think the first number is a combination of width and height and the last is the RGB value. I would like to get an array that looks like this one: (n, h, w).
Is there a way to get such an array? Unfortunately, I have two unknown dimensions. This seems to be the problem...
You can't just read an image file like that. You need to use a library to read it and interpret the height, width, colourspace, bits per pixel, date, the GPS data, the camera make and model and all the compressed, encoded pixels.
For example, with PIL/Pillow:
from PIL import Image
import numpy as np
# Open image and make sure it is RGB - not palette
im = Image.open('image.png').convert('RGB')
# Make into Numpy array
na = np.array(im)
# Check shape
print(na.shape) # prints (480,640,3) for height, width, channels
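Since the question asks for an (n, h, w)-style array over a whole directory, here is a minimal sketch of one way to build it with PIL. This is an assumption about what you want: np.stack requires all images to share one shape, so differently sized PNGs are resized to a common, arbitrarily chosen size first; the result is (n, h, w, 3) for RGB, or (n, h, w) if you convert to grayscale instead.

import glob
import os.path
import numpy as np
from PIL import Image

def load_image_stack(images_directory, size=(640, 480)):
    """Load every PNG as RGB, resize to a common size and stack into (n, h, w, 3)."""
    pattern = os.path.join(images_directory, "*.png")
    arrays = []
    for filename in sorted(glob.glob(pattern)):
        im = Image.open(filename).convert('RGB').resize(size)  # size is (width, height) in PIL
        arrays.append(np.array(im))
    return np.stack(arrays)

# images = load_image_stack(directory_to_search)
# print(images.shape)   # (n, 480, 640, 3)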
If you have a flattened image and would like to recover the original row and column dimensions, you can apply a heuristic that tests the various possible combinations and checks the "smoothness" of the image along the row axis. This smoothness can be measured, for example, via the mean squared error of consecutive rows. This assumes that the original image has some kind of structure, also along the row axis, so that the change between consecutive pixels in the original image is relatively small compared to other possible shapes.
For example let's say the original image is 155 x 79 pixels and it has been flattened into an array of 155 * 79 == 12245. The prime factorization of this is 5, 31, 79. So the possible row dimensions are all unique combinations of these prime factors, i.e. 5, 31, 79, 155, 395, 2449. Now these possible row dimensions, in the following referred to as estimates, can be sorted into two different categories:
Estimates which are a divisor of the original row dimension: 5, 31 and 155. This effectively means that multiple row-skipped copies of the original image are stacked next to each other, so the resulting image retains the original column grouping. Since similar columns remain together, each element of the stack will have roughly the same smoothness. For example, if the estimate is 31, the original shape (31 x 5, 79) is transformed to (31, 5 x 79), i.e. only every 5th row of the original image is considered and five such copies are stacked next to each other. For the original image (i.e. an estimate of 155) length-1 correlations are considered (each pair of consecutive rows is compared), while for an estimate of 31 length-5 correlations are considered (comparing row pairs that have another 4 rows between them). Since the original image is expected to have some smooth structure, the smoothness should decrease when longer ranges are compared. The decrease in smoothness will be bigger when the skip range increases, but it can also vanish completely if the image contains some degree of periodicity along the row axis.
All other estimates: 79, 155, 395, 2449. For estimates of this category different columns of the original image are mixed in the test image corresponding to the estimate. For example if the estimate is 79 we have 155 % 79 == 76, i.e. each new row in the test image shifts the original columns by 3 with respect to the previous row. Assuming that the original image varies along the column dimension these shifts will introduce an increasingly strong deviation for the emerging consecutive rows. Since this column shift increases from row to row the resulting decrease in row-smoothness should be strong unless the number of rows is small. If the original image is column-periodic with the shift number of the estimate this can lead to a perfect agreement however.
So to summarize, if we compute the smoothness for all row dimension estimates we expect the smoothness to decrease for a wrong estimate and the decrease will be small if the estimate falls in category (1) and bigger if it falls in category (2).
Important: If the images are periodic along either the row or column dimension this can lead to a false estimate.
The implementation needs to cover the following steps:
Compute the prime factorization of the length of the flattened image.
Compute all unique row dimension estimates from combinations of the prime factors.
For each estimate compute the row-smoothness of the resulting test image. For example use the mean squared error of consecutive rows (actually this will be a non-smoothness score).
Find the best estimate from the scores.
Here is some example code for the implementation:
import itertools as it
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image
image = np.array(Image.open('example.jpg'))
original_shape = image.shape[:2]
image = image.reshape(-1, 3)
def compute_prime_factors(n):
    i = 2
    while i <= n:
        if n % i == 0:
            n //= i
            yield i
        else:
            i += 1
prime_factors = list(compute_prime_factors(len(image)))
combinations = it.chain.from_iterable(it.combinations(prime_factors, r=i) for i in range(1, len(prime_factors)))
row_dims = sorted({np.prod(x) for x in combinations})
def test_row_dim(r):
    c = len(image) // r
    test = image.reshape(r, c, 3).astype(np.int64)  # cast so the row difference does not wrap around in uint8
    return np.mean((test[1:] - test[:-1])**2)
scores = [test_row_dim(r) for r in row_dims]
best_estimate = row_dims[np.argmin(scores)]
fig, ax = plt.subplots()
ax.set(xlabel='row dimension', ylabel='score')
ax.set_xscale('log')
ax.plot(row_dims, scores, '-o', label='Estimations')
ax.plot([best_estimate], [np.min(scores)], '*', ms=12, label=f'Best Estimate ({best_estimate})')
ax.axvline(original_shape[0], label=f'Actual Dim ({original_shape[0]})', color='#2ca02c', zorder=-100, lw=1.5, ls='--')
ax.legend()
plt.figure()
plt.imshow(image.reshape(205, -1, 3)) # second best score
plt.show()
Let's test it on some image (H x W: 410 x 640):
Photo by Cameron Venti on Unsplash
This produces the following estimate scores:
The peaks to the left of the best estimate are the category (1) estimates that have the smallest row-skip. The prime factorization of 410 and 640 is 2*5*41 and 2**7 * 5 respectively. So the category (1) estimates that get closest to the original row dimension are 205, 82 and 41 (the side peaks from right to left). A decreasing estimate implies an increasing row-skip range and hence an increasing MSE score. The peak to the left of the best estimate corresponds to an estimate of 205, i.e. each second row gets skipped and hence two such row-skipped versions are stacked next to each other:
As you can imagine, by skipping every second row, the image doesn't change too much and the change is the same for the two side-by-side versions. Hence the small difference to the original image's MSE score.
I have several points (x, y, z coordinates) in a 3D box with associated masses. I want to draw a histogram of the mass density that is found in spheres of a given radius R.
I have written code that, provided I did not make any errors (which I think I may have), works in the following way:
My "real" data is something huge thus I wrote a little code to generate non overlapping points randomly with arbitrary mass in a box.
I compute a 3D histogram (weighted by mass) with a binning about 10 times smaller than the radius of my spheres.
I take the FFT of my histogram, compute the wave-modes (kx, ky and kz) and use them to multiply my histogram in Fourier space by the analytic expression of the 3D top-hat window (sphere filtering) function in Fourier space.
I inverse FFT my newly computed grid.
Thus drawing a 1D-histogram of the values on each bin would give me what I want.
My issue is the following: given what I do, there should not be any negative values in my inverse-FFT grid (step 4), but I get some, with values much larger than the numerical error.
If I run my code on a small box (300x300x300 cm3, with the points separated by at least 1 cm) I do not get the issue. I do get it for a 600x600x600 cm3 box though.
If I set all the masses to 0, thus working on an empty grid, I do get back zeros without any issues.
I give my code here in one full block so that it can easily be copied.
import numpy as np
import matplotlib.pyplot as plt
import random
from numba import njit
# 1. Generate a bunch of points with masses from 1 to 3 separated by a radius of 1 cm
radius = 1
rangeX = (0, 100)
rangeY = (0, 100)
rangeZ = (0, 100)
rangem = (1,3)
qty = 20000 # or however many points you want
# Generate a set of all points within 1 of the origin, to be used as offsets later
deltas = set()
for x in range(-radius, radius+1):
    for y in range(-radius, radius+1):
        for z in range(-radius, radius+1):
            if x*x + y*y + z*z <= radius*radius:
                deltas.add((x,y,z))
X = []
Y = []
Z = []
M = []
excluded = set()
for i in range(qty):
    x = random.randrange(*rangeX)
    y = random.randrange(*rangeY)
    z = random.randrange(*rangeZ)
    m = random.uniform(*rangem)
    if (x,y,z) in excluded: continue
    X.append(x)
    Y.append(y)
    Z.append(z)
    M.append(m)
    excluded.update((x+dx, y+dy, z+dz) for (dx,dy,dz) in deltas)
print("There is ",len(X)," points in the box")
# Compute the 3D histogram
a = np.vstack((X, Y, Z)).T
b = 200
H, edges = np.histogramdd(a, weights=M, bins = b)
# Compute the FFT of the grid
Fh = np.fft.fftn(H, axes=(-3,-2, -1))
# Compute the different wave-modes
kx = 2*np.pi*np.fft.fftfreq(len(edges[0][:-1]))*len(edges[0][:-1])/(np.amax(X)-np.amin(X))
ky = 2*np.pi*np.fft.fftfreq(len(edges[1][:-1]))*len(edges[1][:-1])/(np.amax(Y)-np.amin(Y))
kz = 2*np.pi*np.fft.fftfreq(len(edges[2][:-1]))*len(edges[2][:-1])/(np.amax(Z)-np.amin(Z))
# I create a matrix containing the values of the filter in each point of the grid in Fourier space
R = 5
Kh = np.empty((len(kx),len(ky),len(kz)))
@njit(parallel=True)
def func_njit(kx, ky, kz, Kh):
    for i in range(len(kx)):
        for j in range(len(ky)):
            for k in range(len(kz)):
                if np.sqrt(kx[i]**2+ky[j]**2+kz[k]**2) != 0:
                    Kh[i][j][k] = (np.sin((np.sqrt(kx[i]**2+ky[j]**2+kz[k]**2))*R)-(np.sqrt(kx[i]**2+ky[j]**2+kz[k]**2))*R*np.cos((np.sqrt(kx[i]**2+ky[j]**2+kz[k]**2))*R))*3/((np.sqrt(kx[i]**2+ky[j]**2+kz[k]**2))*R)**3
                else:
                    Kh[i][j][k] = 1
    return Kh
Kh = func_njit(kx, ky, kz, Kh)
# I multiply each point of my grid by the associated value of the filter (multiplication in Fourier space = convolution in real space)
Gh = np.multiply(Fh, Kh)
# I take the inverse FFT of my filtered grid. I take the real part to get back floats but there should only be zeros for the imaginary part.
Density = np.real(np.fft.ifftn(Gh,axes=(-3,-2, -1)))
# Here it shows if there are negative values the magnitude of the error
print(np.min(Density))
D = Density.flatten()
N = np.mean(D)
# I then compute the histogram I want
hist, bins = np.histogram(D/N, bins='auto', density=True)
bin_centers = (bins[1:]+bins[:-1])*0.5
plt.plot(bin_centers, hist)
plt.xlabel('rho/rhom')
plt.ylabel('P(rho)')
plt.show()
Do you know why I'm getting these negative values? Do you think there is a simpler way to proceed?
Sorry if this is a very long post, I tried to make it very clear and will edit it with your comments, thanks a lot!
-EDIT-
A follow-up question on the issue can be found [here].
The filter you create in the frequency domain is only an approximation to the filter you want to create. The problem is that we are dealing with the DFT here, not the continuous-domain FT (with its infinite frequencies). The Fourier transform of a ball is indeed the function you describe; however, this function extends out to infinity -- it is not band-limited!
By sampling this function only within a window, you are effectively multiplying it with an ideal low-pass filter (the rectangle of the domain). This low-pass filter, in the spatial domain, has negative values. Therefore, the filter you create also has negative values in the spatial domain.
This is a slice through the origin of the inverse transform of Kh (after I applied fftshift to move the origin to the middle of the image, for better display):
As you can tell here, there is some ringing that leads to negative values.
One way to overcome this ringing is to apply a windowing function in the frequency domain. Another option is to generate a ball in the spatial domain, and compute its Fourier transform. This second option would be the simplest to achieve. Do remember that the kernel in the spatial domain must also have the origin at the top-left pixel to obtain a correct FFT.
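A minimal sketch of that second option, assuming the same 200-bin grid as in the question; R_bins is the sphere radius expressed in histogram bins (something you would compute from your bin width, it is just a placeholder here):

import numpy as np

b = 200        # number of bins per axis, as in the question
R_bins = 10    # hypothetical: sphere radius in units of bins

# Build the ball centred on the grid and normalise it to unit sum
coords = np.arange(b) - b // 2
X, Y, Z = np.meshgrid(coords, coords, coords, indexing='ij', sparse=True)
ball = (X**2 + Y**2 + Z**2 <= R_bins**2).astype(float)
ball /= ball.sum()

# Move the origin to the top-left voxel, then take the FFT to get the frequency-domain kernel
Kh = np.fft.fftn(np.fft.ifftshift(ball))

# Then proceed as before: Gh = Fh * Kh; Density = np.real(np.fft.ifftn(Gh))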
A windowing function is typically applied in the spatial domain to avoid issues with the image border when computing the FFT. Here, I propose to apply such a window in the frequency domain to avoid similar issues when computing the IFFT. Note, however, that this will always further reduce the bandwidth of the kernel (the windowing function would work as a low-pass filter after all), and therefore yield a smoother transition of foreground to background in the spatial domain (i.e. the spatial domain kernel will not have as sharp a transition as you might like). The best known windowing functions are Hamming and Hann windows, but there are many others worth trying out.
Unsolicited advice:
I simplified your code to compute Kh to the following:
kr = np.sqrt(kx[:,None,None]**2 + ky[None,:,None]**2 + kz[None,None,:]**2)
kr *= R
Kh = (np.sin(kr)-kr*np.cos(kr))*3/(kr)**3
Kh[0,0,0] = 1
I find this easier to read than the nested loops. It should also be significantly faster, and avoid the need for njit. Note that you were computing the same distance (what I call kr here) 5 times. Factoring out such computation is not only faster, but yields more readable code.
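And if you want to try the frequency-domain windowing mentioned above, one possible sketch is a radial Hann-style taper built from the kr array of the snippet above; the exact window shape and cut-off are design choices, not something prescribed here:

# Radial Hann-style taper over the frequency grid (one possible choice)
taper = 0.5 * (1 + np.cos(np.pi * np.minimum(kr / kr.max(), 1.0)))
Kh_windowed = Kh * taper   # use this in place of Kh when multiplying with Fh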
Just a guess:
Where do you get the idea that the imaginary part MUST be zero? Have you ever tried taking the absolute values (sqrt(re^2 + im^2)) and ignoring the phase, instead of just taking the real part? Just something that came to my mind.
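In terms of the asker's code, that suggestion is a one-line change (a sketch, assuming the same Gh array from the question):

Density = np.abs(np.fft.ifftn(Gh, axes=(-3, -2, -1)))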
I have some images for which I want to calculate the Minkowski/box count dimension to determine the fractal characteristics in the image. Here are 2 example images:
10.jpg:
24.jpg:
I'm using the following code to calculate the fractal dimension:
import numpy as np
import scipy
def rgb2gray(rgb):
    r, g, b = rgb[:,:,0], rgb[:,:,1], rgb[:,:,2]
    gray = 0.2989 * r + 0.5870 * g + 0.1140 * b
    return gray

def fractal_dimension(Z, threshold=0.9):
    # Only for 2d image
    assert(len(Z.shape) == 2)

    # From https://github.com/rougier/numpy-100 (#87)
    def boxcount(Z, k):
        S = np.add.reduceat(
            np.add.reduceat(Z, np.arange(0, Z.shape[0], k), axis=0),
            np.arange(0, Z.shape[1], k), axis=1)
        # We count non-empty (0) and non-full boxes (k*k)
        return len(np.where((S > 0) & (S < k*k))[0])

    # Transform Z into a binary array
    Z = (Z < threshold)

    # Minimal dimension of image
    p = min(Z.shape)

    # Greatest power of 2 less than or equal to p
    n = 2**np.floor(np.log(p)/np.log(2))

    # Extract the exponent
    n = int(np.log(n)/np.log(2))

    # Build successive box sizes (from 2**n down to 2**1)
    sizes = 2**np.arange(n, 1, -1)

    # Actual box counting with decreasing size
    counts = []
    for size in sizes:
        counts.append(boxcount(Z, size))

    # Fit the successive log(sizes) with log(counts)
    coeffs = np.polyfit(np.log(sizes), np.log(counts), 1)
    return -coeffs[0]
I = rgb2gray(scipy.misc.imread("24.jpg"))
print("Minkowski–Bouligand dimension (computed): ", fractal_dimension(I))
From the literature I've read, it has been suggested that natural scenes (e.g. 24.jpg) are more fractal in nature, and thus should have a larger fractal dimension value.
The results it gives me are in the opposite direction from what the literature would suggest:
10.jpg: 1.259
24.jpg: 1.073
I would expect the fractal dimension for the natural image to be larger than for the urban one.
Am I calculating the value incorrectly in my code? Or am I just interpreting the results incorrectly?
With the fractal dimension of something physical, the dimension might converge at different stages to different values. For example, a very thin line (but of finite width) would initially seem one-dimensional, then eventually two-dimensional as its width becomes comparable to the size of the boxes used.
Let's look at the dimensions that you have produced:
What do you see? Well, the linear fits are not so good, and the dimension is heading towards a value of two.
To diagnose, let's take a look at the grey-scale images produced, with the threshold that you have (that is, 0.9):
The nature picture has almost become an ink blob. The dimension would go to a value of 2 very soon, as the graphs told us. That is because we have pretty much lost the image.
And now with a threshold of 50?
With new linear fits that are much better, the dimensions are 1.6 and 1.8 for urban and nature respectively. Keep in mind that the urban picture actually has a lot of structure to it, in particular on the textured walls.
In future, good threshold values would be ones closer to the mean of the grey-scale image; that way your image does not turn into a blob of ink!
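As a rough sketch of that suggestion, reusing the question's functions (and its scipy.misc.imread call, which is deprecated in recent SciPy; imageio.imread is the usual replacement):

I = rgb2gray(scipy.misc.imread("24.jpg"))
threshold = I.mean()  # pick the threshold from the image itself instead of a fixed 0.9
print("Minkowski-Bouligand dimension (computed): ", fractal_dimension(I, threshold=threshold))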
A good textbook on this is "Fractals Everywhere" by Michael F. Barnsley.
I'm trying to compare an image to a list of other images and return a selection of images from this list (like Google image search) with up to 70% similarity.
I got this code from this post and changed it for my context:
import cv2
import numpy as np

# Load the images
img = cv2.imread(MEDIA_ROOT + "/uploads/imagerecognize/armchair.jpg")

# Convert them to grayscale
imgg = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# SURF extraction
surf = cv2.FeatureDetector_create("SURF")
surfDescriptorExtractor = cv2.DescriptorExtractor_create("SURF")
kp = surf.detect(imgg)
kp, descritors = surfDescriptorExtractor.compute(imgg,kp)
# Setting up samples and responses for kNN
samples = np.array(descritors)
responses = np.arange(len(kp),dtype = np.float32)
# kNN training
knn = cv2.KNearest()
knn.train(samples,responses)
modelImages = [MEDIA_ROOT + "/uploads/imagerecognize/1.jpg", MEDIA_ROOT + "/uploads/imagerecognize/2.jpg", MEDIA_ROOT + "/uploads/imagerecognize/3.jpg"]
for modelImage in modelImages:
    # Now loading a template image and searching for similar keypoints
    template = cv2.imread(modelImage)
    templateg = cv2.cvtColor(template, cv2.COLOR_BGR2GRAY)
    keys = surf.detect(templateg)
    keys, desc = surfDescriptorExtractor.compute(templateg, keys)

    for h, des in enumerate(desc):
        des = np.array(des, np.float32).reshape((1, 128))
        retval, results, neigh_resp, dists = knn.find_nearest(des, 1)
        res, dist = int(results[0][0]), dists[0][0]

        if dist < 0.1:  # draw matched keypoints in red color
            color = (0, 0, 255)
        else:  # draw unmatched in blue color
            #print dist
            color = (255, 0, 0)

        # Draw matched key points on original image
        x, y = kp[res].pt
        center = (int(x), int(y))
        cv2.circle(img, center, 2, color, -1)

        # Draw matched key points on template image
        x, y = keys[h].pt
        center = (int(x), int(y))
        cv2.circle(template, center, 2, color, -1)

    cv2.imshow('img', img)
    cv2.imshow('tm', template)
    cv2.waitKey(0)
    cv2.destroyAllWindows()
My question is, how can I compare the image with the list of images and get only the similar images? Is there any method to do this?
I suggest you take a look at the earth mover's distance (EMD) between the images.
This metric gives a feeling for how hard it is to transform a normalized grayscale image into another, but it can be generalized to color images. A very good analysis of this method can be found in the following paper:
robotics.stanford.edu/~rubner/papers/rubnerIjcv00.pdf
It can be done both on the whole image and on the histogram (which is much faster than the whole-image method). I'm not sure which method allows a full-image comparison, but for histogram comparison you can use the cv.CalcEMD2 function.
The only problem is that this method does not define a percentage of similarity, but a distance that you can filter on.
I know that this is not a full working algorithm, but it is still a basis for one, so I hope it helps.
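cv.CalcEMD2 is the old OpenCV 1.x API; in current OpenCV the equivalent is cv2.EMD, which takes "signatures" (rows of weight plus coordinates). Here is a minimal sketch for two grayscale histograms; the 256-bin histogram, the normalisation and the distance type are assumptions for illustration, not part of the original answer:

import cv2
import numpy as np

def emd_between_histograms(img1_gray, img2_gray):
    """Earth mover's distance between two normalised 256-bin grayscale histograms."""
    sigs = []
    for img in (img1_gray, img2_gray):
        hist = cv2.calcHist([img], [0], None, [256], [0, 256]).ravel()
        hist /= hist.sum()  # normalise so both histograms have unit mass
        # A signature row is (weight, coordinate); drop empty bins to keep the solver happy
        sig = np.array([[w, float(i)] for i, w in enumerate(hist) if w > 0], dtype=np.float32)
        sigs.append(sig)
    dist, _, _ = cv2.EMD(sigs[0], sigs[1], cv2.DIST_L2)
    return dist

# img1 = cv2.imread('a.jpg', cv2.IMREAD_GRAYSCALE)
# img2 = cv2.imread('b.jpg', cv2.IMREAD_GRAYSCALE)
# print(emd_between_histograms(img1, img2))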
EDIT:
Here is a sketch of how the EMD works in principle. The main idea is to have two normalized matrices (two grayscale images divided by their sums), and to define a flux matrix that describes how you move the gray from one pixel to another to transform the first image into the second (it can be defined even for non-normalized images, but that is more difficult).
In mathematical terms the flow matrix is actually a four-dimensional tensor that gives the flow from the point (i,j) of the old image to the point (k,l) of the new one, but if you flatten your images you can transform it into a normal matrix, just a little harder to read.
This flow matrix has three constraints: each term should be positive, the sum of each row should equal the value of the corresponding source pixel, and the sum of each column should equal the value of the corresponding destination pixel (this is the convention used in the code below).
Given this, you have to minimize the cost of the transformation, given by the sum over all pairs of the flow from (i,j) to (k,l) multiplied by the distance between (i,j) and (k,l).
It looks a little complicated in words, so here is the test code. The logic is correct; I'm not sure why the scipy solver complains about it (maybe you should look at OpenOpt or something similar):
import numpy as np
from numpy.random import rand
from scipy.optimize import minimize

# original data, two 2x2 images, normalized
x = rand(2, 2)
x /= np.sum(x)
y = rand(2, 2)
y /= np.sum(y)

# initial guess of the flux matrix:
# just the product of the image x as row for the image y as column.
# This is a working flux, but it is not an optimal one
F = (y.flatten()*x.flatten().reshape((y.size, -1))).flatten()

# distance matrix, based on euclidean distance
row_x, col_x = np.meshgrid(range(x.shape[0]), range(x.shape[1]))
row_y, col_y = np.meshgrid(range(y.shape[0]), range(y.shape[1]))
rows = ((row_x.flatten().reshape((row_x.size, -1)) - row_y.flatten().reshape((-1, row_x.size)))**2)
cols = ((col_x.flatten().reshape((row_x.size, -1)) - col_y.flatten().reshape((-1, row_x.size)))**2)
D = np.sqrt(rows + cols)

D = D.flatten()
x = x.flatten()
y = y.flatten()

# COST = sum(F*D)
# cost function and its (constant) gradient
fun = lambda F: np.sum(F*D)
jac = lambda F: D

# array of constraints
# the constraint of summing to one is implicit given the later constraints
cons = []
# each row and column should sum to the value of the start and destination array
# (note the i=i default argument: without it every lambda would see only the last value of i)
cons += [{'type': 'eq', 'fun': lambda F, i=i: np.sum(F.reshape((x.size, y.size))[i, :]) - x[i]} for i in range(x.size)]
cons += [{'type': 'eq', 'fun': lambda F, i=i: np.sum(F.reshape((x.size, y.size))[:, i]) - y[i]} for i in range(y.size)]
# the values of F should be positive (one (min, max) pair per element)
bnds = [(0, None)] * F.size

res = minimize(fun=fun, x0=F, method='SLSQP', jac=jac, bounds=bnds, constraints=cons)
the variable res contains the result of the minimization...but as I said I'm not sure why it complains about a singular matrix.
The only problem with this algorithm is that it is not very fast, so it's not possible to run it on demand; you have to run it patiently when building the dataset and store the results somewhere.
You are embarking on a massive problem, referred to as "content based image retrieval", or CBIR. It's a massive and active field. There are no finished algorithms or standard approaches yet, although there are a lot of techniques all with varying levels of success.
Even Google image search doesn't do this (yet) - they do text-based image search - e.g., search for text in a page that's like the text you searched for. (And I'm sure they're working on using CBIR; it's the holy grail for a lot of image processing researchers)
If you have a tight deadline or need to get this done and working soon... yikes.
Here's a ton of papers on the topic:
http://scholar.google.com/scholar?q=content+based+image+retrieval
Generally you will need to do a few things:
Extract features (either at local interest points, or globally, or somehow: SIFT, SURF, histograms, etc.); see the sketch after this list for one way to start.
Cluster / build a model of image distributions
This can involve feature descriptors, image gists, multiple instance learning, etc.
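As a minimal, hedged sketch of the feature-extraction step only, using ORB (a patent-free alternative to SIFT/SURF that ships with stock OpenCV); the file names and the match-distance threshold are placeholders:

import cv2

orb = cv2.ORB_create(nfeatures=500)

def extract_features(path):
    """Return keypoints and binary descriptors for one image."""
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    keypoints, descriptors = orb.detectAndCompute(gray, None)
    return keypoints, descriptors

kp1, des1 = extract_features('query.jpg')
kp2, des2 = extract_features('candidate.jpg')

# Crude similarity signal: count good Hamming-distance matches between descriptor sets
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = matcher.match(des1, des2)
good = [m for m in matches if m.distance < 40]   # threshold is an arbitrary assumption
print(len(good), "reasonably close matches")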
I wrote a program to do something very similar maybe 2 years ago using Python/Cython. Later I rewrote it to Go to get better performance. The base idea comes from findimagedupes IIRC.
It basically computes a "fingerprint" for each image, and then compares these fingerprints to match similar images.
The fingerprint is generated by resizing the image to 160x160, converting it to grayscale, adding some blur, normalizing it, then resizing it to 16x16 monochrome. At the end you have 256 bits of output: that's your fingerprint. This is very easy to do using convert:
convert path[0] -sample 160x160! -modulate 100,0 -blur 3x99 \
-normalize -equalize -sample 16x16 -threshold 50% -monochrome mono:-
(The [0] in path[0] is used to only extract the first frame of animated GIFs; if you're not interested in such images you can just remove it.)
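If you prefer to stay in Python, here is a rough Pillow-based approximation of that pipeline; the blur radius and the fixed 128 threshold only approximate the convert flags, the file names are placeholders, and the fingerprint is kept as one Python int rather than eight 32-bit words:

from PIL import Image, ImageFilter, ImageOps

def fingerprint(path):
    """Return a 256-bit fingerprint as a single Python int."""
    im = Image.open(path).convert('L')             # grayscale
    im = im.resize((160, 160))
    im = im.filter(ImageFilter.GaussianBlur(3))    # rough stand-in for -blur 3x99
    im = ImageOps.autocontrast(im)                 # rough stand-in for -normalize
    im = ImageOps.equalize(im)                     # -equalize
    im = im.resize((16, 16))
    bits = 0
    for px in im.getdata():                        # threshold at 50% and pack into 256 bits
        bits = (bits << 1) | (1 if px >= 128 else 0)
    return bits

fp1 = fingerprint('a.png')
fp2 = fingerprint('b.png')
score = bin(fp1 ^ fp2).count('1')                  # number of differing bits, 0-256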
After applying this to 2 images, you will have 2 (256-bit) fingerprints, fp1 and fp2.
The similarity score of these 2 images is then computed by XORing these 2 values and counting the bits set to 1. To do this bit counting, you can use the bitsoncount() function from this answer:
# fp1 and fp2 are stored as lists of 8 (32-bit) integers
# bitsoncount() is the population count from the linked answer; bin(v).count("1") is one simple way to implement it
score = 0
for n in range(8):
    score += bitsoncount(fp1[n] ^ fp2[n])
score will be a number between 0 and 256 indicating how similar your images are. In my application I divide it by 2.56 (normalize to 0-100) and I've found that images with a normalized score of 20 or less are often identical.
If you want to implement this method and use it to compare lots of images, I strongly suggest you use Cython (or just plain C) as much as possible: XORing and bit counting is very slow with pure Python integers.
I'm really sorry but I can't find my Python code anymore. Right now I only have a Go version, but I'm afraid I can't post it here (tightly integrated in some other code, and probably a little ugly as it was my first serious program in Go...).
There's also a very good "find by similarity" function in GQView/Geeqie; its source is here.
For a simpler implementation of Earth Mover's Distance (aka Wasserstein Distance) in Python, you could use Scipy:
from keras.preprocessing.image import load_img, img_to_array
from scipy.stats import wasserstein_distance
import numpy as np
def get_histogram(img):
    '''
    Get the histogram of an image. For an 8-bit, grayscale image, the
    histogram will be a 256-element vector in which the nth value indicates
    the fraction of the pixels in the image with the given darkness level.
    The histogram's values sum to 1.
    '''
    h, w = img.shape[:2]
    hist = [0.0] * 256
    for i in range(h):
        for j in range(w):
            hist[int(img[i, j])] += 1  # img_to_array yields floats, so cast to an integer bin index
    return np.array(hist) / (h * w)
a = img_to_array(load_img('a.jpg', grayscale=True))
b = img_to_array(load_img('b.jpg', grayscale=True))
a_hist = get_histogram(a)
b_hist = get_histogram(b)
dist = wasserstein_distance(a_hist, b_hist)
print(dist)