Calculate and use inverse of OpenCV distortion parameters - python

So basically I want to find out what the inverse distortion parameters would be for a calibration I do, as shown here.
NOTE: I know that we can perform undistortion and then use remapping to do what I have done below, but my goal is to be able to find out the actual inverted distortion parameters and use them to distort other images, not to just be able to revert what cv2.undistort() does
Overview:
I have tried passing in the negation of the distortion parameters:
# _, mat, distortion, _, _ = cv2.calibrateCamera(...)
# undistorted_image = cv2.undistort(with distortion)
# redistorted_image = cv2.undistort(with np.negative(distortion))
In theory, I was thinking that if the redistorted_image is similar to the original image, then the np.negative(distortion) parameters are what I am looking for, but it turned out to be false.
Actual method I use:
def save_undistorted_images(image, matrix, distortion):
    test_image_su = cv.imread(image)
    height, width = test_image_su.shape[:2]
    new_camera_mtx, roi = cv.getOptimalNewCameraMatrix(matrix, distortion, (width, height), 1, (width, height))
    distortion_u = np.negative(distortion)
    # unsure if this line helps
    new_camera_mtx_inv, roi = cv.getOptimalNewCameraMatrix(matrix, distortion_u, (width, height), 1, (width, height))
    # undistort the image
    undistorted_image = cv.undistort(test_image_su, matrix, distortion, None, new_camera_mtx)
    cv.imwrite('undistorted_frame.png', undistorted_image)
    # redistort, trying to get back something like the original image (test_image_su)
    distorted_image = cv.undistort(undistorted_image, matrix, distortion_u, None, new_camera_mtx_inv)
    cv.imwrite('redistorted_frame.png', distorted_image)
The results:
(left a: original) (right b: undistorted)
(left c: distorted using np.negative(distortion)) (right d: undistorted image redistorted using np.negative(distortion))
The image d here is basically c performed on b, which I expected would be similar to a
Why is b here overpowering the effect of c?
Another way of calculating the inverse that I tried:
The following is my Python implementation of this paper:
distortion_u = distortion.copy()  # copy so the original calibration parameters are not overwritten
k1 = distortion_u[0][0]
k2 = distortion_u[0][1]
k3 = distortion_u[0][4]
b1 = -1 * k1
b2 = 3 * (k1 * k1) - k2
b3 = (8 * k1 * k2) + (-1 * (12 * (k1 * k1 * k1))) - k3
# radial:
distortion_u[0][0] = b1
distortion_u[0][1] = b2
distortion_u[0][4] = b3
# tangential:
#distortion_u[0][2] = -1 * distortion_u[0][2]
#distortion_u[0][3] = -1 * distortion_u[0][3]
The results of applying distortion to the undistorted image using the above parameters are also not good; they look very similar to the results above.
So, this brings us to:
Why is the effect of normal distortion always overpowering np.negative(distortion) or anything else?
Does all distortion work this way? (negating the values does not produce the opposite effect)
How to get the actually opposite distortion parameters?

Afraid you are doing it wrong. The opencv distortion model you have calibrated computes undistorted and normalized image coordinates from distorted ones. It is a nonlinear model, so inverting it involves solving a system of nonlinear (polynomial) equations.
A closed form (parametric) solution exists AFAIK only for the case of single-parameter pure radial distortion, i.e. when the only nonzero distortion parameter is k1, the coefficient of r^2. In this case the model inversion equation reduces to a cubic equation in r, and you can then express the inverse model using Cardano's formula for the solution of the cubic.
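To make the single-k1 case concrete, here is a minimal numerical sketch (mine, not OpenCV code): with only k1 nonzero, the forward model in normalized coordinates is r_d = r_u * (1 + k1 * r_u**2), so recovering the undistorted radius r_u from a distorted radius r_d amounts to finding the real root of a cubic, which np.roots can do directly instead of writing out Cardano's formula.

import numpy as np

def undistorted_radius(r_d, k1):
    # solve k1*r_u**3 + r_u - r_d = 0 for the undistorted radius r_u
    roots = np.roots([k1, 0.0, 1.0, -r_d])
    real = roots[np.isreal(roots)].real
    # keep the real root closest to the distorted radius
    return real[np.argmin(np.abs(real - r_d))]

# example: k1 = -0.2, a point at distorted radius 0.5 in normalized coordinates
print(undistorted_radius(0.5, -0.2))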
In all other cases one inverts the model numerically, using various algorithms for solving the nonlinear system of equations. OpenCV uses an iterative "false-position" method.
Since you want to use the inverse model to un-distort a set of images (which is the normal use case), you should use initUndistortRectifyMap to calculate the undistortion solution for the image once and for all, and then pass it for every image to remap to actually undistort the images.
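As a hedged illustration of that workflow (test_image is a placeholder, and mat/distortion are the outputs of the cv2.calibrateCamera call from the question), the maps are computed once and every frame is then a cheap remap:

# compute the undistortion maps once
h, w = test_image.shape[:2]
new_mtx, roi = cv2.getOptimalNewCameraMatrix(mat, distortion, (w, h), 1, (w, h))
map_x, map_y = cv2.initUndistortRectifyMap(mat, distortion, None, new_mtx, (w, h), cv2.CV_32FC1)

# reuse map_x/map_y for every image from this camera
undistorted = cv2.remap(test_image, map_x, map_y, interpolation=cv2.INTER_LINEAR)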
If you really need a parametric model for the inverse model, my advice would be to look into approximating the maps returned by initUndistortRectifyMap with a pair of higher order polynomials, or thin-plate splines.

Related

Create an Undistorted Top-Down View of Camera Image

I have a fixed camera mounted on a wall viewing a rectangular lawn at an angle. My goal is to obtain an undistorted, top-down view of the lawn.
I have an image from the camera as a python numpy array which looks like this:
raw camera image
I use an inverse matrix with skimage.transform.warp to correct the image to a top down view:
top down distorted
This works perfectly, however the camera lens introduces barrel distortion.
Separately, I can correct the distortion with a generated lookup table using skimage.transform.warp_coords and passing a simple undistort callable function based on the algorithm described here.
The image is then generated using scipy.ndimage.map_coordinates:
undistorted camera view
These 2 processes work individually, but how do I combine them to create an undistorted top-down view, without creating an intermediate image?
I could run each point in the lookup table through the matrix to create a new table, but the table is massive and memory is tight (Raspberry Pi Zero).
I would like to define the undistortion as a matrix and just combine the 2 matrices, but as I understand it, the projective homography matrix is linear but undistortion is non-linear, so this can't be done. I can't use OpenCV due to resource constraints, and the calibration procedure involving multiple chessboard images is impractical. Currently, I calibrate by taking 4 lawn corner points and generate the matrix from them, which works well.
I would have anticipated that this is a common problem in Computer Vision but can't find any suitable solutions.
The barrel distortion is nonlinear, but it is also smooth. This means it can be well approximated by a collection of piecewise linear approximations.
So you do not need a large, per-pixel look-up table of un-distortion displacements. Rather, you can subsample it (or just scale it down), and use bilinear interpolation for in-between pixels.
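A rough sketch of that idea (all names here are made up; full_map_x/full_map_y would be the dense per-pixel source coordinates, and image the frame to warp): keep only every 16th sample and rebuild the in-between values with bilinear interpolation, e.g. via scipy.ndimage.zoom with order=1. In practice you would interpolate tile by tile or row by row so the dense map never has to exist all at once.

import numpy as np
from scipy import ndimage

step = 16  # keep one sample every 16 pixels in each direction
coarse_map_x = full_map_x[::step, ::step]
coarse_map_y = full_map_y[::step, ::step]

# later: approximately rebuild the dense maps by bilinear interpolation
rows, cols = full_map_x.shape
dense_x = ndimage.zoom(coarse_map_x, step, order=1)[:rows, :cols]
dense_y = ndimage.zoom(coarse_map_y, step, order=1)[:rows, :cols]

warped = ndimage.map_coordinates(image, [dense_y, dense_x], order=1)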
I have found a solution that appears to work, by creating separate functions for undistort and transformation, then chaining them together.
The skimage source code here has the _apply_mat method for generating a mapping from a matrix. I based my unwarp function on that:
def unwarp(coords, matrix):
    coords = np.array(coords, copy=False, ndmin=2)
    x, y = np.transpose(coords)
    src = np.vstack((x, y, np.ones_like(x)))
    dst = src.T @ matrix.T
    # below, we will divide by the last dimension of the homogeneous
    # coordinate matrix. In order to avoid division by zero,
    # we replace exact zeros in this column with a very small number.
    dst[dst[:, 2] == 0, 2] = np.finfo(float).eps
    # rescale to homogeneous coordinates
    dst[:, :2] /= dst[:, 2:3]
    return dst[:, :2]
I created a similar function for undistorting based on Tanner Hellands algorithm:
def undistort(coords, cols, rows, correction_radius, zoom):
    half_width = cols / 2
    half_height = rows / 2
    new_x = coords[:, 0] - half_width
    new_y = coords[:, 1] - half_height
    distance = np.hypot(new_x, new_y)
    r = distance / correction_radius
    theta = np.ones_like(r)
    # only process non-zero values
    np.divide(np.arctan(r), r, out=theta, where=r!=0)
    source_x = half_width + theta * new_x * zoom
    source_y = half_height + theta * new_y * zoom
    result = np.column_stack([source_x, source_y])
    return result
The only tricky bit here is the divide where we need to prevent division by zero.
Once we have each lookup table we can chain them together:
def undistort_unwarp(coords):
    undistorted = undistort(coords)
    both = unwarp(undistorted)
    return both
Note that these are the callable functions passed to skimage.transform.warp_coords:
mymap = tf.warp_coords(undistort_unwarp, shape=(rows, cols), dtype=np.int16)
The map can then be passed to the skimage.transform.warp function.
Francesco's answer was helpful; however, I needed the full pixel resolution for the transformation, so I used it for the undistort as well, and looked at other ways to reduce the memory consumption.
Each map consumes
rows * cols * bytes-per-item * 2 (x and y)
bytes. The default datatype is float64, which requires 8 bytes-per-item, and the documentation suggests sane choices would be the default or float32 at 4 bytes-per-item. I was able to reduce this to 2 bytes-per-item using int16 with no visible ill effects, but I suspect the spline interpolation is not being used to the full (at all?).
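As a rough worked example (the frame size here is an assumption, not stated in the post): for a 1280x720 map, float64 needs 1280 * 720 * 8 * 2 ≈ 14.7 MB, float32 ≈ 7.4 MB, and int16 ≈ 3.7 MB per map, which is a meaningful difference on a Pi Zero with 512 MB of RAM.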
The map is the same for each channel of a colour RGB image. However, when I called warp_coords with shape=(rows, cols, 3) I got 3 duplicate maps, so I created a function to handle colour images by processing each channel separately:
def warp_colour(img_arr, coord_map):
    if img_arr.ndim == 3:
        # colour
        rows, cols, _chans = img_arr.shape
        r_arr = tf.warp(img_arr[:, :, 0], inverse_map=coord_map, output_shape=(rows, cols))
        g_arr = tf.warp(img_arr[:, :, 1], inverse_map=coord_map, output_shape=(rows, cols))
        b_arr = tf.warp(img_arr[:, :, 2], inverse_map=coord_map, output_shape=(rows, cols))
        rgb_arr = np.dstack([r_arr, g_arr, b_arr])
    else:
        # grayscale
        rows, cols = img_arr.shape
        rgb_arr = tf.warp(img_arr, inverse_map=coord_map, output_shape=(rows, cols))
    return rgb_arr
One issue with skimage.transform.warp_coords is that it does not have the map_args dictionary parameter that skimage.transform.warp has. I had to call my unwarp and undistort functions through an intermediate function to add the parameters.
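One way to write that intermediate step (a sketch; the values of cols, rows, correction_radius, zoom and matrix are assumed to come from your own setup) is to bake the extra arguments in with functools.partial, so the chained callable from earlier only ever takes coords:

from functools import partial

undistort_fixed = partial(undistort, cols=cols, rows=rows,
                          correction_radius=correction_radius, zoom=zoom)
unwarp_fixed = partial(unwarp, matrix=matrix)

def undistort_unwarp(coords):
    return unwarp_fixed(undistort_fixed(coords))

mymap = tf.warp_coords(undistort_unwarp, shape=(rows, cols), dtype=np.int16)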

How can I rotate a 2d image using a target image, landmark coordinates, the least squares approach, and a rotation matrix?

I have two 2d images, one is the source image and the other is a target image; I need to rotate the source image to match the target image using python (scikit & numpy). I have 3 landmark coordinates for each image, as follows:
image1_points = [(12,16),(7,4),(25,20)]
image2_points = [(15,22),(1,22),(25,10)]
I believe the following steps are what's needed:
Create rotation matrix using least squares approach using the 3 landmark coordinates
Use the rotation matrix to get theta
Convert theta to degrees (for the angle)
Use the apply_angle method with the angle to rotate the image
I've been trying to use these points and the least squares approach to compute a linear transformation matrix that transforms points from the source to the target image.
I know I need to create a rotation matrix, but having never taken algebra I'm a bit lost. I've done lots of reading, and tried using scipy's built-in procrustes to do an affine transformation below (which may be all wrong).
m1, m2, d = scipy.spatial.procrustes(target_points, source_points)
a = np.dot(m1.T, m2, out=None) / norm(m1)**2
#separate x and y for the sake of convenience
ref_x = m2[::2]
ref_y = m2[1::2]
x = m1[::2]
y = m1[1::2]
b = np.sum(x*ref_y - ref_x*y) / norm(m1)**2
scale = np.sqrt(a**2+b**2)
theta = atan(b / max(a.all(), 10**-10)) #avoid dividing by 0
degrees = cos(radians(theta))
apply_angle(source_img, degrees)
However, this is not giving me the result I would expect. It's giving me an angle of around 1 degree, where I would expect around 72 degrees. I believe this angle is what should be passed as the angle parameter to rotate the image.
Any help would be hugely appreciated. Thank you!
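For reference (an editorial sketch, not code from the post): the least-squares rotation in steps 1-3 above is usually computed with the Kabsch/Procrustes approach on the centered point sets, and the angle then falls out of the 2x2 rotation matrix:

import numpy as np

src = np.array([(12, 16), (7, 4), (25, 20)], dtype=float)   # image1_points
dst = np.array([(15, 22), (1, 22), (25, 10)], dtype=float)  # image2_points

# center both point sets
src_c = src - src.mean(axis=0)
dst_c = dst - dst.mean(axis=0)

# least-squares rotation (Kabsch): SVD of the cross-covariance matrix
u, _, vt = np.linalg.svd(src_c.T @ dst_c)
d = np.sign(np.linalg.det(vt.T @ u.T))        # guard against a reflection
rot = vt.T @ np.diag([1.0, d]) @ u.T

theta_degrees = np.degrees(np.arctan2(rot[1, 0], rot[0, 0]))
print(theta_degrees)   # angle, in degrees, to pass to the rotation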

Manually writing code for warpAffine in Python

I want to implement affine transformation by not using library functions.
I have an image named "transformed" and I want to apply the inverse transformation to obtain the "img_org" image. Right now, I am using my own basic GetBilinearPixel function to set the intensity values, but the image is not transforming properly. This is what I came up with:
This is image("transformed.png"):
This is image("img_org.png"):
But My goal is to produce this image:
You can see the transformation matrix here:
pts1 = np.float32( [[693,349] , [605,331] , [445,59]] )
pts2 = np.float32 ( [[1379,895] , [1213,970] ,[684,428]] )
Mat = cv2.getAffineTransform(pts2,pts1)
B=Mat
code:
img_org = np.zeros(shape=(780,1050))
img_size = np.zeros(shape=(780,1050))

def GetBilinearPixel(imArr, posX, posY):
    return imArr[posX][posY]

for i in range(1, img.shape[0]-1):
    for j in range(1, img.shape[1]-1):
        pos = np.array([[i],[j],[1]], np.float32)
        #print pos
        pos = np.matmul(B, pos)
        r = int(pos[0][0])
        c = int(pos[1][0])
        #print r,c
        if(c<=1024 and r<=768 and c>=0 and r>=0):
            img_size[r][c] = img_size[r][c]+1
            img_org[r][c] += GetBilinearPixel(img, i, j)

for i in range(0, img_org.shape[0]):
    for j in range(0, img_org.shape[1]):
        if(img_size[i][j]>0):
            img_org[i][j] = img_org[i][j]/img_size[i][j]
Is my logic wrong? I know that I have applied a very inefficient algorithm.
Is there any insight that I am missing?
Or can you give me any other algorithm that will work fine?
(Request): I don't want to use the warpAffine function.
So I vectorized the code and this method works---I can't find the exact issue with your implementation, but maybe this will shed some light (plus the speed is way faster).
The setup to vectorize is to create a linear (homogeneous) array containing every point in the image. We want an array that looks like
x0 x1 ... xN x0 x1 ... xN ..... x0 x1 ... xN
y0 y0 ... y0 y1 y1 ... y1 ..... yM yM ... yM
1 1 ... 1 1 1 ... 1 ..... 1 1 ... 1
So that every point (xi, yi, 1) is included. Then transforming is just a single matrix multiplication with your transformation matrix and this array.
To simplify matters (partially because your image naming conventions confused me), I'll say the original starting image is the "destination" or dst, because we want to transform back to the "source" or src image. Bearing that in mind, creating this linear homogeneous array could look something like this:
dst = cv2.imread('img.jpg', 0)
h, w = dst.shape[:2]
dst_y, dst_x = np.indices((h, w)) # similar to meshgrid/mgrid
dst_lin_homg_pts = np.stack((dst_x.ravel(), dst_y.ravel(), np.ones(dst_y.size)))
Then, to transform the points, just create the transformation matrix and multiply. I'll round the transformed pixel locations because I'm using them as an index and not bothering with interpolation:
src_pts = np.float32([[693, 349], [605, 331], [445, 59]])
dst_pts = np.float32([[1379, 895], [1213, 970], [684, 428]])
transf = cv2.getAffineTransform(dst_pts, src_pts)
src_lin_pts = np.round(transf.dot(dst_lin_homg_pts)).astype(int)
Now this transformation will send some pixels to negative indices, and if we index with those, it'll wrap around the image---probably not what we want to do. Of course in the OpenCV implementation, it just cuts those pixels off completely. But we can just shift all the transformed pixels so that all of the locations are positive and we don't cut off any (you can of course do whatever you want in this regard):
min_x, min_y = np.amin(src_lin_pts, axis=1)
src_lin_pts -= np.array([[min_x], [min_y]])
Then we'll need to create the source image src which the transform maps into. I'll create it with a gray background so we can see the extent of the black from the dst image.
trans_max_x, trans_max_y = np.amax(src_lin_pts, axis=1)
src = np.ones((trans_max_y+1, trans_max_x+1), dtype=np.uint8)*127
Now all we have to do is place some corresponding pixels from the destination image into the source image. Since I didn't cut off any of the pixels and there's the same number of pixels in both linear points array, I can just assign the transformed pixels the color they had in the original image.
src[src_lin_pts[1], src_lin_pts[0]] = dst.ravel()
Now, of course, this isn't interpolating on the image. But there are no built-ins in OpenCV for that interpolation (there are backend C functions used by other methods, but none you can access in Python, AFAIK). But you have the important parts: where the destination image gets mapped to, and the original image, so you can use any number of libraries to interpolate onto that grid. Or just implement a linear interpolation yourself, as it's not too difficult. You'll probably want to un-round the warped pixel locations first, of course.
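As one option among many (a sketch, not part of the answer above; src_lin_pts_float is just a name for the unrounded transform results), scipy.interpolate.griddata can interpolate the destination pixel values onto the regular grid of the source image:

from scipy.interpolate import griddata

# unrounded transformed locations, shape (2, N): x in row 0, y in row 1
src_lin_pts_float = transf.dot(dst_lin_homg_pts)
src_lin_pts_float -= src_lin_pts_float.min(axis=1, keepdims=True)

grid_y, grid_x = np.mgrid[0:src.shape[0], 0:src.shape[1]]
src_interp = griddata(points=src_lin_pts_float.T,        # (N, 2) as (x, y)
                      values=dst.ravel().astype(float),  # pixel values of the destination image
                      xi=(grid_x, grid_y),
                      method='linear',
                      fill_value=127)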
cv2.imshow('src', src)
cv2.waitKey()
Edit: Also this same method will work for warpPerspective too, although your resulting matrix multiplication will give a three-rowed (homogeneous) vector, and you'll need to divide the first two rows by the third row to set them back into Cartesian world. Other than that, everything else stays the same.
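For example, the extra step for the perspective case might look like this (transf_persp here would come from cv2.getPerspectiveTransform; that call is an assumption, not part of the answer above):

# transf_persp: 3x3 homography, dst_lin_homg_pts: 3xN homogeneous points as above
src_homg = transf_persp.dot(dst_lin_homg_pts)
src_homg /= src_homg[2]                      # divide the x and y rows by the homogeneous row
src_lin_pts = np.round(src_homg[:2]).astype(int)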

Disparity maps with Normalized Cross Correlation using Python

For understanding purposes, I want to implement a stereo algorithm in Python (and NumPy) that computes a disparity map. As image data, I used the Tsukuba image dataset from Middlebury*. For simplicity, I chose normalised cross correlation (NCC)** as the similarity measure to find corresponding pixels. I will assume scanline agreement.
Here my implemented NCC:
left_mu = np.mean(left_patch)
right_mu = np.mean(right_patch)
left_sigma = np.sqrt(np.mean((left_patch - left_mu)**2))
right_sigma = np.sqrt(np.mean((right_patch - right_mu)**2))
patch = left_patch * right_patch
mu = left_mu * right_mu
num = np.mean(patch) - mu
denom = left_sigma * right_sigma
ncc = num/denom
where left_patch and right_patch are some 3x3 patches from the original images. This outputs values between -1 and 1, which describe the similarity between two pixels.
The idea is now to find the best-fit pixel. The disparity between the two pixels should now be stored in a new image - the disparity map.
Since I assumed scanline agreement I only have to search in one image row. For each pixel in the row, I want to take the index of the value that maximises the NCC value and store it as the disparity value.
My problem is that my results are rather odd. My disparity values are around 180-200 pixels for an image that is 384x288 pixels. Here is the resulting image.
Can you see the mistake in my thinking?
(*) vision.middlebury.edu/stereo/data/scenes2001/data/anigif/orig/tsukuba_o_a.gif
(**) A two-stage correlation method for stereoscopic depth estimation. - N. Einecke and J. Eggert
It seems that you didn't compute the numerator properly. It should be:
num = np.mean( (left_patch - left_mu) * (right_patch - right_mu) )
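Building on that fix, a brute-force scanline search could look like the sketch below (purely illustrative and slow; the patch size, maximum disparity, and helper names are my assumptions, not code from the question):

import numpy as np

def ncc(left_patch, right_patch):
    num = np.mean((left_patch - left_patch.mean()) * (right_patch - right_patch.mean()))
    denom = left_patch.std() * right_patch.std()
    return num / denom if denom > 0 else 0.0

def disparity_map(left, right, half=1, max_disp=16):
    # left, right: rectified grayscale images as 2-D numpy arrays
    h, w = left.shape
    disp = np.zeros((h, w))
    for y in range(half, h - half):
        for x in range(half, w - half):
            left_patch = left[y-half:y+half+1, x-half:x+half+1]
            scores = []
            for d in range(0, min(max_disp, x - half) + 1):
                right_patch = right[y-half:y+half+1, x-d-half:x-d+half+1]
                scores.append(ncc(left_patch, right_patch))
            disp[y, x] = np.argmax(scores)   # disparity = offset with the best NCC score
    return disp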

Compare similarity of images using OpenCV with Python

I'm trying to compare an image to a list of other images and return a selection of images (like Google image search) from this list with up to 70% similarity.
I got this code from this post and changed it for my context:
# Load the images
img =cv2.imread(MEDIA_ROOT + "/uploads/imagerecognize/armchair.jpg")
# Convert them to grayscale
imgg =cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
# SURF extraction
surf = cv2.FeatureDetector_create("SURF")
surfDescriptorExtractor = cv2.DescriptorExtractor_create("SURF")
kp = surf.detect(imgg)
kp, descritors = surfDescriptorExtractor.compute(imgg,kp)
# Setting up samples and responses for kNN
samples = np.array(descritors)
responses = np.arange(len(kp),dtype = np.float32)
# kNN training
knn = cv2.KNearest()
knn.train(samples,responses)
modelImages = [MEDIA_ROOT + "/uploads/imagerecognize/1.jpg", MEDIA_ROOT + "/uploads/imagerecognize/2.jpg", MEDIA_ROOT + "/uploads/imagerecognize/3.jpg"]
for modelImage in modelImages:
    # Now loading a template image and searching for similar keypoints
    template = cv2.imread(modelImage)
    templateg = cv2.cvtColor(template,cv2.COLOR_BGR2GRAY)
    keys = surf.detect(templateg)
    keys,desc = surfDescriptorExtractor.compute(templateg, keys)
    for h,des in enumerate(desc):
        des = np.array(des,np.float32).reshape((1,128))
        retval, results, neigh_resp, dists = knn.find_nearest(des,1)
        res,dist = int(results[0][0]),dists[0][0]
        if dist<0.1: # draw matched keypoints in red color
            color = (0,0,255)
        else: # draw unmatched in blue color
            #print dist
            color = (255,0,0)
        #Draw matched key points on original image
        x,y = kp[res].pt
        center = (int(x),int(y))
        cv2.circle(img,center,2,color,-1)
        #Draw matched key points on template image
        x,y = keys[h].pt
        center = (int(x),int(y))
        cv2.circle(template,center,2,color,-1)
    cv2.imshow('img',img)
    cv2.imshow('tm',template)
    cv2.waitKey(0)
    cv2.destroyAllWindows()
My question is, how can I compare the image with the list of images and get only the similar images? Is there any method to do this?
I suggest you take a look at the earth mover's distance (EMD) between the images.
This metric gives a feeling for how hard it is to transform a normalized grayscale image into another, but it can be generalized to color images. A very good analysis of this method can be found in the following paper:
robotics.stanford.edu/~rubner/papers/rubnerIjcv00.pdf
It can be applied both to the whole image and to the histogram (which is much faster than the whole-image method). I'm not sure which method allows a full-image comparison, but for histogram comparison you can use the cv.CalcEMD2 function.
The only problem is that this method does not define a percentage of similarity, but a distance that you can filter on.
I know that this is not a full working algorithm, but is still a base for it, so I hope it helps.
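As a side note (an assumption about the newer API, not something from the answer above): cv.CalcEMD2 belongs to the old C interface; in current OpenCV-Python the same comparison is exposed as cv2.EMD, which takes "signatures" whose rows are (weight, coordinate...). A rough histogram-based sketch:

import cv2
import numpy as np

def emd_between_histograms(img1_gray, img2_gray, bins=64):
    h1 = cv2.calcHist([img1_gray], [0], None, [bins], [0, 256]).flatten()
    h2 = cv2.calcHist([img2_gray], [0], None, [bins], [0, 256]).flatten()
    h1 /= h1.sum()
    h2 /= h2.sum()
    # signature rows: (weight, bin position)
    sig1 = np.column_stack((h1, np.arange(bins))).astype(np.float32)
    sig2 = np.column_stack((h2, np.arange(bins))).astype(np.float32)
    emd, _, _ = cv2.EMD(sig1, sig2, cv2.DIST_L2)
    return emd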
EDIT:
Here is a sketch of how the EMD works in principle. The main idea is to have two normalized matrices (two grayscale images divided by their sum), and to define a flux matrix that describes how you move the gray from one pixel of the first image to another in order to obtain the second image (it can be defined even for non-normalized images, but it is more difficult).
In mathematical terms the flow matrix is actually a four-dimensional tensor that gives the flow from the point (i,j) of the old image to the point (k,l) of the new one, but if you flatten your images you can transform it into an ordinary matrix, just a little harder to read.
This flow matrix has three constraints: each term should be positive, the sum of each row should return the value of the destination pixel, and the sum of each column should return the value of the starting pixel.
Given this, you have to minimize the cost of the transformation, given by the sum of the products of each flow from (i,j) to (k,l) by the distance between (i,j) and (k,l).
It looks a little complicated in words, so here is the test code. The logic is correct; I'm not sure why the scipy solver complains about it (you should maybe look at openOpt or something similar):
import numpy as np
from numpy import meshgrid
from numpy.random import rand

#original data, two 2x2 images, normalized by their total
x = rand(2,2)
x /= x.sum()
y = rand(2,2)
y /= y.sum()

#initial guess of the flux matrix
# just the product of the image x as row for the image y as column
#This is a working flux, but is not an optimal one
F = (y.flatten()*x.flatten().reshape((y.size,-1))).flatten()

#distance matrix, based on euclidean distance
row_x,col_x = meshgrid(range(x.shape[0]),range(x.shape[1]))
row_y,col_y = meshgrid(range(y.shape[0]),range(y.shape[1]))
rows = ((row_x.flatten().reshape((row_x.size,-1)) - row_y.flatten().reshape((-1,row_x.size)))**2)
cols = ((col_x.flatten().reshape((row_x.size,-1)) - col_y.flatten().reshape((-1,row_x.size)))**2)
D = np.sqrt(rows+cols)
D = D.flatten()
x = x.flatten()
y = y.flatten()

#COST=sum(F*D)
#cost function
fun = lambda F: sum(F*D)
jac = lambda F: D

#array of constraints
#the constraint of sum one is implicit given the later constraints
cons = []
#each row and column should sum to the value of the start and destination pixel
#(the i=i default argument binds the current value of i to each lambda)
cons += [ {'type': 'eq', 'fun': lambda F, i=i: sum(F.reshape((x.size,y.size))[i,:])-x[i]} for i in range(x.size) ]
cons += [ {'type': 'eq', 'fun': lambda F, i=i: sum(F.reshape((x.size,y.size))[:,i])-y[i]} for i in range(y.size) ]

#the values of F should be positive: one (min, max) pair per element of F
bnds = [(0, None)] * F.size

from scipy.optimize import minimize
res = minimize(fun=fun, x0=F, method='SLSQP', jac=jac, bounds=bnds, constraints=cons)
The variable res contains the result of the minimization... but, as I said, I'm not sure why it complains about a singular matrix (one possible cause: the row-sum and column-sum equality constraints are linearly dependent, since both sets fix the same total mass, which can make the SLSQP subproblem singular).
The only problem with this algorithm is that it is not very fast, so it's not possible to run it on demand; you have to run it patiently when the dataset is created and store the results somewhere.
You are embarking on a massive problem, referred to as "content based image retrieval", or CBIR. It's a massive and active field. There are no finished algorithms or standard approaches yet, although there are a lot of techniques all with varying levels of success.
Even Google image search doesn't do this (yet) - they do text-based image search - e.g., search for text in a page that's like the text you searched for. (And I'm sure they're working on using CBIR; it's the holy grail for a lot of image processing researchers)
If you have a tight deadline or need to get this done and working soon... yikes.
Here's a ton of papers on the topic:
http://scholar.google.com/scholar?q=content+based+image+retrieval
Generally you will need to do a few things:
Extract features (either at local interest points, or globally, or somehow, SIFT, SURF, histograms, etc.)
Cluster / build a model of image distributions
This can involve feature descriptors, image gists, multiple instance learning, etc.
I wrote a program to do something very similar maybe 2 years ago using Python/Cython. Later I rewrote it in Go to get better performance. The base idea comes from findimagedupes IIRC.
It basically computes a "fingerprint" for each image, and then compares these fingerprints to match similar images.
The fingerprint is generated by resizing the image to 160x160, converting it to grayscale, adding some blur, normalizing it, then resizing it to 16x16 monochrome. At the end you have 256 bits of output: that's your fingerprint. This is very easy to do using convert:
convert path[0] -sample 160x160! -modulate 100,0 -blur 3x99 \
-normalize -equalize -sample 16x16 -threshold 50% -monochrome mono:-
(The [0] in path[0] is used to only extract the first frame of animated GIFs; if you're not interested in such images you can just remove it.)
After applying this to 2 images, you will have 2 (256-bit) fingerprints, fp1 and fp2.
The similarity score of these 2 images is then computed by XORing these 2 values and counting the bits set to 1. To do this bit counting, you can use the bitsoncount() function from this answer:
# fp1 and fp2 are stored as lists of 8 (32-bit) integers
score = 0
for n in range(8):
    score += bitsoncount(fp1[n] ^ fp2[n])
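If you don't want to pull in the linked bitsoncount() implementation, a minimal stand-in (my own sketch) is just a population count over each 32-bit integer:

def bitsoncount(n):
    # number of bits set to 1 in a non-negative integer
    return bin(n).count('1')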
score will be a number between 0 and 256 indicating how similar your images are. In my application I divide it by 2.56 (normalize to 0-100) and I've found that images with a normalized score of 20 or less are often identical.
If you want to implement this method and use it to compare lots of images, I strongly suggest you use Cython (or just plain C) as much as possible: XORing and bit counting is very slow with pure Python integers.
I'm really sorry but I can't find my Python code anymore. Right now I only have a Go version, but I'm afraid I can't post it here (tightly integrated in some other code, and probably a little ugly as it was my first serious program in Go...).
There's also a very good "find by similarity" function in GQView/Geeqie; its source is here.
For a simpler implementation of Earth Mover's Distance (aka Wasserstein Distance) in Python, you could use Scipy:
from keras.preprocessing.image import load_img, img_to_array
from scipy.stats import wasserstein_distance
import numpy as np
def get_histogram(img):
    '''
    Get the histogram of an image. For an 8-bit, grayscale image, the
    histogram will be a 256 unit vector in which the nth value indicates
    the percent of the pixels in the image with the given darkness level.
    The histogram's values sum to 1.
    '''
    h, w = img.shape[:2]
    hist = [0.0] * 256
    for i in range(h):
        for j in range(w):
            hist[int(img[i, j])] += 1  # int() because img_to_array returns floats
    return np.array(hist) / (h * w)
a = img_to_array(load_img('a.jpg', grayscale=True))
b = img_to_array(load_img('b.jpg', grayscale=True))
a_hist = get_histogram(a)
b_hist = get_histogram(b)
dist = wasserstein_distance(a_hist, b_hist)
print(dist)
