Disparity maps with Normalized Cross Correlation using Python


To understand the approach, I want to implement a stereo algorithm in Python (with NumPy) that computes a disparity map. As image data, I used the Tsukuba image dataset from Middlebury*. For simplicity, I chose normalised cross correlation (NCC)** as the similarity measure for finding corresponding pixels. I will assume scanline agreement.
Here is my NCC implementation:
left_mu = np.mean(left_patch)
right_mu = np.mean(right_patch)
left_sigma = np.sqrt(np.mean((left_patch - left_mu)**2))
right_sigma = np.sqrt(np.mean((right_patch - right_mu)**2))
patch = left_patch * right_patch
mu = left_mu * right_mu
num = np.mean(patch) - mu
denom = left_sigma * right_sigma
ncc = num/denom
where left_patch and right_patch are 3x3 patches from the original images. This outputs a value between -1 and 1 that describes the similarity between two pixels.
The idea is now to find the best-matching pixel. The disparity between the two pixels should then be stored in a new image, the disparity map.
Since I assumed scanline agreement I only have to search in one image row. For each pixel in the row, I want to take the index of the value that maximises the NCC value and store it as the disparity value.
My problem is that my results are rather odd. My disparity values are around 180-200 pixels for an image that is 384x288 pixels. Here is the resulting image.
Can you see the mistake in my thinking?
(*) vision.middlebury.edu/stereo/data/scenes2001/data/anigif/orig/tsukuba_o_a.gif
(**) A two-stage correlation method for stereoscopic depth estimation. - N. Einecke and J. Eggert

It seems that you didn't compute the numerator properly. It should be:
num = np.mean( (left_patch - left_mu) * (right_patch - right_mu) )
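For reference, here is a minimal sketch of the whole search, not the asker's actual code. Two things are worth keeping in mind. First, E[LR] - E[L]E[R] and E[(L - mu_L)(R - mu_R)] are algebraically the same quantity, so if the two forms give different numbers the cause may be integer overflow when uint8 patches are multiplied; the sketch casts to float first. Second, the disparity is the column offset between the matched pixels, not the column index of the best match. The function names, the max_disp search range and the eps guard are assumptions made for illustration.

```python
import numpy as np

def ncc(left_patch, right_patch, eps=1e-9):
    """Normalised cross correlation of two equally sized patches."""
    # Cast to float so uint8 products and differences cannot overflow.
    left = left_patch.astype(np.float64)
    right = right_patch.astype(np.float64)
    left_c = left - left.mean()
    right_c = right - right.mean()
    num = np.mean(left_c * right_c)
    denom = np.sqrt(np.mean(left_c ** 2) * np.mean(right_c ** 2))
    # eps (an assumption) avoids division by zero on flat patches.
    return num / (denom + eps)

def disparity_row(left, right, row, half=1, max_disp=16):
    """Disparity for one scanline: best offset d with left[x] ~ right[x - d]."""
    width = left.shape[1]
    disp = np.zeros(width, dtype=np.int32)
    for x in range(half, width - half):
        patch_l = left[row - half:row + half + 1, x - half:x + half + 1]
        best_score, best_d = -np.inf, 0
        # Only search offsets that keep the right patch inside the image.
        for d in range(0, min(max_disp, x - half) + 1):
            xr = x - d
            patch_r = right[row - half:row + half + 1, xr - half:xr + half + 1]
            score = ncc(patch_l, patch_r)
            if score > best_score:
                best_score, best_d = score, d
        disp[x] = best_d  # store the offset, not the column index
    return disp
```

Limiting the search to max_disp also keeps the values in a plausible range for a 384x288 image.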

Related

Create an Undistorted Top-Down View of Camera Image

I have a fixed camera mounted on a wall viewing a rectangular lawn at an angle. My goal is to obtain an undistorted, top-down view of the lawn.
I have an image from the camera as a python numpy array which looks like this:
raw camera image
I use an inverse matrix with skimage.transform.warp to correct the image to a top down view:
top down distorted
This works perfectly, however the camera lens introduces barrel distortion.
Separately, I can correct the distortion with a generated lookup table using skimage.transform.warp_coords and passing a simple undistort callable function based on the algorithm described here.
The image is then generated using scipy.ndimage.map_coordinates:
undistorted camera view
These 2 processes work individually, but how do I combine them to create an undistorted top-down view, without creating an intermediate image?
I could run each point in the lookup table through the matrix to create a new table, but the table is massive and memory is tight (Raspberry Pi Zero).
I would like to define the undistortion as a matrix and just combine the 2 matrices, but as I understand it, the projective homography matrix is linear but undistortion is non-linear, so this can't be done. I can't use OpenCV due to resource constraints, and the calibration procedure involving multiple chessboard images is impractical. Currently, I calibrate by taking 4 lawn corner points and generate the matrix from them, which works well.
I would have anticipated that this is a common problem in Computer Vision but can't find any suitable solutions.

The barrel distortion is nonlinear, but it is also smooth. This means it can be well approximated by a collection of piecewise linear approximations.
So you do not need a large, per-pixel look-up table of un-distortion displacements. Rather, you can subsample it (or just scale it down), and use bilinear interpolation for in-between pixels.
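A rough sketch of that idea, assuming the combined mapping can be evaluated as a function mapping(x, y) -> (src_x, src_y). The helper names coarse_map and lookup and the step parameter are made up for illustration; scipy.ndimage.map_coordinates with order=1 does the bilinear interpolation.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def coarse_map(mapping, rows, cols, step):
    """Evaluate mapping(x, y) -> (src_x, src_y) on every `step`-th pixel."""
    ys = np.arange(0, rows + step, step, dtype=np.float64)
    xs = np.arange(0, cols + step, step, dtype=np.float64)
    gx, gy = np.meshgrid(xs, ys)
    src_x, src_y = mapping(gx, gy)
    # float32 halves the table size compared with the float64 default.
    return src_x.astype(np.float32), src_y.astype(np.float32)

def lookup(coarse, step, x, y):
    """Bilinearly interpolate one coarse table at full-resolution pixels."""
    coords = np.array([np.atleast_1d(y) / step, np.atleast_1d(x) / step])
    return map_coordinates(coarse, coords, order=1, mode="nearest")
```

With step=16 the two tables hold roughly 1/256 of the entries of a per-pixel table. The interpolation is exact for affine mappings and only approximate for a real lens model, so the step size is a trade-off to tune.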

I have found a solution that appears to work by creating separate functions for undistort and transformation, then chaining them together.
The skimage source code here has the _apply_mat method for generating a mapping from a matrix. I based my unwarp function on that:
def unwarp(coords, matrix):
    coords = np.array(coords, copy=False, ndmin=2)
    x, y = np.transpose(coords)
    src = np.vstack((x, y, np.ones_like(x)))
    # apply the homography to the homogeneous coordinates
    dst = src.T @ matrix.T
    # below, we will divide by the last dimension of the homogeneous
    # coordinate matrix. In order to avoid division by zero,
    # we replace exact zeros in this column with a very small number.
    dst[dst[:, 2] == 0, 2] = np.finfo(float).eps
    # rescale to homogeneous coordinates
    dst[:, :2] /= dst[:, 2:3]
    return dst[:, :2]
I created a similar function for undistorting based on Tanner Helland's algorithm:
def undistort(coords, cols, rows, correction_radius, zoom):
    half_width = cols / 2
    half_height = rows / 2
    new_x = coords[:, 0] - half_width
    new_y = coords[:, 1] - half_height
    distance = np.hypot(new_x, new_y)
    r = distance / correction_radius
    theta = np.ones_like(r)
    # only process non-zero values
    np.divide(np.arctan(r), r, out=theta, where=r!=0)
    source_x = half_width + theta * new_x * zoom
    source_y = half_height + theta * new_y * zoom
    result = np.column_stack([source_x, source_y])
    return result
The only tricky bit here is the divide where we need to prevent division by zero.
Once we have each lookup table we can chain them together:
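def undistort_unwarp(coords):
    # cols, rows, correction_radius, zoom and matrix are assumed to be
    # defined at module level (see the note on map_args further down)
    undistorted = undistort(coords, cols, rows, correction_radius, zoom)
    both = unwarp(undistorted, matrix)
    return both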
Note that these are the callable functions passed to skimage.transform.warp_coords:
mymap = tf.warp_coords(undistort_unwarp, shape=(rows, cols), dtype=np.int16)
The map can then be passed to the skimage.transform.warp function.
Francesco's answer was helpful, however I needed the full pixel resolution for the transformation, so I used it for the undistort as well, and looked to other ways to reduce the memory consumption.
Each map consumes
rows * cols * bytes-per-item * 2 (x and y)
bytes. The default datatype is float64, which requires 8 bytes-per-item, and the documentation suggests sane choices would be the default or float32 at 4 bytes-per-item. I was able to reduce this to 2 bytes-per-item using int16 with no visible ill effects, but I suspect the spline interpolation is not being used to the full (at all?).
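To put numbers on that formula, here is a small sketch. The 640x480 frame size in the usage note is a made-up example, not the resolution of my camera, and map_bytes is just an illustrative helper.

```python
import numpy as np

def map_bytes(rows, cols, dtype):
    """Size of one coordinate map: rows * cols * bytes-per-item * 2 (x and y)."""
    return rows * cols * np.dtype(dtype).itemsize * 2
```

For a hypothetical 480x640 frame, map_bytes(480, 640, np.float64) is 4915200 bytes, float32 halves that to 2457600, and int16 brings it down to 1228800.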
The map is the same for each channel of a colour RGB image. However, when I called warp_coords with shape=(rows, cols, 3) I got 3 duplicate maps, so I created a function to handle colour images by processing each channel separately:
def warp_colour(img_arr, coord_map):
    if img_arr.ndim == 3:
        # colour
        rows, cols, _chans = img_arr.shape
        r_arr = tf.warp(img_arr[:, :, 0], inverse_map=coord_map, output_shape=(rows, cols))
        g_arr = tf.warp(img_arr[:, :, 1], inverse_map=coord_map, output_shape=(rows, cols))
        b_arr = tf.warp(img_arr[:, :, 2], inverse_map=coord_map, output_shape=(rows, cols))
        rgb_arr = np.dstack([r_arr, g_arr, b_arr])
    else:
        # grayscale
        rows, cols = img_arr.shape
        rgb_arr = tf.warp(img_arr, inverse_map=coord_map, output_shape=(rows, cols))
    return rgb_arr
One issue with skimage.transform.warp_coords is that it does not have the map_args dictionary parameter that skimage.transform.warp has. I had to call my unwarp and undistort functions through an intermediate function to add the parameters.
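One way to build such an intermediate function is functools.partial, which binds the extra parameters up front and leaves a callable that takes only the coordinates. The sketch below uses two toy stand-ins (scale and shift, both hypothetical) in place of undistort and unwarp so that it runs on its own.

```python
from functools import partial
import numpy as np

def scale(coords, factor):
    # hypothetical stand-in for undistort(coords, cols, rows, ...)
    return coords * factor

def shift(coords, offset):
    # hypothetical stand-in for unwarp(coords, matrix)
    return coords + offset

def chain(coords, first, second):
    """Apply two coordinate mappings one after the other."""
    return second(first(coords))

# A single-argument callable with all extra parameters already bound.
coord_fn = partial(chain,
                   first=partial(scale, factor=2.0),
                   second=partial(shift, offset=1.0))
```

A callable built this way takes just the coordinate array, which is the form warp_coords expects for its first argument.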

Calculate and use inverse of OpenCV distortion parameters

So basically I want to find out what the inverse distortion parameters would be for a calibration I do as shown here.
NOTE: I know that we can perform undistortion and then use remapping to do what I have done below, but my goal is to find the actual inverted distortion parameters and use them to distort other images, not just to revert what cv2.undistort() does.
Overview:
I have tried passing in the negation of the distortion parameters:
# _, mat, distortion, _, _ = cv2.calibrateCamera(...)
# undistorted_image = cv2.undistort(with distortion)
# redistorted_image = cv2.undistort(with np.negative(distortion))
In theory, I was thinking that if the redistorted_image is similar to the original image, then the np.negative(distortion) parameters are what I am looking for, but it turned out to be false.
Actual method I use:
def save_undistorted_images(image, matrix, distortion):
    test_image_su = cv.imread(image)
    height, width = test_image_su.shape[:2]
    new_camera_mtx, roi = cv.getOptimalNewCameraMatrix(matrix, distortion, (width, height), 1, (width, height))
    distortion_u = np.negative(distortion)
    # unsure if this line helps
    new_camera_mtx_inv, roi = cv.getOptimalNewCameraMatrix(matrix, distortion_u, (width, height), 1, (width, height))
    # undistort the image
    undistorted_image = cv.undistort(test_image_su, matrix, distortion, None, new_camera_mtx)
    cv.imwrite('undistorted_frame.png', undistorted_image)
    # redistort trying to get something like original image (image_test_su)
    distorted_image = cv.undistort(undistorted_image, matrix, distortion_u, None, new_camera_mtx_inv)
    cv.imwrite('redistorted_frame.png', distorted_image)
The results:
(left a: original) (right b: undistorted)
(left c: distorted using np.negative(distortion)) (right d: undistorted image redistorted using np.negative(distortion))
The image d here is basically c performed on b, which I expected would be similar to a
Why is b here overpowering the effect of c?
Other way of calculating inverse that I tried:
The following is my python implementation of this paper
distortion_u = distortion.copy()  # copy, so the calibrated parameters are not overwritten
k1 = distortion_u[0][0]
k2 = distortion_u[0][1]
k3 = distortion_u[0][4]
b1 = -1 * k1
b2 = 3 * (k1 * k1) - k2
b3 = (8 * k1 * k2) + (-1 * (12 * (k1 * k1 * k1))) - k3
# radial:
distortion_u[0][0] = b1
distortion_u[0][1] = b2
distortion_u[0][4] = b3
# tangential:
#distortion_u[0][2] = -1 * distortion_u[0][2]
#distortion_u[0][3] = -1 * distortion_u[0][3]
The results of applying distortion to the undistorted image using the above distortion parameters are also not good; they look really similar to the results above.
So, this brings us to:
Why is the effect of normal distortion always overpowering np.negative(distortion) or anything else?
Does all distortion work this way? (Negative values do not produce the opposite effect.)
How to get the actually opposite distortion parameters?

Afraid you are doing it wrong. The opencv distortion model you have calibrated computes undistorted and normalized image coordinates from distorted ones. It is a nonlinear model, so inverting it involves solving a system of nonlinear (polynomial) equations.
A closed form (parametric) solution exists AFAIK only for the case of single-parameter pure radial distortion, i.e. when the only nonzero distortion parameter is k1, the coefficient of r^2. In this case the model inversion equation reduces to a cubic equation in r, and you can then express the inverse model using Cardano's formula for the solution of the cubic.
In all other cases one inverts the model numerically, using various algorithms for solving the nonlinear system of equations. OpenCV uses an iterative "false-position" method.
Since you want to use the inverse model to un-distort a set of images (which is the normal use case), you should use initUndistortRectifyMap to calculate the undistortion solution for the image once and for all, and then pass it for every image to remap to actually undistort the images.
If you really need a parametric model for the inverse model, my advice would be to look into approximating the maps returned by initUndistortRectifyMap with a pair of higher order polynomials, or thin-plate splines.
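To illustrate what "inverting numerically" can look like, here is a minimal sketch for a purely radial polynomial model r_out = r_in * (1 + k1*r_in^2 + k2*r_in^4). It uses a plain fixed-point iteration, which is an assumption chosen for brevity and not necessarily the scheme OpenCV uses internally; the function names and the iteration count are also made up, and tangential terms are ignored.

```python
import numpy as np

def distort_radius(r, k1, k2=0.0):
    """Forward radial polynomial: r_out = r * (1 + k1*r^2 + k2*r^4)."""
    return r * (1.0 + k1 * r ** 2 + k2 * r ** 4)

def invert_radius(r_out, k1, k2=0.0, iterations=20):
    """Find r with distort_radius(r) == r_out by fixed-point iteration."""
    r_out = np.asarray(r_out, dtype=np.float64)
    r = r_out.copy()  # initial guess: no distortion
    for _ in range(iterations):
        r = r_out / (1.0 + k1 * r ** 2 + k2 * r ** 4)
    return r
```

The iteration converges quickly for moderate distortion on normalized coordinates, but it can diverge for strong distortion far from the centre. That is one reason there is no simple set of "negated" coefficients: the inverse of the polynomial is not itself a polynomial of the same form.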

Python fractal box count - fractal dimension

I have some images for which I want to calculate the Minkowski/box count dimension to determine the fractal characteristics in the image. Here are 2 example images:
10.jpg:
24.jpg:
I'm using the following code to calculate the fractal dimension:
import numpy as np
import imageio.v2 as imageio  # scipy.misc.imread was removed from recent SciPy releases

def rgb2gray(rgb):
    r, g, b = rgb[:,:,0], rgb[:,:,1], rgb[:,:,2]
    gray = 0.2989 * r + 0.5870 * g + 0.1140 * b
    return gray

def fractal_dimension(Z, threshold=0.9):
    # Only for 2d image
    assert(len(Z.shape) == 2)
    # From https://github.com/rougier/numpy-100 (#87)
    def boxcount(Z, k):
        S = np.add.reduceat(
            np.add.reduceat(Z, np.arange(0, Z.shape[0], k), axis=0),
            np.arange(0, Z.shape[1], k), axis=1)
        # We count non-empty (0) and non-full boxes (k*k)
        return len(np.where((S > 0) & (S < k*k))[0])
    # Transform Z into a binary array
    Z = (Z < threshold)
    # Minimal dimension of image
    p = min(Z.shape)
    # Greatest power of 2 less than or equal to p
    n = 2**np.floor(np.log(p)/np.log(2))
    # Extract the exponent
    n = int(np.log(n)/np.log(2))
    # Build successive box sizes (from 2**n down to 2**2)
    sizes = 2**np.arange(n, 1, -1)
    # Actual box counting with decreasing size
    counts = []
    for size in sizes:
        counts.append(boxcount(Z, size))
    # Fit the successive log(sizes) with log (counts)
    coeffs = np.polyfit(np.log(sizes), np.log(counts), 1)
    return -coeffs[0]

I = rgb2gray(imageio.imread("24.jpg"))
print("Minkowski–Bouligand dimension (computed): ", fractal_dimension(I))
From the literature I've read, it has been suggested that natural scenes (e.g. 24.jpg) are more fractal in nature, and thus should have a larger fractal dimension value.
The results it gives me are the opposite of what the literature would suggest:
10.jpg: 1.259
24.jpg: 1.073
I would expect the fractal dimension for the natural image to be larger than for the urban one.
Am I calculating the value incorrectly in my code? Or am I just interpreting the results incorrectly?

With the fractal dimension of something physical, the dimension might converge at different stages to different values. For example, a very thin line (but of finite width) would initially seem one-dimensional, then eventually two-dimensional as its width becomes comparable in size to the boxes used.
Let's see the dimensions that you have produced:
What do you see? Well, the linear fits are not so good, and the dimension is heading towards a value of two.
To diagnose, let's take a look at the grey-scale images produced with the threshold that you have (that is, 0.9):
The nature picture has almost become an ink blob. The dimension would go to a value of 2 very soon, as the graphs told us. That is because we have pretty much lost the image.
And now with a threshold of 50?
With new linear fits that are much better, the dimensions are 1.6 and 1.8 for urban and nature respectively. Keep in mind that the urban picture actually has a lot of structure to it, in particular on the textured walls.
In future, good threshold values would be ones closer to the mean of the grey-scale image; that way your image does not turn into a blob of ink!
A good text book on this is "Fractals everywhere" by Michael F. Barnsley.
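The thresholding advice can be sketched as follows. This is not the code from the question: it counts every non-empty box (the textbook definition) rather than only the partially filled ones, it defaults the threshold to the image mean, and it uses box sizes from half the largest power of two down to 2. The names box_count and box_dimension are made up for illustration.

```python
import numpy as np

def box_count(binary, k):
    """Number of k x k boxes that contain at least one set pixel."""
    rows, cols = binary.shape
    rows -= rows % k  # crop so the image tiles exactly
    cols -= cols % k
    blocks = binary[:rows, :cols].reshape(rows // k, k, cols // k, k)
    return int(blocks.any(axis=(1, 3)).sum())

def box_dimension(gray, threshold=None):
    """Box-counting dimension of the pixels darker than `threshold`."""
    if threshold is None:
        threshold = gray.mean()  # threshold near the mean, as suggested above
    binary = gray < threshold
    n = int(np.log2(min(binary.shape)))
    sizes = 2 ** np.arange(n - 1, 0, -1)
    counts = np.array([box_count(binary, int(k)) for k in sizes])
    keep = counts > 0  # log(0) would break the fit
    slope, _ = np.polyfit(np.log(sizes[keep]), np.log(counts[keep]), 1)
    return -slope
```

As a sanity check, a single dark line on a white background should come out near 1 and a completely dark image near 2. Real photographs land somewhere in between, and the value still depends on the threshold.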

How to extend an arc to a complete circle?

Given a binary square array of fixed size, like the one in the image below.
It is assumed in advance that the array contains an image of a circle or part of one. Importantly, this circle is always centred in the image.
Example
I need an efficient way to complete the arc into a full circle, if possible.
I've tried to statistically calculate the average distance from the centre to the white points and complete the circle. And it works. I've also tried the Hough Transform to fit the ellipse and determine its size. But both methods are very resource intensive.
Method 1 sketch:
points = np.transpose(np.array(np.nonzero(array))).tolist() # array of one-value points
random.shuffle(points)
points = np.array(points[:500]).astype('int32') # take into account only 500 random points (signed, so subtracting the centre cannot wrap around)
distances = np.zeros(points.shape[0], dtype='int32') # array of distances from the centre of image (40, 40) to some point
for i in range(points.shape[0]):
    distances[i] = int(np.sqrt((points[i][0] - 40) ** 2 + (points[i][1] - 40) ** 2))
u, indices = np.unique(distances, return_inverse=True)
mean_dist = u[np.argmax(np.bincount(indices))] # most probable distance
# use this mean_dist in order to draw a complete circle
Method 1 result
Method 2 sketch:
from skimage.transform import hough_ellipse
result = hough_ellipse(array, min_size=..., max_size=...)
result.sort(order='accumulator')
# ... extract the necessary info from result variable if it's not empty
Could someone suggest another, more efficient solution? Thank you!

I've tried to statistically calculate the average distance from the centre to the white points and complete the circle.
Well, this seems to be a good start. Given an image with n pixels, this algorithm is O(n), which is already pretty efficient.
If you want a faster implementation, try using randomization:
Take m random sample points from the image and use those to calculate the average radius of the white points. Then complete the circle using this radius.
This algorithm would then be O(m), which means it is faster for all m < n. Choosing a good value for m might be tricky because you have to compromise between runtime and output quality.
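A vectorized sketch of the sampling idea, which also removes the Python-level loop. It returns the most frequent rounded distance, as in the question's first method, rather than the plain average. The function name, the seed parameter and the choice of ((rows - 1) / 2, (cols - 1) / 2) as the centre are assumptions, and the array is assumed to contain at least one white pixel.

```python
import numpy as np

def estimate_radius(array, m=500, seed=0):
    """Most common rounded distance from the image centre to the white pixels."""
    rows, cols = np.nonzero(array)
    rng = np.random.default_rng(seed)
    if rows.size > m:
        # Use only m randomly chosen white pixels.
        pick = rng.choice(rows.size, size=m, replace=False)
        rows, cols = rows[pick], cols[pick]
    # Assumed centre: the middle of the array.
    cy = (array.shape[0] - 1) / 2.0
    cx = (array.shape[1] - 1) / 2.0
    dist = np.rint(np.hypot(rows - cy, cols - cx)).astype(np.int64)
    return int(np.argmax(np.bincount(dist)))
```

Note that finding the white pixels with np.nonzero still scans the whole array, so the saving comes from the distance step, not from the scan.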

Compare similarity of images using OpenCV with Python

I'm trying to compare an image to a list of other images and return a selection of images from this list (like Google image search) with up to 70% similarity.
I got this code from this post and changed it for my context:
# Load the images
img = cv2.imread(MEDIA_ROOT + "/uploads/imagerecognize/armchair.jpg")
# Convert them to grayscale
imgg = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
# SURF extraction
surf = cv2.FeatureDetector_create("SURF")
surfDescriptorExtractor = cv2.DescriptorExtractor_create("SURF")
kp = surf.detect(imgg)
kp, descritors = surfDescriptorExtractor.compute(imgg,kp)
# Setting up samples and responses for kNN
samples = np.array(descritors)
responses = np.arange(len(kp),dtype = np.float32)
# kNN training
knn = cv2.KNearest()
knn.train(samples,responses)
modelImages = [MEDIA_ROOT + "/uploads/imagerecognize/1.jpg", MEDIA_ROOT + "/uploads/imagerecognize/2.jpg", MEDIA_ROOT + "/uploads/imagerecognize/3.jpg"]
for modelImage in modelImages:
    # Now loading a template image and searching for similar keypoints
    template = cv2.imread(modelImage)
    templateg = cv2.cvtColor(template,cv2.COLOR_BGR2GRAY)
    keys = surf.detect(templateg)
    keys,desc = surfDescriptorExtractor.compute(templateg, keys)
    for h,des in enumerate(desc):
        des = np.array(des,np.float32).reshape((1,128))
        retval, results, neigh_resp, dists = knn.find_nearest(des,1)
        res,dist = int(results[0][0]),dists[0][0]
        if dist<0.1: # draw matched keypoints in red color
            color = (0,0,255)
        else: # draw unmatched in blue color
            #print dist
            color = (255,0,0)
        #Draw matched key points on original image
        x,y = kp[res].pt
        center = (int(x),int(y))
        cv2.circle(img,center,2,color,-1)
        #Draw matched key points on template image
        x,y = keys[h].pt
        center = (int(x),int(y))
        cv2.circle(template,center,2,color,-1)
    cv2.imshow('img',img)
    cv2.imshow('tm',template)
    cv2.waitKey(0)
    cv2.destroyAllWindows()
My question is, how can I compare the image with the list of images and get only the similar images? Is there any method to do this?

I suggest you take a look at the earth mover's distance (EMD) between the images.
This metric gives a feeling for how hard it is to transform one normalized grayscale image into another, and it can be generalized to color images. A very good analysis of this method can be found in the following paper:
robotics.stanford.edu/~rubner/papers/rubnerIjcv00.pdf
It can be done both on the whole image and on the histogram (which is much faster than the whole-image method). I'm not sure which function allows a full image comparison, but for histogram comparison you can use the cv.CalcEMD2 function.
The only problem is that this method does not define a percentage of similarity, but a distance that you can filter on.
I know that this is not a full working algorithm, but it is still a base for one, so I hope it helps.
EDIT:
Here is a sketch of how the EMD works in principle. The main idea is to have two normalized matrices (two grayscale images divided by their sum) and to define a flux matrix that describes how you move the gray from each pixel of the first image to each pixel of the second to obtain the second image (it can be defined even for non-normalized images, but that is more difficult).
In mathematical terms the flow matrix is actually a four-dimensional tensor that gives the flow from the point (i,j) of the old image to the point (k,l) of the new one, but if you flatten your images you can transform it into a normal matrix, just a little harder to read.
This flow matrix has three constraints: each term should be non-negative, the sum of each row should equal the value of the starting pixel, and the sum of each column should equal the value of the destination pixel.
Given this, you have to minimize the cost of the transformation, given by the sum of the products of each flow from (i,j) to (k,l) and the distance between (i,j) and (k,l).
It looks a little complicated in words, so here is the test code. The logic is correct, I'm not sure why the scipy solver complains about it (you should look maybe to openOpt or something similar):
import numpy as np
from scipy.optimize import minimize

# original data, two 2x2 images, normalized
x = np.random.rand(2, 2)
x /= x.sum()
y = np.random.rand(2, 2)
y /= y.sum()
# initial guess of the flux matrix:
# just the product of the image x as row for the image y as column.
# This is a working flux, but is not an optimal one
F = (y.flatten() * x.flatten().reshape((y.size, -1))).flatten()
# distance matrix, based on euclidean distance
row_x, col_x = np.meshgrid(range(x.shape[0]), range(x.shape[1]))
row_y, col_y = np.meshgrid(range(y.shape[0]), range(y.shape[1]))
rows = ((row_x.flatten().reshape((row_x.size, -1)) - row_y.flatten().reshape((-1, row_x.size)))**2)
cols = ((col_x.flatten().reshape((row_x.size, -1)) - col_y.flatten().reshape((-1, row_x.size)))**2)
D = np.sqrt(rows + cols)
D = D.flatten()
x = x.flatten()
y = y.flatten()
# COST = sum(F*D)
# cost function
fun = lambda F: np.sum(F * D)
jac = lambda F: D
# array of constraints
# the constraint of sum one is implicit given the later constraints
cons = []
# each row and column should sum to the value of the start and destination array
# (i=i binds the loop index now; a plain closure would only see its final value)
cons += [{'type': 'eq', 'fun': lambda F, i=i: np.sum(F.reshape((x.size, y.size))[i, :]) - x[i]} for i in range(x.size)]
cons += [{'type': 'eq', 'fun': lambda F, i=i: np.sum(F.reshape((x.size, y.size))[:, i]) - y[i]} for i in range(y.size)]
# the values of F should be positive: one (min, max) pair per entry of F
bnds = [(0, None)] * F.size
res = minimize(fun=fun, x0=F, method='SLSQP', jac=jac, bounds=bnds, constraints=cons)
The variable res contains the result of the minimization... but as I said, I'm not sure why it complains about a singular matrix.
The only problem with this algorithm is that it is not very fast, so it's not possible to run it on demand; you have to run it patiently when the dataset is created and store the results somewhere.
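Since the cost and all the constraints are linear in the flow, the same problem can also be handed to a linear-programming solver. Below is a hedged sketch using scipy.optimize.linprog instead of SLSQP; the function name emd is made up, and the sketch assumes both images have the same shape and each sums to 1.

```python
import numpy as np
from scipy.optimize import linprog

def emd(x, y):
    """Earth mover's distance between two same-shape, normalised 2-D arrays."""
    # Pixel coordinates in the same row-major order as x.ravel().
    coords = np.argwhere(np.ones_like(x, dtype=bool)).astype(np.float64)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    n = x.size
    # Row sums of the flow matrix equal the source mass ...
    row_sum = np.kron(np.eye(n), np.ones((1, n)))
    # ... and column sums equal the destination mass.
    col_sum = np.kron(np.ones((1, n)), np.eye(n))
    res = linprog(c=dist.ravel(),
                  A_eq=np.vstack([row_sum, col_sum]),
                  b_eq=np.concatenate([x.ravel(), y.ravel()]),
                  bounds=(0, None))
    return res.fun
```

The flow has n*n unknowns for n pixels, so this is only practical for very small images or for histograms.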

You are embarking on a massive problem, referred to as "content based image retrieval", or CBIR. It's a massive and active field. There are no finished algorithms or standard approaches yet, although there are a lot of techniques all with varying levels of success.
Even Google image search doesn't do this (yet) - they do text-based image search - e.g., search for text in a page that's like the text you searched for. (And I'm sure they're working on using CBIR; it's the holy grail for a lot of image processing researchers)
If you have a tight deadline or need to get this done and working soon... yikes.
Here's a ton of papers on the topic:
http://scholar.google.com/scholar?q=content+based+image+retrieval
Generally you will need to do a few things:
Extract features (either at local interest points, or globally, or somehow, SIFT, SURF, histograms, etc.)
Cluster / build a model of image distributions
This can involve feature descriptors, image gists, multiple instance learning. etc.

I wrote a program to do something very similar maybe 2 years ago using Python/Cython. Later I rewrote it to Go to get better performance. The base idea comes from findimagedupes IIRC.
It basically computes a "fingerprint" for each image, and then compares these fingerprints to match similar images.
The fingerprint is generated by resizing the image to 160x160, converting it to grayscale, adding some blur, normalizing it, then resizing it to 16x16 monochrome. At the end you have 256 bits of output: that's your fingerprint. This is very easy to do using convert:
convert path[0] -sample 160x160! -modulate 100,0 -blur 3x99 \
-normalize -equalize -sample 16x16 -threshold 50% -monochrome mono:-
(The [0] in path[0] is used to only extract the first frame of animated GIFs; if you're not interested in such images you can just remove it.)
After applying this to 2 images, you will have 2 (256-bit) fingerprints, fp1 and fp2.
The similarity score of these 2 images is then computed by XORing these 2 values and counting the bits set to 1. To do this bit counting, you can use the bitsoncount() function from this answer:
# fp1 and fp2 are stored as lists of 8 (32-bit) integers
score = 0
for n in range(8):
    score += bitsoncount(fp1[n] ^ fp2[n])
score will be a number between 0 and 256 indicating how similar your images are. In my application I divide it by 2.56 (normalize to 0-100) and I've found that images with a normalized score of 20 or less are often identical.
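Since the linked bitsoncount() is not reproduced here, the following is a small pure-Python stand-in that counts set bits with bin(). It is a sketch of the scoring step only, not my original code, and the names hamming_score and normalised_score are made up.

```python
def hamming_score(fp1, fp2):
    """Number of differing bits between two fingerprints.

    fp1 and fp2 are sequences of integers, e.g. 8 values of 32 bits each.
    """
    return sum(bin(a ^ b).count("1") for a, b in zip(fp1, fp2))

def normalised_score(fp1, fp2, bits=256):
    """Scale the bit difference to the range 0-100."""
    return 100.0 * hamming_score(fp1, fp2) / bits
```

For example, two fingerprints that differ in exactly one 32-bit word give a raw score of 32 and a normalised score of 12.5.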
If you want to implement this method and use it to compare lots of images, I strongly suggest you use Cython (or just plain C) as much as possible: XORing and bit counting is very slow with pure Python integers.
I'm really sorry but I can't find my Python code anymore. Right now I only have a Go version, but I'm afraid I can't post it here (tightly integrated in some other code, and probably a little ugly as it was my first serious program in Go...).
There's also a very good "find by similarity" function in GQView/Geeqie; its source is here.

For a simpler implementation of Earth Mover's Distance (aka Wasserstein Distance) in Python, you could use SciPy:
from keras.preprocessing.image import load_img, img_to_array
from scipy.stats import wasserstein_distance
import numpy as np

def get_histogram(img):
    '''
    Get the histogram of an image. For an 8-bit, grayscale image, the
    histogram will be a 256 unit vector in which the nth value indicates
    the percent of the pixels in the image with the given darkness level.
    The histogram's values sum to 1.
    '''
    # img_to_array returns floats with a trailing channel axis;
    # drop the axis and convert to integers so they can be used as indices
    img = np.squeeze(img).astype(np.uint8)
    h, w = img.shape[:2]
    hist = [0.0] * 256
    for i in range(h):
        for j in range(w):
            hist[img[i, j]] += 1
    return np.array(hist) / (h * w)

a = img_to_array(load_img('a.jpg', grayscale=True))
b = img_to_array(load_img('b.jpg', grayscale=True))
a_hist = get_histogram(a)
b_hist = get_histogram(b)
dist = wasserstein_distance(a_hist, b_hist)
print(dist)
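One caveat: wasserstein_distance takes sample values first and optional weights after them, so passing two histograms as the first two arguments compares the distributions of the bin heights rather than of the grey levels. If the intent is to move mass between grey levels, a possible variant (a sketch, with made-up function names and np.bincount in place of the double loop) is to pass the levels as values and the histograms as weights:

```python
import numpy as np
from scipy.stats import wasserstein_distance

def grey_histogram(img):
    """Normalised 256-bin histogram of an 8-bit greyscale image."""
    counts = np.bincount(np.asarray(img, dtype=np.uint8).ravel(), minlength=256)
    return counts / counts.sum()

def histogram_emd(img_a, img_b):
    """EMD between grey-level distributions, histograms used as weights."""
    levels = np.arange(256)
    return wasserstein_distance(levels, levels,
                                grey_histogram(img_a), grey_histogram(img_b))
```

With this form, an image that is uniformly 10 grey levels brighter than another ends up at a distance of 10.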
