I have two sets of exposure-bracketed images of a color chart from two different camera systems, A and B. Each data set, at a given exposure, gives me 24 RGB tuples from the patches on the color chart.
I want to match camera B to camera A through a 3-dimensional transform via an interpolation of these two data sets. This is basically the process of creating a look-up table. The methods for parsing and applying LUTs to existing images are well-documented, but I cannot find good resources on how to analytically create a LUT given two different data sets. I know that the answer involves interpolation through a sparse data set, and could involve something like trilinear interpolation, but I'm not sure about the actual execution.
For example, taking the case of trilinear interpolation, it expects 8 corners, but in the case of matching image A to image B, what do those 8 corners consist of? The closest hits to the given pixel in all dimensions? Searching through an unordered data set for close values seems expensive and not correct.
Overall, I'm looking for some advice on how to proceed to match two images with the data set I've acquired, specifically with a 3d transformation. Preferred tool is Python but it can be anything non-proprietary.
In the first place, you need to establish the correspondences, i.e. associate the patches of the same color in the two images (much to say about this but not in the scope of this answer). And get the RGB color values (preferably by averaging over the patches to reduce random fluctuations).
Now you have a set of N pairs of RGB triples, to which you want to fit a mathematical model,
RGB' = f(RGB)
(f is a vector function of a vector argument).
To begin with, you should try an affine model,
RGB' = A RGB + RGB0
where A is a 3x3 matrix and RGB0 a constant vector. Notice that in the affine case, the equations are independent, like
R' = Ar RGB + R0
G' = Ag RGB + G0
B' = Ab RGB + B0
where Ar, Ag, Ab are vectors.
There are twelve unknown coefficients so you need N≥4. If N>4, you can resort to least-squares fitting, also easy in the linear case.
In case the affine model is insufficient, you can try a polynomial model such as a quadric one (requires N≥10).
Related
I computed derivatives using different methods such as :
convolution with an array [[-1, 1]].
Using the fourier theorem by computing DFT of the image and the array mentioned above, multiplying them and performing IDFT.
Directly through the derivative formula (Computing Fourier, multiplying by index and a constant and computing the inverse).
All methods seem to work almost identically, but have slight differences.
An explanation why they end up with slightly different results would be appreciated.
After computing those I started playing with the result to learn about it, and I found out something that confused me:
The main thing that baffles me is that when I try computing the median of this derivative, its ALWAYS 0.0.
Why is that?
I added the code I used to compute this (the first method at least) because maybe I'm doing something wrong.
from scipy.signal import convolve2d
im = sl.read_image(r'C:\Users\ahhal\Desktop\Essentials\Uni\year3\SemesterA\ImageProcessing\Exercises\Ex2\external\monkey.jpg', 1)
b = [[-1, 1]]
print(np.median(convolve2d(im, b)))
output: 0.0
The read_image function is my own and this is the implementation:
from imageio import imread
from skimage.color import rgb2gray
import numpy as np
def read_image(filename, representation):
"""
Receives an image file and converts it into one of two given representations.
:param filename: The file name of an image on disk (could be grayscale or RGB).
:param representation: representation code, either 1 or 2 defining wether the output
should be a grayscale image (1) or an RGB image (2). If the input image is grayscale,
we won't call it with representation = 2.
:return: An image, represented by a matrix of type (np.float64) with intensities
normalized to the range [0,1].
"""
assert representation in [1, 2]
# reads the image
im = imread(filename)
if representation == 1: # If the user specified they need grayscale image,
if len(im.shape) == 3: # AND the image is not grayscale yet
im = rgb2gray(im) # convert to grayscale (**Assuming its RGB and not a different format**)
im_float = im.astype(np.float64) # Convert the image type to one we can work with.
if im_float.max() > 1: # If image values are out of bound, normalize them.
im_float = im_float / 255
return im_float
Edit 2:
I tried it on several different images, and got 0.0 at all of them.
The image I'm using in the example is:
I computed derivatives using different methods such as :
convolution with an array [[-1, 1]].
Using the fourier theorem by computing DFT of the image and the array mentioned above, multiplying them and performing IDFT.
Directly through the derivative formula (Computing Fourier, multiplying by index and a constant and computing the inverse).
These derivative methods are all approximate and make different assumptions:
Convolution by [[-1, 1]] computes differences between adjacent elements,
derivative ~= data[n+1] − data[n]
You can interpret this like interpolating the data with a line segment, then taking the derivative of that interpolant:
I(x) = data[n] + (data[n+1] − data[n]) * (x − n)
So the approximation assumes the underlying function is locally linear. You can analyze the error by Taylor expansion to find that the error comes from the ignored higher-order terms. In other words, the approximation is accurate provided the function doesn't have strong nonlinear terms. This is a simple case of finite differences.
This is the same as 1, except with different boundary handling to handle convolution of samples near the edges of the image. By default, scipy.signal.convolve2d does zero padding (though you can use the boundary option to choose some other methods). However when computing the convolution through the DFT, then implicitly the boundary handling is periodic, wrapping around at the image edges. So the results of 1 and 2 differ for a margin of pixels near the edge because of the different boundary handling.
Computing the derivative through multiplying iω under the DFT representation can be interpreted like evaluating the derivative of the sinc interpolation the data. Sinc interpolation assumes the data is band limited. The error comes from spectra beyond the Nyquist frequency. Particularly, if there is a hard jump discontinuity from an object boundary, then the image is not bandlimited and the DFT-based derivative will have substantial error in the vicinity of the jump, appearing as ringing artifacts.
The main thing that baffles me is that when I try computing the median of this derivative, its ALWAYS 0.0.
I don't know why this happened here, but it shouldn't always be the case. For instance if each image row is the unit ramp data[n] = n, then the convolution by [[-1, 1]] is equal to 1 everywhere, except depending on boundary handling possibly not at the edges, so the median is 1.
Pascal already gave a wonderful explanation of the differences between the various approximations to the derivative. So I'll focus here on the "why always 0.0?" question.
The median of the derivative is 0.0 only by approximation. When I compute it, based on the finite difference approximation (method #1), I get -5.15e-5 as the median. Close to zero, but not exactly zero.
The derivative is 0 in uniform (flat) regions of the image such as the out-of-focus background. Other features in the image tend to have both a positive and a negative edge, making the histogram of the derivative image very symmetric:
This symmetry causes the median (as well as the mean) to be close to zero for such an image. However, this is not always the case. For example, if the image is brighter on the left edge than the right edge (or the other way around), then there must be a net gradient across the image, causing the mean or median to be different from zero.
I have a 3d numpy array representing an object with cells as voxels and the voxels having values from 1 to 10. I would like to compress the image (a) to make it smaller and (b) to get a quick idea later on of how complex the image is by compressing it to a minimum level of agreement with the original image.
I have used SVD to do this with 2D images and seeing how many singular values were required but it looks to have difficulty with 3D ones. If e.g. I look at the diagonal terms in the S matrix, they are all zero and I was expecting singular values.
Is there any way I can use svd to compress 3D arrays (e.g. flattening in some way)? Or are other methods more appropriate? If necessary I could probably simplify the voxel values to 0 or 1.
You could essentially apply the same principle to the 3D data without flattening it. There are some algorithms to separate N-dimensional matrices, such as the CP-ALS (using Alternating Least Squares) and this is implemented in the package sktensor. You can use the package to decompose the tensor given a rank:
from sktensor import dtensor, cp_als
T = dtensor(X)
rank = 5
P, fit, itr, exectimes = cp_als(T, rank, init='random')
With X being your data. You could then use the weights weights = P.lmbda to reconstruct the original array X and calculate the reconstruction error, as you would do with SVD.
Other decomposition methods for 3D data (or in general tensors) include the Tucker Decomposition or the Canonical Decomposition (also available in the same package).
It is not directly a 3D SVD, but all the methods above can be used to analyze the principal components of your data.
Find bellow (just for completeness) an image of the tucker decomposition:
And bellow another image of the decomposition that CP-ALS (optimization algorithm) tries to obtain:
Image credits to:
1- http://www.slideshare.net/KoheiHayashi1/talk-in-jokyonokai-12989223
2- http://www.bsp.brain.riken.jp/~zhougx/tensor.html
What you want is a higher order svd/Tucker decomposition.
In the 3D case, you will get three projection matrices (one for each dimension) and a low rank core tensor (a 3D array).
You can do this easily using TensorLy:
from tensorly.decomposition import tucker
core, factors = tucker(tensor, ranks=[2, 3, 4])
Here, core will have shape (2, 3, 4) and len(factors) will be 3, one factor for each dimension.
I have a question about the linear interpolation in python\numpy.
I have a 4D array with the data (all data in binary files) that arrange in this way:
t- time (lets say each hour for a month = 720)
Z-levels (lets say Z'=7)
Y-data1 (one for each t and Z)
X-data2 (one for each t and Z)
So, I want to obtain a new Y and X data for the Z'=25 with the same t.
The first thing, I have a small trouble with the right way to read my data from the binary file. Second, I have to interpolate first 3 levels to Z'=15 and others for the other values.
If anyone has an idea how to do it and can help it will be great.
Thank you for your attention!
You can create different interpolation formulas for different combinations of z' and t.
For example, for z=7, and a specific value of t, you can create an interpolation formula:
formula = scipy.interp1d(x,y)
Another one for say z=25 and so on.
Then, given any combination of z and t, you can refer to the specific interpolation formula and do the interpolation.
In 2D for instance there is bilinear interpolation - with an example on the unit square with the z-values 0, 1, 1 and 0.5 as indicated. Interpolated values in between represented by colour:
Then trilinear, and so on...
Follow the pattern and you'll see that you can nest interpolations to any dimension you require...
:)
I am using the Exemplar-Based algorithm by Criminisi. In section 3 of his paper it describes the algorithm. The target region that needs to be inpainted is denoted as Ω (omega). The border or the contour of Ω where it meets the rest of the image (denoted as Φ(phi)), is δΩ (delta omega).
Now on page four of the paper, it states that np(n subscript p) is the normal to the contour of δΩ. and ▽Ip (also includes orthogonal superscript) is the isophote at point p, which is the gradient turned 90 degrees.
My multivariable calculus is rusty, but how do we go about computing np and ▽Ip with python libraries? Also isn't np different for each point p on δΩ?
There are different ways of computing those variables, all depending in your numeric description of that boundary. n_p is the normal direction of the contour.
Generally, if your contour is described with an analytic equation, or if you can write an analytic equation that approximates the contour (e.g. a spline curve that fits 5 points (2 in each side of the point you want), you can derive that spline, compute the tangent line in the point you want using
Then, get a unit vector among that line and get the orthonormal vector to that one. All this is very easy to do (ask if you don't understand).
Then you have the isophone. It looks like its a vector orthonormal of the gradient with its modulus. Computing the directional gradient on an image is very very commonly used technique in image processing. You can get the X and Y derivatives of the image easily (hint: numpy.gradient, or SO python gradient). Then, the total gradient of the image is described as:
So just create a vector with the x and y gradients (taken from numpy.gradient). Then get the orthogonal vector to that one.
NOTE: How to get an orthogonal vector in 2D
[v2x v2y] = [v1y, -v1x]
As part of a digital image processing class, we have been assigned the Inverse Filter for image restoration. I'm using numpy. The variable names below try to follow the names in Digital Image Processing Gonzalez+Woods, 3e.
A zoom of the original image.
.
Gaussian kernel "zz.tif" same size as original image.
Zoom of the gaussian smoothed image with no noise added
f = imtools.load_image( sys.argv[1], mode="L", dtype="float" )
zz = imtools.load_image( "zz.tif", mode="L", dtype="float" )
F = np.fft.fft2( f )
F2 = np.fft.fftshift( F )
# normalize to [0,1]
H = zz/255.
# calculate the damaged image
G = H * F2
# Inverse Filter
F_hat = G / H
# cheat? replace division by zero (NaN) with zeroes
a = np.nan_to_num(F_hat)
f_hat = np.fft.ifft2( np.fft.ifftshift(a) )
imtools.save_image( np.abs(f_hat), "out.tif" )
imtools is just my wrapper using PIL+numpy to load/store images. (Can post that src, too.)
Zoom of the inverse filtered image.
Am I calculating the Inverse Filter correctly? Am I using numpy correctly?
Is the ringing in the final image expected or am I doing something wrong?
Generally, yes you seem to be doing things correctly, as far as I know.
The ringing is due to an overly "sharp" high pass filter, but that's what the method you're using does.
However, you might consider using numpy.fft.rfft2 ("real fft") and numpy.fft.irfft2 instead of numpy.fft.fft2 and numpy.fft.ifft2 because you're dealing purely with real values. It should be slightly faster.
I don't know much about Python but the 'ringing' is normal for the inverse filter. The Gibbs phenomenon lies at the basis of the ringing. Since the input is not entirely smooth but has some discontinuities, an infinite number of Fourier components is in principle needed to represent it completely. A finite number of components is sufficient here since the display resolution is finite, the image is pixelated. However, some information is lost in the recorded image because of the multiplication by zeros in H, by consequence the restored image approximates the input image with components covering a finite bandwidth, lower than that of the display, revealing the Gibbs oscillations.
To mitigate this use proper regularization as with a 2D Wiener filter: F_hat=G * H.conjugate()/(abs(H)2+NSR2) where NSR is an estimate of the noise to signal ratio, e.g. linearly increasing from 0 to 10 at the highest spatial frequency. This will account for the finite signal to noise ratio and when the NSR estimate is close enough you should see little 'ringing' after restoration.