I computed derivatives using several different methods, such as:
1. Convolution with the array [[-1, 1]].
2. Using the convolution theorem: computing the DFT of the image and of the array above, multiplying them, and performing the IDFT.
3. Directly through the derivative formula (computing the Fourier transform, multiplying by the frequency index and a constant, and computing the inverse).
All three methods seem to work almost identically, but there are slight differences. An explanation of why they end up with slightly different results would be appreciated.
After computing those I started playing with the result to learn about it, and I found out something that confused me:
The main thing that baffles me is that when I try computing the median of this derivative, it's ALWAYS 0.0.
Why is that?
I added the code I used to compute this (the first method at least) because maybe I'm doing something wrong.
import numpy as np
from scipy.signal import convolve2d
import sl  # my own helper module; its read_image implementation is shown below

im = sl.read_image(r'C:\Users\ahhal\Desktop\Essentials\Uni\year3\SemesterA\ImageProcessing\Exercises\Ex2\external\monkey.jpg', 1)
b = [[-1, 1]]
print(np.median(convolve2d(im, b)))
output: 0.0
The read_image function is my own and this is the implementation:
from imageio import imread
from skimage.color import rgb2gray
import numpy as np
def read_image(filename, representation):
    """
    Receives an image file and converts it into one of two given representations.
    :param filename: The file name of an image on disk (could be grayscale or RGB).
    :param representation: Representation code, either 1 or 2, defining whether the output
        should be a grayscale image (1) or an RGB image (2). If the input image is grayscale,
        we won't call it with representation = 2.
    :return: An image, represented by a matrix of type np.float64 with intensities
        normalized to the range [0, 1].
    """
    assert representation in [1, 2]
    # read the image from disk
    im = imread(filename)
    if representation == 1:      # the user asked for a grayscale image,
        if len(im.shape) == 3:   # AND the image is not grayscale yet
            im = rgb2gray(im)    # convert to grayscale (assuming it's RGB and not a different format)
    im_float = im.astype(np.float64)  # convert the image to a type we can work with
    if im_float.max() > 1:            # if image values are out of range, normalize them
        im_float = im_float / 255
    return im_float
Edit 2:
I tried it on several different images and got 0.0 for all of them.
The image I'm using in the example is:
I computed derivatives using different methods such as:
1. Convolution with the array [[-1, 1]].
2. Using the convolution theorem: computing the DFT of the image and of the array above, multiplying them, and performing the IDFT.
3. Directly through the derivative formula (computing the Fourier transform, multiplying by the frequency index and a constant, and computing the inverse).
These derivative methods are all approximate and make different assumptions:
Convolution by [[-1, 1]] computes differences between adjacent elements,
derivative ~= data[n+1] − data[n]
You can interpret this as interpolating the data with a line segment, then taking the derivative of that interpolant:
I(x) = data[n] + (data[n+1] − data[n]) * (x − n)
So the approximation assumes the underlying function is locally linear. You can analyze the error by Taylor expansion to find that the error comes from the ignored higher-order terms. In other words, the approximation is accurate provided the function doesn't have strong nonlinear terms. This is a simple case of finite differences.
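A minimal 1-D sketch of the forward-difference idea, on hypothetical data rather than the question's image:
import numpy as np

x = np.linspace(0, 2 * np.pi, 200)
data = np.sin(x)
dx = x[1] - x[0]
fd = np.diff(data) / dx             # (data[n+1] - data[n]) / dx, one sample shorter
exact = np.cos(x[:-1])              # analytic derivative at the left sample of each pair
print(np.max(np.abs(fd - exact)))   # small, and shrinks as the sampling gets finer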
This is the same as 1, except with different boundary handling to handle convolution of samples near the edges of the image. By default, scipy.signal.convolve2d does zero padding (though you can use the boundary option to choose some other methods). However when computing the convolution through the DFT, then implicitly the boundary handling is periodic, wrapping around at the image edges. So the results of 1 and 2 differ for a margin of pixels near the edge because of the different boundary handling.
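A small, hedged check of the boundary-handling difference, using the boundary option of scipy.signal.convolve2d on a hypothetical random image:
import numpy as np
from scipy.signal import convolve2d

im = np.random.rand(8, 8)
k = np.array([[-1.0, 1.0]])

zero_pad = convolve2d(im, k, mode='same', boundary='fill')   # zero padding (the default)
wrap     = convolve2d(im, k, mode='same', boundary='wrap')   # periodic, like the DFT implies

print(np.allclose(zero_pad[:, 1:-1], wrap[:, 1:-1]))   # interior columns match
print(np.allclose(zero_pad, wrap))                     # the edge columns generally do not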
Computing the derivative through multiplying by iω in the DFT representation can be interpreted as evaluating the derivative of the sinc interpolation of the data. Sinc interpolation assumes the data is bandlimited. The error comes from spectra beyond the Nyquist frequency. In particular, if there is a hard jump discontinuity from an object boundary, then the image is not bandlimited and the DFT-based derivative will have substantial error in the vicinity of the jump, appearing as ringing artifacts.
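A hedged 1-D sketch of that DFT-based derivative (multiplying the spectrum by iω), assuming a smooth periodic test signal:
import numpy as np

N = 256
x = np.linspace(0, 2 * np.pi, N, endpoint=False)
data = np.sin(3 * x)

omega = np.fft.fftfreq(N, d=x[1] - x[0]) * 2 * np.pi     # angular frequencies
deriv = np.fft.ifft(1j * omega * np.fft.fft(data)).real  # spectral derivative

print(np.max(np.abs(deriv - 3 * np.cos(3 * x))))         # tiny for this smooth, bandlimited signal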
The main thing that baffles me is that when I try computing the median of this derivative, it's ALWAYS 0.0.
I don't know why this happened here, but it shouldn't always be the case. For instance if each image row is the unit ramp data[n] = n, then the convolution by [[-1, 1]] is equal to 1 everywhere, except depending on boundary handling possibly not at the edges, so the median is 1.
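A quick numeric check of that ramp example (the sign of the result depends on whether the library flips the kernel, i.e. convolution vs. correlation):
import numpy as np
from scipy.signal import convolve2d

im = np.tile(np.arange(16.0), (16, 1))        # every row is the unit ramp 0, 1, 2, ...
d = convolve2d(im, [[-1, 1]], mode='valid')   # adjacent differences, interior only
print(np.median(np.abs(d)))                   # 1.0, not 0.0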
Pascal already gave a wonderful explanation of the differences between the various approximations to the derivative. So I'll focus here on the "why always 0.0?" question.
The median of the derivative is 0.0 only by approximation. When I compute it, based on the finite difference approximation (method #1), I get -5.15e-5 as the median. Close to zero, but not exactly zero.
The derivative is 0 in uniform (flat) regions of the image such as the out-of-focus background. Other features in the image tend to have both a positive and a negative edge, making the histogram of the derivative image very symmetric:
This symmetry causes the median (as well as the mean) to be close to zero for such an image. However, this is not always the case. For example, if the image is brighter on the left edge than the right edge (or the other way around), then there must be a net gradient across the image, causing the mean or median to be different from zero.
I'm having a little trouble doing a Fourier deconvolution using NumPy. I'm currently attempting this with a test case of 3 Gaussians so I know exactly what to expect at each end.
What I'm trying to recover is the input signal given the exact form of the filter and the output.
Here, I have used a naive constraint to suppress the high-frequency ends by setting them to zero (because the signals are all Gaussians in Fourier space as well). I expected to recover my original input with a tiny bit of ringing due to this constraint.
# Dummy case: Gaussian convolved with Gaussian
import numpy as np
import matplotlib.pyplot as plt

N = 128
x = np.arange(-5, 5, 10./(2 * N))
epsilon = 1e-18

def gaus(x, sigma):
    return 1./np.sqrt(2*np.pi)/sigma * np.exp(-(x * x)/(2 * sigma**2))

y_g = gaus(x, 0.3)                        # output: Gaussian-blurred signal
y_b = gaus(x, 0.1)                        # Gaussian blur filter
y_i = gaus(x, np.sqrt(0.3**2 - 0.1**2))   # original Gaussian input

f_yg = np.fft.fft(y_g)   # fft of the blurred output
f_yb = np.fft.fft(y_b)   # fft of the filter
f_yi = np.fft.fft(y_i)   # fft of the true input

r_f = (np.fft.fftshift(f_yg)+epsilon)/(np.fft.fftshift(f_yb)+epsilon)  # deconvolve by division in Fourier space
r_f[np.abs(x) > 0.5] = 0   # naive constraint to remove the artifacts, knowing the final form is Gaussian
r_f = np.fft.ifftshift(r_f)
r_if = np.fft.ifft(r_f)

y_gf = np.fft.ifft(f_yg)
y_bf = np.fft.ifft(f_yb)
y_if = np.fft.ifft(f_yi)

plt.plot(x, y_if, label='fft true input')
plt.plot(x, r_if, label='fft recv. input')
plt.legend(framealpha=0.)
plt.show()
Here the orange is the recovered input signal using the deconvolution of the output and the blur.
There are a few questions I have with this:
There is clearly a scaling issue. The only place I can think this might come from is when I applied the naive constraint. Should I renormalize in this step, knowing that 1/sqrt(N) times the integral over my Fourier space is equal to 1?
It looks like the position of the recovered Gaussian is messed up, with half of the curve on either side of the plot. Is this due to the division in Fourier space? How do I recover the original position (or have I done this completely wrong to begin with)?
I've attached the script used to generate the two curves, the original input and recovered input in physical space.
Cheers,
Keven
EDIT: I should add that I have no problem restoring the image using scipy.signal.deconvolve plus some small edits. Does this mean my method here is somehow wrong?
1) As you correctly understood, the need for a scaling is related to the discrete Fourier transform. The best way to derive it is to consider the deconvolution of two uniform unit signals. Their DFT is n, 0, 0, 0, ..., where n is the number of points of the DFT. Hence the ratio r_f is 1, 0, 0, 0, ..., and its backward FFT computed by np.fft.ifft() is 1/n, 1/n, 1/n, ...
The correct signal resulting from the deconvolution should have been 1/T, 1/T, 1/T, ..., where T = 10 is the length of the frame.
As a consequence, the correct scaling to perform the deconvolution is n/T= len(r_f)/10.
r_if=r_if*len(r_if)/10.
2) The deconvolved signal is translated by half a period. This is due to the fact that the Gaussian kernel is centered on the middle of the frame. Simply shift the kernel by half a period and the problem is solved. The function np.fft.fftshift() can be applied to this end:
f_yb = np.fft.fft(np.fft.fftshift(y_b)) #fft the filter
EDIT: To investigate the reason for the translation, let's focus on the case of the deconvolution kernel being a very narrow gaussian distribution, nearly corresponding to Dirac distribution. Your input signal is a gaussian curve, centered at zero, the frame being sampled between -5 and 5. Similarly, the deconvolution kernel is a Dirac centered at zero. As a consequence, the deconvoluted signal must be identical to the input signal: a gaussian curve centered at zero. Nevertheless, the DFT as implemented in FFTW and consequently np.fft.fft() is computed as that of a frame starting at 0 and ending at 10, sampled at points 10j/n where j is in [0..n-1], the frequencies in the Fourier space being k/10 where k in [0..n/2,-n/2+1..-1]. As a consequence, this DFT sees your signal as a gaussian centered at 5 and the deconvolution kernel as a Dirac centered at 5. The convolution of a function f(t) with a Dirac delta(t-t0) centered at t0 is simply the translated function f(t-t0). Hence, the result of the deconvolution as computed by np.fft.fft() is the input signal translated by half a period. Since the input signal is centered at 0 in the [-5,5] frame, the output signal computed by np.fft.fft() is centered at -5 (or equivalently 5 due to periodicity). Shifting the kernel resolves the mismatch between us thinking of the frame as [-5 5] and np.fft.ifft() handling it as if it were [0 10].
Filters are often designed to reduce the effect of high-frequency noise. Deconvolution therefore induces a potential magnification of high-frequency noise. Screening the frequencies as you did is a potential solution. Notice that it is exactly equivalent to convolving the signal with a particular filter!
In the field of tomographic reconstruction, the filtered back-projection algorithm requires applying a ramp filter, which dramatically inflates high-frequency noise. Here, a Wiener filter is proposed: this kind of filter can be designed to minimize the mean square error of the deconvolved signal, given the SNR of the convolved signal. It nevertheless requires some assumptions regarding the power spectral densities of the signal and noise.
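As a rough illustration of the Wiener idea in this 1-D setting, here is a minimal sketch, assuming the blur kernel is sampled on the same grid as the signal and centred in the frame; nsr is a hypothetical constant noise-to-signal estimate, and the scaling discussion in point 1 still applies:
import numpy as np

def wiener_deconvolve(y, kernel, nsr=1e-2):
    # kernel is assumed centred in the middle of the frame and the same length as y
    Y = np.fft.fft(y)
    H = np.fft.fft(np.fft.ifftshift(kernel))   # move the kernel centre to index 0
    G = np.conj(H) / (np.abs(H)**2 + nsr)      # regularised inverse of H
    return np.real(np.fft.ifft(G * Y))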
I am trying to implement an image expansion using the following algorithm:
1. Zero pad: Put a zero pixel between two pixels
2. Convolve with a column vector of a Gaussian, and multiply by 2
3. Convolve with a row vector of a Gaussian and multiply by 2
4. Return blurred image.
Here is the code I used (I used 3 for the Gaussian size):
import numpy as np
from scipy import ndimage

def expand(im, filter_size):
    ''' Takes an image and expands it. '''
    # zero-pad: put a zero pixel between every two pixels
    new_image_shape = (im.shape[0]*2, im.shape[1]*2)
    zero_image = np.zeros(new_image_shape)
    zero_image[1::2, 1::2] = im
    padded_image = zero_image
    gaussian = gaussian_kernel(filter_size)
    # blur with the row kernel, then its transpose, multiplying by 2 each time
    im_blured_x = ndimage.filters.convolve(padded_image, 2*gaussian)
    transpose = gaussian.transpose()
    im_blured = ndimage.filters.convolve(im_blured_x, 2*transpose)
    return im_blured
The following are the original and expanded image respectively:
Original:
Expanded:
This is the implementation of gaussian_kernel; I already verified that for kernel_size=3 it gives [0.25, 0.5, 0.25]:
def gaussian_kernel(kernel_size):
    ''' Takes a kernel size that is odd and returns a 2D gaussian
    kernel of this size. Results are undefined for even sizes.
    Arguments:
        kernel_size - The size of the kernel, an odd integer.
    Returns:
        gaussian - A normalized gaussian kernel of given size.
    '''
    if kernel_size == 1:
        return np.array([0, 1, 0], dtype='float32')
    # Build the kernel by repeatedly convolving [1, 1] with itself
    gaussian = np.array([1, 1], dtype='float32')
    convolved = gaussian
    num_of_convolutions = kernel_size - 2
    for i in range(0, num_of_convolutions):
        convolved = np.convolve(convolved, gaussian)
    # Normalize it
    sum = convolved.sum()
    nor_gaussian = np.divide(convolved, sum, dtype='float32')
    return np.array([nor_gaussian, ]).astype(np.float32)
When using ndimage.filters.convolve without a "mode" argument, it uses "reflect" by default, which means it reflects the values near the edges of the picture, and this seems to cause the problem here. Using mode='constant' instead fixed the issue, as sketched below.
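A minimal sketch of that change, reusing the names from the question's expand() function:
# mode='constant' pads with zeros instead of reflecting the border values
im_blured_x = ndimage.filters.convolve(padded_image, 2 * gaussian, mode='constant')
im_blured = ndimage.filters.convolve(im_blured_x, 2 * transpose, mode='constant')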
I am trying to convert a RGB image to Grayscale using the following paper.
The main algorithm used in the paper is this:
Novel PCA based algorithm to convert images to grayscale
However, when I try to extract eigenvectors from the image I get 500 eigenvalues, instead of 3 as required. As far as I know, an NxN matrix usually gives N eigenvectors, but I am not really sure what I should be doing here to get only 3 eigenvectors.
Any help as to what I should do? Here's my code so far:
import numpy as np
import cv2

def pca_rgb2gray(img):
    """
    NOVEL PCA-BASED COLOR-TO-GRAY IMAGE CONVERSION
    Authors:
    - Ja-Won Seo
    - Seong Dae Kim
    2013 IEEE International Conference on Image Processing
    """
    I_re = cv2.resize(img, (500, 500))
    Iycc = cv2.cvtColor(I_re, cv2.COLOR_BGR2YCrCb)
    Izycc = Iycc - Iycc.mean()
    eigvals = []
    eigvecs = []
    final_im = []
    for i in range(3):
        res = np.linalg.eig(Izycc[:, :, i])
        eigvals.append(res[0])
        eigvecs.append(res[1])
    eignorm = np.linalg.norm(eigvals)
    for i in range(3):
        eigvals[i] /= eignorm
        eigvecs[i] /= np.linalg.norm(eigvecs[i])
        temp = eigvals[i] * np.dot(eigvecs[i], Izycc[:, :, i])
        final_im.append(temp)
    final_im = final_im[0] + final_im[1] + final_im[2]
    return final_im

if __name__ == '__main__':
    img = cv2.imread('image.png')
    gray = pca_rgb2gray(img)
The accepted answer by Ahmed unfortunately has the PCA math wrong, leading to a result quite different from the manuscript. Here are the images screen-captured from the manuscript.
The mean centring and SVD should be done along the other dimension, with the channels treated as the different samples. The mean centring is aimed at getting an average pixel response of zero, not an average channel response of zero.
The linked algorithm also clearly states that the projection of the PCA model involves multiplication of the image by the scores first and this product by the eigenvalues, not the other way round as in the other answer.
For further info on the math see my PCA math answer here
The difference in the code can be seen in the outputs. Since the manuscript did not provide an example output (that I found) there may be subtle differences between the results as the manuscript ones are captured screenshots.
For comparison, here is the downloaded colour file, which is a little more contrasted than the screenshot, so one would expect the same from the output greyscale.
First the result from Ahmed's code:
Then the result from the updated code:
The corrected code (based on Ahmed's for ease of comparison) is
import numpy as np
import cv2
from numpy.linalg import svd, norm
# Read input image
Ibgr = cv2.imread('path/peppers.jpg')
#Convert to YCrCb
Iycc = cv2.cvtColor(Ibgr, cv2.COLOR_BGR2YCR_CB)
# Reshape the H by W by 3 array to a 3 by N array (N = W * H)
Izycc = Iycc.reshape([-1, 3]).T
# Remove the mean across channels for each pixel (the original answer instead used
# Izycc.mean(1)[:, np.newaxis], i.e. separate means per channel).
Izycc = Izycc - Izycc.mean(0)
# Each pixel's variation should centre around 0: separate means for each channel
# is not a mathematically sensible idea here. The original assertion was based on
# a false premise; the mean over the channels of each pixel should be 0.
assert(np.allclose(np.mean(Izycc, 0), 0.0))
# Compute data array's SVD. Ignore the 3rd return value: unimportant in this context.
(U, S, L) = svd(Izycc, full_matrices=False)
# Square the data's singular vectors to get the eigenvalues. Then, normalize
# the three eigenvalues to unit norm and finally, make a diagonal matrix out of
# them.
eigvals = np.diag(S**2 / norm(S**2))
# Eigenvectors are just the left-singular vectors.
eigvecs = U
# Project the YCrCb data onto the principal components and reshape to W by H
# array.
# This was performed incorrectly, the published algorithm shows that the eigenvectors
# are multiplied by the flattened image then scaled by eigenvalues
Igray = np.dot(eigvecs.T, np.dot(eigvals, Izycc)).sum(0).reshape(Iycc.shape[:2])
Igray2 = np.dot(eigvals, np.dot(eigvecs, Izycc)).sum(0).reshape(Iycc.shape[:2])
eigvals3 = eigvals*[1,-1,1]
Igray3 = np.dot(eigvals3, np.dot(eigvecs, Izycc)).sum(0).reshape(Iycc.shape[:2])
eigvals4 = eigvals*[1,-1,-1]
Igray4 = np.dot(eigvals4, np.dot(eigvecs, Izycc)).sum(0).reshape(Iycc.shape[:2])
# Rescale Igray to [0, 255]. This is a fancy way to do this.
from scipy.interpolate import interp1d
Igray = np.floor((interp1d([Igray.min(), Igray.max()],
[0.0, 256.0 - 1e-4]))(Igray))
Igray2 = np.floor((interp1d([Igray2.min(), Igray2.max()],
[0.0, 256.0 - 1e-4]))(Igray2))
Igray3 = np.floor((interp1d([Igray3.min(), Igray3.max()],
[0.0, 256.0 - 1e-4]))(Igray3))
Igray4 = np.floor((interp1d([Igray4.min(), Igray4.max()],
[0.0, 256.0 - 1e-4]))(Igray4))
# Make sure we don't accidentally produce a photographic negative (flip image
# intensities). N.B.: `norm` is often expensive; in real life, try to see if
# there's a more efficient way to do this.
if norm(Iycc[:,:,0] - Igray) > norm(Iycc[:,:,0] - (255.0 - Igray)):
    Igray = 255 - Igray
if norm(Iycc[:,:,0] - Igray2) > norm(Iycc[:,:,0] - (255.0 - Igray2)):
    Igray2 = 255 - Igray2
if norm(Iycc[:,:,0] - Igray3) > norm(Iycc[:,:,0] - (255.0 - Igray3)):
    Igray3 = 255 - Igray3
if norm(Iycc[:,:,0] - Igray4) > norm(Iycc[:,:,0] - (255.0 - Igray4)):
    Igray4 = 255 - Igray4
# Display and save results
if True:
    import pylab
    pylab.ion()
    fGray = pylab.imshow(Igray, cmap='gray')
    cv2.imwrite('peppers-gray.png', Igray.astype(np.uint8))     # Save result
    fGray2 = pylab.imshow(Igray2, cmap='gray')
    cv2.imwrite('peppers-gray2.png', Igray2.astype(np.uint8))   # Save result
    fGray3 = pylab.imshow(Igray3, cmap='gray')
    cv2.imwrite('peppers-gray3.png', Igray3.astype(np.uint8))   # Save result
    fGray4 = pylab.imshow(Igray4, cmap='gray')
    cv2.imwrite('peppers-gray4.png', Igray4.astype(np.uint8))   # Save result
EDIT:
Following Nazlok's query about the instability of eigenvector direction (the direction any one eigenvector is oriented in is arbitrary, so there is no guarantee that different algorithms, or a single algorithm without a reproducible standardisation step for orientation, would give the same result), I have now added two extra examples, where I have simply switched the sign of the eigenvectors (number 2, and numbers 2 and 3). The results are again different, with the switching of only PC2 giving a much lighter tone, while switching 2 and 3 gives a similar result (not surprising, as the exponential scaling relegates the influence of PC3 to very little). I'll leave that last one for people bothered to run the code.
Conclusion
Without clear additional steps taken to provide a repeatable and reproducible orientation of the PCs, this algorithm is unstable and I personally would not be comfortable employing it as is. Nazlok's suggestion of using the balance of positive and negative intensities could provide a rule, but it would need to be validated, so it is out of scope of this answer. Such a rule, however, would not guarantee a 'best' solution, just a stable one. Eigenvectors are unit vectors, so they are balanced in variance (square of intensity). Which side of zero has the largest sum of magnitudes only tells us which side has individual pixels contributing larger variances, which I suspect is generally not very informative.
Background
When Seo and Kim ask for lambda_i, v_i <- PCA(Iycc), for i = 1, 2, 3, they want:
from numpy.linalg import eig
lambdas, vs = eig(np.dot(Izycc, Izycc.T))
for a 3×N array Izycc. That is, they want the three eigenvalues and eigenvectors of the 3×3 covariance matrix of Izycc, the 3×N array (for you, N = 500*500).
However, you almost never want to compute the covariance matrix, then find its eigendecomposition, because of numerical instability. There is a much better way to get the same lambdas, vs, using the singular value decomposition (SVD) of Izycc directly (see this answer). The code below shows you how to do this.
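A small, hedged check of that equivalence on a hypothetical zero-mean 3-by-N array (not the actual image data): the eigenvalues of dot(X, X.T) are the squared singular values of X, and the eigenvectors are the left-singular vectors, up to ordering and sign.
import numpy as np
from numpy.linalg import eig, svd

X = np.random.randn(3, 1000)
X = X - X.mean(1)[:, np.newaxis]     # zero-mean rows, like Izycc

lambdas, vs = eig(np.dot(X, X.T))    # eigendecomposition of the 3x3 covariance-like matrix
U, S, _ = svd(X, full_matrices=False)

print(np.allclose(np.sort(lambdas.real), np.sort(S ** 2)))   # same eigenvalues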
Just show me the code
First download http://cadik.posvete.cz/color_to_gray_evaluation/img/155_5572_jpg/155_5572_jpg.jpg and save it as peppers.jpg.
Then, run the following:
import numpy as np
import cv2
from numpy.linalg import svd, norm
# Read input image
Ibgr = cv2.imread('peppers.jpg')
# Convert to YCrCb
Iycc = cv2.cvtColor(Ibgr, cv2.COLOR_BGR2YCR_CB)
# Reshape the H by W by 3 array to a 3 by N array (N = W * H)
Izycc = Iycc.reshape([-1, 3]).T
# Remove mean along Y, Cr, and Cb *separately*!
Izycc = Izycc - Izycc.mean(1)[:, np.newaxis]
# Make sure we're dealing with zero-mean data here: the mean for Y, Cr, and Cb
# should separately be zero. Recall: Izycc is 3 by N array.
assert(np.allclose(np.mean(Izycc, 1), 0.0))
# Compute data array's SVD. Ignore the 3rd return value: unimportant.
(U, S) = svd(Izycc, full_matrices=False)[:2]
# Square the data's singular vectors to get the eigenvalues. Then, normalize
# the three eigenvalues to unit norm and finally, make a diagonal matrix out of
# them. N.B.: the scaling factor of `norm(S**2)` is, I believe, arbitrary: the
# rest of the algorithm doesn't really care if/how the eigenvalues are scaled,
# since we will rescale the grayscale values to [0, 255] anyway.
eigvals = np.diag(S**2 / norm(S**2))
# Eigenvectors are just the left-singular vectors.
eigvecs = U;
# Project the YCrCb data onto the principal components and reshape to W by H
# array.
Igray = np.dot(eigvecs.T, np.dot(eigvals, Izycc)).sum(0).reshape(Iycc.shape[:2])
# Rescale Igray to [0, 255]. This is a fancy way to do this.
from scipy.interpolate import interp1d
Igray = np.floor((interp1d([Igray.min(), Igray.max()],
[0.0, 256.0 - 1e-4]))(Igray))
# Make sure we don't accidentally produce a photographic negative (flip image
# intensities). N.B.: `norm` is often expensive; in real life, try to see if
# there's a more efficient way to do this.
if norm(Iycc[:,:,0] - Igray) > norm(Iycc[:,:,0] - (255.0 - Igray)):
    Igray = 255 - Igray
# Display result
if True:
    import pylab
    pylab.ion()
    pylab.imshow(Igray, cmap='gray')
# Save result
cv2.imwrite('peppers-gray.png', Igray.astype(np.uint8))
This produces the following grayscale image, which seems to match the result in Figure 4 of the paper (though see caveat at the bottom of this answer!):
Errors in your implementation
Izycc = Iycc - Iycc.mean() WRONG. Iycc.mean() flattens the image and computes the mean. You want Izycc such that the Y channel, Cr channel, and Cb channel all have zero-mean. You could do this in a for dim in range(3)-loop, but I did it above with array broadcasting. I also have an assert above to make sure this condition holds. The trick where you get the eigendecomposition of the covariance matrix from the SVD of the data array requires zero-mean Y/Cr/Cb channels.
np.linalg.eig(Izycc[:,:,i]) WRONG. The contribution of this paper is to use principal components to convert color to grayscale. This means you have to combine the colors. The processing you were doing above was on a channel-by-channel basis—no combination of colors. Moreover, it was totally wrong to decompose the 500×500 array: the width/height of the array don’t matter, only pixels. For this reason, I reshape the three channels of the input into 3×whatever and operate on that matrix. Make sure you understand what’s happening after BGR-to-YCrCb conversion and before the SVD.
Not so much an error but a caution: when calling numpy.linalg.svd, the full_matrices=False keyword is important: this makes the “economy-size” SVD, calculating just three left/right singular vectors and just three singular values. The full-sized SVD will attempt to make an N×N array of right-singular vectors: with N = 114270 pixels (293 by 390 image), an N×N array of float64 will be N ** 2 * 8 / 1024 ** 3 or 97 gigabytes.
Final note
The magic of this algorithm is really in a single line from my code:
Igray = np.dot(eigvecs.T, np.dot(eigvals, Izycc)).sum(0) # .reshape...
This is where The Math is thickest, so let’s break it down.
Izycc is a 3×N array whose rows are zero-mean;
eigvals is a 3×3 diagonal array containing the eigenvalues of the covariance matrix dot(Izycc, Izycc.T) (as mentioned above, computed via a shortcut, using SVD of Izycc),
eigvecs is a 3×3 orthonormal matrix whose columns are the eigenvectors corresponding to those eigenvalues of that covariance.
Because these are Numpy arrays and not matrixes, we have to use dot(x,y) for matrix-matrix-multiplication, and then we use sum, and both of these obscure the linear algebra. You can check for yourself but the above calculation (before the .reshape() call) is equivalent to
np.ones([1, 3]) · eigvecs.T · eigvals · Izycc = dot([[-0.79463857, -0.18382267, 0.11589724]], Izycc)
where · is true matrix-matrix-multiplication, and the sum is replaced by pre-multiplying by a row-vector of ones. Those three numbers,
-0.79463857 multiplying each pixels’s Y-channel (luma),
-0.18382267 multiplying Cr (red-difference), and
0.11589724 multiplying Cb (blue-difference),
specify the “perfect” weighted average, for this particular image: each pixel’s Y/Cr/Cb channels are being aligned with the image’s covariance matrix and summed. Numerically speaking, each pixel’s Y-value is slightly attenuated, its Cr-value is significantly attenuated, and its Cb-value is even more attenuated but with an opposite sign. This makes sense: we expect the luma to be most informative for a grayscale image, so its contribution is the highest.
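A small numeric check of that equivalence, on hypothetical stand-in arrays rather than the peppers image:
import numpy as np

Izycc = np.random.randn(3, 100)                    # stand-in zero-mean data
eigvals = np.diag(np.random.rand(3))               # stand-in diagonal eigenvalue matrix
eigvecs, _ = np.linalg.qr(np.random.randn(3, 3))   # some orthonormal 3x3 matrix

lhs = np.dot(eigvecs.T, np.dot(eigvals, Izycc)).sum(0)
weights = np.dot(np.ones([1, 3]), np.dot(eigvecs.T, eigvals))   # 1x3 row of weights
rhs = np.dot(weights, Izycc).ravel()
print(np.allclose(lhs, rhs))   # True: summing rows equals pre-multiplying by a row of ones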
Minor caveat
I’m not really sure where OpenCV’s RGB to YCrCb conversion comes from. The documentation for cvtColor, specifically the section on RGB ↔︎ YCrCb JPEG doesn’t seem to correspond to any of the transforms specified on Wikipedia. When I use, say, the Colorspace Transformations Matlab package to just do the RGB to YCrCb conversion (which cites the Wikipedia entry), I get a nicer grayscale image which appears to be more similar to the paper’s Figure 4:
I’m totally out of my depth when it comes to these color transformations—if someone can explain how to get Wikipedia or Matlab’s Colorspace Transformations equivalents in Python/OpenCV, that’d be very kind. Nonetheless, this caveat is about preparing the data. After you make Izycc, the 3×N zero-mean data array, the above code fully-specifies the remaining processing.
As part of a digital image processing class, we have been assigned the Inverse Filter for image restoration. I'm using numpy. The variable names below try to follow the names in Digital Image Processing Gonzalez+Woods, 3e.
A zoom of the original image.
Gaussian kernel "zz.tif" same size as original image.
Zoom of the gaussian smoothed image with no noise added
import sys
import numpy as np
import imtools  # my own wrapper using PIL + numpy to load/store images

f = imtools.load_image( sys.argv[1], mode="L", dtype="float" )
zz = imtools.load_image( "zz.tif", mode="L", dtype="float" )
F = np.fft.fft2( f )
F2 = np.fft.fftshift( F )
# normalize to [0,1]
H = zz/255.
# calculate the damaged image
G = H * F2
# Inverse Filter
F_hat = G / H
# cheat? replace division by zero (NaN) with zeroes
a = np.nan_to_num(F_hat)
f_hat = np.fft.ifft2( np.fft.ifftshift(a) )
imtools.save_image( np.abs(f_hat), "out.tif" )
imtools is just my wrapper using PIL+numpy to load/store images. (Can post that src, too.)
Zoom of the inverse filtered image.
Am I calculating the Inverse Filter correctly? Am I using numpy correctly?
Is the ringing in the final image expected or am I doing something wrong?
Generally, yes you seem to be doing things correctly, as far as I know.
The ringing is due to an overly "sharp" high pass filter, but that's what the method you're using does.
However, you might consider using numpy.fft.rfft2 ("real fft") and numpy.fft.irfft2 instead of numpy.fft.fft2 and numpy.fft.ifft2 because you're dealing purely with real values. It should be slightly faster.
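A hedged sketch of that suggestion, on a stand-in array rather than the actual image; note the half-spectrum has shape (M, N//2 + 1), so the filter H would also need to be sampled on that grid.
import numpy as np

f = np.random.rand(64, 64)              # stand-in for the real-valued image
F = np.fft.rfft2(f)                     # half-spectrum; shape (64, 33)
f_back = np.fft.irfft2(F, s=f.shape)    # back to the spatial domain
print(np.allclose(f, f_back))           # True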
I don't know much about Python, but the 'ringing' is normal for the inverse filter. The Gibbs phenomenon lies at the basis of the ringing. Since the input is not entirely smooth but has some discontinuities, an infinite number of Fourier components is in principle needed to represent it completely. A finite number of components is sufficient here since the display resolution is finite: the image is pixelated. However, some information is lost in the recorded image because of the multiplication by zeros in H; as a consequence, the restored image approximates the input image with components covering a finite bandwidth, lower than that of the display, revealing the Gibbs oscillations.
To mitigate this, use proper regularization, as with a 2D Wiener filter: F_hat = G * H.conjugate() / (abs(H)**2 + NSR**2), where NSR is an estimate of the noise-to-signal ratio, e.g. linearly increasing from 0 to 10 at the highest spatial frequency. This will account for the finite signal-to-noise ratio, and when the NSR estimate is close enough you should see little 'ringing' after restoration.
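A minimal, hedged 2-D sketch of that regularised inverse filter, reusing G, H, and imtools from the question's code; NSR is taken as a single hypothetical constant here rather than the frequency-dependent estimate suggested above.
NSR = 0.05                                             # hypothetical constant noise-to-signal estimate
F_hat = G * H.conjugate() / (np.abs(H)**2 + NSR**2)    # replaces F_hat = G / H
f_hat = np.fft.ifft2(np.fft.ifftshift(F_hat))
imtools.save_image(np.abs(f_hat), "out_wiener.tif")    # arbitrary output name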