Geometric mean filter with opencv - python

I want to apply a geometric mean filter to an image in OpenCV (Python). Is there a built-in function, or should I implement the filter myself? What is the most efficient way to implement a nonlinear filter in OpenCV?

Recall from logarithmic identities that
log((x1 * x2 * ... * xn)^(1/n)) = (1/n) * (log(x1) + log(x2) + ... + log(xn))
From Wikipedia:
The geometric mean can also be expressed as the exponential of the arithmetic mean of logarithms. By using logarithmic identities to transform the formula, the multiplications can be expressed as a sum and the power as a multiplication.
This means that a geometric mean can be simply calculated as an arithmetic mean, i.e. a cv2.boxFilter() of the logarithm of the image values. Then you just exponentiate the result and you're done!
For example, let's compute the result both manually and with this method, and compare them. First load the image and define the kernel size:
import cv2
import numpy as np
img = cv2.imread('cameraman.png', cv2.IMREAD_GRAYSCALE).astype(float)
rows, cols = img.shape[:2]
ksize = 5
Next let's pad the image and calculate the geometric mean manually:
padsize = int((ksize-1)/2)
pad_img = cv2.copyMakeBorder(img, *[padsize]*4, cv2.BORDER_DEFAULT)
geomean1 = np.zeros_like(img)
for r in range(rows):
    for c in range(cols):
        geomean1[r, c] = np.prod(pad_img[r:r+ksize, c:c+ksize])**(1/(ksize**2))
geomean1 = np.uint8(geomean1)
cv2.imshow('1', geomean1)
cv2.waitKey()
Looks like what we'd expect. Instead of this, if we use the logarithmic version, all we need to do is take the exponential of the box filter applied to the log of the image:
geomean2 = np.uint8(np.exp(cv2.boxFilter(np.log(img), -1, (ksize, ksize))))
cv2.imshow('2', geomean2)
cv2.waitKey()
Well, they certainly look the same. Actually, I cheated: this is the same uploaded image as above. But that's okay, because:
print(np.array_equal(geomean1, geomean2))
True
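For readers without OpenCV at hand, the same log/box-mean idea can be sketched in plain NumPy. This is a minimal sketch, not the code above: it swaps cv2.boxFilter for an integral-image (summed-area table) box sum, assumes an odd ksize and a non-negative image, and uses NumPy's 'reflect' padding, which does not repeat the edge pixel and so should correspond to OpenCV's BORDER_DEFAULT (BORDER_REFLECT_101). It also handles zero pixels explicitly, since log(0) is -inf: a window containing a zero gets a geometric mean of exactly 0, as the product-based definition gives. The names box_sum and geometric_mean_filter are hypothetical.

```python
import numpy as np


def box_sum(a, ksize):
    """Sum over every ksize x ksize window (ksize odd), reflect-101 borders."""
    pad = ksize // 2
    # np.pad 'reflect' does not repeat the edge pixel (like BORDER_REFLECT_101).
    padded = np.pad(a, pad, mode="reflect")
    # Integral image with a leading row and column of zeros.
    ii = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    ii[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    rows, cols = a.shape
    k = ksize
    # Four-corner lookup gives every window sum in O(1) per pixel.
    return (ii[k:k + rows, k:k + cols] - ii[:rows, k:k + cols]
            - ii[k:k + rows, :cols] + ii[:rows, :cols])


def geometric_mean_filter(img, ksize=5):
    """exp(mean(log(img))) over each window; assumes img >= 0."""
    img = np.asarray(img, dtype=np.float64)
    zeros = img <= 0
    # Replace zeros by 1 (log 1 = 0) so the log stays finite; zeros are
    # accounted for separately by counting them in each window.
    logs = np.log(np.where(zeros, 1.0, img))
    out = np.exp(box_sum(logs, ksize) / ksize**2)
    # Any zero in the window makes the product, hence the geometric mean, zero.
    out[box_sum(zeros.astype(np.float64), ksize) > 0.5] = 0.0
    return out
```

With img and ksize as defined above, np.uint8(geometric_mean_filter(img, ksize)) should agree with geomean1 up to floating-point rounding (the uint8 conversion truncates, so a value sitting right at an integer boundary could occasionally differ by one).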

Related

Fastest method to convolve a gaussian with an image containing a single delta for python

I have images, each containing a single value of 1 (a delta), with a previously known sigma. A reproduction of a single example:
import numpy as np
img = np.zeros((40,40))
idx1 = np.random.randint(0, img.shape[0])
idx2 = np.random.randint(0, img.shape[1])
img[idx1, idx2] = 1
I wish to convolve each image with its respective sigma value, as in:
import scipy.ndimage
out_image = scipy.ndimage.gaussian_filter(img, sigma, mode='constant')
The thing is, since there is only a single delta, the output is just the Gaussian's values substituted into the image, centered on the location of the delta. Would it be faster to implement this directly? If so, how do I generate the Gaussian kernel? Maybe there is a faster sparse representation in skimage or cv2 that could do the job for me?
What is the most efficient way (in terms of execution time) to repeatedly compute this, given that the location of the delta and the sigma change each time?
Why not construct the Gaussian directly?
idx1 = np.random.randint(0, img.shape[0])
idx2 = np.random.randint(0, img.shape[1])
yg, xg = np.mgrid[0:img.shape[0], 0:img.shape[1]]
sigma = 5
img = np.exp(-0.5 / sigma**2 * ((xg - idx2)**2 + (yg - idx1)**2))
img = img / np.sum(img)
UPDATE
Since your Gaussian extends out of bounds, it does not always sum to 1 (due to the tails), so your new image does not sum to 1 either. If you would like to take boundary conditions into account, you would need to calculate the double integral of your Gaussian over the image, which is not straightforward:
https://www.wolframalpha.com/input?i=integrate+%28integrate+e%5E%28-0.5%2Fs%5E2*%28%28x-x_0%29%5E2%2B%28y-y_0%29%5E2%29%29+dx+from+x%3D0+to+w+%29+dy+from+y%3D0+to+h
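If the goal is to reproduce scipy.ndimage.gaussian_filter(img, sigma, mode='constant') for a single delta, the separability of the Gaussian may offer a simpler route than the double integral, at least for the discrete kernel. As I understand SciPy's implementation, it samples a 1-D kernel exp(-x²/(2σ²)) on the integers |x| ≤ int(truncate·σ + 0.5) (default truncate=4.0), normalises it to sum 1, and applies it along each axis with zero padding. The blurred delta is then the outer product of that 1-D kernel with itself, shifted to the delta and cropped to the image. The sketch below (delta_gaussian is a hypothetical helper) builds that directly, touching only the kernel footprint; I have not benchmarked it against gaussian_filter.

```python
import numpy as np


def delta_gaussian(shape, idx1, idx2, sigma, truncate=4.0):
    """Blurred single-delta image, built directly instead of by convolution."""
    radius = int(truncate * sigma + 0.5)
    x = np.arange(-radius, radius + 1)
    k1d = np.exp(-0.5 * (x / sigma) ** 2)
    k1d /= k1d.sum()  # normalised over its own support, not over the image
    out = np.zeros(shape)
    # Kernel footprint, clipped to the image (zero padding outside it).
    r0, r1 = max(idx1 - radius, 0), min(idx1 + radius + 1, shape[0])
    c0, c1 = max(idx2 - radius, 0), min(idx2 + radius + 1, shape[1])
    out[r0:r1, c0:c1] = np.outer(k1d[r0 - idx1 + radius:r1 - idx1 + radius],
                                 k1d[c0 - idx2 + radius:c1 - idx2 + radius])
    return out
```

Usage would be out_image = delta_gaussian(img.shape, idx1, idx2, sigma). Near the border the result sums to less than 1, just like the zero-padded filter.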

Recovering original image after blurring

I have the following code, which applies a Gaussian filter to an arbitrary image 25 times. After each application of the filter, the resulting image is normalized.
import numpy as np
from scipy.signal import convolve

kernel = np.array([[1.0,2.0,1.0],
                   [2.0,4.0,2.0],
                   [1.0,2.0,1.0]])
for i in range(25):
    # handle each component separately
    img[:,:,0] = convolve(img[:,:,0], kernel, mode='same')
    img[:,:,1] = convolve(img[:,:,1], kernel, mode='same')
    img[:,:,2] = convolve(img[:,:,2], kernel, mode='same')
    img = img / 16 # normalize
What is the best way to reverse this process? I.e., if I have a blurred image (the result of executing the code above), how do I get the original back?
Edit 1:
Example (images not included here: original and blurred)
Edit 2:
Attempt at reproducing Cris's answer
I installed dipimage_2.9. I am using macOS 10.14.2 with Matlab R2016a.
It took me a while to get how to specify boundary conditions for convolutions, since DIPimage's convolve.m only accepts image_in and kernel args. I ended up using dip_setboundary for that (DIPimage User Manual section 9.2).
Here's the code (I simply added dip_setboundary where needed, and the origin of the crop region for cut):
% Get data
a = readim('https://i.stack.imgur.com/OfSx2.png'); % using local path in real code
a = a{1}; % Keep only red channel
%% Create kernel
kernel = [1.0,2.0,1.0
2.0,4.0,2.0
1.0,2.0,1.0] / 16;
tmp = deltaim((size(kernel)-1)*25+1);
dip_setboundary('add_zeros');
for ii=1:25
    tmp = convolve(tmp,kernel);
end
kernel = tmp;
%% Apply convolution
dip_setboundary('periodic');
b = convolve(a,kernel);
dip_setboundary('symmetric'); % change back to default
% Find inverse operation
% 1- pad stuff so image and kernel have the same size
% we maintain the periodic boundary condition for image b
b = repmat(b,ceil(imsize(kernel)./imsize(b)));
kernel = extend(kernel,imsize(b));
% 2- apply something similar to Wiener deconvolution
c = real(ift(ft(b)/(ft(kernel)+1e-6))); % Not exactly Wiener, but there's no noise!
% 3- undo padding
c = cut(c,imsize(a), [0, 0]); % upper left corner
Here's the resulting image c:
Let's look at the code in the question for a single channel, assuming img is a gray-scale image -- everything here can be applied per channel, so we don't need to repeat everything three times:
for i in range(25):
    img = ndimage.convolve(img, kernel)
    img = img / 16 # normalize
We'll get to undoing the convolution in a minute. First let's simplify the operation applied.
Simplify the processing
The above is identical (within numerical precision) to:
kernel = kernel / 16 # Normalize
for i in range(25):
    img = ndimage.convolve(img, kernel)
This is true as long as img is not some integer type where clipping and/or rounding occurs. In general, with * denoting convolution and C some constant,
g = C (f * h) = f * (C h)
Next, we know that applying the convolution 25 times is the same as applying the convolution once with a composite kernel,
g = (((f * h) * h) * h) * h = f * (h * h * h * h)
How can we obtain the composite kernel? Applying a convolution to an image that is all zeros and with a 1 in the middle pixel yields the kernel again, so that
delta = np.zeros(kernel.shape)
delta[delta.shape[0]//2, delta.shape[1]//2] = 1
kernel2 = ndimage.convolve(delta, kernel)
print(np.allclose(kernel2, kernel))  # True: equal everywhere, up to numerical precision
Thus, the following code finds the kernel that is used to smooth the image in the question:
kernel = np.array([[1.0,2.0,1.0],
                   [2.0,4.0,2.0],
                   [1.0,2.0,1.0]]) / 16
delta = np.zeros(((kernel.shape[0]-1)*25+1, (kernel.shape[1]-1)*25+1))
delta[delta.shape[0]//2, delta.shape[1]//2] = 1
for i in range(25):
    delta = ndimage.convolve(delta, kernel)
kernel = delta
This kernel will be very similar to a Gaussian kernel due to the central limit theorem.
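The central-limit claim can be checked numerically in one dimension. The 3×3 kernel is separable (it is the outer product of [1, 2, 1]/4 with itself), so the composite kernel is the outer product of a 1-D composite kernel with itself. The sketch below convolves [1, 2, 1]/4 with itself 25 times using full (zero-padded) np.convolve, compares the result with the binomial coefficients C(50, k)/2⁵⁰, and measures the distance to a Gaussian with the same variance. Each pass adds a variance of 0.5, so the composite variance should be 12.5 (σ ≈ 3.54).

```python
import math

import numpy as np

# The question's 3x3 kernel is separable: outer([1, 2, 1], [1, 2, 1]) / 16.
b = np.array([1.0, 2.0, 1.0]) / 4
kernel2d = np.array([[1.0, 2.0, 1.0], [2.0, 4.0, 2.0], [1.0, 2.0, 1.0]]) / 16
print(np.allclose(np.outer(b, b), kernel2d))  # True

# Composite 1-D kernel after 25 passes (full convolution = zero padding).
composite = np.array([1.0])
for _ in range(25):
    composite = np.convolve(composite, b)  # grows by 2 samples per pass -> 51

# 25 passes of [1, 2, 1] / 4 give the binomial distribution B(50, 1/2).
binom = np.array([math.comb(50, k) for k in range(51)]) / 2.0**50
print(np.allclose(composite, binom))  # True

x = np.arange(51) - 25
variance = np.sum(composite * x**2)  # each pass adds 0.5, so 12.5 in total
sigma = math.sqrt(variance)
gauss = np.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
print(variance, np.abs(composite - gauss).max())

# The 2-D composite kernel is the outer product of the 1-D one with itself.
composite2d = np.outer(composite, composite)  # 51 x 51
```

The largest pointwise difference from the Gaussian is small compared with the kernel's peak value of about 0.11, which is the "very similar to a Gaussian" statement in numbers.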
Now we can obtain the same output as in the question with a single convolution:
output = ndimage.convolve(img, kernel)
Inverting the convolution
The process of inverse filtering is called deconvolution. In theory this is a trivial process, but in practice it is very difficult due to noise, inexact knowledge of the kernel, etc.
We know that we can compute the convolution through the Fourier domain:
output = ndimage.convolve(img, kernel, mode='wrap')
is the same as
output = np.real(np.fft.ifft2( np.fft.fft2(img) * np.fft.fft2(np.fft.ifftshift(kernel)) ))
(assuming that kernel is the same size as img; typically we'd have to pad it with zeros first). Any differences between the spatial-domain and frequency-domain results are caused by how the image is extended past its boundary when using convolve. The Fourier method assumes a periodic boundary condition, which is why I used the 'wrap' mode for the convolution here.
The inverse operation is simply a division in the Fourier domain:
img = np.real(np.fft.ifft2( np.fft.fft2(output) / np.fft.fft2(np.fft.ifftshift(kernel)) ))
For this to work, we need to know the exact values of kernel, and no noise may be added in the process. For output computed as above, this should in theory give the exact result.
However, for some kernels the Fourier transform is exactly zero at some frequencies (i.e. np.fft.fft2(np.fft.ifftshift(kernel)) contains zeros). These frequencies cannot be recovered, and dividing by 0 leads to NaN values that spread through the whole image in the inverse transform: the inverse image will be all NaNs.
For a Gaussian kernel there are no zeros, so this should not happen. However, there will be many frequencies where it is very nearly zero. The Fourier transform of output will therefore also have very small values for these elements. The inverse process is then the division of a very small value by another very small value, causing numerical precision issues.
You can see how this process will greatly amplify even a tiny amount of noise, so that the output is made up almost entirely of that noise.
The Wiener deconvolution includes regularization to prevent these issues with noise and numerical imprecision. Basically, you prevent the division by very small numbers by adding a positive value to the Fourier transform of kernel. Wikipedia has a good description of Wiener deconvolution.
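Since the demo below uses MATLAB, here is a rough NumPy sketch of the same Fourier-domain idea, under stated assumptions: periodic boundaries throughout, a kernel smaller than the image, and a regularised inverse of the form conj(K)/(|K|² + nsr). That last form is the Wiener filter with a constant noise-to-signal ratio nsr, which I've swapped in for the demo's 1/(K + ε); with nsr = 0 it reduces to the plain division 1/K. The helper names embed_kernel, blur_periodic and deconvolve are hypothetical. Note that for the kernel in the question, the 1-D factor (1 + cos ω)/2 is exactly zero at ω = π, so on even-sized images the plain division does hit exact zeros.

```python
import numpy as np


def embed_kernel(kernel, shape):
    """Zero-pad a small centred kernel to `shape`, with its centre moved to [0, 0]."""
    big = np.zeros(shape)
    kr, kc = kernel.shape
    r0, c0 = shape[0] // 2 - kr // 2, shape[1] // 2 - kc // 2
    big[r0:r0 + kr, c0:c0 + kc] = kernel
    return np.fft.ifftshift(big)


def blur_periodic(img, kernel):
    """Convolution with periodic ('wrap') boundaries, computed via the FFT."""
    K = np.fft.fft2(embed_kernel(kernel, img.shape))
    return np.real(np.fft.ifft2(np.fft.fft2(img) * K))


def deconvolve(blurred, kernel, nsr=0.0):
    """Wiener-style inverse with a constant noise-to-signal ratio `nsr`."""
    K = np.fft.fft2(embed_kernel(kernel, blurred.shape))
    G = np.conj(K) / (np.abs(K) ** 2 + nsr)  # nsr = 0 gives plain 1 / K
    return np.real(np.fft.ifft2(np.fft.fft2(blurred) * G))
```

On a noise-free, odd-sized image, deconvolve(blur_periodic(img, k), k) with nsr=0 should return img up to rounding; once zeros or noise are involved, a positive nsr trades exactness for stability.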
Demo
I'm using MATLAB with DIPimage 3 here to do a quick demo (much less effort for me than firing up Python and figuring out how to do all of this there). This is the code:
% Get data
a = readim('https://i.stack.imgur.com/OfSx2.png');
a = a{1}; % Keep only red channel
% Create kernel
kernel = [1.0,2.0,1.0
2.0,4.0,2.0
1.0,2.0,1.0] / 16;
tmp = deltaim((size(kernel)-1)*25+1);
for ii=1:25
    tmp = convolve(tmp,kernel,'add zeros');
end
kernel = tmp;
% Apply convolution
b = convolve(a,kernel,'periodic');
% Find inverse operation
% 1- pad stuff so image and kernel have the same size
% we maintain the periodic boundary condition for image b
b = repmat(b,ceil(imsize(kernel)./imsize(b)));
kernel = extend(kernel,imsize(b));
% 2- apply something similar to Wiener deconvolution
c = ift(ft(b)/(ft(kernel)+1e-6),'real'); % Not exactly Wiener, but there's no noise!
% 3- undo padding
c = cut(c,imsize(a),'top left');
This is the output, the top third is the input image, the middle third is the blurred image, the bottom third is the output image:
Important to note here is that I used a periodic boundary condition for the initial convolution, which matches what happens in the Fourier transform. Other boundary conditions will cause artefacts in the inverse transform near the edges. Because the kernel is larger than the image, the whole image would be one big artefact, and you wouldn't be able to recover anything. Also note that, to pad the kernel with zeros to the size of the image, I had to replicate the image, since the kernel is larger than the image. Replicating the image again matches the periodic boundary condition imposed by the Fourier transform. Both of these tricks could be skipped if the input image were much larger than the convolution kernel, as you would expect in a normal situation.
Also note that, without the regularization in the deconvolution, the output is all NaN, because we are dividing very small values by very small values. The Fourier transform of the kernel has a lot of near-zeros in it because the blurring is quite heavy.
Finally, note that adding even a small amount of noise to the blurred image will make it impossible to deconvolve the image in a way that the text can be read. The inverse transform will look very nice, but the text strokes will be distorted enough for the letters to no longer be easily recognizable:
The code above uses DIPimage 3, which doesn't yet have an official binary to install; it needs to be built from source. To run the code using DIPimage 2.x, a few changes are necessary:
The boundary condition must be set using dip_setboundary, instead of being passed directly to the convolve function. The strings 'add zeros' and 'periodic' are the boundary conditions used here.
The ft and ift functions use a symmetric normalization: each multiplies its output by 1/sqrt(prod(imsize(image))), whereas in DIPimage 3 the normalization is the more common multiplication by 1/prod(imsize(image)) for ift, and 1 for ft. This means that the Fourier transform of kernel must be multiplied by sqrt(prod(imsize(kernel))) to match the result of DIPimage 3:
c = real(ift(ft(b)/((ft(kernel)*sqrt(prod(imsize(kernel))))+1e-6)));
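The same normalisation issue shows up in NumPy, whose np.fft functions accept norm="ortho" for the symmetric 1/√N scaling (the default is no scaling on the forward transform and 1/N on the inverse). A small sketch, assuming periodic boundaries and a hypothetical kernel already embedded at image size with its origin at [0, 0]: with the symmetric normalisation, the convolution theorem only holds once the kernel's transform is multiplied by √N, the same √(prod(imsize)) factor as in the DIPimage 2.x line above.

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.uniform(size=(16, 20))
kernel = np.zeros_like(img)
kernel[:2, :2] = 0.25  # hypothetical 2x2 box kernel, origin at [0, 0]
N = img.size

# Default normalisation: forward transform unscaled, inverse scaled by 1/N.
conv_default = np.real(np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(kernel)))

# Symmetric ('ortho') normalisation: both directions scaled by 1/sqrt(N),
# so the kernel's transform needs an extra factor of sqrt(N).
F = np.fft.fft2(img, norm="ortho")
K = np.fft.fft2(kernel, norm="ortho") * np.sqrt(N)
conv_ortho = np.real(np.fft.ifft2(F * K, norm="ortho"))

print(np.allclose(conv_default, conv_ortho))  # True
```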
You can't: blurring loses information through averaging.
Consider a 1-dim example:
Applying [1 2 1] to [1,2,3,4,5,6,7], assuming 0 for missing "pixels" in the convolution,
results in [4, 8, 12, 16, 20, 24, 20]. The 8 could come from [1,2,3] but also from [2,1,4], so you already have 2 different solutions. Whichever you take influences whatever values might have been the source of the 12.
This is an oversimplified example, and this one you can solve, but in image processing you might deal with 3000*2000 pixels and 2D convolutions with 3x3, 5x5, 7x7, ... matrices, which makes reversing the process impractical.
In two dimensions you might still be able to solve it mathematically, but more often than not you get a myriad of solutions and very complex constraints when you apply this to a 2D convolution over 3000*2000 pixels.
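The 1-D example can be checked with NumPy. With zero padding, [1 2 1] acts as a tridiagonal matrix with 2 on the diagonal and 1 on the off-diagonals. For this small case that matrix is invertible, so the blurred signal does pin down a unique original, which is the "you can solve this" part. The catch is conditioning: the eigenvalues of that matrix are 2 + 2cos(kπ/(n+1)) for k = 1..n, so its condition number grows roughly like n², and for large signals tiny errors in the blurred data get amplified enormously. The helper name blur_matrix is hypothetical.

```python
import numpy as np

signal = np.arange(1.0, 8.0)  # [1, 2, ..., 7]
kernel = np.array([1.0, 2.0, 1.0])
blurred = np.convolve(signal, kernel, mode="same")  # zeros beyond the ends
print(blurred)  # values: 4, 8, 12, 16, 20, 24, 20


def blur_matrix(n):
    """Matrix form of 'same' convolution with [1, 2, 1] and zero padding."""
    return 2 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)


# Solving the linear system recovers the original exactly (up to rounding).
recovered = np.linalg.solve(blur_matrix(len(signal)), blurred)
print(recovered)

# ...but the system becomes increasingly ill-conditioned as it grows.
for n in (7, 100, 1000):
    print(n, np.linalg.cond(blur_matrix(n)))
```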

How to denoise an image using least square and regularization?

f = u + n: f is the noisy image, u is the desired reconstruction, and n is noise.
The reconstruction error is ||u-f||_2^2 + lambda * ||gradient(u)||_2^2
Solve ||Ax-b||_2^2, where x is a vector obtained by vectorising f column-wise.
The above is my problem, and I can't understand what "solve ||Ax-b||_2^2" means.
What is A? What is b? How can I get the reconstruction?
I know the simple way of minimizing a least-squares problem using the pseudo-inverse,
but I only know how to apply it to finding θ in ||Aθ-b||^2.
I don't know what I have to do, so I did what I could:
import matplotlib.pyplot as plt
import numpy as np
from scipy import signal
from skimage import io, color
from skimage import exposure
file_image = 'image.jpg'
im_color = io.imread(file_image)
im_gray = color.rgb2gray(im_color)
im = (im_gray - np.mean(im_gray)) / np.std(im_gray)
(row, col) = im.shape
noise_std = 0.2 # try with varying noise standard deviation
noise = np.random.normal(0, noise_std, (row, col))
im_noise = im + noise
I made a noisy image, but I don't know the next step.
Can anyone explain?
This very much looks like a poorly phrased homework question. I have a fair background in mathematical image processing and inverse problems, so I rewrote it for you in the only way it makes sense.
Let f be a noisy image described by the relationship f = u + n, where u is a noise-free image and n is the noise. The goal is to recover u from f. To do this, we introduce the following function
||u - f||²,
which is equal to the sum of the squared differences between all pixels in u and f, to measure the similarity between u and f. Furthermore, we introduce the following function to measure the amount of noise in the image
||Du||²,
where Du(x, y) represents the magnitude of the gradient of u at position (x, y), as a measure of the noise in an image. By ||Du||², we therefore mean the sum of the squared gradient magnitudes over all pixels.
A way to measure how well we have reconstructed the noise-free image can then be represented by the following function
||u - f||² + ||Du||²
Solve the regularised least squares problem above.
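To connect this to the ||Ax - b||² form in the question: write u and f as column-wise vectors, stack the identity on top of √λ times a finite-difference matrix D, and stack f on top of zeros. Then ||Ax - b||² = ||u - f||² + λ||Du||², so A = [I; √λ D] and b = [f; 0], with x the vectorised reconstruction u. The sketch below assumes forward differences in both directions with no wrap-around, and it builds dense matrices, so it is only meant for small images or crops; for a full image you would want sparse matrices (for example scipy.sparse together with scipy.sparse.linalg.lsqr). The names diff_matrix and tikhonov_denoise are hypothetical.

```python
import numpy as np


def diff_matrix(n):
    """(n-1) x n forward differences: (D x)[i] = x[i+1] - x[i]."""
    return np.eye(n - 1, n, k=1) - np.eye(n - 1, n)


def tikhonov_denoise(f, lam):
    """Minimise ||u - f||^2 + lam * ||D u||^2, written as ||A x - b||^2."""
    rows, cols = f.shape
    n = rows * cols
    # Column-wise vectorisation: vec(U) = U.flatten(order="F").
    Dy = np.kron(np.eye(cols), diff_matrix(rows))  # differences down each column
    Dx = np.kron(diff_matrix(cols), np.eye(rows))  # differences along each row
    A = np.vstack([np.eye(n), np.sqrt(lam) * Dy, np.sqrt(lam) * Dx])
    b = np.concatenate([f.flatten(order="F"),
                        np.zeros(Dy.shape[0] + Dx.shape[0])])
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x.reshape((rows, cols), order="F")
```

With the question's variables, something like u = tikhonov_denoise(im_noise[:32, :32], lam=1.0) would denoise a small crop; larger lam gives a smoother result.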

How to extract only 3 eigen vectors of an nxn image in opencv?

I am trying to convert an RGB image to grayscale using the following paper.
The main algorithm used in the paper is this:
Novel PCA based algorithm to convert images to grayscale
However, when I try to extract eigenvectors from the image, I get 500 eigenvalues instead of the 3 required. As far as I know, an NxN matrix usually gives N eigenvectors, but I am not really sure what I should do here to get only 3 eigenvectors.
Any help as to what I should do? Here's my code so far:
import numpy as np
import cv2

def pca_rgb2gray(img):
    """
    NOVEL PCA-BASED COLOR-TO-GRAY IMAGE CONVERSION
    Authors:
    -Ja-Won Seo
    -Seong Dae Kim
    2013 IEEE International Conference on Image Processing
    """
    I_re = cv2.resize(img, (500,500))
    Iycc = cv2.cvtColor(I_re, cv2.COLOR_BGR2YCrCb)
    Izycc = Iycc - Iycc.mean()
    eigvals = []
    eigvecs = []
    final_im = []
    for i in range(3):
        res = np.linalg.eig(Izycc[:,:,i])
        eigvals.append(res[0])
        eigvecs.append(res[1])
    eignorm = np.linalg.norm(eigvals)
    for i in range(3):
        eigvals[i]/=eignorm
        eigvecs[i]/=np.linalg.norm(eigvecs[i])
        temp = eigvals[i] * np.dot(eigvecs[i], Izycc[:,:,i])
        final_im.append(temp)
    final_im = final_im[0] + final_im[1] + final_im[2]
    return final_im

if __name__ == '__main__':
    img = cv2.imread('image.png')
    gray = pca_rgb2gray(img)
The accepted answer by Ahmed unfortunately has the PCA math wrong, leading to a result quite different from the manuscript's. Here are the images, screen-captured from the manuscript.
The mean centring and SVD should be done along the other dimension, with the channels treated as the different samples. The mean centring is aimed at getting an average pixel response of zero, not an average channel response of zero.
The linked algorithm also clearly states that the projection of the PCA model involves multiplication of the image by the scores first and this product by the eigenvalues, not the other way round as in the other answer.
For further info on the math see my PCA math answer here
The difference in the code can be seen in the outputs. Since the manuscript did not provide an example output (that I found) there may be subtle differences between the results as the manuscript ones are captured screenshots.
For comparison, here is the downloaded colour file, which has a little more contrast than the screenshot, so one would expect the same from the output greyscale.
First the result from Ahmed's code:
Then the result from the updated code:
The corrected code (based on Ahmed's for ease of comparison) is
import numpy as np
import cv2
from numpy.linalg import svd, norm
# Read input image
Ibgr = cv2.imread('path/peppers.jpg')
#Convert to YCrCb
Iycc = cv2.cvtColor(Ibgr, cv2.COLOR_BGR2YCR_CB)
# Reshape the H by W by 3 array to a 3 by N array (N = W * H)
Izycc = Iycc.reshape([-1, 3]).T
# Remove the mean of each pixel across Y, Cr, and Cb (the other answer instead
# removed each channel's mean with Izycc.mean(1)[:, np.newaxis])
Izycc = Izycc - Izycc.mean(0)
# The mean across channels is what is required (separate means for each channel
# is not a mathematically sensible idea here): each pixel's variation should
# centre around 0. The original assertion (zero mean for Y, Cr, and Cb
# separately) was based on a false premise. Recall: Izycc is a 3 by N array,
# and the mean for each pixel should now be 0.
assert(np.allclose(np.mean(Izycc, 0), 0.0))
# Compute data array's SVD. Ignore the 3rd return value: unimportant in this context.
(U, S, L) = svd(Izycc, full_matrices=False)
# Square the data's singular values to get the eigenvalues. Then, normalize
# the three eigenvalues to unit norm and finally, make a diagonal matrix out of
# them.
eigvals = np.diag(S**2 / norm(S**2))
# Eigenvectors are just the left-singular vectors.
eigvecs = U
# Project the YCrCb data onto the principal components and reshape to an H by W
# array.
# Igray reproduces the other answer's projection, which was performed incorrectly:
# the published algorithm multiplies the flattened image by the eigenvectors first
# and then scales the result by the eigenvalues, as in Igray2.
Igray = np.dot(eigvecs.T, np.dot(eigvals, Izycc)).sum(0).reshape(Iycc.shape[:2])
Igray2 = np.dot(eigvals, np.dot(eigvecs, Izycc)).sum(0).reshape(Iycc.shape[:2])
eigvals3 = eigvals*[1,-1,1]
Igray3 = np.dot(eigvals3, np.dot(eigvecs, Izycc)).sum(0).reshape(Iycc.shape[:2])
eigvals4 = eigvals*[1,-1,-1]
Igray4 = np.dot(eigvals4, np.dot(eigvecs, Izycc)).sum(0).reshape(Iycc.shape[:2])
# Rescale Igray to [0, 255]. This is a fancy way to do this.
from scipy.interpolate import interp1d
Igray = np.floor((interp1d([Igray.min(), Igray.max()],
[0.0, 256.0 - 1e-4]))(Igray))
Igray2 = np.floor((interp1d([Igray2.min(), Igray2.max()],
[0.0, 256.0 - 1e-4]))(Igray2))
Igray3 = np.floor((interp1d([Igray3.min(), Igray3.max()],
[0.0, 256.0 - 1e-4]))(Igray3))
Igray4 = np.floor((interp1d([Igray4.min(), Igray4.max()],
[0.0, 256.0 - 1e-4]))(Igray4))
# Make sure we don't accidentally produce a photographic negative (flip image
# intensities). N.B.: `norm` is often expensive; in real life, try to see if
# there's a more efficient way to do this.
if norm(Iycc[:,:,0] - Igray) > norm(Iycc[:,:,0] - (255.0 - Igray)):
    Igray = 255 - Igray
if norm(Iycc[:,:,0] - Igray2) > norm(Iycc[:,:,0] - (255.0 - Igray2)):
    Igray2 = 255 - Igray2
if norm(Iycc[:,:,0] - Igray3) > norm(Iycc[:,:,0] - (255.0 - Igray3)):
    Igray3 = 255 - Igray3
if norm(Iycc[:,:,0] - Igray4) > norm(Iycc[:,:,0] - (255.0 - Igray4)):
    Igray4 = 255 - Igray4
# Display result
if True:
    import pylab
    pylab.ion()
    fGray = pylab.imshow(Igray, cmap='gray')
    # Save result
    cv2.imwrite('peppers-gray.png', Igray.astype(np.uint8))
    fGray2 = pylab.imshow(Igray2, cmap='gray')
    # Save result
    cv2.imwrite('peppers-gray2.png', Igray2.astype(np.uint8))
    fGray3 = pylab.imshow(Igray3, cmap='gray')
    # Save result
    cv2.imwrite('peppers-gray3.png', Igray3.astype(np.uint8))
    fGray4 = pylab.imshow(Igray4, cmap='gray')
    # Save result
    cv2.imwrite('peppers-gray4.png', Igray4.astype(np.uint8))
****EDIT*****
Following Nazlok's query about the instability of eigenvector direction: the direction in which any one eigenvector is oriented is arbitrary, so there is no guarantee that different algorithms (or a single algorithm without a reproducible standardisation step for orientation) would give the same result. I have now added two extra examples, where I have simply switched the sign of the eigenvectors (number 2, and numbers 2 and 3). The results are again different, with switching only PC2 giving a much lighter tone, while switching 2 and 3 is similar (not surprising, as the exponential scaling relegates the influence of PC3 to very little). I'll leave that last one for anyone who wants to run the code.
Conclusion
Without clear additional steps taken to provide a repeatable and reproducible orientation of PCs, this algorithm is unstable, and I personally would not be comfortable employing it as is. Nazlok's suggestion of using the balance of positive and negative intensities could provide a rule, but it would need to be validated, so it is out of scope for this answer. Such a rule, however, would not guarantee a 'best' solution, just a stable one. Eigenvectors are unit vectors, so they are balanced in variance (square of intensity). Which side of zero has the largest sum of magnitudes only tells us which side has individual pixels contributing larger variances, which I suspect is generally not very informative.
Background
When Seo and Kim ask for lambda_i, v_i <- PCA(Iycc), for i = 1, 2, 3, they want:
from numpy.linalg import eig
lambdas, vs = eig(np.dot(Izycc, Izycc.T))
for a 3×N array Izycc. That is, they want the three eigenvalues and eigenvectors of the 3×3 covariance matrix of Izycc, the 3×N array (for you, N = 500*500).
However, you almost never want to compute the covariance matrix, then find its eigendecomposition, because of numerical instability. There is a much better way to get the same lambdas, vs, using the singular value decomposition (SVD) of Izycc directly (see this answer). The code below shows you how to do this.
Just show me the code
First download http://cadik.posvete.cz/color_to_gray_evaluation/img/155_5572_jpg/155_5572_jpg.jpg and save it as peppers.jpg.
Then, run the following:
import numpy as np
import cv2
from numpy.linalg import svd, norm
# Read input image
Ibgr = cv2.imread('peppers.jpg')
# Convert to YCrCb
Iycc = cv2.cvtColor(Ibgr, cv2.COLOR_BGR2YCR_CB)
# Reshape the H by W by 3 array to a 3 by N array (N = W * H)
Izycc = Iycc.reshape([-1, 3]).T
# Remove mean along Y, Cr, and Cb *separately*!
Izycc = Izycc - Izycc.mean(1)[:, np.newaxis]
# Make sure we're dealing with zero-mean data here: the mean for Y, Cr, and Cb
# should separately be zero. Recall: Izycc is 3 by N array.
assert(np.allclose(np.mean(Izycc, 1), 0.0))
# Compute data array's SVD. Ignore the 3rd return value: unimportant.
(U, S) = svd(Izycc, full_matrices=False)[:2]
# Square the data's singular values to get the eigenvalues. Then, normalize
# the three eigenvalues to unit norm and finally, make a diagonal matrix out of
# them. N.B.: the scaling factor of `norm(S**2)` is, I believe, arbitrary: the
# rest of the algorithm doesn't really care if/how the eigenvalues are scaled,
# since we will rescale the grayscale values to [0, 255] anyway.
eigvals = np.diag(S**2 / norm(S**2))
# Eigenvectors are just the left-singular vectors.
eigvecs = U
# Project the YCrCb data onto the principal components and reshape to an H by W
# array.
Igray = np.dot(eigvecs.T, np.dot(eigvals, Izycc)).sum(0).reshape(Iycc.shape[:2])
# Rescale Igray to [0, 255]. This is a fancy way to do this.
from scipy.interpolate import interp1d
Igray = np.floor((interp1d([Igray.min(), Igray.max()],
[0.0, 256.0 - 1e-4]))(Igray))
# Make sure we don't accidentally produce a photographic negative (flip image
# intensities). N.B.: `norm` is often expensive; in real life, try to see if
# there's a more efficient way to do this.
if norm(Iycc[:,:,0] - Igray) > norm(Iycc[:,:,0] - (255.0 - Igray)):
    Igray = 255 - Igray
# Display result
if True:
    import pylab
    pylab.ion()
    pylab.imshow(Igray, cmap='gray')
# Save result
cv2.imwrite('peppers-gray.png', Igray.astype(np.uint8))
This produces the following grayscale image, which seems to match the result in Figure 4 of the paper (though see caveat at the bottom of this answer!):
Errors in your implementation
Izycc = Iycc - Iycc.mean() WRONG. Iycc.mean() flattens the image and computes the mean. You want Izycc such that the Y channel, Cr channel, and Cb channel all have zero-mean. You could do this in a for dim in range(3)-loop, but I did it above with array broadcasting. I also have an assert above to make sure this condition holds. The trick where you get the eigendecomposition of the covariance matrix from the SVD of the data array requires zero-mean Y/Cr/Cb channels.
np.linalg.eig(Izycc[:,:,i]) WRONG. The contribution of this paper is to use principal components to convert color to grayscale. This means you have to combine the colors. The processing you were doing above was on a channel-by-channel basis, with no combination of colors. Moreover, it was totally wrong to decompose the 500×500 array: the width/height of the array don’t matter, only pixels. For this reason, I reshape the three channels of the input into 3×whatever and operate on that matrix. Make sure you understand what’s happening after BGR-to-YCrCb conversion and before the SVD.
Not so much an error but a caution: when calling numpy.linalg.svd, the full_matrices=False keyword is important: this makes the “economy-size” SVD, calculating just three left/right singular vectors and just three singular values. The full-sized SVD will attempt to make an N×N array of right-singular vectors: with N = 114270 pixels (293 by 390 image), an N×N array of float64 will be N ** 2 * 8 / 1024 ** 3 or 97 gigabytes.
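Both points from this answer, that the SVD of the zero-mean 3×N data array gives the same eigenvalues and eigenvectors as the eigendecomposition of the 3×3 matrix Izycc·Izyccᵀ, and that full_matrices=False keeps the output small, can be checked on a synthetic array. This is an illustrative sketch: X is a random stand-in for Izycc, and I use np.linalg.eigh (for symmetric matrices) rather than eig so the eigenvalues come out real and sorted. Eigenvectors are only defined up to sign, so they are compared through the absolute values of their dot products.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for Izycc: 3 x N data with correlated channels.
X = rng.normal(size=(3, 3)) @ rng.normal(size=(3, 2000))
X = X - X.mean(axis=1, keepdims=True)  # zero-mean rows (channels)

# Route 1: eigendecomposition of the 3 x 3 matrix X X^T.
lambdas, vs = np.linalg.eigh(X @ X.T)     # ascending order
lambdas, vs = lambdas[::-1], vs[:, ::-1]  # descending, like the SVD

# Route 2: economy-size SVD of X itself.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
print(U.shape, S.shape, Vt.shape)  # (3, 3) (3,) (3, 2000)

print(np.allclose(lambdas, S**2))                        # eigenvalues = squared singular values
print(np.allclose(np.abs(np.sum(vs * U, axis=0)), 1.0))  # same eigenvectors, up to sign
```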
Final note
The magic of this algorithm is really in a single line from my code:
Igray = np.dot(eigvecs.T, np.dot(eigvals, Izycc)).sum(0) # .reshape...
This is where The Math is thickest, so let’s break it down.
Izycc is a 3×N array whose rows are zero-mean;
eigvals is a 3×3 diagonal array containing the eigenvalues of the covariance matrix dot(Izycc, Izycc.T) (as mentioned above, computed via a shortcut, using SVD of Izycc),
eigvecs is a 3×3 orthonormal matrix whose columns are the eigenvectors corresponding to those eigenvalues of that covariance.
Because these are Numpy arrays and not matrices, we have to use dot(x,y) for matrix-matrix-multiplication, and then we use sum, and both of these obscure the linear algebra. You can check for yourself but the above calculation (before the .reshape() call) is equivalent to
np.ones([1, 3]) · eigvecs.T · eigvals · Izycc = dot([[-0.79463857, -0.18382267, 0.11589724]], Izycc)
where · is true matrix-matrix-multiplication, and the sum is replaced by pre-multiplying by a row-vector of ones. Those three numbers,
-0.79463857 multiplying each pixel’s Y-channel (luma),
-0.18382267 multiplying Cr (red-difference), and
0.11589724 multiplying Cb (blue-difference),
specify the “perfect” weighted average, for this particular image: each pixel’s Y/Cr/Cb channels are being aligned with the image’s covariance matrix and summed. Numerically speaking, each pixel’s Y-value is slightly attenuated, its Cr-value is significantly attenuated, and its Cb-value is even more attenuated but with an opposite sign. This makes sense: we expect the luma to be most informative for a grayscale, so its contribution is the highest.
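The claimed equivalence, that the dot/sum one-liner is the same as multiplying Izycc by a single 1×3 row of channel weights, can be checked numerically. A sketch on synthetic data: Izycc here is a random zero-mean stand-in, so the three weights will differ from the numbers quoted for the peppers image.

```python
import numpy as np
from numpy.linalg import norm, svd

rng = np.random.default_rng(1)
# Hypothetical stand-in for the zero-mean 3 x N YCrCb array.
Izycc = rng.normal(size=(3, 500)) * np.array([[30.0], [8.0], [5.0]])
Izycc = Izycc - Izycc.mean(axis=1, keepdims=True)

U, S, _ = svd(Izycc, full_matrices=False)
eigvals = np.diag(S**2 / norm(S**2))
eigvecs = U

# The answer's one-liner (before the reshape).
per_pixel = np.dot(eigvecs.T, np.dot(eigvals, Izycc)).sum(0)

# The same thing as one row of weights: ones(1, 3) . eigvecs^T . eigvals.
weights = np.ones((1, 3)) @ eigvecs.T @ eigvals
print(weights)  # image-specific Y/Cr/Cb weights
print(np.allclose(per_pixel, (weights @ Izycc).ravel()))  # True
```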
Minor caveat
I’m not really sure where OpenCV’s RGB to YCrCb conversion comes from. The documentation for cvtColor, specifically the section on RGB ↔︎ YCrCb JPEG doesn’t seem to correspond to any of the transforms specified on Wikipedia. When I use, say, the Colorspace Transformations Matlab package to just do the RGB to YCrCb conversion (which cites the Wikipedia entry), I get a nicer grayscale image which appears to be more similar to the paper’s Figure 4:
I’m totally out of my depth when it comes to these color transformations; if someone can explain how to get Wikipedia or Matlab’s Colorspace Transformations equivalents in Python/OpenCV, that’d be very kind. Nonetheless, this caveat is about preparing the data. After you make Izycc, the 3×N zero-mean data array, the above code fully specifies the remaining processing.

Sliding Gabor Filter in python

Adapted from the Gabor filter example in skimage, calculating a Gabor filter for an image is easy:
import numpy as np
from scipy import ndimage as nd
from skimage import data
from skimage.util import img_as_float
from skimage.filters import gabor_kernel
brick = img_as_float(data.brick())
kernel = np.real(gabor_kernel(0.15, theta = 0.5 * np.pi,sigma_x=5, sigma_y=5))
filtered = nd.convolve(brick, kernel, mode='reflect')
mean = filtered.mean()
variance = filtered.var()
brick is simply a numpy array. Suppose I have a 5000*5000 numpy array. What I want to achieve is to generate two new 5000*5000 numpy arrays whose pixels are the mean and variance of the Gabor filter response over the 15*15 window centred on them.
Could anyone help me achieve this?
EDIT
Why did I get downvoted? Anyway, to clarify: I showed an example of how to calculate a Gabor filter on a single image. I would like to calculate a Gabor filter on small square subsets of a very large image (hence the sliding window).
There are no standard methods to do this (that I know of), but you can do it yourself directly.
Each pixel in the convolution is the sum of the values of the shifted Gabor filter times the image pixels. That is, each pixel in the convolution is basically the mean up to a constant normalization factor, so filtered is basically your mean.
The variance is a bit more difficult, since it involves the sum of the squares, and of course you need to calculate the squares before you calculate the sums. But you can do this easily enough by pre-squaring both the image and the kernel, that is:
N = kernel.shape[0]*kernel.shape[1]
mean = nd.convolve(brick, kernel, mode='reflect')/N
var = nd.convolve(brick*brick, kernel*kernel, mode='reflect')/N - mean*mean
If you just want to calculate the sliding average of an image (convolution with a square kernel with all 1's), the fast method is:
import numpy
# fsize is the filter size in pixels
# integrate in the X direction
r_sum = numpy.sum(img[:, :fsize], axis=1)
r_diff = img[:, fsize:] - img[:, :-fsize]
r_int = numpy.cumsum(numpy.hstack((r_sum.reshape(-1,1), r_diff)), axis=1)
# integrate in the Y direction
c_sum = numpy.sum(r_int[:fsize, :], axis=0)
c_diff = r_int[fsize:, :] - r_int[:-fsize, :]
c_int = numpy.cumsum(numpy.vstack((c_sum, c_diff)), axis=0)
# now we have an array of sums, average can be obtained by division
avg_img = c_int / (fsize * fsize)
This method returns an image that is fsize-1 pixels smaller in both directions, so you'll have to take care of border effects yourself. The edgemost pixels are bad anyway, but it is up to you to choose the correct border fill, if you need one. The algorithm is the fastest way to obtain the mean (fewest calculations), and in particular much faster than numpy.convolve.
Similar trickery can be used in calculating the variance, if both the image and its square are averaged as above. Then
npts = fsize * fsize
variance = (rolling_sum(img**2) - rolling_sum(img)**2/npts) / npts
where rolling_sum is a sliding sum (i.e. the algorithm above without the last division). So, only two rolling sums (image and its square) are required to calculate the rolling variance.
(Warning: the code above is untested, it is there just to illustrate the idea.)
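As a tested counterpart to the snippet above, here is a minimal sketch of rolling_sum built on a summed-area table (integral image), which is the same cumulative-sum idea expressed as a four-corner lookup, together with a rolling mean and variance computed as (Σx² − (Σx)²/npts)/npts. Like the snippet, it returns only the windows that fit entirely inside the image, so border handling is still up to you; rolling_mean_var is a hypothetical name. One caution: that variance formula subtracts two large, nearly equal numbers when the mean is large compared with the spread, so rounding can produce tiny negative values; the sketch clips them to zero.

```python
import numpy as np


def rolling_sum(img, fsize):
    """Sum over every fsize x fsize window that fits inside img ('valid' region)."""
    img = np.asarray(img, dtype=np.float64)
    # Summed-area table with a leading row and column of zeros.
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return (ii[fsize:, fsize:] - ii[:-fsize, fsize:]
            - ii[fsize:, :-fsize] + ii[:-fsize, :-fsize])


def rolling_mean_var(img, fsize):
    """Sliding-window mean and (population) variance from two rolling sums."""
    img = np.asarray(img, dtype=np.float64)
    npts = fsize * fsize
    s1 = rolling_sum(img, fsize)
    s2 = rolling_sum(img**2, fsize)
    mean = s1 / npts
    var = (s2 - s1**2 / npts) / npts
    return mean, np.maximum(var, 0.0)  # clip rounding noise below zero
```

For the question's 15*15 window this would be mean, var = rolling_mean_var(image, 15), giving arrays 14 pixels smaller than the input in each direction.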
