I want to transform and align a detected face (320x240) from a CelebA image (1024x1024) using OpenCV's cv2.warpAffine function, but the quality of the transformed image is significantly lower than when I align it by hand in Photoshop. (The left image was transformed in Photoshop, the right image in OpenCV.)
I have used all of the interpolation techniques of OpenCV but none of them came close in quality to Photoshop.
The code I'm using is:
warped = cv2.warpAffine(image, TRANSFORM_MATRIX, (240, 320), flags=cv2.INTER_AREA)
What could be wrong that made the transformed image have such low quality?
Here's a Link to the original 1024x1024 image if needed.
Problem and general solution
You are down-sampling a signal.
The approach is always the same:
lowpass to remove high frequency components
resample/decimate
What not to do
If you don't do the lowpass, you'll get aliasing. You noticed that. Aliasing means the sampling step can completely miss some high frequency component (edge/corner/point/...), giving those strange artefacts. A properly resampled image would not completely lose such high frequency features.
If you do the lowpass after resampling, it won't fix the issue, only hide it. The damage has already been done.
You can convince yourself of both these aspects if you downsample some regular grid of strongly contrasting lines. Try alternating single-pixel lines of black and white for most effect.
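The experiment suggested above can be sketched in a few lines of NumPy. This is only an illustration: the 64x64 size, the factor of 2, and the simple box average used as the lowpass are my own choices, not anything OpenCV does internally.

```python
import numpy as np

# Test pattern: alternating single-pixel black (0) and white (1) columns.
size = 64
image = np.tile(np.array([0.0, 1.0]), (size, size // 2))

factor = 2

# Decimation without a lowpass: keep every second pixel in each direction.
decimated = image[::factor, ::factor]

# Lowpass first (box average over each factor x factor block), then decimate.
blocks = image.reshape(size // factor, factor, size // factor, factor)
averaged = blocks.mean(axis=(1, 3))

print("no lowpass:  ", decimated.min(), decimated.max())
print("with lowpass:", averaged.min(), averaged.max())
```

Without the lowpass every retained sample lands on a black column, so the white lines vanish completely. With the box average the result is a uniform mid-grey, which is what the pattern looks like at half the resolution.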
Implementations
Libraries such as PIL do the lowpass implicitly before resampling.
OpenCV, in general, does not. Even with Lanczos interpolation you cannot skip the lowpass in OpenCV, because its Lanczos kernel has a fixed size and does not widen with the scale factor.
OpenCV has INTER_AREA, which is a linear interpolation, but it additionally sums over all pixels that are in the area between the corner samples (instead of just sampling those four corners). This can spare you the extra lowpass step.
Here's the result of cv.resize(im, (240, 240), interpolation=cv.INTER_AREA):
Here's the result of cv.warpAffine(im, M[:2], (240, 240), flags=cv.INTER_AREA) with M = np.eye(3) * 0.25 (equivalent scaling):
It appears that warpAffine can't do INTER_AREA. That sucks for you :/
If you need to downsample with OpenCV, and it's a power of two, you can use pyrDown. That does the lowpass and decimation... for a factor of two. Repeated application gives you higher powers.
If you need arbitrary downsampling and you don't like INTER_AREA for some reason, you'd have to apply a GaussianBlur to the input. Sigma needs to be (inversely) proportional to the scale factor. There is some relation between the gaussian filter's sigma and the resulting cutoff frequency. You'll want to investigate that some more, if you don't want to pick a value arbitrarily. Check out the kernel for pyrDown, and what gaussian sigma it matches best. That's probably a good value for a scale factor of 0.5, and other factors should be (inversely) proportional.
For simple downscaling, one gaussian blur would be fine. For affine warps and higher transformations, you'd need to apply lowpassing that respects the different scale for every single pixel that is looked up, because their "support" in the source image isn't square any longer, maybe not even rectangular, but an arbitrary quad!
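As a rough starting point for the sigma relation described above, here is a small sketch that measures the spread of the pyrDown kernel. It assumes the documented 5x5 pyrDown kernel, whose 1D factor is [1, 4, 6, 4, 1] / 16, and it matches a Gaussian by second moment, which is just one way to define "matches best". The helper name sigma_for_scale is made up for this example.

```python
import numpy as np

# 1D factor of the 5x5 kernel documented for pyrDown: [1, 4, 6, 4, 1] / 16.
pyrdown_taps = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0
offsets = np.arange(-2, 3)

# Match a Gaussian by its second moment (an assumption; other fits are possible).
sigma_at_half = float(np.sqrt(np.sum(pyrdown_taps * offsets**2)))


def sigma_for_scale(scale):
    """Hypothetical helper: blur sigma inversely proportional to the scale factor."""
    if not 0.0 < scale < 1.0:
        raise ValueError("scale must be between 0 and 1 for down-sampling")
    return sigma_at_half * 0.5 / scale


print(sigma_at_half, sigma_for_scale(0.25))
```

The resulting value could then be passed as sigmaX to cv2.GaussianBlur(image, (0, 0), sigma) on the input before calling warpAffine. Treat it as a first guess to tune by eye, not as a derived optimum.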
What am I not saying?
This goes for down-sampling. If you up-sample, do not lowpass.
Here's the deal. I want to create a mask that visualizes all the changes between two images (GeoTiffs which are converted to 2D numpy arrays).
For that I simply subtract the pixel values and normalize the absolute value of the subtraction:
Since the result will be covered in noise, I use a threshold and remove all pixels with a value below a certain limit.
def treshold(array, thresholdLimit):
    print("Treshold...")
    result = (array > thresholdLimit) * array
    return result
This works without a problem. Now comes the issue. When applying the threshold, outliers remain, which is not intended:
What is a good way to remove those outliers?
Sometimes the outliers are small chunks of pixels, like 5-6 pixels together, how could those be removed?
Additionally, the images I use are about 10000x10000 pixels.
I would appreciate all advice!
EDIT:
Both images are Landsat satellite images, covering the exact same area.
The difference here is that one image shows cloud coverage and the other one is free of clouds.
The bright snakey line in the top right is part of a river that has been covered by a cloud. Since water bodies like the ocean or rivers are depicted black in those images, the difference between the bright cloud and the dark river results in the river showing a high degree of change.
I hope the following images make this clear:
Source tiffs :
Subtraction result:
I also tried to smooth the result of the thresholding by using a median filter, but the result was still covered in outliers:
import numpy as np
from scipy.ndimage import median_filter

def filter(array, limit):
    print("Median-Filter...")
    # size=limit is the edge length of the square median window
    filteredImg = np.array(median_filter(array, size=limit)).astype(np.float32)
    return filteredImg
I would suggest the following:
Before proceeding, please double check that the two images are 100% registered. To check that, overlay them using e.g. different color channels. Even minimal registration errors can render your task impossible.
Smooth both input images slightly (before the subtraction). For that I would suggest you use standard implementations. Play around with the filter parameters to find an acceptable compromise between smoothness (or reduction of graininess of source image 1) and resolution.
Then try to match the image statistics by applying histogram normalization (histogram matching), using the histogram of image 2 as the target for the histogram of image 1. For this you can also use e.g. the OpenCV implementation.
Subtract the images.
If you then still observe obvious noise, look at the histogram of the subtraction result and see if you can relate the noise to intensity outliers. If you can clearly separate signal and noise based on intensity, apply again a thresholding (informed by your histogram). Alternatively (or additionally), if the noise is structurally different from your signal (e.g. clustered), you could look into morphological operations to remove it.
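Two of the steps above can be sketched in plain NumPy: matching the histogram of one image to the other, and removing small clustered specks with a morphological opening. The function names and the 3x3 structuring element are my own choices for illustration. For 10000x10000 images you would probably prefer a library routine (e.g. scipy.ndimage.binary_opening or cv2.morphologyEx with cv2.MORPH_OPEN), since this version keeps nine shifted copies of the mask in memory.

```python
import numpy as np


def match_histogram(source, reference):
    """Remap source intensities so their distribution follows reference."""
    _, src_index, src_counts = np.unique(
        source.ravel(), return_inverse=True, return_counts=True)
    ref_values, ref_counts = np.unique(reference.ravel(), return_counts=True)
    src_cdf = np.cumsum(src_counts) / source.size
    ref_cdf = np.cumsum(ref_counts) / reference.size
    # For each source quantile, look up the reference value at that quantile.
    matched_values = np.interp(src_cdf, ref_cdf, ref_values)
    return matched_values[src_index].reshape(source.shape)


def _shifted_views(padded, shape):
    """All nine 3x3-neighbourhood shifts of a mask padded by one pixel."""
    rows, cols = shape
    return [padded[dy:dy + rows, dx:dx + cols]
            for dy in range(3) for dx in range(3)]


def binary_opening_3x3(mask):
    """Erode, then dilate, with a 3x3 square: drops specks smaller than 3x3."""
    mask = np.asarray(mask, dtype=bool)
    eroded = np.logical_and.reduce(_shifted_views(np.pad(mask, 1), mask.shape))
    return np.logical_or.reduce(_shifted_views(np.pad(eroded, 1), mask.shape))


# Toy difference image: one 6-pixel speck and one genuine 8x8 change region.
difference = np.zeros((20, 20))
difference[2:4, 2:5] = 0.9
difference[8:16, 8:16] = 0.8

keep = binary_opening_3x3(difference > 0.5)
cleaned = difference * keep
print(int(keep.sum()), "pixels kept")
```

The opening removes any cluster that cannot contain a full 3x3 square, so 5-6 pixel chunks disappear while larger regions keep their shape. A larger structuring element removes larger clusters, at the cost of also eating thin structures such as the river.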
I am trying to deblur an image in Python but have run into some problems. Here is what I've tried, but keep in mind that I am not an expert on this topic. According to my understanding, if you know the point spread function, you should be able to deblur the image quite simply by performing a deconvolution. However, this doesn't seem to work and I don't know if I'm doing something stupid or if I just don't understand things correctly. In Mark Newman's Computational Physics book (using Python), he touches on this subject in problem 7.9. In this problem he supplies an image that he deliberately blurred using a Gaussian point spread function (psf), and the objective of the problem is to deblur the image using a Gaussian. This is accomplished by dividing the 2D FFT of the blurred image by the 2D FFT of the psf and then taking the inverse transform. This works reasonably well.
To extend this problem, I wanted to deblur a real image taken with a camera that was deliberately out of focus. So I set up a camera and took two sets of pictures. The first set of pictures were in focus. The first was of a very small LED light in a completely darkened room and the second was of a piece of paper with text on it (using the flash). Then, without changing any of the distances or anything, I changed the focus setting on the camera so that the text was very out of focus. I then took a picture of the text using the flash and took a second picture of the LED (without the flash). Here are the blurred images.
Now, according to my understanding, the image of the blurred point light source should be the point spread function, and as such I should be able to use it to deblur my image. The problem is that when I do so I get an image that just looks like noise. After doing a little research, it seems as though noise can be a big problem when using deconvolution techniques. However, given that I have measured what I believe to be the exact point spread function, I am surprised that noise would be an issue here.
One thing I did try was to replace small values (less than epsilon) in the psf transform with either 1 or with epsilon, and I tried this with a huge range of values for epsilon. This yielded an image that was not just noise, but is also not a deblurred version of the image; it looks like a weird, blurry version of the original (non-blurred) image. Here is an image from my program (you can ignore the value of sigma, which was not used in this program).
I believe I am dealing with a noise issue, but I don't know why and I don't know what to do about it. Any advice would be much appreciated (keeping in mind that I am no expert in this area).
Note that I have deliberately not posted the code because I think that is somewhat irrelevant at this point. But I would be happy to do so if anyone thinks that would be useful. I don't think it's a programming issue because I used the same technique and it works fine when I have the known point spread function (such as when I divide the FFT of the original in-focus image by the FFT of the out-of-focus image and then inverse transform). I just don't understand why I can't seem to use my experimentally measured point spread function.
The problem you have sought to solve is, unfortunately, more difficult than you might expect. Let me explain it in four parts. The first section assumes that you are comfortable with the Fourier transform.
Why you cannot solve this problem with a simple deconvolution.
An outline to how image deblurring can be performed.
Deconvolution by FFT and why it is a bad idea
An alternative method to perform deconvolution
But first, some notation:
I use I to represent an image and K to represent a convolution kernel (the kernel is also called the point spread function, or PSF). I * K is the convolution of the image I with the kernel K. F(I) is the (n-dimensional) Fourier transform of the image I and F(K) is the Fourier transform of the convolution kernel K. Similarly, Fi is the inverse Fourier transform.
Why you cannot solve this problem with a simple deconvolution:
You are correct when you say that we can recover a blurred image Ib = I * K by dividing the Fourier transform of Ib by the Fourier transform of K. However, lens blur is not a convolution blurring operation. It is a modified convolution blurring operation where the blurring kernel K is dependent on the distance to the object you have photographed. Thus, the kernel changes from pixel to pixel.
You might think that this is not an issue with your image, as you have measured the correct kernel at the position of the image. However, this might not be the case, as the part of the image that is far away can influence the part of the image that is close. One way to fix this problem is to crop the image so that it is only the paper that is visible.
Why deconvolution by FFT is a bad idea:
The Convolution Theorem states that I * K = Fi(F(I)F(K)). This theorem leads to the reasonable assumption that if we have an image, Ib = I * K, that is blurred by a convolution kernel K, then we can recover the deblurred image by computing I = Fi(F(Ib)/F(K)).
Before we look at why this is a bad idea, I want to get some intuition for what the Convolution Theorem means. When we convolve an image with a kernel, that is the same as taking the frequency components of the image and multiplying them elementwise with the frequency components of the kernel.
Now, let me explain why it is difficult to deconvolve an image with the FFT. Blurring, by default, removes high-frequency information. Thus, the high frequencies of K must go towards zero. The reason for this is that the high-frequency information of I is lost when it is blurred -- thus, the high-frequency components of Ib must go towards zero. For that to happen, the high-frequency components of K must also go towards zero.
As a result of the high-frequency components of K being almost zero, we see that the high-frequency components of Ib are amplified significantly (as we almost divide by zero) when we deconvolve with the FFT. This is not a problem in the noise-free case.
In the noisy case, however, this is a problem. The reason for this is that noise typically contains a lot of high-frequency information. So when we try to deconvolve Ib, the noise is amplified to an almost infinite extent. This is the reason that deconvolution by the FFT is a bad idea.
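To see the noise amplification for yourself, here is a small synthetic experiment. Everything in it is an assumption made for illustration: the 64x64 test image, the Gaussian PSF with sigma 1.5, the noise level, and the damping constant eps in the regularised division (a simple Wiener-like fix, not a method from the question).

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic sharp image: a bright square on a dark background.
sharp = np.zeros((64, 64))
sharp[24:40, 24:40] = 1.0

# Gaussian PSF centred on pixel (0, 0) with wrap-around, normalised to sum to 1.
coords = np.fft.fftfreq(64) * 64  # 0, 1, ..., 31, -32, ..., -1
yy, xx = np.meshgrid(coords, coords, indexing="ij")
psf = np.exp(-(xx**2 + yy**2) / (2.0 * 1.5**2))
psf /= psf.sum()
psf_ft = np.fft.fft2(psf)

# Blur via the convolution theorem, then add a little sensor-like noise.
blurred = np.real(np.fft.ifft2(np.fft.fft2(sharp) * psf_ft))
noisy = blurred + rng.normal(scale=1e-3, size=blurred.shape)

# Naive inverse filter: divide by F(K), which is almost zero at high frequencies.
naive = np.real(np.fft.ifft2(np.fft.fft2(noisy) / psf_ft))

# Damped division: eps (an arbitrary choice) keeps the gain bounded.
eps = 1e-3
damped_ft = np.fft.fft2(noisy) * np.conj(psf_ft) / (np.abs(psf_ft)**2 + eps)
regularised = np.real(np.fft.ifft2(damped_ft))

naive_error = np.sqrt(np.mean((naive - sharp)**2))
regularised_error = np.sqrt(np.mean((regularised - sharp)**2))
print("naive RMS error:      ", naive_error)
print("regularised RMS error:", regularised_error)
```

Even with noise a thousand times weaker than the signal, the plain division produces an image dominated by amplified noise, while the damped version stays bounded. The damping trades that noise for some remaining blur, which is why it never looks fully sharp.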
Furthermore, you need to consider how the FFT based convolution algorithm deals with boundary conditions. Normally, when we convolve images, the resolution decreases somewhat. This is unwanted behaviour, so we introduce boundary conditions that specify the pixel values of pixels outside the image. Examples of such boundary conditions are:
Pixels outside the image have the same value as the closest pixel inside the image
Pixels outside the image have a constant value (e.g. 0)
The image is part of a periodic signal, thus the row of pixels above the topmost row is equal to the bottom row of pixels.
The final boundary condition often makes sense for 1D signals. For images, however, it makes little sense. Unfortunately, the convolution theorem specifies that periodic boundary conditions are used.
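The periodic behaviour is easy to check on a tiny 1D example (the signal and the two-tap averaging kernel below are made up for the purpose):

```python
import numpy as np

signal = np.array([1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 5.0])
taps = np.array([0.5, 0.5])  # two-tap moving average

# Zero-pad the kernel to the signal length and multiply in the Fourier domain.
kernel = np.zeros_like(signal)
kernel[:taps.size] = taps
fft_result = np.real(np.fft.ifft(np.fft.fft(signal) * np.fft.fft(kernel)))

# Ordinary (linear) convolution, which treats samples outside the signal as zero.
linear_result = np.convolve(signal, taps)[:signal.size]

print(fft_result[0], linear_result[0])
```

The first output sample of the FFT version averages the first sample with the last one, because the signal is treated as wrapping around. In an image that means the top edge bleeds into the bottom edge and the left edge into the right.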
In addition to this, it seems that the FFT based inversion method is significantly more sensitive to erroneous kernels than iterative methods (e.g. Gradient descent and FISTA).
An alternative method to perform deconvolution
It might seem like all hope is lost now, as all images are noisy, and deconvolving will increase the noise. However, this is not the case, as we have iterative methods to perform deconvolution. Let me start by showing you the simplest iterative method.
Let || I ||² be the sum of the squares of all of I's pixels. Solving the equation
Ib = I * K
with respect to I is then equivalent to solving the following optimisation problem:
min L(I) = min ½ ||I * K - Ib||²
with respect to I. This can be done using gradient descent, as the gradient of L is given by
DL = Q * (I * K - Ib)
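where Q is the kernel you get by flipping K along both axes, which corresponds to transposing the convolution operator (this is also called the matched filter in the signal processing literature). For a symmetric kernel, Q equals K.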
Thus, you can get the following iterative algorithm that will deblur an image.
import numpy as np
from scipy.ndimage import convolve

blurred_image = ...  # Load image (2D float array)
kernel = ...  # Load kernel/PSF (2D float array)
# You need to find the learning rate yourself; do a logarithmic line search.
# A small rate will always converge, but slowly. Start with 0.4 and divide
# by 2 every time it fails.
learning_rate = 0.4
maxit = 100

def loss(image):
    return 0.5 * np.sum((convolve(image, kernel) - blurred_image)**2)

def gradient(image):
    # Convolve the residual with the flipped kernel (the matched filter Q)
    return convolve(convolve(image, kernel) - blurred_image, kernel[::-1, ::-1])

deblurred = blurred_image.copy()
for _ in range(maxit):
    deblurred -= learning_rate * gradient(deblurred)
The above method is perhaps the simplest of the iterative deconvolution algorithms. In practice, these are used in the form of so-called regularised deconvolution algorithms. These algorithms work by first specifying a function that measures the amount of noise in an image, e.g. TV(I) (the total variation of I). Then the optimisation procedure is performed on L(I) + w TV(I), where the weight w sets the strength of the regularisation. If you are interested in such algorithms, I recommend reading the FISTA paper by Amir Beck and Marc Teboulle. The paper is quite maths heavy, but you don't need to understand most of it -- only how to implement the TV deblurring algorithm.
In addition to using a regulariser, we use accelerated methods to minimise the loss L(I). One such example is Nesterov accelerated gradient descent. See Adaptive Restart for Accelerated Gradient Schemes by Brendan O'Donoghue, Emmanuel Candes for information about such methods.
An outline to how image deblurring can be performed.
Crop your image so that everything is at the same distance from the camera
Find the convolution kernel the same way you did now (Test your deconvolution algorithm on synthetically blurred images first)
Implement an iterative method to compute the deconvolution
Deconvolve the image.
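The "test on synthetically blurred images first" step could look something like the sketch below. To keep it dependency-free it applies the convolutions through the FFT, which implies periodic boundaries. That is consistent here only because the synthetic blur is generated the same way. The image, the Gaussian kernel and the iteration count are arbitrary choices for the test.

```python
import numpy as np


def circular_convolve(image, kernel_ft):
    """Periodic convolution, applied as a product in the Fourier domain."""
    return np.real(np.fft.ifft2(np.fft.fft2(image) * kernel_ft))


# Synthetic ground truth and a known Gaussian kernel (sigma chosen arbitrarily).
sharp = np.zeros((32, 32))
sharp[12:20, 12:20] = 1.0
coords = np.fft.fftfreq(32) * 32
yy, xx = np.meshgrid(coords, coords, indexing="ij")
kernel = np.exp(-(xx**2 + yy**2) / (2.0 * 1.5**2))
kernel /= kernel.sum()
kernel_ft = np.fft.fft2(kernel)
adjoint_ft = np.conj(kernel_ft)  # Fourier transform of the flipped kernel

blurred = circular_convolve(sharp, kernel_ft)


def loss(image):
    return 0.5 * np.sum((circular_convolve(image, kernel_ft) - blurred)**2)


def gradient(image):
    residual = circular_convolve(image, kernel_ft) - blurred
    return circular_convolve(residual, adjoint_ft)


learning_rate = 0.4
deblurred = blurred.copy()
initial_loss = loss(deblurred)
error_before = np.sqrt(np.mean((deblurred - sharp)**2))
for _ in range(200):
    deblurred -= learning_rate * gradient(deblurred)
final_loss = loss(deblurred)
error_after = np.sqrt(np.mean((deblurred - sharp)**2))

print("loss:", initial_loss, "->", final_loss)
print("RMS error vs. ground truth:", error_before, "->", error_after)
```

Because the ground truth is known, you can watch both the loss and the actual reconstruction error fall. If they do not, the bug is in the algorithm and not in the measured PSF.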
I have a sample of two-dimensional, black-and-white/binary barcodes that have been photographed.
The (colour) photographs typically suffer from all the usual suspects: blurring, distortion, contrast issues/lighting gradients, and erosion.
I am trying to reconstruct the original barcodes, which were once computer-generated pixel arrays of black/white values.
We should be able to exploit the images' spatial-frequency information to infer the dimensions of each pixel. The hope is to use this to better restore the original by convolving the image with such a structuring element defined by the data.
Although this is a very broad topic, I therefore have a very specific question:
What is the best way to establish a structuring element from image data in OpenCV/Python, without using prior knowledge of it?
(Assume for now that the underlying pixel scale is to some good approximation spatially invariant)
Note that I am not trying to execute the whole extraction pipeline: this question is simply about inferring an optimal structuring element from the data.
For example, the spatial kernel could be used as input to an unsharp mask, a la Python unsharp mask
References:
(1-D ideas) http://answers.opencv.org/question/174384/how-to-reconstruct-damaged-barcode, http://www.windytan.com/2016/02/barcode-recovery-using-priori.html
(Similar idea) Finding CheckerBoard Points in opencv for any random ChessBoard( pattern size not known)
(Sort of but not really, and answer-less) OpenCV find image frequencies
(Broad) https://en.wikipedia.org/wiki/Chessboard_detection
One way of doing this is:
Compute the Scharr gradient magnitude representations in both the x and y direction.
Subtract the y-gradient from the x-gradient. By performing this subtraction we are left with regions of the image that have high horizontal gradients and low vertical gradients.
Blur and threshold the image to filter out the noise.
Apply a closing kernel to the thresholded image to close the gaps between vertical stripes of the barcode.
Perform a series of dilations and erosions.
Find the largest contour in the image, which is now presumably the barcode.
More details and complete code can be found in this PyImageSearch blog post.
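The first two steps can be sketched without OpenCV as follows. This is my own NumPy variant, not the blog post's code: it takes absolute gradients before subtracting and clips negative values to zero. The helper names are made up. In OpenCV the gradients would normally come from cv2.Scharr, or cv2.Sobel with ksize=-1.

```python
import numpy as np

# 3x3 Scharr kernel for the horizontal derivative; its transpose gives the vertical one.
SCHARR_X = np.array([[-3.0, 0.0, 3.0],
                     [-10.0, 0.0, 10.0],
                     [-3.0, 0.0, 3.0]])


def correlate3x3(image, kernel):
    """Correlate with a 3x3 kernel, replicating edge pixels at the border."""
    padded = np.pad(np.asarray(image, dtype=float), 1, mode="edge")
    rows, cols = image.shape
    out = np.zeros((rows, cols))
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + rows, dx:dx + cols]
    return out


def stripe_response(gray):
    """High where horizontal gradients dominate, i.e. on vertical stripes."""
    grad_x = np.abs(correlate3x3(gray, SCHARR_X))
    grad_y = np.abs(correlate3x3(gray, SCHARR_X.T))
    return np.clip(grad_x - grad_y, 0.0, None)


# Toy input: vertical stripes, four pixels wide.
stripes = np.tile(np.repeat([0.0, 255.0], 4), (16, 2))
print(stripe_response(stripes).max(), stripe_response(stripes.T).max())
```

Vertical stripes give a strong response, while the same pattern rotated by 90 degrees gives none. That selectivity is what the later blur, threshold and closing steps rely on.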
I have to scale down data to feed into a neural network.
I wanted to use cv2.resize, but there are multiple options with regard to how it interpolates the data to scale it down:
INTER_NEAREST - a nearest-neighbor interpolation
INTER_LINEAR - a bilinear interpolation (used by default)
INTER_AREA - resampling using pixel area relation. It may be a preferred method for image decimation, as it gives moiré-free results. But when the image is zoomed, it is similar to the INTER_NEAREST method.
INTER_CUBIC - a bicubic interpolation over 4x4 pixel neighborhood
INTER_LANCZOS4 - a Lanczos interpolation over 8x8 pixel neighborhood
Has anyone experimented with these, and if so what have you found? Note: I don't have enough time to try out learning with all varieties of interpolation.
For down sampling image data (I presume the data is an image since you're using OpenCV), you can just use area averaging to get good performance in terms of speed and quality (unless the downscaling factor is quite small, where blurring may happen).
Nearest neighbor will drop some cells at regular intervals but will be quite fast since no interpolation is actually performed. However some aliasing is to be expected with most images.
If quality is your main concern, use Lanczos (slower than bicubic, but higher quality images, generally speaking).
Bicubic and bilinear are known to perform quite badly for downscaling images for factors less than 0.5.
Here is my question:
My optical system is made of a camera plus a circular plexiglass "lens" that changes its curvature depending on pressure (radial bending).
This curvature induces a deformation of the image captured by the camera.
To correct this deformation, images need to be calibrated.
Calibration can be made with a grid (chessboard, dots, lines), pressure range has to be discretized with a certain step.
For each pressure step, an image of the grid has to be taken.
Then each image has to be compared to the reference one (P=0), and a transformation matrix has to be computed and stored.
Finally, each image taken during the experiment for a specific pressure has to be corrected by the transformation matrix.
The deformation is non-linear (not only a combination of rotations and translations), but most likely Barrel distortion. (again not induced by the camera)
Which looks like that:
http://en.wikipedia.org/wiki/Distortion_%28optics%29#mediaviewer/File:Barrel_distortion.svg
I found a plugin in ImageJ called BunwarpJ, http://biocomp.cnb.csic.es/~iarganda/bUnwarpJ/
and I basically want to know if there is an equivalent way to produce the same result in Opencv.
(CalibrateCamera won't do the trick)
OpenCV has an undistort function that takes an image, a camera matrix and distortion coefficients, and produces a new image corrected for those coefficients. It can optionally take a new camera matrix as well (if you need to do other transformations on the new image).
I have not used it before, so I can't say exactly what the camera or distortion coefficients are, but as the manual describes:
The function transforms an image to compensate radial and tangential
lens distortion. The function is simply a combination of
initUndistortRectifyMap() (with unity R ) and remap() (with bilinear
interpolation).
So checking out those two functions is a good way to find out.
I believe you misunderstood the manual, perhaps because you seem to think that calibrateCamera does this for you. Instead, calibrateCamera actually returns the camera and distortion coefficients, which you need in order to undistort your image.
Each lens has its own constant coefficients, which in your case means that you'll have to run calibrateCamera for a range of pressures (I assume you control that experimentally?) and then call undistort with the different parameters you get out of your experiments.
A matrix can only capture a linear transformation (or possibly a linear transformation in homogeneous space), not a general distortion.
In my experience any attempt to use a single global transformation formula wouldn't be very accurate (it's not trivial to get just 99.9% accuracy). Even just correcting camera lens distortion this way is difficult if you want high accuracy.
In the past I got good enough results using a sparse global RBF interpolation, but later I moved to an interpolating 2d spline approach; if you can choose your calibration points to be on a regular grid this is the solution I would suggest.
In the end the mapping could be a 2-valued 3d interpolating spline on a regular grid (XY for the image, Z for the pressure; values UV are the pixel coordinates).
Straightening the image once pressure is known is just texture mapping.
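A minimal sketch of that last step, under simplifying assumptions: the calibration is stored as one dense (u, v) lookup map per pressure step, the maps are blended linearly in pressure (a stand-in for the interpolating spline suggested above), and the image is sampled bilinearly. The function names and the toy calibration data are hypothetical. In OpenCV the sampling step is what remap() does.

```python
import numpy as np


def interpolate_maps(pressures, maps, pressure):
    """Blend per-pressure lookup maps (n x 2 x H x W) linearly at one pressure."""
    pressures = np.asarray(pressures, dtype=float)
    upper = int(np.clip(np.searchsorted(pressures, pressure), 1, len(pressures) - 1))
    lower = upper - 1
    weight = (pressure - pressures[lower]) / (pressures[upper] - pressures[lower])
    weight = float(np.clip(weight, 0.0, 1.0))
    return (1.0 - weight) * maps[lower] + weight * maps[upper]


def remap_bilinear(image, map_u, map_v):
    """Sample image at column map_u and row map_v with bilinear interpolation."""
    rows, cols = image.shape
    u = np.clip(map_u, 0, cols - 1)
    v = np.clip(map_v, 0, rows - 1)
    u0 = np.clip(np.floor(u).astype(int), 0, cols - 2)
    v0 = np.clip(np.floor(v).astype(int), 0, rows - 2)
    fu, fv = u - u0, v - v0
    top = (1 - fu) * image[v0, u0] + fu * image[v0, u0 + 1]
    bottom = (1 - fu) * image[v0 + 1, u0] + fu * image[v0 + 1, u0 + 1]
    return (1 - fv) * top + fv * bottom


# Toy calibration: identity at P=0 and a one-pixel horizontal shift at P=2.
rows, cols = 4, 5
v_grid, u_grid = np.meshgrid(np.arange(rows, dtype=float),
                             np.arange(cols, dtype=float), indexing="ij")
calibration_pressures = [0.0, 2.0]
calibration_maps = np.stack([np.stack([u_grid, v_grid]),
                             np.stack([u_grid + 1.0, v_grid])])

image = np.arange(rows * cols, dtype=float).reshape(rows, cols)
map_u, map_v = interpolate_maps(calibration_pressures, calibration_maps, 1.0)
corrected = remap_bilinear(image, map_u, map_v)
print(corrected)
```

In a real setup the maps would come from fitting the detected grid points at each pressure step, and the linear blend would be replaced by the spline in pressure.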