I am trying to sharpen an image by applying unsharp masking, where you subtract the Gaussian-blurred image from your image and then add the difference back to the image.
Here is the code which I ran:
img = cv2.imread('redhat.jpg')
gauss = cv2.GaussianBlur(img,(7,7),0)
diff = img - gauss
sharp = img + diff
cv2_imshow(img)
cv2_imshow(sharp)
original image:
sharp:
Instead of the above code, if I run:
img = cv2.imread('redhat.jpg')
gauss = cv2.GaussianBlur(img,(7,7),0)
sharp = cv2.addWeighted(img, 2, gauss, -1, 0)
cv2_imshow(img)
cv2_imshow(sharp)
then I am getting the correct sharp image:
Can someone explain to me why I got weird results with the first piece of code? Per my understanding, both pieces of code are doing the same mathematical operations.
In diff = img - gauss, the subtraction produces negative values, but the two inputs are of type uint8, so the result is coerced to that same type, which cannot hold negative values; the negative differences wrap around (underflow) instead.
You’d have to convert one of the images to a signed type for this to work. For example:
import numpy as np

gauss = cv2.GaussianBlur(img,(7,7),0)
diff = img.astype(np.int_) - gauss
sharp = np.clip(img + diff, 0, 255).astype(np.uint8)
Using cv2.addWeighted() is more efficient.
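For illustration, a minimal demonstration of NumPy's uint8 wrap-around:

import numpy as np

a = np.array([10], dtype=np.uint8)
b = np.array([20], dtype=np.uint8)
print(a - b)  # [246]: the negative result -10 wraps around modulo 256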
I believe the difference is caused by over/underflow in
diff = img - gauss
If both source images have 8-bit unsigned integer depth, the diff will have the same depth as well, which can cause underflow in the subtraction.
In contrast, addWeighted() performs the operation in double precision and applies a saturating cast to the destination type afterwards (see documentation). That effectively avoids over/underflow during the computation, and the cast automatically trims the values to the supported range of the destination scalar type.
If you still want to use the first approach, either convert the images to floating-point depth, or use big enough signed integers. After the operation, you may need to perform a saturating cast to the destination depth.
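As a minimal sketch of that first approach, assuming a float conversion and a final saturating cast back to uint8:

import cv2
import numpy as np

img = cv2.imread('redhat.jpg')
gauss = cv2.GaussianBlur(img, (7, 7), 0)

# do the arithmetic in float so negative differences survive
img_f = img.astype(np.float32)
diff = img_f - gauss.astype(np.float32)
# saturating cast back to 8-bit for display
sharp = np.clip(img_f + diff, 0, 255).astype(np.uint8)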
I have been trying to learn some image processing with OpenCV in Python. I have a 16-bit image, and I would like to apply a LUT conversion to it without reducing it to 8-bit. From the documentation, I read that the LUT function in OpenCV is applicable only to 8-bit images. Does anyone know of an efficient way to use this function for a 16-bit image?
I have used LUT conversion for 8-bit images. It works alright, but for 16-bit images the following error is thrown: error: (-215:Assertion failed) (lutcn == cn || lutcn == 1) && _lut.total() == 256 && _lut.isContinuous() && (depth == CV_8U || depth == CV_8S) in function 'cv::LUT'.
Later, I found that this is because the LUT function is applicable only to 8-bit images.
As you've already discovered, the implementation of OpenCV's LUT method only supports 8-bit LUTs. However, you can implement your own for arbitrary bit resolutions, and it's actually quite simple: each value in the image is used directly to index into the LUT, which outputs the desired value. Because OpenCV interfaces with NumPy, you can just use the input image to index into the LUT directly to obtain the final output, taking advantage of NumPy array indexing.
First define a LUT - you'll need to ensure it's 16-bit, and I'm assuming you have values that go from 0 to 65535 to respect the 16-bit resolution. Once you do that, use your image to index into the table. Here's an example using gamma adjusting:
import numpy as np

def adjust_gamma(image, gamma=1.0):
    # build a lookup table mapping the pixel values [0, 65535] to
    # their adjusted gamma values
    inv_gamma = 1.0 / gamma
    table = ((np.arange(0, 65536) / 65535.0) ** inv_gamma) * 65535
    # Ensure table is 16-bit
    table = table.astype(np.uint16)
    # Now just index into this with the intensities to get the output
    return table[image]
This applies an inverse gamma adjustment to an input image: we first generate a 16-bit LUT, then the image is used to index directly into it to create the output image. Take note that the input image is also assumed to be 16-bit. If you have any values beyond the 0-65535 range, this will give you an out-of-bounds indexing error.
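As a quick usage sketch (the file names here are placeholders, and I'm assuming cv2.IMREAD_UNCHANGED preserves the file's 16-bit depth):

import cv2

# read without converting down to 8-bit; dtype should be uint16 for a 16-bit file
img16 = cv2.imread('input_16bit.png', cv2.IMREAD_UNCHANGED)
out16 = adjust_gamma(img16, gamma=2.2)
cv2.imwrite('output_16bit.png', out16)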
Note - Multi-channel images
Take note that the above case assumes a single-channel image. If you want to apply this for multi-channel (i.e. RGB images), then you'll need to define a LUT for each channel and apply the LUT to each channel separately. The easiest way to do this would be a for loop across all channels. There are definitely more vectorized ways to do this in one-shot, but I will not diverge from the intent of your question and I want this to be as simple to read as possible.
First define a 2D LUT where each row in this matrix is a single LUT. Specifically, row i corresponds to the LUT to apply to channel i of the image. Once you're finished, loop through the channel dimension and apply the LUT. What we can also do to save some time is to preallocate the output image so that it's all zeroes, then fill in each channel accordingly.
Something like:
# Assume LUT is defined as `table` and it's a 2D NumPy array
output = np.zeros_like(image)
for i in range(image.shape[2]):
    output[..., i] = table[i, image[..., i]]
output will contain the desired result. However, for the special case where the LUT is the same across all channels, you can just use the same 1D LUT you had previously and you can use the same indexing method that I talked about earlier:
output = table[image]
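For completeness, here is a sketch of one way to build the 2D table used in the loop above, with a hypothetical different gamma per channel (the gamma values are purely illustrative):

import numpy as np

gammas = [0.8, 1.0, 1.2]  # example per-channel gammas
ramp = np.arange(0, 65536) / 65535.0
table = np.stack([((ramp ** (1.0 / g)) * 65535).astype(np.uint16) for g in gammas])
# table has shape (3, 65536); row i is the LUT applied to channel i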
I would like to know how to read an HDR image (.hdr), obtaining the pixel values in the RGBE format, quickly and efficiently in Python.
These are some things I tried:
import imageio
img = imageio.imread(hdr_path, format="HDR-FI")
alternatively:
import cv2
img = cv2.imread(hdr_path, flags=cv2.IMREAD_ANYDEPTH)
This reads the image, but gives the values in RGB format.
How do you obtain the fourth channel, the "E" channel, for every pixel, without altering the RGB values?
I would prefer a solution involving only imageio, as I am restricted to using only that module.
If you prefer the RGBE representation over the float representation, you can convert between the two:
import numpy as np

def float_to_rgbe(image, *, channel_axis=-1):
    # ensure channel-last
    image = np.moveaxis(image, channel_axis, -1)

    max_float = np.max(image, axis=-1)
    scale, exponent = np.frexp(max_float)
    scale *= 256.0 / max_float

    image_rgbe = np.empty((*image.shape[:-1], 4))
    image_rgbe[..., :3] = image * scale[..., None]
    image_rgbe[..., -1] = exponent + 128

    # pixels whose maximum component is (near) zero encode as all zeros
    image_rgbe[max_float < 1e-32, :] = 0

    # restore original axis order
    image_rgbe = np.moveaxis(image_rgbe, -1, channel_axis)
    return image_rgbe
(Note: this is based on the RGBE reference implementation (found here) and can be further optimized if it actually is the bottleneck.)
In your comment, you mention "If i parse the numpy array manually and split the channels into an E channel, it takes too much time...", but it is hard to tell why that is the case without seeing the code. The above is O(height*width), which seems reasonable for a pixel-level image processing method.
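For example, a minimal usage sketch, assuming the float image is read via imageio as in your first snippet:

import imageio

img = imageio.imread(hdr_path, format="HDR-FI")  # float RGB image
rgbe = float_to_rgbe(img)                        # (..., 4) array; last channel is E
e_channel = rgbe[..., -1]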
I made a very simple program that reads an image, applies the Sobel filter, and then presents the result with imshow.
import cv2
img = cv2.imread("/home/alex/imagens/train_5.jpg")
sobelx = cv2.Sobel(img, cv2.CV_64F, 1, 0, ksize=3) # x
sobely = cv2.Sobel(img, cv2.CV_64F, 0, 1, ksize=3)
norm = cv2.magnitude(sobelx, sobely)
normUint8 = norm.astype('uint8')
cv2.imshow("img", img)
cv2.imshow("norm", norm)
cv2.imshow("normUint8", normUint8)
print "img=" + str(img.dtype) + ", sobel=" + str(norm.dtype) + ", normUint8=" + str(normUint8.dtype)
cv2.waitKey(0)
cv2.destroyAllWindows()
Here I attach the result.
I expected the results from showing norm and normUint8 to be the same or very similar, because their values differ by less than 1 at each pixel.
Thus, I believe OpenCV is performing some operation before presenting a CV_64FC3 image.
I am interested in finding this operation in order to use it.
Can anyone help with it?
Here I attach the original image I used.
Thanks.
You're feeding a 64FC3 (3 channel, 64bit floats) image to imshow. The documentation for this function states:
The function may scale the image, depending on its depth:
If the image is 8-bit unsigned, it is displayed as is.
If the image is 16-bit unsigned or 32-bit integer, the pixels are divided by 256. That is, the value range [0,255*256] is mapped to [0,255].
If the image is 32-bit floating-point, the pixel values are multiplied by 255. That is, the value range [0,1] is mapped to [0,255].
Even though 64-bit floating-point is not mentioned, we can make a decent assumption that they are handled in the same way as 32-bit floats. If we look at the source code, we find out that the conversion is done by function cvConvertImage. Specifically, on line 622
double scale = src_depth <= CV_8S ? 1 : src_depth <= CV_32S ? 1./256 : 255;
To explain this for those not familiar with the order of the type enums: it's 8U, 8S, 16U, 16S, 32S, 32F, 64F. Hence, bytes get no scaling, the other integer types get division, and the rest (the float types) get multiplication.
Since for display we need an 8bit image, it's important to note that the scaling will be done with saturation (in this case anything over 255 becomes 255, anything below 0 becomes 0).
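In NumPy terms, the effect on your 64F image is roughly the following (a sketch of the behaviour, not OpenCV's actual code):

import numpy as np

# roughly what imshow does to a float image before display:
# multiply by 255, saturate to [0, 255], cast to 8-bit
display = np.clip(norm * 255, 0, 255).astype(np.uint8)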
Now that it's clear what kind of transformation imshow does, let's have a look at why you see those patches of colour in a sea of white.
Since a simple cast of norm to uint8 gives you an image which is not all black, we can safely assume the values of norm are not in the range [0.0, 1.0]. When the values are scaled by 255, anything greater than or equal to 1.0 will become 255 (white). Because this is a 3-channel image, we can end up with places where only some of the channels saturate, and thus we see various patches of colour.
We can simulate this behaviour by the following script:
import numpy as np

b, g, r = cv2.split(norm)
r = np.uint8(np.where(r < 1.0, 0, 255))
g = np.uint8(np.where(g < 1.0, 0, 255))
b = np.uint8(np.where(b < 1.0, 0, 255))
cv2.imwrite('sobel_out.png', cv2.merge([b,g,r]))
We set pixels to black for values < 1.0, everything else gets white. When we combine the planes, we get the following image:
Looks familiar?
Note: I suspect the square pattern comes from the JPEG compression you used for your input.
Dan's answer is excellent and describes why a float image can have unintended display properties. Images need a minimum and maximum to know what is black and white, and this isn't always precisely defined for float images.
For instance, you could use a float image that still has values strictly between 0 and 255 just for calculation precision, to be later rounded to int for display. But it's typical in literature to use 0 and 1 as the minimum and maximum, respectively, for image values, as it makes the math that much simpler; and since you need float values to represent values between 0 and 1, it's simply common that float images use the range 0 to 1. So, OpenCV sticks to this for displaying a float image. It saturates the image in the range 0 to 1 for a float, meaning that it truncates values above and below.
Now if you read in an image, it gets read by default as 8-bit unsigned integers (CV_8UC3 for a 3-channel image). When you apply the Sobel operator, you specified that you wanted a float image in return. This is totally fine, but know that the Sobel operator is a convolution which multiplies multiple values and sums them up, so this operation can give you values larger than what the original image started with. If you used a different return type, then it's possible that these values would get saturated. However, with a float, they won't get saturated until display time. And this is very much on purpose; the Sobel operator can be used on arbitrary matrices, so saturating the values wouldn't always be wanted.
In order to display the image without strange artifacts, you would need to scale the image, either manually like the Stack Overflow answer linked above or by using cv2.normalize(). Or you can just straight cast to another type like you did, saturating the values at their high end.
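For example, a minimal sketch of the cv2.normalize() route, using the norm array from the question (the window name is just illustrative):

import cv2

# map the float magnitude to [0, 255] and convert to 8-bit for display
normDisplay = cv2.normalize(norm, None, 0, 255, cv2.NORM_MINMAX).astype('uint8')
cv2.imshow("normDisplay", normDisplay)
cv2.waitKey(0)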
I am converting an RGB image into YCbCr and then want to compute the Laplacian pyramid for the same. After the color conversion, I am experimenting with the code given in the Image Pyramid tutorial of OpenCV to find the Laplacian pyramid of an image and then reconstruct the original image. However, if I increase the number of levels in my code to a higher number, say 10, then the reconstructed image (after conversion back to RGB) does not look the same as the original image (the image looks blurred; please see the link below for the exact image). I am not sure why this is happening. Is it supposed to happen when the levels increase, or is there anything wrong in the code?
frame = cv2.cvtColor(frame_RGB, cv2.COLOR_BGR2YCR_CB)
height = 10
Gauss = frame.copy()
gpA = [Gauss]
for i in xrange(height):
Gauss = cv2.pyrDown(Gauss)
gpA.append(Gauss)
lbImage = [gpA[height-1]]
for j in xrange(height-1,0,-1):
GE = cv2.pyrUp(gpA[j])
L = cv2.subtract(gpA[j-1],GE)
lbImage.append(L)
ls_ = lbImage[0]
for j in range(1,height,1):
ls_ = cv2.pyrUp(ls_)
ls_ = cv2.add(ls_,lbImage[j])
ls_ = cv2.cvtColor(ls_, cv2.COLOR_YCR_CB2BGR)
cv2.imshow("Pyramid reconstructed Image",ls_)
cv2.waitKey(0)
For reference, please see the reconstructed image and the original image.
Reconstructed Image
Original Image
Don't use cv2.add() or cv2.subtract(). They saturate (clip) the result. Use the direct - and + matrix operators instead. In other words, use:
L = gpA[j-1] - GE
Instead of:
L = cv2.subtract(gpA[j-1],GE)
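Note that the plain operators only preserve negative values if the matrices have a signed or floating-point type; a minimal sketch of that assumption, reusing the names from the question:

import cv2
import numpy as np

# inside the loop from the question, with a float conversion (an assumption, not the
# original code) so that the plain subtraction can hold negative detail values
GE = cv2.pyrUp(gpA[j]).astype(np.float32)
L = gpA[j-1].astype(np.float32) - GE
# after reconstruction, saturate back to 8-bit before converting colour spaces:
# ls_ = np.clip(ls_, 0, 255).astype(np.uint8)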
pyrDown blurs an image and downsamples it, losing some information. The saved pyramid levels (gpA[] here) contain smaller and smaller image matrices, but don't keep the rejected detail information (the high-frequency content).
So the reconstructed image cannot show all of the original details.
From the tutorial:
Note: When we reduce the size of an image, we are actually losing information of the image.
I have a numpy array that I wish to resize using opencv.
Its values range from 0 to 255. If I opt to use cv2.INTER_CUBIC, I may get values outside this range. This is undesirable, since the resized array is supposed to still represent an image.
One solution is to clip the results to [0, 255]. Another is to use a different interpolation method.
It is my understanding that using INTER_AREA is valid for down-sampling an image, but works similarly to nearest neighbor when upsampling, rendering it less than optimal for my purpose.
Should I use INTER_CUBIC (and clip), INTER_AREA, or INTER_LINEAR?
An example of values outside the range when using INTER_CUBIC:
a = np.array( [ 0, 10, 20, 0, 5, 2, 255, 0, 255 ] ).reshape( ( 3, 3 ) )
[[ 0 10 20]
[ 0 5 2]
[255 0 255]]
b = cv2.resize( a.astype('float'), ( 4, 4 ), interpolation = cv2.INTER_CUBIC )
[[ 0. 5.42489886 15.43670964 21.29199219]
[ -28.01513672 -2.46422291 1.62949324 -19.30908203]
[ 91.88964844 25.07939219 24.75106835 91.19140625]
[ 273.30322266 68.20603609 68.13853455 273.15966797]]
Edit: As berak pointed out, converting the type to float (from int64) allows for values outside the original range. The cv2.resize() function does not work with the default 'int64' type. However, converting to 'uint8' will automatically saturate the values to [0..255].
Also, as pointed out by SaulloCastro, another related answer demonstrated scipy's interpolation, and that the default method there is cubic interpolation (with saturation).
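If you stick with INTER_CUBIC and clip, a minimal sketch using the b array from the example above:

import numpy as np

# saturate out-of-range cubic overshoots and cast back to 8-bit
b_clipped = np.clip(b, 0, 255).astype(np.uint8)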
If you are enlarging the image, you should prefer to use INTER_LINEAR or INTER_CUBIC interpolation.
If you are shrinking the image, you should prefer to use INTER_AREA interpolation.
Cubic interpolation is computationally more complex, and hence slower than linear interpolation. However, the quality of the resulting image will be higher.
To overcome this problem, you should find the new size of the given image at which the interpolation can be made, and then copy the interpolated, resampled image onto the target image, like:
# create target image and copy sample image into it
(wt, ht) = imgSize # target image size
(h, w) = img.shape # given image size
fx = w / wt
fy = h / ht
f = max(fx, fy)
newSize = (max(min(wt, int(w / f)), 1),
max(min(ht, int(h / f)), 1)) # scale according to f (result at least 1 and at most wt or ht)
img = cv2.resize(img, newSize, interpolation=cv2.INTER_CUBIC) #INTER_CUBIC interpolation
target = np.ones([ht, wt]) * 255 # shape=(64,800)
target[0:newSize[1], 0:newSize[0]] = img
Some of the possible interpolation in openCV are:
INTER_NEAREST – a nearest-neighbor interpolation
INTER_LINEAR – a bilinear interpolation (used by default)
INTER_AREA – resampling using pixel area relation. It may be a preferred method for image decimation, as it gives moiré-free results. But when the image is zoomed, it is similar to the INTER_NEAREST method.
INTER_CUBIC – a bicubic interpolation over 4×4 pixel neighborhood
INTER_LANCZOS4 – a Lanczos interpolation over 8×8 pixel neighborhood
See here for results in each interpolation.
I think you should start with INTER_LINEAR which is the default option for resize() function. It combines sufficiently good visual results with sufficiently good time performance (although it is not as fast as INTER_NEAREST). And it won't create those out-of-range values.
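As a quick check of that claim, repeating the small example from the question with INTER_LINEAR (a sketch):

import cv2
import numpy as np

a = np.array([0, 10, 20, 0, 5, 2, 255, 0, 255]).reshape((3, 3))
b_lin = cv2.resize(a.astype('float'), (4, 4), interpolation=cv2.INTER_LINEAR)
# each bilinear output value is a convex combination of input pixels,
# so it stays within the input's [0, 255] range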
My answer here is based on testing, and in the end it supports the answer of @shivam. I tested these interpolation methods, for both cases, shrinking and enlarging, and after enlarging I calculated the PSNR against the original image.
[cv2.INTER_AREA,
cv2.INTER_BITS,
cv2.INTER_BITS2,
cv2.INTER_CUBIC,
cv2.INTER_LANCZOS4,
cv2.INTER_LINEAR,
cv2.INTER_LINEAR_EXACT,
cv2.INTER_NEAREST]
shrinking = 0.25
enlarging = 4
I tested this on 165 images of different shapes. For the results I picked the maximum and 2nd-maximum PSNR and counted how often each combination achieved them.
For the maximum, the count for each interpolation is shown in the image below.
From this test, the maximum PSNR is given by the combination of AREA and LANCZOS4, which gave the max PSNR for 141/204 images.
I also wanted to include the 2nd maximum. So here are the results for the 2nd maximum only.
Here AREA and CUBIC gave the 2nd-best result: 19/204 have the highest PSNR and 158/347 have the 2nd-highest PSNR using AREA + CUBIC.
These results were vague, so I opened the files for which CUBIC gave the highest PSNR. It turns out that images with a lot of texture/abstraction gave the highest PSNR using CUBIC.
So I did further tests only for AREA+CUBIC and AREA+LANCZOS4. I came to the conclusion that if you are shrinking the image less than 10 times, then go for LANCZOS4. It will give you better results for less than 10x zoom, and if the image is large, then it's better than CUBIC.
As for my program, I was shrinking the image 4 times, so for me AREA+LANCZOS4 works better.
Scripts and images: https://github.com/crackaf/triple-recovery/tree/main/tests