I have been looking into this conversion for a while. What are the ways of converting an RGB image to a YUV image and accessing the Y, U and V channels using Python on Linux? (using OpenCV, skimage, etc.)
Update:
I used opencv
img_yuv = cv2.cvtColor(image, cv2.COLOR_BGR2YUV)
y, u, v = cv2.split(img_yuv)
cv2.imshow('y', y)
cv2.imshow('u', u)
cv2.imshow('v', v)
cv2.waitKey(0)
and got this result, but they all seem gray. I couldn't get a result like the one shown on the Wikipedia page.
Am I doing something wrong?
NB: The YUV <-> RGB conversions in OpenCV versions prior to 3.2.0 are buggy! For one, in many cases the order of U and V channels was swapped. As far as I can tell, 2.x is still broken as of 2.4.13.2 release.
The reason they appear grayscale is that in splitting the 3-channel YUV image you created three 1-channel images. Since the data structures that contain the pixels do not store any information about what the values represent, imshow treats any 1-channel image as grayscale for display. Similarly, it would treat any 3-channel image as BGR.
What you see in the Wikipedia example is a false color rendering of the chrominance channels. In order to achieve this, you need to either apply a pre-defined colormap or use a custom look-up table (LUT). This will map the U and V values to appropriate BGR values which can then be displayed.
As it turns out, the colormaps used for the Wikipedia example are rather simple.
Colormap for U channel
Simple progression between green and blue:
colormap_u = np.array([[[i,255-i,0] for i in range(256)]],dtype=np.uint8)
Colormap for V channel
Simple progression between green and red:
colormap_v = np.array([[[0,255-i,i] for i in range(256)]],dtype=np.uint8)
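For illustration, the same mapping can be sketched with plain NumPy indexing instead of cv2.LUT. This is only a minimal sketch, assuming an 8-bit single-channel input; the helper name apply_colormap and the tiny sample array are hypothetical and not part of the original answer.

```python
import numpy as np

# Same progressions as above, flattened to shape (256, 3) in BGR order.
colormap_u = np.array([[i, 255 - i, 0] for i in range(256)], dtype=np.uint8)
colormap_v = np.array([[0, 255 - i, i] for i in range(256)], dtype=np.uint8)

def apply_colormap(channel, colormap):
    """Map each 8-bit value of a 2-D channel to a BGR triple (hypothetical helper)."""
    return colormap[channel]  # fancy indexing: (H, W) -> (H, W, 3)

# Made-up U channel holding the minimum, a middle and the maximum value.
u = np.array([[0, 128, 255]], dtype=np.uint8)
u_bgr = apply_colormap(u, colormap_u)
```

Low U values land on pure green and high U values on pure blue, which is the gradient seen in the Wikipedia picture.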
Visualizing YUV Like the Example
Now, we can put it all together, to recreate the example:
import cv2
import numpy as np
def make_lut_u():
    return np.array([[[i,255-i,0] for i in range(256)]],dtype=np.uint8)
def make_lut_v():
    return np.array([[[0,255-i,i] for i in range(256)]],dtype=np.uint8)
img = cv2.imread('shed.png')
img_yuv = cv2.cvtColor(img, cv2.COLOR_BGR2YUV)
y, u, v = cv2.split(img_yuv)
lut_u, lut_v = make_lut_u(), make_lut_v()
# Convert back to BGR so we can apply the LUT and stack the images
y = cv2.cvtColor(y, cv2.COLOR_GRAY2BGR)
u = cv2.cvtColor(u, cv2.COLOR_GRAY2BGR)
v = cv2.cvtColor(v, cv2.COLOR_GRAY2BGR)
u_mapped = cv2.LUT(u, lut_u)
v_mapped = cv2.LUT(v, lut_v)
result = np.vstack([img, y, u_mapped, v_mapped])
cv2.imwrite('shed_combo.png', result)
Result:
Using the LUT values as described might be exactly how the Wikipedia article image was made but the description implies it's arbitrary and used maybe because it's simple. It isn't arbitrary; the results essentially match how RGB <-> YUV conversions work. If you are using OpenCV then the methods BGR2YUV and YUV2BGR give the result using the conversion formula found in the same Wikipedia YUV article. (My images generated using Java were slightly darker otherwise the same.)
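To make the "not arbitrary" point concrete, here is a small NumPy sketch of the analog conversion formulas as I understand them from the Wikipedia YUV article (Y = 0.299R + 0.587G + 0.114B, U = 0.492(B - Y), V = 0.877(R - Y)). It assumes RGB values in the range 0 to 1 and does not try to reproduce OpenCV's 8-bit offsets or rounding; the function names are my own.

```python
import numpy as np

def rgb_to_yuv(rgb):
    """RGB in [0, 1] -> YUV, using the formulas quoted in the lead-in (assumed)."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = 0.492 * (b - y)
    v = 0.877 * (r - y)
    return np.stack([y, u, v], axis=-1)

def yuv_to_rgb(yuv):
    """Algebraic inverse of rgb_to_yuv."""
    y, u, v = yuv[..., 0], yuv[..., 1], yuv[..., 2]
    r = y + v / 0.877
    b = y + u / 0.492
    g = (y - 0.299 * r - 0.114 * b) / 0.587
    return np.stack([r, g, b], axis=-1)

# Hold Y fixed and V at zero, then sweep U from negative to positive.
sweep = np.array([[0.5, -0.2, 0.0], [0.5, 0.0, 0.0], [0.5, 0.2, 0.0]])
colors = yuv_to_rgb(sweep)  # rows are RGB triples
```

With V held at zero, negative U gives more green than blue and positive U gives more blue than green, so a green-to-blue gradient is a reasonable (if exaggerated) picture of the U axis.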
Addendum: I feel bad that I picked on Dan Mašek after he answered the question perfectly and astutely by showing us the lookup table trick. The author of the Wikipedia YUV article didn't do a bad job depicting the green-blue and green-red gradient shown in the article but as Dan Mašek pointed out it wasn't perfect. The images of color for U and V do somewhat resemble what really happens so I'd call them exaggerated-color and not false-color. The Wikipedia article on YCrCb is similar but different somehow.
// most of the Java program which should work in other languages with OpenCV:
// everything duplicated to do both the U and V at the same time
Mat src = new Mat();
Mat dstA = new Mat();
Mat dstB = new Mat();
src = Imgcodecs.imread("shed.jpg", Imgcodecs.IMREAD_COLOR);
List<Mat> channelsYUVa = new ArrayList<Mat>();
List<Mat> channelsYUVb = new ArrayList<Mat>();
Imgproc.cvtColor(src, dstA, Imgproc.COLOR_BGR2YUV); // convert bgr image to yuv
Imgproc.cvtColor(src, dstB, Imgproc.COLOR_BGR2YUV);
Core.split(dstA, channelsYUVa); // isolate the channels y u v
Core.split(dstB, channelsYUVb);
// zero the 2 channels we do not want to see isolating the 1 channel we want to see
channelsYUVa.set(0, Mat.zeros(channelsYUVa.get(0).rows(),channelsYUVa.get(0).cols(),channelsYUVa.get(0).type()));
channelsYUVa.set(1, Mat.zeros(channelsYUVa.get(0).rows(),channelsYUVa.get(0).cols(),channelsYUVa.get(0).type()));
channelsYUVb.set(0, Mat.zeros(channelsYUVb.get(0).rows(),channelsYUVb.get(0).cols(),channelsYUVb.get(0).type()));
channelsYUVb.set(2, Mat.zeros(channelsYUVb.get(0).rows(),channelsYUVb.get(0).cols(),channelsYUVb.get(0).type()));
Core.merge(channelsYUVa, dstA); // combine channels (two of which are zero)
Core.merge(channelsYUVb, dstB);
Imgproc.cvtColor(dstA, dstA, Imgproc.COLOR_YUV2BGR); // convert to bgr so it can be displayed
Imgproc.cvtColor(dstB, dstB, Imgproc.COLOR_YUV2BGR);
HighGui.imshow("V channel", dstA); // display the image
HighGui.imshow("U channel", dstB);
HighGui.waitKey(0);
Related
I want a way or steps to unify the brightness of 2 images or in other words make their brightness the same but without assigning them. I know how to get the brightness of an image using PIL, the code is found below:
from PIL import Image
imag = Image.open("test.png")
# Convert the image to RGB if it is a .gif for example
imag = imag.convert('RGB')
# coordinates of the pixel
X, Y = 0, 0
# Get RGB
pixelRGB = imag.getpixel((X, Y))
R, G, B = pixelRGB
brightness = sum([R, G, B]) / 3 ##0 is dark (black) and 255 is bright (white)
print(brightness)
Does anyone have an idea of how to make 2 images have the same brightness? Thank you
You can use the mean/standard deviation color transfer technique in Python/OpenCV as described at https://www.pyimagesearch.com/2014/06/30/super-fast-color-transfer-images/. But to force it not to modify the color and only adjust the brightness/contrast, convert your image to HSV. Process only the V channel using the method described in that reference. Then combine the new V with the old S and H channels and convert the result back to BGR.
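A minimal sketch of the V-channel step, assuming the two V channels have already been extracted as 8-bit arrays (for example with cv2.cvtColor and cv2.split). The function name match_brightness is hypothetical and this is not the code from the linked article, just the mean/standard-deviation idea applied to one channel.

```python
import numpy as np

def match_brightness(v_src, v_ref):
    """Shift and scale v_src so its mean and std match those of v_ref."""
    v_src = np.asarray(v_src, dtype=np.float64)
    v_ref = np.asarray(v_ref, dtype=np.float64)
    src_std = v_src.std()
    # Avoid dividing by zero for a perfectly flat source channel.
    scale = v_ref.std() / src_std if src_std > 0 else 1.0
    out = (v_src - v_src.mean()) * scale + v_ref.mean()
    # Round and clip back into the valid 8-bit range.
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)

# Made-up V channels: a dark source and a brighter reference.
v_dark = np.array([[10, 20], [30, 40]], dtype=np.uint8)
v_bright = np.array([[100, 120], [140, 160]], dtype=np.uint8)
v_new = match_brightness(v_dark, v_bright)
```

The returned channel would then be merged with the untouched H and S channels before converting back.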
It looks like the default library under Ubuntu changes colors a bit during compression. I tried setting quality and subsampling but I see no improvement. Has anyone faced a similar issue?
subsampling = 0 , quality = 100
#CORRECT COLORS FROM NPARRAY
cv2.imshow("Object cam:{}".format(self.camera_id), self.out)
print(self.out.item(1,1,0)) # B
print(self.out.item(1,1,1)) # G
print(self.out.item(1,1,2)) # R
self.out=cv2.cvtColor(self.out, cv2.COLOR_BGR2RGB)
#from PIL import Image
im = Image.fromarray(self.out)
r, g, b = im.getpixel((1, 1))
## just printing pixel and they are matching
print(r, g, b)
## WRONG COLORS
im.save(self.out_ramdisk_img,format='JPEG', subsampling=0, quality=100)
JPEG image should have the same colors as in imshow, but it's a bit more purple.
That is a natural result of JPEG compression. JPEG uses floating point arithmetic to calculate integer pixel values. This occurs in several stages of JPEG compression. Thus, small pixel value changes are expected.
When you see blanket changes in color, they are usually the result of input color values that are outside the gamut of the YCbCr color space. Such values get clamped.
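As a rough illustration of the rounding effect, here is a pure-Python sketch of the RGB to YCbCr round trip with 8-bit rounding and clamping. It assumes the full-range JFIF conversion constants and deliberately ignores the DCT and quantization stages of a real encoder, so it only shows one source of small pixel changes; the function names and the sample pixel are made up.

```python
def clamp(x):
    """Clamp an integer to the 8-bit range."""
    return max(0, min(255, x))

def rgb_to_ycbcr(r, g, b):
    # Full-range JFIF constants (assumed), rounded to integers as stored in 8-bit data.
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return tuple(clamp(round(c)) for c in (y, cb, cr))

def ycbcr_to_rgb(y, cb, cr):
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return tuple(clamp(round(c)) for c in (r, g, b))

def round_trip(pixel):
    return ycbcr_to_rgb(*rgb_to_ycbcr(*pixel))

original = (200, 30, 180)  # made-up purple-ish pixel
restored = round_trip(original)
drift = [abs(a - b) for a, b in zip(original, restored)]
```

Comparing original and restored for a handful of pixels shows how far the color conversion alone can move a value, even at quality=100.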
I was wondering if it was possible to modify the contrast of an image, by modifying its RGB, HSV (or similar) values.
I am currently doing the following to mess with luminance, saturation and hue (in python):
import numpy as np
from PIL import Image as img
import colorsys as cs
#Fix colorsys rgb_to_hsv function
#cs.rgb_to_hsv only works on arrays of shape: [112, 112,255] and non n-dimensional arrays
rgb_to_hsv = np.vectorize(cs.rgb_to_hsv)
hsv_to_rgb = np.vectorize(cs.hsv_to_rgb)
def luminance_edit(a, h, s, new_v):
    #Edits V - Luminance
    #Changes RGB based on new luminance value
    r, g, b = hsv_to_rgb(h, s, new_v)
    #Merges R,G,B,A values to form new array
    arr = np.dstack((r, g, b, a))
    return arr
I have a separate function to deal with converting to and fro RGB and HSV. A is the alpha channel, h is the hue, s is saturation and new_v is the new V value (luminance).
Is it possible to edit contrast based on these values, or am I missing something?
Edit:
I have a separate function that imports images, extracts the RGBA values, and converts them into HSL/HSV. Lets call this function x.
In the code provided (function y), we take the hue(h), saturation(s), luminance (v) and the alpha channel (a) - the HSL values provided from function x, of some image.
The code edits the V value, or the luminance. It does not actually edit the contrast, It's just an example of what I'm aiming to achieve. Using the above data (HSL/HSV/RGB) or similar, I was wondering if it was possible to edit the contrast of an image.
I find it very hard to understand what you are trying to do in your question, so here is a "stab in the dark" that you are trying to increase contrast in an image without changing colours.
You are correct in going from RGB to HSL/HSV colourspace so that you can adjust luminance without affecting saturation and hue. So, I have basically taken the Luminance channel of a sombre image and normalised it so that the luminance now spans the entire brightness range from 0..255, and put it back into the image. I started with this image:
And ended up with this one:
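The normalisation step can be sketched roughly as follows. This is a minimal sketch, assuming the luminance (V or L) channel is available as a NumPy array; the function name stretch_channel and the sample values are hypothetical, and the hue and saturation channels are left untouched.

```python
import numpy as np

def stretch_channel(v, out_max=255.0):
    """Linearly rescale a luminance channel so it spans 0..out_max."""
    v = np.asarray(v, dtype=np.float64)
    lo, hi = v.min(), v.max()
    if hi == lo:
        return v.copy()  # flat channel: there is no contrast to stretch
    return (v - lo) / (hi - lo) * out_max

# Made-up sombre channel that only uses the range 50..150.
v = np.array([[50, 100], [75, 150]])
v_stretched = stretch_channel(v)
```

The stretched channel is then recombined with the original hue and saturation and converted back to RGB.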
I have code that looks like this
import skimage.transform
from skimage import io as sio
test_image = sio.imread('/home/username/pat/file.png')
test_image = skimage.transform.resize(test_image, (IMG_HEIGHT, IMG_WIDTH), mode='constant', preserve_range=True)
print test_image.shape # prints (128,128)
print test_image.max(), test_image.min() # prints 65535.0 0.0
sio.imshow(test_image)
More importantly, I need to make this image be in 3 channels, so I can feed it into a neural network that expects such input, any idea how to do that?
I want to transform a 1-channel image into a 3-channel image that looks reasonable when I plot it, makes sense, etc. How?
I tried padding with 0s, I tried copying the same values 3 times for the 3 channels, but then when I try to display the image, it looks like gibberish. So how can I transform the image into 3 channels, even if it becomes something like, bluescale instead of greyscale, but still be able to visualize it in a meaningful way?
Edit:
if I try
test_image = skimage.color.gray2rgb(test_image)
I get all white image, with some black dots.
I get the same all white, rare small black dots if I try
convert Test1_PC_1.tif -colorspace sRGB -type truecolor Test1_PC_1_new.tif
Before the attempted transform with gray2rgb
print type(test_image[0,0])
<type 'numpy.uint16'>
After
print type(test_image[0,0,0])
<type 'numpy.float64'>
You need to convert the array from 2D to 3D, where the third dimension is the color.
You can use the gray2rgb function provided by skimage:
test_image = skimage.color.gray2rgb(test_image)
Alternatively, you can write your own conversion -- which gives you some flexibility to tweak the pixel values:
# basic conversion from gray to RGB encoding
test_image = np.array([[[s,s,s] for s in r] for r in test_image],dtype="u1")
# conversion from gray to RGB encoding -- putting the image in the green channel
test_image = np.array([[[0,s,0] for s in r] for r in test_image],dtype="u1")
I notice from your max() value that you're using 16-bit sample values (which is uncommon). You'll want a different dtype, maybe "u2" (uint16) or "int32". Also, you may need to play some games to make the image display with the correct polarity (it may appear with black/white reversed).
One way to get there is to just invert all of the pixel values:
test_image = 65535-test_image ## invert 16-bit pixels
Or you could look into the norm parameter to imshow, which appears to have an inverse function.
Your conversion from gray-value to RGB by replicating the gray-value three times such that R==G==B is correct.
The strange displayed result is likely caused by assumptions made during display. You will need to scale your data before display to fix it.
Usually, a uint8 image has values 0-255, which are mapped to min-max scale of display. Uint16 has values 0-65535, with 65535 mapped to max. Floating-point images are very often assumed to be in the range 0-1, with 1 mapped to max. Any larger value will then also be mapped to max. This is why you see so much white in your output image.
If you divide each output sample by the maximum value in your image you’ll be able to display it properly.
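A minimal sketch of that scaling, combined with the channel replication, assuming a 2-D 16-bit grayscale array. The function name and the tiny sample image are made up for illustration.

```python
import numpy as np

def gray16_to_rgb_float(gray):
    """Scale a 2-D grayscale array to [0, 1] and replicate it into 3 channels."""
    gray = np.asarray(gray, dtype=np.float64)
    peak = gray.max()
    scaled = gray / peak if peak > 0 else gray
    # R == G == B, so the result still displays as a gray image.
    return np.stack([scaled, scaled, scaled], axis=-1)

# Made-up 16-bit image.
test_image = np.array([[0, 32768], [16384, 65535]], dtype=np.uint16)
rgb = gray16_to_rgb_float(test_image)
```

The result has shape (height, width, 3) with float values between 0 and 1, which is the range display functions usually assume for floating-point images.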
Well, imshow uses a kind of heatmap by default to display the image intensities. To display a grayscale image, just specify the colormap as below:
plt.imshow(image, cmap="gray")
Now, I think you can get a channel of an image by doing:
image[:,:,i] where i is in {0,1,2}
To extract an image for a specific channel:
red_image = image.copy()
red_image[:,:,1] = 0
red_image[:,:,2] = 0
Edit:
Do you definitely have to use skimage? What about python-opencv module?
Have you tried the following example?
import cv2
# the legacy cv module was removed in OpenCV 3; cv2 exposes the constant directly
color_img = cv2.cvtColor(gray_img, cv2.COLOR_GRAY2RGB)
I have a problem with FFT implementation in Python. I have completely strange results.
Ok so, I want to open image, get value of every pixel in RGB, then I need to use fft on it, and convert to image again.
My steps:
1) I'm opening image with PIL library in Python like this
from PIL import Image
im = Image.open("test.png")
2) I'm getting pixels
pixels = list(im.getdata())
3) I'm separating every pixel into r, g, b values
for x in range(width):
    for y in range(height):
        r,g,b = pixels[x*width+y]
        red[x][y] = r
        green[x][y] = g
        blue[x][y] = b
4). Let's assume that I have one pixel (111,111,111). And use fft on all red values like this
red = np.fft.fft(red)
And then:
print (red[0][0], green[0][0], blue[0][0])
My output is:
(53866+0j) 111 111
I think it's completely wrong. My image is 64x64, and the FFT from GIMP is completely different. Actually, my FFT gives me only arrays with huge values, which is why my output image is black.
Do you have any idea where is problem?
[EDIT]
I've changed as suggested to
red= np.fft.fft2(red)
And after that I scale it
scale = 1/(width*height)
red= abs(red* scale)
And still, I'm getting only black image.
[EDIT2]
Ok, so lets take one image.
Assume that I dont want to open it and save as greyscale image. So I'm doing like this.
def getGray(pixel):
    r,g,b = pixel
    return (r+g+b)/3
im = Image.open("test.png")
im.load()
pixels = list(im.getdata())
width, height = im.size
for x in range(width):
    for y in range(height):
        greyscale[x][y] = getGray(pixels[x*width+y])
data = []
for x in range(width):
    for y in range(height):
        pix = greyscale[x][y]
        data.append(pix)
img = Image.new("L", (width,height), "white")
img.putdata(data)
img.save('out.png')
After this, I'm getting this image, which is OK. So now I want to apply an FFT to my image before saving it to a new one, so I'm doing this
scale = 1/(width*height)
greyscale = np.fft.fft2(greyscale)
greyscale = abs(greyscale * scale)
after loading it. After saving it to a file, I get this image. So let's now open test.png with GIMP and use the FFT filter plugin. I'm getting this image, which is correct
How I can handle it?
Great question. I’ve never heard of it but the Gimp Fourier plugin seems really neat:
A simple plug-in to do fourier transform on you image. The major advantage of this plugin is to be able to work with the transformed image inside GIMP. You can so draw or apply filters in fourier space, and get the modified image with an inverse FFT.
This idea—of doing Gimp-style manipulation on frequency-domain data and transforming back to an image—is very cool! Despite years of working with FFTs, I’ve never thought about doing this. Instead of messing with Gimp plugins and C executables and ugliness, let’s do this in Python!
Caveat. I experimented with a number of ways to do this, attempting to get something close to the output Gimp Fourier image (gray with moiré pattern) from the original input image, but I simply couldn’t. The Gimp image appears to be somewhat symmetric around the middle of the image, but it’s not flipped vertically or horizontally, nor is it transpose-symmetric. I’d expect the plugin to be using a real 2D FFT to transform an H×W image into a H×W array of real-valued data in the frequency domain, in which case there would be no symmetry (it’s just the to-complex FFT that’s conjugate-symmetric for real-valued inputs like images). So I gave up trying to reverse-engineer what the Gimp plugin is doing and looked at how I’d do this from scratch.
The code. Very simple: read an image, apply scipy.fftpack.rfft in the leading two dimensions to get the “frequency-image”, rescale to 0–255, and save.
Note how this is different from the other answers! No grayscaling—the 2D real-to-real FFT happens independently on all three channels. No abs needed: the frequency-domain image can legitimately have negative values, and if you make them positive, you can’t recover your original image. (Also a nice feature: no compromises on image size. The size of the array remains the same before and after the FFT, whether the width/height is even or odd.)
from PIL import Image
import numpy as np
import scipy.fftpack as fp
## Functions to go from image to frequency-image and back
im2freq = lambda data: fp.rfft(fp.rfft(data, axis=0),
axis=1)
freq2im = lambda f: fp.irfft(fp.irfft(f, axis=1),
axis=0)
## Read in data file and transform
data = np.array(Image.open('test.png'))
freq = im2freq(data)
back = freq2im(freq)
# Make sure the forward and backward transforms work!
assert(np.allclose(data, back))
## Helper functions to rescale a frequency-image to [0, 255] and save
remmax = lambda x: x/x.max()
remmin = lambda x: x - np.amin(x, axis=(0,1), keepdims=True)
touint8 = lambda x: (remmax(remmin(x))*(256-1e-4)).astype(int)
def arr2im(data, fname):
    out = Image.new('RGB', data.shape[1::-1])
    # map returns an iterator on Python 3, so materialize it as a list first
    out.putdata(list(map(tuple, data.reshape(-1, 3))))
    out.save(fname)
arr2im(touint8(freq), 'freq.png')
(Aside: FFT-lover geek note. Look at the documentation for rfft for details, but I used Scipy’s FFTPACK module because its rfft interleaves real and imaginary components of a single pixel as two adjacent real values, guaranteeing that the output size for any-sized 2D image (even vs odd, width vs height) will be preserved. This is in contrast to Numpy’s numpy.fft.rfft2 which, because it returns complex data with the last axis shortened to width/2+1, forces you to deal with an extra column and with deinterleaving complex-to-real yourself. Who needs that hassle for this application.)
Results. Given input named test.png:
this snippet produces the following output (global min/max have been rescaled and quantized to 0-255):
And upscaled:
In this frequency-image, the DC (0 Hz frequency) component is in the top-left, and frequencies move higher as you go right and down.
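A quick way to convince yourself where the DC term sits is to transform a perfectly flat image. The sketch below uses NumPy's complex fft2 rather than the FFTPACK rfft layout from the answer (an assumption made to keep the example dependency-free); in both layouts the DC component is at index [0, 0].

```python
import numpy as np

# A flat 4x6 "image": the only frequency present is 0 Hz.
flat = np.full((4, 6), 7.0)
spectrum = np.abs(np.fft.fft2(flat))

dc = spectrum[0, 0]                 # equals the sum of all pixels: 4 * 6 * 7
rest = spectrum.sum() - dc          # energy everywhere else
```

All of the energy ends up in the top-left element, and every other element is numerically zero.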
Now, let’s see what happens when you manipulate this image in a couple of ways. Instead of this test image, let’s use a cat photo.
I made a few mask images in Gimp that I then load into Python and multiply the frequency-image with to see what effect the mask has on the image.
Here’s the code:
# Make frequency-image of cat photo
freq = im2freq(np.array(Image.open('cat.jpg')))
# Load three frequency-domain masks (DSP "filters")
bpfMask = np.array(Image.open('cat-mask-bpfcorner.png')).astype(float) / 255
hpfMask = np.array(Image.open('cat-mask-hpfcorner.png')).astype(float) / 255
lpfMask = np.array(Image.open('cat-mask-corner.png')).astype(float) / 255
# Apply each filter and save the output
arr2im(touint8(freq2im(freq * bpfMask)), 'cat-bpf.png')
arr2im(touint8(freq2im(freq * hpfMask)), 'cat-hpf.png')
arr2im(touint8(freq2im(freq * lpfMask)), 'cat-lpf.png')
Here’s a low-pass filter mask on the left, and on the right, the result—click to see the full-res image:
In the mask, black = 0.0, white = 1.0. So the lowest frequencies are kept here (white), while the high ones are blocked (black). This blurs the image by attenuating high frequencies. Low-pass filters are used all over the place, including when decimating (“downsampling”) an image (though they will be shaped much more carefully than me drawing in Gimp 😜).
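Instead of drawing the mask by hand, a low-pass mask can also be built programmatically. The following is a minimal sketch using NumPy's complex fft2 on a single-channel image, which is an assumption on my part: it is not the FFTPACK rfft layout used above, and with fft2 the low frequencies sit in all four corners rather than only the top-left. The function name low_pass and the checkerboard test image are made up.

```python
import numpy as np

def low_pass(image, cutoff):
    """Keep only frequencies whose index is <= cutoff on both axes; zero the rest."""
    h, w = image.shape
    fy = np.abs(np.fft.fftfreq(h) * h)  # integer frequency index for each row
    fx = np.abs(np.fft.fftfreq(w) * w)  # integer frequency index for each column
    mask = (fy[:, None] <= cutoff) & (fx[None, :] <= cutoff)  # True = keep
    # The mask is symmetric, so the inverse transform is real up to rounding error.
    return np.real(np.fft.ifft2(np.fft.fft2(image) * mask))

# An 8x8 checkerboard is the highest frequency the grid can hold.
rows, cols = np.indices((8, 8))
checker = ((rows + cols) % 2).astype(float)
blurred = low_pass(checker, cutoff=1)
```

The checkerboard pattern is removed entirely and only the average brightness survives, which is the extreme case of the blurring described above.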
Here’s a band-pass filter, where the lowest frequencies (see that bit of white in the top-left corner?) and high frequencies are kept, but the middling-frequencies are blocked. Quite bizarre!
Here’s a high-pass filter, where the top-left corner that was left white in the above mask is blacked out:
Only the rapid changes in the image survive, which is the basic idea behind edge detection.
Postscript. Someone, make a webapp using this technique that lets you draw masks and apply them to an image real-time!!!
There are several issues here.
1) Manual conversion to grayscale isn't good. Use Image.open("test.png").convert('L')
2) Most likely there is an issue with types. You shouldn't pass an np.ndarray from fft2 to a PIL image without being sure their types are compatible. abs(np.fft.fft2(something)) returns a floating-point array (np.float64 by default), whereas a PIL image typically expects something like an array of type np.uint8.
3) Scaling suggested in the comments looks wrong. You actually need your values to fit into 0..255 range.
Here's my code that addresses these 3 points:
import numpy as np
from PIL import Image
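def fft(channel):
    # take the magnitude first: the max of a complex array is not a meaningful scale
    magnitude = np.absolute(np.fft.fft2(channel))
    magnitude *= 255.0 / magnitude.max() # proper scaling into 0..255 range
    return magnitude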
input_image = Image.open("test.png")
channels = input_image.split() # splits an image into R, G, B channels
result_array = np.zeros_like(input_image) # make sure data types,
# sizes and numbers of channels of input and output numpy arrays are the same
if len(channels) > 1: # grayscale images have only one channel
    for i, channel in enumerate(channels):
        result_array[..., i] = fft(channel)
else:
    result_array[...] = fft(channels[0])
result_image = Image.fromarray(result_array)
result_image.save('out.png')
I must admit I haven't managed to get results identical to the GIMP FFT plugin. As far as I can see, it does some post-processing. My results are all a very low-contrast mess, and GIMP seems to overcome this by tuning contrast and scaling down non-informative channels (in your case all channels except Red are just empty). Refer to the image:
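One common way to deal with the low contrast is to compress the magnitudes with a logarithm before scaling to 0..255. The sketch below is only that generic technique, not a reconstruction of what the GIMP plugin does; the function name is hypothetical, and fftshift is used so the DC term ends up in the centre of the picture.

```python
import numpy as np

def log_magnitude_uint8(channel):
    """FFT magnitude, log-compressed and scaled to 0..255 for display."""
    magnitude = np.abs(np.fft.fftshift(np.fft.fft2(channel)))
    compressed = np.log1p(magnitude)  # log(1 + x) tames the huge DC peak
    peak = compressed.max()
    if peak > 0:
        compressed = compressed / peak * 255.0
    return np.rint(compressed).astype(np.uint8)

# Made-up 4x4 channel.
channel = np.arange(16, dtype=np.float64).reshape(4, 4)
spectrum_img = log_magnitude_uint8(channel)
```

The resulting uint8 array can be passed to Image.fromarray directly, one channel at a time.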