I need to work with some greyscale tif files and I have been using PIL to import them as images and convert them into numpy arrays:
np.array(Image.open(src))
I want a clear understanding of exactly what the values in these arrays correspond to. In particular, it is not clear what value I should treat as the white point or the black point of my images, for instance if I want to convert the array into an array of floats with 1 for white, 0 for black, and other values scaled linearly in between.
I have tried some naive methods, including scaling by the maximum value in the array, but when I open the resulting files there is always some shift in the color levels.
Is there any documentation for the proper way to understand the values stored in these tif arrays?
TIFF is a file format for storing raster graphics images. It has an extensive specification, and a quick search on the web will get you the resources you need.
The thing is, you are using PIL as your input library. The array you have is likely of the uint8 data type, which means your data can be anywhere from 0 to 255 (check the array's dtype to be sure; 16-bit TIFFs are also common, in which case the divisor below would be 65535 instead). To obtain the 0 to 1 range, do the following:
im = np.array(Image.open(src)).astype('float32')/255
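Dividing by 255 is only right for 8-bit data. As a hedged sketch (the helper name to_unit_float is my own, not part of PIL or NumPy), the divisor can be taken from the array's dtype instead of being hard-coded, so that 0 maps to 0.0 and the largest representable value maps to 1.0:

```python
import numpy as np

def to_unit_float(arr):
    """Scale an integer image array to floats in [0, 1] using the dtype's full range."""
    arr = np.asarray(arr)
    if np.issubdtype(arr.dtype, np.integer):
        # Assumption: white is the largest value the dtype can hold
        # (255 for uint8, 65535 for uint16), not the largest value present.
        return arr.astype(np.float32) / np.iinfo(arr.dtype).max
    # Assumption: float images are already stored in the [0, 1] range.
    return arr.astype(np.float32)

# Synthetic stand-ins for np.array(Image.open(src))
eight_bit = np.array([[0, 128, 255]], dtype=np.uint8)
sixteen_bit = np.array([[0, 32768, 65535]], dtype=np.uint16)
print(to_unit_float(eight_bit))
print(to_unit_float(sixteen_bit))
```

Scaling by the array's maximum instead would shift the levels of any image whose brightest pixel is not pure white, which may be the shift described in the question.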
Notice that for a color image your array will likely have 4 layers in the third dimension, im[:, :, k] (im.shape = (i, j, k)). So each slice im[i, j, :] (which represents a pixel) is going to be a quadruplet holding an RGBA value.
The R stands for Red (or quantity of Red), G for Green, B for Blue. A is the alpha channel and it is what enables you to have transparency (lower values mean less opacity and more transparency).
It can also have three layers for RGB only, or a single layer (a plain 2-D array) if it is a greyscale image.
If you have RGB (or RGBA, ignoring alpha) but need a single value per pixel, you should understand that there are quite a few different ways of doing this. In this post, denis recommends the following formulation:
Y = .2126 * R^gamma + .7152 * G^gamma + .0722 * B^gamma
where gamma is 2.2 for many PCs. The usual R G B are sometimes written
as R' G' B' (R' = Rlin ^ (1/gamma)) (purists tongue-click) but here
I'll drop the '.
And finally L* = 116 * Y^(1/3) - 16 to obtain the lightness.
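To make the two formulas concrete, here is a minimal sketch in NumPy. It assumes 8-bit RGB input and gamma = 2.2 as in the quote, and it uses the simplified cube-root expression for L* as written above (the full CIE definition adds a linear segment near black, which this sketch ignores). The function names are my own.

```python
import numpy as np

def relative_luminance(rgb, gamma=2.2):
    """Y = .2126 * R^gamma + .7152 * G^gamma + .0722 * B^gamma for 8-bit RGB."""
    rgb = np.asarray(rgb, dtype=np.float64) / 255.0  # assumption: 8-bit samples
    linear = rgb ** gamma                            # undo the display gamma
    weights = np.array([0.2126, 0.7152, 0.0722])
    return linear @ weights                          # weighted sum over the last axis

def lightness(y):
    """Simplified L* = 116 * Y^(1/3) - 16 (no linear segment near black)."""
    return 116.0 * np.cbrt(y) - 16.0

pixels = np.array([[255, 255, 255], [128, 128, 128], [0, 0, 0]])
y = relative_luminance(pixels)
print(y)
print(lightness(y))
```

The same functions accept a full (height, width, 3) array, in which case the result has shape (height, width).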
I recommend you to read his post. Also consider looking into the following concepts:
RGB Colors model
Gamma correction
Tagged Image File Format
Pillow documentation of TIFF
Working with TIFFs (import, export) in Python using numpy
Assume each image contains only 2 colors. What's the simplest way in Python to tell which of two similar images has more of one of these colors (a larger colored area) than the other?
Definition of "more": the total area of the colored blocks in one picture is bigger than in the other (note that the shapes of the colored regions may be irregular).
Thank you.
Okay, after some experimentation, I have a possible solution. You can use Pillow, a common image-loading/handling library, to convert the image to an ndarray, and then use NumPy's unique() function to get your desired results. As a fun side effect, this works with an arbitrary number of colors. Here's full working code that I just tried:
from PIL import Image # because for some reason, that's how you import something from Pillow
import numpy as np
im = Image.open("/path/to/image.png")
arr = np.array(im.getdata())
unique_colors, counts = np.unique(arr.reshape(-1, arr.shape[1]), axis=0, return_counts=True)
Now the unique_colors variable holds the unique colors that appear in your image, and counts holds the corresponding counts for each color in the image; that is to say, counts[i] is the number of times unique_colors[i] appears in the image for any i.
How does the unique + reshaping line work? This is borrowed from this particular answer. Basically, you flatten out your image array such that it has shape (num_pixels, num_channels), which could be 1, 3, or 4 depending on your image format (single-channel, RGB, RGBA, etc.). Now that I have a giant 2D "table" of pixels, I simply find which row values (hence axis=0) are unique, and then use the return_counts keyword to return, well, the counts.
At this point, you have extracted the unique colors and counts of those colors for a single image. To compare multiple images, you would repeat this process on multiple images, find the colors they have in common, and then you can simply compare integers to find out which image has more of a particular color.
For my particular image, the format of the channels happened to be RGBA. In any case, I would recommend printing out arr.shape prior to the reshape step to verify that you have the correct index, since you may have to change the index of arr.shape to something else depending on your image. If you or anyone else knows of a more general method to find the channel index of an image obtained in this fashion, I'm all ears. For the record, I tried this on a .png image, like you specified. Hope this helps!
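To compare two images as described above, one hedged sketch is to turn the output of np.unique into a dictionary per image and compare the counts for the color of interest. The helper name color_counts and the tiny synthetic arrays are illustrative only; with real files the arrays would come from np.array(Image.open(path)).

```python
import numpy as np

def color_counts(arr):
    """Map each distinct pixel value (as a tuple) to the number of pixels having it."""
    arr = np.asarray(arr)
    # A 2-D array is treated as single-channel; otherwise the last axis holds the channels.
    pixels = arr.reshape(-1, 1) if arr.ndim == 2 else arr.reshape(-1, arr.shape[-1])
    colors, counts = np.unique(pixels, axis=0, return_counts=True)
    return {tuple(int(v) for v in c): int(n) for c, n in zip(colors, counts)}

red, white = (255, 0, 0), (255, 255, 255)
image_a = np.array([[red, red, white], [red, white, white]], dtype=np.uint8)
image_b = np.array([[red, white, white], [white, white, white]], dtype=np.uint8)

counts_a, counts_b = color_counts(image_a), color_counts(image_b)
# .get(..., 0) covers the case where a color does not occur in one of the images
more_red = "a" if counts_a.get(red, 0) > counts_b.get(red, 0) else "b"
print(counts_a, counts_b, more_red)
```

If the images differ in size, comparing each count divided by the total number of pixels is probably more meaningful than comparing raw counts.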
I am writing a function that takes an image input, and returns a list of the RGB codes.
im = Image.open('picture.jpg')
pix = list(im.getdata())
pix should be a list of RGB tuples. In most cases it is, but I have found some cases like this:
[(244,255,255), (100,100,90), (23,0,80), ..., 220, (100,100,100)]
i.e. somehow im.getdata() is retrieving an integer from the pixel values?
Similarly, there are cases where one of the entries in pix is a 4-tuple:
(1,0,0,255).
Can someone explain why this is? How can I change this so that I retrieve a list of only the RGB tuples of an image?
Any help or guidance would be appreciated!
Raster images can be stored in various ways - mainly in order to be efficient in terms of disk space and bandwidth required to transmit. Here are some of the options:
RGB triplets - this is the most common way for images to be stored and you get 3 values... 1 red, 1 green and 1 blue for each pixel,
RGBA quads - this means you have 4 values for each pixel... 1 red, 1 green, 1 blue and an alpha (A) value which specifies how opaque/transparent that pixel is,
greyscale - this means all pixels in the image are grey and you just get one value for each pixel specifying where it sits on the scale between pure black (0) and pure white (255),
palettised - this means that there are at most 256 different colours in the image and, rather than store each one as 3 bytes of RGB, you store a single byte for each pixel and use that to look up the corresponding colour in an embedded table of up to 256 RGB values. This means you only need 1 byte per pixel instead of 3.
Notwithstanding all that, some images use 8 bits per sample, some use 16 bits, some use 32 bits and others use 64 bits. Also, some use compression and some do not. But all of this is a separate issue, independent of the list of options above.
If you want to be assured of always getting 3 values per pixel, just be sure to convert to RGB mode when you open:
im = Image.open('picture.jpg').convert('RGB')
Note that you may be needlessly increasing the memory you need to store the image (if greyscale or palettised), or you may be discarding the alpha channel.
Another option is to get the type of the image and deal with the different cases:
# Open image
im = Image.open('start.png')
# Check bands present - e.g. ('R', 'G', 'B')
bands = im.getbands()
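Continuing that idea, here is a sketch of how the different cases might be handled once the bands are known. The helper pixel_to_rgb is hypothetical (it is not part of Pillow); it takes one entry of list(im.getdata()) together with the tuple returned by im.getbands(). Palettised images are deliberately left to convert('RGB'), since their values are palette indices rather than intensities.

```python
def pixel_to_rgb(pixel, bands):
    """Return an (R, G, B) tuple for one pixel, given the image's band names."""
    if bands == ("R", "G", "B"):
        return tuple(pixel)
    if bands == ("R", "G", "B", "A"):
        return tuple(pixel[:3])          # drop the alpha value
    if bands == ("L",):
        return (pixel, pixel, pixel)     # grey: same value in all three channels
    # Anything else (e.g. ("P",) for palettised images) is not handled here.
    raise ValueError(f"unsupported bands {bands!r}; use im.convert('RGB') instead")

print(pixel_to_rgb((1, 0, 0, 255), ("R", "G", "B", "A")))
print(pixel_to_rgb(220, ("L",)))
```

In practice convert('RGB') does all of this in one call; the explicit version is mainly useful when the alpha or palette information has to be treated differently.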
I am trying to convert an image into an array of pixels.
Here is my current code.
im = Image.open("beeleg.png")
pixels = im.load()
im.getdata() # doesn't work
print(pixels)  # doesn't work
Ideally, my end goal is to convert the image into a vector of just pixels, so for instance if I have an image of dimensions 100x100, then I want a vector of dimensions 1x10000, where each value is between [0, 255]. Then, divide each of the values in the array by 256 and add a bias of 1 in the front of the vector. However, I am not able to proceed with all this without being able to obtain an array. How to proceed?
SciPy's ndimage module was for a long time a go-to for working with pixels as data (arrays): scipy.ndimage.imread loaded an image file (most common formats supported) into a numpy array, which can easily be reshaped and operated on mathematically. Note that scipy.ndimage.imread was deprecated in SciPy 1.0 and removed in SciPy 1.2, so on a current installation you would use imageio.imread or Pillow instead. The mode keyword specified a colorspace transformation upon load (for example, converting an RGB image to greyscale). You asked for single-value pixels from 0-255 (8-bit greyscale), so you would use mode='L'; with Pillow the equivalent is Image.open(path).convert('L'). See the documentation for usage and more useful functions.
If you use OpenCV, gray = cv2.imread(image, 0) (equivalently cv2.imread(image, cv2.IMREAD_GRAYSCALE)) returns a greyscale image as a single-channel numpy array with n rows x m cols. rows, cols = gray.shape then gives the height and width of the image.
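Once the image is available as a 2-D array, the vector described in the question can be built in a couple of NumPy calls. This is a sketch on a synthetic array standing in for the loaded greyscale image; the divisor 256 and the leading bias of 1 follow the question.

```python
import numpy as np

# Synthetic 100x100 8-bit greyscale image standing in for the loaded file
gray = (np.arange(100 * 100) % 256).astype(np.uint8).reshape(100, 100)

# Flatten row by row, convert to float, and scale into [0, 1)
features = gray.reshape(-1).astype(np.float64) / 256

# Prepend the bias term
vector = np.concatenate(([1.0], features))
print(vector.shape)
```

Converting to float before dividing matters here only for clarity; NumPy's true division on a uint8 array would also produce floats.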
I am quite new to Python programming and I need your help. I always research my problem before posting.
I have a SAR dual-polarization image (2^16 grey-level values) in TIFF format. The image has two bands: the first band (HH_band) is the horizontal polarization channel and the second one (HV_band) is the vertical polarization channel. I want to create an RGB composite image. For this, I need to stack the channels as follows:
get the first band (HH_band)
get the second band (HV_band)
get the ratio (HH_band/HV_band)
I know that many people have posted about something similar to this (RGB composite images in natural colors). I tried to use cv2.merge and cv2.split from the OpenCV library, but it didn't work. I thought it would be relatively easy to create a SAR RGB image in Python (I have seen a few posts about creating RGB images from LANDSAT data), but I got stuck in my case.
I would much appreciate any help.
Here is a possible way to accomplish the band composition programmatically:
import numpy as np
from skimage import io  # assumption: io refers to scikit-image's io module
tif = io.imread('dual_polarization_image.tif')
band = {'HH': 0, 'HV': 1}
r = tif[:, :, band['HH']]
g = tif[:, :, band['HV']]
hh = r.astype(np.float64)
hv = g.astype(np.float64)
b = np.divide(hh, hv, out=np.zeros_like(hh), where=hv!=0)
rgb = np.dstack((r, g, b.astype(np.uint16)))
Remarks:
It would be possible to deal with different arrangements of the bands in the TIFF image by simply redefining the values of the dictionary band.
Prior to calculating the band ratio, it is necessary to convert the data to np.float64.
I have taken advantage of the where option for universal functions to avoid zero division warnings.
In order for the composition to be possible, the band ratio (blue channel) has to be converted back to the same type (i.e. np.uint16) as the original bands (red and green channels). Note that this cast truncates the ratio to its integer part.
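One caveat with the cast in the last remark: converting the ratio straight to np.uint16 discards its fractional part, so ratios below 1 all become 0. A possible alternative, sketched here on synthetic data and not taken from the answer above, is to stretch the ratio linearly over the uint16 range before stacking. The function name scaled_ratio_band is an assumption for illustration.

```python
import numpy as np

def scaled_ratio_band(hh, hv):
    """HH/HV ratio, stretched linearly so that the largest ratio maps to 65535."""
    hh = hh.astype(np.float64)
    hv = hv.astype(np.float64)
    # Pixels where hv == 0 keep the value 0 from the `out` array
    ratio = np.divide(hh, hv, out=np.zeros_like(hh), where=hv != 0)
    peak = ratio.max()
    if peak == 0:
        return np.zeros(ratio.shape, dtype=np.uint16)
    return np.round(ratio / peak * 65535).astype(np.uint16)

# Tiny synthetic bands standing in for tif[:, :, 0] and tif[:, :, 1]
hh = np.array([[100, 200], [300, 50]], dtype=np.uint16)
hv = np.array([[200, 100], [0, 100]], dtype=np.uint16)

blue = scaled_ratio_band(hh, hv)
rgb = np.dstack((hh, hv, blue))
print(blue)
print(rgb.shape, rgb.dtype)
```

Whether a linear stretch is the right choice depends on the data; this only illustrates how to keep the fractional information when going back to uint16.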
It's difficult to test without sample images, but you should be able to do this simply at the commandline with ImageMagick which is included in most Linux distributions and is available for OSX and Windows.
The command will look like:
convert HH.tif HV.tif \( -clone 0 -clone 1 -compose divide -composite \) \
-combine -auto-level result.png
I have a problem with an FFT implementation in Python. I get completely strange results.
Ok, so I want to open an image, get the value of every pixel in RGB, apply an FFT to it, and convert the result back to an image.
My steps:
1) I open the image with the PIL library in Python like this
from PIL import Image
im = Image.open("test.png")
2) I'm getting pixels
pixels = list(im.getdata())
3) I separate every pixel into r, g, b values
for x in range(width):
    for y in range(height):
        r, g, b = pixels[x*width+y]
        red[x][y] = r
        green[x][y] = g
        blue[x][y] = b
4) Let's assume that I have one pixel (111, 111, 111), and I apply the FFT to all red values like this
red = np.fft.fft(red)
And then:
print (red[0][0], green[0][0], blue[0][0])
My output is:
(53866+0j) 111 111
It's completely wrong, I think. My image is 64x64, and the FFT from GIMP is completely different. Actually, my FFT gives me only arrays with huge values, which is why my output image is black.
Do you have any idea where the problem is?
[EDIT]
I've changed as suggested to
red= np.fft.fft2(red)
And after that I scale it
scale = 1/(width*height)
red= abs(red* scale)
And still, I'm getting only black image.
[EDIT2]
Ok, so let's take one image.
Assume that I don't want to open it and save it as a greyscale image. So I do it like this.
def getGray(pixel):
    r, g, b = pixel
    return (r+g+b)/3
im = Image.open("test.png")
im.load()
pixels = list(im.getdata())
width, height = im.size
for x in range(width):
    for y in range(height):
        greyscale[x][y] = getGray(pixels[x*width+y])
data = []
for x in range(width):
    for y in range(height):
        pix = greyscale[x][y]
        data.append(pix)
img = Image.new("L", (width,height), "white")
img.putdata(data)
img.save('out.png')
After this, I get this image, which is ok. So now I want to apply an FFT to my image before I save it to a new one, so I do this
scale = 1/(width*height)
greyscale = np.fft.fft2(greyscale)
greyscale = abs(greyscale * scale)
after loading it. After saving it to a file, I have this image. So now let's open test.png with GIMP and use the FFT filter plugin. I get this image, which is correct
How can I handle it?
Great question. I’ve never heard of it but the Gimp Fourier plugin seems really neat:
A simple plug-in to do fourier transform on you image. The major advantage of this plugin is to be able to work with the transformed image inside GIMP. You can so draw or apply filters in fourier space, and get the modified image with an inverse FFT.
This idea—of doing Gimp-style manipulation on frequency-domain data and transforming back to an image—is very cool! Despite years of working with FFTs, I’ve never thought about doing this. Instead of messing with Gimp plugins and C executables and ugliness, let’s do this in Python!
Caveat. I experimented with a number of ways to do this, attempting to get something close to the output Gimp Fourier image (gray with moiré pattern) from the original input image, but I simply couldn’t. The Gimp image appears to be somewhat symmetric around the middle of the image, but it’s not flipped vertically or horizontally, nor is it transpose-symmetric. I’d expect the plugin to be using a real 2D FFT to transform an H×W image into a H×W array of real-valued data in the frequency domain, in which case there would be no symmetry (it’s just the to-complex FFT that’s conjugate-symmetric for real-valued inputs like images). So I gave up trying to reverse-engineer what the Gimp plugin is doing and looked at how I’d do this from scratch.
The code. Very simple: read an image, apply scipy.fftpack.rfft in the leading two dimensions to get the “frequency-image”, rescale to 0–255, and save.
Note how this is different from the other answers! No grayscaling—the 2D real-to-real FFT happens independently on all three channels. No abs needed: the frequency-domain image can legitimately have negative values, and if you make them positive, you can’t recover your original image. (Also a nice feature: no compromises on image size. The size of the array remains the same before and after the FFT, whether the width/height is even or odd.)
from PIL import Image
import numpy as np
import scipy.fftpack as fp
## Functions to go from image to frequency-image and back
im2freq = lambda data: fp.rfft(fp.rfft(data, axis=0), axis=1)
freq2im = lambda f: fp.irfft(fp.irfft(f, axis=1), axis=0)
## Read in data file and transform
data = np.array(Image.open('test.png'))
freq = im2freq(data)
back = freq2im(freq)
# Make sure the forward and backward transforms work!
assert(np.allclose(data, back))
## Helper functions to rescale a frequency-image to [0, 255] and save
remmax = lambda x: x/x.max()
remmin = lambda x: x - np.amin(x, axis=(0,1), keepdims=True)
touint8 = lambda x: (remmax(remmin(x))*(256-1e-4)).astype(int)
def arr2im(data, fname):
    out = Image.new('RGB', data.shape[1::-1])
    # list(...) because map() is lazy in Python 3 and putdata expects a sequence
    out.putdata(list(map(tuple, data.reshape(-1, 3))))
    out.save(fname)
arr2im(touint8(freq), 'freq.png')
(Aside: FFT-lover geek note. Look at the documentation for rfft for details, but I used SciPy's FFTPACK module because its rfft interleaves the real and imaginary components of each frequency bin as two adjacent real values, guaranteeing that the array size of any 2D image (even or odd, width or height) is preserved. This is in contrast to NumPy's numpy.fft.rfft2 which, because it returns complex data of size height by width/2+1, forces you to deal with an extra column and with deinterleaving complex-to-real yourself. Who needs that hassle for this application?)
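A quick way to see the shape difference is to compare array shapes directly. The sketch below uses only NumPy: on a real-valued H×W input, numpy.fft.rfft2 returns a complex array whose last axis is shortened to W//2 + 1, and the original shape has to be passed back to irfft2 to invert it reliably when W is odd.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((5, 7))                # odd width on purpose

spectrum = np.fft.rfft2(image)            # complex, last axis has 7 // 2 + 1 entries
restored = np.fft.irfft2(spectrum, s=image.shape)  # s= restores the odd width

print(image.shape, spectrum.shape, spectrum.dtype)
print(np.allclose(image, restored))
```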
Results. Given an input image named test.png, this snippet produces a frequency-image (global min/max have been rescaled and quantized to 0-255), which the original post showed both at native size and upscaled.
In this frequency-image, the DC (0 Hz frequency) component is in the top-left, and frequencies move higher as you go right and down.
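The DC term is easy to check numerically. In this sketch (using NumPy's complex fft2, not the FFTPACK real transform above), the top-left entry of the transform equals the sum of all pixel values, so dividing by the number of pixels gives the mean brightness. That is also why an unscaled FFT of an 8-bit image contains values far outside 0-255.

```python
import numpy as np

# A constant 64x64 channel: every pixel has the value 111
image = np.full((64, 64), 111.0)
spectrum = np.fft.fft2(image)

dc = spectrum[0, 0]                 # top-left entry: the 0 Hz component
print(dc.real, image.sum())         # the DC term is the sum of all pixels
print(dc.real / image.size)         # dividing by the pixel count gives the mean
```

For a constant image every other frequency bin is zero up to rounding, since there is no variation to represent.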
Now, let’s see what happens when you manipulate this image in a couple of ways. Instead of this test image, let’s use a cat photo.
I made a few mask images in Gimp that I then load into Python and multiply the frequency-image with to see what effect the mask has on the image.
Here’s the code:
# Make frequency-image of cat photo
freq = im2freq(np.array(Image.open('cat.jpg')))
# Load three frequency-domain masks (DSP "filters")
bpfMask = np.array(Image.open('cat-mask-bpfcorner.png')).astype(float) / 255
hpfMask = np.array(Image.open('cat-mask-hpfcorner.png')).astype(float) / 255
lpfMask = np.array(Image.open('cat-mask-corner.png')).astype(float) / 255
# Apply each filter and save the output
arr2im(touint8(freq2im(freq * bpfMask)), 'cat-bpf.png')
arr2im(touint8(freq2im(freq * hpfMask)), 'cat-hpf.png')
arr2im(touint8(freq2im(freq * lpfMask)), 'cat-lpf.png')
Here’s a low-pass filter mask on the left, and on the right, the result:
In the mask, black = 0.0, white = 1.0. So the lowest frequencies are kept here (white), while the high ones are blocked (black). This blurs the image by attenuating high frequencies. Low-pass filters are used all over the place, including when decimating (“downsampling”) an image (though they will be shaped much more carefully than me drawing in Gimp 😜).
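The same low-pass idea can be sketched without a hand-drawn mask. The example below is not the FFTPACK pipeline above: it assumes a single-channel image, uses NumPy's complex fft2, and builds a hypothetical rectangular mask from np.fft.fftfreq that keeps only frequencies below a chosen fraction of the Nyquist limit.

```python
import numpy as np

def lowpass(image, keep=0.25):
    """Zero every frequency above `keep` times the Nyquist limit (0.5 cycles/pixel)."""
    h, w = image.shape
    fy = np.abs(np.fft.fftfreq(h))[:, None]   # vertical frequencies, 0 .. 0.5
    fx = np.abs(np.fft.fftfreq(w))[None, :]   # horizontal frequencies, 0 .. 0.5
    cutoff = keep * 0.5
    mask = (fy <= cutoff) & (fx <= cutoff)    # True = keep (white), False = block (black)
    spectrum = np.fft.fft2(image)
    # The mask is symmetric in +/- frequency, so the result is real up to rounding
    return np.fft.ifft2(spectrum * mask).real

# A checkerboard is the highest-frequency pattern an image can hold
checker = (np.indices((8, 8)).sum(axis=0) % 2).astype(np.float64)
smooth = lowpass(checker, keep=0.25)
print(np.round(smooth, 3))
```

The checkerboard collapses to its mean value because only the DC term survives the mask. A hard-edged mask like this one tends to cause ringing on real photos, which is one reason practical filters are shaped more carefully.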
Here’s a band-pass filter (strictly speaking a band-stop filter, given what it blocks), where the lowest frequencies (see that bit of white in the top-left corner?) and high frequencies are kept, but the middling frequencies are blocked. Quite bizarre!
Here’s a high-pass filter, where the top-left corner that was left white in the above mask is blacked out:
This is, roughly, how edge detection works: removing the low frequencies leaves mostly the sharp transitions in the image.
Postscript. Someone, make a webapp using this technique that lets you draw masks and apply them to an image real-time!!!
There are several issues here.
1) Manual conversion to grayscale isn't good. Use Image.open("test.png").convert('L')
2) Most likely there is an issue with types. You shouldn't pass an np.ndarray from fft2 to a PIL image without being sure their types are compatible. abs(np.fft.fft2(something)) will return an array of a floating-point type (typically np.float64), whereas a PIL image expects something like an array of type np.uint8.
3) Scaling suggested in the comments looks wrong. You actually need your values to fit into 0..255 range.
Here's my code that addresses these 3 points:
import numpy as np
from PIL import Image
def fft(channel):
    # take the magnitude first: the max() of complex values is not meaningful for scaling
    spectrum = np.absolute(np.fft.fft2(channel))
    spectrum *= 255.0 / spectrum.max()  # proper scaling into 0..255 range
    return spectrum
input_image = Image.open("test.png")
channels = input_image.split()  # splits an image into its channels (e.g. R, G, B)
# make sure data types, sizes and numbers of channels of input and output numpy arrays are the same
result_array = np.zeros_like(input_image)
if len(channels) > 1:  # grayscale images have only one channel
    for i, channel in enumerate(channels):
        result_array[..., i] = fft(channel)
else:
    result_array[...] = fft(channels[0])
result_image = Image.fromarray(result_array)
result_image.save('out.png')
I must admit I haven't managed to get results identical to the GIMP FFT plugin. As far as I can see, it does some post-processing. My results are all a rather low-contrast mess, and GIMP seems to overcome this by tuning the contrast and scaling down non-informative channels (in your case all channels except Red are just empty).
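A common way to make an FFT magnitude viewable, offered here as a sketch and not as a description of what the GIMP plugin does, is to move the DC term to the centre with np.fft.fftshift and compress the dynamic range with a logarithm before scaling to 0..255. The function name spectrum_to_uint8 is my own.

```python
import numpy as np

def spectrum_to_uint8(channel):
    """Log-scaled, centred FFT magnitude of one channel as a uint8 array."""
    magnitude = np.abs(np.fft.fftshift(np.fft.fft2(channel)))
    compressed = np.log1p(magnitude)      # log(1 + x) tames the huge DC term
    peak = compressed.max()
    if peak == 0:                         # all-black input: nothing to scale
        return np.zeros(compressed.shape, dtype=np.uint8)
    return np.round(compressed / peak * 255).astype(np.uint8)

# Synthetic 64x64 channel standing in for one channel of test.png
rng = np.random.default_rng(0)
channel = rng.integers(0, 256, size=(64, 64)).astype(np.float64)
display = spectrum_to_uint8(channel)
print(display.shape, display.dtype, display.min(), display.max())
```

The resulting array can be handed to Image.fromarray(display) directly, because it already has the uint8 type that an 'L' mode image expects.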