I have an image of a city with discrete colors (Green=meadow, black=buildings, white/yellow=roads). Using Pillow, I import the picture in my (Python) program and convert it to a Numpy array with discrete values for the colors (i.e. green pixels become 1's, black pixels become 2's, etc).
I want to downscale the resolution of the image (for computational purposes) while retaining as much information as possible. However, when I use Pillow's resize() method, colors deviate from these discrete values. How can I downscale this image while (most importantly) retaining the discrete colors and (also important) losing as little information as possible?
Here is an example of the image: https://i.imgur.com/6Tef55H.png
EDIT: per request, some code:
from PIL import Image
import numpy as np
picture = Image.open("some_image.png")
width, height = picture.size
pic_array = np.zeros((width, height))
# Turn the image into discrete values
for i in range(0, width):
    for j in range(0, height):
        red, green, blue = picture.getpixel((i, j))
        # An example of how discrete colors are converted to values
        # (a, b, c stand for the RGB components of one known color)
        if red == a and green == b and blue == c:
            pic_array[i][j] = 1
Scaling can be done in two ways:
1) Scaling the original image using Pillow's resize() method, or
2) rescaling the final array using something like:
scaled_array = pic_array[0:width:5, 0:height:5]
Option 1 is good at retaining information but loses the discrete values, while option 2 keeps the discrete values but discards information.
I was interested in this question and wrote some code to try out some ideas, specifically the "mode" filter suggested by @jasonharper in the comments.
First of all the input image is not 4 nicely defined classes, but actually has 6,504 different colours, so I made a palette of 4 colours using ImageMagick like this:
magick xc:black xc:white xc:yellow xc:green +append palette.png
Here it is enlarged; in reality it is 4x1 pixels:
Then I mapped the colours in the image to the palette of 4 discrete colours:
magick map.png +dither -remap palette.png start.png
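If ImageMagick is not at hand, roughly the same remapping can be sketched in NumPy by assigning every pixel to its nearest palette colour. This is only an illustration, not what the magick command does internally: the PALETTE values and the to_labels helper are assumptions made for the example (the green entry assumes ImageMagick's "green" is (0, 128, 0)).

```python
import numpy as np

# Assumed palette: black, white, yellow, green (hypothetical values for this sketch)
PALETTE = np.array([[0, 0, 0], [255, 255, 255], [255, 255, 0], [0, 128, 0]], dtype=np.int32)

def to_labels(rgb, palette=PALETTE):
    """Return, for each pixel, the index of the nearest palette colour."""
    # Broadcast (H, W, 1, 3) against (K, 3) to get per-pixel differences to each colour
    diff = rgb[..., None, :].astype(np.int32) - palette
    dist = (diff ** 2).sum(axis=-1)          # squared Euclidean distance, shape (H, W, K)
    return dist.argmin(axis=-1).astype(np.uint8)

# Tiny demo image with slightly "off" colours
demo = np.array([[[10, 5, 0], [250, 250, 240]],
                 [[240, 230, 20], [20, 140, 30]]], dtype=np.uint8)
labels = to_labels(demo)
```

The result is a single-channel array of class indices, which is the kind of input the block filters expect.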
Then I tried this code to calculate the median and the mode of each 3x3 window:
#!/usr/bin/env python3
from PIL import Image
import numpy as np
from scipy import stats
from skimage.util import view_as_blocks
# Open image and make into Numpy array
im = Image.open('start.png')
na = np.array(im)
# Make a view as 3x3 blocks - crop anything not a multiple of 3
block_shape=(3,3)
view = view_as_blocks(na[:747,:], block_shape)
flatView = view.reshape(view.shape[0], view.shape[1], -1) # now (249,303,9)
# Get median of each 3x3 block
resMedian = np.median(flatView, axis=2).astype(np.uint8)
Image.fromarray(resMedian*60).save('resMedian.png') # arbitrary scaling by 60 for contrast
# Get mode of each 3x3 block
resMode = stats.mode(flatView, axis=2)[0].reshape((249,303)).astype(np.uint8)
Image.fromarray(resMode*60).save('resMode.png') # arbitrary scaling by 60 for contrast
Here is the result of the median filter:
And here is the result of the "mode" filter which is indeed better IMHO:
Here is an animated comparison:
If anyone wants to take the code and adapt it to try new ideas, please feel free!
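As one possible adaptation, here is a minimal NumPy-only sketch of the same "mode" idea that works for any block size and does not hard-code the image dimensions. The function name downscale_by_mode is made up for this example, and it assumes the labels are small non-negative integers; ties go to the lowest label.

```python
import numpy as np

def downscale_by_mode(labels, block):
    """Downscale a 2-D label array by taking the most common label in each block x block tile."""
    h, w = labels.shape
    h, w = h - h % block, w - w % block          # crop to a multiple of the block size
    tiles = labels[:h, :w].reshape(h // block, block, w // block, block)
    # Reorder to (tile_row, tile_col, pixels_in_tile)
    tiles = tiles.transpose(0, 2, 1, 3).reshape(h // block, w // block, -1)
    n_labels = int(labels.max()) + 1
    # Count the votes for every label in every tile, then pick the winner
    counts = np.stack([(tiles == k).sum(axis=-1) for k in range(n_labels)], axis=-1)
    return counts.argmax(axis=-1).astype(labels.dtype)

demo = np.array([[1, 1, 2, 2],
                 [1, 3, 2, 2],
                 [3, 3, 0, 1],
                 [3, 1, 0, 0]])
small = downscale_by_mode(demo, 2)
```

Because the output only ever contains labels that were present in the input, the discrete values survive the downscaling.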
I have a multi-class segmented image with labels for 4 different classes, represented in 4 different colors (dark blue, red, yellow and sky blue). I would like to calculate the total area in pixels of each class label in the segmented prediction.
I tried writing this code to obtain the total number of pixels in each label, but I am not able to get a result that contains the pixel count for each class label.
import matplotlib.pyplot as plt
import numpy as np
from skimage import data, io, img_as_ubyte
from skimage.filters import threshold_multiotsu
# Read an image
image = io.imread("images/Ulcer_segmented.jpg")
# Apply multi-Otsu threshold
thresholds = threshold_multiotsu(image, classes=5)
# Digitize (segment) original image into multiple classes.
#np.digitize assign values 0, 1, 2, 3, ... to pixels in each class.
regions = np.digitize(image, bins=thresholds)
output = img_as_ubyte(regions) #Convert 64 bit integer values to uint8
plt.imsave("images/Ulcer_segmented..jpg", output)
props = measure.regionprops_table(label_image, output,
properties=['label',
'area', 'equivalent_diameter',
'mean_intensity', 'solidity'])
This is described in the docs:
from skimage import io
from skimage.measure import label, regionprops
# Read an image
image = io.imread("your/image.jpg")
# Label connected image regions
label_image = label(image)
for region in regionprops(label_image):
    print(region.area)
It looks like you want an image histogram. The issue with using np.histogram or skimage.exposure.histogram is that your image is not single-channel, so these functions would give you a histogram of the flattened image, which would not yield the expected results.
You chose to work around this with Otsu thresholding, but I'm not sure that works, as the documentation states that it expects a single-channel (grayscale) image.
Knowing the colors used to represent your classes would help here; you could do something like
colors = [
    [cls_0_rgb_color],
    [cls_1_rgb_color],
    [cls_2_rgb_color],
    [cls_3_rgb_color]
]
# Count the pixels whose three channels all match each class color
areas = [np.count_nonzero(np.all(image == c, axis=-1)) for c in colors]
If you don't know exactly what colors the classes have you probably have to reduce the last dimension of your image to uniquely represent the 3-dimensional color (I'm not sure exactly how this is done correctly, maybe someone smarter than me can answer this in a new question). What I would do is convert the image to HSV format and use the hue component as a class representation.
from skimage.color import rgb2hsv
hsv = rgb2hsv(image)
hue = hsv[:, :, 0]
areas, bin_edges = np.histogram(hue, bins=4)
What could be tricky here is working out which area corresponds to which class. Knowing approximately what colors to expect, and how colors are arranged in hue space, the order would be either red, yellow, light_blue, dark_blue or yellow, light_blue, dark_blue, red, because red hue sits around both 0 and 360 degrees. Checking the bin_edges vector could do the trick here.
# set red_threshold experimentally
if bin_edges[1] < red_threshold:
    order = ("red", "yellow", "light_blue", "dark_blue")
else:
    order = ("yellow", "light_blue", "dark_blue", "red")
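For the case where the class colours are not known in advance, one option is to let NumPy enumerate the distinct colours and count them directly. This is a minimal sketch: count_colours is a name invented for this example, and it assumes the image really contains only a handful of exact colours (a JPEG will usually contain many near-duplicates because of compression, so a lossless format or a prior quantisation step would be needed).

```python
import numpy as np

def count_colours(image):
    """Return {(r, g, b): pixel_count} for every distinct colour in an (H, W, C) image."""
    pixels = image.reshape(-1, image.shape[-1])               # one row per pixel
    colours, counts = np.unique(pixels, axis=0, return_counts=True)
    return {tuple(int(v) for v in c): int(n) for c, n in zip(colours, counts)}

# Tiny demo: three red pixels, one yellow pixel, two black pixels
demo = np.zeros((2, 3, 3), dtype=np.uint8)
demo[0, :, :] = (255, 0, 0)
demo[1, 0, :] = (255, 255, 0)
areas = count_colours(demo)
```

Each dictionary value is the area of that colour in pixels.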
I am wondering whether there is a way to convert RGB images to pixel vectors in Python without losing their spatial information. As far as I know, I can read an image and transform it into pixel vectors, but I am not sure whether doing it this way still preserves the image's spatial information. How can I make pixel vectors from an RGB image?
my attempt:
I tried the following:
import matplotlib.pyplot as plt
image = plt.imread('dog.jpg')
im = image/255.0
print(im.shape) #(32, 32, 3)
pixels = im.reshape(im.shape[0]*im.shape[1], im.shape[2])
but I want to make sure that the pixel vectors keep the pixel order and spatial information of the RGB image. How can I do this? Any thoughts?
I think numpy might have functions to do this. Can anyone point me to how to do this with numpy?
graphic illustration:
here is a simple graphic illustration of making pixel vectors from RGB images:
as this diagram shows, we have an RGB image with shape (4,4,3), which needs to be turned into pixel vectors without losing its spatial information and pixel order; the pixel vectors from each channel (Red, Green, Blue) are then combined into a pixel matrix or dataframe. I am curious how to get this done in Python.
goal:
I want to make pixel vectors from RGB images so that the resulting pixel vectors can be expanded with a Taylor expansion. Can anyone point out how to make this happen?
Are you just trying to reshape each channel to a vector and then join them horizontally? That's what I understood from the graphic illustration, and the way I would do it is something like this:
import matplotlib.pyplot as plt
import numpy as np
image = plt.imread('monkey.png')
# plt.imread already returns floats in [0, 1] for PNG files; only rescale 8-bit data
if image.dtype == np.uint8:
    image = image / 255.0
red = image[:,:,0]
green = image[:,:,1]
blue = image[:,:,2]
def to_vector(matrix):
    # Stack the columns of the matrix on top of each other into one column vector
    columns = [matrix[:, i].reshape(-1, 1) for i in range(matrix.shape[1])]
    return np.vstack(columns)
red = to_vector(red)
green = to_vector(green)
blue = to_vector(blue)
vector = np.hstack((red,green,blue))
Your original attempt was almost a full solution - maybe actually a full solution, depending on what the idea is.
print(im.shape) #(32, 32, 3)
pixels = im.reshape(im.shape[0]*im.shape[1], im.shape[2]) # this is exactly correct
print(pixels.shape) #(1024,3)
reds = pixels[:, 0] #just as an example for where things end up in the result
pixels_channelfirst = np.moveaxis(pixels, 1, 0) # if you want the first axis to be channels
print(pixels_channelfirst.shape) #(3, 1024)
reds = pixels_channelfirst[0, :]
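A quick way to convince yourself that the reshape keeps the pixel order is to look up one pixel by its row and column in both layouts and to reshape back. This is a small self-contained sketch on a made-up 4x4 image; the variable names are only for illustration.

```python
import numpy as np

# Made-up (4, 4, 3) "image" in which every value is unique
im = np.arange(4 * 4 * 3).reshape(4, 4, 3)
pixels = im.reshape(-1, 3)                 # shape (16, 3), rows in row-major pixel order

row, col = 2, 1
index = row * im.shape[1] + col            # position of pixel (row, col) in the flat list
same_pixel = np.array_equal(pixels[index], im[row, col])

restored = pixels.reshape(im.shape)        # the reshape is fully reversible
```

Since the flat index is just row * width + col, the spatial position of every pixel can always be recovered from its position in the vector.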
"I want to preserve its pixel order and spatial information" - this does that already! Add one non-zero pixel to a zero image and plot where it goes, if you have doubts. np.hstack in the other answer does as well.
I am trying to plot some images using the code below:
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
if __name__ == "__main__":
    image2 = Image.open("Test Images/i3.png")
    image2 = np.asarray(image2)
    plt.imshow(image2, cmap="gray")
    plt.tight_layout()
    plt.show()
But it plots something different from what I see in an image viewer:
Why is this happening?
The original image:
This is because that PNG has a palette, so the pixel data are palette indices. np.asarray takes the raw pixel data; the palette is not taken into account.
Use .getpalette() is not None on the PIL image object to detect whether the image has a palette, and .convert() to convert the pixel data to "real" colors.
This is a part of raw data, to get you an idea what it looks like. It includes a corner of the capital 'C' in the inscription:
The strange thing here that catches the eye is that while black is 0 (expectedly), white is for some reason 1 (rather than the expected 255) and other colors are higher but still small, the highest value being 20. Which hints that this is a palette thing.
When you .imshow this data, it's normalized to [0,1], mapping those 20 points, in order, to points equally apart from each other on the spectrum of the Colormap used.
png-parser can show the palette data:
$ png-parser -d amdNt.png
<...>
Filename: amdNt.png | Size: 2925
(0)
IHDR:
Data size : 13
- Width : 225
- Height : 225
- Bit depth : 8
- Color type : Code = 3 ; Depth Allow = [1, 2, 4, 8] ; Each pixel is a palette index; a PLTE chunk must appear.
- Compression method : 0
- Filter method : 0
- Interlace method : 0
(1)
PLTE:
Data size : 69
b'\x00\x00\x00\xff\xff\xff\xfe\xfe\xfe\x01\x01\x01\xfd\xfd\xfd\xb4\xb4\xb4\xb2\xb2\xb2\xb6\xb6\xb6\xaf\xaf\xaf\x05\x05\x05\xfa\xfa\xfa\x10\x10\x10\xbb\xbb\xbb\x16\x16\x16\xb8\xb8\xb8\x0e\x0e\x0e\xf5\xf5\xf5\xaa\xaa\xaa\xc0\xc0\xc0\xf0\xf0\xf0\x19\x19\x19\xc4\xc4\xc4\xa6\xa6\xa6'
<...>
as well as the palette itself:
Have a look here in order to understand palettised (or indexed) images.
The short story is:
if you want and expect a greyscale image with PIL, then you should always use:
im = Image.open("filename").convert('L')
otherwise you may get a palette image.
if you want and expect an RGB image with PIL, then you should use:
im = Image.open("filename").convert('RGB')
If you don't do that, PIL may give you a palette image.
Furthermore, if you are opening images with PIL, it's generally a good idea to display them with PIL and save them with PIL, else you may run into issues of RGB/BGR ordering if you open with PIL and display with OpenCV, or with colormaps if you display them with matplotlib. I am not saying you cannot interoperate, I am just saying it is somewhat harder to mix libraries without issues of channel ordering and value ranges.
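To make the palette issue concrete, here is a small sketch that builds a two-pixel palette image in memory and compares the raw indices with the converted colours. The helper name load_as_array is made up for this example; it assumes you want RGB output whenever the image is in palette mode.

```python
import numpy as np
from PIL import Image

def load_as_array(im, mode="RGB"):
    """Convert palette ('P') images to real colours before handing them to NumPy."""
    if im.mode == "P":
        im = im.convert(mode)
    return np.asarray(im)

# Build a tiny palette image: index 0 is black, index 1 is white
pal = Image.new("P", (2, 1))
pal.putpalette([0, 0, 0, 255, 255, 255] + [0, 0, 0] * 254)
pal.putpixel((1, 0), 1)

raw = np.asarray(pal)        # palette indices, not colours
rgb = load_as_array(pal)     # actual colours looked up from the palette
```

Here raw holds the small index values (0 and 1), which is the same effect seen in the raw data above, while rgb holds the colours an image viewer would show.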
I have an image created by many polygons of different solid colors. The coordinates themselves are not given, but can be detected if necessary.
I'm looking for a way to detect all points which are the intersection of 3 or more different colors. The colors are not known in advance, might be similar to each other (e.g one might be (255, 255, 250) and another is (255, 255, 245). The specific shade doesn't matter, just the fact that it is different).
for example, in the following image a tiny star marks all the points that I'm looking for.
As your annotations have obscured the intersections you are trying to identify, I made a new, similar image.
Rather than trying to bend my brain around trying to deal with 3-dimensions of 8-bit RGB colour, I converted that to a single 24-bit integer and then ran a generic filter from SciPy and counted the number of unique colours in each 3x3 window and made a new image from that. So each pixel in the result has a brightness value equal to the number of colours in its neighbourhood. I counted the number of colours by converting the Numpy array of neighbours into a Python set - exploiting the fact that a set can only have unique numbers in it.
#!/usr/bin/env python3
import numpy as np
from PIL import Image
from scipy.ndimage import generic_filter
# CountUnique
def CountUnique(P):
    """
    We receive P[0]..P[8] with the pixels in the 3x3 surrounding window, return count of unique values
    """
    return len(set(P))
# Open image and make into Numpy array
PILim = Image.open('patches.png').convert('RGB')
RGBim = np.array(PILim)
# Make a single channel 24-bit image rather than 3 channels of 8-bit each
RGB24 = (RGBim[...,0].astype(np.uint32)<<16) | (RGBim[...,1].astype(np.uint32)<<8) | RGBim[...,2].astype(np.uint32)
# Run generic filter counting unique colours in neighbourhood
result = generic_filter(RGB24, CountUnique, (3, 3))
# Save result
Image.fromarray(result.astype(np.uint8)).save('result.png')
The resultant image is shown here, with the contrast stretched so that you can see the brightest pixels at the intersections you seek.
A histogram of the values in the result image shows there are 21 pixels which have 3 unique colours in their 3x3 neighbourhood and 4,348 pixels which have 2 unique colours in their neighbourhood. You can find these by running np.where(result==3), for example.
Histogram:
155631: ( 1, 1, 1) #010101 gray(1)
4348: ( 2, 2, 2) #020202 gray(2)
21: ( 3, 3, 3) #030303 gray(3)
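generic_filter calls a Python function for every pixel, which can be slow on large images. As a possible alternative, here is a NumPy-only sketch that counts the unique values in each 3x3 window by sorting the nine neighbours and counting the changes between them. It assumes NumPy 1.20 or later (for sliding_window_view); the function name count_unique_3x3 is made up, and edge pixels are handled by repeating the border.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def count_unique_3x3(labels):
    """Count the distinct values in the 3x3 neighbourhood of every pixel."""
    padded = np.pad(labels, 1, mode="edge")                       # repeat border pixels
    windows = sliding_window_view(padded, (3, 3)).reshape(*labels.shape, 9)
    ordered = np.sort(windows, axis=-1)
    # In a sorted window, each change of value marks one more distinct colour
    return 1 + (np.diff(ordered, axis=-1) != 0).sum(axis=-1)

# Tiny demo with three "colours" meeting in the middle
demo = np.array([[1, 1, 2, 2],
                 [1, 1, 2, 2],
                 [3, 3, 3, 3],
                 [3, 3, 3, 3]], dtype=np.uint32)
counts = count_unique_3x3(demo)
junctions = np.argwhere(counts >= 3)      # (row, col) of pixels touching 3+ colours
```

In the demo, junctions contains the four pixels around the point where the three regions meet.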
For extra fun, I had a go at programming the method suggested by @Micka, and that gives the same results. The code looks like this:
#!/usr/bin/env python3
import numpy as np
from PIL import Image
from skimage.morphology import dilation, disk
# Open image and make into Numpy array
PILim = Image.open('patches.png').convert('RGB')
RGBim = np.array(PILim)
h, w = RGBim.shape[0], RGBim.shape[1]
# Make a single channel 24-bit image rather than 3 channels of 8-bit each
RGB24 = (RGBim[...,0].astype(np.uint32)<<16) | (RGBim[...,1].astype(np.uint32)<<8) | RGBim[...,2].astype(np.uint32)
# Make list of unique colours
UniqueColours = np.unique(RGB24)
# Create result image
result = np.zeros((h,w),dtype=np.uint8)
# Make mask for any particular colour - same size as original image
mask = np.zeros((h,w), dtype=np.uint8)
# Make disk-shaped structuring element for morphology
selem = disk(1)
# Iterate over unique colours
for i,u in enumerate(UniqueColours):
    # Turn on all pixels matching this unique colour, turn off all others
    mask = np.where(RGB24==u,1,0)
    # Dilate (fatten) the mask by 1 pixel
    mask = dilation(mask,selem)
    # Add all activated pixels to result image
    result = result + mask
# Save result
Image.fromarray(result.astype(np.uint8)).save('result.png')
For reference, I created the image with anti-aliasing disabled in ImageMagick at the command line like this:
convert -size 400x400 xc:red -background red +antialias \
-fill blue -draw "polygon 42,168 350,72 416,133 416,247 281,336" \
-fill yellow -draw "polygon 271,11 396,127 346,154 77,86" \
-fill lime -draw "polygon 366,260 366,400 120,400" patches.png
Keywords: Python, image, image processing, intersect, intersection, PIL/Pillow, adjacency, neighbourhood, neighborhood, neighbour, neighbor, generic, SciPy, 3x3, filter.
I have a problem with an FFT implementation in Python. I get completely strange results.
OK, so I want to open an image, get the RGB value of every pixel, then apply an FFT to it, and convert the result to an image again.
My steps:
1) I'm opening image with PIL library in Python like this
from PIL import Image
im = Image.open("test.png")
2) I'm getting pixels
pixels = list(im.getdata())
3) I'm separating every pixel into r,g,b values
for x in range(width):
    for y in range(height):
        r,g,b = pixels[x*width+y]
        red[x][y] = r
        green[x][y] = g
        blue[x][y] = b
4). Let's assume that I have one pixel (111,111,111). And use fft on all red values like this
red = np.fft.fft(red)
And then:
print (red[0][0], green[0][0], blue[0][0])
My output is:
(53866+0j) 111 111
I think it's completely wrong. My image is 64x64, and the FFT from GIMP is completely different. Actually, my FFT gives me only arrays with huge values, that's why my output image is black.
Do you have any idea where the problem is?
[EDIT]
I've changed as suggested to
red= np.fft.fft2(red)
And after that I scale it
scale = 1/(width*height)
red= abs(red* scale)
And still, I'm getting only black image.
[EDIT2]
OK, so let's take one image.
Assume that I don't want to open it and save it as a greyscale image. So I'm doing it like this.
def getGray(pixel):
    r,g,b = pixel
    return (r+g+b)/3
im = Image.open("test.png")
im.load()
pixels = list(im.getdata())
width, height = im.size
for x in range(width):
    for y in range(height):
        greyscale[x][y] = getGray(pixels[x*width+y])
data = []
for x in range(width):
    for y in range(height):
        pix = greyscale[x][y]
        data.append(pix)
img = Image.new("L", (width,height), "white")
img.putdata(data)
img.save('out.png')
After this, I'm getting this image, which is OK. So now I want to apply an FFT to my image before saving it to a new one, so I'm doing this
scale = 1/(width*height)
greyscale = np.fft.fft2(greyscale)
greyscale = abs(greyscale * scale)
after loading it. After saving it to a file, I get this image. So let's now try opening test.png with GIMP and using the FFT filter plugin. I'm getting this image, which is correct
How can I handle this?
Great question. I’ve never heard of it but the Gimp Fourier plugin seems really neat:
A simple plug-in to do fourier transform on you image. The major advantage of this plugin is to be able to work with the transformed image inside GIMP. You can so draw or apply filters in fourier space, and get the modified image with an inverse FFT.
This idea—of doing Gimp-style manipulation on frequency-domain data and transforming back to an image—is very cool! Despite years of working with FFTs, I’ve never thought about doing this. Instead of messing with Gimp plugins and C executables and ugliness, let’s do this in Python!
Caveat. I experimented with a number of ways to do this, attempting to get something close to the output Gimp Fourier image (gray with moiré pattern) from the original input image, but I simply couldn’t. The Gimp image appears to be somewhat symmetric around the middle of the image, but it’s not flipped vertically or horizontally, nor is it transpose-symmetric. I’d expect the plugin to be using a real 2D FFT to transform an H×W image into a H×W array of real-valued data in the frequency domain, in which case there would be no symmetry (it’s just the to-complex FFT that’s conjugate-symmetric for real-valued inputs like images). So I gave up trying to reverse-engineer what the Gimp plugin is doing and looked at how I’d do this from scratch.
The code. Very simple: read an image, apply scipy.fftpack.rfft in the leading two dimensions to get the “frequency-image”, rescale to 0–255, and save.
Note how this is different from the other answers! No grayscaling—the 2D real-to-real FFT happens independently on all three channels. No abs needed: the frequency-domain image can legitimately have negative values, and if you make them positive, you can’t recover your original image. (Also a nice feature: no compromises on image size. The size of the array remains the same before and after the FFT, whether the width/height is even or odd.)
from PIL import Image
import numpy as np
import scipy.fftpack as fp
## Functions to go from image to frequency-image and back
im2freq = lambda data: fp.rfft(fp.rfft(data, axis=0),
axis=1)
freq2im = lambda f: fp.irfft(fp.irfft(f, axis=1),
axis=0)
## Read in data file and transform
data = np.array(Image.open('test.png'))
freq = im2freq(data)
back = freq2im(freq)
# Make sure the forward and backward transforms work!
assert(np.allclose(data, back))
## Helper functions to rescale a frequency-image to [0, 255] and save
remmax = lambda x: x/x.max()
remmin = lambda x: x - np.amin(x, axis=(0,1), keepdims=True)
touint8 = lambda x: (remmax(remmin(x))*(256-1e-4)).astype(int)
def arr2im(data, fname):
    out = Image.new('RGB', data.shape[1::-1])
    out.putdata(map(tuple, data.reshape(-1, 3)))
    out.save(fname)
arr2im(touint8(freq), 'freq.png')
(Aside: FFT-lover geek note. Look at the documentation for rfft for details, but I used Scipy’s FFTPACK module because its rfft interleaves real and imaginary components of a single pixel as two adjacent real values, guaranteeing that the output for any-sized 2D image (even vs odd, width vs height) has the same size as the input. This is in contrast to Numpy’s numpy.fft.rfft2 which, because it returns complex data of size height by width/2+1 (only the last axis is halved), forces you to deal with an extra column and with deinterleaving complex-to-real yourself. Who needs that hassle for this application.)
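For comparison, here is a small sketch of what the NumPy route looks like on a made-up 5x6 array: the complex output is a different shape from the input, and the inverse transform needs the original shape passed in to round-trip correctly. The array contents are random and only serve as a stand-in for one image channel.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((5, 6))                  # stand-in for one channel, height 5, width 6

spectrum = np.fft.rfft2(image)              # complex, last axis shortened to 6 // 2 + 1
restored = np.fft.irfft2(spectrum, s=image.shape)   # pass the shape so odd sizes round-trip

spectrum_shape = spectrum.shape
```

The spectrum here has shape (5, 4) and a complex dtype, so it cannot be saved as an image of the original size without some repacking.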
Results. Given input named test.png:
this snippet produces the following output (global min/max have been rescaled and quantized to 0-255):
And upscaled:
In this frequency-image, the DC (0 Hz frequency) component is in the top-left, and frequencies move higher as you go right and down.
Now, let’s see what happens when you manipulate this image in a couple of ways. Instead of this test image, let’s use a cat photo.
I made a few mask images in Gimp that I then load into Python and multiply the frequency-image with to see what effect the mask has on the image.
Here’s the code:
# Make frequency-image of cat photo
freq = im2freq(np.array(Image.open('cat.jpg')))
# Load three frequency-domain masks (DSP "filters")
bpfMask = np.array(Image.open('cat-mask-bpfcorner.png')).astype(float) / 255
hpfMask = np.array(Image.open('cat-mask-hpfcorner.png')).astype(float) / 255
lpfMask = np.array(Image.open('cat-mask-corner.png')).astype(float) / 255
# Apply each filter and save the output
arr2im(touint8(freq2im(freq * bpfMask)), 'cat-bpf.png')
arr2im(touint8(freq2im(freq * hpfMask)), 'cat-hpf.png')
arr2im(touint8(freq2im(freq * lpfMask)), 'cat-lpf.png')
Here’s a low-pass filter mask on the left, and on the right, the result—click to see the full-res image:
In the mask, black = 0.0, white = 1.0. So the lowest frequencies are kept here (white), while the high ones are blocked (black). This blurs the image by attenuating high frequencies. Low-pass filters are used all over the place, including when decimating (“downsampling”) an image (though they will be shaped much more carefully than me drawing in Gimp 😜).
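If you would rather compute a mask than draw one, an idealised low-pass filter can be sketched with NumPy's complex FFT. Note that this uses np.fft.fft2 and its frequency layout, not the FFTPACK rfft layout used in the code above, so the mask is not interchangeable with the Gimp-drawn ones. The function name ideal_low_pass and the cutoff value (in cycles per pixel) are assumptions for this example.

```python
import numpy as np

def ideal_low_pass(channel, cutoff):
    """Zero every frequency whose radius exceeds `cutoff` (in cycles per pixel)."""
    fy = np.fft.fftfreq(channel.shape[0])[:, None]   # vertical frequencies
    fx = np.fft.fftfreq(channel.shape[1])[None, :]   # horizontal frequencies
    mask = (np.hypot(fy, fx) <= cutoff).astype(float)
    return np.real(np.fft.ifft2(np.fft.fft2(channel) * mask))

# A checkerboard is the highest-frequency pattern possible, so it blurs to flat grey
checkerboard = (np.indices((16, 16)).sum(axis=0) % 2).astype(float)
smooth = ideal_low_pass(checkerboard, cutoff=0.25)
```

A hard-edged mask like this tends to cause ringing on real photos, which is one reason practical filters are shaped more carefully.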
Here’s a band-stop filter (the bpfMask in the code), where the lowest frequencies (see that bit of white in the top-left corner?) and high frequencies are kept, but the middling frequencies are blocked. Quite bizarre!
Here’s a high-pass filter, where the top-left corner that was left white in the above mask is blacked out:
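This is the basic idea behind edge detection: edges are high-frequency content, so removing the low frequencies leaves mostly the edges.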
Postscript. Someone, make a webapp using this technique that lets you draw masks and apply them to an image real-time!!!
There are several issues here.
1) Manual conversion to grayscale isn't a good idea. Use Image.open("test.png").convert('L')
2) Most likely there is an issue with types. You shouldn't pass an np.ndarray from fft2 to a PIL image without being sure their types are compatible. abs(np.fft.fft2(something)) returns a floating-point array (np.float64 by default), whereas a PIL image expects something like an array of type np.uint8.
3) The scaling suggested in the comments looks wrong. You actually need your values to fit into the 0..255 range.
Here's my code that addresses these 3 points:
import numpy as np
from PIL import Image
def fft(channel):
    # Magnitude of the complex 2D spectrum
    spectrum = np.absolute(np.fft.fft2(channel))
    spectrum *= 255.0 / spectrum.max()  # proper scaling into 0..255 range
    return spectrum
input_image = Image.open("test.png")
channels = input_image.split()  # splits an image into R, G, B channels
result_array = np.zeros_like(np.asarray(input_image))  # make sure data types,
# sizes and numbers of channels of input and output numpy arrays are the same
if len(channels) > 1:  # grayscale images have only one channel
    for i, channel in enumerate(channels):
        result_array[..., i] = fft(channel)
else:
    result_array[...] = fft(channels[0])
result_image = Image.fromarray(result_array)
result_image.save('out.png')
I must admit I haven't managed to get results identical to the GIMP FFT plugin. As far as I can see, it does some post-processing. My results are all a very low-contrast mess, and GIMP seems to overcome this by tuning contrast and scaling down non-informative channels (in your case all channels except Red are just empty). Refer to the image:
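One common way to get a more viewable spectrum is to move the DC component to the centre and apply a logarithm before scaling, since the DC term is usually far larger than everything else. This is only a sketch of that general technique, not a reconstruction of what the GIMP plugin does; the function name log_spectrum_uint8 is made up for the example.

```python
import numpy as np

def log_spectrum_uint8(channel):
    """Return a log-scaled, centred magnitude spectrum as a uint8 array."""
    spectrum = np.fft.fftshift(np.fft.fft2(channel))   # DC moves to the centre
    magnitude = np.log1p(np.abs(spectrum))             # compress the huge dynamic range
    peak = magnitude.max()
    if peak == 0:                                      # all-zero input: nothing to scale
        return np.zeros(magnitude.shape, dtype=np.uint8)
    return np.round(255.0 * magnitude / peak).astype(np.uint8)

# A constant 8x8 channel has only a DC component
demo = np.full((8, 8), 111.0)
out = log_spectrum_uint8(demo)
```

The returned array already has a dtype that Image.fromarray accepts for a single-channel image.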