Rudimentary Computer Vision Techniques for a Python Bot - python

After completing several chapters of some computer vision books, I decided to apply those methods to create a primitive bot for a game. I chose Fling, which has almost no dynamics: all I need to do is find the balls. Balls come in 5 different colors and can face any of 4 directions (depending on where the eyes point). I cropped the field into blocks so that I can simply check each block for the presence of a ball. My problem is that I'm not able to detect the balls correctly.
My first attempt was the following. I sum the RGB colors of each ball to get an [R, G, B] array, and do the same for each block in the field. If a block's array is similar enough to a ball's array, I assume that block contains a ball.
The problem is that it's hard to find a good value for 'similarity': even different empty blocks vary significantly in these sums.
Second, I tried the OpenCV module, which has a matchTemplate function. This function matches a template image against a source image and, together with minMaxLoc, returns the maximum match value and its location. If the maximum value is close to 1, the template is probably present in the source image. I made all possible variations of balls (20 overall) and matched them against the entire field. This worked reasonably well, but it sometimes misses balls or assigns two different ball types (say green and yellow) to the same ball. I tried to improve the process by matching the balls not against the entire field but against each block separately. This has the advantage that every block is checked, so the correct number of balls should be detected, whereas matching against the entire field only gives one location per ball color: if there are two balls of the same color, matchTemplate loses the second one. Surprisingly, it still produces false negatives/positives.
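For reference, the per-block matching I describe can be sketched roughly like this with the cv2 bindings (the board size, template file names and the 0.8 threshold are placeholders, not my actual values):
import cv2
import numpy as np

BLOCK = 32  # hypothetical tile size; a 7x8 board is assumed below

def best_match(block, templates):
    """Return (score, name) of the template that matches this block best."""
    best = (0.0, None)
    for name, tmpl in templates.items():
        res = cv2.matchTemplate(block, tmpl, cv2.TM_CCOEFF_NORMED)
        _, max_val, _, _ = cv2.minMaxLoc(res)
        if max_val > best[0]:
            best = (max_val, name)
    return best

field = cv2.imread('field.png')
templates = {name: cv2.imread(name + '.png') for name in ('red', 'green', 'yellow')}

for row in range(8):
    for col in range(7):
        block = field[row*BLOCK:(row+1)*BLOCK, col*BLOCK:(col+1)*BLOCK]
        score, name = best_match(block, templates)
        if score > 0.8:  # threshold found by experiment
            print(row, col, name, score)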
There is probably a much easier way to solve this problem (maybe a library I don't know about yet), but for now I can't find one. Any suggestions are welcome.

The balls seem pretty distinct in terms of colour. The problems you initially described seem to be related to some of the finer, random detail present in the image - especially in the background and in the different shading/poses of the ball.
On this basis, I would say you could simplify the task significantly by applying a set of pre-processing steps to "collapse" the range of colours in the image.
There are any number of more principled ways of achieving accurate colour segmentation (which is, more formally, what you want to achieve) - but taking a more pragmatic view, here are a few quick'n'dirty hacks.
So, for example, we can initially smooth the image to reduce higher frequency components...
Then, convert to a normalised RGB representation...
Before finally posterizing it with a mean shift filtering step...
Here is the code in Python, using the OpenCV bindings, that does all this in order:
import cv
# get original image
orig = cv.LoadImage('fling.png')
# show original
cv.ShowImage("orig", orig)
# blur a bit to remove higher frequency variation
cv.Smooth(orig,orig,cv.CV_GAUSSIAN,5,5)
# normalise RGB
norm = cv.CreateImage(cv.GetSize(orig), 8, 3)
red = cv.CreateImage(cv.GetSize(orig), 8, 1)
grn = cv.CreateImage(cv.GetSize(orig), 8, 1)
blu = cv.CreateImage(cv.GetSize(orig), 8, 1)
total = cv.CreateImage(cv.GetSize(orig), 8, 1)
cv.Split(orig,red,grn,blu,None)
cv.Add(red,grn,total)
cv.Add(blu,total,total)
cv.Div(red,total,red,255.0)
cv.Div(grn,total,grn,255.0)
cv.Div(blu,total,blu,255.0)
cv.Merge(red,grn,blu,None,norm)
cv.ShowImage("norm", norm)
# posterize simply with mean shift filtering
post = cv.CreateImage(cv.GetSize(orig), 8, 3)
cv.PyrMeanShiftFiltering(norm,post,20,30)
cv.ShowImage("post", post)

Your task is simpler in several respects than the ones the general computer vision algorithms you'll find were designed for: you know exactly what to look for and you know exactly where to look for it. As such I think involving an external library is an unnecessary complication, unless you're already familiar with it and can use it effectively as a tool to solve your own problem. In this post I will only use PIL.
First, split the task into two simpler subtasks:
Given a tile, determine whether there's a ball there.
Given a tile where we're pretty sure that there's a ball, identify the colour of the ball.
The second task should be simple and I won't spend time on it here. Basically, sample some pixels where the ball's main colour will be visible and compare the colours you find to the known ball colours.
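A minimal sketch of that second task (the sample offsets and reference colours here are made-up placeholders you would measure from real screenshots):
# Known ball colours -- placeholder RGB values.
BALL_COLOURS = {
    'red':    (200, 40, 40),
    'green':  (40, 180, 60),
    'yellow': (220, 210, 60),
    'blue':   (50, 90, 200),
    'purple': (150, 60, 180),
}

def identify_colour(tile_pixels):
    # Sample a few pixels near the centre of the 32x32 tile, away from the eyes.
    samples = [tile_pixels[x, y] for x, y in ((10, 20), (16, 22), (22, 20))]
    avg = tuple(sum(c[i] for c in samples) / len(samples) for i in range(3))
    # Pick the known colour with the smallest squared distance to the average.
    return min(BALL_COLOURS,
               key=lambda name: sum((a - b) ** 2
                                    for a, b in zip(avg, BALL_COLOURS[name])))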
So let's look at the first task.
First off, note that the balls don't extend to the edge of the tiles. Thus you can find a fairly representative sample of the background of a tile, whether or not there's a ball there, by sampling the pixels along the edge of the tile.
A simple way to proceed is to compare every pixel in a tile with this sample of the tile background, and to obtain some sort of measure of whether it's generally similar (no ball) or dissimilar (ball).
The following is one way to do this. The basic approach used here is to calculate the mean and the standard deviation of the background pixels -- separately for the red, green, and blue channels. For every pixel, we then calculate the number of standard deviations we are from the mean in every channel. We take this value for the most dissimilar channel as our measure of dissimilarity.
import Image
import math

def fetch_pixels(col, row):
    img = Image.open("image.png")
    img = img.crop((col*32, row*32, (col+1)*32, (row+1)*32))
    return img.load()

def border_pixels(a):
    rv = [a[x,y] for x in range(32) for y in (0, 31)]
    rv.extend([a[x,y] for x in (0, 31) for y in range(1, 31)])
    return rv

def mean_and_stddev(xs):
    mean = float(sum(xs)) / len(xs)
    dev = math.sqrt(float(sum([(x-mean)**2 for x in xs])) / len(xs))
    return mean, dev

def calculate_deviations(cols=7, rows=8):
    outimg = Image.new("L", (cols*32, rows*32))
    pixels = outimg.load()
    for col in range(cols):
        for row in range(rows):
            rv = calculate_deviations_for(col, row, pixels)
            print rv
    outimg.save("image_output.png")

def calculate_deviations_for(col, row, opixels):
    a = fetch_pixels(col, row)
    border = border_pixels(a)
    bru, brd = mean_and_stddev(map(lambda x: x[0], border))
    bgu, bgd = mean_and_stddev(map(lambda x: x[1], border))
    bbu, bbd = mean_and_stddev(map(lambda x: x[2], border))
    rv = []
    for y in range(32):
        for x in range(32):
            r, g, b = a[x,y]
            dr = (bru-r) / brd
            dg = (bgu-g) / bgd
            db = (bbu-b) / bbd
            t = max(abs(dr), abs(dg), abs(db))
            opixel = 0
            limit, span = 2.5, 8.0
            if t > limit:
                v = min(1.0, (t - limit) / span)
                print t, v
                opixel = 127 + int(128 * v)
            opixels[col*32+x, row*32+y] = opixel
            rv.append(t)
    return (sum(rv) / float(len(rv)))
A visualization of the result is here:
Note that most of the non-ball pixels are pure black. It should now be possible to determine whether a ball is present or not by simply counting the black pixels. (Or more reliably: count the size of the largest single blob of non-black pixels.)
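For example, a minimal sketch of that counting step (assuming calculate_deviations() is tweaked to return outimg instead of only saving it; the 60-pixel threshold is a guess you would tune):
def has_ball(outimg, col, row, min_bright_pixels=60):
    # Decide ball / no ball by counting non-black pixels in one tile of the
    # deviation image produced above.
    opixels = outimg.load()
    bright = sum(1
                 for y in range(32)
                 for x in range(32)
                 if opixels[col * 32 + x, row * 32 + y] > 0)
    return bright >= min_bright_pixels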
Now, this is a very ad-hoc method and I certainly don't make any claim that it's the best method. The "limit" value was determined by experimentation -- essentially, by trial and error. It's included here to illustrate the sort of method I think you should be exploring, and to give you a starting point to tweak from. (If you want a place to start experimenting, you could try to make it give a better result for the top purple ball. Can you think of weaknesses in the approach above that might make it give a result like that? Always keep in mind, however, that you don't need a perfect-looking result, just one that's good enough. The final answer you want is "ball" or "no ball", and you just want to be able to answer that reliably.)
Note that:
You need to make sure you take the screengrab when the balls have finished rolling and are lying still in the center of their tiles. This simplifies the problem immensely.
The game's background affects the problem -- if there are ocean-themed or desert-themed levels coming up, you will need to test and possibly tweak the recognizer to make sure it still reliably works.
Special effects and/or GUI elements that cover the playing field will complicate the problem. (E.g. consider if the game has a 'cloud' or 'smoke' effect that sometimes floats over the playing field.) You may want to tweak the recognizer to be able to return "no result" if it's not sure -- then you can try another screengrab later. You may want to take several screengrabs and average the results.
I have assumed that there are only balls and non-balls. If later levels have other kinds of objects, you will have to experiment more to find out how to best recognize those.
I haven't used the 'reference picture' approach. However, if you have an image containing all the objects in the game and you can exactly align the pixels with your tiles, that's likely going to be the most reliable approach. Instead of comparing the foreground to the sampled background, compare the foreground to a set of known foreground images.
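A minimal sketch of that comparison, reusing fetch_pixels from above (the reference tile images and the acceptance threshold are placeholders):
def ssd(pixels_a, pixels_b, size=32):
    # Sum of squared RGB differences between two aligned 32x32 tiles.
    return sum((pixels_a[x, y][c] - pixels_b[x, y][c]) ** 2
               for x in range(size) for y in range(size) for c in range(3))

def classify_tile(col, row, references, threshold=200000):
    # references maps a label ('red_ball', 'empty', ...) to the pixel access
    # object of an aligned 32x32 reference tile.
    tile = fetch_pixels(col, row)
    label, score = min(((name, ssd(tile, ref)) for name, ref in references.items()),
                       key=lambda t: t[1])
    return label if score < threshold else None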

Related

Calculating the nearest neighbour in a 2d grid using multilevel solution

I have a problem where, in a grid of x*y size, I am provided a single dot and need to find its nearest neighbour. In practice, I am trying to find the closest dot to the cursor in pygame that crosses a color distance threshold, calculated as follows:
sqrt(((rgb1[0]-rgb2[0])**2)+((rgb1[1]-rgb2[1])**2)+((rgb1[2]-rgb2[2])**2))
So far I have a function that calculates the different resolutions for the grid and reduces it by a factor of two while always maintaining the darkest pixel. It looks as follows:
from PIL import Image
from typing import Dict
import numpy as np
#we input a pillow image object and retrieve a dictionary with every grid version of the 3 dimensional array:
def calculate_resolutions(image: Image) -> Dict[int, np.ndarray]:
    resolutions = {}
    #we start with the highest resolution image, the size of which we initially divide by 1, then 2, then 4 etc.:
    divisor = 1
    #reduce the grid by 5 iterations
    resolution_iterations = 5
    for i in range(resolution_iterations):
        pixel_lookup = image.load() #convert image to PixelValues object, which allows for pixel lookup via [x,y] index
        #calculate the resolution of the new grid, round upwards:
        resolution = (int((image.size[0] - 1) // divisor + 1), int((image.size[1] - 1) // divisor + 1))
        #generate 3d array with new grid resolution, fill in values that are darker than white:
        new_grid = np.full((resolution[0], resolution[1], 3), np.array([255, 255, 255]))
        for x in range(image.size[0]):
            for y in range(image.size[1]):
                if not x % divisor and not y % divisor:
                    darkest_pixel = (255, 255, 255)
                    x_range = divisor if x + divisor < image.size[0] else (0 if image.size[0] - x < 0 else image.size[0] - x)
                    y_range = divisor if y + divisor < image.size[1] else (0 if image.size[1] - y < 0 else image.size[1] - y)
                    for x_ in range(x, x + x_range):
                        for y_ in range(y, y + y_range):
                            if pixel_lookup[x_, y_][0] + pixel_lookup[x_, y_][1] + pixel_lookup[x_, y_][2] < darkest_pixel[0] + darkest_pixel[1] + darkest_pixel[2]:
                                darkest_pixel = pixel_lookup[x_, y_]
                    if darkest_pixel != (255, 255, 255):
                        new_grid[int(x / divisor)][int(y / divisor)] = np.array(darkest_pixel)
        resolutions[i] = new_grid
        divisor = divisor * 2
    return resolutions
This is the most performance-efficient solution I was able to come up with. If this function is run on a grid that changes continually, like a video at x fps, it will be very performance intensive. I also considered using a kd-tree that simply adds and removes any dots that change on the grid, but for finding individual nearest neighbours on a static grid this multilevel solution has the potential to be more resource efficient. I am open to any kind of suggestion for how this function could be improved in terms of performance.
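(For reference, the brute-force check I am trying to beat can be written with numpy roughly like this; grid is assumed to be an H x W x 3 array of the current frame and target_rgb whatever colour each dot is compared against:)
import numpy as np

def nearest_dot_bruteforce(grid, cursor_xy, target_rgb, threshold):
    # Return the (x, y) of the closest pixel to the cursor whose colour
    # distance to target_rgb crosses the threshold, or None.
    diff = grid.astype(np.float64) - np.asarray(target_rgb, dtype=np.float64)
    dist = np.sqrt((diff ** 2).sum(axis=2))   # the colour distance above, per pixel
    ys, xs = np.nonzero(dist < threshold)     # candidate dots
    if len(xs) == 0:
        return None
    cx, cy = cursor_xy
    i = np.argmin((xs - cx) ** 2 + (ys - cy) ** 2)
    return int(xs[i]), int(ys[i])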
Now, I am in a position where, for example, I try to find the nearest neighbour of the current cursor position in a 100x100 grid. The resulting reduced grids are 50^2, 25^2, 13^2, and 7^2. In a situation where a part of the grid looks as follows:
And I am on an aggregation step where a part of the grid consists of six large squares, with the black one being the current cursor position and the orange dots being dots where the color distance threshold is crossed, I would not know which diagonally located closest neighbour to pick for the next search step. In this case, going one aggregation step down shows that the lower left would be the right choice. Depending on how many grid layers I have, this could result in a very large error in the nearest neighbour search. Is there a good way to solve this problem? If there are multiple squares that contain a relevant location, do I have to search them all in the next step to be sure? And if so, the further away I get, the more I would need math such as the Pythagorean theorem to check whether the two positive squares I find overlap in terms of distance and could potentially contain the closest neighbour, which would again become performance intensive if the function is called frequently. Would it still make sense to pursue this solution over a regular kd-tree? For now the grid size is still fairly small (~800x600), but if the grid gets larger the performance may start suffering again. Is there a good, scalable solution that could be applied here?
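(For comparison, the kd-tree route I mention above is only a few lines with scipy, if scipy is acceptable; a sketch, with the same assumed grid/target_rgb arguments as before:)
import numpy as np
from scipy.spatial import cKDTree

def nearest_dot_kdtree(grid, cursor_xy, target_rgb, threshold):
    # grid: H x W x 3 uint8 array; returns the (x, y) of the nearest
    # threshold-crossing pixel to the cursor, or None.
    diff = grid.astype(np.float64) - np.asarray(target_rgb, dtype=np.float64)
    dist = np.sqrt((diff ** 2).sum(axis=2))
    ys, xs = np.nonzero(dist < threshold)
    if len(xs) == 0:
        return None
    dots = np.column_stack((xs, ys))
    tree = cKDTree(dots)          # rebuild (or maintain incrementally) when the grid changes
    _, idx = tree.query(cursor_xy)
    return tuple(dots[idx])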

Algorithm to successively deliver colors of maximum contrast

I need to generate colors to use to highlight types of content. There may be 10s of color shades in one document.
My use case is that I wish to inform scientists about how data has been extracted from an input document. Color-coded background shading (with descriptive tooltips) will be used to highlight the imported blocks. Here is a mock input file:
And here is a mock version of how it would look once highlighted to indicate which fields were imported. A tooltip on each highlighted block would give further detail on the tool that recognised it, plus the value (with units) that was pushed to the database:
Each time I encounter a new type of content, I need to generate a new color. That color should have maximum contrast to the existing colors. Clearly, the further we go, the less contrast there will be.
In trying to imagine a solution to this, I've imagined a color wheel. We start with one color. For the next color to have maximum contrast, it will be opposite the first on the wheel.
For each successive color, an algorithm will have to look for the largest "unoccupied" arc on the color wheel and generate the color at the mid-point of it.
Does this seem like any existing color generation strategy?
If so, are there any documented algorithms to implement it?
(My target environment is Python, but that seems a mere implementation detail)
You want to have colors equidistant, and as far away as possible from each other, with white and black already inserted as used.
An easy and surprisingly good metric to use is the "redmean" distance: with r̄ = (R1 + R2)/2, the distance between two colours is sqrt((2 + r̄/256)·ΔR² + 4·ΔG² + (2 + (255 − r̄)/256)·ΔB²).
If you're doing it on the fly, you'd have to recalculate the positions of your colors with each new color you add, and if you know beforehand how many colors there are, you can calculate their positions all at once.
Here's an example implementation in python:
import numpy as np
from itertools import combinations
from scipy.optimize import minimize, Bounds
BLACK_AND_WHITE = np.array((0.0, 0.0, 0.0, 255.0, 255.0, 255.0))
We consider each three consecutive numbers in the array as representing a color. Given two such triplets, and using the distance defined above, our distance function is:
def mean_red_distance(c1, c2):
    r = (c1[0] + c2[0]) / 2.0
    coeffs = np.array(((2.0 + r/256), 4, (2.0 + (255 - r)/256.0)))
    diff = c1 - c2
    return np.sqrt(np.sum(coeffs * diff ** 2))
We want to maximize the minimal distance between all color pairs, which is the same as minimizing the negative of the minimal distance. To get the pairs, we use combinations(..., 2), which does just that, and to make it iterate over the triplets we reshape the colors array so that each row contains one color:
def cost_function(x):
    colors = np.concatenate((BLACK_AND_WHITE, x)).reshape(-1, 3)
    return -min(mean_red_distance(color_pairs[0], color_pairs[1]) for color_pairs in combinations(colors, 2))
Now it's time to minimize our cost function; the colors are allowed to range between 0 and 255:
def get_new_colors_after_adding(existing_colors):
    if len(existing_colors):
        guess = np.mod(existing_colors.reshape(-1, 3)[0] + np.array((100, 100, 100)), 256)
    else:
        guess = np.array((0, 255, 255))
    guess = np.concatenate((guess, existing_colors))
    # let all colors range between 0 and 255
    result = minimize(cost_function, guess, bounds=Bounds(0, 255))
    if not result.success:
        raise ValueError('Failed adding new color')
    return result.x
And finally, we add one color at a time for 10 steps and print the resulting triplets:
if __name__ == '__main__':
    # start with no colors
    existing_colors = np.empty(0, dtype=np.int32)
    # example of consecutively adding colors
    for i in range(10):
        existing_colors = get_new_colors_after_adding(existing_colors)
        print(np.round(existing_colors.reshape(-1, 3)).astype(int))
Aah, I've thought of an alternate strategy for this problem.
Instead of generating colors on the fly, the color-coding could be deferred until the end of the process.
Once we know how many colors are required, we can generate that many colors evenly spaced throughout the hues of the HSB/HSV color space. That would provide the most contrast, I think.
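A minimal sketch of that deferred strategy (using colorsys; the fixed saturation and value are assumptions):
import colorsys

def evenly_spaced_colours(n, s=0.85, v=0.95):
    # Return n RGB triples with hues spread evenly around the HSV wheel.
    colours = []
    for i in range(n):
        h = i / n  # hue in [0, 1)
        r, g, b = colorsys.hsv_to_rgb(h, s, v)
        colours.append((int(r * 255), int(g * 255), int(b * 255)))
    return colours

print(evenly_spaced_colours(10))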

Count objects in a binarized image without scipy

I'm trying to count the number of objects in an image that I have already binarized. However, I'm not allowed to use the scipy or numpy packages, so I can't use scipy.ndimage.label. Any ideas? My attempt counts over 80 objects, but there are only 13 (counted with scipy).
def label(img):
    n = 1
    for i in range(h):
        for j in range(c):
            if img[i][j] == 255:
                if img[i-1][j] != 0 and img[i-1][j] != 255:
                    img[i][j] = img[i-1][j]
                elif img[i+1][j] != 0 and img[i+1][j] != 255:
                    img[i][j] = img[i-1][j]
                elif img[i][j+1] != 0 and img[i][j+1] != 255:
                    img[i][j] = img[i][j+1]
                elif img[i][j-1] != 0 and img[i][j-1] != 255:
                    img[i][j] = img[i][j-1]
                else:
                    img[i][j] = n
                    if img[i-1][j] != 0:
                        img[i-1][j] = img[i][j]
                    if img[i+1][j] != 0:
                        img[i+1][j] = img[i][j]
                    if img[i][j+1] != 0:
                        img[i][j+1] = img[i][j]
                    if img[i][j-1] != 0:
                        img[i][j-1] = img[i][j]
                    n += 1
            elif img[i][j] != 0:
                if img[i-1][j] != 0:
                    img[i-1][j] = img[i][j]
                if img[i+1][j] != 0:
                    img[i+1][j] = img[i][j]
                if img[i][j+1] != 0:
                    img[i][j+1] = img[i][j]
                if img[i][j-1] != 0:
                    img[i][j-1] = img[i][j]
    return img, n
You will want something like https://codereview.stackexchange.com/questions/148897/floodfill-algorithm, which implements https://en.wikipedia.org/wiki/Flood_fill.
It's a good fit for numba or cython if that's feasible for you.
Perhaps you can use OpenCV, which already offers floodfill: https://docs.opencv.org/3.4/d7/d1b/group__imgproc__misc.html#gaf1f55a048f8a45bc3383586e80b1f0d0.
Suppose you have binarized so background is color one and objects are color zero. Set c = 2, scan for a zero pixel, and floodfill it with color c.
Now increment c, scan for zero, fill it, lather, rinse, repeat.
You will wind up with each object bearing a distinct color so you can use it as an isolation mask.
Distinct colors are very helpful during debugging, but of course three colors suffice (or even two) if you just want a count.
The final bitmap will be uniformly the background color in the two-color case.
Using a 4-element Von Neumann neighborhood versus an 8-element neighborhood will make a big difference in the final result.
It's easier for paint to "leak" through diagonal connectivity in the 8-element setting.
Doing edge detection and thickening can help to reduce unwanted color leakage.
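Since scipy and numpy are off the table, the scan-and-fill counting described above can be sketched in pure Python roughly like this (assuming img is a list of lists with 255 for object pixels and 0 for background, using a 4-element neighbourhood and an explicit stack to avoid recursion limits):
def count_objects(img, object_value=255):
    # Count connected components of object_value pixels using iterative
    # flood fill. img is modified in place: each object gets its own label.
    h, w = len(img), len(img[0])
    count = 0
    for i in range(h):
        for j in range(w):
            if img[i][j] != object_value:
                continue
            count += 1
            label = object_value + count  # any value distinct from 0 and 255
            stack = [(i, j)]
            while stack:
                y, x = stack.pop()
                if 0 <= y < h and 0 <= x < w and img[y][x] == object_value:
                    img[y][x] = label
                    stack.extend(((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)))
            # every pixel of this object is now labelled, so the outer scan
            # will not count it again
    return count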

Problems with using a rough greyscale algorithm?

So I'm designing a few programs for editing photos in Python using PIL, and one of them converts an image to greyscale (I'm avoiding the use of any conversion functions from PIL).
The algorithm I've employed is simple: for each pixel (colour-depth is 24), I've calculated the average of the R, G and B values and set the RGB values to this average.
My program was producing greyscale images which seemed accurate, but I was wondering if I'd employed the correct algorithm, and I came across this answer to a question, where it seems that the 'correct' algorithm is to calculate 0.299 R + 0.587 G + 0.114 B.
I decided to compare my program to this algorithm. I generated a greyscale image using my program and another one (using the same input) from a website online (the top Google result for 'image to grayscale').
To my naked eye, it seemed that they were exactly the same, and if there was any variation, I couldn't see it. However, I decided to use this website (top Google result for 'compare two images online') to compare my greyscale images. It turned out that deep in the pixels, they had slight variations, but none which were perceivable to the human eye at a first glance (differences can be spotted, but usually only when the images are laid upon each other or switched between within milliseconds).
My Questions (the first is the main question):
Are there any disadvantages to using my 'rough' greyscale algorithm?
Does anyone have any input images where my greyscale algorithm would produce a visibly different image to the one that would be 'correct' ?
Are there any colours/RGB combinations for which my algorithm won't work as well?
My key piece of code (if needed):
def greyScale(pixelTuple):
    return tuple([round(sum(pixelTuple) / 3)] * 3)
The 'correct' algorithm (which seems to heavily weight green):
def greyScale(pixelTuple):
    return tuple([round(0.299 * pixelTuple[0] + 0.587 * pixelTuple[1] + 0.114 * pixelTuple[2])] * 3)
My input image:
The greyscale image my algorithm produces:
The greyscale image which is 'correct':
When the greyscale images are compared online (highlighted red are the differences, using a fuzz of 10%):
Despite the variations in pixels highlighted above, the greyscale images above appear as nearly the exact same (at least, to me).
Also, regarding my first question, if anyone's interested, this site has done some analysis on different algorithms for conversions to greyscale and also has some custom algorithms.
EDIT:
In response to @Szulat's answer, my algorithm actually produces this image instead (ignore the bad cropping, the original image had three circles but I only needed the first one):
In case people are wondering what the reason for converting to greyscale is (as it seems that the algorithm depends on the purpose), I'm just making some simple photo editing tools in python so that I can have a mini-Photoshop and don't need to rely on the Internet to apply filters and effects.
Reason for Bounty: Different answers here are covering different things, which are all relevant and helpful. This makes it quite difficult to choose which answer to accept. I've started a bounty because I like a few answers listed here, but also because it'd be nice to have a single answer which covers everything I need for this question.
The images look pretty similar, but your eye can tell the difference, especially if you put one in place of the other:
For example, you can note that the flowers in the background look brighter in the averaging conversion.
It is not that there is anything intrinsically "bad" about averaging the three channels. The reason for that formula is that we do not perceive red, green and blue equally, so their contributions to the intensities in a grayscale image shouldn't be the same; since we perceive green more intensely, green pixels should look brighter on grayscale. However, as commented by Mark there is no unique perfect conversion to grayscale, since we see in color, and in any case everyone's vision is slightly different, so any formula will just try to make an approximation so pixel intensities feel "right" for most people.
The most obvious example:
Original
Desaturated in Gimp (Lightness mode - this is what your algorithm does)
Desaturated in Gimp (Luminosity mode - this is what our eyes do)
So, don't average RGB. Averaging RGB is simply wrong!
(Okay, you're right, averaging might be valid in some obscure applications, even though it has no physical or physiological meaning when RGB values are treated as color. By the way, the "regular" way of doing weighted averaging is also incorrect in a more subtle way because of gamma: sRGB should first be linearized and then the final result converted back to sRGB (which would be equivalent to retrieving the L component in the Lab color space).)
You can use any conversion equation, scale, linearity. The one you found:
I = 0.299 R + 0.587 G + 0.114 B
is based on the average human eye's perception sensitivity to the primary colors (R, G, B) - at least for the time period and population/hardware it was created for; bear in mind that those standards were created before LED, TFT, etc. screens.
There are several problems you are fighting against:
our eyes are not the same
All humans do not perceive color the same way. There are major discrepancies between genders and smaller ones between regions; even generation and age play a role. So even an average should be handled as just an "average".
We also have different sensitivity to the intensity of light across the visible spectrum. The most sensitive color is green (hence the highest weight on it). But the XYZ curve peaks can be at different wavelengths for different people (mine are shifted a bit, causing a difference in recognition of certain wavelengths, like some shades of aqua: some see them as green, some as blue, even if none of them have any color blindness or other disability).
monitors do not use the same wavelengths nor spectral dispersion
So if you take 2 different monitors, they might use slightly different wavelengths for R, G, B, or even different widths of the spectral filter (just use a spectroscope and see). Yes, they should be "normalized" by the hardware, but that is not the same as using normalized wavelengths. It is similar to the problems of using RGB vs. white-noise-spectrum light sources.
monitor linearity
Humans do not see light on a linear scale: we are usually logarithmic/exponential (depends how you look at it), so yes, we can normalize that with hardware (or even software), but the problem is that if we linearize for one human, we damage it for another.
If you take all this together you can either use averages ... or special (and expensive) equipment to measure/normalize against some standard or against a calibrated person (depends on the industry).
But that is too much to handle in home conditions, so leave all that to industry and use the weights for the "average" like most of the world does... Luckily, our brain can handle it, as you cannot see the difference unless you start comparing both images side by side or in an animation :). So I (would) do:
I = 0.299 R + 0.587 G + 0.114 B
R = I
G = I
B = I
There are many different methods for converting to greyscale, and they do give different results though the differences might be easier to see with different input colour images.
As we don't really see in greyscale, the "best" method is somewhat dependent on the application and somewhat in the eye of the beholder.
The alternative formula you refer to is based on the human eye being more sensitive to variations in green tones and therefore giving them a bigger weighting - similarly to a Bayer array in a camera where there are 2 green pixels for each red and blue one. Wiki - Bayer array
There are many formulas for the Luminance, depending on the R,G,B color primaries:
Rec.601/NTSC: Y = 0.299*R + 0.587*G + 0.114*B ,
Rec.709/EBU: Y = 0.213*R + 0.715*G + 0.072*B ,
Rec.2020/UHD: Y = 0.263*R + 0.678*G + 0.059*B .
This is all because our eyes are less sensitive to blue than to red, and less sensitive to red than to green.
That being said, you are probably calculating Luma, not Luminance, so the formulas are all wrong anyway. For Constant-Luminance you must convert to linear-light
R = R' ^ 2.4 , G = G' ^ 2.4 , B = B' ^ 2.4 ,
apply the Luminance formula, and convert back to the gamma domain
Y' = Y ^ (1/2.4) .
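For concreteness, that three-step procedure for one pixel might be sketched like this (a minimal sketch using the 2.4 exponent from above rather than the exact piecewise sRGB curve, with the Rec.709 weights):
def grey_constant_luminance(r, g, b, weights=(0.213, 0.715, 0.072)):
    # r, g, b are gamma-encoded values in 0..255 (R', G', B')
    # convert to linear light
    lin = [(c / 255.0) ** 2.4 for c in (r, g, b)]
    # apply the Luminance formula
    y = sum(w * c for w, c in zip(weights, lin))
    # convert back to the gamma domain
    return round((y ** (1 / 2.4)) * 255)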
Also, consider that converting a 3D color space to a 1D quantity loses 2/3 of the information, which can bite you in the next processing steps. Depending on the problem, sometimes a different formula is better, like V = MAX(R,G,B) (from HSV color space).
How do I know? I'm a follower and friend of Dr. Poynton.
The answers provided are enough, but I want to discuss this topic a bit more, in a different manner.
Since I learned digital painting out of interest, I more often use HSV.
HSV is much more controllable to use while painting but, to keep it short, the main point is S: saturation, which separates the concept of color from the light. Turning S down to 0 already gives the 'computer' greyscale of an image.
from PIL import Image
import colorsys

def togrey(img):
    if isinstance(img, Image.Image):
        r, g, b = img.split()
        R = []
        G = []
        B = []
        for rd, gn, bl in zip(r.getdata(), g.getdata(), b.getdata()):
            h, s, v = colorsys.rgb_to_hsv(rd/255., gn/255., bl/255.)
            s = 0
            _r, _g, _b = colorsys.hsv_to_rgb(h, s, v)
            R.append(int(_r*255.))
            G.append(int(_g*255.))
            B.append(int(_b*255.))
        r.putdata(R)
        g.putdata(G)
        b.putdata(B)
        return Image.merge('RGB', (r, g, b))
    else:
        return None

a = Image.open('../a.jpg')
b = togrey(a)
b.save('../b.jpg')
This method truly preserves the 'brightness' of the original color, but without considering how the human eye processes the data.
In answer to your main question, there are disadvantages in using any single measure of grey. It depends on what you want from your image. For example, if you have colored text on a white background and you want to make the text stand out, you can use the minimum of the R, G, B values as your measure. But if you have black text on a colored background, you can use the maximum of the values for the same result. In my software I offer the choice of the max, min or median value for the user to pick. The results on continuous-tone images are also illuminating.
In response to comments asking for more details, the code for a pixel is below (without any defensive measures).
int Ind0[3] = {0, 1, 2}; //all equal
int Ind1[3] = {2, 1, 0}; // top, mid ,bot from mask...
int Ind2[3] = {1, 0, 2};
int Ind3[3] = {1, 2, 0};
int Ind4[3] = {0, 2, 1};
int Ind5[3] = {2, 0, 1};
int Ind6[3] = {0, 1, 2};
int Ind7[3] = {-1, -1, -1}; // not possible
int *Inds[8] = {Ind0, Ind1, Ind2, Ind3, Ind4, Ind5, Ind6, Ind7};
void grecolor(unsigned char *rgb, int bri, unsigned char *grey)
{   //pick out bot, mid or top according to bri flag
    int r = rgb[0];
    int g = rgb[1];
    int b = rgb[2];
    int mask = 0;
    mask |= (r > g);
    mask <<= 1;
    mask |= (g > b);
    mask <<= 1;
    mask |= (b > r);
    grey[0] = rgb[Inds[mask][2 - bri]]; // 2, 1, 0 give bot, mid, top
}
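(In Python the same bottom/middle/top selection can be sketched without the lookup tables; bri = 0, 1, 2 picks the bottom, middle or top channel, as in the C code above:)
def grey_value(rgb, bri):
    # Return the bottom (bri=0), middle (bri=1) or top (bri=2) of the three
    # channel values, mirroring the C snippet above.
    return sorted(rgb)[bri]

grey_value((30, 200, 90), 1)  # -> 90, the median channel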

PIL: Is it possible to fill a mask?

Suppose I have a mask (a blank triangle - 1/255/etc in the triangle, 0 elsewhere) and I have the texture I'd like to place inside that triangle, but the texture is sequential instead of being formatted into an image format. For example, if the box containing the triangle/mask is 100 x 100, but the triangle itself only has 2500 pixels, I only have the information for the 2500 pixels instead of having the actual box.
So, I could fill the triangle manually, either doing each row or column at a time, but I was wondering if there was a method to do this instead.
Here's the code I used anyways:
def fill_mask(mask, data):  # mask and data are ndarrays
    mask = np.copy(mask).T
    k = 0
    for i in xrange(len(mask)):
        for j in xrange(len(mask[i])):
            if mask[i][j] == 1:
                mask[i][j] = data[k]
                k += 1
    return mask.T
That one fills horizontally (line by line). To fill vertically, take away the .T in the first and last lines. It can probably be made shorter though I'm awful with that so I'll just leave it as it is. Any improvements to it are appreciated as well.
mask[mask==1] = data
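For context, a minimal example of that one-liner (assuming mask is a numpy array of 0s and 1s and data has exactly as many values as there are 1s; the 1-pixels are filled in row-major order):
import numpy as np

mask = np.array([[0, 1, 0],
                 [1, 1, 1],
                 [0, 1, 0]])
data = np.array([10, 20, 30, 40, 50])

filled = mask.copy()
filled[filled == 1] = data  # fills the 1-pixels row by row
print(filled)
# [[ 0 10  0]
#  [20 30 40]
#  [ 0 50  0]]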
It sounds like you are asking if you can fill/modify each pixel of the triangle using an array of values without also having to visit the pixels which you don't want to fill using only the image data.
If that's so, then the short answer is no.
Certainly it would be possible to create some optimization array that only referenced those pixels which were part of the triangle, but in order to create that array, you'd have to visit each pixel, so it would only be a savings if you have to visit the same set many times.
PIL probably provides some helpers to do blending that might be optimized, which would be better than trying to roll your own.
If on the other hand, you know the dimensions and position of the triangle in your mask, you could calculate the position of the pixels inside the triangle. For that you'll need to study your trigonometry.
If you don't know how to do this already, I'd say stick with visiting each pixel, it will be a good learning experience. If you need to improve performance later, the path will be clearer once you understand the basic concepts.
