Problems with using a rough greyscale algorithm? - python
So I'm designing a few programs for editing photos in Python using PIL, and one of them converts an image to greyscale (I'm avoiding PIL's built-in conversion functions).
The algorithm I've employed is simple: for each pixel (colour-depth is 24), I've calculated the average of the R, G and B values and set the RGB values to this average.
My program was producing greyscale images which seemed accurate, but I was wondering if I'd employed the correct algorithm, and I came across this answer to a question, where it seems that the 'correct' algorithm is to calculate 0.299 R + 0.587 G + 0.114 B.
I decided to compare my program against this algorithm. I generated a greyscale image using my program and another one (from the same input) using a website online (the top Google result for 'image to grayscale').
To my naked eye they seemed exactly the same, and if there was any variation, I couldn't see it. However, I decided to use this website (the top Google result for 'compare two images online') to compare my greyscale images. It turned out that, deep in the pixels, they had slight variations, but none perceptible to the human eye at first glance (the differences can be spotted, but usually only when the images are laid on top of each other or switched between within milliseconds).
My Questions (the first is the main question):
Are there any disadvantages to using my 'rough' greyscale algorithm?
Does anyone have any input images for which my greyscale algorithm would produce a visibly different image from the 'correct' one?
Are there any colours/RGB combinations for which my algorithm won't work as well?
My key piece of code (if needed):
def greyScale(pixelTuple):
    return tuple([round(sum(pixelTuple) / 3)] * 3)
The 'correct' algorithm (which seems to heavily weight green):
def greyScale(pixelTuple):
    return tuple([round(0.299 * pixelTuple[0] + 0.587 * pixelTuple[1] + 0.114 * pixelTuple[2])] * 3)
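For context, here's a minimal sketch of how either per-pixel function might be applied across a whole image with PIL (the helper name apply_greyscale and the use of load() are my own illustration, not part of the original program):

from PIL import Image

def apply_greyscale(path_in, path_out, pixel_func):
    # apply a per-pixel greyscale function (like either greyScale above) to an RGB image
    img = Image.open(path_in).convert('RGB')
    pixels = img.load()
    width, height = img.size
    for y in range(height):
        for x in range(width):
            pixels[x, y] = pixel_func(pixels[x, y])
    img.save(path_out)

# e.g. apply_greyscale('input.png', 'grey.png', greyScale)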
My input image:
The greyscale image my algorithm produces:
The greyscale image which is 'correct':
When the greyscale images are compared online (highlighted red are the differences, using a fuzz of 10%):
Despite the pixel variations highlighted above, the greyscale images appear nearly identical (at least to me).
Also, regarding my first question, if anyone's interested, this site has done some analysis on different algorithms for conversions to greyscale and also has some custom algorithms.
EDIT:
In response to @Szulat's answer, my algorithm actually produces this image instead (ignore the bad cropping; the original image had three circles but I only needed the first one):
In case people are wondering what the reason for converting to greyscale is (as it seems that the algorithm depends on the purpose), I'm just making some simple photo editing tools in python so that I can have a mini-Photoshop and don't need to rely on the Internet to apply filters and effects.
Reason for Bounty: Different answers here are covering different things, which are all relevant and helpful. This makes it quite difficult to choose which answer to accept. I've started a bounty because I like a few answers listed here, but also because it'd be nice to have a single answer which covers everything I need for this question.
The images look pretty similar, but your eye can tell the difference, especially if you put one in place of the other:
For example, you can note that the flowers in the background look brighter in the averaging conversion.
It is not that there is anything intrinsically "bad" about averaging the three channels. The reason for that formula is that we do not perceive red, green and blue equally, so their contributions to the intensities of a grayscale image shouldn't be the same; since we perceive green more intensely, green pixels should look brighter in grayscale. However, as Mark commented, there is no unique perfect conversion to grayscale, since we see in color, and in any case everyone's vision is slightly different, so any formula is just an approximation that makes pixel intensities feel "right" for most people.
The most obvious example:
Original
Desaturated in Gimp (Lightness mode - this is what your algorithm does)
Desaturated in Gimp (Luminosity mode - this is what our eyes do)
So, don't average RGB. Averaging RGB is simply wrong!
(Okay, you're right, averaging might be valid in some obscure applications, even though it has no physical or physiological meaning when RGB values are treated as color. By the way, the "regular" way of doing weighted averaging is also incorrect in a more subtle way because of gamma: sRGB should first be linearized and the final result converted back to sRGB, which would be equivalent to retrieving the L component in the Lab color space.)
You can use any conversion equation, scale, linearity. The one you found:
I = 0.299 R + 0.587 G + 0.114 B
is based on the average human eye's perceptual sensitivity to the primary colors (R, G, B), at least for the time period and population/hardware it was created for (bear in mind that those standards were created before LED, TFT, etc. screens).
There are several problems you are fighting against:
our eyes are not the same
Humans do not all perceive color the same way. There are major discrepancies between genders and smaller ones between regions; even generation and age play a role. So even an average should be handled as just that: an average.
We have different sensitivity to the intensity of light across the visible spectrum. The most sensitive color is green (hence its highest weight). But the peaks of the XYZ sensitivity curves can sit at different wavelengths for different people (mine are shifted a bit, which changes how I recognize certain wavelengths: some shades of aqua look green to some people and blue to others, even when none of them have any color-vision deficiency).
monitors do not use the same wavelengths nor spectral dispersion
If you take 2 different monitors, they might use slightly different wavelengths for R, G, B, or even different widths of spectral filter (just use a spectroscope and see). Yes, they should be "normalized" by the hardware, but that is not the same as using normalized wavelengths. It is similar to the problems of using RGB versus white-noise-spectrum light sources.
monitor linearity
Humans do not see light on a linear scale: we are usually logarithmic/exponential (depends how you look at it). Yes, we can normalize that with hardware (or even software), but the problem is that if we linearize for one person, we skew it for another.
If you take all this together you can either use averages ... or special (and expensive) equipment to measure/normalize against some standard or against a calibrated person (depends on the industry).
But that is too much to handle at home, so leave all that to industry and use the "average" weights like most of the world... Luckily our brain can handle it, as you cannot see the difference unless you start comparing both images side by side or in an animation :). So I (would) do:
I = 0.299 R + 0.587 G + 0.114 B
R = I
G = I
B = I
There are many different methods for converting to greyscale, and they do give different results, though the differences might be easier to see with some input colour images than others.
As we don't really see in greyscale, the "best" method is somewhat dependent on the application and somewhat in the eye of the beholder.
The alternative formula you refer to is based on the human eye being more sensitive to variations in green tones and therefore giving them a bigger weighting - similarly to a Bayer array in a camera where there are 2 green pixels for each red and blue one. Wiki - Bayer array
There are many formulas for the Luminance, depending on the R,G,B color primaries:
Rec.601/NTSC: Y = 0.299*R + 0.587*G + 0.114*B ,
Rec.709/EBU: Y = 0.213*R + 0.715*G + 0.072*B ,
Rec.2020/UHD: Y = 0.263*R + 0.678*G + 0.059*B .
This is all because our eyes are less sensitive to blue than to red, and less sensitive to red than to green.
That being said, you are probably calculating luma, not luminance, so the formulas are all wrong anyway. For constant luminance you must first convert to linear light,
R = R' ^ 2.4 , G = G' ^ 2.4 , B = B' ^ 2.4 ,
apply the Luminance formula, and convert back to the gamma domain
Y' = Y ^ (1/2.4) .
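As a rough sketch of that constant-luminance pipeline in Python (using the Rec.709 weights quoted above and 8-bit inputs; the function name and the 255 scaling are my own assumptions):

def constant_luminance_grey(pixel):
    # decode gamma (exponent 2.4, as above), weight in linear light, re-encode
    r, g, b = (c / 255.0 for c in pixel)
    R, G, B = r ** 2.4, g ** 2.4, b ** 2.4       # linear light
    Y = 0.213 * R + 0.715 * G + 0.072 * B        # Rec.709 luminance
    y_gamma = Y ** (1 / 2.4)                     # back to the gamma domain
    v = round(y_gamma * 255)
    return (v, v, v)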
Also, consider that converting a 3D color space to a 1D quantity loses 2/3 of the information, which can bite you in the next processing steps. Depending on the problem, sometimes a different formula is better, like V = MAX(R,G,B) (from HSV color space).
How do I know? I'm a follower and friend of Dr. Poynton.
The answers provided are enough, but I want to discuss this topic a bit more, from a different angle.
Since I learned digital painting out of interest, I more often use HSV.
HSV is much more controllable to work with while painting but, to keep it short, the main point is S: saturation, which separates the concept of colour from the light. Turning S to 0 already gives the 'computer' greyscale of an image.
from PIL import Image
import colorsys

def togrey(img):
    if isinstance(img, Image.Image):
        r, g, b = img.split()
        R = []
        G = []
        B = []
        for rd, gn, bl in zip(r.getdata(), g.getdata(), b.getdata()):
            h, s, v = colorsys.rgb_to_hsv(rd/255., gn/255., bl/255.)
            s = 0
            _r, _g, _b = colorsys.hsv_to_rgb(h, s, v)
            R.append(int(_r*255.))
            G.append(int(_g*255.))
            B.append(int(_b*255.))
        r.putdata(R)
        g.putdata(G)
        b.putdata(B)
        return Image.merge('RGB', (r, g, b))
    else:
        return None

a = Image.open('../a.jpg')
b = togrey(a)
b.save('../b.jpg')
This method truly preserves the 'brightness' of the original colour. However, it doesn't consider how the human eye processes the data.
In answer to your main question, there are disadvantages to using any single measure of grey. It depends on what you want from your image. For example, if you have coloured text on a white background and want the text to stand out, you can use the minimum of the R, G, B values as your measure. But if you have black text on a coloured background, you can use the maximum of the values for the same result. In my software I offer the user the choice of the max, min or median value. The results on continuous-tone images are also illuminating.
In response to comments asking for more details, the code for a pixel is below (without any defensive measures).
int Ind0[3] = {0, 1, 2}; // all equal
int Ind1[3] = {2, 1, 0}; // top, mid, bot from mask...
int Ind2[3] = {1, 0, 2};
int Ind3[3] = {1, 2, 0};
int Ind4[3] = {0, 2, 1};
int Ind5[3] = {2, 0, 1};
int Ind6[3] = {0, 1, 2};
int Ind7[3] = {-1, -1, -1}; // not possible
int *Inds[8] = {Ind0, Ind1, Ind2, Ind3, Ind4, Ind5, Ind6, Ind7};

void grecolor(unsigned char *rgb, int bri, unsigned char *grey)
{   // pick out bot, mid or top according to bri flag
    int r = rgb[0];
    int g = rgb[1];
    int b = rgb[2];
    int mask = 0;
    mask |= (r > g);
    mask <<= 1;
    mask |= (g > b);
    mask <<= 1;
    mask |= (b > r);
    grey[0] = rgb[Inds[mask][2 - bri]]; // 2, 1, 0 give bot, mid, top
}
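For a rough Python equivalent of the same idea (the function name and the 3-tuple pixel format are assumptions on my part, not the author's code):

from statistics import median

def grey_by_rank(pixel, mode='median'):
    # pick the minimum, median or maximum channel as the grey value
    r, g, b = pixel
    if mode == 'min':
        v = min(r, g, b)
    elif mode == 'max':
        v = max(r, g, b)
    else:
        v = int(median((r, g, b)))
    return (v, v, v)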
Related
How to calculate SNR for dynamic range?
I am an absolute beginner when it comes to coding, so I don't really understand how to go about this. I know this question has been asked before, but I couldn't figure it out after checking those answers. I hope someone can help me.

My ultimate goal is to find out what dynamic range the Arri Amira actually uses; that is part of our project at uni (the specs say 14+). I have shots ranging from 0.5 to 22 f-stops. I also have a shot with the lens cap on (a darkframe, which is not 100% black, because there is noise). All frames are saved as TIFF (that's a requirement in our project).

Now I would first have to calculate the p(noise) of the darkframe, right? SNR = P(signal)/p(noise). And then use that value as p(noise) for the other images? I also don't know how to calculate p(noise). My idea is to compare the darkframe image to a darkframe that is actually completely black, maybe as an array filled with zeros. I found code that included the calculation of MSE values. Would that work with my idea? What I tried was this:

realDarkframe = np.zeros((1080, 1920, 3))
darkframeOfAmira = cv2.imread("image.tif", 1)

def snr(img1, img2):
    mse = np.mean((img1 - img2) ** 2)
    return mse

d = snr(realDarkframe, darkframeOfAmira)
print(d)

That returns the value 565.7092896412037. I don't know what that means, and I also don't know how to calculate p(signal) even if this were correct and 565.7092896412037 actually equalled p(noise).
The dynamic range of a measurement device (i.e. a camera) describes the ratio between the maximum and minimum measurable light intensities. Regarding this, two issues emerge in your implementation.

First, the dark-frame image that determines the lower bound of the dynamic range should not be plain zeros; that would imply an infinite dynamic range (dividing by zero in the SNR formula). Instead you need to capture a dark shot, as you suggest ("I also have a shot with the lens cap on (darkframe, which is not 100% black, because there is noise)"), and use this image as a reference. You also need an image captured at maximum intensity (total white - "burnt").

Second, the signal-to-noise computation in your code segment is the mean squared error. You should instead use the formula as described on Wikipedia: snr = P(signal)/P(noise) = mean(signal^2) / mean(noise^2). This code should work:

darkframeOfAmira = cv2.imread("dark_image.tif", 1)
brightframeOfAmira = cv2.imread("image.tif", 1)

def snr(img1, img2):
    return np.mean(img1 ** 2) / (np.mean(img2 ** 2) + 1e-12)

d = snr(brightframeOfAmira, darkframeOfAmira)
print(d)
How to speed up dilating a 3D region in a boolean numpy array?
I have a 3D numpy boolean mask which has been segmented from an MRI brain volume (brain voxels = True, everything else = False). What I would like to do is enlarge this mask so that it encompasses the surrounding tissues in the MRI volume, not just the segmented organ - perhaps a 10 mm rind of non-brain all around the brain.

I tried a 2D dilation using skimage.morphology.dilation with a diamond filter. While this is nice and fast for a single image, I need to repeat it over multiple slices through the volume, and in at least 2 planes, to come even close to uniformly dilating the 3D mask. I largely took my code from here: https://scipy-lectures.org/packages/scikit-image/index.html

typical volume shape = 512, 512, 270

# 1st pass in axial plane
(x, y, z) = np.shape(3dMask)
for slice_number in range(z):
    image_slice = 3dMask[:, :, slice_number]
    3dMask[:, :, slice_number] = morphology.binary_dilation(image_slice, morphology.diamond(30))

# repeat in coronal plane...

This works very nicely, with the desired effect in each slice, but it is very slow in 3D. I can speed things up by only dilating those slices containing at least one True, but that inevitably leaves 100+ slices in each plane. Still slow.

In the hope that the Python-side looping was slowing everything down, I looked for a 3D equivalent single function in numpy and skimage but found nothing I could recognise as useful. I toyed with the idea of finding the geometric centre and simply zooming the volume by 5%, but there would necessarily be holes in the mask (the space in between the two halves of the brain) which would no longer match up with the MRI volume, so that is of no use.

I assume this means I am doing it wrong, as I am new to both numpy and skimage. Is there a fast way to do this? Perhaps a 3D alternative to the 2D skimage dilation?
This question actually has a bit of subtlety, which I'll try to unpack.

The first thing to note is that most scikit-image functions actually work totally fine in 3D, including binary_dilation! So in an ideal world you should be able to do:

dilated = morphology.binary_dilation(
    mask3d, morphology.ball(radius=30)
)

I say in an ideal world because that crashes on my machine, probably because this longstanding SciPy bug prevents SciPy filters (which scikit-image uses under the hood) from working with large neighbourhood sizes.

For square- and diamond-shaped neighbourhoods, though, you do have a workaround: dilating once with a diamond of radius 30 is actually the same as dilating 30 times with a diamond of radius 1! You can do this manually in a for-loop, or you can use scipy.ndimage.binary_dilation with the iterations keyword argument. (See this issue for some discussion around this.)

from scipy import ndimage as ndi

# make a little 3D diamond:
diamond = ndi.generate_binary_structure(rank=3, connectivity=1)

# dilate 30x with it
dilated = ndi.binary_dilation(mask3d, diamond, iterations=30)

You can actually get pretty far with this strategy. For example, if your dataset doesn't have the same resolution in x, y, and z, maybe you want to dilate more, say twice as much, along x and y. You can do this in two steps:

dilated1 = ndi.binary_dilation(mask3d, diamond, iterations=15)
flat = np.copy(diamond)
flat[:, :, 0] = 0
flat[:, :, -1] = 0
dilated2 = ndi.binary_dilation(dilated1, flat, iterations=15)

Finally, note that binary dilation is equivalent to a (non-binary) convolution followed by thresholding above 0, so I found that this also works:

from scipy import signal

b = morphology.ball(radius=30)
dilated = signal.fftconvolve(mask3d, b, mode='same') > 0

However, for this image size and on my machine, this was slower than the iterated dilation. But it's worth keeping in mind, because the performance will be different for different datasets.

As a side note, I recommend posting complete, working code in your StackOverflow questions, as explained here. In your case, np.shape(3dMask) is a syntax error, since 3dMask is not a valid Python identifier! =)

I hope this helps!
Count objects in a binarized image without scipy
I'm trying to count the number of objects in an image which I have already binarized. However, I'm not allowed to use the scipy or numpy packages, so I can't use scipy.ndimage.label. Any ideas? My attempt counts over 80 objects, but there are only 13 (counted with scipy).

def label(img):
    n = 1
    for i in range(h):
        for j in range(c):
            if img[i][j] == 255:
                if img[i-1][j] != 0 and img[i-1][j] != 255:
                    img[i][j] = img[i-1][j]
                elif img[i+1][j] != 0 and img[i+1][j] != 255:
                    img[i][j] = img[i-1][j]
                elif img[i][j+1] != 0 and img[i][j+1] != 255:
                    img[i][j] = img[i][j+1]
                elif img[i][j-1] != 0 and img[i][j-1] != 255:
                    img[i][j] = img[i][j-1]
                else:
                    img[i][j] = n
                    if img[i-1][j] != 0:
                        img[i-1][j] = img[i][j]
                    if img[i+1][j] != 0:
                        img[i+1][j] = img[i][j]
                    if img[i][j+1] != 0:
                        img[i][j+1] = img[i][j]
                    if img[i][j-1] != 0:
                        img[i][j-1] = img[i][j]
                    n += 1
            elif img[i][j] != 0:
                if img[i-1][j] != 0:
                    img[i-1][j] = img[i][j]
                if img[i+1][j] != 0:
                    img[i+1][j] = img[i][j]
                if img[i][j+1] != 0:
                    img[i][j+1] = img[i][j]
                if img[i][j-1] != 0:
                    img[i][j-1] = img[i][j]
    return img, n
You will want something like https://codereview.stackexchange.com/questions/148897/floodfill-algorithm, which implements https://en.wikipedia.org/wiki/Flood_fill. It's a good fit for numba or cython, if that's feasible for you. Perhaps you can use OpenCV, which already offers floodfill: https://docs.opencv.org/3.4/d7/d1b/group__imgproc__misc.html#gaf1f55a048f8a45bc3383586e80b1f0d0.

Suppose you have binarized so that the background is color one and objects are color zero. Set c = 2, scan for a zero pixel, and floodfill it with color c. Now increment c, scan for another zero, fill it, lather, rinse, repeat. You will wind up with each object bearing a distinct color, so you can use it as an isolation mask. Distinct colors are very helpful during debugging, but of course three colors suffice (or even two) if you just want a count; in the two-color case the final bitmap will be uniformly the background color. A sketch of this counting scheme follows below.

Using a 4-element von Neumann neighborhood versus an 8-element neighborhood will make a big difference in the final result: it's easier for paint to "leak" through diagonal connectivity in the 8-element setting. Doing edge detection and thickening can help reduce unwanted color leakage.
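A minimal pure-Python sketch of that counting scheme (the labelling convention and the function name are my own; it assumes objects are 255 and background is 0, as in the question, and uses an explicit stack with a 4-connected neighborhood):

def count_objects(img, object_value=255):
    # img is a list of lists; flood-fill each object with a fresh label and count the fills
    h, w = len(img), len(img[0])
    label = 2          # labels start at 2 so they never collide with 0 (assumes < 253 objects)
    count = 0
    for sy in range(h):
        for sx in range(w):
            if img[sy][sx] != object_value:
                continue
            # iterative flood fill (4-connected) starting at (sy, sx)
            stack = [(sy, sx)]
            img[sy][sx] = label
            while stack:
                y, x = stack.pop()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and img[ny][nx] == object_value:
                        img[ny][nx] = label
                        stack.append((ny, nx))
            label += 1
            count += 1
    return count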
Rudimentary Computer Vision Techniques for Python Bot.
After completing several chapters of computer vision books, I decided to apply those methods to create a primitive bot for a game. I chose Fling, which has almost no dynamics, and all I needed to do was find the balls. Balls may have 5 different colors and can also be directed in any of 4 directions (depending on the location of the eyes). I cropped each block in the field so that I can just check each block for whether it contains a ball or not. My problem is that I'm not able to find balls correctly.

My first attempt was the following. I sum the RGB colors for each ball and get an [R, G, B] array. Then I sum the RGB colors for each block in the field. If a block's array is similar to a ball's array, I assume that the block contains a ball. The problem is that it's hard to find a good value for 'similarity': even different empty blocks vary significantly in such sums.

Second, I tried to use the openCV module, which has a matchTemplate function. This function matches an image against a source image and, along with the minMaxLoc function, returns a value maxLoc. If maxLoc is close to 1, the image is probably in the source image. I made all possible variations of balls (20 overall) and matched them against the entire field. This worked well, but unfortunately it sometimes misses some balls in the field or assigns two different types of ball (say green and yellow) to one ball. I tried to improve the process by matching balls not against the entire field but against each block (this has the advantage that it checks each block and should detect the correct number of balls in the field; matching against the entire field only gives one location for each color of ball, so if there are two balls of the same color, matchTemplate loses information about the second ball). Surprisingly, it still has false negatives/positives.

Probably there is a much easier way to solve this problem (maybe a library that I don't know about yet), but for now I can't find one. Any suggestions are welcome.
The balls seem pretty distinct in terms of colour. The problems you initially described seem to be related to some of the finer, random detail present in the image - especially in the background and in the different shading/poses of the ball.

On this basis, I would say you could simplify the task significantly by applying a set of pre-processing steps to "collapse" the range of colours in the image. There are any number of more principled ways of achieving accurate colour segmentation (which is what, more formally, you want to achieve) - but taking a more pragmatic view, here are a few quick'n'dirty hacks.

So, for example, we can initially smooth the image to reduce higher-frequency components, then convert to a normalised RGB representation, before finally posterizing it with a mean shift filtering step. Here is the code in Python, using the OpenCV bindings, that does all this in order:

import cv

# get original image
orig = cv.LoadImage('fling.png')

# show original
cv.ShowImage("orig", orig)

# blur a bit to remove higher frequency variation
cv.Smooth(orig, orig, cv.CV_GAUSSIAN, 5, 5)

# normalise RGB
norm = cv.CreateImage(cv.GetSize(orig), 8, 3)
red = cv.CreateImage(cv.GetSize(orig), 8, 1)
grn = cv.CreateImage(cv.GetSize(orig), 8, 1)
blu = cv.CreateImage(cv.GetSize(orig), 8, 1)
total = cv.CreateImage(cv.GetSize(orig), 8, 1)
cv.Split(orig, red, grn, blu, None)
cv.Add(red, grn, total)
cv.Add(blu, total, total)
cv.Div(red, total, red, 255.0)
cv.Div(grn, total, grn, 255.0)
cv.Div(blu, total, blu, 255.0)
cv.Merge(red, grn, blu, None, norm)
cv.ShowImage("norm", norm)

# posterize simply with mean shift filtering
post = cv.CreateImage(cv.GetSize(orig), 8, 3)
cv.PyrMeanShiftFiltering(norm, post, 20, 30)
cv.ShowImage("post", post)
Your task is simpler in several respects than the ones the general computer vision algorithms you'll find were designed for: you know exactly what to look for and you know exactly where to look for it. As such, I think involving an external library is an unnecessary complication, unless you're already familiar with it and can use it effectively as a tool to solve your own problem. In this post I will only use PIL.

First, split the task into two simpler tasks:

Given a tile, determine whether there's a ball there.

Given a tile where we're pretty sure there's a ball, identify the colour of the ball.

The second task should be simple and I won't spend time on it here. Basically, sample some pixels where the ball's main colour will be visible and compare the colours you find to the known ball colours.

So let's look at the first task. First off, note that the balls don't extend to the edge of the tiles. Thus you can find a fairly representative sample of the background of a tile, whether or not there's a ball there, by sampling the pixels along the edge of the tile. A simple way to proceed is to compare every pixel in a tile with this sample of the tile background, and to obtain some sort of measure of whether it's generally similar (no ball) or dissimilar (ball).

The following is one way to do this. The basic approach is to calculate the mean and the standard deviation of the background pixels -- separately for the red, green, and blue channels. For every pixel, we then calculate the number of standard deviations it is from the mean in every channel, and take the value for the most dissimilar channel as our measure of dissimilarity.

import Image
import math

def fetch_pixels(col, row):
    img = Image.open( "image.png" )
    img = img.crop( (col*32,row*32,(col+1)*32,(row+1)*32) )
    return img.load()

def border_pixels( a ):
    rv = [ a[x,y] for x in range(32) for y in (0,31) ]
    rv.extend( [ a[x,y] for x in (0,31) for y in range(1,31) ] )
    return rv

def mean_and_stddev( xs ):
    mean = float(sum( xs )) / len(xs)
    dev = math.sqrt( float(sum( [ (x-mean)**2 for x in xs ] )) / len(xs) )
    return mean, dev

def calculate_deviations(cols = 7, rows = 8):
    outimg = Image.new( "L", (cols*32,rows*32) )
    pixels = outimg.load()
    for col in range(cols):
        for row in range(rows):
            rv = calculate_deviations_for( col, row, pixels )
            print rv
    outimg.save( "image_output.png" )

def calculate_deviations_for( col, row, opixels ):
    a = fetch_pixels( col, row )
    border = border_pixels( a )
    bru, brd = mean_and_stddev( map( lambda x : x[0], border ) )
    bgu, bgd = mean_and_stddev( map( lambda x : x[1], border ) )
    bbu, bbd = mean_and_stddev( map( lambda x : x[2], border ) )
    rv = []
    for y in range(32):
        for x in range(32):
            r, g, b = a[x,y]
            dr = (bru-r) / brd
            dg = (bgu-g) / bgd
            db = (bbu-b) / bbd
            t = max(abs(dr), abs(dg), abs(db))
            opixel = 0
            limit, span = 2.5, 8.0
            if t > limit:
                v = min(1.0, (t - limit) / span)
                print t, v
                opixel = 127 + int( 128 * v )
            opixels[col*32+x,row*32+y] = opixel
            rv.append( t )
    return (sum(rv) / float(len(rv)))

A visualization of the result is here. Note that most of the non-ball pixels are pure black. It should now be possible to determine whether a ball is present or not by simply counting the black pixels (or, more reliably, the size of the largest single blob of non-black pixels).

Now, this is a very ad-hoc method and I certainly don't claim it's the best one. The "limit" value was determined by experimentation -- essentially, by trial and error.
It's included here to illustrate the sort of method I think you should be exploring, and to give you a starting point to tweak from. (If you want a place to start experimenting, you could try to make it give a better result for the top purple ball. Can you think of weaknesses in the approach above that might cause a result like that? Always keep in mind, however, that you don't need a perfect-looking result, just one that's good enough: the final answer you want is "ball" or "no ball", and you just want to be able to answer that reliably.)

Note that:

You need to make sure you take the screengrab when the balls have finished rolling and are lying still in the center of their tiles. This simplifies the problem immensely.

The game's background affects the problem -- if there are ocean-themed or desert-themed levels coming up, you will need to test, and possibly tweak, the recognizer to make sure it still works reliably.

Special effects and/or GUI elements that cover the playing field will complicate the problem. (For example, consider if the game has a 'cloud' or 'smoke' effect that sometimes floats over the playing field.) You may want to tweak the recognizer to be able to return "no result" if it's not sure, and then try another screengrab later. You may also want to take several screengrabs and average the results.

I have assumed that there are only balls and non-balls. If later levels have other kinds of objects, you will have to experiment more to find out how best to recognize them.

I haven't used the 'reference picture' approach. However, if you have an image containing all the objects in the game and you can exactly align its pixels with your tiles, that's likely to be the most reliable approach. Instead of comparing the foreground to the sampled background, compare the foreground to a set of known foreground images.
recognize very low level picture and color it in plain black in Python
I'm working on an automated "bug count" algorithm, and I'm wondering how I can recognize a very low level of contrast (no additional bugs on the pad - assume the camera is in the same position and the light conditions are very similar). If I subtract two pics from each other, I won't get a clean black image, because the light conditions will be minimally different.

Right now I apply a Gaussian filter and mahotas.thresholding.otsu() to draw out where there is a bug (put a white blob over top), and then I use ndimage.label() to count them. However, if my input image "cropbugs.jpg" is very dark grey, I get some random output after applying otsu() and my label() function returns a random bug count. How do I recognize that my image is very dark grey or low in contrast, and just set the bug count to 0? Thanks.

My code so far looks like:

bug_img = mahotas.imread('cropbugs.jpg')
pylab.gray()
bug_img = ndimage.gaussian_filter(bug_img, 6)  # 8
T = mahotas.thresholding.otsu(bug_img)
pylab.imshow(bug_img > T)
labeled, nr_objects = ndimage.label(bug_img > T)
print "Bug Count: " + str(nr_objects)
pylab.imshow(labeled)
pylab.jet()
pylab.show()
I can see multiple ways of approaching this problem:

1. (This was the suggestion in the comments.) Define some fixed rule based on the mean value, the standard deviation, the maximum value, or some combination. You will end up with a test like one of the following:

bug_img.mean() + 2*bug_img.std() < THRESHOLD
bug_img.std() < THRESHOLD
bug_img.max() < THRESHOLD
sorted(bug_img.ravel())[-10] < THRESHOLD

2. Use a classification system based on texture features (see my answer to a related question earlier).

3. Go ahead and use label as if everything were good, and then post-filter the results. For example:

labeled, nr_objects = mahotas.label(bug_img > T)
sizes = mahotas.labeled.labeled_size(labeled)
good_objects = (MIN_BUG_SIZE <= sizes) & (sizes <= MAX_BUG_SIZE)
print np.sum(good_objects)