Just for educational purposes, I'm working on making a letter-and-symbol recognition program in Python, and I've run into some trouble with region separation. I made a working connected-component labeling function using the information here:
CCL - Wikipedia
But I need one with the accuracy of 8-connectivity, which the article mentions but doesn't explain. It has a diagram on the right side showing that, to check for it, the Northwest and Northeast pixels need to be included, but I have no idea how, and I can't find any information on it. I'm not asking for code, but can anybody familiar with this method describe how to incorporate those neighbors?
8-connectivity isn't more accurate, and in practice it's suitable only for certain applications. It's more common to use 4-connectivity, especially for "natural" images rather than images created in the lab for testing. An 8-connected region will include checkerboard patterns and zigzag noise. A 4-connected foreground yields an 8-connected background.
You can dig into the source for the OpenCV function cvFindContours(). There are OpenCV bindings to Python.
http://opencv.willowgarage.com/documentation/python/structural_analysis_and_shape_descriptors.html
http://opencv.willowgarage.com/wiki/PythonInterface
I would recommend first implementing a 4-connected algorithm. You can find pseudocode in books like the following:
Machine Vision: Theory, Algorithms, Practicalities by E. R. Davies
In the 3rd edition, see section 6.3, "Object Labeling and Counting"
Digital Image Processing by Gonzalez and Woods
See section 9.5.3 "Extraction of Connected Components"
The presentation is less clear, but this is a standard all-in-one textbook for image processing. The section on thresholding for binarization is good. An international edition costs about $35.
Older textbooks may have simple, straightforward descriptions. Used copies of Computer Vision by Ballard and Brown are quite cheap. In that book, Algorithm 5.1 is called Blob Coloring.
My favorite quick description can be found in the section "Region Labeling Algorithm" of Handbook of Image and Video Processing edited by Al Bovik. Conveniently, pages 44 - 45 are available online in Google Books:
http://books.google.com/books?id=UM_GCfJe88sC&q=region+labeling+algorithm#v=snippet&q=region%20labeling%20algorithm&f=false
For OCR it's common to look for dark connected regions (blobs) on a light background. Our binarized image will be a black foreground (0) on a white background (1) in a 1-bit image.
For a 4-connected algorithm you'll use structure elements like the ones shown below (which you'll also see in the Bovik book). Once you've tinkered with 4-connectivity, the extension to 8-connectivity should be obvious.
We scan each row of pixels in the image from left to right, and all rows from top to bottom. For any pixel (x,y), its left neighbor (x - 1, y) and top neighbor (x, y - 1) have already been scanned, so we can check whether a region number has already been assigned to one or both of those neighbors. For example, if pixel (x, y-1) is labeled region 8, and if (x,y) is also a foreground pixel, then we assign region 8 to (x,y). If pixel (x,y) is a foreground pixel but the left and top neighbors are background pixels, we assign a new region number to (x,y).
I recommend the Bovik reference, but here's a quick overview of the algorithm.
Initialize a region number counter (e.g. "region = 0").
Initialize a "region equivalency" data structure for later processing.
Create a black and white image using a binarization threshold.
Scan each pixel in the image from top to bottom, left to right.
Assign region 0 to any white background (1) pixel.
For any black foreground pixel (x,y) test the following conditions:
If top and left pixels are foreground, use the region number for (x-1, y) as the region number for (x,y), and track the equivalency of the left and top region numbers.
If only the left neighbor (x - 1, y) is a foreground pixel, use its region number for (x, y).
If only the top neighbor (x, y - 1) is a foreground pixel, use its region number for (x, y).
If left and top neighbors are background pixels, increment the region number and assign this new region number to (x,y).
After completing this processing for the entire image, analyze the equivalency matrix and reduce each collection of equivalent regions to a single region.
The reduction of equivalencies is the tricky part. In the image below, regions have been correctly labeled according to the algorithm. The image shows a different color for each region number. The three touching regions must be reduced to one connected region.
Your code should scan the equivalency data structure to reassign 2 (red) and 3 (dark blue) to the lowest-numbered region, which is 1 (yellow). Once the region numbers have been reassigned, labeling is done.
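Putting the steps together, here's a minimal sketch of the two-pass, 4-connected labeling described above. It assumes a NumPy array with the 0-foreground / 1-background convention used here, and it uses a small union-find forest as the "region equivalency" data structure (the Bovik reference describes the same idea with an equivalency table):

```python
import numpy as np

def label_regions_4(image):
    """Two-pass 4-connected labeling; image is 0 (foreground) / 1 (background)."""
    h, w = image.shape
    labels = np.zeros((h, w), dtype=np.int32)   # region 0 = background
    parent = [0]                                # union-find forest: parent[i] of region i
    region = 0                                  # region number counter

    def find(r):
        while parent[r] != r:
            r = parent[r]
        return r

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)   # keep the lowest-numbered region

    # First pass: assign provisional labels and record equivalences.
    for y in range(h):
        for x in range(w):
            if image[y, x] != 0:                # white background pixel
                continue
            left = labels[y, x - 1] if x > 0 else 0
            top = labels[y - 1, x] if y > 0 else 0
            if left and top:
                labels[y, x] = left             # take the left label...
                union(left, top)                # ...and track the equivalency
            elif left or top:
                labels[y, x] = left or top      # only one neighbor is labeled
            else:
                region += 1                     # neither neighbor: new region
                parent.append(region)
                labels[y, x] = region

    # Second pass: reduce each set of equivalent regions to a single region.
    for y in range(h):
        for x in range(w):
            if labels[y, x]:
                labels[y, x] = find(labels[y, x])
    return labels
```

Extending this to 8-connectivity then amounts to also checking the Northwest (x - 1, y - 1) and Northeast (x + 1, y - 1) neighbors in the first pass and recording their equivalences the same way.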
There are one-pass algorithms that avoid the need for an equivalency check altogether, though such algorithms are a bit harder to implement. I would recommend first implementing the traditional 4-connected algorithm, solving its problems, and then introducing an option to use 8-connectivity instead. (This option is common in image processing libraries.) Once you have 4-connected and 8-connected region labeling working you'll have a good algorithm that will find many uses. In searching for academic papers on the subject, check for "region labeling," "blobs," "contours," and "connectivity."
For grayscale images that need to be binarized, your threshold algorithm will likely become a weak point in your chain of algorithms. For help with thresholding, get a copy of the Gonzalez and Woods book. For OCR, check out the book Character Recognition Systems by Cheriet, Kharma, Liu, and Suen.
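For instance, Otsu's method (one of the standard techniques covered in Gonzalez and Woods) picks the threshold automatically from the image histogram. A minimal sketch with OpenCV, where the file name is just a placeholder:

```python
import cv2

gray = cv2.imread("page.png", cv2.IMREAD_GRAYSCALE)
# Otsu's method chooses the threshold from the histogram automatically.
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
cv2.imwrite("page_binary.png", binary)
```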
I suggest this implementation of 8-connectivity connected-component labeling, posted on GitHub.
My goal is to draw a rectangular border around the face by removing the neck area connected to the face region. All positive values here represent skin-color pixels. Here is the binary image I have filtered out so far using OpenCV and Python. Code so far: skinid.py
Below is the test image.
Noise removal has also been applied to this binary image.
Up to this point, I followed the paper Face segmentation using skin-color map in videophone applications. For most of it I used custom functions rather than built-in OpenCV functions, because I wanted to do it from scratch (although some erosion, opening, and closing operations were used to tune it up).
I want to know a way to split the neck from the whole face area and remove it like this, as I am quite new to image processing.
Perform a distance transform (built into OpenCV, or you could write it by hand; it's a pretty fun and easy one to write by applying the erode function iteratively and adding each result into an accumulator matrix, which is slow but conceptually easy). On the binary image you presented above (and, I suspect, fairly generally across mug shots), the highest value in the distance transform will be at the center of the face. That pixel is the center of your box, and its value (after the distance transform) gives you a pretty solid approximate face size, since it is the pixel distance from the center of the face to the horizontal edges of the face. Depending on what you are after, you may just be able to multiply that distance by, say, 1.5 (work out a standard face width-to-height ratio to choose the best multiplier), set that as your circle radius (or half side width for a box), and call it a day. Comment if you need anything clarified; I am pretty confident in this answer and would be happy to write up some quick code if it would help.
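A rough sketch of that idea, assuming `skin_mask.png` is your binary skin image (255 = skin, 0 = background) and using OpenCV's built-in cv2.distanceTransform; the 1.5 multiplier is a guess you would tune:

```python
import cv2
import numpy as np

mask = cv2.imread("skin_mask.png", cv2.IMREAD_GRAYSCALE)
# Distance from each skin pixel to the nearest background pixel.
dist = cv2.distanceTransform(mask, cv2.DIST_L2, 5)
# The maximum sits near the center of the face; its value approximates
# the distance from the face center to the sides of the face.
_, max_val, _, (cx, cy) = cv2.minMaxLoc(dist)
half_w = int(max_val)            # approximate half face width
half_h = int(max_val * 1.5)      # assumed width-to-height ratio; tune this
out = cv2.cvtColor(mask, cv2.COLOR_GRAY2BGR)
cv2.rectangle(out, (cx - half_w, cy - half_h),
              (cx + half_w, cy + half_h), (0, 0, 255), 2)
cv2.imwrite("face_box.png", out)
```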
(Alternate idea.) You could tweak your color filter a bit to reject darker areas. At least in the image presented, this will create a nice separation between your face and neck due to the shadowing of the chin (you may have to dial back your dilate/closing ops, though).
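For example (a sketch only; these HSV bounds are placeholders and yours will differ), raising the lower V bound in cv2.inRange rejects the darker, shadowed pixels:

```python
import cv2
import numpy as np

hsv = cv2.cvtColor(cv2.imread("face.png"), cv2.COLOR_BGR2HSV)
lower = np.array([0, 30, 120])     # V lower bound raised to reject dark pixels
upper = np.array([25, 180, 255])   # rough skin-tone hue/saturation range
mask = cv2.inRange(hsv, lower, upper)
```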
I'm hoping someone here can help me go about solving this problem. I'm trying to figure out a good way to use a real-time video feed to run a Python script when a pressure gauge that looks like this reaches a certain variable pressure.
I'm pretty confident that this shouldn't be too difficult as the gauge will be mounted in a fixed position in the image frame, but am new to this sort of stuff and really don't know where to start.
Any suggestions? Thanks in advance!
This problem could be much tougher than you think. Typically, people use OpenCV for image processing/pattern recognition problems. There are many excellent Python tutorials available in the documentation.
At its core, your problem is to figure out the angle of rotation of the needle. From there, you can calculate what the pressure is. If you're lucky, you can find the needle using Feature Detection + Homography and then calculate the angle. But I have my doubts that this kind of image lends itself to feature detection.
Template Matching will be able to find the needle, but it's very sensitive to rotation, so you'd have to generate a large number of templates with different needle angles and then go through all of your templates every time you want a pressure reading.
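A sketch of that brute-force approach, assuming you've cropped a needle template from a reference image (`needle.png` here is a placeholder); note that rotating within the original bounding box clips corners, so padding the crop would be better in practice:

```python
import cv2
import numpy as np

needle = cv2.imread("needle.png", cv2.IMREAD_GRAYSCALE)
h, w = needle.shape

# Pre-generate rotated templates once; 2-degree steps trade speed for accuracy.
templates = []
for angle in range(0, 360, 2):
    m = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    templates.append((angle, cv2.warpAffine(needle, m, (w, h))))

def read_angle(frame_gray):
    """Return the template angle that best matches the current frame."""
    best_angle, best_score = None, -1.0
    for angle, tmpl in templates:
        res = cv2.matchTemplate(frame_gray, tmpl, cv2.TM_CCOEFF_NORMED)
        score = res.max()
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle
```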
It may well be faster, cheaper, and easier to use an electronic pressure gauge for reading pressure. There are any number of ways to get the output of an electronic gauge into Python. Some gauges can be connected to the network. Some can communicate over a serial port (RS232/422) or USB.
If you can find a gauge with a digital display, it should be more straightforward to do some kind of OCR/Template Matching to read the display from a camera.
Edit:
@jeremy arsenault's comment prompted me to think of another solution. If you are focused mostly on detecting when the needle crosses a threshold, you could look for the leading edge of the dial. And you could look closer to the center of the dial, where there are no indicators.
This kind of image really lends itself to easy thresholding to black and white. The dial is black and the background (at least towards the center) is white. After some morphology and thresholding of your image, you can look for the black pixels of the dial. No template matching required. The pictures below illustrate the idea. The first picture is your image thresholded to black and white. The starting image was very clean, but if you're grabbing from a camera you'll probably have to do some blurring and morphology before thresholding to avoid stray black pixels where you don't want them. The second image has a superimposed circle along which you can look for black pixels. You can find the leading edge and trailing edge and average them to get a fairly precise value for the center.
Some more implementation details/pseudocode. For computer images, the origin is usually the top left corner and +y goes down. Also, I've defined increasing angle theta as increasing pressure. The parametric equation for the circle is
x = x0 - r*cos(theta + theta0)
y = y0 - r*sin(theta + theta0)
theta0 is the offset angle that gives zero pressure (around -45 degrees for the gauge pictured). The brute-force way to look for the needle is to start sweeping the angle from zero, increasing it until you find a black pixel at (x, y), then keep increasing until you find a white pixel at (x, y). The average of the two angles gives you the center of the needle, and from that you should be able to calculate the pressure.
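Here's a minimal sketch of that sweep, assuming `bw` is the thresholded image (0 = black, 255 = white) and that the center (x0, y0), sampling radius r, and zero-pressure offset theta0 have been measured once for your fixed camera:

```python
import math

def needle_angle(bw, x0, y0, r, theta0):
    """Sweep the circle until the needle's leading and trailing edges are found."""
    def is_black(theta):
        # The parametric circle equations from above.
        x = int(x0 - r * math.cos(theta + theta0))
        y = int(y0 - r * math.sin(theta + theta0))
        return bw[y, x] == 0

    theta = 0.0
    step = math.radians(0.5)
    while theta < 2 * math.pi and not is_black(theta):  # find leading edge
        theta += step
    lead = theta
    while theta < 2 * math.pi and is_black(theta):      # find trailing edge
        theta += step
    trail = theta
    return (lead + trail) / 2        # angle of the needle's center
```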
Detecting a car trailer is very hard. There are a lot of car trailers that have the same license plate as the car itself, so to detect whether it's a trailer I need to search in an area around the license plate. I am already making a detector for the license plate with Viola-Jones. The problem is detecting where the triangle is; you don't see it very clearly in the image. The images come from section control, so there are a lot of different lighting conditions during the day.
For privacy reasons I have had to remove the license plate details.
So my question: are there special image processing methods that would be helpful here? I thought about:
- Canny edge detection
- adaptive thresholding
- image gradients
But a big difficulty is that the application needs to run in real time.
The way I see it, you will need edge detection (Canny) and image correlation (finding similar shapes; recognition of objects in the image).
Your two basic shapes (patterns to seek) would consist of the line image of the license plate, the base line of the bumper, and the edges of the lights, with triangles in one pattern and no triangles in the other. The presence or absence of the triangles should be the only difference between the two patterns.
First, you process the image through Canny, Sobel, or some other edge detector to get the edges. Then you correlate it with the two patterns. The correlation function should produce a "quality of match" value, i.e. how well the shape found in the image matches the pattern. If the one with triangles matches better, it's a trailer.
Don't try to detect just the triangles. They are too dim to produce a decent match, and too common a shape, possibly producing countless false positives. Seek the bumper with lights and license plate, and once that is found, compare it to an example bumper with triangles and an example bumper without. That way, no matter how poor the triangle detection, the match against the image with triangles will always be better when they are there.
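A sketch of the edge-detect-then-correlate idea, where the two template file names are placeholders for crops you would make from real edge images of a known trailer and a known car:

```python
import cv2

# Edge templates: same bumper region, with and without the triangles.
tmpl_trailer = cv2.Canny(cv2.imread("bumper_triangles.png", 0), 100, 200)
tmpl_car = cv2.Canny(cv2.imread("bumper_plain.png", 0), 100, 200)

def is_trailer(frame_gray):
    """Compare edge-image match quality against both templates."""
    edges = cv2.Canny(frame_gray, 100, 200)
    score_trailer = cv2.matchTemplate(edges, tmpl_trailer,
                                      cv2.TM_CCOEFF_NORMED).max()
    score_car = cv2.matchTemplate(edges, tmpl_car,
                                  cv2.TM_CCOEFF_NORMED).max()
    return score_trailer > score_car   # better match with triangles => trailer
```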
How can I identify the presence or absence of regular stripes of different colors, ranging from very light pink to black, inside a scanned image (bitmap, 200x200 dpi, 24-bit)?
Here are a few examples.
Example 1
Example 2 (the lines are in all the columns except 7 in the second row of the last column)
For now I try to identify (using Python) whether there are at least 5-10 pixels of a color different from white in each strip. However, this does not always work, because the scanned image is not of high quality and the strip's color can be very similar to the color that surrounds it.
Thanks.
This looks to me like a connected-component labeling problem, identifying discrete regions within a certain color range. You can have a look at cvBlobLib. Some pre-processing would be required to merge pixels if there are holes or small variations between neighbors.
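If you'd rather stay in plain OpenCV than pull in cvBlobLib, here's a minimal sketch of the same idea using cv2.connectedComponentsWithStats; the near-white cutoff of 240 and the area threshold are guesses you would tune:

```python
import cv2
import numpy as np

gray = cv2.imread("scan.png", cv2.IMREAD_GRAYSCALE)
mask = (gray < 240).astype(np.uint8) * 255            # anything not near-white
kernel = np.ones((3, 3), np.uint8)
mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)  # merge small gaps

n, labels, stats, centroids = cv2.connectedComponentsWithStats(mask)
for i in range(1, n):                                 # label 0 is background
    if stats[i, cv2.CC_STAT_AREA] >= 10:              # the 5-10 pixel criterion
        print("stripe candidate at", centroids[i],
              "area", stats[i, cv2.CC_STAT_AREA])
```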
Not going to happen. The human visual system is far better than any image processing system, and I don't see anything in the 2nd row of #3. #1 and #5 are also debatable.
You need to find some way to increase the optical quality of your input.
Search for a segmentation algorithm with a low threshold.
It should give you good results as the edges are sharp.
Sobel would be a good start ;)
Using Python, OpenCV, and live webcam input, I can't figure out how to set a point based on an (x, y) coordinate and track where it moves.
Below is a simple example to track a yellow object.
https://github.com/abidrahmank/OpenCV-Python/blob/master/Other_Examples/track_yellow_draw_line.py
Here is the method to track yellow color:
1) Extract the first frame of video
2) Convert the frame into HSV color space. Take the H plane and threshold it for yellow, so that you get a binary image with the yellow object as white (also called a blob) and everything else as black.
3) Now find the centre point of the blob. You can use moments or contours (especially if you have more than one blob; in the example above, very simple logic is used: just find the leftmost, rightmost, topmost, and bottommost points on the blob and draw a rectangle around it). Store these values.
4) Extract the next frame and follow all the steps above to get the new position. Join these two positions and draw a line (see the sketch after this list).
Over.
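Here's a minimal sketch of those four steps, with approximate HSV bounds for yellow that may need tuning for your lighting:

```python
import cv2
import numpy as np

cap = cv2.VideoCapture(0)
prev = None
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Threshold yellow in HSV to get a binary blob image.
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, np.array([20, 100, 100]), np.array([35, 255, 255]))
    # Blob center from image moments.
    m = cv2.moments(mask)
    if m["m00"] > 0:                               # blob found
        cx, cy = int(m["m10"] / m["m00"]), int(m["m01"] / m["m00"])
        if prev is not None:
            # Join the previous and current positions with a line.
            cv2.line(frame, prev, (cx, cy), (0, 0, 255), 2)
        prev = (cx, cy)
    cv2.imshow("track", frame)
    if cv2.waitKey(1) & 0xFF == 27:                # Esc to quit
        break
cap.release()
```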
There are a few blogs that explain the basics. Check out this one: Object tracking in OpenCV and Python 2.6.
Edit: I don't think you can track arbitrary points. To make a correspondence between one point in two images, you need to know something unique about the point you want to track. This is often done with interest points, which are "unique enough" to be compared across images. Other methods are based on making the point easy to detect using a projection scheme.
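For example, here's a sketch of interest-point tracking with OpenCV: corners from goodFeaturesToTrack followed across frames with Lucas-Kanade optical flow. Note this tracks detected corners, not an arbitrary user-chosen point:

```python
import cv2
import numpy as np

cap = cv2.VideoCapture(0)
ok, frame = cap.read()
prev_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
# Pick "unique enough" corners to follow.
pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=50,
                              qualityLevel=0.01, minDistance=10)
while True:
    ok, frame = cap.read()
    if not ok or pts is None or len(pts) == 0:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Follow each point from the previous frame to the current one.
    new_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
    for p, s in zip(new_pts, status):
        if s:                                      # point successfully tracked
            x, y = p.ravel()
            cv2.circle(frame, (int(x), int(y)), 3, (0, 255, 0), -1)
    # Keep only the points that survived.
    prev_gray, pts = gray, new_pts[status.ravel() == 1].reshape(-1, 1, 2)
    cv2.imshow("points", frame)
    if cv2.waitKey(1) & 0xFF == 27:                # Esc to quit
        break
cap.release()
```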