I am trying to implement a "Digit Recognition OCR" in OpenCV-Python (cv2). It is just for learning purposes. I would like to learn both KNearest and SVM features in OpenCV.
I have 100 samples (i.e. images) of each digit. I would like to train with them.
There is a sample, letter_recog.py, that comes with the OpenCV samples, but I still couldn't figure out how to use it. I don't understand what the samples, responses, etc. are. Also, it loads a txt file first, which I didn't understand at first.
Later, after searching a little, I found a letter_recognition.data file in the cpp samples. I used it and wrote code for cv2.KNearest modeled on letter_recog.py (just for testing):
import numpy as np
import cv2
fn = 'letter-recognition.data'
a = np.loadtxt(fn, np.float32, delimiter=',', converters={ 0 : lambda ch : ord(ch)-ord('A') })
samples, responses = a[:,1:], a[:,0]
model = cv2.KNearest()
retval = model.train(samples,responses)
retval, results, neigh_resp, dists = model.find_nearest(samples, k = 10)
print results.ravel()
It gave me an array of size 20000, and I don't understand what it is.
1) What is the letter_recognition.data file? How do I build that file from my own data set?
2) What does results.ravel() denote?
3) How can we write a simple digit recognition tool using the letter_recognition.data file (either KNearest or SVM)?
Well, I decided to work out the above problem myself. What I wanted was to implement a simple OCR using the KNearest or SVM features in OpenCV. Below is what I did and how (it is just for learning how to use KNearest for simple OCR purposes).
1) My first question was about letter_recognition.data file that comes with OpenCV samples. I wanted to know what is inside that file.
Each line contains a letter, followed by 16 features of that letter.
This SO answer helped me find it. These 16 features are explained in the paper Letter Recognition Using Holland-Style Adaptive Classifiers.
(Although I didn't understand some of the features at the end.)
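To make the file layout concrete, here is a minimal sketch that parses lines in the same shape (a capital letter followed by 16 comma-separated integer features) into a `samples` feature matrix and a `responses` label vector, using the same `ord(ch) - ord('A')` conversion as the loader in the question. The two lines and the helper name `parse_rows` are illustrative placeholders, not data copied from the real file.

```python
import numpy as np

# Two illustrative lines in the letter-recognition layout: label, then 16 features.
rows = [
    "T,3,7,4,5,2,8,12,1,6,6,10,8,0,8,1,8",
    "I,5,12,3,7,2,10,5,5,4,13,3,9,2,8,4,10",
]

def parse_rows(lines):
    """Split each line into a numeric label (A=0 ... Z=25) and 16 float features."""
    labels, features = [], []
    for line in lines:
        fields = line.strip().split(",")
        labels.append(ord(fields[0]) - ord("A"))         # same conversion as the question
        features.append([float(v) for v in fields[1:]])  # the 16 feature columns
    samples = np.array(features, dtype=np.float32)       # shape (n_rows, 16)
    responses = np.array(labels, dtype=np.float32)       # shape (n_rows,)
    return samples, responses

samples, responses = parse_rows(rows)
```

Since `find_nearest` in the question is called on all of `samples`, `results` holds one predicted label per input row, which is why printing `results.ravel()` gave 20000 values.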
2) Since I knew that without understanding all those features it would be difficult to use that method, I tried some other papers, but all were a little difficult for a beginner.
So I just decided to take all the pixel values as my features. (I was not worried about accuracy or performance; I just wanted it to work, even with low accuracy.)
I took the below image for my training data:
(I know the amount of training data is small. But since all the digits are of the same font and size, I decided to try it.)
To prepare the data for training, I made a small code in OpenCV. It does the following things:
It loads the image.
Selects the digits (obviously by contour finding and applying constraints on area and height of letters to avoid false detections).
Draws the bounding rectangle around one digit and waits for a key press. This time we press the digit key ourselves, corresponding to the digit in the box.
Once the corresponding digit key is pressed, it resizes this box to 10x10 and saves all 100 pixel values in an array (here, samples) and corresponding manually entered digit in another array(here, responses).
Then saves both arrays in separate .txt files.
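The `samples`/`responses` pairing that the script builds can be shown without a GUI. This sketch assumes each digit box has already been resized to 10x10, as `cv2.resize(roi,(10,10))` does in the script, and treats ASCII codes 48 to 57 (the script's `keys` list) as the digits 0 to 9. The helper `add_sample` and the random pixel data are hypothetical, used only to show the array shapes and the text round trip.

```python
import io

import numpy as np

KEYS = range(48, 58)  # ASCII codes for '0'..'9', like the script's keys list

def add_sample(samples, responses, roi_10x10, key):
    """Append one flattened 10x10 ROI and its digit label; ignore non-digit keys."""
    if key not in KEYS:
        return samples
    responses.append(int(chr(key)))          # e.g. key 55 -> '7' -> 7
    sample = roi_10x10.reshape((1, 100))     # one row of 100 pixel values
    return np.append(samples, sample, 0)

rng = np.random.default_rng(0)
samples = np.empty((0, 100))
responses = []
for key in (ord("3"), ord("7"), ord("x")):   # the 'x' press is skipped
    roi = rng.integers(0, 256, size=(10, 10)).astype(np.uint8)
    samples = add_sample(samples, responses, roi, key)

# One label per sample row, as a column vector.
responses = np.array(responses, np.float32).reshape((-1, 1))

# Round trip through text, like np.savetxt / np.loadtxt on the saved files.
buf = io.StringIO()
np.savetxt(buf, samples)
buf.seek(0)
restored = np.loadtxt(buf, np.float32)
```

Each row of `samples` is one digit image flattened to 100 features, and the row with the same index in `responses` is its label; that pairing is all the classifier needs.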
At the end of the manual classification, all the digits in the training data (train.png) have been labeled by ourselves, and the image will look like below:
Below is the code I used for the above purpose (of course, not so clean):
import sys

import numpy as np
import cv2

im = cv2.imread('pitrain.png')
im3 = im.copy()

gray = cv2.cvtColor(im,cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray,(5,5),0)
thresh = cv2.adaptiveThreshold(blur,255,1,1,11,2)

################# Now finding Contours ###################

contours,hierarchy = cv2.findContours(thresh,cv2.RETR_LIST,cv2.CHAIN_APPROX_SIMPLE)

samples = np.empty((0,100))
responses = []
keys = [i for i in range(48,58)]  # ASCII codes of the keys '0' to '9'

for cnt in contours:
    if cv2.contourArea(cnt)>50:
        [x,y,w,h] = cv2.boundingRect(cnt)

        if h>28:
            # show the current digit and wait for its label
            cv2.rectangle(im,(x,y),(x+w,y+h),(0,0,255),2)
            roi = thresh[y:y+h,x:x+w]
            roismall = cv2.resize(roi,(10,10))
            cv2.imshow('norm',im)
            key = cv2.waitKey(0)

            if key == 27:  # (escape to quit)
                sys.exit()
            elif key in keys:
                responses.append(int(chr(key)))
                sample = roismall.reshape((1,100))
                samples = np.append(samples,sample,0)

responses = np.array(responses,np.float32)
responses = responses.reshape((responses.size,1))
print("training complete")

np.savetxt('generalsamples.data',samples)
np.savetxt('generalresponses.data',responses)
Now we move on to the training and testing part.
For the testing part, I used the image below, which has the same type of digits I used for the training phase.
For training we do as follows:
Load the .txt files we already saved earlier
Create an instance of the classifier we are using (KNearest in this case)
Then we use KNearest.train function to train the data
For testing purposes, we do as follows:
We load the image used for testing
Process the image as earlier and extract each digit using contour methods
Draw a bounding box for it, then resize it to 10x10, and store its pixel values in an array as done earlier.
Then we use the KNearest.find_nearest() function to find the nearest item to the one we gave. (If lucky, it recognizes the correct digit.)
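To see what `find_nearest` hands back, here is a plain NumPy sketch of a k-nearest-neighbour lookup that returns the same four kinds of values the script unpacks (`retval, results, neigh_resp, dists`). It is not OpenCV's implementation: it assumes squared Euclidean distance and a simple majority vote with ties broken toward the smaller label, and the function name `knn_find_nearest` is hypothetical.

```python
import numpy as np

def knn_find_nearest(train_samples, train_responses, queries, k):
    """Return (first_result, results, neighbour_responses, distances) for each query row."""
    train_responses = train_responses.ravel()
    # Squared Euclidean distance from every query row to every training row.
    diff = queries[:, None, :] - train_samples[None, :, :]
    d2 = np.sum(diff * diff, axis=2)                    # shape (n_queries, n_train)
    order = np.argsort(d2, axis=1, kind="stable")[:, :k]
    neigh_resp = train_responses[order]                 # labels of the k nearest
    dists = np.take_along_axis(d2, order, axis=1)       # their distances
    results = np.empty((len(queries), 1), np.float32)
    for i, labels in enumerate(neigh_resp):
        values, counts = np.unique(labels, return_counts=True)
        results[i, 0] = values[np.argmax(counts)]       # majority vote
    return results[0, 0], results, neigh_resp, dists

# Tiny made-up training set: two points labelled 1, two labelled 7.
train = np.array([[0, 0], [0, 1], [10, 10], [10, 11]], np.float32)
labels = np.array([[1], [1], [7], [7]], np.float32)
query = np.array([[0, 0.4], [9, 10]], np.float32)
retval, results, neigh_resp, dists = knn_find_nearest(train, labels, query, k=1)
```

So `results.ravel()` is simply the flat list of predicted labels, one per query row, while `neigh_resp` and `dists` describe the k neighbours behind each prediction.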
I included the last two steps (training and testing) in a single script below:
import cv2
import numpy as np

####### training part ###############
samples = np.loadtxt('generalsamples.data',np.float32)
responses = np.loadtxt('generalresponses.data',np.float32)
responses = responses.reshape((responses.size,1))

model = cv2.KNearest()
model.train(samples,responses)

############################# testing part #########################

im = cv2.imread('pi.png')
out = np.zeros(im.shape,np.uint8)
gray = cv2.cvtColor(im,cv2.COLOR_BGR2GRAY)
thresh = cv2.adaptiveThreshold(gray,255,1,1,11,2)

contours,hierarchy = cv2.findContours(thresh,cv2.RETR_LIST,cv2.CHAIN_APPROX_SIMPLE)

for cnt in contours:
    if cv2.contourArea(cnt)>50:
        [x,y,w,h] = cv2.boundingRect(cnt)
        if h>28:
            cv2.rectangle(im,(x,y),(x+w,y+h),(0,255,0),2)
            roi = thresh[y:y+h,x:x+w]
            roismall = cv2.resize(roi,(10,10))
            roismall = roismall.reshape((1,100))
            roismall = np.float32(roismall)
            retval, results, neigh_resp, dists = model.find_nearest(roismall, k = 1)
            string = str(int((results[0][0])))
            # write the recognized digit at the box position
            cv2.putText(out,string,(x,y+h),0,1,(0,255,0))

cv2.imshow('im',im)
cv2.imshow('out',out)
cv2.waitKey(0)
And it worked, below is the result I got:
Here it worked with 100% accuracy. I assume this is because all the digits are of the same kind and the same size.
But anyway, this is a good starting point for beginners (I hope so).
Those who are interested in C++ can refer to the code below.
Thanks to Abid Rahman for the nice explanation.
The procedure is the same as above, but the contour finding uses only the first-hierarchy-level contours, so the algorithm uses only the outer contour of each digit.
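The hierarchy walk used in the loops below (`i = hierarchy[i][0]`) follows the "next contour at the same level" links. As a sketch, assuming OpenCV's layout of one `[next, previous, first_child, parent]` entry per contour with -1 meaning "none", the top-level contours can be collected either by following those links from contour 0 (as the C++ loop does) or by keeping every contour whose parent is -1. In Python, `cv2.findContours` returns the hierarchy with an extra leading dimension, so you would pass `hierarchy[0]`. The hierarchy array below is made up for illustration.

```python
import numpy as np

def walk_same_level(hierarchy, start=0):
    """Follow the 'next' links from `start` until -1, like the C++ for-loop."""
    indices = []
    i = start
    while 0 <= i < len(hierarchy):
        indices.append(i)
        i = int(hierarchy[i][0])      # hierarchy[i][0] is the next contour on this level
    return indices

def top_level(hierarchy):
    """Indices of contours with no parent (hierarchy[i][3] == -1)."""
    return [i for i, h in enumerate(hierarchy) if h[3] == -1]

# Made-up hierarchy: contours 0, 2 and 3 are outer shapes; 1 is a hole inside 0.
hierarchy = np.array([
    [2, -1, 1, -1],
    [-1, -1, -1, 0],
    [3, 0, -1, -1],
    [-1, 2, -1, -1],
])
```

The parent test does not depend on which contour happens to be stored first, so it is a slightly safer choice if contour 0 might not be a top-level one.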
Code for creating sample and Label data
//Process image to extract contour
Mat thr,gray,con;
Mat src=imread("digit.png",1);
cvtColor(src,gray,CV_BGR2GRAY);
threshold(gray,thr,200,255,THRESH_BINARY_INV); //Threshold to find contour
thr.copyTo(con); // older OpenCV versions modify the findContours input, so work on a copy
// Create sample and label data
vector< vector <Point> > contours; // Vector for storing contour
vector< Vec4i > hierarchy;
Mat sample;
Mat response_array;
findContours( con, contours, hierarchy,CV_RETR_CCOMP, CV_CHAIN_APPROX_SIMPLE ); //Find contour
for( int i = 0; i >= 0 && i < (int)contours.size(); i=hierarchy[i][0] ) // iterate through first hierarchy level contours (-1 ends the chain)
{
    Rect r= boundingRect(contours[i]); //Find bounding rect for each contour
    rectangle(src,Point(r.x,r.y), Point(r.x+r.width,r.y+r.height), Scalar(0,0,255),2,8,0);
    Mat ROI = thr(r); //Crop the image
    Mat tmp1, tmp2;
    resize(ROI,tmp1, Size(10,10), 0,0,INTER_LINEAR ); //resize to 10X10
    tmp1.convertTo(tmp2,CV_32FC1); //convert to float
    sample.push_back(tmp2.reshape(1,1)); // Store sample data
    imshow("src",src);
    int c=waitKey(0); // Read corresponding label for contour from keyboard
    c-=0x30; // Convert ascii to integer value
    response_array.push_back(c); // Store label to a mat
    rectangle(src,Point(r.x,r.y), Point(r.x+r.width,r.y+r.height), Scalar(0,255,0),2,8,0);
}
// Store the data to file
Mat response,tmp;
tmp=response_array.reshape(1,1); //make continuous
tmp.convertTo(response,CV_32FC1); // Convert to float
FileStorage Data("TrainingData.yml",FileStorage::WRITE); // Store the sample data in a file
Data << "data" << sample;
Data.release();
FileStorage Label("LabelData.yml",FileStorage::WRITE); // Store the label data in a file
Label << "label" << response;
Label.release();
cout<<"Training and Label data created successfully....!! "<<endl;
imshow("src",src);
waitKey();
Code for training and testing
Mat thr,gray,con;
Mat src=imread("dig.png",1);
cvtColor(src,gray,CV_BGR2GRAY);
threshold(gray,thr,200,255,THRESH_BINARY_INV); // Threshold to create input
thr.copyTo(con);
// Read stored sample and label for training
Mat sample;
Mat response,tmp;
FileStorage Data("TrainingData.yml",FileStorage::READ); // Read training data to a Mat
Data["data"] >> sample;
Data.release();
FileStorage Label("LabelData.yml",FileStorage::READ); // Read label data to a Mat
Label["label"] >> response;
Label.release();
KNearest knn;
knn.train(sample,response); // Train with sample and responses
cout<<"Training completed.....!!"<<endl;
vector< vector <Point> > contours; // Vector for storing contour
vector< Vec4i > hierarchy;
//Create input sample by contour finding and cropping
findContours( con, contours, hierarchy,CV_RETR_CCOMP, CV_CHAIN_APPROX_SIMPLE );
Mat dst(src.rows,src.cols,CV_8UC3,Scalar::all(0));
for( int i = 0; i >= 0 && i < (int)contours.size(); i=hierarchy[i][0] ) // iterate through each contour for first hierarchy level
{
    Rect r= boundingRect(contours[i]);
    Mat ROI = thr(r);
    Mat tmp1, tmp2;
    resize(ROI,tmp1, Size(10,10), 0,0,INTER_LINEAR );
    tmp1.convertTo(tmp2,CV_32FC1); // same float layout as the training samples
    float p=knn.find_nearest(tmp2.reshape(1,1), 1);
    char name[4];
    sprintf(name,"%d",(int)p); // predicted digit as text
    putText( dst,name,Point(r.x,r.y+r.height) ,0,1, Scalar(0, 255, 0), 2, 8 );
}
imshow("src",src);
imshow("dst",dst);
waitKey();
In the result, the dot in the first line is detected as 8, since we haven't trained for the dot. Also, I am treating every contour in the first hierarchy level as a sample input; you can avoid such false detections by computing the area.
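Filtering by area, as suggested, can be sketched with the shoelace formula, which gives the area enclosed by a polygon's vertices; for a simple polygon this is the same quantity `cv2.contourArea` computes. The threshold of 20 and the two sample polygons are assumptions for illustration; OpenCV contours come as (N, 1, 2) arrays, so you would `reshape(-1, 2)` them first.

```python
import numpy as np

def polygon_area(points):
    """Shoelace formula: area enclosed by an (N, 2) array of x, y vertices."""
    x = points[:, 0].astype(np.float64)
    y = points[:, 1].astype(np.float64)
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

def drop_small(contours, min_area=20.0):
    """Keep only contours whose enclosed area reaches min_area (e.g. drop dots)."""
    return [c for c in contours if polygon_area(c) >= min_area]

dot = np.array([[0, 0], [2, 0], [2, 2], [0, 2]])          # 2x2 square, area 4
digit = np.array([[10, 0], [16, 0], [16, 12], [10, 12]])  # 6x12 box, area 72
kept = drop_small([dot, digit])
```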
I had some problems generating the training data because it was sometimes hard to identify the last selected digit, so I rotated the image by 1.5 degrees. Now each character is selected in order, and the test still shows a 100% accuracy rate after training. Here is the code:
import numpy as np
import cv2

def rotate_image(image, angle):
    # rotate around the image centre (plain Python floats for the centre point)
    image_center = (image.shape[1] / 2, image.shape[0] / 2)
    rot_mat = cv2.getRotationMatrix2D(image_center, angle, 1.0)
    result = cv2.warpAffine(image, rot_mat, image.shape[1::-1], flags=cv2.INTER_LINEAR)
    return result

img = cv2.imread('training_image.png')
cv2.imshow('orig image', img)
whiteBorder = [255,255,255]
# extend the image border
image1 = cv2.copyMakeBorder(img, 80, 80, 80, 80, cv2.BORDER_CONSTANT, None, whiteBorder)
# rotate the image 1.5 degrees clockwise for ease of data entry
image_rot = rotate_image(image1, -1.5)
#crop_img = image_rot[y:y+h, x:x+w]
cropped = image_rot[70:350, 70:710]
cv2.imwrite('rotated.png', cropped)
cv2.imshow('rotated image', cropped)
cv2.waitKey(0)
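The rotation step relies on `cv2.getRotationMatrix2D`, whose documented matrix is [[a, b, (1-a)cx - b cy], [-b, a, b cx + (1-a) cy]] with a = scale·cos(angle) and b = scale·sin(angle). A small NumPy sketch of that formula (the names `rotation_matrix_2d` and `apply` are hypothetical) makes the sign convention visible: positive angles rotate counter-clockwise on screen, so the `-1.5` used above turns the page slightly clockwise.

```python
import math

import numpy as np

def rotation_matrix_2d(center, angle_deg, scale=1.0):
    """2x3 affine matrix in the form documented for cv2.getRotationMatrix2D."""
    cx, cy = center
    a = scale * math.cos(math.radians(angle_deg))
    b = scale * math.sin(math.radians(angle_deg))
    return np.array([
        [a, b, (1 - a) * cx - b * cy],
        [-b, a, b * cx + (1 - a) * cy],
    ])

def apply(M, x, y):
    """Map one (x, y) point through a 2x3 affine matrix."""
    return M @ np.array([x, y, 1.0])

M = rotation_matrix_2d((50, 40), 90)
center = apply(M, 50, 40)   # the rotation centre stays fixed
right = apply(M, 51, 40)    # a point right of the centre moves up (smaller y)
```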
To generate the sample data, I made some changes to the script, like this:
import sys

import numpy as np
import cv2

def sort_contours(contours, x_axis_sort='LEFT_TO_RIGHT', y_axis_sort='TOP_TO_BOTTOM'):
    # initialize the reverse flag
    x_reverse = False
    y_reverse = False
    if x_axis_sort == 'RIGHT_TO_LEFT':
        x_reverse = True
    if y_axis_sort == 'BOTTOM_TO_TOP':
        y_reverse = True

    boundingBoxes = [cv2.boundingRect(c) for c in contours]

    # sorting on x-axis
    sortedByX = zip(*sorted(zip(contours, boundingBoxes),
                            key=lambda b: b[1][0], reverse=x_reverse))

    # sorting on y-axis
    (contours, boundingBoxes) = zip(*sorted(zip(*sortedByX),
                                            key=lambda b: b[1][1], reverse=y_reverse))
    # return the list of sorted contours and bounding boxes
    return (contours, boundingBoxes)

im = cv2.imread('rotated.png')
im3 = im.copy()

gray = cv2.cvtColor(im,cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray,(5,5),0)
thresh = cv2.adaptiveThreshold(blur,255,1,1,11,2)

contours,hierarchy = cv2.findContours(thresh,cv2.RETR_LIST,cv2.CHAIN_APPROX_SIMPLE)
contours, boundingBoxes = sort_contours(contours, x_axis_sort='LEFT_TO_RIGHT', y_axis_sort='TOP_TO_BOTTOM')

samples = np.empty((0,100))
responses = []
keys = [i for i in range(48,58)]  # ASCII codes of the keys '0' to '9'

for cnt in contours:
    if cv2.contourArea(cnt)>50:
        [x,y,w,h] = cv2.boundingRect(cnt)

        if h>28 and h < 40:
            # show the current digit and wait for its label
            cv2.rectangle(im,(x,y),(x+w,y+h),(0,0,255),2)
            roi = thresh[y:y+h,x:x+w]
            roismall = cv2.resize(roi,(10,10))
            cv2.imshow('norm',im)
            key = cv2.waitKey(0)

            if key == 27:  # (escape to quit)
                sys.exit()
            elif key in keys:
                responses.append(int(chr(key)))
                sample = roismall.reshape((1,100))
                samples = np.append(samples,sample,0)

responses = np.array(responses,np.ubyte)
responses = responses.reshape((responses.size,1))
print("training complete")

np.savetxt('generalsamples.data',samples)
np.savetxt('generalresponses.data',responses)
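Because Python's `sorted` is stable, the two passes in `sort_contours` (first by x, then by y) amount to a single sort on the key (y, x): y decides the order and x only breaks exact ties. A minimal sketch on plain `(x, y, w, h)` boxes (the box values are made up) shows the consequence: a character on the same line whose top edge is one pixel higher jumps to the front. This is likely why the slight clockwise rotation helps here, since it makes the top edges increase steadily from left to right within a line.

```python
def two_pass_sort(boxes):
    """Mimic sort_contours: stable sort by x, then stable sort by y."""
    by_x = sorted(boxes, key=lambda b: b[0])
    return sorted(by_x, key=lambda b: b[1])

def single_key_sort(boxes):
    """Equivalent single sort on the key (y, x)."""
    return sorted(boxes, key=lambda b: (b[1], b[0]))

# (x, y, w, h) boxes for one text line; the third glyph sits 1 px higher.
line = [(10, 20, 8, 30), (30, 20, 8, 30), (50, 19, 8, 30), (70, 20, 8, 30)]
ordered = two_pass_sort(line)
```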
I am trying to sort contours by their position, left-to-right and top-to-bottom, just like how you write anything: starting from the top left and then whichever comes next.
This is what I have achieved so far, and how:
import os

import cv2
from PIL import Image

def get_contour_precedence(contour, cols):
    tolerance_factor = 61
    origin = cv2.boundingRect(contour)
    return ((origin[1] // tolerance_factor) * tolerance_factor) * cols + origin[0]

image = cv2.imread("C:/Users/XXXX/PycharmProjects/OCR/raw_dataset/23.png", 0)
ret, thresh1 = cv2.threshold(image, 130, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
contours, h = cv2.findContours(thresh1.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
# perform edge detection, find contours in the edge map, and sort the
# resulting contours from left-to-right
# (sorted() also works when findContours returns a tuple, as in OpenCV 4.x)
contours = sorted(contours, key=lambda x: get_contour_precedence(x, thresh1.shape[1]))
# initialize the list of contour bounding boxes and associated
# characters that we'll be OCR'ing
chars = []
inc = 0
# loop over the contours
for c in contours:
    inc += 1
    # compute the bounding box of the contour
    (x, y, w, h) = cv2.boundingRect(c)
    label = str(inc)
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.putText(image, label, (x - 2, y - 2),
                cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
    print('x=', x)
    print('y=', y)
    print('x+w=', x + w)
    print('y+h=', y + h)
    crop_img = image[y + 2:y + h - 1, x + 2:x + w - 1]
    name = os.path.join("bounding boxes", 'Image_%d.png' % (
    cv2.imshow("cropped", crop_img)
    crop_img = Image.fromarray(crop_img)
cv2.imshow('mat', image)
cv2.waitKey(0)
Input Image :
Output Image 1:
Input Image 2 :
Output for Image 2:
Input Image 3:
Output Image 3:
As you can see, the numbering 1, 2, 3, 4 is not what I was expecting in each image, as displayed in Image 3.
How do I adjust this to make it work, or should I even write a custom function?
NOTE: I have multiple images of the same input image provided in my question. The content is the same, but they have variations in the text, so a single tolerance factor does not work for all of them. Manually adjusting it would not be a good idea.
This is my take on the problem. I'll give you the general gist of it, and then my implementation in C++. The main idea is that I want to process the image from left to right, top to bottom. I'll process each blob (or contour) as I find it; however, I need a couple of intermediate steps to achieve a successful (ordered) segmentation.
Vertical sort using rows
The first step is trying to sort the blobs by rows: this means that each row has a set of (unordered) horizontal blobs. That's ok. This step computes some kind of vertical sorting, and if we process each row from top to bottom, we will achieve just that.
After the blobs are (vertically) sorted by rows, then I can check out their centroids (or center of mass) and horizontally sort them. The idea is that I will process row per row and, for each row, I sort blob centroids. Let’s see an example of what I'm trying to achieve here.
This is your input image:
This is what I call the Row Mask:
This last image contains white areas that each represent a "row". Each row has a number (e.g., Row1, Row2, etc.) and each row holds a set of blobs (or characters, in this case). By processing each row, from top to bottom, you are already sorting the blobs on the vertical axis.
If I number each row from top to bottom, I get this image:
The Row Mask is a way of creating "rows of blobs", and this mask can be computed morphologically. Check out the 2 images overlaid to give you a better view of the processing order:
What we are trying to do here is, first, a vertical ordering (blue arrow) and then we will take care of the horizontal (red arrow) ordering. You can see that by processing each row we can (possibly) overcome the sorting problem!
Horizontal sort using centroids
Let's see now how we can sort the blobs horizontally. If we create a simpler image, with a width equal to the input image and a height equal to the number of rows in our Row Mask, we can simply overlay every horizontal coordinate (x coordinate) of each blob centroid. Check out this example:
This is a Row Table. Each of its rows corresponds to one of the rows found in the Row Mask, and it is also read from top to bottom. The width of the table is the same as the width of your input image, and corresponds spatially to the horizontal axis. Each square is a pixel in your input image, mapped to the Row Table using only the horizontal coordinate (as our simplification of rows is pretty straightforward). The actual value of each pixel in the Row Table is a label, labeling each of the blobs on your input image. Note that the labels are not ordered!
So, for instance, this table shows that in row 1 (you already know what row 1 is: it's the first white area on the Row Mask), at position (1,4) there's blob number 3. At position (1,6) there's blob number 2, and so on. What's cool (I think) about this table is that you can loop through it, and for every value different from 0, horizontal ordering becomes trivial. This is the Row Table ordered, now, left to right:
Mapping blob information with centroids
We are going to use blob centroids to map the information between our two representations (Row Mask/Row Table). Suppose you already have both "helper" images and you process each blob (or contour) on the input image one at a time. For example, you have this as a start:
Alright, there's a blob here. How can we map it to the Row Mask and to the Row Table? Using its centroid. If we compute the centroid (shown in the figure as the green dot), we can construct a dictionary of centroids and labels. For example, for this blob, the centroid is located at (271,193). Ok, let's assign the label = 1. So we now have this dictionary:
Now, we find the row in which this blob is placed using the same centroid on the Row Mask. Something like this:
rowNumber = rowMask.at( 193, 271 ) // at( row = y, column = x )
This operation should return rowNumber = 3. Nice! We know which row our blob is in, and so it is now vertically ordered. Now, let's store its label at its horizontal coordinate in the Row Table:
rowTable.at( rowNumber, 271 ) = 1
Now, rowTable holds (in its row and column) the label of the processed blob. The Row Table should look something like this:
The table is a lot wider, because its horizontal dimension has to be the same as your input image. In this image, the label 1 is placed in Column 271, Row 3. If this was the only blob on your image, the blobs would be already sorted. But what happens if you add another blob in, say, Column 2, Row 1? That's why you need to traverse, again, this table after you have processed all the blobs – to properly correct their label.
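The Row Table idea can be sketched in NumPy without any image processing, assuming you already know each blob's label, centroid x and row number (rows numbered from 1, as in the Row Mask; the values below are made up and mirror the example above). Empty cells hold -1 here instead of the 0 or 255 used in the answer, so that label 0 stays usable. As in the answer, two blobs with the same row and the same integer x would overwrite each other. The function name `order_blobs` is hypothetical.

```python
import numpy as np

def order_blobs(blobs, n_rows, width):
    """blobs: list of (label, cx, row) with rows numbered 1..n_rows.
    Returns the labels in reading order (top row first, left to right)."""
    table = np.full((n_rows, width), -1, dtype=np.int32)   # -1 marks an empty cell
    for label, cx, row in blobs:
        table[row - 1, int(cx)] = label                    # rows are 1-based
    ordered = []
    for y in range(n_rows):                                # top to bottom
        for x in range(width):                             # left to right
            if table[y, x] != -1:
                ordered.append(int(table[y, x]))
    return ordered

# Unordered blob labels with (centroid x, row number).
blobs = [(3, 4, 1), (2, 6, 1), (1, 271, 3), (0, 2, 2), (4, 120, 3)]
ordered_labels = order_blobs(blobs, n_rows=3, width=300)
```

The nested loop is the same row-major traversal as the C++ code; `np.argwhere(table != -1)` would visit the occupied cells in the same order.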
Implementation in C++
Alright, hopefully the algorithm is a little clearer now (if not, just ask, my man). I'll try to implement these ideas in OpenCV using C++. First, I need a binary image of your input. Computation is trivial using Otsu's thresholding method:
//Read the input image:
std::string imageName = "C://opencvImages//yFX3M.png";
cv::Mat testImage = cv::imread( imageName );
//Compute grayscale image
cv::Mat grayImage;
cv::cvtColor( testImage, grayImage, cv::COLOR_BGR2GRAY ); // imread loads images in BGR order
//Get binary image via Otsu:
cv::Mat binImage;
cv::threshold( grayImage, binImage, 0, 255, cv::THRESH_OTSU );
//Invert image:
binImage = 255 - binImage;
This is the resulting binary image, nothing fancy, just what we need to start working:
The first step is to get the Row Mask. This can be achieved using morphology. Just apply a dilation + erosion with a VERY big horizontal structuring element. The idea is you want to turn those blobs into rectangles, "fusing" them together horizontally:
//Create a hard copy of the binary mask:
cv::Mat rowMask = binImage.clone();
//horizontal dilation + erosion:
int horizontalSize = 100; // a very big horizontal structuring element
cv::Mat SE = cv::getStructuringElement( cv::MORPH_RECT, cv::Size(horizontalSize,1) );
cv::morphologyEx( rowMask, rowMask, cv::MORPH_DILATE, SE, cv::Point(-1,-1), 2 );
cv::morphologyEx( rowMask, rowMask, cv::MORPH_ERODE, SE, cv::Point(-1,-1), 1 );
This results in the following Row Mask:
That's very cool. Now that we have our Row Mask, we must number the rows, ok? There are a lot of ways of doing this, but right now I'm interested in the simplest one: loop through this image and get every single pixel. If a pixel is white, use a Flood Fill operation to label that portion of the image as a unique blob (or row, in this case). This can be done as follows:
//Label the row mask:
int rowCount = 0; //This will count our rows
//Loop thru the mask:
for( int y = 0; y < rowMask.rows; y++ ){
    for( int x = 0; x < rowMask.cols; x++ ){
        //Get the current pixel:
        uchar currentPixel = rowMask.at<uchar>( y, x );
        //If the pixel is white, this is an unlabeled blob:
        if ( currentPixel == 255 ) {
            //Create new label (different from zero):
            rowCount++;
            //Flood fill on this point:
            cv::floodFill( rowMask, cv::Point( x, y ), rowCount, (cv::Rect*)0, cv::Scalar(), 0 );
        }
    }
}
This process will label all the rows from 1 to r. That's what we wanted. If you check out the image you'll faintly see the rows, that's because our labels correspond to very low intensity values of grayscale pixels.
Ok, now let's prepare the Row Table. This "table" really is just another image, remember: the same width as the input and a height equal to the number of rows you counted on the Row Mask:
//create rows image (row labels run from 1 to rowCount, so row 0 is left unused):
cv::Mat rowTable = cv::Mat::zeros( cv::Size(binImage.cols, rowCount + 1), CV_8UC1 );
//Just for convenience:
rowTable = 255 - rowTable;
Here, I just inverted the final image for convenience, because I want to actually see how the table is populated with (very low intensity) pixels and be sure that everything is working as intended.
Now comes the fun part. We have both images (or data containers) prepared. We need to process each blob independently. The idea is that you have to extract each blob/contour/character from the binary image, compute its centroid and assign a new label. Again, there are a lot of ways of doing this. Here, I'm using the following approach:
I'll loop through the binary mask. I'll get the current biggest blob from this binary input. I'll compute its centroid, store its data in every container needed, and then delete that blob from the mask. I'll repeat the process until no more blobs are left. This is my way of doing it, especially because I already have functions written for that. This is the approach:
//Prepare a couple of dictionaries for data storing:
std::map< int, cv::Point > blobMap; //holds label, gives centroid
std::map< int, cv::Rect > boundingBoxMap; //holds label, gives bounding box
First, two dictionaries. One receives a blob label and returns the centroid. The other one receives the same label and returns the bounding box.
//Extract each individual blob:
cv::Mat bobFilterInput = binImage.clone();
//The new blob label:
int blobLabel = 0;
//Some control variables:
bool extractBlobs = true; //Controls loop
int currentBlob = 0; //Counter of blobs
while ( extractBlobs ){
    //Get the biggest blob:
    cv::Mat biggestBlob = findBiggestBlob( bobFilterInput );
    //Compute the centroid/center of mass:
    cv::Moments momentStructure = cv::moments( biggestBlob, true );
    float cx = momentStructure.m10 / momentStructure.m00;
    float cy = momentStructure.m01 / momentStructure.m00;
    //Centroid point:
    cv::Point blobCentroid;
    blobCentroid.x = cx;
    blobCentroid.y = cy;
    //Compute bounding box:
    boundingBox boxData;
    computeBoundingBox( biggestBlob, boxData );
    //Convert boundingBox data into opencv rect data:
    cv::Rect cropBox = boundingBox2Rect( boxData );
    //Label blob:
    blobMap.emplace( blobLabel, blobCentroid );
    boundingBoxMap.emplace( blobLabel, cropBox );
    //Get the row for this centroid
    int blobRow = rowMask.at<uchar>( cy, cx );
    //Place centroid on rowed image:
    rowTable.at<uchar>( blobRow, cx ) = blobLabel;
    //Resume blob flow control:
    cv::Mat blobDifference = bobFilterInput - biggestBlob;
    //How many pixels are left on the new mask?
    int pixelsLeft = cv::countNonZero( blobDifference );
    bobFilterInput = blobDifference;
    //Done extracting blobs?
    if ( pixelsLeft <= 0 ){
        extractBlobs = false;
    }
    //Increment blob counter and label:
    currentBlob++;
    blobLabel++;
}
Check out a nice animation of how this processing goes through each blob, processes it and deletes it until there’s nothing left:
Now, some notes on the above snippet. I have some helper functions: findBiggestBlob, computeBoundingBox and boundingBox2Rect. These find the biggest blob in a binary image, compute its bounding box as a custom structure, and convert that custom structure into OpenCV's Rect structure, respectively.
The "meat" of the snippet is this: once you have an isolated blob, compute its centroid (I actually compute the center of mass via image moments, m10/m00 and m01/m00). Generate a new label. Store this label and centroid in a dictionary, in my case, the blobMap dictionary. Additionally, compute the bounding box and store it in another dictionary, boundingBoxMap:
//Label blob:
blobMap.emplace( blobLabel, blobCentroid );
boundingBoxMap.emplace( blobLabel, cropBox );
Now, using the centroid data, fetch the corresponding row of that blob. Once you get the row, store this number into your row table:
//Get the row for this centroid
int blobRow = rowMask.at<uchar>( cy, cx );
//Place centroid on rowed image:
rowTable.at<uchar>( blobRow, cx ) = blobLabel;
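The centroid in the loop comes from image moments: for a binary blob (as with the binary flag passed to `cv::moments`), m00 is the pixel count and m10, m01 are the sums of the x and y coordinates, so m10/m00 and m01/m00 are simply the mean x and y. A NumPy sketch of that computation on a binary mask (the helper name `binary_centroid` is hypothetical):

```python
import numpy as np

def binary_centroid(mask):
    """Centroid (cx, cy) of the nonzero pixels, i.e. (m10/m00, m01/m00)."""
    ys, xs = np.nonzero(mask)
    m00 = xs.size                # zeroth moment: number of pixels
    if m00 == 0:
        return None              # empty mask: centroid undefined
    m10 = xs.sum()               # first moment in x
    m01 = ys.sum()               # first moment in y
    return m10 / m00, m01 / m00

mask = np.zeros((10, 10), np.uint8)
mask[2:5, 3:8] = 255             # 3 rows (y = 2..4) by 5 columns (x = 3..7)
cx, cy = binary_centroid(mask)
```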
Excellent. At this point you have the Row Table ready. Let’s loop through it and actually, and finally, order those damn blobs:
int blobCounter = 1; //The ORDERED label, starting at 1
for( int y = 0; y < rowTable.rows; y++ ){
    for( int x = 0; x < rowTable.cols; x++ ){
        //Get current label:
        uchar currentLabel = rowTable.at<uchar>( y, x );
        //Is it a valid label?
        if ( currentLabel != 255 ){
            //Get the bounding box for this label:
            cv::Rect currentBoundingBox = boundingBoxMap[ currentLabel ];
            cv::rectangle( testImage, currentBoundingBox, cv::Scalar(0,255,0), 2, 8, 0 );
            //The blob counter to string:
            std::string counterString = std::to_string( blobCounter );
            cv::putText( testImage, counterString, cv::Point( currentBoundingBox.x, currentBoundingBox.y-1 ),
                         cv::FONT_HERSHEY_SIMPLEX, 0.7, cv::Scalar(255,0,0), 1, cv::LINE_8, false );
            blobCounter++; //Increment the blob/label
        }
    }
}
Nothing fancy, just a regular nested for loop, looping through each pixel on the row table. If the pixel is different from white, use the label to retrieve both the centroid and bounding box, and just change the label to an increasing number. For result displaying I just draw the bounding boxes and the new label on the original image.
Check out the ordered processing in this animation:
Very cool, here's a bonus animation, the Row Table getting populated with horizontal coordinates:
I would even say use image moments (cv2.moments), whose centroid tends to be a better estimate of the center point of a polygon than the "normal" center point of the bounding rectangle, so the function could be:
import cv2

def get_contour_precedence(contour, cols):
    tolerance_factor = 61
    M = cv2.moments(contour)
    # calculate x,y coordinate of centroid
    if M["m00"] != 0:
        cX = int(M["m10"] / M["m00"])
        cY = int(M["m01"] / M["m00"])
    else:
        # set values as what you need in the situation
        cX, cY = 0, 0
    return ((cY // tolerance_factor) * tolerance_factor) * cols + cX
A thorough mathematical explanation of image moments can be found here.
Maybe you should think about getting rid of this tolerance_factor by using a general clustering algorithm like kmeans to cluster your centers into rows and columns.
OpenCV has a kmeans implementation, which you can find here.
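As a sketch of the clustering idea, a tiny one-dimensional k-means (Lloyd's algorithm) can group the y coordinates of the contour centroids into text lines without a tolerance factor. This is plain NumPy, not `cv2.kmeans`; it assumes you know the number of lines k, and it starts the centres at evenly spaced quantiles of the data. The centroids and the names `kmeans_1d` and `sort_by_lines` are made up for illustration.

```python
import numpy as np

def kmeans_1d(values, k, iters=50):
    """Cluster 1-D values into k groups; returns (labels, centres), line 0 = smallest centre."""
    values = np.asarray(values, dtype=np.float64)
    centres = np.quantile(values, np.linspace(0, 1, k + 2)[1:-1])  # spread-out start
    for _ in range(iters):
        labels = np.argmin(np.abs(values[:, None] - centres[None, :]), axis=1)
        new_centres = np.array([
            values[labels == j].mean() if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    labels = np.argmin(np.abs(values[:, None] - centres[None, :]), axis=1)
    order = np.argsort(centres)              # relabel so line 0 is the top line
    rank = np.empty(k, dtype=int)
    rank[order] = np.arange(k)
    return rank[labels], centres[order]

def sort_by_lines(centroids, k):
    """Indices of (cx, cy) centroids sorted by text line (top first), then by x."""
    line, _ = kmeans_1d([cy for _, cy in centroids], k)
    return sorted(range(len(centroids)), key=lambda i: (line[i], centroids[i][0]))

# Made-up centroids: three lines around y = 12, 50 and 91, with small jitter.
centroids = [(40, 13), (10, 11), (25, 52), (5, 49), (30, 90), (60, 12), (15, 92)]
order = sort_by_lines(centroids, k=3)
```

Small vertical jitter within a line no longer matters, because each centroid simply joins the nearest line centre.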
I do not know exactly what your goal is, but another idea could be to split every line into a Region of Interest (ROI) for further processing; afterwards you could easily count the letters by the x-values of each contour and the line number.
import cv2
import numpy as np

## (1) read
img = cv2.imread("yFX3M.png")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

## (2) threshold
th, threshed = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY_INV|cv2.THRESH_OTSU)

## (3) minAreaRect on the nonzeros
pts = cv2.findNonZero(threshed)
ret = cv2.minAreaRect(pts)

(cx,cy), (w,h), ang = ret
if w>h:
    w,h = h,w

## (4) Find rotated matrix, do rotation
M = cv2.getRotationMatrix2D((cx,cy), ang, 1.0)
rotated = cv2.warpAffine(threshed, M, (img.shape[1], img.shape[0]))

## (5) find and draw the upper and lower boundary of each line
hist = cv2.reduce(rotated,1, cv2.REDUCE_AVG).reshape(-1)

th = 2
H,W = img.shape[:2]
# (6) using the histogram with a threshold
uppers = [y for y in range(H-1) if hist[y]<=th and hist[y+1]>th]
lowers = [y for y in range(H-1) if hist[y]>th and hist[y+1]<=th]

rotated = cv2.cvtColor(rotated, cv2.COLOR_GRAY2BGR)
for y in uppers:
    cv2.line(rotated, (0,y), (W, y), (255,0,0), 1)

for y in lowers:
    cv2.line(rotated, (0,y), (W, y), (0,255,0), 1)

cv2.imshow('pic', rotated)
cv2.waitKey(0)

# (7) we iterate all rois and count
for i in range(len(uppers)) :
    roi = rotated[uppers[i]:lowers[i],0:W]
    cv2.imshow('line', roi)
    cv2.waitKey(0)
    # here again calc thres and contours
I found an old post with this code here
Instead of taking the upper left corner of the contour, I'd rather use the centroid or at least the bounding box center.
import cv2
def get_contour_precedence(contour, cols):
    tolerance_factor = 4
    x, y, w, h = cv2.boundingRect(contour)
    # use the centre of the bounding box (x + w/2, y + h/2), not its top-left corner
    cx, cy = x + w / 2, y + h / 2
    return ((cy // tolerance_factor) * tolerance_factor) * cols + cx
But it might be hard to find a tolerance value that works in all cases.
Here is one way in Python/OpenCV by processing by rows first then characters.
Read the input
Convert to grayscale
Threshold and invert
Use a long horizontal kernel and apply morphology close to form rows
Get the contours of the rows and their bounding boxes
Save the row boxes and sort on Y
Loop over each sorted row box and extract the row from the thresholded image
Get the contours of each character in the row and save the bounding boxes of the characters.
Sort the contours for a given row on X
Draw the bounding boxes on the input and the index number as text on the image
Increment the index
Save the results
import cv2
import numpy as np

# read input image
img = cv2.imread('vision78.png')

# convert img to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# otsu threshold
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_OTSU )[1]
thresh = 255 - thresh

# apply morphology close to form rows
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (51,1))
morph = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, kernel)

# find contours and bounding boxes of rows
rows_img = img.copy()
boxes_img = img.copy()
rowboxes = []
rowcontours = cv2.findContours(morph, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
rowcontours = rowcontours[0] if len(rowcontours) == 2 else rowcontours[1]
index = 1
for rowcntr in rowcontours:
    xr,yr,wr,hr = cv2.boundingRect(rowcntr)
    cv2.rectangle(rows_img, (xr, yr), (xr+wr, yr+hr), (0, 0, 255), 1)
    rowboxes.append((xr, yr, wr, hr))

# sort rowboxes on y coordinate
def takeSecond(elem):
    return elem[1]
rowboxes.sort(key=takeSecond)

# loop over each row
for rowbox in rowboxes:
    # crop the image for a given row
    xr = rowbox[0]
    yr = rowbox[1]
    wr = rowbox[2]
    hr = rowbox[3]
    row = thresh[yr:yr+hr, xr:xr+wr]
    bboxes = []
    # find contours of each character in the row
    contours = cv2.findContours(row, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    contours = contours[0] if len(contours) == 2 else contours[1]
    for cntr in contours:
        x,y,w,h = cv2.boundingRect(cntr)
        # shift the box from row coordinates back to image coordinates
        bboxes.append((x+xr, y+yr, w, h))
    # sort bboxes on x coordinate
    def takeFirst(elem):
        return elem[0]
    bboxes.sort(key=takeFirst)
    # draw sorted boxes
    for box in bboxes:
        xb = box[0]
        yb = box[1]
        wb = box[2]
        hb = box[3]
        cv2.rectangle(boxes_img, (xb, yb), (xb+wb, yb+hb), (0, 0, 255), 1)
        cv2.putText(boxes_img, str(index), (xb,yb), cv2.FONT_HERSHEY_COMPLEX_SMALL, 0.75, (0,255,0), 1)
        index = index + 1

# save result
cv2.imwrite("vision78_thresh.jpg", thresh)
cv2.imwrite("vision78_morph.jpg", morph)
cv2.imwrite("vision78_rows.jpg", rows_img)
cv2.imwrite("vision78_boxes.jpg", boxes_img)

# show images
cv2.imshow("thresh", thresh)
cv2.imshow("morph", morph)
cv2.imshow("rows_img", rows_img)
cv2.imshow("boxes_img", boxes_img)
cv2.waitKey(0)
Threshold image:
Morphology image of rows:
Row contours image:
Character contours image:
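The two sorting steps (rows on y, then characters within a row on x) do not depend on the image at all, so they can be checked on plain tuples. A minimal sketch, assuming boxes are (x, y, w, h) tuples in full-image coordinates and that a character belongs to the row box containing its vertical center; the function name reading_order and the sample boxes are hypothetical:

```python
def reading_order(row_boxes, char_boxes):
    """Number character boxes row by row (top to bottom), then left to right.

    Boxes are (x, y, w, h) tuples. A character is assigned to the row box that
    contains its vertical center, which assumes the row boxes do not overlap.
    """
    numbered = []
    index = 1
    for _, yr, _, hr in sorted(row_boxes, key=lambda b: b[1]):   # rows on y
        in_row = [b for b in char_boxes if yr <= b[1] + b[3] / 2 < yr + hr]
        for box in sorted(in_row, key=lambda b: b[0]):           # characters on x
            numbered.append((index, box))
            index += 1
    return numbered


rows = [(0, 50, 100, 20), (0, 10, 100, 20)]
chars = [(60, 12, 8, 15), (5, 52, 8, 15), (20, 11, 8, 15), (40, 55, 8, 10)]
for index, box in reading_order(rows, chars):
    print(index, box)
```

Characters whose center falls outside every row box are silently dropped in this sketch; the image-based code above avoids that case because it only looks for characters inside each row crop.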
Hi, I need to write a program that removes the demarcation from a grayscale image (an image with text in it).
I read about thresholding and blurring, but I still don't see how I can do it.
My image is an image of Hebrew text like this:
and I need to remove the demarcation (assuming that the demarcation is the smallest element in the image). The output needs to be something like this:
I want to write the code in Python using OpenCV. What topics do I need to learn to be able to do that, and how?
Thank you.
I can use only cv2 functions.
The symbols you want to remove are significantly smaller than all other shapes, so you can use that to determine which ones to remove.
First use threshold to convert the image to binary. Next, you can use findContours to detect the shapes and then contourArea to determine whether each shape is larger than a threshold.
Finally, you can create a mask to remove the unwanted shapes, draw the larger symbols on a new image, or draw the smaller symbols in white over the original symbols in the original image, making them disappear. I used that last technique in the code below.
import cv2
# load image as grayscale
img = cv2.imread('1MioS.png',0)
# convert to binary. Inverted, so you get white symbols on black background
_ , thres = cv2.threshold(img, 200, 255, cv2.THRESH_BINARY_INV)
# find contours in the thresholded image (this gives all symbols)
# [-2:] keeps this working on OpenCV 3 (three return values) and OpenCV 4 (two)
contours, hierarchy = cv2.findContours(thres, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2:]
# loop through the contours, if the size of the contour is below a threshold,
# draw a white shape over it in the input image
for cnt in contours:
    if cv2.contourArea(cnt) < 250:
        cv2.drawContours(img, [cnt], 0, (255), -1)
# display result
cv2.imshow('res', img)
cv2.waitKey(0)
cv2.destroyAllWindows()
To find the largest contour, you can loop through them and keep track of the largest value:
maxArea = 0
for cnt in contours:
    currArea = cv2.contourArea(cnt)
    if currArea > maxArea:
        maxArea = currArea
I also whipped up a slightly more complex version that creates a sorted list of the indexes and sizes of the contours. Then it looks for the largest relative difference in size between consecutive contours, so you know which contours are 'small' and which are 'large'. I do not know if this works for all letters/fonts.
# create a list of the indexes of the contours and their sizes
contour_sizes = []
for index,cnt in enumerate(contours):
    contour_sizes.append([index, cv2.contourArea(cnt)])
# sort the list based on the contour size.
# this changes the order of the elements in the list
contour_sizes.sort(key=lambda x:x[1])
# loop through the list and determine the largest relative difference
indexOfMaxDifference = 0
currentMaxDifference = 0
for i in range(1,len(contour_sizes)):
    if contour_sizes[i-1][1] == 0:
        # skip zero-area contours to avoid dividing by zero
        continue
    sizeDifference = contour_sizes[i][1] / contour_sizes[i-1][1]
    if sizeDifference > currentMaxDifference:
        currentMaxDifference = sizeDifference
        indexOfMaxDifference = i
# loop through the list again, ending (or starting) at the indexOfMaxDifference, to paint over the small contours
for i in range(0, indexOfMaxDifference):
    cv2.drawContours(img,contours,contour_sizes[i][0] ,(255),-1)
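The core of that approach, finding the largest ratio between consecutive sorted sizes, can be tested without OpenCV on a plain list of areas. A sketch with a hypothetical helper name, split_small_large, and made-up areas:

```python
def split_small_large(areas):
    """Split contour indexes into (small, large) at the largest size ratio.

    `areas` is a list of contour areas; the return values are lists of
    indexes into it, each sorted by area.
    """
    order = sorted(range(len(areas)), key=lambda i: areas[i])
    best_ratio, split = 0.0, 0
    for k in range(1, len(order)):
        prev = areas[order[k - 1]]
        if prev == 0:  # degenerate zero-area contour: skip to avoid division by zero
            continue
        ratio = areas[order[k]] / prev
        if ratio > best_ratio:
            best_ratio, split = ratio, k
    return order[:split], order[split:]


small, large = split_small_large([400, 12, 380, 9, 420, 15])
print(small, large)
```

Sorting indexes rather than (index, area) pairs keeps the result pointing straight back into the original contours list, which is what drawContours needs.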
To get the background color you can use minMaxLoc. This returns the lowest value in an image and its position (also the max value, but you don't need that). If you apply it to the thresholded image, where the background is black, it will return the location of a background pixel (most likely (0,0)). You can then look up this pixel in the original color image. Note that minMaxLoc returns the location as (x, y), while NumPy indexes images as [row, column], that is [y, x].
# get the location of a pixel with background color
min_val, _, min_loc, _ = cv2.minMaxLoc(thres)
# load color image
img_color = cv2.imread('1MioS.png')
# get bgr values of background; min_loc is (x, y) but NumPy indexes [row, col]
b,g,r = img_color[min_loc[1], min_loc[0]]
# convert from numpy object
background_color = (int(b),int(g),int(r))
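And then draw the small contours in that color:
for cnt in contours:
    if cv2.contourArea(cnt) < 250:
        cv2.drawContours(img_color, [cnt], 0, background_color, -1)
and, of course, show the color image:
cv2.imshow('res', img_color)
cv2.waitKey(0)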
This looks like a problem for template matching since you have what looks like a known font and can easily understand what the characters and/or demarcations are. Check out https://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_imgproc/py_template_matching/py_template_matching.html
Admittedly, the tutorial talks about finding the match; modification is up to you. In that case, you know the exact shape of the template itself, so using that information along with the location of the match, just overwrite the image data with the appropriate background color (based on the examples above, 255).
You can solve it by removing all the small clusters.
I found a Python solution (using OpenCV) here.
For supporting smaller fonts, I added the following heuristic:
"The largest size of the demarcation cluster is 1/500 of the largest letter cluster".
The heuristic can be refined by statistical analysis (or improved by other heuristics, such as demarcation locations relative to the letters).
import numpy as np
import cv2
I = cv2.imread('Goodluck.png', cv2.IMREAD_GRAYSCALE)
J = 255 - I  # Invert I
img = cv2.threshold(J, 127, 255, cv2.THRESH_BINARY)[1]  # Convert to binary
# https://answers.opencv.org/question/194566/removing-noise-using-connected-components/
nlabel,labels,stats,centroids = cv2.connectedComponentsWithStats(img, connectivity=8)
labels_small = []
areas_small = []
# Find largest cluster (note: this includes label 0, the background, which is usually the largest):
max_size = np.max(stats[:, cv2.CC_STAT_AREA])
thresh_size = max_size / 500  # Set the threshold to maximum cluster size divided by 500.
# Collect the small clusters (label 0 is the background, so start at 1)
for i in range(1, nlabel):
    if stats[i, cv2.CC_STAT_AREA] < thresh_size:
        labels_small.append(i)
        areas_small.append(stats[i, cv2.CC_STAT_AREA])
# Paint the small clusters with the background color (white) in the original image
for i in labels_small:
    I[labels == i] = 255
cv2.imshow('I', I)
cv2.waitKey(0)
cv2.destroyAllWindows()
Here is a MATLAB code sample (with a fixed threshold of 200 pixels):
I = imbinarize(rgb2gray(imread('בהצלחה.png')));
J = ~I;
CC = bwconncomp(J);
%Cover all small clusters with zeros.
for i = 1:CC.NumObjects
    C = CC.PixelIdxList{i}; %Cluster coordinates.
    %Fill small clusters with zeros.
    if numel(C) < 200
        J(C) = 0;
    end
end
J = ~J;
imshow(J);
During error level analysis (ELA) of an image, I want to highlight the pixel changes using OpenCV (with just a single image, not a difference of two). I know the pixel values of the output image, but I am not sure how to group them together and assign a shape to them (example below, where the pixel change is marked with a shape). I want to know whether I can detect the circle of lighter pixels, group those pixels, and draw a shape around the group.
Input Image:
Result Image:
If I understand correctly, you want to highlight the differences between the input and output images in a new image. To do this, you can take a quantitative approach to determine the exact discrepancies between images using the Structural Similarity Index (SSIM) which was introduced in Image Quality Assessment: From Error Visibility to Structural Similarity. This method is already implemented in the scikit-image library for image processing. You can install scikit-image with pip install scikit-image.
The skimage.metrics.structural_similarity() function (called skimage.measure.compare_ssim() in older scikit-image releases) returns a score and a diff image. The score represents the structural similarity index between the two input images and falls in the range [-1, 1], with values closer to one representing higher similarity. But since you're only interested in where the two images differ, the diff image is what we'll focus on. Specifically, the diff image contains the actual image differences, with darker regions having more disparity. Larger areas of disparity are highlighted in black while smaller differences are in gray. Here's the diff image:
If you look closely, there are gray noisy areas probably due to .jpg lossy compression. So to obtain a cleaner result, we perform morphological operations to smooth the image. We would obtain a cleaner result if the images used a lossless image compression format such as .png. After cleaning up the image, we highlight the differences in green
from skimage.metrics import structural_similarity
import numpy as np
import cv2
# Load images and convert to grayscale
image1 = cv2.imread('1.jpg')
image2 = cv2.imread('2.jpg')
image1_gray = cv2.cvtColor(image1, cv2.COLOR_BGR2GRAY)
image2_gray = cv2.cvtColor(image2, cv2.COLOR_BGR2GRAY)
# Compute SSIM between two images
# (older scikit-image versions: from skimage.measure import compare_ssim)
(score, diff) = structural_similarity(image1_gray, image2_gray, full=True)
# The diff image contains the actual image differences between the two images
# and is represented as a floating point data type, nominally in the range [0,1]
# (small negative values are clipped), so we must convert the array to 8-bit
# unsigned integers in the range [0,255] before we can use it with OpenCV
diff = 255 - (np.clip(diff, 0, 1) * 255).astype("uint8")
# Perform morphological operations
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3,3))
opening = cv2.morphologyEx(diff, cv2.MORPH_OPEN, kernel, iterations=1)
close = cv2.morphologyEx(opening, cv2.MORPH_CLOSE, kernel, iterations=1)
diff = cv2.merge([close,close,close])
# Color difference pixels
diff[np.where((diff > [10,10,50]).all(axis=2))] = [36,255,12]
cv2.imshow('diff', diff)
cv2.waitKey(0)
I think the best way is to simply threshold your image and apply morphological transformations.
I got the following results.
Threshold + morphology:
Select the largest component:
Using this code:
#include <opencv2/opencv.hpp>
#include <vector>
int main()
{
    cv::Mat result;
    cv::Mat img = cv::imread("fOTmh.jpg");
    //-- gray & smooth image
    cv::cvtColor(img, result, cv::COLOR_BGR2GRAY);
    cv::blur(result, result, cv::Size(5,5));
    //-- threshold relative to the max value of the image and smooth again!
    double min, max;
    cv::minMaxLoc(result, &min, &max);
    cv::threshold(result, result, 0.3*max, 255, cv::THRESH_BINARY);
    cv::medianBlur(result, result, 7);
    //-- apply Morphological Transformations
    cv::Mat se = cv::getStructuringElement(cv::MORPH_ELLIPSE, cv::Size(11, 11));
    cv::morphologyEx(result, result, cv::MORPH_DILATE, se);
    cv::morphologyEx(result, result, cv::MORPH_CLOSE, se);
    //-- find the largest component (the contour with the most points)
    std::vector<std::vector<cv::Point> > contours;
    std::vector<cv::Vec4i> hierarchy;
    cv::findContours(result, contours, hierarchy, cv::RETR_LIST, cv::CHAIN_APPROX_NONE);
    std::vector<cv::Point> *l = nullptr;
    for(auto &&c: contours){
        if (l==nullptr || l->size()< c.size())
            l = &c;
    }
    if (l == nullptr) return 1;  // no component found
    //-- expand and plot Rect around the largest component
    cv::Rect r = cv::boundingRect(*l);
    r.x -=10;
    r.y -=10;
    r.width +=20;
    r.height +=20;
    cv::rectangle(img, r, cv::Scalar::all(255), 3);
    //-- result
    cv::resize(img, img, cv::Size(), 0.25, 0.25);
    cv::imshow("result", img);
    cv::waitKey(0);
    return 0;
}
Python code:
import cv2 as cv
img = cv.imread("ELA_Final.jpg")
result = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
result = cv.blur(result, (5,5))
minVal, maxVal, minLoc, maxLoc = cv.minMaxLoc(result)
ret, result = cv.threshold(result, 0.3*maxVal, 255, cv.THRESH_BINARY)
result = cv.medianBlur(result, 7)
se = cv.getStructuringElement(cv.MORPH_ELLIPSE, (11, 11))
result = cv.morphologyEx(result, cv.MORPH_DILATE, se)
result = cv.morphologyEx(result, cv.MORPH_CLOSE, se)
# [-2] picks the contour list in both OpenCV 3 (three return values) and OpenCV 4 (two)
contours = cv.findContours(result, cv.RETR_LIST, cv.CHAIN_APPROX_NONE)[-2]
# largest component = contour with the most points, as in the C++ version
largest = max(contours, key=len)
color = (255, 0, 0)
x, y, w, h = cv.boundingRect(largest)
# expand the box by 10 px on each side
x -= 10
y -= 10
w += 20
h += 20
cv.rectangle(img, (x,y), (x+w,y+h), color, 3)
img = cv.resize(img, (1500, 700), interpolation=cv.INTER_AREA)
cv.imshow("result", img)
cv.waitKey(0)
I'm currently trying to write something that can extract data from some uncommon graphs in a book. I scanned the pages of the book, and by using opencv I would like to detect some features from the graphs in order to convert it into useable data. In the left graph I'm looking for the height of the "triangles" and in the right graph the distance from the center to the points where the dotted lines intersect with the gray area. In both cases I would like to convert these values into numeric data for further usage.
The first thing I thought of was detecting the lines of the charts, in the hopes I could somehow measure their length or position. For this I'm using the Hough Line Transform. The following snippet of code shows how far I've gotten already.
import numpy as np
import cv2
# Reading the image
img = cv2.imread('test2.jpg')
# Convert the image to grayscale
gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
# Apply edge detection
edges = cv2.Canny(gray,50,150,apertureSize = 3)
# Line detection
lines = cv2.HoughLinesP(edges,1,np.pi/180,100,minLineLength=50,maxLineGap=20)
# HoughLinesP returns None when no lines are found
if lines is not None:
    for line in lines:
        x1,y1,x2,y2 = line[0]
        cv2.line(img,(x1,y1),(x2,y2),(0,0,255),2)
cv2.imshow('lines', img)
cv2.waitKey(0)
The only problem is that this detection algorithm is not accurate at all, at least not for me. And in order to extract some data from the charts, the detection of the lines should be somewhat accurate. Is there any way I could do this? Or is my strategy of detecting lines wrong in the first place? Should I maybe start with detecting something else, like circles, object sizes, contours or colors?
Using color segmentation is an easy way to convert this graph to data. This method does require some manual annotation. After the graph is segmented, count the pixels of each color. Check out the 'watershed' demo among the Python samples that ship with OpenCV (it imports Sketcher from the samples' common.py):
import numpy as np
import cv2 as cv
from common import Sketcher
class App:
    def __init__(self, fn):
        self.img = cv.imread(fn)
        self.img = cv.resize(self.img, (654,654))
        h, w = self.img.shape[:2]
        self.markers = np.zeros((h, w), np.int32)
        self.markers_vis = self.img.copy()
        self.cur_marker = 1
        self.colors = np.int32( list(np.ndindex(2, 2, 3)) ) * 123
        self.auto_update = True
        self.sketch = Sketcher('img', [self.markers_vis, self.markers], self.get_colors)
    def get_colors(self):
        return list(map(int, self.colors[self.cur_marker])), self.cur_marker
    def watershed(self):
        m = self.markers.copy()
        cv.watershed(self.img, m)
        cv.imshow('img', self.img)
        overlay = self.colors[np.maximum(m, 0)]
        vis = cv.addWeighted(self.img, 0.5, overlay, 0.5, 0.0, dtype=cv.CV_8UC3)
        cv.imshow('overlay', np.array(overlay, np.uint8))
        cv.imwrite('/home/stephen/Desktop/overlay.png', np.array(overlay, np.uint8))
        cv.imshow('watershed', vis)
    def run(self):
        while cv.getWindowProperty('img', 0) != -1 or cv.getWindowProperty('watershed', 0) != -1:
            ch = cv.waitKey(50)
            if ch >= ord('1') and ch <= ord('9'):
                self.cur_marker = ch - ord('0')
                print('marker: ', self.cur_marker)
            # re-run the segmentation whenever new markers have been drawn
            if self.sketch.dirty and self.auto_update:
                self.watershed()
                self.sketch.dirty = False
            if ch == 27:
                break
        cv.destroyAllWindows()
fn = '/home/stephen/Desktop/test.png'
App(fn).run()
The output will be an image like this:
You can count the pixels for each color using this code:
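import numpy as np
import cv2 as cv
# Extract the values from the image
vals = []
img = cv.imread('/home/stephen/Desktop/overlay.png')
# Get the colors in the image
flat = img.reshape(-1, img.shape[-1])
colors = np.unique(flat, axis=0)
# Iterate through the colors (ignore the first and last colors)
for color in colors[1:-1]:
    # +/-1 tolerance around the color, clipped to 0..255 so 0 and 255 don't wrap around
    lower = np.clip(color.astype(np.int32) - 1, 0, 255).astype(np.uint8)
    upper = np.clip(color.astype(np.int32) + 1, 0, 255).astype(np.uint8)
    mask = cv.inRange(img, lower, upper)
    # the pixel count of this color is the value for that segment of the graph
    vals.append(cv.countNonZero(mask))
    cv.imshow('mask', mask)
    cv.waitKey(0)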
And print out the output data using this code:
names = ['alcohol', 'esters', 'biter', 'hoppy', 'acid', 'zoetheid', 'mout']
print(list(zip(names, vals)))
The output is:
[('alcohol', 22118), ('esters', 26000), ('biter', 16245), ('hoppy', 21170), ('acid', 19156), ('zoetheid', 11090), ('mout', 7167)]
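Since overlay.png is a lossless PNG, the +/-1 tolerance in the color loop is not strictly needed: np.unique can return the count of every exact color in one call. A sketch on a tiny synthetic image; count_colors is a hypothetical name:

```python
import numpy as np


def count_colors(img):
    """Map each distinct BGR color in `img` to its number of pixels."""
    flat = img.reshape(-1, img.shape[-1])
    colors, counts = np.unique(flat, axis=0, return_counts=True)
    return {tuple(int(v) for v in c): int(n) for c, n in zip(colors, counts)}


img = np.zeros((2, 3, 3), np.uint8)
img[0, :] = (0, 0, 246)     # three pixels of one color
img[1, 0] = (123, 0, 0)     # one pixel of another
print(count_colors(img))
```

Dropping the background and boundary colors, as the loop above does with colors[1:-1], then becomes a matter of deleting those keys from the dictionary.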
I have a drawing (in DXF format) containing 9 different shapes arranged in a random pattern. I need to find the center point of each shape and derive its x,y coordinates so that I can append them to a list for machining purposes.
The problem is that I'm using AutoCAD, which saves each shape as a series of vertices even if I first convert them to distinct joined polylines. In other words, opening the drawing in a text editor just gives me a standard vertex list, from which it's impossible to say where one shape ends and the next begins.
So far the only solutions I've had any success with seem awfully Rube Goldberg-like. As an example, I can export the DXF to a BMP and then use Python and OpenCV to identify each shape from its contours:
import sys
import numpy as np
import cv2
im = cv2.imread('drawing.bmp')
im3 = im.copy()
gray = cv2.cvtColor(im,cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray,(5,5),0)
thresh = cv2.adaptiveThreshold(blur,255,1,1,11,2)
contours0, hierarchy = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)[-2:]
contours = [cv2.approxPolyDP(cnt, 4, True) for cnt in contours0]
samples = np.empty((0,100))
responses = []
keys = [i for i in range(30,90)]
for cnt in contours:
    tot = cv2.contourArea(cnt)
    [x,y,w,h] = cv2.boundingRect(cnt)
    # contourArea returns a float, so use a range check instead of `in range(...)`
    if 1200 <= tot < 1250:
        # putText needs integer coordinates
        cv2.putText(im,"shape 3",(x+w//2,y+h//2),0,1,(0,255,0))
cv2.imshow('shapes', im)
key = cv2.waitKey(0)
I could then take the output, scale it as necessary, and list the x,y. This is however incredibly time consuming and may ultimately lose too much precision to be usable (pixels aren't floats).
There has to be some way of finding these shapes just by reading the DXF; otherwise AutoCAD couldn't render them and I would just have a point cloud.
So how exactly does AutoCAD know where one shape ends and the next begins, so that I can tell Python what to look for to identify a distinct shape when reading a DXF as a text file?
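In an ASCII DXF file every item is stored as a pair of lines: a numeric group code followed by its value. Each entity starts with group code 0 and its type name (for example LINE, CIRCLE or LWPOLYLINE), so a joined polyline is one LWPOLYLINE entity whose vertices follow as repeated 10 (x) and 20 (y) codes. The sketch below relies only on that structure, under these assumptions: the shapes are LWPOLYLINE entities (the older POLYLINE form, which stores separate VERTEX entities up to a SEQEND, and shapes exploded into individual LINE entities are not handled), arc bulges (group code 42) are ignored, and every polyline is treated as closed. The function names and the sample file are hypothetical; for real drawings a dedicated library such as ezdxf is likely more robust. The centroid uses the shoelace (polygon area) formula, so coordinates stay in drawing units as floats:

```python
def read_lwpolylines(dxf_text):
    """Return one vertex list [(x, y), ...] per LWPOLYLINE entity in an ASCII DXF."""
    lines = [ln.strip() for ln in dxf_text.splitlines()]
    shapes, current, pending_x = [], None, None
    for code, value in zip(lines[0::2], lines[1::2]):   # (group code, value) pairs
        if code == "0":                  # any new entity (or section marker) starts here
            current = [] if value == "LWPOLYLINE" else None
            if current is not None:
                shapes.append(current)
        elif current is not None and code == "10":
            pending_x = float(value)
        elif current is not None and code == "20" and pending_x is not None:
            current.append((pending_x, float(value)))
            pending_x = None
    return shapes


def polygon_centroid(pts):
    """Area centroid of a closed polygon (shoelace formula)."""
    a = cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:] + pts[:1]):
        cross = x0 * y1 - x1 * y0
        a += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if a == 0:
        raise ValueError("degenerate polygon with zero area")
    a *= 0.5
    return cx / (6 * a), cy / (6 * a)


SAMPLE_DXF = "\n".join([
    "0", "SECTION", "2", "ENTITIES",
    "0", "LWPOLYLINE", "8", "0", "90", "4", "70", "1",
    "10", "0.0", "20", "0.0", "10", "2.0", "20", "0.0",
    "10", "2.0", "20", "2.0", "10", "0.0", "20", "2.0",
    "0", "LWPOLYLINE", "90", "3", "70", "1",
    "10", "10", "20", "0", "10", "13", "20", "0", "10", "10", "20", "3",
    "0", "ENDSEC", "0", "EOF",
])

shapes = read_lwpolylines(SAMPLE_DXF)
for pts in shapes:
    print(len(pts), polygon_centroid(pts))
```

The area centroid differs from the plain average of the vertices for irregular shapes; which one counts as the "center" for machining is a choice for the application.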