I'm working on a project where I have to detect colored cars from video frames taken from a bird's-eye view.
For detection I used histogram backprojection to obtain a binary image that is supposed to contain only the target region of interest.
The process works fine until I try to generalize the detection by testing it on a video that contains objects with a similar color distribution (like me crawling under the table, with parts of my T-shirt visible).
As you can see, both the car and the irrelevant objects are moving, and the detection results are:
As you can see, irrelevant objects that share a similar color distribution show up in the binary images. However, thanks to Stack Overflow experts I could improve the detection by telling the algorithm to choose the blob that represents the target object, adding the following constraints (a minimal sketch of both checks follows the list):
1- Rectangularity check
2- Area and ratio check
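For reference, here is a minimal sketch of what I mean by these two checks; the threshold values and the aspect-ratio window are only illustrative, not my exact numbers:
import cv2

def is_candidate_blob(cnt, min_area=200, max_area=5000,
                      min_rect=0.72, min_ratio=1.0, max_ratio=3.0):
    # Area check on the raw contour
    area = cv2.contourArea(cnt)
    if not (min_area <= area <= max_area):
        return False
    # Rectangularity: contour area divided by the area of its minimum-area rectangle
    (_, _), (w, h), _ = cv2.minAreaRect(cnt)
    if w == 0 or h == 0:
        return False
    rectangularity = area / (w * h)          # 1.0 for a perfect rectangle
    ratio = max(w, h) / min(w, h)            # aspect ratio of the rotated box
    return rectangularity >= min_rect and min_ratio <= ratio <= max_ratio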
With the above constraints I could get rid of the large irrelevant objects that are detected. However, for small objects (see the binary images) it doesn't work that well, since the rectangularity of the target object (the small red car) ranges between 0.72 and 1 and the small irrelevant objects also fall in this range. So I decided to add another constraint: calculating the distance between the centroids of the moving car every 5 successive frames and thresholding on that distance, by doing the following:
import scipy.spatial.distance
from collections import deque

# Double-ended queue containing the detected centroids
centroids = deque(maxlen=40)
.
.
.
centroids.appendleft(center)
# center comes from the detection process, e.g. centroids = [(120, 130), (125, 132), ...
Distance = scipy.spatial.distance.euclidean(centroids[0], centroids[5])
if Distance <= 50:
    # choose that blob
I tested that on different videos, and it turns out that the distance between the centroids ranges between 0 and 50 (0 when the car stops).
So my question is:
Is there a way I can exploit this property to enhance the detection so that it ignores the T-shirt? The problem is that when the car is no longer visible and the irrelevant object stays, the algorithm will calculate the distance difference for the irrelevant object, and that distance will shrink until it is less than 50.
Thanks in advance.
Based on the information provided by the OP, here is an approach to solve this problem.
Sample Images
I have created a few sample images that roughly represent the objects moving across time. The centre object represents the car we are seeking; the other objects have been detected incorrectly by the classifier.
The first four images represent a car (the centre object) moving from left to right, along with two other objects that have been detected incorrectly. In the fifth image, the car has moved out of the frame, but the two incorrect detections are still present. The sixth frame shows a new car entering the frame, along with the other incorrect detections.
Solution - Code
The comments contain information regarding the algorithm. We are computing the centroid of each blob and comparing it with the centroid of the previously detected/extracted blob.
import os
import cv2
import numpy as np

# Reading files and sorting them in the right order
all_files = os.listdir(".")
all_images = [file_name for file_name in all_files if file_name.endswith(".png")]
all_images.sort(key=lambda k: k.split(".")[0][-1])
print(all_images)

# Initially, no centroid information is available.
previous_centroid_x = -1
previous_centroid_y = -1
DIST_THRESHOLD = 30

for i, image_name in enumerate(all_images):
    rgb_image = cv2.imread(image_name)
    height, width = rgb_image.shape[:2]
    gray_image = cv2.cvtColor(rgb_image, cv2.COLOR_BGR2GRAY)
    ret, thresh = cv2.threshold(gray_image, 127, 255, 0)
    contours, hierarchy = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
    blankImage = np.zeros_like(rgb_image)
    for cnt in contours:
        # Refer to https://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_imgproc/py_contours/py_contour_features/py_contour_features.html#moments
        M = cv2.moments(cnt)
        cX = int(M["m10"] / M["m00"])
        cY = int(M["m01"] / M["m00"])
        # Refer to https://www.pyimagesearch.com/2016/04/11/finding-extreme-points-in-contours-with-opencv/
        # https://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_imgproc/py_contours/py_contour_properties/py_contour_properties.html#contour-properties
        extLeft = tuple(cnt[cnt[:, :, 0].argmin()][0])
        extRight = tuple(cnt[cnt[:, :, 0].argmax()][0])
        extTop = tuple(cnt[cnt[:, :, 1].argmin()][0])
        extBot = tuple(cnt[cnt[:, :, 1].argmax()][0])
        color = (0, 0, 255)
        if i == 0:  # First frame - Assuming that you can find the correct blob accurately in the first frame
            # Here, I am using a simple logic of checking if the blob is close to the centre of the image.
            if abs(cY - (height / 2)) < DIST_THRESHOLD:  # Check if the blob centre is close to half the image's height
                previous_centroid_x = cX  # Update variables for finding the next blob correctly
                previous_centroid_y = cY
                DIST_THRESHOLD = (extBot[1] - extTop[1]) / 2  # Update centre distance error with half the height of the blob
                color = (0, 255, 0)
        else:
            if abs(cY - previous_centroid_y) < DIST_THRESHOLD:  # Compare with the previous centroid y and see if it lies within the distance threshold
                previous_centroid_x = cX
                previous_centroid_y = cY
                color = (0, 255, 0)
        cv2.drawContours(blankImage, [cnt], 0, color, -1)
        cv2.circle(blankImage, (cX, cY), 3, (255, 0, 0), -1)
    cv2.imwrite("result_" + image_name, blankImage)
Updating the threshold enables the algorithm to track the object's centroid across the frames. Since the object can move up and down a little, we want to match the centroids of the objects found in the current frame to the centroid of the car found in the previous frame.
Solution - Results
Green - Selected blob
Red - Rejected blob
Object centres have also been marked for reference.
Note - This is not a perfect solution. It has several limitations, but it can help you to design an approximate solution for your problem.
I am trying to sort contours by their arrival order, left to right and top to bottom, just like the order in which you write: starting from the top left and then whichever contour comes next.
This is what I have achieved up to now:
import os
import cv2
from PIL import Image

def get_contour_precedence(contour, cols):
    tolerance_factor = 61
    origin = cv2.boundingRect(contour)
    return ((origin[1] // tolerance_factor) * tolerance_factor) * cols + origin[0]

image = cv2.imread("C:/Users/XXXX/PycharmProjects/OCR/raw_dataset/23.png", 0)
ret, thresh1 = cv2.threshold(image, 130, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
contours, h = cv2.findContours(thresh1.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
# find contours in the thresholded image and sort them left-to-right, top-to-bottom
contours = sorted(contours, key=lambda x: get_contour_precedence(x, thresh1.shape[1]))
# initialize the list of contour bounding boxes and associated
# characters that we'll be OCR'ing
chars = []
inc = 0
# loop over the contours
for c in contours:
    inc += 1
    # compute the bounding box of the contour
    (x, y, w, h) = cv2.boundingRect(c)
    label = str(inc)
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.putText(image, label, (x - 2, y - 2),
                cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
    print('x=', x)
    print('y=', y)
    print('x+w=', x + w)
    print('y+h=', y + h)
    crop_img = image[y + 2:y + h - 1, x + 2:x + w - 1]
    name = os.path.join("bounding boxes", 'Image_%d.png' % inc)
    cv2.imshow("cropped", crop_img)
    print(name)
    crop_img = Image.fromarray(crop_img)
    crop_img.save(name)
    cv2.waitKey(0)
cv2.imshow('mat', image)
cv2.waitKey(0)
Input Image :
Output Image 1:
Input Image 2 :
Output for Image 2:
Input Image 3:
Output Image 3:
As you can see, the 1, 2, 3, 4 numbering is not what I was expecting for each image, as displayed in image number 3.
How do I adjust this to make it work, or even write a custom function?
NOTE: I have multiple images like the input image provided in my question. The content is the same, but they have variations in the text, so the tolerance factor does not work for each one of them. Manually adjusting it would not be a good idea.
This is my take on the problem. I'll give you the general gist of it, and then my implementation in C++. The main idea is that I want to process the image from left to right, top to bottom. I'll process each blob (or contour) as I find it; however, I need a couple of intermediate steps to achieve a successful (and ordered) segmentation.
Vertical sort using rows
The first step is trying to sort the blobs by rows – this means that each row has a set of (unordered) horizontal blobs. That's ok: the first step is computing some kind of vertical sorting, and if we process each row from top to bottom, we will achieve just that.
After the blobs are (vertically) sorted by rows, I can check out their centroids (or centers of mass) and sort them horizontally. The idea is that I will process row by row and, for each row, sort the blob centroids. Let's see an example of what I'm trying to achieve here.
This is your input image:
This is what I call the Row Mask:
This last image contains white areas that each represent a "row". Each row has a number (e.g., Row1, Row2, etc.) and holds a set of blobs (or characters, in this case). By processing each row from top to bottom, you are already sorting the blobs on the vertical axis.
If I number each row from top to bottom, I get this image:
The Row Mask is a way of creating "rows of blobs", and this mask can be computed morphologically. Check out the 2 images overlaid to give you a better view of the processing order:
What we are trying to do here is, first, a vertical ordering (blue arrow) and then we will take care of the horizontal (red arrow) ordering. You can see that by processing each row we can (possibly) overcome the sorting problem!
Horizontal sort using centroids
Let's see now how we can sort the blobs horizontally. If we create a simpler image, with a width equal to the input image and a height equal to the number of rows in our Row Mask, we can simply overlay the horizontal coordinate (x coordinate) of each blob centroid. Check out this example:
This is a Row Table. Each of its rows corresponds to one of the rows found in the Row Mask and is also read from top to bottom. The width of the table is the same as the width of your input image and corresponds spatially to the horizontal axis. Each square is a pixel of your input image, mapped to the Row Table using only its horizontal coordinate (as our simplification of rows is pretty straightforward). The actual value of each pixel in the Row Table is a label, labeling each of the blobs on your input image. Note that the labels are not ordered!
So, for instance, this table shows that in row 1 (you already know what row 1 is – it's the first white area on the Row Mask), at position (1,4) there's blob number 3. At position (1,6) there's blob number 2, and so on. What's cool (I think) about this table is that you can loop through it, and for every value different from 0, the horizontal ordering becomes very trivial. This is the Row Table, now ordered left to right:
Mapping blob information with centroids
We are going to use the blob centroids to map the information between our two representations (Row Mask / Row Table). Suppose you already have both "helper" images and you process each blob (or contour) on the input image one at a time. For example, you have this as a start:
Alright, there's a blob here. How can we map it to the Row Mask and to the Row Table? Using its centroid. If we compute the centroid (shown in the figure as the green dot), we can construct a dictionary of centroids and labels. For example, for this blob, the centroid is located at (271,193). Ok, let's assign it the label = 1. So we now have this dictionary:
Now, we find the row in which this blob is placed using the same centroid on the Row Mask. Something like this:
rowNumber = rowMask.at( 271,193 )
This operation should return rowNumber = 3. Nice! We now know which row our blob is placed in, so it is vertically ordered. Now, let's store its horizontal coordinate in the Row Table:
rowTable.at( 271, 193 ) = 1
Now, rowTable holds (in its row and column) the label of the processed blob. The Row Table should look something like this:
The table is a lot wider, because its horizontal dimension has to be the same as your input image's. In this image, label 1 is placed at Column 271, Row 3. If this were the only blob on your image, the blobs would already be sorted. But what happens if you add another blob at, say, Column 2, Row 1? That's why you need to traverse this table again after you have processed all the blobs – to properly correct their labels.
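(Side note for Python readers: the Row Table is essentially a device for sorting by (row, x). Once every blob has a row number and a centroid, the same ordering can be expressed as a plain sort; the values below are made up, and this is only a sketch of the idea, not the implementation that follows.)
blobs = [                                   # hypothetical detections: original label, row, centroid x
    {"label": 3, "row": 1, "cx": 120},
    {"label": 1, "row": 2, "cx": 40},
    {"label": 2, "row": 1, "cx": 35},
]
ordered = sorted(blobs, key=lambda b: (b["row"], b["cx"]))
for new_label, blob in enumerate(ordered, start=1):
    print(new_label, "was detection", blob["label"])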
Implementation in C++
Alright, hopefully the algorithm should be a little bit clear (if not, just ask, my man). I'll try to implement these ideas in OpenCV using C++. First, I need a binary image of your input. Computation is trivial using Otsu’s thresholding method:
//Read the input image:
std::string imageName = "C://opencvImages//yFX3M.png";
cv::Mat testImage = cv::imread( imageName );
//Compute grayscale image
cv::Mat grayImage;
cv::cvtColor( testImage, grayImage, cv::COLOR_BGR2GRAY );
//Get binary image via Otsu:
cv::Mat binImage;
cv::threshold( grayImage, binImage, 0, 255, cv::THRESH_OTSU );
//Invert image:
binImage = 255 - binImage;
This is the resulting binary image, nothing fancy, just what we need to start working:
The first step is to get the Row Mask. This can be achieved using morphology: just apply a dilation + erosion with a VERY big horizontal structuring element. The idea is that you want to turn those blobs into rectangles, "fusing" them together horizontally:
//Create a hard copy of the binary mask:
cv::Mat rowMask = binImage.clone();
//horizontal dilation + erosion:
int horizontalSize = 100; // a very big horizontal structuring element
cv::Mat SE = cv::getStructuringElement( cv::MORPH_RECT, cv::Size(horizontalSize,1) );
cv::morphologyEx( rowMask, rowMask, cv::MORPH_DILATE, SE, cv::Point(-1,-1), 2 );
cv::morphologyEx( rowMask, rowMask, cv::MORPH_ERODE, SE, cv::Point(-1,-1), 1 );
This results in the following Row Mask:
That's very cool; now that we have our Row Mask, we must number the rows, ok? There are a lot of ways of doing this, but right now I'm interested in the simplest one: loop through this image and get every single pixel. If a pixel is white, use a Flood Fill operation to label that portion of the image as a unique blob (or row, in this case). This can be done as follows:
//Label the row mask:
int rowCount = 0; //This will count our rows
//Loop thru the mask:
for( int y = 0; y < rowMask.rows; y++ ){
    for( int x = 0; x < rowMask.cols; x++ ){
        //Get the current pixel:
        uchar currentPixel = rowMask.at<uchar>( y, x );
        //If the pixel is white, this is an unlabeled blob:
        if ( currentPixel == 255 ) {
            //Create new label (different from zero):
            rowCount++;
            //Flood fill on this point:
            cv::floodFill( rowMask, cv::Point( x, y ), rowCount, (cv::Rect*)0, cv::Scalar(), 0 );
        }
    }
}
This process will label all the rows from 1 to r. That's what we wanted. If you check out the image you'll faintly see the rows; that's because our labels correspond to very low intensity values of grayscale pixels.
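(Side note: if you port this to Python, the same row labeling can be done in one call with cv2.connectedComponents instead of the flood-fill loop. The tiny mask below is synthetic, only to make the snippet self-contained.)
import cv2
import numpy as np

# Tiny synthetic row mask (two white "rows"); in practice this would be the
# morphologically fused binary image from the previous step.
rowMask = np.zeros((10, 20), np.uint8)
rowMask[1:4, :] = 255
rowMask[6:9, :] = 255

num_labels, labels = cv2.connectedComponents(rowMask)
rowCount = num_labels - 1      # label 0 is the background
print(rowCount)                # -> 2
print(np.unique(labels))       # -> [0 1 2]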
Ok, now let's prepare the Row Table. This "table" really is just another image; remember: same width as the input, and a height equal to the number of rows you counted in the Row Mask:
//create rows image:
cv::Mat rowTable = cv::Mat::zeros( cv::Size(binImage.cols, rowCount), CV_8UC1 );
//Just for convenience:
rowTable = 255 - rowTable;
Here, I just inverted the final image for convenience, because I want to actually see how the table is populated with (very low intensity) pixels and be sure that everything is working as intended.
Now comes the fun part. We have both images (or data containers) prepared, and we need to process each blob independently. The idea is to extract each blob/contour/character from the binary image, compute its centroid and assign it a new label. Again, there are a lot of ways of doing this. Here, I'm using the following approach:
I'll loop through the binary mask and get the current biggest blob from this binary input. I'll compute its centroid and store its data in every container needed, and then I'll delete that blob from the mask. I'll repeat the process until no more blobs are left. This is my way of doing it, especially because I have functions I already wrote for that. This is the approach:
//Prepare a couple of dictionaries for data storing:
std::map< int, cv::Point > blobMap; //holds label, gives centroid
std::map< int, cv::Rect > boundingBoxMap; //holds label, gives bounding box
First, two dictionaries. One receives a blob label and returns the centroid. The other one receives the same label and returns the bounding box.
//Extract each individual blob:
cv::Mat bobFilterInput = binImage.clone();
//The new blob label:
int blobLabel = 0;
//Some control variables:
bool extractBlobs = true; //Controls loop
int currentBlob = 0; //Counter of blobs
while ( extractBlobs ){
    //Get the biggest blob:
    cv::Mat biggestBlob = findBiggestBlob( bobFilterInput );
    //Compute the centroid/center of mass:
    cv::Moments momentStructure = cv::moments( biggestBlob, true );
    float cx = momentStructure.m10 / momentStructure.m00;
    float cy = momentStructure.m01 / momentStructure.m00;
    //Centroid point:
    cv::Point blobCentroid;
    blobCentroid.x = cx;
    blobCentroid.y = cy;
    //Compute bounding box:
    boundingBox boxData;
    computeBoundingBox( biggestBlob, boxData );
    //Convert boundingBox data into opencv rect data:
    cv::Rect cropBox = boundingBox2Rect( boxData );
    //Label blob:
    blobLabel++;
    blobMap.emplace( blobLabel, blobCentroid );
    boundingBoxMap.emplace( blobLabel, cropBox );
    //Get the row for this centroid
    int blobRow = rowMask.at<uchar>( cy, cx );
    blobRow--;
    //Place centroid on rowed image:
    rowTable.at<uchar>( blobRow, cx ) = blobLabel;
    //Resume blob flow control:
    cv::Mat blobDifference = bobFilterInput - biggestBlob;
    //How many pixels are left on the new mask?
    int pixelsLeft = cv::countNonZero( blobDifference );
    bobFilterInput = blobDifference;
    //Done extracting blobs?
    if ( pixelsLeft <= 0 ){
        extractBlobs = false;
    }
    //Increment blob counter:
    currentBlob++;
}
Check out a nice animation of how this processing goes through each blob, processes it and deletes it until there’s nothing left:
Now, some notes on the above snippet. I have some helper functions: findBiggestBlob, computeBoundingBox and boundingBox2Rect. These functions find the biggest blob in a binary image, compute a custom bounding-box structure for it, and convert that structure into OpenCV's Rect structure, respectively.
The "meat" of the snippet is this: once you have an isolated blob, compute its centroid (I actually compute the center of mass via central moments) and generate a new label. Store this label and centroid in a dictionary – in my case, the blobMap dictionary. Additionally, compute the bounding box and store it in another dictionary, boundingBoxMap:
//Label blob:
blobLabel++;
blobMap.emplace( blobLabel, blobCentroid );
boundingBoxMap.emplace( blobLabel, cropBox );
Now, using the centroid data, fetch the corresponding row of that blob. Once you get the row, store this number into your row table:
//Get the row for this centroid
int blobRow = rowMask.at<uchar>( cy, cx );
blobRow--;
//Place centroid on rowed image:
rowTable.at<uchar>( blobRow, cx ) = blobLabel;
Excellent. At this point you have the Row Table ready. Let's loop through it and, finally, order those damn blobs:
int blobCounter = 1; //The ORDERED label, starting at 1
for( int y = 0; y < rowTable.rows; y++ ){
    for( int x = 0; x < rowTable.cols; x++ ){
        //Get current label:
        uchar currentLabel = rowTable.at<uchar>( y, x );
        //Is it a valid label?
        if ( currentLabel != 255 ){
            //Get the bounding box for this label:
            cv::Rect currentBoundingBox = boundingBoxMap[ currentLabel ];
            cv::rectangle( testImage, currentBoundingBox, cv::Scalar(0,255,0), 2, 8, 0 );
            //The blob counter to string:
            std::string counterString = std::to_string( blobCounter );
            cv::putText( testImage, counterString, cv::Point( currentBoundingBox.x, currentBoundingBox.y-1 ),
                         cv::FONT_HERSHEY_SIMPLEX, 0.7, cv::Scalar(255,0,0), 1, cv::LINE_8, false );
            blobCounter++; //Increment the blob/label
        }
    }
}
Nothing fancy, just a regular nested for loop that runs through each pixel of the Row Table. If the pixel is different from white, use the label to retrieve both the centroid and the bounding box, and just change the label to an increasing number. For displaying the results I just draw the bounding boxes and the new label on the original image.
Check out the ordered processing in this animation:
Very cool. Here's a bonus animation: the Row Table getting populated with horizontal coordinates:
I would even say: use image moments (cv2.moments), which tend to give a better estimate of a polygon's center point than the "normal" coordinate center of the bounding rectangle, so the function could be:
def get_contour_precedence(contour, cols):
    tolerance_factor = 61
    M = cv2.moments(contour)
    # calculate x,y coordinate of the centroid
    if M["m00"] != 0:
        cX = int(M["m10"] / M["m00"])
        cY = int(M["m01"] / M["m00"])
    else:
        # set values as what you need in the situation
        cX, cY = 0, 0
    return ((cY // tolerance_factor) * tolerance_factor) * cols + cX
A more mathematical explanation of what image moments are can be found here.
Maybe you should think about getting rid of the tolerance_factor altogether by using a clustering algorithm such as k-means to cluster your centers into rows and columns. OpenCV has a k-means implementation, which you can find here.
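As a rough sketch of that k-means idea – assuming the number of rows K is known or estimated beforehand, and using made-up centroid coordinates – the clustering could look like this:
import cv2
import numpy as np

# Hypothetical blob centroids as (x, y); cluster the y coordinates into K rows.
centroids = np.float32([[35, 20], [120, 22], [40, 80], [130, 83], [38, 140]])
K = 3                                        # number of text rows, assumed known
ys = centroids[:, 1:2].copy()                # kmeans wants a contiguous float32 column
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 100, 0.2)
_, row_labels, row_centers = cv2.kmeans(ys, K, None, criteria, 10, cv2.KMEANS_PP_CENTERS)

# Rank the clusters from top to bottom, then sort the blobs by (row rank, x).
row_order = np.argsort(row_centers.ravel())
row_rank = {int(c): r for r, c in enumerate(row_order)}
reading_order = sorted(range(len(centroids)),
                       key=lambda i: (row_rank[int(row_labels[i, 0])], centroids[i, 0]))
print(reading_order)                         # indices of the blobs in reading order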
I do not know exactly what your goal is, but another idea could be to split every line into a region of interest (ROI) for further processing; afterwards you could easily count the letters by the x values of each contour and the line number.
import cv2
import numpy as np

## (1) read
img = cv2.imread("yFX3M.png")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

## (2) threshold
th, threshed = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)

## (3) minAreaRect on the nonzeros
pts = cv2.findNonZero(threshed)
ret = cv2.minAreaRect(pts)
(cx, cy), (w, h), ang = ret
if w > h:
    w, h = h, w

## (4) find the rotation matrix, do the rotation
M = cv2.getRotationMatrix2D((cx, cy), ang, 1.0)
rotated = cv2.warpAffine(threshed, M, (img.shape[1], img.shape[0]))

## (5) find and draw the upper and lower boundary of each line
hist = cv2.reduce(rotated, 1, cv2.REDUCE_AVG).reshape(-1)
th = 2
H, W = img.shape[:2]

## (6) using the histogram with a threshold
uppers = [y for y in range(H-1) if hist[y] <= th and hist[y+1] > th]
lowers = [y for y in range(H-1) if hist[y] > th and hist[y+1] <= th]
rotated = cv2.cvtColor(rotated, cv2.COLOR_GRAY2BGR)
for y in uppers:
    cv2.line(rotated, (0, y), (W, y), (255, 0, 0), 1)
for y in lowers:
    cv2.line(rotated, (0, y), (W, y), (0, 255, 0), 1)
cv2.imshow('pic', rotated)

## (7) iterate over all ROIs and count
for i in range(len(uppers)):
    print('line=', i)
    roi = rotated[uppers[i]:lowers[i], 0:W]
    cv2.imshow('line', roi)
    cv2.waitKey(0)
    # here again calc thresh and contours
I found an old post with this code here
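To flesh out that last "calc thresh and contours" comment, a hedged sketch of counting the characters in one line ROI could look like this (the minimum-area filter is just a guess, and OpenCV 4.x return values are assumed):
import cv2

def count_letters(roi_bgr, min_area=10):
    # Threshold the ROI again and count the external contours as "letters".
    roi_gray = cv2.cvtColor(roi_bgr, cv2.COLOR_BGR2GRAY)
    _, roi_bin = cv2.threshold(roi_gray, 127, 255, cv2.THRESH_BINARY)
    cnts, _ = cv2.findContours(roi_bin, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return sum(1 for c in cnts if cv2.contourArea(c) > min_area)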
Instead of taking the upper left corner of the contour, I'd rather use the centroid or at least the bounding box center.
def get_contour_precedence(contour, cols):
    tolerance_factor = 4
    origin = cv2.boundingRect(contour)
    # use the bounding box centre (x + w/2, y + h/2) instead of the top-left corner
    return ((origin[1] + origin[3] / 2) // tolerance_factor) * tolerance_factor * cols + origin[0] + origin[2] / 2
But it might be hard to find a tolerance value that works in all cases.
Here is one way in Python/OpenCV, by processing rows first and then characters.
Read the input
Convert to grayscale
Threshold and invert
Use a long horizontal kernel and apply morphology close to form rows
Get the contours of the rows and their bounding boxes
Save the row boxes and sort on Y
Loop over each sorted row box and extract the row from the thresholded image
Get the contours of each character in the row and save the bounding boxes of the characters.
Sort the contours for a given row on X
Draw the bounding boxes on the input and the index number as text on the image
Increment the index
Save the results
Input:
import cv2
import numpy as np

# read input image
img = cv2.imread('vision78.png')

# convert img to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# otsu threshold
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_OTSU)[1]
thresh = 255 - thresh

# apply morphology close to form rows
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (51, 1))
morph = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, kernel)

# find contours and bounding boxes of rows
rows_img = img.copy()
boxes_img = img.copy()
rowboxes = []
rowcontours = cv2.findContours(morph, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
rowcontours = rowcontours[0] if len(rowcontours) == 2 else rowcontours[1]
index = 1
for rowcntr in rowcontours:
    xr, yr, wr, hr = cv2.boundingRect(rowcntr)
    cv2.rectangle(rows_img, (xr, yr), (xr+wr, yr+hr), (0, 0, 255), 1)
    rowboxes.append((xr, yr, wr, hr))

# sort rowboxes on y coordinate
def takeSecond(elem):
    return elem[1]

rowboxes.sort(key=takeSecond)

# loop over each row
for rowbox in rowboxes:
    # crop the image for a given row
    xr = rowbox[0]
    yr = rowbox[1]
    wr = rowbox[2]
    hr = rowbox[3]
    row = thresh[yr:yr+hr, xr:xr+wr]
    bboxes = []
    # find contours of each character in the row
    contours = cv2.findContours(row, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    contours = contours[0] if len(contours) == 2 else contours[1]
    for cntr in contours:
        x, y, w, h = cv2.boundingRect(cntr)
        bboxes.append((x+xr, y+yr, w, h))
    # sort bboxes on x coordinate
    def takeFirst(elem):
        return elem[0]
    bboxes.sort(key=takeFirst)
    # draw sorted boxes
    for box in bboxes:
        xb = box[0]
        yb = box[1]
        wb = box[2]
        hb = box[3]
        cv2.rectangle(boxes_img, (xb, yb), (xb+wb, yb+hb), (0, 0, 255), 1)
        cv2.putText(boxes_img, str(index), (xb, yb), cv2.FONT_HERSHEY_COMPLEX_SMALL, 0.75, (0, 255, 0), 1)
        index = index + 1

# save results
cv2.imwrite("vision78_thresh.jpg", thresh)
cv2.imwrite("vision78_morph.jpg", morph)
cv2.imwrite("vision78_rows.jpg", rows_img)
cv2.imwrite("vision78_boxes.jpg", boxes_img)

# show images
cv2.imshow("thresh", thresh)
cv2.imshow("morph", morph)
cv2.imshow("rows_img", rows_img)
cv2.imshow("boxes_img", boxes_img)
cv2.waitKey(0)
Threshold image:
Morphology image of rows:
Row contours image:
Character contours image:
I have a dataset of x-ray images that I am trying to clean by rotating the images so the arm is vertical and cropping away any excess space. Here are some examples from the dataset:
I am currently working out the best way to determine the angle of the x-ray and rotate the image based on that.
My current approach is to detect the line of the side of the rectangle that the scan is in using the Hough transform, and rotate the image based on that.
I tried to run the Hough transform on the output of a Canny edge detector, but this doesn't work so well for images where the edge of the rectangle is blurred, like in the first image.
I can't use cv's box detection, as sometimes the rectangle around the scan has an edge off screen.
So I currently use adaptive thresholding to find the edge of the box, then median filter it and try to find the longest line in this, but sometimes the wrong line is the longest and the image gets rotated completely wrong.
Adaptive thresholding is used because some scans have different brightnesses.
The current implementation I have is:
import cv2
import numpy as np

def get_lines(img):
    # threshold
    thresh = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY_INV, 15, 4.75)
    median = cv2.medianBlur(thresh, 3)
    # detect lines
    lines = cv2.HoughLines(median, 1, np.pi/180, 175)
    return sorted(lines, key=lambda x: x[0][0], reverse=True)

def rotate(image, angle):
    (h, w) = image.shape[:2]
    (cX, cY) = (w // 2, h // 2)
    M = cv2.getRotationMatrix2D((cX, cY), angle, 1.0)
    cos = np.abs(M[0, 0])
    sin = np.abs(M[0, 1])
    nW = int((h * sin) + (w * cos))
    nH = int((h * cos) + (w * sin))
    M[0, 2] += (nW / 2) - cX
    M[1, 2] += (nH / 2) - cY
    return cv2.warpAffine(image, M, (nW, nH))

def fix_rotation(input):
    lines = get_lines(input)
    rho, theta = lines[0][0]
    return rotate(input, theta * 180 / np.pi)
and produces the following results:
When it goes wrong:
I was wondering if there are any better techniques to use in order to improve the performance of this, and what the best way would be to crop the images after they have been rotated?
The idea is to use the blob of the arm itself and fit an ellipse around it. Then, extract its major axis. I quickly tested the idea in Matlab – not OpenCV. Here's what I did, you should be able to use OpenCV's equivalent functions to achieve similar outputs.
First, compute the threshold value of your input via Otsu. Then add some bias to the threshold value to find a better segmentation and use this value to threshold the image.
In pseudo-code:
//the bias value:
threshBias = 0.4;
//get the binary threshold level via otsu:
thresholdLevel = graythresh( grayInput, "otsu" );
//subtract the bias from the original value:
thresholdLevel = thresholdLevel - threshBias * thresholdLevel;
//get the fixed binary image:
binaryImage = imbinarize( grayInput, thresholdLevel );
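A rough OpenCV/Python equivalent of that pseudo-code could look like this (the file name is a placeholder; the 0.4 bias is the value from above):
import cv2

# Otsu returns the computed threshold level; lower it by the bias, then re-threshold.
gray = cv2.imread("xray.png", cv2.IMREAD_GRAYSCALE)   # placeholder file name
thresh_bias = 0.4
otsu_level, _ = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
biased_level = otsu_level - thresh_bias * otsu_level
_, binary = cv2.threshold(gray, biased_level, 255, cv2.THRESH_BINARY)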
After small blob filtering, this is the output:
Now, get the contours/blobs and fit an ellipse for each contour. Check out the OpenCV example here: https://docs.opencv.org/3.4.9/de/d62/tutorial_bounding_rotated_ellipses.html
You end up with two ellipses:
We are looking for the biggest ellipse, the one with the biggest area and the biggest major and minor axis. I used the width and height of each ellipse to filter the results. The target ellipse is then colored in green. Finally, I get the major axis of the target ellipse, here colored in yellow:
Now, to implement these ideas in OpenCV you have these options:
Use fitEllipse to find the ellipses. The return value of this function is a RotatedRect object. The data stored here are the vertices of the ellipse.
Instead of fitting an ellipse you could try using minAreaRect, which finds a rotated rectangle of the minimum area enclosing a blob.
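For instance, a minimal sketch of the fitEllipse option could look like this (reading the mask from a placeholder file, picking the largest contour, and assuming OpenCV 4.x return values):
import cv2

binary = cv2.imread("arm_mask.png", cv2.IMREAD_GRAYSCALE)   # placeholder: the thresholded arm mask
contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
biggest = max(contours, key=cv2.contourArea)
if len(biggest) >= 5:                       # fitEllipse needs at least 5 contour points
    (cx, cy), (ax1, ax2), angle = cv2.fitEllipse(biggest)   # RotatedRect: centre, axes, angle in degrees
    print(cx, cy, max(ax1, ax2), angle)     # max(ax1, ax2) is the major-axis length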
You can use image moments to calculate the rotation angle.
Using OpenCV's moments function, calculate the second-order central moments to construct a covariance matrix, and then obtain the orientation as shown in the image moment wiki page.
Obtain the normalized central moments nu20, nu11 and nu02 from opencv moments. Then the orientation is calculated as
0.5 * arctan(2 * nu11 / (nu20 - nu02))
Please refer to the given link for details.
You can use the raw image itself or the preprocessed one for the calculation of the orientation. See which one gives you better accuracy and use it.
As for the bounding box: once you rotate the image (assuming you used the preprocessed one), get all the non-zero pixel coordinates of the rotated image and calculate their upright bounding box using OpenCV's boundingRect.
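A hedged sketch of both steps – the orientation from the normalized central moments, then the upright bounding box – might look like this (the file name is a placeholder, and the sign of the rotation angle may need adjusting for your convention):
import cv2
import numpy as np

mask = cv2.imread("arm_mask.png", cv2.IMREAD_GRAYSCALE)     # placeholder: preprocessed binary image
m = cv2.moments(mask, True)                                 # True: treat the image as binary
# Orientation from normalized central moments (same formula; atan2 handles the quadrants):
angle = 0.5 * np.arctan2(2 * m["nu11"], m["nu20"] - m["nu02"])

# Rotate around the image centre by that angle (sign may need flipping for your data):
(hh, ww) = mask.shape[:2]
M = cv2.getRotationMatrix2D((ww // 2, hh // 2), np.degrees(angle), 1.0)
rotated = cv2.warpAffine(mask, M, (ww, hh))

# Upright bounding box of all non-zero pixels of the rotated image, then crop:
x, y, w, h = cv2.boundingRect(cv2.findNonZero(rotated))
cropped = rotated[y:y + h, x:x + w]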
I've been trying to blend two images. The current approach I'm taking is: I obtain the coordinates of the overlapping region of the two images, and only for the overlapping regions I blend with a hardcoded alpha of 0.5 before adding. So basically I'm just taking half the value of each pixel from the overlapping regions of both images and adding them. That doesn't give me a perfect blend, because the alpha value is hardcoded to 0.5. Here's the result of blending 3 images:
As you can see, the transition from one image to another is still visible. How do I obtain the perfect alpha value that would eliminate this visible transition? Or is there no such thing, and I'm taking a wrong approach?
Here's how I'm currently doing the blending:
for i in range(3):
    base_img_warp[overlap_coords[0], overlap_coords[1], i] = base_img_warp[overlap_coords[0], overlap_coords[1], i] * 0.5
    next_img_warp[overlap_coords[0], overlap_coords[1], i] = next_img_warp[overlap_coords[0], overlap_coords[1], i] * 0.5
final_img = cv2.add(base_img_warp, next_img_warp)
If anyone would like to give it a shot, here are two warped images, and the mask of their overlapping region: http://imgur.com/a/9pOsQ
Here is the way I would do it in general:
int main(int argc, char* argv[])
{
    cv::Mat input1 = cv::imread("C:/StackOverflow/Input/pano1.jpg");
    cv::Mat input2 = cv::imread("C:/StackOverflow/Input/pano2.jpg");

    // compute the vignetting masks. This is much easier before warping, but I will try...
    // it can be precomputed, if the size and position of your ROI in the image doesnt change and can be precomputed and aligned, if you can determine the ROI for every image
    // the compression artifacts make it a little bit worse here, I try to extract all the non-black regions in the images.
    cv::Mat mask1;
    cv::inRange(input1, cv::Vec3b(10, 10, 10), cv::Vec3b(255, 255, 255), mask1);
    cv::Mat mask2;
    cv::inRange(input2, cv::Vec3b(10, 10, 10), cv::Vec3b(255, 255, 255), mask2);

    // now compute the distance from the ROI border:
    cv::Mat dt1;
    cv::distanceTransform(mask1, dt1, CV_DIST_L1, 3);
    cv::Mat dt2;
    cv::distanceTransform(mask2, dt2, CV_DIST_L1, 3);

    // now you can use the distance values for blending directly. If the distance value is smaller this means that the value is worse (your vignetting becomes worse at the image border)
    cv::Mat mosaic = cv::Mat(input1.size(), input1.type(), cv::Scalar(0, 0, 0));
    for (int j = 0; j < mosaic.rows; ++j)
        for (int i = 0; i < mosaic.cols; ++i)
        {
            float a = dt1.at<float>(j, i);
            float b = dt2.at<float>(j, i);
            float alpha = a / (a + b); // distances are not between 0 and 1 but this value is. The "better" a is, compared to b, the higher is alpha.
            // actual blending: alpha*A + beta*B
            mosaic.at<cv::Vec3b>(j, i) = alpha*input1.at<cv::Vec3b>(j, i) + (1 - alpha)* input2.at<cv::Vec3b>(j, i);
        }

    cv::imshow("mosaic", mosaic);
    cv::waitKey(0);

    return 0;
}
Basically, you compute the distance from your ROI border to the center of your objects and compute the alpha from both blending-mask values. So if one image has a high distance from the border and the other a low distance from the border, you prefer the pixel that is closer to the image center. It would be better to normalize those values for cases where the warped images aren't of similar size.
But it is even better and more efficient to precompute the blending masks and warp them. Best would be to know the vignetting of your optical system and choose an identical blending mask (typically with lower values at the border).
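Since the question's code is in Python, a rough NumPy/OpenCV translation of the same distance-transform blending idea could look like this (the file names are placeholders; both images are assumed to be already warped to the same size):
import cv2
import numpy as np

# Per-pixel alpha from the distance of each pixel to its image's ROI border.
img1 = cv2.imread("pano1_warped.png")           # placeholder file names
img2 = cv2.imread("pano2_warped.png")

mask1 = cv2.inRange(img1, (10, 10, 10), (255, 255, 255))
mask2 = cv2.inRange(img2, (10, 10, 10), (255, 255, 255))

dt1 = cv2.distanceTransform(mask1, cv2.DIST_L1, 3).astype(np.float32)
dt2 = cv2.distanceTransform(mask2, cv2.DIST_L1, 3).astype(np.float32)

alpha = dt1 / np.maximum(dt1 + dt2, 1e-6)       # avoid division by zero outside both ROIs
alpha = alpha[:, :, None]                       # broadcast over the 3 channels
mosaic = (alpha * img1 + (1.0 - alpha) * img2).astype(np.uint8)
cv2.imwrite("mosaic.png", mosaic)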
From the previous code you'll get these results:
ROI masks:
Blending masks (just as an impression, must be float matrices instead):
image mosaic:
There are 2 obvious problems with your images:
Border area has distorted lighting conditions
That is most likely caused by the optics used to acquire the images. To remedy it, you should use only the inside part of the images (cut off a few pixels from the border).
So after cutting off 20 pixels from the border and blending to a common illumination I got this:
As you can see, the ugly border seam is gone now; only the illumination problem persists (see bullet #2).
Images are taken at different lighting conditions
Here the subsurface scattering effect kicks in, making the images "not compatible". You should either normalize them to some uniform illumination, or post-process the blended result line by line: when a coherent bump is detected, multiply the rest of the line so the bump is diminished.
So the rest of the line should be multiplied by the constant i0/i1. These kinds of bumps can occur only on the edges between overlap values, so you can either scan for them or use those positions directly... To recognize a valid bump, it should have neighbors nearby in the previous and next lines along the whole image height.
You can also do this in the y-axis direction in the same way.
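A very rough sketch of that per-line correction, assuming the seam column of a given row is already known (for example, from the overlap mask), and estimating i0 and i1 from small windows on either side of the seam:
import numpy as np

def fix_row_bump(row, seam_x, window=5):
    # row: 1-D float array of intensities, seam_x: column where the bump occurs (assumed known).
    i0 = row[max(seam_x - window, 0):seam_x].mean()   # intensity just before the seam
    i1 = row[seam_x:seam_x + window].mean()           # intensity just after the seam
    if i1 > 0:
        row[seam_x:] = row[seam_x:] * (i0 / i1)       # scale the rest of the line
    return row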