Currently, I'm working on image stitching of aerial footage, using the dataset from OrchardDataset. First of all, thanks for some great answers on Stack Overflow, especially the answers from @alkasm (Here and Here). But I'm having an issue, as you can see below in the Gap within the stitched image section.
I used H21, H31, H41, etc. to warp the images. The stitched image using H21 is excellent, but when I warp img3 onto the current stitched image using H31, the result shows terrible alignment between img3 and the current stitched image. The more images I warp, the bigger the gap gets and the worse the alignment becomes.
Does the brilliant Stack Overflow community have any ideas on how I can solve this problem?
These are the steps I use to stitch the images:
Extract a frame every second from the footage and undistort each image to remove the fish-eye effect using the provided camera calibration matrix.
Compute the SIFT feature descriptors. Set up a matcher using a FLANN kd-tree and find matches between the images. Find the homographies (H21, H32, H43, etc., where H21 refers to the homography which warps img2 into the coordinates of img1).
Compose each homography with the previous ones to get the net homography, using the method suggested in Here (compute H31, H41, H51, etc.); a minimal sketch of this composition is shown after this list.
Warp the images using the answer provided in Here.
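The composition itself is just a chained matrix product of the pairwise homographies. A minimal sketch of that step, assuming the pairwise matrices from cv2.findHomography are kept in order as 3x3 NumPy arrays (the names here are illustrative, not taken from my script):
import numpy as np

def compose_homographies(pairwise):   # pairwise = [H21, H32, H43, ...]
    net = []                          # net[0] = H21, net[1] = H31, net[2] = H41, ...
    H_acc = np.eye(3)
    for H in pairwise:
        H_acc = H_acc @ H             # e.g. H31 = H21 @ H32, H41 = H31 @ H43
        H_acc = H_acc / H_acc[2, 2]   # keep the homography normalized
        net.append(H_acc)
    return net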
Gap within the stitched image:
I'm using the first 10 images from OrchardDataSet.
Stitched Image with Gaps
Here's a portion of my script:
main.py
ref_img is the first frame (img1). AdjHomoSet contains the images to be warped (img2, img3, img4, etc.). AccHomoSet contains the net homographies (H31, H41, H51, etc.).
temp_mosaic = ref_img
h, w = temp_mosaic.shape[:2]
# Warp the images
for x in range(1, (len(AccHomoSet)+1)):
query_img = AdjHomoSet['H%d%d'%(x+1,(x))][1]
M_homo = AccHomoSet['H%d1'%(x+1)]
M_homo_inv = np.linalg.inv(M_homo)
(shifted_transf, dst_padded) = warpPerspectivePadded(query_img,
temp_mosaic,
M_homo_inv)
dst_pad_h, dst_pad_w = dst_padded.shape[:2]
next_img_warp = cv2.warpPerspective(query_img, shifted_transf,
(dst_pad_w, dst_pad_h),
flags=cv2.INTER_NEAREST)
# Put the base image on an enlarged palette
enlarged_base_img = np.zeros((dst_pad_h, dst_pad_w, 3),
np.uint8)
# Create masked composite
(ret,data_map) = cv2.threshold(cv2.cvtColor(next_img_warp,
cv2.COLOR_BGR2GRAY),
0, 255, cv2.THRESH_BINARY)
# add base image
enlarged_base_img = cv2.add(enlarged_base_img, dst_padded,
mask=np.bitwise_not(data_map),
dtype=cv2.CV_8U)
final_img = cv2.add(enlarged_base_img, next_img_warp,
dtype=cv2.CV_8U)
temp_mosaic = final_img
warpPerspectivePadded.py
def warpPerspectivePadded(image, temp_mosaic, homography):
src_h, src_w = image.shape[:2]
lin_homg_pts = np.array([[0, src_w, src_w, 0],
[0, 0, src_h, src_h],
[1, 1, 1, 1]])
trans_lin_homg_pts = homography.dot(lin_homg_pts)
trans_lin_homg_pts /= trans_lin_homg_pts[2,:]
minX = np.floor(np.min(trans_lin_homg_pts[0])).astype(int)
minY = np.floor(np.min(trans_lin_homg_pts[1])).astype(int)
maxX = np.ceil(np.max(trans_lin_homg_pts[0])).astype(int)
maxY = np.ceil(np.max(trans_lin_homg_pts[1])).astype(int)
# add translation to the transformation matrix to shift to positive values
anchorX, anchorY = 0, 0
transl_transf = np.eye(3,3)
if minX < 0:
anchorX = -minX
transl_transf[0,2] += anchorX
if minY < 0:
anchorY = -minY
transl_transf[1,2] += anchorY
shifted_transf = transl_transf.dot(homography)
shifted_transf /= shifted_transf[2,2]
# create padded destination image
temp_mosaic_h, temp_mosaic_w = temp_mosaic.shape[:2]
pad_widths = [anchorY, max(maxY, temp_mosaic_h) - temp_mosaic_h,
anchorX, max(maxX, temp_mosaic_w) - temp_mosaic_w]
dst_padded = cv2.copyMakeBorder(temp_mosaic, pad_widths[0],
pad_widths[1],pad_widths[2],
pad_widths[3],
cv2.BORDER_CONSTANT)
return (shifted_transf, dst_padded)
Updates:
Well, here's my code for image stitching. This solution is not perfect, but I hope it will be helpful to someone else. It is good enough for generating a panorama view; SIFT+FLANN did the best on the dataset (Stitched image of the dataset with straight-line flight pattern). The inter-frame alignment is badly shifted and visible skew appears when stitching the dataset with a lawnmower flight pattern (Stitched image of the dataset with lawnmower flight pattern), so this solution is definitely not ideal for an orthomosaic.
imageStitcher.py
import cv2
import numpy as np
import glob
import os
import time
#import math
from colorama import Style, Back
import xlsxwriter as xls
"""
Important Parameter
-------------------
detector_type (string): type of detector, "sift" or "orb".
Defaults to "sift".
matcher_type (string): type of matcher, "flann" or "bf".
Defaults to "flann".
resize_ratio (int): divisor used to reduce the input image size.
output_height_times (int): determines the output height based on input image height.
Defaults to 2.
output_width_times (int): determines the output width based on input image width.
Defaults to 4.
"""
detector_type = "sift"
matcher_type = "flann"
resize_ratio = 3
output_height_times = 20
output_width_times = 15
gms = False
visualize = True
image_dir = "image/Input"
key_frame = "image/Input/frame1.jpg"
output_dir = "image/Input"
class ImageStitching:
def __init__(self, first_image,
output_height_times = output_height_times,
output_width_times = output_width_times,
detector_type = detector_type,
matcher_type = matcher_type):
"""This class processes every frame and generates the panorama
Args:
first_image (image for the first frame): first image to initialize the output size
output_height_times (int, optional): determines the output height based on input image height. Defaults to 2.
output_width_times (int, optional): determines the output width based on input image width. Defaults to 4.
detector_type (str, optional): the detector for feature detection. It can be "sift" or "orb". Defaults to "sift".
"""
self.detector_type = detector_type
self.matcher_type = matcher_type
if detector_type == "sift":
# SIFT feature detector
self.detector = cv2.xfeatures2d.SIFT_create(nOctaveLayers = 3,
contrastThreshold = 0.04,
edgeThreshold = 10,
sigma = 1.6)
if matcher_type == "flann":
# FLANN: the randomized kd trees algorithm
FLANN_INDEX_KDTREE = 1
flann_params = dict(algorithm = FLANN_INDEX_KDTREE, trees = 5)
search_params = dict (checks=200)
self.matcher = cv2.FlannBasedMatcher(flann_params,search_params)
else:
# Brute-Force matcher
self.matcher = cv2.BFMatcher()
elif detector_type == "orb":
# ORB feature detector
self.detector = cv2.ORB_create()
self.detector.setFastThreshold(0)
if matcher_type == "flann":
FLANN_INDEX_LSH = 6
flann_params= dict(algorithm = FLANN_INDEX_LSH,
table_number = 6, # 12
key_size = 12, # 20
multi_probe_level = 1) #2
search_params = dict (checks=200)
self.matcher = cv2.FlannBasedMatcher(flann_params,search_params)
else:
# Brute-Force-Hamming matcher
self.matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
self.record = []
self.visualize = visualize
self.output_img = np.zeros(shape=(int(output_height_times * first_image.shape[0]),
int(output_width_times*first_image.shape[1]),
first_image.shape[2]))
self.process_first_frame(first_image)
# output image offset
self.w_offset = int(self.output_img.shape[0]/2 - first_image.shape[0]/2)
self.h_offset = int(self.output_img.shape[1]/2 - first_image.shape[1]/2)
self.output_img[self.w_offset:self.w_offset+first_image.shape[0],
self.h_offset:self.h_offset+first_image.shape[1], :] = first_image
a = self.output_img
heightM, widthM = a.shape[:2]
a = cv2.resize(a, (int(widthM / 4),
int(heightM / 4)),
interpolation=cv2.INTER_AREA)
# cv2.imshow('output', a)
self.H_old = np.eye(3)
self.H_old[0, 2] = self.h_offset
self.H_old[1, 2] = self.w_offset
def process_first_frame(self, first_image):
"""processes the first frame for feature detection and description
Args:
first_image (cv2 image/np array): first image for feature detection
"""
self.base_frame_rgb = first_image
base_frame_gray = cv2.cvtColor(first_image, cv2.COLOR_BGR2GRAY)
base_frame = cv2.GaussianBlur(base_frame_gray, (5,5), 0)
self.base_features, self.base_desc = self.detector.detectAndCompute(base_frame, None)
def process_adj_frame(self, next_frame_rgb):
"""gets an image and processes that image for mosaicing
Args:
next_frame_rgb (np array): input of current frame for the mosaicing
"""
self.next_frame_rgb = next_frame_rgb
next_frame_gray = cv2.cvtColor(next_frame_rgb, cv2.COLOR_BGR2GRAY)
next_frame = cv2.GaussianBlur(next_frame_gray, (5,5), 0)
self.next_features, self.next_desc = self.detector.detectAndCompute(next_frame, None)
self.matchingNhomography(self.next_desc, self.base_desc)
if len(self.matches) < 4:
return
print ("\n")
self.warp(self.next_frame_rgb, self.H)
# For record purpose: save into csv file later
self.record.append([len(self.base_features), len(self.next_features),
self.no_match_lr, self.no_GMSmatches, self.inlier, self.inlierRatio, self.reproError])
# loop preparation
self.H_old = self.H
self.base_features = self.next_features
self.base_desc = self.next_desc
self.base_frame_rgb = self.next_frame_rgb
def matchingNhomography(self, next_desc, base_desc):
"""matches the descriptors
Args:
next_desc (np array): current frame descriptor
base_desc (np array): previous frame descriptor
Returns:
array: an array of matches between descriptors
"""
# matching
if self.detector_type == "sift":
pair_matches = self.matcher.knnMatch(next_desc, trainDescriptors = base_desc,
k = 2)
"""
Store all the good matches as per Lowe's ratio test'
The Lowe's ratio is refer to the journal "Distinctive
Image Features from Scale-Invariant Keypoints" by
David G. Lowe.
"""
lowe_ratio = 0.8
matches = []
for m, n in pair_matches:
if m.distance < n.distance * lowe_ratio:
matches.append(m)
self.no_match_lr = len(matches)
# Rate of matches (Lowe's ratio test)
rate = float(len(matches) / ((len(self.base_features) + len(self.next_features))/2))
print (f"Rate of matches (Lowe's ratio test): {Back.RED}%f{Style.RESET_ALL}" % rate)
elif self.detector_type == "orb":
if self.matcher_type == "flann":
matches = self.matcher.match(next_desc, base_desc)
'''
lowe_ratio = 0.8
matches = []
for m, n in pair_matches:
if m.distance < n.distance * lowe_ratio:
matches.append(m)
'''
self.no_match_lr = len(matches)
# Rate of matches (Lowe's ratio test)
rate = float(len(matches) / (len(base_desc) + len(next_desc)))
print (f"Rate of matches (Lowe's ratio test): {Back.RED}%f{Style.RESET_ALL}" % rate)
else:
matches = self.matcher.match(next_desc, base_desc)
# Rate of matches (before Lowe's ratio test)
self.no_match_lr = len(matches)
rate = float(len(matches) / (len(base_desc) + len(next_desc)))
print (f"Rate of matches: {Back.RED}%f{Style.RESET_ALL}" % rate)
# Sort them in the order of their distance.
matches = sorted(matches, key=lambda x: x.distance)
# OPTIONAL: used to remove the unmatch pair match
matches = cv2.xfeatures2d.matchGMS(self.next_frame_rgb.shape[:2],
self.base_frame_rgb.shape[:2],
self.next_features,
self.base_features, matches,
withScale = False, withRotation = False,
thresholdFactor = 6.0) if gms else matches
self.no_GMSmatches = len(matches) if gms else 0
# Rate of matches (GMS)
rate = float(self.no_GMSmatches / (len(base_desc) + len(next_desc)))
print (f"Rate of matches (GMS): {Back.CYAN}%f{Style.RESET_ALL}" % rate)
# OPTIONAL: Obtain the maximum of 20 best matches
# matches = matches[:min(len(matches), 20)]
# Visualize the matches.
if self.visualize:
match_img = cv2.drawMatches(self.next_frame_rgb, self.next_features, self.base_frame_rgb,
self.base_features, matches, None,
flags=cv2.DrawMatchesFlags_NOT_DRAW_SINGLE_POINTS)
cv2.imshow('matches', match_img)
self.H, self.status, self.reproError = self.findHomography(self.next_features, self.base_features, matches)
print ('inlier/matched = %d / %d' % (np.sum(self.status), len(self.status)))
self.inlier = np.sum(self.status)
self.inlierRatio = float(np.sum(self.status)) / float(len(self.status))
print ('inlierRatio = ', self.inlierRatio)
# len(status) - np.sum(status) = number of detected outliers
'''
TODO -
To minimize or get rid of the cumulative homography error, use block bundle adjustment,
as suggested in "Multi View Image Stitching of Planar Surfaces on Mobile Devices".
Using 3-dimensional multiplication to find the cumulative homography is very sensitive
to homography error.
'''
# 3-dimensional multiplication to find cumulative homography to the reference keyframe
self.H = np.matmul(self.H_old, self.H)
self.H = self.H/self.H[2,2]
self.matches = matches
return matches
@staticmethod
def findHomography(base_features, next_features, matches):
"""gets two matches and calculate the homography between two images
Args:
base_features (np array): keypoints of image 1
next_features (np_array): keypoints of image 2
matches (np array): matches between keypoints in image 1 and image 2
Returns:
np array of shape [3,3]: Homography matrix
"""
kp1 = []
kp2 = []
for match in matches:
kp1.append(base_features[match.queryIdx])
kp2.append(next_features[match.trainIdx])
p1_array = np.array([k.pt for k in kp1])
p2_array = np.array([k.pt for k in kp2])
homography, status = cv2.findHomography(p1_array, p2_array, method = cv2.RANSAC,
ransacReprojThreshold = 5.0,
mask = None,
maxIters = 2000,
confidence = 0.995)
#### Finding the euclidean distance error ####
list1 = np.array(p2_array)
list2 = np.array(p1_array)
list2 = np.reshape(list2, (len(list2), 2))
ones = np.ones(len(list1))
TestPoints = np.transpose(np.reshape(list1, (len(list1), 2)))
print ("Length:", np.shape(TestPoints), np.shape(ones))
TestPointsHom = np.vstack((TestPoints, ones))
print ("Homogenous Points:", np.shape(TestPointsHom))
projectedPointsH = np.matmul(homography, TestPointsHom) # projecting the points in test image to collage image using homography matrix
projectedPointsNH = np.transpose(np.array([np.true_divide(projectedPointsH[0,:], projectedPointsH[2,:]), np.true_divide(projectedPointsH[1,:], projectedPointsH[2,:])]))
print ("list2 shape:", np.shape(list2))
print ("NH Points shape:", np.shape(projectedPointsNH))
print ("Raw Error Vector:", np.shape(np.linalg.norm(projectedPointsNH-list2, axis=1)))
Error = int(np.sum(np.linalg.norm(projectedPointsNH-list2, axis=1)))
print ("Total Error:", Error)
AvgError = np.divide(np.array(Error), np.array(len(list1)))
print ("Average Error:", AvgError)
##################
return homography, status, AvgError
def warp(self, next_frame_rgb, H):
""" warps the current frame based of calculated homography H
Args:
next_frame_rgb (np array): current frame
H (np array of shape [3,3]): homography matrix
Returns:
np array: image output of mosaicing
"""
warped_img = cv2.warpPerspective(
next_frame_rgb, H, (self.output_img.shape[1], self.output_img.shape[0]),
flags=cv2.INTER_LINEAR)
transformed_corners = self.get_transformed_corners(next_frame_rgb, H)
warped_img = self.draw_border(warped_img, transformed_corners)
self.output_img[warped_img > 0] = warped_img[warped_img > 0]
output_temp = np.copy(self.output_img)
output_temp = self.draw_border(output_temp, transformed_corners, color=(0, 0, 255))
# Visualize the stitched result
if self.visualize:
output_temp_copy = output_temp/255.
output_temp_copy = cv2.normalize(output_temp_copy, None, 0, 255, cv2.NORM_MINMAX, cv2.CV_8U) # convert float64 to uint8
size = 720
heightM, widthM = output_temp_copy.shape[:2]
ratio = size / float(heightM)
output_temp_copy = cv2.resize(output_temp_copy, (int(ratio * widthM), size), interpolation=cv2.INTER_AREA)
cv2.imshow('output', output_temp_copy)
return self.output_img
@staticmethod
def get_transformed_corners(next_frame_rgb, H):
"""finds the corner of the current frame after warp
Args:
next_frame_rgb (np array): current frame
H (np array of shape [3,3]): Homography matrix
Returns:
[np array]: a list of 4 corner points after warping
"""
corner_0 = np.array([0, 0])
corner_1 = np.array([next_frame_rgb.shape[1], 0])
corner_2 = np.array([next_frame_rgb.shape[1], next_frame_rgb.shape[0]])
corner_3 = np.array([0, next_frame_rgb.shape[0]])
corners = np.array([[corner_0, corner_1, corner_2, corner_3]], dtype=np.float32)
transformed_corners = cv2.perspectiveTransform(corners, H)
transformed_corners = np.array(transformed_corners, dtype=np.int32)
# output_temp = np.copy(output_img)
# mask = np.zeros(shape=(output_temp.shape[0], output_temp.shape[1], 1))
# cv2.fillPoly(mask, transformed_corners, color=(1, 0, 0))
# cv2.imshow('mask', mask)
return transformed_corners
def draw_border(self, image, corners, color=(0, 0, 0)):
"""This functions draw rectancle border
Args:
image ([type]): current mosaiced output
corners (np array): list of corner points
color (tuple, optional): color of the border lines. Defaults to (0, 0, 0).
Returns:
np array: the output image with border
"""
for i in range(corners.shape[1]-1, -1, -1):
cv2.line(image, tuple(corners[0, i, :]), tuple(
corners[0, i-1, :]), thickness=5, color=color)
return image
@staticmethod
def stitchedimg_crop(stitched_img):
"""This functions crop the black edge
Args:
stitched_img (np array): stitched image with black edge
Returns:
np array: the output image with no black edge
"""
stitched_img = cv2.normalize(stitched_img, None, 0, 255, cv2.NORM_MINMAX, cv2.CV_8U) # convert float64 to uint8
# Crop black edges
stitched_img_gray = cv2.cvtColor(stitched_img, cv2.COLOR_BGR2GRAY)
_, thresh = cv2.threshold(stitched_img_gray, 1, 255, cv2.THRESH_BINARY)
dino, contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
print ("Cropping black edge of stitched image ...")
print ("Found %d contours...\n" % (len(contours)))
max_area = 0
best_rect = (0,0,0,0)
for cnt in contours:
x,y,w,h = cv2.boundingRect(cnt)
deltaHeight = h-y
deltaWidth = w-x
if deltaHeight < 0 or deltaWidth < 0:
deltaHeight = h+y
deltaWidth = w+x
area = deltaHeight * deltaWidth
if ( area > max_area and deltaHeight > 0 and deltaWidth > 0):
max_area = area
best_rect = (x,y,w,h)
if ( max_area > 0 ):
final_img_crop = stitched_img[best_rect[1]:best_rect[1]+best_rect[3],
best_rect[0]:best_rect[0]+best_rect[2]]
return final_img_crop
def main():
images = sorted(glob.glob(image_dir + "/*.jpg"),
key=lambda x: int(os.path.splitext(os.path.basename(x))[0][5:]))
# read the first frame
first_frame = cv2.imread(key_frame)
heightM, widthM = first_frame.shape[:2]
first_frame = cv2.resize(first_frame, (int(widthM / resize_ratio),
int(heightM / resize_ratio)),
interpolation=cv2.INTER_AREA)
image_stitching = ImageStitching(first_frame)
round = 2
for next_img_path in images[1:]:
print (f'Reading {Back.YELLOW}%s{Style.RESET_ALL}...' % next_img_path)
next_frame_rgb = cv2.imread(next_img_path)
heightM, widthM = next_frame_rgb.shape[:2]
next_frame_rgb = cv2.resize(next_frame_rgb, (int(widthM / resize_ratio),
int(heightM / resize_ratio)),
interpolation=cv2.INTER_AREA)
print ("Stitching %d / %d of image ..." % (round,len(images)))
# process each frame
image_stitching.process_adj_frame(next_frame_rgb)
round += 1
if round > len(images):
print ("Please press 'q' to continue the process ...")
if cv2.waitKey(1) & 0xFF == ord('q'):
break
cv2.waitKey(0)
cv2.destroyAllWindows()
# cv2.imwrite('mosaic.jpg', image_stitching.output_img)
final_img_crop = image_stitching.stitchedimg_crop(image_stitching.output_img)
print ("Image stitching done ...")
cv2.imwrite("%s/Normal.JPG" % output_dir, final_img_crop)
# Save important results into csv file
tuplelist = tuple(image_stitching.record)
workbook = xls.Workbook('Normal.xlsx')
worksheet = workbook.add_worksheet("Normal")
row = 0
col = 0
worksheet.write(row, col, 'number_pairs')
worksheet.write(row, col + 1, 'basefeature')
worksheet.write(row, col + 2, 'nextfeature')
worksheet.write(row, col + 3, 'no_match_lr')
worksheet.write(row, col + 4, 'match_rate')
worksheet.write(row, col + 5, 'no_GMSmatches (OFF)')
worksheet.write(row, col + 6, 'gms_match_rate')
worksheet.write(row, col + 7, 'inlier')
worksheet.write(row, col + 8, 'inlierratio')
worksheet.write(row, col + 9, 'reproerror')
row += 1
number = 1
# Iterate over the data and write it out row by row.
for basefeature, nextfeature, no_match_lr, no_GMSmatches, inlier, inlierratio, reproerror in (tuplelist):
worksheet.write(row, col, number)
worksheet.write(row, col + 1, basefeature)
worksheet.write(row, col + 2, nextfeature)
worksheet.write(row, col + 3, no_match_lr)
match_rate = no_match_lr / ((basefeature+nextfeature)/2)
worksheet.write(row, col + 4, match_rate)
worksheet.write(row, col + 5, no_GMSmatches)
gms_match_rate = no_GMSmatches / ((basefeature+nextfeature)/2)
worksheet.write(row, col + 6, gms_match_rate)
worksheet.write(row, col + 7, inlier)
worksheet.write(row, col + 8, inlierratio)
worksheet.write(row, col + 9, reproerror)
number += 1
row += 1
workbook.close()
""""""""""""""""""""""""""""""""""""""""""""" Main """""""""""""""""""""""""""""""""""""""
if __name__ == "__main__":
program_start = time.process_time()
main()
program_end = time.process_time()
print (f'Program elapsed time: {Back.GREEN}%s s{Style.RESET_ALL}\n' % str(program_end-program_start))
Eventually I changed the way of warping the images, using the approach provided by Jahaniam's Real Time Video Mosaic. It locates the reference image at the middle of a preset-size blank image, computes the subsequent homographies, and warps the adjacent images onto the reference image.
Example of stitched image
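In essence, the idea boils down to the following sketch (a distilled, illustrative version of what the warp() method above does; the file name and sizes are assumptions, not part of my script):
import cv2
import numpy as np

ref = cv2.imread("frame1.jpg")
canvas_h = ref.shape[0] * output_height_times   # e.g. 20
canvas_w = ref.shape[1] * output_width_times    # e.g. 15
canvas = np.zeros((canvas_h, canvas_w, 3), np.uint8)

# place the reference frame at the centre of the big blank canvas
oy = canvas_h // 2 - ref.shape[0] // 2
ox = canvas_w // 2 - ref.shape[1] // 2
canvas[oy:oy + ref.shape[0], ox:ox + ref.shape[1]] = ref

# the offset acts as the initial homography of the reference frame
H_offset = np.array([[1, 0, ox], [0, 1, oy], [0, 0, 1]], dtype=np.float64)

# each new frame is then warped with the cumulative transform
# H_canvas = H_offset @ H21 @ H32 @ ... and pasted where it has content:
# warped = cv2.warpPerspective(next_frame, H_canvas, (canvas_w, canvas_h))
# canvas[warped > 0] = warped[warped > 0]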
I am trying to find a circle of finite radius in an image. I started off using the 'HoughCircles' method from OpenCV, as its parameters seemed very well suited to my situation, but it is failing to find the circle. It looks like the image may need more pre-processing for it to be found reliably, so I started playing with different thresholds in OpenCV, with no success. Here is an example of an image (note that the overall intensity of the image will vary, but the radius of the circle always remains the same, ~45 pixels).
Here is what I have tried so far
image = cv2.imread('image1.bmp', 0)
img_in = 255-image
mean_val = int(np.mean(img_in))
ret, img_thresh = cv2.threshold(img_in, thresh=mean_val-30, maxval=255, type=cv2.THRESH_TOZERO)
# detect circle
circles = cv2.HoughCircles(img_thresh, cv2.HOUGH_GRADIENT, 1.0, 100, minRadius=40, maxRadius=50)
If you look at the image, the circle is obvious: it's a thin, light gray circle in the center of the blob.
Any suggestions?
Edited to show expected result
The expected result should be like this; as you can see, the circle is very obvious to the naked eye in the original image and is always of the same radius, but it is not always at the same location in the image. There will only be one circle of this kind in any given image.
As of 8/20/2020, here is the code I am using to get the center and radius.
from numpy import zeros as np_zeros,\
full as np_full
from cv2 import calcHist as cv2_calcHist,\
HoughCircles as cv2_HoughCircles,\
HOUGH_GRADIENT as cv2_HOUGH_GRADIENT
def getCenter(img_in, saturated, minradius, maxradius):
img_local = img_in[100:380,100:540,0]
res = np_full(3, -1)
# do some contrast enhancement
img_local = stretchHistogram(img_local, saturated)
circles = cv2_HoughCircles(img_local, cv2_HOUGH_GRADIENT, 1, 40, param1=70, param2=20,
minRadius=minradius,
maxRadius=maxradius)
if circles is not None: # found some circles
circles = sorted(circles[0], key=lambda x: x[2])
res[0] = circles[0][0]+100
res[1] = circles[0][1]+100
res[2] = circles[0][2]
return res #x,y,radii
def stretchHistogram(img_in, saturated=0.35, histMin=0.0, binSize=1.0):
img_local = img_in.copy()
img_out = img_in.copy()
min, max = getMinAndMax(img_local, saturated)
if max > min:
min = histMin+min * binSize
max = histMin+max * binSize
w, h = img_local.shape[::-1]
#create a new lut
lut = np_zeros(256)
max2 = 255
for i in range(0, 256):
if i <= min:
lut[i] = 0
elif i >= max:
lut[i] = max2
else:
lut[i] = (round)(((float)(i - min) / (max - min)) * max2)
#update image with new lut values
for i in range(0, h):
for j in range(0, w):
img_out[i, j] = lut[img_local[i, j]]
return img_out
def getMinAndMax(img_in, saturated):
img_local = img_in.copy()
hist = cv2_calcHist([img_local], [0], None, [256], [0, 256])
w, h = img_local.shape[::-1]
pixelCount = w * h
saturated = 0.5
threshold = (int)(pixelCount * saturated / 200.0)
found = False
count = 0
i = 0
while not found and i < 255:
count += hist[i]
found = count > threshold
i = i + 1
hmin = i
i = 255
count = 0
found = False  # reset so the scan from the top end of the histogram actually runs
while not found and i > 0:
count += hist[i]
found = count > threshold
i = i - 1
hmax = i
return hmin, hmax
and calling the above function as
getCenter(img, 5.0, 55, 62)
But it is still very unreliable. I'm not sure why it is so hard to get an algorithm that works reliably for something that is so obvious to the naked eye. I'm also not sure why there is so much variation in the result from frame to frame, even though nothing changes between them.
Any suggestions are greatly appreciated. Here are some more samples to play with
Simple, draw your circles: cv2.HoughCircles returns a list of circles.
Take care with maxRadius = 100.
for i in circles[0,:]:
# draw the outer circle
cv2.circle(image,(i[0],i[1]),i[2],(255,255,0),2)
# draw the center of the circle
cv2.circle(image,(i[0],i[1]),2,(255,0,255),3)
A full working example (you will have to adjust your thresholds):
import cv2
import numpy as np
image = cv2.imread('0005.bmp', 0)
height, width = image.shape
print(image.shape)
img_in = 255-image
mean_val = int(np.mean(img_in))
blur = cv2.blur(img_in , (3,3))
ret, img_thresh = cv2.threshold(blur, thresh=100, maxval=255, type=cv2.THRESH_TOZERO)
# detect circle
circles = cv2.HoughCircles(img_thresh, cv2.HOUGH_GRADIENT,1,40,param1=70,param2=20,minRadius=60,maxRadius=0)
print(circles)
for i in circles[0,:]:
# check if center is in middle of picture
if(i[0] > width/2-30 and i[0] < width/2+30 \
and i[1] > height/2-30 and i[1] < height/2+30 ):
# draw the outer circle
cv2.circle(image,(i[0],i[1]),i[2],(255,255,0),2)
# draw the center of the circle
cv2.circle(image,(i[0],i[1]),2,(255,0,255),3)
cv2.imshow("image", image )
while True:
keyboard = cv2.waitKey(2320)
if keyboard == 27:
break
cv2.destroyAllWindows()
result:
Extracting table data from digital PDFs has been simple using camelot and tabula. However, that solution doesn't work with scanned images of document pages, specifically when the table doesn't have borders and inner grid lines. I have been trying to generate vertical and horizontal lines using OpenCV. However, since the scanned images have slight rotation angles, it is difficult to proceed with that approach.
How can we use OpenCV to generate grids (horizontal and vertical lines) and borders for a scanned document page which contains table data (along with paragraphs of text)? If this is feasible, how do we nullify the rotation angle of the scanned image?
I wrote some code to estimate the horizontal lines from the printed letters on the page. The same could be done for the vertical ones, I guess. The code below follows some general assumptions; here are some basic steps in pseudo-code style:
prepare picture for contour detection
do contour detection
we assume most contours are letters
calc mean width of all contours
calc mean area of contours
filter all contours with two conditions:
a) contour (letter) heights < meanHeight * 2
b) contour area > 4/5 meanArea
calc center point of all remaining contours
assume we have line regions (bins)
list all center point which are inside the region
do linear regression of region points
save slope and intercept
calc mean slope and intercept
Here is the full code:
import cv2
import numpy as np
from scipy import stats
def resizeImageByPercentage(img,scalePercent = 60):
width = int(img.shape[1] * scalePercent / 100)
height = int(img.shape[0] * scalePercent / 100)
dim = (width, height)
# resize image
return cv2.resize(img, dim, interpolation = cv2.INTER_AREA)
def calcAverageContourWithAndHeigh(contourList):
hs = list()
ws = list()
for cnt in contourList:
(x, y, w, h) = cv2.boundingRect(cnt)
ws.append(w)
hs.append(h)
return np.mean(ws),np.mean(hs)
def calcAverageContourArea(contourList):
areaList = list()
for cnt in contourList:
a = cv2.minAreaRect(cnt)
areaList.append(a[2])
return np.mean(areaList)
def calcCentroid(contour):
houghMoments = cv2.moments(contour)
# calculate x,y coordinate of centroid
if houghMoments["m00"] != 0: #case no contour could be calculated
cX = int(houghMoments["m10"] / houghMoments["m00"])
cY = int(houghMoments["m01"] / houghMoments["m00"])
else:
# set values as what you need in the situation
cX, cY = -1, -1
return cX,cY
def getCentroidWhenSizeInRange(contourList,letterSizeWidth,letterSizeHigh,deltaOffset,minLetterArea=10.0):
centroidList=list()
for cnt in contourList:
(x, y, w, h) = cv2.boundingRect(cnt)
area = cv2.minAreaRect(cnt)
#calc diff
diffW = abs(w-letterSizeWidth)
diffH = abs(h-letterSizeHigh)
#thresold A: almost smaller than mean letter size +- offset
#when almost letterSize
if diffW < deltaOffset and diffH < deltaOffset:
#threshold B > min area
if area[2] > minLetterArea:
cX,cY = calcCentroid(cnt)
if cX!=-1 and cY!=-1:
centroidList.append((cX,cY))
return centroidList
DEBUGMODE = True
#read image, do git clone https://github.com/WZBSocialScienceCenter/pdftabextract.git for the example
img = cv2.imread('pdftabextract/examples/catalogue_30s/data/ALA1934_RR-excerpt.pdf-2_1.png')
#get some basic infos
imgHeigh, imgWidth, imgChannelAmount = img.shape
if DEBUGMODE:
cv2.imwrite("img00original.jpg",resizeImageByPercentage(img,30))
cv2.imshow("original",img)
# prepare img
imgGrey = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# apply Gaussian filter
imgGaussianBlur = cv2.GaussianBlur(imgGrey,(5,5),0)
#make binary img, black or white
_, imgBinThres = cv2.threshold(imgGaussianBlur, 130, 255, cv2.THRESH_BINARY)
## detect contours
contours, _ = cv2.findContours(imgBinThres, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
#we get some letter parameter
averageLetterWidth, averageLetterHigh = calcAverageContourWithAndHeigh(contours)
threshold1AllowedLetterSizeOffset = averageLetterHigh * 2 # double size
averageContourAreaSizeOfMinRect = calcAverageContourArea(contours)
threshHold2MinArea = 4 * averageContourAreaSizeOfMinRect / 5 # 4/5 * mean
print("mean letter Width: ", averageLetterWidth)
print("mean letter High: ", averageLetterHigh)
print("threshold 1 tolerance: ", threshold1AllowedLetterSizeOffset)
print("mean letter area ", averageContourAreaSizeOfMinRect)
print("thresold 2 min letter area ", threshHold2MinArea)
#we get all centroid of letter sizes contours, the other we ignore
centroidList = getCentroidWhenSizeInRange(contours,averageLetterWidth,averageLetterHigh,threshold1AllowedLetterSizeOffset,threshHold2MinArea)
if DEBUGMODE:
#debug print all centers:
imgFilteredCenter = img.copy()
for cX,cY in centroidList:
#draw in red color as BGR
cv2.circle(imgFilteredCenter, (cX, cY), 5, (0, 0, 255), -1)
cv2.imwrite("img01letterCenters.jpg",resizeImageByPercentage(imgFilteredCenter,30))
cv2.imshow("letterCenters",imgFilteredCenter)
#we estimate a bin widths
amountPixelFreeSpace = averageLetterHigh #TODO get better estimate out of histogram
estimatedBinWidth = round( averageLetterHigh + amountPixelFreeSpace) #TODO round better ?
binCollection = dict() #range(0,imgHeigh,estimatedBinWidth)
#we do seperate the center points into bins by y coordinate
for i in range(0,imgHeigh,estimatedBinWidth):
listCenterPointsInBin = list()
yMin = i
yMax = i + estimatedBinWidth
for cX,cY in centroidList:
if yMin < cY < yMax:#if fits in bin
listCenterPointsInBin.append((cX,cY))
binCollection[i] = listCenterPointsInBin
#we assume all point are in one line ?
#model = slope (x) + intercept
#model = m (x) + n
mList = list() #slope abs in img
nList = list() #intercept abs in img
nListRelative = list() #intercept relative to bin start
minAmountRegressionElements = 12 #is also alias for letter amount we expect
#we do regression for every point in the bin
for startYOfBin, values in binCollection.items():
#we reform values
xValues = [] #TODO use more short transform
yValues = []
for x,y in values:
xValues.append(x)
yValues.append(y)
#we assume a min limit of point in bin
if len(xValues) >= minAmountRegressionElements :
slope, intercept, r, p, std_err = stats.linregress(xValues, yValues)
mList.append(slope)
nList.append(intercept)
#we calc the relative intercept
nRelativeToBinStart = intercept - startYOfBin
nListRelative.append(nRelativeToBinStart)
if DEBUGMODE:
#we debug print all lines in one picute
imgLines = img.copy()
colorOfLine = (0, 255, 0) #green
for i in range(0,len(mList)):
slope = mList[i]
intercept = nList[i]
startPoint = (0, int( intercept)) #better round ?
endPointY = int( (slope * imgWidth + intercept) )
if endPointY < 0:
endPointY = 0
endPoint = (imgWidth, endPointY)
cv2.line(imgLines, startPoint, endPoint, colorOfLine, 2)
cv2.imwrite("img02lines.jpg",resizeImageByPercentage(imgLines,30))
cv2.imshow("linesOfLetters ",imgLines)
#we assume in mean we got it right
meanIntercept = np.mean(nListRelative)
meanSlope = np.mean(mList)
print("meanIntercept :", meanIntercept)
print("meanSlope ", meanSlope)
#TODO calc angle with math.atan(slope) ...
if DEBUGMODE:
cv2.waitKey(0)
original:
center point of letters:
lines:
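To also nullify the rotation angle (the TODO at the end of the script), one option is to deskew the page with the estimated mean slope. A sketch reusing img, imgWidth, imgHeigh, meanSlope and resizeImageByPercentage from the code above (the sign of the angle may need flipping depending on the slope convention):
import math
angleDeg = math.degrees(math.atan(meanSlope))   # skew of the text lines in degrees
center = (imgWidth // 2, imgHeigh // 2)
rotMat = cv2.getRotationMatrix2D(center, angleDeg, 1.0)
imgDeskewed = cv2.warpAffine(img, rotMat, (imgWidth, imgHeigh),
                             flags=cv2.INTER_LINEAR,
                             borderMode=cv2.BORDER_REPLICATE)
cv2.imwrite("img03deskewed.jpg", resizeImageByPercentage(imgDeskewed, 30))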
I had the same problem some time ago, and this tutorial is the solution to that. It explains using pdftabextract, a Python library by Markus Konrad that leverages OpenCV's Hough transform to detect the lines, and it works even if the scanned document is a bit tilted. The tutorial walks you through parsing a 1920s German newspaper.
I am doing a college class project on image processing. This is my original image:
I want to join nearby/overlapping bounding boxes on individual text-line images, but I don't know how. My code looks like this so far (thanks to @HansHirse for the help):
import os
import cv2
import numpy as np
from scipy import stats
image = cv2.imread('example.png')
gray = cv2.cvtColor(image,cv2.COLOR_BGR2GRAY)
ret,thresh = cv2.threshold(gray,127,255,cv2.THRESH_BINARY_INV+cv2.THRESH_OTSU)
#dilation
kernel = np.ones((5,5), np.uint8)
img_dilation = cv2.dilate(thresh, kernel, iterations=1)
#find contours
ctrs, hier = cv2.findContours(img_dilation.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
# https://www.pyimagesearch.com/2015/04/20/sorting-contours-using-python-and-opencv/
def sort_contours(cnts, method="left-to-right"):
# initialize the reverse flag and sort index
reverse = False
i = 0
# handle if we need to sort in reverse
if method == "right-to-left" or method == "bottom-to-top":
reverse = True
# handle if we are sorting against the y-coordinate rather than
# the x-coordinate of the bounding box
if method == "top-to-bottom" or method == "bottom-to-top":
i = 1
# construct the list of bounding boxes and sort them from top to
# bottom
boundingBoxes = [cv2.boundingRect(c) for c in cnts]
(cnts, boundingBoxes) = zip(*sorted(zip(cnts, boundingBoxes),
key=lambda b: b[1][i], reverse=reverse))
# return the list of sorted contours and bounding boxes
return (cnts, boundingBoxes)
sortedctrs,sortedbbs=sort_contours(ctrs)
xyminmax=[]
for cnt in sortedctrs:
x, y, w, h = cv2.boundingRect(cnt)
xyminmax.append([x,y,x+w,y+h])
distances=[]
for i in range(len(xyminmax)):
try:
first_xmax = xyminmax[i][2]
second_xmin = xyminmax[i + 1][0]
distance=abs(second_xmin-first_xmax)
distances.append(distance)
except IndexError:
pass
THRESHOLD=stats.mode(distances, axis=None)[0][0]
new_rects=[]
for i in range(len(xyminmax)):
try:
# [xmin,ymin,xmax,ymax]
first_ymin=xyminmax[i][1]
first_ymax=xyminmax[i][3]
second_ymin=xyminmax[i+1][1]
second_ymax=xyminmax[i+1][3]
first_xmax = xyminmax[i][2]
second_xmin = xyminmax[i+1][0]
firstheight=abs(first_ymax-first_ymin)
secondheight=abs(second_ymax-second_ymin)
distance=abs(second_xmin-first_xmax)
if distance<THRESHOLD:
new_xmin=xyminmax[i][0]
new_xmax=xyminmax[i+1][2]
if first_ymin>second_ymin:
new_ymin=second_ymin
else:
new_ymin = first_ymin
if firstheight>secondheight:
new_ymax = first_ymax
else:
new_ymax = second_ymax
new_rects.append([new_xmin,new_ymin,new_xmax,new_ymax])
else:
new_rects.append(xyminmax[i])
except IndexError:
pass
for rect in new_rects:
cv2.rectangle(image, (rect[0], rect[1]), (rect[2], rect[3]), (121, 11, 189), 2)
cv2.imwrite("result.png",image)
which produces this image as a result:
I want to join very close or overlapping bounding boxes such as these
into a single bounding box, so the formula doesn't get separated into single characters. I have tried using cv2.groupRectangles, but the printed result was just NULL.
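As a side note on why that happens: cv2.groupRectangles rejects every cluster with fewer than groupThreshold similar rectangles, so a list of unique boxes comes back empty, and it only merges near-duplicate boxes rather than chaining nearby ones, which is not what is needed here anyway. A small illustrative sketch of the usual workaround (boxes is assumed to be a list of (x, y, w, h) tuples):
import cv2
rect_list = [list(b) for b in boxes] * 2   # duplicate the list so every box has a partner
grouped, weights = cv2.groupRectangles(rect_list, groupThreshold=1, eps=0.2)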
So, here comes my solution. I partially modified your (initial) code to my preferred naming, etc. Also, I commented all the stuff I added.
import cv2
import numpy as np
image = cv2.imread('images/example.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
_, thresh = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
kernel = np.ones((5, 5), np.uint8)
img_dilated = cv2.dilate(thresh, kernel, iterations = 1)
cnts, _ = cv2.findContours(img_dilated.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
# Array of initial bounding rects
rects = []
# Bool array indicating which initial bounding rect has
# already been used
rectsUsed = []
# Just initialize bounding rects and set all bools to false
for cnt in cnts:
rects.append(cv2.boundingRect(cnt))
rectsUsed.append(False)
# Sort bounding rects by x coordinate
def getXFromRect(item):
return item[0]
rects.sort(key = getXFromRect)
# Array of accepted rects
acceptedRects = []
# Merge threshold for x coordinate distance
xThr = 5
# Iterate all initial bounding rects
for supIdx, supVal in enumerate(rects):
if (rectsUsed[supIdx] == False):
# Initialize current rect
currxMin = supVal[0]
currxMax = supVal[0] + supVal[2]
curryMin = supVal[1]
curryMax = supVal[1] + supVal[3]
# This bounding rect is used
rectsUsed[supIdx] = True
# Iterate all initial bounding rects
# starting from the next
for subIdx, subVal in enumerate(rects[(supIdx+1):], start = (supIdx+1)):
# Initialize merge candidate
candxMin = subVal[0]
candxMax = subVal[0] + subVal[2]
candyMin = subVal[1]
candyMax = subVal[1] + subVal[3]
# Check if x distance between current rect
# and merge candidate is small enough
if (candxMin <= currxMax + xThr):
# Reset coordinates of current rect
currxMax = candxMax
curryMin = min(curryMin, candyMin)
curryMax = max(curryMax, candyMax)
# Merge candidate (bounding rect) is used
rectsUsed[subIdx] = True
else:
break
# No more merge candidates possible, accept current rect
acceptedRects.append([currxMin, curryMin, currxMax - currxMin, curryMax - curryMin])
for rect in acceptedRects:
img = cv2.rectangle(image, (rect[0], rect[1]), (rect[0] + rect[2], rect[1] + rect[3]), (121, 11, 189), 2)
cv2.imwrite("images/result.png", image)
For your example
I get the following output
Now, you have to find a proper threshold to meet your expectations. Maybe, there is even some more work to do, especially to get the whole formula, since the distances don't vary that much.
Disclaimer: I'm new to Python in general, and especially to the Python API of OpenCV (C++ for the win). Comments, improvements, and highlighting of Python no-gos are highly welcome!
Here is a slightly different approach, using the OpenCV Wrapper library.
import cv2
import opencv_wrapper as cvw
image = cv2.imread("example.png")
gray = cvw.bgr2gray(image)
thresh = cvw.threshold_otsu(gray, inverse=True)
# dilation
img_dilation = cvw.dilate(thresh, 5)
# Find contours
contours = cvw.find_external_contours(img_dilation)
# Map contours to bounding rectangles, using bounding_rect property
rects = map(lambda c: c.bounding_rect, contours)
# Sort rects by top-left x (rect.x == rect.tl.x)
sorted_rects = sorted(rects, key=lambda r: r.x)
# Distance threshold
dt = 5
# List of final, joined rectangles
final_rects = [sorted_rects[0]]
for rect in sorted_rects[1:]:
prev_rect = final_rects[-1]
# Shift rectangle `dt` back, to find out if they overlap
shifted_rect = cvw.Rect(rect.tl.x - dt, rect.tl.y, rect.width, rect.height)
intersection = cvw.rect_intersection(prev_rect, shifted_rect)
if intersection is not None:
# Join the two rectangles
min_y = min((prev_rect.tl.y, rect.tl.y))
max_y = max((prev_rect.bl.y, rect.bl.y))
max_x = max((prev_rect.br.x, rect.br.x))
width = max_x - prev_rect.tl.x
height = max_y - min_y
new_rect = cvw.Rect(prev_rect.tl.x, min_y, width, height)
# Add new rectangle to final list, making it the new prev_rect
# in the next iteration
final_rects[-1] = new_rect
else:
# If no intersection, add the box
final_rects.append(rect)
for rect in sorted_rects:
cvw.rectangle(image, rect, cvw.Color.MAGENTA, line_style=cvw.LineStyle.DASHED)
for rect in final_rects:
cvw.rectangle(image, rect, cvw.Color.GREEN, thickness=2)
cv2.imwrite("result.png", image)
And the result
The green boxes are the final result, while the magenta boxes are the original ones.
I used the same threshold as @HansHirse.
The equals sign still needs some work: either use a larger dilation kernel size or apply the same merging technique vertically; a small sketch of the first option follows.
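For example, a taller, vertically oriented kernel would merge the two bars of the equals sign into one contour before contours are found (illustrative kernel size, applied with plain OpenCV to the thresh image from above):
import cv2
import numpy as np
vertical_kernel = np.ones((11, 5), np.uint8)   # taller than wide: (rows, cols)
img_dilated = cv2.dilate(thresh, vertical_kernel, iterations=1)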
Disclosure: I am the author of OpenCV Wrapper.
Easy-to-read solution:
contours = get_contours(frame)
boxes = [cv2.boundingRect(c) for c in contours]
boxes = merge_boxes(boxes, x_val=40, y_val=20) # Where x_val and y_val are axis thresholds
def get_contours(frame): # Returns a list of contours
contours = cv2.findContours(frame, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
contours = imutils.grab_contours(contours)
return contours
def merge_boxes(boxes, x_val, y_val):
size = len(boxes)
if size < 2:
return boxes
if size == 2:
if boxes_mergeable(boxes[0], boxes[1], x_val, y_val):
boxes[0] = union(boxes[0], boxes[1])
del boxes[1]
return boxes
boxes = sorted(boxes, key=lambda r: r[0])
i = size - 2
while i >= 0:
if boxes_mergeable(boxes[i], boxes[i + 1], x_val, y_val):
boxes[i] = union(boxes[i], boxes[i + 1])
del boxes[i + 1]
i -= 1
return boxes
def boxes_mergeable(box1, box2, x_val, y_val):
(x1, y1, w1, h1) = box1
(x2, y2, w2, h2) = box2
return max(x1, x2) - min(x1, x2) - minx_w(x1, w1, x2, w2) < x_val \
and max(y1, y2) - min(y1, y2) - miny_h(y1, h1, y2, h2) < y_val
def minx_w(x1, w1, x2, w2):
return w1 if x1 <= x2 else w2
def miny_h(y1, h1, y2, h2):
return h1 if y1 <= y2 else h2
def union(a, b):
x = min(a[0], b[0])
y = min(a[1], b[1])
w = max(a[0] + a[2], b[0] + b[2]) - x
h = max(a[1] + a[3], b[1] + b[3]) - y
return x, y, w, h
--> If you have bounding boxes and want to merge along both the X and Y directions, use this snippet.
--> Adjust x_pixel_value and y_pixel_value to your preferences.
--> But for this, you need to already have the bounding boxes.
import cv2
img = cv2.imread(your image path)
x_pixel_value = 5
y_pixel_value = 6
bboxes_list = [] # your bounding boxes list
rects_used = []
for i in bboxes_list:
rects_used.append(False)
end_bboxes_list = []
for enum,i in enumerate(bboxes_list):
if rects_used[enum] == True:
continue
xmin = i[0]
xmax = i[2]
ymin = i[1]
ymax = i[3]
for enum1,j in enumerate(bboxes_list[(enum+1):], start = (enum+1)):
i_xmin = j[0]
i_xmax = j[2]
i_ymin = j[1]
i_ymax = j[3]
if rects_used[enum1] == False:
if abs(ymin - i_ymin) < y_pixel_value:
if abs(xmin-i_xmax) < x_pixel_value or abs(xmax-i_xmin) < x_pixel_value:
rects_used[enum1] = True
xmin = min(xmin,i_xmin)
xmax = max(xmax,i_xmax)
ymin = min(ymin,i_ymin)
ymax = max(ymax,i_ymax)
final_box = [xmin,ymin,xmax,ymax]
end_bboxes_list.append(final_box)
for i in end_bboxes_list:
cv2.rectangle(img,(i[0],i[1]),(i[2],i[3]), color = [0,255,0], thickness = 2)
cv2.imshow("Image",img)
cv2.waitKey(10000)
cv2.destroyAllWindows()
I used YOLOv3 to detect objects in a frame of size 416x416, and I used that bounding box information to draw boxes on the 416x416 image.
But since the picture is so small, I cannot see it properly, so I used the same frame at its original dimensions of 1920x1080. I want to scale the bounding box information and the x, y coordinates so that they map onto the high-resolution picture, but I am not able to scale them correctly.
Clearly the information is way off.
Note! Before passing the frame, I resize the frames from 1920x1080 to 416x416 using this method:
def letterbox_resize(img, size=(resized_image_size,resized_image_size), padColor=0):
h, w = img.shape[:2]
sh, sw = size
# interpolation method
if h > sh or w > sw: # shrinking image
interp = cv2.INTER_AREA
else: # stretching image
interp = cv2.INTER_CUBIC
# aspect ratio of image
aspect = w/h # if on Python 2, you might need to cast as a float: float(w)/h
# compute scaling and pad sizing
if aspect > 1: # horizontal image
new_w = sw
new_h = np.round(new_w/aspect).astype(int)
pad_vert = (sh-new_h)/2
pad_top, pad_bot = np.floor(pad_vert).astype(int), np.ceil(pad_vert).astype(int)
pad_left, pad_right = 0, 0
elif aspect < 1: # vertical image
new_h = sh
new_w = np.round(new_h*aspect).astype(int)
pad_horz = (sw-new_w)/2
pad_left, pad_right = np.floor(pad_horz).astype(int), np.ceil(pad_horz).astype(int)
pad_top, pad_bot = 0, 0
else: # square image
new_h, new_w = sh, sw
pad_left, pad_right, pad_top, pad_bot = 0, 0, 0, 0
# set pad color
if len(img.shape) == 3 and not isinstance(padColor, (list, tuple, np.ndarray)): # color image but only one color provided
padColor = [padColor]*3
# scale and pad
scaled_img = cv2.resize(img, (new_w, new_h), interpolation=interp)
scaled_img = cv2.copyMakeBorder(scaled_img, pad_top, pad_bot, pad_left, pad_right, borderType=cv2.BORDER_CONSTANT, value=padColor)
return scaled_img
Could someone help me write a script that rescales the x, y, w, h info of the YOLO predictions so that I can draw accurate boxes on the original image?
Your re-scaling process doesn't take into account the zero-padded area at the top. Remove the top zero padding before multiplying by the scale ratio and you should be able to get the proper result.
Here is example code for all 3 cases, where boundingbox is the box returned by YOLO.
def boundBox_restore(boundingbox, ori_size=(ori_image_height, ori_image_width), resized_size=(resized_image_size,resized_image_size)):
h, w = ori_size
sh, sw = resized_size
scale_ratio = w / sw
ox,oy,ow,oh = boundingbox
# aspect ratio of image
aspect = w/h # if on Python 2, you might need to cast as a float: float(w)/h
# compute scaling and pad sizing
if aspect > 1: # horizontal image
new_w = sw
new_h = np.round(new_w/aspect).astype(int)
pad_vert = (sh-new_h)/2
pad_top, pad_bot = np.floor(pad_vert).astype(int), np.ceil(pad_vert).astype(int)
pad_left, pad_right = 0, 0
elif aspect < 1: # vertical image
new_h = sh
new_w = np.round(new_h*aspect).astype(int)
pad_horz = (sw-new_w)/2
pad_left, pad_right = np.floor(pad_horz).astype(int), np.ceil(pad_horz).astype(int)
pad_top, pad_bot = 0, 0
else: # square image
new_h, new_w = sh, sw
pad_left, pad_right, pad_top, pad_bot = 0, 0, 0, 0
# remove pad
ox = ox - pad_left
oy = oy - pad_top
# rescale
ox = ox * scale_ratio
oy = oy * scale_ratio
ow = ow * scale_ratio
oh = oh * scale_ratio
return (ox, oy, ow, oh)
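For illustration (numbers of my own, assuming a 1920x1080 source and the 416x416 network input): the letterbox adds 91 px of padding at the top and the scale ratio is 1920/416, so a box at (100, 150) lands at roughly (461, 272) in the original frame.
x, y, w, h = boundBox_restore((100, 150, 50, 80),
                              ori_size=(1080, 1920),      # (height, width) of the source frame
                              resized_size=(416, 416))    # letterboxed network input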