Multi-scale template match vs. Text Detection - python

I'm trying to automate the navigation of a website to grab data and download files using PyAutoGUI to detect images and buttons, but I'm having trouble using this on other people's computers. It seems to me that matching images of text is the biggest obstacle here.
I suspected the issue was with scaling and resolution, so I attempted multi-scale template matching, but I found that an upscaled template wouldn't produce a match at all. A downscaled template didn't help either, since it would either find no matches or find the wrong match, even with a narrow confidence range of 0.8-0.9.
Here's the original image at 74x17.
Here's the upscaled image at 348x80 (Windows Photos wouldn't let me upscale it by any smaller amount, for some reason).
Here's the downscaled image at 40x8.
Currently, with a downscaled image, PyAutoGUI is confusing the above image with this image:
Here's the code I wrote (and some I borrowed from someone else).
Code for multi-scaling I borrowed:
import time
import cv2
import numpy as np
import imutils
import pyautogui
import pyscreeze

# Functions to search for resized versions of images
def template_match_with_scaling(image, gs=True, confidence=0.8):
    # Locate an image and return a pyscreeze box surrounding it.
    # Template matching is done by default in grayscale (gs=True).
    # Detect the image if the normalized correlation coefficient is > confidence (0.8 by default).
    templateim = pyscreeze._load_cv2(image, grayscale=gs)  # load the template image
    (tH, tW) = templateim.shape[:2]  # template height and width
    screenim_color = pyautogui.screenshot()  # screenshot of the screen
    screenim_color = cv2.cvtColor(np.array(screenim_color), cv2.COLOR_RGB2BGR)
    # Check whether locateOnScreen() is being used with grayscale=True or not
    if gs is True:
        screenim = cv2.cvtColor(np.array(screenim_color), cv2.COLOR_BGR2GRAY)
    else:
        screenim = screenim_color

    # Try different scaling parameters and see which one matches best
    found = None  # bookkeeping variable for the maximum correlation coefficient, position and scale
    scalingrange = np.linspace(0.25, 5, num=150)
    for scale in scalingrange:
        print("Trying another scale")
        resizedtemplate = imutils.resize(templateim, width=int(templateim.shape[1]*scale))  # resizing with imutils maintains the aspect ratio
        r = float(resizedtemplate.shape[1])/templateim.shape[1]  # recompute the scaling factor
        result = cv2.matchTemplate(screenim, resizedtemplate, cv2.TM_CCOEFF_NORMED)  # template matching using the correlation coefficient
        (_, maxVal, _, maxLoc) = cv2.minMaxLoc(result)  # minimum and maximum correlation values and the (x, y)-coordinates of each
        if found is None or maxVal > found[0]:
            found = (maxVal, maxLoc, r)

    (maxVal, maxLoc, r) = found
    if maxVal > confidence:
        box = pyscreeze.Box(int(maxLoc[0]), int(maxLoc[1]), int(tW*r), int(tH*r))
        return box
    else:
        return None

def locate_center_with_scaling(image, gs=True):
    loc = template_match_with_scaling(image, gs=gs)
    if loc:
        return pyautogui.center(loc)
    else:
        raise Exception("Image not found")
My code to match and click on a textbox next to its identifier:
while SKUnoCounter <= len(listOfSKUs):
    while pyautogui.locateOnScreen('DescriptionBox-RESIZEDsmall.png', grayscale=True, confidence=0.8) is None:
        print("Looking for Description Box.")
        if locate_center_with_scaling('DescriptionBox-RESIZEDsmall.png') is not None:
            print("Found a resized version of Description Box.")
            # Calling the function
            DB_x, DB_y = locate_center_with_scaling('DescriptionBox-RESIZEDsmall.png')
            # Clicking on the Description text box
            pyautogui.click(DB_x + 417, DB_y + 12, button='left')
            break
        time.sleep(0.5)
Is it worthwhile to try to improve the accuracy of the multi-scale template matching if my goal is to use this across all kinds of computers? Would it be better to use OCR to detect the text instead of matching it as an image? My other idea is to use PyTesseract to locate the text I'm searching for and then use those coordinates to click on things (a rough sketch of that idea is below). Selenium does not work here, as I need to work in an existing IE browser.
Any input here is greatly appreciated!
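For reference, here's the rough sketch I have in mind for the PyTesseract approach (the label text and click offset below are just placeholders, and it assumes Tesseract itself is installed and on the PATH):
import pyautogui
import pytesseract
from pytesseract import Output

def click_next_to_text(label, x_offset=100, y_offset=0):
    # OCR the whole screen and look for the label word
    screen = pyautogui.screenshot()
    data = pytesseract.image_to_data(screen, output_type=Output.DICT)
    for i, word in enumerate(data['text']):
        if word.strip().lower() == label.lower():
            # center of the detected word, then shift to the neighbouring textbox
            # (note: on DPI-scaled displays the screenshot may not map 1:1 to click coordinates)
            x = data['left'][i] + data['width'][i] // 2
            y = data['top'][i] + data['height'][i] // 2
            pyautogui.click(x + x_offset, y + y_offset, button='left')
            return True
    return False

# e.g. click the box to the right of a "Description" label
click_next_to_text('Description', x_offset=417, y_offset=12)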

Following my comment above, this is what the modified function could look like:
# Functions to search for resized versions of images
def template_match_with_scaling(image, gs=True, confidence=0.8, scalingrange=None):
    # Locate an image and return a pyscreeze box surrounding it.
    # Template matching is done by default in grayscale (gs=True).
    # Detect the image if the normalized correlation coefficient is > confidence (0.8 by default).
    templateim = pyscreeze._load_cv2(image, grayscale=gs)  # load the template image
    (tH, tW) = templateim.shape[:2]  # template height and width
    screenim_color = pyautogui.screenshot()  # screenshot of the screen
    screenim_color = cv2.cvtColor(np.array(screenim_color), cv2.COLOR_RGB2BGR)
    # Check whether locateOnScreen() is being used with grayscale=True or not
    if gs is True:
        screenim = cv2.cvtColor(np.array(screenim_color), cv2.COLOR_BGR2GRAY)
    else:
        screenim = screenim_color

    # Try different scaling parameters and see which one matches best
    found = None  # bookkeeping variable for the maximum correlation coefficient, position and scale
    for scalex in scalingrange:
        width = int(templateim.shape[1] * scalex)
        for scaley in scalingrange:
            height = int(templateim.shape[0] * scaley)
            scaledsize = (width, height)
            # resize the template independently in width and height
            resizedtemplate = cv2.resize(templateim, scaledsize)
            rx = float(resizedtemplate.shape[1])/templateim.shape[1]  # recompute width scaling factor
            ry = float(resizedtemplate.shape[0])/templateim.shape[0]  # recompute height scaling factor
            result = cv2.matchTemplate(screenim, resizedtemplate, cv2.TM_CCOEFF_NORMED)  # template matching using the correlation coefficient
            (_, maxVal, _, maxLoc) = cv2.minMaxLoc(result)  # minimum and maximum correlation values and the (x, y)-coordinates of each
            if found is None or maxVal > found[0]:
                found = (maxVal, maxLoc, rx, ry)

    (maxVal, maxLoc, rx, ry) = found
    print('maxVal= ', maxVal)
    if maxVal > confidence:
        box = pyscreeze.Box(int(maxLoc[0]), int(maxLoc[1]), int(tW*rx), int(tH*ry))
        return box
    else:
        return None

def locate_center_with_scaling(image, gs=True, **kwargs):
    loc = template_match_with_scaling(image, gs=gs, **kwargs)
    if loc:
        return pyautogui.center(loc)
    else:
        raise Exception("Image not found")
im = 'DescriptionBox.png'  # we will try to detect the small description box, whose width and height are scaled down by 0.54 and 0.47
unscaledLocation = pyautogui.locateOnScreen(im, grayscale=True, confidence=0.8)
srange = np.linspace(0.4, 0.6, num=20)  # scale width and height in this range
if unscaledLocation is None:
    print("Looking for Description Box.")
    scaledLocation = locate_center_with_scaling(im, scalingrange=srange)
    if scaledLocation is not None:
        print(f'Found a resized version of Description Box at ({scaledLocation[0]},{scaledLocation[1]})')
        pyautogui.moveTo(scaledLocation[0], scaledLocation[1])
We need to be mindful of two things:
1. template_match_with_scaling now executes a double loop, one over each dimension, so it will take some time to detect the template image. To amortize the detection time, we should save the scale parameters for width and height after the first detection and scale all template images by those parameters for subsequent detections (a sketch of this caching idea is below).
2. To detect the template efficiently, we need to set the scalingrange input of template_match_with_scaling to an appropriate range of values. If the range is too narrow or has too few values, we will not detect the template at all; if it is too wide or too fine, detection will take a long time.
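As a rough sketch of that caching idea (the cache variable and helper function below are illustrative, not part of the original code):
# cache for the (width, height) scale factors found on the first successful detection
_scale_cache = None

def locate_center_cached(image, gs=True, scalingrange=None):
    global _scale_cache
    if _scale_cache is not None:
        # reuse the scale factors found earlier and only search a tight range around them
        rx, ry = _scale_cache
        narrow = np.linspace(0.95 * min(rx, ry), 1.05 * max(rx, ry), num=6)
        box = template_match_with_scaling(image, gs=gs, scalingrange=narrow)
    else:
        # first detection: caller must supply a full scalingrange
        box = template_match_with_scaling(image, gs=gs, scalingrange=scalingrange)
        if box is not None:
            # remember how much the template had to be scaled
            templateim = pyscreeze._load_cv2(image, grayscale=gs)
            _scale_cache = (box.width / templateim.shape[1], box.height / templateim.shape[0])
    return pyautogui.center(box) if box is not None else None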

Related

How do I add additional photos to my code in OpenCV Python for my Stitching Project?

Honestly, I am pretty lost. I was finally able to stitch these two pictures together but am unsure on how to update my code to incorporate more than two photos. How would I change my code in order to allow for multiple picture stitchings? Below is what I have so far, and I should mention that the pictures I am using are low quality, so other simpler examples I found either did not work or could not use all of the pictures I needed. If someone could just give me a general direction on how I would begin to alter this code for up to five pictures, I would appreciate it.
import cv2
import numpy as np
import matplotlib.pyplot as plt
import imageio
cv2.ocl.setUseOpenCL(False)
import warnings
warnings.filterwarnings('ignore')
#sift is a feature descriptor that helps locate pixel coordinates i.e. corner detector
feature_extraction_algo = 'sift'
feature_to_match = 'bf'
#train image needs to be the one transformed
train_photo = cv2.imread('Stitching/Images/Log771/Log2.bmp')
#converting from BGR to RGB for Matplotlib
train_photo = cv2.cvtColor(train_photo, cv2.COLOR_BGR2RGB)
train_photo_crop = train_photo[0:10000, 425:750]
#converting to gray scale
train_photo_gray = cv2.cvtColor(train_photo_crop, cv2.COLOR_RGB2GRAY)
#Do the same for the query image
query_photo = cv2.imread('Stitching/Images/Log771/Log3.bmp')
query_photo = cv2.cvtColor(query_photo, cv2.COLOR_BGR2RGB)
query_photo_crop = query_photo[0:10000, 425:750]
query_photo_gray = cv2.cvtColor(query_photo_crop, cv2.COLOR_RGB2GRAY)
#crop both images
#view/plot images
fig, (ax1, ax2) = plt.subplots(nrows=1, ncols=2, constrained_layout=False, figsize=(16,9))
ax1.imshow(query_photo_crop, cmap="gray")
ax1.set_xlabel("Query Image", fontsize=14)
ax2.imshow(train_photo_crop, cmap="gray")
ax2.set_xlabel("Train Image", fontsize=14)
plt.savefig("./"+'.jpg', bbox_inches='tight', dpi=300, optimize=True, format='jpeg')
plt.show()
#sift.detectAndCompute() gets keypoints and descriptors--helps to determine how similar or different keypoints are-- ie. one picture is
#huge and one is small. Keypoints match but are not similar enough, which is where descriptors come in.
#to compare the keypoints in vector format
def select_descriptor_methods(image, method=None):
assert method is not None, "Please define a feature descriptor method. accepted Values are: 'sift, 'surf'"
if method == 'sift':
descriptor = cv2.SIFT_create()
elif method == 'surf':
descriptor = cv2.SURF_create()
elif method == 'brisk':
descriptor = cv2.BRISK_create()
elif method =='orb':
descriptor = cv2.ORB_create()
(keypoints, features) = descriptor.detectAndCompute (image, None)
return (keypoints, features)
keypoints_train_img, features_train_img = select_descriptor_methods(train_photo_gray, method=feature_extraction_algo)
keypoints_query_img, features_query_img = select_descriptor_methods(query_photo_gray, method=feature_extraction_algo)
for keypoint in keypoints_query_img:
x,y = keypoint.pt
size = keypoint.size
orientation = keypoint.angle
response = keypoint.response
octave = keypoint.octave
class_id = keypoint.class_id
print (x,y)
print(size)
print(orientation)
print(response)
print(octave)
print(class_id)
print(len(keypoints_query_img))
features_query_img.shape
#Noting a basic fact that - SIFT descriptor is computed for every key-point detected in the image.
#Before computing descriptor, you probably used a detector (as Harris, Sift or Surf Detector) to detect points of interest. Detecting key-points and computing descriptors are two independent steps!
#drawing keypoints using drawKeypoints(input image,
# keypoints, output image, color, flag) -- keypoints based off input picture
#Displaying keypoints and features on both detected images
fig, (ax1, ax2) = plt.subplots(nrows=1, ncols=2, figsize=(20,8), constrained_layout=False)
ax1.imshow(cv2.drawKeypoints(train_photo_gray, keypoints_query_img, None, color=(0,255,0)))
ax1.set_xlabel("(a)", fontsize=14)
ax2.imshow(cv2.drawKeypoints(query_photo_gray, keypoints_query_img,None,color=(0,255,0)))
ax2.set_xlabel("(b)", fontsize=14)
plt.savefig("./Stitching/" + feature_extraction_algo + "Images" + '.jpg', bbox_inches='tight', dpi=300, optimize=True, format='jpg')
plt.show()
def create_matching_object(method,crossCheck):
"Create and return a Matcher Object"
# For BF matcher, first we have to create the BFMatcher object using cv2.BFMatcher().
# It takes two optional params.
# normType - It specifies the distance measurement
# crossCheck - which is false by default. If it is true, Matcher returns only those matches
# with value (i,j) such that i-th descriptor in set A has j-th descriptor in set B as the best match
# and vice-versa.
if method == 'sift' or method == 'surf':
bf = cv2.BFMatcher(cv2.NORM_L2, crossCheck=crossCheck)
elif method == 'orb' or method == 'brisk':
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=crossCheck)
return bf
def key_points_matching(features_train_img, features_query_img, method):
bf = create_matching_object(method, crossCheck=True)
# Match descriptors.
best_matches = bf.match(features_train_img,features_query_img)
# Sort the features in order of distance.
# The points with small distance (more similarity) are ordered first in the vector
rawMatches = sorted(best_matches, key = lambda x:x.distance)
print("Raw matches with Brute force):", len(rawMatches))
return rawMatches
def key_points_matching_KNN(features_train_img, features_query_img, ratio, method):
bf = create_matching_object(method, crossCheck=False)
# compute the raw matches and initialize the list of actual matches
rawMatches = bf.knnMatch(features_train_img, features_query_img, k=2)
print("Raw matches (knn):", len(rawMatches))
matches = []
#loop over raw matches
for m,n in rawMatches:
# ensure the distance is within a certain ratio of each
# other (i.e. Lowe's ratio test)
if m.distance < n.distance * ratio:
matches.append(m)
return matches
print("Drawing: {} matched features Lines".format(feature_to_match))
fig = plt.figure(figsize=(20,8))
if feature_to_match == 'bf':
matches = key_points_matching(features_train_img, features_query_img, method=feature_extraction_algo)
mapped_features_image = cv2.drawMatches(train_photo_crop,keypoints_train_img,query_photo_crop,keypoints_query_img,matches[:100],None,flags=cv2.DrawMatchesFlags_NOT_DRAW_SINGLE_POINTS)
# Now for cross checking draw the feature-mapping lines also with KNN
elif feature_to_match == 'knn':
matches = key_points_matching_KNN(features_train_img, features_query_img, ratio=0.75, method=feature_extraction_algo)
mapped_features_image_knn = cv2.drawMatches(train_photo_crop, keypoints_train_img, query_photo_crop, keypoints_query_img, np.random.choice(matches,50),None,flags=cv2.DrawMatchesFlags_NOT_DRAW_SINGLE_POINTS)
plt.imshow(mapped_features_image)
plt.axis('off')
plt.savefig("./Stitching/" + feature_to_match + "_matching_img_log_"+'.jpeg', bbox_inches='tight', dpi=300, optimize=True, format='jpeg')
plt.show()
feature_to_match = 'knn'
print("Drawing: {} matched features Lines".format(feature_to_match))
fig = plt.figure(figsize=(20,8))
if feature_to_match == 'bf':
matches = key_points_matching(features_train_img, features_query_img, method=feature_extraction_algo)
mapped_features_image = cv2.drawMatches(train_photo_crop,keypoints_train_img,query_photo_crop,keypoints_query_img,matches[:100],
None,flags=cv2.DrawMatchesFlags_NOT_DRAW_SINGLE_POINTS)
# Now for cross checking draw the feature-mapping lines also with KNN
elif feature_to_match == 'knn':
matches = key_points_matching_KNN(features_train_img, features_query_img, ratio=0.75, method=feature_extraction_algo)
mapped_features_image_knn = cv2.drawMatches(train_photo_crop, keypoints_train_img, query_photo_crop, keypoints_query_img, np.random.choice(matches,100),None,flags=cv2.DrawMatchesFlags_NOT_DRAW_SINGLE_POINTS)
plt.imshow(mapped_features_image_knn)
plt.axis('off')
plt.savefig("./Stitching/" + feature_to_match + "_Images"+'.jpg', bbox_inches='tight', dpi=300, optimize=True, format='jpg')
plt.show()
def homography_stitching(keypoints_train_img, keypoints_query_img, matches, reprojThresh):
keypoints_train_img = np.float32([keypoint.pt for keypoint in keypoints_train_img])
keypoints_query_img = np.float32([keypoint.pt for keypoint in keypoints_query_img])
''' For findHomography() - I need to have an assumption of a minimum of correspondence points that are present between the 2 images. Here, I am assuming that Minimum Match Count to be 4 '''
if len(matches) > 4:
# construct the two sets of points
points_train = np.float32([keypoints_train_img[m.queryIdx] for m in matches])
points_query = np.float32([keypoints_query_img[m.trainIdx] for m in matches])
# Calculate the homography between the sets of points
(H, status) = cv2.findHomography(points_train, points_query, cv2.RANSAC, reprojThresh)
return (matches, H, status)
else:
return None
M = homography_stitching(keypoints_train_img, keypoints_query_img, matches, reprojThresh=4)
if M is None:
print("Error!")
(matches, Homography_Matrix, status) = M
print(Homography_Matrix)
#Finally, we can apply our transformation by calling the cv2.warpPerspective function. The first parameter is our
# original image that we want to warp,
#the second is our transformation matrix M (which will be obtained from homography_stitching),
#and the final parameter is a tuple, used to indicate the width and height of the output image.
# For the calculation of the width and height of the final horizontal panoramic images
# I can just add the widths of the individual images and for the height
# I can take the max from the 2 individual images.
width = query_photo_crop.shape[1] + train_photo_crop.shape[1]
print("width ", width)
# 2922 - Which is exactly the sum value of the width of
# my train.jpg and query.jpg
height = max(query_photo_crop.shape[0], train_photo_crop.shape[0])
# otherwise, apply a perspective warp to stitch the images together
# Now just plug that "Homography_Matrix" into cv::warpedPerspective and I shall have a warped image1 into image2 frame
result = cv2.warpPerspective(train_photo_crop, Homography_Matrix, (width, height))
# The warpPerspective() function returns an image or video whose size is the same as the size of the original image or video. Hence set the pixels as per my query_photo
result[0:query_photo_crop.shape[0], 0:query_photo_crop.shape[1]] = query_photo_crop
plt.figure(figsize=(20,10))
plt.axis('off')
plt.imshow(result)
imageio.imwrite("./Stitching/Images/Log771/finishedLog"+'.jpg', result)
plt.show()
Simple method:
OpenCV makes this easy with the cv2.Stitcher_create() function. Everything is handled internally, from identifying key feature points to matching them appropriately and finally warping the images. You can pass in more than 2 images to be stitched, but be warned: the greater the number of images and/or their dimensions, the greater the computation time.
How to use cv2.Stitcher_create?
First, we need to create an instance of the class.
imageStitcher = cv2.Stitcher_create()
To get a list of all the functions associated with this class, just type help(imageStitcher). It will return a list of all the functions along with their required input parameters and expected outputs.
The created instance contains a function stitch(), which is used to create the panorama. stitch() can be used in either of the following two ways:
ret, panorama_image = stitch(images):
Pass in the list of all the images; the module identifies key features, matches them and yields the warped image.
ret, panorama_image = stitch(images, masks):
Optionally, we can also pass a list of masks, with one mask corresponding to every image. A mask is a binary image comprising black and white pixels. The module looks for keypoints/features only in the white regions of every mask, then proceeds to match them and yields the warped image.
Both of the above also return a value ret; if it is 0, stitching was performed without any issues.
The following code sample (which I borrowed) shows the first method:
Sample images to be stitched:
Code:
import os
import cv2

# path containing images to be stitched
path = 'stitching_images'

# append all the images within the path to a list
images = []
for image_file in os.listdir(path):
    if image_file.endswith('.jpg'):
        img = cv2.imread(os.path.join(path, image_file))
        images.append(img)

# creating an instance of the Stitcher class
imageStitcher = cv2.Stitcher_create()

# call the 'stitch' function and pass in the list of images
status, stitched_img = imageStitcher.stitch(images)

# display the panorama if stitching is successful
if status == 0:
    cv2.imshow('Panorama', stitched_img)
    cv2.waitKey(0)
Result:
(Code and images borrowed from https://github.com/niconielsen32/ComputerVision/tree/master/imageStitching)
Hard method:
If you want to create a panorama using your own code, I would suggest doing it sequentially (a rough sketch follows the steps below):
Iterate through all the images in your collection (say A, B, C, D, E)
For each pair of images, find keypoints and match them. (AB, AC, AD, AE, BC, BD, etc..)
Select the pair where keypoint matches are the highest and stitch them using homography (say images A and C are warped to P1)
Next find keypoint matches between the stitched image and every other image in your collection (P1B, P1D, P1E)
Find the pair with most number of matches and stitch them
Repeat likewise
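A minimal sketch of that sequential strategy could look like the following (the helper names good_matches, stitch_pair and stitch_all are mine, and the canvas-size heuristic is the same simple one used in the question's code):
import cv2
import numpy as np

sift = cv2.SIFT_create()
bf = cv2.BFMatcher(cv2.NORM_L2, crossCheck=False)

def good_matches(img_a, img_b, ratio=0.75):
    # SIFT keypoints/descriptors plus Lowe's ratio test
    kps_a, des_a = sift.detectAndCompute(cv2.cvtColor(img_a, cv2.COLOR_BGR2GRAY), None)
    kps_b, des_b = sift.detectAndCompute(cv2.cvtColor(img_b, cv2.COLOR_BGR2GRAY), None)
    raw = bf.knnMatch(des_a, des_b, k=2)
    matches = [m for m, n in raw if m.distance < n.distance * ratio]
    return matches, kps_a, kps_b

def stitch_pair(img_a, img_b):
    # warp img_a into img_b's frame, same approach as homography_stitching() above
    matches, kps_a, kps_b = good_matches(img_a, img_b)
    if len(matches) <= 4:
        return None
    pts_a = np.float32([kps_a[m.queryIdx].pt for m in matches])
    pts_b = np.float32([kps_b[m.trainIdx].pt for m in matches])
    H, _ = cv2.findHomography(pts_a, pts_b, cv2.RANSAC, 4)
    w = img_a.shape[1] + img_b.shape[1]
    h = max(img_a.shape[0], img_b.shape[0])
    result = cv2.warpPerspective(img_a, H, (w, h))
    result[0:img_b.shape[0], 0:img_b.shape[1]] = img_b
    return result

def stitch_all(images):
    # repeatedly merge the pair of images with the most good matches
    # (slow for a sketch: feature matching is recomputed each round)
    images = list(images)
    while len(images) > 1:
        pairs = [(i, j, len(good_matches(images[i], images[j])[0]))
                 for i in range(len(images)) for j in range(i + 1, len(images))]
        i, j, _ = max(pairs, key=lambda t: t[2])
        merged = stitch_pair(images[i], images[j])
        if merged is None:
            break  # no remaining pair has enough overlap
        images = [im for k, im in enumerate(images) if k not in (i, j)] + [merged]
    return images[0]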

Detect filled checkboxes

I am using boxdetect 1.0.0 to detect the coordinates of the checkboxes.
I am directly using the get_checkboxes method with minor configuration changes.
It detects this type of checkbox:
But it is not able to detect these cases:
Any preprocessing suggestions to make the second type of checkbox detectable? Or any other method that can detect these kinds of checkboxes?
Code snippet
# pip install boxdetect
import traceback
import cv2
import matplotlib.pyplot as plt
from boxdetect.pipelines import get_checkboxes
from boxdetect import config

def default_checkbox_config():
    cfg = config.PipelinesConfig()
    # important to adjust these values to match the size of boxes on your image
    cfg.width_range = (5, 35)
    cfg.height_range = (5, 35)
    # the more scaling factors, the more accurate the results, but processing also takes longer
    # too small a scaling factor may cause false positives
    # too big a scaling factor will take a lot of processing time
    cfg.scaling_factors = [10]
    # w/h ratio range for boxes/rectangles filtering
    cfg.wh_ratio_range = (0.5, 1.7)
    # group_size_range starting from 2 will skip all the groups
    # with a single box detected inside (like checkboxes)
    cfg.group_size_range = (2, 100)
    # number of iterations when running the dilation transformation (to enhance the image)
    cfg.dilation_iterations = 0
    return cfg

def divide_checkbox(checkboxes, crop_image_file, pdf_file_name):
    img = cv2.imread(crop_image_file)
    checkbox_counter = 0
    for checkbox in checkboxes:
        checkbox_counter += 1
        cropped = img[checkbox[0][1]:checkbox[0][1]+checkbox[0][3], checkbox[0][0]:checkbox[0][2]+checkbox[0][0]]
        # mpimg.imsave("checkbox_image/out{}{}.png".format(pdf_file_name, checkbox_counter), cropped)
        plt.imshow(cropped)
        plt.show()

def get_all_checkboxes(crop_image_file, pdf_file_name):
    cfg = default_checkbox_config()
    checkboxes = get_checkboxes(
        crop_image_file, cfg=cfg, px_threshold=0.1, plot=False, verbose=True)
    divide_checkbox(checkboxes, crop_image_file, pdf_file_name)

try:
    pdf_file_name = "something"
    crop_image_file = "shady_tick3.png"  # just give your image path
    get_all_checkboxes(crop_image_file, pdf_file_name)
except:
    print(traceback.format_exc())

OpenCV: Find similar object with contours matching

I have a bundle of different images with birds, but some of them contain only feathers, barely visible birds, etc. I need to find images where the bird is clearly visible and remove images with feathers, far-distant birds, and so on.
I already tried ORB, simple template matching, and Canny edge detection. I cannot use neural nets.
Now I am trying the following algorithm:
Binarize the template image to get shapes
Slide a window over another binarized image and calculate matchShapes against the template in every window
Find the best match
As you can see, this method gives me strange results.
Binary template:
Shape on the other binary image, for example:
I calculated matchShapes in different parts of this image, and the best result (~0.05) was in this part:
which is obviously not similar to the original shape.
Code for sliding window:
import cv2

OFFSET = 5
SCALE_RATIO = [0.5, 1]

def get_scaled_list(img_path, template):
    matcher_list = []
    img = cv2.imread(img_path)
    # JUST BINARIZATION AND RESIZING
    img = preprocess(resize_image(img))
    height, width = img.shape
    # building size of scale window
    for scaler in SCALE_RATIO:
        x_point = 0
        y_point = 0
        x1_point = int(width * scaler)
        y1_point = x1_point
        if x1_point > height:
            y1_point = height
        while y1_point <= height:
            while x1_point <= width:
                img1 = img[y_point:y1_point, x_point:x1_point]
                # Comparing template and part of image
                diff = cv2.matchShapes(template, img1, cv2.CONTOURS_MATCH_I1, 0)
                data_tuple = (img_path, x_point, y_point, int(width * scaler), diff)
                matcher_list.append(data_tuple)
                x_point += OFFSET
                x1_point += OFFSET
            x_point = 0
            x1_point = int(width * scaler)
            y_point += OFFSET
            y1_point += OFFSET
    return matcher_list
How can I perform correct shape matching, and why does the best result appear in that region?
The naive sliding-window method with a rigid template will work very poorly. In particular, the sizes are different, making correct overlap impossible.
What you are trying to achieve is difficult because you only have edge information, and the edges are complex, broken into several independent arcs, and joined at junctions.
You can find many solutions when you have a single closed curve (look up "elastic contour matching", for instance), but not for your case. This would be a case of "approximate elastic graph matching".
Other possible approaches rely on special distance functions such as the chamfer or Hausdorff distances, but you can still get stuck because of the size mismatch.
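For illustration, here is a rough chamfer-distance sketch (not part of the answer above; it assumes both inputs are binary edge maps and that the window has already been resized to the template's size):
import cv2
import numpy as np

def chamfer_score(template_edges, window_edges):
    # distance transform of the inverted window edges: each pixel holds the
    # distance to the nearest edge pixel in the window
    dist = cv2.distanceTransform(255 - window_edges, cv2.DIST_L2, 3)
    ys, xs = np.nonzero(template_edges)
    if len(xs) == 0:
        return np.inf
    # average distance from every template edge pixel to the closest window edge
    return float(dist[ys, xs].mean())

# usage sketch: both inputs are binary edge maps (e.g. from cv2.Canny),
# and the window is resized to the template's shape beforehand
# template_edges = cv2.Canny(template_gray, 50, 150)
# window_edges = cv2.resize(cv2.Canny(window_gray, 50, 150), template_edges.shape[::-1])
# score = chamfer_score(template_edges, window_edges)  # lower = better match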

Detect if an OCR text image is upside down

I have several hundred images (scanned documents), most of which are skewed. I wanted to deskew them using Python.
Here is the code I used:
import numpy as np
import cv2
from skimage.transform import radon
filename = 'path_to_filename'
# Load file, converting to grayscale
img = cv2.imread(filename)
I = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
h, w = I.shape
# If the resolution is high, resize the image to reduce processing time.
if (w > 640):
    I = cv2.resize(I, (640, int((h / w) * 640)))
I = I - np.mean(I) # Demean; make the brightness extend above and below zero
# Do the radon transform
sinogram = radon(I)
# Find the RMS value of each row and find "busiest" rotation,
# where the transform is lined up perfectly with the alternating dark
# text and white lines
r = np.array([np.sqrt(np.mean(np.abs(line) ** 2)) for line in sinogram.transpose()])
rotation = np.argmax(r)
print('Rotation: {:.2f} degrees'.format(90 - rotation))
# Rotate and save with the original resolution
M = cv2.getRotationMatrix2D((w/2,h/2),90 - rotation,1)
dst = cv2.warpAffine(img,M,(w,h))
cv2.imwrite('rotated.jpg', dst)
This code works well with most documents, except for some angles: (180 and 0) and (90 and 270) are often detected as the same angle (i.e. it does not differentiate between 0 and 180, or between 90 and 270). So I get a lot of upside-down documents.
Here is an example:
The resulted image that I get is the same as the input image.
Is there any suggestion to detect if an image is upside down using Opencv and Python?
PS: I tried to check the orientation using EXIF data, but it didn't lead to any solution.
EDIT:
It is possible to detect the orientation using Tesseract (pytesseract for Python), but it is only possible when the image contains a lot of characters.
For anyone who may need this:
import cv2
import pytesseract
print(pytesseract.image_to_osd(cv2.imread(file_name)))
If the document contains enough characters, Tesseract can detect the orientation. However, when the image has few lines, the orientation angle suggested by Tesseract is usually wrong. So this cannot be a 100% solution.
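For completeness, here is a small sketch of acting on the OSD output (parsing the "Rotate:" line is an assumption about pytesseract's plain-text OSD format, so verify it on your own install):
import re
import cv2
import pytesseract

def auto_rotate(file_name):
    img = cv2.imread(file_name)
    osd = pytesseract.image_to_osd(img)
    # the OSD output contains a line like "Rotate: 180"
    angle = int(re.search(r'Rotate:\s*(\d+)', osd).group(1))
    if angle == 90:
        img = cv2.rotate(img, cv2.ROTATE_90_CLOCKWISE)
    elif angle == 180:
        img = cv2.rotate(img, cv2.ROTATE_180)
    elif angle == 270:
        img = cv2.rotate(img, cv2.ROTATE_90_COUNTERCLOCKWISE)
    return img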
Python3/OpenCV4 script to align scanned documents.
Rotate the document and sum the rows. When the document has 0 and 180 degrees of rotation, there will be a lot of black pixels in the image:
Use a score-keeping method. Score each image for its likeness to a zebra pattern. The image with the best score has the correct rotation. The image you linked to was off by 0.5 degrees. I omitted some functions for readability; the full code can be found here.
# Rotate the image around in a circle
angle = 0
while angle <= 360:
    # Rotate the source image
    img = rotate(src, angle)
    # Crop the center 1/3rd of the image (roi is filled with text)
    h, w = img.shape
    buffer = min(h, w) - int(min(h, w)/1.15)
    roi = img[int(h/2-buffer):int(h/2+buffer), int(w/2-buffer):int(w/2+buffer)]
    # Create background to draw transform on
    bg = np.zeros((buffer*2, buffer*2), np.uint8)
    # Compute the sums of the rows
    row_sums = sum_rows(roi)
    # High score --> Zebra stripes
    score = np.count_nonzero(row_sums)
    scores.append(score)
    # Image has best rotation
    if score <= min(scores):
        # Save the rotated image
        print('found optimal rotation')
        best_rotation = img.copy()
    k = display_data(roi, row_sums, buffer)
    if k == 27: break
    # Increment angle and try again
    angle += .75
cv2.destroyAllWindows()
How to tell if the document is upside down? Fill in the area from the top of the document to the first non-black pixel in the image. Measure the area in yellow. The image that has the smallest area will be the one that is right-side-up:
# Find the area from the top of page to top of image
_, bg = area_to_top_of_text(best_rotation.copy())
right_side_up = sum(sum(bg))
# Flip image and try again
best_rotation_flipped = rotate(best_rotation, 180)
_, bg = area_to_top_of_text(best_rotation_flipped.copy())
upside_down = sum(sum(bg))
# Check which area is larger
if right_side_up < upside_down: aligned_image = best_rotation
else: aligned_image = best_rotation_flipped
# Save aligned image
cv2.imwrite('/home/stephen/Desktop/best_rotation.png', 255-aligned_image)
cv2.destroyAllWindows()
Assuming you did run the angle-correction already on the image, you can try the following to find out if it is flipped:
Project the corrected image to the y-axis, so that you get a 'peak' for each line. Important: There are actually almost always two sub-peaks!
Smooth this projection by convolving with a gaussian in order to get rid of fine structure, noise, etc.
For each peak, check if the stronger sub-peak is on top or at the bottom.
Calculate the fraction of peaks that have sub-peaks on the bottom side. This is your scalar value that gives you the confidence that the image is oriented correctly.
The peak finding in step 3 is done by finding sections with above average values. The sub-peaks are then found via argmax.
Here's a figure to illustrate the approach on a few lines of your example image:
Blue: original projection
Orange: smoothed projection
Horizontal line: average of the smoothed projection for the whole image.
Here's some code that does this:
import cv2
import numpy as np
# load image, convert to grayscale, threshold it at 127 and invert.
page = cv2.imread('Page.jpg')
page = cv2.cvtColor(page, cv2.COLOR_BGR2GRAY)
page = cv2.threshold(page, 127, 255, cv2.THRESH_BINARY_INV)[1]
# project the page to the side and smooth it with a gaussian
projection = np.sum(page, 1)
gaussian_filter = np.exp(-(np.arange(-3, 3, 0.1)**2))
gaussian_filter /= np.sum(gaussian_filter)
smooth = np.convolve(projection, gaussian_filter)
# find the pixel values where we expect lines to start and end
mask = smooth > np.average(smooth)
edges = np.convolve(mask, [1, -1])
line_starts = np.where(edges == 1)[0]
line_endings = np.where(edges == -1)[0]
# count lines with peaks on the lower side
lower_peaks = 0
for start, end in zip(line_starts, line_endings):
    line = smooth[start:end]
    if np.argmax(line) < len(line)/2:
        lower_peaks += 1
print(lower_peaks / len(line_starts))
This prints 0.125 for the given image, so it is not oriented correctly and must be flipped.
Note that this approach might break badly if there are images or anything not organized in lines in the image (maybe math or pictures). Another problem would be too few lines, resulting in bad statistics.
Also different fonts might result in different distributions. You can try this on a few images and see if the approach works. I don't have enough data.
You can use the Alyn module. To install it:
pip install alyn
Then use it to deskew images (taken from the homepage):
from alyn import Deskew

d = Deskew(
    input_file='path_to_file',
    display_image='preview the image on screen',
    output_file='path_for_deskewed image',
    r_angle='offest_angle_in_degrees_to_control_orientation')
d.run()
Note that Alyn is only for deskewing text.

Segmentation Fault using GRIP to process images in Python

I'm using Python to get images from an IP camera over an ethernet connection, and then process them looking for specific targets. I am using GRIP to generate code to look for the specific targeted areas. (For those unfamiliar with GRIP: it basically offers you a GUI desktop interface where you can see a live video feed and alter parameters until you get the desired output. Then you can auto generate a piece of code—mine is in Python—that will perform that processing 'pipeline' on any image you feed into it in your code).
After extensively debugging my connection code, I finally got a successful working connection that gets the image from the IP camera and send it into the GRIP pipeline. However, the processing of the image is failing, and it's returning a Segmentation Fault, with no indicated line numbers. Here is the pipeline code (auto generated):
import cv2
import numpy
import math
from enum import Enum
class GripPipeline:
    """
    An OpenCV pipeline generated by GRIP.
    """

    def __init__(self):
        """initializes all values to presets or None if need to be set
        """
        self.__blur_type = BlurType.Median_Filter
        self.__blur_radius = 19.81981981981982
        self.blur_output = None

        self.__hsv_threshold_input = self.blur_output
        self.__hsv_threshold_hue = [72.84172661870504, 86.31399317406144]
        self.__hsv_threshold_saturation = [199.50539568345323, 255.0]
        self.__hsv_threshold_value = [89.43345323741006, 255.0]
        self.hsv_threshold_output = None

        self.__find_contours_input = self.hsv_threshold_output
        self.__find_contours_external_only = False
        self.find_contours_output = None

        self.__filter_contours_contours = self.find_contours_output
        self.__filter_contours_min_area = 500.0
        self.__filter_contours_min_perimeter = 0.0
        self.__filter_contours_min_width = 0.0
        self.__filter_contours_max_width = 1000.0
        self.__filter_contours_min_height = 0.0
        self.__filter_contours_max_height = 1000.0
        self.__filter_contours_solidity = [0, 100]
        self.__filter_contours_max_vertices = 1000000.0
        self.__filter_contours_min_vertices = 0.0
        self.__filter_contours_min_ratio = 0.0
        self.__filter_contours_max_ratio = 1000.0
        self.filter_contours_output = None
    def process(self, source0):
        """
        Runs the pipeline and sets all outputs to new values.
        """
        # Step Blur0:
        self.__blur_input = source0
        (self.blur_output) = self.__blur(self.__blur_input, self.__blur_type, self.__blur_radius)

        # Step HSV_Threshold0:
        self.__hsv_threshold_input = self.blur_output
        (self.hsv_threshold_output) = self.__hsv_threshold(self.__hsv_threshold_input, self.__hsv_threshold_hue, self.__hsv_threshold_saturation, self.__hsv_threshold_value)

        # Step Find_Contours0:
        self.__find_contours_input = self.hsv_threshold_output
        (self.find_contours_output) = self.__find_contours(self.__find_contours_input, self.__find_contours_external_only)

        # Step Filter_Contours0:
        self.__filter_contours_contours = self.find_contours_output
        (self.filter_contours_output) = self.__filter_contours(self.__filter_contours_contours, self.__filter_contours_min_area, self.__filter_contours_min_perimeter, self.__filter_contours_min_width, self.__filter_contours_max_width, self.__filter_contours_min_height, self.__filter_contours_max_height, self.__filter_contours_solidity, self.__filter_contours_max_vertices, self.__filter_contours_min_vertices, self.__filter_contours_min_ratio, self.__filter_contours_max_ratio)
    @staticmethod
    def __blur(src, type, radius):
        """Softens an image using one of several filters.
        Args:
            src: The source mat (numpy.ndarray).
            type: The blurType to perform represented as an int.
            radius: The radius for the blur as a float.
        Returns:
            A numpy.ndarray that has been blurred.
        """
        if(type is BlurType.Box_Blur):
            ksize = int(2 * round(radius) + 1)
            return cv2.blur(src, (ksize, ksize))
        elif(type is BlurType.Gaussian_Blur):
            ksize = int(6 * round(radius) + 1)
            return cv2.GaussianBlur(src, (ksize, ksize), round(radius))
        elif(type is BlurType.Median_Filter):
            ksize = int(2 * round(radius) + 1)
            return cv2.medianBlur(src, ksize)
        else:
            return cv2.bilateralFilter(src, -1, round(radius), round(radius))
    @staticmethod
    def __hsv_threshold(input, hue, sat, val):
        """Segment an image based on hue, saturation, and value ranges.
        Args:
            input: A BGR numpy.ndarray.
            hue: A list of two numbers that are the min and max hue.
            sat: A list of two numbers that are the min and max saturation.
            val: A list of two numbers that are the min and max value.
        Returns:
            A black and white numpy.ndarray.
        """
        out = cv2.cvtColor(input, cv2.COLOR_BGR2HSV)
        return cv2.inRange(out, (hue[0], sat[0], val[0]), (hue[1], sat[1], val[1]))
    @staticmethod
    def __find_contours(input, external_only):
        """Finds contours in a binary image.
        Args:
            input: A numpy.ndarray.
            external_only: A boolean. If true, only external contours are found.
        Return:
            A list of numpy.ndarray where each one represents a contour.
        """
        if(external_only):
            mode = cv2.RETR_EXTERNAL
        else:
            mode = cv2.RETR_LIST
        method = cv2.CHAIN_APPROX_SIMPLE
        im2, contours, hierarchy = cv2.findContours(input, mode=mode, method=method)
        return contours
    @staticmethod
    def __filter_contours(input_contours, min_area, min_perimeter, min_width, max_width,
                          min_height, max_height, solidity, max_vertex_count, min_vertex_count,
                          min_ratio, max_ratio):
        """Filters out contours that do not meet certain criteria.
        Args:
            input_contours: Contours as a list of numpy.ndarray.
            min_area: The minimum area of a contour that will be kept.
            min_perimeter: The minimum perimeter of a contour that will be kept.
            min_width: Minimum width of a contour.
            max_width: Maximum width of a contour.
            min_height: Minimum height of a contour.
            max_height: Maximum height of a contour.
            solidity: The minimum and maximum solidity of a contour.
            min_vertex_count: Minimum vertex count of the contours.
            max_vertex_count: Maximum vertex count of the contours.
            min_ratio: Minimum ratio of width to height.
            max_ratio: Maximum ratio of width to height.
        Returns:
            Contours as a list of numpy.ndarray.
        """
        output = []
        for contour in input_contours:
            x, y, w, h = cv2.boundingRect(contour)
            if (w < min_width or w > max_width):
                continue
            if (h < min_height or h > max_height):
                continue
            area = cv2.contourArea(contour)
            if (area < min_area):
                continue
            if (cv2.arcLength(contour, True) < min_perimeter):
                continue
            hull = cv2.convexHull(contour)
            solid = 100 * area / cv2.contourArea(hull)
            if (solid < solidity[0] or solid > solidity[1]):
                continue
            if (len(contour) < min_vertex_count or len(contour) > max_vertex_count):
                continue
            ratio = (float)(w) / h
            if (ratio < min_ratio or ratio > max_ratio):
                continue
            output.append(contour)
        return output
BlurType = Enum('BlurType', 'Box_Blur Gaussian_Blur Median_Filter Bilateral_Filter')
I realize that is long; however, I am less familiar with Python than other languages, so I wanted to include all of it in case someone with more Python experience can spot an error in it.
Here is my code that I have written to get the image and feed it into the pipeline:
import numpy
import math
import cv2
import urllib.request
from enum import Enum
from GripPipeline import GripPipeline
from networktables import NetworkTable

frame = cv2.VideoCapture('https://10.17.11.1')
pipeline = GripPipeline()

def get_image():
    img_array = numpy.asarray(bytearray(frame.grab()))
    return img_array

while True:
    img = get_image()
    pipeline.process(img)  # where the Segmentation Fault occurs
Does anyone have any idea on what could be causing this or how to fix it?
EDIT: It turns out that the error is coming from something in the second line of the process method, but I still don't know what. If anyone sees any flaws in what's being called there please let me know.
Try getting frames as the tutorial suggests. Note the renaming of frame to cap:
cap = cv2.VideoCapture('https://10.17.11.1')
pipeline = GripPipeline()

while True:
    ret, img = cap.read()
    pipeline.process(img)
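A slightly more defensive variant (my addition, not part of the original answer) skips frames that fail to decode, so the pipeline never receives None:
cap = cv2.VideoCapture('https://10.17.11.1')
pipeline = GripPipeline()

while True:
    ret, img = cap.read()
    # if the frame could not be read/decoded, skip it instead of
    # passing an invalid image into the OpenCV pipeline
    if not ret or img is None:
        continue
    pipeline.process(img)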
