Detect filled checkboxes - python

I am using boxdetect 1.0.0 to detect the coordinates of the checkboxes.
I am using the get_checkboxes method directly, with minor configuration changes.
It detects this type of checkbox,
but it is not able to detect them in these cases.
Are there any preprocessing suggestions to make the second type of checkbox detectable, or any other method that can detect these kinds of checkboxes?
Code snippet
# pip install boxdetect
import traceback

import cv2
import matplotlib.pyplot as plt

from boxdetect.pipelines import get_checkboxes
from boxdetect import config


def default_checkbox_config():
    cfg = config.PipelinesConfig()
    # important: adjust these values to match the size of the boxes in your image
    cfg.width_range = (5, 35)
    cfg.height_range = (5, 35)
    # more scaling factors give more accurate results but also take more time to process
    # too small a scaling factor may cause false positives
    # too big a scaling factor will take a lot of processing time
    cfg.scaling_factors = [10]
    # w/h ratio range for boxes/rectangles filtering
    cfg.wh_ratio_range = (0.5, 1.7)
    # group_size_range starting from 2 will skip all the groups
    # with a single box detected inside (like checkboxes)
    cfg.group_size_range = (2, 100)
    # number of iterations when running the dilation transformation (to enhance the image)
    cfg.dilation_iterations = 0
    return cfg


def divide_checkbox(checkboxes, crop_image_file, pdf_file_name):
    img = cv2.imread(crop_image_file)
    checkbox_counter = 0
    for checkbox in checkboxes:
        checkbox_counter += 1
        x, y, w, h = checkbox[0]
        cropped = img[y:y + h, x:x + w]
        # mpimg.imsave("checkbox_image/out{}{}.png".format(pdf_file_name, checkbox_counter), cropped)
        plt.imshow(cropped)
        plt.show()


def get_all_checkboxes(crop_image_file, pdf_file_name):
    cfg = default_checkbox_config()
    checkboxes = get_checkboxes(
        crop_image_file, cfg=cfg, px_threshold=0.1, plot=False, verbose=True)
    divide_checkbox(checkboxes, crop_image_file, pdf_file_name)


try:
    pdf_file_name = "something"
    crop_image_file = "shady_tick3.png"  # just give your image path
    get_all_checkboxes(crop_image_file, pdf_file_name)
except Exception:
    print(traceback.format_exc())

Related

Multi-scale template match vs. Text Detection

I'm trying to automate the navigation of a website to grab data and download files using PyAutoGUI to detect images and buttons, but I'm having trouble using this on other people's computers. It seems to me that matching images of text is the biggest obstacle here.
I suspected the issue to be with scaling and resolution so I attempted using multi-scale template matching, but I found that using a template I upscaled wouldn't create a match at all. Using a template I downscaled didn't help either since it would either not find any matches, or find the wrong match even with a small range of confidences of 0.8-0.9.
Here's the original image at 74x17.
Here's the upscaled image at 348x80 (Windows Photo wouldn't let me upscale it any smaller for some reason).
Here's the downscaled image at 40x8.
Currently, with a downscaled image, PyAutoGUI is confusing the above image with this image:
Here's the code I wrote (and some I borrowed from someone).
Code for multi-scaling I borrowed:
import time

import cv2
import imutils
import numpy as np
import pyautogui
import pyscreeze


# Functions to search for resized versions of images
def template_match_with_scaling(image, gs=True, confidence=0.8):
    # Locate an image and return a pyscreeze box surrounding it.
    # Template matching is done by default in grayscale (gs=True).
    # Detect image if normalized correlation coefficient is > confidence (0.8 is default)
    templateim = pyscreeze._load_cv2(image, grayscale=gs)  # load the template image
    (tH, tW) = templateim.shape[:2]  # template height and width
    screenim_color = pyautogui.screenshot()  # screenshot of the screen
    screenim_color = cv2.cvtColor(np.array(screenim_color), cv2.COLOR_RGB2BGR)

    # Checking if the locateOnScreen() is utilized with grayscale=True or not
    if gs is True:
        screenim = cv2.cvtColor(np.array(screenim_color), cv2.COLOR_BGR2GRAY)
    else:
        screenim = screenim_color

    # Try different scaling parameters and see which one matches best
    found = None  # bookkeeping variable for the maximum correlation coefficient, position and scale
    scalingrange = np.linspace(0.25, 5, num=150)
    for scale in scalingrange:
        print("Trying another scale")
        resizedtemplate = imutils.resize(templateim, width=int(templateim.shape[1] * scale))  # resizing with imutils maintains the aspect ratio
        r = float(resizedtemplate.shape[1]) / templateim.shape[1]  # recompute scaling factor
        result = cv2.matchTemplate(screenim, resizedtemplate, cv2.TM_CCOEFF_NORMED)  # template matching using the correlation coefficient
        (_, maxVal, _, maxLoc) = cv2.minMaxLoc(result)  # minimum value, maximum value, (x, y) of the minimum, (x, y) of the maximum
        if found is None or maxVal > found[0]:
            found = (maxVal, maxLoc, r)

    (maxVal, maxLoc, r) = found
    if maxVal > confidence:
        box = pyscreeze.Box(int(maxLoc[0]), int(maxLoc[1]), int(tW * r), int(tH * r))
        return box
    else:
        return None


def locate_center_with_scaling(image, gs=True):
    loc = template_match_with_scaling(image, gs=gs)
    if loc:
        return pyautogui.center(loc)
    else:
        raise Exception("Image not found")
My code to match and click on a textbox next to its identifier:
while SKUnoCounter <= len(listOfSKUs):
    while pyautogui.locateOnScreen('DescriptionBox-RESIZEDsmall.png', grayscale=True, confidence=0.8) is None:
        print("Looking for Description Box.")
        if locate_center_with_scaling('DescriptionBox-RESIZEDsmall.png') is not None:
            print("Found a resized version of Description Box.")
            # Calling to function
            DB_x, DB_y = locate_center_with_scaling('DescriptionBox-RESIZEDsmall.png')
            # Clicking on Description text box
            pyautogui.click(DB_x + 417, DB_y + 12, button='left')
            break
        time.sleep(0.5)
Is it worthwhile to try and improve the accuracy of the multi-scale template matching if my goal is to use this across all kinds of computers? Would it be better to try using OCR to detect text instead of matching by image? My other idea here is to use PyTesseract to locate the text I'm searching for and then use those coordinates to click on things. Selenium does not work here as I need to work on an existing IE browser.
Any input here is greatly appreciated!
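(As an aside on the PyTesseract idea above: a minimal, untested sketch of locating a word on screen via OCR and clicking near it could look like the following. find_text_on_screen and the 'Description' target are illustrative placeholders, not code from this question.)

import cv2
import numpy as np
import pyautogui
import pytesseract

def find_text_on_screen(target_text):
    # OCR the whole screen and return the center of the first word matching target_text
    screen = cv2.cvtColor(np.array(pyautogui.screenshot()), cv2.COLOR_RGB2GRAY)
    data = pytesseract.image_to_data(screen, output_type=pytesseract.Output.DICT)
    for i, word in enumerate(data['text']):
        if word.strip() == target_text:
            x, y = data['left'][i], data['top'][i]
            w, h = data['width'][i], data['height'][i]
            return (x + w // 2, y + h // 2)
    return None

center = find_text_on_screen('Description')
if center is not None:
    pyautogui.click(center[0] + 417, center[1] + 12, button='left')  # same offset as the code below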
Following my comment above, this is how the modified function could look:
# Functions to search for resized versions of images
def template_match_with_scaling(image, gs=True, confidence=0.8, scalingrange=None):
    # Locate an image and return a pyscreeze box surrounding it.
    # Template matching is done by default in grayscale (gs=True).
    # Detect image if normalized correlation coefficient is > confidence (0.8 is default)
    templateim = pyscreeze._load_cv2(image, grayscale=gs)  # load the template image
    (tH, tW) = templateim.shape[:2]  # template height and width
    screenim_color = pyautogui.screenshot()  # screenshot of the screen
    screenim_color = cv2.cvtColor(np.array(screenim_color), cv2.COLOR_RGB2BGR)

    # Checking if the locateOnScreen() is utilized with grayscale=True or not
    if gs is True:
        screenim = cv2.cvtColor(np.array(screenim_color), cv2.COLOR_BGR2GRAY)
    else:
        screenim = screenim_color

    # Try different scaling parameters and see which one matches best
    found = None  # bookkeeping variable for the maximum correlation coefficient, position and scale
    for scalex in scalingrange:
        width = int(templateim.shape[1] * scalex)
        for scaley in scalingrange:
            # print("Trying another scale")
            # print(scalex, scaley)
            height = int(templateim.shape[0] * scaley)
            scaledsize = (width, height)
            # Resize the template; width and height are scaled independently
            resizedtemplate = cv2.resize(templateim, scaledsize)
            rx = float(resizedtemplate.shape[1]) / templateim.shape[1]  # recompute width scaling factor
            ry = float(resizedtemplate.shape[0]) / templateim.shape[0]  # recompute height scaling factor
            result = cv2.matchTemplate(screenim, resizedtemplate, cv2.TM_CCOEFF_NORMED)  # template matching using the correlation coefficient
            (_, maxVal, _, maxLoc) = cv2.minMaxLoc(result)  # minimum value, maximum value, (x, y) of the minimum, (x, y) of the maximum
            if found is None or maxVal > found[0]:
                found = (maxVal, maxLoc, rx, ry)

    (maxVal, maxLoc, rx, ry) = found
    print('maxVal= ', maxVal)
    if maxVal > confidence:
        box = pyscreeze.Box(int(maxLoc[0]), int(maxLoc[1]), int(tW * rx), int(tH * ry))
        return box
    else:
        return None


def locate_center_with_scaling(image, gs=True, **kwargs):
    loc = template_match_with_scaling(image, gs=gs, **kwargs)
    if loc:
        return pyautogui.center(loc)
    else:
        raise Exception("Image not found")


im = 'DescriptionBox.png'  # we will try to detect the small description box, whose width and height are scaled down by 0.54 and 0.47
unscaledLocation = pyautogui.locateOnScreen(im, grayscale=True, confidence=0.8)
srange = np.linspace(0.4, 0.6, num=20)  # scale width and height in this range
if unscaledLocation is None:
    print("Looking for Description Box.")
    scaledLocation = locate_center_with_scaling(im, scalingrange=srange)
    if scaledLocation is not None:
        print(f'Found a resized version of Description Box at ({scaledLocation[0]},{scaledLocation[1]})')
        pyautogui.moveTo(scaledLocation[0], scaledLocation[1])
We need to be mindful of two things:
template_match_with_scaling now executes a double loop, one over each dimension, so it will take some time to detect the template image. To amortize the detection time, we should save the scale parameters for width and height after the first detection and scale the template by these parameters for subsequent detections (see the sketch after this list).
To be able to detect the template efficiently, we need to set the scalingrange input of template_match_with_scaling to an appropriate range of values. If the range is too narrow or doesn't contain enough values, we will not be able to detect the template; if it is too wide, detection time will be long.
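As a rough, untested sketch of that caching idea (the cached_scale global, the locate_with_cached_scaling name, and the grayscale-only fast path are assumptions layered on top of the functions above):

cached_scale = None  # (rx, ry) remembered after the first successful detection

def locate_with_cached_scaling(image, gs=True, confidence=0.8, scalingrange=None):
    # First call: run the slow double-loop search and remember the winning scale factors
    global cached_scale
    if cached_scale is None:
        box = template_match_with_scaling(image, gs=gs, confidence=confidence,
                                          scalingrange=scalingrange)
        if box is not None:
            templateim = pyscreeze._load_cv2(image, grayscale=gs)
            cached_scale = (box.width / templateim.shape[1],   # width factor
                            box.height / templateim.shape[0])  # height factor
        return box
    # Subsequent calls: resize the template once by the cached factors and match directly
    rx, ry = cached_scale
    templateim = pyscreeze._load_cv2(image, grayscale=gs)
    resized = cv2.resize(templateim, (int(templateim.shape[1] * rx),
                                      int(templateim.shape[0] * ry)))
    screen = cv2.cvtColor(np.array(pyautogui.screenshot()), cv2.COLOR_RGB2GRAY)  # grayscale assumed
    result = cv2.matchTemplate(screen, resized, cv2.TM_CCOEFF_NORMED)
    _, maxVal, _, maxLoc = cv2.minMaxLoc(result)
    if maxVal > confidence:
        return pyscreeze.Box(maxLoc[0], maxLoc[1], resized.shape[1], resized.shape[0])
    return None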

How to do interactive image binarization using trackbars?

I have code which gives me binary images using Otsu thresholding. I am making a dataset for a U-Net, and I want to try different thresholding algorithms (global as well as local) so that I can save the "best" image. Below is the code for my image binarization.
import cv2
import numpy as np
import skimage.filters as filters
img = cv2.imread('math.png') # read the image
gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY) # convert to gray
smooth = cv2.GaussianBlur(gray, (95,95), 0) # blur
division = cv2.divide(gray, smooth, scale=255) # divide gray by the smoothed image
# Add Morphology here. Dilation, Erosion, Opening, Closing or so.
sharp = filters.unsharp_mask(division, radius=1.5, amount=1.5, multichannel=False, preserve_range=False) # sharpen using unsharp masking
sharp = (255*sharp).clip(0,255).astype(np.uint8)
thresh = cv2.threshold(sharp, 0, 255, cv2.THRESH_OTSU )[1] # threshold
I am getting pretty decent results with a broader area, but I want to use cv2.namedWindow, cv2.createTrackbar, cv2.getTrackbarPos, so that I can set the values of radius, amount, kernel, dilation, erosion, etc. by using below functions.
cv2.namedWindow('Tracking Window')
cv2.createTrackbar('param1','Tracking Window',0,255,dummy) # dummy is just a dummy function which returns None
param1 = cv2.getTrackbarPos('param1','Tracking Window')
How can I wire all of this up? Also, how can I save the image when I press s, and open the next image by pressing n?
Original Question was posted by me 6 months ago
The code of my solution got longer than expected, but it offers some fancy manipulation possibilities. First of all, let's see the actual window:
There are sliders for
the morphological operation (dilate, erode, close, open),
the structuring element (rectangle, ellipse, cross), and
the kernel size (here: limited to the range 1 ... 21).
The window name reflects the current settings for the first two sliders:
When pressing s, the image is saved incorporating the current settings:
Saved image as Erosion_Ellipsoid_SLEM_11.png.
When pressing n, the next image from a list is selected.
At any time, when pressing q, the application is exited. It ends automatically after processing all images from the list.
Before and after the interactive part, you can add any operation you want, cf. the code.
And, here's the full code:
import cv2

# Collect morphological operations
morphs = [cv2.MORPH_DILATE, cv2.MORPH_ERODE, cv2.MORPH_CLOSE, cv2.MORPH_OPEN]

# Collect some texts for later
morph_texts = {
    cv2.MORPH_DILATE: 'Dilation',
    cv2.MORPH_ERODE: 'Erosion',
    cv2.MORPH_CLOSE: 'Closing',
    cv2.MORPH_OPEN: 'Opening'
}

# Collect structuring elements
slems = [cv2.MORPH_RECT, cv2.MORPH_ELLIPSE, cv2.MORPH_CROSS]

# Collect some texts for later
slem_texts = {
    cv2.MORPH_RECT: 'Rectangular SLEM',
    cv2.MORPH_ELLIPSE: 'Ellipsoid SLEM',
    cv2.MORPH_CROSS: 'Cross SLEM'
}

# Collect images
images = [...]

# Set up maximum values for each slider
max_morph = len(morphs) - 1
max_slem = len(slems) - 1
max_ks = 21

# Set up initial values for each slider
morph = 0
slem = 0
ks = 1

# Set up initial working image
temp = None

# Set up initial window name
title_window = 'Interactive {} with {}'.format(morph_texts[morphs[morph]],
                                               slem_texts[slems[slem]])


# Triggered when any slider is manipulated
def on_trackbar(unused):
    global image, ks, morph, slem, temp, title_window

    # Get current slider values
    morph = cv2.getTrackbarPos('Operation', title_window)
    slem = cv2.getTrackbarPos('SLEM', title_window)
    ks = cv2.getTrackbarPos('Kernel size', title_window)

    # Reset window name
    cv2.setWindowTitle(title_window, 'Interactive {} with {}'.
                       format(morph_texts[morphs[morph]],
                              slem_texts[slems[slem]]))

    # Get current morphological operation and structuring element
    op = morphs[morph]
    sl = cv2.getStructuringElement(slems[slem], (ks, ks))

    # Actual morphological operation
    temp = cv2.morphologyEx(image.copy(), op, sl)

    # Show manipulated image with current settings
    cv2.imshow(title_window, temp)


# Iterate images
for image in images:

    # Here go your steps before the interactive part
    # ...
    image = cv2.threshold(cv2.cvtColor(image.copy(), cv2.COLOR_BGR2GRAY),
                          0, 255, cv2.THRESH_OTSU)[1]

    # Here starts the interactive part
    cv2.namedWindow(title_window)
    cv2.createTrackbar('Operation', title_window, morph, max_morph, on_trackbar)
    cv2.createTrackbar('SLEM', title_window, slem, max_slem, on_trackbar)
    cv2.createTrackbar('Kernel size', title_window, ks, max_ks, on_trackbar)
    cv2.setTrackbarMin('Kernel size', title_window, 1)
    on_trackbar(0)
    k = cv2.waitKey(0)

    # Exit every time on pressing q
    while k != ord('q'):

        # Save image on pressing s
        if k == ord('s'):

            # Here go your steps after the interactive part, but before saving
            # ...
            filename = '{} {} {}.png'.\
                format(morph_texts[morphs[morph]],
                       slem_texts[slems[slem]],
                       ks).replace(' ', '_')
            cv2.imwrite(filename, temp)
            print('Saved image as {}.'.format(filename))

        # Go to next image on pressing n
        elif k == ord('n'):
            print('Next image')
            break

        # Continue if any other key was pressed
        k = cv2.waitKey(0)

    # Actual exiting
    if k == ord('q'):
        break
Hopefully, the code is self-explanatory. If not, don't hesitate to ask questions. You should be able to easily add every slider you additionally need by yourself, e.g. for the filters.unsharp_mask stuff.
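For instance, a minimal sketch of two extra sliders for the filters.unsharp_mask parameters might look like this; the 'Radius x10'/'Amount x10' names and the divide-by-ten trick (OpenCV trackbars are integer-only) are assumptions, and the fragment is meant to be merged into the code above rather than run on its own:

import skimage.filters as filters

# Next to the existing createTrackbar calls:
cv2.createTrackbar('Radius x10', title_window, 15, 100, on_trackbar)
cv2.createTrackbar('Amount x10', title_window, 15, 50, on_trackbar)

# Inside on_trackbar, before the morphological operation:
radius = cv2.getTrackbarPos('Radius x10', title_window) / 10
amount = cv2.getTrackbarPos('Amount x10', title_window) / 10
sharpened = filters.unsharp_mask(image, radius=radius, amount=amount, preserve_range=False)
image_sharp = (255 * sharpened).clip(0, 255).astype('uint8')
# ... then run cv2.morphologyEx on image_sharp instead of image.copy()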
----------------------------------------
System information
----------------------------------------
Platform: Windows-10-10.0.16299-SP0
Python: 3.9.1
OpenCV: 4.5.1
----------------------------------------

How can I get the dimensions of a picture placeholder to re-size an image when creating a presentation and inserting a picture using python-pptx?

I'm trying to insert a picture that is re-sized to fit the dimensions of the picture placeholder from a template using python-pptx. I don't believe the API has direct access to this from what I can find out in the docs. Is there any suggestion of how I might be able to do this, using the library or other?
I have a running code that will insert a series of images into a set of template slides to automatically create a report using Powerpoint.
Here is the function that is doing the majority of the relevant work. Other parts of the app create the Presentation, insert slides, etc.
import os

from pptx.util import Pt
from pptx.dml.color import RGBColor


def insert_images(slide, slide_num, images_path, image_df):
    """
    Insert images into a slide.

    :param slide: slide object from the Presentation class
    :param slide_num: the template slide number for formatting
    :param images_path: the directory to the folder with all the images
    :param image_df: pandas data frame with information on each image in images_path
    :return: None
    """
    placeholders = get_image_placeholders(slide)
    # print(placeholders)
    image_pool = image_df[image_df['slide_num'] == slide_num]
    try:
        assert len(placeholders) == len(image_pool.index)
    except AssertionError:
        print('Length of placeholders in slide does not match image naming.')
    i = 0
    for idx, image in image_pool.iterrows():
        # print(image)
        image_path = os.path.join(images_path, image.path)
        pic = slide.placeholders[placeholders[i]].insert_picture(image_path)
        # print(image.path)
        # TODO: Add resize - get dimensions of pic placeholder
        line = pic.line
        print(image['view'])
        if image['view'] == 'red':
            line.color.rgb = RGBColor(255, 0, 0)
        elif image['view'] == 'green':
            line.color.rgb = RGBColor(0, 255, 0)
        elif image['view'] == 'blue':
            line.color.rgb = RGBColor(0, 0, 255)
        else:
            line.color.rgb = RGBColor(0, 0, 0)
        line.width = Pt(2.25)
        i += 1
The issue is that when I insert a picture into the picture placeholder, the image is cropped, not re-sized. I don't want the user to know the dimensions to hard code into my script. If the image used is relatively large it can crop a very large portion and just not be usable.
The picture object returned by PicturePlaceholder.insert_picture() has the same position and size as the placeholder it derives from. It is cropped to completely fill that space. Either the tops and bottoms are cropped or the left and right sides, depending on the relative aspect ratio of the placeholder and the image you insert. This is the same behavior PowerPoint exhibits when you insert a picture into a picture placeholder.
If you want to remove the cropping, simply set all cropping values to 0:
picture = placeholder.insert_picture(...)
picture.crop_top = 0
picture.crop_left = 0
picture.crop_bottom = 0
picture.crop_right = 0
This will not change the position (of the top-left corner) but will almost always change the size, making it either wider or taller (but not both).
So this solves the first problem easily, but of course presents you with a second one, which is how to position the picture where you want it and how to scale it appropriately without changing the aspect ratio (stretching or squeezing it).
This depends a great deal on what you're trying to accomplish and what outcome you find most pleasing. This is why it is not automatic; it's just not possible to predict.
You can find the "native" width and height of the image like this:
width, height = picture.image.size # ---width and height are int pixel-counts
From there you'll need to compare aspect ratios of the original placeholder and the image you inserted and either adjust the width or height of the picture shape.
So say you wanted to keep the same position but maintain the width and height of the placeholder as respective maximums such that the entire picture fits in the space but has a "margin" either on the bottom or the right:
available_width = picture.width
available_height = picture.height
image_width, image_height = picture.image.size
placeholder_aspect_ratio = float(available_width) / float(available_height)
image_aspect_ratio = float(image_width) / float(image_height)

# Get initial image placeholder left and top positions
pos_left, pos_top = picture.left, picture.top

picture.crop_top = 0
picture.crop_left = 0
picture.crop_bottom = 0
picture.crop_right = 0

# ---if the placeholder is "wider" in aspect, shrink the picture width while
# ---maintaining the image aspect ratio
if placeholder_aspect_ratio > image_aspect_ratio:
    picture.width = int(image_aspect_ratio * available_height)
    picture.height = available_height
# ---otherwise shrink the height
else:
    picture.height = int(available_width / image_aspect_ratio)
    picture.width = available_width

# Set the picture left and top position to the initial placeholder one
picture.left, picture.top = pos_left, pos_top

# Or, to center it vertically within the original placeholder height:
# picture.top = pos_top + int((available_height - picture.height) / 2)
This could be elaborated to "center" the image within the original space and perhaps to use "negative cropping" to retain the original placeholder size.
I haven't tested this and you might need to make some adjustments, but hopefully this gives you an idea how to proceed. This would be a good thing to extract to its own function, like adjust_picture_to_fit(picture).
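Under the assumption that "fit and center inside the original placeholder box" is the desired behaviour, such a helper might be sketched like this (untested; position and size values are plain EMU integers):

def adjust_picture_to_fit(picture):
    """Un-crop a picture inserted into a placeholder, scale it to fit inside
    the placeholder's original box without distortion, and center it there."""
    available_width, available_height = picture.width, picture.height
    pos_left, pos_top = picture.left, picture.top
    image_width, image_height = picture.image.size

    # Remove the automatic cropping applied by insert_picture()
    picture.crop_top = 0
    picture.crop_left = 0
    picture.crop_bottom = 0
    picture.crop_right = 0

    placeholder_aspect_ratio = available_width / available_height
    image_aspect_ratio = image_width / image_height

    if placeholder_aspect_ratio > image_aspect_ratio:
        # Placeholder is relatively wider: height is the limiting dimension
        picture.height = available_height
        picture.width = int(available_height * image_aspect_ratio)
    else:
        # Placeholder is relatively taller: width is the limiting dimension
        picture.width = available_width
        picture.height = int(available_width / image_aspect_ratio)

    # Center the picture inside the original placeholder area
    picture.left = pos_left + int((available_width - picture.width) / 2)
    picture.top = pos_top + int((available_height - picture.height) / 2)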
This worked for me. My image is larger than the placeholder (slide.shapes[2]).
picture = slide.shapes[2].insert_picture(img_path)
picture.crop_top = 0
picture.crop_left = 0
picture.crop_bottom = 0
picture.crop_right = 0

Detect if an OCR text image is upside down

I have some hundreds of images (scanned documents), most of them are skewed. I wanted to de-skew them using Python.
Here is the code I used:
import numpy as np
import cv2
from skimage.transform import radon

filename = 'path_to_filename'

# Load file, converting to grayscale
img = cv2.imread(filename)
I = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
h, w = I.shape

# If the resolution is high, resize the image to reduce processing time.
if (w > 640):
    I = cv2.resize(I, (640, int((h / w) * 640)))

I = I - np.mean(I)  # Demean; make the brightness extend above and below zero

# Do the radon transform
sinogram = radon(I)

# Find the RMS value of each row and find the "busiest" rotation,
# where the transform is lined up perfectly with the alternating dark
# text and white lines
r = np.array([np.sqrt(np.mean(np.abs(line) ** 2)) for line in sinogram.transpose()])
rotation = np.argmax(r)
print('Rotation: {:.2f} degrees'.format(90 - rotation))

# Rotate and save with the original resolution
M = cv2.getRotationMatrix2D((w / 2, h / 2), 90 - rotation, 1)
dst = cv2.warpAffine(img, M, (w, h))
cv2.imwrite('rotated.jpg', dst)
This code works well with most of the documents, except with some angles: it often detects (180 and 0) as the same angle, and likewise (90 and 270), i.e. it does not differentiate within those pairs. So I get a lot of upside-down documents.
Here is an example:
The resulted image that I get is the same as the input image.
Is there any suggestion to detect if an image is upside down using OpenCV and Python?
PS: I tried to check the orientation using EXIF data, but it didn't lead to any solution.
EDIT:
It is possible to detect the orientation using Tesseract (pytesseract for Python), but it is only possible when the image contains a lot of characters.
For anyone who may need this:
import cv2
import pytesseract
print(pytesseract.image_to_osd(cv2.imread(file_name)))
If the document contains enough characters, it is possible for Tesseract to detect the orientation. However, when the image has few lines, the orientation angle suggested by Tesseract is usually wrong. So this can not be a 100% solution.
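If only the rotation angle is needed from that output, one possible (untested) way to parse and apply it relies on the "Rotate: <angle>" line that Tesseract's OSD output normally contains:

import re
import cv2
import pytesseract

img = cv2.imread(file_name)
osd = pytesseract.image_to_osd(img)
rotate_angle = int(re.search(r'Rotate: (\d+)', osd).group(1))
if rotate_angle == 180:
    img = cv2.rotate(img, cv2.ROTATE_180)  # flip an upside-down page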
Python3/OpenCV4 script to align scanned documents.
Rotate the document and sum the rows. When the document has 0 and 180 degrees of rotation, there will be a lot of black pixels in the image:
Use a score-keeping method. Score each image for its likeness to a zebra pattern. The image with the best score has the correct rotation. The image you linked to was off by 0.5 degrees. I omitted some functions for readability; the full code can be found here.
# Rotate the image around in a circle
angle = 0
scores = []  # score bookkeeping (initialisation omitted in the original excerpt)
while angle <= 360:
    # Rotate the source image
    img = rotate(src, angle)
    # Crop the center 1/3rd of the image (roi is filled with text)
    h, w = img.shape
    buffer = min(h, w) - int(min(h, w) / 1.15)
    roi = img[int(h/2-buffer):int(h/2+buffer), int(w/2-buffer):int(w/2+buffer)]
    # Create background to draw transform on
    bg = np.zeros((buffer*2, buffer*2), np.uint8)
    # Compute the sums of the rows
    row_sums = sum_rows(roi)
    # High score --> Zebra stripes
    score = np.count_nonzero(row_sums)
    scores.append(score)
    # Image has best rotation
    if score <= min(scores):
        # Save the rotated image
        print('found optimal rotation')
        best_rotation = img.copy()
    k = display_data(roi, row_sums, buffer)
    if k == 27: break
    # Increment angle and try again
    angle += .75
cv2.destroyAllWindows()
How to tell if the document is upside down? Fill in the area from the top of the document to the first non-black pixel in the image. Measure the area in yellow. The image that has the smallest area will be the one that is right-side-up:
# Find the area from the top of page to top of image
_, bg = area_to_top_of_text(best_rotation.copy())
right_side_up = sum(sum(bg))
# Flip image and try again
best_rotation_flipped = rotate(best_rotation, 180)
_, bg = area_to_top_of_text(best_rotation_flipped.copy())
upside_down = sum(sum(bg))
# Check which area is larger
if right_side_up < upside_down: aligned_image = best_rotation
else: aligned_image = best_rotation_flipped
# Save aligned image
cv2.imwrite('/home/stephen/Desktop/best_rotation.png', 255-aligned_image)
cv2.destroyAllWindows()
Assuming you did run the angle-correction already on the image, you can try the following to find out if it is flipped:
Project the corrected image to the y-axis, so that you get a 'peak' for each line. Important: There are actually almost always two sub-peaks!
Smooth this projection by convolving with a gaussian in order to get rid of fine structure, noise, etc.
For each peak, check if the stronger sub-peak is on top or at the bottom.
Calculate the fraction of peaks that have sub-peaks on the bottom side. This is your scalar value that gives you the confidence that the image is oriented correctly.
The peak finding in step 3 is done by finding sections with above average values. The sub-peaks are then found via argmax.
Here's a figure to illustrate the approach on a few lines of your example image:
Blue: Original projection
Orange: smoothed projection
Horizontal line: average of the smoothed projection for the whole image.
Here's some code that does this:
import cv2
import numpy as np
# load image, convert to grayscale, threshold it at 127 and invert.
page = cv2.imread('Page.jpg')
page = cv2.cvtColor(page, cv2.COLOR_BGR2GRAY)
page = cv2.threshold(page, 127, 255, cv2.THRESH_BINARY_INV)[1]
# project the page to the side and smooth it with a gaussian
projection = np.sum(page, 1)
gaussian_filter = np.exp(-(np.arange(-3, 3, 0.1)**2))
gaussian_filter /= np.sum(gaussian_filter)
smooth = np.convolve(projection, gaussian_filter)
# find the pixel values where we expect lines to start and end
mask = smooth > np.average(smooth)
edges = np.convolve(mask, [1, -1])
line_starts = np.where(edges == 1)[0]
line_endings = np.where(edges == -1)[0]
# count lines with peaks on the lower side
lower_peaks = 0
for start, end in zip(line_starts, line_endings):
    line = smooth[start:end]
    if np.argmax(line) < len(line) / 2:
        lower_peaks += 1
print(lower_peaks / len(line_starts))
This prints 0.125 for the given image, so it is not oriented correctly and must be flipped.
Note that this approach might break badly if there are images or anything not organized in lines in the image (maybe math or pictures). Another problem would be too few lines, resulting in bad statistics.
Also different fonts might result in different distributions. You can try this on a few images and see if the approach works. I don't have enough data.
You can use the Alyn module. To install it:
pip install alyn
Then to use it to deskew images(Taken from the homepage):
from alyn import Deskew

d = Deskew(
    input_file='path_to_file',
    display_image='preview the image on screen',
    output_file='path_for_deskewed image',
    r_angle='offset_angle_in_degrees_to_control_orientation')
d.run()
Note that Alyn is only for deskewing text.

How can I find cycles in a skeleton image with Python libraries?

I have many skeletonized images like this:
How can I detect a cycle (a loop) in the skeleton?
Are there "special" functions that do this or should I implement it as a graph?
If the graph route is the only option, can the Python graph library NetworkX help me?
You can exploit the topology of the skeleton. A cycle will have no holes, so we can use scipy.ndimage to find any holes and compare. This isn't the fastest method, but it's extremely easy to code.
import scipy.misc, scipy.ndimage

# Read the image (scipy.misc.imread was removed in newer SciPy; imageio.imread is a drop-in replacement there)
img = scipy.misc.imread("Skel.png")

# Retain only the skeleton
img[img != 255] = 0
img = img.astype(bool)

# Fill the holes
img2 = scipy.ndimage.binary_fill_holes(img)

# Compare the two; an image without cycles will have no holes
print("Cycles in image: ", ~(img == img2).all())

# As a test, break the cycles
img3 = img.copy()
img3[0:200, 0:200] = 0

img4 = scipy.ndimage.binary_fill_holes(img3)

# Compare the two; an image without cycles will have no holes
print("Cycles in image: ", ~(img3 == img4).all())
I've used your "B" picture as an example. The first two images are the original and the filled version which detects a cycle. In the second version, I've broken the cycle and nothing gets filled, thus the two images are the same.
First, let's build an image of the letter B with PIL:
from PIL import Image, ImageDraw, ImageFont
import numpy as np
import matplotlib.pyplot as plt

image = Image.new("RGBA", (600, 150), (255, 255, 255))
draw = ImageDraw.Draw(image)
fontsize = 150
font = ImageFont.truetype("/usr/share/fonts/truetype/liberation/LiberationMono-Regular.ttf", fontsize)
txt = 'B'
draw.text((30, 5), txt, (0, 0, 0), font=font)
img = image.resize((188, 45), Image.ANTIALIAS)
print(type(img))
plt.imshow(img)
You may find a better way to do that, particularly with the path to the fonts. It would be better to load an image instead of generating one. Anyway, we now have something to work on:
Now, the real part:
import mahotas as mh

img = np.array(img)
im = img[:, 0:50, 0]
im = im < 128
skel = mh.thin(im)
noholes = mh.morph.close_holes(skel)

plt.subplot(311)
plt.imshow(im)
plt.subplot(312)
plt.imshow(skel)
plt.subplot(313)

cskel = np.logical_not(skel)
choles = np.logical_not(noholes)
holes = np.logical_and(cskel, noholes)
lab, n = mh.label(holes)
print('B has %s holes' % str(n))
plt.imshow(lab)
And we have in the console (ipython):
B has 2 holes
Converting your skeleton image to a graph representation is not trivial, and I don't know of any tools to do that for you.
One way to do it in the bitmap would be to use a flood fill, like the paint bucket in photoshop. If you start a flood fill of the image, the entire background will get filled if there are no cycles. If the fill doesn't get the entire image then you've found a cycle. Robustly finding all the cycles could require filling multiple times.
This is likely to be very slow to execute, but probably much faster to code than a technique where you trace the skeleton into graph data structure.
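A minimal sketch of that flood-fill idea with OpenCV, assuming a white skeleton on a black background in a hypothetical skeleton.png and that the top-left corner pixel belongs to the background:

import cv2
import numpy as np

skel = cv2.imread('skeleton.png', cv2.IMREAD_GRAYSCALE)
binary = (skel > 127).astype(np.uint8)  # 1 = skeleton, 0 = background

# Flood-fill the background starting from the top-left corner
h, w = binary.shape
ff_mask = np.zeros((h + 2, w + 2), np.uint8)  # floodFill needs a mask 2 px larger than the image
filled = binary.copy()
cv2.floodFill(filled, ff_mask, (0, 0), 1)

# Any pixel still 0 is background the fill could not reach, i.e. enclosed by a cycle
print('Cycle found:', bool((filled == 0).any()))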
