the full error says:
cv2.error: OpenCV(4.1.0) C:\projects\opencv-python\opencv\modules\dnn\src\caffe\caffe_io.cpp:1132: error: (-2:Unspecified error) FAILED: fs.is_open(). Can't open "frozen_east_text_detection.pb" in function 'c
here i have the code:
import time
from imutils.object_detection import non_max_suppression
import numpy as np
import cv2
import argparse
imagePath = "C:/xampp/htdocs/Tensorflow/TextDetection-master/images/tabla1.jpg"
east = "frozen_east_text_detection.pb"
newW = 640
newH = 480
min_confidence = 0.5
image = cv2.imread(imagePath) # it is working variable
orig = image.copy()
(H,W) = image.shape[:2]
set the new width an height and then determine the ratio in change
rW = W / float(newW)
rH = H / float(newH)
resize the image and grab the new image dimensions
image = cv2.resize(image, (newW, newH))
(H,W) = image.shape[:2]
"""
In order to perform text detection using OpenCV and the EAST deep learning model,
we need to extract the output feature maps of TWO LAYERS
"""
# Define the TWO output layer names ofr the EAST detector model
# FIRST LAYER : output probabilities of a region containing text or not.
# SECOND LAYER: bounding box coordinates of text.
layerNames = [
"feature_fusion/Conv_7/Sigmoid",
"feature_fusion/concat_3"
]
print("[INFO] loading EAST detector")
net = cv2.dnn.readNet( east ) #Load the Nerual Network into memory
# construct a blob from the image and then perform a forward pass of
# the model to obtain the two output layer sets
blob = cv2.dnn.blobFromImage( #https://www.pyimagesearch.com/2017/11/06/deep-learning-opencvs-blobfromimage-works/
image , 1.0 , (W,H) , (123.68, 116.78, 103.94) , swapRB=True, crop=False
)
start = time.time()
net.setInput(blob)
(scores, geometry) = net.forward(layerNames)
end = time.time()
print("[INFO] text detection took {:.6f} seconds".format(end-start))
# grab the umber of rows and columns from the scores volume, then
# initialize aour set of bounding box rectagles and corresponding
# cofidence scores
(numRows, numCols) = scores.shape[2:4]
rects = []
confidences = []
# loop over the number of rows
for y in range(0,numRows):
#extract the scores (probabilities), followed by the geometrical
#data used to derive potential bounding box coordinates that
#surround text
scoresData = scores[0, 0, y]
xData0 = geometry[0, 0, y]
xData1 = geometry[0, 1, y]
xData2 = geometry[0, 2, y]
xData3 = geometry[0, 3, y]
anglesData = geometry[0, 4, y]
#loop over the number of columns
for x in range(0, numCols):
#i our score does not have sufficient probability, ignore it
if (scoresData[x] < min_confidence):
continue
# compute the offset factor as our resulting feature maps will
# be 4x smaller than the input image
(offsetX, offsetY) = (x * 4.0 , y * 4.0)
# extract the rotation angle for the prediction and then
# compute the sin and cosine
angle = anglesData[x]
cos = np.cos(angle)
sin = np.sin(angle)
# use the geometry volume to derive the width and height of
# the bounding box
h = xData0[x] + xData2[x]
w = xData1[x] + xData3[x]
# compute both the starting and ending (x,y)-coordinates for
# the text prediction bounding box
endX = int(offsetX + (cos * xData1[x]) + (sin * xData2[x]) )
endY = int(offsetY - (sin * xData1[x]) + (cos * xData2[x]) )
startX = int(endX - w)
startY = int(endY - h)
#add the bounding box coordinates and probability score to
#our respective lists
rects.append( (startX, startY, endX, endY) )
confidences.append( scoresData[x] )
#apply non-maxima suppression to suppress weak, overlapping bounding boxes
boxes = non_max_suppression(np.array(rects) , probs=confidences) #imutils-> https://github.com/jrosebr1/imutils/blob/master/imutils/object_detection.py#L4
#loop over the bounding boxes
for (startX, startY, endX, endY) in boxes:
# scale the bounding box coordinates based on the respective ratios
startX = int(startX * rW)
startY = int(startY * rH)
endX = int(endX * rW)
endY = int(endY *rH)
#draw the bounding box on the image
cv2.rectangle( orig , (startX,startY) , (endX,endY) , (0,255,0) , 2 )
#show the outut image
cv2.imshow("Text detection",orig)
cv2.waitKey(0)
cv2.destroyAllWindows()
you can fix this by providing the absolute filepath of frozen_east_text_detection.pb from the main function of your code
Use this link to download model
https://www.dropbox.com/s/r2ingd0l3zt8hxs/frozen_east_text_detection.tar.gz?dl=1
you can further refer to OpenCV official blog
https://learnopencv.com/deep-learning-based-text-detection-using-opencv-c-python/
Related
I am trying to create a GAN model which will remove watermark. After doing some homework, I got to this Google AI Blog which makes things worse. Thus I need to create a dataset from these websites Shutterstock, Adobe Stock, Fotolia and Canstock and manymore.
So, when I try to do same image using reverse image search. I founded out that the resolutions, images are changed which makes it more worse.
Thus, I'm only left to create a custom dataset doing the same watermark like from these websites and that's why I need to create same watermark like them on images from unsplash and so..
Can anyone please help me create same watermark which we can get from Shutterstock and Adobe Stock. It'd be a great help.
Note: I have gone through this link for watermark using Imagemagick but I need it in python. If someone can show me a way of doing the same in python. That'd be a great help.
EDIT1: If you look at this Example of Shutterstock. Zoom in and you will find that not only lines but text and rounded symbols are curved and also name and rounded symbol with different opacity. So, that's what I want to replicate.
Here is one way to do that in Python/OpenCV.
Read the input
Create an image of the text
Rotate the text image
Tile out the rotated text image to the size of the input
Blend the tiled, rotated text image with the input image
Save the output
Input:
import cv2
import numpy as np
import math
text = "WATERMARK"
thickness = 2
scale = 0.75
pad = 5
angle = -45
blend = 0.25
def rotate_bound(image, angle):
# function to rotate an image
# from https://github.com/PyImageSearch/imutils/blob/master/imutils/convenience.py
# grab the dimensions of the image and then determine the center
(h, w) = image.shape[:2]
(cX, cY) = (w / 2, h / 2)
# grab the rotation matrix (applying the negative of the
# angle to rotate clockwise), then grab the sine and cosine
# (i.e., the rotation components of the matrix)
M = cv2.getRotationMatrix2D((cX, cY), -angle, 1.0)
cos = np.abs(M[0, 0])
sin = np.abs(M[0, 1])
# compute the new bounding dimensions of the image
nW = int((h * sin) + (w * cos))
nH = int((h * cos) + (w * sin))
# adjust the rotation matrix to take into account translation
M[0, 2] += (nW / 2) - cX
M[1, 2] += (nH / 2) - cY
# perform the actual rotation and return the image
return cv2.warpAffine(image, M, (nW, nH))
# read image
photo = cv2.imread('lena.jpg')
ph, pw = photo.shape[:2]
# determine size for text image
(wd, ht), baseLine = cv2.getTextSize(text, cv2.FONT_HERSHEY_SIMPLEX, scale, thickness)
print (wd, ht, baseLine)
# add text to black background image padded all around
pad2 = 2 * pad
text_img = np.zeros((ht+pad2,wd+pad2,3), dtype=np.uint8)
text_img = cv2.putText(text_img, text, (pad,ht+pad), cv2.FONT_HERSHEY_SIMPLEX, scale, (255,255,255), thickness)
# rotate text image
text_rot = rotate_bound(text_img, angle)
th, tw = text_rot.shape[:2]
# tile the rotated text image to the size of the input
xrepeats = math.ceil(pw/tw)
yrepeats = math.ceil(ph/th)
print(yrepeats,xrepeats)
tiled_text = np.tile(text_rot, (yrepeats,xrepeats,1))[0:ph, 0:pw]
# combine the text with the image
result = cv2.addWeighted(photo, 1, tiled_text, blend, 0)
# save results
cv2.imwrite("text_img.png", text_img)
cv2.imwrite("text_img_rot.png", text_rot)
cv2.imwrite("lena_tiled_rotated_text_img.jpg", result)
# show the results
cv2.imshow("text_img", text_img)
cv2.imshow("text_rot", text_rot)
cv2.imshow("tiled_text", tiled_text)
cv2.imshow("result", result)
cv2.waitKey(0)
Text Image:
Rotated Text Image:
Result:
Here is another variation in Python/OpenCV that does outline font for the watermark. I have made the font size larger so that the outline is more visible.
import cv2
import numpy as np
import math
text = "WATERMARK"
thickness = 2
scale = 1.5
pad = 5
angle = -45
blend = 0.4
# function to rotate an image
def rotate_bound(image, angle):
# from https://github.com/PyImageSearch/imutils/blob/master/imutils/convenience.py
# grab the dimensions of the image and then determine the center
(h, w) = image.shape[:2]
(cX, cY) = (w / 2, h / 2)
# grab the rotation matrix (applying the negative of the
# angle to rotate clockwise), then grab the sine and cosine
# (i.e., the rotation components of the matrix)
M = cv2.getRotationMatrix2D((cX, cY), -angle, 1.0)
cos = np.abs(M[0, 0])
sin = np.abs(M[0, 1])
# compute the new bounding dimensions of the image
nW = int((h * sin) + (w * cos))
nH = int((h * cos) + (w * sin))
# adjust the rotation matrix to take into account translation
M[0, 2] += (nW / 2) - cX
M[1, 2] += (nH / 2) - cY
# perform the actual rotation and return the image
return cv2.warpAffine(image, M, (nW, nH))
# read image
photo = cv2.imread('lena.jpg')
ph, pw = photo.shape[:2]
# determine size for text image
(wd, ht), baseLine = cv2.getTextSize(text, cv2.FONT_HERSHEY_SIMPLEX, scale, thickness)
print (wd, ht, baseLine)
# add text to black background image padded all around
# write thicker white text and then write over that with thinner gray text to make outline text
pad2 = 2 * pad
text_img = np.zeros((ht+pad2,wd+pad2,3), dtype=np.uint8)
text_img = cv2.putText(text_img, text, (pad,ht+pad), cv2.FONT_HERSHEY_SIMPLEX, scale, (256,256,256), thickness+3)
text_img = cv2.putText(text_img, text, (pad,ht+pad), cv2.FONT_HERSHEY_SIMPLEX, scale, (128,128,128), thickness)
# rotate text image
text_rot = rotate_bound(text_img, angle)
th, tw = text_rot.shape[:2]
# tile the rotated text image to the size of the input
xrepeats = math.ceil(pw/tw)
yrepeats = math.ceil(ph/th)
print(yrepeats,xrepeats)
tiled_text = np.tile(text_rot, (yrepeats,xrepeats,1))[0:ph, 0:pw]
# combine the text with the image
#result = cv2.addWeighted(photo, 1, tiled_text, blend, 0)
mask = blend * cv2.threshold(tiled_text, 0, 255, cv2.THRESH_BINARY)[1]
result = (mask * tiled_text.astype(np.float64) + (255-mask)*photo.astype(np.float64))/255
result = result.clip(0,255).astype(np.uint8)
# save results
cv2.imwrite("text_img.png", text_img)
cv2.imwrite("text_img_rot.png", text_rot)
cv2.imwrite("lena_tiled_rotated_text_img2.jpg", result)
# show the results
cv2.imshow("text_img", text_img)
cv2.imshow("text_rot", text_rot)
cv2.imshow("tiled_text", tiled_text)
cv2.imshow("result", result)
cv2.waitKey(0)
Result:
I was wondering, given the type of interpolation that is used for image resizes using cv2.resize. How can I find out exactly where a particular pixel maps too? For example, if I'm increasing the size of an image using Linear_interpolation and I take coordinates (785, 251) for a particular pixel, regardless of whether or not the aspect ratio changes between the source image and resized image, how could I find out exactly to what coordinates the pixel in the source image with coordinates == (785, 251) maps in the resized version? I've looked over the internet for a solution but all solutions seem to be indirect methods of finding out where a pixel maps that don't actually work for different aspect ratio's:
https://answers.opencv.org/question/209827/resize-and-remap/
After resizing an image with cv2, how to get the new bounding box coordinate
Is there a way through cv2 to access the way pixels are mapped maybe and through reversing the script finding out the new coordinates?
The reason why I would like this is that I want to be able to create bounding boxes that give me back the same information regardless of the change in aspect ratio of a given image. Every method I've used so far doesn't give me back the same information. I figure that if I can figure out where the particular pixel coordinates of x,y top left and bottom right maps I can recreate an accurate bounding box regardless of aspect ratio changes.
Scaling the coordinates works when the center coordinate is (0, 0).
You may compute x_scaled and y_scaled as follows:
Subtract x_original_center and y_original_center from x_original and y_original.
After subtraction, (0, 0) is the "new center".
Scale the "zero centered" coordinates by scale_x and scale_y.
Convert the "scaled zero centered" coordinates to "top left (0, 0)" by adding x_scaled_center and y_scaled_center.
Computing the center accurately:
The Python conversion is:
(0, 0) is the top left, and (cols-1, rows-1) is the bottom right coordinate.
The accurate center coordinate is:
x_original_center = (original_rows-1)/2
y_original_center = (original_cols-1)/2
Python code (assume img is the original image):
resized_img = cv2.resize(img, [int(cols*scale_x), int(rows*scale_y)])
rows, cols = img.shape[0:2]
resized_rows, resized_cols = resized_img.shape[0:2]
x_original_center = (cols-1) / 2
y_original_center = (rows-1) / 2
x_scaled_center = (resized_cols-1) / 2
y_scaled_center = (resized_rows-1) / 2
# Subtract the center, scale, and add the "scaled center".
x_scaled = (x_original - x_original_center)*scale_x + x_scaled_center
y_scaled = (y_original - y_original_center)*scale_y + y_scaled_center
Testing
The following code sample draws crosses at few original and scaled coordinates:
import cv2
def draw_cross(im, x, y, use_color=False):
""" Draw a cross with center (x,y) - cross is two rows and two columns """
x = int(round(x - 0.5))
y = int(round(y - 0.5))
if use_color:
im[y-4:y+6, x] = [0, 0, 255]
im[y-4:y+6, x+1] = [255, 0, 0]
im[y, x-4:x+6] = [0, 0, 255]
im[y+1, x-4:x+6] = [255, 0, 0]
else:
im[y-4:y+6, x] = 0
im[y-4:y+6, x+1] = 255
im[y, x-4:x+6] = 0
im[y+1, x-4:x+6] = 255
img = cv2.imread('graf.png') # http://man.hubwiz.com/docset/OpenCV.docset/Contents/Resources/Documents/db/d70/tutorial_akaze_matching.html
rows, cols = img.shape[0:2] # cols = 320, rows = 256
# 3 points for testing:
x0_original, y0_original = cols//2-0.5, rows//2-0.5 # 159.5, 127.5
x1_original, y1_original = cols//5-0.5, rows//4-0.5 # 63.5, 63.5
x2_original, y2_original = (cols//5)*3+20-0.5, (rows//4)*3+30-0.5 # 211.5, 221.5
draw_cross(img, x0_original, y0_original) # Center of cross (159.5, 127.5)
draw_cross(img, x1_original, y1_original)
draw_cross(img, x2_original, y2_original)
scale_x = 2.5
scale_y = 2
resized_img = cv2.resize(img, [int(cols*scale_x), int(rows*scale_y)], interpolation=cv2.INTER_NEAREST)
resized_rows, resized_cols = resized_img.shape[0:2] # cols = 800, rows = 512
# Compute center column and center row
x_original_center = (cols-1) / 2 # 159.5
y_original_center = (rows-1) / 2 # 127.5
# Compute center of resized image
x_scaled_center = (resized_cols-1) / 2 # 399.5
y_scaled_center = (resized_rows-1) / 2 # 255.5
# Compute the destination coordinates after resize
x0_scaled = (x0_original - x_original_center)*scale_x + x_scaled_center # 399.5
y0_scaled = (y0_original - y_original_center)*scale_y + y_scaled_center # 255.5
x1_scaled = (x1_original - x_original_center)*scale_x + x_scaled_center # 159.5
y1_scaled = (y1_original - y_original_center)*scale_y + y_scaled_center # 127.5
x2_scaled = (x2_original - x_original_center)*scale_x + x_scaled_center # 529.5
y2_scaled = (y2_original - y_original_center)*scale_y + y_scaled_center # 443.5
# Draw crosses on resized image
draw_cross(resized_img, x0_scaled, y0_scaled, True)
draw_cross(resized_img, x1_scaled, y1_scaled, True)
draw_cross(resized_img, x2_scaled, y2_scaled, True)
cv2.imshow('img', img)
cv2.imshow('resized_img', resized_img)
cv2.waitKey()
cv2.destroyAllWindows()
Original image:
Resized image:
Making sure the crosses are aligned:
Note:
In my answer I was using the naming conventions of Miki's comment.
I'm trying to write a python script which will accept an image from the user and accept 6 points which are obtained by the user clicking on the image which gets the coordinates of the clicks. The first 4 coordinates are used to warp the image to make those points the corners. However all im getting is a white screen as an output.
transform.py
# import the necessary packages https://www.pyimagesearchcv2.waitKey(0).com/2014/08/25/4-point-opencv-getperspective-transform-example/
import numpy as np
import cv2
def order_points(pts):
# initialzie a list of coordinates that will be ordered
# such that the first entry in the list is the top-left,
# the second entry is the top-right, the third is the
# bottom-right, and the fourth is the bottom-left
rect = np.zeros((4, 2), dtype = "float32")
# the top-left point will have the smallest sum, whereas
# the bottom-right point will have the largest sum
s = pts.sum(axis = 1)
rect[0] = pts[np.argmin(s)]
rect[2] = pts[np.argmax(s)]
# now, compute the difference between the points, the
# top-right point will have the smallest difference,r
# whereas the bottom-left will have the largest difference
diff = np.diff(pts, axis = 1)
rect[1] = pts[np.argmin(diff)]
rect[3] = pts[np.argmax(diff)]
# return the ordered coordinates
return rect
def four_point_transform(image, pts):
# obtain a consistent order of the points and unpack them
# individually
rect = order_points(pts)
(tl, tr, br, bl) = rect
for i in rect: print (i)
# compute the width of the new image, which will be the
# maximum distance between bottom-right and bottom-left
# x-coordiates or the top-right and top-left x-coordinates
widthA = np.sqrt(((br[0] - bl[0]) ** 2) + ((br[1] - bl[1]) ** 2))
widthB = np.sqrt(((tr[0] - tl[0]) ** 2) + ((tr[1] - tl[1]) ** 2))
maxWidth = max(int(widthA), int(widthB))
# compute the height of the new image, which will be the
# maximum distance between the top-right and bottom-right
# y-coordinates or the top-left and bottom-left y-coordinates
heightA = np.sqrt(((tr[0] - br[0]) ** 2) + ((tr[1] - br[1]) ** 2))
heightB = np.sqrt(((tl[0] - bl[0]) ** 2) + ((tl[1] - bl[1]) ** 2))
maxHeight = max(int(heightA), int(heightB))
# now that we have the dimensions of the new image, construct
# the set of destination points to obtain a "birds eye view",
# (i.e. top-down view) of the image, again specifying points
# in the top-left, top-right, bottom-right, and bottom-left
# order
dst = np.array([
[0, 0],
[maxWidth - 1, 0],
[maxWidth - 1, maxHeight - 1],
[0, maxHeight - 1]], dtype = "float32")
# compute the perspective transform matrix and then apply it
M = cv2.getPerspectiveTransform(rect, dst)
warped = cv2.warpPerspective(image, M, (maxWidth, maxHeight))
# return the warped image
return warped
transform_example.py
# import the necessary packages
from transform import four_point_transform
import numpy as np
import argparse
import cv2
# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", help = "path to the image file")
args = vars(ap.parse_args())
# load the image
image = cv2.imread(args["image"])
mouse_pts = []
def get_mouse_points(event, x, y, flags, param):
# Used to mark 4 points on the frame zero of the video that will be warped
# Used to mark 2 points on the frame zero of the video that are 6 feet away
global mouseX, mouseY, mouse_pts
if event == cv2.EVENT_LBUTTONDOWN:
mouseX, mouseY = x, y
cv2.circle(image, (x, y), 10, (0, 255, 255), 10)
if "mouse_pts" not in globals():
mouse_pts = []
mouse_pts.append((x, y))
print("Point detected")
print(mouse_pts)
cv2.namedWindow("persp")
cv2.setMouseCallback("persp", get_mouse_points)
cv2.imshow("persp", image)
while(True):
cv2.waitKey(1)
if len(mouse_pts) == 7:
cv2.destroyWindow("persp")
break
four_pts = np.array(mouse_pts[0:4], dtype="float32")
# apply the four point tranform to obtain a "birds eye view" of
# the image
warped = four_point_transform(image, four_pts)
# show warped images
cv2.imshow("persp", warped)
cv2.waitKey(0)
original image:
the points selected are on the fence on either side of the road. what i want is a kinda birds eye view of the bridge.
output image:
Thanks for the help!
Extracting table data from digital PDFs have been simple using camelot and tabula. However, the solution doesn't work with scanned images of the document pages specifically when the table doesn't have borders and inner grids. I have been trying to generate vertical and horizontal lines using OpenCV. However, since the scanned images will have slight rotation angles, it is difficult to proceed with the approach.
How can we utilize OpenCV to generate grids (horizontal and vertical lines) and borders for the scanned document page which contains table data (along with paragraphs of text)? If this is feasible, how to nullify the rotation angle of the scanned image?
I wrote some code to estimate the horizontal lines from the printed letters in the page. The same could be done for vertical ones I guess. The code below follows some general assumptions, here
some basic steps in pseudo code style:
prepare picture for contour detection
do contour detection
we assume most contours are letters
calc mean width of all contours
calc mean area of contours
filter all contours with two conditions:
a) contour (letter) heigths < meanHigh * 2
b) contour area > 4/5 meanArea
calc center point of all remaining contours
assume we have line regions (bins)
list all center point which are inside the region
do linear regression of region points
save slope and intercept
calc mean slope and intercept
here the full code:
import cv2
import numpy as np
from scipy import stats
def resizeImageByPercentage(img,scalePercent = 60):
width = int(img.shape[1] * scalePercent / 100)
height = int(img.shape[0] * scalePercent / 100)
dim = (width, height)
# resize image
return cv2.resize(img, dim, interpolation = cv2.INTER_AREA)
def calcAverageContourWithAndHeigh(contourList):
hs = list()
ws = list()
for cnt in contourList:
(x, y, w, h) = cv2.boundingRect(cnt)
ws.append(w)
hs.append(h)
return np.mean(ws),np.mean(hs)
def calcAverageContourArea(contourList):
areaList = list()
for cnt in contourList:
a = cv2.minAreaRect(cnt)
areaList.append(a[2])
return np.mean(areaList)
def calcCentroid(contour):
houghMoments = cv2.moments(contour)
# calculate x,y coordinate of centroid
if houghMoments["m00"] != 0: #case no contour could be calculated
cX = int(houghMoments["m10"] / houghMoments["m00"])
cY = int(houghMoments["m01"] / houghMoments["m00"])
else:
# set values as what you need in the situation
cX, cY = -1, -1
return cX,cY
def getCentroidWhenSizeInRange(contourList,letterSizeWidth,letterSizeHigh,deltaOffset,minLetterArea=10.0):
centroidList=list()
for cnt in contourList:
(x, y, w, h) = cv2.boundingRect(cnt)
area = cv2.minAreaRect(cnt)
#calc diff
diffW = abs(w-letterSizeWidth)
diffH = abs(h-letterSizeHigh)
#thresold A: almost smaller than mean letter size +- offset
#when almost letterSize
if diffW < deltaOffset and diffH < deltaOffset:
#threshold B > min area
if area[2] > minLetterArea:
cX,cY = calcCentroid(cnt)
if cX!=-1 and cY!=-1:
centroidList.append((cX,cY))
return centroidList
DEBUGMODE = True
#read image, do git clone https://github.com/WZBSocialScienceCenter/pdftabextract.git for the example
img = cv2.imread('pdftabextract/examples/catalogue_30s/data/ALA1934_RR-excerpt.pdf-2_1.png')
#get some basic infos
imgHeigh, imgWidth, imgChannelAmount = img.shape
if DEBUGMODE:
cv2.imwrite("img00original.jpg",resizeImageByPercentage(img,30))
cv2.imshow("original",img)
# prepare img
imgGrey = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# apply Gaussian filter
imgGaussianBlur = cv2.GaussianBlur(imgGrey,(5,5),0)
#make binary img, black or white
_, imgBinThres = cv2.threshold(imgGaussianBlur, 130, 255, cv2.THRESH_BINARY)
## detect contours
contours, _ = cv2.findContours(imgBinThres, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
#we get some letter parameter
averageLetterWidth, averageLetterHigh = calcAverageContourWithAndHeigh(contours)
threshold1AllowedLetterSizeOffset = averageLetterHigh * 2 # double size
averageContourAreaSizeOfMinRect = calcAverageContourArea(contours)
threshHold2MinArea = 4 * averageContourAreaSizeOfMinRect / 5 # 4/5 * mean
print("mean letter Width: ", averageLetterWidth)
print("mean letter High: ", averageLetterHigh)
print("threshold 1 tolerance: ", threshold1AllowedLetterSizeOffset)
print("mean letter area ", averageContourAreaSizeOfMinRect)
print("thresold 2 min letter area ", threshHold2MinArea)
#we get all centroid of letter sizes contours, the other we ignore
centroidList = getCentroidWhenSizeInRange(contours,averageLetterWidth,averageLetterHigh,threshold1AllowedLetterSizeOffset,threshHold2MinArea)
if DEBUGMODE:
#debug print all centers:
imgFilteredCenter = img.copy()
for cX,cY in centroidList:
#draw in red color as BGR
cv2.circle(imgFilteredCenter, (cX, cY), 5, (0, 0, 255), -1)
cv2.imwrite("img01letterCenters.jpg",resizeImageByPercentage(imgFilteredCenter,30))
cv2.imshow("letterCenters",imgFilteredCenter)
#we estimate a bin widths
amountPixelFreeSpace = averageLetterHigh #TODO get better estimate out of histogram
estimatedBinWidth = round( averageLetterHigh + amountPixelFreeSpace) #TODO round better ?
binCollection = dict() #range(0,imgHeigh,estimatedBinWidth)
#we do seperate the center points into bins by y coordinate
for i in range(0,imgHeigh,estimatedBinWidth):
listCenterPointsInBin = list()
yMin = i
yMax = i + estimatedBinWidth
for cX,cY in centroidList:
if yMin < cY < yMax:#if fits in bin
listCenterPointsInBin.append((cX,cY))
binCollection[i] = listCenterPointsInBin
#we assume all point are in one line ?
#model = slope (x) + intercept
#model = m (x) + n
mList = list() #slope abs in img
nList = list() #intercept abs in img
nListRelative = list() #intercept relative to bin start
minAmountRegressionElements = 12 #is also alias for letter amount we expect
#we do regression for every point in the bin
for startYOfBin, values in binCollection.items():
#we reform values
xValues = [] #TODO use more short transform
yValues = []
for x,y in values:
xValues.append(x)
yValues.append(y)
#we assume a min limit of point in bin
if len(xValues) >= minAmountRegressionElements :
slope, intercept, r, p, std_err = stats.linregress(xValues, yValues)
mList.append(slope)
nList.append(intercept)
#we calc the relative intercept
nRelativeToBinStart = intercept - startYOfBin
nListRelative.append(nRelativeToBinStart)
if DEBUGMODE:
#we debug print all lines in one picute
imgLines = img.copy()
colorOfLine = (0, 255, 0) #green
for i in range(0,len(mList)):
slope = mList[i]
intercept = nList[i]
startPoint = (0, int( intercept)) #better round ?
endPointY = int( (slope * imgWidth + intercept) )
if endPointY < 0:
endPointY = 0
endPoint = (imgHeigh,endPointY)
cv2.line(imgLines, startPoint, endPoint, colorOfLine, 2)
cv2.imwrite("img02lines.jpg",resizeImageByPercentage(imgLines,30))
cv2.imshow("linesOfLetters ",imgLines)
#we assume in mean we got it right
meanIntercept = np.mean(nListRelative)
meanSlope = np.mean(mList)
print("meanIntercept :", meanIntercept)
print("meanSlope ", meanSlope)
#TODO calc angle with math.atan(slope) ...
if DEBUGMODE:
cv2.waitKey(0)
original:
center point of letters:
lines:
I had the same problem some time ago and this tutorial is the solution to that. It explains using pdftabextract which is a Python library by Markus Konrad and leverages OpenCV’s Hough transform to detect the lines and works even if the scanned document is a bit tilted. The tutorial walks your through parsing a 1920s German newspaper
I a working on a text recognition project.
I have built a classifier using TensorFlow to predict digits but I would like to implement a more complex algorithm of text recognition by using text localization and text segmentation (separating each character) but I didn't find an implementation for those parts of the algorithms.
So, do you know some algorithms/implementation/tips I, using TensorFlow, to localize text and do text segmentation in natural scenes pictures (actually localize and segmentation of text in the scoreboard for sports pictures)?
Thank you very much for any help.
To group elements on a page, like paragraphs of text and images, you can use some clustering algo, and/or blob detection with some tresholds.
You can use Radon transform to recognize lines and detect skew of a scanned page.
I think that for character separation you will have to mess with fonts. Some polynomial matching/fitting or something. (this is a very wild guess for now, don't take it seriously).
But similar aproach would allow you to get the character out of the line and recognize it in same step.
As for recognition, once you have a character, there is a nice trigonometric trick of comparing angles of the character to the angles stored in a database.
Works great on handwriting too.
I am not an expert on how page segmentation exactly works, but it seems that I am on my way to become one. Just working on a project including it.
So give me a month and I'll be able to tell you more. :D
Anyway, you should go and read Tesseract code to see how HP and Google did it there. It should give you pretty good ideas.
Good luck!
After you are done with Object Detection, you can perform text detection which can be passed on to tesseract. There can multiple variation to enhance image before passing it to detector function.
Reference Papers
https://arxiv.org/abs/1704.03155v2
https://arxiv.org/pdf/2002.07662.pdf
def text_detector(image):
#hasFrame, image = cap.read()
orig = image
(H, W) = image.shape[:2]
(newW, newH) = (640, 320)
rW = W / float(newW)
rH = H / float(newH)
image = cv2.resize(image, (newW, newH))
(H, W) = image.shape[:2]
layerNames = [
"feature_fusion/Conv_7/Sigmoid",
"feature_fusion/concat_3"]
blob = cv2.dnn.blobFromImage(image, 1.0, (W, H),
(123.68, 116.78, 103.94), swapRB=True, crop=False)
net.setInput(blob)
(scores, geometry) = net.forward(layerNames)
(numRows, numCols) = scores.shape[2:4]
rects = []
confidences = []
for y in range(0, numRows):
scoresData = scores[0, 0, y]
xData0 = geometry[0, 0, y]
xData1 = geometry[0, 1, y]
xData2 = geometry[0, 2, y]
xData3 = geometry[0, 3, y]
anglesData = geometry[0, 4, y]
# loop over the number of columns
for x in range(0, numCols):
# if our score does not have sufficient probability, ignore it
if scoresData[x] < 0.5:
continue
# compute the offset factor as our resulting feature maps will
# be 4x smaller than the input image
(offsetX, offsetY) = (x * 4.0, y * 4.0)
# extract the rotation angle for the prediction and then
# compute the sin and cosine
angle = anglesData[x]
cos = np.cos(angle)
sin = np.sin(angle)
# use the geometry volume to derive the width and height of
# the bounding box
h = xData0[x] + xData2[x]
w = xData1[x] + xData3[x]
# compute both the starting and ending (x, y)-coordinates for
# the text prediction bounding box
endX = int(offsetX + (cos * xData1[x]) + (sin * xData2[x]))
endY = int(offsetY - (sin * xData1[x]) + (cos * xData2[x]))
startX = int(endX - w)
startY = int(endY - h)
# add the bounding box coordinates and probability score to
# our respective lists
rects.append((startX, startY, endX, endY))
confidences.append(scoresData[x])
boxes = non_max_suppression(np.array(rects), probs=confidences)
for (startX, startY, endX, endY) in boxes:
startX = int(startX * rW)
startY = int(startY * rH)
endX = int(endX * rW)
endY = int(endY * rH)
# draw the bounding box on the image
cv2.rectangle(orig, (startX, startY), (endX, endY), (0, 255, 0), 3)
return orig