Set an ROI in OpenCV - python

I am very new to OpenCV and Python. I have followed a tutorial to use YOLO with yolov3-tiny. It can detect vehicles fine. But what I need to complete my project is to count the number of vehicles that pass a particular lane. If I count a vehicle every time it is detected (every time the bounding box appears), the count becomes very inaccurate, since the bounding box keeps blinking (it keeps re-locating the same vehicle, sometimes up to 5 times), so this is not a good way to count.
So I figured, how about counting a vehicle only when it reaches a certain point. I have seen a lot of code that seems to do this but, since I am a beginner, it is really hard for me to understand, let alone run on my system. Those samples require installing so many things that I can't get them to work without errors.
See my sample code below:
import cv2
import numpy as np

cap = cv2.VideoCapture('rtsp://username:password#xxx.xxx.xxx.xxx:xxx/cam/realmonitor?channel=1')
whT = 320
confThreshold = 0.75
nmsThreshold = 0.3
list_of_vehicles = ["bicycle", "car", "motorbike", "bus", "truck"]

classesFile = 'coco.names'
classNames = []
with open(classesFile, 'r') as f:
    classNames = f.read().rstrip('\n').split('\n')

modelConfiguration = 'yolov3-tiny.cfg'
modelWeights = 'yolov3-tiny.weights'
net = cv2.dnn.readNetFromDarknet(modelConfiguration, modelWeights)
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_OPENCV)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CPU)

total_vehicle_count = 0

def getVehicleCount(boxes, class_name):
    global total_vehicle_count
    dict_vehicle_count = {}
    if class_name in list_of_vehicles:
        total_vehicle_count += 1
        # print(total_vehicle_count)
    return total_vehicle_count, dict_vehicle_count
def findObjects(outputs, img):
    hT, wT, cT = img.shape
    bbox = []
    classIds = []
    confs = []
    for output in outputs:
        for det in output:
            scores = det[5:]
            classId = np.argmax(scores)
            confidence = scores[classId]
            if confidence > confThreshold:
                w, h = int(det[2] * wT), int(det[3] * hT)
                x, y = int((det[0] * wT) - w/2), int((det[1] * hT) - h/2)
                bbox.append([x, y, w, h])
                classIds.append(classId)
                confs.append(float(confidence))
    indices = cv2.dnn.NMSBoxes(bbox, confs, confThreshold, nmsThreshold)
    for i in indices:
        i = i[0]
        box = bbox[i]
        getVehicleCount(bbox, classNames[classIds[i]])
        x, y, w, h = box[0], box[1], box[2], box[3]
        cv2.rectangle(img, (x, y), (x+w, y+h), (255, 0, 255), 1)
        cv2.putText(img, f'{classNames[classIds[i]].upper()} {int(confs[i]*100)}%', (x, y-10), cv2.FONT_HERSHEY_SIMPLEX, 0.6, (255, 0, 255), 2)
while True:
    success, img = cap.read()
    blob = cv2.dnn.blobFromImage(img, 1/255, (whT, whT), [0, 0, 0], 1, crop=False)
    net.setInput(blob)
    layerNames = net.getLayerNames()
    outputnames = [layerNames[i[0]-1] for i in net.getUnconnectedOutLayers()]
    # print(outputnames)
    outputs = net.forward(outputnames)
    findObjects(outputs, img)
    cv2.imshow('Image', img)
    if cv2.waitKey(1) & 0XFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
With this code, 1 vehicle is sometimes counted up to 50 times, depending on its size, which is highly inaccurate. How can I create an ROI so that a detected vehicle is counted only once, at the moment it passes that point?

First, I would recommend that you consider using a visual tracker to track each detected rectangle. This is important even if you have an ROI to crop the image close to your counting zone/line. That is because even if the ROI is localized, the detection might still blink a couple of times causing a miscount. That is especially valid if another vehicle can enter the ROI while the first one is still passing it.
I recommend using the easy-to-use tracker provided by the widely used dlib library. Please refer to this example on how to use it.
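As a rough illustration (not the linked example; the video path and the initial box below are placeholders), a dlib correlation tracker is started on one detection box and then updated on each following frame:
import cv2
import dlib

cap = cv2.VideoCapture('traffic.mp4')          # placeholder video
ok, frame = cap.read()
x, y, w, h = 100, 100, 80, 60                  # placeholder detection box from your detector

tracker = dlib.correlation_tracker()
tracker.start_track(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB),
                    dlib.rectangle(x, y, x + w, y + h))

while True:
    ok, frame = cap.read()
    if not ok:
        break
    tracker.update(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    pos = tracker.get_position()
    cx = int((pos.left() + pos.right()) / 2)
    cy = int((pos.top() + pos.bottom()) / 2)
    # (cx, cy) is the tracked center you can test against a counting line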
Instead of counting detections within an ROI, you need to define an ROI line (within your ROI). Then, track the detection rectangles' centers in each frame. Finally, increase your counter once a rectangle center passes the ROI line.
Regarding how to count a rectangle passing the ROI line:
Select two points to define your ROI line.
Use your points to find the parameters for the general line formula ax + by + c = 0
For each frame, plug the rectangle center coordinates in the formula and keep track of the sign of the result.
If the sign of the result changes that means that the rectangle center has passed the line.
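A minimal sketch of that sign test (the two line points and the track_id bookkeeping are placeholders to wire into your own tracking code):
def line_params(p1, p2):
    # Line ax + by + c = 0 through points p1 and p2
    a = p2[1] - p1[1]
    b = p1[0] - p2[0]
    c = -(a * p1[0] + b * p1[1])
    return a, b, c

def side_of_line(a, b, c, point):
    # > 0 on one side of the line, < 0 on the other, 0 exactly on it
    return a * point[0] + b * point[1] + c

a, b, c = line_params((100, 400), (600, 380))  # two hand-picked points on the frame

count = 0
prev_side = {}                                 # track_id -> sign on the previous frame

def update_count(track_id, center):
    global count
    side = side_of_line(a, b, c, center) > 0
    if track_id in prev_side and prev_side[track_id] != side:
        count += 1                             # the tracked center crossed the ROI line
    prev_side[track_id] = side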


Real time OCR lag

I'm trying to capture the position of a license plate from a webcam feed using YOLOv4-tiny, then pass the result to easyOCR to extract the characters. The detection works well in real time, however when I apply the OCR the webcam stream becomes really laggy. Is there any way I can improve this code to make it less laggy?
My YOLOv4 detection:
# detection
while 1:
    # _, pre_img = cap.read()
    # pre_img = cv2.resize(pre_img, (640, 480))
    _, img = cap.read()
    # img = cv2.flip(pre_img, 1)
    hight, width, _ = img.shape
    blob = cv2.dnn.blobFromImage(img, 1 / 255, (416, 416), (0, 0, 0), swapRB=True, crop=False)
    net.setInput(blob)
    output_layers_name = net.getUnconnectedOutLayersNames()
    layerOutputs = net.forward(output_layers_name)

    boxes = []
    confidences = []
    class_ids = []
    for output in layerOutputs:
        for detection in output:
            score = detection[5:]
            class_id = np.argmax(score)
            confidence = score[class_id]
            if confidence > 0.7:
                center_x = int(detection[0] * width)
                center_y = int(detection[1] * hight)
                w = int(detection[2] * width)
                h = int(detection[3] * hight)
                x = int(center_x - w / 2)
                y = int(center_y - h / 2)
                boxes.append([x, y, w, h])
                confidences.append(float(confidence))
                class_ids.append(class_id)
    indexes = cv2.dnn.NMSBoxes(boxes, confidences, .5, .4)

    boxes = []
    confidences = []
    class_ids = []
    for output in layerOutputs:
        for detection in output:
            score = detection[5:]
            class_id = np.argmax(score)
            confidence = score[class_id]
            if confidence > 0.5:
                center_x = int(detection[0] * width)
                center_y = int(detection[1] * hight)
                w = int(detection[2] * width)
                h = int(detection[3] * hight)
                x = int(center_x - w / 2)
                y = int(center_y - h / 2)
                boxes.append([x, y, w, h])
                confidences.append(float(confidence))
                class_ids.append(class_id)
    indexes = cv2.dnn.NMSBoxes(boxes, confidences, .8, .4)

    font = cv2.FONT_HERSHEY_PLAIN
    colors = np.random.uniform(0, 255, size=(len(boxes), 3))
    if len(indexes) > 0:
        for i in indexes.flatten():
            x, y, w, h = boxes[i]
            label = str(classes[class_ids[i]])
            confidence = str(round(confidences[i], 2))
            color = colors[i]
            cv2.rectangle(img, (x, y), (x + w, y + h), color, 2)
            # detection = cv2.rectangle(img, (x, y), (x + w, y + h), color, 2)
            detected_image = img[y:y+h, x:x+w]
            cv2.putText(img, label + " " + confidence, (x, y + 400), font, 2, color, 2)
            # print(detected_image)
            cv2.imshow('detection', detected_image)
            cv2.imwrite('lp5.jpg', detected_image)
            cropped_image = cv2.imread('lp5.jpg')
            cv2.waitKey(5000)
            print("system is waiting")
            result = OCR(cropped_image)
            print(result)
My easyOCR function:
def OCR(cropped_image):
    reader = easyocr.Reader(['en'], gpu=False)  # what the reader expects from the image
    result = reader.readtext(cropped_image)
    text = ''
    for result in result:
        text += result[1] + ' '
    spliced = (remove(text))
    return spliced
There are several points.
cv2.waitKey(5000) in your loop causes a delay even if you keep pressing a key. So remove it if you are not debugging.
You are saving the detected region to a JPEG file and loading it back each time. Pass the region (a NumPy array) to the OCR module directly instead.
EasyOCR is a DNN model based on ResNet, but you are not using a GPU (gpu=False). So use a GPU. (See this benchmark by Liao.)
You are creating an easyocr.Reader instance on every iteration of the loop. Creating it requires loading and initializing a DNN model, which is a huge workload and the main bottleneck. Create a single instance before the loop and reuse it inside the loop.
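A minimal sketch of those last two changes (the Reader created once, and the crop passed as a NumPy array), assuming easyocr is installed; set gpu=False if no CUDA device is available:
import easyocr
import numpy as np

# Create the Reader once; this loads the ResNet-based model a single time.
reader = easyocr.Reader(['en'], gpu=True)

def OCR(cropped_image: np.ndarray) -> str:
    # Pass the NumPy array directly; no need to round-trip through a JPEG file.
    results = reader.readtext(cropped_image)
    return ' '.join(item[1] for item in results)
Inside the detection loop you would then call OCR(detected_image) directly and drop the cv2.imwrite / cv2.imread / cv2.waitKey(5000) lines.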
You are essentially saying "the while loop must be fast." And of course the OCR() call is a bit slow. Ok, good.
Don't call OCR() from within the loop. Rather, enqueue a request, and let another thread / process / host worry about the OCR computation, while the loop quickly continues upon its merry way. You could use a threaded Queue, or a subprocess, or blast it over to RabbitMQ or Kafka.
The simplest approach would be to simply overwrite /tmp/cropped_image.png within the loop, and have another process notice such updates and (slowly) call OCR(), appending the results to a log file. There might be a couple of updates to the image file while a single OCR call is in progress, and that's fine. The two are decoupled from one another, each progressing at its own pace. The downside of a queue would be OCR sometimes falling behind -- you actually want to shed load by skipping some (redundant) cropped images. The two are racing, and that's fine.
But take care to do things in an atomic fashion -- you wouldn't want to OCR an image that starts with one frame and ends with part of a subsequent frame. Write to a temp file and, after close(), use os.rename() to atomically make those pixels available under the name that the OCR daemon will read from. Once it has a file descriptor open for read, it will have no problem reading to EOF without interference; the kernel takes care of that for us.
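A rough sketch of that writer/reader hand-off; the file paths, the polling interval and the publish_crop() helper are illustrative, and OCR() is the function from the question:
import os
import tempfile
import time
import cv2

FINAL_PATH = '/tmp/cropped_image.png'   # the name the OCR daemon reads from

def publish_crop(cropped_image):
    # Write to a temp file in the same directory, then rename atomically,
    # so the reader never sees a half-written image.
    fd, tmp_path = tempfile.mkstemp(suffix='.png', dir='/tmp')
    os.close(fd)
    cv2.imwrite(tmp_path, cropped_image)
    os.rename(tmp_path, FINAL_PATH)

# Reader side (a separate process): OCR the newest image whenever it changes.
last_mtime = 0.0
while True:
    try:
        mtime = os.path.getmtime(FINAL_PATH)
    except FileNotFoundError:
        time.sleep(0.2)
        continue
    if mtime > last_mtime:
        last_mtime = mtime
        img = cv2.imread(FINAL_PATH)
        if img is not None:
            print(OCR(img))             # OCR() as defined in the question
    time.sleep(0.2)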

Object detection model for detecting rectangular shape (text cursor) in a video?

I'm currently doing some research to detect and locate a text cursor (you know, the blinking rectangle shape that indicates the character position when you type on your computer) in a screen-recorded video. To do that, I've trained a YOLOv4 model with a custom object dataset (I took a reference from here) and I am planning to also implement DeepSORT to track the moving cursor.
Here's the example of training data I used to train YOLOv4:
Here's what I want to achieve:
Do you think using YOLOv4 + DeepSORT is overkill for this task? I'm asking because as of now, the model successfully detects the text cursor in only 70%-80% of the video frames that contain it. If it is overkill after all, do you know any other method that could be used for this task?
Anyway, I'm planning to detect the text cursor not only in a Visual Studio Code window, but also in a browser (e.g., Google Chrome) and a word processor (e.g., Microsoft Word). Something like this:
I'm considering the Sliding Window method as an alternative, but from what I've read, that method may consume a lot of resources and perform slower. I'm also considering Template Matching from OpenCV (like this), but I don't think it will perform better or faster than YOLOv4.
The constraints are the processing speed (i.e., how many frames can be processed in a given amount of time) and the detection accuracy (i.e., I want to avoid the letter 'l' or the digit '1' being detected as the text cursor, since those characters look similar in some fonts). But higher accuracy with a lower FPS is acceptable, I think.
I'm currently using Python, Tensorflow, and OpenCV for this.
Thank you very much!
This would work if the cursor is the only moving object on the screen. Here is the before and after:
Before:
After:
The code:
import cv2
import numpy as np

BOX_WIDTH = 10
BOX_HEIGHT = 20

def process_img(img):
    img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    kernel = np.ones((5, 5))
    img_canny = cv2.Canny(img_gray, 50, 50)
    return img_canny

def get_contour(img):
    contours, hierarchies = cv2.findContours(img, cv2.RETR_TREE, cv2.CHAIN_APPROX_NONE)
    if contours:
        return max(contours, key=cv2.contourArea)

def get_line_tip(cnt1, cnt2):
    x1, y1, w1, h1 = cv2.boundingRect(cnt1)
    if h1 > BOX_HEIGHT / 2:
        if np.any(cnt2):
            x2, y2, w2, h2 = cv2.boundingRect(cnt2)
            if x1 < x2:
                return x1, y1
        return x1 + w1, y1

def get_rect(x, y):
    half_width = BOX_WIDTH // 2
    lift_height = BOX_HEIGHT // 6
    return (x - half_width, y - lift_height), (x + half_width, y + BOX_HEIGHT - lift_height)

cap = cv2.VideoCapture("screen_record.mkv")
success, img_past = cap.read()

cnt_past = np.array([])
line_tip_past = 0, 0

while True:
    success, img_live = cap.read()
    if not success:
        break
    img_live_processed = process_img(img_live)
    img_past_processed = process_img(img_past)
    img_diff = cv2.bitwise_xor(img_live_processed, img_past_processed)
    cnt = get_contour(img_diff)
    line_tip = get_line_tip(cnt, cnt_past)
    if line_tip:
        cnt_past = cnt
        line_tip_past = line_tip
    else:
        line_tip = line_tip_past
    rect = get_rect(*line_tip)
    img_past = img_live.copy()
    cv2.rectangle(img_live, *rect, (0, 0, 255), 2)
    cv2.imshow("Cursor", img_live)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cv2.destroyAllWindows()
Breaking it down:
Import the necessary libraries:
import cv2
import numpy as np
Define the size of the tracking box depending on the size of the cursor:
BOX_WIDTH = 10
BOX_HEIGHT = 20
Define a function to process the frames into edges:
def process_img(img):
    img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    kernel = np.ones((5, 5))
    img_canny = cv2.Canny(img_gray, 50, 50)
    return img_canny
Define a function that would retrieve the contour with the greatest area in an image (the cursor doesn't need to be large for this to work, it can be tiny if needed):
def get_contour(img):
    contours, hierarchies = cv2.findContours(img, cv2.RETR_TREE, cv2.CHAIN_APPROX_NONE)
    if contours:
        return max(contours, key=cv2.contourArea)
Define a function that will take in 2 contours: one being the contour of the cursor plus some stray text from the current frame, the other being the contour of the cursor plus some stray text from the frame before. With the two contours, we can identify whether the cursor is moving left or right:
def get_line_tip(cnt1, cnt2):
    x1, y1, w1, h1 = cv2.boundingRect(cnt1)
    if h1 > BOX_HEIGHT / 2:
        if np.any(cnt2):
            x2, y2, w2, h2 = cv2.boundingRect(cnt2)
            if x1 < x2:
                return x1, y1
        return x1 + w1, y1
Define a function that will take in the tip points of the cursor, and return a box based on the BOX_WIDTH and BOX_HEIGHT constants defined earlier:
def get_rect(x, y):
    half_width = BOX_WIDTH // 2
    lift_height = BOX_HEIGHT // 6
    return (x - half_width, y - lift_height), (x + half_width, y + BOX_HEIGHT - lift_height)
Define a capture device for the video, read one frame from the start of the video and store it in a variable that will serve as the frame before every frame. Also define temporary values for the past contour and past line tip:
cap = cv2.VideoCapture("screen_record.mkv")
success, img_past = cap.read()
cnt_past = np.array([])
line_tip_past = 0, 0
Use a while loop, and read from the video. Process the frame and the frame before that frame in the video:
while True:
    success, img_live = cap.read()
    if not success:
        break
    img_live_processed = process_img(img_live)
    img_past_processed = process_img(img_past)
With the processed frames, we can find the difference between the frames using the cv2.bitwise_xor method to get where the movement is on the screen. Then, we can find the contour of the movement between the 2 frames using the get_contour function defined:
    img_diff = cv2.bitwise_xor(img_live_processed, img_past_processed)
    cnt = get_contour(img_diff)
With the contour, we can utilize the get_line_tip function defined earlier to find the tip of the cursor. If a tip was found, save it into the line_tip_past variable to use for the next iteration, and if a tip was not found, we can use the past tip we saved as the current tip:
    line_tip = get_line_tip(cnt, cnt_past)
    if line_tip:
        cnt_past = cnt
        line_tip_past = line_tip
    else:
        line_tip = line_tip_past
Now we define a rectangle using the cursor tip and the get_rect function we defined earlier, and draw it onto the current frame. But before drawing it on, we save the frame to be the frame before the current frame of the next iteration:
    rect = get_rect(*line_tip)
    img_past = img_live.copy()
    cv2.rectangle(img_live, *rect, (0, 0, 255), 2)
Finally, we display the frame:
    cv2.imshow("Cursor", img_live)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cv2.destroyAllWindows()

How do I improve the number detection for blueprints (OCR)

I have a number of blueprints where I would like to detect the numbers on the blueprint such that I can turn them into proper models.
For example, I have the following image and would like to extract all the numbers from it, so I ran the following code:
import pytesseract
from pytesseract import Output
import cv2
import numpy as np

img = cv2.imread('vdb7C.jpg')
custom_config = r'--oem 2 --psm 10'
d = pytesseract.image_to_data(img, config=custom_config, lang='eng', output_type=Output.DICT)
n_boxes = len(d['level'])
for i in range(n_boxes):
    text = d["text"][i]
    print(text + str(str.isdigit(text)))
    if str.isdigit(text):
        (x, y, w, h) = (d['left'][i], d['top'][i], d['width'][i], d['height'][i])
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("output.jpg", img)
This gave me the following result. As you can see, it does correctly identify a number of the numbers on the blueprint; however, it misses quite a few others and falsely detects a few that aren't really there. I care more about getting all the numbers than about a few false positives, but I would still like to keep those to a minimum, so any suggestions there?
I have already tried thinning operations, re-scaling the images, rotating the images and smoothing the images, but none of those appear to make much difference. Extreme rescaling (*0.1 or *10) does change a few things, but any gains made in one part of the image are undone by faults appearing in other parts.
Especially difficult are situations such as on the left building, where we have line numbers close to or even overlapping part of the design.
Here we see 2 examples of such situations.
Also note that font usage is not consistent between images.
It's worth noting that the lines are almost always noticeably thinner than the font used for the numbers, so perhaps something could be done with that?
I have also tried using the EAST OCR system with the following code:
import cv2
import numpy as np
from imutils.object_detection import non_max_suppression

img = cv2.imread('vdb7C.jpg')
W = 5664
H = 4000
dim = (W, H)
img = cv2.resize(img, dim, interpolation=cv2.INTER_AREA)
confidence = 0.5  # minimum score threshold (the value was not shown in the original snippet)
net = cv2.dnn.readNet("frozen_east_text_detection.pb")
blob = cv2.dnn.blobFromImage(img, 1.0, (W, H),
    (123.68, 116.78, 103.94), swapRB=True, crop=False)
net.setInput(blob)
(scores, geometry) = net.forward(["feature_fusion/Conv_7/Sigmoid",
    "feature_fusion/concat_3"])
(numRows, numCols) = scores.shape[2:4]
rects = []
confidences = []
# loop over the number of rows
for y in range(0, numRows):
    # extract the scores (probabilities), followed by the geometrical
    # data used to derive potential bounding box coordinates that
    # surround text
    scoresData = scores[0, 0, y]
    xData0 = geometry[0, 0, y]
    xData1 = geometry[0, 1, y]
    xData2 = geometry[0, 2, y]
    xData3 = geometry[0, 3, y]
    anglesData = geometry[0, 4, y]
    for x in range(0, numCols):
        if scoresData[x] < confidence:
            continue
        (offsetX, offsetY) = (x * 4.0, y * 4.0)
        angle = anglesData[x]
        cos = np.cos(angle)
        sin = np.sin(angle)
        h = xData0[x] + xData2[x]
        w = xData1[x] + xData3[x]
        endX = int(offsetX + (cos * xData1[x]) + (sin * xData2[x]))
        endY = int(offsetY - (sin * xData1[x]) + (cos * xData2[x]))
        startX = int(endX - w)
        startY = int(endY - h)
        rects.append((startX, startY, endX, endY))
        confidences.append(scoresData[x])
boxes = non_max_suppression(np.array(rects), probs=confidences)
for box in boxes:
    (y, h, x, w) = box
    print(box)
    print(np.shape(img))
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("output.jpg", img)
However, this causes quite a number of bounding boxes to end up outside of the image, and in general the bounding boxes seem unrelated to the content, so does anyone know what's going wrong there?
Any suggestions? I have 8000 images right now and need to eventually process a total of about 400k images.
I suggest using a solution that applies neural networks like keras-ocr which applies CRAFT and CRNN. It does a better job in detecting text that overlaps with the design. This is what I got using it out of the box:
import matplotlib.pyplot as plt
import keras_ocr
detector = keras_ocr.detection.Detector()
image = keras_ocr.tools.read('vdb7C.jpg')
boxes = detector.detect(images=[image])[0]
canvas = keras_ocr.tools.drawBoxes(image, boxes)
plt.imshow(canvas)
Result:
Run your tesseract piece of code, but only use results with 3 or more digits. This should provide you with enough good examples of digits. Extract each digit to a separate file and save their positions. Now you can go two ways.
You can go the simple way if you see that the fonts of the digits are quite similar. Then you can create a set of templates for the digits (say 15-30). Remember that you can get the size of the digits for a specific image, so resize your digit templates to the right size and run the most trivial template matching. This will definitely create some false detections (especially for "1"s), and you will have to find a way to reduce their amount to an acceptable level (a rough sketch follows below).
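Here is such a sketch; the template file name and the 0.8 threshold are placeholders to adapt to your data:
import cv2
import numpy as np

page = cv2.imread('vdb7C.jpg', cv2.IMREAD_GRAYSCALE)
template = cv2.imread('digit_template_3.png', cv2.IMREAD_GRAYSCALE)  # one resized digit template
th, tw = template.shape

res = cv2.matchTemplate(page, template, cv2.TM_CCOEFF_NORMED)
ys, xs = np.where(res >= 0.8)  # tune the threshold to trade recall against false positives
for x, y in zip(xs, ys):
    cv2.rectangle(page, (x, y), (x + tw, y + th), 0, 1)
cv2.imwrite('matches.jpg', page)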
The more complex way is to build a custom CNN detector and train it on your data. From the first stage you will get several hundred examples of digits (and their positions) that you want to detect. You can look at this project or this one as references. Also, this article can provide you some guidance.
One more thing that can be useful: your images have lots of long perpendicular lines. If you align them with the axes, you can remove the lines very easily by binarizing the original, shifting the result (right or down) by several pixels and ANDing the two. This will leave only the long lines. Find their length, and you will be able to remove lines above a certain length from the original image. A small sketch of this is shown below.
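A possible version of that shift-and-AND, where the shift amount and the kernel size are guesses to tune per drawing:
import cv2
import numpy as np

img = cv2.imread('vdb7C.jpg', cv2.IMREAD_GRAYSCALE)
binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

# Shift down by more than the digit height, then AND with the original:
# short strokes (the digits) cancel out, long vertical lines survive.
shift = 40
shifted = np.zeros_like(binary)
shifted[shift:, :] = binary[:-shift, :]
long_vertical = cv2.bitwise_and(binary, shifted)

# Grow the surviving line pixels back to roughly full length and erase them from the page.
line_mask = cv2.dilate(long_vertical, np.ones((2 * shift + 1, 3), np.uint8))
cleaned = cv2.bitwise_and(binary, cv2.bitwise_not(line_mask))
cv2.imwrite('lines_removed.png', cleaned)
The same shift-and-AND can be done horizontally (shift to the right) to isolate long horizontal lines.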

How to find table like structure in image

I have different types of invoice files, and I want to find the table in each invoice file. The table position is not constant, so I went for image processing. First I convert my invoice into an image, then I find contours based on the table borders, and finally I can catch the table position.
For the task I used the code below.
with Image(page) as page_image:
    page_image.alpha_channel = False  # eliminates transparency
    img_buffer = np.asarray(bytearray(page_image.make_blob()), dtype=np.uint8)
    img = cv2.imdecode(img_buffer, cv2.IMREAD_UNCHANGED)
    ret, thresh = cv2.threshold(img, 127, 255, 0)
    im2, contours, hierarchy = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
    margin = []
    for contour in contours:
        # get rectangle bounding contour
        [x, y, w, h] = cv2.boundingRect(contour)
        # Don't plot small false positives that aren't text
        if (w > thresh1 and h > thresh2):
            margin.append([x, y, x + w, y + h])
    # data cleanup on margin to extract required position values.
In this code, I update thresh1 and thresh2 based on the file.
So using this code I can successfully read the positions of tables in images, and using these positions I'll work on my invoice PDF file. For example:
Sample 1:
Sample 2:
Sample 3:
Output:
Sample 1:
Sample 2:
Sample 3:
But now I have a new format which doesn't have any borders, yet it's still a table. How do I solve this? My entire operation depends only on the borders of the tables, but now I don't have table borders. How can I achieve this? I don't have any idea how to move past this problem. My question is: is there any way to find the position based on the table structure?
For example My problem input looks like below:
I would like to find its position like below:
How can I solve this?
I would really appreciate any idea to solve the problem.
Thanks in advance.
Vaibhav is right. You can experiment with different morphological transforms to extract or group pixels into different shapes, lines, etc. For example, the approach can be the following:
Start with dilation to convert the text into solid spots.
Then apply the findContours function as a next step to find the text bounding boxes.
After having the text bounding boxes, it is possible to apply some heuristic algorithm to cluster the text boxes into groups by their coordinates. This way you can find groups of text areas aligned into rows and columns.
Then you can apply sorting by x and y coordinates and/or some analysis of the groups to try to find whether the grouped text boxes can form a table.
I wrote a small sample illustrating the idea. I hope the code is self explanatory. I've put some comments there too.
import os
import cv2
import imutils

# This only works if there's only one table on a page
# Important parameters:
#  - morph_size
#  - min_text_height_limit
#  - max_text_height_limit
#  - cell_threshold
#  - min_columns

def pre_process_image(img, save_in_file, morph_size=(8, 8)):
    # get rid of the color
    pre = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Otsu threshold
    pre = cv2.threshold(pre, 250, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
    # dilate the text to make it solid spots
    cpy = pre.copy()
    struct = cv2.getStructuringElement(cv2.MORPH_RECT, morph_size)
    cpy = cv2.dilate(~cpy, struct, anchor=(-1, -1), iterations=1)
    pre = ~cpy
    if save_in_file is not None:
        cv2.imwrite(save_in_file, pre)
    return pre
def find_text_boxes(pre, min_text_height_limit=6, max_text_height_limit=40):
    # Looking for the text spots contours
    # OpenCV 3
    # img, contours, hierarchy = cv2.findContours(pre, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
    # OpenCV 4
    contours, hierarchy = cv2.findContours(pre, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
    # Getting the texts bounding boxes based on the text size assumptions
    boxes = []
    for contour in contours:
        box = cv2.boundingRect(contour)
        h = box[3]
        if min_text_height_limit < h < max_text_height_limit:
            boxes.append(box)
    return boxes
def find_table_in_boxes(boxes, cell_threshold=10, min_columns=2):
    rows = {}
    cols = {}
    # Clustering the bounding boxes by their positions
    for box in boxes:
        (x, y, w, h) = box
        col_key = x // cell_threshold
        row_key = y // cell_threshold
        cols[col_key] = [box] if col_key not in cols else cols[col_key] + [box]
        rows[row_key] = [box] if row_key not in rows else rows[row_key] + [box]
    # Filtering out the clusters having less than 2 cols
    table_cells = list(filter(lambda r: len(r) >= min_columns, rows.values()))
    # Sorting the row cells by x coord
    table_cells = [list(sorted(tb)) for tb in table_cells]
    # Sorting rows by the y coord
    table_cells = list(sorted(table_cells, key=lambda r: r[0][1]))
    return table_cells
def build_lines(table_cells):
    if table_cells is None or len(table_cells) <= 0:
        return [], []
    max_last_col_width_row = max(table_cells, key=lambda b: b[-1][2])
    max_x = max_last_col_width_row[-1][0] + max_last_col_width_row[-1][2]
    max_last_row_height_box = max(table_cells[-1], key=lambda b: b[3])
    max_y = max_last_row_height_box[1] + max_last_row_height_box[3]
    hor_lines = []
    ver_lines = []
    for box in table_cells:
        x = box[0][0]
        y = box[0][1]
        hor_lines.append((x, y, max_x, y))
    for box in table_cells[0]:
        x = box[0]
        y = box[1]
        ver_lines.append((x, y, x, max_y))
    (x, y, w, h) = table_cells[0][-1]
    ver_lines.append((max_x, y, max_x, max_y))
    (x, y, w, h) = table_cells[0][0]
    hor_lines.append((x, max_y, max_x, max_y))
    return hor_lines, ver_lines
if __name__ == "__main__":
    in_file = os.path.join("data", "page.jpg")
    pre_file = os.path.join("data", "pre.png")
    out_file = os.path.join("data", "out.png")

    img = cv2.imread(os.path.join(in_file))
    pre_processed = pre_process_image(img, pre_file)
    text_boxes = find_text_boxes(pre_processed)
    cells = find_table_in_boxes(text_boxes)
    hor_lines, ver_lines = build_lines(cells)

    # Visualize the result
    vis = img.copy()
    # for box in text_boxes:
    #     (x, y, w, h) = box
    #     cv2.rectangle(vis, (x, y), (x + w - 2, y + h - 2), (0, 255, 0), 1)
    for line in hor_lines:
        [x1, y1, x2, y2] = line
        cv2.line(vis, (x1, y1), (x2, y2), (0, 0, 255), 1)
    for line in ver_lines:
        [x1, y1, x2, y2] = line
        cv2.line(vis, (x1, y1), (x2, y2), (0, 0, 255), 1)
    cv2.imwrite(out_file, vis)
I've got the following output:
Of course to make the algorithm more robust and applicable to a variety of different input images it has to be adjusted correspondingly.
Update: Updated the code with respect to the OpenCV API changes for findContours. If you have an older version of OpenCV installed, use the corresponding call. Related post.
You can try applying some morphological transforms (such as Dilation, Erosion or Gaussian Blur) as a pre-processing step before your findContours function.
For example
blur = cv2.GaussianBlur(g, (3, 3), 0)
ret, thresh1 = cv2.threshold(blur, 150, 255, cv2.THRESH_BINARY)
bitwise = cv2.bitwise_not(thresh1)
erosion = cv2.erode(bitwise, np.ones((1, 1) ,np.uint8), iterations=5)
dilation = cv2.dilate(erosion, np.ones((3, 3) ,np.uint8), iterations=5)
The last argument, iterations, shows the degree of dilation/erosion that will take place (in your case, on the text). A small value will result in small independent contours even within a letter, and large values will club many nearby elements together. You need to find the ideal value so that only that block of your image gets clubbed together.
Please note that I've taken 150 as the threshold parameter because I've been working on extracting text from images with varying backgrounds and this worked out better. You can choose to continue with the value you've taken since it's a black & white image.
There are many types of tables in document images, with too many variations and layouts. No matter how many rules you write, there will always be a table for which your rules fail. These types of problems are generally solved using ML (Machine Learning) based solutions. You can find many pre-implemented codes on GitHub for solving the problem of detecting tables in images using ML or DL (Deep Learning).
Here is my code along with the deep learning models, the model can detect various types of tables as well as the structure cells from the tables: https://github.com/DevashishPrasad/CascadeTabNet
The approach achieves state of the art on various public datasets right now (10th May 2020) as far as the accuracy is concerned
More details : https://arxiv.org/abs/2004.12629
This might be helpful for you.
I've drawn a bounding box for each word in my invoice, and then I choose only the fields that I want. For that you can use an ROI (Region Of Interest).
import pytesseract
import cv2

img = cv2.imread(r'path\Invoice2.png')
d = pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT)
n_boxes = len(d['level'])
for i in range(n_boxes):
    (x, y, w, h) = (d['left'][i], d['top'][i], d['width'][i], d['height'][i])
    img = cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 1)
cv2.imshow('img', img)
cv2.waitKey(0)
You will get this output:

Copy a part of an image in opencv and python

I'm trying to split an image into several sub-images with opencv by identifying templates of the original image and then copying the regions where I matched those templates. I'm a TOTAL newbie to opencv! I've identified the sub-images using:
result = cv2.matchTemplate(img, template, cv2.TM_CCORR_NORMED)
After some cleanup I get a list of tuples called points over which I iterate to show the rectangles. tw and th are the template width and height respectively.
for pt in points:
    re = cv2.rectangle(img, pt, (pt[0] + tw, pt[1] + th), 0, 2)
    print('%s, %s' % (str(pt[0]), str(pt[1])))
    count += 1
What I would like to accomplish is to save the octagons (https://dl.dropbox.com/u/239592/region01.png) into separated files.
How can I do this? I've read something about contours but I'm not sure how to use it. Ideally I would like to contour the octagon.
Thanks a lot for your help!
If template matching is working for you, stick to it. For instance, I considered the following template:
Then, we can pre-process the input in order to make it binary and discard small components. After this step, the template matching is performed. Then it is a matter of filtering the matches by discarding ones that are too close together (I've used a dummy method for that, so if there are too many matches you could see it taking some time). After we decide which points are far apart (and thus identify different hexagons), we can make minor adjustments to them in the following manner:
Sort by y-coordinate;
If two adjacent items start at a y-coordinate that is too close, then set them both to the same y-coord.
Now you can sort this point list in an appropriate order such that the crops are done in raster order. The cropping part is easily achieved using slicing provided by numpy.
import sys
import cv2
import numpy

outbasename = 'hexagon_%02d.png'

img = cv2.imread(sys.argv[1])
template = cv2.cvtColor(cv2.imread(sys.argv[2]), cv2.COLOR_BGR2GRAY)
theight, twidth = template.shape[:2]

# Binarize the input based on the saturation and value.
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
saturation = hsv[:,:,1]
value = hsv[:,:,2]
value[saturation > 35] = 255
value = cv2.threshold(value, 0, 255, cv2.THRESH_OTSU)[1]
# Pad the image.
value = cv2.copyMakeBorder(255 - value, 3, 3, 3, 3, cv2.BORDER_CONSTANT, value=0)

# Discard small components.
img_clean = numpy.zeros(value.shape, dtype=numpy.uint8)
contours, _ = cv2.findContours(value, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
for i, c in enumerate(contours):
    area = cv2.contourArea(c)
    if area > 500:
        cv2.drawContours(img_clean, contours, i, 255, 2)

def closest_pt(a, pt):
    if not len(a):
        return (float('inf'), float('inf'))
    d = a - pt
    return a[numpy.argmin((d * d).sum(1))]

match = cv2.matchTemplate(img_clean, template, cv2.TM_CCORR_NORMED)

# Filter matches.
threshold = 0.8
dist_threshold = twidth / 1.5
loc = numpy.where(match > threshold)
ptlist = numpy.zeros((len(loc[0]), 2), dtype=int)
count = 0
print("%d matches" % len(loc[0]))
for pt in zip(*loc[::-1]):
    cpt = closest_pt(ptlist[:count], pt)
    dist = ((cpt[0] - pt[0]) ** 2 + (cpt[1] - pt[1]) ** 2) ** 0.5
    if dist > dist_threshold:
        ptlist[count] = pt
        count += 1

# Adjust points (could do for the x coords too).
ptlist = ptlist[:count]
view = ptlist.ravel().view([('x', int), ('y', int)])
view.sort(order=['y', 'x'])
for i in range(1, ptlist.shape[0]):
    prev, curr = ptlist[i - 1], ptlist[i]
    if abs(curr[1] - prev[1]) < 5:
        y = min(curr[1], prev[1])
        curr[1], prev[1] = y, y

# Crop in raster order.
view.sort(order=['y', 'x'])
for i, pt in enumerate(ptlist, start=1):
    cv2.imwrite(outbasename % i,
                img[pt[1]-2:pt[1]+theight-2, pt[0]-2:pt[0]+twidth-2])
    print('Wrote %s' % (outbasename % i))
If you want only the contours of the hexagons, then crop on img_clean instead of img (but then it is pointless to sort the hexagons in raster order).
Here is a representation of the different regions that would be cut for your two examples without modifying the code above:
I am sorry, I didn't understand from your question how you relate matchTemplate and contours.
Anyway, below is a small technique using contours. It assumes that your other images are like the one you provided. I am not sure if it works with your other images, but I think it would help you get started. Try this yourself and make the necessary adjustments and modifications.
What I did:
1 - I needed the edges of the octagons, so I thresholded the image using Otsu and applied dilation and erosion (or use any method you like that works well for all your images; beware of the edges at the left side of the image).
2 - Then found contours (more about contours: http://goo.gl/r0ID0).
3 - For each contour, find its convex hull, then its area (A) and perimeter (P).
4 - For a perfect octagon, P*P/A = 13.25 approximately (for a regular octagon with side s, P = 8s and A = 2(1+sqrt(2))s^2, so P*P/A = 64 / (2(1+sqrt(2))) ≈ 13.25). I used that here to select the octagons, crop them and save them.
5 - You can see that cropping also removes some edges of the octagon. If you want to keep them, adjust the cropping dimensions.
Code:
import cv2
import numpy as np

img = cv2.imread('region01.png')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
ret, thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
thresh = cv2.dilate(thresh, None, iterations=2)
thresh = cv2.erode(thresh, None)

contours, hierarchy = cv2.findContours(thresh, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)

number = 0
for cnt in contours:
    hull = cv2.convexHull(cnt)
    area = cv2.contourArea(hull)
    P = cv2.arcLength(hull, True)
    if (area != 0) and (13 <= P**2/area <= 14):
        # cv2.drawContours(img, [hull], 0, 255, 3)
        x, y, w, h = cv2.boundingRect(hull)
        number = number + 1
        roi = img[y:y+h, x:x+w]
        cv2.imshow(str(number), roi)
        cv2.imwrite("1" + str(number) + ".jpg", roi)

cv2.imshow('img', img)
cv2.waitKey(0)
cv2.destroyAllWindows()
Those 6 octagons will be stored as separate files.
Hope it helps !!!
