OpenCV perspective transform - python

I am currently working on a Python program that tracks players on a soccer field. I got the player detection working with YOLOv3 and can output quite a nice result with the players' centroids and boxes drawn. What I want to do now is project the players' centroids onto a PNG/JPG image of a soccer field. For this I intended to use two arrays of reference points, one for the soccer-field image and one for the source video. My question is: how do I translate the coordinates of the centroids to the soccer-field image?
Similar example:
How the boxes and Markers are drawn:
def draw_labels_and_boxes(img, boxes, confidences, classids, idxs, colors, labels):
    # If there are any detections
    if len(idxs) > 0:
        for i in idxs.flatten():
            # Get the bounding box coordinates
            x, y = boxes[i][0], boxes[i][1]
            w, h = boxes[i][2], boxes[i][3]
            # Draw the bounding box rectangle and label on the image
            cv.rectangle(img, (x, y), (x + w, y + h), (255, 255, 255), 2)
            cv.drawMarker (img, (int(x + w / 2), int(y + h / 2)), (x, y), 0, 20, 3)
    return img
Boxes are generated like this:
def generate_boxes_confidences_classids(outs, height, width, tconf):
    boxes = []
    confidences = []
    classids = []
    for out in outs:
        for detection in out:
            # print (detection)
            # a = input('GO!')
            # Get the scores, classid, and the confidence of the prediction
            scores = detection[5:]
            classid = np.argmax(scores)
            confidence = scores[classid]
            # Consider only the predictions that are above a certain confidence level
            if confidence > tconf:
                # TODO Check detection
                box = detection[0:4] * np.array([width, height, width, height])
                centerX, centerY, bwidth, bheight = box.astype('int')
                # Using the center x, y coordinates to derive the top
                # and the left corner of the bounding box
                x = int(centerX - (bwidth / 2))
                y = int(centerY - (bheight / 2))
                # Append to list
                boxes.append([x, y, int(bwidth), int(bheight)])
                confidences.append(float(confidence))
                classids.append(classid)
    return boxes, confidences, classids

Assuming a stationary camera,
Find the coordinates of the four corners of the field.
Find the corresponding four corners in the top-view image that you want to create.
Find a homography matrix using these two sets of points. You can use OpenCV's findHomography for this.
Transform all the centroids using this homography matrix; that gives you their coordinates in the new image space. Use perspectiveTransform for this: it maps a list of points, whereas warpPerspective warps a whole image.
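The steps above can be sketched with NumPy alone. This is a minimal illustration, not the definitive implementation: the corner coordinates and the helper names homography_from_4_points and project_points are made up for the example, and in practice cv2.findHomography and cv2.perspectiveTransform do the same job (the latter expects a float array of shape (N, 1, 2)).

```python
import numpy as np

def homography_from_4_points(src, dst):
    """Solve the 3x3 homography H (with H[2, 2] fixed to 1) that maps src to dst."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        # Two linear equations per point correspondence
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.extend([u, v])
    h = np.linalg.solve(np.array(rows, dtype=float), np.array(rhs, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)

def project_points(H, points):
    """Apply H to an (N, 2) list of points using homogeneous coordinates."""
    pts = np.asarray(points, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homog @ H.T
    return mapped[:, :2] / mapped[:, 2:3]

# Hypothetical reference points: the four field corners in the video frame (pixels)
video_corners = [(100, 200), (540, 200), (640, 480), (0, 480)]
# ... and the same four corners in the top-view pitch image
pitch_corners = [(0, 0), (1050, 0), (1050, 680), (0, 680)]

H = homography_from_4_points(video_corners, pitch_corners)

# Boxes in the question's [x, y, w, h] format
boxes = [[300, 250, 20, 40], [90, 400, 30, 60]]
centroids = [(x + w / 2, y + h / 2) for x, y, w, h in boxes]
pitch_positions = project_points(H, centroids)
print(pitch_positions)
```

Because the homography describes the ground plane, a point where the player touches the ground (for example the bottom center of the box) may map more accurately than the box centroid.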

During the COVID-19 pandemic, many developers built social-distancing monitoring systems, and a few of them also implemented a bird's-eye view of the scene. Your problem is very similar. As external links are not accepted here, I cannot post the exact link(s); please check their code on GitHub.


Bounding Box Regression

I generated a dataset of 200 x 200 x 3 images, each containing a 40 x 40 box of a different color.
I want to create a model using TensorFlow that can predict the coordinates of this 40 x 40 box.
The code I used for generating these images:
from PIL import Image, ImageDraw
from random import randrange
colors = ["#ffd615", "#f9ff21", "#00d1ff",
          "#0e153a", "#fc5c9c", "#ac3f21",
          "#40514e", "#492540", "#ff8a5c",
          "#000000", "#a6fff2", "#f0f696",
          "#d72323", "#dee1ec", "#fcb1b1"]
def genrate_image(color):
    img = Image.new("RGB", size=(200, 200), color=color)
    return img
def save_image(img, imgname):
    img.save(imgname)
def draw_rect(image, color, x, y):
    draw = ImageDraw.Draw(image)
    coords = ((x, y), (x+40, y), (x+40, y+40), (x, y+40))
    draw.polygon(coords, fill=color)
    #return image, str(coords)
    return image, coords[0][0], coords[2][0], coords[0][1], coords[2][1]
FILE_NAME = "train_annotations.txt"
for i in range(0, 100):
    img = genrate_image(colors[randrange(0, len(colors))])
    img, x0, x1, y0, y1 = draw_rect(img, colors[randrange(0, len(colors))], randrange(200 - 50), randrange(200 - 50))
    save_image(img, "dataset/train_images/img"+str(i)+".png")
    with open(FILE_NAME, "a+") as f:
        f.write(f"{x0} {x1} {y0} {y1}\n")
Can anyone suggest how I can build a model that predicts the box coordinates for a new image?
The easiest way to separate the box from the background is K-means clustering with K = 2. Record the RGB values of all pixels, then use K-means to group the pixels into two clusters: one is the background, the other is the box color. Map the pixels of the box-color cluster back to their original coordinates and take the mean of those coordinates to get the center of the 40x40 box.
Above is a source documentation on how to do K-means
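A minimal sketch of that idea in plain NumPy, assuming the box is the smaller of the two color clusters. The function name locate_box, the deterministic initialization, and the synthetic test image are illustrative choices, not part of the original answer; a library K-means (for example cv2.kmeans) would work the same way.

```python
import numpy as np

def locate_box(image, iterations=10):
    """Cluster pixel colors into 2 groups and return the (x, y) center of the smaller group."""
    h, w, _ = image.shape
    pixels = image.reshape(-1, 3).astype(float)
    # Deterministic initialization: the first pixel and the pixel farthest from it
    c0 = pixels[0]
    c1 = pixels[np.argmax(((pixels - c0) ** 2).sum(axis=1))]
    centers = np.stack([c0, c1])
    for _ in range(iterations):
        # Assign every pixel to its nearest cluster center
        dists = ((pixels[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean color of its pixels
        for k in range(2):
            if np.any(labels == k):
                centers[k] = pixels[labels == k].mean(axis=0)
    # The box covers far fewer pixels than the background
    box_label = np.argmin(np.bincount(labels, minlength=2))
    ys, xs = np.nonzero(labels.reshape(h, w) == box_label)
    return xs.mean(), ys.mean()

# Synthetic 200x200 image with a 40x40 box whose top-left corner is at x=30, y=50
img = np.zeros((200, 200, 3), dtype=np.uint8)
img[:] = (255, 214, 21)
img[50:90, 30:70] = (0, 209, 255)
cx, cy = locate_box(img)
print(cx, cy)
```

This breaks down when the box and the background happen to get the same color, which the generator in the question allows, because there is then only one cluster to find.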
A bounding box regression is enough: add a fully connected layer after the CNN with 4 output values, x1, y1, x2, y2, which are the top-left and bottom-right corners of the box.
Something similar can be found here

Remove and measure a line openCV

Links to all images at the bottom
I have drawn a line over an arrow which captures the angle of that arrow. I would like to then remove the arrow, keep only the line, and use cv2.minAreaRect to determine the angle. So far I've got everything to work except removing the original arrow, which results in an incorrect angle generated by the cv2.minAreaRect bounding box.
Really, I just want to use the bold black line running through the arrow to measure the angle, not the arrow itself. If anyone has an idea to make this work, or a simpler way, please let me know. Thanks.
import numpy as np
import cv2
image = cv2.imread("templates/a_15.png")
image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
ret, thresh = cv2.threshold(image, 127, 255, 0)
contours, hierarchy = cv2.findContours(thresh, 1, 2)
cont = contours[0]
rows,cols = image.shape[:2]
[vx,vy,x,y] = cv2.fitLine(cont, cv2.DIST_L2,0,0.01,0.01)
leftish = int((-x*vy/vx) + y)
rightish = int(((cols-x)*vy/vx)+y)
line = cv2.line(image,(cols-1,rightish),(0,leftish),(0,255,0),10)
# thresholding
thresh = cv2.threshold(line, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
# compute rotated bounding box based on all pixel values > 0 and
# use coordinates to compute a rotated bounding box of those coordinates
coordinates = np.column_stack(np.where(thresh > 0))
w = coordinates[0]
h = coordinates[1]
# Compute minimum rotated angle that contains entire image.
# Return angle values in the range [-90, 0).
# As the rectangle rotates clockwise, angle values increase towards 0.
# Once 0 is reached, angle is set back to -90 degrees.
angle = cv2.minAreaRect(coordinates)[-1]
# for angles less than -45 degrees, add 90 degrees to angle to take the inverse.
if angle < - 45:
    angle = -(90 + angle)
else:
    angle = -angle
# rotate image
(h, w) = image.shape[:2]
center = (w // 2, h // 2) # image center
RM = cv2.getRotationMatrix2D(center, angle, 1.0)
rotated = cv2.warpAffine(image, RM, (w, h),
flags=cv2.INTER_CUBIC, borderMode=cv2.BORDER_REPLICATE)
# correction angle for validation
cv2.putText(rotated, "Angle {:.2f} degrees".format(angle),
(10, 30), cv2.FONT_HERSHEY_DUPLEX, 0.9, (0, 255, 0), 2)
# output
print("[INFO] angle: {:.3f}".format(angle))
cv2.imshow("Line", line)
cv2.imshow("Input", image)
cv2.imshow("Rotated", rotated)
current results
Here's a possible solution. The main idea is to identify the "tip" and the "tail" of the arrow by approximating some key points. After you have identified both ends, you can draw a line joining both points. It is also an advantage to know which of the endpoints is the tip, because that way you can measure the angle from a constant point.
There's more than one way to achieve this. I chose something that I have applied in the past: I will use this approach to identify the endpoints of the overall shape. My assumption is that the tip will yield more points than the tail. After that, I'll cluster all the endpoints into two groups: tip and tail. I can use K-Means for that, as it will return the mean centers for both clusters. After that, we have our tip and tail points, which can be joined easily with a line. These are the steps:
Convert the image to grayscale
Get the skeleton of the image, to normalize the shape to a width of 1 pixel
Apply the method described in the link to get the arrow's endpoints
Divide the endpoints in two clusters and use K-Means to get their centers
Join both endpoints with a line
Let's see the code:
# imports:
import cv2
import numpy as np
# image path
path = "D://opencvImages//"
fileName = "CoXeb.png"
# Reading an image in default mode:
inputImage = cv2.imread(path + fileName)
# Grayscale conversion:
grayscaleImage = cv2.cvtColor(inputImage, cv2.COLOR_BGR2GRAY)
grayscaleImage = 255 - grayscaleImage
# Extend the borders for the skeleton:
extendedImg = cv2.copyMakeBorder(grayscaleImage, 5, 5, 5, 5, cv2.BORDER_CONSTANT)
# Store a deep copy of the crop for results:
grayscaleImageCopy = cv2.cvtColor(extendedImg, cv2.COLOR_GRAY2BGR)
# Compute the skeleton:
skeleton = cv2.ximgproc.thinning(extendedImg, None, 1)
The first step is to get the skeleton of the arrow. As I said, this step is needed prior to the convolution-based method that identifies the endpoints of a shape. Computing the skeleton normalizes the shape to a one pixel width. However, sometimes, if the shape is too close to the "canvas" borders, the skeleton could show some artifacts. This is avoided with a border extension. The skeleton of the arrow is this:
Check that image out. If we identify the endpoints, the tip will exhibit at least 3 points, while the tail at least 1. That's handy - the tip will always have more points than the tail. If only we could detect those points... Luckily, we can:
# Threshold the image so that white (skeleton) pixels get a value of 10 and
# black (background) pixels a value of 0:
_, binaryImage = cv2.threshold(skeleton, 128, 10, cv2.THRESH_BINARY)
# Set the end-points kernel:
h = np.array([[1, 1, 1],
[1, 10, 1],
[1, 1, 1]])
# Convolve the image with the kernel:
imgFiltered = cv2.filter2D(binaryImage, -1, h)
# Extract only the end-points pixels, those with
# an intensity value of 110:
binaryImage = np.where(imgFiltered == 110, 255, 0)
# np.where returned a wider integer array (not 8-bit),
# convert back to 8-bit uint
binaryImage = binaryImage.astype(np.uint8)
This endpoint-detecting method convolves the skeleton with a special kernel that identifies endpoints. It returns an image in which every endpoint pixel has the value 110: 10 * 10 for the pixel itself plus 10 for its single neighbor. After thresholding this mid-result, we get this image, which represents the arrow endpoints:
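To see why the value 110 marks an endpoint, here is a small NumPy-only sketch of the same kernel arithmetic on a toy skeleton. The helper find_endpoints and the toy shape are illustrative assumptions; the sketch pads with zeros, whereas cv2.filter2D uses a reflected border by default, which is one reason the border extension above matters.

```python
import numpy as np

def find_endpoints(skeleton):
    """Return (x, y) of skeleton pixels that have exactly one 8-connected neighbor."""
    # Skeleton pixels become 10, background stays 0 (same scaling as the threshold step)
    binary = (skeleton > 0).astype(np.int32) * 10
    padded = np.pad(binary, 1)
    h, w = binary.shape
    # Sum of the eight neighbors (kernel weight 1 each)
    neighbours = sum(
        padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    # The kernel center has weight 10, so a skeleton pixel contributes 100 by itself
    response = 10 * binary + neighbours
    ys, xs = np.nonzero(response == 110)
    return list(zip(xs.tolist(), ys.tolist()))

# Toy "arrow": a horizontal shaft with two one-pixel barbs at the right end
skel = np.zeros((7, 9), dtype=np.uint8)
skel[3, 1:6] = 255   # shaft from (1, 3) to (5, 3)
skel[2, 6] = 255     # upper barb
skel[4, 6] = 255     # lower barb
endpoints = sorted(find_endpoints(skel))
print(endpoints)
```

The shaft's left end and both barbs each have a single neighbor and are reported; the junction pixel at (5, 3) has three neighbors (response 130) and is not.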
Nice, as you see, we can group the points in two clusters and get their cluster centers. Sounds like a job for K-Means, because that's exactly what it does. We first need to treat our data, though, because K-Means operates on defined-shaped arrays of float data:
# Find the X, Y location of all the end-points
# pixels:
Y, X = binaryImage.nonzero()
# Reshape the arrays for K-means
Y = Y.reshape(-1,1)
X = X.reshape(-1,1)
Z = np.hstack((X, Y))
# K-means operates on 32-bit float data:
floatPoints = np.float32(Z)
# Set the convergence criteria and call K-means:
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
ret, label, center = cv2.kmeans(floatPoints, 2, None, criteria, 10, cv2.KMEANS_RANDOM_CENTERS)
# Set the cluster count, find the points belonging
# to cluster 0 and cluster 1:
cluster1Count = np.count_nonzero(label)
cluster0Count = np.shape(label)[0] - cluster1Count
print("Elements of Cluster 0: "+str(cluster0Count))
print("Elements of Cluster 1: " + str(cluster1Count))
The last two lines print the number of endpoints assigned to Cluster 0 and Cluster 1, respectively. That outputs this:
Elements of Cluster 0: 3
Elements of Cluster 1: 2
Just as expected - well, kinda. Seems that Cluster 0 is the tip and Cluster 1 the tail! But the tail actually got 2 points. If you look at the image of the skeleton closely, you'll see there's a small bifurcation at the tail. That's why we, in reality, got two points instead of just one. Alright, let's get the center points and draw them on the original input:
# Look for the cluster of max number of points
# That cluster will be the tip of the arrow:
maxCluster = 0
if cluster1Count > cluster0Count:
    maxCluster = 1
# Check out the centers of each cluster:
matRows, matCols = center.shape
# Store the ordered end-points here:
orderedPoints = [None] * 2
# Let's identify and draw the two end-points
# of the arrow:
for b in range(matRows):
    # Get cluster center:
    pointX = int(center[b][0])
    pointY = int(center[b][1])
    # Get the "tip"
    if b == maxCluster:
        color = (0, 0, 255)
        orderedPoints[0] = (pointX, pointY)
    # Get the "tail"
    else:
        color = (255, 0, 0)
        orderedPoints[1] = (pointX, pointY)
    # Draw it:
    cv2.circle(grayscaleImageCopy, (pointX, pointY), 3, color, -1)
    cv2.imshow("End-Points", grayscaleImageCopy)
This is the resulting image:
The tip always gets drawn in red while the tail is drawn in blue. Very cool, let's store these points in the orderedPoints list and draw the final line in a new "canvas", with dimension same as the original image:
# Store the tip and tail points:
p0x = orderedPoints[1][0]
p0y = orderedPoints[1][1]
p1x = orderedPoints[0][0]
p1y = orderedPoints[0][1]
# Create a new "canvas" (image) using the input dimensions:
imageHeight, imageWidth = binaryImage.shape[:2]
newImage = np.zeros((imageHeight, imageWidth), np.uint8)
newImage = 255 - newImage
# Draw a line using the detected points:
(x1, y1) = orderedPoints[0]
(x2, y2) = orderedPoints[1]
lineColor = (0, 0, 0)
cv2.line(newImage , (x1, y1), (x2, y2), lineColor, thickness=2)
cv2.imshow("Detected Line", newImage)
The line overlaid on the original image and the new image containing only the line:
It sounds like you want to measure the angle of the line but because you are measuring a line you drew in the original image, you must now filter out the original image to get an accurate measure of the line...which you drew with coordinates you know the endpoints of?
I guess:
make a better filter?
draw the line in a blank image and detect angle there?
determine the angle from the known coordinates?
Since you were asking for just a line, I tried that...just made a blank image, drew your detected line on it and then used that downstream...
blankIm = np.ones((height, width, channels), dtype=np.uint8)
line = cv2.line(blankIm,(cols-1,rightish),(0,leftish),(0,255,0),10)
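For the last option, determining the angle from the known coordinates, no image processing is needed at all. A minimal sketch, assuming you have two endpoints in pixel coordinates (for example the tail and tip found by another method); the function name line_angle_degrees is made up for illustration:

```python
import math

def line_angle_degrees(tail, tip):
    """Angle of the vector tail -> tip, counter-clockwise from the +x axis, in [0, 360)."""
    dx = tip[0] - tail[0]
    # Image y coordinates grow downwards, so flip the sign to get the usual orientation
    dy = tail[1] - tip[1]
    return math.degrees(math.atan2(dy, dx)) % 360.0

# Arrow pointing up and to the right on screen
angle_up_right = line_angle_degrees((0, 10), (10, 0))
# Arrow pointing straight down on screen
angle_down = line_angle_degrees((0, 0), (0, 10))
print(angle_up_right, angle_down)
```

With the direction vector from cv2.fitLine you can use math.atan2(-vy, vx) in the same way, but a fitted line has no tip or tail, so that angle is only defined modulo 180 degrees.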

How do I crop a specific part of an image based on the coordinate boxes?

I want to crop my image based on the coordinate boxes of detected objects, the one with classID=1.
There might be multiple objects with the same id or other classes as well.
My problem is that my code only returns one cropped image. How can I return all cropped images with ClassID=1?
I have a total of 6 classes, of which I am only interested in ClassID=1.
# initializing bounding boxes, confidences, and classIDs.
boxes = []
confidences = []
classIDs = []
for output in layersOutputs:
    # loop over each of the detections
    for detection in output:
        # extract the class ID and confidence
        scores = detection[5:]
        classID = np.argmax(scores)
        confidence = scores[classID]
        # filter out weak predictions
        if confidence > c_threshold:
            box = detection[0:4] * np.array([W, H, W, H])
            (centerX, centerY, width, height) = box.astype("int")
            x = int(centerX - (width / 2))
            y = int(centerY - (height / 2))
            # update bounding box coordinates, confidences, classIDs
            boxes.append([x, y, int(width), int(height)])
            confidences.append(float(confidence))
            classIDs.append(classID)
# applying non maximum suppression
ind = cv.dnn.NMSBoxes(boxes, confidences, c_threshold, nms)
if len(ind) > 0:
    # loop over the indexes that we want to keep
    for i in ind.flatten():
        # extract the bounding box coordinates
        (x, y) = (boxes[1][0], boxes[1][1])
        (w, h) = (boxes[1][2], boxes[1][3])
for i in classIDs:
    if i != 1:
        # extract the bounding box coordinates
        (x, y) = (boxes[i][0], boxes[i][1])
        (w, h) = (boxes[i][2], boxes[i][3])
        # crop that part of image which contains desired object
        image = image[y:y + h, x:x + w]
        cv.imshow("Image", image)
        path = '/path to folder'
        cv.imwrite(os.path.join(path, 'PImage.jpg'), image)
Edited: As you can see there are many types of animals in this picture, I am trying to crop part of image that has dogs in it. I already got the coordinate bounding boxes related to dog parts(which means that I know where is the location of the rectangle that has dog in it as indicated in the photo)
I want to crop those rectangles that I indicated in the image. Dog has class id=1. I have class cat and other animals with different indexes.
Your loop is incorrect.
classID= [1]
for i in classID:
Here you basically say for i in [1]:, which crops only for your first detection. Instead, you should loop over all detections. Assuming the rest of your code is correct, the following loops over all detections logged in classID, and only crops if it belongs to class 1.
for i in classID:
if i!=1:
I did type print(type(classID)) it tells that
<class 'numpy.int64'>
<class 'numpy.int64'>
<class 'numpy.int64'>
<class 'numpy.int64'>
As far as I know I can iterate on iterable objects, such as lists, single values such as int64 are not iterable.
How could I solve this problem?
The problem is you are losing the reference to the original image in the following line.
image = image[y:y + h, x:x + w]
instead you can create new variable for each dog image
dog_img = image[y:y + h, x:x + w]
Also while writing, you are writing by the same name, so it will overwrite the previous instance of the image, so try to make the name of the image to be dynamic like dog1.jpg, dog2.jpg ...
c = 0
# loop over the detections kept by non maximum suppression
for i in ind.flatten():
    # keep only the detections of class 1
    if classIDs[i] == 1:
        # extract the bounding box coordinates
        (x, y) = (boxes[i][0], boxes[i][1])
        (w, h) = (boxes[i][2], boxes[i][3])
        # crop that part of image which contains desired object
        dog_img = image[y:y + h, x:x + w]
        cv.imshow("Image", dog_img)
        path = '/path to folder'
        c += 1
        cv.imwrite(os.path.join(path, 'PImage'+str(c)+'.jpg'), dog_img)
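A self-contained sketch of the same idea that can be run without a detector, assuming boxes in [x, y, w, h] format, a parallel list of class IDs, and the indices kept by non maximum suppression. The helper crop_class and the synthetic data are hypothetical; the sketch also clips boxes to the image, since YOLO boxes can extend past the border and negative slice indices would otherwise wrap around.

```python
import numpy as np

def crop_class(image, boxes, class_ids, keep, target_class=1):
    """Return one crop per kept detection whose class ID equals target_class."""
    img_h, img_w = image.shape[:2]
    crops = []
    for i in np.asarray(keep).flatten():
        if class_ids[i] != target_class:
            continue
        x, y, w, h = boxes[i]
        # Clip the box to the image bounds
        x0, y0 = max(x, 0), max(y, 0)
        x1, y1 = min(x + w, img_w), min(y + h, img_h)
        if x1 > x0 and y1 > y0:
            # Copy so each crop is independent of the original image
            crops.append(image[y0:y1, x0:x1].copy())
    return crops

# Synthetic data: a 100x120 image and three detections, two of class 1
image = np.zeros((100, 120, 3), dtype=np.uint8)
boxes = [[10, 10, 20, 30], [50, 40, 30, 20], [-5, 60, 20, 20]]
class_ids = [1, 2, 1]
keep = np.array([0, 1, 2])

dog_crops = crop_class(image, boxes, class_ids, keep)
print([c.shape for c in dog_crops])
```

Each crop can then be written under its own name, for example with an index taken from enumerate(dog_crops), so that no file overwrites the previous one.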

How to find table like structure in image

I have different types of invoice files and I want to find the table in each of them. The table position is not constant across files, so I turned to image processing. First I convert the invoice into an image, then I find contours based on the table borders, and finally I get the table position.
For the task I used below code.
with Image(page) as page_image:
    page_image.alpha_channel = False #eliminates transparency
    img_buffer=np.asarray(bytearray(page_image.make_blob()), dtype=np.uint8)
    img = cv2.imdecode(img_buffer, cv2.IMREAD_UNCHANGED)
    ret, thresh = cv2.threshold(img, 127, 255, 0)
    im2, contours, hierarchy = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
    for contour in contours:
        # get rectangle bounding contour
        [x, y, w, h] = cv2.boundingRect(contour)
        # Don't plot small false positives that aren't text
        if (w >thresh1 and h> thresh2):
            margin.append([x, y, x + w, y + h])
    #data cleanup on margin to extract required position values.
In this code thresh1, thresh2 i'll update based on the file.
So using this code I can successfully read positions of tables in images, using this position i'll work on my invoice pdf file. For example
Sample 1:
Sample 2:
Sample 3:
Sample 1:
Sample 2:
Sample 3:
But now I have a new format that contains a table without any borders. My entire approach depends on the borders of the table, so I don't know how to handle this case. My question is: is there any way to find the position based on the table structure?
For example My problem input looks like below:
I would like to find its position like below:
How can I solve this?
I would really appreciate any idea for solving this problem.
Thanks in advance.
Vaibhav is right. You can experiment with the different morphological transforms to extract or group pixels into different shapes, lines, etc. For example, the approach can be the following:
Start with dilation to convert the text into solid spots.
Then apply the findContours function as a next step to find the text bounding boxes.
Once you have the text bounding boxes, you can apply some heuristic algorithm to cluster the text boxes into groups by their coordinates. This way you can find groups of text areas aligned into rows and columns.
Then you can apply sorting by x and y coordinates and/or some analysis to the groups to check whether the grouped text boxes can form a table.
I wrote a small sample illustrating the idea. I hope the code is self explanatory. I've put some comments there too.
import os
import cv2
import imutils
# This only works if there's only one table on a page
# Important parameters:
# - morph_size
# - min_text_height_limit
# - max_text_height_limit
# - cell_threshold
# - min_columns
def pre_process_image(img, save_in_file, morph_size=(8, 8)):
    # get rid of the color
    pre = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Otsu threshold
    pre = cv2.threshold(pre, 250, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
    # dilate the text to make it solid spot
    cpy = pre.copy()
    struct = cv2.getStructuringElement(cv2.MORPH_RECT, morph_size)
    cpy = cv2.dilate(~cpy, struct, anchor=(-1, -1), iterations=1)
    pre = ~cpy
    if save_in_file is not None:
        cv2.imwrite(save_in_file, pre)
    return pre
def find_text_boxes(pre, min_text_height_limit=6, max_text_height_limit=40):
    # Looking for the text spots contours
    # OpenCV 3
    # img, contours, hierarchy = cv2.findContours(pre, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
    # OpenCV 4
    contours, hierarchy = cv2.findContours(pre, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
    # Getting the texts bounding boxes based on the text size assumptions
    boxes = []
    for contour in contours:
        box = cv2.boundingRect(contour)
        h = box[3]
        if min_text_height_limit < h < max_text_height_limit:
            boxes.append(box)
    return boxes
def find_table_in_boxes(boxes, cell_threshold=10, min_columns=2):
    rows = {}
    cols = {}
    # Clustering the bounding boxes by their positions
    for box in boxes:
        (x, y, w, h) = box
        col_key = x // cell_threshold
        row_key = y // cell_threshold
        cols[col_key] = [box] if col_key not in cols else cols[col_key] + [box]
        rows[row_key] = [box] if row_key not in rows else rows[row_key] + [box]
    # Filtering out the clusters having less than 2 cols
    table_cells = list(filter(lambda r: len(r) >= min_columns, rows.values()))
    # Sorting the row cells by x coord
    table_cells = [list(sorted(tb)) for tb in table_cells]
    # Sorting rows by the y coord
    table_cells = list(sorted(table_cells, key=lambda r: r[0][1]))
    return table_cells
def build_lines(table_cells):
    if table_cells is None or len(table_cells) <= 0:
        return [], []
    max_last_col_width_row = max(table_cells, key=lambda b: b[-1][2])
    max_x = max_last_col_width_row[-1][0] + max_last_col_width_row[-1][2]
    max_last_row_height_box = max(table_cells[-1], key=lambda b: b[3])
    max_y = max_last_row_height_box[1] + max_last_row_height_box[3]
    hor_lines = []
    ver_lines = []
    for box in table_cells:
        x = box[0][0]
        y = box[0][1]
        hor_lines.append((x, y, max_x, y))
    for box in table_cells[0]:
        x = box[0]
        y = box[1]
        ver_lines.append((x, y, x, max_y))
    (x, y, w, h) = table_cells[0][-1]
    ver_lines.append((max_x, y, max_x, max_y))
    (x, y, w, h) = table_cells[0][0]
    hor_lines.append((x, max_y, max_x, max_y))
    return hor_lines, ver_lines
if __name__ == "__main__":
    in_file = os.path.join("data", "page.jpg")
    pre_file = os.path.join("data", "pre.png")
    out_file = os.path.join("data", "out.png")
    img = cv2.imread(os.path.join(in_file))
    pre_processed = pre_process_image(img, pre_file)
    text_boxes = find_text_boxes(pre_processed)
    cells = find_table_in_boxes(text_boxes)
    hor_lines, ver_lines = build_lines(cells)
    # Visualize the result
    vis = img.copy()
    # for box in text_boxes:
    #     (x, y, w, h) = box
    #     cv2.rectangle(vis, (x, y), (x + w - 2, y + h - 2), (0, 255, 0), 1)
    for line in hor_lines:
        [x1, y1, x2, y2] = line
        cv2.line(vis, (x1, y1), (x2, y2), (0, 0, 255), 1)
    for line in ver_lines:
        [x1, y1, x2, y2] = line
        cv2.line(vis, (x1, y1), (x2, y2), (0, 0, 255), 1)
    cv2.imwrite(out_file, vis)
I've got the following output:
Of course to make the algorithm more robust and applicable to a variety of different input images it has to be adjusted correspondingly.
Update: Updated the code with respect to the OpenCV API changes for findContours. If you have older version of OpenCV installed - use the corresponding call. Related post.
You can try applying some morphological transforms (such as dilation or erosion), possibly combined with a Gaussian blur, as a pre-processing step before your findContours call.
For example
blur = cv2.GaussianBlur(g, (3, 3), 0)
ret, thresh1 = cv2.threshold(blur, 150, 255, cv2.THRESH_BINARY)
bitwise = cv2.bitwise_not(thresh1)
erosion = cv2.erode(bitwise, np.ones((1, 1) ,np.uint8), iterations=5)
dilation = cv2.dilate(erosion, np.ones((3, 3) ,np.uint8), iterations=5)
The last argument, iterations, sets the degree of dilation/erosion that will take place (in your case, on the text). A small value results in small independent contours, even within a single letter, while a large value merges many nearby elements. You need to find the value at which only that block of your image ends up as one contour.
Please note that I've taken 150 as the threshold parameter because I've been working on extracting text from images with varying backgrounds and this worked out better. You can choose to continue with the value you've taken since it's a black & white image.
Tables in document images come in many types, with a lot of variation in layout. No matter how many rules you write, there will always be a table for which your rules fail. These kinds of problems are generally solved with ML (machine learning) based solutions. You can find many ready-made implementations on GitHub that detect tables in images using ML or DL (deep learning).
Here is my code along with the deep learning models, the model can detect various types of tables as well as the structure cells from the tables:
The approach achieves state of the art on various public datasets right now (10th May 2020) as far as the accuracy is concerned
More details :
this would be helpful for you.
I've drawn a bounding box for each word in my invoice; then I choose only the fields that I want. You can use an ROI (Region Of Interest) for that.
import pytesseract
import cv2
img = cv2.imread(r'path\Invoice2.png')
d = pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT)
n_boxes = len(d['level'])
for i in range(n_boxes):
    (x, y, w, h) = (d['left'][i], d['top'][i], d['width'][i], d['height'][i])
    img = cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 1)
cv2.imshow('img', img)
You will get this output:

putting the detected face into an image and display in another window in opencv using python

I'm new to image processing and OpenCV, but so far the easy-to-understand functions and good documentation have let me try out and understand code such as face detection up to a point.
When I detect faces in the webcam video stream, the program draws a square around each face. Now I want the area of the image inside that square to be created as another image. With what I've been doing so far, I get a rectangular area of the image in which the face is not even present.
i've used the cv.GetSubRect() and understood its use. Like for eg:
sub=cv.GetSubRect(img, (700,525,200,119))
gives me the cropped picture of my eye.
But i can't get the face in my face and eye detection program.
Here's what i've done:
min_size = (17,17)
#max_size = (30,30)
image_scale = 2
haar_scale = 2
min_neighbors = 2
haar_flags = 0
# Allocate the temporary images
gray = cv.CreateImage((image.width, image.height), 8, 1)
smallImage = cv.CreateImage((cv.Round(image.width / image_scale),cv.Round (image.height / image_scale)), 8 ,1)
#eyeregion = cv.CreateImage((cv.Round(image.width / image_scale),cv.Round (image.height / image_scale)), 8 ,1)
# Convert color input image to grayscale
cv.CvtColor(image, gray, cv.CV_BGR2GRAY)
# Scale input image for faster processing
cv.Resize(gray, smallImage, cv.CV_INTER_LINEAR)
# Equalize the histogram
cv.EqualizeHist(smallImage, smallImage)
# Detect the faces
faces = cv.HaarDetectObjects(smallImage, faceCascade, cv.CreateMemStorage(0),
haar_scale, min_neighbors, haar_flags, min_size)
#, max_size)
# If faces are found
if faces:
    for ((x, y, w, h), n) in faces:
        # the input to cv.HaarDetectObjects was resized, so scale the
        # bounding box of each face and convert it to two CvPoints
        pt1 = (int(x * image_scale), int(y * image_scale))
        pt2 = (int((x + w) * image_scale), int((y + h) * image_scale))
        cv.Rectangle(image, pt1, pt2, cv.RGB(255, 0, 0), 3, 4, 0)
        face_region = cv.GetSubRect(image,(x,int(y + (h/4)),w,int(h/2)))
    cv.SetImageROI(image, (pt1[0],
                           pt1[1],
                           pt2[0] - pt1[0],
                           int((pt2[1] - pt1[1]) * 0.7)))
    eyes = cv.HaarDetectObjects(image, eyeCascade,
                                cv.CreateMemStorage(0),
                                eyes_haar_scale, eyes_min_neighbors,
                                eyes_haar_flags, eyes_min_size)
    if eyes:
        # For each eye found
        for eye in eyes:
            # eye[0][0],eye[0][1] are x,y co-ordinates of the top-left corner of detected eye
            # eye[0][2],eye[0][3] are the width and height of the cvRect of the detected eye region (i mean c'mon, that can be made out from the for loop of the face detection)
            # Draw a rectangle around the eye
            ept1 = (eye[0][0],eye[0][1])
            ept2 = ((eye[0][0]+eye[0][2]),(eye[0][1]+eye[0][3]))
            cv.Rectangle(image,ept1,ept2,cv.RGB(0,0,255),1,8,0) # This is working..
            ea = ept1[0]
            eb = ept1[1]
            ec = (ept2[0]-ept1[0])
            ed = (ept2[1]-ept1[1])
            # i've tried multiplying with image_scale to get the eye region within
            # the window of eye but still i'm getting just a top-left area of the
            # image, top-left to my head. It does make sense to multiply with image_scale right?
            eyeregion=cv.GetSubRect(image, (ea,eb,ec,ed))
I hope this code is from OpenCV/samples/Python. There is a small mistake in the arguments you have given for the co-ordinates inside cv.GetSubRect. Please replace last two lines of above program with following:
face_region = cv.GetSubRect(image,(a,b,c,d))
Make sure, you have no false detection or multiple detection.
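One likely cause of the off-target crop is that the detector ran on an image downscaled by image_scale, while the crop is taken from the full-size image with the unscaled rectangle. A minimal sketch of the coordinate fix, assuming the newer cv2 API in which images are NumPy arrays (the question uses the legacy cv module, where the same scaled values would go into cv.GetSubRect); the helper crop_detection is hypothetical:

```python
import numpy as np

def crop_detection(image, rect, image_scale):
    """Crop a full-size image using a rectangle detected on a downscaled copy."""
    # Scale x, y, w, h back to full-resolution pixel coordinates
    x, y, w, h = (int(round(v * image_scale)) for v in rect)
    # NumPy images are indexed as [row, column], i.e. [y, x]
    return image[y:y + h, x:x + w]

# Synthetic 480x640 frame and a face rectangle found on the half-size image
frame = np.arange(480 * 640).reshape(480, 640)
face_rect_small = (100, 50, 40, 30)   # x, y, w, h in downscaled coordinates
face = crop_detection(frame, face_rect_small, image_scale=2)
print(face.shape)
```

The slice is a view into the original frame; call .copy() on it if the crop should be modified or kept independently.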

