Python: Accelerate numpy brute force 2d image searching

Python: Accelerate numpy brute force 2d image searching - python

I have two images, where part of one image is present in another, however the position is not necessarily equal.
These two example images are significantly less complex than the typical drawing, so no keypoint feature matching methods will work. The objective is to find the coordinates of the top left corner of the second image that optimally joins to the two images together.
My approach is simply brute force, with a count of the black pixels to determine when the most amount of overlap has been achieved. This method works most the time, however for large images is obviously extremely slow.
Im hoping to accelerate this either with GPU acceleration or just optimisation of the method. Numba #jit decorators arent compatible with mask based numpy indexing so that wont work - the only obvious thing I can think of is cython. Cupy has also not produced good results.
Separate to reducing the number of loops (reducing search space or increasing step) are there any obvious accelerations that can be made?
def get_overlap_box(box1, box2):
# Unpack the coordinates of the two boxes
x1, y1, w1, h1 = box1
x2, y2, w2, h2 = box2
# Calculate the horizontal overlap
x_overlap = max(0, min(x1 + w1, x2 + w2) - max(x1, x2))
# Calculate the vertical overlap
y_overlap = max(0, min(y1 + h1, y2 + h2) - max(y1, y2))
# Calculate the top-left corner of the overlap box
x_overlap_start = max(x1, x2)
y_overlap_start = max(y1, y2)
return [x_overlap_start, y_overlap_start, x_overlap, y_overlap]
def find_join_point(img1, img2, step=2):
im1h, im1w = img1.shape
im2h, im2w = img2.shape
canvas1 = img1.copy()
canvas2 = img2.copy()
count = []
black_pixel_count_dict_reverse = {}
for x in tqdm.tqdm(range(- im2w, im1w, step)):
for y in range(- im2h, im1h, step):
# get the coordinates of the two images relative to the top left corner of the first image
box2 = [x, y, im2w, im2h]
box1 = [0, 0, im1w , im1h]
# Get coordinates of the overlapping box relative to the first image
box3 = get_overlap_box(box1, box2)
# extract the overlapping region from both canvases for comaparison
sampleregion1 = canvas1[box3[1]:box3[3] + box3[1], box3[0]:box3[2] + box3[0]]
replacement = sampleregion1.copy()
sampleregion2 = canvas2[box3[1] - y:box3[3] + box3[1] - y, box3[0] - x:box3[2] + box3[0] - x]
#Dont continue for completely white sample regions
if np.all(sampleregion1 == 255) or np.all(sampleregion2 == 255):
continue
# merge the two overlapping regions
sampleregion1[sampleregion2 == 0] = 0
# replace the orignal with the new sample region
canvas1[box3[1]:box3[3] + box3[1], box3[0]:box3[2] + box3[0]] = sampleregion1
# count the black pixels across both images
black_pixel_count = np.count_nonzero(canvas1 == 0) - np.count_nonzero(sampleregion2 == 0)
black_pixel_count_dict_reverse[black_pixel_count] = [x, y]
# reset the first canvas
canvas1[box3[1]:box3[3] + box3[1], box3[0]:box3[2] + box3[0]] = replacement
# keep the count for analysis later
count.extend([black_pixel_count])
# find the x,y position with the minimum black pixel count
optimum = black_pixel_count_dict_reverse[min(count)]
print(optimum)
return optimum
Thanks in advance.

If there is only translation between the two images and no rotation, what you need is to find the maximum of the cross correlation between the two images.
This is handled nicely by skimage.registration.phase_cross_correlation, and my solution below is taken from scikit-image's image registration example.
import skimage.registration
shift, _, _ = skimage.registration.phase_cross_correlation(img1, img2)
You can also use the upsample_factor keyword argument of phase_cross_correlation if you want subpixel precision of the shift between your two images.

Related

translate an image after rotation without using library

I try to rotate an image clockwise 45 degree and translate the image -50,-50.
Rotation process works fine:(I refer to this page:How do I rotate an image manually without using cv2.getRotationMatrix2D)
import numpy as np
import math
from scipy import ndimage
from PIL import Image
# inputs
img = ndimage.imread("A.png")
rotation_amount_degree = 45
# convert rotation amount to radian
rotation_amount_rad = rotation_amount_degree * np.pi / 180.0
# get dimension info
height, width, num_channels = img.shape
# create output image, for worst case size (45 degree)
max_len = int(math.sqrt(height*height + width*width))
rotated_image = np.zeros((max_len, max_len, num_channels))
#rotated_image = np.zeros((img.shape))
rotated_height, rotated_width, _ = rotated_image.shape
mid_row = int( (rotated_height+1)/2 )
mid_col = int( (rotated_width+1)/2 )
# for each pixel in output image, find which pixel
#it corresponds to in the input image
for r in range(rotated_height):
for c in range(rotated_width):
# apply rotation matrix, the other way
y = (r-mid_col)*math.cos(rotation_amount_rad) + (c-mid_row)*math.sin(rotation_amount_rad)
x = -(r-mid_col)*math.sin(rotation_amount_rad) + (c-mid_row)*math.cos(rotation_amount_rad)
# add offset
y += mid_col
x += mid_row
# get nearest index
#a better way is linear interpolation
x = round(x)
y = round(y)
#print(r, " ", c, " corresponds to-> " , y, " ", x)
# check if x/y corresponds to a valid pixel in input image
if (x >= 0 and y >= 0 and x < width and y < height):
rotated_image[r][c][:] = img[y][x][:]
# save output image
output_image = Image.fromarray(rotated_image.astype("uint8"))
output_image.save("rotated_image.png")
However, when I try to translate the image. I edited the above code to this:
if (x >= 0 and y >= 0 and x < width and y < height):
rotated_image[r-50][c-50][:] = img[y][x][:]
But I got something like this:
It seems the right and the bottom did not show the right pixel. How could I solve it?
Any suggestions would be highly appreciated.

The translation needs to be handled as a wholly separate step. Trying to translate the value from the source image doesn't account for newly created 0,0,0 (if RGB) valued pixels by the rotation.
Further, simply subtracting 50 from the rotated array index values, without validating them at that stage for positivity, is allowing for a negative valued index, which is fully supported by Python. That is why you are getting a "wrap" effect instead of a translation
You said your script rotated the image as intended, so while perhaps not the most efficient, the most intuitive is to simply shift the values of the image assembled after you rotate. You could test that the values for the new image remain positive after subtracting 50 and only saving the ones >= 0 or being cognizant of the fact that you are shifting the values downward by 50, any number less than 50 will be discarded and you get:
<what you in the block you said was functional then:>
translated_image = np.zeros((max_len, max_len, num_channels))
for i in range(0, rotated_height-50): # range(start, stop[, step])
for j in range(0, rotated_width-50):
translated_image[i+50][j+50][:] = rotated[i][j][:]
# save output image
output_image = Image.fromarray(translated_image.astype("uint8"))
output_image.save("rotated_translated_image.png")

How to draw repeated slanted lines

I need to draw slanted lines like this programmatically using opencv-python, and it has to be similar in terms of the slant angle and the distance between the lines:
If using OpenCV cv.line() i need to supply the function with the line's start and endpoint.
Following this StackOverflow accepted answer, I think I will be able to know those two points, but first I need to calculate the line equation itself.
So what I have done is first I calculate the slant angle of the line using the measure tool in ai (The actual image was given by the graphic designer as ai (adobe illustrator) file), and I got 67deg and I solve the gradient of the line. But the problem is I don't know how to get the horizontal spacing/distance between the lines. I needed that so i can supply the start.X. I used the illustrator, and try to measure the distance between the lines but how to map it to opencv coordinate?
Overall is my idea feasible? Or is there a better way to achieve this?
Update 1:
I managed to draw this experimental image:
And this is code:
def show_image_scaled(window_name,image,height,width):
cv2.namedWindow(window_name,cv2.WINDOW_NORMAL)
cv2.resizeWindow(window_name,width,height)
cv2.imshow(window_name,image)
cv2.waitKey(0)
cv2.destroyAllWindows()
def slanted_lines_background():
canvas = np.ones((200,300)) * 255
end_x = 0
start_y = 0
m = 2.35
end_x = 0
for x in range(0,canvas.shape[1],10):
start_x = x
end_y = start_y + compute_length(m,start_x,start_y,end_x)
cv2.line(canvas,(start_x,start_y),(end_x,end_y),(0,0,0),2)
show_image_scaled("Slant",canvas,200,300)
def compute_length(m,start_x,start_y,end_x=0):
c = start_y - (m * start_x)
length_square = (end_x - start_x)**2 + ((m *end_x) + c - start_y) ** 2
length = math.sqrt(length_square)
return int(length)
Still working on to fill the left part of the rectangle

This code "shades" every pixel in a given image to produce your hatched pattern. Don't worry about the math. It's mostly correct. I've checked the edge cases for small and wide lines. The sampling isn't exactly correct but nobody's gonna notice anyway because the imperfection amounts to small fractions of a pixel. And I've used numba to make it fast.
import numpy as np
from numba import njit, prange
#njit(parallel=True)
def hatch(im, angle=45, stride=10, dc=None):
stride = float(stride)
if dc is None:
dc = stride * 0.5
assert 0 <= dc <= stride
stride2 = stride / 2
dc2 = dc / 2
angle = angle / 180 * np.pi
c = np.cos(angle)
s = np.sin(angle)
(height, width) = im.shape[:2]
for y in prange(height):
for x in range(width):
# distance to origin along normal
dist_origin = c*x - s*y
# distance to center of nearest line
dist_center = stride2 - abs((dist_origin % stride) - stride2)
# distance to edge of nearest line
dist_edge = dist_center - dc2
# shade pixel, with antialiasing
# use edge-0.5 to edge+0.5 as "gradient" <=> 1-sized pixel straddles edge
# for thick/thin lines, needs hairline handling
# thin line -> gradient hits far edge of line / pixel may span both edges of line
# thick line -> gradient hits edge of adjacent line / pixel may span adjacent line
if dist_edge > 0.5: # background
val = 0
else: # pixel starts covering line
val = 0.5 - dist_edge
if dc < 1: # thin line, clipped to line width
val = min(val, dc)
elif stride - dc < 1: # thick line, little background
val = max(val, 1 - (stride - dc))
im[y,x] = val
canvas = np.zeros((128, 512), 'f4')
hatch(canvas, angle=-23, stride=5, dc=2.5)
# mind the gamma mapping before imshow

Tiling fixed-size rectangles to cover a given set of points

The problem - given a list of planar points [p_1, ..., p_n] and the dimensions of some rectangle w, h, find the minimal set of rectangles w, h that cover all points (edit - the rectangles are not rotated).
My inital solution was:
find the bounding-box of all points
divide the width and height of the bounding-box by the w, h of the given rectangle and round the number up to get the number of instances of the rectangle in x and y
to further optimize, go through all rectangles and delete the ones that have zero points inside them.
An example in Python:
def tile_rect(points, rect):
w, h = rect
xs = [p.x for p in points]
ys = [p.y for p in points]
bbox_w = abs(max(xs) - min(xs))
bbox_h = abs(max(ys) - min(ys))
n_x, n_y = ceil(bbox_w / w), ceil(bbox_h / h)
rect_xs = [(min(xs) + n * w for n in range(n_x)]
rect_ys = [(min(ys) + n * h for n in range(n_y)]
rects = remove_empty(rect_xs, rect_ys)
return rects
How can I do better? What algorithm can I use to decrease the number of rectangles?

To discretize the problem for integer programming, observe that given a rectangle we can slide it in the +x and +y directions without decreasing the coverage until the min x and the min y lines both have a point on them. Thus the integer program is just the standard min cover:
minimize sum_R x_R
subject to
for every point p, sum_{R contains p} x_R >= 1
x_R in {0, 1}
where R ranges over all rectangles whose min x is the x of some point and whose min y is the y of some point (not necessarily the same point).
Demo Python:
import random
from ortools.linear_solver import pywraplp
w = 0.1
h = 0.1
points = [(random.random(), random.random()) for _ in range(100)]
rectangles = [(x, y) for (x, _) in points for (_, y) in points]
solver = pywraplp.Solver.CreateSolver("min cover", "SCIP")
objective = solver.Objective()
constraints = [solver.RowConstraint(1, pywraplp.inf, str(p)) for p in points]
variables = [solver.BoolVar(str(r)) for r in rectangles]
for (x, y), var in zip(rectangles, variables):
objective.SetCoefficient(var, 1)
for (px, py), con in zip(points, constraints):
if x <= px <= x + w and y <= py <= y + h:
con.SetCoefficient(var, 1)
solver.Objective().SetMinimization()
solver.Solve()
scale = 6 * 72
margin = 72
print(
'<svg width="{}" height="{}">'.format(
margin + scale + margin, margin + scale + margin
)
)
print(
'<text x="{}" y="{}">{} rectangles</text>'.format(
margin // 2, margin // 2, round(objective.Value())
)
)
for x, y in points:
print(
'<circle cx="{}" cy="{}" r="3" fill="none" stroke="black"/>'.format(
margin + x * scale, margin + y * scale
)
)
for (x, y), var in zip(rectangles, variables):
if var.solution_value():
print(
'<rect x="{}" y="{}" width="{}" height="{}" fill="none" stroke="rgb({},{},{})"/>'.format(
margin + x * scale,
margin + y * scale,
w * scale,
h * scale,
random.randrange(192),
random.randrange(192),
random.randrange(192),
)
)
print("</svg>")
Example output:

Assuming an approximate, rather than optimal solution is acceptable, how about a routine generally like:
Until no points are left:
(1) Find the convex hull of the remaining points.
(2) Cover each point/s on the hull so the
covering rectangles extend "inward."
(Perhaps examine neighbouring hull points
to see if a rectangle can cover more than one.)
(3) Remove the newly covered points.
Clearly, the orientation of the covering rectangles has an effect on the procedure and result. I think there is a way to combine (1) and (3), or possibly rely on a nested convex hull, but I don't have too much experience with those.

This is can be transformed into a mostly standard set cover problem. The general steps are as follows, given n points in the plane.
First, generate all possible maximally inclusive rectangles, of which there are at most n^2, named R. The key insight is that given a point p1 with coordinates (x1, y1), use x1 as the leftmost bound for a set of rectangles. For all other points p2 with (x2,y2) where x1 <= x2 <= x1+w and where y1-h <= y2 <= y1+h, generate a rectangle ((x1, y2), (x1+w, y2+h)).
For each rectangle r generated, count the points included in that rectangle cover(r).
Choose a subset of the rectangles R, s, such that all points are in Union(r in s) cover(r)
Now, the tricky part is that last step. Fortunately, it is a standard problem and there are many algorithms suggested in the literature. For example, combinatorial optimization solvers (such as SAT solvers, MIP solvers, and Constraint programming solvers) can be used.
Note that the above re-formulation only works if it is ok for rectangles to cover each other. It might be the case that the generated set of rectangles is not enough to find the least set of rectangles that do not overlap.

Returning undistorted pixel coordinates rather than the image

I need to undistort the pixel coordinates of an image -- and need the corrected coordinates returned. I do not want an undistorted image returned-- just the corrected coordinates of the pixels. The camera is calibrated, and I have the camera intrinsic parameters, and the distortion matrix. I am using OpenCV in python 3
I have read up as much of the theory as I can find and questions here. Key info is:
https://docs.opencv.org/2.4/doc/tutorials/calib3d/camera_calibration/camera_calibration.html
This pretty clearly describes the radial distortion and tangential distortion that needs to be considered.
radial:
x_{corrected} = x( 1 + k_1 r^2 + k_2 r^4 + k_3 r^6)
y_{corrected} = y( 1 + k_1 r^2 + k_2 r^4 + k_3 r^6)
Tangential:
x_{corrected} = x + [ 2p_1xy + p_2(r^2+2x^2)]
y_{corrected} = y + [ p_1(r^2+ 2y^2)+ 2p_2xy]
I suspect that I can't simply apply these corrections sequentially. Perhaps there is a function to do what I want to do directly, anyway -- and I'd love to hear about that.
I can't simply use the normal undistort procedure on the image, as I am attempting to apply an IR camera's distortion correction to the depth data from the same camera. If you undistort a depth image like this -- you split pixels across coordinates and the answer makes no sense. Hopefully I am on the right track with this.
The code so far:
import numpy as np
import cv2
imgIR = cv2.imread("/20190529-150017-305-1235-depth.png",0)
#you could try this on any image...
h, w = imgIR.shape[:2]
X = np.array([i for i in range(0,w)]*(h))
X = X.reshape(h, w)
Y = np.array([[i]*(w) for i in range(0,h)])
fx = 483.0 #x focal length
fy = 490.2
CentreX = 361.4 #optical centre of the image - x axis
CentreY = 275.6
#Relative to the optical centre, it is possible to determine the `#coordinates of each pixel in the image`
#then do the above operation without loops using a scalar subtraction
Xref = X - CentreX
Yref = Y - CentreY
#"scaling factor" refers to the relation between depth units and meters;
scalingFactor = 18.0/36.0 # 18pixels / 36 mm;
# I'm not sure what should be yet -- whether [pixels at the shelf]/mm
#or mm/[pixels at the shelf]
Z = imgIR / scalingFactor
#using numpy
Xcoord = np.multiply(Xref,Z/fx)
Ycoord = np.multiply(Yref,Z/fy)
#how to correct these coords for the radial and tangential distortion?
#parameters as returned for the distortion matrix using
cv2.calibrateCamera
dstvec = array([[-0.1225, -0.0159, 0.001616, -0.0018924,-0.00120696]])
What I am looking for is a new matrix of undistorted (radial and tangential distortion removed) X coordinates and a matrix of undistored Y coordinates -- with each matrix element representing one of the original pixels.
Thanks for your help!

I think you are looking for OpenCV's undistortPoints (https://amroamroamro.github.io/mexopencv/matlab/cv.undistortPoints.html).
px_distorted = np.zeros((1, 1, 2), dtype=np.float32)
px_distorted[0][0][0] = x_coordinate
px_distorted[0][0][1] = y_coordinate
px_undistorted = cv2.undistortPoints(px_distorted, intrinsics_mat, dist_coefficients)

Calculating percentage of Bounding box overlap, for image detector evaluation

In testing an object detection algorithm in large images, we check our detected bounding boxes against the coordinates given for the ground truth rectangles.
According to the Pascal VOC challenges, there's this:
A predicted bounding box is considered correct if it overlaps more
than 50% with a ground-truth bounding box, otherwise the bounding box
is considered a false positive detection. Multiple detections are
penalized. If a system predicts several bounding boxes that overlap
with a single ground-truth bounding box, only one prediction is
considered correct, the others are considered false positives.
This means that we need to calculate the percentage of overlap. Does this mean that the ground truth box is 50% covered by the detected boundary box? Or that 50% of the bounding box is absorbed by the ground truth box?
I've searched but I haven't found a standard algorithm for this - which is surprising because I would have thought that this is something pretty common in computer vision. (I'm new to it). Have I missed it? Does anyone know what the standard algorithm is for this type of problem?

For axis-aligned bounding boxes it is relatively simple. "Axis-aligned" means that the bounding box isn't rotated; or in other words that the boxes lines are parallel to the axes. Here's how to calculate the IoU of two axis-aligned bounding boxes.
def get_iou(bb1, bb2):
"""
Calculate the Intersection over Union (IoU) of two bounding boxes.
Parameters
----------
bb1 : dict
Keys: {'x1', 'x2', 'y1', 'y2'}
The (x1, y1) position is at the top left corner,
the (x2, y2) position is at the bottom right corner
bb2 : dict
Keys: {'x1', 'x2', 'y1', 'y2'}
The (x, y) position is at the top left corner,
the (x2, y2) position is at the bottom right corner
Returns
-------
float
in [0, 1]
"""
assert bb1['x1'] < bb1['x2']
assert bb1['y1'] < bb1['y2']
assert bb2['x1'] < bb2['x2']
assert bb2['y1'] < bb2['y2']
# determine the coordinates of the intersection rectangle
x_left = max(bb1['x1'], bb2['x1'])
y_top = max(bb1['y1'], bb2['y1'])
x_right = min(bb1['x2'], bb2['x2'])
y_bottom = min(bb1['y2'], bb2['y2'])
if x_right < x_left or y_bottom < y_top:
return 0.0
# The intersection of two axis-aligned bounding boxes is always an
# axis-aligned bounding box
intersection_area = (x_right - x_left) * (y_bottom - y_top)
# compute the area of both AABBs
bb1_area = (bb1['x2'] - bb1['x1']) * (bb1['y2'] - bb1['y1'])
bb2_area = (bb2['x2'] - bb2['x1']) * (bb2['y2'] - bb2['y1'])
# compute the intersection over union by taking the intersection
# area and dividing it by the sum of prediction + ground-truth
# areas - the interesection area
iou = intersection_area / float(bb1_area + bb2_area - intersection_area)
assert iou >= 0.0
assert iou <= 1.0
return iou
Explanation
Images are from this answer

The top-voted answer has a mathematical error if you are working with screen (pixel) coordinates! I submitted an edit a few weeks ago with a long explanation for all readers so that they would understand the math. But that edit wasn't understood by the reviewers and was removed, so I've submitted the same edit again, but more briefly summarized this time. (Update: Rejected 2vs1 because it was deemed a "substantial change", heh).
So I will completely explain the BIG problem with its math here in this separate answer.
So, yes, in general, the top-voted answer is correct and is a good way to calculate the IoU. But (as other people have pointed out too) its math is completely incorrect for computer screens. You cannot just do (x2 - x1) * (y2 - y1), since that will not produce the correct area calculations whatsoever. Screen indexing starts at pixel 0,0 and ends at width-1,height-1. The range of screen coordinates is inclusive:inclusive (inclusive on both ends), so a range from 0 to 10 in pixel coordinates is actually 11 pixels wide, because it includes 0 1 2 3 4 5 6 7 8 9 10 (11 items). So, to calculate the area of screen coordinates, you MUST therefore add +1 to each dimension, as follows: (x2 - x1 + 1) * (y2 - y1 + 1).
If you're working in some other coordinate system where the range is not inclusive (such as an inclusive:exclusive system where 0 to 10 means "elements 0-9 but not 10"), then this extra math would NOT be necessary. But most likely, you are processing pixel-based bounding boxes. Well, screen coordinates start at 0,0 and go up from there.
A 1920x1080 screen is indexed from 0 (first pixel) to 1919 (last pixel horizontally) and from 0 (first pixel) to 1079 (last pixel vertically).
So if we have a rectangle in "pixel coordinate space", to calculate its area we must add 1 in each direction. Otherwise, we get the wrong answer for the area calculation.
Imagine that our 1920x1080 screen has a pixel-coordinate based rectangle with left=0,top=0,right=1919,bottom=1079 (covering all pixels on the whole screen).
Well, we know that 1920x1080 pixels is 2073600 pixels, which is the correct area of a 1080p screen.
But with the wrong math area = (x_right - x_left) * (y_bottom - y_top), we would get: (1919 - 0) * (1079 - 0) = 1919 * 1079 = 2070601 pixels! That's wrong!
That is why we must add +1 to each calculation, which gives us the following corrected math: area = (x_right - x_left + 1) * (y_bottom - y_top + 1), giving us: (1919 - 0 + 1) * (1079 - 0 + 1) = 1920 * 1080 = 2073600 pixels! And that's indeed the correct answer!
The shortest possible summary is: Pixel coordinate ranges are inclusive:inclusive, so we must add + 1 to each axis if we want the true area of a pixel coordinate range.
For a few more details about why +1 is needed, see Jindil's answer: https://stackoverflow.com/a/51730512/8874388
As well as this pyimagesearch article:
https://www.pyimagesearch.com/2016/11/07/intersection-over-union-iou-for-object-detection/
And this GitHub comment:
https://github.com/AlexeyAB/darknet/issues/3995#issuecomment-535697357
Since the fixed math wasn't approved, anyone who copies the code from the top-voted answer hopefully sees this answer, and will be able to bugfix it themselves, by simply copying the bugfixed assertions and area-calculation lines below, which have been fixed for inclusive:inclusive (pixel) coordinate ranges:
assert bb1['x1'] <= bb1['x2']
assert bb1['y1'] <= bb1['y2']
assert bb2['x1'] <= bb2['x2']
assert bb2['y1'] <= bb2['y2']
................................................
# The intersection of two axis-aligned bounding boxes is always an
# axis-aligned bounding box.
# NOTE: We MUST ALWAYS add +1 to calculate area when working in
# screen coordinates, since 0,0 is the top left pixel, and w-1,h-1
# is the bottom right pixel. If we DON'T add +1, the result is wrong.
intersection_area = (x_right - x_left + 1) * (y_bottom - y_top + 1)
# compute the area of both AABBs
bb1_area = (bb1['x2'] - bb1['x1'] + 1) * (bb1['y2'] - bb1['y1'] + 1)
bb2_area = (bb2['x2'] - bb2['x1'] + 1) * (bb2['y2'] - bb2['y1'] + 1)

A Simple way for any kind of polygon.
(Image is not drawn to scale)
from shapely.geometry import Polygon
def calculate_iou(box_1, box_2):
poly_1 = Polygon(box_1)
poly_2 = Polygon(box_2)
iou = poly_1.intersection(poly_2).area / poly_1.union(poly_2).area
return iou
box_1 = [[511, 41], [577, 41], [577, 76], [511, 76]]
box_2 = [[544, 59], [610, 59], [610, 94], [544, 94]]
print(calculate_iou(box_1, box_2))
The result will be 0.138211... which means 13.82%.
You can also use shapely.geometry.box if your box is rectangular [minx, miny, maxx, maxy] shape.
Note: The origin of Coordinate Systems in shapely library is left-bottom where origin in computer graphics is left-top. This difference does not affect the IoU calculation, but if you do other types of calculation, this information might be helpful.

You can calculate with torchvision as follows. The bbox is prepared in the format of [x1, y1, x2, y2].
import torch
import torchvision.ops.boxes as bops
box1 = torch.tensor([[511, 41, 577, 76]], dtype=torch.float)
box2 = torch.tensor([[544, 59, 610, 94]], dtype=torch.float)
iou = bops.box_iou(box1, box2)
# tensor([[0.1382]])

For the intersection distance, shouldn't we add a +1 so as to have
intersection_area = (x_right - x_left + 1) * (y_bottom - y_top + 1)
(same for the AABB)
Like on this pyimage search post
I agree (x_right - x_left) x (y_bottom - y_top) works in mathematics with point coordinates but since we deal with pixels it is I think different.
Consider a 1D example :
2 points : x1 = 1 and x2 = 3, the distance is indeed x2-x1 = 2
2 pixels of index : i1 = 1 and i2 = 3, the segment from pixel i1 to i2 contains 3 pixels ie l = i2 - i1 + 1
EDIT: I recently got to know that this is a "little-square" approach.
If however you consider pixels as point-samples (ie the bounding box corner would be at the centre of the pixel as apparently in matplotlib) then you don't need the +1.
See this comment and this illustration

import numpy as np
def box_area(arr):
# arr: np.array([[x1, y1, x2, y2]])
width = arr[:, 2] - arr[:, 0]
height = arr[:, 3] - arr[:, 1]
return width * height
def _box_inter_union(arr1, arr2):
# arr1 of [N, 4]
# arr2 of [N, 4]
area1 = box_area(arr1)
area2 = box_area(arr2)
# Intersection
top_left = np.maximum(arr1[:, :2], arr2[:, :2]) # [[x, y]]
bottom_right = np.minimum(arr1[:, 2:], arr2[:, 2:]) # [[x, y]]
wh = bottom_right - top_left
# clip: if boxes not overlap then make it zero
intersection = wh[:, 0].clip(0) * wh[:, 1].clip(0)
#union
union = area1 + area2 - intersection
return intersection, union
def box_iou(arr1, arr2):
# arr1[N, 4]
# arr2[N, 4]
# N = number of bounding boxes
assert(arr1[:, 2:] > arr[:, :2]).all()
assert(arr2[:, 2:] > arr[:, :2]).all()
inter, union = _box_inter_union(arr1, arr2)
iou = inter / union
print(iou)
box1 = np.array([[10, 10, 80, 80]])
box2 = np.array([[20, 20, 100, 100]])
box_iou(box1, box2)
reference: https://pytorch.org/vision/stable/_modules/torchvision/ops/boxes.html#nms

In the snippet below, I construct a polygon along the edges of the first box. I then use Matplotlib to clip the polygon to the second box. The resulting polygon contains four vertices, but we are only interested in the top left and bottom right corners, so I take the max and the min of the coordinates to get a bounding box, which is returned to the user.
import numpy as np
from matplotlib import path, transforms
def clip_boxes(box0, box1):
path_coords = np.array([[box0[0, 0], box0[0, 1]],
[box0[1, 0], box0[0, 1]],
[box0[1, 0], box0[1, 1]],
[box0[0, 0], box0[1, 1]]])
poly = path.Path(np.vstack((path_coords[:, 0],
path_coords[:, 1])).T, closed=True)
clip_rect = transforms.Bbox(box1)
poly_clipped = poly.clip_to_bbox(clip_rect).to_polygons()[0]
return np.array([np.min(poly_clipped, axis=0),
np.max(poly_clipped, axis=0)])
box0 = np.array([[0, 0], [1, 1]])
box1 = np.array([[0, 0], [0.5, 0.5]])
print clip_boxes(box0, box1)

Maybe one for the more visually inclined, like me . . .
Say your ROIs are atop an HD Rez surface. You can make a matrix for each in numpy like . .
roi1 = np.zeros((1080, 1920))
Then "fill in" ROI area like . . .
roi1[y1:y2, x1:x2] = 1 # y1,x1 & y2,x2 are the ROI corners
Repeat for roi2. Then calculate IoU with a this function . . .
def calc_iou(roi1, roi2):
# Sum all "white" pixels clipped to 1
U = np.sum(np.clip(roi1 + roi2, 0 , 1))
# +1 for each overlapping white pixel (these will = 2)
I = len(np.where(roi1 + roi2 == 2)[0])
return(I/U)

how about this approach? Could be extended to any number of unioned shapes
surface = np.zeros([1024,1024])
surface[1:1+10, 1:1+10] += 1
surface[100:100+500, 100:100+100] += 1
unionArea = (surface==2).sum()
print(unionArea)

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Python: Accelerate numpy brute force 2d image searching - python

Related

translate an image after rotation without using library

How to draw repeated slanted lines

Tiling fixed-size rectangles to cover a given set of points

Returning undistorted pixel coordinates rather than the image

Calculating percentage of Bounding box overlap, for image detector evaluation

Categories

Resources