How to draw repeated slanted lines - python

I need to draw slanted lines like this programmatically using opencv-python, and the result has to be similar in terms of the slant angle and the distance between the lines:
If I use OpenCV's cv.line(), I need to supply the function with the line's start and end points.
Following this StackOverflow accepted answer, I think I will be able to find those two points, but first I need to calculate the line equation itself.
So what I have done first is calculate the slant angle of the line using the measure tool in Adobe Illustrator (the actual image was given to me by the graphic designer as an .ai file); I got 67 degrees, and from that I solved for the gradient of the line. The problem is that I don't know how to get the horizontal spacing/distance between the lines, which I need in order to supply start.X. I tried measuring the distance between the lines in Illustrator, but how do I map that to OpenCV coordinates?
Overall, is my idea feasible? Or is there a better way to achieve this?
Update 1:
I managed to draw this experimental image:
And this is the code:
import math
import cv2
import numpy as np

def show_image_scaled(window_name, image, height, width):
    cv2.namedWindow(window_name, cv2.WINDOW_NORMAL)
    cv2.resizeWindow(window_name, width, height)
    cv2.imshow(window_name, image)
    cv2.waitKey(0)
    cv2.destroyAllWindows()

def slanted_lines_background():
    canvas = np.ones((200, 300)) * 255
    end_x = 0
    start_y = 0
    m = 2.35
    for x in range(0, canvas.shape[1], 10):
        start_x = x
        end_y = start_y + compute_length(m, start_x, start_y, end_x)
        cv2.line(canvas, (start_x, start_y), (end_x, end_y), (0, 0, 0), 2)
    show_image_scaled("Slant", canvas, 200, 300)

def compute_length(m, start_x, start_y, end_x=0):
    c = start_y - (m * start_x)
    length_square = (end_x - start_x) ** 2 + ((m * end_x) + c - start_y) ** 2
    length = math.sqrt(length_square)
    return int(length)
I'm still working on filling the left part of the rectangle.
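A minimal sketch of one way to also cover the left part (an assumption on my side, reusing the slope and spacing from the code above): start start_x at a negative offset of roughly height / m, so the lines that cross the left edge get drawn too; cv2.line() clips anything that falls outside the image, so negative coordinates are harmless.

import math
import cv2
import numpy as np

def slanted_lines_background_full(height=200, width=300, m=2.35, spacing=10):
    # Sweep start_x from a negative offset so lines entering through the
    # left edge are drawn as well; the slope of every line is m.
    canvas = np.ones((height, width)) * 255
    offset = int(math.ceil(height / m))
    for start_x in range(-offset, width, spacing):
        cv2.line(canvas, (start_x, 0), (start_x + offset, height), (0, 0, 0), 2)
    return canvas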

This code "shades" every pixel in a given image to produce your hatched pattern. Don't worry about the math. It's mostly correct. I've checked the edge cases for small and wide lines. The sampling isn't exactly correct but nobody's gonna notice anyway because the imperfection amounts to small fractions of a pixel. And I've used numba to make it fast.
import numpy as np
from numba import njit, prange

@njit(parallel=True)
def hatch(im, angle=45, stride=10, dc=None):
    stride = float(stride)
    if dc is None:
        dc = stride * 0.5
    assert 0 <= dc <= stride
    stride2 = stride / 2
    dc2 = dc / 2
    angle = angle / 180 * np.pi
    c = np.cos(angle)
    s = np.sin(angle)
    (height, width) = im.shape[:2]
    for y in prange(height):
        for x in range(width):
            # distance to origin along normal
            dist_origin = c*x - s*y
            # distance to center of nearest line
            dist_center = stride2 - abs((dist_origin % stride) - stride2)
            # distance to edge of nearest line
            dist_edge = dist_center - dc2
            # shade pixel, with antialiasing
            # use edge-0.5 to edge+0.5 as "gradient" <=> 1-sized pixel straddles edge
            # for thick/thin lines, needs hairline handling
            # thin line -> gradient hits far edge of line / pixel may span both edges of line
            # thick line -> gradient hits edge of adjacent line / pixel may span adjacent line
            if dist_edge > 0.5: # background
                val = 0
            else: # pixel starts covering line
                val = 0.5 - dist_edge
                if dc < 1: # thin line, clipped to line width
                    val = min(val, dc)
                elif stride - dc < 1: # thick line, little background
                    val = max(val, 1 - (stride - dc))
            im[y,x] = val

canvas = np.zeros((128, 512), 'f4')
hatch(canvas, angle=-23, stride=5, dc=2.5)
# mind the gamma mapping before imshow
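For display, a small usage sketch (my own addition; the 2.2 gamma is just a rough assumption for an sRGB monitor), converting the float canvas to 8-bit before showing it:

import cv2
import numpy as np

# canvas holds linear-light coverage values; clip to [0, 1], apply a rough
# gamma, and convert to uint8 for display
display = (np.clip(canvas, 0, 1) ** (1 / 2.2) * 255).astype(np.uint8)
cv2.imshow("hatch", display)
cv2.waitKey(0)
cv2.destroyAllWindows()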

Related

translate an image after rotation without using library

I am trying to rotate an image clockwise by 45 degrees and then translate it by (-50, -50).
The rotation step works fine (I referred to this page: How do I rotate an image manually without using cv2.getRotationMatrix2D):
import numpy as np
import math
from scipy import ndimage
from PIL import Image

# inputs
img = ndimage.imread("A.png")
rotation_amount_degree = 45

# convert rotation amount to radian
rotation_amount_rad = rotation_amount_degree * np.pi / 180.0

# get dimension info
height, width, num_channels = img.shape

# create output image, for worst case size (45 degree)
max_len = int(math.sqrt(height*height + width*width))
rotated_image = np.zeros((max_len, max_len, num_channels))
#rotated_image = np.zeros((img.shape))

rotated_height, rotated_width, _ = rotated_image.shape
mid_row = int( (rotated_height+1)/2 )
mid_col = int( (rotated_width+1)/2 )

# for each pixel in output image, find which pixel
# it corresponds to in the input image
for r in range(rotated_height):
    for c in range(rotated_width):
        # apply rotation matrix, the other way
        y = (r-mid_col)*math.cos(rotation_amount_rad) + (c-mid_row)*math.sin(rotation_amount_rad)
        x = -(r-mid_col)*math.sin(rotation_amount_rad) + (c-mid_row)*math.cos(rotation_amount_rad)

        # add offset
        y += mid_col
        x += mid_row

        # get nearest index
        # a better way is linear interpolation
        x = round(x)
        y = round(y)

        #print(r, " ", c, " corresponds to-> " , y, " ", x)

        # check if x/y corresponds to a valid pixel in input image
        if (x >= 0 and y >= 0 and x < width and y < height):
            rotated_image[r][c][:] = img[y][x][:]

# save output image
output_image = Image.fromarray(rotated_image.astype("uint8"))
output_image.save("rotated_image.png")
However, when I try to translate the image it goes wrong. I edited the above code to this:
        if (x >= 0 and y >= 0 and x < width and y < height):
            rotated_image[r-50][c-50][:] = img[y][x][:]
But I got something like this:
It seems the right and bottom parts do not show the right pixels. How can I solve this?
Any suggestions would be highly appreciated.
The translation needs to be handled as a wholly separate step. Trying to translate the value from the source image doesn't account for the new (0, 0, 0)-valued pixels (if RGB) created by the rotation.
Further, simply subtracting 50 from the rotated array index values, without validating them at that stage for positivity, allows negative indices, which Python fully supports (they wrap around to the other end of the array). That is why you are getting a "wrap" effect instead of a translation.
You said your script rotated the image as intended, so while this is perhaps not the most efficient approach, the most intuitive one is simply to shift the values of the image assembled after you rotate. You could check that the indices of the new image remain positive after the shift and only keep the ones >= 0, or be cognizant of the fact that, since you are shifting by 50, anything within 50 pixels of the edge is discarded. Appended after the block you said was functional:
translated_image = np.zeros((max_len, max_len, num_channels))

for i in range(0, rotated_height-50):   # range(start, stop[, step])
    for j in range(0, rotated_width-50):
        translated_image[i+50][j+50][:] = rotated_image[i][j][:]

# save output image
output_image = Image.fromarray(translated_image.astype("uint8"))
output_image.save("rotated_translated_image.png")

Create random pattern gradient maps?

So I have been working on some procedural generation, and I managed to create a circular monochrome gradient map which I use to generate other maps.
def create_circular_gradient(self, world):
    center_x, center_y = self.mapSize[1] // 2, self.mapSize[0] // 2
    circle_grad = np.zeros_like(world)

    for y in range(world.shape[0]):
        for x in range(world.shape[1]):
            distx = abs(x - center_x)
            disty = abs(y - center_y)
            dist = math.sqrt(distx * distx + disty * disty)
            circle_grad[y][x] = dist

    # get it between -1 and 1
    max_grad = np.max(circle_grad)
    circle_grad = circle_grad / max_grad
    circle_grad -= 0.5
    circle_grad *= 2.0
    circle_grad = -circle_grad

    # shrink gradient
    for y in range(world.shape[0]):
        for x in range(world.shape[1]):
            if circle_grad[y][x] > 0:
                circle_grad[y][x] *= 20

    # get it between 0 and 1
    max_grad = np.max(circle_grad)
    circle_grad = circle_grad / max_grad

    grad_world = self.apply_gradient_noise(world, circle_grad)
    return grad_world
Now my question is: how exactly would I need to modify this to create a swirly gradient in a random way? And maybe also a box gradient, or a variety of linear gradients?
Note that world is a randomly generated 2D array (it can be any shape, like 100x100 or even 100x50, though of course I use something much bigger), and to the circular gradient I add Perlin noise to randomize the generation. The code that randomizes it will work for any map, but I'm not sure where to start when creating different shapes of gradient.
Edit
Ignore the magic numbers; that was the only way I found to create a circular monochrome gradient.
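For illustration only (a sketch of my own, separate from the class above): the same distance-to-center idea gives other gradient shapes simply by swapping the distance metric, e.g. a box gradient from the Chebyshev (max) distance and a linear gradient from the distance to one edge.

import numpy as np

def box_gradient(height, width):
    # Chebyshev (max) distance to the center -> square "box" gradient in [0, 1]
    y, x = np.mgrid[0:height, 0:width]
    cy, cx = height // 2, width // 2
    dist = np.maximum(np.abs(y - cy), np.abs(x - cx))
    return 1.0 - dist / dist.max()

def linear_gradient(height, width, horizontal=True):
    # distance to the left (or top) edge, scaled to [0, 1]
    y, x = np.mgrid[0:height, 0:width]
    ramp = x if horizontal else y
    return ramp / ramp.max()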

Implementing log Gabor filter bank

I was reading the paper "Self-Invertible 2D Log-Gabor Wavelets", which defines the 2D log-Gabor filter as follows:
The paper also states that the filter only covers one side of the frequency space, and shows that in this image.
In my attempt to implement the filter I get results that do not match what is said in the paper. Let me start with my implementation, then I will state the problems.
Implementation:
I created a 2d array that contains the filter and transformed each index so that the origin of the frequency domain is at the center of the array with positive x-axis going right and positive y-axis going up.
number_scales = 5        # scale resolution
number_orientations = 9  # orientation resolution
N = constantDim          # image dimensions

def getLogGaborKernal(scale, angle, logfun=math.log2, norm=True):
    # set up filter configuration
    center_scale = logfun(N) - scale
    center_angle = ((np.pi/number_orientations) * angle) if (scale % 2) \
                    else ((np.pi/number_orientations) * (angle+0.5))
    scale_bandwidth = 0.996 * math.sqrt(2/3)
    angle_bandwidth = 0.996 * (1/math.sqrt(2)) * (np.pi/number_orientations)

    # 2d array that will hold the filter
    kernel = np.zeros((N, N))
    # get the center of the 2d array so we can shift origin
    middle = math.ceil((N/2)+0.1)-1

    # calculate the filter
    for x in range(0, constantDim):
        for y in range(0, constantDim):
            # get the transformed x and y where origin is at center
            # and positive x-axis goes right while positive y-axis goes up
            x_t, y_t = (x-middle), -(y-middle)
            # calculate the filter value at given index
            kernel[y, x] = logGaborValue(x_t, y_t, center_scale, center_angle,
                                         scale_bandwidth, angle_bandwidth, logfun)

    # normalize the filter energy
    if norm:
        kernel = kernel / np.sum(kernel**2)
    return kernel
To calculate the filter value at each index, another transform is made, taking us to log-polar space:
def logGaborValue(x, y, center_scale, center_angle, scale_bandwidth,
                  angle_bandwidth, logfun):
    # transform to polar coordinates
    raw, theta = getPolar(x, y)
    # if we are at the center, return 0 as in the log space
    # zero is not defined
    if raw == 0:
        return 0

    # go to log polar coordinates
    raw = logfun(raw)

    # calculate (theta-center_theta); we calculate cos(theta-center_theta)
    # and sin(theta-center_theta) then use atan to get the required value,
    # this way we can eliminate the angular distance wrap around problem
    costheta, sintheta = math.cos(theta), math.sin(theta)
    ds = sintheta * math.cos(center_angle) - costheta * math.sin(center_angle)
    dc = costheta * math.cos(center_angle) + sintheta * math.sin(center_angle)
    dtheta = math.atan2(ds, dc)

    # final value, multiply the radial component by the angular one
    return math.exp(-0.5 * ((raw-center_scale) / scale_bandwidth)**2) * \
           math.exp(-0.5 * (dtheta/angle_bandwidth)**2)
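getPolar is not shown in the question; a minimal version of what it presumably does (my assumption: a plain Cartesian-to-polar conversion) would be:

import math

def getPolar(x, y):
    # radius and angle of the point (x, y)
    raw = math.sqrt(x * x + y * y)
    theta = math.atan2(y, x)
    return raw, theta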
Problems:
The angle: the paper states that indexing the angles from 1->8 should produce good coverage of orientation, but in my implementation angles from 1->n only cover half of the orientations; even the vertical orientation is not covered correctly. This can be seen in this figure, which contains the set of filters of scale 3 and orientations ranging from 1->8:
The coverage: from the filters above it is clear that the filter covers both sides of the frequency space, which is not what the paper says. This can be made more explicit by using 9 orientations ranging from -4 -> 4. The following image combines all the filters into one, to show how they cover both sides of the spectrum (the image is created by taking the maximum at each location over all filters):
Middle column (orientation $\pi / 2$): in the first figure, for orientations 3 -> 8, it can be seen that the filter vanishes at orientation $\pi / 2$. Is this normal? This can also be seen when I combine all the filters (of all 5 scales and 9 orientations) in one image:
Update:
Adding the impulse response of the filter in the spatial domain; as you can see, there is an obvious distortion in the -4 and 4 orientations:
After a lot of code analysis, I found that my implementation was correct and it was the getPolar function that was messed up, so the code above should work just fine. Here is the new code without the getPolar function, in case anyone is looking for it:
number_scales = 5        # scale resolution
number_orientations = 8  # orientation resolution
N = 128                  # image dimensions

def getFilter(f_0, theta_0):
    # filter configuration
    scale_bandwidth = 0.996 * math.sqrt(2/3)
    angle_bandwidth = 0.996 * (1/math.sqrt(2)) * (np.pi/number_orientations)

    # x,y grid
    extent = np.arange(-N/2, N/2 + N%2)
    x, y = np.meshgrid(extent, extent)
    mid = int(N/2)

    ## orientation component ##
    theta = np.arctan2(y, x)
    center_angle = ((np.pi/number_orientations) * theta_0) if (f_0 % 2) \
                    else ((np.pi/number_orientations) * (theta_0+0.5))

    # calculate (theta-center_theta); we calculate cos(theta-center_theta)
    # and sin(theta-center_theta) then use atan to get the required value,
    # this way we can eliminate the angular distance wrap around problem
    costheta = np.cos(theta)
    sintheta = np.sin(theta)
    ds = sintheta * math.cos(center_angle) - costheta * math.sin(center_angle)
    dc = costheta * math.cos(center_angle) + sintheta * math.sin(center_angle)
    dtheta = np.arctan2(ds, dc)

    orientation_component = np.exp(-0.5 * (dtheta/angle_bandwidth)**2)

    ## frequency component ##
    # go to polar space
    raw = np.sqrt(x**2 + y**2)
    # set origin to 1 as in the log space zero is not defined
    raw[mid, mid] = 1
    # go to log space
    raw = np.log2(raw)

    center_scale = math.log2(N) - f_0
    draw = raw - center_scale
    frequency_component = np.exp(-0.5 * (draw / scale_bandwidth)**2)

    # reset origin to zero (not needed as it is already 0?)
    frequency_component[mid, mid] = 0

    return frequency_component * orientation_component
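A small usage sketch of the function above (my own addition; I assume scales run over 1..number_scales and orientations over 0..number_orientations-1, which may need adjusting to match the paper's indexing):

import math
import numpy as np
import matplotlib.pyplot as plt

# build the bank: one filter per (scale, orientation) pair
bank = [[getFilter(f_0, theta_0)
         for theta_0 in range(number_orientations)]
        for f_0 in range(1, number_scales + 1)]

# visualize the combined coverage by taking the per-pixel maximum over all filters
coverage = np.max(np.array(bank).reshape(-1, N, N), axis=0)
plt.imshow(coverage)
plt.show()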

Many particles in box - physics simulation

I'm currently trying to simulate many particles bouncing around in a box.
I've taken @kalhartt's suggestions into account, and this is the improved code to initialize the particles inside the box:
import numpy as np
import scipy.spatial.distance as d
import matplotlib.pyplot as plt

# 2D container parameters
# Actual container is 50x50 but chose 49x49 to account for particle radius.
limit_x = 20
limit_y = 20

# Number and radius of particles
number_of_particles = 350
radius = 1

def force_init(n):
    # equivalent to np.array(list(range(number_of_particles)))
    count = np.linspace(0, number_of_particles-1, number_of_particles)
    x = (count + 2) % (limit_x-1) + radius
    y = (count + 2) / (limit_x-1) + radius
    return np.column_stack((x, y))

position = force_init(number_of_particles)
velocity = np.random.randn(number_of_particles, 2)
The initialized positions look like this:
Once I have the particles initialized I'd like to update them at each time-step. The code for updating follows the previous code immediately and is as follows:
# Updating
while np.amax(abs(velocity)) > 0.01:
    # Assume that velocity slowly dying out
    position += velocity
    velocity *= 0.995

    # Get pair-wise distance matrix
    pair_dist = d.cdist(position, position)
    pair_d = pair_dist <= 4

    # If pdist[i,j] is <=4 then the particles are too close, so treat as a collision
    for i in range(len(pair_d)):
        for j in range(i):
            # Only looking at upper triangular matrix (not inc. diagonal)
            if pair_d[i, j] == True:
                # If two particles are too close then swap velocities
                # It's a bad hack but it'll work for now.
                vel_1 = velocity[j][:]
                velocity[j] = velocity[i][:]*0.9
                velocity[i] = vel_1*0.9

    # Masks for particles beyond the boundary
    xmax = position[:, 0] > limit_x
    xmin = position[:, 0] < 0
    ymax = position[:, 1] > limit_y
    ymin = position[:, 1] < 0

    # flip velocity and assume that it loses 10% of energy
    velocity[xmax | xmin, 0] *= -0.9
    velocity[ymax | ymin, 1] *= -0.9

    # Force maximum positions to be +/- 2*radius from the edge
    position[xmax, 0] = limit_x - 2*radius
    position[xmin, 0] = 2*radius
    position[ymax, 1] = limit_y - 2*radius
    position[ymin, 1] = 2*radius
After updating it and letting it run to completion I get this result:
This is infinitely better than before, but there are still patches that are too close together, such as:
Too close together. I think the updating works... and thanks to @kalhartt my code is way better and faster (and I learnt some things about numpy... props @kalhartt), but I still don't know where it's going wrong. I've tried changing the order of the actual updates, with the pair-wise distance going last or the position += velocity going last, but to no avail. I added the *0.9 to make the entire thing die down faster, and I tried it with 4 to make sure that 2*radius (=2) wasn't too tight a criterion... but nothing seems to work.
Any and all help would be appreciated.
There are just two typos standing in your way. First, for i in range(len(positions)/2): only iterates over half of your particles; this is why half the particles stay within the x bounds (if you watch for many iterations it's more clear). Second, the second y condition should be a minimum (I assume): position[i][1] < 0. The following block works to bound the particles for me (I didn't test with the collision code, so there could be problems there).
for i in range(len(position)):
    if position[i][0] > limit_x or position[i][0] < 0:
        velocity[i][0] = -velocity[i][0]
    if position[i][1] > limit_y or position[i][1] < 0:
        velocity[i][1] = -velocity[i][1]
As an aside, try to leverage numpy to eliminate loops when possible. It is faster, more efficient, and in my opinion more readable. For example force_init would look like this:
def force_init(n):
    # equivalent to np.array(list(range(number_of_particles)))
    count = np.linspace(0, number_of_particles-1, number_of_particles)
    x = (count * 2) % limit_x + radius
    y = (count * 2) / limit_x + radius
    return np.column_stack((x, y))
And your boundary conditions would look like this:
while np.amax(abs(velocity)) > 0.01:
    position += velocity
    velocity *= 0.995

    # Masks for particles beyond the boundary
    xmax = position[:, 0] > limit_x
    xmin = position[:, 0] < 0
    ymax = position[:, 1] > limit_y
    ymin = position[:, 1] < 0

    # flip velocity
    velocity[xmax | xmin, 0] *= -1
    velocity[ymax | ymin, 1] *= -1
Final note: it is probably a good idea to hard-clip position to the bounding box with something like position[xmax, 0] = limit_x; position[xmin, 0] = 0. There may be cases where the velocity is small and a particle outside the box is reflected but does not make it back inside on the next iteration, so it will just sit outside the box being reflected forever.
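A minimal sketch of that hard clip for both axes (my own addition, reusing the masks already computed at the end of the while loop above):

    # clamp any particle that escaped back onto the boundary
    position[xmax, 0] = limit_x
    position[xmin, 0] = 0
    position[ymax, 1] = limit_y
    position[ymin, 1] = 0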
EDIT: Collision
The collision detection is a much harder problem, but let's see what we can do. Let's take a look at your current implementation.
pair_dist = d.cdist(position, position)
pair_d = pair_dist <= 4

for i in range(len(pair_d)):
    for j in range(i):
        # Only looking at upper triangular matrix (not inc. diagonal)
        if pair_d[i, j] == True:
            # If two particles are too close then swap velocities
            # It's a bad hack but it'll work for now.
            vel_1 = velocity[j][:]
            velocity[j] = velocity[i][:]*0.9
            velocity[i] = vel_1*0.9
Overall this is a very good approach: cdist will efficiently calculate the distance between sets of points, and you find which points collide with pair_d = pair_dist <= 4.
The nested for loops are the first problem. We need to iterate over the True values of pair_d where j > i. First, your code actually iterates over the lower triangular region by using for j in range(i), so that j < i; this is not particularly important in this instance, since (i, j) pairs are not repeated. However, NumPy has two builtins we can use instead: np.triu lets us set all values below a diagonal to 0, and np.nonzero gives us the indices of the non-zero elements in a matrix. So this:
pair_dist = d.cdist(position, position)
pair_d = pair_dist <= 4
for i in range(len(pair_d)):
    for j in range(i+1, len(pair_d)):
        if pair_d[i, j]:
            ...
is equivalent to
pair_dist = d.cdist(position, position)
pair_d = np.triu(pair_dist <= 4, k=1)  # k=1 to exclude the diagonal
for i, j in zip(*np.nonzero(pair_d)):
    ...
The second problem (as you noted) is that the velocities are just switched and scaled instead of reflected. What we really want to do is negate and scale the component of each particle's velocity along the axis that connects them. Note that to do this we will need the vector connecting them, position[j] - position[i], and the length of that vector (which we already calculated), so unfortunately part of the cdist calculation gets repeated. Let's quit using cdist and do it ourselves instead. The goal here is to make two arrays, diff and norm, where diff[i][j] is the vector pointing from particle i to j (so diff is a 3D array) and norm[i][j] is the distance between particles i and j. We can do this with numpy like so:
nop = number_of_particles

# Give pos a 3rd index so we can use np.repeat below
# equivalent to `pos3d = np.array([ position ])`
pos3d = position.reshape(1, nop, 2)

# 3D arrays with a repeated index so we can form combinations
# diff_i[i][j] = position[i] (for all j)
# diff_j[i][j] = position[j] (for all i)
diff_i = np.repeat(pos3d, nop, axis=1).reshape(nop, nop, 2)
diff_j = np.repeat(pos3d, nop, axis=0)

# diff[i][j] = vector pointing from position[i] to position[j]
diff = diff_j - diff_i

# norm[i][j] = sqrt( diff[i][j]**2 )
norm = np.linalg.norm(diff, axis=2)

# check for collisions and take the region above the diagonal
collided = np.triu(norm < radius, k=1)

for i, j in zip(*np.nonzero(collided)):
    # unit vector from i to j
    unit = diff[i][j] / norm[i][j]

    # flip velocity
    velocity[i] -= 1.9 * np.dot(unit, velocity[i]) * unit
    velocity[j] -= 1.9 * np.dot(unit, velocity[j]) * unit

    # push particle j to be radius units from i
    # This isn't particularly effective when 3+ points are close together
    position[j] += (radius - norm[i][j]) * unit

    ...
Since this post is long enough already, here is a gist of the code with my modifications.

Calculating percentage of Bounding box overlap, for image detector evaluation

In testing an object detection algorithm in large images, we check our detected bounding boxes against the coordinates given for the ground truth rectangles.
According to the Pascal VOC challenges, there's this:
A predicted bounding box is considered correct if it overlaps more
than 50% with a ground-truth bounding box, otherwise the bounding box
is considered a false positive detection. Multiple detections are
penalized. If a system predicts several bounding boxes that overlap
with a single ground-truth bounding box, only one prediction is
considered correct, the others are considered false positives.
This means that we need to calculate the percentage of overlap. Does this mean that the ground truth box is 50% covered by the detected boundary box? Or that 50% of the bounding box is absorbed by the ground truth box?
I've searched but I haven't found a standard algorithm for this - which is surprising because I would have thought that this is something pretty common in computer vision. (I'm new to it). Have I missed it? Does anyone know what the standard algorithm is for this type of problem?
For axis-aligned bounding boxes it is relatively simple. "Axis-aligned" means that the bounding box isn't rotated, or in other words that the box's sides are parallel to the axes. Here's how to calculate the IoU of two axis-aligned bounding boxes.
def get_iou(bb1, bb2):
    """
    Calculate the Intersection over Union (IoU) of two bounding boxes.

    Parameters
    ----------
    bb1 : dict
        Keys: {'x1', 'x2', 'y1', 'y2'}
        The (x1, y1) position is at the top left corner,
        the (x2, y2) position is at the bottom right corner
    bb2 : dict
        Keys: {'x1', 'x2', 'y1', 'y2'}
        The (x1, y1) position is at the top left corner,
        the (x2, y2) position is at the bottom right corner

    Returns
    -------
    float
        in [0, 1]
    """
    assert bb1['x1'] < bb1['x2']
    assert bb1['y1'] < bb1['y2']
    assert bb2['x1'] < bb2['x2']
    assert bb2['y1'] < bb2['y2']

    # determine the coordinates of the intersection rectangle
    x_left = max(bb1['x1'], bb2['x1'])
    y_top = max(bb1['y1'], bb2['y1'])
    x_right = min(bb1['x2'], bb2['x2'])
    y_bottom = min(bb1['y2'], bb2['y2'])

    if x_right < x_left or y_bottom < y_top:
        return 0.0

    # The intersection of two axis-aligned bounding boxes is always an
    # axis-aligned bounding box
    intersection_area = (x_right - x_left) * (y_bottom - y_top)

    # compute the area of both AABBs
    bb1_area = (bb1['x2'] - bb1['x1']) * (bb1['y2'] - bb1['y1'])
    bb2_area = (bb2['x2'] - bb2['x1']) * (bb2['y2'] - bb2['y1'])

    # compute the intersection over union by taking the intersection
    # area and dividing it by the sum of prediction + ground-truth
    # areas - the intersection area
    iou = intersection_area / float(bb1_area + bb2_area - intersection_area)
    assert iou >= 0.0
    assert iou <= 1.0
    return iou
Explanation
Images are from this answer
The top-voted answer has a mathematical error if you are working with screen (pixel) coordinates! I submitted an edit a few weeks ago with a long explanation for all readers so that they would understand the math. But that edit wasn't understood by the reviewers and was removed, so I've submitted the same edit again, but more briefly summarized this time. (Update: Rejected 2vs1 because it was deemed a "substantial change", heh).
So I will completely explain the BIG problem with its math here in this separate answer.
So, yes, in general, the top-voted answer is correct and is a good way to calculate the IoU. But (as other people have pointed out too) its math is completely incorrect for computer screens. You cannot just do (x2 - x1) * (y2 - y1), since that will not produce the correct area calculations whatsoever. Screen indexing starts at pixel 0,0 and ends at width-1,height-1. The range of screen coordinates is inclusive:inclusive (inclusive on both ends), so a range from 0 to 10 in pixel coordinates is actually 11 pixels wide, because it includes 0 1 2 3 4 5 6 7 8 9 10 (11 items). So, to calculate the area of screen coordinates, you MUST therefore add +1 to each dimension, as follows: (x2 - x1 + 1) * (y2 - y1 + 1).
If you're working in some other coordinate system where the range is not inclusive (such as an inclusive:exclusive system where 0 to 10 means "elements 0-9 but not 10"), then this extra math would NOT be necessary. But most likely, you are processing pixel-based bounding boxes. Well, screen coordinates start at 0,0 and go up from there.
A 1920x1080 screen is indexed from 0 (first pixel) to 1919 (last pixel horizontally) and from 0 (first pixel) to 1079 (last pixel vertically).
So if we have a rectangle in "pixel coordinate space", to calculate its area we must add 1 in each direction. Otherwise, we get the wrong answer for the area calculation.
Imagine that our 1920x1080 screen has a pixel-coordinate based rectangle with left=0,top=0,right=1919,bottom=1079 (covering all pixels on the whole screen).
Well, we know that 1920x1080 pixels is 2073600 pixels, which is the correct area of a 1080p screen.
But with the wrong math area = (x_right - x_left) * (y_bottom - y_top), we would get: (1919 - 0) * (1079 - 0) = 1919 * 1079 = 2070601 pixels! That's wrong!
That is why we must add +1 to each calculation, which gives us the following corrected math: area = (x_right - x_left + 1) * (y_bottom - y_top + 1), giving us: (1919 - 0 + 1) * (1079 - 0 + 1) = 1920 * 1080 = 2073600 pixels! And that's indeed the correct answer!
The shortest possible summary is: Pixel coordinate ranges are inclusive:inclusive, so we must add + 1 to each axis if we want the true area of a pixel coordinate range.
For a few more details about why +1 is needed, see Jindil's answer: https://stackoverflow.com/a/51730512/8874388
As well as this pyimagesearch article:
https://www.pyimagesearch.com/2016/11/07/intersection-over-union-iou-for-object-detection/
And this GitHub comment:
https://github.com/AlexeyAB/darknet/issues/3995#issuecomment-535697357
Since the fixed math wasn't approved, anyone who copies the code from the top-voted answer hopefully sees this answer, and will be able to bugfix it themselves, by simply copying the bugfixed assertions and area-calculation lines below, which have been fixed for inclusive:inclusive (pixel) coordinate ranges:
assert bb1['x1'] <= bb1['x2']
assert bb1['y1'] <= bb1['y2']
assert bb2['x1'] <= bb2['x2']
assert bb2['y1'] <= bb2['y2']
................................................
# The intersection of two axis-aligned bounding boxes is always an
# axis-aligned bounding box.
# NOTE: We MUST ALWAYS add +1 to calculate area when working in
# screen coordinates, since 0,0 is the top left pixel, and w-1,h-1
# is the bottom right pixel. If we DON'T add +1, the result is wrong.
intersection_area = (x_right - x_left + 1) * (y_bottom - y_top + 1)
# compute the area of both AABBs
bb1_area = (bb1['x2'] - bb1['x1'] + 1) * (bb1['y2'] - bb1['y1'] + 1)
bb2_area = (bb2['x2'] - bb2['x1'] + 1) * (bb2['y2'] - bb2['y1'] + 1)
A simple way for any kind of polygon:
(Image is not drawn to scale)
from shapely.geometry import Polygon

def calculate_iou(box_1, box_2):
    poly_1 = Polygon(box_1)
    poly_2 = Polygon(box_2)
    iou = poly_1.intersection(poly_2).area / poly_1.union(poly_2).area
    return iou

box_1 = [[511, 41], [577, 41], [577, 76], [511, 76]]
box_2 = [[544, 59], [610, 59], [610, 94], [544, 94]]
print(calculate_iou(box_1, box_2))
The result will be 0.138211... which means 13.82%.
You can also use shapely.geometry.box if your box is rectangular and given in [minx, miny, maxx, maxy] form, as shown below.
Note: The origin of Coordinate Systems in shapely library is left-bottom where origin in computer graphics is left-top. This difference does not affect the IoU calculation, but if you do other types of calculation, this information might be helpful.
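For example, a minimal sketch (my addition) of the same two boxes using shapely.geometry.box:

from shapely.geometry import box

box_1 = box(511, 41, 577, 76)
box_2 = box(544, 59, 610, 94)
iou = box_1.intersection(box_2).area / box_1.union(box_2).area
print(iou)  # ~0.138, matching the polygon version above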
You can calculate with torchvision as follows. The bbox is prepared in the format of [x1, y1, x2, y2].
import torch
import torchvision.ops.boxes as bops
box1 = torch.tensor([[511, 41, 577, 76]], dtype=torch.float)
box2 = torch.tensor([[544, 59, 610, 94]], dtype=torch.float)
iou = bops.box_iou(box1, box2)
# tensor([[0.1382]])
For the intersection, shouldn't we add a +1 so as to have
intersection_area = (x_right - x_left + 1) * (y_bottom - y_top + 1)
(and the same for the AABB areas)?
Like in this pyimagesearch post.
I agree that (x_right - x_left) * (y_bottom - y_top) works in mathematics with point coordinates, but since we are dealing with pixels, I think it is different.
Consider a 1D example:
2 points: x1 = 1 and x2 = 3; the distance is indeed x2 - x1 = 2
2 pixel indices: i1 = 1 and i2 = 3; the segment from pixel i1 to i2 contains 3 pixels, i.e. l = i2 - i1 + 1
EDIT: I recently got to know that this is a "little-square" approach.
If, however, you consider pixels as point samples (i.e. the bounding box corner would be at the centre of the pixel, as apparently in matplotlib), then you don't need the +1.
See this comment and this illustration.
import numpy as np

def box_area(arr):
    # arr: np.array([[x1, y1, x2, y2]])
    width = arr[:, 2] - arr[:, 0]
    height = arr[:, 3] - arr[:, 1]
    return width * height

def _box_inter_union(arr1, arr2):
    # arr1 of [N, 4]
    # arr2 of [N, 4]
    area1 = box_area(arr1)
    area2 = box_area(arr2)

    # Intersection
    top_left = np.maximum(arr1[:, :2], arr2[:, :2])      # [[x, y]]
    bottom_right = np.minimum(arr1[:, 2:], arr2[:, 2:])  # [[x, y]]
    wh = bottom_right - top_left
    # clip: if boxes don't overlap then make it zero
    intersection = wh[:, 0].clip(0) * wh[:, 1].clip(0)

    # union
    union = area1 + area2 - intersection
    return intersection, union

def box_iou(arr1, arr2):
    # arr1[N, 4]
    # arr2[N, 4]
    # N = number of bounding boxes
    assert (arr1[:, 2:] > arr1[:, :2]).all()
    assert (arr2[:, 2:] > arr2[:, :2]).all()
    inter, union = _box_inter_union(arr1, arr2)
    iou = inter / union
    print(iou)

box1 = np.array([[10, 10, 80, 80]])
box2 = np.array([[20, 20, 100, 100]])
box_iou(box1, box2)
reference: https://pytorch.org/vision/stable/_modules/torchvision/ops/boxes.html#nms
In the snippet below, I construct a polygon along the edges of the first box. I then use Matplotlib to clip the polygon to the second box. The resulting polygon contains four vertices, but we are only interested in the top left and bottom right corners, so I take the max and the min of the coordinates to get a bounding box, which is returned to the user.
import numpy as np
from matplotlib import path, transforms

def clip_boxes(box0, box1):
    path_coords = np.array([[box0[0, 0], box0[0, 1]],
                            [box0[1, 0], box0[0, 1]],
                            [box0[1, 0], box0[1, 1]],
                            [box0[0, 0], box0[1, 1]]])

    poly = path.Path(np.vstack((path_coords[:, 0],
                                path_coords[:, 1])).T, closed=True)
    clip_rect = transforms.Bbox(box1)
    poly_clipped = poly.clip_to_bbox(clip_rect).to_polygons()[0]

    return np.array([np.min(poly_clipped, axis=0),
                     np.max(poly_clipped, axis=0)])

box0 = np.array([[0, 0], [1, 1]])
box1 = np.array([[0, 0], [0.5, 0.5]])

print(clip_boxes(box0, box1))
Maybe one for the more visually inclined, like me...
Say your ROIs sit on top of an HD (1920x1080) surface. You can make a matrix for each in numpy like:
roi1 = np.zeros((1080, 1920))
Then "fill in" the ROI area like:
roi1[y1:y2, x1:x2] = 1  # y1,x1 and y2,x2 are the ROI corners
Repeat for roi2. Then calculate the IoU with this function:
def calc_iou(roi1, roi2):
    # Sum all "white" pixels clipped to 1
    U = np.sum(np.clip(roi1 + roi2, 0, 1))
    # +1 for each overlapping white pixel (these will = 2)
    I = len(np.where(roi1 + roi2 == 2)[0])
    return I / U
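A quick usage sketch (my addition; the corner values are made up) with two overlapping ROIs on the same 1080p surface:

import numpy as np

roi1 = np.zeros((1080, 1920))
roi2 = np.zeros((1080, 1920))
roi1[100:400, 200:600] = 1   # hypothetical ROI corners
roi2[250:500, 400:800] = 1

print(calc_iou(roi1, roi2))  # fraction of the union covered by the overlap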
How about this approach? It could be extended to any number of unioned shapes:
surface = np.zeros([1024,1024])
surface[1:1+10, 1:1+10] += 1
surface[100:100+500, 100:100+100] += 1
unionArea = (surface==2).sum()
print(unionArea)
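Building on the same mask idea (a sketch of my own): surface == 2 marks pixels covered by both shapes, while surface >= 1 marks pixels covered by at least one, which gives an IoU directly:

intersection_area = (surface == 2).sum()
union_area = (surface >= 1).sum()
print(intersection_area / union_area)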
