Implementing log Gabor filter bank - python

I was reading the paper "Self-Invertible 2D Log-Gabor Wavelets", which defines the 2D log-Gabor filter as follows:
The paper also states that the filter covers only one side of the frequency space, and shows this in the following image:
On my attempt to implement the filter I get results that do not match with what is said in the paper. Let me start with my implementation then I will state the problems.
Implementation:
I created a 2d array that contains the filter and transformed each index so that the origin of the frequency domain is at the center of the array with positive x-axis going right and positive y-axis going up.
number_scales = 5          # scale resolution
number_orientations = 9    # orientation resolution
N = constantDim            # image dimensions

def getLogGaborKernal(scale, angle, logfun=math.log2, norm=True):
    # set up filter configuration
    center_scale = logfun(N) - scale
    center_angle = ((np.pi/number_orientations) * angle) if (scale % 2) \
                    else ((np.pi/number_orientations) * (angle+0.5))
    scale_bandwidth = 0.996 * math.sqrt(2/3)
    angle_bandwidth = 0.996 * (1/math.sqrt(2)) * (np.pi/number_orientations)

    # 2d array that will hold the filter
    kernel = np.zeros((N, N))
    # get the center of the 2d array so we can shift origin
    middle = math.ceil((N/2)+0.1)-1

    # calculate the filter
    for x in range(0, constantDim):
        for y in range(0, constantDim):
            # get the transformed x and y where origin is at center
            # and positive x-axis goes right while positive y-axis goes up
            x_t, y_t = (x-middle), -(y-middle)
            # calculate the filter value at given index
            kernel[y, x] = logGaborValue(x_t, y_t, center_scale, center_angle,
                                         scale_bandwidth, angle_bandwidth, logfun)

    # normalize the filter energy
    if norm:
        kernel = kernel / np.sum(kernel**2)
    return kernel
To calculate the filter value at each index another transform is made where we go to the log-polar space
def logGaborValue(x, y, center_scale, center_angle, scale_bandwidth,
                  angle_bandwidth, logfun):
    # transform to polar coordinates
    raw, theta = getPolar(x, y)
    # if we are at the center, return 0 as in the log space
    # zero is not defined
    if raw == 0:
        return 0

    # go to log polar coordinates
    raw = logfun(raw)

    # calculate (theta-center_theta): we calculate cos(theta-center_theta)
    # and sin(theta-center_theta) then use atan2 to get the required value,
    # this way we can eliminate the angular distance wrap-around problem
    costheta, sintheta = math.cos(theta), math.sin(theta)
    ds = sintheta * math.cos(center_angle) - costheta * math.sin(center_angle)
    dc = costheta * math.cos(center_angle) + sintheta * math.sin(center_angle)
    dtheta = math.atan2(ds, dc)

    # final value: multiply the radial component by the angular one
    return math.exp(-0.5 * ((raw-center_scale) / scale_bandwidth)**2) * \
           math.exp(-0.5 * (dtheta/angle_bandwidth)**2)
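To see why the atan2 trick is used, here is a tiny illustrative check (my own snippet, not from the paper): subtracting angles directly can leave the (-pi, pi] range, while atan2 of the sine and cosine of the difference always returns the wrapped value.

import math

# Illustration of the wrap-around handling used in logGaborValue above:
# a direct angle difference can fall outside (-pi, pi], but
# atan2(sin(d), cos(d)) always returns the wrapped difference.
theta, center_angle = 3.0, -3.0
d = theta - center_angle                        # 6.0 rad, outside (-pi, pi]
wrapped = math.atan2(math.sin(d), math.cos(d))
print(d, wrapped)                               # 6.0 vs about -0.283 (= 6.0 - 2*pi)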
Problems:
The angle: the paper states that indexing the angles from 1 -> 8 produces good coverage of the orientations, but in my implementation angles from 1 -> n cover only half of the orientations. Even the vertical orientation is not covered correctly. This can be seen in the following figure, which contains the set of filters at scale 3 with orientations ranging from 1 -> 8:
The coverage: from the filters above it is clear that each filter covers both sides of the frequency space, which is not what the paper says. This can be made more explicit by using 9 orientations ranging from -4 -> 4. The following image combines all the filters into one to show how they cover both sides of the spectrum (it is created by taking the maximum at each location over all filters):
Middle column (orientation $\pi/2$): in the first figure, for orientations 3 -> 8, it can be seen that the filter vanishes at orientation $\pi/2$. Is this normal? This can also be seen when I combine all the filters (of all 5 scales and 9 orientations) into one image:
Update:
Adding the impulse response of the filter in the spatial domain; as you can see, there is obvious distortion in the -4 and 4 orientations:

After a lot of code analysis, I found that my implementation was correct but my getPolar function was broken, so the code above should work just fine. Here is the new code without the getPolar function, in case anyone is looking for it:
number_scales = 5          # scale resolution
number_orientations = 8    # orientation resolution
N = 128                    # image dimensions

def getFilter(f_0, theta_0):
    # filter configuration
    scale_bandwidth = 0.996 * math.sqrt(2/3)
    angle_bandwidth = 0.996 * (1/math.sqrt(2)) * (np.pi/number_orientations)

    # x,y grid
    extent = np.arange(-N/2, N/2 + N%2)
    x, y = np.meshgrid(extent, extent)
    mid = int(N/2)

    ## orientation component ##
    theta = np.arctan2(y, x)
    center_angle = ((np.pi/number_orientations) * theta_0) if (f_0 % 2) \
                    else ((np.pi/number_orientations) * (theta_0+0.5))

    # calculate (theta-center_theta): we calculate cos(theta-center_theta)
    # and sin(theta-center_theta) then use atan2 to get the required value,
    # this way we can eliminate the angular distance wrap-around problem
    costheta = np.cos(theta)
    sintheta = np.sin(theta)
    ds = sintheta * math.cos(center_angle) - costheta * math.sin(center_angle)
    dc = costheta * math.cos(center_angle) + sintheta * math.sin(center_angle)
    dtheta = np.arctan2(ds, dc)

    orientation_component = np.exp(-0.5 * (dtheta/angle_bandwidth)**2)

    ## frequency component ##
    # go to polar space
    raw = np.sqrt(x**2 + y**2)
    # set origin to 1 as in the log space zero is not defined
    raw[mid, mid] = 1
    # go to log space
    raw = np.log2(raw)

    center_scale = math.log2(N) - f_0
    draw = raw - center_scale
    frequency_component = np.exp(-0.5 * (draw / scale_bandwidth)**2)

    # reset origin to zero (not needed as it is already 0?)
    frequency_component[mid, mid] = 0

    return frequency_component * orientation_component
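For anyone wanting to reproduce the coverage figures, here is a minimal usage sketch (my addition; I'm assuming scale indices 1..number_scales and orientation indices 0..number_orientations-1, so adjust to your own indexing): build the whole bank with getFilter and take the per-pixel maximum over all filters.

import numpy as np
import matplotlib.pyplot as plt

# build the full bank and look at the combined frequency-domain coverage
bank = [getFilter(f_0, theta_0)
        for f_0 in range(1, number_scales + 1)
        for theta_0 in range(number_orientations)]
coverage = np.max(np.stack(bank), axis=0)

plt.imshow(coverage, cmap='gray')
plt.title('max over all filters (frequency domain)')
plt.show()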


How to draw repeated slanted lines

I need to draw slanted lines like this programmatically using opencv-python, and it has to be similar in terms of the slant angle and the distance between the lines:
If I use OpenCV's cv.line(), I need to supply the function with the line's start and end points.
Following this StackOverflow accepted answer, I think I will be able to find those two points, but first I need to calculate the line equation itself.
So what I have done first is measure the slant angle of the line with the measure tool in Adobe Illustrator (the actual image was given to me by the graphic designer as an .ai file); I got 67 degrees and solved for the gradient of the line. The problem is that I don't know how to get the horizontal spacing/distance between the lines, which I need in order to supply start.X. I measured the distance between the lines in Illustrator, but how do I map that to OpenCV coordinates?
Overall, is my idea feasible, or is there a better way to achieve this?
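One possible way to get that horizontal spacing (my own sketch, using only the 67-degree slant and the spacing measured in Illustrator, rescaled to pixels): if parallel lines make an angle theta with the horizontal and sit a perpendicular distance d apart, they cross any horizontal row of pixels d / sin(theta) apart, which is the step to use for start.X.

import math

# Hypothetical helper: convert a perpendicular spacing between the slanted
# lines (in pixels) into the horizontal step between their intersections
# with a horizontal edge of the image.
def horizontal_step(perpendicular_spacing_px, slant_angle_deg=67):
    return perpendicular_spacing_px / math.sin(math.radians(slant_angle_deg))

print(horizontal_step(10))  # about 10.86 px for a 67 degree slant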
Update 1:
I managed to draw this experimental image:
And this is the code:
def show_image_scaled(window_name, image, height, width):
    cv2.namedWindow(window_name, cv2.WINDOW_NORMAL)
    cv2.resizeWindow(window_name, width, height)
    cv2.imshow(window_name, image)
    cv2.waitKey(0)
    cv2.destroyAllWindows()

def slanted_lines_background():
    canvas = np.ones((200, 300)) * 255
    start_y = 0
    end_x = 0
    m = 2.35
    for x in range(0, canvas.shape[1], 10):
        start_x = x
        end_y = start_y + compute_length(m, start_x, start_y, end_x)
        cv2.line(canvas, (start_x, start_y), (end_x, end_y), (0, 0, 0), 2)
    show_image_scaled("Slant", canvas, 200, 300)

def compute_length(m, start_x, start_y, end_x=0):
    c = start_y - (m * start_x)
    length_square = (end_x - start_x)**2 + ((m * end_x) + c - start_y)**2
    length = math.sqrt(length_square)
    return int(length)
Still working on filling in the left part of the rectangle.
This code "shades" every pixel in a given image to produce your hatched pattern. Don't worry about the math. It's mostly correct. I've checked the edge cases for small and wide lines. The sampling isn't exactly correct but nobody's gonna notice anyway because the imperfection amounts to small fractions of a pixel. And I've used numba to make it fast.
import numpy as np
from numba import njit, prange

@njit(parallel=True)
def hatch(im, angle=45, stride=10, dc=None):
    stride = float(stride)
    if dc is None:
        dc = stride * 0.5
    assert 0 <= dc <= stride
    stride2 = stride / 2
    dc2 = dc / 2

    angle = angle / 180 * np.pi
    c = np.cos(angle)
    s = np.sin(angle)

    (height, width) = im.shape[:2]

    for y in prange(height):
        for x in range(width):
            # distance to origin along normal
            dist_origin = c*x - s*y
            # distance to center of nearest line
            dist_center = stride2 - abs((dist_origin % stride) - stride2)
            # distance to edge of nearest line
            dist_edge = dist_center - dc2

            # shade pixel, with antialiasing
            # use edge-0.5 to edge+0.5 as "gradient" <=> 1-sized pixel straddles edge
            # for thick/thin lines, needs hairline handling
            # thin line -> gradient hits far edge of line / pixel may span both edges of line
            # thick line -> gradient hits edge of adjacent line / pixel may span adjacent line
            if dist_edge > 0.5:  # background
                val = 0
            else:  # pixel starts covering line
                val = 0.5 - dist_edge
                if dc < 1:  # thin line, clipped to line width
                    val = min(val, dc)
                elif stride - dc < 1:  # thick line, little background
                    val = max(val, 1 - (stride - dc))
            im[y, x] = val

canvas = np.zeros((128, 512), 'f4')
hatch(canvas, angle=-23, stride=5, dc=2.5)
# mind the gamma mapping before imshow
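As a display sketch (my addition, assuming a plain gamma of about 2.2 rather than the exact sRGB curve): the hatch values are line coverage in [0, 1], so invert them for black lines on white and apply the gamma before showing.

import matplotlib.pyplot as plt

# coverage -> display intensity (black lines on white paper),
# with an assumed display gamma of 2.2
display = (1.0 - canvas) ** (1 / 2.2)
plt.imshow(display, cmap='gray', vmin=0, vmax=1)
plt.show()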

Does numpy.fft2 results produce results that follow standard ordering as described in numpy.fft documentation?

I've read the documentation and searched Stack Overflow for the answer to this question, but can't find it. Sorry if it has already been answered.
I'm working with the results of np.fft.fft2(Z), where Z is some 2D NumPy array. I would expect positive frequencies to be stored at indices less than the Nyquist wavenumber in both the x and y directions; from my tests, this seems to be the approach MATLAB takes. The NumPy documentation says that positive frequencies are stored below the Nyquist number and negative frequencies above it, but this does not seem to be the case for fft2.
Some positive-frequency terms are stored at locations greater than the Nyquist wavenumber. For example, a mode at location (127,1), with its associated amplitude stored at (1,127), produces a 2D sinusoid with 4 peaks, indicating that the wavenumber should be around 4, not 127.
I can't tell which is the positive and which is the negative frequency in my example above, because they are not following the standard ordering.
So the main question I have is what kind of order does the fft2 follow for storing positive and negative frequencies?
I didn't post any examples because my question is a universal one and shouldn't be problem specific.
import numpy as np
from heapq import nlargest

def return_xy(mode, spec_topo):
    # find the (row, column) index pairs whose amplitude equals `mode`
    # (does the same thing as np.where)
    kxky = np.array([])
    for i in range(np.shape(spec_topo)[0]):
        for j in range(np.shape(spec_topo)[1]):
            if spec_topo[i, j] == mode:
                kxky = np.append(kxky, [i, j])
    if len(kxky) > 1:
        return kxky
    else:
        return kxky[0]

## Setting up a simple example
lx = 4.0
ly = 4.0
lz = 1.5
nx = 128
ny = 128
L = 1.0
H = .4
x = np.linspace(0, lx, nx)
y = np.linspace(0, ly, ny)
x0 = 2.0
y0 = 2.0
z1 = np.zeros([ny, nx])
zm = np.zeros([ny, nx])

for j in range(1, ny):
    for i in range(1, nx):
        if np.sqrt(abs(x[i] - x0)**2 + abs(y[j] - y0)**2) < L:
            if np.abs(x[i] - x0) < L:
                z1[j, i] = H * np.cos(np.pi * abs(x[i] - x0) / (2*L))**2

z1 = z1 + np.transpose(z1)/2.0

## Here I take the fft
nf = np.shape(z1)[0]/2
fz1 = np.fft.fft2(z1)
spec_fz1 = np.abs(fz1)**2
valmax = nlargest(1000, spec_fz1.flatten())

## Here I search for amplitude pairs above the nyquist number
for i in range(1, len(valmax), 2):
    xy = return_xy(valmax[i], spec_fz1)
    if len(xy) > 2:
        if (xy[0] > nf or xy[1] > nf) and (xy[2] > nf or xy[3] > nf):
            print('both index locations above nyquist frequency')
    else:
        xy2 = return_xy(valmax[i+1], spec_fz1)
        if (xy[0] > nf or xy[1] > nf) and (xy2[0] > nf or xy2[1] > nf):
            print('both index locations above nyquist frequency')
After sorting by the largest amplitudes, at the 21st index two amplitude pairs are stored at (127,1) and (1,127), which is above the Nyquist number. How should I interpret this wavenumber? (Note: return_xy does the same thing as np.where.)
I think this bit of code demonstrates how the 2D DFT output of np.fft.fft2 is organized:
import numpy as np
import matplotlib.pyplot as plt

n = 16
x = np.arange(n) / n * 2 * np.pi
y = np.arange(n) / n * 2 * np.pi

for kx in range(4):
    for ky in range(4):
        f = np.cos(kx * x[None, :] + ky * y[:, None])
        F = np.fft.fft2(f)
        plt.subplot(4, 4, 1 + ky * 4 + kx)
        plt.imshow(np.abs(F))
        plt.axis('off')
        plt.title(f'kx = {kx}, ky = {ky}', fontsize=10)

plt.tight_layout()
plt.show()
We can see that the origin, kx=0 and ky=0 is at the top-left of the array. For a horizontal wave with exactly one period in the input, we see we have a pair of peaks at kx=1 and kx=N-1 (which is equivalent to kx=-1). With two periods in the input, kx=2 and kx=-2, etc. Vertical waves produce the same result but along the vertical axis, and diagonal waves at 45 degrees have the peaks at 45 degrees.
This is the exact same ordering as the 1D DFT (np.fft.fft) produces. The 2D DFT is simply the 1D DFT applied along the columns, and then along the rows of the result (or the other way around, it doesn't matter).
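A quick way to convince yourself of that last point (my own check, not part of the sample above):

import numpy as np

rng = np.random.default_rng(0)
f = rng.standard_normal((16, 16))
# fft2 is just the 1D FFT applied along one axis and then the other
assert np.allclose(np.fft.fft2(f),
                   np.fft.fft(np.fft.fft(f, axis=0), axis=1))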
As for the test shown in the question, it is the superposition of two sine waves (one horizontal and one vertical) multiplied by a round window (a "pillbox" function). In the Fourier domain (continuous world), this corresponds to four impulse functions (two along the horizontal axis for the one sine wave, two along the vertical axis for the other sine wave), convolved with the Bessel function of the first kind of order 1 (J1). Because the sine waves have a low frequency, the four impulse functions are close together, and after the convolution appear as a somewhat wider Bessel function, centered around the origin:
plt.imshow(np.log(np.abs(fz1) + 1e-6))
plt.show()
What we see is the peak centered on the origin (at the top-left corner), with things to the left of the origin wrapped around to the right edge, and things to the top of the origin wrapped around to the bottom edge. Applying np.fft.fftshift moves the origin to the middle of the array, yielding a more recognizable shape.
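If you want signed wavenumbers instead of raw indices, np.fft.fftfreq gives the frequency that belongs to each index, and fftshift reorders both the spectrum and those frequencies so the origin sits in the middle. A small sketch for the n = 128 grid used in the question (fz1 is the 2D FFT from there):

import numpy as np

n = 128
# integer wavenumbers in FFT order: 0, 1, ..., 63, -64, -63, ..., -1
k = np.fft.fftfreq(n, d=1.0 / n)
print(k[1], k[127])   # 1.0 and -1.0: index 127 really is the -1 wavenumber

# center the origin for display or interpretation
# fz1_shifted = np.fft.fftshift(fz1)
# kx = ky = np.fft.fftshift(k)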

How can you find the relative spherical coordinates of an object after changing the origin?

I have an algorithm question. I am currently working on a script that generates images of an object from various angles inside Unreal Engine and pairs these images with the coordinates of the object. The way it works is that I have the object at the origin, and I generate random spherical coordinates at which to place my camera. I then rotate my camera to face the object and apply an extra rotation so that the object can lie anywhere in my camera's FOV. I now want to treat my camera as the origin and find the spherical coordinates of the object relative to it.
Currently, I am trying to derive the coordinates as in the code below. I start by noting that the radial distance between the object and the camera is the same regardless of which one is the origin. Then, I use the fact that the angles between my camera and my object are determined entirely by the extra rotation at the end of my camera placement. Finally, I try to find a rotation that will orient the object the same way as in the image, based on the angular coordinates of the camera. (This is done because I want to encode information about points on the object besides the center. For example, I am currently using a 1-meter cube as a placeholder object, and I want to keep track of the coordinates of its corners. I chose to use rotations because I can use them to build a rotation matrix and convert my coordinates.) Below is the code I use to do this (the AirSim library is used here, but all you need to know is that airsim.Pose() takes a Euclidean position and a quaternion rotation as arguments to position my camera).
PRECISION_ANGLE = 4   # Fractions of a degree used in generating random pitch, roll, and yaw values
PRECISION_METER = 100 # Fractions of a meter used in generating random distance values
RADIUS_MAX = 20       # Maximum distance from the obstacle to be expected
# TODO: Replace minimum distance with a test for detecting if the camera is inside the obstacle
RADIUS_MIN = 3        # Minimum distance from the obstacle to be expected. Set this value large enough so that the camera will not spawn inside the object

# Camera details should match settings.json
IMAGE_HEIGHT = 144
IMAGE_WIDTH = 256
FOV = 90
# TODO: Vertical FOV rounds down for generating random integers. Some pictures will not be created
VERT_FOV = FOV * IMAGE_HEIGHT // IMAGE_WIDTH

def polarToCartesian(r, theta, phi):
    return [
        r * math.sin(theta) * math.cos(phi),
        r * math.sin(theta) * math.sin(phi),
        r * math.cos(theta)]

while 1:
    # generate a random position for our camera
    r = random.randint(RADIUS_MIN * PRECISION_METER, RADIUS_MAX * PRECISION_METER) / PRECISION_METER
    phi = random.randint(0, 360 * PRECISION_ANGLE) / PRECISION_ANGLE
    theta = random.randint(0, 180 * PRECISION_ANGLE) / PRECISION_ANGLE

    # Convert polar coordinates to cartesian for AirSim
    pos = polarToCartesian(r, math.radians(theta), math.radians(phi))

    # Generate a random offset for the camera angle
    pitch = random.randint(0, VERT_FOV * PRECISION_ANGLE) / PRECISION_ANGLE - VERT_FOV / 2
    # TODO: Rotating the drone causes the obstacle to be removed from the image because the camera is not square
    # roll = random.randint(0, 360 * PRECISION_ANGLE) / PRECISION_ANGLE
    roll = 0
    yaw = random.randint(0, FOV * PRECISION_ANGLE) / PRECISION_ANGLE - FOV/2

    # Calculate coordinates of the center of the obstacle relative to the drone's new position and orientation
    obs_r = r
    obs_phi = yaw
    obs_theta = 90 - pitch

    # Convert polar coordinates to cartesian for AirSim
    obs_pos = polarToCartesian(obs_r, math.radians(obs_theta), math.radians(obs_phi))

    # Record rotational transformation on obstacle for calculating coordinates of key locations relative to the center
    obs_phi_offset = -phi
    obs_theta_offset = 270 - theta

    # Move the camera to our calculated position
    camera_pose = airsim.Pose(airsim.Vector3r(pos[0], pos[1], pos[2]), airsim.to_quaternion(math.radians(90 - theta + pitch), math.radians(roll), math.radians(phi + 180 + yaw)))  # radians
Is this algorithm implemented correctly? What other ways could I find the coordinates of my object? Should I be doing something in Unreal Engine to get my coordinates instead of doing this algorithmically (though it needs to be fast)?
A translation of the origin by Vector3(i,j,k) is simply the translation of the original output.
camera_pose = airsim.Pose(airsim.Vector3r(pos[0] + i, pos[1] + j, pos[2] + k), airsim.to_quaternion(math.radians(90 - theta + pitch), math.radians(roll), math.radians(phi + 180 + yaw))) #radians
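For the rotational part, here is one general-purpose sketch (my addition, not tied to AirSim's axis conventions, which are NED-based and would need adapting): express the object's world position in the camera frame with the inverse of the camera rotation, then read off the spherical coordinates.

import numpy as np
from scipy.spatial.transform import Rotation as R

def world_to_camera_spherical(p_world, cam_pos, cam_quat_xyzw):
    """(r, theta, phi) of a world point as seen from the camera frame.

    Conventions here are assumptions for illustration: theta is the polar
    angle measured from the camera's +z axis, phi the azimuth in its x-y plane.
    """
    # rotate the camera-to-point vector into the camera frame
    v = R.from_quat(cam_quat_xyzw).inv().apply(np.asarray(p_world) - np.asarray(cam_pos))
    r = np.linalg.norm(v)
    theta = np.arccos(v[2] / r)
    phi = np.arctan2(v[1], v[0])
    return r, theta, phi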

Integrating 2D data over an irregular grid in python

So I have a 2D function which is sampled irregularly over a domain, and I want to calculate the volume underneath the surface. The data is organised as [x, y, z] triples; taking a simple example:
def f(x, y):
    return np.cos(10*x*y) * np.exp(-x**2 - y**2)

# coarse sampling over the full domain
datrange1 = np.linspace(-5, 5, 1000)
datrange2 = np.linspace(-0.5, 0.5, 1000)
# (assumed) denser sampling around the origin; these ranges were not defined
# in the original snippet, but something like this is what the second loop uses
xrange2 = np.linspace(-0.5, 0.5, 1000)
yrange2 = np.linspace(-0.05, 0.05, 1000)

ar = []
for x in datrange1:
    for y in datrange2:
        ar += [[x, y, f(x, y)]]
for x in xrange2:
    for y in yrange2:
        ar += [[x, y, f(x, y)]]

val_arr1 = np.array(ar)
data = np.unique(val_arr1, axis=0)
xlist, ylist, zlist = data.T
where np.unique sorts the data in the first column then the second. The data is arranged in this way as I need to sample more heavily around the origin as there is a sharp feature that must be resolved.
Now I wondered about constructing a 2D interpolating function using scipy.interpolate.interp2d, then integrating over this using dblquad. As it turns out, this is not only inelegant and slow, but also kicks out the error:
RuntimeWarning: No more knots can be added because the number of B-spline
coefficients already exceeds the number of data points m.
Is there a better way to integrate data arranged in this fashion, or to overcome this error?
If you can sample the data with high enough resolution around the feature of interest, then more sparsely everywhere else, the problem definition then becomes how to define the area under each sample. This is easy with regular rectangular samples, and could likely be done stepwise in increments of resolution around the origin. The approach I went after is to generate the 2D Voronoi cells for each sample in order to determine their area. I pulled most of the code from this answer, as it had almost all the components needed already.
import numpy as np
from scipy.spatial import Voronoi

# taken from: https://stackoverflow.com/questions/28665491/getting-a-bounded-polygon-coordinates-from-voronoi-cells
# computes voronoi regions bounded by a bounding box
def square_voronoi(xy, bbox):  # bbox: (min_x, max_x, min_y, max_y)
    # Select points inside the bounding box
    points_center = xy[np.where((bbox[0] <= xy[:, 0]) * (xy[:, 0] <= bbox[1]) *
                                (bbox[2] <= xy[:, 1]) * (xy[:, 1] <= bbox[3]))]
    # Mirror points across each edge of the bounding box
    points_left = np.copy(points_center)
    points_left[:, 0] = bbox[0] - (points_left[:, 0] - bbox[0])
    points_right = np.copy(points_center)
    points_right[:, 0] = bbox[1] + (bbox[1] - points_right[:, 0])
    points_down = np.copy(points_center)
    points_down[:, 1] = bbox[2] - (points_down[:, 1] - bbox[2])
    points_up = np.copy(points_center)
    points_up[:, 1] = bbox[3] + (bbox[3] - points_up[:, 1])
    points = np.concatenate((points_center, points_left, points_right, points_down, points_up), axis=0)

    # Compute Voronoi
    vor = Voronoi(points)

    # Filter regions (center points should* be guaranteed to have a valid region)
    # center points come first and do not change in number
    regions = [vor.regions[vor.point_region[i]] for i in range(len(points_center))]
    vor.filtered_points = points_center
    vor.filtered_regions = regions
    return vor

# also taken from: https://stackoverflow.com/questions/28665491/getting-a-bounded-polygon-coordinates-from-voronoi-cells
def area_region(vertices):
    # Polygon's signed area (shoelace formula)
    A = 0
    for i in range(0, len(vertices) - 1):
        s = (vertices[i, 0] * vertices[i + 1, 1] - vertices[i + 1, 0] * vertices[i, 1])
        A = A + s
    return np.abs(0.5 * A)

def f(x, y):
    return np.cos(10*x*y) * np.exp(-x**2 - y**2)

# sampling could easily be shaped to sample the origin more heavily
sample_x = np.random.rand(1000) * 10 - 5  # same range as the example linspace
sample_y = np.random.rand(1000) - .5
sample_xy = np.array([sample_x, sample_y]).T

vor = square_voronoi(sample_xy, (-5, 5, -.5, .5))  # using bbox from samples
points = vor.filtered_points
sample_areas = np.array([area_region(vor.vertices[verts + [verts[0]], :]) for verts in vor.filtered_regions])
sample_z = np.array([f(p[0], p[1]) for p in points])

volume = np.sum(sample_z * sample_areas)
I haven't exactly tested this, but the principle should work, and the math checks out.
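To sanity-check the estimate (my addition), you can compare it against a direct numerical integral of the same test function over the same box; dblquad is slow on this oscillatory integrand, but it works as a reference value.

from scipy import integrate

# reference volume for the test function over x in [-5, 5], y in [-0.5, 0.5]
ref, err = integrate.dblquad(lambda y, x: f(x, y), -5, 5,
                             lambda x: -0.5, lambda x: 0.5)
print(volume, ref)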

Decoding geometry output of EAST text detection

I'm trying to use the EAST model in OpenCV to detect text in images. I'm successfully getting the output after I run an image through the network, but I'm having a hard time understanding how the decode function I use works. I know that I get 5 numbers as output from the model, and I think they are the distances from a point to the top, bottom, left and right sides of the rectangle, respectively, plus the angle of rotation at the end. I'm not sure what the decode function does to get the bounding box for the text region.
I know why the offset is multiplied by 4 (it's shrunk by 4 when run through the model). I know why h and w are what they are. I'm not sure about anything after that.
scores contains the confidence scores for each region;
geometry contains the geometry values for each region (the 5 numbers I mentioned);
scoreThresh is just a threshold for the non-maximum suppression.
def decode(scores, geometry, scoreThresh):
    detections = []
    confidences = []

    ############ CHECK DIMENSIONS AND SHAPES OF geometry AND scores ############
    assert len(scores.shape) == 4, "Incorrect dimensions of scores"
    assert len(geometry.shape) == 4, "Incorrect dimensions of geometry"
    assert scores.shape[0] == 1, "Invalid dimensions of scores"
    assert geometry.shape[0] == 1, "Invalid dimensions of geometry"
    assert scores.shape[1] == 1, "Invalid dimensions of scores"
    assert geometry.shape[1] == 5, "Invalid dimensions of geometry"
    assert scores.shape[2] == geometry.shape[2], "Invalid dimensions of scores and geometry"
    assert scores.shape[3] == geometry.shape[3], "Invalid dimensions of scores and geometry"

    height = scores.shape[2]
    width = scores.shape[3]
    for y in range(0, height):
        # Extract data from scores
        scoresData = scores[0][0][y]
        x0_data = geometry[0][0][y]
        x1_data = geometry[0][1][y]
        x2_data = geometry[0][2][y]
        x3_data = geometry[0][3][y]
        anglesData = geometry[0][4][y]
        for x in range(0, width):
            score = scoresData[x]

            # If score is lower than threshold score, move to next x
            if score < scoreThresh:
                continue

            # Calculate offset
            offsetX = x * 4.0
            offsetY = y * 4.0
            angle = anglesData[x]

            # Calculate cos and sin of angle
            cosA = math.cos(angle)
            sinA = math.sin(angle)
            h = x0_data[x] + x2_data[x]
            w = x1_data[x] + x3_data[x]

            # Calculate offset
            offset = ([offsetX + cosA * x1_data[x] + sinA * x2_data[x],
                       offsetY - sinA * x1_data[x] + cosA * x2_data[x]])

            # Find points for rectangle
            p1 = (-sinA * h + offset[0], -cosA * h + offset[1])
            p3 = (-cosA * w + offset[0], sinA * w + offset[1])
            center = (0.5*(p1[0]+p3[0]), 0.5*(p1[1]+p3[1]))
            detections.append((center, (w, h), -1*angle * 180.0 / math.pi))
            confidences.append(float(score))

    # Return detections and confidences
    return [detections, confidences]
The paper contains a diagram of the output format. Instead of specifying the box in the usual way, it is specified as a set of distances (up, right, down, and left) from an offset point (x, y), together with an angle A, the amount the box has rotated counterclockwise.
Note that scores and geometry are indexed by y, x, the opposite of the x, y order used in the offset calculation. Therefore, to get the geometry components of the highest-scoring y, x:
high_scores_yx = np.where(scores[0][0] >= np.max(scores[0][0]))
y, x = high_scores_yx[0][0], high_scores_yx[1][0]
h_upper, w_right, h_lower, w_left, A = geometry[0,:,y,x]
The code uses offset to store the offset of the lower-right corner of the rectangle. Since it's the lower-right corner, it only needs w_right and h_lower, which in the code are x1_data and x2_data, respectively.
The location of the lower-right corner, with respect to the original offset offsetX, offsetY, depends on the angle of rotation. Below, the dotted lines show the axes orientation. The components to get from the original offset to the lower-right one are labelled in violet (horizontal) and purple (vertical). Note that the sin(A) * w_right component is subtracted because y gets bigger as you go lower in this coordinate system.
So that explains
offset = ([offsetX + cosA * x1_data[x] + sinA * x2_data[x], offsetY - sinA * x1_data[x] + cosA * x2_data[x]])
Next: p1 and p3 are the lower-left and upper-right corners of the rectangle, respectively, with rotation taken into account. center is just the average of these two points.
Finally, -1*angle * 180.0 / math.pi converts the original counterclockwise, radians angle into a clockwise-based, degrees angle (so final output angle should be negative for objects rotated counterclockwise). This is for compatibility with the CV2 boxPoints method, used in:
https://github.com/opencv/opencv/blob/7fb70e170154d064ef12d8fec61c0ae70812ce3d/samples/dnn/text_detection.py
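For example, to recover the four corners of one detection, the (center, (w, h), angle) tuples built above can be fed straight to cv2.boxPoints (a small sketch; detections comes from decode above):

import cv2
import numpy as np

center, size, angle = detections[0]           # one RotatedRect-style tuple
corners = cv2.boxPoints((center, size, angle))
print(np.int32(np.round(corners)))            # 4 x 2 corner coordinates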
