Context: I am performing object localisation and want to implement an Inhibition-of-Return (IoR) mechanism, i.e. drawing a black cross on the image where the red bounding box is after a trigger action.
Problem: I do not know how to accurately scale the red bounding box in relation to the original input (init_input). Once this scaling is correct, the black cross should land in the middle of the red bounding box.
My current code for this function is as follows:
def IoR(b, init_input, prev_coord):
    """
    Inhibition-of-Return mechanism.
    Marks the region of the image covered by
    the bounding box with a black cross.
    :param b:
        The current bounding box represented as [x1, y1, x2, y2].
    :param init_input:
        The initial input volume of the current episode.
    :param prev_coord:
        The previous state's bounding box coordinates (x1, y1, x2, y2)
    """
    x1, y1, x2, y2 = prev_coord
    width = 12
    x_mid = (b[2] + b[0]) // 2
    y_mid = (b[3] + b[1]) // 2
    # Define vertical rectangle coordinates
    ver_x1 = int(((x_mid) * IMG_SIZE / (x2 - x1)) - width)
    ver_x2 = int(((x_mid) * IMG_SIZE / (x2 - x1)) + width)
    ver_y1 = int((b[1]) * IMG_SIZE / (y2 - y1))
    ver_y2 = int((b[3]) * IMG_SIZE / (y2 - y1))
    # Define horizontal rectangle coordinates
    hor_x1 = int((b[0]) * IMG_SIZE / (x2 - x1))
    hor_x2 = int((b[2]) * IMG_SIZE / (x2 - x1))
    hor_y1 = int(((y_mid) * IMG_SIZE / (y2 - y1)) - width)
    hor_y2 = int(((y_mid) * IMG_SIZE / (y2 - y1)) + width)
    # Draw vertical rectangle
    cv2.rectangle(init_input, (ver_x1, ver_y1), (ver_x2, ver_y2), (0, 0, 0), -1)
    # Draw horizontal rectangle
    cv2.rectangle(init_input, (hor_x1, hor_y1), (hor_x2, hor_y2), (0, 0, 0), -1)
The desired effect can be seen below:
Note: I believe the complexity in this problem arises because the image is resized (to 224, 224, 3) each time I take an action (and consequently move on to the next state). Therefore, the "anchor" that determines the scaling must be extracted from the previous state's scaling, which is shown in the following code:
def next_state(init_input, b_prime, g):
    """
    Returns the observable region of the next state.
    Formats the next state's observable region, defined
    by b_prime, to be of dimension (224, 224, 3). Adding 16
    additional pixels of context around the original bounding box.
    The ground truth box must be reformatted according to the
    new observable region.
    IMG_SIZE = 224
    :param init_input:
        The initial input volume of the current episode.
    :param b_prime:
        The subsequent state's bounding box.
    :param g: (init_g)
        The initial ground truth box of the target object.
    """
    # Determine the pixel coordinates of the observable region for the following state
    context_pixels = 16
    x1 = max(b_prime[0] - context_pixels, 0)
    y1 = max(b_prime[1] - context_pixels, 0)
    x2 = min(b_prime[2] + context_pixels, IMG_SIZE)
    y2 = min(b_prime[3] + context_pixels, IMG_SIZE)
    # Determine observable region
    observable_region = cv2.resize(init_input[y1:y2, x1:x2], (224, 224), interpolation=cv2.INTER_AREA)
    # Resize ground truth box
    g[0] = int((g[0] - x1) * IMG_SIZE / (x2 - x1))  # x1
    g[1] = int((g[1] - y1) * IMG_SIZE / (y2 - y1))  # y1
    g[2] = int((g[2] - x1) * IMG_SIZE / (x2 - x1))  # x2
    g[3] = int((g[3] - y1) * IMG_SIZE / (y2 - y1))  # y2
    return observable_region, g, (b_prime[0], b_prime[1], b_prime[2], b_prime[3])
Explanation:
There is a state t in which the agent is predicting the location of the target object. The target object has a ground truth box (yellow in image, dotted in sketch), and the agent's current "localising box" is the red bounding box. Say, at state t the agent decides it is best to move right. Consequently, the bounding box is moved to the right, and then the next state, t' is determined by adding an additional 16 pixels of context around the red bounding box, cropping the original image with respect to this boundary, and then upscaling the cropped image back to 224, 224 in dimensions.
Say the agent is now confident that its prediction is accurate, so it chooses the trigger action. This means: end the current target object's localisation episode and place a black cross where the agent predicted the object was (i.e. in the middle of the red bounding box). Since the current state is zoomed in after being cropped following the previous action, the bounding box must first be re-scaled with respect to the original image, and only then can the black cross be drawn accurately onto it.
In the context of my problem, the first rescaling between states (the second code block in this post) is working perfectly well. However, scaling back to the original image and drawing the black cross is what I cannot get my head around.
Here is an image which hopefully helps the explanation:
Here is the output of my current solution:
I think it's better to store the coordinates globally instead of chaining a series of upscale/downscale operations. Those are hard to follow, and each rounding step can lose precision.
That is, every time you detect something, you convert it to global (original image) coordinates first. I have written a small demo here, imitating your detection and trigger behaviour.
Initial detection:
Zoomed in, another detection:
Zoomed in, another detection:
Zoomed in, another detection:
Zoomed back to original scale, with the detection box in the correct location
Code:
import cv2
import matplotlib.pyplot as plt

IMG_SIZE = 224
im = cv2.cvtColor(cv2.imread('lena.jpg'), cv2.COLOR_BGR2GRAY)
im = cv2.resize(im, (IMG_SIZE, IMG_SIZE))

# Your detector results
detected_region = [
    [(10, 20), (80, 100)],
    [(50, 0), (220, 190)],
    [(100, 143), (180, 200)],
    [(110, 45), (180, 150)]
]

# Global states
x_scale = 1.0
y_scale = 1.0
x_shift = 0
y_shift = 0
x1, y1 = 0, 0
x2, y2 = IMG_SIZE-1, IMG_SIZE-1

for region in detected_region:
    # Detection
    x_scale = IMG_SIZE / (x2-x1)
    y_scale = IMG_SIZE / (y2-y1)
    x_shift = x1
    y_shift = y1
    cur_im = cv2.resize(im[y1:y2, x1:x2], (IMG_SIZE, IMG_SIZE))
    # Assuming the detector return these results
    cv2.rectangle(cur_im, region[0], region[1], (255))
    plt.imshow(cur_im)
    plt.show()
    # Zooming in, using part of your code
    context_pixels = 16
    x1 = max(region[0][0] - context_pixels, 0) / x_scale + x_shift
    y1 = max(region[0][1] - context_pixels, 0) / y_scale + y_shift
    x2 = min(region[1][0] + context_pixels, IMG_SIZE) / x_scale + x_shift
    y2 = min(region[1][1] + context_pixels, IMG_SIZE) / y_scale + y_shift
    x1, y1, x2, y2 = int(x1), int(y1), int(x2), int(y2)

# Assuming the detector confirm its choice here
print('Confirmed detection: ', x1, y1, x2, y2)
# This time no padding
x1 = detected_region[-1][0][0] / x_scale + x_shift
y1 = detected_region[-1][0][1] / y_scale + y_shift
x2 = detected_region[-1][1][0] / x_scale + x_shift
y2 = detected_region[-1][1][1] / y_scale + y_shift
x1, y1, x2, y2 = int(x1), int(y1), int(x2), int(y2)
cv2.rectangle(im, (x1, y1), (x2, y2), (255, 0, 0))
plt.imshow(im)
plt.show()
This also prevents resizing on a resized image which might create more artifacts and worsen the detector's performance.
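The bookkeeping in the demo above can also be packaged as two small helpers. This is only a sketch under assumptions: `local_to_global` and `cross_rects` are hypothetical names that appear in neither the question nor the demo, the crop is described by its corner coordinates in the original image, and the cross half-width of 12 is taken from the `width = 12` in the question's `IoR` function.

```python
IMG_SIZE = 224  # side length of every resized observation, as in the question


def local_to_global(box, crop, img_size=IMG_SIZE):
    """Map a box from a resized crop back to original-image coordinates.

    box:  (x1, y1, x2, y2) measured in the resized crop (0..img_size).
    crop: (cx1, cy1, cx2, cy2), the crop's corners in the original image.
    """
    cx1, cy1, cx2, cy2 = crop
    x_scale = img_size / (cx2 - cx1)
    y_scale = img_size / (cy2 - cy1)
    x1, y1, x2, y2 = box
    # Undo the zoom first, then undo the crop offset.
    return (int(x1 / x_scale + cx1), int(y1 / y_scale + cy1),
            int(x2 / x_scale + cx1), int(y2 / y_scale + cy1))


def cross_rects(box, half_width=12):
    """Return the vertical and horizontal bars of a cross centred in box."""
    x1, y1, x2, y2 = box
    x_mid = (x1 + x2) // 2
    y_mid = (y1 + y2) // 2
    vertical = (x_mid - half_width, y1, x_mid + half_width, y2)
    horizontal = (x1, y_mid - half_width, x2, y_mid + half_width)
    return vertical, horizontal


# Hypothetical example: the crop spans (50, 50)-(162, 162), so the zoom is 2x.
global_box = local_to_global((100, 100, 200, 200), crop=(50, 50, 162, 162))
vertical, horizontal = cross_rects(global_box)
print(global_box, vertical, horizontal)
```

Each bar could then be drawn on the original image with `cv2.rectangle(init_input, (r[0], r[1]), (r[2], r[3]), (0, 0, 0), -1)`, where `r` is one of the two returned tuples.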
Imagine a point (x, y) in a 500x500 image. Let it be (100, 200).
After scaling it to a different size, say 250x250 - the correct way to scale it would be to just look at the current co-ordinate and do new_coord = old_coord * NEW_SIZE/OLD_SIZE.
Thus, (100,200) will be transformed to (50,100)
If you replace your scaling using x2-x1 and use a simpler rescaling formula, it should fix your problem.
Update: NEW_SIZE and OLD_SIZE may be different for the two co-ordinates based on the shape of the original image and final image, if they are rectangular and not square.
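The rescaling rule above, written out as a short sketch. `scale_point` is a hypothetical helper name, and sizes are passed as (width, height) so that rectangular images get a separate factor per axis.

```python
def scale_point(point, old_size, new_size):
    """Rescale (x, y) from an image of old_size to one of new_size.

    Sizes are (width, height) tuples; each axis uses its own ratio.
    """
    x, y = point
    old_w, old_h = old_size
    new_w, new_h = new_size
    return (x * new_w / old_w, y * new_h / old_h)


scaled = scale_point((100, 200), (500, 500), (250, 250))
print(scaled)  # (50.0, 100.0)
```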
Related
I have an image on which I am drawing a rectangle over a specified area. The image is:
I read this image and pass it through the YOLO algorithm, which gives me the coordinates of a rectangle around this gesture.
The x1, y1, x2, y2 values are:
print(x1 , y1 , x2 , y2)
tensor(52.6865) tensor(38.8428) tensor(143.1934) tensor(162.9857)
Using these to add a rectangle over the image
box_w = x2 - x1
box_h = y2 - y1
color = bbox_colors[int(np.where(unique_labels == int(cls_pred))[0])]
# Create a Rectangle patch
bbox = patches.Rectangle((x1, y1), box_w, box_h, linewidth=2, edgecolor=color, facecolor="none")
# Add the bbox to the plot
ax.add_patch(bbox)
It results in the following image:
Now I want to blacken everything around this square. For this purpose I save the above image, read it back, and then use OpenCV to blacken the rest with the following code.
x1 = int(x1)
y1 = int(y1)
x2 = int(x2)
y2 = int(y2)
# read image
img = cv2.imread(output_path)
#creating black mask
mask = np.zeros_like(img)
mask = cv2.rectangle(mask, (x1, y1), (x2,y2), (255,255,255), -1)
# apply mask to image
result = cv2.bitwise_and(img, mask)
# save results
cv2.imwrite(output_path, result)
I am getting the following image as the result:
There are 2 issues:
cv2.rectangle only takes integer values as coordinates.
Maybe the x and y axes have different directions in YOLO and OpenCV. I am just guessing, because integer coordinate values should not give such a large difference from the rectangle.
This is being done in Jupyter notebook on Win 10.
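On the first issue, the float coordinates can be rounded and clamped before they reach `cv2.rectangle`. On the second, one possible explanation (an assumption, not something the post confirms) is that the figure saved by matplotlib has different pixel dimensions from the array the detector saw, in which case the box would need rescaling, or the mask could be applied to the original array rather than to the saved figure. A minimal sketch; `box_to_pixels` and the image sizes used here are hypothetical:

```python
def box_to_pixels(box, src_size, dst_size):
    """Rescale a float box between image sizes, then round and clamp it.

    box:      (x1, y1, x2, y2) as floats (or 0-d tensors).
    src_size: (width, height) of the image the detector saw.
    dst_size: (width, height) of the image being drawn on.
    """
    src_w, src_h = src_size
    dst_w, dst_h = dst_size
    sx, sy = dst_w / src_w, dst_h / src_h
    # float() also unwraps single-element tensors.
    x1, y1, x2, y2 = (float(v) for v in box)

    def clamp(value, upper):
        return min(max(int(round(value)), 0), upper)

    return (clamp(x1 * sx, dst_w), clamp(y1 * sy, dst_h),
            clamp(x2 * sx, dst_w), clamp(y2 * sy, dst_h))


yolo_box = (52.6865, 38.8428, 143.1934, 162.9857)  # values from the post
same_size = box_to_pixels(yolo_box, (200, 200), (200, 200))
doubled = box_to_pixels(yolo_box, (200, 200), (400, 400))
print(same_size, doubled)
```

Comparing `img.shape` after `cv2.imread(output_path)` with the shape of the array passed to the detector would show whether such a rescale is needed at all.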
I am trying to turn an object detector for images into object detector for videos.
But, I am getting multiple bounding boxes and I don't know why.
It seems like the first frame of the video has the correct number of bounding boxes, namely 1. But as it loops the function draw_boxes outputs images that have multiple or overlapping bounding boxes.
If you can help I will appreciate it. Thanks.
Here is an example of some frame:
And here is the code:
for i in tqdm(range(nb_frames)):
    _, frame = video_reader.read()
    cv2.imwrite("framey.jpg", frame)
    filename = "framey.jpg"
    image, image_w, image_h = load_image_pixels(filename, (input_w, input_h))
    yhat = model.predict(image)
    for i in range(len(yhat)):
        # decode the output of the network
        boxes += decode_netout(yhat[i][0], anchors[i], class_threshold, input_h, input_w)
    # correct the sizes of the bounding boxes for the shape of the image
    correct_yolo_boxes(boxes, image_h, image_w, input_h, input_w)
    # suppress non-maximal boxes
    do_nms(boxes, 0.5)
    # get the details of the detected objects
    v_boxes, v_labels, v_scores = get_boxes(boxes, labels, class_threshold)
    # draw what we found
    imagex = draw_boxes(filename, v_boxes, v_labels, v_scores)
    video_writer.write(imagex)
video_reader.release()
video_writer.release()
And here is the function that is spitting out the above image:
def draw_boxes(filename, v_boxes, v_labels, v_scores):
    # load the image
    data = pyplot.imread(filename)
    # plot the image
    pyplot.imshow(data)
    # get the context for drawing boxes
    ax = pyplot.gca()
    # plot each box
    for i in range(len(v_boxes)):
        box = v_boxes[i]
        # get coordinates
        y1, x1, y2, x2 = box.ymin, box.xmin, box.ymax, box.xmax
        # calculate width and height of the box
        width, height = x2 - x1, y2 - y1
        # create the shape
        rect = Rectangle((x1, y1), width, height, fill=False, color='white')
        # draw the box
        ax.add_patch(rect)
        # draw text and score in top left corner
        label = "%s (%.3f)" % (v_labels[i], v_scores[i])
        pyplot.text(x1, y1, label, color='white')
    # show the plot
    pyplot.savefig('detected.jpg')
    filename = "detected.jpg"
    image = load_img(filename)
    image_array = img_to_array(image)
    image_array = (image_array*255).astype(np.uint8)
    return image_array
So, the error was in the 'draw_boxes' function.
I changed 'draw_boxes' and it worked.
def draw_bounding_boxes(image, v_boxes, v_labels, v_scores):
    for i in range(len(v_boxes)):
        box = v_boxes[i]
        y1, x1, y2, x2 = box.ymin, box.xmin, box.ymax, box.xmax
        width, height = x2 - x1, y2 - y1
        label = "%s (%.3f)" % (v_labels[i], v_scores[i])
        region = np.array([[x1 - 3, y1],
                           [x1-3, y1 - height-26],
                           [x1+width+13, y1-height-26],
                           [x1+width+13, y1]], dtype='int32')
        cv2.rectangle(image, (x1, y1), (x2, y2), (255, 0, 0), 5)
        cv2.fillPoly(image, [region], (255, 0, 0))
        cv2.putText(image,
                    label,
                    (x1+13, y1-13),
                    cv2.FONT_HERSHEY_SIMPLEX,
                    1e-3 * image.shape[0],
                    (0, 0, 0),
                    2)
    return image
I have many images, each of which contains two points, one at the top and another at the bottom. I also have the coordinates stored in an Excel file. I want to rotate each image so that the line through the two points is at 90 degrees. Below is an image which contains the two coordinates.
The red line is drawn through the coordinates in the actual image, and its angle is approximately 85 degrees, so I want to rotate the image to make it 90 degrees, as shown in yellow in the figure.
Can someone tell me which API or functions to use? (I am coding in Python.)
It is basic maths with angles in a triangle.
If you have two points (x1, y1) and (x2, y2), you can calculate dx = x2 - x1 and dy = y2 - y1, then tan_alpha = dy / dx and alpha = arctan(tan_alpha). That gives you the angle of the line, and the rotation you need is 90 - alpha.
In Python it looks like the code below. I took the points from your image.
Because an image has (0, 0) in the top-left corner, not in the bottom-left corner as in maths, I use dy = -(y2 - y1) to flip it.
import math
x1 = 295
y1 = 605
x2 = 330
y2 = 100
dx = x2 - x1
dy = -(y2 - y1)
alpha = math.degrees(math.atan2(dy, dx))
rotation = 90-alpha
print(alpha, rotation)
And now you can use PIL/pillow or cv2+imutils to rotate it
import math
import cv2
import imutils
x1 = 295
y1 = 605
x2 = 330
y2 = 100
dx = x2 - x1
dy = -(y2 - y1)
alpha = math.degrees(math.atan2(dy, dx))
rotation = 90-alpha
print(alpha, rotation)
img = cv2.imread('image.jpg')
img_2 = imutils.rotate(img, rotation)
cv2.imwrite('rotate.jpg', img_2)
img_3 = imutils.rotate_bound(img, -rotation)
cv2.imwrite('rotate_bound.jpg', img_3)
cv2.imshow('rotate', img_2)
cv2.imshow('rotate_bound', img_3)
cv2.waitKey(0)
rotate.jpg
rotate_bound.jpg
I'm working on a image processing project and I'm trying to detect the shape of a square in a distorted image with opencv 3.4.0 and python 3.
The image is this one:
As you can see, the black square is pretty distorted, but I'm interested in detecting only the right side, which will always be in focus and not distorted.
I've already tried the Hough Line Transform with the code below, but it detects the line to the right of the square:
temp = cv2.imread("images/macro/noF3.jpg")
grTemp = cv2.cvtColor(temp, cv2.COLOR_BGR2GRAY)
asd = cv2.Canny(grTemp, 50, 150, apertureSize = 3)
lines = cv2.HoughLines(asd.copy(), 1, np.pi/180, 200)
for rho, theta in lines[0]:
    a = np.cos(theta)
    b = np.sin(theta)
    x0 = a*rho
    y0 = b*rho
    x1 = int(x0 + 1000*(-b))
    y1 = int(y0 + 1000*(a))
    x2 = int(x0 - 1000*(-b))
    y2 = int(y0 - 1000*(a))
    cv2.line(temp, (x1,y1), (x2,y2), (255,0,0), 3)
I've also tried to use Canny and the findContours method, but it didn't work well, moreover I can't blur the image.
I would like to obtain the result seen in the image below:
Thanks
I am using pyCairo for drawing moving elements on a surface.
In order to get better performance I tried to use the "clip" function to redraw only the changed parts of a bigger image. Unfortunately it creates unwanted edges on the image: the edges of the clipping region can be seen. Is it possible to avoid this behaviour?
import math
import cairo

def draw_stuff(ctx):
    """ clears background with solid black and then draws a circle"""
    ctx.set_source_rgb(0, 0, 0)  # Solid color
    ctx.paint()
    ctx.arc(0.5, 0.5, 0.5, 0, 2*math.pi)
    ctx.set_source_rgb(0, 123, 0)
    ctx.fill()

WIDTH, HEIGHT = 256, 256
surface = cairo.ImageSurface(cairo.FORMAT_ARGB32, WIDTH, HEIGHT)
ctx = cairo.Context(surface)
ctx.scale(WIDTH, HEIGHT)  # Normalizing the canvas
draw_stuff(ctx)
# Let's draw stuff again, this time only redrawing a small part of the image
ctx.save()
ctx.rectangle(0.2, 0.2, 0.2, 0.2)
ctx.clip()
draw_stuff(ctx)
ctx.restore()
surface.write_to_png("example.png")  # Output to PNG
You should round your clipping coordinates to integers (in device space). See http://cairographics.org/FAQ/#clipping_performance
I don't know the Python API and I am just guessing how it might work like from the C API, but it is something like this:
def snap_to_pixels(ctx, x, y):
    x, y = ctx.user_to_device(x, y)
    # No idea how to round an integer in python,
    # this would be round() in C
    # (Oh and perhaps you don't want this rounding, but
    # instead want to round the top-left corner of your
    # rectangle towards negative infinity and the bottom-right
    # corner towards positive infinity. That way the rectangle
    # would never become smaller due to the rounding. But hopefully
    # this example is enough to get the idea.)
    x = int(x + 0.5)
    y = int(y + 0.5)
    return ctx.device_to_user(x, y)
# Calculate the top-left and bottom-right corners of our rectangle
x1, y1 = 0.2, 0.2
x2, y2 = x1 + 0.2, y1 + 0.2
x1, y1 = snap_to_pixels(ctx, x1, y1)
x2, y2 = snap_to_pixels(ctx, x2, y2)
# Clip for this rectangle
ctx.rectangle(x1, y1, x2 - x1, y2 - y1)
ctx.clip()
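The outward rounding mentioned in the comments above (top-left corner towards negative infinity, bottom-right corner towards positive infinity) can be sketched in plain Python. To keep the example runnable without cairo, the device-space conversion is reduced to a plain scale factor per axis, which is an assumption that matches the `ctx.scale(WIDTH, HEIGHT)` setup in the question; `snap_rect_outward` is a hypothetical name. With a real context, `ctx.user_to_device` and `ctx.device_to_user` would do the conversion instead.

```python
import math


def snap_rect_outward(x, y, w, h, scale_x, scale_y):
    """Grow a user-space rectangle so its edges fall on whole device pixels.

    Assumes device = user * scale with no translation or rotation.
    Returns (x, y, w, h) in user space.
    """
    # Top-left corner: round down (towards negative infinity).
    dx1 = math.floor(x * scale_x)
    dy1 = math.floor(y * scale_y)
    # Bottom-right corner: round up (towards positive infinity).
    dx2 = math.ceil((x + w) * scale_x)
    dy2 = math.ceil((y + h) * scale_y)
    # Convert the snapped device rectangle back to user space.
    return (dx1 / scale_x, dy1 / scale_y,
            (dx2 - dx1) / scale_x, (dy2 - dy1) / scale_y)


# The clip rectangle from the question on a 256x256 surface.
rect = snap_rect_outward(0.2, 0.2, 0.2, 0.2, 256, 256)
print(rect)
```

The returned tuple could be passed straight to `ctx.rectangle(*rect)` before `ctx.clip()`; the rectangle never shrinks, so the redrawn area always covers the region that changed.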