Bounding Box Regression - python

I generated a dataset of (200 x 200 x 3) images in which each image contains a 40 x 40 box of a different color.
I want to create a model using TensorFlow which can predict the coordinates of this 40 x 40 box.
The code I used for generating these images:
from PIL import Image, ImageDraw
from random import randrange

colors = ["#ffd615", "#f9ff21", "#00d1ff",
          "#0e153a", "#fc5c9c", "#ac3f21",
          "#40514e", "#492540", "#ff8a5c",
          "#000000", "#a6fff2", "#f0f696",
          "#d72323", "#dee1ec", "#fcb1b1"]

def genrate_image(color):
    img = Image.new(mode="RGB", size=(200, 200), color=color)
    return img

def save_image(img, imgname):
    img.save(imgname)

def draw_rect(image, color, x, y):
    draw = ImageDraw.Draw(image)
    coords = ((x, y), (x+40, y), (x+40, y+40), (x, y+40))
    draw.polygon(coords, fill=color)
    # return image, str(coords)
    return image, coords[0][0], coords[2][0], coords[0][1], coords[2][1]

FILE_NAME = "train_annotations.txt"
for i in range(0, 100):
    img = genrate_image(colors[randrange(0, len(colors))])
    img, x0, x1, y0, y1 = draw_rect(img, colors[randrange(0, len(colors))],
                                    randrange(200 - 50), randrange(200 - 50))
    save_image(img, "dataset/train_images/img" + str(i) + ".png")
    with open(FILE_NAME, "a+") as f:
        f.write(f"{x0} {x1} {y0} {y1}\n")
Can anyone suggest how I can build a model that predicts the coordinates of the box in a new image?

The easiest way to separate the box from the background is K-means clustering with K = 2. Record the RGB values of all pixels, then use K-means to group them into two clusters: one for the background color and one for the box color. Map the pixels in the box-color cluster back to their original coordinates, and take the mean of those coordinates to get the location of the 40x40 box.
https://www.tensorflow.org/api_docs/python/tf/compat/v1/estimator/experimental/KMeans
The documentation above describes TensorFlow's K-means estimator.
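If it helps, here is a minimal sketch of that pixel-clustering idea using cv2.kmeans and NumPy instead of the tf.compat.v1 estimator. The image path is just an example taken from the generator code above, and it assumes the 40x40 box ends up in the smaller of the two clusters:
import cv2
import numpy as np

img = cv2.imread("dataset/train_images/img0.png")      # (200, 200, 3)
pixels = img.reshape(-1, 3).astype(np.float32)

criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
_, labels, centers = cv2.kmeans(pixels, 2, None, criteria, 10,
                                cv2.KMEANS_RANDOM_CENTERS)

labels = labels.reshape(img.shape[:2])
# the 40x40 box covers fewer pixels than the background
box_label = np.argmin(np.bincount(labels.ravel()))
ys, xs = np.where(labels == box_label)
cx, cy = xs.mean(), ys.mean()                          # centre of the box
x0, y0 = int(cx - 20), int(cy - 20)                    # top-left corner
print(x0, y0)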

It is enough to perform bounding box regression: add a fully connected layer after the CNN with 4 output values (x1, y1, x2, y2), where (x1, y1) is the top-left corner and (x2, y2) the bottom-right corner.
Something similar can be found here: https://github.com/sabhatina/bounding-box-regression.
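For illustration, a minimal Keras sketch of such a model (the layer sizes are arbitrary choices; the only essential part is the final Dense layer with 4 linear outputs for the corner coordinates):
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=(200, 200, 3)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(4)  # x1, y1, x2, y2 of the box
])
model.compile(optimizer="adam", loss="mse")

# train_images: (N, 200, 200, 3) float array scaled to [0, 1]
# train_boxes:  (N, 4) array of corner coordinates parsed from
#               train_annotations.txt (the generator writes "x0 x1 y0 y1")
# model.fit(train_images, train_boxes, epochs=20, batch_size=16)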

Related

OpenCV perspective transform

I am currently working on Python software that tracks players on a soccer field. I got the player detection working with YOLOv3 and was able to output quite a nice result with player centroids and boxes drawn. What I want to do now is translate the players' positions and project their centroids onto a png/jpg of a soccer field. For this I intended to use two arrays of reference points, one for the soccer-field image and one for the source video. My question is: how do I translate the coordinates of the centroids to the soccer-field image?
Similar example: (linked image)
How the boxes and Markers are drawn:
def draw_labels_and_boxes(img, boxes, confidences, classids, idxs, colors, labels):
    # If there are any detections
    if len(idxs) > 0:
        for i in idxs.flatten():
            # Get the bounding box coordinates
            x, y = boxes[i][0], boxes[i][1]
            w, h = boxes[i][2], boxes[i][3]
            # Draw the bounding box rectangle and label on the image
            cv.rectangle(img, (x, y), (x + w, y + h), (255, 255, 255), 2)
            cv.drawMarker(img, (int(x + w / 2), int(y + h / 2)), (x, y), 0, 20, 3)
    return img
Boxes are generated like this:
def generate_boxes_confidences_classids(outs, height, width, tconf):
    boxes = []
    confidences = []
    classids = []
    for out in outs:
        for detection in out:
            # print(detection)
            # a = input('GO!')
            # Get the scores, classid, and the confidence of the prediction
            scores = detection[5:]
            classid = np.argmax(scores)
            confidence = scores[classid]
            # Consider only the predictions that are above a certain confidence level
            if confidence > tconf:
                # TODO Check detection
                box = detection[0:4] * np.array([width, height, width, height])
                centerX, centerY, bwidth, bheight = box.astype('int')
                # Using the center x, y coordinates to derive the top
                # and the left corner of the bounding box
                x = int(centerX - (bwidth / 2))
                y = int(centerY - (bheight / 2))
                # Append to list
                boxes.append([x, y, int(bwidth), int(bheight)])
                confidences.append(float(confidence))
                classids.append(classid)
    return boxes, confidences, classids
Assuming a stationary camera:
1. Find the coordinates of the four corners of the field in the video frame.
2. Find the corresponding four corners in the top-view image that you want to create.
3. Compute a homography matrix from these two sets of points. You can use OpenCV's findHomography for this.
4. Transform all the centroids using this homography matrix; that gives you your coordinates in the new image space. For transforming points use perspectiveTransform (warpPerspective is the equivalent for warping a whole image), as sketched below.
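A minimal sketch of these steps, where the reference points are placeholders you would measure yourself and boxes is the list returned by generate_boxes_confidences_classids above:
import cv2
import numpy as np

# four corners of the pitch in the video frame (x, y) - assumed values
src_pts = np.float32([[102, 220], [1180, 235], [1260, 700], [30, 680]])
# the same four corners in the top-view soccer-field image - assumed values
dst_pts = np.float32([[0, 0], [900, 0], [900, 600], [0, 600]])

H, _ = cv2.findHomography(src_pts, dst_pts)

# centroids of the detected players in the video frame, shape (N, 1, 2)
centroids = np.float32([[x + w / 2, y + h / 2]
                        for x, y, w, h in boxes]).reshape(-1, 1, 2)
projected = cv2.perspectiveTransform(centroids, H).reshape(-1, 2)
# 'projected' now holds the player positions in top-view image coordinates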
Recently, during the COVID-19 pandemic, many developers built social-distancing monitoring systems, and a few of them also implemented a "bird's eye view" of the scene. Your problem is very similar. As external links are not accepted here, I cannot post the exact link(s); please check their code on GitHub.

Storing given percentage of shape contour coordinates

I am building a project on Detectron2 with some mushrooms as a topic. The prediction works reasonably well, and I'm now trying to generate COCO-like annotations of the images with the predicted region (all XY coordinates of the region). For this, I need to do two things:
Retrieve the XY coordinates of the predicted shape/region and "downscale it" to only save the main edges (in order to avoid saving too many data points)
Plot the "saved" points back onto the main image for the user to judge if enough points are being saved
Unfortunately I'm failing at both. On the first point, I think I have the mask as a binary NumPy object, but I'm surprised by its size AND I can't manage to transform it into sets of XY coordinates.
On the second point, I am getting an error that I can't figure out how to debug:
/usr/local/lib/python3.6/dist-packages/google/colab/patches/__init__.py in cv2_imshow(a)
20 image.
21 """
---> 22 a = a.clip(0, 255).astype('uint8')
23 # cv2 stores colors as BGR; convert to RGB
24 if a.ndim == 3:
AttributeError: 'cv2.UMat' object has no attribute 'clip'
The portion of the code I'm using is here:
from detectron2.utils.visualizer import ColorMode, Visualizer

## Predict some random image
dataset_dicts = get_all_mushroom_dicts(mushroom_categories, "mushroom_dataset_small/val")
d = random.sample(dataset_dicts, 1)[0]
im = cv2.imread(d["file_name"])
mushroom_outputs = mushroom_predictor(im)
v = Visualizer(im[:, :, ::-1],
               metadata=mushroom_metadata,
               scale=0.8,
               instance_mode=ColorMode.IMAGE_BW  # remove the colors of unsegmented pixels
               )
instances = mushroom_outputs["instances"].to("cpu")
mush_out = v.draw_instance_predictions(instances)
image = mush_out.get_image()[:, :, ::-1]

masks = np.asarray(instances.pred_masks)
print("NP array shape", masks.shape)
print("Image array shape", image.shape)
print("Type before", type(image))
cv2_imshow(image)  ## works fine

## ?? How to get the coordinates of the boundaries? And then take only some of them
## ??
## Assuming that (128, 128) is one of these coordinates for now
image2 = cv2.circle(image, (128, 128), 10, (255, 0, 0), 20)
print("Type after", type(image2))
cv2_imshow(image2)  ## crashes
FYI I have also tried to find and draw the contour but this doesn't seem to work (see my post here https://github.com/facebookresearch/detectron2/issues/1702#event-3501434732 ).
Do you have any clue?
Thanks!
I managed to find a solution that works but part of it is not elegant:
Retrieve the XY coordinates:
masks = np.asarray(instances.pred_masks)
cnt, hierarchy = cv2.findContours(masks[0].astype("uint8"), cv2.RETR_CCOMP, cv2.CHAIN_APPROX_NONE)
border = cnt[0]
gap = 20  # take only every 20th point
for i in list(range(border.shape[0]))[0::gap]:
    y = int(border[i][0][1] * 0.8)
    x = int(border[i][0][0] * 0.8)
    print("New XY coordinates", (x, y))
Draw on the picture:
image_old = mush_out.get_image()[:, :, ::-1]
cv2.imwrite("img.png", image_old)
image = cv2.imread("img.png", cv2.IMREAD_COLOR)  # Still no clue why this is needed, but it doesn't work if you don't save and re-read...
radius = 3
color = (255, 0, 0)
cv2.circle(image, (x, y), radius, color, -1)  # my previous code was wrong; cv2.circle edits the image in place
cv2_imshow(image)
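A possible refinement of the first step, reusing border and image from the snippets above: cv2.approxPolyDP keeps only the dominant vertices of the contour instead of every 20th point, which is closer to "only save the main edges". The epsilon factor is a tuning knob:
epsilon = 0.005 * cv2.arcLength(border, True)
approx = cv2.approxPolyDP(border, epsilon, True)
for pt in approx.reshape(-1, 2):
    x, y = int(pt[0] * 0.8), int(pt[1] * 0.8)   # same 0.8 scale as the Visualizer
    cv2.circle(image, (x, y), 3, (255, 0, 0), -1)
cv2_imshow(image)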

How to plot centroids on image after kmeans clustering?

I have a color image and wanted to do k-means clustering on it using OpenCV.
This is the image on which I wanted to do k-means clustering.
This is my code:
import numpy as np
import cv2
import matplotlib.pyplot as plt

image1 = cv2.imread("./triangle.jpg", 0)
Z1 = image1.reshape((-1))
Z1 = np.float32(Z1)
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
K1 = 2
ret, mask, center = cv2.kmeans(Z1, K1, None, criteria, 10, cv2.KMEANS_RANDOM_CENTERS)
center = np.uint8(center)
print(center)
res_image1 = center[mask.flatten()]
clustered_image1 = res_image1.reshape((image1.shape))
for c in center:
    plt.hlines(c, xmin=0, xmax=max(clustered_image1.shape[0], clustered_image1.shape[1]), lw=1.)
plt.imshow(clustered_image1)
plt.show()
This is what I get from the center variable.
[[112]
[255]]
This is the output image
My problem is that I'm unable to understand the output. I have two lists in the center variable because I wanted two classes. But why do they have only one value?
Shouldn't it be something like this (which makes sense because centroids should be points):
[[x1, y1]
[x2, y2]]
instead of this:
[[x]
[y]]
and if I read the image as a color image like this:
image1 = cv2.imread("./triangle.jpg")
Z1 = image1.reshape((-1, 3))
I get this output:
[[255 255 255]
[ 89 173 1]]
Color image output
Can someone explain to me how I can get 2d points instead of lines? Also, how do I interpret the output I got from the center variable when using the color image?
Please let me know if I'm unclear anywhere. Thanks!!
K-means clustering finds clusters of similar values. Your input is an array of color values, hence you find the colors that describe the 2 clusters. [255 255 255] is white, [ 89 173 1] is green, and similarly [112] and [255] in the grayscale version. What you're doing is color quantization.
They are indeed the centroids, but their dimension is color, not location, so you cannot plot them as image positions. Well, you can, but it looks like this:
See how the 'color location' determines to which class each pixel belongs?
This is not something you can locate in your image. What you can do is find the pixels that belong to the different clusters, and use the locations of the found pixels to determine their centroid or 'average' position.
To get the 'average' position of each color, you have to separate out the pixel coordinates according to the class/color to which they belong. In the code below I used np.where(img <= 240), where 240 is the threshold. I used 240 for convenience, but you could use K-means to determine where the threshold should be (inRange() might be useful at some point). If you sum the coordinates and divide by the number of pixels found, you'll have what I think you are looking for:
Result:
Code:
import cv2
import numpy as np

# load image as grayscale
img = cv2.imread('D21VU.jpg', 0)
# get the positions of all pixels that are not full white (= triangle)
triangle_px = np.where( img <= 240)
# dividing the sum of the values by the number of pixels
# to get the average location
ty = int(sum(triangle_px[0])/len(triangle_px[0]))
tx = int(sum(triangle_px[1])/len(triangle_px[1]))
# print location and draw filled black circle
print("Triangle ({},{})".format(tx,ty))
cv2.circle(img, (tx,ty), 10,(0), -1)
# the same process, but now with only white pixels
white_px = np.where( img > 240)
wy = int(sum(white_px[0])/len(white_px[0]))
wx = int(sum(white_px[1])/len(white_px[1]))
# print location and draw white filled circle
print("White: ({},{})".format(wx,wy))
cv2.circle(img, (wx,wy), 10,(255), -1)
# display result
cv2.imshow('Result',img)
cv2.waitKey(0)
cv2.destroyAllWindows()
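If you want the centroids driven directly by the k-means labels instead of a hand-picked threshold, a short sketch reusing mask, image1 and K1 from the question's code could look like this:
import numpy as np

labels_2d = mask.reshape(image1.shape)      # one cluster id per pixel
for k in range(K1):
    ys, xs = np.where(labels_2d == k)
    cx, cy = int(xs.mean()), int(ys.mean())
    print("Cluster {} centroid at ({}, {})".format(k, cx, cy))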
Here is an Imagemagick solution, since I am not proficient with OpenCV.
Basically, I convert your actual image (from your link in the comments) to binary, then use image moments to extract the centroid and other statistics.
I suspect you can do something similar in OpenCV, Skimage, or Python Wand, which is based upon Imagemagick. (See for example:
https://docs.opencv.org/3.4/d3/dc0/group__imgproc__shape.html#ga556a180f43cab22649c23ada36a8a139
https://scikit-image.org/docs/dev/api/skimage.measure.html#skimage.measure.moments_coords_central
https://en.wikipedia.org/wiki/Image_moment)
Input:
Your image does not have just two colors. Perhaps this image did not have kmeans clustering applied with 2 colors only. So I will do that with an Imagemagick script that I have built.
kmeans -n 2 -m 5 img.png img2.png
final colors:
count,hexcolor
99234,#65345DFF
36926,#27AD0EFF
Then I convert the two colors to black and white by simply thresholding and stretching the dynamic range to full black and white.
convert img2.png -threshold 50% -auto-level img3.png
Then I get all the image moment statistics for the white pixels, which includes the x,y centroid in pixels relative to the top left corner of the image. It also includes the equivalent ellipse major and minor axes, angle of major axis, eccentricity of the ellipse, and equivalent brightness of the ellipse, plus the 8 Hu image moments.
identify -verbose -moments img3.png
Channel moments:
Gray:
--> Centroid: 208.523,196.302 <--
Ellipse Semi-Major/Minor axis: 170.99,164.34
Ellipse angle: 140.853
Ellipse eccentricity: 0.197209
Ellipse intensity: 106.661 (0.41828)
I1: 0.00149333 (0.380798)
I2: 3.50537e-09 (0.000227937)
I3: 2.10942e-10 (0.00349771)
I4: 7.75424e-13 (1.28576e-05)
I5: 9.78445e-24 (2.69016e-09)
I6: -4.20164e-17 (-1.77656e-07)
I7: 1.61745e-24 (4.44704e-10)
I8: 9.25127e-18 (3.91167e-08)
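For reference, a rough OpenCV equivalent of the centroid part (not the full moment report), assuming a binarized version of the same image saved as img.png:
import cv2

img = cv2.imread("img.png", cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)
m = cv2.moments(binary, binaryImage=True)
cx = m["m10"] / m["m00"]
cy = m["m01"] / m["m00"]
print("Centroid: ({:.1f}, {:.1f})".format(cx, cy))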

Finding the "level of black" in a pixel in a grayscale image

I am trying to calculate the percentage of black a pixel is. For example, let's say I have a pixel that is 75% black, so a gray. I have the RGBA values, so how do I get the level of black?
I have already completed getting each pixel and replacing it with a new RGBA value, and tried to use some RGBA logic to no avail.
# Gradient Testing here
from PIL import Image

picture = Image.open("img1.png")
img = Image.open('img1.png').convert('LA')
img.save('greyscale.png')

# Get the size of the image
width, height = picture.size

# Process every pixel
for x in range(width):
    for y in range(height):
        # Code I need here
        r1, g1, b1, alpha = picture.getpixel( (x,y) )
        r, g, b = 120, 140, 99
        greylvl = 1 - (alpha(r1 + g1 + b1) / 765)  # Code I tried
I would like to get a new variable that gives me a value such as 0.75, which would represent a 75% black pixel.
I'm not quite sure what the "LA" format you're trying to convert to is for; I would try "L" instead.
Try this code: (Make sure you're using Python 3.)
from PIL import Image

picture = Image.open('img1.png').convert('L')
width, height = picture.size
for x in range(width):
    for y in range(height):
        value = picture.getpixel( (x, y) )
        black_level = 1 - value / 255
        print('Level of black at ({}, {}): {} %'.format(x, y, black_level * 100))
Is this what you're looking for?
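If you need this for every pixel of a large image, a vectorized variant with NumPy (assuming Pillow and NumPy are available) computes the same values in one go:
import numpy as np
from PIL import Image

gray = np.asarray(Image.open('img1.png').convert('L'), dtype=np.float32)
black_levels = 1 - gray / 255      # array of values in [0, 1], 1.0 = pure black
print(black_levels.shape, black_levels.max())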

How to merge a transparent png image with another image using PIL

I have a transparent png image foo.png and I've opened another image with:
im = Image.open("foo2.png")
Now what I need is to merge foo.png with foo2.png.
(foo.png contains some text and I want to print that text on foo2.png)
from PIL import Image
background = Image.open("test1.png")
foreground = Image.open("test2.png")
background.paste(foreground, (0, 0), foreground)
background.show()
The first parameter to .paste() is the image to paste. The second is the coordinates, and the secret sauce is the third parameter. It indicates a mask that will be used to paste the image. If you pass an image with transparency, then the alpha channel is used as the mask.
Check the docs.
Image.paste does not work as expected when the background image also contains transparency. You need to use real Alpha Compositing.
Pillow 2.0 contains an alpha_composite function that does this.
background = Image.open("test1.png")
foreground = Image.open("test2.png")
Image.alpha_composite(background, foreground).save("test3.png")
EDIT: Both images need to be of type RGBA, so you need to call convert('RGBA') if they are paletted, etc. If the background does not have an alpha channel, then you can use the regular paste method (which should be faster).
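For example, a minimal sketch with the conversion added (same file names as above):
from PIL import Image

background = Image.open("test1.png").convert("RGBA")
foreground = Image.open("test2.png").convert("RGBA")
Image.alpha_composite(background, foreground).save("test3.png")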
As olt already pointed out, Image.paste doesn't work properly when source and destination both contain alpha.
Consider the following scenario:
Two test images, both contain alpha:
layer1 = Image.open("layer1.png")
layer2 = Image.open("layer2.png")
Compositing image using Image.paste like so:
final1 = Image.new("RGBA", layer1.size)
final1.paste(layer1, (0,0), layer1)
final1.paste(layer2, (0,0), layer2)
produces the following image (the alpha of the overlaid red pixels is taken entirely from the 2nd layer; the pixels are not blended correctly):
Compositing image using Image.alpha_composite like so:
final2 = Image.new("RGBA", layer1.size)
final2 = Image.alpha_composite(final2, layer1)
final2 = Image.alpha_composite(final2, layer2)
produces the following (correct) image:
One can also use blending:
im1 = Image.open("im1.png")
im2 = Image.open("im2.png")
blended = Image.blend(im1, im2, alpha=0.5)
blended.save("blended.png")
Had a similar question and had difficulty finding an answer. The following function allows you to paste an image with a transparency parameter over another image at a specific offset.
from PIL import Image

def trans_paste(fg_img, bg_img, alpha=1.0, box=(0, 0)):
    fg_img_trans = Image.new("RGBA", fg_img.size)
    fg_img_trans = Image.blend(fg_img_trans, fg_img, alpha)
    bg_img.paste(fg_img_trans, box, fg_img_trans)
    return bg_img

bg_img = Image.open("bg.png")
fg_img = Image.open("fg.png")
p = trans_paste(fg_img, bg_img, .7, (250, 100))
p.show()
def trans_paste(bg_img, fg_img, box=(0, 0)):
    fg_img_trans = Image.new("RGBA", bg_img.size)
    fg_img_trans.paste(fg_img, box, mask=fg_img)
    new_img = Image.alpha_composite(bg_img, fg_img_trans)
    return new_img
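A possible usage of this variant, assuming both files have an alpha channel (the file names and offset are placeholders):
from PIL import Image

bg_img = Image.open("bg.png").convert("RGBA")
fg_img = Image.open("fg.png").convert("RGBA")
result = trans_paste(bg_img, fg_img, box=(250, 100))
result.show()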
Here is my code to merge 2 images of different sizes, each with transparency and with offset:
from PIL import Image
background = Image.open('image1.png')
foreground = Image.open("image2.png")
x = background.size[0]//2
y = background.size[1]//2
background = Image.alpha_composite(
    Image.new("RGBA", background.size),
    background.convert('RGBA')
)
background.paste(
    foreground,
    (x, y),
    foreground
)
background.show()
This snippet is a mix of the previous answers, blending elements with offset while handling images with different sizes, each with transparency.
the key code is:
_, _, _, alpha = image_element_copy.split()
image_bg_copy.paste(image_element_copy, box=(x0, y0, x1, y1), mask=alpha)
the full function is:
def paste_image(image_bg, image_element, cx, cy, w, h, rotate=0, h_flip=False):
    image_bg_copy = image_bg.copy()
    image_element_copy = image_element.copy()
    image_element_copy = image_element_copy.resize(size=(w, h))
    if h_flip:
        image_element_copy = image_element_copy.transpose(Image.FLIP_LEFT_RIGHT)
    image_element_copy = image_element_copy.rotate(rotate, expand=True)
    _, _, _, alpha = image_element_copy.split()
    # image_element_copy's width and height will change after rotation
    w = image_element_copy.width
    h = image_element_copy.height
    x0 = cx - w // 2
    y0 = cy - h // 2
    x1 = x0 + w
    y1 = y0 + h
    image_bg_copy.paste(image_element_copy, box=(x0, y0, x1, y1), mask=alpha)
    return image_bg_copy
the above function supports:
position(cx, cy)
auto resize image_element to (w, h)
rotate image_element without cropping it
horizontal flip
