Given a binary image, how do I draw a box around the majority of the white pixels? For example, consider the following image:
As Canny segmentation results in a binary image, I thought I could use np.nonzero to identify the location of the points, and then draw a box around them. I have the following function to identify the location of the bounding box, but it's not working as intended (as you can see by the box in the image above):
def get_bounding_box(image,thresh=0.95):
    nonzero_indices = np.nonzero(image)
    min_row, max_row = np.min(nonzero_indices[0]), np.max(nonzero_indices[0])
    min_col, max_col = np.min(nonzero_indices[1]), np.max(nonzero_indices[1])
    box_size = max_row - min_row + 1, max_col - min_col + 1
    #box_size_thresh = (int(box_size[0] * thresh), int(box_size[1] * thresh))
    box_size_thresh = (int(box_size[0]), int(box_size[1]))
    #coordinates of the box that contains 95% of the highest pixel values
    top_left = (min_row + int((box_size[0] - box_size_thresh[0]) / 2), min_col + int((box_size[1] - box_size_thresh[1]) / 2))
    bottom_right = (top_left[0] + box_size_thresh[0], top_left[1] + box_size_thresh[1])
    print((top_left[0], top_left[1]), (bottom_right[0], bottom_right[1]))
    return (top_left[0], top_left[1]), (bottom_right[0], bottom_right[1])
and using the following code to get the coords and draw the box as follows:
seg= canny_segmentation(gray)
bb_thresh = get_bounding_box(seg,0.95)
im_crop = gray[bb_thresh[0][1]:bb_thresh[1][1],bb_thresh[0][0]:bb_thresh[1][0]]
Why is this code not giving me the right top left / bottom right coordinates?
I have an example Colab workbook here
The issue is related to the order of the coordinates returned from get_bounding_box:
return (top_left[0], top_left[1]), (bottom_right[0], bottom_right[1])
Applies the following ordering:
(y0, x0), (y1, x1)
where y is the row and x is the column.
When the returned coordinates are used by im_crop = gray[bb_thresh[0][1]:bb_thresh[1][1], bb_thresh[0][0]:bb_thresh[1][0]], the rows and columns are swapped.
To avoid confusion, I recommend unpacking the coordinates into y0, x0, y1, x1 first:
(y0, x0), (y1, x1) = bb_thresh
Then use the coordinates in the correct order:
im_crop = gray[y0:y1, x0:x1]
For testing, we may also draw a rectangle using cv2.rectangle:
cv2.rectangle(bgr_image, (x0, y0), (x1, y1), (0, 255, 0), 2)
Part of the confusion comes from the fact that NumPy array indexing is (y, x), while the OpenCV "point" coordinate convention is (x, y).
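The two conventions can be seen side by side in a small NumPy-only sketch. The helper name nonzero_bounding_box and the synthetic 6x8 array are illustrative assumptions, not part of the code in the question. The box is returned as (x0, y0, x1, y1) with exclusive end coordinates so that it can be used directly for slicing.

```python
import numpy as np


def nonzero_bounding_box(image):
    """Return (x0, y0, x1, y1) around all nonzero pixels; x1 and y1 are exclusive."""
    rows, cols = np.nonzero(image)  # np.nonzero returns (row indices, column indices)
    y0, y1 = rows.min(), rows.max() + 1
    x0, x1 = cols.min(), cols.max() + 1
    return int(x0), int(y0), int(x1), int(y1)


# Synthetic binary image (hypothetical data): 6 rows, 8 columns, one white block
image = np.zeros((6, 8), dtype=np.uint8)
image[1:4, 2:7] = 255

x0, y0, x1, y1 = nonzero_bounding_box(image)
crop = image[y0:y1, x0:x1]  # NumPy slicing: rows (y) first, then columns (x)
print((x0, y0), (x1, y1), crop.shape)
```

When drawing with OpenCV, the same values would be passed as points in (x, y) order, for example (x0, y0) and (x1 - 1, y1 - 1).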
Code sample (not using Google Colab):
import cv2
import numpy as np

def get_bounding_box(image, thresh=0.95):
    nonzero_indices = np.nonzero(image)
    min_row, max_row = np.min(nonzero_indices[0]), np.max(nonzero_indices[0])
    min_col, max_col = np.min(nonzero_indices[1]), np.max(nonzero_indices[1])
    box_size = max_row - min_row + 1, max_col - min_col + 1
    #box_size_thresh = (int(box_size[0] * thresh), int(box_size[1] * thresh))
    box_size_thresh = (int(box_size[0]), int(box_size[1]))
    #coordinates of the box that contains 95% of the highest pixel values
    top_left = (min_row + int((box_size[0] - box_size_thresh[0]) / 2), min_col + int((box_size[1] - box_size_thresh[1]) / 2))
    bottom_right = (top_left[0] + box_size_thresh[0], top_left[1] + box_size_thresh[1])
    print((top_left[0], top_left[1]), (bottom_right[0], bottom_right[1]))
    return (top_left[0], top_left[1]), (bottom_right[0], bottom_right[1])  # Return format is (y0, x0), (y1, x1), when y is the row and x is the column

def canny_segmentation(img, low_threshold=100, high_threshold=200):
    edges = cv2.Canny(img, low_threshold, high_threshold)
    return edges

gray = cv2.imread('small_grayscale_image.png', cv2.IMREAD_GRAYSCALE) # Read input image as Grayscale
seg = canny_segmentation(gray, 300, 320) # Use high thresholds - for testing
bb_thresh = get_bounding_box(seg, 0.95)
#im_crop = gray[bb_thresh[0][1]:bb_thresh[1][1], bb_thresh[0][0]:bb_thresh[1][0]]
(y0, x0), (y1, x1) = bb_thresh # Store coordinates in intermediate variables in order to avoid confusion.
im_crop = gray[y0:y1, x0:x1]
# Draw green rectangle for testing
bgr_image = cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR)
cv2.rectangle(bgr_image, (x0, y0), (x1, y1), (0, 255, 0), 2)
cv2.imshow('bgr_image', bgr_image)
cv2.imshow('seg', seg)
cv2.waitKey()  # Keep the windows open until a key is pressed
cv2.destroyAllWindows()
small_grayscale_image (input image):
I think that the top left and bottom right coordinates of the bounding box are not correctly calculated in the get_bounding_box function. The problem might lie in the calculation of top_left and bottom_right. The indices for the top left and bottom right coordinates of the bounding box should be calculated based on the min_row, max_row, min_col, max_col values, and not box_size_thresh.
Here's a corrected version of the code:
def get_bounding_box(image,thresh=0.95):
    nonzero_indices = np.nonzero(image)
    min_row, max_row = np.min(nonzero_indices[0]), np.max(nonzero_indices[0])
    min_col, max_col = np.min(nonzero_indices[1]), np.max(nonzero_indices[1])
    top_left = (min_row, min_col)
    bottom_right = (max_row, max_col)
    return top_left, bottom_right
Hope this helped!
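For the "majority of the white pixels" part of the question, one possible reading is to trim a small share of the outermost coordinates along each axis. The sketch below is only that interpretation, not the method of either answer: the name percentile_box, the keep parameter, and the synthetic mask are assumptions, and trimming each axis independently keeps roughly (not exactly) the requested share of pixels.

```python
import numpy as np


def percentile_box(mask, keep=0.95):
    """Return (x0, y0, x1, y1), end-exclusive, spanning the central `keep`
    share of nonzero row and column coordinates (each axis trimmed separately)."""
    rows, cols = np.nonzero(mask)
    tail = (1.0 - keep) / 2 * 100  # percentage cut from each side
    y_lo, y_hi = np.percentile(rows, [tail, 100 - tail])
    x_lo, x_hi = np.percentile(cols, [tail, 100 - tail])
    return (int(np.floor(x_lo)), int(np.floor(y_lo)),
            int(np.ceil(x_hi)) + 1, int(np.ceil(y_hi)) + 1)


# Hypothetical test data: one dense block plus two stray white pixels
mask = np.zeros((100, 100), dtype=np.uint8)
mask[40:60, 30:70] = 1
mask[0, 0] = 1
mask[99, 99] = 1

print(percentile_box(mask))
```

A plain min/max box would cover the whole 100x100 image here because of the two stray pixels, while the trimmed box stays on the dense block.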
It turns out I needed to transpose the image before getting the coordinates; a simple .T did the trick:
nonzero_indices = np.nonzero(image.T)
I'm using a RealSense D455 camera and trying to detect objects and calculate their width. I found some code that does this for the height, but when I try to change it the calculations are wrong. For height it is usually pretty accurate, only slightly overestimating when it is wrong. But with the changed code, an object that is ~40 cm wide is reported as 1 to 1.5 meters.
if score > 0.8 and class_ == 1:  # 1 for human
    left = box[1] * W
    top = box[0] * H
    right = box[3] * W
    bottom = box[2] * H
    width = right - left
    height = bottom - top
    bbox = (int(left), int(top), int(width), int(height))
    heightB = bbox[1] + bbox[3]
    p1 = (int(bbox[0]), int(bbox[1]))
    p2 = (int(bbox[0] + bbox[2]), int(bbox[1] + bbox[3]))
    # draw box
    cv2.rectangle(color_image, p1, p2, (255,0,0), 2, 1)
    # x,y,z of bounding box
    obj_points = verts[int(bbox[1]):int(bbox[1] + bbox[3]), int(bbox[0]):int(bbox[0] + bbox[2])].reshape(-1, 3)
    zs = obj_points[:, 2]
    z = np.median(zs)
    ys = obj_points[:, 0]
    ys = np.delete(ys, np.where(
        (zs < z - 1) | (zs > z + 1)))  # take only y for close z to prevent including background
    my = np.amin(ys, initial=1)
    My = np.amax(ys, initial=-1)
    height = (My - my)  # add next to rectangle print of height using cv library
    height = float("{:.2f}".format(height))
    print("[INFO] object height is: ", height, "[m]")
    height_txt = str(height) + "[m]"
    # Write some Text
    bottomLeftCornerOfText = (p1[0], p1[1] + 20)
    fontScale = 1
    fontColor = (255, 255, 255)
    lineType = 2
    cv2.putText(color_image, height_txt,
                bottomLeftCornerOfText,
                cv2.FONT_HERSHEY_SIMPLEX,  # assumed font; the call is truncated in the source
                fontScale,
                fontColor,
                lineType)
# Show images
cv2.namedWindow('RealSense', cv2.WINDOW_AUTOSIZE)
cv2.imshow('RealSense', color_image)
The object points are split into one array per dimension, so zs = obj_points[:, 2] holds z and ys = obj_points[:, 1] holds y. I thought just changing ys = obj_points[:, 1] to ys = obj_points[:, 0] would calculate the width, but as mentioned it does not work.
ys = np.delete(ys, np.where((zs < z - 1) | (zs > z + 1)))
This is just to take out the outliers so that background values are not taken into account.
This is the part that calculates the height; since the camera will be mounted horizontally, the height difference will be the width.
my = np.amin(ys, initial=1)
My = np.amax(ys, initial=-1)
height = (My - my) # add next to rectangle print of height using cv library
Since the camera is horizontal I can just take the extent along Y. But this does not seem to work when I try the same for X.
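The filtering and extent steps described above can be sketched in isolation on a plain NumPy array. This is a minimal sketch and not a confirmed fix: it assumes each row of the point array is (x, y, z) in meters, as the indexing in the question implies, and the name object_extent, the depth_tolerance parameter, and the sample points are hypothetical.

```python
import numpy as np


def object_extent(points, axis, depth_tolerance=1.0):
    """Extent (max - min) along `axis` of the points whose depth z lies
    within `depth_tolerance` meters of the median depth."""
    zs = points[:, 2]
    z_med = np.median(zs)
    keep = np.abs(zs - z_med) <= depth_tolerance  # drop background points
    values = points[keep, axis]
    if values.size == 0:
        return 0.0
    return float(values.max() - values.min())


# Hypothetical points: three on an object about 2 m away, one background point
points = np.array([
    [-0.2, 0.0, 2.00],
    [0.2, 0.5, 2.00],
    [0.0, 0.1, 2.05],
    [3.0, 1.0, 5.00],
])

print("extent along x:", object_extent(points, axis=0))
print("extent along y:", object_extent(points, axis=1))
```

The sketch uses plain min/max on the filtered values. With np.amin(ys, initial=1) the result can never exceed 1, and with np.amax(ys, initial=-1) it can never fall below -1, so coordinates outside that range would be clipped. Whether that, or a depth tolerance that still lets background points through, explains the wrong widths would need checking against the actual data.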
If it's necessary this is the link to the original GitHub repo: I'm using Example2.
I was wondering, given the type of interpolation that is used for image resizing with cv2.resize, how can I find out exactly where a particular pixel maps to? For example, if I'm increasing the size of an image using linear interpolation and I take the coordinates (785, 251) of a particular pixel, how can I find out exactly which coordinates that pixel maps to in the resized version, regardless of whether the aspect ratio changes between the source image and the resized image? I've looked over the internet for a solution, but all solutions seem to be indirect methods of finding out where a pixel maps that don't actually work for different aspect ratios:
After resizing an image with cv2, how to get the new bounding box coordinate
Is there a way in cv2 to access how pixels are mapped, so that the new coordinates can be found by reversing that mapping?
The reason why I would like this is that I want to be able to create bounding boxes that give me back the same information regardless of the change in aspect ratio of a given image. Every method I've used so far doesn't give me back the same information. I figure that if I can figure out where the particular pixel coordinates of x,y top left and bottom right maps I can recreate an accurate bounding box regardless of aspect ratio changes.
Scaling the coordinates works when the center coordinate is (0, 0).
You may compute x_scaled and y_scaled as follows:
Subtract x_original_center and y_original_center from x_original and y_original.
After subtraction, (0, 0) is the "new center".
Scale the "zero centered" coordinates by scale_x and scale_y.
Convert the "scaled zero centered" coordinates to "top left (0, 0)" by adding x_scaled_center and y_scaled_center.
Computing the center accurately:
The Python convention is:
(0, 0) is the top left, and (cols-1, rows-1) is the bottom right coordinate.
The accurate center coordinate is:
x_original_center = (original_cols-1)/2
y_original_center = (original_rows-1)/2
Python code (assume img is the original image):
rows, cols = img.shape[0:2]
resized_img = cv2.resize(img, (int(cols*scale_x), int(rows*scale_y)))
resized_rows, resized_cols = resized_img.shape[0:2]
x_original_center = (cols-1) / 2
y_original_center = (rows-1) / 2
x_scaled_center = (resized_cols-1) / 2
y_scaled_center = (resized_rows-1) / 2
# Subtract the center, scale, and add the "scaled center".
x_scaled = (x_original - x_original_center)*scale_x + x_scaled_center
y_scaled = (y_original - y_original_center)*scale_y + y_scaled_center
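The same computation can be wrapped in a small helper that needs only the two image sizes. This is a sketch under assumptions: the name map_point and the (width, height) argument order are illustrative, and the scale factors are derived from the actual sizes so that integer truncation of the output size is accounted for.

```python
def map_point(x, y, src_size, dst_size):
    """Map pixel-center coordinates (x, y) from a source image to a resized one.

    src_size and dst_size are (width, height) tuples.
    """
    src_w, src_h = src_size
    dst_w, dst_h = dst_size
    scale_x = dst_w / src_w
    scale_y = dst_h / src_h
    # Subtract the source center, scale, then add the destination center.
    x_scaled = (x - (src_w - 1) / 2) * scale_x + (dst_w - 1) / 2
    y_scaled = (y - (src_h - 1) / 2) * scale_y + (dst_h - 1) / 2
    return x_scaled, y_scaled


# 320x256 image resized to 800x512 (scale_x = 2.5, scale_y = 2)
print(map_point(63.5, 63.5, (320, 256), (800, 512)))  # (159.5, 127.5)
```

A bounding box can be mapped by applying the helper to its top-left and bottom-right corners separately. Algebraically the formula reduces to (x + 0.5) * scale_x - 0.5, and likewise for y.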
The following code sample draws crosses at a few original and scaled coordinates:
import cv2

def draw_cross(im, x, y, use_color=False):
    """ Draw a cross with center (x,y) - cross is two rows and two columns """
    x = int(round(x - 0.5))
    y = int(round(y - 0.5))
    if use_color:
        im[y-4:y+6, x] = [0, 0, 255]
        im[y-4:y+6, x+1] = [255, 0, 0]
        im[y, x-4:x+6] = [0, 0, 255]
        im[y+1, x-4:x+6] = [255, 0, 0]
    else:
        im[y-4:y+6, x] = 0
        im[y-4:y+6, x+1] = 255
        im[y, x-4:x+6] = 0
        im[y+1, x-4:x+6] = 255

img = cv2.imread('graf.png')
rows, cols = img.shape[0:2] # cols = 320, rows = 256
# 3 points for testing:
x0_original, y0_original = cols//2-0.5, rows//2-0.5 # 159.5, 127.5
x1_original, y1_original = cols//5-0.5, rows//4-0.5 # 63.5, 63.5
x2_original, y2_original = (cols//5)*3+20-0.5, (rows//4)*3+30-0.5 # 211.5, 221.5
draw_cross(img, x0_original, y0_original) # Center of cross (159.5, 127.5)
draw_cross(img, x1_original, y1_original)
draw_cross(img, x2_original, y2_original)
scale_x = 2.5
scale_y = 2
resized_img = cv2.resize(img, [int(cols*scale_x), int(rows*scale_y)], interpolation=cv2.INTER_NEAREST)
resized_rows, resized_cols = resized_img.shape[0:2] # cols = 800, rows = 512
# Compute center column and center row
x_original_center = (cols-1) / 2 # 159.5
y_original_center = (rows-1) / 2 # 127.5
# Compute center of resized image
x_scaled_center = (resized_cols-1) / 2 # 399.5
y_scaled_center = (resized_rows-1) / 2 # 255.5
# Compute the destination coordinates after resize
x0_scaled = (x0_original - x_original_center)*scale_x + x_scaled_center # 399.5
y0_scaled = (y0_original - y_original_center)*scale_y + y_scaled_center # 255.5
x1_scaled = (x1_original - x_original_center)*scale_x + x_scaled_center # 159.5
y1_scaled = (y1_original - y_original_center)*scale_y + y_scaled_center # 127.5
x2_scaled = (x2_original - x_original_center)*scale_x + x_scaled_center # 529.5
y2_scaled = (y2_original - y_original_center)*scale_y + y_scaled_center # 443.5
# Draw crosses on resized image
draw_cross(resized_img, x0_scaled, y0_scaled, True)
draw_cross(resized_img, x1_scaled, y1_scaled, True)
draw_cross(resized_img, x2_scaled, y2_scaled, True)
cv2.imshow('img', img)
cv2.imshow('resized_img', resized_img)
cv2.waitKey()  # Keep the windows open until a key is pressed
cv2.destroyAllWindows()
Original image:
Resized image:
Making sure the crosses are aligned:
In my answer I was using the naming conventions of Miki's comment.
I am pretty new to Python and want to do the following: I want to divide the following image into 8 pie segments:
I want it to look something like this (I made this in PowerPoint):
The background should be black, and the edge of the figure should have a unique color, as should each pie segment.
EDIT: I have written code that divides the whole image into 8 segments:
from PIL import Image, ImageDraw
im = Image.open('C:/Users/20191881/Documents/OGO Beeldanalyse/Python/asymmetrie/rotation.png')
fill = 255
draw = ImageDraw.Draw(im)
draw.line((0,0) + im.size, fill)
draw.line((0, im.size[1], im.size[0], 0), fill)
draw.line((0.5*im.size[0],0, 0.5*im.size[0], im.size[1]), fill)
draw.line((0, 0.5*im.size[1], im.size[0], 0.5*im.size[1]), fill)
del draw
The output gives:
The only thing left to do is to find a way to give each black segment inside the border a unique color, and also to give each white edge segment a unique color.
Your code divides the image into eight parts, which is correct, but with respect to the image center you don't get eight pie segments of equal angle like the ones in your sketch.
Here would be my solution, only using Pillow and the math module:
import math
from PIL import Image, ImageDraw

def segment_color(i_color, n_colors):
    r = int((192 - 64) / (n_colors - 1) * i_color + 64)
    g = int((224 - 128) / (n_colors - 1) * i_color + 128)
    b = 255
    return (r, g, b)

# Load image; generate ImageDraw
im = Image.open('path_to/vgdrD.png').convert('RGB')
draw = ImageDraw.Draw(im)

# Number of pie segments (must be an even number)
n = 8

# Replace (all-white) edge with defined edge color
edge_color = (255, 128, 0)
pixels = im.load()
for y in range(im.height):
    for x in range(im.width):
        if pixels[x, y] == (255, 255, 255):
            pixels[x, y] = edge_color

# Draw lines with defined line color
line_color = (0, 255, 0)
d = min(im.width, im.height) - 10
center = (int(im.width/2), int(im.height/2))
for i in range(int(n/2)):
    angle = 360 / n * i
    x1 = math.cos(angle/180*math.pi) * d/2 + center[0]
    y1 = math.sin(angle/180*math.pi) * d/2 + center[1]
    x2 = math.cos((180+angle)/180*math.pi) * d/2 + center[0]
    y2 = math.sin((180+angle)/180*math.pi) * d/2 + center[1]
    draw.line([(x1, y1), (x2, y2)], line_color)

# Fill pie segments with defined segment colors
for i in range(n):
    angle = 360 / n * i + 360 / n / 2
    x = math.cos(angle/180*math.pi) * 20 + center[0]
    y = math.sin(angle/180*math.pi) * 20 + center[1]
    # Seed point is cast to int, since pixel access expects integer coordinates
    ImageDraw.floodfill(im, (int(x), int(y)), segment_color(i, n))

# Save the result (the str(n) file name prefix is an assumption)
im.save(str(n) + '_pie.png')
For n = 8 pie segments, the following result is produced:
The first step is to replace all white pixels in the original image with the desired edge color. Of course, the assumption here is, that there are no other (white) pixels in the image. Also, this might be better done using NumPy and vectorized code, but I wanted to keep the solution Pillow-only.
Next step is to draw the (green) lines. Here, I calculate the proper coordinates of the lines' start and end using sin and cos.
The last step is to flood fill the pie segments' area, cf. ImageDraw.floodfill. Therefore, I calculate the seed points the same way as before, but add an angular shift to hit a point exactly within the pie segment.
As you can see, n is variable in my solution (n must be even):
Of course, there are limitations regarding the angular resolution, mostly due to the small image.
Hope that helps!
EDIT: Here's a modified version to also allow for individually colored edges.
import math
from PIL import Image, ImageDraw

def segment_color(i_color, n_colors):
    r = int((192 - 64) / (n_colors - 1) * i_color + 64)
    g = int((224 - 128) / (n_colors - 1) * i_color + 128)
    b = 255
    return (r, g, b)

def edge_color(i_color, n_colors):
    r = 255
    g = 255 - int((224 - 32) / (n_colors - 1) * i_color + 32)
    b = 255 - int((192 - 16) / (n_colors - 1) * i_color + 16)
    return (r, g, b)

# Load image; generate ImageDraw
im = Image.open('images/vgdrD.png').convert('RGB')
draw = ImageDraw.Draw(im)
center = (int(im.width/2), int(im.height/2))

# Number of pie segments (must be an even number)
n = 8

# Replace (all-white) edge with defined edge color
max_len = im.width + im.height
im_pix = im.load()
for i in range(n):
    mask = Image.new('L', im.size, 0)
    mask_draw = ImageDraw.Draw(mask)
    angle = 360 / n * i
    x1 = math.cos(angle/180*math.pi) * max_len + center[0]
    y1 = math.sin(angle/180*math.pi) * max_len + center[1]
    angle = 360 / n * (i+1)
    x2 = math.cos(angle/180*math.pi) * max_len + center[0]
    y2 = math.sin(angle/180*math.pi) * max_len + center[1]
    mask_draw.polygon([center, (x1, y1), (x2, y2)], 255)
    mask_pix = mask.load()
    for y in range(im.height):
        for x in range(im.width):
            if im_pix[x, y] == (255, 255, 255) and mask_pix[x, y] == 255:
                im_pix[x, y] = edge_color(i, n)

# Draw lines with defined line color
line_color = (0, 255, 0)
d = min(im.width, im.height) - 10
for i in range(int(n/2)):
    angle = 360 / n * i
    x1 = math.cos(angle/180*math.pi) * d/2 + center[0]
    y1 = math.sin(angle/180*math.pi) * d/2 + center[1]
    x2 = math.cos((180+angle)/180*math.pi) * d/2 + center[0]
    y2 = math.sin((180+angle)/180*math.pi) * d/2 + center[1]
    draw.line([(x1, y1), (x2, y2)], line_color)

# Fill pie segments with defined segment colors
for i in range(n):
    angle = 360 / n * i + 360 / n / 2
    x = math.cos(angle/180*math.pi) * 20 + center[0]
    y = math.sin(angle/180*math.pi) * 20 + center[1]
    # Seed point is cast to int, since pixel access expects integer coordinates
    ImageDraw.floodfill(im, (int(x), int(y)), segment_color(i, n))

# Save the result (the str(n) file name prefix is an assumption)
im.save(str(n) + '_pie.png')
Binary masks for each pie segment are created, and all white pixels only within that binary mask are replaced with a defined edge color.
Using NumPy still seems favorable, but I was curious to do that in Pillow only.
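For comparison, here is a minimal sketch of how the per-segment edge recoloring might look with NumPy. It is not the code of this answer: the function names, the use of the geometric image center, and the tiny all-white test image are assumptions. The segment of each pixel is derived from its angle around the center with np.arctan2 instead of polygon masks.

```python
import numpy as np


def segment_index_map(height, width, n):
    """Return an (height, width) int array holding the pie-segment index
    (0..n-1) of every pixel, measured around the image center."""
    cy, cx = (height - 1) / 2, (width - 1) / 2
    ys, xs = np.mgrid[0:height, 0:width]
    angles = np.arctan2(ys - cy, xs - cx) % (2 * np.pi)  # range 0..2*pi
    seg = (angles / (2 * np.pi / n)).astype(int)
    return np.minimum(seg, n - 1)  # guard against rounding up to n


def color_edges(rgb, n, colors):
    """Recolor pure-white pixels of an (H, W, 3) uint8 array by pie segment."""
    seg = segment_index_map(rgb.shape[0], rgb.shape[1], n)
    white = np.all(rgb == 255, axis=-1)
    out = rgb.copy()
    for i in range(n):
        out[white & (seg == i)] = colors[i]
    return out


# Hypothetical test image: 8x8, entirely white, split into 4 segments
rgb = np.full((8, 8, 3), 255, dtype=np.uint8)
colors = [(10, 0, 0), (20, 0, 0), (30, 0, 0), (40, 0, 0)]
out = color_edges(rgb, 4, colors)
print(out[7, 7], out[7, 0], out[0, 0], out[0, 7])
```

A Pillow image can be converted with np.array(im) and back with Image.fromarray(out). Angles increase clockwise on screen because the y axis points down, which matches the sin/cos construction used for the lines.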
How can I crop an image in the center? Because I know that the box is a 4-tuple defining the left, upper, right, and lower pixel coordinate but I don't know how to get these coordinates so it crops in the center.
Assuming you know the size you would like to crop to (new_width X new_height):
from PIL import Image
im = Image.open(<your image>)
width, height = im.size # Get dimensions
left = (width - new_width)/2
top = (height - new_height)/2
right = (width + new_width)/2
bottom = (height + new_height)/2
# Crop the center of the image
im = im.crop((left, top, right, bottom))
This will break if you attempt to crop a small image larger, but I'm going to assume you won't be trying that (Or that you can catch that case and not crop the image).
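One way to catch that case is to clamp the requested size to the image size before computing the box. The following is only a sketch: the helper name center_box is hypothetical, and integer division is used so that the box always has whole-pixel coordinates.

```python
def center_box(width, height, new_width, new_height):
    """Return an integer (left, top, right, bottom) box centered in the image.

    The requested size is clamped to the image size, so the box never
    extends past the image borders.
    """
    new_width = min(new_width, width)
    new_height = min(new_height, height)
    left = (width - new_width) // 2
    top = (height - new_height) // 2
    return left, top, left + new_width, top + new_height


print(center_box(180, 180, 180, 101))  # (0, 39, 180, 140)
print(center_box(100, 50, 200, 200))   # (0, 0, 100, 50)
```

With Pillow the result could be passed straight to crop, for example im.crop(center_box(*im.size, new_width, new_height)).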
One potential problem with the proposed solution is in the case there is an odd difference between the desired size, and old size. You can't have a half pixel on each side. One has to choose a side to put an extra pixel on.
If there is an odd difference in the horizontal direction, the code below puts the extra pixel on the right, and if there is an odd difference in the vertical direction, the extra pixel goes to the bottom.
import numpy as np

def center_crop(img, new_width=None, new_height=None):
    width = img.shape[1]
    height = img.shape[0]
    if new_width is None:
        new_width = min(width, height)
    if new_height is None:
        new_height = min(width, height)
    left = int(np.ceil((width - new_width) / 2))
    right = width - int(np.floor((width - new_width) / 2))
    top = int(np.ceil((height - new_height) / 2))
    bottom = height - int(np.floor((height - new_height) / 2))
    if len(img.shape) == 2:
        center_cropped_img = img[top:bottom, left:right]
    else:
        center_cropped_img = img[top:bottom, left:right, ...]
    return center_cropped_img
I feel like the simplest solution that is most suitable for most applications is still missing. The accepted answer has an issue with uneven pixels and especially for ML algorithms, the pixel count of the cropped image is paramount.
In the following example, I would like to crop an image to 224x100 from the center. I do not care if the crop is shifted to the left or right by 0.5 pixels, as long as the output picture always has the defined dimensions. It avoids the reliance on math.*.
from PIL import Image
import matplotlib.pyplot as plt
im = Image.open("test.jpg")
left = int(im.size[0]/2-224/2)
upper = int(im.size[1]/2-100/2)
right = left + 224
lower = upper + 100
im_cropped = im.crop((left, upper, right, lower))
The output is before cropping (not shown in code):
The tuples show the dimensions.
This is the function I was looking for:
from PIL import Image
im = Image.open("test.jpg")
crop_rectangle = (50, 50, 200, 200)
cropped_im = im.crop(crop_rectangle)
Taken from another answer
I originally used the accepted answer:
from PIL import Image
im = Image.open(<your image>)
width, height = im.size # Get dimensions
left = (width - new_width)/2
top = (height - new_height)/2
right = (width + new_width)/2
bottom = (height + new_height)/2
# Crop the center of the image
im = im.crop((left, top, right, bottom))
But I ran into the problem mentioned by Dean Pospisil:
One potential problem with the proposed solution is in the case there
is an odd difference between the desired size, and old size. You can't
have a half pixel on each side. One has to choose a side to put an
extra pixel on.
Dean Pospisil's solution works, I also came up with my own calculation to fix this:
from PIL import Image
im = Image.open(<your image>)
width, height = im.size # Get dimensions
left = round((width - new_width)/2)
top = round((height - new_height)/2)
x_right = round(width - new_width) - left
x_bottom = round(height - new_height) - top
right = width - x_right
bottom = height - x_bottom
# Crop the center of the image
im = im.crop((left, top, right, bottom))
With the accepted answer, an image of 180px x 180px to be cropped to 180px x 101px will result in a cropped image to 180px x 102px.
With my calculation, it will be correctly cropped to 180px x 101px
Crop center and around:
def im_crop_around(img, xc, yc, w, h):
    img_width, img_height = img.size  # Get dimensions
    left, right = xc - w / 2, xc + w / 2
    top, bottom = yc - h / 2, yc + h / 2
    left, top = round(max(0, left)), round(max(0, top))
    right, bottom = round(min(img_width - 0, right)), round(min(img_height - 0, bottom))
    return img.crop((left, top, right, bottom))

def im_crop_center(img, w, h):
    img_width, img_height = img.size
    left, right = (img_width - w) / 2, (img_width + w) / 2
    top, bottom = (img_height - h) / 2, (img_height + h) / 2
    left, top = round(max(0, left)), round(max(0, top))
    right, bottom = round(min(img_width - 0, right)), round(min(img_height - 0, bottom))
    return img.crop((left, top, right, bottom))
Maybe I am late to this party, but at least I am here.
I want to center crop the image, converting a 9:16 portrait image to a 16:9 landscape one.
This is the algorithm I used:
Divide the image height into 4 equal parts
Discard part 1 and part 4
Set left to 0 and right to the width of the image
Code:
from PIL import Image
im = Image.open('main.jpg')
width, height = im.size
if height > width:
    h2 = height/2
    h4 = h2/2
    border = (0, h4, width, h4*3)
    cropped_img = im.crop(border)
    cropped_img.save("test.jpg")
before :
I hope this helps
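Keeping the two middle quarters of a 9:16 image leaves a 9:8 region rather than 16:9. If an exact target aspect ratio is needed, the crop box could be computed from the ratio itself. This is a sketch with a hypothetical helper name, aspect_crop_box, using integer arithmetic only:

```python
def aspect_crop_box(width, height, aspect_w=16, aspect_h=9):
    """Return the largest centered (left, top, right, bottom) box with
    aspect ratio aspect_w:aspect_h that fits inside a width x height image."""
    if width * aspect_h > height * aspect_w:
        # Image is too wide for the target ratio: trim left and right
        new_w, new_h = height * aspect_w // aspect_h, height
    else:
        # Image is too tall (or already matches): trim top and bottom
        new_w, new_h = width, width * aspect_h // aspect_w
    left = (width - new_w) // 2
    top = (height - new_h) // 2
    return left, top, left + new_w, top + new_h


print(aspect_crop_box(1080, 1920))  # (0, 656, 1080, 1263)
```

With Pillow this could be used as im.crop(aspect_crop_box(*im.size)).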
You could use Torchvision's CenterCrop transformation for this. Here's an example:
from PIL import Image
from torchvision.transforms import functional as F
crop_size = 256 # can be either an integer or a tuple of ints for (height, width) separately
img = Image.open(<path_to_your_image>)
cropped_img = F.center_crop(img, crop_size)
F.center_crop works with torch.Tensors or PIL.Images and retains the data type i.e. when input is a PIL.Image then output is also a (cropped) PIL.Image. An added bonus is that the above transformation would automatically apply padding in case the input image size is smaller than the requested crop size.