Segmenting Image In Fixed Blocks - python

It's not about cropping an image in OpenCV; I know how to do that, for example: Image[200:400, 100:300] # crops rows (y) 200 to 400 and columns (x) 100 to 300. What I am trying to do is split the image into multiple segments, none of which exceeds the image's width or height.
To be precise: if an image is 720x640 and I need to split it into fixed blocks of, say, 100x100, how do I achieve this in OpenCV using Python?

import cv2
def segmentize(image_path, segment_width=200, segment_height=50):
    # Cropping formula ==> img[y:y_end, x:x_end]
    idx, x_axis, x_width = 1, 0, segment_width
    y_axis, y_height = 0, segment_height
    img = cv2.imread(image_path)
    height, width, depth = img.shape
    # use < rather than <=: at x_axis == width (or y_axis == height) the crop
    # would be empty and cv2.imwrite fails on an empty image
    while y_axis < height:
        while x_axis < width:
            crop = img[y_axis:y_height, x_axis:x_width]
            x_axis = x_width
            x_width += segment_width
            # the crop/ directory must already exist; cv2.imwrite does not create it
            cropped_image_path = "crop/crop%d.png" % idx
            cv2.imwrite(cropped_image_path, crop)
            idx += 1
        y_axis += segment_height
        y_height += segment_height
        x_axis, x_width = 0, segment_width
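A minimal NumPy-slicing sketch of the same idea, assuming the image is already loaded as an array (for example with cv2.imread) and that the smaller blocks on the right and bottom edges should be kept; the function name split_into_blocks is hypothetical. Because NumPy slices are clipped at the array border, no explicit bounds checks are needed:

```python
import numpy as np


def split_into_blocks(img, block_w=100, block_h=100):
    """Split an H x W (x C) array into fixed-size blocks.

    Returns a list of (x, y, block) tuples in row-major order. Blocks on the
    right and bottom edges are smaller when the image size is not a multiple
    of the block size; slicing never reads past the image border.
    """
    height, width = img.shape[:2]
    blocks = []
    for y in range(0, height, block_h):
        for x in range(0, width, block_w):
            # slice ends beyond the border are clipped by NumPy automatically
            blocks.append((x, y, img[y:y + block_h, x:x + block_w]))
    return blocks
```

For a 720x640 image (width 720, height 640) and 100x100 blocks this gives 8 x 7 = 56 blocks, where the last column is 20 px wide and the last row is 40 px tall. Each block can then be written with cv2.imwrite, as in the question's loop.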

Related

Resizing images without changing annotations

I have a few thousand annotated images of different sizes. I would like to resize them all to the same size. However, this would invalidate the bounding box coordinates of the annotated objects. Is there a way to resize the images while keeping the coordinates valid?
Please see the question "Resizing image and its bounding box".
The snippet from that question:
import cv2
import numpy as np
def drawBox(boxes, image):
    for i in range(0, len(boxes)):
        # changed color and width to make it visible
        cv2.rectangle(image, (boxes[i][2], boxes[i][3]), (boxes[i][4], boxes[i][5]), (255, 0, 0), 1)
    cv2.imshow("img", image)
    cv2.waitKey(0)
    cv2.destroyAllWindows()
def cvTest():
    # imageToPredict = cv2.imread("img.jpg", 3)
    imageToPredict = cv2.imread("49466033\\img.png", 3)
    print(imageToPredict.shape)
    # Note: flipped compared to your original code!
    # x_ = imageToPredict.shape[0]
    # y_ = imageToPredict.shape[1]
    # shape is (rows, cols, channels), so the height comes first
    y_ = imageToPredict.shape[0]
    x_ = imageToPredict.shape[1]
    targetSize = 416
    x_scale = targetSize / x_
    y_scale = targetSize / y_
    print(x_scale, y_scale)
    img = cv2.resize(imageToPredict, (targetSize, targetSize))
    print(img.shape)
    img = np.array(img)
    # original frame as named values
    (origLeft, origTop, origRight, origBottom) = (160, 35, 555, 470)
    # scale x coordinates by the width ratio and y coordinates by the height ratio
    x = int(np.round(origLeft * x_scale))
    y = int(np.round(origTop * y_scale))
    xmax = int(np.round(origRight * x_scale))
    ymax = int(np.round(origBottom * y_scale))
    # Box.drawBox([[1, 0, x, y, xmax, ymax]], img)
    drawBox([[1, 0, x, y, xmax, ymax]], img)
cvTest()
[Images: the original image, and the resized image with the rescaled box]
Hope this solves your question.
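The essential step in the snippet is that x coordinates are scaled by the width ratio and y coordinates by the height ratio. A minimal, library-free sketch of just that step, assuming boxes are stored as (left, top, right, bottom) pixel coordinates (rescale_box is a hypothetical helper name; convert your annotation format to this first if it differs):

```python
def rescale_box(box, orig_size, target_size):
    """Map a (left, top, right, bottom) box from one image size to another.

    orig_size and target_size are (width, height). x coordinates are scaled by
    the width ratio and y coordinates by the height ratio, so this also works
    when the aspect ratio changes (e.g. resizing everything to 416 x 416).
    """
    orig_w, orig_h = orig_size
    target_w, target_h = target_size
    x_scale = target_w / orig_w
    y_scale = target_h / orig_h
    left, top, right, bottom = box
    return (
        int(round(left * x_scale)),
        int(round(top * y_scale)),
        int(round(right * x_scale)),
        int(round(bottom * y_scale)),
    )
```

Applying the same function to every box of every image, with that image's own original size, keeps all annotations consistent with the resized images.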

Crop entire image with the same cropping size with PIL in python

I have a problem with my logic in PIL (Python). My goal is to crop one image entirely into 64x64 tiles, from the top-left corner to the bottom-right corner. I can do a single crop, but when I try to crop the whole image in a loop, I get stuck on the loop logic.
In the first iteration I can crop ((0, 0, 64, 64)). But I cannot figure out how to loop so that I get the next 64x64 tiles to the right and below with PIL, since the first 2-tuple is the origin point and the next tuple is the cropping size.
Any help will be really appreciated, as I am just starting to learn Python.
import os
from PIL import Image
savedir = "E:/Cropped/OK"
filename = "E:/Cropped/dog.jpg"
img = Image.open(filename)
width, height = img.size
start_pos = start_x, start_y = (0, 0)
cropped_image_size = w, h = (64, 64)
frame_num = 1
for col_i in range(width):
    for row_i in range(height):
        x = start_x + col_i*w
        y = start_y + row_i*h
        crop = img.crop((x, y, x+w*row_i, y+h*col_i))
        save_to = os.path.join(savedir, "counter_{:03}.jpg")
        crop.save(save_to.format(frame_num))
        frame_num += 1
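Note that crop() takes a single box (left, upper, right, lower), not an origin plus a size, so the right and lower edges are simply x + w and y + h. You can use the range() function to do the stepping for you (in steps of 64 in this case), so that the cropping only involves simple expressions: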
import os
from PIL import Image
savedir = "E:/Cropped/OK"
filename = "E:/Cropped/dog.jpg"
img = Image.open(filename)
width, height = img.size
start_pos = start_x, start_y = (0, 0)
cropped_image_size = w, h = (64, 64)
frame_num = 1
for col_i in range(0, width, w):
    for row_i in range(0, height, h):
        crop = img.crop((col_i, row_i, col_i + w, row_i + h))
        save_to = os.path.join(savedir, "counter_{:03}.jpg")
        crop.save(save_to.format(frame_num))
        frame_num += 1
Other than that, your code works as expected.
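The loop above visits tiles column by column, and when the image size is not a multiple of 64 the last boxes extend past the image border. A small sketch that generates the crop boxes row by row and can optionally skip partial tiles, assuming only full 64x64 tiles are wanted by default (tile_boxes and drop_partial are hypothetical names):

```python
def tile_boxes(width, height, tile_w=64, tile_h=64, drop_partial=True):
    """Yield PIL-style crop boxes (left, upper, right, lower) covering an image.

    Tiles are produced row by row, left to right. With drop_partial=True,
    tiles that would extend past the right or bottom edge are skipped.
    """
    for top in range(0, height, tile_h):
        for left in range(0, width, tile_w):
            right, lower = left + tile_w, top + tile_h
            if drop_partial and (right > width or lower > height):
                continue
            yield (left, top, right, lower)
```

Usage with the question's variables would be: for frame_num, box in enumerate(tile_boxes(*img.size), start=1): img.crop(box).save(os.path.join(savedir, "counter_{:03}.jpg".format(frame_num))).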

How to resize text for cv2.putText according to the image size in OpenCV, Python?

fontScale = 1
fontThickness = 1
# make sure font thickness is an integer, if not, the OpenCV functions that use this may crash
fontThickness = int(fontThickness)
upperLeftTextOriginX = int(imageWidth * 0.05)
upperLeftTextOriginY = int(imageHeight * 0.05)
textSize, baseline = cv2.getTextSize(resultText, fontFace, fontScale, fontThickness)
textSizeWidth, textSizeHeight = textSize
# calculate the lower left origin of the text area from the upper left origin and the text height
lowerLeftTextOriginX = upperLeftTextOriginX
lowerLeftTextOriginY = upperLeftTextOriginY + textSizeHeight
# write the text on the image
cv2.putText(openCVImage, resultText, (lowerLeftTextOriginX, lowerLeftTextOriginY), fontFace, fontScale, Color,
            fontThickness)
It seems fontScale does not scale the text according to the image width and height, because the text is almost the same size in images of different sizes. So how can I resize the text according to the image size so that all of it fits in the image?
Here is a solution that fits the text inside your rectangle. If your rectangles are of variable width, you can get the font scale by looping through candidate scales and measuring how much width (in pixels) your text would take. Once it drops below your rectangle's width, you can return that scale and use it in putText:
import cv2 as cv
def get_optimal_font_scale(text, width):
    # try scales 5.9, 5.8, ..., 0.0 and return the largest one whose text fits
    for scale in reversed(range(0, 60, 1)):
        textSize = cv.getTextSize(text, fontFace=cv.FONT_HERSHEY_DUPLEX, fontScale=scale/10, thickness=1)
        new_width = textSize[0][0]
        if (new_width <= width):
            print(new_width)
            return scale/10
    return 1
For me, this worked:
scale = 1 # this value can be from 0 to 1 (0,1] to change the size of the text relative to the image
fontScale = min(imageWidth,imageHeight)/(25/scale)
Just keep in mind that the font face can affect the constant 25.
Approach
One way to approach this is to scale the font size proportionally to the size of the image. In my experience, more natural results are obtained when applying this not only to fontScale, but also to thickness. For example:
import math
import cv2
FONT_SCALE = 2e-3 # Adjust for larger font size in all images
THICKNESS_SCALE = 1e-3 # Adjust for larger thickness in all images
img = cv2.imread("...")
height, width, _ = img.shape
font_scale = min(width, height) * FONT_SCALE
thickness = math.ceil(min(width, height) * THICKNESS_SCALE)
Example
Let's take this free-to-use stock photo as an example. We create two versions of the base image by rescaling to a width of 2000px and 600px (keeping the aspect ratio constant). With the approach above, text looks appropriately sized to the image size in both cases (here shown in an illustrative use case where we label bounding boxes):
[Images: the 2000px-wide and 600px-wide versions with labeled bounding boxes]
Full code to reproduce (but note: input images have to be preprocessed):
import math
import cv2
FONT_SCALE = 2e-3  # Adjust for larger font size in all images
THICKNESS_SCALE = 1e-3  # Adjust for larger thickness in all images
TEXT_Y_OFFSET_SCALE = 1e-2  # Adjust for larger Y-offset of text and bounding box
img_width_to_bboxes = {
    2000: [
        {"xywh": [120, 400, 1200, 510], "label": "car"},
        {"xywh": [1080, 420, 790, 340], "label": "car"},
    ],
    600: [
        {"xywh": [35, 120, 360, 155], "label": "car"},
        {"xywh": [325, 130, 235, 95], "label": "car"},
    ],
}
def add_bbox_and_text() -> None:
    for img_width, bboxes in img_width_to_bboxes.items():
        # Base image from https://www.pexels.com/photo/black-suv-beside-grey-auv-crossing-the-pedestrian-line-during-daytime-125514/
        # Two rescaled versions of the base image created with width of 600px and 2000px
        img = cv2.imread(f"pexels-kaique-rocha-125514_{img_width}.jpg")
        height, width, _ = img.shape
        for bbox in bboxes:
            x, y, w, h = bbox["xywh"]
            cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
            cv2.putText(
                img,
                bbox["label"],
                (x, y - int(height * TEXT_Y_OFFSET_SCALE)),
                fontFace=cv2.FONT_HERSHEY_TRIPLEX,
                fontScale=min(width, height) * FONT_SCALE,
                thickness=math.ceil(min(width, height) * THICKNESS_SCALE),
                color=(0, 255, 0),
            )
        cv2.imwrite(f"pexels-kaique-rocha-125514_{img_width}_with_text.jpg", img)
if __name__ == "__main__":
    add_bbox_and_text()
If you take fontScale = 1 for images with size approximately 1000 x 1000, then this code should scale your font correctly.
fontScale = (imageWidth * imageHeight) / (1000 * 1000) # Would work best for almost square images
If you still have any problems, leave a comment.
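One caveat on the area-based formula: fontScale multiplies the font's base size, so glyph width and height grow linearly with it, while (imageWidth * imageHeight) / (1000 * 1000) grows with the square of the image side (a 2000 x 2000 image gets fontScale 4 rather than 2). A minimal sketch that keeps the same 1000 x 1000 reference but scales linearly, assuming that reference suits your font (font_scale_for, ref_size and ref_scale are hypothetical names):

```python
import math


def font_scale_for(image_width, image_height, ref_size=1000, ref_scale=1.0):
    """Return a fontScale that grows linearly with the image's side length.

    Uses the geometric mean of width and height, so for a square image this
    is simply side / ref_size * ref_scale.
    """
    return ref_scale * math.sqrt(image_width * image_height) / ref_size
```

For non-square images the geometric mean sits between the two sides; using min(image_width, image_height) instead, as in the earlier answers, is an equally reasonable choice if the text must fit the shorter side.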
I implemented a function to find the best-fitting centered location for text.
See if this code helps you.
import cv2
import numpy as np
def findFontLocate(s_txt, font_face, font_thick, cv_bgd):
    best_scale = 1.0
    bgd_w = cv_bgd.shape[1]
    bgd_h = cv_bgd.shape[0]
    txt_rect_w = 0
    txt_rect_h = 0
    baseline = 0
    # grow the scale in steps of 0.2 until the text (plus a thickness margin) no longer fits
    for scale in np.arange(1.0, 6.0, 0.2):
        (ret_w, ret_h), tmp_bsl = cv2.getTextSize(
            s_txt, font_face, scale, font_thick)
        tmp_w = ret_w + 2 * font_thick
        tmp_h = ret_h + 2 * font_thick + tmp_bsl
        if tmp_w >= bgd_w or tmp_h >= bgd_h:
            break
        else:
            baseline = tmp_bsl
            txt_rect_w = tmp_w
            txt_rect_h = tmp_h
            best_scale = scale
    # center the text rectangle in the background image
    lt_x, lt_y = round(bgd_w/2-txt_rect_w/2), round(bgd_h/2-txt_rect_h/2)
    rb_x, rb_y = round(bgd_w/2+txt_rect_w/2), round(bgd_h/2+txt_rect_h/2)-baseline
    return (lt_x, lt_y, rb_x, rb_y), best_scale, baseline
Note that the function accepts four arguments: s_txt (the string to render), font_face, font_thick, and cv_bgd (the background image as an ndarray).
When calling putText(), use the returned values as follows:
(lt_x, lt_y, rb_x, rb_y), best_scale, baseline = findFontLocate(s_txt, font_face, font_thick, cv_bgd)
cv2.putText(
    cv_bgd, s_txt, (lt_x, rb_y), font_face,
    best_scale, (0, 0, 0), font_thick, cv2.LINE_AA)
You can use the get_optimal_font_scale function below to adjust the font size according to the image size:
import cv2
def get_optimal_font_scale(text, width):
    for scale in reversed(range(0, 60, 1)):
        textSize = cv2.getTextSize(text, fontFace=cv2.FONT_HERSHEY_DUPLEX, fontScale=scale/10, thickness=1)
        new_width = textSize[0][0]
        if (new_width <= width):
            return scale/10
    return 1
text_width = 3*(img.shape[1]//6)  # target text width in pixels (about half the image width)
font_size = get_optimal_font_scale(text, text_width)
cv2.putText(img, text, org, font, font_size, color, thickness, cv2.LINE_AA)
You can change text_width to suit your image.
It works for me.
double calc_scale_rectbox(const char *txt, int box_width, int box_height,
                          cv::Size &textSize, int &baseline)
{
    if (!txt) return 1.0;
    double scale = 2.0;
    // try scales 2.0, 1.9, ..., 0.1 and stop at the first one where the text
    // takes at most 50% of the box width and height; textSize and baseline
    // then describe exactly the returned scale
    for (int tenths = 20; tenths >= 1; --tenths)
    {
        scale = tenths / 10.0;
        textSize = cv::getTextSize(txt, cv::FONT_HERSHEY_DUPLEX, scale, 2,
                                   &baseline);
        double w_aprx = textSize.width * 100.0 / box_width;
        double h_aprx = textSize.height * 100.0 / box_height;
        if (w_aprx <= 50 && h_aprx <= 50)
            break;
    }
    return scale;
}
......
cv::Size textSize;
int baseline = 0;
double scale = calc_scale_rectbox(win_caption.c_str(), width,
                                  height, textSize, baseline);
cv::putText(img, win_caption, cv::Point(width / 2 - textSize.width / 2,
                                        (height + textSize.height - baseline + 2) / 2),
            cv::FONT_HERSHEY_DUPLEX, scale, CV_RGB(255, 255, 255), 2);
A simple utility function:
import math
def optimal_font_dims(img, font_scale=2e-3, thickness_scale=5e-3):
    h, w, _ = img.shape
    font_scale = min(w, h) * font_scale
    thickness = math.ceil(min(w, h) * thickness_scale)
    return font_scale, thickness
Usage:
font_scale, thickness = optimal_font_dims(image)
cv2.putText(image, "LABEL", (x, y), cv2.FONT_HERSHEY_SIMPLEX, font_scale, (255,0,0), thickness)

Cropping logos from white paper

I have a huge dataset of images with logos at arbitrary places on white paper. How can I retrieve the coordinates (top-left and bottom-right) of the object in an image using Python?
For example, consider this image:
http://ak9.picdn.net/shutterstock/videos/5360279/thumb/3.jpg (ignore the shadow)
I want to highlight the egg in the image.
EDIT:
The images are high-resolution and very numerous, so an iterative solution takes a long time. One thing I missed is that the images are stored in 1-bit mode, so I think a better solution is possible using NumPy.
If the rest of the picture is a single colour, you can compare each pixel against it and find the first pixel of a different colour, which marks the start of the object. Note that I assume the top-left corner (pix[0, 0]) has the background colour; if that is not always the case, use a different approach (for instance, take the most common pixel colour).
import numpy as np
from PIL import Image
# np.max (rather than the built-in max) also works when a pixel is a single
# number, as in grayscale or 1-bit images, not only an RGB tuple
def get_y_top(pix, width, height, background, difference):
    back_np = np.array(background)
    for y in range(0, height):
        for x in range(0, width):
            if np.max(np.abs(np.array(pix[x, y]) - back_np)) > difference:
                return y
def get_y_bot(pix, width, height, background, difference):
    back_np = np.array(background)
    for y in range(height-1, -1, -1):
        for x in range(0, width):
            if np.max(np.abs(np.array(pix[x, y]) - back_np)) > difference:
                return y
def get_x_left(pix, width, height, background, difference):
    back_np = np.array(background)
    for x in range(0, width):
        for y in range(0, height):
            if np.max(np.abs(np.array(pix[x, y]) - back_np)) > difference:
                return x
def get_x_right(pix, width, height, background, difference):
    back_np = np.array(background)
    for x in range(width-1, -1, -1):
        for y in range(0, height):
            if np.max(np.abs(np.array(pix[x, y]) - back_np)) > difference:
                return x
img = Image.open('test.jpg')
width, height = img.size
pix = img.load()
background = pix[0, 0]
difference = 20  # or whatever works for you here; use trial and error to establish this number
y_top = get_y_top(pix, width, height, background, difference)
y_bot = get_y_bot(pix, width, height, background, difference)
x_left = get_x_left(pix, width, height, background, difference)
x_right = get_x_right(pix, width, height, background, difference)
Using this information, you can crop the image and save it (the right and lower edges of the crop box are exclusive, hence the + 1):
img = img.crop((x_left, y_top, x_right + 1, y_bot + 1))
img.save('test3.jpg')
[Image: the cropped result]
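Since the question mentions NumPy and 1-bit images, here is a minimal vectorized sketch of the same four scans, comparing every pixel at once instead of looping in Python. It assumes the image has been converted to a NumPy array first; for a 1-bit image, np.array(img.convert("L")) gives 0/255 values rather than booleans. content_bbox is a hypothetical name:

```python
import numpy as np


def content_bbox(arr, background=None, difference=20):
    """Return (left, top, right, bottom) of pixels that differ from background.

    arr is an H x W (grayscale) or H x W x C array. background defaults to the
    top-left pixel, like pix[0, 0] above. right and bottom are exclusive, so
    the result can be passed straight to PIL's crop(). Returns None if every
    pixel matches the background.
    """
    if background is None:
        background = arr[0, 0]
    # signed ints so the subtraction cannot wrap around for uint8 input
    diff = np.abs(arr.astype(np.int32) - np.asarray(background, dtype=np.int32))
    if diff.ndim == 3:
        diff = diff.max(axis=2)  # largest per-channel difference, as in the loops
    mask = diff > difference
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    if rows.size == 0:
        return None
    return int(cols[0]), int(rows[0]), int(cols[-1]) + 1, int(rows[-1]) + 1
```

Usage: box = content_bbox(np.array(img), difference=20), then img.crop(box) if box is not None.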
For this image (the egg on the white background), you can crop in the following steps:
1. Read the image and convert it to grayscale
2. Threshold and invert
3. Find the extreme coordinates and crop
For the egg image, of shape (480, 852, 3), this takes 0.016 s.
The code:
#!/usr/bin/python3
# 2018/04/10 19:39:14
# 2018/04/10 20:25:36
import cv2
import numpy as np
import matplotlib.pyplot as plt
import time
ts = time.time()
## 1. Read and convert to gray
fname = "egg.jpg"
img = cv2.imread(fname)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
## 2. Threshold and Invert
# pixels at or below 240 (not white paper) become 255, the background becomes 0
th, dst = cv2.threshold(gray, 240, 255, cv2.THRESH_BINARY_INV)
## 3. Find the extreme coordinates and crop
ys, xs = np.where(dst > 0)
# + 1 because slice ends are exclusive
target = img[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
te = time.time()
print("Time passed: {:.3f} s".format(te-ts))
# OpenCV loads images as BGR; convert so matplotlib shows the true colours
plt.imshow(cv2.cvtColor(target, cv2.COLOR_BGR2RGB))
plt.show()
## Time passed: 0.016 s

Finding all the bounding rectangles of all non-transparent regions in PIL

I have an image with a transparent background and some non-transparent text.
I want to find the bounding box of each individual word in the text.
Here is the code that creates a transparent image, draws some text on it ("Hello World", for example), then applies an affine transform and thumbnails it.
from PIL import Image, ImageFont, ImageDraw, ImageOps
import numpy as np
fontcolor = (255,255,255)
fontsize = 180
# padding rate for setting the image size of font
fimg_padding = 1.1
# check code bbox padding rate
bbox_gap = fontsize * 0.05
# Rotation +/- N degrees
# Choose a font type for output---
font = ImageFont.truetype('Fonts/Bebas.TTF', fontsize)
# the text is "Hello World"
code = "Hello world"
# Get the related info of font---
code_w, code_h = font.getsize(code)  # getsize() was removed in Pillow 10; there, use font.getbbox(code)
# Setting the image size of font---
img_size = int((code_w) * fimg_padding)
# Create a RGBA image with transparent background
img = Image.new("RGBA", (img_size,img_size),(255,255,255,0))
d = ImageDraw.Draw(img)
# draw white text
code_x = (img_size-code_w)/2
code_y = (img_size-code_h)/2
d.text( ( code_x, code_y ), code, fontcolor, font=font)
# img.save('initial.png')
# Transform the image---
img = img_transform(img)
# crop image to the size equal to the bounding box of whole text
alpha = img.split()[-1]
img = img.crop(alpha.getbbox())
# resize the image
img.thumbnail((512, 512), Image.LANCZOS)  # Image.ANTIALIAS was an alias of LANCZOS and was removed in Pillow 10
# img.save('myimage.png')
# what I want is to find all the bounding box of each individual word
boxes=find_all_bbx(img)
Here is the affine transform code (provided for anyone who wants to experiment):
def find_coeffs(pa, pb):
    matrix = []
    for p1, p2 in zip(pa, pb):
        matrix.append([p1[0], p1[1], 1, 0, 0, 0, -p2[0]*p1[0], -p2[0]*p1[1]])
        matrix.append([0, 0, 0, p1[0], p1[1], 1, -p2[1]*p1[0], -p2[1]*p1[1]])
    A = np.matrix(matrix, dtype=float)  # np.float was removed in NumPy 1.24
    B = np.array(pb).reshape(8)
    res = np.dot(np.linalg.inv(A.T * A) * A.T, B)
    return np.array(res).reshape(8)
def rand_degree(st, en, gap):
    return (np.fix(np.random.random() * (en-st) * gap) + st)
def img_transform(img):
    width, height = img.size
    print(img.size)
    # shear horizontally by m
    m = -0.5
    xshift = abs(m) * width
    new_width = width + int(round(xshift))
    img = img.transform((new_width, height), Image.AFFINE,
                        (1, m, -xshift if m > 0 else 0, 0, 1, 0), Image.BICUBIC)
    # then apply a random perspective warp
    range_n = width*0.2
    gap_n = 1
    x1 = rand_degree(0, range_n, gap_n)
    y1 = rand_degree(0, range_n, gap_n)
    x2 = rand_degree(width-range_n, width, gap_n)
    y2 = rand_degree(0, range_n, gap_n)
    x3 = rand_degree(width-range_n, width, gap_n)
    y3 = rand_degree(height-range_n, height, gap_n)
    x4 = rand_degree(0, range_n, gap_n)
    y4 = rand_degree(height-range_n, height, gap_n)
    coeffs = find_coeffs(
        [(x1, y1), (x2, y2), (x3, y3), (x4, y4)],
        [(0, 0), (width, 0), (new_width, height), (xshift, height)])
    img = img.transform((width, height), Image.PERSPECTIVE, coeffs, Image.BICUBIC)
    return img
How can I implement find_all_bbx to find the bounding box of each individual word?
For example, one such box is shown around the 'H' (you can download the image to see the partial result).
To do what you want, you need to label the individual words and then compute the bounding box of each object with the same label.
The most straightforward way to get each box is to take the minimum and maximum positions of the pixels that make up that word.
The labeling is a bit more difficult. For example, you could use a morphological operation to merge the letters of each word (a morphological closing or dilation; see the PIL documentation) and then use ImageDraw.floodfill. Or you could try to predict the positions of the words from the position where you first draw the text (code_x and code_y), the chosen font and letter size, and the spacing (this will be trickier, I think).
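A minimal sketch of the labeling approach described above: a horizontal dilation (done in NumPy rather than with PIL) merges the letters of each word, and a breadth-first flood fill (standing in for ImageDraw.floodfill) labels the merged blobs. It assumes words are separated by wider gaps than letters; gap and threshold are hypothetical parameters to tune for your font, and the sheared, warped text may need a larger gap:

```python
from collections import deque

import numpy as np


def find_all_bbx(alpha, gap=5, threshold=0):
    """Return one (left, top, right, bottom) box per word-like blob.

    alpha: 2-D array of alpha values, e.g. np.array(img.split()[-1]).
    gap: horizontal dilation radius in pixels used to merge the letters of
         one word into a single blob.
    Boxes follow PIL's convention: right and bottom are exclusive.
    """
    mask = np.asarray(alpha) > threshold
    h, w = mask.shape

    # Horizontal dilation: switch on every pixel within `gap` columns of text.
    merged = mask.copy()
    for s in range(1, gap + 1):
        merged[:, s:] |= mask[:, :-s]
        merged[:, :-s] |= mask[:, s:]

    seen = np.zeros((h, w), dtype=bool)
    boxes = []
    for y0, x0 in zip(*np.nonzero(merged)):
        if seen[y0, x0]:
            continue
        # Flood-fill one 4-connected blob of the dilated mask.
        seen[y0, x0] = True
        queue = deque([(y0, x0)])
        top, left, bottom, right = h, w, -1, -1
        while queue:
            y, x = queue.popleft()
            if mask[y, x]:  # only original (undilated) pixels define the box
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if 0 <= ny < h and 0 <= nx < w and merged[ny, nx] and not seen[ny, nx]:
                    seen[ny, nx] = True
                    queue.append((ny, nx))
        boxes.append((int(left), int(top), int(right) + 1, int(bottom) + 1))
    return sorted(boxes)
```

With the question's variables, boxes = find_all_bbx(np.array(img.split()[-1]), gap=...) returns boxes that can be passed directly to img.crop(); the pure-Python flood fill is fine for a 512 x 512 thumbnail but slow for very large images.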
