How do I measure the bounds of a string in wand? - python

I'm using Wand to generate a JPG with custom variable text inside it.
I have an array of strings all with the same width but different heights.
Is there a method to word-wrap a long text inside a boundary, or to calculate the height needed for the text, so that when drawing the texts from the array they don't overlap?
with Drawing() as ctx:
    with Image(width=1080, height=1080, background=Color("WHITE")) as img:
        with Drawing() as draw:
            for i, line in enumerate(lines):
                metrics = draw.get_font_metrics(img, line, multiline=True)
                draw.text(x=150, y=120 + (i * 35) + int(metrics.text_height), body=line)
            draw(img)
        img.sample(1080, 1080)
        img.save(filename="output.png")

This may not be the answer(s) you're looking for, but it will hopefully get you on the right path.
How do I measure the bounds of a string in wand?
You're already doing it. Rather than a "smart-n-quick" one-liner approach, I would suggest a more classic offset-and-accumulator approach, mapping positions that update with each iteration.
top_margin = 120
line_offset = 0
line_padding = 35

with Drawing() as ctx:
    with Image(width=1080, height=1080, background=Color("WHITE")) as img:
        with Drawing() as draw:
            for i, line in enumerate(lines):
                metrics = draw.get_font_metrics(img, line, multiline=True)
                draw.text(x=150, y=top_margin + line_offset, body=line)
                line_offset += int(metrics.text_height) + line_padding
Is there a method to word-wrap a long text inside a boundary, or to calculate the height needed for the text, so that when drawing the texts from the array they don't overlap?
The short answer is no. You would be responsible for implementing the algorithm. Luckily the internet is full of examples & research articles that can be referenced. It can be as basic as find-the-last-space-before-overflow...
lines = [
    'I\'m using Wand to generate a JPG with custom variable text inside it.',
    'I have an array of strings all with the same width but different heights',
    'Is there a method to word wrap a long text inside a boundary or calculate the height needed for the text so when drawing the texts from the array they don\'t overlap',
]

image_width = 540
image_height = 540
left_margin = 150
right_margin = image_width - left_margin * 2
top_margin = 120
line_padding = 35
line_offset = 0

with Drawing() as ctx:
    with Image(width=image_width, height=image_height, background=Color("LIGHTCYAN")) as img:
        with Drawing() as draw:
            for i, line in enumerate(lines):
                metrics = draw.get_font_metrics(img, line, multiline=True)
                last_idx = 1
                # Do we need to do work?
                while metrics.text_width > right_margin:
                    last_breakpoint = 0
                    # Scan text for possible breakpoints.
                    for idx in range(last_idx, len(line)):
                        if line[idx] == ' ':
                            last_breakpoint = idx
                        else:
                            # Determine if we need to insert a breakpoint.
                            metrics = draw.get_font_metrics(img, line[:idx], multiline=True)
                            if metrics.text_width >= right_margin:
                                line = line[:last_breakpoint].strip() + '\n' + line[last_breakpoint:].strip()
                                last_idx = last_breakpoint
                                break
                    # Double-check that any modifications to the text were successful enough.
                    metrics = draw.get_font_metrics(img, line, multiline=True)
                draw.text(x=left_margin, y=top_margin + line_offset, body=line)
                line_offset += int(metrics.text_height) + line_padding
            draw(img)
        img.save(filename="output.png")
The above code could be optimized, and Python might already include some better methods.
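For instance, the standard-library textwrap module can approximate the wrapping if you estimate how many characters fit on a line. This is only a sketch, under the assumption that one measurement of a wide character gives a usable average width for a proportional font:

import textwrap

# Estimate an average character width from the font metrics ('M' is just a
# wide reference character; this is an approximation for proportional fonts).
metrics = draw.get_font_metrics(img, 'M' * 10)
avg_char_width = metrics.text_width / 10
chars_per_line = max(1, int(right_margin / avg_char_width))
wrapped = '\n'.join(textwrap.wrap(line, width=chars_per_line))

The wrapped string can then be passed to draw.text() as before.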
Further reading...
The source code of ImageMagick's CAPTION: protocol is a good example. The algorithm repeatedly calls GetMultilineTypeMetrics as well as FormatMagickCaption to adjust the pointsize and insert line breaks.
The wand library doesn't really support the caption protocol, but you can play around with it by using the following workaround.
from wand.api import library

# ...

with Image(width=image_width, height=image_height, background=Color("LIGHTCYAN")) as img:
    for i, line in enumerate(lines):
        # Create a temporary image for each bounding box.
        with Image() as throwaway:
            library.MagickSetSize(throwaway.wand, right_margin, line_padding)
            throwaway.read(filename='CAPTION:' + line)
            img.composite(throwaway, left_margin, top_margin + line_offset)
            line_offset += line_padding + throwaway.height
    img.save(filename="output.png")

Related

Scanning lists more efficiently in python

I have some code which works as intended; however, it takes about four and a half hours to run. I understand that there are about 50 billion calculations my poor PC needs to do, but I thought it would be worth asking!
This code takes an image and finds every possible region of 331*331 pixels in it, counting how many black pixels there are in each. I will use this data to create a heatmap of black-pixel density, and also a list of all of the values found:
image = Image.open(self.selectedFile)
pixels = list(image.getdata())
width, height = image.size
pixels = [pixels[i * width:(i+1) * width] for i in range(height)]
#print(pixels)
rightShifts = width - 331
downShifts = height - 331
self.totalRegionsLabel['text'] = f'Total Regions: {rightShifts * downShifts}'
self.blackList = [0 for i in range(0, rightShifts*downShifts)]
self.heatMap = [[] for i in range(0, downShifts)]
for x in range(len(self.heatMap)):
    self.heatMap[x] = [0 for i in range(0, rightShifts)]
for x in range(rightShifts):
    for y in range(downShifts):
        blackCount = 0
        for z in range(x + 331):
            for w in range(y + 331):
                if pixels[z][w] == 0:
                    blackCount += 1
        self.blackList[x+1*y] = blackCount
        self.heatMap[x][y] = blackCount
print(self.blackList)
You have several problems here, as I pointed out. Your z/w loops are always starting at the upper left, so by the time you get towards the end, you're summing the entire image, not just a 331x331 subset. You also have much confusion in your axes. In an image, [y] is first, [x] is second. An image is rows of columns. You need to remember that.
Here's an implementation as I suggested above. For each column, I do a full sum on the top 331x331 block. Then, for every row below, I just subtract the top row and add the next row below.
self.heatMap = [[0]*rightShifts for i in range(downShifts)]
for x in range(rightShifts):
    # Sum up the block at the top.
    blackCount = 0
    for row in range(331):
        for col in range(331):
            if pixels[row][x+col] == 0:
                blackCount += 1
    self.heatMap[0][x] = blackCount
    for y in range(1, downShifts):
        # To do the next block down, we subtract the top row and
        # add the bottom. Compare against 0 so we add/subtract
        # black-pixel indicators (1 or 0), not raw pixel values.
        for col in range(331):
            blackCount += (pixels[y+330][x+col] == 0) - (pixels[y-1][x+col] == 0)
        self.heatMap[y][x] = blackCount
You could tweak this even more by alternating the columns: at the bottom of the first column, scoot to the right by subtracting the first column and adding the next new column, then scoot back up to the top. That's a lot more trouble; a minimal sketch of the bookkeeping follows.
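This is an untested sketch of that serpentine traversal, assuming pixels, heatMap, rightShifts, and downShifts from the code above, with black pixels equal to 0:

k = 331
is_black = [[1 if p == 0 else 0 for p in row] for row in pixels]
# Seed with the full count for the window at (0, 0).
blackCount = sum(is_black[r][c] for r in range(k) for c in range(k))
y, dy = 0, 1
for x in range(rightShifts):
    while True:
        self.heatMap[y][x] = blackCount
        ny = y + dy
        if ny < 0 or ny >= downShifts:
            break
        if dy == 1:
            # Moving down: drop the top row, add the row below the window.
            blackCount += sum(is_black[y + k][x + c] - is_black[y][x + c] for c in range(k))
        else:
            # Moving up: drop the bottom row, add the row above the window.
            blackCount += sum(is_black[ny][x + c] - is_black[ny + k][x + c] for c in range(k))
        y = ny
    if x + 1 < rightShifts:
        # Scoot one column right at the current row, then reverse direction.
        blackCount += sum(is_black[y + r][x + k] - is_black[y + r][x] for r in range(k))
        dy = -dy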
The two innermost for-loops seem to be transformable to some numpy code if using this package is not an issue. It would give something like:
pixels = np.asarray(image)  # 2D numpy array, indexed [row, col]
# Get an array filled with either True or False, with True wherever the pixel is black:
pixel_is_black = (pixels[y:(y+331), x:(x+331)] == 0)
# True and False already behave as 1 and 0 when summed, so no conversion is needed.
self.blackList[x * downShifts + y] = pixel_is_black.sum()  # flat index for the (x, y) region
This is the simplest optimization I can think of, you probably can do much better with clever numpy tricks.
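One such trick is a 2D prefix sum (an integral image): build it once, and every 331x331 count becomes a constant-time lookup. A sketch, assuming pixels is a 2D grayscale array where black pixels are 0:

import numpy as np

def region_black_counts(pixels, k=331):
    # Integral image of the black-pixel mask: S[i, j] is the number of
    # black pixels in the sub-image pixels[:i, :j].
    black = (np.asarray(pixels) == 0).astype(np.int64)
    S = np.zeros((black.shape[0] + 1, black.shape[1] + 1), dtype=np.int64)
    S[1:, 1:] = black.cumsum(axis=0).cumsum(axis=1)
    # Inclusion-exclusion gives the count for every k x k window at once.
    return S[k:, k:] - S[:-k, k:] - S[k:, :-k] + S[:-k, :-k]

Calling region_black_counts(pixels) returns a 2D array whose [y, x] entry is the black-pixel count of the window with its top-left corner at (x, y).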
I would recommend using some efficient vector computations through the numpy and opencv libraries.
First, binarize your image so that black pixels are set to zero, and any other color pixels (gray to white) are set to 1. Then, apply a 2D filter of shape 331 x 331 to the image, where each value in the filter kernel is 1 / (331 x 331); this takes the average of all the values in each 331x331 area and assigns it to the center pixel.
This gives you a heatmap, where each pixel value is the proportion of non-black pixels in the surrounding 331 x 331 region. A darker pixel (value closer to zero) means more pixels in that region are black.
For some background, this approach uses image processing techniques called image binarization and box blur.
Example code:
import cv2
import numpy as np

# setting up a fake image, with some white spaces, gray spaces, and black spaces
img_dim = 10000
fake_img = np.full(shape=(img_dim, img_dim), fill_value=255, dtype=np.uint8)  # white
fake_img[: img_dim // 3, : img_dim // 3] = 0  # top left black
fake_img[2 * img_dim // 3 :, 2 * img_dim // 3 :] = 0  # bottom right black
fake_img[img_dim // 3 : 2 * img_dim // 3, img_dim // 3 : 2 * img_dim // 3] = 127  # center gray

# show the fake image
cv2.imshow("", fake_img)
cv2.waitKey()
cv2.destroyAllWindows()

# solution to your problem
binarized = np.where(fake_img == 0, 0, 1).astype(np.float32)  # 0 values where black, 1 values else
my_filter = np.full(shape=(331, 331), fill_value=(1 / (331 * 331)))  # set up filter
heatmap = cv2.filter2D(binarized, -1, my_filter)  # apply filter to the binarized image; averages each 331x331 block

# show the heatmap
cv2.imshow("", heatmap)
cv2.waitKey()
cv2.destroyAllWindows()
I ran this on my laptop, with a huge (fake) image of 10000 x 10000 pixels, almost instantly.
Sorry, I should have deleted this post before you all put the effort in; however, some of these workarounds are really smart and interesting. I ended up independently coming up with a solution that is the same as what Tim Robbers first suggested: I used the array I had and built a second one in which every item in a row is the number of black cells preceding it. Then, for each row in a region, instead of scanning every item, just check the preceding value and the final value and you are good:
image = Image.open(self.selectedFile).convert('L')  # convert to luminance mode, as RGB information is irrelevant
pixels = list(image.getdata())  # get the value of every pixel in the image
width, height = image.size
pixels = [pixels[i * width:(i+1) * width] for i in range(height)]  # split the pixels array into a two-dimensional array with dimensions matching the image
# This program scans every possible 331*331 square starting from the top left, so it will move right width - 331 pixels and down height - 331 pixels
rightShifts = width - 331
downShifts = height - 331
self.totalRegionsLabel['text'] = f'Total Regions: {rightShifts * downShifts}'  # this won't update till the function has completed running
# Assigning new values to items in an array is faster than appending them, which is why I prefilled the arrays:
self.heatMap = [[] for i in range(0, downShifts)]
for x in range(len(self.heatMap)):
    self.heatMap[x] = [0 for i in range(0, rightShifts)]
cumulativeMatrix = []  # the cumulative matrix replaces each value in each row with how many zeros precede it
for y in range(len(pixels)):
    cumulativeMatrix.append([])
    cumulativeMatrix[y].append(0)
    count = 0
    for x in range(len(pixels[y])):
        if pixels[y][x] == 0:
            count += 1
        cumulativeMatrix[y].append(count)
regionCount = 0
maxValue = 0  # this is the lowest possible maximum value
minValue = 109561  # this is the largest possible minimum value (331 * 331)
self.blackList = []
# loop through all possible regions
for y in range(downShifts):
    for x in range(rightShifts):
        blackPixels = 0
        for regionY in range(y, y + 331):
            lowerLimit = cumulativeMatrix[regionY][x]
            upperLimit = cumulativeMatrix[regionY][x + 331]  # zeros in the 331 columns starting at x
            blackPixels += (upperLimit - lowerLimit)
        if blackPixels > maxValue:
            maxValue = blackPixels
        if blackPixels < minValue:
            minValue = blackPixels
        self.blackList.append(blackPixels)
        self.heatMap[y][x] = blackPixels
        regionCount += 1
This brought the run time to under a minute and thus solved my problem. However, thank you for your contributions; I have learned a lot from reading them!
Try looking into the map() function. It runs the iteration in C, which streamlines it.
You can speed up your for loop like this:
pixels = list(map(lambda i: pixels[i*width:(i+1)*width], range(height)))

How to code up an image stitching software for these 'simple' images?

TLDR:
Need help calculating the overlap region between 2 graphs.
So I'm trying to stitch these 2 images:
Since I know that the images I will be stitching definitely come from the same image, I feel that I should be able to code this up myself. Using libraries like OpenCV feels a little like overkill for me for this task.
My current idea is that I can simplify this task by doing the following steps for each image:
Load image using PIL
Convert image to black and white (PIL image mode "L")
[Optional: crop images to overlapping region by inspection by eye]
Create vector row_sum, which is a sum of each row
[Optional: log row_sum, to reduce the size of values we're working with]
Plot row_sum.
This would reduce the (potentially) (3*2)-dimensional problem, with 3 RGB channels for each pixel of the 2D image, to a (1*2)-dimensional problem with a single black-and-white value per pixel instead. Then, summing across the rows reduces this to a 1D problem.
I used the following code to implement the above:
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image

class Stitcher():
    def combine_2(self, img1, img2):
        # thr1, thr2 = self.get_cropped_bw(img1, 115, img2, 80)
        thr1, thr2 = self.get_cropped_bw(img1, 0, img2, 0)
        row_sum1 = np.log(thr1.sum(1))
        row_sum2 = np.log(thr2.sum(1))
        self.plot_4x4(thr1, thr2, row_sum1, row_sum2)

    def get_cropped_bw(self, img1, img1_keep_from, img2, img2_keep_till):
        im1 = Image.open(img1).convert("L")
        im2 = Image.open(img2).convert("L")
        data1 = (np.array(im1)[img1_keep_from:]
                 if img1_keep_from != 0 else np.array(im1))
        data2 = (np.array(im2)[:img2_keep_till]
                 if img2_keep_till != 0 else np.array(im2))
        return data1, data2

    def plot_4x4(self, thr1, thr2, row_sum1, row_sum2):
        fig, ax = plt.subplots(2, 2, sharey="row", constrained_layout=True)
        ax[0, 0].imshow(thr1, cmap="Greys")
        ax[0, 1].imshow(thr2, cmap="Greys")
        ax[1, 0].plot(row_sum1, "k.")
        ax[1, 1].plot(row_sum2, "r.")
        ax[1, 0].set(
            xlabel="Index Value",
            ylabel="Row Sum",
        )
        plt.show()

imgs = (r"combine\imgs\test_image_part_1.jpg",
        r"combine\imgs\test_image_part_2.jpg")
s = Stitcher()
s.combine_2(*imgs)
This gave me this graph:
(I've added in those yellow boxes, to indicate the overlap regions.)
This is the bit I'm stuck at. I want to find exactly:
the index value of the left-side of the yellow box for the 1st image and
the index value of the right-side of the yellow box for the 2nd image.
I define the overlap region as the longest range for which the end of the 1st graph 'matches' the start of the 2nd graph. For the method to find the overlap region, what should I do if the row sum values aren't exactly the same (what if one is the other scaled by some factor)?
I feel like this is a problem that could use dot products to find the similarity between the 2 graphs, but I can't think of how to implement this.
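One minimal way to act on that intuition, sketched under the assumption that row_sum1 and row_sum2 are the 1-D numpy arrays computed above: use the Pearson correlation rather than a raw dot product, so that a constant scale factor (or offset) between the two profiles cancels out.

import numpy as np

def best_overlap(row_sum1, row_sum2, min_len=10):
    # For every candidate overlap length n, compare the last n values of
    # profile 1 with the first n values of profile 2 using the Pearson
    # correlation (invariant to scale and offset).
    best_len, best_score = 0, -np.inf
    for n in range(min_len, min(len(row_sum1), len(row_sum2)) + 1):
        a = row_sum1[-n:] - row_sum1[-n:].mean()
        b = row_sum2[:n] - row_sum2[:n].mean()
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        if denom == 0:
            continue  # flat segment; correlation undefined
        score = a.dot(b) / denom
        if score > best_score:
            best_len, best_score = n, score
    return best_len, best_score

best_len is then the offset: the last best_len rows of the first image line up with the first best_len rows of the second.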
I had a lot more fun with this than I expected. I wrote this using opencv, but that's just to load and show the image. Everything else is done with numpy so swapping this to PIL shouldn't be too difficult.
I'm using a brute-force matcher. I also wrote a random-start hillclimber that runs in much less time, but I can't guarantee it'll find the correct answer since the gradient space isn't smooth. I won't include it in my code since it's long and janky, but if you really need the time efficiency I can add it back in later.
I added a random crop and some salt and pepper noise to the images to test for robustness.
The brute-force matcher operates on the idea that we don't know which section of the two images overlap, so we need to convolve the smaller image over the larger image from left to right, top to bottom. This means our search space is:
horizontal = small_width + big_width
vertical = small_height + big_height
area = horizontal * vertical
This will grow very quickly with image size. I motivate the algorithm by giving it points for having a larger overlap, but it loses more points for having differences in color for the overlapped area.
Here are some pictures from an execution of this program
import cv2
import numpy as np
import random

# randomly snips edges
def randCrop(image, maxMargin):
    # at least 1, so that the -c[i] slices stay non-empty
    c = [random.randint(1, maxMargin) for a in range(4)];
    return image[c[0]:-c[1], c[2]:-c[3]];

# adds noise to image
def saltPepper(image, minNoise, maxNoise):
    h,w = image.shape;
    randNum = random.randint(minNoise, maxNoise);
    for a in range(randNum):
        x = random.randint(0, w-1);
        y = random.randint(0, h-1);
        image[y,x] = random.randint(0, 255);
    return image;

# evaluate layout
def getScore(one, two):
    # do raw subtraction
    left = one - two;
    right = two - one;
    sub = np.minimum(left, right);
    return np.count_nonzero(sub);

# return 2d random position within range
def randPos(img, big_shape):
    th,tw = big_shape;
    h,w = img.shape;
    x = random.randint(0, tw - w);
    y = random.randint(0, th - h);
    return [x,y];

# overlays small image onto big image
def overlay(small, big, pos):
    # unpack
    h,w = small.shape;
    x,y = pos;
    # copy and place
    copy = big.copy();
    copy[y:y+h, x:x+w] = small;
    return copy;

# calculates overlap region
def overlap(one, two, pos_one, pos_two):
    # unpack
    h1,w1 = one.shape;
    h2,w2 = two.shape;
    x1,y1 = pos_one;
    x2,y2 = pos_two;
    # set edges
    l1 = x1;
    l2 = x2;
    r1 = x1 + w1;
    r2 = x2 + w2;
    t1 = y1;
    t2 = y2;
    b1 = y1 + h1;
    b2 = y2 + h2;
    # go
    left = max(l1, l2);
    right = min(r1, r2);
    top = max(t1, t2);
    bottom = min(b1, b2);
    return [left, right, top, bottom];

# wrapper for overlay + getScore
def fullScore(one, two, pos_one, pos_two, big_empty):
    # check positions
    x,y = pos_two;
    h,w = two.shape;
    th,tw = big_empty.shape;
    if y+h > th or x+w > tw or x < 0 or y < 0:
        return -99999999;
    # overlay
    temp_one = overlay(one, big_empty, pos_one);
    temp_two = overlay(two, big_empty, pos_two);
    # get overlap
    l,r,t,b = overlap(one, two, pos_one, pos_two);
    temp_one = temp_one[t:b, l:r];
    temp_two = temp_two[t:b, l:r];
    # score
    diff = getScore(temp_one, temp_two);
    score = (r-l) * (b-t);
    score -= diff*2;
    return score;

# do brute force
def bruteForce(one, two):
    # calculate search space
    # unpack size
    h,w = one.shape;
    one_size = h*w;
    h,w = two.shape;
    two_size = h*w;
    # small and big
    if one_size < two_size:
        small = one;
        big = two;
    else:
        small = two;
        big = one;
    # unpack size
    sh, sw = small.shape;
    bh, bw = big.shape;
    total_width = bw + sw * 2;
    total_height = bh + sh * 2;
    # set up empty images
    empty = np.zeros((total_height, total_width), np.uint8);
    # set global best
    best_score = -999999;
    best_pos = None;
    # start scrolling
    ybound = total_height - sh;
    xbound = total_width - sw;
    for y in range(ybound):
        print("y: " + str(y) + " || " + str(empty.shape));
        for x in range(xbound):
            # get score
            score = fullScore(big, small, [sw,sh], [x,y], empty);
            # show
            # prog = overlay(big, empty, [sw,sh]);
            # prog = overlay(small, prog, [x,y]);
            # cv2.imshow("prog", prog);
            # cv2.waitKey(1);
            # compare
            if score > best_score:
                best_score = score;
                best_pos = [x,y];
                print("best_score: " + str(best_score));
    return best_pos, [sw,sh], small, big, empty;

# do a step of hill climber
def hillStep(one, two, best_pos, big_empty, step):
    # make a step
    new_pos = best_pos[1][:];
    new_pos[0] += step[0];
    new_pos[1] += step[1];
    # get score
    return fullScore(one, two, best_pos[0], new_pos, big_empty), new_pos;

# hunt around for good position
# let's do a random-start hillclimber
def randHill(one, two, shape):
    # set up empty images
    big_empty = np.zeros(shape, np.uint8);
    # set global best
    g_best_score = -999999;
    g_best_pos = None;
    # lets do 200 iterations
    iters = 200;
    for a in range(iters):
        # progress check
        print(str(a) + " of " + str(iters));
        # start with random position
        h,w = two.shape[:2];
        pos_one = [w,h];
        pos_two = randPos(two, shape);
        # get score
        best_score = fullScore(one, two, pos_one, pos_two, big_empty);
        best_pos = [pos_one, pos_two];
        # hill climb (only on second image)
        while True:
            # end condition: no step improves score
            end_flag = True;
            # 8-way
            for y in range(-1, 1+1):
                for x in range(-1, 1+1):
                    if x != 0 or y != 0:
                        # get score and update
                        score, new_pos = hillStep(one, two, best_pos, big_empty, [x,y]);
                        if score > best_score:
                            best_score = score;
                            best_pos[1] = new_pos[:];
                            end_flag = False;
            # end
            if end_flag:
                break;
            else:
                # show
                # prog = overlay(one, big_empty, best_pos[0]);
                # prog = overlay(two, prog, best_pos[1]);
                # cv2.imshow("prog", prog);
                # cv2.waitKey(1);
                pass;
        # check for new global best
        if best_score > g_best_score:
            g_best_score = best_score;
            g_best_pos = best_pos[:];
            print("top score: " + str(g_best_score));
    return g_best_score, g_best_pos;

# load both images
top = cv2.imread("top.jpg");
bottom = cv2.imread("bottom.jpg");
top = cv2.cvtColor(top, cv2.COLOR_BGR2GRAY);
bottom = cv2.cvtColor(bottom, cv2.COLOR_BGR2GRAY);

# randomly crop
top = randCrop(top, 20);
bottom = randCrop(bottom, 20);

# randomly add noise
saltPepper(top, 200, 1000);
saltPepper(bottom, 200, 1000);

# set up max image (assume no overlap whatsoever)
tw = 0;
th = 0;
h, w = top.shape;
tw += w;
th += h;
h, w = bottom.shape;
tw += w*2;
th += h*2;

# do random-start hill climb
_, best_pos = randHill(top, bottom, (th, tw));

# show
empty = np.zeros((th, tw), np.uint8);
pos1, pos2 = best_pos;
image = overlay(top, empty, pos1);
image = overlay(bottom, image, pos2);

# do brute force
# small_pos, big_pos, small, big, empty = bruteForce(top, bottom);
# image = overlay(big, empty, big_pos);
# image = overlay(small, image, small_pos);

# recolor overlap
h,w = empty.shape;
color = np.zeros((h,w,3), np.uint8);
l,r,t,b = overlap(top, bottom, pos1, pos2);
color[:,:,0] = image;
color[:,:,1] = image;
color[:,:,2] = image;
color[t:b, l:r, 0] += 100;

# show images
cv2.imshow("top", top);
cv2.imshow("bottom", bottom);
cv2.imshow("overlayed", image);
cv2.imshow("Color", color);
cv2.waitKey(0);
Edit: I added in the random-start hillclimber

TypeError: Int object is not iterable

I am trying to generate a dataset for OCR using different fonts, but at a certain for loop the iteration gives me an error: TypeError: 'int' object is not iterable.
I have searched enough to conclude that most of the answers on Stack Overflow suggest using range() (with len()) in the for loop, but I am not sure I follow that.
The function is as follows:
def gen_rand_string_data(data_count,
                         min_char_count=3,
                         max_char_count=8,
                         max_char=16,
                         x_pos='side',
                         img_size=(32, 256, 1),
                         font=cv2.FONT_HERSHEY_SIMPLEX,
                         font_scale=np.arange(0.7, 1, 0.1),
                         thickness=range(1, 3, 1)):
    '''
    random string data generation
    '''
    start_time = dt.datetime.now()
    images = []
    labels = []
    color = (255, 255, 255)
    count = 0
    char_list = list(string.ascii_letters) \
        + list(string.digits) \
        + list(' ')
    while (1):
        for fs in font_scale:
            for thick in thickness:
                for f in font:
                    img = np.zeros(img_size, np.uint8)
                    char_count = np.random.randint(min_char_count,
                                                   (max_char_count + 1))
                    rand_str = ''.join(np.random.choice(char_list,
                                                        char_count))
                    # generate image data
                    text_size = cv2.getTextSize(rand_str, f, fs, thick)[0]
                    if x_pos == 'side':
                        org_x = 0
                    else:
                        org_x = (img_size[1] - text_size[0]) // 2
                    org_y = (img_size[0] + text_size[1]) // 2
                    cv2.putText(img, rand_str, (org_x, org_y), f, fs,
                                color, thick, cv2.LINE_AA)
                    label = list(rand_str) + [' '] \
                        * (max_char - len(rand_str))
                    for i, t in enumerate(label):
                        label[i] = char_list.index(t)
                    label = np.uint8(label)
                    images.append(img)
                    labels.append(label)
                    count += 1
                    if count == data_count:
                        break
                else:
                    continue
                break
            else:
                continue
            break
        else:
            continue
        break
    end_time = dt.datetime.now()
    print("time taken to generate data", end_time - start_time)
    return images, labels
The error is raised at the line: for f in font:
What am I doing wrong here? Do I have to use the range()?
font = cv2.FONT_HERSHEY_SIMPLEX
for f in font:
    ...
In CV2, the font is a simple integer representing the font itself; I'm not entirely sure(a) why you're trying to iterate over it.
If you wanted to iterate over sizes of the font, you would have to use (for example) the fontScale parameter of putText().
If you want to iterate over a collection of fonts, you have to provide that collection, such as with one of:
font = [cv2.FONT_HERSHEY_SIMPLEX] # one font as a collection
font = [cv2.FONT_HERSHEY_SIMPLEX, cv2.FONT_HERSHEY_PLAIN] # two fonts
If you only have the one font, then don't iterate over it at all. Get rid of the for f in font line (unindenting the stuff currently "inside" it) and just use font wherever you're currently using f.
(a) Python is having similar troubles trying to figure out your intent :-)
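For example, here is a minimal runnable sketch of the collection approach; the two font constants are just illustrative choices, and the rest of the question's function would stay the same:

import cv2
import numpy as np

fonts = [cv2.FONT_HERSHEY_SIMPLEX, cv2.FONT_HERSHEY_PLAIN]  # a real collection
images = []
for f in fonts:  # iterating over a list of font ids is fine
    img = np.zeros((32, 256, 1), np.uint8)
    cv2.putText(img, 'sample', (0, 24), f, 0.8, (255, 255, 255), 1, cv2.LINE_AA)
    images.append(img)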

How to use my own bitmap font in PIL.ImageFont?

I created a bitmap font, basically a 256x256 png image where each character occupies an 8x8 tile. I want to use it with Pillow as an ImageFont, but there's no info on this in the Pillow docs. It says I can load bitmap fonts like this
font = ImageFont.load("arial.pil")
but "PIL uses its own font file format to store bitmap fonts." so I guess png file won't work. How can I tell PIL to use said bitmap and where each character is on it?
Not a complete answer, but too much for a comment, and it may be useful or spur someone else to work out the other 60% :-)
I may delete it if anyone else comes up with something better...
You can go to the Pillow repository on Github and download a ZIP file of the code.
If you go in there and nose around you will find two things that appear to work hand-in-hand, namely a .PIL file and a .PBM file.
In Tests/fonts there is a file called 10x20.pbm which is actually a PNG file if you look inside it. So, if you change its name to 10x20.png you can view it and it looks like this:
By the way, if you want to split that into 10x20 size chunks with one letter in each, you can use ImageMagick in Terminal like this:
convert 10x20.pbm -crop 10x20 char_%d.png
and you will get a bunch of files called char_0.png, char_1.png etc. The first 4 look like this:
If you look in src/PIL/FontFile.py there is this code that seems to know how to access/generate the metrics for a font:
#
# The Python Imaging Library
# $Id$
#
# base class for raster font file parsers
#
# history:
# 1997-06-05 fl   created
# 1997-08-19 fl   restrict image width
#
# Copyright (c) 1997-1998 by Secret Labs AB
# Copyright (c) 1997-1998 by Fredrik Lundh
#
# See the README file for information on usage and redistribution.
#

from __future__ import print_function

import os
from . import Image, _binary

WIDTH = 800


def puti16(fp, values):
    # write network order (big-endian) 16-bit sequence
    for v in values:
        if v < 0:
            v += 65536
        fp.write(_binary.o16be(v))


##
# Base class for raster font file handlers.

class FontFile(object):

    bitmap = None

    def __init__(self):
        self.info = {}
        self.glyph = [None] * 256

    def __getitem__(self, ix):
        return self.glyph[ix]

    def compile(self):
        "Create metrics and bitmap"

        if self.bitmap:
            return

        # create bitmap large enough to hold all data
        h = w = maxwidth = 0
        lines = 1
        for glyph in self:
            if glyph:
                d, dst, src, im = glyph
                h = max(h, src[3] - src[1])
                w = w + (src[2] - src[0])
                if w > WIDTH:
                    lines += 1
                    w = (src[2] - src[0])
                maxwidth = max(maxwidth, w)

        xsize = maxwidth
        ysize = lines * h

        if xsize == 0 and ysize == 0:
            return ""

        self.ysize = h

        # paste glyphs into bitmap
        self.bitmap = Image.new("1", (xsize, ysize))
        self.metrics = [None] * 256
        x = y = 0
        for i in range(256):
            glyph = self[i]
            if glyph:
                d, dst, src, im = glyph
                xx = src[2] - src[0]
                # yy = src[3] - src[1]
                x0, y0 = x, y
                x = x + xx
                if x > WIDTH:
                    x, y = 0, y + h
                    x0, y0 = x, y
                    x = xx
                s = src[0] + x0, src[1] + y0, src[2] + x0, src[3] + y0
                self.bitmap.paste(im.crop(src), s)
                self.metrics[i] = d, dst, s

    def save(self, filename):
        "Save font"

        self.compile()

        # font data
        self.bitmap.save(os.path.splitext(filename)[0] + ".pbm", "PNG")

        # font metrics
        with open(os.path.splitext(filename)[0] + ".pil", "wb") as fp:
            fp.write(b"PILfont\n")
            fp.write((";;;;;;%d;\n" % self.ysize).encode('ascii'))  # HACK!!!
            fp.write(b"DATA\n")
            for id in range(256):
                m = self.metrics[id]
                if not m:
                    puti16(fp, [0] * 10)
                else:
                    puti16(fp, m[0] + m[1] + m[2])
So hopefully someone has time/knowledge of how to put those two together to enable you to generate the metrics file for your PNG. I think you just need something that does the last 10 lines of that code for your PNG.
There appear to be 23 bytes of header which you can simply replicate, and then there are 256 "entries", i.e. 1 for each of 256 glyphs. Each entry has 10 numbers in it, and each number is 16-bit big endian.
Let's look at the header:
dd if=10x20.pil bs=23 count=1| xxd -c23 | more
00000000: 5049 4c66 6f6e 740a 3b3b 3b3b 3b3b 3230 3b0a 4441 5441 0a PILfont.;;;;;;20;.DATA.
Then you can see the entries using the command below to skip the header and group nicely:
dd if=10x20.pil bs=23 iseek=1| xxd -g2 -c20
which gives:
Column 1 appears to be the width of the glyph.
Column 7 is the x-offset of the left edge of the glyph in the image and column 9 is the x-offset of the right edge of the glyph in the image. So you will see that column 7 on each line is the same as column 9 on the previous line, i.e. that the glyphs abutt each other going across the image.
If you look at this extract from further down the file, you can see it starts a new row of glyphs in the output image in the middle of the extract (marked in red). That tells us that the bitmap should be no more than 800 pixels wide, that column 8 is the y-offset of the top of the glyph in the bitmap file, and that column 10 is the y-offset of the bottom of the glyph in the bitmap. You should see that when a new row of glyphs starts in the bitmap file, x goes to zero and column 8 takes the previous value from column 10.
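Putting those observations together for the original 256x256 sheet of 8x8 tiles, here is an untested sketch of writing the two files; the glyph layout (16 glyphs per row) and the dst box values are my assumptions, so expect to tweak them:

import struct
from PIL import Image

# Assumed layout: glyph i sits at (8 * (i % 16), 8 * (i // 16)) in font.png.
Image.open("font.png").convert("1").save("myfont.pbm", "PNG")
with open("myfont.pil", "wb") as fp:
    fp.write(b"PILfont\n")
    fp.write(b";;;;;;8;\n")  # the ysize header line, as in the HACK above
    fp.write(b"DATA\n")
    for i in range(256):
        x, y = 8 * (i % 16), 8 * (i // 16)
        # 10 numbers per glyph: delta (2), dst box (4), src box (4).
        # The dst box (0, 0, 8, 8) is a guess for a top-left glyph origin.
        entry = [8, 0, 0, 0, 8, 8, x, y, x + 8, y + 8]
        fp.write(struct.pack(">10h", *entry))

# Afterwards the font should load with: ImageFont.load("myfont.pil")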

Python: Image Segmentation as pre-process for Classification

What technique do you recommend to segment the characters in this image so they are ready to feed a model like the ones used with the MNIST dataset, since those take one character at a time? This question is asked regardless of the importance of transforming the image and binarizing it.
Thanks!
As a starting point, I would try the following:
1. Use an OTSU threshold.
2. Then do some morphological operations to get rid of noise and to isolate each digit.
3. Run connected-component labeling.
4. Feed each connected component to your classifier to recognize the digit; if the classification score is low, discard it.
5. As a final validation, you expect all the digits to be more or less on a line and at a more or less constant distance from each other.
Here are the first 4 stages. Now you need to add your recognition software to recognize the digits.
import cv2
import numpy as np
from matplotlib import pyplot as plt

# Params
EPSSILON = 0.4
MIN_AREA = 10
BIG_AREA = 75

# Read img
img = cv2.imread('i.jpg', 0)

# Otsu threshold
a, thI = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# Morphological
se = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (1, 1))
thIMor = cv2.morphologyEx(thI, cv2.MORPH_CLOSE, se)

# Connected component labeling
stats = cv2.connectedComponentsWithStats(thIMor, connectivity=8)
num_labels = stats[0]
labels = stats[1]
labelStats = stats[2]

# We expect the connected components of the numbers to have a more or less constant ratio,
# so we find the median ratio of all the components, because the majority of them are numbers
ratios = []
for label in range(num_labels):
    connectedCompoentWidth = labelStats[label, cv2.CC_STAT_WIDTH]
    connectedCompoentHeight = labelStats[label, cv2.CC_STAT_HEIGHT]
    ratios.append(float(connectedCompoentWidth) / float(connectedCompoentHeight))

# Find median ratio
medianRatio = np.median(np.asarray(ratios))

# Go over all the connected components again and filter out components that are far from the ratio
filterdI = np.zeros_like(thIMor)
filterdI[labels != 0] = 255
for label in range(num_labels):
    # Ignore biggest label
    if label == 1:
        filterdI[labels == label] = 0
        continue
    connectedCompoentWidth = labelStats[label, cv2.CC_STAT_WIDTH]
    connectedCompoentHeight = labelStats[label, cv2.CC_STAT_HEIGHT]
    ratio = float(connectedCompoentWidth) / float(connectedCompoentHeight)
    if ratio > medianRatio + EPSSILON or ratio < medianRatio - EPSSILON:
        filterdI[labels == label] = 0
    # Filter small or large components
    if labelStats[label, cv2.CC_STAT_AREA] < MIN_AREA or labelStats[label, cv2.CC_STAT_AREA] > BIG_AREA:
        filterdI[labels == label] = 0

plt.imshow(filterdI)

# Now go over each of the remaining components and run the number recognition
stats = cv2.connectedComponentsWithStats(filterdI, connectivity=8)
num_labels = stats[0]
labels = stats[1]
labelStats = stats[2]
for label in range(num_labels):
    # Crop the bounding box around the component
    left = labelStats[label, cv2.CC_STAT_LEFT]
    top = labelStats[label, cv2.CC_STAT_TOP]
    width = labelStats[label, cv2.CC_STAT_WIDTH]
    height = labelStats[label, cv2.CC_STAT_HEIGHT]
    candidateDigit = labels[top:top + height, left:left + width]
    # plt.figure(label)
    # plt.imshow(candidateDigit)
To add to Amitay's answer:
For step 2: I would use thinning as the morphological operation (look up the thinning algorithm in OpenCV).
For step 3: OpenCV 3.0 already has a function for this, called cv::connectedComponents.
Hope it helps
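A short sketch of both suggestions; note that cv2.ximgproc.thinning requires the opencv-contrib-python package, and thIMor here is the morphologically cleaned image from Amitay's code above:

import cv2

# Thinning (skeletonization) as the morphological step (opencv-contrib only).
thinned = cv2.ximgproc.thinning(thIMor)
# Plain connected-component labeling, available since OpenCV 3.0.
num_labels, labels = cv2.connectedComponents(thinned, connectivity=8)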
