How to obtain the best result from pytesseract?

I'm trying to read text from an image, using OpenCV and Pytesseract, but with poor results.
The image I'm trying to read the text from is: https://www.lubecreostorepratolapeligna.it/gb/img/logo.png
This is the code I am using:
import cv2
import pytesseract

pytesseract.pytesseract.tesseract_cmd = r'D:\Program Files\pytesseract\tesseract.exe'
path_to_image = 'logo.png'  # the logo from the link above, saved locally
image = cv2.imread(path_to_image)
# converting image into grayscale image
gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
cv2.imshow('grey image', gray_image)
cv2.waitKey(0)
# converting it to a binary image by thresholding
# this step is required if you have a colored image, because if you skip it
# tesseract won't be able to detect the text correctly and will give incorrect results
threshold_img = cv2.threshold(gray_image, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
# display image
cv2.imshow('threshold image', threshold_img)
# Maintain output window until user presses a key
cv2.waitKey(0)
# Destroying present windows on screen
cv2.destroyAllWindows()
# now feeding image to tesseract
text = pytesseract.image_to_string(threshold_img)
print(text)
The result of the execution is: ["cu", " ", "LUBE", " ", "STORE", "PRATOLA PELIGNA"]
But the result should be these 7 words: ["cucine", "LUBE", "CREO", "kitchens", "STORE", "PRATOLA", "PELIGNA"]
Is there anyone who could help me solve this problem?

Edit, 17.12.2020: With preprocessing it now recognizes everything but the "O" in CREO. See the stages in ocr8.py. Then ocr9.py demonstrates (not yet automated) finding the lines of text via the coordinates returned by pytesseract.image_to_boxes(), the approximate size of the letters and the inter-symbol distance, then extrapolating one step ahead and searching for a single character (--psm 8).
It turned out that Tesseract had actually recognized the "O" in CREO, but it read it as ♀, probably confused by the little "k" below etc.
Since it is a rare and "strange"/unexpected symbol, it could be corrected - replaced automatically (see the function Correct()).
A technical detail: Tesseract returns the ANSI/ASCII code 12 (0x0C), while in my editor's Unicode/UTF-8 source the symbol ♀ is code point 9792, so I encoded it as chr(12).
The latest version: ocr9.py
You mentioned that PRATOLA and PELIGNA have to be given separately - just split by " ":
splitted = text.split(" ")
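A minimal sketch of the correction step, assuming text comes from pytesseract.image_to_string() as above (the actual Correct() lives in ocr9.py):
def Correct(text):
    # Tesseract returned the rare symbol chr(12) where the "O" in CREO
    # should be; such an unexpected symbol can simply be replaced
    return text.replace(chr(12), "O")

text = Correct(text)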
RECOGNIZED
CUCINE
LUBE
STORE
PRATOLA PELIGNA
CRE [+O with correction and extrapolation of the line]
KITCHENS
...
C 39 211 47 221 0
U 62 211 69 221 0
C 84 211 92 221 0
I 107 211 108 221 0
N 123 211 131 221 0
E 146 211 153 221 0
L 39 108 59 166 0
U 63 107 93 166 0
B 98 108 128 166 0
E 133 108 152 166 0
S 440 134 468 173 0
T 470 135 499 173 0
O 500 134 539 174 0
R 544 135 575 173 0
E 580 135 608 173 0
P 287 76 315 114 0
R 319 76 350 114 0
A 352 76 390 114 0
T 387 76 417 114 0
O 417 75 456 115 0
L 461 76 487 114 0
A 489 76 526 114 0
P 543 76 572 114 0
E 576 76 604 114 0
L 609 76 634 114 0
I 639 76 643 114 0
G 649 75 683 115 0
N 690 76 722 114 0
A 726 76 764 114 0
C 21 30 55 65 0
R 62 31 93 64 0
E 99 31 127 64 0
K 47 19 52 25 0
I 61 19 62 25 0
T 71 19 76 25 0
C 84 19 89 25 0
H 96 19 109 25 0
E 113 19 117 25 0
N 127 19 132 25 0
S 141 19 145 22 0
These coordinates are from pytesseract.image_to_boxes().
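For reference, a minimal sketch of how such box data is produced and parsed; threshold_img stands for the preprocessed image from the code below:
# each line of image_to_boxes() output is "char x1 y1 x2 y2 page",
# with coordinates measured from the bottom-left corner of the image
boxes = pytesseract.image_to_boxes(threshold_img)
for line in boxes.splitlines():
    ch, x1, y1, x2, y2, page = line.split(" ")
    print(ch, int(x1), int(y1), int(x2), int(y2))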
Initial message:
I guess that for the area where "cucine" is, an adaptive threshold may segment it better, or maybe applying some edge detection first would help.
"kitchens" seems very small; what about trying to enlarge that area?
For CREO, I guess it's confused by the big and small sizes of the adjacent captions.
For the "O" in CREO, you may apply dilation in order to close the gap in the "O" (see the sketch below).
Edit: I played with it a bit, but without Tesseract, and it needs more work. My goal was to make the letters more contrasting. Some of these processing steps may need to be applied selectively, only on "cucine", perhaps running the recognition in two passes: when you get partial words like "cu", apply the adaptive threshold etc. (below) and OCR a top rectangle around "cu..." (a sketch of this second pass follows after the code).
Binary Threshold:
Adaptive Threshold, Median blur (to clean noise) and invert:
Dilate connects small gaps, but it also destroys detail.
import cv2
import numpy as np
import pytesseract

#pytesseract.pytesseract.tesseract_cmd = r'D:\Program Files\pytesseract\tesseract.exe'
path_to_image = "logo.png"
#path_to_image = "logo1.png"
image = cv2.imread(path_to_image)
h, w, _ = image.shape
w, h = int(w * 3), int(h * 3)
image = cv2.resize(image, (w, h), interpolation=cv2.INTER_AREA)  # resize 3x
# converting image into gray scale image
gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
cv2.imshow('grey image', gray_image)
cv2.waitKey(0)
# converting it to a binary image by thresholding
# this step is required if you have a colored image, because if you skip it
# tesseract won't be able to detect the text correctly and will give incorrect results
#threshold_img = cv2.threshold(gray_image, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
threshold_img = cv2.adaptiveThreshold(gray_image, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                      cv2.THRESH_BINARY, 13, 3)
# other parameters tried: cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 11, 2
# display image
cv2.imshow('threshold image', threshold_img)
cv2.waitKey(0)
#threshold_img = cv2.GaussianBlur(threshold_img,(3,3),0)
#threshold_img = cv2.GaussianBlur(threshold_img,(3,3),0)
threshold_img = cv2.medianBlur(threshold_img,5)
cv2.imshow('medianBlur', threshold_img)
cv2.waitKey(0)
threshold_img = cv2.bitwise_not(threshold_img)
cv2.imshow('Invert', threshold_img)
cv2.waitKey(0)
#kernel = np.ones((1, 1), np.uint8)
#threshold_img = cv2.dilate(threshold_img, kernel)
#cv2.imshow('Dilate', threshold_img)
#cv2.waitKey(0)
cv2.imshow('threshold image', threshold_img)
# Maintain output window until user presses a key
cv2.waitKey(0)
# Destroying present windows on screen
cv2.destroyAllWindows()
# now feeding image to tesseract
text = pytesseract.image_to_string(threshold_img)
print(text)
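Continuing from the code above, a hedged sketch of the two-pass idea: once the first pass returns a partial word like "cu", crop a rectangle around it and OCR only that region (the crop coordinates here are placeholders, not measured values):
# second pass: OCR only the strip where "cucine" sits;
# --psm 7 treats the crop as a single line of text
top_strip = threshold_img[0:90, 0:300]  # placeholder coordinates
print(pytesseract.image_to_string(top_strip, config='--psm 7'))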

Related

Shape of face encodings differ

I'm trying to make a face recognition program, but the problem is that the face encoding shapes differ (some encodings are bigger than others), and thus I'm getting the error:
ValueError: setting an array element with a sequence.
Here's my code to generate the encodings
import cv2

class FaceEncoder():
    def __init__(self, files, singleton=False, model_path='./models/lbpcascade_animeface.xml', scale_factor=1.1, min_neighbours=1):
        self.singleton = singleton
        self.files = files
        self.model = model_path
        self.scale_factor = scale_factor
        self.min_neighbours = min_neighbours

    def encode(self, singleton=False):
        if self.singleton == False:
            encodings = []
            labels = []
            for file in self.files:
                cascade = cv2.CascadeClassifier(self.model)
                image = cv2.imread(file)
                rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
                faces = cascade.detectMultiScale(rgb, self.scale_factor, self.min_neighbours)
                if len(faces) > 0:
                    print('Found face in ' + file)
                    encodings.append(faces.flatten())
                    labels.append(file.split('/')[2])
                else:
                    print("Couldn't find face in " + file)
            return encodings, labels
Here are some of the encodings
[204 96 211 211]
[525 168 680 680]
[205 11 269 269]
[ 165 31 316 316 1098 181 179 179]
[ 113 422 1371 1371]
[ 71 86 183 183]
[209 19 33 33 88 27 60 60 133 80 65 65 68 117 52 52]
[117 77 149 149]
[ 63 77 284 284]
[370 222 490 490]
[433 112 114 114 183 98 358 358]
[ 44 35 48 48 192 34 48 48]
[210 82 229 229]
[429 90 153 153]
[318 50 174 174 118 142 120 120]
You should not put several found rects into the same list entry. If many faces are found, put each one on its own row, and add a label per face found (not per image).
Also, what you have now are NOT "encodings", just mere boxes/rectangles.
Read up on how to get real encodings (FaceNet, SphereFace?); then you need to (see the sketch below):
crop the face region from the image
resize it to the nn input size (e.g. 96x96)
run it through the nn to receive the encoding
save that along with a label to a db/list
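A hedged sketch of those four steps; "net" stands in for a real embedding model (here an OpenFace model loaded via OpenCV's dnn module; the model path, and faces/rgb/file from the question's loop, are assumptions):
import cv2

# hypothetical model path; OpenFace's nn4.small2.v1 takes 96x96 input
# and outputs a 128-d encoding
net = cv2.dnn.readNetFromTorch('./models/openface.nn4.small2.v1.t7')

encodings, labels = [], []
for (x, y, w, h) in faces:                     # one entry per detected face
    face = rgb[y:y+h, x:x+w]                   # 1. crop the face region
    blob = cv2.dnn.blobFromImage(face, 1.0/255, (96, 96),
                                 (0, 0, 0), swapRB=True, crop=False)  # 2. resize to nn input
    net.setInput(blob)
    encodings.append(net.forward().flatten())  # 3. run the nn to get the encoding
    labels.append(file.split('/')[2])          # 4. one label per face found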

Filtering a labeled image by particle area

I have a labeled image of detected particles and a dataframe with the corresponding area of each labeled particle. What I want to do is filter out every particle on the image with an area smaller than a specified value.
I got it working with the example below, but I know there must be a smarter and, above all, faster way, for example skipping the loop by comparing the image with the array.
Thanks for your help!
Example:
labels = df["label"][df.area > 5000].to_numpy()
mask = np.zeros(labeled_image.shape)
for label in labels:
    mask[labeled_image == label] = 1
Dataframe:
label centroid-0 centroid-1 area
0 1 15 3681 191
1 2 13 1345 390
2 3 43 3746 885
3 4 32 3616 817
4 5 20 4250 137
... ... ... ...
3827 3828 4149 1620 130
3828 3829 4151 852 62
3829 3830 4155 330 236
3830 3831 4157 530 377
3831 3832 4159 3975 81
You can use isin to check equality to several labels. The resulting boolean array can be directly used as the mask after casting to the required type (e.g. int):
labels = df.loc[df.area.gt(5000), 'label']
mask = np.isin(labeled_image, labels).astype(int)
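As a small follow-up, the same boolean array also lets you keep the surviving particles' original label values instead of a 0/1 mask:
# zero out small particles, keep the original labels of the large ones
filtered = labeled_image * np.isin(labeled_image, labels)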

PyTesseract not seeing some single-digit numbers in table

I have this image of a table
I'm trying to parse it using PyTesseract. I've gotten pretty darn close using this code:
from PIL import Image, ImageOps
import pytesseract
og_image = Image.open('og_image.png')
grayscale = ImageOps.grayscale(og_image)
inverted = ImageOps.invert(grayscale.convert('RGB'))
print(pytesseract.image_to_string(inverted))
This seems to be very accurate, except the single-digit numbers in the second-to-last column are blank. Do I need to do something different to pick up on those numbers?
Tesseract has several page segmentation modes, and choosing the right one is necessary to get the best result. See the documentation.
Also, in this case you can restrict tesseract to a certain character set.
Another thing: tesseract is sensitive to fonts and image size, and a simple resize can change the results greatly. Here I scale the width by a factor of 2 (and pick a matching height) to get the best result.
Combining all of the above, you get:
custom_config = r'--psm 6 -c tessedit_char_whitelist=0123456789.'
# note: Image.ANTIALIAS was removed in Pillow 10; LANCZOS is the same filter
print(pytesseract.image_to_string(inverted.resize((1506, 412), Image.LANCZOS), config=custom_config))
1525 .199 303 82 161 162 7 .241
1464 .290 424 70 139 198 25 .352
1456 .292 425 116 224 224 0 .345
1433 .240 346 81 130 187 15 .275
1390 .273 373 108 217 216 3 .345
1386 .276 383 54 181 154 18 .315
1225 .208 255 68 148 129 1 .242
1218 .238 230 46 128 127 18 .273
1117 .240 268 43 113 1193 1 .308

Improving Numpy For Loop Speed

I'm trying to find the pixels closest to an RGB value of (0,0,255). I'm trying to calculate the distance of the pixel in RGB values to that value using a 3D Pythagoras calculation, add them to a list, and then return the X and Y coordinates of the values that have the lowest distance. Here's what I have:
# import the necessary packages
import numpy as np
import scipy.spatial as sp
import matplotlib.pyplot as plt
import cv2
import math
from PIL import Image, ImageDraw, ImageFont
background = Image.open("test.tif").convert('RGBA')
png = background.save("test.png")
retina = cv2.imread("test.png")
#convert BGR to RGB image
retina = cv2.cvtColor(retina, cv2.COLOR_BGR2RGB)
h,w,bpp = np.shape(retina)
min1_d = float('inf')
min1_coords = (None, None)
min2_d = float('inf')
min2_coords = (None, None)
for py in range(0, h):
    for px in range(0, w):
        # cast to int so squaring does not overflow uint8
        r = int(retina[py][px][0])
        g = int(retina[py][px][1])
        b = int(retina[py][px][2])
        d = math.sqrt(((r - 0) ** 2) + ((g - 0) ** 2) + ((255 - b) ** 2))
        print(str(r) + "," + str(g) + "," + str(b) + ",," + str(px) + "," + str(py) + ",," + str(d))
        if d < min1_d:
            min2_d = min1_d
            min2_coords = min1_coords
            min1_d = d
            min1_coords = (px, py)
        elif d < min2_d:  # if it's not the smallest, check if it's the second smallest
            min2_d = d
            min2_coords = (px, py)
print(min1_coords, min2_coords)
width, height = background.size
x_max = int(width)
y_max = int(height)
img = Image.new('RGBA', (x_max, y_max), (255,255,255,0))
draw = ImageDraw.Draw(img)
draw.point(min1_coords, (0,0,255))
draw.point(min2_coords, (0,0,255))
foreground = img
background.paste(foreground, (0, 0), foreground)
foreground.save("test_bluer.png")
background.save("test_bluer_composite.png")
How can I speed up my for loops? I believe this answer is on the right track, but I'm not sure how to implement the px and py variables while slicing as this answer shows.
You can speed up your code by vectorizing the for loop (note the cast to a wider type first, so squaring the uint8 channels does not overflow):
r = retina[:,:,0].astype(np.int64)
g = retina[:,:,1].astype(np.int64)
b = retina[:,:,2].astype(np.int64)
d = np.sqrt(r**2 + g**2 + (255-b)**2)
You can find the coordinates of the minimum with:
min_coords = np.unravel_index(np.argmin(d), np.shape(d))
If you want to find the second smallest distance just change the previous minimum to be a larger distance:
d[min_coords[0],min_coords[1]] = np.inf
min_coords = np.unravel_index(np.argmin(d), np.shape(d))
# min_coords now has the second smallest distance
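As a hedged alternative, np.argpartition can fetch the two smallest distances in one pass, without overwriting d:
# indices of the two smallest distances (unordered within the pair)
flat_idx = np.argpartition(d.ravel(), 2)[:2]
two_min_coords = [np.unravel_index(i, d.shape) for i in flat_idx]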
Here is one way in Python/OpenCV.
Read the input
Define your color (pure blue)
Create an image of the color desired
Compute an image representing the rmse difference
Threshold the rmse image
Get the coordinates of all white pixels
Input:
import cv2
import numpy as np

# read image
img = cv2.imread('red_blue2.png')

# reference color (blue)
color = (255,0,0)

# create image the size of the input, but with blue color
ref = np.full_like(img, color)

# compute rmse difference image
# (cast to float so squaring does not overflow uint8)
diff = cv2.absdiff(img, ref).astype(np.float32)
diff2 = diff*diff
b,g,r = cv2.split(diff2)
rmse = np.sqrt((b+g+r)/3)

# threshold for pixels within 1 graylevel difference
thresh = cv2.threshold(rmse, 1, 255, cv2.THRESH_BINARY_INV)[1]

# get coordinates
coords = np.argwhere(thresh == 255)
for coord in coords:
    print(coord[1], coord[0])

# write results to disk
cv2.imwrite("red_blue2_rmse.png", (20*rmse).clip(0,255).astype(np.uint8))
cv2.imwrite("red_blue2_thresh.png", thresh.astype(np.uint8))

# display it
cv2.imshow("rmse", rmse)
cv2.imshow("thresh", thresh)
cv2.waitKey(0)
RMSE Image (scaled in brightness by 20x for viewing):
Thresholded rmse image:
Coordinates:
127 0
128 0
127 1
128 1
127 2
128 2
127 3
128 3
127 4
128 4
127 5
128 5
127 6
128 6
127 7
128 7
127 8
128 8
127 9
128 9
127 10
128 10
127 11
128 11
127 12
128 12
127 13
128 13
127 14
128 14
127 15
128 15
127 16
128 16
127 17
128 17
127 18
128 18
127 19
128 19
127 20
128 20
127 21
128 21
127 22
128 22
127 23
128 23
127 24
128 24
127 25
128 25
127 26
128 26
127 27
128 27
127 28
128 28
127 29
128 29
127 30
128 30
127 31
128 31
127 32
128 32
127 33
128 33
127 34
128 34
127 35
128 35
127 36
128 36
127 37
128 37
127 38
128 38
127 39
128 39
127 40
128 40
127 41
128 41
127 42
128 42
127 43
128 43
127 44
128 44
127 45
128 45
127 46
128 46
127 47
128 47
127 48
128 48
127 49
128 49
As commented: subtract the rgb value from the array, square it, average (or sum) over the rgb channels of each pixel, and get the minimum.
Here is my variant:
import numpy
rgb_value = numpy.array([17,211,51])
img = numpy.random.randint(255, size=(1000,1000,3),dtype=numpy.uint8)
img_goal = numpy.average(numpy.square(numpy.subtract(img, rgb_value)), axis=2)
result = numpy.where(img_goal == numpy.amin(img_goal))
result_list = [result[0].tolist(),result[1].tolist()]
for i in range(len(result_list[0])):
    print("RGB needed:", rgb_value)
    print("Pixel:", result_list[0][i], result_list[1][i])
    print("RGB gotten:", img[result_list[0][i]][result_list[1][i]])
    print("Distance to value:", img_goal[result_list[0][i]][result_list[1][i]])
There can be multiple results with the same values.

Parallel processing image analyzer function in Python

I have created a function, imgs_to_df() (which relies on img_to_vec()) that takes a list of URLs that point to a JPG (e.g. https://live.staticflickr.com/65535/48123413937_54bb53e98b_o.jpg), resizes it, and converts the URLs to a dataframe of RGB values, where each row is a different image, and each column is the R, G, or B value of the pixel of the (resized) image.
However, the function is very slow, especially once it gets into lists of hundreds or thousands of links, so I need a way to parallelize or otherwise make the process much, much faster. I'd also like an easy way to match the URLs back with the RGB vectors after I'm done.
I am very new to parallel processing and everything I have read is just confusing me even more.
from PIL import Image
from io import BytesIO
import urllib.request
import requests
import numpy as np
import pandas as pd
def img_to_vec(jpg_url, resize=True, new_width=300, new_height=300):
    """ Takes a URL of an image, resizes it (optional), and converts it to a
    vector representing RGB values.

    Parameters
    ----------
    jpg_url: String. A URL that points to a JPG image.
    resize: Boolean. Default True. Whether image should be resized before calculating RGB.
    new_width: Int. Default 300. New width to convert image to before calculating RGB.
    new_height: Int. Default 300. New height to convert image to before calculating RGB.

    Returns
    -------
    rgb_vec: Vector of size 3*new_width*new_height for the RGB values in each pixel of the image.
    """
    response = requests.get(jpg_url)  # Create remote image connection
    img = Image.open(BytesIO(response.content))  # Save image connection (NOT actual image)
    if resize:
        img = img.resize((new_width, new_height))
    rgb_img = np.array(img)  # Create matrix of RGB values
    rgb_vec = rgb_img.ravel()  # Flatten 3D matrix of RGB values to a vector
    return rgb_vec
# Consider parallel processing here
def imgs_to_df(jpg_urls, common_width=300, common_height=300):
    """ Takes a list of jpg_urls and converts it to a dataframe of RGB values.

    Parameters
    ----------
    jpg_urls: A list of jpg_urls to be resized and converted to a dataframe of RGB values.
    common_width: Int. Default 300. New width to convert all images to before calculating RGB.
    common_height: Int. Default 300. New height to convert all images to before calculating RGB.

    Returns
    -------
    rgb_df: Pandas dataframe of dimensions len(jpg_urls) rows and common_width*common_height*3
            columns. Each row is a unique jpeg image, and each column is an R/G/B value of
            a particular pixel of the resized image
    """
    assert common_width > 0 and common_height > 0, 'Error: invalid new_width or new_height dimensions'
    for url_idx in range(len(jpg_urls)):
        if url_idx % 100 == 0:
            print('Converting url number {urlnum} of {urltotal} to RGB '.format(urlnum=url_idx, urltotal=len(jpg_urls)))
        try:
            img_i = img_to_vec(jpg_urls[url_idx])
            if url_idx == 0:
                vecs = img_i
            else:
                try:
                    vecs = np.vstack((vecs, img_i))
                except:
                    vecs = np.vstack((vecs, np.array([-1]*common_width*common_height*3)))
                    print('Warning: Error in converting {error_url} to RGB'.format(error_url=jpg_urls[url_idx]))
        except:
            vecs = np.vstack((vecs, np.array([-1]*common_width*common_height*3)))
            print('Warning: Error in converting {error_url} to RGB'.format(error_url=jpg_urls[url_idx]))
    rgb_df = pd.DataFrame(vecs)
    return rgb_df
You can use a ThreadPool, as your task is I/O-bound.
I'm using concurrent.futures. Your function needs to be re-written so that it takes a single URL and turns it into a df.
I added two snippets: one simply uses a loop, the other uses threading. The second one is much, much faster.
from PIL import Image
from io import BytesIO
import urllib.request
import requests
import numpy as np
import pandas as pd

def img_to_vec(jpg_url, resize=True, new_width=300, new_height=300):
    """ Takes a URL of an image, resizes it (optional), and converts it to a
    vector representing RGB values.

    Parameters
    ----------
    jpg_url: String. A URL that points to a JPG image.
    resize: Boolean. Default True. Whether image should be resized before calculating RGB.
    new_width: Int. Default 300. New width to convert image to before calculating RGB.
    new_height: Int. Default 300. New height to convert image to before calculating RGB.

    Returns
    -------
    rgb_vec: Vector of size 3*new_width*new_height for the RGB values in each pixel of the image.
    """
    response = requests.get(jpg_url)  # Create remote image connection
    img = Image.open(BytesIO(response.content))  # Save image connection (NOT actual image)
    if resize:
        img = img.resize((new_width, new_height))
    rgb_img = np.array(img)  # Create matrix of RGB values
    rgb_vec = rgb_img.ravel()  # Flatten 3D matrix of RGB values to a vector
    return rgb_vec

def imgs_to_df(jpg_url, common_width=300, common_height=300):
    assert common_width > 0 and common_height > 0, 'Error: invalid new_width or new_height dimensions'
    try:
        img_i = img_to_vec(jpg_url)
        vecs = img_i
        try:
            vecs = np.vstack((vecs, img_i))
        except:
            vecs = np.vstack((vecs, np.array([-1]*common_width*common_height*3)))
            print('Warning: Error in converting {error_url} to RGB'.format(error_url=jpg_url))
    except:
        print('failed')
    rgb_df = pd.DataFrame(vecs)
    return rgb_df
img_urls = ['https://upload.wikimedia.org/wikipedia/commons/thumb/a/a5/Flower_poster_2.jpg/1200px-Flower_poster_2.jpg', 'https://www.tiltedtulipflorist.com/assets/1/14/DimFeatured/159229xL_HR_fd_3_6_17.jpg?114702&value=217',
'https://upload.wikimedia.org/wikipedia/commons/thumb/a/a5/Flower_poster_2.jpg/1200px-Flower_poster_2.jpg', 'https://upload.wikimedia.org/wikipedia/commons/thumb/a/a5/Flower_poster_2.jpg/1200px-Flower_poster_2.jpg']
import time

t1 = time.time()
dfs = []
for iu in img_urls:
    df = imgs_to_df(iu)
    dfs.append(df)
t2 = time.time()
print(t2-t1)
print(dfs)

# approach with multi-threading
import concurrent.futures

t1 = time.time()
with concurrent.futures.ThreadPoolExecutor() as executor:
    dfs = [df for df in executor.map(imgs_to_df, img_urls)]
t2 = time.time()
print(t2-t1)
print(dfs)
Out:
3.540484666824341
[ 0 1 2 3 ... 269996 269997 269998 269999
0 240 240 237 251 ... 247 243 243 243
1 240 240 237 251 ... 247 243 243 243
[2 rows x 270000 columns], 0 1 2 3 ... 269996 269997 269998 269999
0 255 255 255 255 ... 93 155 119 97
1 255 255 255 255 ... 93 155 119 97
[2 rows x 270000 columns], 0 1 2 3 ... 269996 269997 269998 269999
0 240 240 237 251 ... 247 243 243 243
1 240 240 237 251 ... 247 243 243 243
[2 rows x 270000 columns], 0 1 2 3 ... 269996 269997 269998 269999
0 240 240 237 251 ... 247 243 243 243
1 240 240 237 251 ... 247 243 243 243
[2 rows x 270000 columns]]
1.2170848846435547
[ 0 1 2 3 ... 269996 269997 269998 269999
0 240 240 237 251 ... 247 243 243 243
1 240 240 237 251 ... 247 243 243 243
[2 rows x 270000 columns], 0 1 2 3 ... 269996 269997 269998 269999
0 255 255 255 255 ... 93 155 119 97
1 255 255 255 255 ... 93 155 119 97
[2 rows x 270000 columns], 0 1 2 3 ... 269996 269997 269998 269999
0 240 240 237 251 ... 247 243 243 243
1 240 240 237 251 ... 247 243 243 243
[2 rows x 270000 columns], 0 1 2 3 ... 269996 269997 269998 269999
0 240 240 237 251 ... 247 243 243 243
1 240 240 237 251 ... 247 243 243 243
[2 rows x 270000 columns]]
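One more point, since the question also asked about matching URLs back to the RGB values: executor.map returns results in the same order as the input iterable, so the pairing is a zip away:
# results come back in input order, so URLs line up with their dataframes
url_to_df = dict(zip(img_urls, dfs))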
