Why is Tesseract number recognition not working properly? - python

I've been working with pytesseract for the past few days, and I've noticed that the library is quite bad at identifying numbers. I don't know if I'm doing something wrong, but I keep getting ♀ as output.
import cv2
import pytesseract
from PIL import ImageGrab

class Image_Recognition():
    def digit_identification(self):
        # save normal screenshot
        screen = ImageGrab.grab(bbox=(706, 226, 1200, 726))
        screen.save(r'tmp\tmp.png')

        # read the image file (flag 2 = cv2.IMREAD_ANYDEPTH)
        img = cv2.imread(r'tmp\tmp.png', 2)

        # convert to binary image
        ret, bw_img = cv2.threshold(img, 200, 255, cv2.THRESH_BINARY)

        # use OCR library to identify numbers in screenshot
        text = pytesseract.image_to_string(bw_img)
        print(text)
INPUT:
(Converted to a binary image in order to make numbers more intelligible.)
OUTPUT:
♀
Tell me if there is something off, or just suggest other approaches for handling text recognition.

First of all, please read the article Improving the quality of the output, especially the section regarding the page segmentation method. Also, you can limit the characters to be found to digits 0-9.
You have a tiny image, which makes extraction of all numbers at once quite challenging, especially for the mixture of bright text on dark background and vice versa. But, you can quite easily crop all the single tiles, and extract the numbers one by one. So, no distinction between these two types of tiles needs to be made.
Also, you know that the numbers must be powers of two (I guess most people will know 2048). So, if no such number can be found, try upscaling the cropped tile and repeat. (Eventually, give up after a few attempts.)
That'd be my full code:
import cv2
import math
import pytesseract

# https://www.geeksforgeeks.org/python-program-to-find-whether-a-no-is-power-of-two/
def log2(x):
    return math.log10(x) / math.log10(2)

def is_power_of_2(n):
    return math.ceil(log2(n)) == math.floor(log2(n))

# Load image, get dimensions of a single tile
img = cv2.imread('T72q4s.png')
h, w = [x // 4 for x in img.shape[:2]]

# Initialize result array (too lazy to import NumPy for that...)
a = cv2.resize(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY), (4, 4)).astype(int)

# https://tesseract-ocr.github.io/tessdoc/ImproveQuality.html#page-segmentation-method
# https://stackoverflow.com/q/4944830/11089932
config = '--psm 6 -c tessedit_char_whitelist=0123456789'

# Iterate tiles, and extract texts
for i in range(4):
    for j in range(4):

        # Crop tile
        x1 = i * w
        x2 = (i + 1) * w
        y1 = j * h
        y2 = (j + 1) * h
        roi = img[y1:y2, x1:x2]

        # If no proper power of 2 is found, upscale image and repeat
        while True:
            text = pytesseract.image_to_string(roi, config=config)
            text = text.replace('\n', '').replace('\f', '')
            if (text == '') or (not is_power_of_2(int(text))):
                roi = cv2.resize(roi, (0, 0), fx=2, fy=2)
                if roi.shape[0] > 1000:
                    a[j, i] = -1
                    break
            else:
                a[j, i] = int(text)
                break

print(a)
For the given image, I get the following output:
[[ 8 16  4  2]
 [ 2  8 32  8]
 [ 2  4 16  4]
 [ 4  2  4  2]]
For another similar image
I get:
[[ 4 -1 -1 -1]
 [ 2  2 -1 -1]
 [-1 -1 -1 -1]
 [ 2 -1 -1 -1]]
----------------------------------------
System information
----------------------------------------
Platform: Windows-10-10.0.19041-SP0
Python: 3.9.1
PyCharm: 2021.1.3
OpenCV: 4.5.3
pytesseract: 5.0.0-alpha.20201127
----------------------------------------

Related

How to randomly generate scratch like lines with opencv automatically

I am trying to generate synthetic images for my deep learning model. I need to draw scratches on a black surface. I already have a little script that can generate random white scratch-like lines, but only horizontal ones. I need the scratches to also be vertical and curved. On top of that, it would be very helpful if the thickness of the scratches were also random, so I get both thick and thin scratches.
This is my code so far:
import cv2
import numpy as np
import random

height = 384
width = 384
blank_image = np.zeros((height, width, 3), np.uint8)

num_scratches = random.randint(0, 5)
for _ in range(num_scratches):
    row_random = random.randint(20, 370)
    blank_image[row_random:(row_random + 1), row_random:(row_random + random.randint(25, 75))] = (255, 255, 255)

cv2.imshow("synthetic", blank_image)
cv2.waitKey(0)
cv2.destroyAllWindows()
This is one example result outcome:
How do I have to edit my script so I can get more diverse looking scratches?
The scratches should somehow look like this for example (Done with paint):
need the scratches to also be vertically
Your method might be adapted as follows:
import numpy as np  # cv2 reads images into np.array

img = np.zeros((5, 5), dtype='uint8')  # same as loading a 5 x 5 px black rectangle
img[1:4, 2:3] = 255
print(img)
Output:
[[  0   0   0   0   0]
 [  0   0 255   0   0]
 [  0   0 255   0   0]
 [  0   0 255   0   0]
 [  0   0   0   0   0]]
Explanation: I set all elements (pixels) with a y-coordinate between 1 (inclusive) and 4 (exclusive) and an x-coordinate between 2 (inclusive) and 3 (exclusive) to 255 (white).
Nonetheless, cv2 provides a function for drawing lines, namely cv2.line, which is handier to use. It accepts the image to draw on, a start point, an end point, a color, and a thickness; the docs give the following example:
# Draw a diagonal blue line with thickness of 5 px
img = cv2.line(img,(0,0),(511,511),(255,0,0),5)
If you are working in grayscale, use a single value rather than a 3-tuple as the color.
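To get the more diverse scratches asked for (arbitrary orientation, curves, random thickness), a minimal sketch along these lines might work; the scratch counts, jitter ranges, and thickness bounds are arbitrary assumptions, not values from the question:
import cv2
import numpy as np
import random

height, width = 384, 384
blank_image = np.zeros((height, width, 3), np.uint8)

# straight scratches with random orientation and thickness
for _ in range(random.randint(1, 5)):
    p1 = (random.randint(0, width - 1), random.randint(0, height - 1))
    p2 = (random.randint(0, width - 1), random.randint(0, height - 1))
    cv2.line(blank_image, p1, p2, (255, 255, 255), random.randint(1, 3))

# rough curved scratches: chain short, jittered segments with cv2.polylines
for _ in range(random.randint(1, 3)):
    x = random.randint(0, width - 1)
    y = random.randint(0, height - 1)
    points = []
    for _ in range(8):
        points.append((x, y))
        x = min(max(x + random.randint(-20, 20), 0), width - 1)
        y = min(max(y + random.randint(-20, 20), 0), height - 1)
    cv2.polylines(blank_image, [np.array(points, np.int32)], False, (255, 255, 255), random.randint(1, 2))

cv2.imshow("synthetic", blank_image)
cv2.waitKey(0)
cv2.destroyAllWindows()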

How to split an image horizontally into equal-sized pieces?

I am trying to split this input image into three/four equal-sized pieces horizontally but I am not getting the desired output. I don't know what I am doing wrong.
Here is the code which I wrote to split it:
import os.path
import numpy as np
from PIL import Image

input_1 = "/home/task-1/split_operation/111.jpg"
outputPath = "/home/task-1/split_operation/"

im = Image.open(input_1)
x_width, y_height = im.size
split = int(x_width / 3)  # np.int in the original; it was just an alias of int

outputFileFormat = "{0}-{1}.jpg"
baseName = "cropped_1"

for i in range(0, x_width, split):
    x = split + i
    box = (x, 0, x + split, y_height)
    a = im.crop(box)
    a.load()
    outputName = os.path.join(outputPath, outputFileFormat.format(baseName, i + 1))
    a.save(outputName, "JPEG")
The input image:
The output images I currently get:
As you can see, the last two images are black. I don't know why I am getting black images.
Your problem is that when i=0, your first x is split, which means you skip the first piece. You can actually see that the first output doesn't align with the left edge of the original. In addition, you are rounding the pixels to use as a step for the range function, but this creates problematic behaviour: your image's width is 800 pixels, so your split is 266. This means the last i value is 798, which is why the third image has a thin line of 2 pixels at the left edge.
A better way is to generate all the "edges" using np.linspace. This guarantees the right number of pictures at exactly the right sizes, instead of bothering with range loops and width calculations. Just create the edges list and use it in the loop as follows:
...
pictures = 3
edges = np.linspace(0, x_width, pictures + 1)
for start, end in zip(edges[:-1], edges[1:]):
    box = (start, 0, end, y_height)
    ...
The rest stays the same.
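Put together, a complete version might look like this (paths and file names taken from the question; int() casts added because np.linspace returns floats):
import os.path
import numpy as np
from PIL import Image

im = Image.open("/home/task-1/split_operation/111.jpg")
x_width, y_height = im.size

pictures = 3
edges = np.linspace(0, x_width, pictures + 1)

for n, (start, end) in enumerate(zip(edges[:-1], edges[1:]), start=1):
    box = (int(start), 0, int(end), y_height)
    piece = im.crop(box)
    piece.save(os.path.join("/home/task-1/split_operation/", f"cropped_1-{n}.jpg"), "JPEG")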
Running this with pictures = 3 will produce:
Running with pictures = 4 will produce:
I will assume that the width of the image is 900 px; then split = 300.
Let's look at the first few iterations of your for loop:
i = 0; x = 300; x + split = 600
i = 300; x = 300 + 300 = 600; x + split = 900
...
As you can see, part of the image (0-300 px) is missing.
It could be easily fixed:
for i in range(0, x_width, split):
    box = (i, 0, i + split, y_height)
    ...
The reason you are getting black images is simply that you try to crop pixels that are not part of the image; Pillow fills those out-of-bounds regions with black.
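A quick demonstration of that behaviour, using a hypothetical plain white 800 x 600 test image; Pillow pads the out-of-bounds part of the crop box with black (zero) pixels instead of raising an error:
from PIL import Image

im = Image.new('RGB', (800, 600), 'white')   # plain white test image
out = im.crop((600, 0, 1200, 600))           # right part of the box lies outside
print(out.getpixel((0, 0)))                  # (255, 255, 255) -> real image data
print(out.getpixel((300, 0)))                # (0, 0, 0) -> black padding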

Optimization of basic image processing in Python

I'm trying to accomplish some basic image processing. Here is my algorithm:
Take the R value of the n-th pixel in a row, the G value of the (n+1)-th, and the B value of the (n+2)-th, and build one new output pixel from these values.
Here is my example code in Python:
import glob
import ntpath
import time
from multiprocessing.pool import ThreadPool as Pool

import matplotlib.pyplot as plt
import numpy as np
from PIL import Image

images = glob.glob('model/*.png')
pool_size = 17

def worker(image_file):
    try:
        new_image = np.zeros((2400, 1280, 3), dtype=np.uint8)
        image_name = ntpath.basename(image_file)
        print(f'Processing [{image_name}]')
        image = Image.open(image_file)
        data = np.asarray(image)
        for i in range(0, 2399):
            for j in range(0, 1279):
                pix_x = j * 3 + 1
                red = data[i, pix_x - 1][0]
                green = data[i, pix_x][1]
                blue = data[i, pix_x + 1][2]
                new_image[i, j] = [red, green, blue]
        im = Image.fromarray(new_image)
        im.save(f'export/{image_name}')
    except:
        print('error with item')

pool = Pool(pool_size)
for image_file in images:
    pool.apply_async(worker, (image_file,))
pool.close()
pool.join()
My input and output images are in RGB format. My code takes 5 seconds per image. I'm open to any ideas for optimizing this task.
Here are example input and output images:
Input image [3840 x 2400]
Output image [1280 x 2400]
Here is an approach:
import cv2
import numpy as np
# Load input image
im = cv2.imread('input.png')
# Calculate new first layer - it is every 3rd pixel of the first layer of im
n1 = im[:, ::3, 0]
# Calculate new second layer - it is every 3rd pixel of the second layer of im, starting with an offset of 1 pixel
n2 = im[:, 1::3, 1]
# Calculate new third layer - it is every 3rd pixel of the third layer of im, starting with an offset of 2 pixels
n3 = im[:, 2::3, 2]
# Now stack the three new layers to make a new output image
res = np.dstack((n1,n2,n3))
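As a side note, the three slices above are plain NumPy views and np.dstack is vectorized, so this eliminates the Python-level double loop entirely, which is where the original 5 seconds were spent. Writing the result out is then just (file name assumed):
cv2.imwrite('output.png', res)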
As far as I understood from the question, you want to shift the pixel values of each channel of the input image in the output image. So, here is my approach.
im = cv2.cvtColor(cv2.imread('my_image.jpg'), cv2.COLOR_BGR2RGB)
im = np.pad(im, [(3, 3), (3, 3), (0, 0)], mode='constant', constant_values=0)  # add padding to enable the shifting later

r = im[:, :, 0]

g = im[:, :, 1]
g = np.delete(g, np.s_[-1], axis=1)           # remove the last column
temp_pad = np.zeros(shape=(g.shape[0], 1))    # replacement for the removed part
g = np.concatenate((temp_pad, g), axis=1)     # put the removed part back (shift right by 1)

b = im[:, :, 2]
b = np.delete(b, np.s_[-2::], axis=1)         # remove the last two columns
temp_pad = np.zeros(shape=(b.shape[0], 2))    # replacement for the removed parts
b = np.concatenate((temp_pad, b), axis=1)     # put the removed parts back (shift right by 2)

new_im = np.dstack((r, g, b))                 # merge the channels
new_im = new_im[3:-3, 3:-3, :] / np.amax(new_im)  # *255  # remove the padding
Basically, I achieved the shifting by padding and merging the green and blue channels. Let me know if this is what you are looking for. Good luck :)
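One follow-up note of mine, not part of the original answer: new_im ends up as floats in [0, 1] after the division by np.amax, so continuing from the code above, rescale and convert before saving it as an 8-bit image, e.g.:
cv2.imwrite('shifted.png', cv2.cvtColor((new_im * 255).astype(np.uint8), cv2.COLOR_RGB2BGR))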

PyTesseract - Text broken by horizontal white lines

It is a classic PyTesseract problem of noisy image scanning. However, in this case a dot-matrix printer is printing some horizontal white lines through the text. Attached are some samples. I am not sure what kind of preprocessing will improve the scanning of the text.
Using the command below, the following output comes for the sample below:
tesseract test.png stdout --psm 6 --dpi 120
Output (expected is "RVC 64.80%"):
PRVG
64.5056"
For the above image, pytesseract gives
152.00 KILOGRAW
817.51 USO
and the expected output is "152.00 KILOGRAM 617.51 USD".
I know the images are noisy, so please do not post the obvious answer that the output is bad because the images are noisy. Since I always get the same kind of text from the printer, I can apply the same type of preprocessing.
Here is the code handling the first picture:
from PIL import Image
import numpy as np
import pytesseract
import time
from collections import Counter

img = Image.open('OCR.png').convert('L')
pixelArray = img.load()
threshold = 240
table = []

for y in range(img.size[1]):  # binarization
    List = []
    for x in range(img.size[0]):
        if pixelArray[x, y] < threshold:
            List.append(0)
        else:
            List.append(256)
    table.append(List)

img = Image.fromarray(np.array(table))
img.show()

def operation(image):
    resultList = []
    pixelList = image.load()
    flag = False
    for y in range(image.size[1]):
        temp = []
        linePixel = 0
        for x in range(image.size[0]):
            if not pixelList[x, y]:
                linePixel += 1
            temp.append(pixelList[x, y])
        if linePixel >= 35:  # judge the black dots in one line
            flag = True
            resultList.append(temp)
        elif flag:
            # resultList.append([0] * image.size[0])  # to check the handled lines
            flag = False
        else:
            resultList.append([256] * image.size[0])
    return Image.fromarray(np.array(resultList))

for i in range(6):
    img = operation(img)
img.show()
print(pytesseract.image_to_string(img, config='--psm 6'))
The first measure for handling it (binarization):
The second measure is to remove the white lines (judging by the number of black pixels in each line):
And finally, the result is:
"RVC
64.80%"

How to create CMYK halftone Images from a color image?

I am working on a project that requires me to separate out each color in a CMYK image and generate a halftone image that will be printed on a special halftone printer. The method used is analogous to silk screening in that the process is almost identical. Take a photo and break out each color channel. Then produce a screen for the halftone. Each color screen must have its screen skewed by 15-45 (adjustable) degrees. Dot size and LPI must be calculated from user-configurable values to achieve different effects. This process, I am told, is used in silk screening, but I have been unable to locate any information that explains CMYK halftoning. I find plenty for reducing to a single color and generating a new print-style b/w halftone image.
I would guess that I need to:
split the file into its color channels.
generate a monochrome halftone image for each channel.
skew the resultant halftone image by the number of degrees * channel number.
Does anyone know if this is the correct approach, or of any existing Python code for this? Or of any good explanations for this process or its algorithms?
I used to run a screen printing studio (it was a fairly small one), and although I have never actually done colour separation printing, I am reasonably familiar with the principles. This is how I would approach it:
Split the image into C, M, Y, K.
Rotate each separated image by 0, 15, 30, and 45 degrees respectively.
Take the half-tone of each image (dot size will be proportional to the intensity).
Rotate back each half-toned image.
Now you have your colour separated images. As you mention, the rotation step reduces dot alignment issues (which would mess everything up), and things like Moiré pattern effects will be reasonably minimized.
This should be pretty easy to code using PIL.
Update 2:
I wrote some quick code that will do this for you, it also includes a GCR function (described below):
from PIL import Image, ImageDraw, ImageStat

def gcr(im, percentage):
    '''Basic "Gray Component Replacement" function. Returns a CMYK image with
    percentage gray component removed from the CMY channels and put in the
    K channel, i.e. for percentage=100, (41, 100, 255, 0) >> (0, 59, 214, 41)'''
    cmyk_im = im.convert('CMYK')
    if not percentage:
        return cmyk_im
    cmyk_im = cmyk_im.split()
    cmyk = []
    for i in range(4):
        cmyk.append(cmyk_im[i].load())
    for x in range(im.size[0]):
        for y in range(im.size[1]):
            gray = min(cmyk[0][x, y], cmyk[1][x, y], cmyk[2][x, y]) * percentage // 100
            for i in range(3):
                cmyk[i][x, y] = cmyk[i][x, y] - gray
            cmyk[3][x, y] = gray
    return Image.merge('CMYK', cmyk_im)

def halftone(im, cmyk, sample, scale):
    '''Returns list of half-tone images for cmyk image. sample (pixels)
    determines the sample box size from the original image. The maximum
    output dot diameter is given by sample * scale (which is also the number
    of possible dot sizes). So sample=1 will preserve the original image
    resolution, but scale must be >1 to allow variation in dot size.'''
    cmyk = cmyk.split()
    dots = []
    angle = 0
    for channel in cmyk:
        channel = channel.rotate(angle, expand=1)
        size = channel.size[0] * scale, channel.size[1] * scale
        half_tone = Image.new('L', size)
        draw = ImageDraw.Draw(half_tone)
        for x in range(0, channel.size[0], sample):
            for y in range(0, channel.size[1], sample):
                box = channel.crop((x, y, x + sample, y + sample))
                stat = ImageStat.Stat(box)
                diameter = (stat.mean[0] / 255) ** 0.5
                edge = 0.5 * (1 - diameter)
                x_pos, y_pos = (x + edge) * scale, (y + edge) * scale
                box_edge = sample * diameter * scale
                draw.ellipse((x_pos, y_pos, x_pos + box_edge, y_pos + box_edge), fill=255)
        half_tone = half_tone.rotate(-angle, expand=1)
        width_half, height_half = half_tone.size
        xx = (width_half - im.size[0] * scale) // 2
        yy = (height_half - im.size[1] * scale) // 2
        half_tone = half_tone.crop((xx, yy, xx + im.size[0] * scale, yy + im.size[1] * scale))
        dots.append(half_tone)
        angle += 15
    return dots

im = Image.open("1_tree.jpg")
cmyk = gcr(im, 0)
dots = halftone(im, cmyk, 10, 1)
im.show()
new = Image.merge('CMYK', dots)
new.show()
This will turn this:
into this (blur your eyes and move away from the monitor):
Note that the image sampling can be pixel by pixel (thus preserving the resolution of the original image, in the final image). Do this by setting sample=1, in which case you need to set scale to a larger number so that there are a number of possible dot sizes. This will also result in a larger output image size (original image size * scale ** 2, so watch out!).
By default when you convert from RGB to CMYK the K channel (the black channel) is empty. Whether you need the K channel or not depends upon your printing process. There are various possible reasons you might want it: getting a better black than the overlap of CMY, saving ink, improving drying time, reducing ink bleed, etc. Anyhow I've also written a little Grey component replacement function GCR, so you can set the percentage of K channel you want to replace CMY overlap with (I explain this a little further in the code comments).
Here are a couple of examples to illustrate, processing the letter F from the image with sample=1 and scale=8, so fairly high resolution.
The 4 CMYK channels, with percentage=0, so empty K channel:
combines to produce:
CMYK channels, with percentage=100, so the K channel is used. You can see the cyan channel is fully suppressed, and the magenta and yellow channels use a lot less ink, in the black band at the bottom of the image:
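For reference, a hedged usage sketch of the two settings described above, reusing gcr and halftone from the code; the input file name is an assumption (a crop containing the letter F):
im = Image.open('letter_F.jpg')                  # hypothetical input crop
dots_no_k = halftone(im, gcr(im, 0), 1, 8)       # percentage=0: empty K channel
dots_full_k = halftone(im, gcr(im, 100), 1, 8)   # percentage=100: full GCR
Image.merge('CMYK', dots_no_k).show()
Image.merge('CMYK', dots_full_k).show()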
My solution also uses PIL, but relies on its internal dithering method (Floyd-Steinberg). That creates artifacts, though, so I am considering rewriting its C code.
from PIL import Image

im = Image.open('tree.jpg')                   # open RGB image
cmyk = im.convert('CMYK').split()             # RGB contone to CMYK contone
c = cmyk[0].convert('1').convert('L')         # halftone ('1') each plane...
m = cmyk[1].convert('1').convert('L')         # ...and convert back to 'L' mode
y = cmyk[2].convert('1').convert('L')
k = cmyk[3].convert('1').convert('L')
new_cmyk = Image.merge('CMYK', [c, m, y, k])  # put together all 4 planes
new_cmyk.save('tree-cmyk.jpg')                # and save to file
The implicit GCR that PIL applies could also be expanded into a more generic one, but I have tried to describe a simple solution, in which resolution and sampling are also ignored.
With ordered (Bayer) dithering using the D4 pattern, we can modify @fraxel's code to produce an output dithered image with 4 times the height and width of the input color image, as shown below:
import matplotlib.pylab as plt
import numpy as np
from PIL import Image

def ordered_dither(im, D4):
    im = (15 * (im / im.max())).astype(np.uint8)
    h, w = im.shape
    im_out = np.zeros((4 * h, 4 * w), dtype=np.uint8)
    x, y = 0, 0
    for i in range(h):
        for j in range(w):
            im_out[x:x + 4, y:y + 4] = 255 * (D4 < im[i, j])
            y = (y + 4) % (4 * w)
        x = (x + 4) % (4 * h)
    return im_out

def color_halftoning_Bayer(cmyk, angles, D4):
    out_channels = []
    for i in range(4):
        out_channel = Image.fromarray(
            ordered_dither(np.asarray(cmyk[i].rotate(angles[i], expand=1)), D4)
        ).rotate(-angles[i], expand=1)
        x = (out_channel.size[0] - cmyk[i].size[0] * 4) // 2
        y = (out_channel.size[1] - cmyk[i].size[1] * 4) // 2
        out_channel = out_channel.crop((x, y, x + cmyk[i].size[0] * 4, y + cmyk[i].size[1] * 4))
        out_channels.append(out_channel)
    return Image.merge('CMYK', out_channels)

image = Image.open('images/tree.jpg')
cmyk = gcr(image, 100).split()  # gcr() as defined in the answer above

D4 = np.array([[ 0,  8,  2, 10],
               [12,  4, 14,  6],
               [ 3, 11,  1,  9],
               [15,  7, 13,  5]], dtype=np.uint8)

out = np.asarray(color_halftoning_Bayer(cmyk, np.linspace(15, 60, 4), D4).convert('RGB'))
plt.figure(figsize=(20, 20))
plt.imshow(out)
plt.show()
The above code, when run on the following tree input,
it generates the following dithered output:
