PyTesseract - Text broken by horizontal white lines

PyTesseract - Text broken by horizontal white lines - python

It is a classic PyTesseract problem of noisy image scanning. However, in this case dot matrix printer is printing some horizontal white lines in the text. Attached are some samples. I am not sure what kind of preprocessing will improve the scanning of the text.
Using below command following output comes for below sample:
tesseract test.png stdout --psm 6 --dpi 120
Output: (Expected is "RVC 64.80%" )
PRVG
64.5056"
For Above image pytesseract gives
152.00 KILOGRAW
817.51 USO
and the expected is - 152.00 KILOGRAM 617.51 USD
I know the images are noisy so please do not post obvious answer that as the images are noisy so the output is bad. As I always get same text from the printer so I can apply same type of preprocessing.

The first one picture,handle code:
from PIL import Image
import numpy as np
import pytesseract
import time
from collections import Counter
img = Image.open('OCR.png').convert('L')
pixelArray = img.load()
threshold = 240
table = []
for y in range(img.size[1]): # binaryzation
List = []
for x in range(img.size[0]):
if pixelArray[x,y] < threshold:
List.append(0)
else:
List.append(256)
table.append(List)
img = Image.fromarray(np.array(table))
img.show()
def operation(image):
resultList = []
pixelList = image.load()
flag = False
for y in range(image.size[1]):
temp = []
linePixel = 0
for x in range(image.size[0]):
if not pixelList[x,y]:
linePixel += 1
temp.append(pixelList[x,y])
if linePixel >= 35: # judge the black dot in one line
flag = True
resultList.append(temp)
elif flag:
# resultList.append([0]*image.size[0]) # to check the handling lines
flag = False
else:
resultList.append([256] * image.size[0])
return Image.fromarray(np.array(resultList))
for i in range(6):
img = operation(img)
img.show()
print(pytesseract.image_to_string(img,config='--psm 6'))
The first measure for handling(binaryzation):
The second measure is to remove the white line(judge black pixels in one line):
And finally,the result is:
"RVC
64.80%"

Related

Would this be the correct method of finding the position of a pixel underneath another pixel?

def detect_edges (image: Image, threshold: float) -> Image:
new_image = copy(image)
for x,y, (r,g,b) in image:
brightness = (r+g+b)/3
if abs(brightness-brightness(x,y-1)) > threshold:
red = 0
green = 0
blue = 0
else:
red = 255
green = 255
blue = 255
det_im = create_color(red,green,blue)
set_color(new_image,x,y,det_im)
return new_image
I am writing this code to check the individual pixels of every point within an image and I need to compare the brightness of that point with the brightness of the point directly underneath it. Would this be the best way to go about solving this? I am also getting a 'float' object is not callable error with this code.

using this image as input
and this code:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Wed Feb 3 10:41:37 2021
#author: Pietro
"""
import numpy as np
from PIL import Image
# import copy
image1 = Image.open('imgQ.tif', mode='r')
print('image size : ', image1.size)
print('image mode : ', image1.mode)
image2 = image1.convert('RGB')
print('image size : ', image2.size)
print('image mode : ', image2.mode)
def detect_edges (image: Image, threshold: float) -> Image:
image2_copy = image.copy()
image2_copy.show()
x, y = image2_copy.size
arrayz = (np.random.randint(0,1 ,(x , y))).astype(np.float)
print('arrayz shape : ',arrayz.shape)
print('arrayz dim :',arrayz.ndim, arrayz[0,:].shape, arrayz[:,0].shape)
print('arrayz type :',arrayz.dtype)
# for i in arrayz:
# print(i)
# print(arrayz)
pixels = image2_copy.load() # create the pixel map
for x in range(image2_copy.size[0]):
for y in range(image2_copy.size[1]):
r,g,b = pixels[x,y]
brightness = (r+b+g)/3
arrayz[x,y]=brightness
print(arrayz)
for x in range(image2_copy.size[0]):
for y in range(image2_copy.size[1]):
if abs(arrayz[x,y]-arrayz[x-1,y-1]) > threshold:
print('********************')
pixels[x,y] = (0,0,0)
else:
pixels[x,y] = (255,255,255)
print(pixels)
image2_copy.show()
image2_copy.save('imgQ_copy.tif')
# return new_image
imageM = detect_edges(image2, 100)
# imageM.show()
in python3 I get as result tis image:
Not sure if thats what you were talking about,
and I belive there is something wrong in the approach used, if you zoom at the edges of the polygons the black edge is not present for some of the pixel, I think this is about the algorithm you were thinking, but could be due to my code as well, I am not versed on Images, hope someone better could shed some light on your question

How to improve accuracy of OpenCV matching results in Python

I'm trying to configure OpenCV within Python 3.6 to match a character icon (pattern) 1 with a box of characters 2. Nevertheless, the match is quite low, especially for shaded characters like 1.
I tried to solve it by using not only matchTemplate, but also comparing histograms, nevertheless - result is still poor.
I did try using gray-scale, colors, matching just a center of picture (cropped face), matching whole picture... resizing pattern to have exact dimension as it would be in a box... all combinations... and still this is VERY random (see attached image of correlation results)
Thank you in advance for help!
Here's the code:
import numpy as np
import cv2 as cv
from PIL import Image
import os
box = Image.open("/Users/user/Desktop/dbz/my_box.jpeg")
box.thumbnail((592,1053))
#conditions for each match step
character_threshold = 0.6 #checks in box
hist_threshold = 0.3
import numpy as np
import cv2 as cv
from PIL import Image
import os
box = Image.open("/Users/user/Desktop/dbz/my_box.jpeg")
box.thumbnail((592,1053))
#conditions for each match step
character_threshold = 0.6
hist_threshold = 0.3
for root, dirs, files in os.walk("/Users/user/Desktop/dbz/img/Super/TEQ/"):
for file in files:
if not file.startswith("."):
print("now " + file)
char = os.path.join(root, file)
#Opens and generate character's icon
character = Image.open(char)
character.thumbnail((153,139))
#Crops face from the character's icon and converts to grayscale CV object
face = character.crop((22,22,94,94)) #size 72x72 with centered face (should be 22,22,94,94)
face_array = np.array(face).astype(np.uint8)
face_array_gray = cv.cvtColor(face_array, cv.COLOR_RGB2GRAY)
#Converts the character's icon to grayscale CV object
character_array = np.array(character).astype(np.uint8)
character_array_gray = cv.cvtColor(character_array, cv.COLOR_RGB2GRAY)
#Converts box screen to grayscale CV object
box_array = np.array(box).astype(np.uint8)
box_array_gray = cv.cvtColor(box_array, cv.COLOR_RGB2GRAY)
#Check whether the face is in the box
character_score = cv.matchTemplate(box_array[:,:,2],face_array[:,:,2],cv.TM_CCOEFF_NORMED)
if character_score.max() > character_threshold:
ij = np.unravel_index(np.argmax(character_score),character_score.shape)
x, y = ij[::-1] #np returns lower-left coordinates, whilst PIL accepts upper, left,lower, right !!!
w, h = face_array_gray.shape
face.show()
found = box.crop((x,y,x+w,y+h)) #expand border to 25 pixels in each size (Best is (x-20,y-5,x+w,y+h+20))
#found.show()
#found_character = np.array(found_character).astype(np.uint8)
#found_character = cv.cvtColor(found_character, cv.COLOR_RGB2GRAY)
found_array = np.array(found).astype(np.uint8)
found_array_gray = cv.cvtColor(found_array, cv.COLOR_RGB2GRAY)
found_hist = cv.calcHist([found_array],[0,1,2],None,[8,8,8],[0,256,0,256,0,256])
found_hist = cv.normalize(found_hist,found_hist).flatten()
found_hist_gray = cv.calcHist([found_array_gray],[0],None,[8],[0,256])
found_hist_gray = cv.normalize(found_hist_gray,found_hist_gray).flatten()
face_hist = cv.calcHist([face_array],[0,1,2],None,[8,8,8],[0,256,0,256,0,256])
face_hist = cv.normalize(face_hist,face_hist).flatten()
face_hist_gray = cv.calcHist([face_array_gray],[0],None,[8],[0,256])
face_hist_gray = cv.normalize(face_hist_gray,face_hist_gray).flatten()
character_hist = cv.calcHist([character_array],[0,1,2],None,[8,8,8],[0,256,0,256,0,256])
character_hist = cv.normalize(character_hist,character_hist).flatten()
character_hist_gray = cv.calcHist([character_array_gray],[0],None,[8],[0,256])
character_hist_gray = cv.normalize(character_hist_gray,character_hist_gray).flatten()
hist_compare_result_CORREL = cv.compareHist(found_hist_gray, character_hist_gray,cv.HISTCMP_CORREL)
#hist_compare_result_CHISQR = cv.compareHist(found_hist_gray, character_hist_gray,cv.HISTCMP_CHISQR)
#hist_compare_result_INTERSECT = cv.compareHist(found_hist_gray, character_hist_gray,cv.HISTCMP_INTERSECT)
#hist_compare_result_BHATTACHARYYA = cv.compareHist(found_hist_gray, character_hist_gray,cv.HISTCMP_BHATTACHARYYA)
if (hist_compare_result_CORREL+character_score.max()) > 1:
print(f"Found {file} with a score:\n match:{character_score.max()}\n hist_correl: {hist_compare_result_CORREL}\n SUM:{hist_compare_result_CORREL+character_score.max()}", file=open("/Users/user/Desktop/dbz/out.log","a+"))
(1)
(2)

Here is a simple example of masked template matching in Python/OpenCV.
Image:
Transparent Template:
Template with alpha removed:
Template alpha channel extracted as mask image:
i
mport cv2
import numpy as np
# read image
img = cv2.imread('logo.png')
# read template with alpha
tmplt = cv2.imread('hat_alpha.png', cv2.IMREAD_UNCHANGED)
hh, ww = tmplt.shape[:2]
# extract template mask as grayscale from alpha channel and make 3 channels
tmplt_mask = tmplt[:,:,3]
tmplt_mask = cv2.merge([tmplt_mask,tmplt_mask,tmplt_mask])
# extract templt2 without alpha channel from tmplt
tmplt2 = tmplt[:,:,0:3]
# do template matching
corrimg = cv2.matchTemplate(img,tmplt2,cv2.TM_CCORR_NORMED, mask=tmplt_mask)
min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(corrimg)
max_val_ncc = '{:.3f}'.format(max_val)
print("correlation match score: " + max_val_ncc)
xx = max_loc[0]
yy = max_loc[1]
print('xmatch =',xx,'ymatch =',yy)
# draw red bounding box to define match location
result = img.copy()
pt1 = (xx,yy)
pt2 = (xx+ww, yy+hh)
cv2.rectangle(result, pt1, pt2, (0,0,255), 1)
cv2.imshow('image', img)
cv2.imshow('template2', tmplt2)
cv2.imshow('template_mask', tmplt_mask)
cv2.imshow('result', result)
cv2.waitKey(0)
cv2.destroyAllWindows()
# save results
cv2.imwrite('logo_hat_match2.png', result)
Match location on input:
Match Information:
correlation match score: 1.000
xmatch = 417 ymatch = 44
Without the mask, the large green area in the template would mismatch in the input and lower the match score dramatically.

Change colors of photos in PIL

So in my coursework, I am supposed to change the different variations of colors and get an image that looks like the example provided. I have gotten halfway there, but for the life of me cannot figure out how to get the colors to change, even a little bit. Even if the code runs without errors, the colors will not change
i've tried numpy arrays, editing the pixel color, etc. I can not get anything to work.
import PIL
from PIL import Image
from PIL import ImageEnhance
from PIL import ImageDraw
from PIL import ImageFont
fnt = ImageFont.truetype('readonly/fanwood-webfont.ttf', 75)
# read image and convert to RGB
image=Image.open("readonly/msi_recruitment.gif")
image=image.convert('RGB')
drawing_object = ImageDraw.Draw(image)
# build a list of 9 images which have different brightnesses
enhancer=ImageEnhance.Brightness(image)
images=[]
x = 1
for i in range(0, 10):
pixels = img.load()
print(image.size)
x += 1
z = x
if x%3 == 1 :
z = 9
drawing_object.rectangle((0,450,800,325), fill='black')
drawing_object.text((20,350),'channel intensity o 0.{}'.format(z), font=fnt, fill=(255,255,255))
elif x%3 == 0:
z = 5
drawing_object.rectangle((0,450,800,325), fill='black')
drawing_object.text((20,350),'channel intensity o 0.{}'.format(z), font=fnt, fill=(255,255,255))
else:
z = 1
drawing_object.rectangle((0,450,800,325), fill='black')
drawing_object.text((20,350),'channel intensity o 0.{}'.format(z), font=fnt, fill=(255,255,255))
images.append(enhancer.enhance(10/10))
## create a contact sheet from different brightnesses
first_image=images[0]
contact_sheet=PIL.Image.new(first_image.mode, (first_image.width*3,first_image.height*3))
x=0
y=0
for img in images:
# Lets paste the current image into the contact sheet
contact_sheet.paste(img, (x, y) )
# Now we update our X position. If it is going to be the width of the image, then we set it to 0
# and update Y as well to point to the next "line" of the contact sheet.
if x+first_image.width == contact_sheet.width:
x=0
y=y+first_image.height
else:
x=x+first_image.width
# resize and display the contact sheet
contact_sheet = contact_sheet.resize((int(contact_sheet.width/2),int(contact_sheet.height/2) ))
display(contact_sheet)

So in general you can use OpenCVs ColorMap to change colors in your image:
import cv2
img = cv2.imread(r"<IMAGE PATH>")
img = cv2.applyColorMap(img, cv2.COLORMAP_JET) # Change colors of image with predefined colormap
# Display the new image
cv2.imshow("img", img)
cv2.waitKey()
Now if you want to create your own colormap instead of using a predefined one in OpenCV you can do it with cv2.LUT() (OpenCV Documentation)
Example Input:
Example Output:
And here would be a quick example of how to change Gamma with this approach:
def adjust_gamma(img, gamma=1.0):
assert (img.shape[0] == 1)
invGamma = 1.0 / gamma
table = np.array([((i / 255.0) ** invGamma) *
255 for i in np.arange(0, 256)]).astype("uint8")
new_img = cv2.LUT(np.array(img, dtype=np.uint8), table)
return new_img

It looks like you have a typo in your loop:
for i in range(0, 10):
# if-elses
...
images.append(enhancer.enhance(10/10))
You're enhancing the image with a factor of 1.0 at each iteration, which according to the docs:
Adjust image brightness.
This class can be used to control the brightness of an image. An enhancement factor of 0.0 gives a black image. A factor of 1.0 gives the original image.
Using z, which I suspect was your intention, will give you the tiers of brightness your comments describe:
for i in range(0, 10):
# if-elses
...
images.append(enhancer.enhance(z/10))
Edit: changed assumed value from i to z, based on the text you're writing.
I should also point out that you probably want to create a temp "watermark" image during the for-loop to apply the text/rectangle to. Since the ImageDraw objects modify the image in-place, you're applying the text on top of each other at each iteration, causing some weird text on the later images.

Converting graphs from a scanned document into data

I'm currently trying to write something that can extract data from some uncommon graphs in a book. I scanned the pages of the book, and by using opencv I would like to detect some features from the graphs in order to convert it into useable data. In the left graph I'm looking for the height of the "triangles" and in the right graph the distance from the center to the points where the dotted lines intersect with the gray area. In both cases I would like to convert these values into numeric data for further usage.
The first thing I thought of was detecting the lines of the charts, in the hopes I could somehow measure their length or position. For this I'm using the Hough Line Transform. The following snippet of code shows how far I've gotten already.
import numpy as np
import cv2
# Reading the image
img = cv2.imread('test2.jpg')
# Convert the image to grayscale
gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
# Apply edge detection
edges = cv2.Canny(gray,50,150,apertureSize = 3)
# Line detection
lines = cv2.HoughLinesP(edges,1,np.pi/180,100,minLineLength=50,maxLineGap=20)
for line in lines:
x1,y1,x2,y2 = line[0]
cv2.line(img,(x1,y1),(x2,y2),(0,0,255),2)
cv2.imwrite('linesDetected.jpg',img)
The only problem is that this detection algorithm is not accurate at all. At least not for me. And in order to extract some data from the charts, the detection of the lines should be somewhat accurate. Is their any way I could do this? Or is my strategy to detect lines just wrong in the first place? Should I maybe start with detecting something else, like circles,object sizes, contours or colors?

Using color segmentation is an easy way to convert this graph to data. This method does require some manual annotation. After the graph is segmented, count the pixels for each color. Check out the 'watershed' demo in the demo files that are included in the OpenCV library:
import numpy as np
import cv2 as cv
from common import Sketcher
class App:
def __init__(self, fn):
self.img = cv.imread(fn)
self.img = cv.resize(self.img, (654,654))
h, w = self.img.shape[:2]
self.markers = np.zeros((h, w), np.int32)
self.markers_vis = self.img.copy()
self.cur_marker = 1
self.colors = np.int32( list(np.ndindex(2, 2, 3)) ) * 123
self.auto_update = True
self.sketch = Sketcher('img', [self.markers_vis, self.markers], self.get_colors)
def get_colors(self):
return list(map(int, self.colors[self.cur_marker])), self.cur_marker
def watershed(self):
m = self.markers.copy()
cv.watershed(self.img, m)
cv.imshow('img', self.img)
overlay = self.colors[np.maximum(m, 0)]
vis = cv.addWeighted(self.img, 0.5, overlay, 0.5, 0.0, dtype=cv.CV_8UC3)
cv.imshow('overlay', np.array(overlay, np.uint8))
cv.imwrite('/home/stephen/Desktop/overlay.png', np.array(overlay, np.uint8))
cv.imshow('watershed', vis)
def run(self):
while cv.getWindowProperty('img', 0) != -1 or cv.getWindowProperty('watershed', 0) != -1:
ch = cv.waitKey(50)
if ch >= ord('1') and ch <= ord('9'):
self.cur_marker = ch - ord('0')
print('marker: ', self.cur_marker)
if self.sketch.dirty and self.auto_update:
self.watershed()
self.sketch.dirty = False
if ch == 27: break
cv.destroyAllWindows()
fn = '/home/stephen/Desktop/test.png'
App(cv.samples.findFile(fn)).run()
The output will be an image like this:
You can count the pixels for each color using this code:
# Extract the values from the image
vals = []
img = cv.imread('/home/stephen/Desktop/overlay.png')
# Get the colors in the image
flat = img.reshape(-1, img.shape[-1])
colors = np.unique(flat, axis=0)
# Iterate through the colors (ignore the first and last colors)
for color in colors[1:-1]:
a,b,c = color
lower = a-1, b-1, c-1
upper = a+1,b+1,c+1
lower = np.array(lower)
upper = np.array(upper)
mask = cv.inRange(img, lower, upper)
vals.append(sum(sum(mask)))
cv.imshow('mask', mask)
cv.waitKey(0)
cv.destroyAllWindows()
And print out the output data using this code:
names = ['alcohol', 'esters', 'biter', 'hoppy', 'acid', 'zoetheid', 'mout']
print(list(zip(names, vals)))
The output is:
[('alcohol', 22118), ('esters', 26000), ('biter', 16245), ('hoppy', 21170), ('acid', 19156), ('zoetheid', 11090), ('mout', 7167)]

Detect the green lines in this image and calculate their lengths

Sample Images
The image can be more noisy at times where more objects intervene from the background. Right now I am using various techniques using the RGB colour space to detect the lines but it fails when there is change in the colour due to intervening obstacles from the background. I am using opencv and python.
I have read that HSV is better for colour detection and used but haven't been successful yet.
I am not able to find a generic solution to this problem. Any hints or clues in this direction would be of great help.

STILL IN PROGRESS
First of all, an RGB image consists of 3 grayscale images. Since you need the green color you will deal only with one channel. The green one. To do so, you can split the image, you can use b,g,r = cv2.split('Your Image'). You will get an output like that if you are showing the green channel:
After that you should threshold the image using your desired way. I prefer Otsu's thresholding in this case. The output after thresholding is:
It's obvious that the thresholded image is extremley noisy. So performing erosion will reduce the noise a little bit. The noise reduced image will be similar to the following:
I tried using closing instead of dilation, but closing preserves some unwanted noise. So I separately performed erosion followed by dilation. After dilation the output is:
Note that: You can do your own way in morphological operation. You can use opening instead of what I did. The results are subjective from
one person to another.
Now you can try one these two methods:
1. Blob Detection.
2. HoughLine Transform.
TODO
Try out these two methods and choose the best.

You should use the fact that you know you are trying to detect a line by using the line hough transform.
http://docs.opencv.org/2.4/doc/tutorials/imgproc/imgtrans/hough_lines/hough_lines.html
When the obstacle also look like a line use the fact that you know approximately what is the orientation of the green lines.
If you don't know the orientation of the line use hte fact that there are several green lines with the same orientation and only one line that is the obstacle
Here is a code for what i meant:
import cv2
import numpy as np
# Params
minLineCount = 300 # min number of point alogn line with the a specif orientation
minArea = 100
# Read img
img = cv2.imread('i.png')
greenChannel = img[:,:,1]
# Do noise reduction
iFilter = cv2.bilateralFilter(greenChannel,5,5,5)
# Threshold data
#ret,iThresh = cv2.threshold(iFilter,0,255,cv2.THRESH_BINARY+cv2.THRESH_OTSU)
iThresh = (greenChannel > 4).astype(np.uint8)*255
# Remove small areas
se1 = cv2.getStructuringElement(cv2.MORPH_RECT, (5,5))
iThreshRemove = cv2.morphologyEx(iThresh, cv2.MORPH_OPEN, se1)
# Find edges
iEdge = cv2.Canny(iThreshRemove,50,100)
# Hough line transform
lines = cv2.HoughLines(iEdge, 1, 3.14/180,75)
# Find the theta with the most lines
thetaCounter = dict()
for line in lines:
theta = line[0, 1]
if theta in thetaCounter:
thetaCounter[theta] += 1
else:
thetaCounter[theta] = 1
maxThetaCount = 0
maxTheta = 0
for theta in thetaCounter:
if thetaCounter[theta] > maxThetaCount:
maxThetaCount = thetaCounter[theta]
maxTheta = theta
# Find the rhos that corresponds to max theta
rhoValues = []
for line in lines:
rho = line[0, 0]
theta = line[0, 1]
if theta == maxTheta:
rhoValues.append(rho)
# Go over all the lines with the specific orientation and count the number of pixels on that line
# if the number is bigger than minLineCount draw the pixels in finaImage
lineImage = np.zeros_like(iThresh, np.uint8)
for rho in range(min(rhoValues), max(rhoValues), 1):
a = np.cos(maxTheta)
b = np.sin(maxTheta)
x0 = round(a*rho)
y0 = round(b*rho)
lineCount = 0
pixelList = []
for jump in range(-1000, 1000, 1):
x1 = int(x0 + jump * (-b))
y1 = int(y0 + jump * (a))
if x1 < 0 or y1 < 0 or x1 >= lineImage.shape[1] or y1 >= lineImage.shape[0]:
continue
if iThreshRemove[y1, x1] == int(255):
pixelList.append((y1, x1))
lineCount += 1
if lineCount > minLineCount:
for y,x in pixelList:
lineImage[y, x] = int(255)
# Remove small areas
## Opencv 2.4
im2, contours, hierarchy = cv2.findContours(lineImage,cv2.RETR_CCOMP,cv2.CHAIN_APPROX_NONE )
finalImage = np.zeros_like(lineImage)
finalShapes = []
for contour in contours:
if contour.size > minArea:
finalShapes.append(contour)
cv2.fillPoly(finalImage, finalShapes, 255)
## Opencv 3.0
# output = cv2.connectedComponentsWithStats(lineImage, 8, cv2.CV_32S)
#
# finalImage = np.zeros_like(output[1])
# finalImage = output[1]
# stat = output[2]
# for label in range(output[0]):
# if label == 0:
# continue
# cc = stat[label,:]
# if cc[cv2.CC_STAT_AREA] < minArea:
# finalImage[finalImage == label] = 0
# else:
# finalImage[finalImage == label] = 255
# Show image
#cv2.imwrite('finalImage2.jpg',finalImage)
cv2.imshow('a', finalImage.astype(np.uint8))
cv2.waitKey(0)
and the result for the images:

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

PyTesseract - Text broken by horizontal white lines - python

Related

Would this be the correct method of finding the position of a pixel underneath another pixel?

How to improve accuracy of OpenCV matching results in Python

Change colors of photos in PIL

Converting graphs from a scanned document into data

Detect the green lines in this image and calculate their lengths

Categories

Resources