I am working on a program that uses a webcam to read constantly changing digits off of a screen using pytesseract (long story). It takes an image of the whole screen, then cuts out each of the 23 numbers that need to be recorded using predetermined coordinates stored in a list called 'roi'. There are some other steps, but this is the most important part. Currently it is adding, deleting, and changing numbers constantly, but not consistently. Here are some examples:
It reads this incorrectly as '32.0'
It reads this correctly as '52.0'
It reads this incorrectly as '39.3'
It reads this incorrectly as '2499.1'
These images have already been processed using OpenCV, and this is what all the images in the roi set look like. Based on other answers, I have binarized them, tried to clean up the edges, and put a white border around the image (see code).
This program reads the screen every 30 seconds, sometimes getting it right, other times getting it wrong. Many times it turns 5s into 3s, 3s into 5s, and 5s into 9s. Sometimes it just misses or adds digits altogether. Below is my code for processing the images.
import cv2
import pytesseract

pytesseract.pytesseract.tesseract_cmd = #tesseract file path

scale = 1.4
img = cv2.imread(#image file path#)
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
img = cv2.rotate(img, cv2.ROTATE_180)

# downscale the full frame
width = int(img.shape[1] / scale)
height = int(img.shape[0] / scale)
dim = (width, height)
img = cv2.resize(img, dim, interpolation=cv2.INTER_AREA)
cv2.destroyAllWindows()

myData = []
# tesseract config: treat each crop as a uniform block, digits and signs only
cong = r'--psm 6 -c tessedit_char_whitelist=+0123456789.-'

for x, r in enumerate(roi):
    # crop one number using its predetermined coordinates
    imgCrop = img[r[0][1]:r[1][1], r[0][0]:r[1][0]]

    # enlarge the crop (dividing by 0.2 scales it up 5x)
    scalebig = 0.2
    wid = int(imgCrop.shape[1] / scalebig)
    hei = int(imgCrop.shape[0] / scalebig)
    newdims = (wid, hei)
    imgCrop = cv2.resize(imgCrop, newdims)

    # binarize, close small gaps, and add a white border
    imgCrop = cv2.threshold(imgCrop, 155, 255, cv2.THRESH_BINARY)[1]
    kernel2 = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
    imgCrop = cv2.morphologyEx(imgCrop, cv2.MORPH_CLOSE, kernel2, iterations=2)
    value = [255, 255, 255]
    imgCrop = cv2.copyMakeBorder(imgCrop, 10, 10, 10, 10, cv2.BORDER_CONSTANT, None, value=value)

    datapoint = pytesseract.image_to_string(imgCrop, lang='eng', config=cong)
    myData.append(datapoint)
The output is the pictures I linked above.
I have looked into fine-tuning it, but I have a Windows machine and can't seem to find a good tutorial. I am not a programmer by trade; I spent two months teaching myself Python to do this, but the machine-learning aspect of Tesseract has me spinning, and I don't know how else to fix such remarkably inconsistent readings. If you need any further info, please ask and I'll be happy to provide it.
Edit: Added some more incorrectly read images for reference
Make sure you use the right image format (jpeg is the wrong format for OCR)
In the case of the tesseract LSTM engine, make sure the letter size is not bigger than 35 points.
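For example, something along these lines will bring an oversized crop down to that range and save it losslessly before calling tesseract (just a sketch; "520.png" is a hypothetical input file and the right scale depends on your crops):

import cv2

# Downscale a crop so the digit height lands around 30-35 px and save it as PNG
# (lossless) instead of JPEG before running tesseract.
crop = cv2.imread("520.png", cv2.IMREAD_GRAYSCALE)    # hypothetical input crop
target_height = 35                                    # rough glyph height for the LSTM engine
scale = target_height / crop.shape[0]                 # assumes the digits span most of the crop height
small = cv2.resize(crop, None, fx=scale, fy=scale, interpolation=cv2.INTER_AREA)
cv2.imwrite("520_small.png", small)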
With tesseract's tessdata_best models I got these results:
tesseract 593_small.png -
59.3
tesseract 520_small.png -
52.0
tesseract 2491_small.png -
249.1
Related
I want to find a way to detect the red number 3 which is on a red background. I've tried changing the contrast on the image, as well as also trying a blur + adaptive thresholding, which both don't detect anything. What's interesting is I can't detect single numbers, but can detect 2 numbers next to each other at nearly 100% accuracy using the same two methods above. I think it's because the background is lighter when it's just one number, so the OCR is having trouble finding it.
Here's the number 3 from the original image (it's 96 dpi): (https://i.stack.imgur.com/t0VR7.jpg)
I changed the contrast on the image by using the following code, and then cropped it to just show the number.
import cv2
import easyocr

img = cv2.imread(path_to_img, 0)  # read as grayscale

alpha = 3  # Contrast control (1.0-3.0)
beta = 0   # Brightness control (0-100)
images_contrast = cv2.convertScaleAbs(img, alpha=alpha, beta=beta)

# crop to just the number
cropped = images_contrast[885:917, 1008:1055]
cv2.imshow("contrast.jpg", cropped)
cv2.imwrite("contrast_easyOCR.jpg", cropped)
cv2.waitKey(0)
cv2.destroyAllWindows()

reader = easyocr.Reader(['en'], gpu=False, verbose=False)
result_Kripp_Hp = reader.readtext(cropped, allowlist="-0123456789")
print(result_Kripp_Hp)
This is the result: 3hp after changing contrast
I also tried a medianblur + adaptive thresholding, which gets me this: (https://i.stack.imgur.com/ezpVD.jpg)
Code below:
import cv2
import pytesseract

img = cv2.imread(path_to_img, 0)
img = cv2.medianBlur(img, 3)
adapt_Thresholding = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 11, 2)
images = adapt_Thresholding

# crop to just the number
cropped_adaptive_thresholding = images[885:917, 1011:1055]
cv2.imshow("adaptiveThresholding.jpg", cropped_adaptive_thresholding)
cv2.imwrite("adaptThreshold_easyOCR.jpg", cropped_adaptive_thresholding)
cv2.waitKey(0)
cv2.destroyAllWindows()

pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract'
num = pytesseract.image_to_data(cropped_adaptive_thresholding, config='--psm 11 -c tessedit_char_whitelist=0123456789')
print(num)
Both of the above result in no detection by easyocr and pytesseract.
Lastly, easyocr is finding 37 at 99.9% confidence using the contrast code (near the top of this post), which I find a bit odd. Image here: easyocr detects this as '37' correctly at 99.9% confidence
Another thing I tried was messing around with the image in GIMP, and after adding some black pixels to the perimeter of my '3' and then running it through the 'contrast code' above, it detected the 3 correctly at 99.9% confidence. Here's the image: (https://i.stack.imgur.com/fEZ0i.jpg). I think thickening the black line around the 3 would work, but I couldn't figure out how to do this with opencv / python.
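What I mean by 'thickening' is roughly this kind of morphological step (just a sketch, I haven't got it working on my image yet; eroding a grayscale image grows the dark regions, which should fatten the digit):

import cv2
import numpy as np

img = cv2.imread("contrast_easyOCR.jpg", 0)   # the cropped, contrast-adjusted digit saved above
kernel = np.ones((2, 2), np.uint8)
thick = cv2.erode(img, kernel, iterations=1)  # more iterations = thicker dark strokes
cv2.imwrite("thickened.jpg", thick)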
Any tips / suggestions (I'm coding in Python) would be greatly appreciated! Thank you.
So guys I'll explain quickly.
I have a fixed camera and I took a photo thus obtaining the "background".
Then my friend stood in front of the camera and I took another photo.
I want to get an image with only my friend in the foreground and the background removed.
I have tried many methods (absdiff(), tensorflow + bodypix and more), but the only method that is giving me good results is SubtractorKNN:
import numpy as np
import cv2
import sys

backgroundSubtractor = cv2.createBackgroundSubtractorKNN(detectShadows=True)

# apply the algorithm for background images using learning rate > 0
for i in range(1, 16):
    bgImageFile = "background.jpg"
    print("Opening background", bgImageFile)
    bg = cv2.imread(bgImageFile)
    backgroundSubtractor.apply(bg, learningRate=0.9)

# apply the algorithm for detection image using learning rate 0
stillFrame = cv2.imread("background-with-friend.jpg")
fgmask = backgroundSubtractor.apply(stillFrame, learningRate=0.9)

kernel = np.ones((3, 3), np.uint8)
morphology_img = cv2.morphologyEx(fgmask, cv2.MORPH_OPEN, kernel, iterations=1)
nuovo = morphology_img

# 'nuovo.jpg' is nuovo
ok = cv2.imread('nuovo.jpg')
giona = cv2.cvtColor(ok, cv2.COLOR_BGR2GRAY)
ret, range = cv2.threshold(giona, 250, 255, cv2.THRESH_BINARY)

cv2.imshow("nuovo kernel", cv2.resize(nuovo, (0, 0), fx=0.5, fy=0.5))
cv2.imshow("range", cv2.resize(range, (0, 0), fx=0.5, fy=0.5))
THE QUESTION IS:
Is there a way to reconstruct the outline (e.g. left leg, face, arms), fill the inside (and so eliminate the black spots), and remove the white dots in the background? (I have already used cv2.morphologyEx, but using a bigger kernel would have further ruined the outline of the person.)
Is it possible?
If I can get the figure of the person, then I can remove the background from the original image.
EDIT
I used cv2.createBackgroundSubtractorKNN, then cv2.morphologyEx and finally cv2.threshold(img,250,255,cv2.THRESH_BINARY) to delete the shadows.
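To make it concrete, this is roughly the kind of 'fill the largest contour' step I have in mind (a sketch I haven't verified on my images, assuming OpenCV 4 and the files from the code above):

import cv2
import numpy as np

# Re-threshold the saved mask, keep only the largest external contour and fill it solid.
mask = cv2.imread('nuovo.jpg', cv2.IMREAD_GRAYSCALE)
_, mask = cv2.threshold(mask, 250, 255, cv2.THRESH_BINARY)
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
if contours:
    biggest = max(contours, key=cv2.contourArea)
    filled = np.zeros_like(mask)
    cv2.drawContours(filled, [biggest], -1, 255, thickness=cv2.FILLED)

    # Cut the person out of the original photo with the filled mask.
    still = cv2.imread("background-with-friend.jpg")
    person = cv2.bitwise_and(still, still, mask=filled)
    cv2.imshow("filled mask", cv2.resize(filled, (0, 0), fx=0.5, fy=0.5))
    cv2.imshow("person", cv2.resize(person, (0, 0), fx=0.5, fy=0.5))
    cv2.waitKey(0)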
I have an image processing problem that I can't solve. I have a set of 375 images like the one below (1). I'm trying to remove the background, i.e. to do "background subtraction" (or "foreground extraction"), and keep only the waste on a plain background (black/white/...).
(1) Image example
I tried many things, including createBackgroundSubtractorMOG2 from OpenCV, or threshold. I also tried to remove the background pixel by pixel by subtracting it from the foreground because I have a set of 237 background images (2) (the carpet without the waste, but which is a little bit offset from the image with the objects). There are also variations in brightness on the background images.
(2) Example of a background image
Here is a code example that I was able to test and that gives me the results below (3) and (4). I use Python 3.8.3.
import cv2

# Function to remove the sides of the images
def delete_side(img, x_left, x_right):
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            if j <= x_left or j >= x_right:
                img[i, j] = (0, 0, 0)
    return img

# Initialize the background model
backSub = cv2.createBackgroundSubtractorMOG2(history=250, varThreshold=2, detectShadows=True)

# Read the frames and update the background model
for frame in frames:
    if frame.endswith(".png"):
        filepath = FRAMES_FOLDER + '/' + frame
        img = cv2.imread(filepath)
        img_cut = delete_side(img, x_left=190, x_right=1280)
        gray = cv2.cvtColor(img_cut, cv2.COLOR_BGR2GRAY)
        mask = backSub.apply(gray)
        newimage = cv2.bitwise_or(img, img, mask=mask)
        img_blurred = cv2.GaussianBlur(newimage, (5, 5), 0)
        gray2 = cv2.cvtColor(img_blurred, cv2.COLOR_BGR2GRAY)
        _, binary = cv2.threshold(gray2, 10, 255, cv2.THRESH_BINARY)
        final = cv2.bitwise_or(img, img, mask=binary)
        newpath = RESULT_FOLDER + '/' + frame
        cv2.imwrite(newpath, final)
I was inspired by many other cases found on Stackoverflow or others (example: removing pixels less than n size(noise) in an image - open CV python).
(3) The result obtained with the code above
(4) Result when increasing the varThreshold argument to 10
Unfortunately, there is still a lot of noise on the resulting pictures.
As a beginner in background subtraction, I don't have all the tools to reach an optimal solution. If someone has an idea for doing this task in a more efficient and cleaner way (Is there a special method to handle the case of transparent objects? Can noise on the objects be eliminated more effectively? etc.), I'm interested :)
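For reference, my understanding of the "removing pixels less than n size" idea linked above boils down to something like this (a sketch I haven't tuned; binary and img are the variables from the loop above, and min_size is a guess):

import cv2
import numpy as np

min_size = 150   # guessed minimum blob area in pixels
num, labels, stats, _ = cv2.connectedComponentsWithStats(binary, connectivity=8)
clean = np.zeros_like(binary)
for label in range(1, num):          # label 0 is the background
    if stats[label, cv2.CC_STAT_AREA] >= min_size:
        clean[labels == label] = 255
final = cv2.bitwise_or(img, img, mask=clean)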
Thanks
Thanks for your answers. For information, I simply changed methodology and used a segmentation model (U-Net) with 2 labels (foreground, background) to identify the background. It works quite well.
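For anyone curious, a minimal sketch of the kind of model I mean, assuming the segmentation_models_pytorch package (training loop and data loading omitted):

import torch
import segmentation_models_pytorch as smp

# One output channel: a binary foreground/background mask.
model = smp.Unet(
    encoder_name="resnet34",
    encoder_weights="imagenet",
    in_channels=3,
    classes=1,
)
loss_fn = smp.losses.DiceLoss(mode="binary")

# At inference time, threshold the sigmoid of the logits to get the mask.
with torch.no_grad():
    logits = model(torch.randn(1, 3, 384, 384))   # placeholder input tensor
    mask = (logits.sigmoid() > 0.5).float()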
I am trying to write a function that will take a jpg of a floorplan of a house and use OCR to extract the square footage that is written somewhere on the image.
import requests
from PIL import Image
import pytesseract
import pandas as pd
import numpy as np
import cv2
import io

def floorplan_ocr(url):
    """A row-wise function to use pytesseract to scrape the word data from the floorplan
    images; requires tesseract to be installed
    https://github.com/tesseract-ocr/tesseract/wiki"""
    if pd.isna(url):
        return np.nan
    res = ''
    response = requests.get(url, stream=True)
    if response.status_code == 200:
        img = response.raw
        img = np.asarray(bytearray(img.read()), dtype="uint8")
        img = cv2.imdecode(img, cv2.CV_8UC1)
        img = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                    cv2.THRESH_BINARY, 11, 2)
        #img = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 31, 2)
        res = pytesseract.image_to_string(img, lang='eng', config='--remove-background')
        del response
        del img
    else:
        return np.nan
    #print(res)
    return res
However I am not getting much success. Only about 1 in 4 images actually outputs text that contains the square footage.
e.g. currently:
floorplan_ocr(https://i.imgur.com/9qwozIb.jpg) outputs 'K\'Fréfiéfimmimmuuéé\n2|; apprnxx 135 max\nGArhaPpmxd1m max\n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\nTOTAL APPaux noon AREA 523 so Fr, us. a 50. M )\nav .Wzms him "a! m m... mi unwary mmnmrmm mma y“ mum“;\n‘ wmduw: reams m wuhrmmm mm“ .m nanspmmmmy 3 mm :51\nmm" m mmm m; wan wmumw- mm my and mm mm as m by any\nwfmw PM” rmwm mm m .pwmwm m. mum mud ms nu mum.\n(.5 n: ma undammmw an we Ewen\nM vagw‘m Mewpkeem' (and takes a long time to do it)
floorplan_ocr(https://i.imgur.com/sjxMpVp.jpg) outputs ' '.
I think some of the issues I am facing are:
text may be greyscale
Images are low DPI (there appears to be some debate over whether DPI actually matters or whether it is the total resolution that counts)
Text is not formatted consistently
I am stuck and am struggling to improve my results. All I want to extract is 'XXX sq ft' (and all the ways that might be written)
Is there a better way to do this?
Many thanks.
By applying these few lines to resize and change the contrast/brightness of your second image, after cropping out the bottom quarter of the image:
img = cv2.imread("download.jpg")
img = cv2.resize(img, (0, 0), fx=2, fy=2)
img = cv2.convertScaleAbs(img, alpha=1.2, beta=-40)
text = pytesseract.image_to_string(img, config='-l eng --oem 1 --psm 3')
I managed to get this result:
TOTAL APPROX. FLOOR AREA 528 SQ.FT. (49.0 SQ.M.)
Whilst every attempt has been made to ensure the accuracy of the floor
plan contained here, measurements: of doors, windows, rooms and any
other items are approximate and no responsibility ts taken for any
error, omission, or mis-statement. This plan is for #ustrative
purposes only and should be used as such by any prospective purchaser.
The services, systems and appliances shown have not been tested and no
guarantee a8 to the operability or efficiency can be given Made with
Metropix ©2019
I did not threshold the image, as your images' structures vary from one another, and since the image is not only text, Otsu thresholding does not find the right value.
To answer everything: Tesseract actually works best with grayscale images (black text on a white background).
About the DPI/resolution question, there is indeed some debate, but there is also some empirical truth: the DPI value doesn't really matter (since text size can vary at the same DPI). For Tesseract OCR to work best, your characters need to be (edit:) 30-33 pixels high; a few px smaller can make Tesseract almost useless, and bigger characters actually reduce accuracy, though not significantly. (edit: found the source -> https://groups.google.com/forum/#!msg/tesseract-ocr/Wdh_JJwnw94/24JHDYQbBQAJ)
Finally, the text format doesn't really change (at least in your examples). So your main problems here are the text size and the fact that you parse a whole page. If the text line you want is consistently at the bottom of the image, just extract (slice) your original image so you only feed Tesseract the relevant data, which will also make it much faster.
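For example, something along these lines (a sketch; adjust the fraction to wherever the area line actually sits in your images):

import cv2
import pytesseract

# Feed Tesseract only the bottom quarter of the floorplan, where the
# "TOTAL APPROX. FLOOR AREA ..." line usually sits.
img = cv2.imread("download.jpg")
h = img.shape[0]
bottom = img[int(h * 0.75):, :]
bottom = cv2.resize(bottom, (0, 0), fx=2, fy=2)
bottom = cv2.convertScaleAbs(bottom, alpha=1.2, beta=-40)
text = pytesseract.image_to_string(bottom, config='-l eng --oem 1 --psm 3')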
EDIT:
If you were also searching for a way to extract the square footage from your OCR'ed text:
text = "some place holder text 5471 square feet some more text"
# store here all the possible way it can be written
sqft_list = ["sq ft", "square feet", "sqft"]
extracted_value = ""
for sqft in sqft_list:
if sqft in text:
start = text.index(sqft) - 1
end = start + len(sqft) + 1
while text[start - 1] != " ":
start -= 1
extracted_value = text[start:end]
break
print(extracted_value)
5471 square feet
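Alternatively, a regular expression keeps it a bit shorter (a sketch; extend the pattern with other spellings as needed):

import re

text = "some place holder text 5471 square feet some more text"
# a number followed by any of the common "square feet" spellings
match = re.search(r"(\d[\d,.]*)\s*(sq\.?\s*ft\.?|sqft|square feet)", text, re.IGNORECASE)
if match:
    print(match.group(0))   # -> 5471 square feet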
All of the pixelation around the text makes it harder for Tesseract to do its thing.
I used a simple brightness/contrast algorithm from here to make the dots go away. I didn't do any thresholding/binarization. But I did have to scale the image to get any character recognition.
import pytesseract
import numpy as np
import cv2
img = cv2.imread('floor_original.jpg', 0) # read as grayscale
img = cv2.resize(img, (0,0), fx=2, fy=2) # scale image 2X
alpha = 1.2
beta = -20
img = cv2.addWeighted( img, alpha, img, 0, beta)
cv2.imwrite('output.png', img)
res = pytesseract.image_to_string(img, lang='eng', config='--remove-background')
print(res)
Edit
There may be some platform/version dependence on above code. It runs on my Linux machine, but not on my Windows machine. To get it to run on Windows, I modified last two lines to
res = pytesseract.image_to_string(img, lang='eng', config='remove-background')
print(res.encode())
Output from tesseract (bolding added by me to emphasize the sq footage):
TT xs?
IN
Approximate Gross Internal Area = 50.7 sq m / 546 sq ft
All dimensions are estimates only and may not be exact meas ent plans
are subject lo change The sketches. renderngs graph matenala, lava,
apectes
ne developer, the management company, the owners and other affiliates
re rng oo all of ma ther sole discrebon and without enor scbioe
jements Araxs are approximate
Image after processing:
My simple resizing code is returning a black square in the desired size. This is obviously some rookie error but I can't work out for the life of me what it is.
It has nothing to do with compression, as I have tried with a blank image and the same result occurs.
import cv2

img = cv2.imread('imageToSave.jpg')
# target size in pixels
width = 28
height = 28
dim = (width, height)
res = cv2.resize(img, dim)
cv2.imwrite('imageToSave.jpg', res)
Ideally the result would be a rescaled version of the 'imageToSave' file.
I may be too down to earth, but did you try displaying your picture after reading it and after resizing it, instead of saving it locally?
Those two steps will locate the issue, and I think you'll fix it quickly :)
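For example (a quick sketch of what I mean):

import cv2

img = cv2.imread('imageToSave.jpg')
print(img is None)               # True here would mean the file was never read
if img is not None:
    cv2.imshow('loaded', img)    # what was actually read
    res = cv2.resize(img, (28, 28))
    cv2.imshow('resized', res)   # the result before writing it back
    cv2.waitKey(0)
    cv2.destroyAllWindows()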