I've been working with pytesseract for the past few days, and I've noticed that the library is quite bad at identifying numbers. I don't know if I am doing something wrong, but I keep getting ♀ as output.
import cv2
import pytesseract
from PIL import ImageGrab

class Image_Recognition():
    def digit_identification(self):
        # save normal screenshot
        screen = ImageGrab.grab(bbox=(706,226,1200,726))
        screen.save(r'tmp\tmp.png')
        # read the image file
        img = cv2.imread(r'tmp\tmp.png', 2)
        # convert to binary image
        ret, bw_img = cv2.threshold(img, 200, 255, cv2.THRESH_BINARY)
        # use OCR library to identify numbers in screenshot
        text = pytesseract.image_to_string(bw_img)
        print(text)
INPUT:
(Converted to a binary image in order to make numbers more intelligible.)
OUTPUT:
♀
Tell me if there is something off, or just suggest other approaches for handling text recognition.
First of all, please read the article Improving the quality of the output, especially the section regarding the page segmentation method. Also, you can limit the characters to be found to digits 0-9.
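For example, the call from the question could pass both hints via the config parameter (a sketch; --psm 6 is just one sensible choice here):
text = pytesseract.image_to_string(bw_img, config='--psm 6 -c tessedit_char_whitelist=0123456789')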
You have a tiny image, which makes extracting all numbers at once quite challenging, especially with the mixture of bright text on dark background and vice versa. But you can quite easily crop all the single tiles and extract the numbers one by one, so no distinction between these two types of tiles needs to be made.
Also, you know that the numbers must be powers of two (I guess most people will know 2048). So, if no such number can be found, try upscaling the cropped tile and repeat. (Eventually, give up after a few attempts.)
That'd be my full code:
import cv2
import math
import pytesseract
# https://www.geeksforgeeks.org/python-program-to-find-whether-a-no-is-power-of-two/
def log2(x):
    return math.log10(x) / math.log10(2)

def is_power_of_2(n):
    return math.ceil(log2(n)) == math.floor(log2(n))
# Load image, get dimensions of a single tile
img = cv2.imread('T72q4s.png')
h, w = [x // 4 for x in img.shape[:2]]
# Initialize result array (too lazy to import NumPy for that...)
a = cv2.resize(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY), (4, 4)).astype(int)
# https://tesseract-ocr.github.io/tessdoc/ImproveQuality.html#page-segmentation-method
# https://stackoverflow.com/q/4944830/11089932
config = '--psm 6 -c tessedit_char_whitelist=0123456789'
# Iterate tiles, and extract texts
for i in range(4):
    for j in range(4):

        # Crop tile
        x1 = i * w
        x2 = (i + 1) * w
        y1 = j * h
        y2 = (j + 1) * h
        roi = img[y1:y2, x1:x2]

        # If no proper power of 2 is found, upscale image and repeat
        while True:
            text = pytesseract.image_to_string(roi, config=config)
            text = text.replace('\n', '').replace('\f', '')
            if (text == '') or (not is_power_of_2(int(text))):
                roi = cv2.resize(roi, (0, 0), fx=2, fy=2)
                if roi.shape[0] > 1000:
                    a[j, i] = -1
                    break
            else:
                a[j, i] = int(text)
                break
print(a)
For the given image, I get the following output:
[[ 8 16 4 2]
[ 2 8 32 8]
[ 2 4 16 4]
[ 4 2 4 2]]
For another, similar image, I get:
[[ 4 -1 -1 -1]
[ 2 2 -1 -1]
[-1 -1 -1 -1]
[ 2 -1 -1 -1]]
----------------------------------------
System information
----------------------------------------
Platform: Windows-10-10.0.19041-SP0
Python: 3.9.1
PyCharm: 2021.1.3
OpenCV: 4.5.3
pytesseract: 5.0.0-alpha.20201127
----------------------------------------
I need to know how I can capture the extreme edges of a line (its starting and ending coordinates) after identifying the contours. Currently, I am identifying the contours for the shapes (different types of lines) in the following image and drawing them back onto a new image. I have already tried obtaining the topmost, bottommost, leftmost, and rightmost coordinates from the contours array, but these are not accurate for a line that has curves like the one below. So is there any way to capture those starting and ending points from the contours array?
Source Code
import cv2
import numpy as np
# Load the input image
image = cv2.imread("C:/Users/Hasindu/3D Objects/edge-test-188.jpg")
cv2.waitKey(0)
# Grayscale
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# Find Canny edges
edged = cv2.Canny(gray, 30, 200)
cv2.waitKey(0)
# Finding Contours
# Use a copy of the image e.g. edged.copy()
# since findContours alters the image
contours, hierarchy = cv2.findContours(edged,
cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
cv2.imshow('Canny Edges After Contouring', edged)
cv2.waitKey(0)
print("Number of Contours found = " + str(len(contours)))
print(contours)
topmost = tuple(contours[0][contours[0][:, :, 1].argmin()][0])
bottommost = tuple(contours[0][contours[0][:, :, 1].argmax()][0])
print(topmost)
print(bottommost)
# Draw all contours
# -1 signifies drawing all contours
cv2.drawContours(image, contours, -1, (0, 255, 0), 3)
cv2.imshow('Contours', image)
cv2.waitKey(0)
cv2.destroyAllWindows()
Input 1
Output 1
EDIT:
I have followed the solution suggested by stateMachine, but it was not 100 percent accurate on all my inputs. You can clearly see that some of the endpoints on the 2nd input image are not detected by the solution.
Input 2
Output 2
One possible solution involves applying the approach in this post. It involves convolving the input image with a special kernel used to identify end-points. These are the steps:
1. Convert the image to grayscale
2. Get a binary image by applying Otsu's thresholding to the grayscale image
3. Apply a little bit of morphology, to ensure we have continuous and closed curves
4. Compute the skeleton of the image
5. Convolve the skeleton with the end-points kernel
6. Draw the end-points on the original image
Let's see the code:
# Imports:
import cv2
import numpy as np
# Image path:
path = "D://opencvImages//"
fileName = "w97nr.jpg"

# Reading an image in default mode:
inputImage = cv2.imread(path + fileName)
# Prepare a deep copy of the input for results:
inputImageCopy = inputImage.copy()
# Grayscale conversion:
grayscaleImage = cv2.cvtColor(inputImage, cv2.COLOR_BGR2GRAY)
# Threshold via Otsu:
_, binaryImage = cv2.threshold(grayscaleImage, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
The first bit is very straightforward. Just get a binary image using Otsu's Thresholding. This is the result:
The thresholding could miss some pixels inside the curves, leading to "gaps". We don't want that, because we are trying to identify the end-points, which are essentially gaps on the curves. Let's fill possible gaps using a little bit of morphology - a closing will help fill those smaller gaps:
# Set morph operation iterations:
opIterations = 2
# Get the structuring element:
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5))
# Perform Closing:
binaryImage = cv2.morphologyEx(binaryImage, cv2.MORPH_CLOSE, kernel, None, None, opIterations, cv2.BORDER_REFLECT101)
This is now the result:
Ok, what follows is getting the skeleton of the binary image. The skeleton is a version of the binary image where lines have been normalized to have a width of 1 pixel. This is useful because we can then convolve the image with a 3 x 3 kernel and look for specific pixel patterns - those that identify an end-point. Let's compute the skeleton using OpenCV's extended image processing module:
# Compute the skeleton:
skeleton = cv2.ximgproc.thinning(binaryImage, None, 1)
Nothing fancy, the thing is done in just one line of code. The result is this:
It is very subtle in this image, but the curves now have a width of 1 px, so we can apply the convolution. The main idea of this approach is that the convolution yields a very specific value where patterns of black and white pixels are found in the input image. The value we are looking for is 110, but we need to perform some operations before the actual convolution. Refer to the original post for details. These are the operations:
# Threshold the image so that line (white) pixels get a value of 10
# and background (black) pixels a value of 0:
_, binaryImage = cv2.threshold(skeleton, 128, 10, cv2.THRESH_BINARY)
# Set the end-points kernel:
h = np.array([[1, 1, 1],
              [1, 10, 1],
              [1, 1, 1]])
# Convolve the image with the kernel:
imgFiltered = cv2.filter2D(binaryImage, -1, h)
# Extract only the end-points pixels, those with
# an intensity value of 110:
endPointsMask = np.where(imgFiltered == 110, 255, 0)
# The above operation changed the array's data type,
# so convert back to 8-bit uint:
endPointsMask = endPointsMask.astype(np.uint8)
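To see where the 110 comes from: after the thresholding above, line pixels hold a value of 10 and background pixels 0. The kernel's center weight is 10 and each neighbor weight is 1, so an end-point (a line pixel with exactly one line neighbor) yields 10 * 10 + 1 * 10 = 110, while a pixel in the middle of a line has at least two line neighbors and yields 120 or more.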
If we imshow the endPointsMask, we would get something like this:
In the above image, you can see the location of the identified end-points. Let's get the coordinates of these white pixels:
# Get the coordinates of the end-points:
(Y, X) = np.where(endPointsMask == 255)
Finally, let's draw circles on these locations:
# Draw the end-points:
for i in range(len(X)):
    # Get coordinates:
    x = X[i]
    y = Y[i]
    # Set circle color:
    color = (0, 0, 255)
    # Draw Circle
    cv2.circle(inputImageCopy, (x, y), 3, color, -1)
cv2.imshow("Points", inputImageCopy)
cv2.waitKey(0)
This is the final result:
EDIT: Identifying which blob produces each set of points
Since you also need to know which blob/contour/curve produced each set of end-points, you can re-work the code below with some other functions to achieve just that. Here, I'll mainly rely on a previous function I wrote that detects the biggest blob in an image. One of the two curves will always be bigger (i.e., have a larger area) than the other. If you extract this curve, process it, and then subtract it from the original image iteratively, you can process curve by curve, and each time you will know which curve (the current biggest one) produced the current end-points. Let's modify the code to implement these ideas:
# Imports:
import cv2
import numpy as np
# image path
path = "D://opencvImages//"
fileName = "w97nr.jpg"
# Reading an image in default mode:
inputImage = cv2.imread(path + fileName)
# Deep copy for results:
inputImageCopy = inputImage.copy()
# Grayscale conversion:
grayscaleImage = cv2.cvtColor(inputImage, cv2.COLOR_BGR2GRAY)
# Threshold via Otsu:
_, binaryImage = cv2.threshold(grayscaleImage, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
# Set morph operation iterations:
opIterations = 2
# Get the structuring element:
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5))
# Perform Closing:
binaryImage = cv2.morphologyEx(binaryImage, cv2.MORPH_CLOSE, kernel, None, None, opIterations, cv2.BORDER_REFLECT101)
# Compute the skeleton:
skeleton = cv2.ximgproc.thinning(binaryImage, None, 1)
Up until the skeleton computation, everything is the same. Now, we will extract the current biggest blob and process it to obtain its end-points, and we will continue extracting the current biggest blob until there are no more curves to extract. So, we just modify the prior code to manage the iterative nature of this idea. Additionally, let's store the end-points in a list. Each row of this list will denote a new curve:
# Processing flag:
processBlobs = True

# Reference to work on inside the processing loop:
blobsImage = skeleton

# Store points per blob here:
blobPoints = []

# Count the number of processed blobs:
blobCounter = 0

# Start processing blobs:
while processBlobs:

    # Find biggest blob on image:
    biggestBlob = findBiggestBlob(blobsImage)

    # Prepare image for next iteration, remove
    # currently processed blob:
    blobsImage = cv2.bitwise_xor(blobsImage, biggestBlob)

    # Count number of white pixels:
    whitePixelsCount = cv2.countNonZero(blobsImage)

    # If the image is completely black (no white pixels)
    # there are no more curves to process:
    if whitePixelsCount == 0:
        processBlobs = False

    # Threshold the image so that line (white) pixels get a value of 10
    # and background (black) pixels a value of 0:
    _, binaryImage = cv2.threshold(biggestBlob, 128, 10, cv2.THRESH_BINARY)

    # Set the end-points kernel:
    h = np.array([[1, 1, 1],
                  [1, 10, 1],
                  [1, 1, 1]])

    # Convolve the image with the kernel:
    imgFiltered = cv2.filter2D(binaryImage, -1, h)

    # Extract only the end-points pixels, those with
    # an intensity value of 110:
    endPointsMask = np.where(imgFiltered == 110, 255, 0)

    # The above operation changed the array's data type,
    # so convert back to 8-bit uint:
    endPointsMask = endPointsMask.astype(np.uint8)

    # Get the coordinates of the end-points:
    (Y, X) = np.where(endPointsMask == 255)

    # Prepare random color:
    color = (np.random.randint(low=0, high=256), np.random.randint(low=0, high=256), np.random.randint(low=0, high=256))

    # Prepare id string:
    string = "Blob: " + str(blobCounter)
    font = cv2.FONT_HERSHEY_COMPLEX
    tx = 10
    ty = 10 + 10 * blobCounter
    cv2.putText(inputImageCopy, string, (tx, ty), font, 0.3, color, 1)

    # Store these points in list:
    blobPoints.append((X, Y, blobCounter))
    blobCounter = blobCounter + 1

    # Draw the end-points:
    for i in range(len(X)):
        x = X[i]
        y = Y[i]
        cv2.circle(inputImageCopy, (x, y), 3, color, -1)

    cv2.imshow("Points", inputImageCopy)
    cv2.waitKey(0)
This loop extracts the biggest blob and processes it just like in the first part of the post - we convolve the image with the end-point kernel and locate the matching points. For the original input, this would be the result:
As you see, each set of points is drawn using a unique, randomly generated color. There's also the current blob "ID" (just an ascending count) drawn as text in the same color as each set of points, so you know which blob produced each set of end-points. The info is stored in the blobPoints list; we can print its values like this:
# How many blobs where found:
blobCount = len(blobPoints)
print("Found: "+str(blobCount)+" blobs.")
# Let's check out each blob and their end-points:
for b in range(blobCount):
    # Fetch data:
    p1 = blobPoints[b][0]
    p2 = blobPoints[b][1]
    id = blobPoints[b][2]
    # Print data for each blob:
    print("Blob: " + str(b) + " p1: " + str(p1) + " p2: " + str(p2) + " id: " + str(id))
Which prints:
Found: 2 blobs.
Blob: 0 p1: [39 66] p2: [ 42 104] id: 0
Blob: 1 p1: [129 119] p2: [25 49] id: 1
This is the implementation of the findBiggestBlob function, which just finds the biggest blob in the image using its area and returns an image of that blob isolated. It comes from a C++ implementation I wrote of the same idea:
def findBiggestBlob(inputImage):
    # Store a copy of the input image:
    biggestBlob = inputImage.copy()
    # Set initial values for the
    # largest contour:
    largestArea = 0
    largestContourIndex = 0

    # Find the contours on the binary image:
    contours, hierarchy = cv2.findContours(inputImage, cv2.RETR_CCOMP, cv2.CHAIN_APPROX_SIMPLE)

    # Get the largest contour in the contours list:
    for i, cc in enumerate(contours):
        # Find the area of the contour:
        area = cv2.contourArea(cc)
        # Store the index of the largest contour:
        if area > largestArea:
            largestArea = area
            largestContourIndex = i

    # Once we get the biggest blob, paint it black:
    tempMat = inputImage.copy()
    cv2.drawContours(tempMat, contours, largestContourIndex, (0, 0, 0), -1, 8, hierarchy)

    # Erase smaller blobs:
    biggestBlob = biggestBlob - tempMat

    return biggestBlob
After finding the contours, we apply a special filter at each point of each contour. It applies the mask centered around the pixel, then finds the contours (or connected components, or blobs) in the masked region. Ideally, for end points there will be only one blob in the region; for other points there will be more than one. We take the candidate end points for each contour and then cluster them into two clusters, because the filter width and line thickness produce more than two candidates. If the clustering outputs two points, they are the end points of the processed contour.
An example is shown below.
mask:
1 1 1 1 1
1 0 0 0 1
1 0 0 0 1
1 0 0 0 1
1 1 1 1 1
image:
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 1 1 1 1 1 0 0 0
0 0 0 0 1 1 1 1 0 0 0
0 0 0 0 0 0 0 1 0 0 0
0 0 0 0 0 0 0 1 0 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
response for end point:
1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
1 0 0 0 1 0 0 0 0 0 0 0 0 0 0
1 0 0 0 1 & 0 0 1 1 1 = 0 0 0 0 1
1 0 0 0 1 0 0 0 1 1 0 0 0 0 1
1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
response for corner point:
1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
1 0 0 0 1 0 0 0 0 0 0 0 0 0 0
1 0 0 0 1 & 1 1 1 0 0 = 1 0 0 0 0
1 0 0 0 1 1 1 1 0 0 1 0 0 0 0
1 1 1 1 1 0 0 1 0 0 0 0 1 0 0
This won't work well if the image is too noisy, or if your inputs are JPEGs and the thresholding isn't good, because stray components can end up in the masked region, so that a true end point is not counted as a candidate.
If the lines in your input image (or the thresholded image) are more than 2 pixels wide, you can change the filter radius (r in the code).
If the line gap is less than two pixels, you'll again have problems with the current filter (or anything larger). In this case, you can draw each contour in a separate image and then apply the filter. I haven't done this in the code for simplicity, but there is a sketch of the idea after the code below.
Here, we are using CHAIN_APPROX_SIMPLE to reduce the contour pixel count, and Otsu thresholding. For simplicity, the code does not handle cases where contour points fall at image boundaries.
import cv2 as cv
import numpy as np
im = cv.imread('dclSa.jpg')
gray = cv.cvtColor(im, cv.COLOR_BGR2GRAY)
# apply Otsu threshold
th, bw = cv.threshold(gray, 0, 1, cv.THRESH_BINARY_INV | cv.THRESH_OTSU)
# find contours
contours, _ = cv.findContours(bw, cv.RETR_EXTERNAL, cv.CHAIN_APPROX_SIMPLE)
# create filter
r = 2
mask = np.ones((2*r+1, 2*r+1), dtype=np.uint8)
mask[1:2*r, 1:2*r] = 0
#print mask
criteria = (cv.TERM_CRITERIA_EPS + cv.TERM_CRITERIA_MAX_ITER, 5, 1.0)
for contour in contours:
    all_x = []
    all_y = []
    for point in contour:
        x = point[0][0]
        y = point[0][1]
        # extract the region centered around the contour pixel
        roi = bw[y-r:y+r+1, x-r:x+r+1]
        # find the blobs in masked region
        n, _ = cv.findContours(roi & mask, cv.RETR_EXTERNAL, cv.CHAIN_APPROX_SIMPLE)
        # if the blob count is 1, this pixel is an end point candidate
        # if you use cv.connectedComponents to find blobs, then check 2 == n as it counts background
        if 1 == len(n):
            all_x.append(x)
            all_y.append(y)
    # if there are no candidate points, check next contour
    if not all_x:
        continue
    # we are done with a contour. cluster the end point candidates into two clusters
    points = np.vstack((all_x, all_y))
    _, _, endpoints = cv.kmeans(np.float32(points.transpose()), 2, None, criteria, 5, cv.KMEANS_RANDOM_CENTERS)
    # if the clustering goes well, we'll have the two end points of the contour
    if 2 == len(endpoints) and 2 == len(endpoints[0]) and 2 == len(endpoints[1]):
        im = cv.circle(im, (int(endpoints[0][0]), int(endpoints[0][1])), 3, (255, 0, 0), -1)
        im = cv.circle(im, (int(endpoints[1][0]), int(endpoints[1][1])), 3, (0, 255, 0), -1)
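As mentioned above, if your lines run close together, you could isolate each contour on its own canvas before sampling the masked regions. A minimal sketch of the idea (a hypothetical helper of my own, reusing bw, mask, and r from the code above):
def endpoint_candidates_isolated(bw, contour, mask, r):
    # Draw only this contour on a blank canvas, so that other
    # nearby lines cannot leak into the sampled regions:
    single = np.zeros_like(bw)
    cv.drawContours(single, [contour], -1, 1, -1)  # filled, value 1, like `bw`
    candidates = []
    for point in contour:
        x, y = point[0]
        # Same masked-region blob count as in the main loop,
        # but sampled from `single` instead of `bw`:
        roi = single[y-r:y+r+1, x-r:x+r+1]
        blobs, _ = cv.findContours(roi & mask, cv.RETR_EXTERNAL, cv.CHAIN_APPROX_SIMPLE)
        if 1 == len(blobs):
            candidates.append((x, y))
    return candidates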
I am trying to generate synthetic images for my deep learning model. I need to draw scratches on a black surface. I already have a little script that can generate random white scratch-like lines, but only horizontal ones. I need the scratches to also be vertical and curved. On top of that, it would also be very helpful if the thickness of the scratches were random, so I get both thick and thin scratches.
This is my code so far:
import cv2
import numpy as np
import random
height = 384
width = 384
blank_image = np.zeros((height, width, 3), np.uint8)
num_scratches= random.randint(0,5)
for _ in range(num_scratches):
    row_random = random.randint(20, 370)
    blank_image[row_random:(row_random+1), row_random:(row_random+random.randint(25,75))] = (255,255,255)
cv2.imshow("synthetic", blank_image)
cv2.waitKey(0)
cv2.destroyAllWindows()
This is one example result outcome:
How do I have to edit my script so I can get more diverse looking scratches?
The scratches should somehow look like this for example (Done with paint):
"need the scratches to also be vertically"
Your method might be adapted as follows:
import numpy as np  # cv2 reads images into np.array
img = np.zeros((5,5),dtype='uint8') # same as loading 5 x 5 px black rectangle
img[1:4,2:3] = 255
print(img)
Output:
[[ 0 0 0 0 0]
[ 0 0 255 0 0]
[ 0 0 255 0 0]
[ 0 0 255 0 0]
[ 0 0 0 0 0]]
Explanation: I set all elements (pixels) which have a y-coordinate between 1 (inclusive) and 4 (exclusive) and an x-coordinate between 2 (inclusive) and 3 (exclusive) to 255 (white).
Nonetheless, cv2 provides a function for drawing lines, namely cv2.line, which is handier to use: it accepts the img on which to work, a start point, an end point, a color, and a thickness. The docs give the following example:
# Draw a diagonal blue line with thickness of 5 px
img = cv2.line(img,(0,0),(511,511),(255,0,0),5)
If you are working in grayscale, use a single value rather than a 3-tuple as the color.
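Putting this together, here is a rough sketch of how the original script could be extended with cv2.line and cv2.polylines to get random orientation, curvature, and thickness (my own illustration; all the ranges are arbitrary assumptions):
import cv2
import numpy as np
import random

height, width = 384, 384
blank_image = np.zeros((height, width, 3), np.uint8)

num_scratches = random.randint(1, 5)
for _ in range(num_scratches):
    thickness = random.randint(1, 3)  # random scratch thickness
    if random.random() < 0.5:
        # Straight scratch: any orientation, since both endpoints are random
        p1 = (random.randint(0, width - 1), random.randint(0, height - 1))
        p2 = (random.randint(0, width - 1), random.randint(0, height - 1))
        cv2.line(blank_image, p1, p2, (255, 255, 255), thickness)
    else:
        # Curved scratch: a short random walk drawn as an open polyline
        x = random.randint(0, width - 1)
        y = random.randint(0, height - 1)
        points = []
        for _ in range(random.randint(3, 8)):
            x = int(np.clip(x + random.randint(-40, 40), 0, width - 1))
            y = int(np.clip(y + random.randint(-40, 40), 0, height - 1))
            points.append((x, y))
        cv2.polylines(blank_image, [np.array(points, dtype=np.int32)], False, (255, 255, 255), thickness)

cv2.imshow("synthetic", blank_image)
cv2.waitKey(0)
cv2.destroyAllWindows()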
I'm making a pipe connection game in Pygame where you rotate a set of given pipe pieces to connect the start to the end, and I am having trouble implementing a way for the game to detect that it has actually been completed. I need to find whether any path of white pixels (from the pipe piece sprites) connects any red pixel to any blue pixel (the colours of the start and end pieces). How could I go about doing this? The background colour is black, if that helps.
I would make a scraper-like tool:
1. From each red pixel, check for connecting white/blue pixels and put these in a list.
2. From each white pixel in the list, check if a blue pixel can be reached; if so, return True.
3. Add any white pixels that connect to white pixels already in the list (but don't add any that are already in it).
4. If no new white pixels were added and no blue pixels were found, return False; else, go back to step 2.
A minimal code sketch of this idea follows the list.
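Here is that sketch as a breadth-first search over pixels (assuming you can read pixel colors, e.g. get_color = lambda x, y: tuple(surface.get_at((x, y)))[:3] in Pygame; the exact color tuples are assumptions, and anti-aliased sprites may need a tolerance check instead of equality):
from collections import deque

RED, WHITE, BLUE = (255, 0, 0), (255, 255, 255), (0, 0, 255)

def start_connects_to_end(get_color, width, height):
    queue = deque()
    visited = set()
    # Step 1: seed the search with every red (start piece) pixel
    for y in range(height):
        for x in range(width):
            if get_color(x, y) == RED:
                queue.append((x, y))
                visited.add((x, y))
    # Steps 2-4: expand through white pixels until blue is reached or we run out
    while queue:
        x, y = queue.popleft()
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < width and 0 <= ny < height and (nx, ny) not in visited:
                visited.add((nx, ny))
                color = get_color(nx, ny)
                if color == BLUE:   # reached the end piece
                    return True
                if color == WHITE:  # pipe interior, keep expanding
                    queue.append((nx, ny))
    return False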
You would not solve this using colors at all. The colors are just used for the visualization; everything else is abstract.
E.g., every field is a 3 x 3 array. You could handle more types by adding a dimension.
You can use separate matrices for the colors, like this (0 is no connection, 1 is a connection):
x 1 x   x 1 x   x 0 x
1 0 0   1 0 x   0 0 0
x 0 x   x 1 x   x 0 x
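For example, two horizontally adjacent fields connect when the right cell of the left field's matrix and the left cell of the right field's matrix are both 1. A tiny sketch of that check (my own illustration, writing the unused corner cells as 0):
field_a = [[0, 1, 0],
           [1, 0, 0],
           [0, 0, 0]]   # connects up and left only
field_b = [[0, 1, 0],
           [1, 0, 1],
           [0, 1, 0]]   # connects up, left, right, and down

def connects_horizontally(left, right):
    # The right edge of `left` must meet the left edge of `right`:
    return left[1][2] == 1 and right[1][0] == 1

print(connects_horizontally(field_a, field_b))  # False: field_a has no right connection
print(connects_horizontally(field_b, field_a))  # True: right of b meets left of a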
You could even use bit-field magic and use a single matrix per field:
x 1 x
2 0 3
x 4 x
bit 0 (1) = red
bit 1 (2) = green
bit 2 (4) = blue
0 = no connection
1 = red
2 = green
3 = red, green
4 = blue
5 = blue, red
6 = green, blue
7 = red, green, blue
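If you go the bit-field route, a short sketch of encoding and decoding such cells (illustrative only; the flags follow the table above):
RED, GREEN, BLUE = 1, 2, 4  # bit flags for red, green, blue

# One field: top carries red, left carries green,
# right carries both red and green, bottom carries blue:
field = [[0, RED, 0],
         [GREEN, 0, RED | GREEN],
         [0, BLUE, 0]]

def colors_of(cell):
    # Decode a bit-field value back into color names:
    return [name for flag, name in ((RED, 'red'), (GREEN, 'green'), (BLUE, 'blue'))
            if cell & flag]

print(colors_of(field[1][2]))  # ['red', 'green'] (value 3 = red + green)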