Recognizing matrix from image - python

I have written an algorithm that solves the Pluszle game matrix.
The input is a numpy array.
Now I want to recognize the digits of the matrix from a screenshot.
There are different levels; this is a hard one:
And this is an easy one:
The output of the recognition should be a numpy array:
array([[6, 2, 4, 2],
       [7, 8, 9, 7],
       [1, 2, 4, 4],
       [7, 2, 4, 0]])
I have tried to feed the last image to Tesseract:
from PIL import Image
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
print(pytesseract.image_to_string(Image.open('C:/Users/79017/screen_plus.jpg')))
The output is unacceptable:
LEVEL 4
(}00:03 M0
J] —.°—#—#©
I think that I should use contours from OpenCV, because the font is always the same. Maybe I should save a contour for every digit, then find every contour that exists on the screenshot, and then somehow build the matrix from the coordinates of every digit contour. But I have no idea how to do it.

1- Binarize
Tesseract needs you to binarize the image first. No need for contours or any convolution here; just a threshold should do. Especially considering that you are trying to che... I mean, win intelligently at a specific game, so I guess you are open to some ad-hoc adjustments.
For example, (hard<240).any(axis=2) puts in white (True) everything that is not white on the original image, and in black the white parts.
Note that you don't get the sums here (or whatever they are, I don't know what this game is), which are, on the contrary, almost black areas.
But you can get them with another filter:
(hard>120).any(axis=2)
You could merge those filters, obviously
(hard<240).any(axis=2) & (hard>120).any(axis=2)
But that may not be a good idea: after all, it gives you an opportunity to distinguish two different kinds of data, which you may want to do.
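As a minimal sketch of how those filters could be applied, assuming the hard screenshot is saved as hard.jpg (the file name is an assumption; the thresholds are the ones above):
import cv2
import numpy as np

hard = cv2.imread("hard.jpg")            # BGR uint8 array

digits_mask = (hard < 240).any(axis=2)   # True wherever at least one channel is below 240
sums_mask = (hard > 120).any(axis=2)     # True wherever at least one channel is above 120

# save the masks as black-and-white images to inspect them
cv2.imwrite("digits_mask.png", digits_mask.astype(np.uint8) * 255)
cv2.imwrite("sums_mask.png", sums_mask.astype(np.uint8) * 255)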
2- Restrict
Secondly, you know you are looking for digits, so restrict to digits by adding config='digits' to your pytesseract args.
pytesseract.image_to_string((hard>240).all(axis=2))
# 'LEVEL10\nNOVEMBER 2022\n\n™\noe\nOs\nfoo)\nso\n‘|\noO\n\n9949 6 2 2 8\n\nN W\nN ©\nOo w\nVon\n+? ah ®)\nas\noOo\n©\n\n \n\x0c'
pytesseract.image_to_string((hard>240).all(axis=2), config='digits')
# '10\n2022\n\n99496228\n\n17\n-\n\n \n\x0c'
3- Don't use image_to_string
Use image_to_data preferably.
It gives you bounding boxes of text.
Or even image_to_boxes, which gives you digits one by one, with coordinates.
image_to_string is for when you have good old linear text in the image. image_to_data and image_to_boxes assume that the text is scattered around the image, and give you pieces of text with their positions.
image_to_string on such an image may scramble what you would consider the logical order.
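For instance, a minimal sketch of image_to_data; the whole-image binarization is the one from step 1, and bin_img is just an illustrative name:
import cv2
import numpy as np
import pytesseract

hard = cv2.imread("hard.jpg")
bin_img = (hard < 240).any(axis=2).astype(np.uint8) * 255   # 0/255 uint8 image

data = pytesseract.image_to_data(bin_img, config='digits',
                                 output_type=pytesseract.Output.DICT)
for text, left, top, w, h in zip(data['text'], data['left'], data['top'],
                                 data['width'], data['height']):
    if text.strip():   # skip empty detections
        print(f"'{text}' at x={left}, y={top}, w={w}, h={h}")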
4- Select areas yourself
Since it is an ad-hoc usage for a specific application, you know where the data are.
For example, your main matrix seems to be in area
hard[740:1512, 132:910]
See
print(pytesseract.image_to_boxes((hard[740:1512, 132:910]<240).any(axis=2), config='digits'))
Not only does it avoid flooding you with irrelevant data, but Tesseract also performs better when called on an image that contains nothing but what you want to read.
It seems to find almost all your digits here.
5- Don't expect miracles
Tesseract is one of the best OCRs. But OCR is not a sure thing...
See what I get with this code (summarizing what I've said so far), which prints in red the digits detected by Tesseract right next to where they were found in the real image.
import cv2
import matplotlib.pyplot as plt
import numpy as np
import pytesseract

hard = cv2.imread("hard.jpg")
hard = hard[740:1512, 132:910]        # crop to the main matrix area
binarized = (hard < 240).any(axis=2)  # white (True) wherever the pixel is not white

# each line of image_to_boxes is "char x1 y1 x2 y2 page"
boxes = [s.split(' ') for s in pytesseract.image_to_boxes(binarized, config='digits').split('\n')[:-1]]

out = hard.copy()  # just to avoid altering the original image, in case we want to retry with other parameters
H = len(hard)      # image_to_boxes coordinates have their origin at the bottom-left corner
for b in boxes:
    cv2.putText(out, b[0], (30 + int(b[1]), H - int(b[2])), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)

plt.imshow(cv2.cvtColor(out, cv2.COLOR_BGR2RGB))
plt.show()
As you can see, the results are fairly good. But there are 5 missing numbers, and one 3 was read as "3.".
For this kind of ad-hoc reading of an app, I wouldn't even use Tesseract. I am pretty sure that, with trial and error, you can easily learn to extract each digit's box yourself (they are linearly spaced in both dimensions).
And then, inside each box, there are only 9 possible values. It should be quite easy, on a generated image, to find some simple criteria, such as the number of white pixels, the number of white pixels in the top area, etc., that permit a very simple classification.
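As a rough sketch of that idea: cut the matrix area (crop coordinates and 4x4 grid from the steps above) into equally sized cells, then classify each cell. Here a direct pixel comparison against reference cells stands in for the hand-crafted criteria mentioned above; the references dictionary is hypothetical and would be built once from cells whose digits you already know.
import cv2
import numpy as np

hard = cv2.imread("hard.jpg")
matrix_area = (hard[740:1512, 132:910] < 240).any(axis=2)

rows, cols = 4, 4
cell_h = matrix_area.shape[0] // rows
cell_w = matrix_area.shape[1] // cols

def cell(i, j):
    # binarized content of the cell at row i, column j
    return matrix_area[i * cell_h:(i + 1) * cell_h, j * cell_w:(j + 1) * cell_w]

def classify(c, references):
    # references: hypothetical dict mapping a digit to a reference cell of the same shape;
    # pick the digit whose reference has the most pixels in common with the cell
    return max(references, key=lambda d: np.sum(c == references[d]))

# result = np.array([[classify(cell(i, j), references) for j in range(cols)]
#                    for i in range(rows)])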

You might want to pre-process the image first. By applying a filter you can, for example, extract the contours of an image.
The basic idea of a filter is to 'slide' a small matrix of values over the image and multiply every pixel value by the values inside the matrix. This process is called convolution.
Convolution helps here because all irrelevant information is discarded, making it easier for Tesseract to 'read' the image.
This might help you out: https://medium.com/swlh/image-processing-with-python-convolutional-filters-and-kernels-b9884d91a8fd
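For instance, a small sketch of applying a 3x3 edge-detection kernel with cv2.filter2D (the file name is the one from the question; the kernel is a standard Laplacian-style example):
import cv2
import numpy as np

img = cv2.imread("screen_plus.jpg", cv2.IMREAD_GRAYSCALE)

edge_kernel = np.array([[-1, -1, -1],
                        [-1,  8, -1],
                        [-1, -1, -1]], dtype=np.float32)

edges = cv2.filter2D(img, -1, edge_kernel)   # -1: keep the input depth
cv2.imwrite("edges.png", edges)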

Related

how to get captcha numbers separately using python

I have this specific 3-digit captcha, like:
I am trying to slice out the 3 digits. I tried to use the pytesseract module to recognize text in images, but it's not very accurate.
So I researched it and found out that I could make the background completely white, crop all the extra space from the picture, and then divide the picture into 3 pieces, which would most likely give me what I need. So I'm looking for a way to implement this filter, crop the image, and slice it into three pieces.
I found out that the PIL module can help me load the image in Python:
from PIL import Image
im = Image.open("captcha.jpg")
and I'm looking for a way to make the background totally white, crop the extra space, and divide the picture into three pieces. Thanks for your guidance in advance.
So I have found this library called cv2 (OpenCV), which has a method called threshold:
For every pixel, the same threshold value is applied. If the pixel value is smaller than the threshold, it is set to 0, otherwise it is set to a maximum value.
import cv2 as cv

img = cv.imread('gradient.png', 0)   # 0: load as grayscale
ret, thresh1 = cv.threshold(img, 127, 255, cv.THRESH_BINARY)
In the example above, it takes an image and, if a pixel value is above 127, sets it to the maximum value (255, i.e. white); otherwise it sets it to 0 (black).
Further reading:
https://docs.opencv.org/4.x/d7/d4d/tutorial_py_thresholding.html
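Putting the pieces of the question together, here is a minimal sketch (assuming dark digits on a light background; the 127 threshold is the one from the example above) that thresholds, crops to the ink, and slices the result into three equal pieces:
import cv2

img = cv2.imread("captcha.jpg", cv2.IMREAD_GRAYSCALE)
# THRESH_BINARY_INV makes the (dark) digits white and the background black
_, thresh = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY_INV)

# bounding box of all non-zero (digit) pixels, then crop to it
x, y, w, h = cv2.boundingRect(cv2.findNonZero(thresh))
cropped = thresh[y:y + h, x:x + w]

# split the cropped strip into three equal-width pieces
piece_w = w // 3
for i in range(3):
    cv2.imwrite(f"digit_{i}.png", cropped[:, i * piece_w:(i + 1) * piece_w])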

How to remove white noise and Connected Component extraction in OpenCV?

I am working on a sudoku solver that takes input from a video camera (laptop), processes it, parses the sudoku image as a list of lists, solves it, and projects the solution back onto the sheet.
I am now at the point where I need to recognize each digit from the image. I'm using the MNIST dataset to train my model, which expects each input image in the shape (28, 28, 1). I am able to locate and extract each digit, but performing any kind of thresholding on the digit leaves a lot of noise around it, which ultimately leads to misclassification by my model.
Is there any method to get rid of the white noise, extract only the digit from the square, and then feed it to the Keras model?
I think this can be achieved with cv2.connectedComponentsWithStats by extracting the largest connected component, but I do not know how the method works (the arguments it expects or what it returns), and I couldn't find a good explanation of how to use it.
If there is an alternative to cv2.connectedComponentsWithStats that produces better results, please do suggest it; if not, please explain how cv2.connectedComponentsWithStats works, or point me towards a good resource that helps me understand it and how to use it for my specific case.
PS: If you think MNIST isn't a good dataset for this task, please say why, and suggest another dataset that may achieve the task of recognizing digits.
To remove the noise you can use an erosion. It is used to filter out white pixels and "fill in the (white) gaps".
Every white area will be smaller, and very small areas will disappear. Digits will look thinner.
You can then dilate to get an image more similar to the original one (thinner digits will become fatter and look like the original ones, even if small differences remain).
This combined operation is known as an opening. See https://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_imgproc/py_morphological_ops/py_morphological_ops.html
Example:
import cv2
import numpy as np

img = cv2.imread('input.jpg', 0)                         # 0: load as grayscale
kernel = np.ones((5, 5), np.uint8)                       # structuring element
erosion = cv2.erode(img, kernel, iterations=1)           # removes small white noise, thins the digits
dilatation = cv2.dilate(erosion, kernel, iterations=1)   # grows the digits back to roughly their original size
Edit: a (3, 3) kernel for the dilation makes the image less blurry.
Input
Erosion
Dilatation
Just ignore the small blobs (small width, small height and/or small area). At the same time, you can ignore the large ones.
To skip the grid lines, it is advisable to reconstruct the grid geometry (use the characters to locate the grid columns/rows, and possibly detect the long straight lines), and only keep the blobs wholly inside a cell.
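Since the question asks specifically about cv2.connectedComponentsWithStats, here is a minimal sketch of the blob-filtering idea above (the file name and the size limits are placeholders to tune for your images):
import cv2
import numpy as np

binary = cv2.imread("cell.png", cv2.IMREAD_GRAYSCALE)   # hypothetical binarized cell image

n_labels, labels, stats, centroids = cv2.connectedComponentsWithStats(binary, connectivity=8)

cleaned = np.zeros_like(binary)
for label in range(1, n_labels):                 # label 0 is the background
    x, y, w, h, area = stats[label]              # bounding box and pixel count of the blob
    if area > 50 and 5 < w < 25 and 10 < h < 28: # placeholder size limits
        cleaned[labels == label] = 255

# alternatively, keep only the largest non-background component:
# largest = 1 + np.argmax(stats[1:, cv2.CC_STAT_AREA])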

Skimage Image Segmentation

I have a bunch of SEM images that I'm trying to segment to search for features. This is one of those things that's obvious to a person looking at it but less so to a computer (at least one coded by me!). I was hoping that someone might give me a way to think about these kinds of problems and this one in particular.
So here's an easy example that works more or less whatever I do:
I do some basic trimming to get rid of the metadata (actually I read it first, but that's beside the point). Then if I, for example, do:
import skimage.filters as filters
threshold = filters.threshold_isodata(image)
seg_image = image > threshold
I get a true/false array which can be viewed as an image and looks like this:
(I put a little black strip at the bottom so you can flick back and forth with pleasing effect).
However, the same code on this image:
leads to this result:
It should be clear to any human reading this that it's got a lot of false positives. The background has been overexposed and has added a bunch of noise which then gets caught in the threshold.
I've tried various combinations of:
import numpy as np
import skimage.segmentation as seg
import skimage.restoration as res
import skimage.filters as filters
import skimage.morphology as morph

seg_image = np.copy(image)
seg_image = morph.opening(seg_image, morph.disk(2))
seg_image = res.denoise_bilateral(seg_image)
seed = np.copy(seg_image)
seed[1:-1, 1:-1] = seg_image.min()
seg_image = seg_image - morph.reconstruction(seed, seg_image)
As well as a few other filters (in different orders and including or excluding at random). I don't, in general, do all of those things because it's a disaster.
My logic, such as it is, was:
opening is a way to get rid of small bright spots. A small disk could remove these little noise pixels in the bulk of the image.
denoise_bilateral was a similar logic.
the reconstruction is meant to remove backgrounds and leave only the foreground. Unfortunately this tended to include the noisy pixels.
I'm continuing to fiddle around, but I'd love to hear thoughts on these kinds of "gray on gray" images, because I really want to understand a better approach.
Update: Canny edge detection seems to work quite uncannily.
Good image:
Bad image:
Now the challenge is to count the stuff inside these edges. The hole-filling approach from here (using from scipy import ndimage as ndi and ndi.binary_fill_holes(seg_image)) fails because it fills the wrong areas. But this still seems like a good approach.
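A minimal sketch of that Canny-plus-counting idea (the file name "sem.png" and sigma=2 are placeholders to adapt to your images):
from skimage import io, feature
from scipy import ndimage as ndi

image = io.imread("sem.png", as_gray=True)
edges = feature.canny(image, sigma=2)

# binary_fill_holes, as noted above, can fill the wrong areas;
# labelling the filled regions at least lets you count and filter them by size
filled = ndi.binary_fill_holes(edges)
labels, count = ndi.label(filled)
print(count, "regions found")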

shape detection

I have tried 3 algorithms:
Comparison with compare_ssim.
Difference detection with PIL (ImageChops.difference).
Image subtraction.
The first algorithm:
from skimage.metrics import structural_similarity as compare_ssim

(score, diff) = compare_ssim(img1, img2, full=True)
diff = (diff * 255).astype("uint8")
The second algorithm:
from PIL import Image, ImageChops

img1 = Image.open("canny1.jpg")
img2 = Image.open("canny2.jpg")
diff = ImageChops.difference(img1, img2)
if diff.getbbox():
    diff.show()
The third algorithm:
import cv2

image3 = cv2.subtract(image1, image2)
The problem is that these algorithms are too sensitive. If the images have different noise, they consider the two images totally different. Any ideas to fix that?
These pictures are different in many ways (deformation, lighting, colors, shape) and simple image processing just cannot handle all of this.
I would recommend a higher level method that tries to extract the geometry and color of those tubes, in the form of a simple geometric graph. Then compare the graphs rather than the images.
I acknowledge that this is easier said than done, and will only work with this particular kind of scene.
It is very difficult to help since we don't really know which parameters you can change. Can you keep your camera fixed? Will it always be just about tubes? What about the tube colors?
Nevertheless, I think what you are looking for is a framework for image registration, and I propose you use SimpleElastix. It is mainly used for medical images, so you might have to get familiar with the SimpleITK library. What's interesting is that you have a lot of parameters to control the registration. I think you will have to look into the documentation to find out how to control a specific image frequency, the one that creates the waves and deforms the images. In the example below I did not configure it to allow enough local distortion; you'll have to find the best trade-off, but I think it should be flexible enough.
Anyway, you can get the following result with this code. I don't know if it helps, but I hope so:
import cv2
import numpy as np
import matplotlib.pyplot as plt
import SimpleITK as sitk

# read both images as float grayscale
fixedImage = sitk.ReadImage('1.jpg', sitk.sitkFloat32)
movingImage = sitk.ReadImage('2.jpg', sitk.sitkFloat32)

# affine registration first, then a b-spline (non-rigid) pass
elastixImageFilter = sitk.ElastixImageFilter()
affine_registration_parameters = sitk.GetDefaultParameterMap('affine')
affine_registration_parameters["NumberOfResolutions"] = ['6']
affine_registration_parameters["WriteResultImage"] = ['false']
affine_registration_parameters["MaximumNumberOfSamplingAttempts"] = ['4']

parameterMapVector = sitk.VectorOfParameterMap()
parameterMapVector.append(affine_registration_parameters)
parameterMapVector.append(sitk.GetDefaultParameterMap("bspline"))

elastixImageFilter.SetFixedImage(fixedImage)
elastixImageFilter.SetMovingImage(movingImage)
elastixImageFilter.SetParameterMap(parameterMapVector)
elastixImageFilter.Execute()

# absolute difference between the registered image and the fixed one
registeredImage = elastixImageFilter.GetResultImage()
transformParameterMap = elastixImageFilter.GetTransformParameterMap()
resultImage = sitk.Subtract(registeredImage, fixedImage)
resultImageNp = np.sqrt(sitk.GetArrayFromImage(resultImage) ** 2)

cv2.imwrite('gray_1.png', sitk.GetArrayFromImage(fixedImage))
cv2.imwrite('gray_2.png', sitk.GetArrayFromImage(movingImage))
cv2.imwrite('gray_2r.png', sitk.GetArrayFromImage(registeredImage))
cv2.imwrite('gray_diff.png', resultImageNp)
Your first image resized to 256x256:
Your second image:
Your second image registered with the first one:
Here is the difference between the first and second image which could show what's different:
This is one of the classical problems of image processing, and one that does not have a universally valid answer. The possible answers depend highly on what type of images you have, what type of information you want to extract from them, and what the differences between them are.
You can reduce noise in two ways:
a) Take several images of the same object, such that the object does not change. You can stack the images, and noise is reduced by the square root of the number of images.
b) You can run a blur filter over the image. The more you blur, the more noise is averaged out. Noise is here reduced by the square root of the number of pixels you average over, but so is detail in the images.
In both cases (a) and (b), you run the difference analysis after applying either method; a small sketch follows below.
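A tiny self-contained sketch of both options, using a synthetic noisy scene (the frame count, noise level, and kernel size are arbitrary placeholders):
import cv2
import numpy as np

# synthetic example: ten noisy captures of the same unchanged scene
rng = np.random.default_rng(0)
scene = np.full((100, 100), 128.0)
frames = [scene + rng.normal(0, 20, scene.shape) for _ in range(10)]

# a) stacking: noise drops roughly with the square root of the number of frames
stacked = np.mean(frames, axis=0)

# b) blurring a single frame: noise is averaged over the kernel, but so is detail
blurred = cv2.GaussianBlur(frames[0], (5, 5), 0)

print(frames[0].std(), stacked.std(), blurred.std())   # the noise (std) shrinks in both cases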
Probably not applicable to you, as you likely cannot get hold of either: it helps if you can get flat fields, which capture the inhomogeneity of illumination and the pixel sensitivity of your camera and allow correcting the images prior to any treatment. The same goes for dark fields, which give an estimate of the influence of the camera's read-out noise and allow correcting the images for it.
There is also a third, more high-level option: run your object analysis first, at a detailed-enough level, and compare the results.

Basic pattern recognition in binary (pixelated) image

Here is a cropped example (about 11x9 pixels) of the kind of images (which ultimately are actually all of size 28x28, but stored in memory flattened as 784-component arrays) I will be trying to apply the algorithm to:
Basically, I want to be able to recognize when this shape appears (red lines are used to put emphasis on the separation of the pixels, while the surrounding black border is used to better outline the image against the white background of StackOverflow):
Its orientation doesn't matter: it must be detected in any of its possible representations (rotations and symmetries along the horizontal and vertical axes); so, for example, a 45° rotation shouldn't be considered, nor a diagonal symmetry: only 90°, 180°, and 270° rotations.
There are two solutions to be found in the image I first presented, though only one needs to be found (ignore the gray blur surrounding the white region):
Take this other sample (which also demonstrates that the white figures inside the images aren't always fully surrounded by black pixels):
The function should return True because the shape is present:
Now, there is obviously a simple solution to this:
Use a variable such as pattern = [[1,0,0,0],[1,1,1,1]], produce its variations, and then slide all of the variations along the image until an exact match is found, at which point the whole thing just stops and returns True.
This would, however, in the worst-case scenario, take up to 8*(28-2)*(28-4)*(2*4), which is approximately 40,000 operations for a single image, which seems a bit of an overkill (if I did my quick calculations right).
I'm guessing one way of improving this naive approach would be to first scan the image until I find the very first white pixel, and then start looking for the pattern 4 rows and 4 columns earlier than that point, but even that doesn't seem good enough.
Any ideas? Maybe this kind of function has already been implemented in some library? I'm looking for an implementation or an algorithm that beats my naive approach.
As a side note, while kind of a hack, I'm guessing this is the kind of problem that can be offloaded to the GPU but I do not have much experience with that. While it wouldn't be what I'm looking for primarily, if you provide an answer, feel free to add a GPU-related note.
EDIT:
I ended up making an implementation of the accepted answer. You can see my code in this Gist.
If you have too many operations, think about how to do fewer of them.
For this problem I'd use image integrals.
If you convolve a summing kernel over the image (this is a very fast operation, in the FFT domain or with just conv2/imfilter), you know that only locations where the integral is equal to 5 (in your case) are possible pattern-matching places. Checking those (even for your 4 rotations) should be computationally very fast. There cannot be more than 50 locations in your example image that fit this pattern.
My Python is not too fluent, but this is a proof of concept for your first image in MATLAB; I am sure that translating this code will not be a problem.
% get the same image you have (imgur upscaled it and made it RGB)
I=rgb2gray(imread('https://i.stack.imgur.com/l3u4A.png'));
I=imresize(I,[9 11]);
I=double(I>50);
% Integral filter definition (with your desired size)
h=ones(3,4);
% horizontal and vertical filter (because your filter is not square)
Ifiltv=imfilter(I,h);
Ifilth=imfilter(I,h');
% find the locations where integral is exactly the value you want
[xh,yh]=find(Ifilth==5);
[xv,yv]=find(Ifiltv==5);
% this is just plotting, for completeness
figure()
imshow(I,[]);
hold on
plot(yh,xh,'r.');
plot(yv,xv,'r.');
This results in 14 locations to check. My standard computer takes 230ns on average to compute both image integrals, which I would call fast.
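Since translating should not be a problem, here is a rough Python sketch of the same integral-filter idea using scipy; a random binary array stands in for your actual 28x28 image, and the kernel size and the value 5 follow the MATLAB snippet above:
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)
I = (rng.random((28, 28)) > 0.7).astype(float)      # placeholder for your binarized image

h = np.ones((3, 4))                                 # summing kernel, as in the MATLAB code
Ifiltv = ndimage.convolve(I, h, mode='constant')    # with the 3x4 kernel
Ifilth = ndimage.convolve(I, h.T, mode='constant')  # with the transposed (4x3) kernel

# only windows whose sum equals the 5 white pixels of the pattern can possibly match
candidates_v = np.argwhere(Ifiltv == 5)
candidates_h = np.argwhere(Ifilth == 5)
print(len(candidates_v) + len(candidates_h), "locations to check")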
Also, GPU computing is not a hack :D. It's the way to go for a big bunch of problems, because of the enormous computing power GPUs have. E.g. convolutions on GPUs are incredibly fast.
The operation you are implementing is an operator in Mathematical Morphology called hit and miss.
It can be implemented very efficiently as a composition of two erosions. If the shape you're detecting can be decomposed into a few simple geometrical shapes (rectangles especially are quick to compute), then the operator can be even more efficient.
You’ll find very efficient erosions in most image processing libraries, for example try OpenCV. OpenCV also has a hit and miss operator, here is a tutorial for how to use it.
As an example for what output to expect, I generated a simple test image (left), applied a hit and miss operator with a template that matches at exactly one place in the image (middle), and again with a template that does not match anywhere (right):
I did this in MATLAB, not Python, because I have it open and it's easiest for me to use. This is the code:
se = [1,1,1,1 % Defines the template
0,0,0,1];
img = [0,0,0,0,0,0 % Defines the test image
0,1,1,1,1,0
0,0,0,0,1,0
0,0,0,0,0,0
0,0,0,0,0,0
0,0,0,0,0,0];
img = dip_image(img,'bin');
res1 = hitmiss(img,se);
res2 = hitmiss(img,rot90(se,2));
% Quick-and-dirty display
h = dipshow([img,res1,res2]);
diptruesize(h,'tight',3000)
hold on
plot([5.5,5.5],[-0.5,5.5],'r-')
plot([11.5,11.5],[-0.5,5.5],'r-')
The code above uses the hit and miss operator as I implemented it in DIPimage. The same implementation is available in DIPlib's Python bindings as dip.HitAndMiss() (install with pip install diplib):
import diplib as dip
# ...
res = dip.HitAndMiss(img, se)
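And since OpenCV's hit-or-miss operator was mentioned, here is a minimal sketch with cv2.morphologyEx, using the same tiny test image and template as the MATLAB example above (in the kernel, 1 = must be foreground, -1 = must be background, 0 = don't care):
import cv2
import numpy as np

img = np.array([[0, 0, 0, 0, 0, 0],
                [0, 1, 1, 1, 1, 0],
                [0, 0, 0, 0, 1, 0],
                [0, 0, 0, 0, 0, 0],
                [0, 0, 0, 0, 0, 0],
                [0, 0, 0, 0, 0, 0]], dtype=np.uint8) * 255

kernel = np.array([[ 1,  1,  1,  1],
                   [-1, -1, -1,  1]], dtype="int")

hit = cv2.morphologyEx(img, cv2.MORPH_HITMISS, kernel)
print(np.argwhere(hit > 0))   # coordinates (at the kernel anchor) where the template matches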
