I am working on a Sudoku solver that takes input from a video camera (laptop webcam), processes it, parses the Sudoku image as a list of lists, solves it, and projects the solution back onto the sheet.
I am now at the point where I need to recognize each digit from the image. I'm using the MNIST dataset to train my model, which expects each input image in the shape (28, 28, 1). I can successfully locate and extract each digit, but performing any kind of thresholding on the digit leaves a lot of noise around it, which ultimately leads to misclassification by my model.
Is there any method to get rid of the white noise and extract only the digit from the square before feeding it to the Keras model?
I think this can be achieved with cv2.connectedComponentsWithStats, by extracting the largest connected component, but I do not know how the method works (the arguments it expects or the output it produces), and I couldn't find a good explanation of how to use it.
If there is an alternative to cv2.connectedComponentsWithStats that produces better results, please do suggest it; if not, please explain how cv2.connectedComponentsWithStats works, or point me towards a good resource that helps me understand it and how to use it for my specific case.
PS: If you think MNIST isn't a good dataset for this task, please do tell why, and suggest any other dataset that may achieve the task of recognizing digits.
To remove the noise you can use an erosion. It filters out isolated white pixels and "fills in" small white gaps: every white area shrinks, and very small areas disappear entirely. Digits will look thinner.
You can then dilate to get an image closer to the original one (the thinned digits become thicker again and look like the original, even if small differences remain).
This operation is known as an opening. See https://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_imgproc/py_morphological_ops/py_morphological_ops.html
Example:
import cv2
import numpy as np

img = cv2.imread('input.jpg', 0)                      # load as grayscale
kernel = np.ones((5, 5), np.uint8)                    # 5x5 structuring element
erosion = cv2.erode(img, kernel, iterations=1)        # shrink white areas, removing small noise
dilation = cv2.dilate(erosion, kernel, iterations=1)  # grow them back to roughly original size
Edit: a (3, 3) kernel for the dilation makes the image less blurry.
Input
Erosion
Dilation
Just ignore the small blobs (small width, small height, and/or small area). Likewise, you can ignore the very large ones.
To skip the grid lines, it is advisable to reconstruct the grid geometry (use the characters to locate the grid columns/rows, and possibly detect the long straight lines), and only keep the blobs wholly inside a cell.
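A minimal sketch of the blob filtering with cv2.connectedComponentsWithStats, combined with the largest-component idea from the question (the filename and area threshold are illustrative assumptions):

import cv2
import numpy as np

# 'cell.png' is an assumed filename for one binarized cell: white digit on black.
cell = cv2.imread('cell.png', 0)
_, cell = cv2.threshold(cell, 127, 255, cv2.THRESH_BINARY)

n, labels, stats, _ = cv2.connectedComponentsWithStats(cell, connectivity=8)
# stats[i] = [left, top, width, height, area] for component i; label 0 is the background.
digit = np.zeros_like(cell)
if n > 1:
    areas = stats[1:, cv2.CC_STAT_AREA]       # areas of all non-background blobs
    largest = 1 + int(np.argmax(areas))       # label of the biggest blob
    if areas.max() > 20:                      # illustrative noise threshold
        digit[labels == largest] = 255        # keep only the digit, drop the noise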
I'm using YOLO v8 to detect subjects in pictures. It's working well and can create quite precise masks over subjects.
from ultralytics import YOLO
model = YOLO('yolov8x-seg.pt')
for output in model('image.jpg', return_outputs=True):
    for segment in output['segment']:
        print(segment)
The code above works and generates a series of "segments", which are lists of points that define the shape of subjects in my image. That shape is not convex (horses, for example).
I need to figure out if a random coordinate on the image falls within these segments, and I'm not sure how to do it.
My first approach was to build an image mask using PIL. That roughly worked, but depending on the shape of the segments it sometimes fails. I also thought about using shapely, but it has restrictions on its Polygon classes, which I think will be a problem in some cases.
In any case, this really feels like a problem that could easily be solved with the tools I'm already using (yolo, pytorch, numpy...), but to be honest I'm too new to all this to figure out how to do it properly.
Any suggestion is appreciated :)
You should be able to get a segmentation mask from your model: imagine a binary image where black (zeros) represents the background and white (or any other non-zero value) represents an instance of a segmentation class.
Once you have the binary image you can use OpenCV's findContours function to get the largest outer contour.
Once you have that contour you can use pointPolygonTest() to check whether a point is inside it or not.
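A minimal sketch of those two steps (the toy triangular mask here stands in for a real YOLO instance mask):

import cv2
import numpy as np

# Toy mask standing in for a real instance mask.
mask = np.zeros((100, 100), np.uint8)
cv2.fillPoly(mask, [np.array([[10, 10], [90, 10], [50, 80]], dtype=np.int32)], 255)

contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
largest = max(contours, key=cv2.contourArea)            # keep the largest outer contour

# pointPolygonTest returns +1 inside, -1 outside, 0 exactly on the contour.
print(cv2.pointPolygonTest(largest, (50.0, 30.0), False))   # 1.0  -> inside
print(cv2.pointPolygonTest(largest, (5.0, 5.0), False))     # -1.0 -> outside

Note that this works for non-convex shapes as well, since the test is done against the actual contour rather than its convex hull.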
I have written an algorithm that solves the Pluszle game matrix.
The input is a NumPy array.
Now I want to recognize the digits of the matrix from a screenshot.
There are different levels; this is a hard one:
And this is an easy one:
The output of the recognition should be a NumPy array:
array([[6, 2, 4, 2],
       [7, 8, 9, 7],
       [1, 2, 4, 4],
       [7, 2, 4, 0]])
I have tried to feed the last image to Tesseract:
from PIL import Image
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
print(pytesseract.image_to_string(Image.open('C:/Users/79017/screen_plus.jpg')))
The output is unacceptable:
LEVEL 4
(}00:03 M0
J] —.°—#—#©
I think that I should use contours from OpenCV, because the font is always the same. Maybe I should save a contour for every digit, then save every contour that exists on the screenshot, and then somehow build the matrix from the coordinates of every digit contour. But I have no idea how to do it.
1- Binarize
Tesseract needs you to binarize the image first. No need for contours or any convolution here; just a threshold should do. Especially considering that you are trying to che... I mean, win intelligently at a specific game, I guess you are open to some ad-hoc adjustments.
For example, (hard<240).any(axis=2) puts in white (True) everything that is not white in the original image, and in black the white parts.
Note that you don't get the sums here (or whatever they are; I don't know what this game is), which are, on the contrary, almost black areas.
But you can get them with another filter:
(hard>120).any(axis=2)
You could merge those filters, obviously
(hard<240).any(axis=2) & (hard>120).any(axis=2)
But that may not be a good idea: after all, keeping them separate gives you an opportunity to distinguish two different kinds of data, which you may want to do.
2- Restrict
Secondly, you know you are looking for digits, so restrict Tesseract to digits by adding config='digits' to your pytesseract arguments.
pytesseract.image_to_string((hard>240).all(axis=2))
# 'LEVEL10\nNOVEMBER 2022\n\n™\noe\nOs\nfoo)\nso\n‘|\noO\n\n9949 6 2 2 8\n\nN W\nN ©\nOo w\nVon\n+? ah ®)\nas\noOo\n©\n\n \n\x0c'
pytesseract.image_to_string((hard>240).all(axis=2), config='digits')
# '10\n2022\n\n99496228\n\n17\n-\n\n \n\x0c'
3- Don't use image_to_string
Use image_to_data preferably. It gives you the bounding boxes of the text.
Or even image_to_boxes, which gives you the digits one by one, with coordinates.
image_to_string is for when you have a good old linear text in the image; image_to_data and image_to_boxes assume that text is distributed all around, and give you pieces of text with their positions.
image_to_string on such an image may scramble what you would consider the logical order.
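For example, a minimal image_to_data sketch (reusing the binarization from step 1; the uint8 conversion is my addition, to keep PIL happy):

import cv2
import pytesseract
from pytesseract import Output

hard = cv2.imread("hard.jpg")
bin_img = ((hard < 240).any(axis=2) * 255).astype("uint8")   # same filter as above, as uint8

data = pytesseract.image_to_data(bin_img, config='digits', output_type=Output.DICT)
for text, x, y, w, h in zip(data['text'], data['left'], data['top'],
                            data['width'], data['height']):
    if text.strip():
        print(text, (x, y, w, h))    # each recognized token with its bounding box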
4- Select areas yourself
Since it is an ad-hoc usage for a specific application, you know where the data are.
For example, your main matrix seems to be in the area
hard[740:1512, 132:910]
See
print(pytesseract.image_to_boxes((hard[740:1512, 132:910]<240).any(axis=2), config='digits'))
Not only does it avoid flooding you with irrelevant data, but Tesseract also performs better when called on an image that contains nothing but what you want to read.
It seems to find almost all your digits here.
5- Don't expect miracles
Tesseract is one of the best OCRs. But OCR is never a sure thing...
See what I get with this code (summarizing what I've said so far), printing in red the digits detected by Tesseract just next to where they were found in the real image.
import cv2
import matplotlib.pyplot as plt
import numpy as np
import pytesseract

hard = cv2.imread("hard.jpg")
hard = hard[740:1512, 132:910]                 # restrict to the main matrix area
# Convert the boolean mask to uint8 so that PIL/pytesseract can handle it.
bin_img = ((hard < 240).any(axis=2) * 255).astype(np.uint8)
boxes = [s.split(' ') for s in
         pytesseract.image_to_boxes(bin_img, config='digits').split('\n')[:-1]]
out = hard.copy()  # avoid altering the original image, in case we retry with other parameters
H = len(hard)      # image_to_boxes coordinates are measured from the bottom-left corner
for b in boxes:
    cv2.putText(out, b[0], (30 + int(b[1]), H - int(b[2])),
                cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)
plt.imshow(cv2.cvtColor(out, cv2.COLOR_BGR2RGB))
plt.show()
As you can see, the results are fairly good. But there are 5 missing numbers, and one 3 was read as "3.".
For this kind of ad-hoc reading of an app, I wouldn't even use Tesseract. I am pretty sure that, with trial and error, you can easily learn to extract each digit's box yourself (they are linearly spaced in both dimensions).
And then, inside each box, there are only 9 possible values. It should be quite easy, on a generated image, to find some simple criteria, such as the total number of white pixels, the number of white pixels in the top area, etc., that permit a very simple classification.
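A rough sketch of that idea (all names, grid dimensions, and features here are illustrative assumptions):

import numpy as np

def split_cells(matrix_img, rows=4, cols=4):
    # The boxes are linearly spaced, so equal slicing recovers them.
    h, w = matrix_img.shape[:2]
    return [[matrix_img[r*h//rows:(r+1)*h//rows, c*w//cols:(c+1)*w//cols]
             for c in range(cols)] for r in range(rows)]

def features(cell):
    # Simple criteria: total white pixels and white pixels in the top half.
    white = cell > 0
    return (int(white.sum()), int(white[:white.shape[0] // 2].sum()))

def classify(cell, templates):
    # templates maps each digit to features measured once on known cells;
    # classification is then a nearest-neighbour lookup on those features.
    f = features(cell)
    return min(templates, key=lambda d: sum(abs(a - b) for a, b in zip(templates[d], f)))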
You might want to pre-process the image first. By applying a filter, you can, for example, get the contours of an image.
The basic idea of a filter is to 'slide' a small matrix of values over the image and, at each position, multiply the pixel values under it by the matrix values and sum the results. This process is called convolution.
Convolution helps out here because all irrelevant information is discarded, which makes it easier for Tesseract to 'read' the image.
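A minimal sketch of such a filter with OpenCV (the filename and kernel are illustrative):

import cv2
import numpy as np

img = cv2.imread('input.jpg', cv2.IMREAD_GRAYSCALE)   # hypothetical input
# A classic edge-detection kernel: strong response at contours, near zero in flat areas.
kernel = np.array([[-1, -1, -1],
                   [-1,  8, -1],
                   [-1, -1, -1]], dtype=np.float32)
edges = cv2.filter2D(img, -1, kernel)                 # slide the kernel over the image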
This might help you out: https://medium.com/swlh/image-processing-with-python-convolutional-filters-and-kernels-b9884d91a8fd
I am trying to approximate different shapes of a weld bead geometry cross section in additive manufacturing with a graph or ideally (but not necessarily) a function. The regions are the outer shape as well as the individual layers. (see following images)
Therefore, I applied some pre-processing methods to extract the relevant pixels, which represent the geometry of a weld bead and are shown as white pixels (see third image).
I derived this image with Canny edge detection and, prior to that, multiple morphological operations such as closing, erosion, and dilation, after of course converting the image to grayscale.
The "noisy" areas are the transition areas between individual layers of metal and only show up in this way, so in general there is no "better" or "sharper" transition, and thus no less "noise", to be had. Pictures 3 and 4 are an example of some of the image pre-processing methods I used.
My main approach to treating the inner geometry so far has been to split the image into several sub-images and perform least squares regression on each individual one, interpreting the white pixels as data points. Afterwards, I stitched all those little approximation functions back together to form an image of the original size. I've tried it with different sizes of those sub-images (see pictures 5 and 6).
However, this approach produces jumps between the functions, as well as multiple functions next to each other in places where the pixels (data points, in my case) should be approximated by a single function (see attached image). My next approach would be to use multivariate adaptive regression on the sub-images.
Thus, I'm asking whether anybody knows a better solution to my problem, maybe even an approximation on a global scale without splitting the image into sub-images. The approximation does not need to be a polynomial function; piecewise linear but connected functions are totally sufficient. I would be thankful if anybody knows a method capable of achieving what I want, whether it is a pure non-linear regression method or something else. Unfortunately I don't have many images (only 64), hence I don't think I can use an ANN (please correct me if I'm wrong).
If you need to take a look at my code, just let me know. Thank you! :)
The best result I could obtain is with bilateral filtering for denoising, then adaptive binarization.
And on a reduced image:
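A minimal sketch of that pipeline (the filename and parameter values are illustrative, not the ones actually used):

import cv2

img = cv2.imread('weld_bead.png', cv2.IMREAD_GRAYSCALE)
# Bilateral filter: denoises while preserving edges better than a Gaussian blur would.
smooth = cv2.bilateralFilter(img, 9, 75, 75)
# Adaptive binarization: the threshold is computed per local neighbourhood.
binary = cv2.adaptiveThreshold(smooth, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                               cv2.THRESH_BINARY, 11, 2)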
Here is a cropped example (about 11x9 pixels) of the kind of images I will be applying the algorithm to (ultimately they are all of size 28x28, stored in memory flattened as 784-component arrays):
Basically, I want to be able to recognize when this shape appears (red lines are used to put emphasis on the separation of the pixels, while the surrounding black border is used to better outline the image against the white background of StackOverflow):
The orientation of it doesn't matter: it must be detected in any of its possible representations (rotations and symmetries along the horizontal and vertical axes), so, for example, a 45° rotation shouldn't be considered, nor a diagonal symmetry: only consider 90°, 180°, and 270° rotations.
There are two solutions to be found in that first image, though only one needs to be found (ignore the gray blur surrounding the white region):
Take this other sample (which also demonstrates that the white figures inside the images aren't always fully surrounded by black pixels):
The function should return True because the shape is present:
Now, there is obviously a simple solution to this:
Use a variable such as pattern = [[1,0,0,0],[1,1,1,1]], produce its variations, and then slide all of the variations along the image until an exact match is found at which point the whole thing just stops and returns True.
This would, however, in the worst-case scenario, take up to 8*(28-2)*(28-4)*(2*4), which is approximately 40,000 operations for a single image; that seems a bit overkill (if I did my quick calculations right).
I'm guessing one way to improve this naive approach would be to first scan the image until I find the very first white pixel, and only then start looking for the pattern 4 rows and 4 columns before that point, but even that doesn't seem good enough.
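For reference, a minimal sketch of this naive approach (the helper names are mine):

import numpy as np

def contains_pattern(img, pattern):
    # Generate all 90° rotations and horizontal/vertical mirrorings, deduplicated.
    variants = {v.tobytes() + bytes(v.shape): v
                for p in (pattern, np.fliplr(pattern))
                for v in (p, np.rot90(p), np.rot90(p, 2), np.rot90(p, 3))}
    for v in variants.values():
        ph, pw = v.shape
        for r in range(img.shape[0] - ph + 1):
            for c in range(img.shape[1] - pw + 1):
                if np.array_equal(img[r:r+ph, c:c+pw], v):
                    return True
    return False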
Any ideas? Maybe this kind of function has already been implemented in some library? I'm looking for an implementation or an algorithm that beats my naive approach.
As a side note: while it would be kind of a hack, I'm guessing this is the kind of problem that could be offloaded to the GPU, but I do not have much experience with that. While it wouldn't be what I'm looking for primarily, if you provide an answer, feel free to add a GPU-related note.
EDIT:
I ended up making an implementation of the accepted answer. You can see my code in this Gist.
If you have too many operations, think about how to do fewer of them.
For this problem I'd use image integrals.
If you convolve a summing kernel over the image (a very fast operation in the FFT domain, or with just conv2/imfilter), you know that only locations where the integral equals 5 (in your case, the number of white pixels in the pattern) are possible pattern-matching places. Checking those (even for your 4 rotations) should be computationally very fast: there cannot be more than 50 locations in your example image that fit this pattern.
My Python is not too fluent, but here is a proof of concept for your first image in MATLAB; I am sure that translating this code will not be a problem.
% get the same image you have (imgur upscaled it and made it RGB)
I=rgb2gray(imread('https://i.stack.imgur.com/l3u4A.png'));
I=imresize(I,[9 11]);
I=double(I>50);
% Integral filter definition (with your desired size)
h=ones(3,4);
% horizontal and vertical filter (because your filter is not square)
Ifiltv=imfilter(I,h);
Ifilth=imfilter(I,h');
% find the locations where integral is exactly the value you want
[xh,yh]=find(Ifilth==5);
[xv,yv]=find(Ifiltv==5);
% this is just plotting, for completeness
figure()
imshow(I,[]);
hold on
plot(yh,xh,'r.');
plot(yv,xv,'r.');
This results in 14 locations to check. My standard computer takes 230 ns on average to compute both image integrals, which I would call fast.
Also, GPU computing is not a hack :D. It's the way to go for a big bunch of problems because of the enormous computing power GPUs have. E.g., convolutions on GPUs are incredibly fast.
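A rough Python translation of the proof of concept above (using NumPy/SciPy; the toy image here is illustrative):

import numpy as np
from scipy.ndimage import convolve

# Toy binary image standing in for the thresholded input.
I = np.zeros((9, 11))
I[3, 2] = 1
I[4, 2:6] = 1          # one occurrence of the pattern (5 white pixels)

h = np.ones((3, 4))    # summing kernel, same size as in the MATLAB snippet
# Candidate locations: window sum equals the number of ones in the pattern (5).
Ifiltv = convolve(I, h, mode='constant')     # same as imfilter(I, h)
Ifilth = convolve(I, h.T, mode='constant')   # same as imfilter(I, h')
print(np.argwhere(Ifiltv == 5))              # (row, col) candidates, one orientation
print(np.argwhere(Ifilth == 5))              # (row, col) candidates, the other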
The operation you are implementing is an operator in mathematical morphology called hit-and-miss.
It can be implemented very efficiently as a composition of two erosions. If the shape you're detecting can be decomposed into a few simple geometrical shapes (rectangles especially are quick to compute), then the operator can be even more efficient.
You'll find very efficient erosions in most image processing libraries; for example, try OpenCV. OpenCV also has a hit-and-miss operator, here is a tutorial for how to use it.
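For example, a minimal sketch with OpenCV's operator (the toy image is illustrative; in the kernel, 1 means "must be foreground", -1 "must be background", 0 "don't care"):

import cv2
import numpy as np

# Kernel matching the pattern [[1,0,0,0],[1,1,1,1]] exactly (one orientation only;
# repeat with the rotated/mirrored kernels for the other representations).
kernel = np.array([[1, -1, -1, -1],
                   [1,  1,  1,  1]], dtype=int)

img = np.zeros((9, 11), np.uint8)
img[4, 3] = 255
img[5, 3:7] = 255                   # one occurrence of the shape

hits = cv2.morphologyEx(img, cv2.MORPH_HITMISS, kernel)
print(cv2.countNonZero(hits) > 0)   # True -> the pattern occurs somewhere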
As an example for what output to expect, I generated a simple test image (left), applied a hit and miss operator with a template that matches at exactly one place in the image (middle), and again with a template that does not match anywhere (right):
I did this in MATLAB, not Python, because I have it open and it's easiest for me to use. This is the code:
se = [1,1,1,1    % Defines the template
      0,0,0,1];
img = [0,0,0,0,0,0    % Defines the test image
       0,1,1,1,1,0
       0,0,0,0,1,0
       0,0,0,0,0,0
       0,0,0,0,0,0
       0,0,0,0,0,0];
img = dip_image(img,'bin');
res1 = hitmiss(img,se);
res2 = hitmiss(img,rot90(se,2));
% Quick-and-dirty display
h = dipshow([img,res1,res2]);
diptruesize(h,'tight',3000)
hold on
plot([5.5,5.5],[-0.5,5.5],'r-')
plot([11.5,11.5],[-0.5,5.5],'r-')
The code above uses the hit-and-miss operator as I implemented it in DIPimage. The same implementation is available in DIPlib's Python bindings as dip.HitAndMiss() (install with pip install diplib):
import diplib as dip
# ...
res = dip.HitAndMiss(img, se)
I'm a newbie in computer vision. My goal is to distinguish individual cells in a set of pictures like this: Example
Basically, I blur the whole image, find the regional maxima on it, and use them as seeds in a watershed algorithm applied to the distance transform of the thresholded blurred image. In fact, I'm following the tutorial you can find here:
github/luispedro/python-image-tutorial
(sorry, can't post more than 2 links).
My problem is that some cells in my set have a very distinguishable dark nucleus (which you can see in the example), and there my algorithm produces results like this, which are clearly wrong.
Of course it's possible to fix this by increasing the strength of the Gaussian blur, but that will merge some other cells together, which is even worse.
What can be done to solve this problem? What are the other possibilities if watershed just isn't suitable for this case (keeping in mind that my set is pretty small, so learning-based approaches seem impossible)?
The watershed tends to over-segment if you don't use a watershed with markers.
Usually, we start with the DNA/DAPI segmentation, which is easy, and it provides the number of cells and the inner markers for the watershed.
If you blur the images, you smooth away all the patterns. You should instead use an alternate sequential filter (opening / closing) to simplify each zone, and then try an ultimate erosion to find the inner seeds for your watershed.
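A minimal sketch of such a marker-controlled watershed with OpenCV (the filename, structuring element, and 0.6 threshold are illustrative choices; the distance-transform peaks stand in for the ultimate erosion here):

import cv2
import numpy as np

img = cv2.imread('cells.png')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Alternate sequential filter: an opening followed by a closing simplifies each zone.
se = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
simplified = cv2.morphologyEx(binary, cv2.MORPH_OPEN, se)
simplified = cv2.morphologyEx(simplified, cv2.MORPH_CLOSE, se)

# Inner seeds from the distance transform (its peaks approximate the ultimate erosion).
dist = cv2.distanceTransform(simplified, cv2.DIST_L2, 5)
_, sure_fg = cv2.threshold(dist, 0.6 * dist.max(), 255, 0)
sure_fg = sure_fg.astype(np.uint8)
sure_bg = cv2.dilate(simplified, se, iterations=3)
unknown = cv2.subtract(sure_bg, sure_fg)

# Markers: background = 1, each seed = 2..n, 0 where the watershed must decide.
_, markers = cv2.connectedComponents(sure_fg)
markers = markers + 1
markers[unknown == 255] = 0
markers = cv2.watershed(img, markers)    # cell boundaries end up labelled -1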