I have this specific 3-digit captcha, like:
I am trying to slice it into its 3 digits. I tried the pytesseract module to recognize the text in the image, but it's not very accurate. After some research I found that I could make the background completely white, crop all the extra space from the picture, and then divide the picture into 3 pieces, which would most likely give me what I need. So I'm looking for a way to implement this filter, crop the image, and slice it into three pieces.
I found out that the PIL module can help me load the image in Python:
from PIL import Image
im = Image.open("captcha.jpg")
and I'm looking for a way to make the background totally white, crop the extra space, and divide the picture into three pieces. Thanks for your guidance in advance.
So I have found this library called cv2 (OpenCV), which has a method called threshold.
For every pixel, the same threshold value is applied. If the pixel value is smaller than the threshold, it is set to 0, otherwise it is set to a maximum value.
import cv2 as cv

img = cv.imread('gradient.png', 0)  # 0 = load as grayscale
ret, thresh1 = cv.threshold(img, 127, 255, cv.THRESH_BINARY)
In the example above it takes an image and, for every pixel, if the value is below 127 the pixel is set to 0 (black); otherwise it is set to 255 (white).
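As a rough sketch of how this could be applied to the captcha (the filename captcha.jpg and the exact threshold value are assumptions to be tuned for your image): threshold the image so the light background becomes pure white, find the bounding box of the remaining dark pixels, crop to it, and split the crop into three equal slices.

import cv2

# Load the captcha in grayscale (filename taken from the question).
img = cv2.imread("captcha.jpg", 0)

# Make the light background pure white and keep the digits dark.
# 127 is only a guess; adjust it for your captcha.
_, bw = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)

# Bounding box of all non-white pixels, so the extra space can be cropped.
ys, xs = (bw < 255).nonzero()
cropped = bw[ys.min():ys.max() + 1, xs.min():xs.max() + 1]

# Split the cropped strip into three equal-width pieces, one per digit.
w = cropped.shape[1] // 3
pieces = [cropped[:, i * w:(i + 1) * w] for i in range(3)]
for i, piece in enumerate(pieces):
    cv2.imwrite(f"digit_{i}.png", piece)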
further reading:
https://docs.opencv.org/4.x/d7/d4d/tutorial_py_thresholding.html
Related
I have written an algorithm that solves the Pluszle game matrix.
The input is a numpy array.
Now I want to recognize the digits of the matrix from a screenshot.
There are different levels; this is a hard one:
And this is an easy one:
The output of the recognition should be a numpy array:
array([[6, 2, 4, 2],
       [7, 8, 9, 7],
       [1, 2, 4, 4],
       [7, 2, 4, 0]])
I have tried to feed the last image to tesseract:
from PIL import Image
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
print(pytesseract.image_to_string(Image.open('C:/Users/79017/screen_plus.jpg')))
The output is unacceptable
LEVEL 4
(}00:03 M0
J] —.°—#—#©
I think that I should use contours from OpenCV, because the font is always the same. Maybe I should save a contour for every digit, then save every contour that exists on the screenshot, and then somehow build the matrix from the coordinates of every digit contour. But I have no idea how to do it.
1- Binarize
Tesseract needs you to binarize the image first. No need for contours or any convolution here; just a threshold should do. Especially considering that you are trying to che... I mean win intelligently at a specific game. So I guess you are open to some ad-hoc adjustments.
For example, (hard<240).any(axis=2) puts in white (True) everything that is not white in the original image, and in black (False) the white parts.
Note that you don't get the sums here (or whatever they are, I don't know what this game is), which are, on the contrary, almost-black areas.
But you can get them with another filter:
(hard>120).any(axis=2)
You could merge those filters, obviously
(hard<240).any(axis=2) & (hard>120).any(axis=2)
But that may not be a good idea: after all, keeping them separate gives you an opportunity to distinguish two different kinds of data, which you may want to do.
2- Restrict
Secondly, you know you are looking for digits, so restrict the search to digits by adding config='digits' to your pytesseract arguments.
pytesseract.image_to_string((hard>240).all(axis=2))
# 'LEVEL10\nNOVEMBER 2022\n\n™\noe\nOs\nfoo)\nso\n‘|\noO\n\n9949 6 2 2 8\n\nN W\nN ©\nOo w\nVon\n+? ah ®)\nas\noOo\n©\n\n \n\x0c'
pytesseract.image_to_string((hard>240).all(axis=2), config='digits')
# '10\n2022\n\n99496228\n\n17\n-\n\n \n\x0c'
3- Don't use image_to_string
Use image_to_data preferably.
It gives you bounding boxes of text.
Or even image_to_boxes which give you digits one by one, with coordinates
Because image_to_string is for when you have good old linear text in the image. image_to_data and image_to_boxes assume that text is distributed all around, and give you pieces of text with their positions.
image_to_string on such an image may invert what you would consider the logical order.
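For example, a minimal sketch of image_to_data (the screenshot path is the one from the question; treating every non-empty row of its output with a positive confidence as a detection is an assumption):

import pytesseract
from PIL import Image

# image_to_data returns one row per detected word, with
# left/top/width/height coordinates and a confidence score.
data = pytesseract.image_to_data(
    Image.open("C:/Users/79017/screen_plus.jpg"),
    config="digits",
    output_type=pytesseract.Output.DICT,
)

for text, left, top, conf in zip(data["text"], data["left"], data["top"], data["conf"]):
    if text.strip() and float(conf) > 0:  # keep non-empty detections only
        print(text, "at", (left, top))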
4- Select areas yourself
Since it is an ad-hoc usage for a specific application, you know where the data are.
For example, your main matrix seems to be in area
hard[740:1512, 132:910]
See
print(pytesseract.image_to_boxes((hard[740:1512, 132:910]<240).any(axis=2), config='digits'))
Not only does it avoid flooding you with irrelevant data; tesseract also performs better when called on an image that contains nothing other than what you want to read.
It seems to find almost all your digits here.
5- Don't expect miracles
Tesseract is one of the best OCR engines. But OCR is not a sure thing...
See what I get with this code (summarizing what I've said so far), printing in red the digits detected by tesseract, just next to where they were found in the real image.
import cv2
import matplotlib.pyplot as plt
import numpy as np
import pytesseract
hard=cv2.imread("hard.jpg")
hard=hard[740:1512, 132:910]
bin=(hard<240).any(axis=2)
boxes=[s.split(' ') for s in pytesseract.image_to_boxes(bin, config='digits').split('\n')[:-1]]
out=hard.copy() # Just to avoid altering original image, in case we want to retry with other parameters
H=len(hard)
for b in boxes:
    cv2.putText(out, b[0], (30+int(b[1]), H-int(b[2])), cv2.FONT_HERSHEY_SIMPLEX, 1, (0,0,255), 2)
plt.imshow(cv2.cvtColor(out,cv2.COLOR_BGR2RGB))
plt.show()
As you can see, results are fairly good. But there are 5 missing numbers, and one 3 was read as "3.".
For this kind of ad-hoc reading of an app, I wouldn't even use tesseract. I am pretty sure that, with trial and error, you can easily learn to extract each digit's box yourself (they are linearly spaced in both dimensions).
And then, inside each box, well, there are only 9 possible values. It should be quite easy, on a generated image, to find some simple criteria, such as the number of white pixels, the number of white pixels in the top area, ..., that permit a very simple classification.
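A minimal sketch of that idea (the crop coordinates reuse the ones above; the 4x4 grid, the assumption of evenly spaced cells, and the two pixel-count features are guesses you would tune against your own generated screenshots):

import cv2
import numpy as np

# Crop of the main matrix, binarized as before.
hard = cv2.imread("hard.jpg")
grid = (hard[740:1512, 132:910] < 240).any(axis=2).astype(np.uint8)

rows, cols = 4, 4                      # the matrix in the question is 4x4
h, w = grid.shape
cell_h, cell_w = h // rows, w // cols  # cells assumed evenly spaced

features = []
for r in range(rows):
    for c in range(cols):
        cell = grid[r * cell_h:(r + 1) * cell_h, c * cell_w:(c + 1) * cell_w]
        # Two crude features: total ink, and ink in the top half of the cell.
        features.append((int(cell.sum()), int(cell[:cell_h // 2].sum())))

# With reference screenshots of each digit you could build a lookup table of
# feature values and classify each cell by its nearest feature match.
print(np.array(features).reshape(rows, cols, 2))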
You might want to pre-process the image first. By applying a filter, you can, for example, get the contours of an image.
The basic idea of a filter is to 'slide' a small matrix of values (the kernel) over the image and, at each position, multiply the pixel values by the values in the matrix and sum them up. This process is called convolution.
Convolution helps here because irrelevant information is discarded, which makes it easier for tesseract to 'read' the image.
This might help you out: https://medium.com/swlh/image-processing-with-python-convolutional-filters-and-kernels-b9884d91a8fd
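As a small sketch of that idea (the kernel below is a standard Laplacian-style edge-detection kernel; the input filename is an assumption):

import cv2
import numpy as np

img = cv2.imread("screen_plus.jpg", cv2.IMREAD_GRAYSCALE)

# A simple edge-detection kernel: the convolution response is strong where
# neighbouring pixels differ, i.e. along the contours of the digits.
kernel = np.array([[-1, -1, -1],
                   [-1,  8, -1],
                   [-1, -1, -1]], dtype=np.float32)

edges = cv2.filter2D(img, -1, kernel)
cv2.imwrite("edges.png", edges)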
I have the following image:
I am attempting to use handwritten OCR to capture this number, and for some images, I need to manually rotate the image.
The code I am using to rotate this image is the following:
When I execute this code, the following image is the result:
I want the white background surrounding the 1 to completely encase the 1. I do not want the upper left and upper right of the image to have the black areas (triangles).
I have tried to adjust the resizing, but this does not seem to help. Does anyone have an idea as to what I can do to prevent this?
As already pointed out by fmw42 in the comments, the API you are using has optional arguments to deal with this.
Please peruse the docs for the API you use: skimage.transform.rotate
The docs offer a mode, which says how to fill in those pixels that don't come from the source image.
The modes replicate/edge and constant should be of interest to you.
edge: pads with the edge values of the array.
constant (default): pads with a constant value, in conjunction with the cval argument (default 0, i.e. black).
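A minimal sketch of using those arguments (the filename and angle are assumptions; mode='edge' fills the revealed corners from the border pixels, while mode='constant' with cval=1 would fill them with white):

from skimage import io
from skimage.transform import rotate
from skimage.util import img_as_ubyte

img = io.imread("digit.png")

# Fill the corners revealed by the rotation from the border pixels
# instead of the default constant black (mode='constant', cval=0).
rotated = rotate(img, angle=15, resize=True, mode='edge')

io.imsave("digit_rotated.png", img_as_ubyte(rotated))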
I'm working on object detection in Python with OpenCV.
I have two pictures:
The reference picture with no object in it.
The picture with the object.
The result of differencing the two images is:
The problem is that the pattern of the reference image is now on my objects. I want to remove this pattern and I don't know how to do it. For further image processing I need the correct outline of the objects.
Maybe you know how to fix it, or have better ideas to extract the object.
I would be glad for your help.
Edit: 4. A black object:
As @Mark Setchell commented, the difference of the two images shows which pixels contain the object; you shouldn't try to use it as the output directly. Instead, find the pixels with a significant difference, and then read those pixels directly from the input image.
Here, I'm using Otsu thresholding to find what "significant difference" is. There are many other ways to do this. I then use the inverse of the mask to blank out pixels in the input image.
import PyDIP as dip
bg = dip.ImageReadTIFF('background.tif')
bg = bg.TensorElement(1) # The image has 3 channels, let's use just the green one
fg = dip.ImageReadTIFF('object.tif')
fg = fg.TensorElement(1)
mask = dip.Abs(bg - fg) # Difference between the two images
mask, t = dip.Threshold(mask, 'otsu') # Find significant differences only
mask = dip.Closing(mask, 7) # Smooth the outline a bit
fg[~mask] = 0 # Blank out pixels not in the mask
I'm using PyDIP above, not OpenCV, because I don't have OpenCV installed. You can easily do the same with OpenCV.
An alternative to smoothing the binary mask as I did there is to smooth the mask image before thresholding, for example with dip.Gauss(mask, [2]), a Gaussian smoothing.
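Since the same can be done with OpenCV, here is a rough equivalent sketch (the file names follow the ones above; Otsu thresholding and a small morphological closing stand in for dip.Threshold and dip.Closing):

import cv2

bg = cv2.imread("background.tif")[:, :, 1]  # use just the green channel
fg = cv2.imread("object.tif")[:, :, 1]

diff = cv2.absdiff(bg, fg)                    # difference between the two images
_, mask = cv2.threshold(diff, 0, 255,
                        cv2.THRESH_BINARY + cv2.THRESH_OTSU)  # significant differences only
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (7, 7))
mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)        # smooth the outline a bit

out = fg.copy()
out[mask == 0] = 0                            # blank out pixels not in the mask
cv2.imwrite("object_masked.png", out)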
Edit: The black object.
What happens with this image, is that its illumination has changed significantly, or you have some automatic exposure settings in your camera. Make sure you have turned all of that off so that every image is exposed exactly the same, and that you use the raw images directly off of the camera for this, not images that have gone through some automatic enhancement procedure or even JPEG compression if you can avoid it.
I computed the median of the background image divided by the object image (fg in the code above, but for this new image), which came up to 1.073. That means that the background image is 7% brighter than the object image. I then multiplied fg by this value before computing the absolute difference:
mask = dip.Abs(fg * dip.Median(bg/fg)[0][0] - bg)
This helped a bit, but it showed that the changes in contrast are not consistent across the image.
Next, you can change the threshold selection method. Otsu assumes a bimodal histogram, and works well if you have a significant number of pixels in each group (foreground and background). Here we'll have fewer pixels belonging to the object, because only some of the object pixels have a different color from the background. The 'triangle' method is suitable in this case:
mask, t = dip.Threshold(mask, 'triangle')
This will lead to a mask that contains only some of the object pixels. You'll have to add some additional knowledge about your object (i.e. it is a rotated square) to find the full object. There are also some isolated background pixels that are being picked up by the threshold, those are easy to eliminate using a bit of blurring before the threshold or a small opening after.
Getting the exact outline of the object in this case will be impossible with your current setup. I would suggest you improve your setup by either:
making the background more uniform in illumination,
using color (so that there are fewer possible objects that match the background color so exactly as in this case),
using infrared imaging (maybe the background could have different properties from all the objects to be detected in infrared?),
using back-illumination (this is the best way if your aim is to measure the objects).
My ideas are:
1.0. [unsolved, hard image-detection] Breaking the image into squares and removing borders; surely there are other techniques!
1.1. [unsolved] Imagemagick: crop (instructions here), remove certain borders -- this may take a lot of time to locate the grid, an image-detection problem (comparing white/black here) -- or there may be some magic-wand-style filter.
1.2. [unsolved] Python: you probably need this: from PIL import Image.
Obviously, Gimp's eraser is the wrong way to solve this problem since it's slow and error-prone. How would you remove the grid programmatically?
P.S. There is a casual discussion about this problem on Graphics.SE here that contains more physical and mechanical hacks.
If all images consist of black lines over a gray grid, you could adjust the white threshold to remove the grid (e.g. with ImageMagick):
convert -white-threshold 80% with-grid.png without-grid.png
You will probably have to experiment with the exact threshold value. 80% worked for me with your sample image. This will make the lines pixelated. But perhaps resampling can reduce that to an acceptable amount, e.g. with:
convert -resize 200% -white-threshold 80% -resize 50% with-grid.png without-grid.png
In your image the grid is somewhat lighter than the drawing, so we can set a threshold, and filter the image such that all 'light' pixels are set to white. Using PIL it could look like this:
from PIL import Image

def filter(x):
    # 200 is our cutoff; try adjusting it to see the difference.
    if x > 200:
        return 255
    return x

im = Image.open('bird.png')
im = im.point(filter)
im.show()
Processing your uploaded image with this code gives:
Which in this case is a pretty good result. Provided your drawing is darker than the grid, you should be able to use this method without too many problems.
Feedback to the answers: emulbreh and fraxel
The Python version utilizes ImageMagick, so let's consider ImageMagick. It does not work with a colored version like the one below, due to the different color-channel profiles. Let's investigate this a bit further.
$ convert -white-threshold 0% bird.png without.png
This picture shows the amount of noise in the original scanned picture.
Puzzle: removing the right-hand corner as an example
I inverted the colors with $ convert -negate whiteVersion.png blackVersion.png to make it easier to visualize. Now, with the black photo below, I wanted to remove the blue right-hand corner, i.e. make it black -- which means I want to set the B and G channels to 0 where they are at 100% channel value.
$ convert -channel BG -threshold 100% bbird.png without.png
Now the only thing left is of course the red channel; I removed G and B, but the white areas still have red left. Now how can I remove just the right-hand corner? I need to specify an area and then do the earlier operations.
How can I get this working with an arbitrary photo where you want to remove a certain color but leave other colors intact?
I don't know an easy way. The first problem is a color-detection problem -- you specify some condition on the colors (R,G,B) with some inequality, and if the condition is true, you remove the color in just that part. Then you do this for all the basic colors, i.e. (R,G,B)=(100%,0,0), (R,G,B)=(0,100%,0) and (R,G,B)=(0,0,100%). Does a ready implementation exist for this? Probably, but it is much nicer to do it yourself -- puzzle set!
Prerequisite knowledge
Tutorials here and here about Imagemagick.
In order to understand this topic, we need to know some basic physics: white light is a mixture of all colors, and black is the absence of color.
I have about 3000 images and 13 different colors (the background of the majority of these images is white). If the main color of an image is one of those 13 different colors, I'd like them to be associated.
I've seen similar questions like Image color detection using python that ask for an average color algorithm. I've pretty much copied that code, using the Python Image Library and histograms, and gotten it to work - but I find that it's not too reliable for determining main colors.
Any ideas? Or libraries that could address this?
Thanks in advance!
:EDIT:
Thanks guys - you all pretty much said the same thing, to create "buckets" and increase the bucket count with each nearest pixel of the image. I seem to be getting a lot of images returning "White" or "Beige," which is also the background on most of these images. Is there a way to work around or ignore the background?
Thanks again.
You can use the getcolors function to get a list of all colors in the image. It returns a list of tuples in the form:
(N, COLOR)
where N is the number of times the color COLOR occurs in the image. To get the most frequently occurring color, you can pass the list to the max function:
>>> from PIL import Image
>>> im = Image.open("test.jpg")
>>> max(im.getcolors(im.size[0]*im.size[1]))
(183, (255, 79, 79))
Note that I passed im.size[0]*im.size[1] to the getcolors function because that is the maximum maxcolors value (see the docs for details).
Personally I would split the color space into 8-16 main colors, then for each pixel I'd increment the closest colored bucket by one. At the end, the color of the bucket with the highest number of pixels wins.
Basically, think median instead of average. You only really care about the colors in the image, whereas averaging colors usually gives you a whole new color.
Since you're trying to match a small number of preexisting colors, you can try a different approach. Test each image against all of the colors, and see which one is the closest match.
As for doing the match, I'd start by resizing each image to a smaller size to reduce the amount of work you'll be doing for each; our perception of the color of an image isn't too dependent on the amount of detail. For each pixel of the smaller image, find which of the 13 colors is the closest. If it's within some threshold, bump a counter for that color. At the end whichever of the 13 has the highest count is the winner.
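A rough sketch of that approach (the list of target colors, the distance threshold, and the resize size are placeholders to fill in with your own 13 colors and tuned values):

from PIL import Image

# Placeholder palette; substitute your 13 target colors as RGB tuples.
TARGET_COLORS = [(255, 255, 255), (245, 245, 220), (255, 0, 0)]
THRESHOLD = 60 ** 2   # max squared RGB distance to count a pixel as a match (a guess)

def best_match(path):
    # Shrink the image first; perceived color hardly depends on fine detail.
    im = Image.open(path).convert("RGB").resize((32, 32))
    counts = [0] * len(TARGET_COLORS)
    for px in im.getdata():
        dists = [sum((p - c) ** 2 for p, c in zip(px, color)) for color in TARGET_COLORS]
        i = min(range(len(dists)), key=dists.__getitem__)
        if dists[i] <= THRESHOLD:   # only count pixels reasonably close to a target color
            counts[i] += 1
    return TARGET_COLORS[counts.index(max(counts))]

print(best_match("test.jpg"))  # filename reused from the earlier example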