I'm trying to preprocess frames of a game in real time for an ML project.
I want to extract numbers from the frame, so I chose pytesseract, since it looked quite good with text.
However, no matter how clear I make the text, it won't read it correctly.
My code looks like this:
section = process_screen(screen_image)[1]
pixels = rgb_to_bw(section)   # makes the image grayscale
pixels[pixels < 200] = 0      # makes all non-white pixels black
tess.image_to_string(pixels)
=> 'ye ml)'
At best it outputs "ye ml)" when I don't specify I want digits, and when I do, it outputs nothing at all.
The non-processed game image looks like so:
The "pixels" image looks like so :
Thanks to Alex Alex, I inverted the image, and got this
And got "2710", which is better, but still not perfect.
You must invert the image before recognition.
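As a minimal sketch of that advice (assuming pixels is the thresholded grayscale NumPy array from the question and that the crop holds a single line of digits), invert the array and restrict Tesseract to digits:

import pytesseract

inverted = 255 - pixels  # Tesseract prefers dark text on a light background

# treat the crop as one text line and only allow digits in the output
text = pytesseract.image_to_string(
    inverted,
    config='--psm 7 -c tessedit_char_whitelist=0123456789')
print(text)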
I'm new to the OCR world and I have documents with numbers to analyse with Python, OpenCV and pytesseract.
The files I received are PDFs and the numbers are not text, so I converted them to JPG with this:
from pdf2image import convert_from_path

first_page = convert_from_path(path__to_pdf, dpi=600, first_page=1, last_page=1)
first_page[0].save(TEMP_FOLDER + 'temp.jpg', 'JPEG')
Then, the images look like this:
I still have some noise around the digits.
I tried to select the "black color only" with this:
import cv2
import numpy as np

# note: BGR2HSV and RGB2GRAY assume different channel orders for the same image
img_hsv = cv2.cvtColor(img_raw, cv2.COLOR_BGR2HSV)
img_changing = cv2.cvtColor(img_raw, cv2.COLOR_RGB2GRAY)

low_color = np.array([0, 0, 0])        # lower HSV bound for "black"
high_color = np.array([180, 255, 30])  # upper bound: any hue/saturation, low value

blackColorMask = cv2.inRange(img_hsv, low_color, high_color)
img_inversion = cv2.bitwise_not(img_changing)
img_black_filtered = cv2.bitwise_and(img_inversion, img_inversion, mask=blackColorMask)
img_final_inversion = cv2.bitwise_not(img_black_filtered)
So, with this code, my image looks like this:
Even with cv2.blur, I don't reach 75% of images fully analysed.
For at least 25% of the images, pytesseract misses 1 or more digits.
Is that normal? Do you have any ideas of what I can do to maximize the success rate?
Thanks
Whenever you see that Tesseract is missing a character or digit, think about page segmentation modes. If a character was read but is not correct, that is a recognition issue instead.
OCR engines split up the text in the input image, and this splitting is called page segmentation. Then the engines try to recognize the text. Tesseract supports the following page segmentation modes (0 to 13):
0 Orientation and script detection (OSD) only.
1 Automatic page segmentation with OSD.
2 Automatic page segmentation, but no OSD, or OCR. (not implemented)
3 Fully automatic page segmentation, but no OSD. (Default)
4 Assume a single column of text of variable sizes.
5 Assume a single uniform block of vertically aligned text.
6 Assume a single uniform block of text.
7 Treat the image as a single text line.
8 Treat the image as a single word.
9 Treat the image as a single word in a circle.
10 Treat the image as a single character.
11 Sparse text. Find as much text as possible in no particular order.
12 Sparse text with OSD.
13 Raw line. Treat the image as a single text line, bypassing hacks that are Tesseract-specific.
For your case, the best solution would be treating your image as a block to avoid missing any digits. Then, restrict the output to digits only to get a better result. Your code should be like this:
text = pytesseract.image_to_string(image, lang='eng',
config='--psm 6 -c tessedit_char_whitelist=0123456789')
print(text)
Output:
1821293045013
Your attempt to process a field entry was thwarted by "artifacts"; see the upper pair for my best result with your coloured source.
The normal advice is to use greyscale, but in this case that makes matters worse, as there is background chatter.
You were right to attempt thresholding, as that will produce clearer results; however, Tesseract is prone to inserting odd lines and whitespace when the characters do not form words.
I suggested you double-check whether there was vector data in the file, and it appears you uncovered an entry (an annotation?) that matched the data field.
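As an illustration of the thresholding idea (a sketch only; the file name and the choice of Otsu's method are assumptions, not the exact steps used above), binarise the greyscale page before handing it to Tesseract with a digit whitelist:

import cv2
import pytesseract

img = cv2.imread('temp.jpg', cv2.IMREAD_GRAYSCALE)  # assumed file name

# Otsu's method picks the threshold automatically from the histogram
_, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

text = pytesseract.image_to_string(
    binary,
    config='--psm 6 -c tessedit_char_whitelist=0123456789')
print(text)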
I want to overlay a certain pattern on a shirt with Pillow in Python; this is my code:
from PIL import Image

design = Image.open("source/stripe.png").convert("RGBA")
shirtsketchtrans = Image.open("source/shirtsketchtrans.png").convert("RGBA")
design.paste(shirtsketchtrans, (0, 0), shirtsketchtrans)
design.show()
The outcome is this:
I don't really mind the size, that can be fixed.
But what I want is for the striped pattern to appear only on my shirt PNG and not over the whole background; basically, the pattern should cover only the shirt and nowhere else.
Is there any solution to this?
Any help is appreciated! Thanks!
Edit: Input/Source Images -
PIL on its own isn't smart enough to know what's "inside" or "outside" the shirt. You need to make a Transparency Mask and then use PIL.Image.composite to combine them.
Example:
from PIL import Image

design = Image.open("source/stripe.png").convert("RGBA")
shirt_sketch_trans = Image.open("source/shirtsketchtrans.png").convert("RGBA")
shirt_sketch_mask = Image.open("source/shirtsketchmask.png").convert("RGBA")
full_design = Image.composite(design, shirt_sketch_trans, shirt_sketch_mask)
full_design.show()
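If you don't have a ready-made mask file, one way to build one is to reuse the shirt PNG's own alpha channel as the mask (a sketch, assuming the shirt image is transparent everywhere outside the shirt and that both images share the same size, which Image.composite requires):

from PIL import Image

design = Image.open("source/stripe.png").convert("RGBA")
shirt = Image.open("source/shirtsketchtrans.png").convert("RGBA")

# the alpha channel is opaque (255) on the shirt and 0 on the transparent
# background, so it can serve directly as the compositing mask
mask = shirt.getchannel("A")

# pattern wherever the mask is opaque, the untouched shirt image elsewhere
full_design = Image.composite(design, shirt, mask)
full_design.show()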
I would like to remove the background light gradient from the following image, so that the lighting becomes more homogeneous; the interesting objects are the kind of "cones" seen from the top.
Image:
I also have an image "background" without the cones :
I tried the simplest thing, which is to convert these images to grayscale and then subtract them, but the result is pretty ... (really) bad, using:
import numpy as np
from PIL import Image

img = np.array(Image.open('../Pics/image.png').convert('L'))
background = np.array(Image.open('../Pics/background.JPG').convert('L'))
img_filtered = img - background
What would you advise? Ideally I would stay in RGB, though I know almost nothing about image processing, filters, etc.
By "the result is pretty ... (really) bad", i assume, you see a picture like this:
This seems to be due to the fact, that subtracting images, which could produce negative numbers instead starts "from the top" of the brightness-scale, like this:
4-5 = 255 instead of -1.
This is a byproduct, on how the pictures are loaded.
If i use "plain numpy array", get a picture like this:
So maybe try handling your pictures as numpy arrays: take a look over here
[Edit: This is due to the dtype uint8 of the numpy arrays. Changing to int should already be enough]
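A minimal sketch of that fix, assuming the same file paths as the question and that both images have the same size: cast to a signed integer type before subtracting, then clip back to the displayable 0-255 range.

import numpy as np
from PIL import Image

img = np.array(Image.open('../Pics/image.png').convert('L'), dtype=np.int16)
background = np.array(Image.open('../Pics/background.JPG').convert('L'), dtype=np.int16)

# signed subtraction: negative values stay negative instead of wrapping to 255
diff = img - background

# clip into 0-255 and convert back to uint8 for display/saving
img_filtered = np.clip(diff, 0, 255).astype(np.uint8)
Image.fromarray(img_filtered).show()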
I'm using PIL to transform a portion of the screen perspectively.
The original image-data is a pygame Surface which needs to be converted to a PIL Image.
Therefore I found pygame's tostring function, which exists for that purpose.
However the result looks pretty odd (see attached screenshot). What is going wrong with this code:
import pygame
from PIL import Image

rImage = pygame.Surface((1024, 768))

# draw something to the Surface
sprite = pygame.sprite.RenderPlain((playboard,))
sprite.draw(rImage)

pil_string_image = pygame.image.tostring(rImage, "RGBA", False)
pil_image = Image.fromstring("RGBA", (660, 660), pil_string_image)
What am I doing wrong?
As I noted in a comment, the pygame documentation for pygame.image.fromstring(string, size, format, flipped=False) says "The size and format image must compute the exact same size as the passed string buffer. Otherwise an exception will be raised". Thus, using (1024, 768) in place of (660, 660), or vice versa (in general, the same dimensions for both calls) is more likely to work. (I say "more likely to work" instead of "will work" because I didn't test any cases.)
The reason for suspecting a problem like this: the strange look of part of the image resembles a display screen set to a raster rate it can't synchronize with, i.e., lines of the image start displaying at points other than the left margin; in this case because the image line lengths are longer than the display line lengths. I'm assuming the snowflakes are sprites, generated separately from the distorted image.
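Concretely, a sketch of the fix (keeping the question's 1024x768 surface and assuming playboard is defined as in the question) is to pass the surface's own size to both calls; on current Pillow versions Image.frombytes replaces the deprecated Image.fromstring:

import pygame
from PIL import Image

rImage = pygame.Surface((1024, 768))
sprite = pygame.sprite.RenderPlain((playboard,))  # playboard as in the question
sprite.draw(rImage)

# use the surface's actual size for both the pygame export and the PIL import
size = rImage.get_size()
pil_string_image = pygame.image.tostring(rImage, "RGBA", False)
pil_image = Image.frombytes("RGBA", size, pil_string_image)
pil_image.show()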
I have a number of images from Chinese genealogies, and I would like to be able to programmatically categorize them. Generally speaking, one type of image has primarily line-by-line text, while the other type may be in a grid or chart format.
Example photos
'Desired' type: http://www.flickr.com/photos/63588871#N05/8138563082/
'Other' type: http://www.flickr.com/photos/63588871#N05/8138561342/in/photostream/
Question: Is there a (relatively) simple way to do this? I have experience with Python, but little knowledge of image processing. Direction to other resources is appreciated as well.
Thanks!
Assuming that at least some of the grid lines are exactly or almost exactly vertical, a fairly simple approach might work.
I used PIL to find all the columns in the image where more than half of the pixels were darker than some threshold value.
Code
from PIL import Image, ImageDraw

withlines = Image.open('withgrid.jpg')
nolines = Image.open('nogrid.jpg')

def findlines(image):
    image = image.convert('L')  # grayscale, so the threshold works on single values
    w, h = image.size
    s = w * h
    im = image.point(lambda i: 255 * (i < 60))  # threshold: dark pixels become 255
    d = im.getdata()                            # faster than per-pixel operations
    linecolumns = []
    for col in range(w):
        # count the dark pixels in this column
        black = sum(d[x] for x in range(col, s, w)) // 255
        if black > 450:
            linecolumns += [col]
    # return an image showing the detected lines
    im2 = image.convert('RGB')
    draw = ImageDraw.Draw(im2)
    for col in linecolumns:
        draw.line((col, 0, col, h - 1), fill='#f00', width=1)
    return im2

findlines(withlines).show()
findlines(nolines).show()
Results
showing detected vertical lines in red for illustration
As you can see, four of the grid lines are detected, and, with some processing to ignore the left and right sides and the center of the book, there should be no false positives on the desired type.
This means that you could use the above code to detect black columns and discard those that are near the edges or the center of the book. If any black columns remain, classify the image as the "other", undesired class of pictures.
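A minimal sketch of that decision rule (the margin fractions and the hypothetical find_line_columns helper are illustrative; the detection loop is the same one used in findlines above):

def find_line_columns(image, threshold=60, min_black=450):
    # same column scan as findlines, but returns the column indices instead of an image
    image = image.convert('L')
    w, h = image.size
    d = image.point(lambda i: 255 * (i < threshold)).getdata()
    return [col for col in range(w)
            if sum(d[x] for x in range(col, w * h, w)) // 255 > min_black]

def is_grid_page(image, margin_frac=0.1, center_frac=0.1):
    # ignore columns near the left/right edges and the book's center fold;
    # call the page a grid page if any detected columns remain
    w, _ = image.size
    margin, center = int(w * margin_frac), w // 2
    cols = [c for c in find_line_columns(image)
            if margin < c < w - margin and abs(c - center) > int(w * center_frac)]
    return len(cols) > 0

print(is_grid_page(Image.open('withgrid.jpg')))  # expected: True
print(is_grid_page(Image.open('nogrid.jpg')))    # expected: False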
AFAIK, there is no easy way to solve this. You will need a decent amount of image processing and some basic machine learning to classify these kinds of images (and even then it probably won't be 100% successful).
Another note:
While this can be solved purely with machine learning techniques, I would advise you to look into some image processing techniques first and try to convert your images into a form that shows a clear difference between the two types. A good starting point for that is reading about the FFT. After that, have a look at some digital image processing techniques. When you feel comfortable that you have a decent understanding of these, you can read up on pattern recognition.
This is only one suggested approach though, there are more ways to achieve this.
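To make the FFT suggestion concrete, here is a small sketch (the file names and the peak-based measure are illustrative assumptions): pages with a regular grid of vertical lines tend to concentrate energy along the horizontal-frequency axis of the 2D spectrum, which gives a simple number to compare between the two page types.

import numpy as np
from PIL import Image

def spectrum_feature(path):
    # grayscale image as a float array
    img = np.asarray(Image.open(path).convert('L'), dtype=float)

    # 2D FFT, shifted so the zero frequency sits in the center
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(img)))

    # energy along the horizontal-frequency axis (produced by vertical structures),
    # normalised by the total energy; grid pages should score noticeably higher
    center_row = spectrum[spectrum.shape[0] // 2, :]
    return center_row.sum() / spectrum.sum()

print(spectrum_feature('withgrid.jpg'))  # illustrative file names
print(spectrum_feature('nogrid.jpg'))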