The idea here is that the given text is in Devanagari characters, such as संस्थानका कर्मचारी, and I want to convert the given text to an image. Here is what I have attempted:
from PIL import Image, ImageDraw, ImageFont
import cv2

def draw_image(myString):
    width = 500
    height = 100
    back_ground_color = (255, 255, 255)
    font_size = 10
    font_color = (0, 0, 0)
    unicode_text = myString
    im = Image.new("RGB", (width, height), back_ground_color)
    draw = ImageDraw.Draw(im)
    unicode_font = ImageFont.truetype("arial.ttf", font_size)
    draw.text((10, 10), unicode_text, font=unicode_font, fill=font_color)
    im.save("text.jpg")
    cv2.imshow("text", cv2.imread("text.jpg"))  # display the saved image
    if cv2.waitKey(0) == ord('q'):
        cv2.destroyAllWindows()
But the font is not recognized, so the image consists of boxes and other unintelligible characters. So, which font should I use to get the correct image? Or is there a better approach to converting text in characters such as these to an image?
I had a similar problem when I wanted to write Urdu text onto images. Firstly, you need the correct font, since writing purely with PIL or even OpenCV requires a font that covers the appropriate Unicode characters; and even with the appropriate font, the letters of a word come out disjointed, so you don't get correct results.
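That said, if you want to stay within PIL, newer Pillow builds can do the complex-script shaping themselves when compiled against libraqm. A minimal, untested sketch, assuming a Devanagari-capable font file is available (the font path is a placeholder, and recent Pillow releases spell the constant ImageFont.Layout.RAQM instead):

from PIL import Image, ImageDraw, ImageFont

text = u"संस्थानका कर्मचारी"
# LAYOUT_RAQM asks Pillow to use libraqm for complex-script shaping;
# it only works if Pillow was built with raqm support.
font = ImageFont.truetype("NotoSansDevanagari-Regular.ttf", 32,
                          layout_engine=ImageFont.LAYOUT_RAQM)

im = Image.new("RGB", (500, 100), (255, 255, 255))
draw = ImageDraw.Draw(im)
draw.text((10, 10), text, font=font, fill=(0, 0, 0))
im.save("text.png")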
To resolve this you have to stray a bit from the traditional Python-only approach. Since I was creating artificial datasets for an OCR, I needed to print large sets of such words onto white backgrounds, and I decided to use graphics software for it; some packages, like Photoshop, even let you write scripts to automate processes.
The software I went for was GIMP, which lets you quickly write and run extension scripts to automate the process. It allows you to write an extension in Python, or more accurately a modified version of Python known as Python-Fu. Documentation was limited, so it was difficult to get started, but with some persistence I was able to write functions that read text from a text file, placed the words on white backgrounds, and saved the results to disk.
I was able to generate around 300k images this way in a matter of hours. If you too are aiming to render large amounts of text, I would suggest relying on Python-Fu and GIMP.
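The core of such a script is only a handful of pdb calls. A rough, untested sketch of the idea (the font name, geometry, and output path are placeholders, not my exact script):

from gimpfu import *

def render_word(word, out_path):
    image = gimp.Image(500, 100, RGB)
    bg = gimp.Layer(image, "bg", 500, 100, RGB_IMAGE, 100, NORMAL_MODE)
    image.add_layer(bg, 0)
    gimp.set_background(255, 255, 255)
    pdb.gimp_edit_fill(bg, BACKGROUND_FILL)        # white background
    gimp.set_foreground(0, 0, 0)
    pdb.gimp_text_fontname(image, None, 10, 10, word,
                           0, True, 36, PIXELS, "Jameel Noori Nastaleeq")
    flat = image.flatten()
    pdb.file_png_save(image, flat, out_path, out_path,
                      0, 9, 1, 1, 1, 1, 1)

Looping render_word over the lines of a text file is then all it takes to mass-produce samples.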
For more info, you may refer to the GIMP Python Documentation.
I'm new to Tesseract and wanted to know if there are any ways to clean up photos for a simple OCR program to get better results. Thanks in advance for any help!
The code I am using:
import pytesseract as tess
from PIL import Image

# loads tesseract
tess.pytesseract.tesseract_cmd = r''   # path to the tesseract executable

# filepath
file_path = r''   # path to the image being read

image = Image.open(file_path)

# processes image
text = tess.image_to_string(image, config='')
print(text)
I've used pytesseract in the past and with the following four modifications, I could read almost anything as long as the text font wasn't too small to begin with. pytesseract seems to struggle with small writing, even after resizing.
- Convert to Black & White -
Converting the images to black and white would frequently improve the program's recognition. I used OpenCV to do so and got the code from the end of this article.
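If that link goes stale, the core of it is a grayscale conversion plus a threshold. Something like this (Otsu's method picks the threshold automatically; the blur kernel size is just a starting point to tune):

import cv2

image = cv2.imread('photo.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
gray = cv2.medianBlur(gray, 3)        # knock out speckle noise first
_, bw = cv2.threshold(gray, 0, 255,
                      cv2.THRESH_BINARY + cv2.THRESH_OTSU)
cv2.imwrite('photo_bw.png', bw)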
- Crop -
If all your photos are in a similar format, i.e. the text you need is always in the same spot, I'd recommend cropping your pictures. If possible, pass pytesseract only the exact part of the photo you want analyzed; the less the program has to analyze, the better. In my case, I was taking screenshots and would specify the exact region to capture.
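With PIL the crop itself is a one-liner; the box coordinates below are placeholders for wherever your text sits:

import pytesseract as tess
from PIL import Image

image = Image.open('screenshot.png')
region = image.crop((100, 50, 400, 120))   # (left, upper, right, lower)
text = tess.image_to_string(region)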
- Resize -
Another thing you can do is play with the scaling of the original photo. Sometimes, after resizing to almost double its initial size, pytesseract could read the text a lot more easily. Generally, bigger text is better, but there's a limit, as the photo can become too pixelated after resizing to be recognizable.
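For example, doubling the size with a decent resampling filter (2x is just a starting point; experiment):

from PIL import Image

image = Image.open('cropped.png')
w, h = image.size
# LANCZOS keeps edges reasonably crisp when enlarging.
image = image.resize((w * 2, h * 2), Image.LANCZOS)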
- Config -
I've noticed that pytesseract can recognize text a lot more easily than numbers. If possible, break the photo down into sections, and wherever you have a trouble spot with numbers you can use:
pytesseract.image_to_string(image, config='digits')
I want to detect the font of the text in an image so that I can do better OCR on it. Searching for a solution, I found this post. Although it may seem to be the same as my question, it does not exactly address my problem.
Background
For OCR I am using Tesseract, which uses trained data for recognizing text. Training Tesseract with lots of fonts reduces the accuracy, which is natural and understandable. One solution is to build multiple sets of trained data, one per few similar fonts, and then automatically use the appropriate data for each image. For this to work, we need to be able to detect the font in the image.
Number 3 in this answer uses OCR to isolate the image of each character along with its recognized character, then generates the same character's image with each candidate font and compares them with the isolated image. In my case, the user would provide a bounding box and the character associated with it. But because I want to OCR Arabic script (which is cursive, and character shapes vary depending on which characters are adjacent), and because the bounding box may not actually be the minimal bounding box, I am not sure how I can do the comparison.
I believe the Hausdorff distance is not applicable here. Am I right?
Shape context may be a good fit(?), and there is a ShapeContextDistanceExtractor class in OpenCV, but I am not sure how I can use it in opencv-python.
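For reference, the basic call pattern I have pieced together in opencv-python looks something like this (untested sketch; it assumes binary glyph images of similar scale, white glyph on black):

import cv2
import numpy as np

def contour_points(path, n=300):
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    _, img = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)
    # [-2] keeps this working across OpenCV 3.x/4.x return signatures
    contours = cv2.findContours(img, cv2.RETR_LIST,
                                cv2.CHAIN_APPROX_NONE)[-2]
    pts = np.vstack(contours).squeeze()
    idx = np.linspace(0, len(pts) - 1, n).astype(int)  # resample to n points
    return pts[idx].reshape(-1, 1, 2).astype(np.float32)

sd = cv2.createShapeContextDistanceExtractor()
d = sd.computeDistance(contour_points('glyph_a.png'),
                       contour_points('glyph_b.png'))
print(d)    # smaller distance = more similar shapes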
I have continued making progress on my Python roguelike, and dived further into this tutorial: http://roguebasin.roguelikedevelopment.org/index.php?title=Complete_Roguelike_Tutorial,_using_python%2Blibtcod . I also made my own font to use in this game, but I'm not sure how to get it working. Here is a link to the font image I'm currently using: http://i.imgur.com/j6FdNky.png . In the Python code, the custom font is set to 'arial10x10.png', which is that font image. I tried making an image from my own font, but it came out really distorted.
Does anyone know how I could implement my own font? Also, I'm using libtcod, and I only have my own font in a .ttf format.
Thanks.
To render your TrueType font to a bitmap in the way that libtcod expects, you should use a separate library -- font rendering is a surprisingly complex task. FreeType is a very popular open source library for font rendering. You can find a Python interface here: http://code.google.com/p/freetype-py/. You will only need to use FreeType in a tool you'll use when developing your roguelike, not in your actual game.
First, determine what characters you will be using in your roguelike. Determine how to layout these characters on your font sheet. You can also simply choose to use the same layout as the one in the image you posted -- that's a sheet with 32 columns, starting at the space character (character 32).
Using your font rendering library, render each character by itself at the desired size. Pay attention to the size generated for each -- for instance, a '.' will be small and a 'w' will be large, even at the same font size. You must not just calculate a height, but a height above the baseline and a height below the baseline. (Example: if 'A' and 'g' are both 16 pixels tall, it's possible that you'll still need a rectangle taller than 16 pixels to align both correctly within it -- baseline-to-baseline.) Find the smallest rectangle size that will accommodate all of these sizes -- this is how large each cell in your font sheet must be.
Once you know how large your sheet will be, make another pass through all the desired letters to construct your bitmap, putting each letter in its cell. As far as y-positioning goes, all baselines must be aligned. If some characters are wider than others, you can choose to center or to left-align each character within its cell. (Each of these will come with its own weirdnesses -- you're really going to want a fixed-width font for a roguelike.)
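Putting those two passes together with freetype-py and PIL might look roughly like this (untested sketch; the charset, font path, pixel size, and 32-column layout mirror the sheet you linked, and it assumes the bitmap pitch equals its width, which holds for 8-bit grayscale rendering):

import freetype
from PIL import Image

CHARS = [chr(c) for c in range(32, 128)]    # same layout as arial10x10.png
COLS = 32

face = freetype.Face('myfont.ttf')          # your .ttf file
face.set_pixel_sizes(0, 10)                 # target pixel height

# Pass 1: measure every glyph to find the common cell size.
ascent = descent = cell_w = 0
for ch in CHARS:
    face.load_char(ch)                      # renders an 8-bit glyph bitmap
    g = face.glyph
    ascent = max(ascent, g.bitmap_top)                     # above baseline
    descent = max(descent, g.bitmap.rows - g.bitmap_top)   # below baseline
    cell_w = max(cell_w, g.bitmap.width)

cell_h = ascent + descent
rows = (len(CHARS) + COLS - 1) // COLS
sheet = Image.new('L', (COLS * cell_w, rows * cell_h), 0)

# Pass 2: paste each glyph into its cell with all baselines aligned.
for i, ch in enumerate(CHARS):
    face.load_char(ch)
    g = face.glyph
    if g.bitmap.width == 0 or g.bitmap.rows == 0:   # e.g. the space char
        continue
    glyph = Image.frombytes('L', (g.bitmap.width, g.bitmap.rows),
                            bytes(g.bitmap.buffer))
    x = (i % COLS) * cell_w
    y = (i // COLS) * cell_h + (ascent - g.bitmap_top)  # baseline align
    sheet.paste(glyph, (x, y))

sheet.save('myfont_sheet.png')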
Additional tips:
- Use antialiasing. This'll make your font easier on the eyes than pure monochrome.
- Don't use colour; render your font in grayscale. libtcod has functionality to generate coloured text from your grayscale font sheet.
- Consider whether you want a square font or not. The advantage of a square font is that "circles" in your roguelike world will look like circles on the screen. The disadvantage is that square fonts are generally "uglier" and harder to read.
I'm trying to detect a few uppercase characters from a screen shot. I convert it to black and white with PIL, and then using the code example from the PyTesser page, I run tesser.exe on the image:
from PIL import Image
from pytesser import *   # PyTesser wraps the tesseract.exe binary

image = Image.open('fnord.tif')
print(image_to_string(image))
I'm using this image:
But it doesn't recognize it as an E, or really anything for that matter. I think that it's a clean enough capture? The noise at the top isn't throwing it off, right?
Is there something I'm missing?
If you are concerned about whether the noise is an issue then manually open the image in MSPaint or something similar, remove the noise and then run the new image through the OCR. This is the best way to learn how the OCR engine works and what confuses it and what doesn't. Every OCR engine works differently.
In this case it could be that the small bits of noise are confusing the character zoning process as well. You should check the bounding box values returned by the OCR engine to see whether it is even looking in the correct location for your word or character.
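For example, with the pytesseract wrapper (a different wrapper from PyTesser, used here just for illustration) you can dump per-character boxes:

import pytesseract
from PIL import Image

image = Image.open('fnord.tif')
# Each output line is: "char left bottom right top page"
print(pytesseract.image_to_boxes(image))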
Some OCR engines have options to remove noise from an image during the OCR process. This is often called despeckle or noise removal. It is also possible to remove noise using Leptonica ( http://www.leptonica.org ), which is now part of the latest Tesseract builds.
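If you would rather despeckle in Python before handing the image to the engine, a small median filter is a common stand-in:

import cv2

img = cv2.imread('fnord.tif', cv2.IMREAD_GRAYSCALE)
clean = cv2.medianBlur(img, 3)     # a 3x3 kernel removes 1-2 px specks
cv2.imwrite('fnord_clean.tif', clean)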
Screen fonts present a big challenge to OCR engines because the DPI is often very low. In the case of your 'E' there should be more than enough pixels to be recognised. The heavy stroke weight could be confusing the engine.
Also the commercial engines will usually be more accurate than Tesseract but will also come with expensive licence fees.
I have all the characters of a font rendered in a PNG. I want to use the PNG for texture-mapped font rendering in OpenGL, but I need to extract the glyph information - character position and size, etc.
Are there any tools out there to assist in that process? Most tools I see generate the image file and glyph data together. I haven't found anything to help me extract from an existing image.
I use GIMP as my primary image editor, and I've considered writing a plugin to help me identify the bounding box for each character. I know Python but haven't done any GIMP plugins before, so that would be a chore. I'm hoping something already exists...
Generally, the way this works is that you use a tool to generate the glyph images. That tool will also generate metric information for those glyphs (how big they are, etc.). You shouldn't be analyzing the image to find the glyphs; you should have additional information alongside your glyphs that tells you where they are, how big they should be, and so on.
Consider the letter "i". Depending on the font, there will be some space to the left and right of it. Even if you have a tool that can identify the glyph "i", you would need to know how many pixels of space the font put to the left and right of the glyph. This is pretty much impossible to do accurately for all letters. Not without some very advanced computer vision algorithms. And since the tool that generated those glyphs already knew how much spacing they were supposed to get, you would be better off changing the tool to write the glyph info as well.
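Concretely, the generating tool could write something like this next to the PNG (the format and field names here are invented for illustration, not a standard):

import json

metrics = {
    "i": {"x": 128, "y": 0, "width": 6, "height": 16,
          "bearing_x": 2, "advance": 8},
    "W": {"x": 144, "y": 0, "width": 15, "height": 16,
          "bearing_x": 0, "advance": 16},
}
with open("font_metrics.json", "w") as f:
    json.dump(metrics, f, indent=2)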
You can use PIL to help you automate the process.
Assuming there is at least one row/column of background colour separating lines/characters, you can use the Image.crop method to check whether each row (and then each column within the row) contains only the background colour; that gives you the borders of each character.
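A sketch of that scan for a single line of text, assuming a pure white (255) background:

from PIL import Image

image = Image.open('fontsheet.png').convert('L')
w, h = image.size

boxes, start = [], None
for x in range(w):
    column = image.crop((x, 0, x + 1, h))
    blank = column.getextrema() == (255, 255)   # column is pure background
    if not blank and start is None:
        start = x                               # a character begins
    elif blank and start is not None:
        boxes.append((start, 0, x, h))          # a character ends
        start = None
print(boxes)   # one (left, top, right, bottom) box per character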
Ping if you need further assistance.