I want to detect the font of text in an image so that I can do better OCR on it. While searching for a solution I found this post. Although it may seem to be the same as my question, it does not exactly address my problem.
Background
For OCR I am using Tesseract, which uses trained data for recognizing text. Training Tesseract with lots of fonts reduces the accuracy, which is natural and understandable. One solution is to build multiple trained data files (one per group of a few similar fonts) and then automatically use the appropriate data for each image. For this to work, we need to be able to detect the font in the image.
Number 3 in this answer uses OCR to isolate the image of each character along with its recognized character, then renders the same character in each font and compares the renderings with the isolated image. In my case, the user provides a bounding box and the character associated with it. But because I want to OCR Arabic script (which is cursive, so a character's shape varies depending on the characters adjacent to it) and because the bounding box may not actually be the minimal bounding box, I am not sure how to do the comparison.
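Because the user's box may be larger than the minimal one, a simple first step before any comparison is to shrink the box to the tightest rectangle that contains ink and resample it to a fixed size. The sketch below assumes dark text on a light background and a grayscale NumPy array; `tighten_box`, `normalize_glyph` and the threshold of 128 are illustrative choices, not part of any library. It only removes loose margins; it does not address Arabic contextual letter forms.

```python
import numpy as np


def tighten_box(gray, box, ink_threshold=128):
    """Shrink a user-supplied box (x0, y0, x1, y1) to the tightest box around dark pixels.

    Returns the original box unchanged if it contains no ink.
    """
    x0, y0, x1, y1 = box
    ink = gray[y0:y1, x0:x1] < ink_threshold
    rows = np.flatnonzero(ink.any(axis=1))  # rows that contain at least one ink pixel
    cols = np.flatnonzero(ink.any(axis=0))  # columns that contain at least one ink pixel
    if rows.size == 0:
        return box
    return (int(x0 + cols[0]), int(y0 + rows[0]),
            int(x0 + cols[-1] + 1), int(y0 + rows[-1] + 1))


def normalize_glyph(gray, box, size=32, ink_threshold=128):
    """Crop to the tight box and resample to a size x size boolean ink mask.

    Uses nearest-neighbour index sampling, so no imaging library is needed.
    Note that this stretches the glyph to a square and ignores its aspect ratio.
    """
    x0, y0, x1, y1 = tighten_box(gray, box, ink_threshold)
    ink = gray[y0:y1, x0:x1] < ink_threshold
    h, w = ink.shape
    ys = np.arange(size) * h // size
    xs = np.arange(size) * w // size
    return ink[np.ix_(ys, xs)]
```

Both the isolated character and each font rendering would go through the same normalization, so that the later comparison is not thrown off by margins.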
I believe the Hausdorff distance is not applicable here. Am I right?
Shape context may be good(?), and there is a ShapeContextDistanceExtractor class in OpenCV, but I am not sure how to use it in opencv-python.
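In opencv-python, the shape module exposes factory functions such as `cv2.createShapeContextDistanceExtractor()` and `cv2.createHausdorffDistanceExtractor()`; each returns an object whose `computeDistance(contour1, contour2)` method compares two contours (for example, contours returned by `cv2.findContours`). To make the Hausdorff idea concrete without depending on OpenCV, here is a small NumPy sketch of the symmetric Hausdorff distance between the ink pixels of two binary images. `hausdorff_distance` is a hypothetical helper, and it assumes both images were already normalized to the same size. Because the plain Hausdorff distance is set by the single worst-matched point, stray strokes from neighbouring connected letters can dominate it.

```python
import numpy as np


def ink_points(binary):
    """Return an (N, 2) float array of (row, col) coordinates of True pixels."""
    return np.argwhere(binary).astype(float)


def hausdorff_distance(a, b):
    """Symmetric Hausdorff distance between the ink pixels of two binary images.

    Brute force over all point pairs (O(N*M) memory), which is fine for small glyphs.
    """
    pa, pb = ink_points(a), ink_points(b)
    if len(pa) == 0 or len(pb) == 0:
        raise ValueError("both images must contain ink")
    # Pairwise Euclidean distances between every point of a and every point of b
    d = np.sqrt(((pa[:, None, :] - pb[None, :, :]) ** 2).sum(axis=2))
    # Directed distances: the a-point farthest from b, and the b-point farthest from a
    return max(d.min(axis=1).max(), d.min(axis=0).max())
```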
Thank you.
Sorry for my bad English.
The idea is that I have text in Devanagari script, such as संस्थानका कर्मचारी, and I want to convert it to an image. Here is what I have attempted:
from PIL import Image, ImageDraw, ImageFont

def draw_image(myString):
    # Canvas and text settings
    width = 500
    height = 100
    back_ground_color = (255, 255, 255)
    font_size = 10
    font_color = (0, 0, 0)
    unicode_text = myString
    # Create a white RGB canvas and a drawing context on it
    im = Image.new("RGB", (width, height), back_ground_color)
    draw = ImageDraw.Draw(im)
    # Load the TrueType font and draw the string starting at (10, 10)
    unicode_font = ImageFont.truetype("arial.ttf", font_size)
    draw.text((10, 10), unicode_text, font=unicode_font, fill=font_color)
    im.save("text.jpg")
But the font is not recognized, so the image consists of boxes and other unintelligible characters. Which font should I use to get the correct image? Or is there a better approach to converting text in scripts like this to an image?
I had a similar problem when I wanted to write Urdu text onto images. First, you need the correct font, since writing with PIL or even OpenCV requires a font that contains the appropriate Unicode characters. And even when you have the appropriate font, the letters of a word come out disjointed, so you don't get the correct result.
To resolve this, you have to stray a bit from the traditional Python-only approach. Since I was creating artificial datasets for an OCR, I needed to print large sets of such words onto a white background, so I decided to use graphics software for this. Some, like Photoshop, even allow you to write scripts to automate processes.
The software I went for was GIMP, which allows you to quickly write and run extensions/scripts to automate the process. You can write an extension in Python through GIMP's Python-Fu scripting interface. Documentation was limited, so it was difficult to get started, but with some persistence I was able to write functions that read text from a text file, place it on white backgrounds, and save the results to disk.
I was able to generate around 300k images this way in a matter of hours. If you too are aiming to render large amounts of text, I would suggest that you also rely on Python-Fu and GIMP.
For more info, you may refer to the GIMP Python Documentation.
I need to perform OCR on an image of a single character on a clear background. This is for an autonomous UAV student competition, so everything needs to be automatic and the process cannot be manually tailored in flight. The character will however be in a known set (likely capital alpha-numeric). For context, I start with an image at arbitrary orientation:
I then automatically determine the angle, crop down and pre-process the image before running it through OCR. The one thing that I can't automatically compute beforehand (as it's really part of the OCR process) is which of the 4 remaining orientations (see below) is correct. This is key to my question - is it possible to set up the OCR so that it sees an A (or any character) rotated to 90, 180 or 270 degrees as an A rather than thinking it is something else such as a V? From what I can find, OSD features seem to be available in Tesseract but I cannot get them working with single characters.
https://i.stack.imgur.com/TlaOr.png
https://i.stack.imgur.com/ET9hr.png
https://i.stack.imgur.com/maD0E.png
https://i.stack.imgur.com/b4mth.png
Currently, I am using PyTesseract to access a Tesseract OCR installation.
ocrText = pytesseract.image_to_string(imgD, config='-psm 6')
Separately, I have been having trouble with the general accuracy of the system even when presented with a clear image at the correct orientation - any tips on that would also be useful. For instance, this is why I am using PSM 6 instead of PSM 10 - it seems to provide better accuracy, even though 10 is specifically for single characters.
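Since OSD is reported above not to work on single characters, one simple workaround (not a built-in Tesseract feature) is to run recognition on all four quarter-turn rotations and keep the answer with the highest confidence. `best_orientation` and `tesseract_recognize` are hypothetical helpers; `pytesseract.image_to_data(..., output_type=pytesseract.Output.DICT)` does return per-token `text` and `conf` lists. `--psm` is the spelling used by Tesseract 4 and later, and the PSM 6 setting from the question can be passed in instead of 10. Rotationally ambiguous characters (for example N and Z, M and W, or 6 and 9) can still be confused, and confidences from different rotations are not guaranteed to be comparable.

```python
import numpy as np


def best_orientation(img, recognize):
    """Run `recognize` on the image at 0, 90, 180 and 270 degrees; keep the most confident result.

    `recognize(img) -> (text, confidence)` is any OCR callback (a hypothetical interface).
    Returns (angle_degrees, text, confidence); angles are counter-clockwise quarter-turns.
    """
    results = []
    for k in range(4):
        rotated = np.ascontiguousarray(np.rot90(img, k))  # k quarter-turns counter-clockwise
        text, conf = recognize(rotated)
        results.append((conf, 90 * k, text))
    conf, angle, text = max(results, key=lambda r: r[0])
    return angle, text, conf


def tesseract_recognize(img, config="--psm 10"):
    """Example callback built on pytesseract.image_to_data (imported lazily)."""
    import pytesseract

    data = pytesseract.image_to_data(img, config=config, output_type=pytesseract.Output.DICT)
    best = ("", -1.0)
    # Keep the non-empty token with the highest confidence (-1 marks non-text boxes)
    for text, conf in zip(data["text"], data["conf"]):
        if text.strip() and float(conf) > best[1]:
            best = (text.strip(), float(conf))
    return best
```

Restricting the accepted results to the known character set (capital alphanumerics) would further reduce false readings.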
Any help would be much appreciated
Thanks!
An easy solution is to perform the training with all four rotated versions of each character. You can train them as the same character (all 'A') or as distinct ones ('A0', 'A1', 'A2', 'A3').
Note, however, that this can degrade performance a little.
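The rotation augmentation described above can be sketched with NumPy as follows, assuming each training sample is a 2-D array with a string label. `augment_with_rotations` is a hypothetical helper, and feeding its output into Tesseract's (or any other) training tools is not shown.

```python
import numpy as np


def augment_with_rotations(samples, labels, distinct_labels=False):
    """Add 90/180/270-degree copies of every training sample.

    With distinct_labels=True, 'A' becomes 'A0'..'A3' (one class per orientation),
    so the classifier also reports the orientation; otherwise all copies keep 'A'.
    """
    out_samples, out_labels = [], []
    for img, label in zip(samples, labels):
        for k in range(4):
            out_samples.append(np.rot90(img, k))  # k counter-clockwise quarter-turns
            out_labels.append(f"{label}{k}" if distinct_labels else label)
    return out_samples, out_labels
```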
In your case, if the character set is known and there is a nice frame around the characters, you can very well perform the recognition by yourself, without Tesseract.
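For doing the recognition yourself with a known character set, one simple option (an illustration swapped in here, not something the answer specifies) is template matching with normalized cross-correlation, trying each template at all four quarter-turns. This minimal NumPy sketch assumes each glyph has already been cropped and resampled to the same square size as the templates; `ncc` and `classify_any_rotation` are hypothetical helpers.

```python
import numpy as np


def ncc(a, b):
    """Normalized cross-correlation of two equally sized arrays (1.0 = identical up to gain/offset)."""
    a = a.astype(float) - a.mean()
    b = b.astype(float) - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def classify_any_rotation(glyph, templates):
    """Match a square glyph against {char: template} at all four quarter-turns.

    Returns (char, angle_degrees, score), where angle is the counter-clockwise
    rotation applied to the glyph that gave the best match.
    """
    best = (None, 0, -np.inf)
    for char, tmpl in templates.items():
        for k in range(4):
            score = ncc(np.rot90(glyph, k), tmpl)
            if score > best[2]:
                best = (char, 90 * k, score)
    return best
```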
I have a bunch of scanned images of documents with the same layout (strict forms filled out with variable data) that I need to process with OCR. I can more or less cope with the OCR process itself (converting text images to text), but I still have to cope with the annoying fact that the scanned images are distorted by different degrees of rotation, different scaling, or both.
Because my method reads pieces of information from individual cells that are defined as bounding boxes in pixels, I must convert all images to a "standard" version where every corresponding cell is in the same pixel position; otherwise my reader "misreads". My question is: how can I "normalize" the distorted images?
I use Python.
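One common way to normalize such scans, sketched here under the assumption that you can already locate at least three non-collinear reference points (for example, form corners or printed registration marks) in both the scan and a reference layout, is to estimate an affine transform by least squares. An affine transform covers rotation, scaling, shear and translation; perspective distortion would need a homography instead. `estimate_affine` and `apply_affine` are hypothetical helpers. The returned 2x3 matrix has the layout that `cv2.warpAffine` expects, so `cv2.warpAffine(scan, M, (ref_width, ref_height))` could then resample the scan onto the reference layout.

```python
import numpy as np


def estimate_affine(src_pts, dst_pts):
    """Least-squares 2x3 affine matrix M mapping src (x, y) points onto dst points."""
    src = np.asarray(src_pts, dtype=float)
    dst = np.asarray(dst_pts, dtype=float)
    # Each row of A is [x, y, 1]; solve A @ M.T ~= dst for the 3x2 matrix M.T
    A = np.hstack([src, np.ones((len(src), 1))])
    M_t, *_ = np.linalg.lstsq(A, dst, rcond=None)
    return M_t.T


def apply_affine(M, pts):
    """Map (x, y) points through the 2x3 affine matrix M."""
    pts = np.asarray(pts, dtype=float)
    return pts @ M[:, :2].T + M[:, 2]
```

Using more than three reference points lets the least-squares fit average out small errors in locating each one.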
Today, in high-volume form-scanning jobs, we use commercial software with adaptive template matching. It deskews and selectively binarizes the images to prepare them, but then it adapts the field boxes to each image rather than placing boxes at fixed XY locations.
The deskewing process generally increases the image size. This is visible in this random image from an online search:
https://github.com/tesseract-ocr/tesseract/wiki/skew-linedetection.png
Notice how the title of the document was near the top border, and in the deskewed image it is shifted down. In this oversimplified example an XY-based box would not catch it.
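The size growth follows from simple geometry: keeping every corner of a w × h image rotated by θ requires a canvas of width w|cos θ| + h|sin θ| and height w|sin θ| + h|cos θ|. `rotated_canvas_size` is a hypothetical helper, and this assumes the deskewing tool enlarges the canvas rather than cropping, which not every tool does.

```python
import math


def rotated_canvas_size(w, h, angle_deg):
    """Canvas (width, height) needed to hold a w x h image rotated by angle_deg without cropping."""
    t = math.radians(angle_deg)
    c, s = abs(math.cos(t)), abs(math.sin(t))
    # Round first so floating-point noise at multiples of 90 degrees does not add a pixel
    return (math.ceil(round(w * c + h * s, 6)),
            math.ceil(round(w * s + h * c, 6)))
```

For a 2480 × 3508 page (A4 at 300 dpi), a 2° deskew already needs a 2601 × 3593 canvas; if the page is centred on that canvas, its content moves by about 60 px horizontally and 42 px vertically, so fixed XY boxes drift even for a small skew.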
I use commercial software for deskewing and image pre-processing. It is quite inexpensive but good. Unfortunately, I believe it will take you only part of the way if your data capture method relies on XY-coordinate field matching. I understand the frustration of dealing with this, which is why dedicated tools have already been created to handle it.
I run a service bureau for this kind of form processing. If you are interested, I can privately share more about the methods we use.
I am new to image processing. I'm using the OpenCV library with Python for image processing. I need to extract symbols and the text related to those symbols for further work. I have seen developers do handwritten text recognition with neural networks, KNN and other techniques.
My question is: what is the best way to extract these symbols and the handwritten text related to them?
Example diagram:
Details I need to extract:
Number of circles in the diagram.
The text inside them.
The words within square brackets.
Whether they are connected by arrows or not.
Of course, there is a method called SWT, the Stroke Width Transform.
Please see this paper. If you search for it by name, you can find code that some students wrote for a school project.
With this method, text regions can be detected and then passed on to recognition, but it is not a day's job.
Site: Detecting Text in Natural Scenes with Stroke Width Transform
Hope that it helps.
For handwritten text recognition, try using TensorFlow. Its website has a simple example for digit recognition (with training data). You can use it to implement your own application for recognizing handwritten letters as well. (You'll need to get training data for this, though; I used a training data set provided by NIST.)
If you are using OpenCV with Python, the Hough transform (cv2.HoughCircles) can detect circles in images. You might miss some hand-drawn circles, but there are ways to detect ovals and other closed shapes.
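OpenCV's `cv2.HoughCircles` implements a gradient-based variant of the Hough transform. To show the underlying voting idea, here is a NumPy sketch for a single known radius: every edge pixel votes for all points at that distance from it, and true centres collect votes from many edge pixels. `hough_circle_centers` is a hypothetical helper; real diagrams would need edge detection first, a range of radii, and non-maximum suppression on the accumulator to count several circles.

```python
import numpy as np


def hough_circle_centers(edges, radius, n_angles=180):
    """Accumulate circle-centre votes for one known radius from a binary edge image.

    Returns the vote accumulator and the (row, col) of its highest peak.
    """
    h, w = edges.shape
    acc = np.zeros((h, w), dtype=np.int64)
    ys, xs = np.nonzero(edges)
    for t in np.linspace(0, 2 * np.pi, n_angles, endpoint=False):
        # Candidate centre for each edge pixel, assuming it lies at angle t on the circle
        cy = np.round(ys - radius * np.sin(t)).astype(int)
        cx = np.round(xs - radius * np.cos(t)).astype(int)
        ok = (cy >= 0) & (cy < h) & (cx >= 0) & (cx < w)
        np.add.at(acc, (cy[ok], cx[ok]), 1)  # unbuffered add handles repeated indices
    peak = np.unravel_index(np.argmax(acc), acc.shape)
    return acc, peak
```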
For handwritten character recognition, there are lots of libraries available.
Since you are new to this area, I strongly recommend LearnOpenCV and PyImageSearch to help you become familiar with the algorithms available for this kind of task.
I'm writing an OCR application to read characters from a screenshot image. Currently, I'm focusing only on digits. I'm partially basing my approach on this blog post: http://blog.damiles.com/2008/11/basic-ocr-in-opencv/.
I can successfully extract each individual character using some clever thresholding. Where things get a bit tricky is matching the characters. Even with fixed font face and size, there are some variables such as background color and kerning that cause the same digit to appear in slightly different shapes. For example, the below image is segmented into 3 parts:
Top: a target digit that I successfully extracted from a screenshot
Middle: the template: a digit from my training set
Bottom: the error (absolute difference) between the top and middle images
The parts have all been scaled (the distance between the two green horizontal lines represents one pixel).
You can see that despite both the top and middle images clearly representing a 2, the error between them is quite high. This causes false positives when matching other digits -- for example, it's not hard to see how a well-placed 7 can match the target digit in the image above better than the middle image can.
Currently, I'm handling this by having a heap of training images for each digit, and matching the target digit against those images, one-by-one. I tried taking the average image of the training set, but that doesn't resolve the problem (false positives on other digits).
I'm a bit reluctant to perform matching using a shifted template (it'd be essentially the same as what I'm doing now). Is there a better way to compare the two images than simple absolute difference? I was thinking of maybe something like the EMD (earth movers distance, http://en.wikipedia.org/wiki/Earth_mover's_distance) in 2D: basically, I need a comparison method that isn't as sensitive to global shifting and small local changes (pixels next to a white pixel becoming white, or pixels next to a black pixel becoming black), but is sensitive to global changes (black pixels that are nowhere near white pixels become black, and vice versa).
Can anybody suggest a more effective matching method than absolute difference?
I'm doing all this in OpenCV using the C-style Python wrappers (import cv).
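One way to write down the criterion from the question (forgive ink that moved by one pixel, but penalize ink far from any ink in the other image) is to compare each image against a one-pixel dilation of the other instead of pixel by pixel. A minimal NumPy sketch, assuming binary, equally sized digit images that are roughly aligned; `dilate3x3` and `tolerant_difference` are hypothetical names.

```python
import numpy as np


def dilate3x3(mask):
    """Binary dilation with a 3x3 square, built from shifted copies (no OpenCV needed)."""
    padded = np.pad(mask, 1, constant_values=False)
    h, w = mask.shape
    out = np.zeros_like(mask)
    for dy in range(3):
        for dx in range(3):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def tolerant_difference(a, b):
    """Count ink pixels of either image that have no ink within one pixel in the other."""
    a = a.astype(bool)
    b = b.astype(bool)
    return int((a & ~dilate3x3(b)).sum() + (b & ~dilate3x3(a)).sum())
```

A one-pixel shift of the same stroke scores 0 here, while the plain absolute difference counts every pixel along both edges.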
I would look into using Haar cascades. I've used them for face detection/head tracking, and it seems like you could build up a pretty good set of cascades with enough '2's, '3's, '4's, and so on.
http://alereimondo.no-ip.org/OpenCV/34
http://en.wikipedia.org/wiki/Haar-like_features
OCR on noisy images is not easy, so simple approaches do not work well.
So, I would recommend using HOG to extract features and an SVM to classify them. HOG seems to be one of the most powerful ways to describe shapes.
The whole processing pipeline is implemented in OpenCV, but I do not know the function names in the Python wrappers. You should be able to train with the latest haartraining.cpp; it actually supports more than Haar features, including HOG and LBP.
And I think the latest code (from trunk) is much improved over the official release (2.3.1).
HOG usually needs just a fraction of the training data used by other recognition methods. However, if you want to classify shapes that are partially occluded (or missing), make sure you include some such shapes in the training set.
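To show what the HOG descriptor captures, here is a simplified NumPy version: per-cell histograms of unsigned gradient orientation weighted by gradient magnitude. It is not OpenCV's `cv2.HOGDescriptor`, which adds overlapping block normalization; `simple_hog` is a hypothetical helper, it assumes image sides are multiples of the cell size, and the SVM that would consume these vectors is not shown.

```python
import numpy as np


def simple_hog(img, cell=4, bins=9):
    """Simplified HOG: per-cell, magnitude-weighted histograms of unsigned gradient orientation.

    Each cell histogram is L2-normalized on its own (no block normalization).
    """
    img = img.astype(float)
    gy, gx = np.gradient(img)  # derivatives along rows (y) and columns (x)
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0  # unsigned orientation in [0, 180)
    bin_idx = np.minimum((ang / (180.0 / bins)).astype(int), bins - 1)
    h, w = img.shape
    feats = []
    for y in range(0, h, cell):
        for x in range(0, w, cell):
            b = bin_idx[y:y + cell, x:x + cell].ravel()
            m = mag[y:y + cell, x:x + cell].ravel()
            hist = np.bincount(b, weights=m, minlength=bins)
            norm = np.linalg.norm(hist)
            feats.append(hist / norm if norm > 0 else hist)
    return np.concatenate(feats)
```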
I can tell you from my experience and from reading several papers on character classification that a good way to start is by reading about Principal Component Analysis (PCA), Fisher's Linear Discriminant Analysis (LDA), and Support Vector Machines (SVMs). These are classification methods that are extremely useful for OCR, and it turns out that OpenCV already includes excellent implementations of PCA and SVMs. I haven't seen any OpenCV code examples for OCR, but you can use a modified version of face classification to perform character classification. An excellent resource for face recognition code for OpenCV is this website.
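In the spirit of the eigenface-style approach mentioned above, here is a NumPy sketch of PCA (computed via SVD) followed by a nearest-class-mean classifier. The answer suggests LDA or SVMs as the classifier; nearest-mean is swapped in only to keep the sketch dependency-free. Inputs are assumed to be equally sized, flattened character images, one per row; `fit_pca`, `project` and `nearest_mean_classifier` are hypothetical helpers.

```python
import numpy as np


def fit_pca(X, n_components):
    """PCA via SVD on row-vector samples X (n_samples x n_features)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]  # principal directions as rows


def project(X, mean, components):
    """Coordinates of the samples in the PCA subspace."""
    return (X - mean) @ components.T


def nearest_mean_classifier(Z_train, y_train):
    """Return a predict(Z) function that assigns each row to the closest class centroid."""
    classes = np.unique(y_train)
    centroids = np.stack([Z_train[y_train == c].mean(axis=0) for c in classes])

    def predict(Z):
        d = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        return classes[d.argmin(axis=1)]

    return predict
```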
Another Python library I recommend is "scikits.learn" (now called scikit-learn and imported as sklearn). It is very easy to pass cvArrays to it and run machine learning algorithms on your data. A basic example of OCR using an SVM is here.
Another more complicated example using manifold learning for handwritten character recognition is here.