I am new to image processing. I'm using the OpenCV library with Python. I need to extract symbols and the text related to those symbols for further work. I have seen that some developers have done handwritten text recognition with neural networks, KNN and other techniques.
My question is what is the best way to extract these symbols and handwritten texts related to them?
Example diagram:
Details I need to extract:
No of Circles in the diagram.
What are the texts inside them.
What are the words within square brackets.
Are they connected with arrows or not.
Of course, there is a method called SWT (Stroke Width Transform).
Please see this paper. If you search for it by name, you can also find code that some students have written for their school projects.
Text recognition can be built on top of this method, but it is not a day's job.
Site: Detecting Text in Natural Scenes with Stroke Width Transform
Hope that it helps.
For handwritten text recognition, try using TensorFlow. Their website has a simple example for digit recognition (with training data). You can use it to implement your own application for recognizing handwritten alphabets as well. (You'll need to get training data for this though; I used a training data set provided by NIST.)
If you are using OpenCV with Python, the Hough transform can detect circles in images. You might miss some hand-drawn circles, but there are ways to detect ovals and other closed shapes.
For handwritten character recognition, there are lots of libraries available.
Since you are new to this area, I strongly recommend LearnOpenCV and PyImageSearch to help you get familiar with the algorithms that are available for this kind of task.
I am trying to detect and extract the "labels" and "dimensions" of a 2D technical drawing saved as a PDF, using Python. I came across a Python library called "pytesseract", which has optical character recognition capability. I tried the demo on my image, but it fails to detect most of the labels and dimensions. Please suggest another way to do this if there is one. Thank you.
Attached is a sample of the 2D technical drawing I am trying to process.
What I am trying to achieve is to obtain the coordinates of every dimension (the 160, 120, 10, 4x45, etc.) on the image, and extract them as well.
About 16 months ago we asked ourselves the same question.
If you want to implement it yourself, I'd suggest the following process:
Extract the Canvas from the sheet
Separate the Cuts
Detect the Measure Regions on each Cut
Detect the individual attributes of the Measure Regions to understand where each Measure starts and ends. In your particular example that's relatively easy.
Run the detected Measure Labels through OCR
Associate the Labels to the Measures
Verify your results
Alternatively you can also run it through our API and get the results as JSON.
Here's a quick visualization of the result:
Drawing Read (GT stands for General Tolerances)
I've implemented CBIR app by using standard ConvNet approach:
Use Transfer Learning to extract features from the data set of images
Cluster extracted features via knn
Given search image, extract its features
Give top 10 images that are close to the image in hand in knn network
I am getting good results, but I want to further improve them by adding text search as well. For instance, when my image is the steering wheel of a car, the close results will be any circular objects that resemble a steering wheel, for instance a bike wheel. What would be the best way to input text, say "car part", so that only steering wheels similar to the search image are returned?
I am unable to find a good way to combine ConvNet with text search model to construct improved knn network.
My other idea is to use ElasticSearch in order to do text search, something that ElasticSearch is good at. For instance, I would do a CBIR search described previously and out of the return results, I can look up their description and then use ElasticSearch on the subset of the hits to produce the results. Maybe tag images with classes and allow user to de/select groups of images of interest.
I don't want to do text search before image search as some of the images are poorly described so text search would miss them.
Any thoughts or ideas will be appreciated!
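To make the image-first, text-second idea concrete, here is a minimal NumPy sketch. It assumes the ConvNet features are already extracted into a matrix, and it replaces ElasticSearch with a plain keyword check on each description; the names `cosine_top_k` and `search`, the `pool` size and the toy data are all hypothetical. Images that match the text are ranked first, and the rest of the image hits are kept after them so that poorly described images are not lost.

```python
import numpy as np


def cosine_top_k(query, features, k):
    """Indices of the k rows of `features` most similar to `query`."""
    q = query / np.linalg.norm(query)
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    scores = f @ q
    return [int(i) for i in np.argsort(-scores)[:k]]


def search(query_vec, features, descriptions, text=None, k=10, pool=50):
    """Image search first, then re-rank the hits by a keyword match."""
    hits = cosine_top_k(query_vec, features, pool)
    if text:
        terms = text.lower().split()
        matched = [i for i in hits
                   if all(t in descriptions[i].lower() for t in terms)]
        unmatched = [i for i in hits if i not in matched]
        # Text matches come first; undescribed images stay in the list.
        hits = matched + unmatched
    return hits[:k]


# Toy data: 2-D "features" standing in for real ConvNet embeddings.
features = np.array([[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.0, 1.0]])
descriptions = ["bike wheel", "car part steering wheel", "", "car part door"]
query = np.array([1.0, 0.0])

print(search(query, features, descriptions, k=3, pool=3))
print(search(query, features, descriptions, text="car part", k=3, pool=3))
```

Whether unmatched images should be kept at the tail or dropped entirely is a design choice; dropping them gives cleaner results but brings back the problem with poorly described images.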
I have not found the original paper, but you might find this interesting: https://www.slideshare.net/xavigiro/multimodal-deep-learning-d4l4-deep-learning-for-speech-and-language-upc-2017
It is about finding a vector space in which both images and text live (a multimodal embedding). In this way you can find text similar to an image, images referring to a text, or use a text/image pair to find similar images.
I think maybe this idea is an interesting point to start from.
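Here is a small sketch of how a text/image pair could be used as a query once such a shared space exists. It assumes some model has already mapped the gallery images, the query image and the query text into the same vector space; the vectors below are made up, and blending the two query vectors with a weight `alpha` is just one simple way to combine them, not something taken from the slides.

```python
import numpy as np


def unit(v):
    """L2-normalise a vector, or each row of a matrix."""
    v = np.asarray(v, dtype=np.float64)
    return v / np.linalg.norm(v, axis=-1, keepdims=True)


def multimodal_query(image_vec, text_vec, alpha=0.5):
    """Blend an image embedding and a text embedding into one query.

    alpha=1.0 uses only the image, alpha=0.0 only the text.
    """
    return unit(alpha * unit(image_vec) + (1.0 - alpha) * unit(text_vec))


def rank(gallery, query):
    """Gallery indices sorted by cosine similarity to the query."""
    scores = unit(gallery) @ query
    return np.argsort(-scores), scores


# Hypothetical embeddings in a 3-D shared space.
gallery = np.array([
    [1.0, 0.0, 0.0],   # round object, unrelated to cars
    [0.7, 0.7, 0.1],   # round object that is also a car part
    [0.0, 0.2, 1.0],   # something else entirely
])
image_vec = np.array([1.0, 0.0, 0.0])   # query image: something round
text_vec = np.array([0.0, 1.0, 0.0])    # query text: "car part"

order, _ = rank(gallery, multimodal_query(image_vec, text_vec, alpha=0.5))
print("best match with image + text:", int(order[0]))
```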
I am about to start learning CV and ML. I want to start by solving a problem. Below I am sharing an image and I want to extract each symbol and location from an image and create a new image with those extracted symbols in a pattern just like in source image. After that, I will do a translation job. Right now how can I or which steps I should follow to extract the symbols and find those symbols from the dataset (in terms of Gardiner's sign list) and place in the new image?
I know some computer vision and machine learning is involved in this process, because the symbols are very old and will not match the reference signs exactly. I don't know where to start or where to end. I plan to use Python. Please also share if you know of anyone who has already done this. Thank you.
Run Sobel edge detection on the sign images in Gardiner's sign list.
Train a CNN on the list.
Normalize the contrast of the source image.
Run Sobel edge detection on the source image. (referred to as source image hereafter)
Evaluate windows of varying heights and widths (from the largest to the smallest) on the source image with the CNN.
Select the highest probability glyph. Output the corresponding Glyph from Gardiner's list at that start position and the corresponding height and width.
I do not claim this can be done in six simple steps, but this is the approach I would take.
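The edge detection and window search steps above could look roughly like this. It is only a skeleton: the Sobel filter is written in plain NumPy, and `score_fn` is a hypothetical stand-in for the trained CNN that returns a `(label, probability)` pair for a patch. The window sizes, the stride and the toy scorer are assumptions of mine.

```python
import numpy as np


def sobel_magnitude(img):
    """Gradient magnitude using 3x3 Sobel kernels (edge-padded)."""
    p = np.pad(img.astype(np.float64), 1, mode="edge")
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]) \
        - (p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2])
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]) \
        - (p[:-2, :-2] + 2 * p[:-2, 1:-1] + p[:-2, 2:])
    return np.hypot(gx, gy)


def best_window(edges, score_fn, sizes, stride=4):
    """Slide windows of each (height, width) over `edges`.

    Returns (probability, label, x, y, width, height) for the window
    with the highest probability, or None if no window fits.
    """
    best = None
    for h, w in sizes:                      # pass sizes largest first
        for y in range(0, edges.shape[0] - h + 1, stride):
            for x in range(0, edges.shape[1] - w + 1, stride):
                label, prob = score_fn(edges[y:y + h, x:x + w])
                if best is None or prob > best[0]:
                    best = (prob, label, x, y, w, h)
    return best


# Toy run: an 8x8 bright block, and a fake scorer that prefers dense patches.
# A real scorer would resize the patch and run it through the trained CNN.
toy_edges = np.zeros((32, 32))
toy_edges[8:16, 8:16] = 1.0


def toy_score(patch):
    return "A1", float(patch.mean())        # "A1" is a placeholder sign code


print(best_window(toy_edges, toy_score, sizes=[(16, 16), (8, 8)]))
```

This finds a single glyph. For a whole inscription you would keep every window above some probability threshold and suppress overlapping ones, which this sketch does not attempt.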
I want to detect the font of text in an image so that I can do better OCR on it. Searching for a solution, I found this post. Although it may seem to be the same as my question, it does not exactly address my problem.
Background
For OCR I am using Tesseract, which uses trained data for recognizing text. Training Tesseract with lots of fonts reduces the accuracy, which is natural and understandable. One solution is to build multiple sets of trained data, one per group of similar fonts, and then automatically use the appropriate data for each image. For this to work, we need to be able to detect the font in the image.
Number 3 in this answer uses OCR to isolate images of characters along with their recognized characters, then generates the same character's image with each font and compares them with the isolated image. In my case the user should provide a bounding box and the character associated with it. But because I want to OCR Arabic script (which is cursive, and character shapes may vary depending on which other characters are adjacent), and because the bounding box may not actually be the minimal bounding box, I am not sure how I can do the comparison.
I believe Hausdorff distance is not applicable here. Am I right?
Shape context may be good(?) and there is a shapeContextDistanceExtractor class in OpenCV, but I am not sure how I can use it in opencv-python.
Thank you.
Sorry for bad English.
I'm writing an OCR application to read characters from a screenshot image. Currently, I'm focusing only on digits. I'm partially basing my approach on this blog post: http://blog.damiles.com/2008/11/basic-ocr-in-opencv/.
I can successfully extract each individual character using some clever thresholding. Where things get a bit tricky is matching the characters. Even with fixed font face and size, there are some variables such as background color and kerning that cause the same digit to appear in slightly different shapes. For example, the below image is segmented into 3 parts:
Top: a target digit that I successfully extracted from a screenshot
Middle: the template: a digit from my training set
Bottom: the error (absolute difference) between the top and middle images
The parts have all been scaled (the distance between the two green horizontal lines represents one pixel).
You can see that despite both the top and middle images clearly representing a 2, the error between them is quite high. This causes false positives when matching other digits -- for example, it's not hard to see how a well-placed 7 can match the target digit in the image above better than the middle image can.
Currently, I'm handling this by having a heap of training images for each digit, and matching the target digit against those images, one-by-one. I tried taking the average image of the training set, but that doesn't resolve the problem (false positives on other digits).
I'm a bit reluctant to perform matching using a shifted template (it'd be essentially the same as what I'm doing now). Is there a better way to compare the two images than simple absolute difference? I was thinking of maybe something like the EMD (earth movers distance, http://en.wikipedia.org/wiki/Earth_mover's_distance) in 2D: basically, I need a comparison method that isn't as sensitive to global shifting and small local changes (pixels next to a white pixel becoming white, or pixels next to a black pixel becoming black), but is sensitive to global changes (black pixels that are nowhere near white pixels become black, and vice versa).
Can anybody suggest a more effective matching method than absolute difference?
I'm doing all this in OpenCV using the C-style Python wrappers (import cv).
I would look into using Haar cascades. I've used them for face detection/head tracking, and it seems like you could build up a pretty good set of cascades with enough '2's, '3's, '4's, and so on.
http://alereimondo.no-ip.org/OpenCV/34
http://en.wikipedia.org/wiki/Haar-like_features
OCR on noisy images is not easy, so simple approaches do not work well.
So, I would recommend you to use HOG to extract features and SVM to classify. HOG seems to be one of the most powerful ways to describe shapes.
The whole processing pipeline is implemented in OpenCV, however I do not know the function names in python wrappers. You should be able to train with the latest haartraining.cpp - it actually supports more than haar - HOG and LBP also.
And I think the latest code (from trunk) is much improved over the official release (2.3.1).
HOG usually needs just a fraction of the training data used by other recognition methods. However, if you want to classify shapes that are partially occluded (or missing), you should make sure you include some such shapes in training.
I can tell you from my experience and from reading several papers on character classification that a good way to start is by reading about Principal Component Analysis (PCA), Fisher's Linear Discriminant Analysis (LDA), and Support Vector Machines (SVMs). These methods are extremely useful for OCR, and it turns out that OpenCV already includes excellent implementations of PCA and SVMs. I haven't seen any OpenCV code examples for OCR, but you can use a modified version of face classification to perform character classification. An excellent resource for face recognition code for OpenCV is this website.
Another Python library that I recommend is "scikits.learn". It is very easy to send cvArrays to scikits.learn and run machine learning algorithms on your data. A basic example for OCR using SVM is here.
Another more complicated example using manifold learning for handwritten character recognition is here.
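As a quick illustration of the SVM route, here is a short sketch using the small 8x8 digits dataset bundled with the library. Note that "scikits.learn" is now called scikit-learn and is imported as `sklearn`. The 50/50 split and `gamma=0.001` are simply values I picked for the demo; with your own screenshot digits you would flatten each extracted character image into a row in the same way.

```python
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

# 8x8 grayscale digit images bundled with scikit-learn.
digits = datasets.load_digits()

# Flatten each image into a 64-element feature vector.
X = digits.images.reshape(len(digits.images), -1)
y = digits.target

# First half for training, second half for testing (no shuffling).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, shuffle=False)

clf = svm.SVC(gamma=0.001)   # RBF kernel by default
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)
print("test accuracy: %.3f" % accuracy)
```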