I have a large image from which I need to extract some information. I am using the Python OpenCV library for image enhancement. Using OpenCV methods, I extracted the part of the image that interests me. It is given below.
Tesseract is not able to distinguish between 0 and O; it outputs all zeroes. I tried --psm option 6 and others, but to no avail. I am using the latest stable release of Tesseract (v3) on Windows.
I am continuing to work on this problem. Any help would be appreciated. Thanks.
EDIT:
I found a solution for this. I used the box output from Tesseract, which you get by passing makebox as an argument to the tesseract command. The box output contains the X and Y coordinates of a box around each character read. I computed the width-to-height ratio of each box and, using some input images, trained a logistic regression model to predict whether the character is 0 or O. I then used this trained model on test images and it worked like a charm.
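For anyone who wants to reproduce this, the sketch below shows the idea. The parsing follows Tesseract's box-file format (char left bottom right top page); the training ratios, labels, and file names are placeholders, not my actual data.

from sklearn.linear_model import LogisticRegression

def box_ratios(box_path):
    # each line of a Tesseract box file is: char left bottom right top page
    ratios = []
    with open(box_path) as f:
        for line in f:
            char, left, bottom, right, top, _page = line.split()
            width = int(right) - int(left)
            height = int(top) - int(bottom)
            ratios.append((char, width / height))
    return ratios

# placeholder training data: width/height ratios of characters whose true
# identity (digit zero vs letter O) is already known
X_train = [[0.48], [0.52], [0.55], [0.70], [0.74], [0.78]]
y_train = [0, 0, 0, 1, 1, 1]          # 0 = digit zero, 1 = letter O

model = LogisticRegression().fit(X_train, y_train)

for char, ratio in box_ratios("output.box"):
    if char in ("0", "O"):
        print(char, "->", "O" if model.predict([[ratio]])[0] == 1 else "0")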
Related
I'm trying to coordinate two systems: one that was already pre-trained on the Atari MsPacman-v0 environment and another that only supports the gym version for training. With both systems working on the RGB image representations, the color palette discrepancy is problematic (Gym output left, expected right):
Is there a simple way to fix this (e.g. a pixel mapping trick or some environment setting I'm not aware of), or is there something more involved that I have to do? Of note, the actual simulation I'm running uses gym.
Heyo, sorry about that! Looks like I'm dumb.
Context: I was trying to incorporate the SPACE detection model into DreamerV2, and I didn't see the little footnote in SPACE:
For some reason we were using BGR images for our Atari dataset and our pretrained models can only handle that. Please convert the images to BGR if you are to test your own Atari images with the provided pretrained models.
So yeah... if you see something like this, I guess this is what's wrong...
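If anyone else runs into this, the actual fix is just a channel swap on the gym frame before it goes into the pretrained model. This is a minimal sketch; the environment name is only the one from the question, and newer gym/gymnasium versions return observations slightly differently.

import gym

env = gym.make("MsPacman-v0")
obs = env.reset()                 # RGB frame of shape (H, W, 3)

bgr_obs = obs[..., ::-1]          # reverse the channel axis: RGB -> BGR
# equivalently: bgr_obs = cv2.cvtColor(obs, cv2.COLOR_RGB2BGR)
# bgr_obs is what the SPACE pretrained weights expect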
I'm trying to see the output after embedding an image. I tried using an OpenCV image function and asked it to print the result, but it is not working. Any suggestions on how to test embedding using dlib?
Step 1: Get the position of each image in the embedding space.
Step 2: Visualize it. If the dimension is higher than 2D/3D, you can use methods like t-SNE to do the visualization.
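A sketch of both steps, assuming the embeddings in question are dlib's 128-D face descriptors; the model file names are the standard dlib downloads and the image paths are placeholders.

import dlib
import numpy as np
from sklearn.manifold import TSNE

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")
encoder = dlib.face_recognition_model_v1("dlib_face_recognition_resnet_model_v1.dat")

def embed(path):
    img = dlib.load_rgb_image(path)
    det = detector(img, 1)[0]                    # first detected face
    shape = predictor(img, det)
    return np.array(encoder.compute_face_descriptor(img, shape))

# Step 1: position of each image in the 128-D embedding space
embeddings = np.stack([embed(p) for p in ["a.jpg", "b.jpg", "c.jpg", "d.jpg"]])
print(embeddings.shape)                          # (n_images, 128)

# Step 2: project down to 2D with t-SNE for visualization
points_2d = TSNE(n_components=2, perplexity=3).fit_transform(embeddings)
print(points_2d)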
I am about to start learning CV and ML, and I want to start by solving a problem. Below I am sharing an image: I want to extract each symbol and its location from the image and create a new image with those extracted symbols arranged in the same pattern as in the source image. After that, I will do a translation job. For now, which steps should I follow to extract the symbols, match them against the dataset (in terms of Gardiner's sign list), and place them in the new image?
I know some computer vision + machine learning is involved in this process, because the symbols are not 100% accurate given how old they are. I don't know where to start or end. I plan to use Python. Also, please share if you know of anyone who has already done this. Thank you.
1. Run Sobel edge detection on the images in Gardiner's sign list.
2. Train a CNN on the list.
3. Normalize the contrast of your input image (referred to as the source image from here on).
4. Run Sobel edge detection on the source image.
5. Evaluate windows of varying heights and widths (from the largest to the smallest) of the source image in the CNN.
6. Select the highest-probability glyph and output the corresponding glyph from Gardiner's list at that start position with the corresponding height and width.
I do not claim this can be done in six simple steps, but this is the approach I would take.
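For what it's worth, steps 3 and 4 are only a few lines with OpenCV. This is a rough sketch and the file names are placeholders.

import cv2

img = cv2.imread("hieroglyphs.png", cv2.IMREAD_GRAYSCALE)
img = cv2.equalizeHist(img)                           # step 3: normalize contrast

sobel_x = cv2.Sobel(img, cv2.CV_64F, 1, 0, ksize=3)   # step 4: Sobel edges
sobel_y = cv2.Sobel(img, cv2.CV_64F, 0, 1, ksize=3)
edges = cv2.convertScaleAbs(cv2.magnitude(sobel_x, sobel_y))
cv2.imwrite("edges.png", edges)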
I want to detect the font of the text in an image so that I can do better OCR on it. Searching for a solution, I found this post. Although it may seem to be the same as my question, it does not exactly address my problem.
Background
For OCR I am using Tesseract, which uses trained data for recognizing text. Training Tesseract with lots of fonts reduces the accuracy, which is natural and understandable. One solution is to build multiple trained data files - one per group of a few similar fonts - and then automatically use the appropriate data for each image. For this to work, we need to be able to detect the font in the image.
Point 3 in this answer uses OCR to isolate images of characters along with their recognized character, then generates the same character's image with each font and compares them with the isolated image. In my case the user should provide a bounding box and the character associated with it. But because I want to OCR Arabic script (which is cursive, and character shapes may vary depending on which other characters are adjacent) and because the bounding box may not actually be the minimal bounding box, I am not sure how I can do the comparison.
I believe Hausdorff distance is not applicable here. Am I right?
Shape context may be a good fit(?), and there is a ShapeContextDistanceExtractor class in OpenCV, but I am not sure how I can use it in opencv-python.
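In case it helps to show what I mean, this is roughly how I imagine it would be used from opencv-python, but I don't know if the comparison itself makes sense for my case. This is an untested sketch; it assumes binary glyph images, OpenCV 4's findContours signature, and that the shape module is present in your build, and the file names are placeholders.

import cv2

def largest_contour(path):
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    _, thresh = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    return max(contours, key=cv2.contourArea)

extractor = cv2.createShapeContextDistanceExtractor()
dist = extractor.computeDistance(largest_contour("char_from_image.png"),
                                 largest_contour("char_rendered_with_font.png"))
print(dist)    # smaller distance should mean more similar shapes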
Thank you, and sorry for my bad English.
I'm kinda new to both OCR and Python.
What I'm trying to achieve is to run Tesseract from a Python script to 'recognize' some particular figures in a .tif.
I thought I could do some training for Tesseract, but I didn't find any similar topic on Google or here on SO.
Basically I have some .tif files that contain several images (like an 'arrow', a 'flower', and other icons), and I want the script to print the name of each icon as output. If it finds an arrow, then it should print 'arrow'.
Is it feasible?
This is by no means a complete answer, but if there are multiple images in the tif and if you know the size in advance, you can standardize the image samples prior to classifying them. You would cut up the image into all the possible rectangles in the tif.
So when you create a classifier (I don't go into the methods here), the end result would be a synthesis of the classifications of all the smaller rectangles.
So if, in a given tif, the 'arrow' or 'flower' images are, say, 16px by 16px, you can use Python PIL to create the samples.
from PIL import Image

im = Image.open("input.tif")
sample_dimensions = (16, 16)

# get_all_corner_combinations is a placeholder: it should yield every
# (left, upper, right, lower) box of the given size within the image
image_samples = []
for box in get_all_corner_combinations(im, sample_dimensions):
    image_samples.append(im.crop(box))

# YourClassifier and fuse_classifications are also placeholders for the
# learning and voting steps discussed below
classifier = YourClassifier()
classifications = []
for sample in image_samples:
    classifications.append(classifier(sample))

label = fuse_classifications(classifications)
Again, I didn't talk about the learning step of actually writing YourClassifier. But hopefully this helps with laying out part of the problem.
There is a lot of research on the subject of learning to classify images, as well as work on cleaning up noise in images before classifying them.
Consider browsing through this nice collection of existing Python machine learning libraries.
http://scipy-lectures.github.com/advanced/scikit-learn/index.html
There are many techniques that relate to images as well.
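To make the placeholder concrete, YourClassifier from the snippet above could be as simple as a k-NN on raw pixels via scikit-learn. This is purely illustrative: you would construct it with your own labelled training crops (e.g. known 'arrow'/'flower' samples) rather than with no arguments as written above.

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

class YourClassifier:
    def __init__(self, training_images, training_labels):
        # flatten each fixed-size PIL sample into a 1-D grayscale pixel vector
        X = np.array([np.asarray(img.convert("L")).ravel() for img in training_images])
        self._model = KNeighborsClassifier(n_neighbors=3)
        self._model.fit(X, training_labels)

    def __call__(self, sample):
        x = np.asarray(sample.convert("L")).ravel().reshape(1, -1)
        return self._model.predict(x)[0]         # e.g. 'arrow' or 'flower'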