I am trying to detect text from a video game. The game uses a custom font, and I have the font files available to me. The text characters in the game are not distorted and are clearly visible. I am aware that similar questions have been asked already, but it seems to me that the previously presented solutions might be outdated.
What would be the 2023 way of detecting the text and bounding boxes in images using python?
I tried to do it in Tesseract, but the font is not included in the pre-trained model, hence Tesseract performs poorly.
In principle, I could train Tesseract to use the custom font. The problem seems quite straightforward to me: create a custom .traineddata file and tell tesseract to use it when detecting. However, it seems like the Tesseract trainer by Anyline is not available anymore and the Qt-box-editor used in some tutorials to do it manually is not maintained anymore.
The fact that my search for something that in my mind should be quite straightforward and fully automated (use custom font file to create labeled text images, train tesseract on them, use the trained model) only leads to outdated and quite lengthy instructions that involve many manual steps, indicates to me that either I am trying to use Tesseract for something it was not meant to do or that I am missing some tool that everyone else is using.
Related
i want to detect the font of text in an image so that i can do better OCR on it. searching for a solution i found this post. although it may seem that it is the same as my question, it does not exactly address my problem.
background
for OCR i am using tesseract, which uses trained data for recognizing text. training tesseract with lots of fonts reduces the accuracy which is natural and understandable. one solution is to build multiple trained data - one per few similar fonts - and then automatically use the appropriate data for each image. for this to work we need to be able to detect the font in image.
number 3 in this answer uses OCR to isolate image of characters along with their recognized character and then generates the same character's image with each font and compare them with the isolated image. in my case the user should provide a bounding box and the character associated with it. but because i want to OCR Arabic script(which is cursive and character shapes may vary depending on what other characters are adjacent to it) and because the bounding box may not be actually the minimal bounding box, i am not sure how i can do the comparing.
i believe Hausdorff distance is not applicable here. am i right?
shape context may be good(?) and there is a shapeContextDistanceExtractor class in opencv but i am not sure how i can use it in opencv-python
thank you
sorry for bad English
I want to write a script, that converts unknown images (jpg, png, gif, bmp, tiff, etc.) to a specific resolution and format as well as generating a thumbnail.
the problem is that the compression level, that is totally fine for pictures produces crap for exports of Presentations for example; So I want to differ the conversion settings based on the contents of the image.
Does anyone have experience in doing that kind of stuff in python (or shell scripts whose output is easily pasreable)?
my ideas are:
increase contrast and check histogramm if there are only single spikes left
doing a high pass filtering of the image and check what?
doing face recognition of known letters
the goal is that the recognition should be quite fast (approx. 10 images/second) and quite easy to implement
This is a pretty trivial machine learning problem, I would research the MNIST dataset problems that teach you how to recognize handwritten characters, this process should be very similar. Check out this tutorial and see if you can modify it to recognize graphics vs pictures. If your error rate ends up too high you'll have to try more advanced machine learning techniques.
http://mxnet.io/tutorials/python/mnist.html
I am new to the image processing subject. I'm using opencv library for image processing with python. I need to extract symbols and texts related to those symbols for further work. I saw some of developers have done handwritten text recognitions with Neural network, KNN and other techniques.
My question is what is the best way to extract these symbols and handwritten texts related to them?
Example diagram:
Details I need to extract:
No of Circles in the diagram.
What are the texts inside them.
What are the words within square brackets.
Are they connected with arrows or not.
Of course, there is a method called SWT - Stokes Width Transform.
Please see this paper, if you search it by its name, you can find the codes that some students have written during their school project.
By using this method, text recognitions can be applied. But it is not a days job.
Site: Detecting Text in Natural Scenes with
Stroke Width Transform
Hope that it helps.
For handwritten text recognition, try using TensorFlow. Their website has a simple example for digit recognition (with training data). You can use it to implement your own application for recognizing handwritten alphabets as well. (You'll need to get training data for this though; I used a training data set provided by NIST.)
If you are using OpenCV with python, Hough transform can detect circles in images. You might miss some hand drawn circles, but there are ways to detect ovals and other closed shapes.
For handwritten character recognition, there are lots of libraries available.
Since you are now to this area, I strongly recommend LearnOpenCV and and PyImageSearch to help you familiarize with the algorithms that are available for this kind of tasks.
I found the following algorithm to work fairly well for enhancement of various images of whiteboards, etc:
duplicate the layer, make sure the top layer is active
(gaussian)blur the new layer. You shouldn't be able to read the text anymore.
set the layer mode to dodge
invert the layer
I tried it out in gimp and it looks promising. I'd like to try it en-masse for a large number of images and make it available as a command line tool.
I realize I can script gimp, but that feels too heavy for this purpose. Imagemagick seems ideal for this, but I don't know if I can do the layering with it.
So the question is:
Is there a way to script imagemagick to do the above algorithm without the use of a temporary file?
Is there a reasonable alternative library/tool that can be integrated with Python?
There is a ImageMagick script for whiteboard cleanup here - http://www.fmwconcepts.com/imagemagick/whiteboard/index.php
I am trying to detect a marker in a webcam video feed and overlay it with a 3d object - pretty much exactly like this: http://www.morethantechnical.com/2009/06/28/augmented-reality-with-nyartoolkit-opencv-opengl/
I know artoolkit is the best module for this, but I was hoping to just use opencv in python since I dont know nearly enough c/c++ to be able to use artoolkit. I am hoping someone will be able to get me on the right track towards detecting the marker and determining its location and orientation etc since I have no idea how best to go about this or what functions I should be using.
OpenCV doesn't have marker detection / tracking functionality out of box. However it provides all algorithms needed so it's fairly easy to implement your own one.
The article you are referring to uses OpenCV only for video grabbing. The marker detection is done by NyARToolkit which is derived from ARToolkit. NyARToolkit have versions for Java, C# and ActionScript.
ARToolkit is mostly written in plain C without using fancy C++ features. It's probably easier to use than you thought. The documentation contains well explained tutorials. e.g http://www.hitl.washington.edu/artoolkit/documentation/devstartup.htm
The introductory documentation can help you understand the process of marker detection even if you decide not to use ARToolkit.
I think the most used way to perform marker detection using python and open CV is to use SURF Descriptors.
I have found very useful this video and the linked code you can find in this page. Here you can download the code. I don't know how to overlay it with a 3d object but I'm sure you can do something with pygame or matplotlib.