I just started with OCR and am looking for a solution to the following problem.
I noticed that in some cases EasyOCR (which runs on PyTorch) detects two separate strings instead of a single string. This is indicated by the green rectangles in the screenshot. Is it possible to change some settings in order to avoid this?
Any suggestions, links, and explanations are very welcome, as is an introduction to new vocabulary in case my question is not well phrased.
Code:
import easyocr
reader = easyocr.Reader(['de'], gpu=False)
result = reader.readtext(file) # file is a jpg
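One setting that may be worth experimenting with (an assumption, not a confirmed fix): readtext exposes box-grouping parameters such as width_ths and paragraph that control when neighbouring detections are merged:
import easyocr

reader = easyocr.Reader(['de'], gpu=False)
# width_ths is the maximum horizontal gap (relative to box height) at which
# adjacent boxes are merged; raising it can join strings that were split.
# paragraph=True additionally combines nearby results into single blocks.
result = reader.readtext(file, width_ths=0.9, paragraph=True)  # file is a jpg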
Related
I want to detect the characters in an image like this with Python:
In this case the code should return the result '6010001'.
How can I get the result out of this image? What do I need?
For your information: if the solution is an AI solution, there are about 20,000 labeled images available.
Thanks in advance :)
Question: Are all the pictures of similar nature?
Meaning: are the numbers stamped into a similar material, or are they random pictures of numbers made with different techniques (e.g. pen-drawn, stamped, etc.)?
If they are all quite similar (nice contrast, as in the sample pic), I would recommend writing your "own" AI; otherwise use an existing neural network / library (as I assume you may want to avoid the pain of creating your own neural network - and tagging a lot of pictures).
If the pics are quite "similar", here is the suggested approach (a rough sketch of steps 2-4 follows the list):
1. Greyscale the image and increase the contrast.
2. Define a box (greater than a digit), scan it over the image and count the dark pixels ("0s"); define by trial a valid range that detects a digit, and avoid overlaps.
3. For each hit, take the area, split it into sectors, e.g. 6x4, and count the 0s per sector.
4. Build a little knowledge base (a CSV file) of counts per sector for each number from 0-9 (e.g. as a string); you will end up with multiple valid strings per number in the database, just ensure they are unique (otherwise redefine steps 1-3).
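import cv2
import numpy as np

# A rough sketch of steps 2-4, assuming dark digits on a light background.
# The file name, area range and the 6x4 sector grid are assumptions to tune.
img = cv2.imread('digits.jpg', cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

def sector_fingerprint(region, rows=6, cols=4):
    # Split a digit region into rows x cols sectors, count the dark (0)
    # pixels per sector, and join the counts into one lookup string.
    h, w = region.shape
    counts = []
    for r in range(rows):
        for c in range(cols):
            sector = region[r * h // rows:(r + 1) * h // rows,
                            c * w // cols:(c + 1) * w // cols]
            counts.append(str(int(np.sum(sector == 0))))
    return '-'.join(counts)

# Connected components stand in for the sliding box of step 2.
n, labels, stats, _ = cv2.connectedComponentsWithStats(255 - binary)
knowledge_base = {}  # fingerprint -> digit; persisted as the CSV of step 4
for i in range(1, n):  # label 0 is the background
    x, y, w, h, area = stats[i]
    if 50 < area < 5000:  # "valid range" found by trial, as in step 2
        fp = sector_fingerprint(binary[y:y + h, x:x + w])
        print(x, fp, knowledge_base.get(fp))  # None -> review manually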
In addition, I recommend making yourself a smart knowledge database, meaning: if a digit could not be identified, save the digit picture and the result. Then write a little review program that shows you the undefined digits and the result string; you can then manually add them to your knowledge database for the respective number.
Hope it helps. I used the same approach to read a lot of different data from screen pictures and store it in a database. Works like a charm.
#better to do it yourself than to use a standard neural network :)
You can use opencv-python and pytesseract:
import cv2
import pytesseract

img = cv2.imread('img3.jpeg')            # load the image as a BGR array
text = pytesseract.image_to_string(img)  # run Tesseract OCR over it
print(text)
It doesn't work for all images with text, but it works for most.
I was asked this peculiar question today and I couldn't give a straight answer.
I have an image depicting base64 text. How can I convert this to text?
I tried this via pytesseract, but Tesseract has a language component that garbles the text, so I don't think that's the way to go. I tried researching a bit, but it seems this is not a common problem (to say the least). I've no clue how it could be useful, but it sure is vexing!
What other things could I try?
What an interesting question. This task isn't all that unusual, however, as I've seen people extract plenty of jumbled words from images before. Extracting a long jumbled line of base64 text could prove more challenging. Some OCR tools I've seen used are:
opencv-python wrapper of OpenCV
pytesseract wrapper of Tesseract (As you stated)
More OCR wrappers I found other than the two popular ones: https://pythonrepo.com/repo/kba-awesome-ocr-python-computer-vision
For these to work, the image also needs to be of fairly good quality. If the base64 image is predictable and in a structured form, you could also create your own reference images and compare them to the original to determine each character in the string, bypassing the need for OCR completely.
There are limitations to OCR, obviously, such as the fact that the image needs scaling, contrast, and alignment, and any small error can ruin the base64 text. I have never seen OCR used for such a thing before, so I'm unsure where to go from there, but I am positive you are on the right track!
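If you do go the pytesseract route, one thing that may counter the language component mentioned above is disabling Tesseract's dictionaries and whitelisting the base64 alphabet. A hedged sketch (the exact flags and the file name are assumptions and may vary by Tesseract version):
import pytesseract
from PIL import Image

# Assumption: disabling the dictionary "dawg" files stops Tesseract from
# "correcting" random base64 into dictionary words; the whitelist restricts
# output to the base64 alphabet. --psm 6 (uniform text block) is a guess.
config = ('--psm 6 '
          '-c load_system_dawg=0 -c load_freq_dawg=0 '
          '-c tessedit_char_whitelist=ABCDEFGHIJKLMNOPQRSTUVWXYZ'
          'abcdefghijklmnopqrstuvwxyz0123456789+/=')
text = pytesseract.image_to_string(Image.open('base64.png'), config=config)
print(text)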
Kindly find the link to the image in question here.
I've tried using PyTesseract to achieve the intended objective. While it works well for extracting the words, it doesn't pick up the numbers with any acceptable degree of precision; in fact, it doesn't pick up the numbers I require at all. I intend to design a program that picks up numbers from four particular locations in an image and stores them in a structured data variable (list/dictionary/etc.), and since I need to do this for a good 2,500-odd screenshots via that program, I cannot manually pick out the numbers I require, even if it begins to read them correctly. The following output was returned when using PyTesseract on the image discussed above:
`Activities Boyer STA
Candle Version 4.1-9 IUAC, N.Delhi - BUILD (Tuesday 24 October 2017 04:
CL-F41. Markers:
—
896 13) 937.0
Back
Total,
Peak-1
Lprnenea dais cinasedl
Ee
1511 Show State
Proceed Append to File`
The code used to produce this output was:
try:
    from PIL import Image
except ImportError:
    import Image

import pytesseract

pytesseract.pytesseract.tesseract_cmd = r'C:/Program Files/Tesseract-OCR/tesseract.exe'
print(pytesseract.image_to_string(Image.open('C:/Users/vatsa/Desktop/Screenshot from 2020-06-15 21-41-06.png')))
Referring to the image, I'm interested in extracting the numbers present at those four positions across all the screenshots (146.47, 915.16, 354.5, and 18.89 in this picture) and saving them, probably as a list. How can I achieve such functionality using Python?
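One way this might be done (not from the original post, and every coordinate below is a placeholder): since the four values sit at fixed positions in every screenshot, crop those regions first and OCR each crop with a digit-only whitelist:
from PIL import Image
import pytesseract

# Hypothetical pixel boxes (left, top, right, bottom); read the real ones
# off a sample screenshot once, since the layout is fixed across all 2500.
REGIONS = {'value_1': (100, 200, 180, 220),
           'value_2': (100, 240, 180, 260)}

def read_numbers(path):
    img = Image.open(path)
    values = {}
    for name, box in REGIONS.items():
        crop = img.crop(box)
        # --psm 7 treats the crop as one text line; whitelist digits and '.'
        txt = pytesseract.image_to_string(
            crop, config='--psm 7 -c tessedit_char_whitelist=0123456789.')
        values[name] = txt.strip()
    return values

print(read_numbers('Screenshot from 2020-06-15 21-41-06.png'))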
Also, opening the image in question with Google Docs (linked here) shows what a great job Google does at extracting the text. Could an automated program use Google Docs to do this conversion and then scrape the desired data values as described above? Either approach to solving the issue would be acceptable, and any attempt at finding a solution would be highly appreciated.
[edit]: The question suggested in the comments section was really insightful, yet fell short of proving effective, as the given code was unable to find the contours of the numbers in the image, and therefore the model could not be trained.
I am trying to implement U-Net and am using https://github.com/jakeret/tf_unet/tree/master/scripts as a reference. I don't understand which dataset they used. Please give me some idea, or a link to a dataset I can use.
On their GitHub README.md they show three different datasets that they applied their implementation to. Their implementation is dataset-agnostic, so it shouldn't matter too much what data they used if you're trying to solve your own problem with your own data. But if you're looking for a toy dataset to play around with, check out their demos. There you'll see two readily available examples and how they can be used:
demo_radio_data.ipynb which uses an astronomic radio data example set from here: http://people.phys.ethz.ch/~ast/cosmo/bgs_example_data/
demo_toy_problem.ipynb which uses their built-in data generator of a noisy image with circles that are to be detected.
The latter is probably the easiest one if you just want something to play with. To see how the data is generated, check out the class:
image_gen.py -> GrayScaleDataProvider
(With an IDE like PyCharm you can jump straight into the corresponding classes from the demo source code.)
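For instance, a minimal sketch along the lines of their demo_toy_problem notebook (parameter names and output shapes are from memory and may differ between versions of the repo):
from tf_unet import image_gen

# Built-in generator: noisy grayscale images with circles, plus label masks
generator = image_gen.GrayScaleDataProvider(nx=572, ny=572, cnt=20)
x, y = generator(4)      # a batch of 4 images and their segmentation maps
print(x.shape, y.shape)  # roughly (4, 572, 572, 1) and (4, 572, 572, 2)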
I'm using pytesseract to OCR patent images in order to turn these old patents into machine-readable text. An example image I use is here. The output is here. Basically, I'm doing it fairly simply. My relevant code is this:
from PIL import Image
import pytesseract
text = ''  # accumulate the OCR output across all files
for each4 in listoffiles:  # for each image file, append its OCR text
    im = Image.open(path2 + '\\' + each4)
    text = text + pytesseract.image_to_string(im)
I have experimented a little with modifying the config, but the only improvement I found was white-listing [a-zA-Z0-9,.]. I haven't yet modified the code to take the config into account, as performance is not yet up to my standards. There are so many options that I feel I've missed a lot, so any other suggestions on config modifications would be helpful.
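For what it's worth, here is a minimal sketch of passing that whitelist through pytesseract's config argument (the psm value and the file name are assumptions, not taken from the post):
import pytesseract
from PIL import Image

# --psm 6 assumes a uniform block of text; tessedit_char_whitelist applies
# the [a-zA-Z0-9,.] restriction mentioned above.
config = ('--psm 6 -c tessedit_char_whitelist='
          'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789,.')
text = pytesseract.image_to_string(Image.open('patent_page.png'), config=config)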
I see from other suggestions that people use OpenCV, ndimage, and skimage with Python. I am quite inexperienced in computer vision, so I wouldn't know where to start with these packages for my problem, and guidance would be appreciated.
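(As a starting point only: a common OpenCV preprocessing recipe before handing a scan to Tesseract; the file name and blur kernel size are assumptions to tune.)
import cv2
import pytesseract

img = cv2.imread('patent_page.png', cv2.IMREAD_GRAYSCALE)
img = cv2.medianBlur(img, 3)  # light denoising for scanning speckle
# Otsu's method picks a global binarisation threshold automatically
_, img = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
text = pytesseract.image_to_string(img)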
Other options I am thinking of include using Tesseract 4.0 and training the OCR on patents myself, or adding patent-specific words to the dictionary. I don't know what I should prioritize, but if you have suggestions, luckily I possess the rare ability to read README files (not entirely true, actually, but I will try my best).