I was asked this peculiar question today and I couldn't give a straight answer.
I have an image depicting base64 text. How can I convert this to text?
I tried this via pytesseract, but Tesseract has a language model that garbles the text, so I don't think that's the way to go. I tried researching a bit, but it seems this isn't a very common problem (to say the least). I've no clue how it could be useful, but it sure is vexing!
What other things could I try?
What an interesting question. The task isn't entirely unusual, though: I've seen people extract plenty of jumbled words from images before. Extracting a long jumbled line of base64 text could prove more challenging. Some OCR tools I've seen used are:
opencv-python wrapper of OpenCV
pytesseract wrapper of Tesseract (As you stated)
More OCR wrappers I found other than the two popular ones: https://pythonrepo.com/repo/kba-awesome-ocr-python-computer-vision
For these to work, the image also needs to be of fairly good quality. If the base64 image is predictable and in a structured form, you could also create your own reference images and compare them against the original to determine each character in the string, bypassing the need for OCR completely.
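That reference-image idea can be sketched without any OCR library at all: carve the image into fixed-width cells, then score each cell against a reference bitmap per character. A minimal sketch in pure Python, assuming you can render your own reference glyphs from the known font (the 3x3 bitmaps and the `match_glyph` helper here are illustrative, not from any library):

```python
def match_glyph(cell, templates):
    """Return the character whose reference bitmap differs least from `cell`.

    `templates` is a hypothetical dict you would build yourself by rendering
    each of the 64 base64 characters; here the bitmaps are toy 3x3 grids."""
    best_char, best_score = None, float("inf")
    for char, ref in templates.items():
        # Sum of absolute pixel differences between the cell and the reference
        score = sum(abs(c - r)
                    for row_c, row_r in zip(cell, ref)
                    for c, r in zip(row_c, row_r))
        if score < best_score:
            best_char, best_score = char, score
    return best_char

# Toy demo with two made-up glyphs
templates = {
    "A": [[0, 1, 0], [1, 1, 1], [1, 0, 1]],
    "B": [[1, 1, 0], [1, 1, 0], [1, 1, 0]],
}
noisy_a = [[0, 1, 0], [1, 1, 1], [1, 1, 1]]  # "A" with one pixel flipped
print(match_glyph(noisy_a, templates))  # → A
```

For a fixed-width font and a clean screenshot, this kind of nearest-template lookup is often more reliable than a general-purpose OCR model, because there is no language model to "correct" the text.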
There are obvious limitations to OCR, such as the image needing proper scaling, contrast, and alignment, and any small error can ruin the base64 text. I've never seen OCR used for such a thing before, so I'm unsure where to go from there, but I'm positive you are on the right track!
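One advantage of this particular target is that base64 is machine-checkable: the alphabet is fixed, and a garbled string usually fails to decode. A small sanity-check sketch using only the standard library (`check_base64` is a hypothetical helper name):

```python
import base64
import re

def check_base64(ocr_text):
    """Strip whitespace, flag characters outside the base64 alphabet,
    then try a strict decode -- a failure means the OCR garbled something."""
    s = re.sub(r"\s+", "", ocr_text)
    bad = sorted(set(re.findall(r"[^A-Za-z0-9+/=]", s)))
    if bad:
        return False, f"illegal characters: {bad}"
    try:
        base64.b64decode(s, validate=True)
        return True, "decodes cleanly"
    except Exception as e:
        return False, f"decode failed: {e}"

print(check_base64("aGVsbG8gd29ybGQ="))   # valid ("hello world")
print(check_base64("aGVsbG8g?29ybGQ="))   # '?' is not in the base64 alphabet
```

Running the OCR output through a check like this at least tells you immediately whether a given preprocessing attempt produced a plausible string.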
Related
I'm using pytesseract to OCR patent images to turn these old patents into machine readable text. An example image I use is here. The output is here. Basically I'm doing it fairly simply. My relevant code is this:
from PIL import Image
import pytesseract

for each4 in listoffiles:  # for each file in the list, append its OCR text
    im = Image.open(path2 + '\\' + each4)
    text = text + pytesseract.image_to_string(im)
I have experimented a little with modifying the config file, but the only improvement I found was by white-listing [a-zA-Z0-9,.]. I haven't modified the code yet to take the config file into account, as performance is not yet up to my standards. There are so many options that I feel like I missed a lot, though, so any other suggestions on config file modification would be helpful.
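For reference, with recent Tesseract versions the whitelist is usually passed as a `-c` variable in pytesseract's `config` string rather than edited into a config file (note that the 4.0 LSTM engine ignored `tessedit_char_whitelist`; support returned in later 4.x releases). A sketch, assuming the character set you mentioned:

```python
# Build a Tesseract config string: page segmentation mode 6 (uniform block
# of text) plus a character whitelist matching [a-zA-Z0-9,.].
whitelist = (
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789,."
)
config = f"--psm 6 -c tessedit_char_whitelist={whitelist}"
print(config)

# Then, assuming an opened PIL image `im` (requires a Tesseract install):
# text = pytesseract.image_to_string(im, config=config)
```

The `--psm` value is worth experimenting with separately; scanned patent pages are often closer to mode 4 (single column of variable-size text).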
I see from other suggestions to use OpenCV, ndimage, and skimage for Python. I am quite inexperienced in computer vision, so I wouldn't know where to start with these packages for my problem, and guidance would be appreciated.
Other options I am thinking of include using Tesseract 4.0 and training the OCR on my own on patents/adding specific patent related words to the dictionary. Don't know what I should prioritize, but if you have suggestions, luckily I possess the rare ability to read readme files (actually not entirely true, but I will try my best).
I've been trying to implement an OCR program with Python that reads numbers with a specific format, XXX-XXX. I used Google's Cloud Vision API Text Recognition, but the results were unreliable. Out of 30 high-contrast 1280 x 1024 bmp images, only a handful resulted in the correct output, or at least included the correct output in the results. The program tends to omit some numbers, output in non-English languages or sneak in a few special characters.
The goal is to at least output the correct numbers consecutively, doesn't matter if the results are sprinkled with other junk. Is there a way to help the program recognize numbers better, for example limit the results to a specific format, or to numbers only?
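Even without API-side constraints, you can enforce the XXX-XXX format client-side: map the classic OCR confusions onto digits, then pull matches out of the noisy text with a regular expression. A minimal sketch (the confusion table is illustrative; tune it to the errors you actually see):

```python
import re

def extract_codes(ocr_text):
    """Pull every XXX-XXX digit group out of noisy OCR output, after
    mapping a few classic OCR confusions (O->0, l/I->1, S->5)."""
    cleaned = ocr_text.translate(str.maketrans("OolIS", "00115"))
    return re.findall(r"\b\d{3}-\d{3}\b", cleaned)

print(extract_codes("junk 12O-45l ok ... 987-654 #!"))  # → ['120-451', '987-654']
```

This kind of post-filter is often all you need when, as you say, it doesn't matter that the raw results are sprinkled with other junk.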
I am unable to tell you why this works; perhaps it has to do with how the language is read (o vs 0, l vs 1, etc.). But whenever I use OCR and am specifically looking for numbers, I have read to set the detection language to "Korean". It works exceptionally well for me and has improved accuracy greatly.
At this moment it is not possible to add constraints or to give a specific expected number format to Vision API requests, as mentioned here (by the Project Manager of Cloud Vision API).
You can also check all the possible request parameters (in the API reference); none of them indicates any way to specify a number format. Currently the only options are:
latLongRect: specify location of the image
languageHints: indicating the expected language for text_detection (list of supported languages here)
I assume you already checked out the multiple responses (with different included image regions) to see if you could reconstruct the text using the location of different digits?
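That reconstruction idea can be as simple as sorting the per-symbol detections by their left x coordinate. A toy sketch, assuming you have already pulled `(x, character)` pairs out of the response's bounding polygons (the data below is made up):

```python
def assemble_line(detections):
    """Given per-symbol detections as (x_left, character) tuples -- the kind
    of geometry the text annotations' bounding polygons provide -- order
    them left to right to reconstruct the line."""
    return "".join(ch for _, ch in sorted(detections))

# Hypothetical detections for a single line, arriving in arbitrary order
dets = [(40, "3"), (0, "1"), (55, "-"), (20, "2"), (70, "4"), (85, "5"), (100, "6")]
print(assemble_line(dets))  # → 123-456
```

For multi-line images you would first bucket detections by their y coordinate and then sort each bucket by x.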
Note that the Vision API and text_detection is not optimized for your data specifically, if you would have a lot of annotated data, it is also an option to actually build your own model using Tensorflow. This blogpost explains a system setup to detect number plates (with a specific number format). All the code is available on Github and the problem seems very related to yours.
From my current understanding, png is relatively easy to decode compared to a compressed format like jpg, and decoders for it are already implemented in pure Python elsewhere. For my own purposes, though, I need the jpg format.
What are good resources for building a jpg library from scratch? At the moment I only wish to support the resizing of images, but this would presumably involve both encoding/decoding ops.
Edit: to make myself more clear: I am hoping that there is a high-level, design-style treatment of how to implement a jpg library in code: specifically, considerations when encoding/decoding, perhaps even pseudocode. Maybe it doesn't exist, but better to ask and stand on the shoulders of giants than reinvent the wheel.
Use PIL; it already has high-level APIs for image handling.
If you say "I don't want to use PIL" (and remember, there are private/unofficial ports to 3.x), then I would say read the Wikipedia article on JPEG, as it describes the basics and links to in-depth articles/descriptions of the JPEG format.
Once you've read over that, pull up the source code for PIL's JPEG handling to see what it does (it is surprisingly simple stuff). The only thing it really imports is Image, the class PIL made to hold the raw image data.
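To make the PIL suggestion concrete for the resizing use case, here is a small in-memory round trip: encode an image as JPEG, decode it back, and resize. This assumes Pillow (the maintained fork of PIL) is installed; the image content is a throwaway solid color:

```python
from io import BytesIO
from PIL import Image

# Create a throwaway image and encode it as JPEG in memory
img = Image.new("RGB", (640, 480), color=(200, 30, 30))
buf = BytesIO()
img.save(buf, format="JPEG", quality=85)

# Decode it back and resize -- the two operations the question needs,
# without writing a codec from scratch
decoded = Image.open(BytesIO(buf.getvalue()))
thumb = decoded.resize((160, 120))
print(decoded.format, thumb.size)  # → JPEG (160, 120)
```

Reading this round trip alongside Pillow's `JpegImagePlugin` source is a gentle way into the encode/decode pipeline before attempting your own.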
Is there a good way to identify (or at least approximate) the graphics program used to obtain a particular image? For instance, I want to know if there is a certain signature that these programs embed into an image. Any suggestions?
If not, is there a reference where I can find what all meta-information can be extracted out of an image?
Certain image file formats do have meta-data. It is format dependent. Digital cameras usually write some of their information into the meta-data. EXIF is what comes to mind. Images not acquired through a digital camera may or may not have relevant meta-data, so you can't consider meta-data of any sort to be a guaranteed reliable identifier. That's about as much as I can give as an answer, alas. I'm sure someone else may have more details.
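If you do go the metadata route, EXIF tags are numeric IDs with standard names, and the `Software` tag (ID 305) is the field an editing program is most likely to write when it writes anything at all. A sketch using Pillow's built-in tag table (`software_tag` is a hypothetical helper; it returns None for files with no EXIF data):

```python
from PIL import ExifTags, Image

# Pillow ships the standard EXIF tag-name table keyed by numeric ID
print(ExifTags.TAGS[271], ExifTags.TAGS[272], ExifTags.TAGS[305])
# → Make Model Software

def software_tag(path):
    """Return the Software EXIF field of an image file, if present."""
    exif = Image.open(path).getexif()
    return exif.get(305)  # 305 = Software
```

Remember the caveat above: many images simply carry no EXIF block, and anything present can be stripped or forged, so treat it as a hint rather than proof.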
I have got some samples showing how to open a presentation and access the slides and shapes, but I want to do some other operations (e.g. generate a thumbnail from a specified slide). What methods can I use? Is there any document describing all the functionality?
Not to discourage you, but my experience using COM from Python is that you won't find many examples.
I would be shocked (but happy to see) if anybody posted a big tutorial or reference using PowerPoint in Python. Probably the best you'll find, which you've probably already found, is this article
However, if you follow along through that article and some of the other Python+COM code around, you start to see the patterns of how VB and C# code converts to Python code using the same interfaces.
Once you understand that, your best source of information is probably the PowerPoint API reference on MSDN.
From looking at the samples Jeremiah pointed to, it looks like you'd start there, then do something like this, assuming you wanted to export slide #42 (in Python the COM call needs parentheses, unlike the VBA form):
Slide = Presentation.Slides(42)
Slide.Export(FileName, "PNG", 1024, 768)
Substitute the full path\filename.ext of the file you want to export for FileName (a string).
For the format, use PNG, JPG, GIF, WMF, EMF, TIF (not always a good idea from PowerPoint), etc. (a string).
The next two numbers are the width and height (in pixels) at which to export the image; each is a VB Long (a signed 32-bit (4-byte) number ranging from -2,147,483,648 to 2,147,483,647).
I've petted pythons but never coded in them; this is my best guess as to the syntax. It shouldn't be too much of a stretch to fix any errors.