Preprocessing seven segment image for Tesseract OCR using OpenCV - python

I'm trying to develop a system which can convert a seven-segment display on an old analog pressure output system to text, such that the data can be processed by LabVIEW. I've been working on image processing to get Tesseract (using v3.02) to recognize the numbers correctly, but have been hitting some roadblocks and don't quite know how to proceed. This is what I've got so far:
Image needs to be a height of between 50-100 pixels for Tesseract to read it correctly. I've found the best results with a height of 50.
Image needs to be cropped such that there is only one line of text.
Image should be in black and white.
Image should be relatively level from left to right.
I've been using the seven-segment training data 'letsgodigital'. This is the code for the image manipulation I've been doing so far:
import cv2
import imutils

ret, i = video.read()                       # grab a frame from the video capture
h, width, channels = i.shape                # get dimensions
g = cv2.cvtColor(i, cv2.COLOR_BGR2GRAY)     # convert to grayscale
histeq = cv2.equalizeHist(g)                # spreads pixel values across the entire spectrum
_, t = cv2.threshold(histeq, 150, 255, cv2.THRESH_BINARY)   # binarize (maxval 255 so the digits are pure white)
cropped = t[int(0.4*h):int(0.6*h), int(0.1*width):int(0.9*width)]   # keep only the line of digits
rotated = imutils.rotate_bound(cropped, angle)              # level the text left to right
resized = imutils.resize(rotated, height=resizing_height)   # Tesseract reads best at ~50 px height
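The resized frame then goes to Tesseract. A minimal sketch of that call using pytesseract is shown below; this is an assumption on my part (pytesseract wrapping the Tesseract binary, with the letsgodigital traineddata installed; Tesseract 3.x uses -psm, 4.x+ uses --psm):

import pytesseract

# Sketch only: 'letsgodigital' must be installed as a Tesseract language.
# psm 7 treats the image as a single line of text.
text = pytesseract.image_to_string(resized, lang='letsgodigital', config='-psm 7')
print(text.strip())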
Some numbers work better than others - for example, '1' seems to have a lot of trouble. The numbers occurring after the '+' or '-' often don't show up, and the '+' often shows up as a '-'. I've played around with the threshold values a bit, too.
The last three steps are there because the video sample I've been drawing from was slightly askew. I could try capturing some better data to work with, and I could also try making my own training data instead of the standard 'letsgodigital' language. I feel like I'm not doing the image processing in the best way, though, and would appreciate some guidance.
I plan to use some degree of edge detection to autocrop to the display, but for now I've just been trying to keep it simple and manually get the results I want. I've uploaded sample images with various degrees of image processing applied at http://imgur.com/a/vnqgP. It's difficult because sometimes I get the exact right answer from tesseract, and other times get nothing. The camera or light levels haven't really changed though, which makes me think it's a problem with my training data. Any suggestions or direction on where I should go would be much appreciated!! Thank you

For reading seven segment digits, normal OCR programs like tesseract don't usually work too well because of the space between individual segments. You should try ssocr, which was made specifically for reading seven segment digits. However, your preprocessing will need to be better as ssocr expects the input to be a single row of seven segment digits.
References - https://www.unix-ag.uni-kl.de/~auerswal/ssocr/
Usage example - http://www.instructables.com/id/Raspberry-Pi-Reading-7-Segment-Displays/
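If the preprocessed frame is written to disk first, ssocr can be driven from Python with a plain subprocess call. A minimal sketch (assuming the ssocr binary is installed and on PATH; 'display.png' is a placeholder for your saved, cropped frame, and thresholding/digit-count options are left to the ssocr manual):

import subprocess

# Hedged sketch: run ssocr on an already cropped, single-row image of the display.
result = subprocess.run(['ssocr', 'display.png'],
                        capture_output=True, text=True, check=True)
print(result.stdout.strip())   # the recognized digits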

Related

Eliminate the background (the common points) of 3 images - OpenCV

Forgive me, but I'm new to OpenCV.
I would like to delete the common background in 3 images, where there is a landscape and a man.
I tried some subtraction codes but I can't solve the problem.
I would like to output each image with only the man and without the landscape.
Are there algorithms in OpenCV that do this (without any manual operation, so no markers or anything similar)?
I tried the Python code from CV - Extract differences between two images, but it doesn't work because in my case I don't have an image with only the background (without the man).
I think a good solution would be to compare all the images and keep those "points" that are the same in at least one other image.
That way I can extrapolate a background (call it "Result.jpg") and finally analyze each image and cut out those portions that are also present in "Result.jpg".
Does that sound like a good idea? Do you have any simpler ideas?
Without semantic segmentation, you can't do that.
All you can compute is where two images differ, and that does not give you the silhouette of the person, only the overlap of two silhouettes. You'll never know the exact outline.
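To see the problem concretely, here is a minimal pairwise-difference sketch (the filenames are placeholders): the resulting mask marks every pixel where the two photos differ, which is the union of the man's two positions, not a single clean silhouette.

import cv2

# Hypothetical filenames for two of the three photos.
img1 = cv2.imread('photo1.jpg', cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread('photo2.jpg', cv2.IMREAD_GRAYSCALE)

diff = cv2.absdiff(img1, img2)                              # per-pixel difference
_, mask = cv2.threshold(diff, 30, 255, cv2.THRESH_BINARY)   # "changed" pixels
cv2.imwrite('changed_pixels.png', mask)                     # two overlapping silhouettes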

Remove differences between two video frames

I'm trying to remove the differences between two frames and keep the non-changing graphics. I would probably repeat the same process with more frames to get more accurate results. My idea is to simplify the frames by removing things I won't need, in order to simplify the rest of the processing I'll do afterwards.
The frames come from the same video, so there is no need to deal with different sizes, orientations, etc. If the same graphic is in another frame but with a different orientation or scale, I would like to remove it as well. For example:
Image 1
Image 2
Result (more or less; I suppose it will be uglier than this, but it should contain similar information)
One problem with this idea is that the source video, even though it consists of computer-generated graphics, is compressed, so it's not that easy to tell whether a change in the tonality of a pixel is actually a change or not.
Ideally I'm not looking at the pixel level, and given the differences in saturation introduced by the compression, that is probably not possible anyway. I'm looking for unchanged "objects" in the image. I want to extract the information layer shown on top of what's happening behind it.
Over the last couple of days I have tried to achieve this in a Python script using OpenCV with all kinds of combinations of absdiff, subtract, threshold, equalizeHist and Canny, but so far I haven't found the right implementation and would appreciate any guidance. How would you achieve it?
Ideally I'm not looking at the pixel level, and given the differences in saturation introduced by the compression, that is probably not possible anyway. I'm looking for unchanged "objects" in the image. I want to extract the information layer shown on top of what's happening behind it.
This will be extremely hard. You would need to employ proper computer vision, and if you're not an expert in that field, you'll have a really hard time.
How about this: forgetting about tooling and libraries, you have two images, i.e. two equally sized sequences of RGB pixels, image A and image B, and an output image R. Allocate R with the same size as A or B.
Run a single loop over every pixel, reading pixel a from A and pixel b from B. Each is a 3-element (RGB) vector. Find the distance between the two vectors, e.g. the magnitude of the vector (b - a); if it is less than some tolerance, write either a or b at the same offset into the result image R. If not, write some default (background) color to R.
You can most likely do this in some hardware-accelerated way using OpenCV or some other library, but it's up to you to find a tool that does what you want.
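A minimal vectorized sketch of that per-pixel tolerance test with NumPy (the frame filenames and the tolerance value are placeholders to tune for your compression level):

import cv2
import numpy as np

# Hypothetical frame files; any two equally sized frames work.
a = cv2.imread('frame_a.png').astype(np.int16)
b = cv2.imread('frame_b.png').astype(np.int16)

tolerance = 20                                    # placeholder threshold
dist = np.linalg.norm(a - b, axis=2)              # per-pixel distance |b - a|
mask = dist < tolerance                           # pixels considered unchanged
r = np.zeros(a.shape, dtype=np.uint8)             # default (background) color: black
r[mask] = a[mask].astype(np.uint8)                # keep the unchanged pixels
cv2.imwrite('result.png', r)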

How to locate and extract coordinates/data/sub-components of charts/map image data?

I'm working on creating a tile server from some raster nautical charts (maps) I've paid for access to, and I'm trying to post-process the raw image data the charts are distributed as, prior to geo-referencing them and slicing them up into tiles.
I've got two sets of tasks and would greatly appreciate any help, or even sample code, on how to get these done in an automated way. I'm no stranger to Python/Jupyter notebooks, but I have zero experience with this kind of image analysis/processing using things like OpenCV or machine learning (or whatever better toolkit/library I'm not yet aware of).
I have some sample images (the originals are PNG but too big to upload, so I encoded them as high-quality JPEGs to follow along / provide sample data). Here's what I'm trying to get done:
Validation of all image data. The first chart (as well as the last four) demonstrates what properly formatted chart images should look like (I manually added a few colored rectangles to the first to highlight different parts of the image for the bonus section below).
Some images will have missing tile data, as in the 2nd sample image. These are ALWAYS chunks of 256x256 image data, so it should be straightforward to identify black boxes of this exact size (see the sketch at the end of this question).
Some images will have corrupt/misplaced tiles, as in the 3rd image (notice the large colorful semi-circle/arcs in the center/upper half of the image: it is slightly duplicated beneath, and if you look along horizontally you can see the image data is shifted, so these tiles have been corrupted somehow).
Extraction of information. Once all image data is verified to be valid (the above steps are ensured), there are a few bits of data I really need pulled out of the image, the most important of which are:
The 4 coordinates (upper left, upper right, lower left, lower right) of the internal chart frame; in the first image they are highlighted with a small pink box at each corner (the other images don't have the boxes, but the corners are located in a similar way). NOTE: because these are geographic coordinates and involve projections, they are NOT always 100% horizontal/vertical relative to each other.
The critical bit is that SOME images contain more than one "chartlet". I really need to obtain the above 4 coordinates for EACH chartlet (some charts have no chartlets, some have two to several, and they are not always simple rectangular shapes). I may be able to supply the number of chartlets as an input if that helps.
If possible, it would also help to extract each chartlet as a separate image (each of these has a single capital letter, A, B, C, in a circle, which would be good to have in the filename).
As a bonus, it would be great to also extract the sections sampled in the first sample image (in the lower left corner); this would probably involve recognizing where/if this appears in the image (it would probably appear only once per file, but I'm not certain) and then extracting it based on its coordinates.
The most important part is inside the green box and represents a pair of tables (the left table is an example and I believe would always be the same, while the right one has a variable number of columns).
The table in the orange box would also be good to get the text from, as it's related.
So would the small overview map in the blue box; it can be left as an image.
I have been looking at tutorials on OpenCV and image recognition, but the content so far has been highly elementary, not to mention an overwhelming, endless list of algorithms for different operations (and I don't know which of them I'd even need), so I'm not sure how it relates to what I'm trying to do. Really, I don't even know where to begin structuring the steps needed to undertake all these tasks, or how each should be broken down further to ease the processing.
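For the missing-tile check referenced above, here is a minimal sketch, not a full solution: it walks the image on a 256x256 grid and flags tiles that are essentially all black. The filename and the darkness threshold are placeholders, and it assumes the missing chunks are aligned to that grid.

import cv2

TILE = 256
img = cv2.imread('chart.png', cv2.IMREAD_GRAYSCALE)    # placeholder filename
h, w = img.shape

missing = []
for y in range(0, h - TILE + 1, TILE):
    for x in range(0, w - TILE + 1, TILE):
        tile = img[y:y + TILE, x:x + TILE]
        if tile.max() < 10:                             # essentially all black
            missing.append((x, y))

print('missing 256x256 tiles at:', missing)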

Extracting text from scanned engineering drawings

I'm trying to extract text from a scanned technical drawing. For confidentiality reasons, I cannot post the actual drawing, but it looks similar to this, only a lot busier, with more text within shapes. The problem is quite complex due to letters touching both each other and their surrounding borders/symbols.
I found an interesting paper that does exactly this called "Detection of Text Regions From Digital Engineering Drawings" by Zhaoyang Lu. It's behind a paywall so you might not be able to access it, but essentially it tries to erase everything that's not text from the image through mainly two steps:
1) Erases linear components, including long and short isolated lines
2) Erases non-text strokes in terms of analysis of connected components of strokes
What kind of OpenCV functions would help in performing these operations? I would rather not write something from the ground up to do these, but I suspect I might have to.
I've tried using a template-based approach to try to isolate the text, but since the text location isn't completely normalized between drawings (even in the same project), it fails in detecting text past the first scanned figure.
I am working on a similar problem. Technical drawings are an issue because OCR software mostly tries to find text baselines, and the drawing artifacts (lines etc.) get in the way of that approach. In the drawing you specified there are not many characters touching each other, so I suggest breaking the image into contiguous (black) pixel regions and then scanning those individually. The height of the contiguous areas should also give you an indication of whether an area is text or a piece of the drawing. To break the image into contiguous regions, use a flood fill algorithm; for the scanning, Tesseract does a good job.
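A minimal sketch of that idea, using OpenCV's connected-components analysis in place of a hand-rolled flood fill (the filename, height range and Tesseract page-segmentation mode are placeholders to tune):

import cv2
import pytesseract

img = cv2.imread('drawing.png', cv2.IMREAD_GRAYSCALE)            # placeholder filename
_, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# Label contiguous ink regions and keep only the text-sized ones.
n, labels, stats, _ = cv2.connectedComponentsWithStats(binary)
for i in range(1, n):                                             # label 0 is the background
    x, y, w, h, area = stats[i]
    if 8 < h < 40:                                                # plausible character height, in pixels
        roi = img[y:y + h, x:x + w]
        text = pytesseract.image_to_string(roi, config='--psm 10')   # treat as a single character
        print((x, y), text.strip())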
Obviously I've never attempted this specific task; however, if the image really looks like the one you showed, I would start by removing all vertical and horizontal lines. This could be done pretty easily: just set a width threshold, and for all pixels with intensity beyond some value N, look at that threshold's worth of pixels perpendicular to the hypothetical line orientation; if it looks like a line, erase it.
More elegant, and perhaps better, would be to do a Hough transform for lines and circles and remove those elements that way.
Also you could maybe try some FFT based filtering, but I'm not so sure about that.
I've never used OpenCV, but I would guess it can do the things I mentioned.
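As an illustration of the Hough-based suggestion, here is a minimal sketch (not the answerer's code; the filename and the Hough parameters are placeholders) that detects long straight segments and paints them out:

import cv2
import numpy as np

img = cv2.imread('drawing.png', cv2.IMREAD_GRAYSCALE)     # placeholder filename
edges = cv2.Canny(img, 50, 150)

# Detect long straight segments; tune minLineLength/maxLineGap for your drawings.
lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=100,
                        minLineLength=100, maxLineGap=5)

cleaned = img.copy()
if lines is not None:
    for x1, y1, x2, y2 in lines[:, 0]:
        cv2.line(cleaned, (x1, y1), (x2, y2), 255, 3)      # erase by painting over in white
cv2.imwrite('no_lines.png', cleaned)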

Align text for OCR

I am creating a database from historical records which I have as photographed pages from books (+100K pages). I wrote some python code to do some image processing before I OCR each page. Since the data in these books does not come in well formatted tables, I need to segment each page into rows and columns and then OCR each piece separately.
One of the critical steps is to align the text in the image.
For example, this is a typical page that needs to be aligned:
A solution I found is to smudge the text horizontally (I'm using skimage.ndimage.morphology.binary_dilation) and find the rotation that maximizes the sum of white pixels along the horizontal dimension.
This works fine, but it takes about 8 seconds per page, which given the volume of pages I am working with, is way too much.
Do you know of a better, faster way of accomplishing aligning the text?
Update:
I use scikit-image for image processing functions, and scipy to maximize the count of white pixels along the horizontal axis.
Here is a link to an html view of the Jupyter notebook I used to work on this. The code uses some functions from a module I've written for this project so it cannot be run on its own.
Link to notebook (dropbox): https://db.tt/Mls9Tk8s
Update 2:
Here is a link to the original raw image (dropbox): https://db.tt/1t9kAt0z
Preface: I haven't done much image processing with Python. I can give you an image processing suggestion, but you'll have to implement it in Python yourself. All you need is an FFT and a polar transformation (I think OpenCV has a built-in function for that), so that should be straightforward.
You have only posted one sample image, so I don't know if this works as well for other images, but for this image, a Fourier transform can be very useful: Simply pad the image to a nice power of two (e.g. 2048x2048) and you get a Fourier spectrum like this:
I've posted an intuitive explanation of the Fourier transform here, but in short: your image can be represented as a series of sine/cosine waves, and most of those "waves" are parallel or perpendicular to the document orientation. That's why you see a strong frequency response at roughly 0°, 90°, 180° and 270°. To measure the exact angle, you could take a polar transform of the Fourier spectrum:
and simply take the columnwise mean:
The peak position in that diagram is at 90.835°, and if I rotate the image by -90.835 modulo 90, the orientation looks decent:
Like I said, I don't have more test images, but it works for rotated versions of your image. At the very least it should narrow down the search space for a more expensive search method.
Note 1: The FFT is fast, but it obviously takes more time for larger images. And sadly the best way to get a better angle resolution is to use a larger input image (i.e. with more white padding around the source image.)
Note 2: the FFT actually returns an image where the "DC" (the center in the spectrum image above) is at the origin 0/0. But the rotation property is clearer if you shift it to the center, and it makes the polar transform easier, so I just showed the shifted version.
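A rough sketch of that pipeline, not the answerer's actual code: it assumes OpenCV 3.4 or newer for cv2.warpPolar, uses a placeholder padding size, and only resolves the angle to whole degrees (enlarge the 360-row polar image for finer resolution):

import cv2
import numpy as np

def estimate_skew(gray, size=2048):
    # Pad the page onto a white, power-of-two canvas so the FFT is fast.
    canvas = np.full((size, size), 255, dtype=np.uint8)
    h, w = gray.shape
    canvas[:min(h, size), :min(w, size)] = gray[:size, :size]

    # Centered log-magnitude Fourier spectrum.
    spectrum = np.fft.fftshift(np.fft.fft2(canvas.astype(np.float32)))
    magnitude = np.log1p(np.abs(spectrum)).astype(np.float32)

    # Polar transform around the spectrum centre: one row per degree.
    polar = cv2.warpPolar(magnitude, (size // 2, 360),
                          (size / 2, size / 2), size / 2, cv2.WARP_POLAR_LINEAR)
    angle = int(np.argmax(polar.mean(axis=1)))    # strongest direction, in degrees
    return angle % 90                             # skew relative to the nearest axis

# Usage: rotate the page by -estimate_skew(page) to deskew it.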
This is not a full solution but there is more than a comment's worth of thoughts.
You have a margin on the left and right and top and bottom of your image. If you remove that, and even cut into the text in the process, you will still have enough information to align the image. So, if you chop, say 15%, off the top, bottom, left and right, you will have reduced your image area by 50% already - which will speed things up down the line.
Now take your remaining central area, and divide that into, say 10 strips all of the same height but the full width of the page. Now calculate the mean brightness of those strips and take the 1-4 darkest as they contain the most (black) lettering. Now work on each of those in parallel, or just the darkest. You are now processing just the most interesting 5-20% of the page.
Here is the command to do that in ImageMagick - it's just my weapon of choice and you can do it just as well in Python.
convert scan.jpg -crop 300x433+64+92 -crop x10# -format "%[fx:mean]\n" info:
0.899779
0.894842
0.967889
0.919405
0.912941
0.89933
0.883133 <--- choose 4th last because it is darkest
0.889992
0.88894
0.888865
If I make separate images out of those 10 stripes, I get this
convert scan.jpg -crop 300x433+64+92 -crop x10# m-.jpg
and effectively, I do the alignment on the fourth last image rather than the whole image.
Maybe unscientific, but quite effective and pretty easy to try out.
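Since the answer notes the same thing can be done in Python, here is a minimal NumPy sketch of the crop-and-pick-the-darkest-strip idea (the filename and the 15% margin are placeholders):

import cv2
import numpy as np

gray = cv2.imread('scan.jpg', cv2.IMREAD_GRAYSCALE)
h, w = gray.shape

# Chop 15% off every side, then split the remainder into 10 horizontal strips.
core = gray[int(0.15 * h):int(0.85 * h), int(0.15 * w):int(0.85 * w)]
strips = np.array_split(core, 10, axis=0)

# Mean brightness per strip; the darkest strips carry the most lettering.
means = [s.mean() for s in strips]
darkest = strips[int(np.argmin(means))]
cv2.imwrite('darkest_strip.png', darkest)    # align on this strip instead of the whole page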
Another thought, once you have your procedure/script sorted out for straightening a single image, do not forget you can often get massive speedup by using GNU Parallel to harass all your CPU's lovely, expensive cores simultaneously. Here I specify 8 processes to run in parallel...
#!/bin/bash
# Emit one "ProcessPage N" command per page; GNU Parallel runs them 8 at a time.
for ((i=0;i<100000;i++)); do
   echo "ProcessPage $i"
done | parallel --eta -j 8
"align the text in the image" I suppose means to deskew the image so that text lines have the same baseline.
I thoroughly enjoyed reading scientific answers to this quite overengineered task. Answers are great, but is it really necessary to spend so much time (very precious resource) to implement this? There is an abundance of tools available for this function without needing to write a single line of code (unless OP is a CS student and wants to practice the science, but obviously OP is doing this out of necessity to get all images processed). These methods took me back to my college years, but today I would use different tools to process this batch quickly and efficiently, which I do daily. I work for a high-volume document conversion and data extraction service bureau and OCR consulting company.
Here is the result of a basic open-and-deskew step in the ABBYY FineReader commercial desktop OCR package. Deskewing was more than sufficient for further OCR processing.
And I did not need to recreate and program my own browser just to post this answer.
