I am trying different image alignment approaches to align images containing text using computer vision. I have tested the following approaches:
Probabilistic Hough Lines Transform to align images according to the detected lines. https://medium.com/p/97b61eeffb20 is my implementation, but it didn't help as much as I expected.
Implemented SIFT and ORB to detect keypoints and align images against a template image, but instead of aligning all images, it sometimes distorts them. I used https://pyimagesearch.com/2020/08/31/image-alignment-and-registration-with-opencv/ as a reference.
Edge detection followed by contour detection, corner detection, and perspective transformation. But it doesn't work on images with different background types. This is the reference example: https://pyimagesearch.com/2014/09/01/build-kick-ass-mobile-document-scanner-just-5-minutes/
Morphology followed by contour detection and masking. Reference: "Crop exactly document paper from image".
Trained a YOLO (You Only Look Once) object detector to detect the documents, but it only returns an axis-aligned bounding box; my requirement is a quadrilateral with the four document corners, from which I can align the document using a perspective transform.
Calculating the skew angle and deskewing. Reference: https://github.com/sbrunner/deskew
But I couldn't perfectly align document images (identity documents such as citizenship cards, passports, licenses, etc.) with different backgrounds using the above approaches.
This is a sample test image (important information is hidden for privacy reasons).
Are there any other image alignment approaches that can align document images perfectly by correcting the skew of the text? My main focus is to extract the information from the document using OCR while preserving the sequence of the information in the document image.
Thank you!
To me, the third approach seems the most promising. But as you said, a cluttered background is a problem. Two ideas come to mind:
Implementing a GUI as a fallback solution, so the user could select the contour.
Render an artificial dataset of official documents against cluttered backgrounds and train a CNN to predict a segmentation map of the document. This map could then be used as an initialization for the edge detection / contour detection. This answer contains two links to databases of images of official documents; maybe these are of some use to you.
Related
Is there any image segmentation model that segments images without requiring labels for the segmented parts (or bounding boxes)? I need to segment my images even for objects the model wasn't trained on, so I guess I should use a model that does not need specific labels for segmenting.
You seem to be looking for unsupervised image segmentation. The following repositories include some potential solutions:
https://github.com/kanezaki/pytorch-unsupervised-segmentation
https://github.com/Mirsadeghi/Awesome-Unsupervised-Segmentation
Imagine a factory warehouse: there are different-sized boxes loaded with products, and I want to measure these boxes with a camera. There is no controlled background; the background is the natural factory warehouse. I have code for measuring, but this code measures everything. I want to measure only the boxes.
I have code for measuring objects, but how do I detect only cardboard boxes with OpenCV?
Should I detect them with a color filter or with YOLO?
Also, the user may want to measure other objects instead of cardboard boxes, like industrial machines, etc. Maybe I need a more general solution...
The camera faces the boxes straight on, seeing their width and height (180 degrees).
As you can see, the code measures everything, but I want only cardboard boxes. I have tried filtering colors with Hue, Saturation, Value, but it didn't work because I'm using an ArUco perimeter. The ArUco perimeter is black and white, so when I filtered out the other colors, the ArUco perimeter was lost too. And there may also be differently colored boxes.
You can try detecting rectangular or quadrilateral contours in a copy of the B/W frame and then correlating the contours back to the original (colored) frame. You can then apply color filtering on them to detect the cardboard box. The advantage of this over deep learning is that DL might take more processing power.
Did you use any deep learning (DL) methods for cardboard-box detection? If not, I recommend a YOLOv5-based DL method, or a classic machine learning method such as HOG with an SVM. The advantage of DL methods is that you only need to label the boxes and pass the data (images and annotations) to the model, without worrying about any other objects.
I tagged the cells using the Labelme software (you can tag any object with it), and then trained a YOLACT model with the images and annotations. Figure 1 shows the results the model predicted on a new image.
Say I have an image of a book page (something similar to what is pictured below) and want to generate a bounding box for the central image (outlined in green). How might I do this with Python? I've tried the usual edge detection route but found it too slow, and it picks up too many edges within the actual image of interest. Meanwhile, libraries like detecto attempt to look for objects within the images rather than just detect some rectangular image. I have about 100 of these that I'd like to process and generate bounding boxes for.
100 is too few for me to want to train any kind of AI model, but too many to do manually. Any thoughts on an approach?
I have a database of original images, and for each original image there are various cropped versions.
This is an example of what the images look like:
Original
Horizontal Crop
Square Crop
This is a very simple example, but most images are like this; some might take a smaller section of the original image than others.
I was looking at OpenCV in Python, but I'm very new to this kind of image processing.
The idea is to save the cropping information separately from the image to save space, and then generate all the crops and different aspect ratios on the fly with a cache system instead.
The method you are looking for is called "template matching". You can find examples here:
https://docs.opencv.org/trunk/d4/dc6/tutorial_py_template_matching.html
For your problem, given the large images, it might be a good idea to constrain the search space by resizing both images by the same factor: search for an approximate position at the coarser scale first, then constrain the full-resolution search to a small region around that point.
I am working on an application where I need a feature like CamScanner, where a document is to be detected in an image. For that I am using Canny edge detection followed by a Hough transform.
The results look promising but the text in the document is creating issues as explained via images below:
Original Image
After canny edge detection
After hough transform
My issue lies in the third image: the text near the bottom of the original image has forced the Hough transform to detect a horizontal line (2nd cluster from the bottom).
I know I can take the largest quadrilateral, and that would work fine in most cases, but I still want to know other ways in which I can ignore the effect of text on the edges during this processing.
Any help would be appreciated.
I solved the issue of text with the help of a median filter of size 15 (square) on a 500x700 image.
The median filter doesn't affect the boundaries of the paper, but it can eliminate the text completely.
Using that I was able to get much more effective boundaries.
Another approach you could try is to use thresholding to find the paper boundaries. This would create a binary image. You can then examine the blobs of white pixels and see if any are large enough to be the paper and have the right dimensions. If it fits the criteria, you can find the min/max points of this blob to represent the paper.
There are several ways to do the thresholding, including iterative, Otsu, and adaptive.
Also, for best results you may have to dilate the binary image to close the black lines in the table as shown in your example.