Get Image Bounding Box from an Image of a Page with Python

Say I have an image of a book page (similar to the one pictured below) and want to generate a bounding box for the central image (outlined in green). How might I do this with Python? I've tried the usual edge-detection route, but found it too slow, and it picks up too many edges within the image of interest. Meanwhile, libraries like detecto try to look for objects within the images rather than just detect a rectangular picture. I have about 100 of these that I'd like to process and generate bounding boxes for.
100 is too few for me to want to train any kind of AI model, but too many to just do manually. Any thoughts on an approach?

Related

Text Documents Image Alignment

I am trying different image-alignment approaches to align images containing text using computer vision. I have tested the following approaches:
1. Probabilistic Hough Line Transform to align images according to the detected lines. https://medium.com/p/97b61eeffb20 is my implementation, but it didn't help as expected.
2. SIFT and ORB to detect features and align images to a template image, but instead of aligning all images, it sometimes distorts them. I used https://pyimagesearch.com/2020/08/31/image-alignment-and-registration-with-opencv/ as a reference.
3. Edge detection followed by contour detection, corner detection, and perspective transformation. But it doesn't work with images that have different background types. Reference example: https://pyimagesearch.com/2014/09/01/build-kick-ass-mobile-document-scanner-just-5-minutes/
4. Morphology followed by contour detection and masking. Reference: Crop exactly document paper from image
5. Trained a YOLO (You Only Look Once) object detector to detect the documents, but it only returns an axis-aligned bounding box; my requirement is a quadrilateral with the four document corners, from which I can align the document using a perspective transform.
6. Calculated the skew and deskewed. Reference: https://github.com/sbrunner/deskew
But I couldn't perfectly align document images (identity documents such as citizenship cards, passports, licenses, etc.) with different backgrounds using any of the above approaches.
This is a sample test image (important information is hidden for privacy).
Are there any other image-alignment approaches that can align document images reliably by correcting the skew of the text? My main focus is to extract information from the document using OCR while preserving the sequence of information in the document image.
Thank you!
To me, the third approach seems to be the most promising. But as you said, a cluttered background is a problem. Two ideas came to me about this:
Implementing a GUI as a fallback solution, so the user could select the contour.
Render an artificial dataset of official documents against cluttered backgrounds and train a CNN to predict a segmentation map of the document. This map could then be used as an initialization for the edge detection / contour detection. This answer contains two links to databases of images of official documents. Maybe these are of some use to you.

How to detect only specific objects like cardboard boxes?

Imagine a factory warehouse with boxes of different sizes loaded with products. I want to measure these boxes with a camera. There is no controlled background; the background is the natural factory warehouse. I have code for measuring, but it measures everything. I want to measure only the boxes.
I have code for measuring objects, but how do I detect only cardboard boxes with OpenCV?
Should I detect them with a color filter or with YOLO?
The user might also measure objects other than cardboard boxes, such as industrial machines, so maybe I need a more general solution.
The camera faces the objects head-on, measuring width and height (180 degrees).
As you can see, the code measures everything, but I want only cardboard boxes. I have tried filtering colors with hue, saturation, and value, but it didn't work because I'm using an ArUco marker for the measurement perimeter. The ArUco marker is black and white, so when I filtered out the other colors, the marker was lost too. And there may also be boxes of different colors.
You can try detecting rectangular or quadrilateral contours in a black-and-white copy of the frame and then correlate those contours back to the original (color) frame. You can then apply color filtering on the contour interiors to identify the cardboard boxes. The advantage of this over deep learning is that DL might take more processing power.
Did you use any deep-learning (DL) methods for cardboard-box detection? If not, I recommend a YOLOv5-based DL method, or a classical machine-learning method such as HOG with an SVM. The advantage of DL methods is that you only need to label the boxes and pass the data (images and annotations) to the model, without worrying about whatever other objects appear.
I tagged the cells using the Labelme software (you can tag any object with it), and then trained a YOLACT model with the images and annotations. Figure 1 shows the results of the model's prediction on a new image.

Find Coordinates of cropped image (JPG) from it's original

I have a database of original images, and for each original image there are various cropped versions.
This is an example of how the image look like:
Original
Horizontal Crop
Square Crop
This is a very simple example; most images are like this, but some may take a smaller section of the original image than others.
I was looking at OpenCV in Python, but I'm very new to this kind of image processing.
The idea is to save the cropping information separately from the image (to save space) and then generate all the crops and aspect ratios on the fly with a caching system.
The method you are looking for is called "template matching". You can find examples here:
https://docs.opencv.org/trunk/d4/dc6/tutorial_py_template_matching.html
For your problem, given the large images, it might be a good idea to constrain the search space by first resizing both images by the same factor: that search finds a position that isn't as precise, but it lets you restrict the actual full-resolution search to a smaller region around that point.

How to automate grabcut algorithm in opencv python?

I have used the interactive grabcut.py available at opencv.
Visit: https://github.com/opencv/opencv/blob/master/samples/python/grabcut.py
I successfully segmented one image by drawing a rectangle with the mouse and then running the segmentation.
But I want to apply the same segmentation (i.e., I want the program to reuse the same rectangle values) to a set of 10 images, instead of drawing the rectangle individually on each of the 10 images.
Can anyone please help me?

Homography of soccer field

Okay, so I am trying to find the homography of a soccer match. What I have so far:
1. Read images from a folder, which are basically many cropped images of a template soccer field: the center circle, penalty lines, etc.
2. Read a video stream from a file and crop it into many smaller segments.
3. Loop over the images in the video stream, and inside that, another loop over the images read from the folder.
4. For each pair of images obtained through the iteration, apply a green filter, on my assumption that the field is green.
5. Use ORB to find keypoints and then find matches.
Now the problem is that, because of players and some noise from the crowd, I am unable to find proper matches for the homography. Removing them is also a problem, because that tends to hide the soccer-field lines that I need to calculate the homography from.
Any suggestions are greatly appreciated. Below are some sample code and images that I am using.
"Code being used"
Sample images
Output that i am getting
The image on the right of the output is a frame from the video; the one on the left is the same sample image I uploaded, after the filterGreen function, as can be seen in the code.
Finally, what I want is for the image to map properly to the center circle so I can draw a cube in the center, somewhat similar to "This example". Thanks in advance for helping me out.
An interesting technique to throw at this problem is RASL. It computes homographies that align stacks of related images. It does not require that you specify corresponding points on the images, but operates directly on the image pixels. It is robust against image occlusions (e.g., players moving in the foreground).
I've just released a Python implementation here: https://github.com/welch/rasl
(there are also links there to the original RASL paper, MATLAB implementation, and data).
I am unsure if you'd want to crop the input images to that center circle, or if the entire frames can be aligned. Try both and see.
