Imagine a factory warehouse where there are different-sized boxes loaded with products. I want to measure these boxes with a camera. There is no controlled background; the background is the natural factory warehouse. I have code for measuring, but this code measures everything. I want to measure only the boxes.
I have code for measuring objects, but how do I detect only cardboard boxes with OpenCV?
Should I detect them with a color filter or with YOLO?
Also, the user may want to measure other objects besides cardboard boxes, such as industrial machines, so maybe I need a more general solution...
The camera is facing the width and height of the boxes (180 degrees).
As you can see, the code measures everything, but I want only the cardboard boxes. I have tried filtering colors with Hue, Saturation, Value, but it didn't work because I'm using an ArUco marker for the perimeter reference, which is black and white. When I filter out the other colors, the ArUco marker is lost too. And there may also be boxes of different colors.
You can try detecting rectangular or quadrilateral contours in a copy of the black-and-white frame and then map those contours back to the original (colored) frame. You can then apply color filtering inside each contour to decide whether it is a cardboard box. The advantage of this over deep learning (DL) is that DL may take more processing power. A rough sketch of the idea is below.
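A minimal sketch of that idea, assuming a BGR frame named `frame`; the quadrilateral test, the minimum area and the HSV "cardboard brown" band are rough placeholder values that would need tuning for a real warehouse scene:

```python
import cv2
import numpy as np

def find_box_candidates(frame):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(cv2.GaussianBlur(gray, (5, 5), 0), 50, 150)
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    boxes = []
    for cnt in contours:
        if cv2.contourArea(cnt) < 1000:                     # ignore small noise
            continue
        approx = cv2.approxPolyDP(cnt, 0.02 * cv2.arcLength(cnt, True), True)
        if len(approx) != 4:                                # keep quadrilaterals only
            continue
        # Correlate the contour with the original coloured frame:
        # check the mean colour inside the contour region.
        mask = np.zeros(gray.shape, dtype=np.uint8)
        cv2.drawContours(mask, [cnt], -1, 255, -1)
        h, s, _ = cv2.mean(hsv, mask=mask)[:3]
        if 10 <= h <= 30 and s >= 40:                       # rough "cardboard brown" band
            boxes.append(approx)
    return boxes
```

Because the colour check is applied per contour rather than to the whole frame, the black-and-white ArUco marker can still be detected separately for the pixel-to-millimetre ratio.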
Did you use any deep learning (DL) methods for cardboard box detection? If not, I recommend a DL method based on YOLOv5, or a classical machine learning approach such as HOG with an SVM. The advantage of DL methods is that you only need to label the boxes and pass the data (images and annotations) to the model, without worrying about any other objects.
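For example, here is a hedged sketch of inference with a custom-trained YOLOv5 model via torch.hub; the weights file `boxes_best.pt` is hypothetical and would be the output of your own training run on labelled box images:

```python
import cv2
import torch

# Load custom weights trained on your annotated box images (hypothetical file name)
model = torch.hub.load('ultralytics/yolov5', 'custom', path='boxes_best.pt')

frame = cv2.imread('warehouse.jpg')
results = model(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))   # YOLOv5 expects RGB
detections = results.xyxy[0].cpu().numpy()                # rows: x1, y1, x2, y2, conf, class

for x1, y1, x2, y2, conf, cls in detections:
    if conf < 0.5:                                         # confidence cut-off, tune as needed
        continue
    cv2.rectangle(frame, (int(x1), int(y1)), (int(x2), int(y2)), (0, 255, 0), 2)
```

Each detected box region could then be handed to your existing ArUco-based measurement code.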
In my case, I tagged cells using the Labelme software (you can tag any object with it), and then trained a YOLACT model with the images and annotations. Figure 1 shows the model's predictions on a new image.
I am trying different image alignment approaches to align images containing text using computer vision. I have tested the following approaches:
Probabilistic Hough Lines Transform to align images according to the detected lines. https://medium.com/p/97b61eeffb20 is my implementation, but it didn't help as expected.
SIFT and ORB feature matching to detect and align images against a template image, but instead of aligning all images, it sometimes distorts them. I used https://pyimagesearch.com/2020/08/31/image-alignment-and-registration-with-opencv/ as a reference.
Edge detection followed by contour detection, corner detection and perspective transformation, but it doesn't work on images with different background types. This is the reference example: https://pyimagesearch.com/2014/09/01/build-kick-ass-mobile-document-scanner-just-5-minutes/
Morphology followed by contour detection and masking. Reference: "Crop exactly document paper from image".
Trained a YOLO (You Only Look Once) object detector to detect the documents, but it only returns an axis-aligned bounding box; my requirement is a quadrilateral with the four document corners, from which I can align the document using a perspective transform.
Calculating the skew angle and deskewing. Reference: https://github.com/sbrunner/deskew
But I couldn't perfectly align the document images (identity documents such as citizenship cards, passports, licenses, etc.) with different backgrounds using any of the above approaches.
This is a sample test image (important information is hidden due to privacy concerns).
Are there any other image alignment approaches that can align the document images correctly by fixing the skew of the text? My main focus is to extract the information from the document using OCR while preserving the sequence of information in the document image.
Thank you!
To me, the third approach seems to be the most promising. But as you said, a cluttered background is a problem. Two ideas came to me about this:
Implementing a GUI as a fallback solution, so the user could select the contour.
Render an artificial dataset of official documents against cluttered backgrounds and train a CNN to predict a segmentation map of the document. This map could then be used as an initialization for the edge detection / contour detection. This answer contains two links to databases of images of official documents; maybe these are of some use to you. A sketch of how the predicted mask could feed into the rectification step is below.
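Here is a minimal sketch of how such a segmentation map could drive the contour detection and perspective transform. `doc_mask` is assumed to be a binary mask (document pixels = 255) produced by the CNN, and the output size is a placeholder:

```python
import cv2
import numpy as np

def rectify_document(image, doc_mask, out_w=800, out_h=500):
    contours, _ = cv2.findContours(doc_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    cnt = max(contours, key=cv2.contourArea)
    approx = cv2.approxPolyDP(cnt, 0.02 * cv2.arcLength(cnt, True), True)
    if len(approx) != 4:
        return None  # fall back to the GUI contour selection (idea 1)
    corners = approx.reshape(4, 2).astype(np.float32)

    # Order corners: top-left, top-right, bottom-right, bottom-left
    s = corners.sum(axis=1)
    d = np.diff(corners, axis=1).ravel()
    ordered = np.array([corners[np.argmin(s)], corners[np.argmin(d)],
                        corners[np.argmax(s)], corners[np.argmax(d)]], dtype=np.float32)

    dst = np.array([[0, 0], [out_w - 1, 0], [out_w - 1, out_h - 1], [0, out_h - 1]],
                   dtype=np.float32)
    M = cv2.getPerspectiveTransform(ordered, dst)
    return cv2.warpPerspective(image, M, (out_w, out_h))
```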
I am new to Python and OpenCV. I am analysing images of clouds, and I need to remove the buildings so that the subsequent analysis has less noise. I tried using Canny edge detection and then filling in the contours, but did not get far. I also tried thresholding by pixel colour, but I cannot reliably exclude just the buildings without also excluding parts of the image containing clouds.
Is there a way I can efficiently and accurately remove the buildings and keep all of the clouds/sky? Thanks for the tips in advance.
You could use a computer vision model that finds the buildings. There may be some open-source ones out there; the only one I can think of at the moment is this semantic segmentation model. The repository has details on how to run it, but there are certainly others.
https://github.com/CSAILVision/semantic-segmentation-pytorch
I think one of its classes is "building", so you could run the model, get the pixels belonging to buildings, and mask them out. A hedged sketch of that masking step is below.
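This sketch assumes the segmentation model has already produced a per-pixel class map `seg_map`; `BUILDING_ID` is a placeholder, so check the class list of whichever model you end up using:

```python
import cv2
import numpy as np

BUILDING_ID = 1  # placeholder index for the "building" class

def remove_buildings(image, seg_map, building_id=BUILDING_ID):
    mask = (seg_map == building_id).astype(np.uint8) * 255
    # Dilate slightly so thin building edges are removed as well
    mask = cv2.dilate(mask, np.ones((7, 7), np.uint8))
    cleaned = image.copy()
    cleaned[mask > 0] = 0   # zero out building pixels; exclude them from later statistics
    return cleaned, mask
```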
I'm looking for ideas to help improve my current approach for real-time object detection using computer vision (specifically the opencv library). My goal is to accurately detect a golf-ball through image processing in a large variety of environments/lighting conditions. My detection process works quite well probably 80% of the time, but I'm hitting edge cases that cause failures that I can't ignore. The edge case I'm focusing on right now is extreme shadows being cast by the golf-ball. Here is a pair of example images. The coloured image is my source, and the black and white image is my post-processed result.
There are a few important variables to consider with my application:
Source coming from a video feed, and being processed in real-time
It can be windy, so camera shake can be an issue
The camera isn't guaranteed to be high quality, so I need to account for extra noise and limited resolution
I won't go into full detail about the processing I'm doing to detect moving objects (Kalman filter, background subtraction, ...), as in this specific example I'm failing to detect a stationary object (i.e. a ball that has come to rest). The current pipeline is:
Grab initial frame before any balls are in-frame as my base frame (this will be used for background subtraction)
convert image to greyscale
apply a median blur to eliminate noise, which can otherwise be pretty extreme due to a combination of camera shake, poor camera quality
apply an adaptive threshold to the image. I'm using ADAPTIVE_THRESH_GAUSSIAN_C and have been tuning the block size and C constant values as best I can
apply background subtraction (I'm using the built-in CNT Subtractor)
Apply a small dilation kernel to the entire image to try and increase the size of the contours that are left after the above processing, as they can sometimes become quite small after the blur filter for example
use opencv's "findContours" with RETR_TREE, and CHAIN_APPROX_SIMPLE parameters
walk the contour hierarchy, looking for "filled in" contours. The idea being that the golf-balls should mostly be completely filled in, compared to other objects which will have an outline, and I can use the hierarchy to determine which contours are filled in or not (ie do they have child contours)
for each filled in contour, do an enclosing circle. Compare area of enclosing circle to contour area, filter by an acceptable difference to determine how circular the object is
another pass, filtering by min/max area, since I can assume the camera will always be at a similar height, to "home in" on the object
As you can see from the images above, this approach runs into problems when the ball itself has a lot of contrast due to shadows. To me it looks like the adaptive threshold pass is filtering out the darker part of the ball (due to shadow), which creates a non-circular shape. Perhaps I need to dial in the adaptive threshold pass to allow for a bit more contrast, since we can assume shadows are always on the dark side and a ground shadow should be darker than the shaded part of the ball? I'd also like to completely eliminate the leftover outline of the ground shadow if possible. My guess is that the edges of the shadow being slightly lighter is the reason they don't get filtered out by my adaptive threshold pass. Open to any and all suggestions :-)
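For reference, here is a condensed sketch of the pipeline as described above; the blur, threshold, kernel and area values are illustrative placeholders rather than my tuned settings, and it assumes OpenCV 4 with the contrib modules (for the CNT subtractor):

```python
import cv2
import numpy as np

base_frame = cv2.imread('empty_scene.jpg')      # frame grabbed before any balls appear
subtractor = cv2.bgsegm.createBackgroundSubtractorCNT()
subtractor.apply(cv2.cvtColor(base_frame, cv2.COLOR_BGR2GRAY))

def detect_balls(frame, min_area=50, max_area=2000, min_circularity=0.75):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    blurred = cv2.medianBlur(gray, 5)
    thresh = cv2.adaptiveThreshold(blurred, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                   cv2.THRESH_BINARY_INV, 11, 2)
    fg = subtractor.apply(blurred)
    combined = cv2.dilate(cv2.bitwise_and(thresh, fg), np.ones((3, 3), np.uint8))

    contours, hierarchy = cv2.findContours(combined, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
    balls = []
    for i, cnt in enumerate(contours):
        if hierarchy[0][i][2] != -1:            # has child contours -> outline, not filled in
            continue
        area = cv2.contourArea(cnt)
        if not (min_area <= area <= max_area):
            continue
        (x, y), r = cv2.minEnclosingCircle(cnt)
        circle_area = np.pi * r * r
        if circle_area > 0 and area / circle_area >= min_circularity:
            balls.append((int(x), int(y), int(r)))
    return balls
```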
I am trying to obtain the relative depth of the pixels in an image, for example the image in https://www.awn.com/news/nvidia-unveils-quadro-rtx-worlds-first-ray-tracing-gpu . I don't need the precise distance of each pixel, which I believe would be impossible, but I would like to get something like "the green ball is further away than the other balls". Is this possible using OpenCV in Python? The code I wrote can identify each ball, but not their relative distance or depth, so it is pretty much useless for my purposes.
That's an ill-posed problem (you cannot measure depth with a single RGB camera) and a topic of recent research. I found this survey paper. Most often, a depth image is learned from an RGB image using convolutional neural networks.
However, if you use a lot of prior information about your scene (all objects in the image are circular, and the partially visible circles are the ones in the background), then you might be able to do something with heuristic methods like thresholding, edge detection or Hough transforms, but it won't be easy.
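As a rough illustration of the heuristic route, the sketch below detects circles with a Hough transform and ranks them by apparent radius. The extra assumption that all balls have the same physical size (so a smaller radius means a larger distance) is mine, not something the image guarantees, and the Hough parameters are placeholders:

```python
import cv2

img = cv2.imread('balls.jpg')
gray = cv2.medianBlur(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY), 5)

circles = cv2.HoughCircles(gray, cv2.HOUGH_GRADIENT, dp=1.2, minDist=30,
                           param1=100, param2=40, minRadius=10, maxRadius=120)
if circles is not None:
    # Larger apparent radius -> closer to the camera (under the equal-size assumption)
    ordered = sorted(circles[0], key=lambda c: c[2], reverse=True)
    for rank, (x, y, r) in enumerate(ordered):
        print(f"ball at ({x:.0f}, {y:.0f}), radius {r:.0f}, depth rank {rank} (0 = nearest)")
```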
There are several packages and methods for segmentation in Python. However, if I know a priori that certain pixels (and no others) correspond to a particular object, how can I use that to segment other objects?
Which methods implemented in Python would lend themselves to this approach?
Thanks.
You'll want to take a look at semi-automated image segmentation. Image segmentation from a semi-automated perspective means that you know beforehand what class certain pixels belong to, either foreground or background. Given this a priori information, the goal is to minimize an energy function that best segments the rest of the pixels into foreground and background.
The best two methods that I know of are Graph Cuts and Random Walks. If you want to study the fundamentals of both of them, you should read the canonical papers by Boykov (Graph Cuts) and Grady (Random Walks) respectively:
Graph Cuts - Boykov: http://www.csd.uwo.ca/~yuri/Papers/ijcv06.pdf
Random Walks - Grady: http://webdocs.cs.ualberta.ca/~nray1/CMPUT615/MRF/grady2006random.pdf
For Graph Cuts, OpenCV uses the GrabCut algorithm, which is an extension of the original Graph Cuts algorithm: http://en.wikipedia.org/wiki/GrabCut. Essentially, you draw a box around the object you want segmented; Gaussian Mixture Models are used to model the foreground and background, and the object is segmented from the background inside this box. Additionally, you can add foreground and background markers inside the box to further constrain the solution and ensure a good result.
Take a look at this official OpenCV tutorial for more details: http://docs.opencv.org/trunk/doc/py_tutorials/py_imgproc/py_grabcut/py_grabcut.html
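For illustration, here is a minimal GrabCut sketch; the rectangle coordinates are placeholders for a box drawn around your object:

```python
import cv2
import numpy as np

img = cv2.imread('object.jpg')
mask = np.zeros(img.shape[:2], np.uint8)
bgd_model = np.zeros((1, 65), np.float64)
fgd_model = np.zeros((1, 65), np.float64)

rect = (50, 50, 300, 400)  # (x, y, w, h) box surrounding the object

cv2.grabCut(img, mask, rect, bgd_model, fgd_model, 5, cv2.GC_INIT_WITH_RECT)

# Definite and probable foreground pixels form the final segmentation
fg = np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 1, 0).astype(np.uint8)
segmented = img * fg[:, :, None]
```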
For Random Walks, this is implemented in the scikit-image library and here's a great tutorial on how to get the segmentation up and running off of their official website: http://scikit-image.org/docs/dev/auto_examples/plot_random_walker_segmentation.html
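And a short sketch of the scikit-image random walker, assuming you already know some foreground and background pixels; the seed coordinates below are only examples:

```python
import numpy as np
from skimage import color, io
from skimage.segmentation import random_walker

img = color.rgb2gray(io.imread('object.jpg'))

labels = np.zeros(img.shape, dtype=np.uint8)   # 0 = unknown pixels
labels[100:110, 100:110] = 1                   # known foreground seeds (example region)
labels[0:10, 0:10] = 2                         # known background seeds (example region)

segmentation = random_walker(img, labels, beta=130, mode='bf')
foreground = segmentation == 1                 # boolean mask of the segmented object
```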
Good luck!