As I understand it, anchor-based detection uses multiple boxes at once to predict a bounding box close to the ground truth.
1. Is it correct?
2. And what is anchor-free?
3. What is the difference between anchor-based and anchor-free (methods, pros, cons,...)?
I'm new to this, so thanks for any answer!
The following paper provides a quick overview that you might find useful: https://ieeexplore.ieee.org/document/9233610
What I understood is that there are several approaches to finding bounding boxes. They are categorized as:
Sliding window: Consider all possible bounding boxes
Anchor-based: Learn prior knowledge about which widths and heights are most suitable for each class type (essentially the same as learning common aspect ratios per class). Then tile those boxes across the image and just predict the probability of each tile.
YOLOv5 uses clustering to estimate anchor boxes before training and saves them (a sketch of that clustering idea appears after this list). That said, the approach has its disadvantages. The first is that you must learn anchor boxes for each class. The second is that your accuracy may depend on the quality of the anchor-box estimation.
Anchor-free: Instead of using prior knowledge or considering all possibilities, these methods directly predict two points (top-left and bottom-right) for every object (see the decoding sketch below).
YOLOv3, YOLOv4, and YOLOv5 use anchors, but YOLOX and CornerNet don't.
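To make the clustering idea concrete, here is a minimal sketch that estimates anchor shapes with plain k-means over ground-truth (width, height) pairs. Note that YOLOv5's actual autoanchor routine uses an IoU-based metric plus a genetic refinement step, so this Euclidean version is a simplification, and the input data here is made up:

import numpy as np
from sklearn.cluster import KMeans

def estimate_anchors(box_dims, n_anchors=9):
    # box_dims: (N, 2) array with the width and height of every
    # ground-truth box in the training set (hypothetical input).
    km = KMeans(n_clusters=n_anchors, n_init=10, random_state=0).fit(box_dims)
    anchors = km.cluster_centers_
    # Sort by area so anchors can be assigned smallest-to-largest scale.
    return anchors[np.argsort(anchors.prod(axis=1))]

dims = np.random.rand(1000, 2) * 300 + 10  # fake (w, h) pairs in pixels
print(estimate_anchors(dims))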
Though not a complete explanation, I think you get the point.
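For contrast, a minimal sketch of FCOS-style anchor-free decoding (see the FCOS references below): each feature-map location predicts its distances to the four box sides, which directly yields the two corner points with no anchors involved. The array shapes and stride here are illustrative assumptions:

import numpy as np

def decode_fcos(ltrb, stride=8):
    # ltrb: (H, W, 4) predicted distances (left, top, right, bottom)
    # from each feature-map location to the sides of its box.
    h, w = ltrb.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    cx = (xs + 0.5) * stride  # map each cell back to image coordinates
    cy = (ys + 0.5) * stride
    x1, y1 = cx - ltrb[..., 0], cy - ltrb[..., 1]  # top-left corner
    x2, y2 = cx + ltrb[..., 2], cy + ltrb[..., 3]  # bottom-right corner
    return np.stack([x1, y1, x2, y2], axis=-1)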
References:
Anchor Boxes for Object Detection
A fully convolutional anchor-free object detector
Forget the hassles of Anchor boxes with FCOS: Fully Convolutional One-Stage Object Detection
Imagine a factory warehouse with different-sized boxes loaded with products. I want to measure these boxes with a camera. There is no controlled background; the background is the natural factory warehouse. I have code for measuring, but this code measures everything. I want to measure only the boxes.
I have code for measuring objects, but how do I detect only cardboard boxes with OpenCV?
Should I detect them with a color filter or with YOLO?
Also, a user might want to measure other objects instead of cardboard boxes, like industrial machines, etc. Maybe I need a more general solution...
The camera faces the boxes directly, so it sees their width and height (a 180-degree view).
As you can see, the code measures everything, but I want only cardboard boxes. I have tried filtering colors with Hue, Saturation, Value, but it didn't work because I'm using an ArUco marker for the perimeter. The ArUco marker is black and white, so when I filtered out the other colors, the ArUco marker was lost too. And there may also be boxes of different colors.
You can try detecting rectangular or quadrilateral contours in a copy of the black-and-white frame and then correlating those contours back to the original (colored) frame. That way you can apply color filtering on just those regions to detect the cardboard boxes. The advantage of this over deep learning (DL) is that DL may require more processing power.
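A minimal OpenCV sketch of that idea; the input image, area threshold, and the "brownish" color heuristic are assumptions you would tune for your setup:

import cv2
import numpy as np

frame = cv2.imread("warehouse.jpg")  # hypothetical input frame
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
_, bw = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

contours, _ = cv2.findContours(bw, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
for c in contours:
    approx = cv2.approxPolyDP(c, 0.02 * cv2.arcLength(c, True), True)
    if len(approx) == 4 and cv2.contourArea(approx) > 1000:
        # Correlate back to the colored frame: keep only regions whose
        # mean color looks cardboard-like (brown/tan: red > green > blue).
        mask = np.zeros(gray.shape, np.uint8)
        cv2.drawContours(mask, [approx], -1, 255, -1)
        b, g, r = cv2.mean(frame, mask=mask)[:3]
        if r > g > b:
            cv2.drawContours(frame, [approx], -1, (0, 255, 0), 2)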
Did you use any deep learning (DL) methods for cardboard-box detection? If not, I recommend a DL method such as YOLOv5, or a classical machine learning approach such as HOG features with an SVM. The advantage of DL methods is that you only need to label the boxes and pass the data (images and annotations) to the model, without worrying about any other objects.
I tagged the cells using the Labelme software (you can tag any object with it), and then I trained a YOLACT model with the images and annotations. Figure 1 shows the model's predictions on a new image.
We are newbies in the field of programming and want to learn more, or you could help us with code. We want to learn about YOLO (real-time object detection) with distance measurement from the camera to the object. We also want the output in the form of audio; for example, when a car is detected, an audio recording says, "A car has been detected at a distance of ... cm."
There are two different approaches. One would be to train the YOLO network to output the distance to the detected object along with its other outputs. This could be quite hard and time-consuming, especially if you are new to DNNs. Another, easier way would be to get the bounding-box size from the YOLO detections and calculate the distance from the known car size relative to the bounding-box size (the smaller the bounding box, the farther the car).
Don't expect centimeter precision with any of this; you'd be lucky to get within 10%, which is 10 m at a 100 m distance.
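A sketch of the second approach using the pinhole-camera relation distance = focal_length × real_width / pixel_width; the focal length and the average car width are assumptions you would calibrate for your camera:

FOCAL_PX = 700.0   # focal length in pixels (hypothetical; calibrate it)
CAR_WIDTH_M = 1.8  # rough average car width in meters (assumption)

def estimate_distance_m(bbox_width_px):
    # The smaller the bounding box, the farther the car.
    return FOCAL_PX * CAR_WIDTH_M / bbox_width_px

d = estimate_distance_m(90)  # e.g. a YOLO detection 90 px wide
print(f"A car has been detected at a distance of {d * 100:.0f} cm")
# The printed string could then be fed to a text-to-speech library
# (e.g. pyttsx3) to produce the audio output asked for above.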
I'm trying to understand YOLOv3's algorithm. I've watched Andrew Ng's Coursera video on the use of anchor boxes in object detection models, especially in YOLOv3, but I still don't understand some points:
- If I change the values of my face detection model's anchor boxes, the results become very poor. How important are anchor boxes to class prediction in YOLO?
- YOLOv3 uses only 9 anchor boxes by default, 3 for each scale. So if we have to detect objects from 80 classes, and each class has a different typical shape, what do the shapes of these anchor boxes look like?
I'm new to computer vision and machine learning, so my questions could be hard to understand.
The purpose of anchor boxes is to detect multiple objects of different sizes whose centers fall in the same cell. Changing the number of anchor boxes changes the length of the ground-truth and prediction arrays.
Assuming a single box (in a cell) has the following predictions for 80 classes, [Pc, P1, P2, ..., P80, X1, Y1, X2, Y2], i.e. a length of 85, then 9 anchor boxes give a prediction array of length 85 × 9 = 765.
Below is an example of anchor boxes of different scales plotted around (0, 0).
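The plot from the original answer can be reproduced with a short matplotlib sketch, here using YOLOv3's well-known default COCO anchor sizes:

import matplotlib.pyplot as plt
import matplotlib.patches as patches

# YOLOv3's default COCO anchors as (width, height) in pixels.
anchors = [(10, 13), (16, 30), (33, 23),       # small-object scale
           (30, 61), (62, 45), (59, 119),      # medium-object scale
           (116, 90), (156, 198), (373, 326)]  # large-object scale

fig, ax = plt.subplots()
for w, h in anchors:
    # Center every box at (0, 0) to compare shapes and scales.
    ax.add_patch(patches.Rectangle((-w / 2, -h / 2), w, h, fill=False))
ax.set_xlim(-200, 200)
ax.set_ylim(-200, 200)
ax.set_aspect("equal")
plt.show()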
The current version of the TensorFlow Object Detection API supports bounding boxes without an angle, represented by xmin, ymin, xmax, ymax.
I am looking for ideas to represent (and predict) bounding boxes with an angle/orientation.
Use backpropagation to identify the pixels contributing most strongly to the activation, and apply a reasonable threshold to decide which pixels belong to the object.
The default algorithm does this and then computes an axis-aligned bounding box of the selected pixels (because it's really simple). You would need to run a different bounding-box algorithm that allows arbitrary orientation; Wikipedia has some ideas (link).
For how to get the interesting pixels, you can look inside the TensorFlow code to figure it out.
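For the arbitrary-orientation step, OpenCV's cv2.minAreaRect computes the minimum-area rotated rectangle around a point set; a sketch assuming you already have a binary mask of the object's pixels (the mask here is a made-up blob):

import cv2
import numpy as np

# Hypothetical binary mask where the object's pixels are non-zero.
mask = np.zeros((200, 200), np.uint8)
cv2.ellipse(mask, (100, 100), (60, 25), 30, 0, 360, 255, -1)

ys, xs = np.nonzero(mask)
points = np.column_stack((xs, ys)).astype(np.float32)

# Minimum-area rotated rectangle: ((cx, cy), (w, h), angle in degrees).
rect = cv2.minAreaRect(points)
corners = cv2.boxPoints(rect)  # the 4 corner points of the oriented box
print(rect, corners)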
Oriented bounding boxes are a very interesting topic that has been somewhat ignored by deep-learning-based object detection approaches, and it's hard to find datasets for it.
A recent paper/dataset/challenge which I found very interesting (especially because they pay attention to oriented boxes) can be found here:
http://captain.whu.edu.cn/DOTAweb/index.html
They don't share the code (nor give many details in the paper) of their modification of Faster R-CNN to work with oriented bounding boxes, but the dataset itself and the representation discussion are quite useful.
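Whichever representation you pick, converting between a (center, size, angle) parameterization and the four corner points that DOTA uses is a small rotation; a minimal sketch:

import numpy as np

def obb_to_corners(cx, cy, w, h, angle_rad):
    # Rotate the four axis-aligned half-extents, then translate to center.
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    R = np.array([[c, -s], [s, c]])  # 2-D rotation matrix
    half = np.array([[-w, -h], [w, -h], [w, h], [-w, h]]) / 2.0
    return half @ R.T + np.array([cx, cy])

print(obb_to_corners(50, 50, 40, 20, np.deg2rad(30)))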
I thought about tackling a new project in which I use the TensorFlow Object Detection API to detect Euro pallets (e.g. pic).
My ultimate goal is to know how far away I am from the pallet and my relative position to it. So I thought about first detecting the Euro pallet in an RGB feed from a Kinect camera and then using its 3D features to get the distance to the pallet.
But how do I go about the relative position of the pallet? I could create different classes, for example one for "front view, lying pallet", another for "side view, lying pallet", and so on, but I think for that to be accurate I'd need quite a few pictures per class, maybe 200 each?
Since my guess is that no such labeled datasets exist yet, that would be quite a pain to create by myself.
Another way I could think of: if I label my pallets with segmentation instead of bounding boxes, maybe there is another way to find my relative position to the pallet? I've never done semantic segmentation labeling myself; can anyone recommend good programs I could use?
I'm hoping someone can help point me in the right direction. Any help would be appreciated.
Some ideas: assuming detection and segmentation with classifier(s) work, one could then try feature detection such as edges/lines to obtain clues about the pallet's orientation and bounding box (see the sketch after this answer).
Of course, this will be tricky for simple feature detection because of the very different surfaces (wood, dirt), backgrounds, and lighting.
Also, "markerless tracking" (a topic in augmented reality) and "bin picking" (actually applied in the automation industry) may be keywords for similar problems, although you are probably not starting with an unordered pile of pallets.