Non-maximum suppression (NMS) is a technique used in object detection to filter the bounding boxes produced by a detection algorithm. Without NMS, the output image is cluttered with densely overlapping boxes.
Non Maximum Suppression (NMS)
Non-maximum suppression (NMS) is a post-processing step in object detection. An object detection algorithm typically generates many bounding boxes for a single object, but we only need one box per object. NMS filters out the redundant and irrelevant bounding boxes and retains only the best one.
Intersection Over Union (IoU)
Before going deeper into the NMS algorithm, let’s first understand intersection over union (IoU). IoU is a metric that measures how much two bounding boxes overlap: it is the area of their intersection divided by the area of their union. The larger the overlapping area of the two boxes, the larger the IoU, and vice versa. The value ranges from 0 (no overlap) to 1 (identical boxes).
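As a minimal sketch of this metric, the function below computes the IoU of two axis-aligned boxes. It assumes each box is given as corner coordinates (x1, y1, x2, y2) with x1 < x2 and y1 < y2. The function name `iou` and the tuple layout are illustrative choices, not part of any particular library.

```python
def iou(box_a, box_b):
    """Return the IoU of two boxes given as (x1, y1, x2, y2) corners."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap rectangle; zero when the boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    # The overlap is counted once in each area, so subtract it once.
    union = area_a + area_b - intersection

    return intersection / union if union > 0 else 0.0


# Two 2x2 boxes offset by 1 in each direction: overlap area 1, union area 7.
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
# Disjoint boxes have no overlap at all.
print(iou((0, 0, 1, 1), (2, 2, 3, 3)))
```

In the first call the intersection is 1 and the union is 4 + 4 - 1 = 7, so the IoU is 1/7. The second call returns 0 because the boxes do not touch.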
NMS Algorithm
Before we dive into the algorithm, we must first understand the input format that NMS expects. Each bounding box generated by an object detection algorithm typically contains six values.
- Four values describe the extent of the bounding box, in one of two forms:
  - The center point (x, y), width, and height.
  - The upper-left corner (x1, y1) and the lower-right corner (x2, y2).
- A confidence score indicates how likely it is that the box contains an object. The higher the score, the higher the probability that the box contains an object, and vice versa.
- A class ID identifies the class of the object in the box. When the detection algorithm can detect several object classes in an image, it outputs this ID for each detected object.
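The two box representations listed above carry the same information, so one can be converted into the other. The following sketch shows the conversion in both directions. The function names `xywh_to_xyxy` and `xyxy_to_xywh` are hypothetical helpers chosen for this example, and the code assumes image coordinates where x grows to the right and y grows downward.

```python
def xywh_to_xyxy(box):
    """Convert (center_x, center_y, width, height) to (x1, y1, x2, y2)."""
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def xyxy_to_xywh(box):
    """Convert (x1, y1, x2, y2) to (center_x, center_y, width, height)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1)


# A box centered at (5, 5) that is 4 wide and 2 high.
corners = xywh_to_xyxy((5, 5, 4, 2))
print(corners)                # (3.0, 4.0, 7.0, 6.0)
print(xyxy_to_xywh(corners))  # (5.0, 5.0, 4.0, 2.0)
```

The corner form is usually the more convenient one for computing overlaps, because the intersection rectangle follows directly from the minimum and maximum of the corner coordinates.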
Now we can look at the NMS algorithm itself.
The algorithm looks a bit complicated, but it isn’t. It works roughly as follows.
- Line 1: `F` will contain the bounding boxes selected by NMS, so it starts out empty.
- Line 2: First remove the bounding boxes whose confidence score is below the confidence threshold `T`. Boxes with very low confidence scores most likely do not contain an object, that is, they are irrelevant boxes, so we do not consider them at all.
- Lines 5-7: Select the box `b` with the highest confidence score from `B`, the set of remaining candidate boxes. Add `b` to `F`, and remove `b` from `B`.
- Lines 8-13: For each box `r` in `B`, calculate the IoU of `b` and `r`. If the IoU is greater than or equal to the IoU threshold `τ`, remove `r` from `B`. A high IoU means the two boxes overlap so much that we only need one of them, and we keep `b` because it has the higher confidence score.
- Lines 4-14: Repeat these steps until `B` is empty. The filtered boxes are then in `F`.
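The steps above can be sketched in plain Python as follows. This is an illustrative, class-agnostic version and not the YOLOv8 implementation. It assumes each detection is a `(box, score)` pair with the box in corner format (x1, y1, x2, y2), and the names `nms`, `conf_threshold`, and `iou_threshold` are chosen for this example. The variables `kept` and `candidates` play the roles of `F` and `B`.

```python
def iou(box_a, box_b):
    """Return the IoU of two boxes given as (x1, y1, x2, y2) corners."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def nms(detections, conf_threshold=0.25, iou_threshold=0.45):
    """Filter a list of (box, score) pairs with non-maximum suppression."""
    # Remove boxes whose score is below the confidence threshold (T).
    candidates = [d for d in detections if d[1] >= conf_threshold]
    # Sort by score so the first candidate is always the best remaining box.
    candidates.sort(key=lambda d: d[1], reverse=True)

    kept = []  # F: the boxes selected by NMS
    while candidates:  # B: the boxes still under consideration
        best = candidates.pop(0)
        kept.append(best)
        # Drop every remaining box that overlaps the selected box too much.
        candidates = [
            d for d in candidates if iou(best[0], d[0]) < iou_threshold
        ]
    return kept


detections = [
    ((10, 10, 50, 50), 0.9),      # best box on the first object
    ((12, 12, 52, 52), 0.8),      # near-duplicate of the first box
    ((100, 100, 150, 150), 0.7),  # a separate object
    ((11, 11, 49, 49), 0.1),      # low confidence, removed up front
]
for box, score in nms(detections):
    print(box, score)
```

With the thresholds shown, only the boxes scored 0.9 and 0.7 survive: the 0.1 box fails the confidence filter, and the 0.8 box has an IoU of about 0.82 with the 0.9 box. A multi-class detector would typically apply the same procedure separately per class ID, which this sketch leaves out.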
YOLOv8 Example
YOLOv8 is a very popular object detection model. If you are not familiar with YOLOv8, you can refer to the following articles first.
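In the NMS source code of YOLOv8, the default confidence score threshold is 0.25 and the default IoU threshold is 0.45. The defaults that the yolo command actually applies come from the Ultralytics configuration and can differ between versions, so check the version you have installed.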
We can use the two parameters `conf` and `iou` to adjust the confidence score threshold and the IoU threshold of YOLOv8. Both values are floating-point numbers between 0 and 1. The higher `conf` is set, the more bounding boxes are filtered out, because boxes with a confidence score less than or equal to `conf` are discarded. `iou` works the other way around: the lower `iou` is set, the more bounding boxes are filtered out, because boxes whose IoU with a higher-scoring box is greater than or equal to `iou` are discarded. Taking the baby penguin detection model trained in the above article as an example, the following is the YOLOv8 command with the `conf` and `iou` parameters.
YOLOv8Example % .venv/bin/yolo detect mode=predict model=./runs/train/weights/best.pt source=image.jpg project=runs name=predict show_labels=False conf=0.05 iou=0.5
First, change `iou` from the default 0.45 to 0.7 so that more bounding boxes are retained, then compare how different `conf` values affect the prediction results.
Next, change `conf` from the default 0.25 to 0.05 so that more bounding boxes are retained, then compare how different `iou` values affect the prediction results.
The comparison shows that the default `conf` and `iou` values of YOLOv8 are quite appropriate.
Conclusion
We don’t necessarily need to implement NMS ourselves. YOLOv8 already implements it and lets us set the confidence score and IoU thresholds easily, and TorchVision provides NMS functions as well. Still, understanding NMS helps us choose these thresholds sensibly.
Reference
- Non Maximum Suppression: Theory and Implementation in PyTorch, LearnOpenCV.
- Juan R. Terven and Diana M. Cordova-Esparza, A Comprehensive Review of YOLO Architectures in Computer Vision: From YOLOv1 to YOLOv8 and YOLO-NAS, Machine Learning and Knowledge Extraction.