Non Maximum Suppression (NMS)

Photo by Danielle Barnes on Unsplash
Non maximum suppression is a technique used in object detection to filter the bounding boxes generated by object detection algorithms. Without NMS, the output image is cluttered with densely overlapping boxes.

Non Maximum Suppression (NMS)

Non maximum suppression (NMS) is a technique used in the post-processing stage of object detection. Generally speaking, an object detection algorithm generates many bounding boxes for a single object, but we only need one bounding box per object. NMS filters out the redundant and irrelevant bounding boxes and retains only the best one.

NMS filters out redundant and irrelevant bounding boxes.

Intersection Over Union (IoU)

Before going deeper into the NMS algorithm, let’s first understand what intersection over union (IoU) is. IoU is a metric that measures how much two bounding boxes overlap: the area of their intersection divided by the area of their union. From this formula, we can see that the larger the overlapping area of two bounding boxes, the larger the IoU, and vice versa.

Intersection over Union, IoU.
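
As a minimal sketch of the definition (IoU = area of intersection / area of union), the function below computes IoU in plain Python. It assumes boxes are given as corner coordinates (x1, y1, x2, y2) with x1 < x2 and y1 < y2; the function name `iou` is chosen here for illustration only.

```python
def iou(box_a, box_b):
    """Return the IoU of two boxes given as (x1, y1, x2, y2) corners."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap rectangle (0 if the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    # Union = sum of both areas minus the part counted twice.
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection

    return intersection / union if union > 0 else 0.0
```

Two identical boxes give an IoU of 1, and two boxes that do not touch give 0.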

NMS Algorithm

Before we dive into the algorithm, we must first understand the input format that NMS expects. Each bounding box generated by an object detection algorithm usually carries six values.

  • Four values represent the extent of a bounding box:
    • They might be the center point (x, y), width, and height.
    • They may also be the upper-left corner (x1, y1) and the lower-right corner (x2, y2).
  • A confidence score represents how likely it is that an object exists in the box. The higher the score, the higher the probability that the box contains an object, and vice versa.
  • A class ID identifies the class of the contained object. When the object detection algorithm can detect multiple classes of objects in an image, it outputs a class ID for each detected object.
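
The two box formats carry the same information, so one can be converted to the other. Here is a small sketch of that conversion; the helper names `center_to_corners` and `corners_to_center` are hypothetical and not taken from any library.

```python
def center_to_corners(cx, cy, w, h):
    """Convert (center x, center y, width, height) to (x1, y1, x2, y2)."""
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def corners_to_center(x1, y1, x2, y2):
    """Convert (x1, y1, x2, y2) to (center x, center y, width, height)."""
    return ((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1)
```

IoU is usually easier to compute on the corner format, so converting to corners before NMS is a common choice.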

Now we can start to understand the NMS algorithm.

NMS Algorithm, source A Comprehensive Review of YOLO Architectures in Computer Vision: From YOLOv1 to YOLOv8 and YOLO-NAS.

The algorithm looks a bit complicated, but it isn’t. The algorithm is roughly as follows.

  • Line 1: F will contain the bounding boxes selected by NMS, so it starts out empty.
  • Line 2: First remove the bounding boxes whose confidence score is below the confidence threshold T. Boxes with very low confidence scores most likely do not contain an object, that is, they are irrelevant bounding boxes, so we do not consider them at all.
  • Lines 5-7: Select the box b with the highest confidence score in B, add b to F, and remove b from B.
  • Lines 8-13: For each remaining box r in B, calculate the IoU of b and r. If the IoU is greater than or equal to the IoU threshold τ, remove r from B. A high IoU means the two boxes overlap so much that they most likely cover the same object, so we only need one of them, and we keep b because it has the higher confidence score.
  • Lines 4-14: Repeat these steps until B is empty. Finally, the filtered boxes are in F.
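
The steps above can be sketched in plain Python as follows. This is a minimal illustration of the description, not YOLOv8's implementation: it assumes each detection is a tuple (x1, y1, x2, y2, score) in corner format, ignores class IDs, and uses the names `nms`, `conf_threshold`, and `iou_threshold`, which are chosen here for readability.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2) corners."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(detections, conf_threshold=0.25, iou_threshold=0.45):
    """Filter detections given as (x1, y1, x2, y2, score) tuples."""
    # Line 2: drop boxes whose score is below the confidence threshold T.
    remaining = [d for d in detections if d[4] >= conf_threshold]
    # Sort once so the highest-scoring box is always first.
    remaining.sort(key=lambda d: d[4], reverse=True)

    selected = []  # Line 1: F starts out empty.
    while remaining:  # Lines 4-14: repeat until B is empty.
        best = remaining.pop(0)  # Lines 5-7: move the best box b from B to F.
        selected.append(best)
        # Lines 8-13: remove every box r whose IoU with b is >= the threshold.
        remaining = [r for r in remaining if iou(best[:4], r[:4]) < iou_threshold]
    return selected


detections = [
    (10, 10, 50, 50, 0.9),      # best box for the first object
    (12, 12, 52, 52, 0.8),      # heavy overlap with the first box (IoU about 0.82)
    (100, 100, 140, 140, 0.7),  # a second, separate object
    (0, 0, 5, 5, 0.1),          # score below the confidence threshold
]
kept = nms(detections)  # keeps the 0.9 box and the 0.7 box
```

Because class IDs are ignored here, overlapping boxes of different classes suppress each other; a multi-class detector would typically run this per class.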

YOLOv8 Example

YOLOv8 is a very popular object detection model. If you are not familiar with YOLOv8, you can refer to the following articles first.

The default confidence score threshold of YOLOv8 is 0.25, and the default IoU threshold is 0.45. Please refer to the NMS source code of YOLOv8.

We can use the two parameters conf and iou to adjust the confidence score threshold and the IoU threshold of YOLOv8. Both values are floating point numbers between 0 and 1. The higher conf is set, the more bounding boxes are filtered out, because boxes whose confidence score is less than or equal to conf are removed. iou works the opposite way: the lower iou is set, the more bounding boxes are filtered out, because boxes whose IoU with an already selected box is greater than or equal to iou are removed. Taking the baby penguin detection model trained in the above article as an example, the following is the YOLOv8 command with the conf and iou parameters.

YOLOv8Example % .venv/bin/yolo detect mode=predict model=./runs/train/weights/best.pt source=image.jpg project=runs name=predict show_labels=False conf=0.05 iou=0.5
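
The direction of both thresholds can also be checked on toy data without running the model. The sketch below is an illustration only: the four boxes and their scores are invented, and `count_kept` is a hypothetical helper that applies a simple greedy NMS, not YOLOv8's own code.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2) corners."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def count_kept(boxes, conf, iou_thr):
    """Return how many (x1, y1, x2, y2, score) boxes survive a greedy NMS."""
    remaining = sorted((b for b in boxes if b[4] >= conf),
                       key=lambda b: b[4], reverse=True)
    kept = 0
    while remaining:
        best, remaining = remaining[0], remaining[1:]
        kept += 1
        remaining = [r for r in remaining if iou(best[:4], r[:4]) < iou_thr]
    return kept


# Invented boxes: three overlapping candidates plus one low-score outlier.
boxes = [
    (0, 0, 10, 10, 0.9),
    (1, 0, 11, 10, 0.6),
    (4, 0, 14, 10, 0.4),
    (30, 30, 40, 40, 0.1),
]

# Raising conf removes more boxes (iou fixed at 0.45).
by_conf = [count_kept(boxes, conf, 0.45) for conf in (0.05, 0.25, 0.5)]
# Raising iou keeps more boxes (conf fixed at 0.05).
by_iou = [count_kept(boxes, 0.05, thr) for thr in (0.3, 0.45, 0.9)]
```

With these boxes, `by_conf` shrinks as conf grows and `by_iou` grows as iou grows, which matches the behavior described for the two parameters.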

First, we change iou from the default 0.45 to 0.7 so that more bounding boxes are retained, and then compare how different conf values affect the predicted results.

NMS with different confidence score thresholds.

Next, we change conf from the default 0.25 to 0.05 so that more bounding boxes are retained, and then compare how different iou values affect the predicted results.

NMS with different IoU thresholds.

From these comparisons, we can see that the default conf and iou values of YOLOv8 are quite appropriate.

Conclusion

We don’t necessarily need to implement NMS ourselves. YOLOv8 already has NMS implemented, and we can easily set the confidence score and IoU thresholds. In addition, TorchVision also provides NMS functions. However, understanding NMS allows us to better understand how to adjust these thresholds.
