YOLOv8 Object Detection Tutorial

Photo by Paul Carroll on Unsplash
YOLO (You Only Look Once) is a popular object detection model that caught on quickly thanks to its speed and accuracy. This article shows how to use YOLOv8 for object detection.

The complete code for this chapter can be found in .

YOLOv8

YOLO was published in 2015 by Joseph Redmon and Ali Farhadi of the University of Washington, together with their co-authors. It quickly became popular thanks to its efficiency and accuracy. Joseph Redmon went on to publish YOLOv2 in 2017 and YOLOv3 in 2018. He then announced that he was leaving computer vision research because he found his work being used in military applications and in ways that violate privacy. In 2020, Alexey Bochkovskiy published YOLOv4, and in the same year Ultralytics released YOLOv5. In 2022, the Meituan Vision AI Department released YOLOv6, and Alexey Bochkovskiy and his co-authors published YOLOv7. In 2023, Ultralytics released YOLOv8. The history of YOLO is unusual and interesting: different versions were released by different people or groups. For a more detailed historical introduction, please refer here.

Project Setup

Next, we will introduce how to use YOLOv8 by building a baby penguin detection model.

First, create a folder called YOLOv8Example and set up a virtual environment for Python3. Then, install ultralytics.

% mkdir YOLOv8Example
% cd YOLOv8Example
YOLOv8Example % python3 -m venv .venv
YOLOv8Example % .venv/bin/pip install ultralytics

Labeling

Before starting to train the model, we must first prepare the training dataset. In this project, we prepared 9 photos of baby penguins, 7 of which were used to train the model and 2 of which were used to validate the model. It should be noted that in real projects, such a dataset is far from sufficient.

Once the dataset is ready, we label it. Labeling means marking the area of each photo that contains a baby penguin. A photo can contain more than one labeled object, and more than one class; for example, we could label one area as a baby penguin and another as a polar bear. In this project, we have a single class called Baby Penguin.

Below is a photo and its label. In the YOLOv8 label format,

  • Each line represents one labeled object.
  • The first number is the index of the class. 0 is the first class, which is Baby Penguin.
  • The second and third numbers are the x and y coordinates of the center of the labeled box. They are expressed as fractions of the image width and height (between 0 and 1), not in pixels.
  • The fourth and fifth numbers are the width and height of the labeled box, also expressed as fractions of the image width and height, not in pixels.
Baby Penguin dataset.
0 0.482720 0.527601 0.517280 0.749469
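
To make the label format concrete, here is a minimal Python sketch that converts one label line back into a pixel box. The function name yolo_label_to_pixel_box and the 1000×800 image size are assumptions made for illustration; the real photo's size is not given in this article.

```python
def yolo_label_to_pixel_box(line, image_width, image_height):
    """Convert one YOLO label line to (class_index, left, top, right, bottom) in pixels."""
    parts = line.split()
    class_index = int(parts[0])
    # The four remaining values are fractions of the image width and height.
    x_center, y_center, width, height = (float(value) for value in parts[1:5])
    left = (x_center - width / 2) * image_width
    top = (y_center - height / 2) * image_height
    right = (x_center + width / 2) * image_width
    bottom = (y_center + height / 2) * image_height
    return class_index, left, top, right, bottom


# The image size here is a made-up example, not the size of the photo above.
box = yolo_label_to_pixel_box("0 0.482720 0.527601 0.517280 0.749469", 1000, 800)
print(box)
```

Because the values are relative to the image size, the same label stays valid when the photo is resized.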

Two tools can help us label objects.

Roboflow

Roboflow is a free online labeling tool. The interface is nice and easy to use.

Labeling using Roboflow.

Yolo Label

Yolo Label is a free labeling tool. Its interface is not as polished as Roboflow's, and it distorts the aspect ratio of the displayed picture. This does not affect the labeling result, though, since the labeled box is expressed relative to the image size rather than in pixels. Despite these shortcomings, it has the advantage of running locally.

When it starts, Yolo Label asks for the path of the image folder you want to label and the path of the class file.

Labeling using Yolo Label.

We create a file called classes.txt, and each line in it is a class.

Baby Penguin

Training

Training Dataset

Before starting training, we first place the images and labels into the project. Images for training are placed in datasets/train/images, and labels are placed in datasets/train/labels. Images and labels used for validation are placed under datasets/val.

Training images.
Labels of training images.
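
With images and labels kept in separate folders, it is easy to lose track of which photos have been labeled. The sketch below lists images that have no label file with the same base name. The helper name find_unlabeled_images and the set of image extensions are assumptions for illustration. An image without a label file is not necessarily an error in general, but in this dataset every photo contains a baby penguin, so each one should have a label.

```python
from pathlib import Path

# Image extensions to look at; an assumption, extend as needed.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def find_unlabeled_images(split_dir):
    """Return names of images in split_dir/images with no matching .txt in split_dir/labels."""
    split_dir = Path(split_dir)
    images_dir = split_dir / "images"
    labels_dir = split_dir / "labels"
    missing = []
    for image_path in sorted(images_dir.iterdir()):
        if image_path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        label_path = labels_dir / (image_path.stem + ".txt")
        if not label_path.is_file():
            missing.append(image_path.name)
    return missing


# Check both splits of the layout described above, if they exist.
for split in ("train", "val"):
    split_dir = Path("datasets") / split
    if (split_dir / "images").is_dir():
        print(split, find_unlabeled_images(split_dir))
```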

Training Settings

Training requires a configuration file. Create a file called data.yaml with the following contents, where

  • path specifies the root folder of the dataset
  • train specifies the folder of training images, relative to path
  • val specifies the folder of validation images, relative to path
  • nc specifies the number of classes
  • names specifies the names of the classes
path: /path/to/waynestalk/YOLOv8Example/datasets
train: train/images
val: val/images
nc: 1
names: [ "Baby Penguin" ]
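
Since path must be an absolute path on your own machine, it can be convenient to generate the file instead of typing it. The following is a small sketch using only the Python standard library; the function name render_data_yaml is hypothetical, and it assumes that class names contain no double quotes.

```python
from pathlib import Path


def render_data_yaml(dataset_root, class_names):
    """Build the text of a data.yaml file for the given dataset folder and classes."""
    quoted_names = ", ".join('"{}"'.format(name) for name in class_names)
    lines = [
        # Resolve to an absolute path so training works from any directory.
        "path: {}".format(Path(dataset_root).resolve().as_posix()),
        "train: train/images",
        "val: val/images",
        "nc: {}".format(len(class_names)),
        "names: [ {} ]".format(quoted_names),
    ]
    return "\n".join(lines) + "\n"


yaml_text = render_data_yaml("datasets", ["Baby Penguin"])
print(yaml_text)
# To save it: Path("data.yaml").write_text(yaml_text)
```

Deriving nc from the list of names keeps the two values from drifting apart when a class is added later.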

YOLOv8 Pretrained Detect Models

Finally, before starting training, we need to choose a YOLOv8 pretrained model. Starting from this pretrained model, we train the new classes we want to detect. In the table below, the further down a model is listed, the higher its accuracy, but the slower its speed and the larger its file size. Which model to use depends on your needs. In the examples of this article, we will use YOLOv8s.

Model     size (pixels)   mAPval 50-95   Speed CPU ONNX (ms)   Speed A100 TensorRT (ms)   params (M)   FLOPs (B)
YOLOv8n   640             37.3           80.4                  0.99                       3.2          8.7
YOLOv8s   640             44.9           128.4                 1.20                       11.2         28.6
YOLOv8m   640             50.2           234.7                 1.83                       25.9         78.9
YOLOv8l   640             52.9           375.2                 2.39                       43.7         165.2
YOLOv8x   640             53.9           479.1                 3.53                       64.2         257.8
YOLOv8 Pretrained Detect Models, source from https://docs.ultralytics.com/tasks/detect/#models

CLI

We can train our model with the following command. Each argument is briefly explained below; for details, please refer to the official documentation.

  • mode: YOLOv8 provides several modes, such as train, val, predict, and export. Here we use train mode.
  • data: the data.yaml we created previously.
  • imgsz: the image size used for training, which is also the input image size of the resulting model. Images whose size differs from imgsz are automatically resized to imgsz first.
  • epochs: the number of epochs. One epoch is one pass of training over the entire dataset.
  • batch: the number of images in each training batch. How large it can be depends on the available memory.
  • project, name: YOLOv8 has no single parameter for the output path. Instead, project and name together specify it; the output path will be project/name.
YOLOv8Example % .venv/bin/yolo detect mode=train model=yolov8s.pt data=data.yaml imgsz=640 epochs=5 batch=1 project=runs name=train
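
To make the epochs and batch arguments concrete, the short sketch below counts how many batches are processed with the settings above and the 7 training photos of this project. The function name count_training_steps is hypothetical, and the sketch assumes that a final, smaller batch is still processed rather than dropped.

```python
import math


def count_training_steps(num_images, batch_size, epochs):
    """Return (batches per epoch, total batches) for a training run."""
    # A last, partially filled batch still counts as one batch.
    steps_per_epoch = math.ceil(num_images / batch_size)
    return steps_per_epoch, steps_per_epoch * epochs


# 7 training images, batch=1, epochs=5, as in the command above.
print(count_training_steps(7, 1, 5))  # (7, 35)
```

With batch=4 instead, the same 7 images would form 2 batches per epoch, so a larger batch means fewer but heavier steps.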

After training, the output folder contains the following files. Among them, weights/best.pt is the trained model.

YOLOv8 Train mode.
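
One thing to watch for: if runs/train already exists, Ultralytics by default writes the next run to a numbered folder such as runs/train2 instead of overwriting it, so the newest best.pt may not be where you expect. The sketch below picks the most recently modified weights file; the helper name find_latest_weights is an assumption for illustration, and it relies only on the project/name/weights layout described above.

```python
from pathlib import Path


def find_latest_weights(project_dir="runs", name="train", filename="best.pt"):
    """Return the newest <project_dir>/<name>*/weights/<filename>, or None if there is none."""
    pattern = "{}*/weights/{}".format(name, filename)
    candidates = list(Path(project_dir).glob(pattern))
    if not candidates:
        return None
    # The file modified last belongs to the most recent training run.
    return max(candidates, key=lambda path: path.stat().st_mtime)


print(find_latest_weights())
```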

Python

In addition to the CLI, we can also train with YOLOv8 from Python, as follows.

from ultralytics import YOLO

# Load a pretrained YOLOv8s model
model = YOLO('yolov8s.pt')

# Train and validate the model
train_results = model.train(data='data.yaml', imgsz=640, batch=1, epochs=5, project='runs', name='train', save=True)

Prediction

Next, we use the model we just trained to make predictions, that is, to detect objects.

CLI

We can use the following command to detect whether there are baby penguins in a picture. The picture does not have to be 640×640; YOLOv8 will automatically adjust the size for us.

YOLOv8Example % .venv/bin/yolo detect mode=predict model=./runs/train/weights/best.pt source=image.jpg project=runs name=predict
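
The automatic resizing mentioned above can be pictured as scaling the image to fit the model's input size while keeping its aspect ratio, then padding the rest. The sketch below only computes that geometry. It is a simplified illustration, not the library's actual preprocessing, which may pad less or round differently; the function name letterbox_geometry is hypothetical.

```python
def letterbox_geometry(width, height, target=640):
    """Fit (width, height) into a target x target square, keeping the aspect ratio.

    Returns (new_width, new_height, pad_x, pad_y), where pad_x and pad_y are the
    total padding in pixels needed to fill the square.
    """
    scale = target / max(width, height)
    new_width = round(width * scale)
    new_height = round(height * scale)
    return new_width, new_height, target - new_width, target - new_height


# A 1920x1080 frame shrinks to 640x360 and needs 280 pixels of vertical padding.
print(letterbox_geometry(1920, 1080))  # (640, 360, 0, 280)
```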

After the prediction is completed, we can find the predicted results in the following folder.

YOLOv8 Prediction.

YOLOv8 automatically draws a box around each detected object and labels it with the class name.

Detected baby penguins.

YOLOv8 can run predictions on videos as well as pictures. The usage is the same, as follows.

YOLOv8Example % .venv/bin/yolo detect mode=predict model=./runs/train/weights/best.pt source=video.mp4 project=runs name=predict

Python

We can also make predictions from Python.

from ultralytics import YOLO

# Load the trained model
model = YOLO('./runs/train/weights/best.pt')

# Predict an image
model.predict(source='image.jpg', project='runs', name='predict', save=True)

# Predict a video
model.predict(source='video.mp4', project='runs', name='predict', save=True)
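
Besides saving annotated files, predict returns a list of result objects that can be inspected in code. The sketch below flattens them into plain tuples. It assumes each result exposes names, boxes.cls, boxes.conf, and boxes.xyxy, which is how the Ultralytics Results object is commonly used; check these against the version you have installed. The function name summarize_detections is hypothetical.

```python
def summarize_detections(results):
    """Turn prediction results into a list of (class_name, confidence, [x1, y1, x2, y2])."""
    detections = []
    for result in results:
        boxes = result.boxes
        # cls, conf and xyxy hold one entry per detected box, in the same order.
        rows = zip(boxes.cls.tolist(), boxes.conf.tolist(), boxes.xyxy.tolist())
        for class_index, confidence, corners in rows:
            class_name = result.names[int(class_index)]
            detections.append((class_name, confidence, corners))
    return detections
```

With the model loaded above, summarize_detections(model.predict(source='image.jpg')) would give one tuple per detected baby penguin, with the box corners in pixels.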

Conclusion

YOLOv8 is very convenient and simple to use. As a result, most of the time in a project goes into collecting the dataset and labeling it rather than into training. YOLOv8 provides 5 tasks, namely detect, segment, classify, pose, and OBB. This article only introduces how to use detect.
