YOLO Data Augmentation Explained

Ruman
7 min readJun 4, 2023

--

Image source : https://www.mdpi.com/1424-8220/22/19/7383

Outline

  1. Introduction
  2. Yolo Data augmentation config file
  3. Different Data Augmentations in Yolo
  4. Conclusion

Introduction

When it comes to object detection algorithms, YOLO stands tall as the most popular choice among machine learning practitioner. Its exceptional speed and accuracy make it a preferred option for a wide range of applications. Over time, various iterations of YOLO, such as V5, V7, V8, and YOLO-NAS, have emerged, setting new records for state-of-the-art object detection.

However, fine-tuning these YOLO models to achieve optimal performance requires more than just implementing the algorithm itself. One crucial aspect is data augmentation. Each YOLO version comes with its own default data augmentation configuration, but simply relying on these settings may not yield the desired results for your specific use case.

In this article, we will explore the available data augmentation techniques and understand in detail. By gaining insight into these augmentation options, you will be better equipped to customize and fine-tune your YOLO models according to your specific needs.

Yolo Data augmentation config file

You can find the augmentation configuration specific to your desired YOLO version by referring to these files. Once you have identified the corresponding configuration file, you can easily modify the augmentation settings according to your requirements.

The data configuration files are organized in the following order: V4, V5, V7, and V8.

Different Data Augmentation

Image HSV (Hue, Saturation, and Value) Augmentation

Image source : https://pyimagesearch.com/2021/04/28/opencv-color-spaces-cv2-cvtcolor/

This augmentation technique helps the YOLO model by introducing variations in colour, lighting conditions, and contrast.

By altering the Hue component, we can simulate different lighting conditions, such as daylight or artificial lighting, enabling the model to learn to detect objects under various illumination settings.

Adjusting the Saturation component allows us to control the vividness or dullness of colors, providing the model with exposure to different color distributions.

Lastly, modifying the Value component affects the brightness of the image, allowing the model to adapt to different brightness levels.

By incorporating HSV augmentation into the YOLO, the model becomes more robust and capable of handling real-world scenarios with varying lighting conditions, colour schemes, and contrasts.

In the later versions of YOLO, excluding V4, we can configure HSV augmentation by specifying fractional values for hsv_h, hsv_s, and hsv_v. These values are defined within the range of 0 to 1, allowing for precise control over the desired changes in hue, saturation, and value components of the image.

hsv_h: 0.015  # image HSV-Hue augmentation (fraction)
hsv_s: 0.7 # image HSV-Saturation augmentation (fraction)
hsv_v: 0.4 # image HSV-Value augmentation (fraction)

Image Angle/Degree rotation Augmentation

Image angle/degree rotation; source : https://www.invivoo.com/

Image angle/degree augmentation involves rotating the input images by a certain angle or degree. By introducing rotational variations during training, the model becomes more robust and capable of handling objects that may appear at different orientations or angles in real-world images.

Image rotation augmentation can be configured by specifying rotation degrees value within the range of 0 to 360.

degrees: 0.0  # image rotation (+/- deg)

Image Translation Augmentation

Image Horizontal and Vertical Translation; source : https://www.invivoo.com/

Translation augmentation involves shifting or moving the objects within the image. This technique simulates the scenario where objects are slightly displaced or moved within the frame.

This augmentation improves the model’s accuracy in detecting objects even when they are not centered or located at expected positions.

Image translation augmentation can be configured by specifying translate value within the range of 0 to 1.

translate: 0.2  # image translation (+/- fraction)

Image Prospective Transform Augmentation

Image Perspective Transform; source : https://medium.com/analytics-vidhya/perspective-transformation-in-python-on-live-video-bca558876e8

Perspective transform augmentation involves distorting the image to simulate perspective changes. This is particularly useful for scenarios where objects may appear at different distances or viewpoints.

By applying perspective transformations during training, the YOLO model learns to handle variations in object sizes, shapes, and distortions caused by perspective changes.

perspective: 0.0  # image perspective (+/- fraction), range 0-0.001

Image Scale Augmentation

Image scale augmentation ; source :https://github.com/xkumiyu/numpy-data-augmentation

Image scale augmentation involves resizing the input images to different scales or dimensions. By training the YOLO model on images with varied scales, it becomes more adaptable to objects appearing at different sizes in real-world scenarios.

This augmentation helps the model learn to detect objects with varying scales, enabling it to handle both small and large objects effectively.

Image scale augmentation can be configured by specifying the scale value, which determines the zoom level of the image. When a smaller scale value is used, the image is zoomed out, making objects appear smaller and providing a wider context. Conversely, a larger scale value brings the object closer, resulting in a zoomed-in view.

scale: 0.9  # image scale (+/- gain)

Image Shear Augmentation

Image shear Augmentation; source : https://blog.roboflow.com/shear-augmentation/

Shear augmentation introduces geometric deformations by tilting or skewing the images along the x or y-axis. This technique mimics real-world situations where objects may appear tilted or skewed due to perspective or camera angles.

By incorporating shear transformations during training, the YOLO model becomes more robust in detecting objects with distorted shapes, such as objects viewed from different angles or with perspective effects.

shear: 0.0  # image shear (+/- deg)

Image Flip up-down(Vertically) and Flip Left-Right(Horizontally)

Image Vertical and Horizontal Flip Augmentation; Source: Analytics Vidya

Flip up-down augmentation involves flipping the image vertically, resulting in a mirror image where the top becomes the bottom and vice versa. This augmentation helps the YOLO model learn to detect objects that may appear upside down or inverted in real-world scenarios.

Flip left-right augmentation, on the other hand, entails flipping the image horizontally, creating a mirror image where the left side becomes the right side and vice versa. This augmentation allows the YOLO model to learn and detect objects from different perspectives or viewpoints.

By training on vertically flipped or horizontally flipped images, the model becomes more robust and adaptable, enabling it to accurately detect objects regardless of their orientation.


flipud: 0.0 # image flip up-down (probability)
fliplr: 0.5 # image flip left-right (probability)

Image Mosaic Augmentation

Image Mosaic Augmentation ; Source : https://arxiv.org/pdf/2004.10934.pdf

Mosaic augmentation is a technique that combines several images to create a single training sample with a mosaic-like appearance. This helps the YOLO model learn to detect objects in complicated scenes where objects may overlap or the environment is crowded.

When the model is trained using mosaic augmented images, it becomes better at handling situations where objects are partially hidden or blend together. This augmentation technique improves the model’s ability to accurately detect objects even in challenging scenarios.

mosaic: 1.0  # image mosaic (probability)

Image Mixup Augmentation

MixUp augmentation combines pairs of images and their corresponding object labels to create new training examples. By blending images and their labels, the YOLO model learns to recognize common object features and better generalize across different classes.

This augmentation technique enhances the model’s ability to handle variations in object appearances and improves its overall performance in detecting objects with similar characteristics.

mixup: 0.0  # image mixup (probability)

Image Cutmix Augmentation

Image Cutmix augmentation; source : PyImageSearch

CutMix augmentation involves randomly selecting a portion of one image and pasting it onto another image while maintaining the corresponding object labels.

This technique encourages the YOLO model to learn from mixed and overlapping object instances, promoting better object boundary localization and enhancing the model’s ability to handle partial object views.

Conclusion

In conclusion, data augmentation serves as a valuable tool in simplifying and enhancing the training process of YOLO models, paving the way for more effective and accurate object detection in various practical applications.

By incorporating various augmentation methods, such as HSV augmentation, image angle/degree, translation, perspective transform, image scale, flip up-down, flip left-right, as well as more advanced techniques like Mosaic, CutMix, and MixUp, we can significantly improve the performance and robustness of YOLO models.

References

--

--

Ruman

Senior ML Engineer | Sharing what I know, work on, learn and come across :) | Connect with me @ https://www.linkedin.com/in/rumank/