Doubling PyTorch Image Augmentation Speed [With Code]

Ruman
8 min read · Apr 18, 2024


Increase your image augmentation speed by more than 200% by using the Albumentations library in place of standard Torchvision transforms. This post benchmarks the two libraries, helping you optimize your deep learning data preprocessing and augmentation for faster training.

Outline

  • Introduction
  • Incorporating Albumentations into Your PyTorch Model Training Script
  • Benchmark Showdown: PyTorch vs. Albumentations Speed Comparison
  • Data Overview and Benchmark Scenarios
  • Augmentation Composition Testing
  • Conclusion

This article is part of an ongoing series “Fine-tuning PyTorch Models”:

Ultimate Guide to Fine-Tuning in PyTorch


Introduction

Image augmentation is a crucial step in many computer vision workflows, whether you’re training deep learning models or performing essential pre-processing tasks. As an integral part of the data pipeline, it’s vital to have a fast and reliable augmentation solution at your disposal.

So, if you've ever wondered:

Does Augmentation Speed Really Matter?

The answer is a resounding yes, especially when working with large datasets. If you’re running small experiments with fairly small data (under 50,000 images), the impact of augmentation speed may not be as pronounced. However, when dealing with datasets in the 100,000 to millions range, optimizing your training pipeline becomes crucial.

Image augmentation is a significant part of the training process for vision models, and if augmentation runs on the CPU during training, it can quickly become a bottleneck. This is where Albumentations helps you overcome the augmentation speed challenge.

The primary motivation for this article is to share this valuable information with those who are struggling with augmentation performance issues.

To get started with Albumentations, you can find the comprehensive documentation here:

Incorporating Albumentations into Your PyTorch Model Training Script

One of the best things about the Albumentations library is how seamlessly it integrates with existing PyTorch training scripts. The integration process is incredibly straightforward and clean. All you need to do is swap the Torchvision transforms with Albumentations, and you’re good to go.

For instance, here’s an example of a Torchvision transform:

from torchvision import transforms

torchvision_transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.RandomCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225],
    )
])

The equivalent Albumentations transform looks like this:

import albumentations as A
from albumentations.pytorch import ToTensorV2

albumentations_transform = A.Compose([
    A.Resize(256, 256),
    A.RandomCrop(224, 224),
    A.HorizontalFlip(),
    A.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225],
    ),
    ToTensorV2()
])

It’s important to note a few differences between the two approaches:

  • Input Format: Albumentations functions expect NumPy arrays as input, whereas Torchvision transforms work with PIL images. If you're using PIL images, you'll need to convert them to NumPy arrays before applying the Albumentations transform (see the snippet after this list).
  • Tensor Conversion: In the Torchvision transform, the ToTensor() operation happens before normalization, while in the Albumentations transform, the ToTensorV2() operation happens after normalization.
  • Tensor Differences: There’s a slight difference in the resulting torch tensor between the two approaches, which could be due to the difference in input format (PIL image vs. NumPy array) and the underlying libraries used for the conversion.
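To make the input-format difference concrete, here is a minimal sketch of running one image through both pipelines defined above (the image path example.jpg is a hypothetical placeholder):

import numpy as np
from PIL import Image

# Hypothetical image path, just for illustration
image = Image.open("example.jpg").convert("RGB")

# The Torchvision pipeline consumes the PIL image directly
torch_tensor = torchvision_transform(image)

# Albumentations expects a NumPy array and returns a dict keyed by "image"
image_np = np.array(image)
albu_tensor = albumentations_transform(image=image_np)["image"]

print(torch_tensor.shape, albu_tensor.shape)  # both should be (3, 224, 224)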

Benchmark Showdown: PyTorch vs. Albumentations Speed Comparison


The best way to evaluate the performance of Albumentations for image augmentation is to compare its speed against the Torchvision library. By benchmarking the two approaches, we can quantify the difference in processing time.

Benchmark Data and Device

Data Used

  • We'll be working with the Face Attributes Dataset from Kaggle. The dataset contains 24,000 images at 512x512 pixels, with a total size of approximately 1.8 GB.

Benchmarking Machine

  • I'm using a Google Colab CPU notebook with 2 cores and 12 GB of RAM for this benchmark.
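If you want to confirm what your own Colab runtime provides before benchmarking, a quick check like the following will do (this sketch assumes psutil is available, as it is on Colab by default):

import os
import psutil

# Quick sanity check of the benchmarking environment
print("CPU cores:", os.cpu_count())
print("RAM (GB):", round(psutil.virtual_memory().total / 1e9, 1))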

Augmentation Composition Testing

One of the key aspects of image augmentation is the composition of various transformations. To evaluate the performance of Torchvision and Albumentations in this regard, we will define a set of augmentation operations and apply them to both libraries.

Let's define augmentations for both Torchvision and Albumentations:

# IMAGE_SIZE is assumed to be defined earlier in the script
pytorch_transform = transforms.Compose([
    transforms.Resize((IMAGE_SIZE, IMAGE_SIZE)),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomRotation(degrees=30),
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.1),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

albumentations_transform = A.Compose([
    A.Resize(IMAGE_SIZE, IMAGE_SIZE),
    A.HorizontalFlip(p=0.5),
    A.Rotate(limit=30, p=0.5),
    A.HueSaturationValue(hue_shift_limit=20, sat_shift_limit=30, val_shift_limit=20, p=1),
    A.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225],
    ),
    ToTensorV2()
])

The same augmentation composition is used in both the single-image and batch-wise tests.

Single Image Augmentation

In this test, we will pass each of the 24,000 images in the dataset through the augmentation pipeline sequentially. This will allow us to measure the time taken to apply the full composition of transformations to a single image using both Torchvision and Albumentations.
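The loops below assume an image_path_list containing the paths of all 24,000 images. A minimal sketch of how it might be built (assuming the dataset was extracted to a hypothetical ./face_attributes directory) is:

from glob import glob

# Hypothetical dataset location; adjust to wherever you extracted the images
image_path_list = sorted(glob("./face_attributes/**/*.jpg", recursive=True))
print(len(image_path_list))  # expected: 24000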

Let's look at the code:

import time

import numpy as np
from PIL import Image
from tqdm import tqdm

# Torchvision: time each image and the full pass
tat_list_torch = []

start_time_main = time.time()
for img_path in tqdm(image_path_list):
    start_time = time.time()
    image = Image.open(img_path)

    # Apply transformations
    image = pytorch_transform(image)

    tat_list_torch.append(time.time() - start_time)
torch_24k_tat = time.time() - start_time_main

# Albumentations: time each image and the full pass
tat_list_albumentations = []

start_time_main = time.time()
for img_path in tqdm(image_path_list):
    start_time = time.time()
    image = Image.open(img_path)

    # Convert PIL image to numpy array
    image_np = np.array(image)

    # Apply transformations
    image = albumentations_transform(image=image_np)['image']

    tat_list_albumentations.append(time.time() - start_time)
albumentations_24k_tat = time.time() - start_time_main

The key points of the code are:

  • Torchvision Augmentation: We iterate through image_path_list and apply pytorch_transform to each image, recording the per-image time in tat_list_torch and the total time in torch_24k_tat.
  • Albumentations Augmentation: The same loop, except each PIL image is first converted to a NumPy array before applying albumentations_transform; per-image times go into tat_list_albumentations and the total time into albumentations_24k_tat.

Results:

Latency Comparison Table:

Single Image Augmentation Latency Comparison; Image by Author

Latency Comparison Graph:

The graph was smoothed using a moving average (a sketch of the smoothing is shown after the caption below).

Single Image Augmentation Latency Difference (Lower is better); Image by Author
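The smoothing itself is nothing exotic; a minimal sketch using a pandas rolling mean looks like this (the window size is an assumption, since the original window is not stated):

import pandas as pd

def smooth(values, window=100):
    # Moving-average smoothing applied to the latency curves
    return pd.Series(values).rolling(window=window, min_periods=1).mean()

smoothed_torch = smooth(tat_list_torch)
smoothed_albu = smooth(tat_list_albumentations)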

In summary:

  • Albumentations is significantly faster than Torchvision, with a performance improvement of around 240% (the sketch after this list shows how such a figure can be derived from the recorded totals).
  • It can provide substantial speed advantages over Torchvision when processing images sequentially.
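For reference, here is a minimal sketch of deriving a relative-speed figure from the two totals recorded above; the exact formula behind the reported percentage is not shown in the article, so treat this as an assumption:

# Relative throughput over the full 24k-image pass.
# The "~240%" figure is assumed to be this ratio expressed as a percentage.
speedup = torch_24k_tat / albumentations_24k_tat
print(f"Torchvision total    : {torch_24k_tat:.1f} s")
print(f"Albumentations total : {albumentations_24k_tat:.1f} s")
print(f"Relative speed       : {speedup * 100:.0f}% (higher means Albumentations is faster)")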

Batch-wise Augmentation

To simulate a more realistic training scenario, we will also benchmark the performance of the two libraries when applied to a batch of images. For this, we will create a custom dataset and load it into a PyTorch dataloader. By iterating through the dataloader and applying the augmentation to each batch, we can measure the time taken for batch-level augmentation using Torchvision and Albumentations.

Let's talk about the batch configuration:

  • The batch size is set to 64.
  • We ran 5 epochs, iterating through 375 batches in each epoch.
  • In total, we processed 1,875 batches.

Let’s look at the code:

Here is the code to evaluate the Torchvision augmentation.

import pandas as pd
from torch.utils.data import Dataset, DataLoader

# Let's define a custom Dataset class for our data
class PytorchCustomDataSet(Dataset):
    def __init__(self, csv_file, class_list, transform=None):
        self.df = pd.read_csv(csv_file)
        self.transform = transform
        self.class_list = class_list
        self.image_path_list = self.df.file_path.values
        self.image_label_list = self.df.label.values

    def __len__(self):
        return self.df.shape[0]

    def __getitem__(self, index):
        image = Image.open(self.image_path_list[index])
        label = self.class_list.index(self.image_label_list[index])

        if self.transform:
            image = self.transform(image)
        return image, label


pytorch_data_object = PytorchCustomDataSet("/content/bench_data.csv", CLASS_LABELS, pytorch_transform)
pytorch_dataloader = DataLoader(pytorch_data_object, batch_size=64, shuffle=True)


# Iterate through each batch and record the time taken to produce it
pytorch_epoch_tat = []
pytorch_batch_tat = []

for epoch in tqdm(range(TOTAL_EPOCHS)):

    epoch_start = time.time()
    batch_start = time.time()

    for data, target in tqdm(pytorch_dataloader):
        pytorch_batch_tat.append(time.time() - batch_start)
        batch_start = time.time()

    pytorch_epoch_tat.append(time.time() - epoch_start)

Here is the code to evaluate the Albumentations augmentation.

# Let's define a custom Dataset class for our data
class AlbumentationsCustomDataSet(Dataset):
    def __init__(self, csv_file, class_list, transform=None):
        self.df = pd.read_csv(csv_file)
        self.transform = transform
        self.class_list = class_list
        self.image_path_list = self.df.file_path.values
        self.image_label_list = self.df.label.values

    def __len__(self):
        return self.df.shape[0]

    def __getitem__(self, index):
        image = Image.open(self.image_path_list[index])
        label = self.class_list.index(self.image_label_list[index])

        if self.transform:
            # Convert PIL image to numpy array
            image_np = np.array(image)

            # Apply transformations
            augmented = self.transform(image=image_np)

            image = augmented['image']

        return image, label


albm_data_object = AlbumentationsCustomDataSet("/content/bench_data.csv", CLASS_LABELS, albumentations_transform)
albumentations_dataloader = DataLoader(albm_data_object, batch_size=64, shuffle=True)


album_epoch_tat = []
album_batch_tat = []

for epoch in tqdm(range(TOTAL_EPOCHS)):

    epoch_start = time.time()
    batch_start = time.time()

    for data, target in tqdm(albumentations_dataloader):
        album_batch_tat.append(time.time() - batch_start)
        batch_start = time.time()

    album_epoch_tat.append(time.time() - epoch_start)

The key points of the above code are:

  • Custom Dataset Classes: We have defined two custom dataset classes, PytorchCustomDataSet and AlbumentationsCustomDataSet, to load the image data and apply the respective augmentation transforms.
  • Batch Configuration: The batch size is set to 64, and the code iterates through 375 batches per epoch for a total of 1875 batches over 5 epochs.
  • Torchvision Augmentation: We measure the time taken for batch-level augmentation with the pytorch_transform pipeline.
  • Albumentations Augmentation: Likewise, we measure the time taken for batch-level augmentation with the albumentations_transform pipeline.
  • TAT Metrics: The code records the time taken for each batch and each epoch, storing the values in pytorch_batch_tat and pytorch_epoch_tat for Torchvision, and album_batch_tat and album_epoch_tat for Albumentations (a sketch for summarizing these lists appears after this list).
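For completeness, here is a minimal, assumed sketch of how the recorded per-batch and per-epoch times might be aggregated into the comparison table below (the author's exact aggregation code is not shown):

import numpy as np

def summarize(name, batch_times, epoch_times):
    # Print mean per-batch latency and mean per-epoch time from the recorded lists
    print(f"{name}:")
    print(f"  mean batch latency : {np.mean(batch_times) * 1000:.1f} ms")
    print(f"  mean epoch time    : {np.mean(epoch_times):.1f} s")

summarize("Torchvision", pytorch_batch_tat, pytorch_epoch_tat)
summarize("Albumentations", album_batch_tat, album_epoch_tat)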

Results:

Latency Comparison Table:

Batch-wise comparison table; Image by Author

Latency Comparison Graph:

The graph was smoothed using a moving average.

Batch-wise latency comparison graph (Lower is better); Image by Author

In summary:

  • Albumentations maintains a significant speed advantage over Torchvision, with a performance improvement of around 210% in the batch-wise test.
  • It can provide consistent speed benefits over Torchvision when handling batches of images, which is a common requirement during model training and evaluation.

Key Takeaways

Albumentations Outperforms Torchvision in Augmentation Speed

  • The benchmark results show that Albumentations is significantly faster than Torchvision in applying the specified augmentation composition, with performance improvements of over 200%.

Potential for Greater Speed with More Resources

  • The benchmark was conducted on a limited 2-core CPU setup. With access to more computational resources, the performance gap between Albumentations and Torchvision could potentially be even more pronounced (one common lever, parallel dataloader workers, is sketched below).
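If you do have more CPU cores available, a common way to put them to use (not part of the benchmark above, included here only as a suggestion) is to parallelize the dataloader workers:

# Spread augmentation across multiple CPU worker processes (not used in the benchmark above)
albumentations_dataloader = DataLoader(
    albm_data_object,
    batch_size=64,
    shuffle=True,
    num_workers=4,    # roughly one worker per available core
    pin_memory=True,  # speeds up host-to-GPU transfer when training on GPU
)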

Ease of Integration and Comprehensive Transformation Support

  • Albumentations offers a wide range of transformations (covering almost everything available in Torchvision) and integrates seamlessly into existing PyTorch training scripts, making it a flexible and convenient choice for image augmentation.

Conclusion

Albumentations is a highly efficient image augmentation library that outperforms the built-in Torchvision augmentation in terms of speed. The benchmarking results highlight the significant performance advantage of using Albumentations, making it a compelling choice for optimizing data augmentation in deep learning workflows. Its ease of integration with existing PyTorch training pipelines makes it an ideal choice.

If you enjoyed this article, your applause would be greatly appreciated!



Ruman

Senior ML Engineer | Sharing what I know, work on, learn and come across :) | Connect with me @ https://www.linkedin.com/in/rumank/