Doubling PyTorch Image Augmentation Speed [With Code]

Ruman
8 min read · Apr 18, 2024


Increase your image augmentation speed by more than 200% by using the Albumentations library in place of standard Torchvision transforms. This post benchmarks the two libraries, helping you optimize your deep learning data preprocessing and augmentation for faster training.

Outline

  • Introduction
  • Incorporating Albumentations into Your PyTorch Model Training Script
  • Benchmark Showdown: PyTorch vs. Albumentations Speed Comparison
  • Data Overview and Benchmark Scenarios
  • Augmentation Composition Testing
  • Conclusion

This article is part of an ongoing series “Fine-tuning PyTorch Models”:

Ultimate Guide to Fine-Tuning in PyTorch


Introduction

Image augmentation is a crucial step in many computer vision workflows, whether you’re training deep learning models or performing essential pre-processing tasks. As an integral part of the data pipeline, it’s vital to have a fast and reliable augmentation solution at your disposal.

So, if you've ever wondered:

Does Augmentation Speed Really Matter?

The answer is a resounding yes, especially when working with large datasets. If you’re running small experiments with fairly small data (under 50,000 images), the impact of augmentation speed may not be as pronounced. However, when dealing with datasets in the 100,000 to millions range, optimizing your training pipeline becomes crucial.

Image augmentation is a significant part of the training process for vision models, and if augmentation runs on the CPU during training, it can quickly become a bottleneck. This is where Albumentations helps you overcome the augmentation speed challenge.

The primary motivation for this article is to share this valuable information with those who are struggling with augmentation performance issues.

To get started with Albumentations, you can find the comprehensive documentation here:

Incorporating Albumentations into Your PyTorch Model Training Script

One of the best things about the Albumentations library is how seamlessly it integrates with existing PyTorch training scripts. The integration process is incredibly straightforward and clean. All you need to do is swap the Torchvision transforms with Albumentations, and you’re good to go.

For instance, here’s an example of a Torchvision transform:

from torchvision import transforms

torchvision_transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.RandomCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225],
    )
])

The equivalent Albumentations transform looks like this:

import albumentations as A
from albumentations.pytorch import ToTensorV2

albumentations_transform = A.Compose([
    A.Resize(256, 256),
    A.RandomCrop(224, 224),
    A.HorizontalFlip(),
    A.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225],
    ),
    ToTensorV2()
])

It’s important to note a few differences between the two approaches:

  • Input Format: Albumentations functions expect NumPy arrays as input, whereas Torchvision transforms work with PIL images. If you're using PIL images, you'll need to convert them to NumPy arrays before applying the Albumentations transform (see the snippet after this list).
  • Tensor Conversion: In the Torchvision transform, the ToTensor() operation happens before normalization, while in the Albumentations transform, the ToTensorV2() operation happens after normalization.
  • Tensor Differences: There’s a slight difference in the resulting torch tensor between the two approaches, which could be due to the difference in input format (PIL image vs. NumPy array) and the underlying libraries used for the conversion.
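To make the input-format difference concrete, here is a minimal sketch of running one image through both pipelines defined above (the image path example.jpg is a hypothetical placeholder):

import numpy as np
from PIL import Image

# Hypothetical image path, just for illustration
image = Image.open("example.jpg").convert("RGB")

# The Torchvision pipeline consumes the PIL image directly
torch_tensor = torchvision_transform(image)

# Albumentations expects a NumPy array and returns a dict keyed by "image"
image_np = np.array(image)
albu_tensor = albumentations_transform(image=image_np)["image"]

print(torch_tensor.shape, albu_tensor.shape)  # both should be (3, 224, 224)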

Benchmark Showdown: PyTorch vs. Albumentations Speed Comparison


The best way to evaluate the performance of Albumentations for image augmentation is to compare its speed against the Torchvision library. By benchmarking the two approaches, we can quantify the difference in processing time.

Benchmark Data and Device

Data Used

  • We'll be working with the Face Attributes Dataset from Kaggle. The dataset contains 24,000 images at 512x512 pixels, with a total size of approximately 1.8 GB.

Benchmarking Machine

  • I'm using a Google Colab CPU notebook with 2 cores and 12 GB of RAM for this benchmark.
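If you want to confirm what your own Colab runtime provides before benchmarking, a quick check like the following will do (this sketch assumes psutil is available, as it is on Colab by default):

import os
import psutil

# Quick sanity check of the benchmarking environment
print("CPU cores:", os.cpu_count())
print("RAM (GB):", round(psutil.virtual_memory().total / 1e9, 1))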

Augmentation Composition Testing

One of the key aspects of image augmentation is the composition of various transformations. To evaluate the performance of Torchvision and Albumentations in this regard, we will define a set of augmentation operations and apply them to both libraries.

Let's define augmentations for both Torchvision and Albumentations:

# IMAGE_SIZE is assumed to be defined earlier in the script
pytorch_transform = transforms.Compose([
    transforms.Resize((IMAGE_SIZE, IMAGE_SIZE)),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomRotation(degrees=30),
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.1),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

albumentations_transform = A.Compose([
    A.Resize(IMAGE_SIZE, IMAGE_SIZE),
    A.HorizontalFlip(p=0.5),
    A.Rotate(limit=30, p=0.5),
    A.HueSaturationValue(hue_shift_limit=20, sat_shift_limit=30, val_shift_limit=20, p=1),
    A.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225],
    ),
    ToTensorV2()
])

The same augmentation composition is used in both the single-image and batch-wise tests.

Single Image Augmentation

In this test, we will pass each of the 24,000 images in the dataset through the augmentation pipeline sequentially. This will allow us to measure the time taken to apply the full composition of transformations to a single image using both Torchvision and Albumentations.
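The loops below assume an image_path_list containing the paths of all 24,000 images. A minimal sketch of how it might be built (assuming the dataset was extracted to a hypothetical ./face_attributes directory) is:

from glob import glob

# Hypothetical dataset location; adjust to wherever you extracted the images
image_path_list = sorted(glob("./face_attributes/**/*.jpg", recursive=True))
print(len(image_path_list))  # expected: 24000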

Let's look at the code:

import time

import numpy as np
from PIL import Image
from tqdm import tqdm

# Torchvision: time each image and the full pass
tat_list_torch = []

start_time_main = time.time()
for img_path in tqdm(image_path_list):
    start_time = time.time()
    image = Image.open(img_path)

    # Apply transformations
    image = pytorch_transform(image)

    tat_list_torch.append(time.time() - start_time)
torch_24k_tat = time.time() - start_time_main

# Albumentations: time each image and the full pass
tat_list_albumentations = []

start_time_main = time.time()
for img_path in tqdm(image_path_list):
    start_time = time.time()
    image = Image.open(img_path)

    # Convert PIL image to numpy array
    image_np = np.array(image)

    # Apply transformations
    image = albumentations_transform(image=image_np)['image']

    tat_list_albumentations.append(time.time() - start_time)
albumentations_24k_tat = time.time() - start_time_main

The key points of the code are:

  • Torchvision Augmentation: We iterate through image_path_list and apply pytorch_transform to each image, recording the per-image time in tat_list_torch and the total time in torch_24k_tat.
  • Albumentations Augmentation: The same loop, except each PIL image is first converted to a NumPy array before applying albumentations_transform; per-image times go into tat_list_albumentations and the total time into albumentations_24k_tat.

Results:

Latency Comparison Table:

Single Image Augmentation Latency Comparison; Image by Author

Latency Comparison Graph:

The graph was smoothed using a moving average (a sketch of the smoothing is shown after the caption below).

Single Image Augmentation Latency Difference (Lower is better); Image by Author
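The smoothing itself is nothing exotic; a minimal sketch using a pandas rolling mean looks like this (the window size is an assumption, since the original window is not stated):

import pandas as pd

def smooth(values, window=100):
    # Moving-average smoothing applied to the latency curves
    return pd.Series(values).rolling(window=window, min_periods=1).mean()

smoothed_torch = smooth(tat_list_torch)
smoothed_albu = smooth(tat_list_albumentations)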

In summary:

  • Albumentations is significantly faster than Torchvision, with a performance improvement of around 240% (the sketch after this list shows how such a figure can be derived from the recorded totals).
  • It can provide substantial speed advantages over Torchvision when processing images sequentially.
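For reference, here is a minimal sketch of deriving a relative-speed figure from the two totals recorded above; the exact formula behind the reported percentage is not shown in the article, so treat this as an assumption:

# Relative throughput over the full 24k-image pass.
# The "~240%" figure is assumed to be this ratio expressed as a percentage.
speedup = torch_24k_tat / albumentations_24k_tat
print(f"Torchvision total    : {torch_24k_tat:.1f} s")
print(f"Albumentations total : {albumentations_24k_tat:.1f} s")
print(f"Relative speed       : {speedup * 100:.0f}% (higher means Albumentations is faster)")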

Batch-wise Augmentation

To simulate a more realistic training scenario, we will also benchmark the performance of the two libraries when applied to a batch of images. For this, we will create a custom dataset and load it into a PyTorch dataloader. By iterating through the dataloader and applying the augmentation to each batch, we can measure the time taken for batch-level augmentation using Torchvision and Albumentations.

Let's talk about the batch configuration:

  • The batch size is set to 64.
  • We ran 5 epochs, iterating through 375 batches in each epoch.
  • In total, we processed 1,875 batches.

Let’s look at the code:

Here is the code to evaluate the Torchvision augmentation.

import pandas as pd
from torch.utils.data import Dataset, DataLoader

# Let's define a custom Dataset class for our data
class PytorchCustomDataSet(Dataset):
    def __init__(self, csv_file, class_list, transform=None):
        self.df = pd.read_csv(csv_file)
        self.transform = transform
        self.class_list = class_list
        self.image_path_list = self.df.file_path.values
        self.image_label_list = self.df.label.values

    def __len__(self):
        return self.df.shape[0]

    def __getitem__(self, index):
        image = Image.open(self.image_path_list[index])
        label = self.class_list.index(self.image_label_list[index])

        if self.transform:
            image = self.transform(image)
        return image, label


pytorch_data_object = PytorchCustomDataSet("/content/bench_data.csv", CLASS_LABELS, pytorch_transform)
pytorch_dataloader = DataLoader(pytorch_data_object, batch_size=64, shuffle=True)


# Iterate through each batch and record the time taken to produce it
pytorch_epoch_tat = []
pytorch_batch_tat = []

for epoch in tqdm(range(TOTAL_EPOCHS)):

    epoch_start = time.time()
    batch_start = time.time()

    for data, target in tqdm(pytorch_dataloader):
        pytorch_batch_tat.append(time.time() - batch_start)
        batch_start = time.time()

    pytorch_epoch_tat.append(time.time() - epoch_start)

Here is the code to evaluate the Albumentations augmentation.

# Let's define a custom Dataset class for our data
class AlbumentationsCustomDataSet(Dataset):
    def __init__(self, csv_file, class_list, transform=None):
        self.df = pd.read_csv(csv_file)
        self.transform = transform
        self.class_list = class_list
        self.image_path_list = self.df.file_path.values
        self.image_label_list = self.df.label.values

    def __len__(self):
        return self.df.shape[0]

    def __getitem__(self, index):
        image = Image.open(self.image_path_list[index])
        label = self.class_list.index(self.image_label_list[index])

        if self.transform:
            # Convert PIL image to numpy array
            image_np = np.array(image)

            # Apply transformations
            augmented = self.transform(image=image_np)

            image = augmented['image']

        return image, label


albm_data_object = AlbumentationsCustomDataSet("/content/bench_data.csv", CLASS_LABELS, albumentations_transform)
albumentations_dataloader = DataLoader(albm_data_object, batch_size=64, shuffle=True)


album_epoch_tat = []
album_batch_tat = []

for epoch in tqdm(range(TOTAL_EPOCHS)):

    epoch_start = time.time()
    batch_start = time.time()

    for data, target in tqdm(albumentations_dataloader):
        album_batch_tat.append(time.time() - batch_start)
        batch_start = time.time()

    album_epoch_tat.append(time.time() - epoch_start)

The key points of the above code are:

  • Custom Dataset Classes: We have defined two custom dataset classes, PytorchCustomDataSet and AlbumentationsCustomDataSet, to load the image data and apply the respective augmentation transforms.
  • Batch Configuration: The batch size is set to 64, and the code iterates through 375 batches per epoch for a total of 1875 batches over 5 epochs.
  • Torchvision Augmentation: We measure the time taken for batch-level augmentation with the pytorch_transform pipeline.
  • Albumentations Augmentation: Likewise, we measure the time taken for batch-level augmentation with the albumentations_transform pipeline.
  • TAT Metrics: The code records the time taken for each batch and each epoch, storing the values in pytorch_batch_tat and pytorch_epoch_tat for Torchvision, and album_batch_tat and album_epoch_tat for Albumentations (a sketch for summarizing these lists appears after this list).
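For completeness, here is a minimal, assumed sketch of how the recorded per-batch and per-epoch times might be aggregated into the comparison table below (the author's exact aggregation code is not shown):

import numpy as np

def summarize(name, batch_times, epoch_times):
    # Print mean per-batch latency and mean per-epoch time from the recorded lists
    print(f"{name}:")
    print(f"  mean batch latency : {np.mean(batch_times) * 1000:.1f} ms")
    print(f"  mean epoch time    : {np.mean(epoch_times):.1f} s")

summarize("Torchvision", pytorch_batch_tat, pytorch_epoch_tat)
summarize("Albumentations", album_batch_tat, album_epoch_tat)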

Results:

Latency Comparison Table:

Batch-wise comparison table; Image by Author

Latency Comparison Graph:

The graph was smoothed using a moving average.

Batch-wise latency comparison graph (Lower is better); Image by Author

In summary:

  • Albumentations maintains a significant speed advantage over Torchvision, with a performance improvement of around 210% in the batch-wise test.
  • It can provide consistent speed benefits over Torchvision when handling batches of images, which is a common requirement during model training and evaluation.

Key Takeaways

Albumentations Outperforms Torchvision in Augmentation Speed

  • The benchmark results show that Albumentations is significantly faster than Torchvision in applying the specified augmentation composition, with performance improvements of over 200%.

Potential for Greater Speed with More Resources

  • The benchmark was conducted on a limited 2-core CPU setup. With access to more computational resources, the performance gap between Albumentations and Torchvision could potentially be even more pronounced (one common lever, parallel dataloader workers, is sketched below).
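If you do have more CPU cores available, a common way to put them to use (not part of the benchmark above, included here only as a suggestion) is to parallelize the dataloader workers:

# Spread augmentation across multiple CPU worker processes (not used in the benchmark above)
albumentations_dataloader = DataLoader(
    albm_data_object,
    batch_size=64,
    shuffle=True,
    num_workers=4,    # roughly one worker per available core
    pin_memory=True,  # speeds up host-to-GPU transfer when training on GPU
)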

Ease of Integration and Comprehensive Transformation Support

  • Albumentations offers a wide range of transformations (covering almost everything available in Torchvision) and integrates seamlessly into existing PyTorch training scripts, making it a flexible and convenient choice for image augmentation.

Conclusion

Albumentations is a highly efficient image augmentation library that outperforms the built-in Torchvision augmentation in terms of speed. The benchmarking results highlight the significant performance advantage of using Albumentations, making it a compelling choice for optimizing data augmentation in deep learning workflows. Its ease of integration with existing PyTorch training pipelines makes it an ideal choice.

If you enjoyed this article, your applause would be greatly appreciated!



Ruman

Senior ML Engineer | Sharing what I know, work on, learn and come across :) | Connect with me @ https://www.linkedin.com/in/rumank/