Ultimate Guide to Fine-Tuning in PyTorch: Part 3 — Deep Dive into PyTorch Data Transforms with Examples

Ruman
Nov 6, 2023

If you’ve ever been involved in fine-tuning a PyTorch model, you’ve likely encountered PyTorch’s built-in transformation functions, which make data augmentation a breeze. And even if you haven’t had prior experience with these functions, fear not. In this article, we will take an in-depth look into the world of PyTorch’s transformation functions. We’ll explore the various transformations at your disposal and how they can help in training a robust model.


This article is the third part of the Ultimate Guide to Fine-Tuning in PyTorch series, with each part focusing on different aspects of fine-tuning models. Find all the articles here:

Ultimate Guide to Fine-Tuning in PyTorch


Outline

  • Deep Dive into PyTorch Transform Functions
  • Geometric Transforms
  • Photometric Transforms
  • Conversion Transforms
  • Composition Transforms
  • Bonus Points
  • Conclusion

Deep Dive into PyTorch Transform Functions


What the heck is a PyTorch transform function?

Transform functions are part of the torchvision library and make it easy to apply different data augmentation techniques to your input data. These functions allow you to apply one or more transformations at the same time.

You can find the official PyTorch documentation here: https://pytorch.org/vision/main/transforms.html

Please Note — PyTorch recommends using the torchvision.transforms.v2 transforms instead of those in torchvision.transforms.
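
If you’ve been using the older API, migrating is often just a matter of swapping the import, since the v2 transforms are largely drop-in compatible. A quick sketch:

# old-style transforms
from torchvision import transforms
resize_v1 = transforms.Resize((224, 224))

# recommended v2 transforms, largely a drop-in replacement
from torchvision.transforms import v2
resize_v2 = v2.Resize((224, 224))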

Here’s an example script that reads an image and uses PyTorch Transforms to change the image size:

from torchvision.transforms import v2
from PIL import Image

# Load the image
image = Image.open('your_image.jpg')  # Replace 'your_image.jpg' with the path to your image file

# Define a transformation
transform = v2.Compose([
    v2.Resize((256, 256)),  # Resize the image to 256x256 pixels
    v2.ToTensor(),          # Convert the image to a PyTorch tensor
])

# Apply the transformation to the image
transformed_image = transform(image)

In the example above, two transformations are applied to the input image. First, it gets resized, and second, it’s converted into a tensor. These two changes are combined together, and we can do the same thing with other kinds of transformations, applying them one after the other.

Now, let’s delve into this, and your focused attention is much appreciated here. 😄

We can categorise the different transform functions into five broad categories:

  1. Geometric Transforms
  2. Photometric Transforms
  3. Conversion Transforms — converting input data from one type to another
  4. Composition — combining one or more transformations together
  5. Other Transforms — ones that don’t fit neatly into any of the above

We’ll look at each of these categories in detail.

Geometric Transforms


As the name implies, geometric transformations are operations that alter the geometry of an image without modifying the pixel values themselves. The pixel values remain mostly constant during these transformations, but their positions within the image change.

Why are these Geometric transformations necessary?

Geometric transformations enable the representation of data in various forms, providing diverse perspectives on the data and enhancing the robustness of deep learning models. Incorporating geometric transformations in the input data enhances the model’s resilience to geometric variations.

Here are the various geometric transformations available in PyTorch :

Resize

As the name suggests, this resizes the image to a given size. We can specify an exact target size, or a range from which a size is drawn at random for better scale variance.

# To resize the input image to a specified size
img = v2.Resize((300, 300))(orig_img)

# To resize the input randomly within a given range; note RandomResize takes (min_size, max_size)
min_range, max_range = 300, 400
img = v2.RandomResize(min_range, max_range)(orig_img)
Images resized to different sizes with v2.Resize()

Rotation

This applies a rotation of a given angle to the input image. By exposing the model to rotated images during training, you teach it to be orientation-invariant, meaning the model should recognize objects or features regardless of their orientation. There are different rotation transformations you can apply.

# To apply rotation to the input image with a specified angle
angle = 140
img = v2.functional.rotate(orig_img, angle)

# To rotate the image randomly within a given range; note max_range > min_range
max_range, min_range = 180, 0
img = v2.RandomRotation(degrees=(min_range, max_range))(orig_img)
Different angle of rotation applied to the input image.

You can either set a fixed angle for rotation or a range (min and max) to rotate the image randomly. Random rotation is mainly used to add more variety to your data while training the model.

Flip

Flipping is very similar to rotation, but instead of rotating the image by an arbitrary angle, it mirrors the image horizontally or vertically, i.e., flips it about the y-axis or x-axis.

# to apply a left-to-right flip
img = v2.functional.horizontal_flip(orig_img)
# to apply a top-to-bottom flip
img = v2.functional.vertical_flip(orig_img)

# to apply these flips randomly to the input image
# here p is the probability; if p=1 the flip is applied to every input image during training
img = v2.RandomHorizontalFlip(p=1)(orig_img)
img = v2.RandomVerticalFlip(p=1)(orig_img)
Horizontal and vertical transforms applied to the input image.

When you flip images randomly, you use p to say how often it should happen. If p is 0.5 (i.e., 50%), roughly half the images in a batch get flipped. We mainly use random flipping to make our data more diverse when training the model.

Here are a few instances where Rotation and flip can be beneficial:

  • Face Recognition: When training a face recognition model, flipping images horizontally can help the model recognize faces regardless of whether they are turned to the left or right.
  • Pedestrian Detection: Autonomous vehicles need to detect pedestrians in different orientations, such as walking, facing away, or at an angle. Rotation augmentation helps the model adapt to these variations.
  • Text Orientation Detection: When analyzing documents, the orientation of the text may vary. Rotation augmentation helps detect and correct the orientation, making it useful for OCR tasks.
  • and a lot more ..

Padding

Padding adds extra pixels around the image borders to increase the image dimensions. This can be done to ensure images have a consistent size or aspect ratio, to avoid losing information near the edges, or to prepare images for a specific model architecture.

# If a single int is provided, it is used to pad all borders
pad_pixels = 100
img = v2.Pad(padding=pad_pixels)(orig_img)

# If a sequence of length 2 is provided, it is the padding on left/right and top/bottom respectively
left_right, top_bottom = 100, 150
img = v2.Pad(padding=(left_right, top_bottom))(orig_img)

# If a sequence of length 4 is provided, it is the padding for the left, top, right and bottom borders respectively
left, top, right, bottom = 100, 150, 100, 150
img = v2.Pad(padding=(left, top, right, bottom))(orig_img)
Different amounts of padding applied to the input image.
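
Pad also accepts a fill value and a padding_mode (constant, edge, reflect, or symmetric). A quick sketch of both, with parameter values chosen purely for illustration:

# pad with white pixels instead of the default black
img = v2.Pad(padding=50, fill=(255, 255, 255))(orig_img)

# pad by mirroring the image content at the borders
img = v2.Pad(padding=50, padding_mode='reflect')(orig_img)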

Crop

These PyTorch transform functions help you crop the image to a desired size, at a fixed or random location.

# the crop function expects the exact coordinates top, left, height, width of the crop region
top, left, height, width = 200, 50, 300, 300
img = v2.functional.crop(orig_img, top, left, height, width)

# We can also apply a crop of the specified size at a random location on the input image
height, width = 300, 300
img = v2.RandomCrop((height, width))(orig_img)

# We can also crop the input at the centre with the specified crop size
height, width = 300, 300
img = v2.CenterCrop((height, width))(orig_img)
Different type of crop transforms applied to the input image.

RandomResizedCrop

This is another type of crop transform, but it operates in a unique way: it starts by randomly selecting a region of the image and then resizes that region to a specified size.

# randomly selects the region to crop then applies specified size to resize the crop
crop_size = (150, 150)
img = v2.RandomResizedCrop(size=crop_size)(orig_img)
Randomly cropping and resizing of size (w, h) applied on the input image.

This is a very useful augmentation and was popularly used to train the Inception networks.
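
Under the hood, RandomResizedCrop picks a crop area and aspect ratio from configurable ranges. The sketch below spells out the library defaults, which is why training pipelines often just pass the target size:

# scale is the fraction of the original area to crop; ratio is the aspect-ratio range of the crop
img = v2.RandomResizedCrop(size=(224, 224), scale=(0.08, 1.0), ratio=(3/4, 4/3))(orig_img)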

Here are a few instances where adding cropping augmentation can be beneficial:

  • In object detection tasks : Cropping can create training samples with different object scales and positions, helping the model learn to detect objects in various contexts.
  • In Scene Understanding task : Random crop augmentation can introduce variability in the composition of the scene, aiding the model in recognizing different environmental elements.
  • When classifying fine-grained categories, random cropping highlights specific object details, making the model more capable of distinguishing between similar categories, such as different bird species.
  • and a lot more ..

Perspective Transform

By applying this transformation to a 2D image, you can create new training samples that simulate different perspectives and viewpoints (in 3D space).

# apply a random perspective transform on an input image
img = v2.RandomPerspective(distortion_scale=0.5, p=1.0)(orig_img)
RandomPerspective with distortion_scale=0.6, p=0.9, fill=(0,0,0)

Now, let’s look at different parameters passed to RandomPerspective:

  • distortion_scale: This parameter controls the extent of perspective transformation on an image. A distortion_scale of 0.2 signifies that the image might experience a random perspective shift of up to 20% of its width and height. This shift alters the image’s perspective, resulting in a skewed appearance.
  • p: The parameter p is set to 1.0, indicating a 100% probability of applying the perspective transform. In general, p is the probability that the transformation is applied to a given image; when p is 0, the transformation is never applied.

Here are a few examples where adding random perspective transform to augmentation can be beneficial :

  • Perspective transform can mimic lens distortion or simulate the way objects appear in a fish-eye camera, enhancing a model’s ability to handle real-world camera distortions.
  • For facial expression recognition, perspective transform can introduce variations in facial images, simulating different head angles and perspectives.
  • In handwriting recognition tasks, perspective transform can be used to augment training data by simulating how handwritten text may appear when viewed at different angles.
  • and a lot more ..

Affine Transform

This allows you to apply various affine transformations, like rotation, translation, scaling, and shearing, to an image. It differs from plain rotation in that it takes other parameters into consideration, not just the angle of rotation.

The best part about this particular augmentation is that it can introduce translation, scaling, and rotation all together. You’ll just have to be extra careful with the parameter values passed to the affine function.

# apply random affine transform on the input image
img = v2.RandomAffine(degrees=(30, 70), translate=(0.0, 0.1), scale=(0.5, 0.7))(orig_img)
RandomAffine with degrees=(30, 70), translate=(0.0, 0.1), scale=(0.5, 0.7)

Now, let’s look at different parameters passed to RandomAffine:

degrees

  • degrees=(30, 70): This indicates the allowable range for image rotation, where 30 is the minimum and 70 the maximum degree. A rotation angle is selected uniformly at random from the range (30, 70) and applied to the image.
  • If degrees=0, rotation is deactivated and no rotation is applied.

translate

  • translate=(0.0, 0.1): This parameter governs the extent of random image translation or shifting. The values are expressed as a fraction of the image’s width and height. In this case, the image may experience random shifts of up to 10% of its width or height in both horizontal and vertical directions.
  • By default, no translation is applied to the image.

scale

  • scale=(0.5, 0.7): This parameter defines the permissible range for random image scaling, enabling each image to be randomly resized to a factor between 0.5 and 0.7. This corresponds to adjusting the image’s dimensions to a size between 50% and 70% of its original dimensions.
  • By default it keeps the original image scale.

Since RandomRotation already gives you rotation, you can use RandomAffine to achieve just scaling and translation.
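
For example, here’s a minimal sketch that applies only translation and scaling, with rotation disabled:

# degrees=0 disables rotation; shift by up to 10% and rescale to 80-120%
img = v2.RandomAffine(degrees=0, translate=(0.1, 0.1), scale=(0.8, 1.2))(orig_img)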

Instances where image translation & scaling transform can be beneficial:

  • In object detection tasks, image translation and scaling can create diverse training samples with objects at different positions and sizes, helping the model learn to detect objects in various contexts and scales.
  • In content-based image retrieval, translating and scaling images can create diverse query images for retrieval, allowing the model to find relevant matches in large image databases.
  • For handwriting recognition models, image translation and scaling can simulate variations in the placement and size of handwritten text, helping the model generalize better.
  • and a lot more ..

Elastic Transform

This is like image morphing and produces a very interesting see-through-water-like effect.

Elastic transformations simulate image deformations like stretching and warping, enhancing a model’s resilience to such changes in the real world. They have multiple applications, but the alpha value should be chosen wisely to avoid overdoing it.

# apply elastic transform on the input image
img = v2.ElasticTransform(alpha=250)(orig_img)
Elastic transform applied to input image with different alpha values

Photometric Transforms


Photometric transformations alter the photometric properties of an image by modifying the pixel values while preserving the image geometry. They include a wide range of transformations affecting lighting, colour, texture, and so on.

Why are these Photometric transformations necessary?

To make our vision models more robust and generalizable, we can apply photometric transformations. The broader range of variation in colour and lighting that these transformations introduce results in models performing better in real-world scenarios, where images may not always conform to ideal lighting and colour standards.

Here are the various photometric transformations available in PyTorch :

Before we proceed, let’s explore ColorJitter, a versatile transformation that enables the application of Brightness, Hue, Saturation, and Contrast adjustments to the input image using a single transform function.

Also your attention to ColorJitter is valued 🙂

ColorJitter

This transform function applies random photometric transformations, i.e., brightness, hue, saturation, and contrast, based on a defined range. First, let’s take a quick look at the code, and afterward we’ll dive into a thorough explanation of each transformation.

# apply the ColorJitter transformation to the input image
img = v2.ColorJitter(brightness=0.4,
                     contrast=0.5,
                     saturation=0.4,
                     hue=0.3)(orig_img)
ColorJitter applied to the image with ColorJitter(brightness=0.2, hue=0.3, saturation=0.4, contrast=0.5)

Each of these arguments, i.e., brightness, hue, saturation, and contrast, can accept values as a single float (e.g., 0.4) or as a tuple specifying a range (e.g., (0.2, 0.6)). A single float is expanded into a range internally (explained below), while a tuple defines the min and max values explicitly.

Here are some key points to keep in mind about ColorJitter:

  • Default Values: The default values for brightness, hue, saturation, and contrast are all set to 0. This means that if no specific values are provided, these color adjustments will not be applied, and the image will remain unchanged in terms of these properties.
  • Transformation Order: When all the arguments are set, transformations are applied in a specific order. First, brightness changes are made, followed by contrast, then saturation, and finally, hue changes are applied to the input image. This order ensures a consistent sequence of color transformations.

Brightness Transform

Brightness changes the overall lightness of the image. It is done by scaling all pixel values: increasing brightness makes the image lighter, and reducing it makes the image darker. Let’s have a look at the code.

# apply ColorJitter to alter image brightness

# to only apply brightness changes, don't specify the other parameters
# (or set them to zero)
img = v2.ColorJitter(brightness=(0.1, 1))(orig_img)
Image brightness altered with ColorJitter → v2.ColorJitter(brightness=(0.1, 1)) applied to the image

Brightness should be a non-negative number, given either as a single float or a tuple of floats, with values in the range (0, inf).

The explanation below about fixed values and tuples in the context of brightness is crucial, and it applies similarly to both saturation and contrast.

If brightness is a single fixed float value, i.e., brightness=0.4:

  • ColorJitter internally creates a range, and from that range brightness_factor is chosen uniformly at random.
  • For example, with v2.ColorJitter(brightness=0.4)(orig_img), the range is calculated as [max(0, 1 - brightness), 1 + brightness] = [0.6, 1.4].
  • So in this case brightness_factor is chosen uniformly at random from the range [0.6, 1.4] for every image and applied to the input image.

If brightness is a tuple of float values, i.e., brightness=(0.1, 1):

  • If a tuple is passed, index 0 is taken as the min and index 1 as the max.
  • The max value should be greater than the min value, i.e., max > min.
  • brightness_factor is chosen uniformly from the given (min, max) range, in this case (0.1, 1).
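
To make the two forms concrete, here’s a minimal sketch; both calls below sample brightness_factor from the same range:

# single value: factor drawn uniformly from [max(0, 1 - 0.4), 1 + 0.4] = [0.6, 1.4]
img = v2.ColorJitter(brightness=0.4)(orig_img)

# tuple: the equivalent explicit range
img = v2.ColorJitter(brightness=(0.6, 1.4))(orig_img)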

Instances where altering image brightness can be beneficial during model training:

  • Weather monitoring may involve analyzing satellite or radar images with varying brightness levels due to cloud cover, sun angles, or precipitation. Brightness transformation helps models adapt to these variations.
  • In quality control and defect detection, adjusting image brightness can simulate variations in lighting on the factory floor, helping models identify defects under different lighting conditions.
  • Surveillance cameras often capture footage under varying lighting conditions. Image brightness transformation can help train models to detect objects and individuals in different levels of illumination.

Contrast Transform

Contrast is the difference between the darkest and lightest parts of an image. Adjusting contrast changes the spread of pixel values. Increasing contrast makes the dark areas darker and the light areas lighter, enhancing the overall visual difference.

# alter only image contrast
img = v2.ColorJitter(contrast=(0, 10))(orig_img)
Image contrast altered with ColorJitter → v2.ColorJitter(contrast=(0, 10)) applied to the image

Contrast should be a non-negative number, given either as a single float or a tuple of floats, with values in the range (0, inf).

Please Note — Revisit the brightness section above for a better understanding of how contrast_factor is calculated.

Whether we provide a single value contrast=0.4 or a tuple contrast=(0, 10), the contrast_factor applied to the image is chosen from the resulting range in the same way as brightness_factor.

Instances where changing image contrast and adding it as an augmentation during training can be beneficial:

  • In remote sensing and satellite image analysis, image contrast transformation can reveal important features on the Earth’s surface, such as land cover types, water bodies, and urban areas.
  • For art style recognition, adjusting image contrast can bring out unique artistic features and characteristics, enabling models to identify different art styles.
  • In underwater imaging, contrast transformation can improve the visibility of marine life, coral reefs, and underwater terrain, enhancing the accuracy of species identification and environmental monitoring

Saturation Transform

Saturation adjustments control the intensity of colours in an image. Increasing saturation makes colours more vibrant, while decreasing it makes colours more muted. This can help the model be more robust and perform well regardless of colour vibrancy.

# alter just the image saturation
img = v2.ColorJitter(saturation=(0.1, 1))(orig_img)
Image saturation altered with ColorJitter → v2.ColorJitter(saturation=(0.1, 1)) applied to the image

Saturation should be a non-negative number, given either as a single float or a tuple of floats, with values in the range (0, inf).

Please Note — Visit the brightness section above for a better understanding of how saturation_factor is calculated.

Whether we provide a single value saturation=0.4 or a tuple saturation=(0.1, 1), the saturation_factor applied to the image is chosen the same way as brightness_factor.

Instances where image saturation augmentation can be beneficial:

  • In agriculture and plant science, image saturation transformation can help models identify healthy and stressed vegetation by enhancing color differences in plant images.
  • For pest detection in agriculture, images may vary in color and lighting based on weather and time of day. Saturation augmentation ensures that the model can identify pests under different conditions.

Hue Transform

Hue adjustment shifts the colours in the image along the colour wheel without changing the image’s brightness or saturation. It can be used to change the overall colour tone of an image, which makes it useful as an augmentation in various problems.

# Create a color jitter transform to apply only hue changes
# Here, setting the hue range from 0.1 (minimum) to 0.4 (maximum)
img = v2.ColorJitter(hue=(0.1, 0.4))(orig_img)
Hue transform applied on input image with ColorJitter →v2.ColorJitter(hue=(0.1, 0.4)) applied to the image

Few important points to consider to alter image Hue property:

  • We can pass hue a single value or a tuple, but hue values should be between (-0.5, 0.5).
  • If a single value is provided, i.e., hue=0.2, then hue_factor is chosen uniformly from the range (-0.2, 0.2).
  • If a tuple is passed, index 0 is taken as the min hue and index 1 as the max hue, where min ≥ -0.5 and max ≤ 0.5. hue_factor is chosen uniformly from the range (min, max).
  • To alter image hue, the pixel values of the input image should be non-negative for conversion to HSV space.
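
A quick sketch of both forms:

# single value: hue_factor drawn uniformly from (-0.2, 0.2)
img = v2.ColorJitter(hue=0.2)(orig_img)

# tuple: shift anywhere within the permitted range (-0.5, 0.5)
img = v2.ColorJitter(hue=(-0.5, 0.5))(orig_img)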

Instances where changing the colour tone with the hue transform can be beneficial:

  • In tasks like facial recognition, people may appear in different skin tones due to various lighting conditions. Training with hue-adjusted images allows the model to recognize individuals under a range of skin tones.
  • For flower species identification, image hue transformation can emphasize the variations in flower color patterns, aiding models in distinguishing between different species.
  • For fruit ripeness detection, image hue transformation can simulate changes in fruit color as they ripen, allowing the model to identify ripe and unripe fruits accurately.

Sharpness

Adjusts the sharpness of the image with a given probability.

# adjust the image sharpness
img = v2.RandomAdjustSharpness(sharpness_factor=10, p=0.8)(orig_img)
Image sharpness altered with v2.RandomAdjustSharpness(sharpness_factor=10, p=0.8)

Here, p is the probability (the default value is 0.5), and sharpness_factor can be any non-negative number.
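
As a rough guide to what sharpness_factor means, here’s a sketch using the functional API: 0 gives a blurred image, 1 returns the original, and 2 doubles the sharpness.

# deterministic sharpness adjustment via the functional API
img = v2.functional.adjust_sharpness(orig_img, sharpness_factor=0.0)  # blurred
img = v2.functional.adjust_sharpness(orig_img, sharpness_factor=1.0)  # unchanged
img = v2.functional.adjust_sharpness(orig_img, sharpness_factor=2.0)  # sharpened 2x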

GaussianBlur Transform

Gaussian blur is a smoothing operation. It blurs the image by introducing a controlled amount of smoothing, averaging the pixel values within a local neighbourhood. This can help reduce noise and fine-grained detail in the training data, making the model less sensitive to noise in real-world data.

# Apply blur to the input image with GaussianBlur
img = v2.GaussianBlur(kernel_size=(11, 21), sigma=(5, 50))(orig_img)

kernel_size and sigma together determine the extent of smoothing or blurring in the image. Adjusting these parameters allows you to fine-tune the Gaussian blur effect and achieve the desired level of smoothing in your images.

sigma :

  • sigma controls the standard deviation of the Gaussian distribution used for the blur. It determines the amount of smoothing or blurring applied to the image.
  • sigma can be a single fixed value or a tuple (min_sigma, max_sigma).
  • If sigma=0.5, a single value, then sigma is fixed; if a tuple (min_sigma, max_sigma) is given, sigma is chosen uniformly at random from the range (min_sigma, max_sigma).
  • A larger sigma value results in a broader and more intense blur.

kernel_size :

  • kernel_size determines the size of the convolution kernel used to perform the Gaussian blur operation.
  • kernel_size should be an odd, positive number. It defines the width and height of the kernel.
  • A larger kernel_size means a wider area of the image is considered when blurring, which results in a smoother and more pronounced blur effect.
  • The size of the Gaussian kernel defined by kernel_size controls how the blur spreads over a region of the image.
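
A minimal sketch of the two ways to specify sigma (remember that kernel sizes must be odd):

# fixed sigma
img = v2.GaussianBlur(kernel_size=9, sigma=2.0)(orig_img)

# sigma chosen uniformly at random from (0.1, 5.0) for every image
img = v2.GaussianBlur(kernel_size=9, sigma=(0.1, 5.0))(orig_img)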

Instances where image blurriness augmentation can be beneficial:

  • In traffic monitoring, image blur transformation can simulate the motion blur of moving vehicles. This helps models estimate vehicle speeds and monitor traffic flow.
  • In applications such as autonomous vehicles, blur can simulate poor weather conditions, enhancing the model’s ability to navigate and make decisions under adverse situations.

Solarize Transform

Solarize is an operation that inverts the pixel values in an image beyond a certain threshold. In other words, it flips the brightness values so that dark regions become light, and light regions become dark. This effect is applied to pixel values above a specified threshold.

The result of solarizing an image is a striking and often surreal appearance. It can create visually interesting and high-contrast effects, with dark areas taking on a glowing or “solarized” quality. Solarization can be used for artistic and creative image manipulations.

# Solarize the input image
img = v2.RandomSolarize(threshold=5.0, p=1.0)(orig_img)
Solarize transform applied to the input image with v2.RandomSolarize(threshold=5.0)(orig_img)

The threshold is a pixel value, so threshold=5.0 means all pixel values at or above 5 are inverted.
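
Keep in mind that the threshold lives in the same units as your pixel values. A sketch, assuming a standard 8-bit image with values in 0-255:

# invert only the bright half of the image
img = v2.RandomSolarize(threshold=128, p=1.0)(orig_img)
# for float tensors scaled to [0, 1], a comparable threshold would be 0.5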

Conversion Transforms


Conversion transformations differ significantly from the previous two; they serve more as utility transforms rather than augmentations. With these transformations, we can switch between PIL images and tensors or alter the input data type as needed.

Furthermore, I aim to keep this section concise, so I apologize if I’ve missed any crucial details.🙂

Please Note — Refer to the documentation at https://pytorch.org/vision/main/transforms.html#range-and-dtype before applying conversion transforms, as some of these conversions may scale the values, e.g., from uint8 to float and vice versa.

Here are the various Conversion transformations available in PyTorch:

# Sample PIL image
pil_image = Image.open("sample_image.jpg")

PIL Image to Tensor — This converts the PIL image into a tensor of the same data type, without affecting the pixel values. Let’s look at the code:

# Convert PIL image to a PyTorch tensor
tensor_image = v2.PILToTensor()(pil_image)

Tensor to PIL Image — This converts the input ndarray or tensor to a PIL image. Let’s look at the code:

# Convert the tensor back to a PIL image
tensor_to_pil = v2.ToPILImage()(tensor_image)

There are a couple more; please refer to the documentation for more details.
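
For instance, newer torchvision releases add v2.ToImage and v2.ToDtype, which the v2 docs recommend over ToTensor. A sketch, assuming torchvision >= 0.16:

import torch

# uint8 image tensor in [0, 255] -> float32 tensor scaled to [0, 1]
img_tensor = v2.ToImage()(pil_image)
img_tensor = v2.ToDtype(torch.float32, scale=True)(img_tensor)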

Transforms Composition


Composition transforms combine multiple transform functions and apply them to the input image. When you use such a transformation pipeline on an image, it follows the order you’ve set for the transform functions, applying them one after the other. When creating a script to load data for training a model, you can specify a combination of transform functions to use on your input images. Let’s take a closer look at these functions.

The two most commonly used composition functions available in PyTorch are:

  • Compose
  • RandomApply

We’ll look at these two only, as they are the most relevant.

Compose

This basically allows you to create a sequence of data transformations that can be applied to your dataset, particularly image data. It lets you combine multiple data augmentations into a single transformation pipeline.

Let’s have a look at code.


# Define a list of transformations you want to apply to your data
transformations_list = [
    v2.Resize((224, 224)),  # Resize the image to a fixed size
    v2.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.2),  # Randomly adjust color
    v2.RandomHorizontalFlip(),  # Randomly flip the image horizontally
    v2.RandomRotation(15),  # Randomly rotate the image by up to 15 degrees
]


# Pass the list of transformations to Compose function
transformations = v2.Compose(transformations_list)

# Apply the transformations to the image
transformed_image = transformations(orig_img)

The Compose function takes a list of transformation functions as input. When you run an image through this pipeline, it applies each transformation one by one, in the order they’re listed.

Composed transformation applied to input image.

A few things to note:

  • If you’re using Compose for training or inference, you’ll have to convert the image to a PyTorch tensor with v2.ToTensor()
  • If needed, you can add an extra dimension to make it a batch of 1 with transformed_image = transformed_image.unsqueeze(0), as in the sketch below
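
Putting those two notes together, a minimal train-ready sketch might look like this:

train_transforms = v2.Compose([
    v2.Resize((224, 224)),
    v2.RandomHorizontalFlip(p=0.5),
    v2.ToTensor(),  # convert to a tensor so the model can consume it
])

batch = train_transforms(orig_img).unsqueeze(0)  # shape: (1, 3, 224, 224)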

RandomApply

I’m discovering this one for the first time myself, as I typically use Compose. This is also quite interesting. It functions similarly to Compose, with the key difference that the whole list of transforms is applied with a probability p: either every transform in the list runs, or none of them do. Now, let’s have a look at the code to see how it works.

# Define a list of transformations you want to apply to your data
transformations_list = [
    v2.Resize((224, 224)),  # Resize the image to a fixed size
    v2.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.2),  # Randomly adjust color
    v2.RandomHorizontalFlip(),  # Randomly flip the image horizontally
    v2.RandomRotation(15),  # Randomly rotate the image by up to 15 degrees
]

# Pass the list of transformations to the RandomApply function
transformations = v2.RandomApply(transformations_list, p=0.7)

# Apply the transformations to the image
transformed_image = transformations(orig_img)

The RandomApply function accepts a list of transformation functions as input, along with a probability score. When an input image is processed through this pipeline, the entire list of transformations is applied to it sequentially with probability p; otherwise the image passes through unchanged.
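
If you instead want each transform to carry its own independent probability, one option (a sketch) is to wrap each one in its own RandomApply inside a Compose:

transformations = v2.Compose([
    v2.RandomApply([v2.ColorJitter(brightness=0.2)], p=0.5),  # jitter colour half the time
    v2.RandomApply([v2.GaussianBlur(kernel_size=5)], p=0.3),  # blur 30% of the time
])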

Bonus Points


There are a few things that didn’t quite fit into the previous sections, but I’ll be addressing them here. I’m unsure whether these will earn any bonus points, but I’ll include them nonetheless. 😃

Augmentation is important

For any AI problem, no matter how much data you have, using augmentation is really important. It helps you find new possibilities in your data, like a magical boost.

If you’re not sure which augmentation to use for your problem, you can simply Google it. There are many research papers and surveys that might have already shown which augmentations work best for your problems or similar to your problems.

Know Your Augmentation

You shouldn’t just grab a random augmentation script and use it on your data without thinking. That’s not a good idea 😈.

You should be really sure about which augmentations to use. You should try different things, see how the augmentation changes the input image, and observe the image closely. You need to figure out the right parameter values for the transformation functions.

You should know why you’re choosing a particular augmentation and how it will make your model stronger and perform better.

From my own experience training models, I can say it’s better to have some augmentations, even very basic ones, than to have no augmentations at all when training your models.

Order of Transform / Augmentation

The order in which augmentations are applied to an image can be important and can impact the final result. In most cases, geometric transformations are applied before photometric transformations. This is because geometric transformations alter the image’s structure and layout, and applying photometric changes before geometric ones can lead to unexpected and unrealistic results.

However, the specific order may vary depending on your use case and the nature of the augmentations you’re applying. In some scenarios, it might make sense to experiment with different orders to see which one works best for your specific task.
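
For example, a typical pipeline following this geometric-then-photometric ordering might look like this sketch:

aug = v2.Compose([
    # geometric first
    v2.RandomResizedCrop(size=(224, 224)),
    v2.RandomHorizontalFlip(p=0.5),
    # photometric second
    v2.ColorJitter(brightness=0.2, contrast=0.2),
    v2.ToTensor(),  # finally, convert to a tensor for training
])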



Conclusion

In this in-depth exploration of PyTorch transform functions, we’ve covered Geometric Transforms for spatial manipulation, Photometric Transforms for visual variation, Conversion Transforms for switching between data types, and Composition Transforms for combining two or more transforms. With good knowledge of these functions and augmentations, we can enhance data quality and train more robust deep learning models, ensuring they are well prepared for real-world challenges.

Reference

https://pytorch.org/vision/main/transforms.html

Other mentions

This is a good article; I’d recommend giving it a read:

https://towardsdatascience.com/understanding-transformations-in-computer-vision-b001f49a9e61
