Padding is the process of adding extra values, typically zeros, around the edges of an image input.

Problem Statement: In CNNs, convolution operations inevitably shrink feature maps, and the loss is concentrated at the image edges. This shrinkage complicates the design of deep networks, particularly when the outputs must be the same size as the input or larger. Without an effective padding strategy, essential details and the structural integrity of the image are lost, hindering the network's ability to analyze and learn from the full extent of the data, especially in deeper layers.

Case: Applying a 3*3 convolution filter to an h*w image shrinks the output feature map to (h-2)*(w-2), because pixels at the image edges lack enough surrounding neighbors for the kernel to be applied without exceeding the image bounds.

https://github.com/yyhtbs-yye/course_images/blob/main/3 (1).gif?raw=true
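To make the case concrete, here is a minimal sketch (assuming NumPy and SciPy are available; the image is random placeholder data) in which convolve2d's "valid" mode applies the kernel only where it fits entirely inside the image:

```python
import numpy as np
from scipy.signal import convolve2d

h, w = 64, 64
image = np.random.rand(h, w)           # placeholder h*w image
kernel = np.ones((3, 3)) / 9.0         # simple 3*3 average filter

# "valid" mode = no padding: the kernel never exceeds the image bounds
out = convolve2d(image, kernel, mode="valid")
print(image.shape, "->", out.shape)    # (64, 64) -> (62, 62)
```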

Analysis

When a convolution with a kernel of size k is applied to a one-dimensional input of length m, the resulting output length is m - k + 1. The formula arises because, as the kernel slides across the input from left to right, it starts aligned with the first element and stops where the kernel's last element aligns with the input's last element. The kernel therefore occupies only m - k + 1 distinct positions before it would extend past the end of the input sequence.
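The formula is easy to verify numerically. A quick sketch, assuming NumPy, where "valid" mode keeps only the positions at which the kernel fits entirely inside the input:

```python
import numpy as np

m, k = 10, 3
x = np.arange(m, dtype=float)          # length-m input
kernel = np.ones(k) / k                # length-k moving average

out = np.convolve(x, kernel, mode="valid")
print(len(out), "==", m - k + 1)       # 8 == 8
```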

This can be visualized in the diagram below: the kernel starts aligned with the first input element, producing the first output element. Each slide to the right yields one new output element, until the kernel's last element reaches the end of the input sequence; that final position produces the last output element.

[Figure: a kernel sliding across a one-dimensional input, producing one output element per position]

Therefore, the output feature map is shorter than the input by k - 1 elements. Sliding the kernel from its first position (kernel start aligned with the input's first element) to its last (kernel end aligned with the input's last element) covers m - k slides, and the +1 in the formula counts the initial position itself, giving m - k + 1 outputs in total.

The concept scales naturally to two dimensions: an h*w image convolved with a k*k kernel yields an output of (h-k+1)*(w-k+1). For example, applying a 3*3 kernel to a 64*64 image yields an output of size 62*62.
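The same reduction shows up in a deep-learning framework. A sketch assuming PyTorch: Conv2d's default padding=0 turns a 64*64 feature map into 62*62.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3)  # padding=0 by default
x = torch.randn(1, 1, 64, 64)                # (batch, channels, h, w)
y = conv(x)
print(tuple(x.shape), "->", tuple(y.shape))  # (1, 1, 64, 64) -> (1, 1, 62, 62)
```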

Solution

Logic: to compensate for the loss of k-1 pixels in both the horizontal and vertical dimensions, we can simply pad the image in advance, typically on all sides. For a kernel of size k=3, where k-1=2, we add one pixel of padding to the top, bottom, left, and right edges. A 64*64 image thus becomes 66*66; convolving the padded image with a 3*3 kernel loses two pixels in each dimension, producing a 64*64 output that matches the size of the original input. In general, padding of (k-1)/2 pixels on each side preserves the input size for any odd k.
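A sketch of the fix, again assuming PyTorch: setting padding=1 (that is, (k-1)/2 for k=3) makes the 3*3 convolution size-preserving.

```python
import torch
import torch.nn as nn

# one pixel of padding on every side compensates for the k-1 = 2 lost pixels
conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, padding=1)
x = torch.randn(1, 1, 64, 64)
y = conv(x)
print(tuple(x.shape), "->", tuple(y.shape))  # (1, 1, 64, 64) -> (1, 1, 64, 64)
```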

Methods: Several padding techniques exist; each extends the image's borders in a different way, such as filling with zeros, replicating the edge pixels, mirroring the border, or wrapping around, as illustrated below.

[Figure: illustration of common padding methods]
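These border-extension choices map directly onto NumPy's np.pad modes; a minimal sketch, assuming the methods of interest are zero ("constant"), replicate ("edge"), mirror ("reflect"), and circular ("wrap") padding:

```python
import numpy as np

image = np.arange(9).reshape(3, 3)     # tiny 3*3 example image
for mode in ("constant", "edge", "reflect", "wrap"):
    padded = np.pad(image, pad_width=1, mode=mode)
    print(mode, padded.shape)          # each mode pads 3*3 up to 5*5
    print(padded)
```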

Padding and Image Structure

To illustrate, let's compare images before and after convolution (using a simple average filter) with and without padding.

[Figure: a ring image after average filtering, with and without padding]

With padding, the convolution process better preserves the original structure of the image. Without padding, the overall shape of the ring may still be recognizable, but crucial details, such as the expected lower values at the outermost pixels, are lost, distorting the image structure. This distortion compounds across subsequent layers, biasing the analysis performed by deeper layers.
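The comparison can be reproduced with a synthetic ring. A sketch assuming NumPy/SciPy, where the ring image is a hypothetical stand-in for the figure above; "valid" mode drops the borders, while "same" mode zero-pads so the output matches the input size:

```python
import numpy as np
from scipy.signal import convolve2d

# synthetic ring image (hypothetical stand-in for the figure above)
yy, xx = np.mgrid[:64, :64]
r = np.hypot(yy - 32, xx - 32)
ring = ((r > 18) & (r < 26)).astype(float)

kernel = np.ones((5, 5)) / 25.0        # simple 5*5 average filter

no_pad = convolve2d(ring, kernel, mode="valid")  # shrinks to 60*60, borders dropped
padded = convolve2d(ring, kernel, mode="same")   # zero-padded, stays 64*64
print(no_pad.shape, padded.shape)      # (60, 60) (64, 64)
```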