In this tutorial, we'll look at geometric transformations in image processing. We'll start by treating pixels not just as matrix elements but as 3-tuples that combine spatial coordinates with an intensity value. We'll then cover basic transformations such as translation and rotation, and touch on more advanced ones such as affine and perspective transformations.

In digital imaging, a pixel is typically thought of as the smallest addressable element in an image. For this tutorial, let’s consider a pixel not just as an intensity value in the 2D or 3D array but as a tuple $(x, y, z)$, where $x$ and $y$ denote its location (horizontal and vertical indices in the image matrix), and $z$ represents its value (such as grayscale intensity or color).
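To make the tuple view concrete, here is a minimal sketch that lists every pixel of a small grayscale array as an $(x, y, z)$ tuple. It assumes NumPy and a single-channel image; the array values and the helper name `image_to_tuples` are made up for illustration.

```python
import numpy as np

# A tiny 2x3 grayscale image (values chosen arbitrarily for illustration).
image = np.array([[10, 20, 30],
                  [40, 50, 60]], dtype=np.uint8)


def image_to_tuples(img):
    """Return one (x, y, z) tuple per pixel, scanning row by row.

    x is the column index, y is the row index, z is the pixel value.
    Hypothetical helper, not part of any library.
    """
    height, width = img.shape
    return [(x, y, int(img[y, x])) for y in range(height) for x in range(width)]


pixels = image_to_tuples(image)
for pixel in pixels:
    print(pixel)
```

Note that the array itself is still indexed as `img[y, x]` (row first), even though the tuple lists $x$ before $y$.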

Understanding the Image Coordinate System

In image processing, the coordinate system for images is typically defined with the origin $(0, 0)$ at the center of the top-left pixel of the image. Since the array is organized by height (H) first, followed by width (W), we express the coordinates as $(y, x)$ instead of $(x, y)$.

Coordinates increase downward and to the right, which positions the corners as follows:

Top-left corner: $(0,0)$
Top-right corner: $(0,W−1)$, where $W$ is the width of the image.
Bottom-left corner: $(H−1,0)$, where $H$ is the height of the image.
Bottom-right corner: $(H−1,W−1)$

This coordinate system is essential for correctly applying geometric transformations, as all operations will reference these positions.

For example, for an image containing $10 \times 10$ pixels, the coordinate system is shown in the accompanying figure, and the corners are located at:

Top-left corner: $(0,0)$
Top-right corner: $(0,9)$
Bottom-left corner: $(9,0)$
Bottom-right corner: $(9,9)$
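The corner positions can be checked with a short NumPy sketch. It assumes the image is stored as an array of shape `(H, W)` indexed as `image[y, x]`; the pixel values are filled with `y * W + x` purely so that each value reveals its own position.

```python
import numpy as np

H, W = 10, 10

# Each pixel value encodes its position: value = y * W + x (illustrative only).
image = np.arange(H * W).reshape(H, W)

# Corners expressed as (y, x), matching the row-first array layout.
corners = {
    "top_left": (0, 0),
    "top_right": (0, W - 1),
    "bottom_left": (H - 1, 0),
    "bottom_right": (H - 1, W - 1),
}

for name, (y, x) in corners.items():
    print(name, (y, x), image[y, x])
```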

NOTE: Although pixel coordinates are defined at the center of each pixel, the rendering of pixel values as colored blocks on a screen is handled automatically and is outside the scope of this tutorial.

embed (17).svg

Basic Geometric Transformations

Geometric transformations are about moving pixels around.

In the accompanying figure, a pixel is originally located at position ($y=2$, $x=1$). After applying the offset ($\Delta_y=3$, $\Delta_x=2$), the pixel moves to the new position ($y=5$, $x=3$). The offset shifts the pixel downward by 3 units and to the right by 2 units.

NOTE: The movement changes the pixel’s position but not its value, so the pixel at the new position $(5, 3)$ will have the same value as the one at the original position $(2, 1)$.

embed - 2024-08-18T221754.654.svg

Mathematically, this corresponds to the following equation:

$$ z_{\text{new}}[y_{\text{new}}, x_{\text{new}}]=z_{\text{old}}[y_{\text{old}}, x_{\text{old}}] $$
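The equation above can be tried on the single-pixel example. The sketch below assumes a $10 \times 10$ grayscale array and an arbitrary pixel value of 255; only the one pixel is moved, and the rest of the new image is left at zero.

```python
import numpy as np

# Old image: all zeros except one pixel at (y=2, x=1).
z_old = np.zeros((10, 10), dtype=np.uint8)
y_old, x_old = 2, 1
z_old[y_old, x_old] = 255  # arbitrary value chosen for illustration

# Offset from the example: 3 down, 2 to the right.
delta_y, delta_x = 3, 2
y_new, x_new = y_old + delta_y, x_old + delta_x

# The value is copied unchanged to the new position.
z_new = np.zeros_like(z_old)
z_new[y_new, x_new] = z_old[y_old, x_old]

print((y_new, x_new), z_new[y_new, x_new])
```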

In most cases, we know $z_{\text{old}}$, and we assume there is a transformation function

$$ f(y_{\text{old}}, x_{\text{old}}) = (y_{\text{new}}, x_{\text{new}}) \quad \text{or} \quad f: (y_{\text{old}}, x_{\text{old}}) \mapsto (y_{\text{new}}, x_{\text{new}}) $$

that acts on the pixel's old position $(y_{\text{old}}, x_{\text{old}})$ to produce its new position. In the following subsections, we will discuss some typical functions $f$.
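One way to read this is as a loop over all old positions that applies $f$ and copies each value to the position it returns. The sketch below is an illustration under stated assumptions, not a definitive implementation: the helper names `forward_map` and `shift_by_3_2` are hypothetical, the output has the same size as the input, positions that land outside the image are simply dropped, and unfilled positions keep a fill value of 0.

```python
import numpy as np


def forward_map(z_old, f, fill=0):
    """Move every pixel of z_old to the position given by f(y_old, x_old).

    Assumptions for this sketch: f returns integer coordinates, the output
    has the same shape as the input, and out-of-bounds targets are skipped.
    """
    height, width = z_old.shape
    z_new = np.full_like(z_old, fill)
    for y_old in range(height):
        for x_old in range(width):
            y_new, x_new = f(y_old, x_old)
            if 0 <= y_new < height and 0 <= x_new < width:
                z_new[y_new, x_new] = z_old[y_old, x_old]
    return z_new


def shift_by_3_2(y_old, x_old):
    """Example f: the offset used earlier (3 down, 2 to the right)."""
    return y_old + 3, x_old + 2


z_old = np.arange(100).reshape(10, 10)
z_new = forward_map(z_old, shift_by_3_2)
print(z_new)
```

Passing $f$ as an ordinary function keeps the pixel-copying loop independent of the particular transformation, so other choices of $f$ can be swapped in without changing `forward_map`.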

NOTE: You may notice some challenges with moving pixels. For instance, if a new position doesn't correspond to any old position, there won't be a value to fill that spot. We will address this later using interpolation.
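The gap problem is easy to reproduce. As an assumed example (this particular $f$ is chosen only to make the effect visible), the sketch below doubles both coordinates of every pixel and writes the result into an output twice as large in each direction. Every old pixel finds a new home, but most new positions never receive a value.

```python
import numpy as np


def stretch(y_old, x_old):
    """Illustrative f: double both coordinates (an assumption for this demo)."""
    return 2 * y_old, 2 * x_old


height, width = 4, 4
z_old = np.arange(1, height * width + 1).reshape(height, width)

# Output is twice as large in each direction so every target fits.
z_new = np.zeros((2 * height, 2 * width), dtype=z_old.dtype)
filled = np.zeros(z_new.shape, dtype=bool)  # tracks which positions got a value

for y_old in range(height):
    for x_old in range(width):
        y_new, x_new = stretch(y_old, x_old)
        z_new[y_new, x_new] = z_old[y_old, x_old]
        filled[y_new, x_new] = True

holes = int((~filled).sum())
print("positions without a value:", holes, "of", filled.size)
```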

Translation

Translation is the simplest transformation: it shifts an image in space without altering its orientation or size. It can be visualized as moving every pixel in the image by the same fixed offset. The equations are given below: