Data Structure in Image Processing

Before we delve into image processing, it's crucial to understand how neural networks perceive images. To a machine, an image is a three-dimensional tensor (or an array, if you prefer). You might argue that an image should be two-dimensional, and that is true for grayscale images such as those in the MNIST dataset of handwritten digits.

In a grayscale image, pixels are the basic units that make up the image. Each pixel holds a single value for its shade of gray. With values normalized to the range [0, 1], 0 is black, 1 is white, and the values in between are shades of gray (raw 8-bit images store the same information as integers from 0 to 255). The image's resolution, given in pixels (e.g., 28x28 in the MNIST dataset), determines the number of rows and columns of pixels. Higher-resolution images contain more pixels, so they can represent finer details and smoother transitions between shades of gray.
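To make the pixel-grid idea concrete, here is a minimal sketch that builds a tiny grayscale image by hand. The 4x4 size and the pixel values are made up for illustration; a real MNIST digit would be 28x28.

```python
import numpy as np

# Hypothetical 4x4 grayscale image stored as 8-bit integers
# (0 = black, 255 = white); the values are invented for illustration.
gray_u8 = np.array(
    [
        [0, 64, 128, 255],
        [64, 128, 255, 128],
        [128, 255, 128, 64],
        [255, 128, 64, 0],
    ],
    dtype=np.uint8,
)

# Normalize to the [0, 1] range used in the text (0 = black, 1 = white)
gray = gray_u8.astype(np.float32) / 255.0

print(gray.ndim)   # number of dimensions: 2 (rows, columns)
print(gray.shape)  # (4, 4)
```

A grayscale image therefore needs only two dimensions: one for rows and one for columns.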

Most images we encounter from cameras and the Internet are in color and use a color model such as RGB, CMYK, or HSV. Each image therefore consists of multiple channels, with each channel holding the intensity of one component of the model. For instance, an RGB image has three channels: red (R), green (G), and blue (B).

Consider the example of a red-toned parrot in an RGB image. In the red channel (visualized as a grayscale image), the parrot's body appears bright (values close to 1), indicating high red intensity. The parrot's cheek area is white, which means high intensity in every color, so it appears bright in all three channels.
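The same reasoning can be checked on a toy image. The sketch below uses a hypothetical 1x3 RGB image (one red pixel, one white pixel, one black pixel) rather than the parrot photo, and splits it into its three channels.

```python
import numpy as np

# Hypothetical 1x3 RGB image in height x width x channel layout,
# with values in [0, 1]: a red pixel, a white pixel, and a black pixel.
rgb = np.array(
    [[[1.0, 0.0, 0.0], [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]]],
    dtype=np.float32,
)

# Each channel is itself a 2-D grayscale image
red = rgb[..., 0]
green = rgb[..., 1]
blue = rgb[..., 2]

print(red)    # bright for both the red and the white pixel
print(green)  # bright only for the white pixel
print(blue)   # bright only for the white pixel
```

The red pixel is bright only in the red channel, while the white pixel is bright in all three, which mirrors the parrot's body and cheek in the example above.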

In PyTorch, an image is stored as a three-dimensional tensor whose dimensions are height, width, and color channels. The order of those dimensions matters: OpenCV (which returns NumPy arrays) arranges them as height, width, channel, whereas PyTorch expects channel, height, width. Reordering the dimensions is therefore a necessary step before processing images in PyTorch.
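As a minimal sketch of that reordering, the example below uses NumPy only, so it runs without PyTorch installed. The 28x32 image size is an arbitrary assumption. On a PyTorch tensor, `tensor.permute(2, 0, 1)` performs the equivalent reordering.

```python
import numpy as np

# Hypothetical image in OpenCV/NumPy layout: height x width x channel (HWC)
hwc = np.zeros((28, 32, 3), dtype=np.uint8)

# Move the channel axis to the front: channel x height x width (CHW)
chw = np.transpose(hwc, (2, 0, 1))
print(chw.shape)  # (3, 28, 32)

# The inverse reordering returns to HWC, e.g. for plotting with matplotlib
back = np.transpose(chw, (1, 2, 0))
print(back.shape)  # (28, 32, 3)
```

Only the axis order changes here; the pixel values themselves are untouched.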

Practice

import matplotlib.pyplot as plt
import numpy as np
import requests
from PIL import Image
from io import BytesIO

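# Download an image from a URL and return it as a 2-D grayscale NumPy array
def read_image_from_url(url):
    response = requests.get(url)
    response.raise_for_status()  # Stop early if the download failed
    img = Image.open(BytesIO(response.content))
    img = img.convert('L')  # Convert to 8-bit grayscale (values 0 to 255)
    return np.array(img)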

# URL of the image (replace with any other image URL if you prefer)
image_url = "https://upload.wikimedia.org/wikipedia/en/7/7d/Lenna_%28test_image%29.png"

# Read the image from the URL
image = read_image_from_url(image_url)

# Width, in pixels, of the border added to each side of the image
pad_size = 50

# Pad the grayscale image with a constant white (255) border
pad_image = np.pad(image, pad_size, mode='constant', constant_values=255)
plt.imshow(pad_image, cmap='gray')
plt.title('Padded Image')
plt.axis('off')
plt.show()
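
The effect of the padding call is easier to see on a tiny array than on a full photo. The sketch below applies the same `np.pad` arguments to a made-up 2x2 image with a border of 1 pixel instead of 50.

```python
import numpy as np

# Hypothetical 2x2 grayscale image; the values are invented for illustration
tiny = np.array([[10, 20], [30, 40]], dtype=np.uint8)

# Add a 1-pixel white (255) border on every side
padded = np.pad(tiny, 1, mode="constant", constant_values=255)

print(padded.shape)  # (4, 4): each dimension grows by 2 * pad width
print(padded)
```

With `pad_size = 50`, each dimension of the image grows by 100 pixels in the same way.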