Visualizing deeper-layer kernels in image space can indeed be formulated as an optimization problem. The goal is to find an input image that maximizes the activation of a specific deep-layer kernel. This is done by defining an objective function that quantifies how strongly the kernel responds to an input image, and then optimizing the image with respect to that objective.


$$ \begin{aligned}\arg\max_x\quad &w_2*(w_1*x)\\ \text{s.t.}\quad & \|x\|_2=1 \end{aligned} $$

In code, this optimization looks like the following:

# You can copy-paste this into Colab and play with it
import torch
import torch.nn.functional as F

# Define two fixed 3x3 kernels (the weights are not trainable)
# F.conv2d expects 4-D weight tensors shaped [out_channels, in_channels, kH, kW];
# here both are [1, 1, 3, 3], i.e. a single single-channel 3x3 kernel each.
# As an exercise, try replacing these random kernels with weights from an MNIST-trained model.
kernel_a = torch.randn(1, 1, 3, 3, requires_grad=False)
kernel_b = torch.randn(1, 1, 3, 3, requires_grad=False)

# Define a 5x5 input variable that requires gradient
# The 4 dimensions follow the usual image-tensor convention [B, C, H, W]:
# batch_size, channel_num, height, width
input_var = torch.randn(1, 1, 5, 5, requires_grad=True)

# Define an optimizer to optimize the input variable
optimizer = torch.optim.Adam([input_var], lr=0.01)

# We will optimize the input so that the convolution with kernel_b is maximized
# This means we want to maximize the value in the output feature map
# Since most optimizers are designed to minimize functions, we can minimize the negative of the output

# Optimization loop
for iteration in range(100):
    optimizer.zero_grad()  # Clear gradients from the previous step

    # Projection step of Projected Gradient Descent: rescale x to unit L2 norm
    # Note: writing through .data only adjusts the values in place, without adding
    # extra nodes to the computational graph (no recursive graph on input_var itself)
    input_var.data = input_var.data / torch.norm(input_var.data, p=2)
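    # (Equivalently, this projection could be done inside a `with torch.no_grad():` block,
    #  which likewise keeps the in-place update out of the autograd graph.)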

    # Convolve the input with kernel_a to get the first feature map
    hidden_a = F.conv2d(input_var, kernel_a)

    # Optionally insert an activation function here, e.g.:
    # hidden_a = F.relu(hidden_a)

    output_b = F.conv2d(hidden_a, kernel_b)

    # Negate the kernel_b output: the optimizer minimizes, but we want to maximize this response
    # With these sizes (5x5 -> 3x3 conv -> 3x3 -> 3x3 conv -> 1x1), output_b is a single value
    loss = -torch.sum(output_b)

    # Backpropagation
    loss.backward()

    # Update the input variable
    optimizer.step()

    # Print the loss every 10 iterations
    if iteration % 10 == 0:
        print(f"Iteration {iteration}, Loss: {loss.item()}")

# After optimization, input_var is the optimized 5x5 input
optimized_input = input_var.detach()
print("Optimized 5x5 Input:", optimized_input)