Linear layers, implemented as matrix multiplication, are ubiquitous in neural networks because of their versatility. Although the function a learned linear layer computes is not always directly interpretable, such layers offer significant benefits across a wide range of applications.
Matrix multiplication, despite sometimes being a computationally inefficient way to do so, can implement a broad range of signal processing operations, as the following examples illustrate.
Moving Average: consider a simple two-point moving average applied to the time series $\mathbf{x}=[1, 2, 3, 4]$:
$$ \mathbf{y} = \frac{1}{2} \begin{bmatrix} 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 \\ 1 & 0 & 0 & 1 \\ \end{bmatrix} \begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \end{bmatrix} = \begin{bmatrix} \frac{1 + 2}{2} \\ \frac{2 + 3}{2} \\ \frac{3 + 4}{2} \\ \frac{4 + 1}{2} \end{bmatrix} $$
Here, $\mathbf{y}$ is the output: each row of the transformation matrix applies an averaging window of size $n = 2$, shifted by one sample per row, with the last row wrapping around to the first element (a circular moving average).
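To make this concrete, here is a minimal NumPy sketch of the example above (the names `A`, `x`, and `y` are just illustrative):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])

# Circular moving-average matrix: each row averages one element
# with its right neighbor, and the last row wraps around.
A = 0.5 * np.array([
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 1],
])

y = A @ x
print(y)  # [1.5 2.5 3.5 2.5]
```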
Average Downsampling: consider a simple 2-to-1 average downsampling operation:
$$ \mathbf{y} = \frac{1}{2} \begin{bmatrix}1 & 1 & 0 & 0 \\0 & 0 & 1 & 1 \\\end{bmatrix}\begin{bmatrix}1 \\ 2 \\ 3 \\ 4\end{bmatrix}= \begin{bmatrix} \frac{1 + 2}{2} \\ \frac{3 + 4}{2} \end{bmatrix} $$
In this case, the operation averages the first two and the last two elements of $\mathbf{x}$, effectively reducing its length by half.
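The same operation, sketched in NumPy (again with illustrative names):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])

# 2-to-1 average downsampling: each row averages one
# non-overlapping pair of samples, halving the signal length.
D = 0.5 * np.array([
    [1, 1, 0, 0],
    [0, 0, 1, 1],
])

y = D @ x
print(y)  # [1.5 3.5]
```

For any even length $n$, the same matrix can be built in one line with `np.kron(np.eye(n // 2), [0.5, 0.5])`.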
Permutation: matrix multiplication also enables operations such as splitting and swapping parts of a signal. For example:
$$ \mathbf{y} = \begin{bmatrix}0 & 0 & 1 & 0 \\0 & 0 & 0 & 1 \\1 & 0 & 0 & 0 \\0 & 1 & 0 & 0 \\\end{bmatrix}\begin{bmatrix}1 \\ 2 \\ 3 \\ 4\end{bmatrix} = \begin{bmatrix}3 \\ 4 \\ 1 \\ 2\end{bmatrix} $$
Here, the matrix swaps the first two elements with the last two in the 4-element vector. Because each row and each column contains exactly one 1, this is a permutation matrix: it reorders the values without changing them.
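A minimal sketch (the name `P` is illustrative):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])

# Permutation matrix: row i has a single 1 in the column it
# reads from, so P @ x reorders x without altering its values.
P = np.array([
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
])

y = P @ x
print(y)  # [3. 4. 1. 2.]
```

Equivalently, a permutation matrix can be built by indexing the identity with the desired source order, e.g. `np.eye(4)[[2, 3, 0, 1]]`.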
Alternatively, consider an operation that compresses (downsamples) and duplicates the signal:
$$ \mathbf{y} = \frac{1}{2}\begin{bmatrix}1 & 1 & 0 & 0 \\0 & 0 & 1 & 1 \\1 & 1 & 0 & 0 \\0 & 0 & 1 & 1 \\\end{bmatrix}\begin{bmatrix}1 \\ 2 \\ 3 \\ 4\end{bmatrix} = \begin{bmatrix}\frac{1+2}{2} \\ \frac{3+4}{2} \\ \frac{1+2}{2} \\ \frac{3+4}{2}\end{bmatrix} $$
Here, the matrix stacks two copies of the downsampling matrix from the previous example, so a single multiplication both averages adjacent elements and duplicates the resulting half-length vector.
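This composition can be expressed directly by stacking the downsampling matrix (a sketch with illustrative names):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])

# Stack two copies of the 2-to-1 downsampling matrix so the
# averaged, half-length signal appears twice in the output.
D = 0.5 * np.array([
    [1, 1, 0, 0],
    [0, 0, 1, 1],
])
M = np.vstack([D, D])

y = M @ x
print(y)  # [1.5 3.5 1.5 3.5]
```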
Non-uniform Stretching: consider a signal of length 8 and the following $10 \times 8$ resampling matrix:
$$ \small{\mathbf{y} =\begin{bmatrix}1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\0 & 0 & 0 & 0 & 0.5 & 0.5 & 0 & 0 \\0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\0 & 0 & 0 & 0 & 0 & 0.5 & 0.5 & 0 \\0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\0 & 0 & 0 & 0 & 0 & 0 & 0.5 & 0.5 \\\end{bmatrix}\begin{bmatrix}1 \\ 2 \\ 3 \\ 4 \\ 5 \\ 6 \\ 7 \\ 8\end{bmatrix} = \begin{bmatrix}1 \\ 2 \\ 3 \\ 4 \\ 5 \\ 5.5 \\ 6 \\ 6.5 \\ 7 \\ 7.5\end{bmatrix}} $$
This transformation stretches the input vector $\mathbf{x}$ non-uniformly: the first five samples pass through unchanged, while the later samples are interleaved with linearly interpolated midpoints, expanding the signal from length 8 to length 10. Comparing the input $[1, 2, 3, 4, 5, 6, 7, 8]$ with the output $[1, 2, 3, 4, 5, 5.5, 6, 6.5, 7, 7.5]$ makes the effect easy to visualize.
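A minimal sketch of this resampling matrix (the name `S` and the row-by-row construction are illustrative):

```python
import numpy as np

x = np.arange(1.0, 9.0)  # [1, 2, ..., 8]

# Non-uniform resampling matrix (10 x 8): the first five rows copy
# the first five samples unchanged; the remaining rows alternate
# between a later sample and the midpoint of two adjacent samples,
# stretching the tail of the signal by linear interpolation.
S = np.zeros((10, 8))
S[:5, :5] = np.eye(5)   # pass samples 1-5 through unchanged
S[5, 4:6] = 0.5         # midpoint of samples 5 and 6
S[6, 5] = 1.0           # sample 6
S[7, 5:7] = 0.5         # midpoint of samples 6 and 7
S[8, 6] = 1.0           # sample 7
S[9, 6:8] = 0.5         # midpoint of samples 7 and 8

y = S @ x
print(y)  # [1.  2.  3.  4.  5.  5.5 6.  6.5 7.  7.5]
```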