Matrix multiplication is a fundamental operation that appears, in different forms, across the layers of a neural network.
These variations enable the network to capture both fixed and input-dependent (dynamic) relationships within the data.
In a neural network's fully connected (dense) layer,
the input undergoes a linear transformation that takes the form:
$$ \hat{\mathbf{y}}=\mathbf{W}\mathbf{x} + \mathbf{b} $$
where $\mathbf{W}$ is the weight matrix, $\mathbf{x}$ is the input vector, and $\mathbf{b}$ is the bias vector. This is a fundamental operation in deep learning that maps the input data into a new feature space. Let's write out $\mathbf{W}$, $\mathbf{x}$, and $\mathbf{b}$ explicitly for a two-dimensional case:
$$ \mathbf{W} = \begin{bmatrix} w_{11} & w_{12} \\ w_{21} & w_{22} \end{bmatrix}, \quad \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \quad \mathbf{b} = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix} $$
The linear transformation becomes:
$$ \mathbf{W}\mathbf{x} + \mathbf{b} = \begin{bmatrix} w_{11} & w_{12} \\ w_{21} & w_{22} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + \begin{bmatrix} b_1 \\ b_2 \end{bmatrix} $$
Expanding this yields:
$$ \mathbf{W}\mathbf{x} + \mathbf{b} = \begin{bmatrix}w_{11}x_1 + w_{12}x_2 \\w_{21}x_1 + w_{22}x_2\end{bmatrix} + \begin{bmatrix}b_1 \\b_2\end{bmatrix} = \begin{bmatrix}w_{11}x_1 + w_{12}x_2 + b_1 \\w_{21}x_1 + w_{22}x_2 + b_2\end{bmatrix} $$
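As a quick numerical check, the expanded expression can be verified directly in NumPy. The specific values of $\mathbf{W}$, $\mathbf{x}$, and $\mathbf{b}$ below are made up purely for illustration and are not taken from the text:

```python
import numpy as np

# Illustrative 2x2 weight matrix, 2-d input, and 2-d bias (arbitrary values)
W = np.array([[1.0, 2.0],
              [3.0, 4.0]])
x = np.array([0.5, -1.0])
b = np.array([0.1, 0.2])

# Matrix-vector product plus bias: y_hat = W @ x + b
y_hat = W @ x + b

# Element-wise form matches the expanded equation:
# [w11*x1 + w12*x2 + b1,  w21*x1 + w22*x2 + b2]
manual = np.array([W[0, 0]*x[0] + W[0, 1]*x[1] + b[0],
                   W[1, 0]*x[0] + W[1, 1]*x[1] + b[1]])

print(y_hat)                       # [-1.4 -2.3]
print(np.allclose(y_hat, manual))  # True
```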
This linear transformation helps the model learn relationships between the input features. When combined with non-linear activation functions (e.g., ReLU), it enables the network to approximate more complex patterns and decision boundaries beyond just linear ones.
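As a minimal sketch of that idea, the same affine map followed by a ReLU can be written in a few lines of NumPy; the parameter values are again arbitrary and chosen only for illustration:

```python
import numpy as np

def dense_relu(x, W, b):
    """One dense layer followed by a ReLU non-linearity."""
    z = W @ x + b              # linear (affine) transformation
    return np.maximum(z, 0.0)  # ReLU keeps positive components, zeroes out the rest

# Arbitrary illustrative parameters
W = np.array([[1.0, -2.0],
              [3.0,  4.0]])
b = np.array([0.1, 0.2])
x = np.array([1.0, 1.0])

print(dense_relu(x, W, b))  # [0.  7.2] -- the negative pre-activation is clipped to zero
```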
In attention mechanisms, a key idea is that the effective weights depend on the input itself. Unlike the static weights of a dense layer, an attention layer dynamically computes input-dependent weights $\mathbf{W}(\mathbf{x})$ and biases $\mathbf{B}(\mathbf{x})$, so the matrix multiplication itself is shaped by the input $\mathbf{x}$ and the overall mapping becomes non-linear.
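As a rough sketch of this input-dependence, the snippet below uses standard scaled dot-product attention (an assumption for illustration, not necessarily the exact formulation used in the simplified example that follows). The softmaxed score matrix `A` plays the role of input-dependent weights, and the projection matrices `Wq`, `Wk`, `Wv` are arbitrary illustrative parameters:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
d = 4                           # feature dimension (illustrative)
X = rng.normal(size=(3, d))     # 3 input tokens, each a d-dimensional vector

# Illustrative projection matrices (assumptions, not from the text)
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

Q, K, V = X @ Wq, X @ Wk, X @ Wv

# The attention matrix A is computed from the input X itself,
# so the effective mixing weights change whenever X changes.
A = softmax(Q @ K.T / np.sqrt(d))
output = A @ V

print(A.shape, output.shape)  # (3, 3) (3, 4)
```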
Consider a simplified attention mechanism like the one below: