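Consider the multivariate function visualized below, a simplified neural network. The derivatives in this section are consistent with the definitions $h_1 = w_{11}x_1 + w_{12}x_2 + b_1$, $h_2 = w_{21}x_1 + w_{22}x_2 + b_2$, $q_i = \sigma(h_i)$, $\hat{y} = q_1 + q_2$, and $J = (y-\hat{y})^2$. We first differentiate each stage on its own, then combine the results with the chain rule.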

embed - 2025-02-08T174736.773.svg

$$ \nabla_{\hat{y}}J = \frac{d J}{d \hat{y}} = 2(y-\hat{y})\cdot (-1) = -2(y-\hat{y}). $$

embed - 2025-02-08T174710.836.svg

$$ \nabla_{(q_1, q_2)}\hat{y} = \frac{d\hat{y}}{d\mathbf{q}} = \begin{bmatrix}\frac{\partial \hat{y}}{\partial q_1} & \frac{\partial \hat{y}}{\partial q_2} \end{bmatrix} = \begin{bmatrix}1 & 1\end{bmatrix}. $$

embed - 2025-02-08T174627.978.svg

$$ \nabla_{h_1, h_2}{(q_1, q_2)} = \frac{d\mathbf{q}}{d\mathbf{h}} = \begin{bmatrix} \frac{\partial q_1}{\partial h_1} & \frac{\partial q_1}{\partial h_2} \\ \frac{\partial q_2}{\partial h_1} & \frac{\partial q_2}{\partial h_2} \end{bmatrix} = \begin{bmatrix} \sigma'({h_1}) & 0 \\ 0 & \sigma'({h_2}) \end{bmatrix}. $$
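
The diagonal structure of this Jacobian can be checked numerically. The sketch below is illustrative only: it assumes that $\sigma$ is the logistic sigmoid (the text does not name the activation), and the values in `h` are hypothetical. It compares the analytic Jacobian with a central finite-difference estimate.

```python
import math

def sigmoid(z):
    # Assumption: sigma is the logistic function; the text only writes "sigma".
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    # Derivative of the logistic function: sigma(z) * (1 - sigma(z)).
    s = sigmoid(z)
    return s * (1.0 - s)

def jacobian_q_wrt_h(h):
    # q_i = sigma(h_i) depends only on h_i, so the off-diagonal entries are zero.
    return [[sigmoid_prime(h[0]), 0.0],
            [0.0, sigmoid_prime(h[1])]]

def numerical_jacobian_q_wrt_h(h, eps=1e-6):
    # Central finite differences: column j perturbs h_j, row i observes q_i.
    jac = [[0.0, 0.0], [0.0, 0.0]]
    for j in range(2):
        h_plus, h_minus = list(h), list(h)
        h_plus[j] += eps
        h_minus[j] -= eps
        for i in range(2):
            jac[i][j] = (sigmoid(h_plus[i]) - sigmoid(h_minus[i])) / (2 * eps)
    return jac

h = [0.5, -1.0]  # hypothetical pre-activation values
analytic = jacobian_q_wrt_h(h)
numeric = numerical_jacobian_q_wrt_h(h)
```

Because each $q_i$ is an elementwise function of $h_i$, both versions should agree on a diagonal matrix, up to finite-difference error.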

embed - 2025-02-08T174542.711.svg

$$ \begin{aligned}\nabla_{\mathbf{W}, \mathbf{b}}{(h_1, h_2)} = \frac{d\mathbf{h}}{d\theta} &= \begin{bmatrix} \frac{\partial h_1}{\partial w_{11}} & \frac{\partial h_1}{\partial w_{12}} & \frac{\partial h_1}{\partial b_{1}} & \frac{\partial h_1}{\partial w_{21}} & \frac{\partial h_1}{\partial w_{22}} & \frac{\partial h_1}{\partial b_{2}} \\ \frac{\partial h_2}{\partial w_{11}} & \frac{\partial h_2}{\partial w_{12}} & \frac{\partial h_2}{\partial b_{1}} & \frac{\partial h_2}{\partial w_{21}} & \frac{\partial h_2}{\partial w_{22}} & \frac{\partial h_2}{\partial b_{2}} \end{bmatrix} \\&= \begin{bmatrix} x_1 & x_2 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & x_1 & x_2 & 1 \end{bmatrix}. \end{aligned} $$
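
To make the parameter ordering concrete, here is a minimal sketch of this Jacobian in plain Python. It assumes the affine layer $h_1 = w_{11}x_1 + w_{12}x_2 + b_1$, $h_2 = w_{21}x_1 + w_{22}x_2 + b_2$, which is inferred from the entries of the matrix above, and the ordering $\theta = (w_{11}, w_{12}, b_1, w_{21}, w_{22}, b_2)$. The numeric values are hypothetical.

```python
def pre_activations(theta, x):
    # Assumed affine layer, with theta ordered as (w11, w12, b1, w21, w22, b2).
    w11, w12, b1, w21, w22, b2 = theta
    x1, x2 = x
    return [w11 * x1 + w12 * x2 + b1,
            w21 * x1 + w22 * x2 + b2]

def jacobian_h_wrt_theta(x):
    # Row i holds the partial derivatives of h_i with respect to each parameter.
    x1, x2 = x
    return [[x1, x2, 1.0, 0.0, 0.0, 0.0],
            [0.0, 0.0, 0.0, x1, x2, 1.0]]

theta = [0.1, -0.2, 0.3, 0.4, 0.5, -0.6]  # hypothetical parameter values
x = [1.5, -2.0]                           # hypothetical input

h = pre_activations(theta, x)
jac = jacobian_h_wrt_theta(x)

# h is linear in theta with no constant term, so the Jacobian applied to
# theta must reproduce h itself.
h_from_jac = [sum(j * t for j, t in zip(row, theta)) for row in jac]
```

The zero blocks reflect that $h_1$ does not depend on the second row of weights, and $h_2$ does not depend on the first.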

embed - 2025-02-08T174507.808.svg

$$ \frac{d J}{d \mathbf{q}} = \frac{d J}{d \hat{y}}\cdot \frac{d\hat{y}}{d\mathbf{q}} = -2(y-\hat{y}) \cdot \begin{bmatrix}1 & 1 \end{bmatrix}. $$

embed - 2025-02-08T174418.855.svg

$$ \frac{d J}{d \mathbf{h}} = \frac{d J}{d \hat{y}}\cdot \frac{d\hat{y}}{d\mathbf{q}} \cdot \frac{d \mathbf{q}}{d \mathbf{h}}= -2(y-\hat{y}) \cdot \begin{bmatrix}1 & 1 \end{bmatrix} \cdot \begin{bmatrix} \sigma'({h_1}) & 0 \\ 0 & \sigma'({h_2}) \end{bmatrix}. $$
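
As a worked step that the text leaves implicit, the product can be multiplied out. The row vector $\begin{bmatrix}1 & 1\end{bmatrix}$ picks out the diagonal of the activation Jacobian, so the result is again a row vector:

$$ \frac{d J}{d \mathbf{h}} = -2(y-\hat{y}) \cdot \begin{bmatrix} \sigma'(h_1) & \sigma'(h_2) \end{bmatrix}. $$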

embed - 2025-02-08T174317.044.svg

$$ \frac{d J}{d \theta} = \frac{d J}{d \hat{y}}\cdot \frac{d\hat{y}}{d\mathbf{q}} \cdot \frac{d \mathbf{q}}{d \mathbf{h}} \cdot \frac{d\mathbf{h}}{d \theta}= -2(y-\hat{y}) \cdot \begin{bmatrix}1 & 1 \end{bmatrix} \cdot \begin{bmatrix} \sigma'({h_1}) & 0 \\ 0 & \sigma'({h_2}) \end{bmatrix} \cdot \begin{bmatrix} x_1 & x_2 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & x_1 & x_2 & 1 \end{bmatrix}. $$
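
The full chain can be checked numerically. The sketch below is not a definitive implementation; it rests on these assumptions: $\sigma$ is the logistic sigmoid, the loss is $J = (y-\hat{y})^2$ with $\hat{y} = q_1 + q_2$ (both inferred from the derivatives above), and all numeric values are hypothetical. It multiplies the four Jacobians in the order shown and compares the result with a central finite-difference gradient.

```python
import math

def sigmoid(z):
    # Assumption: sigma is the logistic function.
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def forward(theta, x):
    # theta is ordered as (w11, w12, b1, w21, w22, b2).
    w11, w12, b1, w21, w22, b2 = theta
    h1 = w11 * x[0] + w12 * x[1] + b1
    h2 = w21 * x[0] + w22 * x[1] + b2
    y_hat = sigmoid(h1) + sigmoid(h2)
    return h1, h2, y_hat

def loss(theta, x, y):
    # Assumed squared-error loss J = (y - y_hat)^2.
    return (y - forward(theta, x)[2]) ** 2

def matmul(a, b):
    # Plain nested-list matrix product.
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]

def gradient_chain_rule(theta, x, y):
    h1, h2, y_hat = forward(theta, x)
    dJ_dyhat = [[-2.0 * (y - y_hat)]]                 # 1x1
    dyhat_dq = [[1.0, 1.0]]                           # 1x2
    dq_dh = [[sigmoid_prime(h1), 0.0],
             [0.0, sigmoid_prime(h2)]]                # 2x2
    dh_dtheta = [[x[0], x[1], 1.0, 0.0, 0.0, 0.0],
                 [0.0, 0.0, 0.0, x[0], x[1], 1.0]]    # 2x6
    grad = matmul(matmul(matmul(dJ_dyhat, dyhat_dq), dq_dh), dh_dtheta)
    return grad[0]                                    # 1x6 row as a flat list

def gradient_finite_diff(theta, x, y, eps=1e-6):
    # Central differences, one parameter at a time.
    grad = []
    for k in range(len(theta)):
        t_plus, t_minus = list(theta), list(theta)
        t_plus[k] += eps
        t_minus[k] -= eps
        grad.append((loss(t_plus, x, y) - loss(t_minus, x, y)) / (2 * eps))
    return grad

theta = [0.1, -0.2, 0.3, 0.4, 0.5, -0.6]  # hypothetical parameters
x = [1.5, -2.0]                           # hypothetical input
y = 1.0                                   # hypothetical target

analytic = gradient_chain_rule(theta, x, y)
numeric = gradient_finite_diff(theta, x, y)
```

Multiplying from left to right keeps every intermediate result a row vector, so no matrix larger than $2 \times 6$ is ever formed. This ordering is the idea behind reverse-mode differentiation (backpropagation).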

embed - 2025-02-08T174155.599.svg