In forward computation, a direct analytical expression for the derivative with respect to the input is not always required. However, when modularizing functions, and especially when designing neural networks in low-level languages such as CUDA, it becomes crucial to know the analytical derivative of a module's output with respect to its input. Consider composing the square function $h = z^2$ with $\sin(h)$, yielding the composite modular function $G(z) = \sin(z^2)$. In such cases, it is essential to compute the gradient of $G$ with respect to its input $z$ end to end. By the chain rule, $\frac{dG}{dz} = \frac{d\sin(h)}{dh} \cdot \frac{dh}{dz} = \cos(z^2) \cdot 2z$. Having this analytical derivative is critical for the effectiveness of gradient-based methods in such low-level implementations.
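The composition above can be sketched as two modules, each paired with its analytical local derivative, combined via the chain rule. This is a minimal illustration; the function names are assumptions, not part of any particular framework:

```python
import math

# Each module provides a forward value and an analytical local derivative.
def square_forward(z):
    return z * z

def square_backward(z):
    return 2 * z              # d(z^2)/dz

def sin_forward(h):
    return math.sin(h)

def sin_backward(h):
    return math.cos(h)        # d(sin h)/dh

def G(z):
    """Composite modular function G(z) = sin(z^2)."""
    return sin_forward(square_forward(z))

def dG_dz(z):
    """Chain rule: dG/dz = cos(z^2) * 2z."""
    h = square_forward(z)
    return sin_backward(h) * square_backward(z)
```

Note that `dG_dz` never needs a hand-derived expression for the whole composite: it multiplies the local derivatives of the two modules, which is exactly the pattern gradient-based training relies on.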
Modular Function Transparency: After modularization, the forward values of a module's internal variables, such as $h$, are typically not exposed, since these internal values do not participate directly in the gradient-descent update. This is particularly relevant in low-level implementations such as CUDA, where managing the visibility of computational steps is key. Post-modularization, the transparency of a module's internal variables during gradient computation diminishes. This loss of transparency aligns with a fundamental principle of modularization: encapsulating a module's functionality rather than granting direct access to its internal computational details.
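This encapsulation can be sketched with a module that caches its internal variable $h$ for the backward pass while never exposing it to the caller. The class name and interface here are hypothetical, chosen only to illustrate the point:

```python
import math

class SinOfSquare:
    """Module computing G(z) = sin(z^2); the internal h = z^2 stays private."""

    def forward(self, z):
        self._h = z * z                  # internal variable, hidden from callers
        return math.sin(self._h)

    def backward(self, z):
        # Uses the cached internal value; the caller only ever sees dG/dz.
        return math.cos(self._h) * 2 * z
```

A caller interacts only with `forward` and `backward`; the intermediate $h$ is an implementation detail, which is precisely the loss of transparency described above.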