A computational graph is a way of representing mathematical equations and functions in a structured, visual format that highlights how variables and operations are interconnected. Each node in the graph represents an operation (e.g., addition, multiplication) or a variable (e.g., input, intermediate result, or final output). Edges represent the flow of data or the relationship between operations and their results.
Example: $y = \sin(x) + 2 \cdot \log(x)$
The equation $y = \sin(x) + 2 \log(x)$ can be broken down into smaller steps: $v_1 = \sin(x)$, $v_2 = \log(x)$, $v_3 = 2 \cdot v_2$, and finally $y = v_1 + v_3$.
In the computational graph, each of these steps corresponds to a node, and the connections between them represent how the outputs of one step feed into the next.
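As a concrete illustration, the sketch below evaluates the graph one node at a time in plain Python (the input value `x = 2.0` and the intermediate names `v1`, `v2`, `v3` are chosen here purely for illustration):

```python
import math

x = 2.0              # example input (any x > 0, so that log(x) is defined)

# Each assignment below corresponds to one node of the computational graph.
v1 = math.sin(x)     # node: sin(x)
v2 = math.log(x)     # node: log(x)
v3 = 2.0 * v2        # node: multiplication by the constant 2
y = v1 + v3          # node: addition -> final output y

print(y)             # same value as math.sin(2.0) + 2 * math.log(2.0)
```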
Computational graphs offer several advantages: they decompose a complicated function into simple, reusable operations, they make it straightforward to apply the chain rule node by node to compute gradients (the basis of backpropagation), and they give software frameworks a structure they can analyze, optimize, and parallelize.
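To illustrate the gradient advantage, here is a minimal sketch (assuming PyTorch is available) in which the framework records the graph during the forward pass and then applies the chain rule automatically; the result matches the hand-derived gradient $dy/dx = \cos(x) + 2/x$:

```python
import math
import torch

x = torch.tensor(2.0, requires_grad=True)  # leaf node of the graph

# Forward pass: each operation is recorded as a node in the graph.
y = torch.sin(x) + 2 * torch.log(x)

# Backward pass: gradients flow back through the recorded graph via the chain rule.
y.backward()

print(x.grad)                      # dy/dx computed automatically
print(math.cos(2.0) + 2 / 2.0)     # analytic check: cos(x) + 2/x
```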
In deep learning, a neural network is essentially a large, complex function made up of many layers of operations (e.g., matrix multiplications, activations, loss functions). Each layer can be thought of as a function that takes some input, applies transformations, and produces an output, which becomes the input to the next layer.
Forward Propagation: During the forward pass, the input data is passed through the layers of the network. Each layer corresponds to a node or set of nodes in the graph, representing operations like weight multiplication, activation functions, etc.
For instance, in a layer $L$, if the input is $x$, the weight matrix is $W$, and the bias is $b$, the output is typically computed as:
$$ h = \sigma( W x + b) $$
where $\sigma$ is a non-linear function such as ReLU or sigmoid. This process can be represented as a computational graph: in view (a), it is broken into three sequential steps, $q = Wx$, $z = q + b$, and $h = \sigma(z)$.
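A minimal NumPy sketch of these three steps (the layer sizes, the random weights, and the choice of ReLU as $\sigma$ are assumptions made for illustration):

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

rng = np.random.default_rng(0)
W = rng.normal(size=(2, 3))      # weight matrix W (2 hidden units, 3 inputs)
b = rng.normal(size=(2,))        # bias b
x = np.array([1.0, 2.0, 3.0])    # input x

# The three nodes of graph (a):
q = W @ x        # q = Wx
z = q + b        # z = q + b
h = relu(z)      # h = sigma(z), with ReLU as the non-linearity
```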
The right diagram (b) above is the fully expanded computational graph in which the vectors $w$ and $x$ are written out element by element, so that every node operates on scalars. Although this structure shows all the details, we typically don't use this format: in complex neural networks the sheer number of scalar nodes quickly becomes cluttered, making the overall structure difficult to see.
This structure can also be redrawn as the two common perceptron-style diagrams that we often see in blogs.
The left view (c) shows the full process: inputs $x_1, x_2, x_3$ are multiplied by weights $w_1, w_2, w_3$, summed together with a bias $b$, and passed through an activation function to produce the output $h$. The right view (d) is a further simplified version that shows only the inputs and the output, without the weights, bias, or activation function. While this view can be helpful for quick visualization, it is not suited for in-depth analysis because it omits important details of the perceptron's structure.
In this course, unless otherwise indicated with additional explanations, the figures (a), (b), (c), and (d) are considered equivalent representations of the perceptron equation $h = \sigma(Wx+b)$. This indicates that neural network modules can be viewed at different levels of granularity.
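To make the "different levels of granularity" point concrete, the sketch below computes the same perceptron output twice, once element by element as in view (c) and once in vectorized form as in view (a); the weights, bias, and inputs are arbitrary illustrative values:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Scalar view (c): inputs, weights, and bias written out element by element.
x1, x2, x3 = 1.0, 2.0, 3.0
w1, w2, w3 = 0.5, -0.2, 0.1
b = 0.3
h_scalar = sigmoid(w1 * x1 + w2 * x2 + w3 * x3 + b)

# Vectorized view (a): the same computation as h = sigma(Wx + b).
W = np.array([[w1, w2, w3]])
x = np.array([x1, x2, x3])
h_vector = sigmoid(W @ x + b)

assert np.isclose(h_scalar, h_vector[0])  # identical result, different granularity
```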