A computational graph is a way of representing mathematical equations and functions in a structured, visual format that highlights how variables and operations are interconnected. Each node in the graph represents an operation (e.g., addition, multiplication) or a variable (e.g., input, intermediate result, or final output). Edges represent the flow of data or the relationship between operations and their results.
Example: $y = \sin(x) + 2 \cdot \log(x)$
The equation $y = \sin(x) + 2 \log(x)$ can be broken down into smaller steps: $v_1 = \sin(x)$, $v_2 = \log(x)$, $v_3 = 2 \cdot v_2$, and finally $y = v_1 + v_3$.
In the computational graph, each of these steps corresponds to a node, and the connections between them represent how the outputs of one step feed into the next.
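As a concrete illustration, the sketch below evaluates the graph one node at a time in plain Python (the input value `x = 2.0` and the intermediate names `v1`, `v2`, `v3` are chosen here purely for illustration):

```python
import math

x = 2.0              # example input (any x > 0, so that log(x) is defined)

# Each assignment below corresponds to one node of the computational graph.
v1 = math.sin(x)     # node: sin(x)
v2 = math.log(x)     # node: log(x)
v3 = 2.0 * v2        # node: multiplication by the constant 2
y = v1 + v3          # node: addition -> final output y

print(y)             # same value as math.sin(2.0) + 2 * math.log(2.0)
```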
Computational graphs offer several advantages: they decompose a complicated function into simple, reusable operations, they make it straightforward to apply the chain rule node by node to compute gradients (the basis of backpropagation), and they give software frameworks a structure they can analyze, optimize, and parallelize.
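To illustrate the gradient advantage, here is a minimal sketch (assuming PyTorch is available) in which the framework records the graph during the forward pass and then applies the chain rule automatically; the result matches the hand-derived gradient $dy/dx = \cos(x) + 2/x$:

```python
import math
import torch

x = torch.tensor(2.0, requires_grad=True)  # leaf node of the graph

# Forward pass: each operation is recorded as a node in the graph.
y = torch.sin(x) + 2 * torch.log(x)

# Backward pass: gradients flow back through the recorded graph via the chain rule.
y.backward()

print(x.grad)                      # dy/dx computed automatically
print(math.cos(2.0) + 2 / 2.0)     # analytic check: cos(x) + 2/x
```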
In deep learning, a neural network is essentially a large, complex function made up of many layers of operations (e.g., matrix multiplications, activations, loss functions). Each layer can be thought of as a function that takes some input, applies transformations, and produces an output, which becomes the input to the next layer.
Forward Propagation: During the forward pass, the input data is passed through the layers of the network. Each layer corresponds to a node or set of nodes in the graph, representing operations like weight multiplication, activation functions, etc.
For instance, in a layer $L$, if the input is $x$, the weight matrix is $W$, and the bias is $b$, the output is typically computed as:
$$ h = \sigma( W x + b) $$
where $\sigma$ is a non-linear function such as ReLU or sigmoid. This process can be represented as a computational graph: in view (a), it is broken into three sequential steps, $q = Wx$, $z = q + b$, and $h = \sigma(z)$.
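A minimal NumPy sketch of these three steps (the layer sizes, the random weights, and the choice of ReLU as $\sigma$ are assumptions made for illustration):

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

rng = np.random.default_rng(0)
W = rng.normal(size=(2, 3))      # weight matrix W (2 hidden units, 3 inputs)
b = rng.normal(size=(2,))        # bias b
x = np.array([1.0, 2.0, 3.0])    # input x

# The three nodes of graph (a):
q = W @ x        # q = Wx
z = q + b        # z = q + b
h = relu(z)      # h = sigma(z), with ReLU as the non-linearity
```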
The right diagram (b) above is the fully expanded computational graph in which the vectors $w$ and $x$ are written out element by element, so that every node operates on scalars. Although this structure shows all the details, we typically don't use this format: in complex neural networks the sheer number of scalar nodes quickly becomes cluttered, making the overall structure difficult to see.
This structure can also be redrawn as the two common perceptron-style diagrams that we often see in blogs.
The left view (c) shows the full process: inputs $x_1, x_2, x_3$ are multiplied by weights $w_1, w_2, w_3$, summed together with a bias $b$, and passed through an activation function to produce the output $h$. The right view (d) is a further simplified version that shows only the inputs and the output, without the weights, bias, or activation function. While this view can be helpful for quick visualization, it is not suited for in-depth analysis because it omits important details of the perceptron's structure.
In this course, unless otherwise indicated with additional explanations, the figures (a), (b), (c), and (d) are considered equivalent representations of the perceptron equation $h = \sigma(Wx+b)$. This indicates that neural network modules can be viewed at different levels of granularity.
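To make the "different levels of granularity" point concrete, the sketch below computes the same perceptron output twice, once element by element as in view (c) and once in vectorized form as in view (a); the weights, bias, and inputs are arbitrary illustrative values:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Scalar view (c): inputs, weights, and bias written out element by element.
x1, x2, x3 = 1.0, 2.0, 3.0
w1, w2, w3 = 0.5, -0.2, 0.1
b = 0.3
h_scalar = sigmoid(w1 * x1 + w2 * x2 + w3 * x3 + b)

# Vectorized view (a): the same computation as h = sigma(Wx + b).
W = np.array([[w1, w2, w3]])
x = np.array([x1, x2, x3])
h_vector = sigmoid(W @ x + b)

assert np.isclose(h_scalar, h_vector[0])  # identical result, different granularity
```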