In this section, we will explore neural networks as a universal approximator. Unlike polynomials, neural networks rely primarily on linear computations combined with simple nonlinear activations, which avoids the instability issues of high-degree terms. Moreover, since they do not rely on high-order cross-terms, they avoid the explosion in the number of parameters that such terms cause.

In particular, we will explain why neural networks are capable of approximating any function. Our focus will be on demonstrating this capability through two basic scenarios: using a simple, wide (or 'fat') shallow neural network for a straightforward regression task, and employing a basic deep neural network for a nonlinear classification problem.

NOTE: The following explanation does not delve into the learning or training aspects of the model. Instead, we manually craft the parameters and show how simple functions can be composed by a neural network.

Shallow Fat Neural Network: Additive Function Composition for Regression

Our goal in this regression task is to approximate a sine wave within a specific domain (a bounded range of input values, e.g., three periods). A sine wave extends infinitely, so it cannot be represented globally by a piecewise linear function with finitely many pieces; restricting the domain makes the approximation tractable.

[Figure: the target sine wave over the chosen domain of three periods]

First, it is important to understand that a target function can be approximated by combining several simpler functions. In this fat neural network, the activation function is the building block from which the target function is composed.

The perceptron $\hat{y}=\sigma(\mathbf{w}\cdot \mathbf{x}+b)$ consists of the weights $\mathbf{w}$, the input $\mathbf{x}$, the bias $b$, and the activation function $\sigma$. To better visualize how the function is composed, we consider a one-dimensional input $x$ and a scalar weight $w$. We will use the ReLU function $\sigma_{\text{ReLU}}(x) = \max(0, x)$ to highlight the slope change at its corner point. The shape of ReLU is depicted in the following image, showing a distinct sharp corner at $x=0$.

[Figure: the ReLU function, with a sharp corner at $x=0$]

The overall math expression for such a perceptron is given as,

$$ \sigma_{\text{ReLU}}(wx+b) $$
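As a minimal sketch of this expression (the function name `relu_perceptron` and the specific parameter values are illustrative choices, not from the original text), a one-dimensional ReLU perceptron can be written as:

```python
import numpy as np

def relu(z):
    """ReLU activation: max(0, z), with a sharp corner at z = 0."""
    return np.maximum(0.0, z)

def relu_perceptron(x, w, b):
    """One-dimensional perceptron: sigma_ReLU(w * x + b)."""
    return relu(w * x + b)

x = np.linspace(-2.0, 2.0, 9)
print(relu_perceptron(x, w=1.0, b=0.0))   # plain ReLU
print(relu_perceptron(x, w=2.0, b=0.0))   # w stretches the slope
print(relu_perceptron(x, w=1.0, b=-1.0))  # b shifts the corner to x = 1
```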

Additionally, no matter how we change $w$ (stretching the slope) and $b$ (shifting the corner), a single ReLU function cannot by itself take on the shape of a sine or triangle wave. Hence, we need to add a module that stacks different versions of ReLU together. This results in the regression multilayer perceptron (MLP) structure shown below:

[Figure: regression MLP with ReLU perceptrons in the hidden layer and a linear output]

NOTE: In the diagram, we have illustrated only two perceptrons in the first layer, but in practice more ReLU functions are needed to realize such a fit. Additionally, we do not apply an activation function after the hidden layer because we are tackling a regression problem.
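To make the additive composition concrete, here is a small sketch (the slopes, corner locations, and the helper name `triangle_bump` are our illustrative choices) that sums three ReLU units into a single triangular bump, the basic shape used to trace the ups and downs of the sine wave:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def triangle_bump(x, left, right, height):
    """Piecewise-linear bump built from three ReLU units:
    zero up to `left`, peaks at the midpoint with value `height`,
    returns to zero at `right`, and stays flat afterwards."""
    mid = 0.5 * (left + right)
    slope = height / (mid - left)
    return (relu(slope * (x - left))
            - 2.0 * relu(slope * (x - mid))   # flips the slope at the peak
            + relu(slope * (x - right)))      # flattens the tail back to zero

x = np.linspace(-1.0, 3.0, 9)
print(triangle_bump(x, left=0.0, right=2.0, height=1.0))
```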

You may have already noticed that, because of the sharp bend in the ReLU function (at $x=0$), fitting a sine wave perfectly is quite challenging. Nevertheless, ReLU's distinctive sharp turning point serves as an instructive example for our analysis. In this context, we approximate the overall up-and-down piecewise pattern of the sine wave in a triangular manner. To achieve a closer fit to the sine wave, it is advisable to employ a smoother activation function such as ELU (Exponential Linear Unit).
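For reference, here is a minimal sketch of the ELU activation using its standard definition (the value $\alpha = 1$ is simply the common default, not something fixed by the text above):

```python
import numpy as np

def elu(z, alpha=1.0):
    """ELU activation: z for z > 0, alpha * (exp(z) - 1) otherwise.
    The transition at z = 0 is continuous and, for alpha = 1, the slope is
    continuous too, so compositions of ELUs bend smoothly instead of forming
    the sharp corners produced by ReLU."""
    return np.where(z > 0.0, z, alpha * (np.exp(z) - 1.0))

z = np.linspace(-2.0, 2.0, 9)
print(elu(z))
```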

To facilitate processing, we first manually shift the entire target function by adding 1 to the y-values, ensuring that all function values $y$ are non-negative. This adjustment will be convenient for the subsequent ReLU processing.

[Figure: the shifted target with all values non-negative (left) and the original target (right)]

As a result, we need to configure $b_2$ to be equal to $-1$. This ensures that if the preceding layer can build the shifted piecewise function on the left of the figure above, the output bias $b_2$ can then shift it back down to the target on the right. The neural network can now be set up as below.

[Figure: the hand-crafted network configuration, with output bias $b_2 = -1$]
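As a concrete, entirely hand-crafted sketch of this construction, the snippet below builds the shifted triangular wave from a bank of ReLU units whose corners sit at the extrema of the sine wave, then applies the output bias $b_2 = -1$ to shift the result back down. The corner positions, slopes, and the three-period domain $[0, 6\pi]$ are our illustrative choices, not values taken from the figure.

```python
import numpy as np

s = 2.0 / np.pi                       # |slope| of the triangular wave

# Hidden layer: ReLU units relu(w_i * x + b_i), hand-set rather than learned.
# One constant unit, one identity unit (valid because the domain is x >= 0),
# and one unit per corner of the triangular wave (the extrema of the sine).
corners = np.pi / 2 + np.pi * np.arange(6)          # 3 periods on [0, 6*pi]
w1 = np.concatenate(([0.0, 1.0], np.ones(6)))       # hidden weights
b1 = np.concatenate(([1.0, 0.0], -corners))         # hidden biases

# Output layer: each corner weight flips the slope; bias b2 = -1 undoes the
# "+1" shift that kept the hidden-layer construction non-negative.
w2 = np.concatenate(([1.0, s], 2.0 * s * (-1.0) ** np.arange(1, 7)))
b2 = -1.0

def relu(z):
    return np.maximum(0.0, z)

def shallow_fat_net(x):
    """Hand-crafted shallow network: one ReLU hidden layer, linear output."""
    h = relu(np.outer(x, w1) + b1)    # shape (len(x), 8)
    return h @ w2 + b2

x = np.linspace(0.0, 6.0 * np.pi, 25)
print(np.round(shallow_fat_net(x), 3))   # triangular approximation of sin(x)
print(np.round(np.sin(x), 3))
```

At the corners and zero crossings the triangular approximation matches $\sin(x)$ exactly; in between, the printed values show the deviation that a smoother activation such as ELU would reduce.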