Gradient Calculation and Gradient Descent for General Functions

When dealing with complex functions where closed-form solutions for optimization are not available or feasible to derive, iterative methods are employed. These methods incrementally approach the optimum through repeated adjustments based on the function's derivative information. A common iterative technique is the Gradient Descent method.

Gradient in 1D

Denoted as $\nabla$, the gradient represents how a function $y = f(\mathbf{x})$ changes with respect to its variables $\mathbf{x}$. In a single-variable case, it simplifies to the derivative with respect to $x$:

$$ \nabla_xf(x)= \frac{d{f}}{d{x}} $$

The symbol $d$ denotes an ordinary differential, and $\frac{d{f}}{d{x}}$ is the derivative of $f$ with respect to $x$. The derivative is the limit of the ratio between the change in $f$ and the change in $x$ as the change in $x$ becomes infinitesimally small:

$$ \frac{df}{dx} = \lim_{\Delta x \to 0} \frac{f(x + \Delta x) - f(x)}{\Delta x} $$

As the simple example below shows, the finite-difference ratio equals the derivative only in the limit $\Delta x \to 0$; for any finite $\Delta x$ it is an approximation.


The red curve represents the quadratic function $f(x)=x^2$.
The blue line is the secant line, whose slope approximates the derivative using a finite difference $\Delta x$.
The green dashed line is the tangent line at $x=1.0$, whose slope is the true derivative $f'(1.0)=2$.
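The relationship between the secant slope and the tangent slope can also be checked numerically. The snippet below is a minimal sketch, assuming the same quadratic $f(x)=x^2$ and the point $x=1.0$ from the figure; the helper name `forward_difference` and the chosen values of $\Delta x$ are illustrative and not part of the original example.

```python
def f(x):
    """Quadratic from the figure: f(x) = x^2."""
    return x ** 2


def forward_difference(func, x, dx):
    """Slope of the secant line through (x, f(x)) and (x + dx, f(x + dx))."""
    return (func(x + dx) - func(x)) / dx


x0 = 1.0
true_derivative = 2 * x0  # analytic derivative of x^2 is 2x

# Shrinking dx moves the secant slope toward the tangent slope.
errors = []
for dx in (1.0, 0.1, 0.01, 0.001):
    slope = forward_difference(f, x0, dx)
    errors.append(abs(slope - true_derivative))
    print(f"dx={dx:<6} secant slope={slope:.6f} error={errors[-1]:.6f}")
```

For this particular function the secant slope works out to $2x + \Delta x$, so the error shrinks in proportion to $\Delta x$.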

Gradient Visualization

In the two-variable case, where $f$ takes two inputs $x_1$ and $x_2$, the gradient is the vector of the function's partial derivatives with respect to each adjustable parameter:

$$ \nabla_xf(x)=\begin{bmatrix} \frac{\partial{f}}{\partial{x_1}}\\\frac{\partial{f}}{\partial{x_2}} \end{bmatrix} $$

The symbol $\partial$ denotes a partial derivative: $f$ depends on two variables, $x_1$ and $x_2$, and each partial derivative measures how $f$ changes with respect to one variable while the other is held fixed. The figure below shows the function $y=f(x_1, x_2)=x_1e^{-(x_1^2+x_2^2)}$ with its gradients visualized in both two and three dimensions. The third dimension, $y$, is represented through both height and color: brighter colors signify higher $y$ values, while darker shades indicate lower $y$ values.


In this example, the heatmap on the right visualizes the gradient vectors $[\frac{\partial{f}}{\partial{x_1}}, \frac{\partial{f}}{\partial{x_2}}]$ as arrows at various positions $(x_1, x_2)$. Near the deep blue areas (low values), the arrows (i.e., the gradient vectors) point outward, because the function values rise in every direction away from the minimum. Near the light yellow areas (high values), the arrows point inward, because the function values rise toward the maximum. Mapped to the 3D surface, the arrow at any position $(x_1, x_2)$ shows the direction in which $y$ increases the most.
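The arrows in such a plot are just the gradient evaluated on a grid of points. The following is a minimal sketch of computing one of them, assuming the plotted function is $f(x_1, x_2)=x_1e^{-(x_1^2+x_2^2)}$; the function names `grad_f` and `numerical_grad`, the sample point, and the step `h` are illustrative choices. The analytic partial derivatives are obtained with the product rule and then compared against a central finite difference.

```python
import numpy as np


def f(x):
    """Assumed example function: f(x1, x2) = x1 * exp(-(x1^2 + x2^2))."""
    x1, x2 = x
    return x1 * np.exp(-(x1**2 + x2**2))


def grad_f(x):
    """Analytic gradient [df/dx1, df/dx2] from the product rule."""
    x1, x2 = x
    e = np.exp(-(x1**2 + x2**2))
    return np.array([(1.0 - 2.0 * x1**2) * e, -2.0 * x1 * x2 * e])


def numerical_grad(func, x, h=1e-6):
    """Central-difference estimate of the gradient, one coordinate at a time."""
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = h
        grad[i] = (func(x + step) - func(x - step)) / (2.0 * h)
    return grad


point = np.array([0.5, -0.3])  # arbitrary sample position (x1, x2)
print("analytic :", grad_f(point))
print("numerical:", numerical_grad(f, point))
```

The two results should agree to several decimal places, which is a common sanity check before trusting a hand-derived gradient.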

Takeaway: At a specific point $x$, the gradient $\nabla_xf(x)$ points in the direction in which the value of the function increases most rapidly.

The magnitude of the gradient (represented by the length of the vector) illustrates the steepness of this increase.
If we move $x$ a small step along the direction of the gradient, the value of $f$ increases. Conversely, if we move $x$ a small step in the direction opposite to the gradient, the value of $f$ decreases.
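These two statements can be verified with a single step in each direction. The snippet below is a sketch under the assumption that the function is $f(x_1, x_2)=x_1e^{-(x_1^2+x_2^2)}$; the starting point and `step_size` are hypothetical values chosen only for illustration.

```python
import numpy as np


def f(x):
    """Assumed example function: f(x1, x2) = x1 * exp(-(x1^2 + x2^2))."""
    x1, x2 = x
    return x1 * np.exp(-(x1**2 + x2**2))


def grad_f(x):
    """Analytic gradient of f."""
    x1, x2 = x
    e = np.exp(-(x1**2 + x2**2))
    return np.array([(1.0 - 2.0 * x1**2) * e, -2.0 * x1 * x2 * e])


x = np.array([0.2, 0.4])  # hypothetical starting point
step_size = 0.05          # hypothetical small step
g = grad_f(x)

f_here = f(x)
f_up = f(x + step_size * g)    # step along the gradient
f_down = f(x - step_size * g)  # step against the gradient

print(f"gradient magnitude: {np.linalg.norm(g):.4f}")
print(f"f(x)               = {f_here:.5f}")
print(f"f(x + step * grad) = {f_up:.5f}")
print(f"f(x - step * grad) = {f_down:.5f}")
```

Repeating the step against the gradient, each time from the newly reached point, is the basic idea behind Gradient Descent.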