When dealing with complex functions where closed-form solutions for optimization are not available or feasible to derive, iterative methods are employed. These methods incrementally approach the optimum through repeated adjustments based on the function's derivative information. A common iterative technique is the Gradient Descent method.

Gradient in 1D

Denoted as $\nabla$, the gradient represents how a function $y = f(\mathbf{x})$ changes with respect to its variables $\mathbf{x}$. In a single-variable case, it simplifies to the derivative with respect to $x$:

$$ \nabla_xf(x)= \frac{d{f}}{d{x}} $$

The symbol $d$ denotes an ordinary differential, and $\frac{d{f}}{d{x}}$ denotes the derivative of $f$ with respect to $x$. In engineering terms, the derivative is the rate at which $f$ changes as the change in $x$ becomes infinitesimally small, which is expressed as a limit:

$$ \frac{df}{dx} = \lim_{\Delta x \to 0} \frac{f(x + \Delta x) - f(x)}{\Delta x} $$

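As the simple example below shows, always keep in mind that the derivative is defined in the limit $\Delta x \to 0$; for any finite $\Delta x$, the difference quotient is only an approximation.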
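To make the limit concrete, here is a minimal sketch that evaluates the difference quotient for shrinking values of $\Delta x$. The function `square` and the helper `difference_quotient` are hypothetical names chosen for illustration, not part of the original text; the point $x = 3$ is arbitrary.

```python
def difference_quotient(f, x, dx):
    """Return the forward difference quotient (f(x + dx) - f(x)) / dx."""
    return (f(x + dx) - f(x)) / dx


def square(x):
    # Hypothetical test function; its exact derivative is 2 * x.
    return x ** 2


# Evaluate the quotient at x = 3 for progressively smaller steps.
step_sizes = [1.0, 0.1, 0.001]
estimates = [difference_quotient(square, 3.0, dx) for dx in step_sizes]

# Distance of each estimate from the exact derivative 2 * 3 = 6.
errors = [abs(estimate - 6.0) for estimate in estimates]

for dx, estimate in zip(step_sizes, estimates):
    print(f"dx = {dx}: quotient = {estimate:.6f}")
```

The estimates move toward 6 as the step shrinks, which is the behaviour the limit describes.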


Gradient Visualization

For a function of two variables, that is, a function $f$ with two input parameters $x_1$ and $x_2$, the gradient is expressed as a vector of the function's partial derivatives with respect to each adjustable parameter:

$$ \nabla_xf(x)=\begin{bmatrix} \frac{\partial{f}}{\partial{x_1}}\\\frac{\partial{f}}{\partial{x_2}} \end{bmatrix} $$
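As a rough illustration of this vector, the sketch below approximates each partial derivative with a central difference, varying one input while holding the other fixed. The helper `numerical_gradient`, the step `h`, and the function `example` are assumptions made for this example only.

```python
def numerical_gradient(f, x1, x2, h=1e-5):
    """Approximate [df/dx1, df/dx2] with central differences."""
    # Vary x1 while holding x2 fixed, then the other way round.
    df_dx1 = (f(x1 + h, x2) - f(x1 - h, x2)) / (2 * h)
    df_dx2 = (f(x1, x2 + h) - f(x1, x2 - h)) / (2 * h)
    return [df_dx1, df_dx2]


def example(x1, x2):
    # Hypothetical function with exact gradient [2*x1 + 3*x2, 3*x1].
    return x1 ** 2 + 3 * x1 * x2


gradient = numerical_gradient(example, 1.0, 2.0)
print(gradient)  # close to the exact gradient [8, 3]
```

A central difference is used here instead of the one-sided quotient because it is usually more accurate for the same step size; either choice would illustrate the idea.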

The symbol $\partial$ denotes a partial derivative: we control two variables, $x_1$ and $x_2$, that both affect the value of $f$, and each partial derivative measures the change along one variable while the other is held fixed. The figure below shows the function $y=f(x_1, x_2)=x_1e^{-(x_1^2+x_2^2)}$, with its gradients visualized in both two and three dimensions. The third dimension $y$ is represented through both height and color: brighter colors signify higher $y$ values, while darker shades indicate lower $y$ values.


In this example, a heatmap on the right side visualizes the gradient vectors $[\frac{\partial{f}}{\partial{x_1}}, \frac{\partial{f}}{\partial{x_2}}]$ as arrows at various positions $(x_1, x_2)$. Near the deep blue area, the arrows (i.e., the gradient vectors) point outward, because the function values rise in every direction away from that low region. Near the light yellow area, the arrows point inward, because the function values rise toward that high region. Mapped to the 3D space, the arrow at any given position $(x_1, x_2)$ shows the direction in which the function $y$ increases the most.
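The outward and inward arrows can be checked numerically. The sketch below assumes the plotted function is $f(x_1, x_2) = x_1 e^{-(x_1^2+x_2^2)}$ (reading the leading factor as $x_1$) and uses its hand-derived gradient; the helper `alignment` and the two sample points are hypothetical choices for illustration.

```python
import math


def f(x1, x2):
    return x1 * math.exp(-(x1 ** 2 + x2 ** 2))


def gradient(x1, x2):
    """Analytic gradient [df/dx1, df/dx2] of f."""
    decay = math.exp(-(x1 ** 2 + x2 ** 2))
    return [(1 - 2 * x1 ** 2) * decay, -2 * x1 * x2 * decay]


def alignment(point, extremum):
    """Dot product of the gradient at `point` with the offset from `extremum`.

    Positive: the arrow points away from the extremum (outward).
    Negative: the arrow points toward the extremum (inward).
    """
    g = gradient(*point)
    offset = [point[0] - extremum[0], point[1] - extremum[1]]
    return g[0] * offset[0] + g[1] * offset[1]


# Setting the gradient to zero gives x2 = 0 and x1 = +/- 1/sqrt(2).
minimum = (-1 / math.sqrt(2), 0.0)  # lowest point of f (darkest region)
maximum = (1 / math.sqrt(2), 0.0)   # highest point of f (brightest region)

near_minimum = alignment((-0.9, 0.3), minimum)
near_maximum = alignment((0.9, 0.3), maximum)
print(f(*minimum), f(*maximum))
print(near_minimum > 0, near_maximum < 0)
```

In both cases the arrow points toward larger function values; it only looks "outward" or "inward" depending on whether the nearby extremum is a minimum or a maximum.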

Takeaway: At a specific point $x$, the gradient $\nabla_x f(x)$ points in the direction in which the value of the function increases most rapidly.
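One way to test the takeaway is a brute-force search: take a small step of fixed length in many directions and see which one raises the function the most. The sketch below does this for a hypothetical function whose gradient is easy to write by hand; the function, the point $(1, 1)$, the step length, and the 360-direction grid are all assumptions for illustration.

```python
import math


def f(x1, x2):
    # Hypothetical function with exact gradient [2 * x1, 3].
    return x1 ** 2 + 3 * x2


point = (1.0, 1.0)
grad = (2 * point[0], 3.0)
grad_norm = math.hypot(*grad)

# Try a small step of fixed length in 360 evenly spaced directions
# and record which one raises f the most.
step = 1e-3
best_gain, best_direction = -math.inf, None
for degree in range(360):
    angle = math.radians(degree)
    direction = (math.cos(angle), math.sin(angle))
    gain = f(point[0] + step * direction[0],
             point[1] + step * direction[1]) - f(*point)
    if gain > best_gain:
        best_gain, best_direction = gain, direction

# Cosine of the angle between the best direction and the gradient.
cosine = (best_direction[0] * grad[0] + best_direction[1] * grad[1]) / grad_norm
print(f"cosine with gradient direction: {cosine:.5f}")
```

A cosine close to 1 means the best direction found by the search lines up with the gradient, up to the one-degree resolution of the grid.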