The earlier example of a linear model with a linear constraint was chosen mainly to keep the illustration of the long division algorithm simple. Division itself isn't overly complex, but there are more efficient numerical methods for solving such problems, notably gradient-based algorithms.

To apply a gradient-based method, we first recast the division problem $\frac{5}{9}$ as another optimization problem, specifically an unconstrained convex optimization problem, as shown below:

$$ \arg\min_{x} \quad (5 - 9x)^2 $$

Simple Proof: The square function never produces a negative value, so the smallest value $(5 - 9x)^2$ can take is 0, and it is reached exactly when $5 - 9x=0$. In other words, the $x$ that minimizes $(5 - 9x)^2$ must satisfy $5 - 9x=0$, which gives $x = \frac{5}{9}$.
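The same conclusion can be checked with a little calculus. The short derivation below is a supplementary sketch, not part of the original argument: it sets the first derivative to zero and uses the second derivative to confirm that the stationary point is a minimum.

$$ \frac{d}{dx}(5 - 9x)^2 = -18(5 - 9x) = 0 \;\Longrightarrow\; x = \frac{5}{9}, \qquad \frac{d^2}{dx^2}(5 - 9x)^2 = 162 > 0 $$

Because the second derivative is positive everywhere, the function is convex and $x = \frac{5}{9}$ is its unique minimizer.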

Gradient descent is a common optimization technique for unconstrained problems. The algorithm follows a simple principle:

As you move in the opposite direction to the gradient of the function, the value of the function decreases.
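To make the principle concrete, here is a minimal sketch of gradient descent applied to $(5 - 9x)^2$. The function names, the starting point `x0`, the `learning_rate` of 0.01, and the step count are illustrative assumptions rather than values taken from the text; the derivative $-18(5 - 9x)$ follows from the chain rule.

```python
def objective(x):
    """Objective from the text: (5 - 9x)^2."""
    return (5 - 9 * x) ** 2


def objective_grad(x):
    """Derivative of the objective by the chain rule: -18 * (5 - 9x)."""
    return -18 * (5 - 9 * x)


def gradient_descent(x0=0.0, learning_rate=0.01, steps=200):
    """Repeatedly step against the gradient (all defaults are assumed values)."""
    x = x0
    for _ in range(steps):
        # Moving opposite to the gradient decreases the objective.
        x = x - learning_rate * objective_grad(x)
    return x


x_min = gradient_descent()
print(x_min, 5 / 9)  # the two values should agree closely
```

For this particular objective the step size matters: each update multiplies the distance to $\frac{5}{9}$ by $1 - 162 \cdot \text{learning\_rate}$, so a learning rate above roughly 0.0123 would make the iterates diverge instead of converge.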

What is a Gradient?

A gradient, denoted as $\nabla$, represents how a function $f(x, y)$ changes with respect to its adjustable variables, such as $x$ and $y$. In the two-variable example, it is typically expressed as a vector of the function's derivatives with respect to each adjustable variable, like this:

$$ \nabla_f=\begin{bmatrix} \frac{\partial{f}}{\partial{x}}\\\frac{\partial{f}}{\partial{y}} \end{bmatrix} $$

The symbol $\partial$ denotes a partial derivative, used here because two variables, $x$ and $y$, affect the value of $f$. When there is only one variable to adjust, the partial derivative reduces to an ordinary derivative, written with $d$. In engineering terms, a derivative is the rate at which the function changes as the change in its input becomes infinitesimally small, which is defined as a limit:

$$ \frac{df}{dx} = \lim_{\epsilon \to 0} \frac{f(x + \epsilon) - f(x)}{\epsilon} \approx \frac{\Delta f}{\Delta x} $$
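The approximation on the right-hand side can be tried numerically. The sketch below uses a forward difference with a small, finite $\epsilon$; the helper name `forward_difference`, the choice `eps=1e-6`, and the test point $x = 1$ are assumptions made for illustration.

```python
def forward_difference(f, x, eps=1e-6):
    """Approximate df/dx at x as (f(x + eps) - f(x)) / eps for a small eps."""
    return (f(x + eps) - f(x)) / eps


def squared_residual(x):
    """The objective used earlier in the text: (5 - 9x)^2."""
    return (5 - 9 * x) ** 2


approx_slope = forward_difference(squared_residual, 1.0)
exact_slope = -18 * (5 - 9 * 1.0)  # analytic derivative at x = 1
print(approx_slope, exact_slope)
```

A smaller `eps` reduces the truncation error of the approximation, but making it too small amplifies floating-point rounding error, so in practice a moderate value is used.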


Gradient Visualization

The figure below shows the function $z=f(x, y)=xe^{-(x^2+y^2)}$, with its gradients visualized in both two and three dimensions. The third dimension $z$ is represented through both height and color: brighter colors signify higher $z$ values, while darker shades indicate lower $z$ values.

[Figure: the function $z=f(x, y)$, with a heatmap of its gradients on the right]

In this example, a heatmap is placed on the right side to visualize the gradients $[\frac{\partial{f}}{\partial{x}}, \frac{\partial{f}}{\partial{y}}]$ at various positions $(x, y)$. On the heatmap, you can see that near the deep blue areas (low $z$), the arrows (i.e., the gradient vectors) point outward, away from the minimum, because the function values go up in every direction leading away from it. Near the light yellow areas (high $z$), the arrows point inward, toward the maximum, which is again the direction in which the function values increase. When we map this to the 3D space, the arrow at any given $(x, y)$ position shows the direction in which $z$ increases the most.
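The arrow directions described above can be reproduced without a plot. The sketch below evaluates the analytic gradient of $f(x, y)=xe^{-(x^2+y^2)}$ at a few points along the $x$-axis; the function names and sample points are illustrative choices. The partial derivatives follow from the product and chain rules, and setting them to zero puts the minimum at $x=-\frac{1}{\sqrt{2}}$ and the maximum at $x=\frac{1}{\sqrt{2}}$, both with $y=0$.

```python
import math


def surface(x, y):
    """The function from the text: z = x * exp(-(x^2 + y^2))."""
    return x * math.exp(-(x**2 + y**2))


def surface_grad(x, y):
    """Analytic gradient [df/dx, df/dy], derived with the product and chain rules."""
    e = math.exp(-(x**2 + y**2))
    return ((1 - 2 * x**2) * e, -2 * x * y * e)


x_low = -1 / math.sqrt(2)   # location of the minimum (dark region)
x_high = 1 / math.sqrt(2)   # location of the maximum (bright region)

# Sample points on either side of the minimum and the maximum, along y = 0.
for point in [(-1.0, 0.0), (-0.5, 0.0), (0.5, 0.0), (1.0, 0.0)]:
    gx, gy = surface_grad(*point)
    print(f"point={point}  df/dx={gx:+.3f}  df/dy={gy:+.3f}")
```

The sign of `df/dx` tells the story: it is negative at $x=-1$ and positive at $x=-0.5$, so both arrows point away from the minimum, while it is positive at $x=0.5$ and negative at $x=1$, so both arrows point toward the maximum.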