Finite difference gradient approximation is a numerical technique for estimating the gradient of a function when an analytical derivative is not available or is too costly to compute. It works by perturbing the input slightly and observing the corresponding change in the function's output.

Unlike methods that rely on symbolic or automatic differentiation, this approach uses the central difference formula:

$$ f'(x) \approx \frac{f(x + h) - f(x - h)}{2h} $$
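
As a quick illustration of the formula (the function, point, and step size below are arbitrary choices for demonstration, not part of the solver developed later):

def g(x):
    return x**2  # exact derivative is 2*x

x, h = 3.0, 1e-4
approx = (g(x + h) - g(x - h)) / (2 * h)
print(approx)  # close to 6.0, the exact derivative of x**2 at x = 3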

Finite Difference Gradient Approximation Design

When dealing with functions that are not easily differentiable, or that behave as black boxes whose derivatives we cannot evaluate directly, gradient approximation becomes an invaluable tool.

In essence, the method "looks" to the left and right of the current point to determine which direction offers a more promising descent. There are a few factors to consider: the step size h trades truncation error (h too large) against floating-point cancellation (h too small), and each gradient estimate costs two function evaluations. The sketch below illustrates the step-size trade-off.
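
A minimal sketch of that trade-off, assuming the test function sin(x) with known derivative cos(x); the specific step sizes are illustrative choices, not recommendations:

import math

def central_difference(func, x, h):
    # Central difference approximation of the derivative at x
    return (func(x + h) - func(x - h)) / (2 * h)

x0 = 1.0
exact = math.cos(x0)  # exact derivative of sin at x0

for h in (1e-1, 1e-4, 1e-8, 1e-12):
    approx = central_difference(math.sin, x0, h)
    # Large h: truncation error dominates; tiny h: rounding/cancellation dominates
    print(f"h = {h:.0e}, abs error = {abs(approx - exact):.2e}")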

Have a look at this Wikipedia page:

https://en.wikipedia.org/wiki/Finite_difference

Code

import torch
import matplotlib.pyplot as plt

class FiniteDifferenceSolver:
    def __init__(self, func, h=1e-4):
        self.func = func  # scalar function to differentiate
        self.h = h        # finite difference step size

    def gradient(self, x):
        # Central difference approximation of f'(x)
        # Ensure x is a float
        x = float(x)
        f_plus = self.func(torch.tensor(x + self.h))
        f_minus = self.func(torch.tensor(x - self.h))
        grad = (f_plus - f_minus) / (2 * self.h)
        return grad.item()

    def step(self, x, learning_rate):
        # One gradient descent update using the approximate gradient
        grad = self.gradient(x)
        x_new = x - learning_rate * grad
        return x_new, self.func(torch.tensor(x_new)), grad

# Example function to minimize
def f(x):
    # Example: A cubic function
    return x**3 - 6*x**2 + 9*x + 1

# Create an instance of FiniteDifferenceSolver
solver = FiniteDifferenceSolver(f, h=1e-4)

# Parameters for optimization
x_start = 5.0   # start to the right of the local maximum so descent heads toward the minimum at x = 3
learning_rate = 0.002
max_iter = 50
tolerance = 1e-6

# Lists to store optimization path for visualization
x_values = [x_start]
f_values = [f(torch.tensor(x_start)).item()]

x_current = x_start

# Optimization loop using finite difference gradient approximation
for i in range(max_iter):
    x_new, f_new, grad = solver.step(x_current, learning_rate)
    x_values.append(x_new)
    f_values.append(f_new.item())
    
    # Print iteration details
    print(f"Iteration {i+1}: x = {x_new:0.6f}, f(x) = {f_new.item():0.6f}, grad = {grad:0.6f}")
    
    # Check for convergence using the gradient magnitude
    if abs(grad) < tolerance:
        print("Convergence achieved.")
        break
    
    x_current = x_new

# Plotting the function and optimization path
x_plot = torch.linspace(min(x_values)-1, max(x_values)+1, 400)
f_plot = f(x_plot)  # evaluate the cubic element-wise on the tensor

plt.figure(figsize=(12, 3))
plt.plot(x_plot.numpy(), f_plot.numpy(), label='f(x)')
plt.scatter(x_values, f_values, color='red', label='Optimization Path')
plt.title("Finite Difference Gradient Descent")
plt.xlabel("x")
plt.ylabel("f(x)")
plt.legend()
plt.grid(True)
plt.show()
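
The solver above handles a single scalar variable. The same idea extends to multivariate functions by perturbing one coordinate at a time; the sketch below is a minimal illustration under that assumption (a 1-D torch tensor input and a scalar-valued function), not part of the FiniteDifferenceSolver class.

import torch

def finite_difference_gradient(func, x, h=1e-4):
    # x: 1-D torch tensor; func: maps a tensor to a scalar
    grad = torch.zeros_like(x)
    for i in range(x.numel()):
        e = torch.zeros_like(x)
        e[i] = h
        # Central difference along coordinate i
        grad[i] = (func(x + e) - func(x - e)) / (2 * h)
    return grad

# Example: f(x, y) = x**2 + 3*y**2 has gradient (2x, 6y)
g = finite_difference_gradient(lambda v: v[0]**2 + 3 * v[1]**2, torch.tensor([1.0, 2.0]))
print(g)  # approximately tensor([2., 12.])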
