In this section, we explore using the somewhat 'old-fashioned' symbolic computation framework SymPy to calculate gradients and implement gradient descent for a simple neural network.
Symbolic computation is a branch of computing that manipulates mathematical expressions in a symbolic form rather than as numerical approximations. It involves the use of algorithms and data structures to perform tasks such as differentiation, integration, equation solving, and simplification, all while preserving the exactness of the mathematical expressions.
Unlike numerical methods that provide approximate solutions, symbolic computation retains the formal structure of mathematical entities, making it especially useful in areas where precision is paramount or where general formulas and theoretical analyses are needed.
Consider the task of integrating the function $\sin^2(x)$. Using a symbolic computation tool like SymPy, we can obtain the exact antiderivative, shown below and in the left panel of the figure:
$$ \int_a^b \sin^2(x)\, dx = \left[\frac{x}{2} - \frac{\sin(2x)}{4}\right]_a^b. $$
Symbolic computation retains this exact form, so once you substitute the limits of integration for $x$, you obtain an exact expression for the integral.
In contrast, as shown in the upper-right panel of the figure, a zeroth-order hold approach approximates $\sin^2(x)$ as a piecewise constant function, resulting in only an approximate integral over any interval.
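To make this contrast concrete, here is a small SymPy sketch; the interval $[0, 1]$ and the 20-interval piecewise-constant approximation are chosen purely for illustration and are not tied to the figure.
import numpy as np
import sympy as sp

x = sp.symbols('x')

# Symbolic: an exact antiderivative and the exact definite integral over [0, 1]
antiderivative = sp.integrate(sp.sin(x)**2, x)       # equivalent to x/2 - sin(2x)/4
exact = sp.integrate(sp.sin(x)**2, (x, 0, 1))        # an exact symbolic expression

# Numerical: zeroth-order hold, i.e., treat sin^2(x) as piecewise constant
n_steps = 20
left_endpoints = np.linspace(0.0, 1.0, n_steps, endpoint=False)
width = 1.0 / n_steps
approx = np.sum(np.sin(left_endpoints)**2 * width)   # only an approximate value

print(antiderivative)        # the exact antiderivative
print(exact, float(exact))   # the exact value and its decimal form
print(approx)                # the piecewise-constant approximation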
Note that we do not need to discuss backpropagation here, since the symbolic framework automatically computes the gradients with respect to the parameters internally (regardless of whether it does so via backpropagation). Our focus is solely on the gradient calculation and descent steps.
In Python, sympy is a widely used symbolic mathematics library. It is used for creating symbolic variables and expressions, performing symbolic differentiation, and converting these symbolic expressions into numerical functions (via lambdify) that can be used in the training loop.
import numpy as np
import matplotlib.pyplot as plt
import sympy as sp
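Before building the network, here is a minimal toy sketch of the workflow we rely on below: define a symbol, differentiate a one-parameter quadratic 'loss' exactly, turn both expressions into numerical functions with lambdify, and run plain gradient descent. The quadratic loss and the learning rate of 0.1 are illustrative only and are unrelated to the classification example that follows.
# Toy example: minimize (w - 3)^2 using a symbolic gradient and lambdify
w = sp.symbols('w')
toy_loss = (w - 3)**2
toy_grad = sp.diff(toy_loss, w)              # exact derivative: 2*(w - 3)

loss_fn = sp.lambdify(w, toy_loss, 'numpy')  # numerical callables
grad_fn = sp.lambdify(w, toy_grad, 'numpy')

w_val = 0.0
for _ in range(50):                          # gradient descent steps
    w_val -= 0.1 * grad_fn(w_val)
print(w_val, loss_fn(w_val))                 # w_val approaches 3, loss approaches 0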
We consider two classes. For each class, we generate 100 samples with 2 features drawn from a standard normal distribution, with the clusters centered at (0, 0) for the first class and (3, 3) for the second. We then assign labels so that the first 100 samples belong to class 0 and the next 100 samples belong to class 1.
# Sets the seed for NumPy's random number generator. This ensures that the randomly generated data is reproducible.
np.random.seed(42)
n_samples = 100
# Create two Gaussian clusters: one around (0,0) and one around (3,3)
X1 = np.random.randn(n_samples, 2) + np.array([0, 0])
X2 = np.random.randn(n_samples, 2) + np.array([3, 3])
X = np.vstack([X1, X2])
y_data = np.array([0] * n_samples + [1] * n_samples)
Here, we visualize the data points to check that the data distribution looks as intended.
# Plot the data
plt.figure(figsize=(6, 5))
plt.scatter(X[:, 0], X[:, 1], c=y_data, cmap='bwr', edgecolor='k')
plt.title("2D Data for Binary Classification")
plt.xlabel("x1")
plt.ylabel("x2")
plt.show()
Below, symbolic variables (x1, x2, y_sym) represent the two input features and the target output, allowing for mathematical expressions that define the neural network operations.
The symbolic parameters include the hidden layer parameters: w11, w12, and b1 for the first neuron, and w21, w22, and b2 for the second neuron. The output layer parameters consist of v1 and v2, which are the weights connecting the hidden neurons to the output, along with b3 as the bias for the output neuron. The sigmoid function applies non-linear activation to the neuron outputs.
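Putting these pieces together, the forward pass we will express symbolically is the following (assuming the indexing convention that the weight $w_{ij}$ connects input $x_j$ to hidden neuron $i$):
$$ h_1 = \sigma(w_{11} x_1 + w_{12} x_2 + b_1), \qquad h_2 = \sigma(w_{21} x_1 + w_{22} x_2 + b_2), $$
$$ \hat{y} = \sigma(v_1 h_1 + v_2 h_2 + b_3), \qquad \text{where } \sigma(z) = \frac{1}{1 + e^{-z}}. $$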
# Define symbolic variables for inputs and target
x1, x2, y_sym = sp.symbols('x1 x2 y')
# Define symbolic parameters for the hidden layer (2 neurons) and output layer
w11, w12, b1 = sp.symbols('w11 w12 b1')
w21, w22, b2 = sp.symbols('w21 w22 b2')
v1, v2, b3 = sp.symbols('v1 v2 b3')
# Define the sigmoid activation function
def sigmoid(z):
    return 1 / (1 + sp.exp(-z))
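As a quick, optional sanity check (not needed for the rest of the example), SymPy can differentiate this symbolic sigmoid exactly and recover the familiar identity $\sigma'(z) = \sigma(z)\,(1 - \sigma(z))$:
# Optional check: differentiate the symbolic sigmoid and verify the identity
z = sp.symbols('z')
s = sigmoid(z)
ds_dz = sp.diff(s, z)
print(sp.simplify(ds_dz - s * (1 - s)))   # prints 0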