The essence of building a neural network lies in its training process: adjusting the model's parameters based on a training dataset so that the model's outputs come as close as possible to the desired ones.

3 Pillars

The primary aim of training is to find the optimal model $f$ so that, for each input $x$, the prediction $\hat{y} = f(x)$ is as close as possible to the true label $y$.


This explanation focuses on the training aspect of supervised learning. The three core elements of machine learning are:

  - the model $f$, which maps an input $x$ to a prediction $\hat{y}$;
  - the loss function $L$, which measures how far $\hat{y}$ is from the true label $y$;
  - the optimization algorithm, which adjusts the model's parameters to reduce the loss.
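To make the three elements concrete, here is a minimal sketch for a one-parameter linear model; the function names (`predict`, `mse_loss`, `gd_step`) are illustrative choices for this sketch, not a standard API.

```python
def predict(theta, x):
    """Model f: map an input x to a prediction y_hat."""
    return theta * x

def mse_loss(y_hat, y):
    """Loss L: squared error between prediction and true label."""
    return 0.5 * (y_hat - y) ** 2

def gd_step(theta, grad, eta=0.1):
    """Optimization: one gradient-descent update of the parameter."""
    return theta - eta * grad
```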

Training Process in Machine Learning

The training process in machine learning is a systematic approach to refining the model $f$ so that its predictions $\hat{y}$ closely match the actual labels $y$. This process can be broken down into several key steps:

  1. Initialize Parameters: Start with initial guesses for the parameters of the model $f$. These parameters might be set randomly or according to a specific initialization rule.
  2. Forward Propagation: For each training data point, input $x$, compute the predicted output $\hat{y} = f(x)$. This step involves passing the data forward through the model (e.g., through the layers of a neural network).
  3. Calculate Loss: Compute the loss $J = L(\hat{y}, y)$, which measures the discrepancy between the predicted output $\hat{y}$ and the actual label $y$. The loss function $L$ quantifies how well the model is performing; the lower the loss, the better the model's predictions.
  4. Backward Propagation: Calculate the gradient of the loss function with respect to each parameter of the model. This involves applying the chain rule to find $\nabla_{\theta} J$, where $\theta$ represents the parameters of $f$.
  5. Update Parameters: Adjust the parameters $\theta$ of the model using the gradients computed in the previous step. This is done with an optimization algorithm such as gradient descent, where $\theta \leftarrow \theta - \eta \nabla_{\theta} J$. Here, $\eta$ is the learning rate, a small positive scalar that determines the step size.
  6. Iterate: Repeat steps 2 through 5 until the change in loss falls below a predetermined threshold or a fixed number of iterations is reached. Each complete pass through all the training data is called an epoch. A sketch of the full loop follows this list.
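Putting the six steps together, here is a minimal sketch of the loop for a linear model trained with mean squared error and plain gradient descent; the dataset, model form, and hyperparameter values are illustrative assumptions, not part of the original text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data (an assumption for this sketch): y ≈ 3x plus noise.
X = rng.uniform(-1.0, 1.0, size=(100, 1))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=100)

# Step 1: initialize parameters (randomly here).
theta = rng.normal(size=1)

eta = 0.1        # learning rate, fixed before training
n_epochs = 50    # iteration budget, fixed before training

for epoch in range(n_epochs):
    # Step 2: forward propagation -- y_hat = f(x).
    y_hat = X @ theta
    # Step 3: calculate the loss J = L(y_hat, y), here mean squared error.
    J = np.mean((y_hat - y) ** 2)
    # Step 4: backward propagation -- gradient of J with respect to theta.
    grad = 2.0 / len(y) * (X.T @ (y_hat - y))
    # Step 5: update parameters by gradient descent.
    theta = theta - eta * grad
# Step 6 is the loop itself: stop after the epoch budget (or, in practice,
# when the change in J falls below a threshold).

print("learned theta:", theta)  # should approach 3.0
```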

Overall, the model $f$ is trained to minimize the loss $J = L(\hat{y}, y)$, which improves the accuracy of its predictions $\hat{y}$ for given inputs $x$.
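As a concrete instance of this minimization (an assumed single-parameter example, using the same notation as above), take $\hat{y} = \theta x$ with squared-error loss; the chain rule in step 4 then gives the gradient directly:

$$
J = L(\hat{y}, y) = \tfrac{1}{2}(\hat{y} - y)^2,
\qquad
\nabla_{\theta} J = \frac{\partial J}{\partial \hat{y}} \frac{\partial \hat{y}}{\partial \theta} = (\hat{y} - y)\,x,
$$

so the update in step 5 becomes $\theta \leftarrow \theta - \eta\,(\hat{y} - y)\,x$.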

Note: During training, the model structure, hyperparameters, loss function, and optimization algorithm are typically set beforehand and remain constant throughout the training phase.
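As a small sketch of that convention (names and values here are illustrative assumptions), these choices could be gathered once before the loop and never modified inside it:

```python
# Fixed before training; nothing here changes during the training loop.
config = {
    "loss": "mean_squared_error",     # loss function L
    "optimizer": "gradient_descent",  # parameter-update rule
    "learning_rate": 0.1,             # eta
    "n_epochs": 50,                   # iteration budget
}
```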