The essence of building a neural network lies in its training process, which adjusts the model parameters against a training dataset so that the output meets certain criteria.
- In supervised learning, the goal is to reduce the disparity between the model's outputs and the given labels.
- In unsupervised learning, the focus is on making outputs adhere to a specific probability distribution (as in clustering) or reducing information loss relative to the inputs (as in dimensionality reduction).
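In the supervised case, "disparity between the model's outputs and the given labels" is made concrete by a loss function. A minimal sketch (an assumed example, using mean squared error):

```python
def mse(predictions, labels):
    """Mean squared error: average squared gap between predictions and labels."""
    n = len(predictions)
    return sum((p - y) ** 2 for p, y in zip(predictions, labels)) / n

# A model whose predictions sit closer to the labels yields a smaller loss.
predictions = [2.5, 0.0, 2.1]
labels = [3.0, -0.5, 2.0]
loss = mse(predictions, labels)  # roughly 0.17 for these values
```

Training then amounts to driving this number down by adjusting the model's parameters.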
The Three Pillars of Machine Learning
The primary aim of training is to find the optimal model $f$ so that the prediction $\hat{y}$ is as close to the desired output as possible. This explanation focuses on the training aspect of supervised learning. The three core elements of machine learning are:
- Machine Learning Models: The architecture of machine learning models, denoted by $f$ and characterized by parameters $\theta$, varies based on the application. These models take input data $x$ and produce predictions $\hat{y}$.
- Loss Functions: The loss function $L$ computes the loss value $J$, which quantifies the difference between the predicted output $\hat{y}$ and the true label $y$. A lower value of $J$ indicates higher accuracy for the model $f$.
- Optimization Algorithm: This algorithm, symbolized by $\nabla$, adjusts the parameters of the model to minimize the loss value $J$, thereby enhancing the performance of the model.
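The three pillars can be sketched for a one-dimensional linear model (an assumed toy example; the parameter names and learning rate are arbitrary choices):

```python
# Pillar 1: the model f, characterized by parameters theta = (w, b).
def f(x, theta):
    w, b = theta
    return w * x + b

# Pillar 2: the loss function L, here squared error between prediction and label.
def L(y_hat, y):
    return (y_hat - y) ** 2

# Pillar 3: the optimization algorithm, here one gradient-descent step on theta.
def gradient_step(theta, x, y, eta=0.1):
    w, b = theta
    y_hat = f(x, theta)
    # By the chain rule: dJ/dw = 2*(y_hat - y)*x, dJ/db = 2*(y_hat - y).
    grad_w = 2 * (y_hat - y) * x
    grad_b = 2 * (y_hat - y)
    return (w - eta * grad_w, b - eta * grad_b)

theta = (0.0, 0.0)
x, y = 1.0, 2.0
loss_before = L(f(x, theta), y)   # 4.0 at the initial parameters
theta = gradient_step(theta, x, y)
loss_after = L(f(x, theta), y)    # smaller after one update
```

Each pillar is independent: the same optimizer works with a different model or loss, which is why the three are usually treated as separate components.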
Training Process in Machine Learning
The training process in machine learning is a systematic approach to refining the model $f$ so that its predictions $\hat{y}$ closely match the actual labels $y$. This process can be broken down into several key steps:
- Initialize Parameters: Start with initial guesses for the parameters of the model $f$. These parameters might be set randomly or according to a specific initialization rule.
- Forward Propagation: For each training data point, input $x$, compute the predicted output $\hat{y} = f(x)$. This step involves passing the data forward through the model (e.g., through the layers of a neural network).
- Calculate Loss: Compute the loss $J = L(\hat{y}, y)$, which measures the discrepancy between the predicted output $\hat{y}$ and the actual label $y$. The loss function $L$ quantifies how well the model is performing; the lower the loss, the better the model's predictions.
- Backward Propagation: Calculate the gradient of the loss function with respect to each parameter of the model. This involves applying the chain rule to find $\nabla_{\theta} J$, where $\theta$ represents the parameters of $f$.
- Update Parameters: Adjust the parameters $\theta$ of the model using the gradients computed in the previous step. This is done using an optimization algorithm, such as gradient descent, where $\theta \leftarrow \theta - \eta \nabla_{\theta} J$. Here, $\eta$ is the learning rate, a small positive scalar determining the step size.
- Iterate: Repeat steps 2 through 5 until a set number of iterations is reached or the change in loss falls below a predetermined threshold. Each complete pass through all training data is called an epoch.
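The steps above can be sketched as a minimal training loop for a one-dimensional linear model (an assumed example; the dataset, learning rate, and epoch count are arbitrary choices):

```python
# Tiny dataset generated from y = 2x + 1; the loop should recover w=2, b=1.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]

# Step 1: initialize parameters (here simply to zero).
w, b = 0.0, 0.0
eta = 0.05  # learning rate

for epoch in range(200):  # Step 6: iterate for a fixed number of epochs
    for x, y in data:
        # Step 2: forward propagation.
        y_hat = w * x + b
        # Step 3: calculate the loss J = (y_hat - y)^2 (squared error).
        J = (y_hat - y) ** 2
        # Step 4: backward propagation via the chain rule.
        grad_w = 2 * (y_hat - y) * x
        grad_b = 2 * (y_hat - y)
        # Step 5: update parameters: theta <- theta - eta * grad.
        w -= eta * grad_w
        b -= eta * grad_b

# After training, (w, b) is close to the generating values (2, 1).
```

Real frameworks automate steps 2–5 (automatic differentiation, batching, optimizers), but the control flow is the same loop shown here.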
Overall, the model $f$ is trained to minimize the loss function $J$, leading to improved accuracy in predicting the output $\hat{y}$ for given inputs $x$.
Note: During training, the model structure, hyperparameters, loss function, and optimization algorithm are typically set beforehand and remain constant throughout the training phase.