Although training minimizes the loss by adjusting the model parameters $\theta$, it does not guarantee the best solution, because hyperparameters, which strongly influence the model, must be set separately and cannot be adjusted during training.
Hyperparameters, including the number of layers, the number of hidden units per layer, and the learning rate, differ from trainable parameters in that they cannot be adjusted by gradient-based training, since the loss is not differentiable with respect to them. A few methods exist to optimize these hyperparameters, such as grid search or Bayesian optimization.
Note: Hyperparameters influence not only the loss but also overall task-specific performance (usually measured by metrics).
Hyperparameters are tuned through enumeration and validation, which serves as a super-loop over the training loop, with each validation iteration corresponding to a full training cycle.
The goal is to optimize task-specific metrics by selecting the best model structure, loss function, and optimization algorithm.
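For instance, a grid search can be written as an outer loop that launches a full training run per hyperparameter configuration and keeps the configuration with the best validation loss. The sketch below is only an illustration of that super-loop structure: the synthetic regression data, the tiny PyTorch MLP, the grid values, and the training budget are all assumptions, not recommendations.

```python
# A minimal sketch of hyperparameter search as a "super-loop" over training.
import itertools
import torch
import torch.nn as nn

# Synthetic regression data split into training and validation sets (illustrative).
torch.manual_seed(0)
x = torch.randn(512, 8)
y = x.sum(dim=1, keepdim=True) + 0.1 * torch.randn(512, 1)
x_train, y_train = x[:400], y[:400]
x_val, y_val = x[400:], y[400:]

def train_and_validate(hidden_units, learning_rate, epochs=50):
    """Inner loop: one full training run for a single hyperparameter configuration."""
    model = nn.Sequential(nn.Linear(8, hidden_units), nn.ReLU(),
                          nn.Linear(hidden_units, 1))
    optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = loss_fn(model(x_train), y_train)
        loss.backward()
        optimizer.step()
    # Validation: measure the trained configuration on held-out data.
    with torch.no_grad():
        return loss_fn(model(x_val), y_val).item()

# Outer (super-)loop: enumerate hyperparameter configurations by grid search.
grid = {"hidden_units": [16, 64], "learning_rate": [0.01, 0.001]}
best_config, best_val = None, float("inf")
for hidden_units, lr in itertools.product(grid["hidden_units"], grid["learning_rate"]):
    val_loss = train_and_validate(hidden_units, lr)
    print(f"hidden={hidden_units}, lr={lr}: validation loss {val_loss:.4f}")
    if val_loss < best_val:
        best_config, best_val = (hidden_units, lr), val_loss
print("Best configuration:", best_config)
```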
Validation is essential due to 1) the limitations of loss functions and 2) the risk of overfitting.
Loss functions, designed to be simple and differentiable for optimization, often fail to capture the complexities of the target task, potentially hindering model performance. Moreover, strong training results do not ensure effectiveness on unseen data, highlighting the critical role of validation.
Therefore, during the validation stage, we prioritize task-specific metrics when available, while also monitoring the relationship between training loss and validation loss to detect overfitting.
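A minimal sketch of how that monitoring might look in code, reusing model and data objects like those built in the grid-search sketch above: the loop records both losses each epoch and stops once the validation loss stops improving. The patience-based early-stopping rule is one common heuristic, shown here as an assumption rather than a fixed recipe.

```python
# A minimal sketch of tracking training vs. validation loss to detect overfitting.
import torch

def fit_with_monitoring(model, optimizer, loss_fn, train_data, val_data,
                        epochs=100, patience=5):
    x_tr, y_tr = train_data
    x_va, y_va = val_data
    best_val, epochs_without_improvement = float("inf"), 0
    for epoch in range(epochs):
        model.train()
        optimizer.zero_grad()
        train_loss = loss_fn(model(x_tr), y_tr)
        train_loss.backward()
        optimizer.step()

        model.eval()
        with torch.no_grad():
            val_loss = loss_fn(model(x_va), y_va).item()

        # A growing gap between training and validation loss signals overfitting.
        print(f"epoch {epoch}: train {train_loss.item():.4f}, val {val_loss:.4f}")
        if val_loss < best_val:
            best_val, epochs_without_improvement = val_loss, 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # stop before the model memorizes the training set
    return best_val
```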
Example: In image restoration tasks, training often uses mean squared error as the pixel-level loss function. Validation then assesses the model with image-quality metrics such as PSNR and SSIM. These metrics, which are typically not used directly as training objectives, evaluate the quality of the restored images for a specific configuration of model structure, hyperparameters, and optimization algorithm.
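A minimal sketch of such validation-time measurement, assuming scikit-image is available; the random arrays below merely stand in for a ground-truth image and a model's restored output.

```python
# Validation metrics for image restoration: MSE (the training loss) alongside
# PSNR and SSIM (reported at validation time).
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

rng = np.random.default_rng(0)
reference = rng.random((64, 64)).astype(np.float32)   # stand-in for the ground-truth image
restored = np.clip(
    reference + 0.05 * rng.standard_normal((64, 64)).astype(np.float32), 0.0, 1.0
)                                                      # stand-in for the model's output

# Training typically minimizes the pixel-wise MSE ...
mse = np.mean((reference - restored) ** 2)
# ... while validation reports image-quality metrics for the chosen configuration.
psnr = peak_signal_noise_ratio(reference, restored, data_range=1.0)
ssim = structural_similarity(reference, restored, data_range=1.0)
print(f"MSE {mse:.5f}, PSNR {psnr:.2f} dB, SSIM {ssim:.4f}")
```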
Testing-stage evaluation involves assessing model performance on fixed test data before deployment, typically without iteration, to evaluate both model loss and task performance. To avoid the overfitting and bias introduced by hyperparameter tuning during validation, the testing phase uses entirely new data (the testing dataset) that the model has never encountered, giving a reliable estimate of how the model will perform in practice.
During testing, either metrics or loss functions can be used to evaluate model performance; this example uses a metric function.
Note: Testing does not modify parameters or configurations and focuses solely on assessing performance without further optimization.
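A minimal sketch of the testing stage under these assumptions: the model and held-out test tensors would come from the earlier sketches, and the mean-absolute-error metric is purely illustrative.

```python
# Testing stage: a single pass over fixed test data with no parameter updates.
import torch

def test(model, metric_fn, x_test, y_test):
    model.eval()              # disable training-specific behaviour (dropout, etc.)
    with torch.no_grad():     # no gradients: nothing is optimized at this stage
        predictions = model(x_test)
        return metric_fn(predictions, y_test)

# Example metric: mean absolute error for the regression task sketched earlier.
mae = lambda pred, target: (pred - target).abs().mean().item()
# score = test(model, mae, x_test, y_test)   # test data is evaluated only once, at the end
```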
Deployment, or the runtime phase, is where a machine learning model is applied to real-world tasks to deliver practical solutions. Unlike the controlled environments of training and testing, deployment involves dynamic and often unpredictable conditions. Common serving frameworks include TensorFlow Serving and TorchServe.
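As one hedged illustration of the handoff from training code to a serving runtime, the sketch below exports a small PyTorch model to TorchScript. The model, the file name, and the assumption that a framework such as TorchServe would later package and load the saved artifact are all illustrative.

```python
# Exporting a trained model into a self-contained artifact for deployment.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))  # stand-in for a trained model
model.eval()

example_input = torch.randn(1, 8)
scripted = torch.jit.trace(model, example_input)   # freeze the computation graph
scripted.save("restoration_model.pt")              # artifact a serving framework can load

# At runtime, the exported artifact is loaded without any of the training code:
loaded = torch.jit.load("restoration_model.pt")
with torch.no_grad():
    print(loaded(example_input))
```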
Deploying deep learning models comes with unique challenges: