Training minimizes the loss by adjusting the model parameters $\theta$, but it does not guarantee the best solution: hyperparameters, which significantly influence the model, must be set separately and are not adjusted by the training procedure itself.

Hyperparameters, such as the number of layers, the number of hidden units per layer, the learning rate, and the number of training iterations, differ fundamentally from trainable parameters: they are not optimized during training, because the loss is generally not differentiable with respect to them. Instead, they are tuned through external optimization techniques, such as grid search or Bayesian optimization, to identify configurations that improve model performance.

Note: Hyperparameters influence not only the loss but also overall task-specific performance, which is usually measured by metrics.
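To make the distinction between loss and metric concrete, here is a minimal sketch for binary classification. The two sets of predicted probabilities (`model_a`, `model_b`) and the 0.5 decision threshold are assumptions chosen for illustration; they do not come from any real model.

```python
import math

def cross_entropy(probs, labels):
    """Mean binary cross-entropy; each prob is the predicted P(y = 1)."""
    eps = 1e-12  # clamp to avoid log(0)
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)

def accuracy(probs, labels, threshold=0.5):
    """Fraction of examples whose thresholded prediction matches the label."""
    preds = [1 if p >= threshold else 0 for p in probs]
    return sum(int(a == b) for a, b in zip(preds, labels)) / len(labels)

labels = [1, 1, 0, 0]
model_a = [0.55, 0.55, 0.45, 0.45]  # every prediction correct, but low confidence
model_b = [0.99, 0.99, 0.01, 0.60]  # one wrong prediction, the rest very confident

for name, probs in [("model_a", model_a), ("model_b", model_b)]:
    print(name, "loss:", round(cross_entropy(probs, labels), 4),
          "accuracy:", accuracy(probs, labels))
```

In this toy case `model_b` has the lower loss but also the lower accuracy (0.75 versus 1.0), which is one reason the task metric is tracked alongside the loss.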

Validation

Validation takes place during training, typically after each epoch or every few iterations, to evaluate model performance and inform hyperparameter tuning. Hyperparameter optimization, for example by enumerating candidate configurations, operates as an outer loop around the training process: each of its iterations is a complete training run.

The goal is to optimize task-specific metrics by selecting the best model structures, loss functions, and optimization algorithms.
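The nested structure described above can be sketched as follows. This is a toy example under stated assumptions, not a reference implementation: the model is a one-dimensional linear fit trained by plain gradient descent, the data are synthetic points on the line y = 2x + 1, and the candidate grid (`learning_rate`, `num_epochs`) is hypothetical.

```python
import itertools

def mse(w, b, data):
    """Mean squared error of the line y = w*x + b on (x, y) pairs."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def train(train_data, val_data, learning_rate, num_epochs):
    """Inner loop: gradient descent on the parameters (w, b).

    The validation loss is recorded after every epoch; it is only
    measured here and never used to compute gradients.
    """
    w, b = 0.0, 0.0
    n = len(train_data)
    val_history = []
    for _ in range(num_epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in train_data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in train_data) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
        val_history.append(mse(w, b, val_data))
    return (w, b), val_history

# Synthetic data (assumption): noiseless points on y = 2x + 1.
train_data = [(x, 2 * x + 1) for x in [0.0, 1.0, 2.0, 3.0]]
val_data = [(x, 2 * x + 1) for x in [0.5, 1.5, 2.5]]

# Outer loop: enumerate hyperparameter configurations (grid search).
# Each iteration is one complete training run.
best_config, best_params, best_val = None, None, float("inf")
for learning_rate, num_epochs in itertools.product([0.001, 0.01, 0.1], [10, 100]):
    params, val_history = train(train_data, val_data, learning_rate, num_epochs)
    if val_history[-1] < best_val:
        best_config = (learning_rate, num_epochs)
        best_params, best_val = params, val_history[-1]

print("best (learning_rate, num_epochs):", best_config)
print("validation MSE:", best_val)
```

The configuration is chosen by its final validation loss, not its training loss. For a real task the selection criterion would more likely be a task-specific metric, and the grid could also vary the model structure, loss function, or optimizer.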

Validation is essential due to (1) the limitations of loss functions and (2) the risk of overfitting.
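The overfitting risk can be illustrated by comparing training and validation loss over epochs. The loss values below are made-up numbers for illustration only, and `best_epoch` is a hypothetical helper; the sketch shows only how a validation curve exposes what the training curve hides.

```python
def best_epoch(val_losses):
    """Index of the epoch with the lowest validation loss."""
    return min(range(len(val_losses)), key=val_losses.__getitem__)

# Illustrative values (not measured): training loss keeps falling,
# while validation loss starts rising again after epoch 3.
train_losses = [0.90, 0.60, 0.40, 0.28, 0.20, 0.14, 0.10]
val_losses = [0.95, 0.70, 0.55, 0.50, 0.53, 0.60, 0.70]

chosen = best_epoch(val_losses)
gap_at_end = val_losses[-1] - train_losses[-1]

print("epoch with lowest validation loss:", chosen)
print("train/validation gap at the last epoch:", round(gap_at_end, 2))
```

Judged by training loss alone, the last epoch looks best; the validation loss points to an earlier one.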

Testing

Testing-stage evaluation assesses the trained model on fixed test data before deployment, typically in a single pass without iteration, and reports both the model loss and task performance. Because hyperparameter tuning can overfit to the validation data and bias its estimates, the testing phase uses entirely new data (the testing dataset) that the model has never encountered, which gives a less biased estimate of how the model will perform on unseen data.

Figure: testing-stage evaluation.

During testing, either metrics or loss functions can be used to evaluate model performance; this example uses a metric function.

Note: Testing does not modify parameters or configurations and focuses solely on assessing performance without further optimization.
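A minimal sketch of such a read-only evaluation is shown below. The linear threshold classifier, its parameter values, and the five test examples are all hypothetical; the point is only that `evaluate` reads the parameters and returns a metric without changing anything.

```python
import copy

def predict(params, x):
    """Hypothetical linear classifier: predict 1 if w*x + b >= 0, else 0."""
    return 1 if params["w"] * x + params["b"] >= 0 else 0

def accuracy(predictions, targets):
    """Fraction of predictions that equal their targets."""
    return sum(int(p == t) for p, t in zip(predictions, targets)) / len(targets)

def evaluate(params, dataset, metric_fn):
    """Single pass over the dataset: no gradients, no parameter updates."""
    predictions = [predict(params, x) for x, _ in dataset]
    targets = [y for _, y in dataset]
    return metric_fn(predictions, targets)

# Parameters as they came out of training (values assumed for illustration).
params = {"w": 1.0, "b": -2.0}

# Held-out test set of (input, label) pairs, never used for training or tuning.
test_data = [(0.5, 0), (1.0, 0), (2.5, 1), (3.0, 1), (1.8, 1)]

snapshot = copy.deepcopy(params)
test_accuracy = evaluate(params, test_data, accuracy)

print("test accuracy:", test_accuracy)
print("parameters unchanged:", params == snapshot)
```

Passing the metric as `metric_fn` keeps the evaluation loop the same whether a metric or a loss function is reported.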

Deployment or Runtime

Deployment, or the runtime phase, is where a machine learning model is applied to real-world tasks to deliver practical solutions. Unlike the controlled environments of training and testing, deployment involves dynamic and often unpredictable conditions. Serving frameworks such as TensorFlow Serving and TorchServe are commonly used in this phase.

Deploying deep learning models comes with unique challenges: