Adaptive Learning Rate - AdaGrad

Gradient Accumulation - Momentum

Learning Rate Schedules


Code - Optimization Comparison
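
A minimal sketch of what such a comparison might look like, covering the three topics listed above: plain gradient descent, momentum, AdaGrad, and a step-decay learning rate schedule. Everything here is an assumption made for illustration: the toy quadratic loss, the starting point, the hyperparameters, and the helper names (`loss`, `grad`, `step_decay`, `optimize`) are hypothetical and do not come from the original material.

```python
import math

# Toy problem (assumed for illustration): an ill-conditioned quadratic
# f(w) = 0.5 * (1 * w0^2 + 10 * w1^2), whose minimum is at w = (0, 0).
CURVATURE = (1.0, 10.0)


def loss(w):
    return 0.5 * sum(c * wi * wi for c, wi in zip(CURVATURE, w))


def grad(w):
    return [c * wi for c, wi in zip(CURVATURE, w)]


def step_decay(lr0, t, drop=0.5, every=50):
    """Learning rate schedule: multiply the base rate by `drop` every `every` steps."""
    return lr0 * drop ** (t // every)


def optimize(method, w0, lr, steps=100, mu=0.9, eps=1e-8, schedule=None):
    """Run one optimizer and return (final weights, loss history)."""
    w = list(w0)
    v = [0.0] * len(w)  # momentum: running accumulation of gradients
    G = [0.0] * len(w)  # AdaGrad: running sum of squared gradients
    history = [loss(w)]
    for t in range(steps):
        g = grad(w)
        lr_t = schedule(lr, t) if schedule else lr
        if method == "sgd":
            w = [wi - lr_t * gi for wi, gi in zip(w, g)]
        elif method == "momentum":
            # Accumulate past gradients, then step along the accumulated direction.
            v = [mu * vi + gi for vi, gi in zip(v, g)]
            w = [wi - lr_t * vi for wi, vi in zip(w, v)]
        elif method == "adagrad":
            # Per-coordinate rate: coordinates with large past gradients take smaller steps.
            G = [Gi + gi * gi for Gi, gi in zip(G, g)]
            w = [wi - lr_t * gi / (math.sqrt(Gi) + eps) for wi, gi, Gi in zip(w, g, G)]
        else:
            raise ValueError(f"unknown method: {method}")
        history.append(loss(w))
    return w, history


start = [5.0, 5.0]
results = {
    "sgd": optimize("sgd", start, lr=0.05),
    "sgd + step decay": optimize("sgd", start, lr=0.05, schedule=step_decay),
    "momentum": optimize("momentum", start, lr=0.05),
    "adagrad": optimize("adagrad", start, lr=1.0),
}

for name, (w, history) in results.items():
    print(f"{name:18s} initial loss {history[0]:.4f} -> final loss {history[-1]:.6f}")
```

The learning rates are not tuned against each other, so the printed losses only show that each method makes progress on this toy problem; they are not a ranking of the optimizers.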