Adaptive Learning Rate - AdaGrad
Gradient Accumulation - Momentum
Learning Rate Schedules
Code - Optimization Comparison