Community

Announcements

SSL’s paper has been accepted to EuroSys 2025!

JABAS: Joint Adaptive Batching and Automatic Scaling for DNN Training on Heterogeneous GPUs

We present JABAS, a novel DNN training system that combines adaptive batching and automatic scaling to accelerate training on heterogeneous GPUs without accuracy loss.
Built on the IIDP framework, JABAS ensures the same theoretical convergence rate as distributed SGD.
It dynamically adjusts batch sizes within an epoch and optimally scales GPU resources between epochs.
Evaluated on LLMs and other DNNs, JABAS achieves 33.3% faster training and 54.2% lower cost than state-of-the-art adaptive training methods.
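To make the two-level idea concrete, below is a minimal, hypothetical sketch of a training loop that grows the batch size within an epoch and re-plans GPU resources between epochs. The helper names (adjust_batch_size, plan_gpu_allocation) and the simple gradient-noise heuristic are illustrative assumptions, not the JABAS or IIDP API.

```python
# Illustrative sketch only: adaptive batching within an epoch,
# resource re-planning between epochs. Not the actual JABAS implementation.
from dataclasses import dataclass


@dataclass
class ClusterPlan:
    gpus: int          # number of GPUs to use in the next epoch
    batch_size: int    # global batch size to start that epoch with


def adjust_batch_size(batch_size: int, grad_noise: float, max_batch: int) -> int:
    """Within an epoch: grow the global batch size while gradients look noisy
    (a stand-in heuristic for adaptive batching)."""
    if grad_noise > 0.5 and batch_size * 2 <= max_batch:
        return batch_size * 2
    return batch_size


def plan_gpu_allocation(batch_size: int, per_gpu_capacity: int, max_gpus: int) -> ClusterPlan:
    """Between epochs: pick just enough GPUs to serve the current global batch
    (a stand-in for automatic scaling across heterogeneous GPUs)."""
    gpus = min(max_gpus, max(1, -(-batch_size // per_gpu_capacity)))  # ceiling division
    return ClusterPlan(gpus=gpus, batch_size=batch_size)


def train(epochs: int = 3) -> None:
    batch_size = 256
    plan = ClusterPlan(gpus=1, batch_size=batch_size)
    for epoch in range(epochs):
        for step in range(4):  # a few steps per epoch, for illustration
            grad_noise = 1.0 / (epoch + step + 1)  # fake noise measurement
            batch_size = adjust_batch_size(batch_size, grad_noise, max_batch=4096)
            print(f"epoch {epoch} step {step}: batch={batch_size}, gpus={plan.gpus}")
        # Re-plan resources only at the epoch boundary, as described above.
        plan = plan_gpu_allocation(batch_size, per_gpu_capacity=512, max_gpus=8)


if __name__ == "__main__":
    train()
```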
