"large language model training" Papers
3 papers found
COAT: Compressing Optimizer states and Activations for Memory-Efficient FP8 Training
Haocheng Xi, Han Cai, Ligeng Zhu et al.
ICLR 2025, arXiv:2410.19313
19 citations
MGUP: A Momentum-Gradient Alignment Update Policy for Stochastic Optimization
Da Chang, Ganzhao Yuan
NeurIPS 2025 (spotlight)
Understanding the Training Speedup from Sampling with Approximate Losses
Rudrajit Das, Xi Chen, Bertram Ieong et al.
ICML 2024, arXiv:2402.07052
4 citations