"model size scaling" Papers
2 papers found
Conference
Communication-Efficient Language Model Training Scales Reliably and Robustly: Scaling Laws for DiLoCo
Zachary Charles, Gabriel Teston, Lucio Dery et al.
NeurIPS 2025 (spotlight), arXiv:2503.09799
14 citations
Do Efficient Transformers Really Save Computation?
Kai Yang, Jan Ackermann, Zhenyu He et al.
ICML 2024, arXiv:2402.13934
29 citations