Poster "compute-optimal training" Papers
2 papers found
Language models scale reliably with over-training and on downstream tasks
Samir Yitzhak Gadre, Georgios Smyrnis, Vaishaal Shankar et al.
ICLR 2025 · arXiv:2403.08540 · 79 citations
Straight to Zero: Why Linearly Decaying the Learning Rate to Zero Works Best for LLMs
Shane Bergsma, Nolan Dey, Gurpreet Gosal et al.
ICLR 2025 · arXiv:2502.15938 · 24 citations