"llm pre-training" Papers
3 papers found
Mind the Gap: Examining the Self-Improvement Capabilities of Large Language Models
Yuda Song, Hanlin Zhang, Carson Eisenach et al.
ICLR 2025 · arXiv:2412.02674
Power Lines: Scaling laws for weight decay and batch size in LLM pre-training
Shane Bergsma, Nolan Dey, Gurpreet Gosal et al.
NEURIPS 2025 · arXiv:2505.13738
17 citations
SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training
Tianjin Huang, Ziquan Zhu, Gaojie Jin et al.
ICLR 2025 · arXiv:2501.06842
15 citations