Papers by Nolan Dey
Don't be lazy: CompleteP enables compute-efficient deep transformers
Nolan Dey, Bin Zhang, Lorenzo Noci et al.
NeurIPS 2025 · arXiv:2505.01618 · 33 citations
Power Lines: Scaling laws for weight decay and batch size in LLM pre-training
Shane Bergsma, Nolan Dey, Gurpreet Gosal et al.
NeurIPS 2025 · arXiv:2505.13738 · 17 citations
Straight to Zero: Why Linearly Decaying the Learning Rate to Zero Works Best for LLMs
Shane Bergsma, Nolan Dey, Gurpreet Gosal et al.
ICLR 2025 · arXiv:2502.15938 · 24 citations