Papers by Will Merrill
3 papers found
Conference
A Little Depth Goes a Long Way: The Expressive Power of Log-Depth Transformers
Will Merrill, Ashish Sabharwal
NeurIPS 2025, arXiv:2503.03961
33 citations
Critical Batch Size Revisited: A Simple Empirical Approach to Large-Batch Language Model Training
Will Merrill, Shane Arora, Dirk Groeneveld et al.
NeurIPS 2025 (spotlight), arXiv:2505.23971
6 citations
Exact Expressive Power of Transformers with Padding
Will Merrill, Ashish Sabharwal
NeurIPS 2025, arXiv:2505.18948
7 citations