"language model training" Papers
21 papers found
Aioli: A Unified Optimization Framework for Language Model Data Mixing
Mayee Chen, Michael Hu, Nicholas Lourie et al.
ASGO: Adaptive Structured Gradient Optimization
Kang An, Yuxing Liu, Rui Pan et al.
Communication-Efficient Language Model Training Scales Reliably and Robustly: Scaling Laws for DiLoCo
Zachary Charles, Gabriel Teston, Lucio Dery et al.
Critical Batch Size Revisited: A Simple Empirical Approach to Large-Batch Language Model Training
Will Merrill, Shane Arora, Dirk Groeneveld et al.
Deconstructing What Makes a Good Optimizer for Autoregressive Language Models
Rosie Zhao, Depen Morwani, David Brandfonbrener et al.
EvoLM: In Search of Lost Language Model Training Dynamics
Zhenting Qi, Fan Nie, Alexandre Alahi et al.
FedRW: Efficient Privacy-Preserving Data Reweighting for Enhancing Federated Learning of Language Models
Pukang Ye, Luo Junwei, Jiachen Shen et al.
Generative Representational Instruction Tuning
Niklas Muennighoff, Hongjin SU, Liang Wang et al.
Gradient descent with generalized Newton’s method
Zhiqi Bu, Shiyun Xu
Inverse Scaling: When Bigger Isn't Better
Joe Cavanagh, Andrew Gritsevskiy, Najoung Kim et al.
Learning from negative feedback, or positive feedback or both
Abbas Abdolmaleki, Bilal Piot, Bobak Shahriari et al.
Scaffold-BPE: Enhancing Byte Pair Encoding for Large Language Models with Simple and Effective Scaffold Token Removal
Haoran Lian, Yizhe Xiong, Jianwei Niu et al.
Small Batch Size Training for Language Models: When Vanilla SGD Works, and Why Gradient Accumulation is Wasteful
Martin Marek, Sanae Lotfi, Aditya Somasundaram et al.
Teaching Language Models to Reason with Tools
Chengpeng Li, Zhengyang Tang, Ziniu Li et al.
Through the River: Understanding the Benefit of Schedule-Free Methods for Language Model Training
Minhak Song, Beomhan Baek, Kwangjun Ahn et al.
Better & Faster Large Language Models via Multi-token Prediction
Fabian Gloeckle, Badr Youbi Idrissi, Baptiste Roziere et al.
DsDm: Model-Aware Dataset Selection with Datamodels
Logan Engstrom
Fewer Truncations Improve Language Modeling
Hantian Ding, Zijian Wang, Giovanni Paolini et al.
Improving Instruction Following in Language Models through Proxy-Based Uncertainty Estimation
JoonHo Lee, Jae Oh Woo, Juree Seok et al.
QuRating: Selecting High-Quality Data for Training Language Models
Alexander Wettig, Aatmik Gupta, Saumya Malik et al.
ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models
Ziniu Li, Tian Xu, Yushun Zhang et al.