"language model scaling" Papers
13 papers found

CoTFormer: A Chain of Thought Driven Architecture with Budget-Adaptive Computation Cost at Inference
Amirkeivan Mohtashami, Matteo Pagliardini, Martin Jaggi
ICLR 2025 · arXiv:2310.10845 · 15 citations

Exploiting Vocabulary Frequency Imbalance in Language Model Pre-training
Woojin Chung, Jeonghoon Kim
NeurIPS 2025 · arXiv:2508.15390 · 2 citations

Language models scale reliably with over-training and on downstream tasks
Samir Yitzhak Gadre, Georgios Smyrnis, Vaishaal Shankar et al.
ICLR 2025 · arXiv:2403.08540 · 79 citations

Overestimation in LLM Evaluation: A Controlled Large-Scale Study on Data Contamination’s Impact on Machine Translation
Muhammed Yusuf Kocyigit, Eleftheria Briakou, Daniel Deutsch et al.
ICML 2025 (oral) · arXiv:2501.18771 · 7 citations

RAT: Bridging RNN Efficiency and Attention Accuracy via Chunk-based Sequence Modeling
Xiuying Wei, Anunay Yadav, Razvan Pascanu et al.
NeurIPS 2025 · arXiv:2507.04416

Reasoning with Latent Thoughts: On the Power of Looped Transformers
Nikunj Saunshi, Nishanth Dikkala, Zhiyuan Li et al.
ICLR 2025 · arXiv:2502.17416 · 79 citations

Scaling Laws for Precision
Tanishq Kumar, Zachary Ankner, Benjamin Spector et al.
ICLR 2025 · arXiv:2411.04330 · 68 citations

SkyLadder: Better and Faster Pretraining via Context Window Scheduling
Tongyao Zhu, Qian Liu, Haonan Wang et al.
NeurIPS 2025 · arXiv:2503.15450 · 3 citations

Tensor Product Attention Is All You Need
Yifan Zhang, Yifeng Liu, Huizhuo Yuan et al.
NeurIPS 2025 (spotlight) · arXiv:2501.06425 · 34 citations

Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws
Nikhil Sardana, Jacob Portes, Alexandre (Sasha) Doubov et al.
ICML 2024 · arXiv:2401.00448 · 123 citations

Data Engineering for Scaling Language Models to 128K Context
Yao Fu, Rameswar Panda, Xinyao Niu et al.
ICML 2024 · arXiv:2402.10171 · 186 citations

Mechanistic Design and Scaling of Hybrid Architectures
Michael Poli, Armin Thomas, Eric Nguyen et al.
ICML 2024 · arXiv:2403.17844 · 53 citations

Why Larger Language Models Do In-context Learning Differently?
Zhenmei Shi, Junyi Wei, Zhuoyan Xu et al.
ICML 2024 · arXiv:2405.19592 · 49 citations