"scaling laws" Papers

46 papers found

Adaptive Data Optimization: Dynamic Sample Selection with Scaling Laws

Yiding Jiang, Allan Zhou, Zhili Feng et al.

ICLR 2025 · arXiv:2410.11820
36 citations

Bayesian scaling laws for in-context learning

Aryaman Arora, Dan Jurafsky, Christopher Potts et al.

COLM 2025 · arXiv:2410.16531
13 citations

Communication-Efficient Language Model Training Scales Reliably and Robustly: Scaling Laws for DiLoCo

Zachary Charles, Gabriel Teston, Lucio Dery et al.

NEURIPS 2025 (spotlight) · arXiv:2503.09799
14 citations

Data Mixing Can Induce Phase Transitions in Knowledge Acquisition

Xinran Gu, Kaifeng Lyu, Jiazheng Li et al.

NEURIPS 2025 (spotlight) · arXiv:2505.18091
2 citations

Data Mixture Optimization: A Multi-fidelity Multi-scale Bayesian Framework

Thomson Yen, Andrew Siah, Haozhe Chen et al.

NEURIPS 2025 · arXiv:2503.21023
2 citations

Diffusion Beats Autoregressive in Data-Constrained Settings

Mihir Prabhudesai, Mengning Wu, Amir Zadeh et al.

NEURIPS 2025 · arXiv:2507.15857
26 citations

Emergence and scaling laws in SGD learning of shallow neural networks

Yunwei Ren, Eshaan Nichani, Denny Wu et al.

NEURIPS 2025 · arXiv:2504.19983
17 citations

Gemstones: A Model Suite for Multi-Faceted Scaling Laws

Sean McLeish, John Kirchenbauer, David Miller et al.

NEURIPS 2025 · arXiv:2502.06857
10 citations

High-dimensional Analysis of Knowledge Distillation: Weak-to-Strong Generalization and Scaling Laws

Muhammed Ildiz, Halil Gozeten, Ege Taga et al.

ICLR 2025 · arXiv:2410.18837
13 citations

How DNNs break the Curse of Dimensionality: Compositionality and Symmetry Learning

Arthur Jacot, Seok Hoan Choi, Yuxiao Wen

ICLR 2025 · arXiv:2407.05664
6 citations

How Does Critical Batch Size Scale in Pre-training?

Hanlin Zhang, Depen Morwani, Nikhil Vyas et al.

ICLR 2025 · arXiv:2410.21676
43 citations

Hyperparameter Loss Surfaces Are Simple Near their Optima

Nicholas Lourie, He He, Kyunghyun Cho

COLM 2025
1 citation

Inverse Scaling: When Bigger Isn't Better

Joe Cavanagh, Andrew Gritsevskiy, Najoung Kim et al.

ICLR 2025 · arXiv:2306.09479
186 citations

Language Modeling by Language Models

Junyan Cheng, Peter Clark, Kyle Richardson

NEURIPS 2025 (spotlight) · arXiv:2506.20249
3 citations

Language models scale reliably with over-training and on downstream tasks

Samir Yitzhak Gadre, Georgios Smyrnis, Vaishaal Shankar et al.

ICLR 2025 · arXiv:2403.08540
79 citations

Learning in Compact Spaces with Approximately Normalized Transformer

Jörg Franke, Urs Spiegelhalter, Marianna Nezhurina et al.

NEURIPS 2025 · arXiv:2505.22014
1 citation

Mind the Gap: Examining the Self-Improvement Capabilities of Large Language Models

Yuda Song, Hanlin Zhang, Carson Eisenach et al.

ICLR 2025 · arXiv:2412.02674

(Mis)Fitting Scaling Laws: A Survey of Scaling Law Fitting Techniques in Deep Learning

Margaret Li, Sneha Kudugunta, Luke Zettlemoyer

ICLR 2025
9 citations

One Filters All: A Generalist Filter For State Estimation

Shiqi Liu, Wenhan Cao, Chang Liu et al.

NEURIPS 2025 · arXiv:2509.20051
2 citations

Power Lines: Scaling laws for weight decay and batch size in LLM pre-training

Shane Bergsma, Nolan Dey, Gurpreet Gosal et al.

NEURIPS 2025 · arXiv:2505.13738
17 citations

Predictable Scale (Part II) — Farseer: A Refined Scaling Law in LLMs

Houyi Li, Wenzhen Zheng, Qiufeng Wang et al.

NEURIPS 2025 (spotlight)

Quantifying Elicitation of Latent Capabilities in Language Models

Elizabeth Donoway, Hailey Joren, Arushi Somani et al.

NEURIPS 2025

Reasoning with Latent Thoughts: On the Power of Looped Transformers

Nikunj Saunshi, Nishanth Dikkala, Zhiyuan Li et al.

ICLR 2025 · arXiv:2502.17416
79 citations

RegMix: Data Mixture as Regression for Language Model Pre-training

Qian Liu, Xiaosen Zheng, Niklas Muennighoff et al.

ICLR 2025 · arXiv:2407.01492
105 citations

Scaling and evaluating sparse autoencoders

Leo Gao, Tom Dupre la Tour, Henk Tillman et al.

ICLR 2025 · arXiv:2406.04093
326 citations

Scaling Laws For Scalable Oversight

Joshua Engels, David Baek, Subhash Kantamneni et al.

NEURIPS 2025 (spotlight) · arXiv:2504.18530
4 citations

Scaling up Masked Diffusion Models on Text

Shen Nie, Fengqi Zhu, Chao Du et al.

ICLR 2025 (oral) · arXiv:2410.18514
124 citations

Scaling Wearable Foundation Models

Girish Narayanswamy, Xin Liu, Kumar Ayush et al.

ICLR 2025 · arXiv:2410.13638
33 citations

TabDPT: Scaling Tabular Foundation Models on Real Data

Junwei Ma, Valentin Thomas, Rasa Hosseinzadeh et al.

NEURIPS 2025 · arXiv:2410.18164
19 citations

The Journey Matters: Average Parameter Count over Pre-training Unifies Sparse and Dense Scaling Laws

Tian Jin, Ahmed Imtiaz Humayun, Utku Evci et al.

ICLR 2025 · arXiv:2501.12486
1 citation

Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts

Xiaoming Shi, Shiyu Wang, Yuqi Nie et al.

ICLR 2025 · arXiv:2409.16040
194 citations

Understanding LLM Behaviors via Compression: Data Generation, Knowledge Acquisition and Scaling Laws

Zhixuan Pan, Shaowen Wang, Pengfei Liao et al.

NEURIPS 2025 (spotlight) · arXiv:2504.09597
6 citations

U-shaped and Inverted-U Scaling behind Emergent Abilities of Large Language Models

Tung-Yu Wu, Melody Lo

ICLR 2025 · arXiv:2410.01692
5 citations

Video In-context Learning: Autoregressive Transformers are Zero-Shot Video Imitators

Wentao Zhang, Junliang Guo, Tianyu He et al.

ICLR 2025 · arXiv:2407.07356
7 citations

Adversarial Robustness Limits via Scaling-Law and Human-Alignment Studies

Brian Bartoldson, James Diffenderfer, Konstantinos Parasyris et al.

ICML 2024 · arXiv:2404.09349
37 citations

A Tale of Tails: Model Collapse as a Change of Scaling Laws

Elvis Dohmatob, Yunzhen Feng, Pu Yang et al.

ICML 2024 · arXiv:2402.07043
110 citations

Compute Better Spent: Replacing Dense Layers with Structured Matrices

Shikai Qiu, Andres Potapczynski, Marc Finzi et al.

ICML 2024 · arXiv:2406.06248
23 citations

Mixtures of Experts Unlock Parameter Scaling for Deep RL

Johan Obando Ceron, Ghada Sokar, Timon Willi et al.

ICML 2024 (spotlight) · arXiv:2402.08609
64 citations

Navigating Scaling Laws: Compute Optimality in Adaptive Model Training

Sotiris Anagnostidis, Gregor Bachmann, Imanol Schlag et al.

ICML 2024 (spotlight) · arXiv:2311.03233
2 citations

NeRF-XL: NeRF at Any Scale with Multi-GPU

Ruilong Li, Sanja Fidler, Angjoo Kanazawa et al.

ECCV 2024

Scaling Laws for Fine-Grained Mixture of Experts

Jan Ludziejewski, Jakub Krajewski, Kamil Adamczewski et al.

ICML 2024 · arXiv:2402.07871
120 citations

Scaling Laws for the Value of Individual Data Points in Machine Learning

Ian Covert, Wenlong Ji, Tatsunori Hashimoto et al.

ICML 2024 · arXiv:2405.20456
11 citations

Scaling Laws of Synthetic Images for Model Training ... for Now

Lijie Fan, Kaifeng Chen, Dilip Krishnan et al.

CVPR 2024 · arXiv:2312.04567
108 citations

Selecting Large Language Model to Fine-tune via Rectified Scaling Law

Haowei Lin, Baizhou Huang, Haotian Ye et al.

ICML 2024 · arXiv:2402.02314
29 citations

Towards Understanding Inductive Bias in Transformers: A View From Infinity

Itay Lavie, Guy Gur-Ari, Zohar Ringel

ICML 2024 · arXiv:2402.05173
10 citations

Wukong: Towards a Scaling Law for Large-Scale Recommendation

Buyun Zhang, Liang Luo, Yuxin Chen et al.

ICML 2024 · arXiv:2403.02545
78 citations