Poster "scaling laws" Papers

35 papers found

Adaptive Data Optimization: Dynamic Sample Selection with Scaling Laws

Yiding Jiang, Allan Zhou, Zhili Feng et al.

ICLR 2025 · arXiv:2410.11820
36 citations

Data Mixture Optimization: A Multi-fidelity Multi-scale Bayesian Framework

Thomson Yen, Andrew Siah, Haozhe Chen et al.

NeurIPS 2025 · arXiv:2503.21023
2 citations

Diffusion Beats Autoregressive in Data-Constrained Settings

Mihir Prabhudesai, Mengning Wu, Amir Zadeh et al.

NeurIPS 2025 · arXiv:2507.15857
26 citations

Emergence and scaling laws in SGD learning of shallow neural networks

Yunwei Ren, Eshaan Nichani, Denny Wu et al.

NeurIPS 2025 · arXiv:2504.19983
17 citations

Gemstones: A Model Suite for Multi-Faceted Scaling Laws

Sean McLeish, John Kirchenbauer, David Miller et al.

NeurIPS 2025 · arXiv:2502.06857
10 citations

High-dimensional Analysis of Knowledge Distillation: Weak-to-Strong Generalization and Scaling Laws

Muhammed Ildiz, Halil Gozeten, Ege Taga et al.

ICLR 2025 · arXiv:2410.18837
13 citations

How DNNs break the Curse of Dimensionality: Compositionality and Symmetry Learning

Arthur Jacot, Seok Hoan Choi, Yuxiao Wen

ICLR 2025 · arXiv:2407.05664
6 citations

How Does Critical Batch Size Scale in Pre-training?

Hanlin Zhang, Depen Morwani, Nikhil Vyas et al.

ICLR 2025 · arXiv:2410.21676
43 citations

Inverse Scaling: When Bigger Isn't Better

Joe Cavanagh, Andrew Gritsevskiy, Najoung Kim et al.

ICLR 2025 · arXiv:2306.09479
186 citations

Language models scale reliably with over-training and on downstream tasks

Samir Yitzhak Gadre, Georgios Smyrnis, Vaishaal Shankar et al.

ICLR 2025 · arXiv:2403.08540
79 citations

Learning in Compact Spaces with Approximately Normalized Transformer

Jörg Franke, Urs Spiegelhalter, Marianna Nezhurina et al.

NeurIPS 2025 · arXiv:2505.22014
1 citation

Mind the Gap: Examining the Self-Improvement Capabilities of Large Language Models

Yuda Song, Hanlin Zhang, Carson Eisenach et al.

ICLR 2025 · arXiv:2412.02674

(Mis)Fitting Scaling Laws: A Survey of Scaling Law Fitting Techniques in Deep Learning

Margaret Li, Sneha Kudugunta, Luke Zettlemoyer

ICLR 2025
9 citations

One Filters All: A Generalist Filter For State Estimation

Shiqi Liu, Wenhan Cao, Chang Liu et al.

NeurIPS 2025 · arXiv:2509.20051
2 citations

Power Lines: Scaling laws for weight decay and batch size in LLM pre-training

Shane Bergsma, Nolan Dey, Gurpreet Gosal et al.

NeurIPS 2025 · arXiv:2505.13738
17 citations

Quantifying Elicitation of Latent Capabilities in Language Models

Elizabeth Donoway, Hailey Joren, Arushi Somani et al.

NeurIPS 2025

Reasoning with Latent Thoughts: On the Power of Looped Transformers

Nikunj Saunshi, Nishanth Dikkala, Zhiyuan Li et al.

ICLR 2025 · arXiv:2502.17416
79 citations

RegMix: Data Mixture as Regression for Language Model Pre-training

Qian Liu, Xiaosen Zheng, Niklas Muennighoff et al.

ICLR 2025 · arXiv:2407.01492
105 citations

Scaling and evaluating sparse autoencoders

Leo Gao, Tom Dupré la Tour, Henk Tillman et al.

ICLR 2025 · arXiv:2406.04093
326 citations

Scaling Wearable Foundation Models

Girish Narayanswamy, Xin Liu, Kumar Ayush et al.

ICLR 2025 · arXiv:2410.13638
33 citations

TabDPT: Scaling Tabular Foundation Models on Real Data

Junwei Ma, Valentin Thomas, Rasa Hosseinzadeh et al.

NeurIPS 2025 · arXiv:2410.18164
19 citations

The Journey Matters: Average Parameter Count over Pre-training Unifies Sparse and Dense Scaling Laws

Tian Jin, Ahmed Imtiaz Humayun, Utku Evci et al.

ICLR 2025 · arXiv:2501.12486
1 citation

Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts

Xiaoming Shi, Shiyu Wang, Yuqi Nie et al.

ICLR 2025 · arXiv:2409.16040
194 citations

U-shaped and Inverted-U Scaling behind Emergent Abilities of Large Language Models

Tung-Yu Wu, Melody Lo

ICLR 2025 · arXiv:2410.01692
5 citations

Video In-context Learning: Autoregressive Transformers are Zero-Shot Video Imitators

Wentao Zhang, Junliang Guo, Tianyu He et al.

ICLR 2025 · arXiv:2407.07356
7 citations

Adversarial Robustness Limits via Scaling-Law and Human-Alignment Studies

Brian Bartoldson, James Diffenderfer, Konstantinos Parasyris et al.

ICML 2024 · arXiv:2404.09349
37 citations

A Tale of Tails: Model Collapse as a Change of Scaling Laws

Elvis Dohmatob, Yunzhen Feng, Pu Yang et al.

ICML 2024 · arXiv:2402.07043
110 citations

Compute Better Spent: Replacing Dense Layers with Structured Matrices

Shikai Qiu, Andres Potapczynski, Marc Finzi et al.

ICML 2024 · arXiv:2406.06248
23 citations

NeRF-XL: NeRF at Any Scale with Multi-GPU

Ruilong Li, Sanja Fidler, Angjoo Kanazawa et al.

ECCV 2024

Scaling Laws for Fine-Grained Mixture of Experts

Jan Ludziejewski, Jakub Krajewski, Kamil Adamczewski et al.

ICML 2024 · arXiv:2402.07871
120 citations

Scaling Laws for the Value of Individual Data Points in Machine Learning

Ian Covert, Wenlong Ji, Tatsunori Hashimoto et al.

ICML 2024 · arXiv:2405.20456
11 citations

Scaling Laws of Synthetic Images for Model Training ... for Now

Lijie Fan, Kaifeng Chen, Dilip Krishnan et al.

CVPR 2024 · arXiv:2312.04567
108 citations

Selecting Large Language Model to Fine-tune via Rectified Scaling Law

Haowei Lin, Baizhou Huang, Haotian Ye et al.

ICML 2024 · arXiv:2402.02314
29 citations

Towards Understanding Inductive Bias in Transformers: A View From Infinity

Itay Lavie, Guy Gur-Ari, Zohar Ringel

ICML 2024 · arXiv:2402.05173
10 citations

Wukong: Towards a Scaling Law for Large-Scale Recommendation

Buyun Zhang, Liang Luo, Yuxin Chen et al.

ICML 2024 · arXiv:2403.02545
78 citations