Poster "scaling laws" Papers
35 papers found
Adaptive Data Optimization: Dynamic Sample Selection with Scaling Laws
Yiding Jiang, Allan Zhou, Zhili Feng et al.
Data Mixture Optimization: A Multi-fidelity Multi-scale Bayesian Framework
Thomson Yen, Andrew Siah, Haozhe Chen et al.
Diffusion Beats Autoregressive in Data-Constrained Settings
Mihir Prabhudesai, Mengning Wu, Amir Zadeh et al.
Emergence and scaling laws in SGD learning of shallow neural networks
Yunwei Ren, Eshaan Nichani, Denny Wu et al.
Gemstones: A Model Suite for Multi-Faceted Scaling Laws
Sean McLeish, John Kirchenbauer, David Miller et al.
High-dimensional Analysis of Knowledge Distillation: Weak-to-Strong Generalization and Scaling Laws
Muhammed Ildiz, Halil Gozeten, Ege Taga et al.
How DNNs break the Curse of Dimensionality: Compositionality and Symmetry Learning
Arthur Jacot, Seok Hoan Choi, Yuxiao Wen
How Does Critical Batch Size Scale in Pre-training?
Hanlin Zhang, Depen Morwani, Nikhil Vyas et al.
Inverse Scaling: When Bigger Isn't Better
Joe Cavanagh, Andrew Gritsevskiy, Najoung Kim et al.
Language models scale reliably with over-training and on downstream tasks
Samir Yitzhak Gadre, Georgios Smyrnis, Vaishaal Shankar et al.
Learning in Compact Spaces with Approximately Normalized Transformer
Jörg Franke, Urs Spiegelhalter, Marianna Nezhurina et al.
Mind the Gap: Examining the Self-Improvement Capabilities of Large Language Models
Yuda Song, Hanlin Zhang, Carson Eisenach et al.
(Mis)Fitting Scaling Laws: A Survey of Scaling Law Fitting Techniques in Deep Learning
Margaret Li, Sneha Kudugunta, Luke Zettlemoyer
One Filters All: A Generalist Filter For State Estimation
Shiqi Liu, Wenhan Cao, Chang Liu et al.
Power Lines: Scaling laws for weight decay and batch size in LLM pre-training
Shane Bergsma, Nolan Dey, Gurpreet Gosal et al.
Quantifying Elicitation of Latent Capabilities in Language Models
Elizabeth Donoway, Hailey Joren, Arushi Somani et al.
Reasoning with Latent Thoughts: On the Power of Looped Transformers
Nikunj Saunshi, Nishanth Dikkala, Zhiyuan Li et al.
RegMix: Data Mixture as Regression for Language Model Pre-training
Qian Liu, Xiaosen Zheng, Niklas Muennighoff et al.
Scaling and evaluating sparse autoencoders
Leo Gao, Tom Dupre la Tour, Henk Tillman et al.
Scaling Wearable Foundation Models
Girish Narayanswamy, Xin Liu, Kumar Ayush et al.
TabDPT: Scaling Tabular Foundation Models on Real Data
Junwei Ma, Valentin Thomas, Rasa Hosseinzadeh et al.
The Journey Matters: Average Parameter Count over Pre-training Unifies Sparse and Dense Scaling Laws
Tian Jin, Ahmed Imtiaz Humayun, Utku Evci et al.
Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts
Xiaoming Shi, Shiyu Wang, Yuqi Nie et al.
U-shaped and Inverted-U Scaling behind Emergent Abilities of Large Language Models
Tung-Yu Wu, Melody Lo
Video In-context Learning: Autoregressive Transformers are Zero-Shot Video Imitators
Wentao Zhang, Junliang Guo, Tianyu He et al.
Adversarial Robustness Limits via Scaling-Law and Human-Alignment Studies
Brian Bartoldson, James Diffenderfer, Konstantinos Parasyris et al.
A Tale of Tails: Model Collapse as a Change of Scaling Laws
Elvis Dohmatob, Yunzhen Feng, Pu Yang et al.
Compute Better Spent: Replacing Dense Layers with Structured Matrices
Shikai Qiu, Andres Potapczynski, Marc Finzi et al.
NeRF-XL: NeRF at Any Scale with Multi-GPU
Ruilong Li, Sanja Fidler, Angjoo Kanazawa et al.
Scaling Laws for Fine-Grained Mixture of Experts
Jan Ludziejewski, Jakub Krajewski, Kamil Adamczewski et al.
Scaling Laws for the Value of Individual Data Points in Machine Learning
Ian Covert, Wenlong Ji, Tatsunori Hashimoto et al.
Scaling Laws of Synthetic Images for Model Training ... for Now
Lijie Fan, Kaifeng Chen, Dilip Krishnan et al.
Selecting Large Language Model to Fine-tune via Rectified Scaling Law
Haowei Lin, Baizhou Huang, Haotian Ye et al.
Towards Understanding Inductive Bias in Transformers: A View From Infinity
Itay Lavie, Guy Gur-Ari, Zohar Ringel
Wukong: Towards a Scaling Law for Large-Scale Recommendation
Buyun Zhang, Liang Luo, Yuxin Chen et al.