"weak-to-strong generalization" Papers
9 papers found
From Linear to Nonlinear: Provable Weak-to-Strong Generalization through Feature Learning
Junsoo Oh, Jerry Song, Chulhee Yun
NEURIPS 2025, arXiv:2510.24812, 2 citations
High-dimensional Analysis of Knowledge Distillation: Weak-to-Strong Generalization and Scaling Laws
Muhammed Ildiz, Halil Gozeten, Ege Taga et al.
ICLR 2025, arXiv:2410.18837, 13 citations
Provable weak-to-strong generalization via benign overfitting
David Wu, Anant Sahai
ICLR 2025, arXiv:2410.04638, 14 citations
Robust SuperAlignment: Weak-to-Strong Robustness Generalization for Vision-Language Models
Junhao Dong, Cong Zhang, Xinghua Qu et al.
NEURIPS 2025 (spotlight)
The Delta Learning Hypothesis: Preference Tuning on Weak Data can Yield Strong Gains
Scott Geng, Hamish Ivison, Chun-Liang Li et al.
COLM 2025, arXiv:2507.06187, 8 citations
Weak-to-Strong Generalization Through the Data-Centric Lens
Changho Shin, John Cooper, Frederic Sala
ICLR 2025, arXiv:2412.03881, 14 citations
Weak-to-Strong Preference Optimization: Stealing Reward from Weak Aligned Model
Wenhong Zhu, Zhiwei He, Xiaofeng Wang et al.
ICLR 2025, arXiv:2410.18640, 15 citations
Debating with More Persuasive LLMs Leads to More Truthful Answers
Akbir Khan, John Hughes, Dan Valentine et al.
ICML 2024, arXiv:2402.06782, 212 citations
Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision
Collin Burns, Pavel Izmailov, Jan Kirchner et al.
ICML 2024, arXiv:2312.09390, 406 citations