Papers by Yujin Song
3 papers found
From Shortcut to Induction Head: How Data Diversity Shapes Algorithm Selection in Transformers
Ryotaro Kawata, Yujin Song, Alberto Bietti et al.
NeurIPS 2025, spotlight, arXiv:2512.18634
1 citation
How Does Label Noise Gradient Descent Improve Generalization in the Low SNR Regime?
Wei Huang, Andi Han, Yujin Song et al.
NeurIPS 2025, arXiv:2510.17526
1 citation
Nonlinear transformers can perform inference-time feature learning
Naoki Nishikawa, Yujin Song, Kazusato Oko et al.
ICML 2025