Spotlight "attention heads" Papers
2 papers found
Extrapolation by Association: Length Generalization Transfer In Transformers
Ziyang Cai, Nayoung Lee, Avi Schwarzschild et al.
NEURIPS 2025 spotlight arXiv:2506.09251
8 citations
Understanding Parametric and Contextual Knowledge Reconciliation within Large Language Models
Jun Zhao, Yongzhuo Yang, Xiang Hu et al.
NEURIPS 2025 spotlight