"interpretability framework" Papers
2 papers found
Conference
Cognitive Mirrors: Exploring the Diverse Functional Roles of Attention Heads in LLM Reasoning
Xueqi Ma, Jun Wang, Yanbei Jiang et al.
NeurIPS 2025, arXiv:2512.10978
3 citations
Quantifying the Plausibility of Context Reliance in Neural Machine Translation
Gabriele Sarti, Grzegorz Chrupała, Malvina Nissim et al.
ICLR 2024, arXiv:2310.01188
6 citations