Poster "retrieval heads" Papers
3 papers found
Conference
DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
Guangxuan Xiao, Jiaming Tang, Jingwei Zuo et al.
ICLR 2025arXiv:2410.10819
179
citations
RazorAttention: Efficient KV Cache Compression Through Retrieval Heads
Hanlin Tang, Yang Lin, Jing Lin et al.
ICLR 2025arXiv:2407.15891
62
citations
Retrieval Head Mechanistically Explains Long-Context Factuality
Wenhao Wu, Yizhong Wang, Guangxuan Xiao et al.
ICLR 2025arXiv:2404.15574
150
citations