Papers by Wayne Xiong
3 papers found
Conference
Integrative Decoding: Improving Factuality via Implicit Self-consistency
Yi Cheng, Xiao Liang, Yeyun Gong et al.
ICLR 2025, arXiv:2410.01556
7 citations
Not All Heads Matter: A Head-Level KV Cache Compression Method with Integrated Retrieval and Reasoning
Yu Fu, Zefan Cai, Abedelkadir Asi et al.
ICLR 2025, arXiv:2410.19258
60 citations
PyramidKV: Dynamic KV Cache Compression based on Pyramidal Information Funneling
Zefan Cai, Yichi Zhang, Bofei Gao et al.
COLM 2025, arXiv:2406.02069
204 citations