Papers by Vikram Appia
2 papers found
Conference
X-EcoMLA: Upcycling Pre-Trained Attention into MLA for Efficient and Extreme KV Compression
Guihong Li, Mehdi Rezagholizadeh, Mingyu Yang et al.
COLM 2025, arXiv:2503.11132
1 citation
Zebra-Llama: Towards Extremely Efficient Hybrid Models
Mingyu Yang, Mehdi Rezagholizadeh, Guihong Li et al.
NeurIPS 2025, arXiv:2505.17272
7 citations