Poster "low-rank compression" Papers
4 papers found
FlashBias: Fast Computation of Attention with Bias
Haixu Wu, Minghao Guo, Yuezhou Ma et al.
NEURIPS 2025, arXiv:2505.12044
1 citation
SALS: Sparse Attention in Latent Space for KV Cache Compression
Junlin Mu, Hantao Huang, Jihang Zhang et al.
NEURIPS 2025, arXiv:2510.24273
Error Feedback Can Accurately Compress Preconditioners
Ionut-Vlad Modoranu, Aleksei Kalinov, Eldar Kurtic et al.
ICML 2024, arXiv:2306.06098
6 citations
Maestro: Uncovering Low-Rank Structures via Trainable Decomposition
Samuel Horváth, Stefanos Laskaridis, Shashank Rajput et al.
ICML 2024