Poster papers matching "multi-head attention" (filter: Conference)
6 papers found
Devil is in the Uniformity: Exploring Diverse Learners within Transformer for Image Restoration
Shihao Zhou, Dayu Li, Jinshan Pan et al.
ICCV 2025 · arXiv:2503.20174 · 1 citation
On the Optimization and Generalization of Multi-head Attention
Christos Thrampoulidis, Rouzbeh Ghaderi, Hossein Taheri et al.
ICLR 2025 · arXiv:2310.12680 · 44 citations
SAS: Simulated Attention Score
Chuanyang Zheng, Jiankai Sun, Yihang Gao et al.
NeurIPS 2025 · arXiv:2507.07694 · 2 citations
CHAI: Clustered Head Attention for Efficient LLM Inference
Saurabh Agarwal, Bilge Acun, Basil Hosmer et al.
ICML 2024 · arXiv:2403.08058 · 13 citations
Evolving Subnetwork Training for Large Language Models
Hanqi Li, Lu Chen, Da Ma et al.
ICML 2024 · arXiv:2406.06962 · 2 citations
Improving Transformers with Dynamically Composable Multi-Head Attention
Da Xiao, Qingye Meng, Shengping Li et al.
ICML 2024 · arXiv:2405.08553 · 6 citations