"attention concentration" Papers
2 papers found
Conference
Transformers Provably Learn Chain-of-Thought Reasoning with Length Generalization
Yu Huang, Zixin Wen, Aarti Singh et al.
NeurIPS 2025, arXiv:2511.07378
5 citations
VLA-Cache: Efficient Vision-Language-Action Manipulation via Adaptive Token Caching
Siyu Xu, Yunke Wang, Chenghao Xia et al.
NeurIPS 2025 (oral), arXiv:2502.02175
27 citations