"token efficiency" Papers
3 papers found
Conference
Beyond Training: Dynamic Token Merging for Zero-Shot Video Understanding
Yiming Zhang, Zhuokai Zhao, Zhaorun Chen et al.
ICCV 2025, arXiv:2411.14401
11 citations
Critical Batch Size Revisited: A Simple Empirical Approach to Large-Batch Language Model Training
Will Merrill, Shane Arora, Dirk Groeneveld et al.
NeurIPS 2025 (spotlight), arXiv:2505.23971
6 citations
The Overthinker's DIET: Cutting Token Calories with DIfficulty-AwarE Training
Weize Chen, Jiarui Yuan, Jin Tailin et al.
NeurIPS 2025, arXiv:2505.19217
5 citations