"computational cost reduction" Papers
15 papers found
AdaSkip: Adaptive Sublayer Skipping for Accelerating Long-Context LLM Inference
Zhuomin He, Yizhen Yao, Pengfei Zuo et al.
AAAI 2025 · arXiv:2501.02336 · 12 citations
Attribution-Driven Adaptive Token Pruning for Transformers
Yaoyao Yan, Hui Yu, Weizhi Xu
NeurIPS 2025
Diffusion on Demand: Selective Caching and Modulation for Efficient Generation
Hee Min Choi, Hyoa Kang, Dokwan Oh et al.
NeurIPS 2025
MMTEB: Massive Multilingual Text Embedding Benchmark
Kenneth Enevoldsen, Isaac Chung, Imene Kerboua et al.
ICLR 2025 · arXiv:2502.13595 · 80 citations
OmniCache: A Trajectory-Oriented Global Perspective on Training-Free Cache Reuse for Diffusion Transformer Models
Huanpeng Chu, Wei Wu, Guanyu Feng et al.
ICCV 2025 · arXiv:2508.16212 · 6 citations
Pruning All-Rounder: Rethinking and Improving Inference Efficiency for Large Vision Language Models
Wei Suo, Ji Ma, Mengyang Sun et al.
ICCV 2025 · arXiv:2412.06458 · 1 citation
Reasoning Planning for Language Models
Ngoc Bao Nguyen, Trung Hieu Nguyen, Ruifeng She et al.
NeurIPS 2025 (spotlight) · arXiv:2511.00521
ShortV: Efficient Multimodal Large Language Models by Freezing Visual Tokens in Ineffective Layers
Qianhao Yuan, Qingyu Zhang, Yanjiang Liu et al.
ICCV 2025 · arXiv:2504.00502 · 4 citations
SMRS: advocating a unified reporting standard for surrogate models in the artificial intelligence era
Elizaveta Semenova, Siobhan Mackenzie Hall, Timothy James Hitge et al.
NeurIPS 2025 · arXiv:2502.06753
TREAD: Token Routing for Efficient Architecture-agnostic Diffusion Training
Felix Krause, Timy Phan, Ming Gui et al.
ICCV 2025 · arXiv:2501.04765 · 13 citations
Zero-Shot Vision Encoder Grafting via LLM Surrogates
Kaiyu Yue, Vasu Singla, Menglin Jia et al.
ICCV 2025 · arXiv:2505.22664 · 1 citation
Accelerating PDE Data Generation via Differential Operator Action in Solution Space
Huanshuo Dong, Hong Wang, Haoyang Liu et al.
ICML 2024 · arXiv:2402.05957 · 14 citations
Online Cascade Learning for Efficient Inference over Streams
Lunyiu Nie, Zhimin Ding, Erdong Hu et al.
ICML 2024 · arXiv:2402.04513 · 16 citations
Refined Coreset Selection: Towards Minimal Coreset Size under Model Performance Constraints
Xiaobo Xia, Jiale Liu, Shaokun Zhang et al.
ICML 2024 (spotlight) · arXiv:2311.08675 · 15 citations
Zero-TPrune: Zero-Shot Token Pruning through Leveraging of the Attention Graph in Pre-Trained Transformers
Hongjie Wang, Bhishma Dedhia, Niraj Jha
CVPR 2024 · arXiv:2305.17328 · 61 citations