"efficient inference" Papers

20 papers found

ALTER: All-in-One Layer Pruning and Temporal Expert Routing for Efficient Diffusion Generation

Xiaomeng Yang, LEI LU, Qihui Fan et al.

NEURIPS 2025oralarXiv:2505.21817
1
citations

CITER: Collaborative Inference for Efficient Large Language Model Decoding with Token-Level Routing

Wenhao Zheng, Yixiao Chen, Weitong Zhang et al.

COLM 2025paperarXiv:2502.01976
24
citations

Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference

Han Zhao, Min Zhang, Wei Zhao et al.

AAAI 2025paperarXiv:2403.14520
110
citations

Dynamic-LLaVA: Efficient Multimodal Large Language Models via Dynamic Vision-language Context Sparsification

Wenxuan Huang, Zijie Zhai, Yunhang Shen et al.

ICLR 2025arXiv:2412.00876
42
citations

EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception

Sanjoy Chowdhury, Subrata Biswas, Sayan Nag et al.

ICCV 2025arXiv:2506.21080
2
citations

EOV-Seg: Efficient Open-Vocabulary Panoptic Segmentation

Hongwei Niu, Jie Hu, Jianghang Lin et al.

AAAI 2025paperarXiv:2412.08628
6
citations

Plug-and-Play Context Feature Reuse for Efficient Masked Generation

Xuejie Liu, Anji Liu, Guy Van den Broeck et al.

NEURIPS 2025arXiv:2505.19089
4
citations

PrefixKV: Adaptive Prefix KV Cache is What Vision Instruction-Following Models Need for Efficient Generation

Ao Wang, Hui Chen, Jianchao Tan et al.

NEURIPS 2025arXiv:2412.03409
6
citations

R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Token Routing

Tianyu Fu, Yi Ge, Yichen You et al.

NEURIPS 2025arXiv:2505.21600
13
citations

Sketch Down the FLOPs: Towards Efficient Networks for Human Sketch

Aneeshan Sain, Subhajit Maity, Pinaki Nath Chowdhury et al.

CVPR 2025arXiv:2505.23763

SparseVILA: Decoupling Visual Sparsity for Efficient VLM Inference

Samir Khaki, Junxian Guo, Jiaming Tang et al.

ICCV 2025arXiv:2510.17777
1
citations

Two is Better than One: Efficient Ensemble Defense for Robust and Compact Models

Yoojin Jung, Byung Cheol Song

CVPR 2025arXiv:2504.04747
1
citations

ConsistentEE: A Consistent and Hardness-Guided Early Exiting Method for Accelerating Language Models Inference

Ziqian Zeng, Yihuai Hong, Hongliang Dai et al.

AAAI 2024paperarXiv:2312.11882
17
citations

Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression

Junyuan Hong, Jinhao Duan, Chenhui Zhang et al.

ICML 2024arXiv:2403.15447
49
citations

Deep Feature Surgery: Towards Accurate and Efficient Multi-Exit Networks

Cheng Gong, Yao Chen, Qiuyang Luo et al.

ECCV 2024arXiv:2407.13986
3
citations

Efficient Inference of Vision Instruction-Following Models with Elastic Cache

ZUYAN LIU, Benlin Liu, Jiahui Wang et al.

ECCV 2024arXiv:2407.18121
26
citations

FrameQuant: Flexible Low-Bit Quantization for Transformers

Harshavardhan Adepu, Zhanpeng Zeng, Li Zhang et al.

ICML 2024arXiv:2403.06082
16
citations

MobileDiffusion: Instant Text-to-Image Generation on Mobile Devices

Yang Zhao, Zhisheng Xiao, Yanwu Xu et al.

ECCV 2024arXiv:2311.16567
36
citations

OvSW: Overcoming Silent Weights for Accurate Binary Neural Networks

JINGYANG XIANG, Zuohui Chen, Siqi Li et al.

ECCV 2024arXiv:2407.05257
5
citations

Switchable Decision: Dynamic Neural Generation Networks

Shujian Zhang, Korawat Tanwisuth, Chengyue Gong et al.

ICML 2024arXiv:2405.04513