"efficient inference" Papers
20 papers found
Conference
ALTER: All-in-One Layer Pruning and Temporal Expert Routing for Efficient Diffusion Generation
Xiaomeng Yang, LEI LU, Qihui Fan et al.
CITER: Collaborative Inference for Efficient Large Language Model Decoding with Token-Level Routing
Wenhao Zheng, Yixiao Chen, Weitong Zhang et al.
Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference
Han Zhao, Min Zhang, Wei Zhao et al.
Dynamic-LLaVA: Efficient Multimodal Large Language Models via Dynamic Vision-language Context Sparsification
Wenxuan Huang, Zijie Zhai, Yunhang Shen et al.
EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception
Sanjoy Chowdhury, Subrata Biswas, Sayan Nag et al.
EOV-Seg: Efficient Open-Vocabulary Panoptic Segmentation
Hongwei Niu, Jie Hu, Jianghang Lin et al.
Plug-and-Play Context Feature Reuse for Efficient Masked Generation
Xuejie Liu, Anji Liu, Guy Van den Broeck et al.
PrefixKV: Adaptive Prefix KV Cache is What Vision Instruction-Following Models Need for Efficient Generation
Ao Wang, Hui Chen, Jianchao Tan et al.
R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Token Routing
Tianyu Fu, Yi Ge, Yichen You et al.
Sketch Down the FLOPs: Towards Efficient Networks for Human Sketch
Aneeshan Sain, Subhajit Maity, Pinaki Nath Chowdhury et al.
SparseVILA: Decoupling Visual Sparsity for Efficient VLM Inference
Samir Khaki, Junxian Guo, Jiaming Tang et al.
Two is Better than One: Efficient Ensemble Defense for Robust and Compact Models
Yoojin Jung, Byung Cheol Song
ConsistentEE: A Consistent and Hardness-Guided Early Exiting Method for Accelerating Language Models Inference
Ziqian Zeng, Yihuai Hong, Hongliang Dai et al.
Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression
Junyuan Hong, Jinhao Duan, Chenhui Zhang et al.
Deep Feature Surgery: Towards Accurate and Efficient Multi-Exit Networks
Cheng Gong, Yao Chen, Qiuyang Luo et al.
Efficient Inference of Vision Instruction-Following Models with Elastic Cache
ZUYAN LIU, Benlin Liu, Jiahui Wang et al.
FrameQuant: Flexible Low-Bit Quantization for Transformers
Harshavardhan Adepu, Zhanpeng Zeng, Li Zhang et al.
MobileDiffusion: Instant Text-to-Image Generation on Mobile Devices
Yang Zhao, Zhisheng Xiao, Yanwu Xu et al.
OvSW: Overcoming Silent Weights for Accurate Binary Neural Networks
JINGYANG XIANG, Zuohui Chen, Siqi Li et al.
Switchable Decision: Dynamic Neural Generation Networks
Shujian Zhang, Korawat Tanwisuth, Chengyue Gong et al.