"inference time optimization" Papers
4 papers found
Context-aware Dynamic Pruning for Speech Foundation Models
Masao Someki, Yifan Peng, Siddhant Arora et al.
ICLR 2025
7 citations
TopV: Compatible Token Pruning with Inference Time Optimization for Fast and Low-Memory Multimodal Vision Language Model
Cheng Yang, Yang Sui, Jinqi Xiao et al.
CVPR 2025, arXiv:2503.18278
24 citations
Dynamic Memory Compression: Retrofitting LLMs for Accelerated Inference
Piotr Nawrot, Adrian Łańcucki, Marcin Chochowski et al.
ICML 2024, arXiv:2403.09636
94 citations
Image-adaptive 3D Lookup Tables for Real-time Image Enhancement with Bilateral Grids
Wontae Kim, Nam Ik Cho
ECCV 2024
7 citations