"inference optimization" Papers
4 papers found
Conference
Golden Cudgel Network for Real-Time Semantic Segmentation
Guoyu Yang, Yuan Wang, Daming Shi et al.
CVPR 2025arXiv:2503.03325
6
citations
The Journey Matters: Average Parameter Count over Pre-training Unifies Sparse and Dense Scaling Laws
Tian Jin, Ahmed Imtiaz Humayun, Utku Evci et al.
ICLR 2025arXiv:2501.12486
1
citations
BiE: Bi-Exponent Block Floating-Point for Large Language Models Quantization
Lancheng Zou, Wenqian Zhao, Shuo Yin et al.
ICML 2024
Unveiling and Harnessing Hidden Attention Sinks: Enhancing Large Language Models without Training through Attention Calibration
Zhongzhi Yu, Zheng Wang, Yonggan Fu et al.
ICML 2024arXiv:2406.15765
47
citations