"low-precision inference" Papers
3 papers found
Conference
Pushing the Limits of BFP on Narrow Precision LLM Inference
Hui Wang, Yuan Cheng, Xiaomeng Han et al.
AAAI 2025 | paper | arXiv:2502.00026
1 citation
QERA: an Analytical Framework for Quantization Error Reconstruction
Cheng Zhang, Jeffrey T. H. Wong, Can Xiao et al.
ICLR 2025 | arXiv:2410.06040
11 citations
OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Models
Changhun Lee, Jungyu Jin, Taesu Kim et al.
AAAI 2024 | paper | arXiv:2306.02272
105 citations