"hardware efficiency" Papers
3 papers found
Effective Interplay between Sparsity and Quantization: From Theory to Practice
Simla Harma, Ayan Chakraborty, Elizaveta Kostenok et al.
ICLR 2025, arXiv:2405.20935
19 citations
A2Q+: Improving Accumulator-Aware Weight Quantization
Ian Colbert, Alessandro Pappalardo, Jakoba Petri-Koenig et al.
ICML 2024, arXiv:2401.10432
10 citations
BiE: Bi-Exponent Block Floating-Point for Large Language Models Quantization
Lancheng Zou, Wenqian Zhao, Shuo Yin et al.
ICML 2024