"weight-only quantization" Papers
3 papers found
Conference
CodeGEMM: A Codebook-Centric Approach to Efficient GEMM in Quantized LLMs
Gunho Park, Jeongin Bae, Byeongwook Kim et al.
NEURIPS 2025arXiv:2512.17970
Q-Palette: Fractional-Bit Quantizers Toward Optimal Bit Allocation for Efficient LLM Deployment
DEOKJAE LEE, Hyun Oh Song
NEURIPS 2025arXiv:2509.20214
QuIP$\#$: Even Better LLM Quantization with Hadamard Incoherence and Lattice Codebooks
Albert Tseng, Jerry Chee, Qingyao Sun et al.
ICML 2024arXiv:2402.04396
241
citations