Addressing Representation Collapse in Vector Quantized Models with One Linear Layer

49 citations · ranked #48 of 2701 papers in ICCV 2025

Abstract

Vector Quantization (VQ) is essential for discretizing continuous representations in unsupervised learning, but it suffers from representation collapse, causing low codebook utilization and limiting scalability. Existing solutions often rely on complex optimizations or reduce latent dimensionality, which compromises model capacity and does not fully solve the problem. We identify the root cause as disjoint codebook optimization, in which only a few code vectors are updated by gradient descent. To address this, we propose SimVQ, which reparameterizes code vectors through a learnable linear transformation layer over a latent basis, optimizing the entire linear space rather than only the nearest individual code vectors. Although multiplying two linear matrices is equivalent to applying a single linear layer, this simple reparameterization effectively prevents collapse. Extensive experiments on image and audio tasks demonstrate that SimVQ improves codebook usage, is easy to implement, and generalizes well across modalities and architectures. The code is available at https://github.com/youngsheen/SimVQ.
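The abstract describes reparameterizing the codebook as a latent basis passed through one learnable linear layer, so that gradient updates act on the entire linear space rather than only the code vectors selected by nearest-neighbor lookup. The following is a minimal PyTorch sketch of that idea, not the authors' implementation; the class name, frozen-basis initialization, loss weighting, and straight-through details are assumptions made for illustration, and the official code at the linked repository should be treated as authoritative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimVQSketch(nn.Module):
    """Sketch of the SimVQ idea: code vectors = frozen latent basis x one
    learnable linear layer, so every gradient step moves the whole codebook
    space instead of only the few codes that were selected."""

    def __init__(self, num_codes: int = 1024, dim: int = 256):
        super().__init__()
        # Frozen latent basis (random initialization is an assumption here).
        self.basis = nn.Parameter(torch.randn(num_codes, dim), requires_grad=False)
        # Single learnable linear transformation shared by all code vectors.
        self.proj = nn.Linear(dim, dim, bias=False)

    def forward(self, z: torch.Tensor):
        # z: (batch, dim) encoder outputs.
        codebook = self.proj(self.basis)          # (num_codes, dim), all codes updated via proj
        dists = torch.cdist(z, codebook)          # (batch, num_codes) pairwise L2 distances
        indices = dists.argmin(dim=-1)            # nearest code index per input
        z_q = codebook[indices]                   # quantized vectors
        # Straight-through estimator so gradients flow back to the encoder.
        z_q_st = z + (z_q - z).detach()
        # Standard VQ-VAE codebook + commitment terms (the 0.25 weight is an assumption).
        loss = F.mse_loss(z_q, z.detach()) + 0.25 * F.mse_loss(z, z_q.detach())
        return z_q_st, indices, loss
```

Because the basis is frozen and only the shared linear layer receives gradients, a single update shifts every code vector at once, which is the mechanism the abstract credits for avoiding disjoint codebook optimization.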

Citation History

Jan 25, 2026: 0
Jan 27, 2026: 0
Jan 31, 2026: 44
Feb 5, 2026: 46
Feb 13, 2026: 49