A General Framework for Producing Interpretable Semantic Text Embeddings

8 citations

arXiv:2410.03435

#1710 of 3827 papers in ICLR 2025

Top Authors

Yiqun Sun, Qiang Huang, Yixuan Tang, Anthony Tung, Jun Yu

Topics

semantic text embeddings, interpretable embeddings, contrastive question generation, multi-task binary qa, discriminative question generation, transparency in nlp, downstream task evaluation

Abstract

Semantic text embedding is essential to many tasks in Natural Language Processing (NLP). While black-box models are capable of generating high-quality embeddings, their lack of interpretability limits their use in tasks that demand transparency. Recent approaches have improved interpretability by leveraging domain-expert-crafted or LLM-generated questions, but these methods rely heavily on expert input or well-designed prompts, which restricts their generalizability and their ability to generate discriminative questions across a wide range of tasks. To address these challenges, we introduce CQG-MBQA (Contrastive Question Generation - Multi-task Binary Question Answering), a general framework for producing interpretable semantic text embeddings across diverse tasks. Our framework systematically generates highly discriminative, low-cognitive-load yes/no questions through the CQG method and answers them efficiently with the MBQA model, resulting in interpretable embeddings in a cost-effective manner. We validate the effectiveness and interpretability of CQG-MBQA through extensive experiments and ablation studies, demonstrating that it delivers embedding quality comparable to many advanced black-box models while maintaining inherent interpretability. Additionally, CQG-MBQA outperforms other interpretable text embedding methods across various downstream tasks.
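
To make the idea of a question-based embedding concrete, here is a minimal sketch in which each dimension of the embedding is the answer to one yes/no question. Everything in it is an assumption made for illustration: the three questions are hand-written rather than produced by CQG, and the keyword-matching `answer` function is a toy stand-in for the MBQA model, not the authors' implementation.

```python
import math

# Hypothetical yes/no questions with toy keyword sets.
# In the paper the questions are generated by CQG; these are hand-written.
QUESTIONS = [
    ("Is the text about sports?", {"match", "team", "goal"}),
    ("Does the text mention money?", {"price", "dollar", "cost"}),
    ("Is the text about weather?", {"rain", "sunny", "storm"}),
]


def answer(text, keywords):
    """Toy stand-in for a binary QA model: 1 ("yes") if any keyword occurs."""
    words = set(text.lower().split())
    return 1 if words & keywords else 0


def embed(text):
    """Return one 0/1 entry per question, so every dimension is readable."""
    return [answer(text, keywords) for _, keywords in QUESTIONS]


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def shared_yes(u, v):
    """List the questions answered "yes" for both texts."""
    return [q for (q, _), a, b in zip(QUESTIONS, u, v) if a and b]


if __name__ == "__main__":
    e1 = embed("The team scored a late goal in the rain")
    e2 = embed("Ticket price for the match went up")
    print(e1, e2, cosine(e1, e2), shared_yes(e1, e2))
```

Because each dimension corresponds to a named question, a similarity score can be traced back to the questions both texts answer "yes" to, which is the kind of transparency the abstract describes.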
