Distribution-Driven Dense Retrieval: Modeling Many-to-One Query-Document Relationship

Published in AAAI 2025

Authors

Junfeng Kang, Rui Li, Qi Liu, Zhenya Huang, Zheng Zhang, Yanjiang Chen, Linbo Zhu, Yu Su

Abstract

Dense retrieval has emerged as the leading approach in information retrieval, aiming to find semantically relevant documents for natural language queries. Because a single document can be retrieved by multiple distinct queries, existing methods represent a document with multiple vectors, each aligned with a different query, to model the many-to-one relationship between queries and documents. However, these multi-vector approaches face challenges such as increased storage, vector collapse, and reduced search efficiency. To address these issues, we introduce the Distribution-Driven Dense Retrieval framework (DDR), which represents queries as vectors and documents as distributions. This design captures the relationships among the multiple queries that correspond to the same document while avoiding the need for multiple vectors per document. Furthermore, to keep DDR efficient at search time, we propose a dot product-based method for computing the similarity between a document represented by a distribution and a query represented by a vector, which allows seamless integration with existing approximate nearest neighbor (ANN) search algorithms. Finally, extensive experiments on real-world datasets demonstrate that our method significantly outperforms traditional dense retrieval methods.
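The abstract does not give DDR's scoring formula. The following is a minimal sketch of one way a distribution-versus-vector similarity can reduce to a single dot product, under assumptions that are ours and not the paper's: each document is a diagonal Gaussian N(mu, diag(var)), and the score is the query's log-density under that Gaussian. Expanding (q_i - mu_i)^2 / var_i splits the log-density into terms in q_i^2, q_i, and a constant, so the document can be flattened once into a (2d + 1)-dimensional vector and the query augmented to match. All function names (gaussian_log_density, document_to_vector, query_to_vector, rank) are hypothetical, and DDR's actual formulation may differ.

```python
import math
from typing import Dict, List, Sequence, Tuple

# ASSUMPTION (not from the paper): each document is a diagonal Gaussian
# N(mu, diag(var)) and the query-document score is log N(q; mu, var).


def gaussian_log_density(q: Sequence[float], mu: Sequence[float], var: Sequence[float]) -> float:
    """Reference score: log-density of query q under the document Gaussian."""
    total = -0.5 * len(q) * math.log(2 * math.pi)
    for qi, mi, vi in zip(q, mu, var):
        total -= 0.5 * ((qi - mi) ** 2 / vi + math.log(vi))
    return total


def document_to_vector(mu: Sequence[float], var: Sequence[float]) -> List[float]:
    """Flatten a document Gaussian into one (2d + 1)-dim vector, once at index time."""
    quad = [-0.5 / v for v in var]                  # multiplies q_i ** 2
    lin = [m / v for m, v in zip(mu, var)]          # multiplies q_i
    const = -0.5 * sum(m * m / v + math.log(v) for m, v in zip(mu, var))
    const -= 0.5 * len(mu) * math.log(2 * math.pi)  # multiplies the constant 1
    return quad + lin + [const]


def query_to_vector(q: Sequence[float]) -> List[float]:
    """Augment the query as [q_i ** 2 ..., q_i ..., 1] to match document_to_vector."""
    return [x * x for x in q] + list(q) + [1.0]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def rank(q: Sequence[float], doc_vectors: Dict[str, List[float]]) -> List[str]:
    """Brute-force maximum inner product search; an ANN index could replace this."""
    qv = query_to_vector(q)
    return sorted(doc_vectors, key=lambda name: dot(qv, doc_vectors[name]), reverse=True)


# Toy example with two hypothetical 2-d documents given as (mu, var).
docs: Dict[str, Tuple[List[float], List[float]]] = {
    "d1": ([0.0, 0.0], [1.0, 1.0]),
    "d2": ([2.0, 1.0], [0.5, 2.0]),
}
doc_vectors = {name: document_to_vector(mu, var) for name, (mu, var) in docs.items()}
query = [1.8, 0.9]
print(rank(query, doc_vectors))  # -> ['d2', 'd1']
```

Because the flattened document vectors are fixed at index time and only the query is augmented at search time, each document needs one stored vector rather than several, and any inner-product (not cosine) nearest neighbor index could stand in for the brute-force rank function under these assumptions.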
