Abstract
Cross-modal hashing provides an efficient solution for retrieval across modalities such as images and text. However, most existing methods are deterministic models that overlook the reliability of the retrieved results, making it unreliable to judge whether a pair of data points truly match based solely on their Hamming distance. To bridge this gap, we propose a novel method called Deep Evidential Cross-modal Hashing (DECH). DECH equips hashing models with the ability to quantify how reliable the association between a query sample and each retrieved sample is, adding a new dimension of reliability to cross-modal retrieval. To achieve this, our method addresses two key challenges: i) To leverage evidential theory in guiding the model to learn hash codes, we design a novel evidence acquisition module that collects evidence directly through hash codes, rather than through a classifier as in existing evidential learning approaches, and places this evidence on a Beta distribution to derive a binomial opinion. ii) To tackle the task-oriented challenge, we first introduce a method to update the derived binomial opinion so that it reflects the uncertainty caused by conflicting evidence; building on this, we present a strategy to precisely evaluate the reliability of retrieved results, which in turn improves retrieval performance. We validate the efficacy of DECH through extensive experiments on four benchmark datasets, where it outperforms 12 state-of-the-art methods.
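For reference, the Beta-to-binomial-opinion step mentioned above follows the standard subjective-logic construction; a minimal sketch, assuming per-pair positive and negative evidence $e^{+}$ and $e^{-}$ and the usual non-informative prior weight $W = 2$ (these symbols are illustrative and not defined in the abstract), is:
\[
\alpha = e^{+} + 1,\qquad \beta = e^{-} + 1,\qquad S = \alpha + \beta,\qquad
b = \frac{e^{+}}{S},\qquad d = \frac{e^{-}}{S},\qquad u = \frac{W}{S},
\]
so that $b + d + u = 1$, with $b$ and $d$ acting as belief and disbelief that the pair matches and $u$ as the uncertainty mass, which shrinks as more evidence is collected.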