Multi-Instance Multi-Label Classification from Crowdsourced Labels
Abstract
Multi-instance multi-label classification (MIML) is a fundamental task in machine learning, where each data sample is a bag of several instances associated with multiple binary labels. Despite its wide applicability, data collection requires matching multiple instances with multiple labels, which typically incurs high annotation costs. In this paper, we study a novel yet practical crowdsourced multi-instance multi-label classification (CMIML) setting, where labels are collected from multiple crowd sources. To address this problem, we first propose a novel data generation process for CMIML, namely cross-label transition, in which annotation errors across labels are more likely to occur than the errors modeled by the previous single-label transition assumption, owing to the inherent similarity of localized instances from different classes. We then formally define the cross-label transition through cross-label transition matrices, which are dependent across classes. Subsequently, we establish the first unbiased risk estimator for CMIML, further improve it through aggregation techniques, and derive a rigorous generalization error bound. We also provide a practical implementation of cross-label transition matrix estimation. Comprehensive experiments on six benchmark datasets under various scenarios demonstrate that our algorithm outperforms the baselines by a large margin, validating its effectiveness in handling the CMIML problem.