Abstract
In privacy-preserving distributed learning environments, data stored on local clients cannot be shared with other clients or the server. We consider a new active learning problem setup for these environments, in which the server aims to build a centralized model by distributing labeling budgets across different clients. Our algorithm identifies which clients, and which of their data points, warrant annotation by estimating the global impact of the resulting labels. We evaluate this impact by embedding the clients into the manifold of learner parameters, formed by the task learner's predictions on unlabeled data, and diffusing the reduction in predictive uncertainty caused by labeling. The algorithm effectively selects clients with high estimated impact while achieving diversity in client selection, all without accessing local client data. In experiments, our approach demonstrates substantial improvements over adaptations of existing active learning algorithms.
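To make the selection pipeline described above concrete, here is a minimal sketch of one plausible instantiation: each client is embedded as a summary of its task learner's predictions on local unlabeled data, per-client uncertainty is diffused over a client-similarity graph, and clients are picked greedily with a diversity penalty. All function names (embed_clients, diffuse_impact, select_clients) and the specific choices (mean softmax embeddings, entropy as the uncertainty proxy, one-step cosine-graph diffusion) are illustrative assumptions, not the paper's actual algorithm.

```python
# Hypothetical sketch of impact-and-diversity-aware client selection.
# Only prediction summaries reach the server; raw client data never does.
import numpy as np

def embed_clients(client_probs):
    """Embed each client as its mean predictive distribution on unlabeled data.

    client_probs: list of (n_i, num_classes) arrays of task-learner predictions.
    Returns a (num_clients, num_classes) embedding matrix.
    """
    return np.stack([p.mean(axis=0) for p in client_probs])

def predictive_entropy(client_probs):
    """Mean predictive entropy per client (a proxy for uncertainty reduction)."""
    eps = 1e-12
    return np.array(
        [-(p * np.log(p + eps)).sum(axis=1).mean() for p in client_probs]
    )

def diffuse_impact(embeddings, uncertainty, alpha=0.5):
    """Diffuse per-client uncertainty one step over a client-similarity graph."""
    norm = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    affinity = norm @ norm.T          # cosine similarity between clients
    np.fill_diagonal(affinity, 0.0)
    trans = affinity / affinity.sum(axis=1, keepdims=True)  # transition matrix
    return alpha * uncertainty + (1 - alpha) * trans @ uncertainty

def select_clients(client_probs, budget, redundancy=0.5):
    """Greedily pick high-impact clients, penalizing similarity to prior picks."""
    emb = embed_clients(client_probs)
    impact = diffuse_impact(emb, predictive_entropy(client_probs))
    norm = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    selected = []
    for _ in range(budget):
        scores = impact.copy()
        for s in selected:
            # Diversity penalty: discount clients similar to already-chosen ones.
            scores -= redundancy * np.maximum(norm @ norm[s], 0.0)
        scores[selected] = -np.inf
        selected.append(int(scores.argmax()))
    return selected

# Example: 5 clients, each holding local unlabeled predictions over 3 classes.
rng = np.random.default_rng(0)
probs = [rng.dirichlet(np.ones(3), size=int(rng.integers(20, 50)))
         for _ in range(5)]
print(select_clients(probs, budget=2))
```

Under these assumptions, the server needs only each client's prediction summary and uncertainty estimate, which is consistent with the constraint that local data never leaves the clients.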