EA-KD: Entropy-based Adaptive Knowledge Distillation

3 citations · ranked #875 of 2701 papers in ICCV 2025

Abstract

Knowledge distillation (KD) enables a smaller "student" model to mimic a larger "teacher" model by transferring knowledge from the teacher's outputs or features. However, most KD methods treat all samples uniformly, overlooking the varying learning value of each sample and thereby limiting their effectiveness. In this paper, we propose Entropy-based Adaptive Knowledge Distillation (EA-KD), a simple yet effective plug-and-play KD method that prioritizes learning from valuable samples. EA-KD quantifies each sample's learning value by strategically combining the entropies of the teacher's and student's outputs, then dynamically reweights the distillation loss to place greater emphasis on high-entropy samples. Extensive experiments across diverse KD frameworks and tasks -- including image classification, object detection, and large language model (LLM) distillation -- demonstrate that EA-KD consistently enhances performance, achieving state-of-the-art results with negligible computational cost. Code is available at: https://github.com/cpsu00/EA-KD
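
As a rough sketch of the reweighting idea described in the abstract (not the authors' implementation; see the linked repository for that), the PyTorch snippet below weights a standard KD loss by a per-sample score built from the teacher's and student's output entropies. The specific combination (a sum of the two entropies, normalized over the batch) and the temperature value are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def output_entropy(logits: torch.Tensor, T: float = 4.0) -> torch.Tensor:
    """Shannon entropy of the temperature-softened output distribution, per sample."""
    p = F.softmax(logits / T, dim=-1)
    return -(p * p.clamp_min(1e-12).log()).sum(dim=-1)

def ea_kd_style_loss(student_logits: torch.Tensor,
                     teacher_logits: torch.Tensor,
                     T: float = 4.0) -> torch.Tensor:
    # Per-sample KD term: KL(teacher || student) on softened outputs,
    # scaled by T^2 as in standard logit distillation.
    log_p_s = F.log_softmax(student_logits / T, dim=-1)
    p_t = F.softmax(teacher_logits / T, dim=-1)
    kd_per_sample = F.kl_div(log_p_s, p_t, reduction="none").sum(dim=-1) * (T * T)

    # Sample "learning value": combined teacher + student output entropy.
    # The exact combination and normalization here are assumptions for illustration.
    with torch.no_grad():
        w = output_entropy(teacher_logits, T) + output_entropy(student_logits, T)
        w = w / w.mean().clamp_min(1e-12)  # keep loss scale comparable to vanilla KD

    # High-entropy (more informative) samples receive larger weight.
    return (w * kd_per_sample).mean()
```

Because the weights only rescale a loss that is already being computed, the extra work is a couple of softmaxes per batch, which is consistent with the abstract's claim of negligible computational cost.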

Citation History

Jan 26, 2026: 3
Jan 27, 2026: 3
Feb 3, 2026: 3
Feb 13, 2026: 3