Poster "expert specialization" Papers
3 papers found
Conference
Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization
Taishi Nakamura, Takuya Akiba, Kazuki Fujii et al.
ICLR 2025arXiv:2502.19261
9
citations
Learning to Specialize: Joint Gating-Expert Training for Adaptive MoEs in Decentralized Settings
Yehya Farhat, Hamza ElMokhtar Shili, Fangshuo Liao et al.
NEURIPS 2025arXiv:2306.08586
3
citations
Is Temperature Sample Efficient for Softmax Gaussian Mixture of Experts?
Huy Nguyen, Pedram Akbarian, Nhat Ho
ICML 2024arXiv:2401.13875
20
citations