Occult: Optimizing Collaborative Communications across Experts for Accelerated Parallel MoE Training and Inference

2 citations

arXiv:2505.13345

#1626 of 3340 papers in ICML 2025

Top Authors

Shuqing Luo, Pingzhi Li, Jie Peng, Yang Zhao, Yu Cao, Yu Cheng, Tianlong Chen

Abstract

Mixture-of-experts (MoE) architectures can achieve impressive computational efficiency with expert parallelism, which relies heavily on all-to-all communication across devices. Unfortunately, this communication overhead typically constitutes a significant portion of the total runtime (over 40% in large-scale training), hampering the scalability of distributed training and inference for modern MoE models. In this paper, we first define $\textit{collaborative communication}$ to illustrate this intrinsic limitation, and then propose system- and algorithm-level innovations to reduce communication costs. Specifically, we call a pair of experts co-activated by one token $\textit{collaborated}$; this comprises two cases, $\textit{intra-}$ and $\textit{inter-collaboration}$, depending on whether the two experts are kept on the same device. Our pilot investigations reveal that increasing the proportion of intra-collaboration can accelerate expert parallelism at scale. This motivates us to strategically $\underline{\texttt{o}}$ptimize $\underline{\texttt{c}}$ollaborative $\underline{\texttt{c}}$omm$\underline{\texttt{u}}$nication for acce$\underline{\texttt{l}}$era$\underline{\texttt{t}}$ed MoE training and inference, dubbed $\textbf{\texttt{Occult}}$. Our designs are capable of $\underline{either}$ delivering exact results with reduced communication cost, $\underline{or}$ controllably minimizing the cost with collaboration pruning, realized through modified fine-tuning. Comprehensive experiments on various MoE-LLMs demonstrate that $\texttt{Occult}$ can be faster than popular state-of-the-art inference or training frameworks (over 50% speedup across multiple tasks and models) with quality comparable or superior to standard fine-tuning. Code will be made available upon acceptance.
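
To make the intra- versus inter-collaboration distinction concrete, the following is a minimal sketch, not the paper's implementation. It counts co-activated expert pairs under a given expert-to-device placement. The function names (`collaboration_stats`, `intra_ratio`), the toy routing table, and both placements are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def collaboration_stats(token_experts, expert_to_device):
    """Count co-activated expert pairs that share a device (intra) or not (inter).

    token_experts: one tuple of activated expert ids per token.
    expert_to_device: dict mapping expert id -> device id (assumed placement).
    """
    intra = inter = 0
    for experts in token_experts:
        # Every unordered pair of distinct experts activated by this token
        # counts as one collaboration.
        for a, b in combinations(sorted(set(experts)), 2):
            if expert_to_device[a] == expert_to_device[b]:
                intra += 1
            else:
                inter += 1
    return intra, inter


def intra_ratio(token_experts, expert_to_device):
    """Fraction of collaborations whose two experts share a device."""
    intra, inter = collaboration_stats(token_experts, expert_to_device)
    total = intra + inter
    return intra / total if total else 0.0


# Hypothetical toy data: 4 tokens, top-2 routing over 4 experts, 2 devices.
routing = [(0, 1), (0, 1), (2, 3), (0, 2)]
placement_a = {0: 0, 1: 1, 2: 0, 3: 1}  # frequently paired experts split apart
placement_b = {0: 0, 1: 0, 2: 1, 3: 1}  # frequently paired experts co-located

print(intra_ratio(routing, placement_a))
print(intra_ratio(routing, placement_b))
```

In this toy setting, co-locating experts that are often activated together raises the share of intra-collaboration, which is the quantity the abstract reports as helping expert parallelism at scale. How the actual placement or pruning is performed is not specified here.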

Citation History

Jan 28, 2026 to Feb 13, 2026: 2 (+2)