Poster "sparse architectures" Papers
2 papers found
Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models
Zhijian Zhuo, Ya Wang, Yutao Zeng et al.
ICLR 2025 | arXiv:2411.03884
6 citations
$\texttt{MoE-RBench}$: Towards Building Reliable Language Models with Sparse Mixture-of-Experts
Guanjie Chen, Xinyu Zhao, Tianlong Chen et al.
ICML 2024 | arXiv:2406.11353
6 citations