Papers by DaYou Du
3 papers found
Conference
MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems
Yinsicheng Jiang, Yao Fu, Yeqi Huang et al.
NEURIPS 2025, arXiv:2505.11415
1 citation
SeerAttention: Self-distilled Attention Gating for Efficient Long-context Prefilling
Yizhao Gao, Zhichen Zeng, DaYou Du et al.
NEURIPS 2025
STBLLM: Breaking the 1-Bit Barrier with Structured Binary LLMs
Peijie Dong, Lujun Li, Yuedong Zhong et al.
ICLR 2025, arXiv:2408.01803
32 citations