Papers by Xiaoqing Li
2 papers found
Conference
HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization
Zhijian Zhuo, Yutao Zeng, Ya Wang et al.
NeurIPS 2025, arXiv:2503.04598
9 citations
Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models
Zhijian Zhuo, Ya Wang, Yutao Zeng et al.
ICLR 2025, arXiv:2411.03884
6 citations