Papers by Michael Qizhe Shieh
6 papers found
Conference
Efficient Process Reward Model Training via Active Learning
Keyu Duan, Zichen Liu, Xin Mao et al.
COLM 2025, arXiv:2504.10559
9 citations
LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization
Guanzheng Chen, Xin Li, Michael Qizhe Shieh et al.
ICLR 2025, arXiv:2502.13922
15 citations
MixEval-X: Any-to-any Evaluations from Real-world Data Mixture
Jinjie Ni, Yifan Song, Deepanway Ghosal et al.
ICLR 2025, arXiv:2410.13754
5 citations
NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation
Xiangyan Liu, Jinjie Ni, Zijian Wu et al.
NeurIPS 2025, arXiv:2504.13055
57 citations
The Emergence of Abstract Thought in Large Language Models Beyond Any Language
Yuxin Chen, Yiran Zhao, Yang Zhang et al.
NeurIPS 2025, arXiv:2506.09890
9 citations
Understanding and Enhancing Safety Mechanisms of LLMs via Safety-Specific Neuron
Yiran Zhao, Wenxuan Zhang, Yuxi Xie et al.
ICLR 2025
29 citations