"generative reward models" Papers
4 papers found
Generative RLHF-V: Learning Principles from Multi-modal Human Preference
Jiayi Zhou, Jiaming Ji, Boyuan Chen et al.
NeurIPS 2025, arXiv:2505.18531
7 citations
HelpSteer3-Preference: Open Human-Annotated Preference Data across Diverse Tasks and Languages
Zhilin Wang, Jiaqi Zeng, Olivier Delalleau et al.
NeurIPS 2025, arXiv:2505.11475
38 citations
RMB: Comprehensively benchmarking reward models in LLM alignment
Enyu Zhou, Guodong Zheng, Binghai Wang et al.
ICLR 2025, arXiv:2410.09893
47 citations
When To Solve, When To Verify: Compute-Optimal Problem Solving and Generative Verification for LLM Reasoning
Nishad Singhi, Hritik Bansal, Arian Hosseini et al.
COLM 2025
24 citations