Papers by Chetan Bansal
5 papers found
Conference
AMPO: Active Multi Preference Optimization for Self-play Preference Selection
Taneesh Gupta, Rahul Madhavan, Xuchao Zhang et al.
ICML 2025, arXiv:2502.18293, 2 citations
Anyprefer: An Agentic Framework for Preference Data Synthesis
Yiyang Zhou, Zhaoyang Wang, Tianle Wang et al.
ICLR 2025, arXiv:2504.19276, 11 citations
CREAM: Consistency Regularized Self-Rewarding Language Models
Zhaoyang Wang, Weilei He, Zhiyuan Liang et al.
ICLR 2025, arXiv:2410.12735, 28 citations
Generative Caching for Structurally Similar Prompts and Responses
Sarthak Chakraborty, Suman Nath, Xuchao Zhang et al.
NeurIPS 2025, arXiv:2511.17565, 1 citation
REFA: Reference Free Alignment with Fine-Grained Length Control
Taneesh Gupta, Rahul Madhavan, Xuchao Zhang et al.
COLM 2025