"synthetic preference data" Papers
2 papers found
Self-Boosting Large Language Models with Synthetic Preference Data
Qingxiu Dong, Li Dong, Xingxing Zhang et al.
ICLR 2025, arXiv:2410.06961
32 citations
RLVF: Learning from Verbal Feedback without Overgeneralization
Moritz Stephan, Alexander Khazatsky, Eric Mitchell et al.
ICML 2024, arXiv:2402.10893
14 citations