"self-generated data" Papers
2 papers found
Importance Weighting Can Help Large Language Models Self-Improve
Chunyang Jiang, Chi-Min Chan, Wei Xue et al.
AAAI 2025, arXiv:2408.09849
11 citations
Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF
Zhaolin Gao, Wenhao Zhan, Jonathan Chang et al.
ICLR 2025, arXiv:2410.04612
18 citations