Poster "q-value estimation" Papers
2 papers found
Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF
Zhaolin Gao, Wenhao Zhan, Jonathan Chang et al.
ICLR 2025, arXiv:2410.04612
18 citations
Seizing Serendipity: Exploiting the Value of Past Success in Off-Policy Actor-Critic
Tianying Ji, Yu Luo, Fuchun Sun et al.
ICML 2024, arXiv:2306.02865
21 citations