"q-value estimation" Papers
3 papers found
Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF
Zhaolin Gao, Wenhao Zhan, Jonathan Chang et al.
ICLR 2025 | arXiv:2410.04612 | 18 citations
A Perspective of Q-value Estimation on Offline-to-Online Reinforcement Learning
Yinmin Zhang, Jie Liu, Chuming Li et al.
AAAI 2024 | arXiv:2312.07685 | 25 citations
Seizing Serendipity: Exploiting the Value of Past Success in Off-Policy Actor-Critic
Tianying Ji, Yu Luo, Fuchun Sun et al.
ICML 2024 | arXiv:2306.02865 | 21 citations