"reward maximization" Papers
4 papers found
Delay as Payoff in MAB
Ofir Schlisselberg, Ido Cohen, Tal Lancewicki et al.
AAAI 2025, arXiv:2408.15158, 4 citations
DiffStitch: Boosting Offline Reinforcement Learning with Diffusion-based Trajectory Stitching
Guanghe Li, Yixiang Shan, Zhengbang Zhu et al.
ICML 2024, arXiv:2402.02439, 36 citations
Feedback Efficient Online Fine-Tuning of Diffusion Models
Masatoshi Uehara, Yulai Zhao, Kevin Black et al.
ICML 2024, arXiv:2402.16359, 44 citations
Q-Probe: A Lightweight Approach to Reward Maximization for Language Models
Kenneth Li, Samy Jelassi, Hugh Zhang et al.
ICML 2024, arXiv:2402.14688, 15 citations