"off-policy rl" Papers
2 papers found
Enhancing Online Reinforcement Learning with Meta-Learned Objective from Offline Data
Shilong Deng, Zetao Zheng, Hongcai He et al.
AAAI 2025 | paper | arXiv:2501.07346
ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL
Yifei Zhou, Andrea Zanette, Jiayi Pan et al.
ICML 2024 | oral | arXiv:2402.19446
135 citations