"off-policy reinforcement learning" Papers

12 papers found

Actor-Free Continuous Control via Structurally Maximizable Q-Functions

Yigit Korkmaz, Urvi Bhuwania, Ayush Jain et al.

NEURIPS 2025arXiv:2510.18828

Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models

Michael Noukhovitch, Shengyi Huang, Sophie Xhonneux et al.

ICLR 2025arXiv:2410.18252
43
citations

Off-policy Reinforcement Learning with Model-based Exploration Augmentation

Likun Wang, Xiangteng Zhang, Yinuo Wang et al.

NEURIPS 2025arXiv:2510.25529

Scaling Off-Policy Reinforcement Learning with Batch and Weight Normalization

Daniel Palenicek, Florian Vogt, Joe Watson et al.

NEURIPS 2025arXiv:2502.07523
9
citations

Succeed or Learn Slowly: Sample Efficient Off-Policy Reinforcement Learning for Mobile App Control

Georgios Papoudakis, Thomas Coste, Jianye Hao et al.

NEURIPS 2025arXiv:2509.01720

TOP-ERL: Transformer-based Off-Policy Episodic Reinforcement Learning

Ge Li, Dong Tian, Hongyi Zhou et al.

ICLR 2025oralarXiv:2410.09536
11
citations

Zero-shot Model-based Reinforcement Learning using Large Language Models

Abdelhakim Benechehab, Youssef Attia El Hili, Ambroise Odonnat et al.

ICLR 2025arXiv:2410.11711
5
citations

Learning a Diffusion Model Policy from Rewards via Q-Score Matching

Michael Psenka, Alejandro Escontrela, Pieter Abbeel et al.

ICML 2024arXiv:2312.11752
70
citations

Offline-Boosted Actor-Critic: Adaptively Blending Optimal Historical Behaviors in Deep Off-Policy RL

Yu Luo, Tianying Ji, Fuchun Sun et al.

ICML 2024arXiv:2405.18520
7
citations

Overestimation, Overfitting, and Plasticity in Actor-Critic: the Bitter Lesson of Reinforcement Learning

Michal Nauman, Michał Bortkiewicz, Piotr Milos et al.

ICML 2024arXiv:2403.00514
41
citations

RVI-SAC: Average Reward Off-Policy Deep Reinforcement Learning

Yukinari Hisaki, Isao Ono

ICML 2024arXiv:2408.01972
4
citations

Seizing Serendipity: Exploiting the Value of Past Success in Off-Policy Actor-Critic

Tianying Ji, Yu Luo, Fuchun Sun et al.

ICML 2024arXiv:2306.02865
21
citations