"off-policy reinforcement learning" Papers

12 papers found

Filters:off-policy reinforcement learning Clear all

Conference

AAAI 2025 (3,028)COLM 2025 (418)CVPR 2025 (2,873)ICCV 2025 (2,701)ICLR 2025 (3,827)ICML 2025 (3,340)ISMAR 2025 (229)NEURIPS 2025 (5,858)AAAI 2024 (2,289)CVPR 2024 (2,716)ECCV 2024 (2,387)ICLR 2024 (2,297)ICML 2024 (2,635)

Paper Type

poster (24,624)paper (8,558)oral (1,594)spotlight (1,421)highlight (975)

Actor-Free Continuous Control via Structurally Maximizable Q-Functions

Yigit Korkmaz, Urvi Bhuwania, Ayush Jain et al.

NEURIPS 2025arXiv:2510.18828

Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models

Michael Noukhovitch, Shengyi Huang, Sophie Xhonneux et al.

ICLR 2025arXiv:2410.18252

citations

Off-policy Reinforcement Learning with Model-based Exploration Augmentation

Likun Wang, Xiangteng Zhang, Yinuo Wang et al.

NEURIPS 2025arXiv:2510.25529

Scaling Off-Policy Reinforcement Learning with Batch and Weight Normalization

Daniel Palenicek, Florian Vogt, Joe Watson et al.

NEURIPS 2025arXiv:2502.07523

citations

Succeed or Learn Slowly: Sample Efficient Off-Policy Reinforcement Learning for Mobile App Control

Georgios Papoudakis, Thomas Coste, Jianye Hao et al.

NEURIPS 2025arXiv:2509.01720

TOP-ERL: Transformer-based Off-Policy Episodic Reinforcement Learning

Ge Li, Dong Tian, Hongyi Zhou et al.

ICLR 2025oralarXiv:2410.09536

citations

Zero-shot Model-based Reinforcement Learning using Large Language Models

Abdelhakim Benechehab, Youssef Attia El Hili, Ambroise Odonnat et al.

ICLR 2025arXiv:2410.11711

citations

Learning a Diffusion Model Policy from Rewards via Q-Score Matching

Michael Psenka, Alejandro Escontrela, Pieter Abbeel et al.

ICML 2024arXiv:2312.11752

citations

Offline-Boosted Actor-Critic: Adaptively Blending Optimal Historical Behaviors in Deep Off-Policy RL

Yu Luo, Tianying Ji, Fuchun Sun et al.

ICML 2024arXiv:2405.18520

citations

Overestimation, Overfitting, and Plasticity in Actor-Critic: the Bitter Lesson of Reinforcement Learning

Michal Nauman, Michał Bortkiewicz, Piotr Milos et al.

ICML 2024arXiv:2403.00514

citations

RVI-SAC: Average Reward Off-Policy Deep Reinforcement Learning

Yukinari Hisaki, Isao Ono

ICML 2024arXiv:2408.01972

citations

Seizing Serendipity: Exploiting the Value of Past Success in Off-Policy Actor-Critic

Tianying Ji, Yu Luo, Fuchun Sun et al.

ICML 2024arXiv:2306.02865

citations