"offline reinforcement learning" Papers

103 papers found • Page 1 of 3

$q$-exponential family for policy optimization

Lingwei Zhu, Haseeb Shah, Han Wang et al.

ICLR 2025 • arXiv:2408.07245 • 2 citations

A Clean Slate for Offline Reinforcement Learning

Matthew T Jackson, Uljad Berdica, Jarek Liesen et al.

NEURIPS 2025 (oral) • arXiv:2504.11453 • 2 citations

Active Reinforcement Learning Strategies for Offline Policy Improvement

Ambedkar Dukkipati, Ranga Shaarad Ayyagari, Bodhisattwa Dasgupta et al.

AAAI 2025 (paper) • arXiv:2412.13106 • 3 citations

Adaptable Safe Policy Learning from Multi-task Data with Constraint Prioritized Decision Transformer

Ruiqi Xue, Ziqian Zhang, Lihe Li et al.

NEURIPS 2025

Adaptive Neighborhood-Constrained Q Learning for Offline Reinforcement Learning

Yixiu Mao, Yun Qu, Qi Wang et al.

NEURIPS 2025 (spotlight) • arXiv:2511.02567

ADG: Ambient Diffusion-Guided Dataset Recovery for Corruption-Robust Offline Reinforcement Learning

Zeyuan Liu, Zhihe Yang, Jiawei Xu et al.

NEURIPS 2025 • arXiv:2505.23871 • 2 citations

Adversarial Policy Optimization for Offline Preference-based Reinforcement Learning

Hyungkyu Kang, Min-hwan Oh

ICLR 2025 • arXiv:2503.05306 • 3 citations

Analytic Energy-Guided Policy Optimization for Offline Reinforcement Learning

Jifeng Hu, Sili Huang, Zhejian Yang et al.

NEURIPS 2025 • arXiv:2505.01822

A Principled Path to Fitted Distributional Evaluation

Sungee Hong, Jiayi Wang, Zhengling Qi et al.

NEURIPS 2025 (spotlight) • arXiv:2506.20048

Are Expressive Models Truly Necessary for Offline RL?

Guan Wang, Haoyi Niu, Jianxiong Li et al.

AAAI 2025 (paper) • arXiv:2412.11253 • 6 citations

Behavioral Entropy-Guided Dataset Generation for Offline Reinforcement Learning

Wesley Suttle, Aamodh Suresh, Carlos Nieto-Granda

ICLR 2025 (oral) • arXiv:2502.04141 • 4 citations

ContraDiff: Planning Towards High Return States via Contrastive Learning

Yixiang Shan, Zhengbang Zhu, Ting Long et al.

ICLR 2025

DoF: A Diffusion Factorization Framework for Offline Multi-Agent Reinforcement Learning

Chao Li, Ziwei Deng, Chenxing Lin et al.

ICLR 2025 • 7 citations

Efficient Online Reinforcement Learning Fine-Tuning Need Not Retain Offline Data

Zhiyuan Zhou, Andy Peng, Qiyang Li et al.

ICLR 2025 • arXiv:2412.07762 • 30 citations

Energy-Weighted Flow Matching for Offline Reinforcement Learning

Shiyuan Zhang, Weitong Zhang, Quanquan Gu

ICLR 2025 • arXiv:2503.04975 • 29 citations

Fat-to-Thin Policy Optimization: Offline Reinforcement Learning with Sparse Policies

Lingwei Zhu, Han Wang, Yukie Nagai

ICLR 2025

Fewer May Be Better: Enhancing Offline Reinforcement Learning with Reduced Dataset

Yiqin Yang, Quanwei Wang, Chenghao Li et al.

ICLR 2025 • arXiv:2502.18955

Finite-Time Bounds for Average-Reward Fitted Q-Iteration

Jongmin Lee, Ernest Ryu

NEURIPS 2025 • arXiv:2510.17391

Forecasting in Offline Reinforcement Learning for Non-stationary Environments

Suzan Ece Ada, Georg Martius, Emre Ugur et al.

NEURIPS 2025 (spotlight) • arXiv:2512.01987

FOSP: Fine-tuning Offline Safe Policy through World Models

Chenyang Cao, Yucheng Xin, Silang Wu et al.

ICLR 2025 • arXiv:2407.04942 • 3 citations

GOPlan: Goal-conditioned Offline Reinforcement Learning by Planning with Learned Models

Mianchu Wang, Rui Yang, Xi Chen et al.

ICLR 2025 • arXiv:2310.20025 • 16 citations

Learning from Reward-Free Offline Data: A Case for Planning with Latent Dynamics Models

Uladzislau Sobal, Wancong Zhang, Kyunghyun Cho et al.

NEURIPS 2025 • arXiv:2502.14819 • 20 citations

Learning on One Mode: Addressing Multi-modality in Offline Reinforcement Learning

Mianchu Wang, Yue Jin, Giovanni Montana

ICLR 2025 • arXiv:2412.03258 • 2 citations

Learning Preferences without Interaction for Cooperative AI: A Hybrid Offline-Online Approach

Haitong Ma, Haoran Yu, Haobo Fu et al.

NEURIPS 2025

Local Manifold Approximation and Projection for Manifold-Aware Diffusion Planning

Kyowoon Lee, Jaesik Choi

ICML 2025 • arXiv:2506.00867 • 4 citations

Mitigating Reward Over-optimization in Direct Alignment Algorithms with Importance Sampling

Nguyen Phuc, Ngoc-Hieu Nguyen, Duy M. H. Nguyen et al.

NEURIPS 2025 • arXiv:2506.08681

Model-based Offline Reinforcement Learning with Lower Expectile Q-Learning

Kwanyoung Park, Youngwoon Lee

ICLR 2025 • arXiv:2407.00699 • 5 citations

Model-based RL as a Minimalist Approach to Horizon-Free and Second-Order Bounds

Zhiyong Wang, Dongruo Zhou, John C.S. Lui et al.

ICLR 2025 • arXiv:2408.08994 • 10 citations

Model-Free Offline Reinforcement Learning with Enhanced Robustness

Chi Zhang, Zain Ulabedeen Farhat, George Atia et al.

ICLR 2025 • 5 citations

Model Selection for Off-policy Evaluation: New Algorithms and Experimental Protocol

Pai Liu, Lingfeng Zhao, Shivangi Agarwal et al.

NEURIPS 2025 • arXiv:2502.08021 • 4 citations

MOSDT: Self-Distillation-Based Decision Transformer for Multi-Agent Offline Safe Reinforcement Learning

Yuchen Xia, Yunjian Xu

NEURIPS 2025

Neural Stochastic Differential Equations for Uncertainty-Aware Offline RL

Cevahir Koprulu, Franck Djeumou, Ufuk Topcu

ICLR 2025

Offline Multi-Agent Reinforcement Learning via In-Sample Sequential Policy Optimization

Zongkai Liu, Qian Lin, Chao Yu et al.

AAAI 2025 (paper) • arXiv:2412.07639 • 8 citations

Offline RL by Reward-Weighted Fine-Tuning for Conversation Optimization

Subhojyoti Mukherjee, Viet Lai, Raghavendra Addanki et al.

NEURIPS 2025 • arXiv:2506.06964 • 3 citations

Offline RL in Regular Decision Processes: Sample Efficiency via Language Metrics

Ahana Deb, Roberto Cipollone, Anders Jonsson et al.

ICLR 2025

OGBench: Benchmarking Offline Goal-Conditioned RL

Seohong Park, Kevin Frans, Benjamin Eysenbach et al.

ICLR 2025 • arXiv:2410.20092 • 90 citations

Online Optimization for Offline Safe Reinforcement Learning

Yassine Chemingui, Aryan Deshwal, Alan Fern et al.

NEURIPS 2025 • arXiv:2510.22027

Preference Elicitation for Offline Reinforcement Learning

Alizée Pace, Bernhard Schölkopf, Gunnar Rätsch et al.

ICLR 2025 • arXiv:2406.18450 • 2 citations

Pretraining a Shared Q-Network for Data-Efficient Offline Reinforcement Learning

Jongchan Park, Mingyu Park, Donghwan Lee

NEURIPS 2025 • arXiv:2505.05701 • 1 citation

Q-SFT: Q-Learning for Language Models via Supervised Fine-Tuning

Joey Hong, Anca Dragan, Sergey Levine

ICLR 2025 • arXiv:2411.05193 • 8 citations

Rebalancing Return Coverage for Conditional Sequence Modeling in Offline Reinforcement Learning

Wensong Bai, Chufan Chen, Yichao Fu et al.

NEURIPS 2025

Reinforcement Learning for Individual Optimal Policy from Heterogeneous Data

Rui Miao, Babak Shahbaba, Annie Qu

NEURIPS 2025 • arXiv:2505.09496 • 1 citation

RLZero: Direct Policy Inference from Language Without In-Domain Supervision

Harshit Sushil Sikchi, Siddhant Agarwal, Pranaya Jajoo et al.

NEURIPS 2025 • arXiv:2412.05718 • 3 citations

RTDiff: Reverse Trajectory Synthesis via Diffusion for Offline Reinforcement Learning

Qianlan Yang, Yu-Xiong Wang

ICLR 2025

Scalable Decision-Making in Stochastic Environments through Learned Temporal Abstraction

Baiting Luo, Ava Pettet, Aron Laszka et al.

ICLR 2025 (oral) • arXiv:2502.21186 • 3 citations

Scaling Offline Model-Based RL via Jointly-Optimized World-Action Model Pretraining

Jie Cheng, Ruixi Qiao, Yingwei Ma et al.

ICLR 2025 (oral) • arXiv:2410.00564 • 8 citations

Scrutinize What We Ignore: Reining In Task Representation Shift Of Context-Based Offline Meta Reinforcement Learning

Hai Zhang, Boyuan Zheng, Tianying Ji et al.

ICLR 2025 • arXiv:2405.12001

Semantic Temporal Abstraction via Vision-Language Model Guidance for Efficient Reinforcement Learning

Tian-Shuo Liu, Xu-Hui Liu, Ruifeng Chen et al.

ICLR 2025 (oral)

SORREL: Suboptimal-Demonstration-Guided Reinforcement Learning for Learning to Branch

Shengyu Feng, Yiming Yang

AAAI 2025 (paper) • arXiv:2412.15534 • 5 citations

State-Covering Trajectory Stitching for Diffusion Planners

Kyowoon Lee, Jaesik Choi

NEURIPS 2025 (oral) • arXiv:2506.00895 • 4 citations