"policy evaluation" Papers

21 papers found

Conformal Prediction Beyond the Horizon: Distribution-Free Inference for Policy Evaluation

Feichen Gan, Lu Youcun, Yingying Zhang et al.

NEURIPS 2025oralarXiv:2510.26026

Doubly Optimal Policy Evaluation for Reinforcement Learning

Shuze Liu, Claire Chen, Shangtong Zhang

ICLR 2025arXiv:2410.02226
5
citations

Efficient Policy Evaluation with Safety Constraint for Reinforcement Learning

Claire Chen, Shuze Liu, Shangtong Zhang

ICLR 2025arXiv:2410.05655
1
citations

Estimation and Inference in Distributional Reinforcement Learning

Liangyu Zhang, Yang Peng, Jiadong Liang et al.

NEURIPS 2025arXiv:2309.17262
4
citations

Finite-Sample Analysis of Policy Evaluation for Robust Average Reward Reinforcement Learning

Yang Xu, Washim Mondal, Vaneet Aggarwal

NEURIPS 2025arXiv:2502.16816
8
citations

IRASim: A Fine-Grained World Model for Robot Manipulation

Fangqi Zhu, Hongtao Wu, Song Guo et al.

ICCV 2025arXiv:2406.14540
22
citations

Model Selection for Off-policy Evaluation: New Algorithms and Experimental Protocol

Pai Liu, Lingfeng Zhao, Shivangi Agarwal et al.

NEURIPS 2025arXiv:2502.08021
4
citations

On Evaluating Policies for Robust POMDPs

Merlijn Krale, Eline M. Bovy, Maris F. L. Galesloot et al.

NEURIPS 2025

ReSim: Reliable World Simulation for Autonomous Driving

Jiazhi Yang, Kashyap Chitta, Shenyuan Gao et al.

NEURIPS 2025spotlightarXiv:2506.09981
18
citations

Time After Time: Deep-Q Effect Estimation for Interventions on When and What to do

Yoav Wald, Mark Goldstein, Yonathan Efroni et al.

ICLR 2025arXiv:2503.15890

Towards Provable Emergence of In-Context Reinforcement Learning

Jiuqi Wang, Rohan Chandra, Shangtong Zhang

NEURIPS 2025oralarXiv:2509.18389
1
citations

Transformers Can Learn Temporal Difference Methods for In-Context Reinforcement Learning

Jiuqi Wang, Ethan Blaser, Hadi Daneshmand et al.

ICLR 2025oralarXiv:2405.13861
15
citations

Combining Experimental and Historical Data for Policy Evaluation

Ting Li, Chengchun Shi, Qianglin Wen et al.

ICML 2024arXiv:2406.00317
4
citations

Discerning Temporal Difference Learning

Jianfei Ma

AAAI 2024paperarXiv:2310.08091
1
citations

Efficient Policy Evaluation with Offline Data Informed Behavior Policy Design

Shuze Liu, Shangtong Zhang

ICML 2024arXiv:2301.13734
7
citations

Faster Stochastic Variance Reduction Methods for Compositional MiniMax Optimization

Jin Liu, Xiaokang Pan, Junwen Duan et al.

AAAI 2024paperarXiv:2308.09604
4
citations

Low-Rank Bandits via Tight Two-to-Infinity Singular Subspace Recovery

Yassir Jedra, William Réveillard, Stefan Stojanovic et al.

ICML 2024arXiv:2402.15739
2
citations

Policy-conditioned Environment Models are More Generalizable

Ruifeng Chen, Xiong-Hui Chen, Yihao Sun et al.

ICML 2024

Policy Evaluation for Variance in Average Reward Reinforcement Learning

Shubhada Agrawal, Prashanth L.A., Siva Maguluri

ICML 2024oral

SaVeR: Optimal Data Collection Strategy for Safe Policy Evaluation in Tabular MDP

Subhojyoti Mukherjee, Josiah Hanna, Robert Nowak

ICML 2024arXiv:2406.02165

Scalable Real-Time Recurrent Learning Using Columnar-Constructive Networks

Khurram Javed, Haseeb Shah, Richard Sutton et al.

ICML 2024arXiv:2302.05326
10
citations