"policy evaluation" Papers
21 papers found
Conformal Prediction Beyond the Horizon: Distribution-Free Inference for Policy Evaluation
Feichen Gan, Lu Youcun, Yingying Zhang et al.
Doubly Optimal Policy Evaluation for Reinforcement Learning
Shuze Liu, Claire Chen, Shangtong Zhang
Efficient Policy Evaluation with Safety Constraint for Reinforcement Learning
Claire Chen, Shuze Liu, Shangtong Zhang
Estimation and Inference in Distributional Reinforcement Learning
Liangyu Zhang, Yang Peng, Jiadong Liang et al.
Finite-Sample Analysis of Policy Evaluation for Robust Average Reward Reinforcement Learning
Yang Xu, Washim Mondal, Vaneet Aggarwal
IRASim: A Fine-Grained World Model for Robot Manipulation
Fangqi Zhu, Hongtao Wu, Song Guo et al.
Model Selection for Off-policy Evaluation: New Algorithms and Experimental Protocol
Pai Liu, Lingfeng Zhao, Shivangi Agarwal et al.
On Evaluating Policies for Robust POMDPs
Merlijn Krale, Eline M. Bovy, Maris F. L. Galesloot et al.
ReSim: Reliable World Simulation for Autonomous Driving
Jiazhi Yang, Kashyap Chitta, Shenyuan Gao et al.
Time After Time: Deep-Q Effect Estimation for Interventions on When and What to do
Yoav Wald, Mark Goldstein, Yonathan Efroni et al.
Towards Provable Emergence of In-Context Reinforcement Learning
Jiuqi Wang, Rohan Chandra, Shangtong Zhang
Transformers Can Learn Temporal Difference Methods for In-Context Reinforcement Learning
Jiuqi Wang, Ethan Blaser, Hadi Daneshmand et al.
Combining Experimental and Historical Data for Policy Evaluation
Ting Li, Chengchun Shi, Qianglin Wen et al.
Discerning Temporal Difference Learning
Jianfei Ma
Efficient Policy Evaluation with Offline Data Informed Behavior Policy Design
Shuze Liu, Shangtong Zhang
Faster Stochastic Variance Reduction Methods for Compositional MiniMax Optimization
Jin Liu, Xiaokang Pan, Junwen Duan et al.
Low-Rank Bandits via Tight Two-to-Infinity Singular Subspace Recovery
Yassir Jedra, William Réveillard, Stefan Stojanovic et al.
Policy-conditioned Environment Models are More Generalizable
Ruifeng Chen, Xiong-Hui Chen, Yihao Sun et al.
Policy Evaluation for Variance in Average Reward Reinforcement Learning
Shubhada Agrawal, Prashanth L.A., Siva Maguluri
SaVeR: Optimal Data Collection Strategy for Safe Policy Evaluation in Tabular MDP
Subhojyoti Mukherjee, Josiah Hanna, Robert Nowak
Scalable Real-Time Recurrent Learning Using Columnar-Constructive Networks
Khurram Javed, Haseeb Shah, Richard Sutton et al.