"markov decision processes" Papers

29 papers found

Actions Speak Louder Than Words: Rate-Reward Trade-off in Markov Decision Processes

Haotian Wu, Gongpu Chen, Deniz Gunduz

ICLR 2025 arXiv:2502.03335
7 citations

A Generalized Bisimulation Metric of State Similarity between Markov Decision Processes: From Theoretical Propositions to Applications

Zhenyu Tao, Wei Xu, Xiaohu You

NEURIPS 2025 arXiv:2509.18714
2 citations

Approximate Bilevel Difference Convex Programming for Bayesian Risk Markov Decision Processes

Yifan Lin, Enlu Zhou

AAAI 2025 paper arXiv:2301.11415
1 citation

Beyond Scalar Rewards: An Axiomatic Framework for Lexicographic MDPs

Mehran Shakerinava, Siamak Ravanbakhsh, Adam Oberman

NEURIPS 2025 spotlight arXiv:2505.12049

Bridging Distributional and Risk-sensitive Reinforcement Learning with Provable Regret Bounds

Hao Liang, Zhiquan Luo

NEURIPS 2025 arXiv:2210.14051
18 citations

CLDyB: Towards Dynamic Benchmarking for Continual Learning with Pre-trained Models

Shengzhuang Chen, Yikai Liao, Xiaoxiao Sun et al.

ICLR 2025 arXiv:2503.04655
1 citation

Computationally Efficient RL under Linear Bellman Completeness for Deterministic Dynamics

Runzhe Wu, Ayush Sekhari, Akshay Krishnamurthy et al.

ICLR 2025 arXiv:2406.11810
3 citations

Efficient Preference-Based Reinforcement Learning: Randomized Exploration meets Experimental Design

Andreas Schlaginhaufen, Reda Ouhamma, Maryam Kamgarpour

NEURIPS 2025 arXiv:2506.09508
3 citations

Efficient Reinforcement Learning in Probabilistic Reward Machines

Xiaofeng Lin, Xuezhou Zhang

AAAI 2025 paper arXiv:2408.10381
2 citations

Multiple Mean-Payoff Optimization Under Local Stability Constraints

David Klaška, Antonín Kučera, Vojtěch Kůr et al.

AAAI 2025 paper arXiv:2412.13369

Non-convex entropic mean-field optimization via Best Response flow

Razvan-Andrei Lascu, Mateusz Majka

NEURIPS 2025 arXiv:2505.22760
1 citation

No-Regret Thompson Sampling for Finite-Horizon Markov Decision Processes with Gaussian Processes

Jasmine Bayrooti, Sattar Vakili, Amanda Prorok et al.

NEURIPS 2025 oral arXiv:2510.20725

On the Convergence of Single-Timescale Actor-Critic

Navdeep Kumar, Priyank Agrawal, Giorgia Ramponi et al.

NEURIPS 2025 arXiv:2410.08868
2 citations

Regret Analysis of Average-Reward Unichain MDPs via an Actor-Critic Approach

Swetha Ganesh, Vaneet Aggarwal

NEURIPS 2025 arXiv:2505.19986
3 citations

REINFORCE Converges to Optimal Policies with Any Learning Rate

Samuel Robertson, Thang Chu, Bo Dai et al.

NEURIPS 2025

Reinforcement Learning for Individual Optimal Policy from Heterogeneous Data

Rui Miao, Babak Shahbaba, Annie Qu

NEURIPS 2025 arXiv:2505.09496
1 citation

SEMDICE: Off-policy State Entropy Maximization via Stationary Distribution Correction Estimation

Jongmin Lee, Meiqi Sun, Pieter Abbeel

ICLR 2025 arXiv:2512.10042

Sequential Stochastic Combinatorial Optimization Using Hierarchal Reinforcement Learning

Xinsong Feng, Zihan Yu, Yanhai Xiong et al.

ICLR 2025 arXiv:2502.05537
2 citations

AI Alignment with Changing and Influenceable Reward Functions

Micah Carroll, Davis Foote, Anand Siththaranjan et al.

ICML 2024 arXiv:2405.17713
43 citations

Efficient Exploration in Average-Reward Constrained Reinforcement Learning: Achieving Near-Optimal Regret With Posterior Sampling

Danil Provodin, Maurits Kaptein, Mykola Pechenizkiy

ICML 2024 arXiv:2405.19017

Geometric Active Exploration in Markov Decision Processes: the Benefit of Abstraction

Riccardo De Santi, Federico Arangath Joseph, Noah Liniger et al.

ICML 2024 arXiv:2407.13364
3 citations

Is Inverse Reinforcement Learning Harder than Standard Reinforcement Learning? A Theoretical Perspective

Lei Zhao, Mengdi Wang, Yu Bai

ICML 2024 arXiv:2312.00054
3 citations

Model-Free Robust $\phi$-Divergence Reinforcement Learning Using Both Offline and Online Data

Kishan Panaganti, Adam Wierman, Eric Mazumdar

ICML 2024

On The Statistical Complexity of Offline Decision-Making

Thanh Nguyen-Tang, Raman Arora

ICML 2024 arXiv:2501.06339
2 citations

Optimizing Local Satisfaction of Long-Run Average Objectives in Markov Decision Processes

David Klaška, Antonín Kučera, Vojtěch Kůr et al.

AAAI 2024 paper arXiv:2312.12325
1 citation

REValueD: Regularised Ensemble Value-Decomposition for Factorisable Markov Decision Processes

David Ireland, Giovanni Montana

ICLR 2024 arXiv:2401.08850
6 citations

SaVeR: Optimal Data Collection Strategy for Safe Policy Evaluation in Tabular MDP

Subhojyoti Mukherjee, Josiah Hanna, Robert Nowak

ICML 2024 arXiv:2406.02165

Target Networks and Over-parameterization Stabilize Off-policy Bootstrapping with Function Approximation

Fengdi Che, Chenjun Xiao, Jincheng Mei et al.

ICML 2024 oral arXiv:2405.21043
7 citations

Test-Time Regret Minimization in Meta Reinforcement Learning

Mirco Mutti, Aviv Tamar

ICML 2024 arXiv:2406.02282
4 citations