Alekh Agarwal

1,099 total citations

papers (14)

Model-based RL with Optimistic Posterior Sampling: Structural Conditions and Sample Complexity

NEURIPS 2022, arXiv

More Benefits of Being Distributional: Second-Order Bounds for Reinforcement Learning

ICML 2024, arXiv

Policy Improvement via Imitation of Multiple Oracles

NEURIPS 2020, arXiv

Design Considerations in Offline Preference-based RL

ICML 2025, arXiv

Ordering-based Conditions for Global Convergence of Policy Gradient Methods

NEURIPS 2023, arXiv

The Non-linear $F$-Design and Applications to Interactive Learning

ICML 2024

Provably Good Batch Off-Policy Reinforcement Learning Without Great Exploration

NEURIPS 2020

Bellman-consistent Pessimism for Offline Reinforcement Learning

FLAMBE: Structural Complexity and Representation Learning of Low Rank MDPs

A Minimaximalist Approach to Reinforcement Learning from Human Feedback

PC-PG: Policy Cover Directed Exploration for Provable Policy Gradient Learning

Safe Reinforcement Learning via Curriculum Induction

Theoretical guarantees on the best-of-n alignment policy

On the Statistical Efficiency of Reward-Free Exploration in Non-Linear RL
