Jiantao Jiao

papers

689

total citations

papers (13)

Bridging Offline Reinforcement Learning and Imitation Learning: A Tale of Pessimism

NeurIPS 2021 | arXiv

318

citations

Toward the Fundamental Limits of Imitation Learning

NeurIPS 2020 | arXiv

108

citations

How to Evaluate Reward Models for RLHF

ICLR 2025 | arXiv

Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning

ICML 2025 | arXiv

MADE: Exploration via Maximizing Deviation from Explored Regions

NeurIPS 2021 | arXiv

Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF

ICML 2024 | arXiv

Minimax Optimal Online Imitation Learning via Replay Estimation

NeurIPS 2022 | arXiv

Importance Weighted Actor-Critic for Optimal Conservative Offline Reinforcement Learning

NeurIPS 2023 | arXiv

SLIP: Learning to predict in unknown dynamical systems with long-term memory

On the Value of Interaction and Function Approximation in Imitation Learning

Beyond the Best: Distribution Functional Estimation in Infinite-Armed Bandits

Towards Optimal Caching and Model Selection for Large Model Inference

Doubly-Robust Self-Training