Principled Penalty-based Methods for Bilevel Reinforcement Learning and RLHF

30 citations

arXiv:2402.06886 PDF

#450 of 2635 papers in ICML 2024

Top Authors

Han Shen, Zhuoran Yang, Tianyi Chen

Topics

bilevel optimization, reinforcement learning, inverse reinforcement learning, RL from human feedback, penalty-based methods, Stackelberg game, policy gradient algorithms, dynamic objective functions

Abstract

Bilevel optimization has recently been applied to many machine learning tasks. However, its applications have been restricted to the supervised learning setting, where static objective functions with benign structures are considered. Bilevel problems such as incentive design, inverse reinforcement learning (RL), and RL from human feedback (RLHF), by contrast, are often modeled with dynamic objective functions that go beyond these simple static structures, which poses significant challenges for existing bilevel solutions. To tackle this new class of bilevel problems, we introduce the first principled algorithmic framework for solving bilevel RL problems through the lens of penalty formulation. We provide theoretical studies of the problem landscape and of the resulting penalty-based (policy) gradient algorithms. We demonstrate the effectiveness of our algorithms via simulations in the Stackelberg game and RLHF.
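
For readers unfamiliar with the term, a penalty formulation of a generic bilevel problem can be sketched as follows. This is only an illustration of the general idea: the symbols $f$, $g$, $x$, $y$, $p$, and $\lambda$ are assumptions introduced here, not notation taken from this listing, and the paper's own penalty functions for the RL setting may differ.

```latex
% Generic bilevel problem (illustrative notation, assumed here):
% x is the upper-level variable, y the lower-level variable.
\min_{x,\,y} \; f(x, y)
\quad \text{s.t.} \quad
y \in \arg\min_{y'} \, g(x, y')

% One common penalty function: the lower-level optimality gap.
% It is nonnegative and equals zero exactly when y solves the lower level.
p(x, y) \;=\; g(x, y) \;-\; \min_{y'} g(x, y') \;\ge\; 0

% Penalized single-level surrogate with penalty weight \lambda > 0:
\min_{x,\,y} \; f(x, y) \;+\; \lambda \, p(x, y)
```

In a bilevel RL reading of this sketch, $y$ would play the role of a policy and $g$ a (negated) return under a reward shaped by $x$; larger $\lambda$ pushes the surrogate's solutions toward lower-level optimality.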
