Multiobjective Lipschitz Bandits under Lexicographic Ordering

4citations
4
citations
#1318
in AAAI 2024
of 2289 papers
5
Top Authors
2
Data Points

Abstract

This paper studies the multiobjective bandit problem under lexicographic ordering, wherein the learner aims to simultaneously maximize ? objectives hierarchically. The only existing algorithm for this problem considers the multi-armed bandit model, and its regret bound is O((KT)^(2/3)) under a metric called priority-based regret. However, this bound is suboptimal, as the lower bound for single objective multi-armed bandits is Omega(KlogT). Moreover, this bound becomes vacuous when the arm number K is infinite. To address these limitations, we investigate the multiobjective Lipschitz bandit model, which allows for an infinite arm set. Utilizing a newly designed multi-stage decision-making strategy, we develop an improved algorithm that achieves a general regret bound of O(T^((d_z^i+1)/(d_z^i+2))) for the i-th objective, where d_z^i is the zooming dimension for the i-th objective, with i in {1,2,...,m}. This bound matches the lower bound of the single objective Lipschitz bandit problem in terms of T, indicating that our algorithm is almost optimal. Numerical experiments confirm the effectiveness of our algorithm.

Citation History

Jan 27, 2026
0
Feb 13, 2026
4+4