"upper confidence bound" Papers
4 papers found
Conference
Improved Regret Bounds for Online Fair Division with Bandit Learning
Benjamin Schiffer, Shirley Zhang
AAAI 2025 · arXiv:2501.07022 · 5 citations
Online Preference Alignment for Language Models via Count-based Exploration
Chenjia Bai, Yang Zhang, Shuang Qiu et al.
ICLR 2025 · arXiv:2501.12735 · 20 citations
Feel-Good Thompson Sampling for Contextual Dueling Bandits
Xuheng Li, Heyang Zhao, Quanquan Gu
ICML 2024 · arXiv:2404.06013 · 17 citations
Stochastic Bandits with ReLU Neural Networks
Kan Xu, Hamsa Bastani, Surbhi Goel et al.
ICML 2024 · arXiv:2405.07331 · 1 citation