Poster "upper confidence bound" Papers
3 papers found
Online Preference Alignment for Language Models via Count-based Exploration
Chenjia Bai, Yang Zhang, Shuang Qiu et al.
ICLR 2025, arXiv:2501.12735
20 citations
Feel-Good Thompson Sampling for Contextual Dueling Bandits
Xuheng Li, Heyang Zhao, Quanquan Gu
ICML 2024, arXiv:2404.06013
17 citations
Stochastic Bandits with ReLU Neural Networks
Kan Xu, Hamsa Bastani, Surbhi Goel et al.
ICML 2024, arXiv:2405.07331
1 citation