"regret bounds" Papers
26 papers found
Conference
A Bayesian Approach to Contextual Dynamic Pricing using the Proportional Hazards Model with Discrete Price Data
Dongguen Kim, Young-Geun Choi, Minwoo Chae
Agnostic Continuous-Time Online Learning
Pramith Devulapalli, Changlong Wu, Ananth Grama et al.
Bridging Distributional and Risk-sensitive Reinforcement Learning with Provable Regret Bounds
Hao Liang, Zhiquan Luo
Contextual Thompson Sampling via Generation of Missing Data
Kelly W Zhang, Tianhui Cai, Hongseok Namkoong et al.
Delay as Payoff in MAB
Ofir Schlisselberg, Ido Cohen, Tal Lancewicki et al.
Generator-Mediated Bandits: Thompson Sampling for GenAI-Powered Adaptive Interventions
Marc Brooks, Gabriel Durham, Kihyuk Hong et al.
Improved Regret Bounds for Linear Bandits with Heavy-Tailed Rewards
Artin Tajdini, Jonathan Scarlett, Kevin Jamieson
Improved Regret Bounds for Online Fair Division with Bandit Learning
Benjamin Schiffer, Shirley Zhang
Learning Across the Gap: Hybrid Multi-armed Bandits with Heterogeneous Offline and Online Data
Qijia He, Minghan Wang, Xutong Liu et al.
Mixture of Online and Offline Experts for Non-Stationary Time Series
Zhilin Zhao, Longbing Cao, Yuanyu Wan
No-Regret Thompson Sampling for Finite-Horizon Markov Decision Processes with Gaussian Processes
Jasmine Bayrooti, Sattar Vakili, Amanda Prorok et al.
Online Nonsubmodular Optimization with Delayed Feedback in the Bandit Setting
Sifan Yang, Yuanyu Wan, Lijun Zhang
p-Mean Regret for Stochastic Bandits
Anand Krishna, Philips George John, Adarsh Barik et al.
Prediction with expert advice under additive noise
Alankrita Bhatt, Victoria Kostina
Regret Bounds for Adversarial Contextual Bandits with General Function Approximation and Delayed Feedback
Orin Levy, Liad Erez, Alon Peled-Cohen et al.
Regret Bounds for Episodic Risk-Sensitive Linear Quadratic Regulator
Wenhao Xu, Xuefeng Gao, Xuedong He
Revisiting Projection-Free Online Learning with Time-Varying Constraints
Yibo Wang, Yuanyu Wan, Lijun Zhang
Robust Satisficing Gaussian Process Bandits Under Adversarial Attacks
Artun Saday, Yaşar Cahit Yıldırım, Cem Tekin
Statistical Parity with Exponential Weights
Stephen Pasteris, Chris Hicks, Vasilios Mavroudis
Thompson Sampling in Function Spaces via Neural Operators
Rafael Oliveira, Xuesong Wang, Kian Ming Chai et al.
Tightening Regret Lower and Upper Bounds in Restless Rising Bandits
Cristiano Migali, Marco Mussi, Gianmarco Genalti et al.
Uniform Wrappers: Bridging Concave to Quadratizable Functions in Online Optimization
Mohammad Pedramfar, Christopher Quinn, Vaneet Aggarwal
$\mathtt{VITS}$: Variational Inference Thompson Sampling for contextual bandits
Pierre Clavier, Tom Huix, Alain Oliviero Durmus
Combinatorial Multivariant Multi-Armed Bandits with Applications to Episodic Reinforcement Learning and Beyond
Xutong Liu, Siwei Wang, Jinhang Zuo et al.
Leveraging (Biased) Information: Multi-armed Bandits with Offline Data
Wang Chi Cheung, Lixing Lyu
Reinforcement Learning and Regret Bounds for Admission Control
Lucas Weber, Ana Busic, Jiamin Zhu