"regret bounds" Papers

26 papers found

A Bayesian Approach to Contextual Dynamic Pricing using the Proportional Hazards Model with Discrete Price Data

Dongguen Kim, Young-Geun Choi, Minwoo Chae

NEURIPS 2025

Agnostic Continuous-Time Online Learning

Pramith Devulapalli, Changlong Wu, Ananth Grama et al.

NEURIPS 2025

Bridging Distributional and Risk-sensitive Reinforcement Learning with Provable Regret Bounds

Hao Liang, Zhiquan Luo

NEURIPS 2025 | arXiv:2210.14051 | 18 citations

Contextual Thompson Sampling via Generation of Missing Data

Kelly W Zhang, Tianhui Cai, Hongseok Namkoong et al.

NEURIPS 2025 | arXiv:2502.07064 | 2 citations

Delay as Payoff in MAB

Ofir Schlisselberg, Ido Cohen, Tal Lancewicki et al.

AAAI 2025 | arXiv:2408.15158 | 4 citations

Generator-Mediated Bandits: Thompson Sampling for GenAI-Powered Adaptive Interventions

Marc Brooks, Gabriel Durham, Kihyuk Hong et al.

NEURIPS 2025 | arXiv:2505.16311 | 1 citation

Improved Regret Bounds for Linear Bandits with Heavy-Tailed Rewards

Artin Tajdini, Jonathan Scarlett, Kevin Jamieson

NEURIPS 2025 | arXiv:2506.04775 | 2 citations

Improved Regret Bounds for Online Fair Division with Bandit Learning

Benjamin Schiffer, Shirley Zhang

AAAI 2025 | arXiv:2501.07022 | 5 citations

Learning Across the Gap: Hybrid Multi-armed Bandits with Heterogeneous Offline and Online Data

Qijia He, Minghan Wang, Xutong Liu et al.

NEURIPS 2025

Mixture of Online and Offline Experts for Non-Stationary Time Series

Zhilin Zhao, Longbing Cao, Yuanyu Wan

AAAI 2025 | arXiv:2202.05996

No-Regret Thompson Sampling for Finite-Horizon Markov Decision Processes with Gaussian Processes

Jasmine Bayrooti, Sattar Vakili, Amanda Prorok et al.

NEURIPS 2025 (oral) | arXiv:2510.20725

Online Nonsubmodular Optimization with Delayed Feedback in the Bandit Setting

Sifan Yang, Yuanyu Wan, Lijun Zhang

AAAI 2025 | arXiv:2508.00523 | 1 citation

p-Mean Regret for Stochastic Bandits

Anand Krishna, Philips George John, Adarsh Barik et al.

AAAI 2025 | arXiv:2412.10751 | 5 citations

Prediction with expert advice under additive noise

Alankrita Bhatt, Victoria Kostina

NEURIPS 2025

Regret Bounds for Adversarial Contextual Bandits with General Function Approximation and Delayed Feedback

Orin Levy, Liad Erez, Alon Peled-Cohen et al.

NEURIPS 2025 (spotlight) | arXiv:2510.09127 | 2 citations

Regret Bounds for Episodic Risk-Sensitive Linear Quadratic Regulator

Wenhao Xu, Xuefeng Gao, Xuedong He

ICLR 2025 | arXiv:2406.05366 | 1 citation

Revisiting Projection-Free Online Learning with Time-Varying Constraints

Yibo Wang, Yuanyu Wan, Lijun Zhang

AAAI 2025 | arXiv:2501.16046 | 5 citations

Robust Satisficing Gaussian Process Bandits Under Adversarial Attacks

Artun Saday, Yaşar Cahit Yıldırım, Cem Tekin

NEURIPS 2025 | arXiv:2506.01625

Statistical Parity with Exponential Weights

Stephen Pasteris, Chris Hicks, Vasilios Mavroudis

NEURIPS 2025

Thompson Sampling in Function Spaces via Neural Operators

Rafael Oliveira, Xuesong Wang, Kian Ming Chai et al.

NEURIPS 2025 | arXiv:2506.21894

Tightening Regret Lower and Upper Bounds in Restless Rising Bandits

Cristiano Migali, Marco Mussi, Gianmarco Genalti et al.

NEURIPS 2025

Uniform Wrappers: Bridging Concave to Quadratizable Functions in Online Optimization

Mohammad Pedramfar, Christopher Quinn, Vaneet Aggarwal

NEURIPS 2025

$\mathtt{VITS}$: Variational Inference Thompson Sampling for contextual bandits

Pierre Clavier, Tom Huix, Alain Oliviero Durmus

ICML 2024

Combinatorial Multivariant Multi-Armed Bandits with Applications to Episodic Reinforcement Learning and Beyond

Xutong Liu, Siwei Wang, Jinhang Zuo et al.

ICML 2024 | arXiv:2406.01386 | 8 citations

Leveraging (Biased) Information: Multi-armed Bandits with Offline Data

Wang Chi Cheung, Lixing Lyu

ICML 2024 (spotlight)

Reinforcement Learning and Regret Bounds for Admission Control

Lucas Weber, Ana Busic, Jiamin ZHU

ICML 2024 | arXiv:2406.04766 | 1 citation