"aggregate bandit feedback" Papers
2 papers found
Adapting to Stochastic and Adversarial Losses in Episodic MDPs with Aggregate Bandit Feedback
Shinji Ito, Kevin Jamieson, Haipeng Luo et al.
NeurIPS 2025, arXiv:2510.17103
2 citations
Near-Optimal Regret in Linear MDPs with Aggregate Bandit Feedback
Asaf Cassel, Haipeng Luo, Aviv Rosenberg et al.
ICML 2024, arXiv:2405.07637
5 citations