Poster "exploration-exploitation tradeoff" Papers
9 papers found
Breaking the $\log(1/\Delta_2)$ Barrier: Better Batched Best Arm Identification with Adaptive Grids
Tianyuan Jin, Qin Zhang, Dongruo Zhou
ICLR 2025
Feel-Good Thompson Sampling for Contextual Bandits: a Markov Chain Monte Carlo Showdown
Emile Anand, Sarah Liaw
NEURIPS 2025 arXiv:2507.15290
3 citations
LASeR: Towards Diversified and Generalizable Robot Design with Large Language Models
Junru Song, Yang Yang, Huan Xiao et al.
ICLR 2025
7 citations
Learning to price with resource constraints: from full information to machine-learned prices
Ruicheng Ao, Jiashuo Jiang, David Simchi-Levi
NEURIPS 2025 arXiv:2501.14155
3 citations
Online Feedback Efficient Active Target Discovery in Partially Observable Environments
Anindya Sarkar, Binglin Ji, Yevgeniy Vorobeychik
NEURIPS 2025 arXiv:2505.06535
1 citation
PlanU: Large Language Model Reasoning through Planning under Uncertainty
Ziwei Deng, Mian Deng, Chenjing Liang et al.
NEURIPS 2025 arXiv:2510.18442
Entropy-Reinforced Planning with Large Language Models for Drug Discovery
Xuefeng Liu, Chih-chan Tien, Peng Ding et al.
ICML 2024 arXiv:2406.07025
7 citations
Optimal Batched Linear Bandits
Xuanfei Ren, Tianyuan Jin, Pan Xu
ICML 2024 arXiv:2406.04137
5 citations
Stochastic Bandits with ReLU Neural Networks
Kan Xu, Hamsa Bastani, Surbhi Goel et al.
ICML 2024 arXiv:2405.07331
1 citation