"off-policy learning" Papers
9 papers found
Bootstrap Off-policy with World Model
Guojian Zhan, Likun Wang, Xiangteng Zhang et al.
NEURIPS 2025, arXiv:2511.00423, 1 citation
MultiScale Contextual Bandits for Long Term Objectives
Richa Rastogi, Yuta Saito, Thorsten Joachims
NEURIPS 2025, arXiv:2503.17674
Revisiting a Design Choice in Gradient Temporal Difference Learning
Xiaochi Qian, Shangtong Zhang
ICLR 2025 (oral), arXiv:2308.01170, 6 citations
SEMDICE: Off-policy State Entropy Maximization via Stationary Distribution Correction Estimation
Jongmin Lee, Meiqi Sun, Pieter Abbeel
ICLR 2025, arXiv:2512.10042
ShiQ: Bringing back Bellman to LLMs
Pierre Clavier, Nathan Grinsztajn, Raphaël Avalos et al.
NEURIPS 2025, arXiv:2505.11081, 2 citations
Simplifying Deep Temporal Difference Learning
Matteo Gallici, Mattie Fellows, Benjamin Ellis et al.
ICLR 2025 (oral), arXiv:2407.04811, 56 citations
Turning Sand to Gold: Recycling Data to Bridge On-Policy and Off-Policy Learning via Causal Bound
Tal Fiskus, Uri Shaham
NEURIPS 2025, arXiv:2507.11269
Value Improved Actor Critic Algorithms
Yaniv Oren, Moritz Zanger, Pascal van der Vaart et al.
NEURIPS 2025, arXiv:2406.01423, 1 citation
Learning to Explore in POMDPs with Informational Rewards
Annie Xie, Logan M. Bhamidipaty, Evan Liu et al.
ICML 2024