Spotlight "reinforcement learning" Papers

22 papers found

ALINE: Joint Amortization for Bayesian Inference and Active Data Acquisition

Daolang Huang, Xinyi Wen, Ayush Bharti et al.

NeurIPS 2025 · spotlight · arXiv:2506.07259
2 citations

AlphaZero Neural Scaling and Zipf's Law: a Tale of Board Games and Power Laws

Oren Neumann, Claudius Gros

NeurIPS 2025 · spotlight · arXiv:2412.11979
9 citations

Checklists Are Better Than Reward Models For Aligning Language Models

Vijay Viswanathan, Yanchao Sun, Xiang Kong et al.

NeurIPS 2025 · spotlight · arXiv:2507.18624
32 citations

Co-Reinforcement Learning for Unified Multimodal Understanding and Generation

Jingjing Jiang, Chongjie Si, Jun Luo et al.

NeurIPS 2025 · spotlight · arXiv:2505.17534
5 citations

CURE: Co-Evolving Coders and Unit Testers via Reinforcement Learning

Yinjie Wang, Ling Yang, Ye Tian et al.

NeurIPS 2025 · spotlight

d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning

Siyan Zhao, Devaansh Gupta, Qinqing Zheng et al.

NeurIPS 2025 · spotlight · arXiv:2504.12216
87 citations

EDELINE: Enhancing Memory in Diffusion-based World Models via Linear-Time Sequence Modeling

Jia-Hua Lee, Bor-Jiun Lin, Wei-Fang Sun et al.

NeurIPS 2025 · spotlight · arXiv:2502.00466
2 citations

LLM-Explorer: A Plug-in Reinforcement Learning Policy Exploration Enhancement Driven by Large Language Models

Qianyue Hao, Yiwen Song, Qingmin Liao et al.

NeurIPS 2025 · spotlight · arXiv:2505.15293
3 citations

Memo: Training Memory-Efficient Embodied Agents with Reinforcement Learning

Gunshi Gupta, Karmesh Yadav, Zsolt Kira et al.

NeurIPS 2025 · spotlight · arXiv:2510.19732

Pass@K Policy Optimization: Solving Harder Reinforcement Learning Problems

Christian Walder, Deep Tejas Karkhanis

NeurIPS 2025 · spotlight · arXiv:2505.15201
28 citations

Reinforcement Learning for Out-of-Distribution Reasoning in LLMs: An Empirical Study on Diagnosis-Related Group Coding

Hanyin Wang, Zhenbang Wu, Gururaj Kolar et al.

NeurIPS 2025 · spotlight · arXiv:2505.21908
5 citations

Reinforcement Learning with Imperfect Transition Predictions: A Bellman-Jensen Approach

Chenbei Lu, Zaiwei Chen, Tongxin Li et al.

NeurIPS 2025 · spotlight · arXiv:2510.18687
1 citation

Reverse Engineering Human Preferences with Reinforcement Learning

Lisa Alazraki, Yi-Chern Tan, Jon Ander Campos et al.

NeurIPS 2025 · spotlight · arXiv:2505.15795

Shift Before You Learn: Enabling Low-Rank Representations in Reinforcement Learning

Bastien Dubail, Stefan Stojanovic, Alexandre Proutiere

NeurIPS 2025 · spotlight · arXiv:2509.05193

VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning

Haozhe Wang, Chao Qu, Zuming Huang et al.

NeurIPS 2025 · spotlight · arXiv:2504.08837
183 citations

Code as Reward: Empowering Reinforcement Learning with VLMs

David Venuto, Mohammad Sami Nur Islam, Martin Klissarov et al.

ICML 2024 · spotlight · arXiv:2402.04764
27 citations

DISCRET: Synthesizing Faithful Explanations For Treatment Effect Estimation

Yinjun Wu, Mayank Keoliya, Kan Chen et al.

ICML 2024 · spotlight · arXiv:2406.00611
3 citations

EfficientZero V2: Mastering Discrete and Continuous Control with Limited Data

Shengjie Wang, Shaohuai Liu, Weirui Ye et al.

ICML 2024 · spotlight · arXiv:2403.00564
31 citations

Fine-tuning Reinforcement Learning Models is Secretly a Forgetting Mitigation Problem

Maciej Wołczyk, Bartłomiej Cupiał, Mateusz Ostaszewski et al.

ICML 2024 · spotlight · arXiv:2402.02868
26 citations

Mixtures of Experts Unlock Parameter Scaling for Deep RL

Johan Obando Ceron, Ghada Sokar, Timon Willi et al.

ICML 2024 · spotlight · arXiv:2402.08609
64 citations

RICE: Breaking Through the Training Bottlenecks of Reinforcement Learning with Explanation

Zelei Cheng, Xian Wu, Jiahao Yu et al.

ICML 2024 · spotlight · arXiv:2405.03064
10 citations

Robust Optimization in Protein Fitness Landscapes Using Reinforcement Learning in Latent Space

Minji Lee, Luiz Felipe Vecchietti, Hyunkyu Jung et al.

ICML 2024 · spotlight · arXiv:2405.18986
17 citations