"reinforcement learning" Papers
39 papers found
BindGPT: A Scalable Framework for 3D Molecular Design via Language Modeling and Reinforcement Learning
Artem Zholus, Maksim Kuznetsov, Roman Schutski et al.
Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining
Rosie Zhao, Alexandru Meterez, Sham M. Kakade et al.
Efficient Reinforcement Learning in Probabilistic Reward Machines
Xiaofeng Lin, Xuezhou Zhang
Efficient Reinforcement Learning Through Adaptively Pretrained Visual Encoder
Yuhan Zhang, Guoqing Ma, Guangfu Hao et al.
Enhancing Online Reinforcement Learning with Meta-Learned Objective from Offline Data
Shilong Deng, Zetao Zheng, Hongcai He et al.
Extragradient Preference Optimization (EGPO): Beyond Last-Iterate Convergence for Nash Learning from Human Feedback
Runlong Zhou, Maryam Fazel, Simon Shaolei Du
FFCG: Effective and Fast Family Column Generation for Solving Large-Scale Linear Program
Yi-Xiang Hu, Feng Wu, Shaoang Li et al.
GRSN: Gated Recurrent Spiking Neurons for POMDPs and MARL
Lang Qin, Ziming Wang, Runhao Jiang et al.
Intelligent OPC Engineer Assistant for Semiconductor Manufacturing
Guojin Chen, Haoyu Yang, Bei Yu et al.
Learning to Reason for Long-Form Story Generation
Alexander Gurung, Mirella Lapata
MALT: Improving Reasoning with Multi-Agent LLM Training
Sumeet Ramesh Motwani, Chandler Smith, Rocktim Jyoti Das et al.
Noise-Resilient Symbolic Regression with Dynamic Gating Reinforcement Learning
Chenglu Sun, Shuo Shen, Wenzhi Tao et al.
Off-Policy Corrected Reward Modeling for Reinforcement Learning from Human Feedback
Johannes Ackermann, Takashi Ishida, Masashi Sugiyama
On Shallow Planning Under Partial Observability
Randy Lefebvre, Audrey Durand
REM: A Scalable Reinforced Multi-Expert Framework for Multiplex Influence Maximization
Huyen Nguyen, Hieu Dam, Nguyen Hoang Khoi Do et al.
RRO: LLM Agent Optimization Through Rising Reward Trajectories
Zilong Wang, Jingfeng Yang, Sreyashi Nag et al.
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Bowen Jin, Hansi Zeng, Zhenrui Yue et al.
SORREL: Suboptimal-Demonstration-Guided Reinforcement Learning for Learning to Branch
Shengyu Feng, Yiming Yang
Teaching Models to Improve on Tape
Liat Bezalel, Eyal Orgad, Amir Globerson
Tulu 3: Pushing Frontiers in Open Language Model Post-Training
Nathan Lambert, Jacob Morrison, Valentina Pyatkin et al.
Universal Post-Processing Networks for Joint Optimization of Modules in Task-Oriented Dialogue Systems
Atsumoto Ohashi, Ryuichiro Higashinaka
Weak-for-Strong: Training Weak Meta-Agent to Harness Strong Executors
Fan Nie, Lan Feng, Haotian Ye et al.
Building Minimal and Reusable Causal State Abstractions for Reinforcement Learning
Zizhao Wang, Caroline Wang, Xuesu Xiao et al.
ConsistentEE: A Consistent and Hardness-Guided Early Exiting Method for Accelerating Language Models Inference
Ziqian Zeng, Yihuai Hong, Hongliang Dai et al.
DGPO: Discovering Multiple Strategies with Diversity-Guided Policy Optimization
Wenze Chen, Shiyu Huang, Yuan Chiang et al.
DiffAIL: Diffusion Adversarial Imitation Learning
Bingzheng Wang, Guoqiang Wu, Teng Pang et al.
Discerning Temporal Difference Learning
Jianfei Ma
Dynamic Knowledge Injection for AIXI Agents
Samuel Yang-Zhao, Kee Siong Ng, Marcus Hutter
Episodic Return Decomposition by Difference of Implicitly Assigned Sub-trajectory Reward
Haoxin Lin, Hongqiu Wu, Jiaji Zhang et al.
Explore 3D Dance Generation via Reward Model from Automatically-Ranked Demonstrations
Zilin Wang, Haolin Zhuang, Lu Li et al.
Learning Diverse Risk Preferences in Population-Based Self-Play
Yuhua Jiang, Qihan Liu, Xiaoteng Ma et al.
Learning Uncertainty-Aware Temporally-Extended Actions
Joongkyu Lee, Seung Joon Park, Yunhao Tang et al.
OVD-Explorer: Optimism Should Not Be the Sole Pursuit of Exploration in Noisy Environments
Jinyi Liu, Zhi Wang, Yan Zheng et al.
Parameterized Projected Bellman Operator
Théo Vincent, Alberto Maria Metelli, Boris Belousov et al.
Prompt to Transfer: Sim-to-Real Transfer for Traffic Signal Control with Prompt Learning
Longchao Da, Minquan Gao, Hua Wei et al.
Rating-Based Reinforcement Learning
Devin White, Mingkang Wu, Ellen Novoseller et al.
RewriteLM: An Instruction-Tuned Large Language Model for Text Rewriting
Lei Shu, Liangchen Luo, Jayakumar Hoskere et al.
Sample Efficient Reinforcement Learning with Partial Dynamics Knowledge
Meshal Alharbi, Mardavij Roozbehani, Munther Dahleh
UNEX-RL: Reinforcing Long-Term Rewards in Multi-Stage Recommender Systems with UNidirectional EXecution
Gengrui Zhang, Xiaoshuang Chen, Yao Wang et al.