"reinforcement learning" Papers
Off-Policy Corrected Reward Modeling for Reinforcement Learning from Human Feedback
Johannes Ackermann, Takashi Ishida, Masashi Sugiyama
Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration
Hao Zhong, Muzhi Zhu, Zongze Du et al.
Online Estimation and Inference for Robust Policy Evaluation in Reinforcement Learning
Weidong Liu, Jiyuan Tu, Xi Chen et al.
Online Reinforcement Learning in Non-Stationary Context-Driven Environments
Pouya Hamadanian, Arash Nasr-Esfahany, Malte Schwarzkopf et al.
Online-to-Offline RL for Agent Alignment
Xu Liu, Haobo Fu, Stefano V. Albrecht et al.
On Shallow Planning Under Partial Observability
Randy Lefebvre, Audrey Durand
On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback
Marcus Williams, Micah Carroll, Adhyyan Narang et al.
On the Convergence of Projected Policy Gradient for Any Constant Step Sizes
Jiacai Liu, Wenye Li, Dachao Lin et al.
On the Sample Complexity of Differentially Private Policy Optimization
Yi He, Xingyu Zhou
Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model
Jingcheng Hu, Yinmin Zhang, Qi Han et al.
OpenVLThinker: Complex Vision-Language Reasoning via Iterative SFT-RL Cycles
Yihe Deng, Hritik Bansal, Fan Yin et al.
Open-World Drone Active Tracking with Goal-Centered Rewards
Haowei Sun, Jinwu Hu, Zhirui Zhang et al.
Optimal Dynamic Regret by Transformers for Non-Stationary Reinforcement Learning
Baiyuan Chen, Shinji Ito, Masaaki Imaizumi
OptionZero: Planning with Learned Options
Po-Wei Huang, Pei-Chiun Peng, Hung Guei et al.
OrbitZoo: Real Orbital Systems Challenges for Reinforcement Learning
Alexandre Oliveira, Katarina Dyreby, Francisco Caldas et al.
Parameter Efficient Fine-tuning via Explained Variance Adaptation
Fabian Paischer, Lukas Hauzenberger, Thomas Schmied et al.
Pareto Prompt Optimization
Guang Zhao, Byung-Jun Yoon, Gilchan Park et al.
Pass@K Policy Optimization: Solving Harder Reinforcement Learning Problems
Christian Walder, Deep Tejas Karkhanis
Periodic Skill Discovery
Jonghae Park, Daesol Cho, Jusuk Lee et al.
Personalized Exercise Recommendation with Semantically-Grounded Knowledge Tracing
Yilmazcan Ozyurt, Tunaberk Almaci, Stefan Feuerriegel et al.
Policy Gradient with Kernel Quadrature
Tetsuro Morimura, Satoshi Hayakawa
Preference Distillation via Value based Reinforcement Learning
Minchan Kwon, Junwon Ko, Kangil Kim et al.
Progress Reward Model for Reinforcement Learning via Large Language Models
Xiuhui Zhang, Ning Gao, Xingyu Jiang et al.
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
Mingjie Liu, Shizhe Diao, Ximing Lu et al.
Proxy Target: Bridging the Gap Between Discrete Spiking Neural Networks and Continuous Control
Zijie Xu, Tong Bu, Zecheng Hao et al.
RAST: Reasoning Activation in LLMs via Small-model Transfer
Siru Ouyang, Xinyu Zhu, Zilin Xiao et al.
Raw2Drive: Reinforcement Learning with Aligned World Models for End-to-End Autonomous Driving (in CARLA v2)
Zhenjie Yang, Xiaosong Jia, Qifeng Li et al.
ReAgent-V: A Reward-Driven Multi-Agent Framework for Video Understanding
Yiyang Zhou, Yangfan He, Yaofeng Su et al.
Real-World Reinforcement Learning of Active Perception Behaviors
Edward Hu, Jie Wang, Xingfang Yuan et al.
Reasoning as an Adaptive Defense for Safety
Taeyoun Kim, Fahim Tajwar, Aditi Raghunathan et al.
Reducing the Probability of Undesirable Outputs in Language Models Using Probabilistic Inference
Stephen Zhao, Aidan Li, Rob Brekelmans et al.
Reinforced Active Learning for Large-Scale Virtual Screening with Learnable Policy Model
Yicong Chen, Jiahua Rao, Jiancong Xie et al.
Reinforcement Learning for Out-of-Distribution Reasoning in LLMs: An Empirical Study on Diagnosis-Related Group Coding
Hanyin Wang, Zhenbang Wu, Gururaj Kolar et al.
Reinforcement Learning from Imperfect Corrective Actions and Proxy Rewards
Zhaohui Jiang, Xuening Feng, Paul Weng et al.
Reinforcement Learning-Guided Data Selection via Redundancy Assessment
Suorong Yang, Peijia Li, Furao Shen et al.
Reinforcement learning with combinatorial actions for coupled restless bandits
Lily Xu, Bryan Wilder, Elias Khalil et al.
Reinforcement Learning with Imperfect Transition Predictions: A Bellman-Jensen Approach
Chenbei Lu, Zaiwei Chen, Tongxin Li et al.
Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models
Zemin Huang, Zhiyang Chen, Zijun Wang et al.
REM: A Scalable Reinforced Multi-Expert Framework for Multiplex Influence Maximization
Huyen Nguyen, Hieu Dam, Nguyen Hoang Khoi Do et al.
Rendering-Aware Reinforcement Learning for Vector Graphics Generation
Juan Rodriguez, Haotian Zhang, Abhay Puri et al.
ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning
Mingyang Chen, Linzhuang Sun, Tianpeng Li et al.
Retro-R1: LLM-based Agentic Retrosynthesis
Wei Liu, Jiangtao Feng, Hongli Yu et al.
Reverse Engineering Human Preferences with Reinforcement Learning
Lisa Alazraki, Yi-Chern Tan, Jon Ander Campos et al.
Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective
Jorge (Zhoujun) Cheng, Shibo Hao, Tianyang Liu et al.
REvolve: Reward Evolution with Large Language Models using Human Feedback
Rishi Hazra, Alkis Sygkounas, Andreas Persson et al.
RLGF: Reinforcement Learning with Geometric Feedback for Autonomous Driving Video Generation
Tianyi Yan, Wencheng Han, Xia Zhou et al.
Robot-R1: Reinforcement Learning for Enhanced Embodied Reasoning in Robotics
Dongyoung Kim, Huiwon Jang, Sumin Park et al.
Router-R1: Teaching LLMs Multi-Round Routing and Aggregation via Reinforcement Learning
Haozhen Zhang, Tao Feng, Jiaxuan You
RRO: LLM Agent Optimization Through Rising Reward Trajectories
Zilong Wang, Jingfeng Yang, Sreyashi Nag et al.
Safety Representations for Safer Policy Learning
Kaustubh Mani, Vincent Mai, Charlie Gauthier et al.