"reinforcement learning" Papers
300 papers found • Page 4 of 6
Conference
SAM-R1: Leveraging SAM for Reward Feedback in Multimodal Segmentation via Reinforcement Learning
Jiaqi Huang, Zunnan Xu, Jun Zhou et al.
Scaling RL to Long Videos
Yukang Chen, Wei Huang, Baifeng Shi et al.
Schedule On the Fly: Diffusion Time Prediction for Faster and Better Image Generation
Zilyu Ye, Zhiyang Chen, Tiancheng Li et al.
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Bowen Jin, Hansi Zeng, Zhenrui Yue et al.
Segment Policy Optimization: Effective Segment-Level Credit Assignment in RL for Large Language Models
Yiran Guo, Lijie Xu, Jie Liu et al.
Self-Challenging Language Model Agents
Yifei Zhou, Sergey Levine, Jason Weston et al.
Selftok-Zero: Reinforcement Learning for Visual Generation via Discrete and Autoregressive Visual Tokens
Bohan Wang, Mingze Zhou, Zhongqi Yue et al.
Semantic Temporal Abstraction via Vision-Language Model Guidance for Efficient Reinforcement Learning
Tian-Shuo Liu, Xu-Hui Liu, Ruifeng Chen et al.
Sequential Attention-based Sampling for Histopathological Analysis
Tarun Gogisetty, Naman Malpani, Gugan Chandrashekhar Mallika Thoppe et al.
SeRL: Self-play Reinforcement Learning for Large Language Models with Limited Data
Wenkai Fang, Shunyu Liu, Yang Zhou et al.
Shift Before You Learn: Enabling Low-Rank Representations in Reinforcement Learning
Bastien Dubail, Stefan Stojanovic, Alexandre Proutiere
Solver-Informed RL: Grounding Large Language Models for Authentic Optimization Modeling
Yitian Chen, Jingfan Xia, Siyu Shao et al.
SORREL: Suboptimal-Demonstration-Guided Reinforcement Learning for Learning to Branch
Shengyu Feng, Yiming Yang
SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning
Jiaqi Chen, Bang Zhang, Ruotian Ma et al.
SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning
Peixian Ma, Xialie Zhuang, Chengjin Xu et al.
Stabilizing Reinforcement Learning in Differentiable Multiphysics Simulation
Eliot Xing, Vernon Luk, Jean Oh
Stable Hadamard Memory: Revitalizing Memory-Augmented Agents for Reinforcement Learning
Hung Le, Dung Nguyen, Kien Do et al.
Swift Hydra: Self-Reinforcing Generative Framework for Anomaly Detection with Multiple Mamba Models
Hoang Khoi Nguyen Do, Truc Nguyen, Malik Hassanaly et al.
SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond
Junteng Liu, Yuanxiang Fan, Jiang Zhuo et al.
T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT
Dongzhi JIANG, Ziyu Guo, Renrui Zhang et al.
Teaching Models to Improve on Tape
Liat Bezalel, Eyal Orgad, Amir Globerson
Temporal Difference Learning: Why It Can Be Fast and How It Will Be Faster
Patrick Schnell, Luca Guastoni, Nils Thuerey
The ODE Method for Stochastic Approximation and Reinforcement Learning with Markovian Noise
Shuze Daniel Liu, Shuhang Chen, Shangtong Zhang
The Promise of RL for Autoregressive Image Editing
Saba Ahmadi, Rabiul Awal, Ankur Sikarwar et al.
The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning
Xinyu Zhu, Mengzhou Xia, Zhepei Wei et al.
Thinker: Learning to Think Fast and Slow
Stephen Chung, Wenyu Du, Jie Fu
Time-R1: Post-Training Large Vision Language Model for Temporal Video Grounding
Ye Wang, Ziheng Wang, Boshen Xu et al.
Training-Free Generation of Temporally Consistent Rewards from VLMs
Yinuo Zhao, Jiale Yuan, Zhiyuan Xu et al.
Training Language Models to Self-Correct via Reinforcement Learning
Aviral Kumar, Vincent Zhuang, Rishabh Agarwal et al.
Trust, But Verify: A Self-Verification Approach to Reinforcement Learning with Verifiable Rewards
Xiaoyuan Liu, Tian Liang, Zhiwei He et al.
Tulu 3: Pushing Frontiers in Open Language Model Post-Training
Nathan Lambert, Jacob Morrison, Valentina Pyatkin et al.
Turning Sand to Gold: Recycling Data to Bridge On-Policy and Off-Policy Learning via Causal Bound
Tal Fiskus, Uri Shaham
Unified Reinforcement and Imitation Learning for Vision-Language Models
Byung-Kwan Lee, Ryo Hachiuma, Yong Man Ro et al.
UniGraspTransformer: Simplified Policy Distillation for Scalable Dexterous Robotic Grasping
Wenbo Wang, Fangyun Wei, Lei Zhou et al.
Uni-RL: Unifying Online and Offline RL via Implicit Value Regularization
Haoran Xu, Liyuan Mao, Hui Jin et al.
Universal Post-Processing Networks for Joint Optimization of Modules in Task-Oriented Dialogue Systems
Atsumoto Ohashi, Ryuichiro Higashinaka
Unlocking Multimodal Mathematical Reasoning via Process Reward Model
Ruilin Luo, Zhuofan Zheng, Lei Wang et al.
VideoChat-R1.5: Visual Test-Time Scaling to Reinforce Multimodal Reasoning by Iterative Perception
Ziang Yan, Yinan He, Xinhao Li et al.
VIKI‑R: Coordinating Embodied Multi-Agent Cooperation via Reinforcement Learning
Li Kang, Xiufeng Song, Heng Zhou et al.
Vinci: Deep Thinking in Text-to-Image Generation using Unified Model with Reinforcement Learning
wang lin, Wentao Hu, Liyu Jia et al.
VinePPO: Refining Credit Assignment in RL Training of LLMs
Amirhossein Kazemnejad, Milad Aghajohari, Eva Portelance et al.
VisRL: Intention-Driven Visual Perception via Reinforced Reasoning
Zhangquan Chen, Xufang Luo, Dongsheng Li
ViUniT: Visual Unit Tests for More Robust Visual Programming
Artemis Panagopoulou, Honglu Zhou, silvio savarese et al.
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning
Haozhe Wang, Chao Qu, Zuming Huang et al.
VTDexManip: A Dataset and Benchmark for Visual-tactile Pretraining and Dexterous Manipulation with Reinforcement Learning
Qingtao Liu, Yu Cui, Zhengnan Sun et al.
Weak-for-Strong: Training Weak Meta-Agent to Harness Strong Executors
Fan Nie, Lan Feng, Haotian Ye et al.
WebDancer: Towards Autonomous Information Seeking Agency
Jialong Wu, Baixuan Li, Runnan Fang et al.
WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct
Haipeng Luo, Qingfeng Sun, Can Xu et al.
Accelerated Policy Gradient: On the Convergence Rates of the Nesterov Momentum for Reinforcement Learning
Yen-Ju Chen, Nai-Chieh Huang, Ching-pei Lee et al.
Activation-Descent Regularization for Input Optimization of ReLU Networks
Hongzhan Yu, Sicun Gao