Poster "reinforcement learning" Papers
220 papers found • Page 3 of 5
Reinforcement Learning-Guided Data Selection via Redundancy Assessment
Suorong Yang, Peijia Li, Furao Shen et al.
Reinforcement learning with combinatorial actions for coupled restless bandits
Lily Xu, Bryan Wilder, Elias Khalil et al.
Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models
Zemin Huang, Zhiyang Chen, Zijun Wang et al.
Rendering-Aware Reinforcement Learning for Vector Graphics Generation
Juan Rodriguez, Haotian Zhang, Abhay Puri et al.
ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning
Mingyang Chen, Linzhuang Sun, Tianpeng Li et al.
Retro-R1: LLM-based Agentic Retrosynthesis
Wei Liu, Jiangtao Feng, Hongli Yu et al.
Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective
Jorge (Zhoujun) Cheng, Shibo Hao, Tianyang Liu et al.
REvolve: Reward Evolution with Large Language Models using Human Feedback
Rishi Hazra, Alkis Sygkounas, Andreas Persson et al.
RLGF: Reinforcement Learning with Geometric Feedback for Autonomous Driving Video Generation
Tianyi Yan, Wencheng Han, Xia Zhou et al.
Robot-R1: Reinforcement Learning for Enhanced Embodied Reasoning in Robotics
Dongyoung Kim, Huiwon Jang, Sumin Park et al.
Router-R1: Teaching LLMs Multi-Round Routing and Aggregation via Reinforcement Learning
Haozhen Zhang, Tao Feng, Jiaxuan You
Safety Representations for Safer Policy Learning
Kaustubh Mani, Vincent Mai, Charlie Gauthier et al.
SAM-R1: Leveraging SAM for Reward Feedback in Multimodal Segmentation via Reinforcement Learning
Jiaqi Huang, Zunnan Xu, Jun Zhou et al.
Scaling RL to Long Videos
Yukang Chen, Wei Huang, Baifeng Shi et al.
Schedule On the Fly: Diffusion Time Prediction for Faster and Better Image Generation
Zilyu Ye, Zhiyang Chen, Tiancheng Li et al.
Segment Policy Optimization: Effective Segment-Level Credit Assignment in RL for Large Language Models
Yiran Guo, Lijie Xu, Jie Liu et al.
Self-Challenging Language Model Agents
Yifei Zhou, Sergey Levine, Jason Weston et al.
Selftok-Zero: Reinforcement Learning for Visual Generation via Discrete and Autoregressive Visual Tokens
Bohan Wang, Mingze Zhou, Zhongqi Yue et al.
Sequential Attention-based Sampling for Histopathological Analysis
Tarun Gogisetty, Naman Malpani, Gugan Chandrashekhar Mallika Thoppe et al.
SeRL: Self-play Reinforcement Learning for Large Language Models with Limited Data
Wenkai Fang, Shunyu Liu, Yang Zhou et al.
Solver-Informed RL: Grounding Large Language Models for Authentic Optimization Modeling
Yitian Chen, Jingfan Xia, Siyu Shao et al.
SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning
Jiaqi Chen, Bang Zhang, Ruotian Ma et al.
SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning
Peixian Ma, Xialie Zhuang, Chengjin Xu et al.
Stabilizing Reinforcement Learning in Differentiable Multiphysics Simulation
Eliot Xing, Vernon Luk, Jean Oh
Stable Hadamard Memory: Revitalizing Memory-Augmented Agents for Reinforcement Learning
Hung Le, Dung Nguyen, Kien Do et al.
Swift Hydra: Self-Reinforcing Generative Framework for Anomaly Detection with Multiple Mamba Models
Hoang Khoi Nguyen Do, Truc Nguyen, Malik Hassanaly et al.
SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond
Junteng Liu, Yuanxiang Fan, Jiang Zhuo et al.
T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT
Dongzhi Jiang, Ziyu Guo, Renrui Zhang et al.
The Promise of RL for Autoregressive Image Editing
Saba Ahmadi, Rabiul Awal, Ankur Sikarwar et al.
The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning
Xinyu Zhu, Mengzhou Xia, Zhepei Wei et al.
Thinker: Learning to Think Fast and Slow
Stephen Chung, Wenyu Du, Jie Fu
Training-Free Generation of Temporally Consistent Rewards from VLMs
Yinuo Zhao, Jiale Yuan, Zhiyuan Xu et al.
Training Language Models to Self-Correct via Reinforcement Learning
Aviral Kumar, Vincent Zhuang, Rishabh Agarwal et al.
Trust, But Verify: A Self-Verification Approach to Reinforcement Learning with Verifiable Rewards
Xiaoyuan Liu, Tian Liang, Zhiwei He et al.
Turning Sand to Gold: Recycling Data to Bridge On-Policy and Off-Policy Learning via Causal Bound
Tal Fiskus, Uri Shaham
Unified Reinforcement and Imitation Learning for Vision-Language Models
Byung-Kwan Lee, Ryo Hachiuma, Yong Man Ro et al.
UniGraspTransformer: Simplified Policy Distillation for Scalable Dexterous Robotic Grasping
Wenbo Wang, Fangyun Wei, Lei Zhou et al.
Uni-RL: Unifying Online and Offline RL via Implicit Value Regularization
Haoran Xu, Liyuan Mao, Hui Jin et al.
Unlocking Multimodal Mathematical Reasoning via Process Reward Model
Ruilin Luo, Zhuofan Zheng, Lei Wang et al.
VIKI-R: Coordinating Embodied Multi-Agent Cooperation via Reinforcement Learning
Li Kang, Xiufeng Song, Heng Zhou et al.
Vinci: Deep Thinking in Text-to-Image Generation using Unified Model with Reinforcement Learning
Wang Lin, Wentao Hu, Liyu Jia et al.
VinePPO: Refining Credit Assignment in RL Training of LLMs
Amirhossein Kazemnejad, Milad Aghajohari, Eva Portelance et al.
VisRL: Intention-Driven Visual Perception via Reinforced Reasoning
Zhangquan Chen, Xufang Luo, Dongsheng Li
ViUniT: Visual Unit Tests for More Robust Visual Programming
Artemis Panagopoulou, Honglu Zhou, Silvio Savarese et al.
VTDexManip: A Dataset and Benchmark for Visual-tactile Pretraining and Dexterous Manipulation with Reinforcement Learning
Qingtao Liu, Yu Cui, Zhengnan Sun et al.
WebDancer: Towards Autonomous Information Seeking Agency
Jialong Wu, Baixuan Li, Runnan Fang et al.
WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct
Haipeng Luo, Qingfeng Sun, Can Xu et al.
Accelerated Policy Gradient: On the Convergence Rates of the Nesterov Momentum for Reinforcement Learning
Yen-Ju Chen, Nai-Chieh Huang, Ching-pei Lee et al.
Activation-Descent Regularization for Input Optimization of ReLU Networks
Hongzhan Yu, Sicun Gao
AdaNAT: Exploring Adaptive Policy for Token-Based Image Generation
Zanlin Ni, Yulin Wang, Renping Zhou et al.