Poster "policy optimization" Papers
66 papers found • Page 1 of 2
$q$-exponential family for policy optimization
Lingwei Zhu, Haseeb Shah, Han Wang et al.
Accelerating RL for LLM Reasoning with Optimal Advantage Regression
Kianté Brantley, Mingyu Chen, Zhaolin Gao et al.
A Differential and Pointwise Control Approach to Reinforcement Learning
Minh Nguyen, Chandrajit Bajaj
Analytic Energy-Guided Policy Optimization for Offline Reinforcement Learning
Jifeng Hu, Sili Huang, Zhejian Yang et al.
Computational Hardness of Reinforcement Learning with Partial $q^{\pi}$-Realizability
Shayan Karimi, Xiaoqi Tan
CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models
Zhihang Lin, Mingbao Lin, Yuan Xie et al.
DisCO: Reinforcing Large Reasoning Models with Discriminative Constrained Optimization
Gang Li, Ming Lin, Tomer Galanti et al.
DPAIL: Training Diffusion Policy for Adversarial Imitation Learning without Policy Optimization
Yunseon Choi, Minchan Jeong, Soobin Um et al.
EconGym: A Scalable AI Testbed with Diverse Economic Tasks
Qirui Mi, Qipeng Yang, Zijun Fan et al.
Efficient Policy Optimization in Robust Constrained MDPs with Iteration Complexity Guarantees
Sourav Ganguly, Kishan Panaganti, Arnob Ghosh et al.
Enhancing Safety in Reinforcement Learning with Human Feedback via Rectified Policy Optimization
Xiyue Peng, Hengquan Guo, Jiawei Zhang et al.
EvolvedGRPO: Unlocking Reasoning in LVLMs via Progressive Instruction Evolution
Zhebei Shen, Qifan Yu, Juncheng Li et al.
Fat-to-Thin Policy Optimization: Offline Reinforcement Learning with Sparse Policies
Lingwei Zhu, Han Wang, Yukie Nagai
How to Train Your LLM Web Agent: A Statistical Diagnosis
Dheeraj Vattikonda, Santhoshi Ravichandran, Emiliano Penaloza et al.
Intrinsic Benefits of Categorical Distributional Loss: Uncertainty-aware Regularized Exploration in Reinforcement Learning
Ke Sun, Yingnan Zhao, Enze Shi et al.
Learning on One Mode: Addressing Multi-modality in Offline Reinforcement Learning
Mianchu Wang, Yue Jin, Giovanni Montana
MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO
Yicheng Xiao, Lin Song, Yukang Chen et al.
MURKA: Multi-Reward Reinforcement Learning with Knowledge Alignment for Optimization Tasks
Wantong Xie, Yi-Xiang Hu, Jieyang Xu et al.
NaDRO: Leveraging Dual-Reward Strategies for LLMs Training on Noisy Data
Haolong Qian, Xianliang Yang, Ling Zhang et al.
Non-convex entropic mean-field optimization via Best Response flow
Razvan-Andrei Lascu, Mateusz Majka
Online Reinforcement Learning in Non-Stationary Context-Driven Environments
Pouya Hamadanian, Arash Nasr-Esfahany, Malte Schwarzkopf et al.
On the Convergence of Projected Policy Gradient for Any Constant Step Sizes
Jiacai Liu, Wenye Li, Dachao Lin et al.
On the Sample Complexity of Differentially Private Policy Optimization
Yi He, Xingyu Zhou
Optimal Strong Regret and Violation in Constrained MDPs via Policy Optimization
Francesco Emanuele Stradi, Matteo Castiglioni, Alberto Marchesi et al.
Progress Reward Model for Reinforcement Learning via Large Language Models
Xiuhui Zhang, Ning Gao, Xingyu Jiang et al.
Quantile Reward Policy Optimization: Alignment with Pointwise Regression and Exact Partition Functions
Simon Matrenok, Skander Moalla, Caglar Gulcehre
ReAgent-V: A Reward-Driven Multi-Agent Framework for Video Understanding
Yiyang Zhou, Yangfan He, Yaofeng Su et al.
ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs
Jiaru Zou, Ling Yang, Jingwen Gu et al.
ReDit: Reward Dithering for Improved LLM Policy Optimization
Chenxing Wei, Jiarui Yu, Ying He et al.
Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF
Zhaolin Gao, Wenhao Zhan, Jonathan Chang et al.
Reinforced Active Learning for Large-Scale Virtual Screening with Learnable Policy Model
Yicong Chen, Jiahua Rao, Jiancong Xie et al.
Reward Dimension Reduction for Scalable Multi-Objective Reinforcement Learning
Giseung Park, Youngchul Sung
RRM: Robust Reward Model Training Mitigates Reward Hacking
Tianqi Liu, Wei Xiong, Jie Ren et al.
SEEA-R1: Tree-Structured Reinforcement Fine-Tuning for Self-Evolving Embodied Agents
Wanxin Tian, Shijie Zhang, Kevin Zhang et al.
Segment Policy Optimization: Effective Segment-Level Credit Assignment in RL for Large Language Models
Yiran Guo, Lijie Xu, Jie Liu et al.
Sharp Analysis for KL-Regularized Contextual Bandits and RLHF
Heyang Zhao, Chenlu Ye, Quanquan Gu et al.
Stabilizing Reinforcement Learning in Differentiable Multiphysics Simulation
Eliot Xing, Vernon Luk, Jean Oh
Uncertainty and Influence aware Reward Model Refinement for Reinforcement Learning from Human Feedback
Zexu Sun, Yiju Guo, Yankai Lin et al.
Unlocking Multimodal Mathematical Reasoning via Process Reward Model
Ruilin Luo, Zhuofan Zheng, Lei Wang et al.
VinePPO: Refining Credit Assignment in RL Training of LLMs
Amirhossein Kazemnejad, Milad Aghajohari, Eva Portelance et al.
Visual-RFT: Visual Reinforcement Fine-Tuning
Ziyu Liu, Zeyi Sun, Yuhang Zang et al.
Zeroth-Order Policy Gradient for Reinforcement Learning from Human Feedback without Reward Inference
Qining Zhang, Lei Ying
Accelerated Policy Gradient: On the Convergence Rates of the Nesterov Momentum for Reinforcement Learning
Yen-Ju Chen, Nai-Chieh Huang, Ching-pei Lee et al.
Adaptive-Gradient Policy Optimization: Enhancing Policy Learning in Non-Smooth Differentiable Simulations
Feng Gao, Liangzhi Shi, Shenao Zhang et al.
Bayesian Design Principles for Offline-to-Online Reinforcement Learning
Hao Hu, Yiqin Yang, Jianing Ye et al.
Constrained Reinforcement Learning Under Model Mismatch
Zhongchang Sun, Sihong He, Fei Miao et al.
Dealing With Unbounded Gradients in Stochastic Saddle-point Optimization
Gergely Neu, Nneka Okolo
Degeneration-free Policy Optimization: RL Fine-Tuning for Language Models without Degeneration
Youngsoo Jang, Geon-Hyeong Kim, Byoungjip Kim et al.
EvoRainbow: Combining Improvements in Evolutionary Reinforcement Learning for Policy Search
Pengyi Li, Yan Zheng, Hongyao Tang et al.
Exploration-Driven Policy Optimization in RLHF: Theoretical Insights on Efficient Data Utilization
Yihan Du, Anna Winnicki, Gal Dalal et al.