"proximal policy optimization" Papers

12 papers found

Ada-K Routing: Boosting the Efficiency of MoE-based LLMs

Zijia Zhao, Longteng Guo, Jie Cheng et al.

ICLR 2025 · arXiv:2410.10456 · 8 citations

A Snapshot of Influence: A Local Data Attribution Framework for Online Reinforcement Learning

Yuzheng Hu, Fan Wu, Haotian Ye et al.

NeurIPS 2025 (oral) · arXiv:2505.19281 · 3 citations

As Simple as Fine-tuning: LLM Alignment via Bidirectional Negative Feedback Loss

Xin Mao, Huimin Xu, Feng-Lin Li et al.

ICLR 2025 · arXiv:2410.04834 · 3 citations

AutoEdit: Automatic Hyperparameter Tuning for Image Editing

Chau Pham, Quan Dao, Mahesh Bhosale et al.

NeurIPS 2025 · arXiv:2509.15031 · 1 citation

Colored Noise in PPO: Improved Exploration and Performance through Correlated Action Sampling

Jakob Hollenstein, Georg Martius, Justus Piater

AAAI 2024 · arXiv:2312.11091 · 8 citations

Exploiting Semantic Reconstruction to Mitigate Hallucinations in Vision-Language Models

Minchan Kim, Minyeong Kim, Junik Bae et al.

ECCV 2024 · arXiv:2403.16167 · 10 citations

Graph-Based Prediction and Planning Policy Network (GP3Net) for Scalable Self-Driving in Dynamic Environments Using Deep Reinforcement Learning

Jayabrata Chowdhury, Venkataramanan Shivaraman, Suresh Sundaram et al.

AAAI 2024 · arXiv:2312.05784 · 10 citations

Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study

Shusheng Xu, Wei Fu, Jiaxuan Gao et al.

ICML 2024 · arXiv:2404.10719 · 253 citations

Learning Diverse Risk Preferences in Population-Based Self-Play

Yuhua Jiang, Qihan Liu, Xiaoteng Ma et al.

AAAI 2024 · arXiv:2305.11476 · 8 citations

Multimodal Label Relevance Ranking via Reinforcement Learning

Taian Guo, Taolin Zhang, Haoqian Wu et al.

ECCV 2024 · arXiv:2407.13221 · 1 citation

Reflective Policy Optimization

Yaozhong Gan, Yan Renye, Zhe Wu et al.

ICML 2024 · arXiv:2406.03678 · 2 citations

ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models

Ziniu Li, Tian Xu, Yushun Zhang et al.

ICML 2024 · arXiv:2310.10505 · 147 citations