Poster "proximal policy optimization" Papers
8 papers found
Ada-K Routing: Boosting the Efficiency of MoE-based LLMs
Zijia Zhao, Longteng Guo, Jie Cheng et al.
ICLR 2025 · arXiv:2410.10456 · 8 citations
As Simple as Fine-tuning: LLM Alignment via Bidirectional Negative Feedback Loss
Xin Mao, Huimin Xu, Feng-Lin Li et al.
ICLR 2025 · arXiv:2410.04834 · 3 citations
AutoEdit: Automatic Hyperparameter Tuning for Image Editing
Chau Pham, Quan Dao, Mahesh Bhosale et al.
NeurIPS 2025 · arXiv:2509.15031 · 1 citation
Exploiting Semantic Reconstruction to Mitigate Hallucinations in Vision-Language Models
Minchan Kim, Minyeong Kim, Junik Bae et al.
ECCV 2024 · arXiv:2403.16167 · 10 citations
Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study
Shusheng Xu, Wei Fu, Jiaxuan Gao et al.
ICML 2024 · arXiv:2404.10719 · 253 citations
Multimodal Label Relevance Ranking via Reinforcement Learning
Taian Guo, Taolin Zhang, Haoqian Wu et al.
ECCV 2024 · arXiv:2407.13221 · 1 citation
Reflective Policy Optimization
Yaozhong Gan, Renye Yan, Zhe Wu et al.
ICML 2024 · arXiv:2406.03678 · 2 citations
ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models
Ziniu Li, Tian Xu, Yushun Zhang et al.
ICML 2024 · arXiv:2310.10505 · 147 citations