"reinforcement learning optimization" Papers
7 papers found
DistillDrive: End-to-End Multi-Mode Autonomous Driving Distillation by Isomorphic Hetero-Source Planning Model
Rui Yu, Xianghang Zhang, Runkai Zhao et al.
ICCV 2025, arXiv:2508.05402, 4 citations
Generative RLHF-V: Learning Principles from Multi-modal Human Preference
Jiayi Zhou, Jiaming Ji, Boyuan Chen et al.
NeurIPS 2025, arXiv:2505.18531, 7 citations
HM3: Hierarchical Multi-Objective Model Merging for Pretrained Models
Yu Zhou, Xingyu Wu, Jibin Wu et al.
NeurIPS 2025 (spotlight), arXiv:2409.18893, 7 citations
Self-Verifying Reflection Helps Transformers with CoT Reasoning
Zhongwei Yu, Wannian Xia, Xue Yan et al.
NeurIPS 2025, arXiv:2510.12157, 2 citations
The Overthinker's DIET: Cutting Token Calories with DIfficulty-AwarE Training
Weize Chen, Jiarui Yuan, Jin Tailin et al.
NeurIPS 2025, arXiv:2505.19217, 5 citations
Think Only When You Need with Large Hybrid-Reasoning Models
Lingjie Jiang, Xun Wu, Shaohan Huang et al.
NeurIPS 2025, arXiv:2505.14631, 40 citations
ED-Copilot: Reduce Emergency Department Wait Time with Language Model Diagnostic Assistance
Liwen Sun, Abhineet Agarwal, Aaron Kornblith et al.
ICML 2024, arXiv:2402.13448, 6 citations