"reinforcement fine-tuning" Papers
14 papers found
AceSearcher: Bootstrapping Reasoning and Search for LLMs via Reinforced Self-Play
Ran Xu, Yuchen Zhuang, Zihan Dong et al.
NEURIPS 2025 spotlight arXiv:2509.24193
5 citations
Angles Don’t Lie: Unlocking Training‑Efficient RL Through the Model’s Own Signals
Qinsi Wang, Jinghan Ke, Hancheng Ye et al.
NEURIPS 2025 spotlight
BlueSuffix: Reinforced Blue Teaming for Vision-Language Models Against Jailbreak Attacks
Yunhan Zhao, Xiang Zheng, Lin Luo et al.
ICLR 2025 arXiv:2410.20971
20 citations
CollabLLM: From Passive Responders to Active Collaborators
Shirley Wu, Michel Galley, Baolin Peng et al.
ICML 2025 oral arXiv:2502.00640
43 citations
EgoThinker: Unveiling Egocentric Reasoning with Spatio-Temporal CoT
Baoqi Pei, Yifei Huang, Jilan Xu et al.
NEURIPS 2025 oral arXiv:2510.23569
4 citations
Mesh-RFT: Enhancing Mesh Generation via Fine-grained Reinforcement Fine-Tuning
Jian Liu, Jing Xu, Song Guo et al.
NEURIPS 2025 spotlight arXiv:2505.16761
7 citations
RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics
Enshen Zhou, Jingkun An, Cheng Chi et al.
NEURIPS 2025 arXiv:2506.04308
58 citations
SEEA-R1: Tree-Structured Reinforcement Fine-Tuning for Self-Evolving Embodied Agents
Wanxin Tian, Shijie Zhang, Kevin Zhang et al.
NEURIPS 2025 arXiv:2506.21669
6 citations
TempSamp-R1: Effective Temporal Sampling with Reinforcement Fine-Tuning for Video LLMs
Yunheng Li, Jing Cheng, Shaoyong Jia et al.
NEURIPS 2025 oral arXiv:2509.18056
7 citations
To Think or Not To Think: A Study of Thinking in Rule-Based Visual Reinforcement Fine-Tuning
Ming Li, Jike Zhong, Shitian Zhao et al.
NEURIPS 2025 spotlight
Understanding Data Influence in Reinforcement Finetuning
Haoru Tan, Xiuzhe Wu, Sitong Wu et al.
NEURIPS 2025 oral
VideoRFT: Incentivizing Video Reasoning Capability in MLLMs via Reinforced Fine-Tuning
Qi Wang, Yanrui Yu, Ye Yuan et al.
NEURIPS 2025 oral arXiv:2505.12434
34 citations
Visual-RFT: Visual Reinforcement Fine-Tuning
Ziyu Liu, Zeyi Sun, Yuhang Zang et al.
ICCV 2025 arXiv:2503.01785
357 citations
Walking the Tightrope: Autonomous Disentangling Beneficial and Detrimental Drifts in Non-Stationary Custom-Tuning
Xiaoyu Yang, Jie Lu, En Yu
NEURIPS 2025 oral
6 citations