"reinforcement fine-tuning" Papers

14 papers found

AceSearcher: Bootstrapping Reasoning and Search for LLMs via Reinforced Self-Play

Ran Xu, Yuchen Zhuang, Zihan Dong et al.

NEURIPS 2025spotlightarXiv:2509.24193
5
citations

Angles Don’t Lie: Unlocking Training‑Efficient RL Through the Model’s Own Signals

Qinsi Wang, Jinghan Ke, Hancheng Ye et al.

NEURIPS 2025spotlight

BlueSuffix: Reinforced Blue Teaming for Vision-Language Models Against Jailbreak Attacks

Yunhan Zhao, Xiang Zheng, Lin Luo et al.

ICLR 2025arXiv:2410.20971
20
citations

CollabLLM: From Passive Responders to Active Collaborators

Shirley Wu, Michel Galley, Baolin Peng et al.

ICML 2025oralarXiv:2502.00640
43
citations

EgoThinker: Unveiling Egocentric Reasoning with Spatio-Temporal CoT

Baoqi Pei, Yifei Huang, Jilan Xu et al.

NEURIPS 2025oralarXiv:2510.23569
4
citations

Mesh-RFT: Enhancing Mesh Generation via Fine-grained Reinforcement Fine-Tuning

Jian Liu, Jing Xu, Song Guo et al.

NEURIPS 2025spotlightarXiv:2505.16761
7
citations

RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics

Enshen Zhou, Jingkun An, Cheng Chi et al.

NEURIPS 2025arXiv:2506.04308
58
citations

SEEA-R1: Tree-Structured Reinforcement Fine-Tuning for Self-Evolving Embodied Agents

Wanxin Tian, Shijie Zhang, Kevin Zhang et al.

NEURIPS 2025arXiv:2506.21669
6
citations

TempSamp-R1: Effective Temporal Sampling with Reinforcement Fine-Tuning for Video LLMs

Yunheng Li, Jing Cheng, Shaoyong Jia et al.

NEURIPS 2025oralarXiv:2509.18056
7
citations

To Think or Not To Think: A Study of Thinking in Rule-Based Visual Reinforcement Fine-Tuning

Ming Li, Jike Zhong, Shitian Zhao et al.

NEURIPS 2025spotlight

Understanding Data Influence in Reinforcement Finetuning

Haoru Tan, Xiuzhe Wu, Sitong Wu et al.

NEURIPS 2025oral

VideoRFT: Incentivizing Video Reasoning Capability in MLLMs via Reinforced Fine-Tuning

Qi Wang, Yanrui Yu, Ye Yuan et al.

NEURIPS 2025oralarXiv:2505.12434
34
citations

Visual-RFT: Visual Reinforcement Fine-Tuning

Ziyu Liu, Zeyi Sun, Yuhang Zang et al.

ICCV 2025arXiv:2503.01785
357
citations

Walking the Tightrope: Autonomous Disentangling Beneficial and Detrimental Drifts in Non-Stationary Custom-Tuning

Xiaoyu Yang, Jie Lu, En Yu

NEURIPS 2025oral
6
citations