"reinforcement learning fine-tuning" Papers

17 papers found

AToM: Aligning Text-to-Motion Model at Event-Level with GPT-4Vision Reward

Haonan Han, Xiangzuo Wu, Huan Liao et al.

CVPR 2025 · arXiv:2411.18654 · 5 citations

BioReason: Incentivizing Multimodal Biological Reasoning within a DNA-LLM Model

Adibvafa Fallahpour, Andrew Magnuson, Purav Gupta et al.

NeurIPS 2025 · arXiv:2505.23579 · 17 citations

DeepVideo-R1: Video Reinforcement Fine-Tuning via Difficulty-aware Regressive GRPO

Jinyoung Park, Jeehye Na, Jinyoung Kim et al.

NeurIPS 2025 · arXiv:2506.07464 · 28 citations

Diffusion Policy Policy Optimization

Allen Ren, Justin Lidard, Lars Ankile et al.

ICLR 2025 · arXiv:2409.00588 · 146 citations

Efficient Online Reinforcement Learning Fine-Tuning Need Not Retain Offline Data

Zhiyuan Zhou, Andy Peng, Qiyang Li et al.

ICLR 2025 · arXiv:2412.07762 · 30 citations

Mixing Expert Knowledge: Bring Human Thoughts Back To the Game of Go

Yichuan Ma, Linyang Li, Yongkang Chen et al.

NeurIPS 2025 · arXiv:2601.16447

Moral Alignment for LLM Agents

Elizaveta Tennant, Stephen Hailes, Mirco Musolesi

ICLR 2025 (oral) · arXiv:2410.01639 · 26 citations

Regulatory DNA Sequence Design with Reinforcement Learning

Zhao Yang, Bing Su, Chuan Cao et al.

ICLR 2025 · arXiv:2503.07981 · 4 citations

ReinFlow: Fine-tuning Flow Matching Policy with Online Reinforcement Learning

Tonghe Zhang, Chao Yu, Sichang Su et al.

NeurIPS 2025 · arXiv:2505.22094 · 21 citations

RLPF: Reinforcement Learning from Prediction Feedback for User Summarization with LLMs

Jiaxing Wu, Lin Ning, Luyang Liu et al.

AAAI 2025 · arXiv:2409.04421 · 7 citations

SceneDesigner: Controllable Multi-Object Image Generation with 9-DoF Pose Manipulation

Zhenyuan Qin, Xincheng Shuai, Henghui Ding

NeurIPS 2025 (spotlight) · arXiv:2511.16666 · 1 citation

ShiQ: Bringing back Bellman to LLMs

Pierre Clavier, Nathan Grinsztajn, Raphaël Avalos et al.

NeurIPS 2025 · arXiv:2505.11081 · 2 citations

Teaching Language Models to Reason with Tools

Chengpeng Li, Zhengyang Tang, Ziniu Li et al.

NeurIPS 2025 · arXiv:2510.20342 · 2 citations

VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations on Synthetic Video Understanding

Zongxia Li, Xiyang Wu, Guangyao Shi et al.

NeurIPS 2025 · arXiv:2505.01481 · 15 citations

Degeneration-free Policy Optimization: RL Fine-Tuning for Language Models without Degeneration

Youngsoo Jang, Geon-Hyeong Kim, Byoungjip Kim et al.

ICML 2024

Improving Agent Behaviors with RL Fine-tuning for Autonomous Driving

Zhenghao Peng, Wenjie Luo, Yiren Lu et al.

ECCV 2024 · arXiv:2409.18343 · 25 citations

Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment

Rui Yang, Xiaoman Pan, Feng Luo et al.

ICML 2024 · arXiv:2402.10207 · 125 citations