Papers matching "grpo algorithm"
2 papers found
Conference
Video-R1: Reinforcing Video Reasoning in MLLMs
Kaituo Feng, Kaixiong Gong, Bohao Li et al.
NeurIPS 2025 (oral), arXiv:2503.21776
257 citations
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning
Haozhe Wang, Chao Qu, Zuming Huang et al.
NeurIPS 2025 (spotlight), arXiv:2504.08837
183 citations