Oral "reward shaping" Papers
3 papers found
S-GRPO: Early Exit via Reinforcement Learning in Reasoning Models
Muzhi Dai, Chenxu Yang, Qingyi Si
NeurIPS 2025 (oral), arXiv:2505.07686
52 citations
TempSamp-R1: Effective Temporal Sampling with Reinforcement Fine-Tuning for Video LLMs
Yunheng Li, Jing Cheng, Shaoyong Jia et al.
NeurIPS 2025 (oral), arXiv:2509.18056
7 citations
Time Reversal Symmetry for Efficient Robotic Manipulations in Deep Reinforcement Learning
Yunpeng Jiang, Jianshu Hu, Paul Weng et al.
NeurIPS 2025 (oral), arXiv:2505.13925