"policy updates" Papers
2 papers found
Conference
A Snapshot of Influence: A Local Data Attribution Framework for Online Reinforcement Learning
Yuzheng Hu, Fan Wu, Haotian Ye et al.
NEURIPS 2025 oral, arXiv:2505.19281
3 citations
Enhancing the Outcome Reward-based RL Training of MLLMs with Self-Consistency Sampling
Jiahao Wang, Weiye Xu, Aijun Yang et al.
NEURIPS 2025, arXiv:2511.10648