"reinforcement learning framework" Papers
11 papers found
Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning
Di Zhang, Jingdi Lei, Junxian Li et al.
CVPR 2025, arXiv:2411.18203
33 citations
FineRS: Fine-grained Reasoning and Segmentation of Small Objects with Reinforcement Learning
Lu Zhang, Jiazuo Yu, Haomiao Xiong et al.
NEURIPS 2025, arXiv:2510.21311
1 citation
Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval
Sheryl Hsu, Omar Khattab, Chelsea Finn et al.
ICLR 2025, arXiv:2410.23214
16 citations
Multivariate Dynamic Mediation Analysis under a Reinforcement Learning Framework
Lan Luo, Chengchun Shi, Jitao Wang et al.
NEURIPS 2025, arXiv:2310.16203
2 citations
Reinforced Context Order Recovery for Adaptive Reasoning and Planning
Long Ma, Fangwei Zhong, Yizhou Wang
NEURIPS 2025, arXiv:2508.13070
3 citations
SC-Captioner: Improving Image Captioning with Self-Correction by Reinforcement Learning
Lin Zhang, Xianfang Zeng, Kangcong Li et al.
ICCV 2025, arXiv:2508.06125
3 citations
Seeing is Believing? Mitigating OCR Hallucinations in Multimodal Large Language Models
Zhentao He, Can Zhang, Ziheng Wu et al.
NEURIPS 2025, arXiv:2506.20168
2 citations
Teaching Language Models to Evolve with Users: Dynamic Profile Modeling for Personalized Alignment
Weixiang Zhao, Xingyu Sui, Yulin Hu et al.
NEURIPS 2025, arXiv:2505.15456
13 citations
Train on Pins and Test on Obstacles for Rectilinear Steiner Minimum Tree
Xingbo Du, Ruizhe Zhong, Junchi Yan
NEURIPS 2025
When Thinking Drifts: Evidential Grounding for Robust Video Reasoning
Romy Luo, Zihui (Sherry) Xue, Alex Dimakis et al.
NEURIPS 2025, arXiv:2510.06077
4 citations
Dialogue for Prompting: A Policy-Gradient-Based Discrete Prompt Generation for Few-Shot Learning
Chengzhengxu Li, Xiaoming Liu, Yichen Wang et al.
AAAI 2024, arXiv:2308.07272
7 citations