"reinforcement learning framework" Papers

11 papers found

Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning

Di Zhang, Jingdi Lei, Junxian Li et al.

CVPR 2025 arXiv:2411.18203
33 citations

FineRS: Fine-grained Reasoning and Segmentation of Small Objects with Reinforcement Learning

Lu Zhang, Jiazuo Yu, Haomiao Xiong et al.

NEURIPS 2025 arXiv:2510.21311
1 citation

Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval

Sheryl Hsu, Omar Khattab, Chelsea Finn et al.

ICLR 2025 arXiv:2410.23214
16 citations

Multivariate Dynamic Mediation Analysis under a Reinforcement Learning Framework

Lan Luo, Chengchun Shi, Jitao Wang et al.

NEURIPS 2025 arXiv:2310.16203
2 citations

Reinforced Context Order Recovery for Adaptive Reasoning and Planning

Long Ma, Fangwei Zhong, Yizhou Wang

NEURIPS 2025 arXiv:2508.13070
3 citations

SC-Captioner: Improving Image Captioning with Self-Correction by Reinforcement Learning

Lin Zhang, Xianfang Zeng, Kangcong Li et al.

ICCV 2025 arXiv:2508.06125
3 citations

Seeing is Believing? Mitigating OCR Hallucinations in Multimodal Large Language Models

Zhentao He, Can Zhang, Ziheng Wu et al.

NEURIPS 2025 arXiv:2506.20168
2 citations

Teaching Language Models to Evolve with Users: Dynamic Profile Modeling for Personalized Alignment

Weixiang Zhao, Xingyu Sui, Yulin Hu et al.

NEURIPS 2025 arXiv:2505.15456
13 citations

Train on Pins and Test on Obstacles for Rectilinear Steiner Minimum Tree

Xingbo Du, Ruizhe Zhong, Junchi Yan

NEURIPS 2025

When Thinking Drifts: Evidential Grounding for Robust Video Reasoning

Romy Luo, Zihui (Sherry) Xue, Alex Dimakis et al.

NEURIPS 2025 arXiv:2510.06077
4 citations

Dialogue for Prompting: A Policy-Gradient-Based Discrete Prompt Generation for Few-Shot Learning

Chengzhengxu Li, Xiaoming Liu, Yichen Wang et al.

AAAI 2024 arXiv:2308.07272
7 citations