"visual reasoning" Papers

30 papers found

Benchmarking Multimodal CoT Reward Model Stepwise by Visual Program

Minghe Gao, Xuqi Liu, Zhongqi Yue et al.

ICCV 2025arXiv:2504.06606
10
citations

CogCoM: A Visual Language Model with Chain-of-Manipulations Reasoning

Ji Qi, Ming Ding, Weihan Wang et al.

ICLR 2025arXiv:2402.04236
36
citations

Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?

Yang Yue, Zhiqi Chen, Rui Lu et al.

NEURIPS 2025oralarXiv:2504.13837
540
citations

DWIM: Towards Tool-aware Visual Reasoning via Discrepancy-aware Workflow Generation & Instruct-Masking Tuning

Fucai Ke, Vijay Kumar b g, Xingjian Leng et al.

ICCV 2025arXiv:2503.19263
6
citations

From Head to Tail: Towards Balanced Representation in Large Vision-Language Models through Adaptive Data Calibration

Mingyang Song, Xiaoye Qu, Jiawei Zhou et al.

CVPR 2025arXiv:2503.12821
3
citations

Hallucination at a Glance: Controlled Visual Edits and Fine-Grained Multimodal Learning

Tianyi Bai, Yuxuan Fan, Qiu Jiantao et al.

NEURIPS 2025arXiv:2506.07227
3
citations

Latent Chain-of-Thought for Visual Reasoning

Guohao Sun, Hang Hua, Jian Wang et al.

NEURIPS 2025arXiv:2510.23925
13
citations

Mind the GAP: Glimpse-based Active Perception improves generalization and sample efficiency of visual reasoning

Oleh Kolner, Thomas Ortner, Stanisław Woźniak et al.

ICLR 2025arXiv:2409.20213

Neurosymbolic Diffusion Models

Emile van Krieken, Pasquale Minervini, Edoardo Maria Ponti et al.

NEURIPS 2025arXiv:2505.13138
5
citations

Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning

Yana Wei, Liang Zhao, Jianjian Sun et al.

NEURIPS 2025arXiv:2507.05255
14
citations

OpenVLThinker: Complex Vision-Language Reasoning via Iterative SFT-RL Cycles

Yihe Deng, Hritik Bansal, Fan Yin et al.

NEURIPS 2025arXiv:2503.17352
16
citations

UniPixel: Unified Object Referring and Segmentation for Pixel-Level Visual Reasoning

Ye Liu, Zongyang Ma, Junfu Pu et al.

NEURIPS 2025arXiv:2509.18094
4
citations

VideoAds for Fast-Paced Video Understanding

Zheyuan Zhang, Wanying Dou, Linkai Peng et al.

ICCV 2025arXiv:2504.09282
3
citations

VIKI‑R: Coordinating Embodied Multi-Agent Cooperation via Reinforcement Learning

Li Kang, Xiufeng Song, Heng Zhou et al.

NEURIPS 2025arXiv:2506.09049
9
citations

VISCO: Benchmarking Fine-Grained Critique and Correction Towards Self-Improvement in Visual Reasoning

Xueqing Wu, Yuheng Ding, Bingxuan Li et al.

CVPR 2025arXiv:2412.02172
13
citations

VisRL: Intention-Driven Visual Perception via Reinforced Reasoning

Zhangquan Chen, Xufang Luo, Dongsheng Li

ICCV 2025arXiv:2503.07523
25
citations

Visual Agentic AI for Spatial Reasoning with a Dynamic API

Damiano Marsili, Rohun Agrawal, Yisong Yue et al.

CVPR 2025arXiv:2502.06787
31
citations

Visual Jenga: Discovering Object Dependencies via Counterfactual Inpainting

Anand Bhattad, Konpat Preechakul, Alexei Efros

NEURIPS 2025arXiv:2503.21770
8
citations

VisualQuality-R1: Reasoning-Induced Image Quality Assessment via Reinforcement Learning to Rank

Tianhe Wu, Jian Zou, Jie Liang et al.

NEURIPS 2025spotlightarXiv:2505.14460
35
citations

Visual Structures Help Visual Reasoning: Addressing the Binding Problem in LVLMs

Amirmohammad Izadi, Mohammadali Banayeeanzade, Fatemeh Askari et al.

NEURIPS 2025
1
citations

ViUniT: Visual Unit Tests for More Robust Visual Programming

Artemis Panagopoulou, Honglu Zhou, silvio savarese et al.

CVPR 2025arXiv:2412.08859
2
citations

ArtVLM: Attribute Recognition Through Vision-Based Prefix Language Modeling

William Zhu, Keren Ye, Junjie Ke et al.

ECCV 2024arXiv:2408.04102
2
citations

BlenderAlchemy: Editing 3D Graphics with Vision-Language Models

Ian Huang, Guandao Yang, Leonidas Guibas

ECCV 2024arXiv:2404.17672
11
citations

Compositional Substitutivity of Visual Reasoning for Visual Question Answering

Chuanhao Li, Zhen Li, Chenchen Jing et al.

ECCV 2024
1
citations

ConTextual: Evaluating Context-Sensitive Text-Rich Visual Reasoning in Large Multimodal Models

Rohan Wadhawan, Hritik Bansal, Kai-Wei Chang et al.

ICML 2024arXiv:2401.13311
20
citations

HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning

Fucai Ke, Zhixi Cai, Simindokht Jahangard et al.

ECCV 2024arXiv:2403.12884
26
citations

MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI

Kaining Ying, Fanqing Meng, Jin Wang et al.

ICML 2024arXiv:2404.16006
163
citations

Naturally Supervised 3D Visual Grounding with Language-Regularized Concept Learners

Chun Feng, Joy Hsu, Weiyu Liu et al.

CVPR 2024arXiv:2404.19696
9
citations

Take A Step Back: Rethinking the Two Stages in Visual Reasoning

Mingyu Zhang, Jiting Cai, Mingyu Liu et al.

ECCV 2024arXiv:2407.19666
9
citations

VisionGraph: Leveraging Large Multimodal Models for Graph Theory Problems in Visual Context

yunxin li, Baotian Hu, Haoyuan Shi et al.

ICML 2024arXiv:2405.04950
28
citations