"visual reasoning" Papers
30 papers found
Conference
Benchmarking Multimodal CoT Reward Model Stepwise by Visual Program
Minghe Gao, Xuqi Liu, Zhongqi Yue et al.
CogCoM: A Visual Language Model with Chain-of-Manipulations Reasoning
Ji Qi, Ming Ding, Weihan Wang et al.
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
Yang Yue, Zhiqi Chen, Rui Lu et al.
DWIM: Towards Tool-aware Visual Reasoning via Discrepancy-aware Workflow Generation & Instruct-Masking Tuning
Fucai Ke, Vijay Kumar b g, Xingjian Leng et al.
From Head to Tail: Towards Balanced Representation in Large Vision-Language Models through Adaptive Data Calibration
Mingyang Song, Xiaoye Qu, Jiawei Zhou et al.
Hallucination at a Glance: Controlled Visual Edits and Fine-Grained Multimodal Learning
Tianyi Bai, Yuxuan Fan, Qiu Jiantao et al.
Latent Chain-of-Thought for Visual Reasoning
Guohao Sun, Hang Hua, Jian Wang et al.
Mind the GAP: Glimpse-based Active Perception improves generalization and sample efficiency of visual reasoning
Oleh Kolner, Thomas Ortner, Stanisław Woźniak et al.
Neurosymbolic Diffusion Models
Emile van Krieken, Pasquale Minervini, Edoardo Maria Ponti et al.
Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning
Yana Wei, Liang Zhao, Jianjian Sun et al.
OpenVLThinker: Complex Vision-Language Reasoning via Iterative SFT-RL Cycles
Yihe Deng, Hritik Bansal, Fan Yin et al.
UniPixel: Unified Object Referring and Segmentation for Pixel-Level Visual Reasoning
Ye Liu, Zongyang Ma, Junfu Pu et al.
VideoAds for Fast-Paced Video Understanding
Zheyuan Zhang, Wanying Dou, Linkai Peng et al.
VIKI‑R: Coordinating Embodied Multi-Agent Cooperation via Reinforcement Learning
Li Kang, Xiufeng Song, Heng Zhou et al.
VISCO: Benchmarking Fine-Grained Critique and Correction Towards Self-Improvement in Visual Reasoning
Xueqing Wu, Yuheng Ding, Bingxuan Li et al.
VisRL: Intention-Driven Visual Perception via Reinforced Reasoning
Zhangquan Chen, Xufang Luo, Dongsheng Li
Visual Agentic AI for Spatial Reasoning with a Dynamic API
Damiano Marsili, Rohun Agrawal, Yisong Yue et al.
Visual Jenga: Discovering Object Dependencies via Counterfactual Inpainting
Anand Bhattad, Konpat Preechakul, Alexei Efros
VisualQuality-R1: Reasoning-Induced Image Quality Assessment via Reinforcement Learning to Rank
Tianhe Wu, Jian Zou, Jie Liang et al.
Visual Structures Help Visual Reasoning: Addressing the Binding Problem in LVLMs
Amirmohammad Izadi, Mohammadali Banayeeanzade, Fatemeh Askari et al.
ViUniT: Visual Unit Tests for More Robust Visual Programming
Artemis Panagopoulou, Honglu Zhou, silvio savarese et al.
ArtVLM: Attribute Recognition Through Vision-Based Prefix Language Modeling
William Zhu, Keren Ye, Junjie Ke et al.
BlenderAlchemy: Editing 3D Graphics with Vision-Language Models
Ian Huang, Guandao Yang, Leonidas Guibas
Compositional Substitutivity of Visual Reasoning for Visual Question Answering
Chuanhao Li, Zhen Li, Chenchen Jing et al.
ConTextual: Evaluating Context-Sensitive Text-Rich Visual Reasoning in Large Multimodal Models
Rohan Wadhawan, Hritik Bansal, Kai-Wei Chang et al.
HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning
Fucai Ke, Zhixi Cai, Simindokht Jahangard et al.
MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI
Kaining Ying, Fanqing Meng, Jin Wang et al.
Naturally Supervised 3D Visual Grounding with Language-Regularized Concept Learners
Chun Feng, Joy Hsu, Weiyu Liu et al.
Take A Step Back: Rethinking the Two Stages in Visual Reasoning
Mingyu Zhang, Jiting Cai, Mingyu Liu et al.
VisionGraph: Leveraging Large Multimodal Models for Graph Theory Problems in Visual Context
yunxin li, Baotian Hu, Haoyuan Shi et al.