"visual prompting" Papers

19 papers found

Coarse Correspondences Boost Spatial-Temporal Reasoning in Multimodal Language Model

Benlin Liu, Yuhao Dong, Yiqin Wang et al.

CVPR 2025arXiv:2408.00754
9
citations

Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want

Weifeng Lin, Xinyu Wei, Ruichuan An et al.

ICLR 2025arXiv:2403.20271
87
citations

Enhancing Visual Prompting through Expanded Transformation Space and Overfitting Mitigation

Shohei Enomoto

NEURIPS 2025arXiv:2510.07823

Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms

Zhangheng LI, Keen You, Haotian Zhang et al.

ICLR 2025arXiv:2410.18967
45
citations

Selective Visual Prompting in Vision Mamba

Yifeng Yao, Zichen Liu, Zhenyu Cui et al.

AAAI 2025paperarXiv:2412.08947
11
citations

StyleKeeper: Prevent Content Leakage using Negative Visual Query Guidance

Jaeseok Jeong, Junho Kim, Youngjung Uh et al.

ICCV 2025arXiv:2510.06827
2
citations

WeatherGFM: Learning a Weather Generalist Foundation Model via In-context Learning

Xiangyu Zhao, Zhiwang Zhou, Wenlong Zhang et al.

ICLR 2025oralarXiv:2411.05420
9
citations

Attention Prompting on Image for Large Vision-Language Models

Runpeng Yu, Weihao Yu, Xinchao Wang

ECCV 2024arXiv:2409.17143
28
citations

DocRes: A Generalist Model Toward Unifying Document Image Restoration Tasks

Jiaxin Zhang, Dezhi Peng, Chongyu Liu et al.

CVPR 2024arXiv:2405.04408
29
citations

Encapsulating Knowledge in One Prompt

Qi Li, Runpeng Yu, Xinchao Wang

ECCV 2024arXiv:2407.11902
3
citations

Exploring the Transferability of Visual Prompting for Multimodal Large Language Models

Yichi Zhang, Yinpeng Dong, Siyuan Zhang et al.

CVPR 2024highlightarXiv:2404.11207
18
citations

FALIP: Visual Prompt as Foveal Attention Boosts CLIP Zero-Shot Performance

Jiedong Zhuang, Jiaqi Hu, Lianrui Mu et al.

ECCV 2024arXiv:2407.05578
8
citations

Finding Visual Task Vectors

Alberto Hojel, Yutong Bai, Trevor Darrell et al.

ECCV 2024arXiv:2404.05729
14
citations

Generative Multimodal Models are In-Context Learners

Quan Sun, Yufeng Cui, Xiaosong Zhang et al.

CVPR 2024arXiv:2312.13286
438
citations

Memory-Space Visual Prompting for Efficient Vision-Language Fine-Tuning

Shibo Jie, Yehui Tang, Ning Ding et al.

ICML 2024arXiv:2405.05615
20
citations

PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs

Soroush Nasiriany, Fei Xia, Wenhao Yu et al.

ICML 2024arXiv:2402.07872
188
citations

Tokenize Anything via Prompting

Ting Pan, Lulu Tang, Xinlong Wang et al.

ECCV 2024arXiv:2312.09128
36
citations

Unifying Image Processing as Visual Prompting Question Answering

Yihao Liu, Xiangyu Chen, Xianzheng Ma et al.

ICML 2024arXiv:2310.10513
31
citations

Visual Prompting for Generalized Few-shot Segmentation: A Multi-scale Approach

Mir Rayat Imtiaz Hossain, Mennatullah Siam, Leonid Sigal et al.

CVPR 2024arXiv:2404.11732
21
citations