"visual-language models" Papers

19 papers found

CapeX: Category-Agnostic Pose Estimation from Textual Point Explanation

Matan Rusanovsky, Or Hirschorn, Shai Avidan

ICLR 2025arXiv:2406.00384
8
citations

CATSplat: Context-Aware Transformer with Spatial Guidance for Generalizable 3D Gaussian Splatting from A Single-View Image

Wonseok Roh, Hwanhee Jung, JongWook Kim et al.

ICCV 2025arXiv:2412.12906
6
citations

Divergence-enhanced Knowledge-guided Context Optimization for Visual-Language Prompt Tuning

Yilun Li, Miaomiao Cheng, Xu Han et al.

ICLR 2025
6
citations

ExCap3D: Expressive 3D Scene Understanding via Object Captioning with Varying Detail

Chandan Yeshwanth, David Rozenberszki, Angela Dai

ICCV 2025arXiv:2503.17044
3
citations

Learning Yourself: Class-Incremental Semantic Segmentation with Language-Inspired Bootstrapped Disentanglement

Ruitao Wu, Yifan Zhao, Jia Li

ICCV 2025arXiv:2509.00527
1
citations

MASS: Overcoming Language Bias in Image-Text Matching

Jiwan Chung, Seungwon Lim, Sangkyu Lee et al.

AAAI 2025paperarXiv:2501.11469

MM-CamObj: A Comprehensive Multimodal Dataset for Camouflaged Object Scenarios

Jiacheng Ruan, Wenzhen Yuan, Zehao Lin et al.

AAAI 2025paperarXiv:2409.16084
11
citations

Multimodal Hypothetical Summary for Retrieval-based Multi-image Question Answering

Peize Li, Qingyi Si, Peng Fu et al.

AAAI 2025paperarXiv:2412.14880
1
citations

SC-Captioner: Improving Image Captioning with Self-Correction by Reinforcement Learning

Lin Zhang, Xianfang Zeng, Kangcong Li et al.

ICCV 2025arXiv:2508.06125
3
citations

Shot-by-Shot: Film-Grammar-Aware Training-Free Audio Description Generation

Junyu Xie, Tengda Han, Max Bain et al.

ICCV 2025arXiv:2504.01020
3
citations

E3M: Zero-Shot Spatio-Temporal Video Grounding with Expectation-Maximization Multimodal Modulation

Peijun Bao, Zihao Shao, Wenhan Yang et al.

ECCV 2024
6
citations

Efficient Vision-Language Pre-training by Cluster Masking

Zihao Wei, Zixuan Pan, Andrew Owens

CVPR 2024arXiv:2405.08815
15
citations

FuRL: Visual-Language Models as Fuzzy Rewards for Reinforcement Learning

Yuwei Fu, Haichao Zhang, di wu et al.

ICML 2024arXiv:2406.00645
26
citations

IG Captioner: Information Gain Captioners are Strong Zero-shot Classifiers

Chenglin Yang, Siyuan Qiao, Yuan Cao et al.

ECCV 2024arXiv:2311.17072
3
citations

LAMM: Label Alignment for Multi-Modal Prompt Learning

Jingsheng Gao, Jiacheng Ruan, Suncheng Xiang et al.

AAAI 2024paperarXiv:2312.08212
30
citations

Prompt-Based Distribution Alignment for Unsupervised Domain Adaptation

Shuanghao Bai, Min Zhang, Wanqi Zhou et al.

AAAI 2024paperarXiv:2312.09553
85
citations

Prompt-based Visual Alignment for Zero-shot Policy Transfer

Haihan Gao, Rui Zhang, Qi Yi et al.

ICML 2024arXiv:2406.03250
1
citations

Removing Distributional Discrepancies in Captions Improves Image-Text Alignment

Mu Cai, Haotian Liu, Yuheng Li et al.

ECCV 2024arXiv:2410.00905
7
citations

SDPT: Synchronous Dual Prompt Tuning for Fusion-based Visual-Language Pre-trained Models

Yang Zhou, Yongjian Wu, Jiya Saiyin et al.

ECCV 2024arXiv:2407.11414
2
citations