Poster "visual-language models" Papers

14 papers found

CapeX: Category-Agnostic Pose Estimation from Textual Point Explanation

Matan Rusanovsky, Or Hirschorn, Shai Avidan

ICLR 2025, arXiv:2406.00384
8 citations

CATSplat: Context-Aware Transformer with Spatial Guidance for Generalizable 3D Gaussian Splatting from A Single-View Image

Wonseok Roh, Hwanhee Jung, JongWook Kim et al.

ICCV 2025, arXiv:2412.12906
6 citations

Divergence-enhanced Knowledge-guided Context Optimization for Visual-Language Prompt Tuning

Yilun Li, Miaomiao Cheng, Xu Han et al.

ICLR 2025
6 citations

ExCap3D: Expressive 3D Scene Understanding via Object Captioning with Varying Detail

Chandan Yeshwanth, David Rozenberszki, Angela Dai

ICCV 2025, arXiv:2503.17044
3 citations

Learning Yourself: Class-Incremental Semantic Segmentation with Language-Inspired Bootstrapped Disentanglement

Ruitao Wu, Yifan Zhao, Jia Li

ICCV 2025, arXiv:2509.00527
1 citation

SC-Captioner: Improving Image Captioning with Self-Correction by Reinforcement Learning

Lin Zhang, Xianfang Zeng, Kangcong Li et al.

ICCV 2025, arXiv:2508.06125
3 citations

Shot-by-Shot: Film-Grammar-Aware Training-Free Audio Description Generation

Junyu Xie, Tengda Han, Max Bain et al.

ICCV 2025, arXiv:2504.01020
3 citations

E3M: Zero-Shot Spatio-Temporal Video Grounding with Expectation-Maximization Multimodal Modulation

Peijun Bao, Zihao Shao, Wenhan Yang et al.

ECCV 2024
6 citations

Efficient Vision-Language Pre-training by Cluster Masking

Zihao Wei, Zixuan Pan, Andrew Owens

CVPR 2024, arXiv:2405.08815
15 citations

FuRL: Visual-Language Models as Fuzzy Rewards for Reinforcement Learning

Yuwei Fu, Haichao Zhang, Di Wu et al.

ICML 2024, arXiv:2406.00645
26 citations

IG Captioner: Information Gain Captioners are Strong Zero-shot Classifiers

Chenglin Yang, Siyuan Qiao, Yuan Cao et al.

ECCV 2024, arXiv:2311.17072
3 citations

Prompt-based Visual Alignment for Zero-shot Policy Transfer

Haihan Gao, Rui Zhang, Qi Yi et al.

ICML 2024, arXiv:2406.03250
1 citation

Removing Distributional Discrepancies in Captions Improves Image-Text Alignment

Mu Cai, Haotian Liu, Yuheng Li et al.

ECCV 2024, arXiv:2410.00905
7 citations

SDPT: Synchronous Dual Prompt Tuning for Fusion-based Visual-Language Pre-trained Models

Yang Zhou, Yongjian Wu, Jiya Saiyin et al.

ECCV 2024, arXiv:2407.11414
2 citations