Poster "visual-language models" Papers
14 papers found
Conference
CapeX: Category-Agnostic Pose Estimation from Textual Point Explanation
Matan Rusanovsky, Or Hirschorn, Shai Avidan
ICLR 2025arXiv:2406.00384
8
citations
CATSplat: Context-Aware Transformer with Spatial Guidance for Generalizable 3D Gaussian Splatting from A Single-View Image
Wonseok Roh, Hwanhee Jung, JongWook Kim et al.
ICCV 2025arXiv:2412.12906
6
citations
Divergence-enhanced Knowledge-guided Context Optimization for Visual-Language Prompt Tuning
Yilun Li, Miaomiao Cheng, Xu Han et al.
ICLR 2025
6
citations
ExCap3D: Expressive 3D Scene Understanding via Object Captioning with Varying Detail
Chandan Yeshwanth, David Rozenberszki, Angela Dai
ICCV 2025arXiv:2503.17044
3
citations
Learning Yourself: Class-Incremental Semantic Segmentation with Language-Inspired Bootstrapped Disentanglement
Ruitao Wu, Yifan Zhao, Jia Li
ICCV 2025arXiv:2509.00527
1
citations
SC-Captioner: Improving Image Captioning with Self-Correction by Reinforcement Learning
Lin Zhang, Xianfang Zeng, Kangcong Li et al.
ICCV 2025arXiv:2508.06125
3
citations
Shot-by-Shot: Film-Grammar-Aware Training-Free Audio Description Generation
Junyu Xie, Tengda Han, Max Bain et al.
ICCV 2025arXiv:2504.01020
3
citations
E3M: Zero-Shot Spatio-Temporal Video Grounding with Expectation-Maximization Multimodal Modulation
Peijun Bao, Zihao Shao, Wenhan Yang et al.
ECCV 2024
6
citations
Efficient Vision-Language Pre-training by Cluster Masking
Zihao Wei, Zixuan Pan, Andrew Owens
CVPR 2024arXiv:2405.08815
15
citations
FuRL: Visual-Language Models as Fuzzy Rewards for Reinforcement Learning
Yuwei Fu, Haichao Zhang, di wu et al.
ICML 2024arXiv:2406.00645
26
citations
IG Captioner: Information Gain Captioners are Strong Zero-shot Classifiers
Chenglin Yang, Siyuan Qiao, Yuan Cao et al.
ECCV 2024arXiv:2311.17072
3
citations
Prompt-based Visual Alignment for Zero-shot Policy Transfer
Haihan Gao, Rui Zhang, Qi Yi et al.
ICML 2024arXiv:2406.03250
1
citations
Removing Distributional Discrepancies in Captions Improves Image-Text Alignment
Mu Cai, Haotian Liu, Yuheng Li et al.
ECCV 2024arXiv:2410.00905
7
citations
SDPT: Synchronous Dual Prompt Tuning for Fusion-based Visual-Language Pre-trained Models
Yang Zhou, Yongjian Wu, Jiya Saiyin et al.
ECCV 2024arXiv:2407.11414
2
citations