Poster "vision-language alignment" Papers
20 papers found
Conference
$\Delta \mathrm{Energy}$: Optimizing Energy Change During Vision-Language Alignment Improves both OOD Detection and OOD Generalization
Lin Zhu, Yifeng Yang, Xinbing Wang et al.
Aligning Information Capacity Between Vision and Language via Dense-to-Sparse Feature Distillation for Image-Text matching
Yang Liu, Wentao Feng, Zhuoyao Liu et al.
BATCLIP: Bimodal Online Test-Time Adaptation for CLIP
Sarthak Kumar Maharana, Baoming Zhang, Leonid Karlinsky et al.
CompCap: Improving Multimodal Large Language Models with Composite Captions
Xiaohui Chen, Satya Narayan Shukla, Mahmoud Azab et al.
Cross-Modal Safety Mechanism Transfer in Large Vision-Language Models
Shicheng Xu, Liang Pang, Yunchang Zhu et al.
DH-Set: Improving Vision-Language Alignment with Diverse and Hybrid Set-Embeddings Learning
Kun Zhang, Jingyu Li, Zhe Li et al.
DINOv2 Meets Text: A Unified Framework for Image- and Pixel-Level Vision-Language Alignment
Dahyun Kang, Piotr Bojanowski, Huy V. Vo et al.
DS-VLM: Diffusion Supervision Vision Language Model
Zhen Sun, Yunhang Shen, Jie Li et al.
ExGra-Med: Extended Context Graph Alignment for Medical Vision-Language Models
Duy M. H. Nguyen, Nghiem Diep, Trung Nguyen et al.
Fix-CLIP: Dual-Branch Hierarchical Contrastive Learning via Synthetic Captions for Better Understanding of Long Text
Bingchao Wang, Zhiwei Ning, Jianyu Ding et al.
MSTAR: Box-free Multi-query Scene Text Retrieval with Attention Recycling
Liang Yin, Xudong Xie, Zhang Li et al.
RadZero: Similarity-Based Cross-Attention for Explainable Vision-Language Alignment in Chest X-ray with Zero-Shot Multi-Task Capability
Jonggwon Park, Byungmu Yoon, Soobum Kim et al.
Seeing What Matters: Empowering CLIP with Patch Generation-to-Selection
Gensheng Pei, Tao Chen, Yujia Wang et al.
VL-SAE: Interpreting and Enhancing Vision-Language Alignment with a Unified Concept Set
Shufan Shen, Junshu Sun, Qingming Huang et al.
APL: Anchor-based Prompt Learning for One-stage Weakly Supervised Referring Expression Comprehension
Yaxin Luo, Jiayi Ji, Xiaofu Chen et al.
LG-Gaze: Learning Geometry-aware Continuous Prompts for Language-Guided Gaze Estimation
Pengwei Yin, Jingjing Wang, Guanzhong Zeng et al.
LHRS-Bot: Empowering Remote Sensing with VGI-Enhanced Large Multimodal Language Model
Dilxat Muhtar, Zhenshi Li, Feng Gu et al.
Multi-Task Domain Adaptation for Language Grounding with 3D Objects
Penglei SUN, Yaoxian Song, Xinglin Pan et al.
Towards Natural Language-Guided Drones: GeoText-1652 Benchmark with Spatial Relation Matching
Meng Chu, Zhedong Zheng, Wei Ji et al.
Zero-shot Referring Expression Comprehension via Structural Similarity Between Images and Captions
Zeyu Han, Fangrui Zhu, Qianru Lao et al.