"vision-language model" Papers
21 papers found
CalliReader: Contextualizing Chinese Calligraphy via an Embedding-Aligned Vision-Language Model
Yuxuan Luo, Jiaqi Tang, Chenyi Huang et al.
Classifier-guided CLIP Distillation for Unsupervised Multi-label Classification
Dongseob Kim, Hyunjung Shim
Focus-Then-Reuse: Fast Adaptation in Visual Perturbation Environments
Jiahui Wang, Chao Chen, Jiacheng Xu et al.
Image as a World: Generating Interactive World from Single Image via Panoramic Video Generation
Dongnan Gui, Xun Guo, Wengang Zhou et al.
ImgEdit: A Unified Image Editing Dataset and Benchmark
Yang Ye, Xianyi He, Zongjian Li et al.
IntelliCap: Intelligent Guidance for Consistent View Sampling
Ayaka Yasunaga, Hideo Saito, Dieter Schmalstieg et al.
Normal and Abnormal Pathology Knowledge-Augmented Vision-Language Model for Anomaly Detection in Pathology Images
Jinsol Song, Jiamu Wang, Anh Nguyen et al.
One-for-All Few-Shot Anomaly Detection via Instance-Induced Prompt Learning
Wenxi Lv, Qinliang Su, Wenchao Xu
Stable Cinemetrics: Structured Taxonomy and Evaluation for Professional Video Generation
Agneet Chatterjee, Rahim Entezari, Maksym Zhuravinskyi et al.
TOGA: Temporally Grounded Open-Ended Video QA with Weak Supervision
Ayush Gupta, Anirban Roy, Rama Chellappa et al.
Understanding Fine-tuning CLIP for Open-vocabulary Semantic Segmentation in Hyperbolic Space
Zelin Peng, Zhengqin Xu, Zhilin Zeng et al.
AutoDIR: Automatic All-in-One Image Restoration with Latent Diffusion
Yitong Jiang, Zhaoyang Zhang, Tianfan Xue et al.
Bottom-Up Domain Prompt Tuning for Generalized Face Anti-Spoofing
Si-Qi Liu, Qirui Wang, Pong Chi Yuen
Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
Peng Jin, Ryuichi Takanobu, Cai Zhang et al.
Dolphins: Multimodal Language Model for Driving
Yingzi Ma, Yulong Cao, Jiachen Sun et al.
Image Fusion via Vision-Language Model
Zixiang Zhao, Lilun Deng, Haowen Bai et al.
Image-Text Co-Decomposition for Text-Supervised Semantic Segmentation
Ji-Jia Wu, Andy Chia-Hao Chang, Chieh-Yu Chuang et al.
PALM: Predicting Actions through Language Models
Sanghwan Kim, Daoji Huang, Yongqin Xian et al.
PromptAD: Learning Prompts with only Normal Samples for Few-Shot Anomaly Detection
Xiaofan Li, Zhizhong Zhang, Xin Tan et al.
Reinforcement Learning Friendly Vision-Language Model for Minecraft
Haobin Jiang, Junpeng Yue, Hao Luo et al.
Retrieval Across Any Domains via Large-scale Pre-trained Model
Jiexi Yan, Zhihui Yin, Chenghao Xu et al.