Poster "semantic alignment" Papers
35 papers found
Conference
Adaptive and Multi-scale Affinity Alignment for Hierarchical Contrastive Learning
Jiawei Huang, Minming Li, Hu Ding
ASAP: Advancing Semantic Alignment Promotes Multi-Modal Manipulation Detecting and Grounding
Zhenxing Zhang, Yaxiong Wang, Lechao Cheng et al.
CREA: A Collaborative Multi-Agent Framework for Creative Image Editing and Generation
Kavana Venkatesh, Connor Dunlop, Pinar Yanardag
Ctrl-U: Robust Conditional Image Generation via Uncertainty-aware Reward Modeling
Guiyu Zhang, Huan-ang Gao, Zijian Jiang et al.
DCI: Dual-Conditional Inversion for Boosting Diffusion-Based Image Editing
Zixiang Li, Haoyu Wang, Wei Wang et al.
DiTTo-TTS: Diffusion Transformers for Scalable Text-to-Speech without Domain-Specific Factors
Keon Lee, Dong Won Kim, Jaehyeon Kim et al.
DS-VLM: Diffusion Supervision Vision Language Model
Zhen Sun, Yunhang Shen, Jie Li et al.
DyMO: Training-Free Diffusion Model Alignment with Dynamic Multi-Objective Scheduling
Xin Xie, Dong Gong
FormalAlign: Automated Alignment Evaluation for Autoformalization
Jianqiao Lu, Yingjia Wan, Yinya Huang et al.
Generalizable Object Re-Identification via Visual In-Context Prompting
Zhizhong Huang, Xiaoming Liu
GSAlign: Geometric and Semantic Alignment Network for Aerial-Ground Person Re-Identification
Qiao Li, Jie Li, Yukang Zhang et al.
HarmonySet: A Comprehensive Dataset for Understanding Video-Music Semantic Alignment and Temporal Synchronization
Zitang Zhou, Ke Mei, Yu Lu et al.
Instruct-CLIP: Improving Instruction-Guided Image Editing with Automated Data Refinement Using Contrastive Learning
Sherry X. Chen, Misha Sra, Pradeep Sen
JAFAR: Jack up Any Feature at Any Resolution
Paul Couairon, Loïck Chambon, Louis Serrano et al.
Layered Image Vectorization via Semantic Simplification
Zhenyu Wang, Jianxi Huang, Zhida Sun et al.
Learning a Cross-Modal Schrödinger Bridge for Visual Domain Generalization
Hao Zheng, Jingjun Yi, Qi Bi et al.
LOMIA: Label-Only Membership Inference Attacks against Pre-trained Large Vision-Language Models
Yihao LIU, Xinqi Lyu, Dong Wang et al.
Low-Biased General Annotated Dataset Generation
Dengyang Jiang, Haoyu Wang, Lei Zhang et al.
MLLM-For3D: Adapting Multimodal Large Language Model for 3D Reasoning Segmentation
Jiaxin Huang, Runnan Chen, Ziwen Li et al.
MPG-SAM 2: Adapting SAM 2 with Mask Priors and Global Context for Referring Video Object Segmentation
Fu Rong, Meng Lan, Qian Zhang et al.
Noise Diffusion for Enhancing Semantic Faithfulness in Text-to-Image Synthesis
Boming Miao, Chunxiao Li, Xiaoxiao Wang et al.
OmniZoom: A Universal Plug-and-Play Paradigm for Cross-Device Smooth Zoom Interpolation
Xiaoan Zhu, Yue Zhao, Tianyang Hu et al.
OOD-Barrier: Build a Middle-Barrier for Open-Set Single-Image Test Time Adaptation via Vision Language Models
Boyang Peng, Sanqing Qu, Tianpei Zou et al.
Rebalancing Contrastive Alignment with Bottlenecked Semantic Increments in Text-Video Retrieval
Jian Xiao, Zijie Song, Jialong Hu et al.
RespoDiff: Dual-Module Bottleneck Transformation for Responsible & Faithful T2I Generation
Silpa Vadakkeeveetil Sreelatha, Sauradip Nag, Muhammad Awais et al.
SAGI: Semantically Aligned and Uncertainty Guided AI Image Inpainting
Paschalis Giakoumoglou, Dimitrios Karageorgiou, Symeon Papadopoulos et al.
Seg2Any: Open-set Segmentation-Mask-to-Image Generation with Precise Shape and Semantic Control
Danfeng Li, Hui Zhang, Sheng Wang et al.
Towards Semantic Equivalence of Tokenization in Multimodal LLM
Shengqiong Wu, Hao Fei, Xiangtai Li et al.
Towards Transformer-Based Aligned Generation with Self-Coherence Guidance
Shulei Wang, Wang Lin, Hai Huang et al.
VideoDPO: Omni-Preference Alignment for Video Diffusion Generation
Runtao Liu, Haoyu Wu, Zheng Ziqiang et al.
Exploring Pre-trained Text-to-Video Diffusion Models for Referring Video Object Segmentation
Xuelu Feng, Dongdong Chen, Junsong Yuan et al.
Fast Text-to-3D-Aware Face Generation and Manipulation via Direct Cross-modal Mapping and Geometric Regularization
Jinlu Zhang, Yiyi Zhou, Qiancheng Zheng et al.
GarmentAligner: Text-to-Garment Generation via Retrieval-augmented Multi-level Corrections
Shiyue Zhang, Zheng Chong, Xujie Zhang et al.
Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs
Ling Yang, Zhaochen Yu, Chenlin Meng et al.
Prioritized Semantic Learning for Zero-shot Instance Navigation
Xinyu Sun, Lizhao Liu, Hongyan Zhi et al.