"semantic alignment" Papers
45 papers found
Adaptive and Multi-scale Affinity Alignment for Hierarchical Contrastive Learning
Jiawei Huang, Minming Li, Hu Ding
ASAP: Advancing Semantic Alignment Promotes Multi-Modal Manipulation Detecting and Grounding
Zhenxing Zhang, Yaxiong Wang, Lechao Cheng et al.
CoMM: A Coherent Interleaved Image-Text Dataset for Multimodal Understanding and Generation
Wei Chen, Lin Li, Yongqi Yang et al.
CREA: A Collaborative Multi-Agent Framework for Creative Image Editing and Generation
Kavana Venkatesh, Connor Dunlop, Pinar Yanardag
Ctrl-U: Robust Conditional Image Generation via Uncertainty-aware Reward Modeling
Guiyu Zhang, Huan-ang Gao, Zijian Jiang et al.
DCI: Dual-Conditional Inversion for Boosting Diffusion-Based Image Editing
Zixiang Li, Haoyu Wang, Wei Wang et al.
DiTTo-TTS: Diffusion Transformers for Scalable Text-to-Speech without Domain-Specific Factors
Keon Lee, Dong Won Kim, Jaehyeon Kim et al.
DS-VLM: Diffusion Supervision Vision Language Model
Zhen Sun, Yunhang Shen, Jie Li et al.
DyMO: Training-Free Diffusion Model Alignment with Dynamic Multi-Objective Scheduling
Xin Xie, Dong Gong
EgoExoBench: A Benchmark for First- and Third-person View Video Understanding in MLLMs
Yuping He, Yifei Huang, Guo Chen et al.
FormalAlign: Automated Alignment Evaluation for Autoformalization
Jianqiao Lu, Yingjia Wan, Yinya Huang et al.
Generalizable Object Re-Identification via Visual In-Context Prompting
Zhizhong Huang, Xiaoming Liu
GSAlign: Geometric and Semantic Alignment Network for Aerial-Ground Person Re-Identification
Qiao Li, Jie Li, Yukang Zhang et al.
HarmonySet: A Comprehensive Dataset for Understanding Video-Music Semantic Alignment and Temporal Synchronization
Zitang Zhou, Ke Mei, Yu Lu et al.
HeGTa: Leveraging Heterogeneous Graph-enhanced Large Language Models for Few-shot Complex Table Understanding
Rihui Jin, Yu Li, Guilin Qi et al.
Instruct-CLIP: Improving Instruction-Guided Image Editing with Automated Data Refinement Using Contrastive Learning
Sherry X. Chen, Misha Sra, Pradeep Sen
JAFAR: Jack up Any Feature at Any Resolution
Paul Couairon, Loïck Chambon, Louis Serrano et al.
Layered Image Vectorization via Semantic Simplification
Zhenyu Wang, Jianxi Huang, Zhida Sun et al.
Learning a Cross-Modal Schrödinger Bridge for Visual Domain Generalization
Hao Zheng, Jingjun Yi, Qi Bi et al.
LLM4GEN: Leveraging Semantic Representation of LLMs for Text-to-Image Generation
Mushui Liu, Yuhang Ma, Zhen Yang et al.
LOMIA: Label-Only Membership Inference Attacks against Pre-trained Large Vision-Language Models
Yihao Liu, Xinqi Lyu, Dong Wang et al.
Low-Biased General Annotated Dataset Generation
Dengyang Jiang, Haoyu Wang, Lei Zhang et al.
MLLM-For3D: Adapting Multimodal Large Language Model for 3D Reasoning Segmentation
Jiaxin Huang, Runnan Chen, Ziwen Li et al.
MPG-SAM 2: Adapting SAM 2 with Mask Priors and Global Context for Referring Video Object Segmentation
Fu Rong, Meng Lan, Qian Zhang et al.
Noise Diffusion for Enhancing Semantic Faithfulness in Text-to-Image Synthesis
Boming Miao, Chunxiao Li, Xiaoxiao Wang et al.
OmniZoom: A Universal Plug-and-Play Paradigm for Cross-Device Smooth Zoom Interpolation
Xiaoan Zhu, Yue Zhao, Tianyang Hu et al.
OOD-Barrier: Build a Middle-Barrier for Open-Set Single-Image Test Time Adaptation via Vision Language Models
Boyang Peng, Sanqing Qu, Tianpei Zou et al.
PC-Net: Weakly Supervised Compositional Moment Retrieval via Proposal-Centric Network
Mingyao Zhou, Hao Sun, Wei Xie et al.
Personalized Federated Learning for Spatio-Temporal Forecasting: A Dual Semantic Alignment-Based Contrastive Approach
Qingxiang Liu, Sheng Sun, Yuxuan Liang et al.
Rebalancing Contrastive Alignment with Bottlenecked Semantic Increments in Text-Video Retrieval
Jian Xiao, Zijie Song, Jialong Hu et al.
ReCon: Region-Controllable Data Augmentation with Rectification and Alignment for Object Detection
Haowei Zhu, Tianxiang Pan, Rui Qin et al.
RespoDiff: Dual-Module Bottleneck Transformation for Responsible & Faithful T2I Generation
Silpa Vadakkeeveetil Sreelatha, Sauradip Nag, Muhammad Awais et al.
SAGI: Semantically Aligned and Uncertainty Guided AI Image Inpainting
Paschalis Giakoumoglou, Dimitrios Karageorgiou, Symeon Papadopoulos et al.
Seg2Any: Open-set Segmentation-Mask-to-Image Generation with Precise Shape and Semantic Control
Danfeng Li, Hui Zhang, Sheng Wang et al.
Towards Semantic Equivalence of Tokenization in Multimodal LLM
Shengqiong Wu, Hao Fei, Xiangtai Li et al.
Towards Transformer-Based Aligned Generation with Self-Coherence Guidance
Shulei Wang, Wang Lin, Hai Huang et al.
VideoDPO: Omni-Preference Alignment for Video Diffusion Generation
Runtao Liu, Haoyu Wu, Zheng Ziqiang et al.
Compositional Text-to-Image Synthesis with Attention Map Control of Diffusion Models
Ruichen Wang, Zekang Chen, Chen Chen et al.
Exploring Pre-trained Text-to-Video Diffusion Models for Referring Video Object Segmentation
Xuelu Feng, Dongdong Chen, Junsong Yuan et al.
Fast Text-to-3D-Aware Face Generation and Manipulation via Direct Cross-modal Mapping and Geometric Regularization
Jinlu Zhang, Yiyi Zhou, Qiancheng Zheng et al.
GarmentAligner: Text-to-Garment Generation via Retrieval-augmented Multi-level Corrections
Shiyue Zhang, Zheng Chong, Xujie Zhang et al.
Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs
Ling Yang, Zhaochen Yu, Chenlin Meng et al.
Prioritized Semantic Learning for Zero-shot Instance Navigation
Xinyu Sun, Lizhao Liu, Hongyan Zhi et al.
Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation
Shilin Yan, Renrui Zhang, Ziyu Guo et al.