"semantic alignment" Papers

45 papers found

Adaptive and Multi-scale Affinity Alignment for Hierarchical Contrastive Learning

Jiawei Huang, Minming Li, Hu Ding

NEURIPS 2025

ASAP: Advancing Semantic Alignment Promotes Multi-Modal Manipulation Detecting and Grounding

Zhenxing Zhang, Yaxiong Wang, Lechao Cheng et al.

CVPR 2025, arXiv:2412.12718
11 citations

CoMM: A Coherent Interleaved Image-Text Dataset for Multimodal Understanding and Generation

Wei Chen, Lin Li, Yongqi Yang et al.

CVPR 2025 (highlight), arXiv:2406.10462
12 citations

CREA: A Collaborative Multi-Agent Framework for Creative Image Editing and Generation

Kavana Venkatesh, Connor Dunlop, Pinar Yanardag

NEURIPS 2025, arXiv:2504.05306
2 citations

Ctrl-U: Robust Conditional Image Generation via Uncertainty-aware Reward Modeling

Guiyu Zhang, Huan-ang Gao, Zijian Jiang et al.

ICLR 2025, arXiv:2410.11236
14 citations

DCI: Dual-Conditional Inversion for Boosting Diffusion-Based Image Editing

Zixiang Li, Haoyu Wang, Wei Wang et al.

NEURIPS 2025, arXiv:2506.02560
1 citation

DiTTo-TTS: Diffusion Transformers for Scalable Text-to-Speech without Domain-Specific Factors

Keon Lee, Dong Won Kim, Jaehyeon Kim et al.

ICLR 2025, arXiv:2406.11427
28 citations

DS-VLM: Diffusion Supervision Vision Language Model

Zhen Sun, Yunhang Shen, Jie Li et al.

ICML 2025
1 citation

DyMO: Training-Free Diffusion Model Alignment with Dynamic Multi-Objective Scheduling

Xin Xie, Dong Gong

CVPR 2025, arXiv:2412.00759
16 citations

EgoExoBench: A Benchmark for First- and Third-person View Video Understanding in MLLMs

Yuping He, Yifei Huang, Guo Chen et al.

NEURIPS 2025 (oral), arXiv:2507.18342
11 citations

FormalAlign: Automated Alignment Evaluation for Autoformalization

Jianqiao Lu, Yingjia Wan, Yinya Huang et al.

ICLR 2025, arXiv:2410.10135
10 citations

Generalizable Object Re-Identification via Visual In-Context Prompting

Zhizhong Huang, Xiaoming Liu

ICCV 2025, arXiv:2508.21222
3 citations

GSAlign: Geometric and Semantic Alignment Network for Aerial-Ground Person Re-Identification

Qiao Li, Jie Li, Yukang Zhang et al.

NEURIPS 2025, arXiv:2510.22268
1 citation

HarmonySet: A Comprehensive Dataset for Understanding Video-Music Semantic Alignment and Temporal Synchronization

Zitang Zhou, Ke Mei, Yu Lu et al.

CVPR 2025, arXiv:2503.01725
7 citations

HeGTa: Leveraging Heterogeneous Graph-enhanced Large Language Models for Few-shot Complex Table Understanding

Rihui Jin, Yu Li, Guilin Qi et al.

AAAI 2025, arXiv:2403.19723
6 citations

Instruct-CLIP: Improving Instruction-Guided Image Editing with Automated Data Refinement Using Contrastive Learning

Sherry X. Chen, Misha Sra, Pradeep Sen

CVPR 2025, arXiv:2503.18406
4 citations

JAFAR: Jack up Any Feature at Any Resolution

Paul Couairon, Loïck Chambon, Louis Serrano et al.

NEURIPS 2025, arXiv:2506.11136
7 citations

Layered Image Vectorization via Semantic Simplification

Zhenyu Wang, Jianxi Huang, Zhida Sun et al.

CVPR 2025, arXiv:2406.05404
11 citations

Learning a Cross-Modal Schrödinger Bridge for Visual Domain Generalization

Hao Zheng, Jingjun Yi, Qi Bi et al.

NEURIPS 2025

LLM4GEN: Leveraging Semantic Representation of LLMs for Text-to-Image Generation

Mushui Liu, Yuhang Ma, Zhen Yang et al.

AAAI 2025, arXiv:2407.00737
33 citations

LOMIA: Label-Only Membership Inference Attacks against Pre-trained Large Vision-Language Models

Yihao LIU, Xinqi Lyu, Dong Wang et al.

NEURIPS 2025

Low-Biased General Annotated Dataset Generation

Dengyang Jiang, Haoyu Wang, Lei Zhang et al.

CVPR 2025, arXiv:2412.10831

MLLM-For3D: Adapting Multimodal Large Language Model for 3D Reasoning Segmentation

Jiaxin Huang, Runnan Chen, Ziwen Li et al.

NEURIPS 2025, arXiv:2503.18135
10 citations

MPG-SAM 2: Adapting SAM 2 with Mask Priors and Global Context for Referring Video Object Segmentation

Fu Rong, Meng Lan, Qian Zhang et al.

ICCV 2025, arXiv:2501.13667
3 citations

Noise Diffusion for Enhancing Semantic Faithfulness in Text-to-Image Synthesis

Boming Miao, Chunxiao Li, Xiaoxiao Wang et al.

CVPR 2025, arXiv:2411.16503
3 citations

OmniZoom: A Universal Plug-and-Play Paradigm for Cross-Device Smooth Zoom Interpolation

Xiaoan Zhu, Yue Zhao, Tianyang Hu et al.

NEURIPS 2025

OOD-Barrier: Build a Middle-Barrier for Open-Set Single-Image Test Time Adaptation via Vision Language Models

Boyang Peng, Sanqing Qu, Tianpei Zou et al.

NEURIPS 2025

PC-Net: Weakly Supervised Compositional Moment Retrieval via Proposal-Centric Network

Mingyao Zhou, Hao Sun, Wei Xie et al.

NEURIPS 2025 (oral)

Personalized Federated Learning for Spatio-Temporal Forecasting: A Dual Semantic Alignment-Based Contrastive Approach

Qingxiang Liu, Sheng Sun, Yuxuan Liang et al.

AAAI 2025, arXiv:2404.03702
16 citations

Rebalancing Contrastive Alignment with Bottlenecked Semantic Increments in Text-Video Retrieval

Jian Xiao, Zijie Song, Jialong Hu et al.

NEURIPS 2025, arXiv:2505.12499

ReCon: Region-Controllable Data Augmentation with Rectification and Alignment for Object Detection

Haowei Zhu, Tianxiang Pan, Rui Qin et al.

NEURIPS 2025 (spotlight), arXiv:2510.15783
1 citation

RespoDiff: Dual-Module Bottleneck Transformation for Responsible & Faithful T2I Generation

Silpa Vadakkeeveetil Sreelatha, Sauradip Nag, Muhammad Awais et al.

NEURIPS 2025, arXiv:2509.15257

SAGI: Semantically Aligned and Uncertainty Guided AI Image Inpainting

Paschalis Giakoumoglou, Dimitrios Karageorgiou, Symeon Papadopoulos et al.

ICCV 2025, arXiv:2502.06593
2 citations

Seg2Any: Open-set Segmentation-Mask-to-Image Generation with Precise Shape and Semantic Control

Danfeng Li, Hui Zhang, Sheng Wang et al.

NEURIPS 2025, arXiv:2506.00596
2 citations

Towards Semantic Equivalence of Tokenization in Multimodal LLM

Shengqiong Wu, Hao Fei, Xiangtai Li et al.

ICLR 2025, arXiv:2406.05127
58 citations

Towards Transformer-Based Aligned Generation with Self-Coherence Guidance

Shulei Wang, Wang Lin, Hai Huang et al.

CVPR 2025, arXiv:2503.17675
10 citations

VideoDPO: Omni-Preference Alignment for Video Diffusion Generation

Runtao Liu, Haoyu Wu, Zheng Ziqiang et al.

CVPR 2025, arXiv:2412.14167
75 citations

Compositional Text-to-Image Synthesis with Attention Map Control of Diffusion Models

Ruichen Wang, Zekang Chen, Chen Chen et al.

AAAI 2024, arXiv:2305.13921
93 citations

Exploring Pre-trained Text-to-Video Diffusion Models for Referring Video Object Segmentation

Xuelu Feng, Dongdong Chen, Junsong Yuan et al.

ECCV 2024, arXiv:2403.12042
17 citations

Fast Text-to-3D-Aware Face Generation and Manipulation via Direct Cross-modal Mapping and Geometric Regularization

Jinlu Zhang, Yiyi Zhou, Qiancheng Zheng et al.

ICML 2024, arXiv:2403.06702
7 citations

GarmentAligner: Text-to-Garment Generation via Retrieval-augmented Multi-level Corrections

Shiyue Zhang, Zheng Chong, Xujie Zhang et al.

ECCV 2024, arXiv:2408.12352
11 citations

Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs

Ling Yang, Zhaochen Yu, Chenlin Meng et al.

ICML 2024, arXiv:2401.11708
200 citations

Prioritized Semantic Learning for Zero-shot Instance Navigation

Xinyu Sun, Lizhao Liu, Hongyan Zhi et al.

ECCV 2024, arXiv:2403.11650
26 citations

Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation

Shilin Yan, Renrui Zhang, Ziyu Guo et al.

AAAI 2024, arXiv:2305.16318
58 citations

Semantic Lens: Instance-Centric Semantic Alignment for Video Super-resolution

AAAI 2024, arXiv:2312.07823
10 citations