"visual representation learning" Papers

31 papers found

A Comprehensive Overhaul of Multimodal Assistant with Small Language Models

Minjie Zhu, Yichen Zhu, Ning Liu et al.

AAAI 2025paperarXiv:2403.06199
27
citations

Do vision models perceive objects like toddlers ?

Arthur Aubret, Jochen Triesch

ICLR 2025

DS-VLM: Diffusion Supervision Vision Language Model

Zhen Sun, Yunhang Shen, Jie Li et al.

ICML 2025
1
citations

Harmonizing Visual Representations for Unified Multimodal Understanding and Generation

Size Wu, Wenwei Zhang, Lumin Xu et al.

ICCV 2025arXiv:2503.21979
37
citations

Learning Visual Proxy for Compositional Zero-Shot Learning

Shiyu Zhang, Cheng Yan, Yang Liu et al.

ICCV 2025arXiv:2501.13859

MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization

Siyuan Li, Luyuan Zhang, Zedong Wang et al.

CVPR 2025arXiv:2504.00999
7
citations

MiCo: Multi-image Contrast for Reinforcement Visual Reasoning

Xi Chen, Mingkang Zhu, Shaoteng Liu et al.

NEURIPS 2025arXiv:2506.22434

Mitigating the Human-Robot Domain Discrepancy in Visual Pre-training for Robotic Manipulation

Jiaming Zhou, Teli Ma, Kun-Yu Lin et al.

CVPR 2025arXiv:2406.14235
19
citations

Multimodal LLMs as Customized Reward Models for Text-to-Image Generation

Shijie Zhou, Ruiyi Zhang, Huaisheng Zhu et al.

ICCV 2025arXiv:2507.21391
7
citations

Nested Diffusion Models Using Hierarchical Latent Priors

Xiao Zhang, Ruoxi Jiang, Rebecca Willett et al.

CVPR 2025arXiv:2412.05984
2
citations

Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think

Sihyun Yu, Sangkyung Kwak, Huiwon Jang et al.

ICLR 2025arXiv:2410.06940
342
citations

Token Bottleneck: One Token to Remember Dynamics

Taekyung Kim, Dongyoon Han, Byeongho Heo et al.

NEURIPS 2025oralarXiv:2507.06543
1
citations

Autoencoding Conditional Neural Processes for Representation Learning

Victor Prokhorov, Ivan Titov, Siddharth N

ICML 2024arXiv:2305.18485

Denoising Autoregressive Representation Learning

Yazhe Li, Jorg Bornschein, Ting Chen

ICML 2024arXiv:2403.05196
7
citations

Efficient Unsupervised Visual Representation Learning with Explicit Cluster Balancing

Ioannis Maniadis Metaxas, Georgios Tzimiropoulos, ioannis Patras

ECCV 2024arXiv:2407.11168
2
citations

Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs

Shengbang Tong, Zhuang Liu, Yuexiang Zhai et al.

CVPR 2024arXiv:2401.06209
593
citations

FuseTeacher: Modality-fused Encoders are Strong Vision Supervisors

Chen-Wei Xie, Siyang Sun, Liming Zhao et al.

ECCV 2024

Just Cluster It: An Approach for Exploration in High-Dimensions using Clustering and Pre-Trained Representations

Stefan Sylvius Wagner Martinez, Stefan Harmeling

ICML 2024

Learning from Memory: Non-Parametric Memory Augmented Self-Supervised Learning of Visual Features

Thalles Silva, Helio Pedrini, Adín Ramírez Rivera

ICML 2024arXiv:2407.17486
7
citations

Modeling Caption Diversity in Contrastive Vision-Language Pretraining

Samuel Lavoie, Polina Kirichenko, Mark Ibrahim et al.

ICML 2024arXiv:2405.00740
39
citations

Multi-Label Cluster Discrimination for Visual Representation Learning

Xiang An, Kaicheng Yang, Xiangzi Dai et al.

ECCV 2024arXiv:2407.17331
14
citations

Open-World Dynamic Prompt and Continual Visual Representation Learning

Youngeun Kim, Jun Fang, Qin Zhang et al.

ECCV 2024arXiv:2409.05312
5
citations

Pose-Aware Self-Supervised Learning with Viewpoint Trajectory Regularization

Jiayun Wang, Yubei Chen, Stella Yu

ECCV 2024arXiv:2403.14973
6
citations

Rejuvenating image-GPT as Strong Visual Representation Learners

Sucheng Ren, Zeyu Wang, Hongru Zhu et al.

ICML 2024arXiv:2312.02147
14
citations

Self-supervised visual learning from interactions with objects

Arthur Aubret, Céline Teulière, Jochen Triesch

ECCV 2024arXiv:2407.06704
10
citations

Towards Efficient and Effective Text-to-Video Retrieval with Coarse-to-Fine Visual Representation Learning

Kaibin Tian, Yanhua Cheng, Yi Liu et al.

AAAI 2024paperarXiv:2401.00701
16
citations

Towards Latent Masked Image Modeling for Self-Supervised Visual Representation Learning

Yibing Wei, Abhinav Gupta, Pedro Morgado

ECCV 2024arXiv:2407.15837
16
citations

Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model

Lianghui Zhu, Bencheng Liao, Qian Zhang et al.

ICML 2024arXiv:2401.09417
1457
citations

Visual Alignment Pre-training for Sign Language Translation

Peiqi Jiao, Yuecong Min, Xilin CHEN

ECCV 2024
17
citations

When Do We Not Need Larger Vision Models?

Baifeng Shi, Ziyang Wu, Maolin Mao et al.

ECCV 2024arXiv:2403.13043
71
citations

X-Former: Unifying Contrastive and Reconstruction Learning for MLLMs

Swetha Sirnam, Jinyu Yang, Tal Neiman et al.

ECCV 2024arXiv:2407.13851
11
citations