"visual representation learning" Papers
31 papers found
A Comprehensive Overhaul of Multimodal Assistant with Small Language Models
Minjie Zhu, Yichen Zhu, Ning Liu et al.
Do vision models perceive objects like toddlers?
Arthur Aubret, Jochen Triesch
DS-VLM: Diffusion Supervision Vision Language Model
Zhen Sun, Yunhang Shen, Jie Li et al.
Harmonizing Visual Representations for Unified Multimodal Understanding and Generation
Size Wu, Wenwei Zhang, Lumin Xu et al.
Learning Visual Proxy for Compositional Zero-Shot Learning
Shiyu Zhang, Cheng Yan, Yang Liu et al.
MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization
Siyuan Li, Luyuan Zhang, Zedong Wang et al.
MiCo: Multi-image Contrast for Reinforcement Visual Reasoning
Xi Chen, Mingkang Zhu, Shaoteng Liu et al.
Mitigating the Human-Robot Domain Discrepancy in Visual Pre-training for Robotic Manipulation
Jiaming Zhou, Teli Ma, Kun-Yu Lin et al.
Multimodal LLMs as Customized Reward Models for Text-to-Image Generation
Shijie Zhou, Ruiyi Zhang, Huaisheng Zhu et al.
Nested Diffusion Models Using Hierarchical Latent Priors
Xiao Zhang, Ruoxi Jiang, Rebecca Willett et al.
Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think
Sihyun Yu, Sangkyung Kwak, Huiwon Jang et al.
Token Bottleneck: One Token to Remember Dynamics
Taekyung Kim, Dongyoon Han, Byeongho Heo et al.
Autoencoding Conditional Neural Processes for Representation Learning
Victor Prokhorov, Ivan Titov, Siddharth N
Denoising Autoregressive Representation Learning
Yazhe Li, Jorg Bornschein, Ting Chen
Efficient Unsupervised Visual Representation Learning with Explicit Cluster Balancing
Ioannis Maniadis Metaxas, Georgios Tzimiropoulos, Ioannis Patras
Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs
Shengbang Tong, Zhuang Liu, Yuexiang Zhai et al.
FuseTeacher: Modality-fused Encoders are Strong Vision Supervisors
Chen-Wei Xie, Siyang Sun, Liming Zhao et al.
Just Cluster It: An Approach for Exploration in High-Dimensions using Clustering and Pre-Trained Representations
Stefan Sylvius Wagner Martinez, Stefan Harmeling
Learning from Memory: Non-Parametric Memory Augmented Self-Supervised Learning of Visual Features
Thalles Silva, Helio Pedrini, Adín Ramírez Rivera
Modeling Caption Diversity in Contrastive Vision-Language Pretraining
Samuel Lavoie, Polina Kirichenko, Mark Ibrahim et al.
Multi-Label Cluster Discrimination for Visual Representation Learning
Xiang An, Kaicheng Yang, Xiangzi Dai et al.
Open-World Dynamic Prompt and Continual Visual Representation Learning
Youngeun Kim, Jun Fang, Qin Zhang et al.
Pose-Aware Self-Supervised Learning with Viewpoint Trajectory Regularization
Jiayun Wang, Yubei Chen, Stella Yu
Rejuvenating image-GPT as Strong Visual Representation Learners
Sucheng Ren, Zeyu Wang, Hongru Zhu et al.
Self-supervised visual learning from interactions with objects
Arthur Aubret, Céline Teulière, Jochen Triesch
Towards Efficient and Effective Text-to-Video Retrieval with Coarse-to-Fine Visual Representation Learning
Kaibin Tian, Yanhua Cheng, Yi Liu et al.
Towards Latent Masked Image Modeling for Self-Supervised Visual Representation Learning
Yibing Wei, Abhinav Gupta, Pedro Morgado
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
Lianghui Zhu, Bencheng Liao, Qian Zhang et al.
Visual Alignment Pre-training for Sign Language Translation
Peiqi Jiao, Yuecong Min, Xilin Chen
When Do We Not Need Larger Vision Models?
Baifeng Shi, Ziyang Wu, Maolin Mao et al.
X-Former: Unifying Contrastive and Reconstruction Learning for MLLMs
Swetha Sirnam, Jinyu Yang, Tal Neiman et al.