"masked image modeling" Papers
27 papers found
Beyond [cls]: Exploring the True Potential of Masked Image Modeling Representations
Marcin Przewięźlikowski, Randall Balestriero, Wojciech Jasiński et al.
Dataset Ownership Verification for Pre-trained Masked Models
Yuechen Xie, Jie Song, Yicheng Shan et al.
Denoising with a Joint-Embedding Predictive Architecture
Chen Dengsheng, Jie Hu, Xiaoming Wei et al.
Enhancing Vision-Language Model with Unmasked Token Alignment
Hongsheng Li, Jihao Liu, Boxiao Liu et al.
FSFM: A Generalizable Face Security Foundation Model via Self-Supervised Facial Representation Learning
Gaojian Wang, Feng Lin, Tong Wu et al.
Harnessing Massive Satellite Imagery with Efficient Masked Image Modeling
Fengxiang Wang, Hongzhen Wang, Di Wang et al.
Hybrid-TTA: Continual Test-time Adaptation via Dynamic Domain Shift Detection
Hyewon Park, Hyejin Park, Jueun Ko et al.
Learning Mask Invariant Mutual Information for Masked Image Modeling
Tao Huang, Yanxiang Ma, Shan You et al.
Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis
Jinbin Bai, Tian Ye, Wei Chow et al.
MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization
Siyuan Li, Luyuan Zhang, Zedong Wang et al.
MOCA: Self-supervised Representation Learning by Predicting Masked Online Codebook Assignments
Matthieu Cord, Antonin Vobecky, Oriane Siméoni et al.
Morphing Tokens Draw Strong Masked Image Models
Taekyung Kim, Byeongho Heo, Dongyoon Han
Reconstruction Target Matters in Masked Image Modeling for Cross-Domain Few-Shot Learning
Ran Ma, Yixiong Zou, Yuhua Li et al.
REOBench: Benchmarking Robustness of Earth Observation Foundation Models
Xiang Li, Yong Tao, Siyuan Zhang et al.
TESPEC: Temporally-Enhanced Self-Supervised Pretraining for Event Cameras
Mohammad Mohammadi, Ziyi Wu, Igor Gilitschenski
AnatoMask: Enhancing Medical Image Segmentation with Reconstruction-guided Self-masking
Yuheng Li, Tianyu Luan, Yizhou Wu et al.
Bridging Remote Sensors with Multisensor Geospatial Foundation Models
Boran Han, Shuai Zhang, Xingjian Shi et al.
Contrastive Tuning: A Little Help to Make Masked Autoencoders Forget
Johannes Lehner, Benedikt Alkin, Andreas Fürst et al.
Emerging Property of Masked Token for Effective Pre-training
Hyesong Choi, Hunsang Lee, Seyoung Joung et al.
Learning with Unmasked Tokens Drives Stronger Vision Learners
Taekyung Kim, Sanghyuk Chun, Byeongho Heo et al.
SCD-Net: Spatiotemporal Clues Disentanglement Network for Self-Supervised Skeleton-Based Action Recognition
Cong Wu, Xiao-Jun Wu, Josef Kittler et al.
Stochastic positional embeddings improve masked image modeling
Amir Bar, Florian Bordes, Assaf Shocher et al.
Towards Latent Masked Image Modeling for Self-Supervised Visual Representation Learning
Yibing Wei, Abhinav Gupta, Pedro Morgado
TTT-MIM: Test-Time Training with Masked Image Modeling for Denoising Distribution Shifts
Youssef Mansour, Xuyang Zhong, Serdar Caglar et al.
Visual Representation Learning with Stochastic Frame Prediction
Huiwon Jang, Dongyoung Kim, Junsu Kim et al.
ViTEraser: Harnessing the Power of Vision Transformers for Scene Text Removal with SegMIM Pretraining
Dezhi Peng, Chongyu Liu, Yuliang Liu et al.
X-Former: Unifying Contrastive and Reconstruction Learning for MLLMs
Swetha Sirnam, Jinyu Yang, Tal Neiman et al.