"masked image modeling" Papers

27 papers found

Beyond [cls]: Exploring the True Potential of Masked Image Modeling Representations

Marcin Przewięźlikowski, Randall Balestriero, Wojciech Jasiński et al.

ICCV 2025 arXiv:2412.03215
4 citations

Dataset Ownership Verification for Pre-trained Masked Models

Yuechen Xie, Jie Song, Yicheng Shan et al.

ICCV 2025 arXiv:2507.12022
1 citation

Denoising with a Joint-Embedding Predictive Architecture

Chen Dengsheng, Jie Hu, Xiaoming Wei et al.

ICLR 2025 arXiv:2410.03755
5 citations

Enhancing Vision-Language Model with Unmasked Token Alignment

Hongsheng Li, Jihao Liu, Boxiao Liu et al.

ICLR 2025 arXiv:2405.19009

FSFM: A Generalizable Face Security Foundation Model via Self-Supervised Facial Representation Learning

Gaojian Wang, Feng Lin, Tong Wu et al.

CVPR 2025 arXiv:2412.12032
11 citations

Harnessing Massive Satellite Imagery with Efficient Masked Image Modeling

Fengxiang Wang, Hongzhen Wang, Di Wang et al.

ICCV 2025 arXiv:2406.11933
11 citations

Hybrid-TTA: Continual Test-time Adaptation via Dynamic Domain Shift Detection

Hyewon Park, Hyejin Park, Jueun Ko et al.

ICCV 2025 arXiv:2409.08566
1 citation

Learning Mask Invariant Mutual Information for Masked Image Modeling

Tao Huang, Yanxiang Ma, Shan You et al.

ICLR 2025 arXiv:2502.19718
4 citations

Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis

Jinbin Bai, Tian Ye, Wei Chow et al.

ICLR 2025 arXiv:2410.08261
44 citations

MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization

Siyuan Li, Luyuan Zhang, Zedong Wang et al.

CVPR 2025 arXiv:2504.00999
7 citations

MOCA: Self-supervised Representation Learning by Predicting Masked Online Codebook Assignments

Matthieu Cord, Antonin Vobecky, Oriane Siméoni et al.

ICLR 2025 arXiv:2307.09361
9 citations

Morphing Tokens Draw Strong Masked Image Models

Taekyung Kim, Byeongho Heo, Dongyoon Han

ICLR 2025 arXiv:2401.00254
3 citations

Reconstruction Target Matters in Masked Image Modeling for Cross-Domain Few-Shot Learning

Ran Ma, Yixiong Zou, Yuhua Li et al.

AAAI 2025 arXiv:2412.19101
3 citations

REOBench: Benchmarking Robustness of Earth Observation Foundation Models

Xiang Li, Yong Tao, Siyuan Zhang et al.

NeurIPS 2025 arXiv:2505.16793
3 citations

TESPEC: Temporally-Enhanced Self-Supervised Pretraining for Event Cameras

Mohammad Mohammadi, Ziyi Wu, Igor Gilitschenski

ICCV 2025 arXiv:2508.00913

AnatoMask: Enhancing Medical Image Segmentation with Reconstruction-guided Self-masking

Yuheng Li, Tianyu Luan, Yizhou Wu et al.

ECCV 2024 arXiv:2407.06468
17 citations

Bridging Remote Sensors with Multisensor Geospatial Foundation Models

Boran Han, Shuai Zhang, Xingjian Shi et al.

CVPR 2024 arXiv:2404.01260
45 citations

Contrastive Tuning: A Little Help to Make Masked Autoencoders Forget

Johannes Lehner, Benedikt Alkin, Andreas Fürst et al.

AAAI 2024 arXiv:2304.10520
22 citations

Emerging Property of Masked Token for Effective Pre-training

Hyesong Choi, Hunsang Lee, Seyoung Joung et al.

ECCV 2024 arXiv:2404.08330
10 citations

Learning with Unmasked Tokens Drives Stronger Vision Learners

Taekyung Kim, Sanghyuk Chun, Byeongho Heo et al.

ECCV 2024 arXiv:2310.13593
3 citations

SCD-Net: Spatiotemporal Clues Disentanglement Network for Self-Supervised Skeleton-Based Action Recognition

Cong Wu, Xiao-Jun Wu, Josef Kittler et al.

AAAI 2024 arXiv:2309.05834
26 citations

Stochastic positional embeddings improve masked image modeling

Amir Bar, Florian Bordes, Assaf Shocher et al.

ICML 2024 arXiv:2308.00566
6 citations

Towards Latent Masked Image Modeling for Self-Supervised Visual Representation Learning

Yibing Wei, Abhinav Gupta, Pedro Morgado

ECCV 2024 arXiv:2407.15837
16 citations

TTT-MIM: Test-Time Training with Masked Image Modeling for Denoising Distribution Shifts

Youssef Mansour, Xuyang Zhong, Serdar Caglar et al.

ECCV 2024
8 citations

Visual Representation Learning with Stochastic Frame Prediction

Huiwon Jang, Dongyoung Kim, Junsu Kim et al.

ICML 2024 (oral) arXiv:2406.07398
9 citations

ViTEraser: Harnessing the Power of Vision Transformers for Scene Text Removal with SegMIM Pretraining

Dezhi Peng, Chongyu Liu, Yuliang Liu et al.

AAAI 2024 arXiv:2306.12106
18 citations

X-Former: Unifying Contrastive and Reconstruction Learning for MLLMs

Swetha Sirnam, Jinyu Yang, Tal Neiman et al.

ECCV 2024 arXiv:2407.13851
11 citations