"pre-training efficiency" Papers
4 papers found
Dataset Distillation via Knowledge Distillation: Towards Efficient Self-Supervised Pre-training of Deep Networks
Siddharth Joshi, Jiayi Ni, Baharan Mirzasoleiman
ICLR 2025, arXiv:2410.02116
4 citations
Emerging Property of Masked Token for Effective Pre-training
Hyesong Choi, Hunsang Lee, Seyoung Joung et al.
ECCV 2024, arXiv:2404.08330
10 citations
Exploring the Benefit of Activation Sparsity in Pre-training
Zhengyan Zhang, Chaojun Xiao, Qiujieli Qin et al.
ICML 2024, arXiv:2410.03440
6 citations
Getting the most out of your tokenizer for pre-training and domain adaptation
Gautier Dagan, Gabriel Synnaeve, Baptiste Roziere
ICML 2024, arXiv:2402.01035
62 citations