"training efficiency" Papers
30 papers found
$100K or 100 Days: Trade-offs when Pre-Training with Academic Resources
Apoorv Khandelwal, Tian Yun, Nihal V. Nayak et al.
A CLIP-Powered Framework for Robust and Generalizable Data Selection
Suorong Yang, Peng Ye, Wanli Ouyang et al.
Angles Don't Lie: Unlocking Training-Efficient RL Through the Model's Own Signals
Qinsi Wang, Jinghan Ke, Hancheng Ye et al.
BAME: Block-Aware Mask Evolution for Efficient N:M Sparse Training
Chenyi Yang, Wenjie Nie, Yuxin Zhang et al.
Bifrost-1: Bridging Multimodal LLMs and Diffusion Models with Patch-level CLIP Latents
Han Lin, Jaemin Cho, Amir Zadeh et al.
Cut Your Losses in Large-Vocabulary Language Models
Erik Wijmans, Brody Huval, Alexander Hertzberg et al.
DataRater: Meta-Learned Dataset Curation
Dan Andrei Calian, Greg Farquhar, Iurii Kemaev et al.
Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization
Taishi Nakamura, Takuya Akiba, Kazuki Fujii et al.
Efficient Representativeness-Aware Coreset Selection
Zihao Cheng, Binrui Wu, Zhiwei Li et al.
Faster and Better 3D Splatting via Group Training
Chengbo Wang, Guozheng Ma, Yizhen Lao et al.
Fewer May Be Better: Enhancing Offline Reinforcement Learning with Reduced Dataset
Yiqin Yang, Quanwei Wang, Chenghao Li et al.
Improved Noise Schedule for Diffusion Training
Tiankai Hang, Shuyang Gu, Jianmin Bao et al.
Improving Transformer Based Line Segment Detection with Matched Predicting and Re-ranking
Xin Tong, Shi Peng, Baojie Tian et al.
Linear Combination of Saved Checkpoints Makes Consistency and Diffusion Models Better
Enshu Liu, Junyi Zhu, Zinan Lin et al.
Rectified Diffusion: Straightness Is Not Your Need in Rectified Flow
Fu-Yun Wang, Ling Yang, Zhaoyang Huang et al.
Reinforcement Learning-Guided Data Selection via Redundancy Assessment
Suorong Yang, Peijia Li, Furao Shen et al.
Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think
Sihyun Yu, Sangkyung Kwak, Huiwon Jang et al.
Representation Entanglement for Generation: Training Diffusion Transformers Is Much Easier Than You Think
Ge Wu, Shen Zhang, Ruijing Shi et al.
Scale Efficient Training for Large Datasets
Qing Zhou, Junyu Gao, Qi Wang
SkyLadder: Better and Faster Pretraining via Context Window Scheduling
Tongyao Zhu, Qian Liu, Haonan Wang et al.
Straight to Zero: Why Linearly Decaying the Learning Rate to Zero Works Best for LLMs
Shane Bergsma, Nolan Dey, Gurpreet Gosal et al.
TREAD: Token Routing for Efficient Architecture-agnostic Diffusion Training
Felix Krause, Timy Phan, Ming Gui et al.
T-SHIRT: Token-Selective Hierarchical Data Selection for Instruction Tuning
Yanjun Fu, Faisal Hamman, Sanghamitra Dutta
VarDrop: Enhancing Training Efficiency by Reducing Variate Redundancy in Periodic Time Series Forecasting
Junhyeok Kang, Yooju Shin, Jae-Gil Lee
Bucketed Ranking-based Losses for Efficient Training of Object Detectors
Feyza Yavuz, Baris Can Cam, Adnan Harun Dogan et al.
BWS: Best Window Selection Based on Sample Scores for Data Pruning across Broad Ranges
Hoyong Choi, Nohyun Ki, Hye Won Chung
Diversified Batch Selection for Training Acceleration
Feng Hong, Yueming Lyu, Jiangchao Yao et al.
EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language Models with 3D Parallelism
Yanxi Chen, Xuchen Pan, Yaliang Li et al.
Ranking-based Client Imitation Selection for Efficient Federated Learning
Chunlin Tian, Zhan Shi, Xinpeng Qin et al.
Sparse-IFT: Sparse Iso-FLOP Transformations for Maximizing Training Efficiency
Vithursan Thangarasa, Shreyas Saxena, Abhay Gupta et al.