"memory-efficient training" Papers
17 papers found
Beyond Random: Automatic Inner-loop Optimization in Dataset Distillation
Muquan Li, Hang Gou, Dongyang Zhang et al.
NEURIPS 2025 · arXiv:2510.04838
1 citation
COAP: Memory-Efficient Training with Correlation-Aware Gradient Projection
Jinqi Xiao, Shen Sang, Tiancheng Zhi et al.
CVPR 2025 · arXiv:2412.00071
6 citations
COAT: Compressing Optimizer states and Activations for Memory-Efficient FP8 Training
Haocheng Xi, Han Cai, Ligeng Zhu et al.
ICLR 2025 · arXiv:2410.19313
19 citations
Efficient Training of Neural Fractional-Order Differential Equation via Adjoint Backpropagation
Qiyu Kang, Xuhao Li, Kai Zhao et al.
AAAI 2025 · arXiv:2503.16666
3 citations
Gradient Multi-Normalization for Efficient LLM Training
Meyer Scetbon, Chao Ma, Wenbo Gong et al.
NEURIPS 2025
3 citations
Irrational Complex Rotations Empower Low-bit Optimizers
Zhen Tian, Xin Zhao, Ji-Rong Wen
NEURIPS 2025 · arXiv:2501.12896
LoRASuite: Efficient LoRA Adaptation Across Large Language Model Upgrades
Yanan Li, Fanxu Meng, Muhan Zhang et al.
NEURIPS 2025 · arXiv:2505.13515
2 citations
Mini-batch Coresets for Memory-efficient Language Model Training on Data Mixtures
Dang Nguyen, Wenhan Yang, Rathul Anand et al.
ICLR 2025 · arXiv:2407.19580
7 citations
Private Training Large-scale Models with Efficient DP-SGD
Liangyu Wang, Junxiao Wang, Jie Ren et al.
NEURIPS 2025
Second-Order Fine-Tuning without Pain for LLMs: A Hessian Informed Zeroth-Order Optimizer
Yanjun Zhao, Sizhe Dang, Haishan Ye et al.
ICLR 2025
30 citations
SUMO: Subspace-Aware Moment-Orthogonalization for Accelerating Memory-Efficient LLM Training
Yehonathan Refael, Guy Smorodinsky, Tom Tirer et al.
NEURIPS 2025 · arXiv:2505.24749
7 citations
DPZero: Private Fine-Tuning of Language Models without Backpropagation
Liang Zhang, Bingcong Li, Kiran Thekumparampil et al.
ICML 2024 · arXiv:2310.09639
22 citations
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
Jiawei Zhao, Zhenyu Zhang, Beidi Chen et al.
ICML 2024 · arXiv:2403.03507
371 citations
Structured Inverse-Free Natural Gradient Descent: Memory-Efficient & Numerically-Stable KFAC
Wu Lin, Felix Dangel, Runa Eschenhagen et al.
ICML 2024
Teddy: Efficient Large-Scale Dataset Distillation via Taylor-Approximated Matching
Ruonan Yu, Songhua Liu, Jingwen Ye et al.
ECCV 2024 · arXiv:2410.07579
13 citations
TinyTrain: Resource-Aware Task-Adaptive Sparse Training of DNNs at the Data-Scarce Edge
Young Kwon, Rui Li, Stylianos Venieris et al.
ICML 2024 · arXiv:2307.09988
22 citations
ZO-AdaMU Optimizer: Adapting Perturbation by the Momentum and Uncertainty in Zeroth-Order Optimization
Shuoran Jiang, Qingcai Chen, Yang Xiang et al.
AAAI 2024 · arXiv:2312.15184
21 citations