"memory-efficient training" Papers

17 papers found

Beyond Random: Automatic Inner-loop Optimization in Dataset Distillation

Muquan Li, Hang Gou, Dongyang Zhang et al.

NEURIPS 2025 · arXiv:2510.04838
1 citation

COAP: Memory-Efficient Training with Correlation-Aware Gradient Projection

Jinqi Xiao, Shen Sang, Tiancheng Zhi et al.

CVPR 2025 · arXiv:2412.00071
6 citations

COAT: Compressing Optimizer states and Activations for Memory-Efficient FP8 Training

Haocheng Xi, Han Cai, Ligeng Zhu et al.

ICLR 2025 · arXiv:2410.19313
19 citations

Efficient Training of Neural Fractional-Order Differential Equation via Adjoint Backpropagation

Qiyu Kang, Xuhao Li, Kai Zhao et al.

AAAI 2025 · arXiv:2503.16666
3 citations

Gradient Multi-Normalization for Efficient LLM Training

Meyer Scetbon, Chao Ma, Wenbo Gong et al.

NEURIPS 2025
3 citations

Irrational Complex Rotations Empower Low-bit Optimizers

Zhen Tian, Xin Zhao, Ji-Rong Wen

NEURIPS 2025 · arXiv:2501.12896

LoRASuite: Efficient LoRA Adaptation Across Large Language Model Upgrades

Yanan Li, Fanxu Meng, Muhan Zhang et al.

NEURIPS 2025 · arXiv:2505.13515
2 citations

Mini-batch Coresets for Memory-efficient Language Model Training on Data Mixtures

Dang Nguyen, Wenhan Yang, Rathul Anand et al.

ICLR 2025 · arXiv:2407.19580
7 citations

Private Training Large-scale Models with Efficient DP-SGD

Liangyu Wang, Junxiao Wang, Jie Ren et al.

NEURIPS 2025

Second-Order Fine-Tuning without Pain for LLMs: A Hessian Informed Zeroth-Order Optimizer

Yanjun Zhao, Sizhe Dang, Haishan Ye et al.

ICLR 2025
30 citations

SUMO: Subspace-Aware Moment-Orthogonalization for Accelerating Memory-Efficient LLM Training

Yehonathan Refael, Guy Smorodinsky, Tom Tirer et al.

NEURIPS 2025 · arXiv:2505.24749
7 citations

DPZero: Private Fine-Tuning of Language Models without Backpropagation

Liang Zhang, Bingcong Li, Kiran Thekumparampil et al.

ICML 2024 · arXiv:2310.09639
22 citations

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection

Jiawei Zhao, Zhenyu Zhang, Beidi Chen et al.

ICML 2024 · arXiv:2403.03507
371 citations

Structured Inverse-Free Natural Gradient Descent: Memory-Efficient & Numerically-Stable KFAC

Wu Lin, Felix Dangel, Runa Eschenhagen et al.

ICML 2024

Teddy: Efficient Large-Scale Dataset Distillation via Taylor-Approximated Matching

Ruonan Yu, Songhua Liu, Jingwen Ye et al.

ECCV 2024 · arXiv:2410.07579
13 citations

TinyTrain: Resource-Aware Task-Adaptive Sparse Training of DNNs at the Data-Scarce Edge

Young Kwon, Rui Li, Stylianos Venieris et al.

ICML 2024 · arXiv:2307.09988
22 citations

ZO-AdaMU Optimizer: Adapting Perturbation by the Momentum and Uncertainty in Zeroth-Order Optimization

Shuoran Jiang, Qingcai Chen, Yang Xiang et al.

AAAI 2024 · arXiv:2312.15184
21 citations