"memory footprint optimization" Papers
2 papers found
Small Batch Size Training for Language Models: When Vanilla SGD Works, and Why Gradient Accumulation is Wasteful
Martin Marek, Sanae Lotfi, Aditya Somasundaram et al.
NeurIPS 2025 · arXiv:2507.07101
22 citations
Extreme Compression of Large Language Models via Additive Quantization
Vage Egiazarian, Andrei Panferov, Denis Kuznedelev et al.
ICML 2024 · arXiv:2401.06118
160 citations