Kwangjun Ahn

papers

511

total citations

papers (11)

Transformers learn to implement preconditioned gradient descent for in-context learning

NEURIPS 2023 · arXiv

252

citations

SGD with shuffling: optimal rates without component convexity and large epoch requirements

NEURIPS 2020 · arXiv

citations

Understanding Adam Optimizer via Online Learning of Updates: Adam is FTRL in Disguise

ICML 2024 · arXiv

citations

How to Escape Sharp Minima with Random Perturbations

ICML 2024 · arXiv

citations

General framework for online-to-nonconvex conversion: Schedule-free SGD is also effective for nonconvex optimization

ICML 2025 · arXiv

citations

Through the River: Understanding the Benefit of Schedule-Free Methods for Language Model Training

NEURIPS 2025 · arXiv

citations

Learning threshold neurons via edge of stability

NEURIPS 2023

citations

papers (11)

Transformers learn to implement preconditioned gradient descent for in-context learning

SGD with shuffling: optimal rates without component convexity and large epoch requirements

Efficient constrained sampling via the mirror-Langevin algorithm

The Crucial Role of Normalization in Sharpness-Aware Minimization

Reproducibility in Optimization: Theoretical Framework and Limits

Mirror Descent Maximizes Generalized Margin and Can Be Implemented Efficiently

Understanding Adam Optimizer via Online Learning of Updates: Adam is FTRL in Disguise

How to Escape Sharp Minima with Random Perturbations

General framework for online-to-nonconvex conversion: Schedule-free SGD is also effective for nonconvex optimization

Through the River: Understanding the Benefit of Schedule-Free Methods for Language Model Training

Learning threshold neurons via edge of stability
