"training dynamics" Papers

28 papers found

A Common Pitfall of Margin-based Language Model Alignment: Gradient Entanglement

Hui Yuan, Yifan Zeng, Yue Wu et al.

ICLR 2025 · arXiv:2410.13828
5 citations

Attention layers provably solve single-location regression

Pierre Marion, Raphaël Berthier, Gérard Biau et al.

ICLR 2025 · arXiv:2410.01537
11 citations

Beyond Random: Automatic Inner-loop Optimization in Dataset Distillation

Muquan Li, Hang Gou, Dongyang Zhang et al.

NEURIPS 2025 · arXiv:2510.04838
1 citation

Bridging Critical Gaps in Convergent Learning: How Representational Alignment Evolves Across Layers, Training, and Distribution Shifts

Chaitanya Kapoor, Sudhanshu Srivastava, Meenakshi Khosla

NEURIPS 2025 · arXiv:2502.18710
1 citation

Contrastive Learning with Data Misalignment: Feature Purity, Training Dynamics and Theoretical Generalization Guarantees

Jiawei Sun, Shuai Zhang, Hongkang Li et al.

NEURIPS 2025

Dynamic Loss-Based Sample Reweighting for Improved Large Language Model Pretraining

Daouda Sow, Herbert Woisetschläger, Saikiran Bulusu et al.

ICLR 2025 · arXiv:2502.06733
13 citations

Feature Averaging: An Implicit Bias of Gradient Descent Leading to Non-Robustness in Neural Networks

Binghui Li, Zhixuan Pan, Kaifeng Lyu et al.

ICLR 2025 · arXiv:2410.10322

Flatness is Necessary, Neural Collapse is Not: Rethinking Generalization via Grokking

Ting Han, Linara Adilova, Henning Petzka et al.

NEURIPS 2025 (oral) · arXiv:2509.17738
3 citations

From Next-Token to Mathematics: The Learning Dynamics of Mathematical Reasoning in Language Models

Shubhra Mishra, Gabriel Poesia, Noah Goodman

COLM 2025 · arXiv:2407.00900
4 citations

Global Convergence in Neural ODEs: Impact of Activation Functions

Tianxiang Gao, Siyuan Sun, Hailiang Liu et al.

ICLR 2025 · arXiv:2509.22436
3 citations

Memorization in Graph Neural Networks

Adarsh Jamadandi, Jing Xu, Adam Dziedzic et al.

NEURIPS 2025 · arXiv:2508.19352

On the Feature Learning in Diffusion Models

Andi Han, Wei Huang, Yuan Cao et al.

ICLR 2025 · arXiv:2412.01021
14 citations

On the Performance Analysis of Momentum Method: A Frequency Domain Perspective

Xianliang Li, Jun Luo, Zhiwei Zheng et al.

ICLR 2025 · arXiv:2411.19671
4 citations

PolyPythias: Stability and Outliers across Fifty Language Model Pre-Training Runs

Oskar van der Wal, Pietro Lesci, Max Müller-Eberstein et al.

ICLR 2025 · arXiv:2503.09543
16 citations

Position: Algebra Unveils Deep Learning - An Invitation to Neuroalgebraic Geometry

Giovanni Luca Marchetti, Vahid Shahverdi, Stefano Mereta et al.

ICML 2025 (spotlight)
9 citations

Scaling Off-Policy Reinforcement Learning with Batch and Weight Normalization

Daniel Palenicek, Florian Vogt, Joe Watson et al.

NEURIPS 2025 · arXiv:2502.07523
9 citations

The emergence of sparse attention: impact of data distribution and benefits of repetition

Nicolas Zucchet, Francesco D'Angelo, Andrew Lampinen et al.

NEURIPS 2025 (oral) · arXiv:2505.17863
7 citations

Trained Mamba Emulates Online Gradient Descent in In-Context Linear Regression

Jiarui Jiang, Wei Huang, Miao Zhang et al.

NEURIPS 2025 · arXiv:2509.23779
1 citation

Transformers Learn to Implement Multi-step Gradient Descent with Chain of Thought

Jianhao Huang, Zixuan Wang, Jason Lee

ICLR 2025 · arXiv:2502.21212
22 citations

Why Diffusion Models Don’t Memorize: The Role of Implicit Dynamical Regularization in Training

Tony Bonnaire, Raphaël Urfin, Giulio Biroli et al.

NEURIPS 2025 (oral)

Dynamic Data Selection for Efficient SSL via Coarse-to-Fine Refinement

Aditay Tripathi, Pradeep Shenoy, Anirban Chakraborty

ECCV 2024
3 citations

Evolving Subnetwork Training for Large Language Models

Hanqi Li, Lu Chen, Da Ma et al.

ICML 2024 · arXiv:2406.06962
2 citations

How Graph Neural Networks Learn: Lessons from Training Dynamics

Chenxiao Yang, Qitian Wu, David Wipf et al.

ICML 2024 · arXiv:2310.05105
2 citations

Learning Associative Memories with Gradient Descent

Vivien Cabannnes, Berfin Simsek, Alberto Bietti

ICML 2024

Rethinking Fast Adversarial Training: A Splitting Technique To Overcome Catastrophic Overfitting

Masoumeh Zareapoor, Pourya Shamsolmoali

ECCV 2024

Stability-Informed Initialization of Neural Ordinary Differential Equations

Theodor Westny, Arman Mohammadi, Daniel Jung et al.

ICML 2024 · arXiv:2311.15890
4 citations

United We Stand: Using Epoch-Wise Agreement of Ensembles to Combat Overfit

Uri Stern, Daniel Shwartz, Daphna Weinshall

AAAI 2024 · arXiv:2310.11077
6 citations

What is Dataset Distillation Learning?

William Yang, Ye Zhu, Zhiwei Deng et al.

ICML 2024 · arXiv:2406.04284
13 citations