Poster "layer normalization" Papers
5 papers found
Impact of Layer Norm on Memorization and Generalization in Transformers
Rishi Singhal, Jung-Eun Kim
NeurIPS 2025, arXiv:2511.10566
1 citation
Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN
Pengxiang Li, Lu Yin, Shiwei Liu
ICLR 2025, arXiv:2412.13795
26 citations
What Does It Mean to Be a Transformer? Insights from a Theoretical Hessian Analysis
Weronika Ormaniec, Felix Dangel, Sidak Pal Singh
ICLR 2025, arXiv:2410.10986
10 citations
On the Nonlinearity of Layer Normalization
Yunhao Ni, Yuxin Guo, Junlong Jia et al.
ICML 2024, arXiv:2406.01255
7 citations
Reducing Fine-Tuning Memory Overhead by Approximate and Memory-Sharing Backpropagation
Yuchen Yang, Yingdong Shi, Cheems Wang et al.
ICML 2024, arXiv:2406.16282
3 citations