"vanishing gradients" Papers
6 papers found
Lambda-Skip Connections: the architectural component that prevents Rank Collapse
Federico Arangath Joseph, Jerome Sieber, Melanie Zeilinger et al.
ICLR 2025, arXiv:2410.10609
2 citations
Revisiting Glorot Initialization for Long-Range Linear Recurrences
Noga Bar, Mariia Seleznova, Yotam Alexander et al.
NeurIPS 2025, arXiv:2505.19827
Revisiting Residual Connections: Orthogonal Updates for Stable and Efficient Deep Networks
Giyeong Oh, Woohyun Cho, Siyeol Kim et al.
NeurIPS 2025, arXiv:2505.11881
Polygonal Unadjusted Langevin Algorithms: Creating stable and efficient adaptive algorithms for neural networks
Dongyoung Lim, Sotirios Sabanis
ICML 2024, arXiv:2105.13937
13 citations
Transformers Get Stable: An End-to-End Signal Propagation Theory for Language Models
Akhil Kedia, Mohd Abbas Zaidi, Sushil Khyalia et al.
ICML 2024, arXiv:2403.09635
11 citations
Universality of Linear Recurrences Followed by Non-linear Projections: Finite-Width Guarantees and Benefits of Complex Eigenvalues
Antonio Orvieto, Soham De, Caglar Gulcehre et al.
ICML 2024, arXiv:2307.11888
35 citations