Implicit regularization of deep residual networks towards neural ODEs

21 citations

arXiv:2309.01213

#923 of 2297 papers in ICLR 2024

Top Authors

Pierre Marion, Yu-Han Wu, Michael Sander, Gérard Biau

Abstract

Residual neural networks are state-of-the-art deep learning models. Their continuous-depth analog, neural ordinary differential equations (ODEs), are also widely used. Despite their success, the link between the discrete and continuous models still lacks a solid mathematical foundation. In this article, we take a step in this direction by establishing an implicit regularization of deep residual networks towards neural ODEs, for nonlinear networks trained with gradient flow. We prove that if the network is initialized as a discretization of a neural ODE, then such a discretization holds throughout training. Our results are valid for a finite training time, and also as the training time tends to infinity provided that the network satisfies a Polyak-Łojasiewicz condition. Importantly, this condition holds for a family of residual networks where the residuals are two-layer perceptrons with an overparameterization in width that is only linear, and implies the convergence of gradient flow to a global minimum. Numerical experiments illustrate our results.
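To make the phrase "discretization of a neural ODE" concrete, here is a minimal numerical sketch. It is not the authors' code or experimental setup. It assumes a residual update of the form h_{k+1} = h_k + (1/L) f(h_k, θ_{k+1}), where f is a two-layer perceptron and the weights θ_k are samples of a smooth function of depth, θ_k = Θ(k/L). The sizes `D` and `M`, the tanh activation, the 1/M scaling, and the cosine/sine weight path are all hypothetical choices made for illustration. Under these assumptions the forward pass is an explicit Euler scheme, so its output should approach a fixed limit as the depth L grows.

```python
import numpy as np

D, M = 4, 8  # hypothetical hidden dimension and residual-branch width

# Fixed base matrices; the depth-dependent weights interpolate smoothly between them.
_rng = np.random.default_rng(0)
W0, W1 = _rng.normal(size=(2, M, D))
V0, V1 = _rng.normal(size=(2, D, M))


def weights(t):
    """Smooth weight path Theta(t) for t in [0, 1] (an assumption, not the paper's)."""
    W = np.cos(t) * W0 + np.sin(t) * W1
    V = np.cos(t) * V0 + np.sin(t) * V1
    return W, V


def residual(h, t):
    """Two-layer perceptron residual branch f(h, Theta(t))."""
    W, V = weights(t)
    return V @ np.tanh(W @ h) / M


def resnet(x, L):
    """Depth-L residual network with 1/L scaling: an explicit Euler scheme on [0, 1]."""
    h = x.copy()
    for k in range(L):
        h = h + residual(h, (k + 1) / L) / L
    return h


x = np.ones(D)
reference = resnet(x, 10_000)  # very deep network as a stand-in for the ODE solution
depths = (10, 100, 1000)
errors = [np.linalg.norm(resnet(x, L) - reference) for L in depths]
for L, err in zip(depths, errors):
    print(f"L = {L:5d}   distance to reference = {err:.3e}")
```

The printed distances should shrink as L increases, which is the sense in which a network whose weights vary smoothly with depth discretizes an ODE. The sketch only covers a fixed set of weights. The result summarized in the abstract concerns whether this structure is preserved while the weights are trained with gradient flow, which this example does not simulate.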

Citation History

[Chart: citation count over time, Jan 28, 2026 to Feb 13, 2026]