Wavelet Convolutions for Large Receptive Fields

348 citations

arXiv:2407.05848 PDF

#20 of 2387 papers in ECCV 2024

Top Authors

Shahaf Finder, Roy Amoyal, Eran Treister, Oren Freifeld

Abstract

In recent years, there have been attempts to increase the kernel size of Convolutional Neural Nets (CNNs) to mimic the global receptive field of Vision Transformers' (ViTs) self-attention blocks. That approach, however, quickly hit an upper bound and saturated way before achieving a global receptive field. In this work, we demonstrate that by leveraging the Wavelet Transform (WT), it is, in fact, possible to obtain very large receptive fields without suffering from over-parameterization, e.g., for a k × k receptive field, the number of trainable parameters in the proposed method grows only logarithmically with k. The proposed layer, named WTConv, can be used as a drop-in replacement in existing architectures, results in an effective multi-frequency response, and scales gracefully with the size of the receptive field. We demonstrate the effectiveness of the WTConv layer within ConvNeXt and MobileNetV2 architectures for image classification, as well as backbones for downstream tasks, and show it yields additional properties such as robustness to image corruption and increased response to shapes over textures. Our code will be released upon acceptance.
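The logarithmic-growth claim in the abstract can be illustrated with a small parameter count. The sketch below is not taken from the abstract: it assumes a depthwise setting, a fixed small kernel applied to the four sub-bands of a 2D wavelet decomposition at each level, and a receptive field that doubles with every level. The function names and the example values (64 channels, a 5 × 5 base kernel) are hypothetical and chosen only for illustration.

```python
import math


def dense_depthwise_params(channels: int, receptive_field: int) -> int:
    """Parameters of one depthwise conv whose kernel spans the full k x k field."""
    return channels * receptive_field ** 2


def levels_for_receptive_field(receptive_field: int, base_kernel: int) -> int:
    """Assumption: each wavelet level doubles the receptive field,
    so receptive_field is roughly base_kernel * 2**levels."""
    return max(1, math.ceil(math.log2(receptive_field / base_kernel)))


def wavelet_cascade_params(channels: int, base_kernel: int, levels: int) -> int:
    """Assumption: every level applies a base_kernel x base_kernel depthwise
    conv to each of the 4 sub-bands of a 2D wavelet decomposition."""
    return levels * 4 * channels * base_kernel ** 2


if __name__ == "__main__":
    channels, base_kernel = 64, 5  # hypothetical example values
    for k in (10, 20, 40, 80, 160):
        levels = levels_for_receptive_field(k, base_kernel)
        dense = dense_depthwise_params(channels, k)
        cascade = wavelet_cascade_params(channels, base_kernel, levels)
        print(f"k={k:4d}  levels={levels}  dense={dense:8d}  cascade={cascade:6d}")
```

Under these assumptions, doubling k adds one level and a fixed number of parameters to the cascade, while the dense kernel's parameter count quadruples.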

Citation History

348 citations as of Feb 13, 2026