Fourier Position Embedding: Enhancing Attention’s Periodic Extension for Length Generalization

arXiv:2412.17739 · ICML 2025 (#246 of 3340 papers) · 22 citations

Abstract

Extending the context length of Language Models (LMs) by improving Rotary Position Embedding (RoPE) has become a trend. While prior works mainly address RoPE's limitations within attention, this paper uncovers the adverse effects on length generalization from nearly all parts of LMs. Using Discrete Signal Processing theory, we show that RoPE enables periodic attention by implicitly achieving a Non-Uniform Discrete Fourier Transform. However, this periodicity is undermined by the spectrum damage caused by: 1) linear layers and activation functions outside of attention; 2) insufficiently trained frequency components brought by time-domain truncation. Building on our observations, we propose Fourier Position Embedding (FoPE), which enhances attention's frequency-domain properties to improve both its periodic extension and length generalization. FoPE constructs a Fourier Series and zeroes out the destructive frequency components, increasing model robustness against spectrum damage. Experiments across various model scales and benchmarks show that, within varying context windows, FoPE maintains more stable performance than other baselines. Several analyses and ablations lend further support to our method and theoretical modeling.
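The mechanism the abstract describes (each dimension pair carrying a truncated Fourier series instead of a single frequency, with under-trained frequencies zeroed out) can be sketched as follows. This is a minimal, hypothetical illustration based only on the abstract, not the authors' released implementation; the function name `fope_cos_sin`, the harmonic construction, and the `num_terms`, `freq_floor`, and stand-in coefficient values are all assumptions.

```python
import torch

def rope_frequencies(head_dim: int, base: float = 10000.0) -> torch.Tensor:
    # Standard RoPE: one dominant frequency per pair of dimensions.
    return base ** (-torch.arange(0, head_dim, 2).float() / head_dim)

def fope_cos_sin(positions: torch.Tensor,
                 head_dim: int,
                 num_terms: int = 4,
                 freq_floor: float = 1e-3,
                 base: float = 10000.0):
    """Sketch of FoPE-style cos/sin tables (assumed API, not the paper's code)."""
    freqs = rope_frequencies(head_dim, base)                          # (D/2,)
    # Zero out under-trained ("destructive") frequencies: their cos/sin
    # components collapse to position-independent constants.
    freqs = torch.where(freqs < freq_floor, torch.zeros_like(freqs), freqs)

    # Each pair gets a few extra frequencies (here: harmonics of its dominant
    # one) and mixing coefficients standing in for learnable weights.
    multi = freqs[:, None] * torch.arange(1, num_terms + 1).float()   # (D/2, T)
    coeffs = torch.full((freqs.shape[0], num_terms), 0.02)
    coeffs[:, 0] = 1.0                                                # keep the dominant term

    theta = positions[:, None, None] * multi[None, :, :]              # (L, D/2, T)
    cos = (coeffs * theta.cos()).sum(-1)                              # (L, D/2)
    sin = (coeffs * theta.sin()).sum(-1)
    return cos, sin

positions = torch.arange(128).float()
cos, sin = fope_cos_sin(positions, head_dim=64)   # applied to q/k like RoPE's cos/sin tables
print(cos.shape, sin.shape)                       # torch.Size([128, 32]) each
```

In this sketch, a zeroed frequency makes its sine term vanish and its cosine term a constant, so that component no longer depends on position; this is one way to read the abstract's claim that removing destructive components increases robustness against spectrum damage.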

Citation History

Jan 28, 2026: 0 citations
Feb 13, 2026: 22 citations