Poster papers on "length generalization"

16 papers found

A Formal Framework for Understanding Length Generalization in Transformers

Xinting Huang, Andy Yang, Satwik Bhattamishra et al.

ICLR 2025 · arXiv:2410.02140 · 29 citations

Beyond Single-Task: Robust Multi-Task Length Generalization for LLMs

Yi Hu, Shijia Kang, Haotong Yang et al.

NeurIPS 2025 · arXiv:2502.11525 · 4 citations

Born a Transformer -- Always a Transformer? On the Effect of Pretraining on Architectural Abilities

Mayank Jobanputra, Yana Veitsman, Yash Sarrof et al.

NeurIPS 2025 · arXiv:2505.21785 · 3 citations

Generalizing Reasoning Problems to Longer Lengths

Changnan Xiao, Bing Liu

ICLR 2025 · 4 citations

Hardware-aligned Hierarchical Sparse Attention for Efficient Long-term Memory Access

Xiang Hu, Jiaqi Leng, Jun Zhao et al.

NeurIPS 2025 · arXiv:2504.16795 · 3 citations

Language Models Need Inductive Biases to Count Inductively

Yingshan Chang, Yonatan Bisk

ICLR 2025 · arXiv:2405.20131 · 20 citations

Length Generalization via Auxiliary Tasks

Pranjal Awasthi, Anupam Gupta, Ravi Kumar

NeurIPS 2025

Looped Transformers for Length Generalization

Ying Fan, Yilun Du, Kannan Ramchandran et al.

ICLR 2025 · arXiv:2409.15647 · 41 citations

Mamba Modulation: On the Length Generalization of Mamba Models

Peng Lu, Jerry Huang, Qiuhao Zeng et al.

NeurIPS 2025

Provable Length Generalization in Sequence Prediction via Spectral Filtering

Annie Marsden, Evan Dogariu, Naman Agarwal et al.

ICML 2025 · arXiv:2411.01035 · 1 citation

Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling

Liliang Ren, Yang Liu, Yadong Lu et al.

ICLR 2025 · arXiv:2406.07522 · 122 citations

Transformers Provably Learn Chain-of-Thought Reasoning with Length Generalization

Yu Huang, Zixin Wen, Aarti Singh et al.

NeurIPS 2025 · arXiv:2511.07378 · 5 citations

Case-Based or Rule-Based: How Do Transformers Do the Math?

Yi Hu, Xiaojuan Tang, Haotong Yang et al.

ICML 2024 · arXiv:2402.17709 · 32 citations

Gated Linear Attention Transformers with Hardware-Efficient Training

Songlin Yang, Bailin Wang, Yikang Shen et al.

ICML 2024 · arXiv:2312.06635 · 329 citations

Mastering Symbolic Operations: Augmenting Language Models with Compiled Neural Networks

Yixuan Weng, Minjun Zhu, Fei Xia et al.

ICLR 2024 · arXiv:2304.01665 · 12 citations

Transformers Provably Learn Sparse Token Selection While Fully-Connected Nets Cannot

Zixuan Wang, Stanley Wei, Daniel Hsu et al.

ICML 2024 · arXiv:2406.06893 · 21 citations