Poster papers matching "length generalization"
16 papers found
A Formal Framework for Understanding Length Generalization in Transformers
Xinting Huang, Andy Yang, Satwik Bhattamishra et al.
ICLR 2025 · arXiv:2410.02140 · 29 citations
Beyond Single-Task: Robust Multi-Task Length Generalization for LLMs
Yi Hu, Shijia Kang, Haotong Yang et al.
NeurIPS 2025 · arXiv:2502.11525 · 4 citations
Born a Transformer -- Always a Transformer? On the Effect of Pretraining on Architectural Abilities
Mayank Jobanputra, Yana Veitsman, Yash Sarrof et al.
NeurIPS 2025 · arXiv:2505.21785 · 3 citations
Generalizing Reasoning Problems to Longer Lengths
Changnan Xiao, Bing Liu
ICLR 2025 · 4 citations
Hardware-aligned Hierarchical Sparse Attention for Efficient Long-term Memory Access
Xiang Hu, Jiaqi Leng, Jun Zhao et al.
NeurIPS 2025 · arXiv:2504.16795 · 3 citations
Language Models Need Inductive Biases to Count Inductively
Yingshan Chang, Yonatan Bisk
ICLR 2025 · arXiv:2405.20131 · 20 citations
Length Generalization via Auxiliary Tasks
Pranjal Awasthi, Anupam Gupta, Ravi Kumar
NeurIPS 2025
Looped Transformers for Length Generalization
Ying Fan, Yilun Du, Kannan Ramchandran et al.
ICLR 2025 · arXiv:2409.15647 · 41 citations
Mamba Modulation: On the Length Generalization of Mamba Models
Peng Lu, Jerry Huang, Qiuhao Zeng et al.
NeurIPS 2025
Provable Length Generalization in Sequence Prediction via Spectral Filtering
Annie Marsden, Evan Dogariu, Naman Agarwal et al.
ICML 2025 · arXiv:2411.01035 · 1 citation
Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
Liliang Ren, Yang Liu, Yadong Lu et al.
ICLR 2025 · arXiv:2406.07522 · 122 citations
Transformers Provably Learn Chain-of-Thought Reasoning with Length Generalization
Yu Huang, Zixin Wen, Aarti Singh et al.
NeurIPS 2025 · arXiv:2511.07378 · 5 citations
Case-Based or Rule-Based: How Do Transformers Do the Math?
Yi Hu, Xiaojuan Tang, Haotong Yang et al.
ICML 2024 · arXiv:2402.17709 · 32 citations
Gated Linear Attention Transformers with Hardware-Efficient Training
Songlin Yang, Bailin Wang, Yikang Shen et al.
ICML 2024 · arXiv:2312.06635 · 329 citations
Mastering Symbolic Operations: Augmenting Language Models with Compiled Neural Networks
Yixuan Weng, Minjun Zhu, Fei Xia et al.
ICLR 2024 · arXiv:2304.01665 · 12 citations
Transformers Provably Learn Sparse Token Selection While Fully-Connected Nets Cannot
Zixuan Wang, Stanley Wei, Daniel Hsu et al.
ICML 2024 · arXiv:2406.06893 · 21 citations