"long-context modeling" Papers
11 papers found
Differential Transformer
Tianzhu Ye, Li Dong, Yuqing Xia et al.
ICLR 2025 · arXiv:2410.05258
Hardware-aligned Hierarchical Sparse Attention for Efficient Long-term Memory Access
Xiang Hu, Jiaqi Leng, Jun Zhao et al.
NEURIPS 2025 · arXiv:2504.16795
3 citations
Mamba Goes HoME: Hierarchical Soft Mixture-of-Experts for 3D Medical Image Segmentation
Szymon Płotka, Gizem Mert, Maciej Chrabaszcz et al.
NEURIPS 2025 · arXiv:2507.06363
1 citation
MAP: Unleashing Hybrid Mamba-Transformer Vision Backbone's Potential with Masked Autoregressive Pretraining
Yunze Liu, Li Yi
CVPR 2025 · arXiv:2410.00871
9 citations
miniCTX: Neural Theorem Proving with (Long-)Contexts
Jiewen Hu, Thomas Zhu, Sean Welleck
ICLR 2025 · arXiv:2408.03350
24 citations
One-Minute Video Generation with Test-Time Training
Jiarui Xu, Shihao Han, Karan Dalal et al.
CVPR 2025 · arXiv:2504.05298
67 citations
Rope to Nope and Back Again: A New Hybrid Attention Strategy
Bowen Yang, Bharat Venkitesh, Dwaraknath Gnaneshwar Talupuru et al.
NEURIPS 2025 · arXiv:2501.18795
20 citations
SkyLadder: Better and Faster Pretraining via Context Window Scheduling
Tongyao Zhu, Qian Liu, Haonan Wang et al.
NEURIPS 2025 · arXiv:2503.15450
3 citations
Stuffed Mamba: Oversized States Lead to the Inability to Forget
Yingfa Chen, Xinrong Zhang, Shengding Hu et al.
COLM 2025
3 citations
Tiled Flash Linear Attention: More Efficient Linear RNN and xLSTM Kernels
Maximilian Beck, Korbinian Pöppel, Phillip Lippe et al.
NEURIPS 2025 · arXiv:2503.14376
8 citations
MEMORYLLM: Towards Self-Updatable Large Language Models
Yu Wang, Yifan Gao, Xiusi Chen et al.
ICML 2024 · arXiv:2402.04624
43 citations