"sequence modeling" Papers

43 papers found

Achilles' Heel of Mamba: Essential difficulties of the Mamba architecture demonstrated by synthetic data

Tianyi Chen, Pengxiao Lin, Zhiwei Wang et al.

NEURIPS 2025 (spotlight) · arXiv:2509.17514 · 1 citation

Adaptive Computation Pruning for the Forgetting Transformer

Zhixuan Lin, Johan Obando-Ceron, Xu Owen He et al.

COLM 2025 (paper) · arXiv:2504.06949 · 3 citations

BlockScan: Detecting Anomalies in Blockchain Transactions

Jiahao Yu, Xian Wu, Hao Liu et al.

NEURIPS 2025 · arXiv:2410.04039 · 3 citations

Competition Dynamics Shape Algorithmic Phases of In-Context Learning

Core Francisco Park, Ekdeep Singh Lubana, Hidenori Tanaka

ICLR 2025 · arXiv:2412.01003 · 43 citations

Controllable Generation via Locally Constrained Resampling

Kareem Ahmed, Kai-Wei Chang, Guy Van den Broeck

ICLR 2025 · arXiv:2410.13111 · 9 citations

DeltaProduct: Improving State-Tracking in Linear RNNs via Householder Products

Julien Siems, Timur Carstensen, Arber Zela et al.

NEURIPS 2025 · arXiv:2502.10297 · 26 citations

Drama: Mamba-Enabled Model-Based Reinforcement Learning Is Sample and Parameter Efficient

Wenlong Wang, Ivana Dusparic, Yucheng Shi et al.

ICLR 2025 · arXiv:2410.08893 · 3 citations

EDELINE: Enhancing Memory in Diffusion-based World Models via Linear-Time Sequence Modeling

Jia-Hua Lee, Bor-Jiun Lin, Wei-Fang Sun et al.

NEURIPS 2025 (spotlight) · arXiv:2502.00466 · 2 citations

Enhancing the Maximum Effective Window for Long-Term Time Series Forecasting

Jiahui Zhang, Zhengyang Zhou, Wenjie Du et al.

NEURIPS 2025

Evolutionary Reasoning Does Not Arise in Standard Usage of Protein Language Models

Yasha Ektefaie, Andrew Shen, Lavik Jain et al.

NEURIPS 2025

Hankel Singular Value Regularization for Highly Compressible State Space Models

Paul Schwerdtner, Jules Berman, Benjamin Peherstorfer

NEURIPS 2025 · arXiv:2510.22951 · 2 citations

Improving Bilinear RNN with Closed-loop Control

Jiaxi Hu, Yongqi Pan, Jusen Du et al.

NEURIPS 2025 (spotlight) · arXiv:2506.02475 · 3 citations

Learning Video-Conditioned Policy on Unlabelled Data with Joint Embedding Predictive Transformer

Hao Luo, Zongqing Lu

ICLR 2025 · 4 citations

Limits of Deep Learning: Sequence Modeling through the Lens of Complexity Theory

Nikola Zubic, Federico Soldà, Aurelio Sulser et al.

ICLR 2025 · arXiv:2405.16674 · 18 citations

Longhorn: State Space Models are Amortized Online Learners

Bo Liu, Rui Wang, Lemeng Wu et al.

ICLR 2025 · arXiv:2407.14207 · 31 citations

MeshAnything V2: Artist-Created Mesh Generation with Adjacent Mesh Tokenization

Yiwen Chen, Yikai Wang, Yihao Luo et al.

ICCV 2025 · arXiv:2408.02555 · 72 citations

Neural Attention Search

Difan Deng, Marius Lindauer

NEURIPS 2025 · arXiv:2502.13251 · 1 citation

Parallel Sequence Modeling via Generalized Spatial Propagation Network

Hongjun Wang, Wonmin Byeon, Jiarui Xu et al.

CVPR 2025 · arXiv:2501.12381 · 3 citations

Plug, Play, and Generalize: Length Extrapolation with Pointer-Augmented Neural Memory

Svetha Venkatesh, Kien Do, Hung Le et al.

ICLR 2025

PointRWKV: Efficient RWKV-Like Model for Hierarchical Point Cloud Learning

Qingdong He, Jiangning Zhang, Jinlong Peng et al.

AAAI 2025 (paper) · arXiv:2405.15214 · 32 citations

Scaling Up Liquid-Resistance Liquid-Capacitance Networks for Efficient Sequence Modeling

Mónika Farsang, Radu Grosu

NEURIPS 2025 · arXiv:2505.21717 · 5 citations

SeerAttention: Self-distilled Attention Gating for Efficient Long-context Prefilling

Yizhao Gao, Zhichen Zeng, DaYou Du et al.

NEURIPS 2025

Selective Induction Heads: How Transformers Select Causal Structures in Context

Francesco D'Angelo, Francesco Croce, Nicolas Flammarion

ICLR 2025 · arXiv:2509.08184 · 6 citations

State Space Models are Provably Comparable to Transformers in Dynamic Token Selection

Naoki Nishikawa, Taiji Suzuki

ICLR 2025 · arXiv:2405.19036 · 6 citations

Structured Linear CDEs: Maximally Expressive and Parallel-in-Time Sequence Models

Benjamin Walker, Lingyi Yang, Nicola Muca Cirone et al.

NEURIPS 2025 (spotlight) · arXiv:2505.17761 · 8 citations

Tackling Data Corruption in Offline Reinforcement Learning via Sequence Modeling

Jiawei Xu, Rui Yang, Shuang Qiu et al.

ICLR 2025 (oral) · arXiv:2407.04285 · 4 citations

Tensor Product Attention Is All You Need

Yifan Zhang, Yifeng Liu, Huizhuo Yuan et al.

NEURIPS 2025 (spotlight) · arXiv:2501.06425 · 34 citations

Unsupervised Meta-Learning via In-Context Learning

Anna Vettoruzzo, Lorenzo Braccaioli, Joaquin Vanschoren et al.

ICLR 2025 · arXiv:2405.16124 · 3 citations

What One Cannot, Two Can: Two-Layer Transformers Provably Represent Induction Heads on Any-Order Markov Chains

Chanakya Ekbote, Ashok Vardhan Makkuva, Marco Bondaschi et al.

NEURIPS 2025 (spotlight) · arXiv:2508.07208 · 1 citation

ZeroS: Zero-Sum Linear Attention for Efficient Transformers

Jiecheng Lu, Xu Han, Yan Sun et al.

NEURIPS 2025 (spotlight) · arXiv:2602.05230

An Information-Theoretic Analysis of In-Context Learning

Hong Jun Jeon, Jason Lee, Qi Lei et al.

ICML 2024 · arXiv:2401.15530 · 37 citations

Efficient World Models with Context-Aware Tokenization

Vincent Micheli, Eloi Alonso, François Fleuret

ICML 2024 · arXiv:2406.19320 · 18 citations

From Generalization Analysis to Optimization Designs for State Space Models

Fusheng Liu, Qianxiao Li

ICML 2024 (oral) · arXiv:2405.02670 · 11 citations

How Transformers Learn Causal Structure with Gradient Descent

Eshaan Nichani, Alex Damian, Jason Lee

ICML 2024 · arXiv:2402.14735 · 102 citations

Imagine, Initialize, and Explore: An Effective Exploration Method in Multi-Agent Reinforcement Learning

Zeyang Liu, Lipeng Wan, Xinrui Yang et al.

AAAI 2024 (paper) · arXiv:2402.17978 · 7 citations

Mamba-ND: Selective State Space Modeling for Multi-Dimensional Data

Shufan Li, Aditya Grover, Harkanwar Singh

ECCV 2024 · arXiv:2402.05892 · 106 citations

Reinformer: Max-Return Sequence Modeling for Offline RL

Zifeng Zhuang, Dengyun Peng, Jinxin Liu et al.

ICML 2024 · arXiv:2405.08740 · 25 citations

Repeat After Me: Transformers are Better than State Space Models at Copying

Samy Jelassi, David Brandfonbrener, Sham Kakade et al.

ICML 2024 · arXiv:2402.01032 · 162 citations

Rethinking Decision Transformer via Hierarchical Reinforcement Learning

Yi Ma, Jianye Hao, Hebin Liang et al.

ICML 2024 · arXiv:2311.00267 · 14 citations

Self-Distillation Regularized Connectionist Temporal Classification Loss for Text Recognition: A Simple Yet Effective Approach

Ziyin Zhang, Ning Lu, Minghui Liao et al.

AAAI 2024 (paper) · arXiv:2308.08806 · 20 citations

Timer: Generative Pre-trained Transformers Are Large Time Series Models

Yong Liu, Haoran Zhang, Chenyu Li et al.

ICML 2024 · arXiv:2402.02368 · 148 citations

Universality of Linear Recurrences Followed by Non-linear Projections: Finite-Width Guarantees and Benefits of Complex Eigenvalues

Antonio Orvieto, Soham De, Caglar Gulcehre et al.

ICML 2024 · arXiv:2307.11888 · 35 citations

Vocabulary for Universal Approximation: A Linguistic Perspective of Mapping Compositions

Yongqiang Cai

ICML 2024 (spotlight) · arXiv:2305.12205 · 11 citations