"transformer models" Papers

24 papers found

A multiscale analysis of mean-field transformers in the moderate interaction regime

Giuseppe Bruno, Federico Pasqualotto, Andrea Agazzi

NEURIPS 2025 (oral) · arXiv:2509.25040
8 citations

Can In-context Learning Really Generalize to Out-of-distribution Tasks?

Qixun Wang, Yifei Wang, Xianghua Ying et al.

ICLR 2025 · arXiv:2410.09695
16 citations

Geometry of Decision Making in Language Models

Abhinav Joshi, Divyanshu Bhatt, Ashutosh Modi

NEURIPS 2025 · arXiv:2511.20315
1 citation

Learning Randomized Algorithms with Transformers

Johannes von Oswald, Seijin Kobayashi, Yassir Akram et al.

ICLR 2025 · arXiv:2408.10818
1 citation

LoTUS: Large-Scale Machine Unlearning with a Taste of Uncertainty

Christoforos N. Spartalis, Theodoros Semertzidis, Efstratios Gavves et al.

CVPR 2025 · arXiv:2503.18314
8 citations

Mixture of Parrots: Experts improve memorization more than reasoning

Samy Jelassi, Clara Mohri, David Brandfonbrener et al.

ICLR 2025 · arXiv:2410.19034
14 citations

Multi-modal brain encoding models for multi-modal stimuli

Subba Reddy Oota, Khushbu Pahwa, Mounika Marreddy et al.

ICLR 2025 · arXiv:2505.20027
10 citations

Robust Message Embedding via Attention Flow-Based Steganography

Huayuan Ye, Shenzhuo Zhang, Shiqi Jiang et al.

CVPR 2025 · arXiv:2405.16414
5 citations

SelectFormer in Data Markets: Privacy-Preserving and Efficient Data Selection for Transformers with Multi-Party Computation

Xu Ouyang, Felix Xiaozhu Lin, Yangfeng Ji

ICLR 2025

Self-Verifying Reflection Helps Transformers with CoT Reasoning

Zhongwei Yu, Wannian Xia, Xue Yan et al.

NEURIPS 2025 · arXiv:2510.12157
2 citations

StreamBP: Memory-Efficient Exact Backpropagation for Long Sequence Training of LLMs

Qijun Luo, Mengqi Li, Lei Zhao et al.

NEURIPS 2025 · arXiv:2506.03077
1 citation

Toward Understanding In-context vs. In-weight Learning

Bryan Chan, Xinyi Chen, Andras Gyorgy et al.

ICLR 2025 · arXiv:2410.23042
15 citations

TPP-SD: Accelerating Transformer Point Process Sampling with Speculative Decoding

Shukai Gong, Yiyang Fu, Fengyuan Ran et al.

NEURIPS 2025 (oral) · arXiv:2507.09252

Tracing Representation Progression: Analyzing and Enhancing Layer-Wise Similarity

Jiachen Jiang, Jinxin Zhou, Zhihui Zhu

ICLR 2025 · arXiv:2406.14479
18 citations

Case-Based or Rule-Based: How Do Transformers Do the Math?

Yi Hu, Xiaojuan Tang, Haotong Yang et al.

ICML 2024 · arXiv:2402.17709
32 citations

Delving into Differentially Private Transformer

Youlong Ding, Xueyang Wu, Yining Meng et al.

ICML 2024 · arXiv:2405.18194
11 citations

FrameQuant: Flexible Low-Bit Quantization for Transformers

Harshavardhan Adepu, Zhanpeng Zeng, Li Zhang et al.

ICML 2024 · arXiv:2403.06082
16 citations

Interpretability Illusions in the Generalization of Simplified Models

Dan Friedman, Andrew Lampinen, Lucas Dixon et al.

ICML 2024 · arXiv:2312.03656
20 citations

Learning Associative Memories with Gradient Descent

Vivien Cabannes, Berfin Simsek, Alberto Bietti

ICML 2024

MoMo: Momentum Models for Adaptive Learning Rates

Fabian Schaipp, Ruben Ohana, Michael Eickenberg et al.

ICML 2024 · arXiv:2305.07583
20 citations

Towards an Understanding of Stepwise Inference in Transformers: A Synthetic Graph Navigation Model

Mikail Khona, Maya Okawa, Jan Hula et al.

ICML 2024 · arXiv:2402.07757
10 citations

Turb-Seg-Res: A Segment-then-Restore Pipeline for Dynamic Videos with Atmospheric Turbulence

Ripon Saha, Dehao Qin, Nianyi Li et al.

CVPR 2024 · arXiv:2404.13605
9 citations

Vectorized Conditional Neural Fields: A Framework for Solving Time-dependent Parametric Partial Differential Equations

Jan Hagnberger, Marimuthu Kalimuthu, Daniel Musekamp et al.

ICML 2024 (oral) · arXiv:2406.03919
10 citations

What needs to go right for an induction head? A mechanistic study of in-context learning circuits and their formation

Aaditya Singh, Ted Moskovitz, Felix Hill et al.

ICML 2024 (spotlight) · arXiv:2404.07129
64 citations