"transformer models" Papers
24 papers found
A multiscale analysis of mean-field transformers in the moderate interaction regime
Giuseppe Bruno, Federico Pasqualotto, Andrea Agazzi
Can In-context Learning Really Generalize to Out-of-distribution Tasks?
Qixun Wang, Yifei Wang, Xianghua Ying et al.
Geometry of Decision Making in Language Models
Abhinav Joshi, Divyanshu Bhatt, Ashutosh Modi
Learning Randomized Algorithms with Transformers
Johannes von Oswald, Seijin Kobayashi, Yassir Akram et al.
LoTUS: Large-Scale Machine Unlearning with a Taste of Uncertainty
Christoforos N. Spartalis, Theodoros Semertzidis, Efstratios Gavves et al.
Mixture of Parrots: Experts improve memorization more than reasoning
Samy Jelassi, Clara Mohri, David Brandfonbrener et al.
Multi-modal brain encoding models for multi-modal stimuli
Subba Reddy Oota, Khushbu Pahwa, Mounika Marreddy et al.
Robust Message Embedding via Attention Flow-Based Steganography
Huayuan Ye, Shenzhuo Zhang, Shiqi Jiang et al.
SelectFormer in Data Markets: Privacy-Preserving and Efficient Data Selection for Transformers with Multi-Party Computation
Xu Ouyang, Felix Xiaozhu Lin, Yangfeng Ji
Self-Verifying Reflection Helps Transformers with CoT Reasoning
Zhongwei Yu, Wannian Xia, Xue Yan et al.
StreamBP: Memory-Efficient Exact Backpropagation for Long Sequence Training of LLMs
Qijun Luo, Mengqi Li, Lei Zhao et al.
Toward Understanding In-context vs. In-weight Learning
Bryan Chan, Xinyi Chen, Andras Gyorgy et al.
TPP-SD: Accelerating Transformer Point Process Sampling with Speculative Decoding
Shukai Gong, Yiyang Fu, Fengyuan Ran et al.
Tracing Representation Progression: Analyzing and Enhancing Layer-Wise Similarity
Jiachen Jiang, Jinxin Zhou, Zhihui Zhu
Case-Based or Rule-Based: How Do Transformers Do the Math?
Yi Hu, Xiaojuan Tang, Haotong Yang et al.
Delving into Differentially Private Transformer
Youlong Ding, Xueyang Wu, Yining Meng et al.
FrameQuant: Flexible Low-Bit Quantization for Transformers
Harshavardhan Adepu, Zhanpeng Zeng, Li Zhang et al.
Interpretability Illusions in the Generalization of Simplified Models
Dan Friedman, Andrew Lampinen, Lucas Dixon et al.
Learning Associative Memories with Gradient Descent
Vivien Cabannes, Berfin Simsek, Alberto Bietti
MoMo: Momentum Models for Adaptive Learning Rates
Fabian Schaipp, Ruben Ohana, Michael Eickenberg et al.
Towards an Understanding of Stepwise Inference in Transformers: A Synthetic Graph Navigation Model
Mikail Khona, Maya Okawa, Jan Hula et al.
Turb-Seg-Res: A Segment-then-Restore Pipeline for Dynamic Videos with Atmospheric Turbulence
Ripon Saha, Dehao Qin, Nianyi Li et al.
Vectorized Conditional Neural Fields: A Framework for Solving Time-dependent Parametric Partial Differential Equations
Jan Hagnberger, Marimuthu Kalimuthu, Daniel Musekamp et al.
What needs to go right for an induction head? A mechanistic study of in-context learning circuits and their formation
Aaditya Singh, Ted Moskovitz, Felix Hill et al.