"transformer scaling" Papers
5 papers found
Infinity∞: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis
Jian Han, Jinlai Liu, Yi Jiang et al.
CVPR 2025, arXiv:2412.04431
201 citations
TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters
Haiyang Wang, Yue Fan, Muhammad Ferjad Naeem et al.
ICLR 2025, arXiv:2410.23168
10 citations
UMoE: Unifying Attention and FFN with Shared Experts
Yuanhang Yang, Chaozheng Wang, Jing Li
NeurIPS 2025 (spotlight), arXiv:2505.07260
PYRA: Parallel Yielding Re-Activation for Training-Inference Efficient Task Adaptation
Yizhe Xiong, Hui Chen, Tianxiang Hao et al.
ECCV 2024, arXiv:2403.09192
26 citations
Transformers Get Stable: An End-to-End Signal Propagation Theory for Language Models
Akhil Kedia, Mohd Abbas Zaidi, Sushil Khyalia et al.
ICML 2024, arXiv:2403.09635
11 citations