Poster "feed-forward networks" Papers
8 papers found
Dense2MoE: Restructuring Diffusion Transformer to MoE for Efficient Text-to-Image Generation
Youwei Zheng, Yuxi Ren, Xin Xia et al.
ICCV 2025, arXiv:2510.09094
5 citations
Masked Gated Linear Unit
Yukito Tajima, Nakamasa Inoue, Yusuke Sekikawa et al.
NeurIPS 2025, arXiv:2506.23225
ParamMute: Suppressing Knowledge-Critical FFNs for Faithful Retrieval-Augmented Generation
Pengcheng Huang, Zhenghao Liu, Yukun Yan et al.
NeurIPS 2025, arXiv:2502.15543
4 citations
The Same but Different: Structural Similarities and Differences in Multilingual Language Modeling
Ruochen Zhang, Qinan Yu, Matianyu Zang et al.
ICLR 2025, arXiv:2410.09223
16 citations
Accelerating Transformer Pre-training with 2:4 Sparsity
Yuezhou Hu, Kang Zhao, Weiyu Huang et al.
ICML 2024, arXiv:2404.01847
18 citations
On the Diminishing Returns of Width for Continual Learning
Etash Guha, Vihan Lakshman
ICML 2024, arXiv:2403.06398
9 citations
ReLU Network with Width $d+\mathcal{O}(1)$ Can Achieve Optimal Approximation Rate
Chenghao Liu, Minghua Chen
ICML 2024
Vision Transformers as Probabilistic Expansion from Learngene
Qiufeng Wang, Xu Yang, Haokun Chen et al.
ICML 2024