"transformer-based models" Papers

16 papers found

AI Progress Should Be Measured by Capability-Per-Resource, Not Scale Alone: A Framework for Gradient-Guided Resource Allocation in LLMs

David McCoy, Yulun Wu, Zachary Butzin-Dozier

NeurIPS 2025 · arXiv:2511.01077

Enhancing the Maximum Effective Window for Long-Term Time Series Forecasting

Jiahui Zhang, Zhengyang Zhou, Wenjie Du et al.

NeurIPS 2025

From Promise to Practice: Realizing High-performance Decentralized Training

Zesen Wang, Jiaojiao Zhang, Xuyang Wu et al.

ICLR 2025 · arXiv:2410.11998 · 3 citations

Infer Human’s Intentions Before Following Natural Language Instructions

Yanming Wan, Yue Wu, Yiping Wang et al.

AAAI 2025 · arXiv:2409.18073 · 6 citations

Retrieval Head Mechanistically Explains Long-Context Factuality

Wenhao Wu, Yizhong Wang, Guangxuan Xiao et al.

ICLR 2025 · arXiv:2404.15574 · 150 citations

SimpleTM: A Simple Baseline for Multivariate Time Series Forecasting

Hui Chen, Viet Luong, Lopamudra Mukherjee et al.

ICLR 2025 (oral) · 14 citations

TB-HSU: Hierarchical 3D Scene Understanding with Contextual Affordances

Wenting Xu, Viorela Ila, Luping Zhou et al.

AAAI 2025 · arXiv:2412.05596 · 2 citations

UniAP: Unifying Inter- and Intra-Layer Automatic Parallelism by Mixed Integer Quadratic Programming

Hao Lin, Ke Wu, Jie Li et al.

CVPR 2025 · arXiv:2307.16375 · 4 citations

xPatch: Dual-Stream Time Series Forecasting with Exponential Seasonal-Trend Decomposition

Artyom Stitsyuk, Jaesik Choi

AAAI 2025 · arXiv:2412.17323 · 36 citations

An Attentive Inductive Bias for Sequential Recommendation beyond the Self-Attention

Yehjin Shin, Jeongwhan Choi, Hyowon Wi et al.

AAAI 2024 · arXiv:2312.10325 · 104 citations

EgoPoseFormer: A Simple Baseline for Stereo Egocentric 3D Human Pose Estimation

Chenhongyi Yang, Anastasia Tkach, Shreyas Hampali et al.

ECCV 2024 · arXiv:2403.18080 · 5 citations

FedRA: A Random Allocation Strategy for Federated Tuning to Unleash the Power of Heterogeneous Clients

Shangchao Su, Bin Li, Xiangyang Xue

ECCV 2024 · arXiv:2311.11227 · 21 citations

Harnessing Joint Rain-/Detail-aware Representations to Eliminate Intricate Rains

Wu Ran, Peirong Ma, Zhiquan He et al.

ICLR 2024 · arXiv:2404.12091 · 4 citations

IM-Unpack: Training and Inference with Arbitrarily Low Precision Integers

Zhanpeng Zeng, Karthikeyan Sankaralingam, Vikas Singh

ICML 2024 · arXiv:2403.07339 · 1 citation

MERGE: Fast Private Text Generation

Zi Liang, Pinghui Wang, Ruofei Zhang et al.

AAAI 2024 · arXiv:2305.15769 · 14 citations

Outlier-Efficient Hopfield Layers for Large Transformer-Based Models

Jerry Yao-Chieh Hu, Pei-Hsuan Chang, Haozheng Luo et al.

ICML 2024 · arXiv:2404.03828 · 42 citations