"softmax function" Papers
2 papers found
Conference
From Attention to Activation: Unraveling the Enigmas of Large Language Models
Prannay Kaul, Chengcheng Ma, Ismail Elezi et al.
ICLR 2025, arXiv:2410.17174
8 citations
Eureka-Moments in Transformers: Multi-Step Tasks Reveal Softmax Induced Optimization Problems
David T. Hoffmann, Simon Schrodi, Jelena Bratulić et al.
ICML 2024, arXiv:2310.12956
11 citations