Posters by Acyr Locatelli
4 papers found
Conference
Procedural Knowledge in Pretraining Drives Reasoning in Large Language Models
Laura Ruis, Maximilian Mozes, Juhan Bae et al.
ICLR 2025, arXiv:2411.12580 (28 citations)
Rope to Nope and Back Again: A New Hybrid Attention Strategy
Bowen Yang, Bharat Venkitesh, Dwaraknath Gnaneshwar Talupuru et al.
NeurIPS 2025, arXiv:2501.18795 (20 citations)
To Code or Not To Code? Exploring Impact of Code in Pre-training
Viraat Aryabumi, Yixuan Su, Raymond Ma et al.
ICLR 2025, arXiv:2408.10914 (44 citations)
Pushing Mixture of Experts to the Limit: Extremely Parameter Efficient MoE for Instruction Tuning
Ted Zadouri, Ahmet Üstün, Arash Ahmadian et al.
ICLR 2024, arXiv:2309.05444 (143 citations)