ALLaM: Large Language Models for Arabic and English

49 citations

Top Authors

M Saiful Bari, Yazeed Alnumay, Norah Alzahrani, Nouf Alotaibi, Hisham Alyahya, AlRashed, Faisal Mirza, Shaykhah Alsubaie, Hassan Alahmed, Ghadah Alabduljabbar, Raghad Alkhathran, Yousef Almushayqih, Raneem Alnajim, Salman I Alsubaihi, Maryam Al Mansour, Saad Hassan, Majed Alrubaian, Ali Alammari, Zaki Alawami, Abdulmohsen Al-Thubaity, Ahmed Abdelali, Jeril Kuriakose, Abdalghani Abujabal, Nora Al-Twairesh, Areeb Alowisheq, Haidar Khan

Topics

large language models, Arabic language technologies, vocabulary expansion, knowledge transfer, multilingual pretraining, human preference alignment, language alignment, parallel data utilization

Abstract

We present ALLaM: Arabic Large Language Model, a series of large language models to support the ecosystem of Arabic Language Technologies (ALT). ALLaM is carefully trained considering the values of language alignment and knowledge transfer at scale. Our autoregressive decoder-only architecture models demonstrate how second-language acquisition via vocabulary expansion and pretraining on a mixture of Arabic and English text can steer a model towards a new language (Arabic) without any catastrophic forgetting in the original language (English). Furthermore, we highlight the effectiveness of using parallel/translated data to aid the process of knowledge alignment between languages. Finally, we show that extensive alignment with human preferences can significantly enhance the performance of a language model compared to models of a larger scale with lower quality alignment. ALLaM achieves state-of-the-art performance in various Arabic benchmarks, including MMLU Arabic, ACVA, and Arabic Exams. Our aligned models improve in both Arabic and English over their base aligned models.

Citation History

Jan 26, 2026: 46

Jan 27, 2026: 47 (+1)

Feb 3, 2026: 47

Feb 13, 2026: 49 (+2)