The recent wave of Arabic large language models (Jais in the UAE, ALLaM in Saudi Arabia, and Karnak in Egypt) marks a decisive shift in the region's AI politics. These models are not merely technical products; they are instruments of state branding, digital sovereignty, linguistic competition, and future public-service infrastructure.
Their emergence reflects a shared Arab concern: most global AI systems still treat Arabic as a secondary language, often weak in dialects, cultural references, political context, and local institutional knowledge. But the three projects are very different in maturity, transparency, and strategic depth.
Jais is the most technically documented and open of the three. Launched in 2023 by G42’s Inception with MBZUAI and Cerebras, Jais began as a 13-billion-parameter Arabic-centric model and was released with both foundation and chat versions. Its paper describes it as a GPT-3-style decoder-only model trained on Arabic, English, and code, with strong Arabic performance compared with other open Arabic and multilingual models of similar size.
In 2024, the Jais family expanded dramatically. Inception released 20 models across sizes from 590M to 70B parameters, including models trained from scratch and models adapted from Llama 2. The public model card says the family was trained on up to 1.6 trillion tokens of Arabic, English, and code data, and released under Apache 2.0.
This makes Jais important for two reasons. First, it is not only a national showcase but also usable infrastructure: developers, researchers, and companies can download or fine-tune its open-weight models. Second, it represents a pragmatic engineering strategy. Instead of insisting that all Arabic sovereign AI must be trained from scratch, Jais combines from-scratch models with adapted models built on top of Llama 2, adding Arabic tokenizer expansion and Arabic-heavy continued training.
That is technically significant because Arabic tokenization is one of the hidden weaknesses of many global LLMs. Arabic often requires more tokens to express the same meaning, which raises inference cost and degrades fluency. Jais explicitly addresses this by adding 32,000 Arabic tokens to the Llama 2 tokenizer in adapted models.
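The cost effect is easy to see in miniature. The toy tokenizer below is purely illustrative (it is not the Jais or Llama 2 tokenizer): when a vocabulary lacks Arabic entries, words fall back to smaller pieces, here raw UTF-8 bytes, which is the worst case for byte-level BPE. Expanding the vocabulary with Arabic tokens collapses the count.

```python
# Illustrative sketch: how vocabulary coverage changes Arabic token counts.
# The "vocabularies" here are hypothetical; real tokenizers use learned
# subword merges, but the fallback dynamic is the same in spirit.

def tokenize(text: str, vocab: set) -> list:
    """Greedy word-level tokenization with byte fallback for OOV words."""
    tokens = []
    for word in text.split():
        if word in vocab:
            tokens.append(word)
        else:
            # Out-of-vocabulary: one token per UTF-8 byte, the worst case
            # for a byte-fallback tokenizer. Arabic letters are 2 bytes each.
            tokens.extend(f"<0x{b:02X}>" for b in word.encode("utf-8"))
    return tokens

sentence = "الذكاء الاصطناعي يغير العالم"  # "AI is changing the world"

no_arabic = set()                    # vocabulary with no Arabic coverage
with_arabic = set(sentence.split())  # vocabulary covering these words

print(len(tokenize(sentence, no_arabic)))    # dozens of byte tokens
print(len(tokenize(sentence, with_arabic)))  # 4 word tokens
```

Every extra token is billed and generated sequentially, which is why tokenizer expansion is a cost and quality lever, not a cosmetic detail.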
ALLaM, Saudi Arabia’s model, is more closely tied to state-enterprise deployment. It began under SDAIA and later became central to HUMAIN, Saudi Arabia’s PIF-owned AI company.
IBM announced in May 2024 that SDAIA’s ALLaM was operational on IBM watsonx, describing it as an Arabic LLM available through an enterprise AI platform. In September 2024, IBM said watsonx.ai and ALLaM would be available to Saudi government entities through the DEEM government cloud, explicitly linking the model to government AI transformation.
Technically, ALLaM is also well documented compared with Karnak. Its ICLR 2025 paper presents models at 7B, 13B, 34B, and 70B scales and says the system was trained for Arabic-English transfer, alignment, and human preference optimization. The paper reports Arabic benchmark evaluations across MMLU Arabic, Arabic exams, truthfulness, math, and other tasks, and states that evaluation included automatic, LLM-based, and human assessments. The model’s alignment process is especially notable: the authors report 25,854 human preference samples in Arabic and English, expanded into 245,000 samples after filtering for preference training.
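The ALLaM paper does not publish its alignment code, so the mechanics are worth a hedged sketch. One common formulation of human preference optimization, direct preference optimization (DPO), scores a "chosen" and a "rejected" response under the policy and a frozen reference model and penalizes the policy when it fails to widen the gap. The log-probability numbers below are invented for illustration only.

```python
import math

# Illustrative only: a DPO-style loss for one preference pair, of the
# general kind used in human preference optimization pipelines. This is
# NOT ALLaM's published code; its alignment implementation is not public.

def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float,
             beta: float = 0.1) -> float:
    """DPO loss for one pair, given summed log-probs of each response."""
    # How much more the policy prefers "chosen" over "rejected",
    # relative to the frozen reference model.
    margin = beta * ((policy_chosen - ref_chosen) -
                     (policy_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))

# Hypothetical values: the policy already prefers the chosen answer,
# so the loss falls below the no-preference baseline of ln(2) ≈ 0.693.
loss = dpo_loss(policy_chosen=-12.0, policy_rejected=-20.0,
                ref_chosen=-14.0, ref_rejected=-15.0, beta=0.1)
print(round(loss, 4))
```

Scaled across the 245,000 filtered preference samples the paper reports, an objective of this shape is what turns raw human judgments into a usable training signal.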
Saudi Arabia’s 2025 HUMAIN Chat release pushed ALLaM from research model to consumer-facing application. HUMAIN said the app is powered by ALLaM 34B and designed for Arabic speakers and Muslim users underserved by global generative AI.
Saudipedia describes HUMAIN Chat as supporting Arabic voice input, dialects, Arabic-English switching, real-time search, and hosting on Saudi infrastructure in compliance with the Saudi Personal Data Protection Law. It also says ALLaM 34B was refined with contributions from around 600 experts and 250 evaluators.
Karnak, Egypt’s model, is the least transparent of the three. Egypt announced it in February 2026 at Ai Everything MEA in Cairo as a national large language model and part of a broader AI infrastructure push.
ITIDA’s official release says Karnak is Egypt’s national LLM and claims it is the “highest-ranking Arabic LLM” in the 30–40B and 70–80B parameter classes. It also says Karnak will serve as a local intelligence foundation for startups, enterprises, and public institutions.
The same release links Karnak to concrete applications: SIA, a personalized AI tutor for Arabic and Egyptian history in high schools; a legal and regulatory assistant; Digital Egypt call-center auditing tools; medical AI tools for diabetic retinopathy, macular edema, and breast cancer; Torgoman for specialized translation; BelMasry for colloquial Arabic NLP; and Loghat for English-language education.
This is ambitious, but the evidence gap is striking. Unlike Jais and ALLaM, Karnak currently lacks a public technical paper, model card, training-data description, benchmark methodology, safety report, external audit, licensing terms, or public API/weights disclosure.
The official Egyptian statement gives claims, use cases, and strategic language, but not enough technical evidence to independently assess whether Karnak is a genuinely competitive foundation model, an adapted open model, a family of models, or a state-branded layer over multiple AI systems. That does not mean Karnak is weak; it means its public accountability is weak.
The three projects therefore represent three different visions of "sovereign AI." The UAE's Jais is the most open-weight and research-facing. Saudi Arabia's ALLaM is the most state-enterprise integrated, with a strong deployment path through IBM, DEEM Cloud, HUMAIN, and government infrastructure. Egypt's Karnak is the most public-sector-use-case oriented, but also the most opaque.
The central promise of all three is linguistic justice. Arabic is spoken by more than 400 million people, but it is not one simple language environment. It includes Modern Standard Arabic, religious Arabic, administrative Arabic, media Arabic, and dozens of dialects from Moroccan Darija to Egyptian Arabic to Gulf dialects. Global models often perform adequately in formal Arabic but struggle with dialect, sarcasm, political nuance, low-resource local references, and code-switching. Jais, ALLaM, and Karnak all claim to solve this problem by embedding Arabic more deeply into training and deployment.
But the real test is not whether they can answer in Arabic. The real test is whether they can handle Arabic as Arabs actually use it: dialect-switching, informal spelling, mixed Arabic-English professional language, political sensitivity, religious reference, and local institutional context. Jais has the strongest open ecosystem for researchers to test that. ALLaM has the strongest Saudi institutional machinery. Karnak has the largest potential Egyptian public-service testing ground, especially because Egypt’s dialect and media production dominate much of the Arabic-speaking public sphere.
The political economy is equally important. These models are part of a larger regional competition over who will own the Arab AI stack: data, compute, cloud, applications, safety standards, language infrastructure, and public-sector deployment. The UAE has built its AI posture around G42, MBZUAI, and global partnerships. Saudi Arabia is embedding AI within Vision 2030, PIF, SDAIA, HUMAIN, and sovereign cloud infrastructure. Egypt is trying to position itself as a regional AI hub through Digital Egypt, offshoring, Arabic NLP, public services, and lower-cost technical talent.
The risks should also be taken seriously. State-built LLMs can easily become instruments of centralized knowledge control if they are deployed in education, legal guidance, media, or public administration without transparency. "Cultural alignment" can be positive when it improves linguistic relevance, but dangerous when it becomes political filtering or ideological conformity.
Public-sector AI systems can reproduce state errors at scale: wrong legal advice, biased service triage, flawed education content, or opaque citizen profiling. And Arabic AI can deepen inequality if it serves governments and enterprises before journalists, schools, researchers, civil society, and independent developers.
The minimum governance standard for these models should be clear. Each should publish a model card, training-data categories, benchmark results, dialect coverage, hallucination testing, safety evaluation, political-bias testing, data-protection rules, red-team results, external audit summaries, and clear licensing/API access terms.
Jais is closest to this norm because its Hugging Face documentation includes model architecture, training data categories, intended uses, out-of-scope uses, and risks. ALLaM is partially there through its ICLR paper and enterprise deployment documentation. Karnak is not yet there.
The Arab LLM race is not mainly about who has the largest model. It is about who can build trustworthy Arabic AI infrastructure.