Al-ʿĀmmīyah (Colloquial Arabic) and Generative AI - a snapshot of its emerging text-to-text abilities: experiments and evaluation results

04 February 2026, 14:00 - 15:00

The Arabic language provides an extraordinary wealth of comparative material. It functions in four major registers: the daily language (Colloquial Arabic), the language of media (Modern Standard Arabic), the language of literature (Classical Arabic), and the religious language (Qurʾānic Arabic), all four overlapping to varying degrees. Further, Colloquial Arabic exists in a multitude of dialects. These differ greatly not just between regions (e.g. North Africa / Eastern Mediterranean), but also within regions (e.g. Lebanon / Palestine), within countries (e.g. Nablus / al-Khalīl), and even within smaller areas (e.g. city / villages). Historically, the dialects tended to be used only in speech, not in writing. However, this changed dramatically with the advent of the digital age. Social media now carries many written expressions of Colloquial Arabic, with highly diverse orthography.

In the context of recent AI developments, the following question arises: given that Arabic large language models are largely trained on Modern Standard Arabic, how well do they handle low-resource Arabic dialects? This question is explored in our RSE-funded project “Al-ʿĀmmīyah (Colloquial Arabic) and Generative AI – a snapshot of its emerging text-to-text abilities” (co-applicants: J. Zbrzezny (PI), E. Reiter, W. Zhao). The paper presents our experiments with three popular models (gpt-4o-mini, gpt-4o, gemini-2.0), and discusses how their outputs, from and into dialect, in response to specially engineered prompts were evaluated by our Palestinian team in the West Bank (A. Hroub et al.). The preliminary results of the evaluation show (1) the weakness of the models in creating authentic written expressions of local dialects, but also (2) their ability to level dialects towards Modern Standard Arabic. This is particularly concerning given the widespread use of autocorrect and predictive-text functions on smartphone keyboards, especially among younger generations, which, in the long term, will affect the inner diversity of Arabic, making it a linguistically poorer language.

Speaker: Jakub Zbrzezny
Venue: Meston G05 and Microsoft Teams
