Al-ʿĀmmīyah (Colloquial Arabic) and Generative AI - a snapshot of its emerging text-to-text abilities: experiments and evaluation results

The Arabic language provides an extraordinary wealth of comparative material. It functions in four major registers: the daily language (Colloquial Arabic), the language of media (Modern Standard Arabic), the language of literature (Classical Arabic), and the religious language (Qurʾānic Arabic), which overlap to varying degrees. Furthermore, Colloquial Arabic exists in a multitude of dialects. These differ greatly not just between regions (e.g. North Africa / Eastern Mediterranean), but also within regions (e.g. Lebanon / Palestine), within countries (e.g. Nablus / al-Khalīl), and even within smaller areas (e.g. city / villages). Historically, the dialects tended to be used only in speech, not in writing. However, this changed dramatically with the advent of the digital age: social media now carries many written expressions of Colloquial Arabic, with highly diverse orthography.

In the context of recent AI developments, the following question arises: given that they are largely trained on Modern Standard Arabic, how good are Arabic Large Language Models at handling low-resource Arabic dialects? This question is explored in our RSE-funded project “Al-ʿĀmmīyah (Colloquial Arabic) and Generative AI – a snapshot of its emerging text-to-text abilities” (co-applicants: J. Zbrzezny (PI), E. Reiter, W. Zhao). The paper presents our experiments with three popular models (gpt-4o-mini, gpt-4o, gemini-2.0) and discusses how their outputs from and into dialect, elicited with specially engineered prompts, were evaluated by our Palestinian team in the West Bank (A. Hroub et al.). The preliminary results of the evaluation show (1) the weakness of the models in creating authentic written expressions of local dialects, but also (2) their ability to level dialects towards Modern Standard Arabic. This is particularly concerning given the widespread use of autocorrect and predictive-text functions on smartphone keyboards, especially among younger generations, which, in the long term, will affect the internal diversity of Arabic, making it a linguistically poorer language.

Speaker
Jakub Zbrzezny
Venue
Meston G05 and Microsoft Teams
