On the Practices for Evaluating Generative AI

Recent years have seen remarkable advances in Generative AI, ranging from text-to-text models like ChatGPT, to text-to-image models like DALL-E 3, and most recently, text-to-video models like Sora. With groundbreaking innovations in unsupervised learning and scalable architectures, these generative models demonstrate wide-ranging capabilities in understanding textual instructions and producing responses as text, images, and videos. This significant progress has sparked worldwide curiosity about the risks and opportunities of Generative AI, resulting in many works evaluating its abilities. In this talk, I will discuss practices for evaluating Generative AI from three perspectives (evaluation metrics, rigorous system comparison, and transparency of the evaluation process) and shed light on future directions.

Wei Zhao
Meston 009