Last modified: 22 May 2019 17:07

Course Overview

How do we assess whether an AI system works and is effective? Indeed, what does it mean for an AI system to be effective? In this course, we will look at different ways of evaluating AI systems, including performance on benchmark data sets, usefulness in helping users achieve a task, and subjective opinions (i.e., whether people like the system). Much of the course is devoted to statistics (including the R programming language), experimental design, and ethical issues. In practical and assessment work, students will evaluate deployed AI systems and critique evaluations in published AI research papers.

Course Details

Study Type: Postgraduate
Level: 5
Session: First Sub Session
Credit Points: 15 credits (7.5 ECTS credits)
Campus: Old Aberdeen
Sustained Study: No
  • Professor Ehud Reiter
  • Dr Nigel Beacham

What courses & programmes must have been taken before this course?

  • Either Any Postgraduate Programme (Studied) or Master of Engineering in Computing Science

What other courses must be taken with this course?


What courses cannot be taken with this course?


Are there a limited number of places available?


Course Description

The course will cover concepts, methods, techniques and tools/technologies for evaluating AI systems. Students will gain knowledge of statistical analysis (e.g., variance, correlation and regression) and learn to use software tools for statistical analysis. The course will introduce criteria for the evaluation of AI systems (e.g., usability, accessibility and learnability) and the theoretical evaluation of AI systems (e.g., guarantees regarding correctness, completeness, complexity and admissibility of heuristics). The course will provide a comprehensive exposition of issues pertaining to the empirical evaluation of AI systems, including the design of experiments (to address specific criteria/issues), human-driven experiments (including the design of forms and questionnaires, interviews, “talk-aloud” experiments, logging/filming, etc.), systems with optimal behaviours vs. (sub-optimal) human-like behaviour, crowd-sourcing of experiments (including Amazon’s “Mechanical Turk” and others), evaluation through gaming, and other related topics.
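As an informal illustration of the statistical quantities named above (variance, correlation and regression), the following is a minimal sketch rather than course material. The course itself uses R; Python is used here only for illustration. The function names, the data values and the scenario (benchmark accuracy versus mean user rating for five systems) are all hypothetical assumptions.

```python
from statistics import mean, variance


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return cov / (ss_x * ss_y) ** 0.5


def simple_linear_regression(xs, ys):
    """Least-squares fit of ys = slope * xs + intercept."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return slope, intercept


# Hypothetical data: benchmark accuracy and mean user rating (1-5 scale)
# for five imaginary AI systems. These numbers are made up for illustration.
accuracy = [0.62, 0.70, 0.75, 0.81, 0.90]
rating = [2.9, 3.1, 3.6, 3.8, 4.4]

if __name__ == "__main__":
    print("Sample variance of ratings:", variance(rating))
    print("Correlation (accuracy vs rating):",
          pearson_correlation(accuracy, rating))
    print("Regression (slope, intercept):",
          simple_linear_regression(accuracy, rating))
```

A sketch like this only describes a sample; whether a benchmark score actually predicts user opinion is the kind of question that experimental design and significance testing are meant to address.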

Contact Teaching Time

Information on contact teaching time is available from the course guide.

Teaching Breakdown

  • 10 Lectures during University weeks 16 - 17
  • 5 Practicals during University weeks 16 - 17

Summative Assessments

Group report (50%); Individual report (50%).

Resit: where a student fails the course overall, they will be given the opportunity to resit those parts of the course that they failed (pass marks will be carried forward).


Formative Assessment

There are no assessments for this course.


Formative feedback for in-course assessments will be provided in written form. Additionally, formative feedback on performance will be provided informally during practical sessions.

Course Learning Outcomes

