HCAI Network Event: Beyond Pass/Fail: Extracting Insights from Large-Scale AI Agent Safety Evaluations

12 August 2025, 14:00 - 16:00

This is a past event

The University’s Human-Centred AI Network will be hosting an event on Tuesday 12 August, 2.00 - 3.30 pm, in New King's 1. The session will feature a guest talk from Mario Giulianelli, Senior Research Scientist at the UK AI Security Institute, who will present "Beyond Pass/Fail: Extracting Behavioural Insights from Large-Scale AI Agent Safety Evaluations". The talk will be followed by a Q&A.

From 3.00 pm onwards, Georgios Leontidis and Emma Morrison will outline the key details of the UKRI Turing AI Pioneer Interdisciplinary Fellowships call. Please see more information for this call below. Attendees interested in the call are encouraged to pitch their project idea (1 minute per pitch); this will be a great opportunity to connect with core AI experts and explore potential collaborations. A short networking session will follow.

This event is open to all. To register, please email interdisciplinary@abdn.ac.uk. If you would like to pitch an idea, please indicate this in your registration.

Beyond Pass/Fail: Extracting Behavioural Insights from Large-Scale AI Agent Safety Evaluations

Automated LLM-based agent evaluations have become a standard for assessing AI capabilities in both industry and government, but current reporting practices focus on what agents accomplish without resolution on how they accomplish it. In this talk, Dr Mario Giulianelli will discuss how UK AISI mines evaluation transcripts to (i) detect issues in evaluation tasks that could lead to misestimating capabilities, and (ii) understand how agent capabilities are evolving. Dr Giulianelli will survey a selection of AISI's methods, tools, and results, and outline research opportunities for better analysis instruments and their connection to safety and governance.

Dr Mario Giulianelli

Mario is a senior research scientist at the UK AI Security Institute, where he coordinates and oversees technical programmes for evaluating the impact of AI systems in high-stakes domains. His research at AISI focuses on the science of evaluation, which involves developing and applying techniques for the measurement of AI system capabilities so they are accurate, robust, and useful in decision making. More broadly, Mario's research explores the computational principles of perception, action, and interaction in artificial and natural cognitive systems. He is also set to join UCL’s Faculty of Brain Sciences as an associate professor in September.

---

About the Network: The interdisciplinary Human-Centred AI (HCAI) network involves a wide range of colleagues from across the University who have an interest in the intersection of AI technologies and the role played by humans in their development, as decision-makers, end-users, affected parties, collaborators, and designers. The network considers aspects related to linguistics, psychology, human creativity and culture, policy, bias, discrimination and the ramifications of generative AI, social and legal implications, philosophical elements of AI, AI-to-AI interactions, and more. The network’s aim is to enhance interdisciplinarity in these areas and to help develop interdisciplinary projects and funding proposals, supporting engagement activities that will enhance the external profile of the University.

Venue: New King's 1
Contact: To register for this event please email interdisciplinary@abdn.ac.uk
