Dr. Edward Chuah is a Lecturer (equivalent to Assistant Professor in the US) in the Computing Science Department, University of Aberdeen. He received his PhD in Computer Science from the University of Warwick, where his supervisor was Dr. Arshad Jhumka. Before his PhD, he worked in Singapore as a research engineer, software engineer and lecturer (1.5 years in software engineering, 8 years in R&D and 2 years in teaching). After his PhD, he continued his research in failure diagnosis and developed new research in network security as a post-doc at Lancaster University, where his advisor was Prof. Neeraj Suri. He also taught at the University of Exeter. His current research focuses on failure diagnosis in High-Performance Computing (HPC) systems and on identifying attacks in large networks. He has worked on failure diagnosis since 2010. His work involves processing large volumes of real data to generate new insights into a distributed system and to improve its reliability and security.
Research topics: Large-scale systems dependability, Network security (attack identification), HPC reliability (failure diagnosis, failure prediction, error propagation and error detection), Data analysis.
Prospective research students: If you are interested in studying for a PhD in one of the aforementioned research topics, then email your CV, academic transcripts and proposal to Dr. Chuah at firstname.lastname@example.org for an informal discussion.
Service to the community:
2022: Reviewer for the Latin American Journal of Computing, ACM Computing Surveys.
2021: PC member for the 2nd International Conference on Information and Software Technologies (ICI2ST).
2020: Reviewer for the 2nd International Conference on Machine Learning and Intelligent Systems (MLIS).
2019: Reviewer for IEEE Access.
2018: Reviewer for ACM Computing Surveys, Software: Practice and Experience.
- PhD Computer Science, 2020 - The University of Warwick
- MSc Distributed Systems Engineering, 2004 - Lancaster University
- BSc (Hons) Computer Science, 2003 - The University of Leicester
Prizes and Awards
- J. Tinsley Oden Faculty Fellowship, The Oden Institute for Computational Engineering and Sciences, University of Texas at Austin, USA. September 2011. The Fellowship was to collaborate with the late Prof. Emeritus James C. Browne on research in HPC health monitoring and fault management.
- The Alan Turing Institute Doctoral Studentship, UK. September 2016 to March 2020.
Edward's main research interest is in large-scale systems dependability. His initial research focused on reliability [1-5], one of the attributes of dependability. Currently, his focus is on the security aspect of dependability, where he investigates security in large networks. He also has a general interest in machine learning, anomaly detection, causal inference and software security.
Edward's expertise is in system failure diagnostics and data analysis. He is the main developer of FDiag, a system log-based failure diagnostics toolkit. FDiag has been used by HPC systems administrators at the Texas Advanced Computing Center to uncover previously unknown causes of compute node soft lockups. He has also developed several other system log-based diagnostics tools. ANCOR is a novel anomaly-correlation approach that links system resource usage anomalies with system failures. CORRMEXT is a new workflow that identifies patterns of error propagation on large supercomputer systems. FDiag, CORRMEXT and the other tools are available on GitHub.
- E. Chuah, A. Jhumka, S. Alt, R.T. Evans, N. Suri, Failure Diagnosis for Cluster Systems Using Partial Correlations, in Proceedings of IEEE International Symposium on Parallel and Distributed Processing with Applications (ISPA), 2021.
- E. Chuah, A. Jhumka, S. Alt, D.B.-Thomert, J.C. Browne, M. Parashar, Towards Comprehensive Dependability-Driven Resource Use and Message Log-Analysis for Cluster Systems Diagnosis, Journal of Parallel and Distributed Computing, 2019.
- E. Chuah, A. Jhumka, J.C. Browne, N. Gurumdimma, S. Narasimhamurthy, B. Barth, Using Message Logs and Resource Use Data for Cluster Failure Diagnosis, in Proceedings of IEEE International Conference on High-Performance Computing, Data and Analytics (HiPC), 2016.
- E. Chuah, A. Jhumka, S. Narasimhamurthy, J. Hammond, J.C. Browne, B. Barth, Linking Resource Usage Anomalies with System Failures from Cluster Log Data, in Proceedings of IEEE International Symposium on Reliable Distributed Systems (SRDS), 2013.
- E. Chuah, S. Kuo, P. Hiew, W.C. Tjhi, G. Lee, J. Hammond, M.T. Michalewicz, T. Hung, J.C. Browne, Diagnosing the Root-Causes of Failures from Cluster Log-Files, in Proceedings of IEEE International Conference on High-Performance Computing (HiPC), 2010.
- E. Chuah, N. Suri, A. Jhumka, S. Alt, Challenges in Identifying Network Attacks Using NetFlow Data, in Proceedings of IEEE International Symposium on Network Computing and Applications (NCA), 2021.
I am currently accepting PhD students in Computing Science.
Please get in touch if you would like to discuss your research ideas further.
Edward has taught courses ranging from Learning from Data to Software Engineering and Algorithm Analysis, to both undergraduate and postgraduate students.
An Empirical Study of Major Page Faults for Failure Diagnosis in Cluster Systems. Journal of Supercomputing. Contributions to Journals: Articles
A Survey of Log-Correlation Tools for Failure Diagnosis and Prediction in Cluster Systems. IEEE Access, vol. 10, pp. 133487-133503. Contributions to Journals: Articles
Challenges in Identifying Network Attacks Using Netflow Data. Chapters in Books, Reports and Conference Proceedings: Conference Proceedings
Failure Diagnosis for Cluster Systems using Partial Correlations. Chapters in Books, Reports and Conference Proceedings: Conference Proceedings
Sentiment Analysis based Error Detection for Large-Scale Systems. Chapters in Books, Reports and Conference Proceedings: Conference Proceedings
Using Resource Use Data and System Logs for HPC System Error Propagation and Recovery Diagnosis. Chapters in Books, Reports and Conference Proceedings: Conference Proceedings
Towards comprehensive dependability-driven resource use and message log-analysis for HPC systems diagnosis. Journal of Parallel and Distributed Computing (JPDC), vol. 132, pp. 95-112. Contributions to Journals: Articles
Enabling Dependability-Driven Resource Use and Message Log-Analysis for Cluster System Diagnosis. Chapters in Books, Reports and Conference Proceedings: Conference Proceedings
Using Message Logs and Resource Use Data for Cluster Failure Diagnosis. Chapters in Books, Reports and Conference Proceedings: Conference Proceedings
CRUDE: Combining Resource Usage Data and Error Logs for Accurate Error Detection in Large-Scale Distributed Systems. Chapters in Books, Reports and Conference Proceedings: Conference Proceedings