CS4042: DATA ENGINEERING (2025-2026)

Last modified: 29 Aug 2025 13:16

Course Overview

The aim of this course is to provide students with the specialist knowledge, understanding and skills required to develop modern data engineering applications. The course builds on core computer science subjects such as software engineering, distributed systems, and enterprise computing along with AI to engineer efficient data pipelines based on real-time data and streaming processes at scale.

Course Details

Study Type	Undergraduate	Level	4
Term	First Term	Credit Points	15 credits (7.5 ECTS credits)
Campus	Aberdeen	Sustained Study	No
Co-ordinators	Dr Mingjun Zhong, Dr Yaji Sripada

What courses & programmes must have been taken before this course?

Any Undergraduate Programme (Studied)
Either Programme Level 3 or Programme Level 4
Computing Science (CS)

What other courses must be taken with this course?

None.

What courses cannot be taken with this course?

None.

Are there a limited number of places available?

Course Description

Data Engineering is the design of automated workflows that reduce the human effort involved in processing big data, whether for an end user, a data analyst, or a data scientist.

This includes the consideration of cloud-based and edge-based technologies, tools and techniques to solve complex computational problems found within real-world data science applications.

As well as core data engineering concepts, principles and theories, the course covers key aspects of associated disciplines such as visualisation, data science and computational science, with the intention of building usable pipelines for data scientists.

Students will explore a range of topics in data engineering so that they can build practical, real-world applications. The topics covered include:

The role of data engineering in relation to data science, machine learning, etc.
The data engineering landscape and scope of data science
Data engineering SDLC and frameworks
Data pipelines and data workflows
Development of data pipelines, with tools and techniques such as Apache Airflow and TensorFlow Extended (TFX)
Data storage, ingestion, transformation
Linear regression
Data cleaning
Data quality and validation, and data pre-processing to handle missing values, find problems in data, and engineer more effective feature sets
Data analysis and visualisation – role of organised data for machine learning, with tools such as Matplotlib, Seaborn, and Bokeh
Evaluation of data pipelines and workflows – ethics, usability and human factors
Wider implications of data engineering
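
As a rough illustration of the ingestion, cleaning and transformation topics listed above, the following minimal Python sketch chains three small steps into one automated workflow. It is not taken from the course materials: the sample data, the function names (extract, transform, load, run_pipeline) and the choice to drop rows with missing readings are all assumptions made for illustration, and a real pipeline would more likely be orchestrated with a tool such as Apache Airflow.

```python
import csv
import io
from statistics import mean

# Hypothetical raw sensor readings; the empty cell stands for a missing value.
RAW_CSV = """sensor,reading
a,1.0
a,
b,3.0
b,5.0
"""


def extract(text):
    """Ingest: parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Clean: drop rows with a missing reading and convert the rest to float."""
    return [
        {"sensor": r["sensor"], "reading": float(r["reading"])}
        for r in rows
        if r["reading"] not in (None, "")
    ]


def load(rows):
    """Store: aggregate the mean reading per sensor into a dictionary."""
    grouped = {}
    for r in rows:
        grouped.setdefault(r["sensor"], []).append(r["reading"])
    return {sensor: mean(values) for sensor, values in grouped.items()}


def run_pipeline(text):
    """Run the three steps in order, as a scheduler would for each batch."""
    return load(transform(extract(text)))


result = run_pipeline(RAW_CSV)
print(result)
```

Dropping incomplete rows is only one possible cleaning policy; imputing a value or flagging the row for review are common alternatives, and the right choice depends on the application.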

Contact Teaching Time

Information on contact teaching time is available from the course guide.

Teaching Breakdown

Details, including assessments, may be subject to change until 31 August 2025 for 1st Term courses and 19 December 2025 for 2nd Term courses.

Summative Assessments

Report: Individual

Assessment Type	Summative	Weighting	50%
Assessment Weeks		Feedback Weeks	
Feedback	1,200-word individual report worth 50% of the overall grade.

Learning Outcomes

Knowledge Level	Thinking Skill	Outcome
Conceptual	Analyse	Demonstrate the use of techniques for cleaning, anomaly detection and pre-processing of big data.
Procedural	Analyse	Analyse and visualise organised data for patterns and trends based on analytics, metrics, segments, aggregates, features and training data.
Procedural	Apply	Manage the collection of raw data from instrumentation, logging, sensors, external data, and user-generated content.
Procedural	Apply	Build computer systems to handle big data that provide reliable data flow, infrastructure, pipelines, ETL (extract, transform, and load), and structured and unstructured data storage.
Reflection	Create	Build and evaluate complex data pipelines using A/B testing and experimentation approaches.

Report: Group

Assessment Type	Summative	Weighting	50%
Assessment Weeks		Feedback Weeks	
Feedback	3,000-word group report worth 50% of the overall grade. Peer assessment will form part of students' individual marks.

Learning Outcomes

Knowledge Level	Thinking Skill	Outcome
Conceptual	Analyse	Demonstrate the use of techniques for cleaning, anomaly detection and pre-processing of big data.
Procedural	Analyse	Analyse and visualise organised data for patterns and trends based on analytics, metrics, segments, aggregates, features and training data.
Procedural	Apply	Build computer systems to handle big data that provide reliable data flow, infrastructure, pipelines, ETL (extract, transform, and load), and structured and unstructured data storage.
Procedural	Apply	Manage the collection of raw data from instrumentation, logging, sensors, external data, and user-generated content.
Reflection	Create	Build and evaluate complex data pipelines using A/B testing and experimentation approaches.

Formative Assessment

There are no formative assessments for this course.

Resit Assessments

Resubmission of failed elements

Assessment Type	Summative	Weighting	100%
Assessment Weeks		Feedback Weeks	
Feedback	A resit individual task will be provided in place of groupwork.

Learning Outcomes

This information is not currently available. Please check the course guide on MyAberdeen or ask the Course Co-ordinator.

Course Learning Outcomes

Knowledge Level	Thinking Skill	Outcome
Procedural	Apply	Manage the collection of raw data from instrumentation, logging, sensors, external data, and user-generated content.
Procedural	Apply	Build computer systems to handle big data that provide reliable data flow, infrastructure, pipelines, ETL (extract, transform, and load), and structured and unstructured data storage.
Procedural	Analyse	Analyse and visualise organised data for patterns and trends based on analytics, metrics, segments, aggregates, features and training data.
Conceptual	Analyse	Demonstrate the use of techniques for cleaning, anomaly detection and pre-processing of big data.
Reflection	Create	Build and evaluate complex data pipelines using A/B testing and experimentation approaches.
