CS4042: DATA ENGINEERING (2025-2026)

Last modified: 29 Aug 2025 13:16


Course Overview

The aim of this course is to provide students with the specialist knowledge, understanding and skills required to develop modern data engineering applications. The course builds on core computer science subjects such as software engineering, distributed systems, and enterprise computing along with AI to engineer efficient data pipelines based on real-time data and streaming processes at scale.

Course Details

Study Type: Undergraduate
Level: 4
Term: First Term
Credit Points: 15 credits (7.5 ECTS credits)
Campus: Aberdeen
Sustained Study: No
Co-ordinators
  • Dr Mingjun Zhong
  • Dr Yaji Sripada

What courses & programmes must have been taken before this course?

  • Any Undergraduate Programme (Studied)
  • Either Programme Level 3 or Programme Level 4
  • Computing Science (CS)

What other courses must be taken with this course?

None.

What courses cannot be taken with this course?

None.

Are there a limited number of places available?

No

Course Description

Data Engineering is the design of automated workflows that reduce the manual effort of processing big data, whether for an end user, a data analyst or a data scientist.

This includes the consideration of cloud-based and edge-based technologies, tools and techniques to solve complex computational problems found within real-world data science applications.

As well as core data engineering concepts, principles and theories, the course covers key aspects of associated disciplines such as visualisation, data science and computational science, with the intention of building usable pipelines for data scientists.

Students will explore a range of topics in data engineering so that they can build practical real-world applications. The topics covered include:

  1. The role of data engineering in terms of data science, machine learning, etc
  2. The data engineering landscape and scope of data science
  3. Data engineering SDLC and frameworks
  4. Data pipelines and data workflows
  5. Development of data pipelines and tools and techniques, such as Apache Airflow, TensorFlow TFX, etc
  6. Data storage, ingestion, transformation
  7. Linear regression
  8. Data cleaning
  9. Data quality and validation, and data pre-processing to handle missing variables, find problems in data and engineer more effective feature sets
  10. Data analysis and visualisation with tools such as Matplotlib, Seaborn, and Bokeh; the role of organised data for machine learning
  11. Evaluation of data pipelines and workflows – ethics, usability and human factors
  12. Wider implications of data engineering
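
As an illustration of how several of the topics above (data pipelines, data cleaning, validation and linear regression) fit together, the following is a minimal sketch only, not course material. The sample records and the function names (`clean`, `validate`, `fit_line`, `run_pipeline`) are hypothetical, and it assumes plain Python with the standard library rather than any of the tools named in the topic list.

```python
from statistics import mean

# Hypothetical raw records; None marks a missing reading.
RAW_ROWS = [
    {"x": 1.0, "y": 3.0},
    {"x": 2.0, "y": None},
    {"x": 3.0, "y": 7.0},
    {"x": 4.0, "y": 9.0},
]


def clean(rows):
    """Cleaning step: drop any row that contains a missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]


def validate(rows):
    """Validation step: fail fast if too few complete rows remain."""
    if len(rows) < 2:
        raise ValueError("need at least two complete rows to fit a line")
    return rows


def fit_line(rows):
    """Ordinary least squares fit of y = slope * x + intercept."""
    xs = [r["x"] for r in rows]
    ys = [r["y"] for r in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def run_pipeline(rows):
    """Chain the steps: clean, then validate, then fit."""
    return fit_line(validate(clean(rows)))


if __name__ == "__main__":
    slope, intercept = run_pipeline(RAW_ROWS)
    print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```

Each step is a plain function that takes and returns rows, so steps can be tested in isolation and reordered; workflow tools such as Apache Airflow apply a similar idea at larger scale.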

Contact Teaching Time

Information on contact teaching time is available from the course guide.

Teaching Breakdown

Details, including assessments, may be subject to change until 31 August 2025 for 1st Term courses and 19 December 2025 for 2nd Term courses.

Summative Assessments

Report: Individual

Assessment Type: Summative
Weighting: 50%
Assessment Weeks Feedback Weeks

Feedback

1,200-word individual report worth 50% of the overall grade.

Learning Outcomes
Knowledge Level | Thinking Skill | Outcome
Conceptual | Analyse | Demonstrate the use of techniques for cleaning, anomaly detection and pre-processing of big data.
Procedural | Analyse | Analyse and visualise organised data for patterns and trends based on analytics, metrics, segments, aggregates, features and training data.
Procedural | Apply | Manage the collection of raw data from instrumentation, logging, sensors, external data, and user generated contents.
Procedural | Apply | Build computer systems to handle big data that provides reliable data flow, infrastructure, pipelines, ETL (extract, transform, and load), structured and unstructured data storage.
Reflection | Create | Build and evaluate complex data pipelines using A/B testing and experimentation approaches.

Report: Group

Assessment Type: Summative
Weighting: 50%
Assessment Weeks Feedback Weeks

Feedback

3,000-word group report worth 50% of the overall grade. Peer assessment will form part of students' individual marks.

Learning Outcomes
Knowledge Level | Thinking Skill | Outcome
Conceptual | Analyse | Demonstrate the use of techniques for cleaning, anomaly detection and pre-processing of big data.
Procedural | Analyse | Analyse and visualise organised data for patterns and trends based on analytics, metrics, segments, aggregates, features and training data.
Procedural | Apply | Build computer systems to handle big data that provides reliable data flow, infrastructure, pipelines, ETL (extract, transform, and load), structured and unstructured data storage.
Procedural | Apply | Manage the collection of raw data from instrumentation, logging, sensors, external data, and user generated contents.
Reflection | Create | Build and evaluate complex data pipelines using A/B testing and experimentation approaches.

Formative Assessment

There are no formative assessments for this course.

Resit Assessments

Resubmission of failed elements

Assessment Type: Summative
Weighting: 100%
Assessment Weeks Feedback Weeks

Feedback

A resit individual task will be provided in place of groupwork.

Learning Outcomes
Knowledge Level | Thinking Skill | Outcome
Sorry, we don't have this information available just now. Please check the course guide on MyAberdeen or with the Course Coordinator.

Course Learning Outcomes

Knowledge Level | Thinking Skill | Outcome
Procedural | Apply | Manage the collection of raw data from instrumentation, logging, sensors, external data, and user generated contents.
Procedural | Apply | Build computer systems to handle big data that provides reliable data flow, infrastructure, pipelines, ETL (extract, transform, and load), structured and unstructured data storage.
Procedural | Analyse | Analyse and visualise organised data for patterns and trends based on analytics, metrics, segments, aggregates, features and training data.
Conceptual | Analyse | Demonstrate the use of techniques for cleaning, anomaly detection and pre-processing of big data.
Reflection | Create | Build and evaluate complex data pipelines using A/B testing and experimentation approaches.
