HDR UK Event
This full-day workshop offers a hands-on introduction to common big data tools, such as Hadoop, Spark, and Kafka, applied to both batch (stored) and streaming (real-time) processing of healthcare data. The first half will introduce Google Cloud Platform (GCP) and distributed (cloud) storage systems, along with Spark libraries for machine learning prediction and classification problems. The second half will concentrate on real-time processing in Kafka, based on synthetic data generation and interactive analytical dashboards. A drop-in session will be held after the workshop sessions to address technical questions. The focus of the workshop is how to orchestrate big data pipelines by combining different tools and libraries, rather than on interpreting results.
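As a taste of the kind of exercise covered in the streaming half, the sketch below generates synthetic patient vital-signs events of the sort that might be pushed to a Kafka topic. This is a minimal, hypothetical illustration: the field names, value ranges, and event shape are assumptions for demonstration, not the workshop's actual materials.

```python
import json
import random
from datetime import datetime, timezone

def generate_event(patient_id: int) -> dict:
    """Build one synthetic vital-signs reading for a simulated patient."""
    return {
        "patient_id": patient_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "heart_rate": random.randint(50, 120),                # beats per minute
        "systolic_bp": random.randint(90, 180),               # mmHg
        "temperature": round(random.uniform(36.0, 39.5), 1),  # degrees Celsius
    }

def generate_events(n: int) -> list[dict]:
    """Generate one synthetic event per simulated patient."""
    return [generate_event(pid) for pid in range(n)]

if __name__ == "__main__":
    # In a workshop setting these JSON strings would be handed to a Kafka
    # producer; here they are simply printed.
    for event in generate_events(3):
        print(json.dumps(event))
```

In practice each serialised event would be sent with a Kafka producer client and consumed downstream by a dashboard or stream processor, which is the pipeline pattern the second half of the workshop explores.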
Prerequisite knowledge
This workshop makes use of Linux commands and Python programming. Some experience with either is helpful but not required. All example code and commands will be provided.
Intended audience
- Anyone working in data science or healthcare who wants to gain practical experience with big data tools; this could be for use in an academic or industry role, in a dissertation or publication.
- MSc or PhD students (e.g. in data science, health informatics or epidemiology) gaining skills in preparation for roles in academia or industry.
- Clinical researchers who wish to practice big data tools in a more controlled environment before deploying in trusted research environments on real datasets.
Capacity
30 people
Booking
Online booking available