High Performance Computing

Use of the HPC service is subject to the University of Aberdeen’s Conditions for using IT Facilities and other applicable IT Policies.

All users must agree to abide by these rules.

Dedicated HPC resource (Macleod) now available for teaching and learning. Find out more below.
The Maxwell Cluster for Research
  1. About our HPC
  2. Guide to Services
  3. What is High Performance Computing?
  4. Technical Specification
  5. Accessing Maxwell
  6. Compute Resources and Storage per User
  7. Backup and Restore
  8. Scheduling/Prioritisation
  9. Availability, Maintenance, and Unplanned Disruptions
  10. Monitoring and Arbitration

1. About our HPC

High Performance Computing refers to the practice of aggregating the processing power of several computers to provide a parallel processing environment for solving complex computational problems, usually in a significantly faster timescale than would be possible on a single device.

The University operates its own in-house HPC cluster, named Maxwell after James Clerk Maxwell, the eminent Scottish theoretical physicist who spent several years as a professor at the University’s Marischal College. Access is available to all staff and students, to external collaborators (via a PI), and to external organisations.

The in-house HPC infrastructure is self-service orientated. We maintain the hardware and software on behalf of all users, but there is an expectation that users understand how to use the software and perform the work themselves. Users can request installation of software if what they require is not already available.

2. Guide to Services

For the following service information, please see our Maxwell HPC Guide to Services:

  • Request Process
  • Features
  • Users (further details of who can access)
  • Level of Service
  • Support and Documentation
  • Costs for using Maxwell (costs do not always apply)

3. What is High Performance Computing?

High Performance Computing involves using interconnected clusters of many computers - sometimes referred to as a supercomputer - to accelerate large computing tasks. The HPC service is suited to solving problems that require considerable computational power or involve huge amounts of data that would normally take weeks or months to analyse on a desktop PC, if it can be done at all. HPC clusters can provide over a thousand desktop computers’ worth of resources for days on end, completing the work a single desktop computer would take a year to do in just one day.

HPC clusters are already being used by researchers across schools in a wide range of disciplines and research topics – e.g., genome sequencing and analysis, chemical pathway simulation, climate change impact assessment and financial systems modelling – and as a catalyst and enabler for inter-school and interdisciplinary research in areas such as systems biology.

If your research is being bottlenecked by computing resources, the HPC service could help by reducing the time needed to analyse and report on data, and by enabling analysis that might otherwise not be possible due to resource constraints.

4. Technical Specification

Maxwell is a Linux supercomputing cluster housed in the Edward Wright Datacentre and provides:

  • 1240 CPU cores/10 TB of RAM
  • One Very High Memory Node (3 TB): Use of the very high memory node is governed by the Digital Research Team to ensure it is used appropriately.
  • Additional high memory nodes
  • Resilient high-speed networks, including 100Gb/s InfiniBand interconnects for multi-node MPI jobs and 10Gb/s connectivity to the campus network
  • Over 1 petabyte of tiered storage, including 15TB of very high-performance storage
  • A wide variety of commercial and free HPC-optimized software - see current list
  • Compilers for user developed applications
  • Galaxy - Access to a Galaxy server which utilises the HPC’s resources. This open-source, web-based platform is designed to make computational biology accessible to researchers who do not have computer programming experience. Galaxy uses a graphical user interface to run software on the worker nodes of the cluster - see current list of software. It allows workflows to be saved and shared, supporting reproducible and transparent analysis. Researchers at the University of Aberdeen are part of the Galaxy Training Network, which runs subject-specific workshops on Galaxy throughout the academic year.

Where your requirements exceed what is available in Maxwell, we can also provide access to external services and can advise on the use of much larger HPC Clusters.

5. Accessing Maxwell

Maxwell can be accessed both on and off campus using a variety of clients that support the SSH protocol. This includes X Windows graphical desktops and applications, and SCP for file transfer.

  • Off-campus access to Maxwell, including when connected to Eduroam, is supported via the University’s SSH jump host (ssh-gateway).
  • Those already familiar with accessing Linux servers can use their own choice of SSH or SFTP/SCP software.
  • For novice users we recommend using the X2Go (graphical remote desktop) client, PuTTY for SSH access and WinSCP for file transfer.
  • Preconfigured PuTTY and X2Go clients can be used for off-campus access using non-university Windows PCs.
  • Alternatively, for manual instructions for installing and configuring clients, please see the User Guide.

These applications are available to install automatically on university Windows PCs via the Software Centre and can also be downloaded free of charge to use on your own computer.

Web access, via Galaxy, is available from within the campus network.

Several of these access methods will need alterations to their configuration to work from non-university computers. For detailed information on installing the software and connecting to Maxwell, please see our User Guide.
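
As an illustration, a command-line connection from off campus via the jump host might look like the example below. The hostnames are placeholders only; the actual addresses are given in the User Guide.

    # Connect to Maxwell from off campus, routing through the University's
    # SSH jump host (hostnames here are illustrative placeholders)
    ssh -J username@ssh-gateway.abdn.ac.uk username@maxwell.abdn.ac.uk

    # Copy a file to your Maxwell home directory with SCP via the same jump host
    scp -o ProxyJump=username@ssh-gateway.abdn.ac.uk mydata.csv username@maxwell.abdn.ac.uk:~/

On campus, the jump host can be omitted and you can connect to the cluster address directly.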

Please contact the Digital Research Services Team at digitalresearch@abdn.ac.uk for further support.

6. Compute Resources and Storage per User

Compute

  • Access to up to 200 job slots/CPU cores at a time, subject to availability. This can be increased by arrangement.
  • Ability to run jobs with RAM allocation up to 200GB

Storage

  • 50GB of resilient, backed-up personal home space.
  • In addition, users are allocated a quota of 1TB scratch space for working storage.
  • Additional Vault storage may be available by negotiation. Vault facilitates longer term storage on Maxwell and eases the need to pull data across from other non-HPC storage locations for ongoing projects. It is useful where projects need to repeatedly use the same data over time.

7. Backup and Restore

Home Space:
  • Data is backed up as follows:
    • Daily backups kept for 14 days
    • Weekly backups kept for 1 week
  • Files can be restored as follows:
    • To the path the folder/files came from, or
    • To a different specified HPC file path, or
    • To a specified shared drive
  • Request process:

Shared scratch: Not backed up

Vault: Not backed up

8. Scheduling/Prioritisation

The cluster runs Slurm workload manager to automatically allocate jobs submitted by users onto available compute nodes. Projects that supply grant or other funds to support Maxwell usage are given priority on the scheduler.

  • The scheduler balances the availability of slots among all users to permit fair access to the system.
  • It considers the specific requirements of each job (e.g., number of CPUs, amount of RAM, job duration, and node affinity requirements) as well as job prioritisation.
  • The scheduler starts queued jobs as space becomes available and can be set to advise users of job status by email.
  • Larger jobs requiring more time and resource are more difficult to schedule, so it is to the advantage of all users to make sure the requested resource is as accurate as possible.
  • Smaller jobs will be scheduled to run/backfill into available space and may therefore start/complete earlier than larger jobs.

An awareness of the following will help users provide accurate information when scheduling a job (a sample batch script is shown after this list):

  • When insufficient memory is requested for a job, the job will not run, and will need to be rescheduled with more memory requested.
  • Where more memory is requested than is actually used, the full requested amount is still attributed to the user, because the reserved memory is blocked and cannot be used by other jobs.
  • The default runtime of any job is 24 hours
    • If more time is needed this must be explicitly stated.
    • Less time can also be requested.
  • When insufficient time is allocated to a job, the job will be stopped when the allocated time has elapsed and will need to be rescheduled with more time requested.
  • Where the actual time used is less than the time requested, only the actual time will be attributed to a user’s account.
  • Interactive jobs can run only when the requested resources (e.g. CPUs and memory) are immediately available on Maxwell.
  • Once scheduled, data on the job are available from Maxwell using the ‘squeue’ command. This advises users of the status of a running job or the priority of a queued job.
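
To make the points above concrete, a minimal Slurm batch script might look like the sketch below. The application and module names are illustrative only, and the 'module load' line assumes the usual environment-modules setup found on most HPC systems; the resource directives simply show how CPU cores, memory, and runtime are stated explicitly rather than left to defaults such as the 24-hour runtime.

    #!/bin/bash
    #SBATCH --job-name=example_job     # name shown by squeue
    #SBATCH --ntasks=1                 # one task (process)
    #SBATCH --cpus-per-task=8          # CPU cores requested for the job
    #SBATCH --mem=32G                  # memory requested; the job will fail if it needs more
    #SBATCH --time=48:00:00            # runtime; must be stated if more than the 24-hour default is needed
    #SBATCH --mail-type=END,FAIL       # optional email notification of job status

    # Load the software needed for the analysis (module name is illustrative)
    module load example-application

    # Run the analysis
    example-application --input data.csv --output results/

The script would be submitted with 'sbatch', and the status of a running job or the priority of a queued job can then be checked with 'squeue' as noted above (for example, 'squeue -u $USER' lists only your own jobs).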

9. Availability, Maintenance, and Unplanned Disruptions

  • Maxwell is designed to ensure maximum availability and continued run time even if some cores or nodes stop functioning correctly. When this happens, these issues will be resolved, as far as possible, without any additional disruption to the cluster.
  • Planned maintenance will be communicated in advance to all users and will be scheduled to cause minimum disruption.
  • Every effort will be made to ensure there are no unplanned disruptions to the service. Where events, either internal or external to the HPC, do cause disruption, every effort will be made by Digital Research to restore service as quickly as possible. This may involve work with our suppliers.

10. Monitoring and Arbitration

The Digital Research Services Team is responsible for monitoring use of the system and should be contacted via digitalresearch@abdn.ac.uk to resolve any perceived scheduling or prioritisation issues.

The Macleod Cluster for Teaching and Learning

The University operates its own in-house HPC cluster (Macleod) with specific resources dedicated to teaching and learning to improve availability and performance when classes are underway.

The service provides substantial amounts of computational processing power and is available to all University staff and students.  Staff can request access for teaching here.

What is High Performance Computing?

High Performance Computing involves using interconnected clusters of many computers - sometimes referred to as a supercomputer - to accelerate large computing tasks. The HPC service is suited to solving problems that require considerable computational power or involve vast amounts of data that would normally take weeks or months to analyse on a desktop PC, if it can be done at all.

HPC clusters are already being used by several staff across schools in a wide range of disciplines, for both research and teaching purposes – e.g., genome sequencing and analysis, chemical pathway simulation, climate change impact assessment and financial systems modelling – and as a catalyst and enabler for inter-school and interdisciplinary research in areas such as systems biology.

Technical Specification

Macleod is a Linux supercomputing cluster housed in the Edward Wright Datacentre and provides:

  • 120 CPU cores and 1.2TB of RAM - minimum 256GB per node
  • Specialist nodes: two nodes, each with 3 x A100 GPU cards, providing 21 GPU partitions (see the example GPU job request after this list)
  • High-speed network – a 10Gb network
  • Over 50TB scratch storage
  • A wide variety of commercial and free HPC-optimized software
  • Compilers for user developed applications
  • Galaxy  
    • Access to a Galaxy server which utilises the HPC’s resources. This open-source, web-based platform is designed to make computational biology accessible to researchers who do not have computer programming experience.
    • Galaxy uses a graphical user interface to run software on the worker nodes of the cluster. It allows for workflows to be saved and shared, supporting reproducible and transparent analysis.
    • Researchers at the University of Aberdeen are part of the Galaxy Training Network, which runs subject-specific workshops on Galaxy throughout the academic year.
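
As a sketch only, requesting one of Macleod’s GPU partitions from Slurm could look like the lines below. The partition name, module name, and application are placeholders, as the exact names are site-specific; the Digital Research team can confirm the correct values for Macleod.

    #!/bin/bash
    #SBATCH --partition=gpu            # placeholder partition name - confirm the actual name on Macleod
    #SBATCH --gres=gpu:1               # request a single GPU partition (an A100, or an A100 MIG slice)
    #SBATCH --cpus-per-task=4
    #SBATCH --mem=64G
    #SBATCH --time=12:00:00

    # Load and run GPU-enabled software (names are illustrative)
    module load example-gpu-application
    example-gpu-application --device cuda --input data/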

Where requirements exceed what is available in Macleod and relate to the delivery of a research project, please check out the resources available in Maxwell, the HPC for research.

Further information

Any queries, please contact digitalresearch@abdn.ac.uk

External Resources

Is your cutting-edge or novel research problem beyond the capabilities of the Maxwell HPC, and do you need more computing power? We can provide access to more powerful and more specialised HPC, e.g. GPU-based HPC, that can give you the edge you need to achieve your research goals. We will facilitate your access to these Tier 2 (regional level) HPC resources; they all have their own access costs and conditions.

Access to the following Tier 2 HPC:

With their abundance of memory, these HPCs allow you to run memory-intensive multi-core and single-core problems simultaneously. The user-friendly batch environment of these HPCs provides a quality-of-life service for users, allowing them to queue computing jobs without further interaction.

If you think your computational needs are likely to bring Maxwell to its knees, then please contact us and we can see about getting you access to external HPC resources.

FAQ
Is High Performance Computing of use to me?

If your research requires the processing of large amounts of data, and/or your current data analysis takes weeks or months on a personal computer, there is a high possibility you would benefit from using the institutional HPC service.

Who can use Maxwell?

All staff and researchers with a requirement to use the HPC service will have the opportunity to do so.

Postgraduate research students should present business cases via their research advisor in the first instance.

Who provides support for Maxwell?

Our dedicated Digital Research Infrastructure Support team will advise researchers on the use of the cluster, the scheduling of jobs, and any software relating to their requirements. Contact the Digital Research Services Team via digitalresearch@abdn.ac.uk

The 24/7 monitoring and support of hardware and management of software and Maxwell users is provided by a third-party provider experienced in the installation and maintenance of HPC systems.

How do I access HPC Services?

To request access to Maxwell (for Research Projects), please complete the request form here.

To request access to Macleod (for Teaching and Learning), please complete the request form here.

Once registered, you will receive further guidance on accessing the services. You can also see our User Guide for further information.

Can I access Maxwell off campus?

Yes. Portal access is also available off campus from any PC with an internet connection, including personal desktop or laptop PCs. See our User Guide for more.

Can I access the HPCC from my personal device?

Yes, any personal desktop or laptop can be used to access the cluster via the portal or through the VPN.

Can I change my password?

Passwords can be changed via the standard route for changing IT account passwords. Users are reminded that passwords for Maxwell are not specific to the cluster; your standard University IT account password is used.

What software is provided?

We provide a wide range of commercial and free HPC-optimized software - see current list for details.

We also provide a range of software for Galaxy - see current list for details.

Can I add software to the cluster?

Yes - any requirement for specific software packages to be run on Maxwell will be reviewed internally, and the software will be added by the support contractor. We anticipate that the majority of software requirements can be installed on the cluster; however, users will be responsible for the licensing of any commercial software required.

Users are encouraged to ensure that required software is loaded onto the shared repository for re-use. It can also be shared on a nationwide database for use by other institutions. The Digital Research Services Team will advise you on how to do this.

How do I acknowledge the use of Maxwell?

We suggest the following:

"The authors would like to acknowledge the support of the Maxwell Compute Cluster funded by the University of Aberdeen."

Where can I store my research data when it is not being used on the HPCC and is it secure?

IT Services provides and manages a resilient networked data storage solution for the University which is replicated continuously to a Disaster Recovery site and is backed up nightly. This system is tiered and provides short-term and long-term storage facilities to the University, ensuring data is readily accessible.

Data is stored within secure shared drives, typically set up for teams, research grants etc. Each shared drive has a senior individual within the team nominated as the shared drive owner. This individual has responsibility for authorising access rights to the shared drive, as well as ensuring data is curated in compliance with University policies.

How much does it cost to store my research data?

The cost of data storage and additional file sharing services are detailed in the IT Service Catalogue.

www.abdn.ac.uk/it/service-portfolio/sc-pc-filestore.php